# Native Compilation of Numba on M1-based Mac

###### tags: Blog

> This is cross-published on my [website](https://ludger.fyi/blog/numba-mac/)

## Motivation

The current installation instructions of [Numba](https://github.com/numba/numba) run into difficulties caused by the way Apple packages its software development toolkit. This note documents how to set Numba up from start to finish on an M1-based Mac, and where alternative approaches fail.

We are building with support for OpenMP to utilize all available threads, but without support for [Threading Building Blocks](https://community.intel.com/t5/Intel-oneAPI-Threading-Building/Tbb-on-new-macOS-arm/td-p/1214887), which is unsupported on Mac, and likewise without support for CUDA, which is also not available on Mac.

## Building Numba from Source for Local Development with OpenMP-Support

To build Numba on Mac (or any platform for that matter) we begin by creating a conda environment with the base dependencies

```bash
conda create -n numbaenv python=3.10 numba/label/dev::llvmlite numpy scipy jinja2 cffi
```

and activate the environment

```bash
conda activate numbaenv
```

At this point we have installed [llvmlite](https://llvmlite.readthedocs.io/en/latest/), the LLVM JIT engine underpinning Numba, whose installation we can now verify from a Python REPL.

```python
import llvmlite
llvmlite.__version__
```

We can now clone the source of Numba.
```bash
git clone git@github.com:numba/numba.git && cd numba
```

Preemptively disable Threading Building Blocks

```bash
export NUMBA_DISABLE_TBB=1
```

Point to the Mac-specific software development toolkit, the path to which can be found with `xcrun --show-sdk-path`

```bash
export SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
```

> The crucial hint to set `SDKROOT` was found in the [conda build-scripts](https://github.com/numba/numba/blob/main/buildscripts/incremental/build.sh) of Numba. Downloading the older software development toolkit, as done in the build script, is not necessary on a modern Mac if the software development toolkit libraries are already installed.

Then install the macOS-specific Clang compilers provided by Conda, so that we do not use Apple's system compiler, which has OpenMP disabled and can neither include the OpenMP headers nor accept the `-fopenmp` flag at compilation time.

```bash
conda install clang clangdev
```

The command to build Numba itself for development purposes, with `--noopt` (no optimizations) and `--debug` (debugging options enabled), is then

```bash
python setup.py build_ext --inplace --noopt --debug
```

If we just want to use it locally, without the development-focussed deactivation of optimizations, we build with just the `--inplace` option

```bash
python setup.py build_ext --inplace
```

After which we can install Numba into our environment in editable (development) mode

```bash
python -m pip install --no-deps -e .
```

And verify its installation either from the REPL with

```python
import numba
numba.__version__
```

or from the command line with Numba's provided utility

```bash
numba -s
```

The output should look similar to this

```text
System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time) : 2022-11-26 23:48:15.001253
UTC start time : 2022-11-27 05:48:15.001264
Running time (s) : 0.718704

__Hardware Information__
Machine : arm64
CPU Name : cyclone
CPU Count : 10
Number of accessible CPUs : ?
List of accessible CPUs cores : ?
CFS Restrictions (CPUs worth of runtime) : None

CPU Features :
Memory Total (MB) : 32768
Free Memory (MB) : 23

__OS Information__
Platform Name : macOS-13.0.1-arm64-arm-64bit
Platform Release : 22.1.0
OS Name : Darwin
OS Version : Darwin Kernel Version 22.1.0: Sun Oct 9 20:15:09 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T6000
OS Specific Version : 13.0.1 arm64
Libc Version : ?

__Python Information__
Python Compiler : Clang 14.0.6
Python Implementation : CPython
Python Version : 3.10.8
Python Locale : None.UTF-8

__Numba Toolchain Versions__
Numba Version : 0.57.0dev0+846.g728263512
llvmlite Version : 0.40.0dev0+43.g7783803

__LLVM Information__
LLVM Version : 11.1.0

__CUDA Information__
CUDA Device Initialized : False
CUDA Driver Version : ?
CUDA Runtime Version : ?
CUDA NVIDIA Bindings Available : ?
CUDA NVIDIA Bindings In Use : ?
CUDA Detect Output:
None
CUDA Libraries Test Output:
None

__NumPy Information__
NumPy Version : 1.23.4
NumPy Supported SIMD features : ('NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD', 'FPHP', 'ASIMDHP', 'ASIMDDP')
NumPy Supported SIMD dispatch : ('ASIMDHP', 'ASIMDDP', 'ASIMDFHM')
NumPy Supported SIMD baseline : ('NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD')
NumPy AVX512_SKX support detected : False

__SVML Information__
SVML State, config.USING_SVML : False
SVML Library Loaded : False
llvmlite Using SVML Patched LLVM : True
SVML Operational : False

__Threading Layer Information__
TBB Threading Layer Available : False
+--> Disabled due to Unknown import problem.
OpenMP Threading Layer Available : True
+-->Vendor: Intel
Workqueue Threading Layer Available : True
+-->Workqueue imported successfully.

__Numba Environment Variable Information__
None found.

__Conda Information__
Conda Build : 3.21.9
Conda Env : 4.13.0
Conda Platform : osx-arm64
Conda Python Version : 3.9.11.final.0
Conda Root Writable : True

__Installed Packages__
blas               1.0                          openblas
bzip2              1.0.8                        h620ffc9_4
ca-certificates    2022.10.11                   hca03da5_0
certifi            2022.9.24                    py310hca03da5_0
cffi               1.15.1                       py310h80987f9_2
clang              14.0.6                       hca03da5_0
clang-14           14.0.6                       default_hf5194b7_0
clang-format       14.0.6                       default_hf5194b7_0
clang-format-14    14.0.6                       default_hf5194b7_0
clang-tools        14.0.6                       default_hf5194b7_0
clangdev           14.0.6                       default_hf5194b7_0
clangxx            14.0.6                       default_hf5194b7_0
fftw               3.3.9                        h1a28f6b_1
jinja2             3.1.2                        py310hca03da5_0
libclang           14.0.6                       default_hf5194b7_0
libclang-cpp       14.0.6                       default_hf5194b7_0
libclang-cpp14     14.0.6                       default_hf5194b7_0
libclang13         14.0.6                       default_hf5a4b0a_0
libcxx             14.0.6                       h848a8c0_0
libffi             3.4.2                        hca03da5_6
libgfortran        5.0.0                        11_3_0_hca03da5_28
libgfortran5       11.3.0                       h009349e_28
libllvm14          14.0.6                       h7ec7a93_1
libopenblas        0.3.21                       h269037a_0
llvm-openmp        14.0.6                       hc6e5704_0
llvm-tools         14.0.6                       h7ec7a93_1
llvmdev            14.0.6                       h7ec7a93_1
llvmlite           0.40.0dev0                   py310_43              numba/label/dev
markupsafe         2.1.1                        py310h1a28f6b_0
ncurses            6.3                          h1a28f6b_3
numba              0.57.0.dev0+846.g728263512   dev_0                 <develop>
numpy              1.23.4                       py310hb93e574_0
numpy-base         1.23.4                       py310haf87e8b_0
openssl            1.1.1s                       h1a28f6b_0
pip                22.2.2                       py310hca03da5_0
pycparser          2.21                         pyhd3eb1b0_0
python             3.10.8                       hc0d8a6c_1
readline           8.2                          h1a28f6b_0
scipy              1.9.3                        py310h20cbe94_0
setuptools         65.5.0                       py310hca03da5_0
sqlite             3.40.0                       h7a7dc30_0
tk                 8.6.12                       hb8d0fd4_0
tzdata             2022f                        h04d1e81_0
wheel              0.37.1                       pyhd3eb1b0_0
xz                 5.2.6                        h1a28f6b_0
zlib               1.2.13                       h5a0b063_0

No errors reported.

__Warning log__
Warning (cuda): CUDA driver library cannot be found or no CUDA enabled devices are present.
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning (psutil): psutil cannot be imported. For more accuracy, consider installing it.
--------------------------------------------------------------------------------
```

## Approaches not to take (for mental Sanity)

There are multiple approaches I attempted but ended up abandoning due to their instability, as well as the inherent pitfalls of pulling in multiple header directories with similar contents at the same time.

### LLVM build from source for the Compiler

To circumvent the lack of OpenMP support in the native Clang compiler on Mac, we can try to build our own LLVM toolchain.
As llvmlite is based on a patched-up version of the `release/14.x` branch of LLVM, checking out the LLVM source on that branch is the most natural option

```bash
git clone https://github.com/llvm/llvm-project && cd llvm-project
git checkout release/14.x
mkdir build && cd build
```

We then build the Clang toolchain from source for the M1-based Mac with the OpenMP libraries

```bash
cmake -G Ninja ../llvm \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;openmp" \
    -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi" \
    -DLLVM_LINK_LLVM_DYLIB=ON \
    -DLLVM_ENABLE_EH=ON \
    -DLLVM_ENABLE_FFI=ON \
    -DLLVM_ENABLE_RTTI=ON \
    -DLLVM_INCLUDE_DOCS=OFF \
    -DLLVM_INSTALL_UTILS=ON \
    -DLLVM_ENABLE_Z3_SOLVER=OFF \
    -DLLVM_TARGETS_TO_BUILD="AArch64" \
    -DLIBOMP_INSTALL_ALIASES=OFF \
    -DLLVM_CREATE_XCODE_TOOLCHAIN=ON \
    -DLLVM_BUILD_LLVM_C_DYLIB=ON \
    -DLLVM_ENABLE_LIBCXX=ON \
    -DRUNTIMES_CMAKE_ARGS="-DCMAKE_INSTALL_RPATH=@loader_path/../lib"
```

We then have to export the `$PATH` and `$LD_LIBRARY_PATH` variables so that the toolchain is found, before we can continue the attempt to build Numba with it

```bash
export PATH=${PWD}/bin:$PATH
export LD_LIBRARY_PATH=${PWD}/lib:$LD_LIBRARY_PATH
```

So where does this begin to fail?

- At first we are missing the `stdio.h` header, which we can get from the Xcode developer SDK by manually pointing to the directory containing its headers.
- Where this approach really **collapses** is at the next attempt, where we run into conflicting versions of headers. Resolving these would require a lot of manual include-path settings to keep the compiler from searching entire directories for headers.

### Python Virtualenv-based Install

Another seemingly logical approach is to use a plain Python virtual environment, and hence avoid the Conda-induced isolation with Conda's own libraries in favour of a very thin `virtualenv`.
To do this we'd run

```bash
python3 -m venv numbaenv && source numbaenv/bin/activate
pip install llvmlite
```

at which point we can clone the Numba repo and start a CPU-only build which, lacking both [Threading Building Blocks](https://github.com/oneapi-src/oneTBB) and OpenMP, can only run on Numba's own threads (the workqueue threading layer)

```bash
python setup.py build_ext --inplace --noopt --debug
```

So where does this particular approach begin to fail?

- The Numba library built from `main` is **incompatible** with the version of the llvmlite library shipped by pip. As such, the two **do not** work together.

### Setting CFLAGS and LDFLAGS as suggested on Stack Overflow

A typical suggestion on Stack Overflow or similar sites is to set compiler and linker flags at the command line to fix the compiler not finding certain header files, and to set a deployment target, i.e.

```bash
export CFLAGS="-I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include"
export LDFLAGS="-L/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib"
export MACOSX_DEPLOYMENT_TARGET=14.1
```

So why is this approach **undesirable**, and why does it ultimately **fail**?

- We end up patching over the first missing headers such as `stdio.h`, but the next missing header is then the C++ `vector` header, and the deluge of missing headers does not seem to stop. As such I would call this an *unclean* approach, which I ultimately did not manage to get to work.
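
Since all of these dead ends come down to which compiler and headers the build picks up, it can help to probe the compiler before starting a build. The following is a minimal sketch and not part of Numba's tooling: the helper name `can_build_with_openmp` and the probe program are made up for illustration, and it assumes the compiler under test is reachable on `PATH`. It tries to compile a tiny C file with `-fopenmp` and reports whether that succeeded.

```python
import os
import shutil
import subprocess
import tempfile

# Hypothetical probe program: it only builds if <omp.h> can be included
# and the OpenMP runtime can be linked.
OPENMP_PROBE = (
    "#include <omp.h>\n"
    "int main(void) { return omp_get_max_threads() > 0 ? 0 : 1; }\n"
)


def can_build_with_openmp(compiler="clang"):
    """Return True if `compiler` builds the probe program with -fopenmp."""
    path = shutil.which(compiler)
    if path is None:
        # The compiler is not on PATH at all.
        return False
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        out = os.path.join(tmp, "probe")
        with open(src, "w") as fh:
            fh.write(OPENMP_PROBE)
        # Capture the output so a failing probe does not clutter the terminal.
        result = subprocess.run(
            [path, "-fopenmp", src, "-o", out],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0
```

Calling `can_build_with_openmp("clang")` inside the activated Conda environment shows whether the compiler found first on `PATH` gets through the probe. A `False` only says that the probe did not build. Telling a rejected `-fopenmp` flag apart from a missing `omp.h` would require looking at the captured `stderr`.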