# Native Compilation of Numba on M1-based Mac
###### tags: Blog
> This is cross-published on my [website](https://ludger.fyi/blog/numba-mac/)
## Motivation
The current installation instructions of [Numba](https://github.com/numba/numba) run into difficulties caused by the way Apple packages its software development kit (SDK). This note documents how to set Numba up from start to finish on an M1-based Mac, and where alternative approaches fail. We build with OpenMP support to utilize all available threads, but without [Threading Building Blocks](https://community.intel.com/t5/Intel-oneAPI-Threading-Building/Tbb-on-new-macOS-arm/td-p/1214887), which is unsupported on this platform, and without CUDA, which is not available on Mac either.
## Building Numba from Source for Local Development with OpenMP-Support
To build Numba on Mac (or any platform, for that matter), we begin by creating a conda environment with the base dependencies
```bash
conda create -n numbaenv python=3.10 numba/label/dev::llvmlite numpy scipy jinja2 cffi
```
and activate the environment
```bash
conda activate numbaenv
```
At this point we have installed [llvmlite](https://llvmlite.readthedocs.io/en/latest/), the LLVM-based JIT engine underpinning Numba, whose installation we can now verify from a Python REPL.
```python
import llvmlite
llvmlite.__version__
```
We can now clone the source of Numba.
```bash
git clone git@github.com:numba/numba.git && cd numba
```
Preemptively disable Threading Building Blocks
```bash
export NUMBA_DISABLE_TBB=1
```
Point to the Mac-specific software development toolkit, the path to which can be found with `xcrun --show-sdk-path`
```bash
export SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
```
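Instead of copying the path by hand, the `xcrun --show-sdk-path` lookup can be scripted. The following is a minimal sketch, not part of Numba's build tooling; the helper name `resolve_sdkroot` and its `run` parameter are hypothetical and exist only for this illustration.

```python
import subprocess


def resolve_sdkroot(run=subprocess.run):
    """Return the SDK path reported by `xcrun --show-sdk-path`, or None.

    `run` is injectable (an assumption of this sketch) so the helper can be
    exercised on machines without Xcode.
    """
    try:
        result = run(
            ["xcrun", "--show-sdk-path"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # xcrun is not installed, or it could not locate an SDK
        return None
    path = result.stdout.strip()
    # Treat empty output the same as a failed lookup
    return path or None
```

Calling `resolve_sdkroot()` on the Mac and printing `f"export SDKROOT={path}"` yields a line that can be pasted into the shell. On a machine without `xcrun` the helper returns `None` rather than raising.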
> The important pointer to set `SDKROOT` comes from Numba's [conda build-scripts](https://github.com/numba/numba/blob/main/buildscripts/incremental/build.sh). Downloading the older SDK, as the build script does, is not necessary on a modern Mac if you have already installed the SDK libraries.
Then install the macOS-specific Clang compilers provided by Conda, so that we do not use Apple's system compiler, which has OpenMP disabled and can neither include the OpenMP headers nor accept the `-fopenmp` flag at compile time.
```bash
conda install clang clangdev
```
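Before building, it can be worth confirming that the `clang` found on `PATH` is the Conda one and not Apple's. The sketch below assumes the environment is activated, so that `CONDA_PREFIX` points at it; the helper names `is_inside_prefix` and `clang_comes_from_conda` are hypothetical.

```python
import os
import shutil
from pathlib import Path


def is_inside_prefix(executable, prefix):
    """Return True if `executable` lives somewhere below the directory `prefix`."""
    if not executable or not prefix:
        return False
    exe = Path(executable).resolve()
    root = Path(prefix).resolve()
    return root in exe.parents


def clang_comes_from_conda(environ=os.environ):
    """Check whether the first `clang` on PATH belongs to the active conda env."""
    clang = shutil.which("clang", path=environ.get("PATH"))
    return is_inside_prefix(clang, environ.get("CONDA_PREFIX"))
```

If `clang_comes_from_conda()` returns `False` inside the activated environment, the build would most likely still pick up the system compiler.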
The command to build Numba itself for development purposes, with `--noopt` (no optimizations) and `--debug` (debugging options enabled), is then
```bash
python setup.py build_ext --inplace --noopt --debug
```
If we just want to use Numba locally, without the development-focused deactivation of optimizations, we build with only the `--inplace` option
```bash
python setup.py build_ext --inplace
```
After which we can install Numba into our environment in editable (development) mode
```bash
python -m pip install --no-deps -e .
```
And verify its installation either from the REPL with
```python
import numba
numba.__version__
```
or from the command line with Numba's provided utility
```bash
numba -s
```
The output should look similar to this
```bash
System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time) : 2022-11-26 23:48:15.001253
UTC start time : 2022-11-27 05:48:15.001264
Running time (s) : 0.718704
__Hardware Information__
Machine : arm64
CPU Name : cyclone
CPU Count : 10
Number of accessible CPUs : ?
List of accessible CPUs cores : ?
CFS Restrictions (CPUs worth of runtime) : None
CPU Features :
Memory Total (MB) : 32768
Free Memory (MB) : 23
__OS Information__
Platform Name : macOS-13.0.1-arm64-arm-64bit
Platform Release : 22.1.0
OS Name : Darwin
OS Version : Darwin Kernel Version 22.1.0: Sun Oct 9 20:15:09 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T6000
OS Specific Version : 13.0.1 arm64
Libc Version : ?
__Python Information__
Python Compiler : Clang 14.0.6
Python Implementation : CPython
Python Version : 3.10.8
Python Locale : None.UTF-8
__Numba Toolchain Versions__
Numba Version : 0.57.0dev0+846.g728263512
llvmlite Version : 0.40.0dev0+43.g7783803
__LLVM Information__
LLVM Version : 11.1.0
__CUDA Information__
CUDA Device Initialized : False
CUDA Driver Version : ?
CUDA Runtime Version : ?
CUDA NVIDIA Bindings Available : ?
CUDA NVIDIA Bindings In Use : ?
CUDA Detect Output:
None
CUDA Libraries Test Output:
None
__NumPy Information__
NumPy Version : 1.23.4
NumPy Supported SIMD features : ('NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD', 'FPHP', 'ASIMDHP', 'ASIMDDP')
NumPy Supported SIMD dispatch : ('ASIMDHP', 'ASIMDDP', 'ASIMDFHM')
NumPy Supported SIMD baseline : ('NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD')
NumPy AVX512_SKX support detected : False
__SVML Information__
SVML State, config.USING_SVML : False
SVML Library Loaded : False
llvmlite Using SVML Patched LLVM : True
SVML Operational : False
__Threading Layer Information__
TBB Threading Layer Available : False
+--> Disabled due to Unknown import problem.
OpenMP Threading Layer Available : True
+-->Vendor: Intel
Workqueue Threading Layer Available : True
+-->Workqueue imported successfully.
__Numba Environment Variable Information__
None found.
__Conda Information__
Conda Build : 3.21.9
Conda Env : 4.13.0
Conda Platform : osx-arm64
Conda Python Version : 3.9.11.final.0
Conda Root Writable : True
__Installed Packages__
blas 1.0 openblas
bzip2 1.0.8 h620ffc9_4
ca-certificates 2022.10.11 hca03da5_0
certifi 2022.9.24 py310hca03da5_0
cffi 1.15.1 py310h80987f9_2
clang 14.0.6 hca03da5_0
clang-14 14.0.6 default_hf5194b7_0
clang-format 14.0.6 default_hf5194b7_0
clang-format-14 14.0.6 default_hf5194b7_0
clang-tools 14.0.6 default_hf5194b7_0
clangdev 14.0.6 default_hf5194b7_0
clangxx 14.0.6 default_hf5194b7_0
fftw 3.3.9 h1a28f6b_1
jinja2 3.1.2 py310hca03da5_0
libclang 14.0.6 default_hf5194b7_0
libclang-cpp 14.0.6 default_hf5194b7_0
libclang-cpp14 14.0.6 default_hf5194b7_0
libclang13 14.0.6 default_hf5a4b0a_0
libcxx 14.0.6 h848a8c0_0
libffi 3.4.2 hca03da5_6
libgfortran 5.0.0 11_3_0_hca03da5_28
libgfortran5 11.3.0 h009349e_28
libllvm14 14.0.6 h7ec7a93_1
libopenblas 0.3.21 h269037a_0
llvm-openmp 14.0.6 hc6e5704_0
llvm-tools 14.0.6 h7ec7a93_1
llvmdev 14.0.6 h7ec7a93_1
llvmlite 0.40.0dev0 py310_43 numba/label/dev
markupsafe 2.1.1 py310h1a28f6b_0
ncurses 6.3 h1a28f6b_3
numba 0.57.0.dev0+846.g728263512 dev_0 <develop>
numpy 1.23.4 py310hb93e574_0
numpy-base 1.23.4 py310haf87e8b_0
openssl 1.1.1s h1a28f6b_0
pip 22.2.2 py310hca03da5_0
pycparser 2.21 pyhd3eb1b0_0
python 3.10.8 hc0d8a6c_1
readline 8.2 h1a28f6b_0
scipy 1.9.3 py310h20cbe94_0
setuptools 65.5.0 py310hca03da5_0
sqlite 3.40.0 h7a7dc30_0
tk 8.6.12 hb8d0fd4_0
tzdata 2022f h04d1e81_0
wheel 0.37.1 pyhd3eb1b0_0
xz 5.2.6 h1a28f6b_0
zlib 1.2.13 h5a0b063_0
No errors reported.
__Warning log__
Warning (cuda): CUDA driver library cannot be found or no CUDA enabled devices are present.
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning (psutil): psutil cannot be imported. For more accuracy, consider installing it.
--------------------------------------------------------------------------------
```
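The part of this report that matters most here is the `__Threading Layer Information__` section, where OpenMP should be listed as available. As a rough sketch, the relevant lines can be pulled out of the report text with a regular expression; the function name `threading_layers` is hypothetical, and the line format is assumed to match the output shown above.

```python
import re

# Matches lines such as "OpenMP Threading Layer Available : True"
_LAYER_LINE = re.compile(
    r"^[ \t]*(\w+) Threading Layer Available[ \t]*:[ \t]*(True|False)[ \t]*$",
    re.MULTILINE,
)


def threading_layers(report):
    """Map each threading layer named in a `numba -s` report to its availability."""
    return {
        name.lower(): flag == "True"
        for name, flag in _LAYER_LINE.findall(report)
    }
```

The report text itself can be captured with `subprocess.run(["numba", "-s"], capture_output=True, text=True).stdout`. For the output above, the expected result is that `openmp` and `workqueue` map to `True` and `tbb` maps to `False`.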
## Approaches not to take (for mental Sanity)
I attempted multiple other approaches, but ended up abandoning them due to their instability, as well as the inherent pitfalls of pulling headers from multiple directories with similar contents at the same time.
### LLVM build from source for the Compiler
To circumvent the inability of the native Clang compiler on Mac to compile with OpenMP, we can try to build our own LLVM toolchain. As llvmlite is based on a patched version of the `release/14.x` branch of LLVM, checking out the LLVM source on that branch would be the most natural option
```bash
git clone https://github.com/llvm/llvm-project && cd llvm-project
git checkout release/14.x
mkdir build && cd build
```
We then configure and build the Clang toolchain from source for the M1-based Mac with the OpenMP libraries
```bash
# Configure the build; note the space before each line-continuation backslash
cmake -G Ninja ../llvm -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;openmp" \
  -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi" \
  -DLLVM_LINK_LLVM_DYLIB=ON \
  -DLLVM_ENABLE_EH=ON \
  -DLLVM_ENABLE_FFI=ON \
  -DLLVM_ENABLE_RTTI=ON \
  -DLLVM_INCLUDE_DOCS=OFF \
  -DLLVM_INSTALL_UTILS=ON \
  -DLLVM_ENABLE_Z3_SOLVER=OFF \
  -DLLVM_TARGETS_TO_BUILD="AArch64" \
  -DLIBOMP_INSTALL_ALIASES=OFF \
  -DLLVM_CREATE_XCODE_TOOLCHAIN=ON \
  -DLLVM_BUILD_LLVM_C_DYLIB=ON \
  -DLLVM_ENABLE_LIBCXX=ON \
  -DRUNTIMES_CMAKE_ARGS="-DCMAKE_INSTALL_RPATH=@loader_path/../lib"
# Build the configured toolchain
cmake --build .
```
We then have to prepend the build's `bin` and `lib` directories to `$PATH` and `$LD_LIBRARY_PATH` so that the toolchain is found, before we can continue the attempt to build Numba with it
```bash
export PATH=${PWD}/bin:$PATH
export LD_LIBRARY_PATH=${PWD}/lib:$LD_LIBRARY_PATH
```
So where does this begin to fail?
- At first the `stdio.h` header is missing, which we can get from the Xcode SDK by manually pointing the compiler to the SDK's header directory.
- Where this approach really **collapses** is at the next attempt, where we run into conflicting versions of headers. Resolving these would require a lot of manual include and link flags to keep the compiler from searching entire directories for headers.
### Python Virtualenv-based Install
Another seemingly logical approach would be to use a Python virtual environment, and hence avoid the Conda-induced isolation with Conda's own libraries in favor of a very thin `virtualenv`. To do this we would run
```bash
python3 -m venv numbaenv && source numbaenv/bin/activate
pip install llvmlite
```
at which point we can clone the Numba repository and start a CPU-only build. Lacking both [Threading Building Blocks](https://github.com/oneapi-src/oneTBB) and OpenMP, this build can only use Numba's own threads (the workqueue threading layer)
```bash
python setup.py build_ext --inplace --noopt --debug
```
So where does this particular approach begin to fail?
- The Numba library built from `main` is **incompatible** with the version of the llvmlite library shipped by pip. As such, the two **do not** work together.
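A quick way to make such a mismatch visible is to print the versions of both packages as the current interpreter sees them. This is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical. Which llvmlite version a given Numba commit expects is not encoded here and has to be looked up in Numba's own documentation.

```python
from importlib import metadata


def installed_versions(packages=("numba", "llvmlite")):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not visible to this interpreter
            versions[name] = None
    return versions
```

Running `print(installed_versions())` inside the virtual environment shows which llvmlite release pip resolved, which can then be compared against the development version listed in the `numba -s` output of the working Conda setup.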
### Setting CFLAGS and LDFLAGS as suggested on Stack Overflow
A typical suggestion on Stack Overflow and similar sites is to set compiler and linker flags on the command line so that the compiler finds the missing header files, and to set a deployment target, i.e.
```bash
export CFLAGS="-I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include"
export LDFLAGS=-L/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib
export MACOSX_DEPLOYMENT_TARGET=14.1
```
So why is this approach **undesirable**, and why does it ultimately **fail**?
- We end up patching in the first missing headers such as `stdio.h`, but the next missing header is then `vector.h`, and the deluge of missing headers does not seem to stop. As such I would call this an *unclean* approach, which I ultimately did not manage to get to work.
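When chasing missing headers like this, it helps to check which of the candidate include directories actually contains a given file before adding yet another `-I` flag. The sketch below is a generic diagnostic, not something the Numba build provides; `locate_header` is a hypothetical helper.

```python
from pathlib import Path


def locate_header(name, include_dirs):
    """Return the first directory in `include_dirs` that contains `name`, else None."""
    for directory in include_dirs:
        if (Path(directory) / name).is_file():
            return str(directory)
    return None
```

For example, `locate_header("stdio.h", [sdk + "/usr/include"])`, with `sdk` set to the `SDKROOT` path, should return that directory if the `CFLAGS` above point at the right place. A header that comes back as `None` for every directory passed in is one the compiler will not find through those flags either.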