# Importing a shared library
###### tags: `book`
## ctypes
The [ctypes](https://docs.python.org/3/library/ctypes.html) library is part of the Python standard library and provides convenient access to C-compatible data types and loading of shared libraries. C data types like `c_int`, `c_double`, `c_long` etc. are supported, and libraries are loaded with `CDLL`. A simple example of the use of `ctypes` was given in the [Introduction](/AgSRXeM7QrOh_-2jtCZDXQ).
### Working with NumPy arrays
The most convenient way to work with arrays is through [numpy.ctypeslib](https://numpy.org/doc/stable/reference/routines.ctypeslib.html).
In particular, it includes the functions `as_array`, which converts a C array to a NumPy `ndarray`, and `as_ctypes`, which converts an `ndarray` to a C array.
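As a minimal sketch of these two functions (no shared library is involved, and the array values are made up for illustration), the round trip below suggests that both conversions give views of the same memory rather than copies:

```python
import numpy as np
from numpy.ctypeslib import as_array, as_ctypes

# A C-contiguous float64 array with illustrative values.
a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=np.float64)

# ndarray -> ctypes array (a nested c_double array with the same shape).
c_a = as_ctypes(a)

# ctypes array -> ndarray.
b = as_array(c_a)

# Writing through the ctypes object is visible in both NumPy arrays.
c_a[0][0] = 10.0
print(a[0, 0], b[0, 0])
```

Because the memory is shared, the NumPy array has to stay alive for as long as the C side uses the data.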
Here is an example passing a NumPy array to a Fortran subroutine that sums the column values:
```python
from ctypes import CDLL, byref, c_int
import numpy as np
from numpy.ctypeslib import as_ctypes, as_array
lib = CDLL("mod_sum.so")
a = np.array([[1, 2, 3], [1, 2, 3]], dtype=np.float64)
sum_col = np.empty(a.shape[1], dtype=np.float64)
c_a = as_ctypes(a)
c_n_r_a = c_int(a.shape[0])
c_n_c_a = c_int(a.shape[1])
c_sum_col = as_ctypes(sum_col)
lib.sum_columns(byref(c_a), c_n_r_a, c_n_c_a, byref(c_sum_col))
print(sum_col)
# [2. 4. 6.]
```
Note the explicit declaration of the data types for the NumPy arrays, which avoids NumPy guessing the type from the input values. As is conventional in C, arrays are passed *by reference* rather than by value.
That is handled by the `byref` function of `ctypes`.
Here is the underlying Fortran module:
```fortran
module mod_sum
  use, intrinsic :: iso_c_binding, only: c_double, c_int
  implicit none
contains
  subroutine sum_columns(a, n_r_a, n_c_a, sum_col) bind(c)
    ! Declare the dimensions before the arrays that use them
    integer(c_int), value, intent(in) :: n_c_a, n_r_a
    real(c_double), intent(in) :: a(n_c_a, n_r_a)
    real(c_double) :: sum_col(n_c_a)
    sum_col = sum(a, 2)
  end subroutine sum_columns
end module mod_sum
```
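The explicit `byref` calls can be avoided by declaring the argument types of the function. The following is a minimal sketch, not part of the original example: so that it runs without `mod_sum.so`, a Python callback (`_py_sum_columns`, a hypothetical stand-in) plays the role of the compiled routine and mimics its column-major indexing.

```python
from ctypes import CFUNCTYPE, POINTER, c_double, c_int

# C signature: void sum_columns(double *a, int n_r_a, int n_c_a, double *sum_col)
# With a real library one would instead set
#   lib.sum_columns.argtypes = [POINTER(c_double), c_int, c_int, POINTER(c_double)]
#   lib.sum_columns.restype = None
SumColumns = CFUNCTYPE(None, POINTER(c_double), c_int, c_int, POINTER(c_double))


def _py_sum_columns(a, n_r_a, n_c_a, sum_col):
    """Stand-in for the Fortran routine, which sees a(n_c_a, n_r_a) in column-major order."""
    for i in range(n_c_a):
        sum_col[i] = sum(a[i + j * n_c_a] for j in range(n_r_a))


sum_columns = SumColumns(_py_sum_columns)

data = (c_double * 6)(1, 2, 3, 1, 2, 3)  # two rows of three values, row-major
result = (c_double * 3)()

# With declared argument types, ctypes converts the arrays to pointers
# and the plain Python integers to c_int.
sum_columns(data, 2, 3, result)
print(list(result))
```

For a NumPy array, a matching pointer could be obtained with `a.ctypes.data_as(POINTER(c_double))`.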
### Derived types
`ctypes` interfaces with C structs via Python classes that subclass `ctypes.Structure`.
In this example we create two two-dimensional points and pass them to the function `add_points` which sums the x and y components:
```python
from ctypes import CDLL, byref, c_double, Structure
class Point(Structure):
    _fields_ = [("x", c_double),
                ("y", c_double)]
lib = CDLL("mod_point.so")
lib.add_points.restype = Point
a = Point(c_double(1.1), c_double(1.5))
b = Point(c_double(2.4), c_double(5.3))
c = lib.add_points(byref(a), byref(b))
print(c.x, c.y)
# 3.5 6.8
```
Notice that we needed to specify `Point` as the return type of the function with the `.restype` attribute. Here is the Fortran module:
```fortran
module mod_point
  use, intrinsic :: iso_c_binding, only: c_double
  implicit none
  type, bind(c) :: Point
    real(c_double) :: x
    real(c_double) :: y
  end type Point
contains
  type(Point) function add_points(a, b) bind(c)
    type(Point), intent(in) :: a, b
    add_points % x = a % x + b % x
    add_points % y = a % y + b % y
    print *, add_points % x
  end function add_points
end module mod_point
```
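To see how the Python class maps onto the C struct, the short sketch below inspects a `Point` without loading any library. It only uses `ctypes` itself; the values are illustrative.

```python
from ctypes import Structure, c_double, pointer, sizeof


class Point(Structure):
    _fields_ = [("x", c_double),
                ("y", c_double)]


# Plain Python floats are converted to c_double by the constructor.
a = Point(1.1, 1.5)

# Two 8-byte doubles, matching the layout of the Fortran bind(c) type.
size = sizeof(Point)

# A pointer refers to the same memory, so writes through it change `a`.
p = pointer(a)
p.contents.x = 2.0
print(size, a.x, a.y)
```

This is the same memory that a library function sees when the struct is passed with `byref`.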
## cffi
[cffi](https://cffi.readthedocs.io/en/latest/index.html) is a C Foreign Function Interface for Python.
It is more flexible than `ctypes` when writing more complex interfaces. Here is an example using the `mod_sum.so` shared library given above:
```python
>>> import cffi
>>> import numpy as np
>>> from numpy.ctypeslib import as_array
>>> ffi = cffi.FFI()
>>> lib = ffi.dlopen("mod_sum.so")
>>> ffi.cdef("void sum_columns(double *a, int n_r_a, int n_c_a, double *b);")
>>> a = np.array([[1, 2, 3], [1, 2, 3]], dtype=np.float64)
>>> sum_col = np.empty(a.shape[1], dtype=np.float64)
>>> c_a = ffi.cast("double *", a.ctypes.data)
>>> c_n_r_a = ffi.cast("int", a.shape[0])
>>> c_n_c_a = ffi.cast("int", a.shape[1])
>>> c_sum_col = ffi.cast("double *", sum_col.ctypes.data)
>>> lib.sum_columns(c_a, c_n_r_a, c_n_c_a, c_sum_col)
>>> print(sum_col)
[2. 4. 6.]
```
First, an `FFI` object is created to handle all the interactions with the library. The library is opened with `FFI.dlopen`. One key difference between `ctypes` and `cffi` concerns the need to include C declarations using `cdef`. This method takes a C code string as it would be given in a C header file. In fact, if you already have a header file with the following contents,
```c
void sum_columns(double *a, int n_r_a, int n_c_a, double *b);
```
it can be conveniently fed directly into `cdef`:
```python
with open("mod_sum.h") as f:
    ffi.cdef(f.read())
```
Another difference is that `FFI.cast` is used to cast Python types as C types.
The first argument to `cast` is a C code string describing the type.
Pointers are described by using the `<type> *` syntax, and there is therefore no need for a separate `byref` function as in `ctypes`.
The `pycparser` module used to read the header file has some limitations: headers using `#define` or `#ifdef` preprocessor logic cannot be parsed directly and have to be preprocessed manually first.
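For very simple headers, where the only directives are include guards, the manual preprocessing can be as small as dropping the directive lines. The helper below, `strip_directives`, is a hypothetical illustration and not part of `cffi`; it assumes there are no multi-line macros and no conditional branches whose contents must be excluded.

```python
def strip_directives(header: str) -> str:
    """Drop every line that starts with '#'. Hypothetical helper for illustration."""
    kept = [
        line for line in header.splitlines()
        if not line.lstrip().startswith("#")
    ]
    return "\n".join(kept)


# Illustrative header with an include guard around one declaration.
header = """\
#ifndef MOD_SUM_H
#define MOD_SUM_H
void sum_columns(double *a, int n_r_a, int n_c_a, double *b);
#endif
"""

declarations = strip_directives(header)
print(declarations)
```

The resulting string contains only plain declarations and could then be passed to `cdef`. Headers with real macro logic need a proper C preprocessor instead.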
### Working with NumPy arrays
A NumPy array can be converted to a C array in two different ways:
```python
c_array = ffi.cast("double *", a.ctypes.data)
c_array = ffi.from_buffer("double *", a)
```
Converting a C array to a NumPy array is a bit more involved than with `ctypes` and is done in two steps.
```python
>>> n_elements = a.size
>>> size_element = ffi.sizeof("double")
>>> size = n_elements * size_element
>>> buffer = ffi.buffer(c_a, size=size)
>>> array = np.frombuffer(buffer, dtype=np.float64)
>>> array = array.reshape(a.shape)
>>> array
array([[1., 2., 3.],
[1., 2., 3.]])
```
First you create a buffer with `FFI.buffer` and then you read that buffer with `numpy.frombuffer`.
Note that you need knowledge of the array shape to (1) calculate the size of the buffer, and (2) reshape the final NumPy array.
This procedure is a bit cumbersome to write every time it is needed and is better wrapped in a function.
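One possible shape for such a function is sketched below. The name `as_ndarray` and its parameters are assumptions made for this illustration; the C array is allocated with `ffi.new` so that the sketch runs without a shared library.

```python
import cffi
import numpy as np

ffi = cffi.FFI()


def as_ndarray(c_ptr, shape, dtype=np.float64):
    """View the memory behind a cffi pointer as a NumPy array of the given shape."""
    dtype = np.dtype(dtype)
    # Size of the buffer in bytes: number of elements times element size.
    n_bytes = int(np.prod(shape)) * dtype.itemsize
    buffer = ffi.buffer(c_ptr, n_bytes)
    return np.frombuffer(buffer, dtype=dtype).reshape(shape)


# Stand-in for data owned by a library.
c_data = ffi.new("double[]", [1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
array = as_ndarray(c_data, (2, 3))
print(array)
```

The caller still has to supply the shape, and the C data must stay alive for as long as the NumPy array is in use.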
### Derived types
A derived type is declared with `cdef` in the usual C way.
Here is an example for the `Point` struct above:
```python
>>> import cffi
>>>
>>> ffi = cffi.FFI()
>>> lib = ffi.dlopen("mod_point.so")
>>> ffi.cdef("""\
... typedef struct Point{
...     double x;
...     double y;
... } Point;\
... """)
>>> ffi.cdef("Point add_points(Point *a, Point *b);")
>>> c_point_a = ffi.new("Point *", [ffi.cast("double", 1.1), ffi.cast("double", 1.5)])
>>> c_point_b = ffi.new("Point *", [ffi.cast("double", 2.4), ffi.cast("double", 5.3)])
>>> c_point_c = lib.add_points(c_point_a, c_point_b)
>>> c_point_c
<cdata 'Point' owning 16 bytes>
>>> c_point_c.x
3.5
>>> c_point_c.y
6.8
```
In this example we are able to work directly with the resulting C Point object, but for more complicated use cases we should construct a wrapper class in Python:
```python
class Point:
    def __init__(self, x=None, y=None):
        if x is not None and y is not None:
            self._c_point = ffi.new("Point *", [ffi.cast("double", x), ffi.cast("double", y)])

    @classmethod
    def from_c_point(cls, c_point):
        point = cls(float(c_point.x), float(c_point.y))
        point._c_point = c_point
        return point

    @property
    def x(self):
        return float(self._c_point.x)

    @property
    def y(self):
        return float(self._c_point.y)
We can now wrap our C Point as a Python Point:
```python
>>> point_c = Point.from_c_point(c_point_c)
>>> point_c.x
3.5
>>> point_c.y
6.8
```
Derived types that cannot be made interoperable with the `bind(C)` attribute can still be made accessible as opaque data pointers in Python.
```fortran=
module mod_alloc
  use, intrinsic :: iso_c_binding
  implicit none
  type :: container
    real(c_double), allocatable :: val(:)
  end type container
contains
  function new_container(n, val) result(vptr) bind(C)
    integer(c_int), value, intent(in) :: n
    real(c_double), intent(in) :: val(n)
    type(c_ptr) :: vptr
    type(container), pointer :: cont
    allocate(cont)
    cont%val = val(:n)
    vptr = c_loc(cont)
  end function new_container
  subroutine delete_container(vptr) bind(C)
    type(c_ptr), value, intent(in) :: vptr
    type(container), pointer :: cont
    call c_f_pointer(vptr, cont)
    deallocate(cont)
  end subroutine delete_container
  function get_sum(vptr) result(sum_val) bind(C)
    type(c_ptr), value, intent(in) :: vptr
    real(c_double) :: sum_val
    type(container), pointer :: cont
    call c_f_pointer(vptr, cont)
    sum_val = sum(cont%val)
  end function get_sum
end module mod_alloc
```
We can use this module from Python as follows:
```python=
>>> import cffi
>>> ffi = cffi.FFI()
>>> lib = ffi.dlopen("mod_alloc.so")
>>> ffi.cdef("""\
... typedef struct _container* container;
... extern container new_container(int, double*);
... extern void delete_container(container);
... extern double get_sum(container);""")
>>> val = [3.0, 4.0, 5.0]
>>> cont = lib.new_container(len(val), val)
>>> lib.get_sum(cont)
12.0
>>> lib.delete_container(cont)
```
The `typedef` declares an opaque pointer to the Fortran data.
While we can no longer interact directly with the content of the container in Python, this approach allows us to export almost any data type available in Fortran.
This can become especially useful to make class polymorphic objects with a well-defined API available in Python.
Since we are using `pointer` attributes on the library side, we have to explicitly free the memory after we are done with the data.
To work with the container in a more Pythonic way, we can wrap it in a class that takes care of the details of the memory management:
```python=
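from typing import List


class Container:
    def __init__(self, val: List[float]):
        self._val = val
        self._cont = ffi.NULL

    def __enter__(self):
        # Allocate the Fortran container when entering the with block
        self._cont = lib.new_container(len(self._val), self._val)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Free the Fortran memory when leaving the with block
        lib.delete_container(self._cont)
        self._cont = ffi.NULL

    def sum(self):
        return lib.get_sum(self._cont)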
This class allows us to use our object in a `with` context:
```python
>>> with Container([3.0, 4.0, 5.0]) as cont:
...     cont.sum()
12.0
```
### Garbage collection
Resources allocated in the library have to be freed explicitly in the library as well.
The `cffi` module provides a garbage collection mechanism to automatically associate a destructor with an object.
```python
cont = ffi.gc(lib.new_container(len(val), val), lib.delete_container)
```
We can now simply use the object like any other Python object and rely on the garbage collector to free the memory allocation.
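The effect of `ffi.gc` can be observed without the Fortran library. In the sketch below, `fake_delete` is a hypothetical Python stand-in for `lib.delete_container`, and the memory is allocated with `ffi.new` instead of `lib.new_container`.

```python
import cffi

ffi = cffi.FFI()
freed = []


def fake_delete(ptr):
    """Stand-in for lib.delete_container; only records that it was called."""
    freed.append("deleted")


# Stand-in for the pointer a library function would hand back.
raw = ffi.new("double[]", [3.0, 4.0, 5.0])  # owns the memory
ptr = ffi.cast("double *", raw)

# Attach the destructor; the returned object points to the same data.
cont = ffi.gc(ptr, fake_delete)
total = sum(cont[i] for i in range(3))

# Dropping the last reference runs the destructor. On CPython this happens
# immediately; other interpreters may run it at a later point.
del cont
print(total, freed)
```

Since the exact moment of destruction depends on the interpreter, code that needs deterministic cleanup is usually better served by a context manager.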
### Different cffi modes
Here we have been using `cffi` in the *ABI* mode by accessing the library at the binary level.
In the *API* mode, we would instead have compiled C code to handle the access for us.
We're also using the *in-line* mode, where everything is set up every time the Python code is imported, rather than the *out-of-line* mode, where a separate module is set up once and then can be imported.
These differences are important for optimizing performance when building and packaging applications. More about that in [...].
### Combining with setuptools
The `cffi` out-of-line API builder can be readily combined with setuptools.
The ffibuilder is defined in a separate `build.py` script and can be added with
```python=
from setuptools import setup
setup(
    cffi_modules=["build.py:ffibuilder"],
)
```
The `build.py` script is used to define out-of-line API mode for `cffi`.
A simple FFI builder is given here
```python=
"""FFI builder module for usage from setup.py."""
import cffi
ffibuilder = cffi.FFI()
ffibuilder.set_source(
    "mylib._mylib",
    '#include "mylib.h"',
    libraries=["mylib"],
)
with open("mylib.h") as fd:
    ffibuilder.cdef(fd.read())
if __name__ == "__main__":
    ffibuilder.distutils_extension(".")
```
Running the script outside of `setup.py` creates the C source code of the extension module, which in principle allows you to compile it yourself.
However, it is easier to let setuptools take care of compiling and linking against your Python installation.
#### Finding a library with pkg-config
pkg-config is a frequently used format to describe how to build against an existing project.
The pc-file format is supported in Python by the `pkgconfig` module, which allows us to easily import any library.
The `pkgconfig` package will be a `setup_requires` dependency in our `setup.py` or `pyproject.toml`.
A typical pc-file looks like this:
```pc=
prefix=/usr
libdir=${prefix}/lib
includedir=${prefix}/include
Name: mylib
Description: My fancy library
Version: 1.0.0
Libs: -L${libdir} -lmylib
Cflags: -I${includedir}
```
The pc-file contains the information required to compile and link our library from any other project.
This information can be readily used in our FFI builder.
```python=
"""
FFI builder module with automatic library detection via pkgconfig.
"""
import os
import cffi
import subprocess
import pkgconfig
if not pkgconfig.exists("mylib"):
    raise Exception("Unable to find pkg-config package 'mylib'")
kwargs = pkgconfig.parse("mylib")
cc = os.environ.get("CC", "cc")
cflags = pkgconfig.cflags("mylib").split()
module_name = "mylib._mylib"
# Run the C preprocessor on the header so that cdef only sees plain declarations
p = subprocess.Popen(
    [cc, *cflags, "-E", "-"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = p.communicate(b'#include "mylib.h"')
cdefs = out.decode()
ffibuilder = cffi.FFI()
ffibuilder.set_source(module_name, '#include "mylib.h"', **kwargs)
ffibuilder.cdef(cdefs)
if __name__ == "__main__":
    ffibuilder.distutils_extension(".")
```
We also added a step here to preprocess the header file in case it contains `#define` or `#include` preprocessor directives, which cannot be handled by the `pycparser` module.
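To make concrete what information the pc-file carries, the sketch below expands the variables of the example file by hand and sorts the flags into build options. `parse_pc` is a hypothetical helper written for this illustration; in practice this work is done by pkg-config and the `pkgconfig` module, and the helper ignores many features of the real format.

```python
import re


def parse_pc(text):
    """Expand ${var} references and split Libs/Cflags into build options."""
    variables, fields = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Substitute previously defined variables such as ${prefix}.
        line = re.sub(r"\$\{(\w+)\}", lambda m: variables[m.group(1)], line)
        if re.match(r"^\w+=", line):
            key, value = line.split("=", 1)
            variables[key] = value
        elif ":" in line:
            key, value = line.split(":", 1)
            fields[key] = value.split()
    libs, cflags = fields.get("Libs", []), fields.get("Cflags", [])
    return {
        "include_dirs": [f[2:] for f in cflags if f.startswith("-I")],
        "library_dirs": [f[2:] for f in libs if f.startswith("-L")],
        "libraries": [f[2:] for f in libs if f.startswith("-l")],
    }


pc_text = """\
prefix=/usr
libdir=${prefix}/lib
includedir=${prefix}/include
Name: mylib
Description: My fancy library
Version: 1.0.0
Libs: -L${libdir} -lmylib
Cflags: -I${includedir}
"""

options = parse_pc(pc_text)
print(options)
```

The keys used here are the usual setuptools extension options, which is the kind of keyword argument that `set_source` forwards to the build.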
## Cython
[Cython](https://cython.org) is a Python extension that can compile Python code as faster C code, but it also has capabilities for loading shared libraries.
Getting a working Cython interface isn't as easy as with `ctypes` or `cffi`, but it can be used to build Python modules automatically with setuptools as we will investigate in the [Building with setuptools](/eR2qt_8BQE6f86yWorCjgQ) chapter.
To use Cython to load the shared library, we first need three files:
1. A `.pxd` Cython declaration file
2. A `.pyx` Cython source file
3. A `.h` C declaration file
Importantly, the `.pxd` and `.pyx` files need to have different names and be in the same directory.
The `.pxd` file resembles the C header file and contains declarations for the code that you want to access.
> :warning: The `.pxd` and `.pyx` files should have different names
```python
# file: c_mod_sum.pxd
cdef extern from "mod_sum.h":
    void sum_columns(double *a, int n_r_a, int n_c_a, double *sum_col)
```
```c
// file: mod_sum.h
void sum_columns(double *a, int n_r_a, int n_c_a, double *b);
```
Then we write the `.pyx` Cython source file that exposes a Python function:
```python
# file: mod_sum.pyx
cimport c_mod_sum
import numpy as np
def sum_columns(double [:, :] a):
    sum_col = np.empty(a.shape[1], dtype=np.float64)
    cdef double [:] c_sum_col = sum_col
    cdef int c_n_a_r = a.shape[0]
    cdef int c_n_a_c = a.shape[1]
    c_mod_sum.sum_columns(&a[0, 0], c_n_a_r, c_n_a_c, &c_sum_col[0])
    return sum_col
```
The syntax of Cython is very close to Python. `cimport` is used to import the `.pxd` file and `cdef` is used to define C variables. The `&` operator is used to pass variables by reference, and will be discussed more below.
### Building the Cython module
The next step is to build a Python module, and for that we will create a `setup.py` file:
```python
# file: setup.py
from setuptools import setup, Extension
from Cython.Build import cythonize
setup(
    name='Test Cython',
    ext_modules=cythonize([
        Extension("py_mod_sum", ["mod_sum.pyx"],
                  libraries=["mod_sum"],
                  library_dirs=["."]),
    ]),
    zip_safe=False)
```
This tells setuptools to create a C extension module with the name `py_mod_sum` from the Cython source file `mod_sum.pyx`.
The `mod_sum` library is used and setuptools will search for it in the regular paths as well as the directories in `library_dirs`.
Here we direct setuptools to look for the library in the current directory (the same directory where `setup.py` is located).
We now need to make sure that our shared library file is called `libmod_sum.so` as the install process will automatically prepend "lib" to the name when searching. We then build the module with:
```shell
python setup.py build_ext --inplace
```
That generates a file in the current directory with the name `py_mod_sum.cpython-39-darwin.so` or similar, depending on operating system and version of Python.
We can now import this file as a module directly into Python and use it:
```python
>>> from py_mod_sum import sum_columns
>>> import numpy as np
>>> a = np.array([[1, 2, 3], [1, 2, 3]], dtype=np.float64)
>>> sum_columns(a)
array([2., 4., 6.])
```
> :warning: Copying the shared library to a new filename might not work as expected. For example, on macOS, the `install_name` property will still correspond to the original name and needs to be [changed](https://matthew-brett.github.io/docosx/mac_runtime_link.html) with the `install_name_tool` command line tool. To be sure that everything will work as intended, build the shared library with the correct filename from the start.
### Working with NumPy arrays
The recommended way of working with NumPy arrays in Cython is through *memory views*. A memory view is created via the following syntax:
```python
# Create a memory view of a 1D array
cdef double [:] view_1d = np.array([1.0, 2.0, 3.0], dtype=np.float64)
# Create a memory view of a 2D array
cdef double [:, :] view_2d = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]], dtype=np.float64)
```
As usual, arrays should be passed to C functions by reference.
In Cython this is achieved by applying the `&` operator to the first element of the array:
```python
c_mod_sum.sum_columns(&a[0, 0], c_n_a_r, c_n_a_c, &c_sum_col[0])
```
There are many more details on working with NumPy arrays in the Cython [documentation](https://cython.readthedocs.io/en/latest/src/userguide/numpy_tutorial.html).
### Derived types
Starting from our point example, we write the `.pxd` file:
```python
# file: c_mod_point.pxd
cdef extern from "mod_point.h":
    ctypedef struct Point:
        double x
        double y
    Point add_points(Point *a, Point *b)
```
which closely mirrors our C header file:
```c
// file: mod_point.h
typedef struct Point{
    double x;
    double y;
} Point;
Point add_points(Point *a, Point *b);
```
To work with the C point struct in Python, we construct a wrapper class and a Python function to work with this class:
```python
cimport c_mod_point
cdef class Point:
    cdef c_mod_point.Point _point

    def __cinit__(self, double x, double y):
        self._point.x = x
        self._point.y = y

    @property
    def x(self):
        return self._point.x

    @property
    def y(self):
        return self._point.y

def add_points(Point a, Point b):
    c_point = c_mod_point.add_points(&a._point, &b._point)
    point = Point(c_point.x, c_point.y)
    return point
```
Cython classes are called *extension types* and are defined with the `cdef class` syntax.
We declare that this class should hold an attribute `_point` which holds the C point struct, and the `__cinit__` constructor method is used to initialize this object.
We also define two properties which allow us to access the attributes of the C struct. Finally, the `add_points` function is a Python wrapper for our C function.
As the C function returns a C struct, we need to convert that to the Python class before returning to the user.
As before, we need to build with a `setup.py`:
```python=
from setuptools import setup, Extension
from Cython.Build import cythonize
setup(
    name='Test Cython',
    ext_modules=cythonize([
        Extension("py_mod_point", ["mod_point.pyx"],
                  libraries=["mod_point"],
                  library_dirs=["."]),
    ]),
    zip_safe=False,
)
```
After building with `python setup.py build_ext --inplace`, we can now import the class and function and work with them in Python.
```python
>>> from py_mod_point import add_points, Point
>>> p_1 = Point(1.1, 1.5)
>>> p_2 = Point(2.5, 5.3)
>>> p_3 = add_points(p_1, p_2)
>>> p_3.x
3.6
>>> p_3.y
6.8
```
## Wrapping the Python C interface
It's often better to hide away the technicalities of the Python-Fortran interface behind regular Python functions and classes.
This is actually what we did with Cython above.
The end user then does not have to worry about converting data types, memory management etc. Here is an example with `ctypes`:
```python
from ctypes import CDLL, byref, c_int
import numpy as np
from numpy.ctypeslib import as_ctypes

lib = CDLL("mod_sum.so")


def sum_columns(a):
    # Make sure the data is C-contiguous and of the type the library expects
    a = np.ascontiguousarray(a, dtype=np.float64)
    sum_col = np.empty(a.shape[1], dtype=np.float64)
    c_a = as_ctypes(a)
    c_n_r_a = c_int(a.shape[0])
    c_n_c_a = c_int(a.shape[1])
    c_sum_col = as_ctypes(sum_col)
    lib.sum_columns(byref(c_a), c_n_r_a, c_n_c_a, byref(c_sum_col))
    return sum_col
```
This function (1) converts the input into a C-contiguous NumPy array, (2) creates an empty array to hold the result of the calculation, (3) converts the NumPy arrays to C arrays, and (4) runs the subroutine, which fills the result array that is then returned. The end user is completely oblivious that any Fortran code has been run behind the scenes.
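The first step matters because a NumPy array is not always stored contiguously in memory. The sketch below, with illustrative values, shows a transposed view that is not C-contiguous and the copy that `np.ascontiguousarray` produces from it:

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)

# A transposed view shares memory with `a` but is no longer C-contiguous.
view = a.T

# ascontiguousarray returns a C-contiguous array, copying only when needed.
contiguous = np.ascontiguousarray(view)

print(view.flags["C_CONTIGUOUS"], contiguous.flags["C_CONTIGUOUS"])
```

Doing this conversion inside the wrapper means the caller can pass slices or transposed arrays without knowing about the memory layout the library expects.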