# Dispatching and backend selection
**https://hackmd.io/@seberg/spatch**
## Dispatching/backend selection from NetworkX to skimage
### Use cases
### Problem space
For requirements see also: https://github.com/scientific-python/spatch/issues/1
1. Type dispatching -> Backend selection -> automatic conversion
    * Type dispatching means: cupy in -> cupy out
    * "backend selection":
        * `backends=[cupy, numpy]`, `with backend("cupy"): ...`, ...
        * User enables (chooses) a backend that takes over (numpy -> CUDA/MPI/... -> numpy out)
2. Other dimensions:
    * Runtime/online documentation for/of backends
    * Backend specific arguments?
        * NetworkX documents backend additional kwargs
        * e.g. NetworkX `cugraph_kwargs={...}` to use for backend and ignore otherwise
        * Or just pass in extra keyword, `chunks=chunks`
    * "Lazy imports":
        * We don't want to import cupy, dask, ... if the user only uses a few small NumPy arrays.
    * Easy to inspect which backends are being used
3. API: How do we want to spell things
    * Ignore the "sausage" of doing the dispatching
    * E.g. `@dispatchable` in the library (skimage)
    * `@implements` in the provider (cucim)
    * IMO, this one can be a bit specialized, i.e. the "official" API provided doesn't have to be as lean.
4. Testing and endorsement of plugins.
5. Inspection!
    * Python logging (or so)
    * Know which backend was used or not used.
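
The "lazy imports" and "easy to inspect" points could be met with a small wrapper that defers the import until a backend is actually used. The following is a minimal sketch, not spatch's API: `LazyBackend` and its attributes are hypothetical names, and the standard-library `json` module stands in for a heavy backend such as cupy.

```python
import importlib


class LazyBackend:
    """Hypothetical backend handle that imports its module on first use."""

    def __init__(self, name, module_name):
        self.name = name
        self.module_name = module_name
        self._module = None

    @property
    def loaded(self):
        # Lets users inspect whether the backend was ever imported.
        return self._module is not None

    def load(self):
        # Import only when the backend is really needed.
        if self._module is None:
            self._module = importlib.import_module(self.module_name)
        return self._module


# Stand-in: the stdlib ``json`` module plays the role of a heavy backend.
json_backend = LazyBackend("json_backend", "json")
```

Creating `json_backend` imports nothing; only `json_backend.load()` triggers the import, and `json_backend.loaded` reports whether that has happened.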
#### Solutions
```
def find_algo(graph0, graph1):
    return graph0, graph1

@dispatchable(0, "")
def algo(graph0, graph1):
    return
```
* Type dispatching:
    * NetworkX has `graph.__networkx_backend__`, NumPy has `__array_function__` on the types
        * Adding a dunder to `np.ndarray` is tricky
    * Could also just loop through backends until one matches.
    * My (seberg's) opinion:
        * Use types only initially (this can be cached for very quick dispatching, e.g. if there are 5 backends but you use NumPy arrays)
        * Could be a `can_run/matches` function
        * But... maybe start with `cupy:ndarray` and `numpy:ndarray` exact type matching?
          (Ensures minimal dispatching and is easily extended in the future)
    * Tim: Potentially have such a function for each dispatched function.
        * seberg: Or is this a "should run" (i.e. a second call later)
            * return NotImplemented, "This is too small!"
* Backend selection:
    * When a backend is chosen and can work on the input types, it should.
    * Subtleties/problems?
        * What if a backend works with cupy, but the input is numpy?
            * `numpy -> numpy` or `numpy -> cupy`? Both can make sense, although `numpy -> cupy` might break things if done globally.
            * To resolve this, the backend may need to know if the user *forced* a specific backend.
    * This is very important, but I think we can start with type dispatching only.
* NetworkX can_run/should_run:
    * Can inform why a backend is not chosen.
    * Maybe logging should be available for information like "I had to copy that to the GPU more than once!".
* Use import hooks with a very lightweight description of the backend:
    * No import of `cucim` in the `skimage-cucim-backend`.
    * Mapping from `skimage.segmentation:watershed` to `cucim.segmentation:watershed` (maybe you don't even need the second name, because you can guess it).
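
The type-matching and "should run" ideas in this list can be combined into one loop. The sketch below is only an illustration under assumptions: `Backend`, `dispatch`, and `type_id` are hypothetical names, types are matched by exact `module:qualname` strings so no backend needs importing, and `fractions.Fraction` stands in for an array type such as `cupy:ndarray`. Declining follows the `return NotImplemented, "This is too small!"` idea from the notes.

```python
import logging
from fractions import Fraction

logger = logging.getLogger("dispatch_sketch")


def type_id(obj):
    """Return a ``module:qualname`` string for the exact type of ``obj``."""
    tp = type(obj)
    return f"{tp.__module__}:{tp.__qualname__}"


class Backend:
    """Hypothetical backend description: name, accepted types, callables."""

    def __init__(self, name, types, impl, should_run):
        self.name = name
        self.types = frozenset(types)
        self.impl = impl
        self.should_run = should_run


def dispatch(backends, default_impl, *args):
    arg_types = {type_id(a) for a in args}
    for backend in backends:
        # Step 1: exact type matching on strings only.
        if not arg_types <= backend.types:
            continue
        # Step 2: a second, possibly more expensive "should run" call.
        verdict = backend.should_run(*args)
        if verdict is not True:
            _, reason = verdict
            logger.info("backend %r declined: %s", backend.name, reason)
            continue
        logger.info("using backend %r", backend.name)
        return backend.name, backend.impl(*args)
    return "default", default_impl(*args)


def fraction_should_run(*args):
    if len(args) < 3:
        return NotImplemented, "This is too small!"
    return True


fraction_backend = Backend(
    "fraction_backend",
    {"fractions:Fraction"},
    lambda *a: sum(a),
    fraction_should_run,
)
```

Here `dispatch` returns the backend name alongside the result purely to make the choice visible; a real design would more likely expose that through logging or a separate inspection API.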
### User selecting backend:
```
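# Context manager that enables a backend
with backend("cucim"):
    func()

# Type dispatching on the input
func(cupy.asarray(x))
# Explicitly pick the implementation for a type
func.invoke(cupy.ndarray)(...)
# Per-call backend keyword
func(..., backend="cucim")

# Inspect which backends were used
with skimage.log_backend_use() as info:
    ...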
```
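
A `with backend(...)` context manager like the one above could be built on `contextvars`, so that the selection is scoped to the current thread or async task rather than set globally. This is only a sketch under assumptions: `active_backends` and the tuple-of-names representation are hypothetical, and nothing here reflects an existing implementation.

```python
import contextlib
import contextvars

# Hypothetical state: backend names in priority order, empty by default.
_prioritized = contextvars.ContextVar("prioritized_backends", default=())


@contextlib.contextmanager
def backend(names):
    """Prioritize one backend name or a list of names inside the block."""
    if isinstance(names, str):
        names = [names]
    # Inner blocks take priority over outer ones.
    token = _prioritized.set(tuple(names) + _prioritized.get())
    try:
        yield
    finally:
        # Restore the previous selection even if the block raised.
        _prioritized.reset(token)


def active_backends():
    """Return the currently prioritized backend names."""
    return _prioritized.get()
```

A dispatcher would consult `active_backends()` at call time; outside any `with` block it returns an empty tuple, which would mean plain type dispatching.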
More complex case from sklearn:
```python=
pipeline = make_pipeline(
    Transformer1(),  # can be optimized by backend1
    Transformer2(),  # can be optimized by backend1 and backend2
    Classifier(),
)
with backend(["backend2", "backend1"]):
    pipeline.fit(X_train, y_train).score(X_test, y_test)

# sklearn use-case:
class Transform:
    def fit(self):
        ...

    def transform(self):
        ...

def func_validator(arr1, param):
    if param not in [None, "default"]:
        raise ValueError(...)
    return arr1

@dispatchable(0)
def func(arr1, param=None):
    ...
```
```
create_empty_image(like=...)
```
* Validation can be very nice.
* Breaks down if the backend can do more than the original implementation.
* You could "inherit" the default validation function, but the backend could override it.
* If you want to check shapes, etc. the backend must be the one that validates.
* Make sure tracebacks are not too deep!
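
The "inherit the default validation, but let the backend override it" idea might look roughly like this. It is a sketch with hypothetical names (`BackendImplementation`, `default_validator`, `extended_validator`); plain lists stand in for arrays, and `"fast"` is an invented option that only the backend supports.

```python
def default_validator(arr1, param=None):
    """Validation shared with the original library (hypothetical)."""
    if param not in (None, "default"):
        raise ValueError(f"unsupported param: {param!r}")
    return arr1


class BackendImplementation:
    """Pairs an implementation with the validator that guards it."""

    def __init__(self, impl, validator=default_validator):
        self.impl = impl
        # Backends inherit the default validator unless they pass their own.
        self.validator = validator

    def __call__(self, arr1, param=None):
        # Validate before calling the implementation so that errors are
        # raised before any backend code runs.
        self.validator(arr1, param)
        return self.impl(arr1, param)


def extended_validator(arr1, param=None):
    # This backend accepts one more option than the original library.
    if param == "fast":
        return arr1
    return default_validator(arr1, param)


def _double(arr1, param=None):
    return [x * 2 for x in arr1]


strict = BackendImplementation(_double)
extended = BackendImplementation(_double, validator=extended_validator)
```

With this split, `strict([1, 2], param="fast")` raises `ValueError`, while `extended` accepts the same call because its validator was overridden.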
### Input validation?
* Should the original library validate inputs?
* NetworkX just defers to the backend.
### What to do in spatch
* Start with type dispatching
* `func(backend="...")`, `func(dispatching_info)`
* Backend could do the "should_run"/"wants_run" in the backend without ever importing the library.
One way:
```
@implements("skimage.colors:rgb2")
def my_impl(*args, **kwargs):
    ...

def get_implementation(name):
    ...
```
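
For illustration, `@implements` and `get_implementation` could be backed by a plain dictionary keyed on the `module:function` string. This is a minimal sketch, assuming a registry design that the notes do not specify; `_REGISTRY`, `my_blur`, and the name `"mylib.filters:blur"` are hypothetical.

```python
# Hypothetical registry mapping "module:function" names to implementations.
_REGISTRY = {}


def implements(name):
    """Register the decorated function as the implementation of ``name``."""
    def decorator(func):
        _REGISTRY[name] = func
        return func
    return decorator


def get_implementation(name):
    """Return the registered implementation, or None if there is none."""
    return _REGISTRY.get(name)


@implements("mylib.filters:blur")  # hypothetical "module:function" name
def my_blur(values):
    # Toy implementation: replace every value with the mean.
    mean = sum(values) / len(values)
    return [mean] * len(values)
```

Because lookups go through the string name, the library never has to import the provider module until `get_implementation` finds a match.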
```
def test_mytest(conv):
    res = skimage_func(conv.to(numpy_arr))
    # `from` is a Python keyword, so the method needs another name, e.g. `from_`
    res = conv.from_(res)
```
* Sklearn requires providing a to/from numpy function.
Or the plugin just mirrors the namespace exactly.
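
A to/from conversion pair for backend-agnostic tests could be as small as the sketch below. All names are hypothetical (`Converter`, `double_all`, `check_double_all`); lists and tuples stand in for NumPy arrays and a backend's array type.

```python
class Converter:
    """Hypothetical pair of conversion functions supplied by a backend."""

    def __init__(self, to, from_):
        self.to = to        # reference type -> backend type
        self.from_ = from_  # backend type -> reference type


def double_all(values):
    # Toy library function: the output type follows the input type.
    return type(values)(v * 2 for v in values)


def check_double_all(conv):
    # The same test body runs for every backend that provides a Converter.
    reference = [1, 2, 3]
    res = double_all(conv.to(reference))
    assert conv.from_(res) == [2, 4, 6]


# Stand-in backends: tuples as the "foreign" type, lists as the reference.
tuple_conv = Converter(to=tuple, from_=list)
identity_conv = Converter(to=list, from_=list)
```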
---
## Array API
* What topics should we talk about?
* array API "the missing bits"
* functions that exist in NumPy/PyTorch/etc but aren't in the standard
* hurdles encountered during adoption
* CI
* Extensions - fft, linalg, (special?)
* Limitations - compiled code
* Overlap with delegation/dispatching
* Why not just dispatching? Big consumers & popular array libraries vs smaller libraries and consumers
---
## Narwhals/pandas dispatching