# Dispatching and backend selection

**https://hackmd.io/@seberg/spatch**

## Dispatching/backend selection from NetworkX to skimage

### Use cases

### Problem space

For requirements see also: https://github.com/scientific-python/spatch/issues/1

1. Type dispatching -> Backend selection -> automatic conversion
    * Type dispatching means: cupy in -> cupy out
    * "Backend selection":
        * `backends=[cupy, numpy]`, `with backend("cupy"): ...`, ...
        * User enables (chooses) a backend that takes over (numpy in -> CUDA/MPI/... -> numpy out)
2. Other dimensions:
    * Runtime/online documentation for/of backends
    * Backend-specific arguments?
        * NetworkX documents additional backend kwargs
        * e.g. NetworkX `cugraph_kwargs={...}`, used by that backend and ignored otherwise
        * Or just pass in an extra keyword, `chunks=chunks`
    * "Lazy imports":
        * We don't want to import cupy, dask, ... if the user only uses a few small NumPy arrays.
    * Easy to inspect which backends are being used
3. API: How do we want to spell things
    * Ignore the "sausage" of doing the dispatching
    * E.g. `@dispatchable` in the library (skimage)
    * `@implements` in the provider (cucim)
        * IMO, this one can be a bit specialized, i.e. the "official" API provided doesn't have to be as lean.
4. Testing and endorsement of plugins.
5. Inspection!
    * Python logging (or so)
    * Know which backend was used or not used.

#### Solutions

```
# Sketch only: `dispatchable` is the proposed decorator, not an existing API.
def find_algo(graph0, graph1):
    return graph0, graph1

@dispatchable(0, "")
def algo(graph0, graph1):
    return
```

* Type dispatching:
    * NetworkX has `graph.__networkx_backend__`, NumPy has `__array_function__` on the types
        * Adding a dunder to `np.ndarray` is tricky
        * Could also just loop through backends until one matches.
    * My (seberg's) opinion:
        * Use types only initially (can cache it for very quick dispatching, e.g. if there are 5 backends but you use numpy arrays)
        * Could be a `can_run/matches` function
        * But... maybe start with `cupy:ndarray` and `numpy:ndarray` exact type matching? (Ensures minimal dispatching, easily extended in the future)
        * Tim: Have a function potentially for each function.
        * seberg: Or is this a "should run" (i.e. a second call later)
            * return NotImplemented, "This is too small!"
* Backend selection:
    * When a backend is chosen and can work on the types, it should.
    * Subtleties/problems?
        * What if a backend works with cupy, but the input is numpy?
            * `numpy -> numpy` or `numpy -> cupy`? Both can make sense, although `numpy -> cupy` might break things if done globally.
            * To resolve this, the backend may need to know if the user *forced* a specific backend.
    * This is very important, but I think we can start with type dispatching only.
* NetworkX can_run/should_run:
    * Can inform why a backend is not chosen.
    * Maybe logging should be available for information like "I had to copy that to the GPU more than once!".
* Use import hooks with a very lightweight description of the backend:
    * No import of `cucim` in the `skimage-cucim-backend`.
    * Mapping from `skimage.segmentation:watershed` to `cucim.segmentation:watershed` (maybe you don't even need the second name, because you can guess it).

### User selecting backend:

```
# Alternative spellings under discussion:
with backend("cucim"):
    func()

func(cupy.asarray(x))
func.invoke(cupy.ndarray)(...)
func(..., backend="cucim")

with skimage.log_backend_use() as info:
    ...
```

More complex case from sklearn:

```python
pipeline = make_pipeline(
    Transformer1(),  # can be optimized by backend1
    Transformer2(),  # can be optimized by backend1 and backend2
    Classifier(),
)
with backend(["backend2", "backend1"]):
    pipeline.fit(X_train, y_train).score(X_test, y_test)

# sklearn use-case:
class Transform:
    def fit(self): ...
    def transform(self): ...

def func_validator(arr1, param):
    if param not in [None, "default"]:
        raise ValueError(...)
    return arr1

@dispatchable(0)
def func(arr1, param=None):
    ...
```

```
create_empty_image(like=...)
```

* Validation can be very nice.
    * Breaks down if the backend can do more than the original implementation.
    * You could "inherit" the default validation function, but the backend could override it.
    * If you want to check shapes, etc., the backend must be the one that validates.
* Make sure tracebacks are not too deep!

### Input validation?

* Should the original library validate inputs?
* NetworkX just defers to the backend.

### What to do in spatch

* Start with type dispatching
* `func(backend="...")`, `func(dispatching_info)`
* The backend could implement "should_run"/"wants_run" without ever importing the library.

One way:

```
@implements("skimage.colors:rgb2")
def my_impl(*args, **kwargs):
    ...

def get_implementation(name):
    ...
```

```
def test_mytest(conv):
    res = skimage_func(conv.to(numpy_arr))
    # `from` is a reserved word in Python, so the method is spelled `from_` here
    res = conv.from_(res)
```

* Sklearn requires providing a to/from numpy function. Or the plugin just mirrors the namespace exactly.

---

## Array API

* What topics should we talk about?
    * array API "the missing bits"
        * functions that exist in NumPy/PyTorch/etc. but aren't in the standard
    * hurdles encountered during adoption
        * CI
    * Extensions - fft, linalg, (special?)
    * Limitations - compiled code
    * Overlap with delegation/dispatching
        * Why not just dispatching? Big consumers & popular array libraries vs smaller libraries and consumers

---

## Narwhals/pandas dispatching