Design space of dispatching and NetworkX lessons
================================================
> Opinionated take-aways and summary by [name=seberg]
Types of dispatching
--------------------
As a pet peeve of mine (seberg), I would like to point out again that we have two main categories of dispatching (also used as nomenclature below):
1. **Type dispatching**:
* A system inspects the inputs and uses a different implementation for different types.
* Examples:
* `np.mean(cupy_array) -> cupy_array`.
* `dask_array + numpy_array -> dask_array`. (Python binary ops)
* Generally the term "Multiple dispatch".
* **NetworkX** uses type dispatching (via its own mechanism).
2. **Backend selection**:
* An *alternative* implementation is provided, which could differ only in the algorithm used (i.e. a different computational backend with comparable results).
* Is *not* selected based only on the types (types could play a role); rather it is (maybe?) user selected or enabled.
* Examples (possible, add NetworkX one):
* `process_image(numpy_array, backend="gpu") -> numpy_array`
* `with config(backend="gpu"): process_image(image)`
* ...
Backend selection can encompass type dispatching (if it isn't purely a computational backend, e.g. an alternative algorithm implemented by someone else).
However, I am mentioning it since type dispatching has (to me) clearer concepts.
Things to consider (learn from NetworkX) beyond dispatching
----------------------------------------
I think there are a few take-aways that NetworkX does very well:
* NetworkX modifies the docstrings based on the installed backends (a rough sketch of this idea follows after this list).
* NetworkX has some additional mechanisms, which may not be relevant for everyone:
* A conversion cache, since conversion can be slow (in the case of NetworkX, attached to the consumed objects).
* ...
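The docstring modification could look roughly like the following sketch. This is *not* the actual NetworkX mechanism; the `_installed_backends()` helper and the decorator are made up for illustration:

```python
# Hypothetical sketch: append installed-backend information to a function's
# docstring.  `_installed_backends()` stands in for real entry-point discovery.
def _installed_backends():
    return {"parallel": "parallel graph algorithms", "gpu": "GPU implementations"}

def document_backends(func):
    """Decorator appending a 'Backends' section to ``func.__doc__``."""
    extra = "\n\nBackends\n--------\n" + "\n".join(
        f"{name} : {desc}" for name, desc in _installed_backends().items()
    )
    func.__doc__ = (func.__doc__ or "") + extra
    return func

@document_backends
def shortest_path(graph, source, target):
    """Compute a shortest path between two nodes."""
    ...
```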
**Implementation takeaway: Entrypoints**
If listing possible backends in the docs is desired, the use of entry points is definitely required.
NetworkX has two entry points, one for the backend and one for additional information (optional?).
These must *not* do any expensive imports but only define the features provided by the backend.
Even if we don't do anything with the docs until import, it seems good to have an entry point to be able to discover what is available (see the sketch below)?
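A minimal discovery sketch using `importlib.metadata` (Python 3.10+); the group name `"yourlib.backends"` is a placeholder, not an existing entry-point group:

```python
# Discover backends registered under a (hypothetical) entry-point group
# without importing them eagerly; the expensive import happens in .load().
from importlib.metadata import entry_points

def available_backends():
    """Map backend name -> entry point (import deferred until .load())."""
    return {ep.name: ep for ep in entry_points(group="yourlib.backends")}

def load_backend(name):
    # Only now is the backend package actually imported.
    return available_backends()[name].load()
```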
How to do the exact dispatching
-------------------------------
**Type dispatching:**
1. **NetworkX** attaches `__networkx_backend__ = backend_name` to the object, i.e. it uses an abstract type defined via the dunder. (Done on the object, not the type, although it shouldn't matter much.)
* If the backend is missing, an error is raised.
* It currently does not try to promote between multiple backends. (An error is raised, even if one backend signals it cannot handle it.)
2. You can use "proper" multiple dispatching via `isinstance/issubclass` checks:
* If you do not wish to import things, there may be ways to do this:
* via `__module__ + __qualname__` (could even walk the mro, but I doubt subclassing is relevant)
* Could register at import (e.g. `ABC.register()`), but that requires the object provider to have some code.
* Allow users to provide an ABC. This can do both of the above, but also e.g. `hasattr(obj, "__array_namespace__")`.
3. The dunder way of Python operators or `__array_function__`:
* Ask each type whether it wants to handle the call and let the first one that does handle it. (You might sort subclasses first, which only matters if you are not invariant during dispatch. `__array_function__` isn't, but it is unclear that this is helpful.)
4. The uarray way:
* Namespaces of possible implementations which are queried for whether they want to handle a function living in their subnamespace.
5. The "linear" way: Simply loop backends (in some order) and use the first which matches.
Another point is whether or not other inputs are taken into account. Many dispatchers (e.g. also NetworkX) do this via a `match` or `can_handle` function, as in the sketch below.
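A minimal sketch combining the dunder-attribute approach (1) with a `can_handle` check and the "linear" loop (5). All names (`__mylib_backend__`, `can_handle`, `register_backend`) are illustrative, not an existing API:

```python
# Backend registry: backend name -> module/object implementing the functions.
_backends = {}

def register_backend(name, impl):
    _backends[name] = impl

def dispatch(func_name, obj, *args, **kwargs):
    # (1) Dunder-based: the input object names its backend directly.
    backend_name = getattr(obj, "__mylib_backend__", None)
    if backend_name is not None:
        impl = _backends[backend_name]
        return getattr(impl, func_name)(obj, *args, **kwargs)
    # (5) "Linear": try backends in registration order, use the first that
    # matches (backends may expose a `can_handle`/`match` predicate).
    for impl in _backends.values():
        can_handle = getattr(impl, "can_handle", None)
        if can_handle is None or can_handle(func_name, obj, *args, **kwargs):
            return getattr(impl, func_name)(obj, *args, **kwargs)
    raise TypeError(f"no backend found for {func_name!r}")
```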
**Backend Selection and prioritization:**
Possible ways to do backend selection:
* `backend="backend"` during function call.
* `dispatchable.invoke(type or backend)(...)` (some multiple dispatchers do this, IIRC).
* `backend[API]()` or `backend.API()` may be another way. Or `dispatchable.plugin()` (one example https://github.com/metagraph-dev/metagraph)
An additional option is enabling a backend/prioritizing it through [context managers](https://github.com/networkx/networkx/pull/7485), which is also something uarray does.
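A sketch of explicit selection via a `backend=` keyword and an `.invoke()`-style accessor; the `dispatchable` class and the `_backends` registry here are illustrative only:

```python
_backends = {}  # backend name -> namespace providing the implementations

class dispatchable:
    """Wrap a reference implementation and allow explicit backend selection."""

    def __init__(self, default_impl):
        self._default = default_impl
        self.__name__ = default_impl.__name__

    def __call__(self, *args, backend=None, **kwargs):
        if backend is None:
            return self._default(*args, **kwargs)
        return self.invoke(backend)(*args, **kwargs)

    def invoke(self, backend):
        # `func.invoke("gpu")(...)` style explicit selection.
        return getattr(_backends[backend], self.__name__)

@dispatchable
def process_image(image):
    return image  # reference/default implementation

# process_image(img)                  -> default implementation
# process_image(img, backend="gpu")   -> backend chosen per call
# process_image.invoke("gpu")(img)    -> explicit invoke style
```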
Automatic conversion?
---------------------
A backend system could provide automatic conversion:
* Convert input arguments to the required input type
* Automatically convert output arguments back again.
The first point is mostly interesting if you allow selecting a backend explicitly. If the user very explicitly selects a CuPy backend, then the input could be converted.
* This could be part of the normal conversion, e.g. `cp.asarray()` already allows NumPy arrays.
* Could be a hook function to provide a specific converter (seberg: I don't have a good reason for this right now)
*NetworkX* (not sure!) doesn't care about converting back again. In general, it assumes all return types are interchangeable from a user perspective (they quack like a dict of dicts).
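A sketch of what conversion around a backend call could look like; the hook names `convert_from`/`convert_to` are hypothetical, not an existing API:

```python
def call_with_conversion(impl, func_name, *args, convert_back=True, **kwargs):
    # Convert the inputs into whatever the backend natively consumes
    # (analogous to `cp.asarray()` accepting NumPy arrays).
    converted = [impl.convert_from(a) for a in args]
    result = getattr(impl, func_name)(*converted, **kwargs)
    if convert_back:
        # Optionally convert the result back to the caller's type.
        result = impl.convert_to(result)
    return result
```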
**Fallback implementation:**
Allowing conversion of a container to be signaled would make it possible to fall back to another implementation.
In practice, even without this, implementing `__array__()` or quacking like a dict-of-dicts may work for many such fallbacks.
Should do/Can do?
-----------------
* *NetworkX*:
* Backend can be queried for support
* Backend can choose not to do something (bad parameters, problem too small)
* `__array_function__`: Backend would have to implement the fallback itself. No way to defer to "next" backend.
* `uarray`: Can return `NotImplemented` to not match.
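A sketch of the `NotImplemented`-style deferral (similar in spirit to what uarray allows); `dispatch_with_deferral` is made up for illustration:

```python
def dispatch_with_deferral(backends, func_name, *args, **kwargs):
    for impl in backends:
        result = getattr(impl, func_name)(*args, **kwargs)
        if result is not NotImplemented:
            return result
        # The backend declined (bad parameters, problem too small, ...);
        # fall through and try the next registered backend.
    raise TypeError(f"all backends deferred on {func_name!r}")
```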
Additional notes on backend priority?
-------------------------------------
When it comes to prioritizing backends, the `backend=` approach is very explicit, but other approaches could be used, e.g. context variables.
Even if backends are only registered, when they do not match very strictly (type invariant), multiple backends could match and the order in which they are tried would matter.
There are various approaches that could be used to deal with this:
* Require users to specify a list of all active `backends=[...]`, which are tried in order.
* Prioritize dispatching for type dispatching:
* Requires a type hierarchy. E.g. if a backend works for all `Array API` arrays, `issubclass(np.ndarray, ArrayAPICapable)` is true and a multiple-dispatcher can prioritize.
* A warning/error could be given when an order cannot be established (seberg: not sure this is necessarily clear at registration time).
* The use of a `with` statement could prioritize any backend before all (reorder).
* This is also the choice of `uarray`, with issues. (It reordered within namespace scopes, so that you have to know at which namespace scope level a backend was registered to know whether it would be prioritized over another.)
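A sketch of `with`-based prioritization using a context variable to hold the preferred ordering; all names are illustrative:

```python
import contextlib
import contextvars

_priority = contextvars.ContextVar("backend_priority", default=())

@contextlib.contextmanager
def prefer_backend(name):
    """Put `name` first in the ordering that the dispatcher tries."""
    token = _priority.set((name,) + _priority.get())
    try:
        yield
    finally:
        _priority.reset(token)

def ordered_backends(registered):
    """Registered backend names, reordered so preferred ones come first."""
    preferred = [n for n in _priority.get() if n in registered]
    return preferred + [n for n in registered if n not in preferred]

# with prefer_backend("gpu"):
#     result = some_dispatchable(x)   # "gpu" is tried before other backends
```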
API variations
--------------
What type of API strictness do you want to enforce, or do you even explicitly allow passing extra kwargs that are backend specific?
NetworkX allows additional kwargs (or maybe even different ones?). This is not common in scikit-image/cuCIM for example (but unsupported kwargs exist).
NetworkX has its own way to pass these to a function.
Sometimes cuCIM uses a different accuracy compared to scikit-image.
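One (hypothetical) way to allow backend-specific options without widening the public signature is a dedicated `backend_kwargs` argument; this is only an illustration of the question, not how NetworkX or scikit-image handle it:

```python
def call_backend(impl, func_name, *args, backend_kwargs=None, **kwargs):
    # Common kwargs are part of the public signature; anything only a
    # particular backend understands travels in `backend_kwargs`.
    backend_kwargs = backend_kwargs or {}
    return getattr(impl, func_name)(*args, **kwargs, **backend_kwargs)
```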
Introspection capabilities
--------------------------
What introspection should exist? A `.plan()` that tells you everything about what _would_ happen? Just being able to `.invoke()` to do the dispatching, or a fetch?
If parameters are important