yt uses unyt to track and convert units, so if we are using dask for IO and want to return delayed arrays, we need some level of dask-unyt support.
In the notebook "working with unyt and dask", I demonstrate an initial prototype of a dask-unyt array. In this notebook, I create a custom dask collection by subclassing the primary dask.array class and adding some unyt functionality in hidden sidecar attributes. This custom class is handled automatically by the dask scheduler, so that if we have a large dask array with a dask client running and we create our new dask-unyt array, e.g.:
import unyt
import dask.array as da
from dask.distributed import Client
# unyt_from_dask is the prototype helper defined in the notebook
client = Client(threads_per_worker=2, n_workers=2)
x = da.random.random((10000, 10000), chunks=(1000, 1000))
x_da = unyt_from_dask(x, unyt.m)
then when we do operations like finding the minimum value across all the chunks:
x_da.min().compute()
we get back a standard unyt_array
unyt_array(3.03822589e-09, 'm')
that was calculated by processing each chunk of the array separately.
This implementation handles unyt functionality as hidden sidecar attributes. These attributes track changes to units separately from the Dask graph and apply final conversion factors only after calls to .compute().
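To make the sidecar idea concrete, here is a minimal, dependency-free sketch. It is not the notebook's implementation: plain Python lists stand in for dask chunks, a small hypothetical table (UNIT_IN_CM) stands in for unyt's unit registry, and the class names LazyUnitArray and LazyUnitResult are invented for illustration. It also uses composition rather than subclassing dask.array. The point it tries to show is only the bookkeeping: unit changes touch the sidecar attributes, and the conversion factor is applied once, after the chunk-wise reduction has run.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical unit table (size of each unit in centimetres); it stands in
# for unyt's unit registry and is not part of any library.
UNIT_IN_CM = {"cm": 1.0, "m": 100.0, "km": 100000.0}


@dataclass(frozen=True)
class LazyUnitArray:
    """Toy stand-in for a dask-unyt array (all names are illustrative)."""

    chunks: List[List[float]]  # stands in for the chunks of a dask array
    units: str                 # sidecar: the units the result should carry
    factor: float = 1.0        # sidecar: conversion factor not yet applied

    def to(self, new_units: str) -> "LazyUnitArray":
        # A unit conversion only updates the sidecar; the chunks are untouched.
        scale = UNIT_IN_CM[self.units] / UNIT_IN_CM[new_units]
        return LazyUnitArray(self.chunks, new_units, self.factor * scale)

    def min(self) -> "LazyUnitResult":
        # Nothing is evaluated here; the reduction is only recorded.
        return LazyUnitResult(self.chunks, min, self.units, self.factor)


@dataclass(frozen=True)
class LazyUnitResult:
    chunks: List[List[float]]
    reducer: Callable[[List[float]], float]
    units: str
    factor: float

    def compute(self) -> Tuple[float, str]:
        # Reduce each chunk on its own, then combine the partial results.
        partials = [self.reducer(chunk) for chunk in self.chunks]
        value = self.reducer(partials)
        # Only now is the pending factor applied (valid for min because the
        # factor is positive, so scaling and min commute).
        return value * self.factor, self.units


x = LazyUnitArray(chunks=[[3.0, 1.5], [2.0, 4.0]], units="m")
result = x.to("cm").min().compute()
print(result)  # (150.0, 'cm')
```

In this toy version, converting to centimetres never visits the data; the factor of 100 simply rides along until compute() is called. A real implementation would additionally need to handle operations where scaling does not commute with the reduction, and units with offsets.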
This notebook demonstrates a general and fairly straightforward way to build Dask support into unyt, which can be used in conjunction with, for example, the prototype dask-enabled particle reader to return arrays with both dask and unyt functionality preserved.