
return to main post

3. dask-unyt arrays

yt uses unyt to track and convert units, so if we use dask for IO and want to return delayed arrays, we need some level of dask-unyt support.

In the notebook "working with unyt and dask", I demonstrate an initial prototype of a dask-unyt array. There, I create a custom dask collection by subclassing the primary dask.array class and adding some unyt functionality in hidden sidecar attributes. The dask scheduler handles this custom class automatically, so if we have a large dask array with a dask client running and we create our new dask-unyt array, e.g.:

import dask.array as da
import unyt
from dask.distributed import Client

client = Client(threads_per_worker=2, n_workers=2)

x = da.random.random((10000, 10000), chunks=(1000, 1000))
# unyt_from_dask comes from the prototype notebook
x_da = unyt_from_dask(x, unyt.m)

then when we do operations like finding the minimum value across all the chunks:

x_da.min().compute()

we get back a standard unyt_array:

unyt_array(3.03822589e-09, 'm')

that was calculated by processing each chunk of the array separately.

This implementation handles unyt functionality as hidden sidecar attributes. These attributes track changes to units separately from the Dask graph and apply final conversion factors only after calls to .compute().
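
To make the sidecar idea concrete, here is a minimal toy sketch in plain Python. It is not the notebook's implementation and uses neither dask nor unyt: the class `LazyUnitArray`, the `TO_METRES` table, and the list-of-lists "chunks" are all hypothetical stand-ins. It only illustrates the pattern described above, where unit changes update sidecar attributes and the accumulated conversion factor is applied once, after the chunk-wise computation.

```python
# Hypothetical conversion table (to metres); a stand-in for unyt's unit handling.
TO_METRES = {"m": 1.0, "cm": 0.01, "km": 1000.0}


class LazyUnitArray:
    """Toy stand-in for a dask-unyt array (illustrative only)."""

    def __init__(self, chunks, units, factor=1.0, reduction=None):
        self._chunks = chunks        # list of lists, playing the role of dask chunks
        self._units = units          # sidecar: current unit name
        self._factor = factor        # sidecar: pending conversion factor
        self._reduction = reduction  # deferred reduction, applied in compute()

    def to(self, new_units):
        # Only the sidecar attributes change; no chunk data is touched.
        factor = self._factor * TO_METRES[self._units] / TO_METRES[new_units]
        return LazyUnitArray(self._chunks, new_units, factor, self._reduction)

    def min(self):
        # Record the reduction; nothing is evaluated yet.
        return LazyUnitArray(self._chunks, self._units, self._factor, min)

    def compute(self):
        if self._reduction is None:
            values = [[v * self._factor for v in chunk] for chunk in self._chunks]
        else:
            # Reduce each chunk separately, then combine the partial results.
            partials = [self._reduction(chunk) for chunk in self._chunks]
            # The factors in TO_METRES are positive, so scaling after min is valid.
            values = self._reduction(partials) * self._factor
        return values, self._units


x = LazyUnitArray([[3.0, 1.5], [2.0, 4.0]], "m")
print(x.to("cm").min().compute())
```

In this sketch the call to `.to("cm")` costs nothing per element: the factor is stored and multiplied in only once the reduced value is available, which mirrors applying the final conversion after `.compute()`.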

This notebook demonstrates a general and fairly straightforward way to build Dask support into unyt, which can be used in conjunction with, for example, the prototype dask-enabled particle reader to return arrays with both dask and unyt functionality preserved.
