data-exp-lab (@data-exp-lab)

Public team

Joined on Sep 4, 2020

  • A sketch of what we might want to do in yt for dask, async, etc. Currently, we have index objects on every dataset. I think the obvious next steps for getting us to work with dask are:
    1. Identify the operations that matter most, which I would put as: derived fields (including spatial derived fields), generation of input to visualization routines, and high-level operations like max, mean, etc. These include dimensional reductions (projections).
    2. Give our dataset object, which currently has a 1:1 mapping to index objects, the option to hold multiple index objects. The first step would be to simply turn the index into a list and iterate over it.
    3. Modify our chunking system to operate asynchronously.
    4. Eliminate the ability to access an array implicitly via getitem on a dataset, and make this an explicit operation.

    With step 2, a single dataset could have multiple particle indices associated with it -- for instance, co-registered halos and particles, or co-registered fluid/geographic datasets from different sources. If we assume that each field type is associated with exactly one index, this simplifies the process: we could have 1:N mappings from an index to field types, but 1:1 from each field type back to its index object.