# Volume rendering intro for geodata in yt

- this presentation: https://bit.ly/ytgeostuff
- yt's overview on volume rendering: link
- Chris's 2020 AGU poster: citeable archive, direct repo link
- Other code/repos:
  - yt: https://yt-project.org/
  - ytgeotools: https://github.com/chrishavlin/ytgeotools
  - yt_idv: https://github.com/yt-project/yt_idv
9/27/2022

## Table of Contents
- Overview
- Experiments in daskifying yt
- Development plan
- Some Final Notes

## yt and Dask: an overview

In the past months, I've been investigating and working on integrating Dask into the yt codebase. This document provides an overview of my efforts to date, but it is also meant as a preliminary YTEP (or pYTEP?) to solicit feedback from the yt community at an early stage, before getting too far into the weeds of refactoring.
1/15/2021 (return to main post)

## 1. (particle) data IO

Full notebook available here. yt reads particle and grid-based data by iterating across chunks, with frontend-specific IO functions. For gridded data, each frontend implements a `_read_fluid_selection` (e.g., `yt.frontend.amrvac.AMRVACIOHandler._read_fluid_selection`) that iterates over the chunks and returns a flat dictionary with numpy arrays concatenated across the chunks. For particle data, frontends must implement a similar function, `_read_particle_fields`, which is typically invoked within the `BaseIOHandler._read_particle_selection` function. In both cases, the read functions accept the chunks iterator, the fields to read, and a selector object:

```python
def _read_particle_fields(self, chunks, ptf, selector): ...
def _read_fluid_selection(self, chunks, selector, fields, size): ...
```
## 2. profile calculation: operations on chunked data

The prototype dask-Gadget reader above returns the flat numpy arrays expected by `_read_particle_selection()`, but if we could instead return dask objects, we could easily build up parallel workflows. In the notebook here, I explored how to create a daskified profile calculation under the assumption that yt's IO returns dask arrays. The notebook demonstrates two approaches. In the first, I use intrinsic dask array functions to recreate a yt profile, using dask's implementation of np.digitize and then summing over the digitized bins. The performance isn't great, though, and it's not obvious how the approach could be extended to reproduce all of the functionality of ds.create_profile() in yt (multiple binning variables, returning multiple statistics at once, etc.). In the second approach, I instead work out how to apply the existing yt profile functions directly to the dask arrays. When you have a delayed dask array, you can loop over its chunks, apply a function to each chunk, and then reduce the result across the chunks. In pseudo-code, creating a profile looks like:
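A runnable sketch of that chunk-wise map/reduce pattern, assuming `dask.delayed` and hypothetical helper names (`profile_chunk`, `delayed_profile`); this is illustrative, not the notebook's actual code:

```python
import numpy as np
import dask
from dask import delayed

@delayed
def profile_chunk(chunk_bin_field, chunk_field, bin_edges):
    # Per-chunk accumulation: bin this chunk's values and record
    # per-bin sums and counts (a stand-in for yt's profile
    # accumulation applied to a single chunk).
    inds = np.digitize(chunk_bin_field, bin_edges) - 1
    nbins = len(bin_edges) - 1
    sums = np.zeros(nbins)
    counts = np.zeros(nbins)
    for b in range(nbins):
        mask = inds == b
        sums[b] = chunk_field[mask].sum()
        counts[b] = mask.sum()
    return sums, counts

def delayed_profile(chunks, bin_edges):
    # Map the per-chunk accumulation over all chunks (computed in
    # parallel by dask), then reduce across chunks.
    results = [profile_chunk(bf, f, bin_edges) for bf, f in chunks]
    sums_counts = dask.compute(*results)
    total_sums = sum(s for s, _ in sums_counts)
    total_counts = sum(c for _, c in sums_counts)
    with np.errstate(invalid="ignore"):
        return total_sums / total_counts  # mean of `field` per bin
```

The key point is that only the cheap per-bin accumulators cross chunk boundaries, so the reduction stays small regardless of how many chunks the data is split into.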