Datashader Meetings

First Meeting - 01/11/22

Introduction & History

  • Original vision for Datashader as described by Peter:
    • Statistical computation that aggregates kernels, e.g. a kernel describing the uncertainty of each datapoint
  • Simpler vision that actually got implemented: "2D histograms"
  • Original version written in Java, then briefly implemented as "abstract rendering" in Bokeh in the early days of Anaconda/Continuum, then finally implemented from scratch in Python/Numba with some basic Bokeh support
  • Originally created using funding from government grants, then funded through various contracts with primarily government agencies
  • Currently no dedicated Datashader funding, though it's improved in small ways as part of many other funded projects, including some current funding for improving Datashader timeseries/line rendering support
  • Datashader development has increased capabilities along two axes: (1) which glyphs can be rendered (starting with points, soon followed by polylines, and later adding support for rasters, quad/tri-meshes, polygons, etc.), and (2) secondarily, which data structures allow efficient rendering (starting with pandas/dask, then pushing into GPU with cuDF and into ragged arrays with spatialpandas)
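
To make the "2D histograms" description above concrete, here is a minimal sketch using plain NumPy rather than Datashader itself. The function name `aggregate_points` and its parameters are illustrative assumptions, not Datashader's API; the sketch only shows the idea of counting points per pixel on a fixed-size canvas.

```python
import numpy as np

def aggregate_points(x, y, x_range, y_range, width, height):
    """Count points per pixel: a 2D histogram over a fixed-size canvas.

    Hypothetical helper for illustration; not part of Datashader.
    """
    counts, _, _ = np.histogram2d(
        y, x,                       # rows follow y, columns follow x (image layout)
        bins=(height, width),
        range=(y_range, x_range),   # points outside the ranges are dropped
    )
    return counts.astype(np.uint32)

# Synthetic data: 100,000 normally distributed points.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)

agg = aggregate_points(x, y, (-4, 4), (-4, 4), width=300, height=200)
print(agg.shape)
```

The output array size depends only on the canvas dimensions, not on the number of input points, which is why this approach scales to very large datasets.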

Makepath interest

  • Primarily interested in efficient rendering pipeline for both raster and vector data
  • Datashader addresses vector -> raster
  • Hoping to add support for a distributed Canvas, i.e. outputting rasters that are larger than memory (previously out of scope because Datashader assumes it is rendering to a display device with limited resolution)
  • Other interests: Anti-aliasing, ragged array representations (e.g. for contouring)
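
One possible way to think about a larger-than-memory canvas is to split the output extent into tiles and aggregate each tile independently, so that only one tile needs to be held in memory at a time. The sketch below is an assumption about how such a scheme could look, written with plain NumPy; `tile_ranges` and `render_tiles` are hypothetical names, and nothing here reflects an agreed design or an existing Datashader feature.

```python
import numpy as np

def tile_ranges(lo, hi, n_tiles):
    """Split the interval [lo, hi] into n_tiles equal sub-intervals."""
    edges = np.linspace(lo, hi, n_tiles + 1)
    return list(zip(edges[:-1], edges[1:]))

def render_tiles(x, y, x_range, y_range, tile_px, n_tiles_x, n_tiles_y):
    """Yield ((row, col), tile) pairs, one small aggregate at a time.

    Hypothetical sketch: each tile is a tile_px x tile_px count array that
    could be written to disk or sent to another worker before the next one
    is computed. Points lying exactly on an interior tile boundary would be
    counted in both neighbouring tiles; a real implementation would need to
    handle that case.
    """
    for row, (y0, y1) in enumerate(tile_ranges(*y_range, n_tiles_y)):
        for col, (x0, x1) in enumerate(tile_ranges(*x_range, n_tiles_x)):
            tile, _, _ = np.histogram2d(
                y, x,
                bins=(tile_px, tile_px),
                range=((y0, y1), (x0, x1)),
            )
            yield (row, col), tile

# Synthetic data in the unit square.
rng = np.random.default_rng(1)
x = rng.random(10_000)
y = rng.random(10_000)

tiles = dict(render_tiles(x, y, (0.0, 1.0), (0.0, 1.0),
                          tile_px=64, n_tiles_x=2, n_tiles_y=2))
total = sum(int(t.sum()) for t in tiles.values())
print(len(tiles), total)
```

Here the tiles are collected into a dictionary only to keep the example short; in a larger-than-memory setting each tile would be consumed and released as it is produced.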

Agenda

  • Ragged Arrays:
    • Geopandas vs Spatialpandas
    • Awkward Array vs Arrow
    • History: Geopandas has historically depended on the geo-stack, while spatialpandas depends only on arrow and numba. Geopandas is pushing toward a more efficient in-memory representation and toward eliminating problematic required dependencies.
    • Ownership of Spatialpandas?
  • Distributed Canvas:
    • Datashader was originally built with the opposite assumption: "data much larger than memory; canvas basically a screen buffer"
  • Project "Ownership"
    • Currently Jim is project owner
    • PR merging is blocked by Jim's limited bandwidth for reviewing complex PRs
    • Add secondary owner?
  • Active PRs:
    • Discrete color keys
    • Ian working on
  • Need high-level "developer docs" for the internal design of the rendering pipeline, covering multi-dispatch, numba code generation, and optimization passes
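
For the ragged-array agenda item above, it may help to recall the general layout that such representations share: one flat buffer of values plus an array of offsets marking where each variable-length element starts and ends. The sketch below illustrates that idea with plain NumPy; it is not the internal format of spatialpandas, Awkward Array, or Arrow, and the names `values`, `offsets`, and `get_line` are illustrative assumptions.

```python
import numpy as np

# Three polylines with different numbers of vertices (a "ragged" collection).
lines = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(2.0, 0.0), (3.0, 1.0), (4.0, 0.0)],
    [(5.0, 5.0), (6.0, 6.0), (7.0, 5.0), (8.0, 6.0)],
]

# Flat buffer holding every vertex back to back, shape (total_vertices, 2).
values = np.array([pt for line in lines for pt in line], dtype=np.float64)

# offsets[i]:offsets[i + 1] is the slice of `values` belonging to line i.
lengths = np.array([len(line) for line in lines])
offsets = np.concatenate(([0], np.cumsum(lengths)))

def get_line(i):
    """Return the vertices of line i as a view into the flat buffer."""
    return values[offsets[i]:offsets[i + 1]]

print(offsets.tolist())
print(get_line(1))
```

Because every line is a contiguous slice of one array, a compiled loop (for example one written with Numba) can walk all geometries without creating a Python object per line.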