Drilldown

2020-06-16 Drilldown

  • Bokeh is based on events, and if you wanted to link your various plots in Bokeh, you'd write callbacks that subscribe to Bokeh events. This is very general and can handle linked brushing, drilldown, and just about any other GUI interactions you might come up with. But it's difficult to reason about and not compositional; complex dashboards end up tightly bound together by their callback graph so that each new dashboard has to be built up from scratch.
  • HoloViews abstracts the Bokeh event system into streams, which can have a semantic content (being "about the data"). HoloViews streams are compositional and reactive, which makes it much simpler to connect them up as you build a complex system, each bit taking care of its own business. The streams are Python (Parameterized) objects that can be accessed and queried independently of the event/callback system, allowing you to develop and maintain separate parts of your application separately, and port bits of functionality from one dashboard to another. But it still requires reasoning about streams of data and how they are connected up, using a mental model that's far from where most data-science users are starting. Ideally, interactions between plots would be expressed as relationships between datasets and dimensions, not connecting streams of values from sources to sinks.
  • link_selections now lets a user set up various streams automatically, traversing a set of plots and finding all related dimensions and shared datasets, then creating streams that connect all of them to a shared selection object and also adding extra selected plots overlaid on the main plots. For the special case of a shared selection object, this approach allows users to achieve quite sophisticated ways of exploring multidimensional datasets too large to be conveyed in a single plot, without having to learn about events or even streams. However, at present only a single "cross-filter" configuration is supported: each plot visualizes all the data it covers, and then also has an overlaid selection of a subset of what is in the main plot.
  • Can we do the same for drilldown, where there is no underlying main plot, but which is in many ways otherwise similar to link_selections? In a drilldown, the user interacts in some way with one plot, selecting one (or sometimes multiple) item(s) that determine what is shown in another, separate plot. A canonical example is to select some aggregated or reduced value in a main plot (e.g. the current value of X for a county, a company, or some other entity); the drilldown plot then shows a non-aggregated or unreduced view (e.g. a timeseries of value X, a distribution of value X, or the same for some other value Y).
  • GUI tools do provide some support for drilldowns, but it is not clear how relevant they are to what we are trying to do here:
  • Examples of what we consider drilldown:

What is required to specify a drilldown?

  • Dataset
  • Main plot:
    • What is the reduction you performed to get the plot?
      • select (latest, min, max, etc.)
      • aggregate (average, mode, etc.)
      • sample
    • What is the index that you can select on this plot?
  • Derived/drilldown plot
    • Get selection index on main plot
    • What do you do with other dimensions, i.e. what reductions are applied?
    • Is the selection index dimension reduced or do you generate an overlay for multiple drilldown indices?
    • Where does the drilldown plot go? Tooltip (pops up) or different plot (already in layout)?
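
One way to make the checklist above concrete is to write it down as a small record type. This is only a sketch: `DrilldownSpec` and all of its field names are hypothetical and do not correspond to any existing HoloViews API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrilldownSpec:
    """Hypothetical record of the answers to the questions listed above."""

    main_reduction: dict     # dimension -> reduction used for the main plot
    index: str               # dimension that can be selected on the main plot
    derived_reduction: dict  # dimension -> reduction used for the drilldown plot
    overlay_index: bool = False  # overlay one item per selected index value?
    placement: str = "layout"    # "tooltip" (pops up) or "layout" (already placed)

    def __post_init__(self):
        if self.placement not in ("tooltip", "layout"):
            raise ValueError("placement must be 'tooltip' or 'layout'")
        # The selectable index has to survive the main-plot reduction.
        if self.index in self.main_reduction:
            raise ValueError("index dimension cannot be reduced on the main plot")


# A choropleth of the latest rate per county, with the drilldown in a tooltip.
covid_spec = DrilldownSpec(
    main_reduction={"Time": "latest"},
    index="County",
    derived_reduction={},
    placement="tooltip",
)
```

Writing the specification out this way mostly serves to show how many independent choices a drilldown involves, which is relevant when weighing an explicit API against reusing existing machinery.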

Examples

COVID Choropleth Map Example

  • Dataset [Counties, Time] (Geometry, Rate)
  • Main Plot:
    • Reduction: Select('Time')
    • Index: 'County'
  • Derived:
    • Takes County index and produces Curve [Time] (Rate)
    • Plot goes into tooltip
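
The two halves of this example can be sketched in plain Python, leaving out Geometry and any actual plotting. The records, the helper names `main_plot_values` and `drilldown_curve`, and the choice of "latest" as the Time selection are all assumptions made for illustration.

```python
# Toy stand-in for Dataset [County, Time] (Rate); the values are made up.
records = [
    {"County": "A", "Time": 1, "Rate": 0.10},
    {"County": "A", "Time": 2, "Rate": 0.15},
    {"County": "B", "Time": 1, "Rate": 0.30},
    {"County": "B", "Time": 2, "Rate": 0.25},
]


def main_plot_values(rows):
    """Main plot: select the latest Time, leaving one Rate per County."""
    latest = max(r["Time"] for r in rows)
    return {r["County"]: r["Rate"] for r in rows if r["Time"] == latest}


def drilldown_curve(rows, county):
    """Derived plot: the (Time, Rate) points of a Curve for one County."""
    return sorted((r["Time"], r["Rate"]) for r in rows if r["County"] == county)


choropleth = main_plot_values(records)  # what the map would colour by
curve = drilldown_curve(records, "A")   # what the tooltip would show for county A
```

The county passed to `drilldown_curve` plays the role of the selection index: it is the only piece of information that flows from the main plot to the derived plot.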

Example with extra dimensions

  • Dataset [Counties, Time, Age] (Rate)
  • Main plot:
    • Reduction: Select('Time'), Mean('Age')
    • Index: 'County Index' -> Transform -> 'County FIPS'
  • Derived:
    • Either aggregate by Sum('Age') or display curve for each Age
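
The two options for the extra Age dimension can be sketched as follows. The rows and the function names `aggregated_curve` and `curves_by_age` are hypothetical; only the two strategies themselves come from the example above.

```python
from collections import defaultdict

# Toy stand-in for Dataset [County, Time, Age] (Rate); the values are made up.
rows = [
    ("A", 1, "0-39", 1.0), ("A", 1, "40+", 3.0),
    ("A", 2, "0-39", 2.0), ("A", 2, "40+", 4.0),
    ("B", 1, "0-39", 5.0), ("B", 1, "40+", 7.0),
]


def aggregated_curve(rows, county):
    """Option 1: sum over Age, giving a single curve over Time."""
    totals = defaultdict(float)
    for row_county, time, _age, rate in rows:
        if row_county == county:
            totals[time] += rate
    return sorted(totals.items())


def curves_by_age(rows, county):
    """Option 2: keep Age, giving one curve over Time per Age group."""
    curves = defaultdict(list)
    for row_county, time, age, rate in rows:
        if row_county == county:
            curves[age].append((time, rate))
    return {age: sorted(points) for age, points in curves.items()}
```

Both functions take the same selection index (the county), so the choice between them is purely a property of the derived plot and not of the link to the main plot.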

What would be required to extend linked selections to support drilldown?

There is substantial overlap between drilldown and link_selections. The main points to address in extending linked selections to support drilldown are:

  • Support different selection tools, i.e. tap, hover instead of only lasso/box select
  • The real difference between drilldown and linked selection is what happens when nothing is selected: is the drilldown plot empty, does it display all the data, or is there some other initial default?
  • We could leverage the index_cols feature of link_selections, since that effectively produces an index equivalent to the selection index you would usually see in a drilldown plot.
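
The no-selection question can be made concrete with a small filter function. This is a sketch only: `resolve_selection` and its `default` argument are hypothetical names, not part of link_selections.

```python
def resolve_selection(rows, index, selected=None, default="all"):
    """Filter rows by the selected index values.

    When nothing is selected, `default` decides what the drilldown shows:
    "all" (every row), "none" (empty), or a list of initial index values.
    """
    if not selected:
        if default == "all":
            return list(rows)
        if default == "none":
            return []
        selected = default  # treat the default as an initial selection
    chosen = set(selected)
    return [r for r in rows if r[index] in chosen]


rows = [
    {"County": "A", "Rate": 0.1},
    {"County": "B", "Rate": 0.3},
    {"County": "C", "Rate": 0.2},
]

everything = resolve_selection(rows, "County")                      # cross-filter style
nothing = resolve_selection(rows, "County", default="none")         # empty until tapped
initial = resolve_selection(rows, "County", default=["B"])          # preset drilldown
tapped = resolve_selection(rows, "County", selected=["A"], default="none")
```

The "all" default matches the current cross-filter behaviour, while "none" or an initial selection is closer to what a drilldown plot usually needs.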

The benefit of using linked selection machinery for drilldown is that the information about reductions would all be encoded in the pipeline for the derived/drilldown plot.

The API also differs: since link_selections usually cross-filters across multiple plots and applies the selections bi- or multi-directionally, it may not be suitable for drilldown as it stands. In the drilldown case you usually want to explicitly designate the main plot, which represents some low-dimensional reduction of the full data, and the drilldown plot, which reveals a (usually unaggregated or less aggregated) view of a small subset of the data.

Summary

The two feasible options seem to be:

  1. Provide a new API that accepts either one or two datasets for the main and derived plots and specifies the reduction operations plus how to use the selection index to filter the drilldown plot. E.g. in the choropleth example the selection index would be the county you are hovering over.
    • Pros: Very explicit
    • Cons: Potentially limited, entirely different implementation from link_selections, unclear how to specify the element, styling options etc.
  2. Use the mechanisms we introduced for link_selections and potentially reuse large parts of its implementation, i.e. use the .dataset and .pipeline to replay the transforms applied to the Dataset.
    • Pros: Instead of providing a dataset and how to transform it into the derived plot given the selection index, you simply provide an element that specifies a pipeline that generates the drilldown plot. There is no need for callbacks or abstract specification of reductions, and styling is simply inherited.
    • Cons: In many cases the user's specification of a derived plot/element would blow up the browser, because it relies on the selection index actually being applied. E.g. in the timeseries-per-county example you would take Dataset ['Time', 'County'] ('Rate') and turn it into Curve ['Time'] ('Rate'), but without a drilldown selection on 'County' the result is a jumbled mess of every county's values. It's particularly problematic for gridded datasets, e.g. a gridded Dataset [lat, lon, time] (Temperature): there is no way to declare Image [lon, lat] (Temperature) without first selecting a specific time, but if time is the index you are selecting on, how would you specify this?

Basically, we need a way to create an Element as if a selection were already available, but with the selection being lazy and not actually used until it is connected up later to a main plot. Perhaps we can create some suitable selection object that can be provided to the Element with some user-determined initial state (all data, no data, or some initial selection), which is then overridden by link_selections once a selection is made?
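
A minimal sketch of that idea in plain Python, assuming a placeholder selection that the derived element holds on to and only evaluates when rendered. `LazySelection` and `LazyElement` are hypothetical names; nothing here is an existing HoloViews class, and the "pipeline" is just a function standing in for the recorded transforms.

```python
class LazySelection:
    """Hypothetical selection placeholder with a user-chosen initial state."""

    def __init__(self, dimension, initial=None):
        self.dimension = dimension
        self.values = initial  # None means "no constraint", [] means "no data"

    def update(self, values):
        """Would be called once the main plot reports a selection."""
        self.values = values

    def apply(self, rows):
        if self.values is None:
            return list(rows)
        return [r for r in rows if r[self.dimension] in self.values]


class LazyElement:
    """Declares data, selection, and pipeline up front; evaluates on render."""

    def __init__(self, rows, selection, pipeline):
        self.rows = rows
        self.selection = selection
        self.pipeline = pipeline

    def render(self):
        # The selection is applied first, then the declared pipeline is replayed.
        return self.pipeline(self.selection.apply(self.rows))


rows = [
    {"County": "A", "Time": 1, "Rate": 0.10},
    {"County": "A", "Time": 2, "Rate": 0.15},
    {"County": "B", "Time": 1, "Rate": 0.30},
]


def to_curve(selected_rows):
    """Stand-in pipeline: reduce the selected rows to (Time, Rate) points."""
    return sorted((r["Time"], r["Rate"]) for r in selected_rows)


selection = LazySelection("County", initial=["A"])  # user-determined initial state
element = LazyElement(rows, selection, to_curve)

before = element.render()  # uses the initial selection
selection.update(["B"])    # main plot makes a selection later
after = element.render()   # same element, new selection
```

The element is declared once, against a selection that does not exist yet, which is the property the gridded Image case above would need: the time slice is named up front but only resolved at render time.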