# napari: asynchronous slicing
Author: [Andy Sweet](mailto:andrewdsweet@gmail.com)
Author: Jun Xi Ni
Status: Draft
Type: Standards Track
Created: 2022-04-12
STALE: see [NAP-4 on async slicing](https://github.com/napari/napari/blob/main/docs/naps/4-async-slicing.md)
## Abstract
Slicing a layer refers to the act of generating a partial view of the layer's data. The main use of slicing in napari is to define the data that should be rendered in the canvas based on the dimension slider positions.
This project has two major aims.
1. Slice layers asynchronously.
2. Improve the architecture of slicing layers.
We considered addressing these two aims in two separate projects, especially as (2) is likely a prerequisite for many acceptable implementations of (1). However, we believe that pursuing a behavioral goal like (1) should help prevent over-engineering of (2), while simultaneously providing important and tangible benefits to napari's users.
The aims should ideally cover all napari Layer types, though initial progress may be scoped to image and points layers.
## Motivation and scope
### Asynchronous slicing
Currently, all slicing in napari is performed synchronously. For example, if a dimension slider is moved, napari blocks while slicing each layer before updating the canvas. When slicing layers is slow, this blocking behavior makes interacting with data difficult and napari may be reported as non-responsive.

Consider two reasons why slicing can be slow.
1. Some layer-specific slicing operations perform non-trivial calculations (e.g. points).
2. The layer data is read lazily (i.e. it is not in RAM) and latency from the source may be non-negligible (e.g. stored remotely, napari-omero plugin).
By slicing asynchronously, we can keep napari responsive while allowing for slow slicing operations. We could also consider optimizing napari to make (1) less of a problem, but that is outside the scope of this project.
### Slicing architecture
Currently, there are a number of existing problems with the technical design of slicing in napari.
- Layers have too much state [^issue-792] [^issue-1353] [^issue-1775].
- The logic is hard to understand and debug [^issue-2156].
- The names of the classes are not self-explanatory [^issue-1574].
Some of these problems and much of this complexity were caused by a previous effort around asynchronous slicing, which tried to keep that work isolated from the core code base. By contrast, our approach in this project is to redesign slicing in napari to provide a solid foundation for asynchronous slicing and related future work like multi-canvas and multi-scale slicing.
### Goals
To summarize the scope of this project, we define a few high level goals, each with many prioritized features where P0 is a must-have, P1 is a should-have, and P2 is a could-have. Some of these goals may already be achieved by napari in its current form, but are captured here to prevent any regression caused by this work.
#### 1. Remain responsive when slicing slow data
- P0. When moving a dimension slider, the slider remains responsive so that I can navigate to the desired location.
- Slider can be moved when data is in the middle of loading.
- The slider does not jump back to the position of the last loaded slice after it has been moved elsewhere.
- P0. When the slider is dragged, only slice at some positions so that I don’t wait for unwanted intermediate slicing positions.
- Once slider is moved, wait before performing slicing operation, and cancel any prior pending operations (i.e. be lazy).
- If we can reliably infer that slicing will be fast (e.g. data is a numpy array), consider skipping this delay.
- P0. When slicing fails, I am notified so that I can understand what went wrong.
- May want to limit the number of notifications (e.g. lost network connection for remote data).
- P1. When moving a dimension slider and the slice doesn’t immediately load, I am notified that it is being generated, so that I am aware that my action is being handled.
- Need a visual cue that a slice is loading.
- Show visual cue to identify the specific layer(s) that are loading in the case where one layer loads faster than another.
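The "be lazy" behavior above can be sketched with a small debouncer: wait briefly after each slider move and cancel any pending slice operation that has not started yet. This is only an illustrative sketch with hypothetical names (`DebouncedSlicer` is not part of napari's API), built on a plain `threading.Timer`:

```python
import threading


class DebouncedSlicer:
    """Illustrative sketch (not napari API): delay slicing after a slider
    move, cancelling any prior pending operation that has not started."""

    def __init__(self, slice_fn, delay=0.05):
        self._slice_fn = slice_fn
        self._delay = delay
        self._pending = None
        self._lock = threading.Lock()

    def request(self, position):
        with self._lock:
            # A newer request makes any pending one stale: cancel it.
            if self._pending is not None:
                self._pending.cancel()
            self._pending = threading.Timer(
                self._delay, self._slice_fn, args=(position,)
            )
            self._pending.start()
```

In practice the delay could be skipped when slicing is known to be fast (e.g. the data is a plain numpy array), as noted above.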
#### 2. Clean up slice state and logic in layers
- P0. Encapsulate the slice request and response state for each layer type, so that I can quickly and clearly identify those.
- Minimize number of (nested) classes per layer-type (e.g. `ImageSlice`, `ImageSliceData`, `ImageView`, `ImageLoader`).
- Capture the request state from the `Dims` object.
- Capture the response state that vispy needs to display the layer in its scene (e.g. array-like data, scene transform, style values).
- P0. Simplify the program flow of slicing, so that developing and debugging against it is faster.
- Reduce the complexity of the call stack associated with slicing a layer.
- The implementation details for some layer/data types might be complex (e.g. multi-scale image), but the high level logic should be simple.
- P1. Move the slice state off the layer, so that its attributes only represent the whole data.
- Layer may still have a function to get a slice.
- May need alternatives to access currently private state (e.g. 3D interactivity), though these don't necessarily need to live on the Layer. E.g. a plugin with an ND layer that gets interaction data from a 3D visualization needs some way to get that data back to ND.
- P2. Store multiple slices associated with each layer, so that I can cache previously generated slices.
- Pick a default cache size that should not strain most machines (e.g. 0-1GB).
- Make cache size a user defined preference.
#### 3. Measure slicing latencies on representative examples
- P0. Define representative examples that currently cause *desirable* behavior in napari, so that I can check that async slicing does not degrade those.
- E.g. 2D slice of a 3D image layer where all data fits in RAM, but not VRAM.
- P0. Define representative examples that currently cause *undesirable* behavior in napari, so that I can check that async slicing improves those.
- E.g. 2D slice of a 3D points layer where all data fits in RAM, but not VRAM.
- E.g. 2D slice of a 3D image layer where the data is not stored locally.
- P0. Define slicing benchmarks, so that I can understand if my changes impact overall timing or memory usage.
- E.g. Do not increase the latency of generating a single slice more than 10%.
- E.g. Decrease the latency of dealing with 25 slice requests over 1 second by 50%.
- P1. Log specific slicing latencies, so that I can summarize important measurements beyond granular profile timings.
- Latency logs are local only (i.e. not sent/stored remotely).
- Add an easy way for users to enable writing these latency measurements.
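The latency-logging goal above could be met with a small context manager around each slicing call. This is a minimal sketch under stated assumptions: the logger name and the `log_latency` helper are hypothetical, and the logger is local only:

```python
import logging
import time
from contextlib import contextmanager

# Local-only logger: records are not sent or stored remotely.
latency_logger = logging.getLogger("napari.slicing.latency")


@contextmanager
def log_latency(name):
    """Log the wall-clock duration of one slicing step (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        latency_logger.debug("%s took %.1f ms", name, elapsed_ms)
```

Wrapping slicing calls with `log_latency` would produce the per-slice measurements that the benchmarks can then summarize, and enabling the measurements is just a matter of configuring this logger.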
### Non-goals
To help clarify the scope, we also define some things that are not explicit goals of this project and give some insight into why they were rejected.
- Make a single slicing operation faster.
- The slicing code should mostly remain unchanged.
- Useful future work, that may be made easier by changes here.
- Scope creep: can be done independently on this work.
- Improve slicing functionality.
- For example, handling out-of-plane rotations in 3D+ images.
- The slicing code should mostly remain unchanged.
- Useful future work, that may be made easier by changes here.
- Scope creep: can be done independently on this work.
- Toggle the async setting on or off, so that I have control over the way my data loads.
- May complicate the program flow of slicing.
- When moving a dimension slider and the slice doesn’t immediately load, show a low level-of-detail version of it, so that I can preview what is upcoming.
- Requires a low level of detail version to exist.
- Scope creep: should be part of a to-be-defined multi-scale project.
- Store multiple slices associated with each layer, so that I can easily implement a multi-canvas mode for napari.
- Scope creep: should be part of a to-be-defined multi-canvas project.
- Solutions for goal (2) should not block this in the future.
- Open/save layers asynchronously.
- More related to plugin execution.
- Lazily load parts of data based on the canvas' current field of view.
- An optimization that is dependent on specific data formats (e.g. tiled image).
- Identify and assign dimensions to layers and transforms.
- Scope creep: should be part of a to-be-defined dimensions project.
- Solutions for goal (2) should not block this in the future.
- Thick slices of non-visualized dimensions.
- Scope creep: currently being prototyped [^pull-4334].
- Solutions for goal (2) should not block this in the future.
- Keep the experimental async fork working.
- Nice to have, but should not put too much effort into this.
- Avoid deleting some existing code, which may be moved into vispy (e.g. `VispyTiledImageLayer`).
## Related work
As this project focuses on re-designing slicing in napari, this section contains information on how slicing in napari currently works.
### Existing slice logic
The following diagram shows the call sequence generated by moving the position of a dimension slider in napari.

Moving the slider generates mouse events that the Qt main event loop handles, which eventually emits napari's `Dims.events.current_step` event, which in turn triggers the refresh of each layer. A refresh first updates the layer's slice state using `Layer.set_view_slice`, then emits the `Layer.events.set_data` event, which finally passes on the layer's new slice state to the vispy scene node using `VispyBaseLayer._on_data_change`.
All these calls occur on the main thread and thus the app does not return to the Qt main event loop until each layer has been sliced and each vispy node has been updated. This means that any other updates to the app, like redrawing the slider position, or interactions with the app, like moving the slider somewhere else, are blocked until slicing is done. This is what causes napari to stop responding when slicing is slow.
Each subclass of `Layer` has its own type-specific implementation of `set_view_slice`, which uses the updated dims/slice state in combination with `Layer.data` to generate and store sliced data. Similarly, each subclass of `VispyBaseLayer` has its own type-specific implementation of `_on_data_change`, which uses the new sliced data in the layer, may post-process it and then passes it to vispy to be rendered on the GPU.
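The call sequence described above can be condensed into a runnable sketch. The classes below are drastically simplified stand-ins for napari's real ones; actual signatures and event wiring differ:

```python
class Layer:
    """Simplified stand-in for napari's Layer (real signatures differ)."""

    def __init__(self, data):
        self.data = data
        self.sliced = None
        # Stand-in for the Layer.events.set_data event.
        self.on_set_data = lambda: None

    def set_view_slice(self, step):
        # Type-specific slicing; this is where the slow work happens.
        self.sliced = self.data[step]

    def refresh(self, step):
        self.set_view_slice(step)  # blocks the main thread
        self.on_set_data()         # emits Layer.events.set_data


class VispyLayer:
    """Simplified stand-in for VispyBaseLayer."""

    def __init__(self, layer):
        self.layer = layer
        self.drawn = None
        layer.on_set_data = self._on_data_change

    def _on_data_change(self):
        # Pass the layer's new slice state to the vispy scene node.
        self.drawn = self.layer.sliced


def on_current_step(layers, step):
    # Triggered by Dims.events.current_step: every layer is sliced
    # synchronously before control returns to the Qt event loop.
    for layer in layers:
        layer.refresh(step)
```

Because `on_current_step` only returns once every `refresh` has finished, a single slow `set_view_slice` stalls the entire application, which is exactly the blocking behavior this project removes.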
### Existing slice state
It's important to understand what state is currently used for slicing in napari. Ideally, we want to encapsulate this state into an immutable slice request and response, rather than keep it on the layer as mutable state, some of which may be read by the vispy layer when slicing is done. This is especially important for asynchronous slicing because the main thread may mutate this state while slicing is occurring, resulting in unpredictable and potentially unsafe behavior.
- `Layer`
- `data`: array-like, the full data that will be sliced
- `corner_pixels`: `Array[int, (2, ndim)]`, used for multi-scale images only
- `scale_factor`: `int`, converts from canvas to world coordinates based on canvas zoom
- `loaded`: `bool`, only used for experimental async image slicing
- `_transforms`: `TransformChain`, transforms data coordinates to world coordinates
- `_ndim`: `int`, the data dimensionality
- `_ndisplay`: `int`, the display dimensionality (either 2 or 3)
- `_dims_point`: `List[Union[numeric, slice]]`, the current slice position in world coordinates
- `_dims_order`: `Tuple[int]`, the ordering of dimensions, where the last dimensions are visualized
- `_data_level`: `int`, the multi-scale level currently being visualized
- `_thumbnail`: `ndarray`, a small 2D image of the current slice
- `_ImageBase`
- `_slice`: `ImageSlice`, contains a loader, and the sliced image and thumbnail
- lots of complexity encapsulated here and other related classes like `ImageSliceData`
- `_empty`: `bool`, True if slice is an empty image, False otherwise (i.e. hasn't been filled by exp async slicing yet?)
- `_should_calc_clims`: `bool`, if True reset contrast limits on new slice
- `_keep_auto_contrast`: `bool`, if True reset contrast limits on new data/slice
- `Points`
- `__indices_view` : `Array[int, (-1,)]`, indices of points (i.e. rows of `data`) that are in the current slice/view
- lots of private properties derived from this like `_indices_view` and `_view_data`
- `_view_size_scale` : `Union[float, Array[float, (-1)]]`, used with thick slices to scale `_view_size` so that out-of-slice points appear smaller
- `_round_index`: `bool`, used to round data slice indices for all layer types except points
- `_max_points_thumbnail`: `int`, if more points than this in slice, randomly sample them
- `_selected_view`: `list[int]`, intersection of `_selected_data` and `_indices_view`, could be a cached property
## Detailed description: in progress
The following diagram shows the new proposed approach to slicing layers asynchronously.

As with the existing synchronous slicing design, the `Dims.events.current_step` event is the shared starting point. In the new approach, we pass `ViewerModel.layers` through to the newly defined `LayerSlicer`, which synchronously makes a slice request for each layer on the main thread. Each layer then processes its request asynchronously on a dedicated slicing thread, while the main thread returns quickly to the Qt main event loop, allowing napari to keep responding to other updates and interactions.
When all the layers have generated slice responses, the `slice_ready` event is emitted. That triggers `QtViewer._on_slice_ready` to be executed on the main thread, which passes each layer's slice response on to the corresponding vispy layer.
### Slice request and response
Our core approach is to remove the state from the layers that is currently mutated by `Layer._slice_dims` and instead encapsulate it in `LayerSampleRequest` and `LayerSampleResponse` classes that serve as the input and output of a layer-type-specific sampling function.
```python
from dataclasses import dataclass
from typing import Tuple

from numpy.typing import ArrayLike

from napari.utils.transforms import Transform


@dataclass(frozen=True)
class LayerSampleRequest:
    # The data to be sampled.
    data: ArrayLike
    # Mapping from world to data coordinates.
    transform: Transform
    # Sample index in world coordinates.
    point: Tuple[float, ...]
    # The dimensions displayed in the canvas.
    dims_displayed: Tuple[int, ...]
    # The dimensions not displayed in the canvas.
    dims_not_displayed: Tuple[int, ...]


@dataclass(frozen=True)
class LayerSampleResponse:
    # The sampled data.
    data: ArrayLike
    # Mapping from data to world coordinates.
    transform: Transform


class Layer:
    ...

    @staticmethod
    def sample(request: LayerSampleRequest) -> LayerSampleResponse:
        ...
```
We use the term sampling instead of slicing to avoid confusion with Python's built-in `slice` type.
The request and response types may have some layer-type-specific fields. They will also initially be private, so that we allow for some changes as we better understand what the long-term public API should look like. Ideally, `Layer.sample` should be a static function that does not depend on any layer state other than that copied into or referenced by the request, so it could just be defined as a plain function.
We define a `LayerSampler` class to generate the appropriate sample requests caused by changes to state in some of napari's existing model classes, such as `Dims`, `Camera`, and the layers themselves. This class schedules the required calls to `Layer.sample` to be executed asynchronously and provides some way for the vispy layers to consume the generated response data when those asynchronous tasks are done.

Running sampling tasks asynchronously has two advantages.
1. Sampling code that releases the GIL (e.g. reading data) will truly run in parallel and allow the main thread to keep the GUI responsive.
2. Pending sample tasks can be cancelled, avoiding unnecessary work and responding more quickly to the tasks that matter (e.g. the latest slider position).

### Generate layer sample requests on the main thread
- Make immutable copies of small state (e.g. `Dims.point`).
- No need for mutexes to read/write this state.
- Use a reference to the existing `Layer.data`.
- In general, this could be huge so we can't make an immutable copy.
- If the reference is mutated in place, we might get an inconsistent sample, but I think it should be safe. A refresh after the mutation should produce a consistent slice, so this should be OK.
- If `Layer.data` is reassigned, then the sample will be stale, but reassigning data should cause a refresh anyway.
- Could also require a lock to access `Layer.data` when slicing is occurring.
- Is the GIL enough here? Only if we access `data` once asynchronously.
- Could use a weak reference.
- If the strong reference count is 0 (e.g. because it was reassigned), then we could use this as a sign to stop and return `None`, which might allow us to finish early and act as a form of cancellation.
- Consider adding a `cancelled: bool` field to allow finer grained cancellation.
- Without this, we can only cancel pending tasks that have not started execution yet.
- With this, we could finish sampling a layer early, and could also prevent canvas updates.
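The weak-reference idea above can be sketched as follows. `LayerData` and `sample_task` are hypothetical names; the small stand-in class is used only because plain Python lists do not support weak references (numpy arrays do):

```python
import weakref


class LayerData:
    """Stand-in for an array-like `Layer.data` that supports weak references."""

    def __init__(self, values):
        self.values = values

    def __getitem__(self, index):
        return self.values[index]


def sample_task(data_ref, index):
    """Worker-side sketch: dereference the weak reference before sampling."""
    data = data_ref()  # None once the strong reference count hits 0
    if data is None:
        # The layer's data was reassigned and collected: this task is
        # stale, so finishing early acts as a form of cancellation.
        return None
    return data[index]
```

Note that this relies on the worker dereferencing the weak reference once and holding the resulting strong reference for the duration of the sample.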
### Use a single threaded executor for sampling
- [`ThreadPoolExecutor(max_workers=1)`](https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor)
- Use [`ThreadPoolExecutor`](https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor) instead of [`ProcessPoolExecutor`](https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor) because all `Layer.data` would need to be stored in [`SharedMemory`](https://docs.python.org/3/library/multiprocessing.shared_memory.html), which could be expensive to guarantee (i.e. memory duplication).
- Keep async program logic as simple as possible.
- Pending sample tasks, which may become stale and thus useless, can be cancelled.
- One extra thread minimizes GIL contention when that is a problem.
- We want to wait for all layers to be sampled before updating them on the canvas to prevent strange layer blending behavior.
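The points above can be combined into a minimal scheduling sketch. `LayerSlicerSketch` and its methods are hypothetical names, not napari's API: one worker thread, cancellation of stale pending tasks, and a wait for all layers before the canvas is updated.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class LayerSlicerSketch:
    """Hypothetical sketch of the single-threaded slicing executor."""

    def __init__(self):
        # One worker keeps the async program logic simple and minimizes
        # GIL contention when that is a problem.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def slice_layers(self, sample_fns):
        # Newer requests make pending tasks stale: cancel any that
        # have not started executing yet.
        for future in self._pending:
            future.cancel()
        self._pending = [self._executor.submit(fn) for fn in sample_fns]
        return self._pending

    @staticmethod
    def wait_all(futures):
        # Wait for every layer's response before updating the canvas,
        # to prevent strange blending of old and new slices.
        wait(futures)
        return [f.result() for f in futures if not f.cancelled()]
```

In the real design, the wait would happen off the main thread and completion would be signalled via the `slice_ready` event rather than by blocking.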
## Implementation
- Simple prototype that just runs `Layer._slice_dims` asynchronously in a potentially unsafe or inconsistent way: https://github.com/andy-sweet/napari/pull/10
- More complex prototype that defines layer slice/sample request and response and uses them for slicing: https://github.com/andy-sweet/napari/pull/7
### Possible milestones
1. Simple async slicing that just runs `_slice_dims` asynchronously while holding a reentrant layer-wide lock that all other access to the layer must obtain. Maybe only runs asynchronously if `data` is a dask array or similar.
2. Runs `_set_view_slice` asynchronously, but all input state should be set in `_slice_dims` on the main thread. Output state should be set by `_set_view_slice` and it will need to obtain some lock to do so.
3. Define sample request, response, and function for each layer type. Make sampling stateless other than a weak reference to layer data.
## Future work
- Load low-res sample of data first (sync or maybe async too), followed by desired res later.
- Asynchronously load tiles of data and render them as they are ready.
- Instead of using `Dims.point`, define a bounding box in space to define a thick sampling window for all layers. This enables things like thick slicing.
## Alternatives
- Just call `Layer.set_view_slice` asynchronously, and just leave all existing slice state on `Layer`.
- Simple to implement and shouldn't break anything that is currently dependent on such state.
- Needs at least one lock to ensure safe/sensible read/write access to layer slice state (e.g. a lock controlling access to the entire layer).
- How to handle events that should probably be emitted on the main thread?
- Does not address goal 2.
- Only access `Layer.data` asynchronously.
- Targets main cause of unresponsiveness (i.e. reading data).
- No events are emitted on the non-main thread.
- Cancellation saves less work (i.e. we do more work on the main thread before submitting the async task).
- Splits up slicing logic, making program flow harder to follow.
- Does not address goal 2.
- Use `QThread` and similar utilities instead of `concurrent.futures`
- Standard way for plugins to support long running operations.
- Can track progress and allow more opportunity for cancellation with `yielded` signal.
- Can easily process done callback (which might update Qt widgets) on main thread.
- Need to define our own task queue to achieve lazy slicing.
- Need to connect a `QObject`, which ties our core to Qt, unless the code that controls threads does not live in core.
- Use `asyncio` package instead of `concurrent.futures`
- Mostly syntactic sugar on top of `concurrent.futures`.
- Likely need an `asyncio` main event loop distinct from Qt's main event loop, which could cause issues.
## Discussion
- [Initial announcement and discussion on Zulip](https://napari.zulipchat.com/#narrow/stream/296574-working-group-architecture/topic/Async.20slicing.20project).
- Consider (re)sampling instead of slicing as the name for the operation discussed here.
- [Problems with `NAPARI_ASYNC=1`](https://forum.image.sc/t/even-with-napari-async-1-data-loading-is-blocking-the-ui-thread/68097/4)
- [Removing slice state from layer](https://github.com/napari/napari/issues/4682)
### Decisions
## Copyright
This document is dedicated to the public domain with the Creative Commons CC0
license [^cc0]. Attribution to this source is encouraged where appropriate, as per
CC0+BY [^cc0-by].
## References and footnotes
[^cc0]: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication, <https://creativecommons.org/publicdomain/zero/1.0/>
[^cc0-by]: <https://dancohen.org/2013/11/26/cc0-by/>
[^issue-792]: napari issue 792, <https://github.com/napari/napari/issues/792>
[^issue-1353]: napari issue 1353, <https://github.com/napari/napari/issues/1353>
[^issue-1574]: napari issue 1574, <https://github.com/napari/napari/issues/1574>
[^issue-1775]: napari issue 1775, <https://github.com/napari/napari/issues/1775>
[^issue-2156]: napari issue 2156, <https://github.com/napari/napari/issues/2156>
[^pull-4334]: napari pull request 4334, <https://github.com/napari/napari/pull/4334>