# Encoding layer style from features
Author: Andy Sweet
Status: Draft
Type: Standards Track
## Abstract
We propose an approach for unifying the API of style attributes, like colors, strings, and sizes, across napari layer types. In general, style values are encoded from a layer's feature table, which provides a concise and powerful API for generating visualizations. However, some encoding types may also ignore the features table, instead always returning a single constant value or some manually specified values. Unifying this API across layers makes this behavior easier to explain to users, plugin developers, and napari developers.
## Motivation and Scope
### Existing behavior
Some layers have a features table that contains one row for each element in the layer and one column for each feature. For example, a points layer with three points might have the following features table.
```python
points.features = {
    'class': ['a', 'b', 'a'],
    'conf': [0.1, 0.9, 0.5],
}
```
Currently, users can specify some style attributes to be derived from the features table. For example, the point face colors could be derived from the class feature
```python
points.face_color_cycle = {'a': 'blue', 'b': 'green'}
points.face_color = 'class'
points.face_color # => [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
```
where we also coerce user input from names to RGBA arrays.
Alternatively, users can manually specify style values independently of the features table. For example, the point face colors could be specified as constant across all points
```python
points.face_color = 'red'
points.face_color # => [[1, 0, 0, 1], [1, 0, 0, 1], [1, 0, 0, 1]]
```
or individually assigned to different colors.
```python
points.face_color = ['blue', 'red', 'green']
points.face_color # => [[0, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 1]]
```
### Existing problems
While the current style API is powerful, it also has some problems.
#### Layers have too many style attributes
In general, the different layer classes have many attributes related to style. For example, the `Points` layer has five attributes (`face_color`, `face_color_mode`, `face_colormap`, `face_color_cycle`, `current_face_color`) just for defining the face color of points. These multiply quickly as we introduce more style attributes (e.g. `Points` has five more attributes for edge color) and also describe state that may not always be relevant (e.g. `face_colormap` is not important if we're using a cycle). Therefore, style attributes both inflate the public API of `Points`, which generally makes it harder to understand, and dominate it, potentially hiding more important attributes like `data`, `features`, and `scale`.
#### Layers have inconsistent style attributes
The style API of different layer types is inconsistent. For example, we have `edge_color` in the `Vectors` layer but `color_by` in the `Tracks` layer for the same purpose. Also, the underlying implementation is not consistently shared. For example, `ColorManager` is used in `Points` and `Vectors`, but not in `Tracks` or `Shapes`, which leads to bugs (e.g. [#3013](https://github.com/napari/napari/issues/3013)).
#### Only colors can be easily encoded from features
Only color attributes like `Points.face_color`, `Shapes.edge_color`, and `Vectors.edge_color` can currently be derived from features. Other style attributes like `Points.shown` and `Points.size` can be set per-element, but cannot be derived from features. This increases inconsistency and limits the power of napari as a visualization tool.
#### ColorManager is difficult to understand
The `ColorManager` class is the main way in which some of the style implementation is shared across layer types, so we could consider using it elsewhere and creating similar classes for other attributes. However, the existing class is large and difficult to understand. In particular, the Pydantic [root validator](https://github.com/napari/napari/blob/b8e5d05d97e173ac08ffbd4fcaf042d654baa9be/napari/layers/utils/color_manager.py#L180) and [`_from_layer_kwargs`](https://github.com/napari/napari/blob/b8e5d05d97e173ac08ffbd4fcaf042d654baa9be/napari/layers/utils/color_manager.py#L454) make understanding program flow and debugging challenging even for seemingly simple operations. Other methods like [`_add`](https://github.com/napari/napari/blob/b8e5d05d97e173ac08ffbd4fcaf042d654baa9be/napari/layers/utils/color_manager.py#L285) branch significantly based on `ColorManager.color_mode`.
### Scope
#### Goals
- Reduce the number of layer style attributes.
- Define a consistent style API across points, shapes, vectors, and tracks layers.
    - Try to extend this to labels and surface layers, but don't force it.
- Share the implementation of style across layer and encoding types.
- Do not require users to import encoding types to use them.
- Simplify adding a new style encoding to a layer.
- Retain support for manual style encodings.
- Avoid unnecessary memory copies.
- Ergonomically coerce user input values assigned to encoding fields.
#### Non-goals
- Decrease time of generating encoded style values.
    - Though we should aim not to significantly increase this time.
- Make style attributes that are currently manual (e.g. `Points.size`) into encodings.
    - This should be made easy by this project, but doing so should be done as follow-up work.
## Detailed Description
Our core approach to unifying the style API is to define a style encoding protocol
```python
class StyleEncoding:
    def __call__(self, features: DataFrameLike) -> ArrayLike:
        ...
```
where `features` is from `Layer.features` and the return value is an array-like that is broadcastable to the length of the given features.
We only require something broadcastable to avoid unnecessary copies when dealing with single constant style values. We only require an array-like so that each encoding's implementation can choose how to store its values (e.g. in VRAM to avoid copies). In general, this means that if a numpy array is required, then the output of `__call__` should be passed through `np.asarray` and/or `np.broadcast_to`.
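As a minimal sketch of that last point (the variable names here are ours, not part of the proposal), broadcasting a single RGBA value to one row per element yields a read-only view rather than a copy.

```python
import numpy as np

# A constant encoding can return a single RGBA value of shape (4,).
constant = np.array([1.0, 0.0, 0.0, 1.0])
num_rows = 3  # stands in for the number of rows in the features table

# np.broadcast_to returns a read-only view, so no per-row copy is made.
colors = np.broadcast_to(np.asarray(constant), (num_rows, 4))

print(colors.shape)                        # (3, 4)
print(np.shares_memory(colors, constant))  # True
print(colors.strides[0])                   # 0
```

The zero stride along the first axis shows that every row refers to the same four values in memory.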
Anything that derives style values from layer features should satisfy this protocol. In this proposal, we describe color and string encodings, but the design should generalize to boolean, numeric, and other similar encodings.
### Color encodings
First, consider the high-level API and possible implementation of color encodings. In napari, the standard form for a single color value is an RGBA array of shape (4,), and multiple colors are stored in an array of shape (-1, 4). Therefore, each color encoding should return an array-like of shape (4,) or (-1, 4) to satisfy the style encoding protocol.
Constant and manual encodings are easy to define.
```python
class ConstantColorEncoding:
    constant: Array[float, (4,)]

    def __call__(self, features):
        return self.constant


class ManualColorEncoding:
    array: Array[float, (-1, 4)]
    default: Array[float, (4,)]

    def __call__(self, features):
        # Pad with the default color when there are more rows than colors.
        num_needed = features.shape[0] - self.array.shape[0]
        if num_needed > 0:
            padding = np.tile(self.default, (num_needed, 1))
            return np.concatenate((self.array, padding), axis=0)
        return self.array[:features.shape[0]]
```
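To make the padding and truncation behavior of the manual encoding concrete, here is a small runnable sketch. `ManualColorSketch` is a hypothetical stand-in for `ManualColorEncoding`, and it takes the number of rows directly instead of a features table to keep the example free of other dependencies.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ManualColorSketch:
    """Hypothetical stand-in for ManualColorEncoding."""

    array: np.ndarray    # shape (N, 4)
    default: np.ndarray  # shape (4,)

    def __call__(self, num_rows: int) -> np.ndarray:
        # num_rows plays the role of features.shape[0].
        num_needed = num_rows - self.array.shape[0]
        if num_needed > 0:
            # Too few manual values: pad with copies of the default.
            padding = np.tile(self.default, (num_needed, 1))
            return np.concatenate((self.array, padding), axis=0)
        # Enough (or too many) manual values: truncate to the row count.
        return self.array[:num_rows]


encoding = ManualColorSketch(
    array=np.array([[1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]]),
    default=np.array([0.0, 1.0, 1.0, 1.0]),
)
padded = encoding(3)     # two manual colors plus one default
truncated = encoding(1)  # only the first manual color
```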
Given the existing color and colormap utilities in napari, encoding colors from features is also straightforward
```python
class DirectColorEncoding:
    feature: str

    def __call__(self, features):
        return transform_color(features[self.feature])


class NominalColorEncoding:
    feature: str
    colormap: CategoricalColormap

    def __call__(self, features):
        return self.colormap(features[self.feature])


class QuantitativeColorEncoding:
    feature: str
    colormap: Colormap
    contrast_limits: Tuple[float, float]

    def __call__(self, features):
        # Rescale the feature so that contrast_limits maps onto (0, 1).
        values = np.interp(features[self.feature], self.contrast_limits, (0, 1))
        return self.colormap(values)
```
where we use the name prefixes nominal and quantitative to be consistent with altair/vega.
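The quantitative case can be sketched without napari as follows. Here `gray_colormap` is a hypothetical stand-in for a napari colormap, and the contrast limits are chosen by hand to match the range of the example feature.

```python
import numpy as np


def gray_colormap(values):
    """Hypothetical colormap: maps values in [0, 1] to gray RGBA colors."""
    values = np.asarray(values, dtype=float)
    alpha = np.ones_like(values)
    return np.stack([values, values, values, alpha], axis=-1)


def encode_quantitative(column, contrast_limits):
    # Linearly rescale so that contrast_limits maps onto (0, 1).
    # np.interp clamps values that fall outside the limits.
    normalized = np.interp(column, contrast_limits, (0, 1))
    return gray_colormap(normalized)


conf = np.array([0.1, 0.9, 0.5])
colors = encode_quantitative(conf, contrast_limits=(0.1, 0.9))
# colors is approximately
# [[0, 0, 0, 1], [1, 1, 1, 1], [0.5, 0.5, 0.5, 1]]
```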
Altogether, we expect usage like the following.
```python
features = pd.DataFrame({
    'class': ['a', 'b', 'a'],
    'conf': [0.1, 0.9, 0.5],
    'color': ['green', 'blue', 'red'],
})

color = ConstantColorEncoding(constant='red')
color(features)
# => [1, 0, 0, 1]

color = ManualColorEncoding(array=['red', 'green', 'blue'])
color(features)
# => [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]

color = DirectColorEncoding(feature='color')
color(features)
# => [[0, 1, 0, 1], [0, 0, 1, 1], [1, 0, 0, 1]]

color = NominalColorEncoding(
    feature='class',
    colormap={'a': 'blue', 'b': 'red'},
)
color(features)
# => [[0, 0, 1, 1], [1, 0, 0, 1], [0, 0, 1, 1]]

# Assuming contrast_limits defaults to the range of the feature, (0.1, 0.9).
color = QuantitativeColorEncoding(feature='conf', colormap='gray')
color(features)
# => [[0, 0, 0, 1], [1, 1, 1, 1], [0.5, 0.5, 0.5, 1]]
```
### String encodings
Next, consider string encodings, which can be used to add text annotations to points and shapes layers.
Constant, manual, and direct encodings are easy to define and are almost identical to the corresponding color encodings except for the type of values returned.
```python
class ConstantStringEncoding:
    constant: str

    def __call__(self, features):
        return self.constant


class ManualStringEncoding:
    array: Array[str, (-1,)]
    default: str

    def __call__(self, features):
        # Pad with the default string when there are more rows than strings.
        num_needed = features.shape[0] - self.array.shape[0]
        if num_needed > 0:
            return np.concatenate((self.array, [self.default] * num_needed))
        return self.array[:features.shape[0]]


class DirectStringEncoding:
    feature: str

    def __call__(self, features):
        return features[self.feature]
```
A more interesting string encoding is one that encodes a format string that only includes fields that are feature names. This uses multiple columns in the feature table to generate string values.
```python
class FormatStringEncoding:
    format: str

    def __call__(self, features):
        return [
            self.format.format(**dict(features.iloc[i]))
            for i in range(features.shape[0])
        ]
```
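The format string encoding can be tried out with only the standard library. In this sketch a plain dictionary of columns stands in for the features table, and the up-front check of field names is our own addition for illustration rather than part of the proposal.

```python
from string import Formatter

# A plain dict of columns stands in for the features table in this sketch.
features = {'class': ['a', 'b', 'a'], 'conf': [0.1, 0.9, 0.5]}
format_string = '{class}: {conf:.2f}'

# Check up front that every field in the format string names a feature.
field_names = [name for _, name, _, _ in Formatter().parse(format_string) if name]
missing = [name for name in field_names if name not in features]
if missing:
    raise ValueError(f'Format fields are not feature names: {missing}')

# Build one {feature: value} mapping per row, then format each row.
rows = [dict(zip(features, values)) for values in zip(*features.values())]
strings = [format_string.format(**row) for row in rows]
print(strings)  # ['a: 0.10', 'b: 0.90', 'a: 0.50']
```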
Altogether, we expect usage to look like the following.
```python
features = pd.DataFrame({
    'class': ['a', 'b', 'a'],
    'conf': [0.1, 0.9, 0.5],
})

string = ConstantStringEncoding(constant='')
string(features)
# => ''

string = ManualStringEncoding(array=['one', 'two', 'three'])
string(features)
# => ['one', 'two', 'three']

string = DirectStringEncoding(feature='class')
string(features)
# => ['a', 'b', 'a']

string = FormatStringEncoding(format='{class}: {conf:.2f}')
string(features)
# => ['a: 0.10', 'b: 0.90', 'a: 0.50']
```
### Coercion of assigned encoding values
In the examples so far, we explicitly created instances of encoding types like `NominalColorEncoding` and `FormatStringEncoding`. However, that is often verbose and also requires importing those classes.
#### Coercion from a dictionary
To avoid requiring imports, we plan to be able to coerce each style encoding from a dictionary input as follows.
```python
features = pd.DataFrame({
    'class': ['a', 'b', 'a'],
    'conf': [0.1, 0.9, 0.5],
})
style.color = {'constant': 'red'}
# => ConstantColorEncoding(constant='red')
style.color = {'array': ['red', 'green', 'blue']}
# => ManualColorEncoding(array=['red', 'green', 'blue'])
style.color = {'feature': 'color'}
# => DirectColorEncoding(feature='color')
style.color = {'feature': 'class', 'colormap': {'a': 'blue', 'b': 'red'}}
# => NominalColorEncoding(feature='class', colormap={'a': 'blue', 'b': 'red'})
style.color = {'feature': 'conf', 'colormap': 'gray'}
# => QuantitativeColorEncoding(feature='conf', colormap='gray')
text.string = {'constant': ''}
# => ConstantStringEncoding(constant='')
text.string = {'array': ['one', 'two', 'three']}
# => ManualStringEncoding(array=['one', 'two', 'three'])
text.string = {'feature': 'class'}
# => DirectStringEncoding(feature='class')
text.string = {'format': '{class}: {conf:.2f}'}
# => FormatStringEncoding(format='{class}: {conf:.2f}')
```
Calling the setters of fields in style collections or text manager will call the coercing function, which will attempt to match the key/value pairs of one of the built-in encodings. If there are no matches, the setter should fail. In the case that there are multiple matches, one will be used and the user can change this behavior either by providing a disambiguating field like `'type': 'quantitative'` or by explicitly importing and using the desired encoding type.
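One possible shape for such a coercing function is sketched below. The class names and the `coerce_from_dict` helper are hypothetical, and this version sidesteps ambiguity by requiring the dictionary keys to match an encoding's fields exactly, which is only one of several reasonable matching rules.

```python
from dataclasses import dataclass, fields


@dataclass
class ConstantSketch:
    constant: str


@dataclass
class ManualSketch:
    array: list


@dataclass
class DirectSketch:
    feature: str


@dataclass
class NominalSketch:
    feature: str
    colormap: dict


# Hypothetical registry of built-in encodings, tried in order.
ENCODINGS = (ConstantSketch, ManualSketch, DirectSketch, NominalSketch)


def coerce_from_dict(value: dict):
    """Return the first encoding whose field names equal the given keys."""
    keys = set(value)
    for encoding_type in ENCODINGS:
        if keys == {field.name for field in fields(encoding_type)}:
            return encoding_type(**value)
    # No built-in encoding matches, so fail instead of guessing.
    raise ValueError(f'No encoding matches keys: {sorted(keys)}')


constant = coerce_from_dict({'constant': 'red'})
nominal = coerce_from_dict(
    {'feature': 'class', 'colormap': {'a': 'blue', 'b': 'red'}}
)
```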
#### Coercion from a string
Even though dictionary coercion allows users to avoid imports, it is still a little more verbose than some similar current usage in napari and similar libraries like altair. For example, assigning a feature column name to the existing `Points.face_color` effectively assigns it to be a derived face color encoding.
We propose not supporting coercion from a string, except in cases where a string can be interpreted as a single style value (e.g. `'red'` for a color), in which case a constant encoding is returned. For example, assigning strings to colors and string encoding fields would work as follows.
```python
features = pd.DataFrame({
    'class': ['a', 'b', 'a'],
    'conf': [0.1, 0.9, 0.5],
})
style.color = 'class' # => raises ValueError
style.color = 'conf' # => raises ValueError
style.color = 'red' # => ConstantColorEncoding(constant='red')
style.color = 'rad' # => raises ValueError
text.string = 'class' # => ConstantStringEncoding(constant='class')
text.string = 'conf' # => ConstantStringEncoding(constant='conf')
text.string = '{class}: {conf:.2f}'
# => ConstantStringEncoding(constant='{class}: {conf:.2f}')
```
Our reasoning for this is as follows.
- Simple to explain and document.
- Simple to implement.
- Failures are fast and can be informative.
Instead we require the more verbose form that assigns an instance of an encoding or the dictionary form described above, such as `style.color = {'feature': 'class', ...}` or `text.string = {'format': '{class}: {conf:.2f}'}`. These are slightly verbose, but they are also explicit and more descriptive, so likely have an overall positive effect on code readability.
This breaks some existing behavior, though as the API entry points are also changing (e.g. `Points.style.face_color` replaces `Points.face_color`), existing users will have to make updates anyway. More complex coercion behavior can be introduced later, though any coercion from a string that is currently successful will always take precedence.
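The proposed string coercion rules are simple enough to sketch in a few lines. The `NAMED_COLORS` table and the class and function names below are hypothetical stand-ins; napari's real color parsing accepts many more inputs.

```python
from dataclasses import dataclass

# Hypothetical, tiny stand-in for napari's color-name lookup.
NAMED_COLORS = {'red': (1, 0, 0, 1), 'blue': (0, 0, 1, 1)}


@dataclass
class ConstantColorSketch:
    constant: tuple


@dataclass
class ConstantStringSketch:
    constant: str


def coerce_color_from_string(value: str) -> ConstantColorSketch:
    # Only strings that name a single color are accepted.
    if value not in NAMED_COLORS:
        raise ValueError(f'{value!r} is not a recognized color name')
    return ConstantColorSketch(constant=NAMED_COLORS[value])


def coerce_string_from_string(value: str) -> ConstantStringSketch:
    # Every string is a valid single string value, so this never fails.
    return ConstantStringSketch(constant=value)


red = coerce_color_from_string('red')
label = coerce_string_from_string('{class}: {conf:.2f}')
```

With these rules, a feature name such as `'class'` fails immediately for colors and is treated as literal text for strings.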
## Related Work
- Altair: https://altair-viz.github.io/
    - Python package for declarative data visualization based on vega.
- Vega: https://vega.github.io/vega/
    - Visualization grammar/language/schema for declaring data visualizations.
- [napari `ColorManager`](https://github.com/napari/napari/blob/990c9d8107622ad79a372f0367692257e33f7182/napari/layers/utils/color_manager.py#L103)
- [Unified layer properties/features state accepted proposal](https://hackmd.io/emsLO3cUR3O9nxSc6YYwUA?both)
## Implementation
Implementation tasks and issues are being tracked using a [public GitHub project board](https://github.com/napari/napari/projects/11).
## Backward Compatibility
The proposed changes are not intended to be strictly backwards compatible and are planned to be made for napari version 0.5. However, if desired, we could implement existing property getters and setters like `Points.face_color` fairly easily.
```python
class Points:
    ...
    @property
    def face_color(self) -> np.ndarray:
        return np.broadcast_to(
            self.style.face_color(self.features),
            (self.num_points, 4),
        )

    @face_color.setter
    def face_color(self, face_color) -> None:
        self.style.face_color = face_color
    ...
```
## Future Work
### Coercion from a string to a derived encoding
In this proposal, we do not support coercion from a string to a derived encoding, where the string represents the name of the feature column that style values should be derived from. The main reasons were to keep things simple and explicit in this first pass. In the future, if we receive feedback that a more concise form is important, then we should consider implementing it despite possible extra complexity.
In order for this type of coercion to be useful, we likely need to use a layer's features table to check that a string actually represents a feature column name. This allows us to fail quickly and would also allow us to automatically determine the type of encoding by looking at the dtype of the feature column. For example, we might expect the following behavior.
```python
features = pd.DataFrame({
    'class': ['a', 'b', 'a'],
    'conf': [0.1, 0.9, 0.5],
})
style.color = 'class' # => NominalColorEncoding(feature='class', ...)
style.color = 'conf' # => QuantitativeColorEncoding(feature='conf', ...)
style.color = 'red' # => ConstantColorEncoding(constant='red')
style.color = 'rad' # => raises ValueError
text.string = 'class' # => DirectStringEncoding(feature='class')
text.string = 'conf' # => DirectStringEncoding(feature='conf')
text.string = '{class}: {conf:.2f}'
# => FormatStringEncoding(format='{class}: {conf:.2f}')
```
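A rough sketch of how the dtype of a feature column could drive this choice for colors is given below. The function name, the `NAMED_COLORS` set, and the use of a plain dictionary for the features table are all assumptions made for illustration. Color names are checked first so that strings that coerce successfully today keep their meaning.

```python
import numpy as np

# Hypothetical stand-in for napari's color-name lookup.
NAMED_COLORS = {'red', 'blue'}


def infer_color_encoding(value, features):
    """Return the kind of color encoding a string would coerce to."""
    # Color names keep precedence over feature names.
    if value in NAMED_COLORS:
        return 'constant'
    if value in features:
        column = np.asarray(features[value])
        # Numeric columns map through a continuous colormap.
        if np.issubdtype(column.dtype, np.number):
            return 'quantitative'
        # Anything else is treated as categorical.
        return 'nominal'
    raise ValueError(f'{value!r} is neither a color name nor a feature name')


features = {'class': ['a', 'b', 'a'], 'conf': [0.1, 0.9, 0.5]}
kinds = [infer_color_encoding(name, features) for name in ('class', 'conf', 'red')]
print(kinds)  # ['nominal', 'quantitative', 'constant']
```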
### Convert other style attributes to encodings
Some style attributes can be provided on a per-element basis, but cannot currently be encoded from features. As part of future work, we plan to define these as style encodings. For example, `Points.size`, `Points.edge_width`, and `Shapes.edge_width` could be defined as floating point number encodings.
```python
class ConstantFloatEncoding:
    constant: float

    def __call__(self, features):
        return self.constant


class ManualFloatEncoding:
    array: Array[float, (-1,)]

    def __call__(self, features):
        ...


class AffineFloatEncoding:
    feature: str
    scale: float
    offset: float

    def __call__(self, features):
        return self.scale * features[self.feature] + self.offset
```
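As a quick check of the affine case, the sketch below maps a confidence feature to point sizes. `AffineFloatSketch` is a hypothetical stand-in for `AffineFloatEncoding`, and a plain dictionary again stands in for the features table.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class AffineFloatSketch:
    """Hypothetical stand-in for AffineFloatEncoding."""

    feature: str
    scale: float
    offset: float

    def __call__(self, features) -> np.ndarray:
        # Convert the column to a float array so the arithmetic is vectorized.
        column = np.asarray(features[self.feature], dtype=float)
        return self.scale * column + self.offset


features = {'conf': [0.1, 0.9, 0.5]}
size = AffineFloatSketch(feature='conf', scale=10.0, offset=1.0)
sizes = size(features)
# sizes is approximately [2, 10, 6]
```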
Similarly, `Points.shown` could be defined with Boolean encodings.
```python
class ConstantBooleanEncoding:
    constant: bool

    def __call__(self, features):
        return self.constant


class ManualBooleanEncoding:
    array: Array[bool, (-1,)]

    def __call__(self, features):
        ...


class DerivedBooleanEncoding:
    operation: BoolOp

    def __call__(self, features):
        return [
            self.operation.eval(features.iloc[i])
            for i in range(features.shape[0])
        ]
```
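Since `BoolOp` is left undefined here, the following sketch picks one simple hypothetical operation, a threshold on a numeric feature, to show what a derived boolean encoding could return. It is vectorized rather than evaluated row by row.

```python
import numpy as np


def shown_above_threshold(features, feature, threshold):
    """Hypothetical derived boolean encoding: show elements at or above a threshold."""
    return np.asarray(features[feature]) >= threshold


# A plain dict of columns stands in for the features table.
features = {'conf': [0.1, 0.9, 0.5]}
shown = shown_above_threshold(features, 'conf', 0.5)
print(shown.tolist())  # [False, True, True]
```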
## Alternatives
- Use altair directly.
    - Pros
        - Avoid new API design.
        - Use an existing standard.
    - Cons
        - Altair does not provide access to encoded values. It only outputs a visualization.
        - napari/vispy would not be able to support many of the encodings and channels in altair.
    - While we decided to not use altair directly, we did use some of its naming (e.g. encodings, channels, nominal, quantitative).
- Use the existing `ColorManager` to encode colors
    - Pros
        - Avoid new API design.
        - Focus on simplifying existing implementation.
    - Cons
        - Does not solve some of the existing problems.
        - Other style value types (e.g. strings) need other solutions.
## Discussion
- Original issue/vision/epic discussion: https://github.com/napari/napari/issues/2866
- Using a Python `Protocol` as a Pydantic field: https://napari.zulipchat.com/#narrow/stream/296574-working-group-architecture/topic/nice.20protocols.20blog.20post
- Multi-color text manager closed/draft PR: https://github.com/napari/napari/pull/2969
- Property maps to generate colors and strings in text manager closed/draft PR: https://github.com/napari/napari/pull/3327
- String and color encodings in text manager closed/draft PR: https://github.com/napari/napari/pull/3493
    - Strongly consider not coercing strings to derived encodings: https://github.com/napari/napari/pull/3493#discussion_r741912085
- Color encoding in text manager: https://github.com/napari/napari/pull/4464
- Question: how to handle string assignments to encodings?
- Coercing color encodings from strings:
    - https://napari.zulipchat.com/#narrow/stream/212875-general/topic/.E2.9C.94.20Vote.3A.20colors.20from.20strings
### Decisions
- Coerce a dictionary to an encoding.
    - See https://github.com/napari/napari/pull/3327#discussion_r704913565
    - Reasoning: users should not need to import napari-specific classes.
- Coerce a color name string to a constant encoding.
    - See https://napari.zulipchat.com/#narrow/stream/212875-general/topic/.E2.9C.94.20Vote.3A.20colors.20from.20strings
    - Reasoning: this behavior is not surprising, even when there is a feature with a color name (which should happen rarely).
- Define each style encoding field as a protocol instead of a union.
    - See https://github.com/napari/napari/pull/3493#discussion_r741915817
    - Reasoning
        - Support for custom encodings.
        - Easier to understand the API of the value returned by the field getter.