# Geff Community Meeting September 2025
September 19, 2025, 11am ET
Attendees: Morgan Schwartz, Caroline Malin-Mayor, Laura Xenard, Teun Huijben
## Python code base updates
TLDR: Version 1 of `geff` and `geff-spec` expected in about 1 week!
### Variable length properties
Progress! The PR for variable-shaped array properties should be ready for review in the next day or two. The API differs a bit from before: GEFF `write_arrays` expects the variable length property for all nodes/edges as an `np.array` with dtype `np.object_`, where each element is itself an `np.array` with the same number of dimensions and dtype as all the others, alongside a normal missing array. We provide a helper function to convert a `Sequence[ArrayLike | None]` into a valid `PropDictNpArray`: it ensures all ndims and dtypes match (where possible), constructs the missing array, and fills in `None`s with empty arrays of the proper ndim and dtype.
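As a rough sketch of the expected input format (the property values and shapes here are purely illustrative, and the helper itself is not shown since the PR is still open):
```
import numpy as np

# Hypothetical variable-length "polygon" property for three nodes;
# node 1 has no value.
values = [np.array([[0.0, 1.0], [2.0, 3.0]]), None, np.array([[4.0, 5.0]])]

# Object array: every element is an np.ndarray with the same ndim (2)
# and dtype (float64). Missing entries get an empty array of matching
# ndim/dtype, and a parallel boolean "missing" array marks them.
prop = np.empty(len(values), dtype=np.object_)
missing = np.zeros(len(values), dtype=bool)
for i, v in enumerate(values):
    if v is None:
        prop[i] = np.empty((0, 2), dtype=np.float64)
        missing[i] = True
    else:
        prop[i] = np.asarray(v, dtype=np.float64)
```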
#### String arrays
Explicitly encoding string arrays into byte arrays with UTF-8 caused issues: specifically, a zarr-python bug in byte array access, and a zarr-python warning that bytes are not actually in the v3 spec. Variable length UTF-8 encoded strings are also not in the main zarr v3 spec, but they are in the [zarr-extensions](https://github.com/zarr-developers/zarr-extensions/tree/main/data-types/string) spec, which zarr-python supports. If we stick with our custom byte encoding scheme, we need to implement it in GEFF Java and Python. If we use the standard scheme defined in the extended spec, we only need to implement it in (N5/zarr) Java, which is where this code properly lives anyway. John Bogovic agreed that this plan makes sense.
Proposal: the GEFF spec says strings should be stored according to the zarr-extensions variable length UTF-8 encoded strings spec. For the immediate future, string properties won't be readable in Java, but we will push to get that code contributed to the n5 reader in Java. It will "just work" in Python without any special implementation from us.
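On the Python side, "just work" means something like the following minimal sketch (assuming zarr-python 3's variable-length string support; the store path is made up):
```
import numpy as np
import zarr

# zarr-python maps dtype=str onto the variable-length UTF-8 string data
# type from zarr-extensions, so no custom byte encoding is needed.
names = zarr.create_array(store="props.zarr", shape=(3,), dtype=str, zarr_format=3)
names[:] = np.array(["short", "a longer string", "μ-unicode too"], dtype=object)
```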
### Unified API for reading and writing different backends - Done!
Thanks to Milly, we now have a nice standardized API for reading and writing GEFFs - https://github.com/live-image-tracking-tools/geff/blob/main/src/geff/_graph_libs/_api_wrapper.py
```
def read(
    store: StoreLike,
    structure_validation: bool = True,
    node_props: list[str] | None = None,
    edge_props: list[str] | None = None,
    data_validation: ValidationConfig | None = None,
    *,
    backend: SupportedBackend = "networkx",
    **backend_kwargs: Any,
) -> tuple[SupportedGraphType, GeffMetadata]:
    """
    Read a GEFF to a chosen backend.

    Args:
        store (StoreLike): The path or zarr store to the root of the geff zarr, where
            the .attrs contains the geff metadata.
        structure_validation (bool, optional): Flag indicating whether to perform
            validation on the geff file before loading into memory. If set to False
            and there are format issues, this will likely fail with a cryptic error.
            Defaults to True.
        node_props (list of str, optional): The names of the node properties to load;
            if None, all properties will be loaded. Defaults to None.
        edge_props (list of str, optional): The names of the edge properties to load;
            if None, all properties will be loaded. Defaults to None.
        data_validation (ValidationConfig, optional): Optional configuration for which
            optional types of data to validate. Each option defaults to False.
        backend ({"networkx", "rustworkx", "spatial-graph"}): Flag for the chosen
            backend. Defaults to "networkx".
        backend_kwargs (Any): Additional kwargs that may be accepted by the backend
            when reading the data.

    Returns:
        tuple[SupportedGraphType, GeffMetadata]: Graph object of the chosen backend,
            and the GEFF metadata.
    """
```
```
def write(
    graph: SupportedGraphType,
    store: StoreLike,
    metadata: GeffMetadata | None = None,
    axis_names: list[str] | None = None,
    axis_units: list[str | None] | None = None,
    axis_types: list[Literal[AxisType] | None] | None = None,
    zarr_format: Literal[2, 3] = 2,
    *args: Any,
    **kwargs: Any,
) -> None:
    """Write a supported graph object to the geff file format.

    Args:
        graph (SupportedGraphType): An instance of a supported graph object.
        store (str | Path | zarr store): The path/str to the output zarr, or the store
            itself. Opens in append mode, so will only overwrite geff-controlled groups.
        metadata (GeffMetadata, optional): The original metadata of the graph.
            Defaults to None. If provided, will override the graph properties.
        axis_names (list[str], optional): The names of the spatial dims represented
            in the position property. Defaults to None. If provided, overrides the
            value in both the graph properties and the metadata.
        axis_units (list[str | None], optional): The units of the spatial dims
            represented in the position property. Defaults to None. If provided,
            overrides the value in both the graph properties and the metadata.
        axis_types (list[Literal[AxisType] | None], optional): The types of the
            spatial dims represented in the position property. Usually one of "time",
            "space", or "channel". Defaults to None. If provided, overrides the value
            in both the graph properties and the metadata.
        zarr_format (Literal[2, 3], optional): The version of zarr to write.
            Defaults to 2.
        *args (Any): Additional args that may be accepted by the backend when writing
            from a specific type of graph.
        **kwargs (Any): Additional kwargs that may be accepted by the backend when
            writing from a specific type of graph.
    """
```
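A quick round-trip sketch of these two entry points (assuming `read` and `write` are re-exported at the `geff` package root; the file name is made up):
```
import networkx as nx

import geff  # assuming the wrapper API is re-exported at the package root

# Build a tiny two-node tracking graph.
G = nx.DiGraph()
G.add_node(0, t=0, y=10.0, x=20.0)
G.add_node(1, t=1, y=11.5, x=20.5)
G.add_edge(0, 1)

geff.write(G, "tracks.geff", axis_names=["t", "y", "x"])
graph, metadata = geff.read("tracks.geff", backend="networkx")
```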
### Refactor into two packages, `geff` and `geff-spec`
Open PR thanks to Morgan and Talley's teamwork: https://github.com/live-image-tracking-tools/geff/pull/329
Just waiting for the variable length properties PR to get merged (so it stops creating insane merge conflicts) and for the spec discussions we are about to have; then we will merge, test, and release version 1 of `geff` and `geff-spec`!
## Specification
### Units
Conclusions from last meeting:
- We need a defined list of things that can be stored on disk.
- Pint can be a python API convenience to cast to the defined list of units.
Remaining questions after last meeting:
- Is the list of OME-zarr/ngff units ready to be relied upon?
- Units SHOULD or MUST be (on disk) on the defined list?
Proposed answers:
- Yes, the OME-zarr list of units is thorough for spatial and temporal units, and is as good as any alternative defined list. I propose we keep it as our defined list of spatial and temporal units.
- I propose that on-disk units SHOULD be drawn from the defined list. Our writers can cast/warn (see the sketch after this list).
- Related proposal: no channel dimensions in the "axes" spec -> agreed that we will keep the channel for ease of visualizing graphs with multichannel image data
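A rough sketch of the writer-side cast/warn behavior using pint (the function name and exact behavior here are illustrative, not a settled API):
```
import warnings

import pint

ureg = pint.UnitRegistry()

def normalize_unit(unit: str, valid_units: list[str]) -> str:
    # Let pint canonicalize common aliases, e.g. "um" -> "micrometer".
    try:
        canonical = str(ureg.Unit(unit))
    except pint.UndefinedUnitError:
        canonical = unit
    # SHOULD, not MUST: warn rather than error when the unit is off-list.
    if canonical not in valid_units:
        warnings.warn(f"Unit {unit!r} is not on the defined unit list; writing as-is.")
    return canonical
```
For example, `normalize_unit("um", VALID_SPACE_UNITS)` (using the list below) would return `"micrometer"`.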
#### Proposed Axis specification:
> axes (list[Axis] | None, default: None ) – An OPTIONAL list of Axis objects defining the spatiotemporal axes of each node in the graph. Any spatial properties or transformations (e.g. affine transformation, ellipsoid covariance matrix) will correspond to the axis order in this list. Each Axis has the following attributes:
> - name (REQUIRED): the name of an existing numerical attribute on the nodes
> - type (OPTIONAL): One of `space` or `time` ~~or `channel`~~.
> - unit (OPTIONAL): MUST be one of the valid OME-Zarr spatial or temporal units (link to list).
> - min (OPTIONAL): the smallest value found on any node for this axis
> - max (OPTIONAL): the largest value found on any node for this axis.
>
>See Axis for more information.
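For concreteness, a hypothetical axes entry following the proposed fields (all values made up):
```
axes = [
    {"name": "t", "type": "time", "unit": "second", "min": 0, "max": 125},
    {"name": "y", "type": "space", "unit": "micrometer", "min": 0.0, "max": 330.5},
    {"name": "x", "type": "space", "unit": "micrometer", "min": 0.0, "max": 512.0},
]
```
For reference, the proposed defined unit lists: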
```
VALID_SPACE_UNITS = [
"angstrom",
"attometer",
"centimeter",
"decimeter",
"exameter",
"femtometer",
"foot",
"gigameter",
"hectometer",
"inch",
"kilometer",
"megameter",
"meter",
"micrometer",
"mile",
"millimeter",
"nanometer",
"parsec",
"petameter",
"picometer",
"terameter",
"yard",
"yoctometer",
"yottameter",
"zeptometer",
"zettameter",
]
VALID_TIME_UNITS = [
"attosecond",
"centisecond",
"day",
"decisecond",
"exasecond",
"femtosecond",
"gigasecond",
"hectosecond",
"hour",
"kilosecond",
"megasecond",
"microsecond",
"millisecond",
"minute",
"nanosecond",
"petasecond",
"picosecond",
"second",
"terasecond",
"yoctosecond",
"yottasecond",
"zeptosecond",
"zettasecond",
]
```
### Affine model
Looking through the issues and discussions on our Affine implementation, it became clear that we need to be more specific on what the Affine matrix represents, and thus what the spatial Axes represent.
Questions to consider:
- What is the affine transformation between? "Pixel" units and "physical" units?
- Do we assume graphs are in "pixel" units if no Affine is provided, or in "physical" units?
- If we assume "physical" units, should we also provide a helper for loading in "pixel" units, or just for loading in "physical" units?
- Can the affine be applied in time? For scaling and offset this might be meaningful, but rotation or shearing between space and time is just chaos.
- Do we really need a full affine implementation, or just offset and scaling? Again, offset and scaling in relationship to what?
#### Proposal:
For spatial graphs (graphs with spatial axes), the default is to assume that all axis values correspond to some "physical", Euclidean space. This fits with the option to specify units on the axes, and with the units we offer as options (e.g., there is no "pixel" unit). (DISCUSS - the other option is to assume "pixel" units when there is a related_image, I suppose.)
We then have options for what the affine matrix means/does:
1. Including an Affine matrix implies that the values are NOT actually stored in physical space and need to be transformed to match the physical space (what we implicitly do now, I think).
2. We ALWAYS store the physical space locations. The Affine matrix gives you a way to map to some "non-physical" space (e.g. pixel space; the inverse of what we do now). You can even have multiple affines, each with a name of the space it maps to.
3. We ALWAYS store the physical space locations. There is no affine in the GEFF spec. If you want to scale to match pixels, you get the pixel size/affine transformation from the image metadata and apply that. This works nicely for multiscale images, where the points are the same for all resolution levels. We provide an option for passing a matrix to the write function to automatically convert pixels to "physical" space before writing.
#### Discussion/Conclusion
- It is not immediately clear that there are cases where rotation is needed. Offset and scale seem more useful and standard.
- Add pixel/voxel and frames as valid units (v1)
- Replace affine with offset and scale (v1; see the sketch after this list)
- Add an optional metadata field describing the output unit after applying scale and offset (v1)
- Units should be a SHOULD, not a MUST, in order to allow things like "isotropic voxels" (v1)
- Keep the channel type for ease of visualizing graphs with multichannel image data
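A minimal sketch of what offset-and-scale could mean in practice (array shapes and the direction of the mapping are illustrative, pending the spec wording):
```
import numpy as np

# Per-axis scale and offset for (t, y, x), e.g. frame interval and voxel size.
scale = np.array([1.0, 0.5, 0.5])
offset = np.array([0.0, 10.0, 10.0])

# Map pixel-space node coordinates into physical space...
pixel_coords = np.array([[0, 4, 8], [1, 6, 2]], dtype=float)
physical_coords = pixel_coords * scale + offset

# ...and invert to recover pixel coordinates from physical ones.
recovered_pixels = (physical_coords - offset) / scale
```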
## Geffception?
- Yoh and Laura are working on a test geffception file that doesn't require any changes to the current library/spec
Notes:
- Yoh sent Laura a file exported from his tool, but due to different assumptions in the tools it wasn't easy to load into Pycellin
- Laura created a sample geffception file via TrackMate -> Pycellin -> geff
- Tracklet geff and lineage geff live inside the main node/edge geff
- The only duplicated information was the tracklet ID and lineage ID
- Tracklet and lineage geffs are totally independent of each other
- Added related objects pointing to the other related geffs
- Call read 3x to get the three graphs (see the sketch after this list). There is no difference between saving 3 separate geffs and geffception-ing them; saving them inside is just convenient.
- Overall it was quite straightforward, and no changes to GEFF were needed - just conventions for what to call the tracklet and lineage geffs
- No reason not to do it! :seal: of approval
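As a sketch of the "call read 3x" pattern (group names here are placeholders; agreeing on the naming convention is exactly the open piece):
```
import geff  # assuming the wrapper API is re-exported at the package root

# One zarr hierarchy, three geffs: the main node/edge graph plus nested
# tracklet and lineage geffs (group names below are hypothetical).
cells, cells_meta = geff.read("cells.geff")
tracklets, tracklets_meta = geff.read("cells.geff/tracklet_geff")
lineages, lineages_meta = geff.read("cells.geff/lineage_geff")
```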
How can the spec support this?
- Add an option to related objects to say "this is another geff that is related to a node or edge property" (v1.1)
- Add data validation to check that the node IDs match the property values, that the edges retain the structure of the original, etc.

Side note: allow negative ints for node IDs, if it's not hard
## Things to add to next meeting agenda
- Issues from updating to V1
- Geffception PR
- Examples that people should test their tools on (script that writes a set of edge cases to disk)
- Path to a DOI - Zenodo? Figshare? JOSS? etc.