---
tags: scverse, open2c, data-structures, zarr, meeting
---

# 2022-11-09: open2c + scverse data-structures meeting

*Attendees: Trevor Manz, Isaac Virshup, Nezar Abdennur, Geoff Fudenberg, Aleksandra Galitsyna*

## Agenda

* Presentation
    * HiGlass (10 min) [Trevor's slides](https://docs.google.com/presentation/d/1TujqZGOmFewW1v45rIOT9vj5YZaqRZ6_WHVC1G5tiOg/edit?usp=sharing)
    * AnnData (10 min) [Isaac's slides](https://drive.google.com/file/d/1XGO9Etil_0vFPPOSsG_AmqS3WPtzAcvu/view?usp=sharing)
* File formats/interoperability
    * AnnData + multivec
    * Zarr, Parquet, Kerchunk, Dask
    * Pairwise or higher-order data
* Performance/scaling
    * Multiscale data
    * Memory usage
* hg
    * HiGlass tileset "protocol"
    * Standardized tileset adaptors for multiscale
* Single-cell and single-locus embeddings

## Notes

* https://pangeo-forge.org
* Zarr as a reader for HDF5
* Trevor's presentation
    * HiGlass tilesets and data formats
        * HiGlass supports many formats
        * HiGlass client
            * Consistent API for retrieving data regardless of the input format
            * Especially multiscale
        * Abstraction over 1D/2D genomics formats
    * Abstraction
        * TilesetInfo: metadata about the pyramid
            * Tile size, pyramid shape
        * TileData
            * Can be pixel or sparse
    * Q: Multiscale for SNPs?
        * SNPs are supported
        * Multiscale is a bit more complicated
    * clodius implements tilesets for genomics formats
        * https://github.com/higlass/clodius
    * A HiGlass visualization is defined in JSON
    * The server contains a set of visualizations and datasets
        * Clodius does the range querying
    * higlass-python is a Python API for using HiGlass in a notebook environment
        * v2 (https://github.com/manzt/hg)
        * User-defined tilesets: the user defines the functions
    * Idea: representing a tileset as a Zarr dataset
        * Once it is in Zarr, maybe no server process is needed
        * Or maybe you don't, with Kerchunk
        * ZEP 3 discussion: non-fixed-size chunks
            * https://github.com/orgs/zarr-developers/discussions/52
* Isaac's presentation
    * Multivec: can we overlap?
    * Spatial indexing?
        * Can also be used in genomic coordinate systems (R-trees are used internally by some formats)
        * Z-curve indexing (https://en.wikipedia.org/wiki/Z-order_curve)
    * Arrow
* Accessing single rows from coolers
* Follow-up: on Zulip?
    * Multiscale access
    * Spatial indexing
    * Bioframe PRs
    * Common file formats for bioinformatic data types
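
The HiGlass abstraction in the notes separates pyramid metadata (TilesetInfo: tile size, pyramid shape) from the tile data itself. To make that concrete, here is a minimal sketch that assumes a generic power-of-two pyramid over a 1D track, where zoom 0 is a single tile and each zoom level doubles the number of tiles. `PyramidInfo`, `max_zoom`, `downsample_factor` and `tiles_for_range` are hypothetical names for illustration. This is not the HiGlass or clodius API, and real tilesets may choose resolutions differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PyramidInfo:
    """Hypothetical TilesetInfo-like metadata for a 1D power-of-two pyramid."""

    length: int     # number of bins at the finest resolution
    tile_size: int  # bins per tile, the same at every zoom level

    @property
    def max_zoom(self) -> int:
        # Zoom 0 is one tile covering the whole track; each level doubles the tile count,
        # so max_zoom = ceil(log2(tiles needed at the finest resolution)).
        tiles_needed = -(-self.length // self.tile_size)  # ceiling division
        return max(tiles_needed - 1, 0).bit_length()

    def downsample_factor(self, zoom: int) -> int:
        # Number of finest-resolution bins aggregated into one tile bin at `zoom`.
        if not 0 <= zoom <= self.max_zoom:
            raise ValueError(f"zoom must be in [0, {self.max_zoom}]")
        return 2 ** (self.max_zoom - zoom)

    def tiles_for_range(self, zoom: int, start: int, end: int) -> list[int]:
        """Indices of the tiles at `zoom` that overlap the finest-bin range [start, end)."""
        if not 0 <= start < end:
            raise ValueError("expected 0 <= start < end")
        span = self.tile_size * self.downsample_factor(zoom)  # finest bins per tile
        return list(range(start // span, (end - 1) // span + 1))


info = PyramidInfo(length=1_000_000, tile_size=256)
print(info.max_zoom)                             # 12
print(info.tiles_for_range(12, 0, 1000))         # [0, 1, 2, 3]
print(info.tiles_for_range(0, 0, info.length))   # [0]
```

Storing each zoom level as its own chunked array, with one tile per chunk, is one way the "tileset as a Zarr dataset" idea could look. This is a design possibility only, not something decided in the meeting.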
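
Z-curve (Morton) indexing came up as one option for spatial indexing, including on genomic coordinates such as the (row, column) bins of a contact matrix. The sketch below shows the core idea under the assumption of non-negative integer bin coordinates. The bits of the row and column are interleaved into a single 1D key, so bins that are close in 2D tend to have close keys and can be stored sorted by that key. The helpers `spread_bits`, `z_order_key` and `z_order_decode` are hypothetical and are not the indexing scheme used by cooler, HiGlass, or any other specific format.

```python
from __future__ import annotations


def spread_bits(n: int, bits: int = 32) -> int:
    """Move bit i of n to position 2*i, leaving zeros in between."""
    result = 0
    for i in range(bits):
        result |= ((n >> i) & 1) << (2 * i)
    return result


def z_order_key(row: int, col: int, bits: int = 32) -> int:
    """Hypothetical Morton (Z-order) key: column bits at even positions, row bits at odd."""
    if not (0 <= row < 2**bits and 0 <= col < 2**bits):
        raise ValueError("coordinates must be non-negative and fit in `bits` bits")
    return spread_bits(col, bits) | (spread_bits(row, bits) << 1)


def z_order_decode(key: int, bits: int = 32) -> tuple[int, int]:
    """Invert z_order_key, returning (row, col)."""
    row = col = 0
    for i in range(bits):
        col |= ((key >> (2 * i)) & 1) << i
        row |= ((key >> (2 * i + 1)) & 1) << i
    return row, col


# Order the pixels of a 4x4 contact-matrix-like grid by their Z-order key.
pixels = [(r, c) for r in range(4) for c in range(4)]
ordered = sorted(pixels, key=lambda rc: z_order_key(*rc))
print(ordered[:8])
# [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
```

Each aligned 2x2 block occupies a contiguous run of keys, which is what makes range scans over the 1D key touch compact 2D regions. A full index would still need to split an arbitrary query rectangle into several key ranges.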