---
tags: scverse, open2c, meeting
---
# 2022-10-05: open2c + higlass + scverse meeting
*Attendees:*
* Nezar Abdennur (Open2C, HiGlass)
* Geoff Fudenberg (Open2C)
* Nils Gehlenborg (HiGlass, Gosling, Vitessce)
* Ilya Flyamer (Open2C)
* Peter Kerpedjiev (HiGlass)
* Fritz Lekschas (HiGlass)
* Trevor Manz (HiGlass)
* Isaac Virshup (scverse)
* Ignacio Ibarra (scverse)
* Danila Bredikhin (scverse)
* Sasha Galitsyna (Mirny lab, Open2C)
## Agenda
* 1-2min intros
* 2-5min general presentation of community scope and projects
    * scverse (Danila)
    * Open2C (Nezar)
    * HiGlass (TBD)
* Topic discussion; possible topics:
    * genomic ranges
    * genomic metadata
    * multiscale genome x observation data (multivec)
    * sparse array formats and file storage (anndata, cooler)
    * visualization
    * data access APIs
* Areas of collaboration / follow-up?
## Notes
### Intros
* Ignacio – Postdoc in Munich
* Nezar – New PI at UMass
* Ilya – postdoc in Basel – wet + dry working on hic
* Danila – phd at EMBL – multiomics in scverse
* Nils – PI at Harvard – interested in data storage for this and spatial
* Sasha – Previously worked on scATAC, now involved with Open2C
* Fritz – head of viz at biotech startup – worked on higlass
* Peter – SE at Amazon – previously HiGlass dev
* Geoff – Assistant prof at USC – Hi-C analysis, cooltools, bioframe – polymer analysis
* Trevor – PhD w/ Nils, interning with Fritz – interested in data formats, currently working on higlass
### Presentations
* scverse
    * goals: maintenance, community
    * common storage + analysis tools
    * big current projects
        * community
        * spatial
        * regulatory genomics
        * scaling of data
* open2c
    * pairtools: aggregating Hi-C data
    * file format: cooler
    * cooltools is the analysis toolkit
    * polychrom for MD simulations
    * bioframe: replacement for bedtools in Python
    * higlass: sibling project for viz
    * cooler:
        * annotated sparse matrices
        * w/ binned genomic coordinates
        * collection of cooler
        * hdf5 + zarr
    * Chunked algorithms, currently largely custom
* higlass
    * Multiscale viz of genomic data
    * Currently working on factoring
    * Aligning 2d genome with 1d, but making this view more configurable
    * Applications
        * Viz across omics (also gosling + vitessce)
        * linked views
        * gosling grammar for viewing
    * HiGlass manages datatypes
    * Main interest: managing data and datatypes well
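
To make the "annotated sparse matrices w/ binned genomic coordinates" point concrete, here is a minimal pure-Python sketch of that data model: a bin table plus sparse upper-triangle pixel counts. This is not cooler's API; the function names, the toy `chromsizes`, and the contact tuples are all illustrative assumptions.

```python
from collections import Counter


def make_bins(chromsizes, binsize):
    """Return fixed-width (chrom, start, end) bins in chromosome order (0-based, half-open)."""
    bins = []
    for chrom, length in chromsizes.items():
        for start in range(0, length, binsize):
            bins.append((chrom, start, min(start + binsize, length)))
    return bins


def chrom_offsets(chromsizes, binsize):
    """Map each chromosome to the global index of its first bin."""
    offsets, offset = {}, 0
    for chrom, length in chromsizes.items():
        offsets[chrom] = offset
        offset += -(-length // binsize)  # ceiling division: number of bins on this chromosome
    return offsets


def aggregate_contacts(contacts, chromsizes, binsize):
    """Aggregate (chrom1, pos1, chrom2, pos2) contacts into sparse upper-triangle pixels."""
    offsets = chrom_offsets(chromsizes, binsize)
    pixels = Counter()
    for chrom1, pos1, chrom2, pos2 in contacts:
        i = offsets[chrom1] + pos1 // binsize
        j = offsets[chrom2] + pos2 // binsize
        # Store only the upper triangle, since the contact matrix is symmetric.
        pixels[(min(i, j), max(i, j))] += 1
    return pixels


# Toy genome and contacts (made up for illustration).
chromsizes = {"chr1": 250, "chr2": 120}
bins = make_bins(chromsizes, 100)
contacts = [
    ("chr1", 10, "chr1", 220),
    ("chr1", 230, "chr1", 40),
    ("chr2", 5, "chr1", 150),
]
pixels = aggregate_contacts(contacts, chromsizes, 100)
```

Here `bins` plays the role of the annotation axis and `pixels` holds only nonzero entries keyed by `(bin1_id, bin2_id)`, which is the general shape of the layout discussed above.
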
### Discussions
* Data formats
    * client side: https://github.com/manzt/coolr
    * higlass-python v2 https://github.com/manzt/hg
    * if higlass formats were in zarr, would you need the server?
        * A: Probably not
        * tileset creation on the fly
* Machine learning formats
    * Geoff – https://www.nature.com/articles/s41592-020-0958-x
        * Hi-C with deep learning
        * genome conformation modelling
    * Currently bound by GPU mem
    * Large sequences
        * Isaac: 2-bit encoding?
* Multivec
    * https://paper.dropbox.com/doc/Multivec-Spec--BVlBnY0uhAfONcAGYPV_oiS4Ag-3IelZjzjXDo7mGy3SkGUF
    * Could we converge with anndata?
        * Observation x genome bin
        * Features not in anndata
            * chunk by chrom
        * (D) Can use MuData for that?
* genomic metadata
    * (D) Would be great to standardise!
    * Fritz, Peter, Nils?, Trevor (?)
* Genomic range (Danila, Geoff)
    * Danila use case: ATAC-seq, RangedSummarizedExperiment
    * BioFrame genomic view (different views on a genome)
    * Nezar: problem of genome being a set of non-contiguous views
    * Binned representation (bed-like vs bedgraph-like)
    * Maybe talk to dask for out of core?
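
As a rough illustration of the 2-bit encoding idea raised for large sequences, the sketch below packs four bases into each byte. The base-to-code mapping is an arbitrary assumption (it is not the UCSC `.2bit` layout), the helper names are hypothetical, and ambiguous bases such as `N` are not handled and would raise a `KeyError`.

```python
# Arbitrary 2-bit codes chosen for this sketch; real formats may order bases differently.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq):
    """Pack an ACGT string into bytes, four bases per byte, first base in the high bits."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        out[i // 4] |= CODES[base] << (6 - 2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed, length):
    """Recover the first `length` bases from bytes produced by pack_2bit."""
    return "".join(
        BASES[(packed[i // 4] >> (6 - 2 * (i % 4))) & 0b11] for i in range(length)
    )


packed = pack_2bit("GATTACA")
restored = unpack_2bit(packed, 7)
```

The sequence length has to be stored alongside the packed bytes, because the final byte may be padded with zero bits. Compared with one byte per base, this cuts sequence storage to a quarter.
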
### Follow up
* Genomic range/interval representations:
    * Metadata (genome assembly, etc.)
    * Binned genomic ranges (bedGraph) vs arbitrary intervals (bed)
    * Deep learning representations
    * Schemas for known tabular formats
* File Formats:
    * Anndata + multivec
    * Multiscale data
    * Zarr, Parquet, Kerchunk
* Vis systems:
    * Memory usage
    * hg
    * higlass tileset "protocol"
    * Single-cell and single-locus embeddings
    * Standardized tileset adaptors for multiscale
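
For the binned ranges vs arbitrary intervals item, a small sketch of one direction of the conversion may help frame the follow-up: summarizing bed-like intervals as bedGraph-like per-bin values. It assumes 0-based, half-open coordinates as in BED; the function name and toy intervals are hypothetical, and overlapping intervals are counted once each rather than merged.

```python
def intervals_to_binned(intervals, chrom, chrom_length, binsize):
    """Return (chrom, start, end, covered_bp) rows for fixed-width bins on one chromosome."""
    n_bins = -(-chrom_length // binsize)  # ceiling division
    covered = [0] * n_bins
    for c, start, end in intervals:
        if c != chrom:
            continue
        # Clip to the chromosome and skip empty intervals.
        start, end = max(start, 0), min(end, chrom_length)
        if end <= start:
            continue
        for b in range(start // binsize, (end - 1) // binsize + 1):
            lo = max(start, b * binsize)
            hi = min(end, (b + 1) * binsize)
            covered[b] += hi - lo
    return [
        (chrom, b * binsize, min((b + 1) * binsize, chrom_length), covered[b])
        for b in range(n_bins)
    ]


# Toy bed-like intervals (made up for illustration).
intervals = [("chr1", 50, 130), ("chr1", 180, 220), ("chr2", 0, 10)]
binned = intervals_to_binned(intervals, "chr1", 250, 100)
```

The binned form is dense and regular, which makes it easy to chunk and tile. The interval form keeps exact boundaries but needs overlap queries to answer per-bin questions.
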