# Multiple tables discussion 2023-09-18

Slides: https://docs.google.com/presentation/d/1tI39gCZB8Bw9EGmzk1nYB41Wgji8rpd18UgD89TXkzc/edit#slide=id.g28049423d26_1_83

## Notes

* Overview of current structure
    * spatialdata_attrs
    * region is a convenience; it could be derived from the unique values of the region_key column
* Overview of new proposal
    * one table for each shape name
    * how to enforce unique indices?

### Examples

* Example 1
* Example 2
    * confusing for users: the indices of the deconvoluted data in obsm are different
    * lots of extra layers
    * you can use pandas in obsm, not just obs
    * alternative solution
        * massive MuData object
        * modalities with deconvoluted cell types and aggregated gene expression
* Example 3
    * current implementation
        * subset like in scanpy
        * works for one modality, but complex for multiple modalities
    * new design
        * more code, but still transparent
        * better support for multiple modalities

### issue of data saving, not directly related to multiple tables

multiple tables would also solve issues with editing the table of an existing sdata object
- link to https://fsspec.github.io/kerchunk/index.html which also solves some parts of this

### comparison list

### alternative proposal

- new proposal: SpatialData constructor, keys do not need to match

## Discussion

SpatialData with elements that each have one AnnData or one MuData object
- pro: local objects with a local scope, editing them is easy

SpatialData with just one big MuData object
- downside: editing the big table could have global consequences

Splitting is much easier than combining

Multiple tables are handier than analyzing all data at once in one big table, but having both is nice

Combining SpatialData objects is possible, so use the SpatialData constructor to create virtual sets of elements

Separate tables, load them in the SpatialData constructor

AnnData supports lazy loading from Zarr

Indices that are strings rather than integers are supported in AnnData
- maybe also UUID in the future?

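Two points from the notes, deriving `region` from the unique values of the `region_key` column and splitting one table into one table per element, can be sketched with plain pandas. This is only an illustration under assumptions: the element names (`cells`, `nuclei`), the column names, and the helper `has_unique_instances` are hypothetical and are not part of the SpatialData API.

```python
import pandas as pd

# Hypothetical annotation table: each row annotates one instance of one element.
obs = pd.DataFrame(
    {
        "region": ["cells", "cells", "nuclei", "nuclei", "nuclei"],  # region_key column
        "instance_id": [1, 2, 1, 2, 3],  # instance_key column
    },
    index=["c1", "c2", "n1", "n2", "n3"],
)

region_key = "region"
instance_key = "instance_id"

# `region` does not need to be stored separately: it can be derived
# from the unique values of the region_key column.
regions = sorted(obs[region_key].unique())

# New proposal (sketch): one table per element name.
tables = {name: sub.copy() for name, sub in obs.groupby(region_key)}


def has_unique_instances(table: pd.DataFrame, key: str) -> bool:
    """Hypothetical check: instance ids must be unique within one table."""
    return bool(table[key].is_unique)


# Instance ids collide across elements in the single table,
# but are unique inside each per-element table.
checks = {name: has_unique_instances(t, instance_key) for name, t in tables.items()}
```

With one table per element, uniqueness only has to hold within each table, which is one possible answer to the "how to enforce unique indices?" question above.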
## Feedback on design

Option 1: write better APIs for the current approach

Option 2: labels, shapes and points can each have one table (AnnData or MuData) + one global table (AnnData or MuData); two label elements cannot share the same single table

Option 3 (Mark): drops instance_key and region_key, would break IO, introduces region again; coolest option; would support one table for both label and shape?

- Wouter: Option 2 sounds good, but Option 3 is better
- Isaac: Option 1 is bad; the difference between Option 2 and Option 3 is unclear; Option 2 seems good
- Arne: Option 2 is fine
- Benjamin: Option 3 is still unclear
- Frank: Option 2, data duplication is bad
- Lotte: Option 2, but with good support and documentation
- Luca: Option 2 is good to start with
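
The constraint in Option 2, that each element has at most one table and that no two elements share the same table, could be checked roughly as follows. This is a minimal sketch in plain Python: the function `validate_table_assignment`, the mapping layout, and the element and table names are assumptions made for illustration, not an existing SpatialData interface.

```python
from collections import Counter
from typing import Dict, Optional


def validate_table_assignment(element_to_table: Dict[str, Optional[str]]) -> bool:
    """Hypothetical check for Option 2.

    `element_to_table` maps an element name to the name of its table,
    or to None if the element has no table of its own.
    Raises ValueError if any table is assigned to more than one element.
    """
    counts = Counter(t for t in element_to_table.values() if t is not None)
    shared = sorted(t for t, n in counts.items() if n > 1)
    if shared:
        raise ValueError(f"tables assigned to more than one element: {shared}")
    return True


# Valid: every element has its own table or none.
ok = validate_table_assignment(
    {"labels_a": "table_a", "labels_b": "table_b", "points": None}
)

# Invalid: two label elements point at the same table.
try:
    validate_table_assignment({"labels_a": "table_a", "labels_b": "table_a"})
    rejected = False
except ValueError:
    rejected = True
```

The global table from Option 2 is left out of the mapping here, since it is not tied to a single element.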