# [Greenline] Data-related project for the Climate Platform
<!-- Add the tag for the current cycle number in the top bar -->
- Shaped by: Nikki & Christos
- Appetite (FTEs, weeks): at least 40% of our time to investigate various tools/platforms
- Developers: Nikki & Christos
## Problem
<!-- The raw idea, a use case, or something we’ve seen that motivates us to work on this -->
Mainly, investigate the following tools/platforms, as discussed with Mauro at the QPM:
- [ ] [EERIE Experiment collection for data at DKRZ](https://discover.dkrz.de/external/stac2.cloud.dkrz.de/fastapi/collections/eerie?.language=en): This assumes Zarr as the data format and an object-store filesystem, but they also use software layers to interface with NetCDF and Lustre filesystems.
- [ ] [hiopy](https://gitlab.gwdg.de/ican/hiopy): We are also investigating hiopy to handle the I/O for both the Fortran icon-exclaim runs and for the Green Line. hiopy natively targets Zarr. Coordinate with Will & Andreas.
- [ ] Storing data at CSCS and sharing it with the outside world: decide which filesystem to use
- [ ] Understand storage requirements
- [ ] Investigate the available couplers/orchestrators, i.e. [YAC](https://dkrz-sw.gitlab-pages.dkrz.de/yac/index.html), [Sirocco](https://github.com/C2SM/Sirocco), [AiiDA](https://aiida.readthedocs.io/projects/aiida-core/en/latest/topics/cli.html), and how they work with hiopy.
- [ ] Find as much data as possible by asking the scientists.
- [ ] Hand over data to CSCS people so they can understand the metadata and how to make use of the Marmot Graph
- [ ] Identify what search parameters should be used (e.g. specific fields): Marmot Graph
- [ ] Once the metadata setup is done, clarify where the data comes from and whether all of it needs to be on CSCS machines or can sit elsewhere. Also clarify whether the data stored there needs to be compressed or uncompressed.
- [ ] For compressed data, the names of compressors, filters, and serializers can be used as search parameters.
- [ ] See how EERIE and hiopy fit into the picture
- [ ] See whether the dc_toolkit setup is suitable for constructing the Marmot Graph optimally, or whether we need to compress/output the data in some other way
- [ ] Clarify how and where decompression happens.
- [ ] A web interface was mentioned as the gateway for scientists to access data. Investigate further how the EBRAINS frontend fits into the picture and what else is needed.
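The compressed-data items above hinge on compressor/filter names being discoverable from store metadata. As a stdlib-only sketch: Zarr v2 writes a `.zarray` JSON document per array whose `compressor` and `filters` entries carry exactly those names. The metadata values below are illustrative, and `search_params` is a hypothetical helper, not part of any existing tool:

```python
import json

# Illustrative .zarray metadata in the shape Zarr v2 writes it
# (values are made up; a real store would be read from disk/object store).
zarray = json.loads("""
{
  "chunks": [10, 10],
  "compressor": {"id": "blosc", "cname": "zstd", "clevel": 5, "shuffle": 1},
  "dtype": "<f4",
  "fill_value": 0.0,
  "filters": [{"id": "delta", "dtype": "<f4"}],
  "order": "C",
  "shape": [100, 100],
  "zarr_format": 2
}
""")

def search_params(meta):
    """Extract compressor/filter names usable as catalogue search parameters."""
    return {
        "compressor": (meta.get("compressor") or {}).get("id"),
        "filters": [f["id"] for f in meta.get("filters") or []],
    }

print(search_params(zarray))  # {'compressor': 'blosc', 'filters': ['delta']}
```

Indexing these fields in the Marmot Graph would let scientists filter datasets by codec without opening the arrays themselves.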
## Appetite
Full cycle.
## Solution
<!-- The core elements we came up with, presented in a form that’s easy for people to immediately understand -->
- [ ] All Marmot Graph related work: support, etc. [Nikki & Christos]
- [ ] Investigate hiopy [Nikki & Christos]
- [ ] Case study: Compress a small subset of the data and create a presentation [Nikki & Christos but on different subsets]
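The compression case study above could start from a minimal stdlib-only sketch like the following. `zlib` is only a placeholder for whatever codec the real pipeline chooses (e.g. Blosc/zstd via hiopy/Zarr), and the data is synthetic, not model output:

```python
import struct
import zlib

# Synthetic stand-in for a small subset of model output: 10k float32 values
# with a repeating pattern, packed to raw little-endian bytes.
values = [float(i % 16) for i in range(10_000)]
raw = struct.pack(f"<{len(values)}f", *values)

# zlib here is a stdlib placeholder for the production codec.
compressed = zlib.compress(raw, level=6)
print(f"raw: {len(raw)} B, compressed: {len(compressed)} B")
```

Running the same round-trip per candidate codec on real subsets would give the compression ratios and timings to compare in the presentation.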
## Rabbit holes
<!-- Details about the solution worth calling out to avoid problems -->
## No-gos
<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the ## appetite or make the problem tractable -->
## Progress
<!-- Don't fill during shaping. This area is for collecting TODOs during building. As first task during building add a preliminary list of coarse-grained tasks for the project and refine them with finer-grained items when it makes sense as you work on them. -->
List opened and closed PRs here:
- [ ] Task 1 ([PR#xxxx](https://github.com/icon-exclaim/icon4py/pulls))