# Local `sleap-roots` Image Processing Pipeline
---
# Container Flow Chart
```mermaid
graph LR
images-downloader --> sleap-roots-predict
models-downloader --> sleap-roots-predict
sleap-roots-predict --> predictions-uploader
sleap-roots-predict --> sleap-roots-traits
sleap-roots-traits --> traits-uploader
```
---
# Test data can be found [here, updated 2025-04-01](https://salkinstitute.box.com/s/k4zjefjbt88mbflbwlofz89ws2reko1x).
---
# images-downloader
**Input**: Downloaded from Bloom
- ~~`query_params.json` from user~~
- query will be made with Bloom CLI instead
```
{
experiment: str,
species_name: str,
min_age: int,
max_age: int,
min_wave: int = 0,
max_wave: int = inf,
scan_size: int = 72,
max_scans: int = inf,
scans_per_batch: int = ??,
min_date: str = "2000-01-01",
max_date: str = "3000-01-01",
accessions: List[str] = None,
treatments: List[str] = None,
scanner_id: List[str] = None,
scan_id: List[str] = None,
dry_run: int = 0,
}
```
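The schema above mixes required fields with defaults. A small sketch (assuming the field names above; the example values and the helper name `build_query_params` are hypothetical) of merging user input with those defaults:

```python
import json
import math

# Defaults taken from the schema above; "scans_per_batch" is left out
# because the document does not specify its default.
DEFAULTS = {
    "min_wave": 0,
    "max_wave": math.inf,
    "scan_size": 72,
    "max_scans": math.inf,
    "min_date": "2000-01-01",
    "max_date": "3000-01-01",
    "accessions": None,
    "treatments": None,
    "scanner_id": None,
    "scan_id": None,
    "dry_run": 0,
}

def build_query_params(experiment, species_name, min_age, max_age, **overrides):
    """Merge the required fields and any overrides with the schema defaults."""
    params = {
        "experiment": experiment,
        "species_name": species_name,
        "min_age": min_age,
        "max_age": max_age,
        **DEFAULTS,
    }
    params.update(overrides)
    return params

params = build_query_params("exp-001", "canola", 5, 7, dry_run=1)
# Drop unset/unbounded fields before serializing to query_params.json.
print(json.dumps({k: v for k, v in params.items()
                  if v is not None and v != math.inf}, indent=2))
```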
**Output**:
- `scans.csv` from Bloom with all Bloom metadata per sample
- images named by `scan_id` and `frame_index` that meet query in Bloom
# models-downloader
https://gitlab.com/salk-tm/models-downloader
```
registry.gitlab.com/salk-tm/models-downloader:latest
```
**Input**:
- Zipped [current models, 2024-06-11](https://salkinstitute.box.com/s/sq0yhwvg2av8t6f67q0xua2jupngmnua) downloaded from Bloom
- `model_chooser_table` logic for choosing the correct models given the user parameters
- [Current model chooser table, 2024-06-11](https://salkinstitute.box.com/s/qatqpzvw41103g0lpwcbluyoxp991kne)
- `model_params.json` from user for choosing models with `species`, `mode`, `age` specified for default models OR specific `model_id` for a given root type
```json
{
"species": "canola",
"mode": "cylinder",
"age": "7",
"lateral_model_id": null,
"primary_model_id": null,
"crown_model_id": null
}
```
**Output**:
- Chosen zipped current models for data
- `models.csv`: `model_id`, `model_path`, `model_type`
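The chooser logic can be sketched roughly as: an explicit `*_model_id` in `model_params.json` wins; otherwise fall back to the default row in the model chooser table matching (`species`, `mode`, `age`). The table column names and example IDs below are assumptions for illustration, not the actual table schema:

```python
# Hypothetical sketch of the model-chooser logic.
def choose_model(chooser_table, model_params, model_type):
    """Return an explicit model_id if the user set one, else the table default."""
    explicit = model_params.get(f"{model_type}_model_id")
    if explicit is not None:
        return explicit
    for row in chooser_table:
        if (row["model_type"] == model_type
                and row["species"] == model_params["species"]
                and row["mode"] == model_params["mode"]
                and row["age"] == model_params["age"]):
            return row["model_id"]
    return None  # no default model for this root type

table = [
    {"model_type": "primary", "species": "canola", "mode": "cylinder",
     "age": "7", "model_id": "m-primary-001"},
]
params = {"species": "canola", "mode": "cylinder", "age": "7",
          "lateral_model_id": None, "primary_model_id": None,
          "crown_model_id": "m-crown-override"}
```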
**TODO**:
- implement getting `model_params` from `scans.csv` to pass (all) relevant models to `sleap-roots-predict` using metadata from Bloom
- we should have `min_age` and `max_age` in the params instead of one age
# sleap-roots-predict
https://gitlab.com/salk-tm/sleap-roots-predict
```
registry.gitlab.com/salk-tm/sleap-roots-predict:latest
```
The main function of this container expects three arguments, since it takes in both models and images.
**Images Input**:
- `scans.csv`: all Bloom metadata to organize images by scan
- Images organized by Bloom
- `dataset_params.json`: parameters used to filter `scans.csv` to choose samples you want predictions for
- similar to `query_params.json`
```
{
experiment: str,
species_name: str,
min_age: int,
max_age: int,
min_wave: int = 0,
max_wave: int = inf,
scan_size: int = 72,
max_scans: int = inf,
scans_per_batch: int = ??,
min_date: str = "2000-01-01",
max_date: str = "3000-01-01",
accessions: List[str] = None,
treatments: List[str] = None,
scanner_id: List[str] = None,
dry_run: int = 0,
}
```
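Filtering `scans.csv` against `dataset_params` can be sketched as a row predicate. The column names here (`age`, `wave`, `scan_date`, ...) are assumptions about the Bloom metadata schema, not confirmed:

```python
# Hypothetical sketch: keep only scans.csv rows matching dataset_params.
def row_matches(row, p):
    return (row["experiment"] == p["experiment"]
            and row["species_name"] == p["species_name"]
            and p["min_age"] <= int(row["age"]) <= p["max_age"]
            and p.get("min_wave", 0) <= int(row["wave"]) <= p.get("max_wave", float("inf"))
            # ISO date strings compare correctly as plain strings
            and p.get("min_date", "2000-01-01") <= row["scan_date"] <= p.get("max_date", "3000-01-01"))

def filter_scans(rows, params):
    return [r for r in rows if row_matches(r, params)]

scans = [
    {"experiment": "exp-001", "species_name": "canola", "age": "6",
     "wave": "1", "scan_date": "2024-05-01", "scan_id": "s1"},
    {"experiment": "exp-001", "species_name": "canola", "age": "12",
     "wave": "1", "scan_date": "2024-05-01", "scan_id": "s2"},
]
kept = filter_scans(scans, {"experiment": "exp-001", "species_name": "canola",
                            "min_age": 5, "max_age": 7})
```

In the container, `rows` would come from reading `scans.csv` with `csv.DictReader`.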
**Models Input**:
- Zipped models and `models.csv`
**Output**:
- Prediction files for each scan
- name of each prediction file has this format: `scan_{scan_id}.model_{model_id}.root_{model_type}.slp`
- `predictions.csv`: `scans.csv` merged with `primary`, `lateral`, and `crown` prediction paths
**TODO**:
- update `dataset_params.json` to include all arguments listed above
- implement getting `dataset_params` from `scans.csv` to filter samples using metadata from Bloom and to get predictions on a per-sample basis
- ~~Fix path relative to images folder output from Bloom~~
- ~~Optimize predict function:~~
- Right now it iterates over scan paths, makes a video for each scan, then loads the models and predicts on that scan. Loading the models for every scan is inefficient, though more adaptable than loading them once up front. --> Group scans by the models they need first.
- the `Dockerfile` should use a different base image depending on the host architecture
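The prediction-file naming scheme described above can be sketched as a pair of helpers (the example scan and model IDs are hypothetical):

```python
import re

# Build a prediction filename in the documented format:
# scan_{scan_id}.model_{model_id}.root_{model_type}.slp
def prediction_filename(scan_id, model_id, model_type):
    return f"scan_{scan_id}.model_{model_id}.root_{model_type}.slp"

_NAME_RE = re.compile(
    r"^scan_(?P<scan_id>.+)\.model_(?P<model_id>.+)\.root_(?P<model_type>.+)\.slp$"
)

def parse_prediction_filename(name):
    """Recover scan_id, model_id, and model_type from a prediction filename."""
    m = _NAME_RE.match(name)
    return m.groupdict() if m else None

name = prediction_filename("0042", "m-primary-001", "primary")
```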
# sleap-roots-traits
https://gitlab.com/salk-tm/sleap-roots-traits
The **run_pipeline.sh** script running the containers in the proper sequence is in this repo.
```
registry.gitlab.com/salk-tm/sleap-roots-traits:latest
```
**Input**:
- `predictions.csv`: all scan metadata from Bloom and prediction info
- `pipeline_params.json` to choose `pipeline_class`
```json
{
"species": "canola",
"mode": "cylinder",
"age": "7",
"pipeline_class": Null
}
```
**Output**:
- CSV files of summarized root traits per `Series`
- JSON files containing calculated traits per image and root
- metadata:
  - `sleap-roots` version
  - `pipeline` class
**TODO**:
- implement getting `pipeline_params` from `scans.csv` to choose relevant pipeline on a per sample basis using metadata from Bloom
- output CSVs should be in long format
- JSON root traits in `sleap-roots`
- age should be `min_age` and `max_age` instead of one age
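The long-format TODO amounts to melting a wide per-series traits table (one column per trait) into (`series`, `trait`, `value`) rows. A minimal sketch with illustrative column names, not the actual `sleap-roots` output schema:

```python
# Melt a wide traits table into long format: one row per (series, trait).
def wide_to_long(rows, id_col="series"):
    long_rows = []
    for row in rows:
        for trait, value in row.items():
            if trait == id_col:
                continue
            long_rows.append({id_col: row[id_col], "trait": trait, "value": value})
    return long_rows

wide = [{"series": "s1", "root_count": 3, "total_length": 41.2}]
```

The same reshaping could be done with `pandas.melt` if pandas is already a dependency.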
# predictions-uploader
**Input**:
- `.slp` prediction files
- metadata:
  - `model_id` for each prediction (type)
**Output**: Uploaded to Bloom
- `.slp` prediction files
- JSON-formatted predictions
  - minimal metadata for plotting predictions in Bloom
- metadata:
  - `model_id` for each prediction (type)
# traits-uploader
**Input**:
- CSV and/or JSON files containing calculated traits for the images and series
- metadata:
  - `sleap-roots` version
  - `pipeline` class
**Output**: Uploaded to Bloom
- CSV and/or JSON files containing calculated traits for the images and series
- metadata:
  - `sleap-roots` version
  - `pipeline` class