# Local `sleap-roots` Image Processing Pipeline

---

# Container Flow Chart

```mermaid
graph LR
    images-downloader --> sleap-roots-predict
    models-downloader --> sleap-roots-predict
    sleap-roots-predict --> predictions-uploader
    sleap-roots-predict --> sleap-roots-traits
    sleap-roots-traits --> traits-uploader
```

---

# Test Data

Test data can be found [here, updated 2025-04-01](https://salkinstitute.box.com/s/k4zjefjbt88mbflbwlofz89ws2reko1x).

---

# images-downloader

**Input**: Downloaded from Bloom

- ~~`query_params.json` from user~~
  - the query will be made with the Bloom CLI instead

```json
{
    experiment: str,
    species_name: str,
    min_age: int,
    max_age: int,
    min_wave: int = 0,
    max_wave: int = inf,
    scan_size: int = 72,
    max_scans: int = inf,
    scans_per_batch: int = ??,
    min_date: str = "2000-01-01",
    max_date: str = "3000-01-01",
    accessions: List[str] = None,
    treatments: List[str] = None,
    scanner_id: List[str] = None,
    scan_id: List[str] = None,
    dry_run: int = 0,
}
```

**Output**:

- `scans.csv` from Bloom with all Bloom metadata per sample
- images named by `scan_id` and `frame_index` that match the query in Bloom

# models-downloader

https://gitlab.com/salk-tm/models-downloader

```
registry.gitlab.com/salk-tm/models-downloader:latest
```

**Input**:

- Zipped [current models, 2024-06-11](https://salkinstitute.box.com/s/sq0yhwvg2av8t6f67q0xua2jupngmnua) downloaded from Bloom
- `model_chooser_table`: logic for choosing the correct models given the user parameters
  - [Current model chooser table, 2024-06-11](https://salkinstitute.box.com/s/qatqpzvw41103g0lpwcbluyoxp991kne)
- `model_params.json` from the user for choosing models, with `species`, `mode`, and `age` specified for default models OR a specific `model_id` for a given root type

```json
{
    "species": "canola",
    "mode": "cylinder",
    "age": "7",
    "lateral_model_id": null,
    "primary_model_id": null,
    "crown_model_id": null
}
```

**Output**:

- Chosen zipped current models for the data
- `models.csv`: `model_id`, `model_path`, `model_type`

**TODO**:

- implement getting
`model_params` from `scans.csv` to pass (all) relevant models to `sleap-roots-predict` using metadata from Bloom
- we should have `min_age` and `max_age` in the params instead of a single age

# sleap-roots-predict

https://gitlab.com/salk-tm/sleap-roots-predict

```
registry.gitlab.com/salk-tm/sleap-roots-predict:latest
```

The main function of this container expects three arguments, since it takes in both models and images.

**Images Input**:

- `scans.csv`: all Bloom metadata to organize images by scan
- images organized by Bloom
- `dataset_params.json`: parameters used to filter `scans.csv` to choose the samples you want predictions for
  - similar to `query_params.json`

```
{
    experiment: str,
    species_name: str,
    min_age: int,
    max_age: int,
    min_wave: int = 0,
    max_wave: int = inf,
    scan_size: int = 72,
    max_scans: int = inf,
    scans_per_batch: int = ??,
    min_date: str = "2000-01-01",
    max_date: str = "3000-01-01",
    accessions: List[str] = None,
    treatments: List[str] = None,
    scanner_id: List[str] = None,
    dry_run: int = 0,
}
```

**Models Input**:

- Zipped models and `models.csv`

**Output**:

- Prediction files for each scan
  - the name of each prediction file has the format `scan_{scan_id}.model_{model_id}.root_{model_type}.slp`
- `predictions.csv`: `scans.csv` merged with the `primary`, `lateral`, and `crown` prediction paths
- update `dataset_params.json` to include all arguments listed above

**TODO**:

- implement getting `dataset_params` from `scans.csv` to filter samples using metadata from Bloom and to get predictions on a per-sample basis
- ~~Fix path relative to images folder output from Bloom~~
- ~~Optimize predict function:~~
  - Right now it iterates over scan paths, makes a video for each scan, loads the models, and predicts on each scan. Loading the models for every scan is inefficient but more adaptable than loading them up front. --> Group scans by the models they need first.
- `Dockerfile` should use a different base container depending on the computer architecture

# sleap-roots-traits

https://gitlab.com/salk-tm/sleap-roots-traits

The **run_pipeline.sh** script that runs the containers in the proper sequence is in this repo.

```
registry.gitlab.com/salk-tm/sleap-roots-traits:latest
```

**Input**:

- `predictions.csv`: all scan metadata from Bloom plus prediction info
- `pipeline_params.json` to choose the `pipeline_class`

```json
{
    "species": "canola",
    "mode": "cylinder",
    "age": "7",
    "pipeline_class": null
}
```

**Output**:

- CSV files of summarized root traits per `Series`
- JSON files containing calculated traits per image and root
- metadata:
  - `sleap-roots` version
  - `pipeline` class

**TODO**:

- implement getting `pipeline_params` from `scans.csv` to choose the relevant pipeline on a per-sample basis using metadata from Bloom
- output CSVs should be in long format
- JSON root traits in `sleap-roots`
- age should be `min_age` and `max_age` instead of a single age

# predictions-uploader

**Input**:

- `.slp` prediction files
- metadata:
  - `model_id` for each prediction (type)

**Output**: Uploaded to Bloom

- `.slp` prediction files
- JSON-formatted predictions
  - minimal metadata for plotting predictions in Bloom
- metadata:
  - `model_id` for each prediction (type)

# traits-uploader

**Input**:

- CSV files and/or JSON files containing calculated traits for the images and series
- metadata:
  - `sleap-roots` version
  - `pipeline` class

**Output**: Uploaded to Bloom

- CSV files and/or JSON files containing calculated traits for the images and series
- metadata:
  - `sleap-roots` version
  - `pipeline` class
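Several containers above depend on the prediction filename convention `scan_{scan_id}.model_{model_id}.root_{model_type}.slp`: `sleap-roots-predict` writes these names, and `predictions-uploader` needs the `model_id` per prediction. A minimal sketch of building and parsing such names is below; the helper names are hypothetical and not part of any of the repos, and it assumes the three fields never contain a `.` character.

```python
import re

# Pattern for the documented convention:
#   scan_{scan_id}.model_{model_id}.root_{model_type}.slp
# Assumes none of the three fields contains a "." character.
PREDICTION_RE = re.compile(
    r"scan_(?P<scan_id>[^.]+)\.model_(?P<model_id>[^.]+)\.root_(?P<model_type>[^.]+)\.slp"
)


def prediction_filename(scan_id: str, model_id: str, model_type: str) -> str:
    """Build a prediction filename following the documented convention."""
    return f"scan_{scan_id}.model_{model_id}.root_{model_type}.slp"


def parse_prediction_filename(name: str) -> dict:
    """Recover scan_id, model_id, and model_type from a prediction filename."""
    match = PREDICTION_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected prediction filename: {name}")
    return match.groupdict()
```

Round-tripping a name, e.g. `parse_prediction_filename(prediction_filename("0001", "m42", "primary"))`, yields `{"scan_id": "0001", "model_id": "m42", "model_type": "primary"}`, which is the metadata `predictions-uploader` attaches per prediction.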