starrynight: Optical Pooled Screening Workflow Guide
This guide provides documentation for the starrynight image processing system for optical pooled screening.
The system consists of 10 modules; this document currently covers only the illumination correction module.
One-Time Setup Steps
These steps are performed once regardless of which module you're using.
Creating a Test Dataset
First, create a test fixture with a single plate of Cell Painting images:
- Copy sample files from S3. A sampling of the dataset was downloaded first, and to keep data sizes manageable for this tutorial, the original files were then compressed with a lossy codec, resulting in roughly 50x compression.
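The actual download and compression commands are not reproduced here. As a rough, hypothetical sketch of the compression idea only, coarse quantization of a 16-bit image illustrates the size/fidelity trade-off; a real pipeline would use a proper image codec, and the array size and level count below are made up:

```python
import numpy as np

# Hypothetical stand-in for a microscopy tile: a random 16-bit image.
rng = np.random.default_rng(0)
img = rng.integers(0, 2**16, size=(256, 256), dtype=np.uint16)

# Lossy step: keep only 64 intensity levels instead of 65536.
levels = 64
step = 2**16 // levels
lossy = (img // step) * step  # quantize down, then scale back up

# The error introduced is bounded by one quantization step.
assert np.abs(img.astype(np.int64) - lossy.astype(np.int64)).max() < step
```

A codec such as JPEG-XL or a TIFF with lossy compression achieves a much better ratio for the same visual fidelity; the point here is only that the compressed fixtures are approximations of the originals.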
- Organize them in the following structure:
scratch/starrynight_example/Source1/Batch1/images/
└── Plate1
├── 20X_CP_Plate1_20240319_122800_179
│ ├── WellA1_PointA1_0000_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq0000.compressed.tiff
│ ├── WellA1_PointA1_0001_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq0001.compressed.tiff
│ ├── WellA2_PointA2_0000_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq1025.compressed.tiff
│ ├── WellA2_PointA2_0001_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq1026.compressed.tiff
│ ├── WellB1_PointB1_0000_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq3075.compressed.tiff
│ └── WellB1_PointB1_0001_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq3076.compressed.tiff
├── 20X_c10_SBS-10
│ ├── WellA1_PointA1_0000_ChannelC,A,T,G,DAPI_Seq0000.compressed.tiff
│ ├── WellA1_PointA1_0001_ChannelC,A,T,G,DAPI_Seq0001.compressed.tiff
│ ├── WellA2_PointA2_0000_ChannelC,A,T,G,DAPI_Seq1025.compressed.tiff
│ ├── WellA2_PointA2_0001_ChannelC,A,T,G,DAPI_Seq1026.compressed.tiff
│ ├── WellB1_PointB1_0000_ChannelC,A,T,G,DAPI_Seq3075.compressed.tiff
│ └── WellB1_PointB1_0001_ChannelC,A,T,G,DAPI_Seq3076.compressed.tiff
...
└── 20X_c9_SBS-9
├── WellA1_PointA1_0000_ChannelC,A,T,G,DAPI_Seq0000.compressed.tiff
├── WellA1_PointA1_0001_ChannelC,A,T,G,DAPI_Seq0001.compressed.tiff
├── WellA2_PointA2_0000_ChannelC,A,T,G,DAPI_Seq1025.compressed.tiff
├── WellA2_PointA2_0001_ChannelC,A,T,G,DAPI_Seq1026.compressed.tiff
├── WellB1_PointB1_0000_ChannelC,A,T,G,DAPI_Seq3075.compressed.tiff
└── WellB1_PointB1_0001_ChannelC,A,T,G,DAPI_Seq3076.compressed.tiff
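The directory tree above encodes per-image metadata directly in the filenames. A small stdlib-only sketch of parsing that convention (the pattern is inferred from the filenames shown, not taken from the starrynight parser):

```python
import re

# Inferred convention:
# Well<well>_Point<point>_<site>_Channel<ch1,ch2,...>_Seq<seq>.compressed.tiff
PATTERN = re.compile(
    r"Well(?P<well>[A-Z]\d+)_Point(?P<point>[A-Z]\d+)_(?P<site>\d{4})"
    r"_Channel(?P<channels>[^_]+)_Seq(?P<seq>\d+)"
)

name = "WellA1_PointA1_0000_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq0000.compressed.tiff"
meta = PATTERN.search(name).groupdict()
meta["channels"] = meta["channels"].split(",")  # e.g. ['PhalloAF750', 'ZO1-AF488', 'DAPI']
```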
Clone the Repository
Set Up the Environment
Generate the Inventory
Create a listing of all image files in the dataset:
Expected Output:
Note: The main output is inventory.parquet, which is a consolidated file. The other files in the inv directory are shards used during processing.
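Conceptually, the consolidated file is just the concatenation of the shards. A toy sketch with in-memory pandas frames (the column names are illustrative, not the actual inventory schema):

```python
import pandas as pd

# Each shard holds a slice of the inventory (hypothetical columns shown).
shard1 = pd.DataFrame({"key": ["a", "b"], "path": ["/data/a.tiff", "/data/b.tiff"]})
shard2 = pd.DataFrame({"key": ["c"], "path": ["/data/c.tiff"]})

# The consolidated inventory is the row-wise concatenation of all shards.
inventory = pd.concat([shard1, shard2], ignore_index=True)
```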
Generate the Index
Parse the inventory to create a structured index with metadata for each image:
Expected Output:
The index.parquet file contains structured metadata for each image, for example:
Note: When running on macOS, you may see error messages about .DS_Store files. These are hidden files created by macOS Finder. The parser cannot handle filenames starting with a dot, but these errors can be safely ignored; the indexing process will continue and create a complete index of your actual data files. If desired, you can remove these files with find /path/to/data -name ".DS_Store" -delete, but it's not necessary, as they don't affect the final result.
Module 1: Illumination Correction
This module generates illumination correction files to account for uneven illumination across the field of view in Cell Painting images.
Step 1: Generate CellProfiler LoadData Files for Illumination Correction
Create CSV files that CellProfiler will use to load and process images:
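For reference, CellProfiler's LoadData module expects paired FileName_&lt;image&gt; / PathName_&lt;image&gt; columns plus optional Metadata_* columns. A minimal sketch of such a CSV, with illustrative channel and path names rather than the exact columns starrynight emits:

```python
import csv
import io

# One row per image site; column names follow LoadData's convention, but
# the specific image name "OrigDAPI" and the paths are illustrative.
rows = [
    {
        "Metadata_Well": "A1",
        "Metadata_Site": "0000",
        "FileName_OrigDAPI": "WellA1_PointA1_0000_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq0000.compressed.tiff",
        "PathName_OrigDAPI": "/scratch/starrynight_example/Source1/Batch1/images/Plate1/20X_CP_Plate1_20240319_122800_179",
    },
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
load_data_csv = buf.getvalue()
```

Note that the filename itself contains commas (the channel list), so the csv module's quoting is doing real work here.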
Expected Output:
Step 2: Generate CellProfiler Pipelines for Illumination Correction
Create CellProfiler pipeline files (.cppipe) that define the illumination correction workflow:
Expected Output:
Step 3: Execute CellProfiler Pipeline for Illumination Correction
Run the CellProfiler pipelines to generate illumination correction files:
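The starrynight command itself is not shown here; under the hood, a headless CellProfiler run over a generated pipeline and LoadData CSV looks roughly like this (file paths are placeholders):

```shell
# Headless CellProfiler run (flags per the CellProfiler 4.x CLI):
#   -c  run without a GUI      -r  run the pipeline on startup
#   -p  pipeline file          -o  output directory
#   --data-file  LoadData CSV to use
cellprofiler -c -r \
  -p illum_calc.cppipe \
  --data-file load_data.csv \
  -o scratch/illum_output
```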
Expected Output:
Visualizing Results
To visualize an illumination correction file, you can use a short Python snippet.
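For example, the following sketch plots a correction surface with matplotlib. It synthesizes a smooth surface so it runs standalone; in practice you would instead load the actual output (e.g., np.load on a .npy correction file, if that is the format your run produced):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

# Illustrative: a smooth surface that is brighter in the center and dimmer
# at the edges, the typical shape of an illumination correction function.
# Replace this with e.g. illum = np.load("<path to illum file>.npy").
yy, xx = np.mgrid[-1:1:256j, -1:1:256j]
illum = 1.0 - 0.4 * (xx**2 + yy**2)

fig, ax = plt.subplots()
im = ax.imshow(illum, cmap="viridis")
ax.set_title("Illumination correction function")
fig.colorbar(im, ax=ax, label="relative illumination")
fig.savefig("illum_function.png", dpi=150)
```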
CP Presegcheck
- Create LoadData files
- Create the CellProfiler pipeline
- Invoke the CellProfiler pipeline
CP Segcheck
- Create LoadData files
- Create the CellProfiler pipeline
- Invoke the CellProfiler pipeline
Align
- Create LoadData files
- Create the CellProfiler pipeline
- Invoke the CellProfiler pipeline
Preprocess
- Create LoadData files
- Create the CellProfiler pipeline
- Invoke the CellProfiler pipeline
Analysis
- Create LoadData files
- Create the CellProfiler pipeline
- Invoke the CellProfiler pipeline
Illum Calc SBS
- Generate LoadData files
- Generate the CellProfiler pipeline
Pipeline Development Issues
Test Fixture Development
- Create Test Fixture and Comparison Framework
  - Set up directory structure matching PCP output format
  - Create soft links to a relevant subset of existing PCP outputs
  - Develop scripts to compare pipeline outputs head-to-head
  - Define metrics for successful output matching
  - Implement validation checks for key output characteristics
  - Document the test fixture structure and usage
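A head-to-head comparison script could reduce to a tolerance check per output image. A hypothetical sketch of such a metric (the function name and tolerances are illustrative, not part of the existing framework):

```python
import numpy as np

def outputs_match(new: np.ndarray, reference: np.ndarray,
                  rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Shapes must agree exactly; pixel values must agree within tolerance."""
    if new.shape != reference.shape:
        return False
    return bool(np.allclose(new, reference, rtol=rtol, atol=atol))

# Example: an identical copy matches; a uniformly shifted copy does not.
ref = np.linspace(0.0, 1.0, 16).reshape(4, 4)
assert outputs_match(ref.copy(), ref)
assert not outputs_match(ref + 0.01, ref)
```

Per-module metrics (e.g., object counts for segmentation steps) would layer on top of this pixel-level check.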
First Pass Implementation
- Implement Combined "Align" and "Illumination Correct" Steps
  - Develop combined module maintaining exact functionality
  - Ensure code structure follows PCP implementation
- Implement Combined "Preseg Check" and "Seg Check" Steps
  - Develop combined module maintaining exact functionality
  - Ensure code structure follows PCP implementation
- Output Matching Validation
  - Run pipeline with current StarryNight structure but modified modules
  - Compare outputs against test fixture (PCP outputs)
  - Document discrepancies and solutions
Second Pass Implementation
- Implement Structure Changes for Stitch and Crop Integration
  - Update upstream data structures to match PCP implementation
  - Prepare interfaces for Stitch and Crop module integration
- Integrate Stitch and Crop Module
  - Import exact code from PCP implementation
  - Implement necessary adapters for upstream/downstream compatibility
  - Test integration with the pipeline
- Full Pipeline Validation with Updated Structure
  - Run end-to-end tests with the new structure
  - Validate against test fixture
  - Document any remaining discrepancies
QC Implementation
- Implement QC Metrics and Visualization for Merck Presentation
  - Review PCP Python notebooks for AMD_screening/20231011_batch_1
  - Implement modules to generate key QC metrics and visualizations
  - Determine appropriate dataset size for meaningful metrics
  - Create presentation-ready outputs matching reference notebooks
  - Document interpretation guidelines for stakeholders
Future Improvements (Post-Current Implementation)
- Standardize StarryNight Algorithm Methods
  - Analyze current implementation patterns
  - Define standard interface for algorithm modules
  - Refactor existing code to follow standards
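One possible shape for such a standard interface, sketched as a Python Protocol; the method names are illustrative, not an existing starrynight API:

```python
from typing import Protocol, runtime_checkable

import pandas as pd

@runtime_checkable
class AlgorithmModule(Protocol):
    """Hypothetical standard interface for a StarryNight algorithm module."""

    def gen_load_data(self, index: pd.DataFrame) -> pd.DataFrame:
        """Produce the LoadData table CellProfiler will consume."""
        ...

    def gen_pipeline(self) -> str:
        """Return the CellProfiler pipeline (.cppipe) text."""
        ...

class IllumCalcModule:
    """Toy conforming implementation, just to show the contract."""

    def gen_load_data(self, index: pd.DataFrame) -> pd.DataFrame:
        return index[[]].assign(FileName_OrigDAPI="placeholder.tiff")

    def gen_pipeline(self) -> str:
        return "CellProfiler Pipeline: http://www.cellprofiler.org\n"

assert isinstance(IllumCalcModule(), AlgorithmModule)
```

A shared contract like this would let the CLI and orchestration code treat every module's LoadData and pipeline generation uniformly.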
- Create Extension Documentation
  - Document how to extend the system with new algorithms
  - Create templates for new algorithm modules
  - Define testing requirements for extensions
- Implement Algorithm Unit Tests
  - Develop test cases for each algorithm component
  - Implement automated testing framework
  - Ensure minimum test coverage thresholds