
starrynight: Optical Pooled Screening Workflow Guide

This guide documents the starrynight image processing system for optical pooled screening. The system consists of 10 modules. This document currently covers the illumination correction module in full; the later sections list commands for other modules without further explanation.

One-Time Setup Steps

These steps are performed once regardless of which module you're using.

Creating a Test Dataset

First, create a test fixture with a single plate of Cell Painting images:

  1. Copy sample files from S3:
    mkdir -p scratch
    aws s3 sync s3://imaging-platform/projects/2024_03_12_starrynight/starrynight_example scratch/starrynight_example

    # Optional: Copy reference output files to compare your results
    aws s3 sync s3://imaging-platform/projects/2024_03_12_starrynight/starrynight_example_workspace scratch/starrynight_example_workspace_reference
    

For reference, the sample files were originally drawn from a larger dataset as follows:

export S3_PATH="s3://BUCKET/projects/PROJECT/BATCH"

# Copy SBS images
parallel mkdir -p scratch/starrynight_example/Source1/Batch1/images/Plate1/20X_c{1}_SBS-{1}/ ::: 1 2 3 4 5 6 7 8 9 10

parallel --match '.*' --match '(.*) (.*) (.*)' aws s3 cp "${S3_PATH}/images/Plate1/20X_c{1}_SBS-{1}/Well{2.1}_Point{2.1}_{2.2}_ChannelC,A,T,G,DAPI_Seq{2.3}.ome.tiff" "scratch/starrynight_example/Source1/Batch1/images/Plate1/20X_c{1}_SBS-{1}/" ::: 1 2 3 4 5 6 7 8 9 10 ::: "A1 0000 0000" "A1 0001 0001" "A2 0000 1025" "A2 0001 1026" "B1 0000 3075" "B1 0001 3076"

# Copy Cell Painting images
mkdir -p scratch/starrynight_example/Source1/Batch1/images/Plate1/20X_CP_Plate1_20240319_122800_179

parallel --match '(.*) (.*) (.*)' aws s3 cp "${S3_PATH}/images/Plate1/20X_CP_Plate1_20240319_122800_179/Well{1.1}_Point{1.1}_{1.2}_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq{1.3}.ome.tiff" "scratch/starrynight_example/Source1/Batch1/images/Plate1/20X_CP_Plate1_20240319_122800_179/" ::: "A1 0000 0000" "A1 0001 0001" "A2 0000 1025" "A2 0001 1026" "B1 0000 3075" "B1 0001 3076"
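
The replacement strings in the first `parallel` command are dense, so here is a minimal Python sketch of the paths it appears to request. It assumes `{1}` is the cycle number and `{2.1}`, `{2.2}`, `{2.3}` are the three space-separated fields of each site string, which is consistent with the filenames in the directory listing in this guide. The function name `sbs_relative_paths` is illustrative only.

```python
from itertools import product

# Site strings copied from the commands above: "well point-index sequence-number".
SITES = [
    "A1 0000 0000", "A1 0001 0001", "A2 0000 1025",
    "A2 0001 1026", "B1 0000 3075", "B1 0001 3076",
]
CYCLES = range(1, 11)  # SBS cycles 1 through 10


def sbs_relative_paths(cycles=CYCLES, sites=SITES):
    """Return the relative path of every SBS image the command would request."""
    paths = []
    for cycle, site in product(cycles, sites):
        well, point, seq = site.split(" ")
        folder = f"20X_c{cycle}_SBS-{cycle}"
        name = f"Well{well}_Point{well}_{point}_ChannelC,A,T,G,DAPI_Seq{seq}.ome.tiff"
        paths.append(f"{folder}/{name}")
    return paths


if __name__ == "__main__":
    # Print the first few paths as a spot check.
    for path in sbs_relative_paths()[:3]:
        print(path)
```

Ten cycles times six sites gives 60 SBS files under this reading.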

To keep data sizes manageable for this tutorial, the original files were compressed with this command, resulting in a 50x lossy compression:

find . -type f -name "*.ome.tiff" | parallel 'magick {} -compress jpeg -quality 80 {= s/\.ome\.tiff$/.compressed.tiff/ =}' 
find . -type f -name "*.ome.tiff" -exec rm {} +

  2. Organize them in the following structure:
     scratch/starrynight_example/Source1/Batch1/images/
     └── Plate1
         ├── 20X_CP_Plate1_20240319_122800_179
         │   ├── WellA1_PointA1_0000_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq0000.compressed.tiff
         │   ├── WellA1_PointA1_0001_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq0001.compressed.tiff
         │   ├── WellA2_PointA2_0000_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq1025.compressed.tiff
         │   ├── WellA2_PointA2_0001_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq1026.compressed.tiff
         │   ├── WellB1_PointB1_0000_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq3075.compressed.tiff
         │   └── WellB1_PointB1_0001_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq3076.compressed.tiff
         ├── 20X_c10_SBS-10
         │   ├── WellA1_PointA1_0000_ChannelC,A,T,G,DAPI_Seq0000.compressed.tiff
         │   ├── WellA1_PointA1_0001_ChannelC,A,T,G,DAPI_Seq0001.compressed.tiff
         │   ├── WellA2_PointA2_0000_ChannelC,A,T,G,DAPI_Seq1025.compressed.tiff
         │   ├── WellA2_PointA2_0001_ChannelC,A,T,G,DAPI_Seq1026.compressed.tiff
         │   ├── WellB1_PointB1_0000_ChannelC,A,T,G,DAPI_Seq3075.compressed.tiff
         │   └── WellB1_PointB1_0001_ChannelC,A,T,G,DAPI_Seq3076.compressed.tiff
         ...
         └── 20X_c9_SBS-9
             ├── WellA1_PointA1_0000_ChannelC,A,T,G,DAPI_Seq0000.compressed.tiff
             ├── WellA1_PointA1_0001_ChannelC,A,T,G,DAPI_Seq0001.compressed.tiff
             ├── WellA2_PointA2_0000_ChannelC,A,T,G,DAPI_Seq1025.compressed.tiff
             ├── WellA2_PointA2_0001_ChannelC,A,T,G,DAPI_Seq1026.compressed.tiff
             ├── WellB1_PointB1_0000_ChannelC,A,T,G,DAPI_Seq3075.compressed.tiff
             └── WellB1_PointB1_0001_ChannelC,A,T,G,DAPI_Seq3076.compressed.tiff
    

Clone the Repository

git clone git@github.com:broadinstitute/starrynight.git

Set Up the Environment

cd starrynight
nix develop --extra-experimental-features nix-command --extra-experimental-features flakes
uv sync 

Generate the Inventory

Create a listing of all image files in the dataset:

starrynight inventory gen \
    -d ./scratch/starrynight_example \
    -o ./scratch/starrynight_example_workspace/inventory

Expected Output:

./scratch/starrynight_example_workspace/inventory
├── inv
│   ├── inventory_0_cpeuwixtgl.parquet
│   ├── inventory_0_klqcqzwrur.parquet
│   └── [Additional shard files...]
└── inventory.parquet

Note: The main output is inventory.parquet, which is a consolidated file. The other files in the inv directory are shards used during processing.
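
As a quick cross-check on the inventory, the sketch below counts image files per acquisition folder directly on disk. It assumes the fixture layout shown earlier, where every folder under Plate1 holds six `.tiff` files. `count_images_per_folder` is a hypothetical helper and not part of starrynight.

```python
from collections import Counter
from pathlib import Path


def count_images_per_folder(dataset_root, suffix=".tiff"):
    """Count files ending in `suffix` under dataset_root, keyed by parent folder name."""
    counts = Counter()
    for path in Path(dataset_root).rglob(f"*{suffix}"):
        if path.is_file():
            counts[path.parent.name] += 1
    return dict(counts)
```

Calling `count_images_per_folder("scratch/starrynight_example")` on the fixture should report the same number of files for each of the eleven acquisition folders; a folder with a different count points to an incomplete download.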

Generate the Index

Parse the inventory to create a structured index with metadata for each image:

starrynight index gen \
    -i ./scratch/starrynight_example_workspace/inventory/inventory.parquet \
    -o ./scratch/starrynight_example_workspace/index/

Expected Output:

./scratch/starrynight_example_workspace/index/
└── index.parquet

The index.parquet file contains structured metadata for each image, for example:

duckdb -line -c 'select * from "./scratch/starrynight_example_workspace/index/index.parquet" limit 1;'

          key = starrynight_example/Source1/Batch1/images/Plate1/20X_CP_Plate1_20240319_122800_179/WellA2_PointA2_0000_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq1025.ome.tiff
       prefix = scratch
   dataset_id = starrynight-example
     batch_id = Batch1
     plate_id = Plate1
     cycle_id = 
magnification = 20
      well_id = A2
      site_id = 1025
 channel_dict = [PhalloAF750, ZO1-AF488, DAPI]
     filename = WellA2_PointA2_0000_ChannelPhalloAF750,ZO1-AF488,DAPI_Seq1025.ome.tiff
    extension = tiff
 is_sbs_image = false
     is_image = true
       is_dir = false
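
To make the mapping from filename to index fields concrete, here is a rough sketch of a parser for the naming scheme used in this fixture. The regular expression and the function name are assumptions inferred from the example record above (for instance, that `site_id` comes from the `Seq` number). starrynight's own parser may work differently.

```python
import re

# Pattern inferred from the example filenames in this guide.
# It is NOT taken from the starrynight source code.
FILENAME_RE = re.compile(
    r"^Well(?P<well>[A-Z]+\d+)"
    r"_Point(?P<point_well>[A-Z]+\d+)_(?P<point>\d+)"
    r"_Channel(?P<channels>.+)"
    r"_Seq(?P<seq>\d+)"
    r"\.(?P<extension>[\w.]+)$"
)


def parse_image_filename(filename):
    """Return well, site, and channel list for a fixture filename, or None if it does not match."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    return {
        "well_id": match["well"],
        "site_id": int(match["seq"]),  # assumption: site_id mirrors the Seq number
        "channels": match["channels"].split(","),
    }
```

Names that do not follow the pattern, such as `.DS_Store`, return `None` in this sketch.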

Note: When running on macOS, you may see error messages about .DS_Store files. These are hidden files created by macOS Finder. The parser cannot handle filenames starting with a dot, but these errors can be safely ignored. The indexing process will continue successfully and create a complete index of your actual data files. If desired, you can remove these files with find /path/to/data -name ".DS_Store" -delete, but it's not necessary as they don't affect the final result.

Module 1: Illumination Correction

This module generates illumination correction files to account for uneven illumination across the field of view in Cell Painting images.

Step 1: Generate CellProfiler LoadData Files for Illumination Correction

Create CSV files that CellProfiler will use to load and process images:

starrynight illum calc loaddata \
    -i ./scratch/starrynight_example_workspace/index/index.parquet \
    -o ./scratch/starrynight_example_workspace/cellprofiler/loaddata/cp/illum/illum_calc

Expected Output:

./scratch/starrynight_example_workspace/cellprofiler/loaddata/cp/illum/illum_calc
└── Batch1
    └── illum_calc_Batch1_Plate1.csv
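
LoadData files are plain CSV, so a few lines of Python are enough to inspect one before handing it to CellProfiler. The sketch below only reports the column names and the row count. It makes no assumption about which columns starrynight writes, and `summarize_loaddata` is a hypothetical helper.

```python
import csv


def summarize_loaddata(csv_path):
    """Return (column_names, row_count) for a LoadData CSV file."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # first line holds the column names
        row_count = sum(1 for _ in reader)  # remaining lines are data rows
    return header, row_count
```

For example, `summarize_loaddata("scratch/starrynight_example_workspace/cellprofiler/loaddata/cp/illum/illum_calc/Batch1/illum_calc_Batch1_Plate1.csv")` returns the header list and the number of data rows in that file.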

Step 2: Generate CellProfiler Pipelines for Illumination Correction

Create CellProfiler pipeline files (.cppipe) that define the illumination correction workflow:

starrynight illum calc cppipe \
    -l ./scratch/starrynight_example_workspace/cellprofiler/loaddata/cp/illum/illum_calc/ \
    -o ./scratch/starrynight_example_workspace/cellprofiler/cppipe/cp/illum/illum_calc \
    -w ./scratch/starrynight_example_workspace/illum/cp/illum_calc

Expected Output:

./scratch/starrynight_example_workspace/cellprofiler/cppipe/cp/illum/illum_calc
└── Batch1
    └── illum_calc_Batch1_Plate1.cppipe

Step 3: Execute CellProfiler Pipeline for Illumination Correction

Run the CellProfiler pipelines to generate illumination correction files:

starrynight cp \
    -p ./scratch/starrynight_example_workspace/cellprofiler/cppipe/cp/illum/illum_calc/ \
    -l ./scratch/starrynight_example_workspace/cellprofiler/loaddata/cp/illum/illum_calc \
    -o ./scratch/starrynight_example_workspace/illum/cp/illum_calc

Expected Output:

./scratch/starrynight_example_workspace/illum/cp/illum_calc
├── Batch1_Plate1_IllumOrigDAPI.npy
├── Batch1_Plate1_IllumOrigPhalloAF750.npy
└── Batch1_Plate1_IllumOrigZO1-AF488.npy
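
A small check like the following can confirm that one file was written per channel. It assumes the `<batch>_<plate>_IllumOrig<channel>.npy` naming seen in the listing above; `missing_illum_files` is a hypothetical helper.

```python
from pathlib import Path


def missing_illum_files(out_dir, batch, plate, channels):
    """Return the expected illumination files that are absent from out_dir."""
    expected = [f"{batch}_{plate}_IllumOrig{channel}.npy" for channel in channels]
    return [name for name in expected if not (Path(out_dir) / name).is_file()]
```

An empty list from `missing_illum_files("scratch/starrynight_example_workspace/illum/cp/illum_calc", "Batch1", "Plate1", ["DAPI", "PhalloAF750", "ZO1-AF488"])` means all three files are present.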

Visualizing Results

To visualize an illumination correction file, you can use this Python snippet:

import numpy as np
import matplotlib.pyplot as plt

# Load the illumination correction file
data = np.load('scratch/starrynight_example_workspace/illum/cp/illum_calc/Batch1_Plate1_IllumOrigDAPI.npy')

# Create a visualization
plt.figure(figsize=(10,8))
plt.imshow(data, cmap='viridis')
plt.colorbar()
plt.title('DAPI Illumination Correction')
plt.show()

CP Presegcheck

Create load data

starrynight presegcheck loaddata \
    -i ./scratch/starrynight_example_workspace/index/index.parquet \
    -o ./scratch/starrynight_example_workspace/cellprofiler/loaddata/cp/presegcheck/ \
    -c ./scratch/starrynight_example_workspace/illum/cp/illum_apply/

Create the CellProfiler pipeline

starrynight presegcheck cppipe \
    -l ./scratch/starrynight_example_workspace/cellprofiler/loaddata/cp/presegcheck/ \
    -o ./scratch/starrynight_example_workspace/cellprofiler/cppipe/cp/presegcheck/ \
    -w ./scratch/starrynight_example_workspace/presegcheck/cp/ \
    -n DAPI \
    -c PhalloAF750

Invoke CP pipeline

starrynight cp \
    -p ./scratch/starrynight_example_workspace/cellprofiler/cppipe/cp/presegcheck \
    -l ./scratch/starrynight_example_workspace/cellprofiler/loaddata/cp/presegcheck \
    -o ./scratch/starrynight_example_workspace/presegcheck/cp/

CP Segcheck

Create load data

starrynight segcheck loaddata \
    -i ./scratch/starrynight_example_workspace/index/index.parquet \
    -o ./scratch/starrynight_example_workspace/cellprofiler/loaddata/cp/segcheck/ \
    -c ./scratch/starrynight_example_workspace/illum/cp/illum_apply/ \
    -n "DAPI" \
    --cell "PhalloAF750"

Create the CellProfiler pipeline

starrynight segcheck cppipe \
    -l ./scratch/starrynight_example_workspace/cellprofiler/loaddata/cp/segcheck/ \
    -o ./scratch/starrynight_example_workspace/cellprofiler/cppipe/cp/segcheck/ \
    -w ./scratch/starrynight_example_workspace/segcheck/cp/ \
    -n DAPI \
    -c PhalloAF750

Invoke CP pipeline

starrynight cp \
    -p ./scratch/starrynight_example_workspace/cellprofiler/cppipe/cp/segcheck \
    -l ./scratch/starrynight_example_workspace/cellprofiler/loaddata/cp/segcheck \
    -o ./scratch/starrynight_example_workspace/segcheck/cp/

Align

Create load data

starrynight align loaddata \
    -i ./scratch/starrynight_example_workspace/index/index.parquet \
    -o ./scratch/starrynight_example_workspace/cellprofiler/loaddata/sbs/align \
    -c ./scratch/starrynight_example_workspace/illum/sbs/illum_apply/ \
    --nuclei DAPI

Create the CellProfiler pipeline

starrynight align cppipe \
    -l ./scratch/starrynight_example_workspace/cellprofiler/loaddata/sbs/align \
    -o ./scratch/starrynight_example_workspace/cellprofiler/cppipe/sbs/align \
    -w ./scratch/starrynight_example_workspace/align \
    --nuclei DAPI

Invoke CP pipeline

starrynight cp \
    -p ./scratch/starrynight_example_workspace/cellprofiler/cppipe/sbs/align \
    -l ./scratch/starrynight_example_workspace/cellprofiler/loaddata/sbs/align \
    -o ./scratch/starrynight_example_workspace/align \
    --sbs \
    --jobs 1

Preprocess

Create load data

starrynight preprocess loaddata \
    -i ./scratch/starrynight_example_workspace/index/index.parquet \
    -o ./scratch/starrynight_example_workspace/cellprofiler/loaddata/sbs/preprocess/ \
    -c ./scratch/starrynight_example_workspace/illum/cp/illum_apply/ \
    -a ./scratch/starrynight_example_workspace/align/sbs/ \
    -n "DAPI"

Create the CellProfiler pipeline

starrynight preprocess cppipe \
    -l ./scratch/starrynight_example_workspace/cellprofiler/loaddata/sbs/preprocess/ \
    -o ./scratch/starrynight_example_workspace/cellprofiler/cppipe/sbs/preprocess/ \
    -w ./scratch/starrynight_example_workspace/preprocess/sbs/ \
    -b ./scratch/starrynight_example_workspace/barcodes/barcode.csv \
    -n DAPI

Invoke CP pipeline

starrynight cp \
    -p ./scratch/starrynight_example_workspace/cellprofiler/cppipe/sbs/preprocess \
    -l ./scratch/starrynight_example_workspace/cellprofiler/loaddata/sbs/preprocess \
    -o ./scratch/starrynight_example_workspace/preprocess/sbs/
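
The Presegcheck, Segcheck, and Preprocess commands above appear to share one path convention: LoadData files under `cellprofiler/loaddata/<type>/<module>`, pipelines under `cellprofiler/cppipe/<type>/<module>`, and results under `<module>/<type>`. The sketch below derives those paths from a workspace root and builds the matching `starrynight cp` argument list. It relies only on the convention and the `-p`, `-l`, `-o` flags visible in those commands. The Align and Analysis sections use different output paths and are not covered, and both helper names are hypothetical.

```python
from pathlib import PurePosixPath


def module_paths(workspace, module, image_type):
    """Derive loaddata, cppipe, and output directories for one module.

    image_type is "cp" (Cell Painting) or "sbs", as used in the paths above.
    """
    root = PurePosixPath(workspace)
    return {
        "loaddata": root / "cellprofiler" / "loaddata" / image_type / module,
        "cppipe": root / "cellprofiler" / "cppipe" / image_type / module,
        "output": root / module / image_type,
    }


def cp_command(paths):
    """Build the argument list for the `starrynight cp` step from derived paths."""
    return [
        "starrynight", "cp",
        "-p", str(paths["cppipe"]),
        "-l", str(paths["loaddata"]),
        "-o", str(paths["output"]),
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` once the LoadData and pipeline files exist.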

Analysis

Create load data

starrynight analysis loaddata \
    -i ./scratch/starrynight_example_workspace/index/index.parquet \
    -o ./scratch/starrynight_example_workspace/cellprofiler/loaddata/analysis \
    -c ./scratch/starrynight_example_workspace/illum/cp/illum_apply/ \
    -a ./scratch/starrynight_example_workspace/align/sbs/ \
    -n "DAPI" --cell "PhalloAF750"

Create the CellProfiler pipeline

starrynight analysis cppipe \
    -l ./scratch/starrynight_example_workspace/cellprofiler/loaddata/analysis/ \
    -o ./scratch/starrynight_example_workspace/cellprofiler/cppipe/analysis/ \
    -w ./scratch/starrynight_example_workspace/analysis/ \
    -n DAPI \
    -c PhalloAF750

Invoke CP pipeline

starrynight cp \
    -p ./scratch/starrynight_example_workspace/cellprofiler/cppipe/analysis \
    -l ./scratch/starrynight_example_workspace/cellprofiler/loaddata/analysis \
    -o ./scratch/starrynight_example_workspace/analysis/

Illum Calc sbs

Gen loaddata

starrynight illum calc loaddata \
    -i ./scratch/starrynight_example_workspace/index/index.parquet \
    -o ./scratch/starrynight_example_workspace/cellprofiler/loaddata/sbs/illum/illum_calc \
    --sbs

Gen cppipe


Pipeline Development Issues

Test Fixture Development

  1. Create Test Fixture and Comparison Framework
    • Set up directory structure matching PCP output format
    • Create soft links to relevant subset of PCP existing outputs
    • Develop scripts to compare pipeline outputs head-to-head
    • Define metrics for successful output matching
    • Implement validation checks for key output characteristics
    • Document the test fixture structure and usage

First Pass Implementation

  1. Implement Combined "Align" and "Illumination Correct" Steps

    • Develop combined module maintaining exact functionality
    • Ensure code structure follows PCP implementation
  2. Implement Combined "Preseg Check" and "Seg Check" Steps

    • Develop combined module maintaining exact functionality
    • Ensure code structure follows PCP implementation
  3. Output Matching Validation

    • Run pipeline with current StarryNight structure but modified modules
    • Compare outputs against test fixture (PCP outputs)
    • Document discrepancies and solutions

Second Pass Implementation

  1. Implement Structure Changes for Stitch and Crop Integration

    • Update upstream data structures to match PCP implementation
    • Prepare interfaces for Stitch and Crop module integration
  2. Integrate Stitch and Crop Module

    • Import exact code from PCP implementation
    • Implement necessary adapters for upstream/downstream compatibility
    • Test integration with the pipeline
  3. Full Pipeline Validation with Updated Structure

    • Run end-to-end tests with the new structure
    • Validate against test fixture
    • Document any remaining discrepancies

QC Implementation

  1. Implement QC Metrics and Visualization for Merck Presentation
    • Review PCP Python notebooks for AMD_screening/20231011_batch_1
    • Implement modules to generate key QC metrics and visualizations
    • Determine appropriate dataset size for meaningful metrics
    • Create presentation-ready outputs matching reference notebooks
    • Document interpretation guidelines for stakeholders

Future Improvements (Post-Current Implementation)

  1. Standardize StarryNight Algorithm Methods

    • Analyze current implementation patterns
    • Define standard interface for algorithm modules
    • Refactor existing code to follow standards
  2. Create Extension Documentation

    • Document how to extend the system with new algorithms
    • Create templates for new algorithm modules
    • Define testing requirements for extensions
  3. Implement Algorithm Unit Tests

    • Develop test cases for each algorithm component
    • Implement automated testing framework
    • Ensure minimum test coverage thresholds