PaSTa is a Nextflow-based end-to-end image analysis pipeline for decoding image-based spatial transcriptomics (ST) data. It performs imaging cycle registration, cell segmentation and transcript peak decoding. It currently supports analysis of three types of ST technology:
We're working on a 7-cycle, 5-channel dataset. Image data from each cycle is a z-projected 5-channel hyperstack ome.tif.
Because running time and resources are limited, we will work on a small crop (yellow, ~2500 × 800 pixels) of this whole mouse brain section.
Go to https://gitpod.io/new/#https://github.com/nextflow-io/training
Log in using your GitHub credentials
For the workspace, choose the following options:
Create an empty folder, e.g. i2k_demo:
mkdir i2k_demo
cd i2k_demo
Download two configuration files for the pipeline:
wget https://spatial_demo_datasets.cog.sanger.ac.uk/ISS/params/params_tiny_only.yaml
wget https://spatial_demo_datasets.cog.sanger.ac.uk/ISS/run.config
Run the pipeline (add -resume to save time when re-running):
nextflow run bioinfotongli/Image-ST -r v0.1.1 \
-profile local,docker -c run.config \
-params-file params_tiny_only.yaml \
-with-report report_tiny_only.html
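If the run is launched from a script or notebook, the same command can be assembled programmatically. The sketch below is only an illustration: the helper name build_nextflow_command and its resume switch are made up for this example, while the flags are the ones shown above plus Nextflow's -resume option.

```python
import shlex


def build_nextflow_command(resume: bool = False) -> list[str]:
    """Assemble the demo command as an argument list (hypothetical helper)."""
    cmd = [
        "nextflow", "run", "bioinfotongli/Image-ST", "-r", "v0.1.1",
        "-profile", "local,docker",
        "-c", "run.config",
        "-params-file", "params_tiny_only.yaml",
        "-with-report", "report_tiny_only.html",
    ]
    if resume:
        # -resume lets Nextflow reuse cached results from a previous run
        cmd.append("-resume")
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable command line; the list itself can be
    # handed to subprocess.run(...) to actually start the pipeline.
    print(shlex.join(build_nextflow_command(resume=True)))
```

Keeping the command as a list avoids quoting problems when it is passed to subprocess.run.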
The Nextflow pipeline is here:
https://github.com/BioinfoTongLI/Image-ST/tree/main
It is composed of the following modules:
https://github.com/BioinfoTongLI/modules
The containers used by each module are in:
https://github.com/BioinfoTongLI/containers
Credits to Konrad Rokicki:
https://github.com/BioImageTools/containers
The minimum parameters required to run the pipeline are specified in the parameter file (params_tiny_only.yaml here).
The run.config file holds extra settings for specific runs.
A pip-installable image alignment tool for OME-TIFFs
https://github.com/VasylVaskivskyi/microaligner
A tiled version of Cellpose segmentation that bypasses RAM limits. Outputs are saved as polygons (WKT/GeoJSON) (https://github.com/BioinfoTongLI/containers/tree/main/tiled-cellpose/3.0.10-py10).
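The tiling idea can be illustrated with a small sketch: the image is cut into overlapping windows so that each window fits in memory, and a cell lying on a tile border is seen whole in at least one window. The tile size, overlap and function names below are assumptions chosen for illustration; the container's actual slicing logic may differ.

```python
def _starts(length, tile, step):
    """Start offsets along one axis; stop once a window reaches the edge."""
    starts = [0]
    while starts[-1] + tile < length:
        starts.append(starts[-1] + step)
    return starts


def tile_bounds(width, height, tile=1000, overlap=100):
    """Return (x0, y0, x1, y1) windows covering a width x height image.

    Neighbouring windows share `overlap` pixels. Defaults are illustrative.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    step = tile - overlap
    return [
        (x0, y0, min(x0 + tile, width), min(y0 + tile, height))
        for y0 in _starts(height, tile, step)
        for x0 in _starts(width, tile, step)
    ]


# For a crop of about 2500 x 800 pixels this gives three windows:
print(tile_bounds(2500, 800))
# [(0, 0, 1000, 800), (900, 0, 1900, 800), (1800, 0, 2500, 800)]
```

Each window would be segmented separately and the resulting polygons merged afterwards, which is why the outputs are stored as polygons rather than one large mask.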
A deep-learning-based RNA spot peak caller. Similarly, it is run in a tiled fashion to avoid RAM limitations (https://github.com/BioinfoTongLI/containers/tree/main/spotiflow/0.4.2-py11).
A probabilistic RNA spot barcode decoding algorithm (https://github.com/gerstung-lab/postcode).
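To give a feel for what "probabilistic decoding" means, here is a toy sketch. It is not the PoSTcode algorithm; it only scores every barcode in a made-up two-cycle codebook by how well it matches a spot's per-cycle channel intensities, then normalizes the scores into probabilities. The codebook, gene names and intensities are all hypothetical.

```python
import math

# Hypothetical codebook: gene -> index of the expected bright channel per cycle.
CODEBOOK = {"GeneA": (0, 1), "GeneB": (1, 0), "GeneC": (2, 2)}


def decode_spot(intensities, codebook=CODEBOOK):
    """Return (best_gene, probability) for one spot.

    `intensities[c][k]` is the spot intensity in cycle c, channel k.
    Each cycle is normalized to sum to 1, and a barcode's score is the
    product of the normalized intensities in its expected channels.
    """
    scores = {}
    for gene, barcode in codebook.items():
        log_score = 0.0
        for cycle, channel in enumerate(barcode):
            total = sum(intensities[cycle])
            frac = intensities[cycle][channel] / total if total > 0 else 0.0
            log_score += math.log(frac + 1e-12)  # small constant avoids log(0)
        scores[gene] = math.exp(log_score)
    norm = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / norm


# A spot that is bright in channel 0 in cycle 1 and channel 1 in cycle 2.
spot = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2]]
gene, prob = decode_spot(spot)
print(gene, round(prob, 2))
```

The per-spot probability is what makes it possible to filter out low-confidence calls downstream.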
Constructs a spatialdata (https://spatialdata.scverse.org/en/stable/) object from the components generated above.
Go to output folder:
ls output
Short explanation of output folders:
spatialdata - object (folder) that contains all main outputs of the pipeline in spatialdata format: DAPI images, segmentation masks and decoded spots
cellpose_segmentation_merged_wkt - polygons of segmented cells for the whole image
naive_cellpose_segmentation - polygons and downscaled mask images of segmented cells for image slices
peak_profiles - files with peak positions and peak profiles used for the decoding step (this information is not stored in spatialdata)
PoSTcode_decoding_output - table with all decoded peaks, their positions and decoding probabilities
registered_stacks - outputs of the registration process: registered image stacks
registration_configs - configuration files used for the registration process
slice_jsons - csv with boundaries of image slices
spotiflow_peaks - spot peaks called with Spotiflow
Zip the spatialdata object (from inside the output folder):
zip -r demo.sdata.zip spatialdata/ISS_demo_tiny.sdata
Download the demo.sdata.zip file: right-click it in the left panel of VS Code and choose Download to save it to your local computer.
Set up the napari_spatialdata environment:
conda env create --file=napari_spatialdata_environment.yml
conda activate napari_spatialdata
ipython kernel install --user --name=napari_spatialdata
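Before opening the object in napari, it can be worth checking that the downloaded archive is complete. The sketch below lists the top-level entries inside the .sdata folder of the zip file using only the standard library. It assumes the archive was created with the zip command above, so that member paths start with spatialdata/ISS_demo_tiny.sdata/; the function name is illustrative.

```python
import zipfile
from pathlib import PurePosixPath


def list_sdata_elements(zip_path, root="spatialdata/ISS_demo_tiny.sdata"):
    """Return the sorted names found directly under `root` in the archive."""
    root_parts = PurePosixPath(root).parts
    depth = len(root_parts)
    names = set()
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            parts = PurePosixPath(member).parts
            # Keep only entries located strictly below the root folder.
            if parts[:depth] == root_parts and len(parts) > depth:
                names.add(parts[depth])
    return sorted(names)


# Example call (needs the downloaded file in the current directory):
# print(list_sdata_elements("demo.sdata.zip"))
```

An empty list means the archive does not contain the expected folder, for example because zip was run from a different directory.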
A quick and dirty solution is to manually specify the Singularity cache directory by setting:
singularity cache clean
export SINGULARITY_CACHEDIR=./singularity_image_dir
export NXF_SINGULARITY_CACHEDIR=./singularity_image_dir
Set ext.args = "--[key] [value]" in the run.config file. An example is:
withName: POSTCODE {
    ext.args = "--channel_names 'DAPI,Cy5,AF488,Cy3,AF750'"
}
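The quoting in ext.args matters because the string ends up on the module's command line. As a rough illustration, assuming the tool receives its arguments through ordinary shell word splitting, Python's shlex module shows how the example string breaks into tokens: the single quotes keep the comma-separated channel list together as one value.

```python
import shlex

ext_args = "--channel_names 'DAPI,Cy5,AF488,Cy3,AF750'"

# Split the way a POSIX shell would: quotes group words and are then removed.
tokens = shlex.split(ext_args)
print(tokens)  # ['--channel_names', 'DAPI,Cy5,AF488,Cy3,AF750']

# A receiving script might then split the value on commas (an assumption here).
channels = tokens[1].split(",")
print(channels)  # ['DAPI', 'Cy5', 'AF488', 'Cy3', 'AF750']
```

Without the inner quotes the result would be the same for this value, but any value containing spaces would be broken into several arguments.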
Exception: URL fetch failure on https://drive.switch.ch/index.php/s/6AoTEgpIAeQMRvX/download: None -- [Errno -3] Temporary failure in name resolution
Or, from Cellpose:
urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
Most likely you have reached a download limit (?). Wait a bit and try again later, or manually download the models and update the configuration file.
They are pre-uploaded to the Wellcome Sanger Institute’s S3 buckets, specifically https://spatial_demo_datasets.cog.sanger.ac.uk/
Nextflow can download the files as long as the following settings are included in the run.config file:
aws {
    client {
        endpoint = "https://cog.sanger.ac.uk"
        signerOverride = "S3SignerType"
    }
}
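For the model download failures shown earlier ("Temporary failure in name resolution"), the advice to wait and try again can be automated with a simple retry loop. The sketch below is a generic pattern and not something the pipeline ships: the function name, retry count and delays are assumptions, and the URL has to be supplied by the user.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, retries=4, base_delay=5.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Download `url`, waiting longer after each failed attempt.

    `opener` and `sleep` are injectable so the logic can be tested offline.
    """
    for attempt in range(retries):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.URLError:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            # Exponential backoff: 5 s, 10 s, 20 s, ... with the defaults.
            sleep(base_delay * 2 ** attempt)
```

The downloaded bytes can then be written to a local file and the configuration updated to point at that file, as suggested above.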