PAn-Spatial Transcriptomics Analysis (PaSTa) pipeline (workshop #56)

PaSTa is a Nextflow-based, end-to-end image analysis pipeline for decoding image-based spatial transcriptomics (ST) data. It performs imaging-cycle registration, cell segmentation and transcript peak decoding. It currently supports three types of ST technology:

  • in-situ sequencing-like encoding
  • MERFISH-like encoding
  • RNAScope-like labelling

Dataset explanation:

[Figure: whole mouse brain section, with the crop used in this workshop marked in yellow]
We're working on a 7-cycle, 5-channel dataset. The image data from each cycle is a z-projected, 5-channel hyperstack saved as ome.tif.
Because running time and compute resources are limited, we will work on a small crop (marked in yellow, about 2500 × 800 pixels) of this whole mouse brain section.

RUNNING THE PIPELINE

Prerequisites:

  • Internet access
  • A Gitpod account
    or
    a Unix system + Nextflow + Docker/Singularity + 50 GB of storage + 16 GB of RAM
  • (Optional) Jupyter notebook + Napari-spatialdata on your local computer for visualization
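
For a local (non-Gitpod) setup, the prerequisites above can be checked before starting. The script below is a minimal sketch, not part of the pipeline: the function name `check_prerequisites` is made up for illustration, the tools are assumed to be on the PATH under the names `nextflow`, `docker` and `singularity`, and the RAM check relies on `os.sysconf`, which is only available on Unix-like systems.

```python
import os
import shutil


def check_prerequisites(workdir=".", min_disk_gb=50, min_ram_gb=16):
    """Return a dict mapping each prerequisite to True (met) or False."""
    # Free space on the filesystem that holds the working directory.
    free_gb = shutil.disk_usage(workdir).free / 1e9

    # Total physical memory; os.sysconf is Unix-only, so fall back to None.
    try:
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (AttributeError, ValueError, OSError):
        ram_gb = None

    return {
        "nextflow": shutil.which("nextflow") is not None,
        # Either container engine is enough.
        "container_engine": any(
            shutil.which(tool) is not None for tool in ("docker", "singularity")
        ),
        "disk_space": free_gb >= min_disk_gb,
        "ram": ram_gb is not None and ram_gb >= min_ram_gb,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK' if ok else 'MISSING':8s}{name}")
```

Any line reported as MISSING points to a requirement that should be fixed before launching the pipeline locally.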

I. Run the pipeline

  • Go to https://gitpod.io/new/#https://github.com/nextflow-io/training

  • Log in using your GitHub credentials

  • For the workspace, choose the following options:

    • (i) Training, (ii) VS code - browser, (iii) Large
  • Create an empty folder, e.g. i2k_demo:
    mkdir i2k_demo
    cd i2k_demo

  • Download two configuration files for the pipeline:

wget https://spatial_demo_datasets.cog.sanger.ac.uk/ISS/params/params_tiny_only.yaml
wget https://spatial_demo_datasets.cog.sanger.ac.uk/ISS/run.config
  • Run the Nextflow pipeline (if you need to rerun it, add the -resume flag to reuse cached results and save time):
nextflow run bioinfotongli/Image-ST -r v0.1.1 \
    -profile local,docker -c run.config \
    -params-file params_tiny_only.yaml \
    -with-report report_tiny_only.html

Step-by-step pipeline explanation

The Nextflow pipeline is here:
https://github.com/BioinfoTongLI/Image-ST/tree/main
It is composed of the following modules:
https://github.com/BioinfoTongLI/modules
The containers used by each module are here:
https://github.com/BioinfoTongLI/containers

Credits to Konrad Rokicki:
https://github.com/BioImageTools/containers

Configuration files

The minimum set of parameters required to run the pipeline is specified in the parameter file (params_tiny_only.yaml in this demo).

The run.config file holds extra settings for specific runs.


Step 1. Whole-slide registration

A pip-installable image alignment tool that works with OME-TIFF files:
https://github.com/VasylVaskivskyi/microaligner

Step 2. Tiled CellPose segmentation

A tiled version of CellPose segmentation that avoids running out of RAM on large images. Outputs are saved as polygons (WKT/GeoJSON) (https://github.com/BioinfoTongLI/containers/tree/main/tiled-cellpose/3.0.10-py10)
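
The idea behind tiling can be sketched in a few lines. This is an illustration only, not the code used in the tiled-cellpose container: the function name `tile_bounds`, the tile size and the overlap are hypothetical values, and the image size is the approximate crop size used in this workshop.

```python
def tile_bounds(width, height, tile=1000, overlap=100):
    """Yield (x0, y0, x1, y1) boxes that together cover a width x height image.

    Neighbouring tiles share `overlap` pixels, so an object smaller than the
    overlap that is cut by one tile edge lies fully inside the next tile.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    step = tile - overlap
    for y0 in range(0, max(height - overlap, 1), step):
        for x0 in range(0, max(width - overlap, 1), step):
            # Clip the last tile in each direction to the image border.
            yield (x0, y0, min(x0 + tile, width), min(y0 + tile, height))


# Approximate size of the workshop crop: 2500 x 800 pixels.
print(list(tile_bounds(2500, 800)))
# [(0, 0, 1000, 800), (900, 0, 1900, 800), (1800, 0, 2500, 800)]
```

Only one tile has to be held in memory at a time. The per-tile results then need to be merged, which is why the pipeline stores segmentation results as polygons.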

Step 3. Tiled Spotiflow peak-calling

A deep-learning-based RNA spot peak caller. Like the segmentation step, it runs on tiles to avoid RAM limitations. (https://github.com/BioinfoTongLI/containers/tree/main/spotiflow/0.4.2-py11)

Step 4. PoSTcode decoding

A probabilistic RNA spot barcode decoding algorithm (https://github.com/gerstung-lab/postcode).
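
To make the term "barcode decoding" concrete, here is a deliberately naive sketch: in every cycle the brightest channel is called, and the resulting sequence is looked up in a codebook. This is not the PoSTcode algorithm, which is probabilistic and reports a probability for each decoding result. The function name, the gene names and the intensity values are made up for illustration.

```python
def decode_spot(intensities, codebook):
    """Return the gene whose barcode matches the brightest channel per cycle.

    intensities: one list of channel intensities per imaging cycle.
    codebook: dict mapping a barcode (tuple of channel indices, one per cycle)
              to a gene name.
    Returns None when the called barcode is not in the codebook.
    """
    called = tuple(
        max(range(len(cycle)), key=lambda ch: cycle[ch]) for cycle in intensities
    )
    return codebook.get(called)


# Toy example: 3 cycles x 4 coding channels (all names and values are made up).
toy_codebook = {(0, 2, 1): "geneA", (3, 3, 0): "geneB"}
toy_spot = [
    [0.9, 0.1, 0.2, 0.1],  # cycle 1: channel 0 is brightest
    [0.1, 0.2, 0.8, 0.1],  # cycle 2: channel 2 is brightest
    [0.2, 0.7, 0.1, 0.3],  # cycle 3: channel 1 is brightest
]
print(decode_spot(toy_spot, toy_codebook))  # geneA
```

A hard call like this discards all information about how confident each cycle is, which is the gap a probabilistic decoder is meant to fill.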

Step 5. SpatialData object construction

Constructs a SpatialData (https://spatialdata.scverse.org/en/stable/) object from the components generated in the previous steps.

II. Checking output

List the contents of the output folder:
ls output
Short explanation of the output folders:

  • spatialdata - object (folder) containing all main outputs of the pipeline in SpatialData format: DAPI images, segmentation masks and decoded spots
  • cellpose_segmentation_merged_wkt - contains polygons of segmented cells for the whole image
  • naive_cellpose_segmentation - contains polygons and downscaled mask images of segmented cells for each image slice
  • peak_profiles - contains files with the peak positions and peak profiles used in the decoding step (this information is not stored in spatialdata)
  • PoSTcode_decoding_output - contains a table with all decoded peaks, their positions and the probability of each decoding result
  • registered_stacks - outputs of the registration process, contains the registered image stacks
  • registration_configs - configuration files that were used for the registration process
  • slice_jsons - csv with the boundaries of the image slices
  • spotiflow_peaks - spot peaks called with Spotiflow
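
A quick way to confirm that a run produced every folder listed above is sketched here. The folder names are taken from the list; the function name `missing_outputs` and the default path `output` are assumptions for illustration.

```python
from pathlib import Path

# Folder names as listed in the explanation of the output folders.
EXPECTED_OUTPUTS = [
    "spatialdata",
    "cellpose_segmentation_merged_wkt",
    "naive_cellpose_segmentation",
    "peak_profiles",
    "PoSTcode_decoding_output",
    "registered_stacks",
    "registration_configs",
    "slice_jsons",
    "spotiflow_peaks",
]


def missing_outputs(output_dir="output"):
    """Return the expected output folders that do not exist in output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected output folders are present.")
```

A missing folder usually means the corresponding step did not finish; the Nextflow report (report_tiny_only.html) is a good place to look next.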

III. Visualization

  • Zip the spatialdata folder (run this from inside the output folder)
    zip -r demo.sdata.zip spatialdata/ISS_demo_tiny.sdata
  • In the left panel of VS Code, right-click demo.sdata.zip and choose Download to save it to your local computer
  • Unzip it
  • Download the Jupyter notebook and the environment YAML file from https://github.com/cellgeni/SNP_tools/tree/main/visualisation_tools
  • Open Jupyter Notebook
  • Install the spatialdata and napari-spatialdata packages (first cell of the notebook)
  • If this doesn't work, install a Jupyter kernel from the environment YAML file:
    • Open Anaconda Prompt and navigate to the folder containing both files
    • Create the environment from the YAML file:
      conda env create --file=napari_spatialdata_environment.yml
    • Activate environment:
      conda activate napari_spatialdata
    • Install the Jupyter Notebook kernel:
      ipython kernel install --user --name=napari_spatialdata
    • Go to Jupyter Notebook, open the downloaded notebook and choose the kernel "napari_spatialdata"
    • If the kernel isn't available or doesn't work, try restarting the kernel, Jupyter Notebook or Anaconda
  • In Jupyter Notebook, import the libraries (cell 3) and set the path to the unzipped spatialdata folder (cell 4)
  • Run cells 4-6 in the notebook and explore the dataset with Napari
  • To highlight a specific cell by its number, run the last cell
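
Outside the workshop notebook, the same object can be opened with a few lines of Python. This is a minimal sketch that assumes the unzipped folder is a SpatialData Zarr store readable with `spatialdata.read_zarr`, and that the spatialdata and napari-spatialdata packages are installed. The function name `open_in_napari` is made up, and the notebook itself may do this differently.

```python
def open_in_napari(sdata_path="spatialdata/ISS_demo_tiny.sdata"):
    """Load a SpatialData object from disk and show it in napari."""
    # Imported inside the function so that it can be defined even before
    # the visualization packages are installed.
    import spatialdata as sd
    from napari_spatialdata import Interactive

    sdata = sd.read_zarr(sdata_path)  # load the SpatialData object
    print(sdata)                      # print a summary of its elements
    Interactive(sdata).run()          # open the napari viewer
    return sdata
```

Calling `open_in_napari()` from the folder where demo.sdata.zip was unzipped should open a viewer in which the images, segmentation and decoded spots can be toggled as layers.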

FAQ

  1. My HOME dir is full when running Singularity image conversion on HPC.

A quick and dirty solution is to clean the cache and manually point Singularity to a different cache directory:

singularity cache clean
export SINGULARITY_CACHEDIR=./singularity_image_dir
export NXF_SINGULARITY_CACHEDIR=./singularity_image_dir
  2. How do I modify the parameters of a specific process/step?
    Following the nf-core standard, any parameter can be passed to the main script of a process by setting ext.args = "--[key] [value]" in the run.config file.

An example is

process {
    withName: POSTCODE {
        ext.args = "--channel_names 'DAPI,Cy5,AF488,Cy3,AF750'"
    }
}
  3. Cannot download the pretrained models for the deep-learning tools (Spotiflow/CellPose)
    Spotiflow:
Exception: URL fetch failure on https://drive.switch.ch/index.php/s/6AoTEgpIAeQMRvX/download: None -- [Errno -3] Temporary failure in name resolution
    Or CellPose:
urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>

Most likely you've reached the maximum number of downloads (?). Wait a bit and try again later, or manually download the models and update the configuration file.

  4. Where are the demo datasets?

They are pre-uploaded to the Wellcome Sanger Institute's S3 buckets, specifically https://spatial_demo_datasets.cog.sanger.ac.uk/

Nextflow can download the files as long as the following settings are included in the run.config file:

aws {
    client {
        endpoint = "https://cog.sanger.ac.uk"
        signerOverride = "S3SignerType"
    }
}