PAn-Spatial Transcriptomics Analysis (PaSTa) pipeline (workshop #56)

PaSTa is a Nextflow-based end-to-end image analysis pipeline for decoding image-based spatial transcriptomics (ST) data. It performs imaging cycle registration, cell segmentation, and transcript peak decoding. It currently supports three types of ST technology:

in-situ sequencing-like encoding
MERFISH-like encoding
RNAScope-like labelling

Dataset explanation:

[Screenshot: whole mouse brain section, with the crop used in this demo outlined in yellow]
We're working on a 7-cycle, 5-channel dataset. Image data from each cycle is a z-projected 5-channel hyperstack ome.tif.
Because running time and compute resources are limited, we will work on a small crop (yellow, ~2500 x 800 pixels) of this whole mouse brain section.
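To get a feel for the data volume, the arithmetic below estimates the uncompressed size of the demo crop. It is only a back-of-the-envelope sketch: the 16-bit pixel depth is an assumption (it is not stated above), and the crop size is the approximate 2500 x 800 figure.

```python
# Rough size estimate for the demo crop (assumption: 16-bit pixels).
CYCLES = 7
CHANNELS = 5
CROP_WIDTH, CROP_HEIGHT = 2500, 800   # approximate crop size in pixels
BYTES_PER_PIXEL = 2                   # assumed uint16; check your own images


def raw_size_bytes(cycles, channels, width, height, bytes_per_pixel):
    """Uncompressed size of all z-projected planes across all cycles."""
    planes = cycles * channels
    return planes * width * height * bytes_per_pixel


size = raw_size_bytes(CYCLES, CHANNELS, CROP_WIDTH, CROP_HEIGHT, BYTES_PER_PIXEL)
print(f"{CYCLES * CHANNELS} planes, about {size / 1e6:.0f} MB uncompressed")
```

Plugging in the dimensions of a whole-section image instead shows why the demo is restricted to a crop.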

RUNNING THE PIPELINE

Prerequisites:

Internet access
A Gitpod account
Or
Unix system + Nextflow + Docker/Singularity + 50 GB of storage + 16 GB of RAM.
(Optional) Jupyter notebook + Napari-spatialdata on your local computer for visualization

I. Run the pipeline

Go to https://gitpod.io/new/#https://github.com/nextflow-io/training
Log in using your GitHub credentials
For the workspace, choose the following options:
- (i) Training, (ii) VS code - browser, (iii) Large
Create an empty folder, e.g. i2k_demo:
mkdir i2k_demo
cd i2k_demo
Download two configuration files for the pipeline:

wget https://spatial_demo_datasets.cog.sanger.ac.uk/ISS/params/params_tiny_only.yaml
wget https://spatial_demo_datasets.cog.sanger.ac.uk/ISS/run.config

Run the Nextflow pipeline (if you need to rerun it, add the -resume flag to save time):

nextflow run bioinfotongli/Image-ST -r v0.1.1 \
    -profile local,docker -c run.config \
    -params-file params_tiny_only.yaml \
    -with-report report_tiny_only.html

Step-by-step pipeline explanation

The Nextflow pipeline is here:
https://github.com/BioinfoTongLI/Image-ST/tree/main
It is composed of the following modules:
https://github.com/BioinfoTongLI/modules
The corresponding containers used by each module are in:
https://github.com/BioinfoTongLI/containers

Credits to Konrad Rokicki:
https://github.com/BioImageTools/containers

Configuration files

The minimum parameters required to run the pipeline are specified in the parameter file (params_tiny_only.yaml in this demo).

The run.config file holds extra settings for specific runs.

Step 1. Whole-slide registration

A pip-installable image alignment tool that works on OME-TIFFs:
https://github.com/VasylVaskivskyi/microaligner

Step 2. Tiled CellPose segmentation

A tiled version of CellPose segmentation that works around RAM limits. Outputs are saved as polygons (WKT/GeoJSON) (https://github.com/BioinfoTongLI/containers/tree/main/tiled-cellpose/3.0.10-py10)
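The tiling idea can be sketched as follows. This is a minimal illustration, not the code inside the tiled-cellpose container: the function names, the 1000-pixel tile size and the 100-pixel overlap are all hypothetical. The overlap is there so that a cell cut by one tile border appears whole in a neighbouring tile.

```python
def axis_starts(length, tile_size, step):
    """Start offsets along one axis so that the tiles cover [0, length)."""
    starts = [0]
    while starts[-1] + tile_size < length:
        starts.append(starts[-1] + step)
    return starts


def tile_bounds(width, height, tile_size=1000, overlap=100):
    """Return (x0, y0, x1, y1) boxes of overlapping tiles covering the image."""
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be >= 0 and smaller than tile_size")
    step = tile_size - overlap
    return [
        (x0, y0, min(x0 + tile_size, width), min(y0 + tile_size, height))
        for y0 in axis_starts(height, tile_size, step)
        for x0 in axis_starts(width, tile_size, step)
    ]


# The demo crop is roughly 2500 x 800 pixels.
tiles = tile_bounds(2500, 800)
for box in tiles:
    print(box)
```

Each box would be segmented on its own, so peak memory depends on the tile size rather than on the size of the whole slide.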

Step 3. Tiled Spotiflow peak-calling

Deep-learning-based RNA spot peak calling, likewise run tile by tile to avoid RAM limits. (https://github.com/BioinfoTongLI/containers/tree/main/spotiflow/0.4.2-py11)
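When spots are called tile by tile, the tile-local coordinates have to be shifted back into whole-image coordinates, and spots detected twice in an overlap region have to be merged. The sketch below shows one simple way to do that. It is an assumption about the general approach, not the container's actual code: the function name, the input layout and the 1.5-pixel merge distance are hypothetical.

```python
def merge_tile_spots(tiles, min_dist=1.5):
    """Combine tile-local spots into one global, de-duplicated list.

    tiles: list of ((x0, y0), [(x, y), ...]) where (x0, y0) is the tile
    origin in the whole image and (x, y) are tile-local spot coordinates.
    """
    merged = []
    for (x0, y0), spots in tiles:
        for x, y in spots:
            gx, gy = x + x0, y + y0  # shift into whole-image coordinates
            # Keep the spot only if no already-kept spot lies within min_dist.
            is_new = all(
                (gx - mx) ** 2 + (gy - my) ** 2 > min_dist ** 2
                for mx, my in merged
            )
            if is_new:
                merged.append((gx, gy))
    return merged


tiles_with_spots = [
    ((0, 0), [(950.0, 10.0), (100.0, 20.0)]),
    ((900, 0), [(50.2, 10.1), (300.0, 40.0)]),  # first spot repeats (950, 10)
]
merged = merge_tile_spots(tiles_with_spots)
print(merged)
```

The pairwise distance check is quadratic in the number of spots, which is fine for an illustration; a spatial index would be the usual choice for real data.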

Step 4. PoSTcode decoding

A probabilistic RNA spot barcode decoding algorithm (https://github.com/gerstung-lab/postcode).
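To make the decoding task concrete, here is a deliberately naive sketch: take the brightest channel in each cycle and look the resulting barcode up in a codebook. This is not the PoSTcode algorithm, which fits a probabilistic model and reports a decoding probability for each spot. The codebook, the gene names and the 3-cycle barcode length are hypothetical, and treating the four non-DAPI channels from the FAQ example as the coding channels is an assumption.

```python
# Assumed coding channels (DAPI excluded), in the order used by the profiles.
CODING_CHANNELS = ["Cy5", "AF488", "Cy3", "AF750"]

# Hypothetical 3-cycle codebook, for illustration only.
CODEBOOK = {
    "GeneA": ("Cy5", "Cy3", "AF488"),
    "GeneB": ("AF750", "AF750", "Cy5"),
}


def call_barcode(profile):
    """profile: one list of channel intensities per cycle."""
    return tuple(CODING_CHANNELS[cycle.index(max(cycle))] for cycle in profile)


def decode(profile, codebook=CODEBOOK, max_mismatches=0):
    """Return the closest gene, or None if it differs in too many cycles."""
    barcode = call_barcode(profile)
    best_gene, best_dist = None, None
    for gene, code in codebook.items():
        dist = sum(a != b for a, b in zip(barcode, code))  # Hamming distance
        if best_dist is None or dist < best_dist:
            best_gene, best_dist = gene, dist
    if best_dist is not None and best_dist <= max_mismatches:
        return best_gene
    return None


clean = [[0.9, 0.1, 0.2, 0.0], [0.1, 0.0, 0.8, 0.1], [0.2, 0.7, 0.1, 0.1]]
noisy = [[0.9, 0.1, 0.2, 0.0], [0.1, 0.0, 0.8, 0.1], [0.1, 0.1, 0.7, 0.1]]
print(decode(clean))                    # GeneA
print(decode(noisy))                    # None (one cycle disagrees)
print(decode(noisy, max_mismatches=1))  # GeneA
```

A hard call like this throws away how confident each cycle was, which is the information a probabilistic decoder keeps.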

Step 5. SpatialData object construction

Construct a SpatialData (https://spatialdata.scverse.org/en/stable/) object from the components generated in the previous steps.

II. Checking output

Go to output folder:
ls output
Short explanation of output folders:

spatialdata - object (folder) that contains all the main outputs of the pipeline in spatialdata format: DAPI images, segmentation masks and decoded spots
cellpose_segmentation_merged_wkt - contains polygons of segmented cells for the whole image
naive_cellpose_segmentation - contains polygons and downscaled mask images of segmented cells for each image slice
peak_profiles - contains files with the peak positions and peak profiles used in the decoding step (this information is not stored in spatialdata)
PoSTcode_decoding_output - contains a table of all decoded peaks, their positions and the decoding probabilities
registered_stacks - outputs of the registration process: the registered image stacks
registration_configs - configuration files that were used for the registration process
slice_jsons - csv with boundaries of image slices
spotiflow_peaks - spot peaks called with spotiflow
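A quick way to confirm that a run produced everything listed above is to check for the folders programmatically. The sketch below assumes the default output directory name "output" used in this demo; the helper name is hypothetical.

```python
from pathlib import Path

# Folder names as listed in the explanation above.
EXPECTED_FOLDERS = [
    "spatialdata",
    "cellpose_segmentation_merged_wkt",
    "naive_cellpose_segmentation",
    "peak_profiles",
    "PoSTcode_decoding_output",
    "registered_stacks",
    "registration_configs",
    "slice_jsons",
    "spotiflow_peaks",
]


def missing_outputs(output_dir="output", expected=EXPECTED_FOLDERS):
    """Return the expected folder names that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected folders are present.")
```

If something is missing, the HTML report written by -with-report is a good place to see which process failed.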

III. Visualization

Zip the spatialdata folder (run this from inside the output folder):
zip -r demo.sdata.zip spatialdata/ISS_demo_tiny.sdata
Download demo.sdata.zip to your local computer: in the left panel of VS Code, right-click the file and choose Download.
Unzip it
Download Jupyter notebook and environment yaml file from https://github.com/cellgeni/SNP_tools/tree/main/visualisation_tools
Open Jupyter Notebook
Install the spatialdata and napari-spatialdata packages (first cell)
If this doesn't work, install a Jupyter kernel from the environment yml file:
- Open Anaconda Prompt and navigate to the folder with both files
- Install environment using yml file:
  conda env create --file=napari_spatialdata_environment.yml
- Activate environment:
  conda activate napari_spatialdata
- Install Jupyter Notebook kernel
  ipython kernel install --user --name=napari_spatialdata
- Go to Jupyter Notebook, open the downloaded notebook and choose the kernel "napari_spatialdata"
- If the kernel isn't available or doesn't work, try restarting the kernel, Jupyter Notebook or Anaconda
In Jupyter Notebook, import the libraries (cell 3) and add the path to the unzipped spatialdata folder (cell 4)
Run cells 4-6 in the notebook and explore the dataset with Napari
If you want to highlight a specific cell number, run the last cell
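If you prefer a plain Python session to the notebook, the same exploration can be sketched as below. It assumes spatialdata and napari-spatialdata are installed in the active environment. The helper names are hypothetical, and the workshop notebook may do things differently.

```python
from pathlib import Path


def find_sdata(root="."):
    """Return the first *.sdata folder found under root, or None."""
    matches = sorted(p for p in Path(root).rglob("*.sdata") if p.is_dir())
    return matches[0] if matches else None


def open_in_napari(sdata_path):
    """Load a SpatialData folder and open it in the napari viewer."""
    # Imported lazily so that find_sdata works without these packages.
    import spatialdata as sd
    from napari_spatialdata import Interactive

    sdata = sd.read_zarr(sdata_path)  # SpatialData objects are stored as Zarr folders
    print(sdata)                      # summary of the elements in the object
    Interactive(sdata).run()          # blocks until the viewer window is closed
```

Typical use after unzipping demo.sdata.zip would be open_in_napari(find_sdata("path/to/unzipped/folder")), where the path is whatever location you unzipped to.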

FAQ

My HOME dir is full when running Singularity image conversion on HPC.

A quick-and-dirty solution is to clean the cache and point the Singularity cache directories somewhere else:

singularity cache clean
export SINGULARITY_CACHEDIR=./singularity_image_dir
export NXF_SINGULARITY_CACHEDIR=./singularity_image_dir

How do I modify the parameters of a specific process/step?
Following the nf-core standard, any parameter can be passed to the main script by setting ext.args = "--[key] [value]" in the run.config file.

An example is

process {
    withName: POSTCODE {
        ext.args = "--channel_names 'DAPI,Cy5,AF488,Cy3,AF750'"
    }
}
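The nested quotes in that example matter. Assuming the module pastes ext.args into the command line of the underlying script, as nf-core modules usually do, the inner single quotes keep the comma-separated list together as one argument. Python's shlex module follows POSIX shell quoting rules, so it can be used to preview how such a string would be tokenised:

```python
import shlex

# The ext.args value from the example above.
ext_args = "--channel_names 'DAPI,Cy5,AF488,Cy3,AF750'"

tokens = shlex.split(ext_args)
print(tokens)  # ['--channel_names', 'DAPI,Cy5,AF488,Cy3,AF750']

# A script receiving this value could then split it into channel names.
channel_names = tokens[1].split(",")
print(channel_names)
```

The same check is handy before adding values that contain spaces, which would otherwise be split into several arguments.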

Cannot download pretrained model for the deep-learning tools (Spotiflow/CellPose)
Spotiflow:

Exception: URL fetch failure on https://drive.switch.ch/index.php/s/6AoTEgpIAeQMRvX/download: None -- [Errno -3] Temporary failure in name resolution
Or CellPose:
urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>

Most likely you've reached the maximum number of downloads (?). Wait a bit and try again later, or download the models manually and update the configuration file.
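Both messages end in "Temporary failure in name resolution", which is a DNS lookup error, so it can also be worth checking whether the environment running the job can resolve the download host at all. The sketch below is one way to test that. The helper name is hypothetical, and the host is the one shown in the Spotiflow error above.

```python
import socket

MODEL_HOSTS = ["drive.switch.ch"]  # host taken from the Spotiflow error above


def unresolved_hosts(hosts, resolver=socket.gethostbyname):
    """Return the host names that cannot be resolved to an IP address."""
    failed = []
    for host in hosts:
        try:
            resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed
```

Calling unresolved_hosts(MODEL_HOSTS) from the same place the pipeline runs (for example inside the container or on the compute node) returns an empty list when name resolution works there.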

Where are the demo datasets?

They are pre-uploaded to the Wellcome Sanger Institute's S3 buckets, specifically https://spatial_demo_datasets.cog.sanger.ac.uk/

Nextflow can download the files as long as the following configuration is included in the run.config file:

aws {
    client {
        endpoint = "https://cog.sanger.ac.uk"
        signerOverride = "S3SignerType"
    }
}
