# Pan-Spatial Transcriptomics Analysis (PaSTa) pipeline (workshop #56)
PaSTa is a Nextflow-based, end-to-end image-analysis pipeline for decoding image-based spatial transcriptomics (ST) data. It performs imaging-cycle registration, cell segmentation and transcript peak decoding. It currently supports three types of ST technology:
- _in-situ_ sequencing-like encoding
- MERFISH-like encoding
- RNAScope-like labelling

## Dataset explanation:

We're working with a 7-cycle, 5-channel dataset. The image data from each cycle is a z-projected 5-channel OME-TIFF hyperstack.
Given the runtime and resource constraints of the workshop, we will work on a small crop (yellow, ~2500\*800 pixels) of the whole mouse brain section.
# RUNNING THE PIPELINE
## Prerequisites:
- Internet access
- A Gitpod account
_*Or*_
A Unix system + Nextflow + Docker/Singularity + 50 GB of storage + 16 GB of RAM.
- (Optional) Jupyter notebook + Napari-spatialdata on your local computer for visualization
### I. Run the pipeline
- Go to https://gitpod.io/new/#https://github.com/nextflow-io/training
- Log in using your GitHub credentials
- For the workspace, choose the following options:
  - (i) Training, (ii) VS Code - browser, (iii) Large
- Create an empty folder, e.g. i2k_demo:
`mkdir i2k_demo`
`cd i2k_demo`
- Download two configuration files for the pipeline:
```
wget https://spatial_demo_datasets.cog.sanger.ac.uk/ISS/params/params_tiny_only.yaml
wget https://spatial_demo_datasets.cog.sanger.ac.uk/ISS/run.config
```
- Run the Nextflow pipeline (if you need to rerun it, add the `-resume` flag to save time):
```
nextflow run bioinfotongli/Image-ST -r v0.1.1 \
    -profile local,docker -c run.config \
    -params-file params_tiny_only.yaml \
    -with-report report_tiny_only.html
```
### Step-by-step pipeline explanation
The Nextflow pipeline is here:
https://github.com/BioinfoTongLI/Image-ST/tree/main
It is composed of the following modules:
https://github.com/BioinfoTongLI/modules
and the corresponding containers used by the modules are in:
https://github.com/BioinfoTongLI/containers

_Credits to Konrad Rokicki_:
https://github.com/BioImageTools/containers
#### Configuration files
The minimum parameters required to run the pipeline are specified in the [parameter file](https://spatial_demo_datasets.cog.sanger.ac.uk/ISS/params/params_tiny_only.yaml).
The [run.config](https://spatial_demo_datasets.cog.sanger.ac.uk/ISS/run.config) supplies extra settings for specific runs.
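As a rough illustration, a Nextflow params file is plain YAML mapping parameter names to values. The keys below are hypothetical placeholders, not the pipeline's actual parameter names; consult the downloaded `params_tiny_only.yaml` for the real ones:

```yaml
# Hypothetical sketch of a params file; key and value names are illustrative only.
out_dir: ./output
sample_name: ISS_demo_tiny
```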

#### Step 1. Whole-slide registration
A pip-installable image-alignment tool that works on OME-TIFFs:
https://github.com/VasylVaskivskyi/microaligner
#### Step 2. Tiled CellPose segmentation
A tiled version of CellPose segmentation that avoids running out of RAM on large images. Outputs are saved as polygons (WKT/GeoJSON). (https://github.com/BioinfoTongLI/containers/tree/main/tiled-cellpose/3.0.10-py10)
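The tiling trick can be sketched as follows. This is illustrative only: `segment_fn` stands in for the actual CellPose call, and the real pipeline merges polygons rather than label images, but the idea of processing overlapping windows to bound peak memory is the same:

```python
# Sketch of tiled segmentation: process a large 2-D image in overlapping
# windows so that only one tile is ever held in memory by the model.
import numpy as np

def iter_tiles(shape, tile=1024, overlap=128):
    """Yield (y0, y1, x0, x1) windows covering a 2-D image with overlap."""
    h, w = shape
    step = tile - overlap
    for y0 in range(0, h, step):
        for x0 in range(0, w, step):
            yield y0, min(y0 + tile, h), x0, min(x0 + tile, w)

def segment_tiled(img, segment_fn, tile=1024, overlap=128):
    """Run segment_fn on each tile; naive merge keeps the max label per pixel."""
    out = np.zeros(img.shape, dtype=np.int32)
    for y0, y1, x0, x1 in iter_tiles(img.shape, tile, overlap):
        out[y0:y1, x0:x1] = np.maximum(out[y0:y1, x0:x1],
                                       segment_fn(img[y0:y1, x0:x1]))
    return out
```

The overlap exists so that objects straddling a tile border are seen whole by at least one tile; reconciling duplicate detections in the overlap zone is the hard part that the real pipeline solves at the polygon level.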
#### Step 3. Tiled Spotiflow peak-calling
A deep-learning-based RNA spot peak caller, similarly written in a tiled fashion to avoid RAM limitations. (https://github.com/BioinfoTongLI/containers/tree/main/spotiflow/0.4.2-py11)
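To make the idea of peak calling concrete, here is a minimal, numpy-only sketch of strict local-maximum detection. Spotiflow itself uses a trained neural network with subpixel localisation, so this is only an illustration of the concept, not the tool's algorithm:

```python
# Toy peak caller: a pixel is a peak if it is above a threshold and strictly
# greater than all 8 of its neighbours.
import numpy as np

def call_peaks(img, threshold=0.5):
    """Return (row, col) coordinates of strict local maxima above threshold."""
    p = np.pad(img, 1, mode="constant", constant_values=-np.inf)
    center = p[1:-1, 1:-1]
    is_max = np.ones(img.shape, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == dx == 0:
                continue
            # Compare each pixel against the neighbour shifted by (dy, dx).
            is_max &= center > p[1 + dy:p.shape[0] - 1 + dy,
                                 1 + dx:p.shape[1] - 1 + dx]
    return np.argwhere(is_max & (img > threshold))
```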
#### Step 4. PoSTcode decoding
A probabilistic RNA spot barcode decoding algorithm (https://github.com/gerstung-lab/postcode).
#### Step 5. SpatialData object construction
Constructs a spatialdata (https://spatialdata.scverse.org/en/stable/) object from the components generated in the previous steps.
### II. Checking output
Inspect the output folder:
`ls output`
Short explanation of output folders:
- `spatialdata` - object (folder) which contains all the main outputs of the pipeline in _spatialdata_ format: DAPI images, segmentation masks and decoded spots
- `cellpose_segmentation_merged_wkt` - contains polygons of segmented cells for whole image
- `naive_cellpose_segmentation` - contains polygons and downscaled mask images of segmented cells for image slices
- `peak_profiles` - contains files with peak positions and peak profiles used for decoding step (this information is not stored in spatialdata)
- `PoSTcode_decoding_output` - contains a table with all decoded peaks, their positions and the probability of the decoding results
- `registered_stacks` - outputs of registration process, contains registered stacks of images
- `registration_configs` - configuration files that were used for registration process
- `slice_jsons` - CSV files with the boundaries of the image slices
- `spotiflow_peaks` - spot peaks called with spotiflow
### III. Visualization
- Zip the spatialdata folder (from the `output` folder):
`zip -r demo.sdata.zip spatialdata/ISS_demo_tiny.sdata`
- Download the `demo.sdata.zip` file to your local computer: in the left panel of VS Code, right-click -> Download
- Unzip it
- Download Jupyter notebook and environment yaml file from https://github.com/cellgeni/SNP_tools/tree/main/visualisation_tools
- Open Jupyter Notebook
- Install *spatialdata* and *napari-spatialdata* packages (first cell)
- In case this doesn't work, install a Jupyter kernel from the environment yaml file:
- Open Anaconda Prompt and navigate to the folder with both files
- Install environment using yml file:
`conda env create --file=napari_spatialdata_environment.yml`
- Activate environment:
`conda activate napari_spatialdata`
- Install Jupyter Notebook kernel
`ipython kernel install --user --name=napari_spatialdata`
- Go to Jupyter Notebook, open the downloaded notebook, and choose the kernel "napari_spatialdata"
- If the kernel isn't available or doesn't work, try restarting the kernel, Jupyter Notebook, or Anaconda
- In the Jupyter notebook, import the libraries (cell 3) and add the path to the unzipped spatialdata folder (cell 4)
- Run cells 4-6 in the notebook and explore the dataset with Napari
- If you want to highlight a specific cell number, run the last cell
# FAQ
1. My HOME dir is full when running Singularity image conversion on HPC.
A quick-and-dirty solution is to manually point Singularity at another cache directory:
```
singularity cache clean
export SINGULARITY_CACHEDIR=./singularity_image_dir
export NXF_SINGULARITY_CACHEDIR=./singularity_image_dir
```
2. How do I modify parameters of a specific process/step?
Following the nf-core standard, you can pass extra arguments to a module's main script by setting `ext.args = "--[key] [value]"` for that process in the run.config file.
An example:
```
withName: POSTCODE {
    ext.args = "--channel_names 'DAPI,Cy5,AF488,Cy3,AF750'"
}
```
3. I cannot download the pretrained models for the deep-learning tools (Spotiflow/CellPose).
Spotiflow:
```
Exception: URL fetch failure on https://drive.switch.ch/index.php/s/6AoTEgpIAeQMRvX/download: None -- [Errno -3] Temporary failure in name resolution
```
or CellPose:
```
urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
```
Most likely you've hit a download limit (or have no network access); wait a bit and try again later, OR manually download the models and update the configuration file.
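For CellPose, one possible manual workaround is to pre-download the model on a machine with internet access and point CellPose at the local copy via the `CELLPOSE_LOCAL_MODELS_PATH` environment variable (the directory name below is just an example):

```shell
# Assumed workaround: put a pre-downloaded CellPose model file into a local
# directory and tell CellPose to look there instead of its default cache.
mkdir -p ./cellpose_models
# (copy the pre-downloaded model file into ./cellpose_models here)
export CELLPOSE_LOCAL_MODELS_PATH=./cellpose_models
```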
4. Where are the demo datasets?
They are pre-uploaded to the Wellcome Sanger Institute's S3 buckets, specifically https://spatial_demo_datasets.cog.sanger.ac.uk/
Nextflow can download the files as long as the following configuration is included in the run.config file.
```
aws {
    client {
        endpoint = "https://cog.sanger.ac.uk"
        signerOverride = "S3SignerType"
    }
}
```