Spatial Transcriptomics Working Group - Analysis Resources

<style>
.reveal { font-size: 18px; }
.reveal pre { font-size: 20px; }
.reveal section p { text-align: left; font-size: 18px; line-height: 1.2em; vertical-align: top; }
.reveal section figcaption { text-align: center; font-size: 20px; line-height: 1.2em; vertical-align: top; }
.reveal section h1 { font-size: 26px; vertical-align: top; }
.reveal section h2 { font-size: 24px; line-height: 1.2em; vertical-align: top; }
.reveal section h3 { font-size: 22px; line-height: 1.2em; vertical-align: top; }
.reveal ul { display: block; }
.reveal ol { display: block; }
</style>

<img align="center" width="25%" src="https://hackmd.io/_uploads/Syhyrl9uT.png" />

# Spatial Transcriptomics Working Group
## Analysis Resources

Ivan E. Cao-Berg
Research Software Specialist
Brain Image Library
Biomedical Applications Group

Jan 9, 2024
1-4 pm ET

---

## Before we begin

- :question: Have a question during the presentation?
  <a href="https://www.lifewire.com/raise-hand-in-zoom-5100882"><img src="https://hackmd.io/_uploads/r1PWX3YOa.png" width="50%" /></a>
- :warning: Have an issue or a question after the workshop?
    - Send an email to the Help Desk `bil-support@psc.edu`

---

## Resources available during this workshop

* A SLURM reservation named `workshop` that lasts 24 hours.
* Access to the large memory nodes using the `compute` partition that is shared among all users.

---

<img align="left" src="https://slurm.schedmd.com/slurm_logo.png" width="15%"/>

Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.

```bash=
sinfo    #view information about Slurm nodes and partitions
squeue   #view information about jobs located in the Slurm scheduling queue
scontrol #view or modify Slurm configuration and state
sbatch   #submit a batch script to Slurm
```

The commands above are the most common commands you might be using for this hackathon.
For full documentation about SLURM, click [here](https://slurm.schedmd.com/documentation.html).

---

### `sinfo` - Example 1

```bash=
sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up 2-00:00:00      1  drain l008
compute*     up 2-00:00:00      7   idle l[001-007]
```

As a participant in this hackathon, you should have access to the partition `compute` using the reservation `hackathon`.

---

### `squeue` - Example 1

Use `squeue -u $(whoami)` to list your jobs and their status.

```bash=
squeue -u $(whoami)
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
14243   compute script.s icaoberg  R      15:34
```

---

### `sbatch` - Example 1

Consider the following file named `script.sh`:

```bash=
cat script.sh
#!/bin/bash

module load anaconda3
pip install --user cowsay
cowsay "Hello, World"
```

`sbatch` is used to submit jobs to the scheduler.

For more info on `sbatch`, click [here](https://slurm.schedmd.com/sbatch.html).

---

### `sbatch` - Example 1 (cont.)

:::info
:bulb: Remember to use the reservation `hackathon` when submitting to the scheduler.
:::

```bash
sbatch -p compute --reservation=hackathon script.sh
Submitted batch job 82721
```

For more info on `sbatch`, click [here](https://slurm.schedmd.com/sbatch.html).

---

### `sbatch` - Example 1 (cont.)

If you do not specify an output filename, the scheduler creates one automatically. In this example, the file is `slurm-82721.out`.

```bash
cat slurm-82721.out
  ____________
| Hello, World |
  ============
            \
             \
               ^__^
               (oo)\_______
               (__)\       )\/\
                   ||----w |
                   ||     ||
```

For more info on `sbatch`, click [here](https://slurm.schedmd.com/sbatch.html).

---

### `sbatch` - Example 2

```bash
sbatch -p compute -N1 script.sh                  #number of nodes - please avoid using it!
sbatch -p compute -n1 script.sh                  #number of cores
sbatch -p compute --mem=64Gb script.sh           #memory
sbatch -p compute -N1 -n10 --mem=128Gb script.sh #combine them all as needed
```

For more info on `sbatch`, click [here](https://slurm.schedmd.com/sbatch.html).
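
The options shown in Example 2 can also be written into the script itself as `#SBATCH` directives, which keeps the submission command short. The sketch below assumes the `compute` partition and the `hackathon` reservation used in the examples above; the file name `job.sh` and the output pattern `cowsay-%j.out` are hypothetical names chosen for illustration, not workshop conventions.

```bash
#!/bin/bash
# Write a batch script whose resource requests are #SBATCH directives.
# job.sh and cowsay-%j.out are illustrative names (assumptions).
cat > job.sh <<'EOF'
#!/bin/bash
# Partition used in the examples above
#SBATCH -p compute
# Number of cores
#SBATCH -n 1
# Memory per node
#SBATCH --mem=64G
# Output file; Slurm replaces %j with the job ID
#SBATCH -o cowsay-%j.out

module load anaconda3
pip install --user cowsay
cowsay "Hello, World"
EOF

# Submit only where Slurm is installed; otherwise just print the command.
if command -v sbatch >/dev/null 2>&1; then
    sbatch --reservation=hackathon job.sh
else
    echo "sbatch --reservation=hackathon job.sh"
fi
```

Options given on the `sbatch` command line take precedence over `#SBATCH` directives, so a one-off change such as `sbatch --mem=128G job.sh` does not require editing the file.
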

---

### `scancel` - Example 1

```bash
scancel -u $(whoami) #cancel all my jobs
scancel 1234         #cancel job 1234
```

----

## `interact`

The `interact` command is an in-house script for starting interactive sessions.

```bash
> interact -h
Usage: interact [OPTIONS]
  -d              Turn on debugging information
  --debug
  --noconfig      Do not process config files
  -gpu            Allocate 1 gpu in the GPU-shared partition
  --gpu
  --gres=<list>   Specifies a comma delimited list of generic consumable
                  resources. e.g.: --gres=gpu:1
  --mem=<MB>      Real memory required per node in MegaBytes
...
```

---

### `interact` - Useful Tips and Tricks

- `interact` is a wrapper built in-house.
- Use `interact` and avoid using `salloc` or `srun` on BIL hardware.
- The template is:

```bash
interact -A tra220018p -p compute -n <number-of-cores> --mem=<memory>
```

- Remember to specify the account and reservation when using `interact`:
    - Account: `tra220018p`
    - Reservation: `reservation`

---

## In a nutshell

- `LMOD` is used to load software in the workshop.
- `SLURM` is used to submit jobs to the scheduler managing the large-memory nodes.
- `interact` is used to start interactive sessions on the large-memory nodes.

---

## Workflow management systems (WMS)

A practical understanding of WMS benefits, automation, and implementation.

1. **[Snakemake](https://snakemake.github.io/).** Snakemake is a workflow management system that uses a Python-based domain-specific language. It is known for its simplicity and flexibility, making it popular among bioinformaticians for defining and executing data analysis pipelines.
2. **[Nextflow](https://www.nextflow.io/).** Nextflow is a data-driven workflow management system that enables the creation of reproducible and scalable bioinformatics workflows. It uses a domain-specific language called DSL2, which is based on Groovy.
3. **[Common Workflow Language (CWL)](https://www.commonwl.org/).** CWL is not a specific workflow management system but a standardized way to describe and execute bioinformatics workflows. Several workflow engines, including Cromwell and Rabix/Benten, support CWL, making it a popular choice for interoperability.

These workflow management systems are widely used in bioinformatics to automate and streamline the analysis of biological data, from genomics to proteomics and beyond. Researchers often choose a system based on their specific requirements, familiarity with the tools, and the nature of their data analysis tasks.

---

[![image](https://hackmd.io/_uploads/rkuGo95_6.png)](https://ondemand.bil.psc.edu)

---

![image](https://hackmd.io/_uploads/SysA6qc_a.png)

---

![image](https://hackmd.io/_uploads/H1MEAc9_p.png)

---

![image](https://hackmd.io/_uploads/HkT8A95up.png)

---

## Hands-on Activity: Running Cellpose

![Screenshot 2024-01-09 at 12.46.55 PM](https://hackmd.io/_uploads/ryeC8WjdT.png)

---

![Screenshot 2024-01-09 at 12.49.34 PM](https://hackmd.io/_uploads/SkQBvboda.png)

---

![Screenshot 2024-01-09 at 12.52.09 PM](https://hackmd.io/_uploads/r16CDbjup.png)

---

![Screenshot 2024-01-09 at 12.52.47 PM](https://hackmd.io/_uploads/ryvbObs_6.png)

---

![Screenshot 2024-01-09 at 12.55.47 PM](https://hackmd.io/_uploads/HkN6ubo_p.png)

---

![Screenshot 2024-01-09 at 1.09.34 PM](https://hackmd.io/_uploads/Bk4e2bi_p.png)

* without GPUs

---

![Screenshot 2024-01-09 at 1.20.00 PM](https://hackmd.io/_uploads/rJnOR-ouT.png)

---