Running fMRIPrep or MRIQC on a cluster

Connect to the cluster

Open your terminal and type:

ssh <username>@graham.computecanada.ca # Graham login node

Add your SSH key

SSH key pairs let you log in without typing your password every time.
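
One common way to set this up (a sketch; the email comment is just a placeholder):

ssh-keygen -t ed25519 -C "your_email@example.com"   # generate a key pair on your local machine
ssh-copy-id <username>@graham.computecanada.ca      # copy the public key to the cluster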

Modules

The cluster provides pre-installed software as modules that you load on demand. To see all available modules:
module avail

To load a module (you can put this in your .bashrc if you need the module all the time):
module load <module_name>

Example: Check if git is available and load it

module avail git
module load apps/git/2.13.0
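
Compute Canada clusters use the Lmod module system, so two related commands are also handy (standard Lmod commands, not specific to this guide):

module spider git   # search all available versions of a module
module list         # show the modules loaded in your current session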

Transfer files

Use scp to copy individual files and directories (directories need the -r flag; see the sketch below)

scp <filename> <username>@graham.computecanada.ca:<PATH/TO/FILE>    # local to cluster
scp <username>@graham.computecanada.ca:<PATH/TO/FILE> <LocalPath>   # cluster to local
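
To copy a whole directory, add the recursive flag (same placeholders as above):

scp -r <dirname> <username>@graham.computecanada.ca:<PATH/TO/DIR>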

Use rsync to sync files or directories

rsync <LocalPath/filename> <username>@graham.computecanada.ca:<PATH/TO/FILE>   # local to cluster
rsync <username>@graham.computecanada.ca:<PATH/TO/FILE> <LocalPath>            # cluster to local
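
In practice a few common flags are useful; this is a sketch with the same placeholders as above:

rsync -avz --progress <LocalPath/filename> <username>@graham.computecanada.ca:<PATH/TO/FILE>

Here -a preserves permissions and timestamps, -v is verbose, -z compresses data in transit, and --progress shows transfer progress.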

Use DataLad

Install DataLad on the cluster:

module load git-annex python/3       # DataLad requires git-annex and Python
virtualenv ~/venv_datalad            # create an isolated Python environment
source ~/venv_datalad/bin/activate   # activate the environment
pip install datalad                  # install DataLad into it
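
To confirm the installation worked (with the virtual environment still active):

datalad --version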

https://cbs-discourse.uwo.ca/t/installing-datalad-on-compute-canada/23

Running jobs

Create job script

Here is an example of a simple bash script:

#!/bin/bash
# Request 5 minutes of wall time on the def-flepore allocation
#SBATCH --time=00:05:00
#SBATCH --account=def-flepore
echo 'Hello, world!'
sleep 20

Submit job

In the cluster terminal

sbatch <name of the file>

Example:

sbatch simple.sh
Submitted batch job 65869853

Check job status

Use squeue or its shortcut sq to list your jobs

sq

JOBID     USER      ACCOUNT          NAME       ST  TIME_LEFT  NODES  CPUS  TRES_PER_N  MIN_MEM  NODELIST (REASON)
65869853  mmaclean  def-flepore_cpu  simple.sh  PD  5:00       1      1     N/A         256M     (Priority)

The ST column shows the job state: PD means pending (waiting for resources), R means running.

Use email notifications to learn when your job starts and ends by adding the following at the top of your script (--mail-type=ALL already covers BEGIN, END, FAIL, and REQUEUE, so the ALL line on its own is enough):

    #SBATCH --mail-user=michele.maclean@umontreal.ca
    #SBATCH --mail-type=BEGIN
    #SBATCH --mail-type=END
    #SBATCH --mail-type=FAIL
    #SBATCH --mail-type=REQUEUE
    #SBATCH --mail-type=ALL

Cancel job

scancel <jobid>
scancel 65869853
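
To cancel all of your jobs at once (a standard scancel option):

scancel -u <username>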

Where does the output go?

By default the output is placed in a file named slurm-<jobid>.out (e.g., slurm-65869853.out) in the directory from which the job was submitted. Having the job ID as part of the file name is convenient for troubleshooting.
Files created by your job itself are written to whatever paths your script specifies.
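
If you prefer a custom log location, SLURM's --output option accepts filename patterns; a minimal sketch (the logs/ directory is an assumption and must already exist, since SLURM will not create it):

    #SBATCH --output=logs/%x-%j.out

Here %x expands to the job name and %j to the job ID.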

Tips

  1. Run your scripts from $SCRATCH; it is much faster than $HOME.
  2. Keep a working directory, so if things crash you don't have to start from the beginning.
  3. Check your usage and the space left on the cluster with diskusage_report.
  4. See if you need to delete files from scratch once jobs are complete (fMRIPrep output is quite heavy); a cleanup sketch follows this list.
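
For example, to check quotas and then remove a finished working directory (double-check the path before deleting; ~/scratch/work-fmriprep matches the --work-dir used in the script further below):

diskusage_report
rm -rf ~/scratch/work-fmriprep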

Running Singularity on a cluster

Download the containers for fMRIPrep and MRIQC here:

ReproNim containers: https://github.com/ReproNim/containers

  1. Create a directory:
mkdir parallel_analysis
  2. Install the containers dataset from ReproNim:
cd parallel_analysis
datalad install https://github.com/ReproNim/containers.git
  3. Retrieve the container you want, e.g., fMRIPrep:
datalad get containers/images/bids/bids-fmriprep--21.0.1.sing

You might need to unlock the container to be able to use it:

datalad unlock containers/images/bids/bids-fmriprep--21.0.1.sing

Run fMRIPrep on the cluster

Make sure you have your FreeSurfer license file (free registration at https://surfer.nmr.mgh.harvard.edu/registration.html)

Here is an example script:

    #!/bin/bash
    #-------------------------------------------    
    #SBATCH -J fmriprep
    #SBATCH --account=def-flepore
    #SBATCH --time=15:00:00
    #SBATCH -n 1
    #SBATCH --cpus-per-task=4
    #SBATCH --mem-per-cpu=8G
    #SBATCH --mail-user=michele.maclean@umontreal.ca
    #SBATCH --mail-type=BEGIN
    #SBATCH --mail-type=END
    #SBATCH --mail-type=FAIL
    #SBATCH --mail-type=REQUEUE
    #SBATCH --mail-type=ALL
    # ------------------------------------------

    source ~/venv_datalad/bin/activate
    module load git-annex/8.20200810
    module load freesurfer/5.3.0
    module load singularity/3.8

    cd  # go to the home directory

    # -B binds a host path into the container (host_path:container_path).
    # The positional arguments are the BIDS input directory, the output
    # directory, and the analysis level; the remaining flags are fMRIPrep options.
    singularity run --cleanenv \
        -B /home/mmaclean/scratch:/scratch \
        -B /home/mmaclean/projects/def-flepore/mmaclean:/mmaclean \
        /home/mmaclean/projects/def-flepore/mmaclean/parallel_analysis/containers/images/bids/bids-fmriprep--21.0.1.sing \
        /mmaclean/raw /mmaclean/fmriprep-output \
        participant --participant-label CTL01 \
        --work-dir /scratch/work-fmriprep \
        --fs-license-file /mmaclean/license/freesurfer.txt \
        --output-spaces MNI152NLin2009cAsym T1w \
        --skip_bids_validation --notrack --stop-on-first-crash
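
To reuse this script for other subjects, one option (a sketch, not part of the original script) is to take the participant label as a command-line argument, since sbatch forwards any extra arguments to the script as $1, $2, and so on:

    # in the script, replace the fixed label with:
    #     participant --participant-label "$1" \
    # then submit one job per subject (fmriprep_job.sh is a hypothetical file name):
    sbatch fmriprep_job.sh CTL01
    sbatch fmriprep_job.sh CTL02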

Run MRIQC on the cluster

Here is an example script:

    #!/bin/bash
    #-------------------------------------------
    #SBATCH -J mriqc
    #SBATCH --account=def-flepore
    #SBATCH --time=5:00:00
    #SBATCH -n 1
    #SBATCH --cpus-per-task=8
    #SBATCH --mem-per-cpu=10G
    #SBATCH --mail-user=michele.maclean@umontreal.ca
    #SBATCH --mail-type=BEGIN
    #SBATCH --mail-type=END
    #SBATCH --mail-type=FAIL
    #SBATCH --mail-type=REQUEUE
    #SBATCH --mail-type=ALL
    # ------------------------------------------

    source ~/venv_datalad/bin/activate
    module load git-annex/8.20200810
    module load freesurfer/5.3.0
    module load singularity/3.8

    cd  # go to the home directory

    # --no-sub disables submission of anonymized QC metrics to the MRIQC web API
    singularity run --cleanenv \
            -B /home/mmaclean/scratch:/scratch \
            -B /home/mmaclean/projects/def-flepore/mmaclean:/mmaclean \
            /home/mmaclean/projects/def-flepore/mmaclean/parallel_analysis/containers/images/bids/bids-mriqc--0.16.1.sing \
            /mmaclean/raw /mmaclean/mriqc \
            participant --participant-label CTL01 CTL02 CTL03 \
            -w /scratch/work-mriqc \
            --no-sub
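
After all participant-level jobs have finished, MRIQC can also build group reports from the individual outputs. Here is a sketch of the group-level call, reusing the same container and bind mounts as the script above:

    singularity run --cleanenv \
            -B /home/mmaclean/scratch:/scratch \
            -B /home/mmaclean/projects/def-flepore/mmaclean:/mmaclean \
            /home/mmaclean/projects/def-flepore/mmaclean/parallel_analysis/containers/images/bids/bids-mriqc--0.16.1.sing \
            /mmaclean/raw /mmaclean/mriqc \
            group \
            -w /scratch/work-mriqc \
            --no-sub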

