nf-core/bytesize: Custom scripts

<a href="https://www.nf-co.re"><img src="https://raw.githubusercontent.com/nf-core/logos/master/byte-size-logos/bytesize-darkbg.svg" width="65%"></a>

## Custom scripts

<img src="https://openmoji.org/data/color/svg/1F431-200D-1F4BB.svg" width=50> Chris Hakkaart | <img src="https://openmoji.org/data/color/svg/E040.svg" width=50> @chris_hakk | <img src="https://openmoji.org/data/color/svg/E045.svg" width=50> christopher-hakkaart

<img src="https://github.com/seqeralabs/logos/blob/master/seqera-logo-white.png?raw=true" width=300>

---

### Custom scripts

1. Background
2. `myfirstpipeline.nf`
3. How to use a `bin/` directory
4. How to use a `templates/` directory
5. Managing dependencies
6. Summary

---

### Background

- Real-world pipelines often use custom scripts written in different languages (Bash, R, Python, and others).
- Nextflow can run code in any scripting language: add the corresponding shebang (for example, `#!/usr/bin/env Rscript`) as the first line of a process `script` block.
- Moving large code blocks out of the main workflow file and into custom scripts keeps the workflow short and readable.
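As a rough local illustration of the two ideas above, this shell sketch writes a stand-alone script whose shebang line selects the interpreter, makes it executable, and calls it by name from a folder placed on `PATH`. It only imitates, outside Nextflow, what happens to a pipeline's `bin/` directory; the script name `myfirstscript.sh` and the temporary folder are hypothetical stand-ins, not part of any nf-core pipeline.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway folder standing in for a pipeline project (hypothetical layout).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/bin"

# A stand-alone custom script. Its first line (the shebang) tells the OS
# which interpreter to use; '#!/usr/bin/env Rscript' would work the same way.
cat > "$workdir/bin/myfirstscript.sh" <<'EOF'
#!/usr/bin/env bash
echo "$1" | tr '[:lower:]' '[:upper:]'
EOF

# The script must be executable to be run directly.
chmod +x "$workdir/bin/myfirstscript.sh"

# Nextflow puts a pipeline's bin/ folder on the task PATH; we imitate that
# in a subshell so the script can be called by name.
for word in this that other; do
  (PATH="$workdir/bin:$PATH"; myfirstscript.sh "$word")
done
```

Running it prints `THIS`, `THAT` and `OTHER` on separate lines, the same result as the pipeline examples that follow.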
---

## `myfirstpipeline.nf`

```bash
process MYSCRIPT {

  input:
  val STR

  output:
  stdout

  script:
  """
  echo $STR | tr '[a-z]' '[A-Z]'
  """
}

workflow {
  Channel.of('this', 'that', 'other') | MYSCRIPT | view
}
```

---

## `myfirstpipeline.nf`

```bash
process MYSCRIPT {

  input:
  val STR

  output:
  stdout

  script:
  """
  #!/usr/bin/env Rscript
  cat(toupper("$STR"))
  """
}

workflow {
  Channel.of('this', 'that', 'other') | MYSCRIPT | view
}
```

---

## `myfirstpipeline.nf`

![](https://i.imgur.com/c216hjl.gif)

---

### `/full/path/to/myfirstscript.r`

```R
#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
cat(toupper(args[1]))
```

### `/full/path/to/myfirstpipeline.nf`

```bash
process MYSCRIPT {

  input:
  val STR

  output:
  stdout

  script:
  """
  /full/path/to/myfirstscript.r ${STR}
  """
}

workflow {
  Channel.of('this', 'that', 'other') | MYSCRIPT | view
}
```

Don't forget to make your script executable with `chmod +x myfirstscript.r`.

---

## How to use a `bin/` directory

![](https://i.imgur.com/JqpFkjV.png)

---

## How to use a `bin/` directory

### `bin/myfirstscript.r`

```R
#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
cat(toupper(args[1]))
```

### `myfirstpipeline.nf`

```bash
process MYSCRIPT {

  input:
  val STR

  output:
  stdout

  script:
  """
  myfirstscript.r ${STR}
  """
}

workflow {
  Channel.of('this', 'that', 'other') | MYSCRIPT | view
}
```

Nextflow adds the pipeline's `bin/` directory to the `PATH` of every task, so the script can be called by name without a full path. It must still be executable.

---

## How to use a `templates/` directory

![](https://i.imgur.com/q8e4gB9.png)

---

## How to use a `templates/` directory

### `templates/myfirstscript.r`

```R
#!/usr/bin/env Rscript
cat(toupper("$STR"))
```

### `myfirstpipeline.nf`

```bash
process MYSCRIPT {

  input:
  val STR

  output:
  stdout

  script:
  template 'myfirstscript.r'
}

workflow {
  Channel.of('this', 'that', 'other') | MYSCRIPT | view
}
```

Nextflow substitutes the value of `$STR` into the template before running it, so it must be quoted to become an R string.

---

## Managing dependencies

- Dependencies of custom scripts are managed the same way as for any other tool.
- A script can require one or more tools or packages.
- Multiple tools can be combined in a single mulled (multi-tool) container.
- Helper tools and documentation are available:
    - [`nf-core modules mulled`](https://nf-co.re/tools/#generate-the-name-for-a-multi-tool-container-image)
    - [`multi-package-containers`](https://github.com/BioContainers/multi-package-containers/blob/master/combinations/hash.tsv)

---

## [`modules/local/salmon_summarizedexperiment.nf`](https://github.com/nf-core/rnaseq/blob/master/modules/local/salmon_summarizedexperiment.nf)

```bash
process SALMON_SUMMARIZEDEXPERIMENT {
    tag "$tx2gene"
    label "process_medium"

    conda (params.enable_conda ? "bioconda::bioconductor-summarizedexperiment=1.20.0" : null)
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/bioconductor-summarizedexperiment:1.20.0--r40_0' :
        'quay.io/biocontainers/bioconductor-summarizedexperiment:1.20.0--r40_0' }"

    input:
    path counts
    path tpm
    path tx2gene

    output:
    path "*.rds"       , emit: rds
    path "versions.yml", emit: versions

    when:
    task.ext.when == null || task.ext.when

    script: // This script is bundled with the pipeline, in nf-core/rnaseq/bin/
    """
    salmon_summarizedexperiment.r \\
        NULL \\
        $counts \\
        $tpm

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
        bioconductor-summarizedexperiment: \$(Rscript -e "library(SummarizedExperiment); cat(as.character(packageVersion('SummarizedExperiment')))")
    END_VERSIONS
    """
}
```

`salmon_summarizedexperiment.r` lives in the pipeline's `bin/` directory, so it is called by name; its R packages come from the conda package or container declared at the top of the module.

---

## [`modules/local/deseq2_qc.nf`](https://github.com/nf-core/rnaseq/blob/master/modules/local/deseq2_qc.nf)

```bash
process DESEQ2_QC {
    label "process_medium"

    // (Bio)conda packages have intentionally not been pinned to a specific version
    // This was to avoid the pipeline failing due to package conflicts whilst creating the environment when using -profile conda
    conda (params.enable_conda ? "conda-forge::r-base bioconda::bioconductor-deseq2 bioconda::bioconductor-biocparallel bioconda::bioconductor-tximport bioconda::bioconductor-complexheatmap conda-forge::r-optparse conda-forge::r-ggplot2 conda-forge::r-rcolorbrewer conda-forge::r-pheatmap" : null)
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/mulled-v2-8849acf39a43cdd6c839a369a74c0adc823e2f91:ab110436faf952a33575c64dd74615a84011450b-0' :
        'quay.io/biocontainers/mulled-v2-8849acf39a43cdd6c839a369a74c0adc823e2f91:ab110436faf952a33575c64dd74615a84011450b-0' }"

    input:
    path counts
    path pca_header_multiqc
    path clustering_header_multiqc

    output:
    path "*.pdf"                , optional:true, emit: pdf
    path "*.RData"              , optional:true, emit: rdata
    path "*pca.vals.txt"        , optional:true, emit: pca_txt
    path "*pca.vals_mqc.tsv"    , optional:true, emit: pca_multiqc
    path "*sample.dists.txt"    , optional:true, emit: dists_txt
    path "*sample.dists_mqc.tsv", optional:true, emit: dists_multiqc
    path "*.log"                , optional:true, emit: log
    path "size_factors"         , optional:true, emit: size_factors
    path "versions.yml"         , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args  = task.ext.args  ?: ''
    def args2 = task.ext.args2 ?: ''
    def label_lower = args2.toLowerCase()
    def label_upper = args2.toUpperCase()
    """
    deseq2_qc.r \\
        --count_file $counts \\
        --outdir ./ \\
        --cores $task.cpus \\
        $args

    if [ -f "R_sessionInfo.log" ]; then
        sed "s/deseq2_pca/${label_lower}_deseq2_pca/g" <$pca_header_multiqc >tmp.txt
        sed -i -e "s/DESeq2 PCA/${label_upper} DESeq2 PCA/g" tmp.txt
        cat tmp.txt *.pca.vals.txt > ${label_lower}.pca.vals_mqc.tsv

        sed "s/deseq2_clustering/${label_lower}_deseq2_clustering/g" <$clustering_header_multiqc >tmp.txt
        sed -i -e "s/DESeq2 sample/${label_upper} DESeq2 sample/g" tmp.txt
        cat tmp.txt *.sample.dists.txt > ${label_lower}.sample.dists_mqc.tsv
    fi

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
        bioconductor-deseq2: \$(Rscript -e "library(DESeq2); cat(as.character(packageVersion('DESeq2')))")
    END_VERSIONS
    """
}
```

`deseq2_qc.r` needs several R and Bioconductor packages, so this module uses a mulled container that bundles all of them.

---

## Summary

- Nextflow can run custom scripts written in many different languages.
- Custom scripts can be stored in the `bin/` or the `templates/` directory.
- Dependencies can be managed using conda and containers.

---

![](https://i.imgur.com/2Xow8Ti.png)