---
title: nf-core/funcscan tutorial
tags: nf-core,documentation,funcscan,pipeline
---
# nf-core/funcscan tutorial
In this tutorial, we will guide you through setting up a run of nf-core/funcscan almost from scratch!
It will show you how to install Nextflow, and how to turn different screening categories (and particular tools within each category) on and off.
We will simulate having performed _de novo_ assembly of two metagenomes, which we wish to screen for antimicrobial resistance genes (ARGs) and antimicrobial peptides (AMPs).
## Prerequisites
For this tutorial you will need basic command line experience.
You will also need at a minimum the following software installed:
- A Unix-like operating system (Linux, macOS, etc.)
- [conda](https://docs.conda.io/en/latest/miniconda.html) (or [mamba](https://github.com/conda-forge/miniforge#mambaforge))
- With channels correctly configured
```bash
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
```
Please see the documentation of the respective tools as necessary.
## Software and Databases
First we will make a directory that we will run our test in, and change into it.
```bash
mkdir funcscan-run
cd funcscan-run/
```
As nf-core/funcscan uses Nextflow to run the pipeline, we will install Nextflow using conda in a separate software environment called `nf-core`
```bash
mamba create -n nf-core -c bioconda nextflow nf-core
```
> ℹ️ Replace `mamba` with `conda` if you did not install `mamba`.
Once this environment is installed, we can activate it
```bash
conda activate nf-core
```
You can check this has successfully installed Nextflow by running
```bash
nextflow -version
```
This should print the version of Nextflow. For now, deactivate the environment again with
```bash
conda deactivate
```
## Make samplesheet
Next we will download some example metagenome assemblies, and prepare our input samplesheet.
First we will make a directory called `samples/`, and download two FASTA files into it.
```bash
mkdir samples/
cd samples/
## The resulting FASTA files will total ~11MB
curl https://www.ebi.ac.uk/metagenomics/api/v1/analyses/MGYA00575326/file/ERZ1664520_FASTA.fasta.gz -o sample1.fasta.gz
curl https://www.ebi.ac.uk/metagenomics/api/v1/analyses/MGYA00575327/file/ERZ1664518_FASTA.fasta.gz -o sample2.fasta.gz
cd ../
```
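Download hiccups can silently truncate files, so it is worth verifying that each download is a valid gzip archive with `gzip -t` (e.g. `gzip -t samples/*.fasta.gz`). The snippet below demonstrates the check on a small throwaway file rather than the real downloads:

```bash
## gzip -t checks archive integrity without decompressing to disk;
## it exits non-zero if the file is truncated or corrupt.
## Demonstrated on a throwaway file; on the real downloads you would
## run: gzip -t samples/*.fasta.gz
printf '>demo\nACGT\n' | gzip > demo.fasta.gz
gzip -t demo.fasta.gz && echo "demo.fasta.gz is intact"
rm demo.fasta.gz
```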
Now we can use some simple bash commands to programmatically create the input CSV samplesheet.
```bash
## List the path of each input file
ls -1 samples/* > paths.txt
## Construct a sample name based on the filename
sed 's#samples/##g;s#.fasta.gz##g' paths.txt > samplenames.txt
## Create the samplesheet, adding a header and then adding the samplenames and paths
echo 'sample,fasta' > samplesheet.csv
paste -d "," samplenames.txt paths.txt >> samplesheet.csv
```
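To see what the `sed` expression is doing, you can run it on a single example path: the two substitutions delete the `samples/` prefix and the `.fasta.gz` suffix, leaving just the sample name.

```bash
## The two sed substitutions strip the directory prefix and the file suffix
echo 'samples/sample1.fasta.gz' | sed 's#samples/##g;s#.fasta.gz##g'
## prints: sample1
```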
The contents of the resulting `samplesheet.csv` should look like:
```
sample,fasta
sample1,samples/sample1.fasta.gz
sample2,samples/sample2.fasta.gz
```
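If you ever edit the samplesheet by hand, a quick sanity check is to count the comma-separated fields on each line; a well-formed two-column samplesheet prints only `2`. The mechanics are shown on an inline example here; point `awk` at `samplesheet.csv` to check the real file.

```bash
## awk -F, splits each line on commas; NF is the number of fields.
## On the real file, run: awk -F, '{ print NF }' samplesheet.csv | sort -u
printf 'sample,fasta\nsample1,samples/sample1.fasta.gz\n' \
    | awk -F, '{ print NF }' | sort -u
## prints: 2
```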
> ⚠️ The `sed` command may not work on macOS, due to differences between GNU `sed` (Linux, used here) and BSD `sed` (macOS)! Check the contents of `samplenames.txt` before continuing!
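If `sed` misbehaves on your system, `basename` is a portable alternative: it strips the directory part of a path and, if given, a trailing suffix, and behaves identically on GNU (Linux) and BSD (macOS) userlands.

```bash
## basename strips the directory prefix and, if given, a trailing suffix
basename samples/sample1.fasta.gz .fasta.gz
## prints: sample1
```

You could then rebuild the names file with a loop such as `for f in samples/*.fasta.gz; do basename "$f" .fasta.gz; done > samplenames.txt`.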
## Pipeline execution preparation
Now we can run funcscan!
As pipelines can sometimes take a long time to run, it is good practice to use a `screen` session, which lets the pipeline run in the background and gives you your terminal back to do other things while waiting.
To create a screen session we can run
```bash
screen -R funcscan-run
```
Next we load our software environment containing Nextflow
```bash
conda activate nf-core
```
And then download the pipeline code
```bash
nextflow pull nf-core/funcscan
```
## Pipeline execution
Now we can construct our pipeline run command!
> ⚠️ Important: do not execute the command until the tutorial says!
First we specify the pipeline name and the version we want to run.
```bash
nextflow run nf-core/funcscan -r 1.0.0 \
```
Then we specify which software environment system to use.
In this case we will use `conda`, as we already have it on our machine for the purposes of this tutorial.
```bash
nextflow run nf-core/funcscan -r 1.0.0 \
-profile conda \
```
Next we specify the input samplesheet we constructed, and where we want the results directory to go.
```bash
nextflow run nf-core/funcscan -r 1.0.0 \
-profile conda \
--input 'samplesheet.csv' \
--outdir './results' \
```
To specify which categories of biomolecules you wish to screen for, we turn on the respective workflows.
In this case, we _don't_ want to screen for biosynthetic gene clusters (BGCs), but we _do_ want to turn on screening for ARGs and AMPs. We do this by specifying the following `--run_*_screening` flags.
```bash
nextflow run nf-core/funcscan -r 1.0.0 \
-profile conda \
--input 'samplesheet.csv' \
--outdir './results' \
--run_amp_screening \
--run_arg_screening \
```
Once we have done this, we can customise which tools are run for each screening category. For example, say we want to skip the [HMMsearch](http://hmmer.org/) modules and [AMPlify](https://github.com/bcgsc/AMPlify) for AMPs, and [deepARG](https://bitbucket.org/gusphdproj/deeparg-ss/src/master/deeparg/) and [RGI](https://github.com/arpcard/rgi) for ARGs. We do this with the `--<category>_skip_<tool>` flags. All other tools in the two screening categories will still run.
```bash
nextflow run nf-core/funcscan -r 1.0.0 \
-profile conda \
--input 'samplesheet.csv' \
--outdir './results' \
--run_amp_screening \
--run_arg_screening \
--amp_skip_hmmsearch \
--amp_skip_amplify \
--arg_skip_deeparg \
--arg_skip_rgi
```
Now hit enter to run the command 🚀!
You should now see progress bars indicating which steps of the pipeline are being executed.
The pipeline will install the software via conda, download any required databases, and screen the contigs listed in the samplesheet with each tool!
If the pipeline is taking a long time (about 37 minutes on a laptop), you can detach from your screen session by pressing `ctrl + a` and then `d` (for detach).
This should return you to your normal terminal prompt.
To re-attach to the screen session, you can type
```bash
screen -r funcscan-run
```
to see the pipeline progress information.
Once the pipeline has completed, you should see a `Pipeline completed successfully!` message.
## Results
<!-- TODO: ~/Downloads/funcscan-run - dir is 1.6G -->
## Clean Up
Once you've explored the results and wish to finish the tutorial you can deactivate the conda environment with:
```bash
conda deactivate
```
You can then exit the screen session by typing:
```bash
exit
```
And delete the directory in which we ran the pipeline.
> ⚠️ If you wish to retain any downloaded databases to re-use them in future pipeline runs, make sure to move these to another location _before_ running the next command!
```bash
rm -r /<path>/<to>/funcscan-run/
```
> ℹ️ Optional: if you also wish to delete the conda environments we made, run `conda env list` to identify the two environments (`amrfinderplus` and `nf-core`), then remove each with `conda env remove -n <name>`.
And you should be done!