# wf-metagenomics usage
## Basic commands cheatsheet
- printing the directory you are currently in: `pwd`
- listing the contents of the directory you're in: `ls` (`ls -a` to also list hidden files)
- entering a directory: `cd <directory name>`
- going up one directory: `cd ..` (you can go up two, three or more directories with `cd ../..`, `cd ../../..`, etc.)
- opening a file to browse its contents: `less <file name>`. Press `q` once you're done to exit. You can also browse a compressed .gz file without decompressing it by using `zless <file name>`.
- moving a file or directory: `mv <file name> <destination>`. You can also use this command to rename a file or directory: `mv <current name> <new name>`.
- copying a file: `cp <file name> <destination>`. You can copy a directory and its contents with `cp -r <directory name> <destination>`.
- removing a file (/!\ THIS IS PERMANENT /!\ ): `rm <file name>`. To remove a directory with all its contents: `rm -r <directory name>`.
- creating a directory: `mkdir <name of directory>`.
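
The commands above can be tried safely in a scratch directory. The short session below is only an illustrative sketch: the directory and file names (`project`, `notes.txt`, `renamed_notes.txt`) are made up for the example, and everything happens inside a temporary directory created with `mktemp -d`.

```shell
# Work inside a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir project                     # create a directory
echo "hello" > notes.txt          # create a small file to play with
cp notes.txt project/             # copy the file into the directory
mv notes.txt renamed_notes.txt    # rename the original file
ls -a project                     # list the directory, hidden entries included
cd project && pwd                 # enter it and print the current location
cd ..                             # go back up one directory
rm -r project                     # remove the directory and its contents (permanent!)
```

After this runs, only `renamed_notes.txt` is left in the scratch directory.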
## Launching the wf-metagenomics pipeline
To launch the workflow with kraken2 (classification via k-mer assignment), keeping only reads between 1300 and 1800 bp:
```
nextflow run epi2me-labs/wf-metagenomics --fastq <path to data> \
--kraken2 --out_dir <path to output directory> --min_len 1300 --max_len 1800
```
To launch the workflow with minimap2 (classification via alignment) with the same length filtering:
```
nextflow run epi2me-labs/wf-metagenomics --fastq data/Elena_cleaned_data \
--out_dir wf-metagenomics_minimap2_filtered \
--min_len 1300 --max_len 1800 --kraken2 false --minimap2 true
```
Remarks:
- the data directory should contain one directory per barcode, each holding that barcode's fastq files. Remove any empty files beforehand, as they cause errors.
- the workflow will create two working directories, `store_dir` and `work`, in the directory you launch the command from.
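
One possible way to find and remove empty files before launching the workflow is sketched below. The directory layout and file names are hypothetical and built in a temporary directory purely for demonstration; on real data, point `find` at your own data directory. The `-empty` and `-delete` options are available in GNU and BSD `find` but are not part of POSIX.

```shell
# Hypothetical layout, created only for this demonstration.
data_dir=$(mktemp -d)
mkdir -p "$data_dir/barcode01" "$data_dir/barcode02"
printf '@read1\nACGT\n+\nIIII\n' > "$data_dir/barcode01/reads.fastq"
: > "$data_dir/barcode02/empty.fastq"     # a zero-byte file

# 1. List zero-byte files first, to review what would be removed.
find "$data_dir" -type f -empty -print

# 2. Delete them once the list looks right (this is permanent).
find "$data_dir" -type f -empty -delete
```

Note that this only catches files of exactly zero bytes: a compressed `.gz` file that contains no reads still has a few bytes of header and would not be listed.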
## Full options list
```
Core options
--fastq [string] A fastq file or directory containing fastq input files or directories of input files.
--max_len [integer] Specify read length upper limit [default: 2000]
--min_len [integer] Specify read length lower limit [default: 200]
--out_dir [string] Directory for output of all user-facing files. [default: output]
--report_name [string] Optional report suffix [default: report]
--sample [string] A sample name for non-multiplexed data. Permissible if passing a file or directory of .fastq(.gz).
--sample_sheet [string] CSV file with columns named `barcode`, `sample_name` and `type`. Permissible if passing a directory containing barcodeXX sub-directories.
--store_dir [string] Where to store initial download of taxonomy database.
--source [string] Sets the default reference, databases and taxonomy used. Choices: ['TARGLOCI'] [default: ncbi_16s_18s]
--taxonomy [string] Optionally override the taxonomy used [.tar.gz or Dir]
Minimap2 options
--minimap2 [boolean] Enables classification via alignment
--reference [string] Specifically override reference [.fna]
--ref2taxid [string] Specifically override ref2taxid mapping. Format is .tsv (refname taxid), no header row.
--minimap2filter [string] Filter output of minimap2 by taxids inc. child nodes, E.g. "9606,1404"
--minimap2exclude [boolean] Invert minimap2filter and exclude the given taxids instead
--split_prefix [boolean] Enable if using a very large reference with minimap2
Kraken2 options
--kraken2 [boolean] Enables classification via kmer-assignment [default: true]
--database [string] Specifically override database [.tar.gz or Dir]
--kraken2exclude [boolean] Invert kraken2exclude and exclude the given taxids instead
--kraken2minimap [boolean] Run minimap2 only on reads classified by Kraken2 [default: true]
--bracken_dist [string] Specifically override bracken kmer dist file
--bracken_length [integer] Set the length value bracken will use [default: 1000]
--watch_path [boolean] Specify if you want the workflow to continuously process input files as they become available in the input --fastq directories.
--read_limit [integer] Use with watch_path if you don't want workflow to run indefinitely, will STOP pipeline when this number of reads have been processed.
--run_indefinitely [boolean] When used with watch_path will watch indefinitely if set to true [default: true]
--batch_size [integer] Number of fastq files to process in each batch [default: 1]
--port [integer] Port number for the kraken server to use [default: 8080]
--bracken_level [string] Set the level that bracken will output at [default: S]
Generic options
--threads [integer] Set max number of threads to use. Note: limited by config executor cpus. [default: 8]
--help [boolean] Display help text.
```
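
The `--sample_sheet` option expects a CSV file with the columns `barcode`, `sample_name` and `type`. As a rough sketch, a starting sample sheet could be generated from the barcode directories as shown below. The directory layout, the `sample_<barcode>` names and the `test_sample` value in the `type` column are all assumptions made for illustration; check the workflow's documentation for the values it accepts and edit the file by hand afterwards.

```shell
# Hypothetical input layout: one barcodeXX directory per sample.
data_dir=$(mktemp -d)
mkdir "$data_dir/barcode01" "$data_dir/barcode02"

sheet="$data_dir/sample_sheet.csv"
echo "barcode,sample_name,type" > "$sheet"    # column names from the help text
for dir in "$data_dir"/barcode*/; do
    barcode=$(basename "$dir")
    # sample_name and type are placeholders (assumed values), to be edited later.
    echo "$barcode,sample_$barcode,test_sample" >> "$sheet"
done
cat "$sheet"
```

The resulting file would then be passed alongside the data directory, for example with `--fastq <path to data> --sample_sheet <path to sample sheet>`.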