
# wf-metagenomics usage

## Basic commands cheatsheet

- printing the directory you are currently in: `pwd`
- listing the contents of the directory you're in: `ls` (`ls -a` to also list hidden files)
- entering a directory: `cd <directory name>`
- going back one directory: `cd ..` (you can go back two, three or more directories with `../..`, `../../..`, etc.)
- opening a file to browse its contents: `less <file name>`. Press `q` once you're done to exit. You can also browse a compressed .gz file without decompressing it by using `zless <file name>`.
- moving a file or directory: `mv <file name> <destination>`. You can also use this command to rename a file or directory: `mv <current name> <new name>`.
- copying a file: `cp <file name> <destination>`. You can copy a directory and its contents with `cp -r <directory name> <destination>`.
- removing a file (/!\ THIS IS PERMANENT /!\ ): `rm <file name>`. To remove a directory with all its contents: `rm -r <directory name>`.
- creating a directory: `mkdir <name of directory>`.

## Launching the wf-metagenomics pipeline

To launch the pipeline with kraken2 (classification via k-mer assignment), keeping only reads between 1300 and 1800 bp:

```
nextflow run epi2me-labs/wf-metagenomics --fastq <path to data> \
  --kraken2 --out_dir <path to output directory> --min_len 1300 --max_len 1800
```

To launch the pipeline with minimap2 (classification via alignment) with the same filtering:

```
nextflow run epi2me-labs/wf-metagenomics --fastq data/Elena_cleaned_data \
  --out_dir wf-metagenomics_minimap2_filtered \
  --min_len 1300 --max_len 1800 --kraken2 false --minimap2 true
```

Remarks:

- the data directory is expected to contain one directory per barcode, with the fastq files inside.
- remove any empty files, as they generate errors.
- the workflow will create two working directories, `store_dir` and `work`, in the directory you launch the command from.

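
The remarks above about the input layout and empty files can be checked before launching. The helpers below are a minimal sketch, not part of the workflow: the function names `list_empty_fastq` and `count_fastq_per_barcode` are made up for illustration, and the sketch assumes that the barcode directories sit directly under the data directory and that read files end in `.fastq` or `.fastq.gz`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; these are NOT part of wf-metagenomics.

# Print every zero-byte fastq file found under the given data directory.
list_empty_fastq() {
    find "$1" -type f \( -name '*.fastq' -o -name '*.fastq.gz' \) -empty
}

# Print "<barcode directory><TAB><number of fastq files>" for each sub-directory.
count_fastq_per_barcode() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue   # skip the unexpanded pattern when there are no sub-directories
        n=$(find "$d" -type f \( -name '*.fastq' -o -name '*.fastq.gz' \) | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$(basename "$d")" "$n"
    done
}
```

For example, `list_empty_fastq data/Elena_cleaned_data` lists the files to review, and `count_fastq_per_barcode data/Elena_cleaned_data` shows whether any barcode directory has no reads at all. Note that a gzipped file containing no reads is not zero bytes, so this check only catches truly empty files. Once you have reviewed the list, remove the files with `rm` (this is permanent).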
## Full options list

```
Core options
  --fastq            [string]   A fastq file or directory containing fastq input files or directories of input files.
  --max_len          [integer]  Specify read length upper limit [default: 2000]
  --min_len          [integer]  Specify read length lower limit [default: 200]
  --out_dir          [string]   Directory for output of all user-facing files. [default: output]
  --report_name      [string]   Optional report suffix [default: report]
  --sample           [string]   A sample name for non-multiplexed data. Permissible if passing a file or directory of .fastq(.gz).
  --sample_sheet     [string]   CSV file with columns named `barcode`, `sample_name` and `type`. Permissible if passing a directory containing barcodeXX sub-directories.
  --store_dir        [string]   Where to store initial download of taxonomy database.
  --source           [string]   Sets the default reference, databases and taxonomy used. Choices: ['TARGLOCI'] [default: ncbi_16s_18s]
  --taxonomy         [string]   Optionally override the taxonomy used [.tar.gz or Dir]

Minimap2 options
  --minimap2         [boolean]  Enables classification via alignment
  --reference        [string]   Specifically override reference [.fna]
  --ref2taxid        [string]   Specifically override ref2taxid mapping. Format is .tsv (refname taxid), no header row.
  --minimap2filter   [string]   Filter output of minimap2 by taxids inc. child nodes, E.g. "9606,1404"
  --minimap2exclude  [boolean]  Invert minimap2filter and exclude the given taxids instead
  --split_prefix     [boolean]  Enable if using a very large reference with minimap2

Kraken2 options
  --kraken2          [boolean]  Enables classification via kmer-assignment [default: true]
  --database         [string]   Specifically override database [.tar.gz or Dir]
  --kraken2exclude   [boolean]  Invert kraken2exclude and exclude the given taxids instead
  --kraken2minimap   [boolean]  Run minimap2 only on reads classified by Kraken2 [default: true]
  --bracken_dist     [string]   Specifically override bracken kmer dist file
  --bracken_length   [integer]  Set the length value bracken will use [default: 1000]
  --watch_path       [boolean]  Specify if you want the workflow to continuously process input files as they become available in the input --fastq directories.
  --read_limit       [integer]  Use with watch_path if you don't want workflow to run indefinitely, will STOP pipeline when this number of reads have been processed.
  --run_indefinitely [boolean]  When used with watch_path will watch indefinitely if set to true [default: true]
  --batch_size       [integer]  Number of fastq files to process in each batch [default: 1]
  --port             [integer]  Port number for the kraken server to use [default: 8080]
  --bracken_level    [string]   Set the level that bracken will output at [default: S]

Generic options
  --threads          [integer]  Set max number of threads to use. Note: limited by config executor cpus. [default: 8]
  --help             [boolean]  Display help text.
```