---
tags: JPL-HBCU
---

> These notes were eventually modified further and stored [here](https://github.com/AstrobioMike/JPL-HBCU-2020/wiki/Building-the-reference-databases)

# kraken2/bracken setup and read-classification example

The complete setup took about 3 hours when run as shown below. (Initially built on an S1.Xxlarge instance.)

[toc]

## Creating conda environment

```bash
conda create -n kraken2 -c conda-forge -c bioconda -c defaults kraken2 bracken
conda activate kraken2
```

## Setting up kraken2 standard database

Following along with the [kraken2 manual](https://github.com/DerrickWood/kraken2/wiki/Manual#standard-kraken-2-database).

```bash
mkdir kraken2-standard-db
```

### Downloading and building reference database

Downloading the reference info and building the database (took ~XX minutes **STARTED 2:30**; note that this also masks low-complexity regions by default):

```bash
kraken2-build --standard --db kraken2-standard-db/ --threads 42
```

## Setting up Bracken

Roughly following along with the [Bracken documentation](https://github.com/jenniferlu717/Bracken#bracken-253-abundance-estimation).

```bash
# -l is the read length the database is built for; it should match the -r value given to bracken later
bracken-build -d kraken2-standard-db -t 42 -l 150
```

## Clean up

Removing intermediate files (saves a lot of space):

```bash
kraken2-build --clean --db kraken2-standard-db/
```

## Example run

Getting tiny example data:

```bash
curl -L -o sample-1-R1.fq.gz https://ndownloader.figshare.com/files/23237460
curl -L -o sample-1-R2.fq.gz https://ndownloader.figshare.com/files/23237460
```

#### Kraken2

This is just an example. Parameters and settings are not special here. Consult their [documentation](https://github.com/DerrickWood/kraken2/wiki/Manual) and help menu (`kraken2 -h`) while figuring out how you want to run things 🙂

```bash
kraken2 --db kraken2-standard-db/ --threads 6 \
        --output sample-1-kraken2-out.txt --report sample-1-kraken2-report.txt \
        --paired sample-1-R1.fq.gz sample-1-R2.fq.gz
```

#### Bracken

Same deal, this is just an example. Parameters and settings are not special here. Consult their [documentation](https://github.com/jenniferlu717/Bracken#step-3-run-bracken-for-abundance-estimation) and help menu (`bracken -h`) while figuring out how you want to run things 🙂

```bash
bracken -r 150 -d kraken2-standard-db/ -i sample-1-kraken2-report.txt \
        -o sample-1-bracken-out.tsv
```

<br>

> **NOTE**
> Depending on how things are being evaluated, we may or may not need/want the `bracken` step. If the goal is to track what each individual read was assigned to, that might be better done with just the `kraken2` output. If the goal is to compare expected relative abundances of taxa, that would be better done with the `bracken` output.
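
To make the distinction in the note above a bit more concrete, here is a minimal sketch of how each output might be inspected. The function names (`count_read_status`, `reads_for_taxid`, `top_bracken_taxa`) are hypothetical helpers made up for illustration, not part of kraken2 or bracken. The sketch assumes kraken2's standard tab-delimited per-read output (classification flag, read ID, taxid in columns 1 to 3, with `--use-names` not set) and bracken's default table with a header line and `fraction_total_reads` in column 7.

```bash
# Hypothetical helpers for illustration only; column positions are assumptions
# based on the default kraken2 per-read output and the default bracken table.

# Count classified (C) vs. unclassified (U) reads in the kraken2 per-read output.
# $1 = kraken2 output file (column 1 holds the C/U flag)
count_read_status() {
    cut -f 1 "$1" | sort | uniq -c
}

# Print the IDs of reads assigned to one taxid.
# $1 = kraken2 output file, $2 = NCBI taxid (column 2 = read ID, column 3 = taxid)
reads_for_taxid() {
    awk -F '\t' -v taxid="$2" '$3 == taxid { print $2 }' "$1"
}

# Print the N taxa with the highest estimated relative abundance.
# $1 = bracken output table, $2 = number of taxa to show (default 10)
top_bracken_taxa() {
    tail -n +2 "$1" \
        | sort -t "$(printf '\t')" -k 7,7nr \
        | head -n "${2:-10}" \
        | cut -f 1,7
}

# Example usage with the files produced above (562 is just an arbitrary example taxid):
#   count_read_status sample-1-kraken2-out.txt
#   reads_for_taxid sample-1-kraken2-out.txt 562
#   top_bracken_taxa sample-1-bracken-out.tsv 5
```

The first two helpers only touch the `kraken2` per-read output, which is where individual read assignments live. The third only touches the `bracken` table, which holds re-estimated abundances per taxon and no per-read information.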