---
tags: JPL-HBCU
---
> These were eventually modified further and stored [here](https://github.com/AstrobioMike/JPL-HBCU-2020/wiki/Building-the-reference-databases)
# kraken2/bracken setup and read-classification example
The complete setup took about 3 hours when run as shown below. (Initially built on an S1.Xxlarge instance.)
[toc]
## Creating conda environment
```bash
conda create -n kraken2 -c conda-forge -c bioconda -c defaults kraken2 bracken
conda activate kraken2
```
## Setting up kraken2 standard database
Following along with the [kraken2 manual](https://github.com/DerrickWood/kraken2/wiki/Manual#standard-kraken-2-database).
```bash
mkdir kraken2-standard-db
```
### Downloading and building reference database
Downloading the reference data and building the database (took ~XX minutes **STARTED 2:30**). Note that this also masks low-complexity regions by default:
```bash
kraken2-build --standard --db kraken2-standard-db/ --threads 42
```
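Since the build runs for a long time, it can be worth confirming that it actually finished. The kraken2 manual describes a database as a directory holding at least `hash.k2d`, `opts.k2d`, and `taxo.k2d`, so a minimal sketch of a check might look like the following. The function name `check_kraken2_db` is a hypothetical helper made up for this example, not part of kraken2.

```bash
# Hypothetical helper (not part of kraken2): check that the three core
# database files exist and are non-empty in the given directory.
check_kraken2_db() {
    local db_dir="$1"
    local missing=0
    local f
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ ! -s "${db_dir}/${f}" ]; then
            echo "missing or empty: ${db_dir}/${f}"
            missing=1
        fi
    done
    return "${missing}"
}

if check_kraken2_db kraken2-standard-db; then
    echo "kraken2 database files found"
fi
```

For a closer look at what went into the database, `kraken2-inspect --db kraken2-standard-db/` is also available (see the manual).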
## Setting up Bracken
Roughly following along from [here](https://github.com/jenniferlu717/Bracken#bracken-253-abundance-estimation).
```bash
bracken-build -d kraken2-standard-db -t 42 -l 150
```
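The `-l 150` given to `bracken-build` is the read length, and it needs to match the `-r` value used later when running `bracken`. Bracken stores its k-mer distribution in a file named after that read length (`database150mers.kmer_distrib` here), so a rough check that the build covers a given read length could be sketched as below. `has_bracken_distrib` is a hypothetical helper name used only for illustration.

```bash
# Hypothetical helper: succeed if a Bracken k-mer distribution file exists
# for the given read length (the value passed to `bracken-build -l`).
has_bracken_distrib() {
    local db_dir="$1"
    local read_len="$2"
    [ -s "${db_dir}/database${read_len}mers.kmer_distrib" ]
}

if has_bracken_distrib kraken2-standard-db 150; then
    echo "Bracken files for 150-bp reads are in place"
else
    echo "no Bracken files for 150-bp reads found in kraken2-standard-db"
fi
```

If reads of another length will be analyzed, `bracken-build` can be run again with a different `-l` value against the same database.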
## Clean up
Removing intermediate files (saves a lot of space):
```bash
kraken2-build --clean --db kraken2-standard-db/
```
## Example run
Getting tiny example data:
```bash
curl -L -o sample-1-R1.fq.gz https://ndownloader.figshare.com/files/23237460
# NOTE: this URL is the same as the R1 URL above; confirm the R2 file ID before relying on it
curl -L -o sample-1-R2.fq.gz https://ndownloader.figshare.com/files/23237460
```
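A quick sanity check on the downloads can save confusion later. The sketch below is one way to do it, assuming standard 4-line FASTQ records. It tests gzip integrity, counts reads in each file, and flags the case where both files are byte-identical. `count_fastq_reads` is a hypothetical helper name.

```bash
# Hypothetical helper: count reads in a gzipped FASTQ file,
# assuming the usual 4 lines per read (no wrapped sequences).
count_fastq_reads() {
    gzip -cd "$1" | awk 'END { print NR / 4 }'
}

r1=sample-1-R1.fq.gz
r2=sample-1-R2.fq.gz

if [ -f "${r1}" ] && [ -f "${r2}" ]; then
    # gzip -t exits non-zero if a download was truncated or corrupted
    gzip -t "${r1}" "${r2}" && echo "both files pass the gzip integrity check"

    # paired files should hold the same number of reads
    echo "R1 reads: $(count_fastq_reads "${r1}")"
    echo "R2 reads: $(count_fastq_reads "${r2}")"

    # byte-identical files would mean the same file was fetched twice
    if cmp -s "${r1}" "${r2}"; then
        echo "warning: R1 and R2 are identical files"
    fi
fi
```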
#### Kraken2
This is just an example. Parameters and settings are not special here. Consult their [documentation](https://github.com/DerrickWood/kraken2/wiki/Manual) and help menu (`kraken2 -h`) while figuring out how you want to run things 🙂
```bash
kraken2 --db kraken2-standard-db/ --threads 6 \
--output sample-1-kraken2-out.txt --report sample-1-kraken2-report.txt \
--paired sample-1-R1.fq.gz sample-1-R2.fq.gz
```
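In the per-read file written by `--output`, the first column is `C` for a classified read and `U` for an unclassified one. As a rough first look at how the run went, those codes can be tallied as sketched below. `tally_kraken2_calls` is a hypothetical helper name, not a kraken2 command.

```bash
# Hypothetical helper: tally classified (C) and unclassified (U) reads from
# kraken2's standard per-read output, printing "<code><TAB><count>" per line.
tally_kraken2_calls() {
    cut -f1 "$1" | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

if [ -f sample-1-kraken2-out.txt ]; then
    tally_kraken2_calls sample-1-kraken2-out.txt
fi
```

With paired input, each line of that file represents one read pair, so the counts are pairs rather than individual reads.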
#### Bracken
Same deal: this is just an example, and the parameters and settings are not special here. Consult their [documentation](https://github.com/jenniferlu717/Bracken#step-3-run-bracken-for-abundance-estimation) and help menu (`bracken -h`) while figuring out how you want to run things 🙂
```bash
bracken -r 150 -d kraken2-standard-db/ -i sample-1-kraken2-report.txt \
-o sample-1-bracken-out.tsv
```
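The Bracken output is a tab-separated table with a header line, where the last (7th) column, `fraction_total_reads`, holds the estimated relative abundance. A minimal sketch for listing the most abundant taxa is below. `top_bracken_taxa` is a hypothetical helper name, and the default of 10 rows is an arbitrary choice for this example.

```bash
# Hypothetical helper: print the name and estimated fraction of the top N taxa
# from a Bracken output table, sorted by fraction_total_reads (column 7).
top_bracken_taxa() {
    local bracken_out="$1"
    local n="${2:-10}"
    # skip the header, sort numerically on column 7 (descending), keep N rows
    tail -n +2 "${bracken_out}" \
        | LC_ALL=C sort -t "$(printf '\t')" -k7,7nr \
        | awk -v n="${n}" 'NR <= n' \
        | cut -f1,7
}

if [ -f sample-1-bracken-out.tsv ]; then
    top_bracken_taxa sample-1-bracken-out.tsv 10
fi
```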
<br>
> **NOTE**
Depending on how things are being evaluated, we may or may not need/want the `bracken` step. If the goal is to track what each individual read was assigned to, that might be better done with just the `kraken2` output. If the goal is to compare expected relative abundances of taxa, that would be better done with the `bracken` output.