## Using sourmash for viral taxonomy + what viruses are in which sample
- Replace read mapping, or at least get a quick idea of which viruses known from elsewhere are in a sample
- If it is not a satisfactory replacement, can still run fastgather (fg) against a database, pull out only the sequences that sourmash finds, and then read map to those. This makes the whole read-mapping process easier, as the db will be smaller
- Taxonomy: genomad v sourmash
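
A rough sketch of the "pull out only the sequences that sourmash finds" step, to get a smaller read-mapping db. Assumptions: the gather-style CSV has a `match_name` column whose first word is the FASTA record ID (check the actual header of the fastgather output), and the helper names `matched_names` and `filter_fasta` are made up for this note. The example data is toy data, not real results.

```python
import csv
import io


def matched_names(gather_csv, column="match_name"):
    """Collect reference IDs from a gather-style CSV (column name is an assumption)."""
    names = set()
    for row in csv.DictReader(gather_csv):
        parts = (row.get(column) or "").split()
        if parts:
            names.add(parts[0])  # keep only the first word, i.e. the record ID
    return names


def filter_fasta(fasta_in, fasta_out, keep):
    """Copy only FASTA records whose ID (first word of the header) is in `keep`."""
    writing = False
    kept = 0
    for line in fasta_in:
        if line.startswith(">"):
            header = line[1:].split()
            writing = bool(header) and header[0] in keep
            kept += writing
        if writing:
            fasta_out.write(line)
    return kept


# Toy example with in-memory files; swap in open("...") handles for real data.
gather_csv = io.StringIO(
    "query_name,match_name,intersect_bp\n"
    "sampleA,vOTU_1 some description,30000\n"
    "sampleA,vOTU_3,12000\n"
)
fasta = io.StringIO(">vOTU_1 desc\nACGT\n>vOTU_2\nGGCC\n>vOTU_3\nTTAA\n")
subset = io.StringIO()
n_kept = filter_fasta(fasta, subset, matched_names(gather_csv))
```

The subset FASTA can then be indexed and used as the read-mapping reference instead of the full db.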
## Compare read mapping and fastgather
- Can we retrieve approximately the same number of vOTUs?
- alpha and beta diversity
- thresholds
- map to 95% dedup, then also fastgather against vOTU sketches.
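
A minimal sketch for the first question above (do both methods retrieve roughly the same vOTUs?), assuming each method has already been reduced to a set of detected vOTU IDs per sample. `compare_votus` is a hypothetical helper, and Jaccard similarity is just one convenient overlap measure, not necessarily the one to use in the end.

```python
def compare_votus(mapped, gathered):
    """Compare vOTU IDs detected by read mapping vs. fastgather for one sample."""
    mapped, gathered = set(mapped), set(gathered)
    shared = mapped & gathered
    union = mapped | gathered
    return {
        "n_mapped": len(mapped),        # richness according to read mapping
        "n_gathered": len(gathered),    # richness according to fastgather
        "n_shared": len(shared),
        "only_mapped": sorted(mapped - gathered),
        "only_gathered": sorted(gathered - mapped),
        # Jaccard similarity of the two detection sets (1.0 if both are empty)
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Toy example, not real results.
result = compare_votus(
    mapped={"vOTU_1", "vOTU_2", "vOTU_3"},
    gathered={"vOTU_2", "vOTU_3", "vOTU_4"},
)
```

Running this per sample at different detection thresholds would show where the two methods start to diverge.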
Ran fastgather against the vOTU db, but the results have no names, I think because I did something wrong when creating the signatures. Without names, I can still link the genome length and bp recovered, and check each match against the approx 75% coverage threshold.
Resketch.
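
A sketch of the length-vs-bp-recovered check, under some loud assumptions: each match can be tied to a genome by some key (for example a signature md5; which key actually works here is not settled), the bp recovered per match is available (for example an `intersect_bp`-style column), and genome lengths are known per key. `coverage_table` is a hypothetical helper. The bp value is a k-mer-based estimate, so the fraction is capped at 1.0.

```python
def coverage_table(matches, genome_lengths, min_fraction=0.75):
    """Return (key, bp, length, fraction, passes) for each match with a known length.

    matches: iterable of (key, bp_recovered) pairs
    genome_lengths: dict mapping key -> genome length in bp
    """
    rows = []
    for key, bp in matches:
        length = genome_lengths.get(key)
        if not length:
            continue  # no length known for this key, cannot compute coverage
        fraction = min(bp / length, 1.0)  # estimate can overshoot; cap at 100%
        rows.append((key, bp, length, fraction, fraction >= min_fraction))
    return rows


# Toy example, not real results.
table = coverage_table(
    matches=[("a", 30000), ("b", 5000), ("c", 100)],
    genome_lengths={"a": 40000, "b": 10000},
)
```

Once the signatures are resketched with names, the same function works with the vOTU name as the key.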
## Genomad result
How I ran the commands:
- vContact2, genomad, [fastmultigather](https://github.com/AnneliektH/2023-swine-sra/blob/main/sourmash/viral_taxonomy.ipynb)
- [Snakefile](https://github.com/AnneliektH/2023-swine-sra/blob/main/sourmash/snakefiles/Snakefile_virtax) to go from fmg --> taxonomy