## Using sourmash for viral taxonomy + what viruses are in which sample
- Replace read mapping / get a quick idea of which viruses from elsewhere are in a sample
- In case it doesn't replace read mapping satisfactorily, can still do a fastgather (fg) against a database, then pull out only the sequences that sourmash finds, and then read map to those. Makes the whole read mapping process easier, as the db will be smaller
- Taxonomy: genomad vs sourmash

## Compare read mapping and fastgather
- Can we retrieve approx. the same number of vOTUs?
- Alpha and beta diversity
- Thresholds
- Map to 95% dedup, then also fastgather against vOTU sketches

Did fastgather against the vOTU db, but the matches have no names, because I think I did something wrong when creating the signatures. If there are no names, I can link the genome length and bp recovered, and find the approx. 75% coverage for each. Resketch.

## Genomad result
How did I run commands:
- vContact2, genomad, [fastmultigather](https://github.com/AnneliektH/2023-swine-sra/blob/main/sourmash/viral_taxonomy.ipynb)
- [Snakefile](https://github.com/AnneliektH/2023-swine-sra/blob/main/sourmash/snakefiles/Snakefile_virtax) to go from fmg --> taxonomy
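
Rough sketch for the fastgather comparison (linking genome length to bp recovered when the matches have no names). This is not what was actually run, only one way it could be done. Assumptions: each fastgather row can be keyed by something other than the name (here a `match_md5` column), the recovered bp sit in an `intersect_bp` column, and there is one row per match for a single query sample. The function name `covered_votus`, the `genome_lengths` mapping, and both column names are hypothetical and should be checked against the real CSV header.

```python
def covered_votus(gather_rows, genome_lengths, min_coverage=0.75):
    """Return {match_key: coverage} for matches covered at >= min_coverage.

    gather_rows    : iterable of dicts, e.g. from csv.DictReader (assumed columns)
    genome_lengths : dict mapping the same key to vOTU genome length in bp
    """
    result = {}
    for row in gather_rows:
        key = row["match_md5"]             # assumed key column, used instead of the missing name
        recovered_bp = int(row["intersect_bp"])  # assumed column: estimated bp shared with the vOTU
        length = genome_lengths.get(key)
        if not length:
            continue                       # no known length for this match, skip it
        coverage = min(recovered_bp / length, 1.0)  # cap at 1.0, the bp value is an estimate
        if coverage >= min_coverage:
            result[key] = coverage
    return result


# Toy data only, not real results
rows = [
    {"match_md5": "aaa", "intersect_bp": "40000"},
    {"match_md5": "bbb", "intersect_bp": "10000"},
]
lengths = {"aaa": 50000, "bbb": 40000}
print(covered_votus(rows, lengths))
```

Since the recovered bp come from scaled sketches, the coverage is approximate, so the 75% cutoff is probably best treated as a rough threshold rather than a direct equivalent of read-mapping breadth.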