---
tags: meetings, titus, agenda, 2023, May 2023
---

Titus/Tessa Agenda, 05/04/2023
===

## JOSS Paper
- most authors signed off, will go tally missing responses.

## SpillOver project
- seq classification via sourmash tax approach:
  - needed (done): [fix tax grep md5s PR](https://github.com/sourmash-bio/sourmash/pull/2602)
  - needs (in progress): [ICTV taxonomy PR](https://github.com/sourmash-bio/sourmash/pull/2608)
  - currently running subset to see how well k=21,scaled=1 works for classification.
  - of 100 (`cat output.spillover/gather/dna/*.k21.gather.txt | grep total | sort | uniq -c`):
    - `28 found 1 matches total;`
    - `1 found 11 matches total;`
    - `12 found 2 matches total;`
    - `30 found 3 matches total;`
    - `12 found 4 matches total;`
    - `7 found 5 matches total;`
    - `4 found 6 matches total;`
    - `5 found 7 matches total;`
    - `1 found 9 matches total;`
  - not sure how useful these matches are yet
- Viral LINs for classification, clustering
  - need to reach out + talk with Boris, Reza et al, see what (if anything) they're already doing in the viral space
- CD-HIT clustering
  - (to do)

## FRO Updates
- Website plan update
  - Adam, Suzanne website is far ahead of my efforts. Hosting at USDA is hard because of paperwork for usda.gov site.
  - suggested us hosting
    - they said they could package and make public within ~a week
    - but it still needs cleanup, some work, so we should consider it a soft launch
    - Would add "A collaborative effort between UC Davis, JGI and USDA" at base of website
    - We could fork into sourmash-bio github, but not sure about maintenance/support responsibilities.
  - In the meantime, I will write sourmash docs on all the large-scale search sites (greyhound, mastiff), search parameters, guidelines, etc.
- [proposal](https://docs.google.com/document/d/1z4Nz3Tl1ycWHK2XNCmS3Lg9ccodRdbSeYV08AGKIg_A/edit): **revamped first two paragraphs** for pathogen workflows, urgency. Still need to revamp remaining text, want to better describe milestones + risks.
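
Side note on the k=21 gather tally in the SpillOver section: `sort | uniq -c` orders the counts lexically (1, 11, 2, ...). A minimal Python sketch of the same tally, sorted numerically and with fractions, might look like the following. It assumes each gather output contains a line of the form `found N matches total;`, as in the tally above; the function names are hypothetical and not part of sourmash.

```python
import glob
import re
from collections import Counter

# Assumed line format, taken from the tally in these notes:
#   "found 3 matches total;"
MATCH_LINE = re.compile(r"found (\d+) matches total")


def tally_match_counts(lines):
    """Count how many gather outputs reported each number of matches."""
    tally = Counter()
    for line in lines:
        m = MATCH_LINE.search(line)
        if m:
            tally[int(m.group(1))] += 1
    return tally


def summarize(tally):
    """Return (n_matches, n_files, fraction) rows sorted by n_matches."""
    total = sum(tally.values())
    return [(n, tally[n], tally[n] / total) for n in sorted(tally)]


if __name__ == "__main__":
    # Path pattern copied from the shell command in the notes.
    lines = []
    for path in glob.glob("output.spillover/gather/dna/*.k21.gather.txt"):
        with open(path) as fh:
            lines.extend(fh)
    for n_matches, n_files, frac in summarize(tally_match_counts(lines)):
        print(f"{n_matches:>3} matches: {n_files:>4} files ({frac:.0%})")
```

This only reproduces the count; it says nothing about whether the matches are useful for classification.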

### Some highlights from meetings
- Adam + Suzanne / USDA chat
  - pathogen dashboard idea
  - neat metadata percentage info (e.g. 40% of datasets in mastiff have lat/long info)
  - list of high-priority pathogens
- Chris Gulvik (CDC) chat
  - "pathogen-agnostic" tool is lacking. CDC teams are working on an in-house tool, but still a long way off
  - wastewater surveillance folks may be interested - chatting with Shatavia Morrison 5/8/23
  - setting up a monitoring effort in/for Thailand. Resources are an issue, lightweight local tools would be ideal
    - suggested local mastiff db's for these sorts of situations - NOT web (e.g. MiSeq data transfer usually takes 10min in their lab, took 3 days there)
- Amanda
  - great context + suggestions re handling datasets shared across databases (e.g. present both in SRA, non-SRA)
  - dataset discovery
- MetaSeek / Adrienne Hoarfrost chat
  - metadata search, start from MetaSeek approach
  - Metadata imputation:
    - MetaSeek used a series of manual rules. Learn from this + use an ML approach to do better
  - build workflow for ml/deep learning dataset identification
    - build test/train split (not random, make sure sequence similarity is appropriate). Also suggest a small, representative subset for testing
- **Rob Finn**: meeting 5/15
- **Rayan Chikhi**, **Rodney Brister** - no response

## Other
- contig-level workflow could be used for spillover, might be