---
tags: meetings, titus, agenda, 2023, May 2023
---
Titus/Tessa Agenda, 05/04/2023
===
## JOSS Paper
- most authors have signed off; will tally the missing responses.
## SpillOver project
- seq classification via sourmash tax approach:
- needed (done): [fix tax grep md5s PR](https://github.com/sourmash-bio/sourmash/pull/2602)
- needs (in progress): [ICTV taxonomy PR](https://github.com/sourmash-bio/sourmash/pull/2608)
- currently running subset to see how well k=21,scaled=1 works for classification.
- of 100 gather outputs:
> cat output.spillover/gather/dna/*.k21.gather.txt | grep total | sort | uniq -c
28 found 1 matches total;
1 found 11 matches total;
12 found 2 matches total;
30 found 3 matches total;
12 found 4 matches total;
7 found 5 matches total;
4 found 6 matches total;
5 found 7 matches total;
1 found 9 matches total;
- not sure how useful these matches are yet
- Viral LINs for classification, clustering
- need to reach out + talk with Boris, Reza et al. to see what (if anything) they're already doing in the viral space
- CD-HIT clustering
- (to do)
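
The match tally from the k=21 subset run above can also be produced in Python, which makes it easier to ask follow-up questions of the counts. This is a minimal sketch, assuming each gather output contains one summary line of the form `found N matches total;` as in the listing above. The function names are hypothetical and not part of sourmash.

```python
import glob
import re
from collections import Counter

# Summary line that the shell pipeline above greps for,
# e.g. "found 3 matches total;"
TOTAL_RE = re.compile(r"found (\d+) matches total")


def tally_match_counts(texts):
    """Count how many gather outputs reported each number of matches."""
    tally = Counter()
    for text in texts:
        for match in TOTAL_RE.finditer(text):
            tally[int(match.group(1))] += 1
    return tally


def read_outputs(pattern):
    """Read every file matching a glob pattern and return the contents."""
    texts = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            texts.append(handle.read())
    return texts


def fraction_with_at_most(tally, n):
    """Fraction of outputs that reported n or fewer matches."""
    total = sum(tally.values())
    if total == 0:
        return 0.0
    return sum(count for k, count in tally.items() if k <= n) / total
```

For example, `tally_match_counts(read_outputs("output.spillover/gather/dna/*.k21.gather.txt"))` would give the same counts as the `sort | uniq -c` pipeline, keyed by number of matches.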
## FRO Updates
- Website plan update
- Adam and Suzanne's website is far ahead of my efforts. Hosting at USDA is hard because of the paperwork required for a usda.gov site.
- they suggested that we host it
- they said they could package and make public within ~a week
- but it still needs cleanup, some work, so we should consider it a soft launch
- Would add "A collaborative effort between UC Davis, JGI and USDA" at base of website
- We could fork into sourmash-bio github, but not sure about maintenance/support responsibilities.
- In the meantime, I will write sourmash docs on all the large-scale search sites (greyhound, mastiff), search parameters, guidelines, etc.
- [proposal](https://docs.google.com/document/d/1z4Nz3Tl1ycWHK2XNCmS3Lg9ccodRdbSeYV08AGKIg_A/edit): **revamped first two paragraphs** for pathogen workflows, urgency. Still need to revamp remaining text, want to better describe milestones + risks.
### Some highlights from meetings
- Adam + Suzanne / USDA chat
- pathogen dashboard idea
- neat metadata percentage info (e.g. 40% of datasets in mastiff have lat/long info)
- list of high-priority pathogens
- Chris Gulvik (CDC) chat
- a "pathogen-agnostic" tool is lacking. CDC teams are working on an in-house tool, but it is still a long way off
- wastewater surveillance folks may be interested
- chatting with Shatavia Morrison 5/8/23
- setting up a monitoring effort in/for Thailand. Resources are an issue, so lightweight local tools would be ideal
- suggested local mastiff DBs for these sorts of situations
- NOT web-based (e.g. a MiSeq data transfer that usually takes 10 min in their lab took 3 days there)
- Amanda
- great context + suggestions re handling datasets shared across databases (e.g. present in both SRA and non-SRA databases)
- dataset discovery
- MetaSeek / Adrienne Hoarfrost chat
- metadata search, start from MetaSeek approach
- Metadata imputation:
- MetaSeek used a series of manual rules. Learn from this + use an ML approach to do better
- build workflow for ML/deep learning dataset identification
- build test/train split (not random, make sure sequence similarity is appropriate). Also suggest a small, representative subset for testing
- **Rob Finn**: meeting 5/15
- **Rayan Chikhi**, **Rodney Brister** - no response
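
The non-random test/train split suggested in the MetaSeek chat above could look roughly like the following. This is only a sketch under stated assumptions: it assumes every dataset has already been assigned to a sequence-similarity cluster by some upstream step (the notes do not say how), and it keeps each cluster entirely on one side of the split. `split_by_cluster` and the `cluster_of` mapping are hypothetical names.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_of, test_fraction=0.2, seed=0):
    """Split dataset ids into train/test without splitting any cluster.

    cluster_of: dict mapping dataset id -> cluster id (assumed input,
    e.g. from a prior sequence-similarity clustering step).
    Returns (train_ids, test_ids) as sorted lists.
    """
    members = defaultdict(list)
    for dataset_id, cluster_id in cluster_of.items():
        members[cluster_id].append(dataset_id)

    # Shuffle clusters, not individual datasets, so similar sequences
    # never end up on both sides of the split.
    cluster_ids = sorted(members)
    random.Random(seed).shuffle(cluster_ids)

    target = test_fraction * len(cluster_of)
    train, test = [], []
    for cluster_id in cluster_ids:
        # Fill the test set first, one whole cluster at a time.
        bucket = test if len(test) < target else train
        bucket.extend(members[cluster_id])
    return sorted(train), sorted(test)
```

Because whole clusters are assigned at once, the test set can overshoot `test_fraction` when clusters are large. The small, representative testing subset mentioned above would need a separate selection step.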
## Other
- contig-level workflow could be used for spillover, might be