Tessa Pierce Ward

@bluegenes

Joined on Jun 9, 2017

  • This follows from the Introduction to FracMinHash sketching for sequence comparison and Comparing genomes metagenomes using FracMinHash sketches modules. Please refer to the installation instructions in the introduction for installing sourmash and related dependencies. With metagenomic sequencing, a first step is often to determine what organisms are present in the sequenced community. Here, we demonstrate comprehensive metagenome compositional analysis using sourmash FracMinHash sketching, and aggregation to taxonomic groupings. This analysis is entirely reference-dependent. The quality of your results will depend in part on the quality and completeness of the reference database. Before we begin: Do you have sourmash installed? sourmash --version
  • Prerequisites This module assumes you have a basic understanding of k-mers. Here, we will use FracMinHash sketching to conduct fast yet accurate comparisons of genomic datasets. Specifically, we will explore the following comparisons with k-mers and FracMinHash sketching: - Jaccard - Containment - ANI estimation - Comparing many genomes Software used in this session: sourmash
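The comparisons listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not sourmash's implementation: the helper names (`kmers`, `hash_kmer`, `frac_minhash`, `jaccard`, `containment`, `ani_from_containment`) are hypothetical, SHA-1 is an assumed stand-in for the hash function sourmash actually uses, and canonical (strand-independent) k-mers are omitted for brevity. The ANI formula shown, containment raised to the power 1/k, is a simple point estimate under an assumed independent-mutation model.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space


def kmers(seq, k):
    """Return the set of all k-mers in seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (SHA-1 is a stand-in hash, an assumption)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def frac_minhash(seq, k, scaled):
    """Keep only hashes in the lowest 1/scaled fraction of the hash space."""
    threshold = MAX_HASH // scaled
    return {h for h in map(hash_kmer, kmers(seq, k)) if h < threshold}


def jaccard(a, b):
    """Shared elements divided by all elements in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of set a that is also found in set b."""
    return len(a & b) / len(a) if a else 0.0


def ani_from_containment(c, k):
    """Point estimate of ANI from containment (assumes independent mutations)."""
    return c ** (1.0 / k)


if __name__ == "__main__":
    x = frac_minhash("ACGTACGT", k=3, scaled=1)  # scaled=1 keeps every hash
    y = frac_minhash("ACGTTT", k=3, scaled=1)
    print(jaccard(x, y), containment(x, y))
```

With `scaled=1` every hash is kept, so the sketch comparison equals the exact k-mer set comparison; larger `scaled` values keep roughly 1/`scaled` of the hashes and trade precision for size.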
  • This follows from the Introduction to FracMinHash sketching for sequence comparison module. Please see the installation instructions there for installing sourmash and other dependencies. A common application for sketching is detecting genomes of interest within metagenomes to determine presence/absence of organisms of interest, such as detecting pathogenic microbes in wastewater metagenomic sequencing. sourmash FracMinHash sketching is well-suited to conducting these comparisons at scale. We first look at genome -x- metagenome comparisons and then move to finding genome(s) in metagenomes at scale. Genome x Metagenome comparisons In this section, we will compare a query genome to a metagenome sample. The goal is to determine if the query genome is present in the metagenome and, if so, how much of it is present. Datasets we will use:
  • Setup
      mamba create -n smash-pg sourmash=4.8.10
      mamba activate smash-pg
      pip install sourmash_utils sourmash_plugin_pangenomics sourmash_plugin_betterplot
      git clone git@github.com:ctb/2024-pangenome-hash-corr.git
      cd 2024-pangenome-hash-corr
      mkdir rs-pangenome
      cd rs-pangenome
    If the git@ link doesn't work for you, use https://github.com/ctb/2024-pangenome-hash-corr.git
  • Setup
      mamba create -n smash-pg sourmash=4.8.10
      mamba activate smash-pg
      pip install sourmash_utils sourmash_plugin_pangenomics sourmash_plugin_betterplot
      git clone git@github.com:ctb/2024-pangenome-hash-corr.git
      cd 2024-pangenome-hash-corr
      mkdir rs-pangenome
      cd rs-pangenome
    Download sketches and taxonomic information
    Download a database
  • Analyzing Metagenome Composition using the LIN taxonomic framework Tessa Pierce Ward July 2024 requires sourmash v4.8+ All materials for this workshop: https://github.com/vinatzer-lab/ICPPB2024_workshop
  • HackMD + Markdown for Lab Notebook a few useful commands tips: Main Title bullet 1 bullet 2 code markdown cheatsheet
  • CTB meeting 04/11/24 Metadata is terrible Interpretation of metadata is terrible and annoying People want to be able to use private databases Search works .. now what? What are we still missing? Where do we need to go?
  • feb 2024: https://github.com/sourmash-bio/sourmash_plugin_branchwater/issues/214 apr 2024, with PR: https://github.com/sourmash-bio/sourmash_plugin_branchwater/pull/298: software/version command details time max RAM
  • branchwater branchwater #134 (the largest one!) branchwater #197 nearly finished: branchwater #205 branchwater #217 sourmash
  • Hi @L, @e and @s, Thank you so much for your reviews and your enthusiasm for sourmash! We have put a lot of effort into maintenance over the past few years, and it's fantastic to get some external feedback for improvement. @ctb, @luiz, and I (the sourmash maintainers) have addressed each of your comments via the issues posted above. As those address each point in-line, I will summarize here and provide text and explanations where needed. We have cut a new release with these changes (v4.8.6), and plan to release version v4.9.0 when review is accepted and we have a new DOI. JOSS manuscript fixes ensure all citations have DOI and update citation for recently published preprint (https://github.com/sourmash-bio/sourmash/pull/2964)
  • article 3-mer frequencies (composition) For each long read, we count the frequencies of all 64 3-mers in this read and merge the reverse complements to form a vector of 32 dimensions. The resulting vector is then normalised by the total number of 3-mers observed in the read. sketch singleton 3-mers --> vector of 3-mer frequencies per read/contig (build csv of frequencies of all 32 canonical 3-mers) 15-mer --> coverage
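The composition vector described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's code: the names `canonical`, `CANONICAL_3MERS`, and `trimer_frequencies` are hypothetical, the canonical form is assumed to be the lexicographically smaller of a 3-mer and its reverse complement, and 3-mers containing characters other than A, C, G, T are assumed to be skipped.

```python
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


# Merging reverse complements collapses the 64 possible 3-mers to 32 canonical ones.
CANONICAL_3MERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=3)})


def trimer_frequencies(seq):
    """Return a 32-dimensional vector of canonical 3-mer frequencies for one read."""
    counts = dict.fromkeys(CANONICAL_3MERS, 0)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if set(kmer) <= set("ACGT"):  # skip 3-mers with ambiguous bases (assumption)
            counts[canonical(kmer)] += 1
            total += 1
    # Normalise by the total number of 3-mers observed in the read.
    return [counts[k] / total if total else 0.0 for k in CANONICAL_3MERS]


if __name__ == "__main__":
    vector = trimer_frequencies("ACGTACGT")
    print(len(vector), sum(vector))
```

One row per read or contig, with `CANONICAL_3MERS` as the column header, gives the CSV of frequencies mentioned in the note.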
  • Here are the fundamental changes and advancements we have in mind for the sourmash software. Many of these require significant developer time and may change based on new research directions or observed need. We encourage community and user direction requests via our issue tracker! Multithreading of all utilities Status: in progress Improved binary storage Sourmash has benefitted greatly from storing sketches in standard JSON format (and as gzipped json files).
  • Goal: Assess viral taxonomic profiling (+classification?) using gather--> tax workflow on mock and real datasets Motivation: a preprint (below) uses sourmash through WhatThePhage and claims it performs poorly for viral classification. The workflow conducts contig-level classification using k21,scaled100 and sourmash search to the phage database using jaccard similarity. their commands: database preparation: sourmash compute --scaled 100 -k 21 --singleton \
  • manysearch search function, mastiff_manysearch: https://github.com/sourmash-bio/pyo3_branchwater/blob/try-mastiff/src/lib.rs#L903-L1028 threading: uses pyo3_branchwater multithreading framework to spawn threads for each query to search across the rocksdb database (I think) does it load queries to the number of threads we have, opening new ones after the prior ones finish? extracting 'match' md5sum I'm not currently sure how to get this out of the rocksdb, since I think it's just returning the path to the match sig. Am I missing something?
  • Misc Can Titus be backup for pyopensci issue Aug 16-22? https://github.com/pyOpenSci/software-submission/issues/129 Find meeting time for Katrina Kalantar During the week of August 22-25th I have quite a bit of flexibility Tu-Fri in the afternoons (PST) and during the week of August 28-Sept 1 I have a lot of flexibility in the mornings (PST). Are there any times that work particularly well for you? Also happy to look outside those ranges as needed! Any immediate thoughts on OSCollective? I think I'm now +1. Project Updates Read recruitment comparison workflow: running; on GitHub here
  • Submitting Author: Tessa Pierce-Ward (@bluegenes) All current maintainers: @ctb, @luizirber, @bluegenes Package Name: sourmash One-Line Description of Package: sourmash is a command line tool and Python library for sketching collections of DNA, RNA, and amino acid k-mers for biological sequence search, comparison, and analysis Repository Link: https://github.com/sourmash-bio/sourmash Version submitted: 4.8.2 Editor: TBD Reviewer 1: TBD Reviewer 2: TBD Archive: TBD Version accepted: TBD Date accepted (month/day/year): TBD Code of Conduct & Commitment to Maintain Package
  • BLAST results generated with new 2023-08-07 spillover accession file. differences from previous: used 2023-08-07 spillover file ICTV VMR database included nearly all entries for vmr 38. There are still two missing (Salmonella phage Fels2, Caenorhabditis elegans Cer13 virus). These are prophage that need to be extracted from their host genomes. There are only 19 spillover accessions which had no hits via either blastn or blastx. They do not seem to correspond to those missing 2 reference VMRs. Instead, it looks like most of them were originally labeled as "Simian immunodeficiency virus", which we do have a VMR genome for. I ended up digging into these a bit, and found a useful illustrative example for thresholding.
  • General info: BigQuery Metadata search ref Rob Edwards blog post here To get all Metagenome / Microbiome / Metatranscriptome data: We use temporary tables to store the two main searches: what are amplicon projects and what are metagenome/microbiome/metatranscriptome projects, and then we find the projects that are metagenomes: first, just look at the accs:
  • Sticky exercise: Let's bring it back out to the questions that brought you here to STAMPS! Please populate your stickies (yellow and pink) as follows: Yellow/Green Stickies: Scientific Questions "Questions you'd like to ask of microbial communities." Please try to abstract/generalize! Who is there? What are they doing? Pink Stickies: Tools