spacegraphcats ho! lab meeting

what we're doing, and some associated chopportunities.

(Oct 19, 2020 lab meeting)

github.com/spacegraphcats/spacegraphcats/

here be dragons… and maybe (ok, definitely!) some technical debt

	for cdbg_node in cdbg_nodes:
	    # v--- the tricky bit is building cdbg_annots!
	    annots = cdbg_annots.get(cdbg_node, [])
	    dom_id = catlas.cdbg_to_dom[cdbg_node]
	    dom_annots[dom_id].update(annots)
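
A minimal, self-contained sketch of the expansion step above, using plain dictionaries in place of the catlas object. The toy node IDs, gene names, and the `expand_annotations` helper are assumptions made for illustration; they are not part of spacegraphcats.

```python
from collections import defaultdict

# Hypothetical toy data: the dominating-set node that covers each cDBG node.
cdbg_to_dom = {1: 10, 2: 10, 3: 11, 4: 11}

# Hypothetical annotations already attached to some cDBG nodes.
cdbg_annots = {1: {"geneA"}, 3: {"geneA", "geneB"}}


def expand_annotations(cdbg_to_dom, cdbg_annots):
    """Push every cDBG node's annotations up to its dom node."""
    dom_annots = defaultdict(set)
    for cdbg_node, dom_id in cdbg_to_dom.items():
        # Unannotated cDBG nodes contribute nothing.
        dom_annots[dom_id].update(cdbg_annots.get(cdbg_node, ()))
    return dict(dom_annots)


dom_annots = expand_annotations(cdbg_to_dom, cdbg_annots)
```

Each dom node ends up with the union of the annotations on the cDBG nodes it covers, so the loop itself is cheap; the cost sits in producing `cdbg_annots`.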

	cdbg_annots = defaultdict(set)

	for gene_name, gene_seq in screed.open(gene_sequences):
	    kmers = khmer.sequence_to_kmers(gene_seq)

	    # do the search - the tricky bit, in more detail
	    cdbg_nodes = catlas.nodes_by_kmers(kmers)

	    # save annotations
	    for cdbg_node_id in cdbg_nodes:
	        cdbg_annots[cdbg_node_id].add(gene_name)
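
The same annotation loop can be run on toy data without screed, khmer, or a catlas. In this sketch the pure-Python `sequence_to_kmers`, the `kmer_to_node` dictionary, the contigs, and the gene sequences are all stand-ins assumed for illustration, and strand (reverse complement) is ignored.

```python
from collections import defaultdict

K = 5  # toy k-mer size, chosen only for this example


def sequence_to_kmers(seq, k=K):
    """Return every overlapping k-mer in seq (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Hypothetical cDBG contigs; each k-mer occurs in exactly one contig.
cdbg_contigs = {1: "ATGGCGT", 2: "GCGTAC", 3: "TTTTCCA"}

# Plain dict index from k-mer to cDBG node: fine for a toy, too big for real data.
kmer_to_node = {
    kmer: node_id
    for node_id, seq in cdbg_contigs.items()
    for kmer in sequence_to_kmers(seq)
}


def nodes_by_kmers(kmers):
    """Return the set of cDBG nodes that contain any of the query k-mers."""
    return {kmer_to_node[kmer] for kmer in kmers if kmer in kmer_to_node}


# Hypothetical gene sequences standing in for a FASTA file.
genes = {"geneA": "TGGCGTAC", "geneB": "TTTCCA"}

cdbg_annots = defaultdict(set)
for gene_name, gene_seq in genes.items():
    for cdbg_node_id in nodes_by_kmers(sequence_to_kmers(gene_seq)):
        cdbg_annots[cdbg_node_id].add(gene_name)
```

Here `geneA` spans contigs 1 and 2, so both nodes receive its name, while `geneB` only touches contig 3.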

	mphf = mphf_build_table(all_kmers)
	lookups = {}
	for cdbg_node_id, sequence in cdbg_nodes.items():
	    for kmer in khmer.sequence_to_kmers(sequence):
	        mphf_id = mphf.kmer_to_hash(kmer)
	        lookups[mphf_id] = cdbg_node_id
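
To make the shape of this lookup concrete, here is a hedged sketch in which a small dictionary-backed class stands in for the minimal perfect hash function (MPHF). `ToyKmerTable`, `build_lookups`, and the toy contigs are hypothetical names and data for illustration; a real MPHF stores far less than a dictionary and typically cannot tell you that a k-mer is absent, so a real implementation needs an extra check that this stand-in gets for free.

```python
def sequence_to_kmers(seq, k):
    """Return every overlapping k-mer in seq (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class ToyKmerTable:
    """Stand-in for an MPHF: gives each known k-mer a unique id in 0..n-1."""

    def __init__(self, all_kmers):
        self._ids = {kmer: i for i, kmer in enumerate(sorted(set(all_kmers)))}

    def __len__(self):
        return len(self._ids)

    def kmer_to_hash(self, kmer):
        # Returns None for unknown k-mers; a real MPHF may return an arbitrary id.
        return self._ids.get(kmer)


def build_lookups(cdbg_nodes, k):
    """Build the k-mer table and a dense array mapping k-mer id -> cDBG node id."""
    all_kmers = [
        kmer for seq in cdbg_nodes.values() for kmer in sequence_to_kmers(seq, k)
    ]
    table = ToyKmerTable(all_kmers)
    lookups = [None] * len(table)
    for cdbg_node_id, seq in cdbg_nodes.items():
        for kmer in sequence_to_kmers(seq, k):
            lookups[table.kmer_to_hash(kmer)] = cdbg_node_id
    return table, lookups


def nodes_by_kmers(query_kmers, table, lookups):
    """Return the cDBG nodes containing any query k-mer, skipping unknown k-mers."""
    found = set()
    for kmer in query_kmers:
        idx = table.kmer_to_hash(kmer)
        if idx is not None:
            found.add(lookups[idx])
    return found


# Hypothetical cDBG contigs; each 5-mer occurs in exactly one contig.
cdbg_nodes = {1: "ATGGCGT", 2: "GCGTAC", 3: "TTTTCCA"}
table, lookups = build_lookups(cdbg_nodes, k=5)
hits = nodes_by_kmers(sequence_to_kmers("TGGCGTAC", 5), table, lookups)
```

Because the ids are dense, `lookups` can be a flat array with one slot per distinct k-mer, which is the property that lets this kind of index scale.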

background: motivation for graph-based approaches in metagenomics

MAPPING AND ASSEMBLY BAD

what does this do? why do we do it?

benchmarking considerations…

taxonomy results for r=1 on mock community data:

taxonomy results for r=5 on mock community data:

how do we do it, code-wise?

expanding annotations from cDBG nodes to dom nodes

building cdbg_annots often relies on k-mers

nodes_by_kmers relies on trickiness to scale

retrieving reads for cDBG contigs

building a read index

current strategy

future strategies??

thanks!