Centrifuge reference database build

UPDATE

In getting closer to publication, I split these CoV-IRT microbial subgroup related programs into their own conda package. The custom programs on this page that start with bit- will be replaced by versions that just start with cov- that are included in that conda install. That should be installed with conda as shown on that page, and the install bit instructions below should be ignored.

Centrifuge reference database build
UPDATE

Installing centrifuge

Creating and installing in a new conda environment:

  # blast is included for dustmasker
conda create -y -n centrifuge -c conda-forge -c bioconda -c defaults centrifuge=1.0.4_beta blast=2.9.0

conda activate centrifuge

Building on 12-Sept-2020

Takes about 2 days for the centrifuge-build command as run below.

  # this step took a minute
centrifuge-download -P 50 -o taxonomy taxonomy
  # this step took about 30 minutes
centrifuge-download -P 50 -o library -m -d "archaea,bacteria,viral,fungi" refseq > seqid2taxid.map

  # adding human (this took about 15 minutes)
centrifuge-download -o library -m -d "vertebrate_mammalian" -a "Chromosome" -t 9606 -c 'reference genome' refseq >> seqid2taxid.map

cat library/archaea/*.fna > input-sequences.fna
cat library/vertebrate_mammalian/*.fna >> input-sequences.fna
cat library/viral/*.fna >> input-sequences.fna
cat library/fungi/*.fna >> input-sequences.fna
cat library/bacteria/*.fna >> input-sequences.fna

# can get rid of these now to recover some space unless wanted
  # but typically too many to do it with one command
rm library/archaea/*.fna
rm library/vertebrate_mammalian/*.fna
rm library/viral/*.fna
rm library/fungi/*.fna
rm library/bacteria/*.fna

centrifuge-build -p 50 --bmax 1342177280 --conversion-table seqid2taxid.map \
                 --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp \
                 input-sequences.fna centrifuge-ref-db

Database download

This is ~22GB compressed and 36GB decompressed/unpacked:

curl -L -o centrifuge-ref-db.tar.gz https://ndownloader.figshare.com/files/24725264

Unpacking/decompressing:

tar -xzvf centrifuge-ref-db.tar.gz

Example run

Downloading one of our samples (CRR125950):

curl -L -o CRR125950_filtered_1_reads_trim25_1.fq.gz https://osf.io/hg7p4/download

Running classification:

centrifuge -x 12-Sept-2020-centrifuge-ref-db/centrifuge-ref-db -U CRR125950_filtered_1_reads_trim25_1.fq.gz \
           -S CRR125950-our-filtered-centrifuge-out.tsv \
           --report-file CRR125950-our-filtered-centrifuge-report.tsv \
           -k 1 -p 20

Running one without setting -k 1:

centrifuge -x 12-Sept-2020-centrifuge-ref-db/centrifuge-ref-db -U CRR125950_filtered_1_reads_trim25_1.fq.gz \
           -S CRR125950-our-filtered-centrifuge-out-no-k1.tsv \
           --report-file CRR125950-our-filtered-centrifuge-report-no-k1.tsv \
           -p 20

The CRR125950-our-filtered-centrifuge-out.tsv file holds the read-level results:

head CRR125950-our-filtered-centrifuge-out.tsv | column -ts $'\t' | sed 's/^/# /'
# readID                                   seqID         taxID  score  2ndBestScore  hitLength  queryLength  numMatches
# A00159:145:H75T2DMXX:1:1101:7735:13792   NC_000011.10  9606   81     0             24         37           1
# A00159:145:H75T2DMXX:1:1101:10755:13573  unclassified  0      0      0             0          162          1
# A00159:145:H75T2DMXX:1:1101:31069:14215  unclassified  0      0      0             0          160          1
# A00159:145:H75T2DMXX:1:1101:3097:13808   unclassified  0      0      0             0          157          1
# A00159:145:H75T2DMXX:1:1101:10158:12383  unclassified  0      0      0             0          150          1
# A00159:145:H75T2DMXX:1:1101:11216:13557  unclassified  0      0      0             0          30           1
# A00159:145:H75T2DMXX:1:1101:1054:14622   unclassified  0      0      0             0          175          1
# A00159:145:H75T2DMXX:1:1101:9986:14559   unclassified  0      0      0             0          170          1
# A00159:145:H75T2DMXX:1:1101:22688:14074  NC_012920.1   9606   121    0             26         26           1

The CRR125950-our-filtered-centrifuge-report.tsv file holds the taxon-level summary results:

head CRR125950-our-filtered-centrifuge-report.tsv | column -ts $'\t' | sed 's/^/# /'
# name                       taxID  taxRank       genomeSize  numReads  numUniqueReads  abundance
# root                       1      no rank       0           6902      6902            0.0
# Bacteria                   2      superkingdom  0           642       642             0.0
# Buchnera aphidicola        9      species       626933      15        15              0.0
# Shewanella                 22     genus         5140018     1         1               0.0
# Shewanella putrefaciens    24     species       4749735     1         1               0.0
# Myxococcaceae              31     family        9636120     1         1               0.0
# Myxococcus xanthus         34     species       9139763     2         2               0.0
# Corallococcus macrosporus  35     species       8973512     1         1               0.0
# Cystobacter fuscus         43     species       12349744    2         2               0.0

And the "Abundance" column in that file sums to 1:

awk -F $'\t' ' { sum += $7 } END { print sum } ' CRR125950-our-filtered-centrifuge-report.tsv | sed 's/^/# /'
# 1

NOTE

this seems to consider only what's classified in this file

I'm not entirely sure how it's getting the relative abundances – maybe incorporating read-length and genome-size, but looking at a run with all 100-bp reads didn't make disentangling it any easier

That file also only has the genus and species, we can add the rest with one of my helper programs:

Adding full lineage info

    # getting taxids
cut -f 2 CRR125950-our-filtered-centrifuge-report.tsv | tail -n +2 > taxids.tmp

    # getting lineage info (bit 1.8.10+)
bit-get-lineage-from-taxids -i taxids.tmp -o cent-lineages.tmp

    # combining with rel. abundances
paste CRR125950-our-filtered-centrifuge-report.tsv <( cut -f 2- cent-lineages.tmp ) | sed 's/specific_name/species/' > CRR125950-centrifuge-report-with-full-lineages.tsv

rm taxids.tmp cent-lineages.tmp

So the table now looks like this:

head CRR125950-centrifuge-report-with-full-lineages.tsv | column -ts $'\t' | sed 's/^/# /'
# name                       taxID  taxRank       genomeSize  numReads  numUniqueReads  abundance  domain    phylum          class                order             family          genus          species
# root                       1      no rank       0           6902      6902            0.0        NA        NA              NA                   NA                NA              NA             NA
# Bacteria                   2      superkingdom  0           642       642             0.0        Bacteria  NA              NA                   NA                NA              NA             NA
# Buchnera aphidicola        9      species       626933      15        15              0.0        Bacteria  Proteobacteria  Gammaproteobacteria  Enterobacterales  Erwiniaceae     Buchnera       Buchnera aphidicola
# Shewanella                 22     genus         5140018     1         1               0.0        Bacteria  Proteobacteria  Gammaproteobacteria  Alteromonadales   Shewanellaceae  Shewanella     NA
# Shewanella putrefaciens    24     species       4749735     1         1               0.0        Bacteria  Proteobacteria  Gammaproteobacteria  Alteromonadales   Shewanellaceae  Shewanella     Shewanella putrefaciens
# Myxococcaceae              31     family        9636120     1         1               0.0        Bacteria  Proteobacteria  Deltaproteobacteria  Myxococcales      Myxococcaceae   NA             NA
# Myxococcus xanthus         34     species       9139763     2         2               0.0        Bacteria  Proteobacteria  Deltaproteobacteria  Myxococcales      Myxococcaceae   Myxococcus     Myxococcus xanthus
# Corallococcus macrosporus  35     species       8973512     1         1               0.0        Bacteria  Proteobacteria  Deltaproteobacteria  Myxococcales      Myxococcaceae   Corallococcus  Corallococcus macrosporus
# Cystobacter fuscus         43     species       12349744    2         2               0.0        Bacteria  Proteobacteria  Deltaproteobacteria  Myxococcales      Archangiaceae   Cystobacter    Cystobacter fuscus

kraken-style report

Can make a kraken-style report (defined here which has percentages of reads covered by each taxon (the percentages do not sum to 100 because they are reported for all ranks):

Running on one with -k 1:

centrifuge-kreport -x 12-Sept-2020-centrifuge-ref-db/centrifuge-ref-db CRR125950-our-filtered-centrifuge-out.tsv > CRR125950-our-filtered-centrifuge-kreport.tsv

head CRR125950-our-filtered-centrifuge-kreport.tsv | sed 's/^/# /'
#  79.29        1850506 1850506 U       0       unclassified
#  20.71        483448  6902    -       1       root
#  20.39        475810  0       -       131567    cellular organisms
#  18.64        435010  465     D       2759        Eukaryota
#  18.62        434545  0       -       33154         Opisthokonta
#  18.49        431617  0       K       33208           Metazoa
#  18.49        431617  0       -       6072              Eumetazoa
#  18.49        431617  0       -       33213               Bilateria
#  18.49        431617  0       -       33511                 Deuterostomia
#  18.49        431617  0       P       7711                    Chordata

Column 3 does sum to the total number of reads in the sample (2,333,954):

awk ' { sum += $3 } END { print sum } ' CRR125950-our-filtered-centrifuge-kreport.tsv | sed 's/^/# /'
# 2333954

So we could turn that into a proportion that includes unclassified, but we wouldn't have the same rank for everything, so would need to extend out further.
* E.g. in the above example, we'd have 465 reads assigned to Eukaryota, and those reads wouldn't have anything at the genus or species level if we were trying to summarize at those levels.
* Note to self: could take all rows where column 3 != 0; pull the read count and taxid; get full lineage info; then use the counts on any rank we want to summarize (this would just have "unclassified" for any read that didn't have one at that rank, which would be true at that rank, but maybe not a higher rank)

Running on one with no -k 1:

centrifuge-kreport -x 12-Sept-2020-centrifuge-ref-db/centrifuge-ref-db CRR125950-our-filtered-centrifuge-out-no-k1.tsv > CRR125950-our-filtered-centrifuge-kreport-no-k1.tsv

head CRR125950-our-filtered-centrifuge-kreport-no-k1.tsv | sed 's/^/# /'
#  79.29        1850506 1850506 U       0       unclassified
#  20.71        483448  5420    -       1       root
#  20.45        477297  1621    -       131567    cellular organisms
#  18.64        434946  0       D       2759        Eukaryota
#  18.64        434946  512     -       33154         Opisthokonta
#  18.49        431546  0       K       33208           Metazoa
#  18.49        431546  0       -       6072              Eumetazoa
#  18.49        431546  0       -       33213               Bilateria
#  18.49        431546  0       -       33511                 Deuterostomia
#  18.49        431546  0       P       7711                    Chordata

Reporting fractions of read assignments instead of LCA

I'm kinda digging the idea behind the --no-lca option (see here), seeing what that looks like:

Running on one with -k 1:

centrifuge-kreport -x 12-Sept-2020-centrifuge-ref-db/centrifuge-ref-db --no-lca \
                   CRR125950-our-filtered-centrifuge-out.tsv \
                   > CRR125950-centrifuge-reformatted-no-lca.tsv

That one doesn't do anything different (meaning its output was the same as CRR125950-our-filtered-centrifuge-kreport.tsv). I assume this is because it was run with -k 1, so there were no assignments to split up into fractions (each read was forced to go to just one).

Running on one with no -k 1:

centrifuge-kreport -x 12-Sept-2020-centrifuge-ref-db/centrifuge-ref-db --no-lca \
                   CRR125950-our-filtered-centrifuge-out-no-k1.tsv \
                   > CRR125950-centrifuge-reformatted-no-lca-no-k1.tsv

head CRR125950-centrifuge-reformatted-no-lca-no-k1.tsv | sed 's/^/# /'
#  79.29	1850512	1850512	U	0	unclassified
#  20.71	483441	0	-	1	root
#  20.58	480396	0	-	131567	  cellular organisms
#  18.80	438853	0	D	2759	    Eukaryota
#  18.80	438853	0	-	33154	      Opisthokonta
#  18.66	435402	0	K	33208	        Metazoa
#  18.66	435402	0	-	6072	          Eumetazoa
#  18.66	435402	0	-	33213	            Bilateria
#  18.66	435402	0	-	33511	              Deuterostomia
#  18.66	435402	0	P	7711	                Chordata

This comes up a little lower than the total number of reads in the sample (2,333,954):

awk ' { sum += $3 } END { print sum } ' CRR125950-centrifuge-reformatted-no-lca-no-k1.tsv | sed 's/^/# /'
# 2332719

I don't know what to use.

Running example with all equal read sizes

conda install -c conda-forge -c bioconda -c defaults fastp

Filtering to only keep reads of length 100:

fastp -i CRR125950_filtered_1_reads_trim25_1.fq.gz -o CRR125950_filtered_1_reads_trim25_1-len-filt.fq.gz --max_len1 100 -l 100 -t 4

Running classification on same-length reads with -k 1:

centrifuge -x 12-Sept-2020-centrifuge-ref-db/centrifuge-ref-db -U CRR125950_filtered_1_reads_trim25_1-len-filt.fq.gz \
           -S CRR125950-our-filtered-centrifuge-out-len-filt.tsv \
           --report-file CRR125950-our-filtered-centrifuge-report-len-filt.tsv \
           -k 1 -p 20

Running simulated sample from Krista

Grabbing this one:

curl -L -o UnAmbiguouslyMapped_ds.buccal_trim2.fq.gz https://osf.io/r46cm/download

And I think the right info file, at the species level, from here:

curl -L -o Species_Buccal_Truth.csv https://osf.io/m2d8y/download

Running centrifuge with -k 1 set:

centrifuge -x 12-Sept-2020-centrifuge-ref-db/centrifuge-ref-db -U UnAmbiguouslyMapped_ds.buccal_trim2.fq.gz \
           -S UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-out.tsv \
           --report-file UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-report.tsv \
           -k 1 -p 20

Running centrifuge with no -k set:

centrifuge -x 12-Sept-2020-centrifuge-ref-db/centrifuge-ref-db -U UnAmbiguouslyMapped_ds.buccal_trim2.fq.gz \
           -S UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-out-no-k1.tsv \
           --report-file UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-report-no-k1.tsv \
           -p 20

  # and making a kreport for this one
centrifuge-kreport -x 12-Sept-2020-centrifuge-ref-db/centrifuge-ref-db --no-lca UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-out-no-k1.tsv > UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-kreport-no-k1.tsv

Results

Result files can be downloaded with the following:

curl -L -o UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-report.tsv https://ndownloader.figshare.com/files/24732191

curl -L -o UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-out.tsv https://ndownloader.figshare.com/files/24732194

With `-k 1` set

Of the 600,000 reads, 25,691 were left unclassified:

grep -c -w "unclassified" UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-out.tsv | sed 's/^/# /'
# 25691

Peeking at the report tab, sorted by relative abundance (these top 20 hold > 99% of the estimated abundance):

cat <( head -n 1 UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-report.tsv ) <( sort -t $'\t' -grk 7 UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-report.tsv | head -n 20 ) | sed 's/^/|/' | sed "s/\\t/ | /g" | sed 's/$/ |/'

name	taxID	taxRank	genomeSize	numReads	numUniqueReads	abundance
Streptococcus thermophilus	1308	species	1844731	49994	49994	0.143208
Streptococcus pyogenes	1314	species	1944914	45724	45724	0.124289
Streptococcus pneumoniae	1313	species	2114821	47055	47055	0.119754
Streptococcus mutans	1309	species	2024183	46686	46686	0.110201
Streptococcus sanguinis	1305	species	2388435	49893	49893	0.109841
Veillonella dispar	39778	species	2116915	45427	45427	0.107906
Haemophilus influenzae	727	species	2278803	48421	48421	0.0980891
Streptococcus equi subsp. zooepidemicus MGCS10565	552526	strain	2024171	145	145	0.0784375
Streptococcus salivarius	1304	species	2213879	40471	40471	0.0740379
Streptococcus dysgalactiae	1334	species	7257305	12007	12007	0.00710079
Streptococcus pneumoniae CGSP14	516950	strain	2209198	2618	2618	0.00635523
Streptococcus pyogenes MGAS10270	370552	strain	1928252	2250	2250	0.00587681
Streptococcus salivarius CCHSS3	1048332	strain	2217184	2844	2844	0.00469715
Haemophilus parainfluenzae T3T1	862965	strain	2086875	14188	14188	0.00360981
Haemophilus parainfluenzae	729	species	2086875	27051	27051	0.00240993
Neisseria subflava	28449	species	4517530	11978	11978	0.00108579
Streptococcus mutans GS-5	1198676	strain	2027088	359	359	0.000748662
Streptococcus salivarius JIM8777	347253	strain	2210574	683	683	0.000614505
Streptococcus sp. NCTC 11567	2583584	species	2147716	236	236	0.000529961
Streptococcus sp. FDAARGOS_192	1839799	species	2435494	488	488	0.000275644

Could take all with an abundance assignment, convert their taxids to lineages, combine at species- or genus-level, then will have to standard ranks…

name	taxID	taxRank	genomeSize	numReads	numUniqueReads	abundance
Streptococcus thermophilus	1308	species	1844731	49994	49994	0.143208
Streptococcus sanguinis	1305	species	2388435	49893	49893	0.109841
Haemophilus influenzae	727	species	2278803	48421	48421	0.0980891
Streptococcus pneumoniae	1313	species	2114821	47055	47055	0.119754
Streptococcus mutans	1309	species	2024183	46686	46686	0.110201
Streptococcus	1301	genus	2449574	46675	46675	0.0
Streptococcus pyogenes	1314	species	1944914	45724	45724	0.124289
Veillonella dispar	39778	species	2116915	45427	45427	0.107906
Streptococcus salivarius	1304	species	2213879	40471	40471	0.0740379
Streptococcus equi	1336	species	5577323	36424	36424	0.0
Haemophilus parainfluenzae	729	species	2086875	27051	27051	0.00240993
Haemophilus parainfluenzae T3T1	862965	strain	2086875	14188	14188	0.00360981
Streptococcus dysgalactiae	1334	species	7257305	12007	12007	0.00710079
Neisseria subflava	28449	species	4517530	11978	11978	0.00108579
Streptococcus equi subsp. zooepidemicus	40041	subspecies	19334008	6114	6114	0
Neisseria mucosa	488	species	5008700	5479	5479	0.0
Neisseria	482	genus	2223758	4680	4680	0.0
Veillonella	29465	genus	2132142	4427	4427	0.0
Neisseria flavescens	484	species	2231882	3101	3101	0.0

This adds to the complication of understanding their "abundance" column, as anything that has a lower rank present will have a 0 for abundance (e.g. Steptococcus genus, or Streptococcus equi – S. equi, has abundance lower down assigned to a specific strain that has few counts, but gets 8% of the population):

cat <( head -n 1 UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-report.tsv ) <( grep "Streptococcus equi" UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-report.tsv ) | sed 's/^/|/' | sed "s/\\t/ | /g" | sed 's/$/ |/'

name	taxID	taxRank	genomeSize	numReads	numUniqueReads	abundance
Streptococcus equinus	1335	species	13680225	310	310	5.26736e-05
Streptococcus equi	1336	species	5577323	36424	36424	0.0
Streptococcus equi subsp. zooepidemicus	40041	subspecies	19334008	6114	6114	0
Streptococcus equi subsp. zooepidemicus MGCS10565	552526	strain	2024171	145	145	0.0784375
Streptococcus equi subsp. equi 4047	553482	strain	2253793	640	640	0
Streptococcus equi subsp. zooepidemicus CY	1403449	strain	2107382	85	85	0.0

Without `-k 1` set

Of the 600,000 reads, 25,691 were left unclassified (same):

grep -c -w "unclassified" UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-out-no-k1.tsv | sed 's/^/# /'
# 25691

Peeking at the report tab, sorted by number of reads assigned:

cat <( head -n 1 UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-report-no-k1.tsv ) <( sort -t $'\t' -nrk6 UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-report-no-k1.tsv ) | head -n 20 | sed 's/^/|/' | sed "s/\\t/ | /g" | sed 's/$/ |/'

name	taxID	taxRank	genomeSize	numReads	numUniqueReads	abundance
Streptococcus thermophilus	1308	species	1844731	49994	49994	0.143208
Streptococcus sanguinis	1305	species	2388435	49893	49893	0.109841
Haemophilus influenzae	727	species	2278803	48421	48421	0.0980891
Streptococcus pneumoniae	1313	species	2114821	47055	47055	0.119754
Streptococcus mutans	1309	species	2024183	46686	46686	0.110201
Streptococcus	1301	genus	2449574	46675	46675	0.0
Streptococcus pyogenes	1314	species	1944914	45724	45724	0.124289
Veillonella dispar	39778	species	2116915	45427	45427	0.107906
Streptococcus salivarius	1304	species	2213879	40471	40471	0.0740379
Streptococcus equi	1336	species	5577323	36424	36424	0.0
Haemophilus parainfluenzae	729	species	2086875	27051	27051	0.00240993
Haemophilus parainfluenzae T3T1	862965	strain	2086875	14188	14188	0.00360981
Streptococcus dysgalactiae	1334	species	7257305	12007	12007	0.00710079
Neisseria subflava	28449	species	4517530	11978	11978	0.00108579
Streptococcus equi subsp. zooepidemicus	40041	subspecies	19334008	6114	6114	0
Neisseria mucosa	488	species	5008700	5479	5479	0.0
Neisseria	482	genus	2223758	4680	4680	0.0
Veillonella	29465	genus	2132142	4427	4427	0.0
Neisseria flavescens	484	species	2231882	3101	3101	0.0

cat <( head -n 1 UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-report.tsv ) <( grep "Streptococcus equi" UnAmbiguouslyMapped_ds.buccal_trim2-centrifuge-report.tsv ) | sed 's/^/|/' | sed "s/\\t/ | /g" | sed 's/$/ |/'

name	taxID	taxRank	genomeSize	numReads	numUniqueReads	abundance
Streptococcus equinus	1335	species	13680225	310	310	5.26736e-05
Streptococcus equi	1336	species	5577323	36424	36424	0.0
Streptococcus equi subsp. zooepidemicus	40041	subspecies	19334008	6114	6114	0
Streptococcus equi subsp. zooepidemicus MGCS10565	552526	strain	2024171	145	145	0.0784375
Streptococcus equi subsp. equi 4047	553482	strain	2253793	640	640	0
Streptococcus equi subsp. zooepidemicus CY	1403449	strain	2107382	85	85	0.0

Kiran Adhikari

2024/02/14 20:55:20

Hey Mike, I am using this page as reference and trying to reproduce this work i am getting errors: 1. /usr/bin/cat: Argument list too long 2. Warning: Taxonomy ID 3111325 is not in the provided taxonomy tree (taxonomy/nodes.dmp)! Warning: Taxonomy ID 3111776 is not in the provided taxonomy tree (taxonomy/nodes.dmp)! Warning: Taxonomy ID 3117733 is not in the provided taxonomy tree (taxonomy/nodes.dmp)! bmax according to bmax setting: 1342177280 Using parameters --bmax 1006632960 --dcv 1024 Doing ahead-of-time memory usage test Passed! Constructing with these parameters: --bmax 1006632960 --dcv 1024 Constructing suffix-array element generator Building DifferenceCoverSample If you could guide me through this it would be really helpful. Thank you in advance.

Centrifuge reference database build

UPDATE

Installing centrifuge

Building on 12-Sept-2020

Database download

Example run

Adding full lineage info

kraken-style report

Reporting fractions of read assignments instead of LCA

Running example with all equal read sizes

Running simulated sample from Krista

Results

With -k 1 set

Without -k 1 set

Read more

BRAILLE update (31-Mar-2021)

STAMPS 2024 MAG concepts

STAMPS 2023 GToTree tutorial

Creating STAMPS 2023 JetStream2 image

With `-k 1` set

Without `-k 1` set