# Interpreting GenomeScope profiles for VGP genome assemblies
By Tanya Lama
*GenomeScope* is used as part of the VGP 1.6 assembly pipeline to estimate the overall characteristics of a genome including genome size, heterozygosity rate and repeat content from unprocessed short reads.
Giulio and I compared the outputs of jellyfish and meryl for fAntMac1 (warty frogfish) as follows:
Each plot shows kmer coverage on the x-axis and the number of kmers observed at that coverage on the y-axis. You can interpret these profiles much like KAT plots, where you expect a diploid peak and a haploid peak. When we run GenomeScope we set kmer length = 31. In other words, we count how many times each distinct 31 bp motif appears in the reads. We expect motifs that occur only once in the genome to be present at approximately the sequencing depth of our raw data.
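
To make the idea of a kmer profile concrete, here is a minimal Python sketch of how such a histogram could be built from reads. It is an illustration only, not how jellyfish or meryl are implemented: the function name `kmer_spectrum` and the toy reads are made up, `k=3` is used so the example fits on a screen, and reverse complements are ignored for simplicity.

```python
from collections import Counter

def kmer_spectrum(reads, k=31):
    """Return {coverage: number of distinct kmers seen that many times}.

    Hypothetical helper for illustration; real counters are far more
    memory-efficient and usually merge a kmer with its reverse complement.
    """
    counts = Counter()
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Collapse per-kmer counts into the histogram that GenomeScope plots.
    return Counter(counts.values())

# Toy reads (not real data), using k=3 instead of 31.
reads = ["ACGTAC", "CGTACG"]
spectrum = kmer_spectrum(reads, k=3)
print(dict(spectrum))  # {2: 4}: four distinct 3-mers, each seen twice
```

Each key in the result is a position on the x-axis (coverage) and each value is the height of the curve at that position.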

## 1. jellyfish: genome size estimate: 521,013,374 bp
jellyfish has a coverage cutoff of 250x, which you can see on the log-scale plot.

#### GenomeScope Profile in log scale:

## 2. meryl: genome size estimate: 604,861,782 bp

#### GenomeScope Profile in log scale:
Here we can see that meryl is better than jellyfish at measuring kmer coverage, including for kmers with coverage above 250x. In this genome, it appears we have many 31 bp kmers that are present 1,000 to 10,000 times in the genome (highly repetitive regions). Not counting them (as jellyfish does here) leads to an underestimation of genome size: the jellyfish estimate is roughly 84 Mb smaller than the meryl estimate.
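
The effect of a coverage cutoff can be sketched with a simplified estimate: total kmers counted, divided by the coverage of the main peak. This is an assumption made for illustration; GenomeScope itself fits a statistical model to the profile and also handles low-coverage error kmers. The function name, the `max_coverage` parameter and the histogram values below are all hypothetical, not fAntMac1 data.

```python
def estimate_genome_size(histogram, peak_coverage, max_coverage=None):
    """Rough genome size: total kmers counted / coverage of the main peak.

    histogram maps coverage -> number of distinct kmers at that coverage.
    max_coverage mimics a counter that drops kmers above a cutoff.
    Simplified sketch; not GenomeScope's actual model.
    """
    total_kmers = 0
    for coverage, n_kmers in histogram.items():
        if max_coverage is not None and coverage > max_coverage:
            continue  # high-copy (repetitive) kmers are ignored
        total_kmers += coverage * n_kmers
    return total_kmers / peak_coverage

# Made-up profile: a main peak at 30x plus two groups of repetitive kmers.
toy_histogram = {30: 1000, 300: 50, 3000: 10}

full = estimate_genome_size(toy_histogram, peak_coverage=30)
truncated = estimate_genome_size(toy_histogram, peak_coverage=30, max_coverage=250)
print(full, truncated)  # 2500.0 1000.0
```

In this toy profile the few kmers above the cutoff account for more than half of the estimated size, which is the same kind of loss described above for highly repetitive regions.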

*TL;DR: use meryl please*
###### tags: `VGP`