The previous tutorial here ended with the co-assembly and the generation of bins. So what now?
Now we can look at how the bins are distributed at each time point.
For that we use R with the tidyverse collection of packages:
```r
library(tidyverse)
```
First, we load the coverage statistics obtained by mapping each individual library to the co-assembly:
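```r
# One coverage-statistics file per library, mapped against the co-assembly
covlist <- list.files("Covstats_FS2C_B1/", full.names = TRUE)
cov <- list()
# Libraries are labelled D0, D1, ... in the (alphabetical) order returned by list.files()
for (i in seq_along(covlist)) {
  cov[[i]] <- read_tsv(covlist[i]) %>%
    rename(ID = `#ID`) %>%
    # keep only the contig name (the text before the first space)
    separate(ID, into = "ID", sep = " ", extra = "drop") %>%
    mutate(Lib = paste0("D", i - 1))
}
# Stack the per-library tables into one long table
cov <- bind_rows(cov)
```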
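`list.files()` returns file names in alphabetical order, and the loop above labels libraries D0, D1, ... purely by that position. With ten or more libraries, a name such as `lib10` can sort before `lib2` and silently mislabel the time points. Below is a minimal base-R sketch that orders the files by a number parsed from the file name instead. The `libN.covstats` naming pattern is hypothetical, so the regex would need adapting to the real files in `Covstats_FS2C_B1/`.

```r
# Hypothetical file names: the real covstats files may follow a different pattern
files <- c("FS2C_B1_lib10.covstats", "FS2C_B1_lib2.covstats",
           "FS2C_B1_lib0.covstats", "FS2C_B1_lib1.covstats")

# Pull the library number out of each name (assumed pattern: "lib<digits>.covstats")
lib_num <- as.integer(sub(".*lib([0-9]+)\\.covstats$", "\\1", files))

# Order the files numerically rather than alphabetically
files_ordered <- files[order(lib_num)]

# Label each library from its parsed number instead of its position in the loop
lib_labels <- paste0("D", sort(lib_num))

data.frame(file = files_ordered, Lib = lib_labels)
```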
Then we load the contig summary table generated by binny, which assigns each contig to a bin:
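```r
bin <- read_tsv("contig_data.tsv") %>%
  # split the bin label at the dot into two ID columns
  separate(bin, into = c("binID1", "binID2"), sep = "\\.") %>%
  # re-guess column types so that binID2 becomes numeric
  type_convert()
```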
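In `separate()`, `sep` is a regular expression, which is why the dot is written `"\\."`: an unescaped `"."` matches any character. A quick base-R check with `strsplit()`, which also treats its split pattern as a regex, on a hypothetical bin label:

```r
# A hypothetical binny bin label of the form "<part1>.<part2>"
bin_label <- "I14.123"

# "\\." matches a literal dot, giving the two ID parts
parts <- strsplit(bin_label, "\\.")[[1]]
parts

# An unescaped "." matches every character, so only empty strings are left
broken <- strsplit(bin_label, ".")[[1]]
broken

# Convert the second part to a number, as type_convert() does in the pipeline above
binID2 <- as.numeric(parts[2])
```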
Then the GTDB-Tk summary results for the bacterial bins:
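```r
tax <- read_tsv("gtdbtk/gtdbtk.bac120.summary.tsv") %>%
  # split the GTDB classification string into its seven ranks
  separate(classification, into = c("K", "P", "C", "O", "F", "G", "S"), sep = ";") %>%
  mutate(user_genome = gsub("binny_", "", user_genome)) %>%
  # the extra bin-name pieces go to bin_C / bin_P so they do not clash with
  # the class (C) and phylum (P) columns created above
  separate(user_genome, into = c("binID1", "binID2", "bin_C", "bin_P"), extra = "drop") %>%
  # harmonise the bin IDs with those in contig_data.tsv
  mutate(binID2 = as.numeric(binID2),
         binID1 = gsub("R0", "R", binID1),
         binID1 = gsub("I0", "I", binID1))
```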
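The GTDB-Tk `classification` column is a single semicolon-separated string of ranks carrying `d__`, `p__`, ... prefixes (the first rank is the domain, called `K` above), and a rank that could not be assigned is left as a bare prefix such as `s__`. Below is a base-R sketch of the split on an illustrative string (not from this dataset), with the prefixes stripped and empty ranks set to `NA`. It also shows that `gsub("I0", "I", x)` removes a single zero only. If bin IDs were padded with more than one zero (the IDs below are hypothetical), an anchored regex would strip all of them.

```r
# Illustrative GTDB-style classification string with an unassigned species
classification <- "d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;s__"

# Split into the seven ranks, as separate(..., sep = ";") does above
ranks <- strsplit(classification, ";", fixed = TRUE)[[1]]
names(ranks) <- c("K", "P", "C", "O", "F", "G", "S")

# Drop the "d__", "p__", ... prefixes (sub() keeps the names); empty rank = not assigned
ranks_clean <- sub("^[a-z]__", "", ranks)
ranks_clean[ranks_clean == ""] <- NA
ranks_clean

# Hypothetical zero-padded bin prefixes
ids <- c("I001", "I014", "I10", "R002")
one_zero  <- gsub("I0", "I", ids)                    # strips one zero, only after "I"
all_zeros <- sub("^([IR])0+([0-9])", "\\1\\2", ids)  # strips all leading zeros after I or R
one_zero
all_zeros
```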
What a metagenome looks like: coverage against GC content for every contig, coloured by taxonomic order, one panel per library
```r
cov %>%
  rename(contig = ID) %>%
  left_join(bin) %>%
  left_join(tax) %>%
  ggplot(aes(y = Avg_fold, x = Ref_GC, col = O)) +
  geom_point() +
  facet_wrap(~Lib) +
  scale_y_log10()
```
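Because `left_join()` keeps every contig, contigs missing from `contig_data.tsv`, and bins without a GTDB-Tk match, end up with `NA` in `O` and show up under an `NA` colour in the plot. Here is a toy sketch with made-up values, using base R's `merge(all.x = TRUE)` as a stand-in for `left_join()`:

```r
# Toy tables standing in for cov (after renaming), bin and tax; all values are made up
cov_toy <- data.frame(contig = c("k1", "k2", "k3"),
                      Lib = "D0",
                      Avg_fold = c(12.5, 3.1, 40.2),
                      Ref_GC = c(0.41, 0.55, 0.38))
bin_toy <- data.frame(contig = c("k1", "k3"),
                      binID1 = c("I1", "I1"),
                      binID2 = c(1, 2))
tax_toy <- data.frame(binID1 = "I1", binID2 = 1, O = "Bacteroidales")

# merge(all.x = TRUE) keeps every row of the left table, like left_join()
joined <- merge(cov_toy, bin_toy, by = "contig", all.x = TRUE)
joined <- merge(joined, tax_toy, by = c("binID1", "binID2"), all.x = TRUE)

# k2 has no bin and k3's bin has no taxonomy, so both get O = NA
joined[order(joined$contig), c("contig", "binID1", "binID2", "O")]
```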
How bin coverage varies across libraries, comparing D0 with D1 (here it does not vary much)
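```r
cov %>%
  rename(contig = ID) %>%
  select(contig, Lib, Avg_fold) %>%
  # one column of coverage per library (pivot_wider() supersedes spread())
  pivot_wider(names_from = Lib, values_from = Avg_fold) %>%
  left_join(bin) %>%
  left_join(tax) %>%
  ggplot(aes(x = D0, y = D1, col = O)) +
  geom_point() +
  scale_y_log10() +
  scale_x_log10()
```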
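To put a number on "does not vary much", one option is to compare each contig's coverage between two libraries. Here is a minimal base-R sketch on made-up values: `reshape()` plays the role of `pivot_wider()`, followed by a log2 ratio per contig and a Spearman correlation across contigs. These two summaries are only a suggestion, not part of the original workflow.

```r
# Made-up long-format coverage for four contigs in two libraries
long <- data.frame(contig = rep(c("k1", "k2", "k3", "k4"), times = 2),
                   Lib = rep(c("D0", "D1"), each = 4),
                   Avg_fold = c(10, 20, 5, 80, 12, 18, 5, 160))

# Long -> wide: columns Avg_fold.D0 and Avg_fold.D1
wide <- reshape(long, idvar = "contig", timevar = "Lib", direction = "wide")

# log2 ratio per contig (0 = same coverage); zero coverage would need a pseudocount
wide$log2_D1_vs_D0 <- log2(wide$Avg_fold.D1 / wide$Avg_fold.D0)

# Rank correlation of contig coverage between the two libraries
rho <- cor(wide$Avg_fold.D0, wide$Avg_fold.D1, method = "spearman")
wide
rho
```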
Then the distribution of contig coverage (average fold) within each bin, for each library
```r
cov %>%
  rename(contig = ID) %>%
  left_join(bin) %>%
  left_join(tax) %>%
  ggplot(aes(x = paste(binID1, binID2), y = Avg_fold, col = O)) +
  geom_boxplot() +
  scale_y_log10() +
  facet_wrap(~Lib)
```
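The boxplots show the spread of contig coverage inside each bin. When a single coverage value per bin and library is needed (for a table, say), one simple option is the median over the bin's contigs, which is not pulled up by a few high-coverage contigs. Here is a sketch on made-up values with base R's `aggregate()`. Using the median is an assumption; a length-weighted mean would be another reasonable choice.

```r
# Made-up per-contig coverage, with bin labels built like paste(binID1, binID2)
contigs <- data.frame(bin = c("I1 1", "I1 1", "I1 1", "I1 2", "I1 2"),
                      Lib = "D0",
                      Avg_fold = c(10, 12, 300, 4, 6))

# One value per bin and library; the 300x contig barely moves the median
bin_cov <- aggregate(Avg_fold ~ bin + Lib, data = contigs, FUN = median)
bin_cov
```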
And the relative coverage across time (each contig's percentage of the total coverage in its library), with one panel per species
```r
cov %>%
  rename(contig = ID) %>%
  left_join(bin) %>%
  left_join(tax) %>%
  # each contig's share (%) of the summed coverage within its library
  group_by(Lib) %>%
  mutate(prop_cov = prop.table(Avg_fold) * 100) %>%
  ungroup() %>%
  ggplot(aes(x = Lib, y = prop_cov, col = O)) +
  geom_boxplot() +
  facet_wrap(~S, scales = "free")
```
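Note that `prop_cov` is computed per contig: within each library, `prop.table()` divides every contig's `Avg_fold` by the sum over all contigs, so the values in a library add up to 100 and a bin's share of the library is the sum over its contigs. Here is a base-R sketch of the same calculation using `ave()` on made-up values, plus the per-bin sum. The bin labels `A` and `B` are hypothetical.

```r
# Made-up coverage for contigs in two libraries; bins "A" and "B" are hypothetical
d <- data.frame(Lib = c("D0", "D0", "D0", "D1", "D1"),
                bin = c("A", "A", "B", "A", "B"),
                Avg_fold = c(10, 30, 60, 25, 75))

# Equivalent of group_by(Lib) %>% mutate(prop_cov = prop.table(Avg_fold) * 100)
d$prop_cov <- ave(d$Avg_fold, d$Lib, FUN = function(x) prop.table(x) * 100)

# Within each library the contig shares add up to 100
tapply(d$prop_cov, d$Lib, sum)

# A bin's share of its library is the sum over its contigs
bin_share <- aggregate(prop_cov ~ bin + Lib, data = d, FUN = sum)
bin_share
```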
Tags: tutorials, R, Metagenomics, Binning