Processing binny and CoAssembly results

The previous tutorial ended with the generation of the bins and the co-assembly, but what now?

Now we can look at how the bins are distributed at each time point.

For that we use R with the tidyverse, a collection of packages for reading, reshaping and plotting data.

library(tidyverse)

Loading the files

First we load the coverage statistics from mapping each individual library to the co-assembly:

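# One coverage file per library; list.files() returns them in alphabetical order
covlist<-list.files("Covstats_FS2C_B1/", full.names = TRUE)

cov<-list()
for (i in seq_along(covlist)){
  cov[[i]]<-read_tsv(covlist[i]) %>% 
    rename(ID=`#ID`) %>% 
    # Keep only the contig name (the text before the first space)
    separate(ID, into = "ID", sep=" ",extra = "drop") %>% 
    # Label the library by its position in covlist: D0, D1, ...
    mutate(Lib=paste0("D", i-1))
}

# Stack all libraries into one long table
cov<-bind_rows(cov)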
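
The loop above labels each library by its position in the file list, so the labels depend on how the files sort. As a hedged alternative, the sketch below takes the label from the file name instead. The directory, the file names (`D0_covstats.txt`, `D1_covstats.txt`) and the values are made up for illustration; adapt the pattern to your own naming scheme.

```r
library(tidyverse)

# Hypothetical demo directory with two tiny covstats-like files
demo_dir <- file.path(tempdir(), "covstats_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_tsv(tibble(`#ID` = c("contig_1 flag=1", "contig_2 flag=0"),
                 Avg_fold = c(10.5, 3.2)),
          file.path(demo_dir, "D0_covstats.txt"))
write_tsv(tibble(`#ID` = c("contig_1 flag=1", "contig_2 flag=0"),
                 Avg_fold = c(8.1, 4.4)),
          file.path(demo_dir, "D1_covstats.txt"))

files <- list.files(demo_dir, full.names = TRUE)
# Name each path by the text before the first underscore of its file name
names(files) <- str_extract(basename(files), "^[^_]+")

cov_demo <- map(files, ~ read_tsv(.x, show_col_types = FALSE)) %>%
  bind_rows(.id = "Lib") %>%               # the list names become the Lib column
  rename(ID = `#ID`) %>%
  mutate(ID = str_extract(ID, "^\\S+"))    # keep the contig name only
```

With this approach, adding or removing a file does not shift the labels of the other libraries.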

Then the contig distribution summary generated by binny

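bin<-read_tsv("contig_data.tsv")%>% 
  # Split the bin name at the dot into its two parts
  separate(bin, into=c("binID1","binID2"),sep="\\.") %>% 
  # Re-guess column types so that binID2 becomes numeric
  type_convert()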

Then the GTDB-Tk summary results

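tax<-read_tsv("gtdbtk/gtdbtk.bac120.summary.tsv") %>% 
  # Split the GTDB lineage into one column per rank (K = domain ... S = species)
  separate(classification, into=c("K","P","C","O","F","G","S"), sep=";") %>% 
  mutate(user_genome=gsub("binny_", "", user_genome)) %>% 
  # Split the bin name; the 3rd and 4th pieces get their own names so they
  # do not overwrite the class (C) and phylum (P) columns created above
  separate(user_genome, into=c("binID1","binID2","binC","binP"), extra="drop") %>% 
  # Match the bin IDs in contig_data.tsv: numeric binID2, no zero padding in binID1
  mutate(binID2=as.numeric(binID2),
         binID1=gsub("R0","R",binID1),
         binID1=gsub("I0","I",binID1))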
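
GTDB-Tk lineage strings carry a rank prefix on every level (`d__`, `p__`, ..., `s__`), which ends up in the plot legends. The following sketch shows one way to strip those prefixes after splitting. The row is made up for illustration, and the bin name format in `user_genome` is an assumption.

```r
library(tidyverse)

# Made-up GTDB-Tk-style row; the bin name is hypothetical
tax_demo <- tibble(
  user_genome = "binny_I01R01.10_C95_P98",
  classification = paste(
    "d__Bacteria", "p__Pseudomonadota", "c__Gammaproteobacteria",
    "o__Enterobacterales", "f__Enterobacteriaceae",
    "g__Escherichia", "s__Escherichia coli",
    sep = ";"
  )
)

tax_parsed <- tax_demo %>%
  # One column per rank, as in the tutorial code
  separate(classification, into = c("K", "P", "C", "O", "F", "G", "S"), sep = ";") %>%
  # Drop the rank prefix ("d__", "p__", ...) from every rank column
  mutate(across(K:S, ~ str_remove(.x, "^[a-z]__")))
```

Whether to strip the prefixes is a matter of taste; keeping them makes it easier to compare values with the original GTDB-Tk table.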

Plotting

What a metagenome looks like

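# GC content against coverage for every contig, one panel per library,
# coloured by the GTDB-Tk order of the bin the contig belongs to
cov %>% 
  rename(contig=ID) %>% 
  left_join(bin) %>% 
  left_join(tax) %>% 
  ggplot(aes(y=Avg_fold,x=Ref_GC,col=O))+
  geom_point()+
  facet_wrap(~Lib)+
  scale_y_log10()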
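
Because the tables are combined with `left_join()`, every contig from the coverage table is kept, and contigs without a matching bin get `NA` in the bin and taxonomy columns. A minimal sketch with made-up contigs shows this and counts the unmatched ones; the table and column values are hypothetical.

```r
library(dplyr)

# Made-up coverage and bin tables; contig "c2" has no bin
cov_demo <- tibble(contig = c("c1", "c2", "c3"), Avg_fold = c(12, 5, 40))
bin_demo <- tibble(contig = c("c1", "c3"), binID1 = "I1R1", binID2 = c(1, 2))

# Spelling out the key with `by` makes the join explicit
joined <- left_join(cov_demo, bin_demo, by = "contig")

# Contigs that did not match any bin
n_unbinned <- sum(is.na(joined$binID1))
```

Checking this count on the real data before plotting is a quick way to see how much of the assembly ended up in bins.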

How the bin distribution varies across libraries (here it does not vary much)

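# Coverage of each contig in library D0 against library D1
cov %>% 
  rename(contig=ID) %>% 
  select(contig,Lib,Avg_fold) %>% 
  # One column per library (pivot_wider() supersedes spread())
  pivot_wider(names_from=Lib,values_from=Avg_fold) %>% 
  left_join(bin) %>% 
  left_join(tax) %>% 
  ggplot(aes(x=D0,y=D1,col=O))+
  geom_point()+
  scale_y_log10()+
  scale_x_log10()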
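
Once the coverage is in wide form, the change between two libraries can also be put into a number. The sketch below, on made-up values, computes a log2 fold change between `D0` and `D1`; `log2_fc` is a name introduced here for illustration.

```r
library(tidyverse)

# Made-up long-format coverage: two contigs in two libraries
cov_long <- tibble(
  contig   = rep(c("c1", "c2"), each = 2),
  Lib      = rep(c("D0", "D1"), times = 2),
  Avg_fold = c(10, 20, 8, 8)
)

cov_wide <- cov_long %>%
  pivot_wider(names_from = Lib, values_from = Avg_fold) %>%
  # Positive values: higher coverage in D1 than in D0
  mutate(log2_fc = log2(D1 / D0))
```

Contigs with zero coverage in one library give infinite or undefined values, so on real data a small pseudocount or a filter may be needed.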

Then the coverage (average fold) of the contigs in each bin, within each library

# One box per bin showing the spread of its contigs' coverage, one panel per library
cov %>% 
  rename(contig=ID) %>% 
  left_join(bin) %>% 
  left_join(tax) %>% 
  ggplot(aes(x=paste(binID1,binID2),y=Avg_fold,col=O))+
  geom_boxplot()+
  scale_y_log10()+
  facet_wrap(~Lib)
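
The boxplots show the spread of contig coverage inside each bin. If a single value per bin and library is wanted, one option is a length-weighted mean, so that long contigs count more than short ones. This is a sketch on made-up values, and it assumes the coverage table has a contig length column, here called `Length`.

```r
library(dplyr)

# Made-up contigs from two bins in one library; Length is an assumed column
cov_binned <- tibble(
  Lib      = "D0",
  bin_name = c("A", "A", "B"),
  Avg_fold = c(10, 20, 5),
  Length   = c(1000, 3000, 2000)
)

bin_cov <- cov_binned %>%
  group_by(Lib, bin_name) %>%
  # Weight each contig's coverage by its length
  summarise(bin_fold = weighted.mean(Avg_fold, w = Length), .groups = "drop")
```

For bin A this gives (10 * 1000 + 20 * 3000) / 4000 = 17.5, closer to the coverage of the longer contig.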

And the coverage across time, expressed as each contig's percentage of the summed average fold in its library, with one panel per GTDB-Tk species


cov %>% 
  rename(contig=ID) %>% 
  left_join(bin) %>% 
  left_join(tax) %>% 
  # Within each library, convert coverage to a percentage of the library total
  group_by(Lib) %>% 
  mutate(prop_cov=prop.table(Avg_fold)*100) %>% 
  ggplot(aes(x=Lib,y=prop_cov,col=O))+
  geom_boxplot()+
  facet_wrap(~S, scales = "free")
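
To make the `group_by()` plus `prop.table()` step concrete, here is a minimal sketch on made-up values. Within each library the percentages add up to 100, which is what makes libraries with different sequencing depth comparable.

```r
library(dplyr)

# Made-up coverage for two contigs in two libraries
cov_demo <- tibble(
  Lib      = c("D0", "D0", "D1", "D1"),
  contig   = c("c1", "c2", "c1", "c2"),
  Avg_fold = c(30, 10, 5, 15)
)

prop_demo <- cov_demo %>%
  group_by(Lib) %>%
  # prop.table() divides each value by the sum of its group
  mutate(prop_cov = prop.table(Avg_fold) * 100) %>%
  ungroup()
```

Here `c1` drops from 75 % of the coverage in `D0` to 25 % in `D1`, even though the absolute values alone would be hard to compare between libraries.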

tags: tutorials,R, Metagenomics,Binning