TFM Diego - HackMD

###### tags: `Estudiantes` `WGS` `Honguitos` # TFM Diego :mushroom: Lo primero que necesitamos es buscar un pipeline para analizar datos de Micobioma partiendo de datos de shotgun (a.k.a WGS) y datos de ITS, luego hacer comparaciones entre resultados y por último redes integrando datos de bacterias/arqueas y hongos. * [Objetivos](https://docs.google.com/document/d/1CBouisDcUgnfrCVvgtNYOAemHWchtdvCeIQfQQ5B2o4/edit?usp=sharing) del TFM. * [Estructura ](https://docs.google.com/document/d/1Atqi0V9hjclysTt4L1SaTuy-ssOiXcWWs2VACR-zFzA/edit?usp=sharing)de la memoria. ## Papers importantes: - Redes [hongos](https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-017-0393-0). Me gusta. - [Review](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8790610/#:~:text=For%20mycobiome%20analysis%2C%20fungal%20DNA,the%20fungal%20rRNA%20gene%20locus.). En la Tabla 1 ponen algunos software para analizar datos de ITS y WGS. - En [este](https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-023-01586-y#Sec10) de los enterotipos usan el pipeline de QIIME2 - DADA2 y la BD UNITE para analizar secuencias de ITS. Tambien sacan vías metaCyc de fungi con Hummann :exploding_head: - [Aquí](https://www.nature.com/articles/s41598-024-56585-2#Sec10) usan USEARCH + BD UNITE para ITS. - [Este](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0192898) va sobre la identificación de hongos en shotgun data. - [Paper](https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-023-01693-w#Sec2) interesante donde comparan metodologías y evaluan relaciones interkingdom. - [Paper ](https://www.sciencedirect.com/science/article/pii/S2001037022002902)de FunOMIC :eyes: - :warning: [Paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9928459/) de Celiacos que nos interesa para sacar controles de pacientes CD con dieta sin gluten. - [Fungal dysbiosis](https://pubmed.ncbi.nlm.nih.gov/33723701/) en niños con CD. - Compración 16S y shotgun - [puede servir de modelo](https://www.nature.com/articles/s41598-021-82726-y) - Otras comparación de 16S y shotgun - [Aqui](https://www.sciencedirect.com/science/article/pii/S266723752200296X) hay varios diagramas de Venn :sleuth_or_spy: ## Bases de Datos: >Revisar qué nos sirve y qué no :slightly_smiling_face: - [FungiDB](https://fungidb.org/fungidb/app) - [UNITE](https://unite.ut.ee/) - [Mycobank](https://www.mycobank.org/) - [Entrada](https://forum.qiime2.org/t/its-database-other-than-unite-database/28642) en el foro de QIIME2 hablando sobre otras DB. - [FunOMIC](https://manichanh.vhir.org/funomic/) - Ojo parece guay ## Tutoriales y otra info interesante: **ITS:** - [Tutorial](https://forum.qiime2.org/t/fungal-its-analysis-tutorial/7351) de QIIME2 para analizar ITS - Ver si hay algo más actulizado :eyes: - No sé si es [este](https://github.com/gregcaporaso/2017.06.23-q2-fungal-tutorial) es el mismo tutorial de arriba, revisar. - Un [GitHub](https://github.com/colinbrislawn/unite-train) sobre como entrenar UNITE o algo así. Revisar :sweat_smile: **WGS:** - GitHub de [FindFungi](https://github.com/GiantSpaceRobot/FindFungi). - GitHub de [FunOMIC](https://github.com/ManichanhLab/FunOMIC) **Otros:** - QC y filtrar host con [kneadData](https://huttenhower.sph.harvard.edu/kneaddata/). ## Datos que usaremos Los datos que necesitas de Shotgun están en Arya en el directorio: `/home/diego/hongos/WGS` No consigo los datos limpios de HOST, así que tenemos que repetir esto. Se tarda mucho, podemos hacerlo en el servidor. Así vas aprendiendo como se usa. Los datos de ITS que tenemos por ahora están en Grogu (Aquí está todo el rawdata tal cual lo envia la empresa de secuenciación) `/home/diego/hongos/ITS` ## Conexión a workstations y servidor - Usaremos el protocolo ssh. Aquí mas [info](https://www.digitalocean.com/community/tutorials/how-to-use-ssh-to-connect-to-a-remote-server-es) de como funciona. - Los datos para la conexión están [aquí](https://docs.google.com/document/d/1Gy9XxnW43boUeAC5gAEVoaLSN59Ig-ELTzWdkgZue2Y/edit?usp=sharing). - Usar [Screen](https://maojr.github.io/screencheatsheet/) para guardar las sesiones del terminal. ## Para analizar WGS: ### Paso 1: Filtrar host :dancers: #### MetaWRAP Info sobre [METAWRAP](https://github.com/bxlab/metaWRAP/blob/master/Usage_tutorial.md) :memo: Usar lo siguiente como guia, modifica lo necesario, ojo con la versiones y actualizaciones. **Mejor simpre lo más actualizado.** Mejor vé [aquí](https://github.com/bxlab/metaWRAP) la instalación de Metawrap, que está más completo. ```bash=s conda create -y -n metawrap-env phyton=2.7 conda install -y -c ursky metwrap-mg ``` Sacar data de c/carpeta y juntar en una. Luego descomprimir ```bash=s gunzip *.gz ``` :::info Cambiar nombre/ quitar toda la :shit: del nombre De D1807_ESFP200009043-1a_HCYJCDSXY_L2_1.fq a D1807_1.fq ::: ```bash=1 mkdir RAW_READS mv *.fq /home/laura/Laura/ECCMID/RAW_READS ls RAW_READS mkdir READ_QC ``` Limpiar y filtrar reads del host... (Paciencia :hourglass_flowing_sand:) ```bash=1 conda activate metawrap-env for F in RAW_READS/*_1.fq; do R=${F%_*}_2.fq BASE=${F##*/} SAMPLE=${BASE%_*} metawrap read_qc -1 $F -2 $R -t 6 -o READ_QC/$SAMPLE & done ``` Mover las secuencias limpias y sin human-host ```bash=1 mkdir CLEAN_READS for i in READ_QC/*; do b=${i#*/} mv ${i}/final_pure_reads_1.fastq CLEAN_READS/${b}_1.fastq mv ${i}/final_pure_reads_2.fastq CLEAN_READS/${b}_2.fastq done ``` ## Análisis en R de Diversidades y abundancias Un ejemplo.... ```R=1 #Alfa diversidades plot_richness(physeq, color = "Group", x = "Group", measures = c("Chao1", "Simpson", "Shannon")) + geom_boxplot(aes(fill = Group), alpha=.7) + scale_color_manual(values = c("#141B41", "#306BAC","#6F9CEB")) + scale_fill_manual(values = c("#141B41", "#306BAC","#6F9CEB")) + theme_bw() ####Alpha diversidades#### metadata <- read_csv("Data/Metadata/metadata.csv") #Comparaciones alfa diversidad alpha_physeq <- estimate_richness(physeq, measures=c("Chao1", "Shannon", "Simpson")) %>% as.data.frame() %>% select(-se.chao1) %>% rownames_to_column("SampleID") %>% left_join(metadata) color <- c("#4c6216","#bb706fff") C <- ggboxplot(alpha_physeq, x = "Group", y = "Chao1", fill = "Group", palette = color, alpha=0.8,outlier.shape = NA) + stat_compare_means(label = "p.signif") + theme_light() + theme(legend.position = "none") + geom_jitter(aes(colour = Group),position=position_jitter(0.2), alpha=0.5) + ggtitle("Chao1") + labs(x = "",y = "") Si <- ggboxplot(alpha_physeq, x = "Group", y = "Simpson", fill = "Group", palette = color, alpha=0.8,outlier.shape = NA) + stat_compare_means(label = "p.signif") + theme_light() + theme(legend.position = "none") + geom_jitter(aes(colour = Group),position=position_jitter(0.2), alpha=0.5) + ggtitle("Simpson") + labs(x = "",y = "") S <- ggboxplot(alpha_physeq, x = "Group", y = "Shannon", fill = "Group", palette = color, alpha=0.8,outlier.shape = NA) + stat_compare_means(label = "p.signif") + theme_light() + theme(legend.position = "none") + geom_jitter(aes(colour = Group),position=position_jitter(0.2), alpha=0.5) + ggtitle("Shanon") + labs(x = "",y = "") library(patchwork) C + S + Si + theme(legend.position = "right") #Beta diversidades #Unifrac physeq.rel <- transform(physeq, transform="compositional") colori <- c("#bb706fff","#4c6216") ordinated_taxa_unifrac <- ordinate(physeq.rel, method="MDS", distance="bray",weighted=TRUE) P1 <- plot_ordination(physeq.rel, ordinated_taxa_unifrac, color="Group", title = "Unifrac MDS Analysis") + theme_bw() + scale_color_manual(values=colori) #Poner más axis #Comp 1 y 3 P2 <- plot_ordination(physeq.rel, ordinated_taxa_unifrac, color="Group", title = "Unifrac MDS Analysis", axes=c(1,3)) + theme_bw() + scale_color_manual(values = colori) #Comp 2 y 3 P3 <- plot_ordination(physeq.rel, ordinated_taxa_unifrac, color="Group", title = "Unifrac MDS Analysis", axes=c(2,3)) + theme_bw() + scale_color_manual(values = colori) P1 + P2 + P3 + plot_layout(guides = "collect") #PERMANOVA # Pick relative abundances (compositional) and sample metadata otu <- abundances(physeq.rel) meta <- meta(physeq.rel) # samples x species as input permanova <- adonis(t(otu) ~ Group, data = meta, permutations=99, method = "bray") permanova # P-value print(as.data.frame(permanova$aov.tab)["Group", "Pr(>F)"]) #Checking the homogeneity condition dist <- vegdist(t(otu)) anova(betadisper(dist, meta$Group)) coef <- coefficients(permanova)["Group1",] top.coef <- coef[rev(order(abs(coef)))[1:20]] par(mar = c(3, 14, 2, 1)) barplot(sort(top.coef), horiz = T, las = 1, main = "Top taxa") #Top 20 genus library(MetBrewer) paleta20 <- met.brewer("Renoir", 20) tb <- psmelt(physeq) tb20genus <- tb %>% group_by(Genus) %>% summarise(M=median(Abundance)) %>% ungroup() %>% unique() %>% top_n(M,n=20) tb20genus <- tb20genus %>% inner_join(tb) Plot20g <- ggplot(tb20genus, aes(Group, Abundance ,fill=Genus)) + geom_col(position="fill") + scale_fill_manual(values=paleta20) Plot20g <- Plot20g + theme_minimal() + xlab("") + ylab("") + theme(legend.title=element_text(color= "black", size=9), legend.text=element_text(size=8, face="italic")) Plot20g <- Plot20g + ggtitle ("Top genera") Plot20g #Phylum #Dif Phylum paleta5 <- met.brewer("Renoir", 5) Phyl <- tax_glom(physeq, taxrank = "Phylum") tb_phyl <- psmelt(Phyl) tbphylum <- tb_phyl %>% group_by(Phylum) %>% summarise(M=median(Abundance)) %>% ungroup() %>% unique() %>% top_n(M,n=5) tbphylum <- tbphylum %>% inner_join(tb_phyl) Plotf <- ggplot(tbphylum, aes(Group, Abundance ,fill=Phylum)) + geom_col(position="fill") + scale_fill_manual(values=paleta5) Plotf <- Plotf + theme_minimal() + xlab("") + ylab("Relative Abundance") + theme(legend.title=element_text(color= "black", size=9), legend.text=element_text(size=8, face="italic")) Plotf <- Plotf + ggtitle("Top phyla") Plotf Plotf + Plot20g ```