Análisis de resultados del gen 16S rRNA con R

--- title: 'Análisis de resultados del gen 16S rRNA con R' disqus: hackmd --- Análisis de resultados del gen 16S rRNA con R === **Dra. E. Ernestina Godoy Lozano** elizabeth.godoy@insp.mx tinagodoy@gmail.com ## Temario [TOC] ## Crear el espacio de trabajo ### Ejercicio 1. Conectarnos al servidor ```=bash= ssh -X alumnoX@132.248.32.180 ``` ### Ejercicio 2: Descomprimir Rproject Ir al directorio EMetagen ```=bash=2 cd EMetagen ``` Descomprimir modulo de trabajo ```=bash=2 tar -xzvf /Share/EMetagen/16S_analisis.tar.gz ``` ### Ejercicio 3: Conectarnos al R Server 1. Abrir un navegador web 2. Ir a la siguiente direccion (link/liga): http://132.248.32.180:8787 3. Ingresar usuario (Username) y contraseña (Password) ![Captura de pantalla 2024-02-23 a la(s) 12.42.58 p.m.](https://hackmd.io/_uploads/r1PHDvL36.png) 6. **¡Perfecto!** Haz ingresado al R Studio Server!!! # Análisis en R con RStudio ## Configuración del espacio de trabajo ### Ejercicio 4: Posicionarnos en el directorio de trabajo ```=Rscript= getwd() setwd("~/EMetagen/16S_analisis") dir() ``` ### Ejercicio 5: Cargar librerias de trabajo Usaremos la función **library()** ```=Rscript=4 library(gplots) library(ggplot2) library(vegan) library(reshape2) library(metagenomeSeq) library(RColorBrewer) library(phyloseq) library(MASS) library(gtools) ```` #### Cargar funciones extras Usaremos la función **source()** ```=Rscript=13 source("scripts/pairwise.adonis.R") source("scripts/abund_relativa.R") source("scripts/taxonomic_levels.R") ``` ## Matrices de abundancia ### Ejercicio 6: Modificación de matriz de abundancia #### Niveles taxonómicos Diapositivas: 15-17 1. Cargar datos a espacio de trabajo ```=Rscript=16 matrix_abund <- read.delim("data/TablaFinal.txt", header=TRUE) View(matrix_abund) dim(matrix_abund) tabla <- as.matrix(data.frame(matrix_abund[,2:16])) rownames(tabla) <- matrix_abund$taxonomy View(tabla) ``` 2. Escribir matriz de abundancias absolutas modificada ```=Rscript=22 exportMat(tabla, file ="data/integrated_matrix.txt") ``` 3. Añadir los nombres de los taxa a un objeto ```=Rscript=23 asignacion_taxonomica <- data.frame(completo=rownames(tabla)) View(asignacion_taxonomica) ``` 4. Asignar los taxa a su nivel taxonomico correspondiente Usaremos la función **separate()** de tidyr para aplicar una función a todo un dataFrame ```=Rscript=25 asignacion_taxonomica <- asignacion_taxonomica %>% separate(completo, into = c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species", "Subspecies"), sep = ";", fill = "right", remove = FALSE) ``` 5. Visualización de la tabla ```=Rscript=30 View(asignacion_taxonomica) ``` #### Comprimir la matriz a distintos niveles taxonómicos Usaremos la funcion **taxa_levels()** que cargamos previamente ```=Rscript=31 taxonomic_levels("data/TablaFinal.txt", "data") dir("data") ``` ### Ejercicio 7: Normalización de matrices de abundancia Diapositiva 18 ```=Rscript=33 View(tabla) ``` 1. Sacar conteos totales por muestra ```=Rscript=34 totales <- colSums(tabla) totales ``` 2. Generar una nueva matriz ```=Rscript=36 tabla_rel <- matrix(nrow=dim(tabla)[1], ncol=dim(tabla)[2]) colnames(tabla_rel) <- colnames(tabla) rownames(tabla_rel) <- rownames(tabla) tabla_rel <- sweep(tabla, 2, totales, FUN = "/") * 100 View(tabla_rel) colSums(tabla_rel) ``` 3. Escribir la matrix normalizada en nuestros archivos ```=Rscript=44 exportMat(tabla_rel, file ="data/percent_integrated_matrix.txt") ``` 4. Correr la función **abund_relativa()** para los niveles de Género y Clase ```=Rscript=45 abund_relativa("data/Genus.txt", "data/percent_Genus.txt") abund_relativa("data/Class.txt", "data/percent_Class.txt") dir("data") ``` ### Ejercicio 8: Generación de barras apiladas con Cortes de abundancia Diapositivas: 19- 21 1. Cargar matriz a nivel de *clase* en abundancias relativas ```=Rscript=48 tabla <- read.delim("data/percent_Class.txt", header=T, row.names=1) View(tabla) dim(tabla) ``` 2. Determinar un corte de abundancia, aqui marcamos un corte del 1% ```=Rscript=51 cutoff <- 1 ``` 3. Calcular la media por fila y filtrar las que son > 1% ```=Rscript=52 tabla_filtrada <- tabla %>% as.data.frame() %>% rownames_to_column("Taxa") %>% mutate(Media = rowMeans(select(., -Taxa))) %>% filter(Media > cutoff) %>% arrange(desc(Media)) %>% # ordenar de mayor a menor la abundancia select(-Media) View(tabla_filtrada) ``` 4. Restaurar los nombres de fila y columna ```=Rscript=62 new <- tabla_filtrada %>% column_to_rownames("Taxa") ``` 5. Calcular totales y fila de "Abundance < 1%" ```=Rscript=64 resto <- 100 - colSums(new) fila_resto <- as.data.frame(t(resto)) rownames(fila_resto) <- paste("Abundance <", cutoff, "%") ``` 6. Combinar las tablas ```=Rscript=67 new <- bind_rows(new, fila_resto) ``` 7. Escribir nueva matriz de abundancia ```=Rscript=68 exportMat(as.matrix(new), file ="data/percent_Class_more_1_perc.txt") ``` 8. Contraer la información de la matrix en dos columnas (Fusionar la matriz) ```=Rscript=69 melt_subtable <- melt(as.matrix(new)) ``` Cambiar los nombres de la tabla ```=Rscript=70 colnames(melt_subtable) colnames(melt_subtable) <- c("Taxa", "Sample","Freq") ``` Cambiar el tipo de clase de la columna *Taxa* ```=Rscript=72 melt_subtable$Taxa <- factor(melt_subtable$Taxa, levels=rownames(new)) head(melt_subtable) ``` 9. Generar un vector de colores al azar Quitar la escala de grises del vector de colores ```=Rscript=74 set.seed(123) # para reproducibilidad color = grDevices::colors()[grep('gr(a|e)y', grDevices::colors(), invert = T)] ``` Generar un vector de colores al azar con un color gris para los taxa menos abundantes ```=Rscript=76 # Opción 1: my_color=c(sample(color, dim(new)[1]-1, replace=FALSE), "gray") # Opción 2: n_taxa <- nrow(new) my_color <- c(colorRampPalette(brewer.pal(8, "Set1"))(n_taxa - 1), "gray") ``` 10. Generar mi grafica de barras apiladas con el corte de abundancia ```=Rscript=82 ggplot(melt_subtable, aes(x= Sample, y=Freq, fill=Taxa)) + geom_bar(stat="identity") + scale_fill_manual(values=my_color, limits=rownames(new)) + theme(axis.text.x = element_text(angle=30, hjust=1, vjust=1, size=6), panel.background= element_rect(fill = NA, colour = "black"), panel.grid.major = element_line(colour = "grey90"), legend.key = element_rect(fill = "white"), legend.text=element_text(size=6), legend.position="bottom") + labs(x = "", y = "Relative abundance (%)", fill="") + scale_y_continuous(breaks=c(0, 20, 40, 60, 80, 100)) ``` 11. Guardar imagen con función **ggsave()** ```=Rscript=93 ggsave("figuras/sock_plots_Class_more_1_perc.png", device = "png", dpi = 300, units = "px", bg = "white") ``` ### Ejercicio 9: Library *qiime2R* #### Leyendo y manejando artifacs en R Diapositivas: 23-29 1. Cargar librerias de trabajo ```=Rscript=95 library(qiime2R) library(tidyverse) ``` 2. Leer el "**artifac**" que genero QIIME2 (*table.qza*) ```=Rscript=97 tabla_qza <- read_qza("qiime2/table.qza") ``` 3. Visualizar la estructura de nuestro objeto ```=Rscript=98 str(tabla_qza) class(tabla_qza) ``` 4. Extraer la información del objeto tabla_qza como una lista ```=Rscript=100 names(tabla_qza) ``` 5. Acceder a la información de las features (ASVs) ```=Rscript=101 View(tabla_qza$data) head(tabla_qza$data) ``` 6. Asociar la taxonomía ```=Rscript=103 taxonomia_qza <- read_qza("qiime2/taxonomy_blast.qza") str(taxonomia_qza) class(taxonomia_qza) ``` 7. Acceder a la taxonomía ```=Rscript=106 View(taxonomia_qza$data) ``` 8. Dividir en niveles taxonómicos la taxonomía usando **parse_taxonomy()** ```=Rscript=107 taxa_sep <- parse_taxonomy(taxonomia_qza$data) View(taxa_sep) ``` 9. Comprimir el *data.frame* a un nivel taxonómico en particular (Nivel de *clase*) ```=Rscript=109 class_table <- summarize_taxa(tabla_qza$data, taxa_sep)$Class View(class_table) ``` #### Gráfica de barras apilada y heatmaps usando la información de metadata 1. Cargar el archivo de metadata.tsv ```=Rscript=111 metadata <- read_q2metadata("qiime2/metadata.tsv") View(metadata) ``` 2. Generación de gráfica de barras apiladas ```=Rscript=113 taxa_barplot(class_table, metadata,"Zone") ``` 3. Generación de gráfica de Heatmap ```=Rscript=114 taxa_heatmap(class_table, metadata,"Type") ``` 4. Cambiar gamma de colores de acuerdo a la gramática de ***ggplot2*** ```=Rscript=115 taxa_barplot(class_table, metadata,"Zone") + scale_fill_manual(values=my_color) + theme(text = element_text(size=10), legend.position="bottom") + scale_y_continuous(breaks=c(0, 20, 40, 60, 80, 100)) + labs(fill="") taxa_heatmap(class_table, metadata,"Type") + scale_fill_gradientn(colours = terrain.colors(10)) taxa_heatmap(class_table, metadata,"Type") + scale_fill_gradient(low = "purple", high = "green3", na.value = NA) ``` 5. Guardar imagen en png ```=Rscript=126 png("figuras/taxa_heatmap.png", width = 10*300, height = 8*300, res = 300, pointsize = 8) taxa_heatmap(class_table, metadata,"Type") dev.off() ``` #### Generación de un archivo de tipo phyloseq ```=Rscript=129 physeq_qiime2 <- qza_to_phyloseq ( features="qiime2/table.qza", tree="qiime2/rooted-tree.qza", taxonomy="qiime2/taxonomy_blast.qza", metadata = "qiime2/metadata.tsv") physeq_qiime2 class(physeq_qiime2) str(physeq_qiime2) ``` ## Alfa y Beta diversidad ### Ejercicio 10: Cálculo de los estimadores de alfa diversidad usando *phyloseq* Diapositivas: 30-34 #### Cargar los datos ```=Rscript=138 variables <- read.delim("data/abiotic_variables.txt", header=T, row.names=1) abun_table <- read.delim("data/Genus.txt", header=T, row.names=1) abun_table_mat <- as.matrix(abun_table) dim(abun_table) View(abun_table) taxrank_mat <- as.matrix(rownames(abun_table_mat)) colnames(taxrank_mat) <- ("TAX") rownames(taxrank_mat) <- rownames(abun_table_mat) head(taxrank_mat) ``` #### Generar objetos de tipo phyloseq con las funciones de phyloseq especificas 1. Matriz de OTUs (En nuestro caso es la matriz de *Géneros*) ```=Rscript=148 OTU <- otu_table(abun_table_mat, taxa_are_rows=TRUE) ``` 2. Una matriz con las anotaciones de los OTUS (En nuestro caso es la anotación taxonómica a nivel de *género*) ```=Rscript=149 TAX <- tax_table(taxrank_mat) ``` 3. Cargar tabla de metadatos ```=Rscript=150 MET <- sample_data(variables) ``` 4. Construimos nuestro objeto de *phyloseq* ```=Rscript=151 physeq <- phyloseq(OTU, TAX, MET) str(physeq) ``` 5. Obtenemos los índices de diversidad y los colocamos en una tabla ```=Rscript=153 diversity_index <- data.frame(round(estimate_richness(physeq, measures=c("Observed", "Chao1", "Shannon", "Simpson")), 2)) diversity_index ``` Usar el objeto que construimos con ***qiime2R*** ```=Rscript=155 diversity_index2 <- data.frame(round(estimate_richness(physeq_qiime2, measures=c("Observed", "Chao1", "Shannon", "Simpson")), 2)) diversity_index2 ``` 6. Escribimos la tabla ```=Rscript=157 write.table(diversity_index, "data/alpha_diversity_index.txt", quote = FALSE, sep = "\t", row.names = TRUE, col.names=TRUE) ``` 7. Generar un boxplot usando el índice de Shannon ```=Rscript=158 diversity_index$Clase <- sapply(as.character(rownames(diversity_index)), function(x){strsplit(x, "_")[[1]][1]}) ggplot(diversity_index, aes(x=Clase, y=Shannon, fill=Clase)) + geom_boxplot(show.legend=FALSE) + geom_point(color="black", alpha=0.9, position = "jitter", shape=21, show.legend=FALSE) + labs(x = "", y = "Indice de Shannon", fill="") + scale_fill_manual(values=c("royalblue2", "tomato2", "seagreen3")) + theme_minimal() ggsave("figuras/boxplot_shannon.png", device = "png", dpi = 300, units = "px", bg = "white") ``` ### Ejercicio 11: Generación de PCoA/NMDS con phyloseq Diapositivas: 35-38 #### Generación del PCoA con el método de Bray-Curtis usando la función **ordinate()** ```=Rscript=168 ordination <- ordinate(physeq, "PCoA", "bray") plot_ordination(physeq, ordination, color="grupo") ``` Podemos usar la información de alguna variable continua ```=Rscript=170 plot_ordination(physeq, ordination, color="Deep") ``` Cambiarle el color a otra paleta de colores ```=Rscript=171 plot_ordination(physeq, ordination, color="Deep") + scale_colour_gradientn(colours= topo.colors(10)) ``` Vamos a usar la sintáxis de ggplot2 para mejorar las gráficas ```=Rscript=173 plot_ordination(physeq, ordination, color="grupo") + stat_ellipse() + scale_color_manual(values= c("deeppink1", "darkturquoise", "gold")) + labs(title="PCoA") + theme_bw() ``` #### Ejemplo de como generar un PCoA cuando tenemos la información del árbol filogenético ```=Rscript=178 plot_ordination(physeq_qiime2, ordinate(physeq_qiime2, "PCoA", "wunifrac"), color="Type") ``` Guardar nuestro gráfico ```=Rscript=179 plot_ordination(physeq, ordination, color="grupo") + stat_ellipse() + scale_color_manual(values= c("deeppink1", "darkturquoise", "gold")) + labs(title="PCoA") + theme_bw() ggsave("figuras/PCoA_bray_points_phyloseq.png", device = "png", dpi = 300, units = "px") ``` #### Generación de un NMDS ```=Rscript=183 plot_ordination(physeq, ordinate(physeq, "NMDS", "bray"), color="grupo") + scale_color_manual(values= c("deeppink1", "darkturquoise", "gold")) + labs(title="NMDS") + theme_bw() ggsave("figuras/NMDS_bray_points_phyloseq.png", device = "png", dpi = 300, units = "px") ``` ### Ejercicio 12: ANOSIM y ADONIS Diapositivas: 38-39 #### Normalizar usando el método de metagenomeSeq. 1. Cargar matriz de conteos absolutos ```=Rscript=187 datos <- loadMeta("data/Genus.txt") ``` 2. Transformarla a un objeto de metagenomeSeq llamado "*MRexperiment*" ```=Rscript=188 MRexp <- newMRexperiment(datos$counts) ``` 3. Calcular el percentil para el cual sumar los conteos absolutos ```=Rscript=189 p <- cumNormStatFast(MRexp) ``` 4. Normalizamos la matriz ```=Rscript=190 normalized_matrix <- cumNormMat(MRexp, p = p) head(normalized_matrix) ``` 5. Escribimos la matriz normalizada ```=Rscript=192 exportMat(normalized_matrix, file ="data/normal_Genus.txt") ``` 6. Transponer la matriz normalizada para que las filas sean las muestras y los taxa las columnas ```=Rscript=193 transp_mat <- t(normalized_matrix) ``` 7. Generar grupos ```=Rscript=194 muestras <- data.frame(estacion=rownames(transp_mat)) muestras$clase <- sapply(as.character(muestras$estacion), function(x){strsplit(x, "_")[[1]][1]}) grupos <- muestras$clase ``` #### ANOSIM (Diferencias entre todos los grupos) ```=Rscript=198 str(transp_mat) resultado_anosim <- anosim(transp_mat, grouping= grupos, permutations = 999, distance = "bray") resultado_anosim ``` #### ADONIS pareado (comparación pareada de los grupos) ```=Rscript=204 pairwise.adonis(x= transp_mat, grupos, sim.function = 'vegdist', sim.method='bray', p.adjust.m='bonferroni') ``` ## Información de la sesión y Listado de objetos ```=Rscript=208 ls() sessionInfo() ``` ## Información adicional y enlaces :::info [Gamma de Colores en R](http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf) [Colores HTML](https://htmlcolorcodes.com/) [Paleta de colores Opción 1](https://github.com/EmilHvitfeldt/r-color-palettes) [Paleta de colores Opción 2](https://r-charts.com/color-palettes/) [Paleta de colores Opción 3](https://gradients.app/es/colorpalette) [*metagenomeSeq*](https://www.cbcb.umd.edu/software/metagenomeSeq) [*phyloseq*](https://joey711.github.io/phyloseq/) [*qiime2R*](https://github.com/jbisanz/qiime2R) ::: ###### tags: `UUASMB` `Bitácoras` `Metagenómica`