# RNA-Seq analysis
Reference: the DESeq2 vignette, http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
```
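# Set the working directory to the project folder (spaces need no backslash inside an R string)
setwd("/Users/olivia/Desktop/Gen811 Final Project")
# One-time install of DESeq2 through Bioconductor; uncomment if it is not installed yet
#if (!require("BiocManager", quietly = TRUE))
#install.packages("BiocManager")
#BiocManager::install('DESeq2')
library('DESeq2')
# Read the tab-separated count tables; the first row holds the column names
ME49_cts <- read.table("GSE217226_ME49_AP2XII-2_RNAseq_counts.tab", sep = "\t", header = TRUE)
RH_cts <- read.table("GSE217226_RH_AP2XII-2_RNAseq_counts.tab", sep = "\t", header = TRUE)
# Rename the columns: gene ID first, then three control and three IAA (KD) samples
colnames(ME49_cts) <- c("X","CON1ME49","CON2ME49","CON3ME49","IAA1ME49","IAA2ME49","IAA3ME49")
colnames(RH_cts) <- c("X","CON1RH","CON2RH","CON3RH","IAA1RH","IAA2RH","IAA3RH")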
```
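Renaming columns by position assumes each count file has exactly seven columns in the order gene ID, three controls, three knockdowns. A minimal sketch of a guard for that assumption follows. The toy table, its gene IDs, its counts and its starting column names are all made up for illustration and are not taken from the GEO files.

```r
# Toy stand-in for a count table as returned by read.table(); all values are made up
toy_cts <- data.frame(
  X  = c("geneA", "geneB"),
  s1 = c(10, 0), s2 = c(12, 1), s3 = c(9, 0),
  s4 = c(30, 2), s5 = c(28, 0), s6 = c(35, 1)
)

new_names <- c("X", "CON1ME49", "CON2ME49", "CON3ME49",
               "IAA1ME49", "IAA2ME49", "IAA3ME49")

# Stop early if the table does not have the expected number of columns
stopifnot(ncol(toy_cts) == length(new_names))
colnames(toy_cts) <- new_names

# Look at the first rows to confirm the labels landed on the right columns
head(toy_cts)
```

If the real file has a different number of columns, `stopifnot()` halts before any labels are assigned to the wrong samples.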
Next, get the metadata file from Excel into R. In Excel, use 'Save As' to save the file as a 'tsv' (tab-separated). Save it in your project folder so you don't need to move it before reading it into R.
```
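# The first row holds the column names; the object is called METADATA in the steps below
METADATA <- read.table('the_metadata_file.tsv', sep = "\t", header = TRUE)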
```
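To see what reading a tab-separated metadata file with a header row produces, here is a small sketch that parses two made-up rows from a string instead of a file. The `text =` argument of `read.table()` is used only so the example runs without any file on disk, and `toy_metadata` is a name chosen for this illustration.

```r
# Two made-up rows in the same layout as the metadata table
tsv_text <- "Sample Names\tStrain\tTreatment\nCON1ME49\tME49\tCONTROL\nIAA1ME49\tME49\tKD"

toy_metadata <- read.table(text = tsv_text, sep = "\t", header = TRUE)

# read.table() makes header names syntactic, so "Sample Names" becomes "Sample.Names"
colnames(toy_metadata)

# Check the column types and the number of rows that were read
str(toy_metadata)
```

This is why the later steps refer to the sample column as `Sample.Names`, with a dot in place of the space.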
# Make a metadata table specific to each experiment
```
## ME49_metadata
Sample Names    Strain    Treatment
CON1ME49        ME49      CONTROL
CON2ME49        ME49      CONTROL
CON3ME49        ME49      CONTROL
IAA1ME49        ME49      KD
IAA2ME49        ME49      KD
IAA3ME49        ME49      KD
## RH_metadata
CON1RH          RH        CONTROL
CON2RH          RH        CONTROL
CON3RH          RH        CONTROL
IAA1RH          RH        KD
IAA2RH          RH        KD
IAA3RH          RH        KD
```
ME49 metadata
```
# Rows 1-6 of the combined metadata table are the ME49 samples
ME49_metadata <- METADATA[1:6,]
# Use the sample names as row names so they can be matched to the count columns
rownames(ME49_metadata) <- ME49_metadata$Sample.Names
ME49_metadata$Strain <- factor(ME49_metadata$Strain, levels = c('ME49'))
# CONTROL is listed first so that it becomes the reference level
ME49_metadata$Treatment <- factor(ME49_metadata$Treatment, levels = c('CONTROL','KD'))
```
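The order given in `levels =` matters because DESeq2 treats the first level of a factor as the reference, so with CONTROL first the log2 fold changes are reported as KD relative to CONTROL. A small base R sketch of how to check or change the reference level follows; the `treatment` vector is a made-up example.

```r
# Made-up treatment labels for four samples
treatment <- c("CONTROL", "CONTROL", "KD", "KD")

# Explicit levels: CONTROL is first, so it is the reference level
f_explicit <- factor(treatment, levels = c("CONTROL", "KD"))
levels(f_explicit)

# If a factor was built with KD first, relevel() moves CONTROL back to the front
f_releveled <- relevel(factor(treatment, levels = c("KD", "CONTROL")), ref = "CONTROL")
levels(f_releveled)
```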
The ME49 metadata is now formatted.
Format the ME49 read counts: move the gene IDs from the first column into the row names, then drop that column.
```
rownames(ME49_cts) <- ME49_cts[,1]
ME49_cts <- ME49_cts[,-1]
```
RH metadata
```
RH_metadata <- METADATA[7:12,]
rownames(RH_metadata) <- RH_metadata$Sample.Names
RH_metadata$Strain <- factor(RH_metadata$Strain)
RH_metadata$Treatment <- factor(RH_metadata$Treatment, levels = c('CONTROL','KD'))
```
The RH metadata is now formatted.
Format the RH read counts in the same way.
```
rownames(RH_cts) <- RH_cts[,1]
RH_cts <- RH_cts[,-1]
```
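Before building the DESeq2 object, it can help to confirm that the count columns and the metadata rows describe the same samples in the same order. The sketch below does this on toy objects with made-up values; `toy_cts` and `toy_metadata` are placeholders for the formatted counts and metadata.

```r
# Toy objects in the same shape as the formatted counts and metadata (made-up values)
toy_cts <- data.frame(
  CON1 = c(10, 0), CON2 = c(12, 1), IAA1 = c(30, 2), IAA2 = c(28, 0),
  row.names = c("geneA", "geneB")
)
toy_metadata <- data.frame(
  Treatment = factor(c("CONTROL", "CONTROL", "KD", "KD"), levels = c("CONTROL", "KD")),
  row.names = c("CON1", "CON2", "IAA1", "IAA2")
)

# There must be one metadata row per count column
stopifnot(ncol(toy_cts) == nrow(toy_metadata))

# The sample names must match and be in the same order
aligned <- identical(colnames(toy_cts), rownames(toy_metadata))
stopifnot(aligned)
```

Running the same two checks on the real objects (for example `ME49_cts` and `ME49_metadata`) should catch a mismatch before `DESeqDataSetFromMatrix()` reports it.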
All of the data and metadata are now formatted for the differential expression analysis. Use the DESeq() function to perform the analysis, and then the results() function to see the p-values for each gene.
ME49 results: the 'countData' is your counts (ME49_cts) and the 'colData' is your metadata (ME49_metadata).
```
# The design formula must use the metadata column name exactly; R is case sensitive
ME49_data <- DESeqDataSetFromMatrix(countData = ME49_cts, colData = ME49_metadata, design = ~ Treatment)
ME49_data <- DESeq(ME49_data)
ME49_results <- results(ME49_data)
```
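The ME49 code above should work. Your original code had overwritten the count data with the metadata, so the same object was passed as both `countData` and `colData`, which produced this error:
`Error in DESeqDataSetFromMatrix(countData = ME49_cts, colData = ME49_cts, : ncol(countData) == nrow(colData) is not TRUE`
The same error occurred with the RH data, and a second ME49 attempt failed on duplicate row names: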
```
# Failed attempts, kept for reference and commented out so this block still runs.
# Here the counts object was passed as colData instead of the metadata:
# RH_cts <- DESeqDataSetFromMatrix(countData = RH_cts, colData = RH_cts, design = ~ Treatment)
#> Error in DESeqDataSetFromMatrix(countData = RH_cts, colData = RH_cts, :
#>   ncol(countData) == nrow(colData) is not TRUE
# Second failed attempt:
# ME49_cts <- DESeqDataSetFromMatrix(countData=countData,colData=ME49_cts, design=~Treatment, tidy = TRUE)
#> Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed
#> In addition: Warning message: non-unique value when setting 'row.names': ‘ME49’
# Working steps:
ME49_data <- DESeq(ME49_data)
ME49_results <- results(ME49_data)
## Print the results
ME49_results
```
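Once the results are printed, a common next step is to keep only the genes below an adjusted p-value cutoff. The sketch below shows the idea on a toy data frame that borrows the `log2FoldChange`, `pvalue` and `padj` column names that `results()` returns. The numbers, gene names and the 0.05 cutoff are made up for illustration. For the real table, converting first with `as.data.frame(ME49_results)` should let the same lines apply.

```r
# Toy table with column names borrowed from results(); the numbers are made up
toy_results <- data.frame(
  log2FoldChange = c(2.1, -0.3, 1.5, -2.4),
  pvalue = c(0.0001, 0.40, 0.03, NA),
  padj   = c(0.001, 0.60, 0.08, NA),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Keep genes below the chosen adjusted p-value cutoff, dropping rows where padj is NA
alpha <- 0.05
sig <- toy_results[!is.na(toy_results$padj) & toy_results$padj < alpha, ]

# Order by adjusted p-value, smallest first
sig <- sig[order(sig$padj), ]
sig
```

The explicit `!is.na()` test matters because DESeq2 can report `NA` for `padj`, and indexing with `NA` would otherwise add empty rows.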
RH results
```
### do this too
```
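The RH step is still a to-do. One hedged way to fill it in is to wrap the three ME49 steps in a small helper and call it on the RH objects. `run_treatment_deseq` is a hypothetical name invented for this sketch, not a DESeq2 function. It assumes the metadata has a `Treatment` column and that the count columns match the metadata row names.

```r
# Hypothetical helper (not part of DESeq2): runs the same three steps for any strain
run_treatment_deseq <- function(cts, metadata) {
  if (!requireNamespace("DESeq2", quietly = TRUE)) {
    stop("DESeq2 is not installed; see the BiocManager lines at the top")
  }
  # Samples must line up between the count columns and the metadata rows
  stopifnot(identical(colnames(cts), rownames(metadata)))
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = as.matrix(cts),  # counts as a numeric matrix, genes in rows
    colData   = metadata,
    design    = ~ Treatment      # assumes a Treatment column, as in the ME49 step
  )
  dds <- DESeq2::DESeq(dds)
  DESeq2::results(dds)
}

# Usage, assuming RH_cts and RH_metadata were formatted as described above:
# RH_results <- run_treatment_deseq(RH_cts, RH_metadata)
# RH_results
```

The helper stops with an error when the sample names do not line up, which is the situation behind the `ncol(countData) == nrow(colData)` failure recorded earlier.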