# RNA-Seq (Transcriptome Guided) [Salmon]
https://hackmd.io/@xrissymae/B1TQlR36V
For our project, we're working with white lupin, which is a non-model organism. To do a differential expression analysis, we need a reference transcriptome to map and count our short reads against. Thankfully, somebody else has already built one we can use: the *Lupinus albus* Gene Index version 2 (LAGI02).
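### Pipeline:
1. Quality check with FastQC (summarized with MultiQC)
2. Adapter trimming with Trimmomatic
3. Quality check of the trimmed reads with FastQC
4. Mapping and counting with Salmon
5. Gathering the Salmon counts
6. Differential expression analysis with edgeR
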
### Reference:
1. [ANGUS DIBSI Tutorial 2018](https://angus.readthedocs.io/en/2018/rna-seq.html)
2. [Salmon Documentation](https://salmon.readthedocs.io/en/latest/salmon.html)
## **Before Starting**: Install programs
1. Log into Comet
2. Download and install Miniconda, which provides `conda` (the Bioconda channel is added below)
```
curl -O -L https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```
Answer yes to all the prompts. If you see ">>>", the installer is still waiting for your input.
Once it finishes, run the following command to reload your shell configuration and activate conda.
```
source ~/.bashrc
```
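Now add the channels conda installs software from. Add them in this order so that conda-forge gets the highest priority, followed by bioconda, as the Bioconda setup instructions recommend:
```
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
```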
3. Install Salmon using *conda*
```
conda install salmon
```
4. Install MultiQC using *conda*
```
conda install multiqc
```
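Before submitting any jobs, it can help to confirm that the tools actually ended up on your `PATH`. Below is a small sketch of such a check; the function name `require_tools` is our own (hypothetical), and it relies only on the standard `command -v` shell builtin.

```bash
# Report which of the given commands are missing from $PATH.
# Returns 0 if all are found, 1 if any are missing (names go to stderr).
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to a runnable command
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_tools conda salmon multiqc && salmon --version` should print Salmon's version if the installation worked.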
## Downloading our Sequences
Instead of our very large real dataset, we'll use a practice RNA-Seq dataset from [Schurch et al., 2016](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4878611/).
1. Create a logs folder
```
cd ~
mkdir logs
```
2. Create a new directory for the sequences in your Oasis scratch space.
```
cd /oasis/scratch/comet/$USER/temp_project
mkdir data
cd data
```
3. Download the data from the yeast RNA-Seq study.
#### RNA-Seq Data
```
curl -L https://osf.io/5daup/download -o ERR458493.fastq.gz
curl -L https://osf.io/8rvh5/download -o ERR458494.fastq.gz
curl -L https://osf.io/2wvn3/download -o ERR458495.fastq.gz
curl -L https://osf.io/xju4a/download -o ERR458500.fastq.gz
curl -L https://osf.io/nmqe6/download -o ERR458501.fastq.gz
curl -L https://osf.io/qfsze/download -o ERR458502.fastq.gz
```
4. Change the permissions, just in case. Removing write privileges prevents the data from being modified by accident.
* If you type `ls -l`, you should see:
```
-rw-r--r-- 1 klorilla ceb101 59532325 May 29 21:21 ERR458493.fastq.gz
-rw-r--r-- 1 klorilla ceb101 58566854 May 29 21:21 ERR458494.fastq.gz
-rw-r--r-- 1 klorilla ceb101 58114810 May 29 21:21 ERR458495.fastq.gz
-rw-r--r-- 1 klorilla ceb101 102201086 May 29 21:21 ERR458500.fastq.gz
-rw-r--r-- 1 klorilla ceb101 101222099 May 29 21:21 ERR458501.fastq.gz
-rw-r--r-- 1 klorilla ceb101 100585843 May 29 21:22 ERR458502.fastq.gz
```
* Now remove write privileges by typing `chmod a-w *`, then check the permissions again with `ls -l`. The (rw-r--r--) should now read (r--r--r--).
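A truncated download or a missed `chmod` is easy to overlook. This sketch (the function `check_downloads` is hypothetical, not part of any tool) tests every `.fastq.gz` in a folder with `gzip -t` and checks its permission bits. It reads the bits from `ls -l` rather than using `test -w`, because `test -w` is always true for the root user.

```bash
# Sanity-check the downloaded reads in a directory (default: current one):
#   1. every *.fastq.gz must pass gzip's integrity test (catches truncated downloads)
#   2. no *.fastq.gz may still have a write bit set (checks that chmod a-w worked)
# Problems are reported on stderr; returns 1 if any file fails a check.
check_downloads() {
  local dir="${1:-.}" f perms status=0
  for f in "$dir"/*.fastq.gz; do
    # an unmatched glob stays literal, so this means there are no files at all
    [ -e "$f" ] || { echo "no .fastq.gz files in $dir" >&2; return 1; }
    if ! gzip -t "$f" 2>/dev/null; then
      echo "corrupt or truncated: $f" >&2
      status=1
    fi
    # first 10 characters of ls -l are the type and permission bits, e.g. -r--r--r--
    # (-L follows symbolic links so the target's permissions are checked)
    perms=$(ls -dlL "$f" | cut -c1-10)
    case "$perms" in
      *w*) echo "still writable: $f" >&2; status=1 ;;
    esac
  done
  return "$status"
}
```

Run it in the data folder with `check_downloads .`; no output and an exit status of 0 mean every file is intact and read-only.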
## Link data files into your working directory
* Head back to your home directory `~` and link the files from Oasis.
```
cd ~
mkdir data
cd data
ln -fs /oasis/scratch/comet/$USER/temp_project/data/* .
ls -l
```
* This makes the data easier to work with: you don't have to type `/oasis/scratch/comet/$USER/temp_project/data/` every time.
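Because `~/data` only holds symbolic links, the files silently disappear from view if the originals in scratch are deleted or moved. A minimal sketch for spotting such dangling links (the function name `find_broken_links` is hypothetical):

```bash
# List symbolic links in a directory (default: current one) whose targets
# no longer exist. Returns 1 if any broken link is found, 0 otherwise.
find_broken_links() {
  local dir="${1:-.}" f status=0
  for f in "$dir"/*; do
    # -L: f is a symlink; ! -e: following it does not reach an existing file
    if [ -L "$f" ] && [ ! -e "$f" ]; then
      echo "broken link: $f -> $(readlink "$f")"
      status=1
    fi
  done
  return "$status"
}
```

`find_broken_links ~/data` prints nothing and returns 0 when every link still points to an existing file.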
## Quality Check with FastQC
1. Create a new folder for the FastQC output.
```
cd ~
mkdir quality
cd quality
```
2. Download the FastQC script into the `quality` folder.
```
wget https://raw.githubusercontent.com/xrissymae/biobasics/master/fastqc.sh
```
3. Edit the `fastqc.sh` file with `vim`.
4. Run FastQC on the yeast RNA-Seq files by submitting the script.
```
sbatch fastqc.sh
```
5. Once the job has finished, run `multiqc .` to combine the FastQC reports into a single report.
6. Download the `.html` files to your local computer through Globus or secure copy (`scp`).
* **Globus**: Log in at app.globus.org.
* **Command line**: In your local terminal (not connected to Comet), type:
```
mkdir ~/Desktop/fastqc
cd ~/Desktop/fastqc
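# use your Comet username here if it differs from your local $USER
# (quotes stop your local shell from expanding the * itself)
scp "$USER@comet.sdsc.xsede.org:quality/*.html" .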
ls
```
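Before downloading, you can check that FastQC produced a report for every input file. This sketch assumes `fastqc.sh` writes its reports into `~/quality`, and relies on FastQC's naming convention, where `ERR458493.fastq.gz` becomes `ERR458493_fastqc.html`. The function name `missing_reports` is hypothetical.

```bash
# For each read file in data_dir, check that report_dir contains the matching
# <sample>_fastqc.html. FastQC names reports by dropping .gz and then
# .fastq/.fq from the input file name. Returns 1 if any report is missing.
missing_reports() {
  local data_dir="$1" report_dir="$2" f name status=0
  for f in "$data_dir"/*.fastq.gz "$data_dir"/*.fq.gz; do
    [ -e "$f" ] || continue          # skip glob patterns that matched nothing
    name=$(basename "$f")
    name=${name%.gz}
    name=${name%.fastq}
    name=${name%.fq}
    if [ ! -e "$report_dir/${name}_fastqc.html" ]; then
      echo "no report for: $f"
      status=1
    fi
  done
  return "$status"
}
```

On Comet, `missing_reports ~/data ~/quality` prints nothing when every sample has a report.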
### Example Reports
1. [Good Illumina Data](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/good_sequence_short_fastqc.html)
2. [Bad Illumina Data](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/bad_sequence_fastqc.html)
* [More FastQC Information](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
## Trim Adapters with Trimmomatic
1. Copy the adapter file you will be using into your working directory. For our real data, we have a custom adapter file (`adapters.fa`) based on the latest Illumina Paired-End TruSeq3 library, but for this tutorial we'll use the standard file for Illumina Paired-End TruSeq2 libraries (`TruSeq2-PE.fa`).
```
cp /opt/biotools/trimmomatic/adapters/TruSeq2-PE.fa .
```
2. Download the [Trimmomatic script](https://raw.githubusercontent.com/xrissymae/biobasics/master/trim.sh) to your working directory.
```
wget https://raw.githubusercontent.com/xrissymae/biobasics/master/trim.sh
```
3. Edit the Trimmomatic shell script (`trim.sh`).
4. Run the script (about 4 min), e.g. by submitting it with `sbatch trim.sh` as with FastQC.
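To see how many reads survived trimming, you can count the reads in each file before and after. A FASTQ record is four lines, so the read count is the line count divided by four. This sketch assumes the trimmed files are named `<sample>.qc.fq.gz`, matching the `qc.fq` extension used in the next section; both function names are hypothetical.

```bash
# Count reads in a gzipped FASTQ file (each record is exactly 4 lines).
count_reads() {
  local lines
  lines=$(gzip -cd "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Print "sample<TAB>raw reads<TAB>trimmed reads" for every raw *.fastq.gz.
# Assumes trimmed output is named <sample>.qc.fq.gz; prints NA if it is missing.
compare_trimming() {
  local raw_dir="$1" trim_dir="$2" raw base trimmed
  for raw in "$raw_dir"/*.fastq.gz; do
    [ -e "$raw" ] || continue
    base=$(basename "$raw" .fastq.gz)
    trimmed="$trim_dir/$base.qc.fq.gz"
    if [ -e "$trimmed" ]; then
      printf '%s\t%s\t%s\n' "$base" "$(count_reads "$raw")" "$(count_reads "$trimmed")"
    else
      printf '%s\t%s\tNA\n' "$base" "$(count_reads "$raw")"
    fi
  done
}
```

`compare_trimming ~/data ~/quality` prints one line per sample. It decompresses every file, so expect it to take a little while on the full-size data.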
## FastQC Again
1. We'll run another quality check on the now-trimmed data by editing `fastqc.sh` to change the input directory and the file extension (`~/data/` -> `~/quality`; `fastq` -> `qc.fq`), then submitting it again.
2. Download the `.html` files to your local computer to view them.
## Read Mapping & Counting with Salmon
After quality checks, we can now map our reads to a reference transcriptome. We will use [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html).
1. Download the reference transcriptome (the ORF coding sequences of the yeast S288C reference).
```
cd ~
mkdir index
cd index
curl -O https://downloads.yeastgenome.org/sequence/S288C_reference/orf_dna/orf_coding.fasta.gz
```
2. Make a new folder for your RNA-Seq analysis.
```
cd ~
mkdir rnaseq
cd rnaseq
```
3. Create the Salmon index for the reference transcriptome.
```
# older tutorials add --type quasi; that was the default in Salmon 0.x and is not needed
salmon index --index yeast_orfs --transcripts ~/index/orf_coding.fasta.gz
```
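4. Map the trimmed reads to the reference index and quantify them. Run this from `~/rnaseq`, so that Salmon finds `yeast_orfs` and writes the `.quant` folders there.
```
for i in ~/quality/*.fq.gz
do
  # write one <input file name>.quant folder per sample into the current directory
  salmon quant -i yeast_orfs --libType U -r "$i" -o "$(basename "$i").quant" --seqBias --gcBias
done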
```
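A quick way to spot a failed or poorly mapping sample is to pull the mapping rate out of each Salmon log. This sketch assumes each `.quant` folder contains `logs/salmon_quant.log` with a line of the form `Mapping rate = 85.2%`, as Salmon's quantification log typically does; if your version words it differently, adjust the `grep` pattern. The function name `mapping_rates` is hypothetical.

```bash
# Print "folder<TAB>mapping rate" for every *.quant folder in a directory.
# Assumes logs/salmon_quant.log contains a "Mapping rate = X%" line;
# prints NA when the log or the line is missing.
mapping_rates() {
  local dir="${1:-.}" q log rate
  for q in "$dir"/*.quant; do
    [ -d "$q" ] || continue
    log="$q/logs/salmon_quant.log"
    # keep only the text after "Mapping rate = " on the last matching line
    rate=$(grep 'Mapping rate = ' "$log" 2>/dev/null | tail -n 1 | sed 's/.*Mapping rate = //')
    printf '%s\t%s\n' "$(basename "$q")" "${rate:-NA}"
  done
}
```

Run `mapping_rates .` in the folder that holds your `.quant` folders.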
## Gather Counts
* This [Python script](https://raw.githubusercontent.com/ngs-docs/angus/2018/scripts/gather-counts.py) by Titus Brown collects the raw counts from the Salmon `.quant` folders and writes them out as new count files.
```
curl -L -O https://raw.githubusercontent.com/ngs-docs/2018-ggg201b/master/lab6-rnaseq/gather-counts.py
python2 gather-counts.py
```
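If you want to see what is inside the Salmon output, the per-transcript estimates live in each folder's `quant.sf`, a tab-separated file with the columns `Name`, `Length`, `EffectiveLength`, `TPM` and `NumReads`. This sketch is an illustrative alternative written in `awk`, not a port of `gather-counts.py`: it copies the `Name` and `NumReads` columns into a hypothetical `<folder>.numreads.tsv` file, and its output format is not guaranteed to match what the R script below expects.

```bash
# For every *.quant folder in a directory (default: current one), write a
# two-column table (transcript name, estimated read count) taken from quant.sf.
# Output goes to <folder>.numreads.tsv (a name chosen for this example).
quant_to_counts() {
  local dir="${1:-.}" q
  for q in "$dir"/*.quant; do
    [ -f "$q/quant.sf" ] || continue
    awk -F'\t' '
      NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # header: name -> column number
      { print $(col["Name"]) "\t" $(col["NumReads"]) }
    ' "$q/quant.sf" > "$q.numreads.tsv"
  done
}
```

Keep using `gather-counts.py` for the edgeR step; this is only for inspecting the numbers. Note that `NumReads` values are estimates and can be fractional.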
## Data Analysis with edgeR
1. Download and run the [R script](https://raw.githubusercontent.com/ngs-docs/angus/2018/scripts/yeast.salmon.R).
```
curl -L -O https://raw.githubusercontent.com/ngs-docs/angus/2018/scripts/yeast.salmon.R
Rscript --no-save yeast.salmon.R
```
* Uses the edgeR package ([manual](https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf)).
* Outputs:
1. `yeast-edgeR-MA-plot.pdf`: MA plot
2. `yeast-edgeR-MDS.pdf`: MDS plot
3. `yeast-edgeR.csv`: CSV file of differentially expressed genes (DEGs)
2. Download the three files to your personal computer to view them.
Alternatively, you can run the R script on your personal computer after downloading the count files produced in the "Gather Counts" step.
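To get a quick count of significant genes without opening the CSV, you can filter it on its false discovery rate column. This sketch assumes `yeast-edgeR.csv` has a header row with a column named `FDR` (as in edgeR's `topTags` tables), that it was written with R's `write.csv` (which quotes the header names), and that no field contains a comma. The function name `count_degs` and the default cutoff of 0.05 are our own choices, not part of the R script.

```bash
# Count rows of a results CSV whose FDR value is below a cutoff (default 0.05).
# Assumes a header row with a column named FDR; quotes around header names
# are stripped before matching. Rows with FDR "NA" are not counted.
count_degs() {
  local csv="$1" cutoff="${2:-0.05}"
  awk -F',' -v cutoff="$cutoff" '
    NR == 1 {
      for (i = 1; i <= NF; i++) { name = $i; gsub(/"/, "", name); if (name == "FDR") fdr = i }
      if (!fdr) { print "no FDR column" > "/dev/stderr"; exit 1 }
      next
    }
    $fdr != "NA" && $fdr + 0 < cutoff + 0 { n++ }
    END { if (fdr) print n + 0 }
  ' "$csv"
}
```

For example, `count_degs yeast-edgeR.csv 0.01` reports how many genes pass a stricter 1% FDR cutoff.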