# **Hannah RNASeq Project**
## **To dos**:
- [x] QC
  - [x] FastQC
  - [x] MultiQC
  - [x] QoRTs
- [x] Hisat2 Index for Macaca Genome
- [x] Mapping
  - [x] Hisat2
  - [x] RNACocktail
- [ ] Counts
  - [ ] HTSeq
### **Samples**:
***By Email***:
Data copied to Frank as
/mnt/mobydisk/pan/genomics/data/uchandran/GRC\_DATA\_DELIVERY/20180313
Thanks,
Jason
On 3/13/2018 1:29 PM, Hollingshead, Deborah J wrote:
Jason,
Can you transfer the data for the following runs to the GAC for analysis?
171212\_NS500211\_0453_AHMFFVBGX3
171213\_NS500211\_0454_AHM5NYBGX3
171214\_NS500211\_0455_AHN77CBGX3
171208\_NS500211\_0452_AHM7NLBGX3
It all goes to the usual place: /mnt/mobydisk/pan/genomics/data/uchandran/GRC\_DATA\_DELIVERY
Thanks,
Debby
***Meeting with Hannah on Mar 15, 2018***:
54 samples (non-human primates, monkey); fastq files transferred from HTC
12 animals
6 animals – 3 time points (pre-inf, early-inf, late-inf)
6 animals – 6 time points (pre-inf, early-inf, late-inf, pre-treat, early-treat, post-treat)
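
As a quick sanity check on the design above, the expected number of samples can be recomputed from the animal and time-point counts. This is only a sketch; the group labels (`infection_only`, `infection_plus_treatment`) are hypothetical names introduced here, not identifiers from the project metadata.

```python
# Study design from the meeting notes: animals per group and time points per animal.
# The group labels are hypothetical; only the counts come from the notes.
design = {
    "infection_only": (6, ["pre-inf", "early-inf", "late-inf"]),
    "infection_plus_treatment": (
        6,
        ["pre-inf", "early-inf", "late-inf", "pre-treat", "early-treat", "post-treat"],
    ),
}

def expected_samples(design):
    """Sum animals x time points over all groups."""
    return sum(n_animals * len(time_points) for n_animals, time_points in design.values())

total_animals = sum(n_animals for n_animals, _ in design.values())
total_samples = expected_samples(design)
print(total_animals, total_samples)  # 12 54
```

The result matches the 12 animals and 54 samples noted above (6 x 3 + 6 x 6).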
**Metadata**

**Location of data**:
/mnt/mobydisk/groupshares/uchandran/SRI/Hannah/RNASeq/
**References**:
Reference genome: **_Macaca fascicularis: Macaca\_fascicularis\_5.0_**
***Genome*** : Ensembl Genome: ftp://ftp.ensembl.org/pub/current_fasta/macaca_fascicularis/dna/Macaca_fascicularis.Macaca_fascicularis_5.0.dna.toplevel.fa.gz
To build the Hisat2 index, the contigs were removed from the toplevel.fa file, and the filtered file was used for mapping.
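
The contig-removal step is not recorded in these notes, so the following is only a sketch of one way it could be done. It assumes the Ensembl header convention in which chromosome records carry a `dna:chromosome` tag and unplaced contigs/scaffolds carry a different tag (for example `dna:scaffold`); the function names and the demo records are hypothetical.

```python
import gzip

def keep_chromosomes(lines):
    """Yield only FASTA lines that belong to chromosome-level records.

    Assumption: headers follow the Ensembl style, e.g.
    '>1 dna:chromosome chromosome:...' versus '>KE1 dna:scaffold scaffold:...'.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Decide once per record, based on its header line.
            keep = "dna:chromosome" in line
        if keep:
            yield line

def filter_fasta(in_path, out_path):
    """Write a copy of in_path (plain or .gz) without contig/scaffold records."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(keep_chromosomes(src))

# Tiny made-up example: one chromosome record and one scaffold record.
demo = [
    ">1 dna:chromosome chromosome:X:1:1:8:1 REF\n",
    "ACGTACGT\n",
    ">KE1 dna:scaffold scaffold:X:KE1:1:4:1 REF\n",
    "GGCC\n",
]
filtered = list(keep_chromosomes(demo))
print("".join(filtered), end="")
```

The filtered FASTA could then be passed to `hisat2-build <reference.fa> <index_basename>` to build the index.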
***GTF***: ftp://ftp.ensembl.org/pub/current_gtf/macaca_fascicularis/Macaca_fascicularis.Macaca_fascicularis_5.0.91.gtf.gz
/mnt/mobydisk/pan/genomics/data/uchandran/refs/Macaca_fascicularis/Ensembl/GTF
**Results**:
**FastQC**:
/mnt/mobydisk/groupshares/uchandran/SRI/Hannah/RNASeq/QC/FastQC
**MultiQC**:
https://pitt.app.box.com/folder/48203828912
The raw read (fastq) files originally sent by the sequencing core were not the right files, so they sent new files around May 15th.
I re-ran FastQC, MultiQC and Hisat2 mapping on the new files.
FastQC shows overrepresented sequences and duplication in some samples. The same samples also map poorly.
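
The exact commands used for the rerun are not recorded here. As a rough sketch, the FastQC, MultiQC and Hisat2 steps for one paired-end sample could be assembled as below. The sample file names, output directories, index base name and thread count are all hypothetical placeholders; only commonly used options are shown (`fastqc -o`, `multiqc -o`, `hisat2 -p/-x/-1/-2/-S`).

```python
def fastqc_cmd(fastqs, out_dir):
    """FastQC on one or more fastq files, writing reports to out_dir."""
    return ["fastqc", "-o", str(out_dir), *[str(f) for f in fastqs]]

def multiqc_cmd(report_dir, out_dir):
    """MultiQC aggregating the reports found under report_dir."""
    return ["multiqc", str(report_dir), "-o", str(out_dir)]

def hisat2_cmd(index_base, r1, r2, sam_out, threads=4):
    """Paired-end Hisat2 alignment writing SAM output."""
    return [
        "hisat2", "-p", str(threads),
        "-x", str(index_base),
        "-1", str(r1), "-2", str(r2),
        "-S", str(sam_out),
    ]

# Hypothetical sample and paths, for illustration only.
r1, r2 = "sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz"
commands = [
    fastqc_cmd([r1, r2], "QC/FastQC"),
    multiqc_cmd("QC/FastQC", "QC/MultiQC"),
    hisat2_cmd("refs/macaca_index", r1, r2, "Mapping/sampleA.sam"),
]
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the tools are on the `PATH`; building the argument lists first makes it easy to review the commands before running all 54 samples.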
# Email update for Monkey project (New samples) to Hannah :
There are 54 samples in total (non-human primates, monkey), and we received paired-end fastq files. I ran both the Hisat2 and CLC pipelines. The details are as follows:
1. Quality check on raw fastq files (FastQC and MultiQC)
2. Read mapping (Hisat2 and CLC)
3. Quality check after mapping (QoRTs)
Overall, a few samples did not pass QC and had GC distribution and duplication issues, for example P2-M28-Pre with 89% duplication and P2-M22-Mid with 75% duplication. These issues could be due to sequencing errors, PCR, species contamination, or some other unknown reason (see the MultiQC report).
Next, I mapped the reads to the Ensembl monkey reference genome (Macaca fascicularis: Macaca_fascicularis_5.0) with both CLC and Hisat2. A quality check on the mapped files showed that mapping percentages are low overall, with around 60-65% uniquely mapped reads; some samples are lower still, such as P2-M28-Pre at 54%. Intron and intergenic mapping was also high in general, which could be due to the total RNA prep or the genome annotation (see the QoRTs plots and stats). For example, for the P2-M28-Pre sample, intergenic mapping was very high (approx. 85%) and exon mapping was only 7%. P2-M22 has the same issue with high intergenic mapping. The same samples also have high sequence duplication. I have also provided an Excel file listing all the samples with RINs (QC and mapping stats). The first column has QC of the raw reads (% GC, duplication, etc.), the second column has Hisat2 mapping, and the third column has CLC mapping stats, intron and intergenic mapping, etc.
I have uploaded the results to Box. There are two folders, one for CLC and one for Hisat2. The Hisat2 folder contains a QC folder with the FastQC reports, the MultiQC report (an aggregate of the FastQC results) and the QoRTs plots (QC on the Hisat2-mapped files).
All the files including raw data and results are available in pghbio.
# Data and Results :
pghbio : /pghbio/dbmi/Genomics_Core/chaparal/HannahG_RNASeq/RNASeq_New
Pitt box : https://pitt.app.box.com/folder/48203705878