# Squire test on Drosophila samples

###### tags: `TE_expression` `RNAseq` `dm6`

## Dataset

Two RNAseq libraries from Marie Fablet.

### FastQC & Trimming

Paired-end, 100 bp reads.

| Sample Name   | M Seqs (raw) | M Seqs (trimmed) |
|:------------- | ------------ |:----------------:|
| CNVR359_S2.R1 | 24.7         |       18.5       |
| CNVR359_S2.R2 | 24.7         |       18.5       |
| CNVR376_S2.R1 | 17.4         |       13.5       |
| CNVR376_S2.R2 | 17.4         |       13.5       |

#### Trimming

Didn't bother with adaptor trimming: the Trimmomatic command below only does quality trimming and length filtering. If adaptor trimming is needed, use trim_galore instead.

```ssh
# marie_samples.txt: one sample per line, R1 and R2 file prefixes separated by whitespace
while read -r fileNameR1 fileNameR2 _; do
    #trimming
    java -jar /home/data/rrebollo/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 4 -phred33 \
        "$fileNameR1.fastq.gz" "$fileNameR2.fastq.gz" \
        "$fileNameR1.paired.fastq" "$fileNameR1.unpaired.fastq.gz" \
        "$fileNameR2.paired.fastq" "$fileNameR2.unpaired.fastq.gz" \
        LEADING:20 TRAILING:20 AVGQUAL:25 SLIDINGWINDOW:10:30 MINLEN:35
done < marie_samples.txt
```

## Squire run on paired-end sequences

GitHub: https://github.com/wyang17/SQuIRE

The count script needs to be changed as described here: https://github.com/wyang17/SQuIRE/issues/19

```ssh
#install squire
#create conda env with 2.7.3a STAR
conda create --name squire --override-channels -c iuc -c bioconda -c conda-forge -c defaults -c r \
    python=2.7.13 bioconductor-deseq2=1.16.1 r-base=3.4.1 r-pheatmap bioconductor-vsn \
    bioconductor-biocparallel=1.12.0 r-ggrepel star=2.7.3a bedtools=2.25.0 samtools=1.1 \
    stringtie=1.3.3 igvtools=2.3.93 ucsc-genepredtobed ucsc-gtftogenepred ucsc-genepredtogtf \
    ucsc-bedgraphtobigwig r-hexbin
#activate environment
source activate squire
#install squire
git clone https://github.com/wyang17/SQuIRE; cd SQuIRE; pip install -e .

#Squire
squire Fetch -b dm6 -f -c -r -g -x -p 4 -v
squire Clean -b dm6 -v
squire Map -b dm6 -g -p 4 -r 100 -1 CNVR359_S2.R1.paired.fastq -2 CNVR359_S2.R2.paired.fastq -v
squire Count -b dm6 -r 100 -n CNVR359 -p 4 -s 1 -v
squire Map -b dm6 -g -p 4 -r 100 -1 CNVR376_S2.R1.paired.fastq -2 CNVR376_S2.R2.paired.fastq -v
squire Count -b dm6 -r 100 -n CNVR376 -p 4 -s 1 -v
# -A and -B are the condition names for group 1 (-1) and group 2 (-2)
squire Call -1 CNVR359 -2 CNVR376 -A CNVR359 -B CNVR376 -p 4 -N Droso_Marie -f pdf -v
```

*Marie*, you can check the MultiQC report on the FTP to see all the mapping statistics. Ignore the samples labeled "single".

## Squire run on single-end sequences

Ran squire on R1 alone as single-end data, to get a run comparable to our DualRNAseq data.

```
#Squire
squire Map -b dm6 -g -p 4 -r 100 -1 CNVR359_S2.R1.single.fastq -v
squire Count -b dm6 -r 100 -n CNVR359_S2.R1.single -p 4 -s 1 -v
squire Map -b dm6 -g -p 4 -r 100 -1 CNVR376_S2.R1.single.fastq -v
squire Count -b dm6 -r 100 -n CNVR376_S2.R1.single -p 4 -s 1 -v
```

## Data on the FTP

All samples labeled "paired" come from the paired-end analysis, and those labeled "single" from the single-end analysis.

The raw and trimmed data are in the Test_Marie_Squire folder. It also contains all the squire folders, created automatically by squire:

* Squire fetch: genome + gene and TE annotations + STAR index
* Squire clean: simplified TE annotation
* Squire map: BAM files (squire uses STAR) and all STAR info
* Squire count: count table per TE copy + gene count table
* Squire call: DESeq2 on TE copies
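
The paired-end `squire Map` and `squire Count` calls above are identical for both samples apart from the sample name, so they could be generated from a list of names. The following is only a sketch under that assumption: the flags are copied as-is from the commands in this note, while `run`, `squire_map_count` and the `DRY_RUN` variable are hypothetical helper names, not part of SQuIRE. By default it only prints the commands, so they can be checked before anything is launched.

```sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: one Map + Count pair per sample name.
# File naming (<sample>_S2.R1.paired.fastq) follows the trimmed files in this note.
squire_map_count() {
    for sample in "$@"; do
        run squire Map -b dm6 -g -p 4 -r 100 \
            -1 "${sample}_S2.R1.paired.fastq" -2 "${sample}_S2.R2.paired.fastq" -v || return 1
        run squire Count -b dm6 -r 100 -n "$sample" -p 4 -s 1 -v || return 1
    done
}

# Dry run for the two libraries of this test
squire_map_count CNVR359 CNVR376
```

Setting `DRY_RUN=0` before the call would run the commands instead of printing them. The loop stops at the first failing step, so Count is not attempted on a sample whose mapping failed.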