# RNAseq EUSCHISTUS
Pipeline for the analyses of Euschistus heros genitalia RNA-seqs.
## Aligning reads to the genome
Creating a genome index:
```
# STAR has a problem with gff3 files. First convert to GTF:
gffread -T EherGenomeAnnot.gff3 -o EherGenomeAnnot.gtf
# build index
#!/bin/bash
#SBATCH -J star_index
#SBATCH -p day
#SBATCH -n 50
#SBATCH -o ./log_star_index
#SBATCH -e ./log_star_index_error
STAR --runThreadN 10 --runMode genomeGenerate --genomeDir /home/bruno/PROJECT_Eher/00_genome --genomeFastaFiles /home/bruno/PROJECT_Eher/00_genome/EherGenome.fa --sjdbGTFfile /home/bruno/PROJECT_Eher/00_genome/EherGenomeAnnot.gtf --sjdbOverhang 99 --outTmpDir /home/bruno/PROJECT_Eher/03_star/tempStar2 --limitGenomeGenerateRAM 14000000000
```
Mapping reads to the genome using STAR:
```
#!/bin/bash
#SBATCH -J star_map
#SBATCH -p day
#SBATCH -n 10
#SBATCH -o ./log_star_map
#SBATCH -e ./log_star_map_error
while read i
do
STAR --runThreadN 15 --genomeDir /home/bruno/PROJECT_Eher/00_genome --readFilesIn /home/bruno/PROJECT_Eher/01_rawseqs/${i}/${i}_1.fq.gz /home/bruno/PROJECT_Eher/01_rawseqs/${i}/${i}_2.fq.gz --outSAMtype BAM SortedByCoordinate --outFileNamePrefix mapped_reads_${i} --readFilesCommand zcat
done < samplesList.txt
# the argument "--readFilesCommand zcat" is important if the rnaseq read files are compacted
```
Then counting expression data with featureCounts:
```
#!/bin/bash
#SBATCH -J featcounts
#SBATCH -p day
#SBATCH -n 30
#SBATCH -o ./log_featcounts
#SBATCH -e ./log_featcounts_error
while read i
do
featureCounts -T 15 -p -a /home/bruno/PROJECT_Eher/00_genome/EherGenomeAnnot.gtf -o counts_${i}.txt /home/bruno/PROJECT_Eher/03_star/mapped_reads_${i}Aligned.sortedByCoord.out.bam
done < samplesList.txt
```
With Kallisto:
```
# build index
kallisto index -i kallIN_Cmai.idx /home/bgenev/PROJECT_EVOEXP/02_assemblies/Cmai_longest_cdhit.fas
```
```
# counting expression
#!/bin/bash
#SBATCH -J kall_count
#SBATCH -p day
#SBATCH -n 70
#SBATCH -o ./log_kall_count
#SBATCH -e ./log_kall_count_error
while read i
do
kallisto quant -t 70 -i ./00_genome/EherGenome_INDEX.idx -o ./02_quant/quant_${i} -b 100 ./01_rawseqs/${i}/${i}_1.fq.gz ./01_rawseqs/${i}/${i}_2.fq.gz
```
Functional annotation I ran in eggNog-mapper (online) and deepGOplus:
```
conda activate deepgoplus
deepgoplus --data-root /home/bruno/anaconda3/bin/deepgoplus/data --in-file /home/bruno/PROJECT_Eher/00_genome/EherGenome_PRO_longest.faa
```
I will run the same analyses with Chinavia impicticornis
First lets get orthologous
```
#getting orthologs using reciprocal blast with orthofinder
```
We will align reads to the transcriptome (reduced with only orthologous seqs to E heros)
```
salmon index -t Cimpicticornis_PRO_final_1to1Eher.faa -i Cimp_1to1_index