# RNAseq EUSCHISTUS Pipeline for the analyses of Euschistus heros genitalia RNA-seqs. ## Aligning reads to the genome Creating a genome index: ``` # STAR has a problem with gff3 files. First convert to GTF: gffread -T EherGenomeAnnot.gff3 -o EherGenomeAnnot.gtf # build index #!/bin/bash #SBATCH -J star_index #SBATCH -p day #SBATCH -n 50 #SBATCH -o ./log_star_index #SBATCH -e ./log_star_index_error STAR --runThreadN 10 --runMode genomeGenerate --genomeDir /home/bruno/PROJECT_Eher/00_genome --genomeFastaFiles /home/bruno/PROJECT_Eher/00_genome/EherGenome.fa --sjdbGTFfile /home/bruno/PROJECT_Eher/00_genome/EherGenomeAnnot.gtf --sjdbOverhang 99 --outTmpDir /home/bruno/PROJECT_Eher/03_star/tempStar2 --limitGenomeGenerateRAM 14000000000 ``` Mapping reads to the genome using STAR: ``` #!/bin/bash #SBATCH -J star_map #SBATCH -p day #SBATCH -n 10 #SBATCH -o ./log_star_map #SBATCH -e ./log_star_map_error while read i do STAR --runThreadN 15 --genomeDir /home/bruno/PROJECT_Eher/00_genome --readFilesIn /home/bruno/PROJECT_Eher/01_rawseqs/${i}/${i}_1.fq.gz /home/bruno/PROJECT_Eher/01_rawseqs/${i}/${i}_2.fq.gz --outSAMtype BAM SortedByCoordinate --outFileNamePrefix mapped_reads_${i} --readFilesCommand zcat done < samplesList.txt # the argument "--readFilesCommand zcat" is important if the rnaseq read files are compacted ``` Then counting expression data with featureCounts: ``` #!/bin/bash #SBATCH -J featcounts #SBATCH -p day #SBATCH -n 30 #SBATCH -o ./log_featcounts #SBATCH -e ./log_featcounts_error while read i do featureCounts -T 15 -p -a /home/bruno/PROJECT_Eher/00_genome/EherGenomeAnnot.gtf -o counts_${i}.txt /home/bruno/PROJECT_Eher/03_star/mapped_reads_${i}Aligned.sortedByCoord.out.bam done < samplesList.txt ``` With Kallisto: ``` # build index kallisto index -i kallIN_Cmai.idx /home/bgenev/PROJECT_EVOEXP/02_assemblies/Cmai_longest_cdhit.fas ``` ``` # counting expression #!/bin/bash #SBATCH -J kall_count #SBATCH -p day #SBATCH -n 70 #SBATCH -o ./log_kall_count #SBATCH -e ./log_kall_count_error while read i do kallisto quant -t 70 -i ./00_genome/EherGenome_INDEX.idx -o ./02_quant/quant_${i} -b 100 ./01_rawseqs/${i}/${i}_1.fq.gz ./01_rawseqs/${i}/${i}_2.fq.gz ``` Functional annotation I ran in eggNog-mapper (online) and deepGOplus: ``` conda activate deepgoplus deepgoplus --data-root /home/bruno/anaconda3/bin/deepgoplus/data --in-file /home/bruno/PROJECT_Eher/00_genome/EherGenome_PRO_longest.faa ``` I will run the same analyses with Chinavia impicticornis First lets get orthologous ``` #getting orthologs using reciprocal blast with orthofinder ``` We will align reads to the transcriptome (reduced with only orthologous seqs to E heros) ``` salmon index -t Cimpicticornis_PRO_final_1to1Eher.faa -i Cimp_1to1_index