
**COURSE REPORT - APPLIED BIOINFORMATICS II: TRANSCRIPTOME ANALYSIS**

```
Student: Ramon Guedes Matos
Instructor: Daniel Guariz Pinheiro
```

Species chosen for transcriptome analysis: *Vitis vinifera* (grape)
Target organelle: mitochondrion
Taxonomy ID: 29760

**Obtaining the reference genome:**

```
wget ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/003/745/GCF_000003745.3_12X/GCF_000003745.3_12X_genomic.fna.gz
```

#The .fna.gz and .gff.gz files were decompressed (gunzip *)

**Cleaning, adjusting, and converting the references:**

```
nano cleanfasta.sh
```

```
#!/bin/bash

infile=$1

if [ ! ${infile} ]; then
    echo "Missing input fasta file"
    exit
fi

if [ ! -e ${infile} ]; then
    echo "Not found input fasta file (${infile})"
    exit
fi

# Keep only the accession (text before the first dot) in each FASTA header
sed 's/^>\([^\.]\+\).*/>\1/' ${infile}
```

Making the script executable:

```
chmod a+x cleanfasta.sh
```

**Cleaning the headers and minimizing the FASTA/genomic files:**

```
./cleanfasta.sh GCF_000003745.3_12X_genomic.fna > genome.fa
fixNCBIgff.sh GCF_000003745.3_12X_genomic.gff genome.gff
gffread genome.gff -g genome.fa -T -o genome.gtf
gffread genome.gff -g genome.fa -w transcriptome.fa
grep '^>' GCF_000003745.3_12X_genomic.fna | sed 's/^>\([^.]\+\)\.[0-9]\+ /\1\t/' > genome.txt
```

#Total number of genes annotated in the genome: **29963** | command:

```
grep -c -P '\tgene\t' genome.gff
```

#Chromosome 1, the one used in this work: NC_012007

**Accession IDs of the transcripts/isoforms:**

Note: 3 genes with two transcripts each were selected.

```
nano ACCS.txt
```

![](https://i.imgur.com/0JqqIJ5.png)

The first column contains the transcripts/isoforms and the second their respective genes.

**The following script was run to fetch the FASTA sequence of each transcript and append it to the file 'transcriptoma.fa':**

```
#!/bin/bash

rm -f transcriptoma.fa

for acc in XM_010651729.2 XM_010651725.2 XM_010651713.2 XM_002263575.3 XM_010651756.2 XM_010651752.2 ; do
    echo "Pegando FASTA para ${acc} ..."
    esearch -db nucleotide -query ${acc} | efetch \
        -format fasta >> transcriptoma.fa
done
```

**To check that the process succeeded, the header line of each selected transcript is printed:**

```
grep '^>' transcriptoma.fa
```

![](https://i.imgur.com/A7tZpyD.png)

**For biological conditions A and B, abundance files were created giving the abundance of each transcript:**

![](https://i.imgur.com/11A6y2e.png)

**The abundances in each file must sum to 1:**

```
perl -F"\t" -lane 'INIT{$sum=0;} $sum+=$F[1]; END{print "SOMA: $sum"; } ' abundance_A.txt
perl -F"\t" -lane 'INIT{$sum=0;} $sum+=$F[1]; END{print "SOMA: $sum"; } ' abundance_B.txt
```

![](https://i.imgur.com/iYkDt6g.png)

**The reference genome was reduced to only the chromosome selected for the analysis (NC_012007), in both FASTA and GFF formats:**

```
echo -e "NC_012007" | pullseq -N -i genome.fa > toygenome.fa
grep -P '^(##gff|NC_012007\t)' genome.gff > toygenome.gff
```

**Generating the .FASTQ files of the replicates from the transcript abundance data.**

The script for eukaryotes was used:

```
#!/bin/bash
#
#              INGLÊS/ENGLISH
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#  http://www.gnu.org/copyleft/gpl.html
#
#
#             PORTUGUÊS/PORTUGUESE
#  Este programa é distribuído na expectativa de ser útil aos seus
#  usuários, porém NÃO TEM NENHUMA GARANTIA, EXPLÍCITAS OU IMPLÍCITAS,
#  COMERCIAIS OU DE ATENDIMENTO A UMA DETERMINADA FINALIDADE.  Consulte
#  a Licença Pública Geral GNU para maiores detalhes.
#  http://www.gnu.org/copyleft/gpl.html
#
#  Copyright (C) 2012  Universidade de São Paulo
#
#  Universidade de São Paulo
#  Laboratório de Biologia do Desenvolvimento de Abelhas
#  Núcleo de Bioinformática (LBDA-BioInfo)
#
#  Daniel Guariz Pinheiro
#  dgpinheiro@gmail.com
#  http://zulu.fmrp.usp.br/bioinfo
#

nreps=$1

if [ ! ${nreps} ]; then
    nreps=2
else
    let nreps+=0
    if [ ${nreps} -lt 2 ]; then
        echo "[ERROR] Invalid number of replicates (${nreps})" 1>&2
        exit
    fi
fi

nreads=$2

if [ ! ${nreads} ]; then
    nreads=25000
else
    let nreads+=0
    if [ ${nreads} -lt 1 ]; then
        echo "[ERROR] Invalid number of reads (${nreads})" 1>&2
        exit
    fi
fi

echo "[WARN] Using ${nreps} biological replicates with ${nreads} reads." 1>&2

rm -f transcriptoma.fa

IFS=$'\n'
for acc in $(cut -f 1 ./ACCS.txt); do
    echo "Pegando FASTA para ${acc} ..."
    esearch -db nucleotide -query ${acc} | efetch \
        -format fasta >> transcriptoma.fa
done

for biogroup in A B; do
    for rep in `seq 1 ${nreps}`; do
        echo "Gerando reads para amostra ${biogroup} réplica ${rep} ..."
        generate_fragments.py -r transcriptoma.fa \
            -a ./abundance_${biogroup}.txt \
            -o ./tmp.frags_${biogroup}_${rep} \
            -t ${nreads} \
            -i 300 \
            -s 30

        cat ./tmp.frags_${biogroup}_${rep}.1.fasta | renameSeqs.pl \
            -if FASTA \
            -of FASTA \
            -p SAMPLE${biogroup}${rep} \
            -w 1000 | \
            sed 's/^>\(\S\+\).*/>\1/' \
            > ./frags_${biogroup}${rep}.fa

        cat ./frags_${biogroup}${rep}.fa | simNGS -a \
            AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT \
            -p paired \
            /usr/local/bioinfo/simNGS/data/s_4_0099.runfile \
            -n 151 > ./SAMPLE${biogroup}${rep}.fastq 2> SAMPLE${biogroup}${rep}.err.txt

        mkdir -p ./raw

        deinterleave_pairs SAMPLE${biogroup}${rep}.fastq \
            -o ./raw/SAMPLE${biogroup}${rep}_R1.fastq \
            ./raw/SAMPLE${biogroup}${rep}_R2.fastq

        rm -f ./tmp.frags_${biogroup}_${rep}.1.fasta ./frags_${biogroup}${rep}.fa ./SAMPLE${biogroup}${rep}.fastq ./SAMPLE${biogroup}${rep}.err.txt

        echo "Número de reads ${biogroup}${rep} R1:" $(echo "$(cat raw/SAMPLE${biogroup}${rep}_R1.fastq | wc -l)/4" | bc)
        echo "Número de reads ${biogroup}${rep} R2:" $(echo "$(cat raw/SAMPLE${biogroup}${rep}_R2.fastq | wc -l)/4" | bc)
    done
done
```

The script was then run:

```
./sim.sh
```

**Result (the raw directory was created with the .fastq files):**

![](https://i.imgur.com/cqIjIdj.png)

Total of 65056 reads!

#**Alignment with Bowtie2**

1. Indexing the genome:

```
bowtie2-build -f genome.fa genome
```

2. Creating the directory that holds all the generated index files:

```
mkdir ./refs
```

3.
   Alignment of the samples (SAMPLEA-SAMPLEB):

```
time bowtie2 -x ./refs/genome -q -1 ./raw/SAMPLEA1_R1.fastq -2 ./raw/SAMPLEA1_R2.fastq -S bowtie2-A1.sam
time bowtie2 -x ./refs/genome -q -1 ./raw/SAMPLEA2_R1.fastq -2 ./raw/SAMPLEA2_R2.fastq -S bowtie2-A2.sam
time bowtie2 -x ./refs/genome -q -1 ./raw/SAMPLEB1_R1.fastq -2 ./raw/SAMPLEB1_R2.fastq -S bowtie2-B1.sam
time bowtie2 -x ./refs/genome -q -1 ./raw/SAMPLEB2_R1.fastq -2 ./raw/SAMPLEB2_R2.fastq -S bowtie2-B2.sam
```

Conversion of the **.sam** files to sorted, indexed **.bam** files (A1 [named `test`], A2, B1, B2):

```
samtools view -b bowtie2-test.sam > bowtie2-test.bam
samtools sort bowtie2-test.bam -o bowtie2-test-sorted.bam
samtools index bowtie2-test-sorted.bam

samtools view -b bowtie2-A2.sam > bowtie2-A2.bam
samtools sort bowtie2-A2.bam -o bowtie2-A2-sorted.bam
samtools index bowtie2-A2-sorted.bam

samtools view -b bowtie2-B1.sam > bowtie2-B1.bam
samtools sort bowtie2-B1.bam -o bowtie2-B1-sorted.bam
samtools index bowtie2-B1-sorted.bam

samtools view -b bowtie2-B2.sam > bowtie2-B2.bam
samtools sort bowtie2-B2.bam -o bowtie2-B2-sorted.bam
samtools index bowtie2-B2-sorted.bam
```

Files resulting from the alignment:

![](https://i.imgur.com/TAegVo4.png)

**Viewing the alignments in IGV**

The .BAM and .BAI files and genome.fa were transferred from the server (accessed through PuTTY) to the local Windows computer via WinSCP. The files were then imported into IGV to view the reads against the genome. An example for one sample follows:

![](https://i.imgur.com/cRiDIWl.png)

![](https://i.imgur.com/k0nuZkW.png)

**REFERENCE-GUIDED TRANSCRIPTOME ASSEMBLY**

#Applying the rnaseq-ref.sh pipeline

```
#!/bin/bash

# ALIGNER: tophat OR star
aligner=${1}

if [ ! ${aligner} ]; then
    echo "[ERROR] Missing aligner (tophat or star)." \
        1>&2
    exit
fi

if [ "${aligner}" != "tophat" ] && [ "${aligner}" != "star" ]; then
    echo "[ERROR] Aligner must be \"tophat\" or \"star\" (${aligner})." 1>&2
    exit
fi

indir=${2}

# IF ${indir} IS EMPTY, I.E., ARGUMENT 2 WAS NOT PASSED ON THE COMMAND LINE
if [ ! ${indir} ]; then
    echo "[ERROR] Missing input directory." 1>&2
    exit
fi

# IF ${indir} IS NOT A DIRECTORY
if [ ! -d ${indir} ]; then
    echo "[ERROR] Wrong input directory (${indir})." 1>&2
    exit
fi

outdir=${3}

# IF ${outdir} IS EMPTY, I.E., ARGUMENT 3 WAS NOT PASSED ON THE COMMAND LINE
if [ ! ${outdir} ]; then
    echo "[ERROR] Missing output directory." 1>&2
    exit
fi

# IF ${outdir} IS NOT A DIRECTORY
if [ ! -d ${outdir} ]; then
    echo "[ERROR] Wrong output directory (${outdir})." 1>&2
    exit
fi

# Number of CORES (threads) used for processing
# WARNING: do not exceed the machine's limit
THREADS=${4}

if [ ! ${THREADS} ]; then
    echo "[ERROR] Missing number of threads." 1>&2
    exit
fi

refgtf=${5}

# IF ${refgtf} IS EMPTY, I.E., ARGUMENT 5 WAS NOT PASSED ON THE COMMAND LINE
if [ ! ${refgtf} ]; then
    echo "[ERROR] Missing GTF file." 1>&2
    exit
fi

if [ ! -e "${refgtf}" ]; then
    echo "[ERROR] Not found GTF file (${refgtf})." 1>&2
    exit
fi

refseq=${6}

# IF ${refseq} IS EMPTY, I.E., ARGUMENT 6 WAS NOT PASSED ON THE COMMAND LINE
if [ ! ${refseq} ]; then
    echo "[ERROR] Missing GENOME fasta file." 1>&2
    exit
fi

if [ ! -e "${refseq}" ]; then
    echo "Not found GENOME fasta file (${refseq})." 1>&2
    exit
fi

# Assembler option: cufflinks or stringtie
assembler=${7}

if [ ! ${assembler} ]; then
    echo "[ERROR] Missing assembler (cufflinks or stringtie)." 1>&2
    exit
fi

if [ "${assembler}" != "cufflinks" ] && [ "${assembler}" != "stringtie" ]; then
    echo "[ERROR] Assembler must be \"cufflinks\" or \"stringtie\" (${assembler})." 1>&2
    exit
fi

# Contaminants
contaminants=${8}

if [ ! ${contaminants} ]; then
    echo "[ERROR] Missing contaminant info. Please set \"NA\" here for execution without contaminants." \
        1>&2
    exit
fi

if [ "${contaminants}" == "NA" ]; then
    contaminants=""
fi

./preprocess5.sh "${indir}" "${outdir}" "${THREADS}" ${contaminants}

# gene_info
gene_info=${9}

# taxonomy_id
taxonomy_id=${10}

echo -e "Starting Transcriptome Assembly ..."

# Create the directory structure

curdir=`pwd`

refseq_abs_path=$(readlink -f ${refseq})

if [ "${aligner}" == "tophat" ]; then

    mkdir -p ${outdir}/tophat_index
    mkdir -p ${outdir}/tophat_out_pe
    mkdir -p ${outdir}/tophat_out_se
    mkdir -p ${outdir}/tophat_out_final

    if [ ! -e "${outdir}/tophat_index/genome.fa" ]; then
        cd ${outdir}/tophat_index
        ln -s ${refseq_abs_path} genome.fa
        cd ${curdir}
    fi

    if [ ! -e "${outdir}/tophat_index/genome.1.bt2" ]; then
        echo -e "Indexing genome with TopHat2 ..."
        cd ${outdir}/tophat_index
        bowtie2-build --threads ${THREADS} \
            genome.fa genome > bowtie2.out.txt 2> bowtie2.err.txt
        cd ${curdir}
    fi

else
    # OTHERWISE THE ALIGNER IS star

    mkdir -p ${outdir}/star_index
    mkdir -p ${outdir}/star_out_pe
    mkdir -p ${outdir}/star_out_se
    mkdir -p ${outdir}/star_out_final

    if [ ! -e "${outdir}/star_index/genome.fa" ]; then
        cd ${outdir}/star_index
        ln -s ${refseq_abs_path} genome.fa
        cd ${curdir}
    fi

    if [ ! -e "${outdir}/star_index/SAindex" ]; then
        absrefgtf=`readlink -f ${refgtf}`
        cd ${outdir}/star_index
        echo -e "Indexing genome with STAR ..."
        STAR --runThreadN ${THREADS} \
            --runMode genomeGenerate \
            --genomeFastaFiles genome.fa \
            --genomeDir ./ \
            --sjdbGTFfile ${absrefgtf} \
            --genomeSAindexNbases 12 \
            --sjdbOverhang 149 \
            > STAR.genomeGenerate.log.out.txt \
            2> STAR.genomeGenerate.log.err.txt
        cd ${curdir}
    fi

fi

if [ "${assembler}" == "cufflinks" ]; then
    mkdir -p ${outdir}/${aligner}_cufflinks
    mkdir -p ${outdir}/${aligner}_cuffmerge
elif [ "${assembler}" == "stringtie" ]; then
    mkdir -p ${outdir}/${aligner}_stringtie
    mkdir -p ${outdir}/${aligner}_stringmerge
else
    echo "[ERROR] Unexpected error!" \
        1>&2
    exit
fi

mkdir -p ${outdir}/${aligner}_${assembler}_cuffcompare
mkdir -p ${outdir}/${aligner}_${assembler}_cuffquant
mkdir -p ${outdir}/${aligner}_${assembler}_cuffnorm
mkdir -p ${outdir}/${aligner}_${assembler}_cuffdiff

for r1 in `find ${outdir}/ -name '*.prinseq.cleaned_1.fastq'`; do

    r2=`echo ${r1} | sed 's/prinseq.cleaned_1.fastq/prinseq.cleaned_2.fastq/'`

    if [ ! -e ${r2} ]; then
        echo "[ERROR] Not found R2 (${r2})." 1>&2
        exit
    fi

    echo -e "\tFound R1 ($(basename ${r1})) & R2 ($(basename ${r2})) ..."

    r1_singletons=`echo ${r1} | sed 's/prinseq.cleaned_1.fastq/prinseq.cleaned_1_singletons.fastq/'`
    r2_singletons=`echo ${r2} | sed 's/prinseq.cleaned_2.fastq/prinseq.cleaned_2_singletons.fastq/'`

    if [ ! -e ${r1_singletons} ]; then
        echo "[ERROR] Not found R1 singletons (${r1_singletons})." 1>&2
        exit
    fi

    if [ ! -e ${r2_singletons} ]; then
        echo "[ERROR] Not found R2 singletons (${r2_singletons})." 1>&2
        exit
    fi

    name=`basename ${r1} .fastq | sed 's/.atropos_final.prinseq.cleaned_1//'`

    mkdir -p ${outdir}/align_out_final/${name}

    if [ "${aligner}" == "tophat" ]; then

        if [ ! -e "${outdir}/tophat_out_pe/${name}/accepted_hits.bam" ]; then
            echo -e "\tTopHat2 alignment (${name}) paired-end reads X genome ..."
            tophat2 --min-anchor 8 \
                --min-intron-length 50 \
                --max-intron-length 5000 \
                --max-multihits 20 \
                --transcriptome-max-hits 10 \
                --prefilter-multihits \
                --num-threads ${THREADS} \
                --GTF ${refgtf} \
                --transcriptome-index ${outdir}/tophat_index/transcriptome \
                --mate-inner-dist 0 \
                --mate-std-dev 50 \
                --coverage-search \
                --microexon-search \
                --b2-very-sensitive \
                --library-type fr-unstranded \
                --output-dir ${outdir}/tophat_out_pe/${name} \
                --no-sort-bam \
                ${outdir}/tophat_index/genome \
                ${r1} \
                ${r2} > ${outdir}/tophat_out_pe/${name}.log.out.txt \
                2> ${outdir}/tophat_out_pe/${name}.log.err.txt
        else
            echo -e "\tFound Tophat2 output for PE (${name})..."
        fi

        if [ ! \
            -e "${outdir}/tophat_out_se/${name}/accepted_hits.bam" ]; then
            mkdir -p ${outdir}/tophat_out_se/${name}

            cat ${r1_singletons} ${r2_singletons} > ${outdir}/tophat_out_se/${name}/singletons.fastq

            if [ -s "${outdir}/tophat_out_se/${name}/singletons.fastq" ]; then
                echo -e "\tTopHat2 alignment (${name}) singleton reads X genome ..."
                tophat2 --min-anchor 8 \
                    --min-intron-length 50 \
                    --max-intron-length 5000 \
                    --max-multihits 20 \
                    --transcriptome-max-hits 10 \
                    --prefilter-multihits \
                    --num-threads ${THREADS} \
                    --GTF ${refgtf} \
                    --transcriptome-index ${outdir}/tophat_index/transcriptome \
                    --coverage-search \
                    --microexon-search \
                    --b2-very-sensitive \
                    --library-type fr-unstranded \
                    --output-dir ${outdir}/tophat_out_se/${name} \
                    --no-sort-bam \
                    ${outdir}/tophat_index/genome \
                    ${outdir}/tophat_out_se/${name}/singletons.fastq \
                    > ${outdir}/tophat_out_se/${name}.log.out.txt \
                    2> ${outdir}/tophat_out_se/${name}.log.err.txt

                # Consider implementing TopHat-Recondition https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1058-x
            fi
        else
            echo -e "\tFound Tophat2 output for SE (${name})..."
        fi

        if [ ! -e "${outdir}/tophat_out_final/${name}/accepted_hits.bam" ]; then

            mkdir -p ${outdir}/tophat_out_final/${name}

            if [ -s "${outdir}/tophat_out_pe/${name}/accepted_hits.bam" ]; then
                if [ -s "${outdir}/tophat_out_se/${name}/accepted_hits.bam" ]; then
                    echo -e "\tMerging TopHat2 results ..."
                    samtools view -H ${outdir}/tophat_out_pe/${name}/accepted_hits.bam > ${outdir}/tophat_out_final/${name}/Header.txt

                    samtools merge -n --threads ${THREADS} \
                        -h ${outdir}/tophat_out_final/${name}/Header.txt \
                        ${outdir}/tophat_out_final/${name}/accepted_hits.bam \
                        ${outdir}/tophat_out_pe/${name}/accepted_hits.bam \
                        ${outdir}/tophat_out_se/${name}/accepted_hits.bam \
                        > ${outdir}/tophat_out_final/${name}.log.out.txt \
                        2> ${outdir}/tophat_out_final/${name}.log.err.txt
                else
                    pe_result_abs_path=$(readlink -f ${outdir}/tophat_out_pe/${name}/accepted_hits.bam)
                    cd ${outdir}/tophat_out_final/${name}/
                    ln -s ${pe_result_abs_path} accepted_hits.bam
                    cd ${curdir}
                fi
            else
                if [ -s "${outdir}/tophat_out_se/${name}/accepted_hits.bam" ]; then
                    se_result_abs_path=$(readlink -f ${outdir}/tophat_out_se/${name}/accepted_hits.bam)
                    cd ${outdir}/tophat_out_final/${name}/
                    ln -s ${se_result_abs_path} accepted_hits.bam
                    cd ${curdir}
                else
                    echo -e "[ERROR] Not found any alignment for PE or SE reads." 1>&2
                fi
            fi
        else
            echo -e "\tFound Tophat2 output final (${name})..."
        fi

        if [ ! -e "${outdir}/tophat_out_final/${name}/Aligned.sorted.bam" ]; then
            echo -e "\tSorting alignments (${name})..."
            samtools sort --threads ${THREADS} \
                -o ${outdir}/tophat_out_final/${name}/Aligned.sorted.bam \
                ${outdir}/tophat_out_final/${name}/accepted_hits.bam \
                > ${outdir}/tophat_out_final/${name}/Aligned.sorted.out.txt \
                2> ${outdir}/tophat_out_final/${name}/Aligned.sorted.err.txt
        fi

        # THE SYMBOLIC LINKS ARE ALWAYS REMOVED SO THAT THEY ARE REPLACED
        # WHEN A DIFFERENT ALIGNER IS CHOSEN
        #if [ !
        #    -e "${outdir}/align_out_final/${name}/Aligned.out.bam" ]; then
            rm -f ${outdir}/align_out_final/${name}/Aligned.out.bam
            rm -f ${outdir}/align_out_final/${name}/Aligned.sorted.bam
            if [ -e "${outdir}/tophat_out_final/${name}/accepted_hits.bam" ]; then
                align_final_out=`readlink -f ${outdir}/tophat_out_final/${name}/accepted_hits.bam`
                align_sorted_out=`readlink -f ${outdir}/tophat_out_final/${name}/Aligned.sorted.bam`
                cd ${outdir}/align_out_final/${name}
                ln -s ${align_final_out} Aligned.out.bam
                ln -s ${align_sorted_out} Aligned.sorted.bam
                cd ${curdir}
            else
                echo "[ERROR] Not found Tophat final output (${outdir}/tophat_out_final/${name}/accepted_hits.bam)" 1>&2
                exit
            fi
        #fi

    else
        # IF NOT tophat, THEN star

        if [ ! -e "${outdir}/star_out_pe/${name}/Aligned.out.bam" ]; then
            echo -e "\tSTAR alignment (${name}) paired-end reads X genome ..."

            mkdir -p ${outdir}/star_out_pe/${name}/

            # Running cufflinks downstream requires: --outSAMstrandField intronMotif and --outFilterIntronMotifs RemoveNoncanonical
            STAR --runThreadN ${THREADS} \
                --genomeDir ${outdir}/star_index/ \
                --readFilesIn ${r1} ${r2} \
                --outSAMstrandField intronMotif \
                --outFilterIntronMotifs RemoveNoncanonical \
                --sjdbGTFfile ${refgtf} \
                --outFilterMultimapNmax 20 \
                --outFileNamePrefix ${outdir}/star_out_pe/${name}/ \
                --outSAMtype BAM Unsorted \
                --outFilterType BySJout \
                --outSJfilterReads Unique \
                --alignSJoverhangMin 8 \
                --alignSJDBoverhangMin 1 \
                --outFilterMismatchNmax 999 \
                --outFilterMismatchNoverReadLmax 0.02 \
                --alignIntronMin 50 \
                --alignIntronMax 5000 \
                --alignMatesGapMax 5000 \
                > ${outdir}/star_out_pe/${name}.log.out.txt \
                2> ${outdir}/star_out_pe/${name}.log.err.txt
        else
            echo -e "\tFound STAR output for PE (${name})..."
        fi

        if [ ! \
            -e "${outdir}/star_out_se/${name}/Aligned.out.bam" ]; then
            mkdir -p ${outdir}/star_out_se/${name}

            cat ${r1_singletons} ${r2_singletons} > ${outdir}/star_out_se/${name}/singletons.fastq

            if [ -s "${outdir}/star_out_se/${name}/singletons.fastq" ]; then
                echo -e "\tSTAR alignment (${name}) singleton reads X genome ..."
                # Align the concatenated singleton reads as single-end input
                STAR --runThreadN ${THREADS} \
                    --genomeDir ${outdir}/star_index/ \
                    --readFilesIn ${outdir}/star_out_se/${name}/singletons.fastq \
                    --outSAMstrandField intronMotif \
                    --outFilterIntronMotifs RemoveNoncanonical \
                    --sjdbGTFfile ${refgtf} \
                    --outFilterMultimapNmax 20 \
                    --outFileNamePrefix ${outdir}/star_out_se/${name}/ \
                    --outSAMtype BAM Unsorted \
                    --outFilterType BySJout \
                    --outSJfilterReads Unique \
                    --alignSJoverhangMin 8 \
                    --alignSJDBoverhangMin 1 \
                    --outFilterMismatchNmax 999 \
                    --outFilterMismatchNoverReadLmax 0.02 \
                    --alignIntronMin 50 \
                    --alignIntronMax 5000 \
                    --alignMatesGapMax 5000 \
                    > ${outdir}/star_out_se/${name}.log.out.txt \
                    2> ${outdir}/star_out_se/${name}.log.err.txt
            fi
        else
            echo -e "\tFound STAR output for SE (${name})..."
        fi

        if [ ! -e "${outdir}/star_out_final/${name}/Aligned.out.bam" ]; then

            mkdir -p ${outdir}/star_out_final/${name}

            if [ -s "${outdir}/star_out_pe/${name}/Aligned.out.bam" ]; then
                if [ -s "${outdir}/star_out_se/${name}/Aligned.out.bam" ]; then
                    echo -e "\tMerging STAR results ..."
                    samtools view -H ${outdir}/star_out_pe/${name}/Aligned.out.bam > ${outdir}/star_out_final/${name}/Header.txt

                    samtools merge -n --threads ${THREADS} \
                        -h ${outdir}/star_out_final/${name}/Header.txt \
                        ${outdir}/star_out_final/${name}/Aligned.out.bam \
                        ${outdir}/star_out_pe/${name}/Aligned.out.bam \
                        ${outdir}/star_out_se/${name}/Aligned.out.bam \
                        > ${outdir}/star_out_final/${name}.log.out.txt \
                        2> ${outdir}/star_out_final/${name}.log.err.txt

                    # Re-sort the merged BAM by read name
                    samtools sort -n --threads ${THREADS} \
                        ${outdir}/star_out_final/${name}/Aligned.out.bam \
                        -o ${outdir}/star_out_final/${name}/Aligned.named.out.bam

                    rm -f ${outdir}/star_out_final/${name}/Aligned.out.bam
                    mv ${outdir}/star_out_final/${name}/Aligned.named.out.bam \
                        ${outdir}/star_out_final/${name}/Aligned.out.bam
                else
                    pe_result_abs_path=$(readlink -f ${outdir}/star_out_pe/${name}/Aligned.out.bam)
                    cd ${outdir}/star_out_final/${name}/
                    ln -s ${pe_result_abs_path} Aligned.out.bam
                    cd ${curdir}
                fi
            else
                if [ -s "${outdir}/star_out_se/${name}/Aligned.out.bam" ]; then
                    se_result_abs_path=$(readlink -f ${outdir}/star_out_se/${name}/Aligned.out.bam)
                    cd ${outdir}/star_out_final/${name}/
                    ln -s ${se_result_abs_path} Aligned.out.bam
                    cd ${curdir}
                else
                    echo -e "[ERROR] Not found any alignment for PE or SE reads." 1>&2
                fi
            fi
        else
            echo -e "\tFound STAR output final (${name})..."
        fi

        if [ ! -e "${outdir}/star_out_final/${name}/Aligned.sorted.bam" ]; then
            echo -e "\tSorting alignments (${name})..."
            samtools sort --threads ${THREADS} \
                -o ${outdir}/star_out_final/${name}/Aligned.sorted.bam \
                ${outdir}/star_out_final/${name}/Aligned.out.bam \
                > ${outdir}/star_out_final/${name}/Aligned.sorted.out.txt \
                2> ${outdir}/star_out_final/${name}/Aligned.sorted.err.txt
        fi

        # THE SYMBOLIC LINKS ARE ALWAYS REMOVED SO THAT THEY ARE REPLACED
        # WHEN A DIFFERENT ALIGNER IS CHOSEN
        #if [ !
        #    -e "${outdir}/align_out_final/${name}/Aligned.out.bam" ]; then
            rm -f ${outdir}/align_out_final/${name}/Aligned.out.bam
            rm -f ${outdir}/align_out_final/${name}/Aligned.sorted.bam
            if [ -e "${outdir}/star_out_final/${name}/Aligned.out.bam" ]; then
                align_final_out=`readlink -f ${outdir}/star_out_final/${name}/Aligned.out.bam`
                align_sorted_out=`readlink -f ${outdir}/star_out_final/${name}/Aligned.sorted.bam`
                cd ${outdir}/align_out_final/${name}
                ln -s ${align_final_out} Aligned.out.bam
                ln -s ${align_sorted_out} Aligned.sorted.bam
                cd ${curdir}
            else
                echo "[ERROR] Not found STAR final output (${outdir}/star_out_final/${name}/Aligned.out.bam)" 1>&2
                exit
            fi
        #fi

    fi

    mkdir -p ${outdir}/align_out_info/

    if [ -e "${outdir}/align_out_final/${name}/Aligned.out.bam" ]; then

        if [ ! -e "${outdir}/align_out_info/${name}.${aligner}.out.txt" ]; then
            echo -e "\tGet alignment information (${name}) [${aligner}] ..."
            SAM_nameSorted_to_uniq_count_stats.pl ${outdir}/align_out_final/${name}/Aligned.out.bam > ${outdir}/align_out_info/${name}.${aligner}.out.txt 2> ${outdir}/align_out_info/${name}.${aligner}.err.txt
        fi

        mkdir -p ${outdir}/${aligner}_cufflinks/${name}

        if [ "${assembler}" == "cufflinks" ]; then

            if [ ! \
                -e "${outdir}/${aligner}_cufflinks/${name}/transcripts.gtf" ]; then
                echo -e "\tAssembly transcriptome (${name}) [cufflinks]\n"
                cufflinks --output-dir ${outdir}/${aligner}_cufflinks/${name} \
                    --num-threads ${THREADS} \
                    --GTF-guide ${refgtf} \
                    --frag-bias-correct ${refseq} \
                    --multi-read-correct \
                    --library-type fr-unstranded \
                    --frag-len-mean 300 \
                    --frag-len-std-dev 30 \
                    --total-hits-norm \
                    --min-isoform-fraction 0.25 \
                    --pre-mrna-fraction 0.15 \
                    --min-frags-per-transfrag 10 \
                    --junc-alpha 0.001 \
                    --small-anchor-fraction 0.08 \
                    --overhang-tolerance 8 \
                    --min-intron-length 50 \
                    --max-intron-length 5000 \
                    --trim-3-avgcov-thresh 5 \
                    --trim-3-dropoff-frac 0.1 \
                    --max-multiread-fraction 0.75 \
                    --overlap-radius 25 \
                    --3-overhang-tolerance 600 \
                    --intron-overhang-tolerance 50 \
                    ${outdir}/align_out_final/${name}/Aligned.sorted.bam \
                    > ${outdir}/${aligner}_cufflinks/${name}/cufflinks.out.txt \
                    2> ${outdir}/${aligner}_cufflinks/${name}/cufflinks.err.txt
            fi

        else

            if [ ! -e "${outdir}/${aligner}_stringtie/${name}/transcripts.gtf" ]; then

                mkdir -p ${outdir}/${aligner}_stringtie/${name}

                echo -e "\tAssembly transcriptome (${name}) [stringtie]\n"
                stringtie ${outdir}/align_out_final/${name}/Aligned.sorted.bam \
                    -G ${refgtf} \
                    -f 0.25 \
                    -m 200 \
                    -o ${outdir}/${aligner}_stringtie/${name}/transcripts.gtf \
                    -a 8 \
                    -j 2 \
                    -c 4 \
                    -v \
                    -g 25 \
                    -C ${outdir}/${aligner}_stringtie/${name}/coverages.txt \
                    -M 0.9 \
                    -p ${THREADS} \
                    -A ${outdir}/${aligner}_stringtie/${name}/abundances.txt \
                    -B \
                    > ${outdir}/${aligner}_stringtie/${name}/stringtie.out.txt \
                    2> ${outdir}/${aligner}_stringtie/${name}/stringtie.err.txt
            fi

        fi
    else
        echo -e "[ERROR] Not found alignment data (${outdir}/align_out_final/${name}/Aligned.out.bam)" 1>&2
    fi

done

rm -f ${outdir}/assembly_GTF_list.txt

for transc in `find ${outdir}/${aligner}_${assembler} -name transcripts.gtf`; do
    #echo -e "\tProcessing transcriptome ${transc} ..."
    echo ${transc} >> ${outdir}/assembly_GTF_list.txt
done

transcriptomeref=""

if [ "${assembler}" == "cufflinks" ]; then

    if [ ! -e "${outdir}/${aligner}_cuffmerge/merged.gtf" ]; then
        echo -e "\tMerging transcriptomes (${outdir}/assembly_GTF_list.txt) in a transcriptome reference [cuffmerge]\n"
        cuffmerge -o ${outdir}/${aligner}_cuffmerge \
            --ref-gtf ${refgtf} \
            --ref-sequence ${refseq} \
            --min-isoform-fraction 0.25 \
            --num-threads ${THREADS} \
            ${outdir}/assembly_GTF_list.txt \
            > ${outdir}/${aligner}_cuffmerge/cuffmerge.out.txt \
            2> ${outdir}/${aligner}_cuffmerge/cuffmerge.err.txt
    fi

    transcriptomeref="${outdir}/${aligner}_cuffmerge/merged.gtf"

else

    if [ ! -e "${outdir}/${aligner}_stringmerge/merged.gtf" ]; then
        echo -e "\tMerging transcriptomes (${outdir}/assembly_GTF_list.txt) in a transcriptome reference [stringtie]\n"
        stringtie --merge \
            -G ${refgtf} \
            -o ${outdir}/${aligner}_stringmerge/merged.gtf \
            -m 200 \
            -c 4 \
            -F 4 \
            -T 4 \
            -f 0.25 \
            -g 100 \
            ${outdir}/assembly_GTF_list.txt \
            > ${outdir}/${aligner}_stringmerge/stringmerge.out.txt \
            2> ${outdir}/${aligner}_stringmerge/stringmerge.err.txt
    fi

    transcriptomeref="${outdir}/${aligner}_stringmerge/merged.gtf"

fi

if [ ! -e "${outdir}/${aligner}_${assembler}_cuffcompare/cuffcmp.combined.gtf" ]; then
    echo -e "\tRunning cuffcompare with ${aligner} & ${assembler} transcriptome reference (${transcriptomeref})..."
    cuffcompare -r ${refgtf} \
        -s ${refseq} \
        -o ${outdir}/${aligner}_${assembler}_cuffcompare/cuffcmp \
        ${transcriptomeref} \
        > ${outdir}/${aligner}_${assembler}_cuffcompare/cuffcmp.out.txt \
        2> ${outdir}/${aligner}_${assembler}_cuffcompare/cuffcmp.err.txt

    # ANNOTATION OF THE "TCONS" TRANSCRIPTS
    perl -F"\t" -lane 'INIT { print join("\t","transcript_id","nearest_ref","class_code"); }
        my ($transcript_id)=$F[8]=~/transcript_id \"([^\"]+)\"/;
        my ($nearest_ref)=$F[8]=~/nearest_ref \"([^\"]+)\"/;
        $nearest_ref=~s/^rna-//;
        $nearest_ref=~s/_[1-9]+$//;
        $nearest_ref=~s/^rna_gene-//;
        my ($class_code)=$F[8]=~/class_code \"([^\"]+)\"/;
        print $transcript_id,"\t",$nearest_ref||"","\t",$class_code||"";' \
        ${outdir}/${aligner}_${assembler}_cuffcompare/cuffcmp.combined.gtf | \
        awk 'NR == 1; NR > 1 {print $0 | "sort -u"}' \
        > ${outdir}/${aligner}_${assembler}_cuffcompare/TCONS.nearest_ref.txt

fi

transcriptomeref="${outdir}/${aligner}_${assembler}_cuffcompare/cuffcmp.combined.gtf"

# LIST OF NON-REDUNDANT VALUES (BIOLOGICAL GROUP NAMES)
# E.g.: (CONTROL TEST)
biogroup_label=()

for bamfile in `find ${outdir}/align_out_final -name Aligned.sorted.bam`; do

    name=`basename $(dirname ${bamfile})`

    if [ ! -e "${outdir}/${aligner}_${assembler}_cuffquant/${name}/abundances.cxb" ]; then
        echo -e "\tRunning cuffquant using sample ${name} as using ${aligner} & ${assembler} (${transcriptomeref}) ..."
        mkdir -p ${outdir}/${aligner}_${assembler}_cuffquant/${name}

        cuffquant --output-dir ${outdir}/${aligner}_${assembler}_cuffquant/${name} \
            --frag-bias-correct ${refseq} \
            --multi-read-correct \
            --num-threads ${THREADS} \
            --library-type fr-unstranded \
            --frag-len-mean 300 \
            --frag-len-std-dev 30 \
            --max-bundle-frags 9999999 \
            --max-frag-multihits 20 \
            ${transcriptomeref} \
            ${bamfile} \
            > ${outdir}/${aligner}_${assembler}_cuffquant/${name}/cuffquant.log.out.txt \
            2> ${outdir}/${aligner}_${assembler}_cuffquant/${name}/cuffquant.log.err.txt
    fi

    groupname=`echo ${name} | sed 's/[_\.\#\-]\?[0-9]\+$//'`
    biogroup_label=($(printf "%s\n" ${biogroup_label[@]} ${groupname} | sort -u ))

done

biogroup_files=()

echo -e "\tCollecting Expression Data from cuffquant output (*.cxb) ..."

for label in ${biogroup_label[@]}; do
    echo -e "\t\tCollecting .cxb files for ${label} ..."
    group=()
    for cxbfile in `ls ${outdir}/${aligner}_${assembler}_cuffquant/${label}*/abundances.cxb`; do
        echo -e "\t\t\tFound ${cxbfile}"
        group=(${group[@]} "${cxbfile}")
    done
    biogroup_files=(${biogroup_files[@]} $(IFS=, ; echo "${group[*]}") )
done

echo -e "Starting Gene Expression Analysis ..."
echo -e "\t\tLabels.: " $(IFS=, ; echo "${biogroup_label[*]}")
echo -e "\t\tFiles..: " ${biogroup_files[*]}

if [ ! -e "${outdir}/${aligner}_${assembler}_cuffnorm/isoforms.count_table" ]; then
    echo -e "\t\t\tGenerating abundance matrices (cuffnorm) ..."
    cuffnorm --output-dir ${outdir}/${aligner}_${assembler}_cuffnorm \
        --labels $(IFS=, ; echo "${biogroup_label[*]}") \
        --num-threads ${THREADS} \
        --library-type fr-unstranded \
        --library-norm-method geometric \
        --output-format simple-table \
        ${transcriptomeref} \
        ${biogroup_files[*]} \
        > ${outdir}/${aligner}_${assembler}_cuffnorm/cuffnorm.log.out.txt \
        2> ${outdir}/${aligner}_${assembler}_cuffnorm/cuffnorm.log.err.txt
fi

if [ ! \
    -e "${outdir}/${aligner}_${assembler}_cuffnorm/isoforms.raw_count_table.txt" ]; then
    de-normalize-cuffnorm.R --in=${outdir}/${aligner}_${assembler}_cuffnorm/isoforms.count_table \
        --st=${outdir}/${aligner}_${assembler}_cuffnorm/samples.table \
        --out=${outdir}/${aligner}_${assembler}_cuffnorm/isoforms.raw_count_table.txt > ${outdir}/${aligner}_${assembler}_cuffnorm/de-normalize-cuffnorm.isoforms.out.txt \
        2> ${outdir}/${aligner}_${assembler}_cuffnorm/de-normalize-cuffnorm.isoforms.err.txt
fi

if [ ! -e "${outdir}/${aligner}_${assembler}_cuffnorm/genes.raw_count_table.txt" ]; then
    de-normalize-cuffnorm.R --in=${outdir}/${aligner}_${assembler}_cuffnorm/genes.count_table \
        --st=${outdir}/${aligner}_${assembler}_cuffnorm/samples.table \
        --out=${outdir}/${aligner}_${assembler}_cuffnorm/genes.raw_count_table.txt > ${outdir}/${aligner}_${assembler}_cuffnorm/de-normalize-cuffnorm.genes.out.txt \
        2> ${outdir}/${aligner}_${assembler}_cuffnorm/de-normalize-cuffnorm.genes.err.txt
fi

if [ ! -e "${outdir}/${aligner}_${assembler}_cuffdiff/isoform_exp.diff" ]; then
    echo -e "\t\t\tAnalysing differential expression (cuffdiff) ..."
    cuffdiff --output-dir ${outdir}/${aligner}_${assembler}_cuffdiff \
        --labels $(IFS=, ; echo "${biogroup_label[*]}") \
        --frag-bias-correct ${refseq} \
        --multi-read-correct \
        --num-threads ${THREADS} \
        --library-type fr-unstranded \
        --frag-len-mean 300 \
        --frag-len-std-dev 30 \
        --max-bundle-frags 9999999 \
        --max-frag-multihits 20 \
        --total-hits-norm \
        --min-reps-for-js-test 2 \
        --library-norm-method geometric \
        --dispersion-method per-condition \
        --min-alignment-count 10 \
        ${transcriptomeref} \
        ${biogroup_files[*]} \
        > ${outdir}/${aligner}_${assembler}_cuffdiff/cuffdiff.log.out.txt \
        2> ${outdir}/${aligner}_${assembler}_cuffdiff/cuffdiff.log.err.txt
fi

if [ ! -e "${outdir}/${aligner}_${assembler}_cuffdiff/isoform_exp.diff.annot.txt" ]; then
    echo "Annotating isoform_exp.diff ..."
    mergeR.R --x=${outdir}/${aligner}_${assembler}_cuffdiff/isoform_exp.diff \
        --by.x="test_id" \
        --y=${outdir}/${aligner}_${assembler}_cuffcompare/TCONS.nearest_ref.txt \
        --by.y="transcript_id" \
        --all.x \
        --print.out.label \
        --out=${outdir}/${aligner}_${assembler}_cuffdiff/isoform_exp.diff.annot.txt \
        > ${outdir}/${aligner}_${assembler}_cuffdiff/mergeR.isoform.log.out.txt \
        2> ${outdir}/${aligner}_${assembler}_cuffdiff/mergeR.isoform.log.err.txt
fi

if [ ! -e "${outdir}/${aligner}_${assembler}_cuffdiff/gene_exp.diff.annot.txt" ]; then

    if [ ${gene_info} ] && [ ${taxonomy_id} ]; then

        if [ ! -e ${gene_info} ]; then
            echo "[ERROR] Wrong gene_info file (${gene_info})." 1>&2
            exit
        fi

        # The taxonomy ID must consist of digits only
        if [[ ! ${taxonomy_id} =~ ^[0-9]+$ ]]; then
            echo "[ERROR] Wrong taxonomy_id (${taxonomy_id})." 1>&2
            exit
        fi

        if [ ! -e "${outdir}/gene_info.${taxonomy_id}.txt" ]; then
            echo "Getting gene_info (${gene_info}) for taxonomy_id = ${taxonomy_id} ..."
            cat ${gene_info} | \
                perl -F"\t" -slane 'INIT { $taxonomy_id+=0; } if ($.==1) { print $_; } elsif ($F[0]==$taxonomy_id) { print $_; } ' -- -taxonomy_id=${taxonomy_id} | \
                cut -f 2,3,5,9 > ${outdir}/gene_info.${taxonomy_id}.txt
        fi

        echo "Annotating gene_exp.diff ..."
        # splitteR.R gets its own log files so that mergeR.R below does not overwrite them
        splitteR.R --x="${outdir}/${aligner}_${assembler}_cuffdiff/gene_exp.diff" \
            --col.x="gene" \
            --by.x="," \
            --out="${outdir}/${aligner}_${assembler}_cuffdiff/gene_exp.diff.spplitted.txt" \
            > ${outdir}/${aligner}_${assembler}_cuffdiff/splitteR.gene.log.out.txt \
            2> ${outdir}/${aligner}_${assembler}_cuffdiff/splitteR.gene.log.err.txt
        mergeR.R --x=${outdir}/${aligner}_${assembler}_cuffdiff/gene_exp.diff.spplitted.txt \
            --by.x="gene" \
            --y=${outdir}/gene_info.${taxonomy_id}.txt \
            --by.y="GeneID" \
            --all.x \
            --print.out.label \
            --out=${outdir}/${aligner}_${assembler}_cuffdiff/gene_exp.diff.annot.txt \
            > ${outdir}/${aligner}_${assembler}_cuffdiff/mergeR.gene.log.out.txt \
            2> ${outdir}/${aligner}_${assembler}_cuffdiff/mergeR.gene.log.err.txt
    fi
fi
```

This script runs the alignment with TopHat or STAR, producing the aligned reads as .bam files. It also covers quality control before and after read processing, adapter trimming with Atropos, contaminant screening, genome reduction, transcript assembly with Cufflinks, and merging of the per-sample assemblies with Cuffmerge. Quantification against the new reference is done with Cuffquant, followed by normalization with Cuffnorm and differential expression analysis with Cuffdiff.

Command to run the pipeline:

```
./rnaseq-ref.sh tophat raw/ rnaseq-ref_out/ 12 raw/genome.gtf raw/genome.fa cufflinks NA
```

Directories produced by the pipeline:

![](https://i.imgur.com/4xBrBec.png)

All generated files (tree):

```
.
|-- align_out_final | |-- SAMPLEA1 | | |-- Aligned.out.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEA1/accepted_hits.bam | | `-- Aligned.sorted.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEA1/Aligned.sorted.bam | |-- SAMPLEA2 | | |-- Aligned.out.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEA2/accepted_hits.bam | | `-- Aligned.sorted.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEA2/Aligned.sorted.bam | |-- SAMPLEB1 | | |-- Aligned.out.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEB1/accepted_hits.bam | | `-- Aligned.sorted.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEB1/Aligned.sorted.bam | `-- SAMPLEB2 | |-- Aligned.out.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEB2/accepted_hits.bam | `-- Aligned.sorted.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEB2/Aligned.sorted.bam |-- align_out_info | |-- SAMPLEA1.tophat.err.txt | |-- SAMPLEA1.tophat.out.txt | |-- SAMPLEA2.tophat.err.txt | |-- SAMPLEA2.tophat.out.txt | |-- SAMPLEB1.tophat.err.txt | |-- SAMPLEB1.tophat.out.txt | |-- SAMPLEB2.tophat.err.txt | `-- SAMPLEB2.tophat.out.txt |-- assembly_GTF_list.txt |-- processed | |-- atropos | | |-- SAMPLEA1.atropos_adapter.log.err.txt | | |-- SAMPLEA1.atropos_adapter.log.out.txt | | |-- SAMPLEA1.atropos.log.err.txt | | |-- SAMPLEA1.atropos.log.out.txt | | |-- SAMPLEA1_R1.atropos_final.fastq | | |-- SAMPLEA1_R2.atropos_final.fastq | | |-- SAMPLEA2.atropos_adapter.log.err.txt | | |-- SAMPLEA2.atropos_adapter.log.out.txt | | |-- SAMPLEA2.atropos.log.err.txt | | |-- SAMPLEA2.atropos.log.out.txt | | |-- SAMPLEA2_R1.atropos_final.fastq | | |-- SAMPLEA2_R2.atropos_final.fastq | | |-- SAMPLEB1.atropos_adapter.log.err.txt | | |-- SAMPLEB1.atropos_adapter.log.out.txt 
| | |-- SAMPLEB1.atropos.log.err.txt | | |-- SAMPLEB1.atropos.log.out.txt | | |-- SAMPLEB1_R1.atropos_final.fastq | | |-- SAMPLEB1_R2.atropos_final.fastq | | |-- SAMPLEB2.atropos_adapter.log.err.txt | | |-- SAMPLEB2.atropos_adapter.log.out.txt | | |-- SAMPLEB2.atropos.log.err.txt | | |-- SAMPLEB2.atropos.log.out.txt | | |-- SAMPLEB2_R1.atropos_final.fastq | | `-- SAMPLEB2_R2.atropos_final.fastq | |-- cleaned | | |-- SAMPLEA1.atropos_final.prinseq.cleaned_1.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA1.atropos_final.prinseq_1.fastq | | |-- SAMPLEA1.atropos_final.prinseq.cleaned_1_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA1.atropos_final.prinseq_1_singletons.fastq | | |-- SAMPLEA1.atropos_final.prinseq.cleaned_2.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA1.atropos_final.prinseq_2.fastq | | |-- SAMPLEA1.atropos_final.prinseq.cleaned_2_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA1.atropos_final.prinseq_2_singletons.fastq | | |-- SAMPLEA2.atropos_final.prinseq.cleaned_1.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA2.atropos_final.prinseq_1.fastq | | |-- SAMPLEA2.atropos_final.prinseq.cleaned_1_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA2.atropos_final.prinseq_1_singletons.fastq | | |-- SAMPLEA2.atropos_final.prinseq.cleaned_2.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA2.atropos_final.prinseq_2.fastq | | |-- SAMPLEA2.atropos_final.prinseq.cleaned_2_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA2.atropos_final.prinseq_2_singletons.fastq | | |-- SAMPLEB1.atropos_final.prinseq.cleaned_1.fastq -> 
/state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB1.atropos_final.prinseq_1.fastq | | |-- SAMPLEB1.atropos_final.prinseq.cleaned_1_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB1.atropos_final.prinseq_1_singletons.fastq | | |-- SAMPLEB1.atropos_final.prinseq.cleaned_2.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB1.atropos_final.prinseq_2.fastq | | |-- SAMPLEB1.atropos_final.prinseq.cleaned_2_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB1.atropos_final.prinseq_2_singletons.fastq | | |-- SAMPLEB2.atropos_final.prinseq.cleaned_1.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB2.atropos_final.prinseq_1.fastq | | |-- SAMPLEB2.atropos_final.prinseq.cleaned_1_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB2.atropos_final.prinseq_1_singletons.fastq | | |-- SAMPLEB2.atropos_final.prinseq.cleaned_2.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB2.atropos_final.prinseq_2.fastq | | `-- SAMPLEB2.atropos_final.prinseq.cleaned_2_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB2.atropos_final.prinseq_2_singletons.fastq | |-- fastqc | | |-- pos | | | |-- SAMPLEA1.atropos_final.prinseq_1_fastqc.html | | | |-- SAMPLEA1.atropos_final.prinseq_1_fastqc.zip | | | |-- SAMPLEA1.atropos_final.prinseq_1.log.err.txt | | | |-- SAMPLEA1.atropos_final.prinseq_1.log.out.txt | | | |-- SAMPLEA1.atropos_final.prinseq_1_singletons_fastqc.html | | | |-- SAMPLEA1.atropos_final.prinseq_1_singletons_fastqc.zip | | | |-- SAMPLEA1.atropos_final.prinseq_1_singletons.log.err.txt | | | |-- SAMPLEA1.atropos_final.prinseq_1_singletons.log.out.txt | | | |-- SAMPLEA1.atropos_final.prinseq_2_fastqc.html | | | |-- 
SAMPLEA1.atropos_final.prinseq_2_fastqc.zip | | | |-- SAMPLEA1.atropos_final.prinseq_2.log.err.txt | | | |-- SAMPLEA1.atropos_final.prinseq_2.log.out.txt | | | |-- SAMPLEA2.atropos_final.prinseq_1_fastqc.html | | | |-- SAMPLEA2.atropos_final.prinseq_1_fastqc.zip | | | |-- SAMPLEA2.atropos_final.prinseq_1.log.err.txt | | | |-- SAMPLEA2.atropos_final.prinseq_1.log.out.txt | | | |-- SAMPLEA2.atropos_final.prinseq_1_singletons_fastqc.html | | | |-- SAMPLEA2.atropos_final.prinseq_1_singletons_fastqc.zip | | | |-- SAMPLEA2.atropos_final.prinseq_1_singletons.log.err.txt | | | |-- SAMPLEA2.atropos_final.prinseq_1_singletons.log.out.txt | | | |-- SAMPLEA2.atropos_final.prinseq_2_fastqc.html | | | |-- SAMPLEA2.atropos_final.prinseq_2_fastqc.zip | | | |-- SAMPLEA2.atropos_final.prinseq_2.log.err.txt | | | |-- SAMPLEA2.atropos_final.prinseq_2.log.out.txt | | | |-- SAMPLEB1.atropos_final.prinseq_1_fastqc.html | | | |-- SAMPLEB1.atropos_final.prinseq_1_fastqc.zip | | | |-- SAMPLEB1.atropos_final.prinseq_1.log.err.txt | | | |-- SAMPLEB1.atropos_final.prinseq_1.log.out.txt | | | |-- SAMPLEB1.atropos_final.prinseq_1_singletons_fastqc.html | | | |-- SAMPLEB1.atropos_final.prinseq_1_singletons_fastqc.zip | | | |-- SAMPLEB1.atropos_final.prinseq_1_singletons.log.err.txt | | | |-- SAMPLEB1.atropos_final.prinseq_1_singletons.log.out.txt | | | |-- SAMPLEB1.atropos_final.prinseq_2_fastqc.html | | | |-- SAMPLEB1.atropos_final.prinseq_2_fastqc.zip | | | |-- SAMPLEB1.atropos_final.prinseq_2.log.err.txt | | | |-- SAMPLEB1.atropos_final.prinseq_2.log.out.txt | | | |-- SAMPLEB2.atropos_final.prinseq_1_fastqc.html | | | |-- SAMPLEB2.atropos_final.prinseq_1_fastqc.zip | | | |-- SAMPLEB2.atropos_final.prinseq_1.log.err.txt | | | |-- SAMPLEB2.atropos_final.prinseq_1.log.out.txt | | | |-- SAMPLEB2.atropos_final.prinseq_1_singletons_fastqc.html | | | |-- SAMPLEB2.atropos_final.prinseq_1_singletons_fastqc.zip | | | |-- SAMPLEB2.atropos_final.prinseq_1_singletons.log.err.txt | | | |-- 
SAMPLEB2.atropos_final.prinseq_1_singletons.log.out.txt | | | |-- SAMPLEB2.atropos_final.prinseq_2_fastqc.html | | | |-- SAMPLEB2.atropos_final.prinseq_2_fastqc.zip | | | |-- SAMPLEB2.atropos_final.prinseq_2.log.err.txt | | | `-- SAMPLEB2.atropos_final.prinseq_2.log.out.txt | | `-- pre | | |-- SAMPLEA1_R1_fastqc.html | | |-- SAMPLEA1_R1_fastqc.zip | | |-- SAMPLEA1_R1.log.err.txt | | |-- SAMPLEA1_R1.log.out.txt | | |-- SAMPLEA1_R2_fastqc.html | | |-- SAMPLEA1_R2_fastqc.zip | | |-- SAMPLEA1_R2.log.err.txt | | |-- SAMPLEA1_R2.log.out.txt | | |-- SAMPLEA2_R1_fastqc.html | | |-- SAMPLEA2_R1_fastqc.zip | | |-- SAMPLEA2_R1.log.err.txt | | |-- SAMPLEA2_R1.log.out.txt | | |-- SAMPLEA2_R2_fastqc.html | | |-- SAMPLEA2_R2_fastqc.zip | | |-- SAMPLEA2_R2.log.err.txt | | |-- SAMPLEA2_R2.log.out.txt | | |-- SAMPLEB1_R1_fastqc.html | | |-- SAMPLEB1_R1_fastqc.zip | | |-- SAMPLEB1_R1.log.err.txt | | |-- SAMPLEB1_R1.log.out.txt | | |-- SAMPLEB1_R2_fastqc.html | | |-- SAMPLEB1_R2_fastqc.zip | | |-- SAMPLEB1_R2.log.err.txt | | |-- SAMPLEB1_R2.log.out.txt | | |-- SAMPLEB2_R1_fastqc.html | | |-- SAMPLEB2_R1_fastqc.zip | | |-- SAMPLEB2_R1.log.err.txt | | |-- SAMPLEB2_R1.log.out.txt | | |-- SAMPLEB2_R2_fastqc.html | | |-- SAMPLEB2_R2_fastqc.zip | | |-- SAMPLEB2_R2.log.err.txt | | `-- SAMPLEB2_R2.log.out.txt | `-- prinseq | |-- SAMPLEA1.atropos_final.prinseq_1.fastq | |-- SAMPLEA1.atropos_final.prinseq_1_singletons.fastq | |-- SAMPLEA1.atropos_final.prinseq_2.fastq | |-- SAMPLEA1.atropos_final.prinseq_2_singletons.fastq | |-- SAMPLEA1.atropos_final.prinseq.err.log | |-- SAMPLEA1.atropos_final.prinseq.out.log | |-- SAMPLEA1.atropos_final.prinseq_singletons.fastq | |-- SAMPLEA2.atropos_final.prinseq_1.fastq | |-- SAMPLEA2.atropos_final.prinseq_1_singletons.fastq | |-- SAMPLEA2.atropos_final.prinseq_2.fastq | |-- SAMPLEA2.atropos_final.prinseq_2_singletons.fastq | |-- SAMPLEA2.atropos_final.prinseq.err.log | |-- SAMPLEA2.atropos_final.prinseq.out.log | |-- 
SAMPLEA2.atropos_final.prinseq_singletons.fastq | |-- SAMPLEB1.atropos_final.prinseq_1.fastq | |-- SAMPLEB1.atropos_final.prinseq_1_singletons.fastq | |-- SAMPLEB1.atropos_final.prinseq_2.fastq | |-- SAMPLEB1.atropos_final.prinseq_2_singletons.fastq | |-- SAMPLEB1.atropos_final.prinseq.err.log | |-- SAMPLEB1.atropos_final.prinseq.out.log | |-- SAMPLEB1.atropos_final.prinseq_singletons.fastq | |-- SAMPLEB2.atropos_final.prinseq_1.fastq | |-- SAMPLEB2.atropos_final.prinseq_1_singletons.fastq | |-- SAMPLEB2.atropos_final.prinseq_2.fastq | |-- SAMPLEB2.atropos_final.prinseq_2_singletons.fastq | |-- SAMPLEB2.atropos_final.prinseq.err.log | |-- SAMPLEB2.atropos_final.prinseq.out.log | `-- SAMPLEB2.atropos_final.prinseq_singletons.fastq |-- tophat_cufflinks | |-- SAMPLEA1 | | |-- cufflinks.err.txt | | |-- cufflinks.out.txt | | |-- genes.fpkm_tracking | | |-- isoforms.fpkm_tracking | | |-- skipped.gtf | | `-- transcripts.gtf | |-- SAMPLEA2 | | |-- cufflinks.err.txt | | |-- cufflinks.out.txt | | |-- genes.fpkm_tracking | | |-- isoforms.fpkm_tracking | | |-- skipped.gtf | | `-- transcripts.gtf | |-- SAMPLEB1 | | |-- cufflinks.err.txt | | |-- cufflinks.out.txt | | |-- genes.fpkm_tracking | | |-- isoforms.fpkm_tracking | | |-- skipped.gtf | | `-- transcripts.gtf | `-- SAMPLEB2 | |-- cufflinks.err.txt | |-- cufflinks.out.txt | |-- genes.fpkm_tracking | |-- isoforms.fpkm_tracking | |-- skipped.gtf | `-- transcripts.gtf |-- tophat_cufflinks_cuffcompare | |-- cuffcmp.combined.gtf | |-- cuffcmp.err.txt | |-- cuffcmp.loci | |-- cuffcmp.out.txt | |-- cuffcmp.stats | |-- cuffcmp.tracking | `-- TCONS.nearest_ref.txt |-- tophat_cufflinks_cuffdiff | |-- bias_params.info | |-- cds.count_tracking | |-- cds.diff | |-- cds_exp.diff | |-- cds.fpkm_tracking | |-- cds.read_group_tracking | |-- cuffdiff.log.err.txt | |-- cuffdiff.log.out.txt | |-- gene_exp.diff | |-- genes.count_tracking | |-- genes.fpkm_tracking | |-- genes.read_group_tracking | |-- isoform_exp.diff | |-- 
isoform_exp.diff.annot.txt | |-- isoforms.count_tracking | |-- isoforms.fpkm_tracking | |-- isoforms.read_group_tracking | |-- mergeR.isoform.log.err.txt | |-- mergeR.isoform.log.out.txt | |-- promoters.diff | |-- read_groups.info | |-- run.info | |-- splicing.diff | |-- tss_group_exp.diff | |-- tss_groups.count_tracking | |-- tss_groups.fpkm_tracking | |-- tss_groups.read_group_tracking | `-- var_model.info |-- tophat_cufflinks_cuffnorm | |-- cds.attr_table | |-- cds.count_table | |-- cds.fpkm_table | |-- cuffnorm.log.err.txt | |-- cuffnorm.log.out.txt | |-- de-normalize-cuffnorm.genes.err.txt | |-- de-normalize-cuffnorm.genes.out.txt | |-- de-normalize-cuffnorm.isoforms.err.txt | |-- de-normalize-cuffnorm.isoforms.out.txt | |-- genes.attr_table | |-- genes.count_table | |-- genes.fpkm_table | |-- genes.raw_count_table.txt | |-- isoforms.attr_table | |-- isoforms.count_table | |-- isoforms.fpkm_table | |-- isoforms.raw_count_table.txt | |-- run.info | |-- samples.table | |-- tss_groups.attr_table | |-- tss_groups.count_table | `-- tss_groups.fpkm_table |-- tophat_cufflinks_cuffquant | |-- SAMPLEA1 | | |-- abundances.cxb | | |-- cuffquant.log.err.txt | | `-- cuffquant.log.out.txt | |-- SAMPLEA2 | | |-- abundances.cxb | | |-- cuffquant.log.err.txt | | `-- cuffquant.log.out.txt | |-- SAMPLEB1 | | |-- abundances.cxb | | |-- cuffquant.log.err.txt | | `-- cuffquant.log.out.txt | `-- SAMPLEB2 | |-- abundances.cxb | |-- cuffquant.log.err.txt | `-- cuffquant.log.out.txt |-- tophat_cuffmerge | |-- cuffcmp.merged.gtf.refmap | |-- cuffcmp.merged.gtf.tmap | |-- cuffmerge.err.txt | |-- cuffmerge.out.txt | |-- logs | | `-- run.log | `-- merged.gtf |-- tophat_index | |-- bowtie2.err.txt | |-- bowtie2.out.txt | |-- genome.1.bt2 | |-- genome.2.bt2 | |-- genome.3.bt2 | |-- genome.4.bt2 | |-- genome.fa -> /state/partition1/rgmatos/Vvinifera/Refs/raw/genome.fa | |-- genome.rev.1.bt2 | |-- genome.rev.2.bt2 | |-- transcriptome.1.bt2 | |-- transcriptome.2.bt2 | |-- transcriptome.3.bt2 | 
|-- transcriptome.4.bt2 | |-- transcriptome.fa | |-- transcriptome.fa.tlst | |-- transcriptome.gff | |-- transcriptome.rev.1.bt2 | |-- transcriptome.rev.2.bt2 | `-- transcriptome.ver |-- tophat_out_final | |-- SAMPLEA1 | | |-- accepted_hits.bam | | |-- Aligned.sorted.bam | | |-- Aligned.sorted.err.txt | | |-- Aligned.sorted.out.txt | | `-- Header.txt | |-- SAMPLEA1.log.err.txt | |-- SAMPLEA1.log.out.txt | |-- SAMPLEA2 | | |-- accepted_hits.bam | | |-- Aligned.sorted.bam | | |-- Aligned.sorted.err.txt | | |-- Aligned.sorted.out.txt | | `-- Header.txt | |-- SAMPLEA2.log.err.txt | |-- SAMPLEA2.log.out.txt | |-- SAMPLEB1 | | |-- accepted_hits.bam | | |-- Aligned.sorted.bam | | |-- Aligned.sorted.err.txt | | |-- Aligned.sorted.out.txt | | `-- Header.txt | |-- SAMPLEB1.log.err.txt | |-- SAMPLEB1.log.out.txt | |-- SAMPLEB2 | | |-- accepted_hits.bam | | |-- Aligned.sorted.bam | | |-- Aligned.sorted.err.txt | | |-- Aligned.sorted.out.txt | | `-- Header.txt | |-- SAMPLEB2.log.err.txt | `-- SAMPLEB2.log.out.txt |-- tophat_out_pe | |-- SAMPLEA1 | | |-- accepted_hits.bam | | |-- align_summary.txt | | |-- deletions.bed | | |-- insertions.bed | | |-- junctions.bed | | |-- logs | | | |-- bam_merge_um.log | | | |-- bowtie_build.log | | | |-- bowtie.left_kept_reads.log | | | |-- bowtie.left_kept_reads.m2g_um_seg1.log | | | |-- bowtie.left_kept_reads.m2g_um_seg2.log | | | |-- bowtie.left_kept_reads.m2g_um_seg3.log | | | |-- bowtie.left_kept_reads.m2g_um_seg4.log | | | |-- bowtie.left_kept_reads.m2g_um_seg5.log | | | |-- bowtie.left_kept_reads.m2g_um_seg6.log | | | |-- bowtie.right_kept_reads.log | | | |-- bowtie.right_kept_reads.m2g_um_seg1.log | | | |-- bowtie.right_kept_reads.m2g_um_seg2.log | | | |-- bowtie.right_kept_reads.m2g_um_seg3.log | | | |-- bowtie.right_kept_reads.m2g_um_seg4.log | | | |-- bowtie.right_kept_reads.m2g_um_seg5.log | | | |-- bowtie.right_kept_reads.m2g_um_seg6.log | | | |-- bowtie.SAMPLEA1.atropos_final.prinseq.cleaned_1.log | | | |-- 
bowtie.SAMPLEA1.atropos_final.prinseq.cleaned_2.log | | | |-- gtf_juncs.log | | | |-- juncs_db.log | | | |-- long_spanning_reads.segs.log | | | |-- m2g_left_kept_reads.err | | | |-- m2g_left_kept_reads.out | | | |-- m2g_right_kept_reads.err | | | |-- m2g_right_kept_reads.out | | | |-- prep_reads.from_preflt.left.log | | | |-- prep_reads.from_preflt.right.log | | | |-- prep_reads.log | | | |-- prep_reads.prefilter_left.log | | | |-- prep_reads.prefilter_right.log | | | |-- reports.log | | | |-- run.log | | | |-- segment_juncs.log | | | `-- tophat.log | | |-- prep_reads.info | | `-- unmapped.bam | |-- SAMPLEA1.log.err.txt | |-- SAMPLEA1.log.out.txt | |-- SAMPLEA2 | | |-- accepted_hits.bam | | |-- align_summary.txt | | |-- deletions.bed | | |-- insertions.bed | | |-- junctions.bed | | |-- logs | | | |-- bam_merge_um.log | | | |-- bowtie_build.log | | | |-- bowtie.left_kept_reads.log | | | |-- bowtie.left_kept_reads.m2g_um_seg1.log | | | |-- bowtie.left_kept_reads.m2g_um_seg2.log | | | |-- bowtie.left_kept_reads.m2g_um_seg3.log | | | |-- bowtie.left_kept_reads.m2g_um_seg4.log | | | |-- bowtie.left_kept_reads.m2g_um_seg5.log | | | |-- bowtie.left_kept_reads.m2g_um_seg6.log | | | |-- bowtie.right_kept_reads.log | | | |-- bowtie.right_kept_reads.m2g_um_seg1.log | | | |-- bowtie.right_kept_reads.m2g_um_seg2.log | | | |-- bowtie.right_kept_reads.m2g_um_seg3.log | | | |-- bowtie.right_kept_reads.m2g_um_seg4.log | | | |-- bowtie.right_kept_reads.m2g_um_seg5.log | | | |-- bowtie.right_kept_reads.m2g_um_seg6.log | | | |-- bowtie.SAMPLEA2.atropos_final.prinseq.cleaned_1.log | | | |-- bowtie.SAMPLEA2.atropos_final.prinseq.cleaned_2.log | | | |-- g2f.err | | | |-- g2f.out | | | |-- gtf_juncs.log | | | |-- juncs_db.log | | | |-- long_spanning_reads.segs.log | | | |-- m2g_left_kept_reads.err | | | |-- m2g_left_kept_reads.out | | | |-- m2g_right_kept_reads.err | | | |-- m2g_right_kept_reads.out | | | |-- prep_reads.from_preflt.left.log | | | |-- prep_reads.from_preflt.right.log | | | 
|-- prep_reads.log | | | |-- prep_reads.prefilter_left.log | | | |-- prep_reads.prefilter_right.log | | | |-- reports.log | | | |-- run.log | | | |-- segment_juncs.log | | | `-- tophat.log | | |-- prep_reads.info | | `-- unmapped.bam | |-- SAMPLEA2.log.err.txt | |-- SAMPLEA2.log.out.txt | |-- SAMPLEB1 | | |-- accepted_hits.bam | | |-- align_summary.txt | | |-- deletions.bed | | |-- insertions.bed | | |-- junctions.bed | | |-- logs | | | |-- bam_merge_um.log | | | |-- bowtie_build.log | | | |-- bowtie.left_kept_reads.log | | | |-- bowtie.left_kept_reads.m2g_um_seg1.log | | | |-- bowtie.left_kept_reads.m2g_um_seg2.log | | | |-- bowtie.left_kept_reads.m2g_um_seg3.log | | | |-- bowtie.left_kept_reads.m2g_um_seg4.log | | | |-- bowtie.left_kept_reads.m2g_um_seg5.log | | | |-- bowtie.left_kept_reads.m2g_um_seg6.log | | | |-- bowtie.right_kept_reads.log | | | |-- bowtie.right_kept_reads.m2g_um_seg1.log | | | |-- bowtie.right_kept_reads.m2g_um_seg2.log | | | |-- bowtie.right_kept_reads.m2g_um_seg3.log | | | |-- bowtie.right_kept_reads.m2g_um_seg4.log | | | |-- bowtie.right_kept_reads.m2g_um_seg5.log | | | |-- bowtie.right_kept_reads.m2g_um_seg6.log | | | |-- bowtie.SAMPLEB1.atropos_final.prinseq.cleaned_1.log | | | |-- bowtie.SAMPLEB1.atropos_final.prinseq.cleaned_2.log | | | |-- gtf_juncs.log | | | |-- juncs_db.log | | | |-- long_spanning_reads.segs.log | | | |-- m2g_left_kept_reads.err | | | |-- m2g_left_kept_reads.out | | | |-- m2g_right_kept_reads.err | | | |-- m2g_right_kept_reads.out | | | |-- prep_reads.from_preflt.left.log | | | |-- prep_reads.from_preflt.right.log | | | |-- prep_reads.log | | | |-- prep_reads.prefilter_left.log | | | |-- prep_reads.prefilter_right.log | | | |-- reports.log | | | |-- run.log | | | |-- segment_juncs.log | | | `-- tophat.log | | |-- prep_reads.info | | `-- unmapped.bam | |-- SAMPLEB1.log.err.txt | |-- SAMPLEB1.log.out.txt | |-- SAMPLEB2 | | |-- accepted_hits.bam | | |-- align_summary.txt | | |-- deletions.bed | | |-- insertions.bed | 
| |-- junctions.bed | | |-- logs | | | |-- bam_merge_um.log | | | |-- bowtie_build.log | | | |-- bowtie.left_kept_reads.log | | | |-- bowtie.left_kept_reads.m2g_um_seg1.log | | | |-- bowtie.left_kept_reads.m2g_um_seg2.log | | | |-- bowtie.left_kept_reads.m2g_um_seg3.log | | | |-- bowtie.left_kept_reads.m2g_um_seg4.log | | | |-- bowtie.left_kept_reads.m2g_um_seg5.log | | | |-- bowtie.left_kept_reads.m2g_um_seg6.log | | | |-- bowtie.right_kept_reads.log | | | |-- bowtie.right_kept_reads.m2g_um_seg1.log | | | |-- bowtie.right_kept_reads.m2g_um_seg2.log | | | |-- bowtie.right_kept_reads.m2g_um_seg3.log | | | |-- bowtie.right_kept_reads.m2g_um_seg4.log | | | |-- bowtie.right_kept_reads.m2g_um_seg5.log | | | |-- bowtie.right_kept_reads.m2g_um_seg6.log | | | |-- bowtie.SAMPLEB2.atropos_final.prinseq.cleaned_1.log | | | |-- bowtie.SAMPLEB2.atropos_final.prinseq.cleaned_2.log | | | |-- gtf_juncs.log | | | |-- juncs_db.log | | | |-- long_spanning_reads.segs.log | | | |-- m2g_left_kept_reads.err | | | |-- m2g_left_kept_reads.out | | | |-- m2g_right_kept_reads.err | | | |-- m2g_right_kept_reads.out | | | |-- prep_reads.from_preflt.left.log | | | |-- prep_reads.from_preflt.right.log | | | |-- prep_reads.log | | | |-- prep_reads.prefilter_left.log | | | |-- prep_reads.prefilter_right.log | | | |-- reports.log | | | |-- run.log | | | |-- segment_juncs.log | | | `-- tophat.log | | |-- prep_reads.info | | `-- unmapped.bam | |-- SAMPLEB2.log.err.txt | `-- SAMPLEB2.log.out.txt `-- tophat_out_se |-- SAMPLEA1 | |-- accepted_hits.bam | |-- align_summary.txt | |-- deletions.bed | |-- insertions.bed | |-- junctions.bed | |-- logs | | |-- bowtie_build.log | | |-- bowtie.left_kept_reads.log | | |-- bowtie.left_kept_reads.m2g_um_seg1.log | | |-- bowtie.left_kept_reads.m2g_um_seg2.log | | |-- bowtie.left_kept_reads.m2g_um_seg3.log | | |-- bowtie.left_kept_reads.m2g_um_seg4.log | | |-- bowtie.left_kept_reads.m2g_um_seg5.log | | |-- bowtie.left_kept_reads.m2g_um_seg6.log | | |-- 
bowtie.singletons.log | | |-- gtf_juncs.log | | |-- juncs_db.log | | |-- long_spanning_reads.segs.log | | |-- m2g_left_kept_reads.err | | |-- m2g_left_kept_reads.out | | |-- prep_reads.from_preflt.left.log | | |-- prep_reads.log | | |-- prep_reads.prefilter_left.log | | |-- reports.log | | |-- run.log | | |-- segment_juncs.log | | `-- tophat.log | |-- prep_reads.info | |-- singletons.fastq | `-- unmapped.bam |-- SAMPLEA1.log.err.txt |-- SAMPLEA1.log.out.txt |-- SAMPLEA2 | |-- accepted_hits.bam | |-- align_summary.txt | |-- deletions.bed | |-- insertions.bed | |-- junctions.bed | |-- logs | | |-- bowtie_build.log | | |-- bowtie.left_kept_reads.log | | |-- bowtie.left_kept_reads.m2g_um_seg1.log | | |-- bowtie.left_kept_reads.m2g_um_seg2.log | | |-- bowtie.left_kept_reads.m2g_um_seg3.log | | |-- bowtie.left_kept_reads.m2g_um_seg4.log | | |-- bowtie.left_kept_reads.m2g_um_seg5.log | | |-- bowtie.left_kept_reads.m2g_um_seg6.log | | |-- bowtie.singletons.log | | |-- gtf_juncs.log | | |-- juncs_db.log | | |-- long_spanning_reads.segs.log | | |-- m2g_left_kept_reads.err | | |-- m2g_left_kept_reads.out | | |-- prep_reads.from_preflt.left.log | | |-- prep_reads.log | | |-- prep_reads.prefilter_left.log | | |-- reports.log | | |-- run.log | | |-- segment_juncs.log | | `-- tophat.log | |-- prep_reads.info | |-- singletons.fastq | `-- unmapped.bam |-- SAMPLEA2.log.err.txt |-- SAMPLEA2.log.out.txt |-- SAMPLEB1 | |-- accepted_hits.bam | |-- align_summary.txt | |-- deletions.bed | |-- insertions.bed | |-- junctions.bed | |-- logs | | |-- bowtie_build.log | | |-- bowtie.left_kept_reads.log | | |-- bowtie.left_kept_reads.m2g_um_seg1.log | | |-- bowtie.left_kept_reads.m2g_um_seg2.log | | |-- bowtie.left_kept_reads.m2g_um_seg3.log | | |-- bowtie.left_kept_reads.m2g_um_seg4.log | | |-- bowtie.left_kept_reads.m2g_um_seg5.log | | |-- bowtie.left_kept_reads.m2g_um_seg6.log | | |-- bowtie.singletons.log | | |-- gtf_juncs.log | | |-- juncs_db.log | | |-- long_spanning_reads.segs.log | | |-- 
m2g_left_kept_reads.err | | |-- m2g_left_kept_reads.out | | |-- prep_reads.from_preflt.left.log | | |-- prep_reads.log | | |-- prep_reads.prefilter_left.log | | |-- reports.log | | |-- run.log | | |-- segment_juncs.log | | `-- tophat.log | |-- prep_reads.info | |-- singletons.fastq | `-- unmapped.bam |-- SAMPLEB1.log.err.txt |-- SAMPLEB1.log.out.txt |-- SAMPLEB2 | |-- accepted_hits.bam | |-- align_summary.txt | |-- deletions.bed | |-- insertions.bed | |-- junctions.bed | |-- logs | | |-- bowtie_build.log | | |-- bowtie.left_kept_reads.log | | |-- bowtie.left_kept_reads.m2g_um_seg1.log | | |-- bowtie.left_kept_reads.m2g_um_seg2.log | | |-- bowtie.left_kept_reads.m2g_um_seg3.log | | |-- bowtie.left_kept_reads.m2g_um_seg4.log | | |-- bowtie.left_kept_reads.m2g_um_seg5.log | | |-- bowtie.left_kept_reads.m2g_um_seg6.log | | |-- bowtie.singletons.log | | |-- gtf_juncs.log | | |-- juncs_db.log | | |-- long_spanning_reads.segs.log | | |-- m2g_left_kept_reads.err | | |-- m2g_left_kept_reads.out | | |-- prep_reads.from_preflt.left.log | | |-- prep_reads.log | | |-- prep_reads.prefilter_left.log | | |-- reports.log | | |-- run.log | | |-- segment_juncs.log | | `-- tophat.log | |-- prep_reads.info | |-- singletons.fastq | `-- unmapped.bam |-- SAMPLEB2.log.err.txt `-- SAMPLEB2.log.out.txt
```

***DE NOVO* TRANSCRIPTOME ASSEMBLY**

#Running the rnaseq-novo.sh pipeline

```
#!/bin/bash

# input - directory containing the input files in .fastq format
input=$1

# validate the "input" parameter
if [ ! ${input} ]
then
    echo "[ERROR] Missing input directory." 1>&2
    exit
else
    if [ ! -d ${input} ]
    then
        echo "[ERROR] Wrong input directory (${input})." 1>&2
        exit
    fi
fi

# output - directory in which the assembly results are stored
output=$2

# validate the "output" parameter
if [ ! ${output} ]
then
    echo "[ERROR] Missing output directory." 1>&2
    exit
else
    if [ ! -d ${output} ]
    then
        echo "[ERROR] Wrong output directory (${output})." 1>&2
        exit
    fi
fi

# Number of CORES used for processing
# WARNING: do not exceed the machine's limit
THREADS=$3
if [ ! ${THREADS} ]; then
    echo "[ERROR] Missing number of threads." 1>&2
    exit
fi

# Amount of memory used for processing with Jellyfish
# WARNING: do not exceed the machine's limit
MEM=$4
if [ ! ${MEM} ]; then
    echo "[ERROR] Missing memory." 1>&2
    exit
fi

###
# Output files and directories
#

basedir_out="${output}/"

renamed_out="${basedir_out}/renamed"

trinity_out="${basedir_out}/trinity_assembled"

mkdir -p ${renamed_out}
mkdir -p ${trinity_out}

left=()
left_singleton=()

right=()
right_singleton=()

echo "Performing renaming step ..."

for fastq in `ls ${input}/*.fastq`; do
    # get the file name
    fastqbn=`basename ${fastq}`;
    if [[ ! $fastqbn =~ \.bad_ ]]; then
        renamed_fastq="${renamed_out}/${fastqbn}"
        if [ ! -e ${renamed_fastq} ]; then
            echo -e "\tRenaming ${fastqbn} ..."
            # Append /1 or /2 to each read name and reduce the "+" line to a bare "+"
            if [[ ${fastqbn} =~ _1[\._] ]]; then
                awk '{ if (NR%4==1) { if ($1!~/\/1$/) { print $1"/1" } else { print $1 } } else if (NR%4==3) { print "+" } else { print $1 } }' ${fastq} > ${renamed_fastq}
            elif [[ ${fastqbn} =~ _2[\._] ]]; then
                awk '{ if (NR%4==1) { if ($1!~/\/2$/) { print $1"/2" } else { print $1 } } else if (NR%4==3) { print "+" } else { print $1 } }' ${fastq} > ${renamed_fastq}
            else
                echo "Warning: ${fastqbn} discarded!"
            fi
        fi

        if [[ ${fastqbn} =~ _1[\._] ]]; then
            if [[ ${fastqbn} =~ singletons ]]; then
                if [ -s ${renamed_fastq} ]; then
                    left_singleton=($(printf "%s\n" ${left_singleton[@]} ${renamed_fastq} | sort -u ))
                fi
            else
                left=($(printf "%s\n" ${left[@]} ${renamed_fastq} | sort -u ))
            fi
        elif [[ ${fastqbn} =~ _2[\._] ]]; then
            if [[ ${fastqbn} =~ singleton ]]; then
                if [ -s ${renamed_fastq} ]; then
                    right_singleton=($(printf "%s\n" ${right_singleton[@]} ${renamed_fastq} | sort -u ))
                fi
            else
                right=($(printf "%s\n" ${right[@]} ${renamed_fastq} | sort -u ))
            fi
        else
            echo "Warning: ${fastqbn} discarded!"
        fi
    fi
done

#for l in ${left[@]}; do
#    echo -e "L: ${l}";
#done
#
#for r in ${right[@]}; do
#    echo -e "R: ${r}";
#done
#
#for ls in ${left_singleton[@]}; do
#    echo -e "LS: ${ls}";
#done
#
#for rs in ${right_singleton[@]}; do
#    echo -e "RS: ${rs}";
#done

# Trinity.fasta is a regular file, so test it with -e rather than -d
if [ ! -e ${trinity_out}/Trinity.fasta ]; then

    echo -e "Assembling step (Trinity) ..."

    rm -fr ${trinity_out}
    mkdir -p ${trinity_out}

    Trinity --output ${trinity_out} \
        --seqType fq \
        --max_memory ${MEM} \
        --CPU ${THREADS} \
        --min_per_id_same_path 98 \
        --max_diffs_same_path 2 \
        --path_reinforcement_distance 10 \
        --group_pairs_distance 500 \
        --min_kmer_cov 3 \
        --min_glue 5 \
        --min_contig_length 300 \
        --left $(IFS=, ; echo "${left[*]},${left_singleton[*]},${right_singleton[*]}") \
        --right $(IFS=, ; echo "${right[*]}") \
        > ${trinity_out}/Trinity.log.out.txt \
        2> ${trinity_out}/Trinity.log.err.txt
fi
```

For the two biological conditions, after running the pipelines, the values to consider are the significance values, the raw expression values for each sample, the normalized value per condition, and the log2 fold change, as shown in the table:

![](https://i.imgur.com/FAs9keT.png)

The table lists the abundance values for conditions A and B that were submitted to the simulation. It also reports the differences for each sample and whether the expected proportions indicate a decrease or an increase in the expression of the genes selected for the simulation.
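
To make the expected direction of change concrete, the log2 fold change of condition B relative to condition A can be computed from a pair of simulated abundances. The sketch below is only an illustration: the function name `log2fc` and the example abundances are hypothetical and are not part of the original pipeline or of the table above.

```shell
#!/bin/bash

# Hypothetical helper (not part of rnaseq-ref.sh or rnaseq-novo.sh):
# prints log2(B/A) for one transcript, given its abundance in
# condition A ($1) and in condition B ($2).
log2fc() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        # log2 is undefined for zero or negative abundances
        if (a <= 0 || b <= 0) { print "NA"; exit }
        # awk log() is the natural logarithm, so divide by log(2)
        printf "%.3f\n", log(b / a) / log(2)
    }'
}

# Illustrative abundances only (assumed values, not taken from the report)
log2fc 0.10 0.40   # positive result: expression increases in B
log2fc 0.40 0.10   # negative result: expression decreases in B
```

A positive value means the transcript is expected to be more abundant in B and a negative value means it is expected to be less abundant. Under the assumption that the abundance files keep the value in their second tab-separated column, the two arguments could be taken from `abundance_A.txt` and `abundance_B.txt`.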