**COURSE REPORT - APPLIED BIOINFORMATICS II: TRANSCRIPTOME ANALYSIS**
```
Student: Ramon Guedes Matos
Instructor: Daniel Guariz Pinheiro
```
Species chosen for the transcriptome analysis: *Vitis vinifera* (grape)
Target organelle: mitochondrion
Taxonomy ID: 29760
**Obtaining the reference genome:**
```
wget ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/003/745/GCF_000003745.3_12X/GCF_000003745.3_12X_genomic.fna.gz
```
#The .fna.gz and .gff.gz files were decompressed (gunzip *)
**Cleaning, adjusting and converting the references:**
```
nano cleanfasta.sh
```
```
#!/bin/bash
infile=$1
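# Usage: ./cleanfasta.sh <input.fasta> > <output.fasta>
# Error messages go to stderr so they never end up in the redirected FASTA output.
if [ -z "${infile}" ]; then
    echo "Missing input fasta file" 1>&2
    exit 1
fi
if [ ! -e "${infile}" ]; then
    echo "Not found input fasta file (${infile})" 1>&2
    exit 1
fi
# Keep only the accession (the text before the first dot) in each header line.
sed 's/^>\([^\.]\+\).*/>\1/' "${infile}"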
```
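To illustrate what the header-cleaning expression in cleanfasta.sh does, the sketch below applies the same `sed` command to a small FASTA passed on standard input. The function name `clean_headers` and the example description text are illustrative assumptions; GNU sed is assumed, since `\+` is a GNU extension in basic regular expressions.

```shell
#!/bin/bash
# Sketch: same substitution as cleanfasta.sh, reading FASTA text from stdin.
# Header lines keep only the text before the first dot; sequence lines are untouched.
clean_headers() {
    sed 's/^>\([^\.]\+\).*/>\1/'
}

# Illustrative input: an NCBI-style header followed by one sequence line.
printf '>NC_012007.3 example description\nACGTACGT\n' | clean_headers
```

The version suffix and the description are dropped, which keeps the sequence names in genome.fa consistent with the identifiers used later (for example NC_012007).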
Making the script executable:
```
chmod a+x cleanfasta.sh
```
**Cleaning the headers and reducing the FASTA/genome files:**
```
./cleanfasta.sh GCF_000003745.3_12X_genomic.fna > genome.fa
fixNCBIgff.sh GCF_000003745.3_12X_genomic.gff genome.gff
gffread genome.gff -g genome.fa -T -o genome.gtf
gffread genome.gff -g genome.fa -w transcriptome.fa
grep '^>' GCF_000003745.3_12X_genomic.fna | sed 's/^>\([^.]\+\)\.[0-9]\+ /\1\t/' > genome.txt
```
#Total number of genes annotated in the genome: **29963** | command:
```
grep -c -P '\tgene\t' genome.gff
```
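As a cross-check of the count above, the same number can be obtained by testing the feature-type column explicitly. This is a minimal sketch; `count_features` is a hypothetical helper name, and it assumes a standard tab-separated GFF in which the third column holds the feature type.

```shell
#!/bin/bash
# Sketch: count GFF records whose third column (feature type) equals a given value.
# Comment and directive lines (starting with "#") are skipped.
count_features() {
    local gff="$1" feature="$2"
    awk -F'\t' -v f="${feature}" '!/^#/ && $3 == f { n++ } END { print n + 0 }' "${gff}"
}

# Usage (assuming genome.gff is in the current directory):
# count_features genome.gff gene
```

Unlike the `grep` pattern, this version cannot match the word "gene" in any column other than the third.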
#Chromosome 1, used in the analysis: NC_012007
**Identification of the transcript/isoform accessions:**
Note: 3 genes with two transcripts each were selected
```
nano ACCS.txt
```

The first column contains the transcripts/isoforms and the second their respective genes.
**The following script was run to fetch the FASTA of each transcript and append it to the file 'transcriptoma.fa':**
```
#!/bin/bash
rm -f transcriptoma.fa
for acc in XM_010651729.2 XM_010651725.2 XM_010651713.2 XM_002263575.3 XM_010651756.2 XM_010651752.2 ; do
echo "Fetching FASTA for ${acc} ..."
esearch -db nucleotide -query ${acc} | efetch \
-format fasta >> transcriptoma.fa
done
```
**To check that the process succeeded, one header line is displayed for each selected transcript:**
```
grep '^>' transcriptoma.fa
```
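Beyond inspecting the headers by eye, the number of FASTA records can be compared with the number of accessions requested. The sketch below is one possible check; `check_fasta_count` is a hypothetical helper name and not part of the original workflow.

```shell
#!/bin/bash
# Sketch: fail if the number of FASTA headers differs from the expected count.
check_fasta_count() {
    local fasta="$1" expected="$2" found
    found=$(grep -c '^>' "${fasta}")
    if [ "${found}" -ne "${expected}" ]; then
        echo "[ERROR] Expected ${expected} sequences, found ${found} (${fasta})" 1>&2
        return 1
    fi
    echo "OK: ${found} sequences in ${fasta}"
}

# Usage (six accessions were requested in the loop above):
# check_fasta_count transcriptoma.fa 6
```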

**For biological conditions A and B, abundance files were created for each transcript:**

**The abundances in each file must sum to 1:**
```
perl -F"\t" -lane 'INIT{$sum=0;} $sum+=$F[1]; END{print "SOMA: $sum"; } ' abundance_A.txt
perl -F"\t" -lane 'INIT{$sum=0;} $sum+=$F[1]; END{print "SOMA: $sum"; } ' abundance_B.txt
```
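The Perl commands print the sum for visual inspection. A variant that fails automatically when the sum deviates from 1 could look like the sketch below. The function name `check_abundance_sum` and the tolerance of 1e-6 are assumptions chosen for illustration; the file layout (identifier in column 1, abundance in column 2, tab-separated) follows the commands above.

```shell
#!/bin/bash
# Sketch: sum the second tab-separated column and require it to be 1 within 1e-6.
check_abundance_sum() {
    awk -F'\t' '
        { sum += $2 }
        END {
            diff = sum - 1
            if (diff < 0) diff = -diff
            if (diff > 1e-6) { printf "[ERROR] SUM: %s\n", sum; exit 1 }
            printf "SUM: %s\n", sum
        }' "$1"
}

# Usage:
# check_abundance_sum abundance_A.txt && check_abundance_sum abundance_B.txt
```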

**The reference genome was reduced to only the chromosome selected for the analysis (NC_012007), in both FASTA and GFF formats:**
```
echo -e "NC_012007" | pullseq -N -i genome.fa > toygenome.fa
grep -P '^(##gff|NC_012007\t)' genome.gff > toygenome.gff
```
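Where `pullseq` is not available, a single sequence can also be extracted with `awk`. This is only a sketch of an alternative, not the method used in this report; `extract_seq` is a hypothetical helper name, and it assumes headers already cleaned so that the first word after `>` is the sequence identifier.

```shell
#!/bin/bash
# Sketch: print the FASTA record whose identifier (first word of the header) matches.
extract_seq() {
    local id="$1" fasta="$2"
    # On each header line, decide whether the following lines should be printed.
    awk -v id="${id}" '/^>/ { keep = (substr($1, 2) == id) } keep' "${fasta}"
}

# Usage:
# extract_seq NC_012007 genome.fa > toygenome.fa
```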
**Generating .FASTQ files for the replicates from the transcript abundance data.**
The script for eukaryotes was used:
```
#!/bin/bash
#
# INGLÊS/ENGLISH
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# http://www.gnu.org/copyleft/gpl.html
#
#
# PORTUGUÊS/PORTUGUESE
# Este programa é distribuído na expectativa de ser útil aos seus
# usuários, porém NÃO TEM NENHUMA GARANTIA, EXPLÍCITAS OU IMPLÍCITAS,
# COMERCIAIS OU DE ATENDIMENTO A UMA DETERMINADA FINALIDADE. Consulte
# a Licença Pública Geral GNU para maiores detalhes.
# http://www.gnu.org/copyleft/gpl.html
#
# Copyright (C) 2012 Universidade de São Paulo
#
# Universidade de São Paulo
# Laboratório de Biologia do Desenvolvimento de Abelhas
# Núcleo de Bioinformática (LBDA-BioInfo)
#
# Daniel Guariz Pinheiro
# dgpinheiro@gmail.com
# http://zulu.fmrp.usp.br/bioinfo
#
nreps=$1
if [ ! ${nreps} ]; then
nreps=2
else
let nreps+=0
if [ ${nreps} -lt 2 ]; then
echo "[ERROR] Invalid number of replicates (${nreps})" 1>&2
exit
fi
fi
nreads=$2
if [ ! ${nreads} ]; then
nreads=25000
else
let nreads+=0
if [ ${nreads} -lt 1 ]; then
echo "[ERROR] Invalid number of reads (${nreads})" 1>&2
exit
fi
fi
echo "[WARN] Using ${nreps} biological replicates with ${nreads} reads." 1>&2
rm -f transcriptoma.fa
IFS=$'\n'
for acc in $(cut -f 1 ./ACCS.txt); do
echo "Fetching FASTA for ${acc} ..."
esearch -db nucleotide -query ${acc} | efetch \
-format fasta >> transcriptoma.fa
done
for biogroup in A B; do
for rep in `seq 1 ${nreps}`; do
echo "Generating reads for sample ${biogroup} replicate ${rep} ..."
generate_fragments.py -r transcriptoma.fa \
-a ./abundance_${biogroup}.txt \
-o ./tmp.frags_${biogroup}_${rep} \
-t ${nreads} \
-i 300 \
-s 30
cat ./tmp.frags_${biogroup}_${rep}.1.fasta | renameSeqs.pl \
-if FASTA \
-of FASTA \
-p SAMPLE${biogroup}${rep} \
-w 1000 | \
sed 's/^>\(\S\+\).*/>\1/' \
> ./frags_${biogroup}${rep}.fa
cat ./frags_${biogroup}${rep}.fa | simNGS -a \
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT \
-p paired \
/usr/local/bioinfo/simNGS/data/s_4_0099.runfile \
-n 151 > ./SAMPLE${biogroup}${rep}.fastq 2> SAMPLE${biogroup}${rep}.err.txt
mkdir -p ./raw
deinterleave_pairs SAMPLE${biogroup}${rep}.fastq \
-o ./raw/SAMPLE${biogroup}${rep}_R1.fastq \
./raw/SAMPLE${biogroup}${rep}_R2.fastq
rm -f ./tmp.frags_${biogroup}_${rep}.1.fasta ./frags_${biogroup}${rep}.fa ./SAMPLE${biogroup}${rep}.fastq ./SAMPLE${biogroup}${rep}.err.txt
echo "Number of reads ${biogroup}${rep} R1:" $(echo "$(cat raw/SAMPLE${biogroup}${rep}_R1.fastq | wc -l)/4" | bc)
echo "Number of reads ${biogroup}${rep} R2:" $(echo "$(cat raw/SAMPLE${biogroup}${rep}_R2.fastq | wc -l)/4" | bc)
done
done
```
The script was then run:
```
./sim.sh
```
**Result (the raw directory was created with the .fastq files):**

Total of 65056 reads!
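The read counts reported by the simulation script can be re-checked at any time, together with a test that R1 and R2 of a sample contain the same number of reads. The sketch below assumes the standard four-line FASTQ record layout; `count_fastq_reads` and `check_pairs` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: number of reads = number of lines / 4 (four-line FASTQ records assumed).
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Sketch: print the shared read count, or fail if R1 and R2 differ.
check_pairs() {
    local n1 n2
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    if [ "${n1}" -ne "${n2}" ]; then
        echo "[ERROR] R1 (${n1}) and R2 (${n2}) have different read counts" 1>&2
        return 1
    fi
    echo "${n1}"
}

# Usage:
# check_pairs raw/SAMPLEA1_R1.fastq raw/SAMPLEA1_R2.fastq
```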
#**Alignment with Bowtie2**
1. Indexing the genome:
```
bowtie2-build -f genome.fa genome
```
2. Creating the directory to hold all generated index files (the alignments below expect the index under ./refs/genome):
```
mkdir ./refs
```
3. Aligning the samples (SAMPLEA-SAMPLEB):
```
time bowtie2 -x ./refs/genome -q -1 ./raw/SAMPLEA1_R1.fastq -2 ./raw/SAMPLEA1_R2.fastq -S bowtie2-A1.sam
time bowtie2 -x ./refs/genome -q -1 ./raw/SAMPLEA2_R1.fastq -2 ./raw/SAMPLEA2_R2.fastq -S bowtie2-A2.sam
time bowtie2 -x ./refs/genome -q -1 ./raw/SAMPLEB1_R1.fastq -2 ./raw/SAMPLEB1_R2.fastq -S bowtie2-B1.sam
time bowtie2 -x ./refs/genome -q -1 ./raw/SAMPLEB2_R1.fastq -2 ./raw/SAMPLEB2_R2.fastq -S bowtie2-B2.sam
```
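The four alignment commands differ only in the sample name, so they can be generated by a loop. The sketch below prints the commands instead of running them, so that they can be reviewed first; it reuses exactly the options shown above, and `bowtie2_commands` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print one bowtie2 command per sample (groups A and B, replicates 1 and 2).
bowtie2_commands() {
    local group rep sample
    for group in A B; do
        for rep in 1 2; do
            sample="${group}${rep}"
            echo "bowtie2 -x ./refs/genome -q -1 ./raw/SAMPLE${sample}_R1.fastq -2 ./raw/SAMPLE${sample}_R2.fastq -S bowtie2-${sample}.sam"
        done
    done
}

bowtie2_commands
```

Piping the output to `bash` (for example `bowtie2_commands | bash`) would execute the commands in sequence.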
Conversion of the **.sam** files to sorted, indexed **.bam** files (A1 [named test], A2, B1, B2):
```
samtools view -b bowtie2-B2.sam > bowtie2-B2.bam
samtools sort bowtie2-B2.bam -o bowtie2-B2-sorted.bam
samtools index bowtie2-B2-sorted.bam
samtools view -b bowtie2-test.sam > bowtie2-test.bam
samtools sort bowtie2-test.bam -o bowtie2-test-sorted.bam
samtools index bowtie2-test-sorted.bam
samtools view -b bowtie2-A2.sam > bowtie2-A2.bam
samtools sort bowtie2-A2.bam -o bowtie2-A2-sorted.bam
samtools index bowtie2-A2-sorted.bam
samtools view -b bowtie2-B1.sam > bowtie2-B1.bam
samtools sort bowtie2-B1.bam -o bowtie2-B1-sorted.bam
samtools index bowtie2-B1-sorted.bam
```
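The conversion follows the same three steps for every sample (SAM to BAM, coordinate sort, index), which lends itself to a loop. The sketch below only prints the commands for a list of sample labels; `sam_to_bam_commands` is a hypothetical helper name, and the file naming follows the commands above.

```shell
#!/bin/bash
# Sketch: print the view/sort/index commands for each sample label given as argument.
sam_to_bam_commands() {
    local sample
    for sample in "$@"; do
        echo "samtools view -b bowtie2-${sample}.sam > bowtie2-${sample}.bam"
        echo "samtools sort bowtie2-${sample}.bam -o bowtie2-${sample}-sorted.bam"
        echo "samtools index bowtie2-${sample}-sorted.bam"
    done
}

sam_to_bam_commands test A2 B1 B2
```

As with any generated command list, the output can be inspected and then piped to `bash` to run it.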
Files resulting from the alignment:

**Viewing alignments in IGV**
The .BAM and .BAI files and genome.fa were transferred from the server (accessed through PuTTY) to the local computer (Windows operating system) via WinSCP. The files were then loaded into IGV to view the reads against the genome. An example from one sample follows:


**REFERENCE-BASED TRANSCRIPTOME ASSEMBLY**
#Applying the rnaseq-ref.sh pipeline
```
#!/bin/bash
# ALIGNER: tophat OR star
aligner=${1}
if [ ! ${aligner} ]; then
echo "[ERROR] Missing aligner (tophat or star)." 1>&2
exit
fi
if [ "${aligner}" != "tophat" ] &&
[ "${aligner}" != "star" ]; then
echo "[ERROR] Aligner must be \"tophat\" or \"star\" (${aligner})." 1>&2
exit
fi
indir=${2}
# IF ${indir} IS EMPTY, I.E. ARGUMENT 2 WAS NOT PASSED ON THE COMMAND LINE
if [ ! ${indir} ]; then
echo "[ERROR] Missing input directory." 1>&2
exit
fi
# IF ${indir} IS NOT A DIRECTORY
if [ ! -d ${indir} ]; then
echo "[ERROR] Wrong input directory (${indir})." 1>&2
exit
fi
outdir=${3}
# IF ${outdir} IS EMPTY, I.E. ARGUMENT 3 WAS NOT PASSED ON THE COMMAND LINE
if [ ! ${outdir} ]; then
echo "[ERROR] Missing output directory." 1>&2
exit
fi
# IF ${outdir} IS NOT A DIRECTORY
if [ ! -d ${outdir} ]; then
echo "[ERROR] Wrong output directory (${outdir})." 1>&2
exit
fi
# Number of CORES used for processing
# WARNING: Do not exceed the limit of the machine
THREADS=${4}
if [ ! ${THREADS} ]; then
echo "[ERROR] Missing number of threads." 1>&2
exit
fi
refgtf=${5}
# IF ${refgtf} IS EMPTY, I.E. ARGUMENT 5 WAS NOT PASSED ON THE COMMAND LINE
if [ ! ${refgtf} ]; then
echo "[ERROR] Missing GTF file." 1>&2
exit
fi
if [ ! -e "${refgtf}" ]; then
echo "[ERROR] Not found GTF file (${refgtf})." 1>&2
exit
fi
refseq=${6}
# IF ${refseq} IS EMPTY, I.E. ARGUMENT 6 WAS NOT PASSED ON THE COMMAND LINE
if [ ! ${refseq} ]; then
echo "[ERROR] Missing GENOME fasta file." 1>&2
exit
fi
if [ ! -e "${refseq}" ]; then
echo "Not found GENOME fasta file (${refseq})." 1>&2
exit
fi
# Assembler option: cufflinks or stringtie
assembler=${7}
if [ ! ${assembler} ]; then
echo "[ERROR] Missing assembler (cufflinks or stringtie)." 1>&2
exit
fi
if [ "${assembler}" != "cufflinks" ] &&
[ "${assembler}" != "stringtie" ]; then
echo "[ERROR] Assembler must be \"cufflinks\" or \"stringtie\" (${assembler})." 1>&2
exit
fi
# Contaminants
contaminants=${8}
if [ ! ${contaminants} ]; then
echo "[ERROR] Missing contaminant info. Please set \"NA\" here for execution without contaminants." 1>&2
exit
fi
if [ "${contaminants}" == "NA" ]; then
contaminants=""
fi
./preprocess5.sh "${indir}" "${outdir}" "${THREADS}" ${contaminants}
# gene_info
gene_info=${9}
# taxonomy_id
taxonomy_id=${10}
echo -e "Starting Transcriptome Assembly ..."
# Create the directory structure
curdir=`pwd`
refseq_abs_path=$(readlink -f ${refseq})
if [ "${aligner}" == "tophat" ]; then
mkdir -p ${outdir}/tophat_index
mkdir -p ${outdir}/tophat_out_pe
mkdir -p ${outdir}/tophat_out_se
mkdir -p ${outdir}/tophat_out_final
if [ ! -e "${outdir}/tophat_index/genome.fa" ]; then
cd ${outdir}/tophat_index
ln -s ${refseq_abs_path} genome.fa
cd ${curdir}
fi
if [ ! -e "${outdir}/tophat_index/genome.1.bt2" ]; then
echo -e "Indexing genome with TopHat2 ..."
cd ${outdir}/tophat_index
bowtie2-build --threads ${THREADS} \
genome.fa genome > bowtie2.out.txt 2> bowtie2.err.txt
cd ${curdir}
fi
else
# OTHERWISE THE ALIGNER IS star
mkdir -p ${outdir}/star_index
mkdir -p ${outdir}/star_out_pe
mkdir -p ${outdir}/star_out_se
mkdir -p ${outdir}/star_out_final
if [ ! -e "${outdir}/star_index/genome.fa" ]; then
cd ${outdir}/star_index
ln -s ${refseq_abs_path} genome.fa
cd ${curdir}
fi
if [ ! -e "${outdir}/star_index/SAindex" ]; then
absrefgtf=`readlink -f ${refgtf}`
cd ${outdir}/star_index
echo -e "Indexing genome with STAR ..."
STAR --runThreadN ${THREADS} \
--runMode genomeGenerate \
--genomeFastaFiles genome.fa \
--genomeDir ./ \
--sjdbGTFfile ${absrefgtf} \
--genomeSAindexNbases 12 \
--sjdbOverhang 149 \
> STAR.genomeGenerate.log.out.txt \
2> STAR.genomeGenerate.log.err.txt
cd ${curdir}
fi
fi
if [ "${assembler}" == "cufflinks" ]; then
mkdir -p ${outdir}/${aligner}_cufflinks
mkdir -p ${outdir}/${aligner}_cuffmerge
elif [ "${assembler}" == "stringtie" ]; then
mkdir -p ${outdir}/${aligner}_stringtie
mkdir -p ${outdir}/${aligner}_stringmerge
else
echo "[ERROR] Unexpected error!" 1>&2
exit
fi
mkdir -p ${outdir}/${aligner}_${assembler}_cuffcompare
mkdir -p ${outdir}/${aligner}_${assembler}_cuffquant
mkdir -p ${outdir}/${aligner}_${assembler}_cuffnorm
mkdir -p ${outdir}/${aligner}_${assembler}_cuffdiff
for r1 in `find ${outdir}/ -name '*.prinseq.cleaned_1.fastq'`; do
r2=`echo ${r1} | sed 's/prinseq.cleaned_1.fastq/prinseq.cleaned_2.fastq/'`
if [ ! -e ${r2} ]; then
echo "[ERROR] Not found R2 (${r2})." 1>&2
exit
fi
echo -e "\tFound R1 ($(basename ${r1})) & R2 ($(basename ${r2})) ..."
r1_singletons=`echo ${r1} | sed 's/prinseq.cleaned_1.fastq/prinseq.cleaned_1_singletons.fastq/'`
r2_singletons=`echo ${r2} | sed 's/prinseq.cleaned_2.fastq/prinseq.cleaned_2_singletons.fastq/'`
if [ ! -e ${r1_singletons} ]; then
echo "[ERROR] Not found R1 singletons (${r1_singletons})." 1>&2
exit
fi
if [ ! -e ${r2_singletons} ]; then
echo "[ERROR] Not found R2 singletons (${r2_singletons})." 1>&2
exit
fi
name=`basename ${r1} .fastq | sed 's/.atropos_final.prinseq.cleaned_1//'`
mkdir -p ${outdir}/align_out_final/${name}
if [ "${aligner}" == "tophat" ]; then
if [ ! -e "${outdir}/tophat_out_pe/${name}/accepted_hits.bam" ]; then
echo -e "\tTopHat2 alignment (${name}) paired-end reads X genome ..."
tophat2 --min-anchor 8 \
--min-intron-length 50 \
--max-intron-length 5000 \
--max-multihits 20 \
--transcriptome-max-hits 10 \
--prefilter-multihits \
--num-threads ${THREADS} \
--GTF ${refgtf} \
--transcriptome-index ${outdir}/tophat_index/transcriptome \
--mate-inner-dist 0 \
--mate-std-dev 50 \
--coverage-search \
--microexon-search \
--b2-very-sensitive \
--library-type fr-unstranded \
--output-dir ${outdir}/tophat_out_pe/${name} \
--no-sort-bam \
${outdir}/tophat_index/genome \
${r1} \
${r2} > ${outdir}/tophat_out_pe/${name}.log.out.txt \
2> ${outdir}/tophat_out_pe/${name}.log.err.txt
else
echo -e "\tFound Tophat2 output for PE (${name})..."
fi
if [ ! -e "${outdir}/tophat_out_se/${name}/accepted_hits.bam" ]; then
mkdir -p ${outdir}/tophat_out_se/${name}
cat ${r1_singletons} ${r2_singletons} > ${outdir}/tophat_out_se/${name}/singletons.fastq
if [ -s "${outdir}/tophat_out_se/${name}/singletons.fastq" ]; then
echo -e "\tTopHat2 alignment (${name}) singleton reads X genome ..."
tophat2 --min-anchor 8 \
--min-intron-length 50 \
--max-intron-length 5000 \
--max-multihits 20 \
--transcriptome-max-hits 10 \
--prefilter-multihits \
--num-threads ${THREADS} \
--GTF ${refgtf} \
--transcriptome-index ${outdir}/tophat_index/transcriptome \
--coverage-search \
--microexon-search \
--b2-very-sensitive \
--library-type fr-unstranded \
--output-dir ${outdir}/tophat_out_se/${name} \
--no-sort-bam \
${outdir}/tophat_index/genome \
${outdir}/tophat_out_se/${name}/singletons.fastq \
> ${outdir}/tophat_out_se/${name}.log.out.txt \
2> ${outdir}/tophat_out_se/${name}.log.err.txt
# Consider implementing TopHat-Recondition https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1058-x
fi
else
echo -e "\tFound Tophat2 output for SE (${name})..."
fi
if [ ! -e "${outdir}/tophat_out_final/${name}/accepted_hits.bam" ]; then
mkdir -p ${outdir}/tophat_out_final/${name}
if [ -s "${outdir}/tophat_out_pe/${name}/accepted_hits.bam" ]; then
if [ -s "${outdir}/tophat_out_se/${name}/accepted_hits.bam" ]; then
echo -e "\tMerging TopHat2 results ..."
samtools view -H ${outdir}/tophat_out_pe/${name}/accepted_hits.bam > ${outdir}/tophat_out_final/${name}/Header.txt
samtools merge -n --threads ${THREADS} \
-h ${outdir}/tophat_out_final/${name}/Header.txt \
${outdir}/tophat_out_final/${name}/accepted_hits.bam \
${outdir}/tophat_out_pe/${name}/accepted_hits.bam \
${outdir}/tophat_out_se/${name}/accepted_hits.bam \
> ${outdir}/tophat_out_final/${name}.log.out.txt \
2> ${outdir}/tophat_out_final/${name}.log.err.txt
else
pe_result_abs_path=$(readlink -f ${outdir}/tophat_out_pe/${name}/accepted_hits.bam)
cd ${outdir}/tophat_out_final/${name}/
ln -s ${pe_result_abs_path} accepted_hits.bam
cd ${curdir}
fi
else
if [ -s "${outdir}/tophat_out_se/${name}/accepted_hits.bam" ]; then
se_result_abs_path=$(readlink -f ${outdir}/tophat_out_se/${name}/accepted_hits.bam)
cd ${outdir}/tophat_out_final/${name}/
ln -s ${se_result_abs_path} accepted_hits.bam
cd ${curdir}
else
echo -e "[ERROR] Not found any alignment for PE or SE reads." 1>&2
fi
fi
else
echo -e "\tFound Tophat2 output final (${name})..."
fi
if [ ! -e "${outdir}/tophat_out_final/${name}/Aligned.sorted.bam" ]; then
echo -e "\tSorting alignments (${name})..."
samtools sort --threads ${THREADS} \
-o ${outdir}/tophat_out_final/${name}/Aligned.sorted.bam \
${outdir}/tophat_out_final/${name}/accepted_hits.bam \
> ${outdir}/tophat_out_final/${name}/Aligned.sorted.out.txt \
2> ${outdir}/tophat_out_final/${name}/Aligned.sorted.err.txt
fi
# ALWAYS REMOVE THE SYMBOLIC LINK SO THAT IT IS REPLACED
# WHEN A DIFFERENT ALIGNER IS CHOSEN
#if [ ! -e "${outdir}/align_out_final/${name}/Aligned.out.bam" ]; then
rm -f ${outdir}/align_out_final/${name}/Aligned.out.bam
rm -f ${outdir}/align_out_final/${name}/Aligned.sorted.bam
if [ -e "${outdir}/tophat_out_final/${name}/accepted_hits.bam" ]; then
align_final_out=`readlink -f ${outdir}/tophat_out_final/${name}/accepted_hits.bam`
align_sorted_out=`readlink -f ${outdir}/tophat_out_final/${name}/Aligned.sorted.bam`
cd ${outdir}/align_out_final/${name}
ln -s ${align_final_out} Aligned.out.bam
ln -s ${align_sorted_out} Aligned.sorted.bam
cd ${curdir}
else
echo "[ERROR] Not found Tophat final output (${outdir}/tophat_out_final/${name}/accepted_hits.bam)" 1>&2
exit
fi
#fi
else
# IF NOT tophat, THEN star
if [ ! -e "${outdir}/star_out_pe/${name}/Aligned.out.bam" ]; then
echo -e "\tSTAR alignment (${name}) paired-end reads X genome ..."
mkdir -p ${outdir}/star_out_pe/${name}/
# Running cufflinks requires: --outSAMstrandField intronMotif and --outFilterIntronMotifs RemoveNoncanonical
STAR --runThreadN ${THREADS} \
--genomeDir ${outdir}/star_index/ \
--readFilesIn ${r1} ${r2} \
--outSAMstrandField intronMotif \
--outFilterIntronMotifs RemoveNoncanonical \
--sjdbGTFfile ${refgtf} \
--outFilterMultimapNmax 20 \
--outFileNamePrefix ${outdir}/star_out_pe/${name}/ \
--outSAMtype BAM Unsorted \
--outFilterType BySJout \
--outSJfilterReads Unique \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverReadLmax 0.02 \
--alignIntronMin 50 \
--alignIntronMax 5000 \
--alignMatesGapMax 5000 \
> ${outdir}/star_out_pe/${name}.log.out.txt \
2> ${outdir}/star_out_pe/${name}.log.err.txt
else
echo -e "\tFound STAR output for PE (${name})..."
fi
if [ ! -e "${outdir}/star_out_se/${name}/Aligned.out.bam" ]; then
mkdir -p ${outdir}/star_out_se/${name}
cat ${r1_singletons} ${r2_singletons} > ${outdir}/star_out_se/${name}/singletons.fastq
if [ -s "${outdir}/star_out_se/${name}/singletons.fastq" ]; then
echo -e "\tSTAR alignment (${name}) singleton reads X genome ..."
STAR --runThreadN ${THREADS} \
--genomeDir ${outdir}/star_index/ \
--readFilesIn ${outdir}/star_out_se/${name}/singletons.fastq \
--outSAMstrandField intronMotif \
--outFilterIntronMotifs RemoveNoncanonical \
--sjdbGTFfile ${refgtf} \
--outFilterMultimapNmax 20 \
--outFileNamePrefix ${outdir}/star_out_se/${name}/ \
--outSAMtype BAM Unsorted \
--outFilterType BySJout \
--outSJfilterReads Unique \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverReadLmax 0.02 \
--alignIntronMin 50 \
--alignIntronMax 5000 \
--alignMatesGapMax 5000 \
> ${outdir}/star_out_se/${name}.log.out.txt \
2> ${outdir}/star_out_se/${name}.log.err.txt
fi
else
echo -e "\tFound STAR output for SE (${name})..."
fi
if [ ! -e "${outdir}/star_out_final/${name}/Aligned.out.bam" ]; then
mkdir -p ${outdir}/star_out_final/${name}
if [ -s "${outdir}/star_out_pe/${name}/Aligned.out.bam" ]; then
if [ -s "${outdir}/star_out_se/${name}/Aligned.out.bam" ]; then
echo -e "\tMerging STAR results ..."
samtools view -H ${outdir}/star_out_pe/${name}/Aligned.out.bam > ${outdir}/star_out_final/${name}/Header.txt
samtools merge -n --threads ${THREADS} \
-h ${outdir}/star_out_final/${name}/Header.txt \
${outdir}/star_out_final/${name}/Aligned.out.bam \
${outdir}/star_out_pe/${name}/Aligned.out.bam \
${outdir}/star_out_se/${name}/Aligned.out.bam \
> ${outdir}/star_out_final/${name}.log.out.txt \
2> ${outdir}/star_out_final/${name}.log.err.txt
samtools sort -n --threads ${THREADS} \
${outdir}/star_out_final/${name}/Aligned.out.bam \
-o ${outdir}/star_out_final/${name}/Aligned.named.out.bam
rm -f ${outdir}/star_out_final/${name}/Aligned.out.bam
mv ${outdir}/star_out_final/${name}/Aligned.named.out.bam \
${outdir}/star_out_final/${name}/Aligned.out.bam
else
pe_result_abs_path=$(readlink -f ${outdir}/star_out_pe/${name}/Aligned.out.bam)
cd ${outdir}/star_out_final/${name}/
ln -s ${pe_result_abs_path} Aligned.out.bam
cd ${curdir}
fi
else
if [ -s "${outdir}/star_out_se/${name}/Aligned.out.bam" ]; then
se_result_abs_path=$(readlink -f ${outdir}/star_out_se/${name}/Aligned.out.bam)
cd ${outdir}/star_out_final/${name}/
ln -s ${se_result_abs_path} Aligned.out.bam
cd ${curdir}
else
echo -e "[ERROR] Not found any alignment for PE or SE reads." 1>&2
fi
fi
else
echo -e "\tFound STAR output final (${name})..."
fi
if [ ! -e "${outdir}/star_out_final/${name}/Aligned.sorted.bam" ]; then
echo -e "\tSorting alignments (${name})..."
samtools sort --threads ${THREADS} \
-o ${outdir}/star_out_final/${name}/Aligned.sorted.bam \
${outdir}/star_out_final/${name}/Aligned.out.bam \
> ${outdir}/star_out_final/${name}/Aligned.sorted.out.txt \
2> ${outdir}/star_out_final/${name}/Aligned.sorted.err.txt
fi
# ALWAYS REMOVE THE SYMBOLIC LINK SO THAT IT IS REPLACED
# WHEN A DIFFERENT ALIGNER IS CHOSEN
#if [ ! -e "${outdir}/align_out_final/${name}/Aligned.out.bam" ]; then
rm -f ${outdir}/align_out_final/${name}/Aligned.out.bam
rm -f ${outdir}/align_out_final/${name}/Aligned.sorted.bam
if [ -e "${outdir}/star_out_final/${name}/Aligned.out.bam" ]; then
align_final_out=`readlink -f ${outdir}/star_out_final/${name}/Aligned.out.bam`
align_sorted_out=`readlink -f ${outdir}/star_out_final/${name}/Aligned.sorted.bam`
cd ${outdir}/align_out_final/${name}
ln -s ${align_final_out} Aligned.out.bam
ln -s ${align_sorted_out} Aligned.sorted.bam
cd ${curdir}
else
echo "[ERROR] Not found STAR final output (${outdir}/star_out_final/${name}/Aligned.out.bam)" 1>&2
exit
fi
#fi
fi
mkdir -p ${outdir}/align_out_info/
if [ -e "${outdir}/align_out_final/${name}/Aligned.out.bam" ]; then
if [ ! -e "${outdir}/align_out_info/${name}.${aligner}.out.txt" ]; then
echo -e "\tGet alignment information (${name}) [${aligner}] ..."
SAM_nameSorted_to_uniq_count_stats.pl ${outdir}/align_out_final/${name}/Aligned.out.bam > ${outdir}/align_out_info/${name}.${aligner}.out.txt 2> ${outdir}/align_out_info/${name}.${aligner}.err.txt
fi
mkdir -p ${outdir}/${aligner}_cufflinks/${name}
if [ "${assembler}" == "cufflinks" ]; then
if [ ! -e "${outdir}/${aligner}_cufflinks/${name}/transcripts.gtf" ]; then
echo -e "\tAssembly transcriptome (${name}) [cufflinks]\n"
cufflinks --output-dir ${outdir}/${aligner}_cufflinks/${name} \
--num-threads ${THREADS} \
--GTF-guide ${refgtf} \
--frag-bias-correct ${refseq} \
--multi-read-correct \
--library-type fr-unstranded \
--frag-len-mean 300 \
--frag-len-std-dev 30 \
--total-hits-norm \
--min-isoform-fraction 0.25 \
--pre-mrna-fraction 0.15 \
--min-frags-per-transfrag 10 \
--junc-alpha 0.001 \
--small-anchor-fraction 0.08 \
--overhang-tolerance 8 \
--min-intron-length 50 \
--max-intron-length 5000 \
--trim-3-avgcov-thresh 5 \
--trim-3-dropoff-frac 0.1 \
--max-multiread-fraction 0.75 \
--overlap-radius 25 \
--3-overhang-tolerance 600 \
--intron-overhang-tolerance 50 \
${outdir}/align_out_final/${name}/Aligned.sorted.bam \
> ${outdir}/${aligner}_cufflinks/${name}/cufflinks.out.txt \
2> ${outdir}/${aligner}_cufflinks/${name}/cufflinks.err.txt
fi
else
if [ ! -e "${outdir}/${aligner}_stringtie/${name}/transcripts.gtf" ]; then
mkdir -p ${outdir}/${aligner}_stringtie/${name}
echo -e "\tAssembly transcriptome (${name}) [stringtie]\n"
stringtie ${outdir}/align_out_final/${name}/Aligned.sorted.bam \
-G ${refgtf} \
-f 0.25 \
-m 200 \
-o ${outdir}/${aligner}_stringtie/${name}/transcripts.gtf \
-a 8 \
-j 2 \
-c 4 \
-v \
-g 25 \
-C ${outdir}/${aligner}_stringtie/${name}/coverages.txt \
-M 0.9 \
-p ${THREADS} \
-A ${outdir}/${aligner}_stringtie/${name}/abundances.txt \
-B \
> ${outdir}/${aligner}_stringtie/${name}/stringtie.out.txt \
2> ${outdir}/${aligner}_stringtie/${name}/stringtie.err.txt
fi
fi
else
echo -e "[ERROR] Not found alignment data (${outdir}/align_out_final/${name}/Aligned.out.bam)" 1>&2
fi
done
rm -f ${outdir}/assembly_GTF_list.txt
for transc in `find ${outdir}/${aligner}_${assembler} -name transcripts.gtf`; do
#echo -e "\tProcessing transcriptome ${transc} ..."
echo ${transc} >> ${outdir}/assembly_GTF_list.txt
done
transcriptomeref=""
if [ "${assembler}" == "cufflinks" ]; then
if [ ! -e "${outdir}/${aligner}_cuffmerge/merged.gtf" ]; then
echo -e "\tMerging transcriptomes (${outdir}/assembly_GTF_list.txt) in a transcriptome reference [cuffmerge]\n"
cuffmerge -o ${outdir}/${aligner}_cuffmerge \
--ref-gtf ${refgtf} \
--ref-sequence ${refseq} \
--min-isoform-fraction 0.25 \
--num-threads ${THREADS} \
${outdir}/assembly_GTF_list.txt \
> ${outdir}/${aligner}_cuffmerge/cuffmerge.out.txt \
2> ${outdir}/${aligner}_cuffmerge/cuffmerge.err.txt
fi
transcriptomeref="${outdir}/${aligner}_cuffmerge/merged.gtf"
else
if [ ! -e "${outdir}/${aligner}_stringmerge/merged.gtf" ]; then
echo -e "\tMerging transcriptomes (${outdir}/assembly_GTF_list.txt) in a transcriptome reference [stringtie]\n"
stringtie --merge \
-G ${refgtf} \
-o ${outdir}/${aligner}_stringmerge/merged.gtf \
-m 200 \
-c 4 \
-F 4 \
-T 4 \
-f 0.25 \
-g 100 \
${outdir}/assembly_GTF_list.txt \
> ${outdir}/${aligner}_stringmerge/stringmerge.out.txt \
2> ${outdir}/${aligner}_stringmerge/stringmerge.err.txt
fi
transcriptomeref="${outdir}/${aligner}_stringmerge/merged.gtf"
fi
if [ ! -e "${outdir}/${aligner}_${assembler}_cuffcompare/cuffcmp.combined.gtf" ]; then
echo -e "\tRunning cuffcompare with ${aligner} & ${assembler} transcriptome reference (${transcriptomeref})..."
cuffcompare -r ${refgtf} \
-s ${refseq} \
-o ${outdir}/${aligner}_${assembler}_cuffcompare/cuffcmp \
${transcriptomeref} \
> ${outdir}/${aligner}_${assembler}_cuffcompare/cuffcmp.out.txt \
2> ${outdir}/${aligner}_${assembler}_cuffcompare/cuffcmp.err.txt
# ANNOTATION OF THE "TCONS" TRANSCRIPTS
perl -F"\t" -lane 'INIT { print join("\t","transcript_id","nearest_ref","class_code"); } my ($transcript_id)=$F[8]=~/transcript_id \"([^\"]+)\"/; my ($nearest_ref)=$F[8]=~/nearest_ref \"([^\"]+)\"/; $nearest_ref=~s/^rna-//; $nearest_ref=~s/_[1-9]+$//; $nearest_ref=~s/^rna_gene-//; my ($class_code)=$F[8]=~/class_code \"([^\"]+)\"/; print $transcript_id,"\t",$nearest_ref||"","\t",$class_code||"";' ${outdir}/${aligner}_${assembler}_cuffcompare/cuffcmp.combined.gtf | \
awk 'NR == 1; NR > 1 {print $0 | "sort -u"}' > ${outdir}/${aligner}_${assembler}_cuffcompare/TCONS.nearest_ref.txt
fi
transcriptomeref="${outdir}/${aligner}_${assembler}_cuffcompare/cuffcmp.combined.gtf"
# NON-REDUNDANT LIST OF VALUES (BIOLOGICAL GROUP NAMES)
# E.g.: (CONTROL TEST)
biogroup_label=()
for bamfile in `find ${outdir}/align_out_final -name Aligned.sorted.bam`; do
name=`basename $(dirname ${bamfile})`
if [ ! -e "${outdir}/${aligner}_${assembler}_cuffquant/${name}/abundances.cxb" ]; then
echo -e "\tRunning cuffquant using sample ${name} as using ${aligner} & ${assembler} (${transcriptomeref}) ..."
mkdir -p ${outdir}/${aligner}_${assembler}_cuffquant/${name}
cuffquant --output-dir ${outdir}/${aligner}_${assembler}_cuffquant/${name} \
--frag-bias-correct ${refseq} \
--multi-read-correct \
--num-threads ${THREADS} \
--library-type fr-unstranded \
--frag-len-mean 300 \
--frag-len-std-dev 30 \
--max-bundle-frags 9999999 \
--max-frag-multihits 20 \
${transcriptomeref} \
${bamfile} \
> ${outdir}/${aligner}_${assembler}_cuffquant/${name}/cuffquant.log.out.txt \
2> ${outdir}/${aligner}_${assembler}_cuffquant/${name}/cuffquant.log.err.txt
fi
groupname=`echo ${name} | sed 's/[_\.\#\-]\?[0-9]\+$//'`
biogroup_label=($(printf "%s\n" ${biogroup_label[@]} ${groupname} | sort -u ))
done
biogroup_files=()
echo -e "\tCollecting Expression Data from cuffquant output (*.cxb) ..."
for label in ${biogroup_label[@]}; do
echo -e "\t\tCollecting .cxb files for ${label} ..."
group=()
for cxbfile in `ls ${outdir}/${aligner}_${assembler}_cuffquant/${label}*/abundances.cxb`; do
echo -e "\t\t\tFound ${cxbfile}"
group=(${group[@]} "${cxbfile}")
done
biogroup_files=(${biogroup_files[@]} $(IFS=, ; echo "${group[*]}") )
done
echo -e "Starting Gene Expression Analysis ..."
echo -e "\t\tLabels.: " $(IFS=, ; echo "${biogroup_label[*]}")
echo -e "\t\tFiles..: " ${biogroup_files[*]}
if [ ! -e "${outdir}/${aligner}_${assembler}_cuffnorm/isoforms.count_table" ]; then
echo -e "\t\t\tGenerating abundance matrices (cuffnorm) ..."
cuffnorm --output-dir ${outdir}/${aligner}_${assembler}_cuffnorm \
--labels $(IFS=, ; echo "${biogroup_label[*]}") \
--num-threads ${THREADS} \
--library-type fr-unstranded \
--library-norm-method geometric \
--output-format simple-table \
${transcriptomeref} \
${biogroup_files[*]} \
> ${outdir}/${aligner}_${assembler}_cuffnorm/cuffnorm.log.out.txt \
2> ${outdir}/${aligner}_${assembler}_cuffnorm/cuffnorm.log.err.txt
fi
if [ ! -e "${outdir}/${aligner}_${assembler}_cuffnorm/isoforms.raw_count_table.txt" ]; then
de-normalize-cuffnorm.R --in=${outdir}/${aligner}_${assembler}_cuffnorm/isoforms.count_table \
--st=${outdir}/${aligner}_${assembler}_cuffnorm/samples.table \
--out=${outdir}/${aligner}_${assembler}_cuffnorm/isoforms.raw_count_table.txt \
> ${outdir}/${aligner}_${assembler}_cuffnorm/de-normalize-cuffnorm.isoforms.out.txt \
2> ${outdir}/${aligner}_${assembler}_cuffnorm/de-normalize-cuffnorm.isoforms.err.txt
fi
if [ ! -e "${outdir}/${aligner}_${assembler}_cuffnorm/genes.raw_count_table.txt" ]; then
de-normalize-cuffnorm.R --in=${outdir}/${aligner}_${assembler}_cuffnorm/genes.count_table \
--st=${outdir}/${aligner}_${assembler}_cuffnorm/samples.table \
--out=${outdir}/${aligner}_${assembler}_cuffnorm/genes.raw_count_table.txt \
> ${outdir}/${aligner}_${assembler}_cuffnorm/de-normalize-cuffnorm.genes.out.txt \
2> ${outdir}/${aligner}_${assembler}_cuffnorm/de-normalize-cuffnorm.genes.err.txt
fi
if [ ! -e "${outdir}/${aligner}_${assembler}_cuffdiff/isoform_exp.diff" ]; then
echo -e "\t\t\tAnalysing differential expression (cuffdiff) ..."
cuffdiff --output-dir ${outdir}/${aligner}_${assembler}_cuffdiff \
--labels $(IFS=, ; echo "${biogroup_label[*]}") \
--frag-bias-correct ${refseq} \
--multi-read-correct \
--num-threads ${THREADS} \
--library-type fr-unstranded \
--frag-len-mean 300 \
--frag-len-std-dev 30 \
--max-bundle-frags 9999999 \
--max-frag-multihits 20 \
--total-hits-norm \
--min-reps-for-js-test 2 \
--library-norm-method geometric \
--dispersion-method per-condition \
--min-alignment-count 10 \
${transcriptomeref} \
${biogroup_files[*]} \
> ${outdir}/${aligner}_${assembler}_cuffdiff/cuffdiff.log.out.txt \
2> ${outdir}/${aligner}_${assembler}_cuffdiff/cuffdiff.log.err.txt
fi
if [ ! -e "${outdir}/${aligner}_${assembler}_cuffdiff/isoform_exp.diff.annot.txt" ]; then
echo "Annotating isoform_exp.diff ..."
mergeR.R --x=${outdir}/${aligner}_${assembler}_cuffdiff/isoform_exp.diff \
--by.x="test_id" \
--y=${outdir}/${aligner}_${assembler}_cuffcompare/TCONS.nearest_ref.txt \
--by.y="transcript_id" \
--all.x \
--print.out.label \
--out=${outdir}/${aligner}_${assembler}_cuffdiff/isoform_exp.diff.annot.txt \
> ${outdir}/${aligner}_${assembler}_cuffdiff/mergeR.isoform.log.out.txt \
2> ${outdir}/${aligner}_${assembler}_cuffdiff/mergeR.isoform.log.err.txt
fi
if [ ! -e "${outdir}/${aligner}_${assembler}_cuffdiff/gene_exp.diff.annot.txt" ]; then
if [ ${gene_info} ] && [ ${taxonomy_id} ]; then
if [ ! -e ${gene_info} ]; then
echo "[ERROR] Wrong gene_info file (${gene_info})." 1>&2
exit
fi
if [[ ! ${taxonomy_id} =~ ^[0-9]+$ ]]; then
echo "[ERROR] Wrong taxonomy_id (${taxonomy_id})." 1>&2
exit
fi
if [ ! -e "${outdir}/gene_info.${taxonomy_id}.txt" ]; then
echo "Getting gene_info (${gene_info}) for taxonomy_id = ${taxonomy_id} ..."
cat ${gene_info} | perl -F"\t" -slane 'INIT { $taxonomy_id+=0; } if ($.==1) { print $_; } elsif ($F[0]==$taxonomy_id) { print $_; } ' -- -taxonomy_id=${taxonomy_id} | cut -f 2,3,5,9 > ${outdir}/gene_info.${taxonomy_id}.txt
fi
echo "Annotating gene_exp.diff ..."
splitteR.R --x="${outdir}/${aligner}_${assembler}_cuffdiff/gene_exp.diff" \
--col.x="gene" \
--by.x="," \
--out="${outdir}/${aligner}_${assembler}_cuffdiff/gene_exp.diff.spplitted.txt" \
> ${outdir}/${aligner}_${assembler}_cuffdiff/splitteR.gene.log.out.txt \
2> ${outdir}/${aligner}_${assembler}_cuffdiff/splitteR.gene.log.err.txt
mergeR.R --x=${outdir}/${aligner}_${assembler}_cuffdiff/gene_exp.diff.spplitted.txt \
--by.x="gene" \
--y=${outdir}/gene_info.${taxonomy_id}.txt \
--by.y="GeneID" \
--all.x \
--print.out.label \
--out=${outdir}/${aligner}_${assembler}_cuffdiff/gene_exp.diff.annot.txt \
> ${outdir}/${aligner}_${assembler}_cuffdiff/mergeR.gene.log.out.txt \
2> ${outdir}/${aligner}_${assembler}_cuffdiff/mergeR.gene.log.err.txt
fi
fi
```
This script runs the alignment commands for TopHat or STAR, producing the aligned reads as .bam files. It also runs the pre- and post-processing of the samples: adapter trimming with Atropos; contaminant screening; genome reduction; transcript assembly with Cufflinks; merging of the per-sample assemblies with Cuffmerge; and, finally, quantification against the new reference with Cuffquant.
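The gene annotation step near the end of the script keeps the header line and the rows of one taxonomy ID from the gene_info file, then selects four columns. The sketch below reproduces that filter with awk instead of perl, so it is an illustration and not the pipeline's own code. The function name `filter_gene_info` and the toy rows are assumptions, and the column names assume the usual NCBI gene_info layout (tax_id in column 1, GeneID in column 2).

```bash
#!/bin/bash
# Sketch: keep the header plus the rows of one taxonomy ID from a
# gene_info-style table and print columns 2, 3, 5 and 9
# (the same columns selected by "cut -f 2,3,5,9" in the pipeline).
filter_gene_info() {
    local taxid=$1 infile=$2
    # Accept only a value made entirely of digits (anchored regex)
    if [[ ! ${taxid} =~ ^[0-9]+$ ]]; then
        echo "[ERROR] Wrong taxonomy_id (${taxid})." 1>&2
        return 1
    fi
    awk -F'\t' -v OFS='\t' -v id="${taxid}" \
        'NR == 1 || $1 == id { print $2, $3, $5, $9 }' "${infile}"
}

# Toy input with invented rows (not real gene_info records)
demo=$(mktemp)
printf '#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\tdbXrefs\tchromosome\tmap_location\tdescription\n' > "${demo}"
printf '29760\t111\tgeneA\t-\t-\t-\t1\t-\ttoy description A\n' >> "${demo}"
printf '3702\t222\tgeneB\t-\t-\t-\t2\t-\ttoy description B\n' >> "${demo}"
filter_gene_info 29760 "${demo}"
rm -f "${demo}"
```

With the taxonomy ID used in this report (29760), the sketch prints the header and only the first toy row. A value such as `abc1` is rejected because the regular expression is anchored at both ends.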
Command to run the pipeline:
```
./rnaseq-ref.sh tophat raw/ rnaseq-ref_out/ 12 raw/genome.gtf raw/genome.fa cufflinks NA
```
Directories produced by the pipeline

All generated files (tree):
```
.
|-- align_out_final
| |-- SAMPLEA1
| | |-- Aligned.out.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEA1/accepted_hits.bam
| | `-- Aligned.sorted.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEA1/Aligned.sorted.bam
| |-- SAMPLEA2
| | |-- Aligned.out.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEA2/accepted_hits.bam
| | `-- Aligned.sorted.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEA2/Aligned.sorted.bam
| |-- SAMPLEB1
| | |-- Aligned.out.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEB1/accepted_hits.bam
| | `-- Aligned.sorted.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEB1/Aligned.sorted.bam
| `-- SAMPLEB2
| |-- Aligned.out.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEB2/accepted_hits.bam
| `-- Aligned.sorted.bam -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/tophat_out_final/SAMPLEB2/Aligned.sorted.bam
|-- align_out_info
| |-- SAMPLEA1.tophat.err.txt
| |-- SAMPLEA1.tophat.out.txt
| |-- SAMPLEA2.tophat.err.txt
| |-- SAMPLEA2.tophat.out.txt
| |-- SAMPLEB1.tophat.err.txt
| |-- SAMPLEB1.tophat.out.txt
| |-- SAMPLEB2.tophat.err.txt
| `-- SAMPLEB2.tophat.out.txt
|-- assembly_GTF_list.txt
|-- processed
| |-- atropos
| | |-- SAMPLEA1.atropos_adapter.log.err.txt
| | |-- SAMPLEA1.atropos_adapter.log.out.txt
| | |-- SAMPLEA1.atropos.log.err.txt
| | |-- SAMPLEA1.atropos.log.out.txt
| | |-- SAMPLEA1_R1.atropos_final.fastq
| | |-- SAMPLEA1_R2.atropos_final.fastq
| | |-- SAMPLEA2.atropos_adapter.log.err.txt
| | |-- SAMPLEA2.atropos_adapter.log.out.txt
| | |-- SAMPLEA2.atropos.log.err.txt
| | |-- SAMPLEA2.atropos.log.out.txt
| | |-- SAMPLEA2_R1.atropos_final.fastq
| | |-- SAMPLEA2_R2.atropos_final.fastq
| | |-- SAMPLEB1.atropos_adapter.log.err.txt
| | |-- SAMPLEB1.atropos_adapter.log.out.txt
| | |-- SAMPLEB1.atropos.log.err.txt
| | |-- SAMPLEB1.atropos.log.out.txt
| | |-- SAMPLEB1_R1.atropos_final.fastq
| | |-- SAMPLEB1_R2.atropos_final.fastq
| | |-- SAMPLEB2.atropos_adapter.log.err.txt
| | |-- SAMPLEB2.atropos_adapter.log.out.txt
| | |-- SAMPLEB2.atropos.log.err.txt
| | |-- SAMPLEB2.atropos.log.out.txt
| | |-- SAMPLEB2_R1.atropos_final.fastq
| | `-- SAMPLEB2_R2.atropos_final.fastq
| |-- cleaned
| | |-- SAMPLEA1.atropos_final.prinseq.cleaned_1.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA1.atropos_final.prinseq_1.fastq
| | |-- SAMPLEA1.atropos_final.prinseq.cleaned_1_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA1.atropos_final.prinseq_1_singletons.fastq
| | |-- SAMPLEA1.atropos_final.prinseq.cleaned_2.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA1.atropos_final.prinseq_2.fastq
| | |-- SAMPLEA1.atropos_final.prinseq.cleaned_2_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA1.atropos_final.prinseq_2_singletons.fastq
| | |-- SAMPLEA2.atropos_final.prinseq.cleaned_1.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA2.atropos_final.prinseq_1.fastq
| | |-- SAMPLEA2.atropos_final.prinseq.cleaned_1_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA2.atropos_final.prinseq_1_singletons.fastq
| | |-- SAMPLEA2.atropos_final.prinseq.cleaned_2.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA2.atropos_final.prinseq_2.fastq
| | |-- SAMPLEA2.atropos_final.prinseq.cleaned_2_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEA2.atropos_final.prinseq_2_singletons.fastq
| | |-- SAMPLEB1.atropos_final.prinseq.cleaned_1.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB1.atropos_final.prinseq_1.fastq
| | |-- SAMPLEB1.atropos_final.prinseq.cleaned_1_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB1.atropos_final.prinseq_1_singletons.fastq
| | |-- SAMPLEB1.atropos_final.prinseq.cleaned_2.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB1.atropos_final.prinseq_2.fastq
| | |-- SAMPLEB1.atropos_final.prinseq.cleaned_2_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB1.atropos_final.prinseq_2_singletons.fastq
| | |-- SAMPLEB2.atropos_final.prinseq.cleaned_1.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB2.atropos_final.prinseq_1.fastq
| | |-- SAMPLEB2.atropos_final.prinseq.cleaned_1_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB2.atropos_final.prinseq_1_singletons.fastq
| | |-- SAMPLEB2.atropos_final.prinseq.cleaned_2.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB2.atropos_final.prinseq_2.fastq
| | `-- SAMPLEB2.atropos_final.prinseq.cleaned_2_singletons.fastq -> /state/partition1/rgmatos/Vvinifera/Refs/rnaseq-ref_out/processed/prinseq/SAMPLEB2.atropos_final.prinseq_2_singletons.fastq
| |-- fastqc
| | |-- pos
| | | |-- SAMPLEA1.atropos_final.prinseq_1_fastqc.html
| | | |-- SAMPLEA1.atropos_final.prinseq_1_fastqc.zip
| | | |-- SAMPLEA1.atropos_final.prinseq_1.log.err.txt
| | | |-- SAMPLEA1.atropos_final.prinseq_1.log.out.txt
| | | |-- SAMPLEA1.atropos_final.prinseq_1_singletons_fastqc.html
| | | |-- SAMPLEA1.atropos_final.prinseq_1_singletons_fastqc.zip
| | | |-- SAMPLEA1.atropos_final.prinseq_1_singletons.log.err.txt
| | | |-- SAMPLEA1.atropos_final.prinseq_1_singletons.log.out.txt
| | | |-- SAMPLEA1.atropos_final.prinseq_2_fastqc.html
| | | |-- SAMPLEA1.atropos_final.prinseq_2_fastqc.zip
| | | |-- SAMPLEA1.atropos_final.prinseq_2.log.err.txt
| | | |-- SAMPLEA1.atropos_final.prinseq_2.log.out.txt
| | | |-- SAMPLEA2.atropos_final.prinseq_1_fastqc.html
| | | |-- SAMPLEA2.atropos_final.prinseq_1_fastqc.zip
| | | |-- SAMPLEA2.atropos_final.prinseq_1.log.err.txt
| | | |-- SAMPLEA2.atropos_final.prinseq_1.log.out.txt
| | | |-- SAMPLEA2.atropos_final.prinseq_1_singletons_fastqc.html
| | | |-- SAMPLEA2.atropos_final.prinseq_1_singletons_fastqc.zip
| | | |-- SAMPLEA2.atropos_final.prinseq_1_singletons.log.err.txt
| | | |-- SAMPLEA2.atropos_final.prinseq_1_singletons.log.out.txt
| | | |-- SAMPLEA2.atropos_final.prinseq_2_fastqc.html
| | | |-- SAMPLEA2.atropos_final.prinseq_2_fastqc.zip
| | | |-- SAMPLEA2.atropos_final.prinseq_2.log.err.txt
| | | |-- SAMPLEA2.atropos_final.prinseq_2.log.out.txt
| | | |-- SAMPLEB1.atropos_final.prinseq_1_fastqc.html
| | | |-- SAMPLEB1.atropos_final.prinseq_1_fastqc.zip
| | | |-- SAMPLEB1.atropos_final.prinseq_1.log.err.txt
| | | |-- SAMPLEB1.atropos_final.prinseq_1.log.out.txt
| | | |-- SAMPLEB1.atropos_final.prinseq_1_singletons_fastqc.html
| | | |-- SAMPLEB1.atropos_final.prinseq_1_singletons_fastqc.zip
| | | |-- SAMPLEB1.atropos_final.prinseq_1_singletons.log.err.txt
| | | |-- SAMPLEB1.atropos_final.prinseq_1_singletons.log.out.txt
| | | |-- SAMPLEB1.atropos_final.prinseq_2_fastqc.html
| | | |-- SAMPLEB1.atropos_final.prinseq_2_fastqc.zip
| | | |-- SAMPLEB1.atropos_final.prinseq_2.log.err.txt
| | | |-- SAMPLEB1.atropos_final.prinseq_2.log.out.txt
| | | |-- SAMPLEB2.atropos_final.prinseq_1_fastqc.html
| | | |-- SAMPLEB2.atropos_final.prinseq_1_fastqc.zip
| | | |-- SAMPLEB2.atropos_final.prinseq_1.log.err.txt
| | | |-- SAMPLEB2.atropos_final.prinseq_1.log.out.txt
| | | |-- SAMPLEB2.atropos_final.prinseq_1_singletons_fastqc.html
| | | |-- SAMPLEB2.atropos_final.prinseq_1_singletons_fastqc.zip
| | | |-- SAMPLEB2.atropos_final.prinseq_1_singletons.log.err.txt
| | | |-- SAMPLEB2.atropos_final.prinseq_1_singletons.log.out.txt
| | | |-- SAMPLEB2.atropos_final.prinseq_2_fastqc.html
| | | |-- SAMPLEB2.atropos_final.prinseq_2_fastqc.zip
| | | |-- SAMPLEB2.atropos_final.prinseq_2.log.err.txt
| | | `-- SAMPLEB2.atropos_final.prinseq_2.log.out.txt
| | `-- pre
| | |-- SAMPLEA1_R1_fastqc.html
| | |-- SAMPLEA1_R1_fastqc.zip
| | |-- SAMPLEA1_R1.log.err.txt
| | |-- SAMPLEA1_R1.log.out.txt
| | |-- SAMPLEA1_R2_fastqc.html
| | |-- SAMPLEA1_R2_fastqc.zip
| | |-- SAMPLEA1_R2.log.err.txt
| | |-- SAMPLEA1_R2.log.out.txt
| | |-- SAMPLEA2_R1_fastqc.html
| | |-- SAMPLEA2_R1_fastqc.zip
| | |-- SAMPLEA2_R1.log.err.txt
| | |-- SAMPLEA2_R1.log.out.txt
| | |-- SAMPLEA2_R2_fastqc.html
| | |-- SAMPLEA2_R2_fastqc.zip
| | |-- SAMPLEA2_R2.log.err.txt
| | |-- SAMPLEA2_R2.log.out.txt
| | |-- SAMPLEB1_R1_fastqc.html
| | |-- SAMPLEB1_R1_fastqc.zip
| | |-- SAMPLEB1_R1.log.err.txt
| | |-- SAMPLEB1_R1.log.out.txt
| | |-- SAMPLEB1_R2_fastqc.html
| | |-- SAMPLEB1_R2_fastqc.zip
| | |-- SAMPLEB1_R2.log.err.txt
| | |-- SAMPLEB1_R2.log.out.txt
| | |-- SAMPLEB2_R1_fastqc.html
| | |-- SAMPLEB2_R1_fastqc.zip
| | |-- SAMPLEB2_R1.log.err.txt
| | |-- SAMPLEB2_R1.log.out.txt
| | |-- SAMPLEB2_R2_fastqc.html
| | |-- SAMPLEB2_R2_fastqc.zip
| | |-- SAMPLEB2_R2.log.err.txt
| | `-- SAMPLEB2_R2.log.out.txt
| `-- prinseq
| |-- SAMPLEA1.atropos_final.prinseq_1.fastq
| |-- SAMPLEA1.atropos_final.prinseq_1_singletons.fastq
| |-- SAMPLEA1.atropos_final.prinseq_2.fastq
| |-- SAMPLEA1.atropos_final.prinseq_2_singletons.fastq
| |-- SAMPLEA1.atropos_final.prinseq.err.log
| |-- SAMPLEA1.atropos_final.prinseq.out.log
| |-- SAMPLEA1.atropos_final.prinseq_singletons.fastq
| |-- SAMPLEA2.atropos_final.prinseq_1.fastq
| |-- SAMPLEA2.atropos_final.prinseq_1_singletons.fastq
| |-- SAMPLEA2.atropos_final.prinseq_2.fastq
| |-- SAMPLEA2.atropos_final.prinseq_2_singletons.fastq
| |-- SAMPLEA2.atropos_final.prinseq.err.log
| |-- SAMPLEA2.atropos_final.prinseq.out.log
| |-- SAMPLEA2.atropos_final.prinseq_singletons.fastq
| |-- SAMPLEB1.atropos_final.prinseq_1.fastq
| |-- SAMPLEB1.atropos_final.prinseq_1_singletons.fastq
| |-- SAMPLEB1.atropos_final.prinseq_2.fastq
| |-- SAMPLEB1.atropos_final.prinseq_2_singletons.fastq
| |-- SAMPLEB1.atropos_final.prinseq.err.log
| |-- SAMPLEB1.atropos_final.prinseq.out.log
| |-- SAMPLEB1.atropos_final.prinseq_singletons.fastq
| |-- SAMPLEB2.atropos_final.prinseq_1.fastq
| |-- SAMPLEB2.atropos_final.prinseq_1_singletons.fastq
| |-- SAMPLEB2.atropos_final.prinseq_2.fastq
| |-- SAMPLEB2.atropos_final.prinseq_2_singletons.fastq
| |-- SAMPLEB2.atropos_final.prinseq.err.log
| |-- SAMPLEB2.atropos_final.prinseq.out.log
| `-- SAMPLEB2.atropos_final.prinseq_singletons.fastq
|-- tophat_cufflinks
| |-- SAMPLEA1
| | |-- cufflinks.err.txt
| | |-- cufflinks.out.txt
| | |-- genes.fpkm_tracking
| | |-- isoforms.fpkm_tracking
| | |-- skipped.gtf
| | `-- transcripts.gtf
| |-- SAMPLEA2
| | |-- cufflinks.err.txt
| | |-- cufflinks.out.txt
| | |-- genes.fpkm_tracking
| | |-- isoforms.fpkm_tracking
| | |-- skipped.gtf
| | `-- transcripts.gtf
| |-- SAMPLEB1
| | |-- cufflinks.err.txt
| | |-- cufflinks.out.txt
| | |-- genes.fpkm_tracking
| | |-- isoforms.fpkm_tracking
| | |-- skipped.gtf
| | `-- transcripts.gtf
| `-- SAMPLEB2
| |-- cufflinks.err.txt
| |-- cufflinks.out.txt
| |-- genes.fpkm_tracking
| |-- isoforms.fpkm_tracking
| |-- skipped.gtf
| `-- transcripts.gtf
|-- tophat_cufflinks_cuffcompare
| |-- cuffcmp.combined.gtf
| |-- cuffcmp.err.txt
| |-- cuffcmp.loci
| |-- cuffcmp.out.txt
| |-- cuffcmp.stats
| |-- cuffcmp.tracking
| `-- TCONS.nearest_ref.txt
|-- tophat_cufflinks_cuffdiff
| |-- bias_params.info
| |-- cds.count_tracking
| |-- cds.diff
| |-- cds_exp.diff
| |-- cds.fpkm_tracking
| |-- cds.read_group_tracking
| |-- cuffdiff.log.err.txt
| |-- cuffdiff.log.out.txt
| |-- gene_exp.diff
| |-- genes.count_tracking
| |-- genes.fpkm_tracking
| |-- genes.read_group_tracking
| |-- isoform_exp.diff
| |-- isoform_exp.diff.annot.txt
| |-- isoforms.count_tracking
| |-- isoforms.fpkm_tracking
| |-- isoforms.read_group_tracking
| |-- mergeR.isoform.log.err.txt
| |-- mergeR.isoform.log.out.txt
| |-- promoters.diff
| |-- read_groups.info
| |-- run.info
| |-- splicing.diff
| |-- tss_group_exp.diff
| |-- tss_groups.count_tracking
| |-- tss_groups.fpkm_tracking
| |-- tss_groups.read_group_tracking
| `-- var_model.info
|-- tophat_cufflinks_cuffnorm
| |-- cds.attr_table
| |-- cds.count_table
| |-- cds.fpkm_table
| |-- cuffnorm.log.err.txt
| |-- cuffnorm.log.out.txt
| |-- de-normalize-cuffnorm.genes.err.txt
| |-- de-normalize-cuffnorm.genes.out.txt
| |-- de-normalize-cuffnorm.isoforms.err.txt
| |-- de-normalize-cuffnorm.isoforms.out.txt
| |-- genes.attr_table
| |-- genes.count_table
| |-- genes.fpkm_table
| |-- genes.raw_count_table.txt
| |-- isoforms.attr_table
| |-- isoforms.count_table
| |-- isoforms.fpkm_table
| |-- isoforms.raw_count_table.txt
| |-- run.info
| |-- samples.table
| |-- tss_groups.attr_table
| |-- tss_groups.count_table
| `-- tss_groups.fpkm_table
|-- tophat_cufflinks_cuffquant
| |-- SAMPLEA1
| | |-- abundances.cxb
| | |-- cuffquant.log.err.txt
| | `-- cuffquant.log.out.txt
| |-- SAMPLEA2
| | |-- abundances.cxb
| | |-- cuffquant.log.err.txt
| | `-- cuffquant.log.out.txt
| |-- SAMPLEB1
| | |-- abundances.cxb
| | |-- cuffquant.log.err.txt
| | `-- cuffquant.log.out.txt
| `-- SAMPLEB2
| |-- abundances.cxb
| |-- cuffquant.log.err.txt
| `-- cuffquant.log.out.txt
|-- tophat_cuffmerge
| |-- cuffcmp.merged.gtf.refmap
| |-- cuffcmp.merged.gtf.tmap
| |-- cuffmerge.err.txt
| |-- cuffmerge.out.txt
| |-- logs
| | `-- run.log
| `-- merged.gtf
|-- tophat_index
| |-- bowtie2.err.txt
| |-- bowtie2.out.txt
| |-- genome.1.bt2
| |-- genome.2.bt2
| |-- genome.3.bt2
| |-- genome.4.bt2
| |-- genome.fa -> /state/partition1/rgmatos/Vvinifera/Refs/raw/genome.fa
| |-- genome.rev.1.bt2
| |-- genome.rev.2.bt2
| |-- transcriptome.1.bt2
| |-- transcriptome.2.bt2
| |-- transcriptome.3.bt2
| |-- transcriptome.4.bt2
| |-- transcriptome.fa
| |-- transcriptome.fa.tlst
| |-- transcriptome.gff
| |-- transcriptome.rev.1.bt2
| |-- transcriptome.rev.2.bt2
| `-- transcriptome.ver
|-- tophat_out_final
| |-- SAMPLEA1
| | |-- accepted_hits.bam
| | |-- Aligned.sorted.bam
| | |-- Aligned.sorted.err.txt
| | |-- Aligned.sorted.out.txt
| | `-- Header.txt
| |-- SAMPLEA1.log.err.txt
| |-- SAMPLEA1.log.out.txt
| |-- SAMPLEA2
| | |-- accepted_hits.bam
| | |-- Aligned.sorted.bam
| | |-- Aligned.sorted.err.txt
| | |-- Aligned.sorted.out.txt
| | `-- Header.txt
| |-- SAMPLEA2.log.err.txt
| |-- SAMPLEA2.log.out.txt
| |-- SAMPLEB1
| | |-- accepted_hits.bam
| | |-- Aligned.sorted.bam
| | |-- Aligned.sorted.err.txt
| | |-- Aligned.sorted.out.txt
| | `-- Header.txt
| |-- SAMPLEB1.log.err.txt
| |-- SAMPLEB1.log.out.txt
| |-- SAMPLEB2
| | |-- accepted_hits.bam
| | |-- Aligned.sorted.bam
| | |-- Aligned.sorted.err.txt
| | |-- Aligned.sorted.out.txt
| | `-- Header.txt
| |-- SAMPLEB2.log.err.txt
| `-- SAMPLEB2.log.out.txt
|-- tophat_out_pe
| |-- SAMPLEA1
| | |-- accepted_hits.bam
| | |-- align_summary.txt
| | |-- deletions.bed
| | |-- insertions.bed
| | |-- junctions.bed
| | |-- logs
| | | |-- bam_merge_um.log
| | | |-- bowtie_build.log
| | | |-- bowtie.left_kept_reads.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg1.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg2.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg3.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg4.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg5.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg6.log
| | | |-- bowtie.right_kept_reads.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg1.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg2.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg3.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg4.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg5.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg6.log
| | | |-- bowtie.SAMPLEA1.atropos_final.prinseq.cleaned_1.log
| | | |-- bowtie.SAMPLEA1.atropos_final.prinseq.cleaned_2.log
| | | |-- gtf_juncs.log
| | | |-- juncs_db.log
| | | |-- long_spanning_reads.segs.log
| | | |-- m2g_left_kept_reads.err
| | | |-- m2g_left_kept_reads.out
| | | |-- m2g_right_kept_reads.err
| | | |-- m2g_right_kept_reads.out
| | | |-- prep_reads.from_preflt.left.log
| | | |-- prep_reads.from_preflt.right.log
| | | |-- prep_reads.log
| | | |-- prep_reads.prefilter_left.log
| | | |-- prep_reads.prefilter_right.log
| | | |-- reports.log
| | | |-- run.log
| | | |-- segment_juncs.log
| | | `-- tophat.log
| | |-- prep_reads.info
| | `-- unmapped.bam
| |-- SAMPLEA1.log.err.txt
| |-- SAMPLEA1.log.out.txt
| |-- SAMPLEA2
| | |-- accepted_hits.bam
| | |-- align_summary.txt
| | |-- deletions.bed
| | |-- insertions.bed
| | |-- junctions.bed
| | |-- logs
| | | |-- bam_merge_um.log
| | | |-- bowtie_build.log
| | | |-- bowtie.left_kept_reads.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg1.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg2.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg3.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg4.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg5.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg6.log
| | | |-- bowtie.right_kept_reads.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg1.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg2.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg3.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg4.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg5.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg6.log
| | | |-- bowtie.SAMPLEA2.atropos_final.prinseq.cleaned_1.log
| | | |-- bowtie.SAMPLEA2.atropos_final.prinseq.cleaned_2.log
| | | |-- g2f.err
| | | |-- g2f.out
| | | |-- gtf_juncs.log
| | | |-- juncs_db.log
| | | |-- long_spanning_reads.segs.log
| | | |-- m2g_left_kept_reads.err
| | | |-- m2g_left_kept_reads.out
| | | |-- m2g_right_kept_reads.err
| | | |-- m2g_right_kept_reads.out
| | | |-- prep_reads.from_preflt.left.log
| | | |-- prep_reads.from_preflt.right.log
| | | |-- prep_reads.log
| | | |-- prep_reads.prefilter_left.log
| | | |-- prep_reads.prefilter_right.log
| | | |-- reports.log
| | | |-- run.log
| | | |-- segment_juncs.log
| | | `-- tophat.log
| | |-- prep_reads.info
| | `-- unmapped.bam
| |-- SAMPLEA2.log.err.txt
| |-- SAMPLEA2.log.out.txt
| |-- SAMPLEB1
| | |-- accepted_hits.bam
| | |-- align_summary.txt
| | |-- deletions.bed
| | |-- insertions.bed
| | |-- junctions.bed
| | |-- logs
| | | |-- bam_merge_um.log
| | | |-- bowtie_build.log
| | | |-- bowtie.left_kept_reads.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg1.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg2.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg3.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg4.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg5.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg6.log
| | | |-- bowtie.right_kept_reads.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg1.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg2.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg3.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg4.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg5.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg6.log
| | | |-- bowtie.SAMPLEB1.atropos_final.prinseq.cleaned_1.log
| | | |-- bowtie.SAMPLEB1.atropos_final.prinseq.cleaned_2.log
| | | |-- gtf_juncs.log
| | | |-- juncs_db.log
| | | |-- long_spanning_reads.segs.log
| | | |-- m2g_left_kept_reads.err
| | | |-- m2g_left_kept_reads.out
| | | |-- m2g_right_kept_reads.err
| | | |-- m2g_right_kept_reads.out
| | | |-- prep_reads.from_preflt.left.log
| | | |-- prep_reads.from_preflt.right.log
| | | |-- prep_reads.log
| | | |-- prep_reads.prefilter_left.log
| | | |-- prep_reads.prefilter_right.log
| | | |-- reports.log
| | | |-- run.log
| | | |-- segment_juncs.log
| | | `-- tophat.log
| | |-- prep_reads.info
| | `-- unmapped.bam
| |-- SAMPLEB1.log.err.txt
| |-- SAMPLEB1.log.out.txt
| |-- SAMPLEB2
| | |-- accepted_hits.bam
| | |-- align_summary.txt
| | |-- deletions.bed
| | |-- insertions.bed
| | |-- junctions.bed
| | |-- logs
| | | |-- bam_merge_um.log
| | | |-- bowtie_build.log
| | | |-- bowtie.left_kept_reads.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg1.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg2.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg3.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg4.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg5.log
| | | |-- bowtie.left_kept_reads.m2g_um_seg6.log
| | | |-- bowtie.right_kept_reads.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg1.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg2.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg3.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg4.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg5.log
| | | |-- bowtie.right_kept_reads.m2g_um_seg6.log
| | | |-- bowtie.SAMPLEB2.atropos_final.prinseq.cleaned_1.log
| | | |-- bowtie.SAMPLEB2.atropos_final.prinseq.cleaned_2.log
| | | |-- gtf_juncs.log
| | | |-- juncs_db.log
| | | |-- long_spanning_reads.segs.log
| | | |-- m2g_left_kept_reads.err
| | | |-- m2g_left_kept_reads.out
| | | |-- m2g_right_kept_reads.err
| | | |-- m2g_right_kept_reads.out
| | | |-- prep_reads.from_preflt.left.log
| | | |-- prep_reads.from_preflt.right.log
| | | |-- prep_reads.log
| | | |-- prep_reads.prefilter_left.log
| | | |-- prep_reads.prefilter_right.log
| | | |-- reports.log
| | | |-- run.log
| | | |-- segment_juncs.log
| | | `-- tophat.log
| | |-- prep_reads.info
| | `-- unmapped.bam
| |-- SAMPLEB2.log.err.txt
| `-- SAMPLEB2.log.out.txt
`-- tophat_out_se
|-- SAMPLEA1
| |-- accepted_hits.bam
| |-- align_summary.txt
| |-- deletions.bed
| |-- insertions.bed
| |-- junctions.bed
| |-- logs
| | |-- bowtie_build.log
| | |-- bowtie.left_kept_reads.log
| | |-- bowtie.left_kept_reads.m2g_um_seg1.log
| | |-- bowtie.left_kept_reads.m2g_um_seg2.log
| | |-- bowtie.left_kept_reads.m2g_um_seg3.log
| | |-- bowtie.left_kept_reads.m2g_um_seg4.log
| | |-- bowtie.left_kept_reads.m2g_um_seg5.log
| | |-- bowtie.left_kept_reads.m2g_um_seg6.log
| | |-- bowtie.singletons.log
| | |-- gtf_juncs.log
| | |-- juncs_db.log
| | |-- long_spanning_reads.segs.log
| | |-- m2g_left_kept_reads.err
| | |-- m2g_left_kept_reads.out
| | |-- prep_reads.from_preflt.left.log
| | |-- prep_reads.log
| | |-- prep_reads.prefilter_left.log
| | |-- reports.log
| | |-- run.log
| | |-- segment_juncs.log
| | `-- tophat.log
| |-- prep_reads.info
| |-- singletons.fastq
| `-- unmapped.bam
|-- SAMPLEA1.log.err.txt
|-- SAMPLEA1.log.out.txt
|-- SAMPLEA2
| |-- accepted_hits.bam
| |-- align_summary.txt
| |-- deletions.bed
| |-- insertions.bed
| |-- junctions.bed
| |-- logs
| | |-- bowtie_build.log
| | |-- bowtie.left_kept_reads.log
| | |-- bowtie.left_kept_reads.m2g_um_seg1.log
| | |-- bowtie.left_kept_reads.m2g_um_seg2.log
| | |-- bowtie.left_kept_reads.m2g_um_seg3.log
| | |-- bowtie.left_kept_reads.m2g_um_seg4.log
| | |-- bowtie.left_kept_reads.m2g_um_seg5.log
| | |-- bowtie.left_kept_reads.m2g_um_seg6.log
| | |-- bowtie.singletons.log
| | |-- gtf_juncs.log
| | |-- juncs_db.log
| | |-- long_spanning_reads.segs.log
| | |-- m2g_left_kept_reads.err
| | |-- m2g_left_kept_reads.out
| | |-- prep_reads.from_preflt.left.log
| | |-- prep_reads.log
| | |-- prep_reads.prefilter_left.log
| | |-- reports.log
| | |-- run.log
| | |-- segment_juncs.log
| | `-- tophat.log
| |-- prep_reads.info
| |-- singletons.fastq
| `-- unmapped.bam
|-- SAMPLEA2.log.err.txt
|-- SAMPLEA2.log.out.txt
|-- SAMPLEB1
| |-- accepted_hits.bam
| |-- align_summary.txt
| |-- deletions.bed
| |-- insertions.bed
| |-- junctions.bed
| |-- logs
| | |-- bowtie_build.log
| | |-- bowtie.left_kept_reads.log
| | |-- bowtie.left_kept_reads.m2g_um_seg1.log
| | |-- bowtie.left_kept_reads.m2g_um_seg2.log
| | |-- bowtie.left_kept_reads.m2g_um_seg3.log
| | |-- bowtie.left_kept_reads.m2g_um_seg4.log
| | |-- bowtie.left_kept_reads.m2g_um_seg5.log
| | |-- bowtie.left_kept_reads.m2g_um_seg6.log
| | |-- bowtie.singletons.log
| | |-- gtf_juncs.log
| | |-- juncs_db.log
| | |-- long_spanning_reads.segs.log
| | |-- m2g_left_kept_reads.err
| | |-- m2g_left_kept_reads.out
| | |-- prep_reads.from_preflt.left.log
| | |-- prep_reads.log
| | |-- prep_reads.prefilter_left.log
| | |-- reports.log
| | |-- run.log
| | |-- segment_juncs.log
| | `-- tophat.log
| |-- prep_reads.info
| |-- singletons.fastq
| `-- unmapped.bam
|-- SAMPLEB1.log.err.txt
|-- SAMPLEB1.log.out.txt
|-- SAMPLEB2
| |-- accepted_hits.bam
| |-- align_summary.txt
| |-- deletions.bed
| |-- insertions.bed
| |-- junctions.bed
| |-- logs
| | |-- bowtie_build.log
| | |-- bowtie.left_kept_reads.log
| | |-- bowtie.left_kept_reads.m2g_um_seg1.log
| | |-- bowtie.left_kept_reads.m2g_um_seg2.log
| | |-- bowtie.left_kept_reads.m2g_um_seg3.log
| | |-- bowtie.left_kept_reads.m2g_um_seg4.log
| | |-- bowtie.left_kept_reads.m2g_um_seg5.log
| | |-- bowtie.left_kept_reads.m2g_um_seg6.log
| | |-- bowtie.singletons.log
| | |-- gtf_juncs.log
| | |-- juncs_db.log
| | |-- long_spanning_reads.segs.log
| | |-- m2g_left_kept_reads.err
| | |-- m2g_left_kept_reads.out
| | |-- prep_reads.from_preflt.left.log
| | |-- prep_reads.log
| | |-- prep_reads.prefilter_left.log
| | |-- reports.log
| | |-- run.log
| | |-- segment_juncs.log
| | `-- tophat.log
| |-- prep_reads.info
| |-- singletons.fastq
| `-- unmapped.bam
|-- SAMPLEB2.log.err.txt
`-- SAMPLEB2.log.out.txt
```
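A quick way to confirm that a run finished is to check that each sample has its main output files. The sketch below is not part of the pipeline. The function name `check_outputs` is hypothetical, and the three relative paths were picked from the listing above as representative outputs of the alignment, assembly and quantification steps.

```bash
#!/bin/bash
# Sketch: report which per-sample files are missing or empty under a
# pipeline output directory. Returns 1 if anything is missing, else 0.
check_outputs() {
    local outdir=$1; shift
    local missing=0 sample rel
    for sample in "$@"; do
        for rel in "align_out_final/${sample}/Aligned.sorted.bam" \
                   "tophat_cufflinks/${sample}/transcripts.gtf" \
                   "tophat_cufflinks_cuffquant/${sample}/abundances.cxb"; do
            # -s: the file exists and is not empty (symlinks are followed)
            if [ ! -s "${outdir}/${rel}" ]; then
                echo "MISSING ${rel}"
                missing=$((missing + 1))
            fi
        done
    done
    return $(( missing > 0 ))
}

# Example call (directory and sample names as used in this report):
# check_outputs rnaseq-ref_out SAMPLEA1 SAMPLEA2 SAMPLEB1 SAMPLEB2
```

Because the test follows symbolic links, a broken link in `align_out_final` is reported as missing as well.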
***DE NOVO* TRANSCRIPTOME ASSEMBLY**
#Applying the rnaseq-novo.sh pipeline
```
#!/bin/bash
# input - directory containing the input files in .fastq format
input=$1
# validate the "input" parameter
if [ -z "${input}" ]
then
echo "[ERROR] Missing input directory." 1>&2
exit 1
else
if [ ! -d "${input}" ]
then
echo "[ERROR] Wrong input directory (${input})." 1>&2
exit 1
fi
fi
# output - directory that stores the results of the assembly process
output=$2
# validate the "output" parameter
if [ -z "${output}" ]
then
echo "[ERROR] Missing output directory." 1>&2
exit 1
else
if [ ! -d "${output}" ]
then
echo "[ERROR] Wrong output directory (${output})." 1>&2
exit 1
fi
fi
# Number of CPU cores used for processing
# WARNING: do not exceed the machine's limit
THREADS=$3
if [ -z "${THREADS}" ]; then
echo "[ERROR] Missing number of threads." 1>&2
exit 1
fi
# Amount of memory for the Jellyfish step
# WARNING: do not exceed the machine's limit
MEM=$4
if [ -z "${MEM}" ]; then
echo "[ERROR] Missing memory." 1>&2
exit 1
fi
###
# Output files and directories
#
basedir_out="${output}/"
renamed_out="${basedir_out}/renamed"
trinity_out="${basedir_out}/trinity_assembled"
mkdir -p ${renamed_out}
mkdir -p ${trinity_out}
left=()
left_singleton=()
right=()
right_singleton=()
echo "Performing renaming step ..."
for fastq in `ls ${input}/*.fastq`; do
# get the file name without the directory path
fastqbn=`basename ${fastq}`;
if [[ ! $fastqbn =~ \.bad_ ]]; then
renamed_fastq="${renamed_out}/${fastqbn}"
if [ ! -e ${renamed_fastq} ]; then
echo -e "\tRenaming ${fastqbn} ..."
if [[ ${fastqbn} =~ _1[\._] ]]; then
awk '{ if (NR%4==1) { if ($1!~/\/1$/) { print $1"/1" } else { print $1 } } else if (NR%4==3) { print "+" } else { print $1 } }' ${fastq} > ${renamed_fastq}
elif [[ ${fastqbn} =~ _2[\._] ]]; then
awk '{ if (NR%4==1) { if ($1!~/\/2$/) { print $1"/2" } else { print $1 } } else if (NR%4==3) { print "+" } else { print $1 } }' ${fastq} > ${renamed_fastq}
else
echo "Warning: ${fastqbn} discarded!"
fi
fi
if [[ ${fastqbn} =~ _1[\._] ]]; then
if [[ ${fastqbn} =~ singletons ]]; then
if [ -s ${renamed_fastq} ]; then
left_singleton=($(printf "%s\n" ${left_singleton[@]} ${renamed_fastq} | sort -u ))
fi
else
left=($(printf "%s\n" ${left[@]} ${renamed_fastq} | sort -u ))
fi
elif [[ ${fastqbn} =~ _2[\._] ]]; then
if [[ ${fastqbn} =~ singleton ]]; then
if [ -s ${renamed_fastq} ]; then
right_singleton=($(printf "%s\n" ${right_singleton[@]} ${renamed_fastq} | sort -u ))
fi
else
right=($(printf "%s\n" ${right[@]} ${renamed_fastq} | sort -u ))
fi
else
echo "Warning: ${fastqbn} discarded!"
fi
fi
done
#for l in ${left[@]}; do
# echo -e "L: ${l}";
#done
#
#for r in ${right[@]}; do
# echo -e "R: ${r}";
#done
#
#for ls in ${left_singleton[@]}; do
# echo -e "LS: ${ls}";
#done
#
#for rs in ${right_singleton[@]}; do
# echo -e "RS: ${rs}";
#done
if [ ! -e ${trinity_out}/Trinity.fasta ]; then
echo -e "Assembling step (Trinity) ..."
rm -fr ${trinity_out}
mkdir -p ${trinity_out}
Trinity --output ${trinity_out} \
--seqType fq \
--max_memory ${MEM} \
--CPU ${THREADS} \
--min_per_id_same_path 98 \
--max_diffs_same_path 2 \
--path_reinforcement_distance 10 \
--group_pairs_distance 500 \
--min_kmer_cov 3 \
--min_glue 5 \
--min_contig_length 300 \
--left $(IFS=, ; echo "${left[*]},${left_singleton[*]},${right_singleton[*]}") \
--right $(IFS=, ; echo "${right[*]}") \
> ${trinity_out}/Trinity.log.out.txt \
2> ${trinity_out}/Trinity.log.err.txt
fi
```
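Two steps of the script above can be tried in isolation: the renaming of FASTQ headers and the construction of the comma-separated lists passed to `--left` and `--right`. The sketch below is a simplified illustration under stated assumptions and not a replacement for the script. The function names `add_read_suffix` and `join_by_comma` and the file names in the demo are hypothetical.

```bash
#!/bin/bash
# Sketch 1: append "/1" or "/2" to FASTQ header lines, as the renaming step does.
# Line 1 of each 4-line record is the header; line 3 is reduced to "+".
add_read_suffix() {
    local mate=$1 fastq=$2
    awk -v m="${mate}" '{
        if (NR % 4 == 1) {
            # add the suffix only if the read name does not already end with it
            if ($1 !~ ("/" m "$")) { print $1 "/" m } else { print $1 }
        } else if (NR % 4 == 3) {
            print "+"
        } else {
            print $1
        }
    }' "${fastq}"
}

# Sketch 2: join all arguments with commas. Empty arrays expand to
# no arguments at all, so they leave no empty entry in the list.
join_by_comma() {
    local IFS=,
    echo "$*"
}

# Demo with invented file names
demo=$(mktemp)
printf '@read1 extra\nACGT\n+read1\nIIII\n' > "${demo}"
add_read_suffix 1 "${demo}"
rm -f "${demo}"

left=(a_1.fastq b_1.fastq)
left_singleton=()
right_singleton=(c_2_singletons.fastq)
join_by_comma "${left[@]}" "${left_singleton[@]}" "${right_singleton[@]}"
```

In the demo the joined list is `a_1.fastq,b_1.fastq,c_2_singletons.fastq`. The expression used in the script, `"${left[*]},${left_singleton[*]},${right_singleton[*]}"`, would give `a_1.fastq,b_1.fastq,,c_2_singletons.fastq` for the same arrays, because the empty array still contributes a comma. Whether Trinity tolerates such an empty entry was not checked here.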
For the two biological conditions, after running the pipelines, the values to consider are the significance values, the raw expression values for each sample, the normalized value per condition, and the log2 fold change, as shown in the table:

The table lists the abundance values for conditions A and B that were supplied to the simulation. It also reports the differences for each sample and whether the expected proportions indicate a decrease or an increase in the expression of the genes selected for the simulation.
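As a reminder of how the log2 fold change relates to the two abundances, the sketch below computes log2(B/A) with awk. The function name `log2fc` and the input values are illustrative assumptions and are not taken from the table. The direction (B relative to A) is also an assumption and should be checked against the column order of the tool's output.

```bash
#!/bin/bash
# Sketch: log2 fold change of condition B relative to condition A.
# Prints "NA" when either value is zero or negative, since the
# logarithm of the ratio is undefined in that case.
log2fc() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (a <= 0 || b <= 0) { print "NA"; exit }
        printf "%.3f\n", log(b / a) / log(2)
    }'
}

log2fc 10 40   # B is four times A: prints 2.000
log2fc 40 10   # B is one quarter of A: prints -2.000
```

A positive value indicates higher expression in B, a negative value indicates lower expression in B, and each unit corresponds to a doubling or halving.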