# Pasar todas las READS a fasta para hacer una busqueda en ellas
## Estamos haiendo este procedimiento para hacer búsquedas de BLAST en las secciones de un cromosoma que se alinean con las reads de un experimento de secuenciación de mRNA
### 1. Set Directories:
```
mkdir -p /lustre/scratch/mhoyosro/project3/FUSION/Mmyo4
cd /lustre/scratch/mhoyosro/project3/FUSION/Mmyo4
```
### 2. Get the genomes:
```
export PATH=/lustre/work/mhoyosro/software/sratoolkit/bin:${PATH}
```
I am going to explore this sequencing experiment from Texas A&M University
```
prefetch --max-size 99999999999 SRR18652217
```
As a reference I am going to use Genome assembly mMyoMyo1.p from Bat1k August 2020
```
curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_014108235.1/download?include_annotation_type=GENOME_FASTA,GENOME_GFF,RNA_FASTA,CDS_FASTA,PROT_FASTA,SEQUENCE_REPORT"
```
### 3. Unpack the genomes:
```
fastq-dump SRR18652217.sra
```
### 4. Convert to FASTA:
```
. /home/mhoyosro/conda/etc/profile.d/conda.sh
conda activate BLAST
conda install bioconda::seqtk
seqtk seq -A SRR18652217.fastq > SRR18652217.fasta
```
### 5. Map the samples against the reference genome:
Load the necessary modules
```
module load gcc/9.2.0
module load bwa/0.7.17
module load samtools/1.11
reference="/lustre/scratch/mhoyosro/project1/MSMC2/mMyo/genomes/mMyo.fa"
bwa mem $reference SRR18652217.fasta > SRR18652217.sam
```
pasar a bam
```
samtools view -bS SRR18652217.sam > SRR18652217.bam
```
indexar el bam
```
samtools index SRR18652217.bam
```
### 6. Arquitectura de los scaffolds
Bueno aca la idea es explicar como estan organizados los scaffolds de mi referencia. EN MI CASO, mi referencia tien 92 scaffolds. Cada scaffold en mi referencia se llama scaffold_m19_p_ y cada uno tiene un indicador de número, por ejemplo el primer scaffold se llama scaffold_m19_p_1. La longitud de cada scaffold la debo conocer, por ejemplo el primer scaffold va desde la posicion 0 hasta la posicion 223369599. Un resumen del comportamiento de todos los scaffolds de mi referencia se ve a continuación:
```
scaffold_m19_p_1 0 223369599
scaffold_m19_p_2 0 217756413
scaffold_m19_p_3 0 213723965
scaffold_m19_p_4 0 130331158
scaffold_m19_p_5 0 111266764
scaffold_m19_p_6 0 101075066
scaffold_m19_p_7 0 94448911
scaffold_m19_p_8 0 94234955
scaffold_m19_p_9 0 92776453
scaffold_m19_p_10 0 80193900
scaffold_m19_p_11 0 78476750
scaffold_m19_p_12 0 74216526
scaffold_m19_p_13 0 58607551
scaffold_m19_p_14 0 55602309
scaffold_m19_p_15 0 53400106
scaffold_m19_p_16 0 53200087
scaffold_m19_p_17 0 47534757
scaffold_m19_p_18 0 43537823
scaffold_m19_p_19 0 42769290
scaffold_m19_p_20 0 25313220
scaffold_m19_p_21 0 15591159
scaffold_m19_p_22 0 8097369
scaffold_m19_p_23 0 7815798
scaffold_m19_p_24 0 6554382
scaffold_m19_p_25 0 4996645
scaffold_m19_p_26 0 3682276
scaffold_m19_p_27 0 3557006
scaffold_m19_p_28 0 3555632
scaffold_m19_p_29 0 3393479
scaffold_m19_p_30 0 3173525
scaffold_m19_p_31 0 2913534
scaffold_m19_p_32 0 2797511
scaffold_m19_p_33 0 2461131
scaffold_m19_p_34 0 2396235
scaffold_m19_p_35 0 2395910
scaffold_m19_p_36 0 2315537
scaffold_m19_p_37 0 2205510
scaffold_m19_p_38 0 2135428
scaffold_m19_p_39 0 2080404
scaffold_m19_p_40 0 2066723
scaffold_m19_p_41 0 1964739
scaffold_m19_p_42 0 1691905
scaffold_m19_p_43 0 1569096
scaffold_m19_p_44 0 1545942
scaffold_m19_p_45 0 1459827
scaffold_m19_p_46 0 1449802
scaffold_m19_p_47 0 1408249
scaffold_m19_p_48 0 1089875
scaffold_m19_p_49 0 1086583
scaffold_m19_p_50 0 1048190
scaffold_m19_p_51 0 1038638
scaffold_m19_p_52 0 842794
scaffold_m19_p_53 0 769791
scaffold_m19_p_54 0 738777
scaffold_m19_p_55 0 715828
scaffold_m19_p_56 0 694283
scaffold_m19_p_57 0 679316
scaffold_m19_p_58 0 642830
scaffold_m19_p_59 0 630193
scaffold_m19_p_60 0 626352
scaffold_m19_p_61 0 578126
scaffold_m19_p_62 0 544785
scaffold_m19_p_63 0 522568
scaffold_m19_p_64 0 372349
scaffold_m19_p_65 0 353007
scaffold_m19_p_66 0 319838
scaffold_m19_p_67 0 268097
scaffold_m19_p_68 0 239972
scaffold_m19_p_69 0 213343
scaffold_m19_p_70 0 206615
scaffold_m19_p_71 0 188461
scaffold_m19_p_72 0 147788
scaffold_m19_p_73 0 125338
scaffold_m19_p_74 0 124669
scaffold_m19_p_75 0 104718
scaffold_m19_p_76 0 104115
scaffold_m19_p_77 0 98181
scaffold_m19_p_78 0 97820
scaffold_m19_p_79 0 97268
scaffold_m19_p_80 0 91028
scaffold_m19_p_81 0 88811
scaffold_m19_p_82 0 84383
scaffold_m19_p_83 0 84131
scaffold_m19_p_84 0 71161
scaffold_m19_p_85 0 63657
scaffold_m19_p_86 0 60455
scaffold_m19_p_87 0 56930
scaffold_m19_p_88 0 56719
scaffold_m19_p_89 0 55619
scaffold_m19_p_90 0 45504
scaffold_m19_p_91 0 42821
scaffold_m19_p_92 0 15962
```
Entonces como conozco la arquitectura de mis scaffolds de referencia voy a partir el alineamiento de mRNA de a cuerdo con esa arquitectura y así obtendré en mi caso un fasta de RNA por cromosoma (scaffold):
```
module load gcc/9.2.0
module load bwa/0.7.17
module load samtools/1.11
module load bcftools/1.11
```
```
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_1:0-223369599 > scaffold_m19_p_1.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_2:0-217756413 > scaffold_m19_p_2.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_3:0-213723965 > scaffold_m19_p_3.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_4:0-130331158 > scaffold_m19_p_4.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_5:0-111266764 > scaffold_m19_p_5.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_6:0-101075066 > scaffold_m19_p_6.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_7:0-94448911 > scaffold_m19_p_7.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_8:0-94234955 > scaffold_m19_p_8.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_9:0-92776453 > scaffold_m19_p_9.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_10:0-80193900 > scaffold_m19_p_10.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_11:0-78476750 > scaffold_m19_p_11.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_12:0-74216526 > scaffold_m19_p_12.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_13:0-58607551 > scaffold_m19_p_13.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_14:0-55602309 > scaffold_m19_p_14.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_15:0-53400106 > scaffold_m19_p_15.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_16:0-53200087 > scaffold_m19_p_16.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_17:0-47534757 > scaffold_m19_p_17.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_18:0-43537823 > scaffold_m19_p_18.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_19:0-42769290 > scaffold_m19_p_19.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_20:0-25313220 > scaffold_m19_p_20.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_21:0-15591159 > scaffold_m19_p_21.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_22:0-8097369 > scaffold_m19_p_22.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_23:0-7815798 > scaffold_m19_p_23.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_24:0-6554382 > scaffold_m19_p_24.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_25:0-4996645 > scaffold_m19_p_25.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_26:0-3682276 > scaffold_m19_p_26.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_27:0-3557006 > scaffold_m19_p_27.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_28:0-3555632 > scaffold_m19_p_28.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_29:0-3393479 > scaffold_m19_p_29.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_30:0-3173525 > scaffold_m19_p_30.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_31:0-2913534 > scaffold_m19_p_31.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_32:0-2797511 > scaffold_m19_p_32.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_33:0-2461131 > scaffold_m19_p_33.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_34:0-2396235 > scaffold_m19_p_34.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_35:0-2395910 > scaffold_m19_p_35.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_36:0-2315537 > scaffold_m19_p_36.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_37:0-2205510 > scaffold_m19_p_37.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_38:0-2135428 > scaffold_m19_p_38.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_39:0-2080404 > scaffold_m19_p_39.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_40:0-2066723 > scaffold_m19_p_40.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_41:0-1964739 > scaffold_m19_p_41.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_42:0-1691905 > scaffold_m19_p_42.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_43:0-1569096 > scaffold_m19_p_43.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_44:0-1545942 > scaffold_m19_p_44.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_45:0-1459827 > scaffold_m19_p_45.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_46:0-1449802 > scaffold_m19_p_46.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_47:0-1408249 > scaffold_m19_p_47.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_48:0-1089875 > scaffold_m19_p_48.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_49:0-1086583 > scaffold_m19_p_49.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_50:0-1048190 > scaffold_m19_p_50.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_51:0-1038638 > scaffold_m19_p_51.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_52:0-842794 > scaffold_m19_p_52.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_53:0-769791 > scaffold_m19_p_53.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_54:0-738777 > scaffold_m19_p_54.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_55:0-715828 > scaffold_m19_p_55.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_56:0-694283 > scaffold_m19_p_56.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_57:0-679316 > scaffold_m19_p_57.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_58:0-642830 > scaffold_m19_p_58.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_59:0-630193 > scaffold_m19_p_59.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_60:0-626352 > scaffold_m19_p_60.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_61:0-578126 > scaffold_m19_p_61.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_62:0-544785 > scaffold_m19_p_62.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_63:0-522568 > scaffold_m19_p_63.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_64:0-372349 > scaffold_m19_p_64.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_65:0-353007 > scaffold_m19_p_65.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_66:0-319838 > scaffold_m19_p_66.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_67:0-268097 > scaffold_m19_p_67.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_68:0-239972 > scaffold_m19_p_68.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_69:0-213343 > scaffold_m19_p_69.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_70:0-206615 > scaffold_m19_p_70.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_71:0-188461 > scaffold_m19_p_71.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_72:0-147788 > scaffold_m19_p_72.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_73:0-125338 > scaffold_m19_p_73.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_74:0-124669 > scaffold_m19_p_74.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_75:0-104718 > scaffold_m19_p_75.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_76:0-104115 > scaffold_m19_p_76.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_77:0-98181 > scaffold_m19_p_77.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_78:0-97820 > scaffold_m19_p_78.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_79:0-97268 > scaffold_m19_p_79.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_80:0-91028 > scaffold_m19_p_80.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_81:0-88811 > scaffold_m19_p_81.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_82:0-84383 > scaffold_m19_p_82.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_83:0-84131 > scaffold_m19_p_83.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_84:0-71161 > scaffold_m19_p_84.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_85:0-63657 > scaffold_m19_p_85.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_86:0-60455 > scaffold_m19_p_86.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_87:0-56930 > scaffold_m19_p_87.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_88:0-56719 > scaffold_m19_p_88.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_89:0-55619 > scaffold_m19_p_89.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_90:0-45504 > scaffold_m19_p_90.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_91:0-42821 > scaffold_m19_p_91.bam
samtools view -b SRR18652217_sorted.bam scaffold_m19_p_92:0-15962 > scaffold_m19_p_92.bam
```
### 7. Ahora tengo que hacer un pileup de cada region
Copio mi genoma de referencia y le pongo la extensión "fasta" para que no joda más tarde:
```
cp /lustre/scratch/mhoyosro/project1/MSMC2/mMyo/genomes/mMyo.fa
mv mMyo.fa mMyo.fasta
```
Cargo todos los putos modulos que necesitaré
```
module load gcc/9.2.0
module load bwa/0.7.17
module load samtools/1.11
module load bcftools/1.11
```
Hago un pileup de cada cosa:
```
for i in {1..92}; do
bcftools mpileup -f mMyo.fasta scaffold_m19_p_$i.bam > pileup$i.txt
done
```
Este Pileup solo va a mostrar las zonas donde hay alineamiento y mi objetivo es hacer un fasta entonces vamos a hacer un bed con todas las posiciones
### 8. Bed con todas las posiciones
```
nano coverage.sh
```
Ese nano debe contener este código
```
#!/bin/bash
#SBATCH --job-name=coverage
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --partition=nocona
#SBATCH --nodes=1
#SBATCH --ntasks=1
cd /lustre/scratch/mhoyosro/project3/FUSION/Mmyo4/
module load gcc/10.1.0
module load bedtools2/2.27.1
for i in {1..92}; do
bedtools genomecov -ibam scaffold_m19_p_$i.bam -d > coverage$i.bed
done
```
### 9. ¿Cómo se ven las dos clases archivos?
usemos el caso del scaffold 6
```
head coverage6.bed
```
scaffold_m19_p_6 1 0
scaffold_m19_p_6 2 0
scaffold_m19_p_6 3 0
scaffold_m19_p_6 4 0
scaffold_m19_p_6 5 0
scaffold_m19_p_6 6 0
scaffold_m19_p_6 7 0
scaffold_m19_p_6 8 0
scaffold_m19_p_6 9 0
scaffold_m19_p_6 10 0
```
head -n 200 pileup6.txt
```
scaffold_m19_p_6 64662 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64663 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64664 . T <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64665 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64666 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64667 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64668 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64669 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64670 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64671 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64672 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64673 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64674 . T <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
## 10. Juntar los dos archivos
#Quiero pegar las dos tablas y que esa vaina se vea mas o menos así:
```
scaffold_m19_p_6 1 . N
scaffold_m19_p_6 2 . N
scaffold_m19_p_6 3 . N
scaffold_m19_p_6 4 . N
scaffold_m19_p_6 5 . N
scaffold_m19_p_6 6 . N
scaffold_m19_p_6 7 . N
scaffold_m19_p_6 8 . N
scaffold_m19_p_6 9 . N
scaffold_m19_p_6 10 . N
..
scaffold_m19_p_6 64662 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64663 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64664 . T <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64665 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64666 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64667 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64668 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64669 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64670 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64671 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64672 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64673 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
scaffold_m19_p_6 64674 . T <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13
```
Entonces lo primero es alterar head coverage6.bed
```
for i in {1..92}; do
awk '{print $1"\t"$2"\t.\tN"}' coverage$i.bed > coverage$i_2.bed
done
```
Lo segundo es quitar el encabezado de pileup6.txt
```
for i in {1..92}; do
awk '!/^##/ && !/^#/' pileup$i.txt > clean_pileup$i.txt
done
```
Ahora si a unir ese par de hps
```
for i in {1..92}; do
awk '{ key = $1 FS $2 }; NR == FNR { t[key] = $0; next }; key in t { print t[key]; next }; 1' clean_pileup$i.txt coverage$i_2.bed > finalpileup$i.txt
done
```
### 11. Ahora puedo convertir finalpileup$i.txt en un fasta en el que puedo buscar cosas con blast
#Usar samtools o bcftools aquí seria un problema porque me tiré los encabezados y todo lo demàs es una mamera así que usemos la magia de la IA y Python
Busca ese TX2FSTA2.py en mis archivos, si tienes problemas me dices
```
#!/bin/bash
#SBATCH --job-name=convert
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --partition=nocona
#SBATCH --nodes=1
#SBATCH --ntasks=1
cd /lustre/scratch/mhoyosro/project3/FUSION/Mmyo4
for i in {1..92}; do
python TX2FSTA2.py finalpileup$i.txt RNA_mapped_no_indels$i.fasta
done
```
### 12. Ahora puedo buscar cosas con blast en cada uno de los scaffolds
Cargo el BLAST con conda
```
. /home/mhoyosro/conda/etc/profile.d/conda.sh
conda activate BLAST
```
Suponiendo que ya tengo la base de datos
```
blastx -query RNA_mapped_no_indels6.fasta -db RepeatPeps -outfmt "6 ppos means qcovs means std qseq sseq qcovs qcovhsp" -out RepeatPeps_no_indels6.txt
```