# Pasar todas las READS a fasta para hacer una busqueda en ellas ## Estamos haiendo este procedimiento para hacer búsquedas de BLAST en las secciones de un cromosoma que se alinean con las reads de un experimento de secuenciación de mRNA ### 1. Set Directories: ``` mkdir -p /lustre/scratch/mhoyosro/project3/FUSION/Mmyo4 cd /lustre/scratch/mhoyosro/project3/FUSION/Mmyo4 ``` ### 2. Get the genomes: ``` export PATH=/lustre/work/mhoyosro/software/sratoolkit/bin:${PATH} ``` I am going to explore this sequencing experiment from Texas A&M University ``` prefetch --max-size 99999999999 SRR18652217 ``` As a reference I am going to use Genome assembly mMyoMyo1.p from Bat1k August 2020 ``` curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_014108235.1/download?include_annotation_type=GENOME_FASTA,GENOME_GFF,RNA_FASTA,CDS_FASTA,PROT_FASTA,SEQUENCE_REPORT" ``` ### 3. Unpack the genomes: ``` fastq-dump SRR18652217.sra ``` ### 4. Convert to FASTA: ``` . /home/mhoyosro/conda/etc/profile.d/conda.sh conda activate BLAST conda install bioconda::seqtk seqtk seq -A SRR18652217.fastq > SRR18652217.fasta ``` ### 5. Map the samples against the reference genome: Load the necessary modules ``` module load gcc/9.2.0 module load bwa/0.7.17 module load samtools/1.11 reference="/lustre/scratch/mhoyosro/project1/MSMC2/mMyo/genomes/mMyo.fa" bwa mem $reference SRR18652217.fasta > SRR18652217.sam ``` pasar a bam ``` samtools view -bS SRR18652217.sam > SRR18652217.bam ``` indexar el bam ``` samtools index SRR18652217.bam ``` ### 6. Arquitectura de los scaffolds Bueno aca la idea es explicar como estan organizados los scaffolds de mi referencia. EN MI CASO, mi referencia tien 92 scaffolds. Cada scaffold en mi referencia se llama scaffold_m19_p_ y cada uno tiene un indicador de número, por ejemplo el primer scaffold se llama scaffold_m19_p_1. La longitud de cada scaffold la debo conocer, por ejemplo el primer scaffold va desde la posicion 0 hasta la posicion 223369599. Un resumen del comportamiento de todos los scaffolds de mi referencia se ve a continuación: ``` scaffold_m19_p_1 0 223369599 scaffold_m19_p_2 0 217756413 scaffold_m19_p_3 0 213723965 scaffold_m19_p_4 0 130331158 scaffold_m19_p_5 0 111266764 scaffold_m19_p_6 0 101075066 scaffold_m19_p_7 0 94448911 scaffold_m19_p_8 0 94234955 scaffold_m19_p_9 0 92776453 scaffold_m19_p_10 0 80193900 scaffold_m19_p_11 0 78476750 scaffold_m19_p_12 0 74216526 scaffold_m19_p_13 0 58607551 scaffold_m19_p_14 0 55602309 scaffold_m19_p_15 0 53400106 scaffold_m19_p_16 0 53200087 scaffold_m19_p_17 0 47534757 scaffold_m19_p_18 0 43537823 scaffold_m19_p_19 0 42769290 scaffold_m19_p_20 0 25313220 scaffold_m19_p_21 0 15591159 scaffold_m19_p_22 0 8097369 scaffold_m19_p_23 0 7815798 scaffold_m19_p_24 0 6554382 scaffold_m19_p_25 0 4996645 scaffold_m19_p_26 0 3682276 scaffold_m19_p_27 0 3557006 scaffold_m19_p_28 0 3555632 scaffold_m19_p_29 0 3393479 scaffold_m19_p_30 0 3173525 scaffold_m19_p_31 0 2913534 scaffold_m19_p_32 0 2797511 scaffold_m19_p_33 0 2461131 scaffold_m19_p_34 0 2396235 scaffold_m19_p_35 0 2395910 scaffold_m19_p_36 0 2315537 scaffold_m19_p_37 0 2205510 scaffold_m19_p_38 0 2135428 scaffold_m19_p_39 0 2080404 scaffold_m19_p_40 0 2066723 scaffold_m19_p_41 0 1964739 scaffold_m19_p_42 0 1691905 scaffold_m19_p_43 0 1569096 scaffold_m19_p_44 0 1545942 scaffold_m19_p_45 0 1459827 scaffold_m19_p_46 0 1449802 scaffold_m19_p_47 0 1408249 scaffold_m19_p_48 0 1089875 scaffold_m19_p_49 0 1086583 scaffold_m19_p_50 0 1048190 scaffold_m19_p_51 0 1038638 scaffold_m19_p_52 0 842794 scaffold_m19_p_53 0 769791 scaffold_m19_p_54 0 738777 scaffold_m19_p_55 0 715828 scaffold_m19_p_56 0 694283 scaffold_m19_p_57 0 679316 scaffold_m19_p_58 0 642830 scaffold_m19_p_59 0 630193 scaffold_m19_p_60 0 626352 scaffold_m19_p_61 0 578126 scaffold_m19_p_62 0 544785 scaffold_m19_p_63 0 522568 scaffold_m19_p_64 0 372349 scaffold_m19_p_65 0 353007 scaffold_m19_p_66 0 319838 scaffold_m19_p_67 0 268097 scaffold_m19_p_68 0 239972 scaffold_m19_p_69 0 213343 scaffold_m19_p_70 0 206615 scaffold_m19_p_71 0 188461 scaffold_m19_p_72 0 147788 scaffold_m19_p_73 0 125338 scaffold_m19_p_74 0 124669 scaffold_m19_p_75 0 104718 scaffold_m19_p_76 0 104115 scaffold_m19_p_77 0 98181 scaffold_m19_p_78 0 97820 scaffold_m19_p_79 0 97268 scaffold_m19_p_80 0 91028 scaffold_m19_p_81 0 88811 scaffold_m19_p_82 0 84383 scaffold_m19_p_83 0 84131 scaffold_m19_p_84 0 71161 scaffold_m19_p_85 0 63657 scaffold_m19_p_86 0 60455 scaffold_m19_p_87 0 56930 scaffold_m19_p_88 0 56719 scaffold_m19_p_89 0 55619 scaffold_m19_p_90 0 45504 scaffold_m19_p_91 0 42821 scaffold_m19_p_92 0 15962 ``` Entonces como conozco la arquitectura de mis scaffolds de referencia voy a partir el alineamiento de mRNA de a cuerdo con esa arquitectura y así obtendré en mi caso un fasta de RNA por cromosoma (scaffold): ``` module load gcc/9.2.0 module load bwa/0.7.17 module load samtools/1.11 module load bcftools/1.11 ``` ``` samtools view -b SRR18652217_sorted.bam scaffold_m19_p_1:0-223369599 > scaffold_m19_p_1.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_2:0-217756413 > scaffold_m19_p_2.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_3:0-213723965 > scaffold_m19_p_3.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_4:0-130331158 > scaffold_m19_p_4.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_5:0-111266764 > scaffold_m19_p_5.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_6:0-101075066 > scaffold_m19_p_6.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_7:0-94448911 > scaffold_m19_p_7.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_8:0-94234955 > scaffold_m19_p_8.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_9:0-92776453 > scaffold_m19_p_9.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_10:0-80193900 > scaffold_m19_p_10.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_11:0-78476750 > scaffold_m19_p_11.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_12:0-74216526 > scaffold_m19_p_12.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_13:0-58607551 > scaffold_m19_p_13.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_14:0-55602309 > scaffold_m19_p_14.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_15:0-53400106 > scaffold_m19_p_15.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_16:0-53200087 > scaffold_m19_p_16.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_17:0-47534757 > scaffold_m19_p_17.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_18:0-43537823 > scaffold_m19_p_18.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_19:0-42769290 > scaffold_m19_p_19.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_20:0-25313220 > scaffold_m19_p_20.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_21:0-15591159 > scaffold_m19_p_21.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_22:0-8097369 > scaffold_m19_p_22.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_23:0-7815798 > scaffold_m19_p_23.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_24:0-6554382 > scaffold_m19_p_24.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_25:0-4996645 > scaffold_m19_p_25.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_26:0-3682276 > scaffold_m19_p_26.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_27:0-3557006 > scaffold_m19_p_27.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_28:0-3555632 > scaffold_m19_p_28.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_29:0-3393479 > scaffold_m19_p_29.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_30:0-3173525 > scaffold_m19_p_30.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_31:0-2913534 > scaffold_m19_p_31.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_32:0-2797511 > scaffold_m19_p_32.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_33:0-2461131 > scaffold_m19_p_33.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_34:0-2396235 > scaffold_m19_p_34.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_35:0-2395910 > scaffold_m19_p_35.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_36:0-2315537 > scaffold_m19_p_36.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_37:0-2205510 > scaffold_m19_p_37.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_38:0-2135428 > scaffold_m19_p_38.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_39:0-2080404 > scaffold_m19_p_39.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_40:0-2066723 > scaffold_m19_p_40.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_41:0-1964739 > scaffold_m19_p_41.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_42:0-1691905 > scaffold_m19_p_42.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_43:0-1569096 > scaffold_m19_p_43.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_44:0-1545942 > scaffold_m19_p_44.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_45:0-1459827 > scaffold_m19_p_45.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_46:0-1449802 > scaffold_m19_p_46.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_47:0-1408249 > scaffold_m19_p_47.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_48:0-1089875 > scaffold_m19_p_48.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_49:0-1086583 > scaffold_m19_p_49.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_50:0-1048190 > scaffold_m19_p_50.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_51:0-1038638 > scaffold_m19_p_51.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_52:0-842794 > scaffold_m19_p_52.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_53:0-769791 > scaffold_m19_p_53.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_54:0-738777 > scaffold_m19_p_54.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_55:0-715828 > scaffold_m19_p_55.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_56:0-694283 > scaffold_m19_p_56.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_57:0-679316 > scaffold_m19_p_57.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_58:0-642830 > scaffold_m19_p_58.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_59:0-630193 > scaffold_m19_p_59.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_60:0-626352 > scaffold_m19_p_60.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_61:0-578126 > scaffold_m19_p_61.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_62:0-544785 > scaffold_m19_p_62.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_63:0-522568 > scaffold_m19_p_63.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_64:0-372349 > scaffold_m19_p_64.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_65:0-353007 > scaffold_m19_p_65.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_66:0-319838 > scaffold_m19_p_66.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_67:0-268097 > scaffold_m19_p_67.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_68:0-239972 > scaffold_m19_p_68.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_69:0-213343 > scaffold_m19_p_69.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_70:0-206615 > scaffold_m19_p_70.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_71:0-188461 > scaffold_m19_p_71.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_72:0-147788 > scaffold_m19_p_72.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_73:0-125338 > scaffold_m19_p_73.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_74:0-124669 > scaffold_m19_p_74.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_75:0-104718 > scaffold_m19_p_75.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_76:0-104115 > scaffold_m19_p_76.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_77:0-98181 > scaffold_m19_p_77.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_78:0-97820 > scaffold_m19_p_78.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_79:0-97268 > scaffold_m19_p_79.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_80:0-91028 > scaffold_m19_p_80.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_81:0-88811 > scaffold_m19_p_81.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_82:0-84383 > scaffold_m19_p_82.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_83:0-84131 > scaffold_m19_p_83.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_84:0-71161 > scaffold_m19_p_84.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_85:0-63657 > scaffold_m19_p_85.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_86:0-60455 > scaffold_m19_p_86.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_87:0-56930 > scaffold_m19_p_87.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_88:0-56719 > scaffold_m19_p_88.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_89:0-55619 > scaffold_m19_p_89.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_90:0-45504 > scaffold_m19_p_90.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_91:0-42821 > scaffold_m19_p_91.bam samtools view -b SRR18652217_sorted.bam scaffold_m19_p_92:0-15962 > scaffold_m19_p_92.bam ``` ### 7. Ahora tengo que hacer un pileup de cada region Copio mi genoma de referencia y le pongo la extensión "fasta" para que no joda más tarde: ``` cp /lustre/scratch/mhoyosro/project1/MSMC2/mMyo/genomes/mMyo.fa mv mMyo.fa mMyo.fasta ``` Cargo todos los putos modulos que necesitaré ``` module load gcc/9.2.0 module load bwa/0.7.17 module load samtools/1.11 module load bcftools/1.11 ``` Hago un pileup de cada cosa: ``` for i in {1..92}; do bcftools mpileup -f mMyo.fasta scaffold_m19_p_$i.bam > pileup$i.txt done ``` Este Pileup solo va a mostrar las zonas donde hay alineamiento y mi objetivo es hacer un fasta entonces vamos a hacer un bed con todas las posiciones ### 8. Bed con todas las posiciones ``` nano coverage.sh ``` Ese nano debe contener este código ``` #!/bin/bash #SBATCH --job-name=coverage #SBATCH --output=%x.%j.out #SBATCH --error=%x.%j.err #SBATCH --partition=nocona #SBATCH --nodes=1 #SBATCH --ntasks=1 cd /lustre/scratch/mhoyosro/project3/FUSION/Mmyo4/ module load gcc/10.1.0 module load bedtools2/2.27.1 for i in {1..92}; do bedtools genomecov -ibam scaffold_m19_p_$i.bam -d > coverage$i.bed done ``` ### 9. ¿Cómo se ven las dos clases archivos? usemos el caso del scaffold 6 ``` head coverage6.bed ``` scaffold_m19_p_6 1 0 scaffold_m19_p_6 2 0 scaffold_m19_p_6 3 0 scaffold_m19_p_6 4 0 scaffold_m19_p_6 5 0 scaffold_m19_p_6 6 0 scaffold_m19_p_6 7 0 scaffold_m19_p_6 8 0 scaffold_m19_p_6 9 0 scaffold_m19_p_6 10 0 ``` head -n 200 pileup6.txt ``` scaffold_m19_p_6 64662 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64663 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64664 . T <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64665 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64666 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64667 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64668 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64669 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64670 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64671 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64672 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64673 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64674 . T <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 ## 10. Juntar los dos archivos #Quiero pegar las dos tablas y que esa vaina se vea mas o menos así: ``` scaffold_m19_p_6 1 . N scaffold_m19_p_6 2 . N scaffold_m19_p_6 3 . N scaffold_m19_p_6 4 . N scaffold_m19_p_6 5 . N scaffold_m19_p_6 6 . N scaffold_m19_p_6 7 . N scaffold_m19_p_6 8 . N scaffold_m19_p_6 9 . N scaffold_m19_p_6 10 . N .. scaffold_m19_p_6 64662 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64663 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64664 . T <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64665 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64666 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64667 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64668 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64669 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64670 . C <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64671 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64672 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64673 . A <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 scaffold_m19_p_6 64674 . T <*> 0 . DP=1;I16=1,0,0,0,255,65025,0,0,13,169,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,13 ``` Entonces lo primero es alterar head coverage6.bed ``` for i in {1..92}; do awk '{print $1"\t"$2"\t.\tN"}' coverage$i.bed > coverage$i_2.bed done ``` Lo segundo es quitar el encabezado de pileup6.txt ``` for i in {1..92}; do awk '!/^##/ && !/^#/' pileup$i.txt > clean_pileup$i.txt done ``` Ahora si a unir ese par de hps ``` for i in {1..92}; do awk '{ key = $1 FS $2 }; NR == FNR { t[key] = $0; next }; key in t { print t[key]; next }; 1' clean_pileup$i.txt coverage$i_2.bed > finalpileup$i.txt done ``` ### 11. Ahora puedo convertir finalpileup$i.txt en un fasta en el que puedo buscar cosas con blast #Usar samtools o bcftools aquí seria un problema porque me tiré los encabezados y todo lo demàs es una mamera así que usemos la magia de la IA y Python Busca ese TX2FSTA2.py en mis archivos, si tienes problemas me dices ``` #!/bin/bash #SBATCH --job-name=convert #SBATCH --output=%x.%j.out #SBATCH --error=%x.%j.err #SBATCH --partition=nocona #SBATCH --nodes=1 #SBATCH --ntasks=1 cd /lustre/scratch/mhoyosro/project3/FUSION/Mmyo4 for i in {1..92}; do python TX2FSTA2.py finalpileup$i.txt RNA_mapped_no_indels$i.fasta done ``` ### 12. Ahora puedo buscar cosas con blast en cada uno de los scaffolds Cargo el BLAST con conda ``` . /home/mhoyosro/conda/etc/profile.d/conda.sh conda activate BLAST ``` Suponiendo que ya tengo la base de datos ``` blastx -query RNA_mapped_no_indels6.fasta -db RepeatPeps -outfmt "6 ppos means qcovs means std qseq sseq qcovs qcovhsp" -out RepeatPeps_no_indels6.txt ```