# BI278 Lab #4 Notes

## Exercise 1

I ran `grep tssB lab_03/bhqs69/PROKKA*.tsv`, but it returned nothing, so I copied Professor Noh's PROKKA files with `cp /courses/bi278/Course_Materials/lab_04/PROKKA_09242022.tsv lab_04` and `cp /courses/bi278/Course_Materials/lab_04/PROKKA_09242022.faa lab_04`.

Then I ran `grep tssB lab_04/PROKKA*.tsv`, which gave me three locus tags: BH69_01414, BH69_02561, and BH69_03639.

I used `samtools faidx lab_04/PROKKA*.faa` to index the protein file. Running `samtools faidx PATH/PROKKA*.faa BH69_01414 BH69_02561 BH69_03639` then printed the three sequences, and `samtools faidx lab_04/PROKKA*.faa BH69_01414 BH69_02561 BH69_03639 > bh69_tssB.faa` saved them to a file.

Next, I copied the T6SS file from the course directory with `cp /courses/bi278/Course_Materials/lab_04/t6ss_db.faa lab_04`.

Now that I had the T6SS file as well, I tried `cat t6ss_db.faa bh69_tssB.faa > tssB_input.faa`, but it didn't work because my bh69 file was in my home directory. I had to move it with `mv bh69_tssB.faa lab_04` and move myself into lab_04 with `cd lab_04`; after that, the `cat` command worked.

Then I ran `muscle -align tssB_input.faa -output tssB_muscle.afa` to align all the sequences.

Apparently, MUSCLE changes the order of the sequences when it aligns them, so I ran `grep ">" tssB_input.faa | head` and `grep ">" tssB_muscle.afa | head` to compare the header order before and after alignment.

To restore the input order, I copied the reordering script into the current directory with `cp /courses/bi278/Course_Materials/lab_04/stable_py3.py .` and ran `python stable_py3.py tssB_input.faa tssB_muscle.afa | grep ">"| head` to check the header order of its output.

Next, I used `python stable_py3.py tssB_input.faa tssB_muscle.afa > tssB_muscle.faa` to reorder the alignment and save it as a new file.

Next, I ran `mafft --maxiterate 3 tssB_input.faa > tssB_mafft.faa` to align the same sequences with MAFFT.

In a future project, we can use `perl pal2nal.pl protein_alignment.faa nucleotide.fna -output fasta -codontable 11` to make a codon alignment from our protein alignment.
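To make the reordering step in Exercise 1 more concrete, here is a minimal Python sketch of the idea: read the headers of the unaligned input in order, then emit the aligned records in that same order. This is not the course's `stable_py3.py`, whose internals I haven't looked at; the function names `parse_fasta`, `seq_id`, and `reorder_alignment` and the toy sequences are made up for illustration, and the sketch assumes that the first word of each header is a unique ID.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs in file order; header excludes '>'."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def seq_id(header: str) -> str:
    """Assumption: the first whitespace-separated word is the unique ID."""
    fields = header.split()
    return fields[0] if fields else ""


def reorder_alignment(input_fasta: str, aligned_fasta: str) -> List[Tuple[str, str]]:
    """Return the aligned records in the order of the unaligned input."""
    aligned: Dict[str, Tuple[str, str]] = {
        seq_id(h): (h, s) for h, s in parse_fasta(aligned_fasta)
    }
    ordered = []
    for header, _ in parse_fasta(input_fasta):
        key = seq_id(header)
        if key not in aligned:
            raise KeyError(f"{key!r} is missing from the alignment")
        ordered.append(aligned[key])
    return ordered


# Toy data (hypothetical): the aligner returned the records in a new order.
toy_input = ">seqA first\nMKT\n>seqB\nMKV\n>seqC\nMRT\n"
toy_aligned = ">seqC\nM-RT\n>seqA first\nMK-T\n>seqB\nMKV-\n"

for header, seq in reorder_alignment(toy_input, toy_aligned):
    print(f">{header}\n{seq}")
```

With real files, the two strings would come from reading `tssB_input.faa` and `tssB_muscle.afa`, and the printed records would be redirected to a new file.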
## Exercise 2

I opened the NCBI MSA Viewer at www.ncbi.nlm.nih.gov/projects/msaviewer/?appname=ncbi_msav&openuploaddialog and uploaded my MUSCLE and MAFFT alignment files. To look at the files directly, I used the `cat` command.

**Q1. Describe at least 2 major similarities between your MUSCLE vs. MAFFT alignments. What would you assume about the regions that are similar across your alignments?**

Both alignments have gaps in similar regions, around positions 100-120 and 180-200. The overall order of amino acids is also largely the same. I would assume that the regions that match across both alignments come from genes that are the same or similar.

**Q2. Describe at least 3 major differences between your MUSCLE vs. MAFFT alignments. Focus on how the starts and ends of your sequences are treated, and where gaps are introduced to make the alignment work across all your sequences.**

In MUSCLE, there is a gap across all the sequences at positions 1-10, whereas MAFFT doesn't have that gap. The MUSCLE alignment extends past position 240, whereas the MAFFT alignment ends by position 220. The MUSCLE alignment also seems to have more gaps at the beginning, middle, and end, while in the MAFFT alignment those gaps are reduced or absent.

## Exercise 3

I used `FastTree -lg < tssB_muscle.faa > tssB_muscle_ft.tre` and `FastTree -lg < tssB_mafft.faa > tssB_mafft_ft.tre` to generate the tree files.

MUSCLE tree

![](https://i.imgur.com/awz56yd.png)

MAFFT tree

![](https://i.imgur.com/63k3RYb.png)

12/10 trees
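The MUSCLE vs. MAFFT comparison in Exercise 2 was done by eye in the viewer, but the same properties (alignment length, which columns contain gaps, and how many gaps pad the start and end of each sequence) could also be counted with a short script. The following is only a sketch under my own assumptions: `read_alignment` and `summarize` are hypothetical helper names, and the two toy alignments are invented and are not my real tssB data.

```python
from typing import Dict, List


def read_alignment(text: str) -> Dict[str, str]:
    """Parse aligned FASTA text into {sequence_id: aligned_sequence}."""
    parts: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            parts[current] = []
        elif line and current is not None:
            parts[current].append(line)
    return {name: "".join(chunks) for name, chunks in parts.items()}


def summarize(alignment: Dict[str, str]) -> Dict[str, object]:
    """Report alignment width, gap-containing columns, and end gaps."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences in an alignment must share one length")
    columns = list(zip(*alignment.values()))
    return {
        "length": lengths.pop(),
        # 1-based positions of columns where at least one sequence has a gap
        "gap_columns": [i + 1 for i, col in enumerate(columns) if "-" in col],
        # number of gap characters before the first residue of each sequence
        "leading_gaps": {n: len(s) - len(s.lstrip("-")) for n, s in alignment.items()},
        # number of gap characters after the last residue of each sequence
        "trailing_gaps": {n: len(s) - len(s.rstrip("-")) for n, s in alignment.items()},
    }


# Toy alignments (hypothetical) standing in for the two aligner outputs.
toy_muscle = read_alignment(">a\n--MKT-LV\n>b\nSAMKTALV\n")
toy_mafft = read_alignment(">a\nMKT-LV\n>b\nMKTALV\n")

for label, aln in (("MUSCLE-like", toy_muscle), ("MAFFT-like", toy_mafft)):
    print(label, summarize(aln))
```

On real data, the text passed to `read_alignment` would come from reading `tssB_muscle.faa` and `tssB_mafft.faa`, and differences in `length` and `leading_gaps` would correspond to the start and end differences described in Q2.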