# Lab 4: Multiple Sequence Alignment

Kirsten Pastore

## Exercise 1. Generate an MSA

```
#ssh into bi278
ssh klpast23@bi278
#create lab_04 directory in ~
mkdir lab_04
```

Goal: compare sequences of tssB, a conserved component gene of the bacterial Type 6 Secretion System (T6SS). The multi-FASTA file contains representative protein sequences as well as T6SS protein sequences from close relatives.

1. Find the IDs of all tssB sequences in the new genome that was annotated with Prokka last week.

```
#copy files into my lab_04 folder
cp /courses/bi278/Course_Materials/lab_04/Prokka* ~/lab_04/
grep tssB PROKKA*.tsv
#returns BH69_01414 BH69_02561 BH69_03639
```

2. Get the FASTA-format sequences for these genes. First use samtools to index the FASTA file so that individual genes can be looked up.

```
#in ~/lab_04/ directory
samtools faidx PROKKA*.faa
```

3. Get the FASTA-format sequences of the genes that were identified as tssB.

```
samtools faidx PROKKA*.faa BH69_01414 BH69_02561 BH69_03639
#returns the amino acid sequences
```

4. Send this output to another file (use the ">" operator).

```
samtools faidx PROKKA*.faa BH69_01414 BH69_02561 BH69_03639 > bh69_tssB.faa
```

5. Concatenate the tssB sequences with the ones downloaded from the T6SS database.

```
#cp downloaded sequences into ~/lab_04/
cp /courses/bi278/Course_Materials/lab_04/t6ss_db.faa ~/lab_04/
#concatenate database sequences and my tssB sequences
cat t6ss_db.faa bh69_tssB.faa > tssB_input.faa
```

6. Align all the sequences with MUSCLE.

```
muscle -align tssB_input.faa -output tssB_muscle.afa
```

7. Check the behavior of MUSCLE.

```
grep ">" tssB_input.faa | head
grep ">" tssB_muscle.afa | head
#order of the sequences changed
```

8. Use the Python script to re-order the sequences. Check that it works first.

```
#copy python script to ~/lab_04/
cp /courses/bi278/Course_Materials/lab_04/stable_py3.py ~/lab_04/
#check python script
python stable_py3.py tssB_input.faa tssB_muscle.afa | grep ">" | head
```

9. Fix the order and save it to a new file (the extension changes from afa to faa).

```
python stable_py3.py tssB_input.faa tssB_muscle.afa > tssB_muscle.faa
```

10. Align with MAFFT.

```
#MAFFT does not have the sequence-order issue that had to be corrected for MUSCLE
mafft --maxiterate 3 tssB_input.faa > tssB_mafft.faa
```

Note: both nucleotide and protein sequences can be aligned, but for protein-coding genes it is rare that a nucleotide alignment would give a more informative result.

```
#to make a codon alignment from your protein alignment
perl pal2nal.pl protein_alignment.faa nucleotide.fna -output fasta -codontable 11
```

## Exercise 2. Compare MSAs

11. Compare the two alignments using the NCBI Multiple Sequence Alignment Viewer (MSAV). Link: https://www.ncbi.nlm.nih.gov/projects/msaviewer/?appname=ncbi_msav&openuploaddialog

Click on the [Text] option under data sources and copy-paste your alignment into the window on the right.

```
#prints alignment
cat tssB_muscle.faa
#copy and paste into MSAV
```

12. Click Upload; once the data has uploaded, click Close.

13. Open a second window and load the other alignment.

```
#prints alignment
cat tssB_mafft.faa
#copy and paste into MSAV
```

Q1. Describe 2 major similarities between MUSCLE and MAFFT

Both the MUSCLE and MAFFT alignments have gaps near the end. Both also have a gap near the middle (MAFFT: approx. 105-110; MUSCLE: approx. 115-125), with a stretch of aligned sequence on either side.

Q2. Describe 3 major differences between MUSCLE and MAFFT

The MUSCLE alignment has more gaps at the beginning than the MAFFT alignment. Generally, the MAFFT alignment starts at amino acid 8 (with exceptions), while the MUSCLE alignment generally shows amino acids at positions 9, 11, and 17, and then there is a gap at position 29.

The gap in the middle falls between positions 105 and 110 in the MAFFT file but between 115 and 125 in the MUSCLE file, so it sits further along the alignment in the MUSCLE file.

There are consistently amino acids in the last couple of positions in the MUSCLE file. This is not the case in the MAFFT file.

## Exercise 3. Generate a Gene Tree From Your MSA

14. Generate phylogenies from your alignments.

```
FastTree -lg < tssB_muscle.faa > tssB_muscle_ft.tre
FastTree -lg < tssB_mafft.faa > tssB_mafft_ft.tre
```

15. Import the tree into an online tree viewer: go to itol.embl.de and click Upload.

```
#get tree file to copy and paste into tree viewer
cat tssB_muscle_ft.tre
```

16. Look at the tree in circular form. Click around and check whether sequences group with other sequences from a close relative.
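
Supplementary sketch for steps 7 to 9: the re-ordering step can be pictured as putting the aligned records back into the order of the input file. The code below is not the contents of `stable_py3.py` (that script is not shown in these notes). It is a minimal illustration that assumes each sequence is identified by the first whitespace-delimited token of its header, and the sequence names in the toy data are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into an ordered list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def seq_id(header):
    """Assumption: the ID is the first whitespace-delimited token of the header."""
    return header.split()[0]


def reorder_like_input(input_text, aligned_text):
    """Return the aligned records arranged in the order of the input file."""
    aligned = {seq_id(h): (h, s) for h, s in read_fasta(aligned_text)}
    return [
        aligned[seq_id(h)]
        for h, _ in read_fasta(input_text)
        if seq_id(h) in aligned
    ]


# Toy data with hypothetical IDs, not sequences from the lab files.
toy_input = ">seqA\nMKT\n>seqB\nMT\n>seqC\nMKS\n"
toy_aligned = ">seqC\nMKS\n>seqA\nMKT\n>seqB\nM-T\n"

for header, seq in reorder_like_input(toy_input, toy_aligned):
    print(f">{header}\n{seq}")
```

On real files the two arguments would be the text of `tssB_input.faa` and `tssB_muscle.afa`. The gapped sequences themselves are left untouched; only the record order changes.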
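
Supplementary sketch for Q1 and Q2: the gap positions reported above were read off the viewer by eye. One way to cross-check them would be to compute, for each alignment column, the fraction of sequences that have a gap, and then list the runs of mostly-gap columns. This is only a sketch: the 0.5 threshold, the function names, and the toy alignment are all assumptions made for illustration, not part of the lab.

```python
def read_aligned_seqs(text):
    """Return the gapped sequences from aligned FASTA text, in file order."""
    seqs, chunks = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if chunks is not None:
                seqs.append("".join(chunks))
            chunks = []
        elif line and chunks is not None:
            chunks.append(line)
    if chunks is not None:
        seqs.append("".join(chunks))
    return seqs


def gap_fraction_per_column(seqs):
    """Fraction of sequences that have '-' in each alignment column."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    return [sum(s[i] == "-" for s in seqs) / len(seqs) for i in range(length)]


def gappy_runs(fractions, threshold=0.5):
    """1-based (start, end) column ranges whose gap fraction is >= threshold."""
    runs, start = [], None
    for col, frac in enumerate(fractions, start=1):
        if frac >= threshold:
            if start is None:
                start = col
        elif start is not None:
            runs.append((start, col - 1))
            start = None
    if start is not None:
        runs.append((start, len(fractions)))
    return runs


# Toy alignment with hypothetical sequences, not data from the lab.
toy = ">s1\n--MKT-A\n>s2\n-AMK--A\n>s3\n--MKTSA\n"
fractions = gap_fraction_per_column(read_aligned_seqs(toy))
print(gappy_runs(fractions))  # [(1, 2), (6, 6)]
```

Running the same functions on the text of `tssB_muscle.faa` and `tssB_mafft.faa` would give two lists of column ranges that can be compared side by side.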