# BI278 Lab notebook 4

### Generate an MSA (Multiple sequence alignment)

* Looking at gene *tssB* of the bacterial Type 6 Secretion System (T6SS)
* Found and downloaded the representative protein sequences from different categories from a database and collected them into a multi-fasta file.

`mkdir lab_04` in your home directory

##### Find the ids of the tssB sequences in the genome annotated with `prokka` in lab3:

`grep tssB path/PROKKA*.tsv` = `grep tssB lab_03/bbqs395/PROKKA*.tsv`

This did not work, so use the files from the course materials instead

```
cp /courses/bi278/Course_Materials/lab_04/* lab_04/
grep tssB lab_04/PROKKA*.tsv
```

Results: BH69_01414 BH69_02561 BH69_03639

##### Get the fasta format sequences for the genes using `samtools` software:

`cd lab_04`

`samtools faidx PROKKA*.faa`

(Run without any sequence ids, this builds the `.fai` index that the next step uses.)

##### Get the fasta format sequences for the genes identified as tssB:

`samtools faidx PROKKA*.faa BH69_01414 BH69_02561 BH69_03639`

Results:
![](https://i.imgur.com/rmFL6Ps.png)

* Send the output to a file using ">"

`samtools faidx PROKKA*.faa BH69_01414 BH69_02561 BH69_03639 > bh69_tssB.faa`

##### Concatenate tssB sequences with the T6SS data:

`cat t6ss_db.faa bh69_tssB.faa > tssB_input.faa`

##### Align all sequences with `MUSCLE`

`muscle -align tssB_input.faa -output tssB_muscle.afa`

##### Check the order of sequences created by `MUSCLE` and reorder it:

```
grep ">" tssB_input.faa | head
grep ">" tssB_muscle.afa | head
```

The order differs, so reorder it with a python script:

`python stable_py3.py tssB_input.faa tssB_muscle.afa | grep ">" | head`

(The sequences need to be in the same order as in the `tssB_input.faa` file)

Save it to a file:

`python stable_py3.py tssB_input.faa tssB_muscle.afa > tssB_muscle.faa`

##### Align with `MAFFT`:

`mafft --maxiterate 3 tssB_input.faa > tssB_mafft.faa`

This does a basic alignment with up to 3 cycles of iterative refinement

* You can also align nucleotide sequences instead of protein sequences, or you can reverse-translate protein alignments into their codon nucleotide alignments.
* To make a codon alignment from your protein alignment: `perl pal2nal.pl protein_alignment.faa nucleotide.fna -output fasta -codontable 11` (You can read more about PAL2NAL and the codon tables it supports here: www.bork.embl.de/pal2nal/)

### Compare MSAs

The two alignments can be compared using the Multiple Sequence Alignment Viewer available from NCBI: www.ncbi.nlm.nih.gov/projects/msaviewer/?appname=ncbi_msav&openuploaddialog

* In the pop-up window, click on the [Text] option, then copy-paste your alignment into the window. `cat tssB_muscle.faa` will print it to the terminal; copy it from there and paste it into the window
* Then click [Upload] and then [Close]
* It shows a screen ![](https://i.imgur.com/6SEiWZK.png)
* Open up a second window and load in the second alignment: run `cat tssB_mafft.faa`, then copy and paste the output into the window

Similarities between the two MSAs:

* Both MSAs align better, with fewer gaps, towards the center of the sequences; the ends have more gaps and lower alignment scores
* The alignments are similar towards the middle, especially slightly to the right of the middle
* Both have inserted gaps in the middle

Differences between the two MSAs:

* `MUSCLE` has many more gaps than the `MAFFT` alignment, as it seems to have stretched all the sequences so that the starts and ends line up regardless of the sequence sizes
* Most of the gaps introduced by `MUSCLE` are towards the start and end of the sequences, while `MAFFT` has gaps distributed throughout the sequences, with more gaps towards the end.
* In the middle, the `MUSCLE` alignment has one section of gaps and then there is a large portion that aligns well towards the right.
  As for `MAFFT`, there are two small sections of gaps in the middle, and in the large portion on the right that aligns well there is a small gap and a section that does not align very well

### Generate a gene tree from the MSAs

Make a phylogeny using approximate Maximum Likelihood with `FastTree`

Generate phylogenies from alignments:

```
FastTree -lg < tssB_muscle.faa > tssB_muscle_ft.tre
FastTree -lg < tssB_mafft.faa > tssB_mafft_ft.tre
```

Import tree into an online tree viewer: https://itol.embl.de

* Click [Upload] and copy and paste the tree file content into the [Tree text] window
* Then click [Upload]
* Click on the circular display and look at the tree ignoring branch lengths

`MUSCLE` tree
![](https://i.imgur.com/Ho0ZGrg.png)

`MAFFT` tree
![](https://i.imgur.com/rrDm08h.png)
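
The gap differences noted when comparing the two MSAs by eye could also be checked with numbers. Below is a minimal Python sketch that counts gap characters (`-`) in each alignment file. The function names `read_fasta` and `gap_summary` are made up for this illustration and are not part of the course scripts; the file names are the ones produced earlier in this notebook, and the script simply skips any file it cannot find.

```python
from pathlib import Path


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {sequence_id: sequence}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence id
            words = line[1:].split()
            current_id = words[0] if words else ""
            records[current_id] = []
        elif current_id is not None:
            # Sequence lines may be wrapped, so collect the pieces
            records[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def gap_summary(alignment):
    """Return (alignment_length, total_gaps, gap_fraction) for an aligned FASTA dict."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        # Aligned sequences must all have the same length
        raise ValueError("sequences differ in length; is this an alignment?")
    length = lengths.pop()
    total_gaps = sum(seq.count("-") for seq in alignment.values())
    total_cells = length * len(alignment)
    fraction = total_gaps / total_cells if total_cells else 0.0
    return length, total_gaps, fraction


if __name__ == "__main__":
    # File names assumed from the steps above
    for name in ("tssB_muscle.faa", "tssB_mafft.faa"):
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found, skipping")
            continue
        length, gaps, fraction = gap_summary(read_fasta(path.read_text()))
        print(f"{name}: length={length} gaps={gaps} gap_fraction={fraction:.3f}")
```

Saved under a hypothetical name such as `gap_stats.py` and run inside `lab_04`, it prints one line per alignment. A higher gap fraction for the `MUSCLE` file would be consistent with what the viewer showed, though it says nothing about where the gaps fall.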