# BI278 Lab notebook 4
### Generate an MSA (Multiple sequence alignment)
* Looking at gene *tssB* of the bacterial Type 6 Secretion System (T6SS)
* Found and downloaded representative protein sequences from different categories in a database and collected them into a multi-fasta file.
`mkdir lab_04` in your home directory
##### Find the ids of the tssB sequences in the genome annotated with `prokka` in lab3:
`grep tssB path/PROKKA*.tsv`, which here is `grep tssB lab_03/bbqs395/PROKKA*.tsv`
This did not work, so use the files from the course materials instead:
```
cp /courses/bi278/Course_Materials/lab_04/* lab_04/
grep tssB lab_04/PROKKA*.tsv
```
Results:
BH69_01414
BH69_02561
BH69_03639
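The locus tags can also be pulled out of the table directly instead of being copied by hand. This is a minimal sketch, assuming the Prokka `.tsv` is tab-delimited with the locus tag in column 1 and the gene name in column 4; the function name `tssb_ids` is made up for this note.

```shell
# Print the locus tags (column 1) of rows whose gene name (column 4) starts with "tssB".
# Assumption: tab-delimited Prokka .tsv with columns locus_tag, ftype, length_bp, gene, ...
tssb_ids() {
    awk -F'\t' '$4 ~ /^tssB/ { print $1 }' "$@"
}
```

Running `tssb_ids PROKKA*.tsv` inside `lab_04` should then list one ID per line, which can be compared against the `grep` results above.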
##### Index the protein fasta file with `samtools` so that sequences can be fetched by ID:
`cd lab_04`
`samtools faidx PROKKA*.faa`
##### Get the fasta format sequences for the genes identified as tssB:
`samtools faidx PROKKA*.faa BH69_01414 BH69_02561 BH69_03639`
Results:

* Send the output to a file using ">"
`samtools faidx PROKKA*.faa BH69_01414 BH69_02561 BH69_03639 > bh69_tssB.faa`
##### Concatenate tssB sequences with the T6SS data:
`cat t6ss_db.faa bh69_tssB.faa > tssB_input.faa`
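A quick sanity check after concatenating is to count the records in each file. This is a small sketch; the helper name `count_seqs` is hypothetical, and it assumes every record starts with a `>` header line.

```shell
# Count fasta records by counting header lines (lines that start with ">").
count_seqs() {
    grep -c '^>' "$1"
}
```

The count for `tssB_input.faa` should equal the counts for `t6ss_db.faa` and `bh69_tssB.faa` added together.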
##### Align all sequences with `MUSCLE`
`muscle -align tssB_input.faa -output tssB_muscle.afa`
##### Check the order of sequences created by `MUSCLE` and reorder it:
```
grep ">" tssB_input.faa | head
grep ">" tssB_muscle.afa | head
```
Need to reorder it, so use a python script:
`python stable_py3.py tssB_input.faa tssB_muscle.afa | grep ">" | head`
(The order needs to match that of the `tssB_input.faa` file.)
Save it to a file - `python stable_py3.py tssB_input.faa tssB_muscle.afa > tssB_muscle.faa`
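Instead of comparing the `head` output by eye, the header order of two files can be compared directly. A minimal sketch, assuming the header lines are written identically in both files; `same_order` is a made-up helper name.

```shell
# Succeed (exit status 0) only if both fasta files list the same headers in the same order.
same_order() {
    a=$(grep '^>' "$1")
    b=$(grep '^>' "$2")
    [ "$a" = "$b" ]
}
```

For example, `same_order tssB_input.faa tssB_muscle.faa && echo "order matches"` should print the message only once the reordered file is in place.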
##### Align with `MAFFT`:
`mafft --maxiterate 3 tssB_input.faa >tssB_mafft.faa`
This runs a basic alignment with up to three cycles of iterative refinement.
* You can also align nucleotide sequences instead of protein sequences, or you can reverse-translate protein alignments into their codon nucleotide alignments.
* To make a codon alignment from your protein alignment: `perl pal2nal.pl protein_alignment.faa nucleotide.fna -output fasta -codontable 11`
(You can read more about PAL2NAL and the codon tables it supports here: www.bork.embl.de/pal2nal/)
### Compare MSAs
Can compare the two alignments using a Multiple Sequence Alignment Viewer available from NCBI.
www.ncbi.nlm.nih.gov/projects/msaviewer/?appname=ncbi_msav&openuploaddialog
* In the pop-up window, click on the [Text] option then copy-paste your alignment into the window
`cat tssB_muscle.faa` will print it to the terminal; copy and paste it into the window
* Then click [Upload] and then [Close]
* It shows a screen

* Open up a second window and load in the second alignment
`cat tssB_mafft.faa`, then copy and paste it into the window
Similarities between the two MSAs:
* Both MSAs align well with few gaps towards the center of the sequences, while the ends have more gaps and lower alignment scores
* The alignments are similar towards the middle, especially slightly to the right of the middle
* They have both inserted gaps in the middle
Differences between the two MSAs:
* `MUSCLE` has many more gaps than the `MAFFT` alignment; it seems to have stretched all the sequences so that the starts and ends line up regardless of sequence length
* Most of the gaps introduced by `MUSCLE` are towards the start and end of the sequences, whereas `MAFFT` has gaps distributed throughout the sequences, with more towards the end
* In the middle, the `MUSCLE` alignment has one section of gaps, followed by a large portion towards the right that aligns well. The `MAFFT` alignment has two small sections of gaps in the middle, and within the large well-aligned portion on the right there is a small gap and a section that does not align very well
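The visual impression of which alignment has more gaps can be backed up with a rough count. This is a sketch only, assuming gaps are written as `-` in the sequence lines; `count_gaps` is a hypothetical helper name.

```shell
# Count gap characters ("-") in the sequence lines of an aligned fasta file.
# Header lines (starting with ">") are skipped so dashes in names are not counted.
count_gaps() {
    awk '!/^>/ { n += gsub(/-/, "") } END { print n + 0 }' "$1"
}
```

Running `count_gaps tssB_muscle.faa` and `count_gaps tssB_mafft.faa` gives one total per alignment. The totals say nothing about where the gaps fall, so the viewer is still needed for that.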
### Generate a gene tree from the MSAs
Make a phylogeny using approximate Maximum Likelihood with `FastTree`
Generate phylogenies from the alignments:
```
FastTree -lg < tssB_muscle.faa > tssB_muscle_ft.tre
FastTree -lg < tssB_mafft.faa > tssB_mafft_ft.tre
```
Import tree into an online tree viewer: https://itol.embl.de
* Click [Upload] and copy and paste the tree file content into the [Tree text] window
* Then click [Upload]
* Click on the circular view and look at the tree ignoring branch lengths
`MUSCLE` tree

`MAFFT` tree
