# BI278 Lab 4
#### Olivia Schirle
#### 10/04/2022
## 1. Generate an MSA (Multiple Sequence Alignment)
#### MUSCLE and MAFFT are two popular multiple sequence alignment programs.
##### MUSCLE: https://drive5.com/muscle5/manual/commands.html
##### MAFFT: https://mafft.cbrc.jp/alignment/software/algorithms/algorithms.html
### 1.1 Find the IDs of all tssB sequences in the genome annotated with Prokka last week
```
ssh oeschi23@bi278 # Connect to course environment
grep tssB ~/lab_03/bbqs395/PROKKA*.tsv # Did not get any results with my tsv file
mkdir lab_04
cp /courses/bi278/Course_Materials/lab_04/PROKKA* ./lab_04 # Copy files from Professor Noh into this week's lab folder
grep tssB /courses/bi278/Course_Materials/lab_04/PROKKA*.tsv # Use the annotated files provided by Professor Noh
```
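The `grep` hits are whole table rows, but the next steps only need the IDs. Here is a minimal sketch, assuming Prokka's `.tsv` is tab-separated with `locus_tag` as its first column. The file contents and `TOY_` IDs are hypothetical stand-ins, not real Prokka output.

```
#!/usr/bin/env bash
# Sketch: collect locus tags of tssB hits from a Prokka-style .tsv.
# Assumption: tab-separated, locus_tag in column 1. The rows below are hypothetical.
tsv=$(mktemp)
trap 'rm -f "$tsv"' EXIT
printf 'locus_tag\tftype\tlength_bp\tgene\tEC_number\tCOG\tproduct\n' > "$tsv"
printf 'TOY_00001\tCDS\t510\ttssB\t\t\tsheath protein\n' >> "$tsv"
printf 'TOY_00002\tCDS\t900\tdnaA\t\t\treplication initiator\n' >> "$tsv"
printf 'TOY_00003\tCDS\t507\ttssB_2\t\t\tsheath protein\n' >> "$tsv"

# grep keeps matching rows; cut -f1 keeps the locus_tag column;
# paste joins the tags with spaces so they can follow "samtools faidx file.faa".
ids=$(grep tssB "$tsv" | cut -f1 | paste -s -d ' ' -)
echo "$ids"
```

With the real file, the same pipeline would be `grep tssB PROKKA*.tsv | cut -f1 | paste -s -d ' ' -`.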
### 1.2 Get the FASTA-format sequences for these genes. First, index the FASTA file.
##### For protein-coding genes, amino acid sequences are almost always better to align than DNA sequences, mainly because they are more conserved: synonymous codon changes alter the DNA sequence but not the protein.
```
cd lab_04
samtools faidx PROKKA*.faa
```
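`samtools faidx` writes an index next to the FASTA file (`.fai`). Its first two tab-separated columns are each sequence's name (the first word of its header) and its length. The sketch below computes only those two columns with `awk`, which can help you check that the tssB proteins have similar lengths. It does not reproduce the byte-offset and line-width columns that samtools also stores, and the toy file is hypothetical.

```
#!/usr/bin/env bash
# Sketch: print NAME<TAB>LENGTH for each FASTA record, like the first two .fai columns.
fasta_lengths() {
  awk '
    /^>/ { if (id != "") print id "\t" len     # finish the previous record
           id = substr($1, 2); len = 0; next }  # name = first word of header
    { len += length($0) }                      # add residues on this line
    END { if (id != "") print id "\t" len }
  ' "$1"
}

faa=$(mktemp)
trap 'rm -f "$faa"' EXIT
printf '>TOY_00001 tssB\nMSKEG\nSV\n>TOY_00002 dnaA\nMAE\n' > "$faa"   # hypothetical records
fasta_lengths "$faa"
```

With the real index, `cut -f1,2 PROKKA*.faa.fai` shows the same two columns.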
### 1.3 Get the FASTA-format sequences of the genes identified as tssB
```
samtools faidx PROKKA*.faa BH69_01414 BH69_02561 BH69_03639
```
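`samtools faidx` uses the `.fai` index to jump straight to each requested record. If samtools is not available, a plain `awk` filter can pull out the same records by scanning the whole file. This is a sketch, not how samtools works internally, and its output may differ from samtools in header text, record order and line wrapping. The toy file and IDs are hypothetical.

```
#!/usr/bin/env bash
# Sketch: extract FASTA records by ID with awk (linear scan, no index).
# The toy file and IDs below are hypothetical.
faa=$(mktemp)
trap 'rm -f "$faa"' EXIT
cat > "$faa" <<'EOF'
>TOY_00001 tssB
MSKEGSVAPK
ERIN
>TOY_00002 dnaA
MSLSLWQQC
>TOY_00003 tssB
MAEGFQNE
EOF

extract_ids() {
  # $1 = FASTA file, remaining args = record IDs (first word of each header)
  local fasta=$1; shift
  awk -v want="$*" '
    BEGIN { n = split(want, w, " "); for (i = 1; i <= n; i++) keep[w[i]] = 1 }
    /^>/ { id = substr($1, 2); printing = (id in keep) }   # new record: keep it?
    printing                                               # print header and sequence lines
  ' "$fasta"
}

extract_ids "$faa" TOY_00001 TOY_00003
```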
### 1.4 Save this output to a file
```
samtools faidx PROKKA*.faa BH69_01414 BH69_02561 BH69_03639 > bh69_tssB.faa
```
### 1.5 Concatenate the tssB sequences with those downloaded from the T6SS database
```
cp /courses/bi278/Course_Materials/lab_04/t6ss_db.faa . # Copy the sequences downloaded from the T6SS database
cat t6ss_db.faa bh69_tssB.faa > tssB_input.faa # Concatenate database and BH69 sequences into one input file
```
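A quick check that `cat` kept every record: the header count of the combined file should equal the sum of the two inputs. This is a sketch with hypothetical stand-in files, not the real `t6ss_db.faa` and `bh69_tssB.faa`.

```
#!/usr/bin/env bash
# Sketch: confirm that concatenation kept every record by counting headers.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '>db_1\nMAAA\n>db_2\nMCCC\n' > "$dir/db.faa"    # stand-in for t6ss_db.faa
printf '>TOY_00001\nMGGG\n' > "$dir/mine.faa"          # stand-in for bh69_tssB.faa
cat "$dir/db.faa" "$dir/mine.faa" > "$dir/combined.faa"

n_db=$(grep -c '^>' "$dir/db.faa")
n_mine=$(grep -c '^>' "$dir/mine.faa")
n_all=$(grep -c '^>' "$dir/combined.faa")
echo "db=$n_db mine=$n_mine combined=$n_all"
[ "$n_all" -eq $((n_db + n_mine)) ] && echo "OK: no records lost"
```

If the first file lacked a final newline, its last sequence line and the next header would be glued together. The combined count would then come out one short, and this check would catch it.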
### 1.6 Align all sequences with MUSCLE
```
muscle -align tssB_input.faa -output tssB_muscle.afa
```
### 1.7 Check MUSCLE's behavior: it changes the order of the sequences in its alignment
```
grep ">" tssB_input.faa | head # First 10 headers, in input order
grep ">" tssB_muscle.afa | head # First 10 headers, in MUSCLE's output order
```
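`head` only shows the first ten headers. To compare the full lists, you can extract the headers from both files and check whether they match exactly, or only after sorting. This is a sketch with hypothetical toy files. With the real data, the two files would be `tssB_input.faa` and `tssB_muscle.afa`.

```
#!/usr/bin/env bash
# Sketch: compare FASTA header order between an input file and an alignment.
# The toy files below are hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '>a\nMK\n>b\nMR\n>c\nMQ\n' > "$dir/input.faa"
printf '>c\nMQ\n>a\nMK\n>b\nMR\n' > "$dir/aligned.afa"   # same records, new order

grep '^>' "$dir/input.faa"   > "$dir/in.hdr"
grep '^>' "$dir/aligned.afa" > "$dir/aln.hdr"
sort "$dir/in.hdr"  > "$dir/in.sorted"
sort "$dir/aln.hdr" > "$dir/aln.sorted"

if cmp -s "$dir/in.hdr" "$dir/aln.hdr"; then
  status="same order"                          # identical header lists
elif cmp -s "$dir/in.sorted" "$dir/aln.sorted"; then
  status="same sequences, different order"     # same set, reordered
else
  status="different sets of sequences"         # something added or lost
fi
echo "$status"
```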
### 1.8 Use a Python script to reorder the sequences
```
cp /courses/bi278/Course_Materials/lab_04/stable_py3.py .
python stable_py3.py tssB_input.faa tssB_muscle.afa | grep ">" | head # Preview: headers should now follow the input order
```
### 1.9 Fix the order and save as a new file
```
python stable_py3.py tssB_input.faa tssB_muscle.afa > tssB_muscle.faa
```
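The idea behind this reordering step can be sketched with `awk`. First read the header order from the input file, then store each aligned record, then print the aligned records in input order. This is an illustration only, not the course's `stable_py3.py`. It assumes the first word of each header is a unique ID, and the toy data is hypothetical.

```
#!/usr/bin/env bash
# Sketch: reorder aligned FASTA records to match the order of an input FASTA.
# Assumes the first word of each header is a unique ID. Toy data is hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '>a x\nMKV\n>b y\nMR\n>c z\nMQ\n' > "$dir/input.faa"
printf '>c z\nMQ-\n>a x\nMKV\n>b y\nM-R\n' > "$dir/aligned.afa"

reorder_like() {
  # $1 = input FASTA (defines the order), $2 = aligned FASTA (records to print)
  awk '
    NR == FNR { if (/^>/) order[++n] = substr($1, 2); next }  # file 1: record ID order
    /^>/ { id = substr($1, 2); hdr[id] = $0; next }          # file 2: remember header
    { seq[id] = seq[id] $0 "\n" }                            # ...and its sequence lines
    END { for (i = 1; i <= n; i++) if (order[i] in hdr) printf "%s\n%s", hdr[order[i]], seq[order[i]] }
  ' "$1" "$2"
}

reorder_like "$dir/input.faa" "$dir/aligned.afa"
```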
### 1.10 Align the sequences with MAFFT
```
mafft --maxiterate 3 tssB_input.faa > tssB_mafft.faa
```
#### Make a codon alignment from the protein alignment:
```
# protein_alignment.faa and nucleotide.fna are placeholder file names
# Note: I did not actually run this
perl pal2nal.pl protein_alignment.faa nucleotide.fna -output fasta -codontable 11
```
#### `-codontable 11` selects the bacterial (also archaeal and plant plastid) genetic code
#### More information about PAL2NAL and the codon tables it supports: http://www.bork.embl.de/pal2nal/
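At its core, a codon alignment threads each nucleotide sequence through its aligned protein: every amino acid is replaced by its codon and every gap `-` by `---`. The `awk` sketch below shows only that idea for one sequence pair. It is not PAL2NAL and skips PAL2NAL's checks (reading frame, stop codons, codon-table validation, mismatches). The sequences are made up, and it assumes the nucleotide CDS matches the ungapped protein exactly.

```
#!/usr/bin/env bash
# Sketch of the core idea behind a codon alignment (not PAL2NAL itself):
# replace each aligned amino acid with its codon and each gap with '---'.
# Assumes the nucleotide CDS matches the ungapped protein exactly (toy data).
thread_codons() {
  # $1 = gapped protein string, $2 = nucleotide CDS string
  awk -v prot="$1" -v nuc="$2" 'BEGIN {
    pos = 1; out = ""
    for (i = 1; i <= length(prot); i++) {
      aa = substr(prot, i, 1)
      if (aa == "-") out = out "---"                       # protein gap -> codon gap
      else { out = out substr(nuc, pos, 3); pos += 3 }     # next codon for this residue
    }
    print out
  }'
}

thread_codons "M-KV" "ATGAAAGTT"   # prints ATG---AAAGTT
```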
## 2. Compare MSAs
##### Use NCBI's Multiple Sequence Alignment Viewer to compare the two alignments.
### Open the link: http://www.ncbi.nlm.nih.gov/projects/msaviewer/?appname=ncbi_msav&openuploaddialog
### Print the MUSCLE alignment to the screen
```
cat tssB_muscle.faa
```
### 2.11 Choose "Text" as the data source, then copy and paste the alignment into the window.
### 2.12 Click "Upload". When the data has been uploaded, click "Close".
### 2.13 Open a second window and load the MAFFT alignment the same way
```
cat tssB_mafft.faa
```
#### Q1.
Both alignments have two long, well-aligned stretches separated by a gapped region in the middle. Both also have large gaps toward the ends of the sequences, although the very ends of most sequences still align in both.
#### Q2.
The MUSCLE alignment has larger gaps at the starts and ends of the sequences than the MAFFT alignment, and a bigger gap in the middle. MAFFT places short aligned blocks in the middle, between the two long well-aligned stretches, which splits the middle gap into two smaller gaps. The sequences are shifted to the left in the MAFFT alignment compared to the MUSCLE alignment, and the MAFFT alignment is shorter overall. In short, the MUSCLE alignment has more gaps, whereas the MAFFT alignment is more condensed.
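The visual comparison in Q2 (MUSCLE has more and larger gaps, MAFFT is more condensed) could also be checked numerically. You can measure each alignment's length and the percentage of characters that are gaps. This is a sketch that assumes aligned FASTA where every record has the same length. The toy alignments are made up; with the real data, run it on `tssB_muscle.faa` and `tssB_mafft.faa`.

```
#!/usr/bin/env bash
# Sketch: report number of sequences, alignment length, and percent gap characters.
# Alignment length is taken from the first record (assumes all records are equal length).
aln_stats() {
  awk '
    /^>/ { nseq++; next }                      # count records
    { seqlen[nseq] += length($0)               # aligned length of this record
      total += length($0)                      # all characters seen
      gaps += gsub(/-/, "") }                  # gap characters on this line
    END { printf "sequences=%d length=%d gaps=%.1f%%\n", nseq, seqlen[1], 100 * gaps / total }
  ' "$1"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '>a\nMK-V--\n>b\nM-RV--\n' > "$dir/aln1.faa"   # hypothetical gappy alignment
printf '>a\nMKV\n>b\nMRV\n' > "$dir/aln2.faa"         # hypothetical gap-free alignment
aln_stats "$dir/aln1.faa"
aln_stats "$dir/aln2.faa"
```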
## 3. Generate a gene tree from your MSA
### 3.14 Generate phylogenies from your alignments
```
FastTree -lg < tssB_muscle.faa > tssB_muscle_ft.tre # -lg: use the LG amino acid substitution model
FastTree -lg < tssB_mafft.faa > tssB_mafft_ft.tre
```
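Before uploading, a quick sanity check is that each tree has one tip per aligned sequence. In a Newick string, each internal node with k children contributes k - 1 commas, so the whole tree has one fewer comma than tips. That means tips = commas + 1, assuming no commas inside taxon labels. The toy tree and alignment are made up.

```
#!/usr/bin/env bash
# Sketch: check that a Newick tree has one tip per sequence in the alignment.
# Tips = commas + 1 (assumes no commas inside taxon labels). Toy data is hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '>a\nMK-\n>b\nM-R\n>c\nMKR\n>d\nMQR\n' > "$dir/toy.afa"
printf '((a:0.1,b:0.2)0.95:0.05,c:0.3,d:0.4);\n' > "$dir/toy.tre"

n_seq=$(grep -c '^>' "$dir/toy.afa")
n_tip=$(( $(tr -cd ',' < "$dir/toy.tre" | wc -c) + 1 ))   # count commas, add one
echo "sequences=$n_seq tips=$n_tip"
```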
### 3.15 Import the trees into an online viewer
#### Go to: https://itol.embl.de/
```
cat tssB_muscle_ft.tre
cat tssB_mafft_ft.tre
```
#### Click "Upload" in the top bar, paste the tree file contents into the "Tree text" box, and click the "Upload" button.
### 3.16 View the tree in circular mode and ignore branch lengths