# Notebook 4 Genomic Lab

`grep tssB PATH/PROKKA*.tsv`

Search the Prokka tsv file for tssB. The result is several lines, one for each gene annotated as tssB.

`samtools faidx PATH/PROKKA*.faa`

Use samtools to index the faa file.

`samtools faidx PATH/PROKKA*.faa BH69_01414 BH69_02561 BH69_03639`

Use samtools to print the protein sequences of the genes found in the tsv file.

`samtools faidx PATH/PROKKA*.faa BH69_01414 BH69_02561 BH69_03639 > bh69_tssB.faa`

Same command, with the sequences redirected into a newly created file.

`cat t6ss_db.faa bh69_tssB.faa > tssB_input.faa`

Concatenate the downloaded T6SS database with the tssB gene sequences.

`muscle -align tssB_input.faa -output tssB_muscle.afa`

Align all sequences with muscle. Since muscle changes the order of the sequences, the order needs to be corrected afterwards.

`grep ">" tssB_input.faa | head`

`grep ">" tssB_muscle.afa | head`

Check what happened to the order.

`python stable_py3.py tssB_input.faa tssB_muscle.afa > tssB_muscle.faa`

The python script puts the aligned sequences back in the input order.

`mafft --maxiterate 3 tssB_input.faa > tssB_mafft.faa`

Another alignment method, called mafft.

`perl pal2nal.pl protein_alignment.faa nucleotide.fna -output fasta -codontable 11`

Build a codon alignment from the protein alignment and the matching nucleotide sequences.

## Compare alignments:

http://www.ncbi.nlm.nih.gov/projects/msaviewer/?appname=ncbi_msav&openuploaddialog

Open this website. Use the cat command to print the contents of an alignment file, then copy and paste them into the text tab on the website and upload. The alignment appears after the upload dialog is closed. Do the same for the other alignment and compare.

#### Q1. Describe at least 2 major similarities between your MUSCLE vs. MAFFT alignments. What would you assume about the regions that are similar across your alignments?

One similarity is that they show similar patterns in the Shapely amino acid colors. Pattern here means that the gap lengths, the number of gaps, and the segment lengths are similar.
The other similarity is that, in general, the parts that are aligned in one alignment are also aligned in the other. For the regions that are similar across alignments, I think the sequences there are probably very similar to each other, which is why different alignment methods give similar results for them.

#### Q2. Describe at least 3 major differences between your MUSCLE vs. MAFFT alignments. Focus on how the starts and ends of your sequences are treated, and where gaps are introduced to make the alignment work across all your sequences.

One difference is that they show different numbers of residues: one has 227 and the other has 244.

The second difference is that the muscle alignment has many short individual pieces at the starts of the sequences, while the mafft alignment has longer, more intact pieces.

The third difference is that the muscle alignment has a longer gap between segments at the ends of the sequences than the mafft alignment.

## Generate Gene Tree

`FastTree -lg < tssB_muscle.faa > tssB_muscle_ft.tre`

`FastTree -lg < tssB_mafft.faa > tssB_mafft_ft.tre`

Generate phylogenies from the alignments.

Then use itol.embl.de (an online tree viewer) to check the trees produced. Click upload in the top bar of the website, copy and paste the content of a tree file into the text entry, then upload. After the tree loads, select circular and ignore branch lengths to make it easier to read.
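
The reordering step and the counts compared in Q2 could also be checked with a short script. This is only a minimal sketch: it is not the code of `stable_py3.py`, the function names are made up for illustration, and the toy sequences are invented, not real tssB data. It assumes sequence IDs are the first word of each header and are unique.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Return (header, sequence) pairs in the order they appear."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def reorder_like_input(input_records: List[Record],
                       aligned_records: List[Record]) -> List[Record]:
    """Put aligned records back into the order of the unaligned input."""
    by_id: Dict[str, Record] = {h.split()[0]: (h, s) for h, s in aligned_records}
    return [by_id[h.split()[0]] for h, _ in input_records]


def alignment_summary(aligned_records: List[Record]) -> Dict[str, float]:
    """Count alignment columns and gap characters ('-')."""
    lengths = {len(s) for _, s in aligned_records}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    columns = lengths.pop()
    gaps = sum(s.count("-") for _, s in aligned_records)
    total = columns * len(aligned_records)
    return {"columns": columns, "gaps": gaps, "gap_fraction": gaps / total}


# Toy data (invented): the aligner returned the sequences in a new order.
toy_input = ">seqA\nMKTAY\n>seqB\nMKAY\n>seqC\nMTAY\n"
toy_aligned = ">seqC\nM-TAY\n>seqA\nMKTAY\n>seqB\nMK-AY\n"

reordered = reorder_like_input(parse_fasta(toy_input), parse_fasta(toy_aligned))
summary = alignment_summary(reordered)

print([h for h, _ in reordered])  # ['seqA', 'seqB', 'seqC']
print(summary["columns"], summary["gaps"])  # 5 2
```

To try it on the real files, read `tssB_input.faa` and each alignment file with `open(...).read()` and pass the text to `parse_fasta`. Running `alignment_summary` on both alignments gives the column and gap counts to compare side by side.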