# BI278 Lab notebook 7 ### andi (ANchor DIstances) andi identifies long exact matches (called anchors) to use as homologous segments between two genomes. Then it looks at the mismatched regions bracketed by these anchors and estimates a distance matrix. It is ideally used for very closely related genomes (strains within the same species and closely related species within the same genus). Make directory and make it the working directory- lab_07 Prepare an input file that contains all of our genomes that we want to build a phylogeny with. Use the input files in the Course_Materials folder `ls /courses/bi278/Course_Materials/lab_07/*.*a > andi_inputs.txt` Run andi ``` andi --file-of-filenames=andi_inputs.txt -t 2 --join > andi_output.dist ``` andi will ignore filenames after the first “.”(period), so rename your files accordingly #### IQ-TREE for phylogenies IQ-TREE is a powerful phylogenetics software that estimates trees for either nucleotide or protein (amino acid) alignments using maximum likelihood. Its most convenient feature is that it tests a large range of phylogenetic (e.g. substitution) models for you, then outputs a tree best of which model appears to be most appropriate for your data. Ran orthofinder with the same taxa used for andi. These files are in: `/courses/bi278/Course_Materials/lab_07/orthofinder_18taxa` Identify orthogroups with genes present from each genome. Want the rows where orthologous groups share a common ancestor at node 0 (zero) and each genome has at least one gene member in the group. ``` awk -F "\t" '$3=="n0" && $4!="" && $5!="" && $6!="" && $7!="" && $8!="" && $9!="" && $10!="" && $16!="" &&$17!="" && $18!="" && $19!="" && $20!="" && $21!=""' /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Phylogenetic_Hierarchical_Orthogroups/N0.tsv > 18taxa_coregenes.txt ``` See how many genes are included in each orthogroup from above. If there are only single copy genes, then the number of ,genes should be the same as the number of genomes included in the orthofinder analysis (18). `sed 's/, /\t/g' 18taxa_coregenes.txt | awk '{print $1, NF-1}' ` ##### Investigatee OG0000084 The amino acid sequences for all members of this orthogroup have been collected here: `/courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Orthogroup_Sequences/OG0000084.fa` Generate a multiple sequence alignment (MSA) using `MAFFT` ``` mafft --maxiterate 3 /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Orthogroup_Sequences/OG0000084.fa > OG0000084_mafft.faa ``` Use the MSA as input for `iqtree` `iqtree -s OG0000084_mafft.faa` It will test all the protein substitution models it has, pick the best one for my data, do tree searching, etc, and will print a detailed log to the screen. The tree is in this file: `OG0000084_mafft.faa.treefile` See a preview of the tree along with a bunch of other information regarding the best protein substitution model, in the file: `OG0000084_mafft.faa.iqtree` ### Trees in R Look at and comapre the trees in R The two main packages for phylogenetics in R are ape: `Analyses of Phylogenetics and Evolution and phytools`. Copy the trees generated by `orthofinder` into working directory: ``` cp /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Gene_Trees/OG0000084_tree.txt . cp /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Species_Tree/SpeciesTree_rooted.txt . ``` Log on to RStudio at: [bi278.colby.edu](https://bi278.colby.edu) Load our phylogenetics packages. `library(ape)` `library(phytools)` Take the distance matrix generated by andi and make a neighbor-joining tree from it: ``` andi.dist <- read.table("/home2/tywang23/lab_07/andi_output.dist", header=F, row.names = 1, skip=1) colnames(andi.dist) <- colnames(andi.dist) <- rownames(andi.dist) ``` Infer tree from distance matrix: `andi.tree <- nj(as.dist(andi.dist))` Look at the tree: `plot(andi.tree)` ![](https://i.imgur.com/6GuT7w0.png) * Check what tip labels are: `andi.tree$tip.label` * Substitute patterns that have to do with method or source ``` temp <- gsub(andi.tree$tip.label, pattern = "_nanoassembly", replacement = " ") > temp2 <- gsub(temp, pattern="_ncbi", replacement = " ") ``` * Directly substitute individual elements: `temp2[1]<- "bh69"` * Can directly substitute ranges of elements: `temp2[15:18] <- c("Pfungorum", "Psprentiae", "Pterrae", "Pxenovarans")` * last thing to clean up that was inconsistent across names `temp3 <- gsub(temp2, pattern="qs", replacement="")` * Replace the original labels `andi.tree$tip.label <- temp3` * Clean up unnecessary files `rm(temp, temp2, temp3)` * Look at the tree again with node numbers: ``` plot(andi.tree) nodelabels() ``` ![](https://i.imgur.com/gQAV3Lb.png) * Two outgroups *B. mallei* and *B. pseudomallei* are joined at the same node * Root the tree with the outgroup node, then look at it again: ``` rooted.andi <- root.phylo(andi.tree, node=34, resolve.root=T) plot(rooted.andi, cex=1, label.offset=0.005) ``` ![](https://i.imgur.com/4ROdmUZ.png) **Look at the species tree outputted by `orthofinder`:** ``` ortho.tree <- read.tree("/courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Species_Tree/SpeciesTree_rooted.txt") plot(ortho.tree) ``` ![](https://i.imgur.com/aORPFZ2.png) * fix tip labels: ``` ortho.tree$tip.label temp <- gsub(ortho.tree$tip.label, pattern="_prokka", replacement="") temp2 <- gsub(temp, pattern="qs", replacement="") ortho.tree$tip.label <- temp2 rm(temp, temp2) ``` * Root the tree ``` plot(ortho.tree) nodelabels() ``` ![](https://i.imgur.com/37abRR7.png) ``` rooted.ortho <- root.phylo(ortho.tree, node=25, resolve.root=T) plot(rooted.ortho, cex=1, label.offset=0.005) ``` ![](https://i.imgur.com/ozHgqbD.png) * Rotate the nodes of this tree so the tips match more closely the tips of the tree from andi ``` rotate.ortho <- rotateConstr(rooted.ortho, rooted.andi$tip.label) plot(rotate.ortho) ``` ![](https://i.imgur.com/I5FbmGG.png) **Compare how similar these trees are** * Compare topology (branching pattern) `all.equal(andi.tree, ortho.tree, use.edge.length=FALSE)` * Plot both trees side by side `cophyloplot(rooted.andi, rotate.ortho)` (It is returning some errors and not properly rotating) ![](https://i.imgur.com/YJWIGqB.png) The trees are not in the same topo;ogy so it is not matching. **Compare gene trees** * the tree I generated with IQ-TREE `iq.tree <- read.tree("/home2/tywang23/lab_07/OG0000084_mafft.faa.treefile")` * check and root this tree ``` plot(iq.tree) nodelabels() ``` ![](https://i.imgur.com/F8Htexk.png) ``` rooted.iq84 <- root.phylo(iq.tree, node=33, resolve.root=T) plot(rooted.iq84, cex=0.75, label.offset=0.005) ``` ![](https://i.imgur.com/zx7QqOU.png) * the gene tree from orthofinder `gene.tree <- read.tree("/courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Gene_Trees/OG0000084_tree.txt")` * check and root this tree too ``` plot(gene.tree) nodelabels() ``` ![](https://i.imgur.com/1ufscow.png) ``` rooted.or84 <- root.phylo(gene.tree, node=55, resolve.root=T) plot(rooted.or84, cex=0.75, label.offset=0.005) ``` ![](https://i.imgur.com/1eK57zS.png) * Rotate and compare ``` rotate.iq84 <- rotateConstr(rooted.iq84, rooted.or84$tip.label) all.equal(rooted.or84, rooted.iq84, use.edge.length=FALSE) cophyloplot(rooted.or84, rotate.iq84) ``` ![](https://i.imgur.com/koJTkyn.png) The trees are similar but not the same. Save the cleaned up species trees as text files in case you want to use it in the future. ``` write.tree(rooted.andi, file="PATH/andi_cleaned.tre") write.tree(rotate.ortho, file="PATH/ortho_cleaned.tre") ```