# Lab 7: Phylogenetics `ssh` onto bi278 `mkdir labwk7` ## Exercise 1: andi (ANchor DIstances) "What andi does is identify long exact matches (called anchors) to use as homologous segments between two genomes. Then it looks at the mismatched regions bracketed by these anchors and estimates a distance matrix" 1. **Prepare an input file** This contains all of our genomes that we want to build a phylogeny with. `ls /courses/bi278/Course_Materials/lab_07/* > andi_inputs.txt` 2. **Next, run andi** `andi --file-of-filenames=andi_inputs.txt -t 2 --join > andi_output.dist` Note: andi will ignore filenames after the first period (.) so it might be needed to rename the files ## Exercise 2: IQ-TREE for phylogenetics "IQ-TREE is a powerful phylogenetics software that estimates trees for either nucleotide or protein (amino acid) alignments using maximum likelihood. Its most convenient feature is that it tests a large range of phylogenetic (e.g. substitution) models for you, then outputs a tree best of which model appears to be most appropriate for your data." 3. **Look at the orthogroups** These were rerun by orthofinder with the same taxa used for andi `cd /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa` 4. **Identify orthogroups** The goal is to have all the rows where orthologous groups share a common ancestor at node 0 (zero) and each genome has at least one gene member in the group. `awk -F "\t" '$3=="n0" && $4!="" && $5!="" && $6!="" && $7!="" && $8!="" && $9!="" && $10!="" && $11!="" && $12!="" && $13!="" && $14!="" && $15!="" && $16!="" && $17!="" && $18!="" && $19!="" && $20!="" && $21!=""' /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Phylogenetic_Hierarchical_Orthogroups/N0.tsv > 18taxa_coregenes.txt` I got a result of 2256 matching orthogroups (although I'm not sure where this number came from) 5. **Identify # genes in orthogroup** If there are only single copy genes, then the number of genes should be the same as the number of genomes included in the orthofinder analysis (18). `sed 's/, /\t/g' 18taxa_coregenes.txt | awk '{print $1, NF-1}'` 6. **Choose one to investigate in more detail** In this case, I chose HOG0000412. We want to compare the gene tree to the species tree. 7. **Copy amino acid sequence** I don't know if this was necessary ornot but I copied the amino acid sequence of the orthogroup I decided to investigate into my labwk7 directory `cp /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Orthogroup_Sequences/OG0000412.fa .` 8. **Generate a MSA (Multiple Sequence Alignment)** Use mafft `mafft --maxiterate 3 /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Orthogroup_Sequences/OG0000412.fa > OG0000412_mafft.faa` 9. **Input iqtree** Use the MSA as input for iqtree `iqtree -s OG0000412_mafft.faa` This will test all the protein substitution models it has and print a detailed log to the screen. Full tree: OG0000084_mafft.faa.treefile Preview: OG0000084_mafft.faa.iqtree ## Exercise 3: Trees in R 10. **Copy the Orthofinder trees** I already have the IQ-tree from exercise 2, but now we want to copy the trees that orthofinder generates into our working directory so we have all the trees in the same place `cp /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Gene_Trees/OG0000412_tree.txt .` (gene tree) and `cp /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Species_Tree/SpeciesTree_rooted.txt .` (species tree) 11. **Enter R studio** Note on R: All of the following commands can be entered into the script section (upper left) on the RStudio view and executed from this same section by leaving your cursor anywhere in the line you want to run, then typing [command + return] or manually hitting the Run button 12. **Load the Phylogenetics packages** `library(ape)` and `library(phytools)` 13. **Make a neighbor-joining tree** Using the distance matrix generated by andi a. `andi.dist <- read.table("/home2/slpuro26/labwk7/andi_output.dist", header=F, row.names=1, skip=1)` b. `colnames(andi.dist) <- rownames(andi.dist)` c. ``# infer tree from distance matrix andi.tree <- nj(as.dist(andi.dist))`` 14. **Look at the tree** `plot(andi.tree)` 15. **Clean up the labels of the tips** a. first check what the tip labels are `andi.tree$tip.label` b1. substitute patterns that have to do with method or source `temp <- gsub(andi.tree$tip.label, pattern="_nanoassembly", replacement="")` b2.`temp2 <- gsub(temp, pattern="_ncbi", replacement="")` c. directly substitute individual elements `temp2[1] <- "bh69"` d. directly substitute ranges of elements `temp2[15:18] <- c("Pfungorum", "Psprentiae", "Pterrae", "Pxenovorans")` e. clean up what was inconsistent across names `temp3 <- gsub(temp2, pattern="qs", replacement="")` f. replace the original labels `andi.tree$tip.label <- temp3` g. clean up unnecessary files `rm(temp, temp2, temp3)` 16. **Look at the treeagain with node numbers** `plot(andi.tree)` `nodelabels()` ![](https://i.imgur.com/j2M30R6.png) 17. **Root tree with the outgroup node** then look at it again: a. `rooted.andi <- root.phylo(andi.tree, node=34, resolve.root=T)` b. `plot(rooted.andi, cex=1, label.offset=0.005)` ![](https://i.imgur.com/PuZtvoY.png) 18. **Look at orthofinder species tree** a. `ortho.tree <- read.tree("/home2/slpuro26/labwk7/SpeciesTree_rooted.txt")` b. `plot(ortho.tree)` c1. fix tip labels `ortho.tree$tip.label` c2. `temp <- gsub(ortho.tree$tip.label, pattern="_prokka", replacement="")` c3. `temp2 <- gsub(temp, pattern="qs", replacement="")` c4. `ortho.tree$tip.label <- temp2` c5. `rm(temp, temp2)` d1. root this tree too `plot(ortho.tree)` d2. `nodelabels()` d3. `rooted.ortho <- root.phylo(ortho.tree, node=25, resolve.root=T)` d4. `plot(rooted.ortho, cex=1, label.offset=0.005)` ![](https://i.imgur.com/GIA8gAg.png) 19. **Rotate the nodes** So the tips match more closely the tips of the tree from andi a1. `rotate.ortho <- rotateConstr(rooted.ortho, rooted.andi$tip.label)` a2. `plot(rotate.ortho)` ![](https://i.imgur.com/EGCA1hd.png) 20. **Compare how similar the trees are.** a. compare topology (branching pattern) `all.equal(andi.tree, ortho.tree, use.edge.length=FALSE)` b. plot both trees side by side `cophyloplot(rooted.andi, rotate.ortho)` ![](https://i.imgur.com/TZe9hMp.png) 21. **Comapre gene trees** a. the tree I generated with IQ-TREE `iq.tree <- read.tree("/home2/slpuro26/labwk7/OG0000412_mafft.faa.treefile")` b. check and root this tree b1. `plot(iq.tree)` b2. `nodelabels()` b3. `rooted.iq84 <- root.phylo(iq.tree, node=33, resolve.root=T)` b4. `plot(rooted.iq84, cex=0.75, label.offset=0.005)` c. the gene tree from orthofinder `gene.tree <- read.tree("/home2/slpuro26/labwk7/OG0000412_tree.txt")` d. check and root this tree too d1. `plot(gene.tree)` d2. `nodelabels()` d3. `rooted.or84 <- root.phylo(gene.tree, node=28, resolve.root=T)` d4. `plot(rooted.or84, cex=0.75, label.offset=0.005)` e. rotate and compare e1. `rotate.iq84 <- rotateConstr(rooted.iq84, rooted.or84$tip.label)` e2. `all.equal(rooted.or84, rooted.iq84, use.edge.length=FALSE)` e3. `cophyloplot(rooted.or84, rotate.iq84)` ![](https://i.imgur.com/vmO17Vp.png)