# LAB #7 PHYLOGENETICS We will use`andi` for whole genome phylogenies,`iqtree` for single gene phylogenies and `R` for visualizing and manipulating trees ## Exercise 1: Andi What `andi` does is identify long exact matches (called anchors) to use as homologous segments between two genomes. Then it looks at the mismatched regions bracketed by these anchors and estimates a distance matrix 1.To run andi we must have an input file for all our genomes that we want to build. a phylogeny with, (these files are profs) ``` ls /courses/bi278/Course_Materials/lab_07/*.*a > andi_inputs.txt ``` 2.Run `andi` ``` andi --file-of-filenames=andi_inputs.txt -t 2 --join > andi_output.dist ``` This output took a quicksecond, it compared 18 genomes ``` andi: For the two sequences 'b69_dnaa_irp_repolished' and 'baqs159_ncbi' very little homology was found (0.040889 and 0.089178, respectively). ``` -if we Later if you run `andi `on your own, note that it will ignore filenames after the first . (period). Chnage ilename sto pbonn_bqs346.fna ## Exercise 2: IQ-Tree for Gene Trees IQ-TREE (iqtree) is a powerful phylogenetics software that estimates trees for either nucleotide or protein (amino acid) alignments using maximum likelihood. Its most convenient feature is that it tests a large range of phylogenetic (e.g. substitution) models for you, then outputs a tree best of which model appears to be most appropriate for your data. Where I can see iqtree being useful for you is if you identified an interesting protein from your orthogroup analysis and wanted to look into in more detail by including additional proteins, either from additional BLASTs or from specific genomes (not your ingroup) in which a particular protein has been studied. 1.We will need to start with an MSA as input for `iqtree ` we will combine our Fasta sequences and genetare an MSA using mafft ``` cat /courses/bi278/Course_Materials/lab_07/ncbi_sensor_kinase.txt /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Orthogroup_Sequences/OG0002162.fa > OG0002162_plus_ncbi.fa mafft --genafpair --maxiterate 1000 OG0002162_plus_ncbi.fa > OG0002162_mafft.faa ``` 2.Now we can run iqtree ``` iqtree -s OG0002162_mafft.faa ``` Output will test all protein substitution models it has, pick best one for my data*?, do tree searching, and will print a detailed log screen eg:`496 mtVer+I 29980.748 50 60061.496 60067.125 60304.686` It then created the following files: ``` Analysis results written to: IQ-TREE report: OG0002162_mafft.faa.iqtree Maximum-likelihood tree: OG0002162_mafft.faa.treefile Likelihood distances: OG0002162_mafft.faa.mldist Screen log file: OG0002162_mafft.faa.log ``` to get a preview of what the tree looks like, go into ` OG0002162_mafft.faa.iqtree` ![Screenshot 2023-11-07 at 2.24.07 PM.png](https://hackmd.io/_uploads/r1qWyzum6.png) ## Exercise 3: Trees in `R` We can look a and compare our pakcages in R. The two main packages for phylogenetics in R are `ape: Analyses of Phylogenetics and Evolution` and `phytools`. To make our species trees ( we have our distance matrix)from `andi`:`andi_output.dist` We have our gene tree:`OG0002162_mafft.faa.treefile` 1.go into Rstudio 2.Load phylogenetics packages ``` library(ape) library(phytools) ``` 3. We need to take the distance matrix genetrated by `andi` and make neighbor-tree joining tree from it ``` andi.dist <- read.table("PATH/andi_output.dist", header=F, row.names=1, skip=1) colnames(andi.dist) <- rownames(andi.dist) # infer tree from distance matrix andi.tree <- nj(as.dist(andi.dist)) ``` 4. Visualize tree ``` plot(andi.tree) ``` ![Screenshot 2023-11-07 at 2.22.45 PM.png](https://hackmd.io/_uploads/BJt30Zd7a.png) 5. Clean up the labels of the tips, to make it easier to compare trees between methods ``` # first check what the tip labels are andi.tree$tip.label # substitute patterns that have to do with method or source temp <- gsub(andi.tree$tip.label, pattern="_nanoassembly", replacement="") temp2 <- gsub(temp, pattern="_ncbi", replacement="") # check what it looks like now temp2 # we can also directly substitute individual elements temp2[1] <- "bh69" # we can also directly substitute ranges of elements temp2[13:14] <- c("bmall", "bpseu") # last thing to clean up that was inconsistent across names temp3 <- gsub(temp2, pattern="qs", replacement="") # check what it looks like now temp3 # replace the original labels andi.tree$tip.label <- temp3 # clean up unnecessary files rm(temp, temp2, temp3) ``` 6. Now look at the tree ``` plot(andi.tree) nodelabels() ``` nodelabels put the numbers on each node, two outgroupd at node 34 ![Screenshot 2023-11-07 at 2.41.28 PM.png](https://hackmd.io/_uploads/HJiZQGuQa.png) 7. Root tree with outgroup node ``` rooted.andi <- root.phylo(andi.tree, node=34, resolve.root=T) plot(rooted.andi, cex=1, label.offset=0.005 ``` ![Screenshot 2023-11-07 at 2.45.04 PM.png](https://hackmd.io/_uploads/HJN0QMdX6.png) 8. Look at the tree outputted by orthfinder plot ortho tree:![Screenshot 2023-11-07 at 2.46.49 PM.png](https://hackmd.io/_uploads/r1cBNz_Xp.png) ``` ortho.tree <- read.tree("/courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Species_Tree/SpeciesTree_rooted.txt") plot(ortho.tree) # fix tip labels ortho.tree$tip.label temp <- gsub(ortho.tree$tip.label, pattern="_prokka", replacement="") temp2 <- gsub(temp, pattern="qs", replacement="") temp2[6:7] <- c("Bmallei", "Bpseudomallei") # check temp2 # replace labels ortho.tree$tip.label <- temp2 rm(temp, temp2) # root this tree too plot(ortho.tree) nodelabels() rooted.ortho <- root.phylo(ortho.tree, node=25, resolve.root=T) plot(rooted.ortho, cex=1, label.offset=0.005) ``` output:![Screenshot 2023-11-07 at 2.49.43 PM.png](https://hackmd.io/_uploads/SJClSfdXa.png) 9. Rotate nodes of this tree so tips match more closely thetips pf the tree from andi ``` rotate.ortho <- rotateConstr(rooted.ortho, rooted.andi$tip.label) plot(rotate.ortho) ``` 10. Compare how similar these gene trees are ``` # compare topology (branching pattern) all.equal(andi.tree, ortho.tree, use.edge.length=FALSE) # plot both trees side by side cophyloplot(rooted.andi, rotate.ortho) ``` ![Screenshot 2023-11-07 at 2.55.20 PM.png](https://hackmd.io/_uploads/HkCPUzdXa.png) 11.Look at gene tree ```# the tree you generated with IQ-TREE iq.tree <- read.tree("PATH/OG0000084_mafft.faa.treefile") # check and root this tree plot(iq.tree) nodelabels() rooted.iq <- root.phylo(iq.tree, node=33, resolve.root=T) plot(rooted.iq, cex=0.75, label.offset=0.005) ``` gave me error case nide shouod be greater than number of taxa node=64 ![Screenshot 2023-11-07 at 3.05.20 PM.png](https://hackmd.io/_uploads/Symquz_X6.png) 18.Lastly, you can save any figures that you created directly, but let’s also save our cleaned up species trees as text files in case you want to use them in the future. ``` write.tree(rooted.andi, file="PATH/andi_cleaned.tre") write.tree(rotate.iq, file="PATH/iq_sensorkinase.tre") ``` gave me an derror ofor my rotate.iq, says it was not found ## Exercise 4. Modify your tree with `dendextend` To use dendextend on trees you may have built with any phylogenetic package, you will need to convert the newick tree file format to a dendrogram format. The R package DECIPHER is able to do this. EG with gene tree First load packages ``` # install if not present library(DECIPHER) library(dendextend) ``` 2. Read in a tree then plot it ``` iq.dend <- ReadDendrogram("/home2/cibene26/lab_07/OG0002162_mafft.faa.treefile") plot(iq.dend) ``` ![Screenshot 2023-11-07 at 3.15.22 PM.png](https://hackmd.io/_uploads/rJRejfd7p.png) 3. Can adjust margins. to make more visisble ``` par(mar = c(8,4,2,2) + 0.1) #bottom, left, top, right; default is c(5,4,4,2) plot(iq.dend) ``` ![Screenshot 2023-11-07 at 3.16.51 PM.png](https://hackmd.io/_uploads/Hy78izdX6.png) 4.The order of taxa on the dendrogram is identical to what you see in the plot window ``` iq.dend %>% labels ``` dendextend is useful for chaging the visuals of your tree 5.You can color the leaves manually based on the label order you just checked,colored circles here: ``` iq.dend %>% set("leaves_pch", 19) %>% set("leaves_cex", 1) %>% set("leaves_col", c(rep(1,15),2,2,2,2,2,3,3,3,4,4,4,4,4,4,4,4,5,5)) %>% plot(type="rectangle") ``` OUTPUT;![Screenshot 2023-11-07 at 3.19.15 PM.png](https://hackmd.io/_uploads/Skek3z_Xa.png) 6. You can also turn these plots sideways, or change the circles to squares by changing the pch attribute: ``` par(mar = c(4,2,2,8)) q.dend %>% set("leaves_pch", 15) %>% set("leaves_cex", 1) %>% set("leaves_col", c(rep(1,15),2,2,2,2,2,3,3,3,4,4,4,4,4,4,4,4,5,5)) %>% plot(type="rectangle", horiz=TRUE) ``` ![Screenshot 2023-11-07 at 3.20.49 PM.png](https://hackmd.io/_uploads/HJUVhGd76.png) 7. Could also change colors by adding extra shapes ``` iq.dend %>% set("labels_colors", c(rep(1,15),2,2,2,2,2,3,3,3,4,4,4,4,4,4,4,4,5,5)) %>% plot(type="rectangle", horiz=TRUE) ``` [Screenshot 2023-11-07 at 3.21.41 PM.png](https://hackmd.io/_uploads/ryLO2GOQa.png) 8.Finally, this command sets the margins back to the default afterwards. ``` par(mar = c(5,4,4,2)) ```