# Bi278 Lab 7 Phylogenetics ### By Lee Ferenc 11/7/2023 * Whole Genome phyolgenies: `andi` * Single Genes: `iqtree` * Visualizing and manipulating trees: `R` ## 1. `andi` (***An***chor ***Di***stances) Alightment-free method of genearating a phylogeny. `andi` works by idenitifying long exact matches (anchors) to use as homologous segemnets between two genomes. Then it looks at the mismatches regions and estimates a distance matrix. ### Step 0: Make a directory ### Step 1: Prepare an input files that contains all of the **genomes** to build a phylogeny with. ``` ls /courses/bi278/Course_Materials/lab_07/*.*a > andi_inputs.txt ``` ### Step 2: Run `andi` ``` andi --file-of-filenames=andi_inputs.txt -t 2 --join > andi_output.dist ``` Note that it will ignore the file names after the first `.` so be wary if you use `.` and rename the files with an `_` or something else. My example: ``` andi --file-of-filenames=andi_Kinputs.txt -t 2 --join > /home2/enfere24/lab_07/andi_output.txt ``` ## 2. IQ-TREE It estimates trees from either nucleotide or protein alignments using maximum likelihood. ### Step 1: Get a MSA Lab example: (it combines two different fasta squences and then makes an MSA using `mafft` (see lab 4)) ``` cat /courses/bi278/Course_Materials/lab_07/ncbi_sensor_kinase.txt /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Orthogroup_Sequences/OG0002162.fa > OG0002162_plus_ncbi.fa mafft --genafpair --maxiterate 1000 OG0002162_plus_ncbi.fa > OG0002162_mafft.faa ``` ###### Here I decided to use the lab example due to it being annoying grab the specific locations of proteins from the annotated genomes. I'd like to take a gene that's across every strain. I reran the lab data for andi as well. ### Step 2: Run `iqtree` ``` iqtree -s FILE.faa ``` The output file has `.treefile` at the end and you can preview of what the tree looks like with the file+`.iqtree` #### Lab example: ``` iqtree -s OG0002162_mafft.faa ``` ## 3. Trees in `R` ### Step 1: Go to RStudio at: [bi278.colby.edu](https://bi278.colby.edu) ### Step 2: Load phylogeny packages ``` library(ape) library(maps) library(phytools) ``` (Maps was needed) ### Step 3: Take `andi` distance matrix and make neighbor-joining tree ``` andi.dist <- read.table("PATH/andi_output.dist", header=F, row.names=1, skip=1) colnames(andi.dist) <- rownames(andi.dist) # infer tree from distance matrix andi.tree <- nj(as.dist(andi.dist)) ``` ### Step 4: Plot tree ``` plot(andi.tree) ``` My tree (which isn't great) ![not great plot.png](https://hackmd.io/_uploads/HkfH0-_Xp.png) ### Step 5: Clean-Up #### Check tip labels ``` andi.tree$tip.label ``` #### Sub patterns (gets rid of things like "_nanoassembly") ``` temp <- gsub(andi.tree$tip.label, pattern="_nanoassembly", replacement="") temp2 <- gsub(temp, pattern="_ncbi", replacement="") ``` #### Sub inidividual elements ``` temp2[1] <- "bh69" ``` #### Sub ranges ``` temp2[13:14] <- c("bmall", "bpseu") ``` #### Clean-up inconsistencies ``` temp3 <- gsub(temp2, pattern="qs", replacement="") ``` #### Replace labels ``` andi.tree$tip.label <- temp3 ``` #### Clean-up temp files ``` rm(temp, temp2, temp3) ``` ### Step 6: Plot again ``` plot(andi.tree) nodelabels() ``` ![better?.png](https://hackmd.io/_uploads/BJsSlMOma.png) Don't know if that's *better*. Especially with the outgroups being at 34 and not at the root. ### Step 7: Root tree with outgroup ``` rooted.andi <- root.phylo(andi.tree, node=34, resolve.root=T) plot(rooted.andi, cex=1, label.offset=0.005) ``` ![hey that's pretty good.png](https://hackmd.io/_uploads/H1ICgzO76.png) ### Step 8: look at the species tree outputted by orthofinder. #### Make tree ``` ortho.tree <- read.tree("/courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Species_Tree/SpeciesTree_rooted.txt") plot(ortho.tree) ``` ![says rooted but trash.png](https://hackmd.io/_uploads/SJlbOWM_ma.png) #### fix tip labels ``` ortho.tree$tip.label temp <- gsub(ortho.tree$tip.label, pattern="_prokka", replacement="") temp2 <- gsub(temp, pattern="qs", replacement="") temp2[6:7] <- c("Bmallei", "Bpseudomallei") ``` #### replace labels ``` ortho.tree$tip.label <- temp2 rm(temp, temp2) ``` # root ``` plot(ortho.tree) nodelabels() rooted.ortho <- root.phylo(ortho.tree, node=25, resolve.root=T) plot(rooted.ortho, cex=1, label.offset=0.005) ``` ![new labels with nodes.png](https://hackmd.io/_uploads/SyOyMGd7a.png) ![rooted.png](https://hackmd.io/_uploads/Sk4eGGuQT.png) ### Step 8: Rotate nodes so tips = andi tree ``` rotate.ortho <- rotateConstr(rooted.ortho, rooted.andi$tip.label) plot(rotate.ortho) ``` ### Step 9: Compare trees! ``` # compare topology (branching pattern) all.equal(andi.tree, ortho.tree, use.edge.length=FALSE) # plot both trees side by side cophyloplot(rooted.andi, rotate.ortho) ``` I didn't alter all the labels in the andi tree. Howver it's not super bad, I still understand everything's the same except Sprentia and Pterrae and also the branches got flipped in the middle? No idea why considering I copied everything from the lab. Wait I didn't copy everything, I think the files are slightly different ![im lazy not fixing labels.png](https://hackmd.io/_uploads/ByR4XzuQp.png) ### Step 10: Look at gene tree #### Make the tree and check it out ``` iq.tree <- read.tree("/home2/enfere24/lab_07/OG0002162_mafft.faa.treefile") plot(iq.tree) nodelabels() ``` Yikes ![good lord.png](https://hackmd.io/_uploads/SJZsEMuQa.png) #### Root it ``` rooted.iq <- root.phylo(iq.tree, node=64, resolve.root=T) plot(rooted.iq, cex=0.75, label.offset=0.005) ``` ![ooooo.png](https://hackmd.io/_uploads/Bk_zSzOXa.png) ### Step 11: Clean-up workspace for next time and save ur work (did not do, I don't want these files) ``` write.tree(rooted.andi, file="PATH/andi_cleaned.tre") write.tree(rotate.iq, file="PATH/iq_sensorkinase.tre") ``` ## 4. Modify your tree with `dendextend` ### Step 0: Load packages ``` library(DECIPHER) library(dendextend) ``` ### Step 1: Read in a tree and plot ``` iq.dend <- iq.dend <- ReadDendrogram("/home2/enfere24/lab_07/OG0002162_mafft.faa.treefile") plot(iq.dend) ``` ![uh interesting.png](https://hackmd.io/_uploads/Hk6VPfdmT.png) ### Step 2: Adjust plot ``` par(mar = c(8,4,2,2) + 0.1) #bottom, left, top, right; default is c(5,4,4,2) plot(iq.dend) ``` ![better maybe.png](https://hackmd.io/_uploads/Byswwzu76.png) ### Step 3: Make order of taxa on dendrogram identical to what you see in plot window ``` iq.dend %>% labels ``` ### Step 4: Color leaves (with circles!) ``` iq.dend %>% set("leaves_pch", 19) %>% set("leaves_cex", 1) %>% set("leaves_col", c(rep(1,15),2,2,2,2,2,3,3,3,4,4,4,4,4,4,4,4,5,5)) %>% plot(type="rectangle") ``` ### Step 5: Turn sideways and add fancy colors ``` par(mar = c(4,2,2,8)) iq.dend %>% set("leaves_pch", 15) %>% set("leaves_cex", 1) %>% set("leaves_col", c(rep(1,15),2,2,2,2,2,3,3,3,4,4,4,4,4,4,4,4,5,5)) %>% plot(type="rectangle", horiz=TRUE) ``` ![sexu.png](https://hackmd.io/_uploads/BykB_zOm6.png) ### Step 6: Change text color ``` iq.dend %>% set("labels_colors", c(rep(1,15),2,2,2,2,2,3,3,3,4,4,4,4,4,4,4,4,5,5)) %>% plot(type="rectangle", horiz=TRUE) ``` ![finalmente.png](https://hackmd.io/_uploads/HkaOdGuXp.png) ### Step 7: Reset margins ``` par(mar = c(5,4,4,2)) ```