# Lab07: Phylogenetics Kirsten Pastore ## Exercise 1: andi ### 0) Make a New Directory ``` # ssh into bi278 ssh klpast23@bi278 #make a new directory mkdir lab_07 ``` ### 1) Prepare an input file with all genomes that we want to make a phylogeny with ``` ls /courses/bi278/Course_Materials/lab_07/* > andi_inputs.txt ``` ### 2) Run andi ``` andi --file-of-filenames=andi_inputs.txt -t 2 --join > andi_output.dist ``` ## Exercise 2: IQTree for Phylogenies *Note: IQ-Tree estimates trees for either nucleotide or protein (amino acid) alignments using maximum likelihoof ### 3) Identify orthogroups with genes present from each genome. All rows where orthologous groups share a common ancestor at node zero, and each genome has at least on gene member in the group ``` awk -F "\t" '$3=="n0" && $4!="" && $5!="" && $6!="" && $7!="" && $8!="" && $9!="" && $10!="" && $11!="" && $12!="" && $13!="" && $14!="" && $15!="" && $16!="" && $17!="" && $18!="" && $19!="" && $20!="" && $21!=""' /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Phylogenetic_Hierarchical_Orthogroups/N0.tsv > 18taxa_coregenes.txt ``` ### 4) how many genes are included in each orthogroup from above. if there are only single copy genes, then the number of genes should be the same as the number of genomes included in the orthofinder analysis ``` sed 's/, /\t/g' 18taxa_coregenes.txt | awk '{print $1, NF-1}' ``` *Note: Investigating OG0000084 ### 5) Generate a multiple sequence alignmnent (MSA) using MAFFT ``` mafft --maxiterate 3 /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Orthogroup_Sequences/OG0000084.fa > OG0000084_mafft.faa ``` ### 6) Use MSA as input for iqtree ``` iqtree -s OG0000084_mafft.faa ``` tree is in this file: OG0000084_mafft.faa.treefile ## Exercise 3. Trees in R Copy Gene Tree from Orthogroup 0000084: ``` cp /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Gene_Trees/OG0000084_tree.txt ~/lab_07/ ``` Copy Species Tree from Orthofinder ``` cp /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Species_Tree/SpeciesTree_rooted.txt ~/lab_07/ ``` ### 7) Go to RStudio (bi278.colby.edu) ``` setwd("/home2/klpast23/lab_07/") ``` ### 8) Load our Phylogenetics Packages ``` library(ape) library(phytools) ``` ### 9) Take the distance matrix generated by andi and make a neighbor-joining tree from it ``` andi.dist <- read.table("/home2/klpast23/lab_07/andi_output.dist", header=F, row.names=1, skip=1) colnames(andi.dist) <- rownames(andi.dist) #infer treefrom distance matrix andi.tree <- nj(as.dist(andi.dist)) ``` ### 10) Look at this tree ``` plot(andi.tree) ``` ![](https://i.imgur.com/wfqfIqo.png) ### 11) Clean up the labels of the tips ``` #first check what the tip labels are andi.tree$tip.label #substitute patterns that have to do with methods or source temp <- gsub(andi.tree$tip.label, pattern="_nanoassembly", replacement="") temp2 <- gsub(temp, pattern="_ncbi", replacement="") #clean up inconsistencies across names temp3 <- gsub(temp2, pattern="gs", replacement="") #replace the original lavels andi.tree$tip.label <- temp3 # clean up unnecessary files rm(temp, temp2, temp3) ``` ### 12) Look at tree again, this time with node numbers ``` plot(andi.tree) nodelabels(adj= c(1,1), frame="none") ``` ![](https://i.imgur.com/Jc7bjpZ.png) ### 13) Root our tree with the outgroup node, then look at it again ``` rooted.andi <- root.phylo(andi.tree, node=34, resolve.root=T) plot(rooted.andi,cex =1, label.offset=0.005) ``` ![](https://i.imgur.com/66D9P3X.png) ### 14) look at the species tree outputted by orthofinder ``` ortho.tree <- read.tree("/courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Species_Tree/SpeciesTree_rooted.txt") # fix the tip labels ortho.tree$tip.label temp <- gsub(ortho.tree$tip.label, pattern="_prokka", replacement="") temp2 <- gsub(temp, pattern="qs", replacement="") ortho.tree$tip.label <- temp2 rm(temp,temp2) #root this tree too plot(ortho.tree) nodelabels() rooted.ortho <- root.phylo(ortho.tree, node=25, resolve.root=T) plot(rooted.ortho, cex=1, label.offset=0.005) ``` ![](https://i.imgur.com/qL0R6pm.png) ### 15) Rotate the nodes of this tree so the tips match more closely the tips of the tree from andi ``` rotate.ortho <- rotateConstr(rooted.ortho, rooted.andi$tip.label) plot(rotate.ortho) ``` ![](https://i.imgur.com/ckrlymK.png) ### 16) Compare how similar these tress are ``` #compare topology (branching pattern) all.equal(andi.tree, ortho.tree, use.edge.length=FALSE) #plot both trees side by side cophyloplot(rooted.andi, rotate.ortho) ``` ![](https://i.imgur.com/bGTdwPs.png) ### 17) Compare out gene trees now ``` # the tree generated with IQ-TREE iq.tree <- read.tree("/home2/klpast23/lab_07/OG0000084_mafft.faa.treefile") #check and root this tree plot(iq.tree) nodelabels() rooted.iq84 <- root.phylo(iq.tree, node=33, resolve.root=T) plot(rooted.iq84, cex=0.75, label.offset=0.005) ``` ![](https://i.imgur.com/VqMfeSF.png) ``` #the gene tree from orthofinder gene.tree <- read.tree("/courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Gene_Trees/OG0000084_tree.txt") #check and root this tree too plot(gene.tree) nodelabels() rooted.or84 <- root.phylo(gene.tree, node=55, resolve.root=T) plot(rooted.or84, cex=0.75, label.offset=0.005) ``` ![](https://i.imgur.com/LeI8YYI.png) ``` #rotate and compare rotate.iq84 <- rotateConstr(rooted.iq84, rooted.or484$tip.label) cophyloplot(rooted.or84, rotate.iq84) ``` ![](https://i.imgur.com/0ufLSLA.png)