# Lab 07 ## Exercise 1: Configuring ANchor DIstances ``` mkdir lab_07 ls /courses/bi278/Course_Materials/lab_07/* > andi_inputs.txt/ andi --file-of-filenames=andi_inputs.txt -t 2 --join > andi_output.dist ``` The command above will create an andi input file made up of fa filepaths. The second command will identify long, exact matches (called anchors), which the alignment-free method, andi will use as homologous segments. ## Exercise 2: IQ-Tree for phylogenes ``` awk -F "\t" '$3=="n0" && $4!="" && $5!="" && $6!="" && $7!="" && $8!="" && $9!="" && $10!="" && $11!="" && $12!="" && $13!="" && $14!="" && $15!="" && $16!="" && $17!="" && $18!="" && $19!="" && $20!="" && $21!=""' /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Phylo genetic_Hierarchical_Orthogroups/N0.tsv > 18taxa_coregenes.txt ``` The command above searches for orthogroups with genes present in each genome. It returns all rows where orthologous groups share a common ancester at the 0th node. ``` sed 's/, /\t/g' 18taxa_coregenes.txt | awk '{print $1, NF-1}' ``` The command above displays the number of genes in each orthogroup. Arbitrarily, I choose to look at HOG0000010 which has 77 gene members. The amino acid sequneces of each of the orthogroups are in the directory, /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Orthogroup_Sequences/ ``` mafft --maxiterate 3 /courses/bi278/Course_Materials/lab_07/orthofinder_18tax Orthogroup_Sequences/OG0000010.fa > OG0000010_mafft.faa ``` Use MAFFT in order to construct a multiple sequence alignment. We can use this multiple sequence alignment in order to construct a phylogenetic tree using IQ-Tree. ``` iqtree -s OG0000010_mafft.faa ``` This will test all of the protein substitution models and pick the best match for the multiple sequence alignement. ### Exercise 3: trees in R The following code snippets will be in R rather than Unix. ``` library(ape) library(phytools) ``` The commands above loads ape and phytools packages in R. APE stands for Analyses of Phylogenetics and Evolution. They are both phylogenetics packages used for R. ``` library(ape) library(phytools) setwd('/home2/kyamad23/colbyhome/BI278/lab_07') andi.dist <- read.table('./andi_output.dist', header = F, row.names = 1, skip = 1) colnames(andi.dist) <- rownames(andi.dist) # tree from distance matrix andi.tree <- nj(as.dist(andi.dist)) plot(andi.tree) # labels the branches of the tree andi.tree$tip.label # substituting patterns that have to do with method temp <- gsub(andi.tree$tip_label, pattern = "_nanoassembly", replacement = "") temp2 <- gsub(temp, pattern="_ncbi", replacement="") #directly substituing individual elemetns temp2[1] <- "bh69" #substituing ranges of elements temp2[15:18] <- c("Pfungorum", "Psprentiae", "Pterrae", "Pxenovorans") # cleaning up inconsistent naming schemes temp3 <- gsub(temp2, pattern = "qs", replacement = "") #replacing original labels andi.tree$tip.label <- temp3 rm(temp, temp2, temp3) #removing temporary tree files plot(andi.tree) nodelabels() ``` ![](https://i.imgur.com/PGI79ew.png) Above is listed a method of cleaning the labels of the tree branches. We can also root the tree with the outgroup node by using the root.phylo command ``` rooted.andi <- root.phylo(andi.tree, node=34, resolve.root=T) plot(rooted.andi, cex=1, label.offset=0.005) ``` ![](https://i.imgur.com/pc8HL1P.png) In order to look at species tree outputted by orthofinder, we can use the read.tree command. ``` ortho.tree <- read.tree("/courses/bi278/Course_Materials/lab_07/orthofinder_1 8taxa/Species_Tree/SpeciesTree_rooted.txt") plot(ortho.tree) #fixing tip labels ortho.tree$tip.label temp <- gsub(ortho.tree$tip.label, pattern="_prokka", replacement="") temp2 <- gsub(temp, pattern="qs", replacement="") ortho.tree$tip.label <- temp2 rm(temp, temp2) ``` ![](https://i.imgur.com/q02QwFJ.png) ``` # root this tree too plot(ortho.tree) nodelabels() rooted.ortho <- root.phylo(ortho.tree, node=25, resolve.root=T) plot(rooted.ortho, cex=1, label.offset=0.005) ``` ![](https://i.imgur.com/dwihLvu.png) Rotating the nodes of this tree so that the tips match the tree from andi. ``` rotate.ortho <- rotateConstr(rooted.ortho, rooted.andi$tip.label) plot(rotate.ortho) ``` In order to compare how similar trees are, we can compare their topology (i.e. their branching pattern) and also plot them side by side ``` all.equal(andi.tree, ortho.tree, use.edge.length = F) cophyloplot(rooted.andi, rotate.ortho) ``` ![](https://i.imgur.com/teY5saU.png) The plot of side-by-side trees shows that they are not the same topology. This is validated by the all.equal command which outputs "FALSE". In order to improve our tree quality, we can either include more Paraburkholderia species or just use andi for closely related taxa. ``` iq.tree <- read.tree("./OG0000084_mafft.faa.treefile") plot(iq.tree) nodelabels() rooted.iq84 <- root.phylo(iq.tree, node = 33, resolve.root = T) plot(rooted.iq84, cex = 0.75, label.offset= 0.0005) ``` ![](https://i.imgur.com/toXN4u1.png) This plot is the rooted plot generated from IQ-tree from Orthogroup 84. ``` gene.tree <- read.tree("/courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Gene_Trees/OG0000084_tree.txt") plot(gene.tree) nodelabels() rooted.or84 <- root.phylo(gene.tree, node = 55, resolve.root = T) plot(rooted.or84, cex = 0.75, label.offset= 0.005) ``` The code above plots the gene tree directly from orthofinder. It also checks and roots the tree. ![](https://i.imgur.com/Ikg05f2.png) ``` rotate.iq84 <- rotateConstr(rooted.iq84, rooted.or84$tip.label) all.equal(rooted.or84, rooted.iq84, use.edge.length = F) cophyloplot(rooted.or84, rotate.iq84) ``` ![](https://i.imgur.com/yShFBmo.png) Aligning the trees together shows us that the two trees are very similar to each other but not identical. The all.equal command also returns False which means that the tree tips are not equal between the iqtree and the orthofinder produced genetree. Finally, we can write the two claned up tree files to our working directory by using the write.tree command. ``` write.tree(rooted.andi, file = "./and_cleaned.tree") write.tree(rotate.ortho, file = "./ortho_cleaned.tree") ```