# Lab07: Phylogenetics
Kirsten Pastore
## Exercise 1: andi
### 0) Make a New Directory
```
# ssh into bi278
ssh klpast23@bi278
#make a new directory
mkdir lab_07
```
### 1) Prepare an input file with all genomes that we want to make a phylogeny with
```
ls /courses/bi278/Course_Materials/lab_07/* > andi_inputs.txt
```
### 2) Run andi
```
andi --file-of-filenames=andi_inputs.txt -t 2 --join > andi_output.dist
```
## Exercise 2: IQTree for Phylogenies
*Note: IQ-Tree estimates trees for either nucleotide or protein (amino acid) alignments using maximum likelihoof
### 3) Identify orthogroups with genes present from each genome. All rows where orthologous groups share a common ancestor at node zero, and each genome has at least on gene member in the group
```
awk -F "\t" '$3=="n0" && $4!="" && $5!="" && $6!="" && $7!="" && $8!="" && $9!="" && $10!="" && $11!="" && $12!="" && $13!="" && $14!="" && $15!="" && $16!="" && $17!="" && $18!="" && $19!="" && $20!="" && $21!=""' /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Phylogenetic_Hierarchical_Orthogroups/N0.tsv > 18taxa_coregenes.txt
```
### 4) how many genes are included in each orthogroup from above. if there are only single copy genes, then the number of genes should be the same as the number of genomes included in the orthofinder analysis
```
sed 's/, /\t/g' 18taxa_coregenes.txt | awk '{print $1, NF-1}'
```
*Note: Investigating OG0000084
### 5) Generate a multiple sequence alignmnent (MSA) using MAFFT
```
mafft --maxiterate 3 /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Orthogroup_Sequences/OG0000084.fa > OG0000084_mafft.faa
```
### 6) Use MSA as input for iqtree
```
iqtree -s OG0000084_mafft.faa
```
tree is in this file:
OG0000084_mafft.faa.treefile
## Exercise 3. Trees in R
Copy Gene Tree from Orthogroup 0000084:
```
cp /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Gene_Trees/OG0000084_tree.txt ~/lab_07/
```
Copy Species Tree from Orthofinder
```
cp /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Species_Tree/SpeciesTree_rooted.txt ~/lab_07/
```
### 7) Go to RStudio (bi278.colby.edu)
```
setwd("/home2/klpast23/lab_07/")
```
### 8) Load our Phylogenetics Packages
```
library(ape)
library(phytools)
```
### 9) Take the distance matrix generated by andi and make a neighbor-joining tree from it
```
andi.dist <- read.table("/home2/klpast23/lab_07/andi_output.dist", header=F, row.names=1, skip=1)
colnames(andi.dist) <- rownames(andi.dist)
#infer treefrom distance matrix
andi.tree <- nj(as.dist(andi.dist))
```
### 10) Look at this tree
```
plot(andi.tree)
```

### 11) Clean up the labels of the tips
```
#first check what the tip labels are
andi.tree$tip.label
#substitute patterns that have to do with methods or source
temp <- gsub(andi.tree$tip.label, pattern="_nanoassembly", replacement="")
temp2 <- gsub(temp, pattern="_ncbi", replacement="")
#clean up inconsistencies across names
temp3 <- gsub(temp2, pattern="gs", replacement="")
#replace the original lavels
andi.tree$tip.label <- temp3
# clean up unnecessary files
rm(temp, temp2, temp3)
```
### 12) Look at tree again, this time with node numbers
```
plot(andi.tree)
nodelabels(adj= c(1,1), frame="none")
```

### 13) Root our tree with the outgroup node, then look at it again
```
rooted.andi <- root.phylo(andi.tree, node=34, resolve.root=T)
plot(rooted.andi,cex =1, label.offset=0.005)
```

### 14) look at the species tree outputted by orthofinder
```
ortho.tree <- read.tree("/courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Species_Tree/SpeciesTree_rooted.txt")
# fix the tip labels
ortho.tree$tip.label
temp <- gsub(ortho.tree$tip.label, pattern="_prokka", replacement="")
temp2 <- gsub(temp, pattern="qs", replacement="")
ortho.tree$tip.label <- temp2
rm(temp,temp2)
#root this tree too
plot(ortho.tree)
nodelabels()
rooted.ortho <- root.phylo(ortho.tree, node=25, resolve.root=T)
plot(rooted.ortho, cex=1, label.offset=0.005)
```

### 15) Rotate the nodes of this tree so the tips match more closely the tips of the tree from andi
```
rotate.ortho <- rotateConstr(rooted.ortho, rooted.andi$tip.label)
plot(rotate.ortho)
```

### 16) Compare how similar these tress are
```
#compare topology (branching pattern)
all.equal(andi.tree, ortho.tree, use.edge.length=FALSE)
#plot both trees side by side
cophyloplot(rooted.andi, rotate.ortho)
```

### 17) Compare out gene trees now
```
# the tree generated with IQ-TREE
iq.tree <- read.tree("/home2/klpast23/lab_07/OG0000084_mafft.faa.treefile")
#check and root this tree
plot(iq.tree)
nodelabels()
rooted.iq84 <- root.phylo(iq.tree, node=33, resolve.root=T)
plot(rooted.iq84, cex=0.75, label.offset=0.005)
```

```
#the gene tree from orthofinder
gene.tree <- read.tree("/courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Gene_Trees/OG0000084_tree.txt")
#check and root this tree too
plot(gene.tree)
nodelabels()
rooted.or84 <- root.phylo(gene.tree, node=55, resolve.root=T)
plot(rooted.or84, cex=0.75, label.offset=0.005)
```

```
#rotate and compare
rotate.iq84 <- rotateConstr(rooted.iq84, rooted.or484$tip.label)
cophyloplot(rooted.or84, rotate.iq84)
```
