# Lab 07
## Exercise 1: Configuring ANchor DIstances
```
mkdir lab_07
ls /courses/bi278/Course_Materials/lab_07/* > andi_inputs.txt/
andi --file-of-filenames=andi_inputs.txt -t 2 --join > andi_output.dist
```
The command above will create an andi input file made up of fa filepaths. The second command will identify long, exact matches (called anchors), which the alignment-free method, andi will use as homologous segments.
## Exercise 2: IQ-Tree for phylogenes
```
awk -F "\t" '$3=="n0" && $4!="" && $5!="" && $6!="" && $7!=""
&& $8!="" && $9!="" && $10!="" && $11!="" && $12!="" && $13!=""
&& $14!="" && $15!="" && $16!="" && $17!="" && $18!="" &&
$19!="" && $20!="" && $21!=""'
/courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Phylo
genetic_Hierarchical_Orthogroups/N0.tsv > 18taxa_coregenes.txt
```
The command above searches for orthogroups with genes present in each genome. It returns all rows where orthologous groups share a common ancester at the 0th node.
```
sed 's/, /\t/g' 18taxa_coregenes.txt | awk '{print $1, NF-1}'
```
The command above displays the number of genes in each orthogroup. Arbitrarily, I choose to look at HOG0000010 which has 77 gene members.
The amino acid sequneces of each of the orthogroups are in the directory, /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Orthogroup_Sequences/
```
mafft --maxiterate 3
/courses/bi278/Course_Materials/lab_07/orthofinder_18tax Orthogroup_Sequences/OG0000010.fa > OG0000010_mafft.faa
```
Use MAFFT in order to construct a multiple sequence alignment. We can use this multiple sequence alignment in order to construct a phylogenetic tree using IQ-Tree.
```
iqtree -s OG0000010_mafft.faa
```
This will test all of the protein substitution models and pick the best match for the multiple sequence alignement.
### Exercise 3: trees in R
The following code snippets will be in R rather than Unix.
```
library(ape)
library(phytools)
```
The commands above loads ape and phytools packages in R. APE stands for Analyses of Phylogenetics and Evolution. They are both phylogenetics packages used for R.
```
library(ape)
library(phytools)
setwd('/home2/kyamad23/colbyhome/BI278/lab_07')
andi.dist <- read.table('./andi_output.dist', header = F, row.names = 1, skip = 1)
colnames(andi.dist) <- rownames(andi.dist)
# tree from distance matrix
andi.tree <- nj(as.dist(andi.dist))
plot(andi.tree)
# labels the branches of the tree
andi.tree$tip.label
# substituting patterns that have to do with method
temp <- gsub(andi.tree$tip_label, pattern = "_nanoassembly", replacement = "")
temp2 <- gsub(temp, pattern="_ncbi", replacement="")
#directly substituing individual elemetns
temp2[1] <- "bh69"
#substituing ranges of elements
temp2[15:18] <- c("Pfungorum", "Psprentiae", "Pterrae", "Pxenovorans")
# cleaning up inconsistent naming schemes
temp3 <- gsub(temp2, pattern = "qs", replacement = "")
#replacing original labels
andi.tree$tip.label <- temp3
rm(temp, temp2, temp3) #removing temporary tree files
plot(andi.tree)
nodelabels()
```

Above is listed a method of cleaning the labels of the tree branches.
We can also root the tree with the outgroup node by using the root.phylo command
```
rooted.andi <- root.phylo(andi.tree, node=34, resolve.root=T)
plot(rooted.andi, cex=1, label.offset=0.005)
```

In order to look at species tree outputted by orthofinder, we can use the read.tree command.
```
ortho.tree <-
read.tree("/courses/bi278/Course_Materials/lab_07/orthofinder_1
8taxa/Species_Tree/SpeciesTree_rooted.txt")
plot(ortho.tree)
#fixing tip labels
ortho.tree$tip.label
temp <- gsub(ortho.tree$tip.label, pattern="_prokka",
replacement="")
temp2 <- gsub(temp, pattern="qs", replacement="")
ortho.tree$tip.label <- temp2
rm(temp, temp2)
```

```
# root this tree too
plot(ortho.tree)
nodelabels()
rooted.ortho <- root.phylo(ortho.tree, node=25, resolve.root=T)
plot(rooted.ortho, cex=1, label.offset=0.005)
```

Rotating the nodes of this tree so that the tips match the tree from andi.
```
rotate.ortho <- rotateConstr(rooted.ortho, rooted.andi$tip.label)
plot(rotate.ortho)
```
In order to compare how similar trees are, we can compare their topology (i.e. their branching pattern) and also plot them side by side
```
all.equal(andi.tree, ortho.tree, use.edge.length = F)
cophyloplot(rooted.andi, rotate.ortho)
```

The plot of side-by-side trees shows that they are not the same topology. This is validated by the all.equal command which outputs "FALSE". In order to improve our tree quality, we can either include more Paraburkholderia species or just use andi for closely related taxa.
```
iq.tree <- read.tree("./OG0000084_mafft.faa.treefile")
plot(iq.tree)
nodelabels()
rooted.iq84 <- root.phylo(iq.tree, node = 33, resolve.root = T)
plot(rooted.iq84, cex = 0.75, label.offset= 0.0005)
```

This plot is the rooted plot generated from IQ-tree from Orthogroup 84.
```
gene.tree <- read.tree("/courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Gene_Trees/OG0000084_tree.txt")
plot(gene.tree)
nodelabels()
rooted.or84 <- root.phylo(gene.tree, node = 55, resolve.root = T)
plot(rooted.or84, cex = 0.75, label.offset= 0.005)
```
The code above plots the gene tree directly from orthofinder. It also checks and roots the tree.

```
rotate.iq84 <- rotateConstr(rooted.iq84, rooted.or84$tip.label)
all.equal(rooted.or84, rooted.iq84, use.edge.length = F)
cophyloplot(rooted.or84, rotate.iq84)
```

Aligning the trees together shows us that the two trees are very similar to each other but not identical. The all.equal command also returns False which means that the tree tips are not equal between the iqtree and the orthofinder produced genetree.
Finally, we can write the two claned up tree files to our working directory by using the write.tree command.
```
write.tree(rooted.andi, file = "./and_cleaned.tree")
write.tree(rotate.ortho, file = "./ortho_cleaned.tree")
```