# LAB #7 PHYLOGENETICS
We will use`andi` for whole genome phylogenies,`iqtree` for single gene phylogenies and `R` for visualizing and manipulating trees
## Exercise 1: Andi
What `andi` does is identify long exact matches (called anchors) to use as homologous segments between two genomes. Then it looks at the mismatched regions bracketed by these anchors and estimates a distance matrix
1.To run andi we must have an input file for all our genomes that we want to build. a phylogeny with, (these files are profs)
```
ls /courses/bi278/Course_Materials/lab_07/*.*a > andi_inputs.txt
```
2.Run `andi`
```
andi --file-of-filenames=andi_inputs.txt -t 2 --join > andi_output.dist
```
This output took a quicksecond, it compared 18 genomes
```
andi: For the two sequences 'b69_dnaa_irp_repolished' and 'baqs159_ncbi' very little homology was found (0.040889 and 0.089178, respectively).
```
-if we Later if you run `andi `on your own, note that it will ignore filenames after the first . (period). Chnage ilename sto pbonn_bqs346.fna
## Exercise 2: IQ-Tree for Gene Trees
IQ-TREE (iqtree) is a powerful phylogenetics software that estimates trees for either nucleotide or protein (amino acid) alignments using maximum likelihood. Its most convenient feature is that it tests a large range of phylogenetic (e.g. substitution) models for you, then outputs a tree best of which model appears to be most appropriate for your data.
Where I can see iqtree being useful for you is if you identified an interesting protein from your orthogroup analysis and wanted to look into in more detail by including additional proteins, either from additional BLASTs or from specific genomes (not your ingroup) in which a particular protein has been studied.
1.We will need to start with an MSA as input for `iqtree ` we will combine our Fasta sequences and genetare an MSA using mafft
```
cat /courses/bi278/Course_Materials/lab_07/ncbi_sensor_kinase.txt /courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Orthogroup_Sequences/OG0002162.fa > OG0002162_plus_ncbi.fa
mafft --genafpair --maxiterate 1000 OG0002162_plus_ncbi.fa > OG0002162_mafft.faa
```
2.Now we can run iqtree
```
iqtree -s OG0002162_mafft.faa
```
Output will test all protein substitution models it has, pick best one for my data*?, do tree searching, and will print a detailed log screen eg:`496 mtVer+I 29980.748 50 60061.496 60067.125 60304.686`
It then created the following files:
```
Analysis results written to:
IQ-TREE report: OG0002162_mafft.faa.iqtree
Maximum-likelihood tree: OG0002162_mafft.faa.treefile
Likelihood distances: OG0002162_mafft.faa.mldist
Screen log file: OG0002162_mafft.faa.log
```
to get a preview of what the tree looks like, go into ` OG0002162_mafft.faa.iqtree`

## Exercise 3: Trees in `R`
We can look a and compare our pakcages in R. The two main packages for phylogenetics in R are `ape: Analyses of Phylogenetics and Evolution` and `phytools`.
To make our species trees ( we have our distance matrix)from `andi`:`andi_output.dist`
We have our gene tree:`OG0002162_mafft.faa.treefile`
1.go into Rstudio
2.Load phylogenetics packages
```
library(ape)
library(phytools)
```
3. We need to take the distance matrix genetrated by `andi` and make neighbor-tree joining tree from it
```
andi.dist <- read.table("PATH/andi_output.dist", header=F, row.names=1, skip=1)
colnames(andi.dist) <- rownames(andi.dist)
# infer tree from distance matrix
andi.tree <- nj(as.dist(andi.dist))
```
4. Visualize tree
```
plot(andi.tree)
```

5. Clean up the labels of the tips, to make it easier to compare trees between methods
```
# first check what the tip labels are
andi.tree$tip.label
# substitute patterns that have to do with method or source
temp <- gsub(andi.tree$tip.label, pattern="_nanoassembly", replacement="")
temp2 <- gsub(temp, pattern="_ncbi", replacement="")
# check what it looks like now
temp2
# we can also directly substitute individual elements
temp2[1] <- "bh69"
# we can also directly substitute ranges of elements
temp2[13:14] <- c("bmall", "bpseu")
# last thing to clean up that was inconsistent across names
temp3 <- gsub(temp2, pattern="qs", replacement="")
# check what it looks like now
temp3
# replace the original labels
andi.tree$tip.label <- temp3
# clean up unnecessary files
rm(temp, temp2, temp3)
```
6. Now look at the tree
```
plot(andi.tree)
nodelabels()
```
nodelabels put the numbers on each node, two outgroupd at node 34

7. Root tree with outgroup node
```
rooted.andi <- root.phylo(andi.tree, node=34, resolve.root=T)
plot(rooted.andi, cex=1, label.offset=0.005
```

8. Look at the tree outputted by orthfinder
plot ortho tree:
```
ortho.tree <- read.tree("/courses/bi278/Course_Materials/lab_07/orthofinder_18taxa/Species_Tree/SpeciesTree_rooted.txt")
plot(ortho.tree)
# fix tip labels
ortho.tree$tip.label
temp <- gsub(ortho.tree$tip.label, pattern="_prokka", replacement="")
temp2 <- gsub(temp, pattern="qs", replacement="")
temp2[6:7] <- c("Bmallei", "Bpseudomallei")
# check
temp2
# replace labels
ortho.tree$tip.label <- temp2
rm(temp, temp2)
# root this tree too
plot(ortho.tree)
nodelabels()
rooted.ortho <- root.phylo(ortho.tree, node=25, resolve.root=T)
plot(rooted.ortho, cex=1, label.offset=0.005)
```
output:
9. Rotate nodes of this tree so tips match more closely thetips pf the tree from andi
```
rotate.ortho <- rotateConstr(rooted.ortho, rooted.andi$tip.label)
plot(rotate.ortho)
```
10. Compare how similar these gene trees are
```
# compare topology (branching pattern)
all.equal(andi.tree, ortho.tree, use.edge.length=FALSE)
# plot both trees side by side
cophyloplot(rooted.andi, rotate.ortho)
```

11.Look at gene tree
```# the tree you generated with IQ-TREE
iq.tree <- read.tree("PATH/OG0000084_mafft.faa.treefile")
# check and root this tree
plot(iq.tree)
nodelabels()
rooted.iq <- root.phylo(iq.tree, node=33, resolve.root=T)
plot(rooted.iq, cex=0.75, label.offset=0.005)
```
gave me error case nide shouod be greater than number of taxa node=64

18.Lastly, you can save any figures that you created directly, but let’s also save our cleaned up species trees as text files in case you want to use them in the future.
```
write.tree(rooted.andi, file="PATH/andi_cleaned.tre")
write.tree(rotate.iq, file="PATH/iq_sensorkinase.tre")
```
gave me an derror ofor my rotate.iq, says it was not found
## Exercise 4. Modify your tree with `dendextend`
To use dendextend on trees you may have built with any phylogenetic package, you will need to convert the newick tree file format to a dendrogram format. The R package DECIPHER is able to do this. EG with gene tree
First load packages
```
# install if not present
library(DECIPHER)
library(dendextend)
```
2. Read in a tree then plot it
```
iq.dend <- ReadDendrogram("/home2/cibene26/lab_07/OG0002162_mafft.faa.treefile")
plot(iq.dend)
```

3. Can adjust margins. to make more visisble
```
par(mar = c(8,4,2,2) + 0.1) #bottom, left, top, right; default is c(5,4,4,2)
plot(iq.dend)
```

4.The order of taxa on the dendrogram is identical to what you see in the plot window
```
iq.dend %>% labels
```
dendextend is useful for chaging the visuals of your tree
5.You can color the leaves manually based on the label order you just checked,colored circles here:
```
iq.dend %>% set("leaves_pch", 19) %>% set("leaves_cex", 1) %>% set("leaves_col", c(rep(1,15),2,2,2,2,2,3,3,3,4,4,4,4,4,4,4,4,5,5)) %>% plot(type="rectangle")
```
OUTPUT;
6. You can also turn these plots sideways, or change the circles to squares by changing the pch attribute:
```
par(mar = c(4,2,2,8))
q.dend %>% set("leaves_pch", 15) %>% set("leaves_cex", 1) %>% set("leaves_col", c(rep(1,15),2,2,2,2,2,3,3,3,4,4,4,4,4,4,4,4,5,5)) %>% plot(type="rectangle", horiz=TRUE)
```

7. Could also change colors by adding extra shapes
```
iq.dend %>% set("labels_colors", c(rep(1,15),2,2,2,2,2,3,3,3,4,4,4,4,4,4,4,4,5,5)) %>% plot(type="rectangle", horiz=TRUE)
```
[Screenshot 2023-11-07 at 3.21.41 PM.png](https://hackmd.io/_uploads/ryLO2GOQa.png)
8.Finally, this command sets the margins back to the default afterwards.
```
par(mar = c(5,4,4,2))
```