**Practical 3: Deep Time**

# **Practical 3: Deep Time** This practical is aimed to get you familiar with the kind of analyses evolutionary palaeobiologists conduct to look at the mode and tempo of evolution in deep time. You will: ***(a)*** use model-fitting approaches to understand the mode of evolution of a continuous trait (body size), using non-phylogenetic (time series) and phylogenetic approaches ***(b)*** use phylogenetic ridge regression to examine rate of change of a multivariate trait (shape) The practical is based around data for archosaurs (dinosaurs, pterosaurs and crocodilians) and their relatives, from the early part of their radiation following the end-Permian extinction event (the "dawn of the dinosaurs"). The entire practical is in R, and you will need the packages ape, paleoTS, geiger, qpcR, RRphylo, phytools, and viridis. You can install them all with `install.packages("NAME")` **Tips:** 1. If you ever want to find out more details about a package or function in R (if it's on CRAN) you can type `??` directly (no space) followed by the package/function name, and click the links which come up. 2. You can **save your workspaces**, so you can keep date or trees in R, using File->Save Workspace and save them as .RData. You can then open them again later. The **data you will need for the prac** you can download here: https://drive.google.com/drive/folders/16Rl-jvDEUTHbthF2Z9HnJeckOvdms5ia?usp=sharing **NOTES ON EXCEL AND SPREADSHEETS:** Following difficulties in the session today, we have **also uploaded .csv files** for the data after it was extracted from the supplement in the model-fitting section. Also note that **if you do not have Excel**, you can use **Google Sheets** or **Libre Office** (**ideally installed in English**). Furthermore, if you have German Excel you may need to use translations of functions (e.g. VERKETTEN() for CONCAT(), and .CSV files may be saved with ";" instead of "," separating their parts. If ";" is used in the CSV file simply replace `sep=","` with `sep=";"` and add in `dec=","` within `read.csv()` functions. Just **get in touch** if it still doesn't work! During the practical, you will do your analyses with all the data for Archosauromorpha (archosaurs and their relatives) as a whole. As homework, and assessed via the assessment questions, you will do the same but just for the clade Dinosauromorpha (dinosaurs and their relatives). You will present and compare the analyses and answer some simple questions as assessment. ![](https://hackmd.io/_uploads/SJr11bqDh.png) # Part 1: Model Fitting This part of the practical is all based on data from my study Sookias et al. 2012 (see below). This study investigated patterns in the evolution of body size in the early radiation of archosaurs, in part to see whether "Cope's rule" (i.e. a selection-driven increase in body size through time) occurred or not, and could have caused dinosaurs' giant sizes. To estimate body size, a proxy measurement - femoral (femur) length - was used, either measured or estimated for each species. We will look at this using both time-series and phylogenetic model-fitting approaches. ## Non-phylogenetic models In this part of the practical you will fit three different models of evolution to body size time series data using the package paleoTS. ### **Compiling time series data** The body size data is from the supplement of my own 2012 paper: > Sookias, R. B., Butler, R. J., & Benson, R. B. (2012). Rise of dinosaurs reveals major body-size transitions are driven by passive processes of trait evolution. Proceedings of the Royal Society B: Biological Sciences, 279(1736), 2180-2187. Find the paper (e.g. via Google Scholar), find the supplementary information ("Supplemental Material" on left hand side) and download the .xls file. This has the data You can also have a quick look at the paper if you want, to get some idea of what we will do. Open the Excel workbook, and look at the sheet "TSMF data, stage bins". Here you can see time series data of logged body size proxy (femoral length) means for different groups (clades) of animals from the Late Permian to Middle Jurassic. The data is "binned", which means all the species whose potential age (stratigraphic) range overlaps with a certain segment (bin) of time are put together and averaged for that bin. On that sheet, the bins correspond to geological stages which have varying real time lengths (see the [international chronostratigraphic chart](https://stratigraphy.org/chart), and the methods part of the paper and PDF supplement). The midpoint of each stage bin is given alongside the mean log femoral length, its variance in the bin, and the sample size in the bin *n*: ![](https://hackmd.io/_uploads/rJ29qW9P3.png) Together, we will look at Archosauromorpha as a whole (you will do Dinosauromorpha yourselves later), so **copy and paste** all the data for that clade into a new workbook, including the column headings (the **grey part** highlighted in the image above). The data needs to be reordered for paleoTS, which needs the columns in the order **sample size, sample means, sample variances, sample ages**. Now **delete the header row** (top row with column headings in) and save the file as .CSV called StageBinData.csv (you can choose a different name, but you will need to adjust the later R code accordingly). It should now be ready to open in paleoTS and look like this (in Excel): ![excel](https://hackmd.io/_uploads/rJAN84_U0.png) Make sure it is saved in the folder you choose to use as your working directory in the next part (in my case /ModelFitting - see below). ### **Loading the data into R** Open R. I would personally use R directly, with an open text file next to it for code, because sometimes RStudio has bugs, but if you prefer do use RStudio. Set the working directory to wherever you have your data, e.g. `setwd("C:/Users/rsook/Desktop/DeepTimePrac/ModelFitting")` Load the package paleoTS: `library(paleoTS)` Now open your data and save it as an object called stage.bin.data (you can choose a different name if you like, but again you will need to adjust the code accordingly): `stage.bin.data<-read.paleoTS(file="StageBinData.csv",sep=",",oldest="last")` Note if you are using German Excel you may need to specify sep=";" and dec="," because commas are used for decimals and semi-colons for separators, so: ``` stage.bin.data<-read.paleoTS(file="StageBinData.csv",sep=";", dec=",",oldest="last") ``` Have a look at the paleoTS object which has been created, and check it all looks fine: `stage.bin.data` It should begin something like this: ![paleots](https://hackmd.io/_uploads/rkWnIN_UC.png) ### **Doing the analysis** Now you are ready to actually analyze this data. You will be simultaneously fitting an Unbiased Random Walk (URW; =Brownian motion, i.e. no trend) model, a Generalized Random Walk (GRW; =trend) model, and a stasis model. An unbiased random walk model is random movements (steps) either up or down of varying lengths through time, a trend model (GRW) is that there is a bias towards upwards or downwards steps (mean is positive or negative), and stasis is like URW/BM but with the average step size very small and not moving far. See Hunt and Carrano 2010 (references at end) for a good explanation: ![](https://hackmd.io/_uploads/HyS7Qz9wh.png) To do the analysis, simply use the fit3models() function: `fit3models(stage.bin.data)` This will print the result to the screen but not save it. If you want to save it, you can copy and paste it to e.g. Word, or save it as an object in the R workspace (which you will also need to save as .RDat): `model.fit.stage.bins<-fit3models(stage.bin.data)` In any case, the result should be something like this: ![](https://hackmd.io/_uploads/rJToZ5TPh.png) Here you have the log likelihood of each model (logL), the number of parameters of the model (K), the Akaike Information Criterion for small sample sizes (AICc - more negative is better), the delta AICc (i.e. the difference of each model from the best scoring model/models), and the Akaike weight (the relative support for each model totalling to 1). Which model is the best, and is it clear? Does this model comparison result make sense for the data? Look at the original data, and e.g. Fig.1a in Sookias et al. (2012) which shows archosauromorph body sizes plotted as black triangles. How many parameters does it have? Look at the likelihood scores by comparison to AICc... Has the number of parameters affected the choice of best model in this case? ### **Analysis with 5 myr bins** Now you will try the same analysis, but with equal, 5 Ma time bins. Copy the Archosauromorpha data from the supplement, but this time from the sheet "TSMF data, 5 Myr bins". Be careful - make sure the columns are in the **order sample size, sample means, sample variances, sample ages**. Save it as a CSV and read it into R, and perform the analysis like before. What are the differences in these results, if any, and why? ## Phylogenetic models Now we will look at phylogenetic models, with similar data. To do this, we obviously need a phylogeny. The one we will use is an "informal supertree" (i.e. manually put together by experts based on various studies) from Sookias et al. 2012, and we will calibrate it by age. We also need the body size proxy trait data (femoral length) for each of the tips of the phylogeny. The tree and age data to calibrate it I have provided (archosauromorpha_tre.tre), and the trait data you will get from the supplement again. ### **Compiling size proxy data** First, let's get the data from the supplement. Go to the sheet "Taxa" (the first sheet). That has all the taxa used and all the data. Select all the columns up to "Measured or calculated femur length (mm)", and all the taxon rows down to *Zupaysaurus* (198): ![](https://hackmd.io/_uploads/SyXR-N5wn.png) Copy and paste this into a new workbook. Then delete the columns except "Genus", "Species" and "Measured or calculated femur length (mm)", and put the header "Taxon" on the column to the right of the "Measured or calculated femur length(mm)" values column (i.e. column D). In this column, we want to make the taxon names as they appear in the tree, so they can be matched up in R. Have a look at the tree file in a text editor - each taxon has the genus and species separated by an underscore "_", so "Zupaysaurus_rougieri". To make all the taxon names in this format in the excel, type the following formula into the top cell of the blank column: =CONCAT(A2,"_",B2) **UPDATE:** the formula needs to be entered all at once (don't click off the square while entering it) otherwise sometimes it displays as text. In German you need to use VERKETTEN() and semi-colons instead of the commas, so like =VERKETTEN(A2;"_",B2). If it really doesn't work, try Google Sheets or Libre Office (installed in English version). Now copy the entire column D and past it into column E using the **paste values** option. This turns the formulae into their results (text). ![](https://hackmd.io/_uploads/rkCOL06Dn.png) Next, cut and insert column E (the new names) before the femur lengths C. Do this by right clicking on the "E" heading (i.e. cell outside the spreadsheet itself showing "E") to get the menu with the cut option, clicking that, then right clicking on the "C" heading to get the menu with "Insert Copied Cells" and clicking that: ![](https://hackmd.io/_uploads/SyQVDCpD3.png) Then delete all the columns except the new names and values, and **delete the header row.** Now your sheet should look like this: ![](https://hackmd.io/_uploads/H1a7dR6Ph.png) Now save the file as a CSV file called fl.csv. **You will also need to make one small edit - please delete the space after the name *Poposaurus_gracilis* !** **N.B.** The file at this point of preparation is available on the Google Drive folder and Moodle as fl.csv. Now your size data is ready to go! Now you will read in the data into R. Start R and type: `fl<-read.csv(file="fl.csv",sep=",",header=FALSE,row.names=1)` See what it looks like (always a good idea): `fl` ### **Calibrating the tree** Open the tree file archosauromorpha_tree.tre. This is the same tree as in Fig. S1 of the PDF supplement of Sookias et al. 2012 - foolishly I didn't include it with the paper, but it could be reconstructed by you in Mesquite from the image. Have a look at it - you should be able to see it is in the Newick format like last time, and there are no branch lengths given: ![](https://hackmd.io/_uploads/SJmRXE9v2.png) We will date the tree using stratigraphic midpoints for each taxon, from the file ages.csv. Have a quick look at that file - it's a list of the taxon names and ages. The ages are taken from column "Midpoint (Ma)" ( R ) in the supplementary data ("Taxa" sheet). You could have assembled them yourselves, but I wanted to save us some time! Now we will read the tree and the age file into R. First load all the needed packages for this part of the prac in the same open R terminal as before: ``` library(ape) library(geiger) library(qpcR) ``` Also link to [Graeme Lloyd's code](https://graemetlloyd.com/methdpf.html) for the function date.phylo(), for dating the tree: `source("http://www.graemetlloyd.com/pubdata/functions_2.r")` And reset the working directory to where your data is, if needed: `setwd("C:/Users/rsook/Desktop/DeepTimePrac/ModelFitting")` Now, read in the tree: `tree<-read.tree(file="archosauromorpha_tree.tre")` See what happens when you type `tree` ![](https://hackmd.io/_uploads/HkLlnuiDh.png) What can you tell about the tree? Now plot the tree with `plot(tree, cex=0.5)` Can you see the tree? What does changing the value of cex do? ![](https://hackmd.io/_uploads/r1NysOov3.png) You can export the tree as a PDF with File->Save as: ![](https://hackmd.io/_uploads/ry4reipP3.png) Now read in the ages data: `ages<-read.csv("ages.csv",row.names=1,sep=",", header=FALSE)` Before we calibrate, we need to double check the taxa in ages and in the tree match up: `name.check(tree,ages)` Are they OK? We also need to make the tree fully resolved (remove any polytomies - we will do it randomly): `restree<-multi2di(tree)` Now you are ready to time calibrate the tree using date.phylo(). We will use the method of Brusatte et al. 2008, which we discussed in the lecture. This method sets nodes at the age of the oldest member of each clade, and then shares equally the length of non-zero branches over zero branches to avoid zeros. date.phylo() can also use the Ruta et al. 2006 method which splits time according to morphological change on that branch if you have character data, but we don't. We also need to set the root length (otherwise it will be zero), and will chose a value of 5 Ma, following Sookias et al. 2012 (and that was for various reasons to do with likely divergence times - see paper). `ttree<-date.phylo(restree, ages, rlen=5, method="equal")` Now plot the tree to double check it: `plot(ttree)` ![](https://hackmd.io/_uploads/S1f33_jDn.png) What is the difference from the old tree? Now try just `ttree` ![](https://hackmd.io/_uploads/HkHc2usP3.png) What is the difference from the previous output? You can also save the tree as a file (to see how branch lengths are stored) with: `write.tree(ttree,file="ttree.tre")` ...Have a quick look at the file in a text editor. Where are the branch lengths stored? Now we are ready for model fitting! ### ***Doing the analysis*** First, we want to log the FL data: `logfl<-log(fl,10)` Inspect it: `logfl` Now we can use the function fitContinuous() from the package geiger to fit different models like before, but this time taking phylogeny into account. We will also - unlike before - use an Ornstein-Uhlenbeck model too, which models a trait moving around a selective optimum ("attractor"): ![](https://hackmd.io/_uploads/HJtOp45vh.png) So, let's fit the various models and save them as R objects... Brownian motion(unweighted random walk): `BM.fit=fitContinuous(ttree,logfl,model="BM")` Stasis (called "white" by geiger): `stasis.fit=fitContinuous(ttree,logfl,model="white")` Trend: `trend.fit=fitContinuous(ttree,logfl,model="mean_trend")` Ornstein-Uhlenbeck: `OU.fit=fitContinuous(ttree,logfl,model="OU")` Have a look at these objects by typing their name, e.g. `BM.fit` ![](https://hackmd.io/_uploads/B1n7QepDh.png) What can you see? Note that **your results may differ susbtantially from those presented here** because of the random resolution of the tree. In any case, each analysis is also slightly stochastically different. Copy and paste the output into a word file and save it. Inspect the $opt part of the fits - this has the parameters, likelihood and AICc as separate parts, e.g. `BM.fit$opt` ![](https://hackmd.io/_uploads/rk8iXg6w2.png) We can use the AIC to calculate Akaike weights easily with the qpcR package ``` aics<-c(BM.fit$opt$aic,stasis.fit$opt$aic,trend.fit$opt$aic,OU.fit$opt$aic) aic.weights<-akaike.weights(aics) ``` We can round them up to understand them better, and name them by model... ``` weights<-formatC(aic.weights$weights) names(weights)<-c("BM","Stasis","Trend","OU") ``` And view them `weights` ![](https://hackmd.io/_uploads/S1274laDh.png) Which model is the best, and by how much? Is this the same result as using the time-series mean data? Again, **note your results may differ substantially** from this if your tree was randomly resolved differently, and stochastically there is also some random difference. We can do the same, but using the AIC adjusted for small sample sizes too, which is **actually more appropriate for the size of sample** we have here: ``` aiccs<-c(BM.fit$opt$aicc,stasis.fit$opt$aicc,trend.fit$opt$aicc,OU.fit$opt$aicc) aicc.weights<-akaike.weights(aiccs) ``` We can round them up to understand them better, and name them by model ``` weights.aicc<-formatC(aicc.weights$weights) names(weights.aicc)<-c("BM","Stasis","Trend","OU") ``` And view them `weights.aicc` Are they the same, and if not what is different? Should we trust AIC or AICc more here? # Part 2: Rates In this section you will look at rates of skull shape evolution in archosaurs, starting in the Triassic but also including later taxa. It is based on the data and approach used in me and my colleagues' paper Foth et al. 2021. This uses RRphylo (Castiglione et al. 2018) to do phylogenetic ridge regression on, in this case, multivariate data. Let's have a quick look at the Foth et al. paper to see what it's all about - find it via Google Scholar: > Foth, C., Sookias, R. B., & Ezcurra, M. D. (2021). Rapid initial morphospace expansion and delayed morphological disparity peak in the first 100 million years of the Archosauromorph evolutionary radiation. Frontiers in Earth Science, 763. The paper also uses the data to look at disparity (morphological diversity) through time, and as well as the skull it looks at the pelvis for comparison. We are not doing any of this today though - just rates for the skull. ## Reading in data I have provided the data files for you, as it's a bit time-consuming to get them together from the literature. First, open the required packages: ``` library(RRphylo) library(ape) library(phytools) library(viridis) ``` Set working directory as needed: `setwd("C:/Users/rsook/Desktop/DeepTimePrac/Rates")` Have a quick look at the tree file rates_tree.nex it is in nexus format: ![](https://hackmd.io/_uploads/Syg6LOovn.png) ...and has branch lengths already (it is time-calibrated - scroll down to bottom of file): ![](https://hackmd.io/_uploads/Hk8hwOsv3.png) Read in tree: `tree<-read.nexus(file="rates_tree.nex")` Have a look at the file pcadata.csv. It is PCA scores for an ordination of landmark points. If we have time, we may have a go at doing the ordination ourselves from the points. You can use a text editor or Excel. Here is a view from a text editor: ![](https://hackmd.io/_uploads/Sy-HD_oDn.png) Read in PCA data: `data=data.matrix(read.csv(file="pcadata.csv",sep=",",header=TRUE,row.names=1))` Now you need to convert data to positive values only, to allow RRphylo to perform its calculations properly: ``` for(i in 1:length(data[1,])){ min<-min(data[,i]) minplus=-min data[,i]<-data[,i]+minplus } ``` Double check it worked by typing: `data` Now you are ready to do the analysis! ## Analysis ### Rates First we will do the rate calculations with RRPhylo, and save it as an object: `RR=RRphylo(tree,data,cov=NULL,rootV=NULL,aces=NULL,clus=0.5)` Explore the object (and look at the output) by typing its name: `RR` ![](https://hackmd.io/_uploads/SyjGLuiwh.png) You can also use str() to understand its structure: `str(RR)` You can see within the object there are lists of all the branches, with the rate values for each branch. This is not very easy to read, although it can be useful for further calculations: ![](https://hackmd.io/_uploads/SJJN8ujw2.png) To quickly extract and view the rates section from the RR object, type: `RR$rates` In order to make it more intuitive, we want to plot rates onto tree: ``` pRR<-plotRR(RR,data,multivariate="rates") pRR$plotRRrates(tree.args=list(edge.width=2,show.tip.label=TRUE,cex=0.5),color.pal=viridis,colorbar.args = list(x="topright",labs.adj=0.2,xpd=TRUE, width=0.5)) ``` ![rates](https://hackmd.io/_uploads/ByIYhNVw0.png) Which seem to be the fastest evolving branches and sections of the tree, and which the slower? Any idea why? Remember, you can **export the tree as a PDF** with File->Save as->PDF and this allows you to zoom in and properly view it. ### Shifts No we want to search for significant shifts in rate at nodes in the tree. We will do this using the search.shift() function in RRphylo: `shifts=search.shift(RR,status.type= "clade")` Let's have a look at the shifts object. You should be able to see nodes with significance values by them (you can also use str() again if you like to examine the structure): `shifts` ![](https://hackmd.io/_uploads/BkFTSdsD3.png) Now we can plot the shifts as coloured circles onto the tree. Red ones mean the rate significantly decreased (slowed down), blue that it increased (got faster): ``` plot(tree,cex=0.5) addShift(shifts) ``` You can adjust the size of the text by changing the number after `cex=` ![](https://hackmd.io/_uploads/Hy9jrOsPh.png) It can be hard to see exactly which node is indicated by the circle. You can find the nod numbers listed at the end of the shifts object under $single.clades: ![](https://hackmd.io/_uploads/ryH4HdoPn.png) ...Or you can type `shifts$single.clades` to see it immediatey. In order to check which taxa are bracketed by that node, you can extract the part of the tree bracketed by the node and plot it like this (assuming the node of interest is 162 here): ``` tree162<-extract.clade(tree,162) plot(tree162) ``` If it's a different number, just change the 162 in the extract.clade() function. Where are there significant shifts, if any? Why might they be there? Note that this is one, single randomly resolved and stochastically dated tree - in the real study a whole lot of different dating and resolution options were tested. Also note that significant shifts are hard to find (it's very conservative) but tend to localize in similar areas of trees with slightly different resolution and dating. # Note: pruning trees for homework For two parts of the homework you will need to use just the dinosaur(omorph) part of the tree. To prune a tree down to the node which two taxa bracket, you can use the following code... Make sure the appropriate tree is loaded in R. For the model-fitting part, this is ttree, i.e. the time-calibrated fully-resolved tree. Make sure these packages are running: ``` library(ape) library(phytools) ``` Now find the node bracketing the two taxa you want (see below for which ones - replace `TAXONA` and `TAXONB` with their names; replace `TREE` with the appropriate tree, so probably either `ttree` for the model-fitting or `tree` for the rates) and save it as an object: `node<-getMRCA(TREE,c("TAXONA","TAXONB"))` Check node number by typing the object name: `node` Now extract only the part of the tree branching from that node: `dinosauromorpha_tree<-extract.clade(TREE,node)` Plot it, to check it worked: `plot(dinosauromorpha_tree)` To avoid having to change the name of the tree in the analysis code, you can rename the tree to whatever needed, e.g.: Either this for model-fitting: `ttree<-dinosauromorpha_tree` Or this for rates: `tree<-dinosauromorpha_tree` If you type e.g. this before that, you can also save the old tree: `old_tree<-TREE` The taxa which bracket the dinosauromorph clade are: **Model-fitting tree:** *Lagerpeton_chanarensis* and *Efraasia_minor* **Rates tree:** *Herrerasaurus* and *Psittacosaurus* Just ask if this doesn't make sense or doesn't work! # References Brusatte, S. L., M. J. Benton, M. Ruta and G. T. Lloyd, 2008. Superiority, competition, and opportunism in the evolutionary radiation of dinosaurs. Science, 321, 1485-1488. Castiglione, S., Tesone, G., Piccolo, M., Melchionna, M., Mondanaro, A., Serio, C., ... & Raia, P. (2018). A new method for testing evolutionary rate variation and shifts in phenotypic evolution. Methods in Ecology and Evolution, 9(4), 974-983. Foth, C., Sookias, R. B., & Ezcurra, M. D. (2021). Rapid initial morphospace expansion and delayed morphological disparity peak in the first 100 million years of the Archosauromorph evolutionary radiation. Frontiers in Earth Science, 763. Hunt, G., & Carrano, M. T. (2010). Models and methods for analyzing phenotypic evolution in lineages and clades. The Paleontological Society Papers, 16, 245-269. Ruta, M., P. J. Wagner and M. I. Coates, 2006. Evolutionary patterns in early tetrapods. I. Rapid initial diversification followed by decrease in rates of character change. Proceedings of the Royal Society B-Biological Sciences, 273, 2107-2111. Sookias, R. B., Butler, R. J., & Benson, R. B. (2012). Rise of dinosaurs reveals major body-size transitions are driven by passive processes of trait evolution. Proceedings of the Royal Society B: Biological Sciences, 279(1736), 2180-2187.