# BIT 150: PROJECT 2
## INTRODUCTION
Avian reoviruses (ARV) have been causing severe tenosynovitis in broiler chickens in the U.S. since 2011. The avian reoviruses have an RNA divided into ten segments that are classified into three large (L1, L2, L3), three medium (M1, M2, M3), and four small (S1, S2, S3, S4) genes. This feature favors point mutations and recombination events (1, 3, 4, 5). This scenario complicates control and prevention using conventional vaccines. Currently, the variability rate of avian reoviruses hinders the ability to formulate vaccines that provide persistent protection and because of that, it is important to study conserved and variable regions in the virus genome. Certain regions might be related to pathogenicity and antigenicity determinants, and understanding this, it would make more efficient the vaccine strains selection. Several groups across North America (1, 2, 4, 5), Europe (6, 7), and China (8) have worked on the molecular typing of ARV mainly targeting the sigma C gene, one of the major neutralizing antibodies inducing proteins. However, applying this typing directly to preventative strategies, particularly to vaccine selection has proven to be insufficient (5). Also, molecular characterization based on the sigma C gene does not always correlate with antigenicity or pathogenicity of ARV isolated in the field (2). That is why is important to characterize and analyze other segments. Whit this project I aim to explore and visualize the genetic variation using the genetic distance of four ARV genes.
## METHODS
### Data Acquisition
All genetic sequences for the avian reovirus L1, L2, S1, and S4 were searched in NCBI (https://www.ncbi.nlm.nih.gov), each set of the sequence was sorted by date released (VRL 16-JUL-2019) and downloaded from in FASTA format. Seventeen sequences corresponding to each gene were aligned using MEGA-X software and for each gene, the file was downloaded in a concatenated FASTA file.
For each gene, a table file (CSV) was created annotating the identification code, year, and organ of isolation.
### Data Integration
In order to create genetic distance-based heatmaps and maximum likelihood phylogenetic trees for each gene in the study, the two files were integrated using R commands that were studied and retrieved from Jombart, 2016 (9).
```
**R_SCRIPT**
install.packages("adegenet", dep=TRUE)
install.packages("phangorn", dep=TRUE)
library(stats)
library(ade4)
library(ape)
library(adegenet)
library(phangorn)
##L1_HEATMAP
dna <- fasta2DNAbin(file="REOL1.fasta")
dna
class(dna)
as.character(dna)[1:5,1:10]
unclass(dna)[1:5,1:10]
typeof(unclass(dna)[1:5,1:10])
object.size(as.character(dna))/object.size(dna)
annot <- read.csv("REOL1.annot.csv",
header=TRUE, row.names=1)
head(annot)
head(annot)
dim(dna)
dim(annot)
all(annot$accession==rownames(dna))
table(annot$year)
D <- dist.dna(dna, model="TN93")
class(D)
length(D)
temp <- as.data.frame(as.matrix(D))
table.paint(temp, cleg=0, clabel.row=.5, clabel.col=.5)
temp <- t(as.matrix(D))
temp <- temp[,ncol(temp):1]
par(mar=c(1,5,5,1))
image(x=1:18, y=1:18, temp, col=rev(heat.colors(100)), xaxt="n", yaxt="n",
xlab="",ylab="")
axis(side=2, at=1:18, lab=rownames(dna), las=2, cex.axis=.5)
axis(side=3, at=1:18, lab=rownames(dna), las=3, cex.axis=.5)
##L1_ML_TREE
class(dna2)
dna2
tre.ini <- nj(dist.dna(dna,model = "TN93"))
tre.ini
fit.ini <- pml(tre.ini, dna2, k = 4)
fit.ini
fit <- optim.pml(fit.ini, optNni = TRUE, optBf = TRUE,
optQ = TRUE, optGamma = TRUE)
fit
class(fit)
names(fit)
anova(fit.ini, fit)
AIC(fit.ini)
AIC(fit)
tre4 <- root(fit$tree,1)
tre4 <- ladderize(tre4)
plot(tre4, show.tip = FALSE, edge.width = 1)
title("Avian Reovirus L1 Maximum-likelihood tree")
tiplabels(annot$year,
bg = transp(num2col(annot$year, col.pal = myPal), .7),
cex = .5, fg = "transparent")
axisPhylo()
temp <- pretty(1980:2018, 2)
legend("bottomleft", fill = transp(num2col(temp, col.pal = myPal), .7),
leg = temp, ncol = 2)
##L2_HEATMAP
dna <- fasta2DNAbin(file="REOL2.fasta")
dna
class(dna)
as.character(dna)[1:5,1:10]
unclass(dna)[1:5,1:10]
typeof(unclass(dna)[1:5,1:10])
object.size(as.character(dna))/object.size(dna)
annot <- read.csv("REOL2.annot.csv",
header=TRUE, row.names=1)
head(annot)
head(annot)
dim(dna)
dim(annot)
all(annot$accession==rownames(dna))
table(annot$year)
D <- dist.dna(dna, model="TN93")
class(D)
length(D)
temp <- as.data.frame(as.matrix(D))
table.paint(temp, cleg=0, clabel.row=.5, clabel.col=.5)
temp <- t(as.matrix(D))
temp <- temp[,ncol(temp):1]
par(mar=c(1,5,5,1))
image(x=1:18, y=1:18, temp, col=rev(heat.colors(100)), xaxt="n", yaxt="n",
xlab="",ylab="")
axis(side=2, at=1:18, lab=rownames(dna), las=2, cex.axis=.5)
axis(side=3, at=1:18, lab=rownames(dna), las=3, cex.axis=.5)
##L2_ML_TREE
class(dna2)
dna2
tre.ini <- nj(dist.dna(dna,model = "TN93"))
tre.ini
fit.ini <- pml(tre.ini, dna2, k = 4)
fit.ini
fit <- optim.pml(fit.ini, optNni = TRUE, optBf = TRUE,
optQ = TRUE, optGamma = TRUE)
fit
class(fit)
names(fit)
anova(fit.ini, fit)
AIC(fit.ini)
AIC(fit)
tre4 <- root(fit$tree,1)
tre4 <- ladderize(tre4)
plot(tre4, show.tip = FALSE, edge.width = 1)
title("Avian Reovirus L2 Maximum-likelihood tree")
tiplabels(annot$year,
bg = transp(num2col(annot$year, col.pal = myPal), .7),
cex = .5, fg = "transparent")
axisPhylo()
temp <- pretty(1980:2018, 2)
legend("bottomleft", fill = transp(num2col(temp, col.pal = myPal), .7),
leg = temp, ncol = 2)
##S1_HEATMAP
dna <- fasta2DNAbin(file="REOS1.fasta")
dna
class(dna)
as.character(dna)[1:5,1:10]
unclass(dna)[1:5,1:10]
typeof(unclass(dna)[1:5,1:10])
object.size(as.character(dna))/object.size(dna)
annot <- read.csv("REOS1.annot.csv",
header=TRUE, row.names=1)
head(annot)
head(annot)
dim(dna)
dim(annot)
all(annot$accession==rownames(dna))
table(annot$year)
D <- dist.dna(dna, model="TN93")
class(D)
length(D)
temp <- as.data.frame(as.matrix(D))
table.paint(temp, cleg=0, clabel.row=.5, clabel.col=.5)
temp <- t(as.matrix(D))
temp <- temp[,ncol(temp):1]
par(mar=c(1,5,5,1))
image(x=1:15, y=1:15, temp, col=rev(heat.colors(100)), xaxt="n", yaxt="n",
xlab="",ylab="")
axis(side=2, at=1:15, lab=rownames(dna), las=2, cex.axis=.5)
axis(side=3, at=1:15, lab=rownames(dna), las=3, cex.axis=.5)
##S1_ML_TREE
class(dna2)
dna2
tre.ini <- nj(dist.dna(dna,model = "TN93"))
tre.ini
fit.ini <- pml(tre.ini, dna2, k = 4)
fit.ini
fit <- optim.pml(fit.ini, optNni = TRUE, optBf = TRUE,
optQ = TRUE, optGamma = TRUE)
fit
class(fit)
names(fit)
anova(fit.ini, fit)
AIC(fit.ini)
AIC(fit)
tre4 <- root(fit$tree,1)
tre4 <- ladderize(tre4)
plot(tre4, show.tip = FALSE, edge.width = 1)
title("Avian Reovirus S1 Maximum-likelihood tree")
tiplabels(annot$year,
bg = transp(num2col(annot$year, col.pal = myPal), .7),
cex = .5, fg = "transparent")
axisPhylo()
temp <- pretty(1980:2018, 2)
legend("bottomleft", fill = transp(num2col(temp, col.pal = myPal), .7),
leg = temp, ncol = 2)
##S4_HEATMAP
dna <- fasta2DNAbin(file="REOS4.fasta")
dna
class(dna)
as.character(dna)[1:5,1:10]
unclass(dna)[1:5,1:10]
typeof(unclass(dna)[1:5,1:10])
object.size(as.character(dna))/object.size(dna)
annot <- read.csv("REOS4.annot.csv",
header=TRUE, row.names=1)
head(annot)
head(annot)
dim(dna)
dim(annot)
all(annot$accession==rownames(dna))
table(annot$year)
D <- dist.dna(dna, model="TN93")
class(D)
length(D)
temp <- as.data.frame(as.matrix(D))
table.paint(temp, cleg=0, clabel.row=.5, clabel.col=.5)
temp <- t(as.matrix(D))
temp <- temp[,ncol(temp):1]
par(mar=c(1,5,5,1))
image(x=1:17, y=1:17, temp, col=rev(heat.colors(100)), xaxt="n", yaxt="n",
xlab="",ylab="")
axis(side=2, at=1:17, lab=rownames(dna), las=2, cex.axis=.5)
axis(side=3, at=1:17, lab=rownames(dna), las=3, cex.axis=.5)
##S4_ML_TREE
class(dna2)
dna2
tre.ini <- nj(dist.dna(dna,model = "TN93"))
tre.ini
fit.ini <- pml(tre.ini, dna2, k = 4)
fit.ini
fit <- optim.pml(fit.ini, optNni = TRUE, optBf = TRUE,
optQ = TRUE, optGamma = TRUE)
fit
class(fit)
names(fit)
anova(fit.ini, fit)
AIC(fit.ini)
AIC(fit)
tre4 <- root(fit$tree,1)
tre4 <- ladderize(tre4)
plot(tre4, show.tip = FALSE, edge.width = 1)
title("Avian Reovirus S4 Maximum-likelihood tree")
tiplabels(annot$year,
bg = transp(num2col(annot$year, col.pal = myPal), .7),
cex = .5, fg = "transparent")
axisPhylo()
temp <- pretty(1980:2018, 2)
legend("bottomleft", fill = transp(num2col(temp, col.pal = myPal), .7),
leg = temp, ncol = 2)
```
## RESULTS
Genetic distance based heatmaps
One genetic distance based heatmap was created for each gene: L1 (figure 1), L2 (figure 2), S1 (figure 3), and S4 (figure 4).
Maximum likelihood phylogenetic trees
One maximum likelihood phylogenetic tree was created for each gene: L1 (figure 5), L2 (figure 6, and S4 (figure 7).

Figure 1. L1 gene heatmap representing the genetic distances between 18 ARV strains. White color represent 100% homology. Darker colors represent high genetic distance. Figure was created using the Rscript adapted from Jombart, 2016 in RStudio Version 1.3.1093

Figure 2. L2 gene heatmap representing the genetic distances between 18 ARV strains. White color represent 100% homology. Darker colors represent high genetic distance. Figure was created using the Rscript adapted from Jombart, 2016 in RStudio Version 1.3.1093

Figure 3. S1 gene heatmap representing the genetic distances between 15 ARV strains. White color represent 100% homology. Darker colors represent high genetic distance. Figure was created using the Rscript adapted from Jombart, 2016 in RStudio Version 1.3.1093

Figure 4. S4 gene heatmap representing the genetic distances between 17 ARV strains. White color represent 100% homology. Darker colors represent high genetic distance. Figure was created using the Rscript adapted from Jombart, 2016 in RStudio Version 1.3.1093

Figure 5. L1 Maximun likelihood phylogenetic tree representing the genetic distances between 18 ARV strains. Different label colors indicates year of isolation. Figure was created using the Rscript adapted from Jombart, 2016 in RStudio Version 1.3.1093

Figure 6. L2 Maximun likelihood phylogenetic tree representing the genetic distances between 18 ARV strains. Different label colors indicates year of isolation. Figure was created using the Rscript adapted from Jombart, 2016 in RStudio Version 1.3.1093

Figure 7. L1 Maximun likelihood phylogenetic tree representing the genetic distances between 17 ARV strains. Different label colors indicates year of isolation. Figure was created using the Rscript adapted from Jombart, 2016 in RStudio Version 1.3.1093
## DISCUSSION
The analysis of each avian reovirus gene separately showed that there are different genetic distances between the examined genetic segments. Despite that the S1 gene is used for molecular characterization of avian reovirus variants, the heatmap showed that L1 and L2 genes present more differences between the virus isolates. Nevertheless, it is important to consider the segment that is 4,011 bp and 3,907 bp for L1 and L2 respectively, and 1,658 and 1,201 for S1 and S4 respectively. Also, the segments L1 and L2 encode structural proteins that might be related to genetic variation. The S1 segment encodes for one structural protein and two non-structural proteins. The S4 segment encodes only a non-structural protein. Also, in order to perform better genomic analyses, it is important to consider a representative and heterogeneous sample, considering different geographic zones, years of isolation, the organ of isolation, along with a larger number of sequences.
## REFERENCES
1. Ayalew, L. E., A. Gupta, J. Fricke, K. A. Ahmed, S. Popowich, B. Lockerbie, S. K. Tikoo, D. Ojkic, and S. Gomis. Phenotypic, genotypic and antigenic characterization of emerging avian reoviruses isolated from clinical cases of arthritis in broilers in Saskatchewan, Canada. Scientific reports 7:3565. 2017.
2. Egaña-Labrin, S., R. Hauck, A. Figueroa, S. Stoute, H. L. Shivaprasad, M. Crispo, C. Corsiglia, H. Zhou, C. Kern, B. Crossley, and R. A. Gallardo. Genotypic Characterization of Emerging Avian Reovirus Genetic Variants in California. Scientific Reports 9:9351. 2019.
3. Kant, A., F. Balk, L. Born, D. Van Roozelaar, J. Heijmans, A. Gielkens, and A. ter Huurne. Classification of Dutch and German avian reoviruses by sequencing the sigma C protein. Veterinary research 34:203-212. 2003.
4. Lu, H., Y. Tang, P. A. Dunn, E. A. Wallner-Pendleton, L. Lin, and E. A. Knoll. Isolation and molecular characterization of newly emerging avian reovirus variants and novel strains in Pennsylvania, USA, 2011–2014. Scientific reports 5:14727. 2015.
5. Sellers, H. S. Current limitations in control of viral arthritis and tenosynovitis caused by avian reoviruses in commercial poultry. Veterinary microbiology 206:152-156. 2017.
6. Troxler, S., P. Rigomier, I. Bilic, D. Liebhart, I. Prokofieva, B. Robineau, and M. Hess. Identification of a new reovirus causing substantial losses in broiler production in France, despite routine vaccination of breeders. Vet Rec 172:556. 2013.
7. Van der Heide, L. The history of avian reovirus. Avian diseases 44:638-641. 2000.
8. Zhong, L., L. Gao, Y. Liu, K. Li, M. Wang, X. Qi, Y. Gao, and X. Wang. Genetic and pathogenic characterisation of 11 avian reovirus isolates from northern China suggests continued evolution of virulence. Scientific reports 6:35271. 2016.
9. Jombart, T. Introduction to phylogenetics using R.2016. http://adegenet.r-forge.r-project.org/files/MSc-intro-phylo.1.1.pdf
10. RStudio Team (2020). RStudio: Integrated Development for R. RStudio, PBC, Boston, MA URL http://www.rstudio.com/ .