# Project 2: Animal Sleep Habits and the BHLHe41 Gene
[TOC]
## Introduction:
### Our Gene
The gene we chose to study is BHLHe41 (basic helix-loop-helix family member e41), which is involved in controlling circadian rhythm. BHLHe41 is a transcriptional repressor that negatively regulates genes controlling circadian rhythm, including CLOCK (Circadian Locomotor Output Cycles Kaput).
### Our Project
We were particularly interested in BHLHe41 because a variant of this gene causes a short-sleep phenotype in humans. This variant allows carriers to function at an optimal level on only six hours of sleep per night. This relationship between BHLHe41 and sleep length in humans made us wonder whether the gene's orthologs in other species affect sleep length in a similar way.
As a first approach, we analyzed whether divergence from the human form of the gene correlated with divergence from the human sleep pattern.
### Our Databases
The first database we chose for this project contains information about conservation of the BHLHe41 sequence across different species. We took this data from the table of orthologs of the gene in the Ensembl genome browser. The second database was the Phylogeny of Sleep database, which contains information about the sleep patterns, including sleep lengths, of different species.
## Methods:
### Conservation of BHLHe41 Gene across organisms
#### Acquisition
https://uswest.ensembl.org/Homo_sapiens/Gene/Compara_Ortholog?db=core;g=ENSG00000123095;r=12:26120030-26125037
The database we generated examines the BHLHe41 ortholog in other organisms, and compares it to the human template. Of interest to us are the columns "Target % ID", which gives a percentage of conservation with the human template, and the "Species" column, which will act as our key column between the two databases.
#### Integration
After importing the database into a Python Jupyter notebook, several changes to the data had to be made. The species column contained both the scientific and the common name, with the common name in parentheses. These were easily separated by calling .split("(") and keeping only the first list element. The column was then normalized to all lower case.
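The name cleanup described above can be sketched as follows. This is a minimal illustration rather than the notebook code itself: the function name `normalize_species` and the example strings are hypothetical, and the layout (scientific name first, common name in parentheses) is taken from the description above.

```python
def normalize_species(raw_name: str) -> str:
    """Return the lower-cased text that precedes the first "(" in raw_name."""
    # Keep only the first element of the split, then trim stray whitespace.
    first_part = raw_name.split("(")[0].strip()
    return first_part.lower()


# Hypothetical example values following the layout described in the text.
examples = ["Bos taurus (Cow)", "Mus musculus (Mouse)", "Sus scrofa"]
print([normalize_species(name) for name in examples])
```

A name without parentheses passes through unchanged apart from lower-casing, so the same function can be applied to every row.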
### The Phylogeny of Sleep
[https://www.bu.edu/phylogeny/](https://www.bu.edu/phylogeny/)
#### Acquisition
The Phylogeny of Sleep website hosts a searchable database of sleep characteristics for over 127 mammalian species. There are many reported columns, but our study focuses on the "Total Daily Sleep" column and on the species column, which again acts as our key column.
#### Integration
The second database was unfortunately much less organized and required significant cleaning. We found multiple malformed columns, which broke the column/row ordering. Conversion to CSV was complicated by many cells containing commas as data values, and switching to tab separation revealed that multiple columns contained tabs, carriage returns, and line breaks.
Eventually, these inconsistencies were cleaned away by stripping all special characters. Once normalized, the species data was converted to all lower case in preparation for matching with the other key column.
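A cleanup of this kind might look like the sketch below. The helper name `clean_cell` and the set of characters that are kept (letters, digits, spaces, periods, hyphens) are assumptions for illustration; the report does not record the exact rules that were used.

```python
import re


def clean_cell(value: str) -> str:
    """Collapse whitespace, drop other special characters, and lower-case a cell."""
    # Tabs, carriage returns, and line breaks all count as whitespace here.
    collapsed = re.sub(r"\s+", " ", value)
    # Assumed whitelist: letters, digits, spaces, periods, and hyphens.
    stripped = re.sub(r"[^A-Za-z0-9 .\-]", "", collapsed)
    return stripped.strip().lower()


# Hypothetical messy cells of the kind described above.
print(clean_cell("Mus\tmusculus\r\n"))
print(clean_cell(" Bos, taurus "))
```

Keeping periods means numeric cells such as sleep durations survive the cleanup with their decimal point intact.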
#### Merging of datasets
Once both databases were prepared, they were converted to Pandas dataframes for ease of manipulation. The dataframes were then merged via the pd.merge() command. The species columns served as the key column, and only rows of intersection were kept.
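As a rough sketch of this step, the merge below uses two tiny stand-in tables. The column names and the rows for species that do not match are hypothetical placeholders; only the use of `pd.merge()` on the species key, keeping the intersection, comes from the description above.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the two cleaned tables.
conservation = pd.DataFrame({
    "species": ["bos taurus", "mus musculus", "danio rerio"],
    "target_pct_id": [73.26, 79.76, 50.0],  # last value is a placeholder
})
sleep = pd.DataFrame({
    "species": ["bos taurus", "mus musculus", "mus musculus", "felis catus"],
    "total_daily_sleep": [2.436, 8.2464, 10.968, 12.0],  # last value is a placeholder
})

# how="inner" keeps only species that appear in both tables.
merged = pd.merge(conservation, sleep, on="species", how="inner")
print(merged)
```

Because the sleep table can hold several studies per species, an inner merge repeats the conservation value once per study, which is why species appear more than once in the merged result.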
This resulted in a new dataframe with 57 entries. One of the entries had to be thrown out because its total sleep column listed "N/A". Two outliers were also removed due to unrealistic values (i.e., less than 30 minutes of total daily sleep in a species known to sleep in excess of 6 hours daily).
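One way to express this filtering in Pandas is sketched below. The column names, the placeholder rows, and the fixed 0.5-hour cutoff are all assumptions; in the study the outliers were judged against what is known about each species, which a single threshold only approximates.

```python
import pandas as pd

# Hypothetical merged rows; names and values are placeholders.
merged = pd.DataFrame({
    "species": ["a", "b", "c", "d"],
    "target_pct_id": [90.0, 80.0, 70.0, 60.0],
    "total_daily_sleep": ["9.5", "N/A", "0.4", "12"],
})

# Convert to numbers; non-numeric entries such as "N/A" become NaN.
merged["total_daily_sleep"] = pd.to_numeric(
    merged["total_daily_sleep"], errors="coerce"
)
# Drop rows without a usable sleep value.
cleaned = merged.dropna(subset=["total_daily_sleep"])

# Assumed cutoff: treat anything under 30 minutes per day as implausible.
MIN_PLAUSIBLE_HOURS = 0.5
cleaned = cleaned[cleaned["total_daily_sleep"] >= MIN_PLAUSIBLE_HOURS]
cleaned = cleaned.reset_index(drop=True)
print(cleaned)
```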
## Results:
Each row of the final table lists a species, its total hours slept in a day, and the percent similarity of its BHLHe41 ortholog to the human gene.
| Species | % Gene Conserved | Total Daily Sleep Hours |
|-----------------------|------------|-------------------|
| bos taurus | 73.26 | 2.436 |
| callithrix jacchus | 96.22 | 9.5038 |
| callithrix jacchus | 96.22 | 9.5061 |
| callithrix jacchus | 96.22 | 9.5044 |
| callithrix jacchus | 96.22 | 12.383 |
| cavia porcellus | 91.6 | 6.816 |
| cavia porcellus | 91.6 | 7.73 |
| equus caballus | 76.79 | 4.368 |
| erinaceus europaeus | 53.88 | 14.16 |
| loxodonta africana | 59.07 | 3.3 |
| macaca mulatta | 98.55 | 8.9472 |
| macaca mulatta | 98.55 | 8 |
| macaca mulatta | 98.55 | 11.8 |
| macaca mulatta | 98.55 | 6.4 |
| macaca mulatta | 98.55 | 11.52 |
| macaca mulatta | 98.55 | 5.7508 |
| macaca nemestrina | 98.55 | 13.968 |
| macaca nemestrina | 98.55 | 7.845 |
| macaca nemestrina | 98.55 | 8.225 |
| macaca nemestrina | 98.55 | 8.43 |
| macaca nemestrina | 98.55 | 7.83 |
| macaca nemestrina | 98.55 | 9.05 |
| meriones unguiculatus | 78.22 | 3.0333 |
| meriones unguiculatus | 78.22 | 15.288 |
| meriones unguiculatus | 78.22 | 13.066 |
| mesocricetus auratus | 97.87 | 15.528 |
| mesocricetus auratus | 97.87 | 14.424 |
| microcebus murinus | 93.08 | 15.36 |
| microtus ochrogaster | 79.46 | 15.37 |
| mus musculus | 79.76 | 8.2464 |
| mus musculus | 79.76 | 10.968 |
| mus musculus | 79.76 | 13.152 |
| mustela putorius furo | 88.32 | 15.42 |
| mustela putorius furo | 88.32 | 15.113 |
| mustela putorius furo | 88.32 | 15.266 |
| mustela putorius furo | 88.32 | 14.496 |
| octodon degus | 88.5 | 6 |
| octodon degus | 88.5 | 6.96 |
| octodon degus | 88.5 | 5.04 |
| octodon degus | 88.5 | 6.12 |
| octodon degus | 88.5 | 6.24 |
| octodon degus | 88.5 | 6 |
| octodon degus | 88.5 | 9.024 |
| papio anubis | 96.26 | 9.84 |
| rattus norvegicus | 70.03 | 13.248 |
| sus scrofa | 94.19 | 9.03 |
| theropithecus gelada | 97.7 | 10.93 |
| theropithecus gelada | 97.7 | 11.96 |
| theropithecus gelada | 97.7 | 12.473 |
| theropithecus gelada | 97.7 | 10.902 |
| tursiops truncatus | 44.4 | 12 |
| tursiops truncatus | 44.4 | 10.4 |
| vulpes vulpes | 54.42 | 9.72 |
These values were then placed onto a scatter plot, and a line of best fit was calculated.

## Discussion:
Our final table consisted of 52 elements, many of which were repeated studies of the same organisms. Total daily sleep varied widely, from 2.436 to 15.528 hours. Conservation varied similarly, from 44.4 to 98.55 percent when compared to the human BHLHe41 gene. When the two variables are graphed against each other, the linear regression strongly suggests there is no correlation between them. Of interest, many species with near 100% similarity to humans still varied widely in total sleep hours. This suggests that BHLHe41 plays only a small part in regulating sleep length, or perhaps that it is less impactful in species other than humans. Additionally, the genotype associated with short sleep syndrome may be radically different from any variation found in other organisms.
## Future Direction:
The study could be improved by expanding the dataset to include more animals. Additionally, weighting the data points by an overall quality score, based on the validity of the individual study each value was recorded from, would let more reliable trials count for more than others.
Our statistical analysis could also be improved. Our hypothesis originally assumed that as conservation of the BHLHe41 gene decreased, total sleep would diverge from the human average. Although short sleep syndrome results in less sleep, we conceded early in the study that changes to the gene could hypothetically also increase total sleep time. Estimating the degree of divergence in our scatter plot would require more complex statistical analysis, which could reveal more supportive findings, but given the nature of the data we consider this unlikely.
A better understanding of the exact changes to the BHLHe41 gene that lead to short sleep syndrome could vastly improve this study. Examining other animals' BHLHe41 orthologs more closely, with an eye for the genotype leading to this condition in humans, would allow a much more nuanced study. Instead of comparing divergence from the human norm, we could measure similarity to the specific variant that leads to our phenotype of interest.
## Conclusion:
Based on our data, we have little evidence that the animals we sampled altered their sleep patterns as their BHLHe41 orthologs diverged from the human gene. Further assumptions or correlations would require additional study. In all likelihood, sleep length is managed by multiple complex regulatory systems, and may be managed radically differently in other species.