# The Genetic Relationship of SIRT Sequences with Tumor Growth and Longevity
By Samantha Leano, Stephanie Frohrip
[TOC]
## Introduction
Sirtuins (SIRT, named for the yeast Silent Information Regulator 2 gene) are found in organisms from yeast to humans and have been linked both to prolonged life spans and to cancer. Despite being homologous, the seven human SIRT genes (SIRT1 through SIRT7) each play a specific role in aging, transcription, apoptosis, inflammation, and stress resistance (Polito, L, et al., 2010, 216). SIRT genes are highly conserved in the mammalian genome and act as transcriptional regulators in several different processes. The proteins they encode are nicotinamide adenine dinucleotide (NAD+) dependent histone deacetylases, and they are also a target for several other proteins. They are expressed virtually everywhere in the body, including but not limited to the brain, heart, kidney, and liver. Although these genes are commonly known as longevity genes, they also play roles in a multitude of diseases. By regulating downstream pathways, SIRT affects a range of age-related, metabolic, and cardiovascular diseases. In such diseases (Parkinson's disease, diabetes, etc.), SIRT is found at decreased levels compared to normal. In contrast, overexpression of SIRT results in increased cell viability and decreased apoptosis.

Figure 1. A representation of a SIRT protein.
https://en.wikipedia.org/wiki/Sirtuin#/media/File:1SZD.png
### SIRTs Associated with Longevity
Sirtuins were first investigated for their involvement in calorie restriction (CR), the reduction of daily food intake. Researchers tested SIRT1 and found that mice lacking the gene had a shorter lifespan than wild-type mice (Polito, L, et al., 2010, 215). The experiment suggested that SIRT prolongs life expectancy and delays age-associated disorders. It was later discovered that SIRT1, 3, 4, and 5 are all associated with longevity. Whether common allelic variations of SIRT genes are associated with longevity in humans has been tested further, but the scientific community has not yet reached a conclusion.
### SIRTs' Relationship with Tumor Growth
After a thorough investigation of all SIRT proteins, it was noticed that SIRT, nicknamed a "Master Metabolic Regulator", controls many of the same traits that influence cancer. SIRT affects DNA repair, transcriptional regulation, metabolism, aging, and senescence, all of which heavily affect cancer and its severity. By searching through several databases, we found two datasets in the Human Protein Atlas that relate cancer expression to SIRT genes. One is a comprehensive list of all the SIRT genes (SIRT.tsv) that contains gene ID, gene description, chromosome position, molecular function, and RNA cancer specificity score. The other contains pathology information (pathology.tsv.zip) about SIRT genes, which includes gene ID, mRNA expression, and patient mortality.
### Our Study
For this project, we briefly look into the role of each SIRT gene and the genetic roles it may play in cancerous tumor formation. In addition to understanding the function of all SIRT genes, we focus on the variants that are associated with both tumor growth and longevity. By focusing on sirtuins, we may be able to identify the mutations at different SNPs associated with different cancers, and so understand why SIRT genes are associated with both longevity and cancerous diseases.
## Methods
### Databases We’ve Used
#### *SIRT_rfn.tsv*
Our first database can be found on the Human Protein Atlas website. Here we look at the general description of all SIRT genes (SIRT.tsv). For each gene, the dataset lists: the gene name, gene description, chromosome and position of the gene, protein class, biological class, and the evidence indicating which diseases it is involved with.
Using R, we kept only the columns that are useful to us and removed the RPS19BP1 protein, which gave us our first modified dataset (SIRT_rfn.tsv). In the modified dataset, many fields are unfilled and appear as NA. Since these genes are still being studied and researched, we expected some of the data to be missing.
```
# Opening up tidyverse package
library(tidyverse)
# Reading the SIRT.tsv file as SIRT_data
SIRT_data <- read_tsv(file='SIRT.tsv')
# Selecting important columns: 1, 3, and 4 hold the names and gene IDs of the SIRT genes.
# 6 through 12 are the locations and brief functions of SIRT genes; 16 and 17 are cancerous tissues that are detected on an RNA level
beta_filter <- select(SIRT_data, 1, 3, 4, 6:12, 16, 17)
# Removing the non-SIRT row so that only the seven SIRT genes remain
beta_filter <- filter(beta_filter, Gene %in% c('SIRT1', 'SIRT2', 'SIRT3', 'SIRT4', 'SIRT5', 'SIRT6', 'SIRT7'))
# Double check we have everything
view(beta_filter)
# Saving beta_filter as the information for SIRT reference
write_tsv(beta_filter,file = 'SIRT_rfn.tsv')
```
Preview of SIRT_rfn.tsv

Even so, there is a lot we can look at based on our SIRT_rfn data. The dataset is useful because we can use the UCSC Genome Browser to locate each SIRT gene and see its variations within the human genome. Interestingly, even though this table suggests that SIRT3 is not involved with any disease, we know from other research on this gene that SIRT3 is associated with both longevity and cancer.
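To look genes up in the UCSC Genome Browser, the chromosome and position fields can be combined into a `chrN:start-end` search string. The following is a minimal sketch only: the column names `Chromosome` and `Position`, the `start-end` layout of the position field, and all values in the toy table are assumptions made for illustration, not values taken from SIRT_rfn.tsv.

```r
library(dplyr)

# Toy stand-in for SIRT_rfn.tsv; gene names and coordinates are placeholders
toy_rfn <- tibble(
  Gene       = c("SIRT_X", "SIRT_Y"),
  Chromosome = c("10", "19"),
  Position   = c("1000-2000", "3000-4000")  # assumed "start-end" layout
)

# Build a search string of the form chrN:start-end for each gene
toy_rfn <- toy_rfn %>%
  mutate(ucsc_position = paste0("chr", Chromosome, ":", Position))

print(toy_rfn$ucsc_position)
```

Each resulting string can be pasted into the browser's position search box.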
#### *SIRT_path.tsv*
Our next database can also be found on the Human Protein Atlas website. The pathology data (pathology.tsv) describes the staining profiles of proteins in human tumor tissues and the survival of those patients. The dataset contains: the gene name and ID, the cancer type, the protein levels detected together with the number of stained samples at each level, and p-values for patient survival and mRNA correlation.
We reduced it to the SIRT genes and to the columns where the disease has been diagnosed and could either be cured or not cured. We removed the other columns because it was uncertain whether those samples were diagnosed with a specific cancer. We want the SIRT genes that have p-values in the categories where a patient with a known cancer either survives or does not.
```
# Opening up tidyverse
library(tidyverse)
# Reading the pathology file
pathology_expression <- read_tsv(file= 'pathology.tsv')
# Keeping columns 1 through 8 and 10 of pathology_expression
alpha_filter <- select(pathology_expression, 1:8, 10)
# Checking the current column names
colnames(alpha_filter)
# Renaming columns 2 and 7 so each name is a single string without spaces
colnames(alpha_filter)[2] <- 'Gene_name'
colnames(alpha_filter)[7] <- 'Not_Detected'
# Filtering out Gene names to just have SIRT genes
alpha_filter <- filter(alpha_filter, Gene_name %in% c('SIRT1', 'SIRT2', 'SIRT3', 'SIRT4', 'SIRT5', 'SIRT6', 'SIRT7'))
# Double checking
view(alpha_filter)
# Saving alpha_filter as the pathology data of SIRT
write_tsv(alpha_filter,file = 'SIRT_path.tsv')
```
Preview of SIRT_path.tsv

After manipulating the data, we saw that it could fill in missing data in our first dataset (SIRT_rfn). For example, SIRT3 is associated with kidney, uteral, and pancreatic cancer. We integrated this and other information gleaned from the second database into our SIRT_rfn data to produce SIRT_rfn_2. Essentially, we merged the two datasets into one.
```
# Inserting new data into the column; the values must be listed in the same order as the rows of beta_filter
beta_filter$`Disease involvement` <- c('Cancer-related genes head liver pancreatic prostate', 'Cancer-related genes Neurodegeneration', 'Cancer-related genes Tumor suppressor', 'Cancer-related thyroid breast glioma', 'Cancer-related skin ovarian lung', 'Renal,Carcinoid,Pancreatic', 'Disease variant')
# Double check we have everything
view(beta_filter)
# Saving beta_filter as the information for SIRT reference
write_tsv(beta_filter,file = 'SIRT_rfn_2.tsv')
```
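Typing the values in by hand depends on the row order of the table. As an alternative, the merge could be done with a join keyed on the gene name. This is a hedged sketch, not the method used above: the toy tables, their values, and the `Cancer` column name are placeholders, while `Gene` and `Gene_name` follow the column names used in the earlier code.

```r
library(dplyr)

# Toy stand-ins for SIRT_rfn and SIRT_path; all values are placeholders
toy_rfn <- tibble(
  Gene = c("SIRT_X", "SIRT_Y", "SIRT_Z"),
  `Disease involvement` = c(NA, "Disease variant", NA)
)
toy_path <- tibble(
  Gene_name = c("SIRT_X", "SIRT_X", "SIRT_Z"),
  Cancer    = c("cancer A", "cancer B", "cancer C")
)

# Collapse the pathology rows to one comma-separated string per gene
cancers_by_gene <- toy_path %>%
  group_by(Gene_name) %>%
  summarise(cancers = paste(Cancer, collapse = ", "), .groups = "drop")

# Join on the gene name and fill only the entries that were NA
toy_merged <- toy_rfn %>%
  left_join(cancers_by_gene, by = c("Gene" = "Gene_name")) %>%
  mutate(`Disease involvement` = coalesce(`Disease involvement`, cancers)) %>%
  select(-cancers)

print(toy_merged)
```

Because the join matches rows by gene name, existing annotations are kept and only missing entries are filled, regardless of row order.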
Preview of SIRT_rfn_2.tsv

With such small p-values, we believe the data show with some significance that SIRT 3, 4, 5, and 6 have favorable prognostics while SIRT 2 and 7 do not. This agrees with other published research on SIRT genes, as SIRT2 and 7 are commonly associated with cancer genes (Polito, L., et al, 2010, 2015 table 1).
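The reading of the p-values described above can be made explicit by labelling each gene and cancer pair as favorable, unfavorable, or not significant. This is a sketch under stated assumptions: the column names `prognostic_favorable` and `prognostic_unfavorable`, the cutoff `alpha`, and every value in the toy table are hypothetical and not taken from SIRT_path.tsv.

```r
library(dplyr)

# Toy stand-in for SIRT_path; names and p-values are placeholders
toy_path <- tibble(
  Gene_name              = c("SIRT_X", "SIRT_Y", "SIRT_Z"),
  Cancer                 = c("cancer A", "cancer B", "cancer C"),
  prognostic_favorable   = c(0.001, NA, 0.20),
  prognostic_unfavorable = c(NA, 0.004, NA)
)

# Assumed significance cutoff; adjust to whatever threshold the analysis uses
alpha <- 0.05

# Label each row by which prognostic p-value, if any, falls below the cutoff
prognosis <- toy_path %>%
  mutate(prognosis_call = case_when(
    !is.na(prognostic_favorable)   & prognostic_favorable   < alpha ~ "favorable",
    !is.na(prognostic_unfavorable) & prognostic_unfavorable < alpha ~ "unfavorable",
    TRUE ~ "not significant"
  ))

print(prognosis)
```

Stating the cutoff in one variable makes it easy to see how the favorable and unfavorable groups change if a stricter threshold is chosen.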
#### *SIRT_location.tsv*
The final dataset we use to understand the function of SIRT genes is also found in the Human Protein Atlas. subcellular_location.tsv is a dataset that has: gene name, supported locations, single cell variation intensity, and GO ID.
```
# Opening tidyverse
library(tidyverse)
# Reading the tsv file as SIRT_location
SIRT_location <- read_tsv(file='subcellular_location.tsv')
# Selecting only the columns needed
omega_filter <- select(SIRT_location, 1:4, 8, 14)
# Editing names
colnames(omega_filter)[2] <- 'Gene_name'
# Filtering only SIRT genes
omega_filter <- filter(omega_filter, Gene_name %in% c('SIRT1', 'SIRT2', 'SIRT3', 'SIRT4', 'SIRT5', 'SIRT6', 'SIRT7'))
# Double checking
view(omega_filter)
# Writing it as a tsv file
write_tsv(omega_filter,file = 'SIRT_location.tsv')
```
We modified the data to keep only the SIRT genes, main location, and GO ID (SIRT_location.tsv). These columns are useful because we can load their GO IDs into Cytoscape. This helps us find connections within the cell and see which SIRT genes are most commonly associated with a specific location within the cell. Since SIRT_location gives the location at the subcellular level, it ties back to SIRT_path and gives us additional information about how that cancer forms. SIRT_path in turn ties back to SIRT_rfn, since we can check which variants occur at those specific locations and any nuances found there.
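A network tool such as Cytoscape reads one edge per row, so a gene listed with several locations in a single cell needs to be split into several rows first. The sketch below shows one way to do this. It assumes that multiple locations are separated by `;` and that the column is named `Main location`; the toy genes are placeholders.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for SIRT_location; gene names are placeholders
toy_location <- tibble(
  Gene_name       = c("SIRT_X", "SIRT_Y"),
  `Main location` = c("Nucleoplasm", "Cytosol;Nucleoli")  # assumed ";" separator
)

# One row per gene-location pair, renamed to source/target for network import
edges <- toy_location %>%
  separate_rows(`Main location`, sep = ";") %>%
  transmute(source = Gene_name, target = `Main location`)

print(edges)
```

The resulting table has one source and one target per row, which is the shape a network import expects.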
Preview of SIRT_location.tsv

#### All datasets mentioned can be found here: https://drive.google.com/drive/folders/1qcswEmAL8PblVyygKQecCZKrnjmskkhx?usp=sharing
## Results
Using Cytoscape, we imported SIRT_location.tsv as a network. "Gene" was selected as the source node, "Main Location" as the target node, and "GO ID" as the interaction type. The resulting network is below.

As demonstrated in the network, the genes ENSG00000077463, ENSG00000187531, and ENSG00000096717 (SIRT6, SIRT7, and SIRT1, respectively) are associated with the nucleoplasm. ENSG00000124523 (SIRT5) is associated with the mitochondria. ENSG00000068903 (SIRT2) is associated with the cytosol and the nucleoli.
These results are interesting because we know that the nucleoplasm is associated with cell growth and proliferation. This is consistent with our SIRT_path dataset: if mutated, SIRT 1, 6, and 7 could be a likely cause of most common cancers. The same can be said of SIRT5, which, if mutated, could cause the mitochondria to become dysfunctional and lead to cancer. In addition, SIRT2, which causes tumors to form and grow, can also be traced back to our SIRT_path dataset.
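The gene-to-location associations read off the network can also be tabulated directly, which makes it easy to see which compartment holds the most SIRT genes. The edge list below is typed in from the associations reported in this section; the column names `gene` and `location` are chosen here for illustration.

```r
library(dplyr)

# Associations as reported in the Results text above
edges <- tibble(
  gene     = c("SIRT1", "SIRT6", "SIRT7", "SIRT5", "SIRT2", "SIRT2"),
  location = c("Nucleoplasm", "Nucleoplasm", "Nucleoplasm",
               "Mitochondria", "Cytosol", "Nucleoli")
)

# Count and list the genes found at each location, largest group first
by_location <- edges %>%
  group_by(location) %>%
  summarise(
    n_genes = n(),
    genes   = paste(sort(gene), collapse = ", "),
    .groups = "drop"
  ) %>%
  arrange(desc(n_genes), location)

print(by_location)
```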
SIRT 1


SIRT 2


SIRT 3


SIRT 4


SIRT 5


SIRT 6


SIRT 7


## Discussion
* The results are further explained in the discussion section. Includes failures, successes, results, future hopes of what to expand.