[TOC]
Project 2 Report
===
Members: Zoya Jalaei, Yirong (Iris) Liang, Ilara Yilmaz
## Introduction

Figure 1: BRCA1 and RAD51 cooperate during DNA repair. (Sundararajan, Ahmed and Goodman, 2011)
BRCA1 (Breast Cancer Gene 1) and BRCA2 (Breast Cancer Gene 2) are genes that each play an important role in DNA repair (National Cancer Institute, 2020). BRCA1 and BRCA2 work together to open the way for repair proteins and activate them to repair the DNA (Sundararajan, Ahmed and Goodman, 2011). Both are tumor suppressor genes that protect the body from developing cancer. There are two copies of each gene in the human body, and people who inherit one mutated variant have a much higher risk of developing cancer. RAD51 recombinase is another protein involved in the repair of DNA breaks through homologous recombination. RAD51 is important because it is known to interact directly with nucleotide sequences to initiate DNA repair, which is essential for proper cell cycle progression.
We decided to focus our study on C. elegans because its DNA strand repair mechanisms have been shown to be significantly similar to those in humans. Specifically, it has many proteins that are the same as or similar to those in the BRCA1 pathway. In particular, brc-1 is known to be a close homolog of human BRCA1. For our research question, we were curious about the relationship between BRCA1 and other DNA repair proteins. We asked: how do the expression levels and protein interactions of C. elegans brc-1 and rad-51 give clues to the effects of BRCA1 mutations in humans? We chose to also look at rad-51/RAD51 because we hypothesize that it is an example of a protein heavily affected by mutations and variants of BRCA1.
We take several approaches to this question. First, we look at the expression levels of both rad-51 and brc-1 to see whether they share similar expression levels or locations in C. elegans. Next, we examine the gene ontology and protein interactions of both brc-1 and rad-51 in C. elegans to help determine how much these two proteins, both known to act in DNA repair, actually interact with each other. Lastly, we look deeper into mutations in human BRCA1 itself to see what insight they give into the effects of BRCA1 mutations in the cell. Specifically, we try to find mutations that clearly affect how BRCA1 interacts with other proteins.
## Methods
### Cell Expression Levels
We used the R programming language to combine different datasets and visualize comparisons between them. Since we want to know whether brc-1 and rad-51 are expressed at the same time and in the same place, which would support the idea that they work together, we found two datasets for each of these protein-coding genes in C. elegans: expression at each life stage and expression in each cell type. Both datasets are from WormBase. The cell-type expression datasets were obtained from the WormBase information pages for the two genes. The life-stage expression dataset is from Dr. Levin and colleagues' experiment comparing Caenorhabditis embryos (Levin et al., 2012).
We first obtained the WormBase ID of each gene from a list of common names and corresponding WormBase IDs.
```
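library(dplyr)  # provides filter() and full_join()
# Look up the WormBase gene ID for each gene by its common name
rad_51 = filter(common_name, CommonName == "rad-51")$WormbaseID[1]
brc_1 = filter(common_name, CommonName == "brc-1")$WormbaseID[1]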
```
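Taking the first match with `[1]` returns `NA` without any warning when a common name is missing from the list. The sketch below shows one way the lookup could be made to fail loudly instead. `lookup_wbid` is a hypothetical helper that was not part of our original script, and the toy `common_name` table assumes only the two column names used above, filled with the two IDs given in the Figure 2 caption.

```r
# Hypothetical helper: return the single WormBase ID for a common name,
# or stop with an error if there is no match or more than one distinct match.
lookup_wbid <- function(common_name, gene) {
  hits <- unique(common_name$WormbaseID[which(common_name$CommonName == gene)])
  if (length(hits) != 1) {
    stop(sprintf("expected exactly one WormBase ID for '%s', found %d",
                 gene, length(hits)))
  }
  hits
}

# Toy lookup table (assumed layout: one row per gene)
common_name <- data.frame(
  CommonName = c("brc-1", "rad-51"),
  WormbaseID = c("WBGene00000264", "WBGene00004297"),
  stringsAsFactors = FALSE
)

brc_1 <- lookup_wbid(common_name, "brc-1")
rad_51 <- lookup_wbid(common_name, "rad-51")
```

With a helper like this, a misspelled gene name stops the script at the lookup step rather than producing empty tables further down.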
Then we merged the cell-type expression tables for rad-51 and brc-1 (the ordering by brc-1 expression level is applied later, when plotting). When we inspected the data, we found many zeros. Cell types where every value is zero add nothing to the visualization, so we filtered them out.
```
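# Merge the two per-gene tables on cell type, keeping cell types found in either table
cellexprtable = full_join(brc_1_cellexpr, rad_51_cellexpr, by = "Cell_type")
# Keep rows where at least one expression or TPM value is non-zero
cellexprsubtable <- filter(cellexprtable,
                           brc_1_cell_expression != 0 |
                             rad_51_cell_expression != 0 |
                             brc_1_TPM != 0 |
                             rad_51_TPM != 0)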
```
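One detail of this step is easy to miss: `full_join` fills in `NA` for a cell type that appears in only one table, and `filter` drops any row whose condition evaluates to `NA`. The toy example below is a minimal sketch of that behaviour. The column names follow our script, but every value is made up for illustration.

```r
library(dplyr)

# Made-up toy tables (not the WormBase data)
brc_1_cellexpr <- data.frame(
  Cell_type = c("ADL", "ADF", "SAA"),
  brc_1_cell_expression = c(12, 0, 0),
  brc_1_TPM = c(30, 0, 0),
  stringsAsFactors = FALSE
)
rad_51_cellexpr <- data.frame(
  Cell_type = c("ADL", "SAA", "PVD"),
  rad_51_cell_expression = c(0, 8, 0),
  rad_51_TPM = c(0, 15, 0),
  stringsAsFactors = FALSE
)

# ADF exists only in the brc-1 table and PVD only in the rad-51 table,
# so each gets NA in the other gene's columns after the join.
cellexprtable <- full_join(brc_1_cellexpr, rad_51_cellexpr, by = "Cell_type")

# A row survives only if at least one comparison is TRUE.
# Rows that mix zeros and NA evaluate to NA and are dropped.
cellexprsubtable <- filter(
  cellexprtable,
  brc_1_cell_expression != 0 | rad_51_cell_expression != 0 |
    brc_1_TPM != 0 | rad_51_TPM != 0
)

print(cellexprsubtable$Cell_type)
```

In this toy case the join has four rows and the filter keeps ADL and SAA. A cell type reported for only one gene is kept as long as that gene has a non-zero value there.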
Lastly, we plotted the expression level in different cell types for both genes.
```
library(ggplot2)
# One data frame per gene: cell type, percent of cells expressing the gene, and TPM
A = data.frame(names=cellexprsubtable$Cell_type, expression=cellexprsubtable$brc_1_cell_expression, gene = "brc-1")
B = data.frame(names=cellexprsubtable$Cell_type, expression=cellexprsubtable$rad_51_cell_expression, gene = "rad-51")
A$TPM <- cellexprsubtable$brc_1_TPM
B$TPM <- cellexprsubtable$rad_51_TPM
# Cell types on the x-axis, ordered by expression; point size encodes TPM
cell_expressionplot <- ggplot(A, aes(reorder(names, expression), expression, color = factor(gene), size = TPM)) +
  geom_point() +
  geom_point(data = B) +
  # Rotate the crowded cell-type labels on the x-axis only
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  labs(title = "Cell Expression Level", x = "Cell Types", y = "Cells Expression (%)", color = "gene")
```
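Because `reorder()` is evaluated separately for each layer's data frame, the two layers do not necessarily agree on the x-axis order. A possible alternative, sketched below with made-up values, is to stack both genes into one long table and fix the factor levels once by ascending brc-1 expression. The object names `long` and `long_plot` are hypothetical, and the sketch assumes each cell type appears only once.

```r
# Made-up values for three cell types (not the WormBase data)
cellexprsubtable <- data.frame(
  Cell_type = c("SAA", "ADL", "ADF"),
  brc_1_cell_expression = c(1, 9, 5),
  rad_51_cell_expression = c(7, 2, 0),
  brc_1_TPM = c(10, 80, 40),
  rad_51_TPM = c(60, 15, 0),
  stringsAsFactors = FALSE
)

# Stack the two genes into one long table with a gene column
long <- rbind(
  data.frame(names = cellexprsubtable$Cell_type,
             expression = cellexprsubtable$brc_1_cell_expression,
             TPM = cellexprsubtable$brc_1_TPM,
             gene = "brc-1"),
  data.frame(names = cellexprsubtable$Cell_type,
             expression = cellexprsubtable$rad_51_cell_expression,
             TPM = cellexprsubtable$rad_51_TPM,
             gene = "rad-51")
)

# Fix the x-axis order once, by ascending brc-1 expression
brc_order <- order(cellexprsubtable$brc_1_cell_expression)
long$names <- factor(long$names, levels = cellexprsubtable$Cell_type[brc_order])

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  long_plot <- ggplot(long, aes(names, expression, color = gene, size = TPM)) +
    geom_point() +
    theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
    labs(x = "Cell Types", y = "Cells Expression (%)")
}
```

A single long table also lets ggplot2 build the colour legend from the `gene` column without a second `geom_point()` layer.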
For the life-stage expression dataset, we simply extracted the rows for rad-51 and brc-1 and plotted them.
```
# Select the row for each gene by its WormBase ID
brc_1_eleg = filter(elegexpr, WBid == brc_1)
rad_51_eleg = filter(elegexpr, WBid == rad_51)
# plot: columns 5 to 14 hold the signal for the stages listed in cell_stage
brc = data.frame(cell_stage, expression = unlist(brc_1_eleg[1, 5:14], use.names = FALSE), gene = "brc-1")
rad = data.frame(cell_stage, expression = unlist(rad_51_eleg[1, 5:14], use.names = FALSE), gene = "rad-51")
# One point-and-line series per gene
cell_expressionplot <- ggplot(brc, aes(cell_stage, expression, color = factor(gene))) +
  geom_point() +
  geom_point(data = rad) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  labs(title = "Expression at Different Embryonic Stages in C. elegans", x = "Embryonic Stages", y = "Microarray signal (log10)") +
  labs(color = "gene") +
  # group = 1 connects the points of each gene into a single line
  geom_line(data = brc, group = 1) +
  geom_line(data = rad, group = 1)
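The plot shows by eye whether the two stage profiles rise and fall together. One way this could be turned into a number, which we did not do for this report, is a rank correlation between the two profiles. The sketch below uses made-up signal values, and `stage_correlation` is a hypothetical helper name.

```r
# Hypothetical helper: correlation between two equally long stage profiles.
# Spearman is used so that only the rank order of the stages matters.
stage_correlation <- function(x, y, method = "spearman") {
  stopifnot(length(x) == length(y))
  cor(as.numeric(x), as.numeric(y), method = method)
}

# Made-up log10 microarray signals for ten embryonic stages (not the real data)
brc_signal <- c(3.1, 3.3, 3.6, 3.8, 3.7, 3.5, 3.2, 3.0, 2.8, 2.7)
rad_signal <- c(3.9, 3.8, 3.8, 3.7, 3.6, 3.4, 3.1, 2.9, 2.8, 2.6)

stage_cor <- stage_correlation(brc_signal, rad_signal)
print(stage_cor)
```

A value near 1 would support the reading that the two genes change in step across stages, a value near 0 would not. With only ten stages, such a coefficient is a rough summary rather than strong evidence.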
### Protein Interaction and Gene Ontology (GO)
For the second part of our analysis, we set out to find the protein interactions and gene ontology enrichment of both brc-1 and rad-51 in C. elegans. While our main focus is still on brc-1 and its relation to BRCA1, we also wanted to observe how proteins involved directly in DNA repair relate back to brc-1/BRCA1. In addition, we looked at the protein interactions of BRCA1 and RAD51 in humans to confirm similar interactions and pathways in both species.
We started by locating interaction datasets on STRING for each of the four proteins above. To manipulate the interaction results, we downloaded the data table for each of the four proteins and imported it into Cytoscape. We used a slightly adjusted Hierarchical Layout because it displays the flow of the network while minimizing edge overlap.
With this layout, it became much easier to see interactions between proteins. However, we wanted a clearer picture of how the interactions of our individual proteins of study differed from each other. The datasets we found have no values attached to the proteins themselves, so we could not analyze the nodes any further, but we were able to manipulate the edges to show some significant features. Primarily, we mapped edge width to the experimentally determined interaction score and edge transparency to the coexpression score, which gave us a more accurate picture of the network.
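The same edge tables can also be inspected outside Cytoscape. The sketch below is a rough illustration of how shared interaction partners could be counted in R. It assumes the column names `node1`, `node2` and `experimentally_determined_interaction` for the STRING export, uses placeholder partners (`geneA`, `geneB`, `geneC`), and takes only the 0.999 and 0.334 scores from our results; every other score is made up.

```r
# Toy edge tables. Only the brc-1/brd-1 (0.999) and brc-1/rad-51 (0.334)
# scores come from our results; all other values and partners are placeholders.
brc_edges <- data.frame(
  node1 = "brc-1",
  node2 = c("brd-1", "rad-51", "geneA", "geneB"),
  experimentally_determined_interaction = c(0.999, 0.334, 0.5, 0.2),
  stringsAsFactors = FALSE
)
rad_edges <- data.frame(
  node1 = "rad-51",
  node2 = c("brc-2", "geneA", "geneB", "geneC"),
  experimentally_determined_interaction = c(0.9, 0.6, 0.7, 0.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every protein in an edge table except the focal one
partners <- function(edges, focal) {
  setdiff(unique(c(edges$node1, edges$node2)), focal)
}

brc_partners <- partners(brc_edges, "brc-1")
rad_partners <- partners(rad_edges, "rad-51")

# Proteins present in both networks, and the Jaccard overlap of the two sets
shared <- intersect(brc_partners, rad_partners)
jaccard <- length(shared) / length(union(brc_partners, rad_partners))

# Partner with the strongest experimentally determined score for brc-1
top_brc <- brc_edges$node2[which.max(brc_edges$experimentally_determined_interaction)]
```

An overlap measure like this would give a single number for how much the two networks have in common, instead of a visual impression only.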
Furthermore, we were interested in visualizing the gene ontology of brc-1 and rad-51 in C. elegans in order to see overrepresented GO categories. We used the BiNGO app in Cytoscape with the default settings, and from there we were able to get a better idea of how and where the brc-1 and rad-51 protein clusters act most strongly. This helped us visualize how the brc-1 cluster functions in parallel with, but not in exactly the same way as, the rad-51 cluster.
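To our understanding, BiNGO's default settings score overrepresentation with a hypergeometric test followed by Benjamini-Hochberg correction. The sketch below works through that calculation for a single GO term in base R. All counts and the two extra p-values are made up for illustration and are not taken from our BiNGO output.

```r
# Made-up counts, for illustration only
N <- 20000  # annotated genes in the reference set
K <- 150    # reference genes annotated with the GO term
n <- 11     # genes in the cluster being tested
k <- 6      # cluster genes annotated with the GO term

# Probability of seeing k or more annotated genes when drawing n genes
# at random without replacement from the reference set
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Benjamini-Hochberg correction across all tested terms
# (the two other p-values are placeholders)
p_values <- c(term_of_interest = p_value, other_term_1 = 0.04, other_term_2 = 0.6)
p_adjusted <- p.adjust(p_values, method = "BH")

print(p_adjusted)
```

A term counts as overrepresented when its adjusted p-value falls below the chosen significance level.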
### BRCA1 Mutations and Phenotypes
According to studies, germline mutations in BRCA1 can increase the risk of developing breast and ovarian cancer. The BRCA1 gene contains 22 exons spanning about 110 kb of DNA. The BRCA1 protein is part of the BRCA1-associated genome surveillance complex (BASC), which contains BRCA1, MSH2, MSH6, MLH1, ATM, BLM, PMS2 and the MRE11-RAD50-NBN (MRN) protein complex. BRCA1 disease-causing mutations identified so far include the 510-bp deletion of exon 22, the 3.8-kb deletion of exon 13, and the 6-kb insertion of exon 13. Mutations in BRCA1 are reported to be responsible for 45% of inherited breast cancer. Breast-ovarian cancer-1 (BROVCA1; OMIM 604370) can be caused by mutation in the BRCA1 gene (OMIM 113705) on chromosome 17q. We used two databases to analyze the phenotypes and variants of BRCA1: the UCSC Genome Browser on Human and the OMIM database. Both databases provided useful information for answering our biological question.
Result:
The allelic variants of BRCA1 are listed on OMIM:
https://www.omim.org/allelicVariants/113705
The results from the UCSC Genome Browser on Human are shown here:



## Results
### Cell Expression Levels
Table 1: Detailed description of the life stage abbreviations. (Levin et al., 2012)
Figure 2: Expression level in various cell types. Data obtained from WormBase for brc-1 (WBGene00000264) and rad-51 (WBGene00004297). The cell types are arranged in ascending order of the cell expression level of brc-1. The size of each data point indicates the transcripts per million (TPM).

Figure 3: Expression level at different embryonic stages of C. elegans. Data obtained from WormBase (Microarray_Study.WBPaper00041190.cre.mr). A detailed description of the life stage abbreviations is in [Table 1](https://i.imgur.com/4AjhwNo.png).
The results of our preliminary analysis in RStudio are shown in [Figure 2](https://i.imgur.com/bwqJV7s.png) and [Figure 3](https://i.imgur.com/DqZYELQ.png). The cell-type plot did not show strong co-expression of brc-1 and rad-51 in the same cell types. However, the life-stage plot showed that brc-1 and rad-51 are expressed at the same stages and that their expression levels change in sync.
In the cell-type data, brc-1 is expressed most frequently in ADL, a sensory neuron, and ADF, a serotonergic sensory neuron, while rad-51 is most commonly expressed in SAA, an interneuron and motor neuron type. In the life-stage data, brc-1 expression peaked when the embryo had developed around 86 cells, while rad-51 peaked at the very first step of development, at approximately four cells. Expression of both genes declined once the embryo had grown to about 560 cells and the first movement was observed. Overall, brc-1 and rad-51 were highly expressed throughout the embryonic stages covered by the microarray data.
### Protein Interaction and GO
C. elegans brc-1 Protein Interaction Network:

C. elegans rad-51 Protein Interaction Network:

Note: We do not show transparency in these images because it distracted from our main results.
We started our analysis by observing the C. elegans brc-1 interaction network. As shown in the image above, rad-51 does indeed appear among the proteins that brc-1 interacts with. However, after modifying the edge width to show the experimentally determined interactions, we see that rad-51 and brc-1 do not display an especially high level of interaction compared to other proteins in the network. Instead, brc-1 interacts most strongly with brd-1, with a value of 0.999, whereas brc-1 and rad-51 had an experimentally determined interaction value of only 0.334. When we altered the transparency to show differences in coexpression, we also did not see especially high coexpression between the two.
We next performed the same analysis on the C. elegans rad-51 interaction network. Surprisingly, brc-1 did not appear unless we greatly expanded the network. However, we did see interactions of rad-51 with many of the same proteins that we saw in the brc-1 interaction network. In fact, the protein with the highest experimentally determined interaction value with rad-51 was brc-2. While brc-2 and brc-1 are two different proteins, they have extremely similar functions and tend to differ more in where they are expressed than in how (Roy, Chun and Powell, 2011).
We also briefly looked at the protein interaction networks of BRCA1 and RAD51 in humans to see whether the results were similar to those for brc-1 and rad-51 respectively. These are not pictured since they were used only as a reference and not as our main data. As in C. elegans, BRCA1 and RAD51 did not have the highest interaction values with each other compared to other interactions in the table, but they did share many of the same proteins in their networks. In fact, RAD51 had its highest interaction value with BRCA2.
C. elegans brc-1 GO Results:
C. elegans rad-51 GO Results:
Note: Cytoscape was not functioning properly towards the end of our analysis. The only way to import images here was through screenshots.
After this, we looked at the gene ontology results from BiNGO for brc-1 and rad-51, which are visualized above. For brc-1, the most highly enriched terms were cellular response to stress, response to DNA damage stimulus, DNA repair and double-strand break repair. For rad-51, the most enriched GO terms were M phase, meiosis, cell cycle and DNA/nucleic acid metabolic process.
## Discussion
### Cell Expression Levels
In the cell-type expression results, we could not conclude that rad-51 and brc-1 are co-expressed in the same types of cells, since brc-1 is expressed only in particular cell types while rad-51 is more broadly expressed. However, the life-stage expression data show reasonable synchrony in both when and how much rad-51 and brc-1 are expressed.
Our results may reflect the fact that the cell-type expression dataset is not tied to C. elegans life stages, which we would need in order to compare the cell-type and life-stage data. Both datasets have drawbacks. The cell-type expression data are not assigned to any life stage on WormBase; they are simply the public dataset from each gene's information page. Meanwhile, the life-stage expression data cover only the embryonic stages, and data collection stopped 12 hours after hatching (L1).
Despite the room for improvement, comparing the cell-type and life-stage expression levels of brc-1 and rad-51 remains a necessary step for future projects. Such a comparison can show how BRCA1 influences the DNA repair process in a simpler model organism and could possibly open new directions for cancer treatment design.
### Protein Interaction and GO
We determined that the protein interaction results give some support for a relationship between brc-1 and rad-51 in the C. elegans DNA repair pathway. The interactions show a lot of overlap between brc-1 and rad-51, and the two interaction networks share a large number of proteins. However, these interaction networks do not make it clear which protein affects the other. GO enrichment gave a little more support on this point. It helped us see that while the brc-1 cluster and the rad-51 cluster do not have exactly the same gene ontology, they are very complementary to each other. The brc-1 cluster was highly involved in responding to signals of DNA damage and in DNA repair. The rad-51 cluster, on the other hand, was more involved in regulating the cell cycle, especially in M phase, where we observe the most DNA double-strand breaks. It also showed a lot of DNA-related metabolic activity, indicating involvement in the actual altering of DNA structure.
We are aware, however, that these two analyses do not provide a perfect answer to our question. First, the datasets collected here did not come from a specific tissue or life stage, which makes it unreliable to compare the results for brc-1 and rad-51 if they were not collected from a similar culture. Additionally, there was a limited amount of data about the proteins in these networks, and more information would lead to a more accurate result.
If we were to do this analysis again, it might be better to take a closer look at proteins that interact directly with brc-1/BRCA1 rather than at rad-51. This would give us a better idea of how a mutation in brc-1/BRCA1 could have a direct effect on another protein, as opposed to a protein farther down the DNA repair pathway.
## References
[[1]](https://pubmed.ncbi.nlm.nih.gov/22560298/) Levin, M., Hashimshony, T., Wagner, F. and Yanai, I. (2012). Developmental Milestones Punctuate Gene Expression in the Caenorhabditis Embryo. Developmental Cell, 22(5), pp.1101–1108. doi:10.1016/j.devcel.2012.04.004.
[[2]](https://www.cancer.gov/about-cancer/causes-prevention/genetics/brca-fact-sheet) National Cancer Institute (2020). BRCA Mutations: Cancer Risk & Genetic Testing. [online] National Cancer Institute.
[[3]](https://pubmed.ncbi.nlm.nih.gov/22252577/) Sundararajan, S., Ahmed, A. and Goodman, O.B. (2011). The relevance of BRCA genetics to prostate cancer pathogenesis and treatment. Clinical Advances in Hematology & Oncology: H&O, [online] 9(10), pp.748–755.
[[4]](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4972490/) Roy, R., Chun, J. and Powell, S.N. (2011). BRCA1 and BRCA2: different roles in a common pathway of genome protection. Nature Reviews Cancer, 12(1), pp.68–78. doi:10.1038/nrc3181.
[5] Gayther, S.A., Warren, W., Mazoyer, S., Russell, P.A., Harrington, P.A., Chiano, M., Seal, S., Hamoudi, R., van Rensburg, E.J., Dunning, A.M., Love, R., Evans, G., Easton, D., Clayton, D., Stratton, M.R. and Ponder, B.A.J. (1995). Germline mutations of the BRCA1 gene in breast and ovarian cancer families provide evidence for a genotype-phenotype correlation. Nature Genetics, 11, pp.428–433.
[6] Brown, M.A., Xu, C.-F., Nicolai, H., Griffiths, B., Chambers, J.A., Black, D. and Solomon, E. (1996). The 5-prime end of the BRCA1 gene lies within a duplicated region of human chromosome 17q21. Oncogene, 12, pp.2507–2513. PubMed: 8700509.
[7] Castilla, L.H., Couch, F.J., Erdos, M.R., Hoskins, K.F., Calzone, K., Garber, J.E., Boyd, J., Lubin, M.B., Deshano, M.L., Brody, L.C., Collins, F.S. and Weber, B.L. (1994). Mutations in the BRCA1 gene in families with early-onset breast and ovarian cancer. Nature Genetics, 8, pp.387–391. PubMed: 7894491.