1. Describe your plan for using the two tables to address the research question. Note: make your own attempt here; don't actually try it yet or compare to the approach I'll show after. I want you to think about the problem first.
First, in the table containing the protein interaction data, we filter the table to keep the rows where the BRCA1 protein is one of the nodes, since it is the protein we are interested in. Then, for each protein found to interact with BRCA1, we use the gene-protein unique name index to find the corresponding gene ID. Finally, we use this unique gene ID to find all of the tissues in which that gene is expressed.
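As a rough illustration of this plan, the sketch below runs the three steps on small made-up tables. The tables `interactions` and `gene_index`, their column names, and every value in them are assumptions for illustration only; the real files may be laid out differently, and BRCA1 may also appear in the second node column.

```r
library(dplyr)
library(tibble)

# Toy stand-ins for the real tables; all rows and IDs are invented.
interactions <- tibble(
  node1 = c("BRCA1", "BRCA1", "TP53"),
  node2 = c("MRE11", "TP53", "MDM2")
)
gene_index <- tibble(
  protein = c("BRCA1", "MRE11", "TP53", "MDM2"),
  Gene_stable_ID = c("G1", "G2", "G3", "G4")
)
tissue_expression <- tibble(
  Gene = c("G1", "G1", "G2", "G2", "G3"),
  Tissue = c("breast", "ovary", "breast", "ovary", "skin"),
  Level = c("High", "Medium", "High", "Low", "High")
)

# Step 1: keep interactions that have BRCA1 as a node
# (assumes BRCA1 sits in the node1 column).
brca1_partners <- interactions %>%
  filter(node1 == "BRCA1")

# Step 2: map each partner protein to its gene ID.
brca1_partners <- brca1_partners %>%
  left_join(gene_index, by = c("node2" = "protein"))

# Step 3: look up every tissue in which each partner gene is expressed.
partner_tissues <- brca1_partners %>%
  left_join(tissue_expression, by = c("Gene_stable_ID" = "Gene"))

print(partner_tissues)
```

Using joins here keeps one row per partner-tissue pair, which makes later filtering by expression level straightforward.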
2. Repeat the analysis for only tissues where gene expression is classified as "High". Which interacting pairs are highly expressed in the same tissue?
This question is similar to the exercise we did during the lab section. The only change is that we keep only the rows whose gene expression level is marked "High" in the tissue_expression file.
BRCA1 is co-expressed in a large number of tissues with TOPBP1 (37 tissues), MRE11 (37), etc. After building the summary table, I used the arrange() function to sort it by the number of co-expression tissues from high to low. I attach the results for the few highest-ranking proteins and the overall histogram here:

Fig1: genes with the highest number of co-expression tissues with BRCA1

Fig2: histogram of the overall number of co-expression tissues per gene
```
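# Build one summary row per BRCA1 interaction partner.
summary_table = tibble()
for (row in seq_len(nrow(brca1_data_subset))) {
  print(row)  # progress indicator
  node2_gene = brca1_data_subset$Gene_stable_ID[row]
  # Rows where the partner gene is expressed at the "High" level.
  node2_expression_high = filter(tissue_expression, Gene == node2_gene & Level %in% c("High"))
  # Tissues shared with BRCA1. brca1_expression comes from the earlier lab code;
  # it should also be restricted to "High" for this question.
  common_tissues = intersect(brca1_expression$Tissue, node2_expression_high$Tissue)
  num_common_tissues = length(common_tissues)
  tissue_list = toString(common_tissues)
  interaction_table <- tibble(
    node1_name = 'BRCA1',
    node2_name = brca1_data_subset$node2[row],  # this pulls out the actual gene name
    num_common_tissues = num_common_tissues,
    tissue_list = tissue_list
  )
  summary_table <- bind_rows(summary_table, interaction_table)
}
View(summary_table)
# Distribution of the number of shared tissues across all partners.
hist(summary_table$num_common_tissues)
# Partners sorted from most to fewest shared tissues.
summary_table %>% arrange(desc(num_common_tissues))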
```
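The same summary can probably be built without a loop, using joins and grouping. The following is a minimal sketch on invented toy data, assuming the same object and column names as the loop above (`brca1_data_subset`, `brca1_expression`, `tissue_expression`); `high_tissues` is a hypothetical helper name. It is an alternative formulation, not the method used for the results reported here.

```r
library(dplyr)
library(tibble)

# Toy stand-ins; all rows and IDs are invented for illustration.
brca1_data_subset <- tibble(
  node2 = c("MRE11", "TP53"),
  Gene_stable_ID = c("G2", "G3")
)
brca1_expression <- tibble(Tissue = c("breast", "ovary", "skin"))
tissue_expression <- tibble(
  Gene = c("G2", "G2", "G2", "G3"),
  Tissue = c("breast", "ovary", "lung", "skin"),
  Level = c("High", "High", "High", "Low")
)

# "High" rows restricted to tissues that also appear for BRCA1,
# de-duplicated so each gene-tissue pair is counted once.
high_tissues <- tissue_expression %>%
  filter(Level == "High", Tissue %in% brca1_expression$Tissue) %>%
  distinct(Gene, Tissue)

# A left join keeps partners with no shared tissue (Tissue is NA for them).
summary_table <- brca1_data_subset %>%
  left_join(high_tissues, by = c("Gene_stable_ID" = "Gene")) %>%
  group_by(node2_name = node2) %>%
  summarise(
    num_common_tissues = sum(!is.na(Tissue)),
    tissue_list = toString(Tissue[!is.na(Tissue)])
  ) %>%
  arrange(desc(num_common_tissues))

print(summary_table)
```

Avoiding `bind_rows()` inside a loop tends to be faster on large tables, because the summary is not re-copied on every iteration.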
3. One explanation for why BRCA1 and MRE11 may show up so many times together, while BRCA1 and TP53 may not is that MRE11 and BRCA1 are both highly ("Medium" or "High") expressed in most or all tissues, while TP53 is only highly expressed in a few tissues. Is this true? Modify the above analysis to find out.
This statement is true, at least in part. I counted the number of tissues in which each of these three genes is expressed at the Medium or High level, and I found 0 for TP53, 52 for BRCA1, and 87 for MRE11. Since there is essentially no tissue in which TP53 is expressed at a Medium or High level, TP53 and BRCA1 cannot be found co-expressed at those levels.
```
# Count rows where each gene is expressed above "Low" (neither "Low" nor "Not detected").
# Note: a symbol that is absent from the table (e.g. a misspelling) matches no rows and gives a count of 0.
expression_brca1 = tissue_expression %>% filter(`Gene name` == 'BRCA1') %>% filter(Level != 'Low') %>% filter(Level != 'Not detected') %>% summarise(count = n())
expression_tp53 = tissue_expression %>% filter(`Gene name` == 'TP53') %>% filter(Level != 'Low') %>% filter(Level != 'Not detected') %>% summarise(count = n())
expression_mre11 = tissue_expression %>% filter(`Gene name` == 'MRE11') %>% filter(Level != 'Low') %>% filter(Level != 'Not detected') %>% summarise(count = n())
```
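One possible way to get all three counts in a single step, and to catch a gene symbol that never appears in the table, is sketched below on invented toy data. It keeps only the "Medium" and "High" levels explicitly, which is an assumption about how the levels are coded; `genes` and `missing_genes` are hypothetical helper names.

```r
library(dplyr)
library(tibble)

# Toy stand-in for tissue_expression; all values are invented.
tissue_expression <- tibble(
  `Gene name` = c("BRCA1", "BRCA1", "TP53", "MRE11", "MRE11", "MRE11"),
  Tissue = c("breast", "ovary", "skin", "breast", "ovary", "lung"),
  Level = c("High", "Low", "Medium", "High", "Medium", "Not detected")
)

genes <- c("BRCA1", "TP53", "MRE11")

# One row per gene with the number of Medium/High rows.
counts <- tissue_expression %>%
  filter(`Gene name` %in% genes, Level %in% c("Medium", "High")) %>%
  count(`Gene name`, name = "count")

# Symbols that never occur in the table at all, e.g. because of a typo.
missing_genes <- setdiff(genes, tissue_expression$`Gene name`)

print(counts)
print(missing_genes)
```

Genes with no matching rows drop out of `counts` entirely, so checking `missing_genes` helps distinguish a true zero from a misspelled symbol.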