# Fall 2023 Committee Meeting
:::info
**Student:** Meghan Sleeper
**Project:** Identifying commonly differentially methylated regions (DMRs) in colorectal cancer tissue samples
:::
## :beginner: Updates
:::success
Since our [last committee meeting](https://csuchico.box.com/s/uq10celtgeq7j5s12n60ynnbcrisg23v), I have learned so much and progressed the data analysis for my project.
:::
### General accomplishments
:white_check_mark: Passed the qualifying exam.
:white_check_mark: Attended UC Davis data analysis collaboratory workshop.
:white_check_mark: Presented at the [Biology summer seminar series](https://csuchico.box.com/s/ibdr0rw3vy57b6wmrueem9h1g52ectax).
:white_check_mark: Learned snakemake, R, and markdown.
:white_check_mark: Brushed up on my stats with MATH 615 (data analysis for graduate students).
:white_check_mark: Presented a poster on [Loss of Methylation as a Predictor for Statistical Significance of Differentially Methylated Regions](https://csuchico.box.com/s/t3lfqfoqxzsdx2nbk6kokfesc9utz120). Disclaimer: the poster is focused on statistics rather than biology.
### Project specific accomplishments
:white_check_mark: Aggregated whole genome bisulfite sequencing (WGBS) data from colorectal cancer tissue and healthy colon tissue samples.
:white_check_mark: Processed 17 tissue samples through my [analysis pipeline](https://github.com/MSleeper1/dmr_workflow/tree/main).
:white_check_mark: Wrote a [script to annotate genes](https://github.com/MSleeper1/dmr_workflow/blob/main/src/7_annotate_dmrs.ipynb) and regulatory features onto pipeline outputs (more challenging than it sounds).
:white_large_square: Started comparing DMRs found with different file groupings.
## :triangular_flag_on_post: Background review
:::success
My project seeks to aggregate genome-wide methylation data from colorectal cancer patients and healthy patients and to identify differentially methylated regions that are common across all samples.
:::
### Methylation patterns observed in cancers:
Epigenome-wide association studies (EWAS) investigate relationships between epigenetic modifications across the entire genome and a particular condition.
Differentially methylated regions (DMRs) associated with a condition are identified by comparing methylation in two groups.
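
To make the two-group comparison concrete, here is a minimal sketch of the quantity underneath it: the methylation level at a single CpG (methylated reads divided by total reads), averaged per group and then differenced. The read counts are made up for illustration, and this is not how my pipeline calls DMRs; real DMR callers also combine neighboring CpGs and apply statistical tests.

```python
from statistics import mean

# Hypothetical (methylated_reads, total_reads) at one CpG site, one tuple per sample.
# These counts are invented for illustration only.
cancer = [(18, 20), (27, 30)]
normal = [(4, 20), (3, 15), (5, 25)]


def methylation_level(methylated, total):
    """Fraction of reads covering a CpG that are methylated."""
    return methylated / total


def group_mean(samples):
    """Mean methylation level across the samples in one group."""
    return mean(methylation_level(m, t) for m, t in samples)


def methylation_difference(case, control):
    """Case mean minus control mean; positive means more methylation in the case group."""
    return group_mean(case) - group_mean(control)


def direction(diff):
    """Label the sign of a methylation difference."""
    if diff > 0:
        return "Hypermethylation"
    if diff < 0:
        return "Hypomethylation"
    return "No change"


diff = methylation_difference(cancer, normal)
print(f"difference = {diff:.2f} ({direction(diff)})")
```

A positive difference corresponds to the "Hypermethylation" rows in the table of cancer patterns, and a negative one to "Hypomethylation".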

DNAm patterns observed in cancer cells:

| Genomic feature | Change | Impact of change |
| ------------------ | ---------------- | -------------------------- |
| Intergenic repeats | Hypomethylation | Genomic instability |
| Gene promoters | Hypomethylation | Gene reactivation |
| Gene promoters | Hypermethylation | Gene silencing |
| Enhancer | Hypermethylation | Reduces gene transcription |
| CTCF binding sites | Both | Genomic instability |

Methylation arrays are commonly used because of their lower cost, even though they cover only ~2% of potentially methylated regions.
Whole genome methylation sequencing is limited in use because of its high cost, which often restricts a study to a single cancer tissue sample.
There is value in aggregating WGBS data from various studies and analyzing them collectively.
## :pencil: Progress Overview and Challenges
:::success
I have found more sample data and processed the samples with my pipeline. I have had success finding DMRs in one sample set (271). These DMR data were used for my analysis in MATH 615 this semester.
:::
:::danger
I have run into two challenges: deconvolving whole-tissue data into cell types, and larger grouped analyses returning no DMRs.
:::
### Sample overview:
| Study info | 271 | 318 | 2399 | 161 | 215 |
| -------------------------- | ------ | ------------------------ | -------- | -------- | ------ |
| # of CRC samples | 2 | 5 | 1 | 0 | 0 |
| # of Normal samples | 1 | 5 | 1 | 1 | 1 |
| # of patients | 3 | 5 | 1 | 1 | 1 |
| Matched samples? | no | yes | yes | no | no |
| Technical replicates | 16 | 1 | 3 | 12 | 3 |
| Single- or paired-end reads | paired | single | paired | single | paired |
| Sex info provided | No | No | Yes (M) | Yes (M) | No |
| Age info provided | No | Yes (95, 52, 68, 39, 69) | Yes (81) | Yes (34) | No |
| SRA group | SRX381569 | SRX8409041 | SRX332736 | SRX190161 | SRX1631736 |
* Total CRC tissue samples processed: 8
* Total normal colon tissue samples processed: 9
### Challenges during DMR calling:
:::info
When calling DMRs, the grouping of cancer vs. control samples must be specified. My goal has been to analyze all cancer samples against all control samples.
:::
* When I group all files, no DMRs are returned.
* When I run DMR calling with small subgroups, some DMRs are returned.

I think this happens because the program requires a minimum coverage in every sample, and about half of my samples have limited coverage.
VIM gene methylation visualization in my samples (chr10:17229278-17237593):

As you can see, there is very limited coverage in the top 10 samples listed. Of the samples with coverage, the top 3 are CRC tissue and the remainder are normal tissue.
VIM promoter methylation visualization in my samples (chr10:17226229-1723684):

In both screenshots, you can see some blocks showing differential methylation between cancer and control.
I believe these DMRs are not being found because of the lack of coverage in the other samples.
**Next steps:** I will run a regrouped analysis of the samples with consistent coverage.
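
A minimal sketch of the reasoning behind this regrouping, assuming the DMR caller only considers CpG sites that meet minimum coverage in every sample of the group. The sample names, positions, and the 0.5 threshold are hypothetical and are not taken from my data or from the program's settings.

```python
# Hypothetical data: sample name -> set of CpG positions that meet minimum read depth.
covered_sites = {
    "crc_1": {100, 200, 300, 400},
    "crc_2": {100, 200, 300},
    "normal_1": {100, 200, 300, 400},
    "normal_2": {500},  # low-coverage sample that shares no sites with the others
}


def shared_sites(samples, coverage):
    """CpG positions covered in every one of the given samples."""
    sets = [coverage[name] for name in samples]
    return set.intersection(*sets) if sets else set()


def select_consistent_samples(coverage, min_fraction):
    """Keep samples that cover at least min_fraction of all observed sites."""
    all_sites = set().union(*coverage.values())
    return sorted(
        name
        for name, sites in coverage.items()
        if len(sites) / len(all_sites) >= min_fraction
    )


all_samples = sorted(covered_sites)
kept = select_consistent_samples(covered_sites, min_fraction=0.5)

print("sites shared by all samples:", len(shared_sites(all_samples, covered_sites)))
print("samples kept:", kept)
print("sites shared by kept samples:", len(shared_sites(kept, covered_sites)))
```

In this toy example a single low-coverage sample leaves no sites shared by the whole group, which would be consistent with the all-sample run returning no DMRs while small subgroups return some.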
### Annotation: next steps
:::info
Now that I have written a script to annotate my data, the next step is to check my samples for the methylation markers expected in CRC.
:::
Check my samples for the following methylation biomarkers:
| Gene ID | Gene Name | Gene Description | Methylation Status in CRC |
| ------- | ------------------------------------------ | ----------------------------- | ------------------------- |
| SEPT9 | Septin 9 | Cell division regulator | Hypermethylated |
| MLH1 | MutL Homolog 1 | DNA repair gene | Hypermethylated |
| VIM | Vimentin | Intermediate filament protein | Hypermethylated |
| CDKN2A | Cyclin-Dependent Kinase Inhibitor 2A (p16) | Cell cycle regulation | Hypermethylated |
| MGMT | O-6-Methylguanine-DNA Methyltransferase | DNA repair gene | Hypermethylated |
| APC | Adenomatous Polyposis Coli | Tumor suppressor | Hypermethylated |
| LINE-1 | Long Interspersed Nuclear Elements-1 | Repetitive DNA element | Hypomethylated |

*Note: This is a condensed table with top markers*
**Commercialized and clinical biomarkers:**
* SEPT9 hypermethylation
* VIM hypermethylation
* MLH1 hypermethylation
## :timer_clock: Goals
:::info
I plan to walk in Spring 24 and graduate by Summer 24. I have a lot of work to do between wrapping up data analysis and thesis writing.
:::
* Thesis draft of introduction and methods by end of break
* Address tissue heterogeneity with [epiDISH](https://github.com/sjczheng/EpiDISH/tree/master)
* Address type 1 error in statistical analyses with [Metevalue](https://github.com/yfyang86/metevalue)
## :book: Linked resources
#### Previous committee meetings
:black_small_square: [Spring 23 Committee meeting slides](https://csuchico.box.com/s/uq10celtgeq7j5s12n60ynnbcrisg23v)
:black_small_square: [Fall 22 Committee meeting slides](https://csuchico.box.com/s/61e86bmcqaspvdyxna9sz1xvct5hjqjs)
#### Document to sign
:black_small_square: [F23 degree progress report](https://csuchico.box.com/s/gl9ubdljrhdvt2wy4thgw8hsd90fknep)
#### Presentations
:black_small_square: [Summer seminar presentation](https://csuchico.box.com/s/ibdr0rw3vy57b6wmrueem9h1g52ectax)
:black_small_square: [MATH 615 poster: Loss of Methylation as a Predictor for Statistical Significance of Differentially Methylated Regions](https://csuchico.box.com/s/t3lfqfoqxzsdx2nbk6kokfesc9utz120)
#### Data analysis
:black_small_square: [My analysis pipeline on GitHub](https://github.com/MSleeper1/dmr_workflow/tree/main)