# TOO Meeting notes
- This is intended as a collaborative note that all participants can edit, both during and after the meeting.
- Tip: This document is a markdown file. Use the editing and viewing modes side by side for easy documenting!
## Pre-meeting:
Date: 20201126 (Thu)
In this tissue-of-origin project, we are inspired by cfDNA fragmentation patterns and their relationship with nucleosome occupancy. Previous papers showed that cfDNA coverage can be used to infer nucleosome positioning across the genome (Snyder, 2016) and that it correlates with gene expression (Ulz, 2016).
On the other hand, detecting minimal residual disease (MRD) for recurrent cancer is a popular and practical clinical application of cfDNA. Most MRD detection methods are based on detecting point mutations/indels at a single locus or at multiple loci. A genome-wide cfDNA approach combining SNV and CNV discovery for MRD was recently published in Nature Medicine (Zviran, 2020). For SNV discovery in cfDNA, matched WBC samples are needed to filter out clonal mutations of hematopoietic origin (Razavi, 2019).
Building on these two lines of work, we want to explore whether we can develop a per-molecule cfDNA classifier that uses all available features (fragment length, cfDNA end motifs, position relative to the nearest genes/chromatin marks/nucleosomes, known mutations) to infer the t-o-o of each molecule, and then use it to detect MRD (the fraction of tumor-related molecules). This would give us a chance to increase detection sensitivity by boosting mutation detection with cell-type-related features.
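A minimal sketch of the WBC filtering step, assuming variants have already been called in cfDNA and that ALT-supporting read counts at the same positions have been tabulated for the matched WBC sample (e.g. from a pileup). The `Variant` record, `filter_ch_variants` and the `max_wbc_alt_reads` cutoff are hypothetical names and a placeholder threshold, not the procedure used by Razavi et al.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A single SNV/indel call (hypothetical record); pos is 1-based, as in VCF."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_ch_variants(cfdna_calls, wbc_alt_counts, max_wbc_alt_reads=0):
    """Keep cfDNA calls that are (almost) absent from the matched WBC sample.

    cfdna_calls: iterable of Variant called in cfDNA.
    wbc_alt_counts: dict mapping Variant -> number of ALT-supporting WBC reads;
        variants missing from the dict are treated as having 0 WBC ALT reads.
    max_wbc_alt_reads: placeholder cutoff; a call with more WBC ALT reads than
        this is assumed to be of hematopoietic origin and is dropped.
    """
    return [v for v in cfdna_calls if wbc_alt_counts.get(v, 0) <= max_wbc_alt_reads]


# Usage: the chr2 call is also seen in WBC, so only the chr1 call survives.
calls = [Variant("chr1", 100, "C", "T"), Variant("chr2", 200, "G", "A")]
wbc = {Variant("chr2", 200, "G", "A"): 5}
kept = filter_ch_variants(calls, wbc)
```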
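If each molecule can be scored under a tumor-derived and a normal-derived model, the tumor-related molecule fraction can be estimated as a mixture weight instead of hard-classifying every molecule. A minimal expectation-maximization (EM) sketch for a two-component mixture weight, assuming strictly positive per-molecule likelihoods from such (still hypothetical) models; `estimate_tumor_fraction` is an illustrative name, not an established method of this project.

```python
def estimate_tumor_fraction(lik_tumor, lik_normal, init=0.5, tol=1e-10, max_iter=1000):
    """EM estimate of the fraction of tumor-derived molecules.

    lik_tumor[i], lik_normal[i]: strictly positive likelihoods of molecule i's
    features under a (hypothetical) tumor-derived / normal-derived model.
    init: starting fraction, strictly between 0 and 1.
    """
    if len(lik_tumor) != len(lik_normal) or not lik_tumor:
        raise ValueError("need two non-empty likelihood lists of equal length")
    pi = init
    for _ in range(max_iter):
        # E-step: posterior probability that each molecule is tumor-derived.
        resp = [
            pi * lt / (pi * lt + (1 - pi) * ln)
            for lt, ln in zip(lik_tumor, lik_normal)
        ]
        # M-step: the mixture weight is the mean posterior.
        new_pi = sum(resp) / len(resp)
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    return pi


# Usage: 2 clearly tumor-like and 8 clearly normal-like molecules.
lik_t = [1.0] * 2 + [1e-9] * 8
lik_n = [1e-9] * 2 + [1.0] * 8
frac = estimate_tumor_fraction(lik_t, lik_n)
```

Compared with thresholding per-molecule posteriors and counting, the mixture weight does not require choosing a classification cutoff, which matters when the expected tumor fraction is very low.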
A few questions that arose:
- Is it really possible to determine the t-o-o for each molecule?
- What kind of dataset do we need to prove this? Are publicly available ones enough?
- Alternative topics of interest related to t-o-o and cfDNA (that have not been explored yet), other than MRD
Some actions to kick off this project:
- Literature review on cfDNA & t-o-o. I imagine a lot of work has already been done. Please add what you find to our project Gdrive.
  Here is a cfDNA reading list I found (last updated 3 years ago, though): https://github.com/christacaggiano/cell-free-dna-reading-list.
- Reproduce the Snyder paper: https://github.com/shendurelab/cfDNA
- Check the GitHub repo of a recent similar project (found in a quick search): https://github.com/christacaggiano/celfie
- Solidify the goal afterward
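For reproducing Snyder et al., the central quantity is the windowed protection score (WPS): for each position, the number of fragments that completely span a window centred on it minus the number of fragments with an endpoint inside that window. Below is a naive sketch whose defaults follow our reading of the paper's long-fragment WPS (120 bp window, fragments of 120-180 bp), assuming fragments are given as inclusive `(start, end)` coordinates on a single chromosome. It loops over all fragments for every position, so it only suits small regions; the shendurelab repository linked above remains the reference implementation, and `windowed_protection_score` is a hypothetical name.

```python
def windowed_protection_score(fragments, positions, window=120, min_len=120, max_len=180):
    """Naive WPS per position.

    fragments: iterable of (start, end), inclusive coordinates, one chromosome.
    positions: positions at which to compute the score.
    The window covers pos - window//2 .. pos + window//2 (inclusive).
    A fragment "spans" the window if its ends lie outside it on both sides;
    it "ends" in the window if at least one endpoint lies inside it.
    """
    half = window // 2
    # Keep only fragments in the requested length range (long fragments by default).
    kept = [(s, e) for s, e in fragments if min_len <= e - s + 1 <= max_len]
    scores = {}
    for pos in positions:
        win_start, win_end = pos - half, pos + half
        spanning = sum(1 for s, e in kept if s < win_start and e > win_end)
        ending = sum(
            1 for s, e in kept
            if win_start <= s <= win_end or win_start <= e <= win_end
        )
        scores[pos] = spanning - ending
    return scores


# Usage: 3 spanning fragments, 2 ending in the window, 1 filtered out by length.
frags = [(120, 290), (130, 270), (100, 270), (180, 330), (50, 170), (10, 400)]
wps = windowed_protection_score(frags, positions=[200])
```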
## Meeting 1: kick-off
Date: 20201130 (Tues)
People: Liting, Emmy, Inez, Roy, Myrthe
Topic: kickoff, project planning
A. Project objectives: 1) classify cfDNA molecules (samples) by tissue-of-origin based on molecule motifs, mapping locations, and correspondence to epigenomic marks.
B. Communicate expectations (responsibilities/credits/required time):
- Liting: 1-2 days a week
- Alex:
- Roy: may contribute, mostly with background knowledge.
- Emmy: may contribute, flexible, depends on the content
- Myrthe: may contribute, flexible, depends on the content
C. How to track progress/contributions (GitHub/Trello/Google Drive/Todoist), and plan future meetings.
- GitHub: code/commit repository + GitHub Projects: issues/todos
- Surfdrive: meeting notes, documents & literature
- Slack: meeting notifications & handy links & chats
## D. Planning for the future months:
### 1st month:
A. Literature review (everyone): upload/update on the doc
B. Molecule-based codebase structure (Liting)
C. Acquire publicly available datasets
- Datasets available on the archive storage:
1. (Zviran et al. 2020) 29 lung adenocarcinoma patient samples (pre-treatment cfDNA + WBC + tissue biopsy (tumor/normal)); no UMIs
`/hpc/archive/cog_bioinf/ridder/ega_data`
2. (Cristiano et al. 2019) cfDNA samples only, without corresponding WBC or tissue biopsy samples (shallow sequencing)
`/hpc/archive/cog_bioinf/ridder/ega_data`
3. Ovarian cancer ascites cfDNA with known background (cyclomics)
`path`
4. Healthy plasma with known background (cyclomics)
`path`
5. To acquire?: CSF cfDNA (Emmy)
6. To acquire?: Urine
7. To acquire?: FinaleDB (no SNPs; various tumor types). Tissue types:
- T-cells
- liver (for background cfDNA)
- Lung
- Breast
- Ovarian (20210111)
- Colorectal (20210111)
- Pancreas
- Healthy
- Urine?
- Downloads have to be done manually: the API only provides a search function, not downloads.
D. Acquire available nucleosome/epigenomic tracks from ENCODE and Roadmap Epigenomics
`/hpc/compgen/projects/gw_cfdna/too_cfdna/raw/epigenomics_tracks__encode`
Tissue types:
- T-cells
- liver (for background cfDNA)
- Lung
- Breast
- Ovarian
- Colorectal
- Pancreas
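Once tracks are downloaded, peaks per tissue can be loaded into per-chromosome interval lists. A minimal sketch, assuming the files are BED-like (ENCODE narrowPeak/broadPeak files start with the BED columns chrom, start, end, 0-based half-open); `load_bed_intervals` and `merge_intervals` are hypothetical helper names, and all extra columns (scores, summits) are ignored.

```python
import io
from collections import defaultdict


def load_bed_intervals(handle):
    """Read BED-like lines into {chrom: sorted [(start, end), ...]}.

    Only the first three tab-separated columns are used; coordinates stay
    0-based and half-open as in BED. Blank, comment, track and browser lines
    are skipped.
    """
    intervals = defaultdict(list)
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals[chrom].append((int(start), int(end)))
    return {chrom: sorted(ivs) for chrom, ivs in intervals.items()}


def merge_intervals(sorted_intervals):
    """Merge overlapping or book-ended half-open intervals (input must be sorted)."""
    merged = []
    for start, end in sorted_intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Usage with an in-memory example instead of a real track file.
bed = io.StringIO("track name=demo\nchr1\t100\t200\nchr1\t150\t300\nchr2\t10\t20\n")
peaks = load_bed_intervals(bed)
merged_chr1 = merge_intervals(peaks["chr1"])
```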
### 2nd-4th month:
A. Algorithm development
B. Establish a dataset: cfDNA with known tissue-of-origin labels
C. Training/testing on known datasets
### Assess & adjust goals: March 2021
Tasks:
- [ ] reading literature related to cfDNA tissue of origin
- [ ] set up a repo for the TOO project
- [ ] submit SMART plan to Gdrive
#### In-between meeting: Emmy & Roy, Emmy & Liting
- Hypothesis: cfDNA length/fragmentomics/cleavage patterns could relate to the immune response, which differs between cancer types.
#### In-between meeting: Alex & Liting
- Sync about data acquisition
- FinaleDB
## Meeting 2: prototype & samples
Date: 2021/1/13 14:00-14:30
Agenda:
- meeting document (here) + project management (GitHub Actions)
- data
- prototypes
- UMI:
## Meeting 3: quick prototype based on fragment length
Liting shared a quick prototype classifying cancer-derived vs. normal-cell-derived cfDNA based solely on length and a quality measure (repeats). It did not work, but the length used was that of the whole repeat rather than the insert length. This needs to be changed.
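For paired-end short-read data (e.g. the Zviran or Cristiano sets), the insert length can be taken from the mate alignments instead of the read length. A minimal sketch, assuming a properly paired forward/reverse mate pair and 0-based coordinates with an exclusive end (the convention of pysam's `reference_start`/`reference_end`); for Cyclomics concatemer reads the analogue would be the length of a single repeat unit, which this does not cover. The short/long cutoffs (100-150 bp vs. 151-220 bp) loosely follow the DELFI paper (Cristiano, 2019) and are assumptions here; `insert_length` and `short_fragment_fraction` are hypothetical helpers.

```python
def insert_length(fwd_start, rev_end):
    """Insert (fragment) length of a forward/reverse read pair.

    fwd_start: 0-based leftmost aligned position of the forward-strand mate.
    rev_end: 0-based exclusive end of the reverse-strand mate's alignment.
    Soft-clipped bases are not counted, as with the SAM TLEN field.
    """
    if rev_end <= fwd_start:
        raise ValueError("reverse mate must end to the right of the forward mate start")
    return rev_end - fwd_start


def short_fragment_fraction(lengths, min_len=100, short_max=150, max_len=220):
    """Fraction of fragments in [min_len, short_max] among those in [min_len, max_len]."""
    in_range = [n for n in lengths if min_len <= n <= max_len]
    if not in_range:
        return float("nan")
    return sum(1 for n in in_range if n <= short_max) / len(in_range)


# Usage: 90 and 300 bp fall outside the range; 2 of the remaining 4 are short.
frac = short_fragment_fraction([120, 140, 160, 180, 300, 90])
```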
### Emmy suggested:
- Cancer cells can still carry the REF allele, so cfDNA fragments containing the reference allele can originate from both healthy and cancerous cells.
- Check whether the mutation is located in a coding or non-coding region --> if it is in a coding region, the cfDNA fragment may derive from a more malignant cell (functional disruption more likely).
- Look at TF binding: shorter cfDNA fragments are more likely to be bound by TFs. Across all TF binding sites, test whether short fragments are enriched --> TF is more present --> genes are more transcribed?
-
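The TF-binding test can be framed as a 2x2 contingency table (short vs. long fragments, overlapping vs. not overlapping a TF binding site) with a one-sided Fisher exact test for enrichment of short fragments at the sites. A stdlib-only sketch follows; how fragments are labelled short/long and overlapping is left open, `short_fragment_enrichment` is a hypothetical name, and for genome-scale counts `scipy.stats.fisher_exact(table, alternative="greater")` is the more practical choice.

```python
from math import comb


def short_fragment_enrichment(short_in, short_out, long_in, long_out):
    """One-sided Fisher exact test for short fragments being enriched at TF sites.

    Table layout (counts of fragments):
                     at TF site   not at TF site
        short        short_in     short_out
        long         long_in      long_out

    Returns (odds_ratio, p_value), where p_value = P(X >= short_in) under the
    hypergeometric null with all margins fixed. odds_ratio is inf when
    short_out * long_in is 0.
    """
    n_short = short_in + short_out
    n_at_site = short_in + long_in
    total = n_short + long_in + long_out
    upper = min(n_short, n_at_site)
    # Sum the hypergeometric upper tail exactly with integer arithmetic.
    tail = sum(
        comb(n_short, k) * comb(total - n_short, n_at_site - k)
        for k in range(short_in, upper + 1)
    )
    p_value = tail / comb(total, n_at_site)
    denom = short_out * long_in
    odds_ratio = (short_in * long_out) / denom if denom else float("inf")
    return odds_ratio, p_value


# Usage: a toy table where all short fragments sit at TF sites.
odds, p = short_fragment_enrichment(short_in=3, short_out=0, long_in=0, long_out=3)
```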
### Roy suggested:
- Cannot see how classification based on length alone would work.
- Look at CTCF sites
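Looking at CTCF sites could start from an aggregate profile: count fragment midpoints at each offset around a set of CTCF sites and look for structure in the curve. A minimal sketch, assuming inclusive `(start, end)` fragment coordinates and site positions on one chromosome; the ±1 kb default flank is an arbitrary choice, `midpoint_profile` is a hypothetical helper, and sites are not oriented by motif strand here.

```python
from bisect import bisect_left, bisect_right


def midpoint_profile(fragments, sites, flank=1000):
    """Count fragment midpoints at offsets -flank..+flank around each site.

    fragments: iterable of (start, end), inclusive coordinates, one chromosome.
    sites: site positions on the same chromosome (e.g. CTCF motif centres).
    Returns a list of length 2*flank + 1; index i holds offset i - flank.
    """
    sorted_sites = sorted(sites)
    counts = [0] * (2 * flank + 1)
    for start, end in fragments:
        mid = (start + end) // 2
        # Sites within `flank` bp of this midpoint, found by binary search.
        lo = bisect_left(sorted_sites, mid - flank)
        hi = bisect_right(sorted_sites, mid + flank)
        for site in sorted_sites[lo:hi]:
            counts[mid - site + flank] += 1
    return counts


# Usage: two midpoints fall exactly on a site, one 4 bp downstream of a site.
profile = midpoint_profile([(90, 110), (195, 205), (100, 108)], sites=[100, 200], flank=10)
```

Before comparing samples, the raw counts would still need normalising, for example by the number of sites and by local coverage.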
### Liting's planning for next step:
- Encode the relative distance to epigenomic tracks:
  - TF binding sites
  - CTCF / TAD boundaries
  - Histone binding sites per tissue type!
- Frequent end coordinates (Dennis Lo paper)
- Learn cfDNA sequences (or end motifs)
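The per-molecule features in this plan could be computed along the lines of the sketch below, assuming fragments on a single chromosome with inclusive `(start, end)` coordinates, sorted site positions per track, and the fragment sequence in reference (+ strand) orientation. The 4-mer end motif (the first 4 nt at each 5' end) and a plain count of recurring fragment ends are our simplifications; all function names are hypothetical, and the Dennis Lo paper's exact definition of preferred end coordinates should be checked before relying on the last function.

```python
from bisect import bisect_left
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def distance_to_nearest(pos, sorted_sites):
    """Signed distance pos - nearest_site, or None if there are no sites.

    Negative means the position lies to the left of the nearest site on the
    reference. Ties go to the site on the right.
    """
    if not sorted_sites:
        return None
    i = bisect_left(sorted_sites, pos)
    # At most two candidates: the sites just left and right of pos.
    candidates = sorted_sites[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda site: (abs(pos - site), -site))
    return pos - nearest


def end_motifs(fragment_seq, k=4):
    """k-mers at both 5' ends of a fragment given in + strand orientation.

    The + strand 5' end is the first k bases; the - strand 5' end is the
    reverse complement of the last k bases.
    """
    seq = fragment_seq.upper()
    return seq[:k], seq[-k:].translate(_COMPLEMENT)[::-1]


def frequent_end_coordinates(fragments, top_n=10):
    """Most frequently observed fragment end positions (both ends, strand-agnostic)."""
    ends = Counter()
    for start, end in fragments:
        ends[start] += 1
        ends[end] += 1
    return ends.most_common(top_n)


# Usage
d = distance_to_nearest(150, [100, 400])   # nearest site is 100, so d == 50
motifs = end_motifs("CCCAGGTTTTGGAA")      # ("CCCA", "TTCC")
top = frequent_end_coordinates([(10, 180), (10, 175), (50, 180)], top_n=2)
```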
## Next meeting:
- when:
- what: