###### tags: `Sprint Diary`
# 4DN2 SPRINT DIARY - Q1S5
## 14th March, 2023 / 11:30 AM to 28th March, 2023 (extended to 11th April, 2023)
## Timeline
[Airtable Timeline](https://airtable.com/shrLOuMSWU0UlAA9F)
### Attendees
Garrett, Vedat, Emily, Nezar, Jiangyuan, Trevor, Sasha
## Resources
* Sprint Slides for Figures:
https://docs.google.com/presentation/d/1bAKAU5D_pwIQ4Ram2PwocHlbqWBMkRy2ZALzqYxNdic/edit?usp=sharing
* Task Kanban: https://airtable.com/shrTNGv4Ypjr5Icde
* Github: https://github.com/abdenlab/4dn2-sprints.git
## Sprint Agenda
### Planned
### In Progress
* ~~Finishing migrating HG documentation. *Trevor*~~
* Histone marks pile-ups, code review? Rewriting. *Emily*
* ~~GSEA for RNAseq. *Yu*~~
* ITS vs E1, further dot analysis. *Jiangyuan*
## Morning Huddle
Discussion around Standup update!
Discussion around
Emily: presenting slides on her past efforts! These will be lab meeting slides! H3K27me3 and Polycomb and different. Will be working on other
Jiangyuan: dot-calling, singleton removal. CTCF: some removed singletons could be true positives; next step is working out how to fine-tune!
* Emily's question: Have you considered running CTCF motif analysis on your minimal list and picking out which dots are enriched for the CTCF motif?
Sasha:
Trevor: submission deadline in the next couple of weeks. Transitioning the 4DN demo from the fall to new data. In terms of breakout: Garrett; Nezar; Trevor on HG python
## 03/15/2023
### Jiangyuan meeting
Singletons - after CTCF peak enrichment cut off at 150 anchors.
Called dots from the megamap. Loop (dot) strength (q-scores).
Nezar's rec: If you take the megadots and score them in each condition, use dot-calling to calculate q-scores. Then assign an overall score to each condition, similar to membership strength. Sergey can help with multiple hypothesis tests. Rank which cell has the highest score!
Share the mega-dot list with Sergey, who can calculate HiCCUPS scores for each of the dots in the megamaps.
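
Nezar's recommendation could be sketched roughly as below, under loud assumptions: each mega-dot already has a p-value per condition (the notes do not say how these are obtained), q-values come from a plain Benjamini-Hochberg correction, and the "overall score" per condition is simply the fraction of mega-dots with q at or below a threshold. The function names, the `alpha` default, and the toy p-values are hypothetical; this is not the dot-caller's own procedure.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def rank_conditions(pvalues_by_condition, alpha=0.1):
    """Rank conditions by the fraction of mega-dots with q <= alpha (assumed score)."""
    scores = {}
    for condition, pvalues in pvalues_by_condition.items():
        qvalues = benjamini_hochberg(pvalues)
        scores[condition] = sum(q <= alpha for q in qvalues) / len(qvalues)
    # Highest-scoring condition first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy p-values for four mega-dots in two conditions (made up for illustration).
    toy = {"ESC": [0.01, 0.04, 0.03, 0.5], "DE": [0.5, 0.6, 0.7, 0.8]}
    for condition, score in rank_conditions(toy):
        print(condition, score)
```

Any other summary (mean of -log10 q, number of dots passing, etc.) could replace the fraction used here; the ranking step stays the same.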
K means clustering of E1:
New ITS: log-transform the originals, convert to Z-scores, and generate the heatmaps again, with switching and not switching.
## Standup Updates
## Huddle day 2: 3/21/2023
Garrett, Jiangyuan and Vedat are present
### Garrett:
* Cut n Run
* Downloading reference datasets
* Currently using https://github.com/lh3/seqtk to subsample 10000 read pairs from two large paired FASTQ files (remember to use the same random seed to keep pairing):
`seqtk sample -s100 read1.fq 10000 > sub1.fq`
`seqtk sample -s100 read2.fq 10000 > sub2.fq`
* Downsampling input: Need more information from Emily. It is random downsampling. Vedat's question: Do we need to apply a statistical method while downsampling instead of random sampling?
* MACS2 and SEACR are set up in the pipeline. The analysis has been run; the output still needs to be reviewed
* JupyterHub installed:
* The Littlest JupyterHub: running into connection issues on the Jupyter side
* Node jump is still causing troubles. Hopefully will be resolved soon by Michael and Arjan
* will look into T&T mapping later this week.
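
Why the shared seed matters for the paired subsampling above can be shown with a small sketch: if both mates are sampled with the same seed, the same record positions are kept in each file, so read 1 and read 2 stay matched. This is an in-memory illustration only, not seqtk's implementation; `read_fastq` and `paired_subsample` are hypothetical helper names, and real FASTQ files of this size should still go through seqtk.

```python
import random
from itertools import islice


def read_fastq(lines):
    """Yield FASTQ records as 4-line tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if len(record) < 4:
            return
        yield record


def paired_subsample(reads1, reads2, n, seed=100):
    """Keep the same n record positions from both mates of a paired run."""
    reads1, reads2 = list(reads1), list(reads2)
    if len(reads1) != len(reads2):
        raise ValueError("paired FASTQ inputs must have the same number of records")
    # One seeded generator decides the positions once; both mates reuse them.
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(reads1)), n))
    return [reads1[i] for i in keep], [reads2[i] for i in keep]
```

Sampling each file with a different seed would select different positions in each mate and silently break the pairing.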
### Jiangyuan:
* Compartmentalization vs Gene expression:
* E1 vs ITS: Normalization issue with iHEP has been resolved by normalizing genes across stages, instead of normalizing by cell type (dev. stage)
* New heatmaps E1 vs ITS using Hi-C 3.0
* Dot (loop) calling: improving dot calling, singletons vs dynamics
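
The normalization fix noted above (Z-scoring each gene across stages rather than within one stage) can be illustrated with a minimal sketch. The dictionary layout, the `log2` transform with a pseudocount of 1, the use of the population standard deviation, and the function names are all assumptions made for the example, not the notebook's actual code.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Z-score a list of numbers; a constant list maps to all zeros."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize_genes_across_stages(expr, pseudocount=1.0):
    """expr maps gene -> list of expression values, one per stage.

    Each gene is log-transformed and then Z-scored over its own stages,
    so a gene's row shows its relative change through development.
    """
    return {
        gene: zscore([math.log2(v + pseudocount) for v in values])
        for gene, values in expr.items()
    }
```

Normalizing within a stage instead (one Z-score per column) would compare genes to each other inside a cell type, which is the variant the notes describe moving away from.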
### EOD meeting Emily - Garrett
Emily to Garrett:
* Compare / determine: why do the two assays show different results? One might have more sequencing depth
* What is the read count of the dataset with less sequencing depth, in DE? Downsample to DE numbers.
* Combine FASTQ files together.
* Downsample ESC to DE data. Both reads?
* K27me3 = 21.3M reads, 70M reads in overall ESC K27me3
* Experiment set accession = 4DNES8TY5P5P, IgG: 4DNESB4SCBQW
* K27me3 downsample to 21.M, IgG for ESC downsample to 13.5M
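
The downsampling targets above translate into a sampling fraction as in the small sketch below. It assumes the 21.3M figure is the target depth and the 70M figure is the full ESC K27me3 depth, which is one reading of the notes; `downsample_fraction` is a hypothetical helper name.

```python
def downsample_fraction(target_reads, total_reads):
    """Fraction of reads to keep so that total_reads shrinks to about target_reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if target_reads > total_reads:
        raise ValueError("cannot downsample to more reads than are available")
    return target_reads / total_reads


if __name__ == "__main__":
    # Assumed reading of the notes: 70M ESC K27me3 reads down to 21.3M.
    print(round(downsample_fraction(21_300_000, 70_000_000), 4))
```

`seqtk sample` accepts either a fraction or an absolute read count, so the target count can also be passed directly.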
## 03/28/2023 Huddle Day