4DN2 SPRINT DIARY - Q1S5

###### tags: `Sprint Diary`

## 14th March, 2023 / 11:30 AM to 28th March, 2023 (extended to 11th April, 2023)

## Timeline
[Airtable Timeline](https://airtable.com/shrLOuMSWU0UlAA9F)

### Attendees
Garrett, Vedat, Emily, Nezar, Jiangyuan, Trevor, Sasha

## Resources
* Sprint Slides for Figures: https://docs.google.com/presentation/d/1bAKAU5D_pwIQ4Ram2PwocHlbqWBMkRy2ZALzqYxNdic/edit?usp=sharing
* Task Kanban: https://airtable.com/shrTNGv4Ypjr5Icde
* Github: https://github.com/abdenlab/4dn2-sprints.git

## Sprint Agenda

### Planned

### In Progress
* ~~Finish migrating HG documentation. *Trevor*~~
* Histone mark pile-ups: code review? Rewriting. *Emily*
* ~~GSEA for RNA-seq. *Yu*~~
* ITS vs E1, further dot analysis. *Jiangyuan*

## Morning Huddle
Discussion of standup updates.

* Emily: presenting slides on her past efforts; these will become lab meeting slides. H3K27me3, Polycomb, and how they differ. Will be working on other
* Jiangyuan: dot calling and singleton removal. CTCF: some of the removed singletons could be true positives; the next step is working out how to fine-tune this.
    * Emily's question: have you considered taking your minimal list from the CTCF motif analysis and picking out which ones are enriched for the CTCF motif?
* Sasha:
* Trevor: submission deadline in the next couple of weeks. Transitioning the 4DN demo from the fall to new data. Breakouts: Garrett; Nezar; Trevor on HG Python.

## 03/15/2023

### Jiangyuan meeting
Singletons: after CTCF peak enrichment, cut off at 150 anchors. Called dots from the megamap; loop (dot) strength is given by q-scores.

Nezar's recommendation: take the megadots and score them in each condition, using dot calling to calculate q-scores. Then assign an overall score to each condition, similar to membership strength, and rank which cell has the highest score. Sergey can help with the multiple hypothesis testing. Share the megadot list with Sergey so he can calculate HiCCUPS scores for each of the dots in the megamaps.
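Nezar's recommendation can be sketched in Python as a rough illustration, not as the dot-calling pipeline itself. It assumes that per-condition p-values for every megadot are already available from dot calling, and it uses Benjamini-Hochberg FDR correction as a stand-in for whatever multiple-testing procedure Sergey settles on. The two scoring rules below are hypothetical: `rank_conditions` scores a condition by the fraction of megadots with q < alpha, and `strongest_condition_per_dot` picks, for each megadot, the condition with the largest -log10(q) (a membership-strength-like score). The p-values are made-up toy numbers.

```python
import numpy as np


def bh_qvalues(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values (q-values) for a 1-D array."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the i-th smallest p-value
    adjusted = p[order] * n / np.arange(1, n + 1)
    # make q-values monotone: running minimum from the largest p-value down
    adjusted = np.minimum.accumulate(adjusted[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(adjusted, 1.0)
    return q


def rank_conditions(pvals_by_condition, alpha=0.1):
    """Hypothetical score: per condition, the fraction of megadots with q < alpha.

    Returns (condition, score) pairs sorted from highest to lowest score.
    """
    scores = {
        cond: float(np.mean(bh_qvalues(p) < alpha))
        for cond, p in pvals_by_condition.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def strongest_condition_per_dot(pvals_by_condition):
    """Hypothetical rule: for each megadot, the condition with the largest -log10(q)."""
    conds = list(pvals_by_condition)
    # rows = conditions, columns = megadots; clip to avoid log10(0)
    strength = np.vstack([
        -np.log10(np.clip(bh_qvalues(pvals_by_condition[c]), 1e-300, 1.0))
        for c in conds
    ])
    return [conds[i] for i in np.argmax(strength, axis=0)]


# Toy p-values for four megadots in two conditions (made-up numbers).
pvals = {
    "ESC": [0.001, 0.01, 0.2, 0.04],
    "DE": [0.3, 0.5, 0.02, 0.9],
}
print(rank_conditions(pvals))              # [('ESC', 0.75), ('DE', 0.25)]
print(strongest_condition_per_dot(pvals))  # ['ESC', 'ESC', 'DE', 'ESC']
```

Correcting within each condition separately keeps the q-values comparable to a per-condition dot-calling run; the alpha cutoff of 0.1 is an arbitrary placeholder.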
K-means clustering of E1: new ITS; log-transform the originals and compute Z-scores, then generate the heatmaps again, with switching and non-switching.

## Standup Updates

## Huddle Day 2: 3/21/2023
Garrett, Jiangyuan and Vedat are present.

### Garrett:
* CUT&RUN.
* Downloading reference datasets.
* Currently using [seqtk](https://github.com/lh3/seqtk) to subsample 10,000 read pairs from two large paired FASTQ files (use the same random seed for both files to keep the pairing): `seqtk sample -s100 read1.fq 10000 > sub1.fq` and `seqtk sample -s100 read2.fq 10000 > sub2.fq`
* Downsampling the input: need more information from Emily. It is random downsampling. Vedat's question: do we need a statistical method for downsampling instead of random sampling?
* MACS2 and SEACR are set up in the pipeline and the analysis has run; need to look at the output.
* JupyterHub installed:
    * The Littlest JupyterHub: running into connection issues on the Jupyter side.
    * Node jump is still causing trouble; hopefully Michael and Arjan will resolve it soon.
* Will look into T&T mapping later this week.

### Jiangyuan:
* Compartmentalization vs gene expression:
    * E1 vs ITS: the normalization issue with iHEP has been resolved by normalizing genes across stages instead of normalizing by cell type (developmental stage).
    * New E1 vs ITS heatmaps using Hi-C 3.0.
* Dot (loop) calling: improving dot calling; singletons vs dynamics.

### EOD meeting: Emily - Garrett
Emily to Garrett:
* Compare and determine why the two assays show different results. One might have more sequencing depth.
* What is the read count of the less deeply sequenced dataset (DE)? Downsample to the DE numbers.
* Combine the FASTQ files together.
* Downsample ESC to the DE data. Both reads?
* K27me3 = 21.3M reads; 70M reads overall in ESC K27me3.
* Experiment set accession: 4DNES8TY5P5P; IgG: 4DNESB4SCBQW.
* Downsample K27me3 to 21.M and the ESC IgG to 13.5M.

## 03/28/2023 Huddle Day