###### tags: `Sprint Diary`

# 4DN2 SPRINT DIARY - Q1S2

## 24 Jan. 2023 / 10:00 AM to 14 Feb. 2023

## Timeline

[Airtable Timeline](https://airtable.com/shrLOuMSWU0UlAA9F)

Attendees:
* Nezar @nvictus Wk1-Wk2-Wk3
* Vedat @VOY Wk1-Wk2-Wk3
* Yu Wk1
* Trevor Wk1

## Resources

* Sprint Slides for Figures: https://docs.google.com/presentation/d/1eTivsEx96OuuLo4Vn9hUhyUnRjOwjfJqepHyGEDSmdY/edit?usp=sharing
* Kanban Board: https://airtable.com/shrNznOkqe3JGugKK
* Github: https://github.com/abdenlab/4dn2-sprints.git
* Whiteboard: https://miro.com/app/board/uXjVPuwBMWg=/?share_link_id=217405554209
* Action Items:
* Sprint Goals:

## Agenda

Previous Items:
* Planned:
    * Trajectories of RNAseq; Relationship to epigenetic compartmentalization.
    * Insulation Scores and Short range interactions, TADs, Hi-C Low dimensional features
    * ATAC-Seq peaks across the developmental stages.
* In Progress:
    * Consolidated Epigenetic tracks
    * Improvements on IPG Clustering
    * Dot calling for newly mapped Hi-C Lanes:
    * RNAseq: Pairwise DEx
    * Release `hg` as `higlass-python` on PyPI and update documentation
    * Expose higlass’s Jupyter widget using anywidget and deprecate higlass-widget

**Sprint Items Wk1**:
* Time-course analysis RNAseq differential gene expression by Yu Fu

Parking Lot:
* ATAC-Seq by Yu Fu

Notes Wk1:

## Sprint Day Agenda Wk1

Sprint Morning:
* 10am - 10.15am: Check-in @ Sprint Room
* 10.15am - 10.30am: Results from past sprints
* 10.30am - 11.00am: New Sprint Agenda, Discussion and Task Assignment
* 11.00am - 12.00pm: Breakout rooms for discussions.
* 12pm - 1pm: 4DN2-CSG Subgroup Meeting
* 3.30pm - 4.30pm: End of day meeting to announce plans, strategies and outcomes.

## Huddle Day Notes Wk1

### Morning
* Yu, Trevor, Nezar and Vedat in morning huddle.

### EOD (3.30pm - 4.30pm)
* Clustering and visualization of RNAseq
    * UMAP, PCA, Heatmap and Timeseries
    * Cell type specific programs for lineage commitment, maturation, ES to Endoderm

### Sprint Notes Wk1

Amazing Sprint!
## Sprint Day Agenda Wk 2

### Huddle Day Notes Wk 2

Emily, Trevor, Yu, Jiangyuan, Vedat and Nezar present.

### Sprint Morning:

11.30am - 12pm: Morning Huddle

Yu presents RNAseq DEx results:

1. PCA not great for clustering visualization; K-means clustering?
2. Since DESeq2 requires raw counts, I chose not to use TPM, and wonder about the difference between gene count scaled and gene count length scaled. Used gene count length scaled for this week's analysis.
3. Clusters have ~2K genes, 10 clusters.
4. Enrichment, filter by p-value.
5. GO enrichment with up and down genes.
6. Cluster 4 and Cluster 7: interested in looking more, as they have cell ID markers.
7. Have a set of high confidence genes.
8. Curious to see big effect changes, top 50 in each cluster?
9. Represent the data in a genome track so we can compare to Hi-C.
10. Hi-C map with tracks in Jupyter notebooks, integrate with anywidget for HG?
11. Plot UMAP manifold to see gradients, and GEx.
12. Combining eigenvectors and embedding, by Jiangyuan.

## Sprint Notes Wk2

Jan 31st - Feb 7th, 2023

### JiangYuan

- Union dots list, for DEx
- Inferred transcriptional score calculation / ITS bedfiles
- Bin level comparison of transcription

### Emily

- DE and ESC IPG tracks: the H3K9me3 IPG is partitioned at Def. endoderm. Late repli vs mid repli.
- What are the other things to infer? What genes are in each of these? Are there different types of genes and diversity to infer from the biology? Pluripotency genes, liver specific? Transposable elements. Is any of it enriched in IPGs?
- Yu recommended overlapping IPG specific clusters! Might be interesting to see genes that are repressed but not clustered together.

EOD update 1/31/23:

- Wasn't sure if we should be doing GO analysis on IPGs if they stay the same across differentiation... didn't really see the point.
- Generated a Jupyter notebook to create IPG-specific pileups and found that polycomb dots exist in at least 4 clusters (unmerged) in ESCs.
- Need to meet with Nezar to go over IPG assignments (different rows in the image below are different IPGs).

![](https://i.imgur.com/CHlB1Bl.png)

Overall

## Breakout rooms.

### 12.25pm Jiangyuan
- ITS discussion: single data frame, one column for each state. Can be used for UMAP too.
- Standard dot calling on the megamap! Might give a good set of union_dots.
- Differential peak and ???

### 1.00pm Emily
- Clustering discussion: focus on joint clusters, GSEA on joint clusters. If polycomb is established, expression will go down.
- salmon.merged.gene_counts.tsv under Nextflow

### 1.30pm Yu
- Number of nearest neighbors, UMAP. Avg exp levels in different cond. may be more variable, something that is orthogonal to the timeseries.
- Coloring by the IPGs on UMAPs.

## Sprint Day Agenda Wk 3: Feb 7th - Feb 14th, 2023

### Huddle Day Notes Wk 3

Who is present??

### Sprint Morning:

Emily, Sasha, Nezar, Trevor, Yu, Jiangyuan, Sergey, Johan and Vedat are present for the huddle.

#### Morning Huddle Check-in

**Jiangyuan** - Meeting with Job: cell type specific dots are singletons. False positives need to be removed. Working on investigating the singletons to see which ones are real. Tried calling dots on the megamap! Johan called the dots on the megamap!

**Yu** - Investigating RNAseq data: tuning the dimension reduction parameters to see whether PCA and UMAP make sense. Parameters for clustering RNAseq. Clustering picks up the average; you'll need to cluster on a subset of genes that does not fall into the average. Yu is investigating these parameters. Might use DESeq2 output for clustering, or use a different variance cut-off.

**Trevor** - HG is being reviewed. Separate projects are being merged together in one git repo; it will be released under HG as a new major version. Needs a documentation update.

**Sasha** - Decomposing 3C+ with a varying objective function. Nezar's note: all merged, new Hi-C, cis and trans decompositions will be great to have.
Sasha had previously produced contact maps at 5kb resolution with the IPG algorithm improvements?

**Emily** - IPG color assignments need to be checked.

#### Breakout Schedule

12.30pm Jiangyuan - Nezar - Sergey discussions on dot-caller.
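
#### Sketch: variance cut-off before clustering

A rough illustration of the variance cut-off idea from Yu's Wk 3 check-in: keep only the genes whose expression varies most across samples, then cluster on that subset instead of on all genes. The function name `top_variable_genes`, the `log2(x + 1)` transform, and the toy counts below are assumptions made up for this sketch; this is not the code used in the sprint.

```python
import math
import statistics


def top_variable_genes(counts, n_top):
    """Return the n_top gene names with the highest variance across samples.

    counts: dict mapping gene name -> list of expression values (one per sample).
    Values are log2(x + 1) transformed first so that highly expressed genes
    do not dominate purely because of their scale (an assumed choice).
    """
    variances = {}
    for gene, values in counts.items():
        logged = [math.log2(v + 1) for v in values]
        variances[gene] = statistics.pvariance(logged)
    # Rank genes by decreasing variance; ties are broken by gene name.
    ranked = sorted(variances, key=lambda g: (-variances[g], g))
    return ranked[:n_top]


# Toy data (made-up numbers): two genes change over the time course, two stay flat.
toy_counts = {
    "geneA": [10, 100, 1000, 5000],
    "geneB": [500, 510, 495, 505],
    "geneC": [2000, 400, 50, 5],
    "geneD": [30, 31, 29, 30],
}
selected = top_variable_genes(toy_counts, n_top=2)
print(selected)
```

The selected gene names can then be used to subset the expression matrix before PCA, UMAP or clustering. The number of genes to keep is itself one of the parameters to tune.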