###### tags: `Sprint Diary`

# 4DN2 SPRINT DIARY - Q1S4

## 28th Feb. 2023 / 11:30 AM to 14th March 2023

## Timeline

[Airtable Timeline](https://airtable.com/shrLOuMSWU0UlAA9F)

### Attendees

## Resources

* Sprint Slides for Figures: https://docs.google.com/presentation/d/1bAKAU5D_pwIQ4Ram2PwocHlbqWBMkRy2ZALzqYxNdic/edit?usp=sharing
* Task Kanban: https://airtable.com/shrTNGv4Ypjr5Icde
* Github: https://github.com/abdenlab/4dn2-sprints.git

## Sprint Agenda

Planned:

* Insulation Scores and TADs, Hi-C low-dimensional features - Data is ready, Jiangyuan might have some results already! Might need visualization.
* Classical eigenvectors
* Expected
* Scaling
* Saddle plots
* Compartment strength
* Insulation scores (5kb-10kb-25kb-50kb-100kb done, Johan)
* Dots (Jiangyuan has completed, hasn't shared data)
* ATAC-Seq peaks across from ESC to DE?
* EVs and projections: the clustering of 50 kb trans and projections on 5-10-25 kb and their QC
* Generate final IPGs for Hi-C 3.0
* ~~Create IPGs from joint clustering of trans eigenvectors (Emily)~~
* ~~Create IPGs from joint clustering of cis eigenvectors that Sasha aligned via trans eigenvectors~~
* Compare the Hi-C 3.0 joint trans IPGs to Hi-C HindIII joint trans IPGs
* Compare the Hi-C 3.0 joint cis IPGs to Hi-C 3.0 joint trans IPGs (50 kb)

In Progress:

* Finishing migrating HG documentation. *Trevor*
* Histone mark pile-ups, code review? Rewriting. *Emily*
* GSEA for RNA-seq. *Yu*
* ITS vs E1, further dot analysis. *Jiangyuan*

## Standup bot updates for the sprint

| Member | Date | What are you working on this sprint? | What progress have you had so far? | What will you do next? | Anything blocking your progress? |
| --- | --- | --- | --- | --- | --- |
| Nezar Abdennur | 2023-02-28 | • Updates to `higlass-python` and upgrade cooler tileset fetcher for `clodius` | Implemented a "partition [scale](https://github.com/higlass/higlass-python/pull/105)" object to greatly simplify the harmonization of different "binnings" of composite genomic coordinate systems: HiGlass's native binning vs how file formats do it. | Review Trevor's PRs to `clodius`. Review new `higlass-python` API for a release. | Time. But setting up openai's copilot in vscode has been an amazing productivity tool! |
| Trevor Manz | 2023-02-28 | Not many updates. Continuing to work on HiGlass Python documentation, some additional maint. with clodius. | Added to <https://github.com/higlass/higlass-python/pull/97>, opened several smaller PRs in clodius to fix some issues with usage and installs in modern Python environments:<br><br>• <https://github.com/higlass/clodius/pull/146><br>• <https://github.com/higlass/clodius/pull/147><br>• <https://github.com/higlass/clodius/pull/148><br>• <https://github.com/higlass/clodius/pull/149> | Merge documentation. Meet with others to discuss IPG visualization next steps | Not at the moment. Just some long-ish cycles with opening PRs related to HiGlass (that I’m not able to merge). |
| YU FU | 2023-03-02 | • Organize and upload all RNA-seq analysis output and notebook to workspace<br>• Make slide decks with method details and notes<br>• Test run ENCODE ATAC-seq pipeline | Done most of the analysis but in need of a lot of organizing | Finish up everything and provide resources for people to refer back to if necessary | Time… Need to catch up on course work too |
| Emily Navarrete | 2023-03-02 | figuring out what’s going on with the cut and run data posted on 4dn by the henikoff lab | It seems like SEACR (at least at the default settings) can’t discriminate signal from noise very well. If you look at the higlass screenshot, the cut and run track for H3K27me3 (shown in blue) on the left (ESC) corresponds to *459 called peaks genome-wide,* whereas the cut and run track on the right (DE) has over *700,000 peaks.* If you look at the track on the right, there appears to be a uniform, low background signal across the genome. I checked to see whether anything was weird about the IgG controls, which are supposed to be used to background normalize the tracks (shown in the top tracks in green), and it seems that there’s maybe ~50 times more reads in the DE IgG track than the ESC IgG track. | I’m going to try renormalizing the DE cut and run data to the ESC IgG control, because why not | just busy trying to balance things |
| Nezar Abdennur | 2023-03-03 | Help to get to a release of `higlass-python` and get `pybbi` working on py3.10 for `clodius`. | We discovered some complexities in the current way the HTTP FUSE filesystem mount is exposed to the user. Fixing this will require some refactoring, maybe in a future release. | Fix the pybbi build issue. | For higlass-python, deciding whether anything else really needs to be done or just go ahead and release as is. For pybbi, finding time to troubleshoot. |
| Trevor Manz | 2023-03-06 | • Help finishing up release of `higlass-python`<br>• Fixing some issues with installing `clodius` in a new environment so usage in `higlass-python` is seamless<br>• Starting analysis-vis work, migrating 4dn-demo to mirny lab | • HiGlass Python is ready for an initial release<br>• Started compiling visualization demo in `/home/manzt/demos/hg-scatter` | Finish migrating the example | Not at the moment, just balancing my other research project which will conclude at the end of the month. |

## Huddle Day 2 - 03/07/2023, 11.00am

### Who is Present?

* Nezar, Garrett, Vedat, Yu, Jiangyuan, Trevor, Sergey, Emily.

### Updates

**- Trevor**: Documentation is updated. Blocker: what version to release, V.1 or V.2? Documentation not linked to HG, needs updates. Keep original documentation. Clodius and local data!

**- Yu**: Done GSEA, Hi-C compartment change in a given gene set.

**- Jiangyuan**: Dot calling, singleton filtering to remove false positives, to eval; overlapped anchors around CTCF ChIPs. Only have ESC. After singletons removed, convergent around the anchors. Nezar's note: dot calling on the megamap with a tolerant threshold gives you noise, but select independently from the 5 stages. Has K-means clustering on E1 vs ITS for stable and dynamic compartments.

**- Emily**: Remapped H3K27me3 Cut n Run, working to eliminate background; ESC C&R looks good. Problem: trying to overlap w/ IPGs, the way BioFr overlaps the peaks is messing things up. BioFr increases the overlap; go through the notebook with Garrett. Calculate the extent of the overlap for each duplicate peak, see which one has the max, then unify. How do you find peaks shared across different IPGs? - overlap, calculate offsets. Peak caller might be biased towards narrower peaks. Let's start with a union list! - ESC has few duplicate peaks whereas DE has more duplicate peaks.
Q: What peaks are unique to each cell type vs shared?
A: Merge domains that overlap, + those that don't overlap = you have a union list of peaks. Quantify enrichment at polycomb vs remainder. You can make a ratio and fold changes.
Nezar's question: What's the RNA-seq protocol? PolyA or total RNA. Emily will run on our server w/ Garrett NF-core and Nextflow pipelines. Needs Docker.

**- Sergey**: ID'ing significant dots: very strange statistical procedure, but vulnerable to noise. One solution: try seeing if we can come up with a matrix. Idea is local background comparison; if stat. significant, first p-val then q-val.

### CSG Meeting

* Mohan protocol on Cut n Run
* GSEA
* Dotcalling
* Emily's ppt?

### Notes:

Cut n Run normalizations: no spike-in control; we got 100 times more peaks w/o spike-in control. 5 Cut n Run seqs: compare depths, run NF-core pipeline w/ and w/o normalization, w/ Emily. Figure out in silico how much spike-in you need? What timeline for Cut n Run? How deep does the sequencing need to be?

## Breakout - Afternoon

### Jiangyuan

CTCF seeded singletons. After removing singletons, false positive rate dropped. ChIP-seq heatmaps: with the original matrix, heatmap created, 20kb CTCF. Do you have a matrix just containing singletons? Take all singletons, plot them on HG to see if they look good? If they look good, let's keep them. If they don't look good then they might be false. Split in two files, keep high side of elbow.
- keep top 150!

We took what deeptools does to quantify ChIP-seq strength, including singletons: looked at the midpoint, 500bp; for every singleton we got the ChIP-seq enrichment and sorted them; we saw an elbow and found a small population contaminating the analysis. Let's analyze the small population on HG to see if they look real! Add RNA-seq as a heatmap next to K-means clustering!

#### How it's done:

- TXT to pandas, then to numpy. Graphed enrichment inverted.
- ROSE algorithm to find elbow point.

#### Thursday Meeting

Emily, Garrett, VOY.

- Quality of peaks: threshold and range? Comparable? After normalization, comparable?
- How much spike-in DNA is needed? How much coverage is good for spike-in? 1% of the mapped reads should map to spike-in.
- Try downsampling input vs total to determine read depth; compare called peaks vs 50M-read datasets, making sure heights don't change.
- Different normalization methods; normalize to spike-in. Make sure how normalization happens.
- Downsampling the pipeline in parallel: 6 reps vs 3 reps vs 1 rep? Benchmarking it, stringent vs relaxed, MACS2.
- Histone marks vs CTCF data on 4DN.
- Ensure spiking is valid.
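
For reference, the elbow step in Jiangyuan's breakout notes ("How it's done") could be sketched roughly as below. This is not the ROSE code and not Jiangyuan's notebook; it is a minimal ROSE-style heuristic that rescales rank and signal to [0, 1] and takes the point where a slope-1 line touches the sorted curve. The function names `elbow_index` and `split_at_elbow` and the synthetic enrichment values are assumptions made for illustration only.

```python
import numpy as np


def elbow_index(values):
    """Sort values ascending and return (sorted_values, elbow_index).

    Rank (x) and signal (y) are both rescaled to [0, 1]. The elbow is taken
    as the point where a line of slope 1 touches the curve from below,
    which is the point where (y - x) is smallest.
    """
    y = np.sort(np.asarray(values, dtype=float))
    n = y.size
    if n < 3 or y[-1] == y[0]:
        raise ValueError("need at least 3 values that are not all equal")
    x_norm = np.arange(n) / (n - 1)
    y_norm = (y - y[0]) / (y[-1] - y[0])
    return y, int(np.argmin(y_norm - x_norm))


def split_at_elbow(values):
    """Split values into (high side, low side, cutoff) at the elbow."""
    values = np.asarray(values, dtype=float)
    sorted_vals, idx = elbow_index(values)
    cutoff = sorted_vals[idx]
    return values[values > cutoff], values[values <= cutoff], cutoff


if __name__ == "__main__":
    # Synthetic enrichment scores (made up): many weak sites, a few strong ones.
    scores = np.concatenate([np.linspace(0.0, 1.0, 96),
                             [20.0, 40.0, 60.0, 80.0, 100.0]])
    high, low, cutoff = split_at_elbow(scores)
    print(f"cutoff={cutoff}, high side={high.size}, low side={low.size}")
```

With real data, the input array would be the per-singleton enrichment column loaded from the TXT file via pandas and converted to numpy, as in the notes; the two returned arrays correspond to the "split in two files" step.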