Isaac Virshup
---
tags: anndata, hiwi-cohort, meeting, dask
---

# 2024-11-14

## Severin

- Looking at data formats and out-of-core
    - zarr >> hdf5 for out-of-core
- Creating best-practice notebook for out-of-core
    - Should clear cache between runs (see zarrs benchmark repo for command)

## Ilan

## Isaac

- Not much else, time off and sick

## Mikaela

- Benchmarking <3 thank you!!!!
- Scaling results coming

## Phil

- Picked up `subset` function for masking
    - will either do two PRs or all as one
- Recreated Mikaela's plots
    - Good for looking into rework of scanpy plotting

# 2024-11-04

## Mikaela

- https://github.com/MannLabs/alphabase
- https://www.nature.com/articles/s41587-022-01302-5
- https://scholar.google.de/citations?user=TOK1Xd0AAAAJ&hl=de

# 2024-10-31

## Ilan

- Working towards anndata release
- zarrs performance

## Isaac

- Reproducible notebook tool
- write/read soma? schema doesn't match right now
    - why this if mudata does something very similar?
- Write bed file as URI for CellXGene, maybe look into sgkit?

## Mikaela

- proteomics interested in hierarchies in mudata
    - global varp?
- Danila open to modifying mudata

## Severin

- UMAP not from a distance graph
- Checking into intel PRs more

## Phil

- Short week, off for funeral

# 2024-09-30

## Severin

- Core vs. AnnData scanpy implementations would be good for scanpy
    - Allow for GPU, other speed-up drop-ins
- Custom data loaders dropped for scvi-tools
- Remove delayed in favor of map_blocks in multi-gpu dask

## Phil

- wants to make progress on 2.0
- Need to document kernels, internals etc.

## Ilan

- zarrs stuff needs to be done, need to contact IT
- xarray concat with dask drop-in for masked-array super types

# 2024-09-23

## Ilan

- `io` submodule is done
- `zarrs` optimization?

## Severin

- codespell for scanpy
- bbknn for scanpy?
- upstream first, then slow/no reply/resistance means we can incorporate maybe
- definitely bbknn for rsc
- multi-gpu dask

## Phil

- Stabilize and export testing utils (anndata)
- Remove `scanpy.tests`
    - use testing utils from `anndata` (or `scanpy`)
- look at mask stuff
- refresher on PCA
- AnnData `.raw` copying issue

# 2024-09-19

## Phil

- Preparing release for scanpy -> directive moved for anndata as well, towncrier, then release
- Would be great to start sparse PCA in scanpy
- Badge for scanpy functions that have a RSC implementation

## Ilan

- Remove `shall` from the variable

# 2024-09-16

## Phil

- Preparing release for scanpy -> directive moved for anndata as well, towncrier, then release
- Would be great to start sparse PCA in scanpy
- Badge for scanpy functions that have a RSC implementation

## Ilan

- Need to re-release anndata with the dask extra/feature for install
    - Need to tweet about it
- Getting Rust project into reviewable state
- sparse array business
- `vitessce-python` PR
- viv upgrade PR

## Severin

- Focus on getting dask ready for use in RSC
    - `delayed` objects
- `bbknn` work needs to be done on GPU
    - could be integrated into scanpy
    - would be a great way for us to get some fine-grained control

# 2024-09-02

## Phil

- Looking at what goes into top-level of release
- Helping with 0.11 release
- Writeathon
- Prepare scverse

## Ilan

- Classes for views/axes
- Sparse array fix PR
- Make presentation
- Add Phil to rust repo
- Sparse array API stuff
- Release 0.11

## Severin

- BBKNN/general knn harmonization
- Node for hackathon
- Work on presentation
- kernel in scirpy needs to be reviewed (hamming distance)
- Finish intro presentation for Danila (done by tomorrow)

# 2024-08-29

## Ilan

- Sparse `spmatrix` sub types that aren't csr or csc in `X` - bug or real?
    - Array API protocol maybe? But this shouldn't block the typing
- Exporting `CSRDataset` and `CSCDataset` - are we ready? leave in `experimental`? export a `Protocol`?
- Put in an `io` module along with `read_elem` and `write_elem`
- Keep exporting from `experimental`
- `should` vs `must` in booleans

## Phil

- I/O for nullable strings has a spec for the future (use a setting) and is using the correct string syntax for pandas
- Need to look into `should` vs `must`

# 2024-08-26

## Ilan

- zarr-rs-python works and passes a bunch of tests!
    - zarr-python slower for big chunks but faster than zarr-rs-python for small chunks
- `towncrier` work progressing -> assuming all looks good, merge the `towncrier` PR. Then update and automate once header fix is in
- We need to start the release process for 0.11, which means doing 0.10.9 first

## Severin

- Working on welcome presentation for scverse conference
- Fix error with CI for llvm-lite for AnnData
- Finish `session-info2` work
- Big mismatches between packages in UMAP calculations -> neighbors taken, saved, and reported are different among implementations
    - method field from scanpy implementation writes "umap" for neighbors from pynndescent, but bbknn also does this for some reason

## Philipp

- Working on `session-info2` stuff
- Helping with `towncrier`

# 2024-08-19

## Ilan

- zarrs-python working more now
- Make test repo for releases
- Work on array-api

## Severin

- Worked on multi-gpu -> moving everything to one graph still not working
- Worked on PR for AnnData stuff -> all stuff working now with simplified `uv` install
- `session-info2` install of `uv` very helpful
- Specify that it is better to use pre-built wheels

## Phil

- Dislikes lack of "provides" so that cuda resolution could be smoother (instead of multiple extras)
- Thesis work being done
    - Update for current state of the field -> ATAC-seq for single cell based on AnnData, maybe foundational models mention?

# 2024-08-12

## Severin

- Working on docstrings for rsc: https://github.com/scverse/rapids_singlecell/pull/242
- Looked at `session-info2`, would be good. Possible to hack in GPU question for driver, cuda-version etc.?

## Philipp

- Submitted ICBS paper (waiting on latex error report)
- PR about resetting plot parameters
- Going through the sprint

## Ilan

- Bunch of bugfixes for `AnnData`
- Rust for zarr almost done
- Try to make progress on scipy array-api

# 2024-08-08

## Ilan

- Made an issue for GPU concatenation, but this is an isolated problem from the bigger GPU issue and is a big problem

## Philipp

- Nullable string is mostly a release issue - this becomes quite annoying to people who have to update downstream readers
    - Since this is an update of stuff we don't currently support, releasing shouldn't be an issue as it is a "feature"
    - We've added the error message for i/o so we have kind of already done this
    - We improved error messaging in 0.7
    - Solution(?): use a setting! we support this now!

## Isaac

- Need to go through 0.11.0 and decide what is in it
- Review xarray
- Proposal for IO module

## Severin

- Currently use `print` statements for logging, what is best practice?
    - Should maybe implement your own logger
    - scanpy's logging is not public as a module (only some functions)

# 2024-08-05

## Ilan

- Picked up issue on violin plots
- MOSCOT PR finish up
- xarray PR
- Major release AnnData stuff: https://github.com/orgs/scverse/projects/27/views/1
- Automate release completely
    - Many ways of doing this
    - Need to handle release notes, milestone creation, etc. so it just works by a single event (or no event if completely automated)
    - Create a doc for collecting these ideas
    - Start by applying these to scanpy
    - towncrier? release-plz? changesets?

## Phil

- Work around myst-nb issue by building tutorial with branch
- Flakiness on benchmark could be down to cpu affinity
    - Assign all cores on CPU 2 to the benchmarking and then try to assign everything else (background etc.) to CPU 1
- Not sure how to work on masking because want to have best effect
    - Mask argument should be used where "subsettable"
    - Nan-mean for score genes would benefit from a mask
    - Boolean look-up tables can be faster than subset a lot as well, especially in numba
    - PCA is 3rd party, so needs subsetting instead of lookup

## Severin

- Support for Jax anndata?
    - More broadly the array api is the way to go here: https://github.com/google/jax/issues/18353
- Need to fork Intel's branches to be able to make our edits
- Spoke to Nvidia about blog posts

# 2024-07-08

## Ilan

- distributed PCA: RSC changes needed
- GPU cluster doesn’t give him any GPUs

## Phil

- How to do mypy progressive typing
- Do PR with benchmarks for off-axis (dense and sparse)

## Severin

- sparse → dense conversion (C-contiguous vs F-contiguous)
- work on Intel PRs
- Was approached by Malte to work on open problems thing about seurat compat
    - Phil opines that having statistical robustness of analysis outcomes is more valuable

# 2024-07-01

## Phil

- Do sprint planning tomorrow, should set up meeting

## Ilan

- Ale’s PCA runs into Cupy bug, but Ilan & Severin think they know a good workaround – “just use `map_blocks`”, but maybe not. Ilan is debugging it.
- Isaac doesn’t review `read_as_dask`, so Phil should take over

## Severin

- onboarding with HMGU!
- new CuVS backend that can be used for e.g. PyNNDescent would be good fit for RSC
- Intel PRs:
    - Need better benchmarks with bigger data for “primitives” like get_mean_var
    - Need minor axis benchmarks
    - Need benchmark for QC and normalization, there’s potential
    - Each thread should do contiguous access

# 2024-06-27

## Ilan

- Debugging/working on multi-gpu dask for Alejandro/Severin
- Viv/Vitessce maintenance
- Reviewing PRs/Maintenance fixes

## Eljas

- Wrapping the PR bug-fix for subsetting HVG
- Continue on Pearson PR
- Look into scaling (or norm) producing NaNs

# 2024-06-24

## Ilan

- What is going on with scrublet?
- Dask memory issue
- Viv upgrade
- dask elem PR array type

## Phil

- Working on China paper/talk
- Need scanpy/anndata release for numpy 2

# 2024-06-17

## Ilan

- Proving to Fabian that PCA from millions of cells works
- Vitessce/viv work
- Look at sprint now that I'm back

## Phil

- Professionally incapacitated by kitchen equipment
- Look at Dask benchmarks

## Severin

- C arrays can be viewed as 1D F-contiguous, but calling `flat` turns to C-contiguous
- We should do more numba stuff, e.g. sparse clipping

# 2024-06-06

## Isaac

- Has similar Dask error, maybe just bump its min version

# 2024-06-03

## Phil

- Working on back PRs
- Dask min version issue is weird
    - bug caused by dask’s `concatenate3` being called with sparse matrices, not arrays
- Tried to fix zappy issue but had to revert: https://github.com/scverse/scanpy/issues/3087

## Ilan

- AnnData.js package in a good place, doing some review
- Waiting on reviews of other PRs

## Severin

- Also seeing the test failure from dask min version: https://dev.azure.com/scverse/scanpy/_build/results?buildId=6790&view=logs&j=ed19f947-f4ce-5e43-250d-4ce4db49a756&t=de40599c-6b0c-5ef7-3ea6-59cb0efad3c6&s=96ac2280-8cb4-5df5-99de-dd2da759617d

# 2024-05-23

## Eljas

- Waiting for review from Phil for PR on a bugfix, very important for next release

## Ilan

- `xarray` bug that they called me for
- `concat_on_disk` fix for `outer` joins
- `vitessce` work, `anndata-js`
- `vitessce-python` work as well

# 2024-05-16

## Eljas

- Bug for subsetting hvg, PR opened, Phil looked into it, and then improve tests + merge
- Would be nice to refactor
    - Want to merge Severin's performance PR before refactor, though

## Phil

- Fixed broken doctests, moving testing to `src/testing/scanpy`
- Maybe move all of scanpy into `src`? Or maybe should wait until Phil is back, updating open PRs

## Ilan

- TypeScript anndata-js almost done
- Finished up sprint stuff as well
- `xarray` release is out so no blocker now for backed `{obs,var}`

## Severin

- Need to talk to Isaac about HVG performance PR
    - could make it single-threaded, but need Isaac's opinion
- Wants to work on mean-var optimization as well for scanpy

# 2024-05-13

## Severin

- Needs to work on thesis
- Scrublet slow because of caching or imports
- Wants to tweet about docker image

## Phil

- Not sure if running ASV is good
    - Newer python projects using codspeed
    - Could give more insight if we could upload our stuff there as well
    - Codspeed is good but we don't control hardware and don't need to pay money (for GPU)
- Not sure why scrublet is not working on a certain dataset

## Ilan

- Work on clearing up sprint PRs
- Backed mode scanpy PR looks good

# 2024-05-06

## Ilan

- scipy array api at a point where discussion needed/1d array PR needed

## Severin

- Tests written for multi-GPU implementation
- Testing dask on 3k dataset
- Want to do scaling and neighbors
- How to make clear that Dask is experimental? Different branch? Settings?

## Phil

- Few small things, missed notebook stuff, tutorials have some reproducibility issues (text referring to different cluster numbers)
    - Myst has support for fixing this via expressions, but didn't work with scverse tutorials

# 2024-05-02

## Phil

- Bug Igraph flavor for leiden, should be fixed early
- Added PR with benchmarks for last thing missing: scrublet. It is slow, should consider if possible to speed up (Severin: can look into if have time)
- Move tests out of package for anndata; warning about coverage break

## Ilan

- working on array api for scipy sparse; including notimplementederror for proper documentation
- spatialdata db project vitessce stuff

## Severin

- hvg functions nicer, numba kernel is right choice but unclear if multiprocessing is.
- looking into Ilan's stuff on clusters
- getmeanvar
- will have to do analyses in next weeks so reduced dev time

## Eljas

- Important bug discovered in HVG: Seurat and cellranger with batch arguments just report the first 2000 genes instead of the HVGs. Seurat V3 is safe from this.
- Has MRCE for Severin

# 2024-04-29

Görlitz GPU Hackathon: results good, but messy, need to be integrated well

## Phil

- Put standups in scverse calendar

## Severin

- Isaac found memory peaks in some of Severin’s numba code, needs to measure what’s going on
- Currently working on integrating hackathon work
- Need to smartly handle numba multithreading with small data: preferably use single thread instead

## Ilan

- Going to work on sprint stuff now that hackathon is over

# 2024-04-15

## Eljas

* Pearson2dask
* using scalene to profile
* numba faster than dask where possible (i.e., datasets in memory)
* try numba in dask
* Try dataset from dask notebooks

## Ilan

* Was sick
* Look at benchmark PR

## Phil

* Benchmarks

## Isaac

# 2024-04-11

## Ilan

* csr_array pr
* Weird coverage
* `read_dask_elem`
* benchmarking

## Phil

* `scanpy` benchmarking
* `pytest` bug
* Next focus benchmarking

## Severin

* sparse aggregate

## Isaac

* gpu support

# 2024-04-08

## Ilan

* sparray PR
* vitessce PR

## Isaac

* scipy 13 review
* a lot of reviews
* follow up with Severin about aggregation

## Phil

* Benchmarking suite

## Severin

* Another follow PR to finetune sparse scale, + benchmarking
* Other aggregate functions PR working

# 2024-04-04

## Ilan

## Isaac

* GenomicRanges stuff from Hackathon
* Will help Phil to get on denbi

## Eljas

* Has draft PR for Dask for Pearson residuals
* Keep Pearson normalized PCA, make compatible with dask?

## Phil

* Fixing benchmarking bugs
* Setting up benchmarking for scanpy

# 2024-03-28

## Ilan

* Vitessce this week
* csr_array/ csc_array
* SpatialDataWrapper for hackathon
* Benchmarking

## Eljas

* Pearson residuals

## Phil

* Bibtex
* Benchmarking
* Figuring out what to do for hackathon next week

## Isaac

* Scale
* Announcement
* GroupBy

# 2024-03-25

## Isaac

* Want to get release out

## Ilan

* Will handle the scipy 1.13 release issue by intercepting CSR column operation
* Use CZI fixed URLs and update test

## Phil

* checking out yuge's `obs` vs. `obs`
* `src` directory PR, will be done post-release

# 2024-03-21

## Isaac

* Working with Ilan on dask tutorial
* Performance issues with sparse notebook?

## Phil

* Will work on docs PR in meantime so notebook renders well
* Updated benchmark system so there is a status check on the latest commit
* Need security for running on PRs from non-scverse (untrusted) people; can do label-based mechanism, but this is not a very good security setup

## Ilan

* working on vindex issue for `_subset_obs_in_place` + `obsp`

## Eljas

* Looking into pearson issue and will be joining Thursdays rather than Mondays

# 2024-03-18

## Phil

* Benchmarking machine working

## Isaac

* Working on tutorials, need new tutorials (TODO: Ilan)
* Options for tutorial: By the book release, secret release, or with sparse
    * Create client, set memory limit, etc.

# 2024-03-07

## Isaac

* Reviews mostly, maybe bug fix release of AnnData
* Issue for 64 bit indptr on disk but 32 bit indices will probably not be done before this bug fix release despite desire
* Big push for docs implementation for new neighbors implementation

## Phil

* PR mechanism for benchmarking machine works
* Still doing some tuning and then enabling for test repos

# 2024-03-04

## Ilan

* Finish(ing) the dask sparse sum and scaling PR (scale, hvg, normalize)
* Move on to creating usable branch for array-api with sparse

## Eljas

* Looking for things to dask-ify

## Phil

* Benchmarking machine - looking into tuning, PR comments posted back to run-branches, use git token so no rate limiting
* For tuning, it would be nice for someone to look at it - how do we get CPU isolation? Maybe just a shell script run on startup
* Switching AnnData to `src` folder - was waiting on PyTest response, but now that we have the reply, it still does not work

# 2024-02-29

## Ilan

* almost done with sparse-in-dask
* some questions about anndata’s dask helpers
* xarray PR should get merged soon

## Isaac

## Phil

* some progress on scverse/plotting_api w/ Gregor

# 2024-02-26

## Philipp

* Starting first test runs on the benchmark machine, seeing if we can respond to requests from github
* Figuring out automatic runs should be first step
    * security, capacity etc.
* review PR with control genes scoring

## Eljas

* Just back from vacation, busy for the time being

## Ilan

* Started on sparse-in-dask for `_mean_var` and so on
* Going to do tutorial notebooks

## Isaac

* Working on 64bit writing and then on to GPU stuff

# 2024-02-22

## Isaac

* Not sure what to do for RC docs, but doing the RC soon
* TODO release:
    - [x] make docs read “1.10.0rc1 (2024-02-22)”
    - [x] make 0.10.x branch
    - [x] do it

## Phil

* Benchmarking
    * Getting things running fine
    * Uploading and publish

## Ilan

* array api test suite running on sparse doesn’t look pretty

# 2024-02-19

## Ilan

* Discovered and working on Pandas bug
* AnnData behavior difference in and out of Docker

## Eljas

* Triaging issues
* Looking through issues, mainly [Dask](https://github.com/scverse/scanpy/issues/2578)

## Philipp

* sparse in HVG work: try to "compute" everything and just see what passes, but "compute" doesn't actually work
* get_mean_var doesn't work with sparse
* One more PR (semantic version identifier) waiting for review

## Isaac

* igraph PR review coming
* seaborn issue discussed with Eljas fixed by updating seaborn
* look at aggregate PR as well and then release!
* release on Tuesday or Thursday

# 2024-02-15

## Isaac

* igraph review
    * Maybe just keep current default for leiden and then switch later? Don't want to hold up and this *is* a breaking change
    * We have said we do semver, and this should include numerics w/i reason
    * Switch notebooks to igraph and leave default
    * Add warning for future default change `leidenalg`
* [mask argument](https://scverse.zulipchat.com/#narrow/stream/393966-scanpy-anndata-dev/topic/mask.20argument)
    * axis specific arguments should probably be added eventually
    * while mask is new, we should do the change now
    * Don't infer mask from number of obs and var - plus, might want to do both
    * This will push back the release, but this is worth it
* mindeps job
    * Happy with implementation, but want the flexibility to specify min version and also incorporate bug fixes
    * Follow up issue for this?
    * Can't specify .0 for dask because there is not a release on the first day of the month
* docs change after release candidate
* aggregate: Phil will be asked for review

## Philipp

* All PRs are done or waiting for review
* Isaac just approved HVG
    * Can add support for sparse-in-dask later always - experimental in AnnData anyway
* will do mask PR, then ask Ilan for review
* ask yuge about dans PR :white_check_mark:

## Ilan

* All PRs are done
* Leiden: see above
* Dask: Always write 64 bit sparse indices/indptr

# 2024-02-12

## Ilan

* xarray categorical PR: seems to be going in a good direction
* sparse array indexing fixes: compatible with array API, mostly done
* starts with dask 64 bit stuff in scipy

## Isaac

* PCA order problem: just throw warning
* Needs to do some reviews

## Phil

* blocked on PRs
* scrublet PR not renamed, Isaac needs to follow up on a comment
* fixed plotting warnings
* benchmarking machine now gives ssh prompt via VPN but cannot login

## Eljas

* Sick, so did not complete much last week
* HVG meeting tomorrow
* need experience with dask because it will be important for ehrapy
* maybe sparse-dask-chunks integration into scanpy?
* upstream things into dask as appropriate
* focus on things that work on entire count matrix

# 2024-02-08

## Phil

* Out today, we'll do sprint planning Tuesday
* HVG PR is ready for re-review

## Isaac

* GPU issue
* Min deps PR
* order issue
* plotting

## Ilan

* waiting on review for leiden

# 2024-02-05

## Phil

* Maybe more PRs reviewed by Ilan to speed things up.
* pytest 8.0 should be able to handle `src` structure, but still figuring out other odds and ends e.g., with `doctest`
* Open issue, but maintainers are not very receptive to using `sys.modules` to fix imports for testing

## Ilan

## Isaac

* Busy with other non-coding responsibilities
* Some reviews to do

# 2024-02-01

## Ilan

* xarray PR on track, Categorical index is supported upstream
* rename `_config.py` to something `settings`

## Isaac

* min deps is almost done
* mysterious test fail with a mask test

## Eljas

* hvg is mergeable if we don’t do orthogonal flavor/ordering parameters

## Phil

* HVG genes in dask: converting to pandas is faster. But do people want to keep dask around for *everything*?
    * dask takes more memory for multiprocessing, and we can always return to it if we need since it's in the code history
    * could wrap single_batch computation in delayed job and use pandas within the job
* Fixing hvg bugs as well?
    * bins bug from six years ago should maybe be revisited even though it changes tests
* scrublets
    * upset plot would be helpful, but will take time because test datasets are really big
    * could use denbi for running tests faster as well

# 2024-01-29

## Ilan

* Settings PR ready for final review
* igraph change hard to test because of inconsistent results based on CPU etc
* scanpy-tutorials notebooks have warnings, we want to hide them in tutorials people read but see them ourselves so we can fix them.
    * either do fix-warnings branch in scanpy and run notebooks from that branch
    * or configure MyST (if easy) to hide stderr while leaving warnings in .ipynb files

## Philipp

* Wrap things up, focusing on finishing conversations
* scrublet PR decisions on booleans to run or not run pre-processing steps
* concat api waiting for Danila's opinion
* doublets prediction is dependent on neighbors implementation, which is problematic

## Eljas

* Not much in the way of updates
* Work on refactoring in coming week
* Review of ehrapy

## Isaac

* CI is finding wrong version, maybe because commit/version tags are different, or maybe bugged version of pip
* Unblocked on min test jobs PRs, no ideas about pandas problem but some ideas for anndata problem
* aggregate PR is good to go, maybe modify axis parameter
* metrics PR post-mortem: 64 bit is enough to pick up differences, but 32 bit is not, i.e., exact was never correct but 32 bit was imprecise enough

# 2024-01-22

## Isaac

* Plots are wrong in the minimum test deps
* TODO: We should make an issue for setting random seeds on tests for plotting (make issue)
    * Maybe everything? A decorator? Setting?
* TODO: Issue in AnnData for random seeds/data generation?
* TODO: Find plots that are different every time and pass as being the same (make meta-issue)
* TODO: Check for documentation of every parameter (maybe make issue?)

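One possible shape for the "a decorator?" idea above, as a minimal sketch only. It uses the standard-library `random` module to stay self-contained, whereas the real plotting tests would most likely need to seed NumPy's generator instead. The names `with_seed` and `make_jittered_points` are hypothetical and do not exist in scanpy or anndata.

```python
import functools
import random


def with_seed(seed=0):
    """Hypothetical decorator: seed Python's global RNG for each call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            state = random.getstate()  # remember the caller's RNG state
            random.seed(seed)  # make the wrapped call deterministic
            try:
                return func(*args, **kwargs)
            finally:
                random.setstate(state)  # do not leak the seed into other tests

        return wrapper

    return decorator


@with_seed(42)
def make_jittered_points(n=5):
    """Stand-in for plot input data that depends on random jitter."""
    return [i + random.uniform(-0.1, 0.1) for i in range(n)]
```

Restoring the previous state in `finally` keeps one seeded test from changing the random stream seen by the next one, which is the property a global "seed everything" setting would not give by itself.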
## Philipp

* Working on Dask HVG
    * Issues with dataframes - no way to leave Dask as Dask within AnnData dataframe namespace (i.e., `var`/`obs`)
    * Otherwise simple fixes/issues
    * Would be good to handle this in xarray in the future
* Benchmarking
    * No word back from contractor (wrote email)
    * We need improvement in contractor situation (accountability measures/mechanisms)

## Ilan

* Xarray stuff – supporting pandas extension arrays
* Bug fixes for anndata
* Settings PR
    * Probably looking good, especially if it's mainly "expert" developer facing
    * Just missing static type/ autocomplete stuff

## Eljas

* No word back from Seurat people
* Closing stale issues
* No chance to look at refactoring yet

# 2024-01-18

## Eljas

## Philipp

## Isaac

# 2024-01-15

## Ilan

* Mostly working on categorical arrays
* scanpy leiden, some confusion about what different parameters should be

## Isaac

* Reviewing some things and merging some stuff into AnnData
* Started looking at Eljas' PR - trying to cut down arguments
    * Order of operations in merging is the difference with Seurat
    * Problems matching the paper's algorithm (they do not implement their own algorithm)
    * If they don't consider it a bug, we don't need to handle that, i.e., no need for batch_merging parameter and can leave flavors as they were
    * Might make sense to leave it in for the future, i.e., purely rank based might not be the way to do it
    * argument might motivate making some repeated functionality more modular
* Going back to aggregation and finishing up

## Philipp

* Search feature is up and running
* Working on dask for HVG
* Working more on scrublets, finding gold standard for that

# 2024-01-09

## Ilan

* Xarray PR for categoricals
    * maybe wants us to look over it?
    * Will let us know
* config pr
    * being reviewed by Phil
    * Trying to figure out how to do tab completions

## Phil

* Scanpy documentation stuff
    * Links are broken
* String dtypes
    * Will look into how pandas does this

## Eljas

* HVG
    * https://github.com/scverse/scanpy/pull/2792

# 2023-12-07

## Ilan

* Thesis work upcoming

## Eljas

* highly_variable_genes

## Phil

# 2023-12-05

## Agenda

* Go over Ilan work plan
* Intro to Ilan
* Sprint planning

## Notes

* Work plan for next week
* Benchmark machine maybe available soon?
* Vacation times:
    * Isaac: Dec 15th to Jan 8th
    * Phil: Dec 23rd to Jan 7th/ 8th
    * Ilan: Dec 23rd - Jan 8th (remote) or 22nd

# 2023-10-31

* Let's figure out new meeting times after planning day
* Ilan's joining may be delayed

## Eljas

* Seurat v3 hvg issue
* seurat inconsistency
    * https://github.com/scverse/scanpy/issues/2088
    * https://github.com/scverse/scanpy/issues/1733
    * https://github.com/scverse/scanpy/issues/2151
    * Previous pr to fix:
        * https://github.com/scverse/scanpy/pull/1732
    * TODO: resolve to a single tracking issue +
* Seaborn:
    * https://scverse.zulipchat.com/#narrow/stream/328272-scanpy/topic/bug.20on.20sc.2Epl.2Eviolin/near/398325411
    * Someone has responded

## Selman

* Wrap up tests
* Blog post
    * OOC concat
    * AnnData

## Sprint retrospective

# 2023-10-24

## Eljas

* Seaborn inconsistency
    * `catplot`: https://github.com/scverse/scanpy/issues/2680
* seurat inconsistency

# 2023-10-17

## Eljas

* Seaborn

# 2023-10-10

## Selman

* dask - pr
    * Isaac wants to run this
* h5py
    * Solution a little unclear
    * Maybe needs another take?

## Eljas

* Triage meetings?
* 10x mtx reader * sk-imbalanced learn * ## # 2023-09-04 ## Eljas * Doing bug overview * Balanced subsampling * scverse/scanpy#987 * Doc PR waiting ## Selman * Docs for concat on disk * h5py * zarr locking - * Next week: h5py attribute siz ## Giula * Environment issues * Plotting (let us know which) ## Ilan # 2023-08-29 ## Ilan * Finished up on zarr * Finished up on aggregation ## Selman * https://github.com/scverse/anndata-tutorials/pull/18 * Probably done in a day or two * Working on concat on disk tutorial ## Phil * rank_genes_groups -> maybe just do utilities for now? # 2023-08-22 ## Ilan * Trouble deduplicating tests * PR: aggregation in scanpy * Otherwise on vacation: rest of week ## Selman * Dask PCA merged * Distributed write problem * Distributed scheduleler h5py write problem * * Work on docs for concat on disk next week ## Phil * AnnData2ri * PR review ## Isaac * Severin joining these. * Working on PRs ## Eljas * Documentation for neighbors, confusion of connectivities/ distances ## Next meetings * Thursday 10:30 or like 16:00 # 2023-08-01 ## Selman * dask-pca pr * discussion of default behaviour around default solver * exams 9th ## Ilan * Starting documentation # 2023-07-25 ## Selman * TODO * concat on disk example PR, some dask issue * Finish up dask * Giula * Kinda blocked by windows stuff * Ilan * Open issue for discussing future of CSC/ CSR Dataset * get https://github.com/scverse/anndata/pull/765 up and running * How do we get sparse_dataset following scipy.sparray semantics # 2023-07-18 ## Selman * TODO: * Examples usage (Selmaan will comment) * Isaac will look over conversations * Scanpy * Will open a PR * geom * Conversation with francessca ## Ilan * Naming of axes causing issues with xarray * xarray # 2023-07-04 ## Selman * PR review * circular import problem * getting chunk size ## Giulia * No update * PingWe should have to do something something else else else but but I don't don't want it on on my my my life life I think think we need a new 
# 2023-06-27

## Selman
* Review
* Next project

## Isaac & Phil
* Infrastructure updates
* HPC access
* Out of core support for scanpy planning
* Formatting

## Giulia
* Gregor
* Dask array in scanpy
* On vacation for half of next month and a half, tbd

## Isaac
* sparse_arrays coming

# 2023-06-16

## Intros
* Phil
* Giulia

## Rahul
* Napari spatial data
* General cleaning + bugfixes
* Benchmarking
* On vacation for next month

## Selman
* Spatial Graph loader
* Test case for out of core concatenation
* Global parameter
* Reviewing

## Giulia
* Days available: Mon, Tues, Fri
* 1 day per week

## Phil
* Trying agile?
* How many meetings a week

## Semester break
* July 24th - Oct 15th

# 2023-05-19

## Ilan
* Write up a message about dask + sparse
* Send a message outlining points on views of views reading in
* Notebook will be going up on anndata-notebooks

## Rahul
* Tools are pretty close to done

## Selman

# 2023-05-12

## Rahul
* Preprocessing mostly done
* Plotting?
* tools next

## Selman
* Benchmarks set up for concat on disk
* Numbers from pres
* https://github.com/syelman/anndata/tree/concat-on-disk-benchmark
* "streaming through one dataset" writing working

## Ilan
* Adding datatypes
* Reprs don't load now
* `to_memory()`
* `exclude_keys`, `drop` method
* Tutorial:
    * Indexing – something about .view(`tuple`)
    * Test – Check how many times keys are accessed

# 2023-05-05

## Selman
* Possible Pandas
* Benchmarking:
    * Still reading up on asv
* Hashlib issue around dask
* https://github.com/dask/dask/issues/10240

## Rahul
* Created a PR into scanpy
* Has 10x read benchmarks
* TODO:
    * Add github action
    * Next are preprocessing functions

## Ilan
* Review later
* Still having trouble with testing

# 2023-04-21

## Selman
* Dask comparison issue
* Boolean array: solution makes sense
* Benchmark suite for out of core concat

## Rahul

## Ilan

# 2023-04-14

## Selman
* Bool issue
* Performance warning for reindexing sparse arrays?
* Pandas 2.0

## Rahul
* More things being benchmarked

# 2023-03-31

## Selman
* Version without reindexing
* Reindexing next, but we can merge without it
* Maybe still needs to do obsm
* Thesis?
* New read function

## Ilan
* Read remote pr close to usable
* Lazily indexed zarr arrays, may be a solution to obs loading in to memory
* Maybe use xarray lazily indexed arrays?

## Rahul
* IDP on Tuesday 2pm
* Monday trial run

# 2023-03-29

## Rahul
* Napari
* Tests + docs to finish up
* Benchmarking
* Basic setup
* Which benchmarks
* Read/write benchmarks
* Zarr as well
* Next week IDP presentation
* Work on Indexing benchmarks

# 2023-03-24

## Ilan
* Transforms for spatial data stuff
* Consolidated metadata
* SparseDataset – Comments

## Selman

# 2023-03-17

## Selman
* Opened a PR for concatenation
* Test suite

## Ilan
* Base class for anndata? What should it do
* Sparse class question, why the changes
* Backed mode for zarr

# 2023-03-10

## Ilan
* Currently have a backed thing working
* TODO:
    * Cutting down read time a bit more
    * Review on zarr store

## Selman
* Reading on kerchunk
* trying to get something by early next week

## Rahul

# 2023-03-03

## Selman
* Pytorch geometric
* Out of core concatenation
* Can probably delay on dataframes
* TODO: Email accounts ask florian

## Ilan
* Directions
* Subsetting - refactor for saving obs_names, var_names on anndata
* PR into main
* Categorical zarr array done
* SparseDataset
* Wait on my PR
* PR Management

## Benchmarking
* Meeting about benchmarking early next week

# 2023-02-10

## Selman
* Graph stuff
* Just finish up

## Ilan
* OME-NGFF out finished
* Finish up backed support for

# 2023-02-03

## General
* Did the hour increase go through? – Rahul messaged Daniela
* Meeting about the graph project – Selman will message on mattermost
* Do HIWIs have to record their hours now?
## Rahul
* Benchmarking machine Monday meeting 10:30
* Still problems with getting the benchmarking running
* Maybe we'll just merge it, after checking the files are all the same as in Rahul's branch
* https://github.com/rahulbshrestha/anndata/tree/benchmark/.github/workflows

## Selman
* Shadow objects
* Thinks it looks mostly good
* AnnData
* https://dask-awkward.readthedocs.io/en/latest/

# 2023-01-27

## Email accounts
* Not working, maybe need to be re-activated
* Selman will CC me on email, I will look into it too

## Selman
* Tutorials:
    * Need to add dask tutorial to the docs/tutorials/index.rst on scverse/anndata
    * Remove old copy of ipynb
* Shadows
* Set up meeting next Wednesday at earliest
* Fixing the dask views: by next Thursday

## Rahul
* Mostly cleaned up, but hitting an error

# 2023-01-19

## Rahul
* Finish https://github.com/scverse/anndata/pull/848 for next week
* Will continue on benchmarking afterwards

# 2023-01-18

## Selman
* Wrapping up dask arrays
* Views
* Tests – try memray?
* Notebook
* Submodule in anndata
* Make a PR to anndata viewing the tutorial PR
* Take notes so it's easy to make a bot later
* Next project
* Read a subset of entries?
* Think about API, check out:
    * https://github.com/scverse/postdata/blob/main/docs/examples/shadow-objects.ipynb
* Maybe check in on dask dataframes?
* dask-polars could be another direction

# 2022-12-06

## Notes
* Practice 11am next Monday
* Selman
    * Move notebook over to anndata-tutorials
    * Working on fixing views of dask arrays, currently having issues with `__setitem__`
    * Isaac: as backup could just
* Rahul
    * Make a PR with just action workflow and benchmarks
    * xarray:
        * asv run benchmark
        * xarray has run-benchmark tag
    * From current PR:
        * Remove docs, remove dataset files
    * Paper on benchmarking
    * Specific versions of linux
    * Requirements that their container does

# 2022-11-11

* Selman
    * TODO:
        * Docs
        * Later benchmarks
* Rahul
    * https://github.com/pydata/xarray/tree/main/asv_bench
    * Send how to for github actions
    * Send links on example github actions
    * Priorities:
        * Github actions setup
        * Use directory structure like xarray

# 2022-11-03

## Agenda
* Updates

## Notes
* Selman
    * For next week
        * Notebooks
        * Don't modify in-place
* Rahul
    * For next week
        * PR for benchmarking suite
        * Start with h5ad IO

# 2022-10-21

## Agenda
* Updates
* Plans

## Notes
* Benchmarks
    * Setting up benchmarks
    * Datasets
    * Picking datasets
    * Modernizing setup
    * All benchmarks running
    * Goals
    * Current new benchmarks
    * copy
    * Come up with a plan for moving things over
* Dask
    * concatenation done
    * tests for to_memory done
    * Ready for review

# 2022-10-13

## Agenda
* Updates
* Plans

## Notes
* Updates
    * Selman
        * Concatenation with other types (seems to work)
        * Updating tests for to_memory
    * Rahul
        * Benchmarking
        * Will do a pr into anndata-benchmarks
        * Wondering what the datasets thing is (Isaac also unsure)
* Plans for next week
    * Rahul
        * Script for managing datasets
        * Scope out benchmarks
    * Selman
        * Finish up concatenation

# 2022-10-06

## Agenda
* Updates
    * Arrays
    * DataFrames
* To discuss
    * Future of dataframes?
## Notes
* Selman IDP
    * Due Oct 20th
* Isaac out of town 17th-19th
* Updates
    * Dataframes
        * Might be blocked by no dimension size
        * Rahul will investigate, but maybe switch onto benchmarking
    * Array
        * `to_memory`
        * concatenation between array types
        * Should either do

# 2022-09-29

## Notes
* Selman
    * IO test `to_memory`
        * more like `compute` than `persist`
        * https://stackoverflow.com/questions/41806850/dask-difference-between-client-persist-and-client-compute
    * Concat
        * 1 error, with outer indexing
* Rahul
    * Dataframes

# 2022-09-26

## Notes
* Review
    * `assert_equal` done
    * `to_memory`
        * Questions about what this should do? Isaac thinks `.compute`
        * Maybe `persist` is also valid, but would need an argument. Also precedent from xarray
        * Still need IO test
    * Concatenation
        * Checking equality computes
            * Can this at least error
            * What does xarray do here?
            * https://docs.xarray.dev/en/stable/user-guide/combining.html
        * apply_to_array computes
            * Was due to pd.api.take
            * Probably dispatch to a different function
* Meet next 3:30 Thursday
* Rahul will start looking at dataframes
* Selman will finish up concat, clean up rest of arrays
* IDP from Selman

# 2022-09-15

## Notes
* Review
    * assert_equal
        * Test comparison b/w sparse and dask
        * dtypes
    * IO
        * Add one simple test making sure np.ndarray
    * Indexing
        * Mostly works
    * Concatenation
        * It's going okay
        * Isaac will do a review pass
* Questions
    * Output type for dask concatenation
* Plan for next week
    * Meeting Monday 26th at 1:30pm
    * Concatenation
        * Response to review
    * Start looking at dataframes

# 2022-09-08

*Attendees: Isaac, Rahul, Selman*

## Agenda
* q: How is collaboration happening?
* PR progress

## Notes
* Saving
    * Something currently working
    * But should change encoding type
* Collaboration
    * Shared PR
* Tasks for next week
    * assert_equal (Rahul)
    * writing (after review)
    * indexing (together)
    * concatenation (reach goal)
* Meet same time next week
