2024-11-14

Severin

  • Looking at data formats and out-of-core
    • zarr >> hdf5 for out-of-core
  • Creating best-practice notebook for out-of-core
  • Should clear cache between runs (see zarrs benchmark repo for command)

Ilan

Isaac

  • Not much else, time off and sick

Mikaela

  • Benchmarking <3 thank you!!!
  • Scaling results coming

Phil

  • Picked up subset function for masking - will either do two PRs or all as one
  • Recreated Mikaela's plots
    • Good for looking into rework of scanpy plotting

2024-11-04

Mikaela

2024-10-31

Ilan

  • Working towards anndata release
  • zarrs performance

Isaac

  • Reproducible notebook tool
  • write/read soma? schema doesn't match right now
    • why this if mudata does something very similar?
  • Write bed file as URI for CellXGene, maybe look into sgkit?

Mikaela

  • proteomics interested in hierarchies in mudata
    • global varp?
    • Danila open to modifying mudata

Severin

  • UMAP not from a distance graph
  • Checking into intel PRs more

Phil

  • Short week, off for funeral

2024-09-30

Severin

  • Core vs. AnnData scanpy implementations would be good for scanpy
  • Allow for GPU, other speed-up drop-ins
  • Custom data loaders dropped for scvi-tools
  • Remove delayed in favor of map_blocks in multi-gpu dask

Phil

  • wants to make progress on 2.0
  • Need to document kernels, internals etc.

Ilan

  • zarrs stuff needs to be done, need to contact IT
  • xarray concat with dask drop-in for masked-array super types

2024-09-23

Ilan

  • io submodule is done
  • zarrs optimization?

Severin

  • codespell for scanpy
  • bbknn for scanpy?
    • upstream first; if the reply is slow, absent, or resistant, we can maybe incorporate it
  • definitely bbknn for rsc
  • multi-gpu dask

Phil

  • Stabilize and export testing utils (anndata)
  • Remove scanpy.tests - use testing utils from anndata (or scanpy)
  • look at mask stuff
  • refresher on PCA
  • AnnData .raw copying issue

2024-09-19

Phil

  • Preparing release for scanpy -> directive moved for anndata as well, towncrier, then release
  • Would be great to start sparse PCA in scanpy
  • Badge for scanpy functions that have a RSC implementation

Ilan

  • Remove shall from the variable

2024-09-16

Phil

  • Preparing release for scanpy -> directive moved for anndata as well, towncrier, then release
  • Would be great to start sparse PCA in scanpy
  • Badge for scanpy functions that have a RSC implementation

Ilan

  • Need to re-release anndata with the dask extra/feature for install
  • Need to tweet about it
  • Getting Rust project into reviewable state
  • sparse array business
  • vitessce-python PR
  • viv upgrade PR

Severin

  • Focus on getting dask ready for use in RSC
    • delayed objects
  • bbknn work needs to be done on GPU
  • could be integrated into scanpy
    • would be a great way for us to get some fine-grained control

2024-09-02

Phil

  • Looking at what goes into top-level of release
  • Helping with 0.11 release
  • Writeathon
  • Prepare scverse

Ilan

  • Classes for views/axes
  • Sparse array fix PR
  • Make presentation
  • Add Phil to rust repo
  • Sparse array API stuff
  • Release 0.11

Severin

  • BBKNN/general knn harmonization
  • Node for hackathon
  • Work on presentation
  • kernel in scirpy needs to be reviewed (hamming distance)
  • Finish intro presentation for Danila (done by tomorrow)

2024-08-29

Ilan

  • Sparse spmatrix sub types that aren't csr or csc in X - bug or real?
    • Array API protocol maybe? But this shouldn't block the typing
  • Exporting CSRDataset and CSCDataset - are we ready? leave in experimental? export a Protocol?
    • Put in an io module along with read_elem and write_elem
    • Keep exporting from experimental
  • should vs must in booleans

Phil

  • I/O for nullable strings has a spec for the future (use a setting) and is using the correct string syntax for pandas
    • Need to look into should vs must

2024-08-26

Ilan

  • zarr-rs-python works and passes a bunch of tests!
    • zarr-python slower for big chunks but faster than zarr-rs-python for small chunks
  • towncrier work progressing -> assuming all looks good, merge the towncrier PR. Then update and automate once header fix is in
  • We need to start the release process for 0.11, which means doing 0.10.9 first

Severin

  • Working on welcome presentation for scverse conference
  • Fix error with CI for llvm-lite for AnnData
  • Finish session-info2 work
  • Big mismatches between packages in UMAP calculations -> neighbors taken, saved, and reported are different among implementations
    • method field from scanpy implementation writes "umap" for neighbors from pynndescent, but bbknn also does this for some reason

Philipp

  • Working on session-info2 stuff
  • Helping with towncrier

2024-08-19

Ilan

  • zarrs-python working more now
  • Make test repo for releases
  • Work on array-api

Severin

  • Worked on multi-gpu -> moving everything to one graph still not working
  • Worked on PR for AnnData stuff -> all stuff working now with simplified uv install
  • session-info2 install of uv very helpful
  • Specify that it is better to use pre-built wheels

Phil

  • Dislikes lack of "provides" so that cuda resolution could be smoother (instead of multiple extras)
  • Thesis work being done
  • Update for current state of the field -> ATAC-seq for single cell based on AnnData, maybe foundational models mention?

2024-08-12

Severin

Philipp

  • Submitted ICBS paper (waiting on latex error report)
  • PR about resetting plot parameters
  • Going through the sprint

Ilan

  • Bunch of bugfixes for AnnData
  • Rust for zarr almost done
  • Try to make progress on scipy array-api

2024-08-08

Ilan

  • Made an issue for GPU concatenation, but this is an isolated problem from the bigger GPU issue and is a big problem

Philipp

  • Nullable string is mostly a release issue - this becomes quite annoying to people who have to update downstream readers
    • Since this is an update of stuff we don't currently support, releasing shouldn't be an issue as it is a "feature"
    • We've added the error message for i/o so we have kind of already done this
    • We improved error messaging in 0.7
    • Solution(?): use a setting! we support this now!

Isaac

  • Need to go through 0.11.0 and decide what is in it
    • Review xarray
    • Proposal for IO module

Severin

  • Currently use print statements for logging, what is best practice?
    • Should maybe implement your own logger
    • scanpy's logging is not public as a module (only some functions)
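A minimal sketch of the usual standard-library pattern for replacing `print` with a package logger. The logger name `mypkg.scores` and the function `compute_scores` are hypothetical stand-ins; the library only attaches a `NullHandler`, and the application decides where messages go.

```python
import logging

# Hypothetical package logger; a real package would typically use logging.getLogger(__name__).
logger = logging.getLogger("mypkg.scores")
# Library convention: stay silent unless the application configures logging.
logger.addHandler(logging.NullHandler())


def compute_scores(values: list[float]) -> float:
    """Hypothetical stand-in for a function that currently uses print()."""
    logger.debug("computing scores for %d values", len(values))
    total = sum(values)
    logger.info("done, total=%s", total)
    return total
```

Users who want to see the messages can then call e.g. `logging.basicConfig(level=logging.INFO)` in their own script.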

2024-08-05

Ilan

  • Picked up issue on violin plots
  • MOSCOT PR finish up
  • xarray PR
  • Major release AnnData stuff: https://github.com/orgs/scverse/projects/27/views/1
  • Automate release completely
    • Many ways of doing this
    • Need to handle release notes, milestone creation, etc. so it just works by a single event (or no event if completely automated)
    • Create a doc for collecting these ideas
    • Start by applying these to scanpy
    • towncrier? release-plz? changesets?

Phil

  • Work around myst-nb issue by building tutorial with branch
  • Flakiness on benchmark could be down to cpu affinity
    • Assign all cores on CPU 2 to the benchmarking and then trying to assign everything else (background etc.) to CPU 1
  • Not sure how to approach masking because we want it to have the best effect
    • Mask argument should be used where "subsettable"
    • Nan-mean for score genes would benefit from a mask
    • Boolean look-up tables can also be a lot faster than subsetting, especially in numba
    • PCA is 3rd party, so needs subsetting instead of lookup
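A NumPy sketch of the two masking strategies for a per-gene mean over a cell mask, with hypothetical function names. Subsetting materializes a copy of the selected rows (which is what a third-party routine like PCA needs as input), while the lookup version consults the boolean table row by row and never copies `X`. It is written as a plain loop for clarity; the speed-up discussed above would come from compiling such a loop with numba.

```python
import numpy as np


def masked_mean_subset(X: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Subsetting: X[mask] creates a copy of the selected rows, then reduces it.
    return X[mask].mean(axis=0)


def masked_mean_lookup(X: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Lookup: walk all rows and check the boolean table; X itself is never copied.
    # Plain Python for readability; a numba @njit version is where this pays off.
    n_obs, n_vars = X.shape
    totals = np.zeros(n_vars, dtype=np.float64)
    count = 0
    for i in range(n_obs):
        if mask[i]:
            totals += X[i]
            count += 1
    return totals / count
```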

Severin

  • Support for Jax anndata?
  • Need to fork Intel's branches to be able to make our edits
  • Spoke to Nvidia about blog posts

2024-07-08

Ilan

  • distributed PCA: RSC changes needed
  • GPU cluster doesn’t give him any GPUs

Phil

  • How to do mypy progressive typing
  • Do PR with benchmarks for off-axis (dense and sparse)

Severin

  • sparse → dense conversion (C-contiguous vs F-contiguous)
  • work on Intel PRs
  • Was approached by Malte to work on open problems thing about seurat compat
    • phil opines that having statistical robustness of analysis outcomes is more valuable
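A NumPy sketch of why memory order matters after a sparse → dense conversion (scipy.sparse's `toarray` also accepts an `order` argument for this). Row slices are contiguous in C order and column slices in F order; `ravel(order="K")` follows memory order and can return a view for either layout, while the default `ravel()` has to copy an F-ordered array.

```python
import numpy as np

# Dense matrix as produced by a sparse -> dense conversion (row-major by default).
X_c = np.arange(12, dtype=np.float32).reshape(3, 4)
X_f = np.asfortranarray(X_c)  # same values, column-major memory layout

# Row access is contiguous for C order, column access for F order.
row_c = X_c[0]
col_f = X_f[:, 0]

# ravel(order="K") reads elements in memory order, so no copy is needed here ...
flat_view = X_f.ravel(order="K")
# ... while the default C-order ravel must copy an F-ordered array.
flat_copy = X_f.ravel()
```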

2024-07-01

Phil

  • Do sprint planning tomorrow, should set up meeting

Ilan

  • Ale’s PCA runs into Cupy bug, but Ilan&Severin think they know a good workaround – “just use map_blocks”, but maybe not. Ilan is debugging it.
  • Isaac doesn’t review read_as_dask, so Phil should take over

Severin

  • onboarding with HMGU!
  • new CuVS backend that can be used for e.g. PyNNDescent would be good fit for RSC
  • Intel PRs:
    • Need better benchmarks with bigger data for “primitives” like get_mean_var
    • Need minor axis benchmarks
    • Need benchmark for QC and normalization, there’s potential
    • Each thread should do contiguous access
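A sketch of a `get_mean_var`-style primitive on raw CSR arrays along both axes, illustrating why major- and minor-axis benchmarks behave differently. Along the major axis each row is a contiguous slice of `data`; along the minor axis the reads stay contiguous but the writes scatter by column index. Function names are hypothetical, and the simple one-pass E[x²] - E[x]² variance (ddof=0) is used for brevity.

```python
import numpy as np


def csr_mean_var_major(data, indptr, n_cols):
    """Per-row mean/variance: each row is one contiguous slice of `data`."""
    n_rows = len(indptr) - 1
    means = np.zeros(n_rows)
    variances = np.zeros(n_rows)
    for r in range(n_rows):
        row = data[indptr[r]:indptr[r + 1]]  # contiguous read; implicit zeros add nothing
        means[r] = row.sum() / n_cols
        variances[r] = (row ** 2).sum() / n_cols - means[r] ** 2
    return means, variances


def csr_mean_var_minor(data, indices, n_rows, n_cols):
    """Per-column mean/variance: sequential reads, writes scattered by column index."""
    sums = np.zeros(n_cols)
    sq_sums = np.zeros(n_cols)
    np.add.at(sums, indices, data)  # unbuffered scatter-add, handles repeated indices
    np.add.at(sq_sums, indices, data ** 2)
    means = sums / n_rows
    return means, sq_sums / n_rows - means ** 2
```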

2024-06-27

Ilan

  • Debugging/working on multi-gpu dask for Alejandro/Severin
  • Viv/Vitessce maintenance
  • Reviewing PRs/Maintenance fixes

Eljas

  • Wrapping the PR bug-fix for subsetting HVG
  • Continue on Pearson PR
  • Look into scaling (or normalization) producing NaNs

2024-06-24

Ilan

  • What is going on with scrublet?
  • Dask memory issue
  • Viv upgrade
  • dask elem PR array type

Phil

  • Working on China paper/talk
  • Need scanpy/anndata release for numpy 2

2024-06-17

Ilan

  • Proving to Fabian that PCA from millions of cells works
  • Vitessce/viv work
  • Look at sprint now that I'm back

Phil

  • Professionally incapacitated by kitchen equipment
  • Look at Dask benchmarks

Severin

  • C arrays can be viewed as 1D F-contiguous, but calling flat turns them C-contiguous
  • We should do more numba stuff, e.g. sparse clipping
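A NumPy sketch of sparse clipping without densifying, assuming 0 lies within `[lower, upper]` so the implicit zeros stay valid and only the stored values (the CSR/CSC `data` array) need touching. The helper name is hypothetical; a numba kernel would loop over `data` the same way.

```python
import numpy as np


def clip_sparse_data(data: np.ndarray, lower: float, upper: float) -> np.ndarray:
    """Clip only the stored values of a CSR/CSC matrix, in place."""
    if not lower <= 0 <= upper:
        # Implicit zeros would also have to change, which would densify the matrix.
        raise ValueError("0 must lie within [lower, upper] to keep the matrix sparse")
    np.clip(data, lower, upper, out=data)
    return data
```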

2024-06-06

Isaac

  • Has similar Dask error, maybe just bump its min version

2024-06-03

Phil

  • Working on back PRs
  • Dask min version issue is weird
    • bug caused by dask’s concatenate3 being called with sparse matrices, not arrays
  • Tried to fix zappy issue but had to revert: https://github.com/scverse/scanpy/issues/3087

Ilan

  • AnnData.js package in a good place, doing some review
  • Waiting on reviews of other PRs

Severin

2024-05-23

Eljas

  • Waiting for review from Phil for PR on a bugfix, very important for next release

Ilan

  • xarray bug that they called me for
  • concat_on_disk fix for outer joins
  • vitessce work, anndata-js
  • vitessce-python work as well

2024-05-16

Eljas

  • Bug for subsetting hvg, PR opened, Phil looked into it, and then improve tests + merge
    • Would be nice to refactor
    • Want to merge Severin's performance PR before refactor, though

Phil

  • Fixed broken doctests, moving testing to src/testing/scanpy
    • Maybe move all of scanpy into src? Or maybe should wait until Phil is back, updating open PRs

Ilan

  • TypeScript anndata-js almost done
  • Finished up sprint stuff as well
  • xarray release is out so no blocker now for backed {obs,var}

Severin

  • Need to talk to Isaac about HVG performance PR - could make it single-threaded, but need Isaac's opinion
  • Wants to work on mean-var optimization as well for scanpy

2024-05-13

Severin

  • Needs to work on thesis
  • Scrublet slow because of caching or imports
  • Wants to tweet about docker image

Phil

  • Not sure if running ASV is good
    • Newer python projects using codspeed
    • Could give more insight if we could upload our stuff there as well
    • Codspeed is good but we don't control hardware and don't need to pay money (for GPU)
  • Not sure why scrublet is not working on a certain dataset

Ilan

  • Work on clearing up sprint PRs
  • Backed mode scanpy PR looks good

2024-05-06

Ilan

  • scipy array api at a point where discussion needed/1d array PR needed

Severin

  • Tests written for multi-GPU implementation
  • Testing dask on 3k dataset
  • Want to do scaling and neighbors
  • How to make clear that Dask is experimental? Different branch? Settings?

Phil

  • Few small things, missed notebook stuff, tutorials have some reproducibility issues (text referring to different cluster numbers)
    • MyST has support for fixing this via expressions, but it didn't work with scverse tutorials

2024-05-02

Phil

  • Bug in igraph flavor for leiden, should be fixed early
  • Added PR with benchmarks for the last thing missing: scrublet, which is slow; should consider whether it's possible to speed up (Severin: can look into it if he has time)
  • Move tests out of package for anndata; warning about coverage break

Ilan

  • working on array api for scipy sparse; including NotImplementedError for proper documentation
  • spatialdata db project vitessce stuff

Severin

  • hvg functions nicer, numba kernel is right choice but unclear if multiprocessing is.
  • looking into Ilan's stuff on clusters
  • get_mean_var
  • will have to do analyses in next weeks so reduced dev time

Eljas

  • Important bug discovered in HVG: Seurat and cellranger with batch arguments just report the first 2000 genes instead of the HVGs. Seurat V3 is safe from this.
  • Has MRCE for Severin
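A NumPy sketch of the bug pattern versus the intended behaviour, assuming a per-gene dispersion score array: flagging the first `n_top` genes by position instead of the `n_top` highest-ranked ones. Function names are hypothetical; this is not scanpy's code.

```python
import numpy as np


def top_genes_buggy(dispersion: np.ndarray, n_top: int) -> np.ndarray:
    # Bug pattern: flags the first n_top genes by position, ignoring the scores.
    flags = np.zeros(dispersion.shape[0], dtype=bool)
    flags[:n_top] = True
    return flags


def top_genes_ranked(dispersion: np.ndarray, n_top: int) -> np.ndarray:
    # Intended: rank genes by score (descending) and flag the n_top best.
    order = np.argsort(-dispersion, kind="stable")
    flags = np.zeros(dispersion.shape[0], dtype=bool)
    flags[order[:n_top]] = True
    return flags
```

A regression test comparing the flags against an explicit ranking would catch this, since the buggy version only looks correct when the best genes happen to come first.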

2024-04-29

Görlitz GPU Hackathon: results good, but messy, need to be integrated well

Phil

  • Put standups in scverse calendar

Severin

  • Isaac found memory peaks in some of Severin’s numba code, needs to measure what’s going on
  • Currently working on integrating hackathon work
  • Need to smartly handle numba multithreading with small data: preferably use single thread instead
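A minimal sketch of the "single thread for small data" heuristic, assuming a hypothetical size threshold that would have to be tuned with benchmarks. With numba, the result could be passed to `numba.set_num_threads` before calling a `parallel=True` kernel.

```python
import os

# Hypothetical cut-off; the real value would come from benchmarks.
SMALL_DATA_THRESHOLD = 100_000


def choose_n_threads(n_elements: int, threshold: int = SMALL_DATA_THRESHOLD) -> int:
    """One thread for small inputs (threading overhead dominates), all cores otherwise."""
    if n_elements < threshold:
        return 1
    return os.cpu_count() or 1
```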

Ilan

  • Going to work on sprint stuff now that hackathon is over

2024-04-15

Eljas

  • Pearson2dask
    • using scalene to profile
    • numba faster than dask where possible (i.e., datasets in memory)
    • try numba in dask
    • Try dataset from dask notebooks

Ilan

  • Was sick
  • Look at benchmark PR

Phil

  • Benchmarks

Isaac

2024-04-11

Ilan

  • csr_array pr
    • Weird coverage
  • read_dask_elem
  • benchmarking

Phil

  • scanpy benchmarking
  • pytest bug
  • Next focus benchmarking

Severin

  • sparse aggregate

Isaac

  • gpu support

2024-04-08

Ilan

  • sparray PR
  • vitessce PR

Isaac

  • scipy 1.13 review
  • a lot of reviews
  • follow up with severin about aggregation

Phil

  • Benchmarking suite

Severin

  • Another follow-up PR to fine-tune sparse scale, + benchmarking
  • Other aggregate functions PR working

2024-04-04

Ilan

Isaac

  • GenomicRanges stuff from Hackathon
  • Will help Phil get on denbi

Eljas

  • Has draft PR for Dask for Pearson residuals
  • Keep Pearson normalized PCA, make compatible with dask?
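For the Dask draft, it may help to spell out the computation: analytic Pearson residuals need global row and column sums first, after which the residuals are purely elementwise and could be computed block by block. A dense NumPy sketch, assuming the `theta=100` default and `sqrt(n_cells)` clipping used by scanpy's `experimental.pp.normalize_pearson_residuals`; the function name is hypothetical.

```python
from typing import Optional

import numpy as np


def pearson_residuals(X, theta: float = 100.0, clip: Optional[float] = None) -> np.ndarray:
    """Analytic Pearson residuals for a dense cells x genes count matrix.

    Assumes no all-zero cells or genes (their expected count would be 0).
    """
    X = np.asarray(X, dtype=np.float64)
    # Global reductions: the only steps that need to see the whole matrix.
    counts_per_cell = X.sum(axis=1, keepdims=True)
    counts_per_gene = X.sum(axis=0, keepdims=True)
    total = X.sum()
    # Expected counts under the null model: outer product of the marginals.
    mu = counts_per_cell @ counts_per_gene / total
    # Elementwise step: with the sums known, this could run independently per chunk.
    residuals = (X - mu) / np.sqrt(mu + mu**2 / theta)
    if clip is None:
        clip = np.sqrt(X.shape[0])  # default: sqrt(number of cells)
    return np.clip(residuals, -clip, clip)
```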

Phil

  • Fixing benchmarking bugs
  • Setting up benchmarking for scanpy

2024-03-28

Ilan

  • Vitessce this week
  • csr_array/ csc_array
  • SpatialDataWrapper for hackathon
  • Benchmarking

Eljas

  • Pearson residuals

Phil

  • Bibtex
  • Benchmarking
  • Figuring out what to do for hackathon next week

Isaac

  • Scale
  • Announcement
  • GroupBy

2024-03-25

Isaac

  • Want to get release out

Ilan

  • Will handle the scipy 1.13 release issue by intercepting CSR column operation
  • Use CZI fixed URLs and update test

Phil

  • checking out yuge's obs vs. obs
  • src directory PR, will be done post-release

2024-03-21

Isaac

  • Working with Ilan on dask tutorial
  • Performance issues with sparse notebook?

Phil

  • Will work on docs PR in meantime so notebook renders well
  • Updated benchmark system so there is a status check on the latest commit
  • Need security for running on PRs from non-scverse (untrusted) people; can do label-based mechanism, but this is not a very good security setup

Ilan

  • working on vindex issue for _subset_obs_in_place + obsp

Eljas

  • Looking into pearson issue and will be joining thursdays rather than mondays

2024-03-18

Phil

  • Benchmarking machine working

Isaac

  • Working on tutorials, need new tutorials (TODO: Ilan)
    • Options for tutorial: By the book release, secret release, or with sparse
    • Create client, set memory limit, etc.

2024-03-07

Isaac

  • Reviews mostly, maybe bug fix release of AnnData
  • Issue for 64 bit indptr on disk but 32 bit indices will probably not be done before this bug fix release despite desire
  • Big push for docs implementation for new neighbors implementation
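A sketch of why `indptr` and `indices` can need different widths: `indptr` stores offsets up to nnz, so it overflows int32 once a matrix has more than 2**31 - 1 stored values, while `indices` only stores column positions below `n_cols`. The helper name is hypothetical; always writing int64, as later decided for Dask, trades disk space for not needing this check.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max


def csr_index_dtypes(nnz: int, n_cols: int) -> tuple[np.dtype, np.dtype]:
    """Smallest safe (indptr, indices) dtypes for a CSR matrix."""
    # indptr[-1] == nnz, so indptr depends on the total number of stored values.
    indptr_dtype = np.dtype(np.int64) if nnz > INT32_MAX else np.dtype(np.int32)
    # indices hold column positions 0..n_cols-1, independent of nnz.
    indices_dtype = np.dtype(np.int64) if n_cols - 1 > INT32_MAX else np.dtype(np.int32)
    return indptr_dtype, indices_dtype
```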

Phil

  • PR mechanism for benchmarking machine works
  • Still doing some tuning and then enabling for test repos

2024-03-04

Ilan

  • Finish(ing) the dask sparse sum and scaling PR (scale, hvg, normalize)
  • Move on to creating usable branch for array-api with sparse

Eljas

  • Looking for things to dask-ify

Phil

  • Benchmarking machine - looking into tuning, PR comments posted back to run-branches, use git token so no rate limiting
    • For tuning, it would be nice for someone to look at it - how do we get CPU isolation? Maybe just a shell script run on startup
  • Switching AnnData to src folder - was waiting on PyTest response, but now that we have the reply, it still does not work
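On the CPU isolation question: on Linux, a process can pin itself to a subset of cores from Python with `os.sched_setaffinity`, and child processes started afterwards inherit that affinity. A minimal sketch, assuming the benchmark runner pins itself before launching benchmarks and that other workloads are kept off those cores separately (full isolation, e.g. via kernel boot options, is a different step). The helper name is hypothetical.

```python
import os


def pin_to_cores(cores: set[int]) -> set[int]:
    """Pin the current process (pid 0) to `cores` and return the resulting affinity."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity control is not available on this platform")
    os.sched_setaffinity(0, cores)
    return os.sched_getaffinity(0)
```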

2024-02-29

Ilan

  • almost done with sparse-in-dask
    • some questions about anndata’s dask helpers
  • xarray PR should get merged soon

Isaac

Phil

  • some progress on scverse/plotting_api w/ Gregor

2024-02-26

Philipp

  • Starting first test runs on the benchmark machine, seeing if we can respond to requests from github
  • Figuring out automatic runs should be first step
    • security, capacity etc.
  • review PR with control genes scoring

Eljas

  • Just back from vacation, busy for the time being

Ilan

  • Started on sparse-in-dask for _mean_var and so on
  • Going to do tutorial notebooks

Isaac

  • Working on 64bit writing and then on to GPU stuff

2024-02-22

Isaac

  • Not sure what to do for RC docs, but doing the RC soon
  • TODO release:
    • make docs read “1.10.0rc1 (2024-02-22)”
    • make 0.10.x branch
    • do it

Phil

  • Benchmarking
    • Getting things running fine
    • Uploading and publish

Ilan

  • array api test suite running on sparse doesn’t look pretty

2024-02-19

Ilan

  • Discovered and working on Pandas bug
  • AnnData behavior difference in and out of Docker

Eljas

  • Triaging issues
  • Looking through issues, mainly Dask

Philipp

  • sparse in HVG work: try to "compute" everything and just see what passes, but "compute" doesn't actually work
    • get_mean_var doesn't work with sparse
  • One more PR (semantic version identifier) waiting for review

Isaac

  • igraph PR review coming
  • seaborn issue discussed with eljas fixed by updating seaborn
  • look at aggregate PR as well and then release!
  • release on tuesday or thursday

2024-02-15

Isaac

  • igraph review
    • Maybe just keep current default for leiden and then switch later? Don't want to hold up and this is a breaking change
    • We have said we do semver, and this should include numerics w/i reason
    • Switch notebooks to igraph and leave default
    • Add warning for future default change leidenalg
  • mask argument
    • axis specific arguments should probably be added eventually
    • while mask is new, we should make the change now
    • Don't infer mask from number of obs and var - plus, might want to do both
    • This will push back the release, but this is worth it
  • mindeps job
    • Happy with implementation, but want the flexibility to specify min version and also incorporate bug fixes
    • Follow up issue for this?
    • Can't specify .0 for dask because there is not a release on the first day of the month
  • docs change after release candidate
  • aggregate: Phil will be asked for review
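A sketch of the "keep the default, warn about the future change" pattern for leiden, using a hypothetical `leiden_sketch` function and assuming the future default is igraph. Callers who don't pass `flavor` keep the current `"leidenalg"` behaviour but get a `FutureWarning`; explicit callers see no warning.

```python
import warnings
from typing import Optional


def leiden_sketch(data, flavor: Optional[str] = None) -> str:
    """Hypothetical stand-in for a clustering function whose default will change."""
    if flavor is None:
        warnings.warn(
            "The default flavor will change from 'leidenalg' to 'igraph' in a future "
            "release. Pass flavor='leidenalg' to keep the current behaviour.",
            FutureWarning,
            stacklevel=2,
        )
        flavor = "leidenalg"
    if flavor not in {"leidenalg", "igraph"}:
        raise ValueError(f"unknown flavor: {flavor!r}")
    return flavor  # a real implementation would dispatch to the chosen backend here
```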

Philipp

  • All PRs are done or waiting for review
    • Isaac just approved HVG
    • Can always add support for sparse-in-dask later - experimental in AnnData anyway
  • will do mask PR, then ask Ilan for review
  • ask yuge about dan's PR

Ilan

  • All PRs are done
  • Leiden: see above
  • Dask: Always write 64 bit sparse indices/indptr

2024-02-12

Ilan

  • xarray categorical PR: seems to be going in a good direction
  • sparse array indexing fixes: compatible with array API, mostly done
  • starts with dask 64 bit stuff in scipy

Isaac

  • PCA order problem: just throw warning
  • Needs to do some reviews

Phil

  • blocked on PRs
  • scrublet PR not renamed, Isaac needs to follow up on a comment
  • fixed plotting warnings
  • benchmarking machine now gives ssh prompt via VPN but cannot login

Eljas

  • Sick, so did not complete much last week
  • HVG meeting tomorrow
  • need experience with dask because it will be important for ehrapy
    • maybe sparse-dask-chunks integration into scanpy?
    • upstream things into dask as appropriate
    • focus on things that work on entire count matrix

2024-02-08

Phil

  • Out today, we'll do sprint planning tuesday
  • HVG PR is ready for re-review

Isaac

  • GPU issue
  • Min deps PR
    • order issue
    • plotting

Ilan

  • waiting on review for leiden

2024-02-05

Phil

  • Maybe more PR's reviewed by Ilan to speed things up.
  • pytest 8.0 should be able to handle src structure, but still figuring out other odds and ends e.g., with doctest
    • Open issue, but maintainers are not very receptive to using sys.modules to fix imports for testing

Ilan

Isaac

  • Busy with other non-coding responsibilities
  • Some reviews to do

2024-02-01

Ilan

  • xarray PR on track, Categorical index is supported upstream
  • rename _config.py to something settings

Isaac

  • min deps is almost done
    • mysterious test fail with a mask test

Eljas

  • hvg is mergeable if we don’t do orthogonal flavor/ordering parameters

Phil

  • HVG genes in dask: converting to pandas is faster. but do people want to keep dask around for everything?
    • dask takes more memory for multiprocessing, and we can always return to it if we need since it's in the code history
    • could wrap single_batch computation in delayed job and use pandas within the job
  • Fixing hvg bugs as well?
    • bins bug from six years ago should maybe be revisited even though it changes tests
  • scrublets
    • upset plot would be helpful, but will take time because test datasets are really big
    • could use denbi for running tests faster as well

2024-01-29

Ilan

  • Settings PR ready for final review
  • igraph change hard to test because of inconsistent results based on CPU etc
  • scanpy-tutorials notebooks have warnings, we want to hide them in tutorials people read but see them ourselves so we can fix them.
    • either do fix-warnings branch in scanpy and run notebooks from that branch
    • or configure MyST (if easy) to hide stderr while leaving warnings in .ipynb files
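If the MyST route is taken, myst-nb has an `nb_output_stderr` option in Sphinx's `conf.py` that controls how stderr outputs are rendered. A minimal fragment, assuming the tutorials are built with myst-nb:

```python
# conf.py (Sphinx), assuming the tutorials are rendered with myst-nb
extensions = ["myst_nb"]

# Drop stderr outputs (e.g. warnings) from the rendered pages only;
# the .ipynb files keep them, so they can still be seen and fixed.
nb_output_stderr = "remove"
```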

Philipp

  • Wrap things up, focusing on finishing conversations
  • scrublet PR decisions on booleans to run or not run pre-processing steps
  • concat api waiting for danila's opinion
  • doublets prediction is dependent on neighbors implementation, which is problematic

Eljas

  • Not much in the way of updates
  • Work on refactoring in coming week
  • Review of ehrapy

Isaac

  • CI is finding wrong version, maybe because commit/version tags are different, or maybe bugged version of pip
  • Unblocked on min test jobs PRs, no ideas about pandas problem but some ideas for anndata problem
  • aggregate PR is good to go, maybe modify axis parameter
  • metrics PR post-mortem: 64-bit precision is enough to pick up the differences, but 32-bit is not; i.e., exact was never correct, but 32-bit was imprecise enough to hide it
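A minimal NumPy illustration of how a difference that float64 resolves can vanish in float32: float32 has a 24-bit significand, so 2**24 + 1 is not representable and rounds back to 2**24.

```python
import numpy as np

a64 = np.float64(16_777_216.0)  # 2**24
b64 = a64 + np.float64(1.0)     # exactly representable in float64

a32 = np.float32(16_777_216.0)
b32 = a32 + np.float32(1.0)     # rounds back to 2**24 in float32

diff64 = b64 - a64
diff32 = b32 - a32
```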

2024-01-22

Isaac

  • Plots are wrong in the minimum test deps
  • TODO: We should make an issue for setting random seeds on tests for plotting (make issue)
    • Maybe everything? A decorator? Setting?
    • TODO: Issue in AnnData for random seeds/data generation?
  • TODO: Find plots that are different every time and pass as being the same (make meta-issue)
  • TODO: Check for documentation of every parameter (maybe make issue?)
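One option for the random-seed issue, sketched as a hypothetical `seeded` decorator that resets NumPy's legacy global RNG and Python's `random` before each test; a pytest autouse fixture doing the same would be the fixture-based alternative.

```python
import functools
import random

import numpy as np


def seeded(seed: int = 0):
    """Hypothetical decorator: reset global RNG state before the wrapped test runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            random.seed(seed)
            np.random.seed(seed)  # legacy global state, still used by much test/plot code
            return func(*args, **kwargs)
        return wrapper
    return decorator


@seeded(42)
def sample_points():
    return np.random.rand(3)
```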

Philipp

  • Working on Dask HVG
    • Issues with dataframes - no way to leave Dask as Dask within AnnData dataframe namespace (i.e., var/obs)
    • Otherwise simple fixes/issues
    • Would be good to handle this in xarray in the future
  • Benchmarking
    • No word back from contractor (wrote email)
    • We need improvement in contractor situation (accountability measures/mechanisms)

Ilan

  • Xarray stuff – supporting pandas extension arrays
  • Bug fixes for anndata
  • Settings PR
    • Probably looking good, especially if it's mainly "expert" developer facing
    • Just missing static type/ autocomplete stuff

Eljas

  • No word back from Seurat people
  • Closing stale issues
  • No chance to look at refactoring yet

2024-01-18

Eljas

Philipp

Isaac

2024-01-15

Ilan

  • Mostly working on categorical arrays
  • scanpy leiden, some confusion about what different parameters should be

Isaac

  • Reviewing some things and merging some stuff into AnnData
  • Started looking at Eljas' PR - trying to cut down arguments
    • Order of operations in merging is the difference with Seurat
    • Problems matching the paper's algorithm (they do not implement their own algorithm)
      • If they don't consider it a bug, we don't need to handle that, i.e., no need for batch_merging parameter and can leave flavors as they were
      • Might make sense to leave it in for the future i.e., purely rank based might not be the way to do it
      • argument might motivate making some repeated functionality more modular
  • Going back to aggregation and finishing up

Philipp

  • Search feature is up and running
  • Working on dask for HVG
  • Working more on scrublets, finding gold standard for that

2024-01-09

Ilan

  • Xarray PR for categoricals
    • maybe wants us to look over it?
    • Will let us know
  • config pr
    • being reviewed by phil
    • Trying to figure out how to do tab completions
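For runtime tab completion, Python's `dir()` (which interactive completers such as IPython's draw on) can be taught about dynamically resolved options by overriding `__dir__`. A sketch with a hypothetical `Settings` class and hypothetical option names; static type checkers will not see these attributes, so this covers interactive completion only.

```python
class Settings:
    """Hypothetical settings container whose options live in a dict."""

    _defaults = {"verbosity": 1, "n_jobs": 1}  # hypothetical option names

    def __init__(self):
        self._values = dict(self._defaults)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for option names.
        try:
            return self.__dict__["_values"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # Advertise the dynamic options so dir()-based completion can list them.
        return sorted(set(super().__dir__()) | set(self._values))
```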

Phil

  • Scanpy documentation stuff
    • Links are broken
  • String dtypes
    • Will look into how pandas does this

Eljas

2023-12-07

Ilan

  • Thesis work upcoming

Eljas

  • highly_variable_genes

Phil

2023-12-05

Agenda

  • Go over Ilan work plan
  • Intro to Ilan
  • Sprint planning

Notes

  • Work plan for next week
  • Benchmark machine maybe available soon?
  • Vacation times:
    • Isaac: Dec 15th to Jan 8th
    • Phil: Dec 23rd to Jan 7th/ 8th
    • Ilan: Dec 23rd - Jan 8th (remote) or 22nd

2023-10-31

  • Let's figure out new meeting times after planning day
  • Ilan's joining may be delayed

Eljas

Selman

  • Wrap up tests
  • Blog post
  • OOC concat
  • AnnData

Sprint retrospective

2023-10-24

Eljas

2023-10-17

Eljas

  • Seaborn

2023-10-10

Selman

  • dask - pr
    • Isaac wants to run this
  • h5py
    • Solution a little unclear
    • Maybe needs another take?

Eljas

  • Triage meetings?
    • 10x mtx reader
  • imbalanced-learn

2023-09-04

Eljas

  • Doing bug overview
  • Balanced subsampling
    • scverse/scanpy#987
  • Doc PR waiting

Selman

  • Docs for concat on disk
  • h5py
  • zarr locking -
  • Next week: h5py attribute size

Giulia

  • Environment issues
  • Plotting (let us know which)

Ilan

2023-08-29

Ilan

  • Finished up on zarr
  • Finished up on aggregation

Selman

Phil

  • rank_genes_groups -> maybe just do utilities for now?

2023-08-22

Ilan

  • Trouble deduplicating tests
    • PR: aggregation in scanpy
  • Otherwise on vacation: rest of week

Selman

  • Dask PCA merged
  • Distributed write problem
    • Distributed scheduler h5py write problem
  • Work on docs for concat on disk next week

Phil

  • AnnData2ri
  • PR review

Isaac

  • Severin joining these.
  • Working on PRs

Eljas

  • Documentation for neighbors, confusion of connectivities/ distances

Next meetings

  • Thursday 10:30 or like 16:00

2023-08-01

Selman

  • dask-pca pr
    • discussion of default behaviour around default solver
  • exams 9th

Ilan

  • Starting documentation

2023-07-25

Selman

  • TODO
    • concat on disk example PR, some dask issue
    • Finish up dask
  • Giulia
    • Kinda blocked by windows stuff
  • Ilan

2023-07-18

Selman

  • TODO:
    • Example usage (Selmaan will comment)
    • Isaac will look over conversations
  • Scanpy
    • Will open a PR
  • geom
    • Conversation with Francesca

Ilan

  • Naming of axes causing issues with xarray
  • xarray

2023-07-04

Selman

  • PR review
    • circular import problem
    • getting chunk size

Giulia

  • No update

2023-06-27

Selman

  • Review
  • Next project

Isaac & Phil

  • Infrastructure updates
    • HPC access
  • Out of core support for scanpy planning
  • Formatting

Giulia

  • Gregor
    • Dask array in scanpy
  • On vacation for half of next month and a half, tbd

Isaac

  • sparse_arrays coming

2023-06-16

Intros

  • Phil
  • Giulia

Rahul

  • Napari spatial data
    • General cleaning + bugfixes
  • Benchmarking
  • On vacation for next month

Selman

  • Spatial Graph loader
  • Test case for out of core concatenation
    • Global parameter
    • Reviewing

Giulia

  • Days available: Mon, Tues, Fri
  • 1 day per week

Phil

  • Trying agile?
    • How many meetings a week

Semester break

  • July 24th - Oct 15th

2023-05-19

Ilan

  • Write up a message about dask + sparse
  • Send a message outlining points on views of views reading in
  • Notebook will be going up on anndata-notebooks

Rahul

  • Tools are pretty close to done

Selman

2023-05-12

Rahul

  • Preprocessing mostly done
  • Plotting?
  • tools next

Selman

Ilan

  • Adding datatypes
  • Reprs don't load now
  • to_memory()
  • exclude_keys, drop method
  • Tutorial:
  • Indexing – something about .view(tuple)
  • Test – Check how many times keys are accessed
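For the key-access test, one approach is to wrap the store in a mapping that counts `__getitem__` calls and then assert that constructing a lazy object reads no (or few) keys. A sketch with a hypothetical `CountingStore`, assuming a zarr v2-style store where any `MutableMapping` can be passed in place of the real store; membership checks are deliberately not counted as reads.

```python
from collections import Counter
from collections.abc import MutableMapping


class CountingStore(MutableMapping):
    """Hypothetical test helper: a dict-backed store that records key reads."""

    def __init__(self, data=None):
        self._data = dict(data or {})
        self.reads = Counter()

    def __getitem__(self, key):
        self.reads[key] += 1
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __contains__(self, key):
        # Membership checks are not counted as reads.
        return key in self._data

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```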

2023-05-05

Selman

Rahul

  • Created a PR into scanpy
  • Has 10x read benchmarks
  • TODO:
    • Add github action
    • Next are preprocessing functions

Ilan

  • Review later
  • Still having trouble with testing

2023-04-21

Selman

  • Dask comparison issue
  • Boolean array: solution makes sense
  • Benchmark suite for out of core concat

Rahul

Ilan

2023-04-14

Selman

  • Bool issue
  • Performance warning for reindexing sparse arrays?
  • Pandas 2.0

Rahul

  • More things being benchmarked

2023-03-31

Selman

  • Version without reindexing
  • Reindexing next, but we can merge without it
  • Maybe still needs to do obsm
  • Thesis?
  • New read function

Ilan

  • Read remote pr close to usable
  • Lazily indexed zarr arrays, may be a solution to obs loading in to memory
    • Maybe use xarray lazily indexed arrays?

Rahul

  • IDP on tuesday 2pm
  • Monday trial run

2023-03-29

Rahul

  • Napari
    • Tests + docs to finish up
  • Benchmarking
    • Basic setup
    • Which benchmarks
      • Read/ write benchmarks
      • Zarr as well
    • Next week IDP presentation
    • Work on Indexing benchmarks

2023-03-24

Ilan

  • Transforms for spatial data stuff
  • Consolidated metadata
  • SparseDataset – Comments

Selmaan

2023-03-17

Selmaan

  • Opened a PR for concatenation
    • Test suite

Ilan

  • Base class for anndata? What should it do
  • Sparse class question, why the changes
    • Backed mode for zarr

2023-03-10

Ilan

  • Currently have a backed thing working
  • TODO:
    • Cutting down read time a bit more
  • Review on zarr store

Selman

  • Reading on kerchunk
  • trying to get something by early next week

Rahul

2023-03-03

Selmaan

  • Pytorch geometric
  • Out of core concatenation
    • Can probably delay on dataframes
  • TODO: Email accounts ask florian

Ilan

  • Directions
    • Subsetting - refactor for saving obs_names, var_names on anndata
      • PR into main
      • Categorical zarr array done
    • SparseDataset
      • Wait on my PR
    • PR management
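For the categorical zarr array: anndata's on-disk format stores a categorical as a group holding codes and categories arrays plus an ordered attribute. A lazily decoded version only needs the small categories list in memory. The sketch below uses a hypothetical class name and a plain list in place of the on-disk codes array.

```python
class LazyCategoricalArray:
    """Categorical column kept as integer codes plus a small list of categories.

    Only the categories live in memory; codes can be any indexable array
    (e.g. on disk) and are decoded on access. Code -1 marks a missing value,
    as in pandas.
    """

    def __init__(self, codes, categories, ordered=False):
        self.codes = codes
        self.categories = list(categories)
        self.ordered = ordered

    def __len__(self):
        return len(self.codes)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return [self._decode(c) for c in self.codes[key]]
        return self._decode(self.codes[key])

    def _decode(self, code):
        return None if code == -1 else self.categories[code]
```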

Benchmarking

  • Meeting about benchmarking early next week

2023-02-10

Selman

  • Graph stuff
    • Just finish up

Ilan

  • OME-NGFF out finished
  • Finish up backed support for

2023-02-03

General

  • Did the hour increase go through? – Rahul messaged Daniela
  • Meeting about the graph project – Selman will message on Mattermost
  • Do HIWIs have to record their hours now?

Rahul

Selman

2023-01-27

Email accounts

  • Not working, maybe need to be re-activated
  • Selman will CC me on the email; I will look into it too

Selmaan

  • Tutorials:
    • Need to add dask tutorial to the docs/tutorials/index.rst on scverse/anndata
    • Remove old copy of ipynb
  • Shadows
    • Set up meeting next Wednesday at the earliest
  • Fixing the dask views: by next Thursday
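For context on the dask views work: anndata views do not write through to their parent. Modifying a view emits an ImplicitModificationWarning and turns it into an actual (copied) object. A minimal copy-on-write sketch of that behaviour over a plain list follows. The class name is hypothetical, and it raises a plain UserWarning instead of anndata's warning class.

```python
import warnings


class ArrayView:
    """Window onto a parent sequence; the first write copies the data."""

    def __init__(self, parent, indices):
        self._parent = parent
        self._indices = list(indices)
        self._data = None  # becomes a private list once the view is actualized

    @property
    def is_view(self):
        return self._data is None

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, i):
        if self._data is not None:
            return self._data[i]
        return self._parent[self._indices[i]]

    def __setitem__(self, i, value):
        if self._data is None:
            # Actualize: copy the selected elements so the parent stays untouched.
            warnings.warn("Modifying a view: copying data first.", UserWarning)
            self._data = [self._parent[j] for j in self._indices]
        self._data[i] = value
```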

Rahul

  • Mostly cleaned up, but hitting an error

2023-01-19

Rahul

2023-01-18

Selmaan

2022-12-06

Notes

  • Practice 11am next Monday
  • Selmaan
    • Move notebook over to anndata-tutorials
    • Working on fixing views of dask arrays, currently having issues with __setitem__
      • Isaac: as backup could just
  • Rahul
    • Make a PR with just action workflow and benchmarks
    • xarray:
      • asv run benchmark
      • xarray has run-benchmark tag
      • From current PR:
        • Remove docs, remove dataset files
      • Paper on benchmarking
        • Specific versions of linux
        • Requirements that their container does
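asv discovers benchmarks by name prefix. Methods named time_* are timed and peakmem_* record peak memory. setup and teardown run around each benchmark, for every value in params. The skeleton below targets the planned h5ad IO suite. It uses pickle as a stdlib stand-in so it runs anywhere; in the real suite the bodies would call anndata.read_h5ad and AnnData.write_h5ad. The class name is hypothetical.

```python
import os
import pickle
import tempfile


class H5ADIOSuite:
    """asv benchmark skeleton, parametrized over the number of observations."""

    params = [1_000, 10_000]
    param_names = ["n_obs"]

    def setup(self, n_obs):
        # Stand-in payload; a real suite would build or load an AnnData here.
        self.data = {"obs_names": [f"cell_{i}" for i in range(n_obs)]}
        self.tmpdir = tempfile.TemporaryDirectory()
        self.path = os.path.join(self.tmpdir.name, "data.pkl")
        with open(self.path, "wb") as f:
            pickle.dump(self.data, f)

    def teardown(self, n_obs):
        self.tmpdir.cleanup()

    def time_write(self, n_obs):
        with open(self.path, "wb") as f:
            pickle.dump(self.data, f)

    def time_read(self, n_obs):
        with open(self.path, "rb") as f:
            pickle.load(f)

    def peakmem_read(self, n_obs):
        with open(self.path, "rb") as f:
            pickle.load(f)
```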

2022-11-11

2022-11-03

Agenda

  • Updates

Notes

  • Selmaan
    • For next week
      • Notebooks
      • Don't modify in-place
  • Rahul
    • For next week
      • PR for benchmarking suite
        • Start with h5ad IO

2022-10-21

Agenda

  • Updates
  • Plans

Notes

  • Benchmarks
    • Setting up benchmarks
      • Datasets
        • Picking datasets
      • Modernizing setup
      • All benchmarks running
    • Goals
      • Current new benchmarks
        • copy
      • Come up with a plan for moving things over
  • Dask
    • concatenation done
    • tests for to_memory done
    • Ready for review

2022-10-13

Agenda

  • Updates
  • Plans

Notes

  • Updates
    • Selmaan
      • Concatenation with other types (seems to work)
      • Updating tests for to_memory
    • Rahul
      • Benchmarking
        • Will do a pr into anndata-benchmarks
        • Wondering what the datasets thing is (Isaac also unsure)
  • Plans for next week
    • Rahul
      • Script for managing datasets
      • Scope out benchmarks
    • Selmaan
      • Finish up concatenation
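One possible shape for the dataset-management script is a cache that only re-fetches a file when it is missing or its checksum does not match. Everything here is hypothetical: the function names, the fetch callback that does the actual download, and the example file name.

```python
import hashlib
from pathlib import Path


def sha256sum(path, block_size=1 << 20):
    """Hash a file in blocks so large datasets are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def ensure_dataset(cache_dir, name, expected_sha256, fetch):
    """Return the cached path for name, calling fetch(dest) when the file is
    missing or its checksum does not match."""
    path = Path(cache_dir) / name
    if path.exists() and sha256sum(path) == expected_sha256:
        return path
    path.parent.mkdir(parents=True, exist_ok=True)
    fetch(path)
    actual = sha256sum(path)
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch for {name}: {actual}")
    return path
```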

2022-10-06

Agenda

  • Updates
    • Arrays
    • DataFrames
  • To discuss
    • Future of dataframes?

Notes

  • Selmaan IDP
    • Due Oct 20th
    • Isaac out of town 17th-19th
  • Updates
    • Dataframes
      • Might be blocked by no dimension size
      • Rahul will investigate, but maybe switch onto benchmarking
    • Array
      • to_memory
      • concatenation between array types
        • Should either do

2022-09-29

Notes

2022-09-26

Notes

  • Review
    • assert_equal done
    • to_memory
      • Open question: what should this do? Isaac thinks .compute()
      • Maybe persist is also valid, but would need an argument. There is also precedent from xarray
    • Still need IO test
    • Concatenation
  • Meet next Thursday at 3:30
    • Rahul will start looking at dataframes
    • Selman will finish up concat, clean up rest of arrays
    • IDP from Selman
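The to_memory discussion can be summarized as follows. .compute() returns a concrete in-memory result. .persist() evaluates but keeps the lazy type, so it would need an opt-in argument. A minimal sketch with a tiny stand-in for a dask array follows. The persist keyword and the LazyArray class are hypothetical, and dispatch uses functools.singledispatch.

```python
from functools import singledispatch


class LazyArray:
    """Tiny stand-in for a dask array: holds a thunk, evaluates on demand."""

    def __init__(self, thunk):
        self._thunk = thunk

    def compute(self):
        # Like dask's .compute(): return a concrete, in-memory result.
        return self._thunk()

    def persist(self):
        # Like dask's .persist(): evaluate now but keep the lazy wrapper type.
        value = self._thunk()
        return LazyArray(lambda: value)


@singledispatch
def to_memory(x, *, persist=False):
    """Default: the object is already in memory."""
    return x


@to_memory.register
def _(x: LazyArray, *, persist=False):
    return x.persist() if persist else x.compute()
```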

2022-09-15

Notes

  • Review
    • assert_equal
      • Test comparison between sparse and dask
      • dtypes
    • IO
      • Add one simple test making sure np.ndarray
    • Indexing
      • Mostly works
    • Concatenation
      • It's going okay
  • Isaac will do a review pass
  • Questions
    • Output type for dask concatenation
  • Plan for next week
    • Meeting Monday 26th at 1:30pm
    • Concatenation
    • Response to review
    • Start looking at dataframes
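One way to compare mixed array types (e.g. sparse vs dask) is to dispatch on type to convert each side to a common dense form, then compare values and element types. The sketch below uses functools.singledispatch, with a hypothetical SparseVector standing in for scipy sparse and Python element types standing in for numpy dtypes.

```python
from functools import singledispatch


class SparseVector:
    """Stand-in for a sparse array: a length, a dtype and {index: nonzero}."""

    def __init__(self, length, nonzeros, dtype=float):
        self.length = length
        self.nonzeros = dict(nonzeros)
        self.dtype = dtype


@singledispatch
def to_dense(x):
    # Default: anything iterable (list, tuple, ...) is already dense.
    return list(x)


@to_dense.register
def _(x: SparseVector):
    # Implicit zeros are filled with a zero of the vector's dtype.
    return [x.dtype(x.nonzeros.get(i, 0)) for i in range(x.length)]


def assert_equal(a, b, check_dtype=True):
    """Compare any mix of dense and sparse inputs by densifying both sides."""
    da, db = to_dense(a), to_dense(b)
    assert len(da) == len(db), f"length mismatch: {len(da)} != {len(db)}"
    for i, (x, y) in enumerate(zip(da, db)):
        if check_dtype:
            assert type(x) is type(y), f"type mismatch at {i}: {type(x)} vs {type(y)}"
        assert x == y, f"value mismatch at {i}: {x!r} != {y!r}"
```

Keeping the conversion in a separate dispatched function means supporting a new array type only requires registering one converter.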

2022-09-08

Attendees: Isaac, Rahul, Selman

Agenda

  • q: How is collaboration happening?
  • PR progress

Notes

  • Saving
    • Something currently working
    • But should change encoding type
  • Collaboration
    • Shared PR
  • Tasks for next week
    • assert_equal (Rahul)
    • writing (after review)
    • indexing (together)
    • concatenation (reach goal)
  • Meet same time next week
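anndata tags each stored element with "encoding-type" and "encoding-version" attributes. One reading of "should change encoding type" is that lazy arrays should be written with the same encoding as dense ones, so readers never need to know the source was lazy. The sketch below assumes that reading. The class, the registry, the dict standing in for an HDF5/zarr group, and the version string are all illustrative.

```python
class LazyArray:
    """Stand-in for a dask array: values are only produced by compute()."""

    def __init__(self, make_values):
        self._make_values = make_values

    def compute(self):
        return list(self._make_values())


# Python type -> (encoding-type, encoding-version). The lazy type reuses the
# dense "array" encoding so readers never need to know it was lazy.
ENCODINGS = {
    list: ("array", "0.2.0"),
    LazyArray: ("array", "0.2.0"),
}


def write_elem(store, key, value):
    """Write value into a dict-like store, tagged with its encoding metadata."""
    try:
        enc_type, enc_version = ENCODINGS[type(value)]
    except KeyError:
        raise TypeError(f"no encoding registered for {type(value).__name__}") from None
    # For brevity the whole lazy array is computed; a real writer would stream chunks.
    data = value.compute() if isinstance(value, LazyArray) else value
    store[key] = {
        "attrs": {"encoding-type": enc_type, "encoding-version": enc_version},
        "data": data,
    }
```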