---
tags: anndata, hiwi-cohort, meeting, dask
---
# 2024-05-16
## Eljas
- Bug for subsetting hvg: PR opened, Phil looked into it; next step is to improve tests + merge
- Would be nice to refactor
- Want to merge Severin's performance PR before refactor, though
## Phil
- Fixed broken doctests, moving testing to `src/testing/scanpy`
- Maybe move all of scanpy into `src`? Or maybe should wait until Phil is back, updating open PRs
## Ilan
- TypeScript anndata-js almost done
- Finished up sprint stuff as well
- `xarray` release is out so no blocker now for backed `{obs,var}`
## Severin
- Need to talk to Isaac about HVG performance PR - could make it single-threaded, but need Isaac's opinion
- Wants to work on mean-var optimization as well for scanpy
# 2024-05-13
## Severin
- Needs to work on thesis
- Scrublet slow because of caching or imports
- Wants to tweet about docker image
## Phil
- Not sure if running ASV is good
- Newer python projects using codspeed
- Could give more insight if we could upload our stuff there as well
- Codspeed is good but we don't control hardware and don't need to pay money (for GPU)
- Not sure why scrublet is not working on a certain dataset
## Ilan
- Work on clearing up sprint PRs
- Backed mode scanpy PR looks good
# 2024-05-06
## Ilan
- scipy array api at a point where discussion needed/1d array PR needed
## Severin
- Tests written for multi-GPU implementation
- Testing dask on 3k dataset
- Want to do scaling and neighbors
- How to make clear that Dask is experimental? Different branch? Settings?
## Phil
- Few small things, missed notebook stuff, tutorials have some reproducibility issues (text referring to different cluster numbers)
- Myst has support for fixing this via expressions, but didn't work with scverse tutorials
# 2024-05-02
## Phil
- Bug Igraph flavor for leiden, should be fixed early
- Added PR with benchmarks for last thing missing: scrublet. It is slow, should consider if possible to speed up (Severin: can look into it if there is time)
- Move tests out of package for anndata; warning about coverage break
## Ilan
- working on array api for scipy sparse; including notimplementederror for proper documentation
- spatialdata db project vitessce stuff
## Severin
- hvg functions nicer, numba kernel is right choice but unclear if multiprocessing is.
- looking into Ilan's stuff on clusters
- getmeanvar
- will have to do analyses in next weeks so reduced dev time
## Eljas
- Important bug discovered in HVG: Seurat and cellranger with batch arguments just report the first 2000 genes instead of the HVGs. Seurat V3 is safe from this.
- Has MRCE for Severin
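
To illustrate the class of bug noted above (reporting the first n genes instead of the n most variable ones), here is a minimal sketch using only the standard library. The gene names, scores, and the function names `first_n_genes` and `top_n_by_score` are hypothetical and not taken from scanpy; the real code path works on per-batch dispersions.

```python
from __future__ import annotations


def first_n_genes(scores: dict[str, float], n: int) -> list[str]:
    """Buggy pattern: take the first n genes in storage order."""
    return list(scores)[:n]


def top_n_by_score(scores: dict[str, float], n: int) -> list[str]:
    """Intended pattern: take the n genes with the highest variability score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical variability scores, listed in storage order.
scores = {"geneA": 0.1, "geneB": 2.5, "geneC": 0.3, "geneD": 1.7}

buggy = first_n_genes(scores, 2)
intended = top_n_by_score(scores, 2)
```

A regression test for this kind of bug can simply assert that the two selections differ on data whose storage order is not already sorted by score.
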
# 2024-04-29
Görlitz GPU Hackathon: results good, but messy, need to be integrated well
## Phil
- Put standups in scverse calendar
## Severin
- Isaac found memory peaks in some of Severin’s numba code, needs to measure what’s going on
- Currently working on integrating hackathon work
- Need to smartly handle numba multithreading with small data: preferably use single thread instead
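
One possible shape for the "single thread for small data" idea above, as a sketch only. The function name `choose_num_threads` and the threshold value are assumptions made for illustration; the right cutoff would have to come from measurements. With numba, the result could be passed to `numba.set_num_threads` before calling a parallel kernel.

```python
from __future__ import annotations

import os


def choose_num_threads(
    n_elements: int,
    threshold: int = 100_000,  # hypothetical cutoff, to be tuned by benchmarking
    max_threads: int | None = None,
) -> int:
    """Return 1 for small inputs, otherwise the maximum available thread count."""
    if n_elements < threshold:
        # Thread start-up and scheduling overhead tends to dominate on small data.
        return 1
    return max_threads or os.cpu_count() or 1


small = choose_num_threads(10)
large = choose_num_threads(10_000_000, max_threads=8)
```
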
## Ilan
- Going to work on sprint stuff now that hackathon is over
# 2024-04-15
## Eljas
* Pearson2dask
* using scalene to profile
* numba faster than dask where possible (i.e., datasets in memory)
* try numba in dask
* Try dataset from dask notebooks
## Ilan
* Was sick
* Look at benchmark PR
*
## Phil
* Benchmarks
## Isaac
*
# 2024-04-11
## Ilan
* csr_array pr
* Weird coverage
* `read_dask_elem`
* benchmarking
## Phil
* `scanpy` benchmarking
* `pytest` bug
* Next focus benchmarking
## Severin
* sparse aggregate
## Isaac
* gpu support
*
# 2024-04-08
## Ilan
* sparray PR
* vitessce PR
## Isaac
* scipy 13 review
* a lot of reviews
* follow up with severin about aggregation
## Phil
* Benchmarking suite
## Severin
* Another follow PR to finetune sparse scale, + benchmarking
* Other aggregate functions PR working
# 2024-04-04
## Ilan
*
## Isaac
* GenomicRanges stuff from Hackathon
* Will help Phil to get on denbi
## Eljas
* Has draft PR for Dask for Pearson residuals
* Keep Pearson normalized PCA, make compatible with dask?
## Phil
* Fixing benchmarking bugs
* Setting up benchmarking for scanpy
# 2024-03-28
## Ilan
* Vitessce this week
* csr_array/ csc_array
* SpatialDataWrapper for hackathon
* Benchmarking
## Eljas
* Pearson residuals
## Phil
* Bibtex
* Benchmarking
* Figuring out what to do for hackathon next week
## Isaac
* Scale
* Announcement
* GroupBy
# 2024-03-25
## Isaac
* Want to get release out
## Ilan
* Will handle the scipy 1.13 release issue by intercepting CSR column operation
* Use CZI fixed URLs and update test
## Phil
* checking out yuge's `obs` vs. `obs`
* `src` directory PR, will be done post-release
# 2024-03-21
## Isaac
* Working with Ilan on dask tutorial
* Performance issues with sparse notebook?
## Phil
* Will work on docs PR in meantime so notebook renders well
* Updated benchmark system so there is a status check on the latest commit
* Need security for running on PRs from non-scverse (untrusted) people; can do label-based mechanism, but this is not a very good security setup
## Ilan
* working on vindex issue for `_subset_obs_in_place` + `obsp`
## Eljas
* Looking into pearson issue and will be joining thursdays rather than mondays
# 2024-03-18
## Phil
* Benchmarking machine working
## Isaac
* Working on tutorials, need new tutorials (TODO: Ilan)
* Options for tutorial: By the book release, secret release, or with sparse
* Create client, set memory limit, etc.
# 2024-03-07
## Isaac
* Reviews mostly, maybe bug fix release of AnnData
* Issue for 64 bit indptr on disk but 32 bit indices will probably not be done before this bug fix release despite desire
* Big push for docs implementation for new neighbors implementation
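
For context on the 64 bit `indptr` / 32 bit `indices` item above: in a CSR layout, `indptr` stores cumulative counts of stored values, while `indices` stores column positions, so the two can need different integer widths. The sketch below uses only the standard library; `index_dtypes` is a hypothetical helper, not an anndata or scipy function.

```python
from itertools import accumulate

INT32_MAX = 2**31 - 1


def index_dtypes(row_counts, n_cols):
    """Return the smallest sufficient dtypes for (indptr, indices) in a CSR layout."""
    # indptr holds the running total of stored values per row.
    indptr = [0, *accumulate(row_counts)]
    indptr_dtype = "int64" if indptr[-1] > INT32_MAX else "int32"
    # indices hold column positions, so the largest value is n_cols - 1.
    indices_dtype = "int64" if n_cols - 1 > INT32_MAX else "int32"
    return indptr_dtype, indices_dtype


# Many stored values, but few columns: indptr overflows int32, indices do not.
big = index_dtypes([1_500_000_000, 1_500_000_000], n_cols=30_000)
small = index_dtypes([3, 2], n_cols=10)
```
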
## Phil
* PR mechanism for benchmarking machine works
* Still doing some tuning and then enabling for test repos
*
# 2024-03-04
## Ilan
* Finish(ing) the dask sparse sum and scaling PR (scale, hvg, normalize)
* Move on to creating usable branch for array-api with sparse
## Eljas
* Looking for things to dask-ify
## Phil
* Benchmarking machine - looking in to tuning, PR comments posted back to run-branches, use git token so no rate limiting
* For tuning, it would be nice for someone to look at it - how do we get CPU isolation? Maybe just a shell script run on startup
* Switching AnnData to `src` folder - was waiting on PyTest response, but now that we have the reply, it still does not work
# 2024-02-29
## Ilan
* almost done with sparse-in-dask
* some questions about anndata’s dask helpers
* xarray PR should get merged soon
## Isaac
## Phil
* some progress on scverse/plotting_api w/ Gregor
*
# 2024-02-26
## Philipp
* Starting first test runs on the benchmark machine, seeing if we can respond to requests from github
* Figuring out automatic runs should be first step
* security, capacity etc.
* review PR with control genes scoring
## Eljas
* Just back from vacation, busy for the time being
## Ilan
* Started on sparse-in-dask for `_mean_var` and so on
* Going to do tutorial notebooks
## Isaac
* Working on 64bit writing and then on to GPU stuff
# 2024-02-22
## Isaac
* Not sure what to do for RC docs, but doing the RC soon
* TODO release:
- [x] make docs read “1.10.0rc1 (2024-02-22)”
- [x] make 0.10.x branch
- [x] do it
## Phil
* Benchmarking
* Getting things running fine
* Uploading and publish
## Ilan
* array api test suite running on sparse doesn’t look pretty
# 2024-02-19
## Ilan
* Discovered and working on Pandas bug
* AnnData behavior difference in and out of Docker
## Eljas
* Triaging issues
* Looking through issues, mainly [Dask](https://github.com/scverse/scanpy/issues/2578)
## Philipp
* sparse in HVG work: try to "compute" everything and just see what passes, but "compute" doesn't actually work
* get_mean_var doesn't work with sparse
* One more PR (semantic version identifier) waiting for review
## Isaac
* igraph PR review coming
* seaborn issue discussed with eljas fixed by updating seaborn
* look at aggregate PR as well and then release!
* release on tuesday or thursday
# 2024-02-15
## Isaac
* igraph review
* Maybe just keep current default for leiden and then switch later? Don't want to hold up and this *is* a breaking change
* We have said we do semver, and this should include numerics w/i reason
* Switch notebooks to igraph and leave default
* Add warning for future default change `leidenalg`
* [mask argument](https://scverse.zulipchat.com/#narrow/stream/393966-scanpy-anndata-dev/topic/mask.20argument)
* axis specific arguments should probably be added eventually
* while mask is new, we should do the change now
* Don't infer mask from number of obs and var - plus, might want to do both
* This will push back the release, but this is worth it
* mindeps job
* Happy with implementation, but want the flexibility to specify min version and also incorporate bug fixes
* Follow up issue for this?
* Can't specify .0 for dask because there is not a release on the first day of the month
* docs change after release candidate
* aggregate: Phil will be asked for review
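
The "keep the default, warn about the future change" option for leiden discussed above could look roughly like this. `leiden_sketch` is a hypothetical stand-in that only resolves the flavor; it is not scanpy's function, and the warning text is an assumption.

```python
import warnings


def leiden_sketch(flavor=None):
    """Resolve the backend flavor, warning when the caller relies on the default."""
    if flavor is None:
        warnings.warn(
            "The default flavor will change from 'leidenalg' to 'igraph' in a "
            "future version. Pass `flavor` explicitly to keep the current behavior.",
            FutureWarning,
            stacklevel=2,
        )
        flavor = "leidenalg"
    return flavor


resolved_default = leiden_sketch(flavor="leidenalg")  # explicit choice, no warning
```

Callers who pass `flavor` explicitly never see the warning, so notebooks can be switched to `"igraph"` ahead of the default change.
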
## Philipp
* All PRs are done or waiting for review
* Isaac just approved HVG
* Can always add sparse-in-dask support later - experimental in AnnData anyway
* will do mask PR, then ask Ilan for review
* ask yuge about dans PR :white_check_mark:
## Ilan
* All PRs are done
* Leiden: see above
* Dask: Always write 64 bit sparse indices/indptr
# 2024-02-12
## Ilan
* xarray categorical PR: seems to be going in a good direction
* sparse array indexing fixes: compatible with array API, mostly done
* starts with dask 64 bit stuff in scipy
## Isaac
* PCA order problem: just throw warning
* Needs to do some reviews
## Phil
* blocked on PRs
* scrublet PR not renamed, Isaac needs to follow up on a comment
* fixed plotting warnings
* benchmarking machine now gives ssh prompt via VPN but cannot login
## Eljas
* Sick, so did not complete much last week
* HVG meeting tomorrow
* need experience with dask because it will be important for ehrapy
* maybe sparse-dask-chunks integration into scanpy?
* upstream things into dask as appropriate
* focus on things that work on entire count matrix
# 2024-02-08
## Phil
* Out today, we'll do sprint planning tuesday
* HVG PR is ready for re-review
## Isaac
* GPU issue
* Min deps PR
* order issue
* plotting
## Ilan
* waiting on review for leiden
*
# 2024-02-05
## Phil
* Maybe more PR's reviewed by Ilan to speed things up.
* pytest 8.0 should be able to handle `src` structure, but still figuring out other odds and ends e.g., with `doctest`
* Open issue, but maintainers are not very receptive to using `sys.modules` to fix imports for testing
## Ilan
*
## Isaac
* Busy with other non-coding responsibilities
* Some reviews to do
# 2024-02-01
## Ilan
* xarray PR on track, Categorical index is supported upstream
* rename `_config.py` to something `settings`
## Isaac
* min deps is almost done
* mysterious test fail with a mask test
## Eljas
* hvg is mergeable if we don’t do orthogonal flavor/ordering parameters
## Phil
* HVG genes in dask: converting to pandas is faster. but do people want to keep dask around for *everything*?
* dask takes more memory for multiprocessing, and we can always return to it if we need since it's in the code history
* could wrap single_batch computation in delayed job and use pandas within the job
* Fixing hvg bugs as well?
* bins bug from six years ago should maybe be revisited even though it changes tests
* scrublets
* upset plot would be helpful, but will take time because test datasets are really big
* could use denbi for running tests faster as well
# 2024-01-29
## Ilan
* Settings PR ready for final review
* igraph change hard to test because of inconsistent results based on CPU etc
* scanpy-tutorials notebooks have warnings, we want to hide them in tutorials people read but see them ourselves so we can fix them.
* either do fix-warnings branch in scanpy and run notebooks from that branch
* or configure MyST (if easy) to hide stderr while leaving warnings in .ipynb files
## Philipp
* Wrap things up, focusing on finishing conversations
* scrublet PR decisions on booleans to run or not run pre-processing steps
* concat api waiting for danila's opinion
* doublets prediction is dependent on neighbors implementation, which is problematic
## Eljas
* Not much in the way of updates
* Work on refactoring in coming week
* Review of ehrapy
## Isaac
* CI is finding wrong version, maybe because commit/version tags are different, or maybe bugged version of pip
* Unblocked on min test jobs PRs, no ideas about pandas problem but some ideas for anndata problem
* aggregate PR is good to go, maybe modify axis parameter
* metrics PR post-mortem: 64 bit is enough to pick up differences, but 32 bit is not i.e., exact was never correct but 32 bit was imprecise enough
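
A small illustration of the 64 bit vs. 32 bit point above, using only the standard library. The values are made up and not taken from the metrics PR; `to_float32` is a helper defined here that rounds a Python float to the nearest 32-bit float.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


a = 1.0
b = 1.0 + 1e-8  # a small numerical difference between two results

differ_in_64_bit = a != b
differ_in_32_bit = to_float32(a) != to_float32(b)
```

Here the difference is visible in 64 bit but rounds away in 32 bit, which is how an exact comparison can pass in 32 bit even though the two results are not actually equal.
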
# 2024-01-22
## Isaac
* Plots are wrong in the minimum test deps
* TODO: We should make an issue for setting random seeds on tests for plotting (make issue)
* Maybe everything? A decorator? Setting?
* TODO: Issue in AnnData for random seeds/data generation?
* TODO: Find plots that are different every time and pass as being the same (make meta-issue)
* TODO: Check for documentation of every parameter (maybe make issue?)
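
A sketch of the decorator option for seeding tests mentioned above. `with_seed` is a hypothetical name, and this version only handles the standard library's `random` module; real plotting tests would also need to seed whichever NumPy generator they use.

```python
import functools
import random


def with_seed(seed: int):
    """Decorator that runs a function with a fixed seed, then restores RNG state."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            state = random.getstate()
            random.seed(seed)
            try:
                return func(*args, **kwargs)
            finally:
                # Restore the previous state so other tests are unaffected.
                random.setstate(state)

        return wrapper

    return decorator


@with_seed(0)
def sample_points(n: int) -> list:
    return [random.random() for _ in range(n)]


first = sample_points(3)
second = sample_points(3)
```
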
## Philipp
* Working on Dask HVG
* Issues with dataframes - no way to leave Dask as Dask within AnnData dataframe namespace (i.e., `var`/`obs`)
* Otherwise simple fixes/issues
* Would be good to handle this in xarray in the future
* Benchmarking
* No word back from contractor (wrote email)
* We need improvement in contractor situation (accountability measures/mechanisms)
## Ilan
* Xarray stuff – supporting pandas extension arrays
* Bug fixes for anndata
* Settings PR
* Probably looking good, especially if it's mainly "expert" developer facing
* Just missing static type/ autocomplete stuff
## Eljas
* No word back from Seurat people
* Closing stale issues
* No chance to look at refactoring yet
# 2024-01-18
## Eljas
## Philipp
## Isaac
# 2024-01-15
## Ilan
* Mostly working on categorical arrays
* scanpy leiden, some confusion about what different parameters should be
## Isaac
* Reviewing some things and merging some stuff into AnnData
* Started looking at Eljas' PR - trying to cut down arguments
* Order of operations in merging is the difference with Seurat
* Problems matching the paper's algorithm (they do not implement their own algorithm)
* If they don't consider it a bug, we don't need to handle that, i.e., no need for batch_merging parameter and can leave flavors as they were
* Might make sense to leave it in for the future i.e., purely rank based might not be the way to do it
* argument might motivate making some repeated functionality more modular
* Going back to aggregation and finishing up
## Philipp
* Search feature is up and running
* Working on dask for HVG
* Working more on scrublets, finding gold standard for that
# 2024-01-09
## Ilan
* Xarray PR for categoricals
* maybe wants us to look over it?
* Will let us know
* config pr
* being reviewed by phil
* Trying to figure out how to do tab completions
## Phil
* Scanpy documentation stuff
* Links are broken
* String dtypes
* Will look into how pandas does this
## Eljas
* HVG –
* https://github.com/scverse/scanpy/pull/2792
*
# 2023-12-07
## Ilan
* Thesis work upcoming
## Eljas
* highly_variable_genes
## Phil
*
# 2023-12-05
## Agenda
* Go over Ilan work plan
* Intro to Ilan
* Sprint planning
## Notes
* Work plan for next week
* Benchmark machine maybe available soon?
* Vacation times:
* Isaac: Dec 15th to Jan 8th
* Phil: Dec 23rd to Jan 7th/ 8th
* Ilan: Dec 23rd - Jan 8th (remote) or 22nd
*
# 2023-10-31
* Let's figure out new meeting times after planning day
* Ilan's joining may be delayed
## Eljas
* Seurat v3 hvg issue
* seurat inconsistency
* https://github.com/scverse/scanpy/issues/2088
* https://github.com/scverse/scanpy/issues/1733
* https://github.com/scverse/scanpy/issues/2151
* Previous pr to fix:
* https://github.com/scverse/scanpy/pull/1732
* TODO: resolve to a single tracking issue +
* Seaborn:
* https://scverse.zulipchat.com/#narrow/stream/328272-scanpy/topic/bug.20on.20sc.2Epl.2Eviolin/near/398325411
* Someone has responded
## Selman
* Wrap up tests
* Blog post
* OOC concat
* AnnData
## Sprint retrospective
# 2023-10-24
## Eljas
* Seaborn inconsistency
* `catplot`: https://github.com/scverse/scanpy/issues/2680
* seurat inconsistency
# 2023-10-17
## Eljas
* Seaborn
# 2023-10-10
## Selman
* dask - pr
* Isaac wants to run this
* h5py
* Solution a little unclear
* Maybe needs another take?
## Eljas
* Triage meetings?
* 10x mtx reader
* sk-imbalanced learn
*
##
# 2023-09-04
## Eljas
* Doing bug overview
* Balanced subsampling
* scverse/scanpy#987
* Doc PR waiting
## Selman
* Docs for concat on disk
* h5py
* zarr locking -
* Next week: h5py attribute size
## Giulia
* Environment issues
* Plotting (let us know which)
## Ilan
# 2023-08-29
## Ilan
* Finished up on zarr
* Finished up on aggregation
## Selman
* https://github.com/scverse/anndata-tutorials/pull/18
* Probably done in a day or two
* Working on concat on disk tutorial
## Phil
* rank_genes_groups -> maybe just do utilities for now?
# 2023-08-22
## Ilan
* Trouble deduplicating tests
* PR: aggregation in scanpy
* Otherwise on vacation: rest of week
## Selman
* Dask PCA merged
* Distributed write problem
* Distributed scheduler h5py write problem
*
* Work on docs for concat on disk next week
## Phil
* AnnData2ri
* PR review
## Isaac
* Severin joining these.
* Working on PRs
## Eljas
* Documentation for neighbors, confusion of connectivities/ distances
## Next meetings
* Thursday 10:30 or like 16:00
# 2023-08-01
## Selman
* dask-pca pr
* discussion of default behaviour around default solver
* exams 9th
## Ilan
* Starting documentation
# 2023-07-25
## Selman
* TODO
* concat on disk example PR, some dask issue
* Finish up dask
* Giulia
* Kinda blocked by windows stuff
* Ilan
* Open issue for discussing future of CSC/ CSR Dataset
* get https://github.com/scverse/anndata/pull/765 up and running
* How do we get sparse_dataset following scipy.sparray semantics
# 2023-07-18
## Selman
* TODO:
* Examples usage (Selmaan will comment)
* Isaac will look over conversations
* Scanpy
* Will open a PR
* geom
* Conversation with francessca
## Ilan
* Naming of axes causing issues with xarray
* xarray
# 2023-07-04
## Selman
* PR review
* circular import problem
* getting chunk size
## Giulia
* No update
# 2023-06-27
## Selman
* Review
* Next project
## Isaac & Phil
* Infrastructure updates
* HPC access
* Out of core support for scanpy planning
* Formatting
## Giulia
* Gregor
* Dask array in scanpy
* On vacation for half of next month and a half, tbd
## Isaac
* sparse_arrays coming
# 2023-06-16
## Intros
* Phil
* Giulia
## Rahul
* Napari spatial data
* General cleaning + bugfixes
* Benchmarking
* On vacation for next month
## Selman
* Spatial Graph loader
*
* Test case for out of core concatenation
* Global parameter
* Reviewing
## Giulia
* Days available: Mon, Tues, Fri
* 1 day per week
## Phil
* Trying agile?
* How many meetings a week
## Semester break
* July 24th - Oct 15th
*
# 2023-05-19
## Ilan
* Write up a message about dask + sparse
* Send a message outlining points on views of views reading in
* Notebook will be going up on anndata-notebooks
## Rahul
* Tools are pretty close to done
## Selman
*
# 2023-05-12
## Rahul
* Preprocessing mostly done
* Plotting?
* tools next
## Selman
* Benchmarks set up for concat on disk
* Numbers from pres
* https://github.com/syelman/anndata/tree/concat-on-disk-benchmark
* "streaming through one dataset" writing working
## Ilan
* Adding datatypes
* Reprs don't load now
* `to_memory()`
* `exclude_keys`, `drop` method
* Tutorial:
* Indexing – something about .view(`tuple`)
* Test – Check how many times keys are accessed
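
One way the "check how many times keys are accessed" test idea above could be approached, as a sketch. `CountingStore` is a hypothetical wrapper, not part of anndata or zarr, and the keys are made up.

```python
from collections import Counter
from collections.abc import Mapping


class CountingStore(Mapping):
    """Read-only mapping that records how often each key is read."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = Counter()

    def __getitem__(self, key):
        self.reads[key] += 1
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


# Hypothetical store contents.
store = CountingStore({"obs/cell_type": ["B", "T"], "X": [[1, 0], [0, 2]]})
_ = store["X"]
_ = store["X"]
```

A lazy-loading test can then assert that keys which should stay untouched have a read count of zero. Note that the default `Mapping` membership test (`key in store`) goes through `__getitem__`, so it counts as a read in this sketch.
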
# 2023-05-05
## Selman
* Possible Pandas
* Benchmarking:
* Still reading up on asv
* Hashlib issue around dask
* https://github.com/dask/dask/issues/10240
## Rahul
* Created a PR into scanpy
* Has 10x read benchmarks
* TODO:
* Add github action
* Next are preprocessing functions
## Ilan
* Review later
* Still having trouble with testing
# 2023-04-21
## Selman
* Dask comparison issue
* Boolean array: solution makes sense
* Benchmark suite for out of core concat
## Rahul
## Ilan
# 2023-04-14
## Selman
* Bool issue
* Performance warning for reindexing sparse arrays?
* Pandas 2.0
## Rahul
* More things being benchmarked
# 2023-03-31
## Selman
* Version without reindexing
* Reindexing next, but we can merge without it
* Maybe still needs to do obsm
* Thesis?
* New read function
## Ilan
* Read remote pr close to usable
* Lazily indexed zarr arrays, may be a solution to obs loading in to memory
* Maybe use xarray lazily indexed arrays?
## Rahul
* IDP on tuesday 2pm
* Monday trial run
# 2023-03-29
## Rahul
* Napari
* Tests + docs to finish up
* Benchmarking
* Basic setup
* Which benchmarks
* Read/ write benchmarks
* Zarr as well
* Next week IDP presentation
* Work on Indexing benchmarks
*
# 2023-03-24
## Ilan
* Transforms for spatial data stuff
* Consolidated metadata
* SparseDataset – Comments
## Selmaan
*
# 2023-03-17
## Selmaan
* Opened a PR for concatenation
* Test suite
*
## Ilan
* Base class for anndata? What should it do
*
* Sparse class question, why the changes
* Backed mode for zarr
*
# 2023-03-10
## Ilan
* Currently have a backed thing working
* TODO:
* Cutting down read time a bit more
* Review on zarr store
## Selman
* Reading on kerchunk
* trying to get something by early next week
## Rahul
# 2023-03-03
## Selmaan
* Pytorch geometric
* Out of core concatenation
* Can probably delay on dataframes
* TODO: Email accounts ask florian
## Ilan
* Directions
* Subsetting - refactor for saving obs_names, var_names on anndata
* PR into main
* Categorical zarr array done
* SparseDataset
* Wait on my PR
* PR Management
## Benchmarking
* Meeting about benchmarking early next week
*
# 2023-02-10
## Selman
* Graph stuff
* Just finish up
## Ilan
* OME-NGFF out finished
* Finish up backed support for
# 2023-02-03
## General
* Did the hour increase go through? – Rahul messaged Daniela
* Meeting about the graph project – Selman will message on mattermost
* Do HIWIs have to record their hours now?
## Rahul
* Benchmarking machine Monday meeting 10:30
* Still problems with getting the benchmarking running
* Maybe we'll just merge it, after checking the files are all the same as in rahul's branch
* https://github.com/rahulbshrestha/anndata/tree/benchmark/.github/workflows
## Selman
* Shadow objects
* Thinks it looks mostly good
* AnnData
* https://dask-awkward.readthedocs.io/en/latest/
# 2023-01-27
## Email accounts
* Not working, maybe need to be re-activated
* Selman will CC me on email, I will look into it too
## Selmaan
* Tutorials:
* Need to add dask tutorial to the docs/tutorials/index.rst on scverse/anndata
* Remove old copy of ipynb
* Shadows
* Set up meeting next wednesday at earliest
* Fixing the dask views: by next thursday
## Rahul
* Mostly cleaned up, but hitting an error
# 2023-01-19
## Rahul
* Finish https://github.com/scverse/anndata/pull/848 for next week
* Will continue on benchmarking afterwards
# 2023-01-18
## Selmaan
* Wrapping up dask arrays
* Views
* Tests – try memray?
* Notebook
* Submodule in anndata
* Make a PR to anndata viewing the tutorial PR
* Take notes so it's easy to make a bot later
* Next project
* Read a subset of entries?
* Think about API, check out:
* https://github.com/scverse/postdata/blob/main/docs/examples/shadow-objects.ipynb
* Maybe check in on dask dataframes?
* dask-polars could be another direction
# 2022-12-06
## Notes
* Practice 11am next monday
*
* Selmaan
* Move notebook over to anndata-tutorials
* Working on fixing views of dask arrays, currently having issues with `__setitem__`
* Isaac: as backup could just
* Rahul
* Make a PR with just action workflow and benchmarks
* xarray:
* asv run benchmark
* xarray has run-benchmark tag
* From current PR:
* Remove docs, remove dataset files
* Paper on benchmarking
* Specific versions of linux
* Requirements that their container does
*
# 2022-11-11
* Selmaan
* TODO:
* Docs
* Later benchmarks
* Rahul
* https://github.com/pydata/xarray/tree/main/asv_bench
* Send how to for github actions
* Send links on example github actions
* Priorities:
* Github actions setup
* Use directory structure like xarray
*
# 2022-11-03
## Agenda
* Updates
## Notes
* Selmaan
* For next week
* Notebooks
* Don't modify in-place
* Rahul
* For next week
* PR for benchmarking suite
* Start with h5ad IO
*
# 2022-10-21
## Agenda
* Updates
* Plans
## Notes
* Benchmarks
* Setting up benchmarks
* Datasets
* Picking datasets
* Modernizing setup
* All benchmarks running
* Goals
* Current new benchmarks
* copy
* Come up with a plan for moving things over
* Dask
* concatenation done
* tests for to_memory done
* Ready for review
# 2022-10-13
## Agenda
* Updates
* Plans
## Notes
* Updates
* Selmaan
* Concatenation with other types (seems to work)
* Updating tests for to_memory
* Rahul
* Benchmarking
* Will do a pr into anndata-benchmarks
* Wondering what the datasets thing is (Isaac also unsure)
* Plans for next week
* Rahul
* Script for managing datasets
* Scope out benchmarks
* Selmaan
* Finish up concatenation
# 2022-10-06
## Agenda
* Updates
* Arrays
* DataFrames
* To discuss
* Future of dataframes?
## Notes
* Selmaan IDP
* Due oct 20th
* Isaac out of town 17th-19th
* Updates
* Dataframes
* Might be blocked by no dimension size
* Rahul will investigate, but maybe switch onto benchmarking
*
* Array
* `to_memory`
* concatenation between array types
* Should either do
# 2022-09-29
## Notes
* Selmaan
* IO test `to_memory`
* more like `compute` than `persist`
* https://stackoverflow.com/questions/41806850/dask-difference-between-client-persist-and-client-compute
* Concat
* 1 error, with outer indexing
* Rahul
* Dataframes
*
# 2022-09-26
## Notes
* Review
* `assert_equal` done
* `to_memory`
* Questions about what this should do? Isaac thinks `.compute`
* Maybe `persist` is also valid, but would need an argument. Also precedent from xarray
* Still need IO test
* Concatenation
* Checking equality computes
* Can this at least error
* What does xarray do here?
* https://docs.xarray.dev/en/stable/user-guide/combining.html
* apply_to_array computes
* Was due to pd.api.take
* Probably dispatch to a different function
* Meet next 3:30 thursday
* Rahul will start looking at dataframes
* Selman will finish up concat, clean up rest of arrays
* IDP from Selman
# 2022-09-15
## Notes
* Review
* assert_equal
* Test comparison b/w sparse and dask
* dtypes
* IO
* Add one simple test making sure np.ndarray
* Indexing
* Mostly works
* Concatenation
* It's going okay
* Isaac will do a review pass
* Questions
* Output type for dask concatenation
* Plan for next week
* Meeting Monday 26th at 1:30pm
* Concatenation
* Response to review
* Start looking at dataframes
# 2022-09-08
*Attendees: Isaac, Rahul, Selman*
## Agenda
* q: How is collaboration happening?
* PR progress
## Notes
* Saving
* Something currently working
* But should change encoding type
* Collaboration
* Shared PR
* Tasks for next week
* assert_equal (Rahul)
* writing (after review)
* indexing (together)
* concatenation (reach goal)
* Meet same time next week