AnnData Dev notes

---
tags: anndata, hiwi-cohort, meeting, dask
---

# 2024-05-16

## Eljas

- Bug for subsetting HVG: PR opened, Phil looked into it; then improve tests + merge
- Would be nice to refactor
- Want to merge Severin's performance PR before the refactor, though

## Phil

- Fixed broken doctests, moving testing to `src/testing/scanpy`
- Maybe move all of scanpy into `src`? Or maybe should wait until Phil is back, updating open PRs

## Ilan

- TypeScript anndata-js almost done
- Finished up sprint stuff as well
- `xarray` release is out, so no blocker now for backed `{obs,var}`

## Severin

- Need to talk to Isaac about HVG performance PR
  - Could make it single-threaded, but need Isaac's opinion
- Wants to work on mean-var optimization as well for scanpy

# 2024-05-13

## Severin

- Needs to work on thesis
- Scrublet slow because of caching or imports
- Wants to tweet about Docker image

## Phil

- Not sure if running ASV is good
- Newer Python projects use CodSpeed
  - Could give more insight if we could upload our stuff there as well
  - CodSpeed is good, but we don't control the hardware and don't need to pay money (for GPU)
- Not sure why scrublet is not working on a certain dataset

## Ilan

- Work on clearing up sprint PRs
- Backed mode scanpy PR looks good

# 2024-05-06

## Ilan

- scipy array API at a point where discussion is needed / 1D array PR needed

## Severin

- Tests written for multi-GPU implementation
- Testing dask on 3k dataset
- Want to do scaling and neighbors
- How to make clear that Dask is experimental? Different branch? Settings?

## Phil

- Few small things, missed notebook stuff; tutorials have some reproducibility issues (text referring to different cluster numbers)
- MyST has support for fixing this via expressions, but it didn't work with scverse tutorials

# 2024-05-02

## Phil

- Bug in igraph flavor for leiden, should be fixed early
- Added PR with benchmarks for the last thing missing: scrublet. It is slow; should consider whether it's possible to speed it up (Severin: can look into it if he has time)
- Move tests out of package for anndata; warning about coverage break

## Ilan

- Working on array API for scipy sparse, including `NotImplementedError` for proper documentation
- spatialdata db project, vitessce stuff

## Severin

- HVG functions nicer; numba kernel is the right choice, but unclear if multiprocessing is
- Looking into Ilan's stuff on clusters
- `get_mean_var`
- Will have to do analyses in the next weeks, so reduced dev time

## Eljas

- Important bug discovered in HVG: Seurat and cellranger with batch arguments just report the first 2000 genes instead of the HVGs. Seurat v3 is safe from this.
- Has MRCE for Severin

# 2024-04-29

Görlitz GPU Hackathon: results good, but messy; need to be integrated well

## Phil

- Put standups in scverse calendar

## Severin

- Isaac found memory peaks in some of Severin’s numba code; needs to measure what’s going on
- Currently working on integrating hackathon work
- Need to smartly handle numba multithreading with small data: preferably use a single thread instead

## Ilan

- Going to work on sprint stuff now that the hackathon is over

# 2024-04-15

## Eljas

* Pearson2dask
* Using Scalene to profile
* numba faster than dask where possible (i.e., datasets in memory)
* Try numba in dask
* Try dataset from dask notebooks

## Ilan

* Was sick
* Look at benchmark PR

## Phil

* Benchmarks

## Isaac

# 2024-04-11

## Ilan

* csr_array PR
* Weird coverage
* `read_dask_elem`
* benchmarking

## Phil

* `scanpy` benchmarking
* `pytest` bug
* Next focus: benchmarking

## Severin

* sparse aggregate

## Isaac

* GPU support

# 2024-04-08

## Ilan

* sparray PR
* vitessce PR

## Isaac

* scipy 1.13 review
* a lot of reviews
* follow up with Severin about aggregation

## Phil

* Benchmarking suite

## Severin

* Another follow-up PR to fine-tune sparse scale, + benchmarking
* Other aggregate functions PR working

# 2024-04-04

## Ilan

## Isaac

* GenomicRanges stuff from Hackathon
* Will help Phil get on denbi

## Eljas

* Has draft PR for Dask for Pearson residuals
* Keep Pearson normalized PCA, make compatible with dask?

## Phil

* Fixing benchmarking bugs
* Setting up benchmarking for scanpy

# 2024-03-28

## Ilan

* Vitessce this week
* csr_array / csc_array
* SpatialDataWrapper for hackathon
* Benchmarking

## Eljas

* Pearson residuals

## Phil

* BibTeX
* Benchmarking
* Figuring out what to do for hackathon next week

## Isaac

* Scale
* Announcement
* GroupBy

# 2024-03-25

## Isaac

* Want to get release out

## Ilan

* Will handle the scipy 1.13 release issue by intercepting CSR column operation
* Use CZI fixed URLs and update test

## Phil

* checking out yuge's `obs` vs. `obs`
* `src` directory PR, will be done post-release

# 2024-03-21

## Isaac

* Working with Ilan on dask tutorial
* Performance issues with sparse notebook?

## Phil

* Will work on docs PR in the meantime so notebook renders well
* Updated benchmark system so there is a status check on the latest commit
* Need security for running on PRs from non-scverse (untrusted) people; can do a label-based mechanism, but this is not a very good security setup

## Ilan

* working on vindex issue for `_subset_obs_in_place` + `obsp`

## Eljas

* Looking into Pearson issue and will be joining Thursdays rather than Mondays

# 2024-03-18

## Phil

* Benchmarking machine working

## Isaac

* Working on tutorials, need new tutorials (TODO: Ilan)
* Options for tutorial: by-the-book release, secret release, or with sparse
* Create client, set memory limit, etc.
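The 2024-05-02 HVG bug (batch-aware `seurat`/`cell_ranger` flavors returning the first 2000 genes) is easier to spot against the intended ranking. The scanpy `highly_variable_genes` documentation describes it roughly as: with `batch_key`, sort genes by how many batches call them highly variable, then break ties by normalized dispersion. Below is a minimal numpy sketch of that ranking, not scanpy's code; the function name `select_hvgs_across_batches` is hypothetical, and averaging dispersions over batches for the tie-break is an assumption.

```python
import numpy as np


def select_hvgs_across_batches(hvg_per_batch, dispersion_norm, n_top_genes):
    """Return a boolean mask marking the n_top_genes selected across batches.

    hvg_per_batch:   (n_batches, n_genes) bool array, True where a gene is HVG in that batch
    dispersion_norm: (n_batches, n_genes) float array of per-batch normalized dispersions
    """
    hvg_per_batch = np.asarray(hvg_per_batch, dtype=bool)
    # Primary key: in how many batches each gene was called highly variable.
    n_batches_hvg = hvg_per_batch.sum(axis=0)
    # Tie-breaker (assumption): normalized dispersion averaged over batches.
    mean_dispersion = np.nanmean(np.asarray(dispersion_norm, dtype=float), axis=0)
    # np.lexsort sorts by the LAST key first; negate both keys for descending order.
    order = np.lexsort((-mean_dispersion, -n_batches_hvg))
    mask = np.zeros(hvg_per_batch.shape[1], dtype=bool)
    mask[order[:n_top_genes]] = True
    return mask


hvg = [[True, False, True, False],
       [True, False, False, True]]
disp = [[1.0, 5.0, 2.0, 0.5],
        [1.0, 5.0, 2.0, 3.0]]
print(select_hvgs_across_batches(hvg, disp, n_top_genes=2))
# [ True False  True False]
```

Taking the first `n_top_genes` positions instead would pick genes 0 and 1 here, even though gene 1 is highly variable in no batch, which matches the symptom in the note.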
# 2024-03-07

## Isaac

* Reviews mostly, maybe bug fix release of AnnData
* Issue for 64-bit indptr on disk but 32-bit indices will probably not be done before this bug fix release, despite desire
* Big push for docs for the new neighbors implementation

## Phil

* PR mechanism for benchmarking machine works
* Still doing some tuning and then enabling for test repos

# 2024-03-04

## Ilan

* Finish(ing) the dask sparse sum and scaling PR (scale, hvg, normalize)
* Move on to creating usable branch for array-api with sparse

## Eljas

* Looking for things to dask-ify

## Phil

* Benchmarking machine - looking into tuning, PR comments posted back to run-branches, use git token so no rate limiting
* For tuning, it would be nice for someone to look at it - how do we get CPU isolation? Maybe just a shell script run on startup
* Switching AnnData to `src` folder - was waiting on pytest response, but now that we have the reply, it still does not work

# 2024-02-29

## Ilan

* almost done with sparse-in-dask
* some questions about anndata’s dask helpers
* xarray PR should get merged soon

## Isaac

## Phil

* some progress on scverse/plotting_api w/ Gregor

# 2024-02-26

## Philipp

* Starting first test runs on the benchmark machine, seeing if we can respond to requests from GitHub
* Figuring out automatic runs should be first step
* security, capacity etc.
* review PR with control genes scoring

## Eljas

* Just back from vacation, busy for the time being

## Ilan

* Started on sparse-in-dask for `_mean_var` and so on
* Going to do tutorial notebooks

## Isaac

* Working on 64-bit writing and then on to GPU stuff

# 2024-02-22

## Isaac

* Not sure what to do for RC docs, but doing the RC soon
* TODO release:
  - [x] make docs read “1.10.0rc1 (2024-02-22)”
  - [x] make 0.10.x branch
  - [x] do it

## Phil

* Benchmarking
* Getting things running fine
* Uploading and publishing

## Ilan

* array API test suite running on sparse doesn’t look pretty

# 2024-02-19

## Ilan

* Discovered and working on Pandas bug
* AnnData behavior difference in and out of Docker

## Eljas

* Triaging issues
* Looking through issues, mainly [Dask](https://github.com/scverse/scanpy/issues/2578)

## Philipp

* sparse in HVG work: try to "compute" everything and just see what passes, but "compute" doesn't actually work
* get_mean_var doesn't work with sparse
* One more PR (semantic version identifier) waiting for review

## Isaac

* igraph PR review coming
* seaborn issue discussed with Eljas fixed by updating seaborn
* look at aggregate PR as well and then release!
* release on Tuesday or Thursday

# 2024-02-15

## Isaac

* igraph review
  * Maybe just keep current default for leiden and then switch later?
  * Don't want to hold up, and this *is* a breaking change
  * We have said we do semver, and this should include numerics within reason
  * Switch notebooks to igraph and leave default
  * Add warning for future default change `leidenalg`
* [mask argument](https://scverse.zulipchat.com/#narrow/stream/393966-scanpy-anndata-dev/topic/mask.20argument)
  * axis-specific arguments should probably be added eventually
  * while mask is new, we should do the change now
  * Don't infer mask from number of obs and var - plus, might want to do both
  * This will push back the release, but this is worth it
* mindeps job
  * Happy with implementation, but want the flexibility to specify min version and also incorporate bug fixes
  * Follow-up issue for this?
  * Can't specify .0 for dask because there is not a release on the first day of the month
* docs change after release candidate
* aggregate: Phil will be asked for review

## Philipp

* All PRs are done or waiting for review
* Isaac just approved HVG
* Can always add support for sparse-in-dask later - experimental in AnnData anyway
* will do mask PR, then ask Ilan for review
* ask Yuge about Dan's PR :white_check_mark:

## Ilan

* All PRs are done
* Leiden: see above
* Dask: Always write 64-bit sparse indices/indptr

# 2024-02-12

## Ilan

* xarray categorical PR: seems to be going in a good direction
* sparse array indexing fixes: compatible with array API, mostly done
* starts with dask 64-bit stuff in scipy

## Isaac

* PCA order problem: just throw warning
* Needs to do some reviews

## Phil

* blocked on PRs
* scrublet PR not renamed, Isaac needs to follow up on a comment
* fixed plotting warnings
* benchmarking machine now gives SSH prompt via VPN but cannot log in

## Eljas

* Sick, so did not complete much last week
* HVG meeting tomorrow
* need experience with dask because it will be important for ehrapy
* maybe sparse-dask-chunks integration into scanpy?
* upstream things into dask as appropriate
* focus on things that work on entire count matrix

# 2024-02-08

## Phil

* Out today, we'll do sprint planning Tuesday
* HVG PR is ready for re-review

## Isaac

* GPU issue
* Min deps PR
* order issue
* plotting

## Ilan

* waiting on review for leiden

# 2024-02-05

## Phil

* Maybe more PRs reviewed by Ilan to speed things up.
* pytest 8.0 should be able to handle `src` structure, but still figuring out other odds and ends, e.g., with `doctest`
* Open issue, but maintainers are not very receptive to using `sys.modules` to fix imports for testing

## Ilan

## Isaac

* Busy with other non-coding responsibilities
* Some reviews to do

# 2024-02-01

## Ilan

* xarray PR on track, Categorical index is supported upstream
* rename `_config.py` to something `settings`

## Isaac

* min deps is almost done
* mysterious test fail with a mask test

## Eljas

* hvg is mergeable if we don’t do orthogonal flavor/ordering parameters

## Phil

* HVG genes in dask: converting to pandas is faster, but do people want to keep dask around for *everything*?
  * dask takes more memory for multiprocessing, and we can always return to it if we need since it's in the code history
  * could wrap single_batch computation in delayed job and use pandas within the job
* Fixing hvg bugs as well?
  * bins bug from six years ago should maybe be revisited even though it changes tests
* scrublets
  * upset plot would be helpful, but will take time because test datasets are really big
  * could use denbi for running tests faster as well

# 2024-01-29

## Ilan

* Settings PR ready for final review
* igraph change hard to test because of inconsistent results based on CPU etc.
* scanpy-tutorials notebooks have warnings; we want to hide them in tutorials people read but see them ourselves so we can fix them.
  * either do fix-warnings branch in scanpy and run notebooks from that branch
  * or configure MyST (if easy) to hide stderr while leaving warnings in .ipynb files

## Philipp

* Wrap things up, focusing on finishing conversations
* scrublet PR decisions on booleans to run or not run pre-processing steps
* concat API waiting for Danila's opinion
* doublets prediction is dependent on neighbors implementation, which is problematic

## Eljas

* Not much in the way of updates
* Work on refactoring in coming week
* Review of ehrapy

## Isaac

* CI is finding wrong version, maybe because commit/version tags are different, or maybe bugged version of pip
* Unblocked on min test jobs PRs, no ideas about pandas problem but some ideas for anndata problem
* aggregate PR is good to go, maybe modify axis parameter
* metrics PR post-mortem: 64-bit is enough to pick up differences, but 32-bit is not, i.e., exact was never correct but 32-bit was imprecise enough to hide it

# 2024-01-22

## Isaac

* Plots are wrong in the minimum test deps
* TODO: We should make an issue for setting random seeds on tests for plotting (make issue)
  * Maybe everything? A decorator? Setting?
* TODO: Issue in AnnData for random seeds/data generation?
* TODO: Find plots that are different every time and pass as being the same (make meta-issue)
* TODO: Check for documentation of every parameter (maybe make issue?)
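For the 2024-01-22 TODO about seeding plotting tests ("A decorator? Setting?"), one option is a decorator that resets the global random number generators before each call. This is a minimal sketch; the name `seeded` is hypothetical, and it only affects code that draws from Python's `random` module or NumPy's legacy global state (`np.random.rand` and friends), not separately constructed `np.random.default_rng()` generators.

```python
import functools
import random

import numpy as np


def seeded(seed=0):
    """Reset Python's and NumPy's global RNGs to `seed` before every call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            random.seed(seed)
            np.random.seed(seed)  # legacy global state used by np.random.rand etc.
            return func(*args, **kwargs)
        return wrapper
    return decorator


@seeded(42)
def make_scatter_points(n=5):
    # Stand-in for random test data that feeds a plot.
    return np.random.rand(n, 2)


first = make_scatter_points()
second = make_scatter_points()
print(np.array_equal(first, second))  # True
```

In a pytest suite the same reset could instead live in an autouse fixture, which would cover the "Maybe everything?" option without decorating each test.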
## Philipp

* Working on Dask HVG
  * Issues with dataframes - no way to leave Dask as Dask within AnnData dataframe namespace (i.e., `var`/`obs`)
  * Otherwise simple fixes/issues
  * Would be good to handle this in xarray in the future
* Benchmarking
  * No word back from contractor (wrote email)
  * We need improvement in contractor situation (accountability measures/mechanisms)

## Ilan

* Xarray stuff – supporting pandas extension arrays
* Bug fixes for anndata
* Settings PR
  * Probably looking good, especially if it's mainly "expert" developer facing
  * Just missing static type / autocomplete stuff

## Eljas

* No word back from Seurat people
* Closing stale issues
* No chance to look at refactoring yet

# 2024-01-18

## Eljas

## Philipp

## Isaac

# 2024-01-15

## Ilan

* Mostly working on categorical arrays
* scanpy leiden, some confusion about what different parameters should be

## Isaac

* Reviewing some things and merging some stuff into AnnData
* Started looking at Eljas' PR - trying to cut down arguments
  * Order of operations in merging is the difference with Seurat
  * Problems matching the paper's algorithm (they do not implement their own algorithm)
  * If they don't consider it a bug, we don't need to handle that, i.e., no need for batch_merging parameter and can leave flavors as they were
  * Might make sense to leave it in for the future, i.e., purely rank-based might not be the way to do it
  * argument might motivate making some repeated functionality more modular
* Going back to aggregation and finishing up

## Philipp

* Search feature is up and running
* Working on dask for HVG
* Working more on scrublets, finding gold standard for that

# 2024-01-09

## Ilan

* Xarray PR for categoricals
  * maybe wants us to look over it?
  * Will let us know
* config PR
  * being reviewed by Phil
  * Trying to figure out how to do tab completions

## Phil

* Scanpy documentation stuff
  * Links are broken
* String dtypes
  * Will look into how pandas does this

## Eljas

* HVG
  * https://github.com/scverse/scanpy/pull/2792

# 2023-12-07

## Ilan

* Thesis work upcoming

## Eljas

* highly_variable_genes

## Phil

# 2023-12-05

## Agenda

* Go over Ilan work plan
* Intro to Ilan
* Sprint planning

## Notes

* Work plan for next week
* Benchmark machine maybe available soon?
* Vacation times:
  * Isaac: Dec 15th to Jan 8th
  * Phil: Dec 23rd to Jan 7th/8th
  * Ilan: Dec 23rd to Jan 8th (remote) or 22nd

# 2023-10-31

* Let's figure out new meeting times after planning day
* Ilan's joining may be delayed

## Eljas

* Seurat v3 hvg issue
* seurat inconsistency
  * https://github.com/scverse/scanpy/issues/2088
  * https://github.com/scverse/scanpy/issues/1733
  * https://github.com/scverse/scanpy/issues/2151
  * Previous PR to fix:
    * https://github.com/scverse/scanpy/pull/1732
  * TODO: resolve to a single tracking issue
* Seaborn:
  * https://scverse.zulipchat.com/#narrow/stream/328272-scanpy/topic/bug.20on.20sc.2Epl.2Eviolin/near/398325411
  * Someone has responded

## Selman

* Wrap up tests
* Blog post
* OOC concat
* AnnData

## Sprint retrospective

# 2023-10-24

## Eljas

* Seaborn inconsistency
  * `catplot`: https://github.com/scverse/scanpy/issues/2680
* seurat inconsistency

# 2023-10-17

## Eljas

* Seaborn

# 2023-10-10

## Selman

* dask PR
  * Isaac wants to run this
* h5py
  * Solution a little unclear
  * Maybe needs another take?

## Eljas

* Triage meetings?
* 10x mtx reader
* sk-imbalanced learn

# 2023-09-04

## Eljas

* Doing bug overview
* Balanced subsampling
  * scverse/scanpy#987
* Doc PR waiting

## Selman

* Docs for concat on disk
* h5py
* zarr locking
* Next week: h5py attribute size

## Giulia

* Environment issues
* Plotting (let us know which)

## Ilan

# 2023-08-29

## Ilan

* Finished up on zarr
* Finished up on aggregation

## Selman

* https://github.com/scverse/anndata-tutorials/pull/18
  * Probably done in a day or two
* Working on concat on disk tutorial

## Phil

* rank_genes_groups -> maybe just do utilities for now?

# 2023-08-22

## Ilan

* Trouble deduplicating tests
* PR: aggregation in scanpy
* Otherwise on vacation: rest of week

## Selman

* Dask PCA merged
* Distributed write problem
  * Distributed scheduler h5py write problem
* Work on docs for concat on disk next week

## Phil

* AnnData2ri
* PR review

## Isaac

* Severin joining these.
* Working on PRs

## Eljas

* Documentation for neighbors, confusion of connectivities/distances

## Next meetings

* Thursday 10:30 or like 16:00

# 2023-08-01

## Selman

* dask-pca PR
  * discussion of default behaviour around default solver
* exams 9th

## Ilan

* Starting documentation

# 2023-07-25

## Selman

* TODO
  * concat on disk example PR, some dask issue
  * Finish up dask
* Giulia
  * Kinda blocked by windows stuff
* Ilan
  * Open issue for discussing future of CSC/CSR Dataset
  * get https://github.com/scverse/anndata/pull/765 up and running
  * How do we get sparse_dataset following scipy.sparray semantics

# 2023-07-18

## Selman

* TODO:
  * Example usage (Selmaan will comment)
  * Isaac will look over conversations
* Scanpy
  * Will open a PR
* geom
  * Conversation with Francesca

## Ilan

* Naming of axes causing issues with xarray
* xarray

# 2023-07-04

## Selman

* PR review
* circular import problem
* getting chunk size

## Giulia

* No update
* Ping

# 2023-06-27

## Selman

* Review
* Next project

## Isaac & Phil

* Infrastructure updates
* HPC access
* Out of core support for scanpy planning
* Formatting

## Giulia

* Gregor
* Dask array in scanpy
* On vacation for half of next month and a half, tbd

## Isaac

* sparse_arrays coming

# 2023-06-16

## Intros

* Phil
* Giulia

## Rahul

* Napari spatial data
* General cleaning + bugfixes
* Benchmarking
* On vacation for next month

## Selman

* Spatial Graph loader
* Test case for out of core concatenation
* Global parameter
* Reviewing

## Giulia

* Days available: Mon, Tues, Fri
* 1 day per week

## Phil

* Trying agile?
* How many meetings a week

## Semester break

* July 24th - Oct 15th

# 2023-05-19

## Ilan

* Write up a message about dask + sparse
* Send a message outlining points on views of views reading in
* Notebook will be going up on anndata-notebooks

## Rahul

* Tools are pretty close to done

## Selman

# 2023-05-12

## Rahul

* Preprocessing mostly done
* Plotting?
* tools next

## Selman

* Benchmarks set up for concat on disk
  * Numbers from pres
  * https://github.com/syelman/anndata/tree/concat-on-disk-benchmark
* "streaming through one dataset" writing working

## Ilan

* Adding datatypes
* Reprs don't load now
* `to_memory()`
* `exclude_keys`, `drop` method
* Tutorial:
  * Indexing – something about .view(`tuple`)
  * Test – Check how many times keys are accessed

# 2023-05-05

## Selman

* Possible Pandas
* Benchmarking:
  * Still reading up on asv
* Hashlib issue around dask
  * https://github.com/dask/dask/issues/10240

## Rahul

* Created a PR into scanpy
  * Has 10x read benchmarks
* TODO:
  * Add GitHub action
* Next are preprocessing functions

## Ilan

* Review later
* Still having trouble with testing

# 2023-04-21

## Selman

* Dask comparison issue
* Boolean array: solution makes sense
* Benchmark suite for out of core concat

## Rahul

## Ilan

# 2023-04-14

## Selman

* Bool issue
* Performance warning for reindexing sparse arrays?
* Pandas 2.0

## Rahul

* More things being benchmarked

# 2023-03-31

## Selman

* Version without reindexing
  * Reindexing next, but we can merge without it
* Maybe still needs to do obsm
* Thesis?
* New read function

## Ilan

* Read remote PR close to usable
* Lazily indexed zarr arrays, may be a solution to obs loading into memory
  * Maybe use xarray lazily indexed arrays?

## Rahul

* IDP on Tuesday 2pm
* Monday trial run

# 2023-03-29

## Rahul

* Napari
  * Tests + docs to finish up
* Benchmarking
  * Basic setup
  * Which benchmarks
    * Read/write benchmarks
    * Zarr as well
* Next week IDP presentation
* Work on indexing benchmarks

# 2023-03-24

## Ilan

* Transforms for spatial data stuff
* Consolidated metadata
* SparseDataset – Comments

## Selmaan

# 2023-03-17

## Selmaan

* Opened a PR for concatenation
* Test suite

## Ilan

* Base class for anndata? What should it do
* Sparse class question, why the changes
* Backed mode for zarr

# 2023-03-10

## Ilan

* Currently have a backed thing working
* TODO:
  * Cutting down read time a bit more
  * Review on zarr store

## Selman

* Reading on kerchunk
* trying to get something by early next week

## Rahul

# 2023-03-03

## Selmaan

* PyTorch Geometric
* Out of core concatenation
  * Can probably delay on dataframes
* TODO: Email accounts, ask Florian

## Ilan

* Directions
* Subsetting - refactor for saving obs_names, var_names on anndata
  * PR into main
* Categorical zarr array done
* SparseDataset
  * Wait on my PR
* PR Management

## Benchmarking

* Meeting about benchmarking early next week

# 2023-02-10

## Selman

* Graph stuff
* Just finish up

## Ilan

* OME-NGFF out finished
* Finish up backed support for

# 2023-02-03

## General

* Did the hour increase go through? – Rahul messaged Daniela
* Meeting about the graph project – Selman will message on mattermost
* Do HIWIs have to record their hours now?
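The 2023-03-31 idea of lazily indexed arrays (so that subsetting need not load all of `obs` into memory), together with the 2023-05-12 test idea of checking how many times keys are accessed, can be illustrated with a small wrapper that composes row selections and reads the backing store only once, in `to_memory()`. This is a sketch, not anndata's or xarray's implementation; `LazyRows` and `CountingStore` are hypothetical names, and the backing object is assumed to support NumPy-style integer-array indexing on its first axis (real h5py or zarr backends may need sorted indices or orthogonal indexing instead).

```python
import numpy as np


class LazyRows:
    """Record row selections on a backing array; read only in to_memory()."""

    def __init__(self, backing, rows=None):
        self.backing = backing
        self.rows = np.arange(backing.shape[0]) if rows is None else np.asarray(rows)

    def __getitem__(self, key):
        # Compose the new selection with the current one; no data is read.
        # Only slices and integer arrays are handled in this sketch.
        return LazyRows(self.backing, self.rows[key])

    @property
    def shape(self):
        return (len(self.rows),) + tuple(self.backing.shape[1:])

    def to_memory(self):
        # One read of just the selected rows.
        return np.asarray(self.backing[self.rows])


class CountingStore:
    """Array stand-in that counts how often it is read."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.shape = self.data.shape
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return self.data[key]


store = CountingStore(np.arange(20).reshape(10, 2))
subset = LazyRows(store)[2:8][[0, 3, 5]]  # rows 2, 5 and 7 of the store
print(subset.shape, store.reads)          # (3, 2) 0
print(subset.to_memory().tolist(), store.reads)
# [[4, 5], [10, 11], [14, 15]] 1
```

Counting reads on a stand-in store, as here, is one way to turn "check how many times keys are accessed" into an assertion.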
## Rahul

* Benchmarking machine Monday meeting 10:30
* Still problems with getting the benchmarking running
  * Maybe we'll just merge it, after checking the files are all the same as in Rahul's branch
  * https://github.com/rahulbshrestha/anndata/tree/benchmark/.github/workflows

## Selman

* Shadow objects
  * Thinks it looks mostly good
* AnnData
  * https://dask-awkward.readthedocs.io/en/latest/

# 2023-01-27

## Email accounts

* Not working, maybe need to be re-activated
* Selman will CC me on email, I will look into it too

## Selmaan

* Tutorials:
  * Need to add dask tutorial to the docs/tutorials/index.rst on scverse/anndata
  * Remove old copy of ipynb
* Shadows
  * Set up meeting next Wednesday at earliest
* Fixing the dask views: by next Thursday

## Rahul

* Mostly cleaned up, but hitting an error

# 2023-01-19

## Rahul

* Finish https://github.com/scverse/anndata/pull/848 for next week
* Will continue on benchmarking afterwards

# 2023-01-18

## Selmaan

* Wrapping up dask arrays
  * Views
  * Tests – try memray?
  * Notebook
    * Submodule in anndata
    * Make a PR to anndata viewing the tutorial PR
  * Take notes so it's easy to make a bot later
* Next project
  * Read a subset of entries?
  * Think about API, check out:
    * https://github.com/scverse/postdata/blob/main/docs/examples/shadow-objects.ipynb
  * Maybe check in on dask dataframes?
  * dask-polars could be another direction

# 2022-12-06

## Notes

* Practice 11am next Monday
* Selmaan
  * Move notebook over to anndata-tutorials
  * Working on fixing views of dask arrays, currently having issues with `__setitem__`
    * Isaac: as backup could just
* Rahul
  * Make a PR with just action workflow and benchmarks
  * xarray:
    * asv run benchmark
    * xarray has run-benchmark tag
  * From current PR:
    * Remove docs, remove dataset files
  * Paper on benchmarking
    * Specific versions of linux
    * Requirements that their container does

# 2022-11-11

* Selmaan
  * TODO:
    * Docs
    * Later benchmarks
* Rahul
  * https://github.com/pydata/xarray/tree/main/asv_bench
  * Send how-to for GitHub actions
  * Send links on example GitHub actions
  * Priorities:
    * GitHub actions setup
    * Use directory structure like xarray

# 2022-11-03

## Agenda

* Updates

## Notes

* Selmaan
  * For next week
    * Notebooks
    * Don't modify in-place
* Rahul
  * For next week
    * PR for benchmarking suite
    * Start with h5ad IO

# 2022-10-21

## Agenda

* Updates
* Plans

## Notes

* Benchmarks
  * Setting up benchmarks
    * Datasets
    * Picking datasets
    * Modernizing setup
    * All benchmarks running
  * Goals
    * Current new benchmarks
    * copy
    * Come up with a plan for moving things over
* Dask
  * concatenation done
  * tests for to_memory done
  * Ready for review

# 2022-10-13

## Agenda

* Updates
* Plans

## Notes

* Updates
  * Selmaan
    * Concatenation with other types (seems to work)
    * Updating tests for to_memory
  * Rahul
    * Benchmarking
    * Will do a PR into anndata-benchmarks
    * Wondering what the datasets thing is (Isaac also unsure)
* Plans for next week
  * Rahul
    * Script for managing datasets
    * Scope out benchmarks
  * Selmaan
    * Finish up concatenation

# 2022-10-06

## Agenda

* Updates
  * Arrays
  * DataFrames
* To discuss
  * Future of dataframes?
## Notes

* Selmaan IDP
  * Due Oct 20th
  * Isaac out of town 17th-19th
* Updates
  * Dataframes
    * Might be blocked by no dimension size
    * Rahul will investigate, but maybe switch onto benchmarking
  * Array
    * `to_memory`
    * concatenation between array types
      * Should either do

# 2022-09-29

## Notes

* Selmaan
  * IO test `to_memory`
    * more like `compute` than `persist`
    * https://stackoverflow.com/questions/41806850/dask-difference-between-client-persist-and-client-compute
  * Concat
    * 1 error, with outer indexing
* Rahul
  * Dataframes

# 2022-09-26

## Notes

* Review
  * `assert_equal` done
  * `to_memory`
    * Questions about what this should do? Isaac thinks `.compute`
    * Maybe `persist` is also valid, but would need an argument. Also precedent from xarray
    * Still need IO test
  * Concatenation
    * Checking equality computes
      * Can this at least error
      * What does xarray do here?
        * https://docs.xarray.dev/en/stable/user-guide/combining.html
    * apply_to_array computes
      * Was due to pd.api.take
      * Probably dispatch to a different function
* Meet next 3:30 Thursday
  * Rahul will start looking at dataframes
  * Selman will finish up concat, clean up rest of arrays
  * IDP from Selman

# 2022-09-15

## Notes

* Review
  * assert_equal
    * Test comparison b/w sparse and dask
    * dtypes
  * IO
    * Add one simple test making sure np.ndarray
  * Indexing
    * Mostly works
  * Concatenation
    * It's going okay
    * Isaac will do a review pass
* Questions
  * Output type for dask concatenation
* Plan for next week
  * Meeting Monday 26th at 1:30pm
  * Concatenation
  * Response to review
  * Start looking at dataframes

# 2022-09-08

*Attendees: Isaac, Rahul, Selman*

## Agenda

* q: How is collaboration happening?
* PR progress

## Notes

* Saving
  * Something currently working
  * But should change encoding type
* Collaboration
  * Shared PR
* Tasks for next week
  * assert_equal (Rahul)
  * writing (after review)
  * indexing (together)
  * concatenation (reach goal)
* Meet same time next week
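The 2022-09-15 and 2022-09-26 notes on `assert_equal` (comparing sparse and dask arrays; "checking equality computes") describe comparison logic that depends on the array type. One common way to structure that is `functools.singledispatch`, sketched below. `assert_equal_any` is a hypothetical name, not anndata's helper. The sketch assumes SciPy sparse matrices and registers a Dask handler only if Dask is installed; that handler computes the Dask arrays before comparing, which is the cost the notes point out.

```python
from functools import singledispatch

import numpy as np
from scipy import sparse


def _dense(x):
    # Materialize sparse inputs so they can be compared element-wise.
    return x.toarray() if sparse.issparse(x) else np.asarray(x)


@singledispatch
def assert_equal_any(a, b):
    """Fallback: compare any array-likes after densifying both sides."""
    np.testing.assert_array_equal(_dense(a), _dense(b))


@assert_equal_any.register(sparse.spmatrix)
def _(a, b):
    if sparse.issparse(b):
        assert a.shape == b.shape, f"shape mismatch: {a.shape} != {b.shape}"
        # Equal matrices have no stored entries where they differ.
        assert (a != b).nnz == 0, "sparse matrices differ"
    else:
        np.testing.assert_array_equal(a.toarray(), np.asarray(b))


try:  # Optional: only register a Dask handler if Dask is available.
    import dask.array as da
except ImportError:
    da = None

if da is not None:
    @assert_equal_any.register(da.Array)
    def _(a, b):
        # Comparing a dask array means computing it.
        other = b.compute() if isinstance(b, da.Array) else b
        assert_equal_any(a.compute(), other)


x = np.array([[0, 1], [2, 0]])
assert_equal_any(x, sparse.csr_matrix(x))                     # dense vs sparse
assert_equal_any(sparse.csr_matrix(x), sparse.csc_matrix(x))  # different formats
```

The sparse-vs-sparse branch avoids densifying; only mixed dense/sparse comparisons materialize the sparse side.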
