Ralf Gommers

@rgommers

Joined on Jul 14, 2018

  • Draft editable version of https://github.com/numpy/numpy/issues/26191. Anyone can feel free to edit here. This table will be synced with the table in the issue description at https://github.com/numpy/numpy/issues/26191 regularly. This table was last synced on 05 November 2024, 10:31am UTC.
    Package name | Compatible release on PyPI? | Min compatible version | Notes
    Adaptive
  • This summary is Ralf's perspective on what happened and how we ended up with a fairly problematic API design.
    tl;dr: We started with something that looked a lot like a clean subset of Pandas/cuDF/Modin (all eager) with no row index. We then tried extending to lazy libraries, ran into a host of problems, and took a few wrong turns trying to shoehorn in support for lazy execution.
    What happened with API evolution over the past ~6 months?
    We tried to incorporate support for dataframe libraries with lazy execution: Polars LazyFrame most prominently, but issues with it would in some cases also affect Ibis (in particular when used with an SQL backend), Dask, or other libraries.
      - We dropped support for element-wise operations (in gh-242).
      - We attempted to add support for expressions; this was rejected.
      - We added a .persist() method as an execution hint for lazy libraries (see gh-307), to help bridge some of the issues with lazy execution and to avoid calculations that might otherwise be repeated.
      - We then instead added a "column ownership" concept, where column instances are owned by the dataframe that they came from. Column-column and column-dataframe operations are only allowed if there is no more than one parent dataframe (see gh-310).
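The "column ownership" rule in the summary above may be easier to follow with a toy model. The sketch below is only an illustration: `Frame`, `Column`, `col`, and `filter` are hypothetical names invented here, not the classes or methods of the actual API or of any real library. It assumes each column remembers the dataframe it came from and that an operation is rejected when its operands have more than one distinct parent.

```python
from __future__ import annotations


class Column:
    """Toy column that remembers which Frame (if any) owns it. Hypothetical."""

    def __init__(self, values, parent: Frame | None = None):
        self.values = list(values)
        self.parent = parent

    def _common_parent(self, other: Column) -> Frame | None:
        # Collect the distinct parents of both operands (ignoring "no parent").
        parents = [p for p in (self.parent, other.parent) if p is not None]
        if len(parents) == 2 and parents[0] is not parents[1]:
            raise ValueError("columns are owned by different dataframes")
        return parents[0] if parents else None

    def __add__(self, other: Column) -> Column:
        # Column-column operation: allowed only with at most one parent.
        parent = self._common_parent(other)
        summed = [a + b for a, b in zip(self.values, other.values)]
        return Column(summed, parent=parent)

    def __gt__(self, scalar) -> Column:
        # Comparing with a scalar keeps the same owner.
        return Column([v > scalar for v in self.values], parent=self.parent)


class Frame:
    """Toy eager dataframe: a dict of column name -> list of values. Hypothetical."""

    def __init__(self, data: dict[str, list]):
        self.data = {name: list(vals) for name, vals in data.items()}

    def col(self, name: str) -> Column:
        # Columns handed out by a frame are owned by that frame.
        return Column(self.data[name], parent=self)

    def filter(self, mask: Column) -> Frame:
        # Column-dataframe operation: the mask must not belong to another frame.
        if mask.parent is not None and mask.parent is not self:
            raise ValueError("mask column is owned by a different dataframe")
        return Frame(
            {
                name: [v for v, keep in zip(vals, mask.values) if keep]
                for name, vals in self.data.items()
            }
        )


df1 = Frame({"a": [1, 2, 3], "b": [10, 20, 30]})
df2 = Frame({"a": [4, 5, 6]})

same_parent = df1.col("a") + df1.col("b")   # fine: one parent (df1)
filtered = df1.filter(df1.col("a") > 1)     # fine: mask is owned by df1
# df1.col("a") + df2.col("a")               # would raise ValueError: two parents
```

In this toy model, combining `df1.col("a")` with `df2.col("a")` raises an error because the operands have two different parents, which mirrors the restriction described for gh-310.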