# Polars: issue triage

Need some extra eyes on an issue before making a decision?

## 2025-11-28

- https://github.com/pola-rs/polars/issues/24998#issuecomment-3439397950
  - ok to change in `sum_horizontal`?
  - `list`

## 2025-10-03

* preserve nulls in `pct_change` https://github.com/pola-rs/polars/issues/22670. OK to change as "breaking" (bug fix)?

## 2025-09-05

* compare datatypes: https://github.com/pola-rs/polars/issues/24210
* `lit(Series)` behavior: https://github.com/pola-rs/polars/issues/23337. The `lit()` constructor does not always create a scalar (e.g. `lit(Series)` creates a column); suggested to have some form of API that strictly always creates scalars?
  * See latest issue comments for suggested APIs
* Change `from_arrow` to always return a `Series` from `ArrowStreamExportables` https://github.com/pola-rs/polars/issues/24511

## 2025-08-08

* When `Date` needs to be transformed to `Datetime`, which resolution to use?
* Reject `df.rename(a=b)`

## 2025-07-11

* `sample()` should shuffle when `fraction=1` (https://github.com/pola-rs/polars/issues/13685)
  * Sample should always output in order, and a shuffle would be needed to…
* Add `pl.row_index()` (https://github.com/pola-rs/polars/issues/22164)
  * Previously rejected: https://github.com/pola-rs/polars/issues/12420#issuecomment-1880145564
* Expression for percent en/decoding (https://github.com/pola-rs/polars/issues/23390)
* `sample` with fraction should have an `exact: bool` flag
  * Revisit before 2.0, as this may be a breaking change.
  * Make issue for this (Simon)
* Pipe with schema
  * Make issue for this (Gijs)

## 2025-06-13

- nulls in n-ary operations (https://github.com/pola-rs/polars/issues/16200, https://github.com/pola-rs/polars/issues/17827):
  - decision: align `{any,all}_horizontal` with vertical ones
- simple linear regression: https://github.com/pola-rs/polars/issues/22089. Close, and just document as "recipe"?
  - out of scope, but document
- "kind of a tragedy" timedelta truncation https://github.com/pola-rs/polars/issues/11625
- `dt.days_in_month`: ok to accept contribution?
  - sure
- `replace_strict` https://github.com/pola-rs/polars/issues/22593: ambiguity related to broadcasting
- `Series.__setitem__`: strict cast?

## Future

* Categorical API for mapping category values https://github.com/pola-rs/polars/issues/22311
  * Proposed: https://github.com/pola-rs/polars/issues/22311#issuecomment-2828372397

## 2025-02-21

- `quarter_end`, with different quarters: 'q[jan]' / 'q[feb]' / etc.? https://github.com/pola-rs/polars/issues/21183
  - should be fine. How to specify the starting month? `pl.col('a').dt.quarter_end(start_month=12)`
  - implementation should share things
- Use PyCapsule Interface in `from_dataframe`? https://github.com/pola-rs/polars/pull/21377
- Kahan summation keeps coming up:
  - https://github.com/pola-rs/polars/issues/21358
  - https://github.com/pola-rs/polars/issues/5325
  - https://github.com/pola-rs/polars/issues/9318
  - https://github.com/pola-rs/polars/issues/9383

## 2025-01-24

- Business days are a recurring request from companies I've taught at. Let's discuss plans for broader support? https://github.com/pola-rs/polars/issues/20884
- remove `LazyFrame.clone`?

## 2024-11-29

- floordiv vs truncdiv? https://github.com/pola-rs/polars/pull/19949
- lazy assertion utilities: something like `.assert_no_nulls`? I remember reading about such requests a few times but can't find them right now
- boundary behaviour of `interpolate` https://github.com/pola-rs/polars/pull/18355
  - preserve leading and trailing nulls (current behaviour), or extend (like NumPy)? Or add `forward_fill_by`?
- configurable plotting backend
  - gonna try to get something together using Plotly
- interchange protocol: move `from_dataframe`, or even consider deprecating? https://github.com/pola-rs/polars/issues/20065

## 2024-10-04

- rolling: behaviour of consecutive elements? https://github.com/pola-rs/polars/issues/18126
  - close this for Polars
- `allow_exact_matches` in `join_asof`? https://github.com/pola-rs/polars/issues/7932
- `Series.to_dummies`: `dummy_null`?
  - resolved: easy to work around

## 2024-08-23

- Return type of `group_by_dynamic`: for lazy it's aligned with `LazyGroupBy`, but not for eager. Align, and include all the convenience methods (e.g. `.sum()` etc.) on `group_by_dynamic` too? https://github.com/pola-rs/polars/issues/17916
  - yes. Also: enable common subexpression elimination in Polars 1.7
- Iterating over groups in `LazyFrame.group_by`: close? https://github.com/pola-rs/polars/issues/8966
- cumulative `n_unique` https://github.com/pola-rs/polars/issues/18157
  - start by adding to the docstring

## 2024-07-26

- There are two similar issue labels: `incomplete` and `needs repro`. Just choose one?
- `unary_elementwise` vs `apply_generic`: are both necessary? Just standardise on one? E.g.:
  - `apply_into_string_amortized` vs `apply_to_buffer`: the latter is only used in the `pig_latinnify` example. Just use `apply_into_string_amortized` instead, then teach that to plugin users, so they have a function which also works with non-string inputs?
- ok to add `binary_into_string_amortized`?

## 2024-07-12

- `Expr.list.is_duplicated`: does it need adding, or is `list.eval` enough? https://github.com/pola-rs/polars/issues/6137, also asked in https://github.com/pola-rs/polars/issues/9466
  - use `list.n_unique` vs `list.len`, optimise
- GD Script? Not familiar with this; can it just be closed? https://github.com/pola-rs/polars/issues/6575
- symmetric-difference join: outer + filter, and let lazy / the new streaming engine take care of it? https://github.com/pola-rs/polars/issues/6947
- `DataFrame.value_counts`? This was a fairly popular addition to pandas 1.1 https://github.com/pola-rs/polars/issues/6138
- `pl.from_numpy`: overload return type depending on input ndim? https://github.com/pola-rs/polars/issues/17454
  - Gijs to make enh issue about non-writeable flag

## 2024-06-28

- deprecate `DataFrame.rows`? Why not tell people to use `list(df.iter_rows())` if they really need the entire list, so that people who don't can enjoy lower peak memory usage for free?
  - no
- extra rolling dtypes: https://github.com/pola-rs/polars/issues/16988
  - in some cases, where it makes sense
- multiply dataframe by list: https://github.com/pola-rs/polars/issues/17147
  - rejected
- extra stats in `pivot_table`: https://github.com/pola-rs/polars/issues/16372
  - rejected
- `.list.map_elements`: https://github.com/pola-rs/polars/issues/16452. It's already possible, and it's probably good if such performance destroyers are hard to write?
- `DataFrame.show`? https://github.com/pola-rs/polars/issues/16534
  - accepted: add configs as args, support for LazyFrame, only print (without returning anything)

## 2024-06-14

- reverse changing default `coalesce` in left join
- try to address autocomplete in IPython: https://github.com/pola-rs/polars/issues/16933
- broadcast binary operations: https://github.com/pola-rs/polars/issues/16070. Should return result and raise on different-length inputs

## 2024-05-31

- push forwards deprecation of `offset` argument in `truncate` / `round` / `upsample`? https://github.com/pola-rs/polars/pull/15478. I'm working on optimising them and it just adds complexity...
  - ok!
- `descending` in `top_k`
- `unnest_all` https://github.com/pola-rs/polars/issues/12353
  - if None, unnest all

## 2024-05-17

- `pivot` / `unpivot`: https://github.com/pola-rs/polars/issues/11974
  - alias melt to unpivot
- accept `rolling / over`? https://github.com/pola-rs/polars/issues/12051
  - accepted
- `dt.month_end` for datetimes: last moment in month? Breaking change for 1.0?
  - document better
- `struct.with_fields`: https://github.com/pola-rs/polars/issues/16082
  - `pl.col.coords.struct.with_fields(x=pl.element().struct.field("x").sqrt())`

## 2024-05-03

- `by` argument: when to split out into `func_by`, and when to have `func(..., by=...)`? The former when not all args are applicable, the latter when they are?
- ok for group-by-dynamic to change type of index column? https://github.com/pola-rs/polars/issues/15878
- https://github.com/pola-rs/polars/issues/16021: plugins need to specify `dtype` of `pl.lit`?

## 2024-04-18

- https://github.com/pola-rs/polars/issues/15736: should cache size be limited?
- `InvalidOperationError` vs `ComputeError`: when?
- https://github.com/pola-rs/polars/issues/15754#issuecomment-2065713093: quantile interpolation, default to `'linear'`?
- https://github.com/pola-rs/polars/issues/15700: `replace_columns`
- README import time comparison: perhaps package size would be good to include as well / instead?
  - Polars: 85 MB
  - pandas + numpy: 163 MB
  - AWS Lambda package size limit: 250 MB
  - pandas + numpy + pyarrow: 290 MB
  - visually: https://github.com/pola-rs/polars/issues/11599#issuecomment-2066308420
- call for proposals for PyData Amsterdam: anyone submitting? https://amsterdam.pydata.org/

## 2024-03-22

- https://github.com/pola-rs/polars/issues/9859: simple aggs on list (e.g. `list.min`, `list.max`)
- https://discord.com/channels/908022250106667068/1082941945216782397/1220375752428753027: `top_k` `descending` feels backwards
  - todo: make issues about `descending` and `maintain_order`
  - done:
    - https://github.com/pola-rs/polars/issues/15238
    - https://github.com/pola-rs/polars/issues/15236
- https://github.com/pola-rs/polars/issues/15193: "start date" of ISO week + year combination
- https://github.com/pola-rs/polars/issues/10054: `group_by(...).top_k`; would `top_k(k, by, group_by)` be more natural?
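
For the ISO week item (#15193): the mapping from an ISO year + week to its start date is already defined by ISO 8601, and Python's standard library exposes it as `datetime.date.fromisocalendar`. The plain-Python sketch below pins down the semantics an expression would need to match, including week 1 starting in the previous calendar year and years with 53 weeks. The helper name `iso_week_start` is hypothetical and is not a Polars API.

```python
from datetime import date


def iso_week_start(iso_year: int, iso_week: int) -> date:
    """Return the Monday that starts ISO week `iso_week` of ISO year `iso_year`.

    Hypothetical helper for illustration; delegates to the stdlib ISO 8601 logic.
    Raises ValueError if the ISO year has no such week (e.g. week 53 of 2021).
    """
    return date.fromisocalendar(iso_year, iso_week, 1)  # weekday 1 = Monday


# ISO week 1 is the week containing the year's first Thursday,
# so it can begin in the previous calendar year.
print(iso_week_start(2020, 1))   # 2019-12-30
print(iso_week_start(2021, 1))   # 2021-01-04
print(iso_week_start(2020, 53))  # 2020-12-28 (2020 has 53 ISO weeks)
```

Anchoring on weekday 1 follows ISO 8601, where weeks run Monday to Sunday; an invalid week number raising an error (rather than rolling over) is one possible design choice for the Polars side.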
## 2024-03-08

- https://github.com/pola-rs/polars/issues/10481: `filename` arg in IO functions
- https://github.com/pola-rs/polars/issues/10794: EWMA by time
- https://github.com/pola-rs/polars/issues/10989: inconsistent `by`
- https://github.com/pola-rs/polars/issues/9099: warning about `map_elements` `return_dtype`
- https://github.com/pola-rs/polars/issues/10968: reading `object` from pandas. Infer? Respect `object`?
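
For "EWMA by time" (#10794): the point is that the decay should depend on elapsed time rather than on row count, which matters when observations are irregularly spaced. Below is a minimal plain-Python sketch of one common half-life formulation, assumed here for illustration: `alpha_i = 1 - exp(-ln(2) * (t_i - t_{i-1}) / half_life)` and `y_i = alpha_i * x_i + (1 - alpha_i) * y_{i-1}`. The function name `ewm_mean_by_time` is hypothetical; this is not the Polars implementation.

```python
import math
from typing import List, Sequence


def ewm_mean_by_time(
    values: Sequence[float], times: Sequence[float], half_life: float
) -> List[float]:
    """Exponentially weighted mean whose decay depends on elapsed time.

    Assumed (hypothetical) formulation, not the Polars implementation:
        y_0 = x_0
        alpha_i = 1 - exp(-ln(2) * (t_i - t_{i-1}) / half_life)
        y_i = alpha_i * x_i + (1 - alpha_i) * y_{i-1}
    `times` must be sorted ascending and use the same unit as `half_life`.
    """
    if len(values) != len(times):
        raise ValueError("values and times must have the same length")
    if half_life <= 0:
        raise ValueError("half_life must be positive")

    result: List[float] = []
    for i, x in enumerate(values):
        if i == 0:
            result.append(float(x))  # first observation seeds the average
            continue
        dt = times[i] - times[i - 1]
        if dt < 0:
            raise ValueError("times must be sorted ascending")
        # After one half-life, the previous state keeps half of its weight.
        alpha = 1.0 - math.exp(-math.log(2.0) * dt / half_life)
        result.append(alpha * x + (1.0 - alpha) * result[-1])
    return result


# Evenly spaced timestamps with half_life equal to the spacing: alpha is 0.5 each step.
print(ewm_mean_by_time([0.0, 1.0, 1.0], [0.0, 1.0, 2.0], half_life=1.0))
# A two-unit gap decays more of the old state, so the new value gets weight 0.75.
print(ewm_mean_by_time([0.0, 1.0], [0.0, 2.0], half_life=1.0))
```

With evenly spaced timestamps this reduces to an ordinary EWMA with a constant alpha; a longer gap gives the new observation more weight because more of the previous state has decayed.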