# Scikit-learn Monthly Developer Meeting
Next major release (1.9) scheduled date: around May 18th, 2026
Release manager: TBD
Backup release manager: I (Loïc) can do it if no one else volunteers
## December 15, 2025 @ 4pm CET
### Updates
- [name=Tim] Next major release in May 2026, congrats on 1.8 coming out!
### Topics
- [name=Loïc] 3-tier system for estimators to help resource allocation/priority setting?
- Tier1 would be what we should focus on: important estimators that we know are useful and used. I think there is broad agreement on this amongst maintainers. For example `RandomForest*`, `HistGradientBoosting*`, `LogisticRegression`, `Pipeline`, `ColumnTransformer`, etc.
- Tier3 would be "no new feature", "no bug fix", however easy they are. We can use the website + maintainer agreement to decide.
- [name=Olivier]: I am not comfortable with "no bug fixes". I would rather say: expect that bug reports on those estimators will not actively be fixed by the core team, and that bugfix PRs are unlikely to receive review attention unless they fix a critical bug (e.g. a crash).
- Tier2 would be the rest, big list, maybe "no new feature, only bugfix" rule of thumb, except in special cases.
- [name=Gael] Word it in a way that opens a path for people to invest time and become actors of the package, thus influencing the tier rating
- **General feeling**: it's all about the wording; nothing wrong with the idea
[name=Christian] Some thoughts about a tier system:
- What about first the vision, then the mission, then the priorities? We had a shy start with the vision some time ago in https://hackmd.io/H_CrV5OvSYiJVI-kWEg8ow.
- [name=Gael] Maybe we should publish this
- What impact will a tier system have?
- Shouldn't we always accept bugfixes? Otherwise we could remove the functionality altogether, IMHO.
- [name=Gael] removing functionality will create huge negative press
- [name=Olivier] agreeing on what to remove would also be significant work
- [name=Christian] Announce paid work (like grants) for scikit-learn on the blog, always. Consider making it a requirement.
- **General feeling**: Agreed. We should do a better job on this, both for previously funded grants and for new ones
- [name=Stefanie] in [#32680](https://github.com/scikit-learn/scikit-learn/issues/32680) we discussed exchanging a few names of contributors who do a great job and to whom we could collectively pay more attention (review faster, ping for opinions, ...)
- some names I'd share would be @virchan @EmilyXinyi @glevv @Tialo
- [name=Loïc] @dkobak has been active in manifold and TSNE lately, sometimes reviewing PRs
- maybe we can find an easy-to-maintain format to regularly add and remove names
- ideally we would be able to see that on github
- it is possible to see who is a first-time contributor on GitHub, and we could mark their pull requests; similarly, there may be a way to see whether the author of a PR is in a certain team
- https://github.com/jupyter/pr-triage-board-bot/tree/main
- project board: https://github.com/orgs/jupyterhub/projects/4
- [name=Olivier]: Virgil is already part of the contributors experience team.
- **General feeling**: keep this in mind and explore possible implementations, including low-tech ones. The goal is to help distribute attention well, not to institutionalize status via a "badge"
- talk about this topic/list again at the January 2026 meeting
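As one possible low-tech starting point, GitHub already exposes the data mentioned above: pull request objects returned by the REST API (`GET /repos/{owner}/{repo}/pulls`) include `user.login` and an `author_association` field. A minimal sketch, assuming a hand-maintained set of highlighted logins; the `triage_tags` helper and the tag names are hypothetical, fetching the PRs (authentication, pagination) is left out, and this is not how the jupyter pr-triage-board-bot linked above works.

```python
# Hypothetical triage helper. Values of GitHub's "author_association"
# field that mark a first-time contributor.
FIRST_TIME_ASSOCIATIONS = {"FIRST_TIMER", "FIRST_TIME_CONTRIBUTOR"}


def triage_tags(pr, highlighted_logins):
    """Return tags for one pull request dict (REST API shape, only the
    fields used here are required)."""
    tags = []
    if pr.get("author_association") in FIRST_TIME_ASSOCIATIONS:
        tags.append("first-time contributor")
    if pr["user"]["login"] in highlighted_logins:
        tags.append("highlighted contributor")
    return tags


# Hand-written example payloads, not real PRs.
highlighted = {"virchan", "EmilyXinyi", "glevv", "Tialo"}
prs = [
    {"number": 1, "user": {"login": "virchan"}, "author_association": "CONTRIBUTOR"},
    {"number": 2, "user": {"login": "someone-new"},
     "author_association": "FIRST_TIME_CONTRIBUTOR"},
]
for pr in prs:
    print(pr["number"], triage_tags(pr, highlighted))
# 1 ['highlighted contributor']
# 2 ['first-time contributor']
```

The tags could then drive a label or a project board column; keeping the highlighted set in a plain file matches the "easy-to-maintain format" idea without creating a visible badge.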
- [name=Christian] Array API for LBFGS solver in LogisticRegression [PR#32644](https://github.com/scikit-learn/scikit-learn/pull/32644)
Christian's concern: Huge maintainability impact (=duplication of `_loss` module). Options should be discussed first.
- [name=Olivier] other (future) estimators with similar anticipated Cython/array API duplication:
- KMeans
- pairwise distance based models (nearest neighbors, nystroem)
- maybe (H)DBSCAN
- [name=Tim]: collect the known cases and solutions we explored and other possible solutions.
- try to have a central place to discuss and catalogue options
- **General feeling**: some duplication is probably a cost to pay to support dedicated hardware. We need to document this (hence collecting the known cases) and be aware of our tradeoffs
- [name=Dea] Should I close this PR? Jérémie and Olivier seem unimpressed :-) [ENH Display Methods in HTML representation](https://github.com/scikit-learn/scikit-learn/pull/31698)
- [name=Gael] Can we, should we, do user testing (on the broad idea of displaying methods)?
- [name=Dea] I haven't implemented the latest feedback.
- Ready for feedback [ENH: Display the number of output features](https://github.com/scikit-learn/scikit-learn/pull/31937).
- In progress ~~Feedback needed~~ [ENH Display fitted attributes in HTML representation](https://github.com/scikit-learn/scikit-learn/pull/31442)
- This already has 1 approval; the PR corrects the ColumnTransformer dotted line. [FIX remainder parameter for column transformer visual block](https://github.com/scikit-learn/scikit-learn/pull/32713)
- Ready for feedback. [Remove CSS template substitution in estimators' HTML Display](https://github.com/scikit-learn/scikit-learn/pull/32839)
### Needs attention/decision
- [name=Christian] RFC make response / inverse link / activation function official [Issue#29169](https://github.com/scikit-learn/scikit-learn/issues/29169)
Is this still the consensus? We don't want anyone to be caught off guard if a PR implementing it pops up.
- [name=Gael] I'm +1 on making the inverse link function public, but without breaking changes to the existing public API (no deprecations)
- [name=Gael] Gathering user feedback / points of view on our API changes? (e.g. changes in the linear model API, which got a bad rap on LinkedIn, or moving away from accuracy)
- [name=Dea] Is this the post? https://www.linkedin.com/posts/carl-mcbride-ellis_datascience-machinelearning-activity-7404958491892875264-HEp9?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAdeyYBuYY8ZK4lVYO6GfBQ-Lb7_nfnLNw
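For context on the inverse link RFC (Issue #29169) above: for a binary `LogisticRegression`, the predicted class-1 probability is the inverse link (the logistic sigmoid, `scipy.special.expit`) applied to the raw linear prediction returned by `decision_function`. A minimal sketch using only existing public methods; it illustrates the concept and does not show whatever API the RFC ends up proposing.

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic binary classification problem.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

# Raw linear predictions X @ coef_.T + intercept_ (the link scale).
raw = clf.decision_function(X)

# Applying the inverse link (logistic sigmoid) recovers predict_proba.
proba_from_link = expit(raw)
proba = clf.predict_proba(X)[:, 1]

print(np.allclose(proba_from_link, proba))  # True
```

Making the inverse link official would let users and downstream code do this mapping without hard-coding which function each estimator uses.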
### Broader discussions
- [name=Gael] Growing the comms team
--------------------------------------------------------------------------
## November 24, 2025 @ 4pm CET
### Updates
- [name=betatim] The inclusion criteria now include a "Wikipedia rule"
- TL;DR: "don't propose work/algorithms that you (your friend, boss, partner) invented"
- https://github.com/scikit-learn/scikit-learn/pull/32663
- [name=Loïc] quick [dashboard](https://lesteve.github.io/scikit-learn-website-analytics/pyodide-build/dashboard.html) to make scikit-learn.org stats more easily available and usable. [repo](https://github.com/lesteve/scikit-learn-website-analytics) if you are curious.
- [name=Loïc] Update regarding the 1.8 release
- RC planned beginning of the week
- release planned beginning of December to have a bit of leeway before the Christmas break (just in case we need a quick bug-fix release)
- Discussion on better communication around the release
- consensus ended up being: add release manager and release date in the monthly meeting template (done at the top)
- [name=Dea] A Kaggle docker-python developer commented on [Bump scikit-learn version from 1.2.2 to 1.7.2](https://github.com/Kaggle/docker-python/pull/1513#issuecomment-3513173265)
### Topics
- Focus groups [name=betatim]
- can we announce the topic for focus group n+1 when focus group n starts?
- suggest your idea for a focus group topic here: X, Y, Z
- would the following board help? https://github.com/orgs/scikit-learn/projects/24
- Currently active focus groups:
- array API (many active people): https://github.com/orgs/scikit-learn/projects/12
- Olivier / Antoine: [tree-based performance evaluation and comparison](https://github.com/orgs/scikit-learn/projects/26) with xgboost and co.
- EOSS topics:
- [Callback API](https://github.com/orgs/scikit-learn/projects/8): Francois, Jeremie
- [Improve Display](https://github.com/orgs/scikit-learn/projects/10/views/2): Lucy, Jeremie, and Guillaume
- [Estimator UI](https://github.com/orgs/scikit-learn/projects/9): Dea, Guillaume
- [name=Loïc] Some maintainers are reluctant to open new issues because people immediately jump on them with "can I work on it?". What about having a "Not open for contribution" label on new issues? The maintainer on triage would remove the label once things have been clarified and we would actually welcome a contribution. This feels a bit sad but necessary...
- "needs decision" label now triggers a bot that comments
- [name=Loïc] 3-tier system for estimators to help resource allocation/priority setting?
- Tier1 would be what we should focus on: important estimators that we know are useful and used. I think there is broad agreement on this amongst maintainers. For example `RandomForest*`, `HistGradientBoosting*`, `LogisticRegression`, `Pipeline`, `ColumnTransformer`, etc.
- Tier3 would be "no new feature", "no bug fix", however easy they are. We can use the website + maintainer agreement to decide.
- Tier2 would be the rest, big list, maybe "no new feature, only bugfix" rule of thumb, except in special cases.
[name=Christian] Some thoughts about a tier system:
- What about first the vision, then the mission, then the priorities? We had a shy start with the vision some time ago in https://hackmd.io/H_CrV5OvSYiJVI-kWEg8ow.
- What impact will a tier system have?
- Shouldn't we always accept bugfixes? Otherwise we could remove the functionality altogether, IMHO.
- [name=Christian] Announce paid work (like grants) for scikit-learn on the blog, always. Consider making it a requirement.
- [name=Stefanie] in [#32680](https://github.com/scikit-learn/scikit-learn/issues/32680) we discussed exchanging a few names of contributors who do a great job and to whom we could collectively pay more attention (review faster, ping for opinions, ...)
- some names I'd share would be @virchan @EmilyXinyi @glevv @Tialo
- maybe we can find an easy-to-maintain format to regularly add and remove names
- [name=Olivier] array API and Cython code duplication. Known examples:
- loss functions (`LogisticRegression`)
- k-means (not started)
### Needs attention/decision
- [name=Christian] Deprecation of `penalty` in `LogisticRegression` [PR#32659](https://github.com/scikit-learn/scikit-learn/pull/32659).
This includes change of default `l1_ratio=None` -> `l1_ratio=0`.
What about `l1_ratios` in `LogisticRegressionCV` (keep in mind the change with `use_legacy_attributes`)?:
1. Change default from `None` -> `(0,)`
    - This also changes the shape of some attributes (adds a dimension of shape 1).
2. Change default from `None` -> "warn"
    - This is then different from `LogisticRegression` for 2 releases.
Option 2 it is.
- [name=Christian] Array API for LBFGS solver in LogisticRegression [PR#32644](https://github.com/scikit-learn/scikit-learn/pull/32644)
Christian's concern: Huge maintainability impact (=duplication of `_loss` module). Options should be discussed first.
- [name=Christian] RFC make response / inverse link / activation function official [Issue#29169](https://github.com/scikit-learn/scikit-learn/issues/29169)
Is this still the consensus? We don't want anyone to be caught off guard if a PR implementing it pops up.
### Action items
### Archived meeting notes:
- https://github.com/scikit-learn/administrative/tree/master/monthly_meetings
### Next meeting
Automatically configured as a recurring event on the shared calendar:
- https://blog.scikit-learn.org/calendar/
## October 27, 2025 @ 4pm CEST
### Updates
- [name=Guillaume] upcoming release 1.8 in November
### Topics
- [name=Olivier] Security hardening via isolated release workflow in NumPy (done) and SciPy (WIP):
- https://github.com/numpy/numpy-release
- https://github.com/scipy/scipy-release
- [name=Adrin] docs repo size
- it's been tricky, might require a non-shallow clone (100GB)
- might require deleting the `scikit-learn.github.io` repo and creating a new one / renaming one
- https://github.com/scikit-learn/scikit-learn/issues/32562
- [name=Olivier] Bi-weekly task force organization to advance stalled high priority roadmap items.
- First iteration: Antoine and Olivier focussing on comparing scikit-learn tree based models vs xgboost/lightgbm/catboost on different datasets with different shapes.
- [name=Tim] what is on the roadmap?
- plan the focus topic for each month several months in advance
- idea: announce every 2 months what the focus topic is for the next block and what the topic will be for the next-next block
- For example
- next: tree based models (Nov and Dec)
- next-next: linear models (Jan and Feb)
- [name=Tim] Update the version of scikit-learn in Kaggle
- does someone want to tackle this?
- https://github.com/Kaggle/docker-python
### Needs attention/decision
### Action items