# Scikit-learn bi-weekly progress status (even weeks)
**Goal**: internal communication on recent work and short term planning
**Who**: people working on the maintenance of scikit-learn and related projects, in particular at probabl and Inria and maybe others
**Frequency**: every other Monday at 15:00 CET/CEST, unless it happens on the same day as the Monthly meeting.
**Where**: https://meet.google.com/xdm-ozyn-pgj
**Meeting notes**: to be archived on the [scikit-learn org repo](https://github.com/scikit-learn/administrative/tree/master/biweekly_meetings)
## Next meeting templates
**Rules of the game:**
- No questions during the progress reports
- Add those questions to the discussion points instead
### Progress reports
- [name=Someone]
- item
- item
### Discussion points
- ...
## 2025-11-24
### Progress reports
- [name=Stefanie] (on train with bad WiFi, as usual on trains)
- mostly preparing talk ["OSS Community Building"](https://github.com/StefanieSenger/Talks/tree/main/2025_Building_an_OSS_Community) for Probability 1.0
- finalised PR [CI Add "autoclose" workflow by label setting](https://github.com/scikit-learn/scikit-learn/pull/32504) which now can be used by setting the "autoclose" label
- finalised PR [FIX classification metrics raise on empty input](https://github.com/scikit-learn/scikit-learn/pull/32549)
- fixed merge conflicts in PR [ENH Add zero division handling to cohen_kappa_score](https://github.com/scikit-learn/scikit-learn/pull/31172) since `cohen_kappa_score` now has array api support
- waiting for reviews
- reviewed [DOC add paragraph on "AI usage disclosure" to Automated Contributions Policy and PR Template](https://github.com/scikit-learn/scikit-learn/pull/32566)
- [name=Olivier Grisel]
- Reviewing deprecation related PRs for 1.8
- logistic regression fitted attributes
- logistic regression `penalty`
- [`SVC` with `probability=True`](https://github.com/scikit-learn/scikit-learn/pull/32050)
- Array API:
- review and benchmarking for [LogisticRegression with LBFGS](https://github.com/scikit-learn/scikit-learn/pull/32644)
- GPU speed-up can be significant on large enough problems with `n_samples >> n_features`.
- But requires code duplication (Cython vs array API in the loss module)
- Tree-based models:
- started discussion on binning/histograms for RFs: https://github.com/scikit-learn/scikit-learn/issues/32704
- Antoine confirmed that this can significantly improve the fit time/accuracy tradeoff on real customer data.
- Back to investigate the threading overhead of HGBT on small data:
- https://github.com/scikit-learn/scikit-learn/issues/14306
- confirmed that xgboost has similar performance problems with OpenMP overhead on small datasets
- working on a heuristic that seems to fix the problem on my Apple M4 CPU
- TODO:
- check if the heuristic also works well on a 32 core x86_64 CPU
- profile why the scalability w.r.t. number of threads is not better with large number of cores
- BLAS packaging:
- compared AMD AOCL-BLAS from AMD pip packages with BLIS from conda-forge
- reported discrepancy with default number of threads in BLIS https://github.com/conda-forge/blis-feedstock/issues/45
- TODO: craft a minimal reproducer with OpenBLAS / OpenMP overhead.
- [name=Anne]
- DOC discussions
- [Lucy's PR](https://github.com/scikit-learn/scikit-learn/pull/32715) on issues for new contributors
- [adding AI usage disclosure to guidelines and template](https://github.com/scikit-learn/scikit-learn/pull/32566)
- [Draft PR](https://github.com/scikit-learn/scikit-learn/pull/32737) for testing pyrefly (to potentially replace mypy)
- [Issue on passing class vs. instance](https://github.com/scikit-learn/scikit-learn/issues/32719) (occurred in Pipeline, currently not caught in all meta-estimators), related to [my other PR](https://github.com/scikit-learn/scikit-learn/pull/32565) (where a better error will be raised when calling `get_tags()`)
- [name=François]
- Callbacks : organised a meeting next week to discuss alternatives for the implementation
- RandomState : looking at the previous issues / SLEPs / discussions regarding the use of RandomState objects in estimators with the idea of restarting the discussion around supporting the Generators and either fixing the behaviour of cloned estimators or the doc, as they are in contradiction right now.
- CI : working with Loïc on some issues for the migration from Azure to GHA:
- [adding doc tests to GHA](https://github.com/scikit-learn/scikit-learn/pull/32730)
- [adding pytest soft dependency test to GHA](https://github.com/scikit-learn/scikit-learn/pull/32738)
- [automate the labelling of PRs with failed linting](https://github.com/scikit-learn/scikit-learn/pull/32751)
- [name=Arturo]
- Took over [Refactor make_classification](https://github.com/scikit-learn/scikit-learn/pull/32476)
- [name=Loïc]
- scikit-learn
- scikit-learn 1.8 release has never been closer :wink:
- release candidate soonish: today or tomorrow? draft rc PR https://github.com/scikit-learn/scikit-learn/pull/32766. Interested to have a call with Jérémie to see what is missing.
- one PR to decide on `LogisticRegression` + `LogisticRegressionCV` `penalty` deprecation https://github.com/scikit-learn/scikit-learn/pull/32659. Actually 2: Olivier added one more :sweat_smile:.
- planning for 1.8 release beginning December so we have a bit of time to adjust before Christmas break (just in case we need a quick bug-fix release)
- Stefanie's PR: autoclose label is now posting a comment (that works) and closing after two weeks (tested in a fork), feed-back in https://github.com/scikit-learn/scikit-learn/pull/32743
- Windows arm64 segmentation fault has disappeared from the CI but can somehow still be reproduced https://github.com/scikit-learn/scikit-learn/pull/32754. TBC.
- conda-lock install hang, unable to reproduce in https://github.com/scikit-learn/scikit-learn/pull/32643 so giving up for now.
- probably a few other not so important things
- [name=Dea]
- [Fixing issue with Column Transformer dotted lines](https://github.com/scikit-learn/scikit-learn/pull/32713) I needed to split [ENH: Display the number of output features](https://github.com/scikit-learn/scikit-learn/pull/31937#issuecomment-3497690242). Targeting the 1.8.1 release probably.
- [name=Guillaume]
- [`skrub` presentation](https://docs.google.com/presentation/d/1ejclBruWTbSQfKk37DBGi_SlgQ1vc6wVWBrAZGP5HHQ/edit?usp=sharing) for AdoptAI
- Feedback on some `skore` PRs related to display
### Discussion points
- [name=Stefanie] Can we ask people from the product team to review Dea's CSS PRs?
- [name=François] Looking for people's opinion on the RandomState situation. Should we just update the doc ? Change the behaviour ? Or take an opportunity to move to Generators and fix the behaviour while doing that ?
- [name=Guillaume] did we look at `ty` from Astral: https://blog.edward-li.com/tech/comparing-pyrefly-vs-ty/
- [name=Loïc] what are we trying to fix? The time taken by mypy in pre-commit? Seems like a distraction to me, especially because we are not using typing much and we don't really plan to in the foreseeable future ...
- [name=Olivier] conda-lock freeze happened 3 times to me this morning.
- [name=Loïc] weird ... feel free to take a look at https://github.com/scikit-learn/scikit-learn/pull/32643 and make suggestions about what I am missing ... maybe enabling verbose makes the bug disappear (only half-joking :wink:)
- [name=Dea]
- Just FYI, this bug has been there for years (it predates the first HTML Display PR): [Fixing issue with Column Transformer dotted lines](https://github.com/scikit-learn/scikit-learn/pull/32713)
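
Side note on the RandomState discussion point above: a minimal NumPy-only sketch of the behaviours at stake. It assumes (to be double-checked against the current implementation) that cloning an estimator deep-copies a `RandomState` instance passed as `random_state`; no scikit-learn code is involved here, and the seed value is arbitrary.

```python
import copy

import numpy as np

# A deep-copied RandomState carries the same internal state,
# so the copy replays exactly the same stream as the original.
rs = np.random.RandomState(0)
rs_copy = copy.deepcopy(rs)
copies_agree = np.array_equal(rs.uniform(size=3), rs_copy.uniform(size=3))

# A shared RandomState object is mutated by each draw,
# so consecutive consumers see different numbers.
shared = np.random.RandomState(0)
first = shared.uniform(size=3)
second = shared.uniform(size=3)
shared_advances = not np.array_equal(first, second)

# With the Generator API, statistically independent child streams
# can be derived explicitly from a single seed.
child_seeds = np.random.SeedSequence(0).spawn(2)
gen_a, gen_b = (np.random.default_rng(s) for s in child_seeds)
children_differ = not np.array_equal(gen_a.uniform(size=3), gen_b.uniform(size=3))

print(copies_agree, shared_advances, children_differ)
```

Whether clones should replay the same stream or share one mutable object is exactly the design question raised; the `SeedSequence.spawn` pattern is one possible way to make the choice explicit if Generators were supported.
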
## 2025-11-10
### Progress reports
- [name=Dea]
- Worked on PR - ready for feedback [ENH: Display the number of output features](https://github.com/scikit-learn/scikit-learn/pull/31937#issuecomment-3497690242)
- Opened PR on Kaggle docker-python [Bump scikit-learn version from 1.2.2 to 1.7.2](https://github.com/Kaggle/docker-python/pull/1513)
- [MAINT Cleaning up old scipy version mentions and code](https://github.com/scikit-learn/scikit-learn/pull/32685)
- [CI Update ubuntu_atlas lock file](https://github.com/scikit-learn/scikit-learn/pull/32653),
- [MAINT Clean up after Python 3.11 bump](https://github.com/scikit-learn/scikit-learn/pull/32646)
- [MAINT Clean up after Scipy min version to 1.10](https://github.com/scikit-learn/scikit-learn/pull/32615)
- Closed PR - couldn't remove old scipy version code [MAINT Clean up old scipy version code](https://github.com/scikit-learn/scikit-learn/pull/32654)
- [name=Loïc]
- scientific Python developer summit
- discussions on various topics
- better handling of the repo activity https://github.com/scientific-python/summit-2025-nov/issues/6
- [google/triage-party](https://github.com/google/triage-party#triage-party-in-production) temporary home, may be hosted somewhere by scientific-python: https://sklearn.triage-party.mriduls.com/s/home
- example usage by kubernetes/minikube: https://teaparty-tts3vkcpgq-uc.a.run.app/s/daily
- admin
- the scikit-learn thanks.dev money (~240$) managed to find its way to our OpenCollective account after a bit more than a week
- NASA ROSES invoice for October
- scikit-learn
- triage last week
- took part in recovering the accidentally deleted scikit-learn.github.io repo :sweat_smile:
- reviewed+merged sponsors page reorg by François G: https://github.com/scikit-learn/scikit-learn/pull/32642
- merged final Python 3.10 -> 3.11 by Dea: https://github.com/scikit-learn/scikit-learn/issues/32650
- merged dependabot PR: https://github.com/scikit-learn/scikit-learn/pull/32629
- merged my own PR with one approval to add example dependencies to our bumping script: https://github.com/scikit-learn/scikit-learn/pull/32557
- macOS-15-intel Azure brownout https://github.com/scikit-learn/scikit-learn/pull/32649
- meson
- shorter path for Cython generated files to avoid Windows MAX_PATH limitation: https://github.com/mesonbuild/meson/pull/15219
- seems like Cython has limitations with spaces in paths for dependency files https://github.com/mesonbuild/meson/issues/15227
- threadpoolctl
- merged Pyodide fix with deprecated method in Pyodide 0.29 https://github.com/joblib/threadpoolctl/pull/201
- investigating main CI failures https://github.com/joblib/threadpoolctl/pull/203
- [name=Gael]
- Gave a talk on skrub at dotAI, the gist being: we need better composable, reusable primitives in data-science
- I did some AI-assisted live coding in a 20-minute talk, in front of a mixed audience of 600 people. Was scary/fun
- Merged some PRs:
- An example of defining selectors for columns with outliers https://skrub-data.org/dev/modules/multi_column_operations/advanced_selectors.html#custom-criteria-in-filter-example-of-selecting-columns-with-outliers
- Running the TableReport on polars when pyarrow is not installed
- Did a PR on selector docs: outline, formulation and see-alsos
- https://github.com/skrub-data/skrub/pull/1745 (if people want to review :) )
- Will need to do a big-picture presentation on what is going on in scikit-learn
- Happy to take input on what should be in
- [name=Adrin]
- Doc repo drama continues...
- Talk @Open Science days @Max Planck
- scikit-learn triage this week
- [name=Arturo]
- Scientific Python developer summit : [CI tool to test jupyterlite](https://github.com/scientific-python/summit-2025-nov/issues/13)
- [name=Stefanie]
- further work on Autoclose bot (PR [CI Add "autoclose" workflow by label setting](https://github.com/scikit-learn/scikit-learn/pull/32504))
- [DOC add information on 'needs triage' label in contribution guide](https://github.com/scikit-learn/scikit-learn/pull/32574) merged
- scientific python summit
- learning
- followed [python packaging tutorial](https://packaging.python.org/en/latest/tutorials/packaging-projects/)
- back on Linear Algebra (geometric interpretation of dot product, cosine similarity, projections)
- [name=Guillaume]
- Scientific Python Dev Summit
- Thoughts around tutorials
- TODO:
- Write annual report for the Wellcome Trust grant
- Follow-up Scientific Python admin
- Review Dea and Anne PRs
- [name=Anne]
- Started on Displays by taking over PR on [setting custom `xlim`/`ylim` in DecisionBoundaryDisplay](https://github.com/scikit-learn/scikit-learn/pull/31693)
- Follow-up PR from updating contributing documentation on [pre-commit instructions](https://github.com/scikit-learn/scikit-learn/pull/32664)
### Discussions
- [name=Guillaume] Is it normal that I get notified that I got funded by thanks.dev?
## 2025-10-27
### Progress reports
- [name=Stefanie] (on vacation)
- finished work on PR [ENH Add zero division handling to cohen_kappa_score](https://github.com/scikit-learn/scikit-learn/pull/31172); happy to get this into 1.8 as planned (reviewed by Adrin, Jérémie and Virgil; needs final review)
- PR [FIX classification metrics raise on empty input](https://github.com/scikit-learn/scikit-learn/pull/32549) (supersedes [#31187](https://github.com/scikit-learn/scikit-learn/pull/31187)) (needs reviews)
- PR [MNT bump to Python 3.11 for pymin_conda_forge_openblas_min_dependencies](https://github.com/scikit-learn/scikit-learn/pull/32530) (merged with Loic)
- took over PR [DOC: Clarify recommended usage of fit_transform() vs fit().transform() in TargetEncoder](https://github.com/scikit-learn/scikit-learn/pull/32347) (merged with Arturo)
- from past weeks:
- PR [DOC Simplify metadata routing example and add short example to docstrings](https://github.com/scikit-learn/scikit-learn/pull/32191) (reviewed by Antoine, awaiting second reviewer)
- PR [CI Add "autoclose" workflow by label setting](https://github.com/scikit-learn/scikit-learn/pull/32504) ready for more reviews
- reviews and other things:
- reviewed [DOC Clearer linear "get your development environment" setup documentation](https://github.com/scikit-learn/scikit-learn/pull/32509) (looks nice, close to being merged)
- reviewed [FIX Infer pos_label in Display method from_cv_results](https://github.com/scikit-learn/scikit-learn/pull/32372) to unplug blockage in Displays
- unblocking reviews needed for some of Lucy's PRs, before Anne can start, especially [ENH add from_cv_results in PrecisionRecallDisplay (single Display)](https://github.com/scikit-learn/scikit-learn/pull/30508)
- reviewed [MNT Add example dependencies to version bumping script](https://github.com/scikit-learn/scikit-learn/pull/32557)
- reviewed [Add specific error message when users pass estimator class instead of instance to is_regressor() and co.](https://github.com/scikit-learn/scikit-learn/pull/32565)
- found a few tasks for Anne
- issue/discussion on bumping scipy-version for array api ([Array API test failure for probabilistic metrics with scipy==1.15.0](https://github.com/scikit-learn/scikit-learn/issues/32552))
- Birdaro training sessions on preparing collaborative work (no directly applicable learnings, but exchange)
- [name=Antoine]
- continued [FIX instability of gcv_mode="svd" in RidgeCV](https://github.com/scikit-learn/scikit-learn/pull/32506)
- reviews
- [FIX: Reduce bias of covariance.MinCovDet with consistency correction](https://github.com/scikit-learn/scikit-learn/pull/32117)
- [FEA Add support for arbitrary metrics and informative initialization to MDS](https://github.com/scikit-learn/scikit-learn/pull/32229)
- [name=Olivier]
- many meetings at probabl..., including open source team priority and organization
- array API reviews and merges!
- followed-up on triaged PR from previous weeks (QDA, ...)
- review of upstream fixes for CPython 3.14 free-threading in python-lz4 and impact on joblib and scikit-learn
- lock file fixes and polars regression min reproducer
- started to draft a skrub text embedding + PyTorch ridge classification on google colab: started with polars code:
- [WIP] https://colab.research.google.com/drive/1S03Ry3726urs9I46iS4NowcVcD9V-3Oh?usp=sharing
- WIP reviewing the MAE criterion PR: https://github.com/scikit-learn/scikit-learn/pull/32100
- [name=Anne]
- refined [DOC Clearer linear "get your development environment" setup documentation](https://github.com/scikit-learn/scikit-learn/pull/32509)
- joined conversation on [AI tools like Copilot Coding Agent don't know about / don't respect our Automated Contributions Policy](https://github.com/scikit-learn/scikit-learn/issues/31679#issuecomment-3450994191) and opened [DOC add paragraph on "AI usage disclosure" to Automated Contributions Policy and PR Template](https://github.com/scikit-learn/scikit-learn/pull/32566)
- took over (probably AI generated) PR on [Add specific error message when users pass estimator class instead of instance to is_regressor() and co.](https://github.com/scikit-learn/scikit-learn/pull/32565)
- [name=Guillaume]
- Review PR from Dea regarding showing the number of features in the HTML representation
- Reported a bug when `remainder="passthrough"` when displaying the HTML representation of a `ColumnTransformer` in a `Pipeline`: [issue](https://github.com/scikit-learn/scikit-learn/issues/32146#issuecomment-3450807154)
- Review PR from Lucy regarding overwriting kwargs in `Display`: [PR](https://github.com/scikit-learn/scikit-learn/pull/32313)
- Had a look at the list of PRs to take over regarding the displays
- Many meetings as well
- [name=Dea]
- Worked on PR [ENH: Display the number of output features](https://github.com/scikit-learn/scikit-learn/pull/31937)
- Spent some time with [Dark mode on documentation may not work as intended](https://github.com/scikit-learn/scikit-learn/issues/32354)
- PR was merged [CI Remove python 3.10 wheels](https://github.com/scikit-learn/scikit-learn/pull/32522)
- PR was merged [MNT Bump to Python 3.11 for remaining pymin CI builds](https://github.com/scikit-learn/scikit-learn/pull/32555)
- [name=Arturo]
- DOC Clarify decision trees complexity [#32583](https://github.com/scikit-learn/scikit-learn/pull/32583)
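
Side note on the "estimator class instead of instance" PR mentioned above: a minimal, standard-library-only sketch of the kind of check involved. `DummyEstimator` and `check_is_instance` are hypothetical names used for illustration only; they are not scikit-learn APIs and the PR may implement the check differently.

```python
import inspect


class DummyEstimator:
    """Hypothetical stand-in for an estimator (not a scikit-learn class)."""

    def fit(self, X, y=None):
        return self


def check_is_instance(estimator):
    """Raise a targeted error if a class was passed instead of an instance."""
    if inspect.isclass(estimator):
        raise TypeError(
            f"Expected an estimator instance, got the class "
            f"{estimator.__name__} instead. Did you forget the parentheses, "
            f"e.g. {estimator.__name__}()?"
        )
    return estimator


# An instance passes through unchanged.
est = check_is_instance(DummyEstimator())

# A class triggers the dedicated error message.
try:
    check_is_instance(DummyEstimator)
except TypeError as exc:
    message = str(exc)
    print(message)
```
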
### Discussion points
- [name=Olivier] start with focus task force this week or discuss it first at the monthly meeting?
- First focus group (Antoine + Olivier):
- tree-based model comparative investigation with xgboost/lightgbm/catboost on many class and thread-based parallelism problems.
- [name=Arturo] Make make_classification draw from the same distribution regardless of n_samples [#32405](https://github.com/scikit-learn/scikit-learn/issues/32405)?
- [name=Guillaume] Issue related to the HTML diagram regarding the feature count.
## 2025-10-13
### Progress reports
- [name=Olivier]
- Addressed some pending reviews on `sample_weight` doc:
- https://github.com/scikit-learn/scikit-learn/pull/30564
- Follow-up on `d2_log_loss_score` / `d2_brier_score` merges by registering named scorers and common tests + performance fix
- https://github.com/scikit-learn/scikit-learn/pull/32356
- array API
- reviews
- explored testing against dpnp https://github.com/scikit-learn/scikit-learn/pull/32460
- mostly works on CPUs
- probably need non-default driver to get Intel GPU working
- Investigating build slow-down on the weekly lockfile update
- Triaging + attending a 2-day Inria event this week
- TODO: bluesky thread on estimating prediction intervals
- [name=Arturo]
- [DOC Clarify splitter criteria in Random Forest and Decision Tree #32416](https://github.com/scikit-learn/scikit-learn/pull/32416)
- [Make make_classification draw from the same distribution regardless of n_samples #32405](https://github.com/scikit-learn/scikit-learn/issues/32405)
- [DOC Expand description of random_state in make_classification #32406](https://github.com/scikit-learn/scikit-learn/pull/32406)
- [name=Antoine]
- reproducer [BUG RidgeCV with gcv_mode="svd" is unstable](https://github.com/scikit-learn/scikit-learn/issues/32459), originally observed in [FEAT Polynomial Chaos Expansions](https://github.com/scikit-learn/scikit-learn/pull/27842)
- explore sample weight in SGD
- [name=François]
- explored 3 variations to manage the callback context : [private fit function](https://github.com/jeremiedbb/scikit-learn/pull/18), [decorator around fit](https://github.com/jeremiedbb/scikit-learn/pull/19), and [dynamically wrap the fit through a mixin](https://github.com/jeremiedbb/scikit-learn/pull/20)
A meeting will decide between these 3 strategies.
- working on tests to handle an estimator that does not support callback as a child of a meta-estimator that does and vice versa
- Various PRs for deprecation clean-ups for 1.8:
- [Deprecation of response_method=None in make_scorer](https://github.com/scikit-learn/scikit-learn/pull/32457)
- [Deprecation of copy_X in TheilSenRegressor](https://github.com/scikit-learn/scikit-learn/pull/32456)
- [Remove the positional arg deprecation warning for groups param in RFE](https://github.com/scikit-learn/scikit-learn/pull/32454)
- [Rename force_all_finite to ensure_all_finite](https://github.com/scikit-learn/scikit-learn/pull/32452)
- [name=Loïc]
- scikit-learn
- contributing doc restructuring
- easiest thing seems to be a linear "get your dev environment" setup (I find [matplotlib's one](https://matplotlib.org/devdocs/devel/development_setup.html) has the right amount of details) https://github.com/scikit-learn/scikit-learn/issues/32475
- for more unstructured WIP thoughts look at https://hackmd.io/X8T3WmGBRj6g4DTUxhEaoA
- fix weird build issues (modifying a .pxd would not rebuild associated files with rebuild from scratch work-around) https://github.com/scikit-learn/scikit-learn/pull/32420. Planning to open a meson issue longer-term because we ended up there because of our work-around for too long generated paths on Windows.
- opened a tracking issue (with raw brain dump on some aspects) for Azure -> GHA migration. Thomas S may be motivated to help.
- check changelog has link to changelog instructions in the build log: https://github.com/scikit-learn/scikit-learn/pull/32464
- use absolute imports in Cython code meta-issue done: https://github.com/scikit-learn/scikit-learn/issues/32315.
- cython-lint used for linting (PR by MarcoGorelli, tweaks and mistakes by me)
- macOS arm64 CI on GHA (array API with PyTorch mps backend): https://github.com/scikit-learn/scikit-learn/pull/32349
- I commented on the ppc64le (IBM-specific) wheels and Adrin closed it. Main reason: numpy is not doing it so we won't do it either.
- scipy
- 1.16.2 hang on macOS Intel debugged further until an OpenBLAS C reproducer: https://github.com/scipy/scipy/issues/23686#issuecomment-3381958611
- [name=Jérémie]
- reviewed PRs in preparation for 1.8
- finalized https://github.com/scikit-learn/scikit-learn/pull/30134 to make public a function to compute the confusion matrix terms at different thresholds.
- Made comments in Lucy's PR which does the same for any metric to have common code.
- testing different alternatives with François with the callbacks to not depend on public vs private fit in sklearn.
- [name=Stefanie]
- involved in working on displaying link to changelog instructions where contributors can find it with Emily and Loic
- [CI Add link to changelog instructions when check-changelog fails](https://github.com/scikit-learn/scikit-learn/pull/32464) merged, alternative PR [CI Add link to changelog instructions](https://github.com/scikit-learn/scikit-learn/pull/31954) closed
- working on CI action on setting autoclose label in [test repo](https://github.com/StefanieSenger/GitHub-Actions-test-repo/blob/main/.github/workflows/autoclose.yml) and discussed with Loic
- issue [RFC: Proposal for autoclose option for non-compliant PRs](https://github.com/scikit-learn/scikit-learn/issues/32207)
- please get involved with your suggestions
- looking through [Displays and Visualisation project board](https://github.com/orgs/scikit-learn/projects/10/views/2) for getting an overview
- reviewed [DOC: Clarify recommended usage of fit_transform() vs fit().transform() in TargetEncoder](https://github.com/scikit-learn/scikit-learn/pull/32347), which needs a second reviewer
- [name=Anne]
- preparing a first draft for restructuring the contributing docs, starting with [linear description for setting up development environment](https://github.com/scikit-learn/scikit-learn/issues/32475)
- [name=Gael]
- skrub: adding an optional connection to optuna for hyper-parameter search on the DataOps
- this week: P16 days: meeting on broader funding of open source in French academia. Projects present: MAPIE, AEON, tslearn, skrub....
- [name=Dea]
- Worked on PR [ENH: Display the number of output features](https://github.com/scikit-learn/scikit-learn/pull/31937)
- Opened issue about [Dark mode on documentation may not work as intended](https://github.com/scikit-learn/scikit-learn/issues/32354)
- Tried to debug part of previous issue [https://github.com/scikit-learn/scikit-learn/pull/32458](https://github.com/scikit-learn/scikit-learn/pull/32458)
- Commented on [MAINT add jupyter extension and pre-commit in devcontainer](https://github.com/scikit-learn/scikit-learn/pull/32342)
- Commented on [FIX Guess theme based on estimator parent node color](https://github.com/scikit-learn/scikit-learn/pull/32477)
- Commented on [DOC Clearer linear "get your development environment" setup documentation](https://github.com/scikit-learn/scikit-learn/issues/32475)
### Discussion points
- Loïc: switching between sparse array and sparse matrix with scikit-learn config. What's the deprecation strategy on our side? https://github.com/scikit-learn/scikit-learn/pull/31177
- Olivier: `RidgeCV` bug: any potentially fixable root cause?
## 2025-09-15
### Progress reports
- [name=Olivier]
- New feature idea: frequency encoding option for `OrdinalEncoder`: https://github.com/scikit-learn/scikit-learn/issues/32161 (please express opinion)
- Got several unrelated pieces of feedback from users wanting to use ML surrogate models to approximate/accelerate and explain slow numerical simulation results. So I decided to get a bit more familiar with the sensitivity analysis literature and related open issues or PRs:
- Explored the use of Sobol indices as an alternative to permutation importance (or SAGE):
- https://github.com/scikit-learn/scikit-learn/issues/22453#issuecomment-3284608178
- PR on polynomial chaos expansion with analytical value of Sobol indices for that model
- https://github.com/scikit-learn/scikit-learn/pull/27842
- Thinking about how to document the use of feature importance and the need for unbiased feature importance in RFs and co:
- https://github.com/scikit-learn/scikit-learn/pull/31279
- good way to leverage RFECV (compared to MDI which is likely to fail pruning noisy high cardinality features)
- more efficient than using Permutation Importance
- [name=Gael] (not here, updating about skrub)
- Reporting on skrub: we're mostly working on improving the documentation and having custom error messages that help users figure out what's wrong
- [name=Arturo]
- [DOC Rework Decision boundary of semi-supervised methods #32024](https://github.com/scikit-learn/scikit-learn/pull/32024)
- [DOC Rework StackingRegressor example and add SuperLearner #32163](https://github.com/scikit-learn/scikit-learn/pull/32163)
- Minor reviews
- [name=Stefanie]
- worked on PR [MNT Refactor and deprecate get_metadata_routing method in _MetadataRequester](https://github.com/scikit-learn/scikit-learn/pull/31695) after Adrin's and Antoine's reviews
- reviewed
- [FIX (SLEP6) descriptor shouldn't override method](https://github.com/scikit-learn/scikit-learn/pull/32111)
- [DOC Rework Decision boundary of semi-supervised methods example](https://github.com/scikit-learn/scikit-learn/pull/32024)
- [name=Jérémie du Boisberranger]
- Finalized release 1.7.2
- freezes on conda-forge feedstock for windows build
- need to lower the CI timeout from 6h to maybe 2h
- Working on Callbacks with François https://github.com/jeremiedbb/scikit-learn/pull/11
- Reviewed François's PRs to clean up deprecations for 1.8
- Reviewed Christian's PR to clean LR deprecation for 1.8 https://github.com/scikit-learn/scikit-learn/pull/32073#pullrequestreview-3211646078
- needs a second pair of eyes
- Reviewed Guillaume's PR to improve HTML repr https://github.com/scikit-learn/scikit-learn/pull/31969
- need someone with better css knowledge to review if possible
- reverted deprecation of public murmurhash3
- [name=Emily]
- [Nystroem Array API compatibility PR](https://github.com/scikit-learn/scikit-learn/pull/29661) is ready for review. Certain utility functions are repeated from another function and I wonder if we can add them into the array API internal util file
- [D2 Brier score User Guide doc rendering](https://github.com/scikit-learn/scikit-learn/issues/32174)... how to reproduce? (@Stefanie)
- [name=Guillaume]
- Mainly some HTML related PRs.
- [name=Adrin]
- mlflow <-> skops
- some metadata routing work
- blog post on AI contributions
- couple of reviews
- [name=Shruti]
- Working on PR of [Deprecate use of probability=True in SVC and NuSVC](https://github.com/scikit-learn/scikit-learn/pull/32050) lots of tests using CustomSVC to adapt
- Working on [SAG tests](https://github.com/scikit-learn/scikit-learn/pull/31675) PR, weighted regression based convergence is still not passing but weighted classifier tests are working
- Finalising stricter gradient checks for Gaussian Processes [PR](https://github.com/scikit-learn/scikit-learn/pull/31543)
- Opened [PR](https://github.com/scikit-learn/scikit-learn/pull/31888) raising ValueError for logistic regression with high values and liblinear
- Review of LabelPropagation [PR](https://github.com/scikit-learn/scikit-learn/pull/31924) by dschult
### Discussion points
- (Stefanie) What do you read for tech information?
- (Stefanie) Re-open [Add links to examples from the docstrings and user guide](https://github.com/scikit-learn/scikit-learn/issues/30621) for sprint at PyData Paris?
- (Adrin) Categorical kmeans-like clustering: https://github.com/scikit-learn/scikit-learn/issues/32115
- (Guillaume) Sprint Developer Summit Python in Copenhagen (Scientific Python)
- (Adrin) Olivier's feature request on categorical encoding
- (Olivier) conda-forge freeze: at build time or at test time? Would pytest's `faulthandler` help?
- https://github.com/pytest-dev/pytest/pull/13679 (still not merged but 1 green review)
- (Olivier) ping Charlie or Thomas for CSS reviews?
- (Stefanie) reproduce Brier score section rendering issue
- (Jérémie) to Olivier: François not in both invites ?
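
Side note on the `faulthandler` question above: a minimal sketch of what a timeout-based traceback dump looks like with the standard-library `faulthandler` module, which is the mechanism behind pytest's `faulthandler_timeout` option. `run_with_watchdog` is a hypothetical helper name and the 60 second timeout is an arbitrary value; whether this would help diagnose the conda-forge freeze is still an open question.

```python
import faulthandler
import sys


def run_with_watchdog(func, timeout):
    """Run func(); if it takes longer than `timeout` seconds, dump the
    tracebacks of all threads to stderr (without killing the process)."""
    faulthandler.dump_traceback_later(timeout, exit=False, file=sys.stderr)
    try:
        return func()
    finally:
        # Disarm the watchdog once the call has returned or raised.
        faulthandler.cancel_dump_traceback_later()


# Fast call: the watchdog is armed and cancelled without printing anything.
result = run_with_watchdog(lambda: sum(range(1000)), timeout=60)
print(result)
```

On a hanging test, the dumped tracebacks would at least show in which call each Python thread is stuck, which helps tell a build-time freeze from a test-time one.
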