Scikit-learn bi-weekly progress status (even weeks)

Goal: internal communication on recent work and short term planning
Who: people working on the maintenance of scikit-learn and related projects, in particular at probabl and Inria, and maybe others
Frequency: every other Monday at 15:00 CET/CEST, unless it happens on the same day as the Monthly meeting.
Where: https://meet.google.com/xdm-ozyn-pgj
Meeting notes: to be archived on the scikit-learn org repo

Next meeting templates

Rules of the game:

No questions during the progress reports
Add those questions to the discussion points

Progress reports

Someone
- item
- item

Discussion points

2025-07-07

Progress reports

Olivier
- Preparing an advanced workshop on the use of skrub expressions for time series forecasting.
- Triaging this week.
- Off on Friday for two weeks (till end of July).
Adrin
- scikit-learn
  - review
    - preventing AI bots: https://github.com/scikit-learn/scikit-learn/pull/31643
    - new cibuildwheel: https://github.com/scikit-learn/scikit-learn/pull/31688
    - new blocked users (llm generating bots / users): https://github.com/adrinjalali/agents-to-block/pull/1
    - nvidia RAPIDS: https://github.com/scikit-learn/scikit-learn/pull/31682
    - ClassicalMDS: https://github.com/scikit-learn/scikit-learn/pull/31322
    - set_config / get_config doc improvement: https://github.com/scikit-learn/scikit-learn/pull/31486
    - return final cross-validation score in SequentialFeatureSelector : https://github.com/scikit-learn/scikit-learn/pull/31483
    - Recommend setting array_api_dispatch globally in array API docs: https://github.com/scikit-learn/scikit-learn/pull/31687
    - FunctionTransformer feature names out: https://github.com/scikit-learn/scikit-learn/pull/31573
    - FIX wrong >= in error message in _locally_linear_embedding: https://github.com/scikit-learn/scikit-learn/pull/29716
    - example links
      - https://github.com/scikit-learn/scikit-learn/pull/31561
      - https://github.com/scikit-learn/scikit-learn/pull/31405
  - issues
    - copilot PRs: https://github.com/scikit-learn/scikit-learn/issues/31679
    - regression error characteristic curve: https://github.com/scikit-learn/scikit-learn/issues/31441
- RadiusClustering project inclusion: https://github.com/scikit-learn-contrib/scikit-learn-contrib/issues/71
- GH community discussion on LLMs: https://github.com/community/maintainers/discussions/559
- SLEP6 refactoring: https://github.com/scikit-learn/scikit-learn/pull/31534
Guillaume
- Nothing tangible on the scikit-learn side
Antoine
- review Fix sample weight handling in SAG(A)
- review Added sample weight handling to BinMapper under HGBT
Gaétan
- PR on feature importance is kinda stalled
- ongoing discussions with Christian on the relevance of the PR
Stefanie
- new refactoring PRs in the metadata routing module
  - MNT Refactor get_metadata_routing method in MetadataRequest to directly use _get_metadata_request instead
  - MNT little refactor and doc improvement for metadata routing consumes() methods
- reviewed
  - MNT refactoring in routing _MetadataRequester
- possibly in future
  - RouterMappingPair displays ["mapping", "router"] even if router is a request
  - in ColumnTransformer "remainder" is in the routing dict (empty) also if remainder=drop
  - use process_routing in example?
Arturo
- Announce the launch of skolar in the scikit-learn blog #217
- Change way we enumerate exercises and quizzes in the mooc #848
- Need help with finishing the clustering module of the mooc #836
Dea
- I was off for 6 days
- WIP on ENH Display Methods in HTML representation
- Opened issue Pipelines are permitted to have no steps and are displayed as fitted
- PR ready for feedback: ENH Display fitted attributes in HTML representation
- PR ready for feedback: ENH Adding a column to link each parameter docstring row in params table display. 3rd-party models are not supported (didn't find the setting that Olivier mentioned in the last call regarding ReprHTMLMixin)
- Kept working on global_random_seed PR for Logistic Regression
Shruti
- Working on HGBT PR reviews: https://github.com/scikit-learn/scikit-learn/pull/29641
- Opened SAG(A) PR and having discussions on initial fix there: https://github.com/scikit-learn/scikit-learn/pull/31675
- GPR and GPC test: https://github.com/scikit-learn/scikit-learn/pull/31543 (had to look into scipy.differentiate.derivative vs. approx_fprime)
Loïc
- back from holidays
- small meson related clean-up PR https://github.com/scikit-learn/scikit-learn/pull/31718
- merged wheel size reduction through Cython shared utility https://github.com/scikit-learn/scikit-learn/pull/31151
- work coming up on bot linting comment
Jérémie
- focus on 1.7.1 issues/PRs
  - last one is https://github.com/scikit-learn/scikit-learn/pull/31553
  - add to milestone stuff that needs to be in 1.7.1
- CI/maintenance

Discussion points

GPC/GPR: approximate gradient computation: isn't it possible to have an explicit symbolic expression of the gradient?
Linting/comment bot: open an issue?
- A private issue + PR to test the mechanism.
Feedback from the monthly meeting about requiring 2FA at the scikit-learn org level?
- no objection, sklearn-ci bot needs 2FA set-up
1.7.1
SkrubPipeline and TableVectorizer could be extended to act as meta-data routers if someone is looking for a way to contribute to skrub.
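On the GPC/GPR gradient question above: a common way to test an explicit gradient is to compare it against a finite-difference approximation. The sketch below does this for the GPR log-marginal likelihood with scipy.optimize.approx_fprime. The data, kernel, and tolerances are arbitrary choices, and this is not the test from the PR.

```python
import numpy as np
from scipy.optimize import approx_fprime
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy 1D regression problem (illustrative only).
rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=(20, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=20)

# optimizer=None keeps the kernel hyperparameters at their initial values.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.1, optimizer=None)
gpr.fit(X, y)

# theta holds the log-transformed kernel hyperparameters.
theta = gpr.kernel_.theta

# Explicit gradient as computed by scikit-learn.
_, analytic_grad = gpr.log_marginal_likelihood(theta, eval_gradient=True)

# Forward finite-difference approximation of the same gradient.
numeric_grad = approx_fprime(
    theta, lambda t: gpr.log_marginal_likelihood(t), epsilon=1e-6
)

gradients_agree = np.allclose(analytic_grad, numeric_grad, rtol=1e-3, atol=1e-3)
```

The finite-difference value is only a check: its accuracy depends on the step size, which is one reason to prefer an explicit expression when one is available.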

2025-06-23

Progress reports

Guillaume
- Not much on scikit-learn
- Send contract info for Francois and Anne
- Triage week this week
Stefanie
- new PRs
  - FIX use pyarrow types in pyarrow.filter() for older pyarrow versions
  - CI Move some pip_dependencies to conda_dependencies
- reviews:
  - MNT refactoring in routing _MetadataRequester
  - MNT Deprecate metrics.pairwise.paired_*_distances and paired_distances public functions
  - and a lot of reviews on small or doc PRs
- learning on python metaclasses
Gaétan
- opened an issue on improper rendering of docstring in the documentation for attributes w/ @property
- small PR to fix a mistake in plot_forest_importance example
- more work on the feature importance PR
- more experimentation on the side
Antoine
- continue sample_weight in forest 31529
- review Gaétan's PR on unbiased feature importance 31279
Adrin
- Accepted Anne for an internship on Displays & maintenance on CZI, starting October
- alt3 auto routing: https://github.com/scikit-learn/scikit-learn/pull/31413
- reference to higher level functions in generate_checks_for_instance: https://github.com/scikit-learn/scikit-learn/pull/31480
- metadata routing refactoring: https://github.com/scikit-learn/scikit-learn/pull/31534
- metadata routing visualisation: https://github.com/scikit-learn/scikit-learn/pull/31535
- CI license regression: https://github.com/scikit-learn/scikit-learn/pull/31594
- issues
  - slow metric calculation: https://github.com/scikit-learn/scikit-learn/issues/31554
  - Unjustified "number of unique classes > 50%" warning in CalibratedClassifierCV https://github.com/scikit-learn/scikit-learn/issues/31583
  - Compilation "neighbors/_kd_tree.pyx" crashes on ARM: https://github.com/scikit-learn/scikit-learn/issues/31592
  - Attribute docstring does not show properly when there is a property with the same name: https://github.com/scikit-learn/scikit-learn/issues/31595
- reviews
  - Improve metadata routing docs: https://github.com/scikit-learn/scikit-learn/pull/31419/files
  - Add link to example: https://github.com/scikit-learn/scikit-learn/pull/31425
  - Add link to oob score: https://github.com/scikit-learn/scikit-learn/pull/31457
  - gc.collect in DBSCAN.fit (I say no): https://github.com/scikit-learn/scikit-learn/pull/31526
  - cross_val_score in SequentialFeatureSelector: https://github.com/scikit-learn/scikit-learn/pull/31483
  - set_config, get_config docstrings: https://github.com/scikit-learn/scikit-learn/pull/31486
  - Improve error message in check_requires_y_none : https://github.com/scikit-learn/scikit-learn/pull/31481
  - Update about us page: https://github.com/scikit-learn/scikit-learn/pull/31519
  - link to example
    - https://github.com/scikit-learn/scikit-learn/pull/31485
    - https://github.com/scikit-learn/scikit-learn/pull/31476
    - https://github.com/scikit-learn/scikit-learn/pull/31511
    - https://github.com/scikit-learn/scikit-learn/pull/31504
    - https://github.com/scikit-learn/scikit-learn/pull/31581
  - FeatureHasher , HashingVectorizer tags: https://github.com/scikit-learn/scikit-learn/pull/31557
  - Log file update PRs
  - cross validation score in SequentialFeatureSelector: https://github.com/scikit-learn/scikit-learn/pull/31483
  - Regression in DecisionBoundaryDisplay.from_estimator with colors and plot_method='contour' https://github.com/scikit-learn/scikit-learn/pull/31553
  - avoid using fetch_california_housing: https://github.com/scikit-learn/scikit-learn/pull/31579
  - doc formatting fix: https://github.com/scikit-learn/scikit-learn/pull/31577
  - eps float32 → float64 in GradientBoosting: https://github.com/scikit-learn/scikit-learn/pull/31575
  - minor doc: https://github.com/scikit-learn/scikit-learn/pull/31570
  - responsive multi-column layout for emeritus contributors to reduce whitespace: https://github.com/scikit-learn/scikit-learn/pull/31565
  - use more modern way to specify license metadata: https://github.com/scikit-learn/scikit-learn/pull/31560
  - add store_cv_models option to ElasticNetCV: https://github.com/scikit-learn/scikit-learn/pull/31545
  - Improve Ridge regression example — fix typo, clarify title, add legend: https://github.com/scikit-learn/scikit-learn/pull/31539
  - Fix set CategoricalNB().__sklearn_tags__.input_tags.categorical to True: https://github.com/scikit-learn/scikit-learn/pull/31556
  - Fix example Recursive feature elimination with cross-validation: https://github.com/scikit-learn/scikit-learn/pull/31516
  - ENH Add multi-threshold classification to FixedThresholdClassifier: https://github.com/scikit-learn/scikit-learn/pull/31544
  - Improve older whats_new entries: https://github.com/scikit-learn/scikit-learn/pull/31589
  - fix comparison between array-like parameters when detecting non-default params for HTML representation: https://github.com/scikit-learn/scikit-learn/pull/31528
  - DOC improve doc for _check_n_features and _check_feature_names and fix their usages: https://github.com/scikit-learn/scikit-learn/pull/31585
Dea
- PR was merged FIX comparison between array-like parameters when detecting non-default params for HTML representation
- PR ready for feedback ENH Display fitted attributes in HTML representation
- PR ready for feedback: ENH Adding a column to link each parameter docstring row in params table display. 3rd-party models are not supported (didn't find the setting that Olivier mentioned in the last call regarding ReprHTMLMixin)
- Kept working on global_random_seed PR for Logistic Regression
- Opened issue about LaTeX https://github.com/scikit-learn/scikit-learn/issues/31593
Loïc
- Gaussian mixture with Stefanie merged
- security training
- Second reviewer for GaussianProcessRegressor regression in 1.0 https://github.com/scikit-learn/scikit-learn/pull/31431?
Jérémie
- off last week

Discussion points

Guillaume The windows CI node on Azure is bothering us. Any idea why?
- It looks like any node started with a single physical core will fail
Adrin ClassicalMDS: should it be its own class? https://github.com/scikit-learn/scikit-learn/pull/31322
Loïc require 2FA for scikit-learn organization members? Bots will need to have 2FA set up and added. Does anyone know what the sklearn-wheels bot was used for?
Guillaume document property in sphinx
- I think that we can override the way sphinx works there

2025-06-09

Progress reports

Olivier
- Ongoing work on the clustering lesson for the MOOC:
  - https://github.com/INRIA/scikit-learn-mooc/pull/836
  - still working on wrap-up quiz, probably anomaly detection in time series with repeated patterns.
- Review / discussion about display classes for:
  - permutation importance
  - (unbiased) Gini importance for tree-based models under implementation in #31279
- Investigating Bagging sample weight support and its relationship to how the regularization parameter of LogisticRegression and Ridge regression depends on the number of training data points. #31414
Antoine
- investigate sample_weight in forest estimators (draft PR to come)
- reviews
  - ENH Add Friedman's H-squared
  - FEAT (alt3) allow setting auto routed strategy on objects
Jérémie
- 2nd round of interviews for the callback internship
  - We selected François Paugam
- Release highlights for 1.7.0 then released 1.7.0
- was off wednesday + Thursday
Shruti
- Investigating sag, saga and liblinear (with dual=True) solver:
  - Realised liblinear with dual=True is in fact fine, it just converges more slowly
  - Read up about sag and saga to understand a bit more how the incremental aggregation of the algorithm works
  - [TO DO]: make a minimal reproducer code to open issue for sag, still not sure whether the test is good enough: https://github.com/snath-xoc/sample-weight-audit-nondet/issues/25
Arturo
- Mooc (clustering)
- scikit-learn v1.7 in jupyterlite?

Discussion points

2025-05-26

Progress reports

Olivier
- Worked on a short but updated version of our tutorial on survival analysis with hazardous with Vincent Maladiere: https://github.com/probabl-ai/survival-analysis-tutorial/
- WIP peer-reviewing/editing a new lesson on unsupervised clustering for the MOOC:
  - https://github.com/INRIA/scikit-learn-mooc/pull/836
- WIP fixing remaining thread-safety problems in our test suite and sometimes code to be able to properly test free-threading robustness
  - https://github.com/scikit-learn/scikit-learn/pull/30041
  - Fixed an actual bug in sklearn.tree.export_tree.
  - Use the pytest-run-parallel pytest plugin.
- reviewing simplified sample_weight fix for BaggingRegressor/Classifier:
  - https://github.com/scikit-learn/scikit-learn/pull/31414
Arturo
- Opened an issue about a ValueError being raised by plotly inside of jupyterlite (#31399), fixed really fast by Loïc (#31400)
- A bit of Skrub, a lot of Mooc.
Antoine
- sample weight in bagging in new PR #31414
- this week: reviews
Dea
- PR ENH: Display parameters in HTML representation was merged (https://github.com/scikit-learn/scikit-learn/pull/30763)
- PR TST use global_random_seed in sklearn/decomposition/tests/test_incremental_pca.py was merged (https://github.com/scikit-learn/scikit-learn/pull/31250#event-17756321873)
- PR DOC: Add from_predictions example and other details to visualizations.rst (https://github.com/scikit-learn/scikit-learn/pull/30825)
- Started adding fitted attributes to the HTML display.
- Still working on a draft PR: TST use global_random_seed in sklearn/linear_model/tests/test_logistic.py https://github.com/scikit-learn/scikit-learn/pull/31362
Gaétan
- investigated the impact of the max_features parameter on the feature importance measures: increasing its value "masks" correlated/dependent features
- Received help & input from Antoine on both theory and the PR, especially regarding sample weight testing
- Made a few tweaks to pass the checks on the CI
Guillaume
- skore
  - Presentation at Inria
  - Sealing holes
  - Improving the documentation (there is a User Guide now)
  - Reviewing PRs related to ROC visualization
  - Fixing some issues related to parallelism
- sklearn
  - Super happy about the merge of "Parameters" menu in diagram
  - Discussed topics related to Gaétan's work with the developers of hidimstats
Gael
- skrub
  - reviewing (random state, consistency… small features)
  - improving docs (adding Cleaner to narrative docs)
- sklearn
  - reviewed Dea's PR on admiring parameters (which got merged, hurray)
Loïc
- scikit-learn
  - scikit-learn was selected for github secure OSS fund
  - triage
  - reviewed pairwise kernels API support
    https://github.com/scikit-learn/scikit-learn/pull/29822.
    _fill_or_add_diagonal probably needs to be fixed first in a separate PR
    (doesn't work on non C-contiguous arrays).
  - GaussianMixture array API support with Stefanie: ready for a first
    round of reviews https://github.com/scikit-learn/scikit-learn/pull/30777
  - feeling about mentioning Probabl professional support in Github comments?
    https://github.com/scikit-learn/scikit-learn/issues/31390#issuecomment-2898378489
- PyCon Italia (Wednesday-Friday)
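On the note above that _fill_or_add_diagonal does not work on non C-contiguous arrays: one plausible mechanism (an assumption, not a reading of the actual helper) is writing the diagonal through a flattened array. With NumPy, reshape(-1) returns a view only when the memory layout allows it, so the write can silently land in a copy. The helper below is hypothetical.

```python
import numpy as np


def fill_diagonal_via_flat(array, value):
    """Hypothetical helper: write the diagonal through a flattened array.

    Returns True when the flattened array shares memory with the input,
    that is, when the write actually reached the input array.
    """
    flat = array.reshape(-1)  # a view for C-contiguous input, a copy otherwise
    flat[:: array.shape[1] + 1] = value
    return np.shares_memory(flat, array)


c_ordered = np.zeros((3, 3), order="C")
f_ordered = np.zeros((3, 3), order="F")

c_is_view = fill_diagonal_via_flat(c_ordered, 1.0)
f_is_view = fill_diagonal_via_flat(f_ordered, 1.0)

print(c_is_view, np.trace(c_ordered))  # True 3.0
print(f_is_view, np.trace(f_ordered))  # False 0.0
```

The Fortran-ordered array keeps a zero diagonal because the assignment modified a temporary copy.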
Adrin
- improve plot_grid_search_refit_callable.py: https://github.com/scikit-learn/scikit-learn/pull/30990
- Alternative ways to implement a “default routing”:
  - https://github.com/scikit-learn/scikit-learn/pull/31401
  - https://github.com/scikit-learn/scikit-learn/pull/31413: this one has a green CI and is my preferred way of implementing it for now
- Visualising routing
- Will prepare the talk for PyConIt which is on Friday for me.
- Reviews
  - Tags, nan, SplineTransformer: https://github.com/scikit-learn/scikit-learn/pull/28043
  - OPTICS docstring update: https://github.com/scikit-learn/scikit-learn/pull/31363
Stefanie
- started looking at Christian's review on SplineTransformer (PR #28043), but still in the process
- reviewed Adrin's PRs on adding a strategy for auto routing (PR FEAT (alt3) allow setting auto routed strategy on objects and PR FEAT allow configuring automatically requested metadata)
- new PR on better documentation in metadata routing module: PR DOC Clarify metadata routing docs from _metadata_requests.py module
- some documentation reviews
- fairlearn: further work on PR MNT remove control_features passing into ErrorRate.load_data that was now merged
Vincent
- hazardous: prepared materials and gave a sponsor talk about survival analysis w/ Olivier. (I need to find the recording link)
- skrub:
  - Adding a random state to StringEncoder #1397
  - Coalesce dev dependencies #1404
    - CI was part of the reason why dependencies were split
  - Cleaner docs #1399 and #1407
  - Improving the documentation on how to contribute #1393

Discussion points

Olivier hidimstats discussion in Saclay next week?
- Conditional Permutation Importance makes it possible to detect spurious features that are correlated with predictive features but do not bring anything new on top of the other features.
Guillaume I'm curious about potential takeaways of the max_features experiments.
Guillaume What would be the best representation of "chance level" on PR curve when it comes to cross-validation
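As background for the chance-level question: on a PR curve, a classifier that predicts at random has an expected precision equal to the prevalence of the positive class, so under cross-validation each fold can have its own chance level. A small sketch with made-up fold labels shows the per-fold and pooled values that a display would have to choose between.

```python
from statistics import mean

# Made-up test-fold labels (1 = positive class), purely illustrative.
fold_labels = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
]

# Chance-level precision of each fold is its positive prevalence.
chance_levels = [sum(labels) / len(labels) for labels in fold_labels]

# Alternative: a single chance level pooled over all folds.
pooled = sum(sum(labels) for labels in fold_labels) / sum(
    len(labels) for labels in fold_labels
)

print(chance_levels)  # [0.1, 0.2, 0.3]
print(mean(chance_levels), pooled)
```

With stratified splits the per-fold values nearly coincide, and a single line would then be a fair summary.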

2025-05-12

Progress reports

Stefanie (absent)
- PR MNT remove default behaviour deprecation from class_likelihood_ratios to adhere to new solutions for issue Make zero_division parameter consistent in the different metric
- more work on PR Investigate GaussianMixture array API support with Loic
- getting into PR ENH Add support for np.nan values in SplineTransformer again, but didn't push yet
- reviewed PR FIX ConvergenceWarning in plot_gpr_on_structured_data and some doc reviews
Olivier
- Some reviews for the 1.7 milestone.
- Revived #30041 to be able to use pytest-run-parallel to test free-threading related race conditions
  - pytest-run-parallel was updated to fix most of the problems discovered in my previous attempt to use it on the scikit-learn test suite
  - much lower number of changes required in our test files
  - still a few issues with some thread-unsafe fixtures (but can be tackled via configuration)
  - uses a lot of RAM when running the tests many times in parallel: not sure if this is expected or not
  - test thread-safety inspection has a ~1 min overhead when collecting the tests (before running them)
  - reported some of the remaining problems:
    - https://github.com/Quansight-Labs/pytest-run-parallel/issues?q=author%3Aogrisel
  - 148 errors (before marking compiled as safe), probably most of them caused by the use of thread-unsafe generator objects with @pytest.mark.parametrize
- Explored SBOM generation for vendored dependencies array-api-extra and array-api-compat #31343
  - using the dev version of vendoring
  - requires separating 100% vendored libs from partial backward compat backports instead of mixing both under sklearn/externals/
  - still not enough to be PEP 770 compliant: need to move the SBOM file to the right location when generating the wheel
- Some ongoing discussions about evaluating the business case of a potential R&D collab related to robustness to training poisoning and dataset watermarking.
- Focus this week: preparing educational material on pitfalls of using sklearn models in the context of time series forecasting.
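On the suspected cause of the 148 errors above (generator objects used with @pytest.mark.parametrize): a generator carries state, so it cannot be reused once the test body has run. The sketch below reproduces the effect without pytest, assuming the same parametrized object is passed to every execution. run_test_body is a hypothetical stand-in for a test function.

```python
def run_test_body(values):
    """Hypothetical stand-in for a test function consuming its parameter."""
    return list(values)


# A generator object is created once and shared by every execution.
shared_generator = (i * i for i in range(3))
first_run = run_test_body(shared_generator)
second_run = run_test_body(shared_generator)  # already exhausted

# A list holds no iteration state, so repeated executions see the same data.
shared_list = [i * i for i in range(3)]
third_run = run_test_body(shared_list)
fourth_run = run_test_body(shared_list)

print(first_run, second_run)  # [0, 1, 4] []
print(third_run, fourth_run)  # [0, 1, 4] [0, 1, 4]
```

When the executions happen in concurrent threads, the shared generator additionally becomes a source of races, so materializing the values as a list or tuple is the simpler fix.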
Arturo
- Sprint Unaite x probabl resulting in several doc PRs
  - merged: #31305, #31306, #31307, #31308, #31312, #31313, #31321, #31330
  - waiting for review: #31309, #31314
- Iter on DOC Update plots in Categorical Feature Support in GBDT example #31062
- Review pass on ENH add CAP curve #28972
Guillaume
- Review the work of Dea: https://github.com/scikit-learn/scikit-learn/pull/30763
  - Ready for a second review ;)
- Triage issues and make sure that upcoming PRs get some feedback
- ✅ Tested compatibility imbalanced-learn with scikit-learn==1.7.0rc1
  - Minimum changes due to our usage of some private API
  - Need to check what needs to be done for sklearn-compat looking at the changelog
- Focus on imbalanced classification material (tutorial @ EuroSciPy + masterclass)
- Sprint planned in Zurich on June 16 with Tim and Edo
Dea
- Kept working on issue of global_random_seed. https://github.com/scikit-learn/scikit-learn/issues/22827
- Working on Doc PR https://github.com/scikit-learn/scikit-learn/pull/30825
- Continued giving feedback to Guillaume's "Traces" course (2 issues, 1 PR)
Antoine
- back from holidays
- this week reviews and FIX sample weight in Bagging
Adrin
- short week (off Thursday/Friday)
- reviews
  - Links to examples reviewed by Stefanie, making the review much nicer and easier for me
    - deleting examples/cluster/plot_agglomerative_clustering.py: https://github.com/scikit-learn/scikit-learn/pull/30861/
    - MiniBatchDictionaryLearning example link : https://github.com/scikit-learn/scikit-learn/pull/30864
    - link to user guide in feature_extraction.grid_to_graph: https://github.com/scikit-learn/scikit-learn/pull/30916
    - link under TfIdfVectorizer: https://github.com/scikit-learn/scikit-learn/pull/30974
    - rejected a link to SpectralClustering example: https://github.com/scikit-learn/scikit-learn/pull/30978
    - improve headings in LabelSpreading examples: https://github.com/scikit-learn/scikit-learn/pull/30553
    - example reference for plot_manifold_sphere.py: https://github.com/scikit-learn/scikit-learn/pull/30959
    - link to plot_gpr_on_structured_data example in gaussian_process: https://github.com/scikit-learn/scikit-learn/pull/31150
    - Add link to plot_nnls example: https://github.com/scikit-learn/scikit-learn/pull/31280
    - link to the plot_gmm_covariances example: https://github.com/scikit-learn/scikit-learn/pull/31249
    - link to plot_sparse_cov example: https://github.com/scikit-learn/scikit-learn/pull/31278
  - reduce generated file path length: https://github.com/scikit-learn/scikit-learn/pull/31212
  - improve SimpleImputer error message for fill_value type: https://github.com/scikit-learn/scikit-learn/pull/30828
  - Metadata routing question in fairlearn: https://github.com/fairlearn/fairlearn/issues/4
  - replace_undefined_by in scorers
    - default: 0 or np.nan? Organised a meeting and we found a consensus
    - Add replace_undefined_by to accuracy_score: https://github.com/scikit-learn/scikit-learn/pull/31187
    - Add zero division handling to cohen_kappa_score: https://github.com/scikit-learn/scikit-learn/pull/31172
  - Frequency weighting k-medoids: https://github.com/scikit-learn-contrib/scikit-learn-extra/issues/179
    - We’re not really supporting sklearn-extras, not sure what to do there
  - Exposes latent mean and variance for GPCs: https://github.com/scikit-learn/scikit-learn/pull/22227
    - merged after checking the API and getting Antoine’s and Shruti’s approvals
    - Forgot .. versionadded : https://github.com/scikit-learn/scikit-learn/pull/31320
  - default value of average in precision_recall_fscore_support: https://github.com/scikit-learn/scikit-learn/pull/31270
  - Remove ellipsis from doctests: https://github.com/scikit-learn/scikit-learn/pull/31332
- callback internship shortlist
  - email to schedule times sent
Jérémie
- last reviews for 1.7
- released 1.7.0rc1
- meeting with Stefanie to prepare the interviews for the callback internship
- wrote upper bound policy in the maintainers page https://github.com/scikit-learn/scikit-learn/pull/31345
Loïc
- scikit-learn
  - GaussianMixture array API support with Stefanie: https://github.com/scikit-learn/scikit-learn/pull/30777
    - bug opened in array-api-compat with pytorch 2.7: https://github.com/data-apis/array-api-compat/issues/320
  - reviewed roc_curve array API support: https://github.com/scikit-learn/scikit-learn/pull/30878
  - merged jaccard_score array API support: https://github.com/scikit-learn/scikit-learn/pull/31204
  - use Cython 3.1 (released May 8) for free-threaded build https://github.com/scikit-learn/scikit-learn/pull/31357. Marking extensions as free-threaded-compatible is left for later https://github.com/scikit-learn/scikit-learn/pull/31342
  - remove ellipsis from doctests
    https://github.com/scikit-learn/scikit-learn/pull/31332. Rely on
    scipy-doctest floating point comparison.
    - scipy-doctest bug in dict comparisons: https://github.com/scipy/scipy_doctest/issues/195
  - conda-lock update to 3.0.1: https://github.com/scikit-learn/scikit-learn/pull/31333
  - Use PYTHON_GIL=0 only at test time to avoid interference with conda: https://github.com/scikit-learn/scikit-learn/pull/31341
  - add missing meson generator for a few extensions: https://github.com/scikit-learn/scikit-learn/pull/31346
- joblib
  - add note about joblib.Memory security considerations: https://github.com/joblib/joblib/pull/1722
- need to prepare my PyCon Italia talk in roughly two weeks: "PyPI in the face: running jokes that PyPI download stats can play on you"
Gael Reviewing and design discussion on skrub pipelines (it's Jérome's last week)
- Iterating on names and API
Vincent
- hazardous: AISTATS!
- skrub: many discussions on the skrub pipeline
  - How do we subsample inputs: https://github.com/skrub-data/skrub/pull/1328
  - Adding an optional default to choices: explicit default in choices: https://github.com/skrub-data/skrub/pull/1361
  - Document everything: https://github.com/skrub-data/skrub/pull/1365, https://github.com/skrub-data/skrub/pull/1363
Gaétan
- Mainly worked on adding unbiased feature importance to tree ensembles built with criteria other than MSE and gini, i.e. entropy and friedman mse.
- added support for sample weights
- included unbiased feature importance methods to GradientBoosting

Discussion points

Vincent Could you remind us why we need SBOM generation for vendored dependencies?
- Olivier in case I leave the meeting earlier: I put some motivations in the description of the draft PR itself: https://github.com/scikit-learn/scikit-learn/pull/31343

2025-04-28

Progress reports

Stefanie
- on vacation last week
- reviewed DOC Merge plot_svm_margin.py and plot_separating_hyperplane.py into plot_svm_hyperplane_margin.py and DOC Add link to plot_gmm_pdf.py in GaussianMixture examples
- preview this week:
  - back at PR ENH Add support for np.nan values in SplineTransformer
  - more PRs on undefined classification metrics (issue Make zero_division parameter consistent in the different metric)
    - PR ENH Add zero division handling to cohen_kappa_score
      - ready for review
    - PR ENH Add replace_undefined_by to accuracy_score
      - reviewed, will get back to it this week
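For context on the undefined-metric PRs above, the sketch below shows the existing zero_division parameter of precision_score, where precision is undefined because no positive is predicted. The parameters being added to cohen_kappa_score and accuracy_score are still under review and are deliberately not shown. The labels are made up.

```python
from sklearn.metrics import precision_score

y_true = [0, 0, 1, 1]
y_pred = [0, 0, 0, 0]  # no positive prediction: precision is 0 / 0

# zero_division selects the value returned for the undefined case.
as_zero = precision_score(y_true, y_pred, zero_division=0)
as_one = precision_score(y_true, y_pred, zero_division=1)

print(as_zero, as_one)  # 0.0 1.0
```

Without an explicit zero_division, the same call returns 0.0 and emits an UndefinedMetricWarning.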
Olivier
- Triaging week:
  - investigated and closed as not planned a few reports and feature requests.
  - Others seemed legit and tried to give guidance to contributors and/or pinged other maintainers.
    - New reason to deprecate SVC(probability=True) #31222
    - randomized eigsh: #31246
    - volunteer to expand IterativeImputer to support categorical features #31219
- Investigated free-threading CI build:
  - https://github.com/scikit-learn/scikit-learn/pull/31263
  - I am surprised that Cython did not complain at build time previously, and that it does not fail on regular Python builds.
  - I observed a deadlock on the first commit of this PR. I enabled faulthandler on this build in case it happens again in the future. The deadlock problem can be investigated later in a dedicated debug PR once this one is merged.
- Investigated and fixed unstable test for spectral clustering:
  - https://github.com/scikit-learn/scikit-learn/pull/31262
- Confirmed what I suspected when reviewing the temperature calibration PR:
  - Our sigmoid calibration is broken when the input is predict_proba values: we need to preprocess the input to extract OvR Bernoulli logits and the sigmoid calibration works really well, both for binary and multiclass classifiers:
  - Will soon open a PR, in the meantime here is my branch.
- Following Gaetan's progress on unbiased feature importance on random forests. Draft PR should be up soon.
- Adapted the plot_det example in the CAPCurveDisplay PR to align with recently merged @arturo's update to this example:
  - #28972
- Gave some feedback/reviews on array API PRs but not as much as I wanted.
- Submitted a presentation on probabilistic regression for PyData Paris 2025
- Starting to think about teaching material for time series forecasting. Will probably focus on feature engineering and evaluation of model predictions.
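On the sigmoid calibration point above: one way to read it is that a sigmoid map applied directly to probabilities cannot even represent "leave the input unchanged", whereas applied to logits it can. The sketch below illustrates this with plain Python. The sign convention sigmoid(a * score + b) and the parameter values are assumptions for illustration, not scikit-learn's internal parametrization.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid_calibration(score, a, b):
    """Sigmoid calibration map: sigmoid(a * score + b)."""
    return sigmoid(a * score + b)


probas = [0.05, 0.3, 0.5, 0.9]

# On logits, a=1 and b=0 give back the input probabilities.
on_logits = [sigmoid_calibration(logit(p), 1.0, 0.0) for p in probas]

# On raw probabilities, the same parameters squash everything into
# the interval [sigmoid(0), sigmoid(1)], roughly [0.5, 0.73].
on_probas = [sigmoid_calibration(p, 1.0, 0.0) for p in probas]

print(on_logits)
print(on_probas)
```

For a multiclass classifier, the same transformation would be applied to each class probability taken one-vs-rest.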
Gaetan
- Implemented two methods from the literature to remove the cardinality bias in MDI computations for RFs, by leveraging oob samples. Available for classification and for regression with criterion="squared_error".
- adapted the test suite to use these methods. Added tests that make sure they match the methods from the papers and that they recover classical MDI on inbag samples.
- Did a bit of memory and CPU profiling to make sure that the computational overhead is manageable on large datasets compared to the cost of fitting.
- Draft PR soon
- Still need to choose one method over the other (e.g. find a statistical criterion that may favor one), to add support for sparse input, to implement sample weights
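To make the cardinality bias mentioned above concrete, the sketch below fits a forest on a low-cardinality informative feature plus a continuous feature that is pure noise, and reads the standard MDI from feature_importances_. The data-generating process is made up, and the unbiased methods from the draft PR are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
n_samples = 500

# Binary feature that drives the target, with 20% label noise.
informative = rng.randint(0, 2, size=n_samples)
flip = rng.uniform(size=n_samples) < 0.2
y = np.where(flip, 1 - informative, informative)

# Continuous feature with many unique values, unrelated to the target.
noise = rng.normal(size=n_samples)

X = np.column_stack([informative, noise])
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# MDI is computed on the training samples used to grow the trees.
mdi_informative, mdi_noise = forest.feature_importances_
print(mdi_informative, mdi_noise)
```

The noise feature receives a non-zero MDI because fully grown trees use its many split points to fit the flipped labels.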
Antoine
Dea
- Working on "Improve tests by using global_random_seed fixture to make them less seed-sensitive" https://github.com/scikit-learn/scikit-learn/issues/22827 - 3 PRs merged, 1 open: https://github.com/scikit-learn/scikit-learn/pull/31250
- Opened PR on Guillaume's Traces-sklearn
- Working on tests for (Display parameters in HTML representation) https://github.com/scikit-learn/scikit-learn/pull/30763
Arturo
- Some reviews
- Got my PR ENH/FIX/DOC add drop_intermediate to DET curve and add threshold at infinity #29151 merged!
- Opened #31225 (merged!) and #31238 to iterate on visualization tools' docstrings
Emily
- do we close the stateless tag fix? (depending on if it's going out in 1.7 or 1.6.2?)

Discussion points

Olivier stateless tag fix
Arturo Best way to visualize benchmarks? DOC Update plots in Categorical Feature Support in GBDT #31062
Dea Olivier added that my PR (HTML Display) https://github.com/scikit-learn/scikit-learn/pull/30763 will close https://github.com/scikit-learn/scikit-learn/issues/21266 But Guillaume thinks this should be on another PR
Olivier Unbiased feature importance for classification: same problem with criterion="entropy" instead of criterion="gini".

2025-04-14

Progress reports

Jérémie
- triage
  - reviewed+merged several PRs adding global_random_seed. A few are still open.
  - reviewed+merged some old bug fix PRs. A lot are still open :/
  - reviewed+merged some API change PRs (deprecations and renamings) to have them in 1.7
- clean-ups for 1.7
  - sample weights in metadata routers https://github.com/scikit-learn/scikit-learn/pull/31119
  - old tags https://github.com/scikit-learn/scikit-learn/pull/31134
    @guillaume ? :)
  - remainder col type of column transformer https://github.com/scikit-learn/scikit-learn/pull/31167
- reviews for 1.7, going through the milestone
- reviewed applications for the callbacks internship and left comments and selected the ones I'd interview in priority.
Olivier
- Ongoing CAP curve review:
  - https://github.com/scikit-learn/scikit-learn/pull/28972
- Getting familiar with the new skrub expressions
  and reporting UX problems found along the way:
- Reviewed unmerged SciPy PRs that attempted to introduce systematic weight support and testing in SciPy.
  - https://github.com/scipy/scipy/pull/6931
- Discussed literature with a new intern (Gaetan) on
  unbiased MDI computation with the help of out-of-bag data points.
  - naively computing MDI on test data points is still biased towards higher-cardinality features.
  - 2 methods implement some form of cross-MDI that uses the class frequencies of in-bag and out-of-bag data points at each node of the trees to estimate an unbiased alternative to MDI that should still be cheap enough to compute.
  - draft PR should be open soon to share findings.
Antoine
- PR FIX Use sample weight to draw samples in Bagging estimators
- discussion with @snath-xoc on testing the various semantics of sample_weight in sklearn and dependencies (scipy, numpy)
Stefanie
- PR Investigate GaussianMixture array API support
  - fix Array API support for covariance_types "full", and "spherical"
  - sample() method uses numpy random number generation
  - question on init params (especially weights_init) following X (but can't follow on same device)
- undefined metric tasks
  - PR ENH Add zero division handling to cohen_kappa_score –> ready for review
  - PR ENH Add replace_undefined_by to accuracy_score –> ready for review
    - seen that with normalize=False float is returned (docs say int is returned): fix or deprecation?
- PR MNT Improve exception handling for invalid labels in cohen_kappa_score –> ready for review
- PR DOC add metadata_routing.rst to User Guide sidebar
- doc reviews: #31104, #31150, #31150
- issue Fix ConvergenceWarning in plot_gpr_on_structured_data.py example
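Note on the zero-division case above: Cohen's kappa is (p_o - p_e) / (1 - p_e), which is 0/0 when both annotators use one identical label (p_e = 1). Below is a minimal pure-Python sketch of one way to handle it, not the code from the PR. The function name cohen_kappa is hypothetical, and the replace_undefined_by keyword is borrowed from the accuracy_score PR title purely for illustration.

```python
import math
from collections import Counter


def cohen_kappa(y1, y2, replace_undefined_by=math.nan):
    """Cohen's kappa for two label sequences (illustrative sketch only)."""
    if len(y1) != len(y2) or len(y1) == 0:
        raise ValueError("y1 and y2 must be non-empty and of equal length")
    n = len(y1)
    # Observed agreement: fraction of positions where both labels match.
    p_o = sum(a == b for a, b in zip(y1, y2)) / n
    # Chance agreement, kept as an integer count to detect p_e == 1 exactly.
    c1, c2 = Counter(y1), Counter(y2)
    chance_hits = sum(c1[label] * c2[label] for label in c1)
    if chance_hits == n * n:
        # Both sequences use one identical label: kappa is 0/0, undefined.
        return replace_undefined_by
    p_e = chance_hits / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

With the default, the degenerate case returns NaN; passing replace_undefined_by=0.0 returns 0.0 instead.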
Gael (not there)
- skrub sprint
- added a tiny example demonstrating skrub pipelines to the front page https://skrub-data.org/dev/index.html
Loïc
- scikit-learn
  - investigated weird dependabot update https://github.com/pypa/cibuildwheel/issues/2348. Was not able to reproduce.
  - WIP GaussianMixture array API support with Stefanie https://github.com/scikit-learn/scikit-learn/pull/30777
  - Work-around for lack of pandas Windows free-threaded wheel
    https://github.com/scikit-learn/scikit-learn/pull/31159. pandas issue: https://github.com/pandas-dev/pandas/issues/61242 and contributed tweak https://github.com/pandas-dev/pandas/pull/61248
  - investigating Windows long path issues when building from source https://github.com/scikit-learn/scikit-learn/issues/31123 and https://github.com/scikit-learn/scikit-learn/issues/31149
- joblib:
  - fix memmap reducing when base array is 1d https://github.com/joblib/joblib/pull/1704. Approved by Thomas but CI is red because next bullet point.
  - debug with Franck unraisable exception during ResourceTracker.__del__, new Python behavior in the 3.12.10 and 3.13.3 releases https://github.com/joblib/joblib/issues/1708
Shruti
- Reviewing literature of quantile estimation and weights
- Working on BinMapper PR: https://github.com/scikit-learn/scikit-learn/pull/29641
- Reviewing and resolving numerical instabilities in Conrad Steven's PR: https://github.com/scikit-learn/scikit-learn/pull/30482
- Add example model tests for TProcess: https://www.sfu.ca/~ssurjano/piston.html
Emily
- Unit test neighbors/tests/test_neighbors.py[csr_array] not passing (ARM workflow) in https://github.com/scikit-learn/scikit-learn/pull/30925. Passes on my local machine though…??
- Caught up on comments and reviews from Lucy's contributions on my existing PRs
Arturo
- Reviewing DOC Add drawings to demonstrate Pipeline, ColumnTransformer, and FeatureUnion #30740
- Opened DOC Update plots in Categorical Feature Support in GBDT #31062
- DOC Rework voting classifier example #30985 waiting for second reviewer
- FIX thresholds in DET curve to represent Non-informative classifiers #29151 waiting for decision
Vincent
- [skrub] Released skrub expressions (aka skrub pipeline). Focusing on UX and documentation.
- [skrub] Working on l2 normalization for our encoders.

Discussion points

Olivier can the loky problem be reproduced with the resource tracker implementation in the Python standard library?
- Loïc I don't know how to start on this … Olivier said resource tracker is used in concurrent.futures not multiprocessing.Pool

2025-03-31

Progress reports

Olivier
- CI investigation/mitigation on debian_32bit failures.
  - https://github.com/scikit-learn/scikit-learn/issues/31098
  - root cause in scipy still needs to be investigated
  - 32 bit linux is not that interesting but is a useful proxy for WASM.
  - but I have not checked if the [pyodide] build failed similarly
    - Pyodide is fine: https://github.com/scikit-learn/scikit-learn/actions/runs/14163691264. I am guessing because scipy is more recent inside Pyodide
- Various reviews:
  - BayesianRidge covariance computation
  - non-metric MDS
  - temperature scaling (still need to follow-up):
    - https://github.com/scikit-learn/scikit-learn/issues/28574
  - poisson loss for MLPRegressor
    - https://github.com/scikit-learn/scikit-learn/pull/30712/
- sample_weight fixes
  - _BinMapper used in HistGradientBoosting* (still WIP)
    - https://github.com/scikit-learn/scikit-learn/pull/29641
    - behavior change when fitting with sample_weight=None
  - Merged score-based testing tool:
    - https://github.com/snath-xoc/sample-weight-audit-nondet/pull/18
    - https://github.com/scikit-learn/scikit-learn/issues/16298#issuecomment-2758394115
    - a bit less sensitive for models that are both transformers and clustering algorithms
- array API
  - _weighted_percentile https://github.com/scikit-learn/scikit-learn/pull/29431
- this week:
  - triaging
Antoine
- review FIX Fix multiple severe bugs in non-metric MDS
- PR FIX Covariance matrix in BayesianRidge
- will work on scipy reproducer for CI failure 31098
Jérémie
- review maintenance PRs, mostly Loic's :)
- review adding from_cv_results to RocCurveDisplay (https://github.com/scikit-learn/scikit-learn/pull/30399). Converging on the public API. Trying to make it simple and intuitive.
- small design discussion with Guillaume for skore timings reports.
- draft CZI internship sheet about callback API
- Started cleaning-up the deprecations for 1.7
Shruti
- _BinMapper PR: https://github.com/scikit-learn/scikit-learn/pull/29641
- Investigating sample weight audit tool for clustering algorithms
- Looking at comparison of Gaussian Process regression to GPytorch
Loïc
- scikit-learn
  - wrote a summary about our bumping dependencies rule
  - scikit-learn doc example JupyterLite now uses scikit-learn dev
    https://github.com/scikit-learn/scikit-learn/pull/29791 for uploading
    Pyodide scikit-learn dev wheel to anaconda.org.
    https://github.com/scikit-learn/scikit-learn/pull/31085 to use Pyodide
    dev wheel inside JupyterLite.
  - WIP GaussianMixture array API support https://github.com/scikit-learn/scikit-learn/pull/30777
  - set up codespaces for sprints: https://github.com/scikit-learn/scikit-learn/issues/31091
  - Mention Github security advisory in our security policy: https://github.com/scikit-learn/scikit-learn/pull/31082
  - Bump pyamg by following our bumping rules: https://github.com/scikit-learn/scikit-learn/pull/31089
  - Fix minimum Python version in advanced installation doc: https://github.com/scikit-learn/scikit-learn/pull/31081
  - Fix issues found by sphinx-lint (merged): https://github.com/scikit-learn/scikit-learn/pull/31114
  - Use explicit permissions for all Github workflows completed: https://github.com/scikit-learn/scikit-learn/issues/30702
  - reviewed/merged a few maintenance PRs by DimitriPapadopoulos
- JupyterLite
  - hacking session with Jérémy Tuloup https://github.com/jupyterlite/jupyterlite/issues/1582
- joblib:
  - simplify free-threading CI setup https://github.com/joblib/joblib/pull/1697
  - opened a few issues to help tracking intermittent CI issues
    https://github.com/joblib/joblib/issues?q=sort%3Aupdated-desc is%3Aissue is%3Aopen intermittent author%3Alesteve
- Probabl certification
  - 2 PRs opened + 1 issue about COOP/COEP headers
Gael
- Skrub front page cosmetics
- Email exchange with nvidia
Guillaume
- triage week last week
- TODO:
  - Review Dea PR
  - Second review for Lucy PR after Jérémie first review

Discussion points

Guillaume EuroSciPy proposal for scikit-learn?
- Tutorial on imbalanced classification
- skrub tutorial by Jerome
Olivier CFP PyData Paris
- Olivier: probabilistic regression
Gael Should we add sphinx-lint to our tests? (not a required test, something to help contributors)
- that is a possibility, either a pre-commit or an additional linter step like black/ruff/mypy, or both. Some content is generated (rst from .py examples, .rst.template) so this may catch only a subset of issues, which is probably still good enough
  - +1 (Jérémie)
  - +1 for both (Olivier)
  - Suggestion to feed the output to the comment bot

2025-03-17

Progress reports

Arturo
- DOC Rework voting classifier example #30985
Shruti
- Working on test using scores for sample weight invariance in non-deterministic estimators PR #18.
- Working on #53593 (Binmapper in HGBT) PR, similar to KBinsDiscretizer, though a bit more complicated since it breaks several tests
- Started mini comparison of scikit-learn Gaussian Processes vs. GPytorch, maybe something to try out skore with?
Stefanie
- fix testing and routing for dynamic method selection in PR FEA Add metadata routing through predict methods of BaggingClassifier and BaggingRegressor
- fairlearn PR MNT Refactor _validate_and_reformat_input
- fairlearn PR MNT Narwhalify reductions.ErrorRate
- a lot of doc reviews
Olivier
- First pass of review on default policy sample_weight meta-data routing
  - https://github.com/scikit-learn/scikit-learn/pull/30946
  - is sample_weight special or should we always request all metadata?
    - if yes, this could be a justification to release scikit-learn 2.0 if this causes breaking changes?
    - would require to study impact on downstream libraries to make a decision
- Investigated with Antoine why decision trees do not pass the sample_weight repetition equivalence tests
  - rounding errors cause different choices of splits with near-tied impurity improvements when running the stochastic tests
  - rounding errors cause small but non-uniform offsets in leaf values in shallow trees (10 to 100x machine epsilon)
  - need to follow-up with opening issues to document the bias induced by the current code when the data leads to (exact or near) tied splits
    - current handling of tied splits leads to hard to understand inductive bias when inspecting the prediction function of learned decision trees
  - potential solution would involve randomized near-tie breaking in trees; our stochastic sample_weight equivalence tests should then pass
  - alternatively we would need to change the way we write the sample_weight equivalence tests.
- A few reviews to get loky 3.5.0 out.
- Science: I read the TabPFNv2 paper and now reading the TabICL preprint.
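Illustration of the rounding issue behind the near-tied splits above: a minimal sketch in plain Python, not scikit-learn's tree code. The helper names (naive_sum, best_split) and the toy scores are assumptions chosen to make the effect visible. Weighting a value by 10 and adding it 10 times agree only up to rounding, so a first-wins tie-break can select different candidates.

```python
import math


def naive_sum(values):
    # Explicit left-to-right accumulation. The builtin sum() may use
    # compensated summation on recent Python versions and hide the effect.
    total = 0.0
    for v in values:
        total += v
    return total


def best_split(scores):
    # Strict ">" comparison: on an exact tie the first candidate wins.
    best = 0
    for i, score in enumerate(scores):
        if score > scores[best]:
            best = i
    return best


x, weight = 0.1, 10
weighted_score = weight * x               # "fit with sample_weight=10"
repeated_score = naive_sum([x] * weight)  # "fit with the row repeated 10 times"
competitor = 1.0                          # score of a second candidate split

chosen_weighted = best_split([weighted_score, competitor])
chosen_repeated = best_split([repeated_score, competitor])

print(weighted_score, repeated_score, math.isclose(weighted_score, repeated_score))
print(chosen_weighted, chosen_repeated)
```

The two scores differ by about one machine epsilon, yet the selected candidate index changes, which is the kind of discrepancy a repetition equivalence test can pick up.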
Loïc
- last week: open-source in academia conference + scikit-learn triage
- scikit-learn
  - bump to Python 3.10 https://github.com/scikit-learn/scikit-learn/pull/30895
  - WIP GaussianMixture array API support with Stefanie: https://github.com/scikit-learn/scikit-learn/pull/30777
  - fix CircleCI doc lock-files update with PYTHONNOUSERSITE https://github.com/scikit-learn/scikit-learn/pull/31006
- scikit-learn MOOC: Fix Binder link with js on page load (merged): https://github.com/INRIA/scikit-learn-mooc/pull/807
Antoine
- investigated why GradientBoosting fails the sample_weight equivalence check
- root cause is tied splits in DecisionTree investigated with Olivier
- meeting with Olivier, Shruti and Jérémie on adapting the statistical test
- this week: metadata routing and multiclass brier score
Guillaume
- last week:
  - open-source in academia conference
  - some discussions with Dea and Lucy
- next week:
  - PyData Milan meetup
Adrin
- https://github.com/scikit-learn/scikit-learn/pull/30859 for LogisticRegressionCV.score
- https://github.com/scikit-learn/scikit-learn/pull/30946 for default routing
- https://github.com/scikit-learn/scikit-learn/pull/30990 for a better refit=callable example and plotting
- Triage this week
Jérémie
- Set up automated release with github actions for threadpoolctl and loky
  - publish to test pypi on push to main for .dev0 versions
  - publish to pypi when a tag is pushed for actual releases
  - still need to manually release on conda-forge
- released threadpoolctl 3.6.0
- released loky 3.5.0
Gael Varoquaux
- Went over scikit-learn + ColumnTransformer / pandas questions on stackoverflow
- PRs/contributions to list skrub in various parts of the ecosystem (pandas, polars, scikit-learn)

Discussion points

Bump to Python 3.10: opinions are roughly split between plan 1 (oldest minor X.Y with Python 3.10 wheels), plan 2 (oldest bugfix X.Y.Z with Python 3.10 wheels) and "both are fine". https://github.com/scikit-learn/scikit-learn/pull/30895
Moving meta-data routing (sample weights) to more mandatory
- The challenge is when we add metadata routing to something, eg a scoring, in which case it leads to a change in statistical behavior
Guillaume SearchCV:
- I'm under the impression that we should improve the validation curve
- Still thinking about the parallel coordinate plot (skrub does some stuff in this area (wip))
Adrin Copilot context hacks
Adrin SBOMs, GH's action, minimal starting point

2025-03-03

Progress reports

Guillaume
- Working upstream (joblib) in order to propagate configuration from driver (main process) to workers: https://github.com/joblib/joblib/pull/1668
  - Hitting issue with backward compatibility for people that already implement the trick, e.g. scikit-learn
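To make the "trick" above concrete: a minimal sketch of propagating a thread-local configuration from the driver to workers by capturing it when the task is created. It uses only the standard library. All names (get_config, config_context, with_config, the assume_finite flag) are stand-ins for illustration, not joblib's or scikit-learn's actual implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

_local = threading.local()
_DEFAULTS = {"assume_finite": False}


def get_config():
    # Each thread sees its own config; new threads fall back to the defaults.
    return dict(getattr(_local, "config", _DEFAULTS))


@contextmanager
def config_context(**new_values):
    old = get_config()
    _local.config = {**old, **new_values}
    try:
        yield
    finally:
        _local.config = old


def with_config(func):
    # Capture the config in the calling (driver) thread...
    captured = get_config()

    def wrapper(*args, **kwargs):
        # ...and re-apply it inside whichever worker runs the task.
        with config_context(**captured):
            return func(*args, **kwargs)

    return wrapper


def read_flag(_):
    return get_config()["assume_finite"]


with config_context(assume_finite=True):
    with ThreadPoolExecutor(max_workers=2) as pool:
        naive = list(pool.map(read_flag, range(4)))
        propagated = list(pool.map(with_config(read_flag), range(4)))

print(naive)
print(propagated)
```

Without the wrapper the workers only see the defaults. Doing the capture inside the parallel backend itself is where backward compatibility gets delicate, since callers that already wrap their tasks would then apply the configuration twice.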
Arturo
- Update wikipedia article for scikit-learn
Olivier
- Design session on default metadata routing for sample_weight with Antoine, Adrin, Stefanie and Jeremie
- joblib / sprint
  - refactored the loky process spawning to avoid raising the DeprecationWarning on manual os.fork and fix a crash on macOS: https://github.com/joblib/loky/pull/429
  - revived enabling faulthandler for loky workers by default https://github.com/joblib/loky/pull/419
  - we should be able to release loky soon
  - several reviews/merges and more to come
- a bit of follow-up on:
  - sample_weight entry in glossary: https://github.com/scikit-learn/scikit-learn/pull/30564
  - issue with setting the number of threads in gradient boosting https://github.com/scikit-learn/scikit-learn/issues/30662
Stefanie
- scikit-learn
  - swapped test data for tests in PR MNT _weighted_percentile supports np.nan values
- joblib
- reviewed
  - https://github.com/scikit-learn/scikit-learn/pull/30876
  - https://github.com/scikit-learn/scikit-learn/pull/30882
Jérémie (off)
- joblib sprint -> loky sprint
  - migrated CI from azure pipelines to github actions. faster than expected
  - dropped support for PyPy, Python 3.7 and Python 3.8.
  - fixed cpu_count on recent windows versions and "exotic" Linux systems.
- triage
- attended drafting sample_weight × metadata routing
Adrin
- A bit of joblib sprint work
- numpy.unique merged: https://github.com/numpy/numpy/pull/26018
- work on metadata routing in scikit-learn
  - default routing
  - GS routing of sample weight
- https://github.com/adrinjalali/agents-to-block to collect accounts people want to block (LLM spam)
Antoine
- FIX Forward sample weight to the scorer in grid search ready for review/merge
Gaël
- [skrub] improved memory and time of skrub.StringEncoder 2x memory, 1.5x time
- [skrub] gave a vscode only presentation on skrub, including the latest features (Recipe / Expressions)
Vincent
- [skrub] Extensively test the expression API/Recipe to catch some gotchas and bugs
  - Iterate on the documentation of the Recipe
- [hazardous] iteration on the documentation of the C-index PR

Discussion points

Olivier Wikipedia editing
Adrin balancing priorities, comms with the outside, expectations from other maintainers
Olivier inconsistent handling of location in joblib.Memory
Gael what can we learn / borrow from the numpy review process? How do we change?
- One good reviewer sufficient for merge
  - Suggestion: define what needs two reviews (e.g. no changelog => 1 review sufficient; new API => 2 reviews needed). Requires more trust in reviewers
- Maybe introducing a time boundary after which only one maintainer is allowed to merge.
- PR triaging, close more PRs
  - Revived triaging meeting to triage PR and close some to help focus attention

2025-02-17

Progress reports

Jérémie
- sample weight debugging
  - For MiniBatchKMeans. Several issues in different parts of the estimator (fit, convergence checking, …)
    https://github.com/scikit-learn/scikit-learn/pull/30751
    Raises questions regarding the equivalence properties we want.
  - For the statistical test. Spurious pvalue constant to 1
    https://github.com/snath-xoc/sample-weight-audit-nondet/issues/14
Stefanie
- PR FEA Add metadata routing through predict methods of BaggingClassifier and BaggingRegressor
- fairlearn: PR DOC add example for using ErrorRate
- reviewed several DOC PRs
Shruti
- Fully back from teaching in Cape Town (put out a good work)
- Working on expanding sample weight testing to clustering algorithms and using a score based equivalence test: https://github.com/snath-xoc/sample-weight-audit-nondet
- Implemented fix for spurious 1 values found due to construction of np.random.choice (thank you Jeremie)
- PR https://github.com/scikit-learn/scikit-learn/pull/30751, need to further discuss issues with scaling of weights, sometimes optimisation problem is not well-defined
Loïc
- Use OpenML download URL from OpenML metadata: https://github.com/scikit-learn/scikit-learn/pull/30708
- Download parquet file from OpenML in example: https://github.com/scikit-learn/scikit-learn/pull/30824
- inputs on public website analysis https://github.com/scikit-learn/scikit-learn/issues/30815 ?
- WIP GaussianMixture Array API with Stefanie still ongoing: https://github.com/scikit-learn/scikit-learn/pull/30777
- JupyterLite:
  - issue with polars and parquet file: https://github.com/pola-rs/polars/issues/20876
  - issue with CSV and polars.read_csv https://github.com/jupyterlite/jupyterlite/issues/1576
- sphinx-gallery API doc duplicated links to example:
  https://github.com/scikit-learn/scikit-learn/pull/30822 and
  https://github.com/skrub-data/skrub/pull/1239
- Github actions for arm64 CI (not using CirrusCI anymore): https://github.com/scikit-learn/scikit-learn/pull/30797
- joblib triage in preparation of the sprint 26-27 @ Inria Paris
- mybinder.org may become more stable in the future: https://github.com/scikit-learn/scikit-learn/pull/30697#issuecomment-2659881848
Dea
- Comments welcome here (get_params() html): PR https://github.com/scikit-learn/scikit-learn/pull/30763
- Working on https://github.com/scikit-learn/scikit-learn/pull/30846
Antoine
- still investigating sample_weight and metadata routing
- found two issues 30818 and 30817
Arturo
- Experimented a bit with stratify on X: issue #26821
Vincent
- [skrub] iter on Recipe

Discussion points

2025-02-03

Progress reports

Olivier & Shruti
- Progress on comprehensive deterministic and stochastic estimator testing for correct use of sample_weight:
  - https://github.com/snath-xoc/sample-weight-audit-nondet/blob/main/reports/sklearn_estimators_sample_weight_audit_report.ipynb
  - Still need proper way to test clustering algorithms and simpler handling of transformers
  - Summary:
```
✅ 19 passed the deterministic test
❌ 4 failed the deterministic test
✅ 14 passed the statistical test
❌ 17 failed the statistical test
❌ 5 other errors
⚠ 112 lack sample_weight support
```
  - Next: plan to give feedback on stratification, array API, display PRs issue/PR:
    - calibration binning / uncertainty https://github.com/scikit-learn/scikit-learn/issues/30664
  - Will come to Paris soon (Wednesday and Thursday)
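For readers unfamiliar with the deterministic test: the property being checked is that fitting with integer sample weights gives the same result as fitting on data where each row is repeated that many times. Below is a minimal sketch of this check on a toy weighted percentile, in pure Python. The weighted_percentile function is a simplified stand-in written for this note, not scikit-learn's _weighted_percentile and not the audit tool's code.

```python
def weighted_percentile(values, weights, percentile=50):
    """Smallest value whose cumulative weight reaches the requested share."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    threshold = percentile / 100 * total
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        # Skip zero-weight points: they must behave as if absent.
        if weight > 0 and cumulative >= threshold:
            return value
    raise ValueError("percentile must be between 0 and 100")


values = [1, 2, 3, 4]
weights = [1, 0, 3, 1]

# Reference dataset: each value repeated `weight` times, all weights set to 1.
repeated = [v for v, w in zip(values, weights) for _ in range(w)]
ones = [1] * len(repeated)

for q in (0, 25, 50, 75, 100):
    assert weighted_percentile(values, weights, q) == weighted_percentile(repeated, ones, q)

print(weighted_percentile(values, weights))
```

Stochastic estimators cannot be compared value by value like this, hence the separate statistical test in the summary above.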
Loïc
- will be in Paris Tuesday - Thursday
- Remove 10 year old tutorial links (1 more approval needed) https://github.com/scikit-learn/scikit-learn/pull/30724
- Use OpenML dataset description for download URL: https://github.com/scikit-learn/scikit-learn/pull/30708
- metrics always return Python floats Jérémie's PR (merged): https://github.com/scikit-learn/scikit-learn/pull/30575
- end of the OpenML saga (still using scikit-learn/examples for one parquet file)
- indexing of older versions doc by search engines was due to switch to sphinx-pydata-theme. It seems to have been fixed by using canonical link (Tim's PRs)? Double-check with your favourite search engine!
- social media links have been updated
- gave opinions on scikit-image JupyterLite interactive doc https://github.com/scikit-image/scikit-image/pull/7644 and moving sphinx-gallery JupyterLite functionality to jupyterlite-sphinx https://github.com/sphinx-gallery/sphinx-gallery/issues/1427
- joblib security reports on Huntr opened by the same person. 3 marked as spam (not by me), replied to the last one
- joblib sprint @ Inria Paris wed. 26/ thu. 27 February. Started to collect good issues in a Github project, feel free to add some!
Stefanie
- ENH Array API support for confusion_matrix converting to numpy array
  - second suggestion of how to solve pandas extension dtype failure
- some doc reviews
- issue at Github about inconsistent search functionality
- continued with Guillaume's Traces course (trees and bagging)
Arturo
- Helped contributor of #30740 DOC Add drawings to demonstrate Pipeline, ColumnTransformer, and FeatureUnion with her setup
- Jupyterlab kernel crash after page refresh
Antoine
- fix sample weight in GridSearch
  - draft PR forward sample weight to the scorer https://github.com/scikit-learn/scikit-learn/pull/30743
  - need to investigate when metadata is enabled
- reviews hazardous
Guillaume
- mainly work on skore library with brainstorming with Adrin
- attended FOSDEM
Vincent
[skrub]
- Released 0.5.1 (adding StringEncoder and fixes for the datasets fetcher)
- P16 conference in Paris to collect feedback about TableReport, tabular_learner and the recipe
- Testing the recipe on a few examples, we want to release this thing soon
[hazardous]
metrics PRs are moving forward thanks to @Antoine
- Slight revamp of the C-index metric
- Enhance the accuracy in time

Discussion points

Guillaume I confirm that the Kagi search engine seems to behave the same as Google and points to 1.6.1
Loïc Stefanie's Github search issue: probably an alternative way to do what you want. Likely due to us switching to "new-style issues" (or whatever it is called with sub-issues)
Loïc JupyterLab/JupyterLite issue, do you have a way to reproduce?
Arturo JupyterLite crash on the scikit-learn.org/stable examples.
- error in the JS console of firefox when running the first cell of an example with import statements TypeError: _query_package() got multiple values for argument 'index_urls'
