Goal: internal communication on recent work and short-term planning
Who: people working on the maintenance of scikit-learn and related projects, in particular at probabl and Inria, and possibly others
Frequency: every other Monday at 15:00 CET/CEST, unless it happens on the same day as the Monthly meeting.
Where: https://meet.google.com/xdm-ozyn-pgj
Meeting notes: to be archived on the scikit-learn org repo
Rules of the game:
Olivier
skrub expressions for time series forecasting.
Adrin
ClassicalMDS: https://github.com/scikit-learn/scikit-learn/pull/31322
set_config / get_config doc improvement: https://github.com/scikit-learn/scikit-learn/pull/31486
SequentialFeatureSelector: https://github.com/scikit-learn/scikit-learn/pull/31483
array_api_dispatch globally in array API docs: https://github.com/scikit-learn/scikit-learn/pull/31687
FunctionTransformer feature names out: https://github.com/scikit-learn/scikit-learn/pull/31573
_locally_linear_embedding: https://github.com/scikit-learn/scikit-learn/pull/29716
Guillaume
Antoine
Gaétan
Stefanie
process_routing in example?
Arturo
Dea
ReprHTMLMixin)
Shruti
Loïc
Jérémie
sklearn-ci bot needs 2FA set-up
SkrubPipeline and TableVectorizer could be extended to act as meta-data routers if someone is looking for a way to contribute to skrub.
Guillaume
Stefanie
Gaétan
Antoine
Adrin
generate_checks_for_instance: https://github.com/scikit-learn/scikit-learn/pull/31480
gc.collect in DBSCAN.fit (I say no): https://github.com/scikit-learn/scikit-learn/pull/31526
cross_val_score in SequentialFeatureSelector: https://github.com/scikit-learn/scikit-learn/pull/31483
set_config, get_config docstrings: https://github.com/scikit-learn/scikit-learn/pull/31486
check_requires_y_none: https://github.com/scikit-learn/scikit-learn/pull/31481
FeatureHasher, HashingVectorizer tags: https://github.com/scikit-learn/scikit-learn/pull/31557
SequentialFeatureSelector: https://github.com/scikit-learn/scikit-learn/pull/31483
fetch_california_housing: https://github.com/scikit-learn/scikit-learn/pull/31579
eps float32 → float64 in GradientBoosting: https://github.com/scikit-learn/scikit-learn/pull/31575
store_cv_models option to ElasticNetCV: https://github.com/scikit-learn/scikit-learn/pull/31545
Ridge regression example (fix typo, clarify title, add legend): https://github.com/scikit-learn/scikit-learn/pull/31539
CategoricalNB().__sklearn_tags__.input_tags.categorical to True: https://github.com/scikit-learn/scikit-learn/pull/31556
FixedThresholdClassifier: https://github.com/scikit-learn/scikit-learn/pull/31544
whats_new entries: https://github.com/scikit-learn/scikit-learn/pull/31589
_check_n_features and _check_feature_names and fix their usages: https://github.com/scikit-learn/scikit-learn/pull/31585
Dea
ReprHTMLMixin)
Loïc
Jérémie
ClassicalMDS: should it be its own class? https://github.com/scikit-learn/scikit-learn/pull/31322
sklearn-wheels bot was used for?
Olivier
Antoine
sample_weight in forest estimators (draft PR to come)
Jérémie
Shruti
Arturo
Olivier
sklearn.tree.export_tree.
pytest-run-parallel pytest plugin.
sample_weight fix for BaggingRegressor/Classifier:
Arturo
Antoine
Dea
from_predictions example and other details to visualizations.rst (https://github.com/scikit-learn/scikit-learn/pull/30825)
Gaétan
max_features parameter on the feature importance measures: increasing its value "masks" correlated/dependent features
Guillaume
skore
sklearn
hidimstats to discuss topics related to Gaétan's work
Gael
skrub
sklearn
Loïc
_fill_or_add_diagonal probably needs to be fixed first in a separate PR
Adrin
plot_grid_search_refit_callable.py: https://github.com/scikit-learn/scikit-learn/pull/30990
nan, SplineTransformer: https://github.com/scikit-learn/scikit-learn/pull/28043
Stefanie
Vincent
max_features experiments.
Stefanie (absent)
Olivier
pytest-run-parallel to test free-threading related race conditions
pytest-run-parallel was updated to fix most of the problems discovered in my previous attempt to use it on the scikit-learn test suite
@pytest.mark.parametrize
array-api-extra and array-api-compat
#31343
vendoring
sklearn/externals/
Arturo
Guillaume
imbalanced-learn with scikit-learn==1.7.0rc1
sklearn-compat looking at the changelog
Dea
global_random_seed. https://github.com/scikit-learn/scikit-learn/issues/22827
Antoine
Adrin
examples/cluster/plot_agglomerative_clustering.py: https://github.com/scikit-learn/scikit-learn/pull/30861/
MiniBatchDictionaryLearning example link: https://github.com/scikit-learn/scikit-learn/pull/30864
feature_extraction.grid_to_graph: https://github.com/scikit-learn/scikit-learn/pull/30916
TfIdfVectorizer: https://github.com/scikit-learn/scikit-learn/pull/30974
SpectralClustering example: https://github.com/scikit-learn/scikit-learn/pull/30978
LabelSpreading examples: https://github.com/scikit-learn/scikit-learn/pull/30553
plot_manifold_sphere.py: https://github.com/scikit-learn/scikit-learn/pull/30959
plot_gpr_on_structured_data example in gaussian_process: https://github.com/scikit-learn/scikit-learn/pull/31150
plot_nnls example: https://github.com/scikit-learn/scikit-learn/pull/31280
plot_gmm_covariances example: https://github.com/scikit-learn/scikit-learn/pull/31249
plot_sparse_cov example: https://github.com/scikit-learn/scikit-learn/pull/31278
SimpleImputer error message for fill_value type: https://github.com/scikit-learn/scikit-learn/pull/30828
fairlearn: https://github.com/fairlearn/fairlearn/issues/4
replace_undefined_by in scorers
np.nan? Organised a meeting and we found a consensus
replace_undefined_by to accuracy_score: https://github.com/scikit-learn/scikit-learn/pull/31187
cohen_kappa_score: https://github.com/scikit-learn/scikit-learn/pull/31172
sklearn-extras, not sure what to do there.
.. versionadded: https://github.com/scikit-learn/scikit-learn/pull/31320
average in precision_recall_fscore_support: https://github.com/scikit-learn/scikit-learn/pull/31270
Jérémie
Loïc
Gael Reviewing and design discussion on skrub pipelines (it's Jérôme's last week)
Vincent
choices: explicit default in choices: https://github.com/skrub-data/skrub/pull/1361
Gaétan
Stefanie
Olivier
faulthandler on this build in case it happens again in the future. The deadlock problem can be investigated later in a dedicated debug PR once this one is merged.
predict_proba values: we need to preprocess the input to extract OvR Bernoulli logits and the sigmoid calibration works really well, both for binary and multiclass classifiers:
plot_det example in the CAPCurveDisplay PR to align with @arturo's recently merged update to this example:
Gaetan
criterion="squared_error".
Antoine
Dea
Arturo
Emily
criterion="entropy" instead of criterion="gini".
Jérémie
Olivier
Antoine
sample_weight in sklearn and dependencies (scipy, numpy)
Stefanie
Gael (not there)
Loïc
ResourceTracker.__del__, new Python thing in 3.12.10 and 3.13.3 release https://github.com/joblib/joblib/issues/1708
Shruti
Emily
neighbors/tests/test_neighbors.py[csr_array] not passing (ARM workflow) in https://github.com/scikit-learn/scikit-learn/pull/30925. Passes on my local machine though…??
Arturo
Vincent
Olivier
debian_32bit failures.
[pyodide] build failed similarly
BayesianRidge covariance computation
MLPRegressor sample_weight fixes
_BinMapper used HistGradientBoosting* (still WIP)
sample_weight=None
_weighted_percentile https://github.com/scikit-learn/scikit-learn/pull/29431
Antoine
Jérémie
from_cv_results to RocCurveDisplay (https://github.com/scikit-learn/scikit-learn/pull/30399). Converging on the public API. Trying to make it simple and intuitive.
Shruti
Loïc
Gael
Guillaume
black/ruff/mypy, or both. Some content is generated (rst from .py examples, .rst.template) so this may catch only a subset of issues, which is probably still good enough
Arturo
Shruti
KBinsDiscretizer, however a bit complicated since it breaks several tests
Stefanie
Olivier
sample_weight meta-data routing
sample_weight special or should we always request all metadata?
sample_weight repetition equivalence tests
sample_weight equivalence tests.
Loïc
PYTHONNOUSERSITE https://github.com/scikit-learn/scikit-learn/pull/31006
Antoine
GradientBoosting fails the sample_weight equivalence check
DecisionTree investigated with Olivier
Guillaume
Adrin
LogisticRegressionCV.score
refit=callable example and plotting
Jérémie
.dev0 versions
Gael Varoquaux
Bump to Python 3.10: opinions are roughly split between plan 1 (oldest minor X.Y with Python 3.10 wheels), plan 2 (oldest bugfix X.Y.Z with Python 3.10 wheels) and "both are fine". https://github.com/scikit-learn/scikit-learn/pull/30895
Moving meta-data routing (sample weights) to more mandatory
Guillaume SearchCV:
skrub does some stuff in this area (wip))
Adrin Copilot context hacks
Adrin SBOMs, GH's action, minimal starting point
Guillaume
joblib) in order to propagate configuration from driver (main process) to workers: https://github.com/joblib/joblib/pull/1668
Arturo
Olivier
sample_weight with Antoine, Adrin, Stefanie and Jérémie
DeprecationWarning on manual os.fork and fix a crash on macOS: https://github.com/joblib/loky/pull/429
sample_weight entry in glossary: https://github.com/scikit-learn/scikit-learn/pull/30564
Stefanie
Jérémie (off)
cpu_count on recent Windows versions and "exotic" Linux systems.
Adrin
joblib sprint work
Antoine
Gaël
Vincent
joblib.Memory
Jérémie
Stefanie
Shruti
np.random.choice (thank you Jérémie)
Loïc
Dea
Antoine
Arturo
Vincent
Olivier & Shruti
sample_weight:
Loïc
Stefanie
Arturo
Antoine
Guillaume
skore library with brainstorming with Adrin
Vincent
[skrub]
[hazardous]
metrics PRs are moving forward thanks to @Antoine
Guillaume I confirm that the Kagi search engine appears to have the same behaviour as Google and points to 1.6.1
Loïc Stefanie's GitHub search issue: there is probably an alternative way to do what you want. Likely due to us switching to "new-style issues" (or whatever it is called with sub-issues)
Loïc JupyterLab/JupyterLite issue, do you have a way to reproduce?
Arturo JupyterLite crash on the scikit-learn.org/stable examples.
TypeError: _query_package() got multiple values for argument 'index_urls'
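For reference, a minimal sketch of how Python produces this kind of message: the same parameter gets bound twice, once positionally and once by keyword. The function below is a hypothetical stand-in with an assumed signature, not the real `_query_package` from the JupyterLite stack, so it only illustrates the error class and says nothing about the actual root cause (a caller/callee signature mismatch between package versions is one common cause, but that is an assumption here).

```python
# Hypothetical stand-in with an assumed signature; NOT the real function
# behind the JupyterLite crash.
def _query_package(name, index_urls=None):
    return name, index_urls


def reproduce():
    """Trigger the double-binding error and return its message."""
    try:
        # The second positional argument already binds `index_urls`,
        # so passing `index_urls` again by keyword raises TypeError.
        _query_package(
            "scikit-learn",
            ["https://example.invalid/simple"],  # placeholder URL
            index_urls=None,
        )
    except TypeError as exc:
        return str(exc)
    return None


print(reproduce())
```

If a mismatch of this kind is what happens in practice, comparing the installed versions of the caller and callee packages would be a reasonable first step when trying to reproduce.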