October 28th - HackMD

# October 28th ## Present ## Agenda ### The 0.22 release - RC - What's new fixes - Release highlights [#15152](https://github.com/scikit-learn/scikit-learn/issues/15152) - Build/packaging issues * cython & sdist [#15335](https://github.com/scikit-learn/scikit-learn/pull/15335) Cythonized souces shouldn't be included in the sdist, and allow people to have different versions of cython. It's done in Numpy. It also causes an issue when it was cythonized using openmp and sdist installed on a system w/o openmp. - To get in: - [#15274](https://github.com/scikit-learn/scikit-learn/pull/15274) (support multiclass in `_ThresholdScorer`). It's a simple PR, needs reviews. It's a blocker. - plot_precision_recall_curve [#14936](https://github.com/scikit-learn/scikit-learn/pull/14936). Goal is to release more than one plot API instnace. Needs review. This has higher priority than confusion matrix. - plot_confusion_matrix [#15083](https://github.com/scikit-learn/scikit-learn/pull/15083). Needs review. - Column Selector [#12371](https://github.com/scikit-learn/scikit-learn/pull/12371). Getting in, easy. - some things we resolved about encoders [#15050](https://github.com/scikit-learn/scikit-learn/pull/15050), [#14984](https://github.com/scikit-learn/scikit-learn/pull/14984). These are fixing up user traps for string-based categories (released 0.19). Need reviews, not blockers. - Use `FutureWarning` instead of `DeprecationWarning` to avoid our custom filter which makes people angry [#15080](https://github.com/scikit-learn/scikit-learn/pull/15080). Needs to be merged if we're all okay with it. We're okay with it. - [#9250](https://github.com/scikit-learn/scikit-learn/issues/9250) (make modules private). Almost done. - [#15303](https://github.com/scikit-learn/scikit-learn/pull/15303) (should we remove `pos_label` in `plot_roc_curve` since we do not have a `pos_label` parameter in `roc_auc_score`?). We don't infer `pos_label` authomatically in some places, and do in other places. So we're not consistent with it. There is some logic behind the incosistency. We should still discuss. It's critical, needs discussion. - [#15164](https://github.com/scikit-learn/scikit-learn/pull/15164) `RidgeClassifier`, why do we need it? Answer: we have a bunch of classifiers that don't have user manuals. This fixes one of them. For instance RidgeClassifier works well with least square loss. Olivier has a very clear idea why it's usuful. - Blocker Bug in iterative imputer (it doesn't converge with RF) [#14338](https://github.com/scikit-learn/scikit-learn/issues/14338) - It's experimental - Not a blocker - [#15010](https://github.com/scikit-learn/scikit-learn/pull/15010) Missing indicator in KNNImputer - [#14028](https://github.com/scikit-learn/scikit-learn/pull/14028) dataframe support in PDP * Big changes are introduced in the new release (deprecated paths, etc). We should probably make it known well in advance. Tweet about it?? Mention it in the release highlights? ### The next release Top priorities for 0.23 * Blog, tweet, etc., especially for the release. * we could have a blog.scikit-learn.org * release highlight examples can be moved to blog * it also encourages people who are good writers but don't want to send PRs, and stil be active in the community. - Towards fixing the random_state statefulness issue: [#15177](https://github.com/scikit-learn/scikit-learn/pull/15177). Note that this causes a **bug** if we warm-start successive halving. Splitters don't have a fit method. * Successive halving * DataFrame-related things: feature names and categoricals, etc. * Bringing the open SLEPs to vote! * (I'd also like to see GLMs, but I don't see myself being able to review) - NamedArray - [#14315](https://github.com/scikit-learn/scikit-learn/pull/14315) - What is the goal of this feature? @adrinjalali - Move forward with [`_validate_data` SLEP](https://github.com/scikit-learn/enhancement_proposals/pull/22) - Freezing [#8370](https://github.com/scikit-learn/scikit-learn/issues/8370). - GLMS [#14300](https://github.com/scikit-learn/scikit-learn/pull/14300)!!! * confirm approach to openmp [#15174](https://github.com/scikit-learn/scikit-learn/pull/15174). The behavior is changed that if openmp is no there, it still builds successfully. It only warns in sklearn import time. The main issue is MacOS, and the warning links to instructions on how to install on MacOS so that the warning is gone. The original reason for this was that it was hard during Sprints, and that issue is fixed now. Alternatively, we could reorganize the installation instructions in the dev guide. ### Mid-term goals * It seems from issues being raised recently that people are increasingly using Pipeline infrastrucutre, thanks perhaps to `ColumnTransformer`... It makes things like [sample props](https://github.com/scikit-learn/enhancement_proposals/pull/16) more important. - Lot of testing in threadpoolctl - ongoing: towards a [scikit-learn MOOC](https://github.com/lesteve/scikit-learn-tutorial). The plan is to have enverything on github and open license, including the videos. The interface and the content (primarily) will be in English. The audience of this effort may be people who are less technically savvy than Andy's audience. But there's overlap with Andy's past efforts. - Make sure CI fails on deprecation/future warnings. The public/private deprecations warnings aren't all caught by `pytest`. We'll need more thorough manual checks. Mostly done, some PRs need to be merged. Sometimes CI doesn't fail if there are some new deprecation warnings. [#15381](https://github.com/scikit-learn/scikit-learn/issues/15381) ### Project management * There's enough activity on the repo that we could use a little bit of project management. Reviewing the roadmap, watching the issues/PRs, and reminding people. * Module labels: module:ensemble - we use do have this, it was not useful, maybe it is now? (andy) ### Misc - Intel packaging: the daal4py is not enabled by default on their soft-fork. But it gives a warning/message that they can enable that. Intel's Numpy is very different from the official Numpy. Intel still are not completely on the same page, hence Gael's response to them. ## Next meeting time Monday 18th November.