# OS Team Review & Planning (odd weeks)

Progress meeting on OS projects.

- Frequency: bi-weekly, on Mondays at 15:00 CET/CEST
- Where: Google Meet: todo update
- Who: probabl OS Team members

## 2024-03-25

### Individual reports

- [name=Olivier]
    - Mostly a triaging week
    - Array API follow-up
    - More job interviews
    - Still todo: test Guillaume's private CI, to be used to conduct the technical evaluation for the short list.
    - No progress on the calibration meta-issue I wanted to prepare.
- [name=Stefanie]
    - Focused on learning:
        - configuration files
        - Joblib documentation
    - Starting metadata routing for StackingClassifier and StackingRegressor
- [name=Guillaume]
    - Mainly worked on `TunedThresholdClassifier`
        - Code ready for reviews
        - Minimal changes to do in the documentation
    - Jobs:
        - Interviews
        - Prepare private repository with the CIs (to be tested)
    - Callback drafting meeting
    - Preparing more stuff for PyConDE talk
    - Playing more with `pixi`
- [name=Loïc]
    + Meson as main build tool: https://github.com/scikit-learn/scikit-learn/pull/28506
    + Workaround for OpenBLAS deadlock on Windows: https://github.com/scikit-learn/scikit-learn/pull/28692
    + Simplifying global random seed plugin: https://github.com/scikit-learn/scikit-learn/pull/27963
- [name=Jérémie]
    - Released threadpoolctl 3.4.0:
        - Pyodide support (thx Loïc)
        - Support for statically linked or alternative libc implementations (e.g. musl on Alpine Linux)
    - Drafting meeting for callback API -> good feedback
    - Reviews for 1.4.2 / 1.5
    - Triage this week
    - Lock files update: https://github.com/scikit-learn/scikit-learn/pull/28691
- [name=Arturo]
    - Tried to give feedback on Guillaume's PR [add from_cv_results](https://github.com/scikit-learn/scikit-learn/pull/25939) and failed miserably
    - Iterated on [description of l2_regularization for hgbt models](https://github.com/scikit-learn/scikit-learn/pull/28652)

### :probabl. OS team organizational discussions

- pixi: is it simplifying things or not?
- sklearn vs OS team report?
- Loïc in Saclay on April 2 (Jean-Zay user committee on April 2)
- April 4
    - OpenBLAS / threadpool_limits overhead

## 2024-02-12

### Topics

- META [name=Olivier]
    - Precisely define the scope of this meeting and who should attend.
    - Do we expect to keep public meeting notes?
        - If this meeting is meant to be attended only by Probabl members, we can use Notion instead of HackMD.
        - If we want to publish the notes, we need to be able to filter them before publishing so that we can discuss probabl-sensitive topics (e.g. partnerships under negotiation, hiring plans) if needed. It means we need.
        - Archive meeting notes on a probabl GitHub repo for transparency.
        - Alternatively, we could use a public section on Notion.
- Team organization [name=Olivier]
    - Shall we maintain a backlog / project board view of the ongoing focus issues for each bi-weekly iteration?
    - Shall we use GitHub repo(s) to host project boards?
        - If so, how do we get a global view of who is working on what?
- OSS contribution visibility [name=Guillaume]
    - Crediting the employer (probabl.ai) in the GitHub profile
        - Most team members are apparently OK with this: "@probabl-ai"
        - [name=Olivier] wrote `:probabl.`
        - https://ossinsight.io (see [scikit-learn dashboard](https://ossinsight.io/analyze/scikit-learn/scikit-learn#people)) seems to use the "Company" field from the GitHub profile, reading between the lines of their [API doc](https://ossinsight.io/docs/api/issue-creators-history)
    - Commit metadata: use the probabl email address
        - On GitHub, it's possible to link several email addresses to a GitHub account
        - In git itself, the identity is a single email address (+ name)
    - TODO: credit support in the documentation (about page of scikit-learn)
    - We can maintain a probabl dashboard that queries the GitHub API to quantify team members' contributions to (community-driven) open source projects.
- Drafting and publishing job description [name=Adrin] [name=Guillaume] [name=Olivier]

### Per-project progress reports & planning

- **threadpoolctl**: [name=Jeremie] [name=Olivier]
    - Inspect FlexiBLAS [#156](https://github.com/joblib/threadpoolctl/pull/156) (merged)
    - FlexiBLAS backend switching [#163](https://github.com/joblib/threadpoolctl/pull/163) (WIP, almost ready apart from an uninformative CI failure)
    - Detect Apple Accelerate (with threading inspection) [#166](https://github.com/joblib/threadpoolctl/pull/166) (WIP, more complex than anticipated)
    - Planned 3.3.0 or even 4.0.0 release once those are merged.
- **skrub**: [name=Jerome] [name=Guillaume] [name=Gael]
    - Refactoring `TableVectorizer` (automatic preprocessing based on heuristics)
    - Internal dispatch to dataframe-specific code (Pandas or Polars)
    - API discussion around pipelines planned for Thursday.
- **scikit-learn**:
    - [name=Guillaume] Start the process for the 1.4.1 release
    - [name=Olivier] [name=Franck] GPU and Array API
        - GPU FAQ update is in
        - Franck iterated on Array API `r2_score` [#27904](https://github.com/scikit-learn/scikit-learn/pull/27904) based on reviews.
    - [name=Olivier] Calibration / Uncertainty
        - Addressed review comments on improved calibration example [#28231](https://github.com/scikit-learn/scikit-learn/pull/28231)
    - [name=Stefanie]
        - [#27576](https://github.com/scikit-learn/scikit-learn/pull/27576): test error in RegressorChain at `scipy.sparse.hstack((X_sparse, Y_sparse))` involving `scipy.sparse.dok_array` → workaround: convert to coo_array???
        - [#28205](https://github.com/scikit-learn/scikit-learn/pull/28205): metadata routing for `FeatureUnion` → routing with `fit_transform`
    - [name=Loïc]
        + A bit of work on the last 1.4.1 milestoned issue: https://github.com/scikit-learn/scikit-learn/pull/28327
        + Moving scipy-dev build to Python 3.12: https://github.com/scikit-learn/scikit-learn/pull/28383
        + Bit of review for new_web_theme / pydata-sphinx-theme
        + Fix for pandas regression in HistGradientBoosting: https://github.com/scikit-learn/scikit-learn/pull/28385
- **Pyodide**
    - [name=Loïc] SciPy 1.12 PR ran tests from https://github.com/lesteve/scipy-tests-pyodide and mentioned it would be great to have them inside the Pyodide CI
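
Regarding Stefanie's RegressorChain item (#27576): a minimal sketch of the "convert to coo_array" workaround, assuming `X_sparse` and `Y_sparse` arrive as `scipy.sparse.dok_array` as in the failing test. The toy matrices and the helper `hstack_via_coo` are hypothetical, and the notes themselves flag the workaround as unconfirmed, so this only shows the conversion, not that it fixes the test error.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical toy inputs: DOK arrays standing in for X_sparse and Y_sparse
# in the RegressorChain test (the real test data is not reproduced here).
X_sparse = sp.dok_array(np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]))
Y_sparse = sp.dok_array(np.array([[0.0], [3.0], [4.0]]))


def hstack_via_coo(*blocks):
    """Hypothetical helper: convert every block to COO, then stack horizontally.

    This way scipy.sparse.hstack never receives DOK input directly.
    """
    return sp.hstack([sp.coo_array(block) for block in blocks])


stacked = hstack_via_coo(X_sparse, Y_sparse)
print(stacked.shape)  # (3, 3)
print(stacked.toarray())
```

Converting every block, not only the DOK ones, keeps the helper simple; whether this actually resolves the test failure was still an open question in the notes.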