scikit-learn user survey: Stage 1

### SCIKIT-LEARN USER SURVEY QUESTIONNAIRE

**WELCOME TO THE SCIKIT-LEARN SURVEY**

This survey is being conducted by the scikit-learn survey team to ensure that scikit-learn evolves in a way that benefits its user community. Participation in this survey is voluntary, and you may choose to remain completely anonymous. It should take approximately 15 minutes to complete.

Please check the box below to indicate that you have read this statement in its entirety; that your questions about the survey have been answered to your satisfaction; and that you voluntarily agree to participate in the survey. You may print a copy of this consent form if you wish.

[x] I have read this statement in its entirety and affirm the stated conditions.

### PROJECT FUTURE DIRECTION AND PRIORITIES

1. Thinking about scikit-learn's future, what aspects of the library would you prioritize for improvement? [Multiple choice grid]
   * Performance
   * Reliability
   * Packaging
   * New features
   * Technical documentation
   * Educational materials
   * Website redesign
   * Other
2. Please expand on your answer about the priorities for scikit-learn. [Free text]
3. What single immediate change to scikit-learn would bring the most value to you as a scikit-learn user? [Free text]

### TECHNICAL QUESTIONS

**Project**

4. Please rank the following ML tasks in order of priority to you:
   * Regression
   * Classification
   * Forecasting
   * Outlier/anomaly detection
   * Dimensionality reduction
   * Clustering
   * Other
5. What visualizations do you use to evaluate your models? [Multiple choice]
   * Confusion matrix
   * Reliability diagram
   * ROC curve
   * Precision-Recall curve
   * Feature importance
   * Residual plots
   * Learning curves
   * Other
6. Which DataFrame libraries do you use? [Multiple choice]
   - cuDF
   - Dask DataFrame
   - DuckDB
   - Modin
   - pandas
   - Polars
   - Spark DataFrame
   - Other

**Modeling**

7. What do you like the most about scikit-learn? [Free text]
8. Which other machine learning libraries do you use? [Multiple choice]
   - CatBoost
   - JAX
   - Keras
   - LightGBM
   - PyTorch
   - Transformers
   - XGBoost
   - Other
9. Which estimators do you regularly use? [Multiple choice]
   - `LogisticRegression`
   - `RandomForestClassifier` or `RandomForestRegressor`
   - `HistGradientBoostingRegressor` or `HistGradientBoostingClassifier`
   - `Pipeline`
   - `ColumnTransformer`
   - Other
10. Have you ever written your own estimator, or extended an existing scikit-learn estimator?
    * Yes
    * No
11. What ML features are important for your use case? [Multiple choice]
    - Calibration of probabilistic classifiers
    - Calibration of regressors
    - Uncertainty estimates for prediction
    - Cost-sensitive learning
    - Feature importances
    - Sample weights
    - Metadata routing
    - Non-Euclidean metrics
12. Is there additional information you want to pass to an estimator that is not `X` and `y`?
    * Yes
    * No
13. [Conditional question] If so, what kind of information would that be? [Free text]
14. [Conditional question] How would it benefit the model training process? [Free text]

**Deployment**

15. Considering your current machine learning projects, how critical would GPU capabilities within scikit-learn be? [Ranking question 1-5]
16. For model registry and experiment tracking, do you use any of the following tools? Choose all that apply. [Multiple choice]
    * MLflow
    * DVC
    * Weights & Biases
    * Neptune
    * Custom tool
    * Other
17. For scheduling, do you use any of the following tools? [Dropdown menu]
    * Airflow
    * Argo
    * Coiled
    * Dagster
    * Kubeflow
    * Metaflow (Outerbounds)
    * Custom tool
    * Other
18. How long does a typical model training take in your ML projects? [Dropdown menu]
    * Less than 10 seconds
    * Less than a minute
    * Less than 10 minutes
    * Less than an hour
    * Less than a day
    * More than a day
19. How many deployed models are you (and your team) currently maintaining? [Number field]
20. To what extent do you agree with the following statement: "Open source ML & AI frameworks and libraries are crucial for ensuring transparency and the reproducibility of AI research and development"? [Dropdown menu]
    * Strongly agree
    * Agree
    * Neither agree nor disagree
    * Disagree
    * Strongly disagree

### VOLUNTEER FOR INTERVIEW

21. Would you like to volunteer for a short conversation with the scikit-learn team to discuss your responses in more detail?
    * Yes
    * No
22. [Conditional question] If yes, please provide your email address. [Free text]