Title: Probabilistic classification and cost-sensitive learning with scikit-learn

Abstract: Data scientists are repeatedly told that it is critical to align their model training methodology with a specific business objective. While this is good advice, it rarely comes with an explanation of how to achieve it in practice. This hands-on tutorial introduces theoretical concepts and concrete software tools to help bridge this gap. The approach will be illustrated on a worked practical use case: optimizing the operations of a fraud detection system for a payment processing platform. More specifically, we will introduce the concept of calibrated probabilistic classifiers, show how to evaluate them, and show how to fix common causes of miscalibration. In the second part, we will explore how to turn probabilistic classifiers into optimal business decision makers.

Description: Detailed outline of the tutorial:

- Introduction
  - Evaluating ML-based predictions with:
    - ranking metrics,
    - probabilistic metrics,
    - decision metrics.
  - Proper scoring losses and their decomposition into:
    - calibration loss,
    - grouping loss,
    - irreducible loss.
- Part I: Probabilistic classification
  - The calibration curve
  - Possible causes of miscalibration
    - Model misspecification
    - Overfitting and poorly chosen regularization
  - Possible ways to improve calibration
    - Non-linear feature engineering to avoid misspecification
    - Post-hoc calibration with isotonic regression
    - Tuning parameters and early stopping with a proper scoring rule
- Part II: Optimal decision making under uncertainty
  - Defining custom business cost functions
  - Individual-specific cost functions
  - Setting the Elkan-optimal threshold with `FixedThresholdClassifier`
  - Cost-sensitive learning for arbitrary cost functions with `TunedThresholdClassifierCV`
  - Predict-time decision threshold optimization

This tutorial will be delivered as a set of publicly available Jupyter notebooks under an open source license.
We will mostly use components of the latest version of the scikit-learn library, plus a few custom extensions.
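
To make the "Elkan-optimal threshold" item of Part II more concrete, here is a minimal sketch that is not taken from the tutorial notebooks. It assumes a constant cost matrix for the fraud use case with made-up values (5 for blocking a legitimate payment, 95 for missing a fraud), and the helper names `elkan_threshold` and `expected_cost` are purely illustrative. It relies on the standard expected-cost argument: predicting the positive class is cheaper whenever the calibrated probability exceeds (cost_fp - cost_tn) / ((cost_fp - cost_tn) + (cost_fn - cost_tp)).

```python
def elkan_threshold(cost_fp, cost_fn, cost_tp=0.0, cost_tn=0.0):
    """Probability above which predicting "positive" has the lower expected cost.

    Assumes a constant cost matrix in which each error costs more than the
    corresponding correct decision (cost_fp > cost_tn and cost_fn > cost_tp).
    """
    extra_cost_fp = cost_fp - cost_tn  # penalty for wrongly predicting positive
    extra_cost_fn = cost_fn - cost_tp  # penalty for wrongly predicting negative
    if extra_cost_fp <= 0 or extra_cost_fn <= 0:
        raise ValueError("errors must cost more than correct decisions")
    return extra_cost_fp / (extra_cost_fp + extra_cost_fn)


def expected_cost(p, predict_positive, cost_fp, cost_fn, cost_tp=0.0, cost_tn=0.0):
    """Expected cost of one decision when the true positive probability is p."""
    if predict_positive:
        return p * cost_tp + (1.0 - p) * cost_fp
    return p * cost_fn + (1.0 - p) * cost_tn


# Hypothetical costs: blocking a legitimate payment costs 5,
# letting a fraudulent payment through costs 95.
COST_FP, COST_FN = 5.0, 95.0
threshold = elkan_threshold(COST_FP, COST_FN)
print(threshold)  # 0.05

# Compare both decisions on either side of the threshold.
for p in (0.01, 0.04, 0.06, 0.50):
    flag = expected_cost(p, True, COST_FP, COST_FN)
    allow = expected_cost(p, False, COST_FP, COST_FN)
    print(f"p={p:.2f}  flag={flag:.2f}  allow={allow:.2f}")
```

A value computed this way could then be passed as the `threshold` parameter of `FixedThresholdClassifier`. The argument only holds if the wrapped classifier's predicted probabilities are well calibrated, which is why calibration is treated first in the outline.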