---
tags: SciPy, conference
---

# Improving random sampling in Python: `scipy.stats.sampling` and `scipy.stats.qmc`

## Short Summary

NumPy random number generators and SciPy distributions are widely used to generate random numbers. However, challenges arise in two situations: _(i)_ sampling from non-standard distributions can be slow if a custom implementation is not available, and _(ii)_ sampling in high dimensions leads to poor convergence rates. Thanks to new developments in SciPy, two new submodules address these problems: `scipy.stats.sampling` and `scipy.stats.qmc`.

## Abstract

NumPy random number generators (`numpy.random`) and SciPy distributions (`scipy.stats`) have become the standard way of sampling random numbers in the scientific Python ecosystem. These methods are fast and reliable, and the results are repeatable when a random seed is provided.

This talk addresses two challenges with these methods:

1. It is difficult to sample from a new or non-standard distribution. For example, naive methods that numerically invert the cumulative distribution function can be too slow in practice, even for simple probability density functions. Instead of deriving a specific generator for each such case, so-called _automatic_ or _black-box_ methods have been implemented. They generate random variates from fairly large classes of distributions given only some properties of the distribution (e.g., the density and/or the cumulative distribution function). Sampling from truncated distributions is also straightforward.
2. Sampling in high dimensions produces many gaps and clusters of points. In integration problems, classical methods converge slowly, so a large sample size is required. By construction, Quasi-Monte Carlo (QMC) methods provide efficient, high-quality generators, deterministic or randomized, that can advantageously replace traditional methods.
   This can be decisive when the sample size is limited or a strong reproducibility guarantee is required. QMC methods are also known to converge faster than traditional Monte Carlo sampling (used by NumPy and SciPy).

This presentation will start with a refresher on random number generation and techniques to sample from a distribution. Then, it will offer some intuition on (Quasi-)Monte Carlo methods. Finally, it will present the new features introduced in SciPy using practical examples and conclude with pitfalls to avoid and recommendations for dealing with random numbers.

This presentation seeks to generate discussion, gather feedback to improve the library, and call for new contributors.

## Other information

- Track: General
- Keywords: SciPy, QMC, Random sampling, MC, Distribution
- Type: Talk

Author 1:

- First Name: Pamphile T.
- Last Name: Roy
- Email: proy@quansight.com
- Country/Region: Austria
- Organization: Quansight
- Web page: https://github.com/tupui

Author 2:

- First Name: Tirth
- Last Name: Patel
- Email: tirthasheshpatel@gmail.com
- Country/Region: India
- Organization: Nirma University
- Web page: https://tirthasheshpatel.github.io/about

Author 3:

- First Name: Christoph
- Last Name: Baumgarten
- Email: christoph.baumgarten@gmail.com
- Country/Region: Switzerland
- Organization: UBS
- Web page: N/A

Author 4:

- First Name: Matt
- Last Name: Haberland
- Email: mhaberla@calpoly.edu
- Country/Region: USA
- Organization: California Polytechnic State University, San Luis Obispo
- Web page: https://brae.calpoly.edu/faculty-and-staff-haberland
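
## Illustrative sketch

The two challenges described in the abstract can be sketched in a few lines. This is a minimal illustration, not material from the talk itself: the class `TruncatedBell`, its unnormalized density, the truncation interval, the seeds, and the sample sizes are all hypothetical choices made for the example. It assumes a SciPy version that ships `scipy.stats.sampling.NumericalInversePolynomial` and `scipy.stats.qmc.Sobol`.

```python
import numpy as np
from scipy.stats import qmc
from scipy.stats.sampling import NumericalInversePolynomial


class TruncatedBell:
    """Hypothetical distribution, described only by an unnormalized density."""

    def pdf(self, x):
        # Bell-shaped density; the normalization constant is not needed.
        return np.exp(-0.5 * x * x)


rng = np.random.default_rng(1234)

# (i) Black-box sampling: only the density and the support are specified.
# Passing `domain` truncates the distribution to [-1, 3].
gen = NumericalInversePolynomial(
    TruncatedBell(), domain=(-1.0, 3.0), random_state=rng
)
variates = gen.rvs(size=1000)

# (ii) QMC sampling: 2**8 = 256 scrambled Sobol' points in the unit square,
# compared with the same number of pseudo-random points.
sampler = qmc.Sobol(d=2, scramble=True, seed=1234)
qmc_points = sampler.random_base2(m=8)
mc_points = rng.random((256, 2))

# Discrepancy measures how unevenly the points fill the unit hypercube
# (lower means fewer gaps and clusters).
disc_qmc = qmc.discrepancy(qmc_points)
disc_mc = qmc.discrepancy(mc_points)
print(f"QMC discrepancy: {disc_qmc:.2e}, MC discrepancy: {disc_mc:.2e}")
```

Discrepancy is used here only as a convenient one-number summary of the "gaps and clusters" the abstract mentions; for an integration problem, one would compare the integration error directly.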