scipy.stats.sampling and scipy.stats.qmc
NumPy random number generators and SciPy distributions are widely used to generate random numbers. However, challenges arise in two situations: (i) sampling from non-standard distributions can be slow if no custom implementation is available, and (ii) random sampling in high dimensions leads to poor convergence rates. Recent developments in SciPy address these problems with two new submodules: scipy.stats.sampling and scipy.stats.qmc.
NumPy random number generators (numpy.random) and SciPy distributions (scipy.stats) have become the standard way of sampling random numbers in the scientific Python ecosystem. These methods are fast and reliable, and the results are repeatable when a random seed is provided.
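A minimal sketch of that reproducibility, assuming NumPy 1.17 or later (which introduced `numpy.random.default_rng`): two generators built from the same seed yield identical streams, and passing such a generator to a `scipy.stats` distribution through its `random_state` argument makes SciPy draws repeatable as well. The seed 2022 is an arbitrary choice.

```python
import numpy as np
from scipy import stats

# Two generators created from the same seed produce identical streams.
rng_a = np.random.default_rng(2022)
rng_b = np.random.default_rng(2022)
u_a = rng_a.random(5)
u_b = rng_b.random(5)

# scipy.stats distributions accept a Generator via `random_state`,
# so their draws are repeatable in the same way.
z_a = stats.norm.rvs(size=5, random_state=np.random.default_rng(2022))
z_b = stats.norm.rvs(size=5, random_state=np.random.default_rng(2022))

print(np.array_equal(u_a, u_b), np.array_equal(z_a, z_b))  # True True
```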
This talk addresses two challenges with these methods:
It's difficult to sample from a new or non-standard distribution. For example, naive methods to numerically invert the cumulative distribution function can be too slow in practice, even for simple probability density functions.
Instead of deriving a specific generator for each such case, so-called automatic or black-box methods, now available in scipy.stats.sampling, generate random variates from fairly large classes of distributions given only some of their properties (e.g., the probability density function and/or the cumulative distribution function). They also make sampling from truncated distributions straightforward.
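A minimal sketch of the black-box approach, assuming a SciPy version that provides `scipy.stats.sampling.NumericalInversePolynomial`. The user supplies only a PDF (here an unnormalized standard normal density, wrapped in a hypothetical class `UnnormalizedNormal`); the method builds an approximation of the inverse CDF once during set-up and then draws variates by inversion. Truncation is requested through the `domain` argument; the interval (-1, 2) is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.stats.sampling import NumericalInversePolynomial


class UnnormalizedNormal:
    """Hypothetical user-defined distribution: only an unnormalized PDF is given."""

    def pdf(self, x):
        return np.exp(-0.5 * x * x)


dist = UnnormalizedNormal()

# Set-up: approximate the inverse CDF once; sampling afterwards is by inversion.
sampler = NumericalInversePolynomial(dist)
samples = sampler.rvs(10_000)

# The approximate inverse CDF can also be queried directly (median of N(0, 1)).
median = sampler.ppf(0.5)

# Truncated distribution: restrict the domain, no new code for the PDF needed.
truncated = NumericalInversePolynomial(dist, domain=(-1.0, 2.0))
truncated_samples = truncated.rvs(10_000)

print(samples.mean(), median, truncated_samples.min(), truncated_samples.max())
```

Because the expensive work happens once at set-up, drawing many variates afterwards is generally cheap, which is what makes this approach attractive compared with numerically root-finding the CDF for every variate.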
Random sampling in high dimensions produces gaps and clusters of points. In integration problems, classical Monte Carlo methods converge slowly, so a large sample size is required.
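To make the gaps-and-clusters point concrete, here is a small sketch, assuming `scipy.stats.qmc` is available. It compares 256 pseudo-random points with 256 scrambled Sobol' points (a QMC sequence) in two dimensions using `scipy.stats.qmc.discrepancy`, whose default measure is the centered L2-discrepancy; lower values mean the points cover the unit square more evenly. The dimension and sample size are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import qmc

n, d = 256, 2

# Pseudo-random points: independent uniforms, prone to gaps and clusters.
random_points = np.random.default_rng().random((n, d))

# Scrambled Sobol' points: a low-discrepancy sequence (2**8 = 256 points).
sobol_points = qmc.Sobol(d=d, scramble=True).random_base2(m=8)

# Lower discrepancy means a more uniform coverage of [0, 1)^d.
disc_random = qmc.discrepancy(random_points)
disc_sobol = qmc.discrepancy(sobol_points)
print(f"random: {disc_random:.2e}  Sobol': {disc_sobol:.2e}")
```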
By construction, Quasi-Monte Carlo (QMC) methods provide efficient, high-quality generators, either deterministic or randomized, that can advantageously replace traditional methods. This can be decisive when the sample size is limited or when strong reproducibility guarantees are required. QMC methods are also known to converge faster than the traditional Monte Carlo sampling used by NumPy and SciPy: the Monte Carlo integration error decreases as O(1/√N) with the sample size N, whereas QMC can approach O(1/N) for sufficiently smooth integrands.
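The convergence claim can be checked with a sketch like the following, which estimates the integral of the sum of squared coordinates over the five-dimensional unit hypercube (exact value 5/3) with 1024 pseudo-random points and with 1024 scrambled Sobol' points. The test function, dimension and sample size are illustrative assumptions; on typical runs the QMC error is much smaller than the Monte Carlo error.

```python
import numpy as np
from scipy.stats import qmc

d = 5
m = 10            # 2**10 = 1024 points
n = 2**m
exact = d / 3     # integral of sum_i x_i**2 over [0, 1]^d


def f(x):
    # Smooth test integrand: sum of squared coordinates of each point.
    return np.sum(x**2, axis=1)


# Plain Monte Carlo: independent uniform points.
mc_points = np.random.default_rng().random((n, d))
# Quasi-Monte Carlo: scrambled Sobol' points.
qmc_points = qmc.Sobol(d=d, scramble=True).random_base2(m=m)

mc_error = abs(f(mc_points).mean() - exact)
qmc_error = abs(f(qmc_points).mean() - exact)
print(f"MC error:  {mc_error:.2e}")
print(f"QMC error: {qmc_error:.2e}")
```

Passing `scramble=False` gives the deterministic Sobol' sequence, while scrambling randomizes it and keeps its low-discrepancy structure; `random_base2` draws a power-of-two number of points, which preserves the balance properties of the sequence.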
This presentation will start with a refresher on random number generation and techniques to sample from a distribution. Then, it will offer some intuition on (Quasi-)Monte Carlo methods. Finally, it will present the new features introduced in SciPy using practical examples and conclude with pitfalls to avoid and recommendations when dealing with random numbers.
This presentation seeks to generate discussion, gather feedback to improve the library and call for new contributors.
Track: General
Keywords: SciPy, QMC, Random sampling, MC, Distribution
Type: Talk
Author 1:
First Name: Pamphile T.
Last Name: Roy
Email: proy@quansight.com
Country/Region: Austria
Organization: Quansight
Web page: https://github.com/tupui
Author 2:
First Name: Tirth
Last Name: Patel
Email: tirthasheshpatel@gmail.com
Country/Region: India
Organization: Nirma University
Web page: https://tirthasheshpatel.github.io/about
Author 3:
First Name: Christoph
Last Name: Baumgarten
Email: christoph.baumgarten@gmail.com
Country/Region: Switzerland
Organization: UBS
Web page: N/A
Author 4:
First Name: Matt
Last Name: Haberland
Email: mhaberla@calpoly.edu
Country/Region: USA
Organization: California Polytechnic State University, San Luis Obispo
Web page: https://brae.calpoly.edu/faculty-and-staff-haberland