Random Sampling

# Random Sampling ```python import numpy as np import matplotlib.pyplot as plt from scipy import stats ``` ## SciPy Statistical Functions The Python module [`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/stats.html) is the preferred tool for working with probability distributions, as it is built upon NumPy's random number generators and offers several statistical utilities. We use `scipy.stats` both to generate random samples and to plot probability density functions (PDFs). ### Distribution Objects The `scipy.stats` module includes a Python object for each common probability distribution (e.g., `scipy.stats.norm`, `scipy.stats.uniform`). Each object represents an entire family of distributions defined by parameters. | Distribution Object | Family | Standard PDF $f(x)$ | Key Parameters | | :--- | :--- | :--- | :--- | | `scipy.stats.norm` | Normal $N(\mu, \sigma^2)$ | $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ | `loc =\mu` (mean), `scale = sigma` (std. dev.) | | `scipy.stats.uniform` | Uniform $U(a, b)$ | $\begin{cases} 1 & , 0 \le x \le 1 \\ 0 & , \text{otherwise} \end{cases}$ | `loc = a`, `scale = b-a` | | `scipy.stats.expon` | Exponential $\text{Exp}(\lambda)$ | $\begin{cases} e^{-x} & , x \ge 0 \\ 0 & , x < 0 \end{cases}$ | `scale = 1/lambda` | | `scipy.stats.gamma` | Gamma $\Gamma(\alpha, \beta)$ | $\displaystyle \frac{x^{\alpha-1} e^{-x}}{\Gamma(\alpha)}$ (with $\alpha$ as a shape parameter) | $\alpha$ (shape parameter), `scale = 1/\beta` | ### Density Function (`pdf`) Each distribution object has a `pdf` function. This function uses the standard PDF, $f(x)$, but applies parameters `loc` and `scale` to transform it into the specific distribution: $$ \frac{1}{\mathtt{scale}} \ f\left( \frac{x - \mathtt{loc}}{\mathtt{scale}} \right) $$ Conceptually, the result is first scaled by `scale` and then shifted by `loc`. ### Sampling Function (`rvs`) Each distribution object also includes a function named `rvs` (for "random variables") to generate random samples from the distribution. The parameters `loc` and `scale` define the distribution's form, and the parameter `size` specifies the number of samples $N$. ## Normal Distribution The Python object for the normal distribution, $N(\mu,\sigma^2)$, is `scipy.stats.norm`. The parameters are set as: `mu = loc` and `sigma = scale`. ### Plotting the PDF We use `scipy.stats.norm.pdf` to plot the density functions for various means $\mu$ and standard deviations $\sigma$. ```python= x = np.linspace(-5,10,100) y0 = stats.norm.pdf(x,loc=0,scale=1) y1 = stats.norm.pdf(x,loc=1,scale=2) y2 = stats.norm.pdf(x,loc=2,scale=3) plt.plot(x,y0,x,y1,x,y2) plt.title('Normal Distribution $N(\mu,\sigma^2)$') plt.legend(['$\mu=0,\sigma^2=1$','$\mu=1,\sigma^2=4$','$\mu=2,\sigma^2=9$']) plt.grid(True) plt.show() ``` ![image](https://hackmd.io/_uploads/B1CW96Aaxe.png) ### Generating Random Samples We use `scipy.stats.norm.rvs` to generate samples, plotting a normalized histogram against the theoretical PDF to show the fit. ```python=+ N = 1000; mu = 1.; sigma = 0.5; X = stats.norm.rvs(loc=mu,scale=sigma,size=N) plt.hist(X,bins=50,density=True,alpha=0.5), plt.grid(True) x = np.linspace(-1,3,100) y = stats.norm.pdf(x,loc=mu,scale=sigma) plt.plot(x,y) plt.title('1000 Random Samples of N(1,1/4)') plt.show() ``` ![image](https://hackmd.io/_uploads/HknVcpA6ex.png) ## Uniform Distribution The object for the uniform distribution, $U(a,b)$, is `scipy.stats.uniform`. The parameters are set as: `a = loc` and `b-a = scale`. ### Plotting the PDF ```python= x = np.linspace(-2,5,500) y0 = stats.uniform.pdf(x,loc=0,scale=1) y1 = stats.uniform.pdf(x,loc=1,scale=2) y2 = stats.uniform.pdf(x,loc=-1,scale=5) plt.plot(x,y0,x,y1,x,y2) plt.title('Uniform Distribution U(a,b)') plt.legend(['$a=0,b=1$','$a=1,b=3$','$a=-1,b=4$']), plt.grid(True) plt.show() ``` ![image](https://hackmd.io/_uploads/S1Q396Rall.png) ### Generating Random Samples ```python=+ N = 1000; a = -1.; b = 3; X = stats.uniform.rvs(loc=a,scale=(b-a),size=N) plt.hist(X,bins=20,density=True,alpha=0.5), plt.grid(True) x = np.linspace(-2,4,200) y = stats.uniform.pdf(x,loc=a,scale=(b-a)) plt.plot(x,y) plt.ylim([-0.2,1.0]), plt.title('1000 Random Samples of $U(-1,3)$') plt.show() ``` ![image](https://hackmd.io/_uploads/r1ha9pCTlg.png) ## Exponential Distribution The object for the exponential distribution, $\text{Exp}(\lambda)$, is `scipy.stats.expon`. The parameter is set as: `1/\lambda = scale`. The location parameter `loc` is typically $0$. ### Plotting the PDF ```python= x = np.linspace(-1,10,200) y0 = stats.expon.pdf(x,loc=0,scale=1) y1 = stats.expon.pdf(x,loc=0,scale=2) y2 = stats.expon.pdf(x,loc=0,scale=3) plt.plot(x,y0,x,y1,x,y2) plt.title('Exponential Distribution $Exp(\lambda)$') plt.legend(['$\lambda=1$','$\lambda=1/2$','$\lambda=1/3$']), plt.grid(True) plt.show() ``` ![image](https://hackmd.io/_uploads/r1S7oaAaex.png) ### Generating Random Samples ```python=+ N = 1000; lam = 0.2; X = stats.expon.rvs(loc=0.,scale=1/lam,size=N) plt.hist(X,bins=range(30),density=True,alpha=0.5), plt.grid(True) x = np.linspace(-1,31,200) y = stats.expon.pdf(x,loc=0.,scale=1/lam) plt.plot(x,y) plt.title('1000 Random Samples of Exp(0.2)') plt.show() ``` ![image](https://hackmd.io/_uploads/ByPrip0agx.png) ## Gamma Distribution The object for the gamma distribution, $\Gamma(\alpha,\beta)$, is `scipy.stats.gamma`. The parameters are set as: $\alpha$ is a required shape parameter, and `1/\beta = scale`. The location parameter `loc` is typically $0$. ### Plotting the PDF ```python= x = np.linspace(-1,8,200) y0 = stats.gamma.pdf(x,1.75,loc=0,scale=1) y1 = stats.gamma.pdf(x,2.5,loc=0,scale=1.5) y2 = stats.gamma.pdf(x,3,loc=0,scale=0.8) plt.plot(x,y0,x,y1,x,y2) plt.title('Gamma Distribution $\Gamma(a,b)$') plt.legend(['$a=7/4,b=1$','$a=5/2,b=2/3$','$a=3,b=5/4$']), plt.grid(True) plt.show() ``` ![image](https://hackmd.io/_uploads/H1yKiTAaee.png) ### Generating Random Samples ```python=+ N = 1000; alpha = 5; beta = 2/3; X = stats.gamma.rvs(alpha,loc=0.,scale=1/beta,size=N) plt.hist(X,bins=np.arange(0,20,0.5),density=True,alpha=0.5), plt.grid(True) x = np.linspace(-1,20,200) y = stats.gamma.pdf(x,alpha,loc=0.,scale=1/beta) plt.plot(x,y) plt.title('1000 Random Samples of $\Gamma(5,2/3)$') plt.show() ``` ![image](https://hackmd.io/_uploads/Hyv5ip0ple.png) ## Functions of Random Variables When $Y$ is a random variable defined as a function of other independent random variables $Y = g(X_1, \dots, X_n)$, finding the exact PDF, $f_Y(x)$, can be difficult. The solution is to use **random sampling to approximate the distribution**: 1. Generate large random samples for each independent variable $X_1, \dots, X_n$. 2. Compute the corresponding sample values for $Y$ using the function $g$. 3. Plot a normalized histogram of the $Y$ samples, which approximates $f_Y(x)$. The `plt.hist(Y, density=True)` function returns the relative frequency values and bin edges, which can be plotted as a line to visualize the approximated density function more clearly. ### Example: Sum of Normal Variables If $X_1 \sim N(\mu_1,\sigma^2_1)$ and $X_2 \sim N(\mu_2,\sigma^2_2)$ are independent, then their sum $Y = X_1 + X_2$ is also normally distributed: $Y \sim N(\mu_1 + \mu_2,\sigma^2_1 + \sigma^2_2)$. This is verified by sampling and plotting the histogram. ```python= N = 1000 X1 = stats.norm.rvs(loc=-1,scale=1,size=N) X2 = stats.norm.rvs(loc=1,scale=2,size=N) Y = X1 + X2 freqs,bins,_ = plt.hist(Y,bins=25,density=True,alpha=0.5) mids = (bins[:-1] + bins[1:])/2 plt.plot(mids,freqs,'.-') plt.grid(True) plt.show() ``` ![image](https://hackmd.io/_uploads/BJaM36RTle.png) ### Example: Norm of Normal Variables If $Y = \sqrt{X_1^2 + X_2^2}$, where $X_1$ and $X_2$ are normal random variables, the resulting distribution $Y$ is known as the chi distribution. ```python= N = 2000 X1 = stats.norm.rvs(loc=0,scale=1,size=N) X2 = stats.norm.rvs(loc=1,scale=2,size=N) Y = np.sqrt(X1**2 + X2**2) freqs,bins,_ = plt.hist(Y,bins=np.arange(0,8,0.2),density=True,alpha=0.5) mids = (bins[:-1] + bins[1:])/2 plt.plot(mids,freqs,'.-') plt.grid(True) plt.show() ``` ![image](https://hackmd.io/_uploads/r1MS36RTgg.png)