# Random Sampling
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
```
## SciPy Statistical Functions
The Python module [`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/stats.html) is the preferred tool for working with probability distributions, as it is built upon NumPy's random number generators and offers several statistical utilities. We use `scipy.stats` both to generate random samples and to plot probability density functions (PDFs).
### Distribution Objects
The `scipy.stats` module includes a Python object for each common probability distribution (e.g., `scipy.stats.norm`, `scipy.stats.uniform`). Each object represents an entire family of distributions defined by parameters.
| Distribution Object | Family | Standard PDF $f(x)$ | Key Parameters |
| :--- | :--- | :--- | :--- |
| `scipy.stats.norm` | Normal $N(\mu, \sigma^2)$ | $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ | `loc =\mu` (mean), `scale = sigma` (std. dev.) |
| `scipy.stats.uniform` | Uniform $U(a, b)$ | $\begin{cases} 1 & , 0 \le x \le 1 \\ 0 & , \text{otherwise} \end{cases}$ | `loc = a`, `scale = b-a` |
| `scipy.stats.expon` | Exponential $\text{Exp}(\lambda)$ | $\begin{cases} e^{-x} & , x \ge 0 \\ 0 & , x < 0 \end{cases}$ | `scale = 1/lambda` |
| `scipy.stats.gamma` | Gamma $\Gamma(\alpha, \beta)$ | $\displaystyle \frac{x^{\alpha-1} e^{-x}}{\Gamma(\alpha)}$ (with $\alpha$ as a shape parameter) | $\alpha$ (shape parameter), `scale = 1/\beta` |
### Density Function (`pdf`)
Each distribution object has a `pdf` function. This function uses the standard PDF, $f(x)$, but applies parameters `loc` and `scale` to transform it into the specific distribution:
$$
\frac{1}{\mathtt{scale}} \ f\left( \frac{x - \mathtt{loc}}{\mathtt{scale}} \right)
$$
Conceptually, the result is first scaled by `scale` and then shifted by `loc`.
### Sampling Function (`rvs`)
Each distribution object also includes a function named `rvs` (for "random variables") to generate random samples from the distribution. The parameters `loc` and `scale` define the distribution's form, and the parameter `size` specifies the number of samples $N$.
## Normal Distribution
The Python object for the normal distribution, $N(\mu,\sigma^2)$, is `scipy.stats.norm`. The parameters are set as: `mu = loc` and `sigma = scale`.
### Plotting the PDF
We use `scipy.stats.norm.pdf` to plot the density functions for various means $\mu$ and standard deviations $\sigma$.
```python=
x = np.linspace(-5,10,100)
y0 = stats.norm.pdf(x,loc=0,scale=1)
y1 = stats.norm.pdf(x,loc=1,scale=2)
y2 = stats.norm.pdf(x,loc=2,scale=3)
plt.plot(x,y0,x,y1,x,y2)
plt.title('Normal Distribution $N(\mu,\sigma^2)$')
plt.legend(['$\mu=0,\sigma^2=1$','$\mu=1,\sigma^2=4$','$\mu=2,\sigma^2=9$'])
plt.grid(True)
plt.show()
```

### Generating Random Samples
We use `scipy.stats.norm.rvs` to generate samples, plotting a normalized histogram against the theoretical PDF to show the fit.
```python=+
N = 1000; mu = 1.; sigma = 0.5;
X = stats.norm.rvs(loc=mu,scale=sigma,size=N)
plt.hist(X,bins=50,density=True,alpha=0.5), plt.grid(True)
x = np.linspace(-1,3,100)
y = stats.norm.pdf(x,loc=mu,scale=sigma)
plt.plot(x,y)
plt.title('1000 Random Samples of N(1,1/4)')
plt.show()
```

## Uniform Distribution
The object for the uniform distribution, $U(a,b)$, is `scipy.stats.uniform`. The parameters are set as: `a = loc` and `b-a = scale`.
### Plotting the PDF
```python=
x = np.linspace(-2,5,500)
y0 = stats.uniform.pdf(x,loc=0,scale=1)
y1 = stats.uniform.pdf(x,loc=1,scale=2)
y2 = stats.uniform.pdf(x,loc=-1,scale=5)
plt.plot(x,y0,x,y1,x,y2)
plt.title('Uniform Distribution U(a,b)')
plt.legend(['$a=0,b=1$','$a=1,b=3$','$a=-1,b=4$']), plt.grid(True)
plt.show()
```

### Generating Random Samples
```python=+
N = 1000; a = -1.; b = 3;
X = stats.uniform.rvs(loc=a,scale=(b-a),size=N)
plt.hist(X,bins=20,density=True,alpha=0.5), plt.grid(True)
x = np.linspace(-2,4,200)
y = stats.uniform.pdf(x,loc=a,scale=(b-a))
plt.plot(x,y)
plt.ylim([-0.2,1.0]), plt.title('1000 Random Samples of $U(-1,3)$')
plt.show()
```

## Exponential Distribution
The object for the exponential distribution, $\text{Exp}(\lambda)$, is `scipy.stats.expon`. The parameter is set as: `1/\lambda = scale`. The location parameter `loc` is typically $0$.
### Plotting the PDF
```python=
x = np.linspace(-1,10,200)
y0 = stats.expon.pdf(x,loc=0,scale=1)
y1 = stats.expon.pdf(x,loc=0,scale=2)
y2 = stats.expon.pdf(x,loc=0,scale=3)
plt.plot(x,y0,x,y1,x,y2)
plt.title('Exponential Distribution $Exp(\lambda)$')
plt.legend(['$\lambda=1$','$\lambda=1/2$','$\lambda=1/3$']), plt.grid(True)
plt.show()
```

### Generating Random Samples
```python=+
N = 1000; lam = 0.2;
X = stats.expon.rvs(loc=0.,scale=1/lam,size=N)
plt.hist(X,bins=range(30),density=True,alpha=0.5), plt.grid(True)
x = np.linspace(-1,31,200)
y = stats.expon.pdf(x,loc=0.,scale=1/lam)
plt.plot(x,y)
plt.title('1000 Random Samples of Exp(0.2)')
plt.show()
```

## Gamma Distribution
The object for the gamma distribution, $\Gamma(\alpha,\beta)$, is `scipy.stats.gamma`. The parameters are set as: $\alpha$ is a required shape parameter, and `1/\beta = scale`. The location parameter `loc` is typically $0$.
### Plotting the PDF
```python=
x = np.linspace(-1,8,200)
y0 = stats.gamma.pdf(x,1.75,loc=0,scale=1)
y1 = stats.gamma.pdf(x,2.5,loc=0,scale=1.5)
y2 = stats.gamma.pdf(x,3,loc=0,scale=0.8)
plt.plot(x,y0,x,y1,x,y2)
plt.title('Gamma Distribution $\Gamma(a,b)$')
plt.legend(['$a=7/4,b=1$','$a=5/2,b=2/3$','$a=3,b=5/4$']), plt.grid(True)
plt.show()
```

### Generating Random Samples
```python=+
N = 1000; alpha = 5; beta = 2/3;
X = stats.gamma.rvs(alpha,loc=0.,scale=1/beta,size=N)
plt.hist(X,bins=np.arange(0,20,0.5),density=True,alpha=0.5), plt.grid(True)
x = np.linspace(-1,20,200)
y = stats.gamma.pdf(x,alpha,loc=0.,scale=1/beta)
plt.plot(x,y)
plt.title('1000 Random Samples of $\Gamma(5,2/3)$')
plt.show()
```

## Functions of Random Variables
When $Y$ is a random variable defined as a function of other independent random variables $Y = g(X_1, \dots, X_n)$, finding the exact PDF, $f_Y(x)$, can be difficult.
The solution is to use **random sampling to approximate the distribution**:
1. Generate large random samples for each independent variable $X_1, \dots, X_n$.
2. Compute the corresponding sample values for $Y$ using the function $g$.
3. Plot a normalized histogram of the $Y$ samples, which approximates $f_Y(x)$.
The `plt.hist(Y, density=True)` function returns the relative frequency values and bin edges, which can be plotted as a line to visualize the approximated density function more clearly.
### Example: Sum of Normal Variables
If $X_1 \sim N(\mu_1,\sigma^2_1)$ and $X_2 \sim N(\mu_2,\sigma^2_2)$ are independent, then their sum $Y = X_1 + X_2$ is also normally distributed: $Y \sim N(\mu_1 + \mu_2,\sigma^2_1 + \sigma^2_2)$. This is verified by sampling and plotting the histogram.
```python=
N = 1000
X1 = stats.norm.rvs(loc=-1,scale=1,size=N)
X2 = stats.norm.rvs(loc=1,scale=2,size=N)
Y = X1 + X2
freqs,bins,_ = plt.hist(Y,bins=25,density=True,alpha=0.5)
mids = (bins[:-1] + bins[1:])/2
plt.plot(mids,freqs,'.-')
plt.grid(True)
plt.show()
```

### Example: Norm of Normal Variables
If $Y = \sqrt{X_1^2 + X_2^2}$, where $X_1$ and $X_2$ are normal random variables, the resulting distribution $Y$ is known as the chi distribution.
```python=
N = 2000
X1 = stats.norm.rvs(loc=0,scale=1,size=N)
X2 = stats.norm.rvs(loc=1,scale=2,size=N)
Y = np.sqrt(X1**2 + X2**2)
freqs,bins,_ = plt.hist(Y,bins=np.arange(0,8,0.2),density=True,alpha=0.5)
mids = (bins[:-1] + bins[1:])/2
plt.plot(mids,freqs,'.-')
plt.grid(True)
plt.show()
```
