---
tags: Python Workshop 沈煒翔
---
# Lesson 7: Numpy
Numpy is a library for managing numbers and arrays in Python. We can import the library into our script via
```python
import numpy as np
```
Numpy arrays are like Python list. However, the size of a numpy array is fixed, so no append() or pop(). Also, numpy is around 50x faster than Python lists (built by C++).
Numpy has many useful functions that can make everyone's life easier when manipulating numbers/arrays. It is impratical to go through every function but we'll do our best.
## Create a numpy array
We can construct a numpy array using lists
```python
# 0-D array
a = np.array(40)
# 1-D array
b = np.array([1, 2, 3, 4, 5])
# 2-D array
c = np.array([[1,2,3], [2,3,4]]) # list of lists
```
or we can use ones() and zeros() function. (You may notice that some function names are similar to Matlab)
```python
d = np.ones((5, 3)) # 2D arrays of shape (5, 3) with all ones
e = np.zeros((5, 3, 4)) # 3D arrays of shape (5, 3, 4) with all zeros
```
We can check the dimension using .ndim and the shape using .shape
```python
print(d.ndim) # >>> 2
print(d.shape) # >>> (5, 3)
print(a.ndim) # >>> 0
print(a.shape) # >>> ()
print(b.ndim) # >>> 1
print(b.shape) # >>> (5, )
```
## Array indexing
We use completely the same indexing techniques as Python lists when indexing an array.
```python
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print(arr[1][3]) # >>> 9
# This also works
print(arr[1, 3]) # list doesn't work
```
### Exercise
Create a 1-D numpy array of length N with its elements of 0, 1, 2, 3, ..., N-1.
```python
N = 4
print(arr) # >>> [0 1 2 3]
```
## Traversing a array
For 1-D array, it's simple. We can use a for loop
```python
arr = np.arange(4)
# Use i as the loop index
for i in range(arr.shape[0]):
print(arr[i])
# >>> 0
# >>> 1
# >>> 2
# >>> 3
# This also works
# We treat the array as an iterable thing
for value in arr:
print(value)
for i, value in enumerate(arr):
print(i, value)
```
For 2-D array, we just need a nested for loop
```python
arr = np.array([[1, 2, 3], [4, 5, 6]])
for i in range(arr.shape[0]):
for j in range(arr.shape[1]):
print(arr[i][j])
# >>> would print 1, 2, 3, 4, 5, 6
# This also works
for row in arr:
for value in row:
print(value)
```
People use both methods to traverse an array. Both are good.
### Exercise
Make a N*N array with each element's value assigned to the distance to (0,0).
So arr[0,0] = 0, arr[1,1]=1.414, arr[3,4] = 5
```python
N = 20
# hint: distance can be calculated as (x**2 + y**2)**0.5
# You can plot the array using
import matplotlib.pyplot as plt
plt.imshow(arr)
plt.colorbar()
```
matplotlib is another library in Python and is commonly used for plotting (drawing) stuff.
## Array operations
First, we should know that arrays can store arbitrary data types as its elements. Elements should have the same data types.
```python
a = np.array([[1,2], [3,4]])
b = np.ones(a.shape) * 2 # element-wise multiplication
c = a + b # element-wise addition
# [[3. 4.]
# [5. 6.]]
d = a - b # element-wise subtraction
# [[-1. 0.]
# [ 1. 2.]]
e = a * b # element-wise multiplication
# [[2. 4.]
# [6. 8.]]
f = a / b # element-wise division
# [[0.5 1.]
# [1.5 2.]]
# If you want true matrix multiplication, you should use
g = np.matmul(a, b)
# [[ 6. 6.]
# [14. 14.]]
```
## Random number generation
Numpy has a ```random``` module for generating random numbers.
```python
import numpy as np
a = np.random.randint(0, 100) # random generate an integer from [0,100)
b = np.random.rand() # random generate a float between 0 and 1
```
Or we can generate an entire random array at once
```python
x = np.random.randint(1, 5, size=(3, 5))
y = np.random.rand(3, 5)
```
### Exercise
Generate an N*N 2D array with a uniform distribution (float) from 0-10. Calculate the probability of values between 0-5.
```python
N = 100
```
## Random Distribution
There are many built-in functions in numpy that can help us generate common probability distribution easily.
### Gaussian distribution (normal distribution)
```python
x = np.random.normal(loc=3, scale=2, size=(10000, ))
# plot the histogram of the distribution
plt.hist(x, bins=100)
```

### Uniform distribution
```python
x = np.random.uniform(low=5, high=20, size=(10000,))
# plot the histogram of the distribution
plt.hist(x, bins=100)
```

### Other distirbution
Just look up the documentation when using them. No need to remember
```python
np.random.binomial()
np.random.poisson()
np.random.rayleigh()
np.random.exponential()
...
```
### Exercise
Assume there is a system ```y = x^2 + 3x + 1```. We sample 100 points of the system (with a x being a normal distribution).
```python
x = # gaussian distribution
y =
# plotting
import matplotlib.pyplot as plt
plt.scatter(x, y)
```
## Data manipulation example
Here, we go through an example on how you may use numpy.
We want to estimate the parameters of a Gaussian distribution by maximizing likelihood estimation.
So this is the PDF of a Gaussian distribution

We want to estimate these parameters from the observed data.

After a lot of math that maximize the likelihood function (so minimize the negative log likelihood function), we come to the conclustion that


1. First, we generate a real Gaussian distribution with a predefiend mean a standard deviation. The Gaussian distribution is added with a little bit white noise
white noise: a Gaussian random variable with mean = 0
```python
def generate_gaussian_dist(mean, std, noise_std, size=(1000, )):
# generate distribution here
white_noise =
x =
x = x + white_noise
return x
```
2. Second, we estimate the parameters of the distribution using MLE (assume the distribution is Gaussian)
```python
def gaussian_MLE(x):
# MLE
return mean, std
```
3. With the estimated parmeters, generate the estimated Gaussian distribution with zero noise
4. plot two distributions together
```python
plt.hist(x, bins=100)
plt.hist(y, bins=100)
```
5. Change step 1 (the oberserved) distribution to a uniform distribution and see what happens.
### Solution
```python
import numpy as np
import matplotlib.pyplot as plt
def generate_noisy_gaussian_dist(mean, std, noise_std, size=(1000, )):
# generate distribution here
white_noise = np.random.normal(loc=0, scale=noise_std, size=size)
x = np.random.normal(loc=mean, scale=std, size=size)
x = x + white_noise
return x
def gaussian_MLE(x):
# mean
sum_ = 0
for value in x:
sum_ += value
mean = sum_ / x.shape[0]
# std
sum_ = 0
for value in x:
sum_ += (value - mean)**2
sum_ /= x.shape[0]
std = sum_ ** 0.5
return mean, std
# generate data
observed_data = generate_noisy_gaussian_dist(5, 3, 1)
# estimating parameters
estimated_mean, estimated_std = gaussian_MLE(observed_data)
estimated_data = generate_noisy_gaussian_dist(estimated_mean, estimated_std, 0)
plt.hist(observed_data, bins=100)
plt.hist(estimated_data, bins=100)
```