Lesson 7: Numpy

--- tags: Python Workshop 沈煒翔 --- # Lesson 7: Numpy Numpy is a library for managing numbers and arrays in Python. We can import the library into our script via ```python import numpy as np ``` Numpy arrays are like Python list. However, the size of a numpy array is fixed, so no append() or pop(). Also, numpy is around 50x faster than Python lists (built by C++). Numpy has many useful functions that can make everyone's life easier when manipulating numbers/arrays. It is impratical to go through every function but we'll do our best. ## Create a numpy array We can construct a numpy array using lists ```python # 0-D array a = np.array(40) # 1-D array b = np.array([1, 2, 3, 4, 5]) # 2-D array c = np.array([[1,2,3], [2,3,4]]) # list of lists ``` or we can use ones() and zeros() function. (You may notice that some function names are similar to Matlab) ```python d = np.ones((5, 3)) # 2D arrays of shape (5, 3) with all ones e = np.zeros((5, 3, 4)) # 3D arrays of shape (5, 3, 4) with all zeros ``` We can check the dimension using .ndim and the shape using .shape ```python print(d.ndim) # >>> 2 print(d.shape) # >>> (5, 3) print(a.ndim) # >>> 0 print(a.shape) # >>> () print(b.ndim) # >>> 1 print(b.shape) # >>> (5, ) ``` ## Array indexing We use completely the same indexing techniques as Python lists when indexing an array. ```python arr = np.array([[1,2,3,4,5], [6,7,8,9,10]]) print(arr[1][3]) # >>> 9 # This also works print(arr[1, 3]) # list doesn't work ``` ### Exercise Create a 1-D numpy array of length N with its elements of 0, 1, 2, 3, ..., N-1. ```python N = 4 print(arr) # >>> [0 1 2 3] ``` ## Traversing a array For 1-D array, it's simple. We can use a for loop ```python arr = np.arange(4) # Use i as the loop index for i in range(arr.shape[0]): print(arr[i]) # >>> 0 # >>> 1 # >>> 2 # >>> 3 # This also works # We treat the array as an iterable thing for value in arr: print(value) for i, value in enumerate(arr): print(i, value) ``` For 2-D array, we just need a nested for loop ```python arr = np.array([[1, 2, 3], [4, 5, 6]]) for i in range(arr.shape[0]): for j in range(arr.shape[1]): print(arr[i][j]) # >>> would print 1, 2, 3, 4, 5, 6 # This also works for row in arr: for value in row: print(value) ``` People use both methods to traverse an array. Both are good. ### Exercise Make a N*N array with each element's value assigned to the distance to (0,0). So arr[0,0] = 0, arr[1,1]=1.414, arr[3,4] = 5 ```python N = 20 # hint: distance can be calculated as (x**2 + y**2)**0.5 # You can plot the array using import matplotlib.pyplot as plt plt.imshow(arr) plt.colorbar() ``` matplotlib is another library in Python and is commonly used for plotting (drawing) stuff. ## Array operations First, we should know that arrays can store arbitrary data types as its elements. Elements should have the same data types. ```python a = np.array([[1,2], [3,4]]) b = np.ones(a.shape) * 2 # element-wise multiplication c = a + b # element-wise addition # [[3. 4.] # [5. 6.]] d = a - b # element-wise subtraction # [[-1. 0.] # [ 1. 2.]] e = a * b # element-wise multiplication # [[2. 4.] # [6. 8.]] f = a / b # element-wise division # [[0.5 1.] # [1.5 2.]] # If you want true matrix multiplication, you should use g = np.matmul(a, b) # [[ 6. 6.] # [14. 14.]] ``` ## Random number generation Numpy has a ```random``` module for generating random numbers. ```python import numpy as np a = np.random.randint(0, 100) # random generate an integer from [0,100) b = np.random.rand() # random generate a float between 0 and 1 ``` Or we can generate an entire random array at once ```python x = np.random.randint(1, 5, size=(3, 5)) y = np.random.rand(3, 5) ``` ### Exercise Generate an N*N 2D array with a uniform distribution (float) from 0-10. Calculate the probability of values between 0-5. ```python N = 100 ``` ## Random Distribution There are many built-in functions in numpy that can help us generate common probability distribution easily. ### Gaussian distribution (normal distribution) ```python x = np.random.normal(loc=3, scale=2, size=(10000, )) # plot the histogram of the distribution plt.hist(x, bins=100) ``` ![](https://i.imgur.com/eRaGMRE.png) ### Uniform distribution ```python x = np.random.uniform(low=5, high=20, size=(10000,)) # plot the histogram of the distribution plt.hist(x, bins=100) ``` ![](https://i.imgur.com/09Uozfk.png) ### Other distirbution Just look up the documentation when using them. No need to remember ```python np.random.binomial() np.random.poisson() np.random.rayleigh() np.random.exponential() ... ``` ### Exercise Assume there is a system ```y = x^2 + 3x + 1```. We sample 100 points of the system (with a x being a normal distribution). ```python x = # gaussian distribution y = # plotting import matplotlib.pyplot as plt plt.scatter(x, y) ``` ## Data manipulation example Here, we go through an example on how you may use numpy. We want to estimate the parameters of a Gaussian distribution by maximizing likelihood estimation. So this is the PDF of a Gaussian distribution ![](https://i.imgur.com/zlbkKSw.png) We want to estimate these parameters from the observed data. ![](https://i.imgur.com/4GOYFAC.png) After a lot of math that maximize the likelihood function (so minimize the negative log likelihood function), we come to the conclustion that ![](https://i.imgur.com/Zz2zCfy.png) ![](https://i.imgur.com/vql9aYx.png) 1. First, we generate a real Gaussian distribution with a predefiend mean a standard deviation. The Gaussian distribution is added with a little bit white noise white noise: a Gaussian random variable with mean = 0 ```python def generate_gaussian_dist(mean, std, noise_std, size=(1000, )): # generate distribution here white_noise = x = x = x + white_noise return x ``` 2. Second, we estimate the parameters of the distribution using MLE (assume the distribution is Gaussian) ```python def gaussian_MLE(x): # MLE return mean, std ``` 3. With the estimated parmeters, generate the estimated Gaussian distribution with zero noise 4. plot two distributions together ```python plt.hist(x, bins=100) plt.hist(y, bins=100) ``` 5. Change step 1 (the oberserved) distribution to a uniform distribution and see what happens. ### Solution ```python import numpy as np import matplotlib.pyplot as plt def generate_noisy_gaussian_dist(mean, std, noise_std, size=(1000, )): # generate distribution here white_noise = np.random.normal(loc=0, scale=noise_std, size=size) x = np.random.normal(loc=mean, scale=std, size=size) x = x + white_noise return x def gaussian_MLE(x): # mean sum_ = 0 for value in x: sum_ += value mean = sum_ / x.shape[0] # std sum_ = 0 for value in x: sum_ += (value - mean)**2 sum_ /= x.shape[0] std = sum_ ** 0.5 return mean, std # generate data observed_data = generate_noisy_gaussian_dist(5, 3, 1) # estimating parameters estimated_mean, estimated_std = gaussian_MLE(observed_data) estimated_data = generate_noisy_gaussian_dist(estimated_mean, estimated_std, 0) plt.hist(observed_data, bins=100) plt.hist(estimated_data, bins=100) ```