---
tags: Python Workshop 沈煒翔
---

# Lesson 8: Plotting

## Line plot

Matplotlib is a library commonly used for plotting figures in Python.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 100)  # Sample data.

plt.figure()
plt.plot(x, x, label='linear')  # Plot some data on the (implicit) axes.
plt.plot(x, x**2, label='quadratic')  # etc.
plt.plot(x, x**3, label='cubic')
plt.xlabel('x label')
plt.ylabel('y label')
plt.title("Simple Plot")
plt.legend()
```

In some environments, you have to call ```plt.show()``` at the end of the script to show the figures.

![](https://i.imgur.com/L9QUUwo.png)

You can control the color, marker, and line style of a line plot with a short format string.

```python
plt.figure()
plt.plot(x, x, 'r.', label='linear')  # color (red) + marker (points), no connecting line
plt.plot(x, x**2, '.-', label='quadratic')  # marker (points) + line style (solid)
```

![](https://i.imgur.com/IemNLDD.png)

## Scatter plot

We can plot a 2D scatter plot.

```python
# Two point clouds drawn from normal distributions with different centers and spreads.
x = np.random.normal(loc=0, scale=1, size=200)
y = np.random.normal(loc=0, scale=1, size=200)
x2 = np.random.normal(loc=3, scale=2, size=200)
y2 = np.random.normal(loc=2, scale=2, size=200)

plt.figure()
plt.scatter(x, y)
plt.scatter(x2, y2)
plt.legend(['a', 'b'])
```

![](https://i.imgur.com/05MgyLK.png)

## Histogram

We can plot a 1D histogram.

```python
x = np.random.normal(loc=10, scale=3, size=5000)

plt.figure()
plt.hist(x, bins=50)  # Count the samples that fall into each of 50 bins.
```

![](https://i.imgur.com/iNf0cEh.png)

## Bar chart

We can plot a bar chart to compare one value per category.

```python
people = ('Tom', 'Dick', 'Harry', 'Slim', 'Jim')
y_pos = np.arange(len(people))
performance = 3 + 10 * np.random.rand(len(people))
error = np.random.rand(len(people))

plt.figure()
plt.barh(y_pos, performance, xerr=error, align='center')  # Horizontal bars with error bars.
plt.yticks(ticks=y_pos, labels=people)
```

![](https://i.imgur.com/EwEM6ro.png)

### Exercise

Plot the curve ```y = x^2 + 3x - 5``` as a line, then plot a second series as points using the same equation with some random noise added.
```python
x = np.linspace(-5, 5, 30)
```

![](https://i.imgur.com/wlZEN8x.png)

### Exercise

Plot three 2D normal distributions.

![](https://i.imgur.com/97aiaYA.png)

Given a query point (e.g. (0, 0)), use the nearest neighbor method to decide which distribution it belongs to.

```python
sample_point = (0, 0)
```

## Data processing

Assume we have a noisy 1D time series. We can apply moving average smoothing to reduce the noise.

```python
x = np.linspace(-5, 5, 100)
y = x**2 + 3*x - 5 + 2*np.random.randn(100)

plt.figure()
plt.plot(x, y, '.-')

y_smoothed = np.zeros(x.shape)
for i in range(5, len(x)-5):
    # Centered window: 5 points on each side of i, plus i itself.
    y_smoothed[i] = np.mean(y[i-5:i+6])
# The first and last 5 points have no full window, so mark them as nan
# (matplotlib skips nan values when drawing).
y_smoothed[:5] = np.nan
y_smoothed[-5:] = np.nan

plt.figure()
plt.plot(x, y, '.-')
plt.plot(x, y_smoothed, '.-')
```

![](https://hackmd.io/_uploads/SJzTLhMvi.png)

![](https://hackmd.io/_uploads/Sy6tPhzDo.png)

### Exercise

Assume we have a 1D time series in which some points are lost and recorded as nan. Replace each nan with the average of the previous and the next value.

```python
x = np.linspace(-5, 5, 100)
y = x**2 + 3*x - 5

y_drop = y.copy()  # Copy, so that dropping points does not also modify y.
for i in range(100):
    if np.random.random() < 0.05:
        y_drop[i] = np.nan

plt.figure()
plt.plot(x, y_drop, '.-')

# YOUR CORRECTION
# HINT: use np.isnan() to detect nan values!

# YOUR PLOTTING
```

Solution:

```python
# YOUR CORRECTION
# HINT: use np.isnan() to detect nan values!
y_corrected = y_drop.copy()
for i in range(len(y_drop)):
    if np.isnan(y_drop[i]):
        # Reconstruct from the previous and next values.
        # The slice is clipped at the array edges, and np.nanmean skips
        # nan entries (including y_drop[i] itself). If every neighbor is
        # also nan, the point stays nan.
        y_corrected[i] = np.nanmean(y_drop[max(i-1, 0):i+2])

# YOUR PLOTTING
plt.figure()
plt.plot(x, y_corrected, '.-')
```

## Optimization

Assume we observe some data and want to model it with a ```y=ax+b``` system.

```python
x = np.linspace(-5, 5, 100)
y = 3*x - 5 + np.random.randn(100)  # True parameters: a = 3, b = -5, plus noise.

plt.figure()
plt.plot(x, y)
```

![](https://i.imgur.com/qf3ngV9.png)

We can find the optimal (a, b) using linear sweeping, that is, by trying every candidate pair on a grid and keeping the one with the smallest error.
```python
optimal_error = 1e20  # Start with a very large error so that any candidate beats it.
optimal_a = 0
optimal_b = 0

for a in np.arange(-10, 10, 0.1):
    for b in np.arange(-10, 10, 0.1):
        y_pred = a*x + b
        error = np.mean((y_pred - y)**2)  # Mean squared error of this candidate.
        if error < optimal_error:
            optimal_a = a
            optimal_b = b
            optimal_error = error

print(optimal_a, optimal_b)
```

Then we can plot it.

```python
y_pred = optimal_a*x + optimal_b

plt.figure()
plt.plot(x, y)
plt.plot(x, y_pred)
```

![](https://i.imgur.com/ywb6a47.png)

Linear sweeping is the most naive approach and takes a lot of time: the sweep above evaluates 200 × 200 = 40,000 candidate pairs. You will learn more advanced optimization techniques (e.g. gradient descent) later in the course.

### Exercise

Assume there is a ```y=ax^2+bx+c``` system. Follow the above methods.

1. Generate the 1D data (with noise)
2. Use linear sweeping to model it with a ```y=ax+b``` system
3. Plot (line plot) both data (observation/prediction)
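
One possible way to sanity-check a linear sweep like the one in the Optimization section is to compare it with a closed-form least-squares fit. The sketch below is not part of the original lesson: it assumes the same ```y=ax+b``` data as above, uses a fixed random seed (an assumption, only to make the run repeatable), and uses NumPy's ```np.polyfit``` as the reference. The names ```sweep_a```, ```sweep_b```, ```fit_a```, and ```fit_b``` are made up for this illustration.

```python
import numpy as np

# Assumption: a fixed seed so that repeated runs give the same data.
rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 100)
y = 3*x - 5 + rng.standard_normal(100)

# Linear sweep, as in the lesson: try every (a, b) on a grid and keep the best.
best_error, sweep_a, sweep_b = np.inf, 0.0, 0.0
for a in np.arange(-10, 10, 0.1):
    for b in np.arange(-10, 10, 0.1):
        error = np.mean((a*x + b - y)**2)
        if error < best_error:
            best_error, sweep_a, sweep_b = error, a, b

# Least-squares fit of a degree-1 polynomial.
# np.polyfit returns coefficients from the highest degree down: [a, b].
fit_a, fit_b = np.polyfit(x, y, 1)

print("sweep:  ", sweep_a, sweep_b)
print("polyfit:", fit_a, fit_b)
```

The two results should agree to within about one grid step (0.1 here), since the sweep can only land on grid points. A large disagreement usually means the sweep range does not cover the true parameters.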