# a. Data Science - NumPy Basics
###### tags: `Data Science Tokyo`

## 1. Libraries
* **Magic commands (NumPy and Matplotlib)**
```python=+
# Display floating-point output to 3 decimal places in this notebook
%precision 3
# Render plots inline instead of opening a separate window
%matplotlib inline
```
##
* **Importing the libraries for this chapter**
```python=+
# Import these in advance; the examples below rely on them
import numpy as np
import numpy.random as ran
import scipy as sp
import pandas as pd
from pandas import Series, DataFrame

# Visualization libraries
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
%matplotlib inline

# Display to 3 decimal places
%precision 3
```

## 2. NumPy basics
* **Working with arrays**
```python=+
# Create an array
data = np.array([9,2,3,4,10,6,7,8,1,5])
data
```
> ```array([ 9, 2, 3, 4, 10, 6, 7, 8, 1, 5])```
##
* **Data type**
```python=+
# Data type of the array elements (the default integer type depends on the platform)
data.dtype
```
> ```dtype('int32')```
##
* **Dimensions and number of elements**
```python=+
print("Dimensions:", data.ndim)
print("Number of elements:", data.size)
```
> ```Dimensions: 1 ```
> ```Number of elements: 10```
##
* **Element-wise operations**
```python=+
# Multiply every element by a constant (2 in this example)
data*2
```
> ```array([18, 4, 6, 8, 20, 12, 14, 16, 2, 10])```
##
* **Element-wise operations (continued)**
```python=+
# Operations between two arrays of the same shape are applied element by element
print("Multiplication:", np.array([1,2,3,4,5,6,7,8,9,10]) * np.array([10,9,8,7,6,5,4,3,2,1]))
print("Power:", np.array([1,2,3,4,5,6,7,8,9,10]) ** 2)
print("Division:", np.array([1,2,3,4,5,6,7,8,9,10]) / np.array([10,9,8,7,6,5,4,3,2,1]))
```
> ```Multiplication: [10 18 24 28 30 30 28 24 18 10]```
> ```Power: [ 1 4 9 16 25 36 49 64 81 100]```
> ```Division: [ 0.1 0.222 0.375 0.571 0.833 1.2 1.75 2.667 4.5 10. 
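]```
##
* **Sorting**
```python=+
print("Before sorting:", data)
# sort() sorts the array in place, in ascending order
data.sort()
print("After sorting:", data)
```
> ```Before sorting: [ 9 2 3 4 10 6 7 8 1 5]```
> ```After sorting: [ 1 2 3 4 5 6 7 8 9 10]```
##
* **Sorting (continued)**
```python=+
# data[::-1] is a reversed view that shares memory with data;
# sorting that view in place leaves data in descending order
data[::-1].sort()
print('After sorting:', data)
```
> ```After sorting: [10 9 8 7 6 5 4 3 2 1]```
##
* **Minimum, maximum, sum, and cumulative sum**
```python=+
# Minimum
print("Min:", data.min())
# Maximum
print("Max:", data.max())
# Sum
print("Sum:", data.sum())
# Cumulative sum
print("CumSum:", data.cumsum())
# Cumulative ratio
print("Ratio:", data.cumsum() / data.sum())
```
> ```Min: 1```
> ```Max: 10```
> ```Sum: 55```
> ```CumSum: [10 19 27 34 40 45 49 52 54 55]```
> ```Ratio: [0.182 0.345 0.491 0.618 0.727 0.818 0.891 0.945 0.982 1. ]```

## 3. Random numbers
* **Generating random numbers**
> **Python has its own random-number facilities, but data science work usually relies on NumPy's.**
```python=+
import numpy.random as ran
# Fix the seed so the results are reproducible
ran.seed(0)
# Generate 10 random numbers from the standard normal distribution (mean 0, standard deviation 1)
rand_data = ran.randn(10)
print('Array of 10 random numbers:', rand_data)
```
> ```Array of 10 random numbers: [ 1.764 0.4 0.979 2.241 1.868 -0.977 0.95 -0.151 -0.103 0.411]```
##
* **Random sampling from data**
```python=+
data = np.array([9,2,3,4,10,6,7,8,1,5])
# Draw 10 values (duplicates allowed, sampling with replacement)
print(ran.choice(data,10))
# Draw 10 values (no duplicates, sampling without replacement)
print(ran.choice(data,10,replace=False))
```
> ```[ 7 8 8 1 2 6 5 1 5 10]```
> ```[10 2 7 8 3 1 6 5 9 4]```
##
* **★ NumPy is very fast**
```python=+
# Number of samples: 10**6 (one million); writing 10*6 would give only 60
n = 10**6
# Plain Python list
normal_data = [ran.random() for _ in range(n)]
# NumPy version of the same data
numpy_random_data = np.array(normal_data)

# Timing: sum
# Plain Python
%timeit sum(normal_data)
# NumPy
%timeit np.sum(numpy_random_data)
```
> **Note:** the timings recorded below come from a run with only 60 elements (`n = 10*6`), where call overhead dominates and the built-in `sum` wins. With `n = 10**6`, `np.sum` is typically much faster.

> ```The slowest run took 5.40 times longer than the fastest. This could mean that an intermediate result is being cached. 1000000 loops, best of 3: 422 ns per loop```
> ```The slowest run took 120.44 times longer than the fastest. 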
This could mean that an intermediate result is being cached. 100000 loops, best of 3: 4.14 µs per loop```

## 4. Matrices
* **Matrices**
```python=+
# A 1-D array of the integers 0 through 8
np.arange(9)
```
> ```array([0, 1, 2, 3, 4, 5, 6, 7, 8])```
##
* **Matrix basics**
```python=+
# Reshape the 9 elements into a 3x3 matrix
array1 = np.arange(9).reshape(3,3)
print(array1)
```
> ```[[0 1 2]```
> ```[3 4 5]```
> ```[6 7 8]]```
##
* **Extracting arrays from a matrix**
```python=+
print(array1[0,:]) # first row, all columns
array1[:,0] # all rows, first column
```
> ```[0 1 2]```
> ```array([0, 3, 6])```
##
* **Matrix operations**
```python=+
array2 = np.arange(9,18).reshape(3,3)
np.dot(array1,array2) # matrix product
```
> ```array([[ 42, 45, 48],```
> ```[150, 162, 174],```
> ```[258, 279, 300]])```
##
* **Matrix operations (continued)**
```python=+
array1 * array2 # element-wise multiplication
```
> ```array([[ 0, 10, 22],```
> ```[ 36, 52, 70],```
> ```[ 90, 112, 136]])```
##
* **Creating matrices filled with 0 or 1**
```python=+
print(np.zeros((2,3),dtype = np.int64))
print(np.ones((2,3),dtype = np.float64))
```
> ```[[0 0 0]```
> ``` [0 0 0]]```
> ```[[1. 1. 1.]```
> ``` [1. 1. 1.]]```

## Timestamps
> [name=ZEOxO][time=Mon, Mar 8, 2021 17:50 PM][color=#907bf7]
> [name=ZEOxO][time=Mon, Mar 7, 2022 11:09 PM][color=#907bf7]
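
##
* **Supplement - timing `sum` against `np.sum` outside a notebook**

The speed comparison in section 3 depends heavily on array size, and `%timeit` only works inside IPython. The block below is a minimal sketch of the same comparison using the standard-library `timeit` module instead. It assumes one million elements and 10 repetitions per measurement; the names `py_time`, `np_time`, `py_total`, and `np_total` are illustrative and not part of the original notes. Actual timings vary by machine, so none are shown here.

```python
import timeit

import numpy as np
import numpy.random as ran

# Assumption: one million samples, large enough for NumPy's vectorized sum to pay off
n = 10**6

# Fix the seed so the generated data is reproducible
ran.seed(0)
numpy_random_data = ran.random(n)          # NumPy array of floats in [0, 1)
normal_data = numpy_random_data.tolist()   # the same values as a plain Python list

# Total time in seconds for 10 runs of each approach
py_time = timeit.timeit(lambda: sum(normal_data), number=10)
np_time = timeit.timeit(lambda: np.sum(numpy_random_data), number=10)

# Both approaches should agree on the result up to floating-point rounding
py_total = sum(normal_data)
np_total = float(np.sum(numpy_random_data))

print("Python sum, 10 runs (s):", py_time)
print("NumPy sum, 10 runs (s):", np_time)
print("Totals:", py_total, np_total)
```

Building the list from the array with `tolist()` keeps both containers holding identical values, so any difference in time comes from how the sum is computed rather than from the data.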