###### tags: `Python`,`pandas`,`Standard deviation`

# Python: Normal Distribution and Standard Deviation with Data Visualization

**Exploring standard deviation**

**NumPy standard deviation**

```python
import numpy as np

arr = np.array([1, 2, 3])   # NumPy array, supports vectorized operations
print(np.mean(arr))
print(np.std(arr))          # population standard deviation (default, ddof=0)
print(np.std(arr, ddof=1))  # sample standard deviation
# 2.0
# 0.816496580927726
# 1.0
```

**Pandas standard deviation**

```python
# DataFrame: a two-dimensional table
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
print(df)
print(df['a'].describe())   # the std reported by describe() is the sample standard deviation
print(df['a'].std(ddof=0))  # population standard deviation
#    a
# 0  1
# 1  2
# 2  3
# count    3.0
# mean     2.0
# std      1.0   ==> sample standard deviation
# min      1.0
# 25%      1.5
# 50%      2.0
# 75%      2.5
# max      3.0
# Name: a, dtype: float64
# 0.816496580927726
```

**NumPy standard deviation**: `std` defaults to the population standard deviation (`ddof=0`).

**Pandas standard deviation**: `std` defaults to the sample standard deviation (`ddof=1`).

```python
import pandas as pd
import matplotlib.pyplot as plt

# Note: 8 sits exactly on the upper whisker limit (Q3 + 1.5 * IQR = 4.25 + 3.75 = 8.0),
# so it is not drawn as an outlier.
df = pd.DataFrame({'x': [1, 2, 3, 8]})
# df = pd.DataFrame({'x': [1, 2, 3, 9]})  # with 9, the box plot shows the outlier directly
print(df['x'].describe())

# A box plot quickly shows whether the data contains outliers
df.boxplot(column="x")
plt.show()
```

**As the figure below shows, a box plot makes outliers easy to spot.** Each outlier is drawn as a small circle.

![](https://i.imgur.com/lCiIJYq.jpg)

The data below was specially designed for this demonstration.

```python
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.read_csv(r'C:\Users\user\Desktop\Data Visualization\dataset1.txt')
df2 = pd.read_csv(r'C:\Users\user\Desktop\Data Visualization\dataset2.txt')
df3 = pd.read_csv(r'C:\Users\user\Desktop\Data Visualization\dataset3.txt')
df4 = pd.read_csv(r'C:\Users\user\Desktop\Data Visualization\dataset4.txt')

# Summary statistics of column x for each dataset
print("************df1************")
print(df1['x'].describe())
print("************df2************")
print(df2['x'].describe())
print("************df3************")
print(df3['x'].describe())
print("************df4************")
print(df4['x'].describe())

# One scatter plot per dataset on a 2x2 grid
plt.subplot(2, 2, 1)
plt.scatter(df1.x, df1.y)
plt.subplot(2, 2, 2)
plt.scatter(df2.x, df2.y)
plt.subplot(2, 2, 3)
plt.scatter(df3.x, df3.y)
plt.subplot(2, 2, 4)
plt.scatter(df4.x, df4.y)
plt.show()
```

**These designed datasets share the same mean and standard deviation, yet they look completely different once plotted.**

![](https://i.imgur.com/Iz0ILLT.jpg)

![](https://i.imgur.com/DUqpUIT.jpg)

References:
<https://zh.wikipedia.org/zh-tw/%E6%AD%A3%E6%80%81%E5%88%86%E5%B8%83>

The normal distribution (常態分布; 正態分布 in mainland China, 正態分佈 in Hong Kong), also known as the Gaussian distribution, is a very common continuous probability distribution. It is of great importance in statistics and is often used in the natural and social sciences to represent a random variable whose distribution is unknown.

The probability density function of the normal distribution is bell-shaped, so it is often called the bell curve (it resembles a large temple bell, hence the name).

![](https://i.imgur.com/vTrTAdB.jpg)

```python
import matplotlib.pyplot as plt
import numpy as np
import math

x = np.arange(start=-10, stop=10, step=0.5)
# Simplified formula for the standard normal distribution (mean 0, standard deviation 1)
y = 1 / math.sqrt(2 * math.pi) * math.e ** (-1 / 2 * x ** 2)
plt.scatter(x, y)
plt.show()
```

![](https://i.imgur.com/apu7E8c.jpg)

```python
import matplotlib.pyplot as plt
import numpy as np
import math

# Narrower range and finer step, drawn as a continuous line
x = np.arange(start=-3, stop=3, step=0.1)
y = 1 / (math.sqrt(2 * math.pi)) * math.e ** (-x ** 2 / 2)
# y1 = 1 / math.sqrt(2 * math.pi) * math.e ** (-1 / 2 * x ** 2)  # equivalent form
plt.plot(x, y)
# plt.plot(x, y1)
plt.show()
```

![](https://i.imgur.com/EDgNXxJ.jpg)

**Plotting with the scipy module**

References:
<https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html?highlight=scipy%20stats%20norm%20pdf>

![](https://i.imgur.com/0dZQRTL.jpg)

```python
import matplotlib.pyplot as plt
import scipy.stats as stats
import numpy as np

mu = 0   # mean
std = 1  # standard deviation
x = np.arange(-3, 3, .1)
y = stats.norm.pdf(x, loc=mu, scale=std)

plt.style.use('dark_background')  # pick any style you like
plt.title('Normal Distribution')  # plot title
plt.xlabel('x')
plt.ylabel('y')
plt.plot(x, y)
plt.show()
```

![](https://i.imgur.com/UQYCtbx.jpg)

**Using matplotlib styles**

References:
<https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html>
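
As a small sketch of how the style sheets in the reference above can be explored: `plt.style.available` lists the style names that `plt.style.use` accepts, and `plt.style.context` applies a style only inside a `with` block instead of globally. The `Agg` backend and the helper name `plot_standard_normal` are assumptions made for illustration and are not part of the original note.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_standard_normal(style_name):
    """Hypothetical helper: draw the standard normal PDF with one style sheet."""
    x = np.arange(-3, 3, 0.1)
    # Same simplified standard normal formula used earlier in this note
    y = 1 / np.sqrt(2 * np.pi) * np.exp(-x ** 2 / 2)
    # style.context applies the style temporarily, leaving global settings untouched
    with plt.style.context(style_name):
        fig, ax = plt.subplots()
        ax.plot(x, y)
        ax.set_title(f"Normal Distribution ({style_name})")
        ax.set_xlabel("x")
        ax.set_ylabel("y")
    return fig


print(plt.style.available)  # style names installed with this matplotlib version
fig = plot_standard_normal("dark_background")
```

Calling `fig.savefig(...)` or, with an interactive backend, `plt.show()` would then display the result; swapping `"dark_background"` for another name from `plt.style.available` is a quick way to compare styles.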