###### tags: `Python`,`pandas`,`Standard deviation`
# Python: Standard Deviation and the Normal Distribution with Data Visualization
**Exploring standard deviation**
**NumPy Standard deviation**
```python=
import numpy as np
arr = np.array([1,2,3])
# NumPy functions operate on the whole array at once (vectorization)
print(np.mean(arr))
print(np.std(arr))          # population standard deviation (ddof=0, the default)
print(np.std(arr, ddof=1))  # sample standard deviation
#2.0
#0.816496580927726
#1.0
```
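Both results come from the same sum of squared deviations; only the divisor changes (NumPy divides by $N - \text{ddof}$). Worked out by hand for `arr = [1, 2, 3]`, whose mean is 2:

$$
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2}
\qquad
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2}
$$

$$
\sum_{i=1}^{3}(x_i-2)^2 = 1 + 0 + 1 = 2,
\qquad
\sigma = \sqrt{2/3} \approx 0.8165,
\qquad
s = \sqrt{2/2} = 1
$$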
**Pandas Standard deviation**
```python=
# DataFrame: a two-dimensional table
import pandas as pd
df = pd.DataFrame({'a':[1,2,3]})
print(df)
print(df['a'].describe())
# the std row reported by describe() is the sample standard deviation (ddof=1)
print(df['a'].std(ddof=0))  # population standard deviation
#    a
# 0  1
# 1  2
# 2  3
# count    3.0
# mean     2.0
# std      1.0   ==> sample standard deviation
# min      1.0
# 25%      1.5
# 50%      2.0
# 75%      2.5
# max      3.0
# Name: a, dtype: float64
# 0.816496580927726
```
**NumPy Standard deviation**
=> `np.std` defaults to the population standard deviation (`ddof=0`)
**Pandas Standard deviation**
=> `Series.std` (and the `std` row of `describe()`) defaults to the sample standard deviation (`ddof=1`)
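A minimal sketch that computes both formulas by hand and checks them against each library's default, then shows that passing `ddof` explicitly makes NumPy and pandas agree. `values` is made-up example data:

```python
import numpy as np
import pandas as pd

values = [1, 2, 3, 8]  # illustrative data only

arr = np.array(values)
s = pd.Series(values)

# by hand: sum of squared deviations from the mean
sq_dev = ((arr - arr.mean()) ** 2).sum()
pop_std = np.sqrt(sq_dev / len(arr))           # divide by N     (ddof=0)
sample_std = np.sqrt(sq_dev / (len(arr) - 1))  # divide by N - 1 (ddof=1)

# NumPy's default matches the population formula...
print(np.isclose(np.std(arr), pop_std))           # True
# ...while pandas' default matches the sample formula
print(np.isclose(s.std(), sample_std))            # True
# passing ddof explicitly makes the two libraries agree
print(np.isclose(np.std(arr, ddof=1), s.std()))   # True
print(np.isclose(s.std(ddof=0), np.std(arr)))     # True
```

When mixing the two libraries, stating `ddof` explicitly avoids silently comparing different quantities.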
```python=
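import pandas as pd
df = pd.DataFrame({'x':[1,2,3,8]})
# df = pd.DataFrame({'x':[1,2,3,9]})  # with 9, the box plot shows the outlier directly
# note: for [1,2,3,8], Q1 = 1.75 and Q3 = 4.25, so the upper fence Q3 + 1.5*IQR is exactly 8.0;
# 8 sits on the fence and becomes the whisker end rather than an outlier circle
print(df['x'].describe())
# a box plot shows at a glance whether there are outliers
import matplotlib.pyplot as plt
df.boxplot(column="x")
plt.show()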
```
**As the figure below shows, visualizing the data as a box plot makes outliers quick to spot**
Outliers are drawn as small circles.
![](https://i.imgur.com/lCiIJYq.jpg)
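The circles come from the 1.5 × IQR rule: points below Q1 − 1.5·IQR or above Q3 + 1.5·IQR are flagged. A minimal sketch of that rule in NumPy, assuming matplotlib's default `whis=1.5` and its default linearly interpolated percentiles; `iqr_outliers` is a hypothetical helper name. It shows why `[1, 2, 3, 8]` produces no circle while `[1, 2, 3, 9]` does:

```python
import numpy as np

def iqr_outliers(values, whis=1.5):
    """Return (lower fence, upper fence, outliers) using the whis * IQR rule.

    Hypothetical helper, meant to mirror matplotlib's boxplot defaults
    (whis=1.5, percentiles from np.percentile with linear interpolation).
    """
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    low, high = float(q1 - whis * iqr), float(q3 + whis * iqr)
    # points strictly outside the fences are the ones drawn as circles
    return low, high, x[(x < low) | (x > high)]

for data in ([1, 2, 3, 8], [1, 2, 3, 9]):
    low, high, out = iqr_outliers(data)
    print(data, "fences:", (low, high), "outliers:", out.tolist())
# [1, 2, 3, 8] fences: (-2.0, 8.0) outliers: []
# [1, 2, 3, 9] fences: (-2.375, 8.625) outliers: [9.0]
```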
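The following datasets were specially constructed to show that data with the same summary statistics can look very different.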
```python=
import pandas as pd
import matplotlib.pyplot as plt

# four specially constructed datasets, each with x and y columns
# (adjust the paths to wherever the files are stored)
df1 = pd.read_csv(r'C:\Users\user\Desktop\Data Visualization\dataset1.txt')
df2 = pd.read_csv(r'C:\Users\user\Desktop\Data Visualization\dataset2.txt')
df3 = pd.read_csv(r'C:\Users\user\Desktop\Data Visualization\dataset3.txt')
df4 = pd.read_csv(r'C:\Users\user\Desktop\Data Visualization\dataset4.txt')

# summary statistics of the x column for each dataset
print("************df1************")
print(df1['x'].describe())
print("************df2************")
print(df2['x'].describe())
print("************df3************")
print(df3['x'].describe())
print("************df4************")
print(df4['x'].describe())

# scatter plot of each dataset in a 2x2 grid
plt.subplot(2,2,1)
plt.scatter(df1.x,df1.y)
plt.subplot(2,2,2)
plt.scatter(df2.x,df2.y)
plt.subplot(2,2,3)
plt.scatter(df3.x,df3.y)
plt.subplot(2,2,4)
plt.scatter(df4.x,df4.y)
plt.show()
```
**These constructed datasets share the same mean and std, yet their plots look completely different**
![](https://i.imgur.com/Iz0ILLT.jpg)
![](https://i.imgur.com/DUqpUIT.jpg)
References:
<https://zh.wikipedia.org/zh-tw/%E6%AD%A3%E6%80%81%E5%88%86%E5%B8%83>
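The normal distribution (常態分布; written 正態分布 in mainland China and 正態分佈 in Hong Kong), also called the Gaussian distribution or 正規分布, is a very common continuous probability distribution. It is very important in statistics and is often used in the natural and social sciences to represent a random variable whose distribution is not known.
The probability density function of the normal distribution is bell-shaped, so it is also commonly called the bell curve (after its resemblance to a large temple bell).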
![](https://i.imgur.com/vTrTAdB.jpg)
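The bell curve is the density below, where $\mu$ is the mean and $\sigma$ the standard deviation. Setting $\mu = 0$ and $\sigma = 1$ gives the simplified standard normal formula used in the following code, whose peak value at $x = 0$ is $1/\sqrt{2\pi} \approx 0.3989$:

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},
\qquad
\varphi(x) = f(x)\Big|_{\mu=0,\ \sigma=1} = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}
$$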
```python=
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(start=-10, stop=10, step=0.5)
# simplified standard normal formula (mu = 0, sigma = 1): 1/sqrt(2*pi) * e^(-x^2/2)
y = 1/np.sqrt(2*np.pi) * np.exp(-x**2/2)
plt.scatter(x,y)
plt.show()
```
![](https://i.imgur.com/apu7E8c.jpg)
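The plot above fixes $\mu = 0$ and $\sigma = 1$. A minimal sketch of the general density with both parameters exposed, plus a numerical check that the area under the curve is about 1 (a simple Riemann sum; `normal_pdf` is a hypothetical helper name):

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density with mean mu and standard deviation sigma."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

dx = 0.001
x = np.arange(-20, 20, dx)
y = normal_pdf(x)

print(round(y.max(), 4))         # peak at x = 0: 1/sqrt(2*pi), prints 0.3989
print(round((y * dx).sum(), 4))  # area under the curve, prints 1.0
# a larger sigma lowers and widens the bell, but the area stays 1
print(round((normal_pdf(x, mu=2, sigma=3) * dx).sum(), 4))  # prints 1.0
```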
```python=
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(start=-3, stop=3, step=0.1)
# the same standard normal density, drawn as a continuous line over [-3, 3)
y = 1/np.sqrt(2*np.pi) * np.exp(-x**2/2)
plt.plot(x,y)
plt.show()
```
![](https://i.imgur.com/EDgNXxJ.jpg)
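For the standard normal, x is measured in standard deviations, so the window from −3 to 3 already covers almost all of the area. A minimal sketch using only the standard library, based on the identity P(|Z| < k) = erf(k / √2) for a standard normal Z; `within_k_sigma` is a hypothetical helper name:

```python
import math

def within_k_sigma(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sigma(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```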
**Plotting with the scipy module**
References:
<https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html?highlight=scipy%20stats%20norm%20pdf>
![](https://i.imgur.com/0dZQRTL.jpg)
```python=
import matplotlib.pyplot as plt
import scipy.stats as stats
import numpy as np
mu = 0
std = 1
x = np.arange(-3,3,.1)
y = stats.norm.pdf(x,loc=mu,scale=std)
plt.style.use('dark_background') # pick any style you like; it applies to all later plots
plt.title('Normal Distribution') # title
plt.xlabel('x')
plt.ylabel('y')
plt.plot(x,y)
plt.show()
```
![](https://i.imgur.com/UQYCtbx.jpg)
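In `stats.norm.pdf`, `loc` is the mean and `scale` is the standard deviation. A minimal sketch that checks scipy's output against the density formula written by hand, using arbitrary example parameters `mu = 1.5`, `sigma = 0.5`:

```python
import numpy as np
import scipy.stats as stats

mu, sigma = 1.5, 0.5  # arbitrary example parameters
x = np.arange(-1, 4, 0.1)

# scipy's normal density: loc = mean, scale = standard deviation
y_scipy = stats.norm.pdf(x, loc=mu, scale=sigma)
# the same density written out by hand
y_manual = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

print(np.allclose(y_scipy, y_manual))  # True
# loc/scale shift and stretch the standard normal: pdf((x - loc) / scale) / scale
print(np.allclose(y_scipy, stats.norm.pdf((x - mu) / sigma) / sigma))  # True
```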
**Using matplotlib styles**
References:
<https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html>
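`plt.style.use` changes the style for every plot drawn afterwards. A minimal sketch, assuming a current matplotlib release, that lists the bundled style names and applies a style to a single figure with `plt.style.context`:

```python
import matplotlib.pyplot as plt
import numpy as np

# names of the style sheets bundled with the installed matplotlib
print(sorted(plt.style.available))

x = np.arange(-3, 3, 0.1)
y = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)  # standard normal density

# the style is active only inside the with-block; later plots keep the default
with plt.style.context('dark_background'):
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_title('Normal Distribution')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
plt.show()
```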