08_Data Visualization

matplotlib是Python中廣泛被運用的函示庫
可製作長條圖、折線圖、散點圖

安裝matplotlib

$ pip install matplotlib

繪製折線圖

from matplotlib import pyplot as plt

# [1950, 1960,..., 2010]
years = range(1950, 2011, 10)
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

# 建立一個折線圖，x軸是年份，y軸是GDP
# 也可用速寫法plt.plot(years, gdp, 'b-', marker='o')
plt.plot(years, gdp, color='blue', marker='o', linestyle='solid') 
# 加入圖表標題
plt.title('Nominal GDP')
# 加入x軸標籤
plt.xlabel('Year')
# 加入y軸標籤
plt.ylabel('Billion of $')
# 也可以用plt.savefig(fname)存成圖檔
plt.show()

在matplotlib輸入中文字串時會出現亂碼，原因為此模組中沒有支援中文的字型，解決方式可以參考這篇文章: https://medium.com/marketingdatascience/解決python-3-matplotlib與seaborn視覺化套件中文顯示問題-f7b3773a889b

繪製多個資料序列的折線圖

from matplotlib import pyplot as plt
# [1, 2, 4, 8, 16, 32, 64, 128, 256]
variance = [2**n for n in range(9)]
bias_squared = sorted(variance)
total_error = [x + y for x, y in zip(variance, bias_squared)]
xs = range(len(variance))

## 透過呼叫多次plt.plot來繪製多個資料序列的折線圖
plt.plot(xs, variance, 'g-', label='variance') # 綠色實線
plt.plot(xs, bias_squared, 'r-.', label='bias_squared') # 紅色點虛線
plt.plot(xs, total_error, 'b:', label='total error') # 藍色點線
# 將圖例說明設定在圖表中上位置處(loc=9)
plt.legend(loc=9)

plt.xlabel('model complexity')
# 把x軸的資料點拿掉
plt.xtick([])
plt.title('The Bias_Variance Tradeoff')
plt.show()

繪製長條圖

from matplotlib import pyplot as plt

movies = ['Star Wars', 'The Avenger', 'Kingdom', 'Lord of the rings', 'Game of Thrones']

rates = [8, 7, 4, 8, 9]

plt.bar(movies, ratess, color='blue')
plt.title('My Favorate Movies')
plt.xlabel('Movie')
plt.ylabel('Rate')
plt.show()

以數值為x坐標軸的長條圖

from matplotlib import pyplot as plt
from collections import Counter

grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]
# 以10為級距歸納資料: 83 -> 80, 67 -> 60
# 100要歸在90的級距內，因此使用min(n // 10 * 10, 90)做轉換
# Counter函數可以直接計算串列中數值出現的數字匯出一個字典
# histogram = {0: 2, 60: 1, ....}
histogram = Counter(min(n // 10 * 10, 90) for n in grades)

# x坐標軸需要位移5，讓各個bar落在正確的區間0~10, 10~20, ...
# plt.bar預設的bar寬度為0.8，將寬度拉寬為10橫跨整個區間
# 修改bar的邊界顏色讓bar之間清楚分割edgecolor='black'
plt.bar([x + 5 for x in histogram.keys()], histogram.values), width='10', edgecolor='black')
# 將x軸資料點名稱設為0, 10, 20, ...100
plt.xticks([10 * n for n in range(11)])
# 將x軸的數值範圍設為-5 ~ 105, y軸設為0~5
plt.axis([-5, 105, 0, 5])
plt.xlabel('Decile')
plt.ylabel('Number of Students')
plt.title('Distribution of Exam 1 Grades')
plt.show()

散點圖

from matplotlib import pyplot

friends = [70, 65, 72, 63, 71, 64, 60, 64, 67]
minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

plt.scatter(friends, minutes)
# 為每個資料點寫上標籤
for label, f, m in zip(labels, friends, minutes):
    # 預設xytext=(x, y)即標籤會與資料點重疊，因此這裡設為(5, -5)
    # textcoords設為offset points代表以資料點為依據位移多少字元
    plt.annotate(label, xy=(f, m), xytext=(5, -5), textcoords='offset points')

plt.title('Daily Minutes vs. Number of Friends')
plt.xlabel('Number of Friends')
plt.ylabel('Daily minutes spend on the sites')

plt.show()

08_Data Visualization

安裝matplotlib

繪製折線圖

繪製多個資料序列的折線圖

繪製長條圖

以數值為x坐標軸的長條圖

散點圖

Read more

RNA-Seq overivew slide

RNA-Seq Analysis

三角函數

FASTAQ format