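Data Mining
A scatterplot matrix (pair plot) shows the distribution of each individual variable and the pairwise relationships between variables at the same time, which makes it a great way to spot trends for follow-up analysis.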
In this notebook we will explore making pair plots in Python using the seaborn visualization library. We'll start with the default sns.pairplot.
Let's use the entire dataset to create a simple, yet useful plot.
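We can see that life_exp and gdp_per_cap are positively correlated, which suggests that people in higher-income countries tend to live somewhat longer (although this does not establish a causal relationship). The plot also shows that life expectancy worldwide has gradually increased over time. From the histograms, we can see that the population and GDP variables are heavily right-skewed. (Right-skewed: the right tail is longer, and the mean is greater than the median.)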
In order to better understand the data, we can color the pair plot using a categorical variable and the hue keyword. First, we will color the plots by the continent.
We can also see that the distributions of pop and gdp_per_cap are heavily skewed to the right. To better represent the data, we can take the log transform of those columns.
Now we can see that Oceania and Europe tend to have the highest life expectancy, while Asia has the largest population.
Code source: https://github.com/WillKoehrsen/Data-Analysis/blob/master/pairplots/Pair%20Plots.ipynb