# Week 1 Data visualization

## What would you like to show?

1. Comparison: Bar chart; Box plot
2. Distribution: Histogram; Box plot
3. Composition: Pie chart; Stacked bar chart
4. Relationship: Scatter plot (bubble chart); Heat map

> * Histogram: how the data are distributed
> * Scatter plot: the relationship between two variables
> * Bubble plot: the relationship among three variables

## Using ggplot2 package

Things ggplot2 <span class="red">**cannot**</span> do:

1. 3D graphics
2. Graph-theory type graphs (nodes/edges layout)
3. Interactive graphics

The building blocks of a ggplot2 graph include:

1. data; aesthetic mapping; geometric object
2. statistical transformations; scales
3. coordinate system; position adjustments
4. faceting

### Basic structure: aesthetics + geometric objects

Aesthetic parameters: aes()

* position
* color (outside)
* fill (inside color)
* shape
* linetype
* size

Geometric objects: geom_()

* geom_point()
* geom_line()
* geom_boxplot()
* geom_histogram()
* geom_bar()
* geom_smooth()
* geom_raster()

ggplot(): prepares the canvas

## Using data: landdata.csv

Read the file

```r
library(ggplot2)
housing <- read.csv("path/to/landdata.csv")  # replace with the file's location
head(housing)
```

## Histogram

```r
# 1. Base R
hist(housing$Home.Value)
# 2. ggplot2
ggplot(housing, aes(x = Home.Value)) +
  geom_histogram()
```

![#1](https://i.imgur.com/dU9y7Ps.png)
![#1](https://i.imgur.com/5HrHjnL.png)

---

## More complex graphs

Traditional plot():

```r
plot(Home.Value ~ Date, data = subset(housing, State == "MA"), type = "l")
lines(Home.Value ~ Date, col = "red", data = subset(housing, State == "TX"))
legend(1975, 400000, c("MA", "TX"), title = "State",
       col = c("black", "red"), pch = c(1, 1))
```

![](https://i.imgur.com/kwe1hVB.png)

subset(): queries a data frame and extracts the rows that match a condition

By ggplot():

```r
data <- subset(housing, State %in% c("MA", "TX"))
ggplot(data, aes(x = Date, y = Home.Value, color = State)) +
  geom_line()
```

![](https://i.imgur.com/e3qVXT6.png)

> Note on the %in% operator:
> it checks whether each element on the left appears in the set on the right, returning TRUE if it does and FALSE if it does not

## Points (Scatter Plot)

```r
hp2001Q1 <- subset(housing, Date == 2001.25)
ggplot(hp2001Q1, aes(y = Structure.Cost, x = Land.Value)) +
  geom_point()
ggplot(hp2001Q1, aes(y = Structure.Cost, x = log(Land.Value))) +
  geom_point()
```

![](https://i.imgur.com/EqTkgGu.png)
![](https://i.imgur.com/LOP6HJV.png)
(log-transformed)

---

## Lines

```r
# Fit the regression line (intercept & coefficient)
model <- lm(Structure.Cost ~ log(Land.Value), data = hp2001Q1)
# Predicted values
hp2001Q1$pred.SC <- predict(model)
p1 <- ggplot(hp2001Q1, aes(x = log(Land.Value), y = Structure.Cost))
p1 + geom_point(aes(color = Home.Value)) +
  geom_line(aes(y = pred.SC))
```

![](https://i.imgur.com/rMeOlJM.png)

---

## Smoothers (trend lines)

```r
p1 + geom_point(aes(color = Home.Value)) +
  geom_smooth()
```

![](https://i.imgur.com/RWBL0BZ.png)

---

## Aesthetic Mapping vs. Assignment (scatter plot)

```r
p1 + geom_point(size = 2, color = "red")
```

![](https://i.imgur.com/RqZScyH.png)

```r
p1 + geom_point(aes(color = Home.Value, shape = region))
```

![](https://i.imgur.com/nNGF6Gw.png)

```r
p1 + geom_point(aes(size = Home.Value, color = region))
```

![](https://i.imgur.com/425jRhr.png)

---

## Bar chart: geom_bar()

```r
ggplot(housing, aes(x = region)) +
  geom_bar()
```

![](https://i.imgur.com/B2WKcC4.png)

### Stacked Bar Chart

```r
ggplot(housing, aes(x = Year, fill = region)) +
  geom_bar() +
  labs(title = "Stacked Bar Chart", x = "YEAR", y = "Counts")
```

![](https://i.imgur.com/vx0EvXs.png)

#### aggregate(data to summarize, grouping variable, function to apply (mean, sum, etc.))

```r
housing.sum <- aggregate(housing["Home.Value"], housing["State"], FUN = mean)
# stat = 'identity' is required so that bar heights come from the y values
# instead of row counts
ggplot(housing.sum, aes(x = State, y = Home.Value)) +
  geom_bar(stat = 'identity')
```

![](https://i.imgur.com/Pl0ess8.png)

---

## Pie chart

In ggplot2, a pie chart is a bar chart whose y axis has been converted to polar coordinates. Leave the x axis empty and use fill color to distinguish the regions.

```r
# Count the rows in each region
housing2.sum <- aggregate(housing["Home.Value"], housing["region"], FUN = length)
# Ordinary bar chart of the counts
ggplot(housing2.sum, aes(x = region, y = Home.Value)) +
  geom_bar(stat = 'identity') +
  labs(y = "Counts")
# Same counts as a single stacked bar, wrapped into a pie
ggplot(housing2.sum, aes(x = "", y = Home.Value, fill = factor(region))) +
  geom_bar(stat = 'identity', width = 1) +
  coord_polar(theta = "y", start = 0)
```

![](https://i.imgur.com/og1GQ8K.png)
![](https://i.imgur.com/7Ft5uR1.png)

---

## Box plot

```r
ggplot(housing, aes(x = region, y = Home.Value)) +
  geom_boxplot(fill = "red") +
  scale_y_continuous("home value", breaks = seq(0, 800000, by = 100000))
```

![](https://i.imgur.com/9FJ8pQx.png)

---

## Heat map

```r
ggplot(housing, aes(x = Year, y = Qrtr)) +
  geom_raster(aes(fill = Home.Value)) +
  scale_fill_continuous(name = "Value",
                        breaks = c(200000, 500000, 800000),
                        labels = c("'200", "'500", "'800"),
                        low = 'gray', high = 'red')
```

![](https://i.imgur.com/OIuIVln.png)
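
If housing holds one row per state per quarter (as the State subsets earlier suggest), several rows share each (Year, Qrtr) cell in the heat map, and geom_raster() draws them on top of one another. One possible way around this, reusing the aggregate() pattern from the bar chart section, is to average Home.Value per cell first. The sketch below is only an illustration: `toy` is a made-up stand-in for the real data, `cell.mean` is a hypothetical name, and scale_fill_gradient() is used in place of scale_fill_continuous().

```r
library(ggplot2)

# Hypothetical stand-in for housing: 2 states x 2 years x 4 quarters.
toy <- expand.grid(State = c("MA", "TX"), Year = 2000:2001, Qrtr = 1:4)
toy$Home.Value <- ifelse(toy$State == "MA", 300000, 100000) + 10000 * toy$Qrtr

# Average across states so each (Year, Qrtr) cell has exactly one value.
cell.mean <- aggregate(Home.Value ~ Year + Qrtr, data = toy, FUN = mean)

# One tile per cell, shaded by the mean home value.
p <- ggplot(cell.mean, aes(x = Year, y = Qrtr)) +
  geom_raster(aes(fill = Home.Value)) +
  scale_fill_gradient(name = "Mean value", low = "gray", high = "red")
print(p)
```

With the real data, replacing `toy` with `housing` should give one tile per quarter; whether the mean is the right summary (rather than, say, the median) depends on what the chart is meant to show.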