---
# System prepended metadata

title: Week 1 Data visualization

---

# Week 1 Data visualization

## What would you like to show?
1. Comparison: bar chart, box plot
2. Distribution: histogram, box plot
3. Composition: pie chart, stacked bar chart
4. Relationship: scatter plot (bubble chart), heat map

> * Histogram: shows how the data are distributed
> * Scatter plot: shows the relationship between two variables
> * Bubble plot: shows the relationship among three variables

## Using ggplot2 package

Things ggplot <span class="red">**cannot**</span> do:
1. 3D graphics
2. Graph-theory graphs: node/edge network layouts
3. Interactive graphics

The building blocks of a ggplot graph include:
1. data; aesthetic mapping; geometric object
2. statistical transformations; scales
3. coordinate system; position adjustments
4. faceting

### Basic structure:

aesthetics + geometric objects

Aesthetic parameters:

aes()
* position
* color (outside)
* fill (inside color)
* shape
* linetype
* size

geometric objects: geom_()

* geom_point()
* geom_line()
* geom_boxplot()
* geom_histogram()
* geom_bar()
* geom_smooth()
* geom_raster()

ggplot(): prepares the canvas
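
The aesthetics + geometric objects structure can be sketched on a small made-up data frame. The `toy` data and its columns `x`, `y`, and `grp` are assumptions for illustration only, not part of landdata.csv:

```r
library(ggplot2)

# Hypothetical toy data, only for illustration
toy <- data.frame(
  x   = c(1, 2, 3, 4, 5, 6),
  y   = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2),
  grp = c("a", "a", "a", "b", "b", "b")
)

# ggplot() alone only prepares the canvas: no layers are drawn yet
canvas <- ggplot(toy, aes(x = x, y = y))

# Adding a geometric object draws the data; aes() inside the geom
# maps the grp column to point color
p <- canvas + geom_point(aes(color = grp))
p
```

Printing `canvas` by itself should show axes with no points, which is one way to see that the data only appear once a `geom_*()` layer is added.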

## Using data: landdata.csv
Read the file:
```r
library(ggplot2)
housing <- read.csv("path/to/landdata.csv")  # replace with the actual file location
head(housing) 
```

## Histogram
```r
# 1. Base R graphics
hist(housing$Home.Value)
# 2. ggplot2
ggplot(housing, aes(x = Home.Value)) + geom_histogram()
```
![#1](https://i.imgur.com/dU9y7Ps.png)
![#2](https://i.imgur.com/5HrHjnL.png)
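
By default `geom_histogram()` uses 30 bins and prints a message suggesting a better value. A minimal sketch of setting the bins explicitly, using simulated values (the `toy` data frame is an assumption, not landdata.csv):

```r
library(ggplot2)

set.seed(1)
# Hypothetical home values, simulated only for illustration
toy <- data.frame(value = rlnorm(500, meanlog = 12, sdlog = 0.4))

# Explicit number of bins
p_bins <- ggplot(toy, aes(x = value)) + geom_histogram(bins = 20)

# Or an explicit bin width, in the units of x
p_width <- ggplot(toy, aes(x = value)) + geom_histogram(binwidth = 25000)
```

Which of the two is more convenient likely depends on whether the units of the variable are meaningful to the reader; a bin width of 25,000 is easier to describe for a dollar amount than "20 bins".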

---

## More complex graphs
Traditional plot():
```r
plot(Home.Value ~ Date, data=subset(housing, State == "MA"), type="l")
lines(Home.Value ~ Date, col="red", data=subset(housing, State == "TX"))
legend(1975, 400000, c("MA", "TX"), title="State", col=c("black", "red"), pch=c(1,1))
```
![](https://i.imgur.com/kwe1hVB.png)

subset():
queries a data frame and extracts the rows that match a condition
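
A small sketch of `subset()` on a made-up table (the `toy` data frame is hypothetical and only mimics a few columns of the housing data):

```r
# Hypothetical toy data standing in for the housing table
toy <- data.frame(
  State      = c("MA", "MA", "TX", "TX", "CA"),
  Date       = c(2000, 2001, 2000, 2001, 2000),
  Home.Value = c(250000, 270000, 120000, 125000, 300000)
)

# Keep only the rows where the condition is TRUE
ma <- subset(toy, State == "MA")

# The select argument additionally picks columns
ma_values <- subset(toy, State == "MA", select = c(Date, Home.Value))
```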

With ggplot():
```r
data <- subset(housing, State %in% c("MA", "TX"))

ggplot(data,aes(x=Date, y=Home.Value, color = State)) + geom_line()
```
![](https://i.imgur.com/e3qVXT6.png)

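> Note on the %in% operator:
> It checks whether each element on the left-hand side appears in the set on the right-hand side, returning TRUE if it does and FALSE if it does not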
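
A quick illustration of `%in%` on a plain character vector (the `states` vector is made up for this example):

```r
states <- c("MA", "TX", "CA", "NY")

# One logical value per element on the left-hand side
keep <- states %in% c("MA", "TX")
keep
# TRUE TRUE FALSE FALSE

# The logical vector can be used directly as a filter
states[keep]
```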

## Points (Scatter Plot)
```r
hp2001Q1 <- subset(housing, Date == 2001.25) 
ggplot(hp2001Q1, aes(y = Structure.Cost, x = Land.Value)) + geom_point()
ggplot(hp2001Q1, aes(y = Structure.Cost, x = log(Land.Value))) + geom_point()
```
![](https://i.imgur.com/EqTkgGu.png)

![](https://i.imgur.com/LOP6HJV.png)
(log-transformed)

---
## Lines 
```r
# Fit the regression line (intercept and coefficient)
model<-lm(Structure.Cost ~ log(Land.Value), data = hp2001Q1)
# Predicted values
hp2001Q1$pred.SC <- predict(model)

p1 <- ggplot(hp2001Q1, aes(x = log(Land.Value), y = Structure.Cost))

p1 + geom_point(aes(color = Home.Value)) + geom_line(aes(y = pred.SC))
```
![](https://i.imgur.com/rMeOlJM.png)
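
To inspect the intercept and coefficient of such a fit, `coef()` can be called on the model. The sketch below uses made-up data with an exact relationship, so the expected values are known in advance (the `toy` data frame is an assumption, not the housing data):

```r
# Hypothetical data with an exact relationship y = 2 + 3 * log(x)
toy <- data.frame(x = c(1, 2, 4, 8, 16))
toy$y <- 2 + 3 * log(toy$x)

model <- lm(y ~ log(x), data = toy)

# Intercept and slope of the fitted line
coef(model)

# predict() with no new data returns the fitted value for each row
toy$pred <- predict(model)
```

Because the toy data lie exactly on the line, the fitted values should match `y` up to floating-point error; with real data they would differ by the residuals.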

---

## Smoothers (trend lines)
```r
p1 + geom_point(aes(color = Home.Value)) + geom_smooth()
```
![](https://i.imgur.com/RWBL0BZ.png)
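
With no arguments, `geom_smooth()` picks a smoothing method itself (loess for small data sets) and draws a confidence band. A sketch of overriding that choice, on simulated data (the `toy` data frame is hypothetical):

```r
library(ggplot2)

set.seed(42)
# Hypothetical noisy linear data, only for illustration
toy <- data.frame(x = seq(1, 10, length.out = 60))
toy$y <- 5 + 2 * toy$x + rnorm(60)

p <- ggplot(toy, aes(x = x, y = y)) + geom_point()

# Default smoother with its confidence band
p_default <- p + geom_smooth()

# A straight least-squares line without the band
p_lm <- p + geom_smooth(method = "lm", se = FALSE)
```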

---

## Aesthetic Mapping vs. Assignment (scatter plot)

```r
p1 + geom_point(size = 2, color="red")
```
![](https://i.imgur.com/RqZScyH.png)

```r
p1 + geom_point(aes(color=Home.Value, shape = region))
```
![](https://i.imgur.com/nNGF6Gw.png)

```r
p1 + geom_point(aes(size=Home.Value, color = region))
```
![](https://i.imgur.com/425jRhr.png)
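
The difference between the two can be checked directly: a value outside `aes()` is a constant, while a value inside `aes()` is treated as data and passed through a scale. A minimal sketch on made-up data (the `toy` data frame is an assumption):

```r
library(ggplot2)

toy  <- data.frame(x = 1:3, y = c(2, 4, 3))   # hypothetical data
base <- ggplot(toy, aes(x = x, y = y))

# Assignment: a constant outside aes(), every point is literally red
p_set <- base + geom_point(color = "red")

# Mapping: inside aes(), "red" is a data value with a single level,
# so the color comes from the default scale and a legend appears
p_map <- base + geom_point(aes(color = "red"))

# Colors actually used when the plots are built
unique(ggplot_build(p_set)$data[[1]]$colour)
unique(ggplot_build(p_map)$data[[1]]$colour)
```

The second plot is a common mistake: the points are drawn in the first color of the default palette rather than in red.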

---

## Bar chart: geom_bar()
```r
ggplot(housing, aes(x=region)) + geom_bar()
```
![](https://i.imgur.com/B2WKcC4.png)

### Stacked Bar Chart
```r
ggplot(housing, aes(x=Year, fill=region)) +
      geom_bar() +
      labs(title = "Stacked Bar Chart", x = "YEAR", y = "Counts")
```
![](https://i.imgur.com/vx0EvXs.png)

#### aggregate(data to summarize, grouping variable, function to apply (mean, sum, etc.))
```r
housing.sum <- aggregate(housing["Home.Value"], housing["State"], FUN=mean)

ggplot(housing.sum, aes(x=State, y=Home.Value)) + geom_bar(stat='identity') # stat='identity' is required so that bar heights use the y values as given
```
![](https://i.imgur.com/Pl0ess8.png)
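
What `aggregate()` returns is easier to see on a tiny table. The `toy` data frame below is made up; only the call pattern follows the code above:

```r
# Hypothetical toy data
toy <- data.frame(
  State      = c("MA", "MA", "TX", "TX"),
  Home.Value = c(200000, 300000, 100000, 140000)
)

# One row per State, holding the mean Home.Value of that group
toy.sum <- aggregate(toy["Home.Value"], toy["State"], FUN = mean)
toy.sum

# The formula interface gives the same summary
toy.sum2 <- aggregate(Home.Value ~ State, data = toy, FUN = mean)
```

Since the result already contains one value per group, it is the kind of table that needs `stat='identity'` when passed to `geom_bar()`.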

---

## Pie chart
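In ggplot2, a pie chart is drawn by taking a bar chart and converting its y axis to polar coordinates.
The x axis carries no data; fill color is used to distinguish the regions.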
```r
housing2.sum <- aggregate(housing["Home.Value"], housing["region"], FUN=length) 

ggplot(housing2.sum, aes(x=region, y=Home.Value))+geom_bar(stat='identity')+labs(y="Counts")

ggplot(housing2.sum, aes(x="", y=Home.Value, fill =factor(region)))+geom_bar(stat='identity',width=1)+coord_polar(theta = "y", start=0)
```

![](https://i.imgur.com/og1GQ8K.png)
![](https://i.imgur.com/7Ft5uR1.png)
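
The bar-to-pie step can be sketched in isolation with made-up counts (the `counts` data frame and its numbers are hypothetical, not taken from landdata.csv):

```r
library(ggplot2)

# Hypothetical counts per region, only for illustration
counts <- data.frame(
  region = c("Midwest", "N. East", "South", "West"),
  n      = c(12, 9, 16, 13)
)

# Step 1: a single stacked bar, x is an empty string so there is one column
bar <- ggplot(counts, aes(x = "", y = n, fill = region)) +
  geom_bar(stat = "identity", width = 1)

# Step 2: the same bar with y wrapped around a circle
pie <- bar + coord_polar(theta = "y")
```

Printing `bar` and then `pie` shows that the slices are the stacked segments of the bar, bent around the circle.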

---

## Box plot

```r
ggplot(housing, aes(x = region, y = Home.Value)) + geom_boxplot(fill = "red") + scale_y_continuous("home value", breaks = seq(0, 800000, by = 100000))
```
![](https://i.imgur.com/9FJ8pQx.png)

---

## Heat map
```r
ggplot(housing, aes(x= Year, y= Qrtr)) + geom_raster(aes(fill = Home.Value)) +scale_fill_continuous(name="Value", breaks = c(200000, 500000, 800000), labels = c("'200", "'500", "'800"), low='gray', high='red')
```

![](https://i.imgur.com/OIuIVln.png)