# Week 1 Data visualization
## What would you like to show?
1. Comparison: bar chart; box plot
2. Distribution: histogram; box plot
3. Composition: pie chart; stacked bar chart
4. Relationship: scatter plot (bubble chart); heat map
> * Histogram: how the data are distributed
> * Scatter plot: the relationship between two variables
> * Bubble plot: the relationship among three variables
## Using ggplot2 package
What ggplot2 <span class="red">**cannot**</span> do:
1. 3D graphics
2. Network graphs (node/edge layouts)
3. Interactive graphics
A ggplot2 graph is assembled from these building blocks:
1. data; aesthetic mapping; geometric object
2. statistical transformations; scales
3. coordinate system; position adjustments
4. faceting
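As a rough illustration, the sketch below touches every building block in a single call. It uses R's built-in `mtcars` data set rather than `landdata.csv`, and the chosen variables and settings are illustrative assumptions, not part of the original notes.

```r
library(ggplot2)

# Each comment names the building block that line supplies
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +   # data + aesthetic mapping
  geom_point(position = position_jitter(width = 0.05, height = 0)) + # geometric object + position adjustment
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +         # statistical transformation
  scale_y_continuous(name = "Miles per gallon") +                   # scale
  coord_cartesian(xlim = c(1, 6)) +                                 # coordinate system
  facet_wrap(~ am)                                                  # faceting
print(p)
```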
### Basic structure:
aesthetics + geometric objects
Aesthetics, mapped inside `aes()`:
* position (x, y)
* color (outline or line color)
* fill (interior fill color)
* shape
* linetype
* size
Geometric objects, `geom_*()`:
* geom_point()
* geom_line()
* geom_boxplot()
* geom_histogram()
* geom_bar()
* geom_smooth()
* geom_raster()
`ggplot()`: prepares the canvas (the data and default mappings that the layers added with `+` build on)
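A minimal sketch of the canvas idea: `ggplot()` on its own only records the data and mappings, and nothing is drawn until a geometric object is added. The toy data frame `df` is hypothetical.

```r
library(ggplot2)

# Hypothetical toy data
df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 7))

canvas <- ggplot(df, aes(x = x, y = y)) # empty canvas: data + mappings, no layers yet
with_points <- canvas + geom_point()    # add a geometric object (one layer)
with_both <- with_points + geom_line()  # layers stack in the order they are added

length(canvas$layers)      # 0
length(with_points$layers) # 1
length(with_both$layers)   # 2
```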
## Using data: landdata.csv
Read the file:
```r
library(ggplot2)
# Replace with the location of landdata.csv on your machine
housing <- read.csv("path/to/landdata.csv")
head(housing)
```
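Since `landdata.csv` is not included here, this sketch reads a tiny CSV from a string through the `text` argument of `read.csv()`, so the later examples can be tried without the file. The column names mirror those used in these notes; the rows and values are invented for illustration only.

```r
# Hypothetical rows with the column names used in these notes (values are made up)
csv_text <- "State,region,Date,Year,Qrtr,Home.Value,Land.Value,Structure.Cost
MA,Northeast,2001.25,2001,1,300000,150000,150000
MA,Northeast,2001.5,2001,2,310000,155000,155000
TX,South,2001.25,2001,1,150000,40000,110000
TX,South,2001.5,2001,2,152000,41000,111000"

housing <- read.csv(text = csv_text)
str(housing) # check column names and types before plotting
```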
## Histogram
```r
# 1. Base R
hist(housing$Home.Value)
# 2. ggplot2
ggplot(housing, aes(x = Home.Value)) + geom_histogram()
```
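A histogram is just counts per bin. The sketch below, on hypothetical values standing in for `Home.Value`, shows the bins base R computes with `plot = FALSE`, and sets `binwidth` explicitly in `geom_histogram()` instead of relying on its default of 30 bins.

```r
library(ggplot2)

# Hypothetical values standing in for housing$Home.Value
values <- c(120, 150, 180, 210, 230, 260, 300, 340, 410, 520)

# Base R: compute the bins without drawing
h <- hist(values, breaks = seq(100, 600, by = 100), plot = FALSE)
h$counts # 3 4 1 1 1 (one count per 100-wide bin)

# ggplot2: choose the bin width explicitly
p <- ggplot(data.frame(values), aes(x = values)) +
  geom_histogram(binwidth = 100, boundary = 100)
```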


---
## More complex graphs
With base R `plot()`:
```r
# Draw MA as a black line, then add TX in red
plot(Home.Value ~ Date, data = subset(housing, State == "MA"), type = "l")
lines(Home.Value ~ Date, col = "red", data = subset(housing, State == "TX"))
legend(1975, 400000, c("MA", "TX"), title = "State", col = c("black", "red"), lty = 1)
```

`subset()`:
queries a data frame and extracts the rows that satisfy a condition.
With `ggplot()`:
```r
data <- subset(housing, State %in% c("MA", "TX"))
ggplot(data, aes(x = Date, y = Home.Value, color = State)) + geom_line()
```

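> Note on the `%in%` operator:
> for each element of the left-hand vector, it checks whether that element appears in the right-hand vector, returning TRUE if it does and FALSE otherwise.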
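A small sketch of `%in%` and `subset()` on a hypothetical toy data frame. It also shows that `subset()` gives the same rows as ordinary logical indexing with `[ ]`.

```r
# Hypothetical toy data frame
df <- data.frame(State = c("MA", "TX", "CA", "MA"),
                 Home.Value = c(300, 150, 400, 320))

c("MA", "CA", "TX") %in% c("MA", "TX") # TRUE FALSE TRUE

# subset() keeps the rows where the condition is TRUE ...
s1 <- subset(df, State %in% c("MA", "TX"))
# ... the same rows as logical indexing with [ ]
s2 <- df[df$State %in% c("MA", "TX"), ]

all.equal(s1, s2) # TRUE
```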
## Points (Scatter Plot)
```r
hp2001Q1 <- subset(housing, Date == 2001.25)
ggplot(hp2001Q1, aes(y = Structure.Cost, x = Land.Value)) + geom_point()
ggplot(hp2001Q1, aes(y = Structure.Cost, x = log(Land.Value))) + geom_point()
```


(The second plot takes the log of `Land.Value`.)
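An alternative to writing `log()` inside `aes()` is a log scale. In this sketch (toy values are hypothetical), `log(Land.Value)` puts natural-log numbers on the axis, while `scale_x_log10()` places points on a base-10 log scale but keeps the axis labels in the original units. The point pattern has the same shape either way.

```r
library(ggplot2)

# Hypothetical toy data standing in for hp2001Q1
df <- data.frame(Land.Value = c(1e3, 1e4, 1e5, 1e6),
                 Structure.Cost = c(50, 80, 120, 150))

# Transform inside aes(): the axis shows natural-log values
p_log <- ggplot(df, aes(x = log(Land.Value), y = Structure.Cost)) + geom_point()

# Transform with a scale: positions are log10, labels stay in original units
p_scale <- ggplot(df, aes(x = Land.Value, y = Structure.Cost)) +
  geom_point() +
  scale_x_log10()
```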
---
## Lines
```r
# Fit a linear regression (intercept & coefficient)
model <- lm(Structure.Cost ~ log(Land.Value), data = hp2001Q1)
# Predicted values
hp2001Q1$pred.SC <- predict(model)
p1 <- ggplot(hp2001Q1, aes(x = log(Land.Value), y = Structure.Cost))
p1 + geom_point(aes(color = Home.Value)) + geom_line(aes(y = pred.SC))
```
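What `lm()` and `predict()` return, on hypothetical toy data: `coef()` gives the intercept and the slope on `log(Land.Value)`, and `predict()` without `newdata` returns the fitted value for every row, which is what the `geom_line()` above traces.

```r
# Hypothetical toy data standing in for hp2001Q1
df <- data.frame(Land.Value = c(1e3, 1e4, 1e5, 1e6),
                 Structure.Cost = c(50, 80, 120, 150))

model <- lm(Structure.Cost ~ log(Land.Value), data = df)
coef(model) # intercept and slope on log(Land.Value)

# Without newdata, predict() returns one fitted value per row
df$pred.SC <- predict(model)
all.equal(unname(df$pred.SC), unname(fitted(model))) # TRUE
```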

---
## Smoothers (trend lines)
```r
p1 + geom_point(aes(color = Home.Value)) + geom_smooth()
```
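By default, `geom_smooth()` picks a flexible smoother (loess for small data sets), so the curve can bend. With `method = "lm"` it instead fits the same straight line as `lm(y ~ x)`, reproducing the manual regression line above. A sketch on hypothetical toy data that checks this against `lm()`:

```r
library(ggplot2)

# Hypothetical toy data
df <- data.frame(x = c(1, 2, 3, 4, 5, 6), y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2))

# method = "lm" fits a straight line; se = FALSE hides the confidence band
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

smooth <- layer_data(p, 2) # the x, y points ggplot2 computed for the line
fit <- lm(y ~ x, data = df)
all.equal(smooth$y, unname(predict(fit, newdata = data.frame(x = smooth$x)))) # TRUE
```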

---
## Aesthetic Mapping vs. Assignment (scatter plot)
```r
p1 + geom_point(size = 2, color="red")
```

```r
p1 + geom_point(aes(color=Home.Value, shape = region))
```

```r
p1 + geom_point(aes(size=Home.Value, color = region))
```
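The difference matters in practice: a value set outside `aes()` is applied as-is, while anything inside `aes()` is treated as data. A sketch on hypothetical toy data shows the common pitfall of writing `aes(colour = "red")`: the points get a default palette colour and a legend entry, not red.

```r
library(ggplot2)

df <- data.frame(x = 1:3, y = c(2, 5, 3)) # hypothetical toy data

# Assignment: a fixed value outside aes() makes every point red
p_set <- ggplot(df, aes(x, y)) + geom_point(colour = "red")

# Mapping: "red" inside aes() is a data value that the colour scale maps to a palette colour
p_map <- ggplot(df, aes(x, y)) + geom_point(aes(colour = "red"))

unique(layer_data(p_set)$colour) # "red"
unique(layer_data(p_map)$colour) # a default palette colour, not "red"
```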

---
## Bar chart: geom_bar()
```r
ggplot(housing, aes(x=region)) + geom_bar()
```
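`geom_bar()` defaults to `stat = "count"`, so each bar's height is the number of rows in that category, the same numbers `table()` gives. A sketch on hypothetical toy data:

```r
library(ggplot2)

# Hypothetical toy data
df <- data.frame(region = c("West", "South", "West", "Midwest", "South", "West"))

p <- ggplot(df, aes(x = region)) + geom_bar()

layer_data(p)$count         # 1 2 3 (Midwest, South, West)
as.vector(table(df$region)) # 1 2 3
```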

### Stacked Bar Chart
```r
# "+" must end a line; a line starting with "+" is parsed as a new expression
ggplot(housing, aes(x = Year, fill = region)) +
  geom_bar() +
  labs(title = "Stacked Bar Chart", x = "YEAR", y = "Counts")
```
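Stacking is only one position adjustment. A sketch on hypothetical toy data comparing the default `"stack"` with `"dodge"` (bars side by side) and `"fill"` (each bar rescaled to height 1, so segments show proportions):

```r
library(ggplot2)

# Hypothetical toy data standing in for housing
df <- data.frame(Year = c(2000, 2000, 2000, 2001, 2001),
                 region = c("West", "South", "West", "South", "South"))

base <- ggplot(df, aes(x = Year, fill = region))

stacked <- base + geom_bar()                   # default position = "stack"
dodged  <- base + geom_bar(position = "dodge") # bars side by side
filled  <- base + geom_bar(position = "fill")  # every bar scaled to height 1
```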

#### aggregate(data to summarize, grouping variable, summary function such as mean or sum)
```r
housing.sum <- aggregate(housing["Home.Value"], housing["State"], FUN=mean)
ggplot(housing.sum, aes(x = State, y = Home.Value)) + geom_bar(stat = "identity") # stat = "identity" is required: bar heights use the y values as given instead of counting rows
```
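A sketch of what `aggregate()` returns, on hypothetical toy data: one row per group with the summarized column. `geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# Hypothetical toy data
housing <- data.frame(State = c("MA", "MA", "TX", "TX"),
                      Home.Value = c(100, 300, 50, 150))

# Mean Home.Value per State: MA 200, TX 100
housing.sum <- aggregate(housing["Home.Value"], housing["State"], FUN = mean)

# geom_col() is equivalent to geom_bar(stat = "identity")
p <- ggplot(housing.sum, aes(x = State, y = Home.Value)) + geom_col()
```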

---
## Pie chart
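To draw a pie chart in ggplot2, draw a bar chart and then convert its y axis to polar coordinates with `coord_polar(theta = "y")`.
The x axis carries no data (`x = ""`); regions are distinguished by fill color.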
```r
housing2.sum <- aggregate(housing["Home.Value"], housing["region"], FUN=length)
ggplot(housing2.sum, aes(x = region, y = Home.Value)) + geom_bar(stat = "identity") + labs(y = "Counts")
ggplot(housing2.sum, aes(x = "", y = Home.Value, fill = factor(region))) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y", start = 0)
```
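Pie charts are often easier to read with percentage labels and without axes. A sketch with hypothetical region counts standing in for `housing2.sum`: `position_stack(vjust = 0.5)` centres each label in its slice, and `theme_void()` removes axes and grid lines.

```r
library(ggplot2)

# Hypothetical counts per region, standing in for housing2.sum
counts <- data.frame(region = c("Midwest", "Northeast", "South", "West"),
                     n = c(12, 9, 16, 13))
counts$pct <- round(100 * counts$n / sum(counts$n), 1) # share of the whole, in %

p <- ggplot(counts, aes(x = "", y = n, fill = region)) +
  geom_col(width = 1) +
  coord_polar(theta = "y", start = 0) +
  geom_text(aes(label = paste0(pct, "%")),
            position = position_stack(vjust = 0.5)) + # centre each label in its slice
  theme_void()                                        # drop axes and grid lines
```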


---
## Box plot
```r
ggplot(housing, aes(x = region, y = Home.Value)) + geom_boxplot(fill = "red") + scale_y_continuous("Home value", breaks = seq(0, 800000, by = 100000))
```
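What each box encodes: the box runs from the 25th to the 75th percentile with a line at the median, and points more than 1.5 × IQR beyond the box are drawn as outliers. A sketch on hypothetical toy data that compares the computed box statistics with `quantile()`:

```r
library(ggplot2)

# Hypothetical toy data
df <- data.frame(region = rep(c("South", "West"), each = 5),
                 Home.Value = c(100, 120, 130, 150, 400, 200, 210, 260, 300, 320))

p <- ggplot(df, aes(x = region, y = Home.Value)) + geom_boxplot()

# Lower hinge, median and upper hinge computed by ggplot2, one row per region
layer_data(p)[, c("lower", "middle", "upper")]
# The same quartiles from base R
tapply(df$Home.Value, df$region, quantile, probs = c(0.25, 0.5, 0.75))
# 400 lies above 150 + 1.5 * (150 - 120), so it is drawn as an outlier for South
```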

---
## Heat map
```r
ggplot(housing, aes(x = Year, y = Qrtr)) +
  geom_raster(aes(fill = Home.Value)) +
  scale_fill_gradient(name = "Value", breaks = c(200000, 500000, 800000), labels = c("200K", "500K", "800K"), low = "gray", high = "red")
```
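`geom_raster()` draws one tile per row. If several rows share the same `Year` and `Qrtr` (presumably one per state in `landdata.csv`, judging by the subsets above), their tiles overlap and only one stays visible. A sketch, on hypothetical toy data, that averages over states first so each cell has a single value:

```r
library(ggplot2)

# Hypothetical toy data: two states share each Year x Qrtr cell
housing <- data.frame(State = rep(c("MA", "TX"), each = 4),
                      Year = rep(c(2000, 2000, 2001, 2001), times = 2),
                      Qrtr = rep(c(1, 2, 1, 2), times = 2),
                      Home.Value = c(300, 310, 320, 330, 150, 155, 160, 165))

# Average over states so each tile has exactly one fill value
tiles <- aggregate(Home.Value ~ Year + Qrtr, data = housing, FUN = mean)

p <- ggplot(tiles, aes(x = Year, y = Qrtr)) +
  geom_raster(aes(fill = Home.Value)) +
  scale_fill_gradient(name = "Value", low = "gray", high = "red")
```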
