---
title: 'R Language Learning Notes - Data Processing'
disqus: hackmd
---
R Language Learning Notes
Data Processing
===
![downloads](https://img.shields.io/badge/download-R-brightgreen)
![grade](https://img.shields.io/badge/Grade-新手-brightgreen)
![chat](https://img.shields.io/discord/:serverId.svg)
---
## Install the "dplyr" package
```r=
install.packages("dplyr")
library(dplyr)
```
----
Reading data (CSV)
---
```r=
data <- read.csv("path/filename.csv")
# If the first column name comes out garbled (often caused by a UTF-8 byte-order mark, BOM), try:
data <- read.csv("path/filename.csv", fileEncoding = "UTF-8-BOM")
```
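This is a minimal sketch of the BOM problem. It assumes the garbled header comes from a UTF-8 byte-order mark, which is what Excel's "CSV UTF-8" export writes. It builds a tiny throwaway file in a temporary directory instead of reading a real data file. The file contents and the names `tmp`, `plain`, and `clean` are made up for illustration:

```r=
# Hypothetical file: a UTF-8 CSV that starts with a byte-order mark (BOM)
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("ID,name\n1,Amy\n2,Ben\n")), tmp)

# Without fileEncoding the BOM may stick to the first header,
# so its name can come out mangled (exact form depends on platform and R version)
plain <- read.csv(tmp)
names(plain)[1]

# With fileEncoding = "UTF-8-BOM" the mark is removed before parsing
clean <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(clean)
```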
*In the console, you can change the working directory and then read a file by its name alone.*
```r=
# Check the current working directory
getwd()
# Set the working directory
setwd("C://USER....")
getwd()
data <- read.csv("filename.csv")
```
----
Exporting data
---
```r=
# SWD_data is the data frame to export
head(SWD_data)
write.csv(SWD_data,file="SWD_data.csv",row.names = FALSE)
```
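This small check shows what `row.names = FALSE` changes. It uses the built-in `iris` data instead of `SWD_data`, and the temporary file names are illustrative. With the default `row.names = TRUE`, the row names are written as an unnamed first column. `read.csv()` later reads that column back as an extra column called `X`:

```r=
# Write the same data twice: once with row names (the default), once without
out_rows   <- tempfile(fileext = ".csv")
out_norows <- tempfile(fileext = ".csv")
write.csv(head(iris), file = out_rows)
write.csv(head(iris), file = out_norows, row.names = FALSE)

# Read both back and compare the first column name
names(read.csv(out_rows))[1]    # "X": the old row names became a data column
names(read.csv(out_norows))[1]  # "Sepal.Length": no extra column
```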
*For charts, ggplot2's ggsave() can write the plot to a file.*
**PNG and JPEG both work; the file extension decides the format.**
```r=
library(ggplot2)
# ... (process the data and build the plot) ...
# Example
ggplot(data_new) + geom_point(aes(x = Year , y = Num.Level.3.and.4 , color = Borough , shape = factor(Borough)))
ggsave('2006~2012 Level 3&4 in NYC.png')
```
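This sketch saves the same plot as both PNG and JPEG. The built-in `mtcars` data stands in for the NYC data. Passing `plot = p` explicitly is a choice made here so the saved file does not depend on which plot was drawn last. The file names, size, and `dpi` value are arbitrary:

```r=
library(ggplot2)

# Keep the plot in a variable instead of relying on "the last plot"
p <- ggplot(mtcars) + geom_point(aes(x = wt, y = mpg))

# The extension (.png / .jpeg) selects the output format;
# width and height are in inches unless `units` says otherwise
png_file <- file.path(tempdir(), "mtcars_wt_mpg.png")
jpg_file <- file.path(tempdir(), "mtcars_wt_mpg.jpeg")
ggsave(png_file, plot = p, width = 6, height = 4, dpi = 150)
ggsave(jpg_file, plot = p, width = 6, height = 4, dpi = 150)
```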
---
Using dplyr
---
:::info
Six commonly used functions: filter() / select() / mutate() / arrange() / summarise() / group_by()
:::
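The pipeline below uses each of the six functions once, to show how they fit together. It is a sketch on the built-in `mtcars` data, where `wt` is weight in 1000 lb. The question it answers and the new column names are made up for illustration:

```r=
library(dplyr)

# Heavier cars only: average mpg and horsepower-per-mpg by cylinder count
result <- mtcars %>%
  filter(wt > 3) %>%                   # rows: cars heavier than 3000 lb
  select(cyl, mpg, hp) %>%             # columns: keep only what is needed
  mutate(hp_per_mpg = hp / mpg) %>%    # new column derived from two others
  group_by(cyl) %>%                    # one group per cylinder count
  summarise(avg_mpg = mean(mpg),
            avg_hp_per_mpg = mean(hp_per_mpg),
            n = n(),
            .groups = "drop") %>%      # one row per group, ungrouped result
  arrange(desc(avg_mpg))               # highest average mpg first
result
```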
The tidylog package is also worth installing. It prints a short report of what each data-processing step did.
```r=
install.packages("tidylog")
library(tidylog)
```
----
filter()
---
```r=
data <- head(iris)
data
data_new <- data %>% filter(Sepal.Width >= 3.5)
data_new
```
**The line reporting how many rows filter removed is printed by tidylog.**
![](https://i.imgur.com/Cq3mfIl.png)
**Separate multiple conditions with commas; they are combined with AND.**
![](https://i.imgur.com/XIjcTF0.png)
**For OR conditions, use "|". Avoid "||": it only compares single values, so it does not work row by row inside filter().**
![](https://i.imgur.com/6t0KFQw.png)
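This self-contained sketch shows the AND and OR forms on the full built-in `iris` data. The thresholds and variable names are arbitrary. Inside `filter()` the conditions are checked row by row, which is why the element-wise `|` is the right OR operator:

```r=
library(dplyr)

# AND: a comma between conditions keeps rows where both are TRUE
wide_setosa <- iris %>% filter(Species == "setosa", Sepal.Width >= 3.5)

# OR: the element-wise | compares every row
# (|| expects single values; with longer vectors recent R versions raise an error)
long_or_wide <- iris %>% filter(Sepal.Length > 7 | Sepal.Width > 4)

# %in% is a compact way to write several == tests joined by |
two_species <- iris %>% filter(Species %in% c("setosa", "virginica"))
```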
----
select()
---
**Separate multiple columns with commas.**
```r=
data <- read.csv("All_grade_SWD.csv")
str(data) # inspect the structure of the data frame
data_new <- data %>% select(Borough , Year , Mean.Scale.Score , Num.Level.3.and.4 )
View(data_new) # view the data in a separate window
```
**The original data has 15 columns and 35 rows.**
![](https://i.imgur.com/1ZfAqlM.png)
**The new data has 4 columns and 35 rows.**
![](https://i.imgur.com/wftyxmY.png)
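Besides listing names, `select()` accepts tidyselect helpers such as `starts_with()`, and a minus sign drops columns. Here is a short sketch on `iris` with illustrative variable names:

```r=
library(dplyr)

# Columns by name, in the order you list them
iris %>% select(Species, Petal.Length) %>% head(3)

# Helper: every column whose name starts with "Sepal"
sepal_only <- iris %>% select(starts_with("Sepal"))

# Minus sign: keep everything except Species
no_species <- iris %>% select(-Species)
```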
----
mutate()
---
```r=
# Looking at the values: to compare head counts and scores visually, scale the scores up so both are of similar size
data_new <- data_new %>% mutate(eight_times_Scores = Mean.Scale.Score * 8)
```
![](https://i.imgur.com/EGXxYcT.png)
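This sketch creates two columns in one `mutate()` call on `iris`. `Petal.Area` is only a rough length-times-width product and `Big.Petal` uses an arbitrary threshold; both are invented for illustration. The second column can use the first one because `mutate()` evaluates its arguments in order:

```r=
library(dplyr)

flowers <- iris %>%
  mutate(
    Petal.Area = Petal.Length * Petal.Width,  # new column from two existing ones
    Big.Petal  = Petal.Area > 10              # uses Petal.Area created just above
  )
head(flowers, 3)
```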
----
arrange()
---
**Sort by jersey number**
```r=
number <- chicago_bulls %>% arrange(No.)
View(number)
```
![](https://i.imgur.com/iMTeatF.png)
**Sort by jersey number (descending)**
```r=
# wrap the column to sort in desc()
number <- chicago_bulls %>% arrange(desc(No.))
View(number)
```
![](https://i.imgur.com/RvrsZ0p.png)
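The `chicago_bulls` data is not included here, so this sketch uses `mtcars` to show sorting by more than one column. Rows are ordered by the first column, ties are broken by the next one, and `desc()` can wrap any of them:

```r=
library(dplyr)

# Cylinders ascending, then mpg descending within each cylinder count
sorted_cars <- mtcars %>% arrange(cyl, desc(mpg))
head(sorted_cars, 3)
```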
----
summarize()
---
summarize() is usually combined with aggregate functions such as mean() (average), sum() (total), and sd() (standard deviation).
```r=
data %>% summarize(avg_su = mean(DIVORCE))
data %>% summarize(sum_su = sum(DIVORCE))
data %>% summarize(std_su = sd(DIVORCE))
```
![](https://i.imgur.com/4R3tZT9.png)
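Several statistics can also go into a single `summarize()` call, and each becomes its own column. This sketch uses `mtcars$mpg` in place of `DIVORCE`; `n()` is dplyr's row counter. If a column contains `NA`, then `mean()`, `sum()`, and `sd()` return `NA` unless you add `na.rm = TRUE`:

```r=
library(dplyr)

# One row, one column per statistic
stats <- mtcars %>%
  summarize(
    avg_mpg = mean(mpg),
    sum_mpg = sum(mpg),
    sd_mpg  = sd(mpg),
    n_cars  = n()   # number of rows summarized
  )
stats
```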
----
group_by()
---
group_by() lets you compute statistics separately for each group.
```r=
data_new <- data %>% group_by(District) %>% summarize(mean(DIVORCE))
```
![](https://i.imgur.com/Dnp4scT.png)
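To group by two or more columns, list them in group_by() separated by commas.
**The code also runs without `.groups = 'drop'`, but summarize() then prints a message about how the result is grouped. `.groups = 'drop'` returns an ungrouped result and avoids the message.**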
```r=
data_new2 <- data %>% group_by(District, YEAR) %>%
  summarize(mean(DIVORCE), .groups = 'drop')
```
![](https://i.imgur.com/JaTtjnN.png)
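This sketch groups `mtcars` by two columns; `am` is 0 for automatic and 1 for manual transmission. With `.groups = "drop"` the result comes back ungrouped. Without it, `summarize()` removes only the last grouping column (`am`) and leaves the result grouped by `cyl`:

```r=
library(dplyr)

# Average mpg for every cylinder / transmission combination
by_cyl_am <- mtcars %>%
  group_by(cyl, am) %>%
  summarize(avg_mpg = mean(mpg), n = n(), .groups = "drop")
by_cyl_am
```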
----
Rename Columns
---
To rename columns, use the rename() function:
`rename(new_name = old_name)`
```r=
colnames(data) # check the column names in data first
data <- data %>% rename(NY = YEAR)
```
![](https://i.imgur.com/uvHTzD9.png)
To rename several columns at once, separate the pairs with commas.
```r=
data <- data %>% rename(id = ID , divorce = DIVORCE)
colnames(data)
```
![](https://i.imgur.com/P84q9gZ.png)
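When many columns need the same renaming rule, dplyr 1.0.0 and later also provide `rename_with()`, which applies a function to the selected names. Here is a sketch on `iris`; the choice of `tolower()` and of the columns is arbitrary:

```r=
library(dplyr)

renamed <- iris %>%
  rename(species = Species) %>%                # one column: new_name = old_name
  rename_with(tolower, starts_with("Sepal"))   # apply tolower() to the Sepal.* names
names(renamed)
```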
----
Compare two data frames
===
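dplyr's anti_join() quickly finds the rows that appear in the first data frame but not in the second. One use is taking attendance online: compare the class list with today's sign-in list to see who is missing.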
```r=
getwd()
setwd("C:\\Users\\user\\Desktop")
getwd()
# import csv
library(dplyr)
library(tidylog)
class <- read.csv("class.csv",fileEncoding = "UTF-8-BOM")
head(class)
attendence <- read.csv("today.csv",fileEncoding = "UTF-8-BOM")
head(attendence)
# compare
# rows of class with no match in attendence (matched on all shared columns unless by = is given)
abscence <- anti_join(class,attendence)
abscence
```
![](https://i.imgur.com/a5FqDRD.png)
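Here is a self-contained version of the roll-call idea using two tiny made-up data frames. `roster` and `attendance` are hypothetical stand-ins for `class.csv` and `today.csv`. Passing `by = "id"` names the matching column explicitly, instead of letting `anti_join()` use every shared column:

```r=
library(dplyr)

# Hypothetical data: the full class list and today's sign-ins
roster     <- data.frame(id = 1:4, name = c("Amy", "Ben", "Cathy", "Dan"))
attendance <- data.frame(id = c(1L, 3L), name = c("Amy", "Cathy"))

# Rows of roster whose id never appears in attendance = absentees
absent <- anti_join(roster, attendance, by = "id")
absent
```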
---
## Next Lesson ...
1. Data type: data frame
2. Drawing graphics
3. Packages that need to be installed
## More tutorials / notes
1. [my coding-blog](https://fatcatcat-lab.blogspot.com)
2. [my movie-blog](https://fatcatcat-movie.blogspot.com)
###### tags: `R` `beginner` `cat` `tutorial`