R語言學習路程

# R語言學習路程 - 學習影片：[5分鐘R語言系列](https://www.youtube.com/playlist?list=PLghWJf3e_h2GMPHNGbT4TN-kdfEvQgmPu) - Question： 1. [第十二集: 資料輸入輸出之資料進得去分析才出得來](https://hackmd.io/T3k405vvTfivEdKntggOVw?view#%E7%AC%AC%E5%8D%81%E4%BA%8C%E9%9B%86-%E8%B3%87%E6%96%99%E8%BC%B8%E5%85%A5%E8%BC%B8%E5%87%BA-%E4%B9%8B-%E8%B3%87%E6%96%99%E9%80%B2%E5%BE%97%E5%8E%BB-%E5%88%86%E6%9E%90%E6%89%8D%E5%87%BA%E5%BE%97%E4%BE%86-0819-Question) 2. [第十七集: 做圖（下）神套件 ggplot2 之下半部](https://hackmd.io/T3k405vvTfivEdKntggOVw?view#%E4%BD%BF%E7%94%A8gridExtra%E5%A5%97%E4%BB%B6%E5%88%87%E5%89%B2%E7%95%AB%E5%B8%83%EF%BC%9Agridarrangeg1g2g3g4nrow2ncol2-Question) - 學習成果： https://drive.google.com/drive/folders/1Uc_VghUKButF4EEqYbdllQP-4lmgzGAb?usp=sharing ## 第一集: 安裝下載及簡介(0808) ### 安裝及下載R - [R](https://cran.csie.ntu.edu.tw/bin/windows/base/R-4.2.1-win.exe) - [RStudio](https://download1.rstudio.org/desktop/windows/RStudio-2022.07.1-554.exe) - 命令執行欄 ![](https://i.imgur.com/hJRqwWz.png) ### 指派 - Note：若變數性質為文字，需於文字前後加上" 如："(文字)" - EX： a $=$ 5 或 a $<$ 5 ![](https://i.imgur.com/1WcO1ZY.png) ### remove() - 搜尋函數意思 $\Longrightarrow$ 前方加上問號 - EX：remove(a) ![](https://i.imgur.com/d7OZidU.png) ### getwd() - 用以尋找目前的工作目錄 ![](https://i.imgur.com/AFFBHYb.png) ### setwd() - 用以更改目前的工作目錄位址 - EX：setwd ( "C:/Users/" ) ![](https://i.imgur.com/3PUUrUg.png) ## 第二集: 認識向量、判斷類型、創建向量(0808) ### 六種向量類型: 數值、整數、邏輯、文字、日期、日期時間 ### 判斷向量類型: class() ; is.向量類型() ![](https://i.imgur.com/aGlldZh.png) ### 創建向量: c ( ) ; rep ( 物件 , 重複次數 ) ; seq ( 從 , 到 , 間隔 ) ![](https://i.imgur.com/aWNA3WX.png) ## 第三集: 六種向量(0808) ### 數值(numeric) ![](https://i.imgur.com/GY014me.png) ### 整數(integer) - 加個大L ![](https://i.imgur.com/cBR0sfZ.png) ### 數值和整數運算子 - 加 ( + ) 、減 ( - ) 、乘 ( * ) 、除 ( / ) - 商數 ( %/% ) 、餘數 ( %% ) ![](https://i.imgur.com/8qITG1J.png) ### 文字(character) - 引號使用注意 ![](https://i.imgur.com/jiZLGzB.png) ### 邏輯(logical) - 判斷條件 ![](https://i.imgur.com/pUge0I4.png) ### 日期(Date) - Sys.Date(),可轉數值 ![](https://i.imgur.com/IyjOLPd.png) ### 日期時間(POSIXct) - Sys.time(),可轉數值 ![](https://i.imgur.com/tC1h8lr.png) ## 第四集: 中括號就是你大哥 (0811) ### 向量是單一類型 - 混和不同類型的向量會是什麼類型? ![](https://i.imgur.com/CN0Hy6g.png) ### 判斷向量類型 - class() - is.向量類型() - 可參考： https://hackmd.io/T3k405vvTfivEdKntggOVw#%E7%AC%AC%E4%BA%8C%E9%9B%86-%E8%AA%8D%E8%AD%98%E5%90%91%E9%87%8F%E3%80%81%E5%88%A4%E6%96%B7%E9%A1%9E%E5%9E%8B%E3%80%81%E5%89%B5%E5%BB%BA%E5%90%91%E9%87%8F0808 - inherits(物件,”向量類型”) ![](https://i.imgur.com/4BTwRNj.png) ### 轉換向量類型 - as.向量類型() ![](https://i.imgur.com/82AaEeE.png) ### 向量的索引切割篩選 - [] 中括號哥 ![](https://i.imgur.com/Ztgj3XZ.png) ## 第五集: 神秘的流程控制還搞不懂if else,for,while ?? (0811) ### 流程控制基本上就是if else; for; while ### if else家族 : - if(){} ![](https://i.imgur.com/DPGejvG.png) - if(){} else{} ![](https://i.imgur.com/QMtiyNt.png) - if(){} else if (){} else if(){} else{} ![](https://i.imgur.com/UEcjOlv.png) ### for(x in 向量){....} ![](https://i.imgur.com/oh9cmAQ.png) ## 第六集: 較複雜資料結構(上) 之清單就是超級包包 (0811) ### while(條件判斷){...} - for 用在你已經知道要重複幾次， while 則否 ![](https://i.imgur.com/5TG4snd.png) ### 較複雜的資料結構： - 清單list - 因素factor - 資料框dataframe - 矩陣matrix - 陣列array ### 清單list()是超級包包 - [[]] 可用以索引篩選 ![](https://i.imgur.com/o0GLThz.png) - 還可以用 $ 號 ![](https://i.imgur.com/yvqBDjJ.png) ## 第七集: 較複雜資料結構(中) 之 Data frame 終於出來了 (0811) ### factor (向量,ordered=T,levels=c(...)) ![](https://i.imgur.com/xnRyhBV.png) ### data.frame(a,b,c) - 列是橫的行是直的 ![](https://i.imgur.com/tgv63hO.png) ### str( )可以看data frame的結構 ![](https://i.imgur.com/1R0YoGK.png) ### matrix(向量,nrow= , byrow= ) ![](https://i.imgur.com/meVYfpV.png) ### data frame和matrix都可以用中括號來索引 [列,行] ## 第八集: 較複雜資料結構(下) 之 Data frame學得好 R語言沒煩惱 (0811) ### Array是多維向量 - array(向量,dim = c(...)) ![](https://i.imgur.com/48PecNv.png) ### data frame基本操作: - Note：這個R內建的鳶尾花(iris)資料集是非常著名的生物資訊資料集之一，取自美國加州大學歐文分校的機械學習資料庫(http://archive.ics.uci.edu/ml/datasets/Iris)，資料的筆數為150筆，共有五個欄位： 1. 花萼長度(Sepal Length)：計算單位是公分。 2. 花萼寬度(Sepal Width)：計算單位是公分。 3. 花瓣長度(Petal Length) ：計算單位是公分。 4. 花瓣寬度(Petal Width)：計算單位是公分。 5. 類別(Class)：可分為Setosa，Versicolor和Virginica三個品種。 (參考資料：https://www.cc.ntu.edu.tw/chinese/epaper/0031/20141220_3105.html) - dim( ) - head( ) - tail( ) - colnames( ) - summary( ) ![](https://i.imgur.com/gnKWtvn.png) - 中括號裡可以放條件判斷 ![](https://i.imgur.com/9qH6fah.png) ### data frame - 直接用$加變數 ![](https://i.imgur.com/f7Nykwe.png) - 指派為NULL消除變數 ![](https://i.imgur.com/A8Gtuzz.png) ### rbind( )上下合併data frame ## 第九集: 資料框操作系列 (上) 之 dplyr帶你飛 (0813) ### install.packages (”包裹名”)、library(包裹名) ```R=+ install.packages("dplyr") ##安裝 library(dplyr) ##載入 ``` ### dplyr : filter 、 select 、 mutate 、 arrange 、 summarise ### filter (資料框,條件) 或用 [條件, ] - 功能：篩選條件用 ```R=+ filter(iris,iris$Sepal.Length > 6.5) ``` ![](https://i.imgur.com/ron7atj.png) ```R=+ iris[iris$Sepal.Length > 6.5,] ``` ![](https://i.imgur.com/qT4csmU.png) ### select (資料框,變數名) 或用 [ ,變數名] - 功能：列出單項數據內容 ```R=+ select(iris,Sepal.Length) ``` ![](https://i.imgur.com/Vt6K77W.png) ```R=+ iris[ ,"Sepal.Length"] ``` ![](https://i.imgur.com/uYfNAY8.png) ```R=+ iris[ ,"Sepal.Length" , drop = F ] ``` ![](https://i.imgur.com/jUcRXiK.png) ### mutate (資料框,新變數) 或用 $號 - 功能：選取一定範圍資料 ```R=+ mutate(iris,abc = 1:150) ``` ![](https://i.imgur.com/MGBAAe8.png) ### arrange (資料框,變數名) - 功能：將其中一項數據做大小排列 1. 遞增排序 ```R=+ arrange(iris,Sepal.Length) ``` ![](https://i.imgur.com/NRxLVki.png) 2. 遞減排序 ```R=+ arrange(iris,desc(Sepal.Length)) ``` ![](https://i.imgur.com/fjIx9Ot.png) ### summarise (資料框,描述統計) - 功能：同時取得描述統計資料 ```R=+ summarise( iris , mean(Sepal.Length) , sd(Sepal.Length) ) ``` ![](https://i.imgur.com/s8DkPfU.png) ## 第十集: 資料框操作系列 (中) 之 tidyr轉起來 (0816) ### dplyr : group_by 結合 summarise() - 功能：容易分類資料框 group_by 結合 summarise() %>%就是下一步 ```R=+ iris %>% group_by(Species)%>% summarise(mean(Sepal.Length) , mean(Sepal.Width)) ``` ![](https://i.imgur.com/dMk2sXg.png) ### tidyr : spread 及 gather 函數 - 長型 vs. 寬型長型：一列一個數寬型：一列兩個以上 ```R=+ a = c("male","male","female","female") b = c("black","white","black","white") c = c(80,40,60,70) df1 = data.frame(gender = a ,color = b ,population = c) ``` ![](https://i.imgur.com/3AcNDCl.png) - spread 目標資料框,key=””,value=”” ```R=+ df2 = spread(df1 ,key = "color" ,value = "population") ``` ![](https://i.imgur.com/E8e9CBQ.png) - gather 目標資料框,key=””,value=””, keylevel ```R=+ df3 = gather(df2 ,key = "color" ,value = "population" ,black ,white) ``` ![](https://i.imgur.com/9Es9EPP.png) ## 第十一集: 資料框操作系列 (下) 之資料框合體大法 (0816) ### rbind( df1,df2 ) - 數據設定 ```R=+ a = iris[1:5, ] b = iris[11:15, ] a b ``` ![](https://i.imgur.com/a9cHMTn.png) ```R=+ c = rbind(a,b) c ``` ![](https://i.imgur.com/z4vAlJu.png) ### cbind( df1,df2 ) - 數據設定 ```R=+ d = c[,1:3] e = c[,4:5] d e ``` ![](https://i.imgur.com/Aprli6H.png) ```R=+ f = cbind(d,e) f ``` ![](https://i.imgur.com/oPQjApN.png) ### merge( df1,df2,by.x=””,by.y=””,all.x=T,all.y=T ) - 數據設定 ```R=+ a = c("爸爸","媽媽","姐姐") b = c(88,99,77) c = c("爸爸","姑姑","媽媽") d = c(8,6,7) df1 = data.frame(a,b) df2 = data.frame(c,d) df1 df2 ``` ![](https://i.imgur.com/oqTkUs9.png) ```R=+ merge(df1,df2,by.x = "a",by.y = "c") ``` ```R=+ merge(df1,df2,by.x = "a",by.y = "c",all.x = T,all.y = T) ``` ![](https://i.imgur.com/hBuypui.png) ## 第十二集: 資料輸入輸出之資料進得去分析才出得來 (0819) **(Question!)** ### 輸入: - 如何輸入.txt: read.table(＂路徑＂,header=T,stringsAsFactors=F) - .tsv: read.table(＂路徑＂,header=T,stringsAsFactors=F) - .csv: read.table(＂路徑＂,header=T,stringsAsFactors=F,sep=＂，＂) 或用 read.csv (＂路徑＂,header=T,stringAsFactors=F) - .xlsx: 安裝包裹readxl 使用read_excel () ### 輸出: - write.table ( 資料框,＂路徑＂) ```R=+ a = c("爸爸","媽媽","姐姐") b = c(88,99,77) df1 = data.frame(a,b) df1 ``` ## 第十三集: 自創函數（上）函數創起來 ### 創建函數 : 1. ```R=+ abc= function (x){ #函數參數 x=2*(x+100); #執行內容 return(x) #回傳值 } abc(100) ``` ![](https://i.imgur.com/CoA3dBN.png) 2. ```R=+ bcd= function (x,y){ #函數參數 x=2*(x+100); #執行內容 y=3*(y+200); return(x/y) #回傳值 } bcd(10,20) ``` ![](https://i.imgur.com/n35uw6I.png) 3. ```R=+ cde= function (x,y,z = T){ x=2*(x+100); y=3*(y+200); if(z){ return(x/y); }else{ return(y/x); } bcd(10,20) ``` ### 全域變數 VS 區域變數 - 全域變數 ![](https://i.imgur.com/jn3mr17.png) Note：全域變數可以拿到函數區塊使用，但是存在於函數區塊中的區域變數並不能拉到函數區塊外使用 - 函數錯誤使用 ```R=+ sqrt('aaaaa') sqrt(-5) ``` ![](https://i.imgur.com/8mJAnbF.png) ## 第十四集: 自創函數（下）trycatch 及 apply家族介紹 ### tryout{ {return(...)}, warning=function(w){return()}, error=function(e){return()} } ```R=+ new.sqrt = function(x){ tryCatch( { return(sqrt(x)) }, #回傳 warning = function(w){ #警示發生時回傳隻訊息 return("請不要輸入負數或0")}, error = function(e){ #錯誤發生時回傳之訊息 return("請不要輸入文字向量")} ) } ``` ![](https://i.imgur.com/x7VO9j7.png) ### apply家族： - 數據設定 ```R=+ a = matrix(sample(1:50,20),ncol = 4) b = data.frame(a = c(1,2,3),b = c(7,8,9)) c = list(a = sample(1:10,4),b = c(T,T,F,F)) a b c ``` ![](https://i.imgur.com/m7DnljT.png) - apply(矩陣或資料框, 1或2, 函數) ```R=+ apply(a,1,sum) apply(a,2,sum) apply(b,1,mean) apply(b,2,sd) ``` ![](https://i.imgur.com/l4Dn1FW.png) - lapply(清單, 函數) ```R=+ lapply(c,sum) ``` ![](https://i.imgur.com/eucJqHe.png) - sapply(清單, 函數) ```R=+ sapply(c,sum) ``` ![](https://i.imgur.com/JEgNhtz.png) ```R=+ sapply(c,sum,simplify = F) ``` ![](https://i.imgur.com/PaC1MfP.png) ## 第十五集: 做圖（上）基礎繪圖系統 ### 直方圖 hist( ) ```R=+ hist(iris$Sepal.Length) hist(iris$Sepal.Length,main = "IRIS的Sepal Length", ylab = "頻率", xlab = "Sepal Length", las = 1, col = "lightblue", border = "darkblue") ``` ![](https://i.imgur.com/N6vLD2j.png) ### 長條圖 barplot(table(...)) ```R=+ barplot(table(iris$Species)) barplot(table(iris$Species), main = "IRIS的Species", xlab = "Species", las = 1) barplot(table(iris$Species), main = "IRIS的Species", xlab = "Species", horiz = T, las = 1, cex.names = 1, cex.axis = 1.2) ``` ![](https://i.imgur.com/1Kj4rrR.png) ![](https://i.imgur.com/Jdm9PuQ.png) ### 盒鬚圖 boxplot(Y~X) ```R=+ boxplot(iris$Sepal.Length ~ iris$Species) boxplot(iris$Sepal.Length ~ iris$Species,main = "各品種IRIS的Sepal Length", xlab = "品種", ylab = "Sepal Length", las = 1) ``` ![](https://i.imgur.com/TGFY3Uy.png) ![](https://i.imgur.com/X3O5n7L.png) ### 散布圖 plot(Y~X) ```R=+ plot(iris$Sepal.Length ~ iris$Sepal.Width, main = "Sepal length, width cor", xlab = "Sepal width", ylab = "Sepal length", las = 1, pch = 16, col = "red") ``` ![](https://i.imgur.com/QPUjnkF.png) ![](https://i.imgur.com/j89eNfd.png) ### 函數圖 curve(函數,from to) ```R=+ curve(x^2, from = 0, to = 100, las = 1) ``` ![](https://i.imgur.com/FWTooer.png) ![](https://i.imgur.com/aTWzG44.png) ### 歷年趨勢圖 plot(Y~X,type=”1”) - 數據設定 ```R=+ b = data.frame(a = 81:108, b = c(sample(100:120,7),sample(120:140,7),sample(140:160,7),sample(160:180,7))) ``` ```R=+ plot(b$b~b$a,type = "l",las = 1) ``` ![](https://i.imgur.com/sQi6WBt.png) ![](https://i.imgur.com/oDRnjHn.png) ### 重要參數 : main, xlab, ylab, las, border ,cex.names, cex.axis, pch, col - main：圖的名稱 - xlab：x軸名稱 - ylab：y軸名稱 - las：y軸數值便正 - border：直方邊線顏色 - cex.names：字體大小 - cex.axis：字體大小 - pch：點的形狀 - col：直方顏色 - horiz：長條圖變水平 ### 畫板切割 par(mfrow=c()) ```R=+ par(mfrow=c(3,3)) hist(iris$Sepal.Length,main = "IRIS的Sepal Length", ylab = "頻率", xlab = "Sepal Length", las = 1, col = "lightblue", border = "darkblue") boxplot(iris$Sepal.Length ~ iris$Species) boxplot(iris$Sepal.Length ~ iris$Species,main = "各品種IRIS的Sepal Length", xlab = "品種", ylab = "Sepal Length", las = 1) boxplot(iris$Sepal.Length ~ iris$Species,main = "各品種IRIS的Sepal Length", xlab = "品種", ylab = "Sepal Length", las = 1) plot(iris$Sepal.Length ~ iris$Sepal.Width, main = "Sepal length, width cor", xlab = "Sepal width", ylab = "Sepal length", las = 1, pch = 16, col = "red") curve(x^2, from = 0, to = 100, las = 1) b = data.frame(a = 81:108, b = c(sample(100:120,7),sample(120:140,7),sample(140:160,7),sample(160:180,7))) plot(b$b~b$a,type = "l",las = 1) ``` ![](https://i.imgur.com/yCqJXkn.png) ![](https://i.imgur.com/wNpsEQK.png) ## 第十六集: 做圖（中）神套件 ggplot2 之上半部 ### 先安裝啟動ggplot2, dplyr (或其他具%>%的套件) ```R=+ install.packages("ggplot2") library(ggplot2) library(dplyr) ``` ### 資料框 %大於% ggplot( aes(x= , y= ) ) + geom_圖類型( 相關參數設定 ) - ggplot2 直方圖 ```R=+ iris %>% ggplot(aes(x = Petal.Length)) + geom_histogram(bins = 40) iris %>% ggplot(aes(x = Petal.Length)) + geom_histogram(bins = 100) ``` ![](https://i.imgur.com/M4IZ29N.png) ![](https://i.imgur.com/EoxcG28.png) - ggplot2 盒鬚圖 ```R=+ iris %>% ggplot(aes(x = Species, y = Petal.Length)) + geom_boxplot() ``` ![](https://i.imgur.com/MGBHwHw.png) ![](https://i.imgur.com/7Ooslsv.png) - ggplot2 歷年變化線圖 - 數據設定 ```R=+ b = data.frame(a = 81:108, b = c(sample(100:120,7),sample(120:140,7),sample(140:160,7),sample(160:180,7))) ``` ![](https://i.imgur.com/pVdHMHB.png) - 程式碼 ```R=+ b %>% ggplot(aes(x = a, y = b)) + geom_line() ``` ![](https://i.imgur.com/3iB0jGf.png) ![](https://i.imgur.com/NUFhH4a.png) - ggplot2 散布圖 ```R=+ iris %>% ggplot(aes(x = Petal.Length, y = Petal.Width)) + geom_point(shape = 6, color = "blue") iris %>% ggplot(aes(x = Petal.Length, y = Petal.Width)) + geom_point(aes(shape = Species, color = Species)) ``` ![](https://i.imgur.com/uPDueDG.png) ![](https://i.imgur.com/4UsBCE3.png) - ggplot2 長條圖 ```R=+ iris %>% ggplot(aes(x = Species)) + geom_bar() ``` ![](https://i.imgur.com/LdBhzZK.png) ![](https://i.imgur.com/RjXbAj8.png) ### 其他會用到的函數： coord_flip ```R=+ iris %>% ggplot(aes(x = Species)) + geom_bar() + coord_flip() ``` ![](https://i.imgur.com/L3A7zW3.png) ![](https://i.imgur.com/kojQVlL.png) ## 第十七集: 做圖（下）神套件 ggplot2 之下半部 ### 其他會用到的函數：ggtitle, xlab, ylab ```R=+ iris %>% ggplot(aes(x = Petal.Length, y = Petal.Width)) + geom_point() + ggtitle("Petal 長寬散布圖") + xlab("長") + ylab("寬") ``` ![](https://i.imgur.com/SNPwRGp.png) ![](https://i.imgur.com/nfgxcIr.png) ### 隱藏格線 - 隱藏主要格線：panel.grid.major = element_blank() - 隱藏次要格線：panel.grid.minor = element_blank() - 隱藏 X 軸主要格線：panel.grid.major.x = element_blank() - 隱藏 Y 軸主要格線：panel.grid.major.y = element_blank() - 隱藏 X 軸次要格線：panel.grid.minor.x = element_blank() - 隱藏 Y 軸次要格線：panel.grid.minor.y = element_blank() ```R=+ iris %>% ggplot(aes(x = Petal.Length)) + geom_histogram(bins = 30) + theme(panel.grid.major = element_blank()) ``` ![](https://i.imgur.com/WyhKVJQ.png) ![](https://i.imgur.com/5tly5lf.png) ### 使用gridExtra套件切割畫布：grid.arrange(g1,g2,g3,g4,nrow=2,ncol=2) **(Question!)** - 安裝 ```R=+ install.packages("gridExtra") library(gridExtra) ``` - 程式碼 ```R=+ grid.arrange(g1, g2, g3, g4, nrow = 2, ncol = 2) ```