【R Language Self-Study Notes】

# **【R Language Self-Study Notes】**

:::info
- Installation
- Basic operations
- Variables
- Looking up built-in help
- Arithmetic
- print
- Conditionals
- while vs for
- Generating sequences and repeating elements
- Random numbers
- for loops
- Creating vectors
- Quitting
- Data Structure: list, factor, df...
- Math
- String handling
- Built-in datasets
- Creating and reading files: .csv, .txt, .xlsx, .json, RSQLite
- Filtering data: filtering [row, column], sorting, writing data to files
- Advanced: dplyr
:::

### Installation
- R + RStudio

[Download R](https://cran.r-project.org/)
![螢幕擷取畫面 2024-01-21 161325](https://hackmd.io/_uploads/SkL4zU9t6.png)

[Download the RStudio IDE](https://posit.co/download/rstudio-desktop/)
![螢幕擷取畫面 2024-01-21 161932](https://hackmd.io/_uploads/H15oQIcta.png)
![螢幕擷取畫面 2024-01-21 162036](https://hackmd.io/_uploads/Skr1NL5ta.png)

- Open RStudio

![螢幕擷取畫面 2024-01-21 162803](https://hackmd.io/_uploads/SJfnrU9F6.png)
![螢幕擷取畫面 2024-01-21 162907](https://hackmd.io/_uploads/BkEdUUcFp.png)
![螢幕擷取畫面 2024-01-21 163147](https://hackmd.io/_uploads/SkMYILqtp.png)

- Add the Rscript path

First locate the path and copy it
![螢幕擷取畫面 2024-01-21 164121](https://hackmd.io/_uploads/ryHRuL5Fp.png)

This PC (right-click) - Properties
![螢幕擷取畫面 2024-01-21 164257](https://hackmd.io/_uploads/Hyd8FI9Fp.png)

Advanced system settings - Environment Variables
![螢幕擷取畫面 2024-01-21 164848](https://hackmd.io/_uploads/HJaqqU5K6.png)
![螢幕擷取畫面 2024-01-21 164902](https://hackmd.io/_uploads/Syxj58ctp.png)

Click New and add the path you just copied
![螢幕擷取畫面 2024-01-21 165048](https://hackmd.io/_uploads/rkDZo8qtp.png)

- Run from a terminal

cd to the file's location, then run the file with Rscript
![螢幕擷取畫面 2024-01-21 165713](https://hackmd.io/_uploads/rkZqh89Yp.png)

- Run from VSCode

cd to the file's location, then run the file with Rscript
![1705827776106](https://hackmd.io/_uploads/HkGmCU9KT.jpg)
![1705828011154](https://hackmd.io/_uploads/SJc60IqKT.jpg)

>PS: to run R inside VSCode later on, install the extension first

![螢幕擷取畫面 2024-01-22 140647](https://hackmd.io/_uploads/BkPILKst6.png)

Press ctrl+enter to run. VSCode asks whether to switch to the R terminal; click "Confirm"
![螢幕擷取畫面 2024-01-22 140716](https://hackmd.io/_uploads/rJ8dIFoFa.png)
![螢幕擷取畫面 2024-01-22 140923](https://hackmd.io/_uploads/S1Zj8tsY6.png)

### Basic operations
- ctrl+enter: run
- ctrl+l: clear the Console
- alt+shift+k: show the keyboard shortcuts

![1705829808079](https://hackmd.io/_uploads/BJyJ8P9Ka.jpg)

[Style guide](http://adv-r.had.co.nz/Style.html)
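To check that the Rscript setup from the installation steps works, a tiny script can be run from the terminal. This is only a sketch: the file name `hello.R` is hypothetical and does not come from the original notes.

```r
# hello.R (hypothetical file name, used only for illustration)
# From a terminal, after cd-ing to the folder that holds the file:
#   Rscript hello.R

# R.version.string is a built-in character string describing the installed R
greeting <- paste("Hello from", R.version.string)
print(greeting)
```

If the terminal reports that `Rscript` is not recognized, the PATH entry added above is probably missing or the terminal needs to be reopened.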
If you already know Python, the switch is quick: the overall logic is close to Python, pandas, and numpy. For example, Python's del(num) is rm(num) in R, and type(num) is class(num).
As in a Jupyter notebook, R does not need print(num); typing num on its own prints the variable.

- Variables
```=
number_1 = 6
number_1
# alt + - inserts the <- assignment operator in RStudio
number_2 <- 6
number_2
# remove
rm(number_2)
```
- Looking up built-in help
```=
help(iris)
?iris
```
- Arithmetic
```=
# +
3+2
# -
3-1
# *
3*2
# /
3/2
# ** or ^
3**3
3^3
# %%
# remainder; Python uses %
5%%2
# %/%
# integer quotient; Python uses //
5%/%2
# ^0.5
# square root
4^0.5
```
![螢幕擷取畫面 2024-01-21 194248](https://hackmd.io/_uploads/SJvB7FctT.png)

- print

= and <- behave the same way for assignment
```=
name <- "Catalina"
height <- 1.53
weight <- 46
bmi <- weight / height**2
sprintf("%s's BMI is %s", name, bmi)
sprintf("%s's BMI is %.2f", name, bmi)
```
![螢幕擷取畫面 2024-01-22 132343](https://hackmd.io/_uploads/S1lxnOiY6.png)

- Conditionals
```=
# & means "and"; with | (or) this range test would always be TRUE
(bmi > 18) & (bmi < 20)
```
![螢幕擷取畫面 2024-01-22 134006](https://hackmd.io/_uploads/HJv6yYiYT.png)
```=
if (bmi > 18){
  print("Need to lose fat")
}
```
![螢幕擷取畫面 2024-01-22 133911](https://hackmd.io/_uploads/HJGqyYjYp.png)
```=
# the whole condition must sit inside one pair of parentheses
if ((bmi > 18) & (bmi < 20)) {
  print("Just right")
}
```
![螢幕擷取畫面 2024-01-22 133902](https://hackmd.io/_uploads/Byw5yYota.png)
```=
if (bmi < 17) {
  print("You are too thin")
} else if (17 <= bmi & bmi < 20) {
  print("Just right")
} else {
  print("Need to lose fat!")
}
```
![螢幕擷取畫面 2024-01-22 135110](https://hackmd.io/_uploads/By8vGYiYp.png)
```=
# assign the result to a variable
num = 10
if ((num%%2) == 0){
  result = "is even"
}else {
  result = "is odd"
}
result
```
![螢幕擷取畫面 2024-01-22 143325](https://hackmd.io/_uploads/BymH2toK6.png)
```=
# assign the result to a variable
if (bmi < 17) {
  tag = "You are too thin"
} else if (17 <= bmi & bmi < 20) {
  tag = "Just right"
} else {
  tag = "Need to lose fat!"
}
tag
```
![螢幕擷取畫面 2024-01-22 141320](https://hackmd.io/_uploads/SyAtwtiFa.png)

- while vs for
```=
# while
n = 1
while (n <= 40) {
  if (n %% 20 == 0) {
    ans <- "yes"
  } else if (n %% 5 == 0) {
    ans <- "five"
  } else if (n %% 4 == 0) {
    ans <- "four"
  } else {
    ans = n
  }
  print(ans)
  n = n + 1
}
# for
for (n in 1:40) {
  if (n %% 20 == 0) {
    ans <- "yes"
  } else if (n %% 5 == 0) {
    ans <- "five"
  } else if (n %% 4 == 0) {
    ans <- "four"
  } else {
    ans = n
  }
  print(ans)
}
```
![螢幕擷取畫面 2024-01-22 192936](https://hackmd.io/_uploads/B1Gh-Cit6.png)

- while
```=
# print 1~100
n = 1
while (n <= 100) {
  print(n)
  n = n + 1
}
```
![螢幕擷取畫面 2024-01-22 150009](https://hackmd.io/_uploads/r1DtfqiK6.png)
```=
# print 1+2+...+100
n = 1
sum = 0
while (n <= 100) {
  sum = sum + n
  n = n + 1
}
print(sum)
```
![螢幕擷取畫面 2024-01-22 150234](https://hackmd.io/_uploads/ByOGX9st6.png)
```=
# print 2+4+6...+100
even = 2
sum_even = 0
while (even <= 100 & even%%2 == 0) {
  sum_even = sum_even + even
  even = even + 2
}
print(sum_even)
```
![螢幕擷取畫面 2024-01-22 150540](https://hackmd.io/_uploads/Hy4079sta.png)

>for version
```=
sum_100 = 0
for (i in seq(2, to = 100, by = 2)) {
  sum_100 = sum_100 + i
}
sum_100
sum(seq(2, to = 100, by = 2))
```
![螢幕擷取畫面 2024-01-22 181548](https://hackmd.io/_uploads/B1mwlpiYp.png)

Printing selected results
```=
# skip floors 4 and 13
n = 1
n_floors = 20
while (n <= n_floors) {
  if(n %in% c(4, 13)) {
    n = n + 1
    next
  } else {
    print(sprintf("Floor %s", n))
    n = n + 1
  }
}
```
![螢幕擷取畫面 2024-01-22 165044](https://hackmd.io/_uploads/BkmOnssFp.png)

>for version
```=
for (i in 1:20) {
  # test the loop variable i (not the leftover n from the while version)
  if(i %in% c(4, 13)) {
    next
  } else {
    print(sprintf("Floor %s", i))
  }
}
```
![螢幕擷取畫面 2024-01-22 185142](https://hackmd.io/_uploads/r1Cad6iFT.png)

Practice: dice problem
```=
# How many rolls does it take to get five 5s?
# Method 1
system.time({
  dice = 1:6
  dice_rolls = c()
  five_count = 0
  while (TRUE) {
    one_dice = sample(dice, 1)
    dice_rolls = c(dice_rolls, one_dice)
    if (one_dice[1] == 5) {
      five_count = five_count + 1
    }
    if (five_count == 5) {
      break
    }
  }
  print(length(dice_rolls))
  print(dice_rolls)
})
# Method 2
system.time({
  dice = 1:6
  dice_rolls = c()
  while (TRUE) {
    one_dice = sample(dice, 1)
    dice_rolls = c(dice_rolls, one_dice)
    if (sum(dice_rolls == 5) == 5) {
      break
    }
  }
  print(length(dice_rolls))
  print(dice_rolls)
})
```
![螢幕擷取畫面 2024-01-23 170620](https://hackmd.io/_uploads/rylsWZpKa.png)
![螢幕擷取畫面 2024-01-23 170624](https://hackmd.io/_uploads/r1LjWbpKa.png)
```=
# How many rolls does it take to get five 5s in a row?
# Method 1
system.time({
  dice = 1:6
  dice_rolls = c()
  five_count = 0
  while (TRUE) {
    one_dice = sample(dice, 1)
    dice_rolls = c(dice_rolls, one_dice)
    if (one_dice[1] == 5) {
      five_count = five_count + 1
    } else {
      five_count = 0
    }
    if (five_count == 5) {
      break
    }
  }
  cat("Method 1:\n")
  cat("Length:", length(dice_rolls), "\n")
  cat("Last 5 elements:", dice_rolls[(length(dice_rolls) - 4):length(dice_rolls)], "\n")
})
cat("Dice Rolls:", dice_rolls, "\n")
# Method 2
system.time({
  dice = 1:6
  dice_rolls = c()
  while (TRUE) {
    one_dice = sample(dice, 1)
    dice_rolls = c(dice_rolls, one_dice)
    # check whether the last five rolls are all 5
    if (length(dice_rolls) >= 5 && all(dice_rolls[(length(dice_rolls) - 4):length(dice_rolls)] == 5)) {
      break
    }
  }
  cat("Method 2:\n")
  cat("Length:", length(dice_rolls), "\n")
  cat("Last 5 elements:", dice_rolls[(length(dice_rolls) - 4):length(dice_rolls)], "\n")
})
cat("Dice Rolls:", dice_rolls, "\n")
```
![螢幕擷取畫面 2024-01-23 171205](https://hackmd.io/_uploads/rk5bQWTYp.png)
![螢幕擷取畫面 2024-01-23 171211](https://hackmd.io/_uploads/rkn-X-pta.png)

- Generating sequences and repeating elements
```=
# similar to Python np.arange and np.linspace
seq(from = 1, to = 7, by = 2)
seq(from = 1, to = 7, length.out = 3)
```
![螢幕擷取畫面 2024-01-22 170956](https://hackmd.io/_uploads/Hyme-hsta.png)
```=
# similar to Python np.tile
rep(7, times = 7)
rep("77", times = 7)
rep(TRUE, times = 2)
```
![螢幕擷取畫面 2024-01-22 
171128](https://hackmd.io/_uploads/BJ6Sb3jt6.png)

- random numbers
```=
set.seed(100)
random_numbers <- sample(1:1000, size = 100, replace = FALSE)
# odds
for (n in random_numbers) {
  if (n %% 2 == 1) {
    print(n)
  }
}
# odds
random_numbers[random_numbers %% 2 == 1 ]
```
![螢幕擷取畫面 2024-01-22 175135](https://hackmd.io/_uploads/Bkih92oFp.png)
![螢幕擷取畫面 2024-01-22 175140](https://hackmd.io/_uploads/ByW652iYT.png)
```=
# stop at the first multiple of 31
l = c()
while (TRUE) {
  l_num = sample(1:1000, size = 1)
  l = c(l, l_num)
  if (l_num %% 31 == 0) {
    break
  }
}
length(l)
l
```
![螢幕擷取畫面 2024-01-23 155241](https://hackmd.io/_uploads/HytLelTFp.png)

With a random seed
```=
set.seed(100)
random_numbers <- sample(1:1000, size = 100, replace = FALSE)
# odds
odds = c()
for (n in random_numbers) {
  if (n %% 2 == 1) {
    odds = c(odds, n)
  }
}
odds
length(odds)
# odds
odds = c()
n = 1
while (n <= length(random_numbers)) {
  random_num = random_numbers[n]
  if (random_num %% 2 == 1) {
    odds = c(odds, random_num)
  }
  n = n + 1
}
odds
length(odds)
```
![螢幕擷取畫面 2024-01-22 192229](https://hackmd.io/_uploads/B1Hbg0oF6.png)

- for loops
```=
for (x in seq(from = 1, to = 7, by = 2)) {
  print(x)
}
```
![螢幕擷取畫面 2024-01-22 180825](https://hackmd.io/_uploads/SJG3Rnjta.png)

Count of odd numbers in 1~100
```=
# count of odd numbers in 1~100
count_odd = 0
for (n in 1:100){
  if (n%%2 == 1){
    count_odd = count_odd + 1
  }
}
print(count_odd)
```
![螢幕擷取畫面 2024-01-22 151828](https://hackmd.io/_uploads/r1-RL9st6.png)

>while version: needs one extra variable to set the starting point
```=
count_odd_w = 0
n = 1
while (n <= 100) {
  if (n %% 2 == 1) {
    count_odd_w = count_odd_w + 1
  }
  n = n + 1
}
print(count_odd_w)
```
![螢幕擷取畫面 2024-01-22 151828](https://hackmd.io/_uploads/r1-RL9st6.png)

Count of odd numbers in x~y
```=
# count of odd numbers in x~y
x = 12
y = 34
count_odd_r = 0
for (n in x:y) {
  if (n %% 2 == 1) {
    count_odd_r = count_odd_r + 1
  }
}
print(count_odd_r)
```
![螢幕擷取畫面 2024-01-22 152706](https://hackmd.io/_uploads/HJ5RO5jKT.png)

>while version: needs one extra variable to set the starting point
```=
x = 12
y = 34
n = x
count_odd_r = 0
while (n <= y) {
  if (n %% 2 == 1) {
    count_odd_r = count_odd_r + 1
  }
  n = n + 1
}
print(count_odd_r)
```
![螢幕擷取畫面 2024-01-22 152706](https://hackmd.io/_uploads/HJ5RO5jKT.png)

Divisors
```=
# How many divisors does x have?
x = 33
count_divisor = 0
for (n in 1:x) {
  if (x %% n == 0) {
    count_divisor = count_divisor + 1
    print(n)
  }
}
print(count_divisor)
```
![螢幕擷取畫面 2024-01-22 160140](https://hackmd.io/_uploads/Hk4ebjjKa.png)

>c() version
```=
x = 33
divisors = c()
for (n in 1:x) {
  if (x %% n == 0) {
    divisors = c(divisors, n)
  }
}
divisors
```
![螢幕擷取畫面 2024-01-22 183035](https://hackmd.io/_uploads/HkrkEaoYp.png)

>while version: needs one extra variable to set the starting point
```=
x = 33
n = 1
count_divisor = 0
while(n <= x){
  if(x %% n == 0){
    count_divisor = count_divisor + 1
    print(n)
  }
  n = n + 1
}
print(count_divisor)
```
![螢幕擷取畫面 2024-01-22 183044](https://hackmd.io/_uploads/BJmlEpiYp.png)

Primes
```=
# Is x prime?
# Prime: no positive integer other than 1 and itself divides it
x = 33
count_divisor = 0
for (n in 1:x) {
  print(sprintf("Check #%s", n))
  if (x %% n == 0) {
    count_divisor = count_divisor + 1
    print(n)
  }
}
if (count_divisor == 2) {
  ans = sprintf("%s is prime", x)
} else {
  ans = sprintf("%s is not prime", x)
}
ans
```
![螢幕擷取畫面 2024-01-22 184650](https://hackmd.io/_uploads/BJH6wTjKa.png)
```=
# Is x prime?
# Prime: no positive integer other than 1 and itself divides it
x = 33  # same x as in the previous block
divisors = c()
for (n in 1:x) {
  print(sprintf("Check #%s", n))
  if (x %% n == 0) {
    divisors = c(divisors, n)
  }
  # stop early once a third divisor shows up
  if (length(divisors) > 2) {
    break
  }
}
if (length(divisors) == 2) {
  ans = sprintf("%s is prime", x)
} else {
  ans = sprintf("%s is not prime", x)
}
ans
```
![螢幕擷取畫面 2024-01-22 184643](https://hackmd.io/_uploads/H1nhDpjYT.png)

>while version: needs one extra variable to set the starting point
```=
x = 11
n = 1
count_divisor = 0
while(n <= x){
  print(sprintf("Divisor check #%s", n))
  if(x %% n == 0){
    count_divisor = count_divisor + 1
    print(n)
  }
  n = n + 1
}
if (count_divisor == 2) {
  ans = sprintf("%s is prime", x)
} else {
  ans = sprintf("%s is not prime", x)
}
ans
```
![螢幕擷取畫面 2024-01-22 163313](https://hackmd.io/_uploads/rJYIOisKa.png)

>while with n <= x**0.5: skipping everything above the square root is more efficient. Within that range a prime has exactly one divisor (1), so the final test must be count_divisor == 1 rather than 2; with == 2 the code would report 21 (divisors 1 and 3 below its square root) as prime.
```=
x = 21
n = 1
count_divisor = 0
while(n <= x**0.5){
  print(sprintf("Divisor check #%s", n))
  if(x %% n == 0){
    count_divisor = count_divisor + 1
    print(n)
  }
  # a second divisor at or below sqrt(x) means x is composite
  if (count_divisor > 1){
    break
  }
  n = n + 1
}
if (x > 1 && count_divisor == 1) {
  ans = sprintf("%s is prime", x)
} else {
  ans = sprintf("%s is not prime", x)
}
ans
```
![螢幕擷取畫面 2024-01-22 164043](https://hackmd.io/_uploads/BJ5MqsiK6.png)

Primes in a range
```=
x = 1
y = 5
primes = c()
for (n in x:y){
  divisors = c()
  for (m in 1:n){
    if (n %% m == 0){
      divisors = c(divisors, m)
    }
  }
  if (length(divisors)==2){
    primes = c(primes, n)
  }
}
length(primes)
primes
```
![螢幕擷取畫面 2024-01-22 191420](https://hackmd.io/_uploads/SyAzRpot6.png)

>Wrapped in a function
```=
count_prime = function(x, y) {
  primes = c()
  for (n in x:y) {
    divisors = c()
    for (m in 1:n) {
      if (n %% m == 0) {
        divisors = c(divisors, m)
      }
    }
    if (length(divisors) == 2) {
      primes = c(primes, n)
    }
  }
  # divisors holds the divisors of the last n checked (y)
  result = list(length=length(primes), primes=primes, divisors=divisors)
  return(result)
}
result = count_prime(1, 10)
print(result$length)
print(result[[1]])
print(result$primes)
print(result[[2]])
print(result$divisors)
print(result[[3]])
```
![螢幕擷取畫面 2024-01-24 152610](https://hackmd.io/_uploads/BJzoj4At6.png)

- Creating vectors

concat
```=
# similar to Python np.array
name <- c("catalina", "tomas", "biga")
name
```
![螢幕擷取畫面 2024-01-22 170441](https://hackmd.io/_uploads/BJUTkhjYT.png)

>Length
```=
name <- c("catalina", "tomas", "biga")
length(name)
```
![螢幕擷取畫面 2024-01-22 171432](https://hackmd.io/_uploads/HJFWGhsFT.png)

>Indexing and slicing
```=
name <- c("catalina", "tomas", "biga")
name[1]
name[2]
name[c(1,2)]
```
![螢幕擷取畫面 2024-01-22 171834](https://hackmd.io/_uploads/BJTlQ3sYa.png)

>Adding and removing
```=
name <- c("catalina", "tomas", "biga")
# add
name = c(name, "maite")
name
# remove
name = name[-1]
name
name = name[-2]
name
```
![螢幕擷取畫面 2024-01-22 172834](https://hackmd.io/_uploads/BybIHhiY6.png)

>Comparison
```=
age <- c(22, 26, 17, 36)
age > 25
```
![螢幕擷取畫面 2024-01-22 173156](https://hackmd.io/_uploads/BynMI3sFa.png)

>Combining them
```=
name <- c("catalina", "tomas", "biga")
age <- c(22, 26, 17)
name[age > 25]
```
![螢幕擷取畫面 2024-01-22 173607](https://hackmd.io/_uploads/SJe5mDhit6.png)

- Quitting
```=
q()
```
### Data Structure
- list

Iterable; in R a list can also store key:value pairs
```=
membership <- list(
  "Catalina",
  2022,
  8.7,
  c("data department", "F")
)
class(membership)
```
![螢幕擷取畫面 2024-01-23 172720](https://hackmd.io/_uploads/rkItLWaF6.png)
```=
# membership[1] creates a new list that contains the single element
# membership[[1]] returns the first element itself
result_1 = membership[1]
print(result_1)
# membership[[1]]
result_2 = membership[[1]]
print(result_2)
```
![螢幕擷取畫面 2024-01-23 173615](https://hackmd.io/_uploads/HyTc_bpFp.png)
```=
membership[[4]][2]
```
![螢幕擷取畫面 2024-01-23 173921](https://hackmd.io/_uploads/SJd8KbaFa.png)

Iterable
```=
for (m in membership) {
  print(m)
}
```
![螢幕擷取畫面 2024-01-23 174030](https://hackmd.io/_uploads/Bya5YZaYa.png)

key-value
```=
membership_2 <- list(
  name= "Catalina",
  year= 2022,
  level= 8.7,
  info = c("data department", "F")
)
class(membership_2)
```
![螢幕擷取畫面 2024-01-23 180033](https://hackmd.io/_uploads/HyxLC-TtT.png)
```=
names(membership_2) # get all keys
membership_2[["name"]]
membership_2$name
membership_2$"info"[2]
```
![螢幕擷取畫面 2024-01-23 180105](https://hackmd.io/_uploads/ByEdAZaFp.png)
```=
# add, update, and delete a key, similar to Python pandas
membership_2[["age"]] = 26
membership_2
membership_2[["age"]] = 18
membership_2
membership_2[["age"]] = NULL
membership_2
```
![螢幕擷取畫面 2024-01-23 180206](https://hackmd.io/_uploads/HJN3CWpKa.png)
![螢幕擷取畫面 2024-01-23 180210](https://hackmd.io/_uploads/HyI3AZaFT.png)

>Case conversion
```=
mem = c("Catalina", "Biga", "Diego")
split = strsplit(mem, " ")
print(split)
# upper case
upper_mem = toupper(mem)
print(upper_mem)
# lower case
lower_mem = tolower(mem)
print(lower_mem)
```
![螢幕擷取畫面 2024-01-23 181719](https://hackmd.io/_uploads/ByaEzM6YT.png)

- factor

A factor encodes a character vector as integer codes and is commonly used in statistical analysis (statistical models, machine learning algorithms).
It is the data structure for categorical variables: a character vector can be converted into an ordered or unordered factor, and the levels can be set explicitly.
```=
level <- factor(c("gold", "silver", "copper"), ordered = TRUE, levels =c("copper", "silver", "gold"))
level
as.numeric(level)
```
![螢幕擷取畫面 2024-01-23 182825](https://hackmd.io/_uploads/rktAVM6K6.png)
```=
level2 = level[-1]
level2
as.numeric(level2)
```
![螢幕擷取畫面 2024-01-23 184257](https://hackmd.io/_uploads/HkGHufpF6.png)

- Data Frame

A data frame is a two-dimensional data structure (rows & columns), similar to a table. Each column can hold a different type, and it is typically used for structured data.
```=
mem = c("Catalina", "Biga", "Diego")
age = c(26,25,32)
year = c(2020, 2021, 2022)
mem_df = data.frame(title=mem, age, year, stringsAsFactors = FALSE)
mem_df
```
![螢幕擷取畫面 2024-01-23 184552](https://hackmd.io/_uploads/SJJlFzTtT.png)
```=
mem_df[["title"]]
mem_df$title
mem_df[["age"]]
mem_df$age
```
![螢幕擷取畫面 2024-01-23 191830](https://hackmd.io/_uploads/HkM5lmaYa.png)
```=
mem_df[, 1:2]
```
![螢幕擷取畫面 2024-01-23 191716](https://hackmd.io/_uploads/HJYSlXpta.png)
```=
mem_df[c(1,3), 1:3]
# one logical value per row (mem_df has 3 rows)
mem_df[c(TRUE, FALSE, TRUE), 1:3 ]
```
![螢幕擷取畫面 2024-01-23 192109](https://hackmd.io/_uploads/BkXNb7Tt6.png)
```=
mem_df[mem_df$age == 25, 1:2]
```
![螢幕擷取畫面 2024-01-23 192535](https://hackmd.io/_uploads/ByJrGXaFT.png)

View the table
```=
View(mem_df)
```
![螢幕擷取畫面 2024-01-23 190733](https://hackmd.io/_uploads/BkBZRz6Yp.png)

str prints the basic structure of the data frame
```=
str(mem_df)
```
![螢幕擷取畫面 2024-01-23 185203](https://hackmd.io/_uploads/S1Hv5f6Fa.png)

- Matrix

All elements of a matrix share the same data type. Matrices are typically used for linear algebra, for example matrix multiplication and transposition.
```=
mat_1 =
matrix(1:8, nrow = 2)
mat_1
class(mat_1)
```
![螢幕擷取畫面 2024-01-23 193302](https://hackmd.io/_uploads/r1TxN76KT.png)
```=
mat_1 = matrix(1:6)
mat_1
```
![螢幕擷取畫面 2024-01-23 193550](https://hackmd.io/_uploads/ryViEQ6ta.png)
```=
# 8 values do not fill 3 rows evenly, so R recycles values and gives a warning
mat_1 = matrix(1:8, nrow = 3)
mat_1
class(mat_1)
```
![螢幕擷取畫面 2024-01-23 193346](https://hackmd.io/_uploads/H1p7EmptT.png)
```=
mat_1 = matrix(1:6, nrow=3)
mat_2 = matrix(5:10, nrow=3)
mat_1
mat_2
# * multiplies element by element
mat_1 * mat_2
```
![螢幕擷取畫面 2024-01-23 193835](https://hackmd.io/_uploads/Bkg8HX6F6.png)
```=
# %*% is matrix multiplication, t() transposes
t(mat_1 ) %*% mat_2
```
![螢幕擷取畫面 2024-01-23 194014](https://hackmd.io/_uploads/B1xhB76F6.png)
```=
mat_1 <- matrix(1:6, nrow = 3)
mat_2 <- matrix(5:10, nrow = 2)
mat_1 %*% mat_2
mat_1 = matrix(1:6, nrow=3)
mat_2 = matrix(5:10, nrow=3)
t(mat_1 ) %*% mat_2
```
![螢幕擷取畫面 2024-01-23 194654](https://hackmd.io/_uploads/Sy3ED7pYT.png)

- Array

An array can have more than two dimensions; it is the data structure for higher-dimensional data.
```=
arr_1 = array(1:24, dim = c(2, 3, 4))
arr_1
class(arr_1)
```
![螢幕擷取畫面 2024-01-23 195744](https://hackmd.io/_uploads/B1Y6KXaK6.png)

### Math
```=
# same as Python math.sqrt
sqrt(4)
```
![螢幕擷取畫面 2024-01-24 131451](https://hackmd.io/_uploads/Bk_C3fCKp.png)
```=
round(2.33)
floor(2.33)
ceiling(2.33)
```
![螢幕擷取畫面 2024-01-25 152513](https://hackmd.io/_uploads/H1n1pKJ96.png)
```=
e = exp(2.33)
log(e)
```
![螢幕擷取畫面 2024-01-25 152543](https://hackmd.io/_uploads/HJymTK1qa.png)
```=
log10(10)
log10(10**2)
```
![螢幕擷取畫面 2024-01-25 152650](https://hackmd.io/_uploads/rk18Ttkqp.png)
```=
mean(1:5)
mean(c(1,2,3))
```
![螢幕擷取畫面 2024-01-25 152817](https://hackmd.io/_uploads/B1bo6KJq6.png)
```=
range(2:10)
max(2:10)
min(2:10)
sum(1:10)
```
![螢幕擷取畫面 2024-01-25 153303](https://hackmd.io/_uploads/BylpAtJq6.png)
![螢幕擷取畫面 2024-01-25 153339](https://hackmd.io/_uploads/BkNky915a.png)
```=
name = c("catalina", "biga", "tomas")
unique(name)
toupper(name[[1]])
tolower(name[[1]])
```
![螢幕擷取畫面 2024-01-25 153943](https://hackmd.io/_uploads/B1YUlqyc6.png)
```=
grep(pattern = "catalina", name)
```
![螢幕擷取畫面 2024-01-25 
154104](https://hackmd.io/_uploads/HyZigcJqT.png)
```=
morning = " Good morning, sir! "
substr(morning, start = 2, stop = 5)
substr(morning, start = 6, stop = nchar(morning))
sub(" .* ", "", morning)
```
![螢幕擷取畫面 2024-01-25 154355](https://hackmd.io/_uploads/r1hSb9kca.png)
```=
trimws(morning, which = "left")
trimws(morning, which = "right")
```
![螢幕擷取畫面 2024-01-25 154510](https://hackmd.io/_uploads/BkK9Z9J5a.png)

### String handling
```=
# n must hold a name here; name is the vector defined in the Math section
n = name[1]
paste(n, ", Good morning, sir!", sep = " :)")
paste0(n, ", Good morning, sir!")
```
![螢幕擷取畫面 2024-01-25 154547](https://hackmd.io/_uploads/Ski2bqk9a.png)
```=
for (n in name) {
  morning <- paste(n, ", Good morning, sir!", sep = " :)")
  print(morning)
}
```
![螢幕擷取畫面 2024-01-25 154657](https://hackmd.io/_uploads/BkeZGc15T.png)
```=
sub(pattern = "catalina", replacement = "cata", name)
```
![螢幕擷取畫面 2024-01-25 154806](https://hackmd.io/_uploads/ByUrM5yca.png)
```=
strsplit(name, split = " ")
```
![螢幕擷取畫面 2024-01-25 154848](https://hackmd.io/_uploads/Skl_M9k5T.png)

### Built-in datasets
Similar to the seaborn datasets in Python; it includes some datasets like the ones found on Kaggle.
```=
data()
```
![螢幕擷取畫面 2024-01-24 165659](https://hackmd.io/_uploads/BJgxZICF6.png)
```=
# read the documentation
?euro
```
![螢幕擷取畫面 2024-01-24 165854](https://hackmd.io/_uploads/rJ8P-U0F6.png)

### Creating and reading files
- csv

Create a .csv file
```=
members_csv <- "name,age
catalina,26
biga,25
tomas,31"
file_path <- "C:\\Users\\catal\\OneDrive\\桌面\\members_csv.csv"
tryCatch({
  writeLines(members_csv, file_path)
  print("Saved successfully")
}, error = function(e) {
  # conditionMessage() extracts just the error text
  print(paste("Save failed:", conditionMessage(e)))
})
```
![螢幕擷取畫面 2024-01-24 185909](https://hackmd.io/_uploads/BkBopwAKp.png)

Read a .csv file
```=
members_csv_df = read.csv(file_path, header=TRUE, sep=",")
members_csv_df
View(members_csv_df)
```
![螢幕擷取畫面 2024-01-24 185912](https://hackmd.io/_uploads/BkqoTDRKp.png)
![螢幕擷取畫面 2024-01-24 185922](https://hackmd.io/_uploads/BympTwRtp.png)

>PS: with "," as the separator, Excel splits the data into columns automatically

![螢幕擷取畫面 2024-01-24 190036](https://hackmd.io/_uploads/Hk8kRPRF6.png)

- txt

Create a .txt file
```=
members <- "name;age
catalina;26
biga;25
tomas;31"
file_path <- "C:\\Users\\catal\\OneDrive\\桌面\\members.txt"
tryCatch({
  writeLines(members, file_path)
  print("Saved successfully")
}, error = function(e) {
  print(paste("Save failed:", conditionMessage(e)))
})
```
![螢幕擷取畫面 2024-01-24 185740](https://hackmd.io/_uploads/rkiVavRta.png)

Read a .txt file
```=
members_df = read.table(file_path, header=TRUE, sep=";")
members_df
View(members_df)
```
![螢幕擷取畫面 2024-01-24 185747](https://hackmd.io/_uploads/Hk04TvRYT.png)
![螢幕擷取畫面 2024-01-24 185639](https://hackmd.io/_uploads/rklb6DCtp.png)

- txt (plain text document)

Create a .txt file
```=
file_path <- "C:\\Users\\catal\\OneDrive\\桌面\\news.txt"
news <- "There have been racial barriers, and it has been challenging to be accepted as Japanese. That's what a tearful Carolina Shiino said in impeccable Japanese after she was crowned Miss Japan on Monday. The 26-year-old model, who was born in Ukraine, moved to Japan at the age of five and was raised in Nagoya."
writeLines(news, file_path)
```
Read a .txt file
```=
file_path <- "C:\\Users\\catal\\OneDrive\\桌面\\news.txt"
news_char <- readLines(file_path)
class(news_char)
length(news_char)
news_char
```
![螢幕擷取畫面 2024-01-24 190622](https://hackmd.io/_uploads/S1RV1uRY6.png)

- json (relational data)

Create a .json file
```=
file_path <- "C:\\Users\\catal\\OneDrive\\桌面\\members_json.json"
members_json <- '[{"name": "Catalina", "age": 26}, {"name": "Tomas", "age": 32}, {"name": "Biga", "age": 25}]'
tryCatch({
  writeLines(members_json, file_path)
  print("Saved successfully")
}, error = function(e) {
  print(paste("Save failed:", conditionMessage(e)))
})
```
![螢幕擷取畫面 2024-01-24 180448](https://hackmd.io/_uploads/SJLkZwCYp.png)
![螢幕擷取畫面 2024-01-24 180431](https://hackmd.io/_uploads/SktybwAtp.png)

Read a .json file
```=
# install the package first
install.packages("jsonlite")
```
```=
file_path <- "C:\\Users\\catal\\OneDrive\\桌面\\members_json.json"
library("jsonlite")
members_json_df = fromJSON(file_path)
members_json_df
```
![螢幕擷取畫面 2024-01-24 180441](https://hackmd.io/_uploads/SJpkZP0Ya.png)

- json (non-relational data)

Create a .json file
```=
file_path = "C:\\Users\\catal\\OneDrive\\桌面\\emp_catalina.json"
emp_catalina = '
{
  "name": "Catalina",
  "age":
  26,
  "gender": "F",
  "department":["data","business"]
}
'
writeLines(emp_catalina, file_path)
```
![螢幕擷取畫面 2024-01-24 191336](https://hackmd.io/_uploads/B1bgbOCt6.png)
![螢幕擷取畫面 2024-01-24 191619](https://hackmd.io/_uploads/ryE9-OCYT.png)

Read a .json file
```=
# install the package first
install.packages("jsonlite")
```
```=
file_path = "C:\\Users\\catal\\OneDrive\\桌面\\emp_catalina.json"
library(jsonlite)
emp_catalina = fromJSON(file_path)
emp_catalina
class(emp_catalina)
length(emp_catalina)
names(emp_catalina)
```
![螢幕擷取畫面 2024-01-24 191523](https://hackmd.io/_uploads/r1_Dbu0Kp.png)
![螢幕擷取畫面 2024-01-24 191528](https://hackmd.io/_uploads/SJtwZOAY6.png)

- Spreadsheet

Read a spreadsheet file
```=
# install the package first
install.packages("readxl")
```
![螢幕擷取畫面 2024-01-24 173344](https://hackmd.io/_uploads/r1dFKUAt6.png)
```=
library("readxl")
file_path <- "C:\\Users\\catal\\OneDrive\\桌面\\members_xlsx.xlsx"
members_xlsx_df = read_excel(file_path)
members_xlsx_df
```
![螢幕擷取畫面 2024-01-24 190314](https://hackmd.io/_uploads/HkWtRwRKp.png)

- RSQLite

Set up RSQLite
```=
# install the package first
install.packages("RSQLite")
```
![螢幕擷取畫面 2024-01-24 180836](https://hackmd.io/_uploads/Hym3ZwRY6.png)

Create the csv
```=
# no spaces after the commas, so read.csv does not keep stray leading blanks
students_csv <- "name,age
catalina,26
biga,25
tomas,32"
file_path <- "C:\\Users\\catal\\OneDrive\\桌面\\students_csv.csv"
tryCatch({
  writeLines(students_csv, file_path)
  print("Saved successfully")
}, error = function(e) {
  print(paste("Save failed:", conditionMessage(e)))
})
```
Read the csv
```=
students_csv_df = read.csv(file_path, header=TRUE, sep=",")
students_csv_df
```
Write to the db
```=
library(DBI)
# ":memory:" creates a temporary in-memory database
conn = dbConnect(RSQLite::SQLite(), ":memory:")
dbListTables(conn)
file_path = "C:\\Users\\catal\\OneDrive\\桌面\\students_csv.csv"
students_csv_df = read.csv(file_path, sep = ",", header=TRUE)
dbWriteTable(conn, "students_csv_df", students_csv_df)
dbListTables(conn)
```
Read and query
```=
# read the whole table
students_csv_df_db = dbReadTable(conn, "students_csv_df")
students_csv_df_db
# query with dbSendQuery
sql_query <- "SELECT * FROM students_csv_df WHERE name = 'catalina';"
res <- dbSendQuery(conn, sql_query)
dbFetch(res)
```
![螢幕擷取畫面 
2024-01-24 190422](https://hackmd.io/_uploads/rkrpCvRF6.png)
```=
# clean up and release the associated resources
dbClearResult(res)
# close the database connection so no unneeded open connection is left behind
dbDisconnect(conn)
```
### Filtering data
- Filtering [row, column]
```=
str(mem_df)
mem_df[, "age"]
mem_df[, c("title","age")]
mem_df$age
```
![螢幕擷取畫面 2024-01-25 161050](https://hackmd.io/_uploads/Skq5Dqk5T.png)
```=
mem_df
mem_df[mem_df$title == "Catalina",]
mem_df[mem_df$title == "Diego", "age"]
mem_df[2, ]
mem_df[2, c("title","age")]
```
![螢幕擷取畫面 2024-01-25 161822](https://hackmd.io/_uploads/rkXDY9y9a.png)
```=
mem_df[mem_df$age %in% c(22,23,24,25), c("title","age")]
mem_df[mem_df$age %in% 22:25, c("title","age")]
mem_df$age %in% 22:25
```
![螢幕擷取畫面 2024-01-25 170614](https://hackmd.io/_uploads/SJ2cVsk9p.png)

- Sorting

By age
```=
mem_df_indices = order(mem_df[, "age"])
mem_df[mem_df_indices, ]
mem_df[order(mem_df[, "age"]), ]
```
![螢幕擷取畫面 2024-01-25 162443](https://hackmd.io/_uploads/ryY05qyqa.png)
```=
# keep only the rows with age > 25
selected_rows = members_csv_df[members_csv_df$age > 25, ]
members_more_than25_df = data.frame(name = selected_rows$name, age = selected_rows$age)
View(members_more_than25_df)
```
![螢幕擷取畫面 2024-01-24 192633](https://hackmd.io/_uploads/SJ9XEuCta.png)

- Adding columns
```=
mem_df$No. = mem_df$age * 10
head(mem_df)
mem_df$dep =""
head(mem_df)
```
![螢幕擷取畫面 2024-01-25 165146](https://hackmd.io/_uploads/B1ZN-j156.png)

- Writing data to a file

.txt
```=
file_path = "C:\\Users\\catal\\OneDrive\\桌面\\members_more_than25_df.txt"
write.table(members_more_than25_df, file = file_path, row.names = FALSE)
```
![螢幕擷取畫面 2024-01-24 193004](https://hackmd.io/_uploads/r1YUH_RYp.png)

.csv
```=
file_path = "C:\\Users\\catal\\OneDrive\\桌面\\members_more_than25_df.csv"
write.table(members_more_than25_df, file = file_path, sep = ",", row.names = FALSE)
```
![螢幕擷取畫面 2024-01-24 192951](https://hackmd.io/_uploads/BJi8HuRtT.png)

Non-relational .json
```=
emp_catalina = '
{
  "name": "Catalina",
  "age": 26,
  "gender": "F",
  "department":["data","business"]
}
'
file_path = "C:\\Users\\catal\\OneDrive\\桌面\\emp_catalina_2.json"
library(jsonlite)
# parse the text into an R list first; toJSON() on the raw string
# would only wrap the whole text in one quoted JSON string
emp_catalina = toJSON(fromJSON(emp_catalina), auto_unbox = TRUE)
writeLines(emp_catalina, file_path)
```
![螢幕擷取畫面 2024-01-24 194005](https://hackmd.io/_uploads/rJUUwuCKT.png)

Relational .json
```=
file_path = "C:\\Users\\catal\\OneDrive\\桌面\\iris_json"
iris_json = toJSON(iris)
writeLines(iris_json, file_path)
```
![螢幕擷取畫面 2024-01-24 194000](https://hackmd.io/_uploads/Bkd8vO0KT.png)

### Advanced: dplyr
The %>% symbol (also called the "pipe" operator) comes from the magrittr package. It makes code easier to read and chain, because it passes the result of the previous step into the next function.

%>% shortcut: Ctrl+Shift+M
```=
# install the package first
install.packages("dplyr")
```
```=
library("dplyr")
# nested style
select(mem_df, title)
filter(select(mem_df, c("title","age")), age > 25)
# %>% style
mem_df %>%
  select(title,age) %>%
  filter(age > 25)
```
![螢幕擷取畫面 2024-01-25 185030](https://hackmd.io/_uploads/ByfGa3y5p.png)
```=
mem_df %>%
  arrange(`age`) # arrange(desc(`age`))
```
![螢幕擷取畫面 2024-01-25 185036](https://hackmd.io/_uploads/Hyoz6n15a.png)
```=
mem_df %>%
  mutate(No.2 = age * 20)
mem_df %>%
  mutate(No.2 = age * 20) %>%
  summarise(avg_no.2 = mean(No.2))
```
![螢幕擷取畫面 2024-01-25 185220](https://hackmd.io/_uploads/S1vd6hy9T.png)
![螢幕擷取畫面 2024-01-26 001500](https://hackmd.io/_uploads/rkmfKZx56.png)
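As a side note to the pipe explanation above: R 4.1 and later also ship a native pipe, `|>`, that passes the left-hand result as the first argument of the next call. The sketch below is an addition to these notes, not part of the dplyr workflow shown in them; it uses only base R, and it rebuilds the small `mem_df` from the Data Frame section so it can run on its own. The names `older` and `sorted` are made up for illustration.

```r
# Rebuild the small data frame used earlier in these notes
mem_df <- data.frame(
  title = c("Catalina", "Biga", "Diego"),
  age = c(26, 25, 32),
  year = c(2020, 2021, 2022),
  stringsAsFactors = FALSE
)

# Native pipe (assumes R >= 4.1): mem_df becomes the first argument of subset()
older <- mem_df |>
  subset(age > 25, select = c(title, age))
older

# Base R counterpart of arrange(desc(age)): reorder rows by descending age
sorted <- mem_df[order(mem_df$age, decreasing = TRUE), ]
sorted$title
```

The native pipe needs no extra package, while `%>%` requires magrittr or dplyr to be loaded; for simple chains like this one the two read almost the same.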