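# Big Data Analysis and R Language, Week 6

###### tags: `大數據分析與R語言` `高科大` `碩士` `R` `rstudio`

>:::spoiler Table of Contents
>[TOC]
>:::

## Recap

1. Presentation schedule (groups will be formed next week) and format (presentations are in Chinese; presenting in English earns bonus points).
2. 2025/5/5 (Mon), morning or afternoon: attend a lecture at the Jiangong (建工) campus.

## One of R's most important packages: ==ggplot2==

### Installation

```R=1
# Install once, then load the package in every session
install.packages("ggplot2")
library(ggplot2)
```

![image](https://hackmd.io/_uploads/H1yPQOCh1x.png =300x500)
![image](https://hackmd.io/_uploads/B1ZUXORn1l.png =400x300)
![image](https://hackmd.io/_uploads/By8GX_A31g.png =500x300)

### Drawing charts

#### Scatter plot

```R=1
car_data <- cars    # built-in dataset: speed (mph) and stopping distance dist (ft)
str(car_data)
head(car_data)
# geom_point() draws a scatter plot
ggplot(car_data, aes(x = speed, y = dist)) + geom_point()
```

![image](https://hackmd.io/_uploads/BJRsEu0nJx.png)
![image](https://hackmd.io/_uploads/HJU1S_Anyx.png)

#### Histogram

```R=1
dt <- rnorm(1000)            # 1000 draws from the standard normal distribution
hist_df <- data.frame(dt)    # renamed from `hist` to avoid confusion with base R's hist()
# binwidth sets the width of each bin; alternatively, set the number of bins with `bins`
ggplot(hist_df, aes(x = dt)) + geom_histogram(binwidth = 0.5)
ggplot(hist_df, aes(x = dt)) + geom_histogram(binwidth = 0.2)
```

> **Close to a standard normal distribution (binwidth = 0.5)**

![image](https://hackmd.io/_uploads/BJTTr_02kl.png)

> **A finer result (binwidth = 0.2)**

![image](https://hackmd.io/_uploads/Syg1v_02Jl.png)

> **A broader result**

![image](https://hackmd.io/_uploads/Hy0FD_ChJe.png)

#### Box plot

```R=1
iris_data <- iris
# geom_boxplot() draws one box plot of Sepal.Width per species
ggplot(iris_data, aes(x = Species, y = Sepal.Width)) + geom_boxplot()
```

> **The middle line is the median, the box spans the first to third quartile, the whiskers reach the most extreme values within 1.5 × IQR of the box, and outliers beyond them are drawn as points.**

![image](https://hackmd.io/_uploads/SyQN__0hyl.png)

#### Line plot

```R=1
xd <- 1:100                  # x: 1 to 100
yd <- rnorm(100, 20, 5)      # y: 100 draws with mean 20 and standard deviation 5
df <- data.frame(x = xd, y = yd)
# geom_line() connects the points in order of x
ggplot(df, aes(x = x, y = y)) + geom_line()
```

> **Useful for comparing how values change along x; here the fluctuation around 20 reflects the standard deviation (5) used to generate y.**

![image](https://hackmd.io/_uploads/rym7Fu0h1g.png)

#### Bar chart

```R=1
x <- rnorm(100, 20, 5)
int_x <- as.integer(x)              # drop the decimals to get whole numbers
tb_x <- data.frame(int_x = int_x)
# geom_bar() counts how many rows share each value
ggplot(tb_x, aes(x = int_x)) + geom_bar()
```

> **Continuous floating-point values are converted to integers and then counted per value to show a distribution (e.g., ages or votes).**

![image](https://hackmd.io/_uploads/SyD55_Ahkl.png)

#### Curve

```R=1
# The function to draw: y = -x^2 + 4
para_curve <- function(x) {
  -x^2 + 4
}
rng <- data.frame(x = c(-10, 10))   # only defines the x range to evaluate over
ggplot(rng, aes(x = x)) + stat_function(fun = para_curve, geom = "line")
```

> **Given a formula, stat_function() evaluates it across the x range and draws the curve.**

![image](https://hackmd.io/_uploads/H1LrndA2kx.png)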
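The histogram comment mentions setting the number of bins instead of their width, and the curve section passes a function to `stat_function()`. A minimal sketch combining both, assuming ggplot2 3.3.0 or later (needed for `after_stat()`); the seed, the 30 bins and the colours are arbitrary illustrative choices. Bar heights are rescaled to densities so the standard normal curve `dnorm()` can be drawn on the same scale:

```r
library(ggplot2)

set.seed(42)                               # reproducible random sample
hist_df <- data.frame(dt = rnorm(1000))    # 1000 draws from N(0, 1)

p <- ggplot(hist_df, aes(x = dt)) +
  # bins = 30 fixes the number of bins rather than their width;
  # after_stat(density) rescales bar heights so the total bar area is 1
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "grey70") +
  # stat_function() evaluates dnorm() (mean 0, sd 1 by default) across the x range
  stat_function(fun = dnorm, colour = "red")

print(p)
```

Replacing `bins = 30` with `binwidth = 0.5` or `binwidth = 0.2` reproduces the bin widths used in the histogram section.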
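#### Customizing plot elements: ==adding a title and axis labels==

```R=1
car_data <- cars
ggplot(car_data, aes(x = speed, y = dist)) +
  geom_point() +
  ggtitle("Speed vs Brake") +
  xlab("Speed") +
  ylab("Braking Distance")
```

> **Components are stacked with `+`: on top of the `geom_point()` layer, `ggtitle()` adds the title and `xlab()`/`ylab()` set the axis labels.**

![image](https://hackmd.io/_uploads/r1Z_pOChJx.png)

#### Customizing plot elements: ==removing grid lines==

| Theme element        | Effect when set to `element_blank()`      |
| -------------------- | ----------------------------------------- |
| `panel.grid.major`   | Removes the major grid lines              |
| `panel.grid.minor`   | Removes the minor grid lines              |
| `panel.grid.major.x` | Removes the major grid lines of the x-axis |
| `panel.grid.minor.x` | Removes the minor grid lines of the x-axis |

```R=1
car_data <- cars
ggplot(car_data, aes(x = speed, y = dist)) +
  geom_point() +
  ggtitle("Speed vs Brake") +
  xlab("Speed") +
  ylab("Braking Distance") +
  # remove the major grid lines of both axes (minor grid lines are kept)
  theme(panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_blank())
```

> **Removing the major grid lines declutters the chart so the data points stand out.**

![image](https://hackmd.io/_uploads/Bk-5CdR3ke.png)

#### Customizing plot elements: ==flipping to horizontal==

```R=1
x <- rnorm(100, 20, 5)
int_x <- as.integer(x)
tb_x <- data.frame(int_x = int_x)
# coord_flip() swaps the x and y axes, giving a horizontal bar chart
ggplot(tb_x, aes(x = int_x)) + geom_bar() + coord_flip()
```

> **The bars are drawn horizontally.**

![image](https://hackmd.io/_uploads/S1oVkFRnyx.png)

#### Customizing plot elements: ==changing point shape and colour==

```R=1
car_data <- cars
ggplot(car_data, aes(x = speed, y = dist)) +
  geom_point(shape = 1, colour = "blue") +   # shape 1 = hollow circle
  ggtitle("Speed vs Brake") +
  xlab("Speed") +
  ylab("Braking Distance") +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_blank())
```

> **The points are drawn as blue hollow circles.**

![image](https://hackmd.io/_uploads/SJ_a1t0nye.png)

#### Drawing multiple plots

```R=1
x1 <- 1:100
y1 <- rnorm(100, 20, 5)
df1 <- data.frame(x1, y1)
x2 <- 1:100
y2 <- rnorm(100, 20, 5)
df2 <- data.frame(x2, y2)
f1 <- ggplot(df1, aes(x = x1, y = y1)) + geom_point()
f2 <- ggplot(df2, aes(x = x2, y = y2)) + geom_point()
# Install the package (once) and load it
install.packages("gridExtra")
library(gridExtra)
# Arrange the two plots side by side: 1 row, 2 columns
grid.arrange(f1, f2, nrow = 1, ncol = 2)
```

> **The two plots combined into one figure**

![image](https://hackmd.io/_uploads/SJ0mnY021x.png)

## Case study: the ==Gapminder dataset== (from the gapminder package)

```R=1
install.packages("gapminder")
library(gapminder)
summary(gapminder)
```

> **Installation process and summary output**

![image](https://hackmd.io/_uploads/ByCcztR31l.png =300x200)
![image](https://hackmd.io/_uploads/SJHyFF0nJg.png)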
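Before plotting, it can help to check how the data is organized. A minimal sketch, assuming the gapminder package is installed as above; `asia_idx` and `asia_sub` are hypothetical names used only here. It also shows that bracket indexing and `subset()`, the two filtering styles used in this note, select the same rows:

```r
library(gapminder)

dt <- gapminder
dim(dt)                  # number of rows and columns
names(dt)                # column names
table(dt$continent)      # number of rows per continent
sort(unique(dt$year))    # the survey years

# Two ways to keep only the Asian rows
asia_idx <- dt[dt$continent == "Asia", ]      # bracket indexing
asia_sub <- subset(dt, continent == "Asia")   # subset()
nrow(asia_idx)
nrow(asia_sub)
```

Each country appears once per survey year, so the row counts per continent are multiples of the number of years.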
#### Exploring life expectancy and GDP per capita worldwide

```R=1
dt <- gapminder
ggplot(dt, aes(x = lifeExp, y = gdpPercap)) + geom_point()
```

![image](https://hackmd.io/_uploads/SygKQFRn1g.png)

#### Exploring life expectancy and GDP per capita in ==Asia==

```R=1
dt <- gapminder
asia <- dt[dt$continent == "Asia", ]   # keep only the Asian rows
ggplot(asia, aes(x = gdpPercap, y = lifeExp)) + geom_point()
```

![image](https://hackmd.io/_uploads/SJ8J9YC3Je.png)

#### Same as above, with styled data points

```R=1
dt <- gapminder
asia <- dt[dt$continent == "Asia", ]
# shape 3 = plus sign, drawn in blue
ggplot(asia, aes(x = gdpPercap, y = lifeExp)) + geom_point(shape = 3, colour = "blue")
```

![image](https://hackmd.io/_uploads/H1sE5tA3Jl.png)

#### Exploring life expectancy and GDP per capita worldwide, by continent

> Each ==continent== gets ==its own point shape and colour==

```R=1
dt <- gapminder
# mapping shape and colour inside aes() assigns them per continent and adds a legend
ggplot(dt, aes(x = lifeExp, y = gdpPercap)) +
  geom_point(aes(shape = continent, colour = continent))
```

![image](https://hackmd.io/_uploads/BJRUoF03Je.png)

## Explanation of fuzzy theory: Body conset

## Homework practice

### Homework q5-1

1. Load the `gapminder` data.
2. Practice drawing the following chart. Hint: `geom_line()`

![image](https://hackmd.io/_uploads/BJdkvYAhJg.png =350x200)

```R=1
install.packages("gapminder")
install.packages("ggplot2")
library(gapminder)
library(ggplot2)
summary(gapminder)

# Use subset() to keep only the Oceania rows
oceania_data <- subset(gapminder, continent == "Oceania")

# Plot the life expectancy trend, one line per country
# (linewidth replaces the deprecated `size` argument for lines in ggplot2 >= 3.4.0)
ggplot(oceania_data, aes(x = year, y = lifeExp, color = country)) +
  geom_line(linewidth = 1) +
  labs(title = "Life expectancy in Oceania from 1952 to 1997",
       x = "year",
       y = "lifeExp") +
  theme_minimal()
```

#### Result

![image](https://hackmd.io/_uploads/Ske1DYC3yl.png)

---

:::spoiler Last updated
>==First version==[time=2025 3 24 , 3:48 PM][color=#786ff7]
<!-- >Second version[time=2025 2 24 , 3:20 PM][color=#ce770c] -->
<!-- >Third version[time=2025 2 24 , 3:20 PM][color=#ce770c] -->
>**Final version[time=2025 3 24 , 3:48 PM]**[color=#EA0000]
:::