
Web scraping with R + wind rose charts

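Update: The Central Weather Bureau's CODiS climate data service has since changed its web interface, so the method described here no longer works. The data can now only be fetched automatically with Python, or downloaded by hand and then tidied in R; so far there is no way around this.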

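A wind rose (風花圖, also called 風玫瑰圖) is a chart that shows how wind direction, wind speed, and their frequencies are distributed. It gives an overview of the wind direction and speed at a location over a given period, and it is used, for example, when choosing the orientation of airport runways or buildings.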


Wind rose legend used in the Central Weather Bureau's annual climate report.


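I recently needed to make a climate chart and a wind rose for the Central Weather Bureau's Xiangshan (香山) weather station, so I looked into a simple way to draw them in R. I eventually found 阿好伯's article 〈以R語言爬取監測站歷史資料並以ggplot2繪製風玫瑰圖(風花圖,Wind Rose)_大寮測站為例〉 ("Scraping monitoring-station history with R and drawing a wind rose with ggplot2: the Daliao station as an example"), which documents in detail how to scrape data from the CWB station data site with R and turn it into a wind rose. What I wanted, however, was to aggregate winter and summer data across several years, so I changed some of the code; the changes are recorded here.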

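First, go to the CWB station data query site, open the daily report (日報表) for the Xiangshan station (only the daily report has hourly wind speed and direction), and look at how the URL is put together.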

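The URL looks like this: https://e-service.cwb.gov.tw/HistoryDataQuery/DayDataController.do?command=viewMain&station=C0D570&stname=%25E9%25A6%2599%25E5%25B1%25B1&datepicker=2020-09-09&altitude=15m
When scraping, the parts to swap out are the station ID after station= and the date after datepicker=. The stname value is the station name 香山 URL-encoded twice (%25 is an encoded %), and altitude apparently also belongs to the station, so both change together with the station ID.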
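A minimal sketch of assembling the daily-report URL from its parts, assuming the parameter layout of the example URL above (the endpoint itself is no longer online). The function name `day_url` and its arguments are hypothetical; `stname` is passed in already double-encoded, exactly as it appears in the URL.

```r
# Hypothetical helper: build a CWB daily-report URL from its variable parts.
# Assumes the parameter layout of the example URL; stname must already be
# URL-encoded twice, as copied from the browser's address bar.
day_url <- function(station, stname, date, altitude) {
  base <- "https://e-service.cwb.gov.tw/HistoryDataQuery/DayDataController.do"
  paste0(base,
         "?command=viewMain",
         "&station=", station,
         "&stname=", stname,
         "&datepicker=", format(as.Date(date), "%Y-%m-%d"),  # yyyy-mm-dd
         "&altitude=", altitude)
}

xiangshan <- "%25E9%25A6%2599%25E5%25B1%25B1"  # encoded station name from the URL above
u <- day_url("C0D570", xiangshan, "2020-09-09", "15m")
```

Looping over days then only changes the `date` argument; querying another station would need that station's own `station`, `stname`, and `altitude` values.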

Next, I adapted 阿好伯's code slightly, as follows:

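packages <- c("rvest", "magrittr", "lubridate", "ggplot2")
invisible(lapply(packages, library, character.only = TRUE))  # load all packages; invisible() hides lapply's return value

# Daily-report URL for the Xiangshan station (C0D570) up to "datepicker="; the date is appended for each day
url_start <- "https://e-service.cwb.gov.tw/HistoryDataQuery/DayDataController.do?command=viewMain&station=C0D570&stname=%25E9%25A6%2599%25E5%25B1%25B1&datepicker="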

windcrawler <- function(x, y){
  start_date <- ymd(x)
  end_date <- ymd(y)
  dates <- seq(start_date, end_date, by = "day")
  daily <- vector("list", length(dates))  # one data frame per day, combined once after the loop
  for (i in seq_along(dates)) {
    url <- paste0(url_start, as.character(dates[i]), "&altitude=15m")
    page <- read_html(url)  # download each day's page once and reuse it for all three columns
    # Observation hour: first column of the table with id "MyTable"
    time_H <- page %>%
      html_nodes(xpath = '//*[(@id = "MyTable")]//td[(((count(preceding-sibling::*) + 1) = 1) and parent::*)]') %>%
      html_text(trim = TRUE) %>%
      as.numeric()
    # Wind speed (m/s): 7th column; readings marked "X" become NA so na.omit() can drop them
    Windspeed <- page %>%
      html_nodes(xpath = '//td[(((count(preceding-sibling::*) + 1) = 7) and parent::*)]') %>%
      html_text(trim = TRUE) %>%
      gsub("X", NA, .) %>%
      as.numeric()
    # Wind direction (degrees): 8th column; "X" becomes NA
    windDirection <- page %>%
      html_nodes(xpath = '//td[(((count(preceding-sibling::*) + 1) = 8) and parent::*)]') %>%
      html_text(trim = TRUE) %>%
      gsub("X", NA, .) %>%
      as.numeric()
    # Combine the day's columns
    daily[[i]] <- data.frame(date = as.character(dates[i]), time_H, Windspeed, windDirection)
  }
  data2 <- do.call(rbind, daily)
  data3 <- na.omit(data2)  # drop rows containing NA
  # 16 compass sectors of 22.5°, each centred on its point; N covers both 348.76–360 and 0–11.26
  data3$windDirection_N <- cut(as.numeric(data3$windDirection),
                               breaks = c(0, 11.26, 33.76, 56.26, 78.76, 101.26, 123.76, 146.26, 168.76, 191.26, 213.76, 236.26, 258.76, 281.26, 303.76, 326.26, 348.76, 360),
                               labels = c("N","NNE","NE","ENE","E","ESE","SE","SSE","S","SSW","SW","WSW","W","WNW","NW","NNW","N"),
                               include.lowest = TRUE)
  # Beaufort force from wind speed (m/s): 0–1 up to 1.5, 2 up to 3.3, 3 up to 5.4, 4 and above beyond that
  data3$風級 <- cut(as.numeric(data3$Windspeed),
                  breaks = c(0, 1.5, 3.3, 5.4, Inf),
                  labels = c("Force 0–1", "Force 2", "Force 3", "Force 4+"),
                  right = TRUE, include.lowest = TRUE)
  # Speed classes used as the wind rose fill: [0, 0.5), [0.5, 1.5), [1.5, 3), [3, Inf)
  data3$風速區間 <- cut(as.numeric(data3$Windspeed),
                    breaks = c(0, 0.5, 1.5, 3.0, Inf),
                    labels = c("0–0.5 m/s", "0.5–1.5 m/s", "1.5–3 m/s", ">3 m/s"),
                    right = FALSE)
  data3$Windspeed <- as.numeric(data3$Windspeed)
  return(data3)
}

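Compared with when 阿好伯 wrote the article, the query site's URL now has an extra &altitude= parameter at the end, so remember to include it when building the URL.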
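The direction breaks in `windcrawler()` are meant to split the compass into 16 sectors of 360/16 = 22.5°, each centred on its point (N at 0°, NNE at 22.5°, and so on), so the boundaries sit at about 11.25° + 22.5°·k. As an alternative to `cut()`, here is a minimal sketch of the same binning done arithmetically; `to_sector` and `compass16` are hypothetical names. Shifting by half a sector and wrapping with `%%` puts both ends of the circle into a single N level, with no duplicated label.

```r
# Hypothetical helper: map wind direction in degrees to one of 16 compass points.
# Each sector is 22.5 degrees wide and centred on its point; adding half a sector
# and wrapping with %% 360 sends 348.75-360 and 0-11.25 to the same "N" sector.
compass16 <- c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
               "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW")

to_sector <- function(deg) {
  idx <- floor(((deg + 11.25) %% 360) / 22.5) + 1   # sector index 1..16 (NA stays NA)
  factor(compass16[idx], levels = compass16)
}

to_sector(c(0, 10, 12, 125, 200, 350, 360))
```

For whole-degree readings this agrees with a `cut()` whose boundaries are at 11.25° + 22.5°·k; only values lying exactly on a boundary (such as 11.25°) are assigned to the clockwise neighbour here.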


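Because I wanted to compare Xiangshan's wind directions in winter and summer, the data has to be fetched separately for each season. Here winter is defined as December–February and summer as June–August, and new functions are defined accordingly: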

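# Summer (June–August) wind speed and direction for year x
windcrawlerS <- function(x){
  a <- paste0(x, "0601")
  b <- paste0(x, "0831")
  windcrawler(a, b)
}
# Winter (December of year x-1 through February of year x) wind speed and direction
windcrawlerW <- function(x){
  a <- paste0(x - 1, "1201")
  b <- as.character(ymd(paste0(x, "0301")) - days(1))  # last day of February, so leap years are handled
  windcrawler(a, b)
}
# Example: fetch the summer 2010 data
Summer2010 <- windcrawlerS(2010)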

To get several years of data, write a loop or fetch each year by hand, then combine the results with rbind().
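One way to write that loop, as a sketch: `fetch_years` is a hypothetical helper that applies a per-year function such as `windcrawlerS` or `windcrawlerW` to each year and binds everything with a single `rbind()` at the end, adding a `year` column so the years stay distinguishable after merging.

```r
# Hypothetical helper: call a per-year fetch function (e.g. windcrawlerS) for each
# year and bind the resulting data frames together, tagging rows with the year.
fetch_years <- function(years, fetch) {
  per_year <- lapply(years, function(yr) {
    df <- fetch(yr)
    df$year <- rep(yr, nrow(df))   # rep() also works when a year returns 0 rows
    df
  })
  do.call(rbind, per_year)
}
```

With the original web service, `WindSummer <- fetch_years(2008:2020, windcrawlerS)` followed by `write.csv(WindSummer, "WindSummer.csv", row.names = FALSE)` would give a file like the one read in the plotting step. For `windcrawlerW`, `year` is the year in which the winter ends.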


Finally, the wind rose is drawn, with the font set to Gen Jyuu Gothic (思源柔黑體, 源柔ゴシック):

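library(ggplot2)
library(ggpubr)  # provides ggpar()

WindSummer <- read.csv("WindSummer.csv")
# windowsFonts() exists only on Windows: register the installed font under the alias "GJ"
windowsFonts(GJ = windowsFont("Gen Jyuu Gothic P Medium"))

# Levels must match the labels produced by windcrawler() exactly, otherwise factor() turns them into NA
WindSummer$風速區間 <- factor(WindSummer$風速區間, levels = c("0–0.5 m/s", "0.5–1.5 m/s", "1.5–3 m/s", ">3 m/s"))
# Normalise the nonstandard labels "ES" and "SWS" to SE and SSW, then order the sectors clockwise from N
dirs <- as.character(WindSummer$windDirection_N)
dirs[dirs == "ES"] <- "SE"
dirs[dirs == "SWS"] <- "SSW"
WindSummer$windDirection_N <- factor(dirs, levels = c("N","NNE","NE","ENE","E","ESE","SE","SSE","S","SSW","SW","WSW","W","WNW","NW","NNW"))
col <- c("gray86", "#19B4C5", "#F9BD21", "#7F1084")
SummerRose <- ggplot(WindSummer, aes(x = windDirection_N, fill = 風速區間)) +
  geom_bar(position = position_stack(reverse = TRUE)) +  # bar length = number of hourly records
  labs(title = "Xiangshan weather station: summer wind rose, 2008–2020") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 32),
        axis.title.y = element_text(hjust = 0.75, vjust = 2.5, size = 18, color = "azure4"),
        text = element_text(size = 18)) +
  coord_polar(start = -pi/16) +  # rotate by half a sector (11.25°) so the N bar is centred at the top
  xlab("") +
  ylim(0, 10000)  # fixed radial (count) axis limit
ggpar(SummerRose, font.family = "GJ", palette = col)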
The resulting wind rose.


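The chart shows that Xiangshan still gets mostly northerly winds in summer. Could this be because Taiwan lies in the northeast trade-wind belt of the planetary wind system, so that whenever there is no southwesterly flow in summer, the wind still comes mainly from the north?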
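To put a number on "mostly northerly", one option is to tabulate the share of hourly records in each direction sector. A minimal sketch, assuming a `windDirection_N` factor like the one `windcrawler()` returns; `sector_share` is a hypothetical helper, and the sample directions are made up for illustration, not station data.

```r
# Hypothetical helper: percentage of records falling in each direction sector.
sector_share <- function(directions) {
  tab <- table(directions)           # counts per sector; empty factor levels are kept as 0
  round(100 * prop.table(tab), 1)    # convert counts to percentages
}

# Made-up example directions, not real station data
dirs <- factor(c("N", "N", "NNE", "N", "S", "NNW", "N", "NE"),
               levels = c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
                          "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"))
shares <- sector_share(dirs)
top3 <- sort(shares, decreasing = TRUE)[1:3]   # the three most frequent sectors
```

Applied to real data, `sector_share(WindSummer$windDirection_N)` would give the percentage of summer hours from each of the 16 directions.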
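Also, the Xiangshan station (C0D570) has since been decommissioned. It has been replaced by the Xiangshan Wetland station (C0D680), which has observations starting 8 March 2022, so anyone studying the climate around the Xiangshan wetland from now on will need to query that station instead.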

Finally, many thanks again to 阿好伯.
🐕‍🦺 2022.10.12