owned this note
owned this note
Published
Linked with GitHub
# Json to R & Sort by datetime
## Problem description
1. 想從3個json來源的其中一個source抓資料下來,
最後我只要其中一個device (ID: 74DA38C7D1D2 )的資訊
(1) 有全部的device資訊,real time
https://data.lass-net.org/data/last-all-airbox.json
(2) 一個是by specific ID,real time
https://data.lass-net.org/data/last.php?device_id=74DA38C7D1D2
(3) 一個是historical by ID
https://data.lass-net.org/data/history.php?device_id=74DA38C7D1D2
2. Data處理: 抓下來的資料依照時間排序,如果是
(1)的作法: 打算寫個抓下來需要再依照ID去做擷取並output的script,
再用工作排程器做定時抓資料,但排程器還沒掌握到如何
可以每天每小時/5分鐘去啟動script;
(2)的做法: 就是寫好再用工作排程器作定時抓資料。
(3)的作法: 抓下來後,想辦法排序。
3. Problems:
(1)的做法與(2)的做法:
用自己try出來的寫法,但排程器還沒掌握到如何可以每天每小時或
5分鐘去啟動script;或是本身用R做到這件事。
(3)的做法
不確定怎麼克服 1 row by 多重columns 至 只要些specific columns in rows by different times
--------------------------------------------------------
## Answered by ***celestialgod*** on *ptt R-language*
https://www.ptt.cc/bbs/R_Language/M.1496579812.A.7E8.html
### Solution 1
用as.data.frame不是不行,只是這樣去破壞原有的list架構
會讓整個使用上變得怪,建議這樣用:
```Rscript
url <- "https://data.lass-net.org/data/last-all-airbox.json"
x <- fromJSON(url)
final_data_id <- subset(x$feeds, SiteName == "74DA38C7D1D2")
```
排程部分: taskscheduleR,可以參考這篇:https://goo.gl/4DtZTb
### Solution 2
```Rscript
library(jsonlite)
url <- "https://data.lass-net.org/data/history.php?device_id=74DA38C7D1D2"
x <- fromJSON(url)
library(data.table)
library(lubridate)
outDT <- rbindlist(x$feeds$AirBox)
outDT[ , `:=`(source = x$source, version = ymd_hms(x$version),
device_id = x$device_id, timestamp = ymd_hms(timestamp))]
```
看不懂data.table的話,試著用用看base做法
而lubridate的ymd_hms是比較簡單把字串轉時間的方式,不喜歡可以用as.POSIXct
結果:


------------------------------------------------
## MY Final codes for historical data
```Rscript
# Get data from Json link
library(jsonlite)
url <- "https://data.lass-net.org/data/history.php?device_id=74DA38C7D1D2"
x <- fromJSON(url)
# Arrange data into table matrix
library(data.table)
library(lubridate)
outDT <- rbindlist(x$feeds$AirBox)
# Claim the timestamp into correct time format by lubridate package
outDT[ , `:=`(source = x$source, version = ymd_hms(x$version),
device_id = x$device_id, timestamp = ymd_hms(timestamp))]
sortD <- outDT
# Capture data by column names
headers<-c("timestamp","s_d0","s_t0","s_h0","date","time","device_id","gps_lon","gps_lat","version")
sortD <- subset(outDT,select=headers)
# rename column names
colnames(sortD)[which(names(sortD) == "s_d0")] <- "PM2.5"
colnames(sortD)[which(names(sortD) == "s_t0")] <- "Temperature"
colnames(sortD)[which(names(sortD) == "s_h0")] <- "Humidity"
# Sort data
sortD$timestamp <- as.POSIXct(sortD$timestamp, tz='UTC')
class(sortD$timestamp)
Final_data<-sortD[order(sortD$timestamp)]
View(Final_data)
# Output data
date <- sprintf("AirBox_74DA38C7D1D2_%s.csv", format(Sys.time(),"%Y%m%d%H%M"))
outfile = paste("D:\\AirBoxTest\\", date, sep = "")
write.csv(Final_data, file = outfile)
#################################
# Hourly average and output
#################################
Final_data_hourly<- aggregate(list(PM2.5 = Final_data$PM2.5,
Humidity = Final_data$Humidity,
Temperature = Final_data$Temperature),
list(hourofday = cut(Final_data$timestamp, "1 hour")),
mean)
# Output data
date <- sprintf("AirBox_74DA38C7D1D2_Hourly_%s.csv", format(Sys.time(),"%Y%m%d%H%M"))
outfile = paste("D:\\AirBoxTest\\", date, sep = "")
write.csv(Final_data_hourly, file = outfile)
```
## Note for Realtime and Historical data
- For both Realtime and historical data
***An about 8 hours gap is found between the timing we capturing the data and the latest data we can get***
- For the very first real time data, we might need to learn how to command the "***Arduino***"
-------------------------------------------------
:::info
### 偵測資料:
#time:資料產生時間
#device_id:AirBox ID
#s_0:Message sequence number
#s_1:Battery power(0 ~ 100)
#s_2:Battery or Plug(0: Battery 1: Plug)
#s_3:Moving speed(Should be 0)
#s_d0:PM2.5
#s_t0:Temperature
#s_h0:Humidity
### 設備資料:
#device_id:AirBox ID
#vendor:Vendor number
#ver_format:Message format version
#fmt_opt:A fixed value
#app:App name
#ver_app:AirBox firmware version
#device:AirBox Name
#gps_lat:Latitude
#gps_lon:Longitude
#gps_fix:A fixed value(Must be 1)
#gps_num:A number(Greater than 4)
#gps_alt:Altitude(註:gps_fix,gps_num,gps_alt
:::
### Previous notes: How to download Airbox historial data by R
1. Basing on the codes I found on a website [R - JSON Files](https://www.tutorialspoint.com/r/r_json_files.htm)
```
# Load the package required to read JSON files.
library("rjson")
# Give the input file name to the function.
result <- fromJSON(file = "input.json")
# Convert JSON file to a data frame.
json_data_frame <- as.data.frame(result)
print(json_data_frame)
```
2. However, it's not really worked for the data from the open source
-- the latest samples of AirBox devices
-- update frequency: every 5 min
-- https://data.lass-net.org/data/last-all-airbox.json
3. Here are my code:
```
library(RJSONIO)
library(RCurl)
library(rjson)
# grab the data
raw_data <- getURL("https://data.lass-net.org/data/last-all-airbox.json")
# Then covert from JSON into a list in R
data <- fromJSON(raw_data)
length(data)
# We can coerce this to a data.frame
# final_data <- do.call(rbind, data)
final_data <- as.data.frame(data)
# Then write it to a flat csv file
write.csv(final_data, "D:\\NTU\\Meeting\\170512 AirBoxTest\\final_data.csv")
#偵測資料:
# time:資料產生時間
# device_id:AirBox ID
# s_0:Message sequence number
# s_1:Battery power(0 ~ 100)
# s_2:Battery or Plug(0: Battery 1: Plug)
# s_3:Moving speed(Should be 0)
# s_d0:PM2.5
# s_t0:Temperature
# s_h0:Humidity
#設備資料:
# device_id:AirBox ID
# vendor:Vendor number v
# er_format:Message format version
# fmt_opt:A fixed value
# app:App name
# ver_app:AirBox firmware version
# device:AirBox Name
# gps_lat:Latitude
# gps_lon:Longitude
# gps_fix:A fixed value(Must be 1)
# gps_num:A number(Greater than 4)
# gps_alt:Altitude(註:gps_fix,gps_num,gps_alt
# device ID: 74da38c7d1d2
# splat <- read.table(gzfile("D:\\NTU\\Meeting\\170512 AirBoxTest\\AirBoxData_history_V2.gz"), header=T)
```
###### tags: `R` `Airbox` `data`