tags: R sunburst Visualization Report Data Visualization

Introduction to the Sunburst Package

Reference

Requirements

  • Data format: a path string plus a frequency. The path can take two forms:
    1. Collapsed: consecutive repeats of a state are dropped, so the path shows only the transitions.
    2. Full: every step of the process is kept, repeats included.
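The two path forms can be sketched in base R. This is only an illustration: the `states` vector, the second path, and the frequencies are made up and are not taken from the mvad data.

```r
# Hypothetical monthly states for one individual (illustrative, not from mvad)
states <- c("school", "school", "FE", "FE", "FE", "employment")

# Full form: keep every step, repeats included
full_path <- paste(states, collapse = "-")

# Collapsed form: run-length encoding keeps one entry per run of equal states
collapsed_path <- paste(rle(states)$values, collapse = "-")

# Input shape for a sunburst: one row per distinct path, plus its frequency
# (the second path and both frequencies are invented for the example)
paths <- data.frame(
  path = c(collapsed_path, "school-employment"),
  freq = c(3, 5),
  stringsAsFactors = FALSE
)

print(full_path)
print(collapsed_path)
print(paths)
```

The collapsed form gives a much smaller set of distinct paths, which is why the conversion step compresses the sequences before counting them.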

Converting the Data Format

1. Load the data

library(TraMineR)

# use example from TraMineR vignette
data("mvad")
mvad.alphab <- c(
  "employment", "FE", "HE", "joblessness",
  "school", "training"
)
mvad.seq <- seqdef(mvad, 17:86, xtstep = 6, alphabet = mvad.alphab)


2. Convert the format

  • Approach 1
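# To make this work, we'll compress the sequences with seqdss
library(pipeR)
library(sunburstR) # provides sunburst(), called at the end of the pipeline

# idxs = 0 returns every distinct sequence
# (recent TraMineR versions deprecate the older tlim argument in favour of idxs)
seqtab( seqdss(mvad.seq), idxs = 0, format = "SPS" ) %>>%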
  attr("freq") %>>%
  (
    data.frame(
      # appending "-end" is necessary for this to work
      sequence = paste0(
        gsub(
          x = rownames(.)
          , pattern = "(/[0-9]*)" # strip the repeated '/1' duration suffixes
          , replacement = ""
          , perl = TRUE # use Perl-compatible regular expressions
        )
        ,"-end"
      )
      ,freq = as.numeric(.$Freq)
      ,stringsAsFactors = FALSE
    )
  ) %>>%
  sunburst
  • Approach 2
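library(tibble)   # rownames_to_column()
library(magrittr) # %>% pipe

# Frequency table of the compressed sequences; move the path from the row names into a column
seq_df = seqtab( seqdss(mvad.seq), idxs = 0, format = "SPS" ) %>%
  attr("freq") %>%
  rownames_to_column("Path")

# Drop the '/1' duration suffixes and append "-end", as in the first version
seq_df$Path = gsub('/1', "" , seq_df$Path) %>% paste0("-end")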
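The string clean-up in both versions can be tried on its own without TraMineR. The sketch below assumes the row names have the form `state/duration` joined by `-`, which is what the regular expressions above imply; the `sps` strings are hypothetical.

```r
# Hypothetical row names in "state/duration" form (assumed, not read from mvad)
sps <- c("school/1-FE/1-employment/1", "training/1-employment/1")

# Remove each "/<digits>" suffix, then append "-end"
cleaned <- paste0(gsub("/[0-9]+", "", sps), "-end")
print(cleaned)

# The literal pattern '/1' is only safe when every duration is exactly 1,
# as it is after seqdss(); on a longer duration it leaves a stray digit behind
stray <- gsub("/1", "", "school/12")
print(stray)
```

Matching `/[0-9]+` instead of the literal `/1` is a slightly more defensive choice if the same code is later reused on uncompressed sequences.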

Drawing the Sunburst

library(sunburstR)

sequence_data <- read.csv(
  paste0(
    "https://gist.githubusercontent.com/kerryrodden/7090426/",
    "raw/ad00fcf422541f19b70af5a8a4c5e1460254e6be/visit-sequences.csv"
  )
  ,header = FALSE
  ,stringsAsFactors = FALSE
)

sunburst(sequence_data)


How the Sunburst Percentages Are Calculated

index = grep("^home-home",sequence_data$V1) # sequences that start with home-home
sum(sequence_data$V2[index])/sum(sequence_data$V2)

[1] 0.06031
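The same calculation is easier to check by hand on a small table. The `toy` data frame below is invented for illustration and only mirrors the `V1`/`V2` column layout of `sequence_data`.

```r
# Toy path/frequency table (made-up values, same column layout as sequence_data)
toy <- data.frame(
  V1 = c("home-home-end", "home-product-end",
         "search-home-end", "home-home-product-end"),
  V2 = c(10, 30, 40, 20),
  stringsAsFactors = FALSE
)

# Rows whose path starts with home-home
index <- grep("^home-home", toy$V1)

# Share of the total frequency that passes through the home-home prefix
share <- sum(toy$V2[index]) / sum(toy$V2)
print(share)
```

Only rows 1 and 4 start with `home-home`, so the share is (10 + 20) / 100. The percentage computed this way is the frequency of all paths sharing a prefix, divided by the total frequency.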