R SOM Cluster
Machine Learning

SOM

Self-Organizing Map
Works on a principle similar to k-means
A kind of neural network (with an input layer, weights, and an output layer)
Converges quickly

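How it works

Number of updates = number of input samples
Neurons compete with each other to have their weights updated
When the winning neuron's weights are updated, its neighboring neurons are updated as well; all other neurons are left unchanged
The weights start out random

Input X (x1, x2, ..., xn) -> compute the distance between each x and the neurons (smaller distance = stronger stimulus) -> update the weights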
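
The flow above can be sketched in base R as a single pass over toy data. This is only an illustration under stated assumptions, not how kohonen implements it: the learning rate `alpha`, the fixed `radius`, the 3 x 3 rectangular grid, and the random data are all made up here, and a real SOM would normally shrink the learning rate and radius as training goes on.

```r
# Minimal sketch of one SOM training pass on toy data (base R only).
set.seed(1)
X <- matrix(rnorm(40), ncol = 2)                  # 20 made-up samples, 2 features
grid <- as.matrix(expand.grid(x = 1:3, y = 1:3))  # positions of 9 neurons on a 3 x 3 grid
W <- matrix(runif(nrow(grid) * ncol(X)), ncol = ncol(X))  # random initial weights

alpha  <- 0.5   # learning rate (assumed, kept constant for simplicity)
radius <- 1     # neighborhood radius on the grid (assumed, kept constant)

for (i in seq_len(nrow(X))) {                     # one update per input sample
  x <- X[i, ]
  d <- rowSums(sweep(W, 2, x)^2)                  # squared distance from x to every neuron
  winner <- which.min(d)                          # smallest distance wins the competition
  g <- sqrt(rowSums(sweep(grid, 2, grid[winner, ])^2))  # grid distance to the winner
  near <- g <= radius                             # the winner and its grid neighbors
  Wn <- W[near, , drop = FALSE]
  W[near, ] <- Wn - alpha * sweep(Wn, 2, x)       # move only those neurons toward x
}
W
```

Each row of `W` ends up as the weight vector of one neuron; neurons outside the winner's neighborhood are never touched in a given step.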

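Advantages
Has the characteristics and strengths of neural networks, such as parallel processing, distributed storage, and fault tolerance
Through competitive learning, training the weights automatically yields the center of each cluster
No need to fix the number of clusters in advance (more precisely, you can set a generous number of clusters; it does not have to be exact)
Supports a large number of clusters, is effective at isolating anomalous data, and the network converges quickly during training
Disadvantages
With little input data, the clustering result depends on the order in which the data is presented
Unlike an adaptive resonance theory (ART) network, new classes cannot be added before training has finished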

kohonen in R

Install and load kohonen






install.packages("kohonen")
library(kohonen)
# wines appears to be a data set bundled with kohonen
# you could also try the iris data set
data(wines)
head(wines, 5)

Take a first look at the data



summary(wines)
nrow(wines)
ncol(wines)

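There are 177 wine records in total, each with 13 attributes
The SOM model has to be trained, so split the data into a training set and a validation set (8:2)
The training set uses 140 records (177 x 0.8 = 141.6, rounded down to 140)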
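
The split arithmetic can be checked quickly in R. The sketch below hard-codes the 177 rows reported above; note that a strict floor of 80% gives 141, so using 140 is a slightly rounder choice rather than an exact 8:2 split.

```r
n.total <- 177                     # rows reported by nrow(wines)
n.train <- floor(n.total * 0.8)    # 141 with a strict 8:2 split
n.test  <- n.total - n.train       # 36
c(train = n.train, test = n.test)

n.total - 140                      # 37 validation rows when 140 are used for training
```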

Training set, validation set, and scaling




som.data.idx <- sample(nrow(wines), 140)
som.training.data <- wines[som.data.idx, ]
som.testing.data <- wines[-som.data.idx, ]
# som() below is trained on som.training.data.scale, so scale the training set here
som.training.data.scale <- scale(som.training.data)
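
As a rough illustration of what `scale()` does, the sketch below standardizes a tiny made-up matrix (not the wine data) and then reuses the stored centering and scaling values on a new row. The variable names are hypothetical.

```r
# Toy matrix standing in for a training set (values are made up)
train <- matrix(c(1, 2, 3, 4,
                  10, 20, 30, 40), ncol = 2)
train.scale <- scale(train)        # subtract each column mean, divide by each column sd

colMeans(train.scale)              # both columns now have mean 0
apply(train.scale, 2, sd)          # and standard deviation 1

# scale() stores the parameters it used, so new rows can be put on the same scale
ctr <- attr(train.scale, "scaled:center")
scl <- attr(train.scale, "scaled:scale")
new.rows  <- matrix(c(2.5, 25), ncol = 2)
new.scale <- scale(new.rows, center = ctr, scale = scl)
new.scale                          # the column means of train map to 0
```

Because attributes measured on very different scales would otherwise dominate the distance calculation, standardizing before training an SOM is a common choice.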

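Set the training parameters

data = the scaled training data
somgrid(x_dim, y_dim, grid topology, neighborhood function)
Here x_dim = y_dim = 5, giving 25 clusters in total (judging by the results below, that looks like too many)
The grid topology can be hexagonal ("hexagonal") or rectangular ("rectangular")
The neighborhood function can be "gaussian" or "bubble"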
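
To make the difference between the two neighborhood functions concrete, here is a base R sketch on a 5 x 5 rectangular grid. The formulas are the textbook forms and the `radius` value is an assumption; kohonen's internal definitions may differ in detail, and a hexagonal grid would use offset rows that are not shown here.

```r
# Neuron coordinates on a 5 x 5 rectangular grid
grid <- as.matrix(expand.grid(x = 1:5, y = 1:5))
centre <- 13                                          # middle unit, at (3, 3)
d <- sqrt(rowSums(sweep(grid, 2, grid[centre, ])^2))  # grid distance to that unit

radius <- 1.5                                 # assumed neighborhood radius
bubble   <- as.numeric(d <= radius)           # all-or-nothing: inside the radius or not
gaussian <- exp(-d^2 / (2 * radius^2))        # update strength decays smoothly with distance

matrix(bubble, 5, 5)                          # a 3 x 3 block of ones around the centre
round(matrix(gaussian, 5, 5), 2)
```

With a bubble neighborhood every unit inside the radius is updated equally, while a gaussian neighborhood updates every unit a little, with the strongest effect near the winner.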


som.model <- som(som.training.data.scale, grid = somgrid(5,5,"hexagonal","gaussian"))
summary(som.model)

The smaller the mean distance reported by summary(), the better
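
One way to read that number: it summarizes how far each sample lies from its closest map unit. The sketch below computes such a mean distance by hand for made-up data and made-up codebook vectors; `codes`, `closest`, and `dist.closest` are hypothetical names, and kohonen may use a different distance convention (for example squared distances).

```r
set.seed(2)
X <- matrix(rnorm(60), ncol = 3)         # 20 toy samples (made up)
codes <- X[sample(nrow(X), 4), ]         # 4 hypothetical codebook vectors

# Squared Euclidean distance from every sample (rows) to every unit (columns)
d2 <- sapply(seq_len(nrow(codes)),
             function(k) rowSums(sweep(X, 2, codes[k, ])^2))

closest <- apply(d2, 1, which.min)       # best-matching unit for each sample
dist.closest <- sqrt(d2[cbind(seq_len(nrow(X)), closest)])
mean(dist.closest)                       # lower = samples sit closer to their units
```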

Plot the results






plot(som.model, type="codes")
plot(som.model, type="counts")
plot(som.model, type="quality")
plot(som.model, type="mapping")
plot(som.model, type="changes")
plot(som.model, type="dist.neighbours")

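codes : the distribution of attribute contributions (weights) in each cluster
mapping : which cluster (map unit) each sample is mapped to
counts : how many samples fall in each cluster
quality : the mean distance between the samples in a cluster and the cluster center (smaller is better: the samples are more tightly grouped)
changes : the mean distance to the closest unit over the training iterations, useful for judging whether training has converged
dist.neighbours : the sum of the distances from each unit to its neighboring units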
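
The counts and quality views are essentially per-unit summaries. The sketch below recomputes them by hand from hypothetical assignments; the names `unit.classif` and `distances` mirror components of a fitted kohonen model, but the values here are made up and the map has only 4 units.

```r
# Hypothetical assignments of 8 samples to a 4-unit map, with their distances
unit.classif <- c(1, 1, 2, 4, 4, 4, 2, 1)
distances    <- c(0.2, 0.4, 1.0, 0.3, 0.5, 0.1, 0.6, 0.3)

units <- factor(unit.classif, levels = 1:4)   # keep empty units (unit 3) visible
counts  <- table(units)                       # number of samples per unit
quality <- tapply(distances, units, mean)     # mean distance to the unit's center

counts     # unit 3 has a count of 0
quality    # unit 3 is NA because no sample was mapped to it
```

Empty units are one hint that the grid may be larger than the data needs.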

[Figures: codes, counts, quality, mapping, changes, and dist.neighbours plots]

Prediction



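# Scale the test set with the training set's column means and standard deviations
som.testing.data.scale <- scale(som.testing.data, center = colMeans(som.training.data),
                                scale = apply(som.training.data, 2, sd))
som.pred <- predict(som.model, newdata = som.testing.data.scale)
som.pred$unit.classif  # map unit (cluster) assigned to each test sample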


tags: `R` `beginner` `cat` `tutorial`