
Account analysis: hash string clustering 201223


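This week's experiments

  • KMeans feature adjustments

    • Use the value of hash all and word_length (string length) as features
    • Reason for the adjustment: the two features are presumed to be highly correlated
  • Concatenate accounts with custom serial numbers to the original data, then use the KMeans clustering result to check whether accounts with similar serial numbers or similar strings end up in different clusters

  • Refine the initial clustering result further using minimum distance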
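
The two features listed above can be sketched as follows. This is only a minimal illustration: the notes do not say which hash function is used, so the MD5 prefix in `hash_value` is an assumption, and the function names are hypothetical.

```python
import hashlib
from typing import Tuple


def hash_value(account: str) -> int:
    """Map an account string to a stable integer.

    Assumption: MD5, keeping the first 8 hex digits (a 32-bit value).
    Python's built-in hash() is avoided because it is salted per process.
    """
    digest = hashlib.md5(account.encode("utf-8")).hexdigest()
    return int(digest[:8], 16)


def account_features(account: str) -> Tuple[int, int]:
    """Return the two features from the notes: (hash value, string length)."""
    return hash_value(account), len(account)


if __name__ == "__main__":
    for name in ["hm00001", "ssu1ssu", "asd1asd"]:
        print(name, account_features(name))
```

Swapping in a different hash only requires changing `hash_value`; the string-length feature is unaffected.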

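Experimental method

  • Create three types of accounts, 50 of each, for 150 records in total, following the rules below.
    1. 'hm0000' + serial number
    2. 'ssu' + serial number + 'ssu'
    3. 'asd' + serial number + 'asd'
    • (serial number range: 0~50)
  • Take 150 records from the original data as well and concatenate them with the serial-number accounts, giving 300 accounts as test data
  • Convert the filtered accounts to numeric values with a hash function, and use this hash value together with the account length as the KMeans features
  • Cluster with KMeans and export the result for inspection
  • After the initial clustering, build a minimum edit distance matrix for each cluster
  • Feed the distance matrix to hierarchical clustering to produce a dendrogram
  • Finally, inspect the clustering results of the three clusters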
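
The data construction and KMeans steps above might look roughly like this. It is a sketch under several assumptions, not the code behind the reported results: serial numbers run 0..49 so that each rule yields exactly 50 accounts, the hash is an MD5 prefix, the features are standardized before clustering, and the "original" accounts in the usage block are placeholders because the real data is not shown in these notes.

```python
import hashlib

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def make_serial_accounts(n_per_rule=50):
    """Build the three serial-number account families described above."""
    accounts = []
    for i in range(n_per_rule):  # assumption: serials 0..49, to get 50 per rule
        accounts += [f"hm0000{i}", f"ssu{i}ssu", f"asd{i}asd"]
    return accounts


def build_test_data(original_accounts, n_original=150):
    """Concatenate a sample of the original accounts with the serial ones."""
    return list(original_accounts[:n_original]) + make_serial_accounts()


def to_features(accounts):
    """Two features per account: hash value (MD5 prefix, an assumption) and length."""
    rows = []
    for name in accounts:
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()
        rows.append((int(digest[:8], 16), len(name)))
    return np.array(rows, dtype=float)


def cluster_accounts(accounts, n_clusters=3, seed=0):
    """Scale the features (an assumption) and return one KMeans label per account."""
    X = StandardScaler().fit_transform(to_features(accounts))
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(X)


if __name__ == "__main__":
    # Placeholder stand-ins for the real original accounts, which are not shown here.
    original = [f"user{i:03d}x" for i in range(150)]
    accounts = build_test_data(original)
    labels = cluster_accounts(accounts)
    for name, label in list(zip(accounts, labels))[:5]:
        print(label, name)
```

The labels returned by `cluster_accounts` can be written out next to the account names to inspect which serial-number families land in the same cluster.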

Experimental results

KMeans clustering (3 clusters)

  • Judging from the results, this feature combination works well

Hierarchical clustering

  • group 0 (splits into three major clusters)

  • group 1 (only one major cluster)

  • group 2 (splits into two major clusters)

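  • Judging from this experiment, the approach works very well, and the computation cost is lower too, since edit distances are computed only within each KMeans cluster rather than across all accounts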
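
The refinement step for one KMeans group (edit distance matrix, then hierarchical clustering) can be sketched as below. The linkage method is not stated in these notes, so `method="average"` is an assumption, and `split_group` is a hypothetical helper name.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def distance_matrix(accounts):
    """Symmetric matrix of pairwise edit distances with a zero diagonal."""
    n = len(accounts)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = edit_distance(accounts[i], accounts[j])
    return D


def split_group(accounts, n_subgroups):
    """Cut the hierarchical clustering tree into at most n_subgroups flat clusters."""
    condensed = squareform(distance_matrix(accounts))  # linkage expects condensed form
    Z = linkage(condensed, method="average")           # assumption: average linkage
    return fcluster(Z, t=n_subgroups, criterion="maxclust")


if __name__ == "__main__":
    group = ["ssu1ssu", "ssu2ssu", "asd1asd", "asd2asd"]
    print(split_group(group, 2))
```

The linkage matrix `Z` is also what `scipy.cluster.hierarchy.dendrogram` takes to draw the tree. Because the matrix holds n × n entries, building it per group keeps n small compared with running it over all accounts at once.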

KMeans cluster-count experiments

KMeans clustering (6 clusters) (results are acceptable)

KMeans clustering (9 clusters) (starting to overfit)

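The cluster counts tried above (3, 6, 9) were judged by inspecting the output. As one possible complement, the sketch below puts a number on each setting with the silhouette score. That metric is a swapped-in suggestion, not something used in these experiments. The MD5-prefix hash and the feature scaling are likewise assumptions, and only the serial-number accounts are generated because the original data is not shown here.

```python
import hashlib

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def to_features(accounts):
    """Two features per account: hash value (MD5 prefix, an assumption) and length."""
    rows = []
    for name in accounts:
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()
        rows.append((int(digest[:8], 16), len(name)))
    return np.array(rows, dtype=float)


def compare_cluster_counts(accounts, candidates=(3, 6, 9), seed=0):
    """Fit KMeans once per candidate k and return {k: silhouette score}."""
    X = StandardScaler().fit_transform(to_features(accounts))
    scores = {}
    for k in candidates:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(X)
        scores[k] = float(silhouette_score(X, labels))
    return scores


if __name__ == "__main__":
    # Serial-number accounts only; serials 0..49 are an assumption.
    accounts = []
    for i in range(50):
        accounts += [f"hm0000{i}", f"ssu{i}ssu", f"asd{i}asd"]
    for k, score in compare_cluster_counts(accounts).items():
        print(k, round(score, 3))
```

A higher silhouette score indicates tighter, better separated clusters in feature space; it does not by itself say whether similar-looking accounts were grouped together, so the exported cluster listings still need to be checked.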

Next week's plan

  • Try a larger dataset, observe which factors behave differently as the data size changes, and adjust accordingly
tags: Progress Report