COSCUP
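# Looking Back: How KKBOX Built Its Music Search - Eason Chen

{%hackmd tGp8pt49Q2aT5cD5TFyaIA %}

* Lessons from upgrading Elasticsearch from 1.6 to 5
* Basically a log of the pitfalls we stepped into through sheer bad luck

## FAQ

* Is there a free account?
  > No. The developers need their year-end bonuses.
* Doesn't KKBOX just play songs? What else is there to do?
  > There is now a TV-series streaming service, [KKTV](https://www.kktv.me/), and Live
* You have 40 million songs, so why can't I find the one I'm looking for?
  - Licensing and contract-renewal issues

## Where search is used

- Online search page
  - Results are sorted by popularity
- Music recognition search: recording or humming, powered by [ACRCloud](https://www.acrcloud.com/)
- CMS back-office search: the data lives in MySQL, whose full-text search is poor
- [KKBOX Open API](https://api.kkbox.com/v1.1/search)
  - [Documentation](https://docs-zhtw.kkbox.codes/docs)

The music catalog lives in MySQL, and its performance is unreliable. Relevance-based full-text search there (1) accumulates a large volume of query data that slows everything down and (2) is not accurate. Search should be left to a dedicated tool.

### Applications

- Chatbot
- [KUBE](https://medium.com/kubeapp/%E8%AB%8B%E4%BE%86%E8%A9%A6%E8%A9%A6%E6%88%91%E5%80%91%E7%9A%84%E6%96%B0-app-kube-%E9%85%B7%E6%92%AD%E6%AD%8C%E5%96%AE-ccf9612952ca) (free, no ads)
  - [KUBE official site](https://www.kube-app.com/today)

### Statistics

- 40% of users find songs they like through search
- 80 to 90 million searches per month

## Elasticsearch as the core of the current search architecture

- Migration started in 2014
- ES was still at 1.6 back then
- Main reasons: the boss said "let there be ES," and there was ES; development is faster; and there is no legacy baggage

## MeCab Japanese tokenizer

- Japanese users habitually type romaji to search for artists
  - Example: for **西野**加奈, typing **Nishino** should autocomplete the rest
- New-word dictionary provided by LINE Data Labs
  - https://engineering.linecorp.com/ja/blog/detail/109

## OpenCC Traditional/Simplified Chinese conversion

- https://github.com/BYVoid/OpenCC
- Whatever the query, everything is converted to Taiwan Traditional Chinese before searching

## Error tolerance

- English words are easy to misspell
- Solutions
  - The Term Suggester in ES finds the most likely string by edit distance (the number of edits needed to turn one string into the other)
  - Put the designated index in a single shard

The fewer the edits, the closer the strings; the more edits, the more different they are. For example, searching "Dangerous" returns "Danger Mouse": there is no exact match, so the closest one is returned.

## Autocomplete

- Edge n-gram
  - Split from the start of the term, e.g. 周杰倫 -> 周, 周杰, 周杰倫
- Context Suggester
  - FST

## Data processing

* Lowercase: `lowercase`
* ASCII: `asciifolding`
* Word stems: `porter_stem`
* Dashes and abbreviation symbols: `word_delimiter`
* Emoji: emojione, converted back to English names, with special markers added before and after to set them apart

> Sounds impressive: you can search by emoji
> Arabic numerals and Chinese numerals are also mapped to each other

## Deployment

- ES cluster, Consul, Gearman Worker
- Node allocation
  - 3 servers for master nodes
  - 16 for data nodes

If a node that is both a data node and a master node fails, there is a brief gap, so the roles are kept separate: masters only do master work.
Cron server to Gearman worker
The index is currently updated once an hour, which affects search results. Update speed and correctness trade off against each other, so you have to pick your own balance point.

### Why Consul was introduced

- To fix requests going to the first server by default
- To route traffic randomly across different servers

### Upgrading ES to 5.6

- Set up a CI/CD environment before upgrading (currently GitLab)
- Upgrade timeline
  1. 2014/6: ES 1.6
  2. 2017/10: ES 2.4
  3. 2018/03: ES 5.6
- The official recommendation is to upgrade one major version at a time
  - Internal index compatibility is better

## Upgrade plan

> Schema inconsistencies

- Functional testing
- Parallel client migration
- Stress testing
  - Gearman workers simulate searches

Run continuously for 5 days and confirm that the search results and the differences match expectations.

### Fallback plan

- Must be able to roll back at any time

### Upgrade benefits

- Lower CPU and RAM usage
- Lower peak JVM usage
- Average API response time down 30 ms
- Index update time cut by 33%

### Why not upgrade to 6.x?

We are still waiting and watching, to "let the bullets fly for a while." The cost of upgrading is high.

## Monitoring

Tools we have used:
* Munin
* Kibana + Marvel
* Grafana

### Munin

The graphs look dated, and graphs drop out when the server is busy.

### Kibana + Marvel

Rich performance metrics, an easy-to-use interface, and a stable data source.

### Kibana vs Grafana

#### Kibana

Data source: ES

#### Grafana

Data sources:
- Prometheus
  - [ES Exporter](https://github.com/justwatchcom/elasticsearch_exporter)
- Alert conditions can be set on metrics
- Multiple notification channels (Slack, LINE)

Grafana's interface looks like a war room and is easy to modify.
https://grafana.com/dashboards/7259: the most complete ES dashboard on Earth
GitHub source code: https://github.com/KKBOX/grafana-elasticsearch-dashboard
- With this in place, we keep better track of incidents

## The complete pitfall log

### Case 1 - Swap issue

One day while I was playing video games, Slack suddenly started spewing error notifications, so the first thing I did was reboot.
Server 5 went down and server 4 took over. Server 4's I/O performance was not up to par, so problems appeared when it took over playlist queries.
Server 5 had started using swap, and its performance dropped sharply.
[Disable swap](https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration-memory.html): swap is a performance killer, and the official documentation recommends turning swapping off.
- Different Ubuntu versions ship different swap settings, so we simply disabled swap everywhere. This made machines less likely to be kicked out of the cluster.

### Case 2 - Uneven machine load

No two machines will ever carry exactly the same load. Here, index documents were unevenly distributed.
- Problem: some machines had especially high CPU usage
- Investigation: it was related to how index documents were allocated
- Solutions
  - Manual shard intervention ([Reroute API](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-reroute.html))
  - Refactoring the song and playlist queries

Hot data was spread out, and the refactoring reduced the compute load on the machines. After the adjustments, we went back to automatic shard allocation.

### Case 3 - Search performance degradation

- Search performance dropped after the machines had been running for a while
- Investigation
  - Time spent in GC kept rising the longer the JVM ran
  - Groovy has a memory leak problem
  - Set up machines on AWS to test each feature individually
    - The problem was in playlists
  - Removing the Groovy query from playlist search recovered a good deal of performance
  - Dropped Groovy in favor of (please fill in)

## Q&A

### Q1: You already use Prometheus, so why use Grafana's alerts?
A: We didn't specifically look into Prometheus alerting. Grafana's worked, so we just used it.

### Q2: Will you add more ways to search in the future, such as by genre or audio features?
A: Genre search is planned and in the design stage. Audio is high-dimensional data, which makes it somewhat problematic to search.

### Q3: In Taiwan, consulting the vendor directly is fairly uncommon. Do you plan to get help from Elastic?
A: It's open source anyway. We like building things ourselves and challenging ourselves.

### Q4: Searching in Traditional Chinese also returns Simplified Chinese content?
A: Only the index is converted from Simplified to Traditional. The data in the track database is not affected.

### Q5: Migrating from 1.x to 5.x, did you run into changes in the search syntax, e.g. filters no longer working?
A: We did it in stages.
1.6 to 2.4: 1.6 had filters; in 2.4 we aimed to convert everything to queries.
2.4 to 5.6: the scope syntax changed a lot.

---

## Chat area

> Looks like a lot of people are editing, so here's a mini chat room at the bottom. Feel free.

Will the microphone recording be made available? (the music just finished playing)
I notice the bullets have a `*` faction and a `-` faction
As long as it looks right. That's just how Markdown is designed XD
I didn't catch how the context suggester works. Can anyone fill in?
He only said it uses an FST
OK :)
![](https://www.elastic.co/assets/bltced6bd71d5fa33ab/suggest_1.png?uid=bltced6bd71d5fa33ab)
> FST: https://www.elastic.co/blog/you-complete-me

Why do song titles have emoji?
because stuffs and reasons
> Usually only playlists have emoji :P

Suggestion: don't type overly long lines, so others can help fill things in
Starting with bullet points makes it pretty hard to add things in ORZ
"Translate it for me: what is an upgrade!"
The boss said "let there be an upgrade," and there was an upgrade

###### tags: `COSCUP2018` `misc`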
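
The **MeCab** section says Japanese users type romaji, so typing **Nishino** should autocomplete to 西野加奈. The sketch below is a minimal version of that lookup. It assumes every artist already has a pre-computed romaji reading; the notes do not say how KKBOX derives it. The readings and popularity scores here are hypothetical, hard-coded values, and matches are ranked by popularity as the online search page does.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Artist:
    name: str
    romaji: str      # pre-computed reading (hypothetical; derivation not described in the notes)
    popularity: int  # made-up popularity score used for ranking


# Hypothetical catalog entries; readings and scores are illustrative only.
CATALOG = [
    Artist("西野加奈", "nishino kana", 950),
    Artist("西野七瀬", "nishino nanase", 700),
    Artist("宇多田ヒカル", "utada hikaru", 990),
]


def romaji_complete(prefix: str, catalog: List[Artist] = CATALOG, limit: int = 5) -> List[str]:
    """Return names whose romaji reading starts with the typed prefix, most popular first."""
    p = prefix.strip().lower()
    hits = [a for a in catalog if a.romaji.startswith(p)]
    hits.sort(key=lambda a: a.popularity, reverse=True)
    return [a.name for a in hits[:limit]]


print(romaji_complete("Nishino"))  # ['西野加奈', '西野七瀬']
```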

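The **OpenCC** section and the answer to Q4 describe converting everything to Taiwan Traditional Chinese before searching, on the index side and the query side alike. The sketch below only illustrates why both sides need the same normalization. It uses a tiny, hypothetical character table; real OpenCC works with phrase-level dictionaries (for example its `s2twp` configuration) and should be used instead of a hand-written map.

```python
# Deliberately tiny, hypothetical Simplified -> Traditional character map.
# Real conversion should use OpenCC's phrase-aware dictionaries instead.
S2T = str.maketrans({"伦": "倫", "乐": "樂", "爱": "愛", "听": "聽"})


def normalize_zh(text: str) -> str:
    """Convert known Simplified characters to Traditional; leave everything else alone."""
    return text.translate(S2T)


# Apply the same normalization to catalog entries (index side) and to queries.
catalog = ["周杰伦", "周杰倫"]  # the same artist written in two scripts
index = {normalize_zh(name) for name in catalog}
print(index)                           # {'周杰倫'}
print(normalize_zh("周杰伦") in index)  # True
```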

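The **Error tolerance** section explains that the Term Suggester ranks candidates by edit distance. The sketch below shows plain Levenshtein distance, where each insertion, deletion or substitution costs 1, and picks the nearest term from a hypothetical term list. It only illustrates the metric. The ES term suggester has its own options and scoring; for example, `max_edits` only accepts 1 or 2.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def closest(query: str, candidates: List[str]) -> str:
    """Pick the candidate with the smallest edit distance (case-insensitive)."""
    q = query.lower()
    return min(candidates, key=lambda c: levenshtein(q, c.lower()))


print(levenshtein("kitten", "sitting"))  # 3
# Hypothetical term list: "dangerous" has no exact match, so the nearest term wins.
print(closest("Dangerous", ["danger", "mouse", "angel"]))  # danger
```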

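The **Autocomplete** section describes edge n-grams as splitting from the start of the term (周杰倫 -> 周, 周杰, 周杰倫). The sketch below mimics that at index time: every prefix is stored as a key, so a partially typed query becomes a direct lookup. The `min_gram`/`max_gram` names mirror the parameters of Elasticsearch's `edge_ngram` tokenizer, but the default values and the in-memory dictionary are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set


def edge_ngrams(term: str, min_gram: int = 1, max_gram: int = 10) -> List[str]:
    """Prefixes of `term` from min_gram up to max_gram characters."""
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]


def build_prefix_index(names: Iterable[str]) -> Dict[str, Set[str]]:
    """Map every edge n-gram to the names that produced it."""
    index: Dict[str, Set[str]] = {}
    for name in names:
        for gram in edge_ngrams(name):
            index.setdefault(gram, set()).add(name)
    return index


print(edge_ngrams("周杰倫"))  # ['周', '周杰', '周杰倫']
index = build_prefix_index(["周杰倫", "周華健"])
print(sorted(index["周"]))   # ['周杰倫', '周華健']
```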

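The **Data processing** list names Elasticsearch's built-in `lowercase`, `asciifolding`, `porter_stem` and `word_delimiter` token filters, and the **Autocomplete** section names edge n-grams. The sketch below shows one way they could be combined into index settings, written as a Python dict that could be sent as the body of a create-index request (`PUT /<index>`). The analyzer and tokenizer names, the gram sizes and the filter order are assumptions, not KKBOX's actual configuration. emojione is not a built-in filter, so it is left out.

```python
import json

# Hypothetical index settings; analyzer/tokenizer names and gram sizes are assumptions.
SETTINGS = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "autocomplete_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "song_name": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    # word_delimiter runs first: it splits on case changes, which lowercase would erase.
                    "filter": ["word_delimiter", "lowercase", "asciifolding", "porter_stem"],
                },
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "autocomplete_tokenizer",
                    "filter": ["lowercase", "asciifolding"],
                },
            },
        }
    }
}

print(json.dumps(SETTINGS, indent=2, ensure_ascii=False))
```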

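The **Data processing** section converts emoji back to English names with markers around them, and it also mentions a numeral mapping. A rough sketch of both follows. It makes several substitutions and assumptions:

- It uses Python's `unicodedata.name` in place of emojione's names.
- It treats Unicode category `So` as "emoji", which also catches some non-emoji symbols.
- The `__name__` marker is hypothetical.
- The digit-by-digit numeral mapping is naive.

```python
import unicodedata
from typing import List

# Naive digit-by-digit mapping (hypothetical; "10" becomes "一〇", not "十").
CN_DIGITS = str.maketrans("0123456789", "〇一二三四五六七八九")


def emoji_token(ch: str) -> str:
    """Turn one symbol into a marked English-name token, e.g. 🎉 -> __party_popper__."""
    name = unicodedata.name(ch, "").lower().replace(" ", "_")
    return f" __{name}__ " if name else " "


def normalize_title(title: str) -> str:
    """Replace emoji (approximated as Unicode category 'So') with marked name tokens."""
    out = [emoji_token(ch) if unicodedata.category(ch) == "So" else ch for ch in title]
    return " ".join("".join(out).split())  # collapse the extra spaces


def with_number_variants(text: str) -> List[str]:
    """Index both the original text and its digit-converted form, when they differ."""
    converted = text.translate(CN_DIGITS)
    return [text] if converted == text else [text, converted]


print(normalize_title("Party 🎉 Mix"))  # Party __party_popper__ Mix
print(with_number_variants("第2名"))     # ['第2名', '第二名']
```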

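The **Why Consul was introduced** section says requests used to go to the first server by default, and Consul is used to spread traffic randomly. The sketch below shows only the selection step. It assumes the list of healthy nodes has already been obtained; in practice that could come from Consul's health endpoint (`/v1/health/service/<service>?passing`), which is not called here. The hostnames are hypothetical.

```python
import random
from collections import Counter
from typing import Optional, Sequence


def pick_node(healthy_nodes: Sequence[str], rng: Optional[random.Random] = None) -> str:
    """Choose a healthy node uniformly at random instead of always using the first one."""
    if not healthy_nodes:
        raise RuntimeError("no healthy Elasticsearch nodes available")
    if rng is None:
        return random.choice(healthy_nodes)
    return rng.choice(healthy_nodes)


# Hypothetical node list; in the talk's setup Consul supplies the healthy nodes.
nodes = ["es-data-01:9200", "es-data-02:9200", "es-data-03:9200"]
rng = random.Random(42)  # seeded only to make the demo repeatable
counts = Counter(pick_node(nodes, rng) for _ in range(3000))
print(counts)  # each node gets roughly a third of the 3000 picks
```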

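The **Upgrade plan** runs old and new clients in parallel for 5 days and checks that the result differences match expectations. The notes do not say how the differences were measured. The sketch below assumes one simple option: overlap between the old and new top-k result IDs per query, flagging queries below a threshold. The metric, the 0.8 threshold and the sample data are all hypothetical.

```python
from typing import Dict, List


def overlap_at_k(old: List[str], new: List[str], k: int = 10) -> float:
    """Fraction of the old top-k IDs that also appear in the new top-k."""
    old_top = old[:k]
    if not old_top:
        return 1.0 if not new[:k] else 0.0
    return len(set(old_top) & set(new[:k])) / len(old_top)


def flag_regressions(results_old: Dict[str, List[str]], results_new: Dict[str, List[str]],
                     k: int = 10, threshold: float = 0.8) -> List[str]:
    """Queries whose new results overlap the old ones less than `threshold`."""
    return sorted(q for q in results_old
                  if overlap_at_k(results_old[q], results_new.get(q, []), k) < threshold)


# Hypothetical result IDs from the old (ES 2.4) and new (ES 5.6) clusters.
old = {"周杰倫": ["s1", "s2", "s3"], "dangerous": ["s9", "s8"]}
new = {"周杰倫": ["s1", "s3", "s2"], "dangerous": ["s7", "s8"]}
print(flag_regressions(old, new, k=3))  # ['dangerous']
```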

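**Case 1** ends with swap disabled on every node. The Elasticsearch memory-configuration page linked there lists three approaches: disabling swap (for example with `sudo swapoff -a`), lowering `vm.swappiness`, or enabling `bootstrap.memory_lock`. The sketch below checks whether a Linux node still has swap by parsing `/proc/meminfo`. The warning text is only illustrative.

```python
from pathlib import Path
from typing import Dict


def swap_kb(meminfo_text: str) -> Dict[str, int]:
    """Extract SwapTotal and SwapFree (in kB) from /proc/meminfo-style text."""
    values: Dict[str, int] = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("SwapTotal", "SwapFree") and rest.split():
            values[key] = int(rest.split()[0])
    return values


sample = "MemTotal:       16384000 kB\nSwapTotal:       2097148 kB\nSwapFree:        1048576 kB\n"
print(swap_kb(sample))  # {'SwapTotal': 2097148, 'SwapFree': 1048576}

meminfo = Path("/proc/meminfo")
if meminfo.exists():  # Linux only
    swap = swap_kb(meminfo.read_text())
    if swap.get("SwapTotal", 0) > 0:
        used = swap["SwapTotal"] - swap.get("SwapFree", 0)
        print(f"swap is enabled ({used} kB in use); consider disabling it on ES nodes")
    else:
        print("swap is disabled")
```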

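**Case 2** fixes uneven load with manual shard moves through the Reroute API. The sketch below builds, but does not send, a `POST /_cluster/reroute` request containing one `move` command (fields `index`, `shard`, `from_node`, `to_node`). The index name, node names and URL are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List


def move_shard_command(index: str, shard: int, from_node: str, to_node: str) -> Dict:
    """One `move` command for the cluster reroute API."""
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}


def reroute_request(es_url: str, commands: List[Dict]) -> urllib.request.Request:
    """Build (but do not send) a POST /_cluster/reroute request."""
    body = json.dumps({"commands": commands}).encode("utf-8")
    return urllib.request.Request(
        f"{es_url.rstrip('/')}/_cluster/reroute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical names: move a busy playlist shard from an overloaded node to a quieter one.
req = reroute_request("http://localhost:9200",
                      [move_shard_command("playlists", 0, "es-data-05", "es-data-12")])
print(req.get_method(), req.full_url)  # POST http://localhost:9200/_cluster/reroute
print(req.data.decode())
# urllib.request.urlopen(req) would actually send it to the cluster.
```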
