HWDC 2024 - When Elasticsearch Meets AI
Reference information for this workshop
Workshop environment preparation
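- Elasticsearch 8.15 or later
  - Recommended: sign up for Elastic Cloud with an email address to get a free 14-day trial environment, no credit card required.
  - Self-hosted: the version must be 8.15.0 or later, and it must be a clean environment used only for this workshop.
    During the workshop we cannot help troubleshoot problems caused by self-hosted setups, so self-hosting is not recommended. If you do self-host, make sure you know Elasticsearch well enough to resolve issues on your own.
- The hands-on exercises use Google Colab
  - Sign in with your Google account and follow the instructions below to copy the workshop's Python Notebook to your Google Drive.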
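For a self-hosted cluster, the 8.15.0 minimum can be checked against the version.number value that Elasticsearch reports at its root endpoint (GET /). The following is a minimal sketch of that comparison only; the helper names are made up for illustration and are not part of the workshop notebook.

```python
import re

# Minimum version taken from the workshop requirement (8.15.0 or later).
MINIMUM_VERSION = (8, 15, 0)


def parse_version(version: str) -> tuple:
    """Turn a string such as '8.15.3' or '8.16.0-SNAPSHOT' into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version: str, minimum: tuple = MINIMUM_VERSION) -> bool:
    """Return True when the version is at least the workshop minimum."""
    return parse_version(version) >= minimum


# Hypothetical values, as they would appear in version.number.
for candidate in ("8.15.0", "8.14.3"):
    print(candidate, "ok" if meets_minimum(candidate) else "too old")
```

Comparing integer tuples rather than raw strings avoids the classic mistake where "8.9.0" sorts after "8.15.0".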
1. Prepare the Elasticsearch environment
Depending on your environment, complete the following tasks.
Using Elastic Cloud:
- Get the Cloud ID from Elastic Cloud
- Install the Elasticsearch Analysis ICU plugin and add a Machine Learning instance (restarting the cluster takes about 5 minutes)
- Create an API key in Kibana
Self-hosted Elasticsearch:
- Install the Elasticsearch Analysis ICU plugin
- If Security is enabled, have a username and password ready that can access Elasticsearch
1.1 Environment preparation for Elastic Cloud
a. Sign up for Elastic Cloud
Check your inbox for the confirmation email.
After verifying your address and signing in again, you enter the setup flow.
You can choose Search (optional).
Consider choosing the Google Cloud Taiwan region, which may be somewhat faster (optional).
After you click Launch, wait about 3 to 5 minutes for the Elasticsearch cluster to be created.
When creation finishes, the screen you land on is already Kibana.
If you want to enter
b. Get the Elastic Cloud ID
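As background, a Cloud ID is a label, a colon, and a base64 payload of the form host$elasticsearch-id$kibana-id. The sketch below decodes one to show which endpoints it points at, which can help when a copied value looks wrong. The sample hostname and IDs are invented for illustration, and the function name is hypothetical.

```python
import base64


def decode_cloud_id(cloud_id: str) -> dict:
    """Split a Cloud ID into its label and the URLs it encodes."""
    label, separator, encoded = cloud_id.partition(":")
    if not separator:
        raise ValueError("expected '<label>:<base64 payload>'")
    # Tolerate a payload whose base64 padding was lost while copying.
    padded = encoded + "=" * (-len(encoded) % 4)
    parts = base64.b64decode(padded).decode("utf-8").split("$")
    if len(parts) < 2:
        raise ValueError("payload should look like 'host$es-id$kibana-id'")
    host, es_id = parts[0], parts[1]
    kibana_id = parts[2] if len(parts) > 2 else ""
    return {
        "label": label,
        "elasticsearch_url": f"https://{es_id}.{host}",
        "kibana_url": f"https://{kibana_id}.{host}" if kibana_id else None,
    }


# Build a made-up Cloud ID so the example runs without a real deployment.
payload = "asia-east1.gcp.example.test$es123$kb456"
sample = "hwdc-demo:" + base64.b64encode(payload.encode("utf-8")).decode("ascii")
print(decode_cloud_id(sample))
```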
c. Install the ICU Analysis plugin
Between step 5 and step 6, also add a Machine Learning instance
- This is needed later to try the ESRE (Elasticsearch Relevance Engine) features
On the trial plan, the Machine Learning instance can be sized up to 4 GB RAM (optional; it only makes things run a bit faster).
After you click Confirm, wait about 5 minutes for the Elasticsearch cluster to finish its rolling update.
d. Create an API key
- In the search box at the top of Kibana, search for API Keys and open the Security / API Keys page.
Where is the Kibana URL? The screen from step 1.1 shows the Kibana endpoint; just click Open.
- Click Create API Key at the top right.
- Give it any name, for example HWDC2024-demo, then click Create API Key.
- The Encoded form of the API key appears on screen; copy it and store it somewhere safe.
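The Encoded value is the base64 of id:secret, and Elasticsearch accepts it in an Authorization header using the ApiKey scheme. As a rough sketch (helper names are hypothetical, and the sample key is fabricated), this is how that header is formed and how the non-secret key ID can be recovered for logging:

```python
import base64


def api_key_headers(encoded_api_key: str) -> dict:
    """Build the HTTP header that authenticates a request with an encoded API key."""
    return {"Authorization": f"ApiKey {encoded_api_key.strip()}"}


def api_key_id(encoded_api_key: str) -> str:
    """Return only the key ID; the secret half is deliberately discarded."""
    decoded = base64.b64decode(encoded_api_key.strip()).decode("utf-8")
    key_id, separator, _secret = decoded.partition(":")
    if not separator:
        raise ValueError("decoded value should look like '<id>:<secret>'")
    return key_id


# A fabricated key, only so the example runs without a real cluster.
sample = base64.b64encode(b"demo-id:demo-secret").decode("ascii")
print(api_key_headers(sample))
print(api_key_id(sample))
```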
1.2 Environment preparation for self-hosted Elasticsearch (skip this if you use Elastic Cloud)
a. Install the ICU Analysis plugin
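On every Elasticsearch node, run bin/elasticsearch-plugin install analysis-icu from the Elasticsearch home directory to install the plugin, then restart the node.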
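After the restart, GET _cat/plugins?format=json lists one row per node and plugin (with name and component fields), which makes it easy to spot a node that was missed. A small sketch of that check follows; the rows are sample data, and the function name is invented for illustration.

```python
def nodes_missing_plugin(cat_plugins_rows, node_names, plugin="analysis-icu"):
    """Return, sorted, the nodes that do not list the given plugin.

    cat_plugins_rows is the parsed JSON of GET _cat/plugins?format=json.
    """
    nodes_with_plugin = {
        row["name"] for row in cat_plugins_rows if row.get("component") == plugin
    }
    return sorted(set(node_names) - nodes_with_plugin)


# Sample rows for a hypothetical three-node cluster.
rows = [
    {"name": "node-1", "component": "analysis-icu", "version": "8.15.0"},
    {"name": "node-2", "component": "repository-s3", "version": "8.15.0"},
]
print(nodes_missing_plugin(rows, ["node-1", "node-2", "node-3"]))
```

The node list is passed in separately because a node with no plugins at all does not appear in the _cat/plugins output.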
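2. Prepare the Google Colab environment
2.1 Copy the Colab Notebook and set the secrets
a. First copy the workshop's Python Notebook to your Google Drive
Open this Colab Notebook and copy it to your own Google Drive.
Copy it to your own Google Drive first; otherwise none of your later changes will be saved.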
b. Set the ELASTIC_CLOUD_ID and ELASTIC_API_KEY secrets
Later, when you run 1. 準備環境 (Prepare the environment) in Colab, these settings are used to access your Elasticsearch.
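Inside Colab, values stored in the Secrets panel are read with google.colab.userdata.get. The sketch below wraps that call and falls back to environment variables so the same code also runs outside Colab; the fallback and the helper name are assumptions for illustration, not necessarily what the workshop notebook does.

```python
import os


def read_secret(name: str) -> str:
    """Read a secret from Colab's Secrets panel, or from the environment elsewhere."""
    try:
        from google.colab import userdata  # importable only inside Colab
    except ImportError:
        value = os.environ.get(name)
    else:
        value = userdata.get(name)
    if not value:
        raise RuntimeError(f"secret {name} is not set")
    return value


# Outside Colab, export the two variables before calling read_secret.
os.environ.setdefault("ELASTIC_CLOUD_ID", "hwdc-demo:placeholder")
print(read_secret("ELASTIC_CLOUD_ID"))
```

With the official elasticsearch Python package (8.x), the two values would typically be passed as Elasticsearch(cloud_id=read_secret("ELASTIC_CLOUD_ID"), api_key=read_secret("ELASTIC_API_KEY")). Remember to grant the notebook access to each secret in the Secrets panel.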
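2.2 Run 1. 準備環境 (Prepare the environment) in the Colab Notebook
Run every step under 1. 準備環境 (Prepare the environment) first to set up the machine and install the dependencies.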
6. Elasticsearch Relevance Engine (ESRE)
With ESRE, many of these features can be handled entirely inside the Elastic Stack, so for now we no longer need the Python client; the following steps are done in Kibana.
6.1 Download and deploy a model via Kibana > Search > Machine Learning > Model Management > Trained Models
- Open Kibana
- In the search box at the top, search for Machine Learning and click the first suggested result
- On the Machine Learning page, click Trained Models under Model Management in the left menu
- Click Add trained model
- Select the E5 model and click Download.

- Wait for the download to finish.
- Click the Deploy button.
- Once the start completes, the model shows the Deployed status.

6.2 Create an ingest pipeline that uses the model through an inference processor to generate embeddings
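As a rough sketch of what such a pipeline can look like, the function below builds the JSON body for PUT _ingest/pipeline/<name> with a single inference processor. The model ID .multilingual-e5-small and the field names content and content_embedding are assumptions chosen for illustration; use the ID shown on the Trained Models page and your own field names.

```python
import json


def embedding_pipeline(model_id: str, input_field: str, output_field: str) -> dict:
    """Build an ingest pipeline body that embeds one text field at index time."""
    return {
        "description": f"Embed {input_field} with {model_id}",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    # input_output maps a source text field to the vector field.
                    "input_output": [
                        {"input_field": input_field, "output_field": output_field}
                    ],
                }
            }
        ],
    }


# Assumed model ID and field names; paste the printed JSON into Dev Tools.
body = embedding_pipeline(".multilingual-e5-small", "content", "content_embedding")
print(json.dumps(body, indent=2))
```

The target index still needs a dense_vector mapping for the output field with dimensions matching the model.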
6.3 At search time you can also use the model inside Elasticsearch directly, so the Python client no longer has to generate the embedding first.
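The mechanism for this is the query_vector_builder option of a knn search, which asks Elasticsearch to embed the query text with a deployed model. Below is a minimal sketch of a _search body built that way; the vector field name and model ID are assumed values, and the function name is hypothetical.

```python
def knn_text_search(field, model_id, text, k=10, num_candidates=100):
    """Build a _search body that lets Elasticsearch embed the query text itself."""
    if k > num_candidates:
        raise ValueError("k cannot be larger than num_candidates")
    return {
        "knn": {
            "field": field,
            "k": k,
            "num_candidates": num_candidates,
            "query_vector_builder": {
                "text_embedding": {"model_id": model_id, "model_text": text}
            },
        }
    }


# Assumed vector field and model ID, for illustration only.
print(knn_text_search("content_embedding", ".multilingual-e5-small", "space adventure", k=5))
```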
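6.4 Use an external OpenAI text embedding model
This API key is provided for attendees to try out. It only allows gpt-4o-mini and text-embedding-3-small, and it will be removed after the conference ends.
sk-proj-(expired; please use your own API key)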
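In the Variables settings of Kibana Dev Tools, you can add a variable named openai_api_key so that the examples below can be run as-is.
The text_embedding model ID created through the Inference API is openai-embeddings-3-small.
Like the model loaded into Elasticsearch earlier, it can be used directly in an inference processor or in the query_vector_builder of a knn search.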
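For orientation, an OpenAI-backed text_embedding endpoint is created with PUT _inference/text_embedding/<inference_id>. The sketch below builds that request for the openai-embeddings-3-small ID mentioned above; the function name is hypothetical and the API key shown is a placeholder, not a working key.

```python
import json


def openai_embedding_endpoint(inference_id, api_key, model="text-embedding-3-small"):
    """Return (method, path, body) for creating an OpenAI text_embedding endpoint."""
    path = f"/_inference/text_embedding/{inference_id}"
    body = {
        "service": "openai",
        "service_settings": {"api_key": api_key, "model_id": model},
    }
    return "PUT", path, body


# "sk-your-own-key" is a placeholder; substitute your own OpenAI API key.
method, path, body = openai_embedding_endpoint("openai-embeddings-3-small", "sk-your-own-key")
print(method, path)
print(json.dumps(body, indent=2))
```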
6.5 Use an external OpenAI completion model
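A completion endpoint follows the same Inference API pattern with the completion task type: create it once, then POST an input to it and read the text from the completion array of the response. The following is a sketch under those assumptions; the inference ID openai-gpt-4o-mini and the sample response are made up for illustration.

```python
def create_completion_endpoint(inference_id, api_key, model="gpt-4o-mini"):
    """Return (method, path, body) for creating an OpenAI completion endpoint."""
    body = {
        "service": "openai",
        "service_settings": {"api_key": api_key, "model_id": model},
    }
    return "PUT", f"/_inference/completion/{inference_id}", body


def completion_request(inference_id, prompt):
    """Return (method, path, body) for asking the endpoint to complete a prompt."""
    return "POST", f"/_inference/completion/{inference_id}", {"input": prompt}


def completion_text(response: dict) -> str:
    """Join the result strings found in a completion response."""
    return "\n".join(item["result"] for item in response.get("completion", []))


# A fabricated response in the shape the completion task returns.
fake_response = {"completion": [{"result": "Elasticsearch is a search engine."}]}
print(completion_request("openai-gpt-4o-mini", "What is Elasticsearch?"))
print(completion_text(fake_response))
```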
6.6 Use an ingest pipeline together with the OpenAI completion model
News summarization
Extracting key keywords from movies
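Both use cases can follow the same shape: a script processor assembles a prompt from a document field, an inference processor sends it to the completion endpoint, and a remove processor drops the temporary prompt field. The sketch below builds such pipeline bodies; the inference ID, field names, and instruction texts are all assumptions for illustration rather than the workshop's actual pipelines.

```python
import json


def completion_pipeline(inference_id, instruction, source_field, output_field):
    """Build a pipeline body that stores an LLM completion next to each document."""
    return {
        "processors": [
            {
                "script": {
                    # Painless: prepend the instruction to the source text.
                    "source": "ctx.prompt = params.instruction + ctx[params.source_field]",
                    "params": {"instruction": instruction, "source_field": source_field},
                }
            },
            {
                "inference": {
                    "model_id": inference_id,
                    "input_output": [
                        {"input_field": "prompt", "output_field": output_field}
                    ],
                }
            },
            # The assembled prompt is only needed during ingestion.
            {"remove": {"field": "prompt"}},
        ]
    }


# Hypothetical endpoint ID and field names for the two use cases above.
news_summary = completion_pipeline(
    "openai-gpt-4o-mini", "Summarize this news article in two sentences:\n", "content", "summary"
)
movie_keywords = completion_pipeline(
    "openai-gpt-4o-mini", "List five keywords for this movie plot:\n", "overview", "keywords"
)
print(json.dumps(news_summary, indent=2))
```

Passing the instruction through params rather than concatenating it into the script source sidesteps quoting problems when the instruction contains apostrophes.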
6.7 The semantic_text field type & the semantic query
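With semantic_text, the mapping names an inference endpoint and Elasticsearch takes care of chunking and embedding at index time; the semantic query then searches that field with plain text. A minimal sketch of both request bodies follows, assuming a field called content and reusing the openai-embeddings-3-small endpoint ID from section 6.4; the function names are hypothetical.

```python
import json


def semantic_text_mapping(field: str, inference_id: str) -> dict:
    """Build an index-creation body with one semantic_text field."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "semantic_text", "inference_id": inference_id}
            }
        }
    }


def semantic_query(field: str, text: str) -> dict:
    """Build a _search body that runs a semantic query on that field."""
    return {"query": {"semantic": {"field": field, "query": text}}}


# The field name is an assumption for illustration.
print(json.dumps(semantic_text_mapping("content", "openai-embeddings-3-small"), indent=2))
print(json.dumps(semantic_query("content", "movies about time travel"), indent=2))
```

The first body goes to PUT <index>, the second to GET <index>/_search.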