Jeremy Lin

---
# type: slide
slideOptions:        # slide-related settings
  theme: white       # color theme
  center: false
---

# How to Set Up and Run Monitoring Using Prometheus and Grafana

---

## What we will learn in this series of labs
- How to use docker compose to deploy Prometheus and Grafana
- How to configure a Prometheus scrape job
- How to create a dashboard in Grafana
- How to export/import a dashboard in Grafana
- How to configure provisioning of data sources and dashboards in Grafana
- How to instrument our application to expose metrics for Prometheus
- How to install node exporter and configure Prometheus and Grafana
- How to create a Kubernetes cluster on Google Cloud
- How to deploy Prometheus and Grafana to Google Kubernetes Engine (GKE)
- How to deploy an app to Google Kubernetes Engine (GKE)

---

## Labs
1. Lab 1: Deploy and config Prometheus and Grafana using docker compose
2. Lab 2: Config auto provisioning of data sources and dashboards when deploying Grafana
3. Lab 3: Instrument application to expose metrics for Prometheus
4. Lab 4: Install node exporter and config Prometheus and Grafana
5. Lab 5 (Optional): Deploy Prometheus and Grafana to Google Kubernetes Engine (GKE)
6. Lab 6 (Optional): Deploy APP to Google Kubernetes Engine (GKE)
7. Lab 7 (Optional): Create Kubernetes cluster on Google Cloud.

---

## Preparation before class
1. (Recommended) Install docker engine: https://docs.docker.com/compose/install/
    - Windows, Mac: install the matching docker desktop
    - Linux: pick the installation method that matches your platform & architecture
2. (Recommended) Install docker compose
    - Windows, Mac: already included in docker desktop
    - Linux: https://docs.docker.com/compose/install/
3. (Optional) Install Python

---

## Lab 1: Deploy and config Prometheus and Grafana using docker compose

---

### What we will learn in this lab
- How to use docker compose to run a whole set of container applications, including Prometheus and Grafana
- How to configure a Prometheus scrape job that collects Prometheus's own metrics
- How to run basic PromQL queries in Prometheus
- How to add a data source in Grafana
- How to create a dashboard in Grafana
- How to export a dashboard from Grafana
- How to import a dashboard into Grafana

---

### What is docker compose
- A tool for defining and running multi-container Docker applications.
- Use a YAML file to configure your application's services.
- With a single command, you create and start all the services from your configuration.

---

### docker compose yaml
source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-1/docker-compose.yml

{%gist jeremy-wang-lin/a6b37119a54b81c1feb24555b3cb71d5 %}

---

### Get the lab sample code
- With git installed:
    - Open a command line
    ```
    git clone https://github.com/jeremy-wang-lin/nctu-course-on-monitoring.git
    ```

---

### Get the lab sample code
- Without git installed:
    - Open a browser and go to https://github.com/jeremy-wang-lin/nctu-course-on-monitoring
    - Click Code -> Download ZIP
    ![](https://i.imgur.com/4xC8tuE.png)
    - Save it to any directory and unzip it

---

### Run docker compose
Open a command line
```bash
cd <path to the lab sample code>
cd lab-1
docker-compose up
```
If it succeeds, you should see a screen like this:
![](https://i.imgur.com/N8Ka4gf.png)

---

### Connect to the Prometheus Web GUI
Open a browser and go to http://localhost:19090
If it succeeds, you should see a screen like this:
![](https://i.imgur.com/PTOQavw.png)

---

### Config Prometheus
source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-1/prometheus.yml

{%gist jeremy-wang-lin/afc0c7824676249364d3d49c295ec4b6 %}

---

### Navigate Prometheus Web GUI

---

### Query target
Status -> Targets
<img src="https://hackmd.io/_uploads/BkpwDE6Uc.png" width="200" />

You should see a screen like this:
![](https://i.imgur.com/AjvBR0O.png)

---

### Query metric - up
`up` is a special metric that records whether each target scrape succeeded. 1 means success.
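
The same check can also be scripted. Below is a minimal Python sketch, assuming the lab-1 Prometheus is reachable at `http://localhost:19090` and using the Prometheus HTTP API endpoint `/api/v1/query`. The helper names (`parse_up`, `query_up`) and the `SAMPLE` payload are made up for illustration. The instance names in the sample are hypothetical, not output from the lab.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Prometheus from lab-1 is published on localhost:19090.
PROMETHEUS_URL = "http://localhost:19090"


def parse_up(payload):
    """Map each scraped instance to True (up == 1) or False (up == 0)."""
    status = {}
    for series in payload["data"]["result"]:
        instance = series["metric"].get("instance", "<unknown>")
        _timestamp, value = series["value"]  # the value arrives as a string, e.g. "1"
        status[instance] = float(value) == 1.0
    return status


def query_up(base_url=PROMETHEUS_URL):
    """Run the instant query `up` through the Prometheus HTTP API."""
    url = f"{base_url}/api/v1/query?{urlencode({'query': 'up'})}"
    with urlopen(url, timeout=5) as response:
        return parse_up(json.load(response))


# Hand-written sample in the shape of an instant-query response.
# The instances are hypothetical; this is not real output from the lab.
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "localhost:9090", "job": "prometheus"},
             "value": [1652000000.0, "1"]},
            {"metric": {"__name__": "up", "instance": "my-app:8123", "job": "my-app"},
             "value": [1652000000.0, "0"]},
        ],
    },
}

if __name__ == "__main__":
    # Parse the offline sample; call query_up() instead while lab-1 is running.
    print(parse_up(SAMPLE))
```

`parse_up` is kept separate from the network call so it can be tried without a running Prometheus. The screenshot that follows shows the `up` query in the Prometheus web GUI.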
![](https://i.imgur.com/E3oQtll.png)

---

### Query metric - process_resident_memory_bytes
`process_resident_memory_bytes` is scraped from the Prometheus exporter and records the memory used by the Prometheus process.
![](https://i.imgur.com/ekByhlp.png)

This example shows two conventions for designing and naming metrics:
1. Values use base units (such as bytes or seconds)
2. The unit name is used as the suffix of the metric name

We can also apply arithmetic to a metric to convert it into a more readable form.
![](https://i.imgur.com/CgPxY2d.png)

---

### Query the trend chart of metric values
Click Graph in the query panel
![](https://i.imgur.com/eOR1XKx.png)

A metric like `process_resident_memory_bytes` is of the **gauge** type. Like height or weight, what we care about is its actual value.

---

### Query metric - prometheus_http_requests_total
`prometheus_http_requests_total` is scraped from the Prometheus exporter and records the number of requests Prometheus has served. It is a **counter** type metric.
![](https://i.imgur.com/awqhJKL.png)

- This metric has 4 labels: code, handler, instance, job
- Instrumentation labels: code, handler
    - Labels the application defines when it designs the metric (used to tell series apart)
- Target labels: instance, job
    - Labels Prometheus attaches when it scrapes a monitored target

---

### Query metric - prometheus_http_requests_total (Cont'd)
Each combination of label values defines one time series.
![](https://i.imgur.com/0TIcFeb.png)

---

### Use label matcher to filter time series
We can use labels to filter for the time series we care about, for example:
`prometheus_http_requests_total{handler="/metrics"}`
![](https://i.imgur.com/xn6tj07.png)

---

### Use rate function to calculate how fast a counter is increasing
For a counter type metric, what we care about is not its absolute value but how fast it is increasing.

First, meet the range vector. Here the ==[1m]== syntax fetches the samples scraped over the past minute.
```
prometheus_http_requests_total[1m]
```
![](https://hackmd.io/_uploads/ry3tfXR8q.png)

We can apply the rate function to a range vector to ==estimate== the speed of increase, for example:
`rate(prometheus_http_requests_total[1m])`
![](https://i.imgur.com/JbOR058.png)

Meaning: taking the past minute as the sample, compute how much `prometheus_http_requests_total` increased per second on average.
- Why does the request count for the /metrics path increase by 0.2 per second?
  => Because the scrape interval is set to one scrape every 5 seconds

---

### increase function - the syntactic sugar of rate function
rate returns the speed of increase ==per second==. To know how much the counter increased over the past minute, we multiply by the number of seconds:
`rate(prometheus_http_requests_total[1m]) * 60`

The increase function provides syntactic sugar for exactly this.
```
increase(prometheus_http_requests_total[1m])
# is equivalent to
rate(prometheus_http_requests_total[1m]) * 60
```

Increase over the past five minutes (note the ==5m==)
```
increase(prometheus_http_requests_total[5m])
# is equivalent to
rate(prometheus_http_requests_total[5m]) * 300
```

---

### The info metric - prometheus_build_info
A metric like prometheus_build_info, whose value is always 1 and whose labels carry information such as the version and build, is called an info metric.
![](https://hackmd.io/_uploads/Hk5nXWRI9.png)

It can be used to:
- Join with other metrics
- Show supplementary information on a dashboard

---

### Connect to the Grafana Web GUI
Open a browser and go to http://localhost:13000
If it succeeds, you should see the login screen:
![](https://i.imgur.com/eXnDXX6.png)

Use the default credentials
username: admin
password: admin

After signing in you will be asked to enter a new password. Click Skip to skip this step.

---

### Add data source
Click the gear icon at the bottom left -> Data sources
![](https://i.imgur.com/Xktp8AM.png)
Add data source
![](https://i.imgur.com/V7wgMXL.png)
Click Prometheus
![](https://i.imgur.com/pYKZpwL.png)

---

### Add data source (Cont'd)
Enter the settings as shown below.
Note that the host part of the URL must be the service name defined in docker-compose.yml, my-prometheus, together with the container's port number.
This is because the traffic goes between two containers (my-grafana & my-prometheus).
![](https://i.imgur.com/lLIQOWR.png)

Finally click Save & test. If it succeeds, the message "Data source is working" should appear as shown below.
![](https://i.imgur.com/BD0jFVN.png)

---

### Create dashboard
Next we will add a dashboard for Prometheus to observe its internal state.
First, click the plus sign on the left => Dashboard
![](https://hackmd.io/_uploads/BJKXnNTL9.png)

---

### Add time series panel to show Prometheus process memory
The first panel shows how Prometheus process memory changes over time.
Click Add a new panel
![](https://hackmd.io/_uploads/rJFza4aL5.png)
Enter the metric we want to observe, `process_resident_memory_bytes`.
Under Options, set Legend to `{{ job }}` so that the job label is used as the series name.
![](https://hackmd.io/_uploads/HJiW_YCL9.png)

Under Panel options on the right, change Title to `Memory Usage`
![](https://hackmd.io/_uploads/BJXYVH68c.png)

Next we tell Grafana the unit of our metric so that it can render the values properly.
Scroll further down to Standard options, Unit -> Data -> bytes(IEC)
![](https://hackmd.io/_uploads/ryR64BpIc.png)

When done, click Apply at the top right. Back on the dashboard page you will see a time series like this.
![](https://hackmd.io/_uploads/By0WVBTU9.png)

---

### Add stat panel to show Prometheus series count
The second panel shows the number of time series Prometheus currently holds.
Click the Add panel icon at the top right
![](https://hackmd.io/_uploads/HJcx0Z0Iq.png)

Add a panel, metric: `prometheus_tsdb_head_series`
![](https://hackmd.io/_uploads/B1NcDraI9.png)

At the top right, choose Stat as the panel type
![](https://hackmd.io/_uploads/SJ5hDr689.png)

Under Panel options, change the Title
![](https://hackmd.io/_uploads/BytpwHTL9.png)

We can set Graph mode to None to hide the sparkline
![](https://hackmd.io/_uploads/H1vJdB6Lq.png)

Thresholds let us choose which color to show for which value range.
We do not need them in this example, so click the trash can icon to delete the existing setting.
![](https://hackmd.io/_uploads/BJM2OBp8c.png)

The result:
![](https://hackmd.io/_uploads/H1QJtBTLq.png)

---

### Add stat panel to show Prometheus Version
Next we use the `prometheus_build_info` metric mentioned earlier to show version information.

Add a panel and enter the metric `prometheus_build_info`
Set Format to Table
![](https://hackmd.io/_uploads/r1HFDFA85.png)

Choose Stat as the panel type
![](https://hackmd.io/_uploads/SJ5hDr689.png)

Change the Title
![](https://hackmd.io/_uploads/HksuTGA8c.png)

Value Options -> Fields -> choose version
![](https://hackmd.io/_uploads/S1t0Z8aI9.png)

The result:
![](https://hackmd.io/_uploads/rkQi6z085.png)

---

### Add table panel to show HTTP request count last minute
Add a panel and enter the metric `increase(prometheus_http_requests_total[1m])`
Set Format to Table
Check Instant
![](https://hackmd.io/_uploads/ryK8HY08q.png)

Choose Table as the panel type
![](https://hackmd.io/_uploads/HJCfEI685.png)

Click Transform to the right of Query and add Organize fields
Click the eye icons as shown below to choose which fields to show or hide
Drag a field to change the column order
![](https://hackmd.io/_uploads/BkoP48689.png)

Do not forget to change the title
![](https://hackmd.io/_uploads/Sk_KB86Iq.png)

The final result:
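![](https://hackmd.io/_uploads/HkLVHLaUc.png)

---

### Save dashboard
All done. Let's save the dashboard we just built.
Click the save icon at the top right
![](https://hackmd.io/_uploads/HyQ6FITIq.png)

Enter the Dashboard name as shown below, then click Save
![](https://hackmd.io/_uploads/rk_xDF0Iq.png)

Done!

---

### Add variable of data source
In the dashboard we just built, the data source is hard-coded in each panel's settings.
In practice, one Grafana often has several Prometheus data sources configured. Building a separate dashboard for each data source is duplicated work and hard to maintain. (It violates the DRY principle.)

Next we show how to turn the data source into a variable so that the dashboard can show information from different data sources.

Click the gear icon at the top right
![](https://hackmd.io/_uploads/ryggqLa89.png)

Click the Variables menu on the left -> Add variable
![](https://hackmd.io/_uploads/HkXBcLpU5.png)

Configure the variable as shown below and click Update to save
![](https://hackmd.io/_uploads/Hkz3jUTU5.png)

Back on the dashboard page there is now one more variable dropdown
![](https://hackmd.io/_uploads/By5gPw6I9.png)

Now the `datasource` variable we just created can be selected as the data source of every panel.
Select it, then save.
![](https://hackmd.io/_uploads/SycUoIaU5.png)

---

### Add variable of query
Our HTTP request panel shows every combination of handler and code. Next we show how to filter it with a variable.

Go back to the dashboard settings page, click the Variable menu, then click New at the top right
![](https://hackmd.io/_uploads/HkxL_v689.png)

Add the handler variable as shown below.
Query: `label_values(prometheus_http_requests_total, handler)`
Note that this uses Grafana's label_values() function. Given a metric and a label, it returns all values of that label.
Also note that Data source should be set to the `datasource` variable we just added.
![](https://hackmd.io/_uploads/ByIsboRUq.png)

Go back to the HTTP request panel and add `{handler=~"${handler}"}` to the query. This puts our new variable into the label condition. Note that the matcher is `=~`, which performs a regular expression match.
![](https://hackmd.io/_uploads/ry975F0U9.png)

When done, click Apply at the top right
![](https://hackmd.io/_uploads/SJsxnwaIq.png)

Back on the dashboard page, choose the items to filter by in handler
![](https://hackmd.io/_uploads/ryU83PTL5.png)

After choosing, press Enter or click elsewhere on the page, and the HTTP request panel updates accordingly
![](https://hackmd.io/_uploads/HJ5N2vTUc.png)

---

### Export dashboard for reuse and version control
Grafana stores dashboard settings as JSON. We can export the dashboard we just built and keep the JSON under version control.

How do we export it?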
Click the share icon to the right of the dashboard name
![](https://hackmd.io/_uploads/SJsqZc0Uc.png)

Click the Export tab, then choose View JSON or Save to file
![](https://hackmd.io/_uploads/HyAAb5C89.png)

---

### [Supplement] Import dashboard from grafana.com
Prometheus itself exposes a wide variety of metrics, and the dashboard we just built reveals only a small part of its internals. We can check whether a community-contributed Grafana dashboard is available.
Search Google for "grafana prometheus dashboard" and this page shows up among the first results:
![](https://i.imgur.com/Dvxaslz.png)

There are two ways to import:
1. Enter the dashboard id (3662) in the Grafana Import GUI
2. Download the JSON file, then upload it in the Grafana Import GUI

---

### [Supplement] Import dashboard from grafana.com (Cont'd)
Click the plus icon on the left -> Import
![](https://i.imgur.com/a4xhj7K.png)

Upload the JSON file, or enter the dashboard id (3662)
![](https://i.imgur.com/avEw8A0.png)

This dashboard needs a data source that collects Prometheus metrics, so select the data source we just added (Prometheus) here, then click Import
![](https://i.imgur.com/kOc2pgM.png)

The screen on success
![](https://i.imgur.com/umPbSHD.png)

---

### Clean up
- Go back to the command line where docker compose was started and press Ctrl+C to stop the containers. Note that the contents of stopped containers are not destroyed. You can bring them back with `docker-compose restart`
- Run `docker-compose rm` to delete the stopped containers (the data inside is destroyed with them)

---

### More docker compose commands
{%gist jeremy-wang-lin/6b23611f3ec79bb8058168c2a05be225 %}

---

## Lab 2: Config auto provisioning of data sources and dashboards when deploying Grafana

---

### Grafana provisioning
What we want to achieve
- In the previous lab, after deploying Grafana we had to add the data source and import the dashboard by hand.
  We would like these settings to be in place as soon as Grafana is deployed.

What are the benefits
- Less work (laziness is an engineer's virtue)
- The settings can be put under version control
- It follows the spirit of GitOps (Single Source of Truth)

---

### What we will learn in this lab
- How to configure grafana data source provisioning
- How to configure grafana dashboard provisioning

---

### Provision data source
source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-2/datasource.yml

{%gist jeremy-wang-lin/d41606137494e77a22581ac5d0ec9a7a %}

---

### Provision data source (Cont'd)
In the docker compose file, specify a volume mapping that mounts `datasource.yml` at the path `/etc/grafana/provisioning/datasources/datasource.yml` inside the grafana container
```yaml
  my-grafana:
    image: grafana/grafana:8.3.6
    volumes:
      - ./datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml
```

---

### Provision dashboard
Prepare the dashboard provider YAML file.
Source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-2/dashboard-provider.yml

{%gist jeremy-wang-lin/35d8c7a1eed4e262f180267bfa957ac0 %}

---

### Provision dashboard (Cont'd)
Prepare the dashboard JSON file: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-2/dashboard-prometheus.json

---

### Provision dashboard (Cont'd)
Mount the configuration files into the grafana container:
```yaml
  my-grafana:
    image: grafana/grafana:8.3.6
    volumes:
      - ./datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml
      - ./dashboard-provider.yml:/etc/grafana/provisioning/dashboards/dashboard-provider.yml
      - ./dashboard-prometheus.json:/var/lib/grafana/dashboards/demo/dashboard-prometheus.json
```

---

### Run
```bash
cd <path to the lab sample code>
cd lab-2
docker-compose up -d
```

---

### Take a look at the Grafana GUI
Open a browser and go to http://localhost:13000
Use the default credentials
username: admin
password: admin

---

### Data source provisioned
Configuration -> Data Sources
![](https://hackmd.io/_uploads/rknph90Iq.png)
The data source is already configured
![](https://i.imgur.com/153lACy.png)

---

### Dashboard provisioned
Dashboards -> Browse
![](https://hackmd.io/_uploads/rJIn2cRU9.png)
The dashboard is already set up as well
![](https://hackmd.io/_uploads/HkZHp9CU5.png)

---

### Clean up
```bash
# in the lab-2 directory
docker-compose down
```

---

### Reference
Grafana provisioning: https://grafana.com/docs/grafana/latest/administration/provisioning/

---

## Lab 3: Instrument application to expose metrics for Prometheus

---

### What is Instrumentation
- Putting a "fitness tracker" on our application so that it continuously records the application's internal state
    - How many requests have been received so far (how many steps walked so far)
    - How much memory is in use now (current body weight)
    - How many people are waiting now
    - How many people are signed in to the system now
    - When the last job was processed
    - ...
- In other words, adding code to the application that creates and updates metrics

---

### Why Instrumentation
- Nothing knows the actual state of an application better than the application itself
- Recording the events and metrics we care about inside the application from the start is probably the least effort
- The DevOps spirit: deal with problems at the source

---

### How to instrument our application for Prometheus
- Use a Prometheus Client Library (https://prometheus.io/docs/instrumenting/clientlibs/)
    - Go
    - Java
    - Python
    - Node.JS
    - C/C++
    - .NET / C#
    - Ruby

---

### What we will learn in this lab
- How to instrument a Python program
- How to combine the metrics we added with PromQL to compute traffic and error rate

---

### Python sample code
Let's trace the sample code: `lab-3/my-app.py`
https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-3/my-app.py

---

### How to run the sample code
Required package
```
pip install prometheus-client
```

Run command
```
python my-app.py
```

---

### Containerizing the sample application
Dockerfile: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-3/Dockerfile
```dockerfile=
FROM python:3.6.15-slim

RUN pip install prometheus-client

COPY ./my-app.py /app/

CMD ["python", "app/my-app.py"]
```

---

### Run with docker compose
docker compose file: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-3/docker-compose.yml
```
...
  my-app:
    build:
      dockerfile: Dockerfile
      context: .
    ports:
      - 8000:8000
      - 8123:8123
```

Run command
```bash
cd <path to the lab sample code>
cd lab-3
# force a rebuild of the docker image on every run
docker-compose up --build
```

---

### Let's run an experiment
Open another command line
```bash
cd <path to the lab sample code>
cd lab-3
sh endless_request.sh
```

On Windows without a Linux tool package installed, you can test by opening `http://localhost:8000/foo` directly in a browser or postman (request it several times).

---

### Connect to Prometheus
Open a browser and go to `http://localhost:19090`

---

### PromQL basic
Query all time series of a metric
```
myapp_requests_total
```

Select specific time series by label
```
myapp_requests_total{path="/foo"}
```

A selector can carry several label conditions
```
myapp_requests_total{path="/foo", response_code="200"}
```

Reference: https://prometheus.io/docs/prometheus/latest/querying/basics/

---

### PromQL aggregation
Sum the values of all time series
```
sum(myapp_requests_total)
```

Sum by the path label
```
sum by (path) (myapp_requests_total)
```

`by` may come before or after the expression
```
sum (myapp_requests_total) by (path)
```

Reference: https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators

---

### Compute traffic
First, the range vector
```
myapp_requests_total[1m]
```

How do we compute traffic? => Apply the rate function to the range vector to compute the number of requests added per second
```
rate(myapp_requests_total[1m])
```

==Notice anything?==
<br/>

Sum it up. This is the overall traffic
```
sum(rate(myapp_requests_total[1m]))
```

Traffic can also be viewed by path
```
sum by (path) (rate(myapp_requests_total[1m]))
```

---

### Compute the overall error rate
Overall error rate
```
sum(myapp_requests_total{response_code="500"})
/
sum(myapp_requests_total)
```

Overall error rate by path
```
sum by (path) (myapp_requests_total{response_code="500"})
/
sum by (path) (myapp_requests_total)
```

---

### Compute how the error rate changes over time
Growth rate of requests over the past minute (denominator)
```
sum by (path) (rate(myapp_requests_total[1m]))
```

Growth rate of errors over the past minute (numerator)
```
sum by (path) (rate(myapp_requests_total{response_code="500"}[1m]))
```

Dividing the two gives the error rate
```
sum by (path) (rate(myapp_requests_total{response_code="500"}[1m]))
/
sum by (path) (rate(myapp_requests_total[1m]))
```

---

### How to set up alerting
1. Specify the rule file in the Prometheus config file
2. Define the alerting criteria in the rule file
3. Specify Alertmanager in the Prometheus config file
4. Specify how alerts are sent in the Alertmanager config file

---

### Specify the rule file in the Prometheus config file
source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/6ef24b92556e084643446f6c1c8654458a50d1f4/lab-3/prometheus.yml#L12
```yaml=
rule_files:
  - "alert-rules.yml"
...
```

---

### Configure alert rules
Alert rules are also defined in YAML. Below is a sample alert rule configuration.
source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-3/alert-rules.yml

Fire an alert when a monitored target cannot be scraped
```yaml=
groups:
- name: My alert rules
  rules:
  - alert: PrometheusTargetMissing
    expr: up == 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: 'Prometheus scrape target is missing. instance: {{ $labels.instance }}'
```

---

### Configure alert rules (Cont'd)
Fire an alert when the error rate stays above 0.1 for 2 minutes
```yaml=
  - alert: MyAppHighErrorRate
    expr: >
      (sum by (path) (rate(myapp_requests_total{response_code="500"}[1m]))
      /
      sum by (path) (rate(myapp_requests_total[1m]))
      ) > 0.1
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: 'My Application has high error rate. VALUE: {{ $value }}'
```

---

### Specify Alertmanager in the Prometheus config file
source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/6ef24b92556e084643446f6c1c8654458a50d1f4/lab-3/prometheus.yml#L6
```yaml=
alerting:
  alertmanagers:
    - static_configs:
        # the service name in the compose file (my-alertmanager) and container port 9093
        - targets: [my-alertmanager:9093]
...
```

---

### Configure Alertmanager
source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-3/alertmanager.yml
```yaml=
route:
  receiver: email-me

receivers:
  - name: email-me
    email_configs:
      - to: <your gmail>
        from: <your gmail>
        smarthost: smtp.gmail.com:587
        auth_username: <your gmail>
        auth_identity: <your gmail>
        auth_password: <google app password>
```

For more Alertmanager settings see: https://prometheus.io/docs/alerting/latest/configuration/

---

### How to send alerts to a Gmail account
See these two articles:
https://ithelp.ithome.com.tw/articles/10271492
https://www.robustperception.io/sending-email-with-the-alertmanager-via-gmail

---

### Set up an info metric
python sample code
```python=
from prometheus_client import Gauge

# Static facts about the application; each key becomes a label name.
app_info = {
    'app': 'myapp',
    'author': 'Jeremy Lin',
    'author_email': 'alucard.lin@gmail.com',
    'version': '0.0.1'
}
INFO = Gauge('myapp_info', 'Demo info metric to record application info', labelnames=app_info.keys())
# An info metric always has the value 1; the information lives in the labels.
INFO.labels(**app_info).set(1)
```

Open http://localhost:8123/metrics

exposed metric
```
# HELP myapp_info Demo info metric to record application info
# TYPE myapp_info gauge
myapp_info{app="myapp",author="Jeremy Lin",author_email="alucard.lin@gmail.com",version="0.0.1"} 1.0
```

Exercise: change author and author_email to your own, then run it again

---

### Now try building a Grafana dashboard
Information to show
- Program author, author email
- Error rate per minute over time

---

### Clean up
```bash
# in the lab-3 directory
docker-compose down
```

---

## Lab 4: Install Node Exporter and config Prometheus and Grafana

---

## Lab 4-1: Linux

---

### Install and run Node Exporter in Linux
Reference: https://prometheus.io/docs/guides/node-exporter/#installing-and-running-the-node-exporter

Run the following commands:
```bash
# download
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz

# extract it
tar xvzf node_exporter-1.3.1.linux-amd64.tar.gz

# run
cd node_exporter-1.3.1.linux-amd64
./node_exporter
```

Open a browser and go to `http://localhost:9100`

---

### Run with Prometheus and Grafana
```
cd <path to the lab sample code>
cd lab-4
cd linux
docker-compose up -d
```

---

### Take a look at the Grafana GUI
Open a browser and go to http://localhost:13000
Use the default credentials
username: admin
password: admin

---

### Dashboard
This dashboard can be found in the Demo folder
![](https://hackmd.io/_uploads/Sk84IdWL5.png)

---

### Clean up
```bash
# in the lab-4/linux directory
docker-compose down
```

---

## Lab 4-2: MacOS

---

### Install and run Node Exporter in Mac
Run the following commands:
```bash
# download
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.darwin-amd64.tar.gz

# extract it
tar xvzf node_exporter-1.3.1.darwin-amd64.tar.gz

# run
cd node_exporter-1.3.1.darwin-amd64
./node_exporter
```

Open a browser and go to `http://localhost:9100`

---

### Install and run Node Exporter in Mac (using brew)
Install Command:
```
brew install node_exporter
```

Run:
```
brew services restart node_exporter
```

Open a browser and go to `http://localhost:9100`

Stop:
```
brew services stop node_exporter
```

---

### Run with Prometheus and Grafana
```
cd <path to the lab sample code>
cd lab-4
cd mac
docker-compose up -d
```

---

### Take a look at the Grafana GUI
Open a browser and go to http://localhost:13000
Use the default credentials
username: admin
password: admin

---

### Dashboard
This dashboard can be found in the Demo folder
![](https://hackmd.io/_uploads/Sk84IdWL5.png)

---

### Clean up
```bash
# in the lab-4/mac directory
docker-compose down
```

---

## Lab 4-3: Windows (Windows Exporter)

---

### Install Windows Exporter for Windows
Download:
```
https://github.com/prometheus-community/windows_exporter/releases/download/v0.18.1/windows_exporter-0.18.1-amd64.msi
```

Run the msi file. On success it starts windows_exporter and registers it under Services:
![](https://i.imgur.com/ag1viX6.png)

Open a browser and go to `http://localhost:9182`

Stop: stop windows_exporter in Services

---

### Run with Prometheus and Grafana
```
cd <path to the lab sample code>
cd lab-4
cd windows
docker-compose up -d
```

---

### Take a look at the Grafana GUI
Open a browser and go to http://localhost:13000
Use the default credentials
username: admin
password: admin

---

### Dashboard
This dashboard can be found in the Demo folder
![](https://hackmd.io/_uploads/rk2kJd-8q.png)

---

### Clean up
```bash
# in the lab-4/windows directory
docker-compose down
```

---

### Reference
Simple step-by-step guide to install Prometheus and node exporter: https://prometheus.io/docs/guides/node-exporter/
Node exporter: https://github.com/prometheus/node_exporter
Windows exporter: https://github.com/prometheus-community/windows_exporter

---

## Lab 5 (Optional): Deploy Prometheus and Grafana to Google Kubernetes Engine (GKE)

---

### Deploy Prometheus & Grafana
Connect to the google cloud console
https://console.cloud.google.com/
Select your project (lab group)
![](https://i.imgur.com/bMrQ0n5.png)

Click "Marketplace" on the left, search for "prometheus", and choose "Prometheus & Grafana"
![](https://i.imgur.com/nlyT4HZ.png)

Click "Configure" (設定)
![](https://i.imgur.com/zH3zs17.jpg)

Choose the GKE cluster to deploy to
![](https://i.imgur.com/7dIcWVB.jpg)

Choose the "StorageClass" and click "Deploy" (部署).
![](https://i.imgur.com/Li6kCdg.jpg)

Wait for the deployment to finish. It takes up to 15 minutes.
![](https://i.imgur.com/Ssk7CJB.jpg)

The screen on completion.
![](https://i.imgur.com/K3sUKm8.png)

---

### How to access Prometheus
Open the cluster console.
![](https://i.imgur.com/V29HpaW.png)

Use port forwarding.
https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/
Enter `kubectl port-forward pod/prometheus-1-prometheus-0 8080:9090`
Click "Web Preview" (網頁預覽) at the top right to open Prometheus.
![](https://i.imgur.com/GZqvJ5J.png)
![](https://i.imgur.com/TgAX1aW.png)

---

### How to access Grafana
The steps are the same as for Prometheus, but the forwarded port changes as follows.
`kubectl port-forward pod/prometheus-1-grafana-0 8080:3000`
![](https://i.imgur.com/uFRJlpU.png)
![](https://i.imgur.com/T8wEfW2.png)

The Grafana admin account is admin, and its password is stored in a Kubernetes secret. Retrieve it with the command below.
![](https://i.imgur.com/jE8WF6O.png)
```
kubectl get secret prometheus-1-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
```
![](https://i.imgur.com/FqDCtZ1.png)

---

## Lab 6 (Optional): Deploy Metric exporter to Google Kubernetes Engine (GKE)

---

### Deploy the metric exporter
See this introduction for reference.
https://cloud.google.com/kubernetes-engine/docs/quickstarts/deploy-app-container-image#python
Open the cluster console.
`git clone https://github.com/wcncymr/metric-exporter.git`
![](https://i.imgur.com/lfourAa.png)

Rename dockerignore to .dockerignore
`mv dockerignore .dockerignore`
![](https://i.imgur.com/7A00JNd.png)

If the repository does not exist yet, create it first. Replace PROJECT_ID and LOCATION with your own values.
`gcloud artifacts repositories create hello-repo --project=PROJECT_ID --repository-format=docker --location=LOCATION --description="Docker repository"`

Build your container image. The following commands build the image according to the Dockerfile.
`gcloud auth configure-docker`
![](https://i.imgur.com/cNcw23P.png)

`gcloud builds submit --tag asia-east1-docker.pkg.dev/trainer-lib/hello-repo/hellometric .`
![](https://i.imgur.com/1daQ9qO.png)
![](https://i.imgur.com/Dn99rqo.png)

Deploy the resource to the cluster:
`kubectl apply -f deployment.yaml`
`kubectl get deployments`
`kubectl get pods`
![](https://i.imgur.com/fhYLpxX.png)

If the build fails, check whether the project has not been set.
`gcloud config set project trainer-lab`

Deploy a Service so that the app can be reached from Chrome.
`kubectl apply -f service.yaml`
`kubectl get services`
The service endpoint takes a few minutes to be created
![](https://i.imgur.com/9aYTFBv.png)

The service has been created. Click it to open the app.
![](https://i.imgur.com/pYIJ3zg.png)
![](https://i.imgur.com/J6xmoyI.png)

Append metrics to the URL to see the generated metrics.
![](https://i.imgur.com/VM4F5HB.png)

Back in Prometheus, the data is now coming in. But how does Prometheus scrape this data?
![](https://i.imgur.com/sOo6HM5.png)

By adding the entry `prometheus.io/scrape: "true"` under `annotations:` in the service, we let Prometheus know that it should scrape the metrics.
![](https://hackmd.io/_uploads/r1Pbua4dq.png)

---

## Lab 7 (Optional): Create a Kubernetes cluster on Google Cloud.

---

### Create GKE, method 1
Select the project in which to create the GKE cluster.
![](https://hackmd.io/_uploads/rk79ybmD9.png)

Go to Kubernetes Engine
![](https://hackmd.io/_uploads/SJOJxW7w9.png)

Click Create
![](https://hackmd.io/_uploads/r1d4EWmwq.png)

Choose GKE Standard
![](https://hackmd.io/_uploads/Hk-KeWmDc.png)

Change the name and the region, then create the cluster.
![](https://hackmd.io/_uploads/HJpuZWmv9.png)

### Create GKE, method 2
The cluster can also be created with the gcloud command.
https://cloud.google.com/kubernetes-engine/docs/quickstarts/deploy-app-container-image#standard
Open Cloud Shell and run the command below.
![](https://hackmd.io/_uploads/B1KIEbXwc.png)

`gcloud container clusters create helloworld-gke --num-nodes 1 --zone asia-east1`

---

## Reference
Docker Compose: https://docs.docker.com/compose/
Grafana provisioning: https://grafana.com/docs/grafana/latest/administration/provisioning/
Simple step-by-step guide to install Prometheus and node exporter: https://prometheus.io/docs/guides/node-exporter/
Node exporter: https://github.com/prometheus/node_exporter
Deploy Prometheus, Grafana on GKE:
https://ithelp.ithome.com.tw/articles/10224491
https://ithelp.ithome.com.tw/articles/10224364

---

## Backup

---
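
### A rough model of rate, increase, and error rate
As a supplement to the `rate` / `increase` slides in Lab 1 and the error-rate slides in Lab 3, here is a minimal Python sketch of the arithmetic behind them. It is a simplification, not how Prometheus implements these functions. The real `rate()` also handles counter resets and extrapolates to the window boundaries, which this sketch ignores. The function names and all sample data are made up for illustration.

```python
def rate(samples):
    """Average per-second increase across a window of (timestamp, value) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples in the window")
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


def increase(samples, window_seconds):
    """Growth over the whole window: the per-second rate times the window length."""
    return rate(samples) * window_seconds


def error_rate(error_samples, total_samples):
    """Share of failed requests: rate of errors divided by rate of all requests."""
    return rate(error_samples) / rate(total_samples)


# Made-up data: a counter scraped every 5 s that grows by 1 per scrape,
# like the /metrics handler counter when the scrape interval is 5 s.
metrics_requests = [(t, 100 + t // 5) for t in range(0, 60, 5)]

# Made-up data for a hypothetical app: 20 requests in the window, 2 of them errors.
total_requests = [(0, 100), (55, 120)]
failed_requests = [(0, 10), (55, 12)]

if __name__ == "__main__":
    print(rate(metrics_requests))          # per-second growth
    print(increase(metrics_requests, 60))  # growth per minute
    print(error_rate(failed_requests, total_requests))
```

With one scrape every 5 seconds, the counter gains one request per scrape, which is where the 0.2 per second figure on the `rate` slide comes from. Multiplying by the window length gives what `increase` reports.

---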

    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully