How to Set Up and Run Monitoring Using Prometheus and Grafana

---
# type: slide
slideOptions: # slide-related settings
  theme: white # color theme
  center: false
---

# How to Set Up and Run Monitoring Using Prometheus and Grafana

---

## What we will learn in this series of labs

- How to use docker compose to deploy Prometheus and Grafana
- How to configure a Prometheus scrape job
- How to create a dashboard in Grafana
- How to export/import a dashboard in Grafana
- How to configure provisioning of data sources and dashboards in Grafana
- How to instrument our application to expose metrics for Prometheus
- How to install node exporter and configure Prometheus and Grafana
- How to create a Kubernetes cluster on Google Cloud
- How to deploy Prometheus and Grafana to Google Kubernetes Engine (GKE)
- How to deploy an app to Google Kubernetes Engine (GKE)

---

## Labs

1. Lab 1: Deploy and config Prometheus and Grafana using docker compose
2. Lab 2: Config auto provisioning of data sources and dashboards when deploying Grafana
3. Lab 3: Instrument application to expose metrics for Prometheus
4. Lab 4: Install node exporter and config Prometheus and Grafana
5. Lab 5 (Optional): Deploy Prometheus and Grafana to Google Kubernetes Engine (GKE)
6. Lab 6 (Optional): Deploy APP to Google Kubernetes Engine (GKE)
7. Lab 7 (Optional): Create Kubernetes cluster on Google Cloud.

---

## Preparation before class

1. (Recommended) Install docker engine: https://docs.docker.com/compose/install/
    - Windows, Mac: install the corresponding Docker Desktop
    - Linux: choose the installation method that matches your platform & architecture
2. (Recommended) Install docker compose
    - Windows, Mac: already included in Docker Desktop
    - Linux: https://docs.docker.com/compose/install/
3. (Optional) Install Python

---

## Lab 1: Deploy and config Prometheus and Grafana using docker compose

---

### What we will learn in this lab

- How to use docker compose to run a whole set of containerized applications, including Prometheus and Grafana
- How to configure a Prometheus scrape job that collects Prometheus's own metrics
- How to run basic PromQL queries in Prometheus
- How to add a data source in Grafana
- How to create a dashboard in Grafana
- How to export a dashboard from Grafana
- How to import a dashboard into Grafana

---

### What is docker compose

- A tool for defining and running multi-container Docker applications.
- Use a YAML file to configure your application's services.
- With a single command, you create and start all the services from your configuration.

---

### docker compose yaml

source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-1/docker-compose.yml

{%gist jeremy-wang-lin/a6b37119a54b81c1feb24555b3cb71d5 %}

---

### Get the lab sample code

- With git installed:
    - Open a command line

```
git clone https://github.com/jeremy-wang-lin/nctu-course-on-monitoring.git
```

---

### Get the lab sample code

- Without git installed:
    - Open a browser and go to https://github.com/jeremy-wang-lin/nctu-course-on-monitoring
    - Click Code -> Download ZIP
    ![](https://i.imgur.com/4xC8tuE.png)
    - Save it to any directory and unzip it

---

### Run docker compose

Open a command line

```bash
cd <lab sample code directory>
cd lab-1
docker-compose up
```

If it succeeds, you should see a screen like this:
![](https://i.imgur.com/N8Ka4gf.png)

---

### Connect to the Prometheus Web GUI

Open a browser and go to http://localhost:19090

If it succeeds, you should see a screen like this:
![](https://i.imgur.com/PTOQavw.png)

---

### Config Prometheus

source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-1/prometheus.yml

{%gist jeremy-wang-lin/afc0c7824676249364d3d49c295ec4b6 %}

---

### Navigate Prometheus Web GUI

---

### Query target

Status -> Targets
<img src="https://hackmd.io/_uploads/BkpwDE6Uc.png" width="200" />

You should see a screen like this:
![](https://i.imgur.com/AjvBR0O.png)

---

### Query metric - up

`up` is a special metric that records whether the scrape of each target succeeded. A value of 1 means success.
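
The meaning of `up` can also be illustrated outside the web GUI. The following is a minimal Python sketch, assuming a response body shaped like the JSON that the Prometheus HTTP API returns for an instant query of `up`. The sample response, the `my-app` target, and the helper name `down_targets` are hypothetical and only serve the illustration.

```python
import json

# Hypothetical response body, shaped like the JSON returned by an
# instant query for `up` on the Prometheus HTTP API (/api/v1/query).
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "up", "instance": "localhost:9090", "job": "prometheus"},
       "value": [1652000000.0, "1"]},
      {"metric": {"__name__": "up", "instance": "my-app:8123", "job": "my-app"},
       "value": [1652000000.0, "0"]}
    ]
  }
}
"""


def down_targets(response_body: str) -> list:
    """Return (job, instance) pairs whose last scrape failed (up == 0)."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError("query failed")
    down = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]
        if float(value) == 0:  # sample values arrive as strings
            labels = series["metric"]
            down.append((labels["job"], labels["instance"]))
    return down


print(down_targets(SAMPLE_RESPONSE))
```

Each element of `result` is one time series: its label set plus the latest sample, so a target is considered down exactly when its `up` sample is 0.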
![](https://i.imgur.com/E3oQtll.png)

---

### Query metric - process_resident_memory_bytes

`process_resident_memory_bytes` is scraped from the Prometheus exporter and records the memory used by the Prometheus process
![](https://i.imgur.com/ekByhlp.png)

This example shows the conventions for designing and naming metrics:
1. Values use base units (such as bytes, seconds)
2. The unit name is used as the suffix of the metric name

We can also apply arithmetic to a metric to convert it into a more readable format
![](https://i.imgur.com/CgPxY2d.png)

---

### Query the trend chart of metric values

Click Graph in the query panel
![](https://i.imgur.com/eOR1XKx.png)

A metric like `process_resident_memory_bytes` is of the **gauge** type. Like height or weight, what we care about is its actual value.

---

### Query metric - prometheus_http_requests_total

`prometheus_http_requests_total` is scraped from the Prometheus exporter and records the number of requests Prometheus has served. It is a **counter** type metric.
![](https://i.imgur.com/awqhJKL.png)

- This metric has 4 labels: code, handler, instance, job
    - Instrumentation labels: code, handler
        - Labels the application defines when designing the metric (to tell series apart)
    - Target labels: instance, job
        - Attached by Prometheus when it scrapes the monitored target

---

### Query metric - prometheus_http_requests_total (Cont'd)

Each combination of label values defines one time series
![](https://i.imgur.com/0TIcFeb.png)

---

### Use label matcher to filter time series

We can use labels to filter the time series we are interested in, for example:
`prometheus_http_requests_total{handler="/metrics"}`
![](https://i.imgur.com/xn6tj07.png)

---

### Use rate function to calculate how fast a counter is increasing

For a counter type metric, we care not about its absolute value but about how fast it is increasing.

First, let's introduce the range vector.
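Here the ==[1m]== syntax retrieves the samples scraped over the past minute
```
prometheus_http_requests_total[1m]
```
![](https://hackmd.io/_uploads/ry3tfXR8q.png)

We can apply the rate function to a range vector to ==estimate== the rate of increase, for example:
`rate(prometheus_http_requests_total[1m])`
![](https://i.imgur.com/JbOR058.png)

Meaning: using the past minute as the sample, compute how much `prometheus_http_requests_total` increases per second on average
- Why does the request count for the /metrics path increase by 0.2 per second?
  => Because the scrape interval is set to 5 seconds, so /metrics is requested once every 5 seconds

---

### increase function - the syntactic sugar of rate function

rate returns the ==per-second== rate of increase. If we want to know how much the counter increased over the past minute, we multiply by the number of seconds:
`rate(prometheus_http_requests_total[1m]) * 60`

The increase function provides exactly this syntactic sugar
```
increase(prometheus_http_requests_total[1m])
# is equivalent to
rate(prometheus_http_requests_total[1m]) * 60
```

Increase over the past five minutes (note the ==5m==)
```
increase(prometheus_http_requests_total[5m])
# is equivalent to
rate(prometheus_http_requests_total[5m]) * 300
```

---

### The info metric - prometheus_build_info

A metric like prometheus_build_info, whose value is always 1 and whose labels carry information such as version and build, is called an info metric.
![](https://hackmd.io/_uploads/Hk5nXWRI9.png)

It can be used to:
- Join with other metrics
- Show supplementary information on a dashboard

---

### Connect to the Grafana Web GUI

Open a browser and go to http://localhost:13000

If it succeeds, you should see the login screen:
![](https://i.imgur.com/eXnDXX6.png)

Use the default credentials
username: admin
password: admin

After logging in you will be asked to enter a new password; click Skip to skip this step.

---

### Add data source

Click the gear icon at the bottom left -> Data sources
![](https://i.imgur.com/Xktp8AM.png)

Add data source
![](https://i.imgur.com/V7wgMXL.png)

Click Prometheus
![](https://i.imgur.com/pYKZpwL.png)

---

### Add data source (Cont'd)

Enter the settings as shown below. Note that the host part of the URL must be the service name defined in docker-compose.yml, my-prometheus, together with the container's port number, because this is communication between two containers (my-grafana & my-prometheus).
![](https://i.imgur.com/lLIQOWR.png)

Finally click Save & test. If it succeeds, the message "Data source is working" should appear as shown below
![](https://i.imgur.com/BD0jFVN.png)

---

### Create dashboard

Next we add a dashboard for Prometheus to observe its internal state.

First, click the plus sign on the left => Dashboard
![](https://hackmd.io/_uploads/BJKXnNTL9.png)

---

### Add time series panel to show Prometheus process memory

The first panel shows how Prometheus process memory changes over time.

Click Add a new panel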
![](https://hackmd.io/_uploads/rJFza4aL5.png)

Enter the metric we want to observe, `process_resident_memory_bytes`.
Under Options, set Legend to `{{ job }}` to use the job label as the series name
![](https://hackmd.io/_uploads/HJiW_YCL9.png)

In Panel options on the right, change the Title to `Memory Usage`
![](https://hackmd.io/_uploads/BJXYVH68c.png)

Next we tell Grafana the unit of our metric's values so that it can display them appropriately.
Scroll further down to Standard options, Unit -> Data -> bytes(IEC)
![](https://hackmd.io/_uploads/ryR64BpIc.png)

When done, click Apply at the top right. Back on the dashboard page you will see a time series like this.
![](https://hackmd.io/_uploads/By0WVBTU9.png)

---

### Add stat panel to show Prometheus series count

The second panel shows the number of time series Prometheus currently holds.

Click the Add panel icon at the top right
![](https://hackmd.io/_uploads/HJcx0Z0Iq.png)

Add a panel, metric: `prometheus_tsdb_head_series`
![](https://hackmd.io/_uploads/B1NcDraI9.png)

Choose Stat as the panel type at the top right
![](https://hackmd.io/_uploads/SJ5hDr689.png)

In Panel options, change the Title
![](https://hackmd.io/_uploads/BytpwHTL9.png)

We can set Graph mode to None to hide the sparkline
![](https://hackmd.io/_uploads/H1vJdB6Lq.png)

Thresholds specify which color to show when the value falls within a given range. We do not need them in this example, so delete the existing setting with the trash can icon
![](https://hackmd.io/_uploads/BJM2OBp8c.png)

The result looks like this
![](https://hackmd.io/_uploads/H1QJtBTLq.png)

---

### Add stat panel to show Prometheus Version

Next we use the `prometheus_build_info` metric mentioned earlier to show version information.

Add a panel and enter `prometheus_build_info` as the metric
Set Format to Table
![](https://hackmd.io/_uploads/r1HFDFA85.png)

Choose Stat as the panel type
![](https://hackmd.io/_uploads/SJ5hDr689.png)

Change the Title
![](https://hackmd.io/_uploads/HksuTGA8c.png)

Value Options -> Fields -> choose version
![](https://hackmd.io/_uploads/S1t0Z8aI9.png)

The result looks like this
![](https://hackmd.io/_uploads/rkQi6z085.png)

---

### Add table panel to show HTTP request count last minute

Add a panel and enter `increase(prometheus_http_requests_total[1m])` as the metric
Set Format to Table
Check Instant
![](https://hackmd.io/_uploads/ryK8HY08q.png)

Choose Table as the panel type
![](https://hackmd.io/_uploads/HJCfEI685.png)

Click Transform to the right of Query and add Organize fields
Click the eye icons as shown below to choose which fields to show or hide
Drag and drop a field to change the field order
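![](https://hackmd.io/_uploads/BkoP48689.png)

Don't forget to change the title
![](https://hackmd.io/_uploads/Sk_KB86Iq.png)

The final result looks like this
![](https://hackmd.io/_uploads/HkLVHLaUc.png)

---

### Save dashboard

All done, so let's save the dashboard we just built.
Click the save icon at the top right
![](https://hackmd.io/_uploads/HyQ6FITIq.png)

Enter the Dashboard name as shown below, then click Save
![](https://hackmd.io/_uploads/rk_xDF0Iq.png)

Done!

---

### Add variable of data source

In the dashboard we just built, the data source is hard-coded in each panel's settings. In practice, a single Grafana instance often has several Prometheus data sources configured, and building a separate dashboard for each data source is both duplicated work and hard to maintain. (It violates the DRY principle.)

Next we show how to turn the data source into a variable so that our dashboard can present information from different data sources.

Click the gear icon at the top right
![](https://hackmd.io/_uploads/ryggqLa89.png)

Click the Variables menu on the left -> Add variable
![](https://hackmd.io/_uploads/HkXBcLpU5.png)

Set up the variable as shown below and click Update to save
![](https://hackmd.io/_uploads/Hkz3jUTU5.png)

Back on the dashboard page you will see a new variable dropdown
![](https://hackmd.io/_uploads/By5gPw6I9.png)

Now the `datasource` variable we just created can be selected as the data source of each panel
Select it and save
![](https://hackmd.io/_uploads/SycUoIaU5.png)

---

### Add variable of query

Our HTTP request panel shows every combination of handler and code. Next we demonstrate how to filter with a variable.

Go back to the dashboard settings page, click the Variables menu, then click New at the top right
![](https://hackmd.io/_uploads/HkxL_v689.png)

Add the handler variable as shown below.
Query: `label_values(prometheus_http_requests_total, handler)`
Note that this uses Grafana's label_values() function: given a metric and a label, it returns all values of that label.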
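
To make the idea concrete, here is a rough Python sketch of what `label_values(metric, label)` returns conceptually: the distinct values of one label across all series of a metric. It is not Grafana's implementation, and the three sample label sets are hypothetical.

```python
from typing import Dict, List


def label_values(series: List[Dict[str, str]], metric: str, label: str) -> List[str]:
    """Distinct values of `label` among the series named `metric`, sorted."""
    values = {
        labels[label]
        for labels in series
        if labels.get("__name__") == metric and label in labels
    }
    return sorted(values)


# Hypothetical label sets for three time series of the same metric.
series = [
    {"__name__": "prometheus_http_requests_total", "handler": "/metrics", "code": "200"},
    {"__name__": "prometheus_http_requests_total", "handler": "/api/v1/query", "code": "200"},
    {"__name__": "prometheus_http_requests_total", "handler": "/api/v1/query", "code": "400"},
]
print(label_values(series, "prometheus_http_requests_total", "handler"))
```

Two of the three series share the same handler, so the dropdown built from this query would offer two choices rather than three.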
Also note that Data source must be set to the `datasource` variable we just added.
![](https://hackmd.io/_uploads/ByIsboRUq.png)

Go back to the HTTP request panel and change the query by adding `{handler=~"${handler}"}`. This puts the variable we added into the label matcher. Note that the match operator is `=~`, which performs regular expression matching.
![](https://hackmd.io/_uploads/ry975F0U9.png)

When done, click Apply at the top right
![](https://hackmd.io/_uploads/SJsxnwaIq.png)

Back on the dashboard page, choose the items to filter on in handler
![](https://hackmd.io/_uploads/ryU83PTL5.png)

After choosing, press Enter or click elsewhere on the page, and the HTTP request panel updates accordingly
![](https://hackmd.io/_uploads/HJ5N2vTUc.png)

---

### Export dashboard for reuse and version control

Grafana stores dashboard settings as JSON, so we can export the dashboard we just built and keep the JSON under version control.

How do we export it? Click the share icon to the right of the dashboard name
![](https://hackmd.io/_uploads/SJsqZc0Uc.png)

Click the Export tab, then choose View JSON or Save to file
![](https://hackmd.io/_uploads/HyAAb5C89.png)

---

### [Supplement] Import dashboard from grafana.com

Prometheus itself exposes a wide variety of metrics, and the dashboard we just built reveals only a small part of its inner workings. We can check whether a community-contributed Grafana dashboard is available.

Google "grafana prometheus dashboard" and this page shows up among the first results:
![](https://i.imgur.com/Dvxaslz.png)

There are two ways to import:
1. Enter the dashboard id (3662) in the Grafana Import GUI
2. Download the JSON file, then upload it in the Grafana Import GUI

---

### [Supplement] Import dashboard from grafana.com (Cont'd)

Click the plus icon on the left -> Import
![](https://i.imgur.com/a4xhj7K.png)

Upload the JSON file, or enter the dashboard id (3662)
![](https://i.imgur.com/avEw8A0.png)

Because this dashboard needs a data source that collects Prometheus metrics, select the data source we just added (Prometheus) here, then click Import
![](https://i.imgur.com/kOc2pgM.png)

Success screen
![](https://i.imgur.com/umPbSHD.png)

---

### Clean up

- Go back to the command line where docker compose was started and press Ctrl+C to stop the containers. Note that the containers' contents are not destroyed when they stop. You can restart them with `docker-compose restart`
- Run `docker-compose rm` to remove the stopped containers (the data inside them is destroyed as well)

---

### More docker compose commands

{%gist jeremy-wang-lin/6b23611f3ec79bb8058168c2a05be225 %}

---

## Lab 2: Config auto provisioning of data sources and dashboards when deploying Grafana

---

### Grafana provisioning

What we want to achieve
- In the previous lab, after deploying Grafana we had to add the data source and import the dashboard by hand.
  We would like these settings to be in place as soon as Grafana is deployed

What are the benefits
- Less work (laziness is an engineer's virtue)
- The settings can be put under version control
- It follows the spirit of GitOps (Single Source of Truth)

---

### What we will learn in this lab

- How to configure Grafana data source provisioning
- How to configure Grafana dashboard provisioning

---

### Provision data source

source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-2/datasource.yml

{%gist jeremy-wang-lin/d41606137494e77a22581ac5d0ec9a7a %}

---

### Provision data source (Cont'd)

In the docker compose file, specify a volume mapping that mounts `datasource.yml` at the path `/etc/grafana/provisioning/datasources/datasource.yml` inside the grafana container

```yaml
my-grafana:
  image: grafana/grafana:8.3.6
  volumes:
    - ./datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml
```

---

### Provision dashboard

Prepare the dashboard provider YAML file.
Source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-2/dashboard-provider.yml

{%gist jeremy-wang-lin/35d8c7a1eed4e262f180267bfa957ac0 %}

---

### Provision dashboard (Cont'd)

Prepare the dashboard JSON file:
https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-2/dashboard-prometheus.json

---

### Provision dashboard (Cont'd)

Mount the configuration files into the grafana container:

```yaml
my-grafana:
  image: grafana/grafana:8.3.6
  volumes:
    - ./datasource.yml:/etc/grafana/provisioning/datasources/datasource.yml
    - ./dashboard-provider.yml:/etc/grafana/provisioning/dashboards/dashboard-provider.yml
    - ./dashboard-prometheus.json:/var/lib/grafana/dashboards/demo/dashboard-prometheus.json
```

---

### Run

```bash
cd <lab sample code directory>
cd lab-2
docker-compose up -d
```

---

### Take a look at the Grafana GUI

Open a browser and go to http://localhost:13000

Use the default credentials
username: admin
password: admin

---

### Data source provisioned

Configuration -> Data Sources
![](https://hackmd.io/_uploads/rknph90Iq.png)

The data source is already set up
![](https://i.imgur.com/153lACy.png)

---

### Dashboard provisioned

Dashboards -> Browse
![](https://hackmd.io/_uploads/rJIn2cRU9.png)

The dashboard is already set up as well
![](https://hackmd.io/_uploads/HkZHp9CU5.png)

---

### Clean up

```bash
# In the lab-2 directory
docker-compose down
```

---

### Reference

Grafana provisioning: https://grafana.com/docs/grafana/latest/administration/provisioning/

---

## Lab 3: Instrument application to expose metrics for Prometheus

---

### What is Instrumentation

- Putting a "fitness tracker" on our application so that it continuously records its internal state
    - How many requests have been received so far (how many steps walked so far)
    - How much memory is in use right now (current body weight)
    - How many people are waiting right now
    - How many people are logged in right now
    - When the most recent request was processed
    - ...
- In other words, adding code to the application that creates and updates metrics

---

### Why Instrumentation

- The application itself knows its actual condition best
- Recording the events and metrics we care about inside the application from the start is probably the least effort
- DevOps spirit: deal with problems at the source

---

### How to instrument our application for Prometheus

- Use a Prometheus Client Library (https://prometheus.io/docs/instrumenting/clientlibs/)
    - Go
    - Java
    - Python
    - Node.JS
    - C/C++
    - .NET / C#
    - Ruby

---

### What we will learn in this lab

- How to instrument a Python program
- How to use PromQL with the metrics we instrumented to compute traffic and error rate

---

### Python sample code

Let's trace the sample code: `lab-3/my-app.py`
https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-3/my-app.py

---

### How to run the sample code

Required package
```
pip install prometheus-client
```

Run command
```
python my-app.py
```

---

### Containerizing the sample application

Dockerfile: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-3/Dockerfile

```dockerfile=
FROM python:3.6.15-slim

RUN pip install prometheus-client

COPY ./my-app.py /app/

CMD ["python", "app/my-app.py"]
```

---

### Run with docker compose

docker compose file: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-3/docker-compose.yml

```
...
  my-app:
    build:
      dockerfile: Dockerfile
      context: .
    ports:
      - 8000:8000
      - 8123:8123
```

Run command
```bash
cd <lab sample code directory>
cd lab-3

# Force a rebuild of the docker image on every run
docker-compose up --build
```

---

### Let's run an experiment

Open another command line
```bash
cd <lab sample code directory>
cd lab-3
sh endless_request.sh
```

On Windows without a Linux tools package installed, you can instead open `http://localhost:8000/foo` in a browser or Postman (several times) to test.

---

### Connect to Prometheus

Open a browser and go to `http://localhost:19090`

---

### PromQL basic

Query all time series of a metric
```
myapp_requests_total
```

Use a label to select specific time series
```
myapp_requests_total{path="/foo"}
```

Several label conditions can be combined
```
myapp_requests_total{path="/foo", response_code="200"}
```

See: https://prometheus.io/docs/prometheus/latest/querying/basics/

---

### PromQL aggregation

Sum the values of all time series
```
sum(myapp_requests_total)
```

Sum by the path label
```
sum by (path) (myapp_requests_total)
```

The by clause can go before or after the expression
```
sum (myapp_requests_total) by (path)
```

See: https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators

---

### Compute traffic

First, the range vector
```
myapp_requests_total[1m]
```

How do we compute traffic? => Apply the rate function to the range vector to compute how many requests are added per second
```
rate(myapp_requests_total[1m])
```

==Notice anything?==
<br/>

Sum everything up. This is the overall traffic
```
sum(rate(myapp_requests_total[1m]))
```

Traffic can also be viewed by path
```
sum by (path) (rate(myapp_requests_total[1m]))
```

---

### Compute the overall error rate

Overall error rate
```
sum(myapp_requests_total{response_code="500"}) / sum(myapp_requests_total)
```

Overall error rate by path
```
sum by (path) (myapp_requests_total{response_code="500"}) / sum by (path) (myapp_requests_total)
```

---

### Compute how the error rate changes over time

Rate of request growth over the past minute (denominator)
```
sum by (path) (rate(myapp_requests_total[1m]))
```

Rate of error growth over the past minute (numerator)
```
sum by (path) (rate(myapp_requests_total{response_code="500"}[1m]))
```

Dividing the two gives the error rate
```
sum by (path) (rate(myapp_requests_total{response_code="500"}[1m])) / sum by (path) (rate(myapp_requests_total[1m]))
```

---

### How to set up alerting

1. Specify the rule file in the Prometheus config file
2. Define the alerting criteria in the rule file
3. Specify the Alertmanager in the Prometheus config file
4. Specify how alerts are sent in the Alertmanager config file

---

### Specify the rule file in the Prometheus config file

source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/6ef24b92556e084643446f6c1c8654458a50d1f4/lab-3/prometheus.yml#L12

```yaml=
rule_files:
  - "alert-rules.yml"
...
```

---

### Set up alert rules

Alert rules are also defined in YAML. Below is a sample alert rule configuration
source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-3/alert-rules.yml

Fire an alert when a monitored target cannot be scraped
```yaml=
groups:
- name: My alert rules
  rules:
  - alert: PrometheusTargetMissing
    expr: up == 0
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: 'Prometheus scrape target is missing. instance: {{ $labels.instance }}'
```

---

### Set up alert rules (Cont'd)

Fire an alert when the error rate stays above 0.1 for 2 minutes
```yaml=
  - alert: MyAppHighErrorRate
    expr: >
      (sum by (path) (rate(myapp_requests_total{response_code="500"}[1m]))
      /
      sum by (path) (rate(myapp_requests_total[1m]))
      ) > 0.1
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: 'My Application has high error rate. VALUE: {{ $value }}'
```

---

### Specify the Alertmanager in the Prometheus config file

source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/6ef24b92556e084643446f6c1c8654458a50d1f4/lab-3/prometheus.yml#L6

```yaml=
alerting:
  alertmanagers:
  - static_configs:
    # Use the compose file's service name (my-alertmanager) and the container port 9093
    - targets: [my-alertmanager:9093]
...
```

---

### Set up Alertmanager

source: https://github.com/jeremy-wang-lin/nctu-course-on-monitoring/blob/main/lab-3/alertmanager.yml

```yaml=
route:
  receiver: email-me

receivers:
- name: email-me
  email_configs:
  - to: <your gmail>
    from: <your gmail>
    smarthost: smtp.gmail.com:587
    auth_username: <your gmail>
    auth_identity: <your gmail>
    auth_password: <google app password>
```

For more Alertmanager settings, see: https://prometheus.io/docs/alerting/latest/configuration/

---

### How to send alerts to a Gmail account

See these two articles:
https://ithelp.ithome.com.tw/articles/10271492
https://www.robustperception.io/sending-email-with-the-alertmanager-via-gmail

---

### Set up an info metric

Python sample code
```python=
from prometheus_client import Gauge

app_info = {
    'app': 'myapp',
    'author': 'Jeremy Lin',
    'author_email': 'alucard.lin@gmail.com',
    'version': '0.0.1'
}
# An info metric: the value is always 1 and the labels carry the information
INFO = Gauge('myapp_info', 'Demo info metric to record application info', labelnames=app_info.keys())
INFO.labels(**app_info).set(1)
```

Open http://localhost:8123/metrics

Exposed metric
```
# HELP myapp_info Demo info metric to record application info
# TYPE myapp_info gauge
myapp_info{app="myapp",author="Jeremy Lin",author_email="alucard.lin@gmail.com",version="0.0.1"} 1.0
```

Exercise: change author and author_email to your own, then run it again

---

### Now try building a Grafana dashboard

Information to display
- Program author and author email
- Error rate change per minute

---

### Clean up

```bash
# In the lab-3 directory
docker-compose down
```

---

## Lab 4: Install Node Exporter and config Prometheus and Grafana

---

## Lab 4-1: Linux

---

### Install and run Node Exporter in Linux

See: https://prometheus.io/docs/guides/node-exporter/#installing-and-running-the-node-exporter

Run the following commands:
```bash
# download
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz

# extract it
tar xvzf node_exporter-1.3.1.linux-amd64.tar.gz

# run
cd node_exporter-1.3.1.linux-amd64
./node_exporter
```

Open a browser and go to `http://localhost:9100`

---

### Run with Prometheus and Grafana

```
cd <lab sample code directory>
cd lab-4
cd linux
docker-compose up -d
```

---

### Take a look at the Grafana GUI

Open a browser and go to http://localhost:13000

Use the default credentials
username: admin
password: admin

---

### Dashboard

You can find this dashboard in the Demo folder
![](https://hackmd.io/_uploads/Sk84IdWL5.png)

---

### Clean up

```bash
# In the lab-4/linux directory
docker-compose down
```

---

## Lab 4-2: MacOS

---

### Install and run Node Exporter in Mac

Run the following commands:
```bash
# download
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.darwin-amd64.tar.gz

# extract it
tar xvzf node_exporter-1.3.1.darwin-amd64.tar.gz

# run
cd node_exporter-1.3.1.darwin-amd64
./node_exporter
```

Open a browser and go to `http://localhost:9100`

---

### Install and run Node Exporter in Mac (using brew)

Install Command:
```
brew install node_exporter
```

Run:
```
brew services restart node_exporter
```

Open a browser and go to `http://localhost:9100`

Stop:
```
brew services stop node_exporter
```

---

### Run with Prometheus and Grafana

```
cd <lab sample code directory>
cd lab-4
cd mac
docker-compose up -d
```

---

### Take a look at the Grafana GUI

Open a browser and go to http://localhost:13000

Use the default credentials
username: admin
password: admin

---

### Dashboard

You can find this dashboard in the Demo folder
![](https://hackmd.io/_uploads/Sk84IdWL5.png)

---

### Clean up

```bash
# In the lab-4/mac directory
docker-compose down
```

---

## Lab 4-3: Windows (Windows Exporter)

---

### Install Windows Exporter for Windows

Download:
```
https://github.com/prometheus-community/windows_exporter/releases/download/v0.18.1/windows_exporter-0.18.1-amd64.msi
```

Run the msi file. On success, windows_exporter is registered under Services and started:
![](https://i.imgur.com/ag1viX6.png)

Open a browser and go to `http://localhost:9182`

Stop: stop windows_exporter from Services

---

### Run with Prometheus and Grafana

```
cd <lab sample code directory>
cd lab-4
cd windows
docker-compose up -d
```

---

### Take a look at the Grafana GUI

Open a browser and go to http://localhost:13000

Use the default credentials
username: admin
password: admin

---

### Dashboard

You can find this dashboard in the Demo folder
![](https://hackmd.io/_uploads/rk2kJd-8q.png)

---

### Clean up

```bash
# In the lab-4/windows directory
docker-compose down
```

---

### Reference

Simple step-by-step guide to install Prometheus and node exporter: https://prometheus.io/docs/guides/node-exporter/
Node exporter: https://github.com/prometheus/node_exporter
Windows exporter: https://github.com/prometheus-community/windows_exporter

---

## Lab 5 (Optional): Deploy Prometheus and Grafana to Google Kubernetes Engine (GKE)

---

### Deploy Prometheus & Grafana

Connect to the Google Cloud console
https://console.cloud.google.com/

Select your project (lab group)
![](https://i.imgur.com/bMrQ0n5.png)

Click "Marketplace" on the left, search for "prometheus", and select "Prometheus & Grafana"
![](https://i.imgur.com/nlyT4HZ.png)

Click "Configure" ("設定")
![](https://i.imgur.com/zH3zs17.jpg)

Select the GKE cluster to deploy to
![](https://i.imgur.com/7dIcWVB.jpg)

Select the "StorageClass" and click "Deploy" ("部署").
![](https://i.imgur.com/Li6kCdg.jpg)

Wait for the deployment to finish. It takes up to 15 minutes.
![](https://i.imgur.com/Ssk7CJB.jpg)

Completion screen.
![](https://i.imgur.com/K3sUKm8.png)

---

### How to access Prometheus

Enter the cluster console.
![](https://i.imgur.com/V29HpaW.png)

Use port forwarding.
https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/

Run `kubectl port-forward pod/prometheus-1-prometheus-0 8080:9090`
Click "Web Preview" ("網頁預覽") at the top right to open Prometheus.
![](https://i.imgur.com/GZqvJ5J.png)
![](https://i.imgur.com/TgAX1aW.png)

---

### How to access Grafana

The steps are the same as for Prometheus, but the forwarded port changes as follows.
`kubectl port-forward pod/prometheus-1-grafana-0 8080:3000`
![](https://i.imgur.com/uFRJlpU.png)
![](https://i.imgur.com/T8wEfW2.png)

The Grafana admin account is admin, and its password is stored in a Kubernetes secret. Use the following command to retrieve it.
![](https://i.imgur.com/jE8WF6O.png)

```
kubectl get secret prometheus-1-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
```

![](https://i.imgur.com/FqDCtZ1.png)

---

## Lab 6 (Optional): Deploy Metric exporter to Google Kubernetes Engine (GKE)

---

### Deploy the metric exporter app

See this introduction for reference.
https://cloud.google.com/kubernetes-engine/docs/quickstarts/deploy-app-container-image#python

Enter the cluster console.
`git clone https://github.com/wcncymr/metric-exporter.git`
![](https://i.imgur.com/lfourAa.png)

Rename dockerignore to .dockerignore
`mv dockerignore .dockerignore`
![](https://i.imgur.com/7A00JNd.png)

If the repository does not exist yet, create it first. Replace PROJECT_ID & LOCATION with your own values.
`gcloud artifacts repositories create hello-repo --project=PROJECT_ID --repository-format=docker --location=LOCATION --description="Docker repository"`

Build your container image.
The following commands build the image according to the Dockerfile.
`gcloud auth configure-docker`
![](https://i.imgur.com/cNcw23P.png)

`gcloud builds submit --tag asia-east1-docker.pkg.dev/trainer-lib/hello-repo/hellometric .`
![](https://i.imgur.com/1daQ9qO.png)
![](https://i.imgur.com/Dn99rqo.png)

Deploy the resource to the cluster:
`kubectl apply -f deployment.yaml`
`kubectl get deployments`
`kubectl get pods`
![](https://i.imgur.com/fhYLpxX.png)

If the build fails, check whether the project has been set.
`gcloud config set project trainer-lab`

Deploy a Service to access from chrome.
`kubectl apply -f service.yaml`
`kubectl get services`

It takes a few minutes for the service endpoint to be created
![](https://i.imgur.com/9aYTFBv.png)

Once the service is created, click it to open the app.
![](https://i.imgur.com/pYIJ3zg.png)
![](https://i.imgur.com/J6xmoyI.png)

Append metrics to the URL to see the generated metrics.
![](https://i.imgur.com/VM4F5HB.png)

Back in Prometheus, the data can now be queried. But how does Prometheus scrape this data?
![](https://i.imgur.com/sOo6HM5.png)

Adding the annotation `prometheus.io/scrape: "true"` to the service tells Prometheus to scrape its metrics.
![](https://hackmd.io/_uploads/r1Pbua4dq.png)

---

## Lab 7 (Optional): Create a Kubernetes cluster on Google Cloud.

---

### Create GKE, method 1

Select the project in which to create the GKE cluster.
![](https://hackmd.io/_uploads/rk79ybmD9.png)

Go to Kubernetes Engine
![](https://hackmd.io/_uploads/SJOJxW7w9.png)

Click Create ("建立")
![](https://hackmd.io/_uploads/r1d4EWmwq.png)

Choose GKE Standard
![](https://hackmd.io/_uploads/Hk-KeWmDc.png)

Change the name and region, then create the cluster.
![](https://hackmd.io/_uploads/HJpuZWmv9.png)

### Create GKE, method 2

The cluster can also be created with the gcloud command.
https://cloud.google.com/kubernetes-engine/docs/quickstarts/deploy-app-container-image#standard

Open Cloud Shell and run the command below.
![](https://hackmd.io/_uploads/B1KIEbXwc.png)

`gcloud container clusters create helloworld-gke --num-nodes 1 --zone asia-east1`

---

## Reference

- Docker Compose: https://docs.docker.com/compose/
- Grafana provisioning: https://grafana.com/docs/grafana/latest/administration/provisioning/
- Simple step-by-step guide to install Prometheus and node exporter: https://prometheus.io/docs/guides/node-exporter/
- Node exporter: https://github.com/prometheus/node_exporter
- Deploy Prometheus, Grafana on GKE:
    - https://ithelp.ithome.com.tw/articles/10224491
    - https://ithelp.ithome.com.tw/articles/10224364

---

## Backup

---
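
As a backup note for the `rate` and `increase` slides in Lab 1, the arithmetic behind them can be sketched in Python. This is a simplified illustration under stated assumptions, not Prometheus's implementation: the real `rate()` also extrapolates to the edges of the window, which this sketch skips. The helper names `simple_rate` and `simple_increase` and the sample data are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase across the samples, tolerating counter resets."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A counter only goes up; a drop means the process restarted from zero.
        total += curr - prev if curr >= prev else curr
    return total / elapsed


def simple_increase(samples: List[Sample], window_seconds: float) -> float:
    """increase = per-second rate multiplied by the window length in seconds."""
    return simple_rate(samples) * window_seconds


# Hypothetical data: one scrape every 5 seconds, and each scrape itself
# adds one request to the /metrics counter.
samples = [(float(t), 100.0 + t / 5) for t in range(0, 60, 5)]
print(simple_rate(samples))          # per-second rate over the window
print(simple_increase(samples, 60))  # the same rate scaled to one minute
```

With one request every 5 seconds, the per-second rate comes out at 0.2, which matches the value seen for the /metrics handler in Lab 1, and scaling it by 60 gives the one-minute increase.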
