Python / prometheus_client

###### tags: `Python`, `prometheus_client`, `grafana`

[TOC]

## Official sites

### [[GitHub] Prometheus Python Client](https://github.com/prometheus/client_python#gauge)

- Installation: `pip install prometheus-client`

### [[doc] https://prometheus.io/](https://prometheus.io/)

- [CONCEPTS](https://prometheus.io/docs/concepts/)
  - [METRIC TYPES](https://prometheus.io/docs/concepts/metric_types/)
    - [Counter](https://prometheus.io/docs/concepts/metric_types/#counter)
    - [Gauge](https://prometheus.io/docs/concepts/metric_types/#gauge)
    - [Histogram](https://prometheus.io/docs/concepts/metric_types/#histogram)
    - [Summary](https://prometheus.io/docs/concepts/metric_types/#summary)
- [PROMETHEUS](https://prometheus.io/docs/prometheus/)
  - [GETTING STARTED](https://prometheus.io/docs/prometheus/latest/getting_started/)

<hr>

## Usage guide

### Choosing a metric type

- **Values that change over time, e.g. CPU/GPU/RAM utilization, temperature, humidity, inventory, traffic per port -> ==**Gauge**==**
  - **Single dimension** -> no labels (no key)
    - RAM
  - **Multiple dimensions** -> use labels (think of them as sub-keys)
    - **CPU**: CPU0, CPU1, CPU2, CPU3
    - **GPU**: GPU0, GPU1, GPU2, GPU3
    - **Temperature**: urban, suburban, Taipei, Taichung, Tainan (different locations)
    - **port**: 443, 80 (traffic statistics)
    - **endpoint statistics**: handler, method, status_code
      - {handler="/return/200", method="GET", status="200"}
      - {handler="/return/404", method="POST", status="400"}
- **Values that do not change over time, e.g. version numbers, configuration -> ==**Info**==**
  - Pass the data as a dict

### Example 1: Gauge + single dimension

```python=
from prometheus_client import Gauge, start_http_server
import time

g1 = Gauge('my_memory_gauge', 'Memory usage description')
g1.set(60.5)
g1.inc(20)  # = 80.5
g1.dec(50)  # = 30.5

start_http_server(8000)
while True:
    time.sleep(1)
```

![](https://hackmd.io/_uploads/rkYbZe6Fa.png)

### Example 2: Gauge + multiple dimensions (GPU)

```python=
from prometheus_client import Gauge, start_http_server
import time

g2 = Gauge('my_gpu_gauge', 'GPU usage description', ['index'])
# Label values are strings; gauge values are numbers
g2.labels(index='0').set(100)
g2.labels(index='1').set(99.9)
g2.labels(index='2').set(0)
g2.labels(index='3').set(0.1)

start_http_server(8000)
while True:
    time.sleep(1)
```

![](https://hackmd.io/_uploads/ryzdGxTt6.png)

### Example 3: Gauge + multiple dimensions (temperature)

```python=
from prometheus_client import Gauge, start_http_server
import time
from math import cos

g3 = Gauge('my_temperature_gauge', 'Temperature in different districts', ['city', 'district'])

start_http_server(8000)
while True:
    g3.labels(city='台南市', district='中西區').set(
        round(28 + 3 * cos(time.time()), 1)
    )
    time.sleep(0.04)
    g3.labels(city='台南市', district='東區').set(
        round(26 + 3 * cos(time.time()), 1)
    )
    time.sleep(0.04)
    g3.labels(city='台南市', district='北區').set(
        round(25 + 3 * cos(time.time()), 1)
    )
    time.sleep(1)
```

Each query returns different values:

![](https://hackmd.io/_uploads/S1LWnxpK6.png)
![](https://hackmd.io/_uploads/Syd7hgTYp.png)
![image](https://hackmd.io/_uploads/Bybx6e6Yp.png)

### Example 4: Info (build version)

```python=
from prometheus_client import Info, start_http_server
import time

i = Info('my_build_info', 'Build version description')
# Label names may only contain letters, digits and underscores
i.info({'version': '1.2.3', 'build_host': '10.78.26.241'})

start_http_server(8000)
while True:
    time.sleep(1)
```

![](https://hackmd.io/_uploads/B1Xee40Y6.png)

- Info automatically appends `_info` to the metric name
  - `my_build_info` -> `my_build_info_info`
- So, unlike the Gauge examples where the `_gauge` suffix was added by hand, an Info name should not carry its own suffix. The corrected version:

```python
i = Info('my_build', 'Build version description')  # exported as my_build_info
i.info({'version': '1.2.3', 'build_host': '10.78.26.241'})
```

![image](https://hackmd.io/_uploads/SJg2SM4CK6.png)

<hr>

## Real-world cases

### Case 1: Gauge - python_info (CPython version)

- Example of what prometheus_fastapi_instrumentator outputs on the FastAPI /metrics endpoint:
  ![](https://hackmd.io/_uploads/r1awwrAKa.png)

Oddly, this python_info case uses the Gauge type...
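A likely explanation, as far as I can tell: the classic Prometheus text format has no `info` type, so prometheus_client writes an `Info` metric as a gauge named `<name>_info` with the value 1.0. `python_info` itself appears to come from prometheus_client's built-in platform collector rather than from the instrumentator. The sketch below uses illustrative metric names (`my_build`, `my_python_info`). It registers an `Info` and an info-style `Gauge` in a private `CollectorRegistry`, which leaves out the default process and platform collectors, and prints the text exposition with `generate_latest`.

```python
from prometheus_client import CollectorRegistry, Gauge, Info, generate_latest

# Private registry: only the two metrics below are exported
registry = CollectorRegistry()

# Info metric: prometheus_client appends '_info' -> my_build_info
build = Info('my_build', 'Build version', registry=registry)
build.info({'version': '1.2.3'})

# Info-style Gauge: metadata lives in the labels, the value is fixed at 1
py_info = Gauge('my_python_info', 'Python platform information',
                ['implementation', 'version'], registry=registry)
py_info.labels(implementation='CPython', version='3.10.13').set(1)

# Render the classic Prometheus text exposition format
text = generate_latest(registry).decode('utf-8')
print(text)
```

Both metrics should be printed with `# TYPE ... gauge` and the value 1.0. So in this format, a gauge whose labels carry the metadata is the usual way to express "info"-style data.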
- Rendered output (text mode)

```
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="10",patchlevel="13",version="3.10.13"} 1.0
```

- Corresponding code

```python=
from prometheus_client import CollectorRegistry, Gauge

# The default REGISTRY already exports python_info through prometheus_client's
# built-in platform collector, so a separate registry avoids a
# "Duplicated timeseries" error when reproducing the metric by hand
registry = CollectorRegistry()

python_info = Gauge(
    'python_info',
    'Python platform information',
    ['implementation', 'major', 'minor', 'patchlevel', 'version'],
    registry=registry)
python_info.labels(
    implementation="CPython",
    major="3", minor="10", patchlevel="13",
    version="3.10.13"
).set(1.0)
```

<hr>

## Integrating with FastAPI

### Installing packages

```bash=
pip install fastapi
pip install prometheus-fastapi-instrumentator
pip install uvicorn
```

- [[GitHub] Prometheus FastAPI Instrumentator](https://github.com/trallnag/prometheus-fastapi-instrumentator/blob/master/README.md)

### Adding Prometheus monitoring

```python=
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

# Create the FastAPI application instance
app = FastAPI()

# Add Prometheus monitoring to the app and expose the /metrics endpoint
Instrumentator().instrument(app).expose(app)

# Sample route
@app.get("/")
async def read_root():
    return {"Hello": "World"}

# Other business-logic routes...

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

- Open http://localhost:8000/metrics
  - It shows how many times /metrics has been requested

[![](https://hackmd.io/_uploads/ByRdyMTYT.png)](https://hackmd.io/_uploads/ByRdyMTYT.png)

### Adding Prometheus monitoring in `lifttime.py`

```python=
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()  # in a real project `app` may be imported from elsewhere

# Instrument at import time so the middleware is added before the app starts
instrumentator = Instrumentator().instrument(app)

@app.on_event("startup")
async def _startup():
    # Expose the /metrics endpoint when the app starts
    instrumentator.expose(app)
```

- Usage: [[Official] prometheus-fastapi-instrumentator / README.md](https://github.com/trallnag/prometheus-fastapi-instrumentator/blob/master/README.md)
- If the instance is created only during startup, it fails with:
  `RuntimeError: Cannot add middleware after an application has started`

### Showing endpoint information

- ### http_request
  ![](https://hackmd.io/_uploads/B17TF519a.png)
- ### http_response
  ![](https://hackmd.io/_uploads/H10b9cy96.png)
- ### If endpoint information is not shown
  - It is unclear what affects the internal defaults
  - Configure the metrics manually:

```python=
from fastapi import FastAPI, Response, status
from prometheus_fastapi_instrumentator import Instrumentator, metrics

app = FastAPI()

@app.get('/')
def read_root():
    return 'hello'

@app.get('/return/200')
def return_200():
    return Response(status_code=status.HTTP_200_OK)

@app.get('/return/201')
def return_201():
    return Response(status_code=status.HTTP_201_CREATED)

@app.get('/return/404')
def return_404():
    return Response(status_code=status.HTTP_404_NOT_FOUND)

# Adding metrics
# https://github.com/trallnag/prometheus-fastapi-instrumentator/blob/master/README.md#adding-metrics
instrumentator = Instrumentator()
instrumentator.add(
    metrics.request_size(
        should_include_handler=True,
        should_include_method=False,
        should_include_status=True,
        metric_namespace="a",
        metric_subsystem="b",
    )
).add(
    metrics.response_size(
        should_include_handler=True,
        should_include_method=True,
        should_include_status=True,
        metric_namespace="namespace",
        metric_subsystem="subsystem",
    )
)
# Register the middleware and expose /metrics after the metrics have been added
instrumentator.instrument(app).expose(app)
```

- [Official] Adding metrics: https://github.com/trallnag/prometheus-fastapi-instrumentator/blob/master/README.md#adding-metrics

<hr>

## Viewing /metrics with Grafana

### Step1: Start the Grafana service

- #### Foreground mode
  `$ docker run --rm -p 3000:3000 grafana/grafana`
- #### Background (detached) mode
  `$ docker run --rm -d -p 3000:3000 grafana/grafana`
- Set the host port mapping as needed

### Step2: Go to http://10.78.26.241:3000

![](https://hackmd.io/_uploads/rJW4RJ-qT.png)

- Default credentials: admin / admin

### Step3: Configure the Grafana data source for the `/metrics` endpoint

- #### Set up the connection source
  ![](https://hackmd.io/_uploads/SJpPty-qp.png)
  ![](https://hackmd.io/_uploads/B1vOK1W56.png)

- #### [Wrong way] :warning: Grafana cannot connect directly to the `/metrics` endpoint. The data source URL must point to a Prometheus server, which serves the query API; the app's `/metrics` endpoint only serves the raw metrics text. Connecting directly gives an error:
  - [Connection] Prometheus server URL: http://10.78.26.241:40099
    ![](https://hackmd.io/_uploads/HJGsRy-5T.png)
    ![](https://hackmd.io/_uploads/Hk34cJ-qp.png)
  - Clicking Save & test raises the error:
    404 Not Found - There was an error returned querying the Prometheus API.
    ![](https://hackmd.io/_uploads/H1PUqJZ9p.png)
    ![](https://hackmd.io/_uploads/S1Gj5yZcp.png)

- #### [Right way] Set up a Prometheus service first

1. **Prepare `prometheus.yml` and add the custom scrape target**

```yaml=
scrape_configs:
  - job_name: 'fastapi'
    static_configs:
      - targets: ["10.78.26.241:40099"]
```

   - The built-in scrape target is `localhost:9090` (Prometheus scraping itself)
   - The built-in default configuration:

```
/etc/prometheus $ cat prometheus.yml
```

```yaml=
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]
```

2. **Start the Prometheus service with docker** (`$(pwd)` gives an absolute host path for `-v`, which older Docker versions require)

```bash
$ docker run --rm -p 9090:9090 \
    -v "$(pwd)"/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus
```

3. **Go to http://10.78.26.241:9090/ (Prometheus web UI) to test**
   ![](https://hackmd.io/_uploads/SyB3Jlbca.png)
   - **Check http://10.78.26.241:40099/metrics**
     ![](https://hackmd.io/_uploads/H1T8elWq6.png)
   - **Test in the Prometheus web UI**
     ![](https://hackmd.io/_uploads/ryjWxg-cT.png)
     -> data is returned -> PASS
4. **Go back to http://10.78.26.241:3000/ (Grafana web UI) and test again, with the data source now pointing at the Prometheus service**
   ![](https://hackmd.io/_uploads/rkXiWgb5a.png)
   ![](https://hackmd.io/_uploads/Hkxabe-cT.png)

### Step4: Create a dashboard

1. ### Click Dashboards in the menu
   [![](https://hackmd.io/_uploads/B18LXWb9a.png)](https://hackmd.io/_uploads/B18LXWb9a.png)
2. ### Click Create Dashboard
   [![](https://hackmd.io/_uploads/Bk-O7Z-ca.png)](https://hackmd.io/_uploads/Bk-O7Z-ca.png)
3. ### Add a visualization
   [![](https://hackmd.io/_uploads/BJPRQbZcT.png)](https://hackmd.io/_uploads/BJPRQbZcT.png)
4. ### Select the data source: Prometheus
   [![](https://hackmd.io/_uploads/HyDyEZ-q6.png)](https://hackmd.io/_uploads/HyDyEZ-q6.png)
5. ### Click Metric and choose the metric you want
   [![](https://hackmd.io/_uploads/BJ0-N-b5a.png)](https://hackmd.io/_uploads/BJ0-N-b5a.png)
   [![](https://hackmd.io/_uploads/BJsH4Wb96.png)](https://hackmd.io/_uploads/BJsH4Wb96.png)
6. ### Run the query and save
   [![](https://hackmd.io/_uploads/Hye_Sb-cp.png)](https://hackmd.io/_uploads/Hye_Sb-cp.png)

<hr>
<hr>

## ChatGPT4

> For reference

### How many metric types does Prometheus have?
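Prometheus supports several metric types, each suited to different monitoring scenarios and goals. The main ones are:

- **Counter**:
  - **Purpose**: a cumulative counter that can only increase, never decrease. It suits accumulated values such as request counts, completed tasks, or the number of errors.
  - **Characteristics**: a Counter resets to zero when the process that exposes it restarts. A Prometheus restart does not reset it. Functions such as `rate()` detect and compensate for these resets.
- **Gauge**:
  - **Purpose**: a value that can go up or down arbitrarily. It suits data that fluctuates, such as temperature, current memory usage, or disk space.
  - **Characteristics**: a Gauge shows the value at a single point in time.
- **Histogram**:
  - **Purpose**: samples observations (for example request durations or response sizes) and counts them into configurable buckets. Well suited to tracking the distribution of events.
  - **Characteristics**: provides cumulative counters for the number of observations falling into each bucket, plus the total count and the sum of all observed values.
- **Summary**:
  - **Purpose**: like Histogram, it collects samples of observations such as request durations and response sizes. The difference is that it computes configurable quantiles.
  - **Characteristics**: provides the sum and count of observations and can compute quantiles over a sliding time window. Note that the Python client's Summary exposes only `_count` and `_sum`; it does not compute quantiles.
- **Info**:
  - **Purpose**: exposes a set of descriptive labels (usually static or rarely changing), such as version numbers or configuration parameters.
  - **Characteristics**: typically used for application state or version information.
- **Enum** (may not be available in every Prometheus client library):
  - **Purpose**: an enumeration metric whose value is one of a predefined set of strings.
  - **Characteristics**: suited to metrics with a finite set of states, such as the state of a state machine.

The Prometheus server itself defines only the four core types (Counter, Gauge, Histogram, Summary); Info and Enum are extra types provided by client libraries such as client_python. Each type has its own purpose, and choosing the right one is essential for representing and monitoring your system effectively.

### How are Info and Gauge used, and in which situations?

In the Prometheus Python client, Gauge and Info are two different metric types for recording different kinds of data. Their purposes and use cases are as follows:

- ### Gauge
  1. **Purpose**: a Gauge represents a value that can rise or fall over time. It suits values that can change at any moment, such as temperature, memory usage, or the number of running processes.
  2. **Usage**:
     - **Create a Gauge**: first create a Gauge instance.
     - **Set the value**: increase, decrease, or set the Gauge's value directly.
  3. **Sample code**:

```python
from prometheus_client import Gauge

# Create a Gauge instance
g = Gauge('my_gauge', 'Description')

# Set the value
g.set(5.5)  # set the value to 5.5

# Increase or decrease the value
g.inc()     # increase by 1
g.dec(3)    # decrease by 3
```

- ### Info
  1. **Purpose**: Info represents a set of static or rarely changing labels. It is commonly used to expose configuration or environment details such as version numbers and configuration parameters.
  2. **Usage**:
     - **Create an Info**: create an Info label set.
     - **Update the labels**: pass a dict to update the label values.
  3. **Sample code**:

```python
from prometheus_client import Info

# Create an Info instance (exported as my_info_info, since '_info' is appended)
i = Info('my_info', 'Description')

# Update the information
i.info({'version': '1.0.0', 'build_date': '2024-01-01'})
```

- ### Use cases
  - **Gauge**: use it to track values that can increase or decrease, e.g. real-time monitoring of CPU usage, memory usage, or disk space.
  - **Info**: use it to expose application metadata or configuration, e.g. the application's version number, configuration settings, or system environment details.

  The two types fit different monitoring scenarios; choose based on your monitoring needs and goals.

### Can you give a specific scenario for a Gauge with different labels?

With a Prometheus `Gauge`, labels add finer-grained dimensions or context, which lets you classify and filter the same metric more precisely. Here is a concrete scenario showing a Gauge with different labels.

- ### Scenario: traffic monitoring on a web server
  Suppose you manage a web server and want to monitor traffic on its different ports. Each port can be seen as a separate service entry point, e.g. HTTP on port 80 and HTTPS on port 443. A `Gauge` can represent each port's traffic, with a label identifying the port.
- ### Implementation

```python
from prometheus_client import Gauge

# Create a Gauge with a label
traffic_gauge = Gauge('server_traffic', 'Traffic on server ports', ['port'])

# Assume a function that returns the current traffic for a port
def get_traffic_for_port(port):
    # Return the traffic for that port (sample value)
    return 100.0  # hypothetical traffic value

# Update the traffic for each port
traffic_gauge.labels('80').set(get_traffic_for_port('80'))
traffic_gauge.labels('443').set(get_traffic_for_port('443'))
```

  In this example, `traffic_gauge` is a `Gauge` with one label, `port`. For each port (e.g. 80 and 443) you set the corresponding traffic value, so in Prometheus you can filter and compare traffic by port.
- ### Use case
  This labeled approach fits any case where the same kind of metric must be monitored and reported separately by category or attribute. Here the metric is always network traffic, but using the port number as a label makes it easy to identify and compare traffic for different services.

### Is there an everyday example, such as weather observations?

Sure. Here is an everyday example of using Prometheus for weather data, showing how a `Gauge` can track temperature, humidity, and similar measurements at different locations.

- ### Scenario: temperature and humidity at weather stations
  Suppose you monitor the temperature and humidity reported by a set of weather stations in different places, such as cities, suburbs, and mountains. You want to collect and analyze the data by location and by measurement (temperature or humidity). With a Prometheus Gauge you can monitor the data by location and measurement type, using labels to separate the dimensions.
- ### Implementation

```python=
from prometheus_client import Gauge

# Create Gauges with a label
temperature_gauge = Gauge('weather_temperature', 'Temperature at different locations', ['location'])
humidity_gauge = Gauge('weather_humidity', 'Humidity at different locations', ['location'])

# Assume functions that return the temperature and humidity for a location
def get_temperature(location):
    # Return the temperature for that location (sample value)
    return 22.5  # hypothetical temperature

def get_humidity(location):
    # Return the humidity for that location (sample value)
    return 60  # hypothetical humidity

# Update temperature and humidity for different locations
temperature_gauge.labels('city').set(get_temperature('city'))
humidity_gauge.labels('city').set(get_humidity('city'))

temperature_gauge.labels('suburb').set(get_temperature('suburb'))
humidity_gauge.labels('suburb').set(get_humidity('suburb'))
```

  There are two Gauge metrics here, one for temperature and one for humidity. Each has a `location` label that separates the data by place, so in Prometheus you can filter and compare weather data by location.
- ### Use case
  This multi-dimensional approach suits environmental monitoring, meteorology, and geographic research. It helps scientists and researchers collect weather data from different regions for climate studies or local environmental monitoring. The data can also serve the public, e.g. by providing up-to-date weather information.

### We already have Gauge, so why add a Counter type? Couldn't a Gauge serve as a Counter?

A `Gauge` can in theory mimic a `Counter`, but in practice the two are designed for different purposes and scenarios, which is why Prometheus keeps both.

#### Why Counter is needed

1. **Designed for cumulative counts**: a `Counter` represents a value that only increases (or resets on restart). It fits counting measurements such as request counts, completed tasks, or errors. Its one-way growth makes it the more direct and effective choice for count data.
2. **Prevents misuse**: because a `Counter` can only increase (in the Python client, `inc()` rejects negative amounts), it guards against mistakes such as accidentally decrementing a count. This keeps the measurement consistent and accurate.
3. **Query semantics in Prometheus**: functions such as `rate()` and `increase()` are designed for counters. They treat any drop in value as a counter reset (e.g. after a process restart) and compensate for it, so rate calculations stay correct across restarts.

#### Using a Gauge as a Counter

1. **Flexibility**: a `Gauge` can go up or down, which gives more flexibility. It fits measurements that vary over time, such as temperature or memory usage.
2. **Not suited to cumulative counts**: you can increment a `Gauge` to imitate a `Counter`, but because a `Gauge` can also decrease freely, it cannot reliably represent a cumulative value.
3. **Different query behavior**: `rate()` and `increase()` should only be applied to counters. On a gauge, every decrease is misread as a reset, which gives wrong results; for gauges use functions such as `delta()` or `deriv()` instead.

#### Summary

- Use a `Counter` to track measurements that only ever increase, which keeps the data accurate and consistent.
- Use a `Gauge` to track measurements that can increase or decrease, which gives more flexibility.

Choosing the right metric type is essential for useful monitoring data; it lets you rely on the tools and functions Prometheus provides for efficient analysis and querying.

### [Query][PromQL] Can you give simple query examples for Gauge and Counter?

Sure. Below are some basic PromQL examples with `Gauge` and `Counter` metrics, to show how they are queried in practice.

1. **Gauge query example**
   Suppose you have a `Gauge` metric named `system_memory_usage_bytes` that tracks system memory usage. Query it directly to get the current memory usage:

```promql
system_memory_usage_bytes
```

   This returns the latest memory usage. To query by a specific label, such as the host name:

```promql
system_memory_usage_bytes{host="server01"}
```

   This returns the current memory usage of the host `server01`.

2. **Counter query example**
   Suppose you have a `Counter` metric named `http_requests_total` that records the total number of HTTP requests received. Because a `Counter` only increases, you usually compute a rate to see how it changes within a time window, e.g. the request rate over one minute:

```promql
rate(http_requests_total[1m])
```

   This computes the average per-second request rate over the last minute. Likewise, to filter by a label, e.g. the request rate for a specific path:

```promql
rate(http_requests_total{path="/api"}[1m])
```

   This returns the average number of requests per second to the `/api` path over the last minute.

#### Summary

- For `Gauge` metrics, you usually query the current value directly, or look at how it changed over a period (e.g. with `delta()` or `avg_over_time()`).
- For `Counter` metrics, whose values only increase over time, the common approach is to use functions such as `rate()` or `increase()` to compute the rate of change or the increase over a time range.

These basic queries show how to extract useful information from Prometheus metrics for your monitoring needs, which is important for building effective monitoring and alerting rules.

## References

- https://prometheus.io/
  - [GRAFANA SUPPORT FOR PROMETHEUS](https://prometheus.io/docs/visualization/grafana/)
- https://grafana.com/
  - [Prometheus data source](https://grafana.com/docs/grafana/latest/datasources/prometheus/)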