ADR 001: Introducing Redis as a Data Cache Layer to Support High QPS Requirements

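# ADR: Introducing Redis as a Data Cache Layer to Support High QPS Requirements

## Status
Under evaluation

## Decision makers
Data Architecture Team

## Date
2024-04-26

## Problem

### Expected traffic
[HackMD statistics as of 4/15](https://hackmd.io/X7qRKrU5ScW3WTgQnGfjBw)

During iBus development and testing, the user data API averaged 26 QPS with a peak of 56 QPS. Allowing for initial growth (an expected 20% growth in iBus users), traffic is estimated at 100 times the development and testing load, or roughly 2,600 to 5,600 QPS. In the stable growth phase, QPS may reach 13,000 to 28,000. If the APIM Cache absorbs 80% of the traffic, the QPS that actually reaches the backend is 520 to 1,120 during initial growth and 2,600 to 5,600 during stable growth.

| Growth phase | Raw QPS range | QPS range after APIM Cache |
|---------------|----------------|------------------------|
| Development and testing | 26 to 56 | 5.2 to 11.2 |
| Initial growth | 2,600 to 5,600 | 520 to 1,120 |
| Stable growth | 13,000 to 28,000 | 2,600 to 5,600 |

### On QPS limits
Per-user QPS limits can help manage usage, but total QPS still has to be raised continuously as the number of users grows. Given the current state of iBus development, we expect requests to wait early in the initial growth phase because of insufficient QPS capacity, which creates a performance bottleneck.

## Current state
Scaling the current architecture horizontally requires expensive and complex resource expansion for Druid, SQLServer, and Trino. We recommend introducing a data cache layer to handle the high traffic expected in the future.

## Solution
**Recommended option**: Use Redis as the data cache layer. The data is currently mostly non-real-time or near-real-time, and it is read-only. We recommend that Trino synchronize data to Redis whenever the data changes.

## Functional requirements
- Architecture
    - **ETL-initiated CDC architecture (ETL-CDC)** Option 1
    - **DB-initiated CDC architecture (DB-CDC)** Option 2

[Redis Data Cache - mermaidchart](https://www.mermaidchart.com/app/projects/da6b7dc1-6ff7-4b79-8463-1c73bfcd6bf0/diagrams/b0d43eb8-b2e4-4730-a416-f85225f9da77/version/v0.1/edit)

![image](https://hackmd.io/_uploads/rkdHr-Bb0.png)

### Redis OData support
When an OData query can be expressed in Redis syntax, data is read from Redis first; otherwise the query falls back to Druid, SQLServer, or Trino.

✅ - Fully supported
:x: - Not supported
:construction: - Requires additional processing

**OData parameters**

|select fields|top|skip|count|orderby|filter|
|---|---|---|---|---|---|
|✅|✅|✅|✅|✅|:construction:|

**OData - filter operators**

|and|or|eq|gt|lt|ge|le|ne|
|---|---|---|---|---|---|---|---|
|✅|✅|✅|✅|✅|✅|✅|✅|

**OData - filter functions**

|geo.distance|geo.contains|contains|substring|in(decimal)|in(string)|
|---|---|---|---|---|---|
|✅|:x:|✅|:construction:|✅|✅|

### Simple performance test
**Local test**: Use Flask to run API load tests locally.

> 1. Simulate the baseline network latency of Azure APIM

|APIs|Mean (ms)|Std dev (ms)|Min (ms)|Max (ms)|Median (ms)|
|-:|-:|-:|-:|-:|-:|
|N1|106.9|77.1|43|406|87|
|City bus real-time passenger count|49.6|7.23|45|58|46|
|Taiwan township and district boundary map data|2,890|:warning:6,128|1|2670|214|

> 2. Apply the delay and run the load test

|APIs|type|VU|Duration (min)|RPS|Avg. Response Time (ms)|Total Requests|Error %|
|---:|---:|---:|---:|---:|---:|---:|---:|
|N1|fixed|50|1|42.79|129|2,887|0|
|N1|fixed|50|3|43.79|133|8,210|0|
|City bus real-time passenger count|fixed|50|1|46.50|58|3,096|0|
|City bus real-time passenger count|fixed|50|3|47.28|59|8,842|0|
|Taiwan township and district boundary map data|fixed|50|1|9.04|3,991|609|0|
|Taiwan township and district boundary map data|fixed|50|3|8.73|4,502|1,630|0|

:::warning
Current limitations of the local test:
1. Database updates are not taken into account.
2. The request parameters are not comprehensive enough.
:::

## Non-functional requirements
- HA, failover, single point of failure
    - If debezium is not used, there are no special security, high-availability, or fault-tolerance requirements.

## Dependencies
- **CDC dependencies**
    - **ETL-CDC**: Depends on the ETL emitting events; the event specification must be defined and developed.
    - **DB-CDC**: Depends on the database's own CDC capability, which not every database provides.

## Pros and cons
- **Data cache layer overall**
    - **Pros**:
        - Redis delivers high throughput for 1 MiB payloads (the 1 GigE network is the limiting factor)
        - Low scaling cost (memory and staffing)
        - Well suited to read-only workloads

        ![image](https://hackmd.io/_uploads/HkSt_bSbA.png)
    - **Cons**:
        - Redis does not support some OData syntax (those queries fall back to Trino)
        - Not suitable for data with large throughput
        - Additional DataAPI code
- **ETL-CDC**
    - **Pros**: No DB dependency, flexible, and reusable by other services in the future.
    - **Cons**: When DB data is changed manually, the Redis cache must be cleared manually, or the change waits for the next sync.
- **DB-CDC**
    - **Pros**: Updates are reflected in Redis automatically, keeping the data current.
    - **Cons**: CDC rules must be researched for each supported DB, which creates one-off development work.

## Technology stack
- Technologies used: Redis, Database CDC (debezium is optional).
- SQLserver: ChangeTracking

## Detailed architecture flowchart
[eraser flowchart](https://app.eraser.io/workspace/HfpEYJRPJbsBBJH0e8z5)

### Flow
1. When the database is updated, trigger Trino to fetch the latest data for every VIEW related to that TABLE, then load the data into Redis-stack.
2. Periodically update the schema of each index in redis-stack.

### Open questions
1. How can the database proactively notify the Data Cache (Redis PUB/SUB?)
2. How to implement this on the test machine

![image](https://hackmd.io/_uploads/HJKhTchWR.png)

---

## Simple performance test
**Local test**: Use Flask to run API load tests locally. On the backend, each incoming request first incurs a randomly generated delay of ==**x**== milliseconds, which simulates the client time on the test machine.

==**x**== is derived mainly from the client time observed on the test machine. The table below summarizes the Azure ApiManagementGatewayLogs results.

```Query=
ApiManagementGatewayLogs
| where OperationId contains "_abfs_dal_v_stg_tdx_map_district_boundary_abfs_dal_v_stg_tdx_map_district_bo"
| extend ApimTime = TotalTime - (BackendTime + ClientTime)
| project TimeGenerated, TotalTime, BackendTime, ClientTime, ApimTime, CacheTime, BackendMethod, BackendUrl, BackendResponseCode, Url, ResponseCode, LastErrorMessage, LastErrorReason, LastErrorSource, LastErrorSection, ApiId, ProductId, OperationId
| render timechart
```

|APIs|Mean (ms)|Std dev (ms)|Min (ms)|Max (ms)|Median (ms)|
|-:|-:|-:|-:|-:|-:|
|N1|106.9|77.1|43|406|87|
|City bus real-time passenger count|49.6|7.23|45|58|46|
|Taiwan township and district boundary map data|2,890|:warning:6,128|1|2670|214|

A normal distribution is built from the values above and sampled at random; the sampled value serves as the Client Time for that local request.

URLs of the three APIs:

```url!
N1
http://127.0.0.1:5000/v_stg_tdx_estimatedtimeofarrival_pt1m \
?routeid={{routeid}}&stopname_zh_tw={{routename}}

City bus real-time passenger count
http://127.0.0.1:5000/v_stg_ibus_gateway?routeid={{routeid}} \
&platenumber={{platenumber}}&direction={{random_direction}}

Taiwan township and district boundary map data
http://127.0.0.1:5000/v_stg_tdx_map_district_boundary \
?cityname={{cityname}}&townname={{townname}}
```

|APIs|type|VU|Duration (min)|RPS|Avg. Response Time (ms)|Total Requests|Error %|
|---:|---:|---:|---:|---:|---:|---:|---:|
|N1|fixed|50|1|42.79|129|2,887|0|
|N1|fixed|50|3|43.79|133|8,210|0|
|City bus real-time passenger count|fixed|50|1|46.50|58|3,096|0|
|City bus real-time passenger count|fixed|50|3|47.28|59|8,842|0|
|Taiwan township and district boundary map data|fixed|50|1|9.04|3,991|609|0|
|Taiwan township and district boundary map data|fixed|50|3|8.73|4,502|1,630|0.00|

:::warning
Current limitations of the local test:
1. Database updates are not taken into account.
2. The request parameters are not comprehensive enough.
:::

### Redis OData support
When an OData query can be expressed in Redis syntax, data is read from Redis first; otherwise the query falls back to Druid, SQLServer, or Trino.

✅ - Fully supported
:x: - Not supported
:construction: - Requires processing

:::danger
Here, ✅ (fully supported) means the OData parameter string can be split easily; :construction: means additional processing is required.
:::

**normal**

|API name|select fields|top|skip|count|orderby|filter|
|---|---|---|---|---|---|---|
|n1|✅|✅|✅|✅|✅|:construction:|
|Map data|✅|✅|✅|✅|✅|:construction:|
|銓鼎|✅|✅|✅|✅|✅|:construction:|

```python
from redis.commands.search.query import Query

# `rs` is assumed to be a RediSearch index handle, e.g. redis.Redis(...).ft(<index name>);
# `skip` and `top` hold the OData $skip and $top values.

# select fields
result = rs.search(Query('*').return_fields("@routename_zh_tw", "@routeid")).docs

# filters that are fully supported, e.g. on str and int values
result = rs.search(Query("@stopname_zh_tw: '捷運美麗島站'"))
result = rs.search(Query("@direction: 0"))

# top + skip + orderby
result = rs.search(Query("*").paging(skip, top).sort_by("direction", asc=False))

# count
count = rs.search(Query("*")).total
```

**filter function**

|API name|geo_distance|geo_contains|contains|substring|
|---|---|---|---|---|
|n1|✅|:x:|✅|:construction:|
|Map data|✅|:x:|✅|:construction:|
|銓鼎|✅|:x:|✅|:construction:|

```python
import json

from redis.commands.search.query import Query

# geo_distance: the redis-py call has the signature
#   geodist(name, place1, place2, unit=None)

# geo_contains: not yet determined

# contains: matches any stopname_zh_tw value that contains 美麗,
# e.g. 真美麗, 好美麗, 美麗島
result = rs.search(Query("@stopname_zh_tw: '*美麗*'"))

# substring, e.g. substring('stopname_zh_tw', 2, 3)
# Redis returns the whole value, so the slice is applied in Python.
result = rs.search(Query("*"))
start_idx = 2
length = 3
for doc in result.docs:
    d = json.loads(doc.json)
    # For 捷運美麗島站 this prints 美麗島
    print(d['stopname_zh_tw'][start_idx:start_idx + length])
```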
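The :construction: mark on `filter` comes from having to rewrite the OData expression into Redis query syntax. The following is a minimal sketch of that rewrite, not the project's actual implementation. It assumes the filtered fields are indexed as NUMERIC in RediSearch and handles only flat `and`/`or` chains of numeric comparisons. The function name `odata_filter_to_redis` is hypothetical. Anything outside this subset (string comparisons, parentheses, `geo.contains`, and so on) returns `None`, which a caller could treat as the signal to fall back to Druid, SQLServer, or Trino.

```python
import re
from typing import Optional

# OData comparison operators mapped to RediSearch numeric-range syntax.
# "(" before a bound makes it exclusive; a leading "-" negates the clause.
_NUMERIC_TEMPLATES = {
    "eq": "@{f}:[{v} {v}]",
    "ne": "-@{f}:[{v} {v}]",
    "gt": "@{f}:[({v} +inf]",
    "ge": "@{f}:[{v} +inf]",
    "lt": "@{f}:[-inf ({v}]",
    "le": "@{f}:[-inf {v}]",
}

# One clause: <field> <operator> <number>
_CLAUSE = re.compile(r"^(\w+)\s+(eq|ne|gt|ge|lt|le)\s+(-?\d+(?:\.\d+)?)$")


def odata_filter_to_redis(filter_expr: str) -> Optional[str]:
    """Translate a flat numeric OData $filter into a RediSearch query string.

    Returns None when the expression is outside this sketch's subset,
    so the caller can route the request to the fallback engine.
    """
    or_groups = []
    # OData gives "and" higher precedence than "or", so split on "or" first.
    for or_part in re.split(r"\s+or\s+", filter_expr.strip()):
        and_terms = []
        for clause in re.split(r"\s+and\s+", or_part):
            match = _CLAUSE.match(clause.strip())
            if match is None:
                return None  # unsupported syntax: fall back
            field, op, value = match.groups()
            and_terms.append(_NUMERIC_TEMPLATES[op].format(f=field, v=value))
        # In RediSearch, space-separated clauses are intersected (AND).
        or_groups.append(" ".join(and_terms))
    if len(or_groups) == 1:
        return or_groups[0]
    # "|" is the RediSearch union (OR); parentheses keep each group intact.
    return " | ".join(f"({group})" for group in or_groups)


if __name__ == "__main__":
    print(odata_filter_to_redis("direction ge 0 and direction le 1"))
    print(odata_filter_to_redis("geo.contains(position, area)"))
```

Returning `None` instead of raising keeps the routing decision in one place: the caller runs the Redis query when it gets a string and otherwise sends the original OData request to the existing backend.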
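The local test described earlier draws each request's Client Time from a normal distribution built from the ApiManagementGatewayLogs summary. A minimal sketch of that sampling step follows, using the mean, standard deviation, and minimum from the summary table. The dictionary keys and function names are hypothetical, and clamping each draw to the observed minimum is an assumption added here because a normal draw can be negative; the source does not say how such values were handled.

```python
import random

# (mean_ms, std_ms, min_ms) taken from the ApiManagementGatewayLogs summary table.
# The dictionary keys are illustrative labels, not real API identifiers.
CLIENT_TIME_STATS = {
    "N1": (106.9, 77.1, 43),
    "city_bus_passenger_count": (49.6, 7.23, 45),
    "district_boundary": (2890, 6128, 1),
}


def sample_client_time_ms(api: str, rng=random) -> float:
    """Draw one simulated Client Time in milliseconds for the given API."""
    mean_ms, std_ms, min_ms = CLIENT_TIME_STATS[api]
    draw = rng.gauss(mean_ms, std_ms)
    # Assumption: never go below the observed minimum, since a raw normal
    # draw can be negative when the standard deviation is large.
    return max(float(min_ms), draw)


def simulated_delay_seconds(api: str, rng=random) -> float:
    """Convert the sampled Client Time to seconds, e.g. for time.sleep()."""
    return sample_client_time_ms(api, rng) / 1000.0


if __name__ == "__main__":
    seeded = random.Random(42)  # fixed seed so a test run is repeatable
    for name in CLIENT_TIME_STATS:
        print(name, round(sample_client_time_ms(name, seeded), 1), "ms")
```

In a Flask handler, the value from `simulated_delay_seconds` could be passed to `time.sleep` before the response is built. For the boundary map API the standard deviation exceeds the mean, so many draws land on the clamp; a skewed distribution may model that endpoint better than a normal one.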
