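# Istio Circuit Breaking

Circuit breaking is an important pattern for building resilient microservice applications. It lets an application limit the impact of failures, latency spikes, and other unpredictable network effects. It works by monitoring the health of a service and cutting off traffic to it when a problem is detected, which prevents a failing service from dragging down the whole system and making it unavailable.

1. Monitor service health: the Envoy sidecars that Istio injects track calls to each host of a service, including consecutive error responses (such as 5xx) and the number of open connections and pending requests.
2. Detect failures: when these values exceed the thresholds configured in a DestinationRule, the host is marked as unhealthy.
3. Cut off traffic: an unhealthy host is ejected from the load-balancing pool, so new requests are no longer routed to it. Requests that exceed the connection-pool limits fail immediately with a 503.
4. Recover: ejection is temporary. Once the ejection time has elapsed, the host returns to the pool and receives traffic again.

## Walkthrough

* Istio must already be installed.
* Enable Istio automatic sidecar injection in the `default` namespace.
* If Istio was installed in `ambient` mode, the namespace still needs the `istio-injection` label so that the workloads run in sidecar mode, which this walkthrough requires.

```
$ kubectl label namespace default istio-injection=enabled
```

* Deploy the httpbin application

```
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/httpbin/httpbin.yaml
```

```
$ kubectl get all
NAME                           READY   STATUS    RESTARTS   AGE
pod/httpbin-86b8ffc5ff-tsrtd   2/2     Running   0          5m2s

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/httpbin      ClusterIP   10.43.44.113   <none>        8000/TCP   5m2s
service/kubernetes   ClusterIP   10.43.0.1      <none>        443/TCP    134d

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/httpbin   1/1     1            1           5m2s

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/httpbin-86b8ffc5ff   1         1         1       5m2s
```

* Create a destination rule that applies circuit-breaking settings to calls to the httpbin service
* The connection pool allows at most 1 pending HTTP/1 request (`http1MaxPendingRequests`), 1 request per connection (`maxRequestsPerConnection`), and 1 TCP connection (`maxConnections`)
* Outlier detection checks every second (`interval`) and ejects a host for a base time of 3 minutes (`baseEjectionTime`) after a single 5xx error (`consecutive5xxErrors`); up to 100% of the hosts may be ejected (`maxEjectionPercent`)

```
$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutive5xxErrors: 1
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100
EOF
```

```
$ kubectl get destinationrule httpbin
NAME      HOST      AGE
httpbin   httpbin   12s
```

### Add a client

Create a client to send traffic to the httpbin service. The client is Fortio, a load-testing tool that lets you control the number of connections, the concurrency, and the delay between outgoing HTTP requests. Fortio makes it easy to trip the circuit-breaking policy configured in the DestinationRule above.

```
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/httpbin/sample-client/fortio-deploy.yaml
```

```
$ kubectl get all
NAME                                 READY   STATUS    RESTARTS   AGE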
pod/fortio-deploy-689bd5969b-nn8c5   2/2     Running   0          27s
pod/httpbin-86b8ffc5ff-tsrtd         2/2     Running   0          7m38s

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/fortio       ClusterIP   10.43.243.156   <none>        8080/TCP   27s
service/httpbin      ClusterIP   10.43.44.113    <none>        8000/TCP   7m38s
service/kubernetes   ClusterIP   10.43.0.1       <none>        443/TCP    134d

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/fortio-deploy   1/1     1            1           27s
deployment.apps/httpbin         1/1     1            1           7m38s

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/fortio-deploy-689bd5969b   1         1         1       27s
replicaset.apps/httpbin-86b8ffc5ff         1         1         1       7m38s
```

### Trip the circuit breaker

#### Send 20 requests (-n 20) over 2 concurrent connections (-c 2)

```
$ kubectl exec fortio-deploy-689bd5969b-nn8c5 -c fortio -- /usr/bin/fortio load -c 2 -qps 0 -n 20 -loglevel Warning http://httpbin:8000/get
```

Output

```
{"ts":1720169256.582932,"level":"info","r":1,"file":"logger.go","line":254,"msg":"Log level is now 3 Warning (was 2 Info)"}
Fortio 1.60.3 running at 0 queries per second, 8->8 procs, for 20 calls: http://httpbin:8000/get
Starting at max qps with 2 thread(s) [gomax 8] for exactly 20 calls (10 per thread + 0)
{"ts":1720169256.595975,"level":"warn","r":12,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169256.598523,"level":"warn","r":12,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169256.602482,"level":"warn","r":12,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169256.642456,"level":"warn","r":13,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169256.672433,"level":"warn","r":13,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169256.680723,"level":"warn","r":12,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169256.687158,"level":"warn","r":13,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169256.696587,"level":"warn","r":12,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169256.698998,"level":"warn","r":12,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
Ended after 143.100201ms : 20 calls. qps=139.76
Aggregated Function Time : count 20 avg 0.012769649 +/- 0.01521 min 0.000646681 max 0.056919877 sum 0.255392975
# range, mid point, percentile, count
>= 0.000646681 <= 0.001 , 0.000823341 , 15.00, 3
> 0.001 <= 0.002 , 0.0015 , 20.00, 1
> 0.002 <= 0.003 , 0.0025 , 30.00, 2
> 0.003 <= 0.004 , 0.0035 , 35.00, 1
> 0.005 <= 0.006 , 0.0055 , 40.00, 1
> 0.008 <= 0.009 , 0.0085 , 50.00, 2
> 0.009 <= 0.01 , 0.0095 , 60.00, 2
> 0.01 <= 0.011 , 0.0105 , 70.00, 2
> 0.014 <= 0.016 , 0.015 , 85.00, 3
> 0.025 <= 0.03 , 0.0275 , 90.00, 1
> 0.05 <= 0.0569199 , 0.0534599 , 100.00, 2
# target 50% 0.009
# target 75% 0.0146667
# target 90% 0.03
# target 99% 0.0562279
# target 99.9% 0.0568507
Error cases : count 9 avg 0.0030782114 +/- 0.002748 min 0.000646681 max 0.009490675 sum 0.027703903
# range, mid point, percentile, count
>= 0.000646681 <= 0.001 , 0.000823341 , 33.33, 3
> 0.001 <= 0.002 , 0.0015 , 44.44, 1
> 0.002 <= 0.003 , 0.0025 , 66.67, 2
> 0.003 <= 0.004 , 0.0035 , 77.78, 1
> 0.005 <= 0.006 , 0.0055 , 88.89, 1
> 0.009 <= 0.00949068 , 0.00924534 , 100.00, 1
# target 50% 0.00225
# target 75% 0.00375
# target 90% 0.00904907
# target 99% 0.00944651
# target 99.9% 0.00948626
# Socket and IP used for each connection:
[0] 6 socket used, resolved to 10.43.44.113:8000, connection timing : count 6 avg 0.00034085183 +/- 9.122e-05 min 0.000203988 max 0.000459448 sum 0.002045111
[1] 4 socket used, resolved to 10.43.44.113:8000, connection timing : count 4 avg 0.00041282025 +/- 0.0001522 min 0.000187264 max 0.000559172 sum 0.001651281
Connection time (s) : count 10 avg 0.0003696392 +/- 0.0001245 min 0.000187264 max 0.000559172 sum 0.003696392
Sockets used: 10 (for perfect keepalive, would be 2)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
10.43.44.113:8000: 10
Code 200 : 11 (55.0 %)
Code 503 : 9 (45.0 %)
Response Header Sizes : count 20 avg 126.8 +/- 114.7 min 0 max 231 sum 2536
Response Body/Total Sizes : count 20 avg 561.95 +/- 290.3 min 241 max 825 sum 11239
All done 20 calls (plus 0 warmup) 12.770 ms avg, 139.8 qps
```

Even though only one connection and one pending request are allowed, a little over half of the requests still succeeded, which shows that the istio-proxy allows some leeway.

```
Code 200 : 11 (55.0 %) # successful requests
Code 503 : 9 (45.0 %) # failed requests
```

#### Send 30 requests (-n 30) over 3 concurrent connections (-c 3)

```
$ kubectl exec fortio-deploy-689bd5969b-nn8c5 -c fortio -- /usr/bin/fortio load -c 3 -qps 0 -n 30 -loglevel Warning http://httpbin:8000/get
```

Output

```
{"ts":1720169589.239981,"level":"info","r":1,"file":"logger.go","line":254,"msg":"Log level is now 3 Warning (was 2 Info)"}
Fortio 1.60.3 running at 0 queries per second, 8->8 procs, for 30 calls: http://httpbin:8000/get
Starting at max qps with 3 thread(s) [gomax 8] for exactly 30 calls (10 per thread + 0)
{"ts":1720169589.247802,"level":"warn","r":27,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":2,"run":0}
{"ts":1720169589.249129,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.250238,"level":"warn","r":27,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":2,"run":0}
{"ts":1720169589.253388,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.255138,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.256829,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.257720,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.276840,"level":"warn","r":26,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169589.277520,"level":"warn","r":27,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":2,"run":0}
{"ts":1720169589.280439,"level":"warn","r":26,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169589.285195,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.286524,"level":"warn","r":26,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169589.288304,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.292242,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.292570,"level":"warn","r":27,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":2,"run":0}
{"ts":1720169589.296302,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.297674,"level":"warn","r":26,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169589.308965,"level":"warn","r":27,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":2,"run":0}
{"ts":1720169589.314986,"level":"warn","r":26,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169589.320419,"level":"warn","r":27,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":2,"run":0}
Ended after 85.810211ms : 30 calls. qps=349.61
Aggregated Function Time : count 30 avg 0.0070091095 +/- 0.008139 min 0.00049344 max 0.030126658 sum 0.210273286
# range, mid point, percentile, count
>= 0.00049344 <= 0.001 , 0.00074672 , 16.67, 5
> 0.001 <= 0.002 , 0.0015 , 33.33, 5
> 0.002 <= 0.003 , 0.0025 , 46.67, 4
> 0.003 <= 0.004 , 0.0035 , 56.67, 3
> 0.004 <= 0.005 , 0.0045 , 63.33, 2
> 0.005 <= 0.006 , 0.0055 , 66.67, 1
> 0.006 <= 0.007 , 0.0065 , 70.00, 1
> 0.01 <= 0.011 , 0.0105 , 80.00, 3
> 0.012 <= 0.014 , 0.013 , 83.33, 1
> 0.014 <= 0.016 , 0.015 , 86.67, 1
> 0.016 <= 0.018 , 0.017 , 90.00, 1
> 0.025 <= 0.03 , 0.0275 , 96.67, 2
> 0.03 <= 0.0301267 , 0.0300633 , 100.00, 1
# target 50% 0.00333333
# target 75% 0.0105
# target 90% 0.018
# target 99% 0.0300887
# target 99.9% 0.0301229
Error cases : count 20 avg 0.0022672909 +/- 0.001464 min 0.00049344 max 0.006094168 sum 0.045345818
# range, mid point, percentile, count
>= 0.00049344 <= 0.001 , 0.00074672 , 25.00, 5
> 0.001 <= 0.002 , 0.0015 , 50.00, 5
> 0.002 <= 0.003 , 0.0025 , 70.00, 4
> 0.003 <= 0.004 , 0.0035 , 85.00, 3
> 0.004 <= 0.005 , 0.0045 , 95.00, 2
> 0.006 <= 0.00609417 , 0.00604708 , 100.00, 1
# target 50% 0.002
# target 75% 0.00333333
# target 90% 0.0045
# target 99% 0.00607533
# target 99.9% 0.00609228
# Socket and IP used for each connection:
[0] 9 socket used, resolved to 10.43.44.113:8000, connection timing : count 9 avg 0.00027354944 +/- 0.0001713 min 8.3008e-05 max 0.000635358 sum 0.002461945
[1] 6 socket used, resolved to 10.43.44.113:8000, connection timing : count 6 avg 0.00024202917 +/- 5.902e-05 min 0.000167526 max 0.000324743 sum 0.001452175
[2] 6 socket used, resolved to 10.43.44.113:8000, connection timing : count 6 avg 0.0003877755 +/- 0.000185 min 0.000102225 max 0.000595858 sum 0.002326653
Connection time (s) : count 21 avg 0.00029717967 +/- 0.0001637 min 8.3008e-05 max 0.000635358 sum 0.006240773
Sockets used: 21 (for perfect keepalive, would be 3)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
10.43.44.113:8000: 21
Code 200 : 10 (33.3 %)
Code 503 : 20 (66.7 %)
Response Header Sizes : count 30 avg 76.833333 +/- 108.7 min 0 max 231 sum 2305
Response Body/Total Sizes : count 30 avg 435.5 +/- 275.1 min 241 max 825 sum 13065
All done 30 calls (plus 0 warmup) 7.009 ms avg, 349.6 qps
```

* Now only 33.3% of the requests succeeded; the rest were rejected by the circuit breaker

```
Code 200 : 10 (33.3 %)
Code 503 : 20 (66.7 %)
```

### Query the istio-proxy stats for circuit-breaking details

```
$ kubectl exec fortio-deploy-689bd5969b-nn8c5 -c istio-proxy -- pilot-agent request GET stats | grep httpbin | grep pending
```

Output

```
cluster.outbound|8000||httpbin.default.svc.cluster.local.circuit_breakers.default.remaining_pending: 1
cluster.outbound|8000||httpbin.default.svc.cluster.local.circuit_breakers.default.rq_pending_open: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.circuit_breakers.high.rq_pending_open: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_active: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_failure_eject: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_overflow: 29
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_total: 21
```

`upstream_rq_pending_overflow` is 29, which means that 29 calls so far have been flagged for circuit breaking. This matches the 9 and 20 failed requests in the two runs above.

## Cleanup

```
$ kubectl delete destinationrule httpbin
$ kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/httpbin/httpbin.yaml
$ kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/httpbin/sample-client/fortio-deploy.yaml
```

## References

https://istio.io/latest/zh/docs/tasks/traffic-management/circuit-breaking/
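
## Appendix: tallying Fortio status codes

When repeating the Fortio runs with different `-c` and `-n` values, it can help to reduce each run to a single line. The following is a minimal sketch, not part of Fortio or Istio: `summarize_codes` is a hypothetical helper name, and it assumes the output contains `Code NNN : count (pct %)` lines in the form shown in the runs above, treating every non-200 code as a failure.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a Fortio feature):
# reads Fortio output on stdin and prints the total number of calls and the
# share of non-200 responses, based on the "Code NNN : count (pct %)" lines.
summarize_codes() {
  awk '
    # Match lines such as: Code 503 : 9 (45.0 %)
    $1 == "Code" && $3 == ":" {
      total += $4
      if ($2 != "200") failed += $4
    }
    END {
      if (total == 0) { print "no Code lines found"; exit 1 }
      printf "total=%d failed=%d failed_pct=%.1f\n", total, failed, 100 * failed / total
    }
  '
}

# Feed in the tallies from the two runs recorded in this document.
printf 'Code 200 : 11 (55.0 %%)\nCode 503 : 9 (45.0 %%)\n' | summarize_codes
printf 'Code 200 : 10 (33.3 %%)\nCode 503 : 20 (66.7 %%)\n' | summarize_codes
```

In practice the helper would be placed at the end of a pipeline, for example by appending `| summarize_codes` to one of the `kubectl exec ... fortio load ...` commands used earlier.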