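# Istio Circuit Breaking
Circuit breaking is an important pattern for building resilient microservice applications. It lets an application limit the impact of failures, latency spikes, and other unpredictable network effects. It works by monitoring the health of a service and cutting off traffic when a problem is detected, which keeps a failing service from dragging down the whole system and making it unavailable.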
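1. Monitor service health: Istio's sidecar proxies track indicators of service health, such as error responses and the number of open connections and pending requests.
2. Detect failures: If an indicator crosses its configured threshold, Istio marks the affected service instance as unhealthy.
3. Cut off traffic: Once an instance is marked unhealthy, Istio stops routing new requests to it, and requests that exceed the configured limits fail immediately instead of queuing.
4. Recover: Istio periodically re-evaluates unhealthy instances. When an instance has recovered, Istio routes traffic to it again.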
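The detection step can be illustrated with a toy sketch. The `detect_outlier` function below is hypothetical and is not how Envoy implements outlier detection; it only shows the idea of counting consecutive 5xx responses against a threshold, in the spirit of the `consecutive5xxErrors` setting used later in this walkthrough.

```shell
# Toy sketch: read one HTTP status code per line on stdin and report the
# first point at which N consecutive 5xx responses have been seen.
# detect_outlier is a hypothetical helper, not part of Istio or Envoy.
detect_outlier() {
  threshold="$1"
  awk -v n="$threshold" '
    { if ($1 >= 500 && $1 <= 599) streak++; else streak = 0 }
    streak >= n { print "eject after response " NR; exit }
  '
}
```

For example, `printf '200\n503\n200\n503\n503\n' | detect_outlier 2` prints `eject after response 5`, because the single 503 on line 2 is reset by the 200 that follows it.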
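## Walkthrough
* Istio must already be installed.
* Enable Istio automatic sidecar injection on the default namespace.
* If Istio was installed in `ambient` mode, the namespace still needs the `istio-injection` label so that the workloads run in sidecar mode for this walkthrough.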
```
$ kubectl label namespace default istio-injection=enabled
```
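To confirm that the label took effect, the namespace can be listed with its `istio-injection` label shown as a column. The wrapper name `show_injection_label` is only an illustrative helper; the `-L` flag is the standard `kubectl get` option for label columns.

```shell
# Show the istio-injection label of a namespace (defaults to "default").
# show_injection_label is a hypothetical convenience wrapper.
show_injection_label() {
  kubectl get namespace "${1:-default}" -L istio-injection
}
```

Calling `show_injection_label` should list the default namespace with the label value `enabled` if the previous command succeeded.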
* Deploy the httpbin application
```
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/httpbin/httpbin.yaml
```
```
$ kubectl get all
NAME                           READY   STATUS    RESTARTS   AGE
pod/httpbin-86b8ffc5ff-tsrtd   2/2     Running   0          5m2s

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/httpbin      ClusterIP   10.43.44.113   <none>        8000/TCP   5m2s
service/kubernetes   ClusterIP   10.43.0.1      <none>        443/TCP    134d

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/httpbin   1/1     1            1           5m2s

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/httpbin-86b8ffc5ff   1         1         1       5m2s
```
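The `2/2` in the READY column suggests that the pod runs two containers: the application and the injected sidecar. As a quick check, a sketch like the following lists the container names. It assumes the sample labels its pods with `app=httpbin`, and the helper name `pod_containers` is hypothetical.

```shell
# Print the container names of the first pod matching an app label.
# pod_containers is a hypothetical helper; the label value is an assumption
# based on the Istio httpbin sample.
pod_containers() {
  kubectl get pods -l "app=$1" \
    -o jsonpath='{.items[0].spec.containers[*].name}'
}
```

In a default sidecar setup, `pod_containers httpbin` would be expected to list `istio-proxy` alongside the application container.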
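* Create a destination rule that applies circuit-breaking settings to calls to the httpbin service
* The connection pool allows at most 1 pending HTTP/1.1 request (`http1MaxPendingRequests`), 1 request per connection (`maxRequestsPerConnection`), and 1 TCP connection (`maxConnections`)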
```
$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutive5xxErrors: 1
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100
EOF
```
```
$ kubectl get destinationrule httpbin
NAME      HOST      AGE
httpbin   httpbin   12s
```
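To double-check what was stored, a sketch like the following prints the rule's traffic policy. `show_traffic_policy` is a hypothetical helper; the JSONPath assumes the rule keeps its settings under `spec.trafficPolicy`, as in the manifest applied above.

```shell
# Print the trafficPolicy section of a DestinationRule.
# show_traffic_policy is a hypothetical helper name.
show_traffic_policy() {
  kubectl get destinationrule "$1" -o jsonpath='{.spec.trafficPolicy}'
}
```

Running `show_traffic_policy httpbin` should show the `connectionPool` and `outlierDetection` values that were applied.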
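### Add a client
Create a client to send traffic to the httpbin service. The client is Fortio, a load-testing tool that lets you control the number of connections, the concurrency, and the delay of outgoing HTTP calls. With Fortio you can reliably trip the circuit-breaking policy set in the DestinationRule above.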
```
$ kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/httpbin/sample-client/fortio-deploy.yaml
```
```
$ kubectl get all
NAME                                 READY   STATUS    RESTARTS   AGE
pod/fortio-deploy-689bd5969b-nn8c5   2/2     Running   0          27s
pod/httpbin-86b8ffc5ff-tsrtd         2/2     Running   0          7m38s

NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/fortio       ClusterIP   10.43.243.156   <none>        8080/TCP   27s
service/httpbin      ClusterIP   10.43.44.113    <none>        8000/TCP   7m38s
service/kubernetes   ClusterIP   10.43.0.1       <none>        443/TCP    134d

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/fortio-deploy   1/1     1            1           27s
deployment.apps/httpbin         1/1     1            1           7m38s

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/fortio-deploy-689bd5969b   1         1         1       27s
replicaset.apps/httpbin-86b8ffc5ff         1         1         1       7m38s
```
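The pod name `fortio-deploy-689bd5969b-nn8c5` used in the commands that follow is specific to this cluster. As a sketch, the name can be looked up by label instead of being copied by hand. This assumes the sample deployment labels its pods with `app=fortio`; `fortio_pod` and `run_load` are hypothetical helper names.

```shell
# Look up the name of the first Fortio pod by label instead of hard-coding it.
# Assumes the sample deployment labels its pods with app=fortio.
fortio_pod() {
  kubectl get pods -l app=fortio -o 'jsonpath={.items[0].metadata.name}'
}

# Run a Fortio load test against httpbin.
# $1 = number of concurrent connections, $2 = total number of calls.
# run_load is a hypothetical wrapper around the command used in this walkthrough.
run_load() {
  kubectl exec "$(fortio_pod)" -c fortio -- /usr/bin/fortio load \
    -c "$1" -qps 0 -n "$2" -loglevel Warning http://httpbin:8000/get
}
```

With these helpers, `run_load 2 20` issues the same request pattern as the first test in the next section.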
### Trigger the circuit breaker
#### Send 20 requests (-n 20) over 2 concurrent connections (-c 2)
```
$ kubectl exec fortio-deploy-689bd5969b-nn8c5 -c fortio -- /usr/bin/fortio load -c 2 -qps 0 -n 20 -loglevel Warning http://httpbin:8000/get
```
Output:
```
{"ts":1720169256.582932,"level":"info","r":1,"file":"logger.go","line":254,"msg":"Log level is now 3 Warning (was 2 Info)"}
Fortio 1.60.3 running at 0 queries per second, 8->8 procs, for 20 calls: http://httpbin:8000/get
Starting at max qps with 2 thread(s) [gomax 8] for exactly 20 calls (10 per thread + 0)
{"ts":1720169256.595975,"level":"warn","r":12,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169256.598523,"level":"warn","r":12,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169256.602482,"level":"warn","r":12,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169256.642456,"level":"warn","r":13,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169256.672433,"level":"warn","r":13,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169256.680723,"level":"warn","r":12,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169256.687158,"level":"warn","r":13,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169256.696587,"level":"warn","r":12,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169256.698998,"level":"warn","r":12,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
Ended after 143.100201ms : 20 calls. qps=139.76
Aggregated Function Time : count 20 avg 0.012769649 +/- 0.01521 min 0.000646681 max 0.056919877 sum 0.255392975
# range, mid point, percentile, count
>= 0.000646681 <= 0.001 , 0.000823341 , 15.00, 3
> 0.001 <= 0.002 , 0.0015 , 20.00, 1
> 0.002 <= 0.003 , 0.0025 , 30.00, 2
> 0.003 <= 0.004 , 0.0035 , 35.00, 1
> 0.005 <= 0.006 , 0.0055 , 40.00, 1
> 0.008 <= 0.009 , 0.0085 , 50.00, 2
> 0.009 <= 0.01 , 0.0095 , 60.00, 2
> 0.01 <= 0.011 , 0.0105 , 70.00, 2
> 0.014 <= 0.016 , 0.015 , 85.00, 3
> 0.025 <= 0.03 , 0.0275 , 90.00, 1
> 0.05 <= 0.0569199 , 0.0534599 , 100.00, 2
# target 50% 0.009
# target 75% 0.0146667
# target 90% 0.03
# target 99% 0.0562279
# target 99.9% 0.0568507
Error cases : count 9 avg 0.0030782114 +/- 0.002748 min 0.000646681 max 0.009490675 sum 0.027703903
# range, mid point, percentile, count
>= 0.000646681 <= 0.001 , 0.000823341 , 33.33, 3
> 0.001 <= 0.002 , 0.0015 , 44.44, 1
> 0.002 <= 0.003 , 0.0025 , 66.67, 2
> 0.003 <= 0.004 , 0.0035 , 77.78, 1
> 0.005 <= 0.006 , 0.0055 , 88.89, 1
> 0.009 <= 0.00949068 , 0.00924534 , 100.00, 1
# target 50% 0.00225
# target 75% 0.00375
# target 90% 0.00904907
# target 99% 0.00944651
# target 99.9% 0.00948626
# Socket and IP used for each connection:
[0] 6 socket used, resolved to 10.43.44.113:8000, connection timing : count 6 avg 0.00034085183 +/- 9.122e-05 min 0.000203988 max 0.000459448 sum 0.002045111
[1] 4 socket used, resolved to 10.43.44.113:8000, connection timing : count 4 avg 0.00041282025 +/- 0.0001522 min 0.000187264 max 0.000559172 sum 0.001651281
Connection time (s) : count 10 avg 0.0003696392 +/- 0.0001245 min 0.000187264 max 0.000559172 sum 0.003696392
Sockets used: 10 (for perfect keepalive, would be 2)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
10.43.44.113:8000: 10
Code 200 : 11 (55.0 %)
Code 503 : 9 (45.0 %)
Response Header Sizes : count 20 avg 126.8 +/- 114.7 min 0 max 231 sum 2536
Response Body/Total Sizes : count 20 avg 561.95 +/- 290.3 min 241 max 825 sum 11239
All done 20 calls (plus 0 warmup) 12.770 ms avg, 139.8 qps
```
Just over half of the requests still succeeded, which suggests that istio-proxy allows some leeway when enforcing the limits.
```
Code 200 : 11 (55.0 %) # successful requests
Code 503 : 9 (45.0 %) # failed requests
```
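When the full Fortio output is long, the status-code summary can be pulled out with a small filter. The sketch below assumes the summary lines keep the `Code NNN : count (pct %)` shape shown above and are written to standard output; `summarize_codes` is a hypothetical helper.

```shell
# Read Fortio output on stdin and print "<code> <count>" for each
# "Code NNN : count (pct %)" summary line; all other lines are ignored.
# summarize_codes is a hypothetical helper.
summarize_codes() {
  awk '$1 == "Code" && $3 == ":" { print $2, $4 }'
}
```

Piping the output of the first run through `summarize_codes` would print `200 11` and `503 9`.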
#### Send 30 requests (-n 30) over 3 concurrent connections (-c 3)
```
$ kubectl exec fortio-deploy-689bd5969b-nn8c5 -c fortio -- /usr/bin/fortio load -c 3 -qps 0 -n 30 -loglevel Warning http://httpbin:8000/get
```
Output:
```
{"ts":1720169589.239981,"level":"info","r":1,"file":"logger.go","line":254,"msg":"Log level is now 3 Warning (was 2 Info)"}
Fortio 1.60.3 running at 0 queries per second, 8->8 procs, for 30 calls: http://httpbin:8000/get
Starting at max qps with 3 thread(s) [gomax 8] for exactly 30 calls (10 per thread + 0)
{"ts":1720169589.247802,"level":"warn","r":27,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":2,"run":0}
{"ts":1720169589.249129,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.250238,"level":"warn","r":27,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":2,"run":0}
{"ts":1720169589.253388,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.255138,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.256829,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.257720,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.276840,"level":"warn","r":26,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169589.277520,"level":"warn","r":27,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":2,"run":0}
{"ts":1720169589.280439,"level":"warn","r":26,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169589.285195,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.286524,"level":"warn","r":26,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169589.288304,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.292242,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.292570,"level":"warn","r":27,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":2,"run":0}
{"ts":1720169589.296302,"level":"warn","r":25,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":0,"run":0}
{"ts":1720169589.297674,"level":"warn","r":26,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169589.308965,"level":"warn","r":27,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":2,"run":0}
{"ts":1720169589.314986,"level":"warn","r":26,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":1,"run":0}
{"ts":1720169589.320419,"level":"warn","r":27,"file":"http_client.go","line":1104,"msg":"Non ok http code","code":503,"status":"HTTP/1.1 503","thread":2,"run":0}
Ended after 85.810211ms : 30 calls. qps=349.61
Aggregated Function Time : count 30 avg 0.0070091095 +/- 0.008139 min 0.00049344 max 0.030126658 sum 0.210273286
# range, mid point, percentile, count
>= 0.00049344 <= 0.001 , 0.00074672 , 16.67, 5
> 0.001 <= 0.002 , 0.0015 , 33.33, 5
> 0.002 <= 0.003 , 0.0025 , 46.67, 4
> 0.003 <= 0.004 , 0.0035 , 56.67, 3
> 0.004 <= 0.005 , 0.0045 , 63.33, 2
> 0.005 <= 0.006 , 0.0055 , 66.67, 1
> 0.006 <= 0.007 , 0.0065 , 70.00, 1
> 0.01 <= 0.011 , 0.0105 , 80.00, 3
> 0.012 <= 0.014 , 0.013 , 83.33, 1
> 0.014 <= 0.016 , 0.015 , 86.67, 1
> 0.016 <= 0.018 , 0.017 , 90.00, 1
> 0.025 <= 0.03 , 0.0275 , 96.67, 2
> 0.03 <= 0.0301267 , 0.0300633 , 100.00, 1
# target 50% 0.00333333
# target 75% 0.0105
# target 90% 0.018
# target 99% 0.0300887
# target 99.9% 0.0301229
Error cases : count 20 avg 0.0022672909 +/- 0.001464 min 0.00049344 max 0.006094168 sum 0.045345818
# range, mid point, percentile, count
>= 0.00049344 <= 0.001 , 0.00074672 , 25.00, 5
> 0.001 <= 0.002 , 0.0015 , 50.00, 5
> 0.002 <= 0.003 , 0.0025 , 70.00, 4
> 0.003 <= 0.004 , 0.0035 , 85.00, 3
> 0.004 <= 0.005 , 0.0045 , 95.00, 2
> 0.006 <= 0.00609417 , 0.00604708 , 100.00, 1
# target 50% 0.002
# target 75% 0.00333333
# target 90% 0.0045
# target 99% 0.00607533
# target 99.9% 0.00609228
# Socket and IP used for each connection:
[0] 9 socket used, resolved to 10.43.44.113:8000, connection timing : count 9 avg 0.00027354944 +/- 0.0001713 min 8.3008e-05 max 0.000635358 sum 0.002461945
[1] 6 socket used, resolved to 10.43.44.113:8000, connection timing : count 6 avg 0.00024202917 +/- 5.902e-05 min 0.000167526 max 0.000324743 sum 0.001452175
[2] 6 socket used, resolved to 10.43.44.113:8000, connection timing : count 6 avg 0.0003877755 +/- 0.000185 min 0.000102225 max 0.000595858 sum 0.002326653
Connection time (s) : count 21 avg 0.00029717967 +/- 0.0001637 min 8.3008e-05 max 0.000635358 sum 0.006240773
Sockets used: 21 (for perfect keepalive, would be 3)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
10.43.44.113:8000: 21
Code 200 : 10 (33.3 %)
Code 503 : 20 (66.7 %)
Response Header Sizes : count 30 avg 76.833333 +/- 108.7 min 0 max 231 sum 2305
Response Body/Total Sizes : count 30 avg 435.5 +/- 275.1 min 241 max 825 sum 13065
All done 30 calls (plus 0 warmup) 7.009 ms avg, 349.6 qps
```
* Now only 33.3% of the requests succeed; the rest are intercepted by the circuit breaker.
```
Code 200 : 10 (33.3 %)
Code 503 : 20 (66.7 %)
```
### Query the istio-proxy stats for circuit-breaking details
```
$ kubectl exec fortio-deploy-689bd5969b-nn8c5 -c istio-proxy -- pilot-agent request GET stats | grep httpbin | grep pending
```
Output:
```
cluster.outbound|8000||httpbin.default.svc.cluster.local.circuit_breakers.default.remaining_pending: 1
cluster.outbound|8000||httpbin.default.svc.cluster.local.circuit_breakers.default.rq_pending_open: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.circuit_breakers.high.rq_pending_open: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_active: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_failure_eject: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_overflow: 29
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_total: 21
```
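The `upstream_rq_pending_overflow` value is 29, which means 29 calls so far have been flagged for circuit breaking. This matches the 9 and 20 `503` responses from the two runs above.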
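To read just that counter, the stats output can be filtered further. The sketch below assumes the stats keep the `name: value` line format shown above; `pending_overflow` is a hypothetical helper.

```shell
# Read Envoy stats lines ("name: value") on stdin and print the value of
# upstream_rq_pending_overflow for lines that mention the given service.
# pending_overflow is a hypothetical helper.
pending_overflow() {
  awk -v svc="$1" '
    index($0, svc) && $1 ~ /\.upstream_rq_pending_overflow:$/ { print $2 }
  '
}
```

For example, piping the output of `kubectl exec fortio-deploy-689bd5969b-nn8c5 -c istio-proxy -- pilot-agent request GET stats` into `pending_overflow httpbin` would print `29` for the state shown above.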
## Cleanup
```
$ kubectl delete destinationrule httpbin
$ kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/httpbin/httpbin.yaml
$ kubectl delete -f https://raw.githubusercontent.com/istio/istio/release-1.22/samples/httpbin/sample-client/fortio-deploy.yaml
```
## References
https://istio.io/latest/zh/docs/tasks/traffic-management/circuit-breaking/