2023 K8s Summit

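# 2023 K8s Summit

* [Shared notes](https://hackmd.io/@k8ssummit/2023/%2F%40k8ssummit%2FrJsT1AkMp)

# 10/25 (Wed.)

## Trend Micro's Large-Scale Kubernetes Operations: The Highs and Lows of Self-Hosting, Cloud, and CI/CD - 張哲魁 (Jeff)

### Personal

* 2023: K8s v1.28
  * 1 version every 3 months
* Self-managed
  * 2017, v1.6: auto-scaling had to be built in-house
* Cloud-half
  * Mounted a separate disk on /var/log so logs could not fill up the root disk and take the node down
  * Because autoscaling was home-built, the AWS auto-scaling group could only shut down the node that provided the resources, which caused downtime (the correct flow is to drain all pods off the node first, then shut it down)
  * K8s upgrades: EKS control plane
* Cloud-full
  * GKE: no need to worry about upgrades; autoscaling and upgrades both just work
* CI/CD:
  * Before: Jenkins + kubectl deploy
  * Now: GitHub Actions + Argo CD
* IaC:
  * Before: AWS CloudFormation
  * Now: Terraform
* Logging:
  * Kube-logging https://kube-logging.dev/
* Conclusion:
  * Self-managed: running your own K8s sharpens your skills, but takes far too much time
  * Cloud-half: production needs a lot of effort to operate

### Shared Notes

* Architecture
  + Past: self-hosted K8s
    + AWS EC2 as VMs
    + IaC: CloudFormation + Ansible
    + Problems encountered:
      + Infra & operations
        + Auto scaling: when to scale? It had to be judged manually
        + Kubernetes upgrades required updating the Ansible playbooks
      + Production issue
        + A change in the IP that the API server FQDN resolved to brought down the whole cluster
        + Back in K8s 1.6, if the API server's DNS-resolved IP changed, nodes lost their connection to the master, and the load balancer the speaker used at the time did exactly that
* Past: cloud half-managed
  + EKS was released
  + Half-managed
    + Control plane -> EKS
    + Workers -> self-managed
  + Reasons
    + Logs were first written to files and then collected with Filebeat
    + To keep logs from filling up the node's disk, an extra volume was mounted at /var (logs are now mainly streamed)
  + Problems encountered
    + Auto scaling
      + The cluster was scaled with an auto-scaling group
      + Scale-out was fine
      + Scale-in broke things: the nodes were self-managed and incompatible with EKS's own solution, so the auto-scaling group shut nodes down directly without migrating the workloads first
    + Kubernetes upgrades
      + Most of the work shifted from Ansible to building VM images
* Now: cloud fully-managed
  + Architecture: largely the same (the speaker shared their EKS setup)
* CI/CD, IaC, logging
  * CI/CD
    + Before: Jenkins + kubectl deploy
    + Now: GitHub Actions + Argo CD deploy
      + Argo CD advantage: version differences are easier to see
  * IaC
    + Ansible > CloudFormation > Terraform
    + Terraform (manages multiple K8s clusters)
      + Cross-platform
      + Easier to fix when the state drifts from the actual infrastructure
      + Modularized
  * Logging
    + Before
      + Logged to files; the file-based Fluent Bit pipeline needed maintenance
    + Now
      + [kube-logging](https://kube-logging.dev/): simpler management, and config no longer ends up scattered everywhere

## Integrating AI to Accelerate Application Development, Delivery, and Optimization - 江楨義 (Jeamy)

### Personal

* SafeCoder: an on-prem code-generation copilot built with Hugging Face; runs on VMware/K8s and supports Java, C, C++, Python...
* Private AI Foundation: an on-prem private cloud with GPU hardware plugged in, integrating the NVIDIA framework, including NeMo
* AI: moving on-prem?
  * ChatGPT doesn't have your private data
  * You need to train your own model on your own data
* Open source support
  * Open ecosystem, Ray...
* Intelligence system for operations
  * A console hub shows application deployment, performance, etc.; operators can ask in conversation how to solve problems and how to optimize
* Tanzu Hub combines dev and ops automation ?? + intelligence system

### Shared Notes

* Applying generative AI
* Survey of enterprise IT management
  + 82% of enterprises are moving to the cloud
  + 72% use K8s and containers
  + 90% have started using open source
  + 87% use multiple public clouds
  + 47% have started deploying AI applications (growing?)
* VMware Tanzu
  + SafeCoder
    + Partnership with Hugging Face
    + On-prem copilot
  + Private AI Foundation
    + Partnership with NVIDIA
    + Other open source
  + Cloud Foundation
* On-prem private AI
  + Business data: the public ChatGPT cannot answer company-specific questions
  + Security: enterprise data cannot be made public
* VMware Tanzu
  + Dev and ops optimization
  + Application platform
  + Integrated services

## How Google Is Reshaping Next-Generation Kubernetes Ops, and the Keys to Winning with AI - 林書平 (Harry Lin)

### Personal

* Google Cloud AI-optimized infrastructure
  * AI workloads
* Duet AI for Google Cloud
  * Management:
    * Ask in conversation, e.g. how to create a cluster with Terraform
    * Contextualized log explanations: explains error logs
  * Optimization:
    * Ask about, and get suggestions for, GKE resource optimization
* Cloud Run
* Enterprise workloads
  * GKE Enterprise
  * Troubleshooting dashboard
  * Create alerts to act proactively

### Shared Notes

Gartner research
+ >90% of organizations will run production apps on containers by 2027

Duet AI
+ Contextualized log explanation

GKE Enterprise
+ Cross-platform (on-prem/multi-cloud clusters)

## How to Link K8s to Business: Making Managers Understand - 楊偉強 (Terence Yeung)

### Personal

* Cloud-native application

### Shared Notes

What is Kubernetes?
+ Not friendly to non-IT colleagues.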
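+ One of DI's components

Digital Application (DA) and Digital Infrastructure (DI)
+ DA: CRM, chatbot, finance system, AI
+ DI: data centre, cloud, data lake, middleware, K8s

The challenge of the current technology stack
+ Very difficult to understand, even for tech staff
+ You need to know the basics before you can understand it (the hardest problem when communicating with the business)
+ More and more technical terms, even when trying to be user-friendly

Establish a group DI framework
+ Governance
  + Enterprise architecture
  + Information integration middle-platform architecture
+ Execution/Delivery
+ Foundation

To make business units want to listen, relate the system to the business systems and its impact on the business; then they will be interested.

Talk about solution options, not just problems:
+ Cost: saving money (give real examples)
+ Agility
+ Resilience: avoid single points of failure

## iThome: Share of Cloud/K8s Investment Across Industries - 王宏仁

### Personal

---

### Shared Notes

---

## Resource as Code for Kubernetes: Stop kubectl apply - 張哲嘉

### Personal

* GitFlow solves problem 1
* Helm charts solve problem 3
* Argo CD solves problems 2 & 4: applications are picked up declaratively, with version control and automation

### Shared Notes

* YAML hell
  * Once YAML goes under version control, you become a YAML maintainer
* Helm turns managing individual K8s objects into managing a whole package (a higher-level abstraction), and the Helm ecosystem offers plenty of ready-made charts, already configured by others, that you can use directly. With this higher-level packaging, version updates are more consistent and maintenance is easier.
* Argo CD: although [K9s](https://github.com/derailed/k9s) lets you inspect what is in a K8s cluster, Argo CD integrates into the workflow more easily:
  * **Git generator**: reads a specified path in a Git repo; whenever the content changes, the *ApplicationSet* automatically generates a new set of application definitions that can be used for deployment.
  * **Cluster generator**: you can define a `matchLabels` selector in the YAML so that, when a given label is encountered, different settings are applied for different K8s services (e.g. EKS, AKS, GKE)

## SUSE Chameleon

### Personal

---

### Shared Notes

+ A fully managed environment is not absolutely secure: secrets such as certificates can still be obtained by dumping a RAM file

What is confidential computing?
+ Data in transit
+ Data at rest
+ Data in use

Why is confidential computing needed?
+ A user in a compromised shared environment
+ Unencrypted data

Why is confidential computing important for K8s?
+ No swap by default
+ RAM is not encrypted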
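
Hardware support required for confidential computing (see the slides)

Goals of confidential computing
+ Data security
+ Privacy and compliance
+ Data sovereignty

Azure & Google are already ready for confidential computing.

SUSE's tool for container security: NeuVector https://www.suse.com/neuvector/

## Using gRPC to Speed Up Data Transfer Between Pods in Kubernetes - 李毅山 (John)

### Personal

* III DevOps
* Communication between microservices has a cost; the goal was a protocol that speeds it up
* gRPC = HTTP/2 + Protocol Buffers
  * HTTP/2 vs HTTP/1.1
    1. Messages are split into a HEADERS frame + DATA frames (binary), so less data goes over the wire
    2. Multiplexing speeds up transfers
    3. Server push
  * Protocol Buffers
    1. Transmitted data is encoded as binary
    2.
* Benchmark: gRPC won over the RESTful API; overall performance was about the same as not splitting the service, but transfer was faster and the number of pods could be adjusted dynamically
* RESTful vs gRPC
  * When the response data format matters: RESTful
  * Microservice-to-microservice communication: gRPC

### Shared Notes

* gRPC features:
  + Uses HTTP/2
    + HTTP/2: removes performance bottlenecks
      - Binary framing layer:
        - Headers are encapsulated in HEADERS frames
        - Bodies are encapsulated in DATA frames
      - A TCP connection only needs to be established once (one TCP connection can carry multiple requests and responses at the same time)
      - Server push: uses PUSH_PROMISE to send resources the client will need without being asked (must be configured in advance)
      - Faster
  + Protocol Buffers
    + Data is encoded as binary
    + Binary encoding

RESTful API vs. gRPC

|  | RESTful API | gRPC |
| -------- | -------- | -------- |
| Protocol | HTTP/1 | HTTP/2 |
| Data format | JSON, XML | Protocol Buffers |
| Response time | Longer | Shorter |
| Recommended use | Integrating with third parties; when the response data format matters | Communication between microservices |

## How to Avoid Throttling in K8s as Much as Possible - 李啓維 (Kiwi)

### Personal

* Case: after a VM service was moved to K8s, latency went up
  * Raising the CPU resources reduced latency, but only to a limited extent
  * Yet actual CPU utilization turned out to be low
* A VM sees all of its CPUs, and can use all of them
* In K8s, the container sees all the CPUs on the node, but it can only use the CPU allowed at each "feeding" (each CFS period). Throttling rate = number of periods in which it was still hungry and had to wait for the next feeding / total number of feedings
  * E.g. one eater: 20 g every 100 minutes
  * E.g. several eaters: everyone shares the same 20 g, so with many threads the throttling rate goes up
* Fixes tried:
  1. Tuning the numpy environment variables helped a lot, but some projects still showed the issue
  2. Adjusting each package's process/thread parameters
  3. K8s CPU management policy: the kubelet static policy binds a pod directly to CPUs on the node, so the CPUs it sees = the CPUs it can use
* How to use the static policy:
  1. Enable it
  2. QoS: BestEffort -> Burstable -> Guaranteed (exclusive CPUs are only given to Guaranteed pods with integer CPU requests)
* Ways to avoid throttling:
  0. Increase metric visibility, so you know exactly where the problem is
  1. Raise the CPU limit
  2. Reduce the number of threads in the container: tune the Python packages' thread/process-pool parameters against the number of CPU cores => problem: not every package can be tuned
  3. Static CPU manager policy => problem: idle resources are wasted, and you lose the flexibility of K8s container autoscaling

### Shared Notes

---

# 10/26 (Thu.)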
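
## Kubernetes APIs for the Future: Building Platforms and Managing Everything - Janet Kuo

### Personal

* Kubernetes controllers
* CustomResourceDefinition: lets you define your own schema and resources
* kubebuilder: an open-source project (kubernetes-sigs/kubebuilder) that provides a code framework (in Go) for writing your own APIs
* GitHub: GoogleCloudPlatform/k8s-config-connector
* Before GitOps: ClickOps, i.e. click a button to run something; it doesn't scale and there is no way to audit every change

### Shared Notes

* Kubernetes controllers
  + Declarative APIs (imperative APIs require you to spell out every step)
    + You only state what the final state should be; the system carries it out automatically
  + Continuous reconciliation
    + Runs repeatedly, nonstop
    + Level-triggered
    + The system automatically corrects and restarts things based on the declared state
    + Desired state vs actual state
* From monolithic to democratized
  + CRD: CustomResourceDefinition
    + Write your own API schema under `spec`, and it can be used like a built-in API, including from kubectl
    + Build & extend the APIs you need
    + You still have to write the custom controller logic
  + kubebuilder
    + Similar to Ruby on Rails: provides a framework, and you fill in the parts that differ
    + https://book.kubebuilder.io/
  + Controllers and CRDs should ideally be one-to-one, to stay scalable and avoid race conditions
  + Composite resource pattern
    + High-level API: wraps multiple APIs and policies
* GitOps
  + Version control (audit/approve)
  + Continuous reconciliation
  + Config Sync
    + https://github.com/GoogleContainerTools/kpt-config-sync

## Building a Security Fortress for the Cloud-Native World - 疏宇軒 (Renee)

### Personal

### Shared Notes

## Building Observability for Services: Using ITSM and Automation to Complete the Digital Enterprise's Last Mile - 李建志 (JJ)

### Personal

* ServiceOps: find the root cause quickly, using a topology the system generates by itself
* Control-M

### Shared Notes

+ BMC Helix
+ Federated CMDB
  + Entering it by hand is bad
  + Automatically builds the topology of services inside and outside K8s
+ Multiple dashboards
  + Different people want to see different things
+ Shared responsibility model
+ Cross-network dependencies
+ Focus on service performance indicators
  + Service-oriented
+ Business-based KPIs

## Feature Toggle Makes Development more Efficient - 黃信豪 (Sean Huang)

### Personal

* OpenFlagr
* Feature toggle:
  * Use case 1: release only after confirming every step is complete, avoiding the situation where earlier parts are deployed but a later step is unfinished and everything has to be rolled back
  * Use case 2: check after deployment
  * Use case 3:
* Tools:
  * OpenFlagr
  * OpenFeature
  * Mockoon: mock API server
* Performance:
  * LTC: lead time for change
  * DF: deployment frequency (the deployment cycle should be fast)
  * CFR: change failure rate
  * MTTR: mean time to restore service

### Shared Notes

+ Introduction
  + The project management triangle
    + Scope
    + Cost
    + Quality
+ Overview
  + Basics
    + A feature toggle works by inserting a boolean variable into the code to control whether a feature is enabled. Developers can use this variable to decide which users can see or use the new feature.
  + Advanced 1
    + Inject the flag values through a ConfigMap
    + Features can be switched on/off without changing code
  + Advanced 2
    + Use a database as the toggle store
+ Adoption considerations
  + API access for CRUD
    + Adjust flags quickly
  + SDK for each language
    + Integrate quickly
  + Client-side caching
    + Keeps reading the toggle state from becoming a bottleneck
  + Feature toggle
    + On/off
    + Gradual rollout 0~100%
      + Hash bucket
    + Segmentation
    + A/B testing
  + Metrics collection
    + Current A/B testing
  + Authentication and audit log
  + Performance report
+ LINE adopted OpenFlagr https://github.com/openflagr/flagr

  ![](https://hackmd.io/_uploads/rJ1C9UPG6.png)
+ Use cases
  + Use case for e-commerce
  + Third-party partner integration
    + The feature is ready but the third party is delayed: simply switch the feature off, with no need to revert code, retest, ...
  + Canary release
    + Test with only a small number of users
  + A/B testing
    + Arbitrary ways of segmenting users
+ Each team hosts its own OpenFlagr instead of using a centralized service
  + Has authentication, so nobody can flip switches they shouldn't
  + The OpenFlagr UI is locked down
  + LINE modified OpenFlagr so that it keeps a change history
+ OpenFeature: Standardizing Feature Flagging for Everyone https://openfeature.dev/
+ MockAPI
  + Helps frontend/backend collaboration and QA testing
  + Mockoon: https://mockoon.com/
    + A GUI tool that, like Postman, can call APIs, and can also act as a mock server
+ LINE development metrics
  + LTC - Lead Time for Change
  + DF - Deployment Frequency
  + CFR - Change Failure Rate
  + MTTR - Mean Time to Restore Service

## Migrating On-Premise Workloads to Kubernetes: The Journey - 楊秉諺 (Michael)

### Personal

* Two ways to connect cloud and on-prem: connect the VPC (VPN or Direct Connect)
* kubenet saves private IPs but performs worse
* CNI performs better
* Traefik
  * ExternalDNS
  * Traefik routes can do traffic mirroring, to compare cloud and on-prem traffic
  * Weighted load balancer
  * Middleware
    * Circuit breaker: judges from latency, network errors, and responses whether a service is broken, so no time is wasted calling it again
* Secrets manager
  * External Secrets
  * Secrets Store CSI Driver
* KEDA (Kubernetes-based Event Driven Autoscaler): e.g. late at night fewer people listen to music, so fewer resources can be run
  * Event sources and scalers: e.g. Prometheus
* QEMU

### Shared Notes

At first all services ran in the on-prem data center; three years ago they started migrating to the cloud to reduce cost.

+ Goal
  + Reduce machine management
    + Time, cycle, process: at least 2 months
    + Affected product delivery schedules
  + Quick deployment of services
+ VPC
  + Cloud/on-prem interconnect
    + VPN vs Direct Connect
      + A dedicated line is more stable; one was set up at the start
        + Didn't help much with latency
        + More expensive
+ Kubernetes networking model
  + kubenet or CNI network
    + kubenet: doesn't expose IPs to the VPC
      + Saves IPs
    + CNI network: fast
+ Ingress

  ```
  Client -> ingress -> L7 LB -> VM
                            \-> Pods
  ```
  + Options
    + Application gateway (cloud provider)
      + Floating IPs are a problem: DNS failover does not suit every business model, and some need fixed IPs for allowlisting
    + Build your own software ingress
      + nginx
      + Traefik
        + Better native CRD support
          + Annotations
        + Can skip the Service and go straight to the pods, so traffic doesn't take an extra hop through the Service; better performance
        + Sticky cookie
        + Middleware
          + IPWhiteList
          + RateLimit
          + Headers
          + ForwardAuth
  + ExternalDNS
    + Automatically syncs ingress/service information to an external DNS server
+ Traefik
  + Traffic
    + KKBOX can hardly afford downtime
    + Traefik makes traffic switching simple
    + Traffic mirroring: before launch, test whether the service works and whether it can handle the load
    + Weighted load balancer
      + Gradual switchover, reducing downtime
+ Secrets
  + External Secrets
  + Secrets Store CSI Driver
+ Cost saving
  + The boss will have opinions about the bill
  + Autoscaling
    + [KEDA](https://keda.sh/)
      + K8s-based Event Driven Autoscaler
      + KEDA is a Kubernetes-based event-driven autoscaler: users can use the number of events from services such as Azure, AWS, GCP, Redis, and Kafka as the basis for scaling Kubernetes containers.
      + Can use multiple metrics and take the maximum; wrote a cron job for monitoring
  + ARM64 architecture migration
    + Cheaper than x86 XD
    + Multi-arch builds and images
      + QEMU emulation
        + Easier
      + Cross-compile
        + Natively supported by Go
    + Remember to set taints
      + Schedule onto arm64 as much as possible
+ Auditors usually skip right over K8s XD

## The Complete Skill Set for Debugging K8s Networking - 邱宏瑋 (hwchiu)

### Personal

* To investigate, you need enough information about the client/server relationship, the architecture, and the problem description
  * Rule out unrelated components/environments
  * Clarify the situation in which the problem occurs
  * Try to reproduce the problem
  * Look for error logs; capture packets with tcpdump/ksniff
* East-west traffic, e.g. pod to pod and pod to service inside the cluster
  * Pod to pod (same node): all data moves inside the Linux kernel
  * Pod to pod (different nodes): overlay/underlay network issues, and each CNI has different problems, e.g. IPIP, VXLAN
  * Pod to service: if you use the hostname generated for the Service, packets first go to the CoreDNS service
    * Two main components: ClusterIP/NodePort IP
    * internalTrafficPolicy, externalTrafficPolicy
* North-south traffic (cluster)
  * WAN to pod

### Shared Notes

Speaker's talks: https://www.hwchiu.com/public_sharing

+ Network problems
  + Application problems
  + Network component problems
+ How to debug network problems
  + Rule out unrelated components
  + Clarify the situation in which the problem occurs
    + East-west
      + Pod to Pod
      + Pod to Service
      + Same node
        + Doesn't go through the physical NIC; packets are passed within the kernel
      + CNI
      + Host
        + Kernel module
        + **Conntrack**
        + Firewall
          + Most CNIs modify it themselves
      + Capture packets
      + Cross-node
    + North-south
      + Ingress/Egress
  + Try to reproduce the problem
  + Repeat the steps above

> The more precisely you can describe your network architecture, the more effectively you can debug packet problems. If you use Cilium, you have to use the debugging tools Cilium provides; if you adopt Istio, all traffic is controlled by Istio. Some network instability is not even caused by the K8s configuration itself, but by the NAT Gateway settings of the physical network that the underlying VMs connect through.

## Kubernetes Explained Simply - smalltown

### Personal

* A Deployment has a ReplicaSet under it; the ReplicaSet keeps the number of pods at the configured count
* Service: an abstraction that defines which pods it connects to
* Ingress defines which Service to go to
* ConfigMap stores configuration for different environments
* A Kubernetes Secret is only base64-encoded, not encrypted, and can be decoded trivially, so I don't recommend putting important information in it
  * Secret management, e.g. Vault
* StatefulSets:
  * Stateful, e.g. databases/Elasticsearch/MongoDB storing user information have state
  * Stateless applications just serve user requests
* DaemonSets: a service that runs on every node in K8s
* Labels & selectors determine resources
* Resource limitation & priorities
  * CPU/memory limit & request
  * Guaranteed, Burstable, BestEffort
* Health check and self-healing
  * Liveness/startup/readiness probes
* Deployment:
  * Argo CD periodically pulls YAML files from the Git server (GitHub, GitLab) and deploys them
* Horizontal Pod Autoscaler: more pods
* Vertical Pod Autoscaler: bigger pods
* Cluster Autoscaler: more nodes
* Log management: Fluent Bit ships logs to Elasticsearch
* Monitoring w/ Prometheus
* CrashLoopBackOff: config problems, insufficient resources, dependency issues...
* ErrImagePull
* Scheduling/image/dependency issues

### Shared Notes

+ K8s 101
  + What is a container
    + A software packaging technology.
    + Pros: reduces the gap between RD & OP
    + Still need to deal with issues (starting containers, scaling out, etc.)
  + Kubernetes manages the containers
    + Two types of nodes
      + Master
        + The main core components
        + Like the human brain
        + API server: receives commands
        + Controller: drives K8s in the desired direction
          + Keeps the cluster in the desired state
        + Scheduler: dispatches pods to worker nodes
          + The scheduler decides which node a new workload (pod) runs on (spreading the load)
        + etcd: the database
      + Worker
        + The pod is the most basic unit
+ Dive into K8s
  + K8s Pod
    + The smallest unit in K8s
    + Can have one or more containers
  + Service
    + Defines which pods the Service connects to
  + Ingress
    + Defines which Service an incoming request is handed to
  + K8s Secrets
    + Not recommended
    + Uses base64 encoding
    + Only suitable for non-sensitive information
  + K8s w/ secret management
    + Can be stored in something like Vault
  + K8s StatefulSets
    + Still pods underneath
    + Declares volumes for the pods to use
    + e.g., MySQL
  + K8s DaemonSets
    + Runs on every node
    + e.g., logging, monitoring
  + Labels and selectors
  + Resources and limits
    + Limit: at most this much can be used
      + Usually CPU & RAM
    + Request: tells the scheduler at least this much is needed
    + Exceeding the CPU limit causes throttling (can't get CPU); exceeding the memory limit gets the container killed
    + Kill order
      + BestEffort > Burstable > Guaranteed
+ K8s in Action
  + Argo CD
    + Periodically pulls the YAML files from the Git server and checks whether they match; if not, deploys them
  + HPA
    + More pods
  + VPA
    + Makes pods taller and fatter (rarely used)
    + Restarts pods (with downtime)
  + Cluster Autoscaler
    + Adds nodes
  + K8s log management
    + Install Fluent Bit with a DaemonSet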
  + Monitoring w/ Prometheus Operator
    + Multi-cluster monitoring
  + Security and update management

## Cloud-native messaging service by NATS - 何秉賢

### Personal

### Shared Notes

> https://nats.io/

> https://landscape.cncf.io/

+ NATS introduction
  + Messaging service (message broker/queue)
  + 2 modes
    + At-most-once delivery (fire-and-forget)
      + NATS Core
      + Delivery is not guaranteed
    + At-least-once delivery semantics
      + Streaming (EOS in 2023/6)
      + JetStream (GA in 2.2, but only really usable from 2.8 onward)
        + In JetStream every message has a sequence number (seq), and consumers read based on seq
        + A TTL can be set
  + Message distribution models:
    + Pub-sub
    + Queue group
    + Request-reply
+ Components and architecture
  + Can be deployed on: cloud, on-prem, edge, IoT
  + Cluster to cluster via gateways; NATS node to NATS node via routes
  + Stream/consumer replication factor
    + Meta group: stream/consumer location and info
    + Stream groups
    + Consumer groups
  + Push consumers
    + Faster, but messages may be delivered more than once, because they are pushed faster than they can be acked
  + Pull consumers
    + More stable and easier to control; recommended for beginners
+ Suggestions
  + Use the latest version
  + Figure out your use case first instead of chasing new features (e.g., key-value)
  + The NATS community is very active; report any problems
    + e.g. proposed that server and client versions should have a mapping table
      + Accepted, and they started providing NATS ADRs
  + KEDA and OpenFaaS also support NATS JetStream
+ Demo video
  > https://github.com/phho/nats-demo-video

## K8S Alerts Auto Remediation (Auto Runbook) - 林士涵 (Kim)

### Personal

* Observability: metrics, traces, logs
* Runbook: a how-to guide for completing an operational process or procedure
* Robusta: an open-source observability tool for K8s
  * Triggers and actions
  * Sinks: send notifications to various destinations, e.g. Teams (Opsgenie raises the alert)
  * Enrichment (ChatGPT): fetches data
  * Auto remediation: you can define custom Prometheus alert types and actions; if adjusting CPU/memory fails for lack of permissions, bind an RBAC role in K8s to fix it
  * Forwarder, runner

### Shared Notes

`opensource`

ref#1: https://home.robusta.dev/

ref#2: https://github.com/robusta-dev/robusta

---

+ Robusta's behaviour

:::success
> Robusta's behaviour is defined by rules like this:
* triggers
* actions
* sinks
:::

**Highlight:** not only alerts, but also a button that suggests a fix.
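
To make the trigger/action/sink rule model concrete, here is a minimal sketch in plain Python. It is not Robusta's API: the `Alert` and `Rule` classes, `run_rules`, and the two sample actions are all hypothetical names chosen for illustration, and the "remediation" here only returns a suggestion string rather than changing anything in a cluster.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Alert:
    """Simplified alert: a name plus labels (hypothetical shape, not Robusta's event type)."""
    name: str
    labels: Dict[str, str]


@dataclass
class Rule:
    """One rule = a trigger predicate, the actions to run, and the sinks to notify."""
    trigger: Callable[[Alert], bool]
    actions: List[Callable[[Alert], str]]
    sinks: List[Callable[[str], None]]


def run_rules(alert: Alert, rules: List[Rule]) -> List[str]:
    """Run every rule whose trigger matches the alert and fan each finding out to its sinks."""
    findings: List[str] = []
    for rule in rules:
        if not rule.trigger(alert):
            continue
        for action in rule.actions:
            finding = action(alert)
            findings.append(finding)
            for sink in rule.sinks:
                sink(finding)
    return findings


def enrich_with_pod(alert: Alert) -> str:
    """Enrichment-style action (hypothetical): attach the pod name to the alert."""
    return f"{alert.name}: pod={alert.labels.get('pod', '?')}"


def suggest_fix(alert: Alert) -> str:
    """Remediation-style action (hypothetical): return a suggested next step; nothing is executed."""
    return f"suggest: check recent logs of {alert.labels.get('pod', '?')}"


# Stand-in sink that just collects messages; a real sink might post to Teams or Opsgenie.
outbox: List[str] = []

rules = [
    Rule(
        trigger=lambda a: a.name == "KubePodCrashLooping",
        actions=[enrich_with_pod, suggest_fix],
        sinks=[outbox.append],
    )
]

findings = run_rules(Alert(name="KubePodCrashLooping", labels={"pod": "api-7f9c"}), rules)
print(findings)
```

Keeping triggers, actions, and sinks as separate pieces means the same action (e.g. an enrichment step) can be reused across many alerts, and the notification destination can change without touching the remediation logic.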

![](https://hackmd.io/_uploads/ry9Jfwif6.png)

**Remediation** (Actions) https://docs.robusta.dev/master/catalog/actions/remediation.html

ref#3: https://github.com/robusta-dev/robusta#%EF%B8%8F-how-it-works

---

* Architecture

:::info
> Component
* Forwarder
* Runner
:::

![](https://hackmd.io/_uploads/H1EI4PsGp.png)

ref#4: https://docs.robusta.dev/master/how-it-works/architecture.html

---

* Configuration

:::warning
> After installing robusta ... [configuration]
* **Integrating** AlertManager and Prometheus
:::

ref#5: https://docs.robusta.dev/master/configuration/alert-manager.html
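
As a rough illustration of what an AlertManager integration hands to an auto-runbook tool, the sketch below parses a payload shaped like Alertmanager's generic webhook notification (a top-level `status` plus an `alerts` list whose entries carry `status`, `labels`, and `annotations`) and picks a runbook for each firing alert. The `RUNBOOKS` table, the runbook names, and `plan_remediation` are hypothetical; this is not how Robusta itself is implemented.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical mapping from the alertname label to a runbook to execute.
RUNBOOKS: Dict[str, str] = {
    "KubePodCrashLooping": "collect-pod-logs",
    "CPUThrottlingHigh": "review-cpu-limits",
}


def plan_remediation(body: str) -> List[Tuple[str, str]]:
    """Return (alertname, runbook) for every firing alert that has a known runbook."""
    payload = json.loads(body)
    plan: List[Tuple[str, str]] = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved alerts need no remediation
        name = alert.get("labels", {}).get("alertname", "")
        runbook = RUNBOOKS.get(name)
        if runbook is not None:
            plan.append((name, runbook))
    return plan


# Sample notification; field values are made up for illustration.
sample = json.dumps({
    "version": "4",
    "status": "firing",
    "alerts": [
        {"status": "firing", "labels": {"alertname": "KubePodCrashLooping", "pod": "api-7f9c"}, "annotations": {}},
        {"status": "resolved", "labels": {"alertname": "CPUThrottlingHigh"}, "annotations": {}},
        {"status": "firing", "labels": {"alertname": "SomethingUnmapped"}, "annotations": {}},
    ],
})
print(plan_remediation(sample))
```

Checking each alert's own `status` (rather than only the top-level one) matters because a single grouped notification can mix firing and resolved alerts.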