
# OpenShift Cluster Monitoring Operator: I'm Not Telling You

slide: https://hackmd.io/p/OonUQ9QKQ7-7JPBd1N9tOA?both

---

We have a collaborative session; please prepare a laptop or smartphone to join!

---

## Who am I?

- Jason Li
- SRE/Backend developer
- :heart: kubernetes Go Rust
- :cat: lover
- Constantly going from getting started to giving up

---

## Agenda

- Background
- Related Work
- Method
- Conclusion

---

## Background

Prometheus Operator, Prometheus, Prometheus Adapter, kube-state-metrics, etc.

In order to manage such diverse components, a centralized management configuration file is required.

---

## Related Work

- UI
- Prometheus
- Metrics
- Thanos

---

### UI

- Grafana

---

### Prometheus

- Prometheus Operator
- Prometheus-k8s :-1:
- Prometheus-user-workload
- Alertmanager

---

#### Prometheus Operator

- Provides Kubernetes-native deployment and management of Prometheus and related monitoring components.
- Automates the configuration of a Prometheus-based monitoring stack for Kubernetes clusters.
  - Prometheus
  - Alertmanager
  - Related components

---

#### Prometheus Operator (cont’d)

![](https://prometheus.io/assets/architecture.png)

---

### Metrics

- node-exporter
- kube-state-metrics
- openshift-state-metrics
- :-1: prometheus-adapter
- :-1: Telemeter Client
- :-1: configuration sharing

---

#### node-exporter

- Exporter for hardware and OS metrics exposed by *NIX kernels.
- Once scraped, its output includes a wide variety of system metrics (prefixed with `node_`).

---

```bash
# HELP node_network_transmit_queue_length transmit_queue_length value of /sys/class/net/<iface>.
# TYPE node_network_transmit_queue_length gauge
node_network_transmit_queue_length{device="br0"} 1000
node_network_transmit_queue_length{device="eth0"} 1000
node_network_transmit_queue_length{device="lo"} 1000
node_network_transmit_queue_length{device="ovs-system"} 1000
node_network_transmit_queue_length{device="tun0"} 1000
node_network_transmit_queue_length{device="veth24377b8e"} 0
node_network_transmit_queue_length{device="veth58bd788d"} 0
...
```

---

![](https://imgconvert.csdnimg.cn/aHR0cHM6Ly9sZWVoYW8ub3NzLWNuLXNoZW56aGVuLmFsaXl1bmNzLmNvbS8yMDIwLTAyLTE2LTEzMjg0Ni5qcGc?x-oss-process=image/format,png)

---

#### kube-state-metrics

- Focused on the health of the objects inside the cluster, such as deployments, nodes and pods, rather than on the individual Kubernetes components.
- Exposes raw data unmodified from the Kubernetes API.
- Designed to be consumed by Prometheus or by a Prometheus-compatible scraper.

---

#### openshift-state-metrics

- Expands upon kube-state-metrics by adding metrics for OpenShift-specific resources.
- Exposes cluster-level metrics for OpenShift-specific resources.

---

#### openshift-state-metrics (cont’d)

- BuildConfig Metrics
- Build Metrics
- DeploymentConfig Metrics
- ClusterResourceQuota Metrics
- Route Metrics
- Group Metrics

ref: https://github.com/openshift/openshift-state-metrics

---

### Thanos

- Thanos
- Thanos Querier
- Thanos Ruler

---

### Thanos

- Have a global view
- Have HA in place
- Unlimited retention

ref: https://kkc.github.io/2019/08/22/coscup-ha-prometheus-solution-thanos/

---

### Thanos

![](https://i.imgur.com/lv8EffA.png)

ref: https://banzaicloud.com/blog/multi-cluster-monitoring/

---

## Method

|Component|Key|
|--- |--- |
|Prometheus Operator|prometheusOperator|
|Prometheus|prometheusK8s|
|Alertmanager|alertmanagerMain|
|kube-state-metrics|kubeStateMetrics|
|openshift-state-metrics|openshiftStateMetrics|
|Grafana|grafana|
|Telemeter Client|telemeterClient|
|Prometheus Adapter|k8sPrometheusAdapter|
|Thanos Querier|thanosQuerier|

---

## Method (cont’d)

- Only Prometheus and Alertmanager have extensive configuration options.
- Other components usually provide only the `nodeSelector` field.
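
A minimal sketch of how the keys in the Component/Key table might be used for components that only offer `nodeSelector`. It assumes the configuration lives in a ConfigMap named `cluster-monitoring-config` in the `openshift-monitoring` namespace, as described in the OpenShift documentation; the deck itself does not name the object. The `foo: bar` label is a placeholder.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config   # assumed name, not stated in the slides
  namespace: openshift-monitoring   # assumed namespace, not stated in the slides
data:
  config.yaml: |
    # Top-level keys are taken from the Component/Key table.
    kubeStateMetrics:
      nodeSelector:
        foo: bar   # placeholder node label
    grafana:
      nodeSelector:
        foo: bar   # placeholder node label
```

Each top-level key under `config.yaml` selects one component, so pinning another component would mean adding another key from the table with its own `nodeSelector`.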

---

## Method (cont’d)

Move components to specific nodes

```yaml
data:
  config.yaml: |
    prometheusOperator:
      nodeSelector:
        foo: bar
    prometheusK8s:
      nodeSelector:
        foo: bar
```

---

Persistent volume claim

```yaml
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        metadata:
          name: localpvc
        spec:
          storageClassName: local-storage
          resources:
            requests:
              storage: 40Gi
```

---

### Custom Alertmanager configuration

- At this stage, the cluster monitoring configuration does not provide Alertmanager settings.

---

## Conclusion

:100: :muscle: :tada:

### Wrap up

- A self-updating monitoring stack based on Prometheus and its wider ecosystem
- Provides monitoring of cluster components
- Each component is expected to be managed through the configuration file :tada:

---

## Thank you! :sheep: