# System Monitoring

---

```
$ whoami
Hoang Manh Tien <tienhm@eway.vn> - 1990
Dad, Backend, SysAdmin, DevOps, SRE
Stack: python, postgres, mysql, redis, gitlabci, phabricator, arcanist, docker, k8s, gcp, nginx, git
```

---

### Monitoring is:

<span>- System status<!-- .element: class="fragment" data-fragment-index="0" --></span>
<span>- Detect incidents --> Avoid disaster<!-- .element: class="fragment" data-fragment-index="1" --></span>
<span>- Define thresholds and alerts<!-- .element: class="fragment" data-fragment-index="2" --></span>

---

### Monitoring is not:

<span>- Raw log/event collection<!-- .element: class="fragment" data-fragment-index="0" --></span>
<span>- Request tracing<!-- .element: class="fragment" data-fragment-index="1" --></span>
<span>- "Black Magic" detection<!-- .element: class="fragment" data-fragment-index="2" --></span>
<span>- Durable long-term storage<!-- .element: class="fragment" data-fragment-index="3" --></span>
<span>- Automatic horizontal scaling<!-- .element: class="fragment" data-fragment-index="4" --></span>

---

### Expected

![](https://i.imgur.com/uOj5Z6h.png)

---

### Reality

![](https://i.imgur.com/D3ijhUV.png)

---

![](https://i.imgur.com/dYSOz5C.png)

---

![](https://i.imgur.com/4fHsJGS.png)

---

### SLI, SLO, SLA, oh my

- SLI - Service Level Indicator
- SLO - Service Level Objective
- SLA - Service Level Agreement

Note:
SLI: Alive signal
SLO: Objective of aliveness
SLA: Contract of objective

---

### Risk Management

- Error budgets

---

### Alert

Avoid spamming alerts:

- Oncall
- Ticket

---

### References

[Site Reliability Engineering](https://goo.gl/aCiiPV)
[How SRE relates to DevOps](https://goo.gl/NWKuj9)
[Liz & Seth playlist](https://goo.gl/CKv3tV)
[SLI, SLO, SLA](https://cloudplatform.googleblog.com/2018/07/sre-fundamentals-slis-slas-and-slos.html)
[LinkedIn experiment](https://www.slideshare.net/ToddPalino/im-no-hero-full-stack-reliability-at-linkedin)
[End to end monitoring with Prometheus](https://www.slideshare.net/Paris_Container_Day/end-toend-monitoring-with-the-prometheus-operator-max-inden)
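
---

### Backup: error budget sketch

The SLO and error-budget slides can be made concrete with a little arithmetic. This is a minimal sketch, not a definitive implementation: it assumes a request-based availability SLI, and the function names and the example numbers (99.9% target, one million requests) are hypothetical.

```python
def availability_sli(total_requests: int, failed_requests: int) -> float:
    """SLI (assumed here): fraction of requests that succeeded."""
    return (total_requests - failed_requests) / total_requests


def error_budget(slo_target: float, total_requests: int) -> int:
    """Failed requests the SLO tolerates over the measurement window."""
    return round((1.0 - slo_target) * total_requests)


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        # A 100% target leaves no budget to spend.
        return 0.0
    return (budget - failed_requests) / budget


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 250 failures, 99.9% SLO.
    print(availability_sli(1_000_000, 250))
    print(error_budget(0.999, 1_000_000))
    print(budget_remaining(0.999, 1_000_000, 250))
```

One possible policy on top of this: page oncall while the remaining budget is shrinking fast, and open a ticket when it is shrinking slowly.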