
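# Observability with an AIOps Mindset

###### tags: `DevOpsDays Taipei 2018` `9/12` `13:30~13:55` `Track A`

## What is AIOps?

Algorithmic IT Operations

## Rethinking Service Observability

Is your system running well right now? Your first instinct is probably to check monitoring (CPU, disk, ...).

### But what does "well" mean?

It is well only when the user says so. If the user says it is not, it is not.

## The three dimensions of an SLO (Service Level Objective)

- Can it be used (availability)
- How fast is it (latency)
- Is it correct (correctness)

### SLI

A measured value. For example, system availability is measured from total time and available time (available time / total time).

### SLO

A target derived from an SLI. Once the SLI has been computed, you set a target for it; that target is the SLO.

## What if an incident happens?

Symptoms appear -> find the root cause -> fix the root cause -> back to normal

- Symptom alerts (symptom detection)
- Auto healing (fast recovery)
- Anomaly detection
- Root cause analysis

Use availability to derive an Error Budget, set a daily error threshold from it, and raise an alert when errors exceed the amount allowed per day.

## Anomaly Detection

- When the metric is trending

With a fixed threshold on a trending metric: is the part in the middle normal?

Random Cut Forest

## Root Cause Analysis

Used when metrics have similar shapes.

Dynamic Time Warping

It can be used not only for RCA but also for prediction.

## Summary

- AIOps: reduces what humans miss
- SLO monitoring: avoids excessive alert noise

---

> Off-topic chat room, feel free to chat below

Blood sugar, blood pressure, blood lipids. Example: readings taken after exercise or after a meal exceed the usual normal values.

Sometimes the system gets rebooted and the root cause can no longer be found. It is like a crime scene that has been disturbed.

> If the system goes down, automatically start a new instance and route traffic to it, but keep the old instance around for debugging. That might be an approach worth considering. It burns money, though...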
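
The error-budget step from the incident section (take availability, derive an Error Budget, set a daily error threshold, alert when it is exceeded) can be sketched as below. This is a minimal sketch, not the speaker's implementation: the `Slo` class, the function names, the 99.9% target, the 30-day window, and the even per-day split of the budget are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Slo:
    """Hypothetical SLO definition (names and fields are assumptions)."""
    target: float      # e.g. 0.999 means 99.9% of requests must succeed
    window_days: int   # length of the SLO window in days


def availability(good: int, total: int) -> float:
    """SLI: fraction of requests that succeeded."""
    if total == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return good / total


def error_budget(slo: Slo, expected_requests: int) -> float:
    """Failed requests the SLO tolerates over the whole window."""
    return (1.0 - slo.target) * expected_requests


def daily_error_threshold(slo: Slo, expected_requests: int) -> float:
    """Assumption: the budget is spread evenly over the days in the window."""
    return error_budget(slo, expected_requests) / slo.window_days


def should_alert(errors_today: int, slo: Slo, expected_requests: int) -> bool:
    """Alert when today's errors exceed the daily allowance."""
    return errors_today > daily_error_threshold(slo, expected_requests)


if __name__ == "__main__":
    slo = Slo(target=0.999, window_days=30)
    expected = 3_000_000  # hypothetical request volume for the window
    print("budget:", error_budget(slo, expected))
    print("daily threshold:", daily_error_threshold(slo, expected))
    print("alert at 150 errors today?", should_alert(150, slo, expected))
```

Alerting on the budget rather than on raw CPU or disk numbers ties the alert to what users experience, which is the point the talk makes about reducing alert noise.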

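For the Root Cause Analysis section, here is a minimal sketch of Dynamic Time Warping used to find metrics that look alike even when one lags the other. The talk notes only name the technique. The absolute-difference cost, the function names `dtw_distance` and `rank_by_similarity`, and the sample series are assumptions for illustration.

```python
from typing import Dict, List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) DTW with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def rank_by_similarity(
    symptom: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[str]:
    """Candidate metric names, most similar to the symptom metric first."""
    return sorted(candidates, key=lambda name: dtw_distance(symptom, candidates[name]))


if __name__ == "__main__":
    # Hypothetical series: latency spikes one step before CPU does.
    latency = [1, 1, 5, 9, 5, 1, 1]
    candidates = {
        "disk": [3, 3, 3, 3, 3, 3, 3],
        "cpu": [1, 1, 1, 5, 9, 5, 1],
    }
    print(rank_by_similarity(latency, candidates))
```

Real metrics come in different units, so in practice the series would likely need normalizing (for example, z-scoring) before distances are compared. That step is left out here to keep the sketch short.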