DevOpsDays Taipei 2018
9/12
13:30~13:55
Track A
Algorithmic IT Operation
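Is your system running well right now?
You might look at monitoring first (CPU, disk, ...)?
If the user says it is good, it is good; if the user says it is bad, it is bad.
Measure system availability (the SLI) from total time and available time.
Once the SLI is calculated, set a target for it; that target is called the SLO.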
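A minimal sketch of the availability calculation described above, assuming the SLI is simply available time divided by total time. The function names and the sample numbers are illustrative and do not come from the talk.

```python
def availability(total_seconds: float, available_seconds: float) -> float:
    """Availability SLI: fraction of the total time the system was usable."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= available_seconds <= total_seconds:
        raise ValueError("available_seconds must be between 0 and total_seconds")
    return available_seconds / total_seconds


def meets_slo(sli: float, slo: float) -> bool:
    """The SLO is the target that the measured SLI should reach or exceed."""
    return sli >= slo


# Hypothetical 30-day window with 20 minutes of downtime.
total = 30 * 24 * 3600
sli = availability(total, total - 20 * 60)
print(f"SLI = {sli:.5f}, meets 99.9% SLO: {meets_slo(sli, 0.999)}")
```

Time-based availability is only one possible SLI; a request-based ratio (good requests over all requests) would use the same division.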
Symptoms appear
Find the root cause
Fix the root cause
Return to normal
Derive the error budget from availability
Set a daily error threshold
Alert when the daily allowance is exceeded
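The three steps above can be sketched as follows, assuming a time-based availability SLO and an error budget that is spread evenly over the days of the window. The even split, the 30-day window, and all function names are assumptions for illustration; the talk does not say how the daily threshold was derived.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(slo: float, window_days: int) -> float:
    """Downtime allowed in the window: the share of time the SLO does not cover."""
    return (1.0 - slo) * window_days * MINUTES_PER_DAY


def daily_threshold_minutes(slo: float, window_days: int) -> float:
    """Assumption: the budget is split evenly across the days of the window."""
    return error_budget_minutes(slo, window_days) / window_days


def should_alert(downtime_today_minutes: float, slo: float, window_days: int = 30) -> bool:
    """Alert once today's downtime exceeds the daily allowance."""
    return downtime_today_minutes > daily_threshold_minutes(slo, window_days)


# Hypothetical 99.9% SLO over 30 days.
print(round(error_budget_minutes(0.999, 30), 2), "minutes of budget in the window")
print(round(daily_threshold_minutes(0.999, 30), 2), "minutes allowed per day")
print("alert:", should_alert(3.0, 0.999))
```

Alerting on budget consumption rather than on every individual error is what keeps the alert volume tied to the SLO.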
Fixed threshold on a trending metric: is the part in the middle normal?
Random Cut Forest
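Random Cut Forest itself is not reproduced here. As a much simpler stand-in that shows why a fixed threshold struggles with a trending metric, the sketch below compares each point with a trailing window of recent values (a rolling z-score). The window size, the cutoff of 3 standard deviations, and the sample series are assumptions for illustration.

```python
from statistics import mean, stdev


def rolling_anomalies(values, window=5, z=3.0):
    """Return indices of points far from the mean of the preceding `window` points.

    This is a rolling z-score check, not Random Cut Forest.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if values[i] != mu:
                flagged.append(i)
        elif abs(values[i] - mu) > z * sigma:
            flagged.append(i)
    return flagged


# Hypothetical metric that trends upward with one spike at index 7.
series = [10, 11, 12, 13, 14, 15, 16, 40, 18, 19]
print(rolling_anomalies(series))
```

A fixed threshold of 15 on the same series would also fire on the later points of the normal upward trend, whereas the rolling check only reacts to the spike.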
When metrics look alike.
Dynamic Time Warping
Useful not only for RCA but also for prediction
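A minimal sketch of the classic dynamic-programming form of Dynamic Time Warping, assuming two numeric series and absolute difference as the point-to-point cost. It illustrates how two metrics with a similar shape but shifted timing can be scored as close; the function name and sample series are illustrative, and this is not necessarily the variant used in the talk.

```python
import math


def dtw_distance(a, b):
    """Cumulative cost of the cheapest alignment between sequences a and b."""
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] against b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays on the same point
                cost[i][j - 1],      # b advances, a stays on the same point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Hypothetical metrics: the second has the same shape but starts one step late.
cpu = [1, 2, 3, 2, 1]
latency = [1, 1, 2, 3, 2, 1]
print(dtw_distance(cpu, latency))
```

A low distance between two metrics suggests they move together, which is one way to shortlist related signals during root cause analysis. This version costs O(n*m) time and memory, so long series usually need a window constraint or downsampling.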
AIOps: reduces what humans overlook
SLO Monitoring: avoids excessive alert noise
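Side chat room: feel free to chat below
Blood sugar, blood pressure, blood lipids
Example: after exercise or after a meal, readings exceed the usual normal range.
Sometimes the system gets rebooted and the root cause can no longer be found, like a crime scene that has been disturbed.
If the system goes down, automatically starting a new instance and routing traffic to it while keeping the old instance around for debugging may be worth considering. It does burn money, though...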