SRE Conference
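# Does SRE Still Take Psychic Powers? Seeing What Can't Be Seen - 曾光毅 (光光)
###### tags: `2023`
{%hackmd @sre-conf/H1pCafrG3 %}

Prometheus

## How to detect hardware problems
- Temperature, fan speed
- CPU, memory, disk I/O
- Network
- CloudWatch

Prometheus exporters (hardware)
- ipmi_exporter
- node_exporter
- cloudwatch_exporter

Grafana dashboard for node_exporter

## Response (hardware problems)
- Fix the problem
  - Just reboot it ~~Server Reboot Engineer~~ (kidding)
  - Repair the equipment (call in the professionals)
- Buy better hardware
  - Expand the existing cluster
  - Choose a higher-spec instance type (cloud)

## How to detect architecture problems
- Kubernetes cluster
- Database, queue, logging service
- Networking

Prometheus exporters
- Prometheus Operator: monitors Pods
- Kafka
- mysqld_exporter
- Elasticsearch

https://istio.io/latest/docs/tasks/observability/kiali/

## Response (architecture problems)
- Fix the problem
  - Restart the service (Service Restart Engineer)
  - Resolve the alerts raised by monitoring
- Rebalance cluster resources
  - Scale up server resources

## How to detect service problems
- Application
- Logs
- Performance

Prometheus exporters
- jmx_exporter (Java)
- Common library

## Response (service problems)
- Fix the problem
  - Read the logs to find the root cause
  - Ask the RD (development) team for help
- Adjust service resources
  - Investigate the cause of system latency
  - Look into whether autoscaling is possible

## Other sources of problem signals
- Device
  - Online / Offline
  - Device location

Because NextDrive sells IoT devices, the team wrote its own Prometheus client that pulls this data from the database.

## Summary
- Business-level monitoring
- Service-level monitoring
- Architecture-level monitoring
- Hardware-level monitoring

## Other considerations
- Volume of monitoring data -> IoT data volume is huge
  - Data retention
  - Data sampling
- Purpose of monitoring
  - Why are we monitoring?
  - How should an alert be handled once it fires?
- Cost

https://www.104.com.tw/job/6x70p?jobsource=company_job

---
==Chat area==
==Live chat starts==

---
Are 90% of SREs male?
-> That's probably not specific to SRE XD? It's the norm in STEM
-> ~~Congrats, if you have to ask, the answer is yes~~
No, but judging by the restroom lines just now, there really are more men here today
Agreed!! Even the women's restroom line at a New Year's countdown isn't as long as the line here today
At tech company year-end parties, the men's restroom line is also longer than the women's
~~Should have studied nursing instead of STEM~~
-> Could nursing turn out the same?
-> Game design here: we also have more men than women...
Nursing, gone, going nuts
~~Sending out résumés is no match for being born into the right family~~
One year I went to a DevOps community event and the only women there were me and one other, so today is a lot
Are you sure they're all female SREs?
Are we about to start a monitoring-tool flame war?
~~Cathay recommended Zabbix this morning; is it getting flamed now?~~
Actually, Cathay also uses Prometheus + Grafana
Right, everyone really just uses whatever works XD
-> ~~It's Grafana~~ Noted :)
Prometheus vs. Zabbix
~~Only kids have to choose; I want them all~~
Rancher can install Prometheus with one click
--> What Rancher installs is its own modified Prometheus
--> Installing isn't the point; writing exporters is, and that's the part you never finish learning (sigh)
What is the speaker checking the time for??
--> To see whether he can make it to the end of his slot
~~Redefining SRE~~
77777777
SRE: Server Reboot Engineering = System Restart Engineering
good!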
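Great definition
Good
Rebooting saves the world, reinstalling cures all woes
-> ~~That's our company right now :)~~
Lü Bu ("reboot") cures everything
Does Prometheus cost money?
It's open source
-> You mean the one where an alien gets inside your body and lays eggs? (joke)
--> Is that how it monitors and how you install it? (it latches onto your face, goes dormant, and finally bursts out of your stomach)
---> agent?
-> David liked this
There's a free version for personal use; I looked into it before
But having only that = useless
It's kind of like????
What on earth is he talking about???
Like a thanksgiving festival
Hmm...
Pic support?
~~Thanks~~?
![](https://i.imgur.com/Ryiwaus.png)
```
M M M M M M M M M
M M M M M M
M M M
F
```
^^^Why does this look like an adult video??^^^
```
AVOP-ll7?
```
A woman on a sofa with 5 Black guys behind her?
How small must a system be if it uses sessions to check whether anyone is using the DB?
I can no longer tell whether that's trolling
Conclusion: we need lots of women
-> Correct
Seriously though: once monitoring shows the consumers can't keep up, the next step is figuring out how to autoscale
Use KEDA! https://keda.sh/
Checking the time again????
Actually, someone at the back left holds up time cards
Who holds up a card right at the start?
But at least the chemistry is good, it's just XD
Looks like NextDrive holds regular internal stand-up comedy
This is my brain right now ⬇️
![](https://i.imgur.com/BTbe7EH.png=45px)
~~Learned something new~~
Seriously, back in the day MapleStory private servers had to be restarted every day
Biggest takeaway today... SRE got redefined
-> Three Spider-Men pointing at each other (SRE)
--> ![Support](https://i.imgur.com/f8nkKZl.png=25px)
That graphical node-error view from earlier, can it really be displayed in real time?
-> You mean Kiali?
-> Yes, Istio
-> It's a Kiali feature
-> Install NeuVector and you can see it; don't actually install Istio + Kiali
-> NeuVector is nice: it shows where network traffic is headed by analyzing node network logs, whereas Istio is a service mesh
-> It really is nice, and it also offers end-to-end security protection: you can write WAF rules, restrict the commands allowed inside Pods, monitor file changes, and more...
-> A NeuVector pitfall I've hit: it won't install on GKE Autopilot
-> Thanks, now that I know the keyword it's easy to look up https://ithelp.ithome.com.tw/articles/10252153
He actually knows his stuff, so why all the jokes?
-> Jokes are great XD

* A serious question: these observability dashboards are maintained by SRE, but some are more relevant to application developers and need customizing. How does everyone collaborate on that? Thanks!

1. Answer 1: I create a Group or Folder, and either they build their own dashboards or I build them for them. If you grant them permission to edit dashboards, remember to keep backups; occasionally someone deletes one by mistake and comes asking for help. Mostly, though, they just don't want to use it.
--> In my experience they mostly don't want to use it either; backups really are key!
Addendum to answer 1: while I was setting things up, a Grafana upgrade broke the CloudWatch Logs feature. If you only have one Grafana instance or dashboard tool, pay attention to announcing upgrades. Unlike an application release, it's a supporting tool, so some companies don't issue a change plan to notify everyone.
2. Answer 2: developers go full "one dragon" (end to end): they add whatever they want to see
--> ~~So you're the dragon~~

It... shows the company's name. Is that really OK?
Could Splunk work too?
-> Splunk is expensive
-> It costs money
-> Running ELK just moves the cost into headcount XD
-> Most of the data ELK collects is junk, the data you actually need can still be hard to find, and data management is a problem too
-> I've never used ELK and I'm curious: I've heard it has a bit of a learning curve. Could any of the experts share?
--> The hurdle is mostly log processing, i.e. how to pull out the information you want. Data management and backup used to be hard too, but the built-in mechanisms are decent now.
Grafana Loki is also nice: log search integrates well with Grafana, and it needs fewer resources than ELK...
-> Agreed, we also let developers view the logs
-> Careful: too many Loki chunks can bring down the file system

1. [Prometheus Metrics for Splunk](https://github.com/lukemonahan/splunk_modinput_prometheus)
2. [Prometheus Exporter](https://docs.splunk.com/Observability/gdi/prometheus-exporter/prometheus-exporter.html)

Thanks for sharing
The chat is so rowdy, I'm dying 😆
The flame war has switched fronts
Oh, so Nextcloud and NextDrive aren't the same thing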
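
The talk notes mention that, because NextDrive sells IoT devices, the team wrote its own Prometheus client that reads device data (online/offline status, location) from the database. The sketch below shows one way such a custom exporter could look, using only the Python standard library and rendering the Prometheus text exposition format by hand; in practice the official `prometheus_client` library would normally do this. The `devices` table and its columns, the metric name `iot_devices`, and the `serve()` helper are hypothetical; the talk does not describe NextDrive's actual schema or code.

```python
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical schema: one row per device with its last known status and location.
SCHEMA = """
CREATE TABLE devices (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL,    -- 'online' or 'offline'
    location TEXT NOT NULL
)
"""


def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    # Backslashes must be escaped first so later escapes are not doubled.
    return value.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')


def render_metrics(conn: sqlite3.Connection) -> str:
    """Count devices per (location, status) and render them as one gauge."""
    rows = conn.execute(
        "SELECT location, status, COUNT(*) FROM devices "
        "GROUP BY location, status ORDER BY location, status"
    ).fetchall()
    lines = [
        "# HELP iot_devices Number of IoT devices by location and status.",
        "# TYPE iot_devices gauge",
    ]
    for location, status, count in rows:
        lines.append(
            f'iot_devices{{location="{escape_label(location)}",'
            f'status="{escape_label(status)}"}} {count}'
        )
    # The text format expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def make_handler(db_path: str):
    """Build a request handler that re-queries the database on every scrape."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            conn = sqlite3.connect(db_path)
            try:
                body = render_metrics(conn).encode("utf-8")
            finally:
                conn.close()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler


def serve(db_path: str, port: int) -> None:
    """Blocking: expose GET /metrics on the given port for Prometheus to scrape."""
    HTTPServer(("", port), make_handler(db_path)).serve_forever()
```

Calling `serve("devices.db", 9400)` (hypothetical path and port) would answer `GET /metrics`, and Prometheus could then scrape it like any other exporter. Re-querying the database on each scrape keeps the gauge current; with the IoT data volumes listed under "Other considerations", a pre-aggregated table or a cached result would likely be needed.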
