SRE Conference
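# Does SRE Still Rely on Psychic Powers? Let Us See the Unseen - 曾光毅 (光光)
###### tags: `2023`

{%hackmd @sre-conf/H1pCafrG3 %}

Prometheus

## How to detect hardware-level problems
- Temperature, Fan Speed
- CPU, Memory, Disk I/O
- Network
- CloudWatch

Prometheus Exporters (hardware)
- ipmi_exporter
- node_exporter
- cloudwatch_exporter

Grafana dashboard for node_exporter

## Response (hardware problems)
- Solve the problem
- Just reboot ~~Server reboot engineer~~ (just kidding)
- Repair the equipment (call in the professionals)
- Buy better hardware
- Expand the existing cluster
- Choose a higher-resource instance spec (cloud)

## How to detect architecture-level problems
- Kubernetes cluster
- database, queue, logging service
- Networking

Prometheus Exporters
- prometheus operator: monitors Pods
- kafka
- mysqld_exporter
- elasticsearch

https://istio.io/latest/docs/tasks/observability/kiali/

## Response
- Solve the problem
- Restart the service (Service Restart Engineer)
- Fix the alerts that monitoring raised
- Reschedule cluster resources
- Expand server resources

## How to detect service-level problems
- application
- logs
- performance

Prometheus Exporters
- JMX_exporter (Java)
- common library

## Response
- Solve the problem
- Read the logs and confirm the root cause
- Ask the RD team for help
- Reallocate service resources
- Investigate the cause of system latency
- Look for autoscaling options

## Other sources of problem signals
- Device
- Online / Offline
- Device location

Because NextDrive sells IoT devices, they wrote their own Prometheus client that pulls the data from the DB.

## Summary
- Business-level monitoring
- Service-level monitoring
- Architecture-level monitoring
- Hardware-level monitoring

## Other considerations
- Volume of monitoring data -> IoT data volume is very large
- Data retention
- Data sampling
- Purpose of monitoring
- Why monitor at all
- How to handle an alert after it fires
- Cost

https://www.104.com.tw/job/6x70p?jobsource=company_job

---
==Chat area==
==Live chat begins==
---

Are 90% of SREs men?
-> That probably has nothing to do with SRE XD? It's just the norm in STEM
-> ~~Congrats, if you have to ask, the answer is yes~~
No, but judging by the restroom situation just now, there really are more men here today
Agreed!! Even the women's restroom line on New Year's Eve is shorter than the line today
At tech-company year-end parties the men's restroom line is also longer than the women's
~~Should have studied nursing instead of engineering~~
-> Maybe it would be the same in nursing
-> Game design here: we also have more men than women...
護理逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經逝去抓神經
~~Better to be reborn than to send out résumés~~
One year I went to a DevOps community event and the only women there were me and one other; today is a lot by comparison
Are you sure those are all women SREs?

Are we about to start fighting over monitoring software?
~~Cathay (國泰) recommended Zabbix this morning; is it under fire now?~~
Actually Cathay also uses Prometheus + Grafana
Right, really we just use whatever fits XD
-> ~~Grafana, obviously~~ Got it :)
Prometheus vs. Zabbix
~~Only kids choose; I want them all~~
Rancher can install Prometheus with one click
--> What Rancher installs is its own modified Prometheus
--> Installing isn't the point; writing exporters is, and that's the part you never finish learning (painful)

What is the speaker checking the time for??
--> To see whether he can fill the whole slot
~~Redefining SRE~~
77777777
SRE : Server Reboot Engineering = System Restart Engineering
good!
Great definition
Good
Rebooting saves the world
Reinstalling cures a thousand woes
-> ~~That's the current state of our company :)~~
Reboot (呂布) cures all ills

Does Prometheus cost money?
It's open source
-> You mean the one where an alien gets inside you and lays eggs? (wrong one)
--> Is that its monitoring principle and package installation method? (Latches onto your face, goes dormant, then bursts out of your stomach)
---> agent?
-> David liked this
There's a free tier for individuals; I looked into it before
But on its own it's useless
It's kind of like????
What on earth is he talking about???
A customer-appreciation-event kind of concept
Hmm...
Image support?
~~Thanks~~?
![](https://i.imgur.com/Ryiwaus.png)
```
MMMMMMMMM
MMMMMM
MMM
F
```
^^^Why does this look like an adult video??^^^
```
AVOP-ll7?
```
The one with a girl on the sofa and five Black men behind her?
How small does a system have to be to use sessions to check whether anyone is using the DB?
I can no longer tell whether this is trolling
Conclusion: we need lots of women -> Correct

Seriously though, once monitoring shows there aren't enough consumers, the next step is figuring out how to autoscale
Use KEDA! https://keda.sh/
Checking the time again????
Actually someone at the back left holds up time cards
Nobody holds up a card right at the start
Still, at least the chemistry is good
Just... XD
Looks like NextDrive holds regular stand-up comedy internally
This is the state of my brain right now ⬇️
![](https://i.imgur.com/BTbe7EH.png=45px)
~~Learned something new~~
Seriously, MapleStory private servers used to need a restart every day
Biggest takeaway today.... SRE got redefined
-> Three Spider-Men pointing at each other (SRE)
--> ![support](https://i.imgur.com/f8nkKZl.png=25px)

That graphical node view showing errors just now: can it really be displayed in real time?
-> You mean Kiali?
-> Yes, it's Istio
-> It's a Kiali feature
-> Install NeuVector and you can see it; don't actually install Istio + Kiali
-> NeuVector is nice; you can see where network traffic is going. It works by analyzing node network logs, whereas Istio is a service mesh
-> It really is good, and it also has end-to-end security protection; you can write WAF rules, restrict commands inside Pods, monitor file changes, and so on...
-> A NeuVector pitfall I hit: it fails to install on GKE Autopilot
-> Thanks, knowing the keyword makes it easy to look up
https://ithelp.ithome.com.tw/articles/10252153

He actually knows his stuff, so why the comedy?
-> Comedy is great XD

* A serious question: besides SREs maintaining these observability dashboards, some are more relevant to application developers and need customizing. How does everyone collaborate on that? Thanks!
    1. Answer 1: I create a Group or Folder, and either they build the dashboards themselves or I build them. If you grant them permission to build dashboards, remember to keep backups; occasionally they delete something by mistake and come asking for help, though mostly they just don't want to use them.
    --> In my experience it's mostly that they don't want to use them either; backups really are key!
    Addendum to Answer 1: I once upgraded Grafana and the CloudWatch Logs feature stopped working. If you have only one Grafana instance or dashboard tool, pay attention to announcing upgrades. Unlike an application release, it counts as a supporting tool, and some companies won't necessarily issue a change plan to notify everyone.
    2. Answer 2: Developers handle it end to end and add whatever they want to see
    --> ~~So you are that one-stop shop~~

It... has the company name on it. Is that really okay?

Would Splunk work too?
-> Splunk is expensive
-> It costs money
-> Running ELK just shifts the spend to headcount XD
-> Most of what ELK collects is junk, the data you actually need can't be found, and data management is a problem too
-> I haven't used ELK and am curious: I've heard it has a bit of a learning curve. Can anyone share?
--> The hurdle is mainly log processing: how to extract the information you need. Data management and backup used to be hurdles too, but the built-in mechanisms are decent now.
Grafana Loki is good too: log search integrates well with Grafana and it needs fewer resources than ELK...
-> Agreed, we also give developers access to view logs
-> Note that too many Loki chunks can take down the file system

1. [Prometheus Metrics for Splunk](https://github.com/lukemonahan/splunk_modinput_prometheus)
2. [Prometheus Exporter](https://docs.splunk.com/Observability/gdi/p?rometheus-exporter/prometheus-exporter.html)

Thanks for sharing
The chat is too rowdy, I'm dying of laughter 😆
Battle front switch
So Nextcloud and NextDrive aren't the same thing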
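
The talk notes mention that NextDrive wrote its own Prometheus client that reads device data (online/offline state, device location) from a database. A minimal sketch of that idea, using only the Python standard library, might look like the following. The row layout, the metric name `iot_devices`, and the function `render_device_metrics` are all hypothetical and not taken from the talk; the sketch only renders the Prometheus text exposition format, and a real exporter would more likely use an official client library and serve the result over HTTP.

```python
from collections import Counter

# Hypothetical rows as they might come back from a device table:
# (device_id, location, online). The schema is an assumption for illustration.
DEVICE_ROWS = [
    ("dev-001", "taipei", True),
    ("dev-002", "taipei", False),
    ("dev-003", "tokyo", True),
]


def render_device_metrics(rows):
    """Render device counts per (location, state) in Prometheus text format."""
    counts = Counter()
    for _device_id, location, online in rows:
        state = "online" if online else "offline"
        counts[(location, state)] += 1

    # HELP and TYPE comment lines describe the metric for the scraper.
    lines = [
        "# HELP iot_devices Number of devices by location and state.",
        "# TYPE iot_devices gauge",
    ]
    # Sort so the output is stable between scrapes.
    for (location, state), value in sorted(counts.items()):
        lines.append(
            f'iot_devices{{location="{location}",state="{state}"}} {value}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_device_metrics(DEVICE_ROWS), end="")
```

Counting per label pair in the exporter, rather than emitting one series per device, keeps the number of time series small, which fits the note above that IoT monitoring data volume is very large.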
