
Embracing Automation - An Autoscaler mechanism by using Saltstack and Prometheus

tags: DevOpsDays Taipei 2018 9/12 13:30~13:55 Track B

Welcome to the DevOps Days 2018 shared notes


Shared notes index: https://hackmd.io/c/DevOpsDays2018
On mobile, tap the button at the top to expand the agenda list.

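If you run into any problems at the conference, leave a comment in the issue report area below.
Conference issue and suggestion report area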

Start here

  • Software engineer - HTC DeepQ

Slides: click here

About DeepQ AI Platform

  • Lower the barrier of AI training

Operation Challenges

  • Maintain dev/staging/prod environments
  • The number of servers to monitor keeps increasing
  • Diagnose problems and provide feedback
  • Automate infrastructure management
    • GPU infrastructure resources

Candidate solutions

  • Using existing cloud service solutions: AWS, GCP

    • increase deployment time
    • terminate instances when instances are idle
  • Implement by ourselves

    • Reduce deployment time: reserve instances
    • Scalability: can also be applied on top of existing cloud services

Overall architecture

What choices do I have?

  • CloudWatch
  • Stackdriver
  • Elasticsearch + Kibana
  • Prometheus

Metric based vs Log based

                     Metrics   Log
Exact counter        X         O
Error cause          X         O
Network bandwidth
Storage usage
Detect incidents     O         O

Why Prometheus

  • A metric-based monitoring system

  • We care about metrics like
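
To illustrate what a metric-based system gives an autoscaler to work with, the sketch below reads the current value of a metric through Prometheus's HTTP API (the `/api/v1/query` instant-query endpoint). The server address, the metric name `gpu_utilization`, and the instance labels are assumptions made for this example; the notes do not record which metrics the speaker listed.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical server address, used only for illustration.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url, promql):
    """Build the URL for Prometheus's instant-query endpoint."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def parse_instant_vector(payload):
    """Turn an instant-vector response into {instance label: float value}."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector, got %s" % data["resultType"])
    values = {}
    for sample in data["result"]:
        instance = sample["metric"].get("instance", "unknown")
        # Each sample value is a pair: [unix timestamp, value as a string].
        _timestamp, raw_value = sample["value"]
        values[instance] = float(raw_value)
    return values


def query_prometheus(promql, base_url=PROMETHEUS_URL, timeout=5):
    """Run an instant query against a live server (needs network access)."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))


# Offline example: a hand-written response in the API's JSON shape.
# The metric name and instance labels are made up.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "gpu_utilization", "instance": "gpu-node-1:9100"},
             "value": [1536730200.0, "0.93"]},
            {"metric": {"__name__": "gpu_utilization", "instance": "gpu-node-2:9100"},
             "value": [1536730200.0, "0.12"]},
        ],
    },
}

print(parse_instant_vector(sample_response))
```

The parsing step is kept separate from the network call so the decision logic that consumes these values can be exercised without a running Prometheus server.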

Simple Prometheus Architecture

What is an autoscaler?
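
The notes do not record how the speaker's autoscaler makes its decisions. As a generic sketch only, an autoscaler can be thought of as a function from the current instance count and an observed load metric to a desired instance count. The proportional rule, the `ScalingPolicy` fields, and all thresholds below are assumptions chosen for illustration; `min_instances` plays the role of the reserved instances mentioned under "Candidate solutions".

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy knobs; none of these values come from the talk."""
    target_utilization: float = 0.5  # utilization we try to hold per instance
    min_instances: int = 2           # reserved pool kept warm to cut deploy time
    max_instances: int = 10          # hard upper bound on the pool size


def desired_instances(current, avg_utilization, policy):
    """Proportional rule: scale the pool so average utilization nears the target."""
    if current <= 0:
        return policy.min_instances
    wanted = math.ceil(current * avg_utilization / policy.target_utilization)
    # Clamp to the configured bounds so the pool never drops below the
    # reserve or grows past the maximum.
    return max(policy.min_instances, min(policy.max_instances, wanted))


policy = ScalingPolicy()
print(desired_instances(4, 0.9, policy))  # busy pool: scale up
print(desired_instances(4, 0.2, policy))  # quiet pool: scale down
print(desired_instances(4, 0.0, policy))  # idle pool: fall back to the reserve
```

A real autoscaler would also need cooldown periods and smoothing so that a short spike does not cause instances to be added and removed repeatedly.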

What is Saltstack?

  • A configuration management tool

    • Flexible, and scalable to around 10,000 machines
    • ..
  • A remote execution framework

    • ..
  • Secure

    • Salt minion key authentication

What is event (in Salt event-driven system)

  • Everything you care about

Event and Reactor

  • An event-driven infra
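
In Salt, an event carries a tag and a data dictionary, and the master's `reactor` configuration maps tag glob patterns to reactor SLS files that run when a matching event arrives. The sketch below only simulates that dispatch idea in plain Python, with functions standing in for SLS files; it does not use Salt's API. The `autoscaler/scale/up` tag, the handler names, and the returned strings are hypothetical.

```python
from fnmatch import fnmatchcase


def on_minion_start(tag, data):
    """Stand-in for a reactor SLS that configures a newly started minion."""
    return "configure " + data["id"]


def on_scale_up(tag, data):
    """Stand-in for a reactor SLS that provisions more instances."""
    return "add %d instance(s)" % data["count"]


# Hypothetical reactor table: event-tag glob pattern -> list of handlers.
REACTOR = [
    ("salt/minion/*/start", [on_minion_start]),  # tag fired when a minion starts
    ("autoscaler/scale/up", [on_scale_up]),      # made-up custom tag
]


def dispatch(tag, data, reactor=REACTOR):
    """Run every handler whose pattern matches the event tag."""
    actions = []
    for pattern, handlers in reactor:
        if fnmatchcase(tag, pattern):
            for handler in handlers:
                actions.append(handler(tag, data))
    return actions


print(dispatch("salt/minion/gpu-node-1/start", {"id": "gpu-node-1"}))
print(dispatch("autoscaler/scale/up", {"count": 2}))
print(dispatch("salt/auth", {"id": "gpu-node-1"}))  # no matching pattern
```

In an autoscaling setup along these lines, a monitoring alert could be turned into a custom event on the Salt event bus, and the matching reactor would then start or stop instances.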

Summary

  • The autoscaling mechanism can be applied to non-container-based instances
  • Before surveying new techniques, think about the purpose first
    • Can the services be containerized?
    • Do the existing solutions meet our requirements?
    • Monitoring: metrics-based or log-based?
    • Configuration management tool: Ansible, Chef, Puppet, SaltStack, etc.

Off-stage chat room: feel free to chat below

This session suddenly feels a bit fast XD

+1, it jumped to the next slide before I had finished understanding the last one..

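Looking at the clock, there are still five minutes before it is supposed to end XD
Does anyone understand what problem this thing is solving? (I can't follow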

It looks like they rolled their own autoscaling because the existing solutions have some limitations
