How We Integrate and Develop Private Cloud in LINE - Gene Kuo

tags: COSCUP2020 Beginner AU

Welcome to the collaborative notes at https://hackmd.io/@coscup/2020

Why did LINE choose to build its own private cloud?

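  • Before building the private cloud, the company already had a large infrastructure with many servers and network resources
  • Reuse what already exists, and combine software and hardware to get more benefit out of it
  • Public cloud has many hidden costs: bills can be hard to read, new feature requests are hard to get through, and the service cannot be tailored to company needs
  • No vendor lock-in, and custom development is easy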

Change in Motivation

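  • private cloud
    Solve the problems that other departments in the company run into during development
  • public cloud
    Find ways to apply company policies to the public cloud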
  • Work as a proxy to public cloud services
  • feasibility check of public cloud services against company policies

Basic Overview of Verda

  • LINE's internal private cloud is called Verda; it has production and dev versions
  • FaaS
    • function as a service
  • PaaS
    • k8s, Kafka, redis
  • IaaS
  • Scale
    More than 2,000 hypervisors, 50k VMs, 20k physical machines

What difficulties / pain points we faced integrating and operating open source projects in our private cloud

  • upstream version does not 100% fit our needs
    • Upstream OpenStack does not provide quota management for PCI devices
    • We patched OpenStack ourselves to add quota control, e.g. for GPU / MEM / CPU allocation (paid alternatives to Horizon also exist)
  • Upgrade/Operation
    • High cost; needs extra care
  • Complexity
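    • The software stack itself is already complex, which makes debugging even harder
    • Weekly technical sharing on issues / projects helps team members understand how the software works, which benefits development a lot
    • When a problem occurs, it is recorded in the internal wiki: how the failure arose from the start and the steps involved, the root cause found, and how it was solved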
  • Perf issue on large scale
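    • Ownership 1/2: the company needs to staff a DevOps team
    • Upstream OpenStack uses the filter scheduler; with 1,000 or more hypervisors, launching hundreds of VM instances at the same time leads to timeouts (resource constraint)
    • Short term: modify the scheduler to cache weigher results (this reduces the overhead of re-weighing after every scheduling decision; up to 650 VMs can be launched without a timeout)
    • Long term: develop our own scheduler; waiting strategy: filtering, sorting:
      • Solution: weigh first (in queue) -> select VM instance to start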
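The PCI device quota gap mentioned above can be illustrated with a small sketch. This is not Nova's quota API and not LINE's patch; `ProjectQuota`, `QuotaExceeded`, and `reserve` are hypothetical names, and the resource keys are made up for illustration. It only shows the idea of checking a device class such as GPUs against a per-project limit before an allocation is accepted.

```python
from dataclasses import dataclass, field
from typing import Dict


class QuotaExceeded(Exception):
    """Raised when a request would push usage past the project's limit."""


@dataclass
class ProjectQuota:
    # Hypothetical per-project limits and current usage, keyed by resource name.
    limits: Dict[str, int]
    used: Dict[str, int] = field(default_factory=dict)

    def reserve(self, requested: Dict[str, int]) -> None:
        """Reserve all requested resources, or none if any limit is exceeded."""
        # Check every resource first so a failure leaves usage unchanged.
        for name, amount in requested.items():
            limit = self.limits.get(name, 0)  # resources without a limit get 0
            if self.used.get(name, 0) + amount > limit:
                raise QuotaExceeded(f"{name}: limit {limit} exceeded")
        for name, amount in requested.items():
            self.used[name] = self.used.get(name, 0) + amount
```

A request for `{"vcpu": 4, "gpu": 1}` against limits of `{"vcpu": 8, "gpu": 1}` succeeds once; a second request that asks for another GPU is rejected as a whole, so the vCPU usage is not partially updated.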
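The short-term scheduler fix (caching weigher results instead of re-weighing every host after each placement) can be sketched as a toy model. This is not Nova's filter scheduler code and not LINE's implementation; `Host`, `weigh`, and `schedule_batch` are hypothetical names, and the weigher is assumed to be "most free RAM wins". Weights are computed once and kept in a queue, and only the host that just received a VM is re-weighed.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_ram_mb: int


def weigh(host: Host) -> int:
    # Toy weigher: prefer the host with the most free RAM.
    return host.free_ram_mb


def schedule_batch(hosts: List[Host], requests_mb: List[int]) -> List[Optional[str]]:
    """Place each request on the best-weighted host, or None if nothing fits."""
    # Weigh every host once up front; heapq is a min-heap, so weights are negated.
    heap = [(-weigh(h), h.name) for h in hosts]
    heapq.heapify(heap)
    by_name = {h.name: h for h in hosts}
    placements: List[Optional[str]] = []
    for ram in requests_mb:
        if not heap:
            placements.append(None)
            continue
        _, name = heap[0]
        host = by_name[name]
        if host.free_ram_mb < ram:
            # The top host has the most free RAM, so no other host fits either.
            placements.append(None)
            continue
        host.free_ram_mb -= ram
        # Re-weigh only the host that changed; all other cached weights stay valid.
        heapq.heapreplace(heap, (-weigh(host), name))
        placements.append(name)
    return placements
```

The filter step here is trivial only because the filter and the weigher both look at free RAM. With independent filters, a real scheduler would need to skip over top-weighted hosts that fail a filter rather than give up.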

How do we solve these problems

What difficulties / pain points we faced when developing our in-house features and components on top of OpenStack

  • Use the same development model as upstream (the OpenStack way)
    • Integrate with LINE's own internal systems
      • mapping LINE's org structure
    • OpenStack and ACL
      • DB ACL in k8s issue: Dynamic IP to Fixed IP
    • Feature integration
  • Not every change can be made into a plugin

Manage Custom changes

  • Patches must be easy to apply on top of upstream, for easier maintenance
  • Upstream features must be easy to cherry-pick
  • The less customization, the better
    • If a change benefits upstream, send it back
  • a repo containing only patches
  • separate different types of patches
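The "repo containing only patches, separated by type" idea can be sketched roughly as follows. The category directory names (`backport`, `bugfix`, `in-house`) and both function names are assumptions made for illustration; the talk notes do not say how LINE lays out its patch repo. The only external command used is `git apply`.

```python
import subprocess
from pathlib import Path
from typing import List

# Hypothetical patch categories, applied in this order.
CATEGORIES = ("backport", "bugfix", "in-house")


def ordered_patches(patch_repo: Path) -> List[Path]:
    """Return *.patch files grouped by category, sorted by file name."""
    patches: List[Path] = []
    for category in CATEGORIES:
        directory = patch_repo / category
        if directory.is_dir():
            patches.extend(sorted(directory.glob("*.patch")))
    return patches


def apply_patches(source_tree: Path, patch_repo: Path) -> None:
    """Apply every patch on top of an upstream checkout with `git apply`."""
    for patch in ordered_patches(patch_repo):
        # check=True stops at the first patch that no longer applies cleanly.
        subprocess.run(
            ["git", "apply", str(patch.resolve())],
            cwd=source_tree,
            check=True,
        )
```

Keeping the patches numbered within each category makes the apply order deterministic, and a failure on one patch points directly at the customization that conflicts with a new upstream version.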

future works

  • upgrade openstack version
  • more scalable IaaS service
    • integrate in-house scheduler to openstack
  • disaster recovery tests
  • scale tests/benchmarks

demo

  • git diff to file
  • custom-source
  • make review: makes it easy for reviewers to see which changes were made

https://engineering.linecorp.com/en/blog/verda-platform-team/
