###### tags: `Meetup`, `Co-writing`

# CNTUG Meetup #22

[TOC]

---

## Announcement

![](https://i.imgur.com/NnhzoAC.png =300x)

- Please join the community's meetup co-writing notes: [HackMD: Cloud Native Taiwan User Group](https://hackmd.io/@CNTUG)
- Support the Taiwanese startup HackMD: [Use HackMD professionally and collaborate at scale](https://hackmd.io/pricing)

![](https://i.imgur.com/NLJmfwo.png =300x)

- CNTUG <3 Tenlong Bookstore
> Buying books and reading them at home is the fastest way to build up knowledge.

---

## Session 1. How to achieve canary deployment on Kubernetes - John Chen ([Grindr](https://www.grindr.com/))

Internal QA considerations: stability, and how much load the service can sustain.

### Blue-Green Deployment

Pain points encountered:
* Connections are stateful (accounts and passwords)
* New and old versions are incompatible
* Clustered systems
* Persistent data stores: the worst case is ending up with inconsistent data

### Canary Deployment

Gradually shift production traffic from version A to version B, usually split by proportion. For example, 90% of requests go to version A and 10% to version B. The percentage to shift is for each team to evaluate: a low percentage may not surface problems quickly, so if you don't want to wait that long, shift a larger share.

When it fits:
* Connections are stateless
* There is no QA team testing releases
* Traffic and latency issues between services
* Problems that only show up under heavy concurrency

### Original Deployment
* Use Route53 domain weights to split traffic
> A Deployment provides version control; it divides the work with the ReplicaSet.

### Component relationship

| Object Type | Ratio |
| -------- | -------- |
| Pod : Container | 1:N |
| ReplicaSet : Pod | 1:N |
| Deployment : ReplicaSet | 1:N |
| Deployment : Service | 1:N |
| Service : Deployment | 1:N |

**Kubernetes Canary Deployment**

You can adjust the percentage of traffic each version receives by tuning `replicas`,
e.g. old Deployment => `replicas: 3`, new Deployment => `replicas: 1` => 25% of traffic hits the new version.
The Service definition does not need to change.

Give the two Deployments different `track` labels, while the Service selector omits `track`, so it matches Pods from both Deployments and the traffic splits by replica ratio. A schematic of the setup:

```yaml
# Deployment (stable version)
labels:
  app: guestbook
  tier: frontend
  track: stable
replicas: 9
---
# Deployment (canary version)
labels:
  app: guestbook
  tier: frontend
  track: canary
replicas: 1
---
# Service (no "track" in the selector => matches both tracks)
selector:
  app: guestbook
  tier: frontend
```

### Questions
* 3 Deployments + 2 Services => canary deployment
* You have to edit the YAML and run `kubectl` for every change
* The commands are complicated — what if someone runs the wrong one?

GitHub -> Jenkins -> Helm -> Kubernetes
A `helm upgrade` of the chart is enough; Jenkins takes care of changing the values.
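A minimal sketch of how that flow might parameterize the replica split (the chart layout and value keys below are assumptions for illustration, not the speaker's actual chart):

```yaml
# Hypothetical values.yaml for the guestbook chart; Jenkins overrides these
# numbers per release instead of anyone hand-editing Deployment YAML, e.g.:
#   helm upgrade guestbook ./guestbook-chart --set canary.replicas=1
stable:
  replicas: 9        # templated into the stable Deployment's spec.replicas
  tag: v1            # image tag for the stable track
canary:
  replicas: 1        # 1 of 10 Pods => ~10% of Service traffic
  tag: v2            # image tag under canary test
```

Promoting the canary is then just another values change (scale the canary up and the stable track down) rather than a new set of manifests.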
### Conclusions
* Script can do, Jenkins can do, either.
* Don't be limited by tools when doing CI/CD.
* A useful approach is what matters most.
* Get a firm footing before moving forward.

### Questions
* Preprod : production = 1:1? => If machines are shut down, will that affect releasing a new version?
* What if traffic still isn't evenly distributed?
  => Some services, such as Redis, never reconnect once a connection is established, so no matter how you adjust the ratio, traffic may never spread out evenly.
  => Treat application connections as a short-lived cache and ask for a fresh connection every so often (RabbitMQ, etc.).
  => Consul can also be used for service discovery.
* Why not use Ingress? => On AWS, `type: LoadBalancer` is enough. One LB per service makes troubleshooting easier, but it costs more.
* Why not use Jenkins Pipeline? => The learning cost for engineers, and the risk of Jenkins going down.

### Supplement
Multiple Redis servers in one VM?

### Q&A
* How do you trace the logs of the new version?
  => In ES (Elasticsearch) you can look at logs by image, but distinguishing versions by image tag is not recommended because image tags may keep changing; it is better to monitor by a fixed tag. Approach: set tags such as `ENV: staging/prod` in the YAML, then use filtered searches to find the logs of a given stage.
* How do you decide the split ratio?
  => Work it out with arithmetic, because different services don't necessarily run the same number of Pods, e.g. 50% or 25%.

---

## Session 2. Massive Bare-Metal Operating System Provisioning Improvement - Date Huang (Engineer, Edgecore Networks)

Focus on bare-metal deployment.

Unicast:
* MaaS: Metal as a Service

Broadcast / Multicast

### Problems to be resolved
* Unicast is hard to scale up/out => server bandwidth is not enough
* Multicast/Broadcast is NOT always available

#### Solution: BitTorrent (peer to peer)
* A peer can be a sender too
* Reduces the stress on the origin sender
* Temporary storage is required
* Can't guarantee data order
* Image size will be limited by RAM

#### Solution: [EZIO](https://github.com/tjjh89017/ezio)
* EZIO design
  * Offsets and lengths are recorded in a file
  * A "piece" is the minimum unit in BitTorrent
  * Cut continuous blocks into pieces
* Legacy approach
  * One server stuck, all servers stuck.
* BitTorrent deployment (EZIO)
  * Performs well in heterogeneous environments
  * Even if one server dies, data can still be fetched from the other servers

#### The data-moving contest

#### CloneZilla
[Download](https://clonezilla.nchc.org.tw/clonezilla-live/download/)

#### QA
* BitTorrent is used — how do you track? => A private tracker.
* For efficiency, how is data synchronized? => Broadcast to everyone to let them know what each server has, and retrieve data by rarity.
* What about provisioning a huge number of bare-metal machines? => NCHC has large compute clusters: install one machine first, then clone it to the rest. As multicast problems piled up, CloneZilla was born.

#### Reference
* Theoretical basis: https://www.mdpi.com/2076-3417/9/2/296
* https://github.com/tjjh89017/ezio
* https://www.libtorrent.org/

---

## Session 3. The Romance of SDN - First Steps (1/n) - JianHao Chen (SDN storyteller, Edgecore Networks)

Representing ONF.

### Open Networking & SDN
```
Switch / Router Software - CP (control plane)
------------------------------- SDN
Switch / Router Software - DP (data plane)
------------------------------- Open Networking
Switch / Router - Hardware
```

### OpenFlow: Enabling
* Datacenter to datacenter => B4 WAN
* Cloud edge to datacenter => B2 WAN

Google B4 Network: the commercialization of OpenFlow.
SD-WAN (software-defined WAN)

hwchiu's notes on B4: https://www.hwchiu.com/b4-after.html (the speaker says feel free to call out anything that's wrong)

### The SDN Warring States Era
1. SDN Controllers - 2010
   * NOX controller:
     * C++
     * You can write C++ applications on top of it and have them call the OpenFlow API.
   * POX controller:
     * Python
     * Similar to NOX, just rewritten in Python.
2. Floodlight - 2011 ~ mid-2012, REST API, written in Java
3. Ryu - 2011, written in Python; performance issues, latency, more complicated than NOX and POX.
4. OpenDaylight - 2013
   * OpenFlow support; backers include Cisco, Juniper(?)
   * RESTful API support
   * Service Abstraction Layer, pleasant to develop against
   * OpenStack integration
   * Centralized: once the controller dies, the switches do nothing
5. ONOS - 2014
   * Distributed architecture
   * Supports a failover mechanism (the other controllers take over the work of a malfunctioning controller).

### Data Plane
1. Virtual network concept
2. TTP/NDM model - 2012
3. OF-DPA - 2014
   * The design of its models
   * OF-DPA Premium for support effort, limited to 100 machines (business case)
4. FlowVisor - 2012
   * Puts a proxy controller in the middle: southbound it presents itself as a controller, but northbound it looks like just a switch — virtualization by this kind of deception.
5. VeRTIGO - 2012
   * Topology abstraction: A-B-C => you only see A-B
6. OpenVirteX - 2014
   * Uses virtual MAC addresses
   * Edge switch concept: translates host/switch MAC addresses
7. VTN - 2013
   * OVS switches
   * Connects overlay and underlay networks

### And now the question
How the chips are designed, etc. — TBD

### Q&As

## ~~Lightning~~ Heavy Talk 1. LINE DevDay - Kyle Bai

### LINE Dev Day

### Kubernetes Development Survival Kit
* How to simulate the online development workflow locally
* Speed up the image deployment flow

### Local K8s options
* Minikube:
  * Pros: VM-based; tons of options for customizing the cluster
  * Easily run different K8s versions, container runtimes, and controllers
  * A choice of VM drivers
* KIND (Kubernetes IN Docker) ***the final choice***
  * Fast to start up
  * A Kubernetes SIGs project
  * Used to test Kubernetes itself
  * https://github.com/kubernetes-sigs/kind
  * Can run in most CI environments (Travis CI, CircleCI, etc.)
  * Cons: pushing images into the cluster is very slow.
* MicroK8s:
  * Pros: does not need to run inside a VM
  * Container-only
  * Linux-only
* K3D (K3s inside a Docker container):
  * K3D runs K3s
  * Lots of optional and legacy features removed
  * A lightweight K8s distro

### KIND benefits
* Customized KIND - node image
* Customized KIND - config (see the sketch below)
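A minimal sketch of such a customized kind cluster config (the node image tag and topology are illustrative assumptions, not the speaker's setup):

```yaml
# kind-config.yaml -- create the cluster with:
#   kind create cluster --config kind-config.yaml
# The apiVersion depends on the kind release; recent releases use
# kind.x-k8s.io/v1alpha4 (older ones used kind.sigs.k8s.io/v1alpha3).
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    image: kindest/node:v1.16.3   # pin the Kubernetes version via a custom node image
  - role: worker
  - role: worker                  # multi-node topology, one container per node
```

Local images can be copied into such a cluster with `kind load docker-image <image>`, which is the slow image-push step noted in the cons above.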
### Observation & Debug
* kubectl port-forward
* kubefwd
* kubebox - check cluster status
  * A terminal UI to access the cluster
  * Access K8s easily
* kubespy

### Solution
* Draft
* Garden
* Skaffold **the most common choice**
  * workflow
* Tilt

### Q&A
What do you use for monitoring?
* Prometheus **mostly**
* CloudWatch

Protection between Pods?
* VPC

## ~~Lightning~~ Heavy Talk 2. Kubeflow v0.7 Multi-user Auth with Istio - Jack Lin

### What is Kubeflow?
How to train AI models on K8s.

### Service
Kubeflow can spin up a Jupyter Notebook for you directly.

### Multi-user Isolation

### Design Decision
* Istio:
  * Makes network control in K8s more convenient
  * HTTP routing
  * RBAC authorization

### High-Level Service Solution
* Istio Gateway
* Envoy filter -> OIDC AuthService -> 3rd-party auth provider -> IdP (LDAP)

### Profile Controller
[Component LINK](https://github.com/kubeflow/kubeflow/tree/master/components/profile-controller)
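A minimal sketch of the Profile custom resource that controller reconciles (field names follow the upstream profile-controller README; the apiVersion and the owner identity are assumptions that may differ by Kubeflow release):

```yaml
# Hypothetical Profile CR: the profile-controller creates an isolated
# namespace ("alice") plus the RBAC and Istio policies scoped to its owner.
# The apiVersion is an assumption -- some releases use kubeflow.org/v1beta1.
apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  name: alice                  # becomes the per-user namespace name
spec:
  owner:
    kind: User
    name: alice@example.com    # placeholder; should match the identity produced by the OIDC auth flow
```

Each authenticated user then only reaches the workloads in namespaces their Profile grants, which is the multi-user isolation described above.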