Kubernetes Platform Review

# Kubernetes Platform Review and Learnings

## How do you set up Kubernetes on on-prem VMs

Setup

- [ ] **Prerequisites for VMs before running the playbook**
    - Corp proxy settings (bash and docker)
    - VM-to-VM connectivity
    - ssh-add-key
    - Bridge config (different IP address CIDR for different VMs)
    - netfilter modprobe
    - Clean up iptables and systemd services
- [ ] **Cleanups from older installations**
    - Special handling needed for etcd
    - Networking (iptables and IP forwarding) enabled
    - Disable swap (kubeadm)
    - Clean up kubernetes config
    - Clean `/var/lib/etcd`
    - `/etc/etcd`
    - Check ports are free
    - Firewall
- [ ] **Running ansible playbook**
    - Re-entrant or not really?
    - etcd member count has to be an odd number
    - Adding and removing etcd members doesn't work.
- [ ] **Validate cluster**
    - DNS issues
    - Cross-machine networking
    - Learn the journey of a packet
    - flannel v/s calico
    - NetworkPolicies
    - kubectl krew doctor
    - kubectl df-pv
- [ ] **etcd risks**
    - `kernel:unregister_netdevice: waiting for eth0 to become free`
    - Dynamic IP of nodes
    - kubespray unstable
    - Proxy config (Squid) breaks etcd
    - Network partitions
- [ ] **Add more nodes**, or **Add more master** to cluster
    - Adding a node is simpler and works
    - Adding a master will work
        - Only if the ordering of master nodes is the same
    - Avoid adding etcd nodes, that may not work
- [ ] **Minikube** for Dev
    - Later `kind` also came to the party as another option here
    - Rancher Labs' `rke` and `k3s`
    - Difficult to support all the platforms/OS
- [ ] **MultiTenant cluster for developers**: add namespaced users (key + cert) to the cluster
    - Playbook role addUser
    - Tiller per namespace?
    - Give access to a particular namespace
    - Set Limit and Quota for namespace (CPU, Mem)

## Necessary cluster add-ons

- [ ] **Cert Manager**
    - Easy for outside/external network
    - Corporate network challenges
- [ ] **Ingress (Just to Save LBR cost?)**
    - URI route based routing is difficult
    - Inbuilt solution http server?
    - Apps need to support URL redirects
    - Host-based routing needs changes in `/etc/hosts`
    - Need to have a DNS server for best outcomes
    - Subdomains are easy.
    - Can be used for Authn and Authz using `oauth` and/or `openid-connect`
- [ ] **ELK**
    - ES Operator is hard to manage
    - Split brains
    - Index for each day? Or for each svc?
    - Kernel and systemd, etcd, kubelet logs
    - Logstash (JVM) v/s Fluentd (C based)
    - Tune log send time accordingly
    - Fluentd is evolving, vs Beats
    - TLS or no TLS for fluentd/filebeat
    - Custom image for Logstash (systemd + kube + ?)
    - Ended up having a 2TB single-node ES, self managed.
- [ ] **Prometheus Operator** [TDD]
    - Statefulset v/s operator
    - Benefits of using operator
    - CRDs and Operator
    - AlertManager
    - Node exporter
    - System Monitoring `USE`
    - App Monitoring `RED`
    - Slack alerts, webhooks
    - Runbooks
    - **PromQL**
        - `sum by(instance, job) (rate(rest_client_requests_total{code!~"2.."}[5m])) * 100 / sum by(instance, job) (rate(rest_client_requests_total[5m])) > 1`
        - `absent(probe_http_status_code{instance="<instance-name>",job="blackbox"} == 200)`
- [ ] **More Observability**
    - Spring Actuator for prometheus
    - Grafana Dashboards
        - Cluster level
        - Node level
        - Per namespace
        - Resource utilizing services/pods
        - JVM metrics
        - Control plane status
        - Cluster/etcd health
    - **Blackbox exporter - Service view**
    - Add/modify alerts/rules in CRD model
    - Backup/snapshots of TSDB
    - Special cases for etcd
- [ ] **Custom Metric Server for HPA**
    - Can scale on custom metrics
    - HPA or VPA?
    - K8S Node Autoscaler
    - JVM metric or RPS or per-tenant based
- [ ] **Jaeger Operator v/s statefulset**
    - Persistence, retention
    - Apps should propagate a telemetry UID in an HTTP header
    - Helped in finding slow DB SQL queries
    - Helped in finding inter-app communication issues
- [ ] **Istio**
    - Kiali Dashboard
    - Tracing
    - Overview of services
    - Monitoring
    - Circuit breakers
    - Mesh svc-to-svc communication
    - **Request flow Observability/Animations/RPS**
    - Service DAG -> inbound and outbound traffic
    - Correlating resource usage and requests.
    - Filters in view/dashboards
    - **Response time for the 95th percentile**
    - mTLS: the concept.
    - Mesh v/s API Gateway Linkerd/Kong
    - Ingress gateways
    - VirtualServices v/s Ingress
    - **Per namespace basis injection**
    - Jaeger telemetry
    - Prometheus and Grafana
- [ ] **Minio**
    - Alternative to S3/Cloud Blob/Object Storage
    - Protection from bit rot
    - Distributed storage across FDs
- [ ] **Storage/Persistent Volume**
    - [NFS (slow) Provisioner](https://github.com/kubernetes-retired/external-storage/blob/master/nfs/pkg/server/server.go)
    - [HostPath/ LocalVolume provisioner](https://github.com/torchbox/k8s-hostpath-provisioner/blob/master/hostpath-provisioner.go)
    - GlusterFS
    - OpenEBS (MayaData)
- [ ] **Teamcity v/s Jenkins**
    - Non K8S deployment
    - Experimented with K8S plugins
    - Logs and test results
- [ ] **Spinnaker**
    - halyard
    - `hal` CLI
    - Minio integration
    - **Blue Green Deployments v/s Canary**
    - **Cons: Heavy JVM, try Wercker?**
- [ ] **kafka**
    - Event processing
    - Strimzi operator
    - **Settled on Single Node Deployment**
- [ ] **Flink**
    - Jobs as statefulsets
    - Operator
- [ ] **kubeapps v/s helm-cli v/s kustomize**
    - helm + Monocular
    - UI for kubeapps
    - Difficult config
    - NodePort vs LBR
    - StorageClass vs manual PV
    - Common Ingress
    - Hostname v/s route based routing
- [ ] **kubeless v/s Knative**
    - FaaS for kubernetes
    - Can listen to a topic and trigger on messages
    - Explore OpenFaaS
    - kubeless cli
    - **UC1: Send a Slack message when a file is uploaded to minio**
    - *UC2: Scale in/out pods with slack bots*
- [ ] **Config Language**
    - ksonnet (deprecated now)
    - kustomize
    - Explore other options
    - Stay with helm
- [ ] **HLF**
    - Use cello, not stable
    - Use checked-in script to create Peer and orderer
    - Namespaced and properly secured with secrets
    - nephos, not tested yet
    - Finally used HLF-Operator (quite stable now)
- [ ] Explore
    - Cluster API V3
    - eBPF based Observability tools
    - Ports [IDP]
    - Operator framework

# Operator for J2EE Container

- [ ] **J2EE Operator features**
    - REST based scale up/down
    - Adding JVM params
    - Rolling restarts
    - Seamless scaling/replica
- [ ] **Orchestration**
    - Self service K8S Job for provisioning
    - Jython and JMX
- [ ] **golang cli** for single click installation
    - cobra cli
    - Reusable pattern -> interfaces
    - **State in configmap** for restore from failure
    - One line scale up/down via cli.
    - No need to manually change CRD YAMLs
- [ ] **Terraform**
    - Single click install
    - Challenges like DB licenses
    - SDB Sharded DataBase deployment and scaling

![](https://i.imgur.com/slJmVXD.jpg)
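
The setup checklist notes that etcd has to run with an odd number of members. A minimal sketch of the majority-quorum arithmetic behind that rule is given here in Python. The function names `quorum` and `failure_tolerance` are illustrative only and are not taken from etcd, kubespray, or the playbook.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """Members that can be lost while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    # Print quorum size and tolerated failures for small cluster sizes.
    for n in range(1, 8):
        print(
            f"members={n} quorum={quorum(n)} "
            f"tolerated_failures={failure_tolerance(n)}"
        )
```

Going from 3 to 4 members raises the quorum from 2 to 3 but still tolerates only one failure, so an even-sized cluster adds a member without adding resilience.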