# Kubernetes Platform Review and Learnings
## How do you set up Kubernetes on on-prem VMs
- [ ] **Prerequisites for VMs before running the playbook**
- Corp proxy settings (bash and docker)
- VM-to-VM connectivity
- ssh-add-key
- bridge config (different IP address CIDR for different VMs)
- netfilter modprobe
- clean up iptables and systemd services
- [ ] **Cleanups from older installations**
- Special handling needed for etcd
- Networking (iptables and IP forwarding) enabled
- Disable swap (needed for kubeadm)
- Clean up Kubernetes config
- Clean `/var/lib/etcd`
- Clean `/etc/etcd`
- check ports are free
- firewall
- [ ] **Running ansible playbook**
- Re-entrant, or not really?
- etcd member count has to be an odd number
- Adding and removing etcd members doesn't work.
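
The odd-member rule above follows from etcd's majority quorum. A minimal sketch of the arithmetic; the function names are illustrative and not part of any etcd tooling:

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"members={n} quorum={quorum(n)} tolerates={failure_tolerance(n)}")
```

Going from 3 to 4 members raises the quorum from 2 to 3 without tolerating any additional failure, which is why even cluster sizes add risk but no resilience.
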
- [ ] **Validate cluster**
- DNS issues
- Cross-machine networking
- Learn the journey of a packet
- flannel v/s calico - NetworkPolicies
- kubectl krew doctor
- kubectl df-pv
- [ ] **etcd risks**
- kernel:unregister_netdevice: waiting for eth0 to become free
- Dynamic IP of Nodes
- kubespray unstable
- proxy config (Squid) breaks etcd
- network partitions
- [ ] **Add more nodes**, or **Add more masters** to cluster
- Adding a node is simpler and works
- Adding a master will work
- Only if the ordering of master nodes is the same
- Avoid adding etcd nodes; that may not work
- [ ] **Minikube** for Dev
- Later, `kind` also joined the party as another option here
- Rancher Labs' `rke` and `k3s`
- Difficult to support all the Platforms/OS
- [ ] **Multi-tenant cluster for developers**: add namespaced users (key+cert) to cluster
- playbook role addUser
- Tiller per namespace?
- Give access to a particular namespace
- Set Limit and Quota for namespace (CPU, Mem)
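
The per-namespace quota step above can be sketched as a small manifest generator. This is a minimal illustration, not the playbook's actual role: the namespace name `team-a` and the CPU/memory values are assumptions, and the same cap is applied to requests and limits for simplicity.

```python
import json


def namespace_quota(namespace: str, cpu: str, memory: str) -> dict:
    """Build a ResourceQuota manifest capping total CPU/memory in a namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "limits.cpu": cpu,
                "limits.memory": memory,
            }
        },
    }


if __name__ == "__main__":
    # "team-a", "4" and "8Gi" are hypothetical values for illustration.
    print(json.dumps(namespace_quota("team-a", "4", "8Gi"), indent=2))
```

Since `kubectl apply -f -` accepts JSON as well as YAML, the printed manifest can be piped straight into it.
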
## Necessary cluster add-ons
- [ ] **Cert Manager**
- Easy for outside/external network
- Corporate network challenges
- [ ] **Ingress (Just to Save LBR cost?)**
- URI route-based routing is difficult
- Inbuilt solution http server?
- Apps need to support URL redirects
- Host-based routing needs changes in `/etc/hosts`
- Need to have a DNS Server for best outcomes
- Subdomains are easy.
- Can be used for Authn and Authz using `oauth` and/or `openid-connect`
- [ ] **ELK**
- ES Operator is hard to manage
- Split brains
- Index for each day? or for each svc?
- Kernel and Systemd, etcd, kubelet
- Logstash (JVM) v/s Fluentd (Ruby with C extensions)
- tune log send time accordingly
- Fluentd is evolving, vs Beats
- TLS or no TLS for fluentd/filebeat
- Custom Image for Logstash (systemd + kube + ?)
- Ended up with a 2 TB single-node ES, self-managed.
- [ ] **Prometheus Operator** [TDD]
- Statefulset v/s operator
- Benefits of using operator
- CRDs and Operator
- AlertManager
- node exporter
- System Monitoring `USE`
- App Monitoring `RED`
- Slack Alerts, Webhooks
- Runbooks. **PromQL**
- `sum by(instance, job) (rate(rest_client_requests_total{code!~"2.."}[5m])) * 100 / sum by(instance, job) (rate(rest_client_requests_total[5m])) > 1`
- `absent(probe_http_status_code{instance="<instance-name>",job="blackbox"} == 200)`
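
The first PromQL expression above fires when non-2xx responses exceed 1% of all requests for an `(instance, job)` pair. A minimal sketch of the same arithmetic on plain numbers, assuming the per-code request rates have already been computed; the instance and job names in the sample data are hypothetical:

```python
import re
from collections import defaultdict


def error_percentages(rates):
    """rates: iterable of (instance, job, http_code, requests_per_second)."""
    errors = defaultdict(float)
    totals = defaultdict(float)
    for instance, job, code, rps in rates:
        key = (instance, job)
        totals[key] += rps
        # Mirrors code!~"2..": anything that is not exactly 2xx counts as an error.
        if not re.fullmatch(r"2..", code):
            errors[key] += rps
    return {k: errors[k] * 100 / totals[k] for k in totals if totals[k] > 0}


def firing(rates, threshold_percent=1.0):
    """Return the (instance, job) pairs whose error percentage exceeds the threshold."""
    return {k: v for k, v in error_percentages(rates).items() if v > threshold_percent}


if __name__ == "__main__":
    sample = [
        ("node-1", "kubelet", "200", 99.0),
        ("node-1", "kubelet", "500", 1.0),
        ("node-2", "kubelet", "200", 95.0),
        ("node-2", "kubelet", "503", 5.0),
    ]
    print(firing(sample))
```

With the sample data, `node-1` sits exactly at 1% and does not fire because the comparison is strict, while `node-2` at 5% does.
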
- [ ] **More Observability**
- Spring Actuator for prometheus
- Grafana Dashboards
- Cluster Level
- Node Level
- Per Namespace
- Resource utilizing services/pods
- JVM Metrics
- Control Plane status
- Cluster/etcd health
- **Blackbox exporter - Service view**
- Add/Modify Alerts/Rules in CRD model
- Backup/Snapshots of TSDB
- Special cases for etcd
- [ ] **Custom Metric Server for HPA**
- Can scale on custom metrics
- HPA or VPA?
- K8S Node Autoscaler
- JVM metric or RPS or Per Tenant based
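
Scaling on a custom metric such as RPS follows the replica formula documented for the Horizontal Pod Autoscaler: `ceil(currentReplicas * currentValue / targetValue)`. A minimal sketch of that calculation; the min/max bounds and the numbers in the example are assumptions, and the 0.1 tolerance mirrors the controller's documented default:

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count the HPA formula would ask for, clamped to [min, max]."""
    ratio = current_value / target_value
    # Inside the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Hypothetical: 4 pods each seeing 150 RPS against a 100 RPS target.
    print(desired_replicas(4, 150, 100))
```

The same function applies whether the metric is RPS, a JVM metric, or a per-tenant value, as long as it is expressed as an average per pod.
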
- [ ] **Jaeger Operator v/s statefulset**
- Persistence, retention
- Apps should add a telemetry UID to an HTTP header
- Helped in finding slow DB SQL queries
- Helped in finding inter-app communication issues
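
Carrying a telemetry ID in an HTTP header, as noted above, comes down to reusing the caller's ID on every outgoing call. A minimal sketch of that idea; the header name `X-Request-ID` is a hypothetical choice, and real tracing clients manage their own propagation headers:

```python
import uuid
from typing import Optional

TRACE_HEADER = "X-Request-ID"  # hypothetical header name, chosen for illustration


def outgoing_headers(incoming: dict, extra: Optional[dict] = None) -> dict:
    """Reuse the caller's ID if present, otherwise start a new one."""
    # HTTP header names are case-insensitive, so match them that way.
    lowered = {name.lower(): value for name, value in incoming.items()}
    trace_id = lowered.get(TRACE_HEADER.lower()) or uuid.uuid4().hex
    headers = dict(extra or {})
    headers[TRACE_HEADER] = trace_id
    return headers


if __name__ == "__main__":
    print(outgoing_headers({"x-request-id": "abc123"}, {"Accept": "application/json"}))
```
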
- [ ] **Istio**
- Kiali Dashboard
- Tracing
- Overview of services
- Monitoring
- Circuit Breakers
- Mesh svc-to-svc communication
- **Request flow Observability/Animations/RPS**
- Service DAG -> inbound and outbound traffic
- Correlating Resource usage and Request.
- Filters in view/dashboards
- **Response time for the 95th percentile**
- mTLS - the concept.
- Mesh v/s API Gateway Linkerd/Kong
- Ingress gateways
- VirtualServices v/s Ingress
- **Per namespace basis injection**
- Jaeger Telemetry
- Prometheus and Grafana
- [ ] **Minio**
- Alternative to S3/Cloud Blob/Object Storage
- Protection from bit rot
- Distributed Storage across FDs
- [ ] **Storage/Persistent Volume**
- [NFS (slow) Provisioner](https://github.com/kubernetes-retired/external-storage/blob/master/nfs/pkg/server/server.go)
- [HostPath/ LocalVolume provisioner](https://github.com/torchbox/k8s-hostpath-provisioner/blob/master/hostpath-provisioner.go)
- GlusterFS
- OpenEBS (MayaData)
- [ ] **TeamCity --v/s-- Jenkins**
- Non K8S Deployment
- Experimented with K8S Plugins
- Logs and test results
- [ ] **Spinnaker**
- halyard - `hal` cli
- Minio Integration
- **Blue Green Deployments v/s Canary**
- **Cons: Heavy JVM, try Wercker?**
- [ ] **Kafka**
- Event processing
- Strimzi operator
- **Settled on Single Node Deployment**
- [ ] **Flink**
- Jobs as statefulsets
- Operator
- [ ] **kubeapps --v/s-- helm-cli --v/s-- kustomize**
- helm + Monocular
- UI for kubeapps
- difficult config
- NodePort vs LBR
- StorageClass vs Manual PV
- common Ingress
- Hostname v/s Route based Routing
- [ ] **kubeless --v/s-- Knative**
- FaaS for Kubernetes
- Can listen to a topic and trigger on messages
- Explore OpenFaaS
- kubeless cli
- **UC1: Send a Slack message when a file is uploaded to Minio**
- **UC2: Scale pods in/out with Slack bots**
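
UC1 above can be sketched as a Python function in the `handler(event, context)` shape used by kubeless. This is an illustration under assumptions, not the function that was deployed: it assumes the bucket notification arrives in `event["data"]` in the S3-compatible `Records` format, and that a Slack incoming-webhook URL is supplied through a hypothetical `SLACK_WEBHOOK_URL` environment variable.

```python
import json
import os
import urllib.parse
import urllib.request


def slack_text(notification: dict) -> str:
    """Turn an S3-style bucket notification into one line per uploaded object."""
    lines = []
    for record in notification.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3-style events.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        lines.append(f"{record.get('eventName', 'event')}: {bucket}/{key}")
    return "\n".join(lines) or "bucket notification without records"


def handler(event, context):
    """Function entry point: build the message and post it if a webhook is set."""
    data = event["data"]
    if isinstance(data, (str, bytes)):
        data = json.loads(data)
    text = slack_text(data)
    webhook = os.environ.get("SLACK_WEBHOOK_URL")  # assumed variable name
    if webhook:
        request = urllib.request.Request(
            webhook,
            data=json.dumps({"text": text}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request, timeout=5).close()
    return text
```

Keeping the message formatting in `slack_text` separate from the HTTP call makes the function testable without a Slack endpoint.
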
- [ ] **Config Language**
- ksonnet (deprecated now)
- kustomize
- Explore other options
- Stay with helm
- [ ] **HLF**
- Tried Cello: not stable
- Use a checked-in script to create peers and orderers
- namespaced and properly secured with secrets
- nephos - not tested yet
- Finally used HLF-Operator (quite stable now)
- [ ] Explore
- Cluster API V3
- eBPF based Observability tools
- Ports [IDP]
- Operator framework
# Operator for J2EE Container
- [ ] **J2EE Operator features**
- REST based scale up/down
- Adding JVM params
- Rolling restarts
- Seamless scaling/replica
- [ ] **Orchestration**
- Self service K8S Job for provisioning
- Jython and JMX
- [ ] **golang cli** for single click installation
- cobra cli
- reusable pattern -> interfaces
- **State in configmap** for restore from failure
- one line scale up/down via cli.
- No need to manually change CRD YAMLs
- [ ] **Terraform**
- Single click Install
- Challenges like DB Licenses
- Sharded database (SDB) deployment and scaling
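
The "state in configmap" idea from the CLI notes above can be sketched as a resumable step runner. The original CLI is written in Go; this Python version is only an illustration, with a plain dict standing in for the ConfigMap's `data` field and a hypothetical key name `install-state`:

```python
import json

STATE_KEY = "install-state"  # hypothetical ConfigMap key


def run_steps(steps, configmap_data):
    """Run (name, func) steps in order, skipping ones already recorded as done."""
    done = json.loads(configmap_data.get(STATE_KEY, "[]"))
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run
        func()
        done.append(name)
        # ConfigMap values are strings, so progress is persisted as JSON
        # after every successful step.
        configmap_data[STATE_KEY] = json.dumps(done)
    return done


if __name__ == "__main__":
    data = {}
    steps = [("create-namespace", lambda: None), ("install-operator", lambda: None)]
    print(run_steps(steps, data), data)
```

If a step raises, everything completed before it is already recorded, so a rerun picks up at the failed step instead of starting over.
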
