Office Hours for April 2020

West Coast Edition

Panelists

Jorge Castro, VMware (Host)
Samudrala Vamshi, American Airlines
Monica Rodriguez, VMware
Tunde Olu-Isa, VMware
Erik Osterman, Cloud Posse
Jeremy Rickard, VMware
Dave Strebel, Microsoft
Bhargav Bhikkaji (BB), Independent Consultant

Question:
- Text: Rotating certs
- Asker: Panel answering questions from the previous session
- Answer: https://github.com/kubernetes/website/pull/19351, Ansible playbooks
Question:
- Text: Do you have any suggestions on how to set up Prometheus for cluster monitoring to keep it highly available? I.e., would you put it in the same cluster, a separate cluster, or an external VM?
Do you have any resources on how to set up a "master" Prometheus? I am not sure how to send from one Prometheus to another.
- Asker: Bre Gielissen
- Answer:
- Prometheus inside the cluster (issues with memory clog); send to another one outside the cluster
- Open source projects for a storage backend
- Helm chart installation; Prometheus docs
- Telegraf + InfluxDB for Kubernetes monitoring is also a good choice
- Trickster: https://github.com/tricksterproxy/trickster
- https://prometheus.io/docs/prometheus/latest/federation/
- https://github.com/thanos-io/thanos
- https://www.instana.com/
- https://istio.io/docs/tasks/observability/kiali/
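
The federation docs linked in this answer cover one way to send from one Prometheus to another: a central server scrapes the `/federate` endpoint of each in-cluster server and selects series with `match[]` parameters. Below is a minimal sketch of building such a scrape URL. The host name `prometheus.cluster-a.example:9090` and both selectors are assumptions for illustration, not values from the session.

```python
from urllib.parse import urlencode


def federate_url(base_url, selectors):
    """Build a /federate URL that requests the series matching each selector."""
    # Every selector is sent as its own match[] query parameter.
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


# Hypothetical in-cluster Prometheus address and series selectors.
url = federate_url(
    "http://prometheus.cluster-a.example:9090",
    ['{job="kubernetes-nodes"}', '{__name__=~"job:.*"}'],
)
print(url)
```

In practice the central Prometheus would be given this path and these parameters in its own scrape configuration rather than a hand-built URL. The sketch only shows what the request looks like.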
Question:
- Text: How can we use Kafka more efficiently with K8s? We were using sync communication with Kafka from a microservice, but when it hits the FD (file descriptor) limit, the microservice goes into restart mode. We switched to async communication later, but what are other ways to use it more efficiently?
- Asker: WarDaddy
- Answer:
- https://github.com/danielqsj/kafka_exporter
- https://github.com/lightbend/kafka-lag-exporter
- https://stackoverflow.com/questions/33649192/how-do-i-set-ulimit-for-containers-in-kubernetes
Question:
- Text: What are folks doing for inventory / tracking if running many clusters with many nodes? Any slick inventory operators or ways to track clusters?
- Asker: jimangel
- Answer:
- https://infra.app/
- https://kubecost.com/
- https://github.com/kubectl-plus/kcf
- https://github.com/rchakode/kube-opex-analytics
- https://rancher.com/blog/2020/fleet-management-kubernetes/
- https://github.com/getsentry/sentry
- https://github.com/lensapp/lens
Question:
- Text: Hello. Is there a way to get notified if a Kubernetes pod runs out of memory without using kube-state-metrics? (Basically the OOMKilled terminated reason.)
- Asker: Saloni
- Answer: https://www.amazon.com/Kubernetes-Best-Practices-Blueprints-Applications-ebook/dp/B081J62KLW
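
Independent of kube-state-metrics, the OOMKilled reason is recorded in the pod's own status: under `containerStatuses[].lastState.terminated.reason` once the container has restarted, or under `state.terminated` before it does. Below is a minimal sketch of checking a pod object for it. The `pod` dictionary is a hand-written stand-in for the JSON the API returns (for example from `kubectl get pod -o json`), the container names are made up, and delivering the notification is left out.

```python
def oom_killed_containers(pod):
    """Return names of containers whose current or last termination was OOMKilled."""
    names = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        # A container that was killed and restarted reports it in lastState;
        # one that has not restarted yet reports it in state.
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                names.append(cs["name"])
                break
    return names


# Hand-written example status (assumption, not real cluster output).
pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "model",
                "state": {"running": {}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
            {"name": "sidecar", "state": {"running": {}}, "lastState": {}},
        ]
    }
}
print(oom_killed_containers(pod))
```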
Question:
- Text: What resources are people using to help developers and junior SREs get up the steep learning curve of K8s? We come from a mix of backgrounds using ECS, docker-compose, Heroku, Terraform, Puppet, and some bare metal. We have some passionate K8s advocates and some who don't know where to begin.
- Asker: Aaron Eaton
- Answer:
- https://www.katacoda.com/learn
- https://kind.sigs.k8s.io/docs/user/quick-start/
- https://kubernetes.io/docs/tasks/tools/install-minikube/
- https://github.com/kelseyhightower/kubernetes-the-hard-way
- https://azure.microsoft.com/en-in/resources/kubernetes-up-and-running/
- https://kubernetes.io/
- https://kubernetes.io/training/
- https://training.linuxfoundation.org/training/kubernetes-for-developers/#outline
- https://www.edx.org/course/introduction-to-kubernetes
- https://www.edx.org/course/introduction-to-linux
- https://killer.sh/course/preview/e84d0e31-4fff-4c42-8afd-be1bdbc0d994
- https://github.com/dgkanatsios/CKAD-exercises/blob/master/a.core_concepts.md
- https://kubernetes.io/docs/reference/kubectl/cheatsheet/
- https://learnk8s.io/troubleshooting-deployments
- https://learnk8s.io/academy
- https://kodekloud.com/
- https://kubernauts.de/en/training/kubernetes-training-course.html
- https://docs.google.com/presentation/d/13EQKZSQDounPC1I6EC4PmqaRmdCrpT3qswQJz9KRCyE/edit#slide=id.g33599df588_13_58
- https://collabnix.github.io/kubelabs/
- https://www.katacoda.com/courses/kubernetes/playground
- Pluralsight is free till 30th April
- https://eksworkshop.com/
Question:
- Text: I am currently writing a bash script to set up our dev/staging GKE clusters so we can shut them down at night and restart them in the morning at will, but I'm wondering if there's an existing solution that would allow me to bundle up all the resources needed (the cluster, the DNS, the various public Helm charts such as mongo and redis with custom config, and our own applications, which also happen to be in custom Helm charts) rather than using a custom bash script? I know AWS has something called CloudFormation that might have allowed me to do this, but we are using GKE. If many tools exist, which one do you recommend?
- Asker: Christian Roy
- Answer: Terraform, Flux, Argo CD
- https://github.com/cloudposse/terraform-aws-eks-cluster
- https://eksctl.io/
Question:
- Text: Earlier today, I was troubleshooting an issue and noticed that in a cluster there are deployments using PVCs with a storage class local-storage and others with local-storage-local; however, there is SC object when I do "get sc" that presents the storage class. How did that happen? I also checked the node for related annotations/labels; there was one label, localstorage=true, but nothing corresponding in the deployment.
- Asker: Walid
Question:
- Text: We are running a machine learning model in a Kubernetes pod. When the pod reaches 100% of its resources it is restarted automatically, but we need to delay the restart by 10 minutes. How can we do that?
- Asker: devopsdymyr
- Answer: Set resource limits (https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/); a pod preStop hook may also help.
Question:
- Text: Do you have any suggestions on how to organize the YAML files for Kubernetes resources? Right now, our cluster is set up with a bunch of YAML files that were applied manually, which is very difficult to track. Would you use Helm charts heavily there, or something like a GitOps workflow?
- Asker: Bre Gielissen
- Answer: Helm charts, Kustomize
- https://cloudposse.com/change-management/
- https://github.com/roboll/helmfile
- https://www.openpolicyagent.org/docs/v0.12.2/kubernetes-admission-control/
Question:
- Text: What’s the purpose of setting a CPU limit if the container might be allowed to exceed it and won’t be killed, or is it just to override the LimitRange admission controller? How does K8s know whether it will allow it or not? Does it depend on the allocatable resources of the node?
- Asker: Javier
- Answer: It's a cgroup behavior. https://github.com/kubernetes-sigs/descheduler
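
On the cgroup behavior mentioned in the answer: a CPU limit is enforced by throttling rather than killing. On cgroup v1 the limit becomes a CFS quota (`cpu.cfs_quota_us`) per scheduling period (`cpu.cfs_period_us`, 100 ms by default), and a container that uses up its quota is paused until the next period. Below is a minimal sketch of that arithmetic, assuming the default period. `parse_cpu` is a hypothetical helper that handles only plain and millicore (`m`) quantities.

```python
def parse_cpu(quantity):
    """Convert a CPU quantity such as "500m" or "1.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def cfs_quota_us(cpu_limit, period_us=100_000):
    """CPU time in microseconds the container may use in each period."""
    return parse_cpu(cpu_limit) * period_us // 1000


# Example limits (assumptions for illustration).
for limit in ("250m", "1", "1.5"):
    print(limit, "->", cfs_quota_us(limit), "us per 100000 us period")
```

A limit of `250m` therefore allows 25 ms of CPU time in every 100 ms window, regardless of how much spare CPU the node has.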
Question:
- Text: What’s the best persistent storage for on-premise workloads? I have tried rook-ceph and Longhorn. What are you guys using?
- Asker: Jawiz
- Answer: Rook, MinIO, Portworx, NFS, and investigating the vSphere (vSAN) CSI
- As a database geek, I use local volumes when I'm performance-constrained (and rely on DB replication), and when I'm not I use Ceph/Rook

EU Edition

Panelists

Jorge Castro, VMware (Host)
Jeffrey Sica, Red Hat
Pierre Humberdroz, Spectrm
Puja Abbassi, Giant Swarm
Chris Carty, Google
Dave Strebel, Microsoft
Mario Loria, StockX
Povilas Versockas, UW

Links

https://github.com/tekliner/rabbitmq-operator
https://github.com/indeedeng/rabbitmq-operator
https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/
https://github.com/influxdata/telegraf/issues/6959#issuecomment-608318315
https://relnotes.k8s.io/?markdown=balancer&releaseVersions=1.18.0
https://stackoverflow.com/questions/61113732/how-to-reuse-k8s-validation-in-my-own-crd-controller
https://engineering.dollarshaveclub.com/kubernetes-fixing-delayed-service-endpoint-updates-fd4d0a31852c
https://github.com/kubernetes/kubernetes/blob/master/go.mod#L124
https://kubernetes.io/docs/tasks/tls/certificate-rotation/
https://github.com/kubernetes/community/tree/master/sig-cluster-lifecycle
https://github.com/kubernetes/kubernetes/issues/88553
https://github.com/lensapp/lens

Questions

Question:
- Recently I've been playing with a 3-node RabbitMQ cluster. The cluster is deployed as a StatefulSet with replicas=3. For some reason, one of the pods refused to join the cluster; this caused the readiness and liveness probes and the entrypoint to fail, so that pod refused to start. According to the RabbitMQ docs, the way to solve this is to issue a "reset" command, but how could I do that if the pod refused to run? I also thought of changing the entrypoint or the readiness probes, but this would have impacted the other pods since they are replicas (removing the readiness probe could prevent them from starting in the right order, for example).
- I ended up resetting the RabbitMQ cluster state in a somewhat-less-orthodox way, but this is not something I could have done in a production environment. I wondered if I could have recovered it in some other way (possibly less application-specific): for example, could I run a command in a pod that has problems starting, or override the readiness/liveness/entrypoint for a specific replica, or something else entirely?
- Asker: Simone Baracchi
Question:
- https://kubernetes.slack.com/archives/C6RFQ3T5H/p1585908749010200
- Asker: Jmorcar
- Answer: https://github.com/influxdata/telegraf/issues/6959
Question:
- I have an external Netty server (not in the cluster). How can I share the same WebSocket connection from my pods (clients) to this external server?
- I need multiple pods so that messages sent by the service can be distributed, but I don't want the service to have a separate connection for each pod.
- Asker: prashant rathore
Question:
- https://kubernetes.slack.com/archives/C6RFQ3T5H/p1585063644252200
- Asker: felixdpg
Question:
- Hello all. Since Kubernetes v1.18 was released I have encountered a lot of problems with MetalLB external IPs. Services of type LoadBalancer don't get an address; they remain in the "pending" state. I've checked the MetalLB logs and didn't find anything that appeared meaningful for the problem. I'm working in a virtual environment that is deployed automatically, so I'm 100% sure that nothing changed in the configs; the only thing that changed is the Kubernetes version. I've tried to deploy the cluster from scratch many times, but the result was the same. I tried a manual Kubernetes installation too, with the same result. I've used this configuration for 3 months without any problem, so I know that before 1.18 everything worked as expected. Do you have any idea? Have you experienced a similar issue?
- Asker: Emanuel
Question:
- https://stackoverflow.com/questions/61113732/how-to-reuse-k8s-validation-in-my-own-crd-controller
- Asker: mingmin
Question:
- Hi all, I am getting the following errors from the API server logs. Does anyone have any idea what these errors are and how to fix them? {code="500",component="apiserver",contentType="application/json",endpoint="https",instance="172.31.111.155:443",job="apiserver",namespace="default",resource="events",scope="namespace",service="kubernetes",verb="POST",version="v1"} Error from apiserver logs: I0415 12:47:08.913951 1 log.go:172] http: TLS handshake error from 172.31.111.240:41740: remote error: tls: bad certificate
- Asker: Ravi
Question:
- Why can't the kubeadm certificate be valid for ten or one hundred years? It's too painful. When we deliver Kubernetes to a customer's production environment, we need to manually compile the kubeadm source code or manually create a large number of certificates. Can't you give us an optional parameter or variable to flexibly configure this option?
- Asker: zhang
Question:
- I have a scenario where sometimes the endpoints do not get created even when the pod is running, the service exists, and the labels match. I did see one blog post as well, https://engineering.dollarshaveclub.com/kubernetes-fixing-delayed-service-endpoint-updates-fd4d0a31852c, but it didn't help. I finally ended up writing my own Python script to handle this scenario: when you delete and recreate the service, the endpoints immediately get created. But ideally this should not happen. Any thoughts on this?
- Asker: sammy
Question:
- Just wondering why Kubernetes is using both make and Bazel. I made a simple PR for error handling and had several CI stages fail because of Bazel.
- Asker: Rodrigo Villablanca
