# Office Hours for April 2020
## West Coast Edition
- Jorge Castro, VMware (Host)
- Samudrala Vamshi, American Airlines
- Monica Rodriguez, VMware
- Tunde Olu-Isa, VMware
- Erik Osterman, Cloud Posse
- Jeremy Rickard, VMware
- Dave Strebel, Microsoft
- Bhargav Bhikkaji (BB), Independent Consultant
- Text: Rotating certs
- Asker: Panel answering questions from the previous session
Do you have any suggestions on how to set up Prometheus for cluster monitoring to keep it highly available? I.e., would you put it in the same cluster, a separate cluster, or an external VM?
Do you have any resources on how to set up a "master" Prometheus? I am not sure how to send from one Prometheus to another.
- Asker: Bre Gielissen
- Answer: Run Prometheus inside the cluster (watch for memory pressure)
  - Federate/send metrics to another Prometheus outside the cluster
  - Open source projects provide long-term storage backends
  - Install via the Helm chart
  - Telegraf + InfluxDB is also a good choice for Kubernetes monitoring
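One common pattern for the "master" Prometheus question above is federation: an external Prometheus scrapes the `/federate` endpoint of the in-cluster one. A minimal sketch of the external instance's scrape config, assuming the in-cluster Prometheus is reachable at `prometheus.monitoring.example.com:9090` (hypothetical address):

```yaml
# prometheus.yml on the external "master" Prometheus (sketch, not a full config)
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 30s
    honor_labels: true          # keep labels as set by the source Prometheus
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="kubernetes-nodes"}'   # pull only selected series
        - '{__name__=~"job:.*"}'       # e.g. pre-aggregated recording rules
    static_configs:
      - targets:
          - 'prometheus.monitoring.example.com:9090'  # in-cluster Prometheus (assumed address)
```

Federating only recording-rule aggregates (rather than all raw series) keeps the external instance's memory footprint manageable.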
- How can we use Kafka more efficiently with K8s?
We were using synchronous communication with Kafka from a microservice, but when it hit the FD limit, the microservice went into restart mode. We switched to asynchronous communication later, but what are other ways to use it more efficiently?
- Asker: WarDaddy
- Answer: https://github.com/danielqsj/kafka_exporter and https://github.com/lightbend/kafka-lag-exporter
What are folks doing for inventory / tracking if running many clusters with many nodes? Any slick inventory operators or ways to track clusters?
- Asker: jimangel
Is there a way to get notified if a Kubernetes pod goes out of memory without using kube-state-metrics? (Basically the OOMKilled termination reason)
- Asker: Saloni
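Without kube-state-metrics, one option is to poll pod status directly: an OOM-killed container shows up in `status.containerStatuses[].lastState.terminated` with reason `OOMKilled` (exit code 137). A minimal sketch of that check, operating on a Pod object as the API returns it (the notification/transport part is left out):

```python
def find_oom_killed(pod: dict) -> list:
    """Return names of containers in `pod` whose last termination was an OOM kill.

    `pod` is a Kubernetes Pod object as a dict (e.g. from `kubectl get pod -o json`).
    """
    oom = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        terminated = cs.get("lastState", {}).get("terminated")
        if terminated and terminated.get("reason") == "OOMKilled":
            oom.append(cs["name"])
    return oom

# Example status fragment, shaped like what the API server reports after an OOM kill:
sample_pod = {
    "status": {
        "containerStatuses": [
            {"name": "model", "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
            {"name": "sidecar", "lastState": {}},
        ]
    }
}
print(find_oom_killed(sample_pod))  # -> ['model']
```

Running this in a small loop (or a controller) and pushing matches to your alerting channel covers the use case without the kube-state-metrics dependency.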
- Text: What resources are people using to help developers and junior SREs get up the steep learning curve of K8s? We come from a mix of backgrounds using ECS, docker-compose, Heroku, Terraform, Puppet, and some bare metal. We have some passionate K8s advocates and some who don't know where to begin.
- Asker: Aaron Eaton
- Answer: Pluralsight is free until April 30
- Text: I am currently setting up a bash script to set up our dev/staging GKE clusters so we can shut them down at night and restart them in the mornings at will, but I'm wondering if there's any solution that would allow me to bundle up all the resources needed: the cluster, the DNS, the various public Helm charts (mongo, redis, etc.) with custom config, and our own applications (which also happen to be in custom Helm charts), rather than a custom bash script? I know AWS has something called CloudFormation that might have allowed me to do this, but we are using GKE. If many tools exist, which one do you recommend?
- Asker: Christian Roy
- Answer: Terraform, Flux, Argo CD
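With Terraform, the cluster and the Helm releases can live in one configuration instead of a bash script, so `terraform destroy` at night and `terraform apply` in the morning covers the shutdown/restart use case. A minimal sketch, with placeholder names and the auth wiring omitted:

```hcl
# Sketch only: names, locations, and charts are placeholders.
resource "google_container_cluster" "dev" {
  name               = "dev-cluster"
  location           = "us-central1-a"
  initial_node_count = 2
}

provider "helm" {
  kubernetes {
    host = "https://${google_container_cluster.dev.endpoint}"
    # cluster CA certificate and token configuration omitted for brevity
  }
}

resource "helm_release" "redis" {
  name       = "redis"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "redis"
  # values = [file("redis-values.yaml")]   # custom config, if needed
}
```

DNS records (e.g. `google_dns_record_set`) and your own charts can be added as further resources in the same configuration.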
- Text: Earlier today, I was troubleshooting an issue and noticed that in a cluster there are deployments using PVCs with a storage class local-storage and others with local-storage-local; however, no SC object shows up when I do "get sc" for that storage class. How did that happen? I also checked the node for related annotations/labels; there was one label localstorage=true, but nothing corresponding in the deployment.
- Asker: Walid
- Text: We are running a machine learning model in a Kubernetes pod. If the pod reaches 100% of its resources it is automatically restarted, but we need to delay the restart by about 10 minutes. How can we do that?
- Asker: devopsdymyr
- Answer: Set resource limits (https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/); maybe a preStop hook can help
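Note there is no knob for an exact restart delay; for repeatedly crashing containers the kubelet already applies an exponential backoff (CrashLoopBackOff, capped at five minutes). Sizing requests/limits so the model fits, per the linked QoS docs, avoids the restarts in the first place. A sketch with placeholder names and values:

```yaml
# Sketch: a Guaranteed-QoS pod (requests == limits); image and sizes are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: ml-model
spec:
  containers:
    - name: model
      image: example.com/ml-model:latest
      resources:
        requests:
          memory: "4Gi"
          cpu: "2"
        limits:
          memory: "4Gi"   # exceeding this triggers the OOM kill / restart
          cpu: "2"
```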
- Text: Do you have any suggestions on how to organize the YAML files for Kubernetes resources? Right now, our cluster was set up with a bunch of YAML files that were applied manually, which is very difficult to track. Would you use Helm charts heavily there, or something like a GitOps workflow?
- Asker: Bre Gielissen
- Answer: Helm charts, Kustomize
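A common GitOps-friendly layout with Kustomize keeps a shared base plus per-environment overlays in git, applied with `kubectl apply -k overlays/staging`. A sketch (directory names are illustrative):

```yaml
# overlays/staging/kustomization.yaml (sketch; paths and labels are illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base              # base/ holds the shared deployment/service YAML
patchesStrategicMerge:
  - replica-count.yaml      # staging-specific tweaks live in the overlay
commonLabels:
  env: staging
```

Pointing a GitOps tool such as Flux or Argo CD at this repo then replaces manual `kubectl apply` entirely, and the git history becomes the tracking you're missing.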
- Text: What's the purpose of setting a CPU limit if the container might be allowed to exceed it and won't be killed? Or is it just to override the LimitRange admission controller? How does K8s know whether it will allow it or not; does it depend on the allocatable resources of the node?
- Asker: Javier
- Answer: It's cgroup behavior: CPU is a compressible resource, so a container that exceeds its CPU limit is throttled rather than killed (unlike memory limits, which invoke the OOM killer). Scheduling decisions are made against requests and the node's allocatable resources, not limits.
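Concretely, a CPU limit is translated into a CFS quota on the container's cgroup, while only the request is counted by the scheduler. For example:

```yaml
# Sketch: a 500m CPU limit becomes cfs_quota_us=50000 per cfs_period_us=100000 (the default period)
resources:
  requests:
    cpu: "250m"   # what the scheduler checks against node allocatable
  limits:
    cpu: "500m"   # enforced by CFS throttling; never kills the container
```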
- Text: What's the best persistent storage for on-premise workloads? I have tried rook-ceph and Longhorn. What are you guys using?
- Asker: Jawiz
- Answer: Rook, MinIO, Portworx, NFS, and investigating the vSphere (vSAN) CSI driver
- as a database geek, I do local volumes when I'm performance-constrained (and rely on DB replication), and when I'm not I do Ceph/Rook
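The local-volume approach mentioned above can be sketched as a local PersistentVolume pinned to a node; names, sizes, and paths below are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: db-local-pv
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1          # disk on the node itself
  nodeAffinity:                    # required for local volumes: pin to the node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["node-1"]
```

The trade-off is exactly as stated: the pod is tied to that node, so durability has to come from database-level replication rather than the storage layer.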
- Rotating certs: https://github.com/kubernetes/website/pull/19351
- General URLs for preparation
## EU Edition
- Jorge Castro, VMware (Host)
- Jeffrey Sica, Red Hat
- Pierre Humberdroz, Spectrm
- Puja Abbassi, Giant Swarm
- Chris Carty, Google
- Dave Strebel, Microsoft
- Mario Loria, StockX
- Povilas Versockas, UW
- Recently I've been playing with a 3-node RabbitMQ cluster, deployed as a StatefulSet with replicas=3. For some reason, one of the pods refused to join the cluster; this caused the readiness probe, liveness probe, and entrypoint to fail, so the pod refused to start. According to the RabbitMQ docs, the way to solve this is to issue a "reset" command, but how could I do that if the pod refused to run? I also thought of changing the entrypoint or the readiness probes, but this would have impacted the other pods since they are replicas (removing the readiness probe could prevent them from starting in the right order, for example).
- I ended up resetting the RabbitMQ cluster state in a somewhat less orthodox way, but this is not something I could have done in a production environment. I wondered if I could have recovered it in some other (possibly less application-specific) way: for example, could I run a command in a pod that has problems starting, or override the readiness/liveness/entrypoint for a specific replica, or something else entirely?
- Asker: Simone Baracchi
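One generic trick for "run a command in a pod that refuses to start" is to temporarily override the container command so the pod stays up idle, exec in, run the recovery command (e.g. `rabbitmqctl force_boot`), then revert. Since the command and probes are set at the StatefulSet level, this does affect all replicas while the patch is in place. A sketch of the strategic-merge patch (the StatefulSet and container names are illustrative):

```yaml
# Apply with: kubectl patch statefulset rabbitmq --patch-file debug-patch.yaml
spec:
  template:
    spec:
      containers:
        - name: rabbitmq                    # must match the container name in the StatefulSet
          command: ["sleep", "infinity"]    # keep the container alive without starting RabbitMQ
          livenessProbe: null               # strategic merge: null removes the probes while debugging
          readinessProbe: null
```

Reverting the patch (or reapplying the original manifest) restores normal startup once the state is repaired.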
- Asker: Jmorcar
- Answer: https://github.com/influxdata/telegraf/issues/6959
- I have an external Netty server (not in the cluster). How can I share the same WebSocket connection from my pods (clients) to this external server?
- I need multiple pods so that messages sent by the service can be distributed, but I don't want the service to have a separate connection for each pod.
- Asker: prashant rathore
- Asker: felixdpg
- Since Kubernetes v1.18 was released, I have encountered a lot of problems with MetalLB LoadBalancer external IPs. All services of type LoadBalancer don't get an address; they remain in the "pending" state.
I've checked the MetalLB logs and didn't find anything that appeared meaningful to the problem.
I'm working in a virtual environment deployed automatically, so I'm 100% sure that nothing changed in the configs; the only thing that changed is the Kubernetes version. I've tried to deploy the cluster from scratch many times, but the result was the same.
I tried a manual Kubernetes installation too, but the result is the same.
I've used this configuration for 3 months without any problem, so I know that before Kubernetes 1.18 everything worked as expected.
Do you have any ideas? Have you experienced a similar issue?
- Asker: Emanuel
- Asker: mingmin
- I am getting the following errors in the API server logs; does anyone have any idea what these errors mean and how to fix them?
Error from apiserver logs:
`I0415 12:47:08.913951 1 log.go:172] http: TLS handshake error from 172.31.111.240:41740: remote error: tls: bad certificate`
- Asker: Ravi
- Why can't the kubeadm certificates be valid for ten or one hundred years? It's too painful. When we deliver Kubernetes to a customer's production environment, we need to manually compile the kubeadm source code or manually create a large number of certificates. Couldn't you give us an optional parameter or variable to flexibly configure this?
- Asker: zhang
- I have a scenario where I sometimes see that endpoints do not get created even when the pod is running, the service is there, and the labels match. I did see one blog post (https://engineering.dollarshaveclub.com/kubernetes-fixing-delayed-service-endpoint-updates-fd4d0a31852c) but it didn't help; I finally ended up writing my own Python script to handle this scenario, where when you delete and recreate the service, the endpoints immediately get created.
But ideally this should not happen; any thoughts on this?
- Asker: sammy
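The control loop such a workaround script needs is small: for each service, check whether some running pod's labels match the service selector, and flag services that match a pod but have no endpoint addresses. A minimal sketch of the matching logic with the API plumbing left out (the `"endpoints"` key below stands in for the looked-up Endpoints addresses and is an assumed shape, not a real API field):

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """True if every key/value in the service selector is present in the pod's labels.

    An empty selector matches nothing here, since selector-less services
    manage their endpoints manually.
    """
    return bool(selector) and all(pod_labels.get(k) == v for k, v in selector.items())

def services_missing_endpoints(services: list, pods: list) -> list:
    """Return names of services whose selector matches a Running pod
    but which have no endpoint addresses yet.

    `services` and `pods` are plain dicts shaped like Kubernetes API objects;
    each service dict carries a pre-joined "endpoints" list (assumed shape).
    """
    missing = []
    for svc in services:
        selector = svc["spec"].get("selector", {})
        has_match = any(
            p["status"].get("phase") == "Running"
            and selector_matches(selector, p["metadata"].get("labels", {}))
            for p in pods
        )
        if has_match and not svc.get("endpoints"):
            missing.append(svc["metadata"]["name"])
    return missing
```

Anything this flags is a candidate for the delete-and-recreate workaround, though as noted, the endpoints controller should normally make this unnecessary.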
- Just wondering why Kubernetes is using both make and Bazel.
I made a simple PR for error handling and had several CI stages fail because of Bazel.
- Asker: Rodrigo Villablanca