Raffle:
URLs
Person: marcoceppi
Question:
Okay. I have a question about storage that I'm hoping the panelists have experienced / have sage advice on. We currently run Kubernetes on bare metal in edge locations. Due to physical constraints, these bare-metal hosts (6-10 nodes) are unconventional, small, and unreliable. Each unit in a cluster can also experience independent power issues; because of that, it can be up to a week before a powered-off node is turned back on. So far, because of how we run Kube/etcd, we haven't really lost a single site - things fail over appropriately - but we're losing workloads due to storage lockups.
We run Rook/Ceph and have a single StatefulSet and a single Deployment that use storage (more apps, but they're all ephemeral/stateless). We're hitting the issue where, if a node becomes NotReady and gets the NoExecute node taint (when the node controller can't communicate with the kubelet - 90% of the time due to the node losing power), storage never "unlocks" from that node, and when the pod controller eventually reschedules the StatefulSet pod it's stuck, because Ceph RBD is RWO and the storage - as far as Kubernetes is concerned - is still attached to the offline node.
To remedy this, we wrote a script that removes a node from the cluster once it has been unavailable in k8s for 60s. That reschedules the workload more quickly, but it still doesn't release the storage in a timely fashion, so we're stuck in the same situation - only this time the remedy is to force delete the StatefulSet pod a few times, and 75% of the time it'll get its storage back.
All of what we’ve done are hacks so far - we’re grappling with hardware issues that we’ll replace but not for another 6mo but more importantly the way storage, CSI, and Kubernetes all intersect. To us it seems the only options are RWX storage which outside of NFS don’t seem to have many performant options or to continue developing this node-removal script to also repeatedly force delete objects until a desired state is met.
Answer:
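One possible shape for the force-release step described above (a sketch, assuming the volume is CSI-provisioned; all names are placeholders):

```sh
# Remove the dead node object so workloads reschedule.
kubectl delete node <dead-node>

# Force delete the StatefulSet pod stuck Terminating on the dead node.
kubectl delete pod <stuck-pod> --grace-period=0 --force

# The RWO volume often stays "attached" because a stale VolumeAttachment
# still points at the dead node; deleting it lets the attach/detach
# controller re-attach the volume on the new node.
kubectl get volumeattachments
kubectl delete volumeattachment <stale-attachment>
```

This is essentially an automated version of the hack the question describes, not a fix for the underlying CSI/RWO behavior.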
Person: Ari
Question: One thing that everybody uses Kubernetes for is to make sure that pods have enough CPU and memory on nodes through resource requests and limits, and the Kubernetes scheduler does a good job of making sure that nodes aren't overscheduled in terms of CPU and memory.
My question is - what about disk I/O and networking? The underlying node storage and networking will have a real limit to IOPS etc. Are there plans to get Kubernetes to allow pods to define storage and networking requirements, and not over-schedule storage and networking requirements on nodes which cannot handle their requests?
Answer:
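Nothing built in schedules against IOPS or network bandwidth today. One partial workaround (a sketch - `example.com/iops` is a made-up resource name) is to advertise a node-level extended resource and have pods request it; the scheduler then does the bookkeeping, though nothing actually throttles the I/O:

```yaml
# Pod spec fragment: request a slice of a hypothetical per-node IOPS budget.
# The node must first advertise the capacity, e.g. by PATCHing
# /status/capacity/example.com~1iops on the Node object via the API.
resources:
  requests:
    example.com/iops: "100"
  limits:
    example.com/iops: "100"   # extended resources require requests == limits
```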
Person: Dimitrije M
Question: Good morning! I have a question about cluster utilization. I am setting resource requests and limits for our services and was curious how to approach these: should I set requests based on peak load, or should I let limits handle peaks? My current request setup leaves my cluster at <20% utilization, and I was wondering how I should approach increasing that.
Answer:
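A common rule of thumb (a sketch, not a one-size-fits-all answer): set requests near typical steady-state usage so the scheduler packs nodes realistically, and set limits near observed peaks so bursts can use idle headroom:

```yaml
resources:
  requests:
    cpu: 250m        # roughly the typical usage observed under normal load
    memory: 256Mi
  limits:
    cpu: "1"         # observed peak; exceeding a CPU limit throttles the container
    memory: 512Mi    # exceeding a memory limit OOM-kills the container
```

Requesting for peak is what tends to strand capacity at <20% utilization, since the scheduler reserves the full request whether or not it's used.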
Person: Joel Davis
Question: Is there anything in RBAC that allows you to select against certain namespaces? For example, if I want to give someone the ability to create new namespaces (in which they get * verbs on * resources) but not to interact with a given set of namespaces, is that possible through RBAC?
Answer: Potential custom webhook or operator.
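RBAC is purely additive - there are no deny rules - so "everything except these namespaces" can't be expressed directly; you can only enumerate the allowed namespaces, e.g. with a RoleBinding per namespace (names below are placeholders), or automate that with the webhook/operator mentioned above:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-admin
  namespace: team-a            # one explicitly allowed namespace
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: admin                  # built-in ClusterRole with broad verbs
  apiGroup: rbac.authorization.k8s.io
```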
Person: cten
Question: How are you monitoring and managing the cost to run your clusters? (currently running in AWS)
Answer: https://kubecost.com/ is probably the best solution out there right now
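For reference, a sketch of the install (per the Kubecost docs at the time; check kubecost.com for current instructions):

```sh
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer --namespace kubecost --create-namespace
```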
Person: Joel Speed
Question: Something that's come up in my work recently: when hosting metrics endpoints in an application, do people/should people require authn/authz to access them? I.e., should something like kube-rbac-proxy be put in front, or should the functionality be implemented in the application itself?
Answer:
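A sketch of the sidecar pattern the question mentions (image tag and ports are illustrative): kube-rbac-proxy terminates TLS, authenticates the caller via TokenReview, authorizes via SubjectAccessReview, and only then forwards to the app's metrics port, which listens on localhost only:

```yaml
# Extra container in the pod spec:
- name: kube-rbac-proxy
  image: quay.io/brancz/kube-rbac-proxy:v0.8.0
  args:
  - --secure-listen-address=0.0.0.0:8443
  - --upstream=http://127.0.0.1:8080/   # app serves /metrics on localhost:8080
  ports:
  - name: https-metrics
    containerPort: 8443
```

Scrapers (e.g. Prometheus) then need a ServiceAccount token with RBAC permission for whatever the proxy is configured to check.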
Person: Baluwii
Question: Is there any recommended storage backup solution?
Answer:
Person: Andrei
Question: Hi, a few volume questions. Generic: on the "Volumes" doc page we can read that "Kubernetes supports several types of Volumes" - what does "supports" actually mean here?
Particular: the glusterfs client seems to come to the k8s node with hyperkube, and hyperkube seems to be headed for deprecation - how will k8s continue to support these kinds of volumes?
It would be great to hear how to bring the latest (freshest) glusterfs client onto the node and unblock developers, so they can use the latest glusterfs (deployed somehow, somewhere else) from our k8s cluster as a client.
Resource question: would you recommend limiting cpu/mem for kube-apiserver and etcd running as static pods?
Answer:
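On the static pod sub-question, a sketch (kubeadm layout; values are illustrative, not recommendations - an undersized limit can destabilize the control plane): static pod manifests live on the node, e.g. /etc/kubernetes/manifests/kube-apiserver.yaml, and take the normal resources block:

```yaml
spec:
  containers:
  - name: kube-apiserver
    resources:
      requests:
        cpu: 250m        # kubeadm's default request
      limits:
        memory: 2Gi      # hypothetical cap; many setups omit a CPU limit to avoid throttling
```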
Person: knabben
Question: Hi, is there an official/recommended Grafana template for monitoring the entire Kubernetes ecosystem (metrics on apiserver/kubelet/scheduler)? What other tools are people using to get the big picture?
Answer:
Person: Mihir Shah
Question: How do you set custom metrics for the HPA?
Answer:
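A sketch of the usual setup: custom metrics need a metrics adapter (commonly prometheus-adapter) serving the custom.metrics.k8s.io API; the HPA then targets the metric. The metric and workload names below are made up:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # exposed through the metrics adapter
      target:
        type: AverageValue
        averageValue: "100"
```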
Person: athavan kanapuli ****
Question: Hi, I have a question. Is there a way to specify that certain pods in a deployment get killed when the horizontal autoscaler scales the deployment down?
Answer:
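Not natively at the time of this question; in newer Kubernetes (1.21+, behind the PodDeletionCost feature gate) the pod-deletion-cost annotation lets you bias which pods the ReplicaSet controller removes first:

```yaml
metadata:
  annotations:
    # Pods with lower deletion cost are removed first on scale-down.
    controller.kubernetes.io/pod-deletion-cost: "-100"
```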
Person: Jojo Pad
Question: Is there a tool for testing k8s service latency?
Answer:
Person: Karoline Pauls
Question: Given a currently updating deployment - rolling in a new ReplicaSet and rolling out the previous one - which ReplicaSet are pods taken away from if a user or a scaler decreases the deployment's replica count?
Answer:
Person: Vishnu Prasad ***
Question: Is there a project or tool that would help us configure how and when to autoscale nodes up and down - like in EKS node groups, especially when to scale nodes down? Mainly because certain loads can't always use metrics like cpu/mem for scaling up and down.
Answer:
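One relevant pointer: the Cluster Autoscaler (kubernetes/autoscaler) scales node groups on pending pods and node utilization rather than raw cpu/mem metrics, and its scale-down behavior is tunable. A sketch of the relevant flags:

```sh
# Illustrative Cluster Autoscaler flags (see the kubernetes/autoscaler docs):
--scale-down-utilization-threshold=0.5   # a node is removable below 50% of requested capacity
--scale-down-unneeded-time=10m           # how long a node must be unneeded before removal
```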
Person: Long
Question: Is the stable/metrics-server chart supposed to work out of the box, or is it absolutely required to add the --kubelet-insecure-tls flag?
Answer:
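For context, a sketch: --kubelet-insecure-tls skips verification of the kubelets' serving certificates, so it's only "required" when kubelets present certs the metrics-server can't verify (self-signed, missing IP SANs, etc.). A common chart override looks like:

```yaml
# Hypothetical values.yaml for the stable/metrics-server chart:
args:
- --kubelet-insecure-tls                          # skip kubelet cert verification (lab/dev use)
- --kubelet-preferred-address-types=InternalIP    # often needed alongside it
```

The cleaner fix is to give kubelets verifiable serving certs (e.g. kubelet serverTLSBootstrap) rather than disabling verification.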
Welcome everyone to today's Kubernetes Office Hours, where we answer your user questions live on the air with our esteemed panel of experts. You can find us in [#office-hours] on Slack; check the channel topic for the URL with more information.
The hack.md notes document will have a list of who has asked questions; roll a die to see who won the shirts. On occasion, if someone from the audience has been helpful, feel free to give them a shirt as well - we want to reward people for helping others. Note: multi-sided dice not included.
(Note, the companies will change over time depending on the hosts)
And lastly, feel free to hang out in [#office-hours] afterwards, if the other channels are too busy for you and you’re looking for a friendly home, you’re more than welcome to pull up a chair and hang out.