# Lessons Learned from 3 years of Kubernetes Office Hours

Date: 23 August 2020

Authors: Jorge Castro (VMware), Josh Berkus (Red Hat), Chris Carty (Google), Dan "POP" Papandrea (Sysdig), David McKay (Equinix Metal)

Today we celebrate [three years](https://www.youtube.com/playlist?list=PL69nYSiGNLP3azFUvYJjGn45YbF6C-uIg) of the [Kubernetes Office Hours](https://github.com/kubernetes/community/blob/master/events/office-hours.md). This is a monthly event where we take a panel of volunteers, stick them on a live stream, and then see how many questions we can field from the community.

[![Our latest episode](http://img.youtube.com/vi/5kJ6tJXq-qU/0.jpg)](http://www.youtube.com/watch?v=5kJ6tJXq-qU "Our latest episode")

I started the show for one reason. I had recently started my Kubernetes journey and was learning all these new concepts while rewiring my traditional sysadmin brain to be more cloud native. So the idea was that if I'm going to dig into this stuff and bother my new coworkers with silly questions, we might as well do it together and on the air, sharing our experiences in a way that is fun and useful for others. Give away some t-shirts, and fame and fortune would surely follow.

After 65 episodes we've decided to take a look at some of the more common problem areas that we've been tackling, and put together a quick summary of the things where you might want to invest your attention. You will find many articles on "Top X things to know about Kubernetes". We've specifically avoided those and gone back into our archives, because what people think you need to know and what you actually need to know can be different.

So here's our top 5, in no particular order. We won't go too in depth on each one, but we'll give you enough of an executive summary to start your research.

### Many of you need OPA and you don't even know it

Every few episodes we have questions about how best to prevent user X from doing activity Y, such as enforcing certain resource limits, tags, or naming conventions, to name a few. The answer we tend to agree on for these situations is Open Policy Agent ([OPA](https://www.openpolicyagent.org/))! OPA is a policy engine that lets you write policy as code. This is powerful because it allows more granular and flexible policy than Pod Security Policies (PSPs) can express.

For example, a best practice in Kubernetes is to not use the `latest` tag with your containers. Preventing this practice is not possible out of the box, but using OPA with Rego we can do this with the following snippet.

```
violation[{"msg":msg}] {
  # Iterate over every container in the object under review.
  container := input.review.object.spec.containers[_]
  # Take the part after the first ":" as the tag; images without an
  # explicit tag leave this undefined and are not caught by this rule.
  tag := split(container.image, ":")[1]
  tag == "latest"
  msg := sprintf("container <%v> uses '%v' tag", [container.name, tag])
}
```

You can do more complex checks with OPA as well, such as checking for deprecated [APIs](https://github.com/naquada/deprek8/) when moving to Kubernetes 1.16 (thanks to Jim Angel for [this guide](https://gist.github.com/jimangel/0014770713cdca8b363816930ef2520f)).

The most popular way to implement OPA in Kubernetes is using [Gatekeeper](https://github.com/open-policy-agent/gatekeeper). This allows you to write constraints and template policies as CRDs in your cluster. You can also start enforcing policies in your CI using tooling like [conftest](https://github.com/open-policy-agent/conftest) to provide feedback earlier in your release process (Shift-Left).

OPA is seeing industry adoption, with platforms such as Azure, GKE, and OpenShift using it as their policy engine of choice. It won't be long before you won't be able to get very far without running into it.

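To give a feel for how a rule like the `latest` check above ends up in a cluster, here's a minimal sketch of a Gatekeeper `ConstraintTemplate` and a matching constraint. Treat it as an illustration rather than a drop-in policy: the names `k8sdisallowlatesttag`, `K8sDisallowLatestTag`, and `pods-must-not-use-latest` are made up for this example, the `v1beta1` API versions are an assumption that depends on your Gatekeeper release, and the rule is written with `tag == "latest"`, which is intended to match the same images as the snippet above.

```yaml
# Sketch only: template and constraint names are hypothetical.
apiVersion: templates.gatekeeper.sh/v1beta1   # assumed; check your Gatekeeper version
kind: ConstraintTemplate
metadata:
  name: k8sdisallowlatesttag        # lowercase form of the kind declared below
spec:
  crd:
    spec:
      names:
        kind: K8sDisallowLatestTag  # the constraint kind this template creates
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdisallowlatesttag

        # Report a violation for each container whose image tag is "latest".
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          tag := split(container.image, ":")[1]
          tag == "latest"
          msg := sprintf("container <%v> uses '%v' tag", [container.name, tag])
        }
---
# A constraint is an instance of the template, scoped to the resources it should check.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDisallowLatestTag
metadata:
  name: pods-must-not-use-latest
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
```

Scoping the constraint to Pods via `match.kinds` keeps the example small. As written it ignores init containers and images with no explicit tag, so you'd want to extend it before relying on it.
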
<!--
- [OPA](https://www.openpolicyagent.org/) is policy as code
- Helps better enforce policy in Kubernetes and elsewhere
- More flexible and granular than [Pod Security Policies](https://kubernetes.io/docs/concepts/policy/pod-security-policy/) (PSPs)
- Potential replacement of [PSPs](https://github.com/kubernetes/enhancements/issues/5)
- Tools like [conftest](https://www.conftest.dev/) and [Gatekeeper](https://github.com/open-policy-agent/gatekeeper) help implement OPA in cluster and CI.
- Adoption from major vendors (Google, Microsoft, RHEL)

Stuff, Gatekeeper, policy as code, adoption in Azure (Azure Kubernetes Policy), Anthos (Anthos Policy), Rancher, OpenShift. Conftest. Should we mention PSPs while we're at it or forward look? I think that would be good. Haven't checked on the PR to sub PSP with Gatekeeper in a while.
-->

### Many of you still struggle with databases and whether or not to put them in cluster

Do you want to put your database on Kubernetes or not? [It depends](http://www.databasesoup.com/2018/07/should-i-run-postgres-on-kubernetes.html), and lots of people seem to have strong opinions on whether you should. Certainly many people *are* running databases in pods, successfully and in production, but it's very use-case dependent. Small, single-application databases are easier than giant, business-central data stores.

And once you've decided to put your database onto your cluster, there are still a lot of decisions to make, most of which are still debatable. What kind of storage should you use: local volumes, clustered storage, something else? What's performance like compared with bare metal? Do new clustered databases like Vitess offer strong advantages over more traditional options? Do you need an operator, or is Helm enough?

Unfortunately, as with pre-Kubernetes databases, many such decisions are tradeoffs, so you can always find at least one expert who is willing to champion any particular option.
And DB deployment and management on clusters is still pushing the boundaries of Kubernetes feature development, so you'll need to learn a lot of things for yourself. The only thing we know for sure is: don't store your databases on NFS.

### Many of you still have lots of questions about autoscaling

- Cluster Autoscaling
- Vertical Pod Autoscaling
- Horizontal Pod Autoscaling
- Link to the episode we talked about it
- goldilocks
- there's a TGIK on this

### Many of you run Java applications

There's a myth that all the "cool kids" are rewriting the world into Go microservices; the real world is much different. :D

While we do see a plethora of languages being used in K8s workloads, many of our questions ask specifically about managing Java applications:

- Setting limits
- Updates on upstream JVM work to be more container native
- Other gotchas

In our January session we [discussed](https://www.youtube.com/watch?v=8JbGfNNG1mQ&list=PL69nYSiGNLP3azFUvYJjGn45YbF6C-uIg&index=26) an out-of-memory issue in a Java application. From that we gathered a few resources on how best to run Java apps in Kubernetes.

- [Java Inside Docker](https://developers.redhat.com/blog/2017/03/14/java-inside-docker/)
- [Kubernetes Demystified Restrictions On Java Applications](https://dzone.com/articles/kubernetes-demystified-restrictions-on-java-applic)

<!--
Java is more container friendly after v8?

Setting limits
- Issue is that JVM sees all the memory and wants it all, newer versions do this better.
- Monitoring usage in a dev env to get an idea of what the limits need to be.
- There was a tomcat example from a session or two ago...
-->

### The Future of Office Hours is Bright!

![](https://i.imgur.com/m73I11d.png)

The volunteers have helped usher in new folx with open arms, and that has augmented the Office Hours experience and helped relieve some of the burden.
Folks like Rachel Leekin, Chauncey Thorn and many more have brought some great real-world experience and know-how to ensure the community has a helping hand. Office Hours shows the power of community, with new faces growing the skill sets of the whole community by working together!

### Other

Maybe we could have a little section on tools the community is always recommending, like we always have someone mentioning goldilocks, etc.

<!--
Just looking through the notes for the past few sessions.
- admin tools (k9s, octant)
- longhorn (Has come up for a few episodes in a row. Good CSI for smaller clusters, hobby or something needing local disks.)
- On-Prem registries with quay outages and now docker hub changes (Harbor, Quay, Artifactory, others?)
- Managing secrets and Certs (Cert-Manager, Vault, sealed-secrets)
- GitOps-y stuff is starting to be more frequent (flux, argo)
- Helm v Kustomize, more like Helm & Kustomize
-->

### In Conclusion

As you can see, we've only scratched the surface of the topics that have been discussed over the past few years. We [publish our show notes](https://discuss.kubernetes.io/tag/office-hours) on the Kubernetes Forum; there are hours of useful links in there that we recommend you check out. We're committed to blah blah.

A big thanks to our contributors, who have taken the time out of their busy schedules over the years to help out: Povilas, Pierre Humberdroz, Joel, Ilya, Puja Abbassi, Chris Carty, Dave Strebel, Josh Berkus, Monica Rodriguez, BB, Samu, Joel Speed, Marky Jackson, Bob Killen, Jeffrey Sica, Mario Loria, Jorge Castro, Rachel Leekin, Chauncey Thorn, David "RAWKODE" McKay and Katie Gamanji. This group is all volunteers. If you're interested in joining, this is a great way to contribute to the project: join us in #office-hours on the Kubernetes Slack.

We'd also like to thank the following companies for their support over the years: Weaveworks, Giant Swarm, VMware, StockX, University of Michigan, Red Hat, Google, Utility Warehouse, Microsoft, Spectrm, Sysdig, Equinix Metal, and Pusher. (Who'd I miss?) Special thanks to both Google and the CNCF for sponsoring the t-shirt giveaways. And an extra special thanks to Tim St. Clair, Andy Goldstein, and Ralph Bankston for their support during our rough first episodes.