Davanum Srinivas


  • Introduction
    Within the cloud native ecosystem, new advances periodically emerge that do not fall neatly into existing TAGs or WGs; the most recent examples are WASM, eBPF, etc. When we see these situations, various community members typically want to collaborate to work out the possible overlaps, benefits, etc. In such cases, we need time-bound entities that explore the space with the explicit intent of providing recommendations to the TOC on how best to integrate these technologies into our ecosystem.
    Overall Purpose
    Explore new areas to help the TOC and TAGs illuminate, dive deep into, and understand them. Write a specific charter with details on the exact problem space the ETG is trying to solve, and serve as a focal point for CNCF community members to collect and drive their energy towards concrete deliverables through consensus building.
    Specific Objectives
    Define the problem space and deliverables, and write a charter to get started.
    Pick chairs for the group who can guide it and keep it on track towards its goals.
    Reach out to folks inside and outside the cloud native community to understand the technology, its applications, and the value it brings to our ecosystem.
  • Join:
    Slack: #sig-node / #sig-testing / #kubernetes-dev (http://slack.k8s.io/ for an invite)
    Mailing list: https://groups.google.com/forum/#!forum/kubernetes-sig-node
    Landing page: https://github.com/kubernetes/community/tree/master/sig-node#node-special-interest-group
    Goals: https://github.com/kubernetes/community/tree/master/sig-node#goals (pick some goals that interest you here and we can do walkthroughs, demos, etc.)
  • Name / Email / URL
    Natacha Crooks / ncrooks@berkeley.edu / https://www2.eecs.berkeley.edu/Faculty/Homepages/ncrooks.html
    Malte Schwarzkopf / malte@cs.brown.edu
  • Types of jobs: Serial, Non-Serial
    Architecture: x86_64, arm64
    Operating System: AL2
  • Kubernetes Security Response Committee (SRC)
    Details:
    Main page: https://github.com/kubernetes/committee-security-response
    Process: https://github.com/kubernetes/committee-security-response/blob/main/security-release-process.md
    CVE feed: https://kubernetes.io/docs/reference/issues-security/official-cve-feed/
    Participants:
    Sri Saran Balaji (@SaranBalaji90) srajakum@amazon.com
    Micah Hausler (@micahhausler) mhausler@amazon.com
  • What is it? A long-lived stable branch with associated testing and release infrastructure.
    Releases are infrequent.
    Releases have to be secure (CVEs).
    Ability for folks to move from a non-LTS release to an LTS release.
    Ability for folks to move from one LTS release to another LTS release.
    Need to keep cutting point releases for bugs/backports/security off this branch:
    New point releases MUST be fully backward compatible with the previous point releases.
    Stricter policy than the Kubernetes policy on backports/cherry-picks.
    Clear policy on when an older LTS is sunset: forced upgrades to the new LTS, with no exceptions.
  • Google has been paying the bills for community infrastructure since the inception of the Kubernetes project. A few years ago Google switched to a mode where it wanted the CNCF and the Kubernetes community to own the infrastructure and ensure it is run by the community, rather than having Google folks be the only ones with admin access. Towards this effort Google granted credits of $3 million per year. A Special Interest Group in the Kubernetes community, SIG K8s Infra, was created to help move the infrastructure out of Google's hands, while still running on GCP as there was no other alternative. This SIG has been working on a lot of projects to give the Kubernetes community access to compute/network/storage resources across the board, as needed by the other SIGs.
    Over the last few years a trend emerged where AWS customers, some who use EKS and many who run their clusters using their own custom/proprietary Kubernetes distros, have been placing a lot of load on the community infrastructure. One example is container images: AWS generates a LOT of traffic across clouds, as folks running things in AWS reach across cloud boundaries (think egress costs) to pull images from the Google-hosted registries (GCR). So the community reached out to AWS leadership, and this year we are funding $3 million to match Google's commitment.
    The SIG has been thinking about and building infrastructure across both GCP and AWS. For example, the CI system called Prow, which runs on GCP, has now been stitched together with a Prow on EKS as well, so CI jobs in the community can run on either cloud. There are various classes of CI jobs; some are easily portable, and we are working on ensuring all classes of jobs can work on either cloud. Another example is scalability jobs for 100-node and 500-node clusters, which cost a lot of money to run; the community is coming up with ways and means to run these in AWS as well.
    At the moment, one of the very first things a community member or end user of Kubernetes will see is a new container registry: a state-of-the-art proxy that redirects traffic for large blobs to a location that is nearer to where the cluster is running. For example, a cluster running on AWS will get redirected to an S3 bucket in the same region for large files. This helps keep egress costs down, speeds up access, and reduces latency. This is a net new win for AWS customers.
    There are several such efforts in progress. The community has set itself a priority 0 and a priority -1. Priority 0 is to ensure we use the credits from AWS responsibly and in a sustainable fashion to build a multi-cloud infrastructure for the community. Priority -1 is to reduce spend on GCP by any means necessary to create headroom there, so that we don't need to ask Google for extra funds in the last quarter of the year. This will also help with rolling out more things we could be doing on GCP that we have been holding off on, as we tended to use up all the credits there. There are things that will be happening for binary artifacts, signed release artifacts, CDNs, etc. over time.
  • Ben Elder from Google: https://twitter.com/BenTheElder/status/1641824598764429321?s=20
    Tim Hockin from Google: https://twitter.com/thockin/status/1585623895616507904?s=20
    Arnaud Meukam from VMware: https://twitter.com/ameukam/status/1585677058365071362?s=20
    Credits announcement impressions on Twitter
    Reactions on Slack
  • Tech debt elimination: test coverage is low, which is blocking progress in many areas:
    Many regressions in the pod lifecycle area recently
    The most complex tests have been failing for a long time (e.g. eviction tests)
    Many key features and functionalities are not covered by tests or documented
    Soak testing is minimal
    Perf/scale testing is concentrated on API scalability, not on kubelet features
    No startup/bootstrap perf testing
    Fault injection, chaos monkey, and fuzz testing are minimal or nonexistent
  • Legend: ✅ good, ❌ bad, 🤷 no idea (no recent runs)
    Post-submit jobs:
    Job | Branch | Prow Status
    ci-containerd-build
  • [TODO] Convert these links into material that shows how to debug the registry; do we need to turn this into an issue template? (A minimal redirect check is sketched after this list.)
    "dig": https://github.com/kubernetes/registry.k8s.io/issues/137#issuecomment-1376574499
    "curl -v": https://github.com/kubernetes/registry.k8s.io/issues/174#issuecomment-1467646821
    "crane ls": https://github.com/kubernetes-sigs/kind/issues/1895#issuecomment-1468991168
    "crane pull --verbose": https://github.com/kubernetes/registry.k8s.io/issues/174#issuecomment-1467646821
    "crane pull --verbose": https://github.com/kubernetes/registry.k8s.io/issues/154#issuecomment-1435028502
    Also: how and where to look in the kubelet's logs?
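A minimal sketch of the kind of check "curl -v" does: request a blob from registry.k8s.io without following redirects, so you can see where the registry wants to send the client (e.g. a regional, cloud-local blob store). The digest below is a placeholder you would first obtain with something like `crane manifest`; this is an illustration under those assumptions, not the project's debugging tooling.

```go
// redirectcheck.go: ask registry.k8s.io for a blob but stop at the first
// response, then print the Location header the proxy hands back.
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Placeholder digest: replace with a real layer digest, e.g. one listed
	// by `crane manifest registry.k8s.io/pause:3.9`.
	url := "https://registry.k8s.io/v2/pause/blobs/sha256:REPLACE_WITH_LAYER_DIGEST"

	client := &http.Client{
		// Do not follow redirects; we want to inspect the redirect itself.
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}

	resp, err := client.Get(url)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	fmt.Println("status:  ", resp.Status)
	fmt.Println("location:", resp.Header.Get("Location"))
}
```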
  • The Kubernetes community is sending out notices to switch the source of community images from k8s.gcr.io to registry.k8s.io. The latest message on the kubernetes dev@ mailing list is [1]; also see the related blog post [2]. The community has designated April 3rd as the date to switch over images, as mentioned in [1]. The new registry.k8s.io will help the community serve everyone better by reducing egress costs and spreading the load between multiple cloud providers. The community is contemplating additional measures beyond this request that may affect the images in k8s.gcr.io, so relying on this old registry and its images is risky going forward. In addition, the community has indicated that it will not be hosting images from the upcoming Kubernetes 1.27 release in the old registry. Since all the images currently in the older registry are already present in registry.k8s.io, the best way forward for everyone is to ensure that their clusters switch over to the new registry.
    [1] https://groups.google.com/a/kubernetes.io/g/dev/c/Oq8HUQJQkXQ/m/pnI-QqmgBAAJ
    [2] https://kubernetes.io/blog/2023/02/06/k8s-gcr-io-freeze-announcement/
    Validation: Customers can validate whether all the community images in a Kubernetes cluster are coming from the new registry by using the new community-images plugin; a rough equivalent of that check is sketched below.
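For context, a rough client-go sketch of the kind of check the community-images plugin performs: list the images referenced by pods in the cluster and flag any still pulled from the legacy k8s.gcr.io registry. This is an illustrative assumption about how such a check could look, not the plugin's actual implementation.

```go
// imagescan.go: flag pod images that still reference k8s.gcr.io.
package main

import (
	"context"
	"fmt"
	"path/filepath"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// List pods in all namespaces and inspect every container image.
	pods, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	for _, pod := range pods.Items {
		containers := append(pod.Spec.InitContainers, pod.Spec.Containers...)
		for _, c := range containers {
			if strings.HasPrefix(c.Image, "k8s.gcr.io/") {
				fmt.Printf("%s/%s still uses legacy image %s\n",
					pod.Namespace, pod.Name, c.Image)
			}
		}
	}
}
```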
  • We have a lot of choices in the community, like Kops/Kubespray/CAPA etc., but all of them take too long, do not support all test scenarios, or make it hard to inject freshly built code. Hence the search for a new replacement. As you can see, this is a long-standing issue and one that is not open to "easy" fixes.
    Must support 80% of the jobs we have today (revisit all the environment flags we use to control different aspects of the cluster to verify)
    All nodes must run on a VM, to replicate how it is done today (we already have kind to replicate things run inside a container)
    Must be able to deploy a cluster built directly from either a PR or the tip of a branch (to cover both presubmit and periodic jobs)
    Must use kubeadm to bootstrap both the control plane node and the worker nodes; kubeadm needs systemd for running the kubelet, so the images deployed should use systemd
    Must have a mechanical way to translate existing jobs to this new harness
    Should have a minimum of moving parts, to ensure we are not chasing flakes and digging into things we don't need to
    Should have a clean path (UX) to debug things like we have today (logs from the VM/cloud-init/systemd/kubelet/containers should tell the whole story)
  • Dear Steering,
    We, the Chairs and Technical Leads, would like Steering to formally acknowledge the role we play with respect to the "content" of the infrastructure we build. Specifically, today we have a whole set of logs, binaries, container images, and such in the infrastructure we administer, which falls under our charter. We would like to highlight that we have the ability to set things like retention policies, move infrastructure across various cloud properties, and even remove some of the "content" mentioned above that is being used both by our CI systems and by the Kubernetes community in general. We need to be able to set policies and curate content to ensure that we can live within our budget and still serve the community that uses all the content we host. Please let us know if we need to update our charter, should this not be clear from it. We are writing to you to make sure that you see this, that the community is aware of where we are coming from, and that we are authorized to take these decisions on behalf of Steering under our charter.
    thanks,
    Arnaud Meukam
    Tim Hockin
    Aaron Crickenberger
    Davanum Srinivas
  • Context: An email with questions/concerns from the community (dated Jul 2, 2021), sent to the cncf-private-toc mailing list. The senders were:
    Justin Cappos (CNCF TAG Security Tech Lead; TUF, in-toto, Uptane)
    Jason Hall (sigstore)
    Luke Hinds (sigstore)
    Trishank Karthik Kuppusamy (TUF, Uptane, in-toto, CNAB Security)
    Dan Lorenc (sigstore)
    Marina Moore (TUF, Uptane, sigstore)
    Response email from Liz Rice (then Chair of the TOC) summarizing the discussion and guidance from the TOC (Jul 16, 2021)
  • Dims: Steering needs to be included in the conversations. Some of Steering + k8s-infra + Joanna (CNCF) to come up with a plan.
    David: No immediate plan to use AMZN resources
    Material precondition
    Which services are used does not matter
    Greater vendor neutrality
  • Need self-starters (all of them!)
    CNCF SRE staff (2 people):
    Monitor credit/cash burn rate
    Carry pagers and interact with hyperscaler support as needed
    Provide continuity and stability
    Implement strategies and tactics (think both long and short term) to evenly distribute costs between clouds
  • CNCF is the final backstop for funding, not one of the current sponsors. (You cannot start a discussion with the statement that CNCF will not be able to pay the bills if needed.)
    Everyone pays their fair share (spread the burden across sponsors).
    CNCF is ultimately responsible for staffing a team to enable projects to consume credits and run the infrastructure (not volunteers).
    Contractors or full-time employees should already have much of the technical skill needed. (This is not an on-the-job-learning thing; they should have handled things like this before. They will need to be able to help come up with alternative options to choose from.)
  • Unified view of clusters (or) multi-tenancy:
    vcluster / kiosk
    kubeslice
    crossplane
    Capsule / Kamaji
    KubePlus
    NOTE: https://divya-mohan0209.medium.com/mo-tenancy-mo-problems-f031f75374f7
  • We have a conformance image here. This image is used by Sonobuoy as well. The tool will launch the already existing image as a Job or as a Deployment/Pod (possibly just mirroring what Sonobuoy does) with the right set of parameters, then watch until the e2e tests finish. After that it will need to grab the logs for the end user to look at. (A rough sketch of that flow is below.)
    Requirements: need a self-starter with a good grasp of Golang and Kubernetes.
    What we have today:
    There's a presubmit job called pull-kubernetes-conformance-image-test.
    It uses a script, kind-conformance-image-e2e.sh, that essentially creates a kind cluster to test the image.
    You can see conformance-e2e.sh, where we use conformance-e2e.yaml to run the image and wait for it to be done.
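A minimal client-go sketch of the flow described above, under my own assumptions: create a Job from the conformance image, poll until it finishes, then pull the pod logs. The namespace, image tag, and the E2E_FOCUS parameter are placeholders/assumptions; the real tool would also need RBAC, the full set of e2e parameters, and result extraction such as Sonobuoy provides.

```go
// conformancejob.go: launch the conformance image as a Job, wait for it to
// finish, then print its logs. Sketch only; names and parameters are placeholders.
package main

import (
	"context"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"time"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	ns, name := "conformance", "conformance-e2e" // placeholders; namespace assumed to exist
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: ns},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "conformance",
						Image: "registry.k8s.io/conformance:vX.Y.Z", // placeholder: pick the release under test
						// Assumed parameter; the real set of flags/env vars goes here.
						Env: []corev1.EnvVar{{Name: "E2E_FOCUS", Value: "[Conformance]"}},
					}},
				},
			},
		},
	}

	ctx := context.Background()
	if _, err := client.BatchV1().Jobs(ns).Create(ctx, job, metav1.CreateOptions{}); err != nil {
		panic(err)
	}

	// Watch (by polling) until the Job reports success or failure.
	for {
		j, err := client.BatchV1().Jobs(ns).Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			panic(err)
		}
		if j.Status.Succeeded > 0 || j.Status.Failed > 0 {
			break
		}
		time.Sleep(10 * time.Second)
	}

	// Grab the logs from the Job's pod(s) for the end user to look at.
	pods, err := client.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{
		LabelSelector: "job-name=" + name,
	})
	if err != nil {
		panic(err)
	}
	for _, pod := range pods.Items {
		stream, err := client.CoreV1().Pods(ns).GetLogs(pod.Name, &corev1.PodLogOptions{}).Stream(ctx)
		if err != nil {
			fmt.Fprintln(os.Stderr, "could not fetch logs for", pod.Name, ":", err)
			continue
		}
		io.Copy(os.Stdout, stream)
		stream.Close()
	}
}
```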