# "Kueue"ing Up Security for Multi-Tenant Cloud Infra at Scale

- Why did we choose Kueue?
- Kubernetes was meant for scheduling and orchestration, but problems appear when operating at a certain scale
- Issues with alternatives like self-managed native Kubernetes Jobs and SaltStack
- Cluster administrators want to control usage and maximize the utilization of resources. Kueue is a cloud-native job scheduler that works in combination with the default Kubernetes scheduler, the job controller and the cluster-autoscaler to provide a full batch system.
- Job scheduling is generally associated with HPC and AI/ML, but security tooling can benefit a lot from it too
- Kueue is a cloud-native job scheduler with which you can build a multi-tenant batch system on a Kubernetes cluster. Kueue implements job queueing, deciding when jobs should wait and when they should start, based on quotas, priority and a hierarchy for sharing heterogeneous resources among teams.

## Continuous Compliance

Security is not a one-and-done task. It's important to maintain security consistently. There are a lot of open source tools out there to help with the security assessment of our infra, but managing and orchestrating these tools at scale is a major pain point. Scheduling regular scans to maintain cloud security posture helps in achieving continuous compliance. Kubernetes is a scheduler and orchestrator at its core, and Kubernetes Jobs are a good way to schedule these security scans. However, when you operate Kubernetes Jobs at scale by yourself, the limitations of this approach start to surface: etcd gets overloaded, the API server slows down, job status becomes difficult to track, and jobs execute in random order. We also realised that we were not able to control the usage and maximize the utilization of our cluster resources. Enter Kueue, a Kubernetes-native job scheduler specifically designed to address these challenges.
Working seamlessly with the default Kubernetes scheduler, the job controller, and the cluster-autoscaler, Kueue provides a comprehensive batch system that helps us manage Kubernetes Jobs efficiently. This session dives deep into the challenges with native Kubernetes Jobs and the job scheduler, how "kueue" helps with orchestrating jobs while solving these challenges, and finally how AccuKnox "kueue"s up security for multiple tenants at scale.

- 10 tools used to do security scans
  - Steampipe
    - https://steampipe.io/images/steampipe_logo_wordmark_darkmode.svg
  - CloudSploit
  - Checkov
  - Macie
  - SecurityHub
- But you are not a single entity, so you have multiple roles/accounts/registries across teams to scan
  - 10 * 20 = 200
- But you are not on a single cloud, so now you have AWS, GCP, Azure, and multiple regions
  - 200 * 3 = 600 things to do
- But as a service provider, you serve multiple users operating at huge scale
  - 600 * 100 = 60000, with variable frequency
- Explain the solutions we can leverage to operate these scans at this scale
  - Kubernetes and Kubernetes Jobs
    - But Kubernetes won't run 50000 jobs
      - https://github.com/kubernetes/kubernetes/issues/95492
      - Probably solved by deploying batch by batch
    - Cannot enforce FIFO
      - Can use a priority class for each pod, derived from Unix timestamps, but then cleanup is a mess
    - Cannot share resources efficiently
      - Multiple tenants should have fair sharing
    - Admission checks?
- About Kueue
- Features of Kueue
- Explain how it works and schedules
- Explain resource sharing
- Explain our architecture

![image](https://hackmd.io/_uploads/BkPH5tup6.png)
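
The scale arithmetic in the outline above (10 tools, 20 targets, 3 clouds, 100 tenants) and the idea of holding jobs in a queue until quota is available can be sketched in a few lines of Python. This is a toy model only: it is not Kueue's implementation or API, and every name here (`ScanJob`, `admit_fifo`, the `tool-*`/`account-*`/`tenant-*` labels, the 1-CPU request and the 500-CPU quota) is an assumption made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from itertools import product

# Hypothetical inventory: only the counts (10, 20, 3, 100) come from the outline.
TOOLS = [f"tool-{i}" for i in range(10)]
TARGETS = [f"account-{i}" for i in range(20)]
CLOUDS = ["aws", "gcp", "azure"]
TENANTS = [f"tenant-{i}" for i in range(100)]


@dataclass(frozen=True)
class ScanJob:
    tenant: str
    cloud: str
    target: str
    tool: str
    cpu: int = 1  # assumed resource request per scan


def enumerate_scans():
    """Yield one ScanJob per (tenant, cloud, target, tool) combination."""
    for tenant, cloud, target, tool in product(TENANTS, CLOUDS, TARGETS, TOOLS):
        yield ScanJob(tenant, cloud, target, tool)


def admit_fifo(pending, cpu_quota):
    """Admit jobs strictly in arrival order until the CPU quota is used up.

    Admitted jobs are removed from `pending`; the rest stay queued.
    """
    admitted, used = [], 0
    while pending and used + pending[0].cpu <= cpu_quota:
        job = pending.popleft()
        used += job.cpu
        admitted.append(job)
    return admitted


jobs = deque(enumerate_scans())
total = len(jobs)  # 10 * 20 * 3 * 100
first_wave = admit_fifo(jobs, cpu_quota=500)
print(total, len(first_wave), len(jobs))  # 60000 500 59500
```

The point of the sketch is the shape of the problem: creating all 60000 Jobs at once is what strains etcd and the API server, whereas a queue that only releases as many jobs as the quota allows keeps the number of running jobs bounded and preserves arrival order.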