# Dynamic Resource Allocation (DRA)

{{< feature-state feature_gate_name="DynamicResourceAllocation" >}}

This page describes _dynamic resource allocation_ in Kubernetes.

## About DRA {#about-dra}

Dynamic resource allocation (DRA) is a Kubernetes feature that lets you request
and share hardware resources among Pods. These resources are often attached
{{< glossary_tooltip text="devices" term_id="device" >}} like hardware
accelerators.

With DRA, device drivers and cluster admins define device _classes_ that are
available to request in a cluster. Developers "claim" devices within a device
class and specify their claims in workloads. Kubernetes allocates matching
devices to specific claims and places the corresponding Pods on nodes that can
access the allocated devices.

Allocating resources with DRA is a similar experience to
[dynamic volume provisioning](/docs/concepts/storage/dynamic-provisioning/),
in which you use PersistentVolumeClaims to claim storage capacity from storage
classes and request the claimed capacity in your Pods.

### Benefits of DRA {#dra-benefits}

DRA provides a flexible way to categorize, request, and use devices in your
cluster. Using DRA provides benefits like the following:

* **Flexible device filtering**: use Common Expression Language (CEL) to
  perform fine-grained filtering for specific device attributes.
* **Device sharing**: share the same resource with multiple containers or Pods
  by referencing the corresponding resource claim.
* **Centralized device categorization**: device drivers and cluster admins can
  use device classes to provide app operators with hardware categories that are
  optimized for various use cases. For example, you can create a cost-optimized
  device class for general-purpose workloads, and a high-performance device
  class for critical jobs.
* **Simplified Pod requests**: with DRA, app operators don't need to specify
  device quantities in Pod resource requests. Instead, the Pod requests a
  resource claim, and the device configuration in that claim applies to the
  Pod.

These benefits provide significant improvements in the device allocation
workflow compared to
[device plugins](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/),
which require per-Pod device requests, don't support device sharing, and don't
support expression-based device filtering.

### Types of DRA users {#dra-user-types}

The workflow of using DRA to dynamically allocate devices involves the
following types of users:

* **Device owner**: responsible for devices. Device owners might be commercial
  vendors, the cluster operator, or another entity. To use DRA, devices must
  have DRA-compatible drivers that do the following:

  * Create ResourceSlices that provide Kubernetes with information about nodes
    and resources.
  * Update ResourceSlices when resource capacity in the cluster changes.
  * Optionally, create DeviceClasses that workload operators can use to claim
    devices.

* **Cluster admin**: responsible for configuring clusters and nodes, attaching
  devices, installing drivers, and similar tasks. To use DRA, cluster admins
  do the following:

  * Attach devices to nodes.
  * Install device drivers that support DRA.
  * Optionally, create DeviceClasses that workload operators can use to claim
    devices.

* **Workload operator**: responsible for deploying and managing workloads in
  the cluster. To use DRA to allocate devices to Pods, workload operators do
  the following:

  * Create ResourceClaims or ResourceClaimTemplates to request specific
    configurations within DeviceClasses.
  * Deploy workloads that use specific ResourceClaims or
    ResourceClaimTemplates.

## DRA terminology {#terminology}

DRA uses the following Kubernetes API kinds:

DeviceClass
: Defines what devices can be claimed and how to select specific device
  attributes in claims. Devices are claimed by ResourceClaims that select
  parameters in the DeviceClass.

ResourceClaim
: Describes a request for access to attached resources, such as devices, in
  the cluster. ResourceClaims provide Pods with access to a specific resource.
  ResourceClaims can be created by workload operators or generated by
  Kubernetes based on a ResourceClaimTemplate.

ResourceClaimTemplate
: Defines a template that Kubernetes uses to create per-Pod ResourceClaims for
  a workload. ResourceClaimTemplates provide Pods with access to separate,
  similar resources. Each ResourceClaim that Kubernetes generates from the
  template is bound to a specific Pod. When the Pod terminates, Kubernetes
  deletes the corresponding ResourceClaim.

ResourceSlice
: Represents one or more resources that are attached to nodes, such as
  devices. Drivers create and manage ResourceSlices in the cluster. When a
  ResourceClaim is created and used in a Pod, Kubernetes uses ResourceSlices
  to find nodes that have access to the claimed resources. Kubernetes
  allocates resources to the ResourceClaim and schedules the Pod onto a node
  that can access the resources.

### DeviceClass

A DeviceClass lets cluster admins or device drivers define categories of
devices in the cluster. DeviceClasses tell operators what devices they can
request and how they can request those devices. You can use
[Common Expression Language (CEL)](https://cel.dev) to select devices based on
specific attributes. A ResourceClaim that references the DeviceClass can then
request specific configurations within the DeviceClass.

To create a DeviceClass, see
[Dynamically Allocate Devices to Workloads with DRA](/docs/tasks/configure-pod-container/allocate-resources/dynamic-resource-allocation/#create-deviceclass).
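For example, the following DeviceClass is a minimal sketch that selects every
device published by a hypothetical `resource-driver.example.com` driver. The
driver name and class name are placeholders; the attributes that your devices
expose depend on your driver:

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: DeviceClass
metadata:
  name: example-device-class
spec:
  # Each selector is a CEL expression that a device must satisfy to belong
  # to this class. This class matches any device that is published by the
  # hypothetical resource-driver.example.com driver.
  selectors:
  - cel:
      expression: device.driver == "resource-driver.example.com"
```

A ResourceClaim that references `example-device-class` can then narrow the
selection further with its own selectors.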
### ResourceClaims and ResourceClaimTemplates {#resourceclaims-templates}

A ResourceClaim defines a request for resource allocation. Every ResourceClaim
references a specific DeviceClass and uses _requests_ and _constraints_ to
filter the resources in the DeviceClass. ResourceClaims can be created by
workload operators or can be generated by Kubernetes based on a
ResourceClaimTemplate. A ResourceClaimTemplate defines a template that
Kubernetes can use to auto-generate ResourceClaims for Pods.

#### Use cases for ResourceClaims and ResourceClaimTemplates {#when-to-use-rc-rct}

The method that you use depends on your requirements, as follows:

* **ResourceClaim**: you want multiple Pods to share access to specific
  devices. You manually manage the lifecycle of ResourceClaims that you
  create.
* **ResourceClaimTemplate**: you want Pods to have independent access to
  separate, similarly configured devices. Kubernetes generates ResourceClaims
  from the specification in the ResourceClaimTemplate. The lifetime of each
  generated ResourceClaim is bound to the lifetime of the corresponding Pod.

When you define a workload, you can use
[Common Expression Language (CEL)](https://cel.dev) to perform fine-grained
filtering based on specific device attributes or capacity. The available
parameters for filtering depend on the device and the drivers.
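For example, the following ResourceClaimTemplate is a minimal sketch of
CEL-based filtering in a template. The template name, class name, and `size`
attribute are hypothetical; the attributes that you can filter on depend on
your driver:

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: large-device-template
spec:
  # The nested spec is the ResourceClaim specification that Kubernetes
  # copies into every generated ResourceClaim.
  spec:
    devices:
      requests:
      - name: example-device
        deviceClassName: example-device-class
        # Filter the devices in the class by a driver-defined attribute.
        selectors:
        - cel:
            expression: device.attributes["resource-driver.example.com"].size == "large"
```

Each Pod that references this template gets its own generated ResourceClaim,
and therefore its own matching device.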
If you directly reference a specific ResourceClaim in a Pod, that ResourceClaim
must already exist in the cluster. If the ResourceClaim doesn't exist, the Pod
won't schedule. You can reference an auto-generated ResourceClaim in a Pod, but
this isn't recommended because auto-generated ResourceClaims are bound to the
lifetime of the Pod that triggered the generation.

To learn how to claim resources using one of these methods, see
[Dynamically Allocate Devices to Workloads with DRA](/docs/tasks/configure-pod-container/allocate-resources/dynamic-resource-allocation/#claim-resources).

### ResourceSlice {#resourceslice}

Each ResourceSlice represents one or more hardware resources in a pool of
resources. The pool is managed by a device driver, which creates and manages
ResourceSlices. The hardware resources in a pool might be represented by a
single ResourceSlice or span multiple ResourceSlices.

ResourceSlices provide useful information to device users and to the
scheduler, and are crucial for dynamic resource allocation. Every
ResourceSlice must include the following information:

* **Resource pool**: a group of one or more resources that the driver manages.
  The pool can span more than one ResourceSlice. Changes to the resources in a
  pool must be propagated across all of the ResourceSlices in that pool. The
  device driver that manages the pool is responsible for ensuring that this
  propagation happens.
* **Devices**: devices in the managed pool. A ResourceSlice can list every
  device in a pool or a subset of the devices in a pool. The ResourceSlice
  defines device information like attributes, versions, and capacity. Device
  users can select devices for allocation by filtering for device information
  in ResourceClaims or in DeviceClasses.
* **Nodes**: the nodes that can access the resources. Drivers can choose which
  nodes can access the resources. These can be all of the nodes in the
  cluster, a single named node, or nodes that have specific node labels.

Drivers use a {{< glossary_tooltip text="controller" term_id="controller" >}}
to reconcile ResourceSlices in the cluster with the information that the
driver has to publish. This controller overwrites any manual changes, such as
cluster users creating or modifying ResourceSlices.

Consider the following example ResourceSlice:

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceSlice
metadata:
  name: cat-slice
spec:
  driver: "resource-driver.example.com"
  pool:
    generation: 1
    name: "black-cat-pool"
    resourceSliceCount: 1
  # The allNodes field defines whether any node in the cluster can access the device.
  allNodes: true
  devices:
  - name: "large-black-cat"
    basic:
      attributes:
        color:
          string: "black"
        size:
          string: "large"
        cat:
          boolean: true
```

This ResourceSlice is managed by the `resource-driver.example.com` driver in
the `black-cat-pool` pool. The `allNodes: true` field indicates that any node
in the cluster can access the devices.

There's one device in the ResourceSlice named `large-black-cat` with the
following attributes:

* `color`: `black`
* `size`: `large`
* `cat`: `true`

A DeviceClass could select this ResourceSlice by using these attributes, and a
ResourceClaim could filter for specific devices in that DeviceClass.
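For example, the following sketch shows a ResourceClaim that filters for the
large black cat device from the example ResourceSlice, and a Pod that
references the claim by name. The sketch reuses the hypothetical
`example-device-class` from the earlier DeviceClass example:

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: large-black-cat-claim
spec:
  devices:
    requests:
    - name: cat-device
      deviceClassName: example-device-class
      selectors:
      - cel:
          # Match the attributes that the example ResourceSlice publishes.
          expression: |-
            device.attributes["resource-driver.example.com"].color == "black" &&
            device.attributes["resource-driver.example.com"].size == "large"
---
apiVersion: v1
kind: Pod
metadata:
  name: cat-pod
spec:
  # Reference the pre-existing ResourceClaim at the Pod level.
  resourceClaims:
  - name: cat-device
    resourceClaimName: large-black-cat-claim
  containers:
  - name: app
    image: registry.example/app:latest
    resources:
      claims:
      # Grant this container access to the claimed device.
      - name: cat-device
```

Kubernetes would allocate the `large-black-cat` device to
`large-black-cat-claim` and, because the pool sets `allNodes: true`, could
schedule `cat-pod` onto any node in the cluster.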
## How resource allocation with DRA works {#how-it-works}

The following sections describe the workflow for the various
[types of DRA users](#dra-user-types) and for the Kubernetes system during
dynamic resource allocation.

### Workflow for users {#user-workflow}

1. **Driver creation**: device owners or third-party entities create drivers
   that can create and manage ResourceSlices in the cluster. These drivers
   optionally also create DeviceClasses that define a category of devices and
   how to request them.
1. **Cluster configuration**: cluster admins create clusters, attach devices
   to nodes, and install the DRA device drivers. Cluster admins optionally
   create DeviceClasses that define categories of devices and how to request
   them.
1. **Resource claims**: workload operators create ResourceClaimTemplates or
   ResourceClaims that request specific device configurations within a
   DeviceClass.
1. **Workload deployment**: workload operators configure Kubernetes manifests
   to request specific ResourceClaims or ResourceClaimTemplates.

### Workflow for Kubernetes {#kubernetes-workflow}

1. **ResourceSlice creation**: drivers in the cluster create ResourceSlices
   that represent one or more devices in a managed pool of similar devices.
1. **Workload deployment**: the cluster control plane checks new workloads for
   references to ResourceClaimTemplates or to specific ResourceClaims.

   * If the workload uses a ResourceClaimTemplate, a controller named
     `resourceclaim-controller` generates ResourceClaims for every Pod in the
     workload.
   * If the workload uses a specific ResourceClaim, Kubernetes checks whether
     that ResourceClaim exists in the cluster. If the ResourceClaim doesn't
     exist, the Pods won't deploy.

1. **ResourceSlice filtering**: for every Pod, Kubernetes checks the
   ResourceSlices in the cluster to find a device that satisfies all of the
   following criteria:

   * The nodes that can access the resources are eligible to run the Pod.
   * The ResourceSlice has unallocated resources that match the requirements
     of the Pod's ResourceClaim.

1. **Resource allocation**: after finding an eligible ResourceSlice for a
   Pod's ResourceClaim, the Kubernetes scheduler updates the ResourceClaim
   with the allocation details.
1. **Pod scheduling**: when resource allocation is complete, the scheduler
   places the Pod on a node that can access the allocated resource. The device
   driver and the kubelet on that node configure the device and the Pod's
   access to the device.

## Observability of dynamic resources {#observability-dynamic-resources}

You can check the status of dynamically allocated resources by using any of
the following methods:

* [kubelet device metrics](#monitoring-resources)
* [ResourceClaim status](#resourceclaim-device-status)

### kubelet device metrics {#monitoring-resources}

The `PodResourcesLister` kubelet gRPC service lets you monitor in-use devices.
The `DynamicResource` message provides information that's specific to dynamic
resource allocation, such as the device name and the claim name. For more
information about the gRPC endpoints, see
[Monitoring device plugin resources](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#monitoring-device-plugin-resources).

### ResourceClaim device status {#resourceclaim-device-status}

{{< feature-state feature_gate_name="DRAResourceClaimDeviceStatus" >}}

DRA drivers can report driver-specific
[device status](/docs/concepts/overview/working-with-objects/#object-spec-and-status)
data for each allocated device in the `status.devices` field of a
ResourceClaim. For example, the driver might list the IP addresses that are
assigned to a network interface device.
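For example, a driver for a network interface device might populate
`status.devices` like the following sketch. The driver, pool, and device names
are hypothetical, and the data that a driver reports depends on the driver:

```yaml
status:
  devices:
  - driver: net-driver.example.com
    pool: example-pool
    device: example-network-interface
    # Network device information reported by the driver.
    networkData:
      interfaceName: eth1
      ips:
      - 10.9.8.0/24
      hardwareAddress: "ea:9f:cb:40:b1:7b"
```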
The accuracy of the information that a driver adds to a ResourceClaim
`status.devices` field depends on the driver. Evaluate drivers to decide
whether you can rely on this field as the only source of device information.

If you disable the `DRAResourceClaimDeviceStatus`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/),
the `status.devices` field is automatically cleared when the ResourceClaim is
stored. ResourceClaim device status is supported when a DRA driver can update
an existing ResourceClaim to set the `status.devices` field.

For details about the `status.devices` field, see the
{{< api-reference page="workload-resources/resource-claim-v1beta1" anchor="ResourceClaimStatus" text="ResourceClaim" >}}
API reference.

## {{% heading "whatsnext" %}}

- [Dynamically allocate devices to workloads using DRA](TODO: how-to page)
- [Reduce the risk of hardware-related privilege escalation](TODO: concept for admin access and security guidance)
- [Good practices for dynamic resource allocation](TODO: good practices page)
- [Build DRA support into device drivers](TODO: driver DRA guide)
- For more information on the design, see the
  [Dynamic Resource Allocation with Structured Parameters](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters)
  KEP.