---
title: Mutually Exclusive Device Allocations in DRA

---

# Mutually Exclusive Device Allocations in DRA

## Summary

This document proposes an extension to the Dynamic Resource Allocation (DRA) API to support mutually exclusive device allocation constraints. 

## Motivation
Hardware devices often support multiple partitioning or virtualization schemes that offer different trade-offs in isolation, performance, and resource sharing. However, these schemes are frequently mutually exclusive at the hardware level: once a physical device is partitioned or configured using one scheme, it cannot be reconfigured to use a different scheme until all existing allocations are released.

### Goals
- Allow DRA drivers to specify compatibility between virtual devices within a single physical device
- Allow the scheduler to make informed allocation decisions that respect compatibility rules
- Provide a generic mechanism applicable to any hardware with partitioning constraints
- Maintain backward compatibility with existing ResourceSlice specifications

### Non-Goals
- Allow DRA drivers to specify compatibility between physical or virtual devices across different physical devices or device classes

### Problem Statement

The current Partitionable Devices API does not provide a mechanism to express mutual exclusivity constraints between devices. Without this capability:

1. **Late Failure Detection**: Incompatible allocations are only detected during resource preparation (after scheduling decisions are made)
2. **Scheduler Unawareness**: The scheduler may allocate incompatible devices, leading to pod startup failures
3. **Poor User Experience**: Users receive cryptic preparation failures instead of clear scheduling feedback
4. **Resource Thrashing**: The scheduler may repeatedly attempt incompatible allocations

**Current Workaround Limitations:**

DRA drivers must fail resource preparation when incompatible allocations are attempted. The conflict therefore surfaces only after the scheduler has already committed to the allocation.


### Use Case
**Generic Example:**

Consider a physical accelerator device that supports four distinct operational modes. In all cases, the device is exposed through **Partitionable Devices**:

1. **Exclusive Mode**: The entire physical device allocated to a single consumer
2. **Software-Partitioned Mode A**: Multiple consumers share the physical device through virtual devices
3. **Software-Partitioned Mode B**: Multiple consumers share the physical device through virtual devices
4. **Hardware-Partitioned Mode**: The device is divided into distinct isolated hardware partitions

These modes have compatibility constraints:
- **Exclusive Mode** is incompatible with all other modes
- **Software-Partitioned Mode A** may be compatible with **Software-Partitioned Mode B**, but not with the exclusive or hardware-partitioned modes
- **Hardware-Partitioned Mode** creates fixed partitions that cannot coexist with other partitioning modes

The constraint is bidirectional: if partition mode A excludes partition mode B, then allocating A must prevent B from being allocated, and vice versa.
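
The constraints listed above can be modeled as a symmetric relation over modes. The following is a minimal sketch, not part of the proposed API: the mode identifiers are hypothetical, and it assumes that modes A and B are in fact compatible (the text only says they "may be") and that every non-exclusive mode can coexist with itself.

```python
# Hypothetical identifiers for the four modes described above.
EXCLUSIVE = "exclusive"
SW_A = "software-partitioned-a"
SW_B = "software-partitioned-b"
HW = "hardware-partitioned"

# Unordered pairs of modes that may coexist on one physical device.
# Using frozensets makes the relation symmetric by construction.
COMPATIBLE_PAIRS = {
    frozenset({SW_A}),        # assumption: a shared mode coexists with itself
    frozenset({SW_B}),
    frozenset({HW}),
    frozenset({SW_A, SW_B}),  # assumption: A and B are compatible
}


def modes_compatible(a: str, b: str) -> bool:
    """Return True if modes a and b may be active on the same physical device."""
    return frozenset({a, b}) in COMPATIBLE_PAIRS


def allocation_allowed(candidate: str, allocated: list[str]) -> bool:
    """Return True if the candidate mode is compatible with every allocated mode."""
    return all(modes_compatible(candidate, mode) for mode in allocated)
```

Because the pairs are unordered, `modes_compatible(a, b)` and `modes_compatible(b, a)` always agree, which is the bidirectional property described above.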

#### GPU Example
```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
...
spec:
  sharedCounters:
    # This counter set represents a specific physical device.
    - name: gpu-0-cs
      counters:
        multiprocessors:
          value: "152"

  devices:
    # Incompatible with any other partitioning schemes
    - name: gpu-0
      bindsToNode: true
      consumesCounters:
      - counterSet: gpu-0-cs
        counters:
          multiprocessors:
            value: "152"
      attributes:
        partitioningMode:
          string: None

    # Incompatible with MIGSlicing and None partitioning modes,
    # but compatible with MPSSharing mode
    - name: gpu-0-fraction-0
      bindsToNode: true
      allowMultipleAllocations: true
      consumesCounters:
      - counterSet: gpu-0-cs
        counters:
          multiprocessors:
            value: "76"
      capacity:
        ...
      attributes:
        partitioningMode:
          string: GPUFractioning
    
    # Incompatible with any other partitioning modes, only compatible with devices 
    # partitioned with the same mode (MIGSlicing)
    - name: gpu-0-mig-1g.5gb-0 
      bindsToNode: true
      consumesCounters:
      - counterSet: gpu-0-cs
        counters:
          multiprocessors:
            value: "2"
      attributes:
        partitioningMode:
          string: MIGSlicing
    
    # Incompatible with any other partitioning modes, only compatible with devices 
    # partitioned with the same mode (MIGSlicing)
    - name: gpu-0-mig-1g.5gb-1 
      bindsToNode: true
      consumesCounters:
      - counterSet: gpu-0-cs
        counters:
          multiprocessors:
            value: "2"
      attributes:
        partitioningMode:
          string: MIGSlicing
  
    # Incompatible with the MIGSlicing and None
    # partitioning modes, but compatible with GPUFractioning
    - name: gpu-0-mps-0 
      bindsToNode: true
      allowMultipleAllocations: true
      consumesCounters:
      - counterSet: gpu-0-cs
        counters:
          multiprocessors:
            value: "15"
      capacity:
        ...
      attributes:
        partitioningMode:
          string: MPSSharing
```
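
To illustrate why counter accounting alone does not express these constraints, the sketch below sums the `multiprocessors` values from the example above. It is a simplified model, assuming each listed device consumes its counters exactly once; the function name is hypothetical.

```python
# Values copied from the ResourceSlice example above.
COUNTER_SET_TOTAL = 152  # multiprocessors available in gpu-0-cs

CONSUMES = {
    "gpu-0": 152,
    "gpu-0-fraction-0": 76,
    "gpu-0-mig-1g.5gb-0": 2,
    "gpu-0-mig-1g.5gb-1": 2,
    "gpu-0-mps-0": 15,
}


def fits_counters(devices: list[str]) -> bool:
    """Return True if the combined consumption fits into the counter set.

    Simplifying assumption: every listed device consumes its counters once.
    """
    return sum(CONSUMES[name] for name in devices) <= COUNTER_SET_TOTAL


# The full GPU exhausts the counter set, so counters already exclude partitions.
print(fits_counters(["gpu-0", "gpu-0-mig-1g.5gb-0"]))        # False
# A MIG slice and an MPS share fit by counters, although the modes conflict.
print(fits_counters(["gpu-0-mig-1g.5gb-0", "gpu-0-mps-0"]))  # True
```

The second case is the gap this proposal targets: the allocation passes counter accounting and fails only at resource preparation.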

## Proposal 1 - CompatibilityGroups Assignment

### API Changes

Add the `device.consumesCounters[].compatibilityGroups` field, which lists the device groups this device is compatible with.
Any other device that consumes from the same counter set must list at least one group from this list to be considered compatible.

#### Field Structure

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
...
spec:
  sharedCounters:
    # This counter set represents a specific, physical device.
    - name: gpu-1-cs
      counters:
        multiprocessors:
          value: "152"
  devices:
    # Full, physical device. Consumes full counter set `gpu-1-cs`.
    - name: gpu-1
      attributes:
        type:
          string: gpu
      consumesCounters:
        - counterSet: gpu-1-cs
          counters:
            multiprocessors:
              value: "152"

    # MIG partition. This cannot be allocated
    # - when device `gpu-1` is allocated
    #   (reason: counters exhausted)
    # - when device `gpu-1-foo-part` is allocated
    #   (reason: mismatching compatibilityGroups)
    - name: gpu-1-mig1
      attributes:
        type:
          string: mig
      consumesCounters:
        - counterSet: gpu-1-cs
          # Can only consume from the same counter set when
          # all existing consumers also list compatibilityGroup "mig".
          compatibilityGroups:
            - mig
          counters:
            multiprocessors:
              value: "2"


    # FOO partition. This cannot be allocated
    # - when device `gpu-1` is allocated
    #   (reason: counters exhausted).
    # - when device `gpu-1-mig1` is allocated
    #   (reason: mismatching compatibilityGroups).
    #
    # This can generally still be allocated
    # - when `gpu-1-bar-part` is allocated
    #   (reason: shared compatibilityGroups "bar").
    #
    # The relationship between the foo and bar type
    # partitions on the same physical device is
    # modeled by counter consumption.
    - name: gpu-1-foo-part
      attributes:
        type:
          string: foo
      consumesCounters:
        - counterSet: gpu-1-cs
          compatibilityGroups:
            - foo
            - bar
          counters:
            multiprocessors:
              value: "17"

    # BAR partition. Similar considerations as
    # described for FOO partition.
    - name: gpu-1-bar-part
      attributes:
        type:
          string: bar
      consumesCounters:
        - counterSet: gpu-1-cs
          compatibilityGroups:
            - bar
          counters:
            multiprocessors:
              value: "2"
```

### Semantics

#### Device Groupings

1. **Group Declaration**: Devices must declare which groups they are compatible with; a device that declares no groups is assumed to be compatible with all groups.

2. **Scope**: Grouping rules apply:
   - To all devices within a device class that specify `compatibilityGroups`
   - Across all resource claims

3. **Scheduler Enforcement**: The scheduler must:
   - Evaluate compatibility-group constraints during device selection
   - Skip device candidates that would conflict with existing allocations
   - Track allocated devices and their compatibility groups
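
The group semantics can be sketched as follows. This is an illustrative model, not the scheduler implementation: the `Consumer` type and function names are hypothetical, and the pairwise rule (two consumers of the same counter set must share at least one group) is one reading of the API description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Consumer:
    """Hypothetical model of one device's consumesCounters entry."""
    name: str
    counter_set: str
    multiprocessors: int
    # None models an omitted compatibilityGroups field (compatible with all groups).
    groups: Optional[frozenset] = None


def groups_compatible(a: Consumer, b: Consumer) -> bool:
    """Two consumers of the same counter set must share at least one group."""
    if a.counter_set != b.counter_set:
        return True  # groups only constrain consumers of the same counter set
    if a.groups is None or b.groups is None:
        return True  # an undeclared side is compatible with all groups
    return bool(a.groups & b.groups)


def can_allocate(candidate: Consumer, allocated: list, capacity: int) -> bool:
    """Check counter capacity first, then group compatibility."""
    same_set = [d for d in allocated if d.counter_set == candidate.counter_set]
    used = sum(d.multiprocessors for d in same_set)
    if used + candidate.multiprocessors > capacity:
        return False  # counters exhausted
    return all(groups_compatible(candidate, d) for d in same_set)


# Devices from the gpu-1 example above.
GPU1 = Consumer("gpu-1", "gpu-1-cs", 152)
MIG1 = Consumer("gpu-1-mig1", "gpu-1-cs", 2, frozenset({"mig"}))
FOO = Consumer("gpu-1-foo-part", "gpu-1-cs", 17, frozenset({"foo", "bar"}))
BAR = Consumer("gpu-1-bar-part", "gpu-1-cs", 2, frozenset({"bar"}))
```

With these definitions, `can_allocate(FOO, [BAR], 152)` holds because both list `bar`, while `can_allocate(MIG1, [FOO], 152)` does not because the group sets are disjoint.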

## Proposal 2 - Attribute-based Compatibility with CEL

### API Changes

Add an optional `compatibleOnlyWith` field to device objects within the ResourceSlice specification. This field allows devices to declare which other devices can be allocated alongside them.
If the field is not provided, the device is deemed compatible with all other devices, which preserves backward compatibility.

#### Field Structure

```yaml
devices:
- name: device-name
  # ... existing device fields ...

  # New field: compatibleOnlyWith
  # Specifies a CEL expression that the scheduler filters devices with when attempting
  # a device allocation.
  # This field is optional. If not specified, the device has no compatibility constraints.
  compatibleOnlyWith:
    expression: "cel exp"
```

### Semantics

#### Exclusion Rules

1. **Mutual Exclusivity**: If device A specifies a compatibility expression, the scheduler must:
   - Evaluate the expression against already allocated devices when device A is considered for allocation
   - Evaluate the expression against each device considered for allocation once device A is already allocated

2. **Scope**: Compatibility expressions apply:
   - To all devices within a device class
   - Across all resource claims

#### Example Exclusion Patterns

**Pattern 1: Device-Level Exclusivity**
```yaml
- name: device-full
  attributes:
    physicalDevice:
      string: dev-0
  # Excludes all devices whose underlying physical device is dev-0
  compatibleOnlyWith:
    expression: 'device.attributes["device.example.com"].physicalDevice != "dev-0"'
```

**Pattern 2: Mode-Based Exclusivity**
```yaml
- name: dev-0-partition-1
  attributes:
    physicalDevice:
      string: dev-0
    mode:
      string: hardware-partitioned
  # Only compatible with specific partitioning modes (here: hardware-partitioned)
  compatibleOnlyWith:
    expression: 'device.attributes["device.example.com"].mode == "hardware-partitioned"'
```
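
The bidirectional evaluation and the two patterns can be modeled roughly as below. This is a sketch under stated assumptions: plain Python predicates stand in for CEL expressions, and the `Device` type, helper names, and the devices `partition_2` and `shared_0` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Device:
    """Hypothetical model of a device with an optional compatibility rule."""
    name: str
    attrs: dict
    # Stand-in for compatibleOnlyWith.expression; None means no constraint.
    compatible_only_with: Optional[Callable[[dict], bool]] = None


def accepts(owner: Device, other: Device) -> bool:
    """Evaluate the owner's expression against the other device's attributes."""
    rule = owner.compatible_only_with
    return rule is None or rule(other.attrs)


def can_allocate(candidate: Device, allocated: list) -> bool:
    """Both directions must hold for every already allocated device."""
    return all(accepts(candidate, d) and accepts(d, candidate) for d in allocated)


# Pattern 1: device-level exclusivity.
device_full = Device(
    "device-full",
    {"physicalDevice": "dev-0"},
    lambda other: other.get("physicalDevice") != "dev-0",
)

# Pattern 2: mode-based exclusivity.
hw_only = lambda other: other.get("mode") == "hardware-partitioned"
hw_attrs = {"physicalDevice": "dev-0", "mode": "hardware-partitioned"}
partition_1 = Device("dev-0-partition-1", dict(hw_attrs), hw_only)
partition_2 = Device("dev-0-partition-2", dict(hw_attrs), hw_only)  # hypothetical sibling

# Hypothetical device that declares no expression of its own.
shared_0 = Device("dev-0-shared-0", {"physicalDevice": "dev-0", "mode": "software-partitioned"})
```

Note that `can_allocate(shared_0, [partition_1])` is false even though `shared_0` has no expression: the expression of the already allocated device is evaluated against the candidate, which is the second rule under Mutual Exclusivity.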

## Proposal Comparison
**Attribute-based Compatibility with CEL**
- **Higher degree of freedom**:
  - Device compatibility can be defined in a multi-dimensional way, not only physical device placement
  - Can be extended to support additional use cases in the future (maybe across device classes?)

**CompatibilityGroups Assignment**
- **Cleaner and simpler implementation** - Minimal additions to the API and codebase that solve the problem at hand

## Implementation Considerations

### Scheduler Changes

The DRA scheduler plugin must be enhanced to:

1. **Track Allocated Devices**: Maintain a per-node cache of allocated devices with their attributes and their compatibility expressions or group mappings
2. **Evaluate Exclusions**: For each candidate device:
   - Check if all allocated devices are compatible with this candidate
   - Check if this candidate is compatible with all allocated devices
3. **Filter Candidates**: Remove devices from consideration if they violate compatibility constraints
4. **Handle Allocation Failures**: If a device cannot be allocated because of a compatibility constraint, provide clear feedback in scheduling events
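
The four steps above could fit together roughly as follows. This is only a sketch: `NodeAllocationCache`, `MODES`, and `same_mode` are hypothetical names, not existing scheduler types, and the compatibility check is passed in as a callback so that either proposal could back it.

```python
from typing import Callable, Dict, List, Tuple


class NodeAllocationCache:
    """Hypothetical per-node record of allocated devices."""

    def __init__(self, compatible: Callable[[str, str], bool]) -> None:
        # compatible(a, b) answers: does device a accept device b?
        self._compatible = compatible
        self._allocated: Dict[str, List[str]] = {}

    def record(self, node: str, device: str) -> None:
        self._allocated.setdefault(node, []).append(device)

    def release(self, node: str, device: str) -> None:
        self._allocated[node].remove(device)

    def filter_candidates(
        self, node: str, candidates: List[str]
    ) -> Tuple[List[str], Dict[str, str]]:
        """Split candidates into allowed devices and rejections with a reason."""
        allowed: List[str] = []
        rejected: Dict[str, str] = {}
        for cand in candidates:
            # Check both directions against every device allocated on the node.
            conflicts = [
                dev for dev in self._allocated.get(node, [])
                if not (self._compatible(cand, dev) and self._compatible(dev, cand))
            ]
            if conflicts:
                rejected[cand] = "incompatible with allocated device(s): " + ", ".join(conflicts)
            else:
                allowed.append(cand)
        return allowed, rejected


# Hypothetical device-to-mode mapping used only for this illustration.
MODES = {"gpu-0-mig-0": "MIGSlicing", "gpu-0-mig-1": "MIGSlicing", "gpu-0-mps-0": "MPSSharing"}


def same_mode(a: str, b: str) -> bool:
    return MODES[a] == MODES[b]
```

The rejection reasons returned by `filter_candidates` are the kind of text that could be surfaced in scheduling events instead of a later preparation failure.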

### Driver Responsibilities

Resource drivers should:

1. **Declare Constraints**: Populate `compatibleOnlyWith` or `compatibilityGroups` for all devices with compatibility requirements
2. **Validation**: Ensure compatibility rules are symmetric and consistent across devices
3. **Documentation**: Document their compatibility matrix
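
A driver could check the symmetry requirement before publishing a ResourceSlice. The sketch below assumes the driver keeps its compatibility matrix as a mapping from mode to the set of modes it declares compatible; `find_asymmetries` is a hypothetical helper, and the self-compatibility entries in the sample matrix are assumptions.

```python
def find_asymmetries(matrix: dict) -> list:
    """Return sorted (a, b) pairs where a lists b as compatible but b does not list a."""
    problems = []
    for mode, partners in matrix.items():
        for partner in partners:
            if mode not in matrix.get(partner, set()):
                problems.append((mode, partner))
    return sorted(problems)


# Matrix derived from the comments in the GPU example; self-entries are assumed.
GPU_MATRIX = {
    "None": set(),
    "GPUFractioning": {"GPUFractioning", "MPSSharing"},
    "MPSSharing": {"MPSSharing", "GPUFractioning"},
    "MIGSlicing": {"MIGSlicing"},
}

# A deliberately broken copy: MPSSharing no longer lists GPUFractioning.
BROKEN_MATRIX = dict(GPU_MATRIX, MPSSharing={"MPSSharing"})

print(find_asymmetries(GPU_MATRIX))     # []
print(find_asymmetries(BROKEN_MATRIX))  # [('GPUFractioning', 'MPSSharing')]
```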

### Backward Compatibility

- Both approaches are opt-in
- Devices without `compatibleOnlyWith` or `compatibilityGroups` behave exactly as they do today
- No changes to existing API fields or semantics
- Older schedulers will ignore the new field but may allocate incompatible devices (same as current behavior)
