Hardware Enablement Phases
==

Aug 12, 2020
Last modified: Oct 27, 2020
By [Zvonko Kaiser](https://blog.openshift.com/author/zkosic/)

:::info
:bell: **Hint:** Before deploying any accelerator stack on a container platform, it is worthwhile to first test the toolchain and drivers in containers on bare metal.
:::

## 1. Out-of-tree Kernel Modules

Depending on whether the driver is inbox or out-of-tree, it may be necessary to build a DriverContainer. What DriverContainers are and how they are used is described here: [red.ht/2XyOKz6](http://red.ht/2XyOKz6)

Part 1: How to Enable Hardware Accelerators on OpenShift goes into detail on how DriverContainers are used to enable out-of-tree drivers on OpenShift. [red.ht/2JQuNwB](http://red.ht/2JQuNwB)

DriverContainers are also used to bridge the gap until out-of-tree drivers become inbox drivers.

## 2. Container/Runtime Enablement - Bare Metal

Before deploying any accelerator stack on a container platform, it is worthwhile to first test the toolchain and drivers in containers on bare metal. There are currently two ways to add the needed devices, libraries, and binaries to a container: (1) all the needed files are installed into the container, or (2) a prestart hook is used to mount the needed files from another source (host, container, volume).

:::info
:bell: **Hint:** Define a strategy for how the needed files are going to be made available in a container.
:::

Prestart hooks are often used to provide the needed files from a single source rather than installing them in every container, where versions may diverge.

## 3. SELinux - Bare Metal

Mounting files from different sources or creating devices, files, etc. (udev, copying, mounting) can introduce SELinux labels that are not compatible with the container context. Different containers and the host cannot interact _directly_ with each other since they are contained.

:::info
:bell: **Hint:** Check for any AVCs and permission errors for all of your files and processes interacting with each other.
:::

## 4. Bootstrap Heterogeneity - Kubernetes

In a heterogeneous cluster, users/admins do not want to label the nodes manually. OpenShift introduced Node Feature Discovery (NFD) to label nodes according to the features that a node exposes (CPU flags, PCI devices, SR-IOV, etc.).
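As a sketch of what this enables: once NFD has labeled a node for a detected PCI device, a pod can be pinned to matching nodes with a plain `nodeSelector`. The exact label key depends on the NFD version and configuration, and the vendor ID `10de` and image below are purely illustrative:

```yaml
# Hypothetical pod that only schedules on nodes where NFD has
# detected a PCI device of vendor 10de (illustrative vendor ID;
# check the labels your NFD deployment actually produces).
apiVersion: v1
kind: Pod
metadata:
  name: accelerator-test
spec:
  nodeSelector:
    feature.node.kubernetes.io/pci-10de.present: "true"
  containers:
  - name: test
    image: registry.example.com/accelerator-test:latest  # placeholder image
```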
For a full list of all feature sources, have a look at: [https://github.com/kubernetes-sigs/node-feature-discovery](https://github.com/kubernetes-sigs/node-feature-discovery)

:::info
:bell: **Hint:** PCI devices are automatically detected if they expose the vendor and class ID. NFD can also be configured to look for other PCI attributes.
:::

If none of the feature sources is currently suitable for detecting the hardware, NFD also implements a simple rule system to label the nodes according to the loaded kernel modules. If you need more functionality, create a PR and we can review it and add it to NFD.

Pods, DaemonSets, and other resources can now be scheduled on the right node, i.e. the one that exposes the right accelerator.

![](https://i.imgur.com/I78bNUO.png)

## 5. Minimal Stack - Kubernetes

The minimal stack for OpenShift looks like the following.

![](https://i.imgur.com/oCMEFOU.png)

| Requirement | Description |
| -------- | -------- |
| REQ1000 | NFD is able to recognize and expose the hardware via a node label: `kubectl describe node \| grep pci-<vendor-id>` |
| REQ2000 | A DriverContainer needs to be built to enable the hardware. |
| REQ3000 | After enablement, we run a validation pod: a user-space library reads HW settings or executes a small workload, testing the correct functionality of the driver and HW. |
| REQ4000 | A DevicePlugin exposes the hardware as an extended resource to the cluster. The DevicePlugin only exposes healthy cards to the cluster, hence a user-space library needs to provide an interface to query card health: `[Healthy \| NotHealthy]` |
| REQ5000 | A small workload allocating the extended resource and running a small job, or a user-space library reading HW settings. |

## 6. Advanced Stack - Kubernetes

The advanced stack adds metrics and feature discovery.
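To make the minimal stack (REQ4000/REQ5000 above) concrete, the validation workload could be a pod requesting one unit of the extended resource advertised by the DevicePlugin. The resource name `vendor.example.com/accelerator` and the image are placeholders, not real identifiers:

```yaml
# Hypothetical validation pod for REQ5000: requests one unit of the
# extended resource exposed by the DevicePlugin (placeholder name).
apiVersion: v1
kind: Pod
metadata:
  name: accelerator-validation
spec:
  restartPolicy: Never
  containers:
  - name: validate
    image: registry.example.com/accelerator-validate:latest  # placeholder image
    resources:
      limits:
        vendor.example.com/accelerator: 1
```

The scheduler will only place this pod on a node where the DevicePlugin reports at least one healthy unit of the resource, so a successful run validates the whole chain: driver, DevicePlugin, and scheduling.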
![](https://i.imgur.com/DgVRTit.png)

| Requirement | Description |
| -------- | -------- |
| REQ6000 | Expose hardware metrics via a custom Prometheus exporter on a specific port; the port must be > 9000, otherwise OpenShift will deny scraping from it. SRO handles the ServiceMonitor, Service, Namespace annotation, and other configuration to plug the custom exporter into the OpenShift monitoring stack. |
| REQ7000 | A custom Grafana dashboard for the custom hardware that can be deployed with SRO. SRO will set up a custom Grafana and plug it into the OpenShift monitoring stack. |
| REQ8000 | Advanced feature discovery: build a sidecar container that exposes new features as labels that are only available after the driver is loaded. Userspace can now extract more information (firmware level, compute units, memory, etc.), which can be used for advanced scheduling decisions. |

## 7. Full Stack - Kubernetes

![](https://i.imgur.com/hOJmSt2.png)

| Requirement | Description |
| -------- | -------- |
| REQ9000 | Topology Manager support |
| REQ10000 | Cluster and Horizontal Pod Autoscaler (Vertical Pod Autoscaler?) |
| REQ11000 | NUMA-aware scheduling |
| REQ12000 | Quotas |
| REQ13000 | Prebuilt DriverContainers |
| REQ14000 | Separate data plane (Multus & high-speed interconnects) |
| REQ15000 | Proxy support |

## Appendix

![](https://i.imgur.com/3lIIiGJ.png)

Accelerators on Kubernetes, OpenShift & OKD [https://bit.ly/3oejOA0](https://bit.ly/3oejOA0)

Part 1: How to Enable Hardware Accelerators on OpenShift [red.ht/2JQuNwB](http://red.ht/2JQuNwB)

Part 2: How to Enable Hardware Accelerators on OpenShift, SRO Building Blocks [red.ht/34ubzq3](http://red.ht/34ubzq3)

How to Use Entitled Image Builds to Build DriverContainers with UBI on OpenShift [red.ht/2XyOKz6](http://red.ht/2XyOKz6)