# NVIDIA - DTK + CentOS Stream Userspace

## ClusterVersion

```bash=
$ export CLUSTERVERSION=$(oc get clusterversion version -o json | jq -r .status.desired.version)
# 4.8.0-fc.8
```

## DTK image URL

```bash=
$ oc adm release info "${CLUSTERVERSION}" --image-for=driver-toolkit
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d07d95029663561dc58560751936dc9569bd77a397206e80fb5ab8778a56d920
```

## DriverContainer with precompiled driver packages

```dockerfile=
FROM quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d07d95029663561dc58560751936dc9569bd77a397206e80fb5ab8778a56d920

# NVIDIA CUDA repository for RHEL 8: nvidia-driver module streams and precompiled kmod packages
RUN yum config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo

# Kernel and RHEL versions this DTK image matches
RUN cat /etc/driver-toolkit-release.json
# { "KERNEL_VERSION": "4.18.0-305.3.1.el8_4.x86_64", "RT_KERNEL_VERSION": "4.18.0-305.3.1.rt7.75.el8_4.x86_64", "RHEL_VERSION": "8.4" }

# CentOS Stream 8 GPG keys and repositories for the userspace dependencies
# (both are noarch packages, so the aarch64 URL installs fine on x86_64)
RUN rpm -ivh http://mirror.centos.org/centos/8-stream/BaseOS/x86_64/os/Packages/centos-gpg-keys-8-2.el8.noarch.rpm
RUN rpm -ivh http://mirror.centos.org/centos/8-stream/BaseOS/aarch64/os/Packages/centos-stream-repos-8-2.el8.noarch.rpm

RUN dnf -y module install --setopt=install_weak_deps=False nvidia-driver:465/default
# The nvidia-driver:465/fm profile fails (see below), so install Fabric Manager directly
RUN dnf -y install nvidia-fabricmanager-465

# Enable the kmods-via-containers template unit for the nvidia-driver instance
RUN systemctl enable kmods-via-containers@nvidia-driver
```

Installing the Fabric Manager profile with `dnf module install --setopt=install_weak_deps=False nvidia-driver:465/fm` does not work: the install looks for the old package name `nvidia-fabric-manager`, but the package is now called `nvidia-fabricmanager`. Perhaps the NVIDIA dnf plugin needs an update. As a workaround, the Dockerfile installs the `default` profile and then `nvidia-fabricmanager-465` on its own.

Because the unit is enabled with `systemctl` at build time, a Pod that runs systemd (`/sbin/init`) as its entrypoint starts the unit, which loads the appropriate kernel modules. The DTK image already ships the template unit that we also use for day-1 installations of kmods (https://github.com/kmods-via-containers/kmods-via-containers). A configuration file sets instance-specific values, such as which kernel modules to load.
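A minimal sketch of such a configuration file, assuming the shell-variable format of the upstream `kvc-simple-kmod` example, where the file lives at `/etc/kvc/<name>.conf`. The path `/etc/kvc/nvidia-driver.conf`, the variable names, and the module list are assumptions, not read from the DTK image; only the driver version comes from the transaction in the appendix.

```bash
# /etc/kvc/nvidia-driver.conf (hypothetical path and contents)
# Variable names are modeled on the kvc-simple-kmod example config; verify them
# against the kmods-via-containers library before relying on them.

# Driver version, as installed by the nvidia-driver:465 module (see APPENDIX)
KMOD_SOFTWARE_VERSION=465.19.01

# Kernel modules to load; nvidia first, since the other two depend on it
# (assumed list)
KMOD_NAMES="nvidia nvidia-modeset nvidia-uvm"
```

Because the packages are precompiled, the file only needs to say what to load, not how to build it.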
The Pod runs systemd as its entrypoint and stops the unit before the container terminates:

```yaml=
containers:
- image: nvidia-driver:<TAG>
  name: nvidia-driver
  imagePullPolicy: Always
  command: ["/sbin/init"]
  lifecycle:
    preStop:
      exec:
        command: ["/bin/sh", "-c", "systemctl stop kmods-via-containers@nvidia-driver"]
  securityContext:
    privileged: true
```

This lets us reuse `systemd` features for the pre-start, start, reload, and stop phases. The enabled unit maps each phase to a `kmods-via-containers` subcommand (`build`, `load`, `reload`, `unload`) for the instance name `%i` and the running kernel release `%v`, so the `preStop` hook above unloads the modules through `ExecStop`:

```bash=
# cat ./system/default.target.wants/kmods-via-containers@nvidia-driver.service
[Unit]
Description=Kmods Via Containers - %i
# Start after the network is up
Wants=network-online.target
After=network-online.target
# Also after docker.service (no effect on systems without docker)
After=docker.service
# Before kubelet.service (no effect on systems without kubernetes)
Before=kubelet.service
# But before users are allowed to login
Before=systemd-user-sessions.service

[Service]
Type=oneshot
TimeoutStartSec=25m
RemainAfterExit=true
# Use bash to workaround https://github.com/coreos/rpm-ostree/issues/1936
ExecStartPre=/usr/bin/bash -c "kmods-via-containers build %i %v"
ExecStart=/usr/bin/bash -c "kmods-via-containers load %i %v"
ExecReload=/usr/bin/bash -c "kmods-via-containers reload %i %v"
ExecStop=/usr/bin/bash -c "kmods-via-containers unload %i %v"
StandardOutput=journal+console

[Install]
WantedBy=default.target
```

## APPENDIX

Transaction of the `nvidia-driver:465/default` module install:

```bash=
Last metadata expiration check: 0:32:36 ago on Mon Jul 26 10:24:47 2021.
Dependencies resolved.
====================================================================================================================
 Package                               Architecture  Version                                Repository          Size
====================================================================================================================
Installing group/module packages:
 cuda-drivers                          x86_64        465.19.01-1                            cuda-rhel8-x86_64   7.0 k
 nvidia-driver                         x86_64        3:465.19.01-1.el8                      cuda-rhel8-x86_64   20 M
 nvidia-driver-NVML                    x86_64        3:465.19.01-1.el8                      cuda-rhel8-x86_64   495 k
 nvidia-driver-NvFBCOpenGL             x86_64        3:465.19.01-1.el8                      cuda-rhel8-x86_64   113 k
 nvidia-driver-cuda                    x86_64        3:465.19.01-1.el8                      cuda-rhel8-x86_64   322 k
 nvidia-driver-cuda-libs               x86_64        3:465.19.01-1.el8                      cuda-rhel8-x86_64   28 M
 nvidia-driver-devel                   x86_64        3:465.19.01-1.el8                      cuda-rhel8-x86_64   12 k
 nvidia-driver-libs                    x86_64        3:465.19.01-1.el8                      cuda-rhel8-x86_64   140 M
 nvidia-kmod-common                    noarch        3:465.19.01-1.el8                      cuda-rhel8-x86_64   10 k
 nvidia-libXNVCtrl                     x86_64        3:465.19.01-1.el8                      cuda-rhel8-x86_64   51 k
 nvidia-libXNVCtrl-devel               x86_64        3:465.19.01-1.el8                      cuda-rhel8-x86_64   55 k
 nvidia-modprobe                       x86_64        3:465.19.01-1.el8                      cuda-rhel8-x86_64   74 k
 nvidia-persistenced                   x86_64        3:465.19.01-1.el8                      cuda-rhel8-x86_64   98 k
 nvidia-settings                       x86_64        3:465.19.01-1.el8                      cuda-rhel8-x86_64   1.8 M
 nvidia-xconfig                        x86_64        3:465.19.01-1.el8                      cuda-rhel8-x86_64   262 k
Installing dependencies:
 abattis-cantarell-fonts               noarch        0.0.25-6.el8                           appstream           156 k
 adwaita-cursor-theme                  noarch        3.28.0-2.el8                           appstream           647 k
 adwaita-icon-theme                    noarch        3.28.0-2.el8                           appstream           11 M
 at-spi2-atk                           x86_64        2.26.2-1.el8                           appstream           89 k
 at-spi2-core                          x86_64        2.28.0-1.el8                           appstream           169 k
 atk                                   x86_64        2.28.1-1.el8                           appstream           272 k
 avahi-libs                            x86_64        0.7-20.el8                             baseos              62 k
 cairo                                 x86_64        1.15.12-3.el8                          appstream           721 k
 cairo-gobject                         x86_64        1.15.12-3.el8                          appstream           33 k
 colord-libs                           x86_64        1.4.2-1.el8                            appstream           236 k
 cups-libs                             x86_64        1:2.2.6-40.el8                         baseos              433 k
 dnf-plugin-nvidia                     noarch        2.0-1.el8                              cuda-rhel8-x86_64   12 k
 egl-wayland                           x86_64        1.1.7-1.el8                            appstream           34 k
 file                                  x86_64        5.33-16.el8_3.1                        ubi-8-baseos        77 k
 fontconfig                            x86_64        2.13.1-3.el8                           baseos              275 k
 fontpackages-filesystem               noarch        1.44-22.el8                            baseos              16 k
 freetype                              x86_64        2.9.1-4.el8_3.1                        baseos              394 k
 fribidi                               x86_64        1.0.4-8.el8                            appstream           89 k
 gdk-pixbuf2                           x86_64        2.36.12-5.el8                          baseos              467 k
 gdk-pixbuf2-modules                   x86_64        2.36.12-5.el8                          appstream           109 k
 gettext                               x86_64        0.19.8.1-17.el8                        baseos              1.1 M
 gettext-libs                          x86_64        0.19.8.1-17.el8                        baseos              314 k
 glib-networking                       x86_64        2.56.1-1.1.el8                         baseos              155 k
 graphite2                             x86_64        1.3.10-10.el8                          appstream           122 k
 grub2-common                          noarch        1:2.02-99.el8                          baseos              890 k
 grub2-tools                           x86_64        1:2.02-99.el8                          baseos              2.0 M
 grub2-tools-minimal                   x86_64        1:2.02-99.el8                          baseos              209 k
 grubby                                x86_64        8.40-41.el8                            baseos              49 k
 gsettings-desktop-schemas             x86_64        3.32.0-5.el8                           baseos              633 k
 gtk-update-icon-cache                 x86_64        3.22.30-6.el8                          appstream           32 k
 gtk3                                  x86_64        3.22.30-6.el8                          appstream           4.5 M
 harfbuzz                              x86_64        1.7.5-3.el8                            appstream           295 k
 hicolor-icon-theme                    noarch        0.17-2.el8                             appstream           49 k
 jansson                               x86_64        2.11-3.el8                             baseos              46 k
 jasper-libs                           x86_64        2.0.14-5.el8                           appstream           167 k
 jbigkit-libs                          x86_64        2.1-14.el8                             appstream           55 k
 kmod-nvidia-465.19.01-4.18.0-305.3.1  x86_64        3:465.19.01-3.el8_4                    cuda-rhel8-x86_64   25 M
 lcms2                                 x86_64        2.9-2.el8                              appstream           165 k
 libX11                                x86_64        1.6.8-4.el8                            appstream           611 k
 libX11-common                         noarch        1.6.8-4.el8                            appstream           158 k
 libX11-devel                          x86_64        1.6.8-4.el8                            appstream           976 k
 libX11-xcb                            x86_64        1.6.8-4.el8                            appstream           14 k
 libXau                                x86_64        1.0.9-3.el8                            appstream           37 k
 libXau-devel                          x86_64        1.0.9-3.el8                            appstream           21 k
 libXcomposite                         x86_64        0.4.4-14.el8                           appstream           28 k
 libXcursor                            x86_64        1.1.15-3.el8                           appstream           36 k
 libXdamage                            x86_64        1.1.4-14.el8                           appstream           27 k
 libXdmcp                              x86_64        1.1.3-1.el8                            appstream           41 k
 libXext                               x86_64        1.3.4-1.el8                            appstream           45 k
 libXfixes                             x86_64        5.0.3-7.el8                            appstream           25 k
 libXfont2                             x86_64        2.0.3-2.el8                            appstream           149 k
 libXft                                x86_64        2.3.3-1.el8                            appstream           67 k
 libXi                                 x86_64        1.7.10-1.el8                           appstream           49 k
 libXinerama                           x86_64        1.1.4-1.el8                            appstream           16 k
 libXrandr                             x86_64        1.5.2-1.el8                            appstream           34 k
 libXrender                            x86_64        0.9.10-7.el8                           appstream           33 k
 libXtst                               x86_64        1.2.3-7.el8                            appstream           22 k
 libXxf86vm                            x86_64        1.1.4-9.el8                            appstream           19 k
 libcroco                              x86_64        0.6.12-4.el8_2.1                       baseos              113 k
 libdatrie                             x86_64        0.2.9-7.el8                            appstream           33 k
 libdrm                                x86_64        2.4.106-2.el8                          appstream           167 k
 libepoxy                              x86_64        1.5.8-1.el8                            appstream           225 k
 libevdev                              x86_64        1.10.0-1.el8                           appstream           44 k
 libfontenc                            x86_64        1.1.3-8.el8                            appstream           37 k
 libglvnd                              x86_64        1:1.3.2-1.el8                          appstream           127 k
 libglvnd-egl                          x86_64        1:1.3.2-1.el8                          appstream           49 k
 libglvnd-gles                         x86_64        1:1.3.2-1.el8                          appstream           40 k
 libglvnd-glx                          x86_64        1:1.3.2-1.el8                          appstream           137 k
 libglvnd-opengl                       x86_64        1:1.3.2-1.el8                          appstream           47 k
 libgomp                               x86_64        8.5.0-3.el8                            baseos              206 k
 libgudev                              x86_64        232-4.el8                              baseos              33 k
 libgusb                               x86_64        0.3.0-1.el8                            baseos              49 k
 libinput                              x86_64        1.16.3-2.el8                           appstream           217 k
 libjpeg-turbo                         x86_64        1.5.3-12.el8                           appstream           157 k
 libmodman                             x86_64        2.0.1-17.el8                           baseos              36 k
 libpciaccess                          x86_64        0.14-1.el8                             baseos              32 k
 libpng                                x86_64        2:1.6.34-5.el8                         baseos              126 k
 libproxy                              x86_64        0.4.15-5.2.el8                         baseos              75 k
 libsoup                               x86_64        2.62.3-2.el8                           baseos              424 k
 libthai                               x86_64        0.1.27-2.el8                           appstream           203 k
 libtiff                               x86_64        4.0.9-20.el8                           appstream           188 k
 libvdpau                              x86_64        1.4-2.el8                              appstream           41 k
 libwacom                              x86_64        1.6-3.el8                              appstream           42 k
 libwacom-data                         noarch        1.6-3.el8                              appstream           104 k
 libwayland-client                     x86_64        1.19.0-1.el8                           appstream           39 k
 libwayland-cursor                     x86_64        1.19.0-1.el8                           appstream           26 k
 libwayland-egl                        x86_64        1.19.0-1.el8                           appstream           19 k
 libwayland-server                     x86_64        1.19.0-1.el8                           appstream           47 k
 libxcb                                x86_64        1.13.1-1.el8                           appstream           229 k
 libxcb-devel                          x86_64        1.13.1-1.el8                           appstream           1.1 M
 libxkbcommon                          x86_64        0.9.1-1.el8                            appstream           116 k
 libxkbfile                            x86_64        1.1.0-1.el8                            appstream           88 k
 libxshmfence                          x86_64        1.3-2.el8                              appstream           13 k
 llvm-libs                             x86_64        12.0.0-1.module_el8.5.0+840+21214faf   appstream           23 M
 mesa-libEGL                           x86_64        21.1.3-1.el8.0.1                       appstream           135 k
 mesa-libGL                            x86_64        21.1.3-1.el8.0.1                       appstream           184 k
 mesa-libgbm                           x86_64        21.1.3-1.el8.0.1                       appstream           57 k
 mesa-libglapi                         x86_64        21.1.3-1.el8.0.1                       appstream           66 k
 mesa-vulkan-drivers                   x86_64        21.1.3-1.el8.0.1                       appstream           6.1 M
 mtdev                                 x86_64        1.1.5-12.el8                           appstream           24 k
 ocl-icd                               x86_64        2.2.12-1.el8                           appstream           51 k
 opencl-filesystem                     noarch        1.0-6.el8                              appstream           8.4 k
 os-prober                             x86_64        1.74-6.el8                             baseos              51 k
 pango                                 x86_64        1.42.4-8.el8                           appstream           297 k
 pixman                                x86_64        0.38.4-1.el8                           appstream           257 k
 rest                                  x86_64        0.8.1-2.el8                            appstream           70 k
 vulkan-loader                         x86_64        1.2.182.0-1.el8_4                      appstream           120 k
 xkeyboard-config                      noarch        2.28-1.el8                             appstream           782 k
 xorg-x11-drv-fbdev                    x86_64        0.5.0-2.el8                            appstream           27 k
 xorg-x11-drv-libinput                 x86_64        0.29.0-1.el8                           appstream           50 k
 xorg-x11-drv-vesa                     x86_64        2.4.0-3.el8                            appstream           31 k
 xorg-x11-proto-devel                  noarch        2020.1-3.el8                           appstream           280 k
 xorg-x11-server-Xorg                  x86_64        1.20.11-2.el8                          appstream           1.5 M
 xorg-x11-server-common                x86_64        1.20.11-2.el8                          appstream           42 k
 xorg-x11-xkb-utils                    x86_64        7.7-28.el8                             appstream           114 k
Installing module profiles:
 nvidia-driver/default
Enabling module streams:
 llvm-toolset                          rhel8
 nvidia-driver                         465

Transaction Summary
====================================================================================================================
Install  130 Packages
```
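The driver packages in this transaction come from `cuda-rhel8-x86_64`, while the long tail of X11, GTK and Mesa libraries comes from the CentOS Stream `appstream` and `baseos` repositories (plus `file` from `ubi-8-baseos`). A small sketch to tally a saved transaction listing per repository; it assumes each package row has the six whitespace-separated fields shown above (name, architecture, version, repository, size, unit), so rows split across two lines are not counted. The function name `count_by_repo` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count packages per repository in a saved dnf transaction listing.
# Assumes one package per line with six whitespace-separated fields:
#   <name> <arch> <version> <repository> <size> <unit>
# Header, summary, module-profile and wrapped lines do not match and are ignored.
count_by_repo() {
  # With no file arguments, awk reads standard input.
  awk 'NF == 6 && ($2 == "x86_64" || $2 == "noarch") { count[$4]++ }
       END { for (repo in count) print repo, count[repo] }' "$@" | sort
}
```

Usage: save the listing to a file (for example `transaction.txt`, a hypothetical name) and run `count_by_repo transaction.txt`, or pipe the text into `count_by_repo`.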