# GPU Documentation

## BIOS Configuration

Make sure virtualization is available. In our case it was already enabled by default.

![](https://i.imgur.com/dJqvRmX.png)

Verify that SR-IOV and ARI are enabled. SR-IOV has to be enabled manually:

- [x] SR-IOV enabled (already done)

It is not clear what to do with the ARI options: ARI is enabled by default, and the manual does not mention **PCIe ARI Enumeration**. For now we have left them as follows, under PCIe/PCI/PnP Configuration:

* PCIe ARI Support => set to Enable
* PCIe ARI Enumeration => left on Auto

![](https://i.imgur.com/j2uiC7E.png)

## Nvidia Driver Installation

### Enable IOMMU in GRUB

In the GRUB configuration file **/etc/default/grub**, add:

```
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"
```

On AMD it would be:

```
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on"
```

Then update GRUB:

```
update-grub
```

### Download the Nvidia software

Go to the software page at https://ui.licensing.nvidia.com/software, search for Ubuntu or Linux and pick the most recent release. In this case it is 14.2/14.3:

![](https://i.imgur.com/2RVCMBo.png)

Extract the .deb/.run found under Host_Drivers in the zip file downloaded from the Nvidia website and upload it to the server:

- List the contents of the zip:

```
$ unzip -l Downloads/NVIDIA-GRID-Ubuntu-KVM-510.85.03-510.85.02-513.46.zip
Archive:  Downloads/NVIDIA-GRID-Ubuntu-KVM-510.85.03-510.85.02-513.46.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
   385361  2022-07-29 11:32   510.85.03-510.85.02-513.46-grid-gpumodeswitch-user-guide.pdf
   749855  2022-07-29 11:38   510.85.03-510.85.02-513.46-grid-licensing-user-guide.pdf
  3372262  2022-07-29 10:55   510.85.03-510.85.02-513.46-grid-software-quick-start-guide.pdf
   440851  2022-07-29 11:14   510.85.03-510.85.02-513.46-grid-vgpu-release-notes-ubuntu.pdf
  6461556  2022-07-29 11:28   510.85.03-510.85.02-513.46-grid-vgpu-user-guide.pdf
   191372  2022-07-29 10:47   510.85.03-510.85.02-513.46-whats-new-vgpu.pdf
        0  2022-08-02 22:35   Guest_Drivers/
636332712  2022-07-31 22:41   Guest_Drivers/513.46_grid_win10_win11_server2019_server2022_64bit_international.exe
331718264  2022-07-31 22:42   Guest_Drivers/nvidia-linux-grid-510_510.85.02_amd64.deb
335906433  2022-07-31 22:42   Guest_Drivers/NVIDIA-Linux-x86_64-510.85.02-grid.run
        0  2022-08-02 22:30   Host_Drivers/
 29919246  2022-07-31 22:42   Host_Drivers/nvidia-vgpu-ubuntu-510_510.85.03_amd64.deb
---------                     -------
1345477912                    12 files
```

- Extract the host driver package:

```
unzip -j Downloads/NVIDIA-GRID-Ubuntu-KVM-510.85.03-510.85.02-513.46.zip Host_Drivers/nvidia-vgpu-ubuntu-510_510.85.03_amd64.deb
```

- Upload the file to the GPU server with scp:

```
scp nvidia-vgpu-ubuntu-510_510.85.03_amd64.deb root@a40.isardvdi.com:/opt/
```

On the GPU server, install it with apt:

```
root@a40:/opt# apt install ./nvidia-vgpu-ubuntu-510_510.85.03_amd64.deb
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'nvidia-vgpu-ubuntu-510' instead of './nvidia-vgpu-ubuntu-510_510.85.03_amd64.deb'
The following packages will be REMOVED:
  nvidia-vgpu-ubuntu-470
The following NEW packages will be installed:
  nvidia-vgpu-ubuntu-510
```

Reboot and verify the version:

```
root@a40:~# nvidia-smi
Sun Oct 23 20:44:47 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.03    Driver Version: 510.85.03    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
```

## GPU Activation

Create a script that enables SR-IOV on every GPU and then starts the Isard containers:

```
root@gpu:~# nano gpus.sh
```

```
#!/bin/bash
# Enable SR-IOV on every GPU reported by nvidia-smi
for i in $(nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader)
do
    /usr/lib/nvidia/sriov-manage -e "$i"
done
# Start the Isard containers
cd /opt/isard/src
docker compose up -d
```

Make it executable:

```
root@gpu:~# chmod +x gpus.sh
```

Create a systemd unit that runs the script:

```
root@gpu:~# nano sriov-manage.service
```

```
[Unit]
Description=Activate gpu
Before=docker.service
After=network.target nvidia-vgpud.service nvidia-vgpu-mgr.service

[Service]
Type=oneshot
ExecStart=/root/gpus.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

Link the unit into systemd, then enable and start it:

```
ln -s /root/sriov-manage.service /etc/systemd/system/sriov-manage.service
systemctl daemon-reload
systemctl enable sriov-manage
systemctl start sriov-manage
```

Check that the service ran correctly:

```
root@gpu:~# systemctl status sriov-manage
● sriov-manage.service - Activate gpu
     Loaded: loaded (/etc/systemd/system/sriov-manage.service; enabled; vendor preset: enabled)
     Active: active (exited) since Thu 2022-10-27 10:46:59 CEST; 9min ago
    Process: 1456 ExecStart=/root/gpus.sh (code=exited, status=0/SUCCESS)
   Main PID: 1456 (code=exited, status=0/SUCCESS)

Oct 27 10:46:58 gpu gpus.sh[2657]: Container isard-api  Healthy
Oct 27 10:46:58 gpu gpus.sh[2657]: Container isard-authentication  Starting
Oct 27 10:46:58 gpu gpus.sh[2657]: Container isard-stats-go  Starting
Oct 27 10:46:58 gpu gpus.sh[2657]: Container isard-vpn  Starting
Oct 27 10:46:58 gpu gpus.sh[2657]: Container isard-webapp  Starting
Oct 27 10:46:59 gpu gpus.sh[2657]: Container isard-stats-go  Started
Oct 27 10:46:59 gpu gpus.sh[2657]: Container isard-authentication  Started
Oct 27 10:46:59 gpu gpus.sh[2657]: Container isard-webapp  Started
Oct 27 10:46:59 gpu gpus.sh[2657]: Container isard-vpn  Started
Oct 27 10:46:59 gpu systemd[1]: Finished Activate gpu.
```
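
Once the `sriov-manage` service has run, it can be useful to confirm that each GPU actually exposes virtual functions. The sketch below is one possible check and is not part of the original procedure. It assumes that `nvidia-smi` prints bus IDs with an 8-digit PCI domain in upper-case hex, while sysfs names devices with a 4-digit domain in lower-case, and that the kernel exposes a `sriov_numvfs` file for the physical function. `to_sysfs_id` is a hypothetical helper name introduced only for this example.

```shell
#!/bin/sh
# Hypothetical helper: convert an nvidia-smi bus id (assumed form 00000000:41:00.0)
# into the form used under /sys/bus/pci/devices (0000:41:00.0).
to_sysfs_id() {
    # Lower-case the hex digits, then drop the first four digits of an 8-digit domain.
    printf '%s\n' "$1" | tr 'A-F' 'a-f' | sed 's/^0000\(....:\)/\1/'
}

# Only query the GPUs when nvidia-smi is installed on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    for id in $(nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader); do
        dev="/sys/bus/pci/devices/$(to_sysfs_id "$id")"
        if [ -r "$dev/sriov_numvfs" ]; then
            echo "$id: $(cat "$dev/sriov_numvfs") virtual functions enabled"
        else
            echo "$id: no sriov_numvfs entry found"
        fi
    done
fi
```

A count of 0, or a missing `sriov_numvfs` entry, would suggest that SR-IOV was not enabled for that GPU, in which case the BIOS settings and the service status above are the first things to recheck.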