# Harvester GPU passthrough setup
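## Nested virtualization setup
* Skip this section if Harvester is installed directly on a physical machine.
* Configure vIOMMU for the Harvester VM.

* On PVE (Proxmox VE), attach the GPU as a PCI device to the Harvester VM.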
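On Proxmox VE, both settings can also be applied from the host shell with `qm set`. This is a minimal sketch that assumes Proxmox VE 8 or later (where the `viommu` machine option exists) and the q35 machine type. The function name, the VM ID `100`, and the host PCI address in the usage line are hypothetical placeholders. The address must be the GPU's address on the PVE host, which generally differs from the `01:00.0` seen inside Harvester. Set `QM=echo` to print the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: enable vIOMMU and attach a GPU to the Harvester VM on Proxmox VE.
# Assumptions: Proxmox VE 8+, q35 machine type. The VM ID and PCI address
# passed in are hypothetical and must be replaced with your own values.
QM=${QM:-qm}   # set QM=echo for a dry run

setup_harvester_vm_gpu() {
  local vmid=$1 host_pci=$2
  if [ -z "$vmid" ] || [ -z "$host_pci" ]; then
    echo "usage: setup_harvester_vm_gpu VMID HOST_PCI_ADDR" >&2
    return 2
  fi
  # Expose an emulated Intel IOMMU to the guest so Harvester can use vfio-pci
  "$QM" set "$vmid" --machine q35,viommu=intel || return
  # Pass the physical GPU through as a PCIe device (pcie=1 requires q35)
  "$QM" set "$vmid" --hostpci0 "$host_pci",pcie=1
}
```

Usage with hypothetical values: `setup_harvester_vm_gpu 100 0000:01:00.0`, then restart the VM. PVE also accepts `viommu=virtio`. `intel` is used here because the Harvester kernel command line shown in the next section enables `intel_iommu=on`.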

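## Environment check
* On the Harvester node, confirm that the GPU is detected, that IOMMU is enabled on the kernel command line, that the GPU is bound to `vfio-pci`, and which IOMMU group it belongs to: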
```
$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2805 (rev a1)
$ cat /proc/cmdline
BOOT_IMAGE=(loop0)/boot/vmlinuz console=tty1 root=LABEL=COS_STATE cos-img/filename=/cOS/active.img panic=0 net.ifnames=1 rd.cos.oemlabel=COS_OEM rd.cos.mount=LABEL=COS_OEM:/oem rd.cos.mount=LABEL=COS_PERSISTENT:/usr/local rd.cos.oemtimeout=120 audit=1 audit_backlog_limit=8192 intel_iommu=on amd_iommu=on iommu=pt
$ lspci -vvs 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2805 (rev a1) (prog-if 00 [VGA controller])
    Subsystem: Micro-Star International Co., Ltd. [MSI] Device 5174
    Physical Slot: 0
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at f9000000 (32-bit, non-prefetchable) [size=16M]
    Region 1: Memory at 383800000000 (64-bit, prefetchable) [size=16G]
    Region 3: Memory at 383c00000000 (64-bit, prefetchable) [size=32M]
    Region 5: I/O ports at 5000 [size=128]
    Expansion ROM at fa000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM L1, Exit Latency L1 <16us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Via message
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
            AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
            EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
    Capabilities: [b4] Vendor Specific Information: Len=14 <?>
    Capabilities: [100 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status: NegoPending- InProgress-
    Capabilities: [250 v1] Latency Tolerance Reporting
        Max snoop latency: 0ns
        Max no snoop latency: 0ns
    Capabilities: [128 v1] Power Budgeting <?>
    Capabilities: [420 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP+ BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [bb0 v1] #15
    Kernel driver in use: vfio-pci
$ find /sys/kernel/iommu_groups/ -type l | grep "0000:01:00.0"
/sys/kernel/iommu_groups/11/devices/0000:01:00.0
```
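The same checks can be scripted for each device. This minimal sketch reads sysfs directly, which is equivalent to the `Kernel driver in use` line from `lspci -vv` and the `find` over `/sys/kernel/iommu_groups`. The function names are hypothetical. `SYSFS_ROOT` and `CMDLINE` are variables only so the functions can be run against a fake tree.

```bash
#!/usr/bin/env bash
# Sketch: scripted form of the environment checks above, for one PCI address.
# SYSFS_ROOT and CMDLINE are variables only so the functions can be tested
# against a fake tree; on a real node keep the defaults.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}
CMDLINE=${CMDLINE:-/proc/cmdline}

# Succeeds if the kernel command line contains intel_iommu=on or amd_iommu=on.
iommu_on_cmdline() {
  grep -qE '(^| )(intel|amd)_iommu=on( |$)' "$CMDLINE"
}

# Succeeds if the kernel created at least one IOMMU group. This is the more
# direct signal: depending on platform and kernel configuration, the IOMMU
# can be active without either flag.
iommu_groups_present() {
  [ -n "$(ls -A "$SYSFS_ROOT/kernel/iommu_groups" 2>/dev/null)" ]
}

# Prints the driver bound to a PCI device (e.g. vfio-pci), or "none" if unbound.
pci_driver() {
  local link="$SYSFS_ROOT/bus/pci/devices/$1/driver"
  if [ -e "$link" ]; then
    basename "$(readlink -f "$link")"
  else
    echo none
  fi
}

# Prints the IOMMU group number of a PCI device; fails if it has none.
pci_iommu_group() {
  local link="$SYSFS_ROOT/bus/pci/devices/$1/iommu_group"
  [ -e "$link" ] || return 1
  basename "$(readlink -f "$link")"
}
```

On the node above, `pci_driver 0000:01:00.0` and `pci_iommu_group 0000:01:00.0` would be expected to print `vfio-pci` and `11`.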
* Check the IOMMU groups: the NVIDIA GPU must be in an IOMMU group of its own.
```
$ vim iommu.sh
#!/bin/bash
# Print every IOMMU group and the PCI devices it contains
shopt -s nullglob
for g in $(find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V); do
    echo "IOMMU Group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done
$ bash iommu.sh
......
IOMMU Group 11:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2805] (rev a1)
```
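Listing every group is useful for a first look. For the device being passed through, what matters is whether anything else in its group is bound to a non-vfio driver. This sketch uses the same conventions as before: hypothetical function names, with `SYSFS_ROOT` overridable for testing. It assumes the kernel rule that devices with no driver, or bound to `vfio-pci`/`pci-stub`, do not make a group unusable.

```bash
#!/usr/bin/env bash
# Sketch: list the devices in the same IOMMU group as a given PCI address that
# are bound to a driver other than vfio-pci (or pci-stub). QEMU rejects such a
# group with "group N is not viable" (see Troubleshooting).
# Caveat: the kernel also tolerates PCI bridges in the group, so a bridge bound
# to e.g. pcieport may show up here as a false positive.
# SYSFS_ROOT is a variable only so the functions can be tested.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

group_blockers() {
  local addr=$1 group_dir dev drv
  group_dir=$(readlink -f "$SYSFS_ROOT/bus/pci/devices/$addr/iommu_group") || return 2
  [ -d "$group_dir/devices" ] || return 2
  for dev in "$group_dir"/devices/*; do
    [ -e "$dev/driver" ] || continue   # unbound devices do not block vfio
    drv=$(basename "$(readlink -f "$dev/driver")")
    case "$drv" in
      vfio-pci|pci-stub) ;;
      *) echo "${dev##*/} $drv" ;;
    esac
  done
}

# Succeeds only if nothing in the group blocks passthrough.
group_viable() {
  local out
  out=$(group_blockers "$1") || return 2
  [ -z "$out" ]
}
```

`group_viable 0000:01:00.0 || group_blockers 0000:01:00.0` prints the offending devices, one `ADDRESS DRIVER` pair per line.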
## Enable the pcidevices-controller add-on

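In the UI, the add-on is enabled under Advanced > Add-ons. For the command line, here is a minimal sketch. It assumes the add-on is the `addons.harvesterhci.io` object named `pcidevices-controller` in the `harvester-system` namespace (check with `kubectl get addons.harvesterhci.io -A`) and that it is toggled through `spec.enabled`. The helper name is hypothetical. Set `KUBECTL=echo` for a dry run.

```bash
#!/usr/bin/env bash
# Sketch: enable or disable a Harvester add-on with kubectl.
# Assumptions: add-ons are addons.harvesterhci.io objects (pcidevices-controller
# lives in harvester-system) and are switched by the boolean spec.enabled.
KUBECTL=${KUBECTL:-kubectl}   # set KUBECTL=echo to print the command instead

set_addon_enabled() {
  local name=$1 enabled=$2 ns=${3:-harvester-system}
  case "$enabled" in
    true|false) ;;
    *) echo "usage: set_addon_enabled NAME true|false [NAMESPACE]" >&2; return 2 ;;
  esac
  "$KUBECTL" patch addons.harvesterhci.io "$name" -n "$ns" \
    --type=merge -p "{\"spec\":{\"enabled\":$enabled}}"
}
```

`set_addon_enabled pcidevices-controller true` enables the add-on. The re-enable step in Troubleshooting corresponds to running it with `false` and then with `true`.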
## Advanced > PCI Devices
* Enable passthrough for the NVIDIA device. This creates a `PCIDeviceClaim`:

```
$ kubectl get pcideviceclaim
NAME              ADDRESS        NODE NAME   USER NAME   KERNEL DRIVER TO UNBIND   PASSTHROUGH ENABLED
hvx-1-000001000   0000:01:00.0   hvx-1       admin                                 true
```
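To check the claim from a script instead of by eye, this sketch parses the table above. It assumes the default `kubectl get pcideviceclaim` columns stay as shown, though they may differ between Harvester versions. `passthrough_enabled` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if the PCIDeviceClaim for a PCI address reports
# PASSTHROUGH ENABLED = true. It parses the default table printed above:
# ADDRESS is the 2nd column and PASSTHROUGH ENABLED is the last one.
KUBECTL=${KUBECTL:-kubectl}

passthrough_enabled() {
  local addr=$1
  "$KUBECTL" get pcideviceclaim --no-headers \
    | awk -v a="$addr" '$2 == a && $NF == "true" { found = 1 } END { exit !found }'
}
```

For example, `passthrough_enabled 0000:01:00.0 && echo ready` before creating the VM.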
## Create an Ubuntu VM
* Schedule the VM on the node that owns the GPU (`hvx-1` here) and select the device enabled above.

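Once the VM is running, you can confirm its node from the command line. This sketch uses the KubeVirt `VirtualMachineInstance` field `status.nodeName`. The helper names are hypothetical. The namespace defaults to `default`, which is where the `ubuntu` VM in this note runs.

```bash
#!/usr/bin/env bash
# Sketch: check which node a running VM was scheduled on, using the KubeVirt
# VirtualMachineInstance field status.nodeName.
KUBECTL=${KUBECTL:-kubectl}

vmi_node() {
  local vm=$1 ns=${2:-default}
  "$KUBECTL" get vmi "$vm" -n "$ns" -o jsonpath='{.status.nodeName}'
}

# Succeeds if VM $1 runs on node $2 (optional namespace $3, default "default").
vmi_on_node() {
  [ "$(vmi_node "$1" "${3:-default}")" = "$2" ]
}
```

For example, `vmi_on_node ubuntu hvx-1 && echo "running on hvx-1"`.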

## Install the NVIDIA driver on Ubuntu
* Inside the Ubuntu VM, check that the NVIDIA GPU is visible:
```
$ lspci |grep -i nvidia
0a:00.0 VGA compatible controller: NVIDIA Corporation AD106 [GeForce RTX 4060 Ti 16GB] (rev a1)
```
* Install the driver:
```
$ wget https://tw.download.nvidia.com/XFree86/Linux-x86_64/550.54.14/NVIDIA-Linux-x86_64-550.54.14.run
$ sudo apt update
$ sudo apt install gcc make
$ sudo sh ./NVIDIA-Linux-x86_64-550.54.14.run
```
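Here are the steps above as one function, parameterized by driver version. This is a sketch, not the author's exact procedure, and it makes several assumptions. It assumes the `tw.download.nvidia.com` URL layout holds for other versions and that the script runs as root. It calls the installer through `sh`, so the downloaded file needs no execute bit. It also installs `dkms` and the headers for the running kernel up front and passes the installer's `--dkms --silent` options, which is intended to avoid needing the follow-up fix described next. The function names are hypothetical. Set `RUN=echo` to print the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: download and install an NVIDIA .run driver on Ubuntu.
# Assumptions: run as root, tw.download.nvidia.com URL layout as used above.
RUN=${RUN:-}   # set RUN=echo for a dry run

nvidia_run_url() {
  local v=$1
  echo "https://tw.download.nvidia.com/XFree86/Linux-x86_64/${v}/NVIDIA-Linux-x86_64-${v}.run"
}

install_nvidia_driver() {
  local v=$1
  [ -n "$v" ] || { echo "usage: install_nvidia_driver VERSION" >&2; return 2; }
  $RUN apt-get update || return
  # Compiler, make, DKMS and headers for the running kernel are needed to build the module
  $RUN apt-get install -y gcc make dkms "linux-headers-$(uname -r)" || return
  $RUN wget "$(nvidia_run_url "$v")" || return
  # --dkms registers the module with DKMS; --silent skips the interactive prompts
  $RUN sh "NVIDIA-Linux-x86_64-${v}.run" --silent --dkms
}
```

For example, `install_nvidia_driver 550.54.14`, then reboot.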
* If `nvidia-smi` still reports a problem after the installation, the following steps can fix it.
* Reference: https://medium.com/@yt.chen/nvidia-smi-%E9%80%A3%E4%B8%8D%E5%88%B0-driver-%E7%9A%84%E8%87%AA%E6%95%91%E6%96%B9%E6%B3%95-69cbed16171d
```
$ ls /usr/src | grep nvidia
nvidia-550.54.14
$ sudo apt-get install dkms linux-headers-$(uname -r)
$ sudo dkms install -m nvidia -v 550.54.14
$ sudo apt upgrade
$ sudo reboot
```
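After the reboot, you can confirm that the module was built and loaded before running `nvidia-smi`. This sketch reads `/proc/modules`, which is overridable as `MODULES_FILE` for testing. The helper name is hypothetical. `dkms status` should also list the `nvidia` module as installed for the running kernel.

```bash
#!/usr/bin/env bash
# Sketch: confirm the nvidia kernel module is loaded.
# MODULES_FILE is a variable only so the function can be tested.
MODULES_FILE=${MODULES_FILE:-/proc/modules}

nvidia_module_loaded() {
  # /proc/modules lists one loaded module per line, name first;
  # the trailing space keeps nvidia_uvm / nvidia_drm from matching alone
  grep -q '^nvidia ' "$MODULES_FILE"
}
```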
* After the reboot, `nvidia-smi` lists the GPU:
```
$ nvidia-smi
Wed Dec 11 14:36:56 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:0A:00.0 Off |                  N/A |
| 33%   34C    P0             29W /  165W |       0MiB /  16380MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
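For scripted checks, `nvidia-smi` can report individual fields with `--query-gpu` and `--format=csv,noheader`. This sketch compares the reported driver version with the expected one. `check_nvidia_driver` is a hypothetical helper, and `NVIDIA_SMI` is overridable only for testing.

```bash
#!/usr/bin/env bash
# Sketch: fail if the driver version reported by nvidia-smi for the first GPU
# differs from the expected version.
NVIDIA_SMI=${NVIDIA_SMI:-nvidia-smi}

check_nvidia_driver() {
  local expected=$1 got
  got=$("$NVIDIA_SMI" --query-gpu=driver_version --format=csv,noheader | head -n 1) || return
  if [ "$got" != "$expected" ]; then
    echo "driver version is '$got', expected '$expected'" >&2
    return 1
  fi
}
```

For example, `check_nvidia_driver 550.54.14`.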

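## Troubleshooting
* If starting the VM fails with the error below, the other devices in IOMMU group 9 are not bound to `vfio-pci`, so the VM cannot start. This is why the passed-through NVIDIA device must be in an IOMMU group of its own.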
```
{"component":"virt-launcher","kind":"","level":"error","msg":"Failed to sync vmi","name":"ubuntu","namespace":"default","pos":"server.go:202","reason":"virError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2024-12-09T14:01:18.841359Z qemu-system-x86_64: -device {\"driver\":\"vfio-pci\",\"host\":\"0000:06:10.0\",\"id\":\"ua-hostdevice-hvx-1-000006100\",\"bus\":\"pci.11\",\"addr\":\"0x1\"}: vfio 0000:06:10.0: group 9 is not viable\nPlease ensure all devices within the iommu_group are bound to their vfio bus driver.')","timestamp":"2024-12-09T14:01:19.043300Z","uid":"aff1fd41-f342-4ae7-8bd1-5fc3ab47226b"}
```
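To find which group to inspect, you can extract the address and group number from the error text. This sketch assumes the message keeps the `vfio ADDRESS: group N is not viable` wording shown above, and prints `ADDRESS GROUP`. `not_viable_group` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: read log text on stdin and print "ADDRESS GROUP" for every
# "vfio ADDRESS: group N is not viable" message it contains.
not_viable_group() {
  grep -oE 'vfio [0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]: group [0-9]+ is not viable' \
    | awk '{ sub(/:$/, "", $2); print $2, $4 }'
}
```

Fed the virt-launcher log above, this prints `0000:06:10.0 9`. On the node, `ls /sys/kernel/iommu_groups/9/devices` then lists the devices sharing that group.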
* If the VM instead stays unscheduled with messages like the following, disable and then re-enable the pcidevices-controller add-on:
```
$ kubectl get vm ubuntu -oyaml | grep -i message
message: virt-launcher pod has not yet been scheduled
message: 'failed to render launch manifest: HostDevice nvidia.com/AD106_GEFORCE_RTX_4060_TI_16GB
```
```
virt-launcher pod has not yet been scheduled when pci passthrough is enable for gpu
```
* Reference: https://github.com/harvester/harvester/issues/4160#issuecomment-2450515323
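Before re-enabling the add-on, it can help to compare the host device the VM requests with what its node advertises. This sketch assumes the KubeVirt path `spec.template.spec.domain.devices.hostDevices[].deviceName` on the VM and the node's `status.allocatable`. A name printed by `missing_host_devices` is one the node does not currently expose, which would be consistent with the `failed to render launch manifest: HostDevice ...` message. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print each host device name requested by a VM that does not appear
# among the node's allocatable resources.
KUBECTL=${KUBECTL:-kubectl}

vm_host_device_names() {
  local vm=$1 ns=${2:-default}
  "$KUBECTL" get vm "$vm" -n "$ns" \
    -o jsonpath='{range .spec.template.spec.domain.devices.hostDevices[*]}{.deviceName}{"\n"}{end}'
}

# jsonpath prints the allocatable map as JSON, e.g. {"cpu":"16",...}
node_allocatable() {
  "$KUBECTL" get node "$1" -o jsonpath='{.status.allocatable}'
}

missing_host_devices() {
  local vm=$1 node=$2 ns=${3:-default} alloc name
  alloc=$(node_allocatable "$node") || return 2
  vm_host_device_names "$vm" "$ns" | while IFS= read -r name; do
    [ -n "$name" ] || continue
    case "$alloc" in
      *"\"$name\""*) ;;          # resource key present on the node
      *) echo "$name" ;;
    esac
  done
}
```

For example, `missing_host_devices ubuntu hvx-1`.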
## References
* https://github.com/harvester/harvester/issues/3833#issuecomment-1524900503
* https://docs.harvesterhci.io/v1.4/advanced/vgpusupport/
* https://gist.github.com/kralicky/0f9994526eac7ddc1808bcbfea6a8444