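# Linux Kernel Project: Verifying and Tuning KVM on Arm64
> Contributor: otischung
> [Project walkthrough video](https://youtu.be/Bct2vaPu_QU)

### Reviewed by `leowu0411`
Regarding VirtIO: is the virtqueue that the device and the guest use to communicate decided when the host registers the PCI device, with its location then reported to the guest kernel, or is there some other mechanism that lets both sides share and access the same memory region?

Response:
The virtqueue is negotiated while the Guest OS boots, through writes to [`struct virtio_pci_common_cfg`](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-1090004), which is located through the capability (CAP) list of the PCI configuration space (CFG).

On the host side, kvm-host only needs to register the virtio device with the proper [PCI](https://wiki.osdev.org/PCI#Header_Type_0x0) settings, such as the header, CFG, and CAP. Once the virtio driver in the Guest OS detects the virtio device, negotiation begins. After negotiation completes, the host obtains the address through `vm_guest_to_host()` and can then read and write the queue.

### Reviewed by `jserv`
For Arm64/RPi5, how can the GIC and the related interrupt/exception handling of kvm-host be verified?

Response:
```bash
~ $ cat /proc/interrupts
           CPU0
 11:    1515022     GIC-0  27 Level     arch_timer
 13:         11     GIC-0  33 Edge      virtio0
 14:          0     GIC-0  34 Edge      virtio1
 15:       2493     GIC-0  32 Level     ttyS0
IPI0:         0       Rescheduling interrupts
IPI1:         0       Function call interrupts
IPI2:         0       CPU stop interrupts
IPI3:         0       CPU stop (for crash dump) interrupts
IPI4:         0       Timer broadcast interrupts
IPI5:         0       IRQ work interrupts
IPI6:         0       CPU wake-up interrupts
Err:          0
```

```bash
~ $ cat /sys/bus/pci/devices/0000:00:01.0/irq
14
~ $ cat /sys/bus/pci/devices/0000:00:00.0/irq
13
```

The IRQ number reported here is not the value I expected: for VirtIO-Net (0000:00:01.0) it is neither the 2 we configured nor 34, i.e. 2 plus the SPI base (32). Note, however, that the sysfs `irq` file reports the Linux-allocated IRQ number (14), whereas the hardware interrupt number is the column after `GIC-0` in `/proc/interrupts`, where IRQ 14 (`virtio1`) does map to 34.

## Task Description
[sysprog21/kvm-host](https://github.com/sysprog21/kvm-host) demonstrates a minimal implementation of a system virtual machine that can load the Linux kernel, built on Linux's [kernel-based virtual machine](https://hackmd.io/@sysprog/linux-kvm) (KVM). It is suitable introductory material for the Linux KVM APIs and already supports the x86-64 and Arm64 architectures.

This task aims to port [sysprog21/kvm-host](https://github.com/sysprog21/kvm-host) to the Raspberry Pi 5 and adjust it for the hardware's [Generic Interrupt Controller (GIC) 400](https://developer.arm.com/documentation/ddi0471/b), so that it can run [Linux v6.12+](https://github.com/raspberrypi/linux).

## ~~TODO~~: Confirm that network and block devices work on x86-64
> [Issue #39](https://github.com/sysprog21/kvm-host/issues/39)
> Update the related task status and documentation, and contribute where appropriate

The network testing procedure has been added to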
[README.md](https://github.com/otischung/kvm-host/tree/feat/arm64-support); see [PR#40](https://github.com/sysprog21/kvm-host/pull/40) for details.

## ~~TODO~~: Port to Arm64/RPi
> Make the necessary adjustments for GIC-400 and Linux v6.12+

See the demo video below and [PR#40](https://github.com/sysprog21/kvm-host/pull/40) for details.

## ~~TODO~~: Ensure VirtIO-{blk,net} works on Raspberry Pi
{%youtube yALfvrwXRRk %}

See [PR#40](https://github.com/sysprog21/kvm-host/pull/40) for details.

## Setting the Layout When Configuring a BAR
A fix is provided in [PR#41](https://github.com/sysprog21/kvm-host/pull/41).

The original approach in [kvm-host (`src/pci.c`, commit: 93f1fee)](https://github.com/sysprog21/kvm-host/blob/93f1fee173645a01258084f9c65800f324f5805a/src/pci.c#L145-L157):

```cpp
void pci_set_bar(struct pci_dev *dev,
                 uint8_t bar,
                 uint32_t bar_size,
                 bool is_io_space,
                 dev_io_fn do_io)
{
    /* TODO: mem type, prefetch */
    /* FIXME: bar_size must be power of 2 */
    PCI_HDR_WRITE(dev->hdr, PCI_BAR_OFFSET(bar), is_io_space, 32);
    dev->bar_size[bar] = bar_size;
    dev->bar_is_io_space[bar] = is_io_space;
    dev_init(&dev->space_dev[bar], 0, bar_size, dev, do_io);
}
```

**Called here**:

In [`src/virtio-pci.c`](https://github.com/sysprog21/kvm-host/blob/93f1fee173645a01258084f9c65800f324f5805a/src/virtio-pci.c#L271):
```cpp
void virtio_pci_init(struct virtio_pci_dev *dev,
                     struct pci *pci,
                     struct bus *io_bus,
                     struct bus *mmio_bus)
{
    // ...
    pci_set_bar(&dev->pci_dev, 0, 0x100, PCI_BASE_ADDRESS_SPACE_MEMORY,
                virtio_pci_space_io);
    // ...
}
```

**Definitions**:

In `/usr/include/linux/pci_regs.h`:
```c
/*
 * Base addresses specify locations in memory or I/O space.
 * Decoded size can be determined by writing a value of
 * 0xffffffff to the register, and reading it back.  Only
 * 1 bits are decoded.
 */
#define PCI_BASE_ADDRESS_0              0x10    /* 32 bits */
#define PCI_BASE_ADDRESS_1              0x14    /* 32 bits [htype 0,1 only] */
#define PCI_BASE_ADDRESS_2              0x18    /* 32 bits [htype 0 only] */
#define PCI_BASE_ADDRESS_3              0x1c    /* 32 bits */
#define PCI_BASE_ADDRESS_4              0x20    /* 32 bits */
#define PCI_BASE_ADDRESS_5              0x24    /* 32 bits */
#define PCI_BASE_ADDRESS_SPACE          0x01    /* 0 = memory, 1 = I/O */
#define PCI_BASE_ADDRESS_SPACE_IO       0x01
#define PCI_BASE_ADDRESS_SPACE_MEMORY   0x00
#define PCI_BASE_ADDRESS_MEM_TYPE_MASK  0x06
#define PCI_BASE_ADDRESS_MEM_TYPE_32    0x00    /* 32 bit address */
#define PCI_BASE_ADDRESS_MEM_TYPE_1M    0x02    /* Below 1M [obsolete] */
#define PCI_BASE_ADDRESS_MEM_TYPE_64    0x04    /* 64 bit address */
#define PCI_BASE_ADDRESS_MEM_PREFETCH   0x08    /* prefetchable? */
#define PCI_BASE_ADDRESS_MEM_MASK       (~0x0fUL)
#define PCI_BASE_ADDRESS_IO_MASK        (~0x03UL)
/* bit 1 is reserved if address_space = 1 */
```

In [`src/pci.h`](https://github.com/sysprog21/kvm-host/blob/93f1fee173645a01258084f9c65800f324f5805a/src/pci.h#L26):
```c
#define PCI_BAR_OFFSET(bar) (PCI_BASE_ADDRESS_0 + ((bar) << 2))
```

### Fix
In commit [c4b325e](https://github.com/otischung/kvm-host/blob/c4b325e659e7cceebaed6b39b2eb3d6796b2f5e0/src/pci.h#L49-L82), the original `bool is_io_space` parameter is changed to `uint32_t layout`. The caller must set up the layout when passing the argument, in the same way `pci_set_status()` is used. The `bool` field `dev->bar_is_io_space[bar]` is now set by reading bit[0] of the layout directly; see the OSDev [PCI](https://wiki.osdev.org/PCI#Base_Address_Registers) page.

```diff
diff --git a/src/pci.c b/src/pci.c
index 2b59ded..aa4dccf 100644
--- a/src/pci.c
+++ b/src/pci.c
@@ -145,14 +145,13 @@ static void pci_mmio_io(void *owner,
 void pci_set_bar(struct pci_dev *dev,
                  uint8_t bar,
                  uint32_t bar_size,
-                 bool is_io_space,
+                 uint32_t layout,
                  dev_io_fn do_io)
 {
-    /* TODO: mem type, prefetch */
     /* FIXME: bar_size must be power of 2 */
-    PCI_HDR_WRITE(dev->hdr, PCI_BAR_OFFSET(bar), is_io_space, 32);
+    PCI_HDR_WRITE(dev->hdr, PCI_BAR_OFFSET(bar), layout, 32);
     dev->bar_size[bar] = bar_size;
-    dev->bar_is_io_space[bar] = is_io_space;
+    dev->bar_is_io_space[bar] = layout & 0x1U;  // Get the bit[0] of layout
     dev_init(&dev->space_dev[bar], 0, bar_size, dev, do_io);
 }
```

To use it, pass the layout as the argument. For example, `virtio-pci.c` configures the BAR as MMIO, 32-bit, non-prefetchable. A non-prefetchable BAR does not need bit[3] set, but the flag is still shown as a comment to illustrate how it would be used.

```diff
diff --git a/src/virtio-pci.c b/src/virtio-pci.c
index 289abb8..f712e8c 100644
--- a/src/virtio-pci.c
+++ b/src/virtio-pci.c
@@ -268,7 +268,9 @@ void virtio_pci_init(struct virtio_pci_dev *dev,
     PCI_HDR_WRITE(dev->pci_dev.hdr, PCI_HEADER_TYPE, PCI_HEADER_TYPE_NORMAL, 8);
     PCI_HDR_WRITE(dev->pci_dev.hdr, PCI_INTERRUPT_PIN, 1, 8);
     pci_set_status(&dev->pci_dev, PCI_STATUS_CAP_LIST | PCI_STATUS_INTERRUPT);
-    pci_set_bar(&dev->pci_dev, 0, 0x100, PCI_BASE_ADDRESS_SPACE_MEMORY,
+    pci_set_bar(&dev->pci_dev, 0, 0x100,
+                PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_32
+                /* | PCI_BASE_ADDRESS_MEM_PREFETCH */,
                 virtio_pci_space_io);
     virtio_pci_set_cap(dev, cap_list);
     dev->device_feature |=
```

## TODO: Configure TAP/TUN so the Guest OS can reach more than just 10.0.0.1
### 2025.06.19 Update
TAP currently fails to start properly:
```bash
❯ sudo ip link delete br0
❯ sudo brctl addbr br0
❯ sudo ip addr add 10.0.0.1/24 dev br0
❯ sudo ip route add default via 10.0.0.1 dev br0
RTNETLINK answers: Network is down
❯ sudo ip link set br0 up
❯ sudo ip link set tap0 master br0
Cannot find device "tap0"
❯ sudo ip link set tap0 up
❯ ls -lash /dev/net
total 0
0 drwxr-xr-x  2 root root   60 Jun 18 19:59 .
0 drwxr-xr-x 22 root root 5.2K Jun 19 13:03 ..
0 crw-rw-rw-  1 root root 10, 200 Jun 18 12:00 tun
```

I tried creating `tap0` with the following commands:
```bash
sudo ip tuntap add dev tap0 mode tap user $(whoami)
sudo ip link set dev tap0 up
```

Running `ping 10.0.0.2` in the Guest OS gets no response:
```bash
~ $ ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2): 56 data bytes
^C
--- 10.0.0.2 ping statistics ---
8 packets transmitted, 0 packets received, 100% packet loss
```

Partial output of `ip a` on the Host OS:
```bash
❯ ip a
# ...
12: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel master br0 state DOWN group default qlen 1000
    link/ether 36:91:27:bb:8e:77 brd ff:ff:ff:ff:ff:ff
13: br0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 8e:a9:b2:7c:13:aa brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/24 scope global br0
       valid_lft forever preferred_lft forever
```

The system does not seem to find a `tun` kernel module:
```bash
sudo modprobe tun
lsmod | grep tun
```

It does not appear under `/lib/modules` either:
```bash
❯ ls /lib/modules/$(uname -r)/kernel/drivers/net
arcnet    fddi        mctp       ppp          vmxnet3    xen-netback     gtp.ko.zst      mhi_net.ko.zst     rionet.ko.zst
bonding   fjes        mdio       pse-pd       vxlan      amt.ko.zst      ifb.ko.zst      mii.ko.zst         sungem_phy.ko.zst
caif      hamradio    netdevsim  slip         wan        bareudp.ko.zst  macsec.ko.zst   netconsole.ko.zst  tap.ko.zst
can       hyperv      pcs        team         wireguard  dummy.ko.zst    macvlan.ko.zst  nlmon.ko.zst       veth.ko.zst
dsa       ieee802154  phy        thunderbolt  wireless   eql.ko.zst      macvtap.ko.zst  ntb_netdev.ko.zst  vrf.ko.zst
ethernet  ipvlan      plip       usb          wwan       geneve.ko.zst   mdio.ko.zst     pfcp.ko.zst        vsockmon.ko.zst
```

### 2025.06.20 Update - Trying to boot with QEMU
> Reference:
> [Setting up Qemu with a tap interface](https://gist.github.com/extremecoders-re/e8fd8a67a515fee0c873dcafc81d811c?fbclid=IwY2xjawLBukZleHRuA2FlbQIxMQABHmsNj5-pyCvxqkOWHf6CQMueyLVaJVC6NmEzwV_q3cpVv2MT8bZpQxGglWkZ_aem_jlpvAZuh6x84h6KQeD4Ptg)
> [2024 KVM project](https://hackmd.io/@sysprog/ryG0h25I0#%E6%B8%AC%E8%A9%A6-Virtio-Net-%E8%A3%9D%E7%BD%AE)

Setup steps:
```bash
# Install for brctl and tunctl
sudo apt install bridge-utils uml-utilities
sudo brctl addbr br0
# This may result in the network being temporarily unavailable.
sudo ip addr flush dev eth0
sudo brctl addif br0 eth0
sudo tunctl -t tap0 -u `whoami`
sudo brctl addif br0 tap0
sudo ip addr add 10.0.0.1/24 dev br0
sudo ip route add default via 10.0.0.1 dev br0
sudo ip link set tap0 master br0
sudo ip link set eth0 up
sudo ip link set tap0 up
sudo ip link set br0 up
# Reactivate the network with nmcli
nmcli c up eth0
```

To remove `br0` and `tap0`, run:
```bash
sudo ip link delete br0
sudo ip link delete tap0
```

Next, start QEMU:
```bash
sudo qemu-system-x86_64 \
    -kernel build/bzImage \
    -initrd build/rootfs.cpio \
    -append "console=ttyS0 root=/dev/vda rw" \
    -m 1G \
    -nographic \
    -drive file=build/ext4.img,format=raw,id=hd0,if=none \
    -device virtio-blk-pci,drive=hd0 \
    -netdev tap,id=net0,ifname=tap0,script=no,downscript=no \
    -device virtio-net-pci,netdev=net0
```

Then configure the network in the Guest OS:
```bash
ip addr add 10.0.0.2/24 dev eth0
ip link set eth0 up
ip route add default via 10.0.0.1
```

Only 10.0.0.1 responds to ping; other addresses do not:
```bash
~ $ ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1): 56 data bytes
64 bytes from 10.0.0.1: seq=0 ttl=64 time=6.968 ms
64 bytes from 10.0.0.1: seq=1 ttl=64 time=1.555 ms
64 bytes from 10.0.0.1: seq=2 ttl=64 time=1.187 ms
^C
--- 10.0.0.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 1.187/3.236/6.968 ms
~ $ ping 127.0.0.1
PING 127.0.0.1 (127.0.0.1): 56 data bytes
^C
--- 127.0.0.1 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
```

Although this works, there are 2 extra PCI devices.

The PCI-related boot messages of the Guest OS are as follows:
```
PCI: Probing PCI hardware
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
pci_bus 0000:00: root bus resource [mem 0x00000000-0xffffffffff]
pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
pci 0000:00:01.1: reg 0x20: [io  0xc0a0-0xc0af]
pci 0000:00:01.1: legacy
IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
pci 0000:00:02.0: [1234:1111] type 00 class 0x030000
pci 0000:00:02.0: reg 0x10: [mem 0xfd000000-0xfdffffff pref]
pci 0000:00:02.0: reg 0x18: [mem 0xfeb90000-0xfeb90fff]
pci 0000:00:02.0: reg 0x30: [mem 0xfeb80000-0xfeb8ffff pref]
pci 0000:00:03.0: [1af4:1001] type 00 class 0x010000
pci 0000:00:03.0: reg 0x10: [io  0xc000-0xc07f]
pci 0000:00:03.0: reg 0x14: [mem 0xfeb91000-0xfeb91fff]
pci 0000:00:03.0: reg 0x20: [mem 0xfe000000-0xfe003fff 64bit pref]
pci 0000:00:04.0: [1af4:1000] type 00 class 0x020000
pci 0000:00:04.0: reg 0x10: [io  0xc080-0xc09f]
pci 0000:00:04.0: reg 0x14: [mem 0xfeb92000-0xfeb92fff]
pci 0000:00:04.0: reg 0x20: [mem 0xfe004000-0xfe007fff 64bit pref]
pci 0000:00:04.0: reg 0x30: [mem 0xfeb00000-0xfeb7ffff pref]
pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 00
pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
```

The full Guest OS boot log is available at:
https://pastebin.com/rjfQjvqU

However, the same setup still does not work with `kvm-host`. The `kvm-host` boot messages are at:
https://pastebin.com/1tDfG4HG

### 2025.06.24 Update
A Wi-Fi interface cannot be added to the bridge:
```
❯ sudo brctl addif br0 wlp0s20f3
can't add wlp0s20f3 to bridge br0: Operation not supported
```

### 2025.06.30 Update: Successfully connected to 10.0.0.1
In [src/virtio-net.c](https://github.com/sysprog21/kvm-host/blob/93f1fee173645a01258084f9c65800f324f5805a/src/virtio-net.c#L221), after the following call
```c
ioctl(virtio_net_dev->tapfd, TUNSETIFF, &ifreq);
```
the name template `#define TAP_INTERFACE "tap%d"` is replaced with an actual interface name, such as tap0. Therefore tap0 should be created by kvm-host itself; once kvm-host has opened it successfully, the bridge can be configured on it.

The remaining setup is the same as described in the [2024 KVM project](https://hackmd.io/@sysprog/ryG0h25I0#%E6%B8%AC%E8%A9%A6-Virtio-Net-%E8%A3%9D%E7%BD%AE).

---

On Ubuntu 24.04, TUN is built into the kernel and TAP is built as a module. The kernel configuration can be found in `/boot/config-$(uname -r)`:
```bash
cat /boot/config-`uname -r` | grep -C \
    5 CONFIG_TAP
CONFIG_NTB_NETDEV=m
CONFIG_RIONET=m
CONFIG_RIONET_TX_SIZE=128
CONFIG_RIONET_RX_SIZE=128
CONFIG_TUN=y
CONFIG_TAP=m
# CONFIG_TUN_VNET_CROSS_LE is not set
CONFIG_VETH=m
CONFIG_VIRTIO_NET=y
CONFIG_NLMON=m
CONFIG_NETKIT=y
```

### 2025.06.30 Update: Trying Bridge + DHCP to reach the external network
#### Host Setup
Because Ubuntu 24.04 Desktop manages networking with NetworkManager by default, `nmcli` is used here:
```bash
sudo nmcli connection add \
    type bridge \
    con-name br0 \
    ifname br0 \
    autoconnect yes \
    ipv4.method auto \
    ipv4.dns 8.8.8.8,8.8.4.4 \
    ipv6.method ignore \
    bridge.stp no

sudo nmcli connection add \
    type bridge-slave \
    con-name br0-enp7s0 \
    ifname netplan-enp7s0 \
    master br0 \
    autoconnect yes

sudo nmcli connection add \
    type bridge-slave \
    con-name br0-tap0 \
    ifname tap0 \
    master br0 \
    autoconnect yes

sudo nmcli connection up br0
sudo nmcli connection up br0-enp7s0
sudo nmcli connection up br0-tap0
```

:::info
On my machine the network interface is named `enp7s0`; adjust this to match your own interface name.
:::

To delete these connections, run:
```bash
sudo nmcli connection delete br0
sudo nmcli connection delete br0-enp7s0
sudo nmcli connection delete br0-tap0
```

#### Guest Setup
Running `udhcpc` directly in the Guest OS produces the following error:
```bash
~ $ udhcpc -i eth0
udhcpc: started, v1.36.1
udhcpc: socket: Address family not supported by protocol
```

The `udhcpc`-related settings in `configs/busybox.config` are:
```
CONFIG_UDHCPC=y
CONFIG_UDHCPC_DEFAULT_SCRIPT=""
```
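
One possible explanation for the `Address family not supported by protocol` message, offered here as an assumption rather than a confirmed diagnosis, is that BusyBox `udhcpc` opens an `AF_PACKET` socket and the guest kernel was built without packet-socket support (`CONFIG_PACKET`). The sketch below is a hypothetical helper that is not part of kvm-host or BusyBox: `probe_packet_socket()` tries to open the same kind of socket and returns the resulting `errno`, and `explain_socket_errno()` maps that value to a hint.

```c
#include <arpa/inet.h>      /* htons */
#include <errno.h>
#include <linux/if_ether.h> /* ETH_P_IP */
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to open an AF_PACKET socket.
 * Returns 0 on success, otherwise the errno set by socket(). */
int probe_packet_socket(void)
{
    int fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Hypothetical helper: turn the errno above into a readable hint. */
const char *explain_socket_errno(int err)
{
    switch (err) {
    case 0:
        return "AF_PACKET is available";
    case EAFNOSUPPORT:
        /* Same errno text as the udhcpc failure shown above */
        return "kernel lacks AF_PACKET support (check CONFIG_PACKET)";
    case EPERM:
    case EACCES:
        return "AF_PACKET exists but needs root or CAP_NET_RAW";
    default:
        return "unexpected error; inspect errno";
    }
}
```

Calling `explain_socket_errno(probe_packet_socket())` from a small program inside the guest would show which case applies. If it reports missing `AF_PACKET` support, enabling `CONFIG_PACKET` in the guest kernel configuration would be the next thing to try.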