Add system call to Linux 6.6 and test it in QEMU

System Environment

OS: Ubuntu Server 22.04
Arch: AMD64
Kernel Source Version: 6.6

Compiling the Kernel Source Code

# Install git
sudo apt update && sudo apt install -y git
# Download the source code with git clone
git clone --depth=1 --branch=v6.6 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
# Enter the source tree
cd linux
# Install the tools needed to build the kernel
sudo apt install make gcc libncurses-dev flex bison
# Generate a minimal kernel build configuration
make allnoconfig

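The kernel build has a very large number of options, so a configuration file must exist before compiling. Running make allnoconfig generates a .config file that holds the build options for a minimal kernel.

Because this minimal kernel cannot boot in QEMU as is, we still have to adjust some options.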

# Open the kernel configuration editor
make menuconfig

The following options are required:

64-bit kernel -> Enable
Executable file formats -> Enable all
Device Drivers > Character devices > Serial drivers > 8250/16550 and compatible serial support -> Enable
Device Drivers > Character devices > Serial drivers > Console on 8250/16550 and compatible serial port -> Enable
General Setup > Initial RAM filesystem and RAM disk (initramfs/initrd) support -> Enable
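
As a quick sanity check after leaving menuconfig, the options listed above can be looked up in the generated .config. The following is only a sketch. The function name check_kernel_config is made up for illustration, and the mapping from menu entries to CONFIG_ symbols is an assumption, in particular CONFIG_TTY as a prerequisite of the serial driver and CONFIG_BINFMT_ELF / CONFIG_BINFMT_SCRIPT standing in for "Executable file formats". Check it against the help text shown in menuconfig for your tree.

```shell
# Report which of the options needed for the QEMU boot are set in a .config.
# Usage: check_kernel_config path/to/.config
# NOTE: the symbol list is an assumed mapping of the menu entries above.
check_kernel_config() {
    config_file=$1
    missing=0
    for sym in CONFIG_64BIT CONFIG_BINFMT_ELF CONFIG_BINFMT_SCRIPT \
               CONFIG_TTY CONFIG_SERIAL_8250 CONFIG_SERIAL_8250_CONSOLE \
               CONFIG_BLK_DEV_INITRD; do
        # An enabled built-in option appears as "CONFIG_FOO=y" in .config.
        if grep -q "^${sym}=y\$" "$config_file"; then
            echo "ok      $sym"
        else
            echo "MISSING $sym"
            missing=1
        fi
    done
    # Non-zero exit status if at least one symbol is not enabled.
    return "$missing"
}
```

From the linux source directory, check_kernel_config .config prints one line per symbol and returns a non-zero status if any of them is not set to y.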

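# The kernel source tree ships some helper tools that need these two libraries
sudo apt install libssl-dev libelf-dev
# Build the kernel as a bzImage (replace <num_cpu> with the number of parallel jobs, e.g. the output of nproc)
make -j <num_cpu> bzImage

The compiled image is written to arch/x86/boot/bzImage, and a symbolic link to it is created at arch/x86_64/boot/bzImage.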

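Building the Root FS

The kernel alone cannot finish booting: if you try to start it in QEMU at this point, you should see a kernel panic caused by the missing rootfs. Here we use BusyBox to build a rootfs quickly.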

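# Go back to the parent directory
cd ..
# Download the BusyBox source code
wget https://busybox.net/downloads/busybox-1.36.1.tar.bz2
# Extract it
tar -xf busybox-1.36.1.tar.bz2
# Enter the BusyBox source directory
cd busybox-1.36.1
# Adjust the build configuration
make menuconfig # select "Build static binary"
# Build and install into the _install directory (replace <num_cpu> with the number of parallel jobs)
make -j <num_cpu> install
# Enter the _install directory
cd _install
# Create the directories the rootfs needs
mkdir -p lib lib64 proc sys etc etc/init.d
# Write the script that runs at boot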
cat > ./etc/init.d/rcS << EOF
#!/bin/sh
# Mount the /proc and /sys filesystems
mount -t proc none /proc
mount -t sysfs none /sys
# Populate /dev
/sbin/mdev -s
EOF

# Make the rcS script executable
chmod +x etc/init.d/rcS
# Build the rootfs image
find . | cpio -o --format=newc | gzip > ../../linux/rootfs.img.gz

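BusyBox has to be built as a static binary because the rootfs contains neither ld.so nor libc; without these shared libraries, static linking is the only option. Static linking is not a cure-all, though: glibc's support for it is poor and problems can occur in practice. For that reason some developers created musl, which supports static linking fully.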
glibc static linking problem
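
A missing or non-executable init is a common cause of a kernel panic at boot, so it can help to inspect the staging directory before or after packing it. The sketch below assumes the layout produced by the commands above. The function name check_rootfs_dir is hypothetical, and the list of paths is my reading of what this particular setup needs: /sbin/init as passed via rdinit= in the QEMU step, /bin/sh for the rcS script, and the /proc and /sys mount points.

```shell
# Check that a rootfs staging directory contains what the boot sequence uses.
# Usage: check_rootfs_dir path/to/_install
check_rootfs_dir() {
    rootfs_dir=$1
    status=0
    # Executables: init itself, the shell that interprets rcS, and rcS.
    # BusyBox init runs /etc/init.d/rcS when there is no /etc/inittab.
    for path in sbin/init bin/sh etc/init.d/rcS; do
        if [ -x "$rootfs_dir/$path" ]; then
            echo "ok      /$path"
        else
            echo "MISSING /$path (absent or not executable)"
            status=1
        fi
    done
    # Mount points used by the rcS script.
    for dir in proc sys; do
        if [ -d "$rootfs_dir/$dir" ]; then
            echo "ok      /$dir"
        else
            echo "MISSING /$dir"
            status=1
        fi
    done
    return "$status"
}
```

Run from the busybox-1.36.1 directory, check_rootfs_dir _install returns a non-zero status if any of these entries is missing.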

Running the Kernel in QEMU

# Install QEMU
sudo apt install qemu qemu-kvm
# Go back to the linux source directory
cd ../../linux
# Run the kernel
qemu-system-x86_64 -kernel arch/x86/boot/bzImage -nographic -initrd rootfs.img.gz -append "root=/dev/ram rdinit=/sbin/init console=ttyS0"

After a successful boot, press Enter to get a shell. To quit QEMU, press Ctrl + A, then X.

Adding a System Call

First, add an entry for the system call to the system call table, which is located at arch/x86/entry/syscalls/syscall_64.tbl.

#
# 64-bit system call numbers and entry vectors
#
# The format is:
# <number> <abi> <name> <entry point>
#
# The __x64_sys_*() stubs are created on-the-fly for sys_*() system calls
#
# The abi is "common", "64" or "x32" for this file.
#
0	common	read			sys_read
1	common	write			sys_write
2	common	open			sys_open

The file itself documents the format of an entry; we simply place the new system call after number 453.

454    common    my_syscall    sys_my_syscall   
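
The number 454 is only free in this particular kernel version, so on another tree it is worth computing the next unused number instead of copying it. A minimal sketch, assuming that entries numbered 512 and above are the x32-specific range described in the table's own comments and should be skipped. The function name next_free_syscall_nr is made up for illustration.

```shell
# Print the number following the highest entry below 512 in a syscall table.
# Usage: next_free_syscall_nr arch/x86/entry/syscalls/syscall_64.tbl
next_free_syscall_nr() {
    awk 'BEGIN { max = -1 }
         # Only lines whose first field is a number are table entries;
         # comment lines start with "#" and are skipped by the pattern.
         $1 ~ /^[0-9]+$/ && $1 + 0 < 512 && $1 + 0 > max { max = $1 + 0 }
         END { print max + 1 }' "$1"
}
```

On the v6.6 table, where 453 is the last regular entry as noted above, this should report 454.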

This file is read during the build and converted into a header file (arch/x86/include/generated/asm/syscalls_64.h).

include/linux/syscalls.h

Add the declaration after line 942.

asmlinkage long sys_map_shadow_stack(unsigned long addr, unsigned long size, unsigned int flags); // line 942
asmlinkage long sys_my_syscall(void);

Finally, write the implementation of the system call. For now, the file can be placed at kernel/my_syscall.c.

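#include <linux/kernel.h>
#include <linux/syscalls.h>

/* System call with no arguments: logs a message and returns 0. */
SYSCALL_DEFINE0(my_syscall)
{
	/* The trailing newline terminates the log record. */
	printk(KERN_INFO "Hello Linux\n");
	return 0;
}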

The SYSCALL_DEFINE0 macro defines a system call with no arguments, SYSCALL_DEFINE1 defines one with a single argument, and so on.
e.g.

SYSCALL_DEFINE1(my_syscall, long, arg1)

Finally, add it to the Makefile.

kernel/Makefile

obj-y     = fork.o exec_domain.o panic.o \
	    cpu.o exit.o softirq.o resource.o \
	    sysctl.o capability.o ptrace.o user.o \
	    signal.o sys.o umh.o workqueue.o pid.o task_work.o \
	    extable.o params.o \
	    kthread.o sys_ni.o nsproxy.o \
	    notifier.o ksysfs.o cred.o reboot.o \
	    async.o range.o smpboot.o ucount.o regset.o ksyms_common.o \
		my_syscall.o

Testing the New System Call

Because the rootfs has no toolchain for compiling code, we can build a statically linked test program outside it and then add the program to the rootfs.

#include <unistd.h>

/* Invoke the new system call by its number from syscall_64.tbl. */
long my_syscall(void)
{
  return syscall(454);
}

int main(void)
{
  my_syscall();
  return 0;
}
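
The compile-and-repack step described above could look roughly like the following. This is a sketch under assumptions: the function name install_test_program and the source file name used in the usage note are hypothetical, and placing the binary in the rootfs bin directory is just one convenient choice. The archive command is the same one used when the rootfs was first built.

```shell
# Compile a C file statically, copy it into the rootfs staging directory,
# and rebuild the initramfs image.
# Usage: install_test_program <source.c> <rootfs_dir> <output_image>
install_test_program() {
    src=$1
    rootfs_dir=$2
    image=$3
    name=$(basename "$src" .c)
    # Make the image path absolute, because the archive is created after a cd.
    case $image in
        /*) ;;
        *) image=$PWD/$image ;;
    esac
    # -static is required: the rootfs has no ld.so or libc to load at run time.
    gcc -static -o "$rootfs_dir/bin/$name" "$src" || return 1
    # Pack the staging directory exactly as in the rootfs section.
    (cd "$rootfs_dir" && find . | cpio -o --format=newc | gzip > "$image")
}
```

From the linux directory, a call such as install_test_program test_syscall.c ../busybox-1.36.1/_install rootfs.img.gz would refresh the image. After rebuilding the kernel with the same make bzImage command and restarting QEMU, running test_syscall in the guest shell should make the message show up on the console or in dmesg, provided printk support is enabled in the kernel configuration.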

If the system call takes arguments, pass them after the number in the syscall() call.
e.g.

long my_syscall()
{
  return syscall(454, 123);
}