Linux Kernel Design: nolibc

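---
tags: Linux Kernel Internals, Operating Systems
---

# Linux Kernel Design: nolibc

## libc in the Linux kernel

[`libc`](https://en.wikipedia.org/wiki/C_standard_library) is the C standard library. The ISO C standard defines a set of function prototypes and specifies, for each function, the inputs it accepts and the results a call must produce; how each function is implemented is left to the implementer. The implementation most commonly used on Linux today is [GNU libc (glibc)](https://www.gnu.org/software/libc/). [musl](https://www.musl-libc.org/) was proposed to address some of glibc's shortcomings, and other implementations such as [Newlib](https://sourceware.org/newlib/) and [μClibc](https://uclibc.org/) focus on embedded systems.

Because the C standard library is code that runs in user space, it should in principle be independent of the Linux project, which lives in kernel space. This has changed somewhat: since version 5.1, the path [`tools/include/nolibc`](https://github.com/torvalds/linux/tree/master/tools/include/nolibc) contains a libc implementation named `nolibc`. The project provides a stripped-down, minimal emulation of the C standard library for small, low-level workloads.

The history of nolibc can be traced back to a question that Paul McKenney, the maintainer of RCU, raised in the email [Kernel-only deployments?](https://lwn.net/ml/linux-kernel/20180823174359.GA13033@linux.vnet.ibm.com/). The rcutorture tests he maintains in Linux also need an [initrd](https://zh.wikipedia.org/zh-tw/Initrd), but an initrd built in the usual way contains parts that rcutorture does not need, and a large share of those parts is the standard library. Paul McKenney asked whether other developers had similar experience, and the question resonated with Willy Tarreau. If these user-space programs depend so little on the C library, why not define the system calls directly, so that the programs can call into the kernel without a standard library in between?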
Against this background, nolibc was eventually merged into the Linux kernel. For more details, see Willy Tarreau's article [Nolibc: a minimal C-library replacement shipped with the kernel](https://lwn.net/Articles/920158/).

## Testing nolibc

The following command lists the targets available for testing nolibc:

```
$ make -C tools/testing/selftests/nolibc
make: Entering directory '/home/rin/GitHub/linux/tools/testing/selftests/nolibc'
Supported targets under selftests/nolibc:
  all          call the "run" target below
  help         this help
  sysroot      create the nolibc sysroot here (uses $ARCH)
  nolibc-test  build the executable (uses $CC and $CROSS_COMPILE)
  run-user     runs the executable under QEMU (uses $ARCH, $TEST)
  initramfs    prepare the initramfs with nolibc-test
  defconfig    create a fresh new default config (uses $ARCH)
  kernel       (re)build the kernel with the initramfs (uses $ARCH)
  run          runs the kernel in QEMU after building it (uses $ARCH, $TEST)
  rerun        runs a previously prebuilt kernel in QEMU (uses $ARCH, $TEST)
  clean        clean the sysroot, initramfs, build and output files

The output file is "run.out". Test ranges may be passed using $TEST.

Currently using the following variables:
  ARCH          = x86
  CROSS_COMPILE =
  CC            = gcc
  OUTPUT        = /home/rin/GitHub/linux/tools/testing/selftests/nolibc/
  TEST          =
  QEMU_ARCH     = x86_64 [determined from $ARCH]
  IMAGE_NAME    = bzImage [determined from $ARCH]
make: Leaving directory '/home/rin/GitHub/linux/tools/testing/selftests/nolibc'
```

## Implementations and patches in nolibc

A major goal of nolibc is to keep the compiled functions as small as possible, so its implementations are very compact. That does not mean there is nothing to learn from them. Let's look at a few interesting implementations and fixes.

### `string.h`

#### `strlen`

```cpp
static __attribute__((unused))
size_t strlen(const char *str)
{
    size_t len;

    for (len = 0; str[len]; len++)
        asm("");
    return len;
}
```

If you were asked to implement `strlen`, the most obvious approach would be to check each character behind the `char *` until `'\0'` is found. What is puzzling here is the `asm("")` in the loop body: what is it for?
The message of commit [`bfc3b0f`](https://github.com/torvalds/linux/commit/bfc3b0f05653a28c8d41067a2aa3875d1f982e3e) explains it. With gcc-12 at `-Os`, the compiler recognizes that the `for` loop is a `strlen` pattern and replaces it with a jump to the `strlen` symbol. Since nolibc has no standard library behind it, `strlen` is this very function, and the result is an infinite loop. Compilers do offer the [`-ffreestanding`](https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html) option, which asserts that the code targets a freestanding environment (an OS kernel being the typical example) in which the standard library may not exist; using it would prevent the jump to `strlen` from being generated. However, user-space code that calls `strlen` does not know whether nolibc or some other implementation is behind it, so it cannot tell whether it should be compiled with `-ffreestanding`. nolibc therefore takes another route: it inserts an empty `asm("")` statement so that gcc no longer recognizes the pattern, which avoids the problem.

```cpp
#if defined(__OPTIMIZE__)
#define nolibc_strlen(x) strlen(x)
#define strlen(str) ({                          \
    __builtin_constant_p((str)) ?           \
        __builtin_strlen((str)) :       \
        nolibc_strlen((str));           \
})
#endif
```

On the other hand, when `__OPTIMIZE__` is defined (that is, when compiling with optimization enabled), `strlen` may not go through the function above at all. The gcc builtin [`__builtin_strlen`](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) provides the result instead, as long as the given string can be determined at compile time (`__builtin_constant_p((str))` evaluates to true).

#### `memset`

```cpp
__attribute__((weak,unused,section(".text.nolibc_memset")))
void *memset(void *dst, int b, size_t len)
{
    char *p = dst;

    while (len--) {
        /* prevent gcc from recognizing memset() here */
        asm volatile("");
        *(p++) = b;
    }
    return dst;
}
```

`memset` writes the value `b` into each of the `len` bytes starting at the pointer `dst`. For the same reason as in `strlen`, commit [`1bfbe1f`](https://github.com/torvalds/linux/commit/1bfbe1f3e96720daf185f03d101f072d69753f88) adds `asm volatile("")` to keep the compiler from recognizing the pattern.

#### `memcmp`

```cpp
int memcmp(const void *s1, const void *s2, size_t n)
{
    size_t ofs = 0;
    char c1 = 0;

    while (ofs < n && !(c1 = ((char *)s1)[ofs] - ((char *)s2)[ofs])) {
        ofs++;
    }
    return c1;
}
```

Let's start with the earlier, not entirely correct version. `memcmp` is expected to compare the `n` bytes at `s1` with the `n` bytes at `s2` and return a positive or negative value depending on which of the first differing bytes is larger, or 0 if the two regions are identical. According to the standard, however, `memcmp` must interpret the buffer contents as `unsigned char` (7.23.4/1), while whether plain `char` behaves like `signed char` or `unsigned char` is implementation-defined (C standard
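6.2.5/15). So in the implementation above, if `char` is equivalent to `unsigned char`, the casts in the `while` condition are fine, but the return value `c1` can never be negative, which is wrong. If `char` is equivalent to `signed char`, the casts themselves are wrong, and the `char` that holds the result still cannot represent the full range of the `int` difference. For example, assume `char` is equivalent to `signed char`, `a` is a 1-byte buffer containing `0x00`, and `b` is another 1-byte buffer containing `0x80`. The operands are first promoted to `int` (integer promotion, C standard 6.3.1.1/2), so the differences are 0 - (-128) = 128 and -128 - 0 = -128; storing 128 back into a signed `char` typically wraps to -128. With the implementation above, both `memcmp(a, b, 1)` and `memcmp(b, a, 1)` therefore return -128.

```cpp
static __attribute__((unused))
int memcmp(const void *s1, const void *s2, size_t n)
{
    size_t ofs = 0;
    int c1 = 0;

    while (ofs < n &&
           !(c1 = ((unsigned char *)s1)[ofs] - ((unsigned char *)s2)[ofs])) {
        ofs++;
    }
    return c1;
}
```

As commit [b3f4f51](https://github.com/torvalds/linux/commit/b3f4f51ea68a495f8a5956064c33dce711a2df91) describes, the implementation has to be changed to the version above to be correct.

## Reference

* [Nolibc: a minimal C-library replacement shipped with the kernel](https://lwn.net/Articles/920158/)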
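
## Appendix: checking the `memcmp` fix

The one-byte `0x00` versus `0x80` example from the `memcmp` section can be reproduced with a small sketch. It is not part of nolibc: `memcmp_old`, `memcmp_fixed`, and `check_memcmp_fix` are hypothetical names chosen for illustration. The old variant spells out `signed char` (an assumption of this sketch) so that it models a platform where plain `char` is signed, whatever the compiler's default is.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of the earlier implementation on a platform where plain char is
 * signed. `signed char` is written explicitly (an assumption of this
 * sketch) so the result does not depend on the compiler's default.
 */
int memcmp_old(const void *s1, const void *s2, size_t n)
{
	size_t ofs = 0;
	signed char c1 = 0;

	/* The int difference is narrowed back to signed char on assignment. */
	while (ofs < n &&
	       !(c1 = ((const signed char *)s1)[ofs] - ((const signed char *)s2)[ofs]))
		ofs++;
	return c1;
}

/* Same loop, with the types used by the fixed version: unsigned bytes, int result. */
int memcmp_fixed(const void *s1, const void *s2, size_t n)
{
	size_t ofs = 0;
	int c1 = 0;

	while (ofs < n &&
	       !(c1 = ((const unsigned char *)s1)[ofs] - ((const unsigned char *)s2)[ofs]))
		ofs++;
	return c1;
}

/* Hypothetical helper: the one-byte example from the text. */
void check_memcmp_fix(void)
{
	const unsigned char a[1] = { 0x00 };
	const unsigned char b[1] = { 0x80 };

	/* Fixed version: the sign follows the unsigned byte order (0x00 < 0x80). */
	assert(memcmp_fixed(a, b, 1) < 0);
	assert(memcmp_fixed(b, a, 1) > 0);
	assert(memcmp_fixed(a, a, 1) == 0);

	/*
	 * Old version: with compilers that wrap out-of-range values when
	 * narrowing (GCC and Clang do), both argument orders give the same
	 * sign, so the ordering information is lost.
	 */
	assert((memcmp_old(a, b, 1) < 0) == (memcmp_old(b, a, 1) < 0));
}
```

Calling `check_memcmp_fix()` from a test program should pass silently with GCC or Clang. The narrowing of 128 to `signed char` is implementation-defined, so the last assertion describes those compilers rather than every conforming implementation.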