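---
tags: DYKC, CLANG, C LANGUAGE, dynamic linking
---

# [The C Language You Didn't Know](https://hackmd.io/@sysprog/c-prog/): The Dynamic Linker

> The linker is closer to program behavior than you might think

Copyright (**慣C**) 2016, 2018 [宅色夫](https://wiki.csie.ncku.edu.tw/User/jserv)

==[Livestream recording](https://youtu.be/7aYVDPuH2uI)==

## Communicating across programming languages: through C

![image](https://hackmd.io/_uploads/BJdvw3qobl.png)

"Lingua franca" (IPA [ˌlɪŋgwə ˈfræŋkə]) comes from a 17th-century Italian term for the "Frankish language", and later came to mean any bridging language. Modern English plays that role today, letting people from different countries and cultures communicate through a shared language. Among modern programming languages, C plays the same role.

Take Java as an example. Even though there is a Java virtual machine, and a Java virtual machine can even be written in Java (such as [GraalVM](https://www.graalvm.org/), [Jikes RVM](https://www.jikesrvm.org/), and [Maxine VM](https://en.wikipedia.org/wiki/Maxine_Virtual_Machine)), operations that touch the operating system still have to go through C (or C++), including calls into libraries originally written in C/C++. Java's [JNI](https://docs.oracle.com/javase/8/docs/technotes/guides/jni/) (Java Native Interface) exists for exactly this purpose, as do Python's [ctypes](https://docs.python.org/3/library/ctypes.html) and Rust's `extern "C"`.

Understanding dynamic linking is therefore more than a review of C: it is a way to understand how the system works through the bridging language.

Further reading:
* [P-code and Professor Kenneth Bowles](https://www.facebook.com/JservFans/posts/1711808435612150)
* [UCSD Pascal pioneer Ken Bowles has died](https://news.ycombinator.com/item?id=18161217)

## A seemingly simple question: "How do you find out how many times malloc/free are called?"

### The simple approach

1. Define global variables to keep the counts: `int malloc_count = 0, free_count = 0;`
2. Wrap the call in a macro: `#define MALLOC(x) do { if (malloc(x)) malloc_count++; } while (0)`
    > See "[The C Language You Didn't Know: Tricks](https://hackmd.io/@sysprog/c-trick)" for the `do { ... } while (0)` idiom, which avoids the dangling-else problem

What is wrong with this?
* The source code has to be rewritten to replace `malloc` with the `MALLOC` macro; any call that is not replaced is not tracked
* It does not work for C++, even though [libstdc++](https://gcc.gnu.org/onlinedocs/libstdc++/) implements `new` and `delete` on top of `malloc()`/`free()`
* Calls to `malloc()`/`free()` made inside libraries (static or dynamic) cannot be tracked either
* As written, the macro also throws away the pointer that `malloc()` returns, so a usable version would have to hand it back to the caller

Solving the problem thoroughly requires the dynamic linker. Taking GNU/Linux with [glibc](https://www.gnu.org/software/libc/) as an example, create the file `malloc_count.c`:
```c
#include <stddef.h>
#include <string.h>
#include <stdio.h>
#include <dlfcn.h>
#include <unistd.h>

void *malloc(size_t size)
{
    char buf[32];
    /* Address of the next malloc in load order (libc's), looked up once */
    static void *(*real_malloc)(size_t) = NULL;
    if (real_malloc == NULL) {
        real_malloc = dlsym(RTLD_NEXT, "malloc");
    }
    sprintf(buf, "malloc called, size = %zu\n", size);
    /* Report through write(2) on stderr (fd 2) */
    write(2, buf, strlen(buf));
    return real_malloc(size);
}
```

Compile and run:
```shell
$ gcc -D_GNU_SOURCE -shared -fPIC -o /tmp/libmcount.so malloc_count.c -ldl
$ LD_PRELOAD=/tmp/libmcount.so ls
```

`-D_GNU_SOURCE` is required because `RTLD_NEXT` is a GNU extension rather than part of the POSIX standard; without `_GNU_SOURCE`, `<dlfcn.h>` does not expose the `RTLD_NEXT` declaration. `-fPIC` tells the compiler to generate position-independent code, which on x86-64 accesses data through `rip`-relative addressing so that the same code can be loaded at any address; this is needed when building a shared library. `-ldl` is placed after the source file so that linkers defaulting to `--as-needed` still pick it up (glibc 2.34 and later provide `dlsym` in libc itself).

Every `malloc()` call and its argument now show up on stderr, and allocation statistics can be gathered as well, all without touching the original source code. This technique is called interpositioning. Possible applications include:
* Game cracking and runtime tracing
* sandboxing / software fault isolation (SFI)
* profiling and performance analysis
* Performance-optimized memory allocators (such as [TCMalloc](https://github.com/google/tcmalloc) and [jemalloc](https://github.com/jemalloc/jemalloc))

### How it works

When the `LD_PRELOAD` environment variable is set, glibc's dynamic linker ([ld.so](https://man7.org/linux/man-pages/man8/ld.so.8.html)) loads our `/tmp/libmcount.so` shared library before it loads and relocates `libc.so`. Our `malloc` is therefore loaded ahead of the `malloc` provided by `libc.so`, and because of the symbol resolution order, callers find our version first.

We still need the real `malloc`, of course, or the program cannot work. The code obtains the address of `malloc` in the next shared library (that is, `libc.so`) through `dlsym(RTLD_NEXT, "malloc")`. `RTLD_NEXT` is a GNU extension (in glibc it requires `_GNU_SOURCE`) defined in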
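`<dlfcn.h>`; it tells the dynamic linker to skip the current library and search for the symbol in the libraries that follow it in load order.

> :warning: Note: the example above uses `sprintf` with a fixed-size `buf[32]`. The fixed text already takes 22 bytes, so once `size` needs nine or more decimal digits (roughly 100 MB and up, and certainly `SIZE_MAX`), the formatted string no longer fits in 32 bytes and overflows the buffer. In practice, use `snprintf(buf, sizeof(buf), ...)` instead. `fprintf(stderr, ...)` is deliberately avoided because `fprintf` may call `malloc` internally and cause infinite recursion.

> :warning: Note: this example has a race condition in multithreaded programs. `real_malloc` is a `static` local variable, so several threads can enter the `if (real_malloc == NULL)` branch at the same time and call `dlsym` repeatedly. A careful implementation should use `pthread_once` or C11 `call_once` so that initialization runs exactly once. In addition, `dlsym` itself may call `malloc` (depending on the implementation), which triggers recursion before `real_malloc` is initialized and ends in a segfault. A common remedy is to fall back to `mmap` or a static buffer for temporary allocations until `real_malloc` is ready.

Note that GNU ld has an option, `-Bsymbolic-functions`, that affects the behavior of `LD_PRELOAD`: when a shared library is linked with it, calls from inside the library to its own functions are bound directly without going through the PLT, which defeats interpositioning.
> Further reading: [Symbolism and ELF files (or, what does -Bsymbolic do?)](https://flameeyes.blog/2012/10/symbolism-and-elf-files-or-what-does-bsymbolic-do)

### The `--wrap` alternative

Besides `LD_PRELOAD`, GNU ld also offers a link-time interpositioning mechanism: `ld --wrap=symbol`. The linker resolves undefined references to `symbol` to `__wrap_symbol`, and resolves references to `__real_symbol` to the original `symbol`, so the wrapper can still call the original implementation. Whereas `LD_PRELOAD` intervenes at run time, `--wrap` does its work at link time, which makes it suitable for statically linked programs. See [How to wrap a system call (libc function) in Linux](https://samanbarghi.com/post/2014-09-05-how-to-wrap-a-system-call-libc-function-in-linux/) for details.

Further reading:
* [Dynamic linker tricks: Using LD_PRELOAD to cheat, inject features and investigate programs](https://rafalcieslak.wordpress.com/2013/04/02/dynamic-linker-tricks-using-ld_preload-to-cheat-inject-features-and-investigate-programs/)
* [Tutorial: Function Interposition in Linux](https://jayconrod.com/posts/23/tutorial-function-interposition-in-linux)
* [List of resources related to LD_PRELOAD](https://github.com/gaul/awesome-ld-preload)

## "No such file or directory" may not mean what you think

When you run an ELF executable on Linux and get a "No such file or directory" error, the natural guess is that the file does not exist. In fact the file may well be there; what cannot be found is the ELF interpreter, that is, the dynamic linker itself.

Every dynamically linked ELF executable contains a `PT_INTERP` program header that records the path of the dynamic linker (typically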
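`/lib64/ld-linux-x86-64.so.2` or a similar path). While handling the `execve` system call, the kernel first reads this path and loads the dynamic linker it names, and the dynamic linker then loads the shared libraries the program needs. If the path in `PT_INTERP` does not exist (for example when a cross-compiled binary is run directly, or when a container lacks the matching dynamic linker), the kernel reports `ENOENT` and the shell prints "No such file or directory". The thing that is missing is the ELF interpreter, not the executable itself.

Use `readelf -l <executable> | grep interpreter` to see the ELF interpreter path, and [PatchELF](https://nixos.org/patchelf.html) to change it.

In the Linux kernel source file [`fs/binfmt_elf.c`](https://elixir.bootlin.com/linux/v4.18.12/source/fs/binfmt_elf.c), `PT_INTERP` is handled as follows (using Linux v4.18 as the reference):
```c=750
    if (elf_ppnt->p_type == PT_INTERP) {
        /* This is the program interpreter used for
         * shared libraries - for now assume that this
         * is an a.out format binary
         */
        ...
        elf_interpreter = kmalloc(elf_ppnt->p_filesz,
                                  GFP_KERNEL);
        if (!elf_interpreter)
            goto out_free_ph;

        retval = kernel_read(bprm->file, elf_ppnt->p_offset,
                             elf_interpreter,
                             elf_ppnt->p_filesz);
```

> Since Linux 4.14, the signature of `kernel_read()` has changed from `kernel_read(file, offset, buf, count)` to `kernel_read(file, buf, count, &pos)`; the excerpt above still shows the older argument order. In modern kernels (v6.x) the code is structured similarly but differs in detail; see [`fs/binfmt_elf.c`](https://elixir.bootlin.com/linux/v6.8/source/fs/binfmt_elf.c).

After reading the interpreter path from the `PT_INTERP` segment, the kernel tries to open that file. If the open fails, `load_elf_binary()` returns an error code, and what the user sees is the confusing "No such file or directory".

Besides `PT_INTERP`, the dynamic linker also has to locate the shared libraries the program depends on once it starts. The search order is, in sequence: the paths given in the `LD_LIBRARY_PATH` environment variable, the `/etc/ld.so.cache` cache file (built and maintained by the `ldconfig` tool), and the default paths (such as `/lib` and `/usr/lib`). `DT_RPATH` or `DT_RUNPATH` entries recorded in the binary, if present, are searched as well; [ld.so(8)](https://man7.org/linux/man-pages/man8/ld.so.8.html) lists the full order. If you install a new shared library but forget to run `ldconfig` to refresh the cache, you can run into a similar "not found" problem.

Further reading:
* "[Binary Hacks](http://ukai.jp/Slides/2006/1024-gree/binhacks.pdf)"
* [Executable and Linkable Format](https://web.archive.org/web/20190428202733/https://www.cs.stevens.edu/~jschauma/631/elf.html) (very thorough)
* [ELF Hacks](https://maskray.me/blog/2015-03-26-elf-hacks)
* [Hacking Your ELF For Fun And Profit](https://mgalgs.io/2013/05/10/hacking-your-ELF-for-fun-and-profit.html)

## Review: [Compilers and Optimization Principles](https://hackmd.io/@sysprog/c-compiler-optimization)

> [From Source to Binary: How A Compiler Works: GNU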
Toolchain](http://www.slideshare.net/jserv/how-a-compiler-works-gnu-toolchain)

Without LTO, the compiler may delete an unused `static` global variable to save space, but it cannot delete an unused non-static global variable, because it cannot tell whether another [compilation unit](https://www.cs.auckland.ac.nz/references/unix/digital/AQTLTBTE/DOCU_015.HTM) uses that variable. According to C99 6.2.2, a file-scope declaration without a storage-class specifier has external linkage, which means other translation units may refer to it. (With LTO enabled, the linker can see across compilation units and can remove external-linkage symbols that are truly unreferenced.)
> This is why local functions should be declared `static`: it gives them internal linkage, so the compiler knows the symbol cannot be referenced from outside and can safely remove or inline it.

[Early C compilers](https://www.nokia.com/bell-labs/about/dennis-m-ritchie/primevalC.html) had no preprocessor. One was introduced around 1973, inspired by the file-inclusion mechanisms of BCPL and PL/I, and it gave better support for modular design.
* Further reading: [Preprocessor Applications](https://hackmd.io/@sysprog/c-preprocessor) and [Function Calls](https://hackmd.io/@sysprog/c-function)

From the [DEC C Language Reference Manual](https://www.cs.auckland.ac.nz/references/unix/digital/AQTLTBTE/TITLE.HTM) for [DIGITAL UNIX](https://en.wikipedia.org/wiki/Tru64_UNIX) (December 1997):
> "A _compilation unit_ is C source code that is compiled and treated as **one logical unit**. The compilation unit is usually one or more entire files, but can also be a selected portion of a file if, for example, the #ifdef preprocessor directive is used to select specific code sections. **Declarations and definitions within a compilation unit determine the scope of functions and data objects**." [ [source](https://www.cs.auckland.ac.nz/references/unix/digital/AQTLTBTE/DOCU_015.HTM) ]

> In early 1998 Compaq acquired DEC, and Digital UNIX 4.0F was renamed Tru64 UNIX to stress its position as an industry-leading 64-bit operating system while the Digital brand was gradually phased out.

In other words, the declarations and definitions inside a compilation unit determine the scope of functions and data objects.

* Compilation units
    * The most common way of building C projects is to decompose every source file into an object file then link all the objects together at the end. This procedure works great for incremental development, but it is suboptimal for performance and optimization.
    Your compiler can’t detect potential optimizations across file boundaries this way.
* LTO (Link Time Optimization)
    * LTO fixes the "source analysis and optimization across compilation units problem" by annotating object files with intermediate representation so source-aware optimizations can be carried out across compilation units at link time.
    * LTO can slow down the linking process noticeably, but `make -j` helps if your build includes multiple non-interdependent final targets (.a, .so, .dylib, testing executables, application executables, etc).
    * [clang LTO](https://llvm.org/docs/LinkTimeOptimization.html) ([guide](https://llvm.org/docs/GoldPlugin.html))
    * [gcc LTO](https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html)

Both clang/LLVM and gcc support LTO; the corresponding command-line option is `-flto`.

Benefits of LTO:
* [Elimination of unused functions](https://gcc.gnu.org/wiki/LinkTimeOptimizationFAQ)
* [cross-language ThinLTO](https://github.com/rust-lang/rust/issues/49879) (Rust/C++)
* [Firefox is now built with clang LTO on all* platforms](https://glandium.org/blog/?p=3888)
    > on Linux, where we’re getting more than 5% performance improvements on most Talos tests (up to 18% (!) on some tests) compared to GCC 6.4 with PGO
* [Shrinking the kernel with link-time optimization](https://lwn.net/Articles/744507/)
    > Let's pick the STM32 target which represents the kind of tiny systems we're aiming for. The advantage here is that mainline Linux already runs on most STM32 microcontrollers ... This is a 22% size reduction right there.
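The linkage distinction that opens this section can be tried out in a single file. The following is only a sketch: the file name `linkage.c` and all four function names are made up for illustration, and what actually ends up in the object file depends on the compiler and the optimization level.

```c
/* linkage.c (hypothetical file name, for illustration only) */

/* Internal linkage and never called: no other translation unit can name
 * it, so an optimizing compiler is free to drop it from the object file.
 * The attribute (a GCC/Clang extension) only silences the unused warning. */
__attribute__((unused)) static int unused_internal(int x)
{
    return x * 3;
}

/* Internal linkage, called below: a candidate for inlining. */
static int helper(int x)
{
    return x + 1;
}

/* External linkage and used. */
int exported(int x)
{
    return helper(x) * 2;
}

/* External linkage but unused in this file: without LTO the compiler has
 * to keep it, because another translation unit might call it. */
int exported_unused(int x)
{
    return x - 1;
}
```

Building with something like `gcc -O2 -c linkage.c` and listing the symbols with `nm linkage.o` is one way to see which definitions survive. Adding `-flto` across a whole program gives the linker a chance to discard `exported_unused` as well when nothing references it.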
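### [Modern C](https://gustedt.gitlabpages.inria.fr/modern-c/)

* A major work from INRIA (the French national institute for research in computer science and automation), and a must-read!
* [CompCert](https://compcert.org/) is a C compiler from INRIA that targets high safety and reliability requirements and supports formal verification
* [CompCert](https://compcert.org/) supports a subset of C (roughly C99, minus some hard-to-verify features such as `longjmp`) and can generate code for x86, Arm, RISC-V and other processor architectures. The performance of its optimized code is roughly that of `gcc -O1`. CompCert is largely written and proved correct in the [Rocq Prover](https://rocq-prover.org/) (formerly Coq), with executable OCaml code extracted from it

The following rules are excerpted from "[Modern C](https://gustedt.gitlabpages.inria.fr/modern-c/)".

- [ ] **Rule 4.22.2.1**
> File scope static const objects may be replicated in all compilation units that use them. (Page 169)

"Replicate" here means to copy or duplicate. Once an object is declared `static const`, it has internal linkage (C99 6.2.2) and an immutable value, so the compiler can keep a separate copy in every compilation unit that uses it and apply a wider range of optimizations, such as constant folding and embedding the value in an instruction's immediate field.

- [ ] **Rule 4.22.2.2**
> File scope static const objects cannot be used inside inline functions with external linkage.

Another way is to declare them
```c
extern listElem const singleton;
```

and to define them in one of the compilation units:
```c
listElem const singleton = { 0 };
```

The major drawback of this second method is that the other compilation units, which see only the declaration, cannot know the value of the object, so the compiler may miss optimization opportunities. Consider the following code:
```c
inline listElem *listElem_init(listElem *el)
{
    if (el)
        *el = singleton;
    return el;
}
```

If the compiler knew the contents of `singleton`, the assignment would not have to reload the value from memory each time, and the call sites of `listElem_init()` could be made tighter, which helps both performance and program tracing.

- [ ] **Rule 4.22.2.3**
> File scope extern const objects may miss optimization opportunities for constant folding and instruction immediates.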
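As a contrast to the `extern` variant, here is a minimal sketch of the `static const` form that Rules 4.22.2.1 and 4.22.2.2 point toward. The layout of `listElem` is an assumption made for illustration (the excerpt does not show its definition), and the function is declared `static inline` so that it is allowed to refer to an object with internal linkage.

```c
#include <stddef.h>

/* Hypothetical element type; the excerpt does not show its definition. */
typedef struct listElem {
    int value;
    struct listElem *next;
} listElem;

/* Internal linkage: every translation unit that includes this definition
 * gets its own copy, and the compiler can see the initializer. */
static const listElem singleton = { 0 };

/* static inline, so using a file-scope static const object is permitted
 * (Rule 4.22.2.2 rules this out for inline functions with external linkage). */
static inline listElem *listElem_init(listElem *el)
{
    if (el)
        *el = singleton;   /* the value is known here: all members are zero */
    return el;
}
```

Because the initializer is visible at every call site, the compiler has the option of turning the assignment into a few stores of constant zeros instead of a load from a separately defined object.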
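An object declared with `extern` has external linkage, and its address may be taken in another compilation unit, creating an alias that points to the same object. Because the compiler cannot be sure whether such an alias exists, it has to assume conservatively that every write through a pointer might modify the object, and so it cannot apply optimizations such as constant folding. This is the aliasing problem.

The strict aliasing rule in C99 6.5 §7 states that the stored value of an object may be accessed only through an lvalue of a type compatible with its effective type, a qualified version of that type, the corresponding signed or unsigned type, or a character type. The compiler relies on this to assume that pointers to incompatible types never point to the same object, which enables more aggressive optimization. For different pointers of the same type (two `int *`, for instance), however, the compiler still cannot rule out aliasing; this is where the `restrict` qualifier introduced in C99 (C99 6.7.3.1) comes in, telling the compiler explicitly that the pointers do not alias.

- [ ] **Rule 4.22.2.4**
> File scope extern or static const objects may miss optimization opportunities because of mispredicted aliasing.

[ **22.5\. Functions.** ] (Page 174)

Functions declared `inline` run into trouble when they access symbols that have no linkage or only internal linkage. A non-`static` `inline` function cannot access a file-scope `static` variable, even when that variable is `const`-qualified. The proposal (that is, the suggestion made in Modern C) simplifies the rules in this area:
* A function declared with storage class register is equivalent to a static inline declaration, with the additional restriction that its address cannot be taken
* All static inline functions can access the register objects visible at their point of definition
* All functions declared inline can access the register constants visible at their point of definition.

> Supplement: the linkage rules for `inline` functions differ in an important way between GNU C89 (`gnu89`) and ISO C99 and later. C99 6.7.4 §6 specifies that a function definition declared with plain `inline` (without `static` or `extern`) is an inline definition and does not provide an external definition, so if a call ends up needing an out-of-line copy, some translation unit must provide one, for example through an `extern inline` declaration. In gcc's `gnu89` mode the semantics of `inline` are different (plain `inline` and `extern inline` behave roughly like each other's C99 counterpart), which is a common source of confusion. Using `-std=c99` or a newer standard avoids the problem. In practice `static inline` is the safest and most portable choice, because it behaves the same under both models.

## Symbol Visibility

According to C99 6.2.2, file-scope functions and variables without the `static` specifier have external linkage by default. In a dynamically linked setting these symbols are placed in the dynamic symbol table and exported, which means other shared libraries or executables can access them. Once a symbol is exported it can be subject to the interpositioning described earlier, leading to unexpected behavior. The remedy is to set symbol visibility properly.

Both gcc and clang support the [visibility](https://gcc.gnu.org/wiki/Visibility) attribute and the `-fvisibility` compiler option, the latter setting the default for an entire object file:
* `default`: visibility is left unchanged, and the symbol appears in the dynamic symbol table
* `hidden`: the effect resembles `static`. The symbol is not placed in the dynamic symbol table, and other shared libraries or executables cannot see it. Unlike `static`, however, a `hidden` symbol still has external linkage and can be shared among the compilation units of the same shared library

```c
#if (__GNUC__ > 3) && (defined(__ELF__) || defined(__PIC__))
#   define CHEWING_API __attribute__((__visibility__("default")))
#   define CHEWING_PRIVATE __attribute__((__visibility__("hidden")))
#else
#   define CHEWING_API
#   define CHEWING_PRIVATE
#endif
```

The compiler option `-fvisibility=hidden` sets the visibility globally. Once it is set to hidden, every symbol without an explicit annotation is treated as local, and only symbols explicitly marked with `__attribute__((visibility("default")))` are exported. This is common practice for large libraries: hide every symbol by default and export only the public API, which both avoids symbol clashes and reduces the symbol resolution work the dynamic linker has to do at startup.

Further reading:
* [Linker and Libraries Guide](https://docs.oracle.com/cd/E19683-01/816-1386/chapter6-79797/index.html)
* [Why symbol visibility is good](https://www.technovelty.org/code/why-symbol-visibility-is-good.html)

Consider the following code (syms.c):
```c
static int local(void) { }
int global(void) { }
int __attribute__((weak)) weak(void) { }
```

Compile and analyze:
```shell
$ gcc -o syms.o -c syms.c
$ LC_ALL=C readelf --syms syms.o

Symbol table '.symtab' contains 11 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS syms.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     5: 0000000000000000     7 FUNC    LOCAL  DEFAULT    1 local
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     9: 0000000000000007     7 FUNC    GLOBAL DEFAULT    1 global
    10: 000000000000000e     7 FUNC    WEAK   DEFAULT    1 weak
```

Compare this with the earlier malloc_count:
```shell
$ readelf --syms /tmp/libmcount.so | grep malloc
    15: 00000000000007c0   163 FUNC    GLOBAL DEFAULT   12 malloc
    35: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS malloc_count.c
    36: 0000000000201050     8 OBJECT  LOCAL  DEFAULT   24 real_malloc.3854
    61: 00000000000007c0   163 FUNC    GLOBAL DEFAULT   12 malloc
```

Modify malloc_count.c so that the definition becomes:
```c
__attribute__((visibility("hidden"))) void *malloc(size_t size)
{
    /* ... the rest is unchanged ... */
}
```

You will find that `LD_PRELOAD=/tmp/libmcount.so ls` no longer has any effect. In other words, the `malloc` we defined has become local and does not affect other shared libraries or executables. Look again:
```shell
$ readelf --syms /tmp/libmcount.so | grep malloc
    35: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS malloc_count.c
    36: 0000000000201050     8 OBJECT  LOCAL  DEFAULT   24 real_malloc.3854
    46: 00000000000007a0   163 FUNC    LOCAL  DEFAULT   12 malloc
```

The binding (the `Bind` column) has changed from GLOBAL to LOCAL, and `malloc` no longer appears in the dynamic symbol table.

## Dynamic linking support

> [Linking](https://www.scs.stanford.edu/18wi-cs140/notes/linking.pdf)

To support dynamic linking, the compiler and the linker have to cooperate in producing additional data structures:
* Procedure Linkage Table (PLT): every call to an external function goes through a PLT entry. The PLT is a set of trampoline code sequences that trigger lazy binding on the first call, letting the dynamic linker resolve the function's actual address; later calls jump straight to the resolved address.
* Global Offset Table (GOT): holds the actual addresses of global variables and functions. The PLT obtains the target function's address through the GOT, whose contents are filled in by the dynamic linker at load time or on the first call.

The two work together as follows. When the program calls the external function `foo()` for the first time, control passes through `foo@PLT`, a trampoline that reads an address from `GOT[foo]`. Because of lazy binding, `GOT[foo]` initially points back into the PLT at the resolver stub; the resolver calls the dynamic linker's `_dl_runtime_resolve()` to look up the actual address of `foo`, writes the result back to `GOT[foo]`, and jumps to `foo`. Subsequent calls to `foo()` read the resolved address directly from the GOT and never reach the resolver again.

In the x86-64 System V ABI, the PLT/GOT use the following relocation types:
* `R_X86_64_JUMP_SLOT`: used for the GOT slot that belongs to a PLT entry; with lazy binding the dynamic linker fills in the function address
* `R_X86_64_GLOB_DAT`: used for the GOT entry of a global variable; the dynamic linker fills in the variable's address at load time
* `R_X86_64_RELATIVE`: used for relocations that depend only on the load address; the base address is simply added and no symbol lookup is involved

A typical x86-64 PLT stub sequence looks like this:
```
foo@PLT:
    jmp *foo@GOTPCREL(%rip)   # jump through the GOT entry using rip-relative addressing
    pushq $index              # first call: push the relocation index onto the stack
    jmp PLT[0]                # jump to PLT entry 0 (the resolver entry)
```

`PLT[0]` pushes the link_map pointer onto the stack and then jumps to `_dl_runtime_resolve()`. The latter looks up the symbol according to the relocation index and patches the GOT entry, so that later calls jump directly to the target function.

Lazy binding can be disabled by setting the environment variable `LD_BIND_NOW=1` or by linking with `-z now`, which forces the dynamic linker to resolve all symbols at program startup. Combined with `-z relro` (read-only relocations), the GOT is made read-only once relocation is complete, which is an effective defense against GOT overwrite attacks. This combination is called full RELRO and is the default hardening configuration on a number of modern Linux distributions.

![image](https://hackmd.io/_uploads/H1c8o2cj-l.png)

Further reading:
* The e-book [Computer Science from the Bottom Up](https://www.bottomupcs.com/) by [Ian Wienand](https://github.com/ianw), whose [Chapter 9.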
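Dynamic Linking](https://www.bottomupcs.com/ch09.html) explains the PLT/GOT mechanism in detail
* [C 語言編程透視](https://github.com/tinyclub/open-c-book), chapter 4

Small experiments:
* [Better understanding Linux secondary dependencies solving with examples](http://www.kaizou.org/2015/01/linux-libraries)

Advanced topics:
* [Modern dynamic linking infrastructure for PLT](http://lambda-the-ultimate.org/node/3474)
* ~~Native Client~~: [Loading the dynamic linker and executable](https://chromium.googlesource.com/native_client/src/native_client/+/master/docs/initial_dynamic_load.md) (retired by Google in 2020 and superseded by [WebAssembly](https://webassembly.org/); kept here as a historical reference that shows an exploration of loading ELF executables inside a web browser)
* [Dynamic Linking and Loading](http://www.iecc.com/linker/linker10.html)
* [Anatomy of Linux dynamic libraries](https://developer.ibm.com/tutorials/l-dynamic-libraries/)

Applications:
* [Applying Partial Virtualization on ELF Binaries Through Dynamic Loaders](http://amslaurea.unibo.it/5065/1/pareschi_federico_tesi.pdf)

### Learning by doing

The most effective way to understand the dynamic linker is to implement one yourself. The following three open-source projects approach the subject from different angles and at different depths, and are best studied in order.

[min-dl](https://github.com/jserv/min-dl) (minimal dynamic linker) is an entry-level teaching implementation that demonstrates GOT/PLT handling and the relocation flow. It supports the x86_64 and Arm (specifically AArch32/AArch64) architectures. Including the test programs, the whole code base is only a little over 400 lines, which makes it a good starting point for understanding the core of a dynamic linker.

[dynld](https://github.com/johannst/dynld) is organized as progressive chapters, implemented in C and x86-64 assembly, and covers the details of the System V x86-64 ABI:
* Chapter 1 introduces the basics of dynamic linking
* Chapter 2 builds an executable that does not depend on the standard library (no-std) and examines the initial process state (auxiliary vector, stack layout, and so on)
* Chapter 3 implements a skeleton dynamic linker that can run a statically linked no-std executable
* Chapter 4 implements a complete dynamic linker that handles shared library dependencies

Where min-dl shows the whole picture at once, dynld's incremental design lets the reader work through the full flow step by step, from process startup to symbol resolution.

[sloader](https://github.com/akawashiro/sloader) is a more ambitious project that aims to replace glibc's `ld-linux.so` with modern C++ (C++20). Its developer points out that the source code of glibc's `ld-linux.so` is hard to read because heavy macro use (for multi-architecture support) is interwoven with `libc.so` initialization. sloader deliberately separates the loading mechanism from `libc.so` initialization in pursuit of readable code. It can already load practical programs such as `cmake`, `g++`, `ld`, and `htop`, and even GUI applications, which demonstrates that an independent ELF loader is feasible. However, sloader has not yet implemented security-related features (such as RELRO and stack canary verification), so it is suitable only for research and study.

The design of an ELF loader is closely tied to the operating system kernel. The mainstream implementations include:
1. The glibc dynamic linker ([ld.so](https://man7.org/linux/man-pages/man8/ld.so.8.html) or `ld-linux.so`): the dynamic linker shipped with glibc, standard on Linux desktops and servers
2. Android linker (`bionic/linker/*`): the dynamic linker that comes with Android's Bionic C library, optimized for the memory and startup-time constraints of mobile devices

For the sake of startup speed, some Android apps want to preload all of their shared libraries, sidestepping lazy loading and reducing the latency that the [Android Runtime](https://en.wikipedia.org/wiki/Android_Runtime) would otherwise incur when loading shared libraries later. Two projects grew out of this need:
1. Google developed [android_crazy_linker](https://chromium.googlesource.com/chromium/src.git/+/master/third_party/android_crazy_linker/src/) (since retired by the Chromium project; modern Android, API level 23+, natively supports features such as loading a `.so` directly from the APK that previously had to be implemented by hand)
    - Used in the Chrome project and written in C++
    - Parses the ELF header itself and sets up the corresponding memory, bypassing the Android linker
2. Meta develops [SoLoader](https://github.com/facebook/SoLoader)
    - Written entirely in Java. It also parses the ELF header, but only to discover the dependencies between shared libraries; the actual loading is still delegated to the Android linker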
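Both projects start by reading ELF headers themselves. As a rough idea of what that first step involves, the sketch below walks the program headers of a 64-bit ELF file and extracts the `PT_INTERP` string, the same field the kernel reads in the "No such file or directory" discussion earlier. `read_elf_interp` is a hypothetical helper written for this note and is not taken from any of the projects above. It assumes a Linux host with `<elf.h>`, a file in the host's byte order, and ELF64 only.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Returns 0 and fills buf with the interpreter path when PT_INTERP exists,
 * 1 when the file is a valid ELF64 without PT_INTERP (e.g. statically
 * linked), and -1 on any error (not ELF64, I/O failure, buffer too small). */
int read_elf_interp(const char *path, char *buf, size_t buflen)
{
    int ret = -1;
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1;

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof(eh), 1, fp) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto out;

    ret = 1; /* valid ELF64; no PT_INTERP seen so far */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long) (eh.e_phoff + (Elf64_Off) i * eh.e_phentsize);
        if (fseek(fp, off, SEEK_SET) != 0 ||
            fread(&ph, sizeof(ph), 1, fp) != 1) {
            ret = -1;
            goto out;
        }
        if (ph.p_type != PT_INTERP)
            continue;

        /* p_filesz includes the terminating NUL of the path string */
        if (ph.p_filesz == 0 || ph.p_filesz > buflen ||
            fseek(fp, (long) ph.p_offset, SEEK_SET) != 0 ||
            fread(buf, 1, ph.p_filesz, fp) != ph.p_filesz) {
            ret = -1;
            goto out;
        }
        buf[ph.p_filesz - 1] = '\0'; /* do not trust the file to terminate it */
        ret = 0;
        break;
    }

out:
    fclose(fp);
    return ret;
}
```

Calling it on `/proc/self/exe` from a dynamically linked program should yield the same path that `readelf -l` reports as the requested program interpreter. A real loader would go on to map the `PT_LOAD` segments and process the dynamic section, which this sketch leaves out.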