你所不知道的C與Linux

你所不知道的C與Linux --jserv lecture 心得與筆記看jserv 講座講韌體開發工程師要知道的[C語言細節](https://youtu.be/1QmbHXYc8_g?si=k0umvy1wdsIBeQ_G)與[Linux 架構](https://youtu.be/-2Pn4B8S1EM?si=9uFsv2CeAFERBtBA) 的筆記，與重點。對我來說，讀這篇，除了加強觀念(在The C Programing Language已經熟的就不會記錄了) 更重要是紀錄我從面試C陷阱問題與這篇額外提到的細節做統整，確保我能總是注意 [CERT Coding Standards](https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard)。所以後續還要把這些重點做成Anki字卡做間隔性複習，以確保在業界，追bug 與debug 與寫程式能力。然後會期許自己 C++/ python 也透過leetcode 練習解問題，以及記錄重點在別篇繼續寫。不過以目前來說這兩篇C/Linux 以及 Linux kernel netowrking 書與 CCNA 200-301 (或Cisco LAN Switching) 是業界第一份上工最優先(聽起來就是根據晶片開發驅動的話 LDD可能也重點書?)。 [jserv C課程](https://hackmd.io/@sysprog/c-programming) [jserv linux課程](https://hackmd.io/@sysprog/linux-kernel-internal) Part 1. C ch2 指標篇 ch3 函式呼叫篇 ch4 記憶體管理、對齊及硬體特性 ch5 物件導向程式設計篇 (skip?) ch6 技巧篇 ch7 linked list 和非連續記憶體操作 ch8 Stream I/O, EOF 和例外處理以及未定義行為篇 Part 2. Linux Linux ch1 作業系統術語和概念 Linux ch2 透過 eBPF 觀察作業系統行為賦予應用程式生命的系統呼叫 Linux ch3 不僅是個執行單元的 Process Linux ch4 不只挑選任務的排程器 Linux ch5 記憶體管理 Linux ch6 檔案系統概念及實作手法 Linux ch7 中斷處理和現代架構考量 Linux ch8 Timer 及其管理機制 Linux ch9 針對事件驅動的 I/O 模型演化 Linux ch10 淺談同步機制 Linux ch11 多核處理器和 spinlock Linux ch12 RCU 同步機制與 Scalability 議題 ### ch2 指標篇 in C, everything is a representation (unsigned char[sizeof(TYPE)]). 讀規格書可大幅省去臆測(確實有感覺例如+=可但str = str+int不可) eg 大老從不寫void main 只寫 int main 溯源能力是很重要的，才不被舊瓶裝新酒或跨領域借用的『新觀念』所迷惑 C 語言的物件就指在執行時期，資料儲存的區域學會 [GDB](https://www.slideshare.net/slideshow/introduction-to-gdb-3790833/3790833#12)，p,b,r,c,info b, info program, n(直接調用func不進去),s(會進去func), x/ \<n(長度，當前addr後多長)/f(格式 d十進x十六)/u(byte數默認4byte, b單byte)\> \<addr\> [大神 Everything you need to know about pointers in C](https://boredzo.org/pointers/) 你常見這種寫法，這就是double pointer好處。(我自己用建兩個同一層級ptr, 然後用其中一個travesal，這跟pointer to pointer差在? 要思考清楚) ```cpp= struct list **lpp; for (lpp = &list; *lpp != NULL; lpp = &(*lpp)->next) ``` 在 object 的生命週期以內，其存在就意味著有對應的常數記憶體位址 (這點配合return static var-- 就是先回傳該var 若有像print用到就是先p再減)。注意，C 語言永遠只有 call-by-value :::warning 避免memory leak, 也避免重複free 動態記憶體(free(ptr), where ptr = malloc(...) )後，使用空指標，避免魔法數字，eg 你register offset 4 是有其意義的話要用對應macro 定的字 REG_LED 或 VLAN_TOTAL_TABLE 之類。小心segment fault，return \*strPtr ="hello"; 存取權限衝突試圖存取錯誤地方。 buffer overflow(gets, strcpy, scanf, printf 等不檢查buffer要注意用更好替換)，int overflow(使用INT_MAX 與INT_MIN等作檢查). function prototype 幫助debug 避免錯誤也幫助編譯器在最佳化階段，能最大化效益。 forward declaration 重要幫助偵錯，防止誤用，以及加快build time。 scope也重要，常看到Linux kernel 在function 內或struct內宣告，就是只想讓此block內知道(least privilage原則的實現)，外部不知道，static 也比global好就在scope差別。 ::: ```cpp= //設定某絕對地址的對應直 -> 轉型成指標，再用 * 取值做更改 *(int32_t * const) (0x67a9) = 0xaa6; ``` A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. 重要規範 `void * 和 char * 彼此可互換`的表示法 (所以The C programing language 在字串處理部分，確實很多char*的使用void* 作return value...) ```cpp= void *memcpy(void *dest, const void *src, size_t n); ``` 指標(也是物件導向核心)是儲存-執行模型下必要條件 * 給一址，CPU 可讀該位址的資料。 * 給一址，CPU 可寫資料至該位址。 :::success 像是 Linux超常見的 [container_of](https://hackmd.io/@sysprog/linux-macro-containerof) 幫助link list, hash table 通用類資料結構簡化程式設計 void * 存在的目的就是為了強迫使用者`使用顯式轉型或是強制轉型` ::: If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined. 不保證 pointers to data(type meaning char* or void* in standard) 與 pointers to functions 之間相互轉換是正確的因為C 只有call by value，所以一定要注意 ptr 改值用double pointer其內涵本質 (你不用 pointer to pointer 情況是 aPtr 指向orignal data, function 複製aPtr出 bPtr做操作指向2號資料出去後就消失 ) 來比較用pointer to pointer : function 造aPPtr 指向aPtr (傳入&aPtr) 在函數內做操作指向2號data 使aPPtr 指向的aPtr 指向了2號data。想像一下圖片畫面，想不出來就回去看jserv指標篇 array vs. pointer: in declaration: extern arr[] 不可變ptr， arr[10] 不可變ptr，`func(char arr[])可變func(char *arr)` in expression: arr與ptr可互換 x\[i\] 總是被編譯器改寫為 \*(x + i) gbd : 可p sizeof(var), 運算式, 分析型態whatis var, 觀察與修改記憶體 ```cpp= (gdb) x/4 b 0x601060 <b>: 0x00000000 0x00000000 0x00000000 0x00000000 (gdb) p b $4 = {{ v = {0, 0, 0}, length = 0 } <repeats 17 times>} //p 命令不僅能 print，還可變更記憶體內容: (gdb) p (&b[0])->v = {1, 2, 3} ``` x /[長度][格式] [位址表達式] 提升輸出可讀性 (gdb) set print pretty 甚至在動態時期可呼叫函式 (改變執行順序)，比方說 memcpy ```cpp= //strcpy或strcat導致那個r超過原本配置的空間就會出問題 char *r = malloc(strlen(s) + strlen(t) + 1); // use sbrk; change program break if (!r) exit(1); /* print some error and exit */ strcpy(r, s); strcat(r, t); ``` ![ptr-assign-change](https://hackmd.io/_uploads/r1LF28YpJe.png) 左值 (lvalue) : 一個佔據某個特定記憶體的值。右值 (rvalue) : 一個 expression 結束後就消失的值。 *jserv大力推 Learn C The Hard Way此書* :::success char str[123]; 為何 str == &str 呢？只是值一樣但其實型態不同(面試題厲害== 之前有練到呀)，左為char * 右為pointer to an array `char (*)[123]` string literals 會被分配於 "static storage" 當中 (C99 [6.4.5]) 修改 string literals 的內容，將會造成未定義行為注意 arr, ptr 互換可能的問題 eg 有[程序員](https://hackmd.io/@sysprog/c-array-argument) func parameter 傳 arr\[...\] 然後for loop 終止條件用sizeof(arr) 這其實是回傳指標型態 (因為func pass arr convert to ptr ) 避免重複造輪子，而且你(普通人通常)造的又爛又醜...，所以要用向下方這種內核提供的 macro ```cpp= #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) ``` ::: :::spoiler lvalue and rvalue lvalue (C 和 C++ 都有) 和 rvalue (C++ 特有) C 語言標準中明確聲明 l 是 "locator" (定位器) 的意思，lvalue 定義為 "locator value"，亦即 lvalue 是個物件的表示式 C++ 98，非 lvalue 運算式被稱為 rvalue C++ 11，運算式又被重新分類為 glvalue(“generalized” lvalue), prvalue(“pure” rvalue), xvalue(an “eXpiring” value), lvalue 與 rvalue。 ::: [Insecure coding in C必看](https://www.slideshare.net/slideshow/insecure-coding-in-c-and-c/35699341#1) [null pointer與未定義行為](https://pvs-studio.com/en/blog/posts/cpp/0306/) ### ch3 函式呼叫篇 IRQ (interrupt request) ISR (Interrupt Service Routines) IRQ mode MMIO v.s PMIO 以網路卡的流程為例: 封包進來 -> interrupt(loading高就NAPI的polling buffer處理) -> ISR -> IRQ mode -> IORQ進行記憶體操作(讀取/寫入資料) 回傳值放在暫存器可以提高效能，放不下的就放起始位址 (e.g. struct) stack frame 最好的朋友是 2 個暫存器: stack pointer frame pointer * rip (instruction pointer): 記錄下個要執行的指令位址 (配合return) * rsp (stack pointer): 指向 stack 頂端 * rbp (base pointer, frame pointer): 指向 stack 底部記住memory理架構 stack由高至低長(以stack角度新長的就是頂端--rsp, rbp記錄這區域(子function)開頭位置), heap由低至高透過 gdb 追蹤程式: $ gdb -q stack 在 GDB 中使用 disas 命令將其反組譯![stack-overflow-ex](https://hackmd.io/_uploads/BkJmqwqpJe.png) (gdb) set disassembly-flavor intel (gdb) disas main [基本就是 ret addr(下一指令) 先push, 再建新func base空間,再rsp頂端繼續長,回來就rsp回已push的那地址與rbp回前一func stack開頭](https://hackmd.io/@sysprog/c-function#%E5%8B%95%E6%85%8B%E8%BF%BD%E8%B9%A4-Stack) 例子很好，適合trace。 another funA push ret addr then push rbp ![funA push ret addr then push rbp](https://hackmd.io/_uploads/rJdvqw5TJl.png) stack-based buffer overflow: 由來就是gets, strcpy 等function沒有限制size大小，使buffer塞滿，甚至使memory space被充斥這些壅塞值，而可能造成segment fault存取錯誤(無權限)區域。 ![stack-overflow-ex](https://hackmd.io/_uploads/HJsO9vc6Jg.png) :::spoiler ROP attack ROP attack 基礎概念：使用一連串的指令組合成一個完整的功能利用 ret 讓程式執行目標指令 RIP 裡面的內容是下一個要執行的指令；RSP 是目前 stack top 的位置在正常的情況下，程式當呼叫 ret 前會使用 pop 將 stack 內無用的內容丟掉、接著把 RBP 的內容設為 retrun 後的 RBP，最後讓 RSP 指在 return address 上。藉由 overflow 在 stack 中加入一連串的指令(且指令以 ret 結尾)，當程式執行 ret 時就會執行 RSP 所指的內容，因為每個指令都是以 ret 結尾，所以會繼續執行 stack 中的下一道指令，如此一來就能使程式執行一連串的指令，ROP 的目的就是透過這一連串的指令組合成一個有意義的行為。 ::: :::spoiler callo dif from malloc + memset! stack 不可以 heap 可以是因為 long arr[n + 1]; 時，作業系統會去衡量要求的記憶體足不足夠，此時發現要得太大，因此發出 segfault. 而 malloc() 是屬於妳要多少跟我說，我都給妳 (overcommit)。所以拿到的空間可能會比要求的小，作業系統發現開始不夠用時就開始殺東殺西把空間騰出來因此發現，malloc() 存取大容量其實是有很大風險的， calloc() 跟作業系統要空間時會拿到歸好零的記憶體 (for security)，已歸零的就不重覆歸零。　我們無法保證 malloc 要回來的空間是否為 0，因此都需要 memset 過。(可能重工) calloc() 在要大空間 (128KB↑) 時，如果執行環境能夠配合，就會有效率得多: kernel 分配 VM 給 calloc()，而 VM 空間用完時再執行騰出空間、歸零、swap 的動作 (amortize cost)；malloc() 先執行這些動作。 free() 釋放的是 pointer 指向位於 heap 的連續記憶體，而非 pointer 本身佔有的記憶體 (*ptr)。 (因此free完後該指標仍指向那位置要在設*ptr = NULL才好!) ::: 總是記得 memory架構! 最低位址(0) code section(最重要不可更改) 最高Stack(區變)往低長，heap往高長。而對Stack 角度來說 RSP就是其頂端(也就是mem目前stack區 mem addr最低) 而RBP紀錄此frame(function區段一切都是addr存儲以及pointer)的開頭而RIP就像計算機結構的next instruction，指標指向下個執行的指令(配合return) *為了防止對同一個 pointer 作 free() 兩次的操作，而導致程式失敗，free() 後應該設定為 NULL。* glibc 提供了 malloc_stats() 和 malloc_info() 這兩個函式，可顯示 process 的 heap 資訊 size_t: 一種無符號整數，隱含著本機理論所能容納建立最大物件的位元組數大小的意義，因此常被用於`數組索引、記憶體管理函數` (真的很常見，也很有必要) size_t 小陷阱: ```cpp= #include <stdio.h> #include <string.h> int main(void) { int i = -1; if (i < strlen("hello")) { printf("Hello, World\n"); } else { printf("Hello, test\n"); } return 0; } ``` 結果竟為 hello test 隱式自動類型轉換機制，size_t實際類型為unsigned long int 有號與無號運算，會自動把有號promote成無號結果變很大 *有號與無號間的運算，要小心!* [functional programing思考你為安全，簡潔明確，要做的事](https://hackmd.io/@sysprog/c-functional-programming) :::spoiler FP 思考法運算用的 function 大多只包含條件式及遞迴呼叫 (功能模塊化不要assign一堆又運算一堆 ) 沒有隱含 (implicit) 的 Side Effect (副作用) 三種常見操作 map(): 配合給定行為，回傳一個陣列或鏈結串列 //預處理 filter(): 用以檢索和篩選 //檢索處理目標與篩選 reduce(): 將素材混合在一塊，得到最終結果 //運算，或是大型運算再細分(recursion, div&conq)。 ![func-map](https://hackmd.io/_uploads/r1I_Hd5a1e.png) Pure functions 功能專一化 Function Composition 大分小 Avoid side effects 避免隱性轉換 (功能不專一搞IO)等副作用以及用`const避免不預期的改變` (最小特權原則) ::: ### ch4 記憶體管理、對齊及硬體特性 ![mem-allo](https://hackmd.io/_uploads/SyaZvu5pJl.png) A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer. void * 的設計，導致開發者必須透過 explicit (顯式) 或強制轉型 free() 釋放的是 pointer 指向位於 heap 的連續記憶體 pragma pack 是用來指定 struct 結構內部資料的儲存對齊方式的預處理指令 (處理低階資料 eg封包常用到) 指定 C 語言編譯器在處理記憶體對齊時，所使用的記憶體封裝長度 (可用1,2,4,16) malloc() and calloc() functions return a pointer to the allocated memory, which is suitably aligned for any built-in type. In the GNU system, the address is always a multiple of eight on most systems, and a multiple of 16 on 64-bit systems. 處理alignment做壓縮變更結構體佔用的空間 pack macro #pragma pack(push[,n])將目前對齊設定推送到內部堆疊，然後可選擇設定新的對齊。 #pragma pack(pop)將對齊設定還原為內部堆疊頂部儲存的設定（並刪除該堆疊條目） [struct alignment 三大規則](https://zhuanlan.zhihu.com/p/646870288) 成員各自記憶體對齊後，結構體本身還要進行一次記憶體對齊，確保整個結構體佔用內存大小是結構體內最大資料成員的最小整數倍；原則(3) 成員有陣列依照陣列資料類型與最大者取較大者對齊 nested struct 依照 raw data type最大者取對齊 (最後一步) 對應原則(3) static 成員放在靜態變數區不影響sizeof結果 //every thing in C is value and address(pointer)的概念 (有點不精確再去看指標篇!) [更多細節](https://zhuanlan.zhihu.com/p/499605869) 三大原則: （1）結構體變數的起始位址能夠被其最寬的成員大小整除。(起始位置從0 開始通常不會有問題) （2）結構體每個成員相對於起始位址的偏移量能夠被其自身大小整除，如果不能則在前一個成員後面補充位元組。 (對應每組變數alignment eg 結構內char a; int b 這情況 b 在a後面補alignemnt 直到4 bytes) （3）結構體總體大小能夠被最寬的成員的大小整除，如不能則在後面補充位元組。 :::spoiler compiler and optimize 編譯器和最佳化原理篇 ![gnu process-flow](https://hackmd.io/_uploads/r1dfPc9p1x.png) crt (C runtime ) 將 exit code 傳遞給作業系統的程式(可透過echo $? 來看) exception handling，透過 setjmp 和 longjmp 函式來存取編譯器要考慮最佳化也考慮不知道最糟會有什麼呼叫或用戶亂用。不知道 caller 會怎麼傳遞數值 ```cpp= void foo(int *i1, int *i2, int *res) { for (int i = 0; i < 10; i++) { *res += *i1 + *i2; } } //遇到這樣呼叫 (pointer alising) 就無最佳化可用 foo(&var, &var, &var); //同時也要注意在函式 paramerter做 a++, ++a 會因順序無法確定(賦值時機)，而未定義行為 ``` ![compile-flow](https://hackmd.io/_uploads/Sy4ls9qayl.png) COFF (common object file format) : 是種用於執行檔、目的碼、共享函式庫 (shared library) 的檔案格式 ELF (extended linker format) : 現在最常用的文件格式，是一種用於執行檔、目的碼、共享函式庫和核心的標準檔案格式，用來取代 COFF ::: ### ch5 物件導向程式設計篇 Data Encapsulation 善用 forward declaration 與 function pointer，用 C 語言實現 Encapsulation ![principle of OOP](https://hackmd.io/_uploads/r1zkNj9p1e.png) 抽象: 分離實作與 declare 介面封裝: 部分內部method 只由 obj interface去交互運作 (外部不讓你碰) 繼承: 基礎共有的properties and method 可被子物件繼承或延伸多型: 同樣method(四則運算)可在不同obj運作得略有不同(根據其物件特性) :::spoiler good example from jserv 要尤其注意細節 pointer to pointer and init \*ptr as NULL then use after assignemt(malloc) ```cpp= #include <stdio.h> #include <stdlib.h> /* forward declaration */ typedef struct object Object; typedef int (*func_t)(Object *); struct object { int a, b; func_t add, sub; }; static int add_impl(Object *self) { // method return self->a + self->b; } static int sub_impl(Object *self) { // method return self->a - self->b; } // & : address of // * : value of // indirect access int init_object(Object **self) { // call-by-value if (NULL == (*self = malloc(sizeof(Object)))) return -1; (*self)->a = 0; (*self)->b = 0; (*self)->add = add_impl; (*self)->sub = sub_impl; return 0; } int main(int argc, char *argv[]) { Object *o = NULL; init_object(&o); o->a = 9922; o->b = 5566; printf("add = %d, sub = %d\n", o->add(o), o->sub(o)); return 0; } ``` ::: designated initializers: struct init，Omitted field members are implicitly initialized the same as objects that have static storage duration. \[index\], .fieldname 這樣的形式被稱作 designator (指示符) 可同時使用 C 語言的物件就指在執行時期，*資料儲存的區域，可以明確表示數值的內容* from jserv 精彩沒想過的用法，一切都是(指標與記憶體) ```cpp= struct stack { struct stack_type *self; /* Put the stuff that you put after private: here */ }; struct stack_type { void (*construct)(struct stack *self); /**< initialize memory */ struct stack *(*operator_new)(); /**< allocate a new struct to construct */ void (*push)(struct stack *self, thing *t); thing * (*pop)(struct stack *self); int as_an_example_entry; } Stack = { .construct = stack_construct, .operator_new = stack_operator_new, .push = stack_push, .pop = stack_pop }; ``` 前置處理器: macro優: 快，文字替換，條件編譯。但缺點也很明顯: 重複運算`邏輯很容易出問題`，`不做型態檢查` ARRAY_SIZE() [這樣的macro其實是陷阱重重](https://frankchang0125.blogspot.com/2012/10/linux-kernel-arraysize.html)... 因為macro本身沒辦法做型態檢查，只是單純的將值帶入並展開而在C中我們常常會將指標和陣列混著使用因此Linux在定義ARRAY_SIZE() 時除了透過上述的方式來取得陣列元數個數外還另外加上了型態檢查最尾端額外加了__must_be_array() Linux 核心採用 macro 來實作 linked list (太多地方用list, 避免function call 來提升效能了) macro 在編譯時期會被編譯器展開成實際的程式碼，這樣做的好處是不依賴編譯器最佳化。 ### ch6 技巧篇矩陣運算的案例，實踐物件導向、指標操作、函式呼叫等觀念 [why c++11 dont support designator as c99](https://stackoverflow.com/questions/18731707/why-does-c11-not-support-designated-initializer-lists-as-c99) 兩方對物件看法的細節不同 following Designated Initializations, which are valid in C, are restricted in C++: ```cpp= struct A a = { .y = 1, .x = 2 } //c++ want designator in order assignment int arr[3] = { [1] = 5 } //array designator not allowed struct B b = {.a.x = 0} //designator can't be nested struct A c = {.x = 1, 2} //these are invalid in c++, either all or none data must be init by designator ``` 就如同你所知道 function pointer 幫助callback, 多型的感覺，為了大型開發，OOP(四件式Enc/Hide/Inherit/Polymorphism)與Functional Programing都是重點。抽象: 分離實作與 declare 介面封裝: 部分內部method 只由 obj interface去交互運作 (外部不讓你碰) 繼承: 基礎共有的properties and method 可被子物件繼承或延伸多型: 同樣method(四則運算)可在不同obj運作得略有不同(根據其物件特性) 由此可知，介面是個關鍵。類似的程式碼，即可透過一致的介面來存取，效能分析和正確性驗證的程式碼就能共用。需要更好封裝(結構內宣告var, operation，然後外部定義操作的函式，在main定義結構變數時再用. designator指派要用的函式! )，處理內部不同DS與Algo。 :::spoiler 要熟到光看上方文字簡述，就能想像此例子 ```cpp typedef struct matrix_impl Matrix; struct matrix_impl { float values[4][4]; /* operations */ bool (*equal)(const Matrix, const Matrix); Matrix (*mul)(const Matrix, const Matrix); }; static Matrix mul(const Matrix a, const Matrix b) { Matrix matrix = { .values = { { 0, 0, 0, 0, }, { 0, 0, 0, 0, }, { 0, 0, 0, 0, }, { 0, 0, 0, 0, }, }, }; for (int i = 0; i < 4; i++) for (int j = 0; j < 4; j++) for (int k = 0; k < 4; k++) matrix.values[i][j] += a.values[i][k] * b.values[k][j]; return matrix; } int main() { Matrix m = { .equal = equal, .mul = mul, .values = { ... }, }; Matrix o = { .mul = mul, }; o = o.mul(m, n); ``` ::: 優點: 建立實例後，輕鬆指派要用的函式，彈性與擴充性，變更方便(polymorphism) C99 (含) 以後，C 和 C++ 的發展就走向[截然不同的路線](http://david.tribble.com/text/cdiffs.htm) 像是String initializers, Nested structure tags, Array parameter qualifiers. 善用C99提供的[compound literals](https://www.ptt.cc/man/C_and_CPP/DB9B/DC33/M.1271034153.A.7F9.html) [Prefer to return a value rather than modifying pointers](https://github.com/mcinglis/c-style#prefer-to-return-a-value-rather-than-modifying-pointers) This encourages immutability, cultivates pure functions, and makes things simpler and easier to understand. It also improves safety by eliminating the possibility of a NULL argument. ```cpp // Bad: unnecessary mutation (probably), and unsafe void drink_mix( Drink * const drink, Ingredient const ingr ) { assert( drink != NULL ); color_blend( &( drink->color ), ingr.color ); drink->alcohol += ingr.alcohol; } // Good: immutability rocks, pure and safe functions everywhere Drink drink_mix( Drink const drink, Ingredient const ingr ) { return ( Drink ){ .color = color_blend( drink.color, ingr.color ), .alcohol = drink.alcohol + ingr.alcohol }; } ``` strdup 函式會呼叫 malloc 來配置足夠長度的記憶體，當然，你需要適時呼叫 free 以釋放資源。 [heap] strdupa 函式透過 alloca 函式來配置記憶體，後者存在 [stack]，而非 heap，當函式返回時，整個 stack 空間就會自動釋放，不需要呼叫 free。 C++11 提供三種 [smart pointer](https://learn.microsoft.com/en-us/cpp/cpp/smart-pointers-modern-cpp?view=msvc-170&redirectedfrom=MSDN) : unique_ptr/ shared_ptr/ weak_ptr Variable Length Arrays 就是陣列size 不是編譯時期確認的。但就像所學，這C99引入可用，但不好! 更多做法: 不要int totalSize = 8; char totalString\[totalSize\]; 而是#define TOTALSIZE 8 char totalString\[TOTALSIZE\]; 或是用malloc(但前者放stack 後者malloc放heap)。 ```cpp void f(int m, int C[m][m]) { double v1[m]; ... #pragma omp parallel firstprivate(C, v1) ... } ``` 善用 GNU extension 的 [typeof](https://gcc.gnu.org/onlinedocs/gcc/Typeof.html): typeof 讓我們傳一變數，代表該變數的型態建議了一種寫法: (為避免重複運算，先用變數存值(就像violate 做運算也這樣處理)) ```cpp #define max(a,b) \ ({ typeof (a) _a = (a); \ typeof (b) _b = (b); \ _a > _b ? _a : _b; }) ``` [10 C99 tricks](https://gcher.com/posts/2015-02-13-c-tricks/) Linux-tools: 注意netstat, iptables, perf, ping, traceroute, top這些網路相關的! ![linux-tools](https://hackmd.io/_uploads/H16shYiTJl.png) 數值系統 bitwise operator: 因floating point 有其存儲極限避免做floating point compare 避免邏輯錯誤 ![FP-rep](https://hackmd.io/_uploads/SyesbLjpJl.png) 還有像struct 裡面signed int a = -1:1 使用bit field 會出問題數字運算，注意OVERFLOW引發邏輯錯誤，善用INT_MAX, INT_MIN等做判斷 Count Leading Zero: ```cpp int BITS = 31; while (BITS) { if (N & 0x80000000) break; N <<= 1; BITS--; } ``` ### ch7 linked list 和非連續記憶體操作 ```cpp void remove_list_node(List *list, Node *target) { Node *prev = NULL; Node *current = list->head; // Walk the list while (current != target) { prev = current; current = current->next; } // Remove the target by updating the head or the previous node. if (!prev) list->head = target->next; else prev->next = target->next; } ``` vs 有品味的版本 (但還是要注意除非你確定(腦裡很清楚整個邏輯也沒off by one error之類的)，不然業界易讀，易維護是重點) ```cpp void remove_list_node(List *list, Node *target) { // The "indirect" pointer points to the *address* // of the thing we'll update. Node **indirect = &list->head; // Walk the list, looking for the thing that // points to the node we want to remove. while (*indirect != target) indirect = &(*indirect)->next;//pointer to pointer 指向的地址所以才要先(*indirect)->next 然後再取址 *indirect = target->next; } ``` 理解還是要更準確些!! 不適替換target node 而是 indirect 是 `pointer point to a pointer!` 所以是先(\*indirect)->next 這個指標取其地址! 所以是 & (node->ptr)的概念既然如此我對這pointer to pointer 做一階deference \* 也就是(\*indirect = target->next這行) 就是prev node 的`ptr`這個指標改指向下下一個! `用pointer to pointer 一定要腦袋有那個間接的畫面` 也就是*指向的是一個指標* :::spoiler 間接指標 ![pointer2pointer](https://hackmd.io/_uploads/H180z9VAJg.png) ::: 從「要更新什麼位置的資料」思考，無論是 head 或者非 head，更新的是同一類型的資料，不用特別操作，自然省下額外的處理這點也有點像面試題練完後的感覺.. 根據做的事(回傳new or origin list) 來決定操作手法(但我想法粗淺的多)。還有像下列函式，思考目的 append -> 1.return 對應list head 加入新節點後(我以往也這樣做) 2.其實更進一步，在原list處理 append就好，但注意call by value 用原本一層pointer複製的list會消，所以用indirect pointer 兩層結構! 我copy的是第二層指標指向第一層*list 操作後消失的也是indirect pointer，原本會留下! 其實你malloc也是這樣概念要用pointer to pointer ```cpp void append_indirect(int value, list_entry_t **head) { list_entry_t **indirect = head; list_entry_t *new = malloc(1 * sizeof(list_entry_t)); new->value = value, new->next = NULL; while (*indirect) indirect = &((*indirect)->next); *indirect = new; } ``` [大神Linus on understanding pointer](https://grisha.org/blog/2013/04/02/linus-on-understanding-pointers/) merge sort exmaple ```cpp struct ListNode *mergeTwoLists(struct ListNode *L1, struct ListNode *L2) { struct ListNode *head; struct ListNode **ptr = &head; for (; L1 && L2; ptr = &(*ptr)->next) { if (L1->val < L2->val) { *ptr = L1; L1 = L1->next; } else { *ptr = L2; L2 = L2->next; } } *ptr = (struct ListNode *)((uintptr_t) L1 | (uintptr_t) L2); return head; } ``` vs 讓L1與L2用共通介面不寫重複程式碼，我感覺上式更好! 易讀易maintain, 但是下式還是要看懂現在L1 怎麼next?, 重點在node = &L1 or &L2 指向了這串而\*ptr = \*node 若ptr往前走也就是\*ptr->next == L1->next 再用double pointer去接其往前走的地址可以理解，但真的好嗎? 相比易懂易維護的寫法，暫時還改不過來的感覺，但重點業界遇到這樣的code要能理解! ```cpp struct ListNode *mergeTwoLists(struct ListNode *L1, struct ListNode *L2) { struct ListNode *head = NULL, **ptr = &head, **node; for (node = NULL; L1 && L2; *node = (*node)->next) { node = (L1->val < L2->val) ? &L1: &L2; *ptr = *node; ptr = &(*ptr)->next; } *ptr = (struct ListNode *)((uintptr_t) L1 | (uintptr_t) L2); return head; } ``` 重點就是換一個思考方向，double pointer 間接 (改外部能通用化處理特殊case) vs 直接同架構travesal (改不了外部為特殊case做特別code判斷) 來思考是否要使用另一個例子Leetcode 2095. Delete the Middle Node of a Linked List ```cpp struct ListNode *deleteMiddle(struct ListNode *head) { if (!head->next) return NULL; struct ListNode **indir = &head, *prev = NULL; for (struct ListNode *fast = head; fast && fast->next; fast = fast->next->next) { prev = *indir; indir = &(*indir)->next; } prev->next = (*indir)->next; free(*indir); return head; } ``` vs 前者被偵測 heap use after free , 後者由新的指標 next 來保存 (\*indir)->next，方可在 indir 釋放掉後再以 next 更新。 ```cpp struct ListNode *deleteMiddle(struct ListNode *head) { if (!head->next) return NULL; struct ListNode **indir = &head, *prev = NULL; for (struct ListNode *fast = head; fast && fast->next; fast = fast->next->next) { prev = *indir; indir = &(*indir)->next; } struct ListNode *next = (*indir)->next; free(*indir); prev->next = next; // *indir = next return head; } ``` ### ch8 Stream I/O, EOF 和例外處理以及未定義行為篇未定義行為重要目的： signed integer overflow 或其他機制，藉由標準的刻意留空，允許更激進最佳化。考慮此特殊案例, 正常思考就k = 0 , ++k 使其跑16次loop就出來此block, 但是++k使你出現dd = d\[16\], 然後編譯器在gcc4.8轉這成無限迴圈 d\[++k\] 等價於 (*((d) + (++k))) 而 ++k 在 for 迴圈「預期」最後的數值為 16，也就是 (*(d + 16)) 也就是說 (*(d + 16)) 取出的記憶體沒有規範，於是編譯器假設在 k < 16的條件中， k不可能等於16 產生無限迴圈。 (這案例好扯，平常寫哪可能強到隨時注意這些... 這可能就幫助`debug猜問題` 或是直接用UBSAN來檢查了, 也可能你工作中 git提交有問題的，就是要找到後查ref 換種寫法或是換個思維方向!) [Probelm report](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53073) ```cpp= int d[16]; int SATD (void) { int satd = 0, dd, k; for (dd = d[k = 0]; k < 16; dd = d[++k]) { satd += (dd < 0 ? -dd : dd); } return satd; } ``` GCC 版本5.x 實作了更多 [UBSAN(執行時間未定義行為檢查器)](https://people.freedesktop.org/~narmstrong/meson_drm_doc/dev-tools/ubsan.html) , 使用設定CONFIG_UBSAN=y, 在對應的核心 Makefile 中 eg 單一檔案設置UBSAN_SANITIZE_main.o := y 來啟用 UB的幾種類型: Signed integer overflow Shifting an n-bit integer by n or more bits Divide by zero (編譯器假設除數不該為0 就把printf在最佳化拿掉了) ```cpp= #include <stdio.h> int testdiv(int i, int k) { if (k == 0) printf("found divide by zero\n"); return (i / k); } int main() { int i = testdiv(1, 0); return i; } ``` [Dereferencing a NULL pointer](https://lwn.net/Articles/342330/) 第 5 行，可能致使 dereferences the pointer prior to the check. ```cpp= static unsigned int tun_chr_poll(struct file *file, poll_table * wait) { struct tun_file *tfile = file->private_data; struct tun_struct *tun = __tun_get(tfile); struct sock *sk = tun->sk; unsigned int mask = 0; if (!tun) return POLLERR; ``` [error handling cmu lecture](https://www.cs.cmu.edu/afs/cs/academic/class/15213-m19/www/lectures/14-ecf-procs.pdf) [error handling in C](https://grisha.org/blog/2013/04/02/linus-on-understanding-pointers/) ![excep-flow](https://hackmd.io/_uploads/BJlH3Ajpye.png) ### Linux ch1 作業系統術語和概念 BPF (Berkeley Packet Filter): 在Linux擴充成 [eBPF (Extended BPF)](https://hackmd.io/@sysprog/linux-ebpf) 後，就變成 Linux 核心內建的內部行為分析工具，可用來動態/靜態追蹤 (dynamic/static tracing) 和 profiling eventsㄡ mseal: 現代微處理器支援對記憶體的 RW (讀寫和 NX (不可執行) 位元進行管理，提升記憶體損壞錯誤時的防禦能力，使攻擊者無法隨意修改記憶體並執行惡意指令。在該管理模式下，僅有標記為 X (可執行) 的記憶體區域才可執行程式 [jserv導讀-看漫畫學Linux](https://hackmd.io/@sysprog/linux-comic) ![Linux-brief](https://hackmd.io/_uploads/HJlWw9vklg.png) 最底層everything is file, 由process access file, process create by init(fork/clone), docker(一個軟體提供對應用程式的封裝的虛擬技術，類似VM是模擬硬體介面，docker提供軟體封裝到容器標準化單位)， process 5 state(R,S,D(uninterrput sleep),T(stop),Z) via signal/wake up, 記憶體管理-- mapping/ 虛擬記憶體/ slob/slab/slub記憶體配置器, 第二層http由pid 1314的行程接待port 80, 還有cron，負責例行性工作排程，按時執行週期性任務，或判斷是否要執行某個工作, 與pipe和ssh daemon，最上方tty對應的終端機是作業系統對外的操作和通訊的機制一開始行程就是一切後來發展出多執行序共享同記憶體空間(Linux多執行序仍以行程為單位做Contect switch與排程等等) 對於 Linux 核心的實作，不管是執行緒還是行程 (只有一個執行緒的行程)，一切都是 task_struct。fork 發生之際，子行程複製的僅是呼叫執行緒的 task_struct。倘若這時操作同一個地址空間的其他 task_struct 持有 lock，即便呼叫 fork 的 task_struct 並不知道這件事 (要持有 lock 後才知)，但這個事實還是會悄無聲息地傳給子行程。子行程如果這時候想去持有 lock，會發生死結！ SIGTERM 和 SIGKILL 插在SIGKILL不被攔截，收到立即退出，無法收尾，建議發SIGTERM如此行程收到後可能有機會做退出動作。 Linux具備overcommit 特性，也就是malloc提前給你配置記憶體(支票概念)，實際使用時仍可能OOM，目的是讓同一時間更多行程執行，只要在使用該配置記憶體前，其他放掉即可。包含 Apache 在內的網頁伺服器大多採取事件驅動的設計(過往是專線在線服務，現在是事件接服務要求，中間不維持專線，而是事件完成才去通知) [Linux kernel高階觀點](https://linux-kernel-labs.github.io/refs/heads/master/lectures/intro-slides.html#1) user mode 與 Kernel mode中間透過API溝通(system call)，OS管理driver/file 再去更底層Hardware abstraction layer 來與最底層的hardware溝通。 [微軟教材:適用所有驅動開發的觀念](https://learn.microsoft.com/zh-tw/windows-hardware/drivers/gettingstarted/virtual-address-spaces) [執行程式中間所有過程](https://cpu.land/) ### Linux ch2 透過 eBPF 觀察作業系統行為賦予應用程式生命的系統呼叫動態追蹤技術（dynamic tracing）提供除錯與追蹤來處理規模與複雜度高的程式通過探針 (probe) 機制來發起查詢 (就像Linux kernel networking 固定五個hook 點?) CONFIG_BPF_JIT 編譯選項要開啟 eBPF包刮: * 動態追蹤 (dynamic tracing); * 靜態追蹤 (static tracing); * profiling events; ![eBPF-tools](https://hackmd.io/_uploads/H1GdpcPkle.png) BPF 測試data flow: 在user mode 準備BPF code, 測量stack traces等資訊，將其編譯成BPF bytecode在kernel內部的隔離沙盒環境測試(for security) 1. BPF code compile to BPF bytecode at usermode * 2. load kernel verifier with 4 different test: kprobes(kernel dynamic tracing) * uprobes(user dyanmic tracing * tracepoints(kernel static tracing) * perf_events(timed sampling and Preventive maintaenance check and services) 3. 測試完傳回資料分per-event details or BPF maps(數據統計與圖表等整體性結果，感覺做簡報很有用? 聽說網通廠最常碰到1. C語言寫driver/IP層/APP層會再大致細分這幾份讓各部門做開發與整合 2. C語言與shell 常碰到最後是python 而C++幾乎無?) BPF就是in-kernel virtual machine，其官網介紹頁面提到The filter program is in the form of instructions for a virtual machine, which are interpreted, or compiled into machine code by a just-in-time (JIT) mechanism and executed, in the kernel. recap arp: TCP/IP layer protocol for request unknow mac addr by query ip addr. step1: host record their arp cache recording mac/ip mapping with time attributes. 2. when sending package, host check arp cache first. if no matching mapping, host go for arp request broadcast, receiver check its ip addr(prefix?) match ourown ip addr to decide response(arp reply) or not. 3. sender receive arp reply, and updating its arp cahce. :::spoiler [eBPF](https://www.brendangregg.com/ebpf.html) 很複雜... ::: eBPS: 負責kernel tracing, debuging, monitor, traffic control. 過去BPF核心溝通用recv()系統呼叫而EBPF引入map機制(在內核新開一空間有特製資料庫讓EBPF程式互動，如此設計感覺就安全/效益取捨都可見會讓惡意程式爽到? 所以verifier前檢查此BPF code是否接收變更重要吧) ![ebpf-map](https://hackmd.io/_uploads/SJmPWSEegg.png) [Intro to ebpf with code and slide](https://www.slideshare.net/slideshow/meet-cutebetweenebpfandtracing/62446985) [Netronome: 簡介EBPF與其程式限制eg no loop/ global/ 4096lines](https://docs.google.com/document/d/1NL1TBK3yZIkqjVtCxxHOv6X_1j_LnZgi1cv4FBGEzFc/edit?tab=t.0#heading=h.k44mvlkwmj85) Kprobes: 只讀系統呼叫的參數和返回值，無法變暫存器值。案例code 摘自[jserv linux lecture](https://hackmd.io/@sysprog/linux-syscall?type=view&stext=5592%3A308%3A0%3A1746323123%3APMsVv_) 準備`call.py` (原本還以為BPF是重寫 C 結果是要測試包起來再給額外參數與測試語法) ```cpp from bcc import BPF bpf_text = """ #include <net/inet_sock.h> #include <bcc/proto.h> int kprobe__inet_listen(struct pt_regs *ctx, struct socket *sock, int backlog) { bpf_trace_printk("Hello World!\\n"); return 0; }; """ b = BPF(text=bpf_text) while True: print b.trace_readline() ``` 先在一個終端機執行: ```shell $ sudo python call.py ``` 等五秒再開啟另一終端機執行nc (listen for an incoming connection, 0 hostname, 4242 port 相比於之前寫的clinet-server TCP連接是不同主機溝通這篇測試是用kprobe 在內部程式間溝通? 畢竟他用printk而且看起來語法確實與C有差異但為何是py?) Most examples of BPF code are written in a `mix of Python and C using BCC`, the "BPF Compiler Collection" ```shell $ nc -l 0 4242 ``` ### Linux ch3 不僅是個執行單元的 Process Linux初期只以single thread, 後來才引入multi-threaded and SMP(Symmetric multiprocessing), 而且仍把process 當作最基本abstraction, 排程與context switcg操作仍是process，thread僅分享空間與資源。(難怪與OS課程學的有差異) * clone()產生的thread本質仍是process，Linux內核thread只做到資源共享 * process管理有效率但thread相比其他實作則heavy weight 是種tradeoff [process address space and binary format講底層如何看程式記憶體空間的](https://www.cs.unc.edu/~porter/courses/cse506/s16/slides/address-spaces.pdf) ![vma](https://hackmd.io/_uploads/ByJN5SNexe.png) VMA 可以讓Linux kernel 以process 的角度來管理virtual address space (內心OS 忽然覺得好扯好大學課程扎實連Linux怎麼管理memory都講和給例子我學店考上清大更該在工作上拚一點... 想想你待過的環境就要能吃得瞭苦了再差也不比那時期差) [singal and IPC](https://www.cs.unc.edu/~porter/courses/cse506/s16/slides/ipc.pdf) ![sig-handler-flow](https://hackmd.io/_uploads/ryn9tr4elg.png) thread與同步基本問題: Race Condition, reentrancy, 解決用strtok_r拆分字串可reentrancy版本, 避免global variable, 使用工作區保存變數, Mutex [process management on memory and kernel thread mapping](https://wiki.csie.ncku.edu.tw/embedded/ProcessManagement.pdf) :要點就 scheduling, context switch時機點與作甚麼，以及many to one or one to one thread mapping. (user space syscall event -> kernel syscall handler and resource allocate(VFS/FS/dirver)-> schedule() and do context switch ) ![context-switch](https://hackmd.io/_uploads/rkM32HVgxx.png) Kernel can be interrupted(when some preemptive kerenl dont ahve lock) ≠ kernel is preemptive * Non‐preemptive kernel, interrupt returns to interrupted process * Preemptive kernel, interrupt returns to any schedulable process * ![process-with-thread](https://hackmd.io/_uploads/HJO66S4gee.png) ![shared-thread](https://hackmd.io/_uploads/ryGjTHEgex.png) [VMA/ memory管理程式補充](https://www.cnblogs.com/LoyenWang/p/12037658.html) ### Linux ch4 不只挑選任務的排程器 Linux排程器要考量 fairness, interactiveness, load balance, power management, preemption, real time 及時處理能力。資訊系統其實也存在多種排程器，例如: erlang scheduler: erlang processes 到 threads 間的映射; [nginx](https://nginx.org/en/): HTTP/HTTPS 請求到 processes / application 間的映射; [Memory manager](https://www.kernel.org/doc/html/latest/admin-guide/mm/index.html): virtual memory 到 physical memory 間的映射; Apache Spark: map/reduce job 到計算節點間的映射; 透過模組化與 mapping 得到QoS/ 資源利用/ another level of indirection" David J. Wheeler: "We can solve any problem by introducing an extra level of indirection." (也就是抽象化去開發電腦網路多一間接層使各部分職責明確?) 開發與維護時多去思考 principle of abstraction 和 principle of indirection兩個核心原則 (如同前章節C語言我思考的心得 1.問題描述2.input/output預處理3.模組化與函式化去明確分工4.寫好註解與其他程序互動定好) 普通行程排程細分: 交互式行程 (interactive processs注重互動)/ 批處理行程 (batch process注重throughput) Linux2.4提供的SMP 只用一global renqueue 使得O(N)複雜在現代成千上萬行程難以判斷系統負載 Linux 2.6提供兩queue: runqueue/ expired queue 當行程timeslice耗盡就進expired queue, 當runqueue空就兩者對調即可。(OS之round robin concept?) access 只能用 array; O(1) 若access用list 平均就要O(logN)了 insert/deletion 只能用 linked list, queue, stack; O(1) 在 Linux 2.6 排程器中，開發者採用 bitarray，為每種優先權分配一個 bit，若這個優先權佇列 (priority queue) 下面有行程，那麼就對相應的 bit 染色，置為 1，否則置為 0。這樣，問題就簡化成從低位到高位，尋找指定 bitarray 第一個設定為 1 的位元 Linux2.6，共 140 優先級，用長度 140 陣列去記錄權級別。每優先權下面用 FIFO queue 管理行程 ![prio](https://hackmd.io/_uploads/ByxtSsVxeg.png) 1. 在 active bitarray 裡，尋找 left-most bit 的位置 x; 2. 在 active priority array（APA）中，找到對應 queue APA[x]; 3. 從 APA[x] 中 dequeue 一個 process，dequeue 後，如果 APA[x] 的 queue 為空，那麼將 active bitarray 裡第 x bit 設定為 0; 4. *對於目前執行完的行程，重新計算其 priority，然後 enqueue* 到 expired priority array（EPA）相應的 queue EPA[priority]; 5. 若 priority 在 expired bitarray 裡對應的 bit 為 0，將其設定 1; 6. 如果 active bitarray 全為 0，將 `active bitarray 和 expired bitarray 交換`; 後續為了更高公平性，用[CFS](https://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt) (Completely Fair Scheduler)取代Linux2.6排成器 80% of CFS's design can be summed up in a single sentence: CFS basically models an "ideal, precise multi-tasking CPU" on real hardware. cooperative multitasking: 協同式多工 preemptive multitasking: 搶佔式多工(via system call 之前運行時與之後都可能以及 timer interrupt) Rate Monotonice 週期越短的行程，有越高的優先權(所以要事先預估其時間並算優先權若有非週期的就判斷不了或是要計算比例 SUM(Time/Period) < 1才可排程 ) EDF 依樣算SUM(Time/Period) < 1 但是以最接近Deadline有最高優先權而CFS則是schedule_other類型以多種policy and classes 來決定nice priority 確保each task get a fair share of CPU time CFS implements three scheduling policies: - `SCHED_NORMAL` (traditionally called SCHED_OTHER): The scheduling policy that is used *for regular tasks*. - `SCHED_BATCH`: *Does not preempt nearly as often as regular tasks would*, thereby allowing tasks to run longer and make better use of caches but at the cost of interactivity. This is well *suited for batch jobs*. - `SCHED_IDLE`: This is even weaker than nice 19, but its not a true idle timer scheduler in order to avoid to get into priority inversion problems which would deadlock the machine. ### Linux ch5 記憶體管理 logical memory -> page physical memory -> frame OS為process準備segment table 紀錄base and limit 邏輯addr (s,d) 經過segment table contentation 後成physical addr (f,d) , d是offset. 在segment table 紀錄valid bit. 以process來看，記憶體分kernel/user兩部分比例1:3 所以是kernel 1GB + user process(stack , heap, BSS, data segment, text segment), stack由高至低成長. ![memory](https://hackmd.io/_uploads/SyBikpVxle.png) Linux 內記憶體 [mapping flow](https://blog.csdn.net/hfut_zhanghu/article/details/122340261): LA邏輯地址 –> 線性地址–> 實體地址 (PA) ![mapping](https://hackmd.io/_uploads/B1yaE64xgx.png) Intel的設計是段描述子集中存放在GDT或LDT中，而段暫存器存放的是段描述符在GDT或LDT內的索引值(index)。 Linux中邏輯位址等於線性位址 (反觀Intel 為了相容過往架構，把硬體設計搞得很複雜)。[為什麼這麼說呢？](https://zhuanlan.zhihu.com/p/149674856) 因為Linux所有的段（用戶代碼段、用戶資料段、內核代碼段、內核資料段）的線性地址都是從0x00000000 開始，長度4G，這樣線性地址=邏輯地址+ 0x00000000，也就是說邏輯地址等於線性地址了。線性地址到實體: 透過分頁機制與查表做mapping.(基礎OS ) Buddy memory allocation: divides memory into partitions to try to satisfy a memory request as suitably as possible buddy system 以 page (4KB) 為單位，但有更多小單位不適合故Linux引入slab解小物件分配。 slab 向 buddy system 去「批發」一些記憶體，加工切塊以後「零售」出去 slab不適合大規模多核processor，後續核心開發者引入 slub：改造 page 結構，削減 slab 管理結構的開銷、每個 CPU 都有一個本地活動的 slab `(kmem_cache_cpu)` vmalloc 用碎片來拼湊出一個大記憶體 Linux 為每個 process 管理了一套虛擬定址空間呼叫 malloc ，系統不會立即分配完整實體記憶體空間，而是維護虛擬定址空間，copy on write 精神。 ![malloc-real](https://hackmd.io/_uploads/By9kmpVxge.png) 若process存取時，虛擬定址空間部分尚未與 page frame 有所關聯，就引發page fault ![page-falut](https://hackmd.io/_uploads/SyAIrTExge.png) 關於slab/slub/slob: 後續再根據業界去研究，此處先skip。 [slub](http://www.wowotech.net/memory_management/427.html) 是目前 Linux 核心預設的 slab allocator [slob](https://lwn.net/Articles/157944/) 比 slab 配置器程式碼更精簡，但比 slab 更容易產生記憶體碎片問題，僅用於小型系統，特別是嵌入式系統。 vmalloc vs kmalloc 與 slab/slub/slob未看完 ### Linux ch6 檔案系統概念及實作手法 ![linux-fs](https://hackmd.io/_uploads/ryPuUTVgge.png) ### Linux ch7 中斷處理和現代架構考量 ### Linux ch8 Timer 及其管理機制 ### Linux ch9 針對事件驅動的 I/O 模型演化 ### Linux ch10 淺談同步機制