# 2018q3 Homework1 contributed by < [`butastur`](https://github.com/butastur-rtos) > ###### tags: `sysprog2018` `c11` `c99` `macro` `list` `neon` `linked list` `cache` `memory leak` * [你所不知道的 C 語言: 指標篇](https://hackmd.io/s/HyBPr9WGl#) * [你所不知道的 C 語言: 編譯器和最佳化原理篇](https://hackmd.io/s/Hy72937Me) * [你所不知道的 C 語言: 物件導向程式設計篇](https://hackmd.io/s/HJLyQaQMl) * [你所不知道的 C 語言: 未定義行為篇](https://hackmd.io/s/Skr9vGiQm) * [你所不知道的 C 語言: 前置處理器應用篇](https://hackmd.io/s/S1maxCXMl) * [你所不知道的 C 語言: linked list 和非連續記憶體操作](https://hackmd.io/s/SkE33UTHf) * [你所不知道的 C 語言:記憶體管理、對齊及硬體特性](https://hackmd.io/s/BkuMDQ9K7) * [現代處理器設計: Cache 原理和實際影響](https://hackmd.io/s/HkW3Dr1Rb) * [以位元駕馭能量](https://hackmd.io/s/rk2RO0cYX) * [Memory Allocation](https://hackmd.io/s/B1SRlfeee#) * malloc 實作的觀摩 * [X Macro](https://blog.noctua-software.com/c-tricks.html) * [Incompatibilities Between ISO C and ISO C++](http://david.tribble.com/text/cdiffs.htm) * [Experimenting with _Generic() for parametric constness in C11](https://fanf.livejournal.com/144696.html) ## 指標篇 * 用cdecl 解釋 C 的 declaration * void * 和 char * * Memory leak * Segmentation fault * Stack buffer overflows ### 用 cdecl 解釋 C 的 declaration 以下用 cdecl 的 explain 看一下 cast 和 declare 的差別 ```shell $ cdecl Type `help' or `?' for help cdecl> explain (char *)pt cast pt into pointer to char cdecl> explain char * pt declare pt as pointer to char cdecl> ``` 以下是 cdecl 用 英文產生 C 的 declaration :::info 這裡可以測試用英文描述的宣告正不正確, 換句話說,這裡不旦考驗英文,還考驗==宣告==的用字精不精準 ::: 先來個簡單的 pointer to char ```shell decl> declare a as pointer to char char *a ``` 然後看看 pointer to function ```shell cdecl> declare a as pointer to function returning char char (*a)() ``` pointer to function returning pointer to char 指標指向一個function, 這個function 傳回一個pointer to char ```shell cdecl> declare a as pointer to function returning pointer to char char *(*a)() ``` a 是一個指標,指向一個 function, 這個 function 傳回一個pointer,傳回的 pointer <b>指向一個 function</b>, 這個function 傳回一個 pointer to char ```shell cdecl> declare a as pointer to function returning pointer to function returning pointer to char char *(*(*a)())() ``` a 是一個 array, 由指標組成,<b>每一個指標都指向一個 function</b>, 每一個 function 都傳回一個指標,... ```shell cdecl> declare a as array of pointer to function returning pointer to function returning pointer to char char *(*(*a[])())() ``` 一個帶有 parameter 的 function, 這個 function 傳回一個 pointr to struct ```shell cdecl> declare a as function (int) returning pointer to struct b struct b *a(int ) ``` 這裡我初次寫下來的時候把 int 和 char 的位置理解相反了,正確的理解是:最右邊的 char 代表的是傳回的 pointer <b>所指向的 function 的參數</b> ```shell cdecl> declare a as function (int) returning pointer to function(char) returning int int (*a(int ))(char ) ``` ### void * 和 char * [C99 規格書, 6.2.5, page 36](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. void * 和 char * 互換的範例 ```clike #include <stdio.h> void main(){ char * str = "test"; void * voidptr = str; printf("voidptr: %s\r\n", voidptr); } ``` ```shell $ gcc voidptr.c -o voidptr $ ./voidptr voidptr: test ``` :::warning C 語言不保證你可以安全地轉換 void * 為任意型態後,再轉換回原本的型態 [你所不知道的 C 語言:記憶體管理、對齊及硬體特性](https://hackmd.io/s/BkuMDQ9K7) ::: ### Memory leak `valgrind` 這個工具, 先裝起來試試看 ```shell $ apt-get install valgrind ``` ```clike #include <stdlib.h> void main(){ void * ptr = malloc(200); } ``` ```shell $ gcc memoryleak.c -o memoryleak $ valgrind --leak-check=full ./memoryleak ==18931== Memcheck, a memory error detector ==18931== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==18931== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info ==18931== Command: ./memoryleak ==18931== ==18931== ==18931== HEAP SUMMARY: ==18931== in use at exit: 200 bytes in 1 blocks ==18931== total heap usage: 1 allocs, 0 frees, 200 bytes allocated ==18931== ==18931== 200 bytes in 1 blocks are definitely lost in loss record 1 of 1 ==18931== at 0x4C2BBAF: malloc (vg_replace_malloc.c:299) ==18931== by 0x1086C1: main (in /root/temp/ssh/lab0-c/memoryleak) ==18931== ==18931== LEAK SUMMARY: ==18931== definitely lost: 200 bytes in 1 blocks ==18931== indirectly lost: 0 bytes in 0 blocks ==18931== possibly lost: 0 bytes in 0 blocks ==18931== still reachable: 0 bytes in 0 blocks ==18931== suppressed: 0 bytes in 0 blocks ==18931== ==18931== For counts of detected and suppressed errors, rerun with: -v ==18931== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) ``` 因為 `malloc` 之後沒有 `free` , 所以發現 LEAK SUMMARY: `definitely lost: 200 bytes in 1 blocks` ### Segmentation fault 先做個實驗試試 ```clike void main(){ char * str = "fault"; *str = 'F'; } ``` ```shell $ gcc fault.c -o fault $ ./fault Segmentation fault ``` ### Stack buffer overflows gcc version: ```shell gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ``` gcc 跟 stack-protector 有關的參數有4個: -fno-stack-protector -fstack-protector -fstack-protector-strong -fstack-protector-all ```shell $ man gcc 1 ``` 先看一段 code ```clike= #include <stdio.h> #include <stdlib.h> #include <string.h> void main(){ char * str = malloc(sizeof(char) * 4); char arr[4]; strcpy(str, "12345678"); strcpy(arr, "12345678"); printf("str: %s\r\n", str); printf("arr: %s\r\n", arr); } ``` 這段 code 有幾個地方可以探討: - str 有沒有被 allocate 一段記憶體? - char * str 和 char str[4] 有什麼不一樣? - 在<b>第7行</b>, value of arr 是不是 NULL? - compile 的時候加 `-fstack-protector` - compile 的時候加 `-fstack-protector-all` - Stack smashing compile 的時候==不加== `-fstack-protector` ```shell $ gcc stackprotector.c -o stackprotector $ ./stackprotector Segmentation fault ``` compile 的時候==加== `-fstack-protector` ```shell $ gcc stackprotector.c -fstack-protector -o stackprotector $ ./stackprotector Segmentation fault ``` compile 的時候==加== `-fstack-protector-all` ```shell $ gcc stackprotector.c -fstack-protector-all -o stackprotector $ ./stackprotector str: 12345678 arr: 12345678 *** stack smashing detected ***: ./stackprotector terminated ======= Backtrace: ========= /lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7ffa481d7bfb] /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7ffa482601f7] /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x0)[0x7ffa482601c0] ./stackprotector(+0x7f4)[0x55bcc9fe77f4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7ffa481872e1] ./stackprotector(+0x65a)[0x55bcc9fe765a] ======= Memory map: ======== 55bcc9fe7000-55bcc9fe8000 r-xp 00000000 08:11 54658412 /root/temp/ssh/lab0-c/stackprotector 55bcca1e7000-55bcca1e8000 r--p 00000000 08:11 54658412 /root/temp/ssh/lab0-c/stackprotector 55bcca1e8000-55bcca1e9000 rw-p 00001000 08:11 54658412 /root/temp/ssh/lab0-c/stackprotector 55bcca76a000-55bcca78b000 rw-p 00000000 00:00 0 [heap] 7ffa47f50000-7ffa47f66000 r-xp 00000000 08:11 40741947 /lib/x86_64-linux-gnu/libgcc_s.so.1 7ffa47f66000-7ffa48165000 ---p 00016000 08:11 40741947 /lib/x86_64-linux-gnu/libgcc_s.so.1 7ffa48165000-7ffa48166000 r--p 00015000 08:11 40741947 /lib/x86_64-linux-gnu/libgcc_s.so.1 7ffa48166000-7ffa48167000 rw-p 00016000 08:11 40741947 /lib/x86_64-linux-gnu/libgcc_s.so.1 7ffa48167000-7ffa482fc000 r-xp 00000000 08:11 40741846 /lib/x86_64-linux-gnu/libc-2.24.so 7ffa482fc000-7ffa484fc000 ---p 00195000 08:11 40741846 /lib/x86_64-linux-gnu/libc-2.24.so 7ffa484fc000-7ffa48500000 r--p 00195000 08:11 40741846 /lib/x86_64-linux-gnu/libc-2.24.so 7ffa48500000-7ffa48502000 rw-p 00199000 08:11 40741846 /lib/x86_64-linux-gnu/libc-2.24.so 7ffa48502000-7ffa48506000 rw-p 00000000 00:00 0 7ffa48506000-7ffa48529000 r-xp 00000000 08:11 40741792 /lib/x86_64-linux-gnu/ld-2.24.so 7ffa4870e000-7ffa48710000 rw-p 00000000 00:00 0 7ffa48725000-7ffa48729000 rw-p 00000000 00:00 0 7ffa48729000-7ffa4872a000 r--p 00023000 08:11 40741792 /lib/x86_64-linux-gnu/ld-2.24.so 7ffa4872a000-7ffa4872b000 rw-p 00024000 08:11 40741792 /lib/x86_64-linux-gnu/ld-2.24.so 7ffa4872b000-7ffa4872c000 rw-p 00000000 00:00 0 7ffd3b4b9000-7ffd3b4ce000 rw-p 00000000 00:00 0 [stack] 7ffd3b526000-7ffd3b528000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] Aborted ``` * Stack smashing 先解釋一下 [ `stack buffer overflow` ](https://en.wikipedia.org/wiki/Stack_buffer_overflow) * 觀摩一下 Linus Torvalds 怎麼處理 stack protector [ [ `source` ] ](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/stackprotector.h#L5) Stack protector works by putting predefined pattern at the start of the stack frame and verifying that it hasn't been overwritten when returning from the function. The pattern is called ==stack canary== and unfortunately gcc requires it to be at a fixed offset from %gs. * Stack canary ## 編譯器和最佳化原理篇 * Relocation * JIT 實做案例 * IR (Intermediate Representation) * 親手打造 Dynamic Library Loader * Study SSA * 提問 ### Relocation * 舉例說明一下 Relocation ### JIT 實做案例 #### [AMaCC](https://github.com/jserv/amacc) #### [jit-construct](https://github.com/jserv/jit-construct) ### IR (Intermediate Representation) 以下是一個 IR 的 example, 透過 AMaCC 產生, 細節可以參考 [ `How IR works inside AMaCC` ](https://github.com/jserv/amacc/blob/master/docs/IR.md) ```shell $ ./amacc -s -o hello hello.c 1: int main(int argc, char ** argv){ 2: printf("hello world\n\r"); ENT 0 IMM -1219248120 PSH PRTF ADJ 1 3: return 0; IMM 0 LEV 4: } LEV ``` ### [親手打造 Dynamic Library Loader](http://blog.linux.org.tw/~jserv/archives/002120.html) * Relocation * dlopen, dlsym, dlclose ### SSA (Static Single Assignment) LLVM is a Static Single Assignment (SSA) based representation that provides type safety, low-level operations, flexibility, and the capability of representing ‘all’ high-level languages cleanly. ... A key design point of an SSA-based representation is how it represents memory. ... [ [`https://llvm.org/docs/LangRef.html`] ](https://llvm.org/docs/LangRef.html) 以上節錄自 LLVM 的官方網站,那麼具體來說, ==SSA究竟是什麼樣子? 用在什麼地方呢?== 底下會透過實驗,逐步增加對 SSA form的了解 節錄自gnu的官方網站: Most of the tree optimizers rely on the data flow information provided by the Static Single Assignment (SSA) form [ [`13.3 Static Single Assignment`] ](https://gcc.gnu.org/onlinedocs/gccint/SSA.html#SSA) 先看一下怎麼用 gcc dump SSA: ```clike #include <stdio.h> void main(int argc, char **argv) { int a, b; a = 1; a = 2; b = 3; b = 4; printf("b=%d", b); } ``` ```shell $ gcc -c -fdump-tree-ssa=ssa-out test.c $ cat ssa-out ;; Function main (main, funcdef_no=0, decl_uid=2210, cgraph_uid=0, symbol_order=0) main (int argc, char * * argv) { int b; int a; <bb 2>: a_1 = 1; a_2 = 2; b_3 = 3; b_4 = 4; printf ("b=%d", b_4); return; } ``` gcc 也可以輸出成 graphViz ```shell $ gcc -c -fdump-tree-ssa-graph test.c $ ls test.c.018t.ssa.dot test.c.018t.ssa.dot ``` ```graphviz digraph "test.c.018t.ssa" { overlap=false; subgraph "cluster_main" { style="dashed"; color="black"; label="main ()"; fn_0_basic_block_0 [shape=Mdiamond,style=filled,fillcolor=white,label="ENTRY"]; fn_0_basic_block_1 [shape=Mdiamond,style=filled,fillcolor=white,label="EXIT"]; fn_0_basic_block_2 [shape=record,style=filled,fillcolor=lightgrey,label="{ FREQ:0 |\<bb\ 2\>:\l\ |a_1\ =\ 1;\l\ |a_2\ =\ 2;\l\ |b_3\ =\ 3;\l\ |b_4\ =\ 4;\l\ |printf\ (\"b=%d\",\ b_4);\l\ |return;\l\ }"]; fn_0_basic_block_0:s -> fn_0_basic_block_2:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"]; fn_0_basic_block_2:s -> fn_0_basic_block_1:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"]; fn_0_basic_block_0:s -> fn_0_basic_block_1:n [style="invis",constraint=true]; } } ``` 接下來比較一下 -fdump-tree-ssa-graph 和 -fdump-tree-cfg-graph 的差別 ```shell $ gcc -c -fdump-tree-cfg-graph test.c $ ls test.c.011t.cfg.dot test.c.011t.cfg.dot ``` ```graphviz digraph "test.c.011t.cfg" { overlap=false; subgraph "cluster_main" { style="dashed"; color="black"; label="main ()"; fn_0_basic_block_0 [shape=Mdiamond,style=filled,fillcolor=white,label="ENTRY"]; fn_0_basic_block_1 [shape=Mdiamond,style=filled,fillcolor=white,label="EXIT"]; fn_0_basic_block_2 [shape=record,style=filled,fillcolor=lightgrey,label="{ FREQ:0 |\<bb\ 2\>:\l\ |a\ =\ 1;\l\ |a\ =\ 2;\l\ |b\ =\ 3;\l\ |b\ =\ 4;\l\ |printf\ (\"b=%d\",\ b);\l\ |return;\l\ }"]; fn_0_basic_block_0:s -> fn_0_basic_block_2:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"]; fn_0_basic_block_2:s -> fn_0_basic_block_1:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"]; fn_0_basic_block_0:s -> fn_0_basic_block_1:n [style="invis",constraint=true]; } } ``` * In compiler design, static single assignment form (often abbreviated as SSA form or simply SSA) is a property of an intermediate representation (IR),... Converting ordinary code into SSA form is primarily a simple matter of replacing the target of each assignment with a new variable... [ [ `Static single assignment form`] ](https://en.wikipedia.org/wiki/Static_single_assignment_form) #### 轉成 SSA form 之後的最佳化場景 * The algorithm operates by performing abstract interpretation of the code in SSA form. During abstract interpretation, it typically uses a flat lattice of constants for values and a global environment mapping SSA variables to values in this lattice. The crux of the algorithm comes in how it handles the interpretation of branch instructions...[ [`Sparse conditional constant propagation`] ](https://en.wikipedia.org/wiki/Sparse_conditional_constant_propagation) * Constant Propagation with Conditional Branches 1991年 MARK N. WEGMAN and F. KENNETH ZADECK 發表了一篇關於 Constant Propagation 的論文,裡面探討了使用 Constant Propagation Algorithms 對編譯器作最佳化的理論和實作 ### 提問 :::warning :question: 能否把 SSA form 理解為: 一種 representation,在 translator 將 oridinary code 轉成 SSA form 之後,由於 SSA form 能把原本複雜的 branch 簡化,所以可以更容易地對 SSA form 作最佳化? ::: :::success 原本編譯器最佳化就像下棋,需要前後反覆思忖,才能擬定策略,而且一有變動,就要重新佈局。SSA 的突破在於,將前述策略改為 Graph,後者已經是電腦科學中充分探討的題材。 比方說,Graph 裡頭的 dominator 是說,當一個點 A 到點 B 在控制流程圖中,若沒有其他的路線,那麼 A 及 B 就是 dominator,後者意思為著如果程式進行到 B 就代表著 A 一定也會執行到,我們可以說 A 支配著 B,反過來說,B 也支配著 A。這點就決定了最佳化的可行路徑。 :notes: jserv ::: :::warning :question: 是否表示 SSA 將 compiler 最佳化的問題轉成等價的圖論上的問題? ::: ## 未定義行為篇 * [6.2.4 Storage durations of objects ](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) * [SEI CERT C Coding Standard](https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard) 先舉一個例子來說明 C11, C99 在6.2.4提及的Storage durations of objects [ [ source ] ](https://wiki.sei.cmu.edu/confluence/display/c/DCL30-C.+Declare+objects+with+appropriate+storage+durations) ```clike= const char *p; void dont_do_this(void) { const char c_str[] = "This will change"; p = c_str; /* Dangerous */ } ``` If an object is referred to outside of its lifetime, the behavior is undefined... [ [`C99 規格書, 6.2.4 Storage durations of objects `] ](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) <b>第 4 行</b>是一個 UB,因為 value of p 是 c_str 的 address,但 c_str 這個 object的 lifetime 只在 dont_do_this 裡,如果 c_str 在 lifetime 外部被 referred 到,就是一個未定義行為 可以用這個方法解決 ```clike const char c_str[] = "This will change"; p = strdup(c_str); ``` ## 前置處理器應用篇 以 llvm 的 MemorySSA 為例 [ [ source ] ](https://github.com/llvm-mirror/llvm/blob/master/include/llvm/PassSupport.h),考慮以下 macro [PassSupport.h](https://github.com/llvm-mirror/llvm/blob/master/include/llvm/PassSupport.h): ```clike #define INITIALIZE_PASS_BEGIN(passName, arg, name, cfg, analysis) \ static void *initialize##passName##PassOnce(PassRegistry &Registry) { #define INITIALIZE_PASS_DEPENDENCY(depName) initialize##depName##Pass(Registry); #define INITIALIZE_AG_DEPENDENCY(depName) \ initialize##depName##AnalysisGroup(Registry); #define INITIALIZE_PASS_END(passName, arg, name, cfg, analysis) \ PassInfo *PI = new PassInfo( \ name, arg, &passName::ID, \ PassInfo::NormalCtor_t(callDefaultCtor<passName>), cfg, analysis); \ Registry.registerPass(*PI, true); \ return PI; \ } \ static llvm::once_flag Initialize##passName##PassFlag; \ void llvm::initialize##passName##Pass(PassRegistry &Registry) { \ llvm::call_once(Initialize##passName##PassFlag, \ initialize##passName##PassOnce, std::ref(Registry)); \ } ``` MemorySSA.cpp [ [ source ] ](https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/MemorySSA.cpp) ```clike INITIALIZE_PASS_BEGIN(MemorySSAPrinterLegacyPass, "print-memoryssa", "Memory SSA Printer", false, false) INITIALIZE_PASS_DEPENDENCY(MemorySSAWrapperPass) INITIALIZE_PASS_END(MemorySSAPrinterLegacyPass, "print-memoryssa", "Memory SSA Printer", false, false) ``` 會產生以下的程式碼 ```clike static void *initializeMemorySSAPrinterLegacyPassPassOnce(PassRegistry &Registry) { initializeMemorySSAWrapperPassPass(Registry); PassInfo *PI = new PassInfo( "Memory SSA Printer", "print-memoryssa", &MemorySSAPrinterLegacyPass::ID, PassInfo::NormalCtor_t(callDefaultCtor<MemorySSAPrinterLegacyPass>), false, false); Registry.registerPass(*PI, true); return PI; } static llvm::once_flag InitializeMemorySSAPrinterLegacyPassPassFlag; void llvm::initializeMemorySSAPrinterLegacyPassPass(PassRegistry &Registry) { llvm::call_once(InitializeMemorySSAPrinterLegacyPassPassFlag, initializeMemorySSAPrinterLegacyPassPassOnce, std::ref(Registry)); } ``` ## linked list 和非連續記憶體操作 * [list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h) * [WRITE_ONCE](https://github.com/torvalds/linux/blob/master/include/linux/compiler.h#L273) * [READ_ONCE](https://github.com/torvalds/linux/blob/master/include/linux/compiler.h#L258) 先從觀察這兩個 macro 開始: WRITE_ONCE, READ_ONCE,在list.h裡,這兩個macro出現的頻率相當高,這兩個macro的定義是在 [compiler.h](https://github.com/torvalds/linux/blob/master/include/linux/compiler.h) ```clike static __always_inline void __write_once_size(volatile void *p, void *res, int size) { switch (size) { case 1: *(volatile __u8 *)p = *(__u8 *)res; break; case 2: *(volatile __u16 *)p = *(__u16 *)res; break; case 4: *(volatile __u32 *)p = *(__u32 *)res; break; case 8: *(volatile __u64 *)p = *(__u64 *)res; break; default: barrier(); __builtin_memcpy((void *)p, (const void *)res, size); barrier(); } } #define WRITE_ONCE(x, val) \ ({ \ union { typeof(x) __val; char __c[1]; } __u = \ { .__val = (__force typeof(x)) (val) }; \ __write_once_size(&(x), __u.__c, sizeof(x)); \ __u.__val; \ }) ``` [ [`source`] ](https://github.com/torvalds/linux/blob/master/include/linux/list.h#L103) ```clike static inline void __list_del(struct list_head * prev, struct list_head * next) { next->prev = prev; WRITE_ONCE(prev->next, next); } ``` 先觀察WRITE_ONCE(prev->next, next)會被前置處理器轉成什麼樣的code ```clike ({ union { typeof(prev->next) __val; char __c[1]; } __u = { .__val = (__force typeof(prev->next)) (next) }; __write_once_size(&(prev->next), __u.__c, sizeof(prev->next)); __u.__val; }) ``` __u.__c 會被 pass 到__write_once_size 的第二個parameter, 所以先從 __u 這個 union 來觀察 ```clike union { typeof(prev->next) __val; char __c[1]; } __u = { .__val = (__force typeof(prev->next)) (next) }; ``` :::success 上面的程式碼剛好搭配 Week2 進度的 union :notes: jserv ::: ```clike WRITE_ONCE(prev->next, next); ``` 上面的 macro 轉換後, prev->next 的 value 就是 next, 於是 prev->next 就指向 next, 然後, 在 prev, next 之間的 element of list 就被刪除了 ### 提問 :::warning :question: 從 __list_del 到 WRITE_ONCE 再到 __write_once_size, 都沒有看到 free, 所以如果 element of list 是 malloc 而來的object, 是不是不能用 __list_del ? >[name=butastur-rtos][time=Sun, Sep 30, 2018 11:59 PM] ::: :::warning :question: 是不是可以說如果要用 list_for_each_entry_safe 來 free,都要以下的方式來 free? >[name=butastur-rtos][time=Wed, Oct 24, 2018 6:49 AM] ::: 以下是對 p 作 free [source](https://github.com/torvalds/linux/blob/master/kernel/exit.c#L735) ```clike list_for_each_entry_safe(p, n, &dead, ptrace_entry) { list_del_init(&p->ptrace_entry); release_task(p); } ``` ## 記憶體管理、對齊及硬體特性 * [What Every Programmer Should Know About Memory](https://www.akkadia.org/drepper/cpumemory.pdf) 這篇文章的作者 Drepper 同時也是 [glibc](https://github.com/lattera/glibc/graphs/contributors) 的開發者之一 * 對自己的應用開發自己的 memory allocator 是應該的 * arena 是一塊連續的記憶體 * MMU(Memory-Management Unit) 是硬體,通常是CPU的一部份。 * page fault [[`Page 14, 15, 18, 47, Virtual Memory and Linux`] ](https://events.static.linuxfound.org/sites/events/files/slides/elc_2016_mem.pdf) page fault 是一個 CPU exception。 當 Virtual memory 有 invalid access 的時候 MMU 會產生一個 exception(page fault)。invalid access 包含了 unmapped address 或是對特定位址的記憶體的存取權限不足,還有一種情況是存取一個己經 swapped out 的 address。 * page fault handler ### page fault handler 看一下這份文件怎麼說 [page fault handler](https://www.kernel.org/doc/Documentation/x86/exception-tables.txt) ```shell Whenever the kernel tries to access an address that is currently not accessible, the CPU generates a page fault exception and calls the page fault handler ``` 當存取一個 ==目前不能存取的 address==,CPU 會觸發一個 page fault exception,然後呼叫 page fault handler。 ### 提問 :::warning :question: > * MMU 一定是硬體嗎? > * MMU 如果不在 CPU 還有可能存在於什麼地方? > * MMU 有沒有可能是以軟體的方式存在? > [name='butastur-rtos'][time=Sun, Oct 7, 2018 4:26 PM] ::: ## [現代處理器設計: Cache 原理和實際影響](https://hackmd.io/s/HkW3Dr1Rb) * [有趣的 cache line 效應](https://champyen.blogspot.com/2017/05/cache-line.html) ## [以位元駕馭能量](https://hackmd.io/s/rk2RO0cYX) * [NEON](https://developer.arm.com/technologies/neon) * [Coding for NEON - Part 1: Load and Stores](https://community.arm.com/processors/b/blog/posts/coding-for-neon---part-1-load-and-stores) * SIMD * [LCU14-504: Taming ARMv8 NEON: from theory to benchmark results](https://www.youtube.com/watch?v=ixuDntaSnHI) * [Auto-vectorization in GCC](https://gcc.gnu.org/projects/tree-ssa/vectorization.html) [ [`source`] ](https://www.youtube.com/watch?v=ixuDntaSnHI) ```clike uint8x8_t a, b; uint16x8_t a16, b16, c; /* The naive way */ a16 = vmovl_u8(a); b16 = vmovl_u8(b); c = vadd_u16(a16, b16); /* The smarter way */ c = vaddl_u8(a, b); ``` ```clike vst1.64 {d0-d1}, [r7:64] vld1.64 {d16-d17}, [r7:64] vstr d17, [r7, #24] vldr d18, [r7, #32] ``` 要理解以上這4行, 有幾個 keywords 要先看一下 [ [`Coding for NEON - Part 1: Load and Stores`] ](https://community.arm.com/processors/b/blog/posts/coding-for-neon---part-1-load-and-stores): * Instruction mnemonic * Interleave Pattern * Address in ARM register, e.g. <b>[r7:64]</b> * NEON register, e.g. <b>d0</b> <b>vst</b>1, <b>vld</b>1, 粗體的地方是instruction mnemonic, 分別是 store, load vst<b>1</b>, vld<b>1</b>, 粗體的地方是interleave pattern (1, 2, 3 or 4) [ [`4.2.1, page 201`] ](http://infocenter.arm.com/help/topic/com.arm.doc.dui0489e/DUI0489E_arm_assembler_reference.pdf) The VLDR instruction loads an extension register from memory. The VSTR instruction saves the contents of an extension register to memory. double_elements 節錄自 [[`NEON intrinsic example`]](https://developer.arm.com/technologies/neon) ```shell $ cat neon.c #include <arm_neon.h> #include <stdio.h> uint32x4_t double_elements(uint32x4_t input) { return(vaddq_u32(input, input)); } int main(int argc, char **argv){ uint32x4_t input ={-1,0,-1,0}; double_elements(input); return 0; } $ arm-linux-gnueabihf-gcc -o neon.s neon.c -S -O0 -Wall -ftree-vectorize -mcpu=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard ``` 輸出的 neon.s 的內容會是這樣, fpu 在第 2 行 ```clike= ... .fpu neon-vfpv4 .type double_elements, %function double_elements: @ args = 0, pretend = 0, frame = 48 @ frame_needed = 1, uses_anonymous_args = 0 @ link register save eliminated. push {r7} sub sp, sp, #52 add r7, sp, #0 vst1.64 {d0-d1}, [r7:64] vld1.64 {d16-d17}, [r7:64] vstr d16, [r7, #32] vstr d17, [r7, #40] vld1.64 {d16-d17}, [r7:64] vstr d16, [r7, #16] vstr d17, [r7, #24] vldr d18, [r7, #32] vldr d19, [r7, #40] vldr d16, [r7, #16] vldr d17, [r7, #24] vadd.i32 q8, q9, q8 vmov q0, q8 @ v4si adds r7, r7, #52 mov sp, r7 ... ``` ## Memory Allocation * [Anonymous memory](https://flylib.com/books/en/2.830.1.111/1/) * unmapped address space * mmap * shrink heap [ [`source`] ](https://github.com/torvalds/linux/blob/master/include/linux/page-flags.h#L390) ```clike /* * On an anonymous page mapped into a user virtual memory area, * page->mapping points to its anon_vma, not to a struct address_space; * with the PAGE_MAPPING_ANON bit set to distinguish it. See rmap.h. * * On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled, * the PAGE_MAPPING_MOVABLE bit may be set along with the PAGE_MAPPING_ANON * bit; and then page->mapping points, not to an anon_vma, but to a private * structure which KSM associates with that merged page. See ksm.h. * * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is used for non-lru movable * page and then page->mapping points a struct address_space. * * Please note that, confusingly, "page_mapping" refers to the inode * address_space which maps the page from disk; whereas "page_mapped" * refers to user virtual address space into which the page is mapped. */ #define PAGE_MAPPING_ANON 0x1 #define PAGE_MAPPING_MOVABLE 0x2 #define PAGE_MAPPING_KSM (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE) #define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE) static __always_inline int PageMappingFlags(struct page *page) { return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) != 0; } static __always_inline int PageAnon(struct page *page) { page = compound_head(page); return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0; } static __always_inline int __PageMovable(struct page *page) { return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_MOVABLE; } ``` 以下透過交叉參考 [rmap.h](https://github.com/torvalds/linux/blob/master/include/linux/rmap.h#L16), [ksm.h](https://github.com/torvalds/linux/blob/master/include/linux/ksm.h) 來說明 anonymous page 是如何動作的 :::info * anonymous page 是指沒有 mapped 到檔案的記憶體 * page_mapping, page_mapped 這兩個會引起誤會: <b>page_mapping 是指 maps the page from disk,是一個==動詞== page_mapped 是指 user virtual address space 所 mapped 的 page,是一個==名詞==</b> * rmap.h 定義了 anon_vma ::: ## malloc 實作的觀摩 ### libc * [Malloc-Examples](https://www.gnu.org/software/libc/manual/html_node/Malloc-Examples.html) 理解 malloc 之前, 要先理解一下什麼是記憶體對齊 [`Aligned-Memory-Blocks`](https://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html#Aligned-Memory-Blocks) ### [ glibc ](https://github.com/lattera/glibc) [ [`source`] ](https://github.com/lattera/glibc/blob/master/malloc/malloc.c) ### [ musl ](https://git.musl-libc.org/cgit/musl/tree/src) [ [`source`] ](https://git.musl-libc.org/cgit/musl/tree/src/malloc/malloc.c?id=e5d78fe8df9bd61940abcd98ad07ed69b7da4350#n41) ```clike #define OVERHEAD (2*sizeof(size_t)) ... #define CHUNK_TO_MEM(c) (void *)((char *)(c) + OVERHEAD) ``` 上面這段 code 是節錄自 musl 裡的 malloc.c, 在 [`malloc`](https://git.musl-libc.org/cgit/musl/tree/src/malloc/malloc.c?id=e5d78fe8df9bd61940abcd98ad07ed69b7da4350#n326)這個 function 裡會使用到, 所以理解 musl 怎麼實作 malloc 之前, 先來看一下這兩個 macro 是什麼意思? 首先看這裡 ==(void *)((char *)(c ))==, 依照 [C99 規格書 (6.2.5, page 36),](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. 這是一個合法的轉換, 但是, ==為什麼要做這個轉換呢? 為什麼要加一個常數 2*sizeof(size_t) 呢?== 底下會繼續說明 size_t 是什麼 [ [`C99 規格書, §7.17, page 254`] ](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) ```shell size_t which is the unsigned integer type of the result of the sizeof operator ``` 接下來看 data alignment 這個概念 ## X Macro * 透過實作來說明這個巨集 ## Incompatibilities Between ISO C and ISO C++ * Designated initializers(C99 support but C11 does not support) 這裡會依照 C11, C99 這兩個標準來舉出幾個不相容的特性 ## Experimenting with _Generic() for parametric constness in C11 ### 測試環境 ```shell $ cat /etc/os-release PRETTY_NAME="Debian GNU/Linux 9 (stretch)" NAME="Debian GNU/Linux" VERSION_ID="9" VERSION="9 (stretch)" ID=debian HOME_URL="https://www.debian.org/" SUPPORT_URL="https://www.debian.org/support" BUG_REPORT_URL="https://bugs.debian.org/" $ cat /proc/version Linux version 2.6.32-042stab133.1 (root@kbuild-rh6-x64.eng.sw.ru) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC) ) #1 SMP Thu Aug 16 13:14:51 MSK 2018 ``` ## 心得 在寫這份學習歷程的過程中,覺得還有非常多的知識尚未掌握,每一個直播每一個共筆都有相當多的延伸主題可以寫,這不只是寫作業,也是一個整理相關知識的自我學習的過程,所以,雖然作業截止日是9/25,不過==9/25之後還是會想要繼續補充這份筆記== ## Reference - [ ] [the-new-c-declarations-initializations](http://www.drdobbs.com/the-new-c-declarations-initializations/184401377) - [ ] [SEI CERT C Coding Standard](https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard) - [ ] [C99 規格書](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) - [ ] [C11 規格書](http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf) - [ ] [Understanding The Linux Virtual Memory Manager](https://www.kernel.org/doc/gorman/pdf/understand.pdf) :::danger 持續更新中... :::