# 2018q3 Homework1
contributed by < [`butastur`](https://github.com/butastur-rtos) >
###### tags: `sysprog2018` `c11` `c99` `macro` `list` `neon` `linked list` `cache` `memory leak`
* [你所不知道的 C 語言: 指標篇](https://hackmd.io/s/HyBPr9WGl#)
* [你所不知道的 C 語言: 編譯器和最佳化原理篇](https://hackmd.io/s/Hy72937Me)
* [你所不知道的 C 語言: 物件導向程式設計篇](https://hackmd.io/s/HJLyQaQMl)
* [你所不知道的 C 語言: 未定義行為篇](https://hackmd.io/s/Skr9vGiQm)
* [你所不知道的 C 語言: 前置處理器應用篇](https://hackmd.io/s/S1maxCXMl)
* [你所不知道的 C 語言: linked list 和非連續記憶體操作](https://hackmd.io/s/SkE33UTHf)
* [你所不知道的 C 語言:記憶體管理、對齊及硬體特性](https://hackmd.io/s/BkuMDQ9K7)
* [現代處理器設計: Cache 原理和實際影響](https://hackmd.io/s/HkW3Dr1Rb)
* [以位元駕馭能量](https://hackmd.io/s/rk2RO0cYX)
* [Memory Allocation](https://hackmd.io/s/B1SRlfeee#)
* malloc 實作的觀摩
* [X Macro](https://blog.noctua-software.com/c-tricks.html)
* [Incompatibilities Between ISO C and ISO C++](http://david.tribble.com/text/cdiffs.htm)
* [Experimenting with _Generic() for parametric constness in C11](https://fanf.livejournal.com/144696.html)
## 指標篇
* 用cdecl 解釋 C 的 declaration
* void * 和 char *
* Memory leak
* Segmentation fault
* Stack buffer overflows
### 用 cdecl 解釋 C 的 declaration
以下用 cdecl 的 explain 看一下 cast 和 declare 的差別
```shell
$ cdecl
Type `help' or `?' for help
cdecl> explain (char *)pt
cast pt into pointer to char
cdecl> explain char * pt
declare pt as pointer to char
cdecl>
```
以下是 cdecl 用 英文產生 C 的 declaration
:::info
這裡可以測試用英文描述的宣告正不正確, 換句話說,這裡不旦考驗英文,還考驗==宣告==的用字精不精準
:::
先來個簡單的 pointer to char
```shell
decl> declare a as pointer to char
char *a
```
然後看看 pointer to function
```shell
cdecl> declare a as pointer to function returning char
char (*a)()
```
pointer to function returning pointer to char
指標指向一個function, 這個function 傳回一個pointer to char
```shell
cdecl> declare a as pointer to function returning pointer to char
char *(*a)()
```
a 是一個指標,指向一個 function, 這個 function 傳回一個pointer,傳回的 pointer <b>指向一個 function</b>, 這個function 傳回一個 pointer to char
```shell
cdecl> declare a as pointer to function returning pointer to function returning pointer to char
char *(*(*a)())()
```
a 是一個 array, 由指標組成,<b>每一個指標都指向一個 function</b>, 每一個 function 都傳回一個指標,...
```shell
cdecl> declare a as array of pointer to function returning pointer to function returning pointer to char
char *(*(*a[])())()
```
一個帶有 parameter 的 function, 這個 function 傳回一個 pointr to struct
```shell
cdecl> declare a as function (int) returning pointer to struct b
struct b *a(int )
```
這裡我初次寫下來的時候把 int 和 char 的位置理解相反了,正確的理解是:最右邊的 char 代表的是傳回的 pointer <b>所指向的 function 的參數</b>
```shell
cdecl> declare a as function (int) returning pointer to function(char) returning int
int (*a(int ))(char )
```
### void * 和 char *
[C99 規格書, 6.2.5, page 36](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)
A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.
void * 和 char * 互換的範例
```clike
#include <stdio.h>
void main(){
char * str = "test";
void * voidptr = str;
printf("voidptr: %s\r\n", voidptr);
}
```
```shell
$ gcc voidptr.c -o voidptr
$ ./voidptr
voidptr: test
```
:::warning
C 語言不保證你可以安全地轉換 void * 為任意型態後,再轉換回原本的型態
[你所不知道的 C 語言:記憶體管理、對齊及硬體特性](https://hackmd.io/s/BkuMDQ9K7)
:::
### Memory leak
`valgrind` 這個工具, 先裝起來試試看
```shell
$ apt-get install valgrind
```
```clike
#include <stdlib.h>
void main(){
void * ptr = malloc(200);
}
```
```shell
$ gcc memoryleak.c -o memoryleak
$ valgrind --leak-check=full ./memoryleak
==18931== Memcheck, a memory error detector
==18931== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==18931== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==18931== Command: ./memoryleak
==18931==
==18931==
==18931== HEAP SUMMARY:
==18931== in use at exit: 200 bytes in 1 blocks
==18931== total heap usage: 1 allocs, 0 frees, 200 bytes allocated
==18931==
==18931== 200 bytes in 1 blocks are definitely lost in loss record 1 of 1
==18931== at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==18931== by 0x1086C1: main (in /root/temp/ssh/lab0-c/memoryleak)
==18931==
==18931== LEAK SUMMARY:
==18931== definitely lost: 200 bytes in 1 blocks
==18931== indirectly lost: 0 bytes in 0 blocks
==18931== possibly lost: 0 bytes in 0 blocks
==18931== still reachable: 0 bytes in 0 blocks
==18931== suppressed: 0 bytes in 0 blocks
==18931==
==18931== For counts of detected and suppressed errors, rerun with: -v
==18931== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```
因為 `malloc` 之後沒有 `free` , 所以發現 LEAK SUMMARY: `definitely lost: 200 bytes in 1 blocks`
### Segmentation fault
先做個實驗試試
```clike
void main(){
char * str = "fault";
*str = 'F';
}
```
```shell
$ gcc fault.c -o fault
$ ./fault
Segmentation fault
```
### Stack buffer overflows
gcc version:
```shell
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
```
gcc 跟 stack-protector 有關的參數有4個:
-fno-stack-protector
-fstack-protector
-fstack-protector-strong
-fstack-protector-all
```shell
$ man gcc 1
```
先看一段 code
```clike=
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main(){
char * str = malloc(sizeof(char) * 4);
char arr[4];
strcpy(str, "12345678");
strcpy(arr, "12345678");
printf("str: %s\r\n", str);
printf("arr: %s\r\n", arr);
}
```
這段 code 有幾個地方可以探討:
- str 有沒有被 allocate 一段記憶體?
- char * str 和 char str[4] 有什麼不一樣?
- 在<b>第7行</b>, value of arr 是不是 NULL?
- compile 的時候加 `-fstack-protector`
- compile 的時候加 `-fstack-protector-all`
- Stack smashing
compile 的時候==不加== `-fstack-protector`
```shell
$ gcc stackprotector.c -o stackprotector
$ ./stackprotector
Segmentation fault
```
compile 的時候==加== `-fstack-protector`
```shell
$ gcc stackprotector.c -fstack-protector -o stackprotector
$ ./stackprotector
Segmentation fault
```
compile 的時候==加== `-fstack-protector-all`
```shell
$ gcc stackprotector.c -fstack-protector-all -o stackprotector
$ ./stackprotector
str: 12345678
arr: 12345678
*** stack smashing detected ***: ./stackprotector terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7ffa481d7bfb]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7ffa482601f7]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x0)[0x7ffa482601c0]
./stackprotector(+0x7f4)[0x55bcc9fe77f4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7ffa481872e1]
./stackprotector(+0x65a)[0x55bcc9fe765a]
======= Memory map: ========
55bcc9fe7000-55bcc9fe8000 r-xp 00000000 08:11 54658412 /root/temp/ssh/lab0-c/stackprotector
55bcca1e7000-55bcca1e8000 r--p 00000000 08:11 54658412 /root/temp/ssh/lab0-c/stackprotector
55bcca1e8000-55bcca1e9000 rw-p 00001000 08:11 54658412 /root/temp/ssh/lab0-c/stackprotector
55bcca76a000-55bcca78b000 rw-p 00000000 00:00 0 [heap]
7ffa47f50000-7ffa47f66000 r-xp 00000000 08:11 40741947 /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffa47f66000-7ffa48165000 ---p 00016000 08:11 40741947 /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffa48165000-7ffa48166000 r--p 00015000 08:11 40741947 /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffa48166000-7ffa48167000 rw-p 00016000 08:11 40741947 /lib/x86_64-linux-gnu/libgcc_s.so.1
7ffa48167000-7ffa482fc000 r-xp 00000000 08:11 40741846 /lib/x86_64-linux-gnu/libc-2.24.so
7ffa482fc000-7ffa484fc000 ---p 00195000 08:11 40741846 /lib/x86_64-linux-gnu/libc-2.24.so
7ffa484fc000-7ffa48500000 r--p 00195000 08:11 40741846 /lib/x86_64-linux-gnu/libc-2.24.so
7ffa48500000-7ffa48502000 rw-p 00199000 08:11 40741846 /lib/x86_64-linux-gnu/libc-2.24.so
7ffa48502000-7ffa48506000 rw-p 00000000 00:00 0
7ffa48506000-7ffa48529000 r-xp 00000000 08:11 40741792 /lib/x86_64-linux-gnu/ld-2.24.so
7ffa4870e000-7ffa48710000 rw-p 00000000 00:00 0
7ffa48725000-7ffa48729000 rw-p 00000000 00:00 0
7ffa48729000-7ffa4872a000 r--p 00023000 08:11 40741792 /lib/x86_64-linux-gnu/ld-2.24.so
7ffa4872a000-7ffa4872b000 rw-p 00024000 08:11 40741792 /lib/x86_64-linux-gnu/ld-2.24.so
7ffa4872b000-7ffa4872c000 rw-p 00000000 00:00 0
7ffd3b4b9000-7ffd3b4ce000 rw-p 00000000 00:00 0 [stack]
7ffd3b526000-7ffd3b528000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted
```
* Stack smashing
先解釋一下 [ `stack buffer overflow` ](https://en.wikipedia.org/wiki/Stack_buffer_overflow)
* 觀摩一下 Linus Torvalds 怎麼處理 stack protector
[ [ `source` ] ](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/stackprotector.h#L5)
Stack protector works by putting predefined pattern at the start of the stack frame and verifying that it hasn't been overwritten when returning from the function. The pattern is called ==stack canary== and unfortunately gcc requires it to be at a fixed offset from %gs.
* Stack canary
## 編譯器和最佳化原理篇
* Relocation
* JIT 實做案例
* IR (Intermediate Representation)
* 親手打造 Dynamic Library Loader
* Study SSA
* 提問
### Relocation
* 舉例說明一下 Relocation
### JIT 實做案例
#### [AMaCC](https://github.com/jserv/amacc)
#### [jit-construct](https://github.com/jserv/jit-construct)
### IR (Intermediate Representation)
以下是一個 IR 的 example, 透過 AMaCC 產生, 細節可以參考 [ `How IR works inside AMaCC` ](https://github.com/jserv/amacc/blob/master/docs/IR.md)
```shell
$ ./amacc -s -o hello hello.c
1: int main(int argc, char ** argv){
2: printf("hello world\n\r");
ENT 0
IMM -1219248120
PSH
PRTF
ADJ 1
3: return 0;
IMM 0
LEV
4: }
LEV
```
### [親手打造 Dynamic Library Loader](http://blog.linux.org.tw/~jserv/archives/002120.html)
* Relocation
* dlopen, dlsym, dlclose
### SSA (Static Single Assignment)
LLVM is a Static Single Assignment (SSA) based representation that provides type safety, low-level operations, flexibility, and the capability of representing ‘all’ high-level languages cleanly.
...
A key design point of an SSA-based representation is how it represents memory.
...
[ [`https://llvm.org/docs/LangRef.html`] ](https://llvm.org/docs/LangRef.html)
以上節錄自 LLVM 的官方網站,那麼具體來說, ==SSA究竟是什麼樣子? 用在什麼地方呢?== 底下會透過實驗,逐步增加對 SSA form的了解
節錄自gnu的官方網站:
Most of the tree optimizers rely on the data flow information provided by the Static Single Assignment (SSA) form
[ [`13.3 Static Single Assignment`] ](https://gcc.gnu.org/onlinedocs/gccint/SSA.html#SSA)
先看一下怎麼用 gcc dump SSA:
```clike
#include <stdio.h>
void main(int argc, char **argv) {
int a, b;
a = 1;
a = 2;
b = 3;
b = 4;
printf("b=%d", b);
}
```
```shell
$ gcc -c -fdump-tree-ssa=ssa-out test.c
$ cat ssa-out
;; Function main (main, funcdef_no=0, decl_uid=2210, cgraph_uid=0, symbol_order=0)
main (int argc, char * * argv)
{
int b;
int a;
<bb 2>:
a_1 = 1;
a_2 = 2;
b_3 = 3;
b_4 = 4;
printf ("b=%d", b_4);
return;
}
```
gcc 也可以輸出成 graphViz
```shell
$ gcc -c -fdump-tree-ssa-graph test.c
$ ls test.c.018t.ssa.dot
test.c.018t.ssa.dot
```
```graphviz
digraph "test.c.018t.ssa" {
overlap=false;
subgraph "cluster_main" {
style="dashed";
color="black";
label="main ()";
fn_0_basic_block_0 [shape=Mdiamond,style=filled,fillcolor=white,label="ENTRY"];
fn_0_basic_block_1 [shape=Mdiamond,style=filled,fillcolor=white,label="EXIT"];
fn_0_basic_block_2 [shape=record,style=filled,fillcolor=lightgrey,label="{ FREQ:0 |\<bb\ 2\>:\l\
|a_1\ =\ 1;\l\
|a_2\ =\ 2;\l\
|b_3\ =\ 3;\l\
|b_4\ =\ 4;\l\
|printf\ (\"b=%d\",\ b_4);\l\
|return;\l\
}"];
fn_0_basic_block_0:s -> fn_0_basic_block_2:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
fn_0_basic_block_2:s -> fn_0_basic_block_1:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
fn_0_basic_block_0:s -> fn_0_basic_block_1:n [style="invis",constraint=true];
}
}
```
接下來比較一下 -fdump-tree-ssa-graph 和 -fdump-tree-cfg-graph 的差別
```shell
$ gcc -c -fdump-tree-cfg-graph test.c
$ ls test.c.011t.cfg.dot
test.c.011t.cfg.dot
```
```graphviz
digraph "test.c.011t.cfg" {
overlap=false;
subgraph "cluster_main" {
style="dashed";
color="black";
label="main ()";
fn_0_basic_block_0 [shape=Mdiamond,style=filled,fillcolor=white,label="ENTRY"];
fn_0_basic_block_1 [shape=Mdiamond,style=filled,fillcolor=white,label="EXIT"];
fn_0_basic_block_2 [shape=record,style=filled,fillcolor=lightgrey,label="{ FREQ:0 |\<bb\ 2\>:\l\
|a\ =\ 1;\l\
|a\ =\ 2;\l\
|b\ =\ 3;\l\
|b\ =\ 4;\l\
|printf\ (\"b=%d\",\ b);\l\
|return;\l\
}"];
fn_0_basic_block_0:s -> fn_0_basic_block_2:n [style="solid,bold",color=blue,weight=100,constraint=true, label="[0%]"];
fn_0_basic_block_2:s -> fn_0_basic_block_1:n [style="solid,bold",color=black,weight=10,constraint=true, label="[0%]"];
fn_0_basic_block_0:s -> fn_0_basic_block_1:n [style="invis",constraint=true];
}
}
```
* In compiler design, static single assignment form (often abbreviated as SSA form or simply SSA) is a property of an intermediate representation (IR),...
Converting ordinary code into SSA form is primarily a simple matter of replacing the target of each assignment with a new variable... [ [ `Static single assignment form`] ](https://en.wikipedia.org/wiki/Static_single_assignment_form)
#### 轉成 SSA form 之後的最佳化場景
* The algorithm operates by performing abstract interpretation of the code in SSA form. During abstract interpretation, it typically uses a flat lattice of constants for values and a global environment mapping SSA variables to values in this lattice. The crux of the algorithm comes in how it handles the interpretation of branch instructions...[ [`Sparse conditional constant propagation`] ](https://en.wikipedia.org/wiki/Sparse_conditional_constant_propagation)
* Constant Propagation with Conditional Branches
1991年 MARK N. WEGMAN and F. KENNETH ZADECK 發表了一篇關於 Constant Propagation 的論文,裡面探討了使用 Constant Propagation Algorithms 對編譯器作最佳化的理論和實作
### 提問
:::warning
:question:
能否把 SSA form 理解為: 一種 representation,在 translator 將 oridinary code 轉成 SSA form 之後,由於 SSA form 能把原本複雜的 branch 簡化,所以可以更容易地對 SSA form 作最佳化?
:::
:::success
原本編譯器最佳化就像下棋,需要前後反覆思忖,才能擬定策略,而且一有變動,就要重新佈局。SSA 的突破在於,將前述策略改為 Graph,後者已經是電腦科學中充分探討的題材。
比方說,Graph 裡頭的 dominator 是說,當一個點 A 到點 B 在控制流程圖中,若沒有其他的路線,那麼 A 及 B 就是 dominator,後者意思為著如果程式進行到 B 就代表著 A 一定也會執行到,我們可以說 A 支配著 B,反過來說,B 也支配著 A。這點就決定了最佳化的可行路徑。
:notes: jserv
:::
:::warning
:question:
是否表示 SSA 將 compiler 最佳化的問題轉成等價的圖論上的問題?
:::
## 未定義行為篇
* [6.2.4 Storage durations of objects
](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)
* [SEI CERT C Coding Standard](https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard)
先舉一個例子來說明 C11, C99 在6.2.4提及的Storage durations of objects
[ [ source ] ](https://wiki.sei.cmu.edu/confluence/display/c/DCL30-C.+Declare+objects+with+appropriate+storage+durations)
```clike=
const char *p;
void dont_do_this(void) {
const char c_str[] = "This will change";
p = c_str; /* Dangerous */
}
```
If an object is referred to outside of its lifetime, the behavior is undefined...
[ [`C99 規格書, 6.2.4 Storage durations of objects
`] ](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)
<b>第 4 行</b>是一個 UB,因為 value of p 是 c_str 的 address,但 c_str 這個 object的 lifetime 只在 dont_do_this 裡,如果 c_str 在 lifetime 外部被 referred 到,就是一個未定義行為
可以用這個方法解決
```clike
const char c_str[] = "This will change";
p = strdup(c_str);
```
## 前置處理器應用篇
以 llvm 的 MemorySSA 為例 [ [ source ] ](https://github.com/llvm-mirror/llvm/blob/master/include/llvm/PassSupport.h),考慮以下 macro [PassSupport.h](https://github.com/llvm-mirror/llvm/blob/master/include/llvm/PassSupport.h):
```clike
#define INITIALIZE_PASS_BEGIN(passName, arg, name, cfg, analysis) \
static void *initialize##passName##PassOnce(PassRegistry &Registry) {
#define INITIALIZE_PASS_DEPENDENCY(depName) initialize##depName##Pass(Registry);
#define INITIALIZE_AG_DEPENDENCY(depName) \
initialize##depName##AnalysisGroup(Registry);
#define INITIALIZE_PASS_END(passName, arg, name, cfg, analysis) \
PassInfo *PI = new PassInfo( \
name, arg, &passName::ID, \
PassInfo::NormalCtor_t(callDefaultCtor<passName>), cfg, analysis); \
Registry.registerPass(*PI, true); \
return PI; \
} \
static llvm::once_flag Initialize##passName##PassFlag; \
void llvm::initialize##passName##Pass(PassRegistry &Registry) { \
llvm::call_once(Initialize##passName##PassFlag, \
initialize##passName##PassOnce, std::ref(Registry)); \
}
```
MemorySSA.cpp [ [ source ] ](https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/MemorySSA.cpp)
```clike
INITIALIZE_PASS_BEGIN(MemorySSAPrinterLegacyPass, "print-memoryssa",
"Memory SSA Printer", false, false)
INITIALIZE_PASS_DEPENDENCY(MemorySSAWrapperPass)
INITIALIZE_PASS_END(MemorySSAPrinterLegacyPass, "print-memoryssa",
"Memory SSA Printer", false, false)
```
會產生以下的程式碼
```clike
static void *initializeMemorySSAPrinterLegacyPassPassOnce(PassRegistry &Registry) {
initializeMemorySSAWrapperPassPass(Registry);
PassInfo *PI = new PassInfo(
"Memory SSA Printer", "print-memoryssa", &MemorySSAPrinterLegacyPass::ID,
PassInfo::NormalCtor_t(callDefaultCtor<MemorySSAPrinterLegacyPass>), false, false);
Registry.registerPass(*PI, true);
return PI;
}
static llvm::once_flag InitializeMemorySSAPrinterLegacyPassPassFlag;
void llvm::initializeMemorySSAPrinterLegacyPassPass(PassRegistry &Registry) {
llvm::call_once(InitializeMemorySSAPrinterLegacyPassPassFlag,
initializeMemorySSAPrinterLegacyPassPassOnce, std::ref(Registry));
}
```
## linked list 和非連續記憶體操作
* [list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h)
* [WRITE_ONCE](https://github.com/torvalds/linux/blob/master/include/linux/compiler.h#L273)
* [READ_ONCE](https://github.com/torvalds/linux/blob/master/include/linux/compiler.h#L258)
先從觀察這兩個 macro 開始: WRITE_ONCE, READ_ONCE,在list.h裡,這兩個macro出現的頻率相當高,這兩個macro的定義是在 [compiler.h](https://github.com/torvalds/linux/blob/master/include/linux/compiler.h)
```clike
static __always_inline void __write_once_size(volatile void *p, void *res, int size)
{
switch (size) {
case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
default:
barrier();
__builtin_memcpy((void *)p, (const void *)res, size);
barrier();
}
}
#define WRITE_ONCE(x, val) \
({ \
union { typeof(x) __val; char __c[1]; } __u = \
{ .__val = (__force typeof(x)) (val) }; \
__write_once_size(&(x), __u.__c, sizeof(x)); \
__u.__val; \
})
```
[ [`source`] ](https://github.com/torvalds/linux/blob/master/include/linux/list.h#L103)
```clike
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
next->prev = prev;
WRITE_ONCE(prev->next, next);
}
```
先觀察WRITE_ONCE(prev->next, next)會被前置處理器轉成什麼樣的code
```clike
({
union { typeof(prev->next) __val; char __c[1]; } __u =
{ .__val = (__force typeof(prev->next)) (next) };
__write_once_size(&(prev->next), __u.__c, sizeof(prev->next));
__u.__val;
})
```
__u.__c 會被 pass 到__write_once_size 的第二個parameter, 所以先從 __u 這個 union 來觀察
```clike
union {
typeof(prev->next) __val;
char __c[1];
} __u = {
.__val = (__force typeof(prev->next)) (next)
};
```
:::success
上面的程式碼剛好搭配 Week2 進度的 union
:notes: jserv
:::
```clike
WRITE_ONCE(prev->next, next);
```
上面的 macro 轉換後, prev->next 的 value 就是 next, 於是 prev->next 就指向 next, 然後, 在 prev, next 之間的 element of list 就被刪除了
### 提問
:::warning
:question:
從 __list_del 到 WRITE_ONCE 再到 __write_once_size, 都沒有看到 free, 所以如果 element of list 是 malloc 而來的object, 是不是不能用 __list_del ?
>[name=butastur-rtos][time=Sun, Sep 30, 2018 11:59 PM]
:::
:::warning
:question:
是不是可以說如果要用 list_for_each_entry_safe 來 free,都要以下的方式來 free?
>[name=butastur-rtos][time=Wed, Oct 24, 2018 6:49 AM]
:::
以下是對 p 作 free
[source](https://github.com/torvalds/linux/blob/master/kernel/exit.c#L735)
```clike
list_for_each_entry_safe(p, n, &dead, ptrace_entry) {
list_del_init(&p->ptrace_entry);
release_task(p);
}
```
## 記憶體管理、對齊及硬體特性
* [What Every Programmer Should Know About Memory](https://www.akkadia.org/drepper/cpumemory.pdf)
這篇文章的作者 Drepper 同時也是 [glibc](https://github.com/lattera/glibc/graphs/contributors) 的開發者之一
* 對自己的應用開發自己的 memory allocator 是應該的
* arena 是一塊連續的記憶體
* MMU(Memory-Management Unit) 是硬體,通常是CPU的一部份。
* page fault
[[`Page 14, 15, 18, 47, Virtual Memory and Linux`] ](https://events.static.linuxfound.org/sites/events/files/slides/elc_2016_mem.pdf)
page fault 是一個 CPU exception。
當 Virtual memory 有 invalid access 的時候 MMU 會產生一個 exception(page fault)。invalid access 包含了 unmapped address 或是對特定位址的記憶體的存取權限不足,還有一種情況是存取一個己經 swapped out 的 address。
* page fault handler
### page fault handler
看一下這份文件怎麼說
[page fault handler](https://www.kernel.org/doc/Documentation/x86/exception-tables.txt)
```shell
Whenever the kernel tries to access an address that is currently not
accessible, the CPU generates a page fault exception and calls the
page fault handler
```
當存取一個 ==目前不能存取的 address==,CPU 會觸發一個 page fault exception,然後呼叫 page fault handler。
### 提問
:::warning
:question:
> * MMU 一定是硬體嗎?
> * MMU 如果不在 CPU 還有可能存在於什麼地方?
> * MMU 有沒有可能是以軟體的方式存在?
> [name='butastur-rtos'][time=Sun, Oct 7, 2018 4:26 PM]
:::
## [現代處理器設計: Cache 原理和實際影響](https://hackmd.io/s/HkW3Dr1Rb)
* [有趣的 cache line 效應](https://champyen.blogspot.com/2017/05/cache-line.html)
## [以位元駕馭能量](https://hackmd.io/s/rk2RO0cYX)
* [NEON](https://developer.arm.com/technologies/neon)
* [Coding for NEON - Part 1: Load and Stores](https://community.arm.com/processors/b/blog/posts/coding-for-neon---part-1-load-and-stores)
* SIMD
* [LCU14-504: Taming ARMv8 NEON: from theory to benchmark results](https://www.youtube.com/watch?v=ixuDntaSnHI)
* [Auto-vectorization in GCC](https://gcc.gnu.org/projects/tree-ssa/vectorization.html)
[ [`source`] ](https://www.youtube.com/watch?v=ixuDntaSnHI)
```clike
uint8x8_t a, b;
uint16x8_t a16, b16, c;
/* The naive way */
a16 = vmovl_u8(a);
b16 = vmovl_u8(b);
c = vadd_u16(a16, b16);
/* The smarter way */
c = vaddl_u8(a, b);
```
```clike
vst1.64 {d0-d1}, [r7:64]
vld1.64 {d16-d17}, [r7:64]
vstr d17, [r7, #24]
vldr d18, [r7, #32]
```
要理解以上這4行, 有幾個 keywords 要先看一下 [ [`Coding for NEON - Part 1: Load and Stores`] ](https://community.arm.com/processors/b/blog/posts/coding-for-neon---part-1-load-and-stores):
* Instruction mnemonic
* Interleave Pattern
* Address in ARM register, e.g. <b>[r7:64]</b>
* NEON register, e.g. <b>d0</b>
<b>vst</b>1, <b>vld</b>1, 粗體的地方是instruction mnemonic, 分別是 store, load
vst<b>1</b>, vld<b>1</b>, 粗體的地方是interleave pattern (1, 2, 3 or 4)
[ [`4.2.1, page 201`] ](http://infocenter.arm.com/help/topic/com.arm.doc.dui0489e/DUI0489E_arm_assembler_reference.pdf)
The VLDR instruction loads an extension register from memory. The VSTR instruction saves the contents of an extension register to memory.
double_elements 節錄自 [[`NEON intrinsic example`]](https://developer.arm.com/technologies/neon)
```shell
$ cat neon.c
#include <arm_neon.h>
#include <stdio.h>
uint32x4_t double_elements(uint32x4_t input)
{
return(vaddq_u32(input, input));
}
int main(int argc, char **argv){
uint32x4_t input ={-1,0,-1,0};
double_elements(input);
return 0;
}
$ arm-linux-gnueabihf-gcc -o neon.s neon.c -S -O0 -Wall -ftree-vectorize -mcpu=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard
```
輸出的 neon.s 的內容會是這樣, fpu 在第 2 行
```clike=
...
.fpu neon-vfpv4
.type double_elements, %function
double_elements:
@ args = 0, pretend = 0, frame = 48
@ frame_needed = 1, uses_anonymous_args = 0
@ link register save eliminated.
push {r7}
sub sp, sp, #52
add r7, sp, #0
vst1.64 {d0-d1}, [r7:64]
vld1.64 {d16-d17}, [r7:64]
vstr d16, [r7, #32]
vstr d17, [r7, #40]
vld1.64 {d16-d17}, [r7:64]
vstr d16, [r7, #16]
vstr d17, [r7, #24]
vldr d18, [r7, #32]
vldr d19, [r7, #40]
vldr d16, [r7, #16]
vldr d17, [r7, #24]
vadd.i32 q8, q9, q8
vmov q0, q8 @ v4si
adds r7, r7, #52
mov sp, r7
...
```
## Memory Allocation
* [Anonymous memory](https://flylib.com/books/en/2.830.1.111/1/)
* unmapped address space
* mmap
* shrink heap
[ [`source`] ](https://github.com/torvalds/linux/blob/master/include/linux/page-flags.h#L390)
```clike
/*
* On an anonymous page mapped into a user virtual memory area,
* page->mapping points to its anon_vma, not to a struct address_space;
* with the PAGE_MAPPING_ANON bit set to distinguish it. See rmap.h.
*
* On an anonymous page in a VM_MERGEABLE area, if CONFIG_KSM is enabled,
* the PAGE_MAPPING_MOVABLE bit may be set along with the PAGE_MAPPING_ANON
* bit; and then page->mapping points, not to an anon_vma, but to a private
* structure which KSM associates with that merged page. See ksm.h.
*
* PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is used for non-lru movable
* page and then page->mapping points a struct address_space.
*
* Please note that, confusingly, "page_mapping" refers to the inode
* address_space which maps the page from disk; whereas "page_mapped"
* refers to user virtual address space into which the page is mapped.
*/
#define PAGE_MAPPING_ANON 0x1
#define PAGE_MAPPING_MOVABLE 0x2
#define PAGE_MAPPING_KSM (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
#define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
static __always_inline int PageMappingFlags(struct page *page)
{
return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) != 0;
}
static __always_inline int PageAnon(struct page *page)
{
page = compound_head(page);
return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
}
static __always_inline int __PageMovable(struct page *page)
{
return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
PAGE_MAPPING_MOVABLE;
}
```
以下透過交叉參考 [rmap.h](https://github.com/torvalds/linux/blob/master/include/linux/rmap.h#L16), [ksm.h](https://github.com/torvalds/linux/blob/master/include/linux/ksm.h) 來說明 anonymous page 是如何動作的
:::info
* anonymous page 是指沒有 mapped 到檔案的記憶體
* page_mapping, page_mapped 這兩個會引起誤會:
<b>page_mapping 是指 maps the page from disk,是一個==動詞==
page_mapped 是指 user virtual address space 所 mapped 的 page,是一個==名詞==</b>
* rmap.h 定義了 anon_vma
:::
## malloc 實作的觀摩
### libc
* [Malloc-Examples](https://www.gnu.org/software/libc/manual/html_node/Malloc-Examples.html)
理解 malloc 之前, 要先理解一下什麼是記憶體對齊 [`Aligned-Memory-Blocks`](https://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html#Aligned-Memory-Blocks)
### [ glibc ](https://github.com/lattera/glibc)
[ [`source`] ](https://github.com/lattera/glibc/blob/master/malloc/malloc.c)
### [ musl ](https://git.musl-libc.org/cgit/musl/tree/src)
[ [`source`] ](https://git.musl-libc.org/cgit/musl/tree/src/malloc/malloc.c?id=e5d78fe8df9bd61940abcd98ad07ed69b7da4350#n41)
```clike
#define OVERHEAD (2*sizeof(size_t))
...
#define CHUNK_TO_MEM(c) (void *)((char *)(c) + OVERHEAD)
```
上面這段 code 是節錄自 musl 裡的 malloc.c, 在 [`malloc`](https://git.musl-libc.org/cgit/musl/tree/src/malloc/malloc.c?id=e5d78fe8df9bd61940abcd98ad07ed69b7da4350#n326)這個 function 裡會使用到, 所以理解 musl 怎麼實作 malloc 之前, 先來看一下這兩個 macro 是什麼意思?
首先看這裡 ==(void *)((char *)(c ))==, 依照 [C99 規格書 (6.2.5, page 36),](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)
A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. 這是一個合法的轉換, 但是, ==為什麼要做這個轉換呢? 為什麼要加一個常數 2*sizeof(size_t) 呢?== 底下會繼續說明
size_t 是什麼
[ [`C99 規格書, §7.17, page 254`] ](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)
```shell
size_t
which is the unsigned integer type of the result of the sizeof operator
```
接下來看 data alignment 這個概念
## X Macro
* 透過實作來說明這個巨集
## Incompatibilities Between ISO C and ISO C++
* Designated initializers(C99 support but C11 does not support)
這裡會依照 C11, C99 這兩個標準來舉出幾個不相容的特性
## Experimenting with _Generic() for parametric constness in C11
### 測試環境
```shell
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ cat /proc/version
Linux version 2.6.32-042stab133.1 (root@kbuild-rh6-x64.eng.sw.ru) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC) ) #1 SMP Thu Aug 16 13:14:51 MSK 2018
```
## 心得
在寫這份學習歷程的過程中,覺得還有非常多的知識尚未掌握,每一個直播每一個共筆都有相當多的延伸主題可以寫,這不只是寫作業,也是一個整理相關知識的自我學習的過程,所以,雖然作業截止日是9/25,不過==9/25之後還是會想要繼續補充這份筆記==
## Reference
- [ ] [the-new-c-declarations-initializations](http://www.drdobbs.com/the-new-c-declarations-initializations/184401377)
- [ ] [SEI CERT C Coding Standard](https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard)
- [ ] [C99 規格書](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf)
- [ ] [C11 規格書](http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf)
- [ ] [Understanding The Linux Virtual Memory Manager](https://www.kernel.org/doc/gorman/pdf/understand.pdf)
:::danger
持續更新中...
:::