owned this note
owned this note
Published
Linked with GitHub
# 2021q1 Homework1 (quiz1)
contributed by < `ambersun1234` >
> [2021q1 第 1 週測驗題](https://hackmd.io/@sysprog/linux2021-quiz1)
## 開發環境
```shell
$ uname -a
Linux station 5.8.0-44-generic #50~20.04.1-Ubuntu SMP Wed Feb 10 21:07:30 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
$ gcc -v
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
```
## 解釋程式運作原理
```c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
typedef struct __node {
int value;
struct __node *next;
} node_t;
static inline void list_add_node_t(node_t **list, node_t *node_t) {
node_t->next = *list;
*list = node_t;
}
static inline void list_concat(node_t **left, node_t *right) {
while (*left)
left = &((*left)->next);
*left = right;
}
void quicksort(node_t **list)
{
if (!*list)
return;
node_t *pivot = *list;
int value = pivot->value;
node_t *p = pivot->next;
pivot->next = NULL;
node_t *left = NULL, *right = NULL;
while (p) {
node_t *n = p;
p = p->next;
list_add_node_t(n->value > value ? &right : &left, n);
}
quicksort(&left);
quicksort(&right);
node_t *result = NULL;
list_concat(&result, left);
list_concat(&result, pivot); list_concat(&result, right);
*list = result;
}
```
`quicksort` 函式取 list 中第一個元素作為 pivot value,在迴圈內部從第二個元素開始依序比較;若元素值大於 pivot 則將其加入 `right` 的 list 中,反之則加入至 `left`。
`list_add_node_t` 採用 insert head 的方式,即將原本的 list 串接在後面
```graphviz
digraph foo {
rankdir=LR;
node [shape=record];
a [label="{<data> 12| <ref>}", width=1.2];
b [label="right(NULL)}"];
a:ref:c -> b [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2];
}
```
實際上的串接方式就如下圖所示,足以證明為何 `left`, `right` 初始值為 NULL。
```graphviz
digraph foo {
rankdir=LR;
node [shape=record];
a [label="{<data> 12| <ref>}", width=1.2];
b [label="{<data> 13(right)| <ref>}"];
c [label="{<data> 14| <ref>}"];
d [label="NULL", shape="box"];
a:ref:c -> b [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2];
b:ref:c -> c [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false];
c:ref:c -> d [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false];
}
```
divide 完之後就要把 list 合併,`list_concat` 就是單純將 right 串接在 left 後面而已;總共需要合併 3 次,left, pivot, right 因為 pivot 並未連接在任何 list 後面。
為了可以進行實驗,需要補齊上述程式碼; `list_make_node_t` 以及 `list_free` 的實做程式碼如下
```cpp
static void list_free(node_t **list) {
while (*list) {
node_t *tmp = (*list)->next;
free(list);
list = &tmp;
}
}
static node_t* list_make_node_t(node_t *list, int value) {
node_t *newNode = (node_t *)malloc(sizeof(node_t));
newNode->value = value;
newNode->next = list;
return newNode;
}
```
`list_free` 從頭走訪到 list 尾端,依序釋放記憶體
`list_make_node_t` 新增一個 node,其 value 為 `random() % 1024` 所產生。將新的 node 放置於 list head,next 則接上原本的 list。
---
## Pseudorandom number generator
+ 執行 list.c 中可發現輸出結果都一樣,這是因為 [pseudorandom number generator(PRNG)](https://en.wikipedia.org/wiki/Pseudorandom_number_generator) 並不是真正的亂數,他受到初始值(PRNG's seed)影響所致。
+ 一開始我採用 [multiplicative LCG](https://en.wikipedia.org/wiki/Linear_congruential_generator) 即 `c=0` 的 LCG
+ LCG 數學公式定義如下 $$y_{i+1}\ =\ [ay_i + c]\ mod\ m$$
+ 由上述公式可得知一個基本訊息,第 i+1 結果取決於第 i 次的結果,而在 y0 的挑選上如果是固定的,那不管執行幾次都是一樣的結果
+ 更改過後的程式如下
```cpp
static int MLCG(int seed) {
int a = 15997, m = 65536, c = 0;
return (a*seed + c)%m;
}
int main(int argc, char **argv) {
srand(time(NULL));
size_t count = 20;
int previousMLCG = random() % 1024;
node_t *list = NULL;
while (count--) {
previousMLCG = MLCG(previousMLCG);
list = list_make_node_t(list, previousMLCG);
}
}
```
程式執行結果為
```
$ ./a.out
1614245903(time(NULL) 輸出結果)
NOT IN ORDER : 36439 13411 21599 18699 5159 43635 12207 58011 30967 43907 45567 28459 3271 51091 43855 65211 8087 36515 60319 41291
IN ORDER : 3271 5159 8087 12207 13411 18699 21599 28459 30967 36439 36515 41291 43635 43855 43907 45567 51091 58011 60319 65211
$ ./a.out
1614245904(time(NULL) 輸出結果)
NOT IN ORDER : 36439 13411 21599 18699 5159 43635 12207 58011 30967 43907 45567 28459 3271 51091 43855 65211 8087 36515 60319 41291
IN ORDER : 3271 5159 8087 12207 13411 18699 21599 28459 30967 36439 36515 41291 43635 43855 43907 45567 51091 58011 60319 65211
```
結果驚訝的發現輸出亂數的部份竟是如此的相似,於是反過頭檢查 `time(NULL)` 輸出結果只相差 1。
根據 [time(2) man page](https://linux.die.net/man/2/time)
> time() returns the time as the number of seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC).
> If t is non-NULL, the return value is also stored in the memory pointed to by t.
因為 time() 精度只有到 seconds 所以會導致輸出結果太接近
既然使用時間是不可靠的,於是我就嘗試去取一些常常變動的數值,受到 [@linD026](https://hackmd.io/@linD026/2021q1_quiz1) 所啟發,可以用地址去當作 srand 的參數。`malloc` 出來的地址經過實驗時常變動,所以很適合拿來當作亂數種子
### ASLR(Address space layout randomization)
+ [ASLR](https://en.wikipedia.org/wiki/Address_space_layout_randomization) 是一種電腦資訊安全的技巧,避免 [memory corruption](https://en.wikipedia.org/wiki/Memory_corruption) 等各種資安問題。ASLR 機制隨機分配 process 的地址(stack, heap, libraries),使得地址變得難以預測
> 要注意 ASLR 僅是一種針對攻擊的 mitigation,並不能避免資安問題
> [name=HexRabbit][color=purple]
+ [return-to-libc attacks](https://en.wikipedia.org/wiki/Return-to-libc_attack) 一般應用於 buffer overflow attack 中; 攻擊手法為將 stack 中的返回地址替換為其他地址。使用 ASLR 可以讓 stack 的地址變得不可預測,使得攻擊變得更加困難
> ASLR 0: 關閉
> ASLR 1: Conservative Randomization
> ASLR 2: Full Randomization
+ [Position-independent executable(PIE)](https://zh.wikipedia.org/wiki/%E5%9C%B0%E5%9D%80%E6%97%A0%E5%85%B3%E4%BB%A3%E7%A0%81) 又稱作 PIC,通常搭配 ASLR 使用,達到 [address space layout randomization](https://en.wikipedia.org/wiki/Address_space_layout_randomization)
考慮以下實驗結果
```shell
$ gcc -no-pie a.c
$ ./a.out &
$ cat /proc/PID/maps
```
```
ASLR 1, no PIE
00400000-00401000 r--p 00000000 08:10 40308 /home/ambersun/ASLR/a.out
00401000-00402000 r-xp 00001000 08:10 40308 /home/ambersun/ASLR/a.out
00402000-00403000 r--p 00002000 08:10 40308 /home/ambersun/ASLR/a.out
00403000-00404000 r--p 00002000 08:10 40308 /home/ambersun/ASLR/a.out
00404000-00405000 rw-p 00003000 08:10 40308 /home/ambersun/ASLR/a.out
00405000-00426000 rw-p 00000000 00:00 0 [heap]
7f425fac9000-7f425faee000 r--p 00000000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f425faee000-7f425fc66000 r-xp 00025000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f425fc66000-7f425fcb0000 r--p 0019d000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f425fcb0000-7f425fcb1000 ---p 001e7000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f425fcb1000-7f425fcb4000 r--p 001e7000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f425fcb4000-7f425fcb7000 rw-p 001ea000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f425fcb7000-7f425fcbd000 rw-p 00000000 00:00 0
7f425fcc5000-7f425fcc6000 r--p 00000000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f425fcc6000-7f425fce9000 r-xp 00001000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f425fce9000-7f425fcf1000 r--p 00024000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f425fcf2000-7f425fcf3000 r--p 0002c000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f425fcf3000-7f425fcf4000 rw-p 0002d000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f425fcf4000-7f425fcf5000 rw-p 00000000 00:00 0
7ffe726da000-7ffe726fb000 rw-p 00000000 00:00 0 [stack]
7ffe7271c000-7ffe7271f000 r--p 00000000 00:00 0 [vvar]
7ffe7271f000-7ffe72721000 r-xp 00000000 00:00 0 [vdso]
```
```
ASLR 1, no PIE
00400000-00401000 r--p 00000000 08:10 40308 /home/ambersun/ASLR/a.out
00401000-00402000 r-xp 00001000 08:10 40308 /home/ambersun/ASLR/a.out
00402000-00403000 r--p 00002000 08:10 40308 /home/ambersun/ASLR/a.out
00403000-00404000 r--p 00002000 08:10 40308 /home/ambersun/ASLR/a.out
00404000-00405000 rw-p 00003000 08:10 40308 /home/ambersun/ASLR/a.out
00405000-00426000 rw-p 00000000 00:00 0 [heap]
7f0600b54000-7f0600b79000 r--p 00000000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f0600b79000-7f0600cf1000 r-xp 00025000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f0600cf1000-7f0600d3b000 r--p 0019d000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f0600d3b000-7f0600d3c000 ---p 001e7000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f0600d3c000-7f0600d3f000 r--p 001e7000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f0600d3f000-7f0600d42000 rw-p 001ea000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f0600d42000-7f0600d48000 rw-p 00000000 00:00 0
7f0600d50000-7f0600d51000 r--p 00000000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f0600d51000-7f0600d74000 r-xp 00001000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f0600d74000-7f0600d7c000 r--p 00024000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f0600d7d000-7f0600d7e000 r--p 0002c000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f0600d7e000-7f0600d7f000 rw-p 0002d000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f0600d7f000-7f0600d80000 rw-p 00000000 00:00 0
7ffff7f2b000-7ffff7f4c000 rw-p 00000000 00:00 0 [stack]
7ffff7f93000-7ffff7f96000 r--p 00000000 00:00 0 [vvar]
7ffff7f96000-7ffff7f98000 r-xp 00000000 00:00 0 [vdso]
```
可以看到 heap 的位置,兩者是一樣的
`00405000-00426000`
> c.f 64 位元下,no PIE 起點為 `0x00400000`
> c.f 32 位元下,no PIE 起點為 `0x08048000`
gcc 預設 PIE 是開啟的,需使用 `-no-pie` 關閉,參見 [gcc/defaults.h](https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/defaults.h#L1233)
> [关于 Linux 下 ASLR 与 PIE 的一些理解](https://www.cnblogs.com/rec0rd/p/7646857.html)
參考 [Performance and Entropy of Various ASLR Implementations](http://pages.cs.wisc.edu/~riccardo/736finalpaper.pdf),分析 ASLR 的 entropy,考慮 stack pointer 的位置
考慮以下程式碼
```cpp
static volatile void* getsp(void)
{
volatile void *sp;
__asm__ __volatile__ ("movq %%rsp,%0" : "=r" (sp) : /* No input */);
return sp;
}
int main(int argc, const char *argv[]) {
printf("sp: %p\n", getsp());
while (1);
return 0;
}
```
得到以下執行結果
```
sp: 0x7ffeb6b5bd10
```
對照 `/proc/PID/maps`
```
00400000-00401000 r--p 00000000 08:10 40306 /home/ambersun/ASLR/a.out
00401000-00402000 r-xp 00001000 08:10 40306 /home/ambersun/ASLR/a.out
00402000-00403000 r--p 00002000 08:10 40306 /home/ambersun/ASLR/a.out
00403000-00404000 r--p 00002000 08:10 40306 /home/ambersun/ASLR/a.out
00404000-00405000 rw-p 00003000 08:10 40306 /home/ambersun/ASLR/a.out
00405000-00426000 rw-p 00000000 00:00 0 [heap]
7f14bdaaf000-7f14bdad4000 r--p 00000000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f14bdad4000-7f14bdc4c000 r-xp 00025000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f14bdc4c000-7f14bdc96000 r--p 0019d000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f14bdc96000-7f14bdc97000 ---p 001e7000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f14bdc97000-7f14bdc9a000 r--p 001e7000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f14bdc9a000-7f14bdc9d000 rw-p 001ea000 08:10 37132 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f14bdc9d000-7f14bdca3000 rw-p 00000000 00:00 0
7f14bdcab000-7f14bdcac000 r--p 00000000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f14bdcac000-7f14bdccf000 r-xp 00001000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f14bdccf000-7f14bdcd7000 r--p 00024000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f14bdcd8000-7f14bdcd9000 r--p 0002c000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f14bdcd9000-7f14bdcda000 rw-p 0002d000 08:10 36843 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f14bdcda000-7f14bdcdb000 rw-p 00000000 00:00 0
7ffeb6b3c000-7ffeb6b5d000 rw-p 00000000 00:00 0 [stack]
7ffeb6bd4000-7ffeb6bd7000 r--p 00000000 00:00 0 [vvar]
7ffeb6bd7000-7ffeb6bd9000 r-xp 00000000 00:00 0 [vdso]
```
不難發現 sp 的位置有一點誤差,我想這也是可接受的範圍內(`0x7ffeb6b5d000`, `0x7ffeb6b5bd10`)
值得注意的是,上述 16 進位輸出共有 `48` 位元,而不是 64 位元
可參考 [why virtual address are 48 bits not 64 bits? [duplicate]
](https://stackoverflow.com/questions/63975447/why-virtual-address-are-48-bits-not-64-bits)
多次實驗觀察 stack pointer 可得以下結果
```
sp: 0x7fff296fa310
sp: 0x7fffd5825950
sp: 0x7ffc725181d0
sp: 0x7ffe6da0dfc0
sp: 0x7ffcd8fc69d0
...
```
可以發現到這些 virtual address 的 MSB 部分都是相同,而最後 4 bits 都是 0。這也就是為甚麼論文中實驗並不採用全部位元的原因
> For Debian, we observed 30 bits of entropy in the stack. This was bits
4 to 34 (least significant to most significant)
> 有興趣的話也可以觀察看看 libc 或相關 library 的地址在開啟 ASLR 下有多少 bits 的 entropy
> [name=HexRabbit][color=purple]
考慮以下測試程式碼
- [ ] main.c
```cpp
int main(int argc, const char *argv[]) {
if (fork() == 0) {
char *str[] = {"./test", NULL, NULL};
char *envp[] = {0};
if (execve("./test", str, envp) < 0) {
perror("error");
exit(0);
}
}
return 0;
}
```
- [ ] test.c
```cpp
int main(int argc, const char *argv[]) {
register void *p asm("sp");
printf("%p\n", p);
return 0;
}
```
為了驗證 ASLR 的 entropy,我參照論文設計了一個簡單的實驗,內容是取得 stack pointer 的位置去分析(搭配 execve 以及 fork)
根據 [man execve](https://man7.org/linux/man-pages/man2/execve.2.html)
> execve() executes the program referred to by pathname. This causes the program that is currently being run by the calling process to be replaced with a new program, with newly initialized stack, heap, and (initialized and uninitialized) data segments.
execve 會重新 new stack, heap 以及 data 區塊,藉由分析 stack pointer 我們可以知道 ASLR 的 entropy。這次實驗總共測試 1000000(一百萬) 次,分別在 32 位元(`raspberry pi zero wh(rapbian)`),以及 64 位元(`ubuntu 20.04 LTS`) 下測試
完整程式碼可參考 [linux2021q1_quiz1/ASLR](https://github.com/ambersun1234/linuxkernel_internals/tree/master/2021q1_quiz1/ASLR)
### 64 位元
```
地址: 重複次數
...
0x7ffeb9c5d880: 2
0x7fff0baab970: 2
0x7ffeada07720: 2
0x7ffdcce3da80: 2
0x7ffe8c644020: 2
0x7ffec6822460: 2
0x7ffe54d41c80: 2
0x7fffe12b40a0: 2
0x7ffda39f9960: 2
0x7fff7dabd5a0: 2
0x7ffcdb205a70: 3
```
總共統計結果
```
重複次數: 有多少地址重複n次
1: 999083
2: 457
3: 1
```
在1百萬重複次數中,僅有 917(457\*2+1\*3) 個地址重複到
### 32 位元
```
地址: 重複次數
...
0xbea55dd8: 544
0xbec38dd8: 545
0xbeb92dd8: 545
0xbeb4fdd8: 545
0xbeb1ddd8: 545
0xbea92dd8: 547
0xbee5cdd8: 549
0xbeaf1dd8: 551
0xbe9dedd8: 553
0xbe931dd8: 556
0xbeec9dd8: 559
0xbe927dd8: 566
0xbeb1fdd8: 570
0xbe992dd8: 572
```
可以看到在 32 位元架構下,重複機率很高,意味著攻擊者相較於 64 位元架構中,更容易猜到地址。
更改過的程式碼如下:
```cpp
int main(int argc, char **argv) {
int *tmp = (int *)malloc(sizeof(int));
int a = (intptr_t)tmp;
srand(a);
free(tmp);
size_t count = 20;
node_t *list = NULL;
while (count--) {
list = list_make_node_t(list, random() % 1024);
}
```
首先 malloc 一個空間出來,將其地址轉為儲存起來,使用 `intptr_t` 強制轉型(將 pointer 轉型為 integer),並且使用其作為亂數種子,使用完之後當然要把他 free 掉避免 memory leak。
:::warning
TODO: 回顧 [「Linux 核心設計」 課程說明 (2021年春季)](https://docs.google.com/presentation/d/1bJFwpg20GCJmOcdOt6NRYGU3HKVX-Dt0fsuohqTY8x4/edit?usp=sharing) 第 19 頁,嘗試學習早期 mimalloc 運用 ASLR 的手法。
:notes: jserv
:::
考慮 ASLR 以及 PIE 可以將程式改寫如下,參考 [mimalloc/random.c](https://github.com/microsoft/mimalloc/blob/92ead2d88061dde1264800b389b744ac1b79cf39/src/random.c#L255)
```cpp
int main(int argc, char **argv) {
srand((uintptr_t)&main);
size_t count = 20;
node_t *list = NULL;
while (count--) {list = list_make_node_t(list, random() % 1024);
}
```
執行結果如下
```
$ ./a.out
568087200
NOT IN ORDER : 537 827 166 485 66 417 108 305 462 893 98 688 814 861 422 513 477 327 237 170
IN ORDER : 66 98 108 166 170 237 305 327 417 422 462 477 485 513 537 688 814 827 861 893
$ ./a.out
-1303850336
NOT IN ORDER : 111 782 625 900 491 356 273 901 803 125 278 511 612 108 51 450 278 473 436 495
IN ORDER : 51 108 111 125 273 278 278 356 436 450 473 491 495 511 612 625 782 803 900 901
```
---
## 參照 [Optimized QuickSort — C Implementation (Non-Recursive)](https://alienryderflex.com/quicksort/) 並改寫 quick sort
文章中所實作的程式碼如下
```cpp
bool quickSort(int *arr, int elements) {
#define MAX_LEVELS 1000
int piv, beg[MAX_LEVELS], end[MAX_LEVELS], i=0, L, R ;
beg[0]=0;
end[0]=elements;
while (i>=0) {
L=beg[i];
R=end[i]-1;
if (L<R) {
piv=arr[L];
if (i==MAX_LEVELS-1) return NO;
while (L<R) {
while (arr[R]>=piv && L<R) R--;
if (L<R) arr[L++]=arr[R];
while (arr[L]<=piv && L<R) L++;
if (L<R) arr[R--]=arr[L];
}
arr[L]=piv; beg[i+1]=L+1; end[i+1]=end[i]; end[i++]=L;
}
else i--;
}
return YES;
}
```
+ 基本上的思路是,由於 function call 是耗費時間的,因此該實作避免 call function(recursion) 進而減少時間成本。另外,相較於遞迴版的程式多了 `beg[], end[]` 陣列的使用,因為遞迴可以使用 private array 簡化。文中提到 private array 相比 real array 來說慢得多,
> Recursive functions look neat and simple in part because they don’t have to have big arrays like my beg[] and end[]. But all they’re really doing is using the stack as their own private array. This is much slower than using a real array, and could cause stack overflow on some systems, crashing the app that called the quicksort.
+ 在計算機組織中我們有學到,function call 就是把目前暫存器內容處存起來,然後跳到函式位置進行執行,等到返回時,將原先暫存器內容放回,所以單純以變數的角度去看不會比較慢,唯一可能影響性能的只有在 push/pop stack 當中而已
+ 為了程式撰寫的方便性,在 node 的定義上新增一個 previous
```c
typedef struct __node {
int value;
struct __node *next;
struct __node *previous;
} node_t;
```
+ 參照以上程式碼改寫出的非遞迴版 quick sort 如下
```c
void quicksort_iterative(node_t *list, node_t *tail, int count) {
#define MAX_LEVELS 100
int i = 0;
node_t *tl, *tr;
node_t *beg[MAX_LEVELS];
node_t *end[MAX_LEVELS];
beg[0] = list;
end[0] = tail;
while(i >= 0) {
tl = beg[i];
tr = end[i];
if (tl != tr && tl && tr) {
int pivot = tl->value;
if (i == count - 1) return;
while(tl != tr && tl && tr) {
while (tr->value >= pivot && tl != tr) tr = tr->previous;
if (tl != tr) {
tl->value = tr->value;
tl = tl->next;
}
while (tl->value <= pivot && tl != tr) tl = tl->next;
if (tl != tr) {
tr->value = tl->value;
tr = tr->previous;
}
}
tl->value = pivot;
beg[i + 1] = tl->next;
end[i + 1] = end[i];
end[i] = tl;
i++;
}
else i--;
}
return;
}
```
:::warning
TODO:
* beg[MAX_LEVELS] 和 end[MAX_LEVELS] 能否只用其中一者呢?
* 及早排除 list 只包含一個節點的狀況
* 對 `MAX_LEVELS` 的範圍進行數學分析
:notes: jserv
:::
## 探討 [linux-list](https://github.com/sysprog21/linux-list/blob/master/examples/quick-sort.c) 與 [Linux list](https://github.com/torvalds/linux/blob/master/include/linux/list.h) 的差別
+ 最明顯的差別莫過於 sysprog21 的 list 實作是 doubly-linked list 而 Linux 本身的實作是 `circular linked list`
+ 接下來考慮 `list_add` 的實作
```c
// linux
static inline void __list_add(struct list_head *new,
struct list_head *prev,
struct list_head *next)
{
if (!__list_add_valid(new, prev, next))
return;
next->prev = new;
new->next = next;
new->prev = prev;
WRITE_ONCE(prev->next, new);
}
---
// sysprog21
static inline void list_add(struct list_head *node, struct list_head *head)
{
struct list_head *next = head->next;
next->prev = node;
node->next = next;
node->prev = head;
head->next = node;
}
```
我們可以發現,對於 `head->next = new` 這行的做法不一樣,差別在於 linux 使用了 `WRITE_ONCE` macro
+ [WRITE_ONCE](https://github.com/torvalds/linux/blob/614cb5894306cfa2c7d9b6168182876ff5948735/tools/include/linux/compiler.h#L184) 的功用,根據註解
> Prevent the compiler from merging or refetching reads or writes. The
> compiler is also forbidden from reordering successive instances of
> READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some
> particular ordering. One way to make the compiler aware of ordering is to
> put the two invocations of READ_ONCE or WRITE_ONCE in different C
> statements.
>
> These two macros will also work on aggregate data types like structs or
> unions. If the size of the accessed data type exceeds the word size of
> the machine (e.g., 32 bits or 64 bits) READ_ONCE() and WRITE_ONCE() will
> fall back to memcpy and print a compile-time warning.
> Their two major use cases are: (1) Mediating communication between
> process-level code and irq/NMI handlers, all running on the same CPU,
> and (2) Ensuring that the compiler does not fold, spindle, or otherwise
> mutilate accesses that either do not require ordering or that interact
> with an explicit memory barrier or atomic instruction that provides the
> required ordering.
就是為了避免 compiler 做優化這件事,比如說,在不影響程式正確性下互調程式碼,讓 `寫入資料` 與 `讀取資料` 互換。在多執行緒的架構下,同一時間可能有多個 process 對同一資料進行存取,可能導致 data race。可參考 [LINUX KERNEL MEMORY BARRIERS](https://www.kernel.org/doc/Documentation/memory-barriers.txt)
```c
static __always_inline void __write_once_size(volatile void *p, void *res, int size)
{
switch (size) {
case 1: *(volatile __u8_alias_t *) p = *(__u8_alias_t *) res; break;
case 2: *(volatile __u16_alias_t *) p = *(__u16_alias_t *) res; break;
case 4: *(volatile __u32_alias_t *) p = *(__u32_alias_t *) res; break;
case 8: *(volatile __u64_alias_t *) p = *(__u64_alias_t *) res; break;
default:
barrier();
__builtin_memcpy((void *)p, (const void *)res, size);
barrier();
}
}
#define WRITE_ONCE(x, val) \
({ \
union { typeof(x) __val; char __c[1]; } __u = \
{ .__val = (val) }; \
__write_once_size(&(x), __u.__c, sizeof(x)); \
__u.__val; \
})
```
[volatile](https://zh.wikipedia.org/wiki/Volatile%E5%8F%98%E9%87%8F) 關鍵字可以避免存取變數被過度優化(使用暫存器資料),因為變數有可能在另一個執行緒中被改變,導致資料不一致的行為發生。
:::danger
上述關於 volatile 關鍵字的考量不充分,請閱讀規格書和思考 embedded 的場景。
:notes: jserv
:::
<hr>
那為甚麼又只有 `WRITE_ONCE(prev->next, new);` 需要 WRITE_ONCE 呢? 根據 [include/linux/list.h 7f5f873 commit message](https://github.com/torvalds/linux/commit/7f5f873c6a0772970d5fee1f364231207051ecd8) 中的線索
> Unfortunately, the compiler is within its rights to (for example) use
> byte-at-a-time writes to update the pointer, which would fatally confuse
> concurrent readers
由上可知,compiler 有可能會把 64 位元的資料分成兩個 32 位元進行讀取,導致資料未更新完全的情況下被存取,為了達成 atomic 所以需要 WRITE_ONCE 的協助,可參考 [Load/Store tearing in this day and age?](https://lwn.net/Articles/793895/)
總結來說,sysprog21 的 list 版本簡化了多執行緒下的必要保護措施。
> [WRITE_ONCE in linux kernel lists](https://stackoverflow.com/questions/34988277/write-once-in-linux-kernel-lists)
> [Linux内核中的READ_ONCE和WRITE_ONCE宏](https://zhuanlan.zhihu.com/p/344256943)
## [A COMPARATIVE STUDY OF LINKED LIST SORTING ALGORITHMS](https://pages.mtu.edu/~shene/PUBLICATIONS/1996/3Conline.pdf)
+
###### tags: `linux2021`