2021q1 Homework1 (quiz1)

contributed by < linD026 >

tags: `linux2021`

2021 年第 1 週隨堂測驗題目

Reviewed by `jserv`

儘管已列出 glibc 的實作程式碼，但缺乏針對 Pseudorandom number generator 的原理討論，應涵蓋其數學定義和評估標準，從而說明 glibc 的考量和潛在風險
在 Tail call 一節已說明 TCO 的效益，但編譯器具體做了什麼事情呢？請解說
避免說「我想像中的 PRNG」一類的話語，這無助於讀者理解
A Comparative Study Of Linked List Sorting Algorithms 論文提及 tree sort 效能表現，在 linked list 場景優於 quick sort 和 merge sort 一類的排序演算法，但為何在 Linux 核心的 list_sort 仍採用修改過的 merge sort，能否推敲和列出考量？

0 進度

1 解釋程式運作原理

結構定義

此 linked list 結構以 node_t ( aka struct __node ) 構成單向鏈結串列。

typedef struct __node {                   
    int value;
    struct __node *next;
} node_t;

函式說明

1. list_add_node_t

static inline void list_add_node_t(node_t **list, node_t *node_t) {
    node_t->next = *list;
    *list = node_t;
}

此函式在 *list 節點前增加 node_t 節點，並讓 *list 重新指向 *list 鏈結串列最前端。
實作手法可以從傳遞參數的型態看出來，被加入的節點以指標的指標來保存，因此在 node_t->next = *list; 中可以看到 list 需要 dereference 出來被保存的節點。

Reference

你所不知道的C語言: linked list 和非連續記憶體操作 - remove_list_node

額外閱讀

ISO/IEC 9899:2018 N2310 (C2x)
- 6.7.4 inline 相關說明

2. list_concat

static inline void list_concat(node_t **left, node_t *right) {
    while (*left)
        left = &((*left)->next);
    *left = right;
}

此函式先找出 *left linked list 末端節點的 struct __node next; ，之後和 right 串聯。
此函式利用指標的指標來簡化頭端是 NULL 以及因單指標的 scope 而需回傳值的額外處理。
1

3. quicksort

























void quicksort(node_t **list)
{
    if (!*list)
        return;

    node_t *pivot = *list;
    int value = pivot->value;
    node_t *p = pivot->next;
    pivot->next = NULL;

    node_t *left = NULL, *right = NULL;
    while (p) {
        node_t *n = p;
        p = p->next;
        list_add_node_t(n->value > value ? &right : &left, n);
    }

    quicksort(&left);
    quicksort(&right);

    node_t *result = NULL;
    list_concat(&result, left);
    list_concat(&result, pivot); list_concat(&result, right);
    *list = result;
}

把 pivot 指向 *list 的最初節點，與原先 list 單獨獨立出來後，使其做為區分*list 成 left 和 right list 的依據。

注意，能讓pivot獨立是因為 pivot 指向的值為 *list 最前端，因此無任何節點指向它，外加上指定 struct __node next; NULL 。


    node_t *pivot = *list;


    pivot->next = NULL;

把 *list 中小於pivot的利用 list_add_node_t 加入 left ，反之right 。隨後再分別進行 quicksort 遞迴。
而因為 quicksort 中是以 left 優先遞迴，較小的值會先排序完成，再加上 list_concat 會把 right 取代調 *left 的第一個出現的 NULL ，因此節點之間 NULL 可以規避，不會產生斷點。


    quicksort(&left);
    quicksort(&right);


    while (*left)
        left = &((*left)->next);

最終利用 list_concat 依序把 left 、 pivot 、 right 串聯即完成。
EXAMPLE :

會得到 NULL > 1 > 2 > 3 > NULL > 5 > 6 > 7 > NULL => 1 > 2 > 3 > 5 > 6 > 7 的排序。

Tail call

閱讀 Tail Recursion，確認上述 quick sort 實作是否能透過 tail call 最佳化來改進效能？

簡單說 tail call 可以在經由最佳化後達到盡可能的使用重複的 stack 空間，簡化操作以增進執行效能與空間使用率。

void quicksort_tco(node_t **list) {
    if (!*list)
        return;

    node_t *pivot = *list;
    int value = pivot->value;
    node_t *p = pivot->next;
    pivot->next = NULL;

    node_t *left = NULL, *right = NULL;
    while (p) {
        node_t *n = p;
        p = p->next;
        list_add_node_t(n->value > value ? &right : &left, n);
    }

    node_t *result = NULL;
    if (!left && right) {
        list_concat(&result, pivot);    
        quicksort_tco(&right);
        list_concat(&result, right);
    } else if (left && right) {
        quicksort_tco(&left);
        list_concat(&result, left);
        list_concat(&result, pivot);
        quicksort_tco(&right);
        list_concat(&result, right);
    } else if (left && !right) {
        quicksort_tco(&left);
        list_concat(&result, left);
        list_concat(&result, pivot); 
    } else {
        list_concat(&result, pivot); 
    }
    *list = result;
}

對此我進行比較：

可看出迴圈版本自製的 stack ，其效能無法與被編譯器最佳化過的其他兩個版本相比。

#define time_diff(start, end) (end.tv_nsec - start.tv_nsec < 0 ? (1000000000 + end.tv_nsec - start.tv_nsec): (end.tv_nsec - start.tv_nsec) )
#define check(list) do{\
  if (list_is_ordered(list)) printf("INORDER CHECK\n");\
  else printf("NOT INORDER CHECK\n");\
} while(0)

#define times 100
#define nodes 10000
int main(int argc, char **argv) {
  FILE *ptr_l = NULL, *ptr_lco = NULL, *ptr_it = NULL;
  ptr_l = fopen("list_re.txt", "w"); if(!ptr_l) return 0;
  ptr_it = fopen("list_it.txt", "w"); if(!ptr_it) return 0;
  ptr_lco = fopen("list_lco.txt", "w"); if(!ptr_lco){ return 0;} 

  size_t count = nodes;
  node_t *list = NULL;
  node_t *list_it = NULL;
  node_t *list_tco = NULL;

  unsigned int seed = (uintptr_t)*argv;

  time_t current_time;
  srandom(seed & time(&current_time));

  while (count--) {
    int value = self_random(random() % 1024, random() % 1024);
    list = list_make_node_t(list, value);
    list_it = list_make_node_t(list_it, value);
    list_tco = list_make_node_t(list_tco, value);
  }
  struct timespec time_start;
  struct timespec time_end;
  double during;
  int time_i = 0;
  for (time_i = 0;time_i < times;time_i++) {
    printf("%d\n", time_i);
    if (time_i != 0) set_rand(list, list_it, list_tco);

    clock_gettime(CLOCK_MONOTONIC, &time_start);
    quicksort(&list);
    clock_gettime(CLOCK_MONOTONIC, &time_end);
    during = time_diff(time_start, time_end);
    fprintf(ptr_l, "%d %f\n" , time_i, during);

    clock_gettime(CLOCK_MONOTONIC, &time_start);
    quicksort_iterative(&list_it);
    clock_gettime(CLOCK_MONOTONIC, &time_end);
    during = time_diff(time_start, time_end);
    fprintf(ptr_it, "%d %f\n" , time_i, during);

    clock_gettime(CLOCK_MONOTONIC, &time_start);
    quicksort_tco(&list_tco);
    clock_gettime(CLOCK_MONOTONIC, &time_end);
    during = time_diff(time_start, time_end);
    //check(list_tco);
    fprintf(ptr_lco, "%d %f\n" , time_i, during);  
  }

  fclose(ptr_l);
  fclose(ptr_it);
  fclose(ptr_lco);
  
  printf("nodes :%d\n", nodes);
  printf("times :%d\n", times);

  // forget list_free
  if (!list_is_ordered(list))
    return EXIT_FAILURE;

  list_free(&list);
  return EXIT_SUCCESS;
}

Reference

random 問題

根據 random(3) - Linux man page

The random() function uses a nonlinear additive feedback random number generator

以及

The srandom() function sets its argument as the seed for a new sequence of pseudo-random integers to be returned by random(). These sequences are repeatable by calling srandom() with the same seed value. If no seed value is provided, the random() function is automatically seeded with a value of 1.

和 glibc random.c 中的註解與定義

  94 /* For each of the currently supported random number generators, we have a
  95    break value on the amount of state information (you need at least this many
  96    bytes of state info to support this random number generator), a degree for
  97    the polynomial (actually a trinomial) that the R.N.G. is based on, and
  98    separation between the two lower order coefficients of the trinomial.  */
 100 /* Linear congruential.  */
 101 #define TYPE_0          0
 102 #define BREAK_0         8
 103 #define DEG_0           0
 104 #define SEP_0           0
 105 
 106 /* x**7 + x**3 + 1.  */
 107 #define TYPE_1          1
 108 #define BREAK_1         32
 109 #define DEG_1           7
 110 #define SEP_1           3
 111 
 112 /* x**15 + x + 1.  */
 113 #define TYPE_2          2
 114 #define BREAK_2         64
 115 #define DEG_2           15
 116 #define SEP_2           1
 117 
 118 /* x**31 + x**3 + 1.  */
 119 #define TYPE_3          3
 120 #define BREAK_3         128
 121 #define DEG_3           31
 122 #define SEP_3           3
 123 
 124 /* x**63 + x + 1.  */
 125 #define TYPE_4          4
 126 #define BREAK_4         256
 127 #define DEG_4           63
 128 #define SEP_4           1
 
 131 /* Array versions of the above information to make code run faster.
 132    Relies on fact that TYPE_i == i.  */
 133 
 134 #define MAX_TYPES       5       /* Max number of types above.  */

可以看出 random 會因為我們擁有的 state information 大小 ( break value ) 而有不同數學式。

/*67    By default, the package runs with 128 bytes of state
  68    information and generates far better random numbers than a linear
  69    congruential generator.  If the amount of state information is less than
  70    32 bytes, a simple linear congruential R.N.G. is used.*/

可以從這裡得證。

/*73    Thus, 32 bytes of
  74    state information will give 7 longs worth of state information, which will
  75    allow a degree seven polynomial.  (Note: The zeroth word of state
  76    information also has some other information stored in it; see setstate
  77    for details).*/

然而 state information 也可以用其他函式替換：

/*65    The state can be switched by
  66    calling the setstate() function with the same array as was initialized
  67    with initstate().*/

再來可以從下註解知道我們所擁有的資訊是 array 型態：

/*70    32 bytes, a simple linear congruential R.N.G. is used.  Internally, the
  71    state information is treated as an array of longs; the zeroth element of
  72    the array is the type of R.N.G. being used (small integer); the remainder
  73    of the array is the state information for the R.N.G.*/

可以從第 71 行看到 state information 的 array 的第一個元素是存放用哪種數學型態。
以下為初始型態：

 137 /* Initially, everything is set up as if from:
 138         initstate(1, randtbl, 128);
 139    Note that this initialization takes advantage of the fact that srandom
 140    advances the front and rear pointers 10*rand_deg times, and hence the
 141    rear pointer which starts at 0 will also end up at zero; thus the zeroth
 142    element of the state information, which contains info about the current
 143    position of the rear pointer is just
 144         (MAX_TYPES * (rptr - state)) + TYPE_3 == TYPE_3.  */
 145 
 146 static int32_t randtbl[DEG_3 + 1] =
 147   {
 148     TYPE_3,
 149 
 150     -1726662223, 379960547, 1735697613, 1040273694, 1313901226,
 151     1627687941, -179304937, -2073333483, 1780058412, -1989503057,
 152     -615974602, 344556628, 939512070, -1249116260, 1507946756,
 153     -812545463, 154635395, 1388815473, -1926676823, 525320961,
 154     -1009028674, 968117788, -123449607, 1284210865, 435012392,
 155     -2017506339, -911064859, -370259173, 1132637927, 1398500161,
 156     -205601318,
 157   };

 160 static struct random_data unsafe_state =
 161   {
 172     .fptr = &randtbl[SEP_3 + 1],
 173     .rptr = &randtbl[1],
 185     .state = &randtbl[1],
 187     .rand_type = TYPE_3,
 188     .rand_deg = DEG_3,
 189     .rand_sep = SEP_3,
 191     .end_ptr = &randtbl[sizeof (randtbl) / sizeof (randtbl[0])]
 192 };

可以知道 state infomation 的初始化跟 srandom 有關：

 199 /* Initialize the random number generator based on the given seed.  If the
 200    type is the trivial no-state-information type, just remember the seed.
 201    Otherwise, initializes state[] based on the given "seed" via a linear
 202    congruential generator.  Then, the pointers are set to known locations
 203    that are exactly rand_sep places apart.  Lastly, it cycles the state
 204    information a given number of times to get rid of any initial dependencies
 205    introduced by the L.C.R.N.G.  Note that the initialization of randtbl[]
 206    for default usage relies on values produced by this routine.  */
 207 void
 208 __srandom (unsigned int x)
 209 {
 210   __libc_lock_lock (lock);
 211   (void) __srandom_r (x, &unsafe_state);
 212   __libc_lock_unlock (lock);
 213 }

 152 /* Initialize the random number generator based on the given seed.  If the
 153    type is the trivial no-state-information type, just remember the seed.
 154    Otherwise, initializes state[] based on the given "seed" via a linear
 155    congruential generator.*/
 160 int
 161 __srandom_r (unsigned int seed, struct random_data *buf)
 162 {
 163   int type;
 164   int32_t *state;
 165   long int i;
 166   int32_t word;
 167   int32_t *dst;
 168   int kc;
 169 
 170   if (buf == NULL)
 171     goto fail;
 172   type = buf->rand_type;
 173   if ((unsigned int) type >= MAX_TYPES)
 174     goto fail;
 175 
 176   state = buf->state;
 177   /* We must make sure the seed is not 0.  Take arbitrarily 1 in this case.  */
 178   if (seed == 0)
 179     seed = 1;
 180   state[0] = seed;
 181   if (type == TYPE_0)
 182     goto done;
 183 
 184   dst = state;
 185   word = seed;
 186   kc = buf->rand_deg;
 187   for (i = 1; i < kc; ++i)
 188     {
 189       /* This does:
 190            state[i] = (16807 * state[i - 1]) % 2147483647;
 191          but avoids overflowing 31 bits.  */
 192       long int hi = word / 127773;
 193       long int lo = word % 127773;
 194       word = 16807 * lo - 2836 * hi;
 195       if (word < 0)
 196         word += 2147483647;
 197       *++dst = word;
 198     }
 199

seed 如果是 0 時設為 1 。
而 word 會改變 state 所指的元素的數值，利用第 192 到 197 行的操作改變，亦即實現 state[i] = (16807 * state[i - 1]) % 2147483647 ( linear congruential generator )。
因此如果第一個 state 為 0 的話，會變成都是 0 。

        /* Then, the pointers are set to known locations
 156    that are exactly rand_sep places apart.  Lastly, it cycles the state
 157    information a given number of times to get rid of any initial dependencies
 158    introduced by the L.C.R.N.G.  Note that the initialization of randtbl[]
 159    for default usage relies on values produced by this routine.*/
 200   buf->fptr = &state[buf->rand_sep];
 201   buf->rptr = &state[0];
 202   kc *= 10;
 203   while (--kc >= 0)
 204     {
 205       int32_t discard;
 206       (void) __random_r (buf, &discard);
 207     }
 208 
 209  done:
 210   return 0;
 211 
 212  fail:
 213   return -1;
 214 }

最後在利用 ( random的實作，__random_r 的 ) L.C.R.N.G 消除初始值得相依性。

 351 
 352 int
 353 __random_r (struct random_data *buf, int32_t *result)
 354 {
 355   int32_t *state;
 356 
 357   if (buf == NULL || result == NULL)
 358     goto fail;
 359 
 360   state = buf->state;
 361 
 362   if (buf->rand_type == TYPE_0)
 363     {
 364       int32_t val = ((state[0] * 1103515245U) + 12345U) & 0x7fffffff;
 365       state[0] = val;
 366       *result = val;
 367     }
 368   else
 369     {
 370       int32_t *fptr = buf->fptr;
 371       int32_t *rptr = buf->rptr;
 372       int32_t *end_ptr = buf->end_ptr;
 373       uint32_t val;
 374 
 375       val = *fptr += (uint32_t) *rptr;
 376       /* Chucking least random bit.  */
 377       *result = val >> 1;
 378       ++fptr;
 379       if (fptr >= end_ptr)
 380         {
 381           fptr = state;
 382           ++rptr;
 383         }
 384       else
 385         {
 386           ++rptr;
 387           if (rptr >= end_ptr)
 388             rptr = state;
 389         }
 390       buf->fptr = fptr;
 391       buf->rptr = rptr;
 392     }
 393   return 0;
 394 
 395  fail:
 396   __set_errno (EINVAL);
 397   return -1;
 398 }

*result 為最後函式輸出的值，可以看出是由 *fptr 和 *rptr 給出。
可以看到在 __random_r 裡沒有運用一開始設定的數學式，因為這已經在 srandom 設置好了。
所以這也就是為什麼在沒使用 srandom 和 setstate 、initstate 狀態下運用 random 會是有規律的了，因為她所用的 state 值是一開始就設定好的。請往回翻查看 randtbl 的初始設定。

對於如何改變 type ，這要從 initstate 和 setstate 看起。
有了上述概念後，可以直接從 Linux Programmer's Manual 看相關描述：

The initstate() function allows a state array state to be initialized for use by random(). The size of the state array n is used by initstate() to decide how sophisticated a random number generator it should use—the larger the state array, the better the random numbers will be.

The setstate() function changes the state array used by the random() function.

可以看到會因為 initstate() 的傳入的 array 大小 n 來決定要用哪個 type 。
在 initstate 實作 __initstate_r 中也可以看到這個判斷式：

 246   int type;
 247   if (n >= BREAK_3)
 248     type = n < BREAK_4 ? TYPE_3 : TYPE_4;
 249   else if (n < BREAK_1)
 250     {
 251       if (n < BREAK_0)
 252         goto fail;
 253 
 254       type = TYPE_0;
 255     }
 256   else
 257     type = n < BREAK_2 ? TYPE_1 : TYPE_2;

有趣的是在 __initstate_r 已經在 271 行幫你呼叫了 __srandom_r (seed, buf);了。
最後我們也可以從 man page 上看到：

       ┌────────────────────────┬───────────────┬─────────┐
       │Interface               │ Attribute     │ Value   │
       ├────────────────────────┼───────────────┼─────────┤
       │random(), srandom(),    │ Thread safety │ MT-Safe │
       │initstate(), setstate() │               │         │
       └────────────────────────┴───────────────┴─────────┘

這是因為在呼叫這些函式的實作時，都已經有 __libc_lock_lock (lock); 和 __libc_lock_unlock (lock); 了。

包裝 random - LFG

我以用 srandom(time(NULL)); 以及外包另一個 PRNG Lagged Fibonacci generators 來重新取回想要的亂數。

// Lagged Fibonacci generators
// Sn ≡ Sn−j ⋆ Sn−k (mod m), 0 < j < k
int self_random(int seed_f, int seed_s) {
  int sn_1 = 0;
  int sn_2 = 1;
  int sn;
  int i = seed_f;
  while (i > 0) {
    sn = (sn_2 & sn_1) % seed_s;
    sn_1 = sn_1 + 3;
    sn_2 = sn_2 + 7;
    i--;
  }
  return sn;
}

int main(int argc, char **argv) {
  size_t count = 20;
  node_t *list = NULL;

  srandom(time(NULL));
  while (count--) {
    list = list_make_node_t(list, self_random(random() % 1024, random() % 1024));
  }

但還是有問題，如果快速執行此執行檔會因為 seed 是由 time() 去取得，會如先前一樣狀況(重複)。

  time_t current_time;
  srandom(time(&current_time));
  struct tm *timeinfo;
  timeinfo = localtime(&current_time);
  printf("now: %s", asctime(timeinfo));

$ ./test
now: Thu Feb 25 02:13:34 2021
NOT IN ORDER : 12 4 10 7 1 157 874 24 24 357 0 656 16 128 196 128 88 24 22 550
    IN ORDER : 0 1 4 7 10 12 16 22 24 24 24 88 128 128 157 196 357 550 656 874
$ ./test
now: Thu Feb 25 02:13:34 2021
NOT IN ORDER : 12 4 10 7 1 157 874 24 24 357 0 656 16 128 196 128 88 24 22 550
    IN ORDER : 0 1 4 7 10 12 16 22 24 24 24 88 128 128 157 196 357 550 656 874
$ ./test
now: Thu Feb 25 02:13:35 2021
NOT IN ORDER : 16 6 490 204 88 410 180 24 26 16 102 34 2 76 44 4 14 32 128 470
    IN ORDER : 2 4 6 14 16 16 24 26 32 34 44 76 88 102 128 180 204 410 470 490

因此可以說對於 random() 所造成的問題並沒有從根本上解決。
在這之後我又找了 int main(int argc, char **argv) 的 *argv地址來做另一個亂數。

  unsigned int seed = (uintptr_t)*argv;
  time_t current_time;
  srandom(seed & time(&current_time));

$ ./test
now: Thu Feb 25 02:42:30 2021
NOT IN ORDER : 692 544 352 86 332 16 810 64 32 136 72 26 82 276 22 368 24 759 8 24
    IN ORDER : 8 16 22 24 24 26 32 64 72 82 86 136 276 332 352 368 544 692 759 810
$ ./test
now: Thu Feb 25 02:42:31 2021
NOT IN ORDER : 22 2 16 0 4 82 6 18 4 0 528 33 2 78 53 2 290 194 88 585
    IN ORDER : 0 0 2 2 2 4 4 6 16 18 22 33 53 78 82 88 194 290 528 585
$ ./test
now: Thu Feb 25 02:42:31 2021
NOT IN ORDER : 68 82 68 532 4 132 514 216 550 64 182 490 412 614 36 356 587 1 101 264
    IN ORDER : 1 4 36 64 68 68 82 101 132 182 216 264 356 412 490 514 532 550 587 614
$ ./test
now: Thu Feb 25 02:42:31 2021
NOT IN ORDER : 16 20 144 96 32 22 2 0 1 12 50 322 12 272 144 256 304 16 144 68
    IN ORDER : 0 1 2 12 12 16 16 20 22 32 50 68 96 144 144 144 256 272 304 322

這才大致上完成我想像中的 PRNG ，雖然還是有種美中不足的感覺，之後如果有新想法再改吧。

TODO 指出上述 ASLR 運作機制，並探討其 entropy
jserv

ASLR ( Address Space Layout Randomization )

ASLR ( Address Space Layout Randomization )

根據 grsecurity 的 ASLR 文件中 1. Design 對其的描述：

The goal of Address Space Layout Randomization is to introduce randomness into addresses used by a given task. This will make a class of exploit techniques fail with a quantifiable probability and also allow their detection since failed attempts will most likely crash the attacked task.

可以知道是為了避免當程式執行的起始位置過於單調導致容易被攻擊，所做的防護措施。

在 ASLR: How Robust is the Randomness? 論文中也有更好理解的說明：

Address Space Layout Randomization (ASLR)is a class of computer security defense techniques designed to reduce theimpact of buffer-overflow vulnerabilities. It adds a random offset to the virtual memory layout of each program, making itharder for an attacker to predict the target memory address thatthey wish the vulnerable program to return to. If the attackeroverwrites the return address with a bad memory address, theprobability of a successful exploit decreases.

詳細的內容如作業系統的比較可以請見 ASLR: How Robust is the Randomness? 。

在 grsecurity 文件中也有舉例說明讓讀者能更好理解 ASLR ：

To help understand the ideas behind ASLR, let's look at an example task and its address space: we made a copy of /bin/cat into /tmp/cat then disabled all PaX features on it and executed "/tmp/cat /proc/self/maps". The [x] marks are not part of the original output, we use them to refer to the various lines in the explanation (note that the VMMIRROR document contains more examples with various PaX features active).

[1] 08048000-0804a000 R+Xp 00000000 00:0b 812 /tmp/cat
[2] 0804a000-0804b000 RW+p 00002000 00:0b 812 /tmp/cat
[3] 40000000-40015000 R+Xp 00000000 03:07 110818 /lib/ld-2.2.5.so
[4] 40015000-40016000 RW+p 00014000 03:07 110818 /lib/ld-2.2.5.so
[5] 4001e000-40143000 R+Xp 00000000 03:07 106687 /lib/libc-2.2.5.so
[6] 40143000-40149000 RW+p 00125000 03:07 106687 /lib/libc-2.2.5.so
[7] 40149000-4014d000 RW+p 00000000 00:00 0
[8] bfffe000-c0000000 RWXp fffff000 00:00 0

As we can see, /tmp/cat is a dynamically linked ELF executable, its address space contains several file mappings.

[1] and [2] correspond to the loadable ELF segments of /tmp/cat containing code and data (both initialized and uninitialized), respectively.

[3] and [4] represent the dynamic linker whereas [5], [6] and [7] are the segments of the C runtime library ([7] holds its uninitialized data that is big enough to not fit into the last page of [6]).

[8] is the stack which grows downwards.

There are other mappings as well that this simple example does not show us: the brk() managed heap that would directly follow [2] and various anonymous and file mappings that the task can create via mmap() and would be placed between [7] and [8] (unless an explicit mapping address outside this region was requested using the MAP_FIXED flag).

For our purposes all these possible mappings can be split into three groups:
- [1], [2] and the brk() managed heap following them,
- [3]-[7] and all the other mappings created by mmap(),
- [8], the stack.

The mappings in the first and last groups are established during execve() and do not move (only their size can change) whereas the mappings in the second group may come and go during the lifetime of the task. Since the base addresses used to map each group are not related to each other, we can apply different amount of randomization to each. This also has the benefit that whenever a given attack technique needs advance knowledge of addresses from more than group, the attacker will likely have to guess or brute force all entropies at once which further reduces the chances of success.

在 grsecurity 也有詳細對 ASLR 的相關機率描述，詳細請見 ASLR 1. Design ，在此指引入一部分對於 ASLR 搭配 crash detection 所形成保護的內容。

On the other hand however the defense side has quite some control over the value of x: whenever an attack attempt makes a wrong guess on the randomized bits, the attacked application will go into a state that will likely result in a crash and hence becomes detectable by the kernel. It is therefore a good strategy to use a crash detection and reaction mechanism together with ASLR (PaX itself contains no such mechanism).

而可從文件中得知， ASLR 也會造成記憶體碎片化與 entropy pool exhaustion ：

The last set of side effects of ASLR is address space fragmentation and entropy pool exhaustion. Since randomization shifts entire ranges of memory, it will also randomly change the gaps between them (which were constant before). This in turn will change the maximum size of memory mappings that will fit in there and applications expecting to be able to create them will fail. Finally, ASLR increases the consumption of the system's entropy pool since every task creation (through the execve() system call) requires some bits of randomness to determine the new address space layout. Depending on the system's threat model however a given implementation can relax the requirements for the quality of this entropy. In particular, if only remote attacks are considered, then ASLR does not need cryptographically secure random bits as a remote attacker cannot observe them (or if he can, he does not need to care about ASLR at all).

關於實際例子，以我的電腦為例：

$ cat /proc/sys/kernel/randomize_va_space
2
$ sysctl -a --pattern randomize
kernel.randomize_va_space = 2
$ ls
cities.txt  initial-analysis.txt  list.h  merge  merge_sort.c  test.c
$ ldd merge
        linux-vdso.so.1 (0x00007fffc087a000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa969c80000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa969e97000)
$ ldd merge
        linux-vdso.so.1 (0x00007fffdf9fb000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f99359a0000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f9935bb9000)

根據 Linux man page - ldd 顯示：

For each dependency, ldd displays the location of the matching object and the (hexadecimal) address at which it is loaded. (The linux-vdso and ld-linux shared dependencies are special; see vdso(7) and ld.so(8).)

可以看到共享物件的位置不同。
註：kernel.randomize_va_space = 2 為 full Randomization 。

0 = Disabled
1 = Conservative Randomization
2 = Full Randomization

而關於 entropy pool 可以看 Linux kernel source tree - root/drivers/char/random.c 。












/*
 * Computers are very predictable devices.  Hence it is extremely hard
 * to produce truly random numbers on a computer --- as opposed to
 * pseudo-random numbers, which can easily generated by using a
 * algorithm.  Unfortunately, it is very easy for attackers to guess
 * the sequence of pseudo-random number generators, and for some
 * applications this is not acceptable.  So instead, we must try to
 * gather "environmental noise" from the computer's environment, which
 * must be hard for outside attackers to observe, and use that to
 * generate random numbers.  In a Unix environment, this is best done
 * from inside the kernel.
 */

可以從下列引述看到 entropy pool 的輸入為裝置、中斷等環境噪音：






































/*
 * Exported interfaces ---- input
 * ==============================
 *
 * The current exported interfaces for gathering environmental noise
 * from the devices are:
 *
 *	void add_device_randomness(const void *buf, unsigned int size);
 * 	void add_input_randomness(unsigned int type, unsigned int code,
 *                                unsigned int value);
 *	void add_interrupt_randomness(int irq, int irq_flags);
 * 	void add_disk_randomness(struct gendisk *disk);
 *
 * add_device_randomness() is for adding data to the random pool that
 * is likely to differ between two devices (or possibly even per boot).
 * This would be things like MAC addresses or serial numbers, or the
 * read-out of the RTC. This does *not* add any actual entropy to the
 * pool, but it initializes the pool to different values for devices
 * that might otherwise be identical and have very little entropy
 * available to them (particularly common in the embedded world).
 *
 * add_input_randomness() uses the input layer interrupt timing, as well as
 * the event type information from the hardware.
 *
 * add_interrupt_randomness() uses the interrupt timing as random
 * inputs to the entropy pool. Using the cycle counters and the irq source
 * as inputs, it feeds the randomness roughly once a second.
 *
 * add_disk_randomness() uses what amounts to the seek time of block
 * layer request events, on a per-disk_devt basis, as input to the
 * entropy pool. Note that high-speed solid state drives with very low
 * seek times do not make for good sources of entropy, as their seek
 * times are usually fairly consistent.
 *
 * All of these routines try to estimate how many bits of randomness a
 * particular randomness source.  They do this by keeping track of the
 * first and second order deltas of the event timings. 
 */

關於如果是在系統開機時，能夠確保其不可預測性的方式：


























/*
 * Ensuring unpredictability at system startup
 * ============================================
 *
 * When any operating system starts up, it will go through a sequence
 * of actions that are fairly predictable by an adversary, especially
 * if the start-up does not involve interaction with a human operator.
 * This reduces the actual number of bits of unpredictability in the
 * entropy pool below the value in entropy_count.  In order to
 * counteract this effect, it helps to carry information in the
 * entropy pool across shut-downs and start-ups.  To do this, put the
 * following lines an appropriate script which is run during the boot
 * sequence:
 *
 *	echo "Initializing random number generator..."
 *	random_seed=/var/run/random-seed
 *	# Carry a random seed from start-up to start-up
 *	# Load and then save the whole entropy pool
 *	if [ -f $random_seed ]; then
 *		cat $random_seed >/dev/urandom
 *	else
 *		touch $random_seed
 *	fi
 *	chmod 600 $random_seed
 *	dd if=/dev/urandom of=$random_seed count=1 bs=512 
 */

也可以下達 cat /proc/sys/kernel/random/entropy_avail 來看電腦目前儲存的 entropy bit 為多少。 ( below 4096 )

$ cat /proc/sys/kernel/random/entropy_avail
4096

此為 Linux 核心所提供的 RNG ，相關函式可見 random(7) 和 random(4) 的 Linux manual page 。

urandom 和 getrandom

The random number generator gathers environmental noise from device drivers and other sources into an entropy pool. The generator also keeps an estimate of the number of bits of noise in the entropy pool. From this entropy pool, random numbers are created.

When read, the /dev/urandom device returns random bytes using a pseudorandom number generator seeded from the entropy pool. Reads from this device do not block (i.e., the CPU is not yielded), but can incur an appreciable delay when requesting large amounts of data.

When read during early boot time, /dev/urandom may return data prior to the entropy pool being initialized. If this is of concern in your application, use getrandom(2) or /dev/random instead.

筆記 : // linux's rand using cryptographically secure pseudo random number

rand 實作是 LCG
glibc - rand.c

  23 /* Return a random integer between 0 and RAND_MAX.  */
  24 int
  25 rand (void)
  26 {
  27   return (int) __random ();
  28 }

Reference

2 非遞迴呼叫實作

#define MAX 300

void quicksort_iterative(node_t **list) {
  if (!*list)
    return;
    
  node_t *stack[MAX] = {0};
  int top = -1;
  stack[++top] = *list;
  node_t *partition = NULL;
  node_t *result = NULL;

  while (top >= 0) {
    partition = stack[top--];
    if (partition && partition->next != NULL) {
      node_t *pivot = partition;
      int value = pivot->value;
      node_t *p = pivot->next;
      pivot->next = NULL;
      node_t *left = NULL, *right = NULL;
      while (p) {
        node_t *n = p;
        p = p->next;
        list_add_node_t(n->value >= value ? &right : &left, n);
      }
      if (left != NULL)
        list_concat(&left, pivot);
      else
        left = pivot;
      if (right)
        stack[++top] = right;
      if (left)
        stack[++top] = left;
    } else {
      top++;
      while (top >= 0 && stack[top]->next == NULL) {
        node_t *temp = stack[top--];
        list_concat(&result, temp);
      }
    }
  }
  *list = result;
}

註解 :

利用原本 quicksort 的 function stack 資訊在單一函式內重新建構所需的資訊。

  node_t *stack[MAX] = {0};
  int top = -1;
  stack[++top] = *list;
  node_t *partition = NULL;

利用這上列資訊檢測每次外圍迴圈 (仿 stack ) partition = stack[top--]; 的值為單一節點。
若否，則如原先 quicksort 一樣對linked list 處理成 left 、 pivot 、 right 。
- 須注意若 left 或 right 為 NULL 的狀況。
若是，則進入 else 進行 pop stack 直到 top 指到不為單一節點的 item 。
當 stack 空了，即排序完成。

Reference

Optimized QuickSort — C Implementation (Non-Recursive)

3 linux-list 探討並改寫

1. Linux 核心實作

linux-list 問題中有描述到：

Linux 核心內部也有 linked list 實作，但是 circulr doubly-linked list，linux-list 仿效 Linux 核心的實作並予以簡化……

關於 Linux 核心的實作可以從下列網址查閱：

Linux kernel source tree
Linux 原始程式碼 ( GitHub )
- sort.c ( heap sort - function sort_r )
- list_sort.c ( merge sort - function list_sort )

實作方面是以 merge sort 為主。在更早之前好像是看「Linux 核心設計」系列講座的錄影檔有說道是因為 merge sort 是 stable 且 average, best case, worst case 都是

$O (n \log n)$ ，所以才採用 merge sort (stable sort) 而非 worst case

n^{2}

的 quick sort (unstable sort) 。

2. linux-list 的 `list_qsort`

結構定義

此 list_head 結構是由 prev 指標指向前一個節點與 next 指向下一個節點的雙向鏈結串列所構成。

struct list_head {
    struct list_head *prev;
    struct list_head *next;
};

listitem 結構為儲存目標節點的數值 i 以及其來源 list。

struct listitem {
    uint16_t i;
    struct list_head list;
};

函式說明

static void list_qsort(struct list_head *head)
{
    struct list_head list_less, list_greater;
    struct listitem *pivot;
    struct listitem *item = NULL, *is = NULL;

    if (list_empty(head) || list_is_singular(head))
        return;

    INIT_LIST_HEAD(&list_less);
    INIT_LIST_HEAD(&list_greater);

    pivot = list_first_entry(head, struct listitem, list);
    list_del(&pivot->list);

    list_for_each_entry_safe (item, is, head, list) {
        if (cmpint(&item->i, &pivot->i) < 0)
            list_move_tail(&item->list, &list_less);
        else
            list_move(&item->list, &list_greater);
    }

    list_qsort(&list_less);
    list_qsort(&list_greater);

    list_add(&pivot->list, head);
    list_splice(&list_less, head);
    list_splice_tail(&list_greater, head);
}

可以很明確地看到每個動作都被包裝，這很符合一般對於大型專案為了多人維護、開發等考量的認知，每個動作都獨立出來使得在修改底下程式碼時 API 依舊保持一致。

對於一一解釋包裝內容會使篇幅過於冗長，因此在此會先列出各行為內部原始碼後，再以 pseudocode 形式完整表示所有行為。

下列為各被包裝的底部細節，依循 list_qsort 所逐一使用的動作由上而下列出：

INIT_LIST_HEAD

static inline void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

list_first_entry

/**
 * list_entry() - Calculate address of entry that contains list node
 * @node: pointer to list node
 * @type: type of the entry containing the list node
 * @member: name of the list_head member variable in struct @type
 *
 * Return: @type pointer of entry containing node
 */
#define list_entry(node, type, member) container_of(node, type, member)

/**
 * list_first_entry() - get first entry of the list
 * @head: pointer to the head of the list
 * @type: type of the entry containing the list node
 * @member: name of the list_head member variable in struct @type
 *
 * Return: @type pointer of first entry in list
 */
#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)
    
#ifndef container_of
#ifdef __LIST_HAVE_TYPEOF
#define container_of(ptr, type, member)                            \
    __extension__({                                                \
        const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
        (type *) ((char *) __pmember - offsetof(type, member));    \
    })
#else
#define container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) -offsetof(type, member)))
#endif
#endif

list_for_each_entry_safe

/**
 * list_for_each_entry_safe - iterate over list entries and allow deletes
 * @entry: pointer used as iterator
 * @safe: @type pointer used to store info for next entry in list
 * @head: pointer to the head of the list
 * @member: name of the list_head member variable in struct type of @entry
 *
 * The current node (iterator) is allowed to be removed from the list. Any
 * other modifications to the the list will cause undefined behavior.
 *
 * FIXME: remove dependency of __typeof__ extension
 */
#define list_for_each_entry_safe(entry, safe, head, member)                \
    for (entry = list_entry((head)->next, __typeof__(*entry), member),     \
        safe = list_entry(entry->member.next, __typeof__(*entry), member); \
         &entry->member != (head); entry = safe,                           \
        safe = list_entry(safe->member.next, __typeof__(*entry), member))

cmpint

static inline int cmpint(const void *p1, const void *p2)
{
    const uint16_t *i1 = (const uint16_t *) p1;
    const uint16_t *i2 = (const uint16_t *) p2;

    return *i1 - *i2;
}

list_move_tail

/**
 * list_move_tail() - Move a list node to the end of the list
 * @node: pointer to the node
 * @head: pointer to the head of the list
 *
 * The @node is removed from its old position/node and add to the end of @head
 */
static inline void list_move_tail(struct list_head *node,
                                  struct list_head *head)
{
    list_del(node);
    list_add_tail(node, head);
}

list_move

/**
 * list_move() - Move a list node to the beginning of the list
 * @node: pointer to the node
 * @head: pointer to the head of the list
 *
 * The @node is removed from its old position/node and add to the beginning of
 * @head
 */
static inline void list_move(struct list_head *node, struct list_head *head)
{
    list_del(node);
    list_add(node, head);
}

list_add

/**
 * list_add() - Add a list node to the beginning of the list
 * @node: pointer to the new node
 * @head: pointer to the head of the list
 */
static inline void list_add(struct list_head *node, struct list_head *head)
{
    struct list_head *next = head->next;

    next->prev = node;
    node->next = next;
    node->prev = head;
    head->next = node;
}

list_splice & list_splice_tail

/**
 * list_splice() - Add list nodes from a list to beginning of another list
 * @list: pointer to the head of the list with the node entries
 * @head: pointer to the head of the list
 *
 * All nodes from @list are added to to the beginning of the list of @head.
 * It is similar to list_add but for multiple nodes. The @list head is not
 * modified and has to be initialized to be used as a valid list head/node
 * again.
 */
static inline void list_splice(struct list_head *list, struct list_head *head)
{
    struct list_head *head_first = head->next;
    struct list_head *list_first = list->next;
    struct list_head *list_last = list->prev;

    if (list_empty(list))
        return;

    head->next = list_first;
    list_first->prev = head;

    list_last->next = head_first;
    head_first->prev = list_last;
}

/**
 * list_splice_tail() - Add list nodes from a list to end of another list
 * @list: pointer to the head of the list with the node entries
 * @head: pointer to the head of the list
 *
 * All nodes from @list are added to to the end of the list of @head.
 * It is similar to list_add_tail but for multiple nodes. The @list head is not
 * modified and has to be initialized to be used as a valid list head/node
 * again.
 */
static inline void list_splice_tail(struct list_head *list,
                                    struct list_head *head)
{
    struct list_head *head_last = head->prev;
    struct list_head *list_first = list->next;
    struct list_head *list_last = list->prev;

    if (list_empty(list))
        return;

    head->prev = list_last;
    list_last->next = head;

    list_first->prev = head_last;
    head_last->next = list_first;
}

pseudocode

























procdedure list_qsort is
    -- Input:
    --    HEAD
    --       > first node of linked list
    --         here represent the target linked list
    -- Output:
    --    sorted list
    -- Object:
    --    PIVOT
    --       > the first node of HEAD
    --    LESS  
    --       > the linked list of that the value of node is less than PIVOT
    --    GREATER
    --       > the linked list of that the value of node is greater than PIVOT
    if HEAD is empty or singular node then exit;
    initial PIVOT, LESS, GREATER, ITEM;
    assign first node of HEAD to PIVOT and delete it from HEAD;
    forall ( the node of HEAD ) do
        separate into LESS and GREATER;
            > add it to the tail of LESS
            > add it to the beginning of GREATER
    input LESS to list_qsort;
    input GREATER to list_qsort;
    concatenate LESS, PIVOT, GREATER and assign to HEAD;
end; -- list_qsort

3. `list_qsort` 與 `quicksort` 差異

從上述的 list_qsort pseudocode 可以明確的得出，兩個函式行為大同小異，差別在對於一個行為它程式碼的是否被包裝成 object 。

Reference

4. 改寫 list_qsort

若想直接看 4 思考高效率的 linked list 排序演算法所提出的實作請見 Tree sort。

quick sort

#define MAX 512
static void list_qsort_iterative(struct list_head *head) {
    if (list_empty(head) || list_is_singular(head))
        return;    
    
    struct list_head stack[MAX];
    for (int i = 0;i < MAX;i++) INIT_LIST_HEAD(&stack[i]);
    int top = -1;
    list_splice_init(head, &stack[++top]);

    struct list_head partition;
    INIT_LIST_HEAD(&partition);

    while (top >= 0) {
        INIT_LIST_HEAD(&partition);
        list_splice_init(&stack[top--], &partition);
        if (!list_empty(&partition) && !list_is_singular(&partition)) {
        
            struct list_head list_less, list_greater;
            struct listitem *pivot;
            struct listitem *item = NULL, *is = NULL;
            INIT_LIST_HEAD(&list_less);
            INIT_LIST_HEAD(&list_greater);
            pivot = list_first_entry(&partition, struct listitem, list);
            list_del(&pivot->list);
            INIT_LIST_HEAD(&pivot->list);

            list_for_each_entry_safe (item, is, &partition, list) {
                list_del(&item->list);
                if (cmpint(&item->i, &pivot->i) < 0) 
                    list_move(&item->list, &list_less);
                else 
                    list_move(&item->list, &list_greater);
            }
            
            list_move_tail(&pivot->list, &list_less);
            if(!list_empty(&list_greater))
                list_splice_tail(&list_greater, &stack[++top]);
            if(!list_empty(&list_less))
                 list_splice_tail(&list_less, &stack[++top]);
                 
        } else {
            top++;
            list_splice_tail(&partition, &stack[top]);
            while (top >= 0 && list_is_singular(&stack[top])) {
                struct listitem *temp = list_first_entry(&stack[top], struct listitem, list);
                list_del(&temp->list);
                INIT_LIST_HEAD(&stack[top--]);
                list_add_tail(&temp->list, head);
            }
        }
    }
}

$ make
  CC    tests/quick-sort.o
  LD    tests/quick-sort
*** Validating tests/quick-sort ***
        [ Verified ]

註解：

根據問題四所得出得結論，依舊是更改 quick sort 。對此我延續上述的演算法對此進行變更。
對此，舊的演算法雖然沒有新的感想，但在實際運用 object 觀念進行改寫時，真的深刻地感受到對行為進行包裝後，只要了解其運行方式與原理並靈活運用即可。至此在進行維護、擴充或改寫時，能夠更輕便、簡潔。

4 思考高效率的 linked list 排序演算法

A Comparative Study Of Linked List Sorting Algorithms

根據 A Comparative Study Of Linked List Sorting Algorithms 探討六個排序演算法，其中 Doubly Linked List Sorting Algorithms 提到的 tree sort 在文中所給出的數據是最快速。

n D-Bub S-Bub Select Msort Qsort Tree

100 0.22 0.12 0.10 0.08 0.07 0.05

200 0.98 0.54 0.44 0.15 0.13 0.10

300 2.20 1.22 0.76 0.23 0.22 0.19

400 4.18 2.42 1.44 0.32 0.30 0.21

500 6.38 3.74 2.18 0.42 0.37 0.29

600 10.22 6.48 4.06 0.53 0.51 0.40

700 15.38 10.10 6.46 0.69 0.57 0.43

800 21.20 14.82 9.68 0.76 0.69 0.51

900 28.34 20.20 13.62 0.88 0.79 0.61

1000 36.58 26.14 17.88 1.01 0.89 0.69

2000 159 127 93 2.75 2.00 1.38

3000 379 302 220 3.38 3.38 2.88

4000 693 549 401 5.50 4.12 4.12

5000 1104 867 643 6.00 6.88 5.50

6000 1763 1395 1082 9.00 8.88 6.38

7000 3037 2604 2169 12.38 11.00 9.62

8000 4449 3850 3252 13.75 11.62 10.25

9000 5515 4630 3917 16.38 14.38 12.25

10000 6591 5509 4619 19.25 16.50 12.25

table 1 and 2 : Running Time for item n = 100 to 1000 and 2000 to 10000

n	D-Bub	S-Bub	Select	Msort	Qsort	Tree
100	0.22	0.12	0.10	0.08	0.07	0.05
200	0.98	0.54	0.44	0.15	0.13	0.10
300	2.20	1.22	0.76	0.23	0.22	0.19
400	4.18	2.42	1.44	0.32	0.30	0.21
500	6.38	3.74	2.18	0.42	0.37	0.29
600	10.22	6.48	4.06	0.53	0.51	0.40
700	15.38	10.10	6.46	0.69	0.57	0.43
800	21.20	14.82	9.68	0.76	0.69	0.51
900	28.34	20.20	13.62	0.88	0.79	0.61
1000	36.58	26.14	17.88	1.01	0.89	0.69
2000	159	127	93	2.75	2.00	1.38
3000	379	302	220	3.38	3.38	2.88
4000	693	549	401	5.50	4.12	4.12
5000	1104	867	643	6.00	6.88	5.50
6000	1763	1395	1082	9.00	8.88	6.38
7000	3037	2604	2169	12.38	11.00	9.62
8000	4449	3850	3252	13.75	11.62	10.25
9000	5515	4630	3917	16.38	14.38	12.25
10000	6591	5509	4619	19.25	16.50	12.25
table 1 and 2 : Running Time for item n = 100 to 1000 and 2000 to 10000

在文中有提到如何進行 tree sort ：

Therefore, we can use these fields to build a binary search tree and reconstruct a sorted doubly linked list as the binary search tree is traversed with inorder.

雖然文中對於如何建構 binary search tree 沒有詳細闡述，但有列出是如何進行轉換。

static NodePTR head, tail;
void Traverse(NodePTR root)
{
    NodePTR work;
    if (root != NULL) {
    Traverse(root->LEFT);
    work = root->RIGHT;
    APPEND_NODE(root);
    Traverse(work);
    }
}
Figure 2 : Reconstruct a List from a Tree

Figure 2 shows a modified recursive inorder traversal of a binary search tree. Two static variables, head and tail, are set to NULL before calling Traverse(). Function Traverse() receives the current root pointer. If it is not NULL, the left subtree is traversed. Then, the pointer to root's right subtree is saved to work, the root is appended to the end of the doubly linked list with head and tail pointers head and tail, and finally the right subtree pointed to by work is traversed.
Note that the pointer to the right subtree must be saved before the root is appended to the doubly linked list since macro APPEND_NODE destroys prev and next.

至於如何建構 binary search tree 可以看 splay tree 或是參考 Linux 核心裡的 red black tree。

疑問

此疑問已解決。詳細請看錯誤修正。

在此有個疑問，明明都是

O (n \log n)

的排序演算法為何會在 10000 個 items 時 quick sort 會與 tree sort 有 4 秒多差距，甚至和 merge sort 相比會有 7 秒之差呢？

文中也有說到作者的測試環境：

All of these six algorithms were coded in ANSI C and SWAP(), APPEND() and JOIN() are C macros rather than functions except for the sediment sort whose swap function is taken directly from Carraway's paper.2 For those who love C++, these macros and variable parameters can easily be changed to inline functions and aliases, respectively. Each sorting algorithm is repeated several times sorting the same set of input to minimize timing error and the average elapsed time is recorded. The clock() function is used to retrieve the elapsed time between the start and the end of a sorting algorithm, excluding data generation and all other operations. Note that clock() returns the number of clock ticks rather than the number of seconds.
Moreover, since clock() returns elapsed time rather than user time (i.e., the CPU time used by a user program), this test is performed under MSDOS rather than Windows and Unix to minimize the multitasking effect. The machine used for this test is an Intel 66mhz 486DX2 IBM PC compatible and the compiler is Watcom C/C++ Version 10.0 with compiler options set to /oneatx/zp4/4/fp3 as suggested by Watcom for maximum efficiency.

也有 Bubble sort, merge sort, quick sort 的程式碼，在此就先不列出，詳見 A comparative study of linked list sorting algorithms。

文中也有提出關於使用 binary search tree 的 tree sort 為何是

n l o g_{2} (n)

：

As is well-known, the complexity of binary search tree insertion is
$O (n^{2})$ , since in a binary search tree, except for one leaf, all nodes could have exactly one child and in this case the tree reduces to a linked list. However, if the input data are random, the resulting binary search tree could be reasonably balanced and the complexity would be approximately
$O (n l o g_{2} n)$ .

關於為何是

O (n^{2})

可以看 stackoverflow - Run time for inserting a binary search tree is n^2? 。

因此可以推測在文中 tree sort 所分析計算的時間，是沒有包含 Traverse 所耗的時間。
至此，在我自己的 tree sort 時間複雜度算法會是

n l o g_{2} n + n

，簡單的列出為何會是如此：

linked list 依序一個個節點插入至 balanced binary search tree
- $n l o g_{2} n$
traverse
- $n$

當然對於作者為何會提出

n l o g_{2} n

，可能有其他因素考量而我不知道或漏看也說不定。

關於此疑問在 (2021/3/4 - 12:48)已解決，是我算法有問題。
詳細看錯誤修正。

Quick sort

排除尚未探討的 tree sort，本處仍以 quick sort 分析為主。

void Qsort(NodePTR *first, NodePTR *last)
{
    NodePTR lesHEAD=NULL, lesTAIL=NULL;
    NodePTR equHEAD=NULL, equTAIL=NULL;
    NodePTR larHEAD=NULL, larTAIL=NULL;
    NodePTR current = *first;
    int pivot, info;
    if (current == NULL)
        return;
    pivot = current->data;
    APPEND(current, equHEAD, equTAIL);
    while (current != NULL) {
        info = current->data;
        if (info < pivot)
            APPEND(current,lesHEAD,lesTAIL)
        else if (info > pivot)
            APPEND(current,larHEAD,larTAIL)
        else
        APPEND(current,equHEAD,equTAIL);
    }
    Qsort(&lesHEAD, &lesTAIL);
    Qsort(&larHEAD, &larTAIL);
    JOIN(lesHEAD,lesTAIL,equHEAD,equTAIL);
    JOIN(lesHEAD,equTAIL,larHEAD,larTAIL);
    *first = lesHEAD;
    *last = larTAIL;
}

Figure 3: Quick Sort

Figure 3 shows the basic concept, where macro APPEND() appends the first argument to the tail of a singly linked list whose head and tail are defined by the second and third arguments. On return, the first argument will be modified so that it points to the next node of the list.

Macro JOIN() appends the list whose head and tail are defined by the third and fourth arguments to the list whose head and tail are defined by the first and second arguments.

在文中，作者把 quick sort 歸類在 4 Singly Linked List Sorting Algorithms 中；而就其 unstable 特性也有大略提到：

For quick sort, Hoare's original algorithm [3] cannot be used since this algorithm “burns a candle from both ends”. Nico Lomuto's algorithm as described in Bentley [1] could be a better candidate for our study since it keeps two forward scanning pointers. However, since quick sort is not stable (Sedgewick [6]), it is not included. Instead, an algorithm which was originally designed to make quick sort stable and to handle equal keys is selected for this study. This algorithm was first proposed by Motzkin [5] and then analyzed by Wegner [7]. In fact, Wegner showed that on average this algorithm is of order
$O ((m + n) 2 (n / m))$ , where n is the number of keys in an input linked list in which each key occurs m times.

錯誤修正

對上述所說 Tree sort 為

n l o g_{2} n + n

是沒錯的，但在轉換成 Big O notation 時，會因為定義：

Big O notation characterizes functions according to their growth rates: different functions with the same growth rate may be represented using the same O notation. The letter O is used because the growth rate of a function is also referred to as the order of the function. A description of a function in terms of big O notation usually only provides an upper bound on the growth rate of the function. Associated with big O notation are several related notations, using the symbols o, Ω, ω, and Θ, to describe other kinds of bounds on asymptotic growth rates.

因此 Tree sort 為

$O (n l o g n + n) = O (n l o g n)$
至此，文中把其歸類為

O (n l o g_{2} n)

是正確的。

Tree sort

此 tree sort 的 balanced binary search tree 採用 red black tree。
我改寫 linux kernel 裡的 rbtree.h 和 rb_tree_augmented.h ，簡化它並藉此達到我想要的功能。原先是打算實作 cache 或 augmented 版本，但詳細操作還需要一段時間才能理解，因此做出最簡化的版本。
關於 Linux 核心的實作似乎是以 rb_tree_augmented.h 為主，因為其也可以達到 rbtree.h 的效果。

rv32emu-next

我實作過一個精簡的 red-black tree，可見:

jserv

tree-sort.c

#include "list.h"
#include "rb_tree_augmented.h"

#include "common.h"

#define u32 uint32_t

//l2t is list to tree. 
struct l2t_node {
    struct list_head *list_node;
    struct rb_node rb;
};

static void insert(struct l2t_node *node, struct rb_root_cached *root)
{
	struct rb_node **new = &root->rb_root.rb_node, *parent = NULL;
    struct listitem *key = list_entry(node->list_node, struct listitem, list);

	while (*new) {
		parent = *new;
		if (key->i < list_entry(rb_entry(parent, struct l2t_node, rb)->list_node, struct listitem, list)->i)
			new = &parent->rb_left;
		else
			new = &parent->rb_right;
	}

	rb_link_node(&node->rb, parent, new);
	rb_insert_color(&node->rb, &root->rb_root);
}

static void tree_sort(struct list_head *head, int nnodes) {
    struct l2t_node nodes[nnodes];
    struct rb_root_cached root = RB_ROOT_CACHED;
    struct listitem *item = NULL;
    struct listitem *is = NULL;
    
    // the sum of node of list need equal to nnodes
    int node_n = 0;
    list_for_each_entry_safe(item, is, head, list) {
        list_del(&item->list);
        INIT_LIST_HEAD(&item->list);
        nodes[node_n++].list_node = &item->list;
    }

    for (int i = 0; i < nnodes;i++) {
        insert(&nodes[i], &root);
    }

    INIT_LIST_HEAD(head);    
	struct rb_node *node;
    //inorder Traverse
    for (node = rb_first(&root.rb_root); node; node = rb_next(node)){
        struct l2t_node *temp = container_of(node, struct l2t_node, rb);
        struct listitem *temp_item = list_entry(temp->list_node, struct listitem, list);
        list_add_tail( &temp_item->list, head);
    }
}

詳細請看github - linux-list)

因為如果利用 linux-list 內建的 make 會產生

$ make
  CC    tests/quick-sort.o
  LD    tests/quick-sort
*** Validating tests/quick-sort ***
        [ Verified ]
  CC    tests/tree-sort.o
In file included from tests/tree-sort.c:4:
./include/rb_tree_augmented.h: In function ‘dummy_rotate’:
./include/rb_tree_augmented.h:468:49: error: unused parameter ‘old’ [-Werror=unused-parameter]
  468 | static inline void dummy_rotate(struct rb_node *old, struct rb_node *new) {}
      |                                 ~~~~~~~~~~~~~~~~^~~
./include/rb_tree_augmented.h:468:70: error: unused parameter ‘new’ [-Werror=unused-parameter]
  468 | static inline void dummy_rotate(struct rb_node *old, struct rb_node *new) {}
      |                                                      ~~~~~~~~~~~~~~~~^~~
cc1: all warnings being treated as errors

原因就如同上面所顯示的 dummy_rotate 是 NULL function。
一般我們在編譯的時候 warnings 可以忽略，但以此 makefile 操作的話會因為 cc1: all warnings being treated as errors 而出現錯誤不給編譯。
但 dummy_rotate 又是必要的，因此檢測方式我先改成列印出來與獨立的 valgrind 來交叉檢測。

列印 (values size = 256)：

#include <stdio.h>
#define TEST(name, LIST) do{\
            printf("%s: ", #name);\
            struct listitem *item_r = NULL, *is_r = NULL;\
            list_for_each_entry_safe(item_r, is_r, LIST, list)\
                printf("%d ", item_r->i);\
            printf("\n");\
            getchar();\
} while (0)









    tree_sort(&testlist, ARRAY_SIZE(values));
    
    TEST(testlist, &testlist);
    printf("values  ");
    for(int i = 0;i < ARRAY_SIZE(values);i++){
        printf("%d ", values[i]);
    }
    printf("\n");

$ gcc -o test -I./include -I./private tests/tree-sort.c -g
$ ./test
testlist: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255

values  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255

valgrind (values size 10000)：

$ valgrind -v --leak-check=full  ./ttest
==10539== Memcheck, a memory error detector
==10539== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==10539== Using Valgrind-3.15.0-608cb11914-20190413 and LibVEX; rerun with -h for copyright info
==10539== Command: ./ttest
==10539==
--10539-- Valgrind options:
--10539--    -v
--10539--    --leak-check=full
--10539-- Contents of /proc/version:
--10539--   Linux version 4.4.0-18362-Microsoft (Microsoft@Microsoft.com) (gcc version 5.4.0 (GCC) ) #1049-Microsoft Thu Aug 14 12:01:00 PST 2020
--10539--
--10539-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-lzcnt-rdtscp-sse3-ssse3-avx-avx2-bmi-f16c-rdrand
--10539-- Page sizes: currently 4096, max supported 4096
==10539==
==10539== HEAP SUMMARY:
==10539==     in use at exit: 0 bytes in 0 blocks
==10539==   total heap usage: 10,001 allocs, 10,001 frees, 260,000 bytes allocated
==10539==
==10539== All heap blocks were freed -- no leaks are possible
==10539==
==10539== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

tree sort 和 quick sort 分析

以下是針對 quick sort 和 tree sort 的效能分析程式:

#include <sys/random.h>
#include <stddef.h>
#include <time.h>
static uint16_t values[1000];
uint16_t self_random(void) {
    random_shuffle_array(values, (uint16_t) ARRAY_SIZE(values));
    time_t seed;
    return values[values[(time(&seed) & values[7] ) % ARRAY_SIZE(values)]];
}

#define time_diff(start, end) (end.tv_nsec - start.tv_nsec < 0 ? (1000000000 + end.tv_nsec - start.tv_nsec): (end.tv_nsec - start.tv_nsec) )
#define len 10000
#define times 1000
void analysis(void) {
    FILE *ptr = NULL;
    //ptr = fopen("time_quick.txt", "w");
    ptr = fopen("time_tree.txt", "w");
    if(!ptr) return;
    printf("len:%d  time:%d\n", len, times);
    
    struct list_head testlist;
    INIT_LIST_HEAD(&testlist);
    struct listitem *item, *is = NULL;
    size_t i;
    for (i = 0; i < len; i++) {
        item = (struct listitem *) malloc(sizeof(*item));
        list_add_tail(&item->list, &testlist);
    }
    
    struct timespec time_start;
    struct timespec time_end;
    double during;
    for (int time_i = 0;time_i < times;time_i++) {
        printf("%d\n", time_i);
        list_for_each_entry_safe(item, is, &testlist, list) {
            item->i = self_random();
        }
        clock_gettime(CLOCK_MONOTONIC, &time_start);
        //list_qsort_iterative(&testlist);
        tree_sort(&testlist, len);
        clock_gettime(CLOCK_MONOTONIC, &time_end);  
        during = time_diff(time_start, time_end);
        fprintf(ptr, "%d %f\n" , time_i, during);
    }
    fclose(ptr);

}

int main() {
    analysis();
    return 0;
}

對此我進行了比對，可以明顯看出 Tree sort 所擁有的效率大於 quick sort 。

可以做出推論，quick sort 在進行分離、比較及合併所累積的成本，比轉化成 balanced binary search tree 與走訪重建順序還要高。
對此思考過 quick sort 所使用的 stack 操作造成資源使用過大，運行成本高。

Reference

tree_sort and list_sort

考量點

cache 記憶體使用。

stackoverflow - What is a “cache-friendly” code?
The idea is that most of the executing code will be hitting a small set of variables often, and the rest (a much larger set of variables) infrequently. If the processor can't find the data in L1 cache, then it looks in L2 cache. If not there, then L3 cache, and if not there, main memory. Each of these "misses" is expensive in time.
- 在 list_sort ( merge sort ) 是以至少 2:1 的 balanced merge 來排序，盡可能在進入到 cache 即可排序。
- 而 tree_sort 則是以插入節點後，再以 inorder 走訪來排序。
- 在此，兩個可以比較出結點被移進 cache 的次數在原理上會有差。
  //TODO 尋找 cache friendly 的 tree sort
  Cache-Friendly Search Tree

Reference

5 c_map

在經過第三周 quiz3 的 fbstring 測驗後，以及 hankluo6 同學對於 list_sort 和 tree_sort 的 cache 使用一提，令我想到能對 rbtree 的儲存方式進行最佳化。
因此仔細回去查看 Linux kernel 裡的 rbtree 實作發現，實作中早已對節點的記憶體進行了最佳化。
以下是我發現的地方：

根據 hankluo6 同學與 jserv 老師的 c_map：











typedef enum { C_MAP_RED, C_MAP_BLACK, C_MAP_DOUBLE_BLACK } c_map_color_t;

/*
 * Store the key, data, and values of each element in the tree.
 * This is the main basis of the entire tree aside from the root struct.
 */
typedef struct c_map_node {
    void *key, *data;
    struct c_map_node *left, *right, *up;
    c_map_color_t color;
} c_map_node_t;






typedef struct __node {                   
    int value;
    struct __node *next;
    struct __node *left, *right, *up;
    c_map_color_t color;
} node_t;

可以看到對於 parent 這個連結點是以 struct c_map_node *up 或是 struct __node *up 表示。

然而在 Linux kernel 中卻是只有：

struct rb_node {
	unsigned long  __rb_parent_color;
	struct rb_node *rb_right;
	struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));

少了 parent 這個參數，且 color 被設為 unsigned long 型態。
根據 stackoverflow - Red black node's struct alignment in linux kernel 的解說，是因為他利用 __attribute__((aligned(sizeof(long)))) 把 color 這個參數儲存在 parent 的最後兩個 bit 了。
原理引述上續那篇提問者 Don Hall 的留言：

I think i get it: the pointer save a node's address, which is multiple of 4(100) because of alignment of 4, so the value of a pointer will always like 100, 1000, 1100, 10100100. and is always end with 00. thanks a lot! – Don Hall

可以知道因為 aligned 之後會讓記憶體對齊，一般來說是讓 CPU 的讀取效率提高，但在 Linux kernel 的紅黑樹卻利用這點使 rb_node 的記憶體位置的最後兩的位元一定為零，然後利用這兩個位元去紀錄是黑還是紅。

因此我學習了 Linux kernel 裡的技巧改了 hankluo6 同學的 node_t ( c_map ) 對顏色儲存的方式：

typedef struct __node { 
    unsigned long  color;
    struct __node *left;
    struct __node *right;
    struct __node *next;
    long value;
} node_t __attribute__((aligned(sizeof(long))));

並設了以下 marco 讓我們能分別讀取 parent 和 color ：

#define C_MAP_RED          0
#define C_MAP_BLACK        1
#define rb_parent(r)   ((node_t *)((r)->color & ~3))
#define rb_color(r)   ((r)->color & 1)
#define rb_set_parent(r, p) do{ (r)->color =  rb_color(r) | (unsigned long)(p); } while (0)
#define rb_set_red(r)  do { (r)->color &= ~1; } while (0)
#define rb_set_black(r)  do { (r)->color |= 1; } while (0)
#define rb_is_red(r)      (!rb_color(r))
#define rb_is_black(r)    (rb_color(r))

下方為各節點映出的佔用空間：

normal(node_t) size: 48
cmap size: 48
aligned size: 40

可以看到就如第三周測驗的 quiz3 裡的 fbstring 一樣，近乎榨乾了所有能用的空間。

對此因為如果原先的 c_map 與更改過後的函式一起執行的話會衝突到，但改下去要花費大量的時間，在此先進行較不精確的效能比對：

詳細程式碼請到 c_map_bit.c 和 type.h 查閱，謝謝。

Reference

stackoverflow - Red black node's struct alignment in linux kernel
stackoverflow - Parent pointer in linux kernel RBTree
github
- sysprog21
- hankluo6
Linux kernel source tree
- root/include/linux/rbtree.h

額外閱讀

Linux 核心設計: 記憶體管理

Jim Huang

2021/02/27 09:27:12

在文中所給出的數據是最快速。

改用 HackMD 的表格語法來展現，避免非必要的 HTML 語法，後者不易維護 (Edited)

2021/03/09 13:52:00

一個 PRNG [**Lagged Fibonacci generators

第三週的作業提到 LFG: https://hackmd.io/@sysprog/linux2021-fibdrv (Edited)

2021/03/11 09:42:17

因此我學習了 Linux kernel 裡的技巧改了 hankluo6 同學的 node_t ( c_map ) 對顏色儲存的方

等你送 pull request! (Edited)

linD026

2021/03/11 09:58:50

但 enum 裡還有 C_MAP_DOUBLE_BLACK 雖然搜尋 c_map 檔案都沒查到使用的地方但會設這個的原因我還不確定 (Edited)

2021/03/11 11:05:01

當初實作考慮到節點可能會是 red, black, 及可能的 double black: https://www.usna.edu/Users/cs/crabbe/SI321/current/red-black/red-black.html (Edited)

2021/03/11 11:05:31

會嘗試 pull request ，但需要一段時間。 (Edited)

2021q1 Homework1 (quiz1)

tags: linux2021

Reviewed by jserv

0 進度

1 解釋程式運作原理

結構定義

函式說明

1. list_add_node_t

2. list_concat

1

2

3

4

3. quicksort

Tail call

random 問題

包裝 random - LFG

ASLR ( Address Space Layout Randomization )

2 非遞迴呼叫實作

3 linux-list 探討並改寫

1. Linux 核心實作

2. linux-list 的 list_qsort

結構定義

函式說明

INIT_LIST_HEAD

list_first_entry

list_for_each_entry_safe

cmpint

list_move_tail

list_move

list_add

list_splice & list_splice_tail

pseudocode

3. list_qsort 與 quicksort 差異

4. 改寫 list_qsort

quick sort

4 思考高效率的 linked list 排序演算法

A Comparative Study Of Linked List Sorting Algorithms

疑問

Quick sort

錯誤修正

Tree sort

rv32emu-next

tree sort 和 quick sort 分析

tree_sort and list_sort

5 c_map

Read more

Linux 核心採納 Rust 的狀況

[Linux 核心 Copy On Write 實作機制](https://hackmd.io/@linD026/Linux-kernel-COW-content)

Linux 核心 Read-Copy Update 筆記整理

[Linux 核心 Copy On Write - Memory Region](https://hackmd.io/@linD026/Linux-kernel-COW-content)

tags: `linux2021`

Reviewed by `jserv`

2. linux-list 的 `list_qsort`

3. `list_qsort` 與 `quicksort` 差異