2023q1 Homework1 (lab0)

contributed by < Paintako >

開發環境

$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:
  39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  12
  On-line CPU(s) list:   0-11
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz
    CPU family:          6
    Model:               165
    Thread(s) per core:  2
    Core(s) per socket:  6
    Socket(s):           1
    Stepping:            5
    CPU max MHz:         4300.0000
    CPU min MHz:         800.0000
    BogoMIPS:            5799.77

開發過程

q_new

使用 malloc 在 heap 中配置 struct list_head 的大小, 如果失敗的話就 return NULL, 否則使用 Linux 核心風格的 list.h 所提供的 INIT_LIST_HEAD 巨集，後者作用是對一個 node 做初始化, 使 next/ prev pointer 皆指向自己

/* Create an empty queue */
struct list_head *q_new()
{
    struct list_head *head = malloc(sizeof(struct list_head));
    if (!head)
        return NULL;
    INIT_LIST_HEAD(head);
    return head;
}

q_free

利用 Linux Kernal API 提供的 list_for_each_safe function 來 iterate the whole list.

實作過程中報錯 ???, 皆與 allocated block 有關
查看 queue.h 才知道配置空間時是以 block 為單位

改進漢語書寫
jserv

實作過程中, 只使用 free 是不夠的, 因為整個 queue 中的 node 有:

char*
list_head
這兩種 element 構成, 所以在釋放記憶體時要把整個 node 給釋放掉, 使用 queue.h 提供的 q_release_element 來安全的釋放掉 node.
關於 q_release_element 函式:

static inline void q_release_element(element_t *e)
{
    test_free(e->value);
    test_free(e);
}

void q_free(struct list_head *l)
{
    if (!l)
        return;
    struct list_head *pos, *n = NULL;
    // list_for_each_safe:
    // iterate over a list safe against removal of list entry
    list_for_each_safe (pos, n, l) {
        if (!pos) {
            free(l);
            return;
        }
        free(pos);
        element_t *free_point = list_entry(pos, element_t, list);
        q_release_element(free_point);
    }
    // after removing all nodes in queue, free head pointer
    free(l);
    return;
}

q_insert_head

依據 list.h 中的描述，佇列內每個節點皆是一個含有以下兩者的 struct：

char *value
struct list_head list

所以在 malloc 時, 除了上述 struct 的空間需要配置外, 需要額外 malloc 傳入的字串, 需要特別注意的是, 當配置字串的大小時, 需要額外配置 1 byte, 原因是因為需要額外配置一個空間給 EOF

為何是 EOF？

另外, 第8行以及第15/16行需要特別注意, 如果 malloc 失敗, 要回傳 false 前需要把剛剛配置的空間給釋放掉, 否則會造成 memory leak.

bool q_insert_head(struct list_head *head, char *s)
{
    // malloc a spcae for newnode
    if (!head)
        return false;
    element_t *newnode = malloc(sizeof(element_t));
    if (!newnode) {
        free(newnode);
        return false;
    }

    // +1 for '/0'
    char *str_ptr = malloc(sizeof(char) * strlen(s) + 1);
    if (!str_ptr) {
        free(str_ptr);
        free(newnode);
        return false;
    }

    memcpy(str_ptr, s, strlen(s) + 1);
    newnode->value = str_ptr;
    list_add(&newnode->list, head);
    return true;
}

q_insert_tail

同insert_head, 但是這次 call 的 api 是 list_add_tail

/* Insert an element at tail of queue */
bool q_insert_tail(struct list_head *head, char *s)
{
    // similar to insert_head
    element_t *newnode = malloc(sizeof(element_t));
    if (!newnode) {
        free(newnode);
        return false;
    }

    // +1 for '/0'
    char *str_ptr = malloc(sizeof(char) * strlen(s) + 1);
    if (!str_ptr) {
        free(str_ptr);
        free(newnode);
        return false;
    }

    memcpy(str_ptr, s, strlen(s) + 1);
    newnode->value = str_ptr;
    list_add_tail(&newnode->list, head);
    return true;
}

q_remove_head

將某個 node remove 是將它移出 list中, 並非將它回收掉, 此 function 中有限定:

將被移除節點的字串複製到 *sp 中.

而且此 function 只有給 pointer to head, 所以要透過 list_first_entry 或者 list_entry 找的對應的 element_t, 然後把此 element_t 中的字串複製到 sp 中

element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize)
{
    // copy remove node's value into sp, and limit sp size < bufsize
    if (!head || list_empty(head))
        return NULL;
    element_t *del_node = list_first_entry(head, element_t, list);
    list_del(&del_node->list);
    if (sp) {
        // when bufsize>1, sp can be assinged with char.
        for (char *str_ptr = del_node->value; bufsize > 1;
             sp++, str_ptr++, bufsize--)
            *sp = *str_ptr;
        *sp = '\0';  // end of string ends with \0
    }
    return del_node;
}

Note:
執行 $ make SANITIZER=1 開啟 Address Sanitizer 後, 發現 remove 皆有 heap over flow 的問題, 原因是：
使用之前的 for loop 會導致越界, 試圖存取 heap 之外的的範圍, 改用 strncpy 可以解決問題

補充：

何謂 buffer overflow?
答：當今天宣告一段記憶體位置, 當試圖存取宣告以外的地方, 稱之.

strcpy v.s strncpy

strcpy 有 buffer overflow的問題
strncpy 加入 count 參數來控制 copy 數量

可使用以下工具來 Debug:

GDB ( 可搭配 AddressSanitizer )
AddressSanitizer
CoreDump （還沒看）

q_remove_tail

跟 q_remove_head 的差別在於, 在找 list_entry 時, 給的位址並非 head 而是 head->prev, 此動作相當於找 tail.

/* Remove an element from tail of queue */
element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize)
{
    // /**
    //  * since each node has prev and next pointer,
    //  * we can simply find tail position in queue in O(1)
    //  */
    if (!head || list_empty(head))
        return NULL;
    element_t *del_node =
        list_entry(head->prev, element_t, list);  // find entry for remove node
    if (!del_node)
        return NULL;
    list_del(&del_node->list);
    if (sp) {
        // when bufsize>1, sp can be assinged with char.
        for (char *str_ptr = del_node->value; bufsize > 1;
             sp++, str_ptr++, bufsize--)
            *sp = *str_ptr;
        *sp = '\0';  // end of string ends with \0
    }
    return del_node;
}

q_size

使用一變數 len 來紀錄 node 個數.

/* Return number of elements in queue */
int q_size(struct list_head *head)
{
    if (!head)
        return 0;

    int len = 0;
    struct list_head *li;

    list_for_each (li, head)
        len++;
    return len;
}

q_delete_mid

使用兩種 pointer:

fast pointer
slow pointer

這兩種的差別是: fast pointer一次移動兩次, 而 slow 移動一次, 如果 fast 移動到終點, 那 slow 的所在位置即是中點.

/* Delete the middle node in queue */
bool q_delete_mid(struct list_head *head)
{
    // https://leetcode.com/problems/delete-the-middle-node-of-a-linked-list/
    if (!head)
        return false;

    struct list_head *fast = head->next->next;
    struct list_head *slow = head->next;

    // after while loop, slow == mid_position
    while (fast->prev != head && fast != head) {
        fast = fast->next->next;
        slow = slow->next;
    }

    // update queue
    list_del(slow);
    element_t *rmv_ptr = list_entry(slow, element_t, list);
    q_release_element(rmv_ptr);

    return true;
}

q_swap

實作過程中, 因為 list_mode 可以把兩個節點交換, 所以只需要用兩個指標指向要交換的兩個點, 並且若第一個指標指向 head or head->next, 則不做交換.

/* Swap every two adjacent nodes */
void q_swap(struct list_head *head)
{
    // https://leetcode.com/problems/swap-nodes-in-pairs/
    if (!head)
        return;
    // void list_add(*node, *target): insert node after target
    // void list_del(*node): remove target node form list

    struct list_head *n = head->next;
    struct list_head *t = NULL;
    while (n != head && n->next != head) {
        t = n;
        list_move(n, t->next);
        n = n->next;
    }

    return;
}

q_delete_dup

做的過程中, 首先我的想法是：宣告一個類似 stack 的結構, 走訪每一的節點的同時, 如果這個點沒有看過, 就把它 push 進去 stack 中, 如果看過就 remove , 但是我誤會題目的意思了
這題的題意是：如果 list 中有重複的值, 全部刪掉, 例如:

list: 1->1->1->2->3->3->3
after delete_dup
list: 2

參考 https://hackmd.io/@yanjiew/linux2023q1-lab0#q_delete_dup 的作法

Linux Kernal API 中的 function：

list_cut_position（head_to, head_from, node)
- head_to: 要移進去的 list
- head_from: 被移動的 list
- node: 指定 node 以前全部移到 head_to
- e.g.
  - list_from: head->1->2->3->4->5
  - node: 3
  - 做完cut後
  - list_from: head->4->5
  - list_to: head->1->2->3
- 直觀的講, 如果把整個 list 看成一個 array, 有10個元素
- i.e. array[10]
- cut postion 4:
- 分出： arr[0:4], arr[5:]
list_splice()
- 把一個 list 移到另外一個 list

q_sort()

由於有要求複雜度在 nlgn 內, 選用 merget sort.
參考 https://hackmd.io/@yanjiew/linux2023q1-lab0#q_delete_dup 的作法
採用 top down 的 merge sort, 先找到 list 中點, 切成左右兩邊下去做排序.

q_merge()

Intutive 的想法：把所有 list 接起來, 再 merge.

分析：沒有利用到 sort 後的優點

開啟 Address Sanitizer，修正 qtest 執行過程中的錯誤

錯誤來自 q_remove_head 以及 q_remove_tail

cmd> rh
=================================================================
==31019==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6040000008ba at pc 0x562cad2f2e78 bp 0x7ffe55680170 sp 0x7ffe55680160
READ of size 1 at 0x6040000008ba thread T0
    #0 0x562cad2f2e77 in q_remove_head /home/eric/linux2023/lab0-c/queue.c:108
    #1 0x562cad2ecb73 in do_remove /home/eric/linux2023/lab0-c/qtest.c:404
    #2 0x562cad2ece06 in do_rh /home/eric/linux2023/lab0-c/qtest.c:461
    #3 0x562cad2efdbe in interpret_cmda /home/eric/linux2023/lab0-c/console.c:181
    #4 0x562cad2f085a in interpret_cmd /home/eric/linux2023/lab0-c/console.c:201
    #5 0x562cad2f22ac in run_console /home/eric/linux2023/lab0-c/console.c:691
    #6 0x562cad2ee7c0 in main /home/eric/linux2023/lab0-c/qtest.c:1227
    #7 0x7f9c03229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #8 0x7f9c03229e3f in __libc_start_main_impl ../csu/libc-start.c:392
    #9 0x562cad2e9d44 in _start (/home/eric/linux2023/lab0-c/qtest+0x9d44)

0x6040000008ba is located 0 bytes to the right of 42-byte region [0x604000000890,0x6040000008ba)
allocated by thread T0 here:
    #0 0x7f9c036b4867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x562cad2f23c3 in test_malloc /home/eric/linux2023/lab0-c/harness.c:133
    #2 0x562cad2f2c87 in q_insert_tail /home/eric/linux2023/lab0-c/queue.c:82
    #3 0x562cad2ed3a5 in do_it /home/eric/linux2023/lab0-c/qtest.c:313
    #4 0x562cad2efdbe in interpret_cmda /home/eric/linux2023/lab0-c/console.c:181
    #5 0x562cad2f085a in interpret_cmd /home/eric/linux2023/lab0-c/console.c:201
    #6 0x562cad2f22ac in run_console /home/eric/linux2023/lab0-c/console.c:691
    #7 0x562cad2ee7c0 in main /home/eric/linux2023/lab0-c/qtest.c:1227
    #8 0x7f9c03229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/eric/linux2023/lab0-c/queue.c:108 in q_remove_head
Shadow bytes around the buggy address:
  0x0c087fff80c0: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
  0x0c087fff80d0: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
  0x0c087fff80e0: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
  0x0c087fff80f0: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
  0x0c087fff8100: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
=>0x0c087fff8110: fa fa 00 00 00 00 00[02]fa fa fa fa fa fa fa fa
  0x0c087fff8120: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8130: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8140: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8150: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8160: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==31019==ABORTING

使用 gdb 確認出錯地方, 使用前把 exception_setup 給註解掉, 否則無法進入 q_remove_head 內檢查

$ gdb qtest
$ b 402 
$ r
cmd> new
l = []
cmd> it 4
l = [4]
cmd> rh
Breakpoint 1, do_remove (option=option@entry=0, argc=1, argv=<optimized out>) at qtest.c:402
402	    if (current) //&& exception_setup(true))
(gdb) s
403	        re = option ? q_remove_tail(current->q, removes, string_length + 1)
(gdb) s
q_remove_head (head=0x606000000040, sp=sp@entry=0x61d000000080 "", bufsize=1025) at queue.c:97
97	{
(gdb) n
99	    if (!head || list_empty(head))
(gdb) n
101	    element_t *del_node = list_first_entry(head, element_t, list);
(gdb) n
102	    list_del(&del_node->list);
(gdb) n
103	    if (sp) {
(gdb) n
106	        for (char *str_ptr = del_node->value; bufsize > 1;
(gdb) finish
Run till exit from #0  q_remove_head (head=<optimized out>, sp=0x61d000000083 "\336", 'X' <repeats 199 times>..., 
    sp@entry=0x61d000000080 "4", bufsize=<optimized out>) at queue.c:107

進入 q_remove_head 中 for 迴圈後, 不斷 n 直到退出循環為止, 但是停止條件有誤, 故 for 迴圈會不斷執行直到 heap-buffer-overflow 觸發 exception 為止
這裡越界的指標是 char *str_ptr

解法改用 strncpy, 可防止 buffer-overflow

運用 Valgrind 排除 qtest 實作的記憶體錯誤

q_remove_head 以及 q_remove_head 還沒修復前, 執行 valgrind 結果如下

$ valgrind -q --leak-check=full ./qtest <traces/trace-01-ops.cmd 
l = []
l = [gerbil]
l = [bear gerbil]
l = [dolphin bear gerbil]
==41444== Invalid read of size 1
==41444==    at 0x10F7F3: q_remove_head (queue.c:108)
==41444==    by 0x10C468: do_remove (qtest.c:404)
==41444==    by 0x10C610: do_rh (qtest.c:461)
==41444==    by 0x10DEED: interpret_cmda (console.c:181)
==41444==    by 0x10E4A2: interpret_cmd (console.c:201)
==41444==    by 0x10F10C: run_console (console.c:691)
==41444==    by 0x10D2DF: main (qtest.c:1227)
==41444==  Address 0x4b88f00 is 0 bytes after a block of size 48 alloc'd
==41444==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==41444==    by 0x10F203: test_malloc (harness.c:133)
==41444==    by 0x10F6A7: q_insert_head (queue.c:55)
==41444==    by 0x10CBDA: do_ih (qtest.c:224)
==41444==    by 0x10DEED: interpret_cmda (console.c:181)
==41444==    by 0x10E4A2: interpret_cmd (console.c:201)
==41444==    by 0x10F10C: run_console (console.c:691)
==41444==    by 0x10D2DF: main (qtest.c:1227)
==41444== 
==41444== Invalid read of size 1
==41444==    at 0x10F80A: q_remove_head (queue.c:106)
==41444==    by 0x10C468: do_remove (qtest.c:404)
==41444==    by 0x10C610: do_rh (qtest.c:461)
==41444==    by 0x10DEED: interpret_cmda (console.c:181)
==41444==    by 0x10E4A2: interpret_cmd (console.c:201)
==41444==    by 0x10F10C: run_console (console.c:691)
==41444==    by 0x10D2DF: main (qtest.c:1227)
==41444==  Address 0x4b88f01 is 1 bytes after a block of size 48 alloc'd
==41444==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==41444==    by 0x10F203: test_malloc (harness.c:133)
==41444==    by 0x10F6A7: q_insert_head (queue.c:55)
==41444==    by 0x10CBDA: do_ih (qtest.c:224)
==41444==    by 0x10DEED: interpret_cmda (console.c:181)
==41444==    by 0x10E4A2: interpret_cmd (console.c:201)
==41444==    by 0x10F10C: run_console (console.c:691)
==41444==    by 0x10D2DF: main (qtest.c:1227)
==41444== 

...

Invalid read of size 1 表示有越界讀取的行為發生

論文閱讀〈Dude, is my code constant time?〉

參考 WeiCheng14159

方法： leakage detection test on the execution time.

具體之作法是給程式兩個不同類型的輸入資料（i.e. fix-vs-random 測試），並觀察給定不同的輸入資料，所造成的執行時間分佈在統計上是否相同。

How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures White Paper

規格書中描述如何量測 Eexecution time
RDTSC ( Read Time-Stamp Counter and Processor ID IA assembly instruction )
基本上, 使用這個指令時，需要先把 preemption 給關掉，以及 interrupts 給關掉，避免因為搶佔 (preemption) 影響量測。以下是範例程式碼:

static int __init hello_start(void)
{
    unsigned long flags;
    uint64_t start, end;
    int variable = 0;
    unsigned cycles_low, cycles_high, cycles_low1, cycles_high1;
    printk(KERN_INFO "Loading test module...\n");
    preempt_disable(); /*we disable preemption on our CPU*/
    raw_local_irq_save(flags); /*we disable hard interrupts on our CPU*/
    /*at this stage we exclusively own the CPU*/
    asm volatile (
    "RDTSC\n\t"
    "mov %%edx, %0\n\t"
    "mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low));
    measured_function(&variable);
    asm volatile (
    "RDTSC\n\t"
    "mov %%edx, %0\n\t"
    "mov %%eax, %1\n\t": "=r" (cycles_high1), "=r" (cycles_low1));
    raw_local_irq_restore(flags);
    /*we enable hard interrupts on our CPU*/
    preempt_enable();/*we enable preemption*/
    start = ( ((uint64_t)cycles_high << 32) | cycles_low );
    end = ( ((uint64_t)cycles_high1 << 32) | cycles_low1 );
    printk(KERN_INFO "\n function execution time is %llu clock cycles", (end-
    start));
    
    return 0;
}

上面的程式看似合理, 但是有缺陷會影響評測, 例如：

關掉 interupt/ preemtion 的 function call overhead
- 包含 Calling CPUID 以及 RDTSC 本身
Register Overwriting （待釐清, about register )
Out of Order Execution ( 亂序執行）

用來寫 inline assembly，C 語言中為了效率等目的直接要求編譯器加入我們所指定組合語言。

volatile 通常用於以下幾種情況：

外部設備/硬體介面：當變量是外部設備/硬體介面的寄存器時，該變量可能會在未知的時間被更改，因此使用 volatile 關鍵字可以告訴編譯器不要對這些變量進行某些最佳化，例如刪除未使用的變量或進行重複計算等。

多執行緒環境：在多執行緒環境下，多個執行緒可能同時訪問同一個變量，如果不使用 volatile 關鍵字，編譯器可能會對該變量進行某些最佳化，導致在執行緒間共享變量時出現問題。

非預期的控制流：當控制流在編譯器無法預測的方式下更改變量時，使用 volatile 關鍵字可以確保編譯器不會對變量進行某些最佳化。

總之，volatile 關鍵字告訴編譯器該變量是易變的，不能對其進行某些最佳化，以確保變量在某些情況下的可見性和可預測性。

目前 `lab0-c` 內 `dudect` 缺陷

執行 RDTSC 前，並沒有preemption 給關掉，以及 interrupts 給關掉

改進你的漢語表達
jserv

q_test 改善

觀察 qtest.c, 檢查 insert head, insert_tail 等動作時, 會呼叫 is_##_const 的 function, 這些 function 定義在 fixture.h 頭文件中

C 語言中, ##是一個預處理器運算符, 以下節錄自 C99 規格書

6.10.3.3 The ## operator

Semantics

1 The resulting token is the concatenation of the original two tokens. The resulting token is available for further macro expansion, except in the following cases:

— It is formed by replacing a parameter by a token sequence that does not contain any of the following: ##, #, or a parameter;

— It is the result of a ## operator in a macro definition that was not used in a # operator with a string literal as its operand;

— It is a character constant, a string literal, or a header name (that is, the result of a < or a > operator in a #include directive).

簡而言之，## 的作用是將兩個 token 結合成一個新的 token 。該運算符通常用於擴展巨集（macro）中，以將多個 token 結合成一個新的巨集名稱、變量名稱、函數名稱等等。

以上節錄自Chatgpt

改進上方的書寫，留意資訊科技詞彙翻譯
jserv

收到！

fixture.c 中, 存在一個函式 measure, 該 function 會呼叫 cpucycles 來計算程式執行期間的 cpu cycles 數量

dudect 中, 沒有按照 interl 白皮書中的那樣, 計算 cpu cycle 前關掉 preemption 及 interrupts .
根據這點, 利用 perf 來做實驗

$ perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./qtest < traces/trace-17-complexity.cmd

Performance counter stats for './qtest' (5 runs):

              4438      cache-misses              #    0.000 % of all cache refs      ( +-40185.85% )
            6,7127      cache-references                                              ( +-3686501.81% )
          134,6960      instructions              #    0.00  insn per cycle           ( +-11300715.53% )
          112,5612      cycles                                                        ( +-5661047.29% )

             15.54 +- 15.53 seconds time elapsed  ( +-100.00% )

以下是統計來自 user space 的 instruction count

perf stat --repeat 5 -e cache-misses,cache-references,instructions:u,cycles:u ./qtest < traces/trace-17-complexity.cmd 

 Performance counter stats for './qtest' (5 runs):

              3775      cache-misses              #    0.000 % of all cache refs      ( +-293615.29% )
            7,4683      cache-references                                              ( +-3633221.45% )
           26,5826      instructions:u            #    0.00  insn per cycle           ( +-57370700.97% )
           30,7086      cycles:u                                                      ( +-21566357.39% )

             16.43 +- 16.43 seconds time elapsed  ( +-100.00% )

使用白皮書內的寫法後, cycles 數量不減反增. 且無法 #include <linux/preempt.h>, 待查

確認 cache 對效能的影響。
jserv

以下參考自 KHLee529

論文中有以下論述：

Cropping: Typical timing distributions are positively
skewed towards larger execution times. This may be caused by measurement artifacts, the main process being interrupted by the OS or other extraneous activities. In this case, it may be convenient to crop the measurements (or discard measurements that are larger than a fixed, class-independent threshold).

可以把 outlier （離群值） 刪除

同時檢查 cpu cycles, 發現其實並沒有和前者有太大的差距, 但是通過 constant time 檢查

 Performance counter stats for './qtest' (5 runs):

              5973      cache-misses              #    0.001 % of all cache refs      ( +-18700.29% )
            7,1710      cache-references                                              ( +-598499.01% )
          134,8395      instructions              #    0.00  insn per cycle           ( +-1760534.07% )
          111,9976      cycles                                                        ( +-928018.50% )

              2.56 +- 2.56 seconds time elapsed  ( +- 99.98% )