---
tags: linux2023
---
# 2023q1 Homework1 (lab0)
contributed by < `Paintako` >
## 開發環境
```shell
$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes:
39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz
CPU family: 6
Model: 165
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 5
CPU max MHz: 4300.0000
CPU min MHz: 800.0000
BogoMIPS: 5799.77
```
## 開發過程
### q_new
使用 malloc 在 heap 中配置 struct list_head 的大小, 如果失敗的話就 return NULL, 否則使用 Linux 核心風格的 `list.h` 所提供的 `INIT_LIST_HEAD` 巨集,後者作用是對一個 node 做初始化, 使 next/ prev pointer 皆指向自己
```c
/* Create an empty queue */
struct list_head *q_new()
{
struct list_head *head = malloc(sizeof(struct list_head));
if (!head)
return NULL;
INIT_LIST_HEAD(head);
return head;
}
```
### q_free
利用 Linux Kernal API 提供的 ```list_for_each_safe``` function 來 iterate the whole list.
:::warning
實作過程中<s>報錯</s> ???, 皆與 `allocated block` 有關
查看 `queue.h` 才知道配置空間時是以 block 為單位
> 改進漢語書寫
> :notes: jserv
:::
實作過程中, 只使用 `free` 是不夠的, 因為整個 queue 中的 node 有:
* char*
* list_head
這兩種 element 構成, 所以在釋放記憶體時要把整個 node 給釋放掉, 使用 `queue.h` 提供的 `q_release_element` 來安全的釋放掉 node.
關於 `q_release_element` 函式:
```c
static inline void q_release_element(element_t *e)
{
test_free(e->value);
test_free(e);
}
```
```c
void q_free(struct list_head *l)
{
if (!l)
return;
struct list_head *pos, *n = NULL;
// list_for_each_safe:
// iterate over a list safe against removal of list entry
list_for_each_safe (pos, n, l) {
if (!pos) {
free(l);
return;
}
free(pos);
element_t *free_point = list_entry(pos, element_t, list);
q_release_element(free_point);
}
// after removing all nodes in queue, free head pointer
free(l);
return;
}
```
### q_insert_head
依據 `list.h` 中的描述,佇列內每個節點皆是一個含有以下兩者的 struct:
* char *value
* struct list_head list
所以在 malloc 時, 除了上述 struct 的空間需要配置外, 需要額外 malloc 傳入的字串, 需要特別注意的是, 當配置字串的大小時, 需要額外配置 1 byte, 原因是因為需要額外配置一個空間給 `EOF`
:::warning
為何是 `EOF`?
:::
另外, 第8行以及第15/16行需要特別注意, 如果 malloc 失敗, 要回傳 false 前需要把剛剛配置的空間給釋放掉, 否則會造成 `memory leak`.
```c
bool q_insert_head(struct list_head *head, char *s)
{
// malloc a spcae for newnode
if (!head)
return false;
element_t *newnode = malloc(sizeof(element_t));
if (!newnode) {
free(newnode);
return false;
}
// +1 for '/0'
char *str_ptr = malloc(sizeof(char) * strlen(s) + 1);
if (!str_ptr) {
free(str_ptr);
free(newnode);
return false;
}
memcpy(str_ptr, s, strlen(s) + 1);
newnode->value = str_ptr;
list_add(&newnode->list, head);
return true;
}
```
### q_insert_tail
同insert_head, 但是這次 call 的 api 是 `list_add_tail`
```c
/* Insert an element at tail of queue */
bool q_insert_tail(struct list_head *head, char *s)
{
// similar to insert_head
element_t *newnode = malloc(sizeof(element_t));
if (!newnode) {
free(newnode);
return false;
}
// +1 for '/0'
char *str_ptr = malloc(sizeof(char) * strlen(s) + 1);
if (!str_ptr) {
free(str_ptr);
free(newnode);
return false;
}
memcpy(str_ptr, s, strlen(s) + 1);
newnode->value = str_ptr;
list_add_tail(&newnode->list, head);
return true;
}
```
### q_remove_head
將某個 node **remove** 是將它移出 list中, 並非將它回收掉, 此 function 中有限定:
* 將被移除節點的字串複製到 `*sp` 中.
而且此 function 只有給 pointer to head, 所以要透過 `list_first_entry` 或者 `list_entry` 找的對應的 element_t, 然後把此 element_t 中的字串複製到 `sp` 中
```c
element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize)
{
// copy remove node's value into sp, and limit sp size < bufsize
if (!head || list_empty(head))
return NULL;
element_t *del_node = list_first_entry(head, element_t, list);
list_del(&del_node->list);
if (sp) {
// when bufsize>1, sp can be assinged with char.
for (char *str_ptr = del_node->value; bufsize > 1;
sp++, str_ptr++, bufsize--)
*sp = *str_ptr;
*sp = '\0'; // end of string ends with \0
}
return del_node;
}
```
:::warning
Note:
執行 `$ make SANITIZER=1 ` 開啟 Address Sanitizer 後, 發現 remove 皆有 `heap over flow` 的問題, 原因是:
使用之前的 for loop 會導致越界, 試圖存取 heap 之外的的範圍, 改用 `strncpy` 可以解決問題
補充:
* 何謂 buffer overflow?
* 答: 當今天宣告一段記憶體位置, 當試圖存取宣告以外的地方, 稱之.
strcpy v.s strncpy
* strcpy 有 buffer overflow的問題
* strncpy 加入 count 參數來控制 copy 數量
可使用以下工具來 Debug:
* GDB ( 可搭配 AddressSanitizer )
* AddressSanitizer
* CoreDump ( 還沒看 )
:::
### q_remove_tail
跟 `q_remove_head` 的差別在於, 在找 `list_entry` 時, 給的位址並非 `head` 而是 `head->prev`, 此動作相當於找 tail.
```c
/* Remove an element from tail of queue */
element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize)
{
// /**
// * since each node has prev and next pointer,
// * we can simply find tail position in queue in O(1)
// */
if (!head || list_empty(head))
return NULL;
element_t *del_node =
list_entry(head->prev, element_t, list); // find entry for remove node
if (!del_node)
return NULL;
list_del(&del_node->list);
if (sp) {
// when bufsize>1, sp can be assinged with char.
for (char *str_ptr = del_node->value; bufsize > 1;
sp++, str_ptr++, bufsize--)
*sp = *str_ptr;
*sp = '\0'; // end of string ends with \0
}
return del_node;
}
```
### q_size
使用一變數 `len` 來紀錄 node 個數.
```c
/* Return number of elements in queue */
int q_size(struct list_head *head)
{
if (!head)
return 0;
int len = 0;
struct list_head *li;
list_for_each (li, head)
len++;
return len;
}
```
### q_delete_mid
使用兩種 pointer:
* fast pointer
* slow pointer
這兩種的差別是: fast pointer一次移動兩次, 而 slow 移動一次, 如果 fast 移動到終點, 那 slow 的所在位置即是中點.
```c
/* Delete the middle node in queue */
bool q_delete_mid(struct list_head *head)
{
// https://leetcode.com/problems/delete-the-middle-node-of-a-linked-list/
if (!head)
return false;
struct list_head *fast = head->next->next;
struct list_head *slow = head->next;
// after while loop, slow == mid_position
while (fast->prev != head && fast != head) {
fast = fast->next->next;
slow = slow->next;
}
// update queue
list_del(slow);
element_t *rmv_ptr = list_entry(slow, element_t, list);
q_release_element(rmv_ptr);
return true;
}
```
### q_swap
實作過程中, 因為 `list_mode` 可以把兩個節點交換, 所以只需要用兩個指標指向要交換的兩個點, 並且若第一個指標指向 `head` or `head->next`, 則不做交換.
```c
/* Swap every two adjacent nodes */
void q_swap(struct list_head *head)
{
// https://leetcode.com/problems/swap-nodes-in-pairs/
if (!head)
return;
// void list_add(*node, *target): insert node after target
// void list_del(*node): remove target node form list
struct list_head *n = head->next;
struct list_head *t = NULL;
while (n != head && n->next != head) {
t = n;
list_move(n, t->next);
n = n->next;
}
return;
}
```
### q_delete_dup
做的過程中, 首先我的想法是: 宣告一個類似 stack 的結構, 走訪每一的節點的同時, 如果這個點沒有看過, 就把它 push 進去 stack 中, 如果看過就 remove , 但是我誤會題目的意思了
這題的題意是: 如果 list 中有重複的值, 全部刪掉, 例如:
* list: 1->1->1->2->3->3->3
* after delete_dup
* list: 2
參考 `https://hackmd.io/@yanjiew/linux2023q1-lab0#q_delete_dup` 的作法
`Linux Kernal API` 中的 function:
* list_cut_position(head_to, head_from, node)
* head_to: 要移進去的 list
* head_from: 被移動的 list
* node: 指定 node 以前全部移到 head_to
* e.g.
* list_from: head->1->2->3->4->5
* node: 3
* 做完cut後
* list_from: head->4->5
* list_to: head->1->2->3
* 直觀的講, 如果把整個 list 看成一個 array, 有10個元素
* i.e. array[10]
* cut postion 4:
* 分出: arr[0:4], arr[5:]
* list_splice()
* 把一個 list 移到另外一個 list
### q_sort()
由於有要求複雜度在 nlgn 內, 選用 merget sort.
參考 `https://hackmd.io/@yanjiew/linux2023q1-lab0#q_delete_dup` 的作法
採用 `top down` 的 merge sort, 先找到 list 中點, 切成左右兩邊下去做排序.
### q_merge()
Intutive 的想法: 把所有 list 接起來, 再 merge.
分析: 沒有利用到 sort 後的優點
---
## 開啟 Address Sanitizer,修正 qtest 執行過程中的錯誤
錯誤來自 `q_remove_head` 以及 `q_remove_tail`
```shell
cmd> rh
=================================================================
==31019==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6040000008ba at pc 0x562cad2f2e78 bp 0x7ffe55680170 sp 0x7ffe55680160
READ of size 1 at 0x6040000008ba thread T0
#0 0x562cad2f2e77 in q_remove_head /home/eric/linux2023/lab0-c/queue.c:108
#1 0x562cad2ecb73 in do_remove /home/eric/linux2023/lab0-c/qtest.c:404
#2 0x562cad2ece06 in do_rh /home/eric/linux2023/lab0-c/qtest.c:461
#3 0x562cad2efdbe in interpret_cmda /home/eric/linux2023/lab0-c/console.c:181
#4 0x562cad2f085a in interpret_cmd /home/eric/linux2023/lab0-c/console.c:201
#5 0x562cad2f22ac in run_console /home/eric/linux2023/lab0-c/console.c:691
#6 0x562cad2ee7c0 in main /home/eric/linux2023/lab0-c/qtest.c:1227
#7 0x7f9c03229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#8 0x7f9c03229e3f in __libc_start_main_impl ../csu/libc-start.c:392
#9 0x562cad2e9d44 in _start (/home/eric/linux2023/lab0-c/qtest+0x9d44)
0x6040000008ba is located 0 bytes to the right of 42-byte region [0x604000000890,0x6040000008ba)
allocated by thread T0 here:
#0 0x7f9c036b4867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0x562cad2f23c3 in test_malloc /home/eric/linux2023/lab0-c/harness.c:133
#2 0x562cad2f2c87 in q_insert_tail /home/eric/linux2023/lab0-c/queue.c:82
#3 0x562cad2ed3a5 in do_it /home/eric/linux2023/lab0-c/qtest.c:313
#4 0x562cad2efdbe in interpret_cmda /home/eric/linux2023/lab0-c/console.c:181
#5 0x562cad2f085a in interpret_cmd /home/eric/linux2023/lab0-c/console.c:201
#6 0x562cad2f22ac in run_console /home/eric/linux2023/lab0-c/console.c:691
#7 0x562cad2ee7c0 in main /home/eric/linux2023/lab0-c/qtest.c:1227
#8 0x7f9c03229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/eric/linux2023/lab0-c/queue.c:108 in q_remove_head
Shadow bytes around the buggy address:
0x0c087fff80c0: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
0x0c087fff80d0: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
0x0c087fff80e0: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
0x0c087fff80f0: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
0x0c087fff8100: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
=>0x0c087fff8110: fa fa 00 00 00 00 00[02]fa fa fa fa fa fa fa fa
0x0c087fff8120: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8130: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8140: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8150: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8160: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==31019==ABORTING
```
使用 gdb 確認出錯地方, 使用前把 `exception_setup` 給註解掉, 否則無法進入 `q_remove_head` 內檢查
```shell
$ gdb qtest
$ b 402
$ r
cmd> new
l = []
cmd> it 4
l = [4]
cmd> rh
Breakpoint 1, do_remove (option=option@entry=0, argc=1, argv=<optimized out>) at qtest.c:402
402 if (current) //&& exception_setup(true))
(gdb) s
403 re = option ? q_remove_tail(current->q, removes, string_length + 1)
(gdb) s
q_remove_head (head=0x606000000040, sp=sp@entry=0x61d000000080 "", bufsize=1025) at queue.c:97
97 {
(gdb) n
99 if (!head || list_empty(head))
(gdb) n
101 element_t *del_node = list_first_entry(head, element_t, list);
(gdb) n
102 list_del(&del_node->list);
(gdb) n
103 if (sp) {
(gdb) n
106 for (char *str_ptr = del_node->value; bufsize > 1;
(gdb) finish
Run till exit from #0 q_remove_head (head=<optimized out>, sp=0x61d000000083 "\336", 'X' <repeats 199 times>...,
sp@entry=0x61d000000080 "4", bufsize=<optimized out>) at queue.c:107
```
進入 `q_remove_head` 中 for 迴圈後, 不斷 `n` 直到退出循環為止, 但是停止條件有誤, 故 for 迴圈會不斷執行直到 heap-buffer-overflow 觸發 exception 為止
這裡越界的指標是 `char *str_ptr`
解法改用 `strncpy`, 可防止 buffer-overflow
## 運用 Valgrind 排除 qtest 實作的記憶體錯誤
`q_remove_head` 以及 `q_remove_head` 還沒修復前, 執行 valgrind 結果如下
```shell
$ valgrind -q --leak-check=full ./qtest <traces/trace-01-ops.cmd
l = []
l = [gerbil]
l = [bear gerbil]
l = [dolphin bear gerbil]
==41444== Invalid read of size 1
==41444== at 0x10F7F3: q_remove_head (queue.c:108)
==41444== by 0x10C468: do_remove (qtest.c:404)
==41444== by 0x10C610: do_rh (qtest.c:461)
==41444== by 0x10DEED: interpret_cmda (console.c:181)
==41444== by 0x10E4A2: interpret_cmd (console.c:201)
==41444== by 0x10F10C: run_console (console.c:691)
==41444== by 0x10D2DF: main (qtest.c:1227)
==41444== Address 0x4b88f00 is 0 bytes after a block of size 48 alloc'd
==41444== at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==41444== by 0x10F203: test_malloc (harness.c:133)
==41444== by 0x10F6A7: q_insert_head (queue.c:55)
==41444== by 0x10CBDA: do_ih (qtest.c:224)
==41444== by 0x10DEED: interpret_cmda (console.c:181)
==41444== by 0x10E4A2: interpret_cmd (console.c:201)
==41444== by 0x10F10C: run_console (console.c:691)
==41444== by 0x10D2DF: main (qtest.c:1227)
==41444==
==41444== Invalid read of size 1
==41444== at 0x10F80A: q_remove_head (queue.c:106)
==41444== by 0x10C468: do_remove (qtest.c:404)
==41444== by 0x10C610: do_rh (qtest.c:461)
==41444== by 0x10DEED: interpret_cmda (console.c:181)
==41444== by 0x10E4A2: interpret_cmd (console.c:201)
==41444== by 0x10F10C: run_console (console.c:691)
==41444== by 0x10D2DF: main (qtest.c:1227)
==41444== Address 0x4b88f01 is 1 bytes after a block of size 48 alloc'd
==41444== at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==41444== by 0x10F203: test_malloc (harness.c:133)
==41444== by 0x10F6A7: q_insert_head (queue.c:55)
==41444== by 0x10CBDA: do_ih (qtest.c:224)
==41444== by 0x10DEED: interpret_cmda (console.c:181)
==41444== by 0x10E4A2: interpret_cmd (console.c:201)
==41444== by 0x10F10C: run_console (console.c:691)
==41444== by 0x10D2DF: main (qtest.c:1227)
==41444==
...
```
`Invalid read of size 1` 表示有越界讀取的行為發生
## 論文閱讀〈[Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf)〉
參考 [WeiCheng14159](https://hackmd.io/@WeiCheng14159/HyAa0RTVP#%E8%AB%96%E6%96%87%E7%A0%94%E8%AE%80)
方法: leakage detection test on the execution time.
具體之作法是給程式兩個不同類型的輸入資料(i.e. fix-vs-random 測試),並觀察給定不同的輸入資料,所造成的執行時間分佈在統計上是否相同。
### How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures White Paper
規格書中描述如何量測 `Eexecution time`
RDTSC ( Read Time-Stamp Counter and Processor ID IA assembly instruction )
基本上, 使用這個指令時,需要先把 `preemption` 給關掉,以及 `interrupts` 給關掉,避免因為搶佔 (preemption) 影響量測。以下是範例程式碼:
```c
static int __init hello_start(void)
{
unsigned long flags;
uint64_t start, end;
int variable = 0;
unsigned cycles_low, cycles_high, cycles_low1, cycles_high1;
printk(KERN_INFO "Loading test module...\n");
preempt_disable(); /*we disable preemption on our CPU*/
raw_local_irq_save(flags); /*we disable hard interrupts on our CPU*/
/*at this stage we exclusively own the CPU*/
asm volatile (
"RDTSC\n\t"
"mov %%edx, %0\n\t"
"mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low));
measured_function(&variable);
asm volatile (
"RDTSC\n\t"
"mov %%edx, %0\n\t"
"mov %%eax, %1\n\t": "=r" (cycles_high1), "=r" (cycles_low1));
raw_local_irq_restore(flags);
/*we enable hard interrupts on our CPU*/
preempt_enable();/*we enable preemption*/
start = ( ((uint64_t)cycles_high << 32) | cycles_low );
end = ( ((uint64_t)cycles_high1 << 32) | cycles_low1 );
printk(KERN_INFO "\n function execution time is %llu clock cycles", (end-
start));
return 0;
}
```
上面的程式看似合理, 但是有缺陷會影響評測, 例如:
* 關掉 interupt/ preemtion 的 function call overhead
* 包含 Calling CPUID 以及 RDTSC 本身
* Register Overwriting ( 待釐清, about register )
* Out of Order Execution ( 亂序執行 )
:::warning
* asm
用來寫 inline assembly,C 語言中為了效率等目的直接要求編譯器加入我們所指定組合語言。
* volatile 通常用於以下幾種情況:
外部設備/硬體介面:當變量是外部設備/硬體介面的寄存器時,該變量可能會在未知的時間被更改,因此使用 volatile 關鍵字可以告訴編譯器不要對這些變量進行某些最佳化,例如刪除未使用的變量或進行重複計算等。
多執行緒環境:在多執行緒環境下,多個執行緒可能同時訪問同一個變量,如果不使用 volatile 關鍵字,編譯器可能會對該變量進行某些最佳化,導致在執行緒間共享變量時出現問題。
非預期的控制流:當控制流在編譯器無法預測的方式下更改變量時,使用 volatile 關鍵字可以確保編譯器不會對變量進行某些最佳化。
總之,volatile 關鍵字告訴編譯器該變量是易變的,不能對其進行某些最佳化,以確保變量在某些情況下的可見性和可預測性。
:::
### 目前 `lab0-c` 內 `dudect` 缺陷
執行 `RDTSC` 前,並沒有`preemption` 給關掉,以及 `interrupts` 給關掉
:::danger
改進你的漢語表達
:notes: jserv
:::
### q_test 改善
觀察 `qtest.c`, 檢查 `insert head`, `insert_tail` 等動作時, 會呼叫 `is_##_const` 的 function, 這些 function 定義在 `fixture.h` 頭文件中
:::warning
C 語言中, ##是一個預處理器運算符, 以下節錄自 C99 規格書
6.10.3.3 The ## operator
Semantics
1 The resulting token is the concatenation of the original two tokens. The resulting token is available for further macro expansion, except in the following cases:
— It is formed by replacing a parameter by a token sequence that does not contain any of the following: ##, #, or a parameter;
— It is the result of a ## operator in a macro definition that was not used in a # operator with a string literal as its operand;
— It is a character constant, a string literal, or a header name (that is, the result of a < or a > operator in a #include directive).
簡而言之,`##` 的作用是將兩個 token 結合成一個新的 token 。該運算符通常用於擴展巨集(macro)中,以將多個 token 結合成一個新的巨集名稱、變量名稱、函數名稱等等。
**以上節錄自Chatgpt**
> :warning: 改進上方的書寫,留意 [資訊科技詞彙翻譯](https://hackmd.io/@sysprog/it-vocabulary)
> :notes: jserv
>> 收到!
:::
`fixture.c` 中, 存在一個函式 `measure`, 該 function 會呼叫 `cpucycles` 來計算程式執行期間的 cpu cycles 數量
`dudect` 中, 沒有按照 interl 白皮書中的那樣, 計算 cpu cycle 前關掉 `preemption` 及 `interrupts` .
根據這點, 利用 `perf` 來做實驗
```shell
$ perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./qtest < traces/trace-17-complexity.cmd
Performance counter stats for './qtest' (5 runs):
4438 cache-misses # 0.000 % of all cache refs ( +-40185.85% )
6,7127 cache-references ( +-3686501.81% )
134,6960 instructions # 0.00 insn per cycle ( +-11300715.53% )
112,5612 cycles ( +-5661047.29% )
15.54 +- 15.53 seconds time elapsed ( +-100.00% )
```
以下是統計來自 `user space` 的 instruction count
```shell
perf stat --repeat 5 -e cache-misses,cache-references,instructions:u,cycles:u ./qtest < traces/trace-17-complexity.cmd
Performance counter stats for './qtest' (5 runs):
3775 cache-misses # 0.000 % of all cache refs ( +-293615.29% )
7,4683 cache-references ( +-3633221.45% )
26,5826 instructions:u # 0.00 insn per cycle ( +-57370700.97% )
30,7086 cycles:u ( +-21566357.39% )
16.43 +- 16.43 seconds time elapsed ( +-100.00% )
```
使用白皮書內的寫法後, cycles 數量不減反增. 且無法 `#include <linux/preempt.h>`, 待查
:::warning
確認 cache 對效能的影響。
:notes: jserv
:::
以下參考自 [KHLee529](https://hackmd.io/@KHLee529/linux2023-lab0#lab0-%E4%B8%AD%E7%9A%84-dudect-%E5%AF%A6%E4%BD%9C)
論文中有以下論述:
> Cropping: Typical timing distributions are positively
skewed towards larger execution times. This may be caused by measurement artifacts, the main process being interrupted by the OS or other extraneous activities. In this case, it may be convenient to crop the measurements (or discard measurements that are larger than a fixed, class-independent threshold).
可以把 `outlier ( 離群值 )` 刪除
同時檢查 cpu cycles, 發現其實並沒有和前者有太大的差距, 但是通過 constant time 檢查
```shell
Performance counter stats for './qtest' (5 runs):
5973 cache-misses # 0.001 % of all cache refs ( +-18700.29% )
7,1710 cache-references ( +-598499.01% )
134,8395 instructions # 0.00 insn per cycle ( +-1760534.07% )
111,9976 cycles ( +-928018.50% )
2.56 +- 2.56 seconds time elapsed ( +- 99.98% )
```
---
## 確保 qtest 提供的 web 命令得以搭配佇列實作使用