contributed by < Paintako
>
$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes:
39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz
CPU family: 6
Model: 165
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 5
CPU max MHz: 4300.0000
CPU min MHz: 800.0000
BogoMIPS: 5799.77
使用 malloc 在 heap 中配置 struct list_head 的大小, 如果失敗的話就 return NULL, 否則使用 Linux 核心風格的 list.h
所提供的 INIT_LIST_HEAD
巨集,後者作用是對一個 node 做初始化, 使 next/ prev pointer 皆指向自己
/* Create an empty queue */
struct list_head *q_new()
{
struct list_head *head = malloc(sizeof(struct list_head));
if (!head)
return NULL;
INIT_LIST_HEAD(head);
return head;
}
利用 Linux Kernal API 提供的 list_for_each_safe
function 來 iterate the whole list.
實作過程中報錯 ???, 皆與 allocated block
有關
查看 queue.h
才知道配置空間時是以 block 為單位
改進漢語書寫
jservImage Not Showing Possible ReasonsLearn More →
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
實作過程中, 只使用 free
是不夠的, 因為整個 queue 中的 node 有:
queue.h
提供的 q_release_element
來安全的釋放掉 node.q_release_element
函式:static inline void q_release_element(element_t *e)
{
test_free(e->value);
test_free(e);
}
void q_free(struct list_head *l)
{
if (!l)
return;
struct list_head *pos, *n = NULL;
// list_for_each_safe:
// iterate over a list safe against removal of list entry
list_for_each_safe (pos, n, l) {
if (!pos) {
free(l);
return;
}
free(pos);
element_t *free_point = list_entry(pos, element_t, list);
q_release_element(free_point);
}
// after removing all nodes in queue, free head pointer
free(l);
return;
}
依據 list.h
中的描述,佇列內每個節點皆是一個含有以下兩者的 struct:
所以在 malloc 時, 除了上述 struct 的空間需要配置外, 需要額外 malloc 傳入的字串, 需要特別注意的是, 當配置字串的大小時, 需要額外配置 1 byte, 原因是因為需要額外配置一個空間給 EOF
為何是 EOF
?
另外, 第8行以及第15/16行需要特別注意, 如果 malloc 失敗, 要回傳 false 前需要把剛剛配置的空間給釋放掉, 否則會造成 memory leak
.
bool q_insert_head(struct list_head *head, char *s)
{
// malloc a spcae for newnode
if (!head)
return false;
element_t *newnode = malloc(sizeof(element_t));
if (!newnode) {
free(newnode);
return false;
}
// +1 for '/0'
char *str_ptr = malloc(sizeof(char) * strlen(s) + 1);
if (!str_ptr) {
free(str_ptr);
free(newnode);
return false;
}
memcpy(str_ptr, s, strlen(s) + 1);
newnode->value = str_ptr;
list_add(&newnode->list, head);
return true;
}
同insert_head, 但是這次 call 的 api 是 list_add_tail
/* Insert an element at tail of queue */
bool q_insert_tail(struct list_head *head, char *s)
{
// similar to insert_head
element_t *newnode = malloc(sizeof(element_t));
if (!newnode) {
free(newnode);
return false;
}
// +1 for '/0'
char *str_ptr = malloc(sizeof(char) * strlen(s) + 1);
if (!str_ptr) {
free(str_ptr);
free(newnode);
return false;
}
memcpy(str_ptr, s, strlen(s) + 1);
newnode->value = str_ptr;
list_add_tail(&newnode->list, head);
return true;
}
將某個 node remove 是將它移出 list中, 並非將它回收掉, 此 function 中有限定:
*sp
中.而且此 function 只有給 pointer to head, 所以要透過 list_first_entry
或者 list_entry
找的對應的 element_t, 然後把此 element_t 中的字串複製到 sp
中
element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize)
{
// copy remove node's value into sp, and limit sp size < bufsize
if (!head || list_empty(head))
return NULL;
element_t *del_node = list_first_entry(head, element_t, list);
list_del(&del_node->list);
if (sp) {
// when bufsize>1, sp can be assinged with char.
for (char *str_ptr = del_node->value; bufsize > 1;
sp++, str_ptr++, bufsize--)
*sp = *str_ptr;
*sp = '\0'; // end of string ends with \0
}
return del_node;
}
Note:
執行 $ make SANITIZER=1
開啟 Address Sanitizer 後, 發現 remove 皆有 heap over flow
的問題, 原因是:
使用之前的 for loop 會導致越界, 試圖存取 heap 之外的的範圍, 改用 strncpy
可以解決問題
補充:
strcpy v.s strncpy
可使用以下工具來 Debug:
跟 q_remove_head
的差別在於, 在找 list_entry
時, 給的位址並非 head
而是 head->prev
, 此動作相當於找 tail.
/* Remove an element from tail of queue */
element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize)
{
// /**
// * since each node has prev and next pointer,
// * we can simply find tail position in queue in O(1)
// */
if (!head || list_empty(head))
return NULL;
element_t *del_node =
list_entry(head->prev, element_t, list); // find entry for remove node
if (!del_node)
return NULL;
list_del(&del_node->list);
if (sp) {
// when bufsize>1, sp can be assinged with char.
for (char *str_ptr = del_node->value; bufsize > 1;
sp++, str_ptr++, bufsize--)
*sp = *str_ptr;
*sp = '\0'; // end of string ends with \0
}
return del_node;
}
使用一變數 len
來紀錄 node 個數.
/* Return number of elements in queue */
int q_size(struct list_head *head)
{
if (!head)
return 0;
int len = 0;
struct list_head *li;
list_for_each (li, head)
len++;
return len;
}
使用兩種 pointer:
這兩種的差別是: fast pointer一次移動兩次, 而 slow 移動一次, 如果 fast 移動到終點, 那 slow 的所在位置即是中點.
/* Delete the middle node in queue */
bool q_delete_mid(struct list_head *head)
{
// https://leetcode.com/problems/delete-the-middle-node-of-a-linked-list/
if (!head)
return false;
struct list_head *fast = head->next->next;
struct list_head *slow = head->next;
// after while loop, slow == mid_position
while (fast->prev != head && fast != head) {
fast = fast->next->next;
slow = slow->next;
}
// update queue
list_del(slow);
element_t *rmv_ptr = list_entry(slow, element_t, list);
q_release_element(rmv_ptr);
return true;
}
實作過程中, 因為 list_mode
可以把兩個節點交換, 所以只需要用兩個指標指向要交換的兩個點, 並且若第一個指標指向 head
or head->next
, 則不做交換.
/* Swap every two adjacent nodes */
void q_swap(struct list_head *head)
{
// https://leetcode.com/problems/swap-nodes-in-pairs/
if (!head)
return;
// void list_add(*node, *target): insert node after target
// void list_del(*node): remove target node form list
struct list_head *n = head->next;
struct list_head *t = NULL;
while (n != head && n->next != head) {
t = n;
list_move(n, t->next);
n = n->next;
}
return;
}
做的過程中, 首先我的想法是: 宣告一個類似 stack 的結構, 走訪每一的節點的同時, 如果這個點沒有看過, 就把它 push 進去 stack 中, 如果看過就 remove , 但是我誤會題目的意思了
這題的題意是: 如果 list 中有重複的值, 全部刪掉, 例如:
參考 https://hackmd.io/@yanjiew/linux2023q1-lab0#q_delete_dup
的作法
Linux Kernal API
中的 function:
list_cut_position(head_to, head_from, node)
list_splice()
由於有要求複雜度在 nlgn 內, 選用 merget sort.
參考 https://hackmd.io/@yanjiew/linux2023q1-lab0#q_delete_dup
的作法
採用 top down
的 merge sort, 先找到 list 中點, 切成左右兩邊下去做排序.
Intutive 的想法: 把所有 list 接起來, 再 merge.
分析: 沒有利用到 sort 後的優點
錯誤來自 q_remove_head
以及 q_remove_tail
cmd> rh
=================================================================
==31019==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6040000008ba at pc 0x562cad2f2e78 bp 0x7ffe55680170 sp 0x7ffe55680160
READ of size 1 at 0x6040000008ba thread T0
#0 0x562cad2f2e77 in q_remove_head /home/eric/linux2023/lab0-c/queue.c:108
#1 0x562cad2ecb73 in do_remove /home/eric/linux2023/lab0-c/qtest.c:404
#2 0x562cad2ece06 in do_rh /home/eric/linux2023/lab0-c/qtest.c:461
#3 0x562cad2efdbe in interpret_cmda /home/eric/linux2023/lab0-c/console.c:181
#4 0x562cad2f085a in interpret_cmd /home/eric/linux2023/lab0-c/console.c:201
#5 0x562cad2f22ac in run_console /home/eric/linux2023/lab0-c/console.c:691
#6 0x562cad2ee7c0 in main /home/eric/linux2023/lab0-c/qtest.c:1227
#7 0x7f9c03229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#8 0x7f9c03229e3f in __libc_start_main_impl ../csu/libc-start.c:392
#9 0x562cad2e9d44 in _start (/home/eric/linux2023/lab0-c/qtest+0x9d44)
0x6040000008ba is located 0 bytes to the right of 42-byte region [0x604000000890,0x6040000008ba)
allocated by thread T0 here:
#0 0x7f9c036b4867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0x562cad2f23c3 in test_malloc /home/eric/linux2023/lab0-c/harness.c:133
#2 0x562cad2f2c87 in q_insert_tail /home/eric/linux2023/lab0-c/queue.c:82
#3 0x562cad2ed3a5 in do_it /home/eric/linux2023/lab0-c/qtest.c:313
#4 0x562cad2efdbe in interpret_cmda /home/eric/linux2023/lab0-c/console.c:181
#5 0x562cad2f085a in interpret_cmd /home/eric/linux2023/lab0-c/console.c:201
#6 0x562cad2f22ac in run_console /home/eric/linux2023/lab0-c/console.c:691
#7 0x562cad2ee7c0 in main /home/eric/linux2023/lab0-c/qtest.c:1227
#8 0x7f9c03229d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/eric/linux2023/lab0-c/queue.c:108 in q_remove_head
Shadow bytes around the buggy address:
0x0c087fff80c0: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
0x0c087fff80d0: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
0x0c087fff80e0: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
0x0c087fff80f0: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
0x0c087fff8100: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00 fa
=>0x0c087fff8110: fa fa 00 00 00 00 00[02]fa fa fa fa fa fa fa fa
0x0c087fff8120: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8130: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8140: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8150: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8160: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==31019==ABORTING
使用 gdb 確認出錯地方, 使用前把 exception_setup
給註解掉, 否則無法進入 q_remove_head
內檢查
$ gdb qtest
$ b 402
$ r
cmd> new
l = []
cmd> it 4
l = [4]
cmd> rh
Breakpoint 1, do_remove (option=option@entry=0, argc=1, argv=<optimized out>) at qtest.c:402
402 if (current) //&& exception_setup(true))
(gdb) s
403 re = option ? q_remove_tail(current->q, removes, string_length + 1)
(gdb) s
q_remove_head (head=0x606000000040, sp=sp@entry=0x61d000000080 "", bufsize=1025) at queue.c:97
97 {
(gdb) n
99 if (!head || list_empty(head))
(gdb) n
101 element_t *del_node = list_first_entry(head, element_t, list);
(gdb) n
102 list_del(&del_node->list);
(gdb) n
103 if (sp) {
(gdb) n
106 for (char *str_ptr = del_node->value; bufsize > 1;
(gdb) finish
Run till exit from #0 q_remove_head (head=<optimized out>, sp=0x61d000000083 "\336", 'X' <repeats 199 times>...,
sp@entry=0x61d000000080 "4", bufsize=<optimized out>) at queue.c:107
進入 q_remove_head
中 for 迴圈後, 不斷 n
直到退出循環為止, 但是停止條件有誤, 故 for 迴圈會不斷執行直到 heap-buffer-overflow 觸發 exception 為止
這裡越界的指標是 char *str_ptr
解法改用 strncpy
, 可防止 buffer-overflow
q_remove_head
以及 q_remove_head
還沒修復前, 執行 valgrind 結果如下
$ valgrind -q --leak-check=full ./qtest <traces/trace-01-ops.cmd
l = []
l = [gerbil]
l = [bear gerbil]
l = [dolphin bear gerbil]
==41444== Invalid read of size 1
==41444== at 0x10F7F3: q_remove_head (queue.c:108)
==41444== by 0x10C468: do_remove (qtest.c:404)
==41444== by 0x10C610: do_rh (qtest.c:461)
==41444== by 0x10DEED: interpret_cmda (console.c:181)
==41444== by 0x10E4A2: interpret_cmd (console.c:201)
==41444== by 0x10F10C: run_console (console.c:691)
==41444== by 0x10D2DF: main (qtest.c:1227)
==41444== Address 0x4b88f00 is 0 bytes after a block of size 48 alloc'd
==41444== at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==41444== by 0x10F203: test_malloc (harness.c:133)
==41444== by 0x10F6A7: q_insert_head (queue.c:55)
==41444== by 0x10CBDA: do_ih (qtest.c:224)
==41444== by 0x10DEED: interpret_cmda (console.c:181)
==41444== by 0x10E4A2: interpret_cmd (console.c:201)
==41444== by 0x10F10C: run_console (console.c:691)
==41444== by 0x10D2DF: main (qtest.c:1227)
==41444==
==41444== Invalid read of size 1
==41444== at 0x10F80A: q_remove_head (queue.c:106)
==41444== by 0x10C468: do_remove (qtest.c:404)
==41444== by 0x10C610: do_rh (qtest.c:461)
==41444== by 0x10DEED: interpret_cmda (console.c:181)
==41444== by 0x10E4A2: interpret_cmd (console.c:201)
==41444== by 0x10F10C: run_console (console.c:691)
==41444== by 0x10D2DF: main (qtest.c:1227)
==41444== Address 0x4b88f01 is 1 bytes after a block of size 48 alloc'd
==41444== at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==41444== by 0x10F203: test_malloc (harness.c:133)
==41444== by 0x10F6A7: q_insert_head (queue.c:55)
==41444== by 0x10CBDA: do_ih (qtest.c:224)
==41444== by 0x10DEED: interpret_cmda (console.c:181)
==41444== by 0x10E4A2: interpret_cmd (console.c:201)
==41444== by 0x10F10C: run_console (console.c:691)
==41444== by 0x10D2DF: main (qtest.c:1227)
==41444==
...
Invalid read of size 1
表示有越界讀取的行為發生
方法: leakage detection test on the execution time.
具體之作法是給程式兩個不同類型的輸入資料(i.e. fix-vs-random 測試),並觀察給定不同的輸入資料,所造成的執行時間分佈在統計上是否相同。
規格書中描述如何量測 Eexecution time
RDTSC ( Read Time-Stamp Counter and Processor ID IA assembly instruction )
基本上, 使用這個指令時,需要先把 preemption
給關掉,以及 interrupts
給關掉,避免因為搶佔 (preemption) 影響量測。以下是範例程式碼:
static int __init hello_start(void)
{
unsigned long flags;
uint64_t start, end;
int variable = 0;
unsigned cycles_low, cycles_high, cycles_low1, cycles_high1;
printk(KERN_INFO "Loading test module...\n");
preempt_disable(); /*we disable preemption on our CPU*/
raw_local_irq_save(flags); /*we disable hard interrupts on our CPU*/
/*at this stage we exclusively own the CPU*/
asm volatile (
"RDTSC\n\t"
"mov %%edx, %0\n\t"
"mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low));
measured_function(&variable);
asm volatile (
"RDTSC\n\t"
"mov %%edx, %0\n\t"
"mov %%eax, %1\n\t": "=r" (cycles_high1), "=r" (cycles_low1));
raw_local_irq_restore(flags);
/*we enable hard interrupts on our CPU*/
preempt_enable();/*we enable preemption*/
start = ( ((uint64_t)cycles_high << 32) | cycles_low );
end = ( ((uint64_t)cycles_high1 << 32) | cycles_low1 );
printk(KERN_INFO "\n function execution time is %llu clock cycles", (end-
start));
return 0;
}
上面的程式看似合理, 但是有缺陷會影響評測, 例如:
用來寫 inline assembly,C 語言中為了效率等目的直接要求編譯器加入我們所指定組合語言。
外部設備/硬體介面:當變量是外部設備/硬體介面的寄存器時,該變量可能會在未知的時間被更改,因此使用 volatile 關鍵字可以告訴編譯器不要對這些變量進行某些最佳化,例如刪除未使用的變量或進行重複計算等。
多執行緒環境:在多執行緒環境下,多個執行緒可能同時訪問同一個變量,如果不使用 volatile 關鍵字,編譯器可能會對該變量進行某些最佳化,導致在執行緒間共享變量時出現問題。
非預期的控制流:當控制流在編譯器無法預測的方式下更改變量時,使用 volatile 關鍵字可以確保編譯器不會對變量進行某些最佳化。
總之,volatile 關鍵字告訴編譯器該變量是易變的,不能對其進行某些最佳化,以確保變量在某些情況下的可見性和可預測性。
lab0-c
內 dudect
缺陷執行 RDTSC
前,並沒有preemption
給關掉,以及 interrupts
給關掉
改進你的漢語表達
觀察 qtest.c
, 檢查 insert head
, insert_tail
等動作時, 會呼叫 is_##_const
的 function, 這些 function 定義在 fixture.h
頭文件中
C 語言中, ##是一個預處理器運算符, 以下節錄自 C99 規格書
6.10.3.3 The ## operator
Semantics
1 The resulting token is the concatenation of the original two tokens. The resulting token is available for further macro expansion, except in the following cases:
— It is formed by replacing a parameter by a token sequence that does not contain any of the following: ##, #, or a parameter;
— It is the result of a ## operator in a macro definition that was not used in a # operator with a string literal as its operand;
— It is a character constant, a string literal, or a header name (that is, the result of a < or a > operator in a #include directive).
簡而言之,##
的作用是將兩個 token 結合成一個新的 token 。該運算符通常用於擴展巨集(macro)中,以將多個 token 結合成一個新的巨集名稱、變量名稱、函數名稱等等。
以上節錄自Chatgpt
改進上方的書寫,留意 資訊科技詞彙翻譯Image Not Showing Possible ReasonsLearn More →
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
jservImage Not Showing Possible ReasonsLearn More →
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
收到!
fixture.c
中, 存在一個函式 measure
, 該 function 會呼叫 cpucycles
來計算程式執行期間的 cpu cycles 數量
dudect
中, 沒有按照 interl 白皮書中的那樣, 計算 cpu cycle 前關掉 preemption
及 interrupts
.
根據這點, 利用 perf
來做實驗
$ perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./qtest < traces/trace-17-complexity.cmd
Performance counter stats for './qtest' (5 runs):
4438 cache-misses # 0.000 % of all cache refs ( +-40185.85% )
6,7127 cache-references ( +-3686501.81% )
134,6960 instructions # 0.00 insn per cycle ( +-11300715.53% )
112,5612 cycles ( +-5661047.29% )
15.54 +- 15.53 seconds time elapsed ( +-100.00% )
以下是統計來自 user space
的 instruction count
perf stat --repeat 5 -e cache-misses,cache-references,instructions:u,cycles:u ./qtest < traces/trace-17-complexity.cmd
Performance counter stats for './qtest' (5 runs):
3775 cache-misses # 0.000 % of all cache refs ( +-293615.29% )
7,4683 cache-references ( +-3633221.45% )
26,5826 instructions:u # 0.00 insn per cycle ( +-57370700.97% )
30,7086 cycles:u ( +-21566357.39% )
16.43 +- 16.43 seconds time elapsed ( +-100.00% )
使用白皮書內的寫法後, cycles 數量不減反增. 且無法 #include <linux/preempt.h>
, 待查
確認 cache 對效能的影響。
以下參考自 KHLee529
論文中有以下論述:
Cropping: Typical timing distributions are positively
skewed towards larger execution times. This may be caused by measurement artifacts, the main process being interrupted by the OS or other extraneous activities. In this case, it may be convenient to crop the measurements (or discard measurements that are larger than a fixed, class-independent threshold).
可以把 outlier ( 離群值 )
刪除
同時檢查 cpu cycles, 發現其實並沒有和前者有太大的差距, 但是通過 constant time 檢查
Performance counter stats for './qtest' (5 runs):
5973 cache-misses # 0.001 % of all cache refs ( +-18700.29% )
7,1710 cache-references ( +-598499.01% )
134,8395 instructions # 0.00 insn per cycle ( +-1760534.07% )
111,9976 cycles ( +-928018.50% )
2.56 +- 2.56 seconds time elapsed ( +- 99.98% )