# 2018q1 Homework3 (list) contributed by < `BodomMoon` > ###### tags `BodomMoon` `list` `sysprog2018` [github]()、 [作業區](https://hackmd.io/s/S1iCyyziG#)、 [作業要求](https://hackmd.io/s/HkxZbJzif#) ## 內容大綱 * 測試環境 * 函式庫優化測試 ## 測試環境 ``` $ lscpu Architecture: x86_64 CPU 作業模式: 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 每核心執行緒數: 2 每通訊端核心數: 4 Socket(s): 1 NUMA 節點: 1 供應商識別號: GenuineIntel CPU 家族: 6 型號: 142 Model name: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz 製程: 10 CPU MHz: 3181.463 CPU max MHz: 4000.0000 CPU min MHz: 400.0000 BogoMIPS: 3984.00 虛擬: VT-x L1d 快取: 32K L1i 快取: 32K L2 快取: 256K L3 快取: 8192K NUMA node0 CPU(s): 0-7 ``` ## 函式庫優化測試 list.h 中有許多函式可以進行類似以下的優化: ```=c static inline void list_splice(struct list_head *list, struct list_head *head){ //以下註解掉的皆為原始程式碼 //struct list_head *head_first = head->next; //struct list_head *list_first = list->next; //struct list_head *list_last = list->prev; /*head->next = list_first; list_first->prev = head; list_last->next = head_first; head_first->prev = list_last;*/ if (list_empty(list)) return; list->prev->next = head->next; head->next->prev = list->prev; head->next = list->next; list->next->prev = head; } ``` 本修改是基於不再透過類似 temp 變數而是直接按照特定順序操作,犧牲少許可讀性減少多餘的記憶體空間使用的作法。但在修改的同時我也思考到「如此簡單的邏輯調換 gcc 會不會已經能夠自動處理?」 經過以下比對: `gcc -OX -S list_splice.c -o list_org` `gcc -OX -S list_splice_test.c -o list_opt` > OX測試過O2 O3 在以上測試中 opt 的版本比起 org 的版本指令數都的確有所減少,再經過 perf 的測試: org 優化前的版本 ``` perf stat --repeat 200 -e cache-misses,cache-references,instructions,cycles ./list_org Performance counter stats for './list_org' (200 runs): 4,015 cache-misses # 10.801 % of all cache refs ( +- 6.16% ) 37,169 cache-references ( +- 0.43% ) 556,034 instructions # 0.75 insn per cycle ( +- 0.15% ) 746,289 cycles ( +- 0.83% ) 0.000504717 seconds time elapsed ( +- 2.38% ) ``` opt 優化後的版本 ``` perf stat --repeat 200 -e cache-misses,cache-references,instructions,cycles ./list_opt Performance counter stats for './list_opt' (200 runs): 841 cache-misses # 2.248 % of all cache refs ( +- 8.58% ) 37,425 cache-references ( +- 0.41% ) 556,999 instructions # 0.85 insn per cycle ( +- 0.14% ) 654,017 cycles ( +- 0.63% ) 0.000360601 seconds time elapsed ( +- 1.06% ) ``` 可見犧牲部份可讀性後效能的確有所提昇 > 某種程度上用 bitwise 運算不也是在犧牲可讀性換取效能嗎? >[name=BodomMoon]