Try   HackMD

2018q1 Homework3 (list)

contributed by < BodomMoon >

tags BodomMoon list sysprog2018

github作業區作業要求

內容大綱

  • 測試環境
  • 函式庫優化測試

測試環境

$ lscpu
Architecture:        x86_64
CPU 作業模式:       32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
每核心執行緒數:     2
每通訊端核心數:     4
Socket(s):           1
NUMA 節點:          1
供應商識別號:       GenuineIntel
CPU 家族:           6
型號:               142
Model name:          Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
製程:               10
CPU MHz:            3181.463
CPU max MHz:         4000.0000
CPU min MHz:         400.0000
BogoMIPS:            3984.00
虛擬:               VT-x
L1d 快取:           32K
L1i 快取:           32K
L2 快取:            256K
L3 快取:            8192K
NUMA node0 CPU(s):   0-7

函式庫優化測試

list.h 中有許多函式可以進行類似以下的優化:

static inline void list_splice(struct list_head *list, struct list_head *head){
    //以下註解掉的皆為原始程式碼
    //struct list_head *head_first = head->next;
    //struct list_head *list_first = list->next;
    //struct list_head *list_last = list->prev;
    /*head->next = list_first;
    list_first->prev = head;
    list_last->next = head_first;
    head_first->prev = list_last;*/
    if (list_empty(list))
        return;
    list->prev->next = head->next;
    head->next->prev = list->prev;
    head->next = list->next;
    list->next->prev = head;
}

本修改是基於不再透過類似 temp 變數而是直接按照特定順序操作,犧牲少許可讀性減少多餘的記憶體空間使用的作法。但在修改的同時我也思考到「如此簡單的邏輯調換 gcc 會不會已經能夠自動處理?」

經過以下比對:
gcc -OX -S list_splice.c -o list_org
gcc -OX -S list_splice_test.c -o list_opt

OX測試過O2 O3

在以上測試中 opt 的版本比起 org 的版本指令數都的確有所減少,再經過 perf 的測試:

org 優化前的版本

perf stat --repeat 200 -e cache-misses,cache-references,instructions,cycles ./list_org

 Performance counter stats for './list_org' (200 runs):

             4,015      cache-misses              #   10.801 % of all cache refs      ( +-  6.16% )
            37,169      cache-references                                              ( +-  0.43% )
           556,034      instructions              #    0.75  insn per cycle           ( +-  0.15% )
           746,289      cycles                                                        ( +-  0.83% )

       0.000504717 seconds time elapsed                                          ( +-  2.38% )

opt 優化後的版本

perf stat --repeat 200 -e cache-misses,cache-references,instructions,cycles ./list_opt

 Performance counter stats for './list_opt' (200 runs):

               841      cache-misses              #    2.248 % of all cache refs      ( +-  8.58% )
            37,425      cache-references                                              ( +-  0.41% )
           556,999      instructions              #    0.85  insn per cycle           ( +-  0.14% )
           654,017      cycles                                                        ( +-  0.63% )

       0.000360601 seconds time elapsed                                          ( +-  1.06% )

可見犧牲部份可讀性後效能的確有所提昇

某種程度上用 bitwise 運算不也是在犧牲可讀性換取效能嗎?
BodomMoon