Try   HackMD

2017q1 Homework1(phonebook)

contributed by < refleex >

Reviewed by king1224

  • phonebook_opt_hash 的 findName 執行時間是 0.0000 秒,你認為是不正常的嗎? 如果將要查找的 lastName 改變之後,時間還會是 0.0000 秒嗎
  • 我試著將查找目標改為 " aaaa ",卻出現了錯誤,append 是不是需要做修正
  • 可以詳細閱讀電話簿的輸入順序,main.c 中呼叫 append 和 findName 的時機,以及你是如何處理 append 和 findName 的,就會發現為什麼 findName 在這邊是 0 秒,為什麼不能查找別筆資料

main.c 中的 input array 即為查找目標
dictionary 資料夾中的 word.txt 為預設電話簿
洪正皇

開發環境

作業系統 : ubuntu 16.04 LTS (64-bit)
Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
每核心執行緒數:2
每通訊端核心數:2
Socket(s): 1
NUMA 節點: 1
供應商識別號: GenuineIntel
CPU 家族: 6
型號: 60
Model name: Intel® Core i5-4210M CPU @ 2.60GHz
製程: 3
CPU MHz: 1694.671
CPU max MHz: 3200.0000
CPU min MHz: 800.0000
BogoMIPS: 5187.92
虛擬: VT-x
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 3072K
NUMA node0 CPU(s): 0-3

進度

  • 2017/02/22

不用特別標時間,HackMD 會自動記錄更新,你只要及時整理進度和所見所聞即可 "jserv"


前置作業

先安裝並熟悉 perf 的使用

$ sudo apt-get install linux-tools-common
$ sudo apt-get install linux-tools-3.16.0-50-generic linux-cloud-tools-3.16.0-50-generic

但因為 kernel 版本不同,因此照出現的警示安裝相符的版本

$ sudo apt-get install linux-tools-4.4.0-63-generic
$ sudo apt-get install linux-cloud-tools-4.4.0-63-generic

接著就照 Perf 的說明繼續練習,先查看權限

$ cat /proc/sys/kernel/perf_event_paranoid

結果為 1

1

但在我要解除 kernel pointer 限制時

$ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"

查看權限的結果仍為 1

執行 example.c

$ ./example

得到 pid: 26710

$ perf top -p 26710

輸入上列指令,卻只得到 perf top 的參數列表

Usage: perf top [<options>]

    -a, --all-cpus        system-wide collection from all CPUs
    -b, --branch-any      sample any taken branches
    -c, --count <n>       event period to sample
    -C, --cpu <cpu>       list of cpus to monitor
    -d, --delay <n>       number of seconds to delay between refreshes
    -D, --dump-symtab     dump the symbol table used for profiling
    -E, --entries <n>     display this many functions
    -e, --event <event>   event selector. use 'perf list' to list available even
    -f, --count-filter <n>
                          only display functions with more events than this
    -F, --freq <n>        profile at this frequency
    -g                    enables call-graph recording and display
    -i, --no-inherit      child tasks do not inherit counters
    -j, --branch-filter <branch filter mask>
                          branch stack filter modes
    -K, --hide_kernel_symbols
                          hide kernel symbols
    -k, --vmlinux <file>  vmlinux pathname
    -M, --disassembler-style <disassembler style>
                          Specify disassembler style (e.g. -M intel for intel syntax)
                         .
                         .
                         .

上述部份試了很多種方法,結果不是沒變,不然就是

bash: syntax error near unexpected token `newline'

不過沒關係,大致重複來回做7個小時後,再參考一些網路上的文章,已經稍微對 perf 有些概念,所以正式開始這次的作業>優化


Original

先用 perf stat 檢查原始版本的 cash miss

$sudo perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig

結果如下

 Performance counter stats for './phonebook_orig' (100 runs):

         969,283      cache-misses              #   85.291 % of all cache refs      ( +-  0.19% )
       1,136,447      cache-references                                              ( +-  0.13% )
     261,184,322      instructions              #    1.47  insns per cycle          ( +-  0.02% )
     177,351,808      cycles                                                        ( +-  0.14% )

     0.056668601 seconds time elapsed                                          ( +-  0.15% )

cache-miss 85.291%

size of entry : 136 bytes
execution time of append() : 0.059108 sec
execution time of findName() : 0.005920 sec

可能的效能改進方向

  • 使用 binary search tree 或其他資料結構改寫
  • 藉由 hash function 來加速查詢,請詳閱 hash function 觀念和實務
  • 既然 first name, last name, address 都是合法的英文 (可假設成立),使用字串壓縮的演算法,降低資料表示的成本

優化 1 (struct)

打開 phonebook_opt.h

typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry; entry *findName(char lastName[], entry *pHead); entry *append(char lastName[], entry *e);

打開 phonebook_orig.c

entry *findName(char lastName[], entry *pHead) { while (pHead != NULL) { if (strcasecmp(lastName, pHead->lastName) == 0) return pHead; pHead = pHead->pNext; } return NULL; } entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; }

可以發現在phonebook_opt.h struct 所選宣告變數中 append() 和 findName() 只用到了 lastName

因此把不會用到的變數 用另外一個struct宣告

typedef struct __PHONE_BOOK_ENTRY1 { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } entry1; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct entry1 *detail; struct __PHONE_BOOK_ENTRY *pNext; } entry;

然後$ make plot

size of entry : 32 bytes
execution time of append() : 0.038840 sec
execution time of findName() : 0.005018 sec
Performance counter stats for './phonebook_opt' (100 runs):

         971,060      cache-misses              #   85.131 % of all cache refs      ( +-  0.17% )
       1,140,662      cache-references                                              ( +-  0.12% )
     261,089,459      instructions              #    1.47  insns per cycle          ( +-  0.02% )
     177,377,228      cycles                                                        ( +-  0.13% )

     0.056722433 seconds time elapsed                                          ( +-  0.16% )

85.131%,嗯。
modify struct

However, 我可憐的電腦,cache-miss 僅優化了 0.16%

不死心重複嘗試後,把原本的 phonebook 資料夾整個刪掉,重頭開始做。

#define OPT 1 typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_ENTRY *pNext; struct entry1 *detail; } entry; typedef struct __PHONE_BOOK_ENTRY1 { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } entry1; entry *findName(char lastName[], entry *pHead); entry *append(char lastName[], entry *e); #endif
entry *findName(char lastName[], entry *pHead) { while (pHead != NULL) { if (strcasecmp(lastName, pHead->lastName) == 0) return pHead; pHead = pHead->pNext; } return NULL; } entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; }

然後$ make plot

size of entry : 32 bytes
execution time of append() : 0.034790 sec
execution time of findName() : 0.002003 sec
 Performance counter stats for './phonebook_opt' (100 runs):

           142,859      cache-misses              #   47.150 % of all cache refs      ( +-  0.50% )
           302,990      cache-references                                              ( +-  0.60% )
       243,965,824      instructions              #    1.93  insns per cycle          ( +-  0.02% )
       126,343,413      cycles                                                        ( +-  0.26% )

       0.040863594 seconds time elapsed                                          ( +-  0.34% )

47.15% 總算回歸正常數值
newruntime


優化 2 ( hash function )

看完 hash function 觀念和實務 ,再上網找了一些關於hash function的文章
了解到 hash function 其實就是把一堆資料分門別類,再根據索引去搜尋。
所以如果把電話簿裡的名字以某種規則分類,若要找特定名字,便可在較小的範圍內搜尋,以減少搜索時間

所謂某種規則,便是 hash function ,雜湊函數。
上網找了一個 BKDR-hash function 便開始著手修改程式碼。

unsigned int hash(char lastName[]) { int hash_number = 0; int h = 53; for (int i = 0; lastName[i] != '\0'; i++) hash_number = (hash_number * h + lastName[i]) % TABLE; return hash_number; }

hash function 本身很好寫,難在 main 函式的更改,尤其 allocate memory 時極常出錯

size of entry : 32 bytes
execution time of append() : 0.073824 sec
execution time of findName() : 0.000000 sec
*** Error in `./phonebook_opt_hash': munmap_chunk(): invalid pointer: 0x00007ffd6f3b87a0 ***

上網找了一下關於 munmap_chunk(): invalid pointer 的問題,找到一篇提到:the program will crash if you pass a pointer to free() that has not been obtained through malloc().

修改一下程式碼後(包括 Makefile, calculate.c, runtime.gp等等)

size of entry : 32 bytes
execution time of append() : 0.056479 sec
execution time of findName() : 0.000000 sec
Performance counter stats for './phonebook_orig' (100 runs):

           570,732      cache-misses              #   59.378 % of all cache refs      ( +-  0.33% )
           961,182      cache-references                                              ( +-  0.31% )
       260,905,474      instructions              #    1.47  insns per cycle          ( +-  0.02% )
       177,699,696      cycles                                                        ( +-  0.42% )

       0.057268308 seconds time elapsed                                          ( +-  0.52% )

Performance counter stats for './phonebook_opt' (100 runs):

           139,321      cache-misses              #   46.176 % of all cache refs      ( +-  0.56% )
           301,716      cache-references                                              ( +-  0.53% )
       243,828,644      instructions              #    1.94  insns per cycle          ( +-  0.02% )
       125,408,510      cycles                                                        ( +-  0.29% )

       0.040634235 seconds time elapsed                                          ( +-  0.38% )

 Performance counter stats for './phonebook_opt_hash' (100 runs):

           742,301      cache-misses              #   59.684 % of all cache refs      ( +-  0.76% )
         1,243,727      cache-references                                              ( +-  0.24% )
       336,095,343      instructions              #    1.22  insns per cycle          ( +-  0.01% )
       275,331,193      cycles                                                        ( +-  0.30% )

       0.089550008 seconds time elapsed                                          ( +-  0.36% )

Origin 的 cash-miss 只有 59% ,正常不應該那麼低,所以我下載原版跑了一次,跟之前一樣約略是 85% ~ 86%。
應該是因為我在更改 main.c 時不小心更動某一部份(還不確定再哪),而 origin 有近乎和 hash 版本一樣的 cash-miss

findName 一直是0.0000秒,原本以為他是沒有進到 while迴圈而直接回傳 NULL ,可是試驗之後發現也不是,這邊一直想不透


參考資料