Try   HackMD

2016q3 Homework1 (phonebook)

contributed by <ierosodin>

開發環境

作業系統 : CentOS 7

$ lscpu

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Genuine Intel® CPU @ 3.30GHz
Stepping: 5
CPU MHz: 1277.976
BogoMIPS: 6600.19
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0-11

軟體安裝

yum install perf
yum install gnuplot
astyle for centos
rpm -ivh astyle-2.03-3.el6.x86_64.rpm

Astyle

To format your file you can execute below command:

astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]

GitHub

fork to your GitHub

$ git add .
$ git commit -m 'Ver.1'
$ git remote set-url origin https://github.com/ierosodin/phonebook 
$ git push origin master

PhoneBook 效能分析

$ ./phonebook_orig &perf top -p $!

  21.55%  libc-2.17.so    [.] __strcasecmp_l_avx
  17.30%  phonebook_orig  [.] findName
   6.40%  [kernel]        [k] clear_page_c
   5.79%  libc-2.17.so    [.] _IO_fgets

熱點發生在strcasecmp( 不斷的比較字串 )

可能的改善方向

老師提到可能的效能改進方向有

  • 改寫 struct __PHONE_BOOK_ENTRY 的成員,搬動到新的結構中
  • 使用 hash function 來加速查詢
  • 既然 first name, last name, address 都是合法的英文 (可假設成立),使用字串壓縮的演算法,降低資料表示的成本
  • 使用 binary search tree 改寫演算法

原始Code效能

$ make run

size of entry : 136 bytes
execution time of append() : 0.083056 sec
execution time of findName() : 0.007171 sec

$ perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig

Performance counter stats for './phonebook_orig' (100 runs):

         1,990,809      cache-misses              #   86.735 % of all cache refs      ( +-  0.24% )
         2,295,273      cache-references                                              ( +-  0.11% )
       247,251,886      instructions              #    1.11  insns per cycle          ( +-  0.02% )
       223,340,345      cycles                                                        ( +-  0.24% )

       0.068523391 seconds time elapsed                                          ( +-  0.30% )

嘗試一( phonebook_opt.[ch] )

發現在search時, 只會使用到last name, 嘗試修改phonebook_opt.h中的__PHONE_BOOK_ENTRY, 僅保留char lastName[MAX_LAST_NAME_SIZE]

效能影響

$ watch -d -t "./phonebook_opt && echo 3 | sudo tee /proc/sys/vm/drop_caches"

size of entry : 32 bytes
execution time of append() : 0.039816 sec
execution time of findName() : 0.002610 sec

entry縮減到32 bytes, append與fineName時間都有明顯縮短
entry大小會影響到電腦cache的使用
$ perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_opt

Performance counter stats for './phonebook_opt' (100 runs):

           365,862      cache-misses              #   49.537 % of all cache refs      ( +-  0.90% )
           738,568      cache-references                                              ( +-  0.37% )
       224,825,485      instructions              #    1.65  insns per cycle          ( +-  0.02% )
       136,519,429      cycles                                                        ( +-  0.29% )

       0.042013263 seconds time elapsed                                          ( +-  0.37% )

嘗試二( phonebook_hash.[ch] )

嘗試老師提到的hash function加速查詢
需要在Makefile, main.c, calculate.c中作額外的修改
參考hash tables寫法 : http://algs4.cs.princeton.edu/34hash/

int hash = 0; for (int i = 0; i < s.length(); i++) hash = (R * hash + s.charAt(i)) % M;

效能影響

$ watch -d -t "./phonebook_hash && echo 3 | sudo tee /proc/sys/vm/drop_caches"

size of entry : 24 bytes
execution time of append() : 0.057173 sec
execution time of findName() : 0.000026 sec

hash tables可以減少資料的比對次數, 對search的速度有明顯的提升, 但實際使用時, 造成append的時間變長
是否有方法可以使hash tables對append的影響降低?
$ perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_hash

Performance counter stats for './phonebook_hash' (100 runs):

           269,028      cache-misses              #   36.329 % of all cache refs      ( +-  0.58% )
           740,533      cache-references                                              ( +-  0.77% )
       213,131,517      instructions              #    1.47  insns per cycle          ( +-  0.03% )
       145,081,096      cycles                                                        ( +-  0.63% )

       0.057804122 seconds time elapsed                                          ( +-  1.17% )

三種版本的比較圖