2017q1 Homework1 (phonebook)

# 2017q1 Homework1 (phonebook) contributed by < `s8w1e2ep` > ###### tags: `sys2017` `s8w1e2ep` `phonebook` ## Reviewed by `davis8211` * 可以說明為什麼使用 BKDR hash function。 > [各種字符串Hash函數比較](https://www.byvoid.com/zhs/blog/string-hash-compare) > 這個網頁裡面有各個hash function的比較實驗，其中BKDR hash的分數最高，因此我選擇使用BKDR hash function。[name=s8w1e2ep][color=#fc936a] * Hash table 設定的大小可以再說明的清楚一點，或許有人會對 44,887 這個大小有疑問。 > 44,887是一個質數，會選擇這麼大的數字，是因為 data set 有35萬筆資料。如果 size 太小，發生 collision 的機率會提高。導致搜尋時間變長。而 size 大到某個程度後，發生 collision 的機率相差不大，搜尋效率也不會變得更好，反而會佔取額外的空間。[name=s8w1e2ep][color=#fc936a] * Hash 版本的 code ```clike= entry *append(char lastName[], entry *e) { unsigned int hashValue = BKDRHash(lastName); e = (entry *) malloc(sizeof(entry)); e->pNext = hashTable[hashValue]; strcpy(e->lastName, lastName); e->detailEntry = NULL; e->pNext = NULL; hashTable[hashValue] = e; return e; } ``` * 第六行，你將新建立的 entry e 的指標指向 hashTable[hashValue]，但在第九行，又將指標指向 NULL，因此你的 hashTable 一直都把過去儲存的資料蓋掉，需再思考一下。 > 確實會蓋過過去儲存的資料，我會再做修改。[name=s8w1e2ep][color=#fc936a] ## 開發環境 ``` OS: Ubuntu 16.04.1 (LTS) Architecture: x86_64 CPU 作業模式： 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 每核心執行緒數：1 每通訊端核心數：4 Socket(s): 1 NUMA 節點： 1 供應商識別號： GenuineIntel CPU 家族： 6 型號： 158 Model name: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz 製程： 9 CPU MHz： 799.987 CPU max MHz: 3800.0000 CPU min MHz: 800.0000 BogoMIPS: 6800.00 虛擬： VT-x L1d 快取： 32K L1i 快取： 32K L2 快取： 256K L3 快取： 6144K NUMA node0 CPU(s): 0-3 ``` ## 前置作業 ### 終端機與Vim * [終端機與Vim設定](https://hackmd.io/s/HJv9naEwl#) * 終端機文字顏色 ![terminal color](https://i.imgur.com/nflUCt3.png) 因為我安裝的Ubuntu版本是16.04，因此語法上有些微不同。 ```shell PS1='\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\[\033[01;33m\]$\[\033[00m\] ' ``` * Vim設定 ``` set ai syntax on set background=dark set enc=utf8 set hls set number map <F4> : set nu!<BAR>set nonu?<CR> set ic set expandtab set tabstop=4 set shiftwidth=2 ``` ### Perf * [perf 原理和實務](https://hackmd.io/s/B11109rdg#) * 遇到的問題我照著步驟輸入下列指令 ```shell $ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict" ``` 接著輸入`$ perf top` 結果出現 ![](https://i.imgur.com/hr4S11z.png) 一開始想法是用Vim直接改/proc/sys/kernel/perf_event_paranoid這個檔案，後來發現無法強制改值。最後參考了這位大大的共筆。 * [ nekoneko - 共筆 ](https://hackmd.io/s/rJCs_ZVa#) 輸入下列指令 ```shell $ sudo sh -c " echo 1 > /proc/sys/kernel/perf_event_paranoid" ``` 再執行`$ perf top`，就解決啦！ ### Astyle 輸入下列指令就可以依照 K&R style 自動排版程式碼 ```shell $ astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch] ``` ### gnuplot * [gnuplot 語法解說和示範](https://hackmd.io/s/Skwp-alOg#) * [gnu plot 延伸閱讀](https://www.electricmonk.nl/log/2014/07/12/generating-good-looking-charts-with-gnuplot/) ## 開發紀錄 ### Cache 簡介 * [CPU Cache - Wiki](https://en.wikipedia.org/wiki/CPU_cache) * [Get cache line size](https://ubuntuforums.org/showthread.php?t=2277551) * [Cache原理和實驗](https://hackmd.io/MwMwRgHGDsIIYFoAm04DYEBZMEYcIhwCYBTBNABgFYlgSRYBOUoA?view) Cache entries : Data is transferred between memory and cache in blocks of fixed size, called cache lines or cache blocks. When a cache line is copied from memory into the cache, a cache entry is created. The cache entry will include the copied data as well as the requested memory location (called a tag). Cache line : CPU cache 中最小的緩存單位，也可以稱 cache block。 Cache miss : A cache miss is a failed attempt to read or write a piece of data in the cache, which results in a main memory access with much longer latency. 資料在 memory 與 cache 傳送時，會以固定大小的block型式傳遞，我們稱為 cache line 或是 cache block。當1個 cache line 從 memory 複製到 cache 時，1個 cache entry 會被創造。 Cache entry 包含了複製的資料以及請求的記憶體位址(又稱為 tag )。當程序需要讀取或寫入資料到記憶體位址時，首先會確認 cache 中對應的 entry 。並確認任一個可能包含此位址的 cahce line 的內容。如果程序在 cache 中找到位址，就會發生 cache hit ；反之，則會發生 cache miss 。在 cache hit 的情況下，程序可以從 cache line 中立即的讀取或寫入資料；在 cache miss 的情況下， cache 會分配一個新的 entry 的空間並將資料從記憶體複製到 entry 中，然後用 cache 的內容去完成請求。發生 Cache miss 的情況分為三種 1. instruction read miss (largest delay) 2. data read miss (smaller delay) 3. data write miss (shortest delay) ***Cache entry structure*** | tag | index | block offset | | :---: | :---: | :---: | * block offset: ⌈ log~2~( *b* ) ⌉ bits, *b* is the number of bytes per cache block. * index: ⌈ log~2~( *n* ) ⌉ bits, *n* is the number of ways. * tag : `the length of address` - `index` - `block offset` bits 輸入下列指令，可以得到自己電腦的cache line size，我的電腦是64B。 ```shell $ cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size ``` 執行下列指令可以查看 cache 詳細資訊 ```shell $ sudo dmidecode -t cache | more ``` 我的電腦是64位元的 OS ，因此位址長度為 64 bits。而 cache block 大小為 64B，L1 cache 使用的是 8 - way associative，且 L1d cache 大小為 32KB。 * data offset = log~2~64 = 6(bits) * index = log~2~(32(KB) / 64(B) / 8) = 6(bits) * tag = 64 - 6 - 6 = 52(bits) ### 計算 * 已知條件 - L1 cache size : 64KB - L1d size : 32KB (L1中用來處理 data 的部分) - cache line size = cache block size : 64B - 1 個 address size : 8B = 64 bits - 每個 entry size : 136B(data size) + 8B(data pointer) = 144B * 需要花費多少 cache block? - 350,000(總資料筆數) * 144 B = 50,400,000B - 50,400,000B / 64B * 2(兩個 function 存取) = 1,575,000(block) - 從下方原始版本的 cache miss 次數，可以看到 1,575,000 與 5,029,615 量級相同。reference 一次 = 讀 1 個 block * 有多少個 set? - 我的電腦是採用 8-way associative - 32KB / 64B / 8 = 64，每個 set 有 8 個 cache line, 共有 64 個 set * 每個 set 可以存多少筆資料? - 144B(entry size) / 64B(blcok) = 2.25，也就是說至少要有3個 block 才能完整存一筆資料。 - 每個 set 只能存 2 筆資料，因為每個 set 只有 8 個 block ### 原始版本 * append()與findName()執行時間先清除快取 `$ echo 1 | sudo tee /proc/sys/vm/drop_caches` 再執行`$ ./phonebook_orig` ``` size of entry : 136 bytes execution time of append() : 0.035636 sec execution time of findName() : 0.005177 sec ``` * cache miss 執行`$ make cache-test` 發現 cache-miss 高達 91% ``` Performance counter stats for './phonebook_orig' (100 runs): 4,586,848 cache-misses # 91.197 % of all cache refs ( +- 0.15% ) 5,029,615 cache-references ( +- 0.09% ) 262,487,884 instructions # 1.45 insn per cycle ( +- 0.02% ) 180,634,865 cycles ( +- 0.08% ) 0.050913126 seconds time elapsed ( +- 0.18% ) ``` ### 優化版本1 - 減少 phonebook 資料結構大小 * 修改了 phonebook_opt.h ```clike= typedef struct __PHONE_BOOK_DETAIL { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } detail; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; detail *detailEntry; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` * append() 與 findName() 執行時間執行$ ./phonebook_opt ``` size of entry : 32 bytes execution time of append() : 0.042310 sec execution time of findName() : 0.001569 sec ``` * cache miss 執行`$ make cache-test` cache miss 從原本 91% 降低到 63%，比原本好一些，但還是很高。 ``` Performance counter stats for './phonebook_opt' (100 runs): 1,401,857 cache-misses # 62.629 % of all cache refs ( +- 0.45% ) 2,238,341 cache-references ( +- 0.12% ) 244,525,333 instructions # 2.09 insn per cycle ( +- 0.02% ) 117,255,055 cycles ( +- 0.10% ) 0.032406767 seconds time elapsed ( +- 0.28% ) ``` * 優化後的phonebook效能比較圖 ![](https://i.imgur.com/7HQHF8w.png) ### 優化版本2 - 使用 hash function * [各種字符串Hash函數比較](https://www.byvoid.com/zhs/blog/string-hash-compare) * 使用 BKDR hash function 因為data set數量有350,000筆資料，所以我將 MAX_HASH_TABLE_SIZE 設定為 44,887。 ```clike= unsigned int BKDRHash(char *str) { unsigned int seed = 131, hash = 0; while(*str) { hash = hash * seed + (*str++); } return hash % MAX_HASH_TABLE_SIZE; } ``` * append() 與 findName() 執行時間執行$ ./phonebook_hash ``` size of entry : 32 bytes execution time of append() : 0.034605 sec execution time of findName() : 0.000000 sec ``` * cache miss 執行`$ make cache-test` cache miss 已經降低成 37%。跟原始版本相比減少了將近 6 成。 ``` Performance counter stats for './phonebook_hash' (100 runs): 586,421 cache-misses # 37.400 % of all cache refs ( +- 0.51% ) 1,567,973 cache-references ( +- 0.17% ) 242,581,266 instructions # 1.93 insn per cycle ( +- 0.02% ) 125,639,830 cycles ( +- 0.10% ) 0.035417917 seconds time elapsed ( +- 0.23% ) ``` * 與其它優化版本的phonebook效能比較圖 ![](https://i.imgur.com/I1Oh8gh.png)