# 2016q3 Homework1 (phonebook) contributed by <`nekoneko`> ###### tags: `sys2016` `nekoneko` `homework` ## Reviewed by `ChenYi` * 請試著把#define OPT 1 改成 #define OPT 0試試看 * #if defined(var)使用方法請詳見[Gnu文件-Defined](https://gcc.gnu.org/onlinedocs/gcc-5.3.0/cpp/Defined.html) * 記得commit前將沒有用到的code註解刪除 * 如果要測的資料遠大於hash table所能使用的空間的話,處理方案為何? ## 開發環境 - 系統環境: Lubuntu 16.04.1 LTS(64-bit) - linux kernel 版本: 4.4.0-38-generic - processor: 4x Intel(R) Core(TM) i3 CPU M330 @2.13GHz - memory: 3833MB - L1d cache: 32K - L1i cache: 32K - L2 cache: 256K - L3 cache: 3072K ## Git設定 - 因為把舊有的Ubuntu 14.04重灌成lubuntu 16.04.1,git 和 ssh key的設定都要重新來。參考[GitHub 設定指引](http://wiki.csie.ncku.edu.tw/github)和[Generating an SSH key](https://help.github.com/articles/generating-an-ssh-key/)完成git ssh key的設定。 - 參考[初次設定Git](https://git-scm.com/book/zh-tw/v1/%E9%96%8B%E5%A7%8B-%E5%88%9D%E6%AC%A1%E8%A8%AD%E5%AE%9AGit),設定git configure - 參照dada同學上學期的[共筆](https://embedded2016.hackpad.com/ep/pad/static/YkqjhwgnQcA),設定terminal prompt的顏色。 ![terminal prompt](https://i.imgur.com/8yZelfE.png) - clone完phonebook repository,做以下設定便可以使用ssh key做`git push` `$ git remote add origin git@github.com:your_account/your_repository.git` ## vimrc設定 其實以前都有設定過,只是這次重灌,重新設定之外,做點紀錄 - 參考[vimrc設定教學](http://wiki.csie.ncku.edu.tw/vim/vimrc),做基本設定 - 紀錄離開前的行號([參考資料](http://vim.wikia.com/wiki/Restore_cursor_to_file_position_in_previous_editing_session)) - 漂亮的版本分支圖 ([Pretty git branch graphs](http://stackoverflow.com/questions/1057564/pretty-git-branch-graphs)) ## Astyle ``` astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch] ``` - --style=kr : bracket style為K&R的風格 - --suffix=none : 取消自動備份 ## gnu plot ## Perf 照著`Linux 效能分析工具: Perf`資料的`perf_top_example.c`來執行perf top,但是~~沒有跑出預期結果~~ - 將kernel.perf_event_paranoid改為-1(權限全開) ```txt $ sudo sh -c " echo -1 > /proc/sys/kernel/perf_event_paranoid" $ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict" ``` - 編譯 ```txt g++ -c perf_top_example.c g++ perf_top_example.o -o example ./example ``` - 執行perf ``` sudo perf top -p $pid ``` ![](https://i.imgur.com/OExdpHJ.png) >`sudo perf top` 顯示也還是如上圖,但是下`-e`參數的話,是會有資訊的。應該是文件沒看完,看熟文件後,來想辦法找出一點頭緒[name=cheng hung lin] - `cycles`event 參考文件中,提到預設的perfomance event為cycles,然而從上圖來看,預設卻是`cycles:pp`。接著執行`perf list`來查閱有cycles關鍵字的event。 - cpu-cycles OR cycles - ref-cycles - stalled-cycles-backend OR idle-cycles-backend - stalled-cycles-frontend OR idle-cycles-frontend - cpu-cycles OR cpu/cpu-cycles/ 實際上,在我的筆電上是沒有`cycles:pp`的event。所以改成以下就可看到我們要的`cycles`event ```txt $ sudo perf top -e cycles -p $pid ``` - 結果 ![](https://i.imgur.com/scsVb5W.png) >「程式輸出的文字訊息」請勿使用圖片表示,請改由文字貼上[name=課程助教] > 好的,之後不會再使用圖片,謝謝提醒[name=cheng hung lin] > 可以用 `perf --stdio` 來作為文字輸出 [name=louielu] ### perf 基本指令 - perf top - -e \<event\>: 選擇要測試的PMU event - -p \<pid\>: 針對某個$pid process做profiling - perf state - --repeat \<n\>: 重複執行n次 - :u : 測量的instructions只限在user level - :k : kernel level - perf record - 紀錄command的profile到perf.data - -F: 取樣的平均頻率 - -c: 取樣的間距 - example: -F 250,以平均每秒250個sample的平率取樣 - example: -c 2000,監測event每2000次取樣一次 - perf report - 讀取perf.data來顯示profile ### 背景知識 #### Branch Prediction - Static predict v.s. Dynamic predict - Branch History Table (BHT) - Correlating Branch Predictor - General (m,n) Branch Predictors - Branch Target Buffer (BTB) #### if-else statement ```clike for (int i = 0; i < max; i++) if (<condition>) sum++; ``` ![branch fetch](https://i.imgur.com/xY8YjiD.png) - [ ]**[Branch Patterns, Using GCC](http://cellperformance.beyond3d.com/articles/2006/04/branch-patterns-using-gcc.html)** > 之後有時間再回來看[name=cheng hung lin] ## Cache miss 複習白算盤 4e 5.2 章節 - 書中提到cache miss對於效能的影響 *The processing of a cache miss create a pipeline stall as opposed to an interrupt, which would require saving the state of registers* - cahe miss on write - write-through: 當write miss 發生時會同時寫入cache 和main memory,但是會花長的時間 - write buffer: processor將要寫入的資料放進write buffer後,就可繼續執行而不被暫停。當processor寫入的速度小於寫入main memory時,processor仍然會有stall產生的機會。(burst) - write-back: 只先寫入cache,當cache blocks將被替換,再寫回main memory(原文是:`The modified block is written to the lower level of the hierarchy when it is replaced.`) ## phonebook ### 原始版本 `$ echo 1 | sudo tee /proc/sys/vm/drop_caches` `$ sudo perf stat -r 10 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses,instructions,cycles ./phonebook_orig` - 觀察的事件有 - cache-misses - cache-references - L1-dcache-load-misses - L1-dcache-store-misses - L1-dcache-prefetch-misses - L1-icache-load-misses - instructions - cycles - 結果 ```txt size of entry : 136 bytes execution time of append() : 0.085346 sec execution time of findName() : 0.011937 sec Performance counter stats for './phonebook_orig' (10 runs): 988,062 cache-misses # 93.309 % of all cache refs ( +- 0.87% ) (61.95%) 1,058,908 cache-references ( +- 0.63% ) (61.94%) 2,644,837 L1-dcache-load-misses ( +- 0.56% ) (61.94%) 877,813 L1-dcache-store-misses ( +- 1.34% ) (50.10%) 1,294,117 L1-dcache-prefetch-misses ( +- 1.38% ) (52.38%) 526,219 L1-icache-load-misses ( +- 5.38% ) (54.38%) 260,342,568 instructions # 1.00 insns per cycle ( +- 0.36% ) (65.70%) 260,840,687 cycles ( +- 0.61% ) (63.41%) 0.129547167 seconds time elapsed ( +- 2.20% ) ``` - caches-misses高達93.309% - append() 的時間 0.085346 sec - findName() 的時間 0.011937 sec `$ perf record -F 12500 -e cache-misses ./phonebook_orig && perf report` ![](https://i.imgur.com/gknX4j2.png) - findName的Overhead佔10.73% - ~~append所花的時間相對findName來的多~~,但是cache miss的測試上,findName比append的overhead還來的多。 - 從main.c的程式碼來看,append和findName分別是用cpu_time1和cpu_time2這兩個變數來紀錄,cpu_time1紀錄的時間是整筆資料append的時間,也就是呼叫了349900次的append(),時間上本來就會比findName大 - append: ``` 0.05% phonebook_orig phonebook_orig [.] append ``` > append在cache miss的測試上,並沒有每次都有紀錄。[name=cheng hung lin] - 原始版本的圖 ![phonebook_origin](https://i.imgur.com/JFf53cK.png) ### 優化版本1 - 調整struct內容 - 目標: 縮小struct的大小 原始struct大小為136Byte,findName 和 append需要只有last name的資訊。 - 原始的entry ```clike=0 /* original version */ typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` - 縮小過後的struct ```clike=0 #define OPT 1 typedef struct __DETAIL_ENTRY { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } detailEntry; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; detailEntry *detail; /* char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; */ struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` - phonebook_opt.h 裡要把`#define OPT 1`註解拿掉,回到main.c裡可以看到 ``` #if defined(OPT) output = fopen("opt.txt", "a"); #else ``` - `defined`[說明](https://gcc.gnu.org/onlinedocs/cpp/Defined.html) - perf stat 結果 ```txt size of entry : 32 bytes execution time of append() : 0.103262 sec execution time of findName() : 0.016726 sec Performance counter stats for './phonebook_opt' (10 runs): 1,669,477 cache-misses # 94.412 % of all cache refs ( +- 0.44% ) (62.14%) 1,768,295 cache-references ( +- 0.45% ) (62.22%) 2,823,014 L1-dcache-load-misses ( +- 0.47% ) (62.07%) 1,058,923 L1-dcache-store-misses ( +- 0.87% ) (50.66%) 1,672,812 L1-dcache-prefetch-misses ( +- 2.14% ) (53.15%) 559,604 L1-icache-load-misses ( +- 7.74% ) (52.70%) 330,247,777 instructions # 1.00 insns per cycle ( +- 0.63% ) (64.64%) 328,716,232 cycles ( +- 0.41% ) (62.74%) 0.161549694 seconds time elapsed ( +- 1.50% ) ``` ![phonebook_opt](https://i.imgur.com/0V2NiO5.png) 如同參考文件寫的,append的時候還有malloc detailEntry的話,cache miss不降反生,從93.309%升到94.412 %,但是這裡可以看出來findName的cache miss比原先版本的下降,從10.3%到3.03%。 > 那94.412%的整體cache miss是給detail配置記憶體空間造成的,但要從何才能觀察到呢?[name=cheng hung lin] > 原本編譯參數自己有加上`-g`,忘記刪除,所以上面數據重新更改為去除`-g`flags[name=cheng hung lin][time=Mon, Sep 26, 2016 8:27 PM] detail pointer不配置記憶體而先初始成NULL ```txt size of entry : 32 bytes execution time of append() : 0.072660 sec execution time of findName() : 0.004983 sec Performance counter stats for './phonebook_opt' (10 runs): 349,355 cache-misses # 78.215 % of all cache refs ( +- 0.77% ) (59.17%) 446,659 cache-references ( +- 1.63% ) (59.07%) 954,744 L1-dcache-load-misses ( +- 3.86% ) (60.24%) 295,731 L1-dcache-store-misses ( +- 3.02% ) (53.95%) 1,506,670 L1-dcache-prefetch-misses ( +- 5.08% ) (57.94%) 272,066 L1-icache-load-misses ( +- 4.59% ) (55.98%) 261,005,283 instructions # 1.41 insns per cycle ( +- 0.98% ) (66.21%) 185,111,487 cycles ( +- 0.69% ) (61.76%) 0.102715691 seconds time elapsed ( +- 12.42% ) ``` ![phonebook_opt](https://i.imgur.com/vnVSsyJ.png) - findName cache misses Overhead: 2.69% ![phonebook_opt](https://i.imgur.com/QOfEhzN.png) - append(): 時間減少**15.42%** 速度為原來的1.18倍 - findName(): 時間減少**58.22%** 速度為原來的2.39倍 ### 優化版本2 -- hash function - hash 當初資料結構沒學好,從原本的教科書(Fundamentals of data structures in c)和網路資料看起。 - 定義: 將key經過hash function預算後,轉化成bucket address。 - collision:不同的key,產生相同的hash值 (frome wikipedia) - overflow: bucket裡的slot滿了 - 實做 參考了Yuron大大和ayueh0822大大的共筆,初步了解hashing的過程,分別是 建立hash table、hash function把key轉成buckect address、將資料存進bucket address。因為在插入資料到hash table時,有可能會有overflow和collision的狀況,從ayueh0822大大共筆的圖來看,每個buckect 的slot用linked list來存可以避免這樣個狀況。 > overflow和collision其實還是沒有很懂,等實做完後再來查資料。 - [ ] overflow and collision - hash table structure ```clike= #define TWO_POWER_NUM 8 #define MAX_HASH_TABLE_SIZE 1 << TWO_POWER_NUM typedef struct __HASH_SLOT { entry *data; struct __HASH_SLOT *pNext; } slot_unit; typedef struct __HASH_BUCKET { slot_unit *slot; } hash_bucket; hash_bucket hashTable[MAX_HASH_TABLE_SIZE]; ``` 再精簡一些 ```clike= #define TWO_POWER_NUM 8 #define MAX_HASH_TABLE_SIZE 1 << TWO_POWER_NUM typedef struct __HASH_SLOT { entry *data; struct __HASH_SLOT *pNext; } slot_unit; slot_unit hashTable[MAX_HASH_TABLE_SIZE]; ``` - string conversion 參照Yuron大大的筆記和資節課本,單純將每個字元相加產生key ```clike= unsigned int stringToInt(char *key) { int number = 0; while (*key) number += *key++; return number; } ``` - unsigned int v.s. int bit 的表示差別 - hashFunction - 方法:division ```clike= unsigned int hashFunction(unsigned int key) { return key & ((1<<TWO_POWER_NUM)-1); } ``` - [ ]`key & ((1<<TWO_POWER_NUM)-1)`是否作到預期結果 - [ ]是否有比`key % 256`來的快 - findName ```clike= entry *findName(char lastname[], entry *pHead) { unsigned int hashPos; slot_unit *slot; hashPos = hashFunction(stringToInt(lastname)); slot = hashTable[hashPos]; while (slot!=NULL) { if (strcasecmp(lastname, slot->data->lastName) == 0) return slot->data; slot = slot->pNext; } return NULL; } ``` - append ```clike= /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e->detail = NULL; e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; unsigned int hashPos; slot_unit *newSlot; hashPos = hashFunction(stringToInt(lastName)); newSlot = (slot_unit *) malloc(sizeof(slot_unit)); newSlot->pNext = hashTable[hashPos]; newSlot->data = e; hashTable[hashPos] = newSlot; ``` - main.c hash table 初始化 ```clike= #if defined(HASH) unsigned int __i; for (__i=0;__i < MAX_HASH_TABLE_SIZE;++__i) hashTable[__i] = NULL; #endif ``` > ~~這個版本的程式碼,無法跑出結果,應該是卡在某個loop中,來debug~~[name=cheng hung lin][time=Tue, Sep 27, 2016 3:09 AM] > > `stringToInt`函式第五行`number += *key;`,改成`*key++`[name=cheng hung lin][time=Tue, Sep 27, 2016 3:28 AM] > 使用hash table的資料結構,代表說原本的linked list可以不需要用到了,目前的版本保留原本的linked list是多餘的,等bug解完再修[name=cheng hung lin] - 結果 ```txt size of entry : 32 bytes execution time of append() : 0.103418 sec execution time of findName() : 0.000000 sec Performance counter stats for './phonebook_hash' (10 runs): 484,838 cache-misses # 89.380 % of all cache refs ( +- 1.35% ) (59.20%) 542,444 cache-references ( +- 1.22% ) (63.18%) 781,892 L1-dcache-load-misses ( +- 2.35% ) (65.95%) 491,304 L1-dcache-store-misses ( +- 0.47% ) (55.71%) 68,136 L1-dcache-prefetch-misses ( +- 12.38% ) (53.77%) 380,728 L1-icache-load-misses ( +- 5.78% ) (50.07%) 304,166,446 instructions # 1.36 insns per cycle ( +- 0.37% ) (61.18%) 224,161,591 cycles ( +- 0.74% ) (57.94%) 0.111725998 seconds time elapsed ( +- 2.45% ) ``` - findName所花的時間減少,但是cache-misses反而增加 - [ ]為什麼cache-misses增加 - [ ]何謂cache-references ```txt Samples: 1K of event 'cache-misses', Event count (approx.): 515758 Overhead Command Shared Object Symbol ▒ 64.32% phonebook_hash [kernel.kallsyms] [k] clear_page ▒ 8.63% phonebook_hash [kernel.kallsyms] [k] pte_lockptr.isra.13 ▒ 3.73% phonebook_hash [kernel.kallsyms] [k] try_charge ▒ 3.48% phonebook_hash [kernel.kallsyms] [k] copy_user_generic_string ▒ 2.74% phonebook_hash [kernel.kallsyms] [k] unmap_page_range ▒ 2.62% phonebook_hash [kernel.kallsyms] [k] __rmqueue.isra.79 ▒ 2.48% phonebook_hash [kernel.kallsyms] [k] get_page_from_freelist ▒ 1.74% phonebook_hash [kernel.kallsyms] [k] get_mem_cgroup_from_mm ▒ 1.57% phonebook_hash [kernel.kallsyms] [k] free_pcppages_bulk ◆ 1.29% phonebook_hash [kernel.kallsyms] [k] handle_mm_fault ▒ 0.98% phonebook_hash [kernel.kallsyms] [k] __alloc_pages_nodemask ▒ 0.94% phonebook_hash [kernel.kallsyms] [k] mem_cgroup_try_charge ▒ ``` > 64.32%, 8.63%都為紅字,其餘為綠字[name=cheng hung lin] ![phonebook_hash](https://i.imgur.com/LcGJry5.png) - slots 分佈 - h(x)=x mod M - M = 64 ![](https://i.imgur.com/5fIDIIf.png) - M = 256 ![](https://i.imgur.com/Z3kmoHW.png) - M = 1024 ![](https://i.imgur.com/iXN98oF.png) - M = 42737 ![](https://i.imgur.com/CC7fkzD.png) - M = 42737 ```txt size of entry : 32 bytes execution time of append() : 0.109029 sec execution time of findName() : 0.000000 sec Performance counter stats for './phonebook_hash' (10 runs): 877,411 cache-misses # 87.872 % of all cache refs ( +- 1.44% ) (61.75%) 998,513 cache-references ( +- 1.43% ) (63.25%) 1,406,025 L1-dcache-load-misses ( +- 1.37% ) (64.22%) 536,177 L1-dcache-store-misses ( +- 1.04% ) (52.47%) 71,249 L1-dcache-prefetch-misses ( +- 22.30% ) (51.38%) 603,052 L1-icache-load-misses ( +- 4.61% ) (49.54%) 321,729,410 instructions # 0.93 insns per cycle ( +- 2.92% ) (62.35%) 347,147,596 cycles ( +- 0.76% ) (60.77%) 0.169511003 seconds time elapsed ( +- 0.80% ) ``` - M = 42737 將原本的兩次malloc降到一次 ```clike= entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ unsigned int hashPos; hashPos = hashFunction(stringToInt(lastName)); e = (entry *) malloc(sizeof(entry)); e->pNext = hashTable[hashPos]; strcpy(e->lastName, lastName); e->detail = NULL; hashTable[hashPos] = e; return e; } ``` ```txt size of entry : 32 bytes execution time of append() : 0.087568 sec execution time of findName() : 0.000000 sec Performance counter stats for './phonebook_hash' (10 runs): 692,774 cache-misses # 87.907 % of all cache refs ( +- 0.90% ) (59.08%) 788,076 cache-references ( +- 1.63% ) (61.45%) 1,110,403 L1-dcache-load-misses ( +- 1.20% ) (63.80%) 300,690 L1-dcache-store-misses ( +- 2.65% ) (54.17%) 79,456 L1-dcache-prefetch-misses ( +- 4.96% ) (53.95%) 448,250 L1-icache-load-misses ( +- 5.02% ) (51.45%) 295,086,356 instructions # 0.98 insns per cycle ( +- 1.58% ) (62.71%) 299,696,349 cycles ( +- 0.56% ) (59.94%) 0.148717215 seconds time elapsed ( +- 3.20% ) ``` ### hash funtion 2 - hash function - M = 42737 ```clike= unsigned int hash(hash_table *hashtable , char *str) { unsigned int hash_value = 0; while (*str) hash_value = (hash_value << 5) - hash_value + (*str++); return (hash_value % MAX_HASH_TABLE_SIZE); } ``` - slot 分佈 ![](https://i.imgur.com/9ef4h8o.png) - cache misses rate: 61.825 % ```txt size of entry : 32 bytes execution time of append() : 0.093066 sec execution time of findName() : 0.000010 sec Performance counter stats for './phonebook_hash' (10 runs): 695,569 cache-misses # 61.825 % of all cache refs ( +- 0.40% ) (60.64%) 1,125,064 cache-references ( +- 0.35% ) (60.66%) 1,524,817 L1-dcache-load-misses ( +- 0.77% ) (61.47%) 321,656 L1-dcache-store-misses ( +- 0.73% ) (52.48%) 48,764 L1-dcache-prefetch-misses ( +- 1.28% ) (54.86%) 567,795 L1-icache-load-misses ( +- 3.91% ) (53.19%) 281,766,203 instructions # 0.89 insns per cycle ( +- 1.17% ) (64.34%) 316,135,640 cycles ( +- 0.63% ) (61.96%) 0.159598584 seconds time elapsed ( +- 3.83% ) ``` ```txt 48.91% phonebook_hash1 phonebook_hash1 [.] main 30.85% phonebook_hash1 [kernel.kallsyms] [k] clear_page 3.48% phonebook_hash1 [kernel.kallsyms] [k] pte_lockptr.isra.13 2.60% phonebook_hash1 [kernel.kallsyms] [k] copy_user_generic_string 1.65% phonebook_hash1 [kernel.kallsyms] [k] try_charge 1.49% phonebook_hash1 phonebook_hash1 [.] append 1.37% phonebook_hash1 [kernel.kallsyms] [k] unmap_page_range 1.30% phonebook_hash1 [kernel.kallsyms] [k] get_page_from_freelist 1.17% phonebook_hash1 [kernel.kallsyms] [k] __rmqueue.isra.79 0.91% phonebook_hash1 [kernel.kallsyms] [k] get_mem_cgroup_from_mm 0.80% phonebook_hash1 [kernel.kallsyms] [k] mem_cgroup_try_charge 0.75% phonebook_hash1 [kernel.kallsyms] [k] free_pcppages_bulk 0.51% phonebook_hash1 [kernel.kallsyms] [k] handle_mm_fault ``` - cache miss rate 下降,但是cache miss沒有減少,反而是cache-references增加 - 為了測試`slot`,phonebook_hash,phonebook_hash1,都會在讀取整個hash table一遍,試著將以下程式碼註解 ```clike= #if 0 FILE *__fp; entry *__entry; unsigned int __count; __fp = fopen("hash_slots.txt", "w"); for (__i=0; __i<MAX_HASH_TABLE_SIZE; ++__i) { __count = 0; __entry = hashTable[__i]; while (__entry) { ++__count; __entry = __entry->pNext; } fprintf(__fp, "%d %d\n", __i, __count); } fclose(__fp); #endif ``` ```txt= size of entry : 32 bytes execution time of append() : 0.087941 sec execution time of findName() : 0.000000 sec Performance counter stats for './phonebook_hash1' (10 runs): 337,303 cache-misses # 53.906 % of all cache refs ( +- 0.37% ) (64.80%) 625,727 cache-references ( +- 1.56% ) (64.38%) 912,984 L1-dcache-load-misses ( +- 1.76% ) (65.78%) 313,882 L1-dcache-store-misses ( +- 0.70% ) (69.79%) 62,026 L1-dcache-prefetch-misses ( +- 5.81% ) (70.86%) 290,789 L1-icache-load-misses ( +- 6.16% ) (67.31%) 0.105764890 seconds time elapsed ( +- 13.96% ) ``` - 所以讀取整個hash table也會造成cache miss ### 小結 - 相對其他人的測出的cache miss rate從9x%降到3x%,我測出的只有從94.412 %降到78.215 % >> 延伸閱讀: [Intel Ivy Bridge Cache Replacement Policy ](http://blog.stuffedcow.net/2013/01/ivb-cache-replacement/) >> 這可見 Intel i3, i5, i7 之間 cache 設計的落差 [name=jserv] - division的hash function為減少collision,要以"質數"來取餘數 ### perfect hash functions [Generating Perfect Hash Functions](http://www.drdobbs.com/architecture-and-design/generating-perfect-hash-functions/184404506?pgno=1) [Perfect Hash Function Generator](https://www.gnu.org/software/gperf/manual/gperf.html) ## 參考資料 - [Linux 效能分析工具: Perf](http://wiki.csie.ncku.edu.tw/embedded/perf-tutorial) - [dada同學上的筆記](https://embedded2016.hackpad.com/ep/pad/static/YkqjhwgnQcA) - [Yuron同學上的筆記](https://embedded2016.hackpad.com/ep/pad/static/NcHhQCQ4ijr) - [ayueh0822同學的筆記](https://hackpad.com/2016q1-Homework-1-xWkbsmaWVEn) - [[C 範例代碼] 尋找演算法 : 哈希查找](http://puremonkey2010.blogspot.tw/2010/11/c_15.html) - [Hash Funtion](https://zh.wikipedia.org/wiki/%E6%95%A3%E5%88%97%E5%87%BD%E6%95%B8) - [Programing Small](https://hackmd.io/s/S1rbwmZ6)