2017q1 Homework1 (phonebook)

# 2017q1 Homework1 (phonebook) contributed by <`chenweiii`> [Preparing...](https://hackmd.io/s/HkbFQSOYg) ### Reviewed by `laochanlam` > [Question] 仍不太清楚 assert 運作，若 findName(intput, e) 找到，應該才會接著執行下一個 statement，但實際運作好像反了過來。 - assert() 的用法是若括號內的 Statement 回傳的值是 false 時，終止程式的執行。 ```C assert(findName(input, e) && "Did you implement findName() in " IMPL "?"); ``` 是為了驗證您在當次執行的程式中有實作 findName() 這個 function ( 預設回傳 NULL )，所以若你 assert 接收到 NULL 的時候，便會中斷程式。所以那兩句的作用是：先確認你有實作 findName() ，再確認您能夠準確地使用 findName() 找到電話簿中的 "zyxel" 。 - 是我的問題嗎，為什麼我沒有看到任何 Commit... Github連結有誤 ( ? >>大大，我有另外開 branch ，您可能要切換才能看到喔！[name=Chen Wei] - 可以把每個優化版本都保留，一同畫圖比較效能時會更為清晰 >>了解！[name=Chen Wei] ## environment * OS: Ubuntu 16.04.2 LTS (64 bit) * `$ cat /etc/os-release` * CPU: Intel(R) Pentium(R) CPU B960 @ 2.20GHz * `$ lscpu` * Cache: * L1d cache: 32K * L1i cache: 32K * L2 cache: 256K * L3 cache: 2048K * cash alignment: 64B (cache block size) * `$ cat /proc/cpuinfo` * Memory: 4GB * `$ cat /proc/meminfo` ## 初步觀察 ### 程式觀察 * main.c ```clike= while (fgets(line, sizeof(line), fp)) { while (line[i] != '\0') i++; line[i - 1] = '\0'; i = 0; e = append(line, e); ``` ~~[Question] 為什麼要 line[i - 1] = '\0'？~~ **因 line 最後一個字元為 '\n'** ```clike= assert(findName(input, e) && "Did you implement findName() in " IMPL "?"); assert(0 == strcmp(findName(input, e)->lastName, "zyxel")); ``` [Question] 仍不太清楚 assert 運作，若 findName(intput, e) 找到，應該才會接著執行下一個 statement，但實際運作好像反了過來。 ```clike= if (pHead->pNext) free(pHead->pNext); free(pHead); ``` [Question] 不太清楚為何僅 free 這兩塊，而不是整條 linked-list ### 使用 perf 觀察熱點 `$ ./phonebook_orig & sudo perf top -p $!` ```Samples: 254 of event 'cycles', Event count (approx.): 97639063 Overhead Shared Object Symbol 15.28% libc-2.23.so [.] __strcasecmp_l_sse4_2 13.35% phonebook_orig [.] findName 11.69% phonebook_orig [.] main 9.34% libc-2.23.so [.] __strcasecmp_sse4_2 6.84% libc-2.23.so [.] __memchr_ia32 5.92% libc-2.23.so [.] __memcpy_ia32 5.37% libc-2.23.so [.] _IO_getline_info 3.48% libc-2.23.so [.] _int_malloc 2.66% [kernel] [k] get_page_from_freelist 2.59% libc-2.23.so [.] _IO_fgets 2.31% phonebook_orig [.] append 2.28% [kernel] [k] page_fault 2.00% [kernel] [k] native_flush_tlb_single 1.61% libc-2.23.so [.] __strcpy_ssse3 1.55% libc-2.23.so [.] malloc 1.34% [kernel] [k] free_hot_cold_page 1.33% [kernel] [k] free_pcppages_bulk 1.23% phonebook_orig [.] 0x000005b0 0.94% [kernel] [k] __rmqueue.isra.90 0.88% [kernel] [k] page_remove_rmap 0.88% [kernel] [k] __mod_zone_page_state 0.76% libc-2.23.so [.] _IO_getline 0.73% [kernel] [k] __copy_from_user_ll_nozero 0.73% [kernel] [k] handle_mm_fault 0.44% [kernel] [k] native_write_msr_safe 0.44% [kernel] [k] mem_cgroup_page_lruvec 0.44% [kernel] [k] PageHuge 0.44% [kernel] [k] enqueue_entity 0.44% [kernel] [k] unmap_page_range 0.41% [kernel] [k] _cond_resched 0.40% [kernel] [k] do_fast_syscall_32 0.39% [kernel] [k] page_add_new_anon_rmap 0.38% [kernel] [k] lru_cache_add_active_or_unevictable 0.37% [kernel] [k] find_get_entry 0.36% [kernel] [k] __do_page_fault 0.36% [kernel] [k] __kunmap_atomic 0.01% [kernel] [k] rcu_irq_exit ``` ### make run ``` size of entry : 128 bytes execution time of append() : 0.148193 sec execution time of findName() : 0.006575 sec ``` ### make cache-test ``` Performance counter stats for './phonebook_orig' (100 runs): 1,216,994 cache-misses # 67.717 % of all cache refs ( +- 0.20% ) 1,797,186 cache-references ( +- 0.24% ) 309,915,175 instructions # 1.40 insns per cycle ( +- 0.01% ) 221,699,529 cycles ( +- 0.06% ) 0.103731045 seconds time elapsed ( +- 0.82% ) ``` ### make plot ![](https://i.imgur.com/aEH4sw8.png) ### 小結 * 大量的 cache-reference 可能藉由縮小 entry size 而下降 * 熱點集中在 strcasecmp, findName, main，改善此三個 function 表現應可大幅改善 execution time * strcasecmp：尚未有想法。 * findName：應可藉由 Hash function 取代 linked-list 改善。 * main：熱點集中在處理 append 的 while 迴圈，也尚未有想法處理。 * [Question] **發現 make plot 的 append execution time 與 make run 之 append execution time 相差甚遠** ### 改善方向 - [x] 縮小 struct size - [x] 更改資料結構為 hash table，並觀察不同 hash function 之表現 - [ ] ~~使用 int[] 取代 char[]，應該降低 strcasecmp 所帶來之負擔。~~ - [x] 增加 hash table item 之 size 為 cache block size (i.e. 一個 item 放多個 phonebook entry)，驗證是否會降低 cache miss - [x] 實作 memory pool 取代每次 malloc ## 改善方向一：縮小 struct size **commit id: c7d1268dae231456e058ab64c57a394be5c17e7c** ```clike= typedef struct __PHONE_BOOK_ENTRY_DETAIL { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } detail; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_ENTRY_DETAIL *pDetail; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` size of entry: 128 Byte -> 24 byte ### cache-test ``` Performance counter stats for './phonebook_opt' (100 runs): 234,867 cache-misses # 75.321 % of all cache refs ( +- 0.18% ) 311,822 cache-references ( +- 0.22% ) 287,494,977 instructions # 1.68 insns per cycle ( +- 0.09% ) 171,261,822 cycles ( +- 0.08% ) 0.079299396 seconds time elapsed ( +- 0.18% ) ``` ### plot ![](https://i.imgur.com/K67jVOE.png) ### 小結 * append() execution time：0.078157s -> 0.062106s (減少約20%) * findName() execution time：0.006593s -> 0.005095s (減少約22%) * cache references：1,797,186 -> 311,822 (減少約83%) 減少 struct size 效果顯著 * cache miss ratio：67.717 % -> 75.321 % 但 miss ratio 反而上升 * [Question] cache misses ratio 反而上升之原因？ >> 把硬體資訊列出來，在一開頭。之後再來分析 cache 的行為落差 [name="jserv"] ### 使用 perf 觀察 phonebook_orig cache misses 之熱點 ``` $ perf record -e cache-misses ./phonebook_orig $ perf report ``` ``` Samples: 451 of event 'cache-misses', Event count (approx.): 1255837 Overhead Command Shared Object Symbol 41.48% phonebook_orig [kernel.kallsyms] [k] get_page_from_freelist 17.18% phonebook_orig libc-2.23.so [.] __strcasecmp_l_sse4_2 9.27% phonebook_orig libc-2.23.so [.] __strcasecmp_sse4_2 8.87% phonebook_orig phonebook_orig [.] findName 7.75% phonebook_orig [kernel.kallsyms] [k] native_flush_tlb_single 2.16% phonebook_orig [kernel.kallsyms] [k] __kunmap_atomic 1.91% phonebook_orig [kernel.kallsyms] [k] __copy_from_user_ll_nozero 1.65% phonebook_orig phonebook_orig [.] 0x000005b0 1.47% phonebook_orig [kernel.kallsyms] [k] __rmqueue.isra.90 ``` ### 使用 perf 觀察 phonebook_opt cache misses 之熱點 ``` Samples: 272 of event 'cache-misses', Event count (approx.): 251187 Overhead Command Shared Object Symbol 45.07% phonebook_opt [kernel.kallsyms] [k] get_page_from_freelist 12.86% phonebook_opt [kernel.kallsyms] [k] native_flush_tlb_single 9.20% phonebook_opt libc-2.23.so [.] __strcasecmp_l_sse4_2 7.82% phonebook_opt [kernel.kallsyms] [k] __copy_from_user_ll_nozero 3.54% phonebook_opt [kernel.kallsyms] [k] __kunmap_atomic 3.14% phonebook_opt [kernel.kallsyms] [k] __rmqueue.isra.90 2.84% phonebook_opt [kernel.kallsyms] [k] unmap_page_range 2.11% phonebook_opt phonebook_opt [.] findName 1.90% phonebook_opt [kernel.kallsyms] [k] native_pte_clear 1.87% phonebook_opt libc-2.23.so [.] __strcasecmp_sse4_2 1.78% phonebook_opt [kernel.kallsyms] [k] free_pages_prepare 1.50% phonebook_opt [kernel.kallsyms] [k] handle_mm_fault 0.61% phonebook_opt [kernel.kallsyms] [k] balance_dirty_pages_ratelimited ``` ### 減少 struct size 之觀察 * malloc 仍佔 cache misses 大宗 * **native_flush_tlb_single**: change from 7.75% to 12.86% * **__strcasecmp_l_sse4_2**: change from 17.18% to 9.2% * **__strcasecmp_sse4_2**: change from 9.27% to 1.87% * **findName**: change from 8.87% to 2.11% * **__copy_from_user_ll_nozero**: change from 1.19% to 7.82% ### 何謂 native_flush_tlb_single ```clike= /* arch/x86/kernel/paravirt.c, line 193 */ static void native_flush_tlb_single(unsigned long addr) { __native_flush_tlb_single(addr); } /* arch/x86/include/asm/tlbflush.h, line 184 */ static inline void __native_flush_tlb_single(unsigned long addr) { asm volatile("invlpg (%0)" ::"r" (addr) : "memory"); } ``` 其中 **invlpg** 為 x86 Invalidate TLB Entries 的指令 [Question] wiki 上寫只有在 address space switch 時才須清空 TLB，但是 switch 到哪？還是只是一般的 context switch Referenced from: [invlpg](http://www.felixcloutier.com/x86/INVLPG.html), [TLB wiki](http://lxr.free-electrons.com/ident?i=__native_flush_tlb_single) ## 改善方向二：更改資料結構為 hash table **commit id: c5af6c8800d450d95598c83f0ef8905093d78ea2** hash(x) = x mod M, M = 1031 ```clike= int init_hash(entry** table, int size) { if(NULL == (*table = (entry *) malloc(sizeof(entry) * size))) return -1; for(int i = 0; i < size; i++) { (*table)[i].lastName[0] = '\0'; (*table)[i].pDetail = NULL; (*table)[i].pNext = NULL; } return 1; } int hash_function(char* str) { int value = 0; while(*str) value += *str++; value %= HASH_TABLE_SIZE; return value; } ``` 初始化 hash_table 之函數，以 lastName[0] 設定為 '\0' 作為尚未使用此格的象徵。hash_function 先以簡單為主，取質數 1031 作為 HASH_TABLE_SIZE。 ```clike= entry *findName(char lastName[], entry *table) { int key = hash_function(lastName); entry *current = &table[key]; while (current != NULL) { if (strcasecmp(lastName, current->lastName) == 0) return current; current = current->pNext; } return NULL; } void append(char lastName[], entry *table) { /* allocate memory for the new entry and put lastName ¦* into the right location */ int key = hash_function(lastName); if(table[key].lastName[0] == '\0') strcpy(table[key].lastName, lastName); else { entry *tmp = table[key].pNext; table[key].pNext = (entry *) malloc(sizeof(entry)); strcpy(table[key].pNext->lastName, lastName); table[key].pNext->pDetail = NULL; table[key].pNext->pNext = tmp; } } ``` 將原本 linked list implementation 修改為 hash table ### cache-test ``` Performance counter stats for './phonebook_opt' (100 runs): 209,585 cache-misses # 74.282 % of all cache refs ( +- 0.11% ) 282,147 cache-references ( +- 0.18% ) 272,838,801 instructions # 1.58 insns per cycle ( +- 0.00% ) 173,166,124 cycles ( +- 0.06% ) 0.081228612 seconds time elapsed ( +- 0.41% ) ``` [Qustion] 大抵與降低 struct size 減少後相同，但注意到 cache-reference 次數從 311,822 -> 282,147 (減少 10%)，不知道是否為結尾沒有做 free 之影響。 ### plot ![](https://i.imgur.com/87pK7od.png) append() 上升至與 original 相同，findName() 則下降至幾乎沒有。故接下來我想把目標放在降低 append() Execution time ### 使用 perf 觀察 hash table 版本之 cache misses ``` Samples: 364 of event 'cache-misses', Event count (approx.): 209224 Overhead Command Shared Object Symbol 62.86% phonebook_opt [kernel.kallsyms] [k] get_page_from_freelist 10.15% phonebook_opt [kernel.kallsyms] [k] native_flush_tlb_single 8.15% phonebook_opt [kernel.kallsyms] [k] __copy_from_user_ll_nozero 3.02% phonebook_opt [kernel.kallsyms] [k] __kunmap_atomic 2.35% phonebook_opt [kernel.kallsyms] [k] page_remove_rmap 1.79% phonebook_opt [kernel.kallsyms] [k] handle_mm_fault 1.65% phonebook_opt [kernel.kallsyms] [k] free_pcppages_bulk 1.41% phonebook_opt [kernel.kallsyms] [k] wp_page_copy.isra.74 1.20% phonebook_opt [kernel.kallsyms] [k] unmap_page_range ``` malloc() 我等等就來解決你！ [Qustion] 不知道為什麼 '__strcasecmp_l_sse4_2' 通通不見了... ### 將 append() 計算 hash value 的時間減少 ```clike= while (fgets(line, sizeof(line), fp)) { int hash_value = 0; while (line[i] != '\0') { hash_value += line[i]; i++; } line[i - 1] = '\0'; ``` 因在 main.c 在做 I/O 時已經會處理過每一個字串，在此時便可先計算出 hash value，不用到 append() 時再計算一次，從 0.08s -> 0.0722s，改善幅度不大QQ (減少 10%) ![](https://i.imgur.com/6fRamWy.png) ### 小結 * [Question] 當 opt 部分的 code 改的部分越多，#if, #endif 逐漸會顯得凌亂，想同時比較三份 code 之圖表，想另外再處理。 * [Question] 如果採用 Hash table，不知道該如何清空 cache，能想到的辦法是預先分配 memory pool，才能掌握自己 malloc 的起始與終止位址。 * [Question] 程式結束時，free 之必要性？ ## 改善方向三：更改 malloc 為 memory pool **commit id: 1251464f2c0ca0f9adf4486ade4f4a46033c4f3e** 一次 malloc 所需要之空間給 memory pool ### plot ![](https://i.imgur.com/MPpTOMQ.png) ### cache-test ``` Performance counter stats for './phonebook_opt' (100 runs): 149,254 cache-misses # 69.794 % of all cache refs ( +- 0.11% ) 213,849 cache-references ( +- 0.15% ) 193,345,737 instructions # 1.54 insns per cycle ( +- 0.01% ) 125,624,775 cycles ( +- 0.05% ) 0.062834888 seconds time elapsed ( +- 1.60% ) ``` [Question] 原本只預期 cache-misses 降低，但沒想到 cache-references 與上個版本比較，下降了 24.3% ### 使用 perf 觀察 hash table + memory pool 版本之 cache misses ``` Samples: 30 of event 'cache-misses', Event count (approx.): 149390 Overhead Command Shared Object Symbol 29.09% phonebook_opt [kernel.kallsyms] [k] clear_huge_page 20.95% phonebook_opt [kernel.kallsyms] [k] get_page_from_freelist 19.87% phonebook_opt [kernel.kallsyms] [k] native_flush_tlb_single 8.60% phonebook_opt [kernel.kallsyms] [k] find_get_entry 7.30% phonebook_opt [kernel.kallsyms] [k] __kunmap_atomic 6.25% phonebook_opt [kernel.kallsyms] [k] __copy_from_user_ll_nozero 2.83% phonebook_opt [kernel.kallsyms] [k] strlen 2.74% phonebook_opt libc-2.23.so [.] __mmap 1.97% phonebook_opt [kernel.kallsyms] [k] xfer_secondary_pool ``` * 成功減緩 get_page_from_freelist 之問題，其 cache misses ratio 從上個版本的 62% -> 20.92% * [Question] 多出了 [clear_huge_page](http://lxr.free-electrons.com/source/mm/memory.c#L4104), [migrate_page_copy](http://lxr.free-electrons.com/source/mm/migrate.c#L621)，尚未知道其用途。 ### 小結 * cache misses 從原本 hash table 版本：74% -> 69% * append() execution time 進一步下降：0.0722s -> 0.0626s * [Question] 使用 memory pool，其大小我是事先宣告，若要動態視 input 大小決定，想不到比較好的方法解決。自己覺得折衷的方法是一次不用 malloc 完所需要的空間，但可以一次宣告大一點的空間。 ## 改善方向四：Hash table 使用大一點的 item **commit id: 74f11e78fb49c52770fa90bd0aff6b0132f0d561** 觀察原本 __PHONE_BOOK_ENTRY size 為 32 byte，為配合 cache block size 為 64 byte，我可以宣告一個大一點的 struct __HASH_ITEM 包含兩個 __PHONE_BOOK_ENTRY。 ```clike= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_ENTRY_DETAIL *pDetail; } entry; typedef struct __HASH_ITEM { struct __PHONE_BOOK_ENTRY slot[2]; struct __HASH_ITEM *hNext; } hitem; ``` ### plot ![](https://i.imgur.com/Se9Eq9X.png) ### cache-test ``` Performance counter stats for './phonebook_opt' (100 runs): 147,686 cache-misses # 62.053 % of all cache refs ( +- 0.16% ) 237,999 cache-references ( +- 0.35% ) 180,179,581 instructions # 1.33 insns per cycle ( +- 0.04% ) 135,115,547 cycles ( +- 0.10% ) 0.064164752 seconds time elapsed ( +- 0.58% ) ``` ### 使用 perf 此版本之 cache misses ``` Samples: 56 of event 'cache-misses', Event count (approx.): 146322 Overhead Command Shared Object Symbol 24.78% phonebook_opt [kernel.kallsyms] [k] get_page_from_freelist 23.10% phonebook_opt [kernel.kallsyms] [k] clear_huge_page 12.00% phonebook_opt [kernel.kallsyms] [k] __copy_from_user_ll_nozero 9.26% phonebook_opt [kernel.kallsyms] [k] native_flush_tlb_single 8.39% phonebook_opt libc-2.23.so [.] _IO_getline_info 7.76% phonebook_opt [kernel.kallsyms] [k] native_pte_clear N ``` ### 小結 * append() Execution time 沒有受到影響 * [Question] Cache References 又上升了 10% (213,849 -> 237,999) **猜測是多了 __HASH_ITEM struct 之緣故** * cache misses 進一步從 69.794% -> 62.053% ## 改善方向五：Hash table 的每個 row 擁有自己的 memory pool **commit id: 1cda2ec90beaa18263f360ad419a10f3beb60243** 改善在 Hash table 同一個 row 搜尋及新增的成本，連續的位址期望可降低 cache-misses。 memory pool 改良為一次宣告一部分的空間，若該 row 之 memory pool 之空間用完時，再透過自己撰寫的 pool_allocate 取得空間。 ```clike= pool *memory_pool[HASH_TABLE_SIZE]; for(int i = 0; i < HASH_TABLE_SIZE; i++) memory_pool[i] = pool_init(15 * sizeof(hitem)); ``` 每個 row 擁有自己的 memory pool ```clike= typedef struct __POOL_LIST { char *start; struct __POOL_LIST *link; } plist; typedef struct __POOL { char *next; char *end; size_t size; int count; struct __POOL_LIST *mlist; } pool; ``` 由於 memory pool 設計的更動，勢必得每次 pool_allocate 新空間給 memory pool 時，維護他目前所擁有的空間。在這邊使用 plist 把每次空間的起始位址給串起來。不過只有在清除 cache 時用到這個 list。 ### plot ![](https://i.imgur.com/trHucgX.png) ### cache-test ``` Performance counter stats for './phonebook_opt' (100 runs): 338,859 cache-misses # 51.688 % of all cache refs ( +- 0.17% ) 655,590 cache-references ( +- 0.17% ) 188,114,159 instructions # 1.19 insns per cycle ( +- 0.00% ) 158,371,962 cycles ( +- 0.06% ) 0.074786404 seconds time elapsed ( +- 0.38% ) ``` ### 使用 perf 此版本之 cache misses ``` Samples: 292 of event 'cache-misses', Event count (approx.): 309526 Overhead Command Shared Object Symbol 43.43% phonebook_opt [kernel.kallsyms] [k] get_page_from_freelist 22.77% phonebook_opt libc-2.23.so [.] _IO_getline_info 9.73% phonebook_opt [kernel.kallsyms] [k] native_flush_tlb_single 8.57% phonebook_opt [kernel.kallsyms] [k] __copy_from_user_ll_nozero 2.85% phonebook_opt [kernel.kallsyms] [k] __kunmap_atomic 2.15% phonebook_opt phonebook_opt [.] append 1.89% phonebook_opt libc-2.23.so [.] _IO_fgets 1.13% phonebook_opt [kernel.kallsyms] [k] prepend_name ``` ### 小結 * 因為需要 malloc 的關係，append() Execution Time 較上一個版本略為增加 * cache misses 從上一個版本：62% -> 51% ## 改善方向六：改善 Hash function **commit id: 05ec020f5128ef5c6a084654dfb8929d38da54aa** 在這邊參考 [nekoneko](https://hackmd.io/s/rJCs_ZVa) 的 hash function 2，但有改良一下 bucket number M = 2737 ### plot ![](https://i.imgur.com/EoavsvK.png) ![](https://i.imgur.com/KEpClBa.png) ### cache-test ``` Performance counter stats for './phonebook_opt' (100 runs): 407,907 cache-misses # 25.821 % of all cache refs ( +- 0.56% ) 1,579,760 cache-references ( +- 0.12% ) 212,887,099 instructions # 1.14 insns per cycle ( +- 0.00% ) 185,930,380 cycles ( +- 0.11% ) 0.091072857 seconds time elapsed ( +- 0.60% ) ``` ### 使用 perf 此版本之 cache misses ``` Samples: 334 of event 'cache-misses', Event count (approx.): 358953 Overhead Command Shared Object Symbol 32.73% phonebook_opt [kernel.kallsyms] [k] get_page_from_freelist 23.82% phonebook_opt libc-2.23.so [.] _IO_getline_info 7.11% phonebook_opt phonebook_opt [.] append 6.43% phonebook_opt [kernel.kallsyms] [k] __copy_from_user_ll_nozero 4.59% phonebook_opt [kernel.kallsyms] [k] native_flush_tlb_single 4.02% phonebook_opt phonebook_opt [.] main 2.63% phonebook_opt libc-2.23.so [.] _IO_fgets 2.23% phonebook_opt phonebook_opt [.] pool_allocate 1.87% phonebook_opt libc-2.23.so [.] __memcpy_ia32 1.48% phonebook_opt [kernel.kallsyms] [k] page_remove_rmap 1.40% phonebook_opt libc-2.23.so [.] __memchr_ia32 1.40% phonebook_opt [kernel.kallsyms] [k] __rmqueue.isra.90 1.37% phonebook_opt libc-2.23.so [.] _int_malloc 1.34% phonebook_opt [kernel.kallsyms] [k] jbd2_journal_add_journal_head ``` ### 小結 * append() Execution Time 增加許多，因 Hash function 變複雜 * Cache missis 從上個版本大幅下降！　**51.688% -> 25.821%** * [Question] 該如何調整當 Hash table 每個 Row 的 memory pool 不夠時所增加的量？ ### 參考材料： * 參考 [nekoneko之筆記](https://hackmd.io/s/rJCs_ZVa) 之 Hash function 分析 * 參考 [memory pool implementation](http://stackoverflow.com/questions/11749386/implement-own-memory-pool)