# 2017q1 Homework1 (phonebook)
contributed by <`chenweiii`>
[Preparing...](https://hackmd.io/s/HkbFQSOYg)
### Reviewed by `laochanlam`
> [Question] 仍不太清楚 assert 運作,若 findName(intput, e) 找到,應該才會接著執行下一個 statement,但實際運作好像反了過來。
- assert() 的用法是若括號內的 Statement 回傳的值是 false 時,終止程式的執行。
```C
assert(findName(input, e) && "Did you implement findName() in " IMPL "?");
```
是為了驗證您在當次執行的程式中有實作 findName() 這個 function ( 預設回傳 NULL ),所以若你 assert 接收到 NULL 的時候,便會中斷程式。
所以那兩句的作用是:先確認你有實作 findName() ,再確認您能夠準確地使用 findName() 找到電話簿中的 "zyxel" 。
- 是我的問題嗎,為什麼我沒有看到任何 Commit... Github連結有誤 ( ?
>>大大,我有另外開 branch ,您可能要切換才能看到喔![name=Chen Wei]
- 可以把每個優化版本都保留,一同畫圖比較效能時會更為清晰
>>了解![name=Chen Wei]
## environment
* OS: Ubuntu 16.04.2 LTS (64 bit)
* `$ cat /etc/os-release`
* CPU: Intel(R) Pentium(R) CPU B960 @ 2.20GHz
* `$ lscpu`
* Cache:
* L1d cache: 32K
* L1i cache: 32K
* L2 cache: 256K
* L3 cache: 2048K
* cash alignment: 64B (cache block size)
* `$ cat /proc/cpuinfo`
* Memory: 4GB
* `$ cat /proc/meminfo`
## 初步觀察
### 程式觀察
* main.c
```clike=
while (fgets(line, sizeof(line), fp)) {
while (line[i] != '\0')
i++;
line[i - 1] = '\0';
i = 0;
e = append(line, e);
```
~~[Question] 為什麼要 line[i - 1] = '\0'?~~
**因 line 最後一個字元為 '\n'**
```clike=
assert(findName(input, e) &&
"Did you implement findName() in " IMPL "?");
assert(0 == strcmp(findName(input, e)->lastName, "zyxel"));
```
[Question] 仍不太清楚 assert 運作,若 findName(intput, e) 找到,應該才會接著執行下一個 statement,但實際運作好像反了過來。
```clike=
if (pHead->pNext) free(pHead->pNext);
free(pHead);
```
[Question] 不太清楚為何僅 free 這兩塊,而不是整條 linked-list
### 使用 perf 觀察熱點
`$ ./phonebook_orig & sudo perf top -p $!`
```Samples: 254 of event 'cycles', Event count (approx.): 97639063
Overhead Shared Object Symbol
15.28% libc-2.23.so [.] __strcasecmp_l_sse4_2
13.35% phonebook_orig [.] findName
11.69% phonebook_orig [.] main
9.34% libc-2.23.so [.] __strcasecmp_sse4_2
6.84% libc-2.23.so [.] __memchr_ia32
5.92% libc-2.23.so [.] __memcpy_ia32
5.37% libc-2.23.so [.] _IO_getline_info
3.48% libc-2.23.so [.] _int_malloc
2.66% [kernel] [k] get_page_from_freelist
2.59% libc-2.23.so [.] _IO_fgets
2.31% phonebook_orig [.] append
2.28% [kernel] [k] page_fault
2.00% [kernel] [k] native_flush_tlb_single
1.61% libc-2.23.so [.] __strcpy_ssse3
1.55% libc-2.23.so [.] malloc
1.34% [kernel] [k] free_hot_cold_page
1.33% [kernel] [k] free_pcppages_bulk
1.23% phonebook_orig [.] 0x000005b0
0.94% [kernel] [k] __rmqueue.isra.90
0.88% [kernel] [k] page_remove_rmap
0.88% [kernel] [k] __mod_zone_page_state
0.76% libc-2.23.so [.] _IO_getline
0.73% [kernel] [k] __copy_from_user_ll_nozero
0.73% [kernel] [k] handle_mm_fault
0.44% [kernel] [k] native_write_msr_safe
0.44% [kernel] [k] mem_cgroup_page_lruvec
0.44% [kernel] [k] PageHuge
0.44% [kernel] [k] enqueue_entity
0.44% [kernel] [k] unmap_page_range
0.41% [kernel] [k] _cond_resched
0.40% [kernel] [k] do_fast_syscall_32
0.39% [kernel] [k] page_add_new_anon_rmap
0.38% [kernel] [k] lru_cache_add_active_or_unevictable
0.37% [kernel] [k] find_get_entry
0.36% [kernel] [k] __do_page_fault
0.36% [kernel] [k] __kunmap_atomic
0.01% [kernel] [k] rcu_irq_exit
```
### make run
```
size of entry : 128 bytes
execution time of append() : 0.148193 sec
execution time of findName() : 0.006575 sec
```
### make cache-test
```
Performance counter stats for './phonebook_orig' (100 runs):
1,216,994 cache-misses # 67.717 % of all cache refs ( +- 0.20% )
1,797,186 cache-references ( +- 0.24% )
309,915,175 instructions # 1.40 insns per cycle ( +- 0.01% )
221,699,529 cycles ( +- 0.06% )
0.103731045 seconds time elapsed ( +- 0.82% )
```
### make plot
![](https://i.imgur.com/aEH4sw8.png)
### 小結
* 大量的 cache-reference 可能藉由縮小 entry size 而下降
* 熱點集中在 strcasecmp, findName, main,改善此三個 function 表現應可大幅改善 execution time
* strcasecmp:尚未有想法。
* findName:應可藉由 Hash function 取代 linked-list 改善。
* main:熱點集中在處理 append 的 while 迴圈,也尚未有想法處理。
* [Question] **發現 make plot 的 append execution time 與 make run 之 append execution time 相差甚遠**
### 改善方向
- [x] 縮小 struct size
- [x] 更改資料結構為 hash table,並觀察不同 hash function 之表現
- [ ] ~~使用 int[] 取代 char[],應該降低 strcasecmp 所帶來之負擔。~~
- [x] 增加 hash table item 之 size 為 cache block size (i.e. 一個 item 放多個 phonebook entry),驗證是否會降低 cache miss
- [x] 實作 memory pool 取代每次 malloc
<br>
## 改善方向一:縮小 struct size
**commit id: c7d1268dae231456e058ab64c57a394be5c17e7c**
```clike=
typedef struct __PHONE_BOOK_ENTRY_DETAIL {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} detail;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct __PHONE_BOOK_ENTRY_DETAIL *pDetail;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
```
size of entry: 128 Byte -> 24 byte
### cache-test
```
Performance counter stats for './phonebook_opt' (100 runs):
234,867 cache-misses # 75.321 % of all cache refs ( +- 0.18% )
311,822 cache-references ( +- 0.22% )
287,494,977 instructions # 1.68 insns per cycle ( +- 0.09% )
171,261,822 cycles ( +- 0.08% )
0.079299396 seconds time elapsed ( +- 0.18% )
```
### plot
![](https://i.imgur.com/K67jVOE.png)
### 小結
* append() execution time:0.078157s -> 0.062106s (減少約20%)
* findName() execution time:0.006593s -> 0.005095s (減少約22%)
* cache references:1,797,186 -> 311,822 (減少約83%)
減少 struct size 效果顯著
* cache miss ratio:67.717 % -> 75.321 %
但 miss ratio 反而上升
* [Question] cache misses ratio 反而上升之原因?
>> 把硬體資訊列出來,在一開頭。之後再來分析 cache 的行為落差 [name="jserv"]
### 使用 perf 觀察 phonebook_orig cache misses 之熱點
```
$ perf record -e cache-misses ./phonebook_orig
$ perf report
```
```
Samples: 451 of event 'cache-misses', Event count (approx.): 1255837
Overhead Command Shared Object Symbol
41.48% phonebook_orig [kernel.kallsyms] [k] get_page_from_freelist
17.18% phonebook_orig libc-2.23.so [.] __strcasecmp_l_sse4_2
9.27% phonebook_orig libc-2.23.so [.] __strcasecmp_sse4_2
8.87% phonebook_orig phonebook_orig [.] findName
7.75% phonebook_orig [kernel.kallsyms] [k] native_flush_tlb_single
2.16% phonebook_orig [kernel.kallsyms] [k] __kunmap_atomic
1.91% phonebook_orig [kernel.kallsyms] [k] __copy_from_user_ll_nozero
1.65% phonebook_orig phonebook_orig [.] 0x000005b0
1.47% phonebook_orig [kernel.kallsyms] [k] __rmqueue.isra.90
```
### 使用 perf 觀察 phonebook_opt cache misses 之熱點
```
Samples: 272 of event 'cache-misses', Event count (approx.): 251187
Overhead Command Shared Object Symbol
45.07% phonebook_opt [kernel.kallsyms] [k] get_page_from_freelist
12.86% phonebook_opt [kernel.kallsyms] [k] native_flush_tlb_single
9.20% phonebook_opt libc-2.23.so [.] __strcasecmp_l_sse4_2
7.82% phonebook_opt [kernel.kallsyms] [k] __copy_from_user_ll_nozero
3.54% phonebook_opt [kernel.kallsyms] [k] __kunmap_atomic
3.14% phonebook_opt [kernel.kallsyms] [k] __rmqueue.isra.90
2.84% phonebook_opt [kernel.kallsyms] [k] unmap_page_range
2.11% phonebook_opt phonebook_opt [.] findName
1.90% phonebook_opt [kernel.kallsyms] [k] native_pte_clear
1.87% phonebook_opt libc-2.23.so [.] __strcasecmp_sse4_2
1.78% phonebook_opt [kernel.kallsyms] [k] free_pages_prepare
1.50% phonebook_opt [kernel.kallsyms] [k] handle_mm_fault
0.61% phonebook_opt [kernel.kallsyms] [k] balance_dirty_pages_ratelimited
```
### 減少 struct size 之觀察
* malloc 仍佔 cache misses 大宗
* **native_flush_tlb_single**: change from 7.75% to 12.86%
* **__strcasecmp_l_sse4_2**: change from 17.18% to 9.2%
* **__strcasecmp_sse4_2**: change from 9.27% to 1.87%
* **findName**: change from 8.87% to 2.11%
* **__copy_from_user_ll_nozero**: change from 1.19% to 7.82%
### 何謂 native_flush_tlb_single
```clike=
/* arch/x86/kernel/paravirt.c, line 193 */
static void native_flush_tlb_single(unsigned long addr)
{
__native_flush_tlb_single(addr);
}
/* arch/x86/include/asm/tlbflush.h, line 184 */
static inline void __native_flush_tlb_single(unsigned long addr)
{
asm volatile("invlpg (%0)" ::"r" (addr) : "memory");
}
```
其中 **invlpg** 為 x86 Invalidate TLB Entries 的指令
[Question] wiki 上寫只有在 address space switch 時才須清空 TLB,但是 switch 到哪?還是只是一般的 context switch
Referenced from: [invlpg](http://www.felixcloutier.com/x86/INVLPG.html), [TLB wiki](http://lxr.free-electrons.com/ident?i=__native_flush_tlb_single)
<br>
## 改善方向二:更改資料結構為 hash table
**commit id: c5af6c8800d450d95598c83f0ef8905093d78ea2**
hash(x) = x mod M, M = 1031
```clike=
int init_hash(entry** table, int size)
{
if(NULL == (*table = (entry *) malloc(sizeof(entry) * size))) return -1;
for(int i = 0; i < size; i++) {
(*table)[i].lastName[0] = '\0';
(*table)[i].pDetail = NULL;
(*table)[i].pNext = NULL;
}
return 1;
}
int hash_function(char* str)
{
int value = 0;
while(*str)
value += *str++;
value %= HASH_TABLE_SIZE;
return value;
}
```
初始化 hash_table 之函數,以 lastName[0] 設定為 '\0' 作為尚未使用此格的象徵。hash_function 先以簡單為主,取質數 1031 作為 HASH_TABLE_SIZE。
```clike=
entry *findName(char lastName[], entry *table)
{
int key = hash_function(lastName);
entry *current = &table[key];
while (current != NULL) {
if (strcasecmp(lastName, current->lastName) == 0)
return current;
current = current->pNext;
}
return NULL;
}
void append(char lastName[], entry *table)
{
/* allocate memory for the new entry and put lastName
¦* into the right location */
int key = hash_function(lastName);
if(table[key].lastName[0] == '\0')
strcpy(table[key].lastName, lastName);
else {
entry *tmp = table[key].pNext;
table[key].pNext = (entry *) malloc(sizeof(entry));
strcpy(table[key].pNext->lastName, lastName);
table[key].pNext->pDetail = NULL;
table[key].pNext->pNext = tmp;
}
}
```
將原本 linked list implementation 修改為 hash table
### cache-test
```
Performance counter stats for './phonebook_opt' (100 runs):
209,585 cache-misses # 74.282 % of all cache refs ( +- 0.11% )
282,147 cache-references ( +- 0.18% )
272,838,801 instructions # 1.58 insns per cycle ( +- 0.00% )
173,166,124 cycles ( +- 0.06% )
0.081228612 seconds time elapsed ( +- 0.41% )
```
[Qustion] 大抵與降低 struct size 減少後相同,但注意到 cache-reference 次數從 311,822 -> 282,147 (減少 10%),不知道是否為結尾沒有做 free 之影響。
### plot
![](https://i.imgur.com/87pK7od.png)
append() 上升至與 original 相同,findName() 則下降至幾乎沒有。故接下來我想把目標放在降低 append() Execution time
### 使用 perf 觀察 hash table 版本之 cache misses
```
Samples: 364 of event 'cache-misses', Event count (approx.): 209224
Overhead Command Shared Object Symbol
62.86% phonebook_opt [kernel.kallsyms] [k] get_page_from_freelist
10.15% phonebook_opt [kernel.kallsyms] [k] native_flush_tlb_single
8.15% phonebook_opt [kernel.kallsyms] [k] __copy_from_user_ll_nozero
3.02% phonebook_opt [kernel.kallsyms] [k] __kunmap_atomic
2.35% phonebook_opt [kernel.kallsyms] [k] page_remove_rmap
1.79% phonebook_opt [kernel.kallsyms] [k] handle_mm_fault
1.65% phonebook_opt [kernel.kallsyms] [k] free_pcppages_bulk
1.41% phonebook_opt [kernel.kallsyms] [k] wp_page_copy.isra.74
1.20% phonebook_opt [kernel.kallsyms] [k] unmap_page_range
```
malloc() 我等等就來解決你!
[Qustion] 不知道為什麼 '__strcasecmp_l_sse4_2' 通通不見了...
### 將 append() 計算 hash value 的時間減少
```clike=
while (fgets(line, sizeof(line), fp)) {
int hash_value = 0;
while (line[i] != '\0') {
hash_value += line[i];
i++;
}
line[i - 1] = '\0';
```
因在 main.c 在做 I/O 時已經會處理過每一個字串,在此時便可先計算出 hash value,不用到 append() 時再計算一次,從 0.08s -> 0.0722s,改善幅度不大QQ (減少 10%)
![](https://i.imgur.com/6fRamWy.png)
### 小結
* [Question] 當 opt 部分的 code 改的部分越多,#if, #endif 逐漸會顯得凌亂,想同時比較三份 code 之圖表,想另外再處理。
* [Question] 如果採用 Hash table,不知道該如何清空 cache,能想到的辦法是預先分配 memory pool,才能掌握自己 malloc 的起始與終止位址。
* [Question] 程式結束時,free 之必要性?
<br>
## 改善方向三:更改 malloc 為 memory pool
**commit id: 1251464f2c0ca0f9adf4486ade4f4a46033c4f3e**
一次 malloc 所需要之空間給 memory pool
### plot
![](https://i.imgur.com/MPpTOMQ.png)
### cache-test
```
Performance counter stats for './phonebook_opt' (100 runs):
149,254 cache-misses # 69.794 % of all cache refs ( +- 0.11% )
213,849 cache-references ( +- 0.15% )
193,345,737 instructions # 1.54 insns per cycle ( +- 0.01% )
125,624,775 cycles ( +- 0.05% )
0.062834888 seconds time elapsed ( +- 1.60% )
```
[Question] 原本只預期 cache-misses 降低,但沒想到 cache-references 與上個版本比較,下降了 24.3%
### 使用 perf 觀察 hash table + memory pool 版本之 cache misses
```
Samples: 30 of event 'cache-misses', Event count (approx.): 149390
Overhead Command Shared Object Symbol
29.09% phonebook_opt [kernel.kallsyms] [k] clear_huge_page
20.95% phonebook_opt [kernel.kallsyms] [k] get_page_from_freelist
19.87% phonebook_opt [kernel.kallsyms] [k] native_flush_tlb_single
8.60% phonebook_opt [kernel.kallsyms] [k] find_get_entry
7.30% phonebook_opt [kernel.kallsyms] [k] __kunmap_atomic
6.25% phonebook_opt [kernel.kallsyms] [k] __copy_from_user_ll_nozero
2.83% phonebook_opt [kernel.kallsyms] [k] strlen
2.74% phonebook_opt libc-2.23.so [.] __mmap
1.97% phonebook_opt [kernel.kallsyms] [k] xfer_secondary_pool
```
* 成功減緩 get_page_from_freelist 之問題,其 cache misses ratio 從上個版本的 62% -> 20.92%
* [Question] 多出了 [clear_huge_page](http://lxr.free-electrons.com/source/mm/memory.c#L4104), [migrate_page_copy](http://lxr.free-electrons.com/source/mm/migrate.c#L621),尚未知道其用途。
### 小結
* cache misses 從原本 hash table 版本:74% -> 69%
* append() execution time 進一步下降:0.0722s -> 0.0626s
* [Question] 使用 memory pool,其大小我是事先宣告,若要動態視 input 大小決定,想不到比較好的方法解決。自己覺得折衷的方法是一次不用 malloc 完所需要的空間,但可以一次宣告大一點的空間。
<br>
## 改善方向四:Hash table 使用大一點的 item
**commit id: 74f11e78fb49c52770fa90bd0aff6b0132f0d561**
觀察原本 __PHONE_BOOK_ENTRY size 為 32 byte,為配合 cache block size 為 64 byte,我可以宣告一個大一點的 struct __HASH_ITEM 包含兩個 __PHONE_BOOK_ENTRY。
```clike=
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct __PHONE_BOOK_ENTRY_DETAIL *pDetail;
} entry;
typedef struct __HASH_ITEM {
struct __PHONE_BOOK_ENTRY slot[2];
struct __HASH_ITEM *hNext;
} hitem;
```
### plot
![](https://i.imgur.com/Se9Eq9X.png)
### cache-test
```
Performance counter stats for './phonebook_opt' (100 runs):
147,686 cache-misses # 62.053 % of all cache refs ( +- 0.16% )
237,999 cache-references ( +- 0.35% )
180,179,581 instructions # 1.33 insns per cycle ( +- 0.04% )
135,115,547 cycles ( +- 0.10% )
0.064164752 seconds time elapsed ( +- 0.58% )
```
### 使用 perf 此版本之 cache misses
```
Samples: 56 of event 'cache-misses', Event count (approx.): 146322
Overhead Command Shared Object Symbol
24.78% phonebook_opt [kernel.kallsyms] [k] get_page_from_freelist
23.10% phonebook_opt [kernel.kallsyms] [k] clear_huge_page
12.00% phonebook_opt [kernel.kallsyms] [k] __copy_from_user_ll_nozero
9.26% phonebook_opt [kernel.kallsyms] [k] native_flush_tlb_single
8.39% phonebook_opt libc-2.23.so [.] _IO_getline_info
7.76% phonebook_opt [kernel.kallsyms] [k] native_pte_clear N
```
### 小結
* append() Execution time 沒有受到影響
* [Question] Cache References 又上升了 10% (213,849 -> 237,999)
**猜測是多了 __HASH_ITEM struct 之緣故**
* cache misses 進一步從 69.794% -> 62.053%
<br>
## 改善方向五:Hash table 的每個 row 擁有自己的 memory pool
**commit id: 1cda2ec90beaa18263f360ad419a10f3beb60243**
改善在 Hash table 同一個 row 搜尋及新增的成本,連續的位址期望可降低 cache-misses。
memory pool 改良為一次宣告一部分的空間,若該 row 之 memory pool 之空間用完時,再透過自己撰寫的 pool_allocate 取得空間。
```clike=
pool *memory_pool[HASH_TABLE_SIZE];
for(int i = 0; i < HASH_TABLE_SIZE; i++)
memory_pool[i] = pool_init(15 * sizeof(hitem));
```
每個 row 擁有自己的 memory pool
```clike=
typedef struct __POOL_LIST {
char *start;
struct __POOL_LIST *link;
} plist;
typedef struct __POOL {
char *next;
char *end;
size_t size;
int count;
struct __POOL_LIST *mlist;
} pool;
```
由於 memory pool 設計的更動,勢必得每次 pool_allocate 新空間給 memory pool 時,維護他目前所擁有的空間。在這邊使用 plist 把每次空間的起始位址給串起來。不過只有在清除 cache 時用到這個 list。
### plot
![](https://i.imgur.com/trHucgX.png)
### cache-test
```
Performance counter stats for './phonebook_opt' (100 runs):
338,859 cache-misses # 51.688 % of all cache refs ( +- 0.17% )
655,590 cache-references ( +- 0.17% )
188,114,159 instructions # 1.19 insns per cycle ( +- 0.00% )
158,371,962 cycles ( +- 0.06% )
0.074786404 seconds time elapsed ( +- 0.38% )
```
### 使用 perf 此版本之 cache misses
```
Samples: 292 of event 'cache-misses', Event count (approx.): 309526
Overhead Command Shared Object Symbol
43.43% phonebook_opt [kernel.kallsyms] [k] get_page_from_freelist
22.77% phonebook_opt libc-2.23.so [.] _IO_getline_info
9.73% phonebook_opt [kernel.kallsyms] [k] native_flush_tlb_single
8.57% phonebook_opt [kernel.kallsyms] [k] __copy_from_user_ll_nozero
2.85% phonebook_opt [kernel.kallsyms] [k] __kunmap_atomic
2.15% phonebook_opt phonebook_opt [.] append
1.89% phonebook_opt libc-2.23.so [.] _IO_fgets
1.13% phonebook_opt [kernel.kallsyms] [k] prepend_name
```
### 小結
* 因為需要 malloc 的關係,append() Execution Time 較上一個版本略為增加
* cache misses 從上一個版本:62% -> 51%
<br>
## 改善方向六:改善 Hash function
**commit id: 05ec020f5128ef5c6a084654dfb8929d38da54aa**
在這邊參考 [nekoneko](https://hackmd.io/s/rJCs_ZVa) 的 hash function 2,但有改良一下 bucket number M = 2737
### plot
![](https://i.imgur.com/EoavsvK.png)
![](https://i.imgur.com/KEpClBa.png)
### cache-test
```
Performance counter stats for './phonebook_opt' (100 runs):
407,907 cache-misses # 25.821 % of all cache refs ( +- 0.56% )
1,579,760 cache-references ( +- 0.12% )
212,887,099 instructions # 1.14 insns per cycle ( +- 0.00% )
185,930,380 cycles ( +- 0.11% )
0.091072857 seconds time elapsed ( +- 0.60% )
```
### 使用 perf 此版本之 cache misses
```
Samples: 334 of event 'cache-misses', Event count (approx.): 358953
Overhead Command Shared Object Symbol
32.73% phonebook_opt [kernel.kallsyms] [k] get_page_from_freelist
23.82% phonebook_opt libc-2.23.so [.] _IO_getline_info
7.11% phonebook_opt phonebook_opt [.] append
6.43% phonebook_opt [kernel.kallsyms] [k] __copy_from_user_ll_nozero
4.59% phonebook_opt [kernel.kallsyms] [k] native_flush_tlb_single
4.02% phonebook_opt phonebook_opt [.] main
2.63% phonebook_opt libc-2.23.so [.] _IO_fgets
2.23% phonebook_opt phonebook_opt [.] pool_allocate
1.87% phonebook_opt libc-2.23.so [.] __memcpy_ia32
1.48% phonebook_opt [kernel.kallsyms] [k] page_remove_rmap
1.40% phonebook_opt libc-2.23.so [.] __memchr_ia32
1.40% phonebook_opt [kernel.kallsyms] [k] __rmqueue.isra.90
1.37% phonebook_opt libc-2.23.so [.] _int_malloc
1.34% phonebook_opt [kernel.kallsyms] [k] jbd2_journal_add_journal_head
```
### 小結
* append() Execution Time 增加許多,因 Hash function 變複雜
* Cache missis 從上個版本大幅下降! **51.688% -> 25.821%**
* [Question] 該如何調整當 Hash table 每個 Row 的 memory pool 不夠時所增加的量?
### 參考材料:
* 參考 [nekoneko之筆記](https://hackmd.io/s/rJCs_ZVa) 之 Hash function 分析
* 參考 [memory pool implementation](http://stackoverflow.com/questions/11749386/implement-own-memory-pool)