owned this note
owned this note
Published
Linked with GitHub
# 2017q3 Homework1 (phonebook)
contributed by <`LinRiver`>
## 實驗環境架設
```
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 78
Model name: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
Stepping: 3
CPU MHz: 1247.949
CPU max MHz: 3100.0000
CPU min MHz: 400.0000
BogoMIPS: 5184.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
NUMA node0 CPU(s): 0-3
```
## perf設定
依照 [perf原理與實務](https://hackmd.io/s/B11109rdg#) 步驟設定perf
* 執行: ```$ sudo apt-get install linux-tools-common```
* 執行: ```$ perf top```
接著會出現錯誤訊息須更改權限值進行後續效能分析
* 執行: ```$ sudo sh -c "echo -1 > /proc/sys/kernel/perf_event_paranoid"```
並且輸入以下指令解除kernel pointer禁用
* 執行: ```$ sudo sh -c "echo 0 > /proc/sys/kernel/kptr_restrict"```
## 程式效能分析(優化前)
* 觀察優化前的資料結構型態: ``` $ vim ./phonebook_orig.h ```
```
/* original version */
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
```
entry的資料結構型態為linked list
共計 136 bytes 由16+16+10+10+16+16+16+2+5+5(記憶體會有 [alignment](https://stackoverflow.com/questions/381244/purpose-of-memory-alignment) )+8( pointer size 在64 bit 為 8 bytes)
* 了解優化前的搜尋程式: ``` $ vim ./phonebook_orig.c ```
```
/* original version */
entry *findName(char lastName[], entry *pHead)
{
while (pHead != NULL) {
if (strcasecmp(lastName, pHead->lastName) == 0)
return pHead;
pHead = pHead->pNext;
}
return NULL;
}
entry *append(char lastName[], entry *e)
{
/* allocate memory for the new entry and put lastName */
e->pNext = (entry *) malloc(sizeof(entry));
e = e->pNext;
strcpy(e->lastName, lastName);
e->pNext = NULL;
return e;
}
```
可以清楚看到 findName() 為 linked list 的搜尋方式
另外 append() 則是在entry後新增entry並且分配記憶體來放置 lastName
了解原始程式碼架構後開始執行效能分析
* 執行: ``` $ echo 1 | sudo tee /proc/sys/vm/drop_caches ```
首先清空cache
開始執行效能分析
* 執行: ``` $ ./phonebook_orig ```
```
size of entry : 136 bytes
execution time of append() : 0.079085 sec
execution time of findName() : 0.005247 sec
```
測試cache miss rate
* 執行: ``` perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig ```
* 結果:
```
Performance counter stats for './phonebook_orig' (100 runs):
461,9888 cache-misses # 93.254 % of all cache refs ( +- 0.07% )
495,4080 cache-references ( +- 0.08% )
2,6176,4908 instructions # 1.52 insn per cycle ( +- 0.02% )
1,7224,3408 cycles ( +- 0.10% )
0.058807717 seconds time elapsed ( +- 1.13% )
```
cache miss 達93.254%
透過perf hotspot 進一步了解原因
* 執行: ``` $ sudo perf record -e cache-misses./phonebook_orig```
* 執行: ``` $ sudo perf report ```
* 結果:
```
Samples: 316 of event 'cache-misses', Event count (approx.): 4731251
Overhead Command Shared Object Symbol
53.21% phonebook_orig libc-2.23.so [.] __strcasecmp_l_avx
17.11% phonebook_orig phonebook_orig [.] findName
9.77% phonebook_orig [kernel.kallsyms] [k] clear_page_c_e
3.41% phonebook_orig [kernel.kallsyms] [k] _raw_spin_lock
2.35% phonebook_orig [kernel.kallsyms] [k] __rmqueue
2.32% phonebook_orig libc-2.23.so [.] __strcasecmp_avx
2.19% phonebook_orig [kernel.kallsyms] [k] free_pcppages_bulk
1.40% phonebook_orig [kernel.kallsyms] [k] copy_user_enhanced_fast_string
1.30% phonebook_orig [kernel.kallsyms] [k] get_page_from_freelist
1.16% phonebook_orig [kernel.kallsyms] [k] __dec_node_state
1.07% phonebook_orig [kernel.kallsyms] [k] try_charge
0.97% phonebook_orig [kernel.kallsyms] [k] free_hot_cold_page_list
0.59% phonebook_orig [kernel.kallsyms] [k] __alloc_pages_nodemask
0.53% phonebook_orig [kernel.kallsyms] [k] handle_mm_fault
0.48% phonebook_orig [kernel.kallsyms] [k] alloc_pages_vma
0.42% phonebook_orig [kernel.kallsyms] [k] mem_cgroup_try_charge
0.25% phonebook_orig [kernel.kallsyms] [k] pmd_page_vaddr
0.25% phonebook_orig [kernel.kallsyms] [k] get_mem_cgroup_from_mm
0.23% phonebook_orig ld-2.23.so [.] _dl_catch_error
0.15% phonebook_orig [kernel.kallsyms] [k] pte_lockptr.isra.15
0.13% phonebook_orig [kernel.kallsyms] [k] __vm_enough_memory
```
主要hotspot是findName函數字串比對次數過多而佔了17.11%
## 程式效能分析(優化後)
優化方向目前有二:
* 縮減entry資料結構大小
* 使用hash function進行搜尋
### 縮減entry資料結構大小
```
typedef struct __PHONE_BOOK_ITEM {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} item;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct __PHONE_BOOK_ITEM *pitem;
struct __PHONE_BOOK_ENTRY *pNext;
}entry;
```
在此電話簿搜尋中, 其 append() 與 findName() 主要以 last name 進行。 將 last name 從原先資料結構中移出, 建立只儲存 last name 且設立指針指向詳細資訊的struct。
```
Performance counter stats for './phonebook_opt' (100 runs):
155,2555 cache-misses # 68.910 % of all cache refs ( +- 0.26% )
225,3006 cache-references ( +- 0.10% )
2,4438,8739 instructions # 2.09 insn per cycle ( +- 0.02% )
1,1719,5806 cycles ( +- 0.12% )
0.039599956 seconds time elapsed ( +- 0.70% )
```
刪減過後的 struct 進行分析後, 可發現 cache miss 從 93.254% 降至 68.910%
![](https://i.imgur.com/PeAsNNo.png)
同時相較原版本, append() 與 findName() 之執行時間也都下降
### 使用hash function進行搜尋
hash function介紹(參考[Fundamental Data Structure](https://www.scribd.com/document/199819816/Fundamental-Data-Structures))
```
A hash function is any algorithm or subroutine that mapslarge data sets of variable length, called keys, to smallerdata sets of a fixed length. For example, a person's name, having a variable length, could be hashed to a singleinteger. The values returned by a hash function are called
hash values, hash codes, hash sums, checksums or simply hashes.
```
![](https://i.imgur.com/5NhTi6a.png)
由圖示可發現, 若是兩筆資料( Josh Smith 與 Sam Doe )映射至同一值( 02 )時, 會產生 Collision
欲避免此種現象發生, 在這使用 hash table with chaining 方式, 在同一值之地址上建立 linked list 來串接此兩筆資料
參考[tina0405共筆](https://hackmd.io/GYFgpghgrAHDEFoAmUBsSHgMwCMERAGMBOBQgdlQEYAGGKnVYiGIA===?view), 主要進行 append() 資料放置上從 linked list 到 hash table 修改
```
void *append(char lastName[], unsigned int pos)
{
while(strcmp(hashTable[pos]->lastName, "\000")) {
hashTable[pos]->pNext = (slot_unit *) malloc(sizeof(slot_unit));
hashTable[pos]=hashTable[pos]->pNext;
}
strcpy(hashTable[pos]->lastName, lastName);
hashTable[pos]->pNext = (slot_unit *) malloc(sizeof(slot_unit));
hashTable[pos]=hashTable[pos]->pNext;
}
```
參考[Yuron共筆](https://embedded2016.hackpad.com/ep/pad/static/NcHhQCQ4ijr), 注意到需將字串資料轉換為數值, 將每一字元相加轉為key
```
unsigned int stringToInt(char *key)
{
int number = 0;
while (*key)
number += (int)*key++;
return number;
}
```
在使用hash function 並同時縮減 struct 資料結構的分析下, 其cache miss 從 68.910% 降至 43.553%
```
Performance counter stats for './phonebook_opt' (100 runs):
51,2191 cache-misses # 43.553 % of all cache refs ( +- 0.13% )
117,6008 cache-references ( +- 0.73% )
2,2735,9330 instructions # 2.06 insn per cycle ( +- 0.02% )
1,1040,0258 cycles ( +- 0.35% )
0.037077142 seconds time elapsed ( +- 0.39% )
```
同時 append() 與 findName() 之執行時間有所變化
append() 從 0.033114 升至 0.043364 秒
findName() 從 0.001892 降至 0.000121 秒
## 參考資料
[perf原理與實務](https://hackmd.io/s/B11109rdg#)
[phonebook作業解說](https://hackmd.io/s/HJJUmdXsZ#)
[Fundamental Data Structure](https://www.scribd.com/document/199819816/Fundamental-Data-Structures)
[paul5566共筆](https://hackmd.io/CYIwjAhgbArALMAtFAHMApouBmADCxFMXAY0QCYVySB2KEkGdOAMyA==#)
[Hsiang共筆](https://hackmd.io/s/HylfV2FFe#)
[st9007a共筆](https://hackmd.io/s/BJvNXiEib#)
[tina0405共筆](https://hackmd.io/GYFgpghgrAHDEFoAmUBsSHgMwCMERAGMBOBQgdlQEYAGGKnVYiGIA===?view)
[Yuron共筆](https://embedded2016.hackpad.com/ep/pad/static/NcHhQCQ4ijr)