# 2017q3 Homework1 (phonebook)
contributed by <`kevin550029`>
## Perf 工具練習
### 安裝&初步測試
1. 按照 [perf 原理和實務](https://hackmd.io/s/B11109rdg) 中的步驟安裝perf
2. 嘗試執行 `perf top` 會出現錯誤訊息, 表示只能在root權限下使用
![error message](https://i.imgur.com/rpdDV6L.png)
3. 嘗試輸入下列指令, 但仍無法更改 restrict
```
$ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"
```
4. 解決方法是在另外開一個 Terminal , 並在 Kernal 模式下執行
5. 利用 `perf top -p $pid` 測試 perf_top_example, 結果如下
![](https://i.imgur.com/GJcH37s.png)
#### 練習使用perf stat
* 測試用程式碼 `stat_test.c`
```clike=
#include <stdio.h>
#include <stdint.h>
static char array[10000][10000];
int main (void){
int i, j;
for (i = 0; i < 10000; i++)
for (j = 0; j < 10000; j++)
array[j][i]++;
return 0;
}
```
* 使用 perf stat 分析 `stat_test.c` cache miss 情形
```
g++ -c stat_test.c
g++ stat_test.o -o stat
perf stat ./stat
```
得到結果如下
```
Performance counter stats for './stat' (5 runs):
493,7852 cache-misses # 2.241 % of all cache refs ( +- 1.18% )
2,2034,9691 cache-references ( +- 0.15% )
22,0592,2702 instructions # 2.04 insns per cycle ( +- 0.00% )
10,8220,2494 cycles ( +- 0.19% )
0.344483546 seconds time elapsed ( +- 1.39% )
```
考慮存取的局部性,將 i,j對調改成array[i][j]++
```clike=
#include <stdio.h>
#include <stdint.h>
static char array[10000][10000];
int main (void){
int i, j;
for (i = 0; i < 10000; i++)
for (j = 0; j < 10000; j++)
array[i][j]++;
return 0;
}
```
cache-references 從 2,2034,9691 下降到 519,4173
```
201,6420 cache-misses # 38.821 % of all cache refs ( +- 1.78% )
519,4173 cache-references ( +- 0.75% )
22,0580,7841 instructions # 2.74 insns per cycle ( +- 0.00% )
8,0382,8158 cycles ( +- 0.04% )
0.257051135 seconds time elapsed ( +- 2.12% )
```
---
## Phonebook
### 測試環境
```
Linux version 4.4.0-96-generic
Ubuntu 16.04.3 LTS
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Model name: Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz
```
### 測試原始程式碼
```
size of entry : 136 bytes
execution time of append() : 0.038564 sec
execution time of findName() : 0.005541 sec
Performance counter stats for './phonebook_orig' (100 runs):
463,5331 cache-misses # 93.928 % of all cache refs ( +- 0.86% )
493,4994 cache-references ( +- 0.83% )
2,5927,9283 instructions # 1.39 insns per cycle ( +- 0.53% )
1,8636,6232 cycles ( +- 0.66% )
0.061023413 seconds time elapsed ( +- 1.04% )
```
> cache-misses 為 93.928 %
* phonebook_orig.c
```clike=
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "phonebook_orig.h"
/* original version */
entry *findName(char lastName[], entry *pHead)
{
while (pHead != NULL) {
if (strcasecmp(lastName, pHead->lastName) == 0)
return pHead;
pHead = pHead->pNext;
}
return NULL;
}
entry *append(char lastName[], entry *e)
{
/* allocate memory for the new entry and put lastName */
e->pNext = (entry *) malloc(sizeof(entry));
e = e->pNext;
strcpy(e->lastName, lastName);
e->pNext = NULL;
return e;
}
```
> findname 和 append 皆使用 Linklist 的結構
> Linklist 插入或刪除節點較為容易
> 但在搜尋存取上必須花時間去取出鏈結指標, 以便讀取下一個節點
> 若資料量大, 則須花較多時間搜尋
* phonebook_orig.h
```clike=
#ifndef _PHONEBOOK_H
#define _PHONEBOOK_H
#define MAX_LAST_NAME_SIZE 16
/* original version */
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
entry *findName(char lastName[], entry *pHead);
entry *append(char lastName[], entry *e);
#endif
```
## 修改Struct內容
* 從減少 cache-miss 角度出發
* 先嘗試將struct中尚未使用到的內容註解掉
```clike=
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
/*
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
*/
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
entry *findName(char lastName[], entry *pHead);
entry *append(char lastName[], entry *e);
```
> 可以發現 cache-misses 的情況下降到 68.699%,
> append 和 findname 的 performance 都有增加
```
Performance counter stats for './phonebook_opt' (100 runs):
152,6120 cache-misses # 68.699 % of all cache refs ( +- 0.46% )
222,1462 cache-references ( +- 0.12% )
2,4388,2112 instructions # 2.03 insns per cycle ( +- 0.02% )
1,1994,1058 cycles ( +- 0.22% )
0.039064779 seconds time elapsed ( +- 0.33% )
```
![](https://i.imgur.com/bM07eho.png)
* 縮小 Struct 內容
```clike=
typedef struct __PHONE_BOOK_USER_INFO{
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} UserInfoEntry;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
UserInfoEntry *UserInfo;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
```
> cache-misses 的情況下降到 70.090%,
```
Performance counter stats for './phonebook_opt' (100 runs):
154,9857 cache-misses # 70.090 % of all cache refs ( +- 0.20% )
221,1253 cache-references ( +- 0.11% )
2,4384,1941 instructions # 2.02 insns per cycle ( +- 0.02% )
1,2076,2749 cycles ( +- 0.13% )
0.039567825 seconds time elapsed ( +- 0.65% )
```
![](https://i.imgur.com/XDawH7F.png)
## 參考資料
[共筆 ryanwang522](https://hackmd.io/s/Sk-qaWiYl)
[共筆 ZixinYang](https://hackmd.io/s/B1qEBZFoW)
[共筆 tina0405](https://hackmd.io/GYFgpghgrAHDEFoAmUBsSHgMwCMERAGMBOBQgdlQEYAGGKnVYiGIA===?view)
[perf 原理和實務](https://hackmd.io/s/B11109rdg)
[hash function 觀念和實務](https://hackmd.io/s/HJln3jU_e)
[Git 的基本使用](https://gogojimmy.net/2012/01/17/how-to-use-git-1-git-basic/)