owned this note
owned this note
Published
Linked with GitHub
# 2016q3 Homework 1 (phonebook)
contributed by <`vic85821`>
### Reviewed by `jkrvivian`
* 優化方法一(使用union),已改變程式行為,因此無法和其他優化方法比較。
* 沒有評析binary search tree搜尋演算法對findName()的影響
* 已提出使用hash function優化搜尋速度,還未實作,也還未比較不同hash function間的效能
* 沒有採用不同的data set實驗,因原本使用的資料與現實差異很大
* [commit 2b10484](https://github.com/vic85821/phonebook/commit/2b10484241bc2033b40f7a0e433072ddabf25966)的commit訊息很模糊,並不知道具體做了甚麼修改
## 開發環境
* Ubuntu 16.04.1 LTS
>不小心就把windows洗掉了...[name=vic85821]
* Linux version 4.4.0-38-generic
* CPU : Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz
* MEM : 8GB
* L1d cache : 32K
* L1i cache : 32K
* L2 cache : 256K
* L3 cache : 3072K
> 一併列出硬體規格,像是 CPU processor, cache memory 等資訊,這樣日後實驗重現才有依據 [name=jserv]
因為我的硬體設備比較特別,linux抓不到我的無線網卡,因此要額外安裝驅動程式
* `sudo lshw -C network` 查詢網卡產品名稱 : **MT7630e**
* 到 [網站](https://github.com/neurobin/MT7630E) 抓取驅動程式
* `cd`到該檔案所在的目錄
```shell
$ sudo chmod +x install
$ sudo ./install
```
### perf
`$ perf list`
```
List of pre-defined events (to be used in -e):
branch-instructions OR branches [Hardware event]
branch-misses [Hardware event]
bus-cycles [Hardware event]
cache-misses [Hardware event]
cache-references [Hardware event]
cpu-cycles OR cycles [Hardware event]
instructions [Hardware event]
ref-cycles [Hardware event]
stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
alignment-faults [Software event]
bpf-output [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
dummy [Software event]
emulation-faults [Software event]
major-faults [Software event]
minor-faults [Software event]
page-faults OR faults [Software event]
```
**範例效能評析測試**
![](https://i.imgur.com/V1iKAyz.png)
>第一次使用perf,還不是很熟悉 試了很多次 ` perf top -p $pid `才成功!! [name=vic8521] [time=Sat, Sep 24, 2016 5:17 PM]
>>「程式輸出的文字訊息」請勿使用圖片表示,請改由文字貼上[name=課程助教]
### gnuplot
>之前大二在程式語言,曾經對matrix multiplication,以不同的方法執行(暴力解, 施特拉森演算法, Concurrency),並對效能進行分析 [color=#dd75c5]
![](https://i.imgur.com/CHY8PrS.png)
### astyle
用來檢查coding style是否符合
`astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]`
* indent=spaces=4 縮排=4個空白
## 案例分析 : phonebook
### 作業說明
更改程式碼以降低cache-miss的比例
### 猜想
- [ ] 更改資料結構,減少不常使用的儲存空間
- [ ] 更改搜尋的方法
### 未優化版本(origin)
**struct**
```c
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
```
* 資料結構冗長
---
**findName**
```c
entry *findName(char lastname[], entry *pHead)
{
while (pHead != NULL) {
if (strcasecmp(lastname, pHead->lastName) == 0)
return pHead;
pHead = pHead->pNext;
}
return NULL;
}
```
* 方法效率低(如果不是,一直往下找直到null)
---
**append**
```c
entry *append(char lastName[], entry *e)
{
/* allocate memory for the new entry and put lastName */
e->pNext = (entry *) malloc(sizeof(entry));
e = e->pNext;
strcpy(e->lastName, lastName);
e->pNext = NULL;
return e;
}
```
---
### 執行結果
`make cache-test`
```
size of entry : 136 bytes
execution time of append() : 0.043322 sec
execution time of findName() : 0.005194 sec
```
```
Performance counter stats for './phonebook_orig' (100 runs):
2,037,607 cache-misses # 94.069 % of all cache refs ( +- 0.38% )
2,166,081 cache-references ( +- 0.32% )
261,123,688 instructions # 1.36 insns per cycle ( +- 0.02% )
191,871,256 cycles ( +- 0.85% )
0.070512191 seconds time elapsed ( +- 1.71% )
```
---
### 分析
main.c的一段程式碼
```c
e = pHead;
/* the givn last name to find */
char input[MAX_LAST_NAME_SIZE] = "zyxel";
e = pHead;
assert(findName(input, e) &&
"Did you implement findName() in " IMPL "?");
assert(0 == strcmp(findName(input, e)->lastName, "zyxel"));
```
可以發現都是去找"zyxel",並且findname()都是根據lastname去搜尋
- 資料結構的精減化
---
### 優化版本
**1.資料結構優化**
* 版本1
```c
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
union {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
};
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
```
執行結果
`make cache-test`
```
size of entry : 40 bytes
execution time of append() : 0.072287 sec
execution time of findName() : 0.003676 sec
```
```
Performance counter stats for './phonebook_opt' (100 runs):
389,500 cache-misses # 66.630 % of all cache refs ( +- 0.33% )
584,572 cache-references ( +- 1.12% )
243,952,419 instructions # 1.72 insns per cycle ( +- 0.02% )
141,447,587 cycles ( +- 0.83% )
0.050986039 seconds time elapsed ( +- 1.70% )
```
cache-miss 原本 `94.069%` -> 優化後 `66.630%`
>老師上課也有說過,單純的簡化資料結構,也可以達到最佳化!!是真的!! [color=#dd75c5]
* note : union會自動對齊內部包含的變數中,size最大的空間(但**對齊會有個最小值。e.g. 4/8byte**,仍有可能造成空間浪費)
>> "example" 的簡寫是 [e.g.](https://en.wiktionary.org/wiki/e.g.),不是 "ex",後者是拿來描述前女友、前東家等詞彙
>> 這段程式碼已經改變預期行為了,作最佳化不能改變正確性。 [name=jserv]
---
* 版本2
```c
typedef struct _PHONE_BOOK_DETAIL {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} detail;
typedef struct _PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
detail* detailDate;
struct _PHONE_BOOK_ENTRY *pNext;
} entry;
```
執行結果
`perf stat -r 100 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses ./phonebook_opt
`
```
size of entry : 24 bytes
execution time of append() : 0.070298 sec
execution time of findName() : 0.003760 sec
```
```
Performance counter stats for './phonebook_opt' (100 runs):
252,673 cache-misses # 67.435 % of all cache refs ( +- 0.44% ) (64.12%)
374,694 cache-references ( +- 0.82% ) (64.44%)
713,048 L1-dcache-load-misses ( +- 3.34% ) (65.51%)
240,697 L1-dcache-store-misses ( +- 1.05% ) (70.62%)
522,278 L1-dcache-prefetch-misses ( +- 3.61% ) (73.19%)
87,401 L1-icache-load-misses ( +- 3.72% ) (67.82%)
0.046339100 seconds time elapsed ( +- 1.24% )
```
![](https://i.imgur.com/SZNNRUp.png)
> 備註:這部份都尚未修改findname() append(),而是使用phonebook_orig.c 裡的function [color=#9707bf]
---
**2.更改搜尋方法**
* 版本1 binary search tree
```
+-----+
|node |
+----+ +----+ +-------+-----+
+----+ |node| |node| | |
|node| +---> +-------+ +--------+ +--v-+ +-v--+
+----+ | +----> | | +---> |node| |node|
+--v--+ | | +-----------+ +----+
|node | +--v-+ +--v-+ | |
+-----+ |node| |node| +-v--+ +-v--+
+----+ +----+ |node| |node|
+----+ +----+
```
對資料結構稍做更動,增加 **bst**
```c
#define OPT 1
#define BST 1
typedef struct _PHONE_BOOK_DETAIL {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} detail;
typedef struct _PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
detail *detailDate;
struct _PHONE_BOOK_ENTRY *pNext;
} entry;
typedef struct _PHONE_BOOK_BST {
struct _PHONE_BOOK_ENTRY *data;
struct _PHONE_BOOK_BST *left;
struct _PHONE_BOOK_BST *right;
} bst;
bst *findName(char lastname[], bst *root);
entry *append(char lastName[], entry *e);
bst *build_bst(entry** pHead,int num);
#endif
```
建立binary search tree雖然findname()時間變快許多,但是一開始還要額外花時間在建樹
```c
//build binary search
bst *build_bst(entry** pHead,int num)
{
if(num == 0)
return NULL;
bst *root,*left;
//recursive build left leaf
left = build_bst(pHead, num>>1);
// build root(center node)
root = (bst *) malloc(sizeof(bst));
root -> data = (entry *)malloc(sizeof(entry));
strcpy((root->data)->lastName, (*pHead)->lastName);
root->left = left;
// build right left
entry *tmp = *pHead;
*pHead = (*pHead)->pNext;
free(tmp);
root->right = build_bst(pHead, num-(num>>1)-1);
return root;
}
```
執行結果
`perf stat -r 100 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses ./phonebook_opt`
```shell
size of entry : 32 bytes
execution time of append() : 0.080784 sec # 包含建樹,可以看出用了許多時間
execution time of findName() : 0.000001 sec
```
```
Performance counter stats for './phonebook_opt' (100 runs):
555,542 cache-misses # 85.642 % of all cache refs ( +- 0.51% ) (65.22%)
648,682 cache-references ( +- 0.47% ) (66.16%)
1,059,861 L1-dcache-load-misses ( +- 0.47% ) (67.19%)
772,671 L1-dcache-store-misses ( +- 0.35% ) (68.97%)
314,794 L1-dcache-prefetch-misses ( +- 0.66% ) (68.96%)
89,415 L1-icache-load-misses ( +- 1.78% ) (66.29%)
0.100239647 seconds time elapsed ( +- 2.05% )
```
![](https://i.imgur.com/MwcupNB.png)
* ==cache miss (85.642%)== 沒有下降的很明顯,可能原因為
* struct bst 有額外的`bst *left,*right`,在建樹的時候使用遞迴,造成大量cache miss
---
## Makefile
之前很少用到makefile,如果有也只是簡單打幾行compile的指令,第一次看到如同這次的Makefile
```
CC ?= gcc
CFLAGS_common ?= -Wall -std=gnu99
CFLAGS_orig = -O0
CFLAGS_opt = -O0
EXEC = phonebook_orig phonebook_opt
all: $(EXEC)
SRCS_common = main.c
phonebook_orig: $(SRCS_common) phonebook_orig.c phonebook_orig.h
$(CC) $(CFLAGS_common) $(CFLAGS_orig) \
-DIMPL="\"$@.h\"" -o $@ \
$(SRCS_common) $@.c
phonebook_opt: $(SRCS_common) phonebook_opt.c phonebook_opt.h
$(CC) $(CFLAGS_common) $(CFLAGS_opt) \
-DIMPL="\"$@.h\"" -o $@ \
$(SRCS_common) $@.c
run: $(EXEC)
echo 3 | sudo tee /proc/sys/vm/drop_caches
watch -d -t "./phonebook_orig && echo 3 | sudo tee /proc/sys/vm/drop_caches"
cache-test: $(EXEC)
perf stat --repeat 100 \
-e cache-misses,cache-references,instructions,cycles \
./phonebook_orig
perf stat --repeat 100 \
-e cache-misses,cache-references,instructions,cycles \
./phonebook_opt
output.txt: cache-test calculate
./calculate
plot: output.txt
gnuplot scripts/runtime.gp
calculate: calculate.c
$(CC) $(CFLAGS_common) $^ -o $@
.PHONY: clean
clean:
$(RM) $(EXEC) *.o perf.* \
calculate orig.txt opt.txt output.txt runtime.png
```
## 待完成的項目
- [ ] 用hash table 做findNmae()的最佳化
- [ ] 學會GDB!!!!
- [ ] 熟悉 Makefile的語法
## 參考資料
* [louielu HackMD](https://hackmd.io/CYJgZgHA7AnADAFgLQGMCmaCGSEGYCsyMARjPkgIwhS4gT4VwogJA===)
* [楊靜妃學姐 Hackpad](https://embedded2015.hackpad.com/note-hw2-phonebook-3mUOSk5J9cu)