Try   HackMD

2016q3 Homework 1 (phonebook)

contributed by <vic85821>

Reviewed by jkrvivian

  • 優化方法一(使用union),已改變程式行為,因此無法和其他優化方法比較。
  • 沒有評析binary search tree搜尋演算法對findName()的影響
  • 已提出使用hash function優化搜尋速度,還未實作,也還未比較不同hash function間的效能
  • 沒有採用不同的data set實驗,因原本使用的資料與現實差異很大
  • commit 2b10484的commit訊息很模糊,並不知道具體做了甚麼修改

開發環境

  • Ubuntu 16.04.1 LTS

    不小心就把windows洗掉了vic85821

  • Linux version 4.4.0-38-generic
  • CPU : Intel® Core i5-3230M CPU @ 2.60GHz
  • MEM : 8GB
  • L1d cache : 32K
  • L1i cache : 32K
  • L2 cache : 256K
  • L3 cache : 3072K

一併列出硬體規格,像是 CPU processor, cache memory 等資訊,這樣日後實驗重現才有依據 jserv

因為我的硬體設備比較特別,linux抓不到我的無線網卡,因此要額外安裝驅動程式

  • sudo lshw -C network 查詢網卡產品名稱 : MT7630e
  • 網站 抓取驅動程式
  • cd到該檔案所在的目錄
$ sudo chmod +x install
$ sudo ./install

perf

$ perf list

List of pre-defined events (to be used in -e):

  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  ref-cycles                                         [Hardware event]
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]

  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]

範例效能評析測試

第一次使用perf,還不是很熟悉 試了很多次 perf top -p $pid才成功!! vic8521Sat, Sep 24, 2016 5:17 PM

「程式輸出的文字訊息」請勿使用圖片表示,請改由文字貼上課程助教

gnuplot

之前大二在程式語言,曾經對matrix multiplication,以不同的方法執行(暴力解, 施特拉森演算法, Concurrency),並對效能進行分析

astyle

用來檢查coding style是否符合
astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]

  • indent=spaces=4 縮排=4個空白

案例分析 : phonebook

作業說明

​更改程式碼以降低cache-miss的比例

猜想

  • 更改資料結構,減少不常使用的儲存空間
  • [ ] 更改搜尋的方法

未優化版本(origin)

struct

typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
    struct __PHONE_BOOK_ENTRY *pNext;
} entry;
  • 資料結構冗長

findName

entry *findName(char lastname[], entry *pHead)
{
    while (pHead != NULL) {
        if (strcasecmp(lastname, pHead->lastName) == 0)
            return pHead;
        pHead = pHead->pNext;
    }
    return NULL;
}
  • 方法效率低(如果不是,一直往下找直到null)

append

entry *append(char lastName[], entry *e)
{
    /* allocate memory for the new entry and put lastName */
    e->pNext = (entry *) malloc(sizeof(entry));
    e = e->pNext;
    strcpy(e->lastName, lastName);
    e->pNext = NULL;

    return e;
}

執行結果

make cache-test

size of entry : 136 bytes
execution time of append() : 0.043322 sec
execution time of findName() : 0.005194 sec
Performance counter stats for './phonebook_orig' (100 runs):

         2,037,607      cache-misses              #   94.069 % of all cache refs      ( +-  0.38% )
         2,166,081      cache-references                                              ( +-  0.32% )
       261,123,688      instructions              #    1.36  insns per cycle          ( +-  0.02% )
       191,871,256      cycles                                                        ( +-  0.85% )

       0.070512191 seconds time elapsed                                          ( +-  1.71% )

分析

main.c的一段程式碼

e = pHead;
	
/* the givn last name to find */
char input[MAX_LAST_NAME_SIZE] = "zyxel";
e = pHead;

assert(findName(input, e) &&
       "Did you implement findName() in " IMPL "?");
assert(0 == strcmp(findName(input, e)->lastName, "zyxel"));
	

可以發現都是去找"zyxel",並且findname()都是根據lastname去搜尋

  • 資料結構的精減化

優化版本

1.資料結構優化

  • 版本1
typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    
    union {
        char firstName[16];
        char email[16];
        char phone[10];
        char cell[10];
        char addr1[16];
        char addr2[16];
        char city[16];
        char state[2];
        char zip[5];
    };
    struct __PHONE_BOOK_ENTRY *pNext;
} entry;

執行結果
make cache-test

size of entry : 40 bytes
execution time of append() : 0.072287 sec
execution time of findName() : 0.003676 sec
Performance counter stats for './phonebook_opt' (100 runs):

           389,500      cache-misses              #   66.630 % of all cache refs      ( +-  0.33% )
           584,572      cache-references                                              ( +-  1.12% )
       243,952,419      instructions              #    1.72  insns per cycle          ( +-  0.02% )
       141,447,587      cycles                                                        ( +-  0.83% )

       0.050986039 seconds time elapsed                                          ( +-  1.70% )

cache-miss 原本 94.069% -> 優化後 66.630%

老師上課也有說過,單純的簡化資料結構,也可以達到最佳化!!是真的!!

  • note : union會自動對齊內部包含的變數中,size最大的空間(但對齊會有個最小值。e.g. 4/8byte,仍有可能造成空間浪費)

"example" 的簡寫是 e.g.,不是 "ex",後者是拿來描述前女友、前東家等詞彙
這段程式碼已經改變預期行為了,作最佳化不能改變正確性。 jserv


  • 版本2
typedef struct _PHONE_BOOK_DETAIL {
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
} detail;

typedef struct _PHONE_BOOK_ENTRY {	
    char lastName[MAX_LAST_NAME_SIZE]; 
    detail* detailDate;
    struct _PHONE_BOOK_ENTRY *pNext;
} entry;

執行結果
perf stat -r 100 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses ./phonebook_opt

size of entry : 24 bytes
execution time of append() : 0.070298 sec
execution time of findName() : 0.003760 sec
 Performance counter stats for './phonebook_opt' (100 runs):

           252,673      cache-misses              #   67.435 % of all cache refs      ( +-  0.44% )  (64.12%)
           374,694      cache-references                                              ( +-  0.82% )  (64.44%)
           713,048      L1-dcache-load-misses                                         ( +-  3.34% )  (65.51%)
           240,697      L1-dcache-store-misses                                        ( +-  1.05% )  (70.62%)
           522,278      L1-dcache-prefetch-misses                                     ( +-  3.61% )  (73.19%)
            87,401      L1-icache-load-misses                                         ( +-  3.72% )  (67.82%)

       0.046339100 seconds time elapsed                                          ( +-  1.24% )

備註:這部份都尚未修改findname() append(),而是使用phonebook_orig.c 裡的function


2.更改搜尋方法

  • 版本1 binary search tree
                                                                                  +-----+
                                                                                  |node |
                          +----+                    +----+                      +-------+-----+
+----+                    |node|                    |node|                      |             |
|node|    +--->        +-------+                  +--------+                 +--v-+         +-v--+
+----+                 |             +---->       |        |     +--->       |node|         |node|
                    +--v--+                       |        |               +-----------+    +----+
                    |node |                    +--v-+   +--v-+             |           |
                    +-----+                    |node|   |node|           +-v--+      +-v--+
                                               +----+   +----+           |node|      |node|
                                                                         +----+      +----+

對資料結構稍做更動,增加 bst

#define OPT 1
#define BST 1

typedef struct _PHONE_BOOK_DETAIL {
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
} detail;

typedef struct _PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE]; 
    detail *detailDate;
    struct _PHONE_BOOK_ENTRY *pNext;
} entry;

typedef struct _PHONE_BOOK_BST {	
    struct _PHONE_BOOK_ENTRY *data;
    struct _PHONE_BOOK_BST *left;
    struct _PHONE_BOOK_BST *right;
} bst;

bst *findName(char lastname[], bst *root);
entry *append(char lastName[], entry *e);
bst *build_bst(entry** pHead,int num);

#endif

建立binary search tree雖然findname()時間變快許多,但是一開始還要額外花時間在建樹

//build binary search
bst *build_bst(entry** pHead,int num)
{
    if(num == 0)
        return NULL;

    bst *root,*left;

    //recursive build left leaf
    left = build_bst(pHead, num>>1);
    

    // build root(center node)
    root = (bst *) malloc(sizeof(bst));
    root -> data = (entry *)malloc(sizeof(entry));
    strcpy((root->data)->lastName, (*pHead)->lastName);
    root->left = left;

    // build right left
    entry *tmp = *pHead;
    *pHead = (*pHead)->pNext;
    free(tmp);
    root->right = build_bst(pHead, num-(num>>1)-1);
    
    return root;
}

執行結果
perf stat -r 100 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses ./phonebook_opt

size of entry : 32 bytes
execution time of append() : 0.080784 sec # 包含建樹,可以看出用了許多時間
execution time of findName() : 0.000001 sec 
Performance counter stats for './phonebook_opt' (100 runs):

           555,542      cache-misses              #   85.642 % of all cache refs      ( +-  0.51% )  (65.22%)
           648,682      cache-references                                              ( +-  0.47% )  (66.16%)
         1,059,861      L1-dcache-load-misses                                         ( +-  0.47% )  (67.19%)
           772,671      L1-dcache-store-misses                                        ( +-  0.35% )  (68.97%)
           314,794      L1-dcache-prefetch-misses                                     ( +-  0.66% )  (68.96%)
            89,415      L1-icache-load-misses                                         ( +-  1.78% )  (66.29%)

       0.100239647 seconds time elapsed                                          ( +-  2.05% )

  • cache miss (85.642%) 沒有下降的很明顯,可能原因為
    • struct bst 有額外的bst *left,*right,在建樹的時候使用遞迴,造成大量cache miss

Makefile

之前很少用到makefile,如果有也只是簡單打幾行compile的指令,第一次看到如同這次的Makefile

CC ?= gcc
CFLAGS_common ?= -Wall -std=gnu99
CFLAGS_orig = -O0
CFLAGS_opt  = -O0

EXEC = phonebook_orig phonebook_opt
all: $(EXEC)

SRCS_common = main.c

phonebook_orig: $(SRCS_common) phonebook_orig.c phonebook_orig.h
	$(CC) $(CFLAGS_common) $(CFLAGS_orig) \
		-DIMPL="\"$@.h\"" -o $@ \
		$(SRCS_common) $@.c

phonebook_opt: $(SRCS_common) phonebook_opt.c phonebook_opt.h
	$(CC) $(CFLAGS_common) $(CFLAGS_opt) \
		-DIMPL="\"$@.h\"" -o $@ \
		$(SRCS_common) $@.c

run: $(EXEC)
	echo 3 | sudo tee /proc/sys/vm/drop_caches
	watch -d -t "./phonebook_orig && echo 3 | sudo tee /proc/sys/vm/drop_caches"

cache-test: $(EXEC)
	perf stat --repeat 100 \
		-e cache-misses,cache-references,instructions,cycles \
		./phonebook_orig
	perf stat --repeat 100 \
		-e cache-misses,cache-references,instructions,cycles \
		./phonebook_opt

output.txt: cache-test calculate
	./calculate

plot: output.txt
	gnuplot scripts/runtime.gp

calculate: calculate.c
	$(CC) $(CFLAGS_common) $^ -o $@

.PHONY: clean
clean:
	$(RM) $(EXEC) *.o perf.* \
	      	calculate orig.txt opt.txt output.txt runtime.png

待完成的項目

  • 用hash table 做findNmae()的最佳化
  • 學會GDB!!!
  • 熟悉 Makefile的語法

參考資料