2016q3 Homework1 (phonebook)

contributed by <abba123>

請依照格式填寫標題和"contributed by"課程助教

Reviewed by HuangWenChen

底下 Create_hashtable 地方程式碼亂掉須修改
github 上建議不用將編譯後執行產生的檔案.txt push 上去

開發環境

OS : ubuntu 16.04.1 LTS

這個作業主要是要研究如何降低 cache miss
現在就讓我們來看看電腦的 cache 長什麼樣子吧

$ lscpu

L1d 快取：          32K
L1i 快取：          32K
L2 快取：           256K
L3 快取：           6144K

我只擷取了 cache 的部份
簡單的來說
越上層的 cache 存取的速度越快 L1>L3

為了讓我們 coding 更舒服
我們來修改一下 vim 裡面的設定
順便複習一下 vim 的操作吧
鳥哥的 linux 私房菜

$vim ~/.vimrc

打上自己要的設定之後下次在用 vim 就會生效啦

set hlsearch
set backspace=2
set autoindent
set nu
syntax on

案例分析 Phonebook

目的：分析電話簿搜尋程式，探討 cache miss 對於整體效能有顯著影響

在分析前當然要跑一次啦

先把我們的檔案編譯

$ make phonebook_orig

用 perf 來做分析

$ perf stat -r 10 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses ./phonebook_orig

 Performance counter stats for './phonebook_orig' (10 runs):

         1,216,859      cache-misses              #   85.464 % of all cache refs      ( +-  1.08% )
         1,423,831      cache-references                                              ( +-  0.77% )
         2,599,531      L1-dcache-load-misses                                         ( +-  0.21% )
   <not supported>      L1-dcache-store-misses   
   <not supported>      L1-dcache-prefetch-misses
            33,887      L1-icache-load-misses                                         ( +- 10.02% )

       0.062661439 seconds time elapsed                                          ( +-  6.23% )

無法查看 L1-dcache-store-misses L1-dcache-prefetch-misses ?

「程式輸出的文字訊息」請勿使用圖片表示，請改由文字貼上課程助教

這邊是把程式做10次來取一個平均
可以發現尚未優話的版本 cache miss 的機率很高
這樣表示在 cache 裡面幾乎找不到我們所需要的資料
必須往下到 memory 裡面去找
花費相當大的時間

經過閱讀 Programming Small之後
有了大概的方向來減少 cache miss 的發生

簡化 Struct 增加 cache 的存放數
實做 Hash function 降低 findName() 執行速度
構想中…

簡化 Struct

先來看一下我們原本儲存資料的結構













typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
    struct __PHONE_BOOK_ENTRY *pNext;
} entry;

根據 findName() ，其實我們只需要 lastName 這項資訊
這表示我們存了太多不會用到的資訊

我們把除了 lastName 的其餘資訊搬到別的空間做儲存
並加入一個指標做連結
這樣可以大幅簡少 struct 的大小


















typedef struct __PHONE_BOOK_DETAIL {
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
    struct __PHONE_BOOK_DETAIL *pNext;
} detail;

typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    detail *book_detail;
    struct __PHONE_BOOK_ENTRY *pNext;
} entry;

我們來執行一次優化版本吧

$ make phonebook_opt
$ perf stat -r 10 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses ./phonebook_opt

 Performance counter stats for './phonebook_opt' (10 runs):

           167,123      cache-misses              #   43.034 % of all cache refs      ( +-  2.28% )
           388,348      cache-references                                              ( +-  1.65% )
         1,279,994      L1-dcache-load-misses                                         ( +-  0.18% )
   <not supported>      L1-dcache-store-misses   
   <not supported>      L1-dcache-prefetch-misses
            24,808      L1-icache-load-misses                                         ( +- 10.19% )

       0.040618060 seconds time elapsed                                          ( +-  8.60% )

我們可以看到，經由我們的優化，改變 struct，cache miss 大幅下降

size of entry : 136 bytes
execution time of append() : 0.037243 sec
execution time of findName() : 0.005767 sec

size of entry : 32 bytes
execution time of append() : 0.028887 sec
execution time of findName() : 0.001861 sec

entry 大幅縮減，可以放進cache的entry更多，所以cache miss 的機率下降，也導致我們的執行速度上升

Gnuplot

	$ make plot

不論是 append() findName() 執行時間都下降了呢！

Hash Function

接下來為了減少 findname() 搜尋的時間
我們來實做一下 hash table
我們所採用的是 djb2 這個 hash function







int hash(char *str)
{
    unsigned long hash=5381;
    while(*str != '\0')
        hash = (hash<<5)+ *str++;
    return hash%42737;
}

接下來是建立一個 hash table



typedef struct hash_table {
    entry **table;
} hash_table;









hash_table *Create_hashtable(hash_table *t,int size)
{
    int i=0;
    t->table=(entry **)malloc(sizeof(entry)*size);
    for(i=0; i<size; i++) {
        t->table[i]=NULL;
    }
    return t;
}

去修改一下 find() append()

我們直接把經過 hash 轉換的字串，直接丟到相對應的 table 裡面












entry *append(char lastName[], hash_table *e)
{
    /* allocate memory for the new entry and put lastName */
    int x=hash(lastName);
    entry *node;
    node = (entry *) malloc(sizeof(entry));
    node->pNext=e->table[x];
    e->table[x]=node;
    strcpy(node->lastName, lastName);

    return 0;
}

先找到相對應的 table，再去裡面一個一個尋找 lastName 是否相等

entry *findName(char lastName[], hash_table *pHead)
{
    int x=hash(lastName);
    entry *e;
    e=pHead->table[x];
    while (e != NULL) {
        if(strcasecmp(e->lastName,lastName)==0)
            return e;
        e = e->pNext;
    }
    return NULL;
}

來比對一下我們優化的結果吧

 Performance counter stats for './phonebook_opt_hash' (100 runs):

           103,901      cache-misses              #   30.510 % of all cache refs      ( +-  0.93% )
           340,545      cache-references                                              ( +-  0.23% )
       237,127,330      instructions              #    1.62  insns per cycle          ( +-  0.02% )
       146,186,435      cycles                                                        ( +-  0.38% )

       0.041308486 seconds time elapsed                                          ( +-  0.40% )

經過加入 hashtable，可以發現 cache miss 又再次降低
而且findName()幾乎不用時間了
但 append() 的時間卻往上升了一點點

雖然hash的時間上多了一點，但由於phonebook最主要的功能是在查詢
所以假如是指 append 一次，findName 1000 次的話，差距就會很顯著

FUTURE WORK

字串壓縮
binary search tree