2017q3 Homework1(phonebook)

contributed by <loolofen>

預期目標

準備 GNU/Linux 開發工具
學習使用 Git 與 GitHub
學習效能分析工具
研究軟體最佳化

開發環境

使用 lscpu 查看CPU、快取內容

Ubuntu 16.04.1 LTS
CPU:Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz

L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K

Phone Book

未優化版本

原本 phonebook 的每個 entry 的大小為 136bytes
而使用的電腦為 L1 Cache 為 32KB ，所以可存資料為

32 K / （ 138 * 8 ） = 30.12

，代表裏面只能存 30 筆左右。在進行35萬筆資料比對時，會造成大量的cache miss。

./phonebook_orig

size of entry : 136 bytes
execution time of append() : 0.038199 sec
execution time of findName() : 0.006569 sec

Performance counter stats for './phonebook_orig' (100 runs):

125,1543      cache-misses              #   87.177 % of all cache refs      ( +-  0.04% )
143,5629      cache-references                                              ( +-  0.05% )
2,6206,9048      instructions              #    1.16  insn per cycle           ( +-  0.02% )
2,2534,1157      cycles                                                        ( +-  0.20% )

1.0.069925365 seconds time elapsed                                          ( +-  0.25% )

優化版本

第一種優化：減少entry

在phonebook_orig.c中，可以發現每次進行findName()時，只用到lastName，卻須將整個entry 136 byte載入cache中。









entry *findName(char lastName[], entry *pHead)
{
    while (pHead != NULL) {
        if (strcasecmp(lastName, pHead->lastName) == 0)
            return pHead;
        pHead = pHead->pNext;
    }
    return NULL;
}

其實findName跟append都只用到lastName，所以我們其實可以把那些多餘的訊息都去掉不要，因此我新增了一個比較簡單的struct。











typedef struct __DETAIL{
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
}detail;

並用一指標*detailInfo指向它。





typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    detail *detailInfo;
    struct __PHONE_BOOK_ENTRY *pNext;
} entry;

./phonebook_opt

size of entry : 32 bytes
execution time of append() : 0.033703 sec
execution time of findName() : 0.002436 sec

從下面的資訊可以看到，只是單純的把entry的大小減小，就大幅的降低了 cache miss ，從原本的 87.177 % 降低到了 48.673 % ，效果非常驚人。

Performance counter stats for './phonebook_opt' (100 runs):

           22,5903      cache-misses              #   48.673 % of all cache refs      ( +-  1.01% )
           46,4121      cache-references                                              ( +-  0.73% )
       2,4492,4206      instructions              #    1.87  insn per cycle           ( +-  0.02% )
       1,3093,8592      cycles                                                        ( +-  0.25% )

       0.042957874 seconds time elapsed                                          ( +-  0.55% )

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

第二種優化： hash function

參考：BKDRHash、petermouse

以下為 BKDR_hash 。














unsigned int BKDR_hash(char lastName[], int hash_table_size)
{

    unsigned int seed = 131;
    unsigned int hash = 0;
    int i = 0;
    while(lastName[i] != '\0')
    {
        hash = hash * seed + lastName[i];
        i++;
    }

    return hash %= hash_table_size;
}

我們藉由改變他的 hash_table_size 看有什麼差別。

當 hash table size = 300 時

size of entry : 32 bytes
execution time of append() : 0.043916 sec
execution time of findName() : 0.000033 sec

當 hash table size = 1500 時

size of entry : 32 bytes
execution time of append() : 0.055376 sec
execution time of findName() : 0.000002 sec

當 hash table size = 2000 時

size of entry : 32 bytes
execution time of append() : 0.057486 sec
execution time of findName() : 0.000003 sec

看起來 'append()'的執行時間，隨著 hash table size 增大而變大，也比位優化版本的還大，但是 'findName' 的時間隨著 hash table size 增大而變小的趨勢，與未優化版本比較少了許多。

           24,3102      cache-misses              #   46.631 % of all cache refs      ( +-  1.58% )
           52,1337      cache-references                                              ( +-  0.29% )
       2,5526,5485      instructions              #    1.43  insn per cycle           ( +-  0.02% )
       1,7872,5769      cycles                                                        ( +-  0.50% )

       0.057909914 seconds time elapsed                                          ( +-  0.55% )

cache miss 降到 46.631% 左右，和縮小 entry 的優化版本（48%）稍微低一點。

以下為所有方式比較圖，可以看出 BKDR_hash 總執行時間比未修改的還久，縮小entry 執行時間最短，

參考資料

BKDRHash
petermouse
作業區1