Try   HackMD

2017q3 Homework1 (phonebook)

contributed by <jackyhobingo>

系統環境

硬體規格

$ lscpu
Architecture:          x86_64
CPU 作業模式:    32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
每核心執行緒數:2
每通訊端核心數:4
Socket(s):             1
NUMA 節點:         1
供應商識別號:  GenuineIntel
CPU 家族:          6
型號:              158
Model name:            Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
製程:              9
CPU MHz:             4022.314
CPU max MHz:           4200.0000
CPU min MHz:           800.0000
BogoMIPS:              7200.00
虛擬:              VT-x
L1d 快取:          32K
L1i 快取:          32K
L2 快取:           256K
L3 快取:           8192K
NUMA node0 CPU(s):     0-7

作業系統

$ uname -a
Linux jackyhobingo-D830MT 4.10.0-35-generic #39~16.04.1-Ubuntu SMP Wed Sep 13 09:02:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

分析原版程式

電話簿資料結構是 linked list
function findName 的時間複雜度為 O(n)
function append 的時間複雜度為 O(1)
而且在搜尋及添加時只需要用到lastName的資料
程式會將過多不必要的資料 (firstName, email, and other unimportant data) 存取到 cache,程式執行時 cache 可以存取的資料筆數過少,造成容易 cache miss

/* phonebook_orig.h */
typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
    struct __PHONE_BOOK_ENTRY *pNext;
} entry;
/* phonebook_orig.c */
entry *findName(char lastName[], entry *pHead)
{
    while (pHead != NULL) {
        if (strcasecmp(lastName, pHead->lastName) == 0)
            return pHead;
        pHead = pHead->pNext;
    }
    return NULL;
}

entry *append(char lastName[], entry *e)
{
    /* allocate memory for the new entry and put lastName */
    e->pNext = (entry *) malloc(sizeof(entry));
    e = e->pNext;
    strcpy(e->lastName, lastName);
    e->pNext = NULL;

    return e;
}

測試原版程式

$ sudo make cache-test

perf stat --repeat 100 \
	-e cache-misses,cache-references,instructions,cycles \
	./phonebook_orig
size of entry : 136 bytes
execution time of append() : 0.041267 sec
execution time of findName() : 0.004651 sec
...
 Performance counter stats for './phonebook_orig' (100 runs):

         4,428,668      cache-misses              #   89.994 % of all cache refs      ( +-  0.21% )
         4,921,091      cache-references                                              ( +-  0.14% )
       261,751,981      instructions              #    1.22  insn per cycle           ( +-  0.02% )
       214,970,704      cycles                                                        ( +-  0.57% )

       0.053889418 seconds time elapsed                                          ( +-  0.60% )

最佳化策略

先不修改 phonebook_opt.c 先使用 phonebook_orig.c 的寫法,只先修改資料結構,將phonebook_opt.h 中冗餘的資料另外存放。

/* phonebook_opt.h */ 
typedef struct __PHONE_BOOK_OTHER_INFO { // new data structure
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
} otherInfo;

typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    otherInfo * info; // new pointer 
    struct __PHONE_BOOK_ENTRY *pNext;
} entry;

產生圖表,新的資料結構明顯改善效能,新的進入點 Size 只剩下 32 bytes,在 append() 時 malloc() 所配置的空間變小,因此加快append() 的速度,此外在 L1-cache 可以存放的資料筆數多上 4.25 倍,cache-misses降低,findName() 效能提高。

$ make plot

$ sudo make cache-test
...
 Performance counter stats for './phonebook_opt' (100 runs):

         1,186,638      cache-misses              #   54.303 % of all cache refs      ( +-  0.40% )
         2,185,228      cache-references                                              ( +-  0.14% )
       244,863,348      instructions              #    2.11  insn per cycle           ( +-  0.02% )
       116,021,194      cycles                                                        ( +-  0.39% )

       0.028874041 seconds time elapsed                                          ( +-  0.43% )

參考資料

ktvexe開發紀錄(phonebook)