2017q3 Homework1 (phonebook)

contributed by <FATESAIKOU>

環境說明

CPU: Intel® Core™ i7-6700 CPU @ 3.40GHz
Memory: 40GB
Cache:
- L1d/i: 32KB
- L2: 256KB
- L3: 8192KB

優化前效能

測試效能前，首先查看範例程式碼當中，是如何進行效能測試，這邊我們看到其中的 main.c。

__buildin_clear_cache

首先引起我們的興趣的是從 main.c 56 行開始的：



#if defined(__GNUC__)
    __builtin___clear_cache((char *) pHead, (char *) pHead + sizeof(entry)); 
#endif

參考了共筆才發現這是 gnugcc 所提供的 build-in function，目的在於清除之前程式執行時可能留下的指令快取，避免效能測定的不確定性。

clock_gettime

接著是 main.c 59 行：


clock_gettime(CLOCK_REALTIME, &start)

參考此處的說明，此函式用於計時時十分精準，有機會到達柰秒的精準度等級，我使用參考處提供的程式碼實際測量一次自己電腦的精準度等級：

timer_test.c













#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
    struct timespec tt;

    clock_getres(CLOCK_REAL_TIME, &tt);
    printf("Resolution: %ld\n", tt.tv_nsec);

    return 0;
}

結果：

Resolution: 1

很幸運的，電腦的精細度可以達到 tv_nsec 的最小值，也就是柰秒。

append

接著我們在 main.c 第 65 行的位置發現了 append，也就是插入資料進電話簿的部份，接著往深處看發現到其資料結構定義於 phonebook_orig.h 並且實做 append 方法於 phonebook_orig.c：

phonebook_orig.h: entry













typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
    struct __PHONE_BOOK_ENTRY *pNext;
}

可以發現電話簿中每一筆紀錄都十分龐大，並且本身只包含一個指標。

phonebook_orig.c: append










enrty *append(char lastName[], entry *e)
{    
    /* allocate memory for the new entry and put lastName */
    e->pNext = (entry *) malloc(sizeof(entry));
    e = e->pNext;
    strcpy(e->lastName, lastName);
    e->pNext = NULL;
    
    return e;
}

append 很明顯只是將傳進來的名字放入新分配的空間，接著接上傳進來的節點，並且傳出這個新節點。

main.c:







while (fgets(line, sizeof(line), fp)) {
    while (line[i] != '\0')
        i ++;
    line[i - 1] = '\0';
    i = 0;
    e = append(line, e);
}

至此很明顯可以發現此處使用的是單向的 Linked List 作為儲存的資料結構，並且透過這種 append，可以省去插入時搜尋結尾的時間消耗，但同時也必須維護一份 pHead。

fineName

接著我們看到了一個實際進行搜尋的部份，位於 main.c 的 88 行，其對 "zyxel" 呼叫了 findName 並且同時為其計時。





char input[MAX_LAST_NAME_SIZE] = "zyxel";
...
clock_gettime(CLOCK_REALTIME, &start);
findName(input, e);
clock_gettime(CLOCK_REALTIME, &end);

而我們仍然可以在 phonebook_orig.c 當中找到 findName 的實作。
phonebook_orig.c: findName









entry *findName(char lastName[], entry *pHead)
{
    while (pHead != NULL) {
        if (strcasecmp(lastName, pHead->lastName) == 0)
            return pHead;
        pHead = pHead->pNext;
    }
    return NULL;
}

上面的 findName 採用 linear search 的方式，搜尋整條 pHead 所指向的 Linked List。

Report

在 main.c 的第 93 與 96 以及 97 行，程式最後輸出了建立電話簿資料庫以及搜尋字串zyxel所耗費的時間。

make plot

接著可以看看實際產生報表的地方(Makefile plot)，循著相依性，發現到其是執行 phonebook_orig 以及將來要優化的 phonebook_opt 各100次，寫入各自的 orig.txt 以及 opt.txt，接著透過 calculate.c 轉換為 output.txt 交給 gnuplot script runtime .gp 進行繪圖，產生報表，但由於 phonebook_opt 的部份因還未開始撰寫，因此去除 phonebook_opt 的報表。

以下便是範例程式產生的報表。

讀入的資料庫為 words.txt，總共 349900 筆資料。

Bug

另外 main.c 當中還有個值得注意的 bug 位於 99 行


if (pHead->pNext) free(pHead->pNext);
free(pHead);

很容易可以注意到這樣是記憶體釋放不全。

尋找問題

透過上面的 report 我們可以看到這個未優化的版本對於 words.txt 的資料以及搜尋 zyxel 時的效能表現，但這樣的資訊實在太過片面，我們希望能透過同時調整資料庫大小、資料庫內字串長度分佈以及搜尋的目標字串長度，加上觀察 append 以及 findName 的效能表現變化，以及對照 perf 產生的資料，嘗試找出效能低落的原因並改善。

make more plot

首先我們看看輸入的 words.txt 上的資料分佈情形：

上圖是以字串長度配合計數進行繪圖，可以發現其中長度為 8 的字串最常出現，接著開始向兩側遞減。

接著以下針對調整載入的電話簿大小，同時使用常態分佈並調變

μ

控制載入的字串長度，其中

σ

為求作圖方便固定為

μ / 2

，最後將獲得的資料進行繪圖整理。

append
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
findName
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
cache-misses
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
cache-references
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
cache-miss-rate
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →

優化實驗

接著我們希望能透過一些優化，加速本程式的載入以及查詢速度。

快取優化

structure simplize

memory pool

y = (append, findName, alloc speed => cache miss)
x = (db entry num), y = (average string length)

查詢加速

hash

hash + linked list

hash(Soundex) + BST

x = (db entry num), y = (average string length), z = (append, findName, encode speed => collision, cache miss)

(branch predict?)

hash table 大小影響

x = (db entry num), y = (hash table size), z = (append, findName => collision, cache miss)
(上述三方法之綜合繪圖)

dataset 更換

Sorted / Not sorted

分布圖 (orig + opt*3)

y = (accumulation)
x = (db string len)

效能比較

y = (append, findName, encode speed => collision, cache miss)
x = (db entry num)

原生資料 / 現實資料

分布圖

y = (accumulation)
x = (db string len)

效能比較

y = (append, findName, encode speed => collision, cache miss)
x = (db entry num)

2017q3 Homework1 (phonebook)

環境說明

優化前效能

__buildin_clear_cache

clock_gettime

append

fineName

Report

make plot

Bug

尋找問題

make more plot

優化實驗

快取優化

structure simplize

memory pool

查詢加速

hash

hash + linked list

hash(Soundex) + BST

hash table 大小影響

dataset 更換

Sorted / Not sorted

分布圖 (orig + opt*3)

效能比較

原生資料 / 現實資料

分布圖

效能比較

結果討論

參考資料