Phonebook

tags: `sysprog2017`

開發環境

OS: Ubuntu 16.04 LTS
L1d cache： 32K
L1i cache： 32K
L2 cache： 256K
L3 cache： 3072K
Architecture: x86_64
CPU 作業模式： 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
Model name: Intel® Core™ i5-3230M CPU @ 2.60GHz
memory
description: System memory
physical id: 0
size: 7869MiB

未優化版本

執行: $ ./phonebook_orig
- append() 及 findName() 時間如下

size of entry : 136 bytes
execution time of 
append() : 0.086253 sec
execution time of findName() : 0.005621 sec

catch test: $ make cache-test
- cache-misses 高達 94%

 Performance counter stats for ’./phonebook_orig’ (100 runs):

         2,136,649      cache-misses              #  94.149 % of all cache refs
         2,257,520      cache-references
       263,626,662      instructions              #  1.15  insns per cycle
       220,088,188      cycles

       0.079278420 seconds time elapsed                           ( +-  0.76% )

透過 perf 找熱點: $ ./phonebook_orig & sudo perf top -p $!
- 發現 findName() (23.96%) 所占的時間比 append() (3.66%) 多，但是 append() (0.086 sec) 所花的執行時間卻比 findName() (0.006 sec) 長

優化版本 1 - 減少資料結構大小

因為在 find 的時候，只需要尋找 lastName，所以定義了一個新的結構 __LAST_NAME_ENTRY 只存 lastName，並用一個 pointer *detail 指向原本的 __PHONE_BOOK_ENTRY





















typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
    struct __PHONE_BOOK_ENTRY *pNext;
} entry_detail;

typedef struct __LAST_NAME_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    entry_detail *detail;
    struct __LAST_NAME_ENTRY *pNext;
} entry;

執行時間:

size of entry : 32 bytes
execution time of append() : 0.037053 sec
execution time of findName() : 0.004086 sec

cache-misses: 縮小了許多 (94% to 69%)
與 cache reference 有關
- cache 相關知識

Performance counter stats for ’./phonebook_opt’ (100 runs):

           450,047      cache-misses              #  68.996 % of all cache refs
           769,018      cache-references
       247,234,819      instructions              #   1.64  insns per cycle
       166,489,625      cycles

       0.052383610 seconds time elapsed                         ( +-  1.40% )

plot

優化版本 2 - 使用 Hash Function

這裡我選擇了 BKDR-Hash 進行優化，下面是 BKDR-Hash 的程式碼片段：
BKDR-Hash 說明








unsigned int BKDRHash(char *str){
    unsigned int seed = 131;
    unsigned int hash = 0;
    while (*str){
        hash = hash * seed + (*str++);
    }
    return hash;
}

執行時間:
- append() 的執行時間大幅提升 (0.037053 => 0.092839) ，因為經過 hash function 運算增加了時間成本
- findName() 時間則大幅下降，因算出 hash value 後搜尋的範圍變小了

size of entry : 32 bytes
execution time of append() : 0.092839 sec
execution time of findName() : 0.000006 sec

cache-misses

 Performance counter stats for ’./phonebook_hash’ (100 runs):

           338,982      cache-misses              #  48.153 % of all cache refs
           705,201      cache-references
       234,245,679      instructions              #   1.56  insns per cycle
       149,868,423      cycles

       0.055589024 seconds time elapsed             ( +-  1.85% )

plot

Hash function 說明

優化版本 3 - BST

把 link list 轉換成 BST 的結構, 再利用 binary search 搜尋
- 使用遞迴:
  1. 先找左子樹的 root
  2. 建立 root, 給定 lastname 及左子樹
  3. 前往下一個 link list pHead (pHead = pHead->pNext)
  4. 找右子樹的 root, 然後把剛建好的 root 給它
append 花費的時間比之前還長，因為還額外使用了 conver_to_bst 建樹
findName 所花的時間比之前少，且 cache miss 有些許的降低
結果:

execution time of append() : 0.104955 sec
execution time of findName() : 0.000004 sec

Performance counter stats for './phonebook_opt' (10 runs):
            753779 cache-misses              #   81.183 % of all cache refs      
    ( +-  1.43% ) [65.55%]
            928492 cache-references                                              
    ( +-  0.82% ) [66.84%]
           1272539 L1-dcache-load-misses                                       
    ( +-  0.85% ) [68.39%]
            972993 L1-dcache-store-misses                                        
    ( +-  0.59% ) [69.46%]
            483164 L1-dcache-prefetch-misses                                     
    ( +-  1.55% ) [67.10%]
            180794 L1-icache-load-misses                                         
    ( +-  5.64% ) [64.76%]

       0.137188294 seconds time elapsed                                          
    ( +-  1.19% )

優化版本 4 - trie and array

首先，在 trie struct 中使用兩個 pointer pNext 和 pChild 建立整個 list






typedef struct __PHONE_BOOK_TRIE {
    char ch; 
    entry_detail *detail;
    struct __PHONE_BOOK_TRIE *pNext;
    struct __PHONE_BOOK_TRIE *pChild;
} entry;

然而, findName() 的時間還是和原本的差不多，因為仍須使用 pNext 和 pChild 走訪整個
嘗試修改了 struct 成以下，保留 pChild 並給它一個 array[27] 來存字母





typedef struct __PHONE_BOOK_TRIE {
    char ch;
    entry_detail *detail;
    struct __PHONE_BOOK_TRIE *pChild[27];
} entry;

這讓搜尋的時間是最快的
但是 cache-misses 和 append 時間都是最多且最長的，因為其資料結構較大
結果:

size of entry : 232 bytes
execution time of append() : 0.400078 sec
execution time of findName() : 0.000001 sec
*** stack smashing detected ***: ./phonebook_trie terminated
./phonebook_trie: Aborted

 Performance counter stats for './phonebook_trie' (10 runs):
           5197525 cache-misses              #   90.833 % of all cache refs      
   ( +-  0.37% ) [66.30%]
           5722095 cache-references                                              
   ( +-  0.34% ) [66.65%]
           6633948 L1-dcache-load-misses                                         
   ( +-  1.13% ) [67.17%]
           5653139 L1-dcache-store-misses                                        
   ( +-  0.30% ) [67.47%]
            299454 L1-dcache-prefetch-misses                                     
   ( +-  4.76% ) [66.74%]
            924947 L1-icache-load-misses                                         
   ( +- 11.02% ) [66.13%]

       0.429757602 seconds time elapsed                                          
   ( +-  0.72% )

Dataset 的影響

首先，觀察程式中所使用的 words.txt 可以發現，已經照字典序排序好，那如果把他打亂 (sort -R words.txt)再測試一次的話，結果是否會有所不同？

打亂後發現，未優化前與前兩種方式所受到的影響比較小。而後面兩種有關樹的方式，均有明顯的不同。原因在於順序的不同，導致在維護樹的時候有所差異。

下一步，應該尋找一個適當的 lastname dataset 來做測試，因為 words.txt 中包含了許多沒有名字意義的單字。

優化版本 5 - 利用 memory pool

老師有提到可以將 malloc 替換成 memory pool 的方式來做，先分配好一大塊記憶體，等 append 需要的時候直接取一小塊來用，這樣 append 的時間也能有所改善，畢竟頻繁的要求/釋放記憶體也會造成系統的負荷。

將 memory pool 套用至各種版本上之後可看到總時間比原本短。看來對於大量的記憶體操作時，可以優先考慮利用 memory pool 來取代傳統的 malloc，進而提升執行效率。

版本	套用前後時間差異
優化前	-0.01173
小結構	-0.009254
Hash function	-0.007918
字典樹	-0.063499
紅黑樹	-0.014424

進階版本 - Fuzzy Search

支援 fuzzy search，允許搜尋 lastName 時，不用精準打出正確的寫法
比方說電話簿有一筆資料是 McDonald，但若使用者輸入 MacDonald 或 McDonalds，也一併檢索出來。

這裏的實做方面也是利用編輯距離來進行，以編輯距離的大小來進行搜尋











void findName(char lastname[], entry *pHead)
{
    int distance;
    char buf[MAX_LAST_NAME_SIZE];
    while (pHead != NULL) {
        distance = dist(lastname , pHead->lastName);
        if(distance <= 2)
            printf("intput=%s , %s , dist=%d\n",lastname,pHead->lastName,distance);
        pHead = pHead -> pNext;
    }
}

將所有編輯距離小於3的資料都列出來

以 "douglas" 來進行測試

intput=douglas , dongles , dist=2
intput=douglas , douala , dist=2
intput=douglas , doubles , dist=2
intput=douglas , dougl , dist=2
intput=douglas , douglas , dist=0
intput=douglas , douglass , dist=1
intput=douglas , dougnac , dist=2
intput=douglas , dougnan , dist=2
intput=douglas , dounias , dist=2

size of entry : 32 bytes
execution time of append() : 0.054440 sec
execution time of findName() : 0.022088 sec

在這裏想說要加個記錄最小 dist 的資料，但考慮到如果在真實世界，有太多名字相同且相似的人，那樣列出來的最小 dist 的資料也會有幾十萬筆，那就有搜尋跟沒搜尋沒什麼差別。但如果再加多幾個詳細資料在 details 中，那這個 fuzzy searching 可能就有非常大的作用了。

參考資料：
Levenshtein Distance (編輯距離)

Phonebook

tags: `sysprog2017`

開發環境

未優化版本

優化版本 1 - 減少資料結構大小

優化版本 2 - 使用 Hash Function

優化版本 3 - BST

優化版本 4 - trie and array

Dataset 的影響

優化版本 5 - 利用 memory pool

進階版本 - Fuzzy Search

資料來源

背景知識

Phonebook

tags: sysprog2017

開發環境

未優化版本

優化版本 1 - 減少資料結構大小

優化版本 2 - 使用 Hash Function

優化版本 3 - BST

優化版本 4 - trie and array

Dataset 的影響

優化版本 5 - 利用 memory pool

進階版本 - Fuzzy Search

資料來源

背景知識

tags: `sysprog2017`