phonebook

學習目標

1.學習perf工具
2.學習gunplot工具
3.學習git工具

環境配置

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 61
Model name:            Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
Stepping:              4
CPU MHz:               2200.945
CPU max MHz:           2700.0000
CPU min MHz:           500.0000
BogoMIPS:              4393.70
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K

改進目標

降低cache miss以及提升效能,可行的方向有
1.降低搬移的資料量
2.使用binary search tree建立資料表提升查詢效能
3.hash function 來提升查詢效能

原始效能分析

  33.83%  phonebook_orig  [.] findName
  28.97%  libc-2.23.so    [.] __strcasecmp_l_avx
   5.98%  [kernel]        [k] clear_page_c_e
   3.25%  [kernel]        [k] _raw_spin_lock
   3.24%  phonebook_orig  [.] main
   3.09%  [kernel]        [k] free_pcppages_bulk
   2.89%  libc-2.23.so    [.] _IO_fgets
   2.29%  libc-2.23.so    [.] __strcasecmp_avx
   2.04%  libc-2.23.so    [.] _int_malloc
   1.57%  [kernel]        [k] alloc_pages_vma
   1.32%  libc-2.23.so    [.] _IO_getline_info
   1.18%  phonebook_orig  [.] append

          345,0778      cache-misses              #   89.510 % of all cache refs      ( +-  0.15% )
          385,5173      cache-references                                              ( +-  0.24% )
       2,6173,7428      instructions              #    1.06  insns per cycle          ( +-  0.02% )
       2,4631,4013      cycles                                                        ( +-  0.71% )

cache miss大約為90%

效能改進(struct)

為了減少記憶體的搬移,先嘗試降低entry的大小,將除了搜尋引所之外的lastName移到別的struct


















typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    struct __PHONE_BOOK_DETAIL *pDetail;
    struct __PHONE_BOOK_ENTRY *pNext;
} entry;

typedef struct __PHONE_BOOK_DETAIL {
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
} detail;

cache miss降低到69%左右

          124,5225      cache-misses              #   68.849 % of all cache refs      ( +-  0.22% )
          180,8641      cache-references                                              ( +-  0.35% )
       2,4419,7204      instructions              #    1.43  insns per cycle          ( +-  0.02% )
       1,7110,3445      cycles                                                        ( +-  0.68% )

效能改進(BST)

加入binary search tree的結構,同時也加入hash value

typedef struct __BST {
    long hash_v;
    struct __BST *left;
    struct __BST *right;
    struct __PHONE_BOOK_ENTRY *data;
} node;

在append中加入了簡單的tree的insert,比較hash value的值,依序插入node






















node *currunt=head;
    if (head == NULL) {
        head = new_node;
        return NULL;
    }
    for (;;) {
        if (new_node->hash_v > currunt->hash_v) {
            if (currunt->right == NULL) {
                currunt->right = new_node;
                return NULL;
            } else {
                currunt = currunt->right;
            }
        } else {
            if (currunt->left == NULL) {
                currunt->left = new_node;
                return NULL;
            } else {
                currunt = currunt->left;
            }
        }
    }

這裡在一開始我遇到一個問題,就是hash function的選擇
一開始我在stackoverflow上找到的djb2 hash function by Dan Bernstein










unsigned long hash(unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    while (c = *str++)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}

使用這個hash function發現等了很久程式都還沒跑完
於是在append function中把值印出來
跑了五分鐘字母開頭為a的名字都還沒跑完

我猜測應該是因為測資是有排序過,所以hash value可能讓我的tree相當的傾斜
造成每次append一筆資料都必須造訪過非常多的node,造成append的速度相當緩慢

再來選擇其他hash function後就沒有問題了
hash function使用Jenkins hash function

參考資料：
Jenkins hash function
















uint32_t hash(char * s)
{
    uint32_t hash = 0;

    for(; *s; ++s) {
        hash += *s;
        hash += (hash << 10);
        hash ^= (hash >> 6);
    }

    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);

    return hash;
}

改善後效能

root@XXX:/Desktop/phonebook# ./phonebook_opt 
size of entry : 32 bytes
execution time of append() : 0.378361 sec
execution time of findName() : 0.000000 sec

平均的append花費時間大概是0.37 大約是0.069的5.36倍

但是在findName部分的時間就幾乎降到0
我想電話簿建立一次多花一些時間但是在搜尋上獲得的好處應該是可以彌補的

未來展望

在原始的append部分我發現了每次都會呼叫malloc
如果能宣告一個夠長的陣列來儲存這些資料,當陣列滿時再realloc就可以再提升效能