2017q1 Homework1 (phonebook)

contributed by <F74021200>

中英文字間請以空白隔開
進度落後太多請快快追上!
課程助教

好的!
F74021200

開發環境

電腦規格：

tinin@tinin:~$ lscpu
Architecture:        x86_64
CPU 作業模式：    32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
每核心執行緒數：2
每通訊端核心數：2
Socket(s):             1
NUMA 節點：         1
供應商識別號：  GenuineIntel
CPU 家族：          6
型號：              58
Model name:            Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz
製程：              9
CPU MHz：             2994.030
CPU max MHz:           3200.0000
CPU min MHz:           1200.0000
BogoMIPS:              5188.16
虛擬：              VT-x
L1d 快取：          32K
L1i 快取：          32K
L2 快取：           256K
L3 快取：           3072K
NUMA node0 CPU(s):     0-3

linux kernel:

tinin@tinin:~$ uname -a
Linux tinin 4.8.0-39-generic 
#42~16.04.1-Ubuntu SMP Mon Feb 20 15:06:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Perf top

$./phonebook_orig & sudo perf top -p $!

  39.16%  libc-2.23.so    [.] __strcasecmp_l_avx
  31.70%  phonebook_orig  [.] findName
   7.24%  [kernel]        [k] release_pages
   5.56%  phonebook_orig  [.] strcasecmp@plt
   5.45%  [kernel]        [k] unmap_page_range
   3.60%  [kernel]        [k] nmi
   1.83%  libc-2.23.so    [.] __strcasecmp_avx
   1.82%  [kernel]        [k] common_perm_cond
   1.82%  [kernel]        [k] mem_cgroup_update_lru_size
   1.81%  [kernel]        [k] kmem_cache_free
   0.01%  [kernel]        [k] native_iret
   0.00%  [kernel]        [k] native_write_msr

消耗 CPU 周期最多的部份為函式 findName()

Phonebook 效能

cache miss

$ ./phonebook_orig

size of entry : 136 bytes
execution time of append() : 0.048713 sec
execution time of findName() : 0.005059 sec

sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"

執行

perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig

Performance counter stats for './phonebook_orig' (100 runs):
    2,096,128    cache-misses        #   94.483 % of all cache refs    ( +-  0.20% )
    2,218,535    cache-references                                      ( +-  0.16% )
  262,467,890    instructions        #    1.32  insn per cycle                                        ( +-  0.02% )
  199,293,309    cycles                                                ( +-  0.57% )
  
  0.065489355 seconds time elapsed                                  ( +-  0.69% )

由上圖知， cache miss 高達 94.800%

檢視程式與資料結構發現，在搜尋 phonebook 時，僅使用到 lastName ，因此，透過縮小 struct，使能加入 cache 的資料 struct 增加，從而降低 cache miss 發生的機率。















typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    /*
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
    */
    struct __PHONE_BOOK_ENTRY *pNext;
} entry;

修改後

$ ./phonebook_opt

size of entry : 24 bytes
execution time of append() : 0.044088 sec
execution time of findName() : 0.002203 sec

執行時間縮短了！

perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_opt

Performance counter stats for './phonebook_opt' (100 runs):
      253,406    cache-misses        #   71.468 % of all cache refs    ( +-  0.26% )
      354,574    cache-references                                      ( +-  0.93% )
  241,179,130    instructions        #    1.80  insn per cycle                                        ( +-  0.02% )
  133,748,026    cycles                                                ( +-  0.67% )
  
  0.043806218 seconds time elapsed                                  ( +-  0.69% )

由上圖知， cache miss 發生次數下降了

Hash function

使用 hash function 增加搜尋速度
In phonebook_opt.h





struct phonebook {
    Entry *Data[HASH_TABLE_SIZE];
    func_t findName;
    read_t readFile;
};

In phonebook_opt.c


































































static Entry *findName_impl(char lastName[], Phonebook **self)
{
    Entry *e;

    e = ((*self)->Data)[BKDRHash(lastName) % HASH_TABLE_SIZE];

    while (e) {
        if (strcasecmp(lastName, e->lastName) == 0)
            return e;
        e = e->pNext;
    }
    return NULL;
}

static void readFile_impl(FILE *fp, Phonebook **self)
{
    char line[MAX_LAST_NAME_SIZE];
    int i = 0;
    Entry *e;
    while (fgets(line, sizeof(line), fp)) {
        while (line[i] != '\0')
            i++;
        line[i - 1] = '\0';
        i = 0;
        e = (*self)->Data[BKDRHash(line) % HASH_TABLE_SIZE];
        while(e->pNext)
            e = e->pNext;
        e = append(line, e);
    }
}

Entry *append(char lastName[], Entry *e)
{
    e->pNext = (Entry *) malloc(sizeof(Entry));
    e = e->pNext;
    strcpy(e->lastName, lastName);
    e->pNext = NULL;

    return e;
}

int init_phonebook (Phonebook **self)
{
    int i;

    if (NULL == ((*self) = malloc(sizeof(Phonebook)))) return -1;
    for (i = 0; i < HASH_TABLE_SIZE; ++i) {
        if (NULL == (((*self)->Data)[i] = (Entry *) malloc(sizeof(Entry)))) return -1;
        ((*self)->Data)[i]->pNext = NULL;
    }
    (*self)->findName = findName_impl;
    (*self)->readFile = readFile_impl;
    return 0;
}

unsigned int BKDRHash(char *str)
{
    unsigned int seed = 131;
    unsigned int hash = 0;

    while (*str) {
        hash = hash * seed + (*str++);
    }

    return hash;
}

修改後

$ ./phonebook_opt

execution time of append() : 30.423652 sec
execution time of findName() : 0.000126 sec

perf stat --repeat 100 \
        -e cache-misses,cache-references,instructions,cycles \
        ./phonebook_opt

 Performance counter stats for './phonebook_opt' (100 runs):

       294,807,802      cache-misses              #   65.948 % of all cache refs      ( +-  0.68% )
       447,031,194      cache-references                                              ( +-  0.07% )
     3,394,511,333      instructions              #    0.04  insn per cycle                                              ( +-  0.01% )
    80,193,329,008      cycles                                                        ( +-  0.57% )

      27.472831615 seconds time elapsed                                          ( +-  0.50% )

搜尋時間確實下降了（0.005059 sec->0.000126 sec）， cache-miss 也下降（94.483%->65.948 %），但讀入時間增加太多(0.044088 sec->30.423652 sec)，應該是因為在讀入每筆資料時，都須先計算其 hash 值，在計算 hash 值時，除了額外計算的時間，還有呼叫函式的時間，另外，當找到 hash 值後，若所在的陣列已有其他資料，又必須經過一串 linked list 到達最後一個後，才能將新資料加入；因此，讀入所有資料的時間才會如此可觀。
以下嘗試兩種方法:

(1)將 hash function 展開，減少呼叫函式所花費的時間。












































static Entry *findName_impl(char lastName[], Phonebook **self)
{
    Entry *e; 
    unsigned int seed = 131;
    unsigned int hash = 0;
    char *str;

    str = lastName;
    while (*str) {
        hash = hash * seed + (*str++);
    }   
    e = ((*self)->Data)[hash % HASH_TABLE_SIZE];
    while (e) {
        if (strcasecmp(lastName, e->lastName) == 0)
            return e;
        e = e->pNext;
    }   
    return NULL;
}

static void readFile_impl(FILE *fp, Phonebook **self)
{
    char line[MAX_LAST_NAME_SIZE], *str;
    int i = 0;
    unsigned int seed = 131;
    unsigned int hash = 0;
    Entry *e; 
    while (fgets(line, sizeof(line), fp)) {
        while (line[i] != '\0')
            i++;
        line[i - 1] = '\0';
        i = 0;

        str = line;
        hash = 0;
        while(*str) {
            hash = hash * seed + (*str++);
        }
        e = (*self)->Data[hash % HASH_TABLE_SIZE];
        while(e->pNext)
            e = e->pNext;
        e = append(line, e); 
    }   
}

$ make run

(上為原本的 hash 版本，下為修改後的)

execution time of append() : 33.082860 sec
execution time of findName() : 0.000200 sec
execution time of append() : 33.490506 sec
execution time of findName() : 0.000144 sec

$ perf stat --repeat 100 \
	-e cache-misses,cache-references,instructions,cycles \
	./phonebook_opt_1

Performance counter stats for './phonebook_opt_1' (100 runs):

       332,570,144      cache-misses              #   73.024 % of all cache refs      ( +-  0.75% )
       455,426,921      cache-references                                              ( +-  0.16% )
     3,394,695,626      instructions              #    0.04  insn per cycle                                              ( +-  0.02% )
    88,919,125,151      cycles                                                        ( +-  0.70% )

      29.833768554 seconds time elapsed                                          ( +-  0.51% )

結果，並沒有獲得太大的改善。

(2)加入一筆資料時，直接放在第一個

說明：因為原本的情況是，在加入一筆資料時，會將其放在所屬 hash set 裡 linked-list 的最後一個；現在，改將其直接加入到第一個位置。

























static void readFile_impl(FILE *fp, Phonebook **self)
{
    char line[MAX_LAST_NAME_SIZE], *str;
    int i = 0;
    unsigned int seed = 131;
    unsigned int hash = 0;
    Entry *e, *e1;
    while (fgets(line, sizeof(line), fp)) {
        while (line[i] != '\0')
            i++;
        line[i - 1] = '\0';
        i = 0;

        str = line;
        hash = 0;
        while(*str) {
            hash = hash * seed + (*str++);
        }
        e = (*self)->Data[hash % HASH_TABLE_SIZE];
        e1 = e->pNext;
        e->pNext = NULL;
        e = append(line, e); 
        e->pNext = e1; 
    }   
}

$ make run

execution time of append() : 31.395222 sec
execution time of findName() : 0.000202 sec
execution time of append() : 0.092144 sec
execution time of findName() : 0.000000 sec

$ perf stat --repeat 100 \
	-e cache-misses,cache-references,instructions,cycles \
	./phonebook_opt_1

Performance counter stats for './phonebook_opt_1' (100 runs):

           260,094      cache-misses              #   59.930 % of all cache refs      ( +-  0.20% )
           433,995      cache-references                                              ( +-  1.34% )
       231,826,919      instructions              #    1.43  insn per cycle                                              ( +-  0.02% )
       161,765,903      cycles                                                        ( +-  1.40% )

       0.057799116 seconds time elapsed                                          ( +-  1.44% )

輸入時所花時間大幅下降，由此可知，在原版本的輸入中，於 linked-list 的搜尋花了很多時間(上為原本的 hash 版本，下為修改後的)；另外，修改後，搜尋字串"zyxel"的時間降為0.000000 sec ，在檢查 words.txt 後，發現"zyxel"為所在 hash set 的第一個結點，可能因此使得搜尋時間非常短。

binary search tree

使用不同的資料結構觀察效能的改變，這次用 binary search tree 。
In phonebook_bst.h











struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    struct __PHONE_BOOK_ENTRY *left;
    struct __PHONE_BOOK_ENTRY *right;
};

struct phonebook {
    Entry *Data;
    func_t findName;
    read_t readFile;
};

In phonebook_bst.c






















































































static Entry *findName_impl(char lastName[], Phonebook **self)
{
    Entry * p;
    int cmp;
    
    p = (*self)->Data;
    while (1) {
        cmp = strcmp (lastName, p->lastName);
        if (cmp > 0) {
            if (p->right)p = p->right;
            else {
                printf ("No such name!\n");
                return NULL;
            }
        }
        else if (cmp < 0) { 
            if (p->left)p = p->left;
            else {
                printf ("No such name!\n");
                return NULL;
            }
        }
        else {
            return p;
        }
    }
}

static void readFile_impl(FILE *fp, Phonebook **self)
{
    int i = 0;
    char line[MAX_LAST_NAME_SIZE];

    while (fgets(line, sizeof(line), fp)) {
        while (line[i] != '\0')
            i++;
        line[i - 1] = '\0';
        i = 0;
        insert (line, (*self)->Data);
    }
}

Entry *insert(char lastName[], Entry *Data)
{
    Entry *p;
    int cmp;
    
    p = Data; 
    while (1) {
        cmp = strcmp (lastName, p->lastName);
        if (cmp > 0) {
            if (p->right)p = p->right;
            else {
                p->right = (Entry *) malloc (sizeof(Entry));
                p = p->right;
                strcpy (p->lastName, lastName);
                p->left = NULL;
                p->right = NULL;
                return p;
            }
        }
        else { 
            if (p->left)p = p->left;
            else {
                p->left = (Entry *) malloc (sizeof(Entry));
                p = p->left;
                strcpy (p->lastName, lastName);
                p->left = NULL;
                p->right = NULL;
                return p;
            }
        }
    }
}

int init_phonebook (Phonebook **self)
{
    if (NULL == (*self = malloc(sizeof(Phonebook)))) return -1;
    if (NULL == ((*self)->Data = (Entry *) malloc(sizeof(Entry)))) return -1;
    strcpy((*self)->Data->lastName, "n");
    (*self)->Data->left = NULL;
    (*self)->Data->right = NULL;
    (*self)->findName = findName_impl;
    (*self)->readFile = readFile_impl;
    return 0;
}

在觀察 words.txt 後，我發現資料的前幾個字母是按照 abcdefg … 的方式排列，為了不使 tree 中的結點都集中在某一邊，我先將根結點預設為 n 。

$  ./phonebook_bst

execution time of append() : 195.034462 sec
execution time of findName() : 0.000834 sec

與 orig 相比，搜尋時間確實減少了，但讀入資料的時間卻太長；
與 hash function 版本相比，除了搜尋時間較長外，讀入時間亦長的誇張，由 words.txt 中的資料來看，應是所產生的 tree 太過類似 skewed tree 所導致的。

不過， cache-miss 卻下降許多（94.483%->41.123 %），如下：

Performance counter stats for './phonebook_bst' (100 runs):

     1,908,252,577      cache-misses              #   41.123 % of all cache refs      ( +-  1.27% )
     4,640,306,720      cache-references                                              ( +-  1.10% )
 1,167,592,842,608      instructions              #    2.26  insn per cycle                                              ( +-  0.22% )
   516,320,047,658      cycles                                                        ( +-  0.47% )

     163.680049478 seconds time elapsed                                          ( +-  0.54% )

問題:
Q1.:本例選用的dataset(也就是輸入電話簿的姓名)是否存在問題?
A1.:
首先，須確認dataset 中資料有多少比例是目前真實存在的英文姓氏，因為在電話簿中，不可能出現非姓氏的字，若出現，則此資料中的人也不存在，這筆資料會被直接捨棄，因此，當此 dataset 中出現非姓氏詞時，就表示此 dataset 有問題；所以，要回答這問題需要對此 dataset 中的資料與真實姓氏的資料比對，依比對結果回答。

Q2.:有代表性嗎?跟真實英文姓氏差異大嗎?
A2.:
關於是否有代表性，需要考慮許多因素，就此電話簿所用年代而言，就必須考慮那些姓氏出現在所用年代，有一網站能查詢到英文姓氏在不同年代於美國的分布以及其起源，舉個例，依記載，在1843到1999年，紀錄了一位名叫 Elka Aaaa 的人，確實有人的姓氏是 aaaa；所以，需要知道更多此電話簿的用途資訊，才能知道其是否有代表性；關於第二個問題，可能需要先更明確的定義'所要的真實英文姓氏'的定義，才能有後續的動作，就如上一個問題中一樣，所謂'真實英文姓氏'也有年代問題，不同年代的英文姓氏組成不同，須先設定好年代才能比較差異性。
Q3.:資料難道「數大便是美」嗎?
A3.:
就統計學來說，於母群體中，隨機抽樣的數量越多，資料分析出的結果越趨近於母群體，因此，'數大'，只代表取樣的數量大，第一，沒限制須隨機抽樣，這很可能導致資料出現極大偏差；第二，沒限制所取樣本是在母群體中，這會使得資料的代表性降低；因此，資料並非'數大'便是美，所謂 garbage in, garbage out ，即使'數大'的垃圾，所產亦是垃圾。

Q4.:如果我們要建立符合真實電話簿情境的輸入，該怎麼作?
A4.:
首先，要先確認此情境的設定，例如:年代、地區，若情況允許，當然是直接取得該地的人口姓氏，若無法取得，但時間允許的話，就在此地區隨機分配調查人員，依照人群流動時的疏密程度比例配置調查員的數量，如此能盡可能將母群體中的資料都蒐集到，使取樣資料越接近母群體的實際資料；若無法做實地調查，則無法知道各姓氏所佔比例，那只好從現在仍存在的所有姓氏中，隨機取樣做出電話簿了。

這種等級的回覆，跟文組學生有什麼差別？理工人要拿出數據、數學模型，還有推理出來！ –jserv

是!我趕緊修改!
F74021200

參考資料：
perf 原理和實務
 Programming Small
gnuplot 語法解說和示範
 Makefile 語法和示範
 hash function 觀念和實務
 哈西表
 哈西函數
 哈西函式 in C
你所不知道的 C 語言：物件導向程式設計篇
 How to Write a Git Commit Message
Hsiang的共筆

2017q1 Homework1 (phonebook)

開發環境

Perf top

Phonebook 效能

cache miss

Hash function

binary search tree

tags: by F74021200

tags: `by F74021200`