Try   HackMD

2016q3 Homework1 (phonebook)

contributed by <a530788140>

Reviewed by Louie Lu

  • git log is too simplify for reading
  • calcuate.c 33:40 寫的很奇怪。
  • 沒有推上 Hashtable 的 code.
  • coding style
    • consist if else block (you got two style, i prefer this one)
if (expr) {
	stmt
} else {
	stmt
}

實作二元搜尋樹

  • append
entry *append(char lastName[], entry *root)
{
    /* allocate memory for the new entry and put lastName */
    if (strcasecmp("\0", root->lastName) == 0) {
        strcpy(root->lastName, lastName);
    } else {
        entry *tmp, *current, *parent;
        tmp = (entry *) malloc(sizeof(entry));
        strcpy(tmp->lastName, lastName);
        tmp->lNext = NULL;
        tmp->rNext = NULL;
        current = root;
        while (current != NULL) {
            parent = current;
            if (hash(lastName) > hash(current->lastName))
                current = current->rNext;
            else
                current = current->lNext;
        }
        if (hash(lastName) > hash(parent->lastName))
            parent->rNext = tmp;
        else
            parent->lNext = tmp;
    }
    return root;
}

請保持原始程式碼一致的縮排 (4 個空白,而非 tab),注意看作業要求中 astyle 的設定方式 jserv
感謝老師提醒,已修改儂偉

  • findName
entry *findName(char lastname[], entry *root)
{

    while (root != NULL) {
        if (strcasecmp(lastname, root->lastName) == 0) {
            return root;
        } else {
            if (hash(lastname) > hash(root->lastName)) {
                root = root->rNext;
            } else
                root = root->lNext;
        }
    }
    return NULL;
}

發現問題

程式碼停在main 43行無法編譯儂偉Sat, Sep 24, 2016 3:30 PM

使用GDB

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffdee8) at main.c:25
25	{
(gdb) n
27	    int i = 0;
(gdb) n
33	    fp = fopen(DICT_FILE, "r");
(gdb) n
34	    if (fp == NULL) {
(gdb) n
41	    pHead = (entry *) malloc(sizeof(entry));
(gdb) n
42	    printf("start");
(gdb) n
43	    printf("size of entry : %lu bytes\n", sizeof(entry));
(gdb) n
startsize of entry : 32 bytes
44	    printf("end");
(gdb) n
45	    e = pHead;
(gdb) n
48	    e->rNext = e->lNext = NULL; 
(gdb) 

請避免用圖片來呈現文字,請多用文字方塊,這樣才能方便複製和搜尋jserv
發現44行後無法執行的問題,待解決儂偉Sat, Sep 24, 2016 3:32 PM
已改善,但目前還找不出問題儂偉Sat, Sep 24, 2016 4:51 PM
發現降低字典檔數量就可以執行,還在思考原因儂偉Sat, Sep 24, 2016 5:05 PM

startsize of entry : 32 bytes
endexecution time of append() : 257.225241 sec
execution time of findName() : 0.003507 sec

發現原因,由於append用字串長度,字串長度太接近,樹的深度太深儂偉Sat, Sep 24, 2016 5:13 PM


降低字典檔數量為5千


用二元樹會讓append的時間大幅增加,由於每次都要從頭開始判斷,也有可能是因為判斷大小的不夠均勻,導致樹歪掉,用數學來看log250000~=15,照理說應該不會差到超過100倍,還在思考。

使用hash(djb2)處理字串 改用5W字典檔

unsigned long hash(char* str)
{
    unsigned long hash = 5381;
    int c;
    while ((c = *str++) != 0)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    return hash;
}


發現 findName 的速度快了5倍,但append時間還是太長,應該可以再進一步優化。

未優化

startsize of entry : 136 bytes
endexecution time of append() : 0.008545 sec
execution time of findName() : 0.000757 sec

 Performance counter stats for './phonebook_orig' (100 runs):

           298,739      cache-misses       #   90.767  refs           ( +-  0.55% )
           329,128      cache-references                              ( +-  0.45% )
        38,030,533      instructions       #    0.91  insns per       ( +-  0.04% )
        41,899,470      cycles                                        ( +-  2.70% )

       0.013604583 seconds time elapsed                               ( +-  2.61% )

優化後

 Performance counter stats for './phonebook_opt' (100 runs):

         2,874,890      cache-misses     #    3.110 % of all cache refs ( +-  3.77% )
        92,449,220      cache-references                                ( +-  0.22% )
    51,082,295,945      instructions     #    1.94  insns per cycle     ( +-  0.00% )
    26,326,936,041      cycles                                          ( +-  0.80% )

       9.239935448 seconds time elapsed                                 ( +-  0.89% )

發現優化後cache-misses差了30倍

對照不同資料結構對 phonebook 效能影響的實驗 by Jingfei. jserv

優化hash方式來處理字串

unsigned long hash(char *key)
{
    unsigned int hashVal = 0;
    while (*key != '\0') {
        hashVal = (hashVal << 5) + *key++;
    }

    return hashVal % 42737;
}

大量減少append時間和findname時間

size of entry : 32 bytes
execution time of append() : 0.832717 sec
execution time of findName() : 0.000000996 sec

但cache-misses增加了

 Performance counter stats for './phonebook_opt' (100 runs):

     5,932,784  cache-misses  #   36.397 % of all cache refs( +-  0.26% )
    16,300,312  cache-references                            ( +-  0.07% )
 3,345,379,637  instructions  #    1.23  insns per cycle    ( +-  0.00% )
    2,711,734,441      cycles                               ( +-  0.13% )

    0.829867464 seconds time elapsed                        ( +-  0.99% )

實作HashTable

  • findName
entry *findName(char lastname[], Hashtable *ht)
{
    entry *tmp;
    unsigned int index = hash2(lastname,ht);
    for(tmp = ht->list[index]; tmp != NULL ; tmp = tmp->pNext)
        if(strcasecmp(tmp->lastName,lastname) == 0)
            return tmp;
    printf("%12s can not be find! \n",lastname);
    return NULL;
}

*append

Hashtable *append(char lastName[], Hashtable *ht)
{
    unsigned int index = hash2(lastName, ht);
    entry *new_entry;
    new_entry = (entry*) malloc(sizeof(entry));
    if (new_entry == NULL) return NULL;
    strcpy(new_entry->lastName,lastName);
    new_entry->pNext = ht->list[index];
    ht->list[index] = new_entry;
    return ht;
}

遇到問題

Hashtable size = 42737 
size of entry : 24 bytes
execution time of append() : 0.090755 sec
execution time of findName() : 0.000000 sec

findname一直都是0秒 待解決儂偉

使用GDB觀察findname時間

Breakpoint 1, main (argc=1, argv=0x7fffffffdee8) at main.c:100
100	    clock_gettime(CLOCK_REALTIME, &start);
(gdb) p start
$1 = {tv_sec = 1474797779, tv_nsec = 612819655}
(gdb) n
101	    findName(input, e);
(gdb) n
102	    clock_gettime(CLOCK_REALTIME, &end);
(gdb) p end
$2 = {tv_sec = 1474797779, tv_nsec = 698358342}
(gdb) 

計算出findname時間差約為85538687ns