Try   HackMD

2017q1 Homework1(phonebook)

contributed by < potinglee >

Reviewed by yachiyang01

-git commit message需用命令語氣,內文與標題需一個空白行

Reviewed by harryoooooooooo

  • C/C++標準中,free(), delete等函式不須再另外判斷是否為NULL,傳入NULL並不會發生錯誤

注意一致的格式 "jserv"
進度嚴重落後,請趕工!課程助教

開發環境

  • OS: Ubuntu 16.04.2 LTS
  • L1d cache: 32K
  • L1i cache: 32K
  • L2 cache: 256K
  • L3 cache: 6144K
  • Architecture: x86_64
  • CPU 作業模式: 32-bit, 64-bit
  • Byte Order: Little Endian
  • CPU(s): 4
  • Model name: Intel® Core i5-6300HQ CPU @ 2.30GHz
  • memory
    description: System memory
    physical id: 0
    size: 7872MiB

未優化版本

  • 執行: $ ./phonebook_orig
    • append() 及 findName() 時間如下
size of entry : 136 bytes
execution time of append() : 0.083434 sec
execution time of findName() : 0.005348 sec
  • catch test: $ make cache-test
    • cache-misses 為 92%
 Performance counter stats for './phonebook_orig' (100 runs):

          457,4257      cache-misses              #   92.288 % of all cache refs      ( +-  0.18% )
          495,6511      cache-references                                              ( +-  0.10% )
       2,6246,7397      instructions              #    1.50  insn per cycle           ( +-  0.02% )
       1,7458,4735      cycles                                                        ( +-  0.19% )

       0.066569148 seconds time elapsed                                          ( +-  0.26% )

優化版本1 - 減少資料結構大小

原本的資料結構為:

typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry;

因為只搜尋 lastName,所以把其他不會用到的成員移除。結果如下:

typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_ENTRY *pNext; } entry;
  • 執行: $ ./phonebook_opt
    • append() 及 findName() 時間如下
size of entry : 24 bytes
execution time of append() : 0.039963 sec
execution time of findName() : 0.001991 sec
  • catch test: $ make cache-test
    • cache-misses 減少為 63%
Performance counter stats for './phonebook_opt' (100 runs):

          102,7372      cache-misses              #   63.629 % of all cache refs      ( +-  0.76% )
          161,4624      cache-references                                              ( +-  0.13% )
       2,4126,7291      instructions              #    2.16  insn per cycle           ( +-  0.02% )
       1,1166,5508      cycles                                                        ( +-  0.19% )

       0.041329077 seconds time elapsed                                          ( +-  0.30% )
  • 和未優化版本的比較圖:
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

優化版本2 - 使用二元搜尋樹

將原本的用 linked list 建立的資料,改用二元搜尋樹建立。
原本建立 linked list 的程式:

entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; }

建立二元搜尋樹的程式:

entry *append(char lastName[], entry *e) { if(e == NULL) { strcpy(e->lastName, lastName); return e; } else { //create a new node entry* newnode = (entry *) malloc(sizeof(entry)); strcpy(newnode->lastName, lastName); newnode->pRight = newnode->pLeft = NULL; entry *tmp = e; entry *prev = NULL; while(tmp != NULL) { prev = tmp; if(strcasecmp(lastName, tmp->lastName) > 0) tmp = tmp->pRight; else if(strcasecmp(lastName, tmp->lastName) < 0) tmp = tmp->pLeft; } if(prev->pRight == tmp) prev->pRight = newnode; else prev->pLeft = newnode; return newnode; } }

搜尋二元搜尋樹的 findName():

entry *findName(char lastName[], entry *pHead) { if(pHead == NULL) { /* Element is not found */ return NULL; } while(pHead != NULL) { if(strcasecmp(lastName, pHead->lastName) == 0) { return pHead; } else { if(strcasecmp(lastName, pHead->lastName) > 0) { pHead = pHead->pRight; } else { pHead = pHead->pLeft; } } } return pHead; }
  • 執行: $ ./phonebook_opt
    • append() 及 findName() 時間如下
size of entry : 32 bytes
execution time of append() : 0.309874 sec
execution time of findName() : 0.003658 sec
  • catch test: $ make cache-test
    • cache-misses 減少為 68%,但因資料結構較大的緣故,cache-miss較優化版本1增加 5%。
 Performance counter stats for './phonebook_opt' (100 runs):

          160,2956      cache-misses              #   68.397 % of all cache refs      ( +-  0.50% )
          234,3613      cache-references                                              ( +-  0.11% )
       3,2665,4220      instructions              #    2.38  insn per cycle           ( +-  0.01% )
       1,3712,2147      cycles                                                        ( +-  0.16% )

       0.050254760 seconds time elapsed                                          ( +-  0.32% )
  • 和未優化版本的比較圖:
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

    雖然使用了二元搜尋樹,但 findNmae()的時間卻沒減少,反而增加。
    推測是由於每次加新資料都從 tree 的尾端,因此會形成這種狀況:






bst



aaaa

aaaa



aaaaa

aaaaa



aaaa->aaaaa


pRight



NULL

NULL



aaaa->NULL


pLeft



aaaaaa

aaaaaa



aaaaa->aaaaaa


pRight



aaaaa->NULL


pLeft



aaaaaaa

aaaaaaa



aaaaaa->aaaaaaa


pRight



aaaaaa->NULL


pLeft



aaaaaaa->NULL


pLeft



每一個 node 雖有兩個指標,實際上卻只用到一個(另一個指向 NULL),因此整個二元搜尋樹等同於普通的 linked list,而無法發揮二元搜尋樹的優勢。

請及早調整輸入的 data set "jserv"

優化版本3 - 用樹旋轉改進以上的問題

(working)

可以先將 words.txt 透過 -R 打亂排序
課程助教