Try   HackMD

2017q1 Homework1 (phonebook)

contributed by < twzjwang >
作業說明: B01: phonebook
github: twzjwang/phonebook

Reviewed by zhanyangch

  • 修改 runtime.gp 的指令順序可以使數字不被蓋住

開發環境

  • Ubuntu 14.04.5 LTS
    Linux 4.4.0-62-generic

這個 distribution 太舊,之後的實驗可能會有問題,請升級到 Ubuntu 16.04 之後 "jserv"

  • cpu
    version: Intel® Core i5-3337U CPU @ 1.80GHz
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 4
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 256K
    L3 cache: 3072K

  • memory
    size: 4GiB

開發前

  • 安裝Ubuntu系統
  • 安裝開發工具
    • vim
    • git
    • build-essential
    • linux-tools-common
    • linux-tools-generic
    • cppcheck
    • astyle
    • colordiff
    • gnuplot
  • github設定

未優化版本

  • 清空 cache
    • $ echo 1 | sudo tee /proc/sys/vm/drop_caches
  • 執行 phonebook_orig
    • $ ./phonebook_orig
  • 結果
    ​​​​size of entry : 136 bytes
    ​​​​execution time of append() : 0.108467 sec
    ​​​​execution time of findName() : 0.006022 sec
    
  • cache-test $ make cache-test
  • ​​​​Performance counter stats for './phonebook_orig' (100 runs):
    
    ​​​​     2,080,686      cache-misses              #   96.031 % of all cache refs      ( +-  0.16% )
    ​​​​     2,166,682      cache-references                                              ( +-  0.18% )
    ​​​​   262,557,158      instructions              #    1.41  insns per cycle          ( +-  0.02% )
    ​​​​   185,753,200      cycles                                                        ( +-  0.12% )
    
    ​​​​   0.075103558 seconds time elapsed                                          ( +-  0.94% )
    ​​​​
    

實驗一 - 用 binary search tree

不要濫用「優化」(optimize) 這詞,在你有具體改進之前,都只是實驗(experiment) "jserv"

方法 1: 使用strcmp

避免用 version 1, 2, 3, 等寫法,明確標注你的嘗試途徑,尤其在 Git commit messages 更該如此,原始程式碼和技術報告都是寫給人看的! "jserv"

  • strcmp判斷,left < parent, right > parent
    • append 時間太久 => 直接 ctrl+c
int strcmp(const char *s1, const char *s2)
{
    while(*s1 && (*s1 == *s2))
        s1++, s2++;
    return *(const unsigned char*) s1 - *(const unsigned char*) s2;
}

注意一致的 coding style,應該寫作 char *s1,而非 char *s1,然後是 *s1 == *s2 (有空白) 而非 *s1==*s2 "jserv"

  • 改用all-names.txt find zora測試
  • 結果
    • findName 時間大幅縮短(優化前 20%)
    • append 時間太久(優化前的 90 倍)
    • cache miss 約為 4.795%
 Performance counter stats for './phonebook_orig' (100 runs):

            69,435      cache-misses              #   77.038 % of all cache refs      ( +-  1.23% )
            90,131      cache-references                                              ( +-  1.26% )
        12,058,151      instructions              #    1.13  insns per cycle          ( +-  0.03% )
        10,699,046      cycles                                                        ( +-  1.63% )

       0.004715377 seconds time elapsed                                          ( +-  1.86% )


 Performance counter stats for './phonebook_opt_bst' (100 runs):

            97,979      cache-misses              #    4.795 % of all cache refs      ( +-  2.68% )
         2,043,183      cache-references                                              ( +-  1.37% )
       971,374,950      instructions              #    2.01  insns per cycle          ( +-  0.00% )
       482,156,625      cycles                                                        ( +-  1.04% )

       0.198216919 seconds time elapsed                                          ( +-  1.31% )

上面這行是?課程助教

已刪除 twzjwang

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

方法 2: 改用自訂義 entry_cmp

  • 因為輸入檔字母按順序(a-z) => 產生 skewed BST
  • 改用自訂 cmp => 先比長度
int entry_cmp(char a[], char b[])
{
    if(strlen(a) == strlen(b))
        return strcmp(a, b);
    return strlen(a) - strlen(b);
}

沒必要多次呼叫 strlen,請避免 "jserv"

int entry_cmp(char a[], char b[])
{
    int i = strlen(a) - strlen(b);
    return i ? i : strcmp(a, b);
}
  • 結果
    • append 時間縮短為 version 1 的 0.47%
    • findName 時間縮短為優化前 4%
    • cache miss 約為 5.148%
 Performance counter stats for './phonebook_orig' (100 runs):

            74,670      cache-misses              #   78.183 % of all cache refs      ( +-  0.70% )
            95,508      cache-references                                              ( +-  1.00% )
        12,061,787      instructions              #    1.02  insns per cycle          ( +-  0.02% )
        11,865,179      cycles                                                        ( +-  1.95% )

       0.005664565 seconds time elapsed                                          ( +-  3.60% )


 Performance counter stats for './phonebook_opt_bst' (100 runs):

           134,457      cache-misses              #    5.148 % of all cache refs      ( +-  2.42% )
         2,611,651      cache-references                                              ( +-  0.88% )
       449,076,938      instructions              #    1.88  insns per cycle          ( +-  0.15% )
       238,755,941      cycles                                                        ( +-  1.50% )

       0.098018759 seconds time elapsed                                          ( +-  1.83% )

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

  • 問題
    • 能否再縮短 append 時間?

方法 3: 嘗試修改 entry_cmp

  • 再修改用 cmp => 先比長度,再從結尾比大小(與 strcmp相反)
int entry_cmp(char a[], char b[])
{
    if (strlen(a) == strlen(b)) {
        int l = strlen(a);
        for (int i = l - 1; i >= 0; i--) {
            if (*(a + i) != *(b + i))
                return *(a + i) - *(b + i);
        }
    }
    return strlen(a) - strlen(b);
}

縮減 int iint l 的 scope。
strlen 多次呼叫,可以簡化 "jserv"

int entry_cmp(char a[], char b[])
{
    int m = strlen(a);
    int n = m - strlen(b);
    if (!n) {
        while (m--) {
            if (*(a + m) != *(b + m))
                return *(a + m) - *(b + m);
        }
    }
    return n;
}
  • 結果
    • append 時間縮短為 version 2 的 0.14%
    • findName 時間縮短 < 優化前 1%
    • cache miss 約為 44.641%
      • 推測:
        • 過程中採用 binary search tree 結構存資料
          插入和搜尋中存取 cache 次數與 singly-linked list 不同
          (cache-references 和 cache-misses 次數明顯不同)
        • 已知
          cache max size: 32 KB
          cache block size: 64 B
          entry size: 144 B
          8-way set associative
        • orig
          • append : n * O(1)
            16750筆資料 * 每筆 144 B = 2,412,000 B
            2,412,000 B / 64 B = 37688 block
          • findName : O(n)
            16750筆資料 * 每筆 144 B = 2,412,000 B
            2,412,000 B / 64 B = 37,688 block
          • 37,688 + 37,688 = 75,376 次
        • opt_bst
          • append : O(nlogn)
            16750 * log(16750) * 144 B = 33,844,878 B
            33,844,878 B / 64 B = 528,826 block
          • findName :O(n)
            log(16750) * 144 B = 2,021 B
            2,021 B / 64 B = 32 block
          • 528,826 + 32 = 528,858 次
        • 參考 cache 原理和實驗

如何解釋 cache miss 的變化? jserv

 Performance counter stats for './phonebook_orig' (100 runs):

            61,477      cache-misses              #   74.388 % of all cache refs      ( +-  1.54% )
            82,643      cache-references                                              ( +-  1.34% )
        12,059,048      instructions              #    1.26  insns per cycle          ( +-  0.03% )
         9,593,850      cycles                                                        ( +-  1.84% )

       0.004231672 seconds time elapsed                                          ( +-  2.34% )


 Performance counter stats for './phonebook_opt_bst' (100 runs):

           200,986      cache-misses              #   44.641 % of all cache refs      ( +-  1.99% )
           450,229      cache-references                                              ( +-  1.23% )
        55,910,271      instructions              #    0.95  insns per cycle          ( +-  0.25% )
        58,829,146      cycles                                                        ( +-  1.87% )

       0.025019251 seconds time elapsed                                          ( +-  2.06% )

更改字串 cmp 方法比較 (用all-names.txt find zora)

版本 方法 1 方法 2 方法 3
append 0.189335 0.089585 0.013539
findName 0.000022 0.000007 0.000001
  • 改回words.txt find zyxel
  • 問題
    • 除了改變 cmp 規則,有沒有辦法減少插入、搜尋的深度?
      • balance BST

方法 4: 將 single-linked list 轉換成 Balanced BST

  • 因為輸入字母已按順序排列

可用 sort -R 建立新的輸入資料集 "jserv"

//search by bst
entry *findName(char lastName[], entry *pHead)
{
    entry *temp = pHead;
    int cmp;
    while (temp) {
        cmp = strcmp(lastName, temp->lastName);
        if (cmp == 0)
            return temp;
        else if (cmp < 0)
            temp = temp->pLeft;
        else
            temp = temp->pRight;
    }
    return NULL;
}

//orig append
entry *append(char lastName[], entry *e)
{
    /* allocate memory for the new entry and put lastName */
    e->pNext = (entry *) malloc(sizeof(entry));
    e = e->pNext;
    strcpy(e->lastName, lastName);
    e->pNext = NULL;
    e->pRight = NULL;
    e->pLeft = NULL;

    return e;
}

entry *new_balance_bst(entry *root, int num)
{
    if(!num)
        return NULL;
    entry *bst_root = NULL;
    bst_root = balance_bst(bst_root, num, root);
    free(root);
    return bst_root;
}


entry *balance_bst(entry *e, int num, entry *root)
{
    static entry *bst_build_cur = NULL;
    if(!num)
        return NULL;
    if(!bst_build_cur)
        bst_build_cur = root;

    if(!e){
        e = (entry *) malloc(sizeof(entry));
        e->pNext = NULL;
        e->pRight = NULL;
        e->pLeft = NULL;
    }

    int mid = (num +1)/2;
    e->pLeft = balance_bst(e->pLeft, mid - 1, root);
    strcpy(e->lastName, bst_build_cur->lastName);
    bst_build_cur = bst_build_cur->pNext;
    e->pRight = balance_bst(e->pRight, num - mid, root);
    return e;
}

void free_bst(entry *e)
{
    if (e) {
        free_bst(e->pRight);
        free_bst(e->pLeft);
        free(e);
    }
}

上述的 findName 可改寫為以下:

{
   entry *temp = pHead;
   while (temp) {
       if (!strcmp(lastName, temp->lastName))
           return temp;
       temp = (cmp < 0) ? temp->pLeft : temp->pRight;
   }
   return NULL;
}

"jserv"

  • 結果
    • append 時間降為優化前1.69倍
    • findName 時間 < 優化前0.0178%
    • cache-miss 約為 94.463%
 Performance counter stats for './phonebook_opt_bst' (100 runs):

         2,431,096      cache-misses              #   94.463 % of all cache refs      ( +-  0.10% )
         2,573,593      cache-references                                              ( +-  0.20% )
       481,113,045      instructions              #    1.34  insns per cycle          ( +-  0.01% )
       358,634,304      cycles                                                        ( +-  0.29% )

       0.142062161 seconds time elapsed                                          ( +-  0.62% )

  • 問題
    • 降低 cache-misses ?
    • 直接建成 BST (不經過 singly-linked list )?
      • 邊讀檔邊建 => 不知資料筆數 => 先讀過一次存下資料數

方法 5: 嘗試省略 singly-linked list ,直接建 Balanced BST

  • 修改 versoin 4 先讀檔取得資料數量
  • 直接讀檔建 balance BST
    while (fgets(line, sizeof(line), fp)) {
        while (line[i] != '\0')
            i++;
        count_node++;
    }
  • 結果
    • append 時間降為優化前 1.44 倍
    • findName 時間 < 優化前 0.0169%
    • cache-miss 約為 92.262%
Performance counter stats for './phonebook_opt_bst' (100 runs):

         1,425,612      cache-misses              #   92.262 % of all cache refs      ( +-  0.04% )
         1,545,179      cache-references                                              ( +-  0.11% )
       434,912,471      instructions              #    1.38  insns per cycle          ( +-  0.00% )
       314,507,544      cycles                                                        ( +-  0.10% )

       0.124844305 seconds time elapsed                                          ( +-  0.54% )!

  • 問題
    • 其他方法?

binary search tree 小結

Algo. 優化前(singly-linked list) 優化後(balanced binary search tree)
space O(n) O(n)
insert O(1) O(n) (讀資料數) + O(log n) (插入)
search O(n) O(log n)

實驗二 - 縮減 entry 結構

  • 程式中依據 lastName 插入及搜尋
  • 將其他資訊定義在新的結構 entry_info
    需要時再配置記憶體
/* original version */
typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
    struct __PHONE_BOOK_ENTRY *pNext;
} entry;

改成以下

typedef struct __PHONE_BOOK_ENTRY_INFO {
    char firstName[16];
    char email[16];
    char phone[10];
    char cell[10];
    char addr1[16];
    char addr2[16];
    char city[16];
    char state[2];
    char zip[5];
} entry_info;

typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    entry_info *pInfo;
    struct __PHONE_BOOK_ENTRY *pNext;
} entry;
  • 結果
    • append 時間為修改前 82%
    • findName 時間為修改前 51%
    • cache-miss 約為 68.577%
      • cache miss 降低原因: size of entry 變小
 Performance counter stats for './phonebook_orig' (100 runs):

         2,042,217      cache-misses              #   95.007 % of all cache refs      ( +-  0.12% )
         2,149,550      cache-references                                              ( +-  0.16% )
       262,784,639      instructions              #    1.43  insns per cycle          ( +-  0.02% )
       183,943,466      cycles                                                        ( +-  0.37% )

       0.073711683 seconds time elapsed                                          ( +-  0.90% )


 Performance counter stats for './phonebook_opt_struct' (100 runs):

           378,199      cache-misses              #   68.577 % of all cache refs      ( +-  0.20% )
           551,497      cache-references                                              ( +-  1.13% )
       245,480,764      instructions              #    1.71  insns per cycle          ( +-  0.02% )
       143,394,756      cycles                                                        ( +-  1.12% )

       0.056445606 seconds time elapsed                                          ( +-  1.34% )

實驗三 - 結合實驗一(BST)與實驗二(縮減 entry 結構)

  • 將前兩個方法結合
  • 結果
    • (優化前) original:
      • 使用 single linked list 儲存資料
      • append 時紀錄最後一個 node
        插入linked list 時間不會太多 ( n*O(1) )
      • findName 最糟的情況下必須從頭找到尾 ( O(n) )
    • (實驗一) opt bst:
      • 使用BST雖然 append 時間稍微增加 ( n * O(log n) )
      • 但 findName 時間大幅減少 ( O(log n) )
    • (實驗二) opt struct:
      • 簡化 entry 結構降低 cache misses rate
      • append 和 findName 時間因此減少
    • (實驗三) optimized
      • 使用 size 較小的資料結構讓 BST 插入、搜尋時間再縮短
Performance counter stats for './phonebook_orig' (100 runs):

         1,987,476      cache-misses              #   95.591 % of all cache refs      ( +-  0.67% )
         2,079,154      cache-references                                              ( +-  0.70% )
       262,484,896      instructions              #    1.43  insns per cycle          ( +-  0.02% )
       184,098,351      cycles                                                        ( +-  0.58% )

       0.084555741 seconds time elapsed                                          ( +-  2.78% )


 Performance counter stats for './phonebook_opt' (100 runs):

           469,681      cache-misses              #   78.364 % of all cache refs      ( +-  0.35% )
           599,356      cache-references                                              ( +-  1.20% )
       314,544,912      instructions              #    1.49  insns per cycle          ( +-  0.00% )
       211,290,940      cycles                                                        ( +-  0.75% )

       0.094401048 seconds time elapsed                                          ( +-  2.10% )


 Performance counter stats for './phonebook_opt_bst' (100 runs):

         1,387,546      cache-misses              #   90.434 % of all cache refs      ( +-  0.14% )
         1,534,318      cache-references                                              ( +-  0.47% )
       434,348,761      instructions              #    1.37  insns per cycle          ( +-  0.00% )
       317,930,025      cycles                                                        ( +-  0.59% )

       0.137107756 seconds time elapsed                                          ( +-  1.86% )


 Performance counter stats for './phonebook_opt_struct' (100 runs):

           374,466      cache-misses              #   71.274 % of all cache refs      ( +-  0.25% )
           525,392      cache-references                                              ( +-  0.86% )
       245,306,198      instructions              #    1.78  insns per cycle          ( +-  0.02% )
       137,518,542      cycles                                                        ( +-  0.87% )

       0.057796514 seconds time elapsed                                          ( +-  2.39% )

changing git commit message

  • 之前沒有注意 commit message 內容及格式
    Author email也打錯
    經過提醒後嘗試修改 commit message
  • git rebase -i HEAD~5 修改前五次 commit message
  • 將要修改那行的 pick 改成 edit
  • git commit --amend
    Author 資訊錯誤也可在此更改
    git commit --amend --author="twzjwang <twzjwang@gmail.com>"
  • 開啟編輯,修改後儲存
  • git rebase --continue
  • 重複以上三步驟
  • git push --force-with-lease

本例選用的 dataset 是否存在問題?

  • 有代表性嗎?跟真實英文姓氏差異大嗎?
    • 許多字不是名詞,甚至是連續而無意義
  • 資料難道「數大就是美」嗎?
    • 除了資料"量"外,"質"也很重要
    • 如果內容大部分是無意義的資料,要從中篩選出有參考價值的資料
  • 如果我們要建立符合真實電話簿情境的輸入,該怎麼作?
    • 從已整理好的開放資料取得更常見的姓氏
    • 要考慮姓名有可能重複的狀況
    • 除了根據"姓名"搜尋資料外,根據 電話號碼 搜尋也是常見的方式,要重新設計演算法

參考資料

tags: twzjwang