Try   HackMD

2017q3 Homework1 (phonebook)

contributed by <ZixinYang>

開發環境

Linux 4.10.0-35-generic
Ubuntu 16.04.1 LTS
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K

Linux 效能分析工具: Perf

  • 暖身

    • 執行結果
      Image Not Showing Possible Reasons
      • The image file may be corrupted
      • The server hosting the image is unavailable
      • The image path is incorrect
      • The image format is not supported
      Learn More →
    • 重要: 輸入 $ sudo su 切換到 root 權限下, 以取得所有量測資訊
  • 背景知識 review

    • Cache: 為一種SRAM, 存取速度非常快, 而一般記憶體跟不上處理器的執行速度, 因此將常用的資料保存在cache中, 處理器便無須等待, 因此能夠改善軟體效能。 Cache 具有局部性的特徵, 利用局部性可以達到極高的命中率, 進而提昇效能
    • Pipeline: 將一個指令切成很多個小指令
    • Superscalar: 在同一個時鐘週期內平行處理多個指令
    • Out-of-order execution: 因不同指令所需的執行時間和時鐘週期不同, 為充分利用處理器的pipeline, 指令可能被亂序執行
    • Branch prediction: 根據同一條指令的歷史執行紀錄進行預測, 讀取最可能的下一條指令, 而非順序讀取!
  • perf stat 練習

static char array[100000000]; int main(){ int i,j; for(i = 0; i < 10; i++) for(j = 0; j < 100000000; j++) array[j]++; return 0; }

結果

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

  • 考慮時間局部性, 更改程式碼, 內外迴圈執行次數對調
static char array[100000000]; int main(){ int i,j; for(j = 0; j < 1000000000; i++) for(i = 0; i < 10; j++) array[j]++; return 0; }

結果

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

cache-references 前後相差了6倍

案例探討: Phonebook

原始程式

phonebook_orig.h

typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry; entry *findName(char lastname[], entry *pHead); entry *append(char lastName[], entry *e);

phonebook_orig.c

entry *findName(char lastname[], entry *pHead) { while (pHead != NULL) { if (strcasecmp(lastname, pHead->lastName) == 0) return pHead; pHead = pHead->pNext; } return NULL; } entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; }

每次搜尋皆從頭搜尋一一比對, 複雜度為

O(n), 若資料龐大, 則不利於搜尋。

Performance counter stats for './phonebook_orig' (100 runs):

          464,8577      cache-misses              #   93.742 % of all cache refs      ( +-  0.17% )
          495,8902      cache-references                                              ( +-  0.04% )
       2,6220,2416      instructions              #    1.45  insn per cycle           ( +-  0.02% )
       1,8034,3116      cycles                                                        ( +-  0.11% )

       0.056852745 seconds time elapsed                                          ( +-  0.37% )

縮減 struct 大小

搜尋過程只用到 lastName 的資訊, 所以其他資訊就放在另一個 struct, 這樣可以減少存取不必要的資訊, 降低 cache misses

phonebook_struct.h

typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; struct __PHONE_BOOK_ENTRY *pNext; } entry; typedef struct __PHONE_BOOK_UNUSED { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } unused; entry *findName(char lastname[], entry *pHead); entry *append(char lastName[], entry *e);
Performance counter stats for './phonebook_opt' (100 runs):

          105,6083      cache-misses              #   65.584 % of all cache refs      ( +-  0.59% )
          161,0266      cache-references                                              ( +-  0.10% )
       2,4115,1425      instructions              #    2.17  insn per cycle           ( +-  0.02% )
       1,1106,5061      cycles                                                        ( +-  0.08% )

       0.035205177 seconds time elapsed                                          ( +-  0.21% )

改寫 Makefile

若需要更改標頭檔內容, 同時就要更改 main.c, 而執行 $make指令時, phonebook_orig.c, phonebook_orig.h 會重新編譯, 就會因為原始程式沒有定義新的變數而發生 error, 因此需要更改Makefile (需確定之前已經產生過原始程式的執行檔)

註解以下這行:

phonebook_orig: $(SRCS_common) phonebook_orig.c phonebook_orig.h
      $(CC) $(CFLAGS_common) $(CFLAGS_orig) \
          -DIMPL="\"$@.h\"" -o $@ \
          $(SRCS_common) $@.c

run 的目標改為 phonebook_opt:

run: $(EXEC)
        echo 3 | sudo tee /proc/sys/vm/drop_caches
        watch -d -t "./phonebook_opt && echo 3 | sudo tee /proc/sys/vm/drop_caches"

Binary Search Tree

直覺想到可以一次翻一半的電話簿進行比對, 但如果用 linked list 無法快速找到中間節點, 所以我們需要建一棵 BST, 新增每筆資料時依照 lastName 長出 BST, 第一次搜尋先比對 root, 再繼續往子節點搜尋。

phonebook_BST.h

typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pLeft; struct __PHONE_BOOK_ENTRY *pRight; } entry; entry *findName(char lastname[], entry *pHead); entry *append(char lastName[], entry *e);

phonebook_BST.c

entry *findName(char lastName[], entry *pHead) { if (pHead != NULL){ if(strcmp(lastName, pHead->lastName) == 0) return pHead; else if (strcmp(lastName, pHead->lastName) < 0) return findName(lastName, pHead->pLeft); else { return findName(lastName, pHead->pRight); } } return NULL; } entry *append(char lastName[], entry *e) { if(strcmp(e->lastName, "")==0 && !e->pRight){strcpy(e->lastName, lastName); return e;} entry *current; current = e; entry *ptr; while(current != NULL){ if(strcmp(lastName, current->lastName)==0){ return e; } else if(strcmp(lastName, current->lastName)<0){ ptr = current; current = current->pLeft; } else{ ptr = current; current = current->pRight; } } entry *Node = (entry *)malloc(sizeof(entry)); strcpy(Node->lastName, lastName); Node->pLeft = NULL; Node->pRight = NULL; if(strcmp(lastName, ptr->lastName)<0){ ptr->pLeft = Node; } else{ ptr->pRight = Node; } return e; }
Performance counter stats for './phonebook_opt' (100 runs):

          241,3916      cache-misses              #   79.843 % of all cache refs      ( +-  0.07% )
          302,3343      cache-references                                              ( +-  0.09% )
       2,4345,1900      instructions              #    1.52  insn per cycle           ( +-  0.09% )
       1,5988,7185      cycles                                                        ( +-  0.18% )

       0.150832553 seconds time elapsed                                          ( +-  7.75% )


跟原始程式相比 cache-misses下降許多, 但花了更多時間。

綜合前兩個方法

Performance counter stats for './phonebook_opt' (100 runs):

           99,3860      cache-misses              #   77.827 % of all cache refs      ( +-  0.22% )
          127,7015      cache-references                                              ( +-  0.13% )
       2,2206,5066      instructions              #    1.77  insn per cycle           ( +-  0.06% )
       1,2528,9025      cycles                                                        ( +-  0.30% )

       0.147768829 seconds time elapsed                                          ( +-  7.12% )

Hash Function

參考st9007a同學的共筆, 使用BKDR hash function

phonebook_HASH.h

typedef struct __PHONE_BOOK_INFO { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } info; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_INFO *pInfo; struct __PHONE_BOOK_ENTRY *pNext; } entry; typedef struct { entry *cell[HASH_TABLE_SIZE]; } hashTable; unsigned int BKDRHash(char *str); entry *findName(char lastName[], hashTable *table); void append(char lastName[], hashTable *table);

phonebook_HASH.c

unsigned int BKDRHash(char *str) { unsigned int seed = 131; unsigned int hash = 0; while (*str) { hash = hash * seed + (*str++); } return (hash & 0x7FFFFFFF); } entry *findName(char lastName[], hashTable *table) { unsigned int i = BKDRHash(lastName) % HASH_TABLE_SIZE; entry *head = table->cell[i]; while (strcmp(head->lastName, lastName) != 0) { head = head->pNext; if (head == NULL) break; } return head; } void append(char lastName[], hashTable *table) { unsigned int i = BKDRHash(lastName) % HASH_TABLE_SIZE; entry *e = (entry *) malloc(sizeof(entry)); e->pInfo = (info *) malloc(sizeof(entry)); e->pNext = NULL; strcpy(e->lastName, lastName); if (table->cell[i] == NULL) { table->cell[i] = e; return; } entry *head = table->cell[i]; while (head->pNext != NULL) { head = head->pNext; } head->pNext = e; }

遇到的問題:
執行$make run
error: recipe for target 'run' failed
error:

make: *** wait: No child processes.  Stop.
make: *** Waiting for unfinished jobs....
make: *** wait: No child processes.  Stop.

微心得:
花的時間比想像中多, 完成度不足, perf, gnuplot, git, makefile 都不熟, 寫個 c 還有 bug, 有點慘orz, 不過受到很多刺激感覺滿興奮的, 要再多花時間努力追上。