Try   HackMD

2017q3 Homework1 (phonebook)

contributed by <kevin550029>

Perf 工具練習

安裝&初步測試

  1. 按照 perf 原理和實務 中的步驟安裝perf

  2. 嘗試執行 perf top 會出現錯誤訊息, 表示只能在root權限下使用

    error message

  3. 嘗試輸入下列指令, 但仍無法更改 restrict

    ​​​​$ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"
    
  4. 解決方法是在另外開一個 Terminal , 並在 Kernal 模式下執行

  5. 利用 perf top -p $pid 測試 perf_top_example, 結果如下

練習使用perf stat

  • 測試用程式碼 stat_test.c

    ​​​​#include <stdio.h> ​​​​#include <stdint.h> ​​​​static char array[10000][10000]; ​​​​int main (void){ ​​​​ int i, j; ​​​​ for (i = 0; i < 10000; i++) ​​​​ for (j = 0; j < 10000; j++) ​​​​ array[j][i]++; ​​​​ return 0; ​​​​}
  • 使用 perf stat 分析 stat_test.c cache miss 情形

    ​​​​ g++ -c stat_test.c
    ​​​​ g++ stat_test.o -o stat
    ​​​​ perf stat ./stat
    

    得到結果如下

    ​​​​Performance counter stats for './stat' (5 runs):
    
    ​​​​          493,7852      cache-misses              #    2.241 % of all cache refs      ( +-  1.18% )
    ​​​​       2,2034,9691      cache-references                                              ( +-  0.15% )
    ​​​​      22,0592,2702      instructions              #    2.04  insns per cycle          ( +-  0.00% )
    ​​​​      10,8220,2494      cycles                                                        ( +-  0.19% )
    
    ​​​​       0.344483546 seconds time elapsed                                               ( +-  1.39% )
    
    

    考慮存取的局部性,將 i,j對調改成array[i][j]++

    ​​​​#include <stdio.h> ​​​​#include <stdint.h> ​​​​static char array[10000][10000]; ​​​​int main (void){ ​​​​ int i, j; ​​​​ for (i = 0; i < 10000; i++) ​​​​ for (j = 0; j < 10000; j++) ​​​​ array[i][j]++; ​​​​ return 0; ​​​​}

    cache-references 從 2,2034,9691 下降到 519,4173

    ​​​​      201,6420      cache-misses              #   38.821 % of all cache refs      ( +-  1.78% )
    ​​​​      519,4173      cache-references                                              ( +-  0.75% )
    ​​​​  22,0580,7841      instructions              #    2.74  insns per cycle          ( +-  0.00% )
    ​​​​   8,0382,8158      cycles                                                        ( +-  0.04% )
    
    ​​​​   0.257051135 seconds time elapsed                                               ( +-  2.12% )
    

Phonebook

測試環境

    Linux version 4.4.0-96-generic 
    Ubuntu 16.04.3 LTS
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              6144K
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                4
    On-line CPU(s) list:   0-3
    Thread(s) per core:    1
    Core(s) per socket:    4
    Socket(s):             1
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 94
    Model name:            Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz

測試原始程式碼

size of entry : 136 bytes
execution time of append() : 0.038564 sec
execution time of findName() : 0.005541 sec

 Performance counter stats for './phonebook_orig' (100 runs):

          463,5331      cache-misses              #   93.928 % of all cache refs      ( +-  0.86% )
          493,4994      cache-references                                              ( +-  0.83% )
       2,5927,9283      instructions              #    1.39  insns per cycle          ( +-  0.53% )
       1,8636,6232      cycles                                                        ( +-  0.66% )

       0.061023413 seconds time elapsed                                                 ( +-  1.04% )

cache-misses 為 93.928 %

  • phonebook_orig.c

    ​​​​ #include <stdio.h> ​​​​ #include <stdlib.h> ​​​​ #include <string.h> ​​​​ #include <ctype.h> ​​​​ #include "phonebook_orig.h" ​​​​ /* original version */ ​​​​ entry *findName(char lastName[], entry *pHead) ​​​​ { ​​​​ while (pHead != NULL) { ​​​​ if (strcasecmp(lastName, pHead->lastName) == 0) ​​​​ return pHead; ​​​​ pHead = pHead->pNext; ​​​​ } ​​​​ return NULL; ​​​​ } ​​​​ entry *append(char lastName[], entry *e) ​​​​ { ​​​​ /* allocate memory for the new entry and put lastName */ ​​​​ e->pNext = (entry *) malloc(sizeof(entry)); ​​​​ e = e->pNext; ​​​​ strcpy(e->lastName, lastName); ​​​​ e->pNext = NULL; ​​​​ return e; ​​​​ }

    findname 和 append 皆使用 Linklist 的結構
    Linklist 插入或刪除節點較為容易
    但在搜尋存取上必須花時間去取出鏈結指標, 以便讀取下一個節點
    若資料量大, 則須花較多時間搜尋

  • phonebook_orig.h

    ​​​​#ifndef _PHONEBOOK_H ​​​​#define _PHONEBOOK_H ​​​​#define MAX_LAST_NAME_SIZE 16 ​​​​/* original version */ ​​​​typedef struct __PHONE_BOOK_ENTRY { ​​​​ char lastName[MAX_LAST_NAME_SIZE]; ​​​​ char firstName[16]; ​​​​ char email[16]; ​​​​ char phone[10]; ​​​​ char cell[10]; ​​​​ char addr1[16]; ​​​​ char addr2[16]; ​​​​ char city[16]; ​​​​ char state[2]; ​​​​ char zip[5]; ​​​​ struct __PHONE_BOOK_ENTRY *pNext; ​​​​} entry; ​​​​entry *findName(char lastName[], entry *pHead); ​​​​entry *append(char lastName[], entry *e); ​​​​#endif

修改Struct內容

  • 從減少 cache-miss 角度出發

  • 先嘗試將struct中尚未使用到的內容註解掉

    ​​​​typedef struct __PHONE_BOOK_ENTRY { ​​​​ char lastName[MAX_LAST_NAME_SIZE]; ​​​​ /* ​​​​ char firstName[16]; ​​​​ char email[16]; ​​​​ char phone[10]; ​​​​ char cell[10]; ​​​​ char addr1[16]; ​​​​ char addr2[16]; ​​​​ char city[16]; ​​​​ char state[2]; ​​​​ char zip[5]; ​​​​ */ ​​​​ struct __PHONE_BOOK_ENTRY *pNext; ​​​​} entry; ​​​​entry *findName(char lastName[], entry *pHead); ​​​​entry *append(char lastName[], entry *e);

    可以發現 cache-misses 的情況下降到 68.699%,
    append 和 findname 的 performance 都有增加

    ​​​​Performance counter stats for './phonebook_opt' (100 runs):
    
    ​​​​      152,6120      cache-misses              #   68.699 % of all cache refs      ( +-  0.46% )
    ​​​​      222,1462      cache-references                                              ( +-  0.12% )
    ​​​​   2,4388,2112      instructions              #    2.03  insns per cycle          ( +-  0.02% )
    ​​​​   1,1994,1058      cycles                                                        ( +-  0.22% )
    
    ​​​​   0.039064779 seconds time elapsed                                               ( +-  0.33% )
    
    

  • 縮小 Struct 內容

    ​​​​ typedef struct __PHONE_BOOK_USER_INFO{ ​​​​ char firstName[16]; ​​​​ char email[16]; ​​​​ char phone[10]; ​​​​ char cell[10]; ​​​​ char addr1[16]; ​​​​ char addr2[16]; ​​​​ char city[16]; ​​​​ char state[2]; ​​​​ char zip[5]; ​​​​ } UserInfoEntry; ​​​​ typedef struct __PHONE_BOOK_ENTRY { ​​​​ char lastName[MAX_LAST_NAME_SIZE]; ​​​​ UserInfoEntry *UserInfo; ​​​​ struct __PHONE_BOOK_ENTRY *pNext; ​​​​ } entry;

    cache-misses 的情況下降到 70.090%,

    ​​​​Performance counter stats for './phonebook_opt' (100 runs):
    
    ​​​​      154,9857      cache-misses              #   70.090 % of all cache refs      ( +-  0.20% )
    ​​​​      221,1253      cache-references                                              ( +-  0.11% )
    ​​​​   2,4384,1941      instructions              #    2.02  insns per cycle          ( +-  0.02% )
    ​​​​   1,2076,2749      cycles                                                        ( +-  0.13% )
    
    ​​​​   0.039567825 seconds time elapsed                                               ( +-  0.65% )
    

參考資料

共筆 ryanwang522
共筆 ZixinYang
共筆 tina0405
perf 原理和實務
hash function 觀念和實務
Git 的基本使用