contributed by <kevin550029
>
按照 perf 原理和實務 中的步驟安裝perf
嘗試執行 perf top
會出現錯誤訊息, 表示只能在root權限下使用
嘗試輸入下列指令, 但仍無法更改 restrict
$ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"
解決方法是在另外開一個 Terminal , 並在 Kernal 模式下執行
利用 perf top -p $pid
測試 perf_top_example, 結果如下
測試用程式碼 stat_test.c
#include <stdio.h>
#include <stdint.h>
static char array[10000][10000];
int main (void){
int i, j;
for (i = 0; i < 10000; i++)
for (j = 0; j < 10000; j++)
array[j][i]++;
return 0;
}
使用 perf stat 分析 stat_test.c
cache miss 情形
g++ -c stat_test.c
g++ stat_test.o -o stat
perf stat ./stat
得到結果如下
Performance counter stats for './stat' (5 runs):
493,7852 cache-misses # 2.241 % of all cache refs ( +- 1.18% )
2,2034,9691 cache-references ( +- 0.15% )
22,0592,2702 instructions # 2.04 insns per cycle ( +- 0.00% )
10,8220,2494 cycles ( +- 0.19% )
0.344483546 seconds time elapsed ( +- 1.39% )
考慮存取的局部性,將 i,j對調改成array[i][j]++
#include <stdio.h>
#include <stdint.h>
static char array[10000][10000];
int main (void){
int i, j;
for (i = 0; i < 10000; i++)
for (j = 0; j < 10000; j++)
array[i][j]++;
return 0;
}
cache-references 從 2,2034,9691 下降到 519,4173
201,6420 cache-misses # 38.821 % of all cache refs ( +- 1.78% )
519,4173 cache-references ( +- 0.75% )
22,0580,7841 instructions # 2.74 insns per cycle ( +- 0.00% )
8,0382,8158 cycles ( +- 0.04% )
0.257051135 seconds time elapsed ( +- 2.12% )
Linux version 4.4.0-96-generic
Ubuntu 16.04.3 LTS
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Model name: Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz
size of entry : 136 bytes
execution time of append() : 0.038564 sec
execution time of findName() : 0.005541 sec
Performance counter stats for './phonebook_orig' (100 runs):
463,5331 cache-misses # 93.928 % of all cache refs ( +- 0.86% )
493,4994 cache-references ( +- 0.83% )
2,5927,9283 instructions # 1.39 insns per cycle ( +- 0.53% )
1,8636,6232 cycles ( +- 0.66% )
0.061023413 seconds time elapsed ( +- 1.04% )
cache-misses 為 93.928 %
phonebook_orig.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "phonebook_orig.h"
/* original version */
entry *findName(char lastName[], entry *pHead)
{
while (pHead != NULL) {
if (strcasecmp(lastName, pHead->lastName) == 0)
return pHead;
pHead = pHead->pNext;
}
return NULL;
}
entry *append(char lastName[], entry *e)
{
/* allocate memory for the new entry and put lastName */
e->pNext = (entry *) malloc(sizeof(entry));
e = e->pNext;
strcpy(e->lastName, lastName);
e->pNext = NULL;
return e;
}
findname 和 append 皆使用 Linklist 的結構
Linklist 插入或刪除節點較為容易
但在搜尋存取上必須花時間去取出鏈結指標, 以便讀取下一個節點
若資料量大, 則須花較多時間搜尋
phonebook_orig.h
#ifndef _PHONEBOOK_H
#define _PHONEBOOK_H
#define MAX_LAST_NAME_SIZE 16
/* original version */
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
entry *findName(char lastName[], entry *pHead);
entry *append(char lastName[], entry *e);
#endif
從減少 cache-miss 角度出發
先嘗試將struct中尚未使用到的內容註解掉
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
/*
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
*/
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
entry *findName(char lastName[], entry *pHead);
entry *append(char lastName[], entry *e);
可以發現 cache-misses 的情況下降到 68.699%,
append 和 findname 的 performance 都有增加
Performance counter stats for './phonebook_opt' (100 runs):
152,6120 cache-misses # 68.699 % of all cache refs ( +- 0.46% )
222,1462 cache-references ( +- 0.12% )
2,4388,2112 instructions # 2.03 insns per cycle ( +- 0.02% )
1,1994,1058 cycles ( +- 0.22% )
0.039064779 seconds time elapsed ( +- 0.33% )
縮小 Struct 內容
typedef struct __PHONE_BOOK_USER_INFO{
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} UserInfoEntry;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
UserInfoEntry *UserInfo;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
cache-misses 的情況下降到 70.090%,
Performance counter stats for './phonebook_opt' (100 runs):
154,9857 cache-misses # 70.090 % of all cache refs ( +- 0.20% )
221,1253 cache-references ( +- 0.11% )
2,4384,1941 instructions # 2.02 insns per cycle ( +- 0.02% )
1,2076,2749 cycles ( +- 0.13% )
0.039567825 seconds time elapsed ( +- 0.65% )
共筆 ryanwang522
共筆 ZixinYang
共筆 tina0405
perf 原理和實務
hash function 觀念和實務
Git 的基本使用