contributed by <yusihlin
>
zhanyangch
$ lscpu
Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
每核心執行緒數:2
每通訊端核心數:2
Socket(s): 1
NUMA 節點: 1
供應商識別號: GenuineIntel
CPU 家族: 6
型號: 78
Model name: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
製程: 3
CPU MHz: 799.072
CPU max MHz: 2800.0000
CPU min MHz: 400.0000
BogoMIPS: 4800.00
虛擬: VT-x
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 3072K
NUMA node0 CPU(s): 0-3
$ sudo apt-get install linux-tools-common
$ uname -r
4.10.0-35-generic
$ sudo apt-get install linux-tools-4.10.0-35-generic linux-cloud-tools-4.10.0-35-generic
$ cat /proc/sys/kernel/perf_event_paranoid
3
$ sudo sh -c "echo -1 > /proc/sys/kernel/perf_event_paranoid"
$ sudo sh -c "echo 0 > /proc/sys/kernel/kptr_restrict"
$ git clone https://github.com/yusihlin/phonebook
$ cd phonebook
$ make
$ make run
size of entry : 136 bytes
execution time of append() : 0.074943 sec
execution time of findName() : 0.005417 sec
$ sudo apt-get install gnuplot //安裝軟體
$ make plot
$ eog runtime.png
$ perf stat --repeat 100 \
-e cache-misses,cache-references,instructions,cycles \
./phonebook_orig
Performance counter stats for './phonebook_orig' (100 runs):
4,752,110 cache-misses # 93.834 % of all cache refs ( +- 0.06% )
5,064,396 cache-references ( +- 0.16% )
261,801,297 instructions # 1.46 insn per cycle ( +- 0.02% )
179,657,037 cycles ( +- 0.23% )
0.068653651 seconds time elapsed ( +- 1.38% )
可以看到 cache miss 達到 93.8%
$ perf record ./phonebook_orig
$ perf report
Samples: 375 of event 'cycles', Event count (approx.): 176128125
Overhead Command Shared Object Symbol
14.18% phonebook_orig phonebook_orig [.] main
13.23% phonebook_orig phonebook_orig [.] findName
12.77% phonebook_orig libc-2.23.so [.] __strcasecmp_l_avx
8.35% phonebook_orig libc-2.23.so [.] _IO_fgets
7.26% phonebook_orig libc-2.23.so [.] _int_malloc
5.53% phonebook_orig libc-2.23.so [.] __memcpy_sse2
3.78% phonebook_orig libc-2.23.so [.] _IO_getline_info
3.73% phonebook_orig [kernel.kallsyms] [k] page_fault
3.20% phonebook_orig libc-2.23.so [.] __strcpy_sse2_unaligned
3.00% phonebook_orig libc-2.23.so [.] malloc
2.92% phonebook_orig [kernel.kallsyms] [k] clear_page_c_e
2.32% phonebook_orig [kernel.kallsyms] [k] free_pcppages_bulk
2.19% phonebook_orig [kernel.kallsyms] [k] _raw_spin_lock
1.71% phonebook_orig phonebook_orig [.] append
1.52% phonebook_orig [kernel.kallsyms] [k] __pagevec_lru_add_fn
1.44% phonebook_orig libc-2.23.so [.] memchr
1.31% phonebook_orig [kernel.kallsyms] [k] __rmqueue
0.94% phonebook_orig [kernel.kallsyms] [k] handle_mm_fault
0.77% phonebook_orig libc-2.23.so [.] __strcasecmp_avx
0.76% phonebook_orig [kernel.kallsyms] [k] __mod_node_page_state
0.76% phonebook_orig [kernel.kallsyms] [k] release_pages
可以看出在 findName 和 strcasecmp_l_avx 的 overhead 較高,可以修改函式的內容以提高效能
typedef struct __PHONE_BOOK_DETAIL {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} detail;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
detail *data;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
size of entry : 32 bytes
execution time of append() : 0.070914 sec
execution time of findName() : 0.002730 sec
Performance counter stats for './phonebook_opt' (100 runs):
1,733,721 cache-misses # 70.045 % of all cache refs ( +- 0.19% )
2,475,146 cache-references ( +- 0.61% )
244,622,619 instructions # 1.86 insn per cycle ( +- 0.02% )
131,384,158 cycles ( +- 0.67% )
0.050672494 seconds time elapsed ( +- 1.72% )
unsigned int BKDRHash(char *str)
{
unsigned int seed = 131; // 31 131 1313 13131 131313 etc..
unsigned int hash = 0;
while (*str)
{
hash = hash * seed + (*str++);
}
return (hash & 0x7FFFFFFF);
}
Performance counter stats for './phonebook_hash' (100 runs):
1,071,269 cache-misses # 41.564 % of all cache refs ( +- 0.70% )
2,577,372 cache-references ( +- 0.53% )
244,040,130 instructions # 1.51 insn per cycle ( +- 0.02% )
161,480,391 cycles ( +- 0.30% )
0.061633582 seconds time elapsed ( +- 1.78% )