contributed by<zhanyangch
>
tina0405
Fix bug in opt anddelete executable file
,會讓人不清楚是哪部份的 bug紀錄一下自己在重新 fork 的步驟
首先直接在自己目前的專案下 pull 課程的專案
git pull https://github.com/sysprog21/phonebook
之後會自動進行 merge,如果出現 merge conflit 就必須要處理
git config --global merge.tool kidiff3
git merge
這裡使用 kidiff3 也可以使用其他相同功能的軟體像是 vimdiff,將 conflit 的部份進行修正。
修改完成後記得將程式重新測試一次,如果有問題趕快與原先的程式碼對照,確認之後就像一般 commit、push 的步驟即可。
之前作過的部份,hackmd
lscpu 的資訊
$ lscpu
Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
每核心執行緒數:2
每通訊端核心數:2
Socket(s): 1
NUMA 節點: 1
供應商識別號: GenuineIntel
CPU 家族: 6
型號: 42
Model name: Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz
製程: 7
CPU MHz: 855.421
CPU max MHz: 3500.0000
CPU min MHz: 800.0000
BogoMIPS: 5587.06
虛擬: VT-x
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 4096K
NUMA node0 CPU(s): 0-3
$ make
$ make run
size of entry : 136 bytes
execution time of append() : 0.105809 sec
execution time of findName() : 0.007168 sec
echo 3 | sudo tee /proc/sys/vm/drop_caches
$ make plot
$perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig
Performance counter stats for './phonebook_orig' (100 runs):
2,064,274 cache-misses # 91.100 % of all cache refs ( +- 0.12% )
2,265,937 cache-references ( +- 0.11% )
261,177,434 instructions # 1.28 insns per cycle ( +- 0.02% )
204,356,391 cycles ( +- 0.39% )
0.071805217 seconds time elapsed ( +- 2.00% )
可以看出 cache miss 高達 91.1%
$ perf record ./phonebook_orig
$ perf report
Samples: 484 of event 'cycles:pp', Event count (approx.): 212235201
Overhead Command Shared Object Symbol
13.49% phonebook_orig libc-2.23.so [.] __strcasecmp_l_avx
12.59% phonebook_orig phonebook_orig [.] main
10.77% phonebook_orig libc-2.23.so [.] _int_malloc
8.47% phonebook_orig phonebook_orig [.] findName
6.54% phonebook_orig libc-2.23.so [.] __memcpy_sse2
6.24% phonebook_orig libc-2.23.so [.] _IO_fgets
5.74% phonebook_orig libc-2.23.so [.] _IO_getline_info
5.20% phonebook_orig libc-2.23.so [.] __strcpy_sse2_unaligned
4.05% phonebook_orig [kernel.kallsyms] [k] clear_page
3.37% phonebook_orig phonebook_orig [.] append
2.72% phonebook_orig libc-2.23.so [.] memchr
2.47% phonebook_orig [kernel.kallsyms] [k] get_page_from_freelist
2.34% phonebook_orig libc-2.23.so [.] malloc
2.23% phonebook_orig [kernel.kallsyms] [k] native_irq_return_iret
1.75% phonebook_orig [kernel.kallsyms] [k] handle_mm_fault
0.93% phonebook_orig phonebook_orig [.] fgets@plt
0.91% phonebook_orig [kernel.kallsyms] [k] __mod_zone_page_state
0.86% phonebook_orig [kernel.kallsyms] [k] unmap_page_range
0.86% phonebook_orig [kernel.kallsyms] [k] __pagevec_lru_add_fn
0.84% phonebook_orig libc-2.23.so [.] __strcasecmp_avx
0.57% phonebook_orig [kernel.kallsyms] [k] free_hot_cold_page
Overhead 最高的前幾名為 findName 與 __strcasecmp_l_avx 字串比較的部份,
若能夠改善 findName 的效能,對程式整體的速度影響較大。
Cache Information
Socket Designation: L1-Cache
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 32 kB
Maximum Size: 32 kB
Supported SRAM Types:
Other
Installed SRAM Type: Other
Speed: Unknown
Error Correction Type: None
System Type: Unified
Associativity: 8-way Set-associative
typedef struct __DETAILS {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} details;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
details *data;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
Performance counter stats for './phonebook_opt' (100 runs):
382,089 cache-misses # 58.184 % of all cache refs ( +- 0.17% )
656,691 cache-references ( +- 0.36% )
244,028,058 instructions # 1.63 insns per cycle ( +- 0.02% )
149,303,793 cycles ( +- 0.45% )
0.046472370 seconds time elapsed ( +- 1.08% )
Samples: 347 of event 'cycles:pp', Event count (approx.): 156687981
Overhead Command Shared Object Symbol ▒
19.20% phonebook_opt phonebook_opt [.] main ◆
13.37% phonebook_opt libc-2.23.so [.] _int_malloc ▒
12.93% phonebook_opt libc-2.23.so [.] __strcasecmp_l_avx ▒
7.09% phonebook_opt libc-2.23.so [.] _IO_getline_info ▒
6.31% phonebook_opt libc-2.23.so [.] __memcpy_sse2 ▒
6.23% phonebook_opt libc-2.23.so [.] _IO_fgets ▒
5.57% phonebook_opt libc-2.23.so [.] __strcpy_sse2_unaligned ▒
4.48% phonebook_opt phonebook_opt [.] append ▒
4.30% phonebook_opt libc-2.23.so [.] malloc ▒
3.91% phonebook_opt phonebook_opt [.] findName ▒
3.82% phonebook_opt libc-2.23.so [.] memchr ▒
2.02% phonebook_opt [kernel.kallsyms] [k] clear_page ▒
1.31% phonebook_opt [kernel.kallsyms] [k] release_pages ▒
0.84% phonebook_opt [kernel.kallsyms] [k] native_irq_return_iret ▒
0.73% phonebook_opt phonebook_opt [.] strcasecmp@plt ▒
0.69% phonebook_opt [kernel.kallsyms] [k] unmap_page_range ▒
0.58% phonebook_opt [kernel.kallsyms] [k] get_page_from_freelist ▒
0.56% phonebook_opt phonebook_opt [.] strcpy@plt ▒
0.56% phonebook_opt [kernel.kallsyms] [k] handle_mm_fault ▒
0.55% phonebook_opt libc-2.23.so [.] _IO_getline ▒
0.53% phonebook_opt [kernel.kallsyms] [k] __rmqueue.isra.79
#define TABLESIZE 5393
entry **entryArray=NULL;
entry **entryArrayHead=NULL;
unsigned int BKDRHash(char *str)
{
unsigned int seed = 131; // 31 131 1313 13131 131313 etc..
unsigned int hash = 0;
while (*str){
hash = hash * seed + (*str++);
}
return (hash % TABLESIZE);
}
entry *findName(char lastName[], entry *pHead)
{
unsigned int hashvalue = BKDRHash(lastName);
pHead = *(entryArrayHead+hashvalue);
while (pHead != NULL) {
if (strcasecmp(lastName, pHead->lastName) == 0)
return pHead;
pHead = pHead->pNext;
}
return NULL;
}
entry *append(char lastName[], entry *e)
{
if(entryArray == NULL) {
entryArray = (entry **)malloc((TABLESIZE+1)*sizeof(entry *));
entryArrayHead = (entry **)malloc((TABLESIZE+1)*sizeof(entry *));
}
unsigned int hashvalue = BKDRHash(lastName);
e = *(entryArray+hashvalue);
if(*(entryArrayHead+hashvalue) == NULL) {
e = (entry *) malloc(sizeof(entry));
*(entryArrayHead+hashvalue) = e;
}
e->pNext = (entry *) malloc(sizeof(entry));
e = e->pNext;
strcpy(e->lastName, lastName);
e->pNext = NULL;
*(entryArray+hashvalue) = e;
return e;
}
修改 calculate.c、runtime.gp來紀錄資料與繪圖,可以發現由於 hash 版本的 append 時間太短,須另外找方式紀錄,而 append 的時間增加,試著找方法優化 append。
修改 main 和 calculate 即可 double 精度可以到小數點 15 位,diff_in_second 提供 nanosecond (10-9) 精度, 將 findName 的時間顯示 %lf 改為 %.9lf
cache miss 降低至 35%
Performance counter stats for './phonebook_opt_hash' (100 runs):
306,952 cache-misses # 35.750 % of all cache refs ( +- 0.08% )
858,599 cache-references ( +- 0.26% )
242,237,830 instructions # 1.49 insns per cycle ( +- 0.02% )
162,039,957 cycles ( +- 0.28% )
0.064788502 seconds time elapsed ( +- 3.89% )
利用 perf record 分析,發現 hash 的部份為函式熱點
15.79% phonebook_opt_h phonebook_opt_hash [.] BKDRHash ▒
14.84% phonebook_opt_h phonebook_opt_hash [.] main ▒
14.04% phonebook_opt_h phonebook_opt_hash [.] append ◆
9.80% phonebook_opt_h libc-2.23.so [.] _int_malloc ▒
7.94% phonebook_opt_h libc-2.23.so [.] __memcpy_sse2 ▒
7.08% phonebook_opt_h libc-2.23.so [.] _IO_fgets ▒
5.26% phonebook_opt_h libc-2.23.so [.] __strcpy_sse2_unaligned ▒
4.26% phonebook_opt_h libc-2.23.so [.] _IO_getline_info ▒
2.92% phonebook_opt_h libc-2.23.so [.] memchr ▒
2.33% phonebook_opt_h libc-2.23.so [.] malloc
參考ierosodin的共筆
valgrind --leak-check=full ./phonebook_orig
==5682== HEAP SUMMARY:
==5682== in use at exit: 47,586,264 bytes in 349,899 blocks
==5682== total heap usage: 349,906 allocs, 7 frees, 47,596,856 bytes allocated
可以看到 allocs 與 frees 不相等
修改 main.c 釋放記憶體的部份
entry *tmp;
while ((tmp = pHead) != NULL) {
pHead = pHead->pNext;
free(tmp);
}
#ifdef HASH
hash_free();
#endif
==5792== HEAP SUMMARY:
==5792== in use at exit: 0 bytes in 0 blocks
==5792== total heap usage: 349,906 allocs, 349,906 frees, 47,596,856 bytes allocated
處理 hash 版本的Memory leak,在 phonebook_opt_hash.c加入
void hash_free(){
int i;
entry *tmp;
entry *pHead;
for(i=0;i<TABLESIZE;i++){
pHead=*(entryArrayHead+i);
while ((tmp = pHead) != NULL) {
pHead = pHead->pNext;
free(tmp);
}
}
if(entryArrayHead) free(entryArrayHead);
if(entryArray) free(entryArray);
}
==6709== HEAP SUMMARY:
==6709== in use at exit: 0 bytes in 0 blocks
==6709== total heap usage: 355,301 allocs, 355,301 frees, 11,466,032 bytes all
gnuplot 語法解說和示範
觀察 Linux 的虛擬記憶體
淺談memory cache
關於 CPU Cache:程序猿需要知道的那些事
hash function 觀念和實務
Hashing原理介紹
Is malloc thread-safe?
在 Linux 中以特定的 CPU 核心執行程式