sysprog2017
phonebook
contributed by <stanleytazi
>
etc276
這次 commit message 的 title 是 "Replace single linked-list by hash table" ,但內文並沒有具體補充說明 hash 後會比 single linked-list 優化的原因(看完內容沒有比看完標題時更懂這次 commit 的理由,以及更改後的優缺點)
在 Hackmd 上放 code 時,盡量都使用 clike =
,以便使用者閱讀時知道行數(例如 memory pool 那邊),也較好提出建議,開發環境的內容也需要排版。
後面有張圖顯示優化後的 append() 時間並沒有下降,我猜測原因是 hash table size 的選擇,可以參考這篇和共筆,選擇特定數量級的質數以優化
指令 $ lscpu
Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
每核心執行緒數:2
每通訊端核心數:2
Socket(s): 1
NUMA 節點: 1
供應商識別號: GenuineIntel
CPU 家族: 6
型號: 69
Model name: Intel(R) Core(TM) i5-4260U CPU @ 1.40GHz
製程: 1
CPU MHz: 1400.312
CPU max MHz: 2700.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.20
虛擬: VT-x
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 3072K
NUMA node0 CPU(s): 0-3
中英文字間請以空白隔開!
課程助教
指令 $ cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
得到 64(bytes? 猜測,待文件或是實驗查證)
在補充資料中皆有提到目前主流是64B
$ make plot
可以看到 cache-misses 的比例很高,首要目標降低 cache-misses rate
Performance counter stats for './phonebook_orig' (100 runs):
1,024,254 cache-misses # 90.150 % of all cache refs
1,140,377 cache-references
263,685,295 instructions # 1.48 insns per cycle
178,278,149 cycles
0.067626164 seconds time elapsed ( +- 0.13% )
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
phonebook_opt.h
typedef struct {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} entry_detail;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct __PHONE_BOOK_ENTRY *pNext;
entry_detail *info;
} entry;
與 orig 版本 append()&findName() 執行時間比較
phonebook_opt 的 cache使用狀況
Performance counter stats for './phonebook_opt' (100 runs):
114,443 cache-misses # 29.189 % of all cache refs
397,487 cache-references
245,583,148 instructions # 2.00 insns per cycle
122,015,515 cycles
0.047291252 seconds time elapsed ( +- 0.26% )
unsigned int BKDRHash(char *str)
{
unsigned int seed = 131;
unsigned int hash = 0;
while (*str) {
hash = hash * seed + (*str++);
}
return (hash & 0x7FFFFFFF);
}
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
unsigned int hashVal;
entry_detail *info;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
#define MAX_HASH_TABLE_SIZE 1024
entry *findName(char lastName[], entry *pHead)
{
unsigned int hVal = BKDRHash(lastName);
unsigned char hashTableIndex = hVal % MAX_HASH_TABLE_SIZE;
pHead = hashTable[hashTableIndex];
while (pHead != NULL) {
if (pHead->hashVal == hVal &&
strcasecmp(lastName, pHead->lastName) == 0)
return pHead;
pHead = pHead->pNext;
}
return NULL;
}
entry *append(char lastName[], entry *e)
{
unsigned char hashTableIndex ;
/* allocate memory for the new entry and put lastName */
entry *hashEntry = (entry *) malloc(sizeof(entry));
strcpy(hashEntry->lastName, lastName);
hashEntry->hashVal = BKDRHash(lastName);
hashTableIndex = hashEntry->hashVal % MAX_HASH_TABLE_SIZE;
if(hashTable[hashTableIndex]) {
hashEntry->pNext = hashTable[hashTableIndex];
}
hashTable[hashTableIndex] = hashEntry;
return hashEntry;
}
# Samples: 528 of event 'cycles'
# Event count (approx.): 131645130
#
# Overhead Command Shared Object Symbol
# ........ ............. ................. ......................................
#
24.16% phonebook_opt phonebook_opt [.] BKDRHash
18.10% phonebook_opt phonebook_opt [.] main
10.62% phonebook_opt libc-2.21.so [.] _int_malloc
9.17% phonebook_opt phonebook_opt [.] append
8.80% phonebook_opt libc-2.21.so [.] _IO_fgets
5.28% phonebook_opt libc-2.21.so [.] _IO_getline_info
4.85% phonebook_opt libc-2.21.so [.] __strcpy_sse2_unaligned
3.48% phonebook_opt libc-2.21.so [.] malloc
2.31% phonebook_opt libc-2.21.so [.] __memcpy_sse2
1.52% phonebook_opt [kernel.kallsyms] [k] page_fault
1.29% phonebook_opt [kernel.kallsyms] [k] clear_page_c_e
1.16% phonebook_opt libc-2.21.so [.] memchr
1.09% phonebook_opt [kernel.kallsyms] [k] _raw_spin_lock
entry
裡又多了一個hashVal
element,使得每次每一條cache line所能儲存的entry個數減少Performance counter stats for './phonebook_opt' (5 runs):
132,277,864 cycles
236,245,171 instructions # 1.80 insns per cycle
143,696 cache-references
68,468 cache-misses # 44.361 % of all cache refs
5,047,187 bus-cycles
0.057255147 seconds time elapsed ( +- 9.29% )
malloc()
產生的負擔,實驗進行在下個項目initFreeEntryPool()
和 allocEmptyEntry()
// create a memory pool
void initFreeEntryPool(unsigned int freeBufferNum)
{
int i,j;
entry *freeEntry = NULL;
for(i=0; i < MAX_HASH_TABLE_SIZE; i++) {
freeEntry = (entry *) malloc(sizeof(entry));
freePool[i] = freeEntry;
for(j=1; j < freeBufferNum; j++) {
if(freeEntry) {
freeEntry->pNext = (entry *) malloc(sizeof(entry));
freeEntry = freeEntry->pNext;
freeEntry->pNext = NULL;
}
}
}
}
//allocate a free entry
entry *allocEmptyEntry(unsigned int tableIndex)
{
entry *output = NULL;
if(tableIndex < MAX_HASH_TABLE_SIZE) {
output = freePool[tableIndex];
if(output)
freePool[tableIndex] = output->pNext;
}
return output;
}
Performance counter stats for './phonebook_opt' (100 runs):
305,065 cache-misses # 67.835 % of all cache refs
431,262 cache-references
292,667,383 instructions # 1.46 insns per cycle
199,296,907 cycles
0.078124204 seconds time elapsed ( +- 0.24% )
執行時間比較
perf top
41.23% phonebook_opt [.] allocEmptyEntry
20.12% phonebook_opt [.] main
11.57% phonebook_opt [.] BKDRHash
7.13% phonebook_opt [.] append
4.33% libc-2.21.so [.] _IO_fgets
3.25% libc-2.21.so [.] _IO_getline_info
2.21% [kernel] [k] page_fault
2.13% libc-2.21.so [.] __memcpy_sse2
1.45% libc-2.21.so [.] malloc
perf record -e cache-refereces,cache-misses -F 10000 ./phonebook_opt
# Samples: 628 of event 'cache-references'
# Event count (approx.): 496297
#
# Overhead Command Shared Object Symbol
# ........ ............. ................. ..................................
#
44.82% phonebook_opt libc-2.21.so [.] _IO_fgets
9.20% phonebook_opt phonebook_opt [.] allocEmptyEntry
6.15% phonebook_opt [kernel.kallsyms] [k] copy_user_enhanced_fast_string
6.15% phonebook_opt phonebook_opt [.] append
4.81% phonebook_opt phonebook_opt [.] main
1.87% phonebook_opt [kernel.kallsyms] [k] clear_page_c_e
1.72% phonebook_opt [kernel.kallsyms] [k] free_pages_and_swap_cache
1.37% phonebook_opt [kernel.kallsyms] [k] _raw_spin_lock_irqsave
1.35% phonebook_opt [kernel.kallsyms] [k] __rmqueue
1.26% phonebook_opt phonebook_opt [.] BKDRHash
1.25% phonebook_opt libc-2.21.so [.] _IO_getline_info
1.03% phonebook_opt [kernel.kallsyms] [k] page_remove_rmap
#
# Total Lost Samples: 0
#
# Samples: 742 of event 'cache-misses'
# Event count (approx.): 331210
#
# Overhead Command Shared Object Symbol
# ........ ............. ................. ...................................
#
61.73% phonebook_opt libc-2.21.so [.] _IO_fgets
7.07% phonebook_opt phonebook_opt [.] append
5.49% phonebook_opt [kernel.kallsyms] [k] copy_user_enhanced_fast_string
2.40% phonebook_opt [kernel.kallsyms] [k] clear_page_c_e
2.36% phonebook_opt phonebook_opt [.] allocEmptyEntry
2.15% phonebook_opt [kernel.kallsyms] [k] __rmqueue
1.54% phonebook_opt [kernel.kallsyms] [k] unmap_page_range
1.52% phonebook_opt [kernel.kallsyms] [k] ext4_discard_preallocations
1.44% phonebook_opt [kernel.kallsyms] [k] free_pcppages_bulk
1.35% phonebook_opt [kernel.kallsyms] [k] __mod_zone_page_state
0.97% phonebook_opt phonebook_opt [.] BKDRHash
allocEmptyEntry()
反而耗費cpu很多執行時間不懂就不懂,不要假文青用「感覺」,理工科系的學生說話要精確。
快去讀書,並且把你的認知貼出來 –jserv
是! 會誠實面對自己!