contributed by <zmke
>
ryanwang522
findName()
的效率。OS: Ubuntu 16.04 LTS
Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
Model name: Intel® Core™ i5-4200H CPU @ 2.80GHz
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 3072K
這句話不精確,請參照 eecs-370 的說明 Cache,搭配短片 What is cache memory,再來回顧你原本的論述 "jserv"
static random access memory (SRAM) 速度快、成本高
電腦的 main memory 主要是 DRAM ,跟不上 CPU 存取的速度, cache 採用的是 SRAM
CPU 在 cache 找不到需要的資料時,會到 main memory 去找,但 CPU 和 main memory 的速度差了好幾個數量級
Von Neumann bottleneck : main memory 的速度跟不上 CPU 的速度, 造成 CPU performance 的限制
$ dmidecode --type 17 | more
Memory Device
Array Handle: 0x0042
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 4096 MB
Form Factor: SODIMM
Set: None
Locator: ChannelA-DIMM0
Bank Locator: BANK 0
Type: DDR3
Type Detail: Synchronous
Speed: 1600 MHz
Manufacturer: Transcend
Serial Number: 0009FB43
Asset Tag: 9876543210
Part Number: TS512MSK64W6H
Rank: 1
Configured Clock Speed: 1600 MHz
以我的電腦為例, CPU 時脈是 2.8GHz 但 main memory 時脈只有 1600 MHz
function of the cache
The cache will hold data that we think is most likely to be referenced.
Minimize the average memory access latency
Maximize the number of references that are serviced by the cache
N-Way Set Associative : 避免 Fully Associative 和 Direct Mapped 的缺陷,將 cache 分成 N 個 set
cache miss: 在 cache 找不到需要的資料,需要到 main memory 尋找
檢查電腦的 cache : $sudo dmidecode -t cache | more
Cache Information
Socket Designation: CPU Internal L1
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 128 kB
Maximum Size: 128 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Other
Associativity: 8-way Set-associative
中英文字間請已空白隔開
課程助教
$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ ./phonebook_orig
size of entry : 136 bytes
execution time of append() : 0.066602 sec
execution time of findName() : 0.005839 sec
$ make cache-test
Performance counter stats for './phonebook_orig' (100 runs):
1,025,260 cache-misses # 85.415 % of all cache refs ( +- 0.35% )
1,200,333 cache-references ( +- 0.29% )
262,101,582 instructions # 1.40 insn per cycle ( +- 0.02% )
187,299,573 cycles ( +- 0.14% )
0.057128980 seconds time elapsed ( +- 1.18% )
$ ./phonebook_orig & sudo perf top -p $!
30.72% libc-2.23.so [.] __strcasecmp_l_avx
19.16% phonebook_orig [.] findName
6.63% [kernel] [k] clear_page_c_e
6.52% phonebook_orig [.] main
5.67% libc-2.23.so [.] _IO_fgets
5.30% libc-2.23.so [.] _IO_getline_info
3.71% libc-2.23.so [.] _int_malloc
3.32% phonebook_orig [.] append
2.54% [kernel] [.] native_irq_return_iret
2.46% [kernel] [k] unmap_page_range
2.46% [kernel] [k] free_pcppages_bulk
1.66% [kernel] [k] release_pages
1.65% phonebook_orig [.] strcasecmp@plt
1.35% libc-2.23.so [.] __memcpy_sse2
0.93% libc-2.23.so [.] memchr
0.91% [kernel] [k] do_page_fault
0.86% [kernel] [k] __do_page_fault
0.83% [unknown] [k] 0x00007f29e9ab0e02
0.83% [unknown] [k] 0x00007f29e9ab23c8
0.83% [kernel] [k] page_remove_rmap
0.82% [kernel] [k] __find_get_block
0.82% [kernel] [k] free_hot_cold_page
0.00% [kernel] [k] native_write_msr
typedef struct __ENTRY_DETAIL {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} entryDetail;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
entryDetail *detail;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
$ ./phonebook_opt
不是「沒有用到」,請思考用語的精確性 "jserv"
size of entry : 32 bytes
execution time of append() : 0.066095 sec
execution time of findName() : 0.002160 sec
$ make plot
$ make cache-test
Performance counter stats for './phonebook_opt' (100 runs):
145,400 cache-misses # 46.061 % of all cache refs ( +- 0.34% )
315,669 cache-references ( +- 0.56% )
244,493,450 instructions # 1.90 insn per cycle ( +- 0.02% )
128,986,123 cycles ( +- 0.31% )
0.039869771 seconds time elapsed ( +- 0.62% )
bst *list_to_BST(entry *pHead)
{
int listLen = 0;
entry *cur = pHead;
while(cur) {
listLen++;
cur = cur->pNext;
}
return build(&pHead, listLen);
}
bst *build(entry **pHead, int listLen)
{
if(listLen <= 0)
return NULL;
bst *left = build(pHead, listLen/2);
bst *root = (bst *) malloc(sizeof(bst));
strcpy(root->lastName, (*pHead)->lastName);
root->left = left;
*pHead = (*pHead)->pNext;
root->right = build(pHead, listLen-listLen/2-1);
return root;
}
$ ./phonebook_bst
size of entry : 136 bytes
execution time of append() : 0.066551 sec
execution time of findName() : 0.000000 sec
$ make plot
發現 findName() 的時間明顯降低,但 append() 較前兩個情形稍微上升
$ make cache-test
Performance counter stats for './phonebook_bst' (100 runs):
770,479 cache-misses # 80.189 % of all cache refs ( +- 0.34% )
960,834 cache-references ( +- 1.09% )
359,999,587 instructions # 1.21 insn per cycle ( +- 0.01% )
297,319,379 cycles ( +- 1.15% )
0.092091102 seconds time elapsed ( +- 1.28% )
與原始版本相比, cache miss rate 有些微下降(85.41% -> 80.19%)
$ ./phonebook_bst
size of entry : 32 bytes
execution time of append() : 0.065951 sec
execution time of findName() : 0.000000 sec
$ make plot
append() 的時間較前兩個情況下降了
$ make cache-test
Performance counter stats for './phonebook_bst' (100 runs):
302,300 cache-misses # 65.713 % of all cache refs ( +- 0.19% )
460,034 cache-references ( +- 0.57% )
342,481,904 instructions # 1.59 insn per cycle ( +- 0.01% )
214,799,756 cycles ( +- 0.32% )
0.065648659 seconds time elapsed ( +- 0.39% )
同時使用 bst 和較小的 entry , cache miss rate 較只用 bst 明顯下降
valgrind :可以用來檢查程式記憶體
$ valgrind --leak-check=yes ./phonebook_orig
修正 main.c 最後的 release memory 的部份
while(pHead) {
entry *tmp = pHead;
pHead = pHead->pNext;
free(tmp);
}
valgrind 結果
==9712== All heap blocks were freed -- no leaks are possible
void releaseTree(bst *root)
{
if(root == NULL) return;
releaseTree(root->left);
releaseTree(root->right);
free(root);
}
valfrind 結果
==9716== All heap blocks were freed -- no leaks are possible
Cache
von Neumann bottleneck
What is cache memory – Gary explains
Binary Tree
C Binary Tree with an Example C Code
ktvexe 筆記
twzjwang 筆記
檢查程式記憶體的小工具-valgrind