其實光是說明影片中就有很多線索可以把這個程式改善,例如利用 Hash 或是利用順序關係是可以省下很多搜尋時間。記憶體除了大小外位置也是很重要的,可以了解一下順序。
$ lscpu
Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
每核心執行緒數:2
每通訊端核心數:4
Socket(s): 1
NUMA 節點: 1
供應商識別號: GenuineIntel
CPU 家族: 6
型號: 94
Model name: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
製程: 3
CPU MHz: 808.429
CPU max MHz: 4000.0000
CPU min MHz: 800.0000
BogoMIPS: 6816.61
虛擬: VT-x
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 8192K
NUMA node0 CPU(s): 0-7
$ git clone git@github.com:GoblinBear/repository.git
$ git add .
$ git commit -m "my commit"
$ git remote add origin http://github.com/GoblinBear/repository.git
$ git push origin master
$ git reset
:重設工作目錄的索引狀態$ git log
:查詢版本的歷史紀錄$ git status
:查詢當前工作目錄的詳細狀態reset
set ylabel 'time(sec)'
set style fill solid
set title 'perfomance comparison'
set term png enhanced font 'Verdana,10'
set output 'runtime.png'
plot [:][:0.150]'output.txt' using 2:xtic(1) with histogram title 'original', \
'' using ($0-0.06):($2+0.001):2 with labels title ' ', \
'' using 3:xtic(1) with histogram title 'optimized' , \
'' using ($0+0.3):($3+0.0015):3 with labels title ' '
set ylabel 'time(sec)'
設定 y 座標軸名稱set title 'perfomance comparison'
設定圖上的標題[:][:0.150]
X 座標軸,Y 座標軸從 0 到 0.150xtic(1)
X 軸的 label 是取自第1個 columnusing 2:xtic(1) with histogram title 'original'
取第2個 column 的數據產生柱狀圖($0-0.06)
label 的 X 軸座標($2+0.001)
label 的 Y 軸座標struct __PHONE_BOOK_ENTRY
的成員,搬動到新的結構中$ make
$ make run
size of entry : 136 bytes
execution time of append() : 0.034903 sec
execution time of findName() : 0.004713 sec
$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ make plot
減少entry
空間
只會用到lastName
,所以將其他用不到的結構成員隱藏在detail
中,因此entry
的空間由 136 byte 降到 32 byte
typedef struct __PHONE_BOOK_DETAIL {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} detail;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
detail *d;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
Cache miss 比較
Performance counter stats for './phonebook_orig' (100 runs):
4,487,968 cache-misses # 90.803 % of all cache refs ( +- 0.13% )
4,942,555 cache-references ( +- 0.08% )
261,554,655 instructions # 1.44 insns per cycle ( +- 0.02% )
181,260,699 cycles ( +- 0.18% )
0.049006945 seconds time elapsed ( +- 0.22% )
Performance counter stats for './phonebook_opt' (100 runs):
1,376,895 cache-misses # 61.605 % of all cache refs ( +- 0.53% )
2,235,047 cache-references ( +- 0.16% )
244,216,012 instructions # 2.07 insns per cycle ( +- 0.02% )
117,889,769 cycles ( +- 0.47% )
0.032283080 seconds time elapsed (+- 0.52% )
執行時間比較
效能分析
L1d cache = 32K,entry
資料變少,cache 可以存更多筆entry
,因此 cache miss 降低
entry
資料為 32 byte,已經小於 64 bytefindName()
執行時間小於10-6秒,所以我把fprintf()
的參數%lf
改成%.9lf
zzzzzzzzzz
,使用long double
精度仍會不夠[ 0] 0.994815802925586024 1.000000000000000000
[ 1] 0.999973124100693638 1.000000000000000000
[ 2] 0.999999860670041444 1.000000000000000000
[ 3] 0.999999999277686036 1.000000000000000000
[ 4] 0.999999999996255382 1.000000000000000000
[ 5] 0.999999999999980587 1.000000000000000000
[ 6] 0.999999999999999899 1.000000000000000000
[ 7] 0.999999999999999999 1.000000000000000000
[ 8] 1.000000000000000000 1.000000000000000000
[ 9] 1.000000000000000000 1.000000000000000000
tag = 1.000000000000000000