2018q1 Homework1 (phonebook) ========================== contributed by < chasingfar > :::warning 請務必依循作業要求,把上面那行寫好!從細節就做好。 :notes: jserv ::: >改這樣嗎? >[name=chasingfar] ### Reviewed by `0922james0922` * 在紅黑樹的部份,可以再加強一些改善動機或是想法的細節,而不是直接跑出結論跟實驗結果,中間過程可以再加詳述。 * 兩種方法的append的改善都不佳,可以試想想有沒有改善append的方法而不是只考量findname好就好 * github上的opt的程式是否沒上傳成功,沒看到改寫過後的版本 [GitHub](https://github.com/chasingfar/phonebook) >每一個部份都個別放在branch裡 >[cache-test](https://github.com/chasingfar/phonebook/tree/cache-test) >[hash-test](https://github.com/chasingfar/phonebook/tree/hash-test) >[red-black-tree-test](https://github.com/chasingfar/phonebook/tree/red-black-tree-test) >[name=chasingfar] ### Reviewed by `YuanHaoHo` * 建議可用探討在使用各種方法後的成效,而非將成果丟出後不進行討論。 * 中英文間記得加空白喔~ ----- 系統環境 ---- ``` $ uname -a Linux chasingjar-V5-591G 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux $ lscpu Architecture: x86_64 CPU 作業模式: 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 每核心執行緒數:2 每通訊端核心數:4 Socket(s): 1 NUMA 節點: 1 供應商識別號: GenuineIntel CPU 家族: 6 型號: 94 Model name: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz 製程: 3 CPU MHz: 800.007 CPU max MHz: 3500.0000 CPU min MHz: 800.0000 BogoMIPS: 5183.88 虛擬: VT-x L1d 快取: 32K L1i 快取: 32K L2 快取: 256K L3 快取: 6144K NUMA node0 CPU(s): 0-7 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt retpoline kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp $ free total used free shared buff/cache available Mem: 8029896 2599920 3639168 386196 1790808 4751816 置換: 3999740 0 3999740 ``` 增加entry size利用率 --- - 想法: 鍵值分別存,以避免讀入無用資料進cache 增加一次宣告的記憶體大小,一個entry放盡可能多筆資料 - 優點: findName()時的比對近乎線性,應能減少cache miss - 缺點: 鍵值分別需要兩個strcpy,增加append()時間 一次宣告的記憶體太大時會浪費空間 - 實測 嘗試改變一個entry裡的資料數(opt後的數字),但似乎影響不大 ![](https://i.imgur.com/07aswXv.png) ``` Performance counter stats for './phonebook_orig' (100 runs): 4,456,791 cache-misses # 91.648 % of all cache refs ( +- 0.19% ) 4,862,924 cache-references ( +- 0.13% ) 261,860,427 instructions # 1.44 insns per cycle ( +- 0.02% ) 182,196,930 cycles ( +- 0.11% ) 0.056528421 seconds time elapsed ( +- 0.70% ) Performance counter stats for './phonebook_opt_256' (100 runs): 1,814,972 cache-misses # 73.200 % of all cache refs ( +- 0.16% ) 2,479,458 cache-references ( +- 0.16% ) 295,179,271 instructions # 1.78 insns per cycle ( +- 0.03% ) 165,662,413 cycles ( +- 0.08% ) 0.050684667 seconds time elapsed ( +- 0.17% ) Performance counter stats for './phonebook_opt_512' (100 runs): 1,723,911 cache-misses # 71.551 % of all cache refs ( +- 0.14% ) 2,409,340 cache-references ( +- 0.23% ) 294,702,925 instructions # 1.79 insns per cycle ( +- 0.03% ) 164,437,002 cycles ( +- 0.07% ) 0.050525250 seconds time elapsed ( +- 0.13% ) Performance counter stats for './phonebook_opt_1024' (100 runs): 1,673,682 cache-misses # 70.257 % of all cache refs ( +- 0.13% ) 2,382,239 cache-references ( +- 0.24% ) 294,768,040 instructions # 1.79 insns per cycle ( +- 0.03% ) 164,291,731 cycles ( +- 0.08% ) 0.050583163 seconds time elapsed ( +- 0.16% ) Performance counter stats for './phonebook_opt_2048' (100 runs): 1,650,038 cache-misses # 69.371 % of all cache refs ( +- 0.13% ) 2,378,586 cache-references ( +- 0.35% ) 294,483,815 instructions # 1.80 insns per cycle ( +- 0.03% ) 163,949,313 cycles ( +- 0.08% ) 0.050560016 seconds time elapsed ( +- 0.14% ) Performance counter stats for './phonebook_opt_4096' (100 runs): 1,638,272 cache-misses # 68.202 % of all cache refs ( +- 0.12% ) 2,402,086 cache-references ( +- 0.32% ) 294,757,134 instructions # 1.80 insns per cycle ( +- 0.03% ) 163,483,033 cycles ( +- 0.08% ) 0.050201784 seconds time elapsed ( +- 0.16% ) ``` 使用 hash function --- - 想法: 使用 djb2hash function 壓縮字串資料量加快比對速度 - 優點: hash 後就只是 long int 比對速度極快 - 缺點: hash會增加構建時間 - 實測 註1: djb2hash function 來自[http://www.cse.yorku.ca/~oz/hash.html](http://www.cse.yorku.ca/~oz/hash.html) ![](https://i.imgur.com/6Kj95vW.png) ``` Performance counter stats for './phonebook_orig' (100 runs): 4,757,940 cache-misses # 93.920 % of all cache refs ( +- 0.16% ) 5,065,927 cache-references ( +- 0.26% ) 261,406,649 instructions # 1.37 insns per cycle ( +- 0.02% ) 190,460,703 cycles ( +- 0.27% ) 0.063316735 seconds time elapsed ( +- 0.53% ) Performance counter stats for './phonebook_opt_256' (100 runs): 1,696,705 cache-misses # 76.264 % of all cache refs ( +- 0.17% ) 2,224,792 cache-references ( +- 0.29% ) 278,638,415 instructions # 1.64 insns per cycle ( +- 0.02% ) 169,751,759 cycles ( +- 0.18% ) 0.055284318 seconds time elapsed ( +- 0.23% ) Performance counter stats for './phonebook_opt_512' (100 runs): 1,587,984 cache-misses # 72.650 % of all cache refs ( +- 0.27% ) 2,185,787 cache-references ( +- 0.56% ) 278,485,265 instructions # 1.65 insns per cycle ( +- 0.02% ) 169,072,254 cycles ( +- 0.14% ) 0.055322586 seconds time elapsed ( +- 0.22% ) Performance counter stats for './phonebook_opt_1024' (100 runs): 1,538,045 cache-misses # 72.220 % of all cache refs ( +- 0.16% ) 2,129,653 cache-references ( +- 0.32% ) 278,190,421 instructions # 1.65 insns per cycle ( +- 0.02% ) 168,256,201 cycles ( +- 0.13% ) 0.054859117 seconds time elapsed ( +- 0.22% ) Performance counter stats for './phonebook_opt_2048' (100 runs): 1,512,210 cache-misses # 71.419 % of all cache refs ( +- 0.18% ) 2,117,363 cache-references ( +- 0.41% ) 278,144,942 instructions # 1.66 insns per cycle ( +- 0.02% ) 167,913,330 cycles ( +- 0.11% ) 0.054865124 seconds time elapsed ( +- 0.23% ) Performance counter stats for './phonebook_opt_4096' (100 runs): 1,495,944 cache-misses # 71.578 % of all cache refs ( +- 0.14% ) 2,089,945 cache-references ( +- 0.25% ) 278,056,698 instructions # 1.66 insns per cycle ( +- 0.02% ) 167,666,340 cycles ( +- 0.10% ) 0.054465060 seconds time elapsed ( +- 0.15% ) ``` 使用紅黑樹 --- :::danger 闡述動機和考量點,以及實務上你在什麼應用看到 red-black tree 呢? :notes: jserv ::: >因為第一反應這像是個 Map 結構,而 Java 的 TreeMap 就是用 red-black tree 實做的 >所以嘗試自己平衡 binary-search tree 失敗後就先找了 red-black tree 的資料 >另外實現似乎還有問題,我還沒確定問題所在,所以一直沒維護這邊 >[name=chasingfar] - 想法: 用二元搜尋樹減少比對次數 - 優點: 查詢速度極快 - 缺點: 構建時間極長 - 實測 註1:由於構建時間過長字典資料改為 all-names.txt 註2:由於查詢速度過快改為測試查詢所有資料 註3:紅黑樹的代碼來自[Wikipedia](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree) ![](https://i.imgur.com/Oy5qoSH.png) ``` Performance counter stats for './phonebook_orig' (100 runs): 7,565,749 cache-misses # 1.952 % of all cache refs ( +- 4.17% ) 387,559,508 cache-references ( +- 0.34% ) 5,751,954,460 instructions # 2.93 insns per cycle ( +- 0.00% ) 1,965,321,160 cycles ( +- 0.36% ) 0.644600121 seconds time elapsed ( +- 0.48% ) Performance counter stats for './phonebook_opt' (100 runs): 154,832 cache-misses # 8.596 % of all cache refs ( +- 2.37% ) 1,801,256 cache-references ( +- 7.61% ) 146,623,518 instructions # 0.98 insns per cycle ( +- 0.01% ) 149,276,315 cycles ( +- 0.22% ) 0.049843064 seconds time elapsed ```