2017q3 Homework1 (phonebook)

# 2017q3 Homework1 (phonebook) contributed by < `ian910297` > ## TODO * 使用 perf 比較前後差異 * * 索引演算法實做 * [sorted linked list to banlanced bst](http://www.geeksforgeeks.org/sorted-linked-list-to-balanced-bst/) * 打散資料，實際讓演算法重頭建立索引 ## 系統環境 ``` $> lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 94 Model name: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz Stepping: 3 CPU MHz: 799.987 CPU max MHz: 4000.0000 CPU min MHz: 800.0000 BogoMIPS: 6816.00 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 8192K NUMA node0 CPU(s): 0-7 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp ``` ## 事前準備 ### 程式碼難以維護在優化的過程發現，當測試版本越來越多時，程式碼裡會充斥大量的 ifdef ，尤其是 `main.c` 的檔案在程式結構上最為複雜，... ```clike= #ifdef OPT XXXXX #endif #if OPT==1 XXXXX #elif OPT==2 XXXXX #endif ``` ### 導入亂數資料最開始時，使用 python 的 faker 產生產生亂數名字資料 ```python= from faker import Faker f = open('data.txt', 'w') fake = Faker() for _ in range(10000): f.write(fake.name()) f.close() ``` 發現導入亂數資料對於分析並沒有什麼幫助，決定先依據英文名字長度為指標來產生資料 ## 優化程式 ### 簡化資料結構 :::info 觀察程式執行可發現只有使用到部分資料結構， lastName 的部分，那麼其他部分是可以被刪減的，以減少記憶體使用量，及加速執行速度 ::: 原先結構 ```clike= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` 優化結構 ```clike= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` 實驗結果 >> 文字訊息不要用圖片表示！ >> [name="jserv"][color=red] 從上圖觀察到，簡化資料結構之後， cache-misses cache-references instructions cycles 都有了大幅的減少，但目前我還不知道如何觀察 perf report 所產生的 perf.data 讓兩者去做更深入的比較，只能附上 cache-test 之結果示意 ![](https://i.imgur.com/HS8YzEK.png) --- ### Memory Pool :::info 猜測優化是因為 malloc 大小改變造成，既然大小改變會影響效能，那減少 malloc 的次數是否也會造成影響？一次性 malloc 大量記憶體，也就是 memory pool 的概念 ::: 最初的實作 ``` ``` * ### 加速讀取速度 - mmap ```clike= // get file size struct stat st; stat(filepath, &st); int fd = open(filepath, O_RDWR); // mapping file to virtual memory char *map; map = mmap(0, st.st_size+1, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0); map[st.st_size] = '\0'; while(*map!='\0') { i = 0; while(*map!='\n') { line[i++] = *map++; } line[i] = '\0'; e = append(line, e); map++; } ``` ![mmap](https://i.imgur.com/gQL4uQR.png) ![mmap + simplify data structure](https://i.imgur.com/u1a9qco.png) mmap 分析 :::info ::: ### 加速檢索速度 - BST 剛開始時，發現建樹時間使用了非常長的時間，才想起這是一個非常不平衡的BST 但觀察資料可以發現，檔案裡面的人名是已經排序過的