2017q1 Homework1 (phonebook)

# 2017q1 Homework1 (phonebook) contributed by < `claaaaassic` > ### Reviewed by `chenweiii` - 建議可以嘗試引入 memory pool，append() 的時間會更加漂亮 - 建議可以把 hash table 的 entry struct 湊成近 cache block 的大小，在我的實驗裡 cache misses 下降了約 7% - 我認爲透過字母統計去比較兩份姓名分佈是否相似，是有疑慮的，更何況不同的 hash function 對各個字母的加權值又有所不同。或許透過同一 hash function，然後去觀察兩份姓名的 hash value 分佈，是一個可以進行的方向。 ## 準備 GNU/Linux 開發工具 * 安裝 ubuntu 16.04 LTS ``` $ lscpu Architecture: x86_64 CPU 作業模式： 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 每核心執行緒數：2 每通訊端核心數：2 Socket(s): 1 NUMA 節點： 1 供應商識別號： GenuineIntel CPU 家族： 6 型號： 42 Model name: Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz 製程： 7 CPU MHz： 809.295 CPU max MHz: 2900.0000 CPU min MHz: 800.0000 BogoMIPS: 4589.19 虛擬： VT-x L1d 快取： 32K L1i 快取： 32K L2 快取： 256K L3 快取： 3072K NUMA node0 CPU(s): 0-3 ``` * 安裝開發工具 ``` $ sudo apt-get update $ sudo apt-get install build-essential $ sudo apt-get install linux-tools-common linux-tools-generic $ sudo apt-get install cppcheck astyle colordiff gnuplot ``` ## 學習使用 Git 與 GitHub * linux 產生 ssh key 貼到 github 個人設定裡面 * 更改 git config --global 的 user.name user.email * clone 這次作業 ```$ git clone https://github.com/claaaaassic/phonebook.git``` ### **我沒有認真讀文章...** 於是 : [實在是太經典了](https://www.facebook.com/groups/system.software2017/permalink/1322345671164132/) > 當然你可能只是想說「我沒有這麼崇高的目標，我來大學只是為了混文憑」，既然如此，拜託不要讀成大，重考去台大吧，後者的文憑才稍為有點強度在我 push 兩個中文的 commit 後就被抓出來鞭了，趕緊把中文改成國際化語言**英文**，因為我的兩個 commit 就算是同一個實驗功能，用這個機會把他們合併起來，加入英文 commit ```shell $ git log $ git rebase HEAD~2 最新的 commit 前方改成 suqash 他會把前方有 s 的 commit 與他前一個紀錄進行合併因為我是要兩個合併成一個，所以只需要修改一個就好儲存後會出現 vim 編輯視窗，在這裡就可以把合併完的英文 commit 輸入進去囉結束後要強制把遠端的紀錄改成新的 $ git push --force origin master ``` 看完老師推薦閱讀的文章 [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/)，下面說明一下我閱讀完推薦文章後改的 commit message ![](https://i.imgur.com/A4BhH7n.png) ### How to Write a Git Commit Message 首先是推薦文章內的重點：好的 commit message 的七個規則 The seven rules of a great Git commit message 1. Separate subject from body with a blank line 標題與內文以一個空白行分隔 > 這樣可以在使用 $ git log 閱讀時叫容易分辨出哪行是標題 2. Limit the subject line to 50 characters 標題字數最多 50 > 因為標題字數超過 50 在 github 上會被隱藏到 ... 裡面 > 需要額外點開才能完整知道這個 commit 在做什麼 > 超過 50 字可能就不是一個精簡的標題了 3. Capitalize the subject line 標題首字大寫 4. Do not end the subject line with a period 標題結尾不要句點 5. Use the imperative mood in the subject line 標題用命令口吻 6. Wrap the body at 72 characters 內文一行字數不超過 72 > 這樣在 $ git log 裡面看會比較清楚，排版比較好看 7. Use the body to explain what and why vs. how 內文說明是什麼、為什麼而不是如何做 > 因為看 commit 的人是想知道你為什麼要做這件事，而不是要知道你如何做這件事的一開始很難想要怎麼寫別人才能一目了然，應該需要練習才能寫的更好吧 ! 想到更好的就可以改，等到別人發現你可笑的 commit 時就已經來不及了QQ ## 學習效能分析工具 perf * 安裝 perf ```$ sudo apt-get install linux-tools-common``` ```$ perf list``` ```$ perf top``` 這裡出現我的權限預設值是 3 還蠻奇怪的，==還沒查到為什麼==，正常應該要是 1 最後我還是用這個指令改成正常的預設值 1 ```$ sudo sh -c 'echo 1 >/proc/sys/kernel/perf_event_paranoid'``` ![](https://i.imgur.com/1o5pSvA.png) 接著繼續下去，使用 perf 測試 perf_top_example.c ![](https://i.imgur.com/VEqGcNO.png) ## 研究軟體最佳化 ### Makefile 這裡不太懂 ```$ make```為什麼可以打開 Makefile 依照裡面的規則編譯檔案，找了一下 [Makefile 概念與基礎I](https://dywang.csie.cyut.edu.tw/dywang/linuxProgram/node49.html) * 當執行 make 時，make 會在當時的目錄下搜尋文字檔 makefile ( or Makefile )，其記錄了原始碼如何編譯的詳細資訊。 * make 使用 makefile 的好處： * 簡化編譯時所需要下達的指令 * 若在編譯完成之後，修改了某個原始碼檔案，則 make 僅會針對被修改了的檔案進行編譯，其他的 object file 不會被更動 * 最後可以依照相依性來更新( update )執行檔。 > 以下測試都有先清除既有 cache ```$ echo 1 | sudo tee /proc/sys/vm/drop_caches``` ### 原始程式 ``` $ ./phonebook_orig size of entry : 136 bytes execution time of append() : 0.057314 sec execution time of findName() : 0.005804 sec $ perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig Performance counter stats for './phonebook_orig' (100 runs): 2,018,903 cache-misses # 91.012 % of all cache refs ( +- 0.44% ) 2,218,282 cache-references ( +- 0.46% ) 262,352,481 instructions # 1.18 insn per cycle ( +- 0.02% ) 221,787,124 cycles ( +- 1.24% ) ``` size of entry : 136 bytes cache-misses : 91.012 % 參考 [cache 原理和實驗](https://hackmd.io/s/S14L26-_l#) - cache 總大小是 32 KB - 一個 cache 的 block 為 64 B - 一個 entry 的大小為 (136 B) + memory control block(8 B) = 144 B 一個 entry 需要用到的 block 數 144 B / 64 B = 2.25 所以需要用到 3 個 block 如果縮小 entry 的大小應該可以增加 cache 存放 entry 的數量，減少 cache miss 的數量 ### 優化版本 1 : 改寫 struct __PHONE_BOOK_ENTRY 的成員，搬動到新的結構中因為都只用到搜尋 last_Name 的功能，所以就不先分配 detail 的記憶體，以後需要再加上去 ```shell typedef struct __PHONE_BOOK_DETAIL { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } detail; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; detail *detail; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` size of entry 從 **136 bytes** 降到 **32 bytes** cache-misses 可以從 **91%** 降到 **63%** ``` $ ./phonebook_opt size of entry : 32 bytes execution time of append() : 0.046207 sec execution time of findName() : 0.003105 sec $ perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig Performance counter stats for './phonebook_opt' (100 runs): 342,113 cache-misses # 62.647 % of all cache refs ( +- 0.29% ) 546,093 cache-references ( +- 2.10% ) 244,702,580 instructions # 1.11 insn per cycle ( +- 0.02% ) 221,244,338 cycles ( +- 0.89% ) ``` cache miss 減少，在時間上也看得出有減少 ![](https://i.imgur.com/01TEBxV.png) ### 優化版本 2 : 使用 hash function 來加速查詢使用 hash 加速搜尋時間，雖然 append()的時間會變長，但是可以大幅降低 findName() 的時間這邊參考 [各种字符串Hash函数比较](https://www.byvoid.com/zhs/blog/string-hash-compare) 使用 BKDR hash function - table size 參考 [Programming Small](https://hackmd.io/s/S1rbwmZ6#0) 設為 42737 - seed 設為 131 cache miss 從 **63%** 降到 **55%** findName() 的時間也降到 **0.000000 sec** ```shell size of entry : 32 bytes execution time of append() : 0.057896 sec execution time of findName() : 0.000000 sec Performance counter stats for './phonebook_hash' (100 runs): 355,920 cache-misses # 54.538 % of all cache refs ( +- 0.36% ) 652,609 cache-references ( +- 0.45% ) 236,541,025 instructions # 1.46 insn per cycle ( +- 0.02% ) 161,652,254 cycles ( +- 0.34% ) ``` ![](https://i.imgur.com/5Hrb7cx.png) 圖表可以看出改用 hash function 後大部分時間都花在 bkdrHash 上面，append 不到他的一半 ```shell Samples: 89 of event 'cycles', Event count (approx.): 561473 Overhead Shared Object Symbol 28.90% phonebook_hash [.] bkdrHash 13.00% libc-2.23.so [.] _IO_fgets 11.25% phonebook_hash [.] append 10.75% phonebook_hash [.] main 7.50% libc-2.23.so [.] _int_malloc 5.64% libc-2.23.so [.] __memcpy_sse2 4.12% libc-2.23.so [.] __strcpy_sse2_unaligned 3.17% [kernel] [k] release_pages 3.15% libc-2.23.so [.] _IO_getline_info 2.83% libc-2.23.so [.] memchr 2.15% [kernel] [k] unmap_page_range 1.11% phonebook_hash [.] strcpy@plt 1.07% [kernel] [k] __mod_zone_page_state 1.07% [kernel] [k] __fsnotify_parent 1.07% libc-2.23.so [.] malloc 1.06% [kernel] [k] try_charge 1.04% [kernel] [k] _raw_spin_lock 1.03% [kernel] [k] clear_page 0.08% [kernel] [k] flush_smp_call_function_queue 0.00% [kernel] [k] native_write_msr ``` ### table size 與效能的關係接下來想來研究一下 table size 到底該設為多大的質數才對呢選了幾個萬級的質數來測試一下，結果在 50000 左右的質數效能最好，append() 時間最短以及 cache-misses 比例最小第一次測試 ![](https://i.imgur.com/LgtSCIi.png) 第二次測試 ![](https://i.imgur.com/HmT7zzn.png) #### 看來應該是介於 40000 到 80000 之間的質數比較適合而兩次的測試 append 時間跟 table size 沒有一定的關係 ![第一次](https://i.imgur.com/9Na8uPA.png) ![第二次](https://i.imgur.com/qiueZCw.png) ### 指定問題 * 有代表性嗎？跟真實英文姓氏差異大嗎？ * 不具代表性，參考[維基百科：英語姓氏](https://zh.wikipedia.org/wiki/%E8%8B%B1%E8%AF%AD%E5%A7%93%E6%B0%8F)真實英文姓名是有固定來源而非隨機輸入組合而成，在比較早期還不是大部分人不識字時，登錄姓氏是以發音取相似可行的姓氏，而非使用隨機的英文排列 * 分析一下我們所使用的 words.txt 他所有字母出現個數下去做統計圖，可以看得出他不是亂數使用字母組合而成的 ![](https://i.imgur.com/huMTgsZ.png) * 接下來使用 dictionary/all-names.txt 來觀察個數的統計圖，可以看出 words.txt 這份資料使用的字母數量其實跟真實姓名分佈相近的，在整體比例上來說，差異不大，但是裡面單獨每一個的姓名與真實姓名卻沒有完全一樣(有待做出分析圖) ![](https://i.imgur.com/ghMQ2Ro.png) * 資料難道「數大就是美」嗎？ * 首先資料要先符合實際情況，根據存在比例不同的每個姓氏數量不同，姓氏數量也要齊全，在符合這些情況下，大量的資料可以消除較少使用或已無人使用的姓氏，符合真實情況， * In the United States, 1,712 surnames cover 50% of the population，參考[維基](https://en.wikipedia.org/wiki/Surname) ![](https://i.imgur.com/M8EemSS.jpg) * 如果我們要建立符合真實電話簿情境的輸入，該怎麼作？ * 在有一個有代表性的資料下，使用隨機抽樣的方法 ### 疑問最後想要弄清楚一件事，[cache 原理和實驗](https://hackmd.io/s/S14L26-_l#)上面計算 cache 的方式是錯的 * 條件： * cache 總大小是 32 KB * 一個 cache 的 block 為 64 B * 一個 entry 的大小為 (136 B) + memory control block(8 B) = 144 B * 8-way set associative * 計算： * **有多少 block?** (我考慮 main.c findName() & append 這 2 個 function 因為串練結跟搜尋存取花最多時間) 350000 筆資料 * 每筆 144 B = 5040000 B 5040000 B / 64 B * 2 次 function = 1575000 block 讀一個 block 是一次 ref * **cache 有幾個 set？** cache 32 KB/block 64Byte = 512 個block 512 / 8-way set = 64 個 * **一個 cache line 可以存多少筆 entry?** 144 B / 64 B = 2.25 所以是 3 個 block 已知是 8-way set 所以可以存 2 個 * **cache miss** 1575000 (總 block)/ 3 (一個 entry 需要 3 block) = 525000 組 entry 525000 / 64 (set 數) = 8203.125 (填到相同 set 的個數) 由上知一個 cache line 最多可以放 2 組 entry，所以有兩次機會 8203.125 / 2 = 4101.5625 (可能被對到次數) 4101.5625 * 64 (set 數)* 3 (entry 佔的 block 數) = 797500 (cache hit 次數) 1574000 - 787500 = 786500 (cache miss) 786500 / 1574000 = 50% (miss rate) 怪怪的，不知道怎麼正確的算出來 ### code review 後的修改在 review 完 [bananahuang](https://hackmd.io/GwTgJgxsBMIMwFoBGAOOEEBY5IKYIEM4BWJBOOEFAM2pCUxFwAYg#)的共筆之後，看到自己還未 free 整個 linked-list，所以修改一下程式。 * 首先參考 bananahuang 使用 valgrind 檢查記憶體的使用情況 `$ valgrind --leak-check=full ./phonebook_opt` ``` ==5403== HEAP SUMMARY: ==5403== in use at exit: 11,196,768 bytes in 349,899 blocks ==5403== total heap usage: 349,906 allocs, 7 frees, 11,207,152 bytes allocated ``` 這邊可以看出我總共配置 349906 次記憶體以及釋放 7 次接著在 main.c 更改程式，從原本只清除兩個節點，改成清除整個 list ```shell while(pHead->pNext) { entry *next = pHead->pNext; pHead->pNext = next->pNext; free(next); } free(pHead); ``` 成功釋放所有用完的記憶體 ``` ==5915== HEAP SUMMARY: ==5915== in use at exit: 0 bytes in 0 blocks ==5915== total heap usage: 349,906 allocs, 349,906 frees, 11,207,152 bytes allocated ==5915== ==5915== All heap blocks were freed -- no leaks are possible ``` 在 hash 的部份另外加了一個函式釋放記憶體，最後也成功釋放所有記憶體。 ``` void freeHashTable(hashTable *ht) { for(int i=0; i<HASHTABLE_SIZE; i++) { entry *e = ht->list[i]; if(!e) { free(e); continue; } while (e->pNext) { entry *next = e->pNext; e->pNext = next->pNext; free(next); } free(e); } free(ht->list); free(ht); } ``` 這邊我卡一小段時間，不知道為什麼沒有加判斷這個節點是不是 null 的時候會只有 14,362 frees，但最後加上去還是成功釋放完畢 ``` if(!e) { free(e); continue; } ``` ## 參考文件 [perf 原理和實務](https://hackmd.io/s/B11109rdg#) [dada 的共筆](https://embedded2016.hackpad.com/ep/pad/static/YkqjhwgnQcA) [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/) [Makefile 概念與基礎I](https://dywang.csie.cyut.edu.tw/dywang/linuxProgram/node49.html) [cache 原理和實驗](https://hackmd.io/s/S14L26-_l#) [Programming Small](https://hackmd.io/s/S1rbwmZ6#0) [各种字符串Hash函数比较](https://www.byvoid.com/zhs/blog/string-hash-compare) [Hsiang 的共筆](https://hackmd.io/s/HylfV2FFe#) 他提到可以使用 perf 分析 cache-misses 發生的地方有機會可以使用看看 ```shell ⚡ sudo perf record -e cache-misses ./phonebook_orig ⚡ sudo perf report ```