# 2017q1 Homework4 (phonebook-concurrent) contributed by <`yangyang95`> ###### tags: `sysprog2017` `HW4` `yangyang95` 開發環境看 [這裡](https://hackmd.io/s/HJ2c_XYKx) 作業要求看 [這裡](https://hackmd.io/s/rkOExTOoe#作業要求) github page 看 [這裡](https://github.com/yangyang95/phonebook-concurrent) ## 程式碼重構 (1) **避免 warning** 在 make 的時候噴出以下 warning `phonebook_opt.c:48:12: warning: variable ‘cpu_time’ set but not used [-Wunused-but-set-variable] double cpu_time;` 原因是 cpu_time 變數是 DEBUG 的時候用的,沒有加入 DEBUG 標籤就會產生 warning 所以用 #ifdef DEBUG 把它包起來,make 的時候就不會有 warning 了 (2) **相同標籤程式碼重新排列** 原程式碼中有許多 `#ifndef OPT` `#else` `#endif` 分散在各處,導致程式碼相當難閱讀,將相同標籤的程式排列在一起,讓閱讀程式的時候能更有邏輯。 在重新排列時也發現有趣的現象 ```C== #ifndef OPT { int i=0; ...... } #else { ...... } #endif ....... #ifndef OPT printf("%d", i); #endif ``` 原本是想用 `{ }` 把 OPT 中間的程式碼框起來,結果竟然編譯不過 變數 `i` 的生命周期只在 scope 裡面,跑到下面 `printf` 地方的時候,`i` 是沒有定義的 之前看到程式碼這樣寫還以為是為了好看,沒想到是一個特別的技巧 (3) **loop 改進** 程式碼裡面有一段長這樣 ```C= /* Connect the linked list of each thread */ for (int i = 0; i < THREAD_NUM; i++) { if (i == 0) { pHead = thread_args[i]->lEntry_head->pNext; } else { e->pNext = thread_args[i]->lEntry_head->pNext; } e = thread_args[i]->lEntry_tail; } ``` 很明顯 `i == 0` 只會執行一次,可以移到 for 迴圈外面 ```C== pHead = thread_args[0]->lEntry_head->pNext; e = thread_args[0]->lEntry_tail; for (int i = 1; i < THREAD_NUM; i++) { e->pNext = thread_args[i]->lEntry_head->pNext; e = thread_args[i]->lEntry_tail; } ``` ## 效能分析 ``` $ ./phonebook_orig size of entry : 136 bytes execution time of append() : 0.065837 sec execution time of findName() : 0.005717 sec ``` ``` $ ./phonebook_opt (4 threads) size of entry : 24 bytes orginal file size = 3206080 execution time of append() : 0.008602 sec execution time of findName() : 0.004075 sec ``` 新增新的規則 `$ make thread_test_plot`,執行從 1 ~ 128 thread 的 `phonebook_opt` `makeflie`: ```C== ifeq ($(strip $(THREAD_TEST)),1) CFLAGS_opt += -D THREAD_TEST endif thread_test: main.c for i in `seq 1 128`; \ do $(MAKE) phonebook_opt THREAD_TEST=1 THREAD=$$i --silent; \ ./phonebook_opt; echo "\n"; \ rm -rf phonebook_opt; \ done; thread_test_plot: clean thread_test opt.txt gnuplot scripts/thread_test.gp ``` `main.c`: 為了方便做圖在輸出檔案加入執行緒數量 ```C== #if defined(THREAD_TEST) fprintf(output, "%d append() findName() %lf %lf\n", THREAD_NUM, cpu_time1, cpu_time2); #else fprintf(output, "append() findName() %lf %lf\n", cpu_time1, cpu_time2); #endif ``` `thread_test.gp` ``` reset set terminal png size 2000 set ylabel 'time(sec)' set style fill solid set title 'Different thread append() perfomance comparision (from 1 to 128 thread)' set term png enhanced font 'Verdana,10' set output 'thread_test_append.png' plot [:][:0.02]'opt.txt' using 4:xtic(1) with histogram title 'optimized' set title 'Different thread findName() perfomance comparision (from 1 to 128 thread)' set output 'thread_test_findName.png' plot [:][:0.02]'opt.txt' using 5:xtic(1) with histogram title 'optimized' ``` 那趕快來欣賞我 ~~做到快吐血~~ 的圖形吧 ![](https://i.imgur.com/WqmROXv.png) ![](https://i.imgur.com/Cmgwpqi.png) [append() 放大圖](https://i.imgur.com/WqmROXv.png) [findName() 放大圖](https://i.imgur.com/Cmgwpqi.png) append() 的圖形,我其實看不出明顯的規律。貌似(!?)有些微周期的振蕩(~~其實是我的錯覺~~) 不過有一點可以確定的是 3 個執行緒 append() 的效果最好 findName() 的圖形可以看出有周期性 15,30,44,56,64,71,73,77,88,95...(100後的數字看到眼花) 這些數字所需的時間都相當低 > 如果 append 的部分使用 3 threads,findName 也選擇時間短的 thread,應當可以將時間降到相當低。不過,我不確定這樣的方式是否實用。 > 畢竟,對程式每個區段進行量測,感覺不太實際。程式維護上也有可能造成困難。 > 目前也還不知道圖形如此呈現的原因,得試着思考看看 參考資料: * [How to write loop in a Makefile?](http://stackoverflow.com/a/1490961) * [Recursive Use of make](https://www.gnu.org/software/make/manual/html_node/Recursion.html) * [GNU Make silent by default](http://stackoverflow.com/a/24011502) * [Window dimensions of the wxt terminal in gnuplot](http://stackoverflow.com/a/4820713) ## 作業 1 補充 延續 [2017q1 Homework1 (phonebook)](https://hackmd.io/s/HJ2c_XYKx) 的不足,持續改進 ### Cache-miss 分析 ``` Performance counter stats for './phonebook_orig' (100 runs): 2,290,073 cache-misses # 85.325 % of all cache refs ( +- 0.08% ) 2,683,939 cache-references ( +- 0.15% ) 321,828,538 instructions # 1.20 insn per cycle ( +- 0.03% ) 267,943,910 cycles ( +- 0.37% ) 0.089032849 seconds time elapsed ( +- 0.39% ) ``` ``` Performance counter stats for './phonebook_opt' (100 runs): HW1 410,488 cache-misses # 66.447 % of all cache refs ( +- 0.28% ) 617,763 cache-references ( +- 0.59% ) 243,373,348 instructions # 1.63 insn per cycle ( +- 0.04% ) 149,450,713 cycles ( +- 0.28% ) 0.052516323 seconds time elapsed ( +- 1.28% ) ``` ``` Performance counter stats for './phonebook_opt' (100 runs): 1,126,509 cache-misses # 45.986 % of all cache refs ( +- 0.61% ) 2,449,667 cache-references ( +- 0.85% ) 248,939,912 instructions # 1.02 insn per cycle ( +- 0.09% ) 242,920,594 cycles ( +- 0.68% ) 0.073554961 seconds time elapsed ( +- 0.73% ) ``` | Comparison | Size | append | findName | cache miss | total (average) | |---------------|-----------|--------------|--------------|------------|-----------------| | original | 136 bytes | 0.065837 sec | 0.005717 sec | 85.325 % | 0.089032849 sec | | struct_reduce | 32 bytes | 0.069770 sec | 0.002947 sec | 66.447 % | 0.052516323 sec | | concurrent | 24 bytes | 0.008602 sec | 0.004075 sec | 45.986 % | 0.073554961 sec | 可以看到 cahch miss 隨 structure size 減少而降低 總時間 concurrent > struct_reduce 估計是因為 text_align() 的原因,之後可以實驗看看 ### 問題回答 本例選用的 dataset (也就是輸入電話簿的姓名) 是否存在問題? * 有代表性嗎?跟真實英文姓氏差異大嗎? Ans: 還真的沒看過外國的電話簿,所以在網路上找了個 1969 年的電話簿圖片來參考 [[source](http://loganwv.us/wp-content/gallery/1969-phone-book/1969-logan-wv-telephone-book-page-22-and-23.jpg)] ![](https://i.imgur.com/jSVa834.jpg) 給定的字典一個名字只有出現一次,但電話簿中卻很可能有同名同姓的人。甚至其中還有數字、縮寫、店家的名字。每個 entry 的字數也不一樣,有的是 3 個字,有的是 4 個字。 如果依照實作只使用 last name 搜尋的話,絕對會產生問題。 * 資料難道「數大就是美」嗎? Ans: 資料正確性是首要考量,不過也要有一定規模才比較具有參考價值 * 如果我們要建立符合真實電話簿情境的輸入,該怎麼作? Ans: 可以在網路上尋找 [open data](http://data.southampton.ac.uk/dataset/phonebook.html) 來參考實際電話簿資訊 ![](https://i.imgur.com/nHeAw7l.png) [放大圖](https://i.imgur.com/nHeAw7l.png) | SURNAME | FORENAME | TITLE | AKA | PHONE | EMAIL | DIVISION | |:---------:|:------------------:|:-----:|:----:|:-------------:|:------------------------:|:--------:| | Tully | Susan | Ms | | | st@soton.ac.uk | CZ | | Cornick | Carlene Jean Linda | Mrs | | +442380596808 | C.J.L.Geddes@soton.ac.uk | NP | | Gramatyka | Jakub Julian | Mr | | | J.Gramatyka@soton.ac.uk | MM | | Ryan | Matthew George | Dr | Matt | +442380593308 | M.G.Ryan@soton.ac.uk | CC | | Stone | Nicole Clare | Mrs | | +442380597770 | N.C.Stone@soton.ac.uk | JW | 配合適當的資料結構,可加入對電話簿結構的封裝與方法,如:插入、刪除...等功能 ## 老師建議 在 commit [b6a2385](https://github.com/yangyang95/phonebook-concurrent/commit/a4dc7a7b525fcd4d07706793986c72953ebd914a) jserv 提供意見,因為強制 Push 時留言可能會不見,所以在這裡備份 ```c +ifeq ($(strip $(THREAD_TEST)),1) +CFLAGS_opt += -D THREAD_TEST ... ``` > You don't have to rebuild with THREAD_TEST. Instead, you can enhance build system to generate executable files with various configurations. They can exist suffixing "_g" (for debugging symbols), "_prof" (for profiler enabled), etc. [name=jserv][color=#212599] > What's the benefit for doing so? It's seems like I still have to use wildcard or some technique, since there is a output difference to file in main.c when running thread_test. Or maybe I just misunderstand your words. [name=yangyang95][color=#253] > You can automate all test plans if the executable files are prepare well in advance. Once continuous integration (CI) is set up in such manner and configurations, everything can be manipulated. Example: https://github.com/chewing/libchewing/blob/master/.travis.yml It triggers Travis-CI and automate build and verification process. [name=jserv][color=#212599] 試了幾次,結果不會用 Travis CI,一直因為 `clock_gettime` build 失敗... orz 只好先關掉,~~逃避雖然可恥,但是有用 XD~~ 多試了幾次後發現把`-lrt` flag 加到 `make` 的最後面就可以編譯成功 可以在專案內加入美美的測試圖案,會根據測試是否通過改變狀態 [![Build Status](https://travis-ci.org/yangyang95/phonebook-concurrent.svg?branch=master)](https://travis-ci.org/yangyang95/phonebook-concurrent) 如此以來就可以在遠端測試,不同環境下是否能夠進行編譯,還有執行測試的 script 順帶放上 Travis-Ci 使用的 `.travis.yml` 檔案 ``` language: c sudo: false branches: only: - master cache: apt addons: apt: sources: - ubuntu-toolchain-r-test packages: - gcc-4.8 - gcc-5 - gcc-6 matrix: include: - os: linux compiler: gcc - os: linux compiler: gcc-4.8 - os: linux compiler: gcc-5 - os: linux compiler: gcc-6 script: - make && ./phonebook_orig && ./phonebook_opt ``` 參考資料: * [C static library cannot link with librt](https://stackoverflow.com/questions/29078160/c-static-library-cannot-link-with-librt) ## 改進方向 * phonebook 的 structure 還可以改進成物件導向的方式實作 * [concurrent-ll](https://github.com/jserv/concurrent-ll) 分析與效能比較