# 2016q3 Homework2 (phonebook-concurent) contributed by <`abba123`> ## 開發環境 OS : ubuntu 16.04.1 LTS ``` lscpu ``` ``` Architecture: x86_64 CPU 作業模式: 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 每核心執行緒數: 2 每通訊端核心數: 2 Socket(s): 1 NUMA 節點: 1 供應商識別號: GenuineIntel CPU 家族: 6 型號: 58 Model name: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz 製程: 9 CPU MHz: 1924.316 CPU max MHz: 3100.0000 CPU min MHz: 1200.0000 BogoMIPS: 4988.73 虛擬: VT-x L1d 快取: 32K L1i 快取: 32K L2 快取: 256K L3 快取: 3072K NUMA node0 CPU(s): 0-3 ``` >> 需要一併列出硬體組態,特別是 processor model,這樣重現實驗才有參考價值 [name=jserv] >> 已補上 [name=洪勤硯] ## 預期目標 * 學習第二週提及的 [concurrency](/s/H10MXXoT) 程式設計 * 學習 POSIX Thread * 學習效能分析工具 * code refactoring 練習 ## refactoring * main.c 裡面有太多的 `#ifndef` 所以我們把一些可以合併的合併,比較好讀懂。 * 加了一些註解,讓人比較容易讀懂程式 * 修改變數名稱,不管是 function、variable、struct 有些不易了解,下面是有改過的部份 * main.c ```c= int file_size; ``` * phonebook_opt.c ```c= thread_info *new_thread(char *file_start, char *file_end, int t_id, int t_num, entry *start); ``` * phonebook_opt.h ``` c= typedef struct _thread_info { char *file_start; char *file_end; int t_id; int t_num; entry *entryStart; entry *pHead; entry *pLast; } thread_info; ``` ## 案例分析: [phonebook-concurent](https://github.com/sysprog21/phonebook-concurrent) 在經過 [TempoJiJi](https://hackmd.io/s/rymKa4aT) 的提示後,我們發現他的輸出結果並不正確,原因出在原本將各 thread 連接起來的時候並不完全 ```c= for (int i = 0; i < THREAD_NUM; i++) { if (i == 0) { pHead = app[i]->pHead->pNext; dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr); } else { etmp->pNext = app[i]->pHead->pNext; dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr); } etmp = app[i]->pLast; dprintf("Connect %d tail string %s %p\n", i, app[i]->pLast->lastName, app[i]->ptr); dprintf("round %d\n", i); } ``` 在這裡 pHead = app[i]->pHead->pNext 跟 etmp->pNext = app[i]->pHead->pNext 直接連接到了 pHead->pNext , 直接忽略了 pHead 這個 node 所以我們修改了連接的點 pHead = app[i]->pHead 跟 etmp->pNext = app[i]->pHead ### pthread #### main.c 在 app[] 裡存了各個 thread 之後跑 append() 會用到的參數 ```c= // store the information for (int i = 0; i < THREAD_NUM; i++) app[i] = new_append_a(map + MAX_LAST_NAME_SIZE * i, map + fs, i, THREAD_NUM, entry_pool + i); ``` 開始新增 thread 去做指定的 function ```c= for (int i = 0; i < THREAD_NUM; i++) pthread_create( &tid[i], NULL, (void *) &append, (void *) app[i]); ``` 等待所有新增的 thread 都完成後,才繼續往下做 ```c= for (int i = 0; i < THREAD_NUM; i++) pthread_join(tid[i], NULL); ``` 現在這裡有 4 條 link-list 由不同 thread 所建造,分別把 他們頭尾相接連接成一條 ```c= for (int i = 0; i < THREAD_NUM; i++) { if (i == 0) { pHead = app[i]->pHead; dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr); } else { etmp->pNext = app[i]->pHead; dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr); } etmp = app[i]->pLast; dprintf("Connect %d tail string %s %p\n", i, app[i]->pLast->lastName, app[i]->ptr); dprintf("round %d\n", i); } ``` ### 實驗 #### perf分析 orig ``` Performance counter stats for './phonebook_orig' (10 runs): 1,920,612 cache-misses # 92.871 % of all cache refs ( +- 4.35% ) (63.70%) 2,068,037 cache-references ( +- 3.72% ) (64.76%) 2,646,716 L1-dcache-load-misses ( +- 2.13% ) (67.71%) 867,838 L1-dcache-store-misses ( +- 3.13% ) (71.50%) 1,309,519 L1-dcache-prefetch-misses ( +- 6.16% ) (70.05%) 147,668 L1-icache-load-misses ( +- 11.52% ) (65.25%) 0.075387760 seconds time elapsed ( +- 7.41% ) ``` opt ``` Performance counter stats for './phonebook_opt' (10 runs): 630,520 cache-misses # 42.869 % of all cache refs ( +- 8.88% ) (68.52%) 1,470,812 cache-references ( +- 4.00% ) (69.04%) 2,813,932 L1-dcache-load-misses ( +- 6.76% ) (69.69%) 640,970 L1-dcache-store-misses ( +- 15.76% ) (68.42%) 1,791,537 L1-dcache-prefetch-misses ( +- 8.39% ) (68.68%) 571,735 L1-icache-load-misses ( +- 2.34% ) (69.25%) 0.124468458 seconds time elapsed ( +- 15.39% ) ``` #### Gnuplot 1 thread ![](https://i.imgur.com/jjwClpG.png) 2 thread ![](https://i.imgur.com/ySVNfkR.png) 4 thread ![](https://i.imgur.com/iLrOZby.png)