Try   HackMD

2016q3 Homework2 (phonebook-concurent)

contributed by <abba123>

開發環境

OS : ubuntu 16.04.1 LTS

lscpu	
Architecture:          x86_64
CPU 作業模式:    	32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
每核心執行緒數:		2
每通訊端核心數:		2
Socket(s):             1
NUMA 節點:         	1
供應商識別號:  		GenuineIntel
CPU 家族:          	6
型號:              	58
Model name:            Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
製程:              	9
CPU MHz:             	1924.316
CPU max MHz:           3100.0000
CPU min MHz:           1200.0000
BogoMIPS:              4988.73
虛擬:              	VT-x
L1d 快取:          	32K
L1i 快取:         	 32K
L2 快取:          	 256K
L3 快取:          	 3072K
NUMA node0 CPU(s):     0-3

需要一併列出硬體組態,特別是 processor model,這樣重現實驗才有參考價值 jserv
已補上 洪勤硯

預期目標

  • 學習第二週提及的 concurrency 程式設計
  • 學習 POSIX Thread
  • 學習效能分析工具
  • code refactoring 練習

refactoring

  • main.c 裡面有太多的 #ifndef 所以我們把一些可以合併的合併,比較好讀懂。

  • 加了一些註解,讓人比較容易讀懂程式

  • 修改變數名稱,不管是 function、variable、struct 有些不易了解,下面是有改過的部份

    • main.c
    int file_size;
    • phonebook_opt.c
    ​thread_info *new_thread(char *file_start, char *file_end, int t_id, int t_num, entry *start);
    • phonebook_opt.h
    typedef struct _thread_info { ​​​​ char *file_start; ​​​​ char *file_end; ​​​​ int t_id; ​​​​ int t_num; ​​​​ entry *entryStart; ​​​​ entry *pHead; ​​​​ entry *pLast;} thread_info;

案例分析: phonebook-concurent

在經過 TempoJiJi 的提示後,我們發現他的輸出結果並不正確,原因出在原本將各 thread 連接起來的時候並不完全

for (int i = 0; i < THREAD_NUM; i++) { if (i == 0) { pHead = app[i]->pHead->pNext; dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr); } else { etmp->pNext = app[i]->pHead->pNext; dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr); } etmp = app[i]->pLast; dprintf("Connect %d tail string %s %p\n", i, app[i]->pLast->lastName, app[i]->ptr); dprintf("round %d\n", i); }

在這裡 pHead = app[i]->pHead->pNext 跟 etmp->pNext = app[i]->pHead->pNext 直接連接到了 pHead->pNext , 直接忽略了 pHead 這個 node
所以我們修改了連接的點 pHead = app[i]->pHead 跟 etmp->pNext = app[i]->pHead

pthread

main.c

在 app[] 裡存了各個 thread 之後跑 append() 會用到的參數

// store the information for (int i = 0; i < THREAD_NUM; i++) app[i] = new_append_a(map + MAX_LAST_NAME_SIZE * i, map + fs, i, THREAD_NUM, entry_pool + i);

開始新增 thread 去做指定的 function

for (int i = 0; i < THREAD_NUM; i++) pthread_create( &tid[i], NULL, (void *) &append, (void *) app[i]);

等待所有新增的 thread 都完成後,才繼續往下做

for (int i = 0; i < THREAD_NUM; i++) pthread_join(tid[i], NULL);

現在這裡有 4 條 link-list 由不同 thread 所建造,分別把 他們頭尾相接連接成一條

for (int i = 0; i < THREAD_NUM; i++) { if (i == 0) { pHead = app[i]->pHead; dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr); } else { etmp->pNext = app[i]->pHead; dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr); } etmp = app[i]->pLast; dprintf("Connect %d tail string %s %p\n", i, app[i]->pLast->lastName, app[i]->ptr); dprintf("round %d\n", i); }

實驗

perf分析

orig

 Performance counter stats for './phonebook_orig' (10 runs):

         1,920,612      cache-misses              #   92.871 % of all cache refs      ( +-  4.35% )  (63.70%)
         2,068,037      cache-references                                              ( +-  3.72% )  (64.76%)
         2,646,716      L1-dcache-load-misses                                         ( +-  2.13% )  (67.71%)
           867,838      L1-dcache-store-misses                                        ( +-  3.13% )  (71.50%)
         1,309,519      L1-dcache-prefetch-misses                                     ( +-  6.16% )  (70.05%)
           147,668      L1-icache-load-misses                                         ( +- 11.52% )  (65.25%)

       0.075387760 seconds time elapsed                                          ( +-  7.41% )

opt

 Performance counter stats for './phonebook_opt' (10 runs):

           630,520      cache-misses              #   42.869 % of all cache refs      ( +-  8.88% )  (68.52%)
         1,470,812      cache-references                                              ( +-  4.00% )  (69.04%)
         2,813,932      L1-dcache-load-misses                                         ( +-  6.76% )  (69.69%)
           640,970      L1-dcache-store-misses                                        ( +- 15.76% )  (68.42%)
         1,791,537      L1-dcache-prefetch-misses                                     ( +-  8.39% )  (68.68%)
           571,735      L1-icache-load-misses                                         ( +-  2.34% )  (69.25%)

       0.124468458 seconds time elapsed                                          ( +- 15.39% )

Gnuplot

1 thread

2 thread

4 thread