2016q3 Homework2 (phonebook-concurent)

contributed by <abba123>

開發環境

OS : ubuntu 16.04.1 LTS

lscpu

Architecture:          x86_64
CPU 作業模式：    	32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
每核心執行緒數：		2
每通訊端核心數：		2
Socket(s):             1
NUMA 節點：         	1
供應商識別號：  		GenuineIntel
CPU 家族：          	6
型號：              	58
Model name:            Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
製程：              	9
CPU MHz：             	1924.316
CPU max MHz:           3100.0000
CPU min MHz:           1200.0000
BogoMIPS:              4988.73
虛擬：              	VT-x
L1d 快取：          	32K
L1i 快取：         	 32K
L2 快取：          	 256K
L3 快取：          	 3072K
NUMA node0 CPU(s):     0-3

需要一併列出硬體組態，特別是 processor model，這樣重現實驗才有參考價值 jserv
已補上洪勤硯

預期目標

學習第二週提及的 concurrency 程式設計
學習 POSIX Thread
學習效能分析工具
code refactoring 練習

refactoring

main.c 裡面有太多的 #ifndef 所以我們把一些可以合併的合併，比較好讀懂。
加了一些註解，讓人比較容易讀懂程式

修改變數名稱，不管是 function、variable、struct 有些不易了解，下面是有改過的部份

main.c


int file_size;

phonebook_opt.c


thread_info *new_thread(char *file_start, char *file_end, int t_id, int t_num, entry *start);

phonebook_opt.h









typedef struct _thread_info {
	char *file_start;
	char *file_end;
	int t_id;
	int t_num;
	entry *entryStart;
	entry *pHead;
	entry *pLast;
} thread_info;

案例分析: phonebook-concurent

在經過 TempoJiJi 的提示後，我們發現他的輸出結果並不正確，原因出在原本將各 thread 連接起來的時候並不完全














for (int i = 0; i < THREAD_NUM; i++) {
    if (i == 0) {
        pHead = app[i]->pHead->pNext;     
        dprintf("Connect %d head string %s %p\n", i,
                    app[i]->pHead->pNext->lastName, app[i]->ptr);
    } else {
        etmp->pNext = app[i]->pHead->pNext;
        dprintf("Connect %d head string %s %p\n", i,
                    app[i]->pHead->pNext->lastName, app[i]->ptr);
    }
    etmp = app[i]->pLast;
    dprintf("Connect %d tail string %s %p\n", i, app[i]->pLast->lastName, app[i]->ptr);
    dprintf("round %d\n", i);
}

在這裡 pHead = app[i]->pHead->pNext 跟 etmp->pNext = app[i]->pHead->pNext 直接連接到了 pHead->pNext ，直接忽略了 pHead 這個 node
所以我們修改了連接的點 pHead = app[i]->pHead 跟 etmp->pNext = app[i]->pHead

pthread

main.c

在 app[] 裡存了各個 thread 之後跑 append() 會用到的參數





// store the information
for (int i = 0; i < THREAD_NUM; i++)
    app[i] = new_append_a(map + MAX_LAST_NAME_SIZE * i,
                          map + fs, i,
                          THREAD_NUM, entry_pool + i);

開始新增 thread 去做指定的 function


for (int i = 0; i < THREAD_NUM; i++)
        pthread_create( &tid[i], NULL, (void *) &append, (void *) app[i]);

等待所有新增的 thread 都完成後，才繼續往下做


    for (int i = 0; i < THREAD_NUM; i++)
        pthread_join(tid[i], NULL);

現在這裡有 4 條 link-list 由不同 thread 所建造，分別把他們頭尾相接連接成一條












for (int i = 0; i < THREAD_NUM; i++) {
        if (i == 0) {
            pHead = app[i]->pHead;
            dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr);
        } else {
            etmp->pNext = app[i]->pHead;
            dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr);
        }
        etmp = app[i]->pLast;
        dprintf("Connect %d tail string %s %p\n", i, app[i]->pLast->lastName, app[i]->ptr);
        dprintf("round %d\n", i);
    }

實驗

perf分析

orig

 Performance counter stats for './phonebook_orig' (10 runs):

         1,920,612      cache-misses              #   92.871 % of all cache refs      ( +-  4.35% )  (63.70%)
         2,068,037      cache-references                                              ( +-  3.72% )  (64.76%)
         2,646,716      L1-dcache-load-misses                                         ( +-  2.13% )  (67.71%)
           867,838      L1-dcache-store-misses                                        ( +-  3.13% )  (71.50%)
         1,309,519      L1-dcache-prefetch-misses                                     ( +-  6.16% )  (70.05%)
           147,668      L1-icache-load-misses                                         ( +- 11.52% )  (65.25%)

       0.075387760 seconds time elapsed                                          ( +-  7.41% )

opt

 Performance counter stats for './phonebook_opt' (10 runs):

           630,520      cache-misses              #   42.869 % of all cache refs      ( +-  8.88% )  (68.52%)
         1,470,812      cache-references                                              ( +-  4.00% )  (69.04%)
         2,813,932      L1-dcache-load-misses                                         ( +-  6.76% )  (69.69%)
           640,970      L1-dcache-store-misses                                        ( +- 15.76% )  (68.42%)
         1,791,537      L1-dcache-prefetch-misses                                     ( +-  8.39% )  (68.68%)
           571,735      L1-icache-load-misses                                         ( +-  2.34% )  (69.25%)

       0.124468458 seconds time elapsed                                          ( +- 15.39% )

Gnuplot

1 thread

2 thread

4 thread