contributed by <abba123
>
OS : ubuntu 16.04.1 LTS
lscpu
Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
每核心執行緒數: 2
每通訊端核心數: 2
Socket(s): 1
NUMA 節點: 1
供應商識別號: GenuineIntel
CPU 家族: 6
型號: 58
Model name: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
製程: 9
CPU MHz: 1924.316
CPU max MHz: 3100.0000
CPU min MHz: 1200.0000
BogoMIPS: 4988.73
虛擬: VT-x
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 3072K
NUMA node0 CPU(s): 0-3
需要一併列出硬體組態,特別是 processor model,這樣重現實驗才有參考價值 jserv
已補上 洪勤硯
main.c 裡面有太多的 #ifndef
所以我們把一些可以合併的合併,比較好讀懂。
加了一些註解,讓人比較容易讀懂程式
修改變數名稱,不管是 function、variable、struct 有些不易了解,下面是有改過的部份
int file_size;
thread_info *new_thread(char *file_start, char *file_end, int t_id, int t_num, entry *start);
typedef struct _thread_info {
char *file_start;
char *file_end;
int t_id;
int t_num;
entry *entryStart;
entry *pHead;
entry *pLast;
} thread_info;
在經過 TempoJiJi 的提示後,我們發現他的輸出結果並不正確,原因出在原本將各 thread 連接起來的時候並不完全
for (int i = 0; i < THREAD_NUM; i++) {
if (i == 0) {
pHead = app[i]->pHead->pNext;
dprintf("Connect %d head string %s %p\n", i,
app[i]->pHead->pNext->lastName, app[i]->ptr);
} else {
etmp->pNext = app[i]->pHead->pNext;
dprintf("Connect %d head string %s %p\n", i,
app[i]->pHead->pNext->lastName, app[i]->ptr);
}
etmp = app[i]->pLast;
dprintf("Connect %d tail string %s %p\n", i, app[i]->pLast->lastName, app[i]->ptr);
dprintf("round %d\n", i);
}
在這裡 pHead = app[i]->pHead->pNext 跟 etmp->pNext = app[i]->pHead->pNext 直接連接到了 pHead->pNext , 直接忽略了 pHead 這個 node
所以我們修改了連接的點 pHead = app[i]->pHead 跟 etmp->pNext = app[i]->pHead
在 app[] 裡存了各個 thread 之後跑 append() 會用到的參數
// store the information
for (int i = 0; i < THREAD_NUM; i++)
app[i] = new_append_a(map + MAX_LAST_NAME_SIZE * i,
map + fs, i,
THREAD_NUM, entry_pool + i);
開始新增 thread 去做指定的 function
for (int i = 0; i < THREAD_NUM; i++)
pthread_create( &tid[i], NULL, (void *) &append, (void *) app[i]);
等待所有新增的 thread 都完成後,才繼續往下做
for (int i = 0; i < THREAD_NUM; i++)
pthread_join(tid[i], NULL);
現在這裡有 4 條 link-list 由不同 thread 所建造,分別把 他們頭尾相接連接成一條
for (int i = 0; i < THREAD_NUM; i++) {
if (i == 0) {
pHead = app[i]->pHead;
dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr);
} else {
etmp->pNext = app[i]->pHead;
dprintf("Connect %d head string %s %p\n", i, app[i]->pHead->pNext->lastName, app[i]->ptr);
}
etmp = app[i]->pLast;
dprintf("Connect %d tail string %s %p\n", i, app[i]->pLast->lastName, app[i]->ptr);
dprintf("round %d\n", i);
}
orig
Performance counter stats for './phonebook_orig' (10 runs):
1,920,612 cache-misses # 92.871 % of all cache refs ( +- 4.35% ) (63.70%)
2,068,037 cache-references ( +- 3.72% ) (64.76%)
2,646,716 L1-dcache-load-misses ( +- 2.13% ) (67.71%)
867,838 L1-dcache-store-misses ( +- 3.13% ) (71.50%)
1,309,519 L1-dcache-prefetch-misses ( +- 6.16% ) (70.05%)
147,668 L1-icache-load-misses ( +- 11.52% ) (65.25%)
0.075387760 seconds time elapsed ( +- 7.41% )
opt
Performance counter stats for './phonebook_opt' (10 runs):
630,520 cache-misses # 42.869 % of all cache refs ( +- 8.88% ) (68.52%)
1,470,812 cache-references ( +- 4.00% ) (69.04%)
2,813,932 L1-dcache-load-misses ( +- 6.76% ) (69.69%)
640,970 L1-dcache-store-misses ( +- 15.76% ) (68.42%)
1,791,537 L1-dcache-prefetch-misses ( +- 8.39% ) (68.68%)
571,735 L1-icache-load-misses ( +- 2.34% ) (69.25%)
0.124468458 seconds time elapsed ( +- 15.39% )
1 thread
2 thread
4 thread