# 2017q1 Homework4 (mergesort-concurrent)
contributed by <`twzjwang`>
作業說明: [B08: mergesort-concurrent](https://hackmd.io/s/B1xV_p_jl)
github: [twzjwang/mergesort-concurrent](https://github.com/twzjwang/mergesort-concurrent)
# 開發環境
- Ubuntu Ubuntu 16.04.2 LTS
Linux 4.8.0-36-generic
- cpu
version: Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
- memory
size: 4GiB
# 開發前
- 依照作業說明執行
```
$ (for i in {1..8}; do echo $RANDOM; done) | ./sort 4 8
Segmentation fault (core dumped)
```
- `argv[2]` 為 `build_list_from_file` 的檔名,而不是先的 `input_count`
- fopen 失敗
- 執行 `$ uniq test_data/words.txt | sort -R > test_data/word_sorted.txt` 取得亂數排列的檔案
- 根據 Manual
- NAME : `sort - sort lines of text files `
- DESCRIPTION :
`--sort=WORD sort according to WORD: random -R`
- 執行 `$ ./sort 4 test_data/word_sorted.txt`
- 結果
```
#Total_tasks_consumed: 12
#Elapsed_time: 51.829 ms
#Throughput: 231 (per sec)
```
- list_print
```
0
...
0
```
- 字串經 [atol](http://www.cplusplus.com/reference/cstdlib/atol/) 轉換回傳 long int
- 轉換結果皆為0
# 修改-使用字串排列
- 定義 datatype 為大小 16 的char array
- `typedef char val_t[16];`
- 使用 strcmp 比大小`llist_t *small = (strcmp(a->head->data, b->head->data) <= 0) ? a : b;`
- 自動測試
- `make check_words`
```
check_words: sort genData
./sort $(THREADS) test_data/input.txt output.txt
diff output.txt test_data/expected.txt
```
- diff 沒有輸出 : 內容完全相同
- 結果
```
$ make check
uniq test_data/words.txt | sort -R > test_data/input.txt
uniq test_data/words.txt | sort > test_data/expected.txt
./sort 4 test_data/input.txt output.txt
#Total_tasks_consumed: 12
#Elapsed_time: 150.177 ms
#Throughput: 79 (per sec)
diff output.txt test_data/expected.txt
```
- mutrace
- Lock Contention
- lock contention can have a big impact on the runtime behaviour of applications
- 項目
- Locked : 整個 runtime 中 mutex locked 的次數
- Changed : 擁有 mutex 的 thread 改變的次數
- 數量高 => the risk of contention 高
- Cont : contention 發生次數
- 想要取得 lock 但已經先被其他 thread 取得,必須等待
- tot.Time[ms] : 等待的總時間
- avg.Time[ms] : 平均等待時間
- max.Time[ms] : 最大等待時間
- 結果
- Mutex #1 發生大量 contention 並造成等待
- 使用 `addr2line` 找到對應的位置,推測
- Mutex #0
- Mutex #1 : the_queue->mutex
- Mutex #2 : data_context.mutex
```
$ mutrace ./sort 4 test_data/input.txt output.txt
mutrace: 0.2 sucessfully initialized for process sort (pid 3788).
#Total_tasks_consumed: 12
#Elapsed_time: 129.242 ms
#Throughput: 92 (per sec)
mutrace: Showing statistics for process sort (pid 3788).
mutrace: 3 mutexes used.
Mutex #1 (0x0x1fba670) first referenced by:
/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fbca7b354b2]
./sort(tqueue_init+0x46) [0x4014eb]
./sort(tpool_init+0x6a) [0x40174f]
./sort(main+0xe5) [0x401ee4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fbca756c830]
Mutex #0 (0x0x7fbca513b860) first referenced by:
/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x49) [0x7fbca7b356b9]
/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7fbca4f37fec]
[(nil)]
Mutex #2 (0x0x603140) first referenced by:
/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fbca7b354b2]
./sort(main+0xab) [0x401eaa]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fbca756c830]
mutrace: Showing 3 most contended mutexes:
Mutex # Locked Changed Cont. tot.Time[ms] avg.Time[ms] max.Time[ms] Flags
1 83875 31938 25850 11.116 0.000 0.019 Mx.--.
0 20 15 8 0.005 0.000 0.001 M-.--.
2 6 4 0 0.002 0.000 0.001 M-.--.
||||||
/|||||
Object: M = Mutex, W = RWLock /||||
State: x = dead, ! = inconsistent /|||
Use: R = used in realtime thread /||
Mutex Type: r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
Mutex Protocol: i = INHERIT, p = PROTECT /
RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC
mutrace: Note that the flags column R is only valid in --track-rt mode!
mutrace: Total runtime is 200.374 ms.
mutrace: Results for SMP with 4 processors.
```
:::danger
特別留意 lock contention 對效能的影響,目前整合的 thread pool 實作不是很有效率 (之後我們應該要重寫一套更好的) --jserv
:::
# 修改-[lock free](https://en.wikipedia.org/wiki/Non-blocking_algorithm#Lock-freedom)
- lock free
- individual threads 可能發生 starve ,但保證 system-wide throughput
- All wait-free algorithms are lock-free
- lock-free 演算法的四個 phases
- 完成自己的 operation
- 協助阻塞的 operation
- 中止阻塞的 operation
- 等待
- 針對 lock contention 次數較多的 Mutex #1 : the_queue->mutex
- crtitical section
- pthread_mutex_lock() 和 pthread_mutex_unlock() 之間
-
```C=
task_t *tqueue_pop(tqueue_t * const the_queue)
{
task_t *ret;
pthread_mutex_lock(&(the_queue->mutex));
ret = the_queue->head;
if (ret) {
the_queue->head = ret->next;
ret->next = NULL;
if (!(the_queue->head)) {
the_queue->tail = NULL;
}
the_queue->size--;
the_queue->num_of_consumed++;
}
pthread_mutex_unlock(&(the_queue->mutex));
return ret;
}
uint32_t tqueue_size(tqueue_t * const the_queue)
{
uint32_t ret;
pthread_mutex_lock(&(the_queue->mutex));
ret = the_queue->size;
pthread_mutex_unlock(&(the_queue->mutex));
return ret;
}
int tqueue_push(tqueue_t * const the_queue, task_t *task)
{
pthread_mutex_lock(&(the_queue->mutex));
task->next = NULL;
if (the_queue->tail)
the_queue->tail->next = task;
the_queue->tail = task;
if (the_queue->size++ == 0)
the_queue->head = task;
pthread_mutex_unlock(&(the_queue->mutex));
return 0;
}
```
# 參考資料
- [LanKuDot 學長 筆記](https://hackmd.io/s/S1ezGhIA)
- [Measuring Lock Contention](http://0pointer.de/blog/projects/mutrace.html)
###### tags: `twzjwang`