owned this note
owned this note
Published
Linked with GitHub
<style>
h2.part{color:#0099B0;}
h3.part{color:#D92424;}
h4.part{color:#005BB0;}
h5.part{color:#FD6F0A;}
h6.part{color:#4400B0;}
</style>
# 2016q3 Homework3 (mergesort-concurrent)
contributed by <`f5120125`>
###### tags: `sysprog`
## 開發環境
#### Ubuntu 14.04 LTS
使用```lscpu```檢視電腦配備
- CPU: Intel® Core™ i5 CPU 650 @ 3.20GHz × 4
- Mem: 8 GiB
- Cache:
L1d cache: 32 KB
L1i cache: 32 KB
L2 cache: 256 KB
L3 cache: 4096 KB
-------------------------------
使用```x86info -c```去檢視更詳細資訊, 其中```-c```為cache的詳細資訊
:::info
Found 4 identical CPUs
Extended Family: 0 Extended Model: 2 Family: 6 Model: 37 Stepping: 5
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core i7 (Nehalem) [Clarkdale/Arrandale]
Processor name string (BIOS programmed): Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz
>>Cache info
L1 Instruction cache: 32KB, 4-way associative. 64 byte line size.
L1 Data cache: 32KB, 8-way associative. 64 byte line size.
L2 (MLC): 256KB, 8-way associative. 64 byte line size.
TLB info
Instruction TLB: 2MB or 4MB pages, fully associative, 7 entries
Instruction TLB: 4K pages, 4-way associative, 64 entries.
Data TLB: 4KB or 4MB pages, fully associative, 32 entries.
Data TLB: 4KB pages, 4-way associative, 64 entries
Data TLB: 4K pages, 4-way associative, 512 entries.
Data TLB: 4KB or 4MB pages, fully associative, 32 entries.
Data TLB: 4KB pages, 4-way associative, 64 entries
64 byte prefetching.
Data TLB: 4K pages, 4-way associative, 512 entries.
Found unknown cache descriptors: e3
Total processor threads: 4
This system has 1 dual-core processor with hyper-threading (2 threads per core) running at an estimated 3.20GHz
:::
## 前置作業
- [ ]1. 使merge-sort-concurrency可以排序字串
- [ ]2. Mutex Lock and Lock Contention
- [ ]3. Thread Model
- [ ]4. Code Review for mergesort-concurrent
- [ ]5. Code Review for concurrent-ll
### 1. 使merge-sort-concurrency可以排序字串
使用```$ uniq words.txt | sort -R > input.txt``` 產生亂序的words檔並導向input.txt
- main.c中的讀檔
```clike=
while( fgets(buff, MAX_LEN, fp) )
list_add(the_list, buff);
```
- 將排序的結果輸出至txt檔
```clike=
node_t *curr = the_list->head;
FILE *new_fp;
char *write_buff;
new_fp = fopen( NEW_FILE_PATH, "w+");
while (curr ) {
int len = strlen(curr->data);
write_buff = (char *)malloc( len );
strncpy(write_buff, curr->data ,strlen(curr->data));
curr = curr->next;
fwrite(write_buff, len, 1, new_fp);
free(write_buff);
}
fclose(new_fp);
```
- 比較輸出結果是否為sorted-list, 因此我寫了個check.c來確認, 並再執行後得到兩個txt檔為相同!
```clike=
int main(int argc, char *const argv[]){
FILE *fp1, *fp2;
fp1 = fopen(ORIG, "r");//讀取input.txt
fp2 = fopen(ANS, "r");//讀取輸出的txt檔
char buf1[MAX_LEN];
char buf2[MAX_LEN];
while(fgets(buf1, MAX_LEN, fp1) && fgets(buf2, MAX_LEN, fp2) ){
if( strcmp(buf1, buf2) ){
printf("%s is not equal to %s\n", buf1, buf2);
return 0;
}
}
fclose(fp1);
fclose(fp2);
printf("equal!!\n");
return 0;
}
```
### 2. Mutex Lock and Lock Contention
- [pthread_mutex_init](https://linux.die.net/man/3/pthread_mutex_init)
- The pthread_mutex_init() function shall initialize the mutex referenced by mutex with attributes specified by attr. If attr is NULL, the default mutex attributes are used; the effect shall be the same as passing the address of a default mutex attributes object. Upon successful initialization, the state of the mutex becomes initialized and unlocked.
- [pthread_mutex_lock]
- [pthread_mutex_unlock]
- [pthread_cond_init](https://linux.die.net/man/3/pthread_cond_init)
- The pthread_cond_init() function shall initialize the condition variable referenced by cond with attributes referenced by attr. If attr is NULL, the default condition variable attributes shall be used; the effect is the same as passing the address of a default condition variable attributes object. Upon successful initialization, the state of the condition variable shall become initialized.
#### 由於之前沒寫過mutex lock, 因此撰寫一份簡單的程式去測試mutex lock
```clike=
#include <stdio.h>
#include <stdlib.h>//malloc
#include <limits.h>//INT_MAX ...
#include <pthread.h>//pthread functions
#define NUM_JOB 2
typedef struct{
int count;
}doSomeThingArg;
int id=0;
pthread_mutex_t lock;
void *doSomeThing(void* arg);
int main(){
pthread_t tid[ NUM_JOB ];
doSomeThingArg* somePtr = (doSomeThingArg*)malloc( sizeof(doSomeThingArg) );
pthread_mutex_init(&lock, NULL);//use mutex_init before using mutex_lock
for(int i=0; i<NUM_JOB; i++){
somePtr->count = i;
pthread_create(&(tid[i]), NULL, &doSomeThing, somePtr);
}
/*
somePtr->count = 100;
printf("after create, before join: %d\n", somePtr->count);
*/
for(int i=0; i<NUM_JOB; i++){
pthread_join(tid[i], NULL);
}
somePtr->count = 100;
printf("after create, after join: %d\n", somePtr->count);
free(somePtr);
return 0;
}
void *doSomeThing(void* arg){
pthread_mutex_lock(&lock);
doSomeThingArg* somePtr = (doSomeThingArg*)arg;
printf("Job %d started\n", id);
printf("somePtr->count: %d\n", somePtr->count);
for(int i=0; i<somePtr->count; i++);
printf("Job %d ended\n", id);
id++;
pthread_mutex_unlock(&lock);
return NULL;
}
```
- 若沒有使用pthread_mutex_lock
- 使用mutrace去追蹤程式
```
hua@hua-ubuntu:~/Desktop$ mutrace ./mutex_test
mutrace: Application appears to be compiled without -rdynamic. It might be a
mutrace: good idea to recompile with -rdynamic enabled since this produces more
mutrace: useful stack traces.
mutrace: 0.2 sucessfully initialized for process mutex_test (pid 25109).
Job 0 started
somePtr->count: 1
Job 0 ended
Job 0 started
somePtr->count: 1
Job 1 ended
after create, after join: 100
mutrace: Showing statistics for process mutex_test (pid 25109).
mutrace: 1 mutexes used.
mutrace: No mutex contended according to filtering parameters.
mutrace: Total runtime is 0.505 ms.
mutrace: Results for SMP with 4 processors.
```
:::info
:::
- 使用mutex lock並用mutrace追蹤程式
```
hua@hua-ubuntu:~/Desktop$ mutrace ./mutex_test
mutrace: Application appears to be compiled without -rdynamic. It might be a
mutrace: good idea to recompile with -rdynamic enabled since this produces more
mutrace: useful stack traces.
mutrace: 0.2 sucessfully initialized for process mutex_test (pid 25130).
Job 0 started
somePtr->count: 1
Job 0 ended
Job 1 started
somePtr->count: 1
Job 1 ended
after create, after join: 100
mutrace: Showing statistics for process mutex_test (pid 25130).
mutrace: 1 mutexes used.
mutrace: No mutex contended according to filtering parameters.
mutrace: Total runtime is 0.519 ms.
mutrace: Results for SMP with 4 processors.
```
:::info
:::
### 3. [Thread Models](http://maxim.int.ru/bookshelf/PthreadsProgram/htm/r_19.html)
```graphviz
digraph hierarchy {
nodesep=0.5 // increases the separation between nodes
node [color=black,fontname=Courier,shape=box] //All nodes will this shape and colour
edge [color=black, style=dashed] // All the lines look like this
Manager->{Worker1 Worker2 Worker3}
}
```
為了分配工作,在 worker thread 實作 Task 的機制。每個主要的操作會先被放進 task queue 裡頭,空閒的 thread 再從 task queue 裡頭提取 task 執行,只要把我們的操作寫成 task,就能順利執行。
```graphviz
digraph {
開始Process[shape="box", style=rounded];
是否為結束Task[shape="diamond"];
結束Process[shape="box", style=rounded];
開始Process->提取Task->是否為結束Task
是否為結束Task->重發結束Task[label="是"]
重發結束Task->結束Process
是否為結束Task->執行Task[label="否"]
執行Task->提取Task;
}
```
### 4. Code Review for mergesort-concurrent
#### list.[c,h]
:::info
注意 data type 的使用
:::
- [stackoverflow - 什麼是 intptr_t](http://stackoverflow.com/questions/35071200/what-is-the-use-of-intptr-t)
- 此篇討論```intptr_t```和```void *``` 的比較, 由於 ==*bitwise operator*== 不可以對```void *```做運算, 但是如果要對address作bitwise operator的運算可以宣告為```intptr_t``` 型態
- [stackoverflow - 使用 intptr_t 資料型態的時機](http://stackoverflow.com/questions/6326338/why-when-to-use-intptr-t-for-type-casting-in-c)
- pointer 轉成 integer 時,32 bits 或 64 bits 的 machine 會有 pointer 長度的差別。若使用到 `intptr_t` 就能免除麻煩的型態轉換。除此之外,也是確保型態轉換上不會出問題的作法
- 以下為32-bit和64-bit機器的type size
| type | 32-bit machine | 64-bit machine |
|:----------|:---------------| :---------------|
| char | 1 | 1 |
| short | 2 | 2 |
| int | 4 | 4 |
| long | 4 | 8 |
| long long | 8 | 8 |
| pointer | 4 | 8 |
- 使用```intptr_t```的目的
- 讓 node 中的資料可以是指標型態,也可以是一般資料型態。
- 為了相容不同平台所設計。
#### threadpool.[c,h]
- [wiki - threadpool](https://en.wikipedia.org/wiki/Thread_pool)
- 再前幾次作業中, 每執行短時間task時, 就建立thread, 但是當完成task後就進行銷毀, 而沒有有效利用thread的能力
- 因此需要建立一個機制來有效利用thread, thread pool即是一個不錯的方法, 而任務排程以執行執行緒的常見方法是使用 ==**同步佇列**== (synchronized queue),稱作 ==**任務佇列**== (task queue), pool中的執行緒等待佇列中的任務,並把執行完的任務放入完成佇列中
![img](http://i.imgur.com/fT4IBV5.png)
##### thread pool
```clike=
typedef struct{
pthread_t* threads;
uint32 count;
tqueue_t* queue;
}tpool_t;
```
- pool中存有```count```個threads, 並且有個指向task queue的指針, 以利提領task
##### task
```clike=
typedef struct _task{
void (*func)(void*);
void *arg;
struct _task *next;
}task_t;
```
- task中有指向```void*```型態的function, 其參數交由```arg```去指
##### task queue
```clike=
typedef struct {
task_t *head;
task_t *tail;
pthread_mutex_t mutex;
pthread_cond_t cond;
uint32_t size;
uint32_t num_of_consumed;
} tqueue_t;
```
#### merge_sort.[c,h]
```clike=
llist_t *sort_n_merge(llist_t *a, llist_t *b);
llist_t *split_n_merge(llist_t *list);
llist_t *merge_sort(llist_t *list);
```
#### main.c
```clike=
tpool_init(pool, thread_count, task_run);
```
```clike=
tqueue_push(pool->queue, task_new(cut_local_list, the_list));
```
### Code Review for concurrent-ll
- [ ][Compare and Swap](https://en.wikipedia.org/wiki/Compare-and-swap)
- [ ][Towards Concurrency](https://hackmd.io/s/Skh_AaVix)