2016q3 Homework3 (mergesort-concurrent)

contributed by <f5120125>

tags: sysprog

開發環境

Ubuntu 14.04 LTS

使用lscpu檢視電腦配備

  • CPU: Intel® Core™ i5 CPU 650 @ 3.20GHz × 4
  • Mem: 8 GiB
  • Cache:
    L1d cache: 32 KB
    L1i cache: 32 KB
    L2 cache: 256 KB
    L3 cache: 4096 KB

使用x86info -c去檢視更詳細資訊, 其中-c為cache的詳細資訊

Found 4 identical CPUs
Extended Family: 0 Extended Model: 2 Family: 6 Model: 37 Stepping: 5
Type: 0 (Original OEM)
CPU Model (x86info's best guess): Core i7 (Nehalem) [Clarkdale/Arrandale]
Processor name string (BIOS programmed): Intel® Core i5 CPU 650 @ 3.20GHz

Cache info
L1 Instruction cache: 32KB, 4-way associative. 64 byte line size.
L1 Data cache: 32KB, 8-way associative. 64 byte line size.
L2 (MLC): 256KB, 8-way associative. 64 byte line size.
TLB info
Instruction TLB: 2MB or 4MB pages, fully associative, 7 entries
Instruction TLB: 4K pages, 4-way associative, 64 entries.
Data TLB: 4KB or 4MB pages, fully associative, 32 entries.
Data TLB: 4KB pages, 4-way associative, 64 entries
Data TLB: 4K pages, 4-way associative, 512 entries.
Data TLB: 4KB or 4MB pages, fully associative, 32 entries.
Data TLB: 4KB pages, 4-way associative, 64 entries
64 byte prefetching.
Data TLB: 4K pages, 4-way associative, 512 entries.

Found unknown cache descriptors: e3
Total processor threads: 4
This system has 1 dual-core processor with hyper-threading (2 threads per core) running at an estimated 3.20GHz

前置作業

  • [ ]1. 使merge-sort-concurrency可以排序字串
  • [ ]2. Mutex Lock and Lock Contention
  • [ ]3. Thread Model
  • [ ]4. Code Review for mergesort-concurrent
  • [ ]5. Code Review for concurrent-ll

1. 使merge-sort-concurrency可以排序字串

使用$ uniq words.txt | sort -R > input.txt 產生亂序的words檔並導向input.txt

  • main.c中的讀檔
while( fgets(buff, MAX_LEN, fp) ) list_add(the_list, buff);
  • 將排序的結果輸出至txt檔
node_t *curr = the_list->head; FILE *new_fp; char *write_buff; new_fp = fopen( NEW_FILE_PATH, "w+"); while (curr ) { int len = strlen(curr->data); write_buff = (char *)malloc( len ); strncpy(write_buff, curr->data ,strlen(curr->data)); curr = curr->next; fwrite(write_buff, len, 1, new_fp); free(write_buff); } fclose(new_fp);
  • 比較輸出結果是否為sorted-list, 因此我寫了個check.c來確認, 並再執行後得到兩個txt檔為相同!
int main(int argc, char *const argv[]){ FILE *fp1, *fp2; fp1 = fopen(ORIG, "r");//讀取input.txt fp2 = fopen(ANS, "r");//讀取輸出的txt檔 char buf1[MAX_LEN]; char buf2[MAX_LEN]; while(fgets(buf1, MAX_LEN, fp1) && fgets(buf2, MAX_LEN, fp2) ){ if( strcmp(buf1, buf2) ){ printf("%s is not equal to %s\n", buf1, buf2); return 0; } } fclose(fp1); fclose(fp2); printf("equal!!\n"); return 0; }

2. Mutex Lock and Lock Contention

  • pthread_mutex_init
    • The pthread_mutex_init() function shall initialize the mutex referenced by mutex with attributes specified by attr. If attr is NULL, the default mutex attributes are used; the effect shall be the same as passing the address of a default mutex attributes object. Upon successful initialization, the state of the mutex becomes initialized and unlocked.
  • [pthread_mutex_lock]
  • [pthread_mutex_unlock]
  • pthread_cond_init
    • The pthread_cond_init() function shall initialize the condition variable referenced by cond with attributes referenced by attr. If attr is NULL, the default condition variable attributes shall be used; the effect is the same as passing the address of a default condition variable attributes object. Upon successful initialization, the state of the condition variable shall become initialized.

由於之前沒寫過mutex lock, 因此撰寫一份簡單的程式去測試mutex lock

#include <stdio.h> #include <stdlib.h>//malloc #include <limits.h>//INT_MAX ... #include <pthread.h>//pthread functions #define NUM_JOB 2 typedef struct{ int count; }doSomeThingArg; int id=0; pthread_mutex_t lock; void *doSomeThing(void* arg); int main(){ pthread_t tid[ NUM_JOB ]; doSomeThingArg* somePtr = (doSomeThingArg*)malloc( sizeof(doSomeThingArg) ); pthread_mutex_init(&lock, NULL);//use mutex_init before using mutex_lock for(int i=0; i<NUM_JOB; i++){ somePtr->count = i; pthread_create(&(tid[i]), NULL, &doSomeThing, somePtr); } /* somePtr->count = 100; printf("after create, before join: %d\n", somePtr->count); */ for(int i=0; i<NUM_JOB; i++){ pthread_join(tid[i], NULL); } somePtr->count = 100; printf("after create, after join: %d\n", somePtr->count); free(somePtr); return 0; } void *doSomeThing(void* arg){ pthread_mutex_lock(&lock); doSomeThingArg* somePtr = (doSomeThingArg*)arg; printf("Job %d started\n", id); printf("somePtr->count: %d\n", somePtr->count); for(int i=0; i<somePtr->count; i++); printf("Job %d ended\n", id); id++; pthread_mutex_unlock(&lock); return NULL; }
  • 若沒有使用pthread_mutex_lock

    • 使用mutrace去追蹤程式
      ​​hua@hua-ubuntu:~/Desktop$ mutrace ./mutex_test
      ​​mutrace: Application appears to be compiled without -rdynamic. It might be a
      ​​mutrace: good idea to recompile with -rdynamic enabled since this produces more
      ​​mutrace: useful stack traces.
      
      ​​mutrace: 0.2 sucessfully initialized for process mutex_test (pid 25109).
      ​​Job 0 started
      ​​somePtr->count: 1
      ​​Job 0 ended
      ​​Job 0 started
      ​​somePtr->count: 1
      ​​Job 1 ended
      ​​after create, after join: 100
      
      ​​mutrace: Showing statistics for process mutex_test (pid 25109).
      ​​mutrace: 1 mutexes used.
      
      ​​mutrace: No mutex contended according to filtering parameters.
      
      ​​mutrace: Total runtime is 0.505 ms.
      
      ​​mutrace: Results for SMP with 4 processors.
      
      
  • 使用mutex lock並用mutrace追蹤程式

    ​hua@hua-ubuntu:~/Desktop$ mutrace ./mutex_test
    ​mutrace: Application appears to be compiled without -rdynamic. It might be a
    ​mutrace: good idea to recompile with -rdynamic enabled since this produces more
    ​mutrace: useful stack traces.
    
    ​mutrace: 0.2 sucessfully initialized for process mutex_test (pid 25130).
    ​Job 0 started
    ​somePtr->count: 1
    ​Job 0 ended
    ​Job 1 started
    ​somePtr->count: 1
    ​Job 1 ended
    ​after create, after join: 100
    
    ​mutrace: Showing statistics for process mutex_test (pid 25130).
    ​mutrace: 1 mutexes used.
    
    ​mutrace: No mutex contended according to filtering parameters.
    
    ​mutrace: Total runtime is 0.519 ms.
    
    ​mutrace: Results for SMP with 4 processors.
    

3. Thread Models

digraph hierarchy {
    nodesep=0.5 // increases the separation between nodes
    node [color=black,fontname=Courier,shape=box] //All nodes will this shape and colour
    edge [color=black, style=dashed] // All the lines look like this
    Manager->{Worker1 Worker2 Worker3}
}

為了分配工作,在 worker thread 實作 Task 的機制。每個主要的操作會先被放進 task queue 裡頭,空閒的 thread 再從 task queue 裡頭提取 task 執行,只要把我們的操作寫成 task,就能順利執行。

digraph {
開始Process[shape="box", style=rounded];
是否為結束Task[shape="diamond"];
結束Process[shape="box", style=rounded];
開始Process->提取Task->是否為結束Task
是否為結束Task->重發結束Task[label="是"]
重發結束Task->結束Process
是否為結束Task->執行Task[label="否"]
執行Task->提取Task;
}

4. Code Review for mergesort-concurrent

list.[c,h]

注意 data type 的使用

  • stackoverflow - 什麼是 intptr_t

    • 此篇討論intptr_tvoid * 的比較, 由於 bitwise operator 不可以對void *做運算, 但是如果要對address作bitwise operator的運算可以宣告為intptr_t 型態
  • stackoverflow - 使用 intptr_t 資料型態的時機

    • pointer 轉成 integer 時,32 bits 或 64 bits 的 machine 會有 pointer 長度的差別。若使用到 intptr_t 就能免除麻煩的型態轉換。除此之外,也是確保型態轉換上不會出問題的作法
  • 以下為32-bit和64-bit機器的type size

    type 32-bit machine 64-bit machine
    char 1 1
    short 2 2
    int 4 4
    long 4 8
    long long 8 8
    pointer 4 8
  • 使用intptr_t的目的

    • 讓 node 中的資料可以是指標型態,也可以是一般資料型態。
    • 為了相容不同平台所設計。

threadpool.[c,h]

  • wiki - threadpool

    • 再前幾次作業中, 每執行短時間task時, 就建立thread, 但是當完成task後就進行銷毀, 而沒有有效利用thread的能力
    • 因此需要建立一個機制來有效利用thread, thread pool即是一個不錯的方法, 而任務排程以執行執行緒的常見方法是使用 同步佇列 (synchronized queue),稱作 任務佇列 (task queue), pool中的執行緒等待佇列中的任務,並把執行完的任務放入完成佇列中

    img

thread pool
typedef struct{ pthread_t* threads; uint32 count; tqueue_t* queue; }tpool_t;
  • pool中存有count個threads, 並且有個指向task queue的指針, 以利提領task
task
typedef struct _task{ void (*func)(void*); void *arg; struct _task *next; }task_t;
  • task中有指向void*型態的function, 其參數交由arg去指
task queue
typedef struct { task_t *head; task_t *tail; pthread_mutex_t mutex; pthread_cond_t cond; uint32_t size; uint32_t num_of_consumed; } tqueue_t;

merge_sort.[c,h]

llist_t *sort_n_merge(llist_t *a, llist_t *b); llist_t *split_n_merge(llist_t *list); llist_t *merge_sort(llist_t *list);

main.c

tpool_init(pool, thread_count, task_run);
tqueue_push(pool->queue, task_new(cut_local_list, the_list));

Code Review for concurrent-ll

Select a repo