多處理機平行程式設計HW5

# 多處理機平行程式設計HW5 [quest](https://hackmd.io/@joey3639570/H1xxTgUFK) ## Problem 1 ### Question #### 1 `temp` shoudle be shared,because each thread has to insert the result . `count` should be private , because count should not be modified by others `i` can be shared or private because it remain constant during sorting `j` should be privated , because it iteration through the list for comparing #### 2 No ,there isn't . Each loop calculate the count from 0 . Since the `temp` table won't encounter collision #### 3 Yes , but we have to calculate the range first .If the array isn't so large , it is inefficiency . #### 4 On the server machine Execution : ``` ./hw51 [sizeof array] [worker_n] ``` #### 5 * **serial** ![](https://i.imgur.com/3tL9H7K.png) * **with 4 threads** ![](https://i.imgur.com/uXEIbId.png) * **qsort** ![](https://i.imgur.com/6Zj0XAE.png) It is obvious that qsort is faster than Count-sort , becasue the time complexity is O(n) for Count-sort , but O(log(n)) for qsort .Although we runnning with mutithread , the search cost dones't decreased . When parallelized with 4 threads , our speedup is 1.19 , which is 20% faster than serial one . ## Problem 2 To parallelized the word-count program , we use an MPMC model .Each producer parse the token `word` out ,and put it enqueue .Each cosumer dequeue a `word` ,and insert into the hash table database . ### Usage * **compile** `gcc word_count.c -o word_count -std=c99 -g -Wall -fopenmp` * **execution** `./word_count [number of file] []file name 1] [file name 2] [worker number] [producer number]` * **example** ![](https://i.imgur.com/H8HeqZm.png) ### Flow Chart ```graphviz Digraph G{ ".txt file"->"producer1"[label = "fetch"] "producer1"->"job queue" ".txt file"->"producer2"[label = "fetch"] "producer2"->"job queue" ".txt file"->"producer3"[label = "fetch"] "producer3"->"job queue" ".txt file"->"producer4"[label = "fetch"] "producer4"->"job queue" "job queue"->consumer1 consumer1->"Hash Table"[label = "insert"] "job queue"->consumer2 consumer2->"Hash Table"[label = "insert"] } ``` ### Jobqueue The structure of the jobqueue is circular ring buffer . It is a amazing struture .First , the operation is simple ,which reduce the time in critical section .Second , we only has to take care of `in` and `out` index for producer and consumer . In this implementation , we reserve the position for producer and consumer to do the operation .We use busy waiting to check if the target position is readable or writable .Also we use a very large queue to reduce the cost of handling the `full` and `empty` situation . #### Flow Chart ```graphviz Digraph G{ "queue" ->"Take a position"->"check if valid" "check if valid"->"do the oparation"[label = "True"] "check if valid"->"check if valid"[ label="False" ] } ``` #### Main Code * **Dequeue** ```clike dic * dequeue(){ int pos ; omp_set_lock(&(q->w)) ; pos = q->out ++ ; omp_unset_lock(&(q->w)) ; if(stop < pn){ while(q->table[pos].words == NULL){ if(stop == pn) break ; }; } dic * tmp = &(q->table[pos]) ; return tmp ; } ``` * **Enqueue** ```clike int enqueue(char ** target , int size){ int pos ; omp_set_lock(&(q->r)) ; pos = q->in ++ ; omp_unset_lock(&(q->r)) ; q->table[pos].size = size ; q->table[pos].words = target ; return 1 ; } ``` #### Thread Safety Since the only critical section is operations on `in` nad `out` , the queue is thread safety .To prevent from over lapping , we specific the queue size be `worker_n + producer_n` which ensure the accurracay when all worker has reserved the position . Since the producer job will finished first , `stop` is used for breaking out the busy waiting loop in consumer . ### File Input When writing parallelized program , an array is always useful since the interval is easy to determined . So we use `mmap` , `open` and `lseek` these system call to map the file into an array . #### System Call * open Open a file descripter refer to the file * lseek Move the fd to the specific position , `SEEK_END` means seeks to the end of file .`lseek` returns the bytes go through , * mmap Maps the file into array ,the file has to be specified with fd , file_size ,and beginning offset . #### Main Code ```clike int get_file(char * file_name){ fd = open(file_name , O_RDONLY) ; if(fd == -1) return 0; fsize = lseek(fd , 0 ,SEEK_END) ; if(fsize < 0) return 0; file = mmap(NULL ,fsize,PROT_READ ,MAP_SHARED,fd ,0) ; if(!file) return 0 ; slice = fsize / pn ; return 1 ; } ``` ### Performance Source file : lyrics of a [song](https://www.youtube.com/watch?v=dQw4w9WgXcQ) #### Serial , 1 file ![](https://i.imgur.com/dFtkYgL.png) #### Producer = 1 ,Cosumer = 1 ,1 file ![](https://i.imgur.com/F4a7wOv.png) #### Producer = 1 ,Cosumer = 2 ,1 file ![](https://i.imgur.com/MtIfUMH.png) #### Producer = 2 ,Cosumer = 2 ,1 file ![](https://i.imgur.com/jjU7FvD.png) Among all test , the performance is really bad ! Only the 1-1 model has a 10% speed-up . I guess the reason is the fread time cost was saved by mmap , which cost the most heavy part be very light .And the insertion has to do `lock` and `busy waiting` operation , even when poping out the queue has to `lock` , too .The total time cost will be `serial`/ `threadn` + `busy` + `lock` . I think if I implement the program by fread or without single safety queue , it will be much faster !