# 多處理機平行程式設計HW5
[quest](https://hackmd.io/@joey3639570/H1xxTgUFK)
## Problem 1
### Question
#### 1
`temp` shoudle be shared,because each thread has to insert the result .
`count` should be private , because count should not be modified by others
`i` can be shared or private because it remain constant during sorting
`j` should be privated , because it iteration through the list for comparing
#### 2
No ,there isn't . Each loop calculate the count from 0 . Since the `temp` table won't encounter collision
#### 3
Yes , but we have to calculate the range first .If the array isn't so large , it is inefficiency .
#### 4
On the server machine
Execution :
```
./hw51 [sizeof array] [worker_n]
```
#### 5
* **serial**

* **with 4 threads**

* **qsort**

It is obvious that qsort is faster than Count-sort , becasue the time complexity is O(n) for Count-sort , but O(log(n)) for qsort .Although we runnning with mutithread , the search cost dones't decreased .
When parallelized with 4 threads , our speedup is 1.19 , which is 20% faster than serial one .
## Problem 2
To parallelized the word-count program , we use an MPMC model .Each producer parse the token `word` out ,and put it enqueue .Each cosumer dequeue a `word` ,and insert into the hash table database .
### Usage
* **compile**
`gcc word_count.c -o word_count -std=c99 -g -Wall -fopenmp`
* **execution**
`./word_count [number of file] []file name 1] [file name 2] [worker number] [producer number]`
* **example**

### Flow Chart
```graphviz
Digraph G{
".txt file"->"producer1"[label = "fetch"]
"producer1"->"job queue"
".txt file"->"producer2"[label = "fetch"]
"producer2"->"job queue"
".txt file"->"producer3"[label = "fetch"]
"producer3"->"job queue"
".txt file"->"producer4"[label = "fetch"]
"producer4"->"job queue"
"job queue"->consumer1
consumer1->"Hash Table"[label = "insert"]
"job queue"->consumer2
consumer2->"Hash Table"[label = "insert"]
}
```
### Jobqueue
The structure of the jobqueue is circular ring buffer . It is a amazing struture .First , the operation is simple ,which reduce the time in critical section .Second , we only has to take care of `in` and `out` index for producer and consumer .
In this implementation , we reserve the position for producer and consumer to do the operation .We use busy waiting to check if the target position is readable or writable .Also we use a very large queue to reduce the cost of handling the `full` and `empty` situation .
#### Flow Chart
```graphviz
Digraph G{
"queue" ->"Take a position"->"check if valid"
"check if valid"->"do the oparation"[label = "True"]
"check if valid"->"check if valid"[ label="False" ]
}
```
#### Main Code
* **Dequeue**
```clike
dic * dequeue(){
int pos ;
omp_set_lock(&(q->w)) ;
pos = q->out ++ ;
omp_unset_lock(&(q->w)) ;
if(stop < pn){
while(q->table[pos].words == NULL){
if(stop == pn)
break ;
};
}
dic * tmp = &(q->table[pos]) ;
return tmp ;
}
```
* **Enqueue**
```clike
int enqueue(char ** target , int size){
int pos ;
omp_set_lock(&(q->r)) ;
pos = q->in ++ ;
omp_unset_lock(&(q->r)) ;
q->table[pos].size = size ;
q->table[pos].words = target ;
return 1 ;
}
```
#### Thread Safety
Since the only critical section is operations on `in` nad `out` , the queue is thread safety .To prevent from over lapping , we specific the queue size be `worker_n + producer_n` which ensure the accurracay when all worker has reserved the position .
Since the producer job will finished first , `stop` is used for breaking out the busy waiting loop in consumer .
### File Input
When writing parallelized program , an array is always useful since the interval is easy to determined . So we use `mmap` , `open` and `lseek` these system call to map the file into an array .
#### System Call
* open
Open a file descripter refer to the file
* lseek
Move the fd to the specific position , `SEEK_END` means seeks to the end of file .`lseek` returns the bytes go through ,
* mmap
Maps the file into array ,the file has to be specified with fd , file_size ,and beginning offset .
#### Main Code
```clike
int get_file(char * file_name){
fd = open(file_name , O_RDONLY) ;
if(fd == -1)
return 0;
fsize = lseek(fd , 0 ,SEEK_END) ;
if(fsize < 0)
return 0;
file = mmap(NULL ,fsize,PROT_READ ,MAP_SHARED,fd ,0) ;
if(!file)
return 0 ;
slice = fsize / pn ;
return 1 ;
}
```
### Performance
Source file : lyrics of a [song](https://www.youtube.com/watch?v=dQw4w9WgXcQ)
#### Serial , 1 file

#### Producer = 1 ,Cosumer = 1 ,1 file

#### Producer = 1 ,Cosumer = 2 ,1 file

#### Producer = 2 ,Cosumer = 2 ,1 file

Among all test , the performance is really bad ! Only the 1-1 model has a 10% speed-up . I guess the reason is the fread time cost was saved by mmap , which cost the most heavy part be very light .And the insertion has to do `lock` and `busy waiting` operation , even when poping out the queue has to `lock` , too .The total time cost will be `serial`/ `threadn` + `busy` + `lock` .
I think if I implement the program by fread or without single safety queue , it will be much faster !