Parallel Programming 2022 fall hw5

# Parallel Programming 2022 fall hw5 ## 1. Counting Sort Questions ```c void Count_sort(int a[], int n) { int i, j, count; int* temp = malloc(n∗sizeof(int)); for (i = 0; i < n; i++) { count = 0; for (j = 0; j < n; j++) if (a[j] < a[i]) count++; else if (a[j] == a[i] && j < i) count++; temp[count] = a[i]; } memcpy(a, temp, n∗sizeof(int)); free(temp); } /∗ Count sort ∗/ ``` 1. >If we try to parallelize the for i loop (the outer loop), which variables should be private and which should be shared? Private: i (by default), count, j Shared: temp[], a[] 2. >If we parallelize the for i loop using the scoping you speciﬁed in the previous part,are there any loop-carried dependencies? Explain your answer. No, we don't have anything like a[i-1], a[i-2]... Also, openMP require the parallized for loops to not have loop-carried depencendies. 3. > Can we parallelize the call to memcpy? Can we modify the code so that this part of the function will be parallelizable? Not easily, and I don't think doing so will improve performance much, since it would be blocked because all threads reads through all a[i]. Maybe we can have each thread record the calculated ```count``` for their local ```a[i]```, and apply a barrier to make sure all threads are done with a[], then we can have each thread do ```memcpy(a[i], temp[count], sizeof(int))``` on their local ```count``` and ```a[i]```; 4. >Write a C program that includes a parallel implementation of Count sort. 5. >How does the performance of your parallelization of Count sort compare to serial Count sort? How does it compare to the serial qsort library function? - qsort performs best because of better time complexity - parallel countsort is way faster then serial count sort ![](https://i.imgur.com/BNXMdEx.png) ![](https://i.imgur.com/9J3qbdp.png) ![](https://i.imgur.com/3VWM645.png) ![](https://i.imgur.com/LKWgDRs.png) ![](https://i.imgur.com/fw7S6cc.png) ![](https://i.imgur.com/2ZvNYgc.png) ## 2. Producer-Consumer >Given **==a keyword file== that contains many keywords**. Use OpenMP to implement a producer-consumer program in which some of the threads are producers and others are consumers. The **producers read text from ==a collection of files==, one per producer**. They insert lines of text into a single shared queue (please implement your own thread safe queue)NEW. The consumers take the lines of text and tokenize them. Tokens are “words” separated by white space. When a consumer finds a token that is keyword, the keyword count increases one. Please print each keyword and its count. ```mermaid graph LR file0.txt --> producer0 --> tail file1.txt --> producer1 --> tail file2.txt --> producer2 --> tail file3.txt --> producer3 --> tail head --> consumer0 --> key_count head --> consumer1 --> key_count head --> consumer2 --> key_count head --> consumer3 --> key_count tail --thread_safe_queue--> head keywords.txt--> consumer0 keywords.txt--> consumer1 keywords.txt--> consumer2 keywords.txt--> consumer3 ``` - Each producer gets its own filename, reads from it, then pushes a line into the queue. ```c=218 #pragma omp for schedule(auto) nowait for (int i = 0; i < file_count; i++) { producer(&threadSafeQueue, file_name[i]); // printf("openfilename: %s\n", file_name[i]); } ``` - I create the same number of consumers as the produsers, each of the consumers read a line from the queue and tokenize it. - I get most of my text_files from generating them with chatGPT like so. ![](https://i.imgur.com/5CSlvl7.png) ## Difficulties - OpenMP makes it better, but parallel programs are still very hard to debug for me. ## References - [OpenMP: Data-Sharing Rules](http://jakascorner.com/blog/2016/06/omp-data-sharing-attributes.html#:~:text=A%20variable%20in%20an%20OpenMP,copy%20of%20the%20private%20variable.) - [OpenMP: For & Scheduling](http://jakascorner.com/blog/2016/06/omp-for-scheduling.html)