# Bitonic Merge Sort SC 25 training ## GOAL This assignment helps you get familiar with MPI by implementing a parallel bitonic merge sort algorithm. Experimental evaluations on your implementation are required to guide you analyze the performance and scalability of a parallel program. You are also encouraged to explore any performance optimization and parallel computing skills. ## Problem Description In this assignment, you are required to implement the bitonic merge sort algorithm using MPI. #### Bitonic sort Bitonic sorting algorithm is based on bitonic sorting network. The key operation is based on the sorting network which converts a given sequence into a bitonic sequence and finally bitonic merge can produce a monotonically increasing or decreasing sequency. Bitonic Sort Principle - Bitonic sequence, a sequence is called Bitonic if it is first increasing, then decreasing - Let s = < $a_0$, $a_1$, ..., $a_{n-1}$ > - s1 = { $min( a_0, a_{n/2} )$, $min( a_1, a_{n/2+1} )$,.…..,$min( a_{n/2-1}, a_{n-1} )$ } - s2 = { $max( a_0, a_{n/2} )$, $max( a_1, a_{n/2+1} )$,.…..,$max( a_{n/2-1}, a_{n-1} )$ } - In sequence s1 , there is an element $b_i$ = $min( a_i , a_{n/2+i} )$ such that all the elements before $b_i$ are from the increasing part of the original sequence and all the elements after $b_i$ are from the decreasing part. - Opposite case for $b_i$ = $max( a_i , a_{n/2+i} )$ ![image](https://hackmd.io/_uploads/BJjRxeHqT.png) #### Stage Diagram of bitonic merge sort Example input sequence: 3, 2, 8, 7, 1, 5, 4, 6 ##### Stage 1 ![stage1](https://hackmd.io/_uploads/HkXkOaNqT.png) Output: 2, 3, 8, 7, 1, 5, 6, 4 ##### Stage 2 ![stage2](https://hackmd.io/_uploads/S1favpVqp.png) Output: 2, 3, 7, 8, 6, 5, 4, 1 ##### Stage 3 ![stage3](https://hackmd.io/_uploads/SJ1ovTN9T.png) Output: 1, 2, 3, 4, 5, 6, 7, 8 ## INPUT/OUTPUT FORMAT The input file contains n 64-bit doubles in binary format. The first 8 bytes represent the first floating point number, the ninth to sixteenth byte represents the second one, and so on. - The output file must be generated by the same format as the input file - Your program is required to read an unsorted array from an input file, and generate the sorted results to another output file. - Your program accepts 3 input parameters separated by space. They are: - Integer, the size of the array n (1 ≤ n ≤ 1327080384) - String, the input file name - String, the output file name - Your program should run on Taiwania 1 with PBS - qsub ... job_script.sh - Create a job_script.sh - You should use any setting if you need, below is a patterm for you ```bash ``` - The sample test cases at the directory `/work1/trkk1224/bonus/testcase` - The following values: `-INF`, `+INF`, `NAN` will not show up in the input ## WORKING ITEMS #### Problem assignments You are required to implement a **parallel version** of **bitonic merge sort** under the given restrictions. Your goal is to optimize the performance and reduce the execution time. - A process can sort or perform any computations on its own local elements. - For bitonic sort, a process can only exchange its local elements with a reasonable pair, you should explain your implementation in report if you didn't follow the pattern of bitonic merge sort #### Requirements & Reminders The implemented algorithm must follow the bitonic merge sort principle, and comply with the restrictions described above. If you are not sure whether your implementation follows the rules, please discuss with TA for approval. - You can use openmp to speedup your code, while **MPI is necessary** - Must use MPI-IO - Must follow the input/output file format and execution command line arguments #### Judging & Scoreboard ## REPORT - **Tilte, Name, Email** - Explain your implementation - Essential - <font color="red"> **How do you sort in your program?** </font> - Present your sorting network in a 8 processes scenario with a diagram - How do you handle the input/output items? - Optional - Any tips to improve the performance? - Experiment & Analysis - System, Environment Spec - Which queue did you choose? - How many nodes, processes used? - What library have you used in your program, version? - Performance Metrics - How do you measure the computing, communication and IO time? - How do you compute the values in the plots? - Speedup Factor & Profile - <font color="red"> **Plot the speedup and profile results** </font> - Make sure your plots are properly labeled and formatted - You can generate your own testcase if meaningful - Experiences & Conclusion - What have you learned from this assignment? - What difficulties did you encounter in this assignment? - Feedbacks - Others - You are encouraged to conduct more experiments and analysis ## GRADING #### **Correctness 55%** testcase1 - 1% ... testcase10 - 10% #### **Report 45%** Explain your implementation - 20% Experiment & Analysis - 20% Experiences & Conclusion - 5 % Others - extra point ## Submission Details - **Deadline: Feb 23, 2024, 23:59** - **Upload to the Google Form before the deadline.** - bitonic_mergesort.cc - Makefile (Optional) - Report - please submit with .md file - **GoogleForm:** https://forms.gle/M37deHofZGtFvJ5C9