<!-- Call the assignment `CS450_aXXX_nhn33.pdf` where `XXX` is the assignment number -->

:::info
**CS 450**
Noah Nuebling
:::

# Assignment 4

## Average times after 10 runs

<!--
| Question | Total time | Time t1 | Time t2 | Load imbalance (t2 - t1) |
|---|---|---|---|---|
| Q1 | 26.463382 | | | |
| Q2 | 26.189997 | 6.561526 | 26.182978 | 19.621452 |
| Q3 | 22.016665 | 21.450770 | 21.839648 | 0.388878 |
| Q4 | 20.516656 | 19.976702 | 20.429971 | 0.453269 |
-->

| Question | Total time | Time t1 | Time t2 | Load imbalance (t2 - t1) |
|---|---|---|---|---|
| Q1 | 19.417746 | | | |
| Q2 | 19.172079 | 4.271226 | 19.171821 | 14.900595 |
| Q3 | 18.310884 | 18.310585 | 17.858373 | -0.452212 |
| Q4 | 14.675914 | 14.675803 | 14.459167 | -0.216636 |

## Question 3

The load imbalance is much improved in V3 compared to V1 and V2: its magnitude drops from about 14.9 in V2 to about 0.45 in V3. Performance is also slightly improved, with the total time going from about 19.17 in V2 to about 18.31 in V3.

<!-- However, I think my computer was much hotter during V2 test runs, so it’s hard to compare software performance. -->

The improvement in load balancing can be explained by the dynamic scheduling used to assign the iterations of the for-loop to the 2 threads. Instead of simply assigning half of the iterations to one thread and the other half to the other thread (static scheduling), each thread is assigned a new loop iteration as soon as it is done processing its previous one. This leads to much better load balancing if some loop iterations take much more time than others, as is the case here.

However, much of the performance benefit that comes with better load balancing seems to be canceled out by the overhead of the more complex dynamic scheduling. In the end, V3 only runs marginally faster than V2. The increase in the combined runtime of the 2 threads in V3 can be explained by the scheduling overhead.

## Question 4

V4 is a little faster. omp is typically implemented on top of pthreads, and the added layer of abstraction seems to come with some overhead.
This overhead falls away in V4 since we're using pthreads directly.
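
As an illustration of the scheduling argument from Question 3, here is a minimal simulation sketch. It is written in Python and is not the assignment's OpenMP or pthreads code. The helper names `static_schedule` and `dynamic_schedule` and the workload `costs` are hypothetical, chosen only so that later iterations are more expensive than earlier ones. The model also assumes that handing out an iteration costs nothing, so it shows only the load-balancing effect and not the scheduling overhead discussed above.

```python
import heapq


def static_schedule(costs, num_threads=2):
    """Split the iterations into contiguous, equally sized chunks, one per thread.

    Returns the total busy time of each thread.
    """
    chunk = (len(costs) + num_threads - 1) // num_threads  # ceiling division
    return [sum(costs[t * chunk:(t + 1) * chunk]) for t in range(num_threads)]


def dynamic_schedule(costs, num_threads=2):
    """Give the next iteration to whichever thread becomes free first.

    Returns the total busy time of each thread.
    """
    # Min-heap of (time at which the thread is free, thread id).
    free_at = [(0, t) for t in range(num_threads)]
    heapq.heapify(free_at)
    busy = [0] * num_threads
    for cost in costs:
        t_free, tid = heapq.heappop(free_at)
        busy[tid] += cost
        heapq.heappush(free_at, (t_free + cost, tid))
    return busy


# Hypothetical workload: iteration i costs i + 1 time units.
costs = [i + 1 for i in range(8)]

static_times = static_schedule(costs)
dynamic_times = dynamic_schedule(costs)

# Load imbalance computed as in the table: t2 - t1.
print("static :", static_times, "imbalance:", static_times[1] - static_times[0])
print("dynamic:", dynamic_times, "imbalance:", dynamic_times[1] - dynamic_times[0])
```

With this made-up workload, the static split gives the two threads 10 and 26 time units of work (imbalance 16), while the dynamic assignment gives them 16 and 20 (imbalance 4). The total amount of work is the same in both cases. Only its distribution changes, which matches the qualitative pattern in the Q2 and Q3 rows of the table.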