# HW4 Grading Policy

$$
score = correctness \times 0.5 + performance \times 0.2 + report
$$

## Correctness (50%)

- $X$: number of passed tests
- $N$: total number of tests

$$
correctness = \frac{X}{N} \times 100
$$

## Performance (20%)

- $T$: time + penalty (from the scoreboard)
- $T_i$: student $i$'s $T$
- $T_{best}$: the minimum $T$ among all students

$$
performance = \frac{T_{best}}{T_i} \times 100
$$

## Report (30%)

1. (10%) Your implementation
    - (5%) Mention which functions are ported to CUDA
    - (5%) Elaborate on how they distribute the workload across blocks and threads
2. (10%) The parallelization and optimization techniques you used in your solution
    - (3%) Mention at least 1 parallelization technique
    - (3%) Mention at least 1 optimization technique, excluding simple CUDA parallelization
    - (2%) Mention another parallelization or optimization technique
    - (2%) Mention yet another parallelization or optimization technique
3. (10%) Experiments with various combinations of the number of blocks and threads (at least 8 combinations), with the results plotted in figures
    - (4%) Show at least 8 combinations of the number of blocks and threads
    - (3%) Show the figures, which should contain at least the number of blocks, the number of threads, and the execution time
    - (3%) Explain what causes the results or what they indicate
4. (Optional, 10%) Describe the details if you use advanced CUDA skills
    - Streams, page-locked memory, asynchronous memory copies, or any other advanced skills
5. (Optional, 10%) If you optimize other parts of your source code, demonstrate your experimental results. We REQUIRE you to justify your solution so that we can give you credit.
    - Elaborate on which parts you optimized and how, and present experimental results to justify it
6. (Optional, 10%) Any suggestions or feedback on the homework are welcome.
    - At least 1 meaningful suggestion or piece of constructive feedback on the assignment or spec

###### tags: `grading policy`
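
As an illustration of the scoring formulas in this policy, the following is a minimal sketch of how the three parts combine. It assumes the report term is added directly as points (up to 30 for the required items). The function names and the sample values (3 of 4 tests passed, $T_{best} = 2.0$, $T_i = 4.0$, 25 report points) are hypothetical and chosen only for illustration; this is not the graders' actual script.

```python
def correctness(passed: int, total: int) -> float:
    """correctness = X / N * 100, where X is passed tests and N is total tests."""
    return passed / total * 100


def performance(t_best: float, t_i: float) -> float:
    """performance = T_best / T_i * 100, where T is time + penalty."""
    return t_best / t_i * 100


def score(passed: int, total: int, t_best: float, t_i: float, report: float) -> float:
    """score = correctness * 0.5 + performance * 0.2 + report.

    Assumption: `report` is already expressed in points, so it is added unscaled.
    """
    return correctness(passed, total) * 0.5 + performance(t_best, t_i) * 0.2 + report


if __name__ == "__main__":
    # Hypothetical student: 3 of 4 tests passed, twice as slow as the best T.
    print(score(passed=3, total=4, t_best=2.0, t_i=4.0, report=25))
```

With these sample values, correctness contributes 37.5 points, performance contributes 10 points, and the report contributes 25 points. Note that the student with the minimum $T$ has $T_i = T_{best}$ and therefore receives the full 20 performance points.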