HW4 Grading Policy
Correctness (50%)
Scored from the number of passed tests relative to the total number of tests
Performance (20%)
Scored from each student's time + penalty (from the scoreboard), relative to the minimum time + penalty of all the students
Report (30%)
(10%) Your implementation
(5%) Mention which functions are ported to CUDA
(5%) Elaborate on how these functions distribute the workload across blocks and threads
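As a hedged illustration of the kind of mapping this item asks you to describe, the sketch below uses a grid-stride loop, one common way to spread array elements over blocks and threads. The kernel name `scale`, the array size, and the launch configuration are assumptions made for the example and are not part of the assignment.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread starts at its global index and then
// advances by the total number of threads in the grid (grid-stride loop),
// so any <<<blocks, threads>>> configuration covers all n elements.
__global__ void scale(float *data, float factor, int n) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        data[i] *= factor;
    }
}

int main() {
    const int n = 1 << 20;  // assumed problem size
    float *d = nullptr;
    cudaMallocManaged(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) d[i] = 1.0f;

    // One thread per element: round the block count up so n is covered.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d, 2.0f, n);
    cudaDeviceSynchronize();

    // Verify on the host that every element was scaled exactly once.
    int errors = 0;
    for (int i = 0; i < n; ++i) {
        if (d[i] != 2.0f) ++errors;
    }
    printf("errors = %d\n", errors);

    cudaFree(d);
    return errors != 0;
}
```

In a report, the equivalent explanation would state which index expression each thread uses and why the chosen block size suits the ported function.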
(10%) The parallelization and optimization techniques you used in your solution
(3%) Mention at least 1 parallelization technique
(3%) Mention at least 1 optimization technique, excluding simple CUDA parallelization
(2%) Mention another parallelization or optimization technique
(2%) Mention yet another parallelization or optimization technique
(10%) Experiments with various combinations of the number of blocks and threads (at least 8 combinations), plotted in figures
(4%) Show at least 8 combinations of the number of blocks & threads
(3%) Show the figures, which should contain at least the number of blocks, the number of threads, and the execution time
(3%) Explain what causes the results and what they indicate
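One possible way to collect the measurements for this experiment is sketched below: it times the same kernel under 8 block/thread combinations with CUDA events and prints one CSV row per combination, which can then be plotted with any tool. The kernel `scale`, the problem size, and the specific block and thread counts are assumptions for illustration only; substitute your own ported functions and ranges.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel with a grid-stride loop, so it stays correct for
// every launch configuration tried below.
__global__ void scale(float *data, float factor, int n) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        data[i] *= factor;
    }
}

int main() {
    const int n = 1 << 24;  // assumed problem size
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    // 4 x 2 = 8 combinations; the values are arbitrary examples.
    const int block_counts[] = {16, 64, 256, 1024};
    const int thread_counts[] = {64, 256};

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time startup cost is not charged to the first row.
    scale<<<256, 256>>>(d, 1.0f, n);
    cudaDeviceSynchronize();

    printf("blocks,threads,ms\n");
    for (int b : block_counts) {
        for (int t : thread_counts) {
            cudaEventRecord(start);
            scale<<<b, t>>>(d, 1.0f, n);
            cudaEventRecord(stop);
            cudaEventSynchronize(stop);  // wait until the kernel has finished

            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);
            printf("%d,%d,%.3f\n", b, t, ms);
        }
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}
```

Averaging several runs per combination would likely give more stable figures than the single launch timed here.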
(Optional, 10%) Describe the details if you use advanced CUDA skills
Streams, page-locked (pinned) memory, asynchronous memory copy, or any other advanced skills.
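For reference, a minimal sketch of how these three techniques can fit together is shown below: the host buffer is allocated as page-locked memory, the data is split into chunks, and each chunk's copy-in, kernel, and copy-out are issued asynchronously on its own stream. The kernel `add_one`, the number of streams, and the sizes are hypothetical choices for the example, not requirements of the homework.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to each element of one chunk.
__global__ void add_one(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int n = 1 << 22;            // assumed problem size
    const int num_streams = 4;        // assumed number of streams
    const int chunk = n / num_streams;  // n is divisible by num_streams
    const size_t chunk_bytes = chunk * sizeof(float);

    float *h = nullptr, *d = nullptr;
    cudaMallocHost(&h, n * sizeof(float));  // page-locked host memory
    cudaMalloc(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 0.0f;

    cudaStream_t streams[num_streams];
    for (int s = 0; s < num_streams; ++s) cudaStreamCreate(&streams[s]);

    const int threads = 256;
    const int blocks = (chunk + threads - 1) / threads;
    for (int s = 0; s < num_streams; ++s) {
        float *hp = h + s * chunk;
        float *dp = d + s * chunk;
        // Work in the same stream runs in order; different streams may overlap.
        cudaMemcpyAsync(dp, hp, chunk_bytes, cudaMemcpyHostToDevice, streams[s]);
        add_one<<<blocks, threads, 0, streams[s]>>>(dp, chunk);
        cudaMemcpyAsync(hp, dp, chunk_bytes, cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();  // wait for all streams to finish

    // Verify on the host that every element was incremented exactly once.
    int errors = 0;
    for (int i = 0; i < n; ++i) {
        if (h[i] != 1.0f) ++errors;
    }
    printf("errors = %d\n", errors);

    for (int s = 0; s < num_streams; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(h);
    cudaFree(d);
    return errors != 0;
}
```

Asynchronous copies need page-locked host memory to overlap with kernel execution, which is why the two techniques usually appear together.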
(Optional, 10%) If you optimize other parts of your source code, please demonstrate your experimental results. We REQUIRE you to justify your solutions so that we can give you credit.
Elaborate on which parts you optimized and how, and demonstrate the experimental results to justify it
(Optional, 10%) Any suggestions or feedback for the homework are welcome.
At least 1 meaningful suggestion or piece of constructive feedback on the assignment or spec