HW4 Grading Policy

$$\text{score} = \text{correctness} \times 0.5 + \text{performance} \times 0.2 + \text{report}$$

Correctness (50%)

  • $X$: number of passed tests
  • $N$: total number of tests

$$\text{correctness} = \frac{X}{N} \times 100$$
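As a quick check of the correctness formula, here is a minimal sketch. The function name `correctness_score` and the sample numbers are illustrative assumptions, not part of the grading scripts.

```python
def correctness_score(passed: int, total: int) -> float:
    """Return the correctness score on a 0-100 scale: X / N * 100.

    `passed` plays the role of X and `total` the role of N.
    (Hypothetical helper; the name is not from the course materials.)
    """
    if total <= 0:
        raise ValueError("total number of tests must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed tests must be between 0 and total")
    return passed / total * 100


# Illustrative numbers only: 15 of 20 tests passed.
print(correctness_score(15, 20))
```

Under the overall score formula, this value is then weighted by 0.5.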

Performance (20%)

  • $T$: time + penalty (from the scoreboard)
  • $T_i$: student $i$'s $T$
  • $T_{\text{best}}$: the minimum $T$ among all students

$$\text{performance} = \frac{T_{\text{best}}}{T_i} \times 100$$
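The performance formula and the overall weighting can be sketched as follows. This is an illustration under stated assumptions: the function names and sample values are hypothetical, and the report is assumed to be entered directly as points (up to 30 for the required items, plus any optional credit), since the score formula applies no weight to it.

```python
def performance_score(t_i: float, t_best: float) -> float:
    """Return the performance score on a 0-100 scale: T_best / T_i * 100.

    `t_i` is this student's time + penalty; `t_best` is the minimum over
    all students. (Hypothetical helper, not the official grading script.)
    """
    if t_best <= 0 or t_i <= 0:
        raise ValueError("times must be positive")
    if t_i < t_best:
        raise ValueError("t_best is the minimum, so t_i cannot be smaller")
    return t_best / t_i * 100


def total_score(correctness: float, performance: float, report: float) -> float:
    """Combine the parts: correctness * 0.5 + performance * 0.2 + report.

    Assumption: `report` is already in points, so it is added unweighted.
    """
    return correctness * 0.5 + performance * 0.2 + report


# Illustrative numbers only: the best time is 2.0 s and this student took 4.0 s.
perf = performance_score(t_i=4.0, t_best=2.0)
print(perf)
print(total_score(correctness=75.0, performance=perf, report=25.0))
```

The student with the minimum $T$ gets a performance score of 100, and the score falls in inverse proportion as $T_i$ grows.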

Report (30%)

  1. (10%) Your implementation
    • (5%) Mention which functions are ported to CUDA
    • (5%) Elaborate on how they distribute the workload to blocks and threads
  2. (10%) The parallelization and optimization techniques you used in your solution
    • (3%) Mention at least 1 parallelization technique
    • (3%) Mention at least 1 optimization technique, excluding simple CUDA parallelization
    • (2%) Mention another parallelization or optimization technique
    • (2%) Mention yet another parallelization or optimization technique
  3. (10%) Experiment with various combinations of the number of blocks and threads (at least 8 combinations) and plot the results in figures
    • (4%) Show at least 8 combinations of the number of blocks & threads
    • (3%) Show the figures, which should contain at least the number of blocks, the number of threads, and the execution time
    • (3%) Explain what causes the results or what they indicate
  4. (Optional, 10%) Describe the details if you use advanced CUDA skills
    • Streaming, page-locked memory, asynchronous memory copy, or any other advanced skills.
  5. (Optional, 10%) If you optimize other parts of your source code, please demonstrate your experimental results. We REQUIRE you to justify your solutions so that we can give you credit.
    • Elaborate on which parts you optimized and how, and demonstrate the experimental results to justify it
  6. (Optional, 10%) Any suggestions or feedback for the homework are welcome.
    • At least 1 meaningful suggestion or piece of constructive feedback on the assignment or spec
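For the block and thread experiments in report item 3, one possible way to collect the data is a small timing harness like the sketch below. It is only an illustration: `sweep` and `placeholder_run` are hypothetical names, and `placeholder_run` does nothing, so you would replace it with whatever launches your own program with the given configuration.

```python
import time
from itertools import product


def sweep(run, block_counts, thread_counts, repeats=3):
    """Time run(blocks, threads) for every (blocks, threads) combination.

    Returns a list of (blocks, threads, best_seconds) rows, where
    best_seconds is the fastest of `repeats` wall-clock measurements.
    """
    rows = []
    for blocks, threads in product(block_counts, thread_counts):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(blocks, threads)
            best = min(best, time.perf_counter() - start)
        rows.append((blocks, threads, best))
    return rows


def placeholder_run(blocks, threads):
    # Assumption: stand-in for launching your CUDA program with this
    # configuration. It performs no work, so its timings are meaningless.
    pass


# 4 block counts x 2 thread counts = 8 combinations, the required minimum.
rows = sweep(placeholder_run, [1, 2, 4, 8], [128, 256])
for blocks, threads, seconds in rows:
    print(f"blocks={blocks:3d} threads={threads:4d} time={seconds:.6f}s")
```

Each row holds the three quantities the figures must show (number of blocks, number of threads, execution time), so the rows can be passed straight to a plotting tool of your choice.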