HW3 Grading Policy

$\text{score} = \text{correctness} \times 0.4 + \text{performance} \times 0.25 + \text{report}$

Correctness (40%)

  • $X$: number of passed tests
  • $N$: total number of tests

$\text{correctness} = \dfrac{X}{N} \times 100$

Performance (25%)

(15%) sobel GPU kernel function.

$\text{performance} = \dfrac{T_{\text{best}}}{T_i} \times 100$

(10%) Total CPU time.

  • $T$: time + penalty (from the correctness scoreboard)
  • $T_i$: student $i$'s $T$
  • $T_{\text{best}}$: the minimum $T$ among all students

$\text{performance} = \dfrac{T_{\text{best}}}{T_i} \times 100$
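
As a worked illustration of the formulas above, the following minimal sketch computes a final score in Python. The function names and the sample numbers are hypothetical. It also assumes that the 25% performance weight splits into 15% for the GPU kernel time and 10% for the total CPU time, each scored with the same best-over-own ratio, and that report points (0 to 35, plus any bonus) are added directly. The policy implies this split but does not state it outright.

```python
def correctness_score(passed: int, total: int) -> float:
    """Correctness on a 0-100 scale: X / N * 100."""
    if total <= 0:
        raise ValueError("total number of tests must be positive")
    return 100 * passed / total


def performance_score(t_best: float, t_own: float) -> float:
    """Performance on a 0-100 scale: T_best / T_i * 100."""
    if t_own <= 0:
        raise ValueError("a student's time must be positive")
    return 100 * t_best / t_own


def total_score(passed: int, total: int,
                kernel_best: float, kernel_own: float,
                cpu_best: float, cpu_own: float,
                report_points: float) -> float:
    """Combine the three parts of the grade into one score.

    Assumption: performance is weighted 15% (kernel) + 10% (total CPU time),
    and report_points is already expressed in final-score points.
    """
    return (
        correctness_score(passed, total) * 0.40
        + performance_score(kernel_best, kernel_own) * 0.15
        + performance_score(cpu_best, cpu_own) * 0.10
        + report_points
    )


if __name__ == "__main__":
    # Hypothetical student: 18 of 20 tests passed, twice as slow as the
    # best submission on both timings, 30 report points.
    print(round(total_score(18, 20, 2.0, 4.0, 5.0, 10.0, 30.0), 2))
```

For this hypothetical student the parts are 90 × 0.4 = 36 for correctness, 50 × 0.15 = 7.5 for kernel time, 50 × 0.10 = 5 for total CPU time, and 30 for the report, which sums to 78.5.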

Report (35%)

  1. (8% + 5% bonus) How did you parallelize the code?
    • (2%) Which CUDA APIs did you use?
      • Mention at least 3 CUDA APIs used.
    • Which functions are ported to CUDA? How did you distribute the workload to blocks and threads?
      • (1%) Mention at least 1 function ported to CUDA.
      • (2%) Elaborate on how the workload is distributed to blocks and threads.
      • (bonus, 5%) Conduct an experiment to show that the chosen workload distribution performs better than some other combinations.
    • How do you implement shared memory?
      • (1%) Mention which variables are stored in shared memory.
      • (2%) Explain why those variables are stored in shared memory.
  2. (5%) Which optimization techniques did you apply to your code?
    • Mention at least 1 optimization technique used, excluding simple parallelization with CUDA and a simple shared memory implementation.
  3. (6%) What's the difference between cudaMalloc and cudaMallocManaged? When will you pick one over another?
    • (3%) Mention at least 1 main difference
    • (1.5%) Describe 1 scenario that should use cudaMalloc
    • (1.5%) Describe 1 scenario that should use cudaMallocManaged
  4. (11%) Experiment:
    • Measure the GPU kernel time using nvprof. Show the difference with and without shared memory. In addition, measure the global memory load throughput (gld_throughput) and instructions per cycle (ipc), and explain your observations.
      • (2%) Show the difference in GPU kernel time with and without shared memory
      • (2%) Show the global memory load throughput and instructions per cycle
      • (2%) Give at least 1 observation
      • (2%) Give a meaningful explanation of the observation (e.g., possible causes)
    • Profile your program by measuring the time spent in I/O, memory copy, CPU, and kernel.
      • (1%) Show the total time
      • (0.5%) Show the time spent in I/O
      • (0.5%) Show the time spent in memory copy
      • (0.5%) Show the time spent in CPU
      • (0.5%) Show the time spent in kernel
  5. (5%) Pick any image that is not in the sample test cases, run your implementation with the image, and showcase both the input and output in your report.
    • Show both the input and output image
  6. (optional, 5%) Any constructive suggestions or feedback for the homework are welcome.
    • Give at least 1 meaningful suggestion or piece of constructive feedback on the assignment or spec