HW3 Grading Policy

$$\text{score} = \text{correctness} \times 0.4 + \text{performance} \times 0.25 + \text{report}$$

Correctness (40%)

$X$ : passed tests
$N$ : total number of tests

$$\text{correctness} = \frac{X}{N} \times 100$$

Performance (25%)

(15%) sobel GPU kernel function.

$T$ : time + penalty (from the correctness-kernel scoreboard)
$T_{i}$ : student $i$'s $T$
$T_{best}$ : the minimum $T$ among all students

$$\text{performance} = \frac{T_{best}}{T_{i}} \times 100$$
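As an illustration of the relative scoring above, the following Python sketch computes the performance score for a few students. The student names and times are made-up values, and `performance_scores` is a hypothetical helper, not part of the actual grading scripts.

```python
def performance_scores(times):
    """Map each student to T_best / T_i * 100.

    `times` maps a student to T (time + penalty, in seconds).
    """
    t_best = min(times.values())  # the fastest T among all students
    return {student: t_best / t * 100 for student, t in times.items()}


# Hypothetical scoreboard times in seconds (illustrative only).
times = {"alice": 2.0, "bob": 4.0, "carol": 8.0}
scores = performance_scores(times)
print(scores)  # {'alice': 100.0, 'bob': 50.0, 'carol': 25.0}
```

The fastest student always receives 100, and a submission that takes twice as long as the fastest receives 50.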

(10%) Total CPU time.

$T$ : time + penalty (from the correctness scoreboard)
$T_{i}$ : student $i$'s $T$
$T_{best}$ : the minimum $T$ among all students

$$\text{performance} = \frac{T_{best}}{T_{i}} \times 100$$

Report (35%)

(8% + bonus 5%) How did you parallelize the code?
- (2%) Which CUDA APIs did you use?
  - Mention at least 3 CUDA APIs used.
- Which functions are ported to CUDA? How did you distribute the workload to blocks and threads?
  - (1%) Mention at least 1 function ported to CUDA.
  - (2%) Elaborate on how they distribute the workload to blocks and threads.
  - (bonus, 5%) Conduct an experiment to show that their workload distribution is better than some other combinations.
- How do you implement shared memory?
  - (1%) Mention which variables are stored in shared memory.
  - (2%) Explain why they stored them in shared memory.
(5%) Which optimization techniques did you apply to your code?
- Mention at least 1 optimization technique they used, excluding simple parallelization with CUDA and simple shared memory implementation.
(6%) What's the difference between cudaMalloc and cudaMallocManaged? When will you pick one over another?
- (3%) Mention at least 1 main difference
- (1.5%) Describe 1 scenario that should use cudaMalloc
- (1.5%) Describe 1 scenario that should use cudaMallocManaged
(11%) Experiment:
- Measure the GPU kernel time using nvprof. Show the difference with and without shared memory. In addition, measure the global memory load throughput (gld_throughput) and instructions per cycle (ipc), and explain your observation.
  - (2%) Show the GPU kernel time difference between using and not using shared memory
  - (2%) Show the global memory load throughput and instructions per cycle
  - (2%) Give at least 1 observation
  - (2%) Give a meaningful explanation of the observation (e.g., possible causes)
- Profile your program by measuring the time spent in I/O, memory copy, CPU, and kernel.
  - (1%) Show the total time
  - (0.5%) Show the time spent in I/O
  - (0.5%) Show the time spent in memory copy
  - (0.5%) Show the time spent in CPU
  - (0.5%) Show the time spent in kernel
(5%) Pick any image that is not in the sample test cases, run your implementation with the image, and showcase both the input and output in your report.
- Show both the input and output image
(optional, 5%) Any constructive suggestions or feedback for the homework are welcome.
- At least 1 meaningful suggestion or piece of constructive feedback on the assignment or spec
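To see how the three components combine into the final score, here is a minimal Python sketch of the formula at the top of this policy. It rests on one assumption that the policy does not state explicitly: the performance term is split into the kernel score weighted by 0.15 and the CPU-time score weighted by 0.10, following the "(15%)" and "(10%)" labels. The function name and the sample numbers are hypothetical.

```python
def total_score(passed, total, kernel_perf, cpu_perf, report_points):
    """Combine the components into a final score.

    passed, total  -- X and N from the correctness section
    kernel_perf    -- kernel performance score, 0 to 100
    cpu_perf       -- total CPU time performance score, 0 to 100
    report_points  -- report points, added without further weighting
    """
    correctness = passed / total * 100
    # Assumption: performance * 0.25 is split as 0.15 (kernel) + 0.10 (CPU).
    performance = kernel_perf * 0.15 + cpu_perf * 0.10
    return correctness * 0.4 + performance + report_points


# Made-up example: 8 of 10 tests passed, the fastest kernel,
# half the best CPU-time score, and 30 report points.
print(f"{total_score(8, 10, 100.0, 50.0, 30.0):.2f}")  # prints 82.00
```

In this example correctness contributes 32 points, performance 20 points, and the report 30 points.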


tags: grading policy
