# HW4 Grading Policy
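$$
\text{score} = 0.5 \times \text{correctness} + 0.2 \times \text{performance} + \text{report}
$$
Here correctness and performance are each on a 0 to 100 scale, and report is the number of points earned in the Report section.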
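As a rough illustration of how the three components combine, the sketch below evaluates the formula for one hypothetical student. The function name `total_score` and the sample values are assumptions made for this example only. It treats the report value as raw points rather than a percentage.

```python
def total_score(correctness: float, performance: float, report: float) -> float:
    """Weighted sum from the policy: correctness and performance are 0-100,
    report is assumed to be raw points from the Report section."""
    return correctness * 0.5 + performance * 0.2 + report


# Hypothetical student: correctness 90, performance 80, 25 report points.
example = total_score(90, 80, 25)
print(example)  # 45 + 16 + 25
```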
## Correctness (50%)
- $X$: passed tests
- $N$: total number of tests
$$
\text{correctness} = \frac{X}{N} \times 100
$$
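A minimal sketch of this calculation, assuming a hypothetical helper named `correctness` and made-up test counts:

```python
def correctness(passed: int, total: int) -> float:
    """Percentage of passed tests: X / N * 100."""
    if total <= 0:
        raise ValueError("total number of tests must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed tests must be between 0 and total")
    # Multiply before dividing to keep the arithmetic exact for whole percentages.
    return passed * 100 / total


# Hypothetical example: 18 of 20 tests passed.
print(correctness(18, 20))  # 90.0
```

With these numbers, the correctness component would contribute 90 × 0.5 = 45 points to the final score.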
## Performance (20%)
- $T$: time + penalty (from the scoreboard)
- $T_i$: student $i$'s $T$
- $T_{best}$: the minimum $T$ of all the students
$$
\text{performance} = \frac{T_{best}}{T_i} \times 100
$$
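The sketch below applies this formula to a small, made-up scoreboard. The function name `performance_scores` and the student labels and times are hypothetical. The fastest student receives 100, and everyone else is scaled relative to that time.

```python
from typing import Dict


def performance_scores(times: Dict[str, float]) -> Dict[str, float]:
    """Map each student's T (time + penalty) to T_best / T_i * 100."""
    if not times:
        raise ValueError("no scoreboard entries")
    if any(t <= 0 for t in times.values()):
        raise ValueError("every T must be positive")
    t_best = min(times.values())  # the minimum T over all students
    return {student: t_best / t * 100 for student, t in times.items()}


# Hypothetical scoreboard values.
scores = performance_scores({"a": 2.0, "b": 4.0, "c": 8.0})
print(scores)  # {'a': 100.0, 'b': 50.0, 'c': 25.0}
```

A student who takes twice as long as the best entry therefore gets 50, which contributes 50 × 0.2 = 10 points to the final score.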
## Report (30%)
1. (10%) Your implementation
    - (5%) Mention which functions are ported to CUDA
    - (5%) Elaborate on how they distribute the workload across blocks and threads
2. (10%) The parallelization and optimization techniques you used in your solution
    - (3%) Mention at least 1 parallelization technique
    - (3%) Mention at least 1 optimization technique, excluding simple CUDA parallelization
    - (2%) Mention another parallelization or optimization technique
    - (2%) Mention yet another parallelization or optimization technique
3. (10%) Experiment with various combinations of the number of blocks and threads (at least 8 combinations) and plot the results in figures
    - (4%) Show at least 8 combinations of the number of blocks and threads
    - (3%) Show the figures, which should contain at least the number of blocks, the number of threads, and the execution time
    - (3%) Explain what causes the results or what they indicate
4. (Optional, 10%) If you use advanced CUDA techniques, describe the details
    - CUDA streams, page-locked (pinned) memory, asynchronous memory copies, or any other advanced techniques.
5. (Optional, 10%) If you optimize other parts of your source code, please demonstrate your experimental results. We REQUIRE you to justify your solutions so that we can give you credit.
    - Elaborate on which parts you optimized and how, and demonstrate the experimental results to justify the changes
6. (Optional, 10%) Any suggestions or feedback on the homework are welcome.
    - At least 1 meaningful suggestion or piece of constructive feedback on the assignment or spec
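
For report item 3, one possible way to enumerate launch configurations for the experiment is sketched below. This is only an illustration: the helper name `launch_configs` and the specific block and thread counts are hypothetical, and it assumes your kernel produces correct results for any launch configuration (for example, by looping over the data with a grid-sized stride). The assignment does not prescribe particular values.

```python
from itertools import product

# Per-block thread limit on current CUDA GPUs.
MAX_THREADS_PER_BLOCK = 1024


def launch_configs(block_counts, thread_counts):
    """Return every (blocks, threads) pair, skipping invalid thread counts."""
    return [
        (blocks, threads)
        for blocks, threads in product(block_counts, thread_counts)
        if 0 < threads <= MAX_THREADS_PER_BLOCK
    ]


# Hypothetical sweep: 4 block counts x 2 thread counts = 8 combinations.
configs = launch_configs([16, 64, 256, 1024], [128, 256])
for blocks, threads in configs:
    print(f"blocks={blocks:5d} threads={threads:4d} total_threads={blocks * threads}")
```

Each printed pair would then be run and timed, and the measured execution times plotted against the number of blocks and threads.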
###### tags: `grading policy`