HW3: Sobel

Due: Tue, 2023/4/11 23:59

Problem Description

Slides: https://docs.google.com/presentation/d/1Ux1caoQtJmzLOwkE-C1PGS40WoyhO7u_lB94PWq5Fws/edit?usp=sharing

This homework helps you understand the basic concepts in CUDA.

The Sobel operator is used in image processing and computer vision, particularly in edge detection algorithms, where it produces an image that emphasises edges.

In this homework, you are given the sequential (CPU) code of a 5x5 variant of the Sobel
operator and asked to parallelize it with CUDA. Refer to the appendix for details on the CPU version.
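To make the operator concrete, here is a minimal sketch of a 5x5 Sobel-style gradient at a single pixel of one channel. The mask values, the scale factor of 8, the zero treatment of out-of-range taps, and the names sobel_at, kSmooth and kDeriv are all assumptions made for illustration; the masks and scaling actually used in this homework are defined in the reference sobel.cc and may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// ASSUMPTION: illustrative separable 5x5 masks (smoothing [1 4 6 4 1] combined
// with derivative [-1 -2 0 2 1]). The homework's real masks live in sobel.cc.
static const int kSmooth[5] = {1, 4, 6, 4, 1};
static const int kDeriv[5] = {-1, -2, 0, 2, 1};

// Gradient magnitude of one channel at (x, y), clamped to [0, 255].
// `img` is a hypothetical row-major single-channel buffer of width * height
// bytes. Taps that fall outside the image contribute 0 (an assumption).
inline uint8_t sobel_at(const std::vector<uint8_t>& img, int width, int height,
                        int x, int y, double scale = 8.0) {
    double gx = 0.0, gy = 0.0;
    for (int v = -2; v <= 2; ++v) {
        for (int u = -2; u <= 2; ++u) {
            const int px = x + u, py = y + v;
            if (px < 0 || px >= width || py < 0 || py >= height) continue;
            const int p = img[py * width + px];
            gx += p * kDeriv[u + 2] * kSmooth[v + 2];  // horizontal change
            gy += p * kSmooth[u + 2] * kDeriv[v + 2];  // vertical change
        }
    }
    const double mag = std::sqrt(gx * gx + gy * gy) / scale;
    return static_cast<uint8_t>(std::min(mag, 255.0));
}
```

A flat region yields 0 because the derivative taps sum to zero, while a sharp vertical edge yields a large (here clamped) response. Every output pixel depends only on its own 5x5 input neighbourhood, which is what makes the operator a natural fit for one GPU thread per pixel.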

Your code should contain only a single GPU kernel function, named sobel().

Input Format

The input file is a PNG image with 3 color channels: RGB.

Output Format

The output file is a PNG image with 3 color channels: RGB.

Your output is considered correct if at least 99.8% of its pixels are identical to those produced by the provided sequential version.

Your output is considered incorrect if the dimensions of the output image are incorrect.
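The two rules above can be sketched as a small check. This is only an illustration of the stated criterion, not the logic of the provided hw3-diff tool: the Image struct and output_is_correct are hypothetical names, and treating a pixel as identical only when all three channels match is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory image: interleaved RGB, row-major, 3 bytes per pixel.
struct Image {
    int width;
    int height;
    std::vector<uint8_t> rgb;
};

// Returns true if `out` has the same dimensions as `ref` and at least 99.8%
// of its pixels are identical. ASSUMPTION: a pixel is "identical" only when
// R, G and B all match.
inline bool output_is_correct(const Image& out, const Image& ref) {
    if (out.width != ref.width || out.height != ref.height) return false;
    const std::size_t pixels =
        static_cast<std::size_t>(ref.width) * static_cast<std::size_t>(ref.height);
    if (out.rgb.size() != pixels * 3 || ref.rgb.size() != pixels * 3) return false;
    std::size_t same = 0;
    for (std::size_t i = 0; i < pixels; ++i) {
        if (out.rgb[3 * i] == ref.rgb[3 * i] &&
            out.rgb[3 * i + 1] == ref.rgb[3 * i + 1] &&
            out.rgb[3 * i + 2] == ref.rgb[3 * i + 2]) {
            ++same;
        }
    }
    // Integer form of same / pixels >= 0.998, avoiding floating-point rounding.
    return same * 1000 >= pixels * 998;
}
```

For a 100x10 image, for instance, up to 2 differing pixels would still pass under this reading, and a third would fail.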

Example Input

[Image: example input]

Example Output

[Image: example output]

Optimization Hint

  • Shared Memory
  • Coalesced Memory Access
  • Lower Precision
  • 2D Block & 2D threads
  • CUDA Best Practices
  • I/O optimization
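As one way to reason about the "Shared Memory" and "2D Block & 2D threads" hints, the host-side arithmetic below sketches how many 2D blocks are needed to cover an image and how large a shared-memory tile would be once a 2-pixel border (the reach of a 5x5 mask) is included. The 16x16 block size, the one-thread-per-pixel mapping, and the names LaunchPlan, ceil_div and plan_launch are assumptions for illustration, not requirements of this homework.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of a 2D launch configuration.
struct LaunchPlan {
    int grid_x, grid_y;        // number of blocks in each dimension
    int tile_w, tile_h;        // shared-memory tile size, border included
    std::size_t shared_bytes;  // bytes of shared memory per block
};

// Integer division rounded up, for positive operands.
constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }

// ASSUMPTION: one thread per output pixel, 8-bit interleaved channels, and a
// border ("halo") of 2 pixels on every side because a 5x5 mask reaches 2
// pixels past the centre.
inline LaunchPlan plan_launch(int width, int height, int block_x = 16,
                              int block_y = 16, int halo = 2, int channels = 3) {
    LaunchPlan p;
    p.grid_x = ceil_div(width, block_x);
    p.grid_y = ceil_div(height, block_y);
    p.tile_w = block_x + 2 * halo;
    p.tile_h = block_y + 2 * halo;
    p.shared_bytes = static_cast<std::size_t>(p.tile_w) * p.tile_h * channels;
    return p;
}
```

In a .cu file these numbers would typically feed dim3 grid(p.grid_x, p.grid_y) and dim3 block(16, 16). Because the grid is rounded up, threads in the last row and column of blocks may fall outside the image and would need a bounds check inside the kernel.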

Compilation

We use the NCHC container for this homework.

We use a Makefile to build your
code. The default Makefile for this homework is provided at /tmp/dataset-nthu-ipc23/share/hw3/Makefile.
If you wish to change the compilation flags, include your Makefile in your submission.

To build your code with the Makefile, make sure that Makefile and hw3.cu are both in the
working directory, then run make on the command line; it will build hw3
for you. To remove the built files, run make clean.

We will compile your code with the following command:

make

Execution

Your code will be executed with a command equivalent to:

./hw3 input.png output.png

The time limit for each test case is 30 seconds.

Report

Answer the following questions, in either English or Traditional Chinese.

  1. How did you parallelize the code?
    • Which CUDA APIs did you use?
    • Which functions are ported to CUDA? How did you distribute the workload
      to blocks and threads?
    • How do you implement shared memory?
  2. Which optimization techniques did you apply to your code?
  3. What's the difference between cudaMalloc and cudaMallocManaged? When would you pick one over the other?
  4. Experiment:
    • Measure the GPU kernel time using nvprof. Show the difference with and without shared memory. In addition, measure the global memory load throughput (gld_throughput) and instructions per cycle (ipc), and explain your observations.
      https://docs.nvidia.com/cuda/profiler-users-guide/#nvprof
    • Profile your program by measuring the time spent in I/O, memory copy, CPU, and kernel.
  5. Pick any image that is not in the sample test cases, run your implementation with the image, and showcase both the input and output in your report.
  6. (Optional) Any suggestions or feedback for the homework are welcome.
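For the per-phase profiling asked for in question 4, one possible host-side approach is a small accumulator built on std::chrono, sketched below. PhaseTimer and the phase names are hypothetical, and this is only one option alongside nvprof, not a required method.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Accumulates wall-clock seconds per named phase (e.g. "io", "memcpy").
class PhaseTimer {
public:
    // Runs `work` and adds its elapsed wall-clock time to `phase`.
    template <typename F>
    void time(const std::string& phase, F&& work) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        totals_[phase] += std::chrono::duration<double>(stop - start).count();
    }

    // Total seconds recorded for `phase`, or 0 if it was never timed.
    double seconds(const std::string& phase) const {
        const auto it = totals_.find(phase);
        return it == totals_.end() ? 0.0 : it->second;
    }

private:
    std::map<std::string, double> totals_;
};
```

A call might look like timer.time("io", [&] { /* read the PNG */ });. Kernel launches are asynchronous, so a host timer around a launch measures only the launch itself unless the timed code also waits for the GPU, for example with cudaDeviceSynchronize(). The kernel time in the report should still come from nvprof, as the question asks.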

Submission

Upload these files to EEClass:

  • hw3.cu: the source code of your implementation.
  • Makefile: optional. Submit this file if you want to change the build command.
  • report.pdf: your report.

Please follow the file names listed above exactly. Failing to adhere to these
names will result in a point deduction. Here are a few bad examples: hw3.CU,
HW3.cu, report.docx, report.pages, Makefile.mak.

Grading

  1. (40%) Correctness. Proportional to the number of test cases solved.
  2. (25%) Performance.
    • (15%) sobel GPU kernel function.
    • (10%) Total CPU time. For a failed test case, 75 seconds is added to your total time.
  3. (35%) Report.

Appendix

Please note that this spec, the sample test cases, and the programs might contain bugs.
If you spot one and are unsure about it, please ask on EEClass.

Sequential (CPU) Version

The reference C++ implementation is at /tmp/dataset-nthu-ipc23/share/hw3/sobel.cc.
The reference code follows the same input/output format as your homework, and
you can start implementing your version by copying it to hw3.cu.

Sample Testcases

The sample test cases are located at /tmp/dataset-nthu-ipc23/share/hw3/samples.

Output validation

/tmp/dataset-nthu-ipc23/share/hw3/hw3-diff can be used to compare two images.

For example, to compare your output with the answer, you may use:

/tmp/dataset-nthu-ipc23/share/hw3/hw3-diff out.png /tmp/dataset-nthu-ipc23/share/hw3/samples/c-1x.out.png

Judge

The hw3-judge and hw3-kernel-judge commands can be used to automatically judge your code against
all sample test cases. They also submit your execution time to the scoreboard
so you can compare your performance with others.

Scoreboards:
  • https://apollo.cs.nthu.edu.tw/ipc23/scoreboard/hw3/
  • https://apollo.cs.nthu.edu.tw/ipc23/scoreboard/hw3-kernel/

To use it, run hw3-judge in the directory that contains your code hw3.cu.
It will automatically search for a Makefile and use it to compile your code,
or fall back to the TA-provided /tmp/dataset-nthu-ipc23/share/hw3/Makefile otherwise.
If compilation succeeds, it will then run all the sample test cases,
show you the results, and update the scoreboard.

Note: hw3-judge and the scoreboard have nothing to do with grading.
Only the code submitted to EEClass is considered for grading purposes.

Type hw3-judge --help to see a list of supported options.

Judge Verdict Table

Verdict               Explanation
internal error        there is a bug in the judge
time limit exceeded+  execution time > time limit + 10 seconds
time limit exceeded   execution time > time limit
runtime error         your program didn't return 0 or is terminated by a signal
no output             your program did not produce an output file
wrong answer          your output is incorrect
accepted              you passed the test case