# Parallel Programming HW6
#### **`311551185 劉力勳`**

### <font color="#EA0000"> Q1: Explain your implementation. How do you optimize the performance of convolution? </font>

First, create the command queue, kernel, and memory buffers needed for OpenCL:
```
cl_command_queue q = clCreateCommandQueue(*context, *device, 0, NULL);
cl_kernel kernel = clCreateKernel(*program, "convolution", NULL);
cl_mem filter_buffer = clCreateBuffer(*context, CL_MEM_READ_ONLY, filterSize, NULL, NULL);
cl_mem input_buffer = clCreateBuffer(*context, CL_MEM_READ_ONLY, mem_size, NULL, NULL);
cl_mem output_buffer = clCreateBuffer(*context, CL_MEM_WRITE_ONLY, mem_size, NULL, NULL);
```

Next, write the input image and the filter into the read-only buffers:
```
clEnqueueWriteBuffer(q, filter_buffer, CL_TRUE, 0, filterSize, (void *)filter, 0, NULL, NULL);
clEnqueueWriteBuffer(q, input_buffer, CL_TRUE, 0, mem_size, (void *)inputImage, 0, NULL, NULL);
```

Then, set the arguments of the kernel function:
```
clSetKernelArg(kernel, 0, sizeof(cl_int), (void *)&filterWidth);
clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&filter_buffer);
clSetKernelArg(kernel, 2, sizeof(cl_int), (void *)&imageHeight);
clSetKernelArg(kernel, 3, sizeof(cl_int), (void *)&imageWidth);
clSetKernelArg(kernel, 4, sizeof(cl_mem), (void *)&input_buffer);
clSetKernelArg(kernel, 5, sizeof(cl_mem), (void *)&output_buffer);
```

Next, specify the global work size and the local work size, and then launch the kernel. The kernel code in kernel.cu is a slight modification of the serialConv.c provided by the teaching assistant, whose per-pixel computation can be reused almost as is. The global work size is set to imageWidth x imageHeight, so each pixel is assigned to its own work-item and all pixels are processed in parallel. The local work size is set to 25 x 25, which groups the work-items into work-groups of 625; this assumes both image dimensions are multiples of 25, since the global size normally has to be divisible by the local size.
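The 25 x 25 local size only works when the global size and the device allow it. As a minimal sketch (the helper name `work_sizes_ok` is hypothetical and not part of the submitted code), the following plain C check assumes the classic OpenCL rule that each global dimension must be a multiple of the corresponding local dimension, and that the work-group size limit has already been queried from the device, for example with `clGetDeviceInfo` and `CL_DEVICE_MAX_WORK_GROUP_SIZE`:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if a 2D NDRange with these sizes is valid
 * under the classic OpenCL rule (each global dimension is a multiple of the
 * local dimension) and the work-group does not exceed the device limit.
 * Returns 0 otherwise. */
int work_sizes_ok(const size_t global[2], const size_t local[2],
                  size_t max_work_group_size)
{
    /* A zero-sized work-group dimension is never valid. */
    if (local[0] == 0 || local[1] == 0)
        return 0;

    /* Each global dimension must divide evenly into work-groups. */
    if (global[0] % local[0] != 0 || global[1] % local[1] != 0)
        return 0;

    /* The total number of work-items per group must fit the device limit. */
    return local[0] * local[1] <= max_work_group_size;
}
```

For a hypothetical 600 x 400 image and a limit of 1024, `work_sizes_ok` returns 1; with a limit of 256 it returns 0, because 25 x 25 = 625 work-items would not fit in one work-group. The launch used in this implementation is: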
```
size_t global_size[2] = {imageWidth, imageHeight};
size_t local_size[2] = {25, 25};
clEnqueueNDRangeKernel(q, kernel, 2, NULL, global_size, local_size, 0, NULL, NULL);
```

Finally, read the result from device (GPU) memory back to the host:
```
clEnqueueReadBuffer(q, output_buffer, CL_TRUE, 0, mem_size, (void *)outputImage, 0, NULL, NULL);
```
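The kernel source is not listed above. As a rough sketch of the per-pixel work it performs, assuming it keeps the usual zero-padded loop structure of a serial convolution, the following plain C code emulates one work-item per pixel on the host. The names `convolve_pixel` and `convolve_all` are hypothetical; in an actual OpenCL kernel the coordinates would come from `get_global_id(0)` and `get_global_id(1)` rather than from loop variables.

```c
/* Hypothetical host-side emulation of the work one work-item does for
 * pixel (x, y). Neighbors that fall outside the image are skipped, which
 * is equivalent to zero padding (an assumption about the kernel). */
float convolve_pixel(int x, int y, int filterWidth, const float *filter,
                     int imageHeight, int imageWidth, const float *inputImage)
{
    int half = filterWidth / 2;
    float sum = 0.0f;

    for (int k = -half; k <= half; k++) {
        for (int l = -half; l <= half; l++) {
            int row = y + k;
            int col = x + l;
            if (row >= 0 && row < imageHeight && col >= 0 && col < imageWidth) {
                sum += inputImage[row * imageWidth + col] *
                       filter[(k + half) * filterWidth + (l + half)];
            }
        }
    }
    return sum;
}

/* Stand-in for the NDRange launch: one call per pixel, mirroring a global
 * work size of imageWidth x imageHeight. */
void convolve_all(int filterWidth, const float *filter, int imageHeight,
                  int imageWidth, const float *inputImage, float *outputImage)
{
    for (int y = 0; y < imageHeight; y++) {
        for (int x = 0; x < imageWidth; x++) {
            outputImage[y * imageWidth + x] =
                convolve_pixel(x, y, filterWidth, filter,
                               imageHeight, imageWidth, inputImage);
        }
    }
}
```

Because each output pixel depends only on the read-only input image and filter, the calls are independent of one another, which is what allows every pixel to be mapped to its own work-item.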