---
title: 'Parallel Programming HW6'
---

Parallel Programming HW6
===

Name: 解心妤
Student ID: 110550146

## Q1
### Question Statement
> Explain your implementation. How do you optimize the performance of convolution?

### Answer
#### 1. `hostFE.c`

**OpenCL Integration**

The host code uses OpenCL so that the convolution can be executed efficiently in parallel.

**Buffer Management**

3 buffers: input image, output image, and filter.
2 flags: `CL_MEM_USE_HOST_PTR`, `CL_MEM_WRITE_ONLY`

| Flag Type | Feature | Buffer |
| -------- | -------- | -------- |
| `CL_MEM_USE_HOST_PTR` | Reuses existing host memory, which reduces data-transfer overhead | input image, filter |
| `CL_MEM_WRITE_ONLY` | The kernel only writes results to this buffer | output image |

**Kernel Creation and Setting**

An OpenCL kernel is created from the function `convolution`. After the kernel is created, its arguments are set using `clSetKernelArg`.

**Parallel Execution**

Work size: `global_work_size` and `local_work_size` (the latter is set to 64 because of the server's capability).
The kernel is enqueued for execution with `clEnqueueNDRangeKernel`.

#### 2. `kernel.cl`

**For Loop**

The loop range is narrowed, which reduces the number of iterations and therefore the running time, especially for larger filters and images.

**Boundary Check**

The if-else boundary logic is resolved once, which avoids unnecessary calculation.
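
The two `kernel.cl` ideas above (a narrowed loop range and a boundary check that is resolved once) can be illustrated in plain C. This is only a minimal sketch under stated assumptions, not the submitted kernel: the function name `convolve_pixel`, its parameters, the row-major layout, the odd filter width, and zero padding outside the image are all hypothetical choices made for illustration. The function computes one output pixel, which is roughly the work a single work-item would do.

```c
#include <assert.h>

/* Hypothetical helper (not from the original kernel): computes the output
 * pixel at (x, y), assuming row-major float buffers, an odd filter width,
 * and zero padding (taps outside the image contribute nothing). */
static float convolve_pixel(const float *image, const float *filter,
                            int width, int height, int filter_width,
                            int x, int y)
{
    assert(filter_width % 2 == 1); /* assumption: odd-sized square filter */
    int half = filter_width / 2;

    /* Boundary check done once: clamp the filter window to the image,
     * so the inner loops never visit an out-of-range tap. */
    int k_start = (y - half < 0) ? -y : -half;
    int k_end   = (y + half >= height) ? (height - 1 - y) : half;
    int l_start = (x - half < 0) ? -x : -half;
    int l_end   = (x + half >= width) ? (width - 1 - x) : half;

    float sum = 0.0f;
    /* Narrowed loop range: only in-bounds taps are iterated. */
    for (int k = k_start; k <= k_end; k++) {
        for (int l = l_start; l <= l_end; l++) {
            sum += image[(y + k) * width + (x + l)] *
                   filter[(k + half) * filter_width + (l + half)];
        }
    }
    return sum;
}
```

With a 3x3 image and a 3x3 filter that are both filled with ones, a corner pixel visits only 4 taps and an edge pixel 6, instead of testing all 9 positions with an `if` inside the loop body.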