---
title: 'Parallel Programming HW6'
---
Parallel Programming HW6
===
Name: 解心妤
Student ID: 110550146
## Q1
### Question Statement
> Explain your implementation. How do you optimize the performance of convolution?
### Answer
#### 1. `hostFE.c`
**OpenCL Integration**
Using OpenCL allows the convolution to be executed efficiently in parallel on the device.
**Buffer Management**
3 Buffers: input image, output image, and filter.
2 Flags: `CL_MEM_USE_HOST_PTR`, `CL_MEM_WRITE_ONLY`
| Flag Type | Feature | Buffer |
| -------- | -------- | -------- |
| `CL_MEM_USE_HOST_PTR` | Reuses the existing host memory, which reduces data transfer overhead | input image, filter |
| `CL_MEM_WRITE_ONLY` | The kernel only writes results to this buffer | output image |
**Kernel Creation and Setting**
An OpenCL kernel is created from the `convolution` function.
After the kernel is created, its arguments are set with `clSetKernelArg`.
**Parallel Execution**
Work size: both `global_work_size` and `local_work_size` are specified; `local_work_size` is 64 because of the server's capability.
The kernel is enqueued for execution with `clEnqueueNDRangeKernel`.
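
A minimal sketch of how the two work sizes could relate, assuming a 1-D NDRange with one work-item per pixel (the NDRange layout is not stated in this report). In OpenCL 1.x, `global_work_size` has to be a multiple of `local_work_size`, so the pixel count is rounded up here. The helper names `round_up` and `global_size_for_image` are hypothetical and not part of `hostFE.c`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round n up to the next multiple of `multiple`. */
static size_t round_up(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}

/* Hypothetical helper: global work size for one work-item per pixel,
 * padded so that it is evenly divisible by the local work size. */
static size_t global_size_for_image(int width, int height, size_t local_size)
{
    return round_up((size_t)width * (size_t)height, local_size);
}
```

If the global size is padded like this, the kernel has to return early for work-items whose index is beyond the last pixel.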
#### 2. `kernel.cl`
**For Loop**
The loop range is narrowed, which reduces the number of iterations, especially for larger filters and images.
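
The kernel source is not shown in this report, so the following is only a sketch of one way to narrow the loop range, written as plain C for a single output pixel. It assumes zero padding, a square filter with odd width, and row-major storage; the function name `convolve_pixel_clamped` is hypothetical. The offset bounds are clamped once, so the loops never visit a position outside the image.

```c
#include <assert.h>

/* Convolve one output pixel (x, y) with zero padding.
 * Only filter taps that land inside the image are visited. */
static float convolve_pixel_clamped(const float *image, int width, int height,
                                    const float *filter, int filter_width,
                                    int x, int y)
{
    assert(filter_width % 2 == 1);
    int half = filter_width / 2;

    /* Narrow the offset ranges up front instead of testing every tap. */
    int k_min = (y - half < 0) ? -y : -half;
    int k_max = (y + half >= height) ? height - 1 - y : half;
    int l_min = (x - half < 0) ? -x : -half;
    int l_max = (x + half >= width) ? width - 1 - x : half;

    float sum = 0.0f;
    for (int k = k_min; k <= k_max; k++) {
        for (int l = l_min; l <= l_max; l++) {
            sum += image[(y + k) * width + (x + l)] *
                   filter[(k + half) * filter_width + (l + half)];
        }
    }
    return sum;
}
```

For a corner pixel and a 3x3 filter, this visits 4 taps instead of 9.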
**Boundary Check**
The boundary check is done in a single if-else, which avoids unnecessary calculation.
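
A sketch of what a single boundary check per tap could look like, again as plain C for one output pixel. It assumes that the check is one combined condition covering all four image edges, with zero padding and an odd, square filter; the function name `convolve_pixel_checked` is hypothetical and the real `kernel.cl` may differ.

```c
#include <assert.h>

/* Convolve one output pixel (x, y) with zero padding.
 * Each tap is guarded by one combined bounds test. */
static float convolve_pixel_checked(const float *image, int width, int height,
                                    const float *filter, int filter_width,
                                    int x, int y)
{
    assert(filter_width % 2 == 1);
    int half = filter_width / 2;
    float sum = 0.0f;

    for (int k = -half; k <= half; k++) {
        for (int l = -half; l <= half; l++) {
            int iy = y + k;
            int ix = x + l;
            /* One condition for all four edges; taps outside the image
             * are skipped, so no multiply-add is spent on them. */
            if (iy >= 0 && iy < height && ix >= 0 && ix < width) {
                sum += image[iy * width + ix] *
                       filter[(k + half) * filter_width + (l + half)];
            }
        }
    }
    return sum;
}
```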