# PP-f22 assignment 6
## Q1 (5 points): Explain your implementation. How do you optimize the performance of convolution?
In `hostFE.c`, I set the global work size of `clEnqueueNDRangeKernel` to `{(imageWidth+15)/16*16, (imageHeight+15)/16*16}` and the local work size to `{16, 16}`. This rounds each global dimension up to a multiple of the 16x16 work-group size.
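The rounding arithmetic can be sketched in plain C as follows. This is a minimal illustration, not the actual `hostFE.c` code: the names `TILE`, `round_up`, and `compute_work_sizes` are hypothetical helpers introduced here, and only the formula `(n+15)/16*16` and the `{16, 16}` local size come from the description above.

```c
#include <stddef.h>

/* Hypothetical constant: work-group edge length, matching the {16, 16} local size. */
#define TILE 16

/* Round n up to the nearest multiple of tile (tile must be > 0).
 * With tile == 16 this is the same as (n + 15) / 16 * 16. */
static size_t round_up(size_t n, size_t tile)
{
    return (n + tile - 1) / tile * tile;
}

/* Hypothetical helper: fill the two arrays that would be passed as
 * global_work_size and local_work_size to clEnqueueNDRangeKernel. */
static void compute_work_sizes(size_t imageWidth, size_t imageHeight,
                               size_t global[2], size_t local[2])
{
    local[0] = TILE;
    local[1] = TILE;
    global[0] = round_up(imageWidth, TILE);
    global[1] = round_up(imageHeight, TILE);
}
```

For example, a 600x400 image gives a global size of `{608, 400}`: 600 is padded up to the next multiple of 16, while 400 is already one. Because the padded grid can be larger than the image, the kernel presumably needs to ignore work-items whose indices fall outside it.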
In `kernel.cl`, I copied the serial version and changed the index computation. I also added the `const` and `__restrict__` qualifiers in an attempt to let the compiler generate faster code.
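The sketch below emulates that kernel structure in plain C so it can be run without an OpenCL device. It rests on several assumptions that the text above does not confirm: one work-item per output pixel, a bounds guard for the padded work-items, zero contribution from pixels outside the image, and all function and parameter names. `gx` and `gy` stand in for `get_global_id(0)` and `get_global_id(1)`, and C99 `restrict` plays the role of the `__restrict__` qualifier.

```c
#include <stddef.h>

/* Body of one emulated work-item (hypothetical signature). */
static void conv_work_item(size_t gx, size_t gy,
                           const float *restrict filter, int filterWidth,
                           int imageHeight, int imageWidth,
                           const float *restrict input,
                           float *restrict output)
{
    /* Assumed guard: work-items in the padded region do nothing. */
    if (gx >= (size_t)imageWidth || gy >= (size_t)imageHeight)
        return;

    int half = filterWidth / 2;
    int x = (int)gx, y = (int)gy;
    float sum = 0.0f;

    /* Same inner loops as a serial convolution; only (x, y) now come
     * from the work-item index instead of two outer loops. */
    for (int k = -half; k <= half; k++) {
        for (int l = -half; l <= half; l++) {
            int yy = y + k, xx = x + l;
            /* Neighbours outside the image are skipped. */
            if (yy >= 0 && yy < imageHeight && xx >= 0 && xx < imageWidth)
                sum += input[yy * imageWidth + xx] *
                       filter[(k + half) * filterWidth + (l + half)];
        }
    }
    output[y * imageWidth + x] = sum;
}

/* Host-side stand-in for the NDRange launch: visits every index of the
 * grid after rounding each dimension up to a multiple of 16. */
static void conv_emulated(const float *filter, int filterWidth,
                          int imageHeight, int imageWidth,
                          const float *input, float *output)
{
    size_t gw = ((size_t)imageWidth + 15) / 16 * 16;
    size_t gh = ((size_t)imageHeight + 15) / 16 * 16;
    for (size_t gy = 0; gy < gh; gy++)
        for (size_t gx = 0; gx < gw; gx++)
            conv_work_item(gx, gy, filter, filterWidth,
                           imageHeight, imageWidth, input, output);
}
```

In a real kernel the two loops in `conv_emulated` disappear, since the runtime launches one work-item per `(gx, gy)` pair. The `const` and `restrict` qualifiers tell the compiler that the input and filter are read-only and that the three buffers do not overlap, which may allow it to keep loaded values in registers; whether this gives a measurable speedup depends on the device and compiler.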