# PP-f22 assignment 6

## Q1 (5 points): Explain your implementation. How do you optimize the performance of convolution?

For `hostFE.c`, I changed the global work size and local work size passed to `clEnqueueNDRangeKernel` to `{(imageWidth+15)/16*16, (imageHeight+15)/16*16}` and `{16, 16}`, respectively. The expression `(n+15)/16*16` rounds each image dimension up to the next multiple of 16, so the global size is always divisible by the 16×16 work-group size.

For `kernel.cl`, I copied the serial version and changed the indexing part. I also added `const` and `__restrict__` to help the compiler generate faster code.
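The work-size arithmetic and the per-pixel computation can be sketched in plain C as below. This is only an illustration, not the submitted `hostFE.c` or `kernel.cl`: the helper names `round_up`, `compute_work_sizes`, `in_bounds`, and `convolve_pixel` are hypothetical, and the sketch assumes a row-major `float` image, a square filter of odd width, and out-of-image neighbours being skipped. Because the global size is padded past the image edge, it also assumes the kernel needs a bounds check so that the extra work-items do nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round n up to the next multiple of `multiple`. */
static size_t round_up(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return (n + multiple - 1) / multiple * multiple;
}

/* Fill the two size arrays that would be passed to clEnqueueNDRangeKernel:
 * a fixed 16x16 local size and a global size padded to a multiple of it. */
static void compute_work_sizes(int imageWidth, int imageHeight,
                               size_t global[2], size_t local[2])
{
    local[0] = 16;
    local[1] = 16;
    global[0] = round_up((size_t)imageWidth, local[0]);
    global[1] = round_up((size_t)imageHeight, local[1]);
}

/* Work-items in the padded region lie outside the image and must be skipped. */
static int in_bounds(size_t x, size_t y, int imageWidth, int imageHeight)
{
    return x < (size_t)imageWidth && y < (size_t)imageHeight;
}

/* Serial stand-in for what one work-item computes: the output value at (x, y).
 * `const` and `restrict` tell the compiler the inputs are read-only and do
 * not alias each other. Assumes an odd filterWidth and row-major storage. */
static float convolve_pixel(const float *restrict input,
                            const float *restrict filter,
                            int filterWidth, int imageWidth, int imageHeight,
                            int x, int y)
{
    int half = filterWidth / 2;
    float sum = 0.0f;
    for (int k = -half; k <= half; k++) {
        for (int l = -half; l <= half; l++) {
            int yy = y + k;
            int xx = x + l;
            /* Neighbours outside the image contribute nothing. */
            if (yy >= 0 && yy < imageHeight && xx >= 0 && xx < imageWidth) {
                sum += input[yy * imageWidth + xx]
                     * filter[(k + half) * filterWidth + (l + half)];
            }
        }
    }
    return sum;
}
```

For example, a 600×400 image would give a global size of `{608, 400}`: the width is padded by 8 work-items per row, while the height is already a multiple of 16. In an actual kernel, `x` and `y` would come from `get_global_id(0)` and `get_global_id(1)`, with the `in_bounds` condition applied before writing the output.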