# Parallel Programming HW6

###### tags: `Parallel Programming`

### Q1 Explain your implementation. How do you optimize the performance of convolution?

Ans:

* In the `hostFE.c` file, set the local work size to 25 x 25 and the global work size to imageWidth x imageHeight, so that each pixel in the image is handled by its own work item.
* Each work item's kernel code is shown below.
* Lines 8 and 9 use the global IDs to determine which pixel each work item processes, while lines 13 to 21 accumulate the filtered sum and skip neighbors that fall outside the image, which is equivalent to zero-padding the picture.
* Lastly, the result is stored in the output location corresponding to that pixel.

Because no work item depends on another's result, every pixel can be calculated in parallel, which improves convolution performance.

```clike=1
__kernel void convolution(int filterWidth, __global float * filter, int imageHeight, int imageWidth,
                          __global float * inputImage, __global float * outputImage)
{
    int halffilterSize = filterWidth / 2;
    float sum;
    int k, l;

    int ix = get_global_id(0);  // column of the pixel handled by this work item
    int iy = get_global_id(1);  // row of the pixel handled by this work item

    sum = 0;

    for (k = -halffilterSize; k <= halffilterSize; k++){
        for (l = -halffilterSize; l <= halffilterSize; l++){
            if (iy + k >= 0 && iy + k < imageHeight &&
                ix + l >= 0 && ix + l < imageWidth){  // out-of-range neighbors contribute 0
                sum += inputImage[(iy + k) * imageWidth + ix + l] *
                       filter[(k + halffilterSize) * filterWidth + l + halffilterSize];
            }
        }
    }
    outputImage[iy * imageWidth + ix] = sum;
}
```
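
To check the kernel's output, it may help to compare it against a serial version of the same computation. The sketch below is one possible reference: the name `convolution_ref` is hypothetical and not part of the assignment code, and it assumes an odd `filterWidth`, since the kernel's index arithmetic only covers the full filter in that case. The two outer loops play the role of the work items' global IDs.

```c
#include <assert.h>

/* Serial reference for the kernel above (hypothetical helper, not from the
 * assignment). Each (ix, iy) iteration does what one work item does. */
void convolution_ref(int filterWidth, const float *filter,
                     int imageHeight, int imageWidth,
                     const float *inputImage, float *outputImage)
{
    /* Assumption: the filter width is odd, as the kernel's indexing implies. */
    assert(filterWidth % 2 == 1);
    int half = filterWidth / 2;

    for (int iy = 0; iy < imageHeight; iy++) {
        for (int ix = 0; ix < imageWidth; ix++) {
            float sum = 0.0f;
            for (int k = -half; k <= half; k++) {
                for (int l = -half; l <= half; l++) {
                    int y = iy + k;
                    int x = ix + l;
                    /* Neighbors outside the image are skipped (zero-padding). */
                    if (y >= 0 && y < imageHeight && x >= 0 && x < imageWidth) {
                        sum += inputImage[y * imageWidth + x] *
                               filter[(k + half) * filterWidth + (l + half)];
                    }
                }
            }
            outputImage[iy * imageWidth + ix] = sum;
        }
    }
}
```

Running this on the same input and comparing the result element by element with the buffer read back from the device is a simple way to confirm that the zero-padding and indexing in the kernel behave as intended.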