# Parallel Programming HW6
###### tags: `Parallel Programming`
### Q1
Explain your implementation. How do you optimize the performance of convolution?
Ans:
* In `hostFE.c`, the local work size is set to 25 x 25 and the global work size to imageWidth x imageHeight, so that each pixel in the image gets its own work item.
* Each work item's kernel code is shown below.
* Lines 8 and 9 determine which pixel each work item processes, while the bounds check on lines 13 and 14 implements zero padding by skipping filter taps that fall outside the image.
* Lastly, the result is stored in its corresponding location. By doing so, each pixel can be calculated in parallel, improving convolution performance.
```clike=1
__kernel void convolution(int filterWidth, __global float * filter,
                          int imageHeight, int imageWidth,
                          __global float * inputImage, __global float * outputImage)
{
    int halffilterSize = filterWidth / 2;
    float sum;
    int k, l;
    int ix = get_global_id(0);  // column of the pixel this work item computes
    int iy = get_global_id(1);  // row of the pixel this work item computes
    sum = 0;
    for (k = -halffilterSize; k <= halffilterSize; k++){
        for (l = -halffilterSize; l <= halffilterSize; l++){
            if (iy + k >= 0 && iy + k < imageHeight &&
                ix + l >= 0 && ix + l < imageWidth){  // taps outside the image contribute 0
                sum += inputImage[(iy + k) * imageWidth + ix + l] *
                       filter[(k + halffilterSize) * filterWidth + l + halffilterSize];
            }
        }
    }
    outputImage[iy * imageWidth + ix] = sum;
}
```
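
One way to check the kernel's output is to compare it against a serial version of the same computation. The following is a minimal sketch in plain C and is not part of the submitted code: the names `convolution_reference` and `images_match` are hypothetical, and it assumes an odd `filterWidth` so that the filter has a center tap. The two outer loops stand in for the 2D range of work items, and the loop body mirrors the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Serial reference: each (ix, iy) iteration does the work of one work item. */
void convolution_reference(int filterWidth, const float *filter,
                           int imageHeight, int imageWidth,
                           const float *inputImage, float *outputImage)
{
    /* Assumption: odd filter width, so the filter has a well-defined center. */
    assert(filterWidth % 2 == 1);
    int half = filterWidth / 2;

    for (int iy = 0; iy < imageHeight; iy++) {
        for (int ix = 0; ix < imageWidth; ix++) {
            float sum = 0.0f;
            for (int k = -half; k <= half; k++) {
                for (int l = -half; l <= half; l++) {
                    int y = iy + k;
                    int x = ix + l;
                    /* Zero padding: taps outside the image are skipped. */
                    if (y >= 0 && y < imageHeight && x >= 0 && x < imageWidth) {
                        sum += inputImage[y * imageWidth + x] *
                               filter[(k + half) * filterWidth + (l + half)];
                    }
                }
            }
            outputImage[iy * imageWidth + ix] = sum;
        }
    }
}

/* Returns 1 if every pair of elements differs by at most tol, else 0. */
int images_match(const float *a, const float *b, size_t n, float tol)
{
    for (size_t i = 0; i < n; i++) {
        float d = a[i] - b[i];
        if (d < 0) d = -d;
        if (!(d <= tol)) return 0;  /* also rejects NaN differences */
    }
    return 1;
}
```

After reading the OpenCL result back to the host, calling `images_match` on it and the reference output with a small tolerance would flag indexing or padding mistakes. A tolerance is used rather than exact equality because the device compiler may evaluate the multiply-add sequence slightly differently from the host.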