# Parallel Programming Homework 6
## Q1
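The convolution is computed over a 2D grid of threads (work-items). Each thread computes one pixel of the output image by applying the filter to the neighborhood of the corresponding input pixel; neighbors that fall outside the image are skipped. The input image and the filter are passed to the kernel in constant memory, which is read-only and typically cached, to speed up the repeated reads.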
```clike=
__kernel void convolution(__constant float *input, __constant float *filter, __global float *output,
                          __constant int *imageWidth, __constant int *imageHeight, __constant int *filterWidth) {
    float res = 0.0f;
    // One work-item per output pixel: dimension 0 indexes rows, dimension 1 indexes columns.
    int row = get_global_id(0);
    int col = get_global_id(1);
    int i = row * (*imageWidth) + col;  // flat index of this output pixel
    int halfw = (*filterWidth) / 2;
    // Walk the filter window centered on (row, col).
    for (int j = -halfw; j <= halfw; j++) {
        for (int k = -halfw; k <= halfw; k++) {
            int orow = row + j;
            int ocol = col + k;
            // Skip neighbors outside the image; row 0 and column 0 are valid, hence >= 0.
            if (orow >= 0 && orow < *imageHeight && ocol >= 0 && ocol < *imageWidth) {
                res += input[orow * (*imageWidth) + ocol] * filter[(j + halfw) * (*filterWidth) + (k + halfw)];
            }
        }
    }
    output[i] = res;
}
```
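
The per-pixel computation in the kernel above can be cross-checked on the host with a serial version. The following is a minimal sketch, not part of the submitted code: the function name `convolution_ref` and its by-value `int` parameters are assumptions made for illustration, and it assumes an odd filter width and that neighbors outside the image contribute zero.

```c
#include <assert.h>

/* Serial reference for the 2D convolution (hypothetical helper, for checking only).
 * input and output are imageHeight x imageWidth, row-major;
 * filter is filterWidth x filterWidth, row-major. */
void convolution_ref(const float *input, const float *filter, float *output,
                     int imageWidth, int imageHeight, int filterWidth)
{
    /* Assumption: the filter width is odd, so the filter has a center element. */
    assert(filterWidth % 2 == 1);
    int halfw = filterWidth / 2;

    for (int row = 0; row < imageHeight; row++) {
        for (int col = 0; col < imageWidth; col++) {
            float res = 0.0f;
            for (int j = -halfw; j <= halfw; j++) {
                for (int k = -halfw; k <= halfw; k++) {
                    int orow = row + j;
                    int ocol = col + k;
                    /* Neighbors outside the image are skipped (treated as zero). */
                    if (orow >= 0 && orow < imageHeight &&
                        ocol >= 0 && ocol < imageWidth) {
                        res += input[orow * imageWidth + ocol]
                             * filter[(j + halfw) * filterWidth + (k + halfw)];
                    }
                }
            }
            output[row * imageWidth + col] = res;
        }
    }
}
```

One way to use it is to run both versions on the same input and compare the outputs element by element, allowing a small tolerance for floating-point differences. As a quick sanity check, a 3x3 image of ones with a 3x3 filter of ones should give 4 at the corners, 6 on the edges, and 9 at the center.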