Parallel Programming Assignment 6: OpenCL Programming

Q1. Explain your implementation. How do you optimize the performance of convolution?

I first allocate an array of width imageWidth + 2*halffilterSize and height imageHeight + 2*halffilterSize to hold the padded image, and copy the data of inputImage into it.

Next, I create a buffer of size (imageWidth + 2*halffilterSize) * (imageHeight + 2*halffilterSize) to transfer the padded image, and a buffer of size imageWidth * imageHeight to transfer outputImage.

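In the kernel, the NDRange is imageWidth * imageHeight, and each work-item performs filterWidth * filterWidth multiply-adds to compute one pixel of outputImage. Because the input is padded, every filter window stays inside the buffer, so the kernel needs no boundary checks.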

Finally, I allocate an array to hold the image read back from the output buffer, and copy its data into outputImage, the argument of hostFE.
