
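# PP_HW5

### Q1

What are the pros and cons of the three methods? Give an assumption about their performances.

Kernel1 uses `malloc` and `cudaMalloc`, the ordinary way of requesting memory. The host buffer from `malloc` is pageable: the OS decides whether its pages should be moved out to disk (virtual memory). A page fault can therefore occur, and we then have to wait for the OS to bring the page back from disk into memory. I expect kernel1 to be slower than kernel2.

Kernel2 uses `cudaHostAlloc`, which allocates page-locked (pinned) host memory that is never moved to disk. Page faults can no longer occur, and the GPU can copy to and from this memory directly by DMA. Kernel2 also uses `cudaMallocPitch`, which pads every row of the 2D device buffer so that each row starts at an aligned address. This lets a warp's accesses within a row be coalesced. I expect kernel2 to be somewhat faster than kernel1.

Kernel3 uses the same allocation methods as kernel2, but each thread is assigned several pixels instead of one. The pixels assigned to a thread lie in the same region (close to one another), so it may benefit from spatial locality. Even so, because each thread has more work to do, I expect kernel3 to be the slowest of the three.

### Q2

How are the performances of the three methods? Plot a chart to show the differences among the three methods.

view1

![](https://i.imgur.com/nrRW1zh.png)

view2

![](https://i.imgur.com/Q39k3S2.png)

### Q3

Explain the performance differences thoroughly based on your experimental results. Do the results match your assumption? Why or why not?

I originally expected kernel2 to be the fastest, but the results do not seem to match that expectation. I think the reason is as follows. Even though the host memory is pinned so that it cannot be swapped out, in our case data is copied from the device back to host memory only once, so `cudaHostAlloc` makes little difference. As for `cudaMallocPitch`, each thread computes a different pixel, so there is no spatial locality to exploit either. As a result, kernel1 and kernel2 run at about the same speed. Kernel3 is the slowest of the three, as we expected.

### Q4

Can we do even better? Think of a better approach and explain it. Implement your method in `kernel4.cu`.

I did not complete this part.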
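A CPU-side sketch may make the `cudaMallocPitch` point from Q1 more concrete. It is not the assignment's kernel code and not a `kernel4` attempt. It only emulates the pitched row layout in Python. The 512-byte alignment and 4-byte pixel size are assumptions chosen for illustration. On a real GPU the pitch is whatever `cudaMallocPitch` returns, and the helper names below are hypothetical.

```python
# Emulates, on the CPU, the pitched 2D layout returned by cudaMallocPitch.
# ASSUMPTIONS (not from the assignment): 512-byte row alignment and 4-byte
# pixels. The real pitch is whatever cudaMallocPitch reports.
import struct

ASSUMED_ALIGN = 512  # bytes; hypothetical, the driver picks the real value
PIXEL_SIZE = 4       # bytes per pixel (one 32-bit int), assumed


def padded_pitch(width, elem_size=PIXEL_SIZE, align=ASSUMED_ALIGN):
    """Round a row's byte width up to the next multiple of `align`."""
    row_bytes = width * elem_size
    return (row_bytes + align - 1) // align * align


def byte_offset(row, col, pitch, elem_size=PIXEL_SIZE):
    """Byte offset of pixel (row, col): consecutive rows are `pitch` bytes
    apart, not width * elem_size, so every row starts on an aligned boundary."""
    return row * pitch + col * elem_size


def fill_pitched(width, height):
    """Allocate a zeroed pitched buffer and store row * width + col in each
    pixel. The padding bytes at the end of each row are never written."""
    pitch = padded_pitch(width)
    buf = bytearray(pitch * height)
    for row in range(height):
        for col in range(width):
            struct.pack_into("<i", buf, byte_offset(row, col, pitch),
                             row * width + col)
    return buf, pitch


def read_pixel(buf, pitch, row, col):
    """Read one pixel back using the same pitched addressing."""
    return struct.unpack_from("<i", buf, byte_offset(row, col, pitch))[0]


if __name__ == "__main__":
    print(padded_pitch(1600))  # 1600 * 4 = 6400 bytes, padded to 6656
    buf, pitch = fill_pitched(130, 3)
    print(pitch, read_pixel(buf, pitch, 2, 129))  # prints: 1024 389
```

Because rows are padded, device code typically locates a row in a pitched buffer as `(char*)base + row * pitch` rather than `base + row * width`.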