---
title: PP proposal
file name: team10_proposal_0656066_0656060_0551284.pdf
---

# Parallel Programming final project proposal

### 1. Title

Parallel Implementation of Coin Counting by Otsu's Method on GPU

### 2. The participants

TEAM 10

* 0656066 陳博聖
* 0656060 蘇意喬
* 0551284 林宸皜

### 3. Introduction/motivation

Have you ever poured all the coins out of your wallet and had no idea how many there were? Coins can be a nuisance, but they are still an important part of our lives.[1] We use them in places like stores, buses, and trains. We want to design an application that counts the coins on a table by image processing. Our application uses two well-known image processing algorithms: Otsu thresholding and the Hough transform.

### 4. Statement of the problem

In this application, we need to implement two well-known algorithms, Otsu thresholding[2] and the Hough transform. These algorithms scan the whole image (and, for Otsu's method, its histogram) many times, so a single-threaded implementation performs poorly.

![coin](https://i.imgur.com/cRGF0lp.png =90x90)

### 5. Proposed approaches

```flow
input=>start: Image
output=>end: Coin Count
image1=>operation: gray-level image
image2=>operation: binary image
op=>subroutine: Gray level
op2=>subroutine: Otsu's method
op3=>subroutine: Circle Hough transform
input->op->image1->op2->image2->op3->output
```

The gray-level step converts the original color image to a gray-level image. Otsu's method then finds a suitable threshold and uses it to binarize the image. Finally, the circle Hough transform finds the circles in the binary image, and each detected circle is counted as one coin. We will implement these three steps in both C++ and CUDA to demonstrate the speedup.

### 6. Language selection

We choose CUDA rather than Pthreads or OpenMP because GPGPUs perform much better than CPUs on image processing, where the same operation is applied independently to many pixels. Compared with OpenCL, we consider CUDA much easier to program.

### 7. Related work

Other researchers have proposed parallelizing Otsu's method on the CUDA platform.[3] In CUDA, the host and the device are assumed to maintain their own DRAM: host memory is allocated with `malloc` and device memory with `cudaMalloc`. Each CUDA thread is assigned a unique thread ID that identifies its location within its thread block and grid. The following pseudo-code outlines the structure of the parallel code for Otsu's method.

### 8. Expected results

We expect our solution to speed up Otsu thresholding and the Hough transform by 3.5 times across different image sizes. We also aim to make loading images to the GPU more efficient.

### 9. Schedule

| Time | Scheduled work |
| ------:| -----------:|
| Nov 1 - Nov 15 | Implement C++ version |
| Nov 15 - Dec 1 | Implement CUDA version |
| Dec 1 - presentation | Data collection and analysis |
| By Jan 12 | Upload source code and report |

### 10. References

[1] FUKUMI, Minoru, et al. Rotation-invariant neural pattern recognition system with application to coin recognition. IEEE Transactions on Neural Networks, 1992, 3.2: 272-279.

[2] OTSU, Nobuyuki. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 1979, 9.1: 62-66.

[3] SINGH, Brij Mohan, et al. Parallel implementation of Otsu's binarization approach on GPU. International Journal of Computer Applications, 2010, 32.2: 16-21.
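The gray-level step of Section 5 is a pure per-pixel map, which is what makes it a good fit for the GPU. A minimal sequential C++ sketch of it follows, the kind of thing the C++ version in the schedule could start from. The ITU-R BT.601 luma weights (0.299, 0.587, 0.114), the interleaved 8-bit RGB layout, and the names `rgb_to_gray` and `to_gray` are assumptions for illustration, not something the proposal specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts one RGB pixel to a gray level using the BT.601 luma weights
// (an assumption; the proposal does not fix the weights). Integer arithmetic
// with rounding: (299 R + 587 G + 114 B + 500) / 1000.
inline std::uint8_t rgb_to_gray(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>((299u * r + 587u * g + 114u * b + 500u) / 1000u);
}

// Sequential baseline over an interleaved RGB buffer (R, G, B, R, G, B, ...).
// Every iteration is independent: in a CUDA kernel, the loop index i would be
// replaced by the global thread index blockIdx.x * blockDim.x + threadIdx.x.
std::vector<std::uint8_t> to_gray(const std::vector<std::uint8_t>& rgb) {
    assert(rgb.size() % 3 == 0);
    std::vector<std::uint8_t> gray(rgb.size() / 3);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        gray[i] = rgb_to_gray(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
    }
    return gray;
}
```

For example, a pure red pixel (255, 0, 0) maps to gray level 76 and a white pixel maps to 255.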
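Otsu's method [2] picks the threshold $t$ that maximizes the between-class variance $\sigma_B^2(t) = \omega_0(t)\,\omega_1(t)\,[\mu_0(t) - \mu_1(t)]^2$, where $\omega_0, \omega_1$ are the weights of the gray levels at or below $t$ and above $t$, and $\mu_0, \mu_1$ are their mean gray levels. The sequential C++ sketch below builds the histogram, scans all 256 candidate thresholds, and binarizes the image. It assumes 8-bit gray levels and uses pixel counts instead of probabilities (this scales $\sigma_B^2$ by a constant and does not change the best $t$); the names `histogram`, `otsu_threshold`, and `binarize` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Histogram = std::array<std::uint32_t, 256>;

// Counts how many pixels have each gray level. On a GPU, each thread would
// handle one pixel and update its bin with atomicAdd.
Histogram histogram(const std::vector<std::uint8_t>& gray) {
    Histogram hist{};  // value-initialized: all bins start at zero
    for (std::uint8_t v : gray) ++hist[v];
    return hist;
}

// Returns the t maximizing w0 * w1 * (mu0 - mu1)^2, where class 0 holds
// levels <= t and class 1 holds levels > t (weights are pixel counts).
int otsu_threshold(const Histogram& hist) {
    double total = 0.0, sum_all = 0.0;
    for (int t = 0; t < 256; ++t) {
        total += hist[t];
        sum_all += static_cast<double>(t) * hist[t];
    }
    double w0 = 0.0, sum0 = 0.0, best_var = -1.0;
    int best_t = 0;
    for (int t = 0; t < 256; ++t) {
        w0 += hist[t];
        sum0 += static_cast<double>(t) * hist[t];
        const double w1 = total - w0;
        if (w0 == 0.0) continue;  // class 0 empty: criterion undefined
        if (w1 == 0.0) break;     // class 1 empty for this and every larger t
        const double mu0 = sum0 / w0;
        const double mu1 = (sum_all - sum0) / w1;
        const double var_between = w0 * w1 * (mu0 - mu1) * (mu0 - mu1);
        if (var_between > best_var) {  // strict '>' keeps the lowest t on ties
            best_var = var_between;
            best_t = t;
        }
    }
    return best_t;
}

// Levels above the threshold become 255, all others become 0.
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& gray, int t) {
    assert(t >= 0 && t < 256);
    std::vector<std::uint8_t> out(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i) out[i] = gray[i] > t ? 255 : 0;
    return out;
}
```

The histogram is the part that needs care on a GPU, because many threads update the same bin at once; the 256-step threshold search is small and could reasonably stay on the host.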
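For the last stage in Section 5, a circle Hough transform lets every foreground pixel vote for all candidate centers at distance $r$ from it, so accumulator cells with many votes mark circle centers. The sketch below is a simplified sequential C++ version under several assumptions of ours: all coins share one known radius `r` in pixels, the vote samples 360 angles (1 degree steps), non-zero pixels of the binary image are treated as circle outline pixels (filled coin blobs would first need their boundary extracted), and coins are counted by greedy peak picking with a suppression window. The names `hough_circle_votes`, `count_circles`, `draw_circle`, and `min_votes` are hypothetical; a fuller implementation would likely search over a range of radii.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr int kAngles = 360;  // angular samples per circle (assumed 1 degree steps)

// Offset (r cos theta, r sin theta) of the k-th sample. Drawing and voting
// both call this, so they use identical floating-point values.
std::pair<double, double> circle_offset(int k, int r) {
    const double theta = 2.0 * 3.14159265358979323846 * k / kAngles;
    return {r * std::cos(theta), r * std::sin(theta)};
}

// Hypothetical helper for synthetic test images: rasterizes a circle outline
// of radius r centered at (cx, cy) into a row-major w x h image.
void draw_circle(std::vector<std::uint8_t>& img, int w, int h, int cx, int cy, int r) {
    for (int k = 0; k < kAngles; ++k) {
        const auto off = circle_offset(k, r);
        const long x = std::lround(cx + off.first);
        const long y = std::lround(cy + off.second);
        if (x >= 0 && x < w && y >= 0 && y < h) img[y * w + x] = 255;
    }
}

// Fixed-radius voting: each non-zero pixel votes for every center at
// distance r from it. Returns a row-major w x h accumulator of vote counts.
std::vector<int> hough_circle_votes(const std::vector<std::uint8_t>& bin, int w, int h, int r) {
    assert(bin.size() == static_cast<std::size_t>(w) * h);
    std::vector<int> acc(bin.size(), 0);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (bin[static_cast<std::size_t>(y) * w + x] == 0) continue;
            for (int k = 0; k < kAngles; ++k) {
                const auto off = circle_offset(k, r);
                const long a = std::lround(x - off.first);   // candidate center x
                const long b = std::lround(y - off.second);  // candidate center y
                // On a GPU this increment would need atomicAdd, because many
                // threads can vote for the same cell.
                if (a >= 0 && a < w && b >= 0 && b < h) ++acc[b * w + a];
            }
        }
    }
    return acc;
}

// Greedy peak picking: take the strongest cell, count it as one coin, clear
// the (2r+1) x (2r+1) window around it so other cells of the same circle are
// not counted again, and repeat while the peak has at least min_votes votes.
int count_circles(std::vector<int> acc, int w, int h, int r, int min_votes) {
    assert(min_votes > 0 && acc.size() == static_cast<std::size_t>(w) * h);
    int count = 0;
    while (!acc.empty()) {
        const auto best = std::max_element(acc.begin(), acc.end()) - acc.begin();
        if (acc[best] < min_votes) break;
        ++count;
        const int bx = static_cast<int>(best % w);
        const int by = static_cast<int>(best / w);
        for (int y = std::max(0, by - r); y <= std::min(h - 1, by + r); ++y)
            for (int x = std::max(0, bx - r); x <= std::min(w - 1, bx + r); ++x)
                acc[static_cast<std::size_t>(y) * w + x] = 0;
    }
    return count;
}
```

The voting loop dominates the cost and is where the parallelism lies: each foreground pixel (or each pixel-angle pair) could be handled by its own CUDA thread, with atomic increments resolving collisions in the accumulator.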