# F74096297 林懋 平行處理作業二 contributed by < `limaoxd` > ## 第一部份 * 作業要求的 **Scatterv()** , **Gatherv()** 傳遞陣列的元素不能直接使用自己建立的 **Structrue** , 這邊使用 **MPI_Type_contiguous()** 來串 3 個 **MPI_UNSIGNED_CHAR** 讓函式傳遞 RGB 像素 . ``` cpp MPI_Datatype MPI_RGB; MPI_Type_contiguous(3, MPI_UNSIGNED_CHAR, &MPI_RGB); MPI_Type_commit(&MPI_RGB); ``` * 檔案讀取和由 **PROCESS0** **廣播** 讀取圖片的 **WIDTH** 和 **HEIGHT** ``` cpp if(!id) { //由 PROCESS0 讀取檔案 if ( readBMP( infileName) ) cout << "Read file successfully!!" << endl; else cout << "Read file fails!!" << endl; w_h[1] = bmpInfo.biHeight, w_h[0] = bmpInfo.biWidth; } MPI_Bcast(&w_h, 2, MPI_INT, 0, MPI_COMM_WORLD); if(id) BMPSaveData = alloc_memory( w_h[1], w_h[0]); ``` - 動態配置記憶體給要分的陣列 , 並且為其保留 2 格高的 **row** 為了底下的平滑化 , 就夠元的底下順便計算每個 **PROCESS** 的像素位移和非整除時的高度 ``` cpp int divide_size[size]; int divide_height[size]; int disp[size]; int remainder = w_h[1] % size; //當有餘數時 (pixel_number / size)算出來的 row 高要加一 int divide_h = w_h[1] / size + (remainder ? 1 : 0); //建立分割的圖像 , 並且為高度多 2 來保留交換下的空間 scatter_BMP = alloc_memory(divide_h+2, w_h[0]); //事先計算傳遞像素大小 (int* divide_size) 和每個接收資料的位移 (int* disp) for(int i = 0 ; i < size ; ++i) { if(remainder && i == size - 1) divide_height[i] = w_h[1] - (divide_h * (size-1)); else divide_height[i] = divide_h; divide_size[i] = w_h[0] * divide_height[i]; disp[i] = (i ? disp[i-1] + divide_size[i-1] : 0); } //process 0 將像素資訊傳給其他 process MPI_Scatterv(BMPSaveData[0], divide_size, disp, MPI_RGB, scatter_BMP[0], divide_size[id], MPI_RGB, 0, MPI_COMM_WORLD); ``` - 底下圖片是當分成 **n** 份時 , 各區塊要怎麼把自己的像素傳遞或接收邊界的像素 ![](https://i.imgur.com/xUyWe17.jpg) > 我自己是每個區塊先 send 兩次 , 希望再夠透過 recv 兩次 process 避免再等其他 process 資訊而卡住 , 但為了保險我也加上了 MPI_Barrier() 防止偷跑 . ``` cpp for(int count = 0; count < NSmooth ; count ++){ MPI_Send(scatter_BMP[divide_height[id]], w_h[0], MPI_RGB, (id + 1)%size, 0, MPI_COMM_WORLD); MPI_Send(scatter_BMP[1], w_h[0], MPI_RGB, (id - 1 + size)%size, 0, MPI_COMM_WORLD); MPI_Recv(scatter_BMP[0], w_h[0], MPI_RGB, (id - 1 + size)%size, 0, MPI_COMM_WORLD, &status); MPI_Recv(scatter_BMP[divide_height[id]+1], w_h[0], MPI_RGB, (id + 1)%size, 0, MPI_COMM_WORLD, &status); MPI_Barrier(MPI_COMM_WORLD); swap(scatter_BMP, s_BMP); //進行平滑運算 for(int i = 1; i <= divide_height[id]; i++) for(int j =0; j < w_h[0] ; j++){ /*********************************************************/ /*設定上下左右像素的位置 */ /*********************************************************/ int Top = i-1; int Down = i+1; int Left = j>0 ? j-1 : w_h[0]-1; int Right = j<w_h[0]-1 ? j+1 : 0; /*********************************************************/ /*與上下左右像素做平均,並四捨五入 */ /*********************************************************/ scatter_BMP[i][j].rgbBlue = (double) (s_BMP[i][j].rgbBlue+s_BMP[Top][j].rgbBlue+s_BMP[Top][Left].rgbBlue+s_BMP[Top][Right].rgbBlue+s_BMP[Down][j].rgbBlue+s_BMP[Down][Left].rgbBlue+s_BMP[Down][Right].rgbBlue+s_BMP[i][Left].rgbBlue+s_BMP[i][Right].rgbBlue)/9+0.5; scatter_BMP[i][j].rgbGreen = (double) (s_BMP[i][j].rgbGreen+s_BMP[Top][j].rgbGreen+s_BMP[Top][Left].rgbGreen+s_BMP[Top][Right].rgbGreen+s_BMP[Down][j].rgbGreen+s_BMP[Down][Left].rgbGreen+s_BMP[Down][Right].rgbGreen+s_BMP[i][Left].rgbGreen+s_BMP[i][Right].rgbGreen)/9+0.5; scatter_BMP[i][j].rgbRed = (double) (s_BMP[i][j].rgbRed+s_BMP[Top][j].rgbRed+s_BMP[Top][Left].rgbRed+s_BMP[Top][Right].rgbRed+s_BMP[Down][j].rgbRed+s_BMP[Down][Left].rgbRed+s_BMP[Down][Right].rgbRed+s_BMP[i][Left].rgbRed+s_BMP[i][Right].rgbRed)/9+0.5; } } ``` > 其實我還有做 row 的上下移動 , 這樣子就能直接套用迴圈嘞上下移動的操作不複雜 , 就不貼上來嘞 - 最後由 **MPI_Gatherv()** 將各區塊在 **PROCESS0** 組合起來 ``` cpp MPI_Gatherv(scatter_BMP[0], divide_size[id], MPI_RGB, BMPSaveData[0], divide_size, disp, MPI_RGB, 0, MPI_COMM_WORLD); MPI_Barrier(MPI_COMM_WORLD); if(!id){ //寫入檔案 if ( saveBMP( outfileName ) ) cout << "Save file successfully!!" << endl; else cout << "Save file fails!!" << endl; } ``` ### 實績測試 0. 1 process 未平型化 ![](https://i.imgur.com/U14dJru.png) 1. 2 process ![](https://i.imgur.com/4hpMoaz.png) 2. 4 process ![](https://i.imgur.com/GUlfRUw.png) 3. 8 process ![](https://i.imgur.com/UsSIRh2.png) 4. 16 process ![](https://i.imgur.com/kYxwAar.png) > 觀察發現核心數越多 , 其加速逐漸減少 > 而且也 diff 過原程式生成的模糊, 並沒有跳出錯誤訊息 > ![](https://i.imgur.com/Ct9xcNo.png) ## 第二部份 ![](https://i.imgur.com/VCot8hO.png) 我的理解是當從奇數位(Odd)偶數位(Even) 開頭各取一對 **PAIR** 比較大小並進行 **SWAP()** , 直到 **Odd** 和 **Even** 開頭進行 **SORT** 並且未更動陣列就代表 sort 完嘞 . >複雜度大概是 ***O(n^2)*** 以下為未平行的基本版本 ``` c=1 #include <bits/stdc++.h> using namespace std; int main (){ int n , offset = 0 , odd = 0 , even = 0; cin >> n; srand(time(NULL)); vector<int> arr(n); for(auto &it : arr) it = rand(); for(;;){ if(offset) even = 1; else odd = 1; for(int i = offset; i < n; i += 2) { if(i + 1 < n && arr[i] > arr[i + 1]) { swap(arr[i], arr[i + 1]); if(offset) even = 0; else odd = 0; } } if(odd && even) break; offset = !offset; } for(auto &it : arr) cout << it << " "; cout << endl; } ``` 不過現在題目要求的平行化出現了限制 , 要在各個 PROCESS 生成 n/size 大小的 int* 陣列並不能將資訊 Boardcast . 而我的平行化想法如下 , * 各 process 會交換邊界的資訊 (e.g. PROCESS0 最右邊的元素(5) 和 PROCESS1 最左邊的元素(3),則 SWAP) 再對自己的陣列進行 odd_even sort. 而這個步驟會持續至各個 process 不能再 sort 的時候結束 * 由 process 去讀取 stdin 的內容來決定元素的總數量 ``` cpp if(!id) cin >> n; MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD); ``` * 每個 process 含有的元素數量皆不同 , 我的分配方法如下 ```cpp int remainder = n % size; for(int i = 0 ; i < size ; ++i) { if (id >= n && size > n){ //特判 n < process 數 sub_n[id] = 1; disp[i] = disp[i-1] + 1; } else { disp[i] = (i ? disp[i-1] + sub_n[i-1] : 0); //由 remainder 個數量的元素 + 1 來補足餘數的量 sub_n[i] = n / size + (i < remainder ? 1 : 0); } } ``` * 主程式 ```cpp while(n--){ int l, r, tr, tl; if(id){ //對左邊交換 , id == 0 則不用 tl = arr[0]; MPI_Send(&tl, 1, MPI_INT, id - 1, 0, MPI_COMM_WORLD); MPI_Recv(&l, 1, MPI_INT, id - 1, 0, MPI_COMM_WORLD, &status); if(l > arr[0]){ undone = 1; //交換後代表還要做一次 sort arr[0] = l; } } if(id < size - 1){ //對右邊交換 , id == size -1 則不用 tr = arr[sub_n[id]-1]; MPI_Recv(&r, 1, MPI_INT, id + 1, 0, MPI_COMM_WORLD, &status); MPI_Send(&tr, 1, MPI_INT, id + 1, 0, MPI_COMM_WORLD); if(r < arr[sub_n[id] - 1]){ undone = 1; //交換後代表還要做一次 sort arr[sub_n[id] - 1] = r; } } if(undone) odd_sort(&arr, sub_n[id]); undone = 0; MPI_Barrier(MPI_COMM_WORLD); //保險 } ``` * 實機測試 ![](https://i.imgur.com/nDafK9Q.png) ![](https://i.imgur.com/mB21ukc.png) > 由於隨機性的關係 , 就算是固定 n = 100 的數量但是時間並未變快甚至變慢 , 接下來改以同 process 數和 n 來驗證 ``` bash=0 F74096297@sivslab-pn1:~/HW2$ mpiexec -n 16 ./a.out 100 Process 1 execution time = 1.98447 Process 2 execution time = 1.98445 Process Process 4 execution time = 1.98446 Process 5 execution time = 1.98448 Process 7 execution time = 1.98446 Process 9 execution time = 1.98447 Process 10 execution time = 1.98445 Process 11 execution time = 1.98447 Process 12 execution time = 1.98445 Process 13 execution time = 1.98447 Process 14 execution time = 1.98444 Process 15 execution time = 1.98447 Process 6 execution time = 1.98444 3 execution time = 1.98446 2 3 5 5 6 6 7 9 10 11 11 12 12 15 15 15 17 18 18 19 19 20 20 22 24 24 26 28 29 29 32 33 34 35 36 38 41 42 42 42 43 44 44 45 46 46 47 47 47 47 48 49 50 51 53 55 55 58 61 61 61 62 62 62 63 67 67 69 69 69 74 74 74 76 76 77 77 78 78 78 80 81 83 84 84 86 86 86 89 89 90 91 91 92 93 93 95 96 96 97 Process 0 execution time = 0.415945 Process 8 execution time = 2.01542 F74096297@sivslab-pn1:~/HW2$ mpiexec -n 16 ./a.out 100 Process 1 execution time = 3.69547 Process 2 execution time = 3.69548 Process 3 execution time = 3.7234 Process 4 execution time = 3.70348 Process 5 execution time = 3.69948 Process 6 execution time = 3.71945 Process 8 execution time = 3.70746 Process 9 execution time = 3.69547 Process 10 execution time = 3.67947 Process 11 execution time = 3.7234 Process 12 execution time = 3.71545 Process 13 execution time = 3.70745 Process 14 execution time = 3.71944 Process Process 7 execution time = 3.7234 1 2 3 3 6 7 7 8 8 8 11 12 12 12 12 13 14 15 15 16 16 16 17 17 18 19 21 22 22 23 24 26 27 28 29 30 31 32 32 33 35 35 36 38 40 40 41 43 43 45 45 45 46 46 46 46 49 50 50 51 51 52 56 56 57 58 59 60 61 62 62 64 66 68 70 71 72 73 74 76 77 77 79 81 81 83 84 86 86 87 88 88 90 92 93 94 96 96 98 99 Process 0 execution time = 0.337826 15 execution time = 3.7234 F74096297@sivslab-pn1:~/HW2$ mpiexec -n 16 ./a.out 100 Process 1 execution time = 5.79329 Process 2 execution time = 5.79329 Process 3 execution time = 5.79329 Process 4 execution time = 5.79328 Process Process 6 execution time = 5.79328 Process 8 execution time = 5.79329 Process 9 execution time = 5.79329 Process 10 execution time = 5.79329 Process 11 execution time = 5.79322 Process 12 execution time = 5.79328 Process 13 execution time = 5.79329 Process 14 execution time = 5.79328 0 0 1 2 3 4 4 7 7 8 8 9 10 10 11 14 15 16 18 18 18 19 20 24 24 27 27 28 30 31 31 32 32 35 35 37 41 46 47 47 48 49 49 50 50 51 51 52 53 54 55 55 63 63 63 65 65 66 68 69 69 69 69 69 70 72 72 75 75 75 77 78 81 82 83 84 84 86 86 86 87 88 89 89 90 90 91 91 91 92 93 93 95 96 96 96 96 96 98 98 Process 0 execution time = 1.38966 5 execution time = 5.79329 Process 15 execution time = 5.79329 Process 7 execution time = 5.79329 F74096297@sivslab-pn1:~/HW2$ mpiexec -n 16 ./a.out 100 Process 2 execution time = 3.16269 Process 3 execution time = 3.16271 Process 4 execution time = 3.1626 Process 5 execution time = 3.16269 Process 6 execution time = 3.16269 Process 8 execution time = 3.1626 Process 9 execution time = 3.16269 Process Process 11 execution time = 3.16271 Process 12 execution time = 3.1626 Process 13 execution time = 3.16269 Process 15 execution time = 3.16271 Process 1 execution time = 3.16269 Process 14 execution time = 3.1627 10 execution time = 3.1627 1 4 6 7 8 8 10 10 11Process 7 execution time = 3.16427 11 11 13 14 15 15 15 18 19 19 20 21 21 22 23 23 24 25 27 27 31 31 34 34 34 35 35 35 37 38 38 39 39 40 40 41 41 41 41 42 45 47 50 50 54 55 55 56 56 57 57 58 59 60 60 60 61 62 62 65 69 69 69 70 71 71 72 72 75 77 78 82 82 83 84 84 86 86 86 89 90 91 92 93 94 95 96 96 96 99 99 Process 0 execution time = 1.95233 ``` > 數量浮動 , 所以不好比較 process 是否越多越好 接下來將 n 調大 ```shell=0 F74096297@sivslab-pn1:~/HW2$ mpiexec -n 2 ./a.out 10000 10 21 24 26 ... Process 0 execution time = 23.852867 Process 1 execution time = 24.931279 F74096297@sivslab-pn1:~/HW2$ mpiexec -n 4 ./a.out 10000 10 10 30 34 ... Process 0 execution time = 7.474880 Process 1 execution time = 18.015723 Process 2 execution time = 18.015730 Process 3 execution time = 18.015719 F74096297@sivslab-pn1:~/HW2$ mpiexec -n 8 ./a.out 10000 14 16 25 30 ... Process 0 execution time = 2.384799 Process 1 execution time = 4.019529 Process 3 execution time = 4.019531 Process 4 execution time = 4.019538 Process 5 execution time = 4.019536 Process 6 execution time = 4.019539 Process 2 execution time = 4.019539 Process 7 execution time = 4.019528 F74096297@sivslab-pn1:~/HW2$ mpiexec -n 16 ./a.out 10000 27 37 41 47 ... Process 0 execution time = 1.282921 Process 1 execution time = 6.544068 Process 3 execution time = 6.544043 Process 4 execution time = 6.544054 Process 5 execution time = 6.544060 Process 7 execution time = 6.544020 Process 2 execution time = 6.544044 Process 6 execution time = 6.544042 Process 8 execution time = 6.544045 Process 10 execution time = 6.544045 Process 11 execution time = 6.544045 Process 12 execution time = 6.544046 Process 13 execution time = 6.544062 Process 14 execution time = 6.544044 Process 15 execution time = 6.544024 Process 9 execution time = 6.544070 ``` * 結論 * 仔細想的話當 process 越多 , 要 coumminate 的次數就越多所以速率增長會遞減而沒辦法到 n(big process_n / small process_n) 倍. * 原本想解決 n < process 數量時 boarst 會等不到多餘 process 的回應 , 原本想試著處理 group 讓 WORLD_COMM_WORLD 移除掉多餘的 PROCESS 讓 BOARDCAST 不會被卡住. > Fatal error in PMPI_Group_range_excl: Invalid group, error stack: PMPI_Group_range_excl(224): MPI_Group_range_excl(group=0x44000000, n=4, ranges=0x7ffe4d04cc6c, new_group=0x7ffe4d04cbc4) failed PMPI_Group_range_excl(173): Invalid group 編譯會過 , 但是我找不到方法解決 QwQ 網路上資源也少所以後來不沒有做這個操作 . * 後來改為多餘的 process 會生成大小為 1 的陣列且數字恆大 random 的數