**平行運算作業二 @NYCU, 2023 Fall**

# **平行運算作業二 @NYCU, 2023 Fall** # **Parallel Programming HW1 @NYCU, 2023 Fall** # Q1 程式會根據第一個命令行參數創建一定數量的執行緒，並將 Monte Carlo 模擬分配到這些執行緒上。每個執行緒使用 rand_r 函數生成自己的隨機數序列，該函數是線程安全且可重入的。主執行緒使用 pthread_join 函數等待所有工作執行緒完成，然後使用投擲次數和落在圓內的次數計算出 pi 的最終估計值。說明：蒙特卡洛估算蒙特卡洛方法是一類基於重複隨機抽樣獲取數值結果的計算算法。其中一個基本的例子是使用蒙特卡洛算法進行 Pi 的估算。任務：將串行程式轉換為使用 Pthreads 的程式，使用蒙特卡洛方法來估算 Pi。主執行緒讀取投擲總次數並列印出估算值。使用 long long ints 記錄圓內投中的次數和總投擲次數。在遵守蒙特卡洛方法的前提下，儘可能快速地使用 Pthreads 進行程式優化。問題：需要將串行程式轉換為 Pthreads 程式，同時保持蒙特卡洛方法的運作。需要確保線程安全和可重入的程式碼。需要遵守程式要求，優化程式速度。 void *monte_carlo(void *tid) { long long int toss; long long int local_number_in_circle = 0; unsigned int seed = time(NULL) + (long)tid; for (toss = 0; toss < number_of_tosses; toss++) { double x = (double)rand_r(&seed) / RAND_MAX * 2.0 - 1.0; double y = (double)rand_r(&seed) / RAND_MAX * 2.0 - 1.0; double distance_squared = x * x + y * y; if (distance_squared <= 1.0) local_number_in_circle++; } pthread_mutex_lock(&mutex); number_in_circle += local_number_in_circle; pthread_mutex_unlock(&mutex); return NULL; } int main(int argc, char *argv[]) { int i; double pi_estimate; if (argc != 3) { printf("Usage: %s <number_of_threads> <number_of_tosses>\n", argv[0]); return 1; } number_of_threads = atoi(argv[1]); number_of_tosses = atoll(argv[2]); if (number_of_threads > MAX_THREADS) { printf("Error: The maximum number of threads is %d\n", MAX_THREADS); return 1; } for (i = 0; i < number_of_threads; i++) pthread_create(&threads[i], NULL, monte_carlo, (void *)(long)i); for (i = 0; i < number_of_threads; i++) pthread_join(threads[i], NULL); pi_estimate = 4 * (double)number_in_circle / ((double)number_of_tosses * (double)number_of_threads); printf("PI = %f\n", pi_estimate); return 0; } ![](https://i.imgur.com/7PUGX7N.png) # Q2 1.Modify starter code to parallelize Mandelbrot generation with 2 processors using spatial decomposition, computing the top and bottom halves of the image in separate threads. 2.Extend code to use 2, 3, and 4 threads, partitioning image generation work accordingly. Produce a graph of speedup compared to sequential implementation as a function of thread count for View 1. Hypothesize why speedup may or may not be linear. Repeat for View 2. 3.Insert timing code to measure time required for each thread to complete its work in workerThreadStart(). Explain how these measurements explain the speedup graph previously created. 4.Modify work-to-thread mapping to achieve 3-4x speedup on both views of the Mandelbrot set without using thread synchronization. Describe approach and report final 4-thread speedup obtained in write-up. 5.Run improved code with 8 threads and compare performance to running with 4 threads. Explain whether or not performance is noticeably greater and why (the workstation server provides 4 cores and 4 threads). Explanation: 1.修改代碼，在mandelbrotThread函數中創建兩個thread，使用條件語句確保第一個thread計算圖像的上半部分，而第二個thread計算下半部分。 2.修改workerThreadStart函數以使用傳遞給函數的行範圍參數為圖像的適當部分計算曼德博集合。 3.在每個thread的工作開始和結束時添加計時代碼，以測量每個thread完成其工作所需的時間。 4.編譯並運行並行化的代碼，比較其性能與原始的順序實現，並使用加速比公式計算使用兩個thread並行化代碼所獲得的加速比。 5.嘗試不同的thread配置，例如使用四個thread而不是兩個，以查看是否可以實現更高的加速比。但是，請注意，最佳thread數量將取決於機器上可用的核心數量和正在解決的問題的大小等因素。 //mark1 int step = args->numThreads; int height = args->height; for (int r = args->threadId; r < height; r += step) { mandelbrotSerial(args->x0, args->y0, args->x1, args->y1, args->width, args->height, r, 1, args->maxIterations, args->output); } 這段程式碼是一個多執行緒的 Mandelbrot 集合計算器的實現。Mandelbrot 集合是一個複數集合，它的邊界在複平面上非常複雜且美麗。計算 Mandelbrot 集合的方法是對複平面上每個點進行遞迴計算，直到該點是否會發散。這段程式碼中，每個執行緒都負責計算一行像素的值，由 args->threadId 決定每個執行緒開始計算的起點。step 變數表示每隔多少行像素由該執行緒計算，因此每個執行緒會負責計算一個 "stripe"，這樣每行像素都會被至少一個執行緒計算到。在每個執行緒的迴圈中，呼叫 mandelbrotSerial 函數，計算這一行像素的 Mandelbrot 值。mandelbrotSerial 函數將複數坐標 (x, y) 轉換為複數值 c = x + yi，然後對這個複數值 c 進行一定次數的遞迴計算，判斷它是否會發散。如果發散，就記錄下發散的步數，否則就將該像素設為黑色。執行完所有的執行緒後，得到的結果會被寫入 args->output 陣列中，每個像素的值代表該點的 Mandelbrot 值。最終這些值可以用來繪製出 Mandelbrot 集合的圖像。 //mark2 // 儲存開始時間 double startTime = CycleTimer::currentSeconds(); // 用 i 計算每個執行緒應處理的列數，從 threadId 開始，每 numThreads 一行 for (unsigned int i = args->threadId; i < args->height; i += args->numThreads) { mandelbrotSerial(args->x0, args->y0, args->x1, args->y1, args->width, args->height, i, 1, args->maxIterations, args->output); } // 儲存結束時間 double endTime = CycleTimer::currentSeconds(); // 計算執行緒執行時間 double threadTime = (endTime - startTime) * 1000.0; // 印出執行緒執行時間 printf("[mandelbrot thread %d]:\t\t[%.3f] ms\n", args->threadId, threadTime); 這段程式碼是一個使用多執行緒計算 Mandelbrot 集合的實現。它通過將計算分配給多個執行緒來加速計算。首先，它使用 CycleTimer 函數來儲存計算的開始時間。接著，它通過迴圈將計算分配給多個執行緒，每個執行緒負責處理一定的行數。這裡的 args->threadId 表示當前執行緒的 ID，args->height 表示圖像的高度，args->numThreads 表示總共有多少個執行緒。每個執行緒處理的行數是 args->numThreads，因此每隔 args->numThreads 行會有一個執行緒負責處理。在迴圈內部，執行緒使用 mandelbrotSerial 函數計算該行像素的 Mandelbrot 值，args 是一個指向傳遞給函數的參數的指針，包含了計算所需的所有資訊。在迴圈結束時，它使用 CycleTimer 函數儲存計算的結束時間，並計算出執行緒執行的時間。最後，它使用 printf 函數將執行緒的 ID 和執行時間印出來，以方便觀察執行緒的效能。這些執行緒的計算結果會被儲存在 args->output 陣列中，最終可以用這些值來繪製 Mandelbrot 集合的圖像。 ![](https://i.imgur.com/kF7g8RO.png) 為各thread的時間花費 2 ~ 5/16 v1/v2 thread = 2 V1/V2 ![](https://i.imgur.com/cpQ1wd0.jpg) thread = 3 V1/V2 ![](https://i.imgur.com/9I9xKlh.jpg) thread = 4 V1/V2 ![](https://i.imgur.com/9n6e5EX.jpg) thread = 5 V1/V2 ![](https://i.imgur.com/MMDIesk.jpg) thread = 8 V1/V2 ![](https://i.imgur.com/bUneWw9.jpg) thread = 16 V2 ![](https://i.imgur.com/jl8nhdZ.jpg) 以 2, 3, 4, 5, 8,16 thread分別執行 view1, view2，並且分別量測每個thread的執行時間，可以發現 view1 與，view2 依照thread計算出時間 thread 可以看出V1 = 1.33 V2 = 1.19 thread 4的時候可以看出V1 = 1.33 V2 = 1.19的差異性為各thread的時間花費 2 ~ 12 v1/v2 speedup time ![](https://i.imgur.com/vENoNaU.png) V1呈現平緩成 , V2為成長一開始比較快 , 到thread 6 也開始平緩