# HW6 OpenACC

Due: <span style="color:red"> Tue, 2024/06/11 23:59 </span>

[TOC]

## Reminder

* <span style="color:red">**This assignment should be done by using the container "IPC24 1GPU"**</span>
* <span style="color:red">**Only my_cnn() and functions invoked inside my_cnn() can be modified**</span>
* **Please remember to load the needed modules before compiling your program**

## Introduction

* Optimization of a Convolutional Neural Network (CNN) using OpenACC.
* In this homework, you are asked to convert a sequential implementation of a CNN into a parallelized version using OpenACC to enhance performance.

## Problem Description

Please refer to the homework slides for a detailed problem description. <span style="color:red">Implement the parallel code using OpenACC for the feasible parts of the CNN.</span>

## CNN Architecture & Operations

Illustrative C++/OpenACC sketches of each of these operations are given at the end of this section.

### Convolutional Layer

* Applies convolution operations to the input data with specific filters to capture features.
* Formula for a single convolution operation:
  $$a^{l+1}_{i,j} = \sigma \left( \sum_m \sum_n W_{m,n} \cdot x_{i+m,j+n} + b \right)$$
  where $W$ is the weight matrix of the convolutional filter, $x$ is the input, $b$ is the bias, and $\sigma$ is the activation function (e.g., ReLU).

### Pooling Layer

* Reduces the spatial size of the representation to reduce the number of parameters and the amount of computation in the network.
* e.g. **Max Pooling**:
  - Formula:
    $$ a^{l+1}_{i,j} = \max_{m,n} \left( x_{i \cdot s + m, j \cdot s + n} \right) $$
  - Here, $x$ is the input matrix, $s$ is the stride, and $m, n$ iterate over the pooling window dimensions, typically within a 3x3 area.

### Fully Connected Layer

* Neurons in a fully connected layer have full connections to all activations in the previous layer.
* These layers classify the features learned by the convolutions.
* **Formula**:
  - $$ a^{l+1} = \sigma(W^l a^l + b^l) $$
  - Where $W^l$ is the weight matrix associated with layer $l$, $a^l$ is the activation from the previous layer, $b^l$ is the bias vector for layer $l$, and $\sigma$ is the activation function, which could be ReLU, Softmax, or any other non-linear function.

### Activation Functions

* Non-linear functions that are applied after each convolution or fully connected layer.
* **ReLU (Rectified Linear Unit)**:
  - Formula: $a_i = \max(0, z_i)$
  - Applies to each input value $z_i$; outputs zero if $z_i$ is less than zero, otherwise outputs $z_i$.
* **Softmax** (typically used in the output layer for classification tasks):
  - Formula: $\sigma(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$
  - Applies to the logits $z_i$ of the final layer to normalize them into a probability distribution.

### Argmax Function

* Determines the index of the maximum value in each row of the input array. This index typically represents the predicted class label in classification tasks.
* **Formula**:
  - $$ D[i] = \arg\max_j \left( A[i][j] \right) $$
  - Where $A[i][j]$ is the element of the input array at row $i$ and column $j$, and $D[i]$ is the output array storing the index of the maximum value for each row.
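To make the operations above concrete, the sketches below show one plausible OpenACC treatment of each layer. They are illustrations, not the reference implementation: every array name, dimension, and layout choice (single channel, row-major, `H`, `W`, `K`, and so on) is an assumption, and the real signatures come from the provided sequential code. First, the convolutional layer: each output pixel is independent, so the two spatial loops can be collapsed and offloaded while the filter loops stay sequential. The `present` clauses assume the buffers were already placed on the device by an enclosing data region (see the last sketch).

```cpp
// Hypothetical single-channel "valid" convolution fused with ReLU.
// All names and dimensions are illustrative, not from the provided code.
void conv2d_relu(const float *in, const float *w, float b, float *out,
                 int H, int W, int K) {
    int OH = H - K + 1, OW = W - K + 1;
    // Each output pixel is independent: map the two outer loops to the GPU.
    #pragma acc parallel loop collapse(2) present(in[0:H*W], w[0:K*K], out[0:OH*OW])
    for (int i = 0; i < OH; ++i) {
        for (int j = 0; j < OW; ++j) {
            float sum = b;
            for (int m = 0; m < K; ++m)         // filter window stays sequential
                for (int n = 0; n < K; ++n)
                    sum += w[m * K + n] * in[(i + m) * W + (j + n)];
            out[i * OW + j] = sum > 0.0f ? sum : 0.0f;  // fused ReLU activation
        }
    }
}
```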
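Max pooling parallelizes the same way, since every output element reads its own window and writes a distinct location. Again a minimal single-channel sketch with assumed names:

```cpp
// Hypothetical 3x3 max pooling with stride s over one H x W channel.
void maxpool3x3(const float *in, float *out, int H, int W, int s) {
    int OH = (H - 3) / s + 1, OW = (W - 3) / s + 1;
    #pragma acc parallel loop collapse(2) present(in[0:H*W], out[0:OH*OW])
    for (int i = 0; i < OH; ++i) {
        for (int j = 0; j < OW; ++j) {
            float best = in[(i * s) * W + (j * s)];
            // Scan the 3x3 window; each iteration writes a distinct output
            // element, so no synchronization is needed.
            for (int m = 0; m < 3; ++m)
                for (int n = 0; n < 3; ++n) {
                    float v = in[(i * s + m) * W + (j * s + n)];
                    if (v > best) best = v;
                }
            out[i * OW + j] = best;
        }
    }
}
```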
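For the fully connected layer, one common pattern is to parallelize over output neurons and run the inner dot product as a vector-level reduction. The ReLU here is illustrative; the output layer would apply softmax instead:

```cpp
// Hypothetical dense layer: out = relu(W * a + b), W stored row-major
// as n_out x n_in. All names are assumptions.
void fully_connected_relu(const float *W, const float *a, const float *b,
                          float *out, int n_in, int n_out) {
    #pragma acc parallel loop present(W[0:n_out*n_in], a[0:n_in], b[0:n_out], out[0:n_out])
    for (int i = 0; i < n_out; ++i) {
        float sum = b[i];
        // Inner dot product as a reduction across the vector lanes.
        #pragma acc loop reduction(+:sum)
        for (int j = 0; j < n_in; ++j)
            sum += W[i * n_in + j] * a[j];
        out[i] = sum > 0.0f ? sum : 0.0f;  // ReLU
    }
}
```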
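A sketch of a numerically stable softmax using the standard max-subtraction trick. For a small number of classes the kernel launches may cost more than they save, so leaving this part sequential can be reasonable; the directives below are only one option:

```cpp
#include <math.h>

// Hypothetical softmax over one vector of N logits. Subtracting the max
// before expf() prevents overflow without changing the result.
void softmax(const float *z, float *p, int N) {
    float mx = -INFINITY;
    #pragma acc parallel loop reduction(max:mx) present(z[0:N])
    for (int j = 0; j < N; ++j)
        mx = fmaxf(mx, z[j]);

    float denom = 0.0f;
    #pragma acc parallel loop reduction(+:denom) present(z[0:N], p[0:N])
    for (int j = 0; j < N; ++j) {
        p[j] = expf(z[j] - mx);
        denom += p[j];
    }

    #pragma acc parallel loop present(p[0:N])
    for (int j = 0; j < N; ++j)
        p[j] /= denom;   // normalize into a probability distribution
}
```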
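Argmax is row-parallel: rows are independent, while the scan within a row stays sequential because it carries the running best index:

```cpp
// Hypothetical row-wise argmax over an R x C array flattened row-major.
void argmax_rows(const float *A, int *D, int R, int C) {
    #pragma acc parallel loop present(A[0:R*C], D[0:R])
    for (int i = 0; i < R; ++i) {
        int best = 0;
        for (int j = 1; j < C; ++j)
            if (A[i * C + j] > A[i * C + best]) best = j;
        D[i] = best;   // predicted class label for row i
    }
}
```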
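Finally, since only my_cnn() and the functions it calls may be modified, a typical structure is to wrap all layer calls in a single `#pragma acc data` region so intermediate tensors stay resident on the GPU between layers. Everything in this sketch (buffer names, sizes, the parameter list) is hypothetical:

```cpp
// Hypothetical driver showing one data region around the whole pipeline.
// The real buffers and layer calls come from the provided sequential code.
void my_cnn_sketch(const float *input, const float *w1, const float *b1,
                   float *buf, float *probs, int *labels,
                   int in_sz, int w1_sz, int b1_sz,
                   int buf_sz, int probs_sz, int n_rows) {
    // One region avoids a host<->device round trip per layer; the kernels
    // above then find their arrays via present(...).
    #pragma acc data copyin(input[0:in_sz], w1[0:w1_sz], b1[0:b1_sz]) \
                     create(buf[0:buf_sz], probs[0:probs_sz]) \
                     copyout(labels[0:n_rows])
    {
        // conv2d_relu(...);          // convolution + ReLU
        // maxpool3x3(...);           // pooling
        // fully_connected_relu(...); // dense layer
        // softmax(...);              // output probabilities
        // argmax_rows(...);          // final predictions
    }
}
```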
## Program Execution

### Module Load

Before compiling your program, we need to load the necessary modules in the container. Please execute the following commands:

```
module unload nvhpc-nompi
module load nvhpc
```

### Sequential Code

The provided directory for homework 6 is at the following path of the NCHC container: ***/tmp/dataset-nthu-ipc24/share/hw6/***

You can copy it to your home directory with the following command:

```
cp -r /tmp/dataset-nthu-ipc24/share/hw6/ $HOME/hw6
```

The example sequential code can be found in the *TBD* folder in the hw6/ directory.

### Compilation

**We use the NCHC container for this homework.** We will compile your program with a command equivalent to:

```bash
make
```

### Command Line Format

Your program will be tested with a command equivalent to:

```bash
./hw6
```

Note that there are no input parameters to the program; the inference results of the model will be written to a result.txt file.

## Report

Answer the following questions, in either English or Traditional Chinese.

1. Describe how you applied OpenACC directives in the CNN.
2. What challenges did you face when parallelizing the network?
3. What are the strengths and weaknesses of using OpenACC?
4. How did the performance of the parallelized network compare to the sequential implementation?
5. (Optional) Any suggestions or feedback for the homework are welcome.

## Submission

Upload these files to eeclass:

* ```hw6.cpp``` – the source code of your implementation.
* ```Makefile``` – optional. Submit this file if you want to change the build command. If you don't submit this file, ```/tmp/dataset-nthu-ipc24/share/hw6/samples/Makefile``` will be used.
* ```report.pdf``` – your report.

Please follow the naming listed above carefully. Failing to adhere to the names above will result in a points deduction.

## Grading

1. (50%) Correctness.
2. (20%) Performance. Based on the total time taken to solve all the test cases.
3. (30%) Report.

## Resources

Refer to ```/tmp/dataset-nthu-ipc24/share/hw6/``` for sample test cases, source code, and tools.

## Judge

The `hw6-judge` command can be used to automatically judge your code against all sample test cases. It also submits your execution time to the scoreboard so you can compare your performance with others.

Scoreboard: http://elsa-judge.cs.nthu.edu.tw/hw6/

To use it, run ```hw6-judge``` in the directory that contains your code ```hw6.cpp```. It will automatically search for a ```Makefile``` and use it to compile your code, or fall back to the TA-provided ```/tmp/dataset-nthu-ipc24/share/hw6/samples/Makefile``` otherwise. If compilation is successful, it will then run all the sample test cases, show you the results, and update the scoreboard.

## Judge Verdict Table

| Verdict | Explanation |
| ------- | ----------- |
| internal error | there is a bug in your code |
| time limit exceeded+ | execution time > time limit + 10 seconds |
| time limit exceeded | execution time > time limit |
| runtime error | your program did not return 0 or was terminated by a signal |
| no output | your program did not produce an output |
| wrong answer | your output is incorrect |
| accepted | you passed the test case |

###### tags: `spec`