Single GPU CFD simulation
=========================

:::info
This code is taken from the repository of [David Henty](https://github.com/davidhenty/cfd/tree/master) and implements a regular 2D CFD simulation of an incompressible fluid flowing in a cavity using the Navier-Stokes equations.
:::

In this exercise you will offload the serial version of the code to the GPU using the **OpenACC** programming model. The code consists of four files:

- ``boundary.*``: contains the routines that define the boundary conditions and the initialization;
- ``cfdio.*``: contains the routines for the final IO operations;
- ``jacobi.*``: contains the routines for the main loop operations, i.e. the Jacobi step and the error calculation;
- ``cfd.*``: the main program.

Instrumentation
---------------

As a first step, we instrument the application for profiling with Nsight Systems.

1. Add NVTX ranges in the source code to identify the phases of the simulation (e.g. initialization, main loop, Jacobi step, finalization...);
2. Modify the Makefile to link the ``nvToolsExt`` library;
3. Create a job script that compiles the code, then performs a reference run and an instrumented run with ``nsys profile``;
4. Submit the job script.

Once the simulation is done, answer the following questions:

1. Open the logfile: how much time does the serial version of the code take? How much time per loop iteration?
2. Open the Flat report in the summaries: which is the most time-consuming routine sampled? Where does it come from?
3. Look at the timeline view: which is the most time-consuming phase of the simulation? What is the code doing in that phase, and is it relevant for the offload?

Filter
-------

3.3.1 Add parallel loop directives to offload the kernels in ``jacobi.f90``, without managing data movement.
3.3.2 Submit the job script.
3.3.3 Check the time needed to execute the main loop. Do you observe a speedup?
3.3.4 Open the report and look at the data movement section in the GPU panel. What is the ratio between memory and kernel time?

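
As a rough illustration of step 3.3.1, the sketch below offloads one Jacobi sweep with a single OpenACC directive. It is written in C so that it is self-contained; in ``jacobi.f90`` the same directive is spelled ``!$acc parallel loop``. The function name ``jacobi_step``, the argument list, and the row-major ``(m+2) x (n+2)`` layout with a one-cell halo are assumptions made for this example, not the repository's actual code. Because C passes bare pointers, the array shapes have to be given on the directive; no data region is used, so the arrays are transferred on every call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical Jacobi sweep: each interior point of psinew becomes the
   average of its four neighbours in psi. Both arrays hold (m+2) x (n+2)
   doubles stored row by row, including a one-cell halo. */
void jacobi_step(double *psinew, const double *psi, int m, int n)
{
    assert(m > 0 && n > 0);

    const size_t size = (size_t)(m + 2) * (size_t)(n + 2);
    const int stride = n + 2;

    /* The data clauses sit on the compute construct itself, so psi is
       copied to the GPU and psinew is copied in and back out on EVERY
       call. psinew uses copy (not copyout) because its halo cells are
       never written on the device and must keep their host values. */
    #pragma acc parallel loop collapse(2) copyin(psi[0:size]) copy(psinew[0:size])
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            psinew[i * stride + j] = 0.25 * (psi[(i - 1) * stride + j]
                                           + psi[(i + 1) * stride + j]
                                           + psi[i * stride + j - 1]
                                           + psi[i * stride + j + 1]);
        }
    }
}
```

A compiler without OpenACC support ignores the pragma and runs the loop serially, which is convenient for checking correctness. Assuming the NVIDIA HPC SDK is the toolchain in use, building with ``nvc -acc -Minfo=accel`` reports what the compiler generated for the loop. The per-call transfers are what question 3.3.4 asks you to look for in the profile.
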
3.4 Improve the offload in ``./improve_offload``

3.4.1 Add data directives where needed.
3.4.2 Offload the remaining part of the main loop.
3.4.3 Submit the job script to run the code, then check the time to solution and the time per iteration.
3.4.4 Do you observe a speedup? What is the ratio between the kernel and memory percentages now?
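
Steps 3.4.1 and 3.4.2 can be sketched as follows, again in C for self-containment (the Fortran equivalents are ``!$acc data`` / ``!$acc end data`` and ``!$acc parallel loop``). The function ``run_jacobi``, its arguments, and the array layout are hypothetical and only serve the illustration. The idea is to open one data region around the whole iteration loop so the arrays stay on the GPU, and to offload the error calculation and the copy-back loop as well, so that no host work forces a transfer between kernels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical main loop: runs `iters` Jacobi iterations on an
   (m+2) x (n+2) row-major grid with a one-cell halo and returns the sum
   of squared updates of the last iteration. */
double run_jacobi(double *psi, double *psitmp, int m, int n, int iters)
{
    assert(m > 0 && n > 0 && iters > 0);

    const size_t size = (size_t)(m + 2) * (size_t)(n + 2);
    const int stride = n + 2;
    double error = 0.0;

    /* One data region for the whole loop: psi is copied in once and out
       once; psitmp is device scratch space only, so its host copy is not
       kept up to date when the code runs on a GPU. */
    #pragma acc data copy(psi[0:size]) create(psitmp[0:size])
    {
        for (int it = 0; it < iters; it++) {
            /* Jacobi step: data is already present on the device. */
            #pragma acc parallel loop collapse(2) present(psi[0:size], psitmp[0:size])
            for (int i = 1; i <= m; i++)
                for (int j = 1; j <= n; j++)
                    psitmp[i * stride + j] = 0.25 * (psi[(i - 1) * stride + j]
                                                   + psi[(i + 1) * stride + j]
                                                   + psi[i * stride + j - 1]
                                                   + psi[i * stride + j + 1]);

            /* Error calculation: only the reduced scalar returns to the host. */
            error = 0.0;
            #pragma acc parallel loop collapse(2) reduction(+:error) present(psi[0:size], psitmp[0:size])
            for (int i = 1; i <= m; i++)
                for (int j = 1; j <= n; j++) {
                    double d = psitmp[i * stride + j] - psi[i * stride + j];
                    error += d * d;
                }

            /* Copy the new interior values back into psi, on the device. */
            #pragma acc parallel loop collapse(2) present(psi[0:size], psitmp[0:size])
            for (int i = 1; i <= m; i++)
                for (int j = 1; j <= n; j++)
                    psi[i * stride + j] = psitmp[i * stride + j];
        }
    }
    return error;
}
```

With this structure the only recurring transfer is the scalar ``error``, so the memory share in the profile should shrink relative to the kernel share; how large the change is depends on the grid size and the system, which is what 3.4.4 asks you to measure.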