epicurehack

@epicurehack

Public team

Joined on Oct 21, 2024

  • :::info This code is taken from the repository of David Henty and implements a regular 2D CFD simulation of an incompressible fluid flowing in a cavity using the Navier-Stokes equation. ::: The code is composed of 4 files: boundary.f90: boundary initialization; cfdio.f90: defines routines for IO operations at the end of the program; jacobi.f90: contains routines that implement the Jacobi step and the error calculation; cfd.f90: main program, implements the main loop of the CFD code.
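    The Jacobi step and the error calculation mentioned for jacobi.f90 can be sketched as follows. This is a minimal C illustration, not the original Fortran: the function names, the row-major layout with a one-cell halo, and the choice of the simplest stream-function update (average of the four neighbours) are all assumptions made for the example.

    ```c
    #include <stddef.h>

    /* Index into an (m+2) x (n+2) row-major grid that includes a one-cell halo.
       Hypothetical helper, not part of the original code. */
    static size_t idx(int i, int j, int n)
    {
        return (size_t)i * (size_t)(n + 2) + (size_t)j;
    }

    /* One Jacobi sweep over the m x n interior points: each new value is the
       average of the four neighbours in the old iterate (assumed update rule). */
    void jacobi_step(double *psinew, const double *psi, int m, int n)
    {
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                psinew[idx(i, j, n)] = 0.25 * (psi[idx(i - 1, j, n)]
                                             + psi[idx(i + 1, j, n)]
                                             + psi[idx(i, j - 1, n)]
                                             + psi[idx(i, j + 1, n)]);
            }
        }
    }

    /* Error measure: sum of squared differences between two iterates,
       taken over the interior points only. */
    double delta_sq(const double *psinew, const double *psi, int m, int n)
    {
        double dsq = 0.0;
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                double d = psinew[idx(i, j, n)] - psi[idx(i, j, n)];
                dsq += d * d;
            }
        }
        return dsq;
    }
    ```

    In a main loop, one would typically call `jacobi_step`, evaluate `delta_sq` every few iterations to check convergence, and then copy the interior of `psinew` back into `psi`.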
  • Welcome to the first edition of the <font color="#F7A004">Epicure</font> hackathons! The event will be hosted at Edificio INFN-CINECA, Tecnopolo, Via Stalingrado 84/3, 40128 Bologna. :runner: How to get there? Link here
  • This HackMD document collects information and a guided walkthrough of the hands-on exercises proposed here: https://gitlab.hpc.cineca.it/training/epicure-gpu-hackathon CUDA exercises: The exercises are collected at this link: Vector Addition; Electrostatic Particle-In-Cell code; Matrix Multiplication; Matrix Transpose
  • :::info The non-GPU version of this code is taken from the repository of David Henty. ::: This is the MPI-distributed version of the CFD code previously offloaded to a single GPU. In this case, each MPI rank is bound to a GPU for the offload. The aim is to implement efficient GPU-to-GPU communications. The code is composed of four files: cfd.f90 contains the main loop of the program; jacobi.f90 contains the Jacobi step and the reduction for the error calculation
  • The command line interface: To collect measurements from the moment the program starts, run it with `nsys profile [optional command_switch_options] [application_executable] [optional application_options]`. This command generates a report in `.nsys-rep` format, which can be opened in the GUI. :::info The GUI can be installed on your laptop from <https://developer.nvidia.com/nsight-systems> :::
  • Steps: Add data and parallel loop/seq directives, without optimization clauses. Open the Makefile and add instructions to target Leonardo's accelerators; use -acc=noautopar to inhibit automatic loop optimizations done by the compiler and -Minfo=accel to get information on how the code is compiled for GPUs. Modify the jobscript in order to compile and run the code on compute nodes. Modify the parallel directives by adding clauses for loop optimizations; rerun the code. Also try the kernels directive with -acc=autopar. Questions: How does the compiler offload the loop in the different cases? Compare the time to solution of the GPU code in the three cases. Do you observe a performance improvement?
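    The three cases in the steps above (plain parallel loop, parallel loop with optimization clauses, kernels) can be illustrated on a toy loop. This is only a sketch: the saxpy-style functions and their names are hypothetical and not part of the exercise code, and the `vector_length(128)` value is an arbitrary choice. When built without OpenACC support, the pragmas are ignored and the loops run serially on the CPU.

    ```c
    /* Case 1: parallel loop with data clauses only, no tuning clauses.
       x is copied to the device; y is copied in and back out. */
    void saxpy_basic(int n, float a, const float *x, float *y)
    {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; i++) {
            y[i] = a * x[i] + y[i];
        }
    }

    /* Case 2: same loop, with explicit gang/vector mapping.
       The vector length is an illustrative value, not a recommendation. */
    void saxpy_tuned(int n, float a, const float *x, float *y)
    {
        #pragma acc parallel loop gang vector vector_length(128) \
            copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; i++) {
            y[i] = a * x[i] + y[i];
        }
    }

    /* Case 3: kernels region, where the compiler decides how to parallelize.
       restrict tells the compiler that x and y do not alias. */
    void saxpy_kernels(int n, float a, const float *restrict x, float *restrict y)
    {
        #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; i++) {
            y[i] = a * x[i] + y[i];
        }
    }
    ```

    Compiling each variant with -Minfo=accel should show how the compiler maps the loop in each case, which is a useful starting point for the questions above.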
  • :::info This code is taken from the repository of the OpenACC best practice guide from NVIDIA. ::: In this code you will start from the block version of the Mandelbrot exercise and use OpenMP threads to send each block to a different GPU in the node. To do this, you need to bind each thread to one of the available GPUs in a round-robin fashion. Thread-GPU binding: As a first step, we need to use the OpenMP and OpenACC/CUDA APIs to query the number of OpenMP threads available and bind threads to GPUs. Consider that the number of threads is equal to the number of GPUs on the node, which is unknown a priori. Use the following APIs:
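    Independently of the specific API calls, the round-robin thread-to-GPU assignment reduces to a modulo operation. The sketch below shows only that arithmetic; the helper names are hypothetical and the thread and GPU counts are passed in as plain integers rather than queried from a runtime.

    ```c
    /* Hypothetical helper: device index for a given thread under round-robin
       assignment. Returns -1 when no GPU is available. */
    int gpu_for_thread(int thread_id, int num_gpus)
    {
        if (num_gpus <= 0 || thread_id < 0) {
            return -1;
        }
        return thread_id % num_gpus;
    }

    /* Hypothetical helper: fill map[t] with the device index of thread t.
       The caller provides an array with at least num_threads entries. */
    void build_thread_gpu_map(int num_threads, int num_gpus, int *map)
    {
        for (int t = 0; t < num_threads; t++) {
            map[t] = gpu_for_thread(t, num_gpus);
        }
    }
    ```

    In an actual OpenMP/OpenACC code, the thread index could come from, for example, `omp_get_thread_num()`, the device count from `acc_get_num_devices(acc_device_nvidia)`, and the binding itself could be done with `acc_set_device_num(gpu, acc_device_nvidia)` inside the parallel region; treat these as one possible choice rather than the list intended by the exercise.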
  • :::info This code is taken from the repository of the OpenACC best practice guide from NVIDIA and reproduces a common operation in image processing. The image is loaded into an array, and each element of this array corresponds to a pixel. Each pixel is processed inside a loop by a Mandelbrot function. ::: The code is composed of two files: main.* : contains the main loop of the program; mandelbrot.* : contains the definition of the Mandelbrot function. Offload
  • :::info This code is taken from the repository of David Henty and implements a regular 2D CFD simulation of an incompressible fluid flowing in a cavity using the Navier-Stokes equation. ::: In this exercise you will offload the serial version of the code using the OpenACC programming model. The code is composed of 4 files: boundary.* : contains routines to define boundary conditions and initialization; cfdio.* : contains routines for final IO operations;
  • This toy code is composed of 3 files: mod_hostdata.f90 contains the definition of the global variables; mod_functions.f90 contains the initialisation routines; gemm.f90 is the main program and contains the gemm operation on the CPU and on the GPU. Steps: Manage the data movements with enter data and exit data directives in the initialisation/finalisation routines. After computing ZGEMM on the CPU, add a call to cublasZGEMM on the GPU. Be careful to provide the device buffer (with OpenACC) as an input to the cuBLAS API. Add nvtx ranges to wrap the ZGEMM operation on the CPU and on the GPU