**Port FDTD code from CPU to GPU**

The whole code consists of 3 modules: 1) FDTD, 2) FFT, 3) statistics.

**1)** FDTD module: already ported to GPU using OpenACC
- Check performance with the "managed" flag vs. without the "managed" flag

**2)** Ongoing: implement the cuFFT library
- Check whether the FFT can be performed on the device

Before 3:
- check the "MPI_Scatter" call
- use more than 1 MPI process

**3)** "Upgrade" the statistical routines with OpenACC

## 20241029
- Compiled and submitted the FDTD module on Leonardo with *acc* directives
- Check performance with *nsys* of the "managed" flag in the FDTD code (--cpus-per-task=8)

## 20241030...
- Implement and check the memory allocation for the FFT routine (keeping the existing loop over "npoints")

## 20241031
- Port FFTW to cuFFT and profile it (*managed* memory)
- ![image](https://hackmd.io/_uploads/ByAlEy--kg.png)
- Try streams and batched transforms
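
For reference on the "managed" vs. no "managed" comparison in step 1, here is a minimal sketch of an OpenACC-annotated FDTD update with explicit data movement. It is not the project code: it assumes a 1D Yee scheme in normalized units, and the names `fdtd1d_step`, `fdtd1d_run`, `ez`, `hy` and the Courant number `c` are hypothetical. The `acc data` region in `fdtd1d_run` is what the non-managed build would need; with managed memory the runtime migrates heap allocations on demand, so that region should become optional.

```c
#include <assert.h>

/* One leapfrog step of a 1D FDTD (Yee) scheme in normalized units.
   Hypothetical layout: ez has n nodes, hy has n-1 nodes,
   c is the Courant number (c <= 1 for stability). */
void fdtd1d_step(double *restrict ez, double *restrict hy, int n, double c)
{
    assert(n >= 2);

    /* Update the magnetic field from the spatial difference of ez. */
    #pragma acc parallel loop present(ez[0:n], hy[0:n-1])
    for (int i = 0; i < n - 1; ++i)
        hy[i] += c * (ez[i + 1] - ez[i]);

    /* Update interior electric-field nodes; ez[0] and ez[n-1]
       are left untouched (fixed boundary values). */
    #pragma acc parallel loop present(ez[0:n], hy[0:n-1])
    for (int i = 1; i < n - 1; ++i)
        ez[i] += c * (hy[i] - hy[i - 1]);
}

/* Run nsteps time steps, keeping both fields resident on the device
   for the whole time loop (explicit, non-managed data movement). */
void fdtd1d_run(double *ez, double *hy, int n, double c, int nsteps)
{
    #pragma acc data copy(ez[0:n], hy[0:n-1])
    {
        for (int t = 0; t < nsteps; ++t)
            fdtd1d_step(ez, hy, n, c);
    }
}
```

A compiler without OpenACC support ignores the pragmas and runs the loops serially, which gives a CPU reference for checking the GPU results. Profiling both builds with *nsys* should show whether the explicit data region removes host-device transfers that the managed build triggers through page migration.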