**Port FDTD code from CPU to GPU**
The code consists of three modules: 1) FDTD, 2) FFT, 3) statistics
**1)** The FDTD module has already been ported to the GPU using OpenACC
Check performance with the "managed" flag vs. without it
**2)** Ongoing
Implement the FFT with the cuFFT library; check whether the FFT can be performed on the device
Before 3):
- check the `MPI_Scatter` call
- run with more than one MPI process
**3)** "Upgrade" statistical routines with OpenACC
## 20241029
- Compiled and submitted the FDTD module on Leonardo with *acc* directives
- Check the performance impact of the "managed" flag in the FDTD code with *nsys* (`--cpus-per-task=8`)
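
For reference, a minimal sketch of the kind of OpenACC loop structure being compared with and without managed memory. It is not the project's FDTD kernel: the function name `fdtd_step`, the 1D normalized update, and the coefficient `c` are all hypothetical, chosen only to illustrate the pattern. The explicit `data` region is what matters when building without the "managed" flag. With managed memory (assumed here to be something like `-acc -gpu=managed` with the NVIDIA HPC SDK compilers), heap allocations migrate automatically and the data clauses on them become largely redundant.

```c
#include <assert.h>

/* One leapfrog step of a 1D FDTD scheme in normalized units.
 * Hypothetical illustration only: ez has n samples, hy has n-1 samples,
 * and c is an assumed update coefficient (Courant number). */
void fdtd_step(double *ez, double *hy, int n, double c)
{
    assert(n >= 2);

    /* Explicit data region: copies both fields to the device once and
     * back once, instead of around every loop. */
    #pragma acc data copy(ez[0:n], hy[0:n-1])
    {
        /* Magnetic field update from the spatial difference of E. */
        #pragma acc parallel loop
        for (int i = 0; i < n - 1; ++i)
            hy[i] += c * (ez[i + 1] - ez[i]);

        /* Electric field update on interior points; ez[0] and ez[n-1]
         * are left untouched (fixed boundary values). */
        #pragma acc parallel loop
        for (int i = 1; i < n - 1; ++i)
            ez[i] += c * (hy[i] - hy[i - 1]);
    }
}
```

Without an OpenACC compiler the pragmas are ignored and the function runs serially on the CPU, which gives a reference result to compare against the GPU run. In a time loop, the `data` region would normally be hoisted outside the loop so the fields stay resident on the device between steps.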
## 20241030...
- Implement and check the memory allocation for the FFT routine (keeping the existing loop over "npoints")
## 20241031
- Porting the FFT from FFTW to cuFFT and profiling it (*managed* memory)
- Try CUDA streams and batched FFTs