# [Greenline] Investigate compiling Python wrappers with f2py for the OpenACC Fortran Turbulence granule
<!-- VEB Optimizer -->
- Shaped by: Yilu, Will
- Appetite (FTEs, weeks): 2 weeks / 1 cycle (exploratory phase)
- Developers: <!-- Filled in at the betting table unless someone is specifically required here -->
## Problem
The 1D Turbulence granule currently runs only on CPU via ```f2py```. To move toward GPU acceleration (Icon4py integration), we need to verify that ```f2py``` can successfully wrap Fortran code containing OpenACC directives. The key difficulty is getting the wrapped module, when called from Python, to actually recognize and run on the NVIDIA GPU hardware.
## Background
After some web searching, we found a simple GPU example from an NVIDIA forum thread that can serve as a baseline workflow ([NV_Example]); we thought this could be a good starting point. While the example uses CUDA Fortran, the linking logic for OpenACC is nearly identical (swap ```-Mcuda``` for ```-acc```). There are also examples of wrapping OpenACC code, but they use gcc instead of nvfortran ([OpenACC]).
## Solution
Step 1: The CUDA Baseline. Reproduce the forum example using ```nvfortran``` and ```f2py```. This confirms that the environment and linker flags (```LDFLAGS```) are correctly configured to pull in the GPU runtimes.
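A minimal sketch of what this step might look like, driven from Python. The file and module names are placeholders, ```nvfortran``` is assumed to be on the PATH, and ```--fcompiler=nv``` assumes a NumPy whose distutils-based ```f2py``` backend knows the NVHPC compilers (older PGI setups would use ```--fcompiler=pg```):
```
import os
import subprocess

# NPY_DISTUTILS_APPEND_FLAGS=1 makes numpy's distutils APPEND our LDFLAGS
# to its own link line instead of replacing it, so the GPU runtime
# libraries survive the final link (see "Rabbit holes" below).
env = dict(os.environ, NPY_DISTUTILS_APPEND_FLAGS="1", LDFLAGS="-Mcuda")

subprocess.run(
    ["f2py", "-c",
     "--fcompiler=nv",        # select the NVHPC Fortran compiler
     "--f90flags=-Mcuda",     # compile stage: enable CUDA Fortran
     "-m", "cuda_baseline",   # name of the generated Python module
     "cuda_baseline.f90"],    # the forum example's source file
    env=env, check=True)
```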
Step 2: OpenACC Transition. Modify the example to use OpenACC directives (e.g., ```!$acc parallel loop```) instead of CUDA kernels. Compile using the ```-acc``` flag.
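A sketch of the transition, using ```numpy.f2py.compile``` so everything stays in one Python script; the vector addition stands in for the real kernels, and the flags mirror Step 1 with ```-acc``` swapped in:
```
import os
import numpy.f2py

# -acc must reach the LINK stage too, or the OpenACC runtime is missing
# from the resulting *.so (see "Rabbit holes" below).
os.environ["NPY_DISTUTILS_APPEND_FLAGS"] = "1"
os.environ["LDFLAGS"] = "-acc"

# Stand-in kernel: the !$acc directive replaces the CUDA kernel of Step 1.
fsource = """
subroutine vecadd(a, b, c, n)
  integer, intent(in) :: n
  real(8), intent(in) :: a(n), b(n)
  real(8), intent(out) :: c(n)
  integer :: i
!$acc parallel loop copyin(a, b) copyout(c)
  do i = 1, n
     c(i) = a(i) + b(i)
  end do
end subroutine vecadd
"""

rc = numpy.f2py.compile(fsource, modulename="acc_vecadd",
                        extra_args="--fcompiler=nv --f90flags=-acc",
                        extension=".f90", verbose=True)
assert rc == 0  # non-zero means the build failed
```
If the build succeeds, ```import acc_vecadd``` and call ```c = acc_vecadd.vecadd(a, b)```; ```f2py``` turns the ```intent(out)``` argument into the return value, so a NumPy comparison against ```a + b``` gives the "correct result" check.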
Step 3: Turbulence Integration. Once the previous two steps work, the same workflow should conceptually carry over to our turbulence Fortran code, so the next step is applying the verified compilation workflow to the actual 1D Turbulence granule.
- Compile the OpenACC turbulence granule with ```f2py```, which generates the ```*.so``` file.
- Implement the Python test. We will first test its functionality with random-valued fields generated in Python, then verify the granule on the Python side against the serialized data (a rough test sketch follows the snippet below). In the Python file, we should be able to call the compiled binding module:
```
import f2py_turbdiff_run_wrapper
f2py_turbdiff_run_wrapper.f2py_turbdiff_run_wrapper.f2py_turbdiff_run()
```
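A rough shape for the random-field functionality test. The field names, grid sizes, and argument list are purely illustrative, since the granule's real interface is not pinned down here:
```
import numpy as np
import f2py_turbdiff_run_wrapper as wrap

rng = np.random.default_rng(42)
nproma, nlev = 32, 60  # placeholder grid sizes

# Fortran expects column-major arrays, so build them F-ordered up front
# (hypothetical field names -- the real granule interface will differ).
temperature = np.asfortranarray(rng.random((nproma, nlev)))
wind = np.asfortranarray(rng.random((nproma, nlev)))

out = wrap.f2py_turbdiff_run_wrapper.f2py_turbdiff_run(temperature, wind)

# First a sanity check on random inputs; later, replace them with the
# serialized reference data and tighten this to an exact comparison:
# np.testing.assert_allclose(out, reference, rtol=1e-12)
assert np.all(np.isfinite(out))
```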
## Rabbit holes
<!-- Details about the solution worth calling out to avoid problems -->
- Linking issues: a common trap with ```f2py``` is that GPU flags passed at compile time are not propagated to the final link stage. Even if we hand ```-acc``` to the Fortran compiler, numpy's distutils assembles the final link line itself, so the NVIDIA runtime libraries never get bundled in. The result is a "headless" module: it builds and imports cleanly, but crashes the moment it tries to talk to the GPU.
To avoid this, we cannot rely on the default settings; we must set environment variables such as ```LDFLAGS``` (together with ```NPY_DISTUTILS_APPEND_FLAGS=1```, so our flags are appended to the link line rather than replacing it). These act as a direct line to the final build step, ensuring GPU support is hard-wired into the resulting module.
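One cheap way to catch such a headless module before it crashes at runtime, assuming the NVHPC OpenACC runtime libraries (e.g. ```libacchost.so```) carry 'acc' in their names:
```
import subprocess

# Placeholder name -- use whatever *.so f2py actually produced.
so_file = "f2py_turbdiff_run_wrapper.cpython-310-x86_64-linux-gnu.so"

ldd = subprocess.run(["ldd", so_file], capture_output=True, text=True,
                     check=True)
acc_libs = [line for line in ldd.stdout.splitlines() if "acc" in line]

# An empty list means the -acc link flag was dropped: the module will
# import fine but fail on the first GPU call.
print("\n".join(acc_libs) or "no OpenACC runtime linked!")
```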
- During the GPU porting of the turbulence module, we encountered persistent runtime crashes (```FATAL ERROR: data not found```, ```CUDA_ERROR_ILLEGAL_ADDRESS```). The root cause was incorrect handling of Fortran pointers within OpenACC data regions: typically, a pointer that enters a data clause before it is associated, or that is re-associated on the host without updating its device attachment, makes the runtime's present-table lookup fail or leaves kernels dereferencing stale device addresses.
## No-gos
## Progress
<!-- Don't fill during shaping. This area is for collecting TODOs during building. As first task during building add a preliminary list of coarse-grained tasks for the project and refine them with finer-grained items when it makes sense as you work on them. -->
- [x] Simple CUDA Fortran example
- [x] Can be called from python
- [x] Correct result
- [x] Extend to simple OpenACC example (addition of fields)
- [x] Can be called from python
- [x] Correct result
- [ ] Discovered Task 3
- [ ] Subtask L
- [ ] Subtask S
- [ ] Task 4
[NV_Example]: https://forums.developer.nvidia.com/t/compiling-python-wrappers-with-f2py-and-cuda-fortran/157217/3
[OpenACC]: https://stackoverflow.com/questions/40267183/using-f2py-with-openacc-gives-import-error-in-python