# Reductions in GridTools
###### tags: `archive`
## Problem
FVM requires global reductions. A first step is an efficient C++ implementation, for different architectures, that works with our data structures and abstractions.
## Goal
Implement a GridTools library for single-node reductions of SIDs that handles max/min, product, sum, and dot product.
TODO define API requirements.
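As a starting point for that discussion, the sketch below shows one possible shape for the operations listed in the goal: each reduction is described by a binary operation plus its identity element, and the dot product is a sum over element-wise products. All names (`sum_op`, `reduce_contiguous`, `dot`, ...) are hypothetical and not part of GridTools; the sketch works on a plain contiguous pointer rather than a SID and is serial.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical reduction descriptors (not GridTools API): each bundles a
// binary operation with the identity element used to start the accumulation.
struct sum_op {
    template <class T> static constexpr T identity() { return T(0); }
    template <class T> T operator()(T a, T b) const { return a + b; }
};
struct prod_op {
    template <class T> static constexpr T identity() { return T(1); }
    template <class T> T operator()(T a, T b) const { return a * b; }
};
struct max_op {
    template <class T> static constexpr T identity() { return std::numeric_limits<T>::lowest(); }
    template <class T> T operator()(T a, T b) const { return std::max(a, b); }
};
struct min_op {
    template <class T> static constexpr T identity() { return std::numeric_limits<T>::max(); }
    template <class T> T operator()(T a, T b) const { return std::min(a, b); }
};

// Serial reference reduction over a contiguous range of n elements.
template <class Op, class T>
T reduce_contiguous(Op op, T const *data, std::size_t n) {
    assert(data != nullptr || n == 0);
    T acc = Op::template identity<T>();
    for (std::size_t i = 0; i < n; ++i)
        acc = op(acc, data[i]);
    return acc;
}

// Dot product expressed as a sum reduction over element-wise products.
template <class T>
T dot(T const *a, T const *b, std::size_t n) {
    assert((a != nullptr && b != nullptr) || n == 0);
    T acc = sum_op::identity<T>();
    for (std::size_t i = 0; i < n; ++i)
        acc = acc + a[i] * b[i];
    return acc;
}
```

A call would look like `reduce_contiguous(max_op{}, ptr, n)`. A serial version of this kind could also serve as the reference result when testing the optimized backends.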
## Performance
Performance of the implementation will be tracked by the GridTools perftest framework. At the end of the cycle, the functionality should be in place with decent performance. Optimizing for *best* performance is a nice-to-have.
TODO discuss if dot product is required
TODO discuss if multi-field reduction makes sense (for better instruction-level parallelism). Do we see a use case?
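To make the multi-field question concrete, here is a minimal sketch of a fused reduction that sums two fields in one pass. The two accumulators form independent dependency chains, which is where the additional instruction-level parallelism would come from. The function name `sum_two_fields` is hypothetical, and whether this beats two separate passes would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not GridTools API): sums two fields of equal length
// in a single loop. acc_a and acc_b do not depend on each other, so their
// additions can be in flight at the same time.
template <class T>
std::pair<T, T> sum_two_fields(T const *a, T const *b, std::size_t n) {
    assert((a != nullptr && b != nullptr) || n == 0);
    T acc_a = T(0);
    T acc_b = T(0);
    for (std::size_t i = 0; i < n; ++i) {
        acc_a += a[i];
        acc_b += b[i];
    }
    return {acc_a, acc_b};
}
```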
### GPU
For an efficient CUDA implementation see [CUDA samples - reduction](https://docs.nvidia.com/cuda/cuda-samples/index.html#cuda-parallel-reduction) and the newer [CUDA samples - reduction using multiblock cooperative groups](https://docs.nvidia.com/cuda/cuda-samples/index.html#reduction-using-multiblock-cooperative-groups). An initial performance comparison of the two algorithms (on V100?) should be done.
The challenge is to map an algorithm designed for a linear contiguous array onto a multidimensional array (with padding).
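One way to picture the mapping is to keep the linear index of the contiguous algorithm and translate it to a memory offset through extents and strides, so that padding elements are never touched. The sketch below does this on the host in plain C++ as a stand-in for what each GPU thread would compute; the `layout3d` struct and the function names are assumptions for illustration, not GridTools or SID API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout descriptor: logical extents and per-dimension strides
// (in elements). Padding shows up as strides larger than the extents imply.
struct layout3d {
    std::size_t extent[3];
    std::size_t stride[3];
};

// Translates a linear index over the logical domain (i fastest, k slowest)
// into a memory offset in the padded storage.
inline std::size_t offset_of(layout3d const &l, std::size_t linear) {
    std::size_t i = linear % l.extent[0];
    std::size_t j = (linear / l.extent[0]) % l.extent[1];
    std::size_t k = linear / (l.extent[0] * l.extent[1]);
    return i * l.stride[0] + j * l.stride[1] + k * l.stride[2];
}

// Serial sum over the logical domain only; padding is skipped because no
// linear index maps onto it.
template <class T>
T sum_padded(T const *data, layout3d const &l) {
    assert(data != nullptr);
    std::size_t n = l.extent[0] * l.extent[1] * l.extent[2];
    T acc = T(0);
    for (std::size_t linear = 0; linear < n; ++linear)
        acc += data[offset_of(l, linear)];
    return acc;
}
```

The divisions and modulo operations per element are not free, so an actual kernel might instead assign whole rows or tiles to thread blocks. That trade-off is part of what the comparison would need to cover.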
### CPU
An efficient algorithm for modern CPUs still needs to be researched.
Possible starting point:
- [Using Advanced Vector Extensions AVX-512 for MPI Reductions](https://www.icl.utk.edu/files/publications/2020/icl-utk-1416-2020.pdf)
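Independent of the linked paper, a common baseline on CPUs is to split the sum across several partial accumulators. Floating-point addition is not associative, so compilers generally will not reorder a single-accumulator loop without relaxed floating-point flags; writing the partial sums explicitly removes that obstacle to vectorization. The sketch below is an illustration under that assumption, not a tuned implementation, and the name `sum_unrolled` and the unroll factor of four are arbitrary choices.

```cpp
#include <cassert>
#include <cstddef>

// Sum with four independent partial accumulators. The summation order
// differs from a plain sequential loop, so the result can differ in the
// last bits for general floating-point input.
inline double sum_unrolled(double const *data, std::size_t n) {
    assert(data != nullptr || n == 0);
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += data[i];
        acc1 += data[i + 1];
        acc2 += data[i + 2];
        acc3 += data[i + 3];
    }
    double acc = (acc0 + acc1) + (acc2 + acc3);
    // Remainder loop for the last n % 4 elements.
    for (; i < n; ++i)
        acc += data[i];
    return acc;
}
```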