# Reductions in GridTools

###### tags: `archive`

## Problem

FVM requires global reductions. A first step is an efficient C++ implementation for different architectures that works with our data structures/abstractions.

## Goal

Implement a GridTools library for single-node reduction of SIDs that handles max/min, product, sum and dot product.

TODO define API requirements.

## Performance

Performance of the implementation will be tracked by the GridTools perftest framework. By the end of the cycle, the functionality should be in place with decent performance. Optimizing for *best* performance is a nice-to-have.

TODO discuss if dot product is required

TODO discuss if multi-field reduction makes sense (for better instruction-level parallelism). Do we see a use case?

### GPU

For an efficient CUDA implementation see [CUDA samples - reduction](https://docs.nvidia.com/cuda/cuda-samples/index.html#cuda-parallel-reduction) and the newer [CUDA samples - reduction using multiblock cooperative groups](https://docs.nvidia.com/cuda/cuda-samples/index.html#reduction-using-multiblock-cooperative-groups). An initial performance comparison (on V100?) of the two algorithms should be done.

The challenge is to map an algorithm written for a linear contiguous array onto a multidimensional array (with padding).

### CPU

An efficient algorithm for modern CPUs still needs to be researched. Possible starting point:

- [Using Advanced Vector Extensions AVX-512 for MPI Reductions](https://www.icl.utk.edu/files/publications/2020/icl-utk-1416-2020.pdf)
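
To make the padding issue concrete, here is a minimal sequential sketch of a reduction over a padded 3D field. It is not the GridTools API: `strided_view`, `reduce`, `dot` and `example_view` are hypothetical names, and the view is only a simplified stand-in for a SID (origin pointer, extents, and strides in elements). It can serve as a correctness reference when comparing optimized CPU or GPU variants.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a SID: origin pointer, extents and strides (in elements).
// A stride larger than the product of the inner extents expresses padding.
struct strided_view {
    double const *origin;
    std::array<std::size_t, 3> extents;
    std::array<std::size_t, 3> strides;
};

// Sequential reference reduction: folds `op` over all non-padding elements.
template <class Op>
double reduce(strided_view const &v, double init, Op op) {
    assert(v.origin != nullptr);
    double acc = init;
    for (std::size_t k = 0; k < v.extents[2]; ++k)
        for (std::size_t j = 0; j < v.extents[1]; ++j) {
            // Each (j, k) pair addresses one row; padding sits outside the extents.
            double const *row = v.origin + j * v.strides[1] + k * v.strides[2];
            for (std::size_t i = 0; i < v.extents[0]; ++i)
                acc = op(acc, row[i * v.strides[0]]);
        }
    return acc;
}

// Dot product of two views with identical extents; their strides may differ.
double dot(strided_view const &a, strided_view const &b) {
    assert(a.extents == b.extents);
    double acc = 0;
    for (std::size_t k = 0; k < a.extents[2]; ++k)
        for (std::size_t j = 0; j < a.extents[1]; ++j)
            for (std::size_t i = 0; i < a.extents[0]; ++i)
                acc += a.origin[i * a.strides[0] + j * a.strides[1] + k * a.strides[2]] *
                       b.origin[i * b.strides[0] + j * b.strides[1] + k * b.strides[2]];
    return acc;
}

// Example: a 3 x 2 x 1 field whose rows are padded from 3 to 4 elements.
// The value 1000 marks padding and must never enter a result.
inline strided_view example_view() {
    static std::vector<double> const data = {1, 2, 3, 1000, 4, 5, 6, 1000};
    return {data.data(), {3, 2, 1}, {1, 4, 8}};
}
```

With `example_view()`, a sum (`init = 0`, `op = a + b`) yields 21 and a max yields 6, so the padding value 1000 is skipped. A parallel version would presumably split the outer loops into chunks and combine per-chunk partial results with the same `op`; for floating-point sums and products this changes the evaluation order, so results may differ slightly from this sequential reference.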