Field storage and operators redesign

# Field storage and operators redesign ## Goals - Enable device-aware memory management, particularly transferring data between host and devices only when needed. - Decoupling the spectral/hp element operators (e.g. `BwdTrans`) from the expansion definition (i.e. `Explist`) and the field storage. - Enable transparent reordering of data in preparation for specific SIMD operations (e.g. AVX512 or GPU kernels). - Enable series of SIMD operations without intermediate reordering. ### Non goals ## Current implementation Both data and operators are declared and defined in `ExpList` and derived classes. Here "data" refers to `ExpList`'s attributes ```cpp= Array<OneD, NekDouble> m_coeffs // all local expansion coeffs Array<OneD, NekDouble> m_phys // global expansion at quad points ``` Operators are `ExpList` methods acting on that data, for instance ```cpp= inline void BwdTrans ( const Array<OneD, const NekDouble> &inarray, Array<OneD,NekDouble> &outarray); ``` or ```cpp= /// Solve helmholtz problem inline void HelmSolve( const Array<OneD, const NekDouble> &inarray, Array<OneD, NekDouble> &outarray, const StdRegions::ConstFactorMap &factors, const StdRegions::VarCoeffMap &varcoeff = StdRegions::NullVarCoeffMap, const MultiRegions::VarFactorsMap &varfactors = MultiRegions::NullVarFactorsMap, const Array<OneD, const NekDouble> &dirForcing = NullNekDouble1DArray, onst bool PhysSpaceForcing = true); ``` Solver implementations access operators through `ExpList` objects ```cpp= // Define Expansion Exp = MemoryManager<MultiRegions::ContField>::AllocateSharedPtr(vSession,graph3D,vSession->GetVariable(0)); // ... // Project onto Expansion Exp->FwdTrans(Fce->GetPhys(), Exp->UpdateCoeffs()); // ... ``` ### Downside of Current implementation - A downside of this implementation is that new operators must be added as methods of `ExpList` objects, even if only required by a specific solver. - To declare new fields we make soft or hard copy of the whole `ExpList`. - Supporting GPUs woud require us to explicitly or implicitly use cuda code or a layer or indirection (i.e. `Collections` and `MatrixFree`) ## Proposed solution ### Summary There are two strands to our redesign approach: - Strand 1: Extract storage from `ExpList` into FieldStorage class. - Strand 2: Implement a new operators functionality (e.g. `BwdTrans`) which is independent from `ExpList`. Starting from an abstract base operator class, we’ll define a hierarchy of operators supporting multiple architectures (GPU/AVX) and/or algorithmic variations (i.e. local matrix, sum factorisation, matrix free). This will then influence the internal design of FieldStorage for example interlacing of elements for vectorization. The plan is to try and not break the existing operation whilst we go through these developments. The use of FieldStorage will require some changes, but we aim to keep the new operators as independent as possible. ### Top level abstractions #### `FieldStorage` This class encapsulates field data, whether it is global expansion data or local coefficients data. It is templated on - The data type (e.g. `NekDouble`) - The storage type (e.g. physical expansion or coefficients) - The data layout (e.g. element-by-element, interleaved element-by-element, interleaved by variable) `FieldStorage` is created from a `Explist` object, and holds a shared pointer to it (**why**)? ##### Public interface + `FieldStorage(MultiRegions::ExpListSharedPtr exp)` + `Array<OneD, DataType>& UpdateData()` + `Array<OneD, const DataType> GetData() const` + `Array<OneD, DataType>& UpdateArray1D()` Return an OneD Array pointing to underlying DataType + `Array<OneD, const DataType> GetArray1D() const` Return a const OneD Array currently making a copy + `ReorderData()` ##### Private interface + `ExpListSharePtr m_expList` Corresponding expansion list object + `std::vector<DataType> m_storage` ##### Relationships - Contains: pointer to corresponding ExpList - is contained by: - inherit from: - is a parent of: #### `Operator` We have discussed the possibility of introducing a "patch" of elements which would have a group of similar shape elements of similar polynomial/quadrature order. So an operator class might work on a patch where for example a local DG patch might include data layout of the following form: ![](https://i.imgur.com/uTfZyvi.jpg) where the orange quadrilateral blocks represent the volumetric data. This is interfaced by trace spaces which could come from outside as indicated by the green segments as well as the interior patch traces which can be constructed locally. So one possibility is that we could construct patches that include the volumetric data plus the green traces that would allow for either boundary condition data or inter-patch communication. Before pursuing the patch concept further we need to do some testing. A first step will be to make a full operator (such as the Helmholtz solve) work on the GPU. We could then see how performant a small patch size mesh performs. Note that even for a continuous Galerkin expansion the patch information (probably containing coefficients rather than quadrature points) might also be good to iterate over patches. In this case the green trace data would potenially have coefficients from the neighbouring patch elements. ### Examples Illustrations of what solver implementations would look like after the redesign. ## Milestones ### Minimal `FieldStorage` prototype Can we just pass`FieldStorage` instances to `ExpList` methods as a start? ### Helmholtz prototype