---
title: Design patterns for real-space data handling
tags: Talk, DIRACmeeting
description: View the slide with "Slide Mode".
---

# Design patterns for real-space data handling: refactoring VISUAL module

gosia.olejniczak@gmail.com

DIRAC meeting, June 2022

notes: https://hackmd.io/@gosia/Bk25mkADc

boards: [bigger picture](https://miro.com/welcomeonboard/YmRCeDFGVVJTdVY5VkUzY3dCak5lRWFYT3B1Y3c1bk9IYlVIbzlWSWNveHV2cFliVlZGNTM4bzF1SW1CYzJCTXwzNDU4NzY0NTIxMDAxNzI3NDg3?share_link_id=193494696810), [layer strategy](https://miro.com/welcomeonboard/Z1NkQUdiN3VLTWVIV3pGWWFDbzZuYlQ1SVNMVTBkUGtoSFlCRjdCak5QY1E5MkpkYVlDS2dtN1BET1NmV0gybnwzNDU4NzY0NTIxMDAxNzI3NDg3?share_link_id=179668914413)

DIRAC issue: [#545](https://gitlab.com/dirac/dirac-private/-/issues/545)

---

### Outline:

* Motivation for refactoring VISUAL module
* Conceptualization
* Architectural drivers to consider
* Strategy and schema
* Refactoring ROI

---

### Refactoring VISUAL module - motivation

* extending module's **functionality**:
  * ability to calculate (and easily add) various densities for a wide class of methods
  * ability to call VISUAL from different modules (e.g. from FDE, for a selected subsystem)
  * easier integration with external codes
* better **performance**:
  * calculate different densities on the same grid in the same run
  * reuse precalculated objects (e.g.
    from checkpoints)
  * parallelization
  * future: possibly profit from GPU-based architectures
* better data structures and storage
  * support for hdf5 (**numerics**) files
  * metadata for easier data retrieval and summarization (**data-based workflows**)

please contribute: [#545](https://gitlab.com/dirac/dirac-private/-/issues/545)

---

### Conceptualization: current functionality of VISUAL module

* central quantity: "property density"

  $P = \int {\bf P}(\tau) d\tau, \qquad {\bf P}(\tau) = \sum_k \langle \psi(\tau) | f_k M_k \hat{\Omega}_k | \psi (\tau)\rangle$

* largely covers [one-electron operators](http://www.diracprogram.org/doc/release-22/manual/one_electron_operators.html#one-electron-operators)
* mapping to objects:
  * $P, {\bf P}$ - physical objects, continuous, grid-independent
  * ${\bf P}(\tau)$ - numerical object (discretization of ${\bf P}$), grid-dependent
* two tasks:
  * export of densities on a grid
  * integration of densities on a grid

---

### Programming design - layer strategy:

* [layer strategy](https://miro.com/welcomeonboard/Z1NkQUdiN3VLTWVIV3pGWWFDbzZuYlQ1SVNMVTBkUGtoSFlCRjdCak5QY1E5MkpkYVlDS2dtN1BET1NmV0gybnwzNDU4NzY0NTIxMDAxNzI3NDg3?share_link_id=179668914413): application layer -> property layer -> scalar field layer -> mesh layer
* advantages of layer structure:
  * higher layer defines an interface to a lower layer
  * drives encapsulation, helps to discern modules
  * helps to formulate schema

---

### Programming design - other architectural drivers to consider

* functionality, scalability, numerical accuracy
* developer-friendliness: reduce future development time
  * design for code reuse
  * exploit templates (generic programming paradigm; [paper](https://dl.acm.org/doi/pdf/10.1145/3374905.3374908)) to formulate abstractions
  * separate model from implementation (reduce data-dependent bugs)
  * separate data computation from data I/O
  * document the process (e.g. how to extend the schema)
  * use consistent labels in all schemas (e.g.
    `ao_matrices`)
  * generalize `utils/process_schema.py` and `gp/checkpoint.F90` to avoid duplication?
* usability and user-friendliness
  * "from DIRAC calculations to data analysis" - showcases (tutorials?)
  * enable restarts of computations on grids
* scalability: parallelization strategy
  * MPI vs Fortran coarray distributed data structures (from Fortran 2008), [paper](https://www.hindawi.com/journals/sp/2015/942059/)
* testability concerns
  * improve test coverage
  * optimize timing and memory load of tests (test set timing benchmarks?)
  * select representative real-space data for testing

---

### Modern Fortran features

* modularity:
  * impose data privacy and functional purity (where feasible)
  * separate model from implementation
  * manipulate physical and numerical objects without access to their data
  * hide implementation details
* derived types:
  * encapsulate numerical data and algorithms into objects
  * look for generic structures with extended `type` constructs, e.g.
    * `type, extends(parent_type) :: my_type`
    * `type, abstract :: parent_type`
* "coordinate-free programming" ([paper1](https://www.hindawi.com/journals/sp/2015/942059/), [paper2](https://dl.acm.org/doi/abs/10.1145/1322436.1322438)):
  * manipulate objects without explicit dependence on grid type
  * store only what is needed, e.g.

    ```
    type regular_grid
      ! required: all we need to construct a regular grid
      ! (declarations shown for a 1D grid, as an illustration)
      integer :: npoints
      real    :: dx, origin
      ! optional: store only in specific cases (e.g. imported grid)
      real, allocatable :: nodes(:)
    end type
    ```

* overloading operators:
  * simpler operations, generic programming ([paper](https://dl.acm.org/doi/pdf/10.1145/3374905.3374908)), possibly automatic code generation

---

### VISUAL schema

* January 2022 - decision to use separate schemas for different modules ([miro board](https://miro.com/welcomeonboard/ZnpDdngzNk1GVGEyb2F3Qkk4bG9UUEtJUEM0U2g0dmZMemtidExwWmVFS3FJcEtIMnNNajVlbExpRzNveENSQXwzNDU4NzY0NTE1ODY0NTk1Njcy?invite_link_id=289216627870))
* labeling ideas for property densities - after `src/openrsp` (?)
```
type(prop_field_info) :: field_list(14) = &                        !nc an ba ln qu
   (/prop_field_info('EXCI', 'Generalized "excitation" field'      , 1, F, F, T, T), &
     prop_field_info('FREQ', 'Generalized "freqency" field'        , 1, F, F, T, T), &
     prop_field_info('AUX*', 'Auxiliary integrals on file'         , 1, F, F, T, F), &
     prop_field_info('PNC' , 'PNC'                                 , 1, F, F, T, F), &
     prop_field_info('EL'  , 'Electric field'                      , 3, F, F, T, F), &
     prop_field_info('VEL' , 'Velocity'                            , 3, T, F, T, F), &
     prop_field_info('MAGO', 'Magnetic field w/o. London orbitals' , 3, T, F, F, T), &
     prop_field_info('MAG' , 'Magnetic field with London orbitals' , 3, T, T, F, F), &
     prop_field_info('ELGR', 'Electric field gradient'             , 6, F, F, T, F), &
     prop_field_info('VIBM', 'Displacement along vibrational modes',-1, F, T, F, F), &
     prop_field_info('GEO' , 'Nuclear coordinates'                 ,-1, F, T, F, F), & !-1=mol-dep
     prop_field_info('NUCM', 'Nuclear magnetic moment'             ,-1, F, T, F, T), & !-1=mol-dep
     prop_field_info('AOCC', 'AO contraction coefficients'         ,-1, F, T, F, F), & !-1=mol-dep
     prop_field_info('AOEX', 'AO exponents'                        ,-1, F, T, F, F)/)  !-1=mol-dep
```

---

### Refactoring discussion: return on investment and bigger picture

* numerics, memory usage, data summarization: **hdf5**
* large grids, large systems, benchmarks on molecular sets: **parallelization**
* **functionality**: current and future interests:
  * `DFCOEF`, `AOPROPER` - replace by reading from checkpoints
  * CC/EXACORR - `CCDENS` (?), MCSCF - `MCNATOCC` (?), DMRG?
  * real-space relativistic quantum chemistry
  * data-driven workflows, e.g. back-and-forth communication between DIRAC and other analysis software
* VISUAL in bigger picture: [board](https://miro.com/app/board/uXjVO6dnrAU=/)

please contribute: [#545](https://gitlab.com/dirac/dirac-private/-/issues/545)
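
---

### Backup: grid-agnostic integration - a sketch

A minimal sketch of how the abstract/extended `type` and "coordinate-free programming" ideas from the Modern Fortran features slide might fit together for the "integration of densities on a grid" task. Everything here is hypothetical and not taken from DIRAC: the module `grid_sketch`, the bindings `num_points`, `point`, `weight`, and the routine `integrate` are illustrative names, and the grid is assumed to be 1D with constant weights `dx`.

```fortran
module grid_sketch
  implicit none
  private
  public :: dp, grid, regular_grid, integrate

  integer, parameter :: dp = kind(1.0d0)

  ! Abstract parent: callers see only this interface, never the storage.
  type, abstract :: grid
  contains
    procedure(count_if), deferred :: num_points
    procedure(value_if), deferred :: point
    procedure(value_if), deferred :: weight
  end type grid

  abstract interface
    pure function count_if(self) result(n)
      import :: grid
      class(grid), intent(in) :: self
      integer :: n
    end function count_if
    pure function value_if(self, i) result(x)
      import :: grid, dp
      class(grid), intent(in) :: self
      integer, intent(in) :: i
      real(dp) :: x
    end function value_if
  end interface

  ! Regular 1D grid (assumption): nodes are generated on the fly, not stored.
  type, extends(grid) :: regular_grid
    integer  :: npoints = 0
    real(dp) :: dx = 0.0_dp
    real(dp) :: origin = 0.0_dp
  contains
    procedure :: num_points => regular_num_points
    procedure :: point      => regular_point
    procedure :: weight     => regular_weight
  end type regular_grid

contains

  pure function regular_num_points(self) result(n)
    class(regular_grid), intent(in) :: self
    integer :: n
    n = self%npoints
  end function regular_num_points

  pure function regular_point(self, i) result(x)
    class(regular_grid), intent(in) :: self
    integer, intent(in) :: i
    real(dp) :: x
    x = self%origin + real(i - 1, dp) * self%dx
  end function regular_point

  pure function regular_weight(self, i) result(w)
    class(regular_grid), intent(in) :: self
    integer, intent(in) :: i
    real(dp) :: w
    w = self%dx   ! constant quadrature weight on a regular grid
  end function regular_weight

  ! Works for any extension of grid: no explicit dependence on grid type.
  function integrate(g, f) result(total)
    class(grid), intent(in) :: g
    interface
      pure function f(x) result(y)
        import :: dp
        real(dp), intent(in) :: x
        real(dp) :: y
      end function f
    end interface
    real(dp) :: total
    integer :: i
    total = 0.0_dp
    do i = 1, g%num_points()
      total = total + g%weight(i) * f(g%point(i))
    end do
  end function integrate

end module grid_sketch

program demo
  use grid_sketch, only: dp, regular_grid, integrate
  implicit none
  type(regular_grid) :: g

  ! 1000 cell midpoints covering [0, 1]
  g%npoints = 1000
  g%dx      = 1.0e-3_dp
  g%origin  = 0.5e-3_dp

  ! the printed value should be close to 1/3
  print *, 'integral of x**2 on [0,1] ~', integrate(g, square)

contains
  pure function square(x) result(y)
    real(dp), intent(in) :: x
    real(dp) :: y
    y = x * x
  end function square
end program demo
```

In this sketch `integrate` never touches `npoints`, `dx` or `origin` directly, so an imported grid with stored `nodes(:)` could be added as another extension of `grid` without changing the integration routine.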