---
title: Design patterns for real-space data handling
tags: Talk, DIRACmeeting
description: View the slide with "Slide Mode".
---
# Design patterns for real-space data handling: refactoring VISUAL module
gosia.olejniczak@gmail.com
DIRAC meeting, June 2022
notes: https://hackmd.io/@gosia/Bk25mkADc
boards: [bigger picture](https://miro.com/welcomeonboard/YmRCeDFGVVJTdVY5VkUzY3dCak5lRWFYT3B1Y3c1bk9IYlVIbzlWSWNveHV2cFliVlZGNTM4bzF1SW1CYzJCTXwzNDU4NzY0NTIxMDAxNzI3NDg3?share_link_id=193494696810), [layer strategy](https://miro.com/welcomeonboard/Z1NkQUdiN3VLTWVIV3pGWWFDbzZuYlQ1SVNMVTBkUGtoSFlCRjdCak5QY1E5MkpkYVlDS2dtN1BET1NmV0gybnwzNDU4NzY0NTIxMDAxNzI3NDg3?share_link_id=179668914413)
DIRAC issue: [#545](https://gitlab.com/dirac/dirac-private/-/issues/545)
---
### Outline:
* Motivation for refactoring VISUAL module
* Conceptualization
* Architectural drivers to consider
* Strategy and schema
* Refactoring ROI
---
### Refactoring VISUAL module - motivation
* extending the module's **functionality**:
* ability to calculate (and easily add) various densities for a wide class of methods
* ability to call VISUAL from different modules (e.g. from FDE, for a selected subsystem)
* easier integration with external codes
* better **performance**:
* calculate different densities on the same grid in the same run
* reuse precalculated objects (e.g. from checkpoints)
* parallelization
* future: possibly profit from GPU-based architectures
* better data structures and storage
* support for hdf5 files (**numerics**)
* metadata for easier data retrieval and summarization (**data-based workflows**)
please contribute: [#545](https://gitlab.com/dirac/dirac-private/-/issues/545)
---
### Conceptualization: current functionality of VISUAL module
* central quantity: "property density"
$P = \int {\bf P}(\tau) d\tau, \qquad {\bf P}(\tau) = \sum_k \langle \psi(\tau) | f_k M_k \hat{\Omega}_k | \psi (\tau)\rangle$
* covers most of the [one-electron operators](http://www.diracprogram.org/doc/release-22/manual/one_electron_operators.html#one-electron-operators) available in DIRAC
* mapping to objects:
* $P, {\bf P}$ - physical objects, continuous, grid-independent
* ${\bf P}(\tau)$ - numerical object (discretization of ${\bf P}$), grid-dependent
* two tasks:
* export of densities on a grid
* integration of densities on a grid
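Both tasks act on the same discretized object. Assuming a generic quadrature with grid points $\tau_i$ and weights $w_i$ (this notation is ours, not necessarily VISUAL's), export writes the values ${\bf P}(\tau_i)$ to file, while integration forms their weighted sum:

$$
P \approx \sum_{i=1}^{N} w_i \, {\bf P}(\tau_i), \qquad
{\bf P}(\tau_i) = \sum_k \langle \psi(\tau_i) | f_k M_k \hat{\Omega}_k | \psi(\tau_i) \rangle
$$

For a regular grid with spacings $\Delta x, \Delta y, \Delta z$, the simplest (rectangle-rule) choice is $w_i = \Delta x \, \Delta y \, \Delta z$ for every point.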
---
### Programming design - layer strategy:
* [layer strategy](https://miro.com/welcomeonboard/Z1NkQUdiN3VLTWVIV3pGWWFDbzZuYlQ1SVNMVTBkUGtoSFlCRjdCak5QY1E5MkpkYVlDS2dtN1BET1NmV0gybnwzNDU4NzY0NTIxMDAxNzI3NDg3?share_link_id=179668914413): application layer -> property layer -> scalar field layer -> mesh layer
* advantages of a layered structure:
* each higher layer accesses the layer below only through a defined interface
* drives encapsulation and helps to identify module boundaries
* helps to formulate the schema
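A minimal Fortran sketch of the layer strategy, to make the interfaces concrete. All module, type, and procedure names (`mesh_layer`, `scalar_field_layer`, `property_layer`, `mesh_t`, `make_field`, ...) are hypothetical and not part of DIRAC. The mesh layer stores points and quadrature weights, the scalar field layer stores values on a mesh and integrates them, and the property layer only calls the public procedures of the scalar field layer; an application layer (e.g. a VISUAL driver) would sit on top.

```fortran
module mesh_layer
   use iso_fortran_env, only: real64
   implicit none
   private
   public :: mesh_t, uniform_mesh

   ! mesh layer: grid points and quadrature weights
   type :: mesh_t
      real(real64), allocatable :: points(:,:)   ! (3, npoints)
      real(real64), allocatable :: weights(:)    ! (npoints)
   end type mesh_t
contains
   ! same weight for every point (assumed simple quadrature)
   function uniform_mesh(points, weight) result(mesh)
      real(real64), intent(in) :: points(:,:), weight
      type(mesh_t) :: mesh
      mesh%points = points
      allocate(mesh%weights(size(points, 2)))
      mesh%weights = weight
   end function uniform_mesh
end module mesh_layer

module scalar_field_layer
   use iso_fortran_env, only: real64
   use mesh_layer, only: mesh_t          ! only the layer directly below
   implicit none
   private
   public :: scalar_field_t, make_field, integrate

   ! scalar field layer: values of one density on one mesh
   type :: scalar_field_t
      type(mesh_t) :: mesh
      real(real64), allocatable :: values(:)
   end type scalar_field_t
contains
   function make_field(mesh, values) result(field)
      type(mesh_t), intent(in) :: mesh
      real(real64), intent(in) :: values(:)
      type(scalar_field_t) :: field
      if (size(values) /= size(mesh%weights)) error stop 'make_field: size mismatch'
      field%mesh = mesh
      field%values = values
   end function make_field

   ! numerical integration: sum_i w_i * values_i
   pure function integrate(field) result(total)
      type(scalar_field_t), intent(in) :: field
      real(real64) :: total
      total = sum(field%mesh%weights * field%values)
   end function integrate
end module scalar_field_layer

module property_layer
   use iso_fortran_env, only: real64
   use scalar_field_layer, only: scalar_field_t, integrate   ! no use of mesh_layer
   implicit none
   private
   public :: property_t, property_value

   ! property layer: a labeled property density
   type :: property_t
      character(len=32) :: label = ''
      type(scalar_field_t) :: density
   end type property_t
contains
   function property_value(prop) result(p)
      type(property_t), intent(in) :: prop
      real(real64) :: p
      p = integrate(prop%density)
   end function property_value
end module property_layer
```

Because `property_layer` never uses `mesh_layer` directly, swapping the mesh implementation (e.g. a different quadrature) would not require changes above the scalar field layer.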
---
### Programming design - other architectural drivers to consider
* functionality, scalability, numerical accuracy
* developer-friendliness: reduce future development time
* design for code reuse
* exploit templates (generic programming paradigm; [paper](https://dl.acm.org/doi/pdf/10.1145/3374905.3374908)) to formulate abstractions
* separate model from implementation (reduce data-dependent bugs)
* separate data computation from data I/O
* document the process (e.g. how to extend the schema)
* use consistent labels in all schemas (e.g. `ao_matrices`)
* generalize `utils/process_schema.py` and `gp/checkpoint.F90` to avoid duplication?
* usability and user-friendliness
* "from DIRAC calculations to data analysis" - showcases (tutorials?)
* enable restarts of computations on grids
* scalability: parallelization strategy
* MPI vs. Fortran coarrays (distributed data structures, part of the standard since Fortran 2008), [paper](https://www.hindawi.com/journals/sp/2015/942059/)
* testability concerns
* improve test coverage
* optimize timing and memory load of tests (test set timing benchmarks?)
* select representative real-space data for testing
---
### Modern Fortran features
* modularity:
* impose data privacy and functional purity (where feasible)
* separate model from implementation
* manipulate physical and numerical objects without access to their data
* hide implementation details
* derived types:
* encapsulate numerical data and algorithms into objects
* look for generic structures with extended `type` constructs, e.g.
* `type, extends(parent_type) :: my_type`
* `type, abstract :: parent_type`
* "coordinate-free programming" ([paper1](https://www.hindawi.com/journals/sp/2015/942059/), [paper2](https://dl.acm.org/doi/abs/10.1145/1322436.1322438)):
* manipulate objects without explicit dependence on grid type
* store only what is needed, e.g.
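```fortran
type :: regular_grid
   ! required: all we need to construct a regular grid
   integer      :: npoints(3)            ! points along x, y, z
   real(real64) :: dx(3), origin(3)      ! real64 from iso_fortran_env
   ! optional: store only in specific cases (e.g. imported grid)
   real(real64), allocatable :: nodes(:,:)
end type regular_grid
```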
* overloading operators:
* simpler expressions on whole fields (e.g. `rho = rho_a + rho_b`), generic programming ([paper](https://dl.acm.org/doi/pdf/10.1145/3374905.3374908)), possibly automatic code generation
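The sketch below combines the ideas on this slide under stated assumptions: an abstract `grid_t` with deferred accessors; a `regular_grid_t` that extends it and stores only `npoints`, `dx` and `origin`, computing nodes on demand unless the optional `nodes` array was filled (e.g. for an imported grid); a coordinate-free `grid_integral` that only sees `class(grid_t)`; and an overloaded `+` for fields. All names are hypothetical, and a 3D grid with the x index running fastest is assumed.

```fortran
module grids
   use iso_fortran_env, only: real64
   implicit none
   private
   public :: grid_t, regular_grid_t, field_t, grid_integral, operator(+)
   ! abstract parent: no data, only the interface every grid must provide
   type, abstract :: grid_t
   contains
      procedure(count_iface),  deferred :: num_nodes
      procedure(node_iface),   deferred :: node
      procedure(weight_iface), deferred :: weight
   end type grid_t
   abstract interface
      pure function count_iface(self) result(n)
         import :: grid_t
         class(grid_t), intent(in) :: self
         integer :: n
      end function count_iface
      pure function node_iface(self, i) result(r)
         import :: grid_t, real64
         class(grid_t), intent(in) :: self
         integer, intent(in) :: i
         real(real64) :: r(3)
      end function node_iface
      pure function weight_iface(self, i) result(w)
         import :: grid_t, real64
         class(grid_t), intent(in) :: self
         integer, intent(in) :: i
         real(real64) :: w
      end function weight_iface
   end interface
   ! regular grid: npoints, dx, origin are required; nodes is optional
   type, extends(grid_t) :: regular_grid_t
      integer      :: npoints(3) = 1
      real(real64) :: dx(3) = 1.0_real64, origin(3) = 0.0_real64
      real(real64), allocatable :: nodes(:,:)   ! (3, n), e.g. imported grid
   contains
      procedure :: num_nodes => regular_num_nodes
      procedure :: node      => regular_node
      procedure :: weight    => regular_weight
   end type regular_grid_t
   ! values of a scalar field at the grid nodes
   type :: field_t
      real(real64), allocatable :: values(:)
   end type field_t
   interface operator(+)
      module procedure add_fields
   end interface
contains
   pure function regular_num_nodes(self) result(n)
      class(regular_grid_t), intent(in) :: self
      integer :: n
      n = product(self%npoints)
   end function regular_num_nodes
   ! return a stored node if present, otherwise compute it (x index fastest)
   pure function regular_node(self, i) result(r)
      class(regular_grid_t), intent(in) :: self
      integer, intent(in) :: i
      real(real64) :: r(3)
      integer :: ijk(3)
      if (allocated(self%nodes)) then
         r = self%nodes(:, i)
      else
         ijk(1) = mod(i - 1, self%npoints(1))
         ijk(2) = mod((i - 1) / self%npoints(1), self%npoints(2))
         ijk(3) = (i - 1) / (self%npoints(1) * self%npoints(2))
         r = self%origin + self%dx * real(ijk, real64)
      end if
   end function regular_node
   ! same volume element at every node of a regular grid
   pure function regular_weight(self, i) result(w)
      class(regular_grid_t), intent(in) :: self
      integer, intent(in) :: i
      real(real64) :: w
      w = product(self%dx)
   end function regular_weight
   ! coordinate-free: valid for any extension of grid_t
   pure function grid_integral(grid, f) result(total)
      class(grid_t), intent(in) :: grid
      type(field_t), intent(in) :: f
      real(real64) :: total
      integer :: i
      total = 0.0_real64
      do i = 1, grid%num_nodes()
         total = total + grid%weight(i) * f%values(i)
      end do
   end function grid_integral
   ! overloaded '+': point-wise sum of two fields on the same grid
   pure function add_fields(a, b) result(c)
      type(field_t), intent(in) :: a, b
      type(field_t) :: c
      c%values = a%values + b%values
   end function add_fields
end module grids
```

A new grid type (e.g. a molecular grid read from file) would only need to extend `grid_t` and implement the three deferred bindings; `grid_integral` and any code built on it would work unchanged.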
---
### VISUAL schema
* January 2022 - decision to use separate schemas for different modules ([miro board](https://miro.com/welcomeonboard/ZnpDdngzNk1GVGEyb2F3Qkk4bG9UUEtJUEM0U2g0dmZMemtidExwWmVFS3FJcEtIMnNNajVlbExpRzNveENSQXwzNDU4NzY0NTE1ODY0NTk1Njcy?invite_link_id=289216627870))
* labeling ideas for property densities, modeled after `src/openrsp` (?)
```fortran
type(prop_field_info) :: field_list(14) = & !nc an ba ln qu
(/prop_field_info('EXCI', 'Generalized "excitation" field' , 1, F, F, T, T), &
prop_field_info('FREQ', 'Generalized "frequency" field' , 1, F, F, T, T), &
prop_field_info('AUX*', 'Auxiliary integrals on file' , 1, F, F, T, F), &
prop_field_info('PNC' , 'PNC' , 1, F, F, T, F), &
prop_field_info('EL' , 'Electric field' , 3, F, F, T, F), &
prop_field_info('VEL' , 'Velocity' , 3, T, F, T, F), &
prop_field_info('MAGO', 'Magnetic field w/o. London orbitals' , 3, T, F, F, T), &
prop_field_info('MAG' , 'Magnetic field with London orbitals' , 3, T, T, F, F), &
prop_field_info('ELGR', 'Electric field gradient' , 6, F, F, T, F), &
prop_field_info('VIBM', 'Displacement along vibrational modes',-1, F, T, F, F), &
prop_field_info('GEO' , 'Nuclear coordinates' ,-1, F, T, F, F), & !-1=mol-dep
prop_field_info('NUCM', 'Nuclear magnetic moment' ,-1, F, T, F, T), & !-1=mol-dep
prop_field_info('AOCC', 'AO contraction coefficients' ,-1, F, T, F, F), & !-1=mol-dep
prop_field_info('AOEX', 'AO exponents' ,-1, F, T, F, F)/) !-1=mol-dep
```
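To show how such labels could be queried, here is a small sketch of a `prop_field_info`-like type with a lookup by label. The component names (`label`, `name`, `ncomp`, `an`, `ba`, `ln`, `qu`) are assumptions read off the `!nc an ba ln qu` column comment and may not match `src/openrsp`; only four entries of the table above are reproduced, and `find_field` is a hypothetical helper.

```fortran
module field_labels
   implicit none
   private
   public :: prop_field_info, field_list, find_field

   logical, parameter :: T = .true., F = .false.

   ! assumed layout: code, description, number of components, four flags
   type :: prop_field_info
      character(len=4)  :: label           ! four-letter code, e.g. 'EL'
      character(len=40) :: name            ! human-readable description
      integer           :: ncomp           ! -1 = molecule-dependent
      logical           :: an, ba, ln, qu  ! flags named after the column comment
   end type prop_field_info

   ! subset of the table above
   type(prop_field_info), parameter :: field_list(4) = [ &
      prop_field_info('EL', 'Electric field', 3, F, F, T, F), &
      prop_field_info('MAG', 'Magnetic field with London orbitals', 3, T, T, F, F), &
      prop_field_info('ELGR', 'Electric field gradient', 6, F, F, T, F), &
      prop_field_info('GEO', 'Nuclear coordinates', -1, F, T, F, F) ]
contains
   ! index of a label in field_list, or 0 if the label is unknown
   pure function find_field(label) result(idx)
      character(len=*), intent(in) :: label
      integer :: idx, i
      idx = 0
      do i = 1, size(field_list)
         ! Fortran pads the shorter string with blanks before comparing
         if (field_list(i)%label == label) then
            idx = i
            return
         end if
      end do
   end function find_field
end module field_labels
```

The same lookup-by-label pattern could serve the consistent labels asked for in the architectural drivers (e.g. `ao_matrices`), so that schemas and code refer to one table.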
---
### Refactoring discussion: return on investment and bigger picture
* numerics, memory usage, data summarization: **hdf5**
* large grids, large systems, benchmarks on molecular sets: **parallelization**
* **functionality**: current and future interests:
* `DFCOEF`, `AOPROPER` files: replace by reading from checkpoints
* CC/EXACORR: `CCDENS` (?); MCSCF: `MCNATOCC` (?); DMRG (?)
* real-space relativistic quantum chemistry
* data-driven workflows, e.g. back-and-forth communication between DIRAC and other analysis software
* VISUAL in the bigger picture: [board](https://miro.com/app/board/uXjVO6dnrAU=/)
please contribute: [#545](https://gitlab.com/dirac/dirac-private/-/issues/545)