Design patterns for real-space data handling: refactoring VISUAL module

gosia.olejniczak@gmail.com
DIRAC meeting, June 2022

notes: https://hackmd.io/@gosia/Bk25mkADc
boards: bigger picture, layer strategy
DIRAC issue: #545

Outline:

Motivation for refactoring VISUAL module
Conceptualization
Architectural drivers to consider
Strategy and schema
Refactoring ROI

Refactoring VISUAL module - motivation

extending module's functionality:
- ability to calculate (and easily add) various densities for a wide class of methods
- ability to call VISUAL from different modules (e.g. from FDE, for a selected subsystem)
- easier integration with external codes
better performance:
- calculate different densities on the same grid in the same run
- reuse precalculated objects (e.g. from checkpoints)
- parallelization
- future: possibly profit from GPU-based architectures
better data structures and storage:
- support for HDF5 (numerics) files
- metadata for easier data retrieval and summarization (data-based workflows)

please contribute: #545

Conceptualization: current functionality of VISUAL module

central quantity: "property density"

$P = \int P(\tau)\, d\tau, \quad P(\tau) = \sum_{k} \langle \psi(\tau) | f_{k} M_{k} \hat{\Omega}_{k} | \psi(\tau) \rangle$
- largely covers one-electron operators
- mapping to objects:
  - $P$, $P(\tau)$ - physical objects, continuous, grid-independent
  - $P(\tau_i)$ - numerical object (discretization of $P(\tau)$ on grid points $\tau_i$), grid-dependent
two tasks:
- export of densities on a grid
- integration of densities on a grid
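
The integration task reduces to a weighted sum over grid points, $P \approx \sum_i w_i P(\tau_i)$. A minimal sketch, assuming the density values and quadrature weights are already available as plain arrays (the module name, `integrate_on_grid` and the `weights` argument are illustrative, not taken from the VISUAL source):

```fortran
module grid_integration_sketch
  implicit none
  private
  public :: dp, integrate_on_grid

  integer, parameter :: dp = kind(1.0d0)

contains

  ! Quadrature estimate of P = int P(tau) dtau as sum_i w_i * P(tau_i).
  function integrate_on_grid(density, weights) result(p)
    real(dp), intent(in) :: density(:)  ! P(tau_i): property density at the grid points
    real(dp), intent(in) :: weights(:)  ! w_i: quadrature weights (assumed input)
    real(dp) :: p

    ! Both arrays must describe the same grid.
    if (size(density) /= size(weights)) error stop 'density and weights differ in length'
    p = dot_product(weights, density)
  end function integrate_on_grid

end module grid_integration_sketch
```

The export task would use the same `density` array, written out together with the grid points instead of being contracted with the weights.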

Programming design - layer strategy:

layer strategy: application layer -> property layer -> scalar field layer -> mesh layer
advantages of layer structure:
- higher layer defines an interface to a lower layer
- drives encapsulation, helps to discern modules
- helps to formulate schema
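
The layering can be sketched as a chain of modules, each using only the public interface of the layer below. All names here (`mesh_layer`, `scalar_field_layer`, `property_layer`, the midpoint-rule mesh, the Gaussian test field) are hypothetical and only illustrate the direction of the dependencies; the application layer would be the calling program.

```fortran
! Mesh layer: owns points and weights, exposes them only through accessors.
module mesh_layer
  implicit none
  private
  public :: dp, mesh_t, uniform_mesh
  integer, parameter :: dp = kind(1.0d0)

  type :: mesh_t
    private
    real(dp), allocatable :: x(:), w(:)
  contains
    procedure :: npoints
    procedure :: point
    procedure :: weight
  end type mesh_t

contains

  ! Midpoint-rule mesh on [a, b] with n cells (an illustrative choice).
  function uniform_mesh(a, b, n) result(mesh)
    real(dp), intent(in) :: a, b
    integer, intent(in) :: n
    type(mesh_t) :: mesh
    integer :: i
    allocate(mesh%x(n), mesh%w(n))
    mesh%w = (b - a) / real(n, dp)
    do i = 1, n
      mesh%x(i) = a + (real(i, dp) - 0.5_dp) * mesh%w(i)
    end do
  end function uniform_mesh

  pure integer function npoints(self)
    class(mesh_t), intent(in) :: self
    npoints = size(self%x)
  end function npoints

  pure function point(self, i) result(x)
    class(mesh_t), intent(in) :: self
    integer, intent(in) :: i
    real(dp) :: x
    x = self%x(i)
  end function point

  pure function weight(self, i) result(w)
    class(mesh_t), intent(in) :: self
    integer, intent(in) :: i
    real(dp) :: w
    w = self%w(i)
  end function weight
end module mesh_layer

! Scalar field layer: values on a mesh, built via the mesh accessors only.
module scalar_field_layer
  use mesh_layer, only: dp, mesh_t
  implicit none
  private
  public :: scalar_field_t, gaussian_on_mesh

  type :: scalar_field_t
    real(dp), allocatable :: values(:)
  end type scalar_field_t

contains

  ! Samples exp(-x**2) on every mesh point.
  function gaussian_on_mesh(mesh) result(field)
    type(mesh_t), intent(in) :: mesh
    type(scalar_field_t) :: field
    integer :: i
    allocate(field%values(mesh%npoints()))
    do i = 1, mesh%npoints()
      field%values(i) = exp(-mesh%point(i)**2)
    end do
  end function gaussian_on_mesh
end module scalar_field_layer

! Property layer: turns a scalar field into a number (integration task).
module property_layer
  use mesh_layer, only: dp, mesh_t
  use scalar_field_layer, only: scalar_field_t
  implicit none
  private
  public :: integrate_field

contains

  function integrate_field(mesh, field) result(p)
    type(mesh_t), intent(in) :: mesh
    type(scalar_field_t), intent(in) :: field
    real(dp) :: p
    integer :: i
    p = 0.0_dp
    do i = 1, mesh%npoints()
      p = p + mesh%weight(i) * field%values(i)
    end do
  end function integrate_field
end module property_layer
```

Because `mesh_t` keeps its components private, the storage of points and weights can change without touching the two layers above it.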

Programming design - other architectural drivers to consider

functionality, scalability, numerical accuracy
developer-friendliness: reduce future development time
- design for code reuse
- exploit templates (generic programming paradigm; paper) to formulate abstractions
- separate model from implementation (reduce data-dependent bugs)
- separate data computation from data I/O
- document the process (e.g. how to extend the schema)
- use consistent labels in all schemas (e.g. ao_matrices)
- generalize utils/process_schema.py and gp/checkpoint.F90 to avoid duplication?
usability and user-friendliness
- "from DIRAC calculations to data analysis" - showcases (tutorials?)
- enable restarts of computations on grids
scalability: parallelization strategy
- MPI vs Fortran coarray distributed data structures (from Fortran 2008), paper
testability concerns
- improve test coverage
- optimize timing and memory load of tests (test set timing benchmarks?)
- select representative real-space data for testing

Modern Fortran features

modularity:
- impose data privacy and functional purity (where feasible)
- separate model from implementation
- manipulate physical and numerical objects without access to their data
- hide implementation details
derived types:
- encapsulate numerical data and algorithms into objects
- look for generic structures with extended type constructs, e.g.
  - type, extends(parent_type) :: my_type
  - type, abstract :: parent_type
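
A minimal sketch of the abstract/extended type pattern, assuming a hypothetical `grid_base` parent with two one-dimensional extensions. None of these names come from the DIRAC source; the point is that `total_extent` is written once against the parent type.

```fortran
module grid_types_sketch
  implicit none
  private
  public :: dp, grid_base, regular_grid, listed_grid, total_extent

  integer, parameter :: dp = kind(1.0d0)

  ! Abstract parent: states what every grid must provide, not how.
  type, abstract :: grid_base
  contains
    procedure(npoints_iface), deferred :: npoints
    procedure(node_iface), deferred :: node
  end type grid_base

  abstract interface
    pure integer function npoints_iface(self)
      import :: grid_base
      class(grid_base), intent(in) :: self
    end function npoints_iface
    pure function node_iface(self, i) result(x)
      import :: grid_base, dp
      class(grid_base), intent(in) :: self
      integer, intent(in) :: i
      real(dp) :: x
    end function node_iface
  end interface

  ! Regular grid: nodes are computed from origin and spacing.
  type, extends(grid_base) :: regular_grid
    integer :: n = 0
    real(dp) :: dx = 0.0_dp, origin = 0.0_dp
  contains
    procedure :: npoints => regular_npoints
    procedure :: node => regular_node
  end type regular_grid

  ! Listed grid: nodes are stored explicitly (e.g. an imported grid).
  type, extends(grid_base) :: listed_grid
    real(dp), allocatable :: nodes(:)
  contains
    procedure :: npoints => listed_npoints
    procedure :: node => listed_node
  end type listed_grid

contains

  pure integer function regular_npoints(self)
    class(regular_grid), intent(in) :: self
    regular_npoints = self%n
  end function regular_npoints

  pure function regular_node(self, i) result(x)
    class(regular_grid), intent(in) :: self
    integer, intent(in) :: i
    real(dp) :: x
    x = self%origin + real(i - 1, dp) * self%dx
  end function regular_node

  pure integer function listed_npoints(self)
    class(listed_grid), intent(in) :: self
    listed_npoints = size(self%nodes)
  end function listed_npoints

  pure function listed_node(self, i) result(x)
    class(listed_grid), intent(in) :: self
    integer, intent(in) :: i
    real(dp) :: x
    x = self%nodes(i)
  end function listed_node

  ! Generic code: distance between last and first node of any grid type.
  pure function total_extent(grid) result(length)
    class(grid_base), intent(in) :: grid
    real(dp) :: length
    length = grid%node(grid%npoints()) - grid%node(1)
  end function total_extent

end module grid_types_sketch
```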

"coordinate-free programming" (paper1, paper2):

manipulate objects without explicit dependence on grid type
store only what is needed, e.g.

type regular_grid
  ! required: all we need to construct a regular grid (declarations shown for 1D)
  integer :: npoints
  real(8) :: dx, origin
  ! optional: store only in specific cases (e.g. imported grid)
  real(8), allocatable :: nodes(:)
end type regular_grid
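
One way to use such a type is to resolve node coordinates through a single accessor, so that callers never need to know whether the nodes are stored. A sketch under the assumption of a one-dimensional grid; `node_coordinate` and the default initializations are illustrative additions:

```fortran
module regular_grid_sketch
  implicit none
  private
  public :: dp, regular_grid, node_coordinate

  integer, parameter :: dp = kind(1.0d0)

  type :: regular_grid
    integer :: npoints = 0
    real(dp) :: dx = 0.0_dp
    real(dp) :: origin = 0.0_dp
    real(dp), allocatable :: nodes(:)  ! optional: only set for e.g. imported grids
  end type regular_grid

contains

  ! Coordinate of node i: taken from storage if present, otherwise
  ! computed on the fly from origin and spacing.
  pure function node_coordinate(grid, i) result(x)
    type(regular_grid), intent(in) :: grid
    integer, intent(in) :: i
    real(dp) :: x
    if (allocated(grid%nodes)) then
      x = grid%nodes(i)
    else
      x = grid%origin + real(i - 1, dp) * grid%dx
    end if
  end function node_coordinate

end module regular_grid_sketch
```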

overloading operators:
- simpler operations, generic programming (paper), possibly automatic code generation
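
As an illustration of operator overloading, a hypothetical `scalar_field` type can be combined with `+` and scaled with `*`, so that expressions read like the underlying math. This sketch assumes both operands live on the same grid and does not check it:

```fortran
module scalar_field_ops_sketch
  implicit none
  private
  public :: dp, scalar_field, operator(+), operator(*)

  integer, parameter :: dp = kind(1.0d0)

  type :: scalar_field
    real(dp), allocatable :: values(:)  ! field values at the grid points
  end type scalar_field

  interface operator(+)
    module procedure add_fields
  end interface

  interface operator(*)
    module procedure scale_field
  end interface

contains

  ! Pointwise sum of two fields defined on the same grid.
  pure function add_fields(a, b) result(c)
    type(scalar_field), intent(in) :: a, b
    type(scalar_field) :: c
    c%values = a%values + b%values
  end function add_fields

  ! Multiplication of a field by a scalar factor (factor * field).
  pure function scale_field(factor, a) result(c)
    real(dp), intent(in) :: factor
    type(scalar_field), intent(in) :: a
    type(scalar_field) :: c
    c%values = factor * a%values
  end function scale_field

end module scalar_field_ops_sketch
```

With these definitions a caller can write `total = 2.0_dp * rho_a + rho_b` without touching the `values` arrays directly.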

VISUAL schema

January 2022 - decision to use separate schemas for different modules (miro board)

labeling ideas for property densities - after src/openrsp (?)

```fortran
type(prop_field_info) :: field_list(14) = &                         !nc an ba ln qu
  (/prop_field_info('EXCI', 'Generalized "excitation" field'      , 1, F, F, T, T), &
    prop_field_info('FREQ', 'Generalized "freqency" field'        , 1, F, F, T, T), &
    prop_field_info('AUX*', 'Auxiliary integrals on file'         , 1, F, F, T, F), &
    prop_field_info('PNC' , 'PNC'                                 , 1, F, F, T, F), &
    prop_field_info('EL'  , 'Electric field'                      , 3, F, F, T, F), &
    prop_field_info('VEL' , 'Velocity'                            , 3, T, F, T, F), &
    prop_field_info('MAGO', 'Magnetic field w/o. London orbitals' , 3, T, F, F, T), &
    prop_field_info('MAG' , 'Magnetic field with London orbitals' , 3, T, T, F, F), &
    prop_field_info('ELGR', 'Electric field gradient'             , 6, F, F, T, F), &
    prop_field_info('VIBM', 'Displacement along vibrational modes',-1, F, T, F, F), &
    prop_field_info('GEO' , 'Nuclear coordinates'                 ,-1, F, T, F, F), & !-1=mol-dep
    prop_field_info('NUCM', 'Nuclear magnetic moment'             ,-1, F, T, F, T), & !-1=mol-dep
    prop_field_info('AOCC', 'AO contraction coefficients'         ,-1, F, T, F, F), & !-1=mol-dep
    prop_field_info('AOEX', 'AO exponents'                        ,-1, F, T, F, F)/)  !-1=mol-dep
```
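
A label table of this kind could be queried by label, as in the sketch below. The definition of `prop_field_info` is not shown in these notes, so the type here is an assumption: its component names simply follow the column header (`nc an ba ln qu`), `T`/`F` are assumed to be logical constants, and only three entries are reproduced. `find_field` is a hypothetical helper.

```fortran
module prop_field_sketch
  implicit none
  private
  public :: prop_field_info, field_list, find_field

  logical, parameter :: T = .true., F = .false.

  ! Assumed layout; the actual definition in the source may differ.
  type :: prop_field_info
    character(len=4)  :: label
    character(len=40) :: description
    integer :: nc                 ! third column of the table
    logical :: an, ba, ln, qu     ! remaining four columns
  end type prop_field_info

  type(prop_field_info), parameter :: field_list(3) = &
    [prop_field_info('EL'  , 'Electric field'         , 3, F, F, T, F), &
     prop_field_info('VEL' , 'Velocity'               , 3, T, F, T, F), &
     prop_field_info('ELGR', 'Electric field gradient', 6, F, F, T, F)]

contains

  ! Returns the index of the entry with the given label, or 0 if absent.
  pure integer function find_field(label) result(idx)
    character(len=*), intent(in) :: label
    integer :: i
    idx = 0
    do i = 1, size(field_list)
      if (trim(field_list(i)%label) == trim(label)) then
        idx = i
        return
      end if
    end do
  end function find_field

end module prop_field_sketch
```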

Refactoring discussion: return on investment and bigger picture

numerics, memory usage, data summarization: HDF5
large grids, large systems, benchmarks on molecular sets: parallelization
functionality: current and future interests:
- DFCOEF, AOPROPER - replace by reading from checkpoints
- CC/EXACORR - CCDENS (?), MCSCF - MCNATOCC (?), DMRG?
- real-space relativistic quantum chemistry
- data-driven workflows, e.g. back-and-forth communication between DIRAC and other analysis software
VISUAL in bigger picture: board

please contribute: #545
