Design patterns for real-space data handling: refactoring VISUAL module

gosia.olejniczak@gmail.com
DIRAC meeting, June 2022

notes: https://hackmd.io/@gosia/Bk25mkADc
boards: bigger picture, layer strategy
DIRAC issue: #545


Outline:

  • Motivation for refactoring VISUAL module
  • Conceptualization
  • Architectural drivers to consider
  • Strategy and schema
  • Refactoring ROI

Refactoring VISUAL module - motivation

  • extending module's functionality:

    • ability to calculate (and easily add) various densities for a wide class of methods
    • ability to call VISUAL from different modules (e.g. from FDE, for a selected subsystem)
    • easier integration with external codes
  • better performance:

    • calculate different densities on the same grid in the same run
    • reuse precalculated objects (e.g. from checkpoints)
    • parallelization
    • future: possibly profit from GPU-based architectures
  • better data structures and storage

    • support for HDF5 files (numerical data)
    • metadata for easier data retrieval and summarization (data-based workflows)

please contribute: #545


Conceptualization: current functionality of VISUAL module

  • central quantity: "property density"

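    $P = \int \mathcal{P}(\tau)\,d\tau, \qquad \mathcal{P}(\tau) = \sum_k \langle \psi(\tau) | f_k M_k \hat{\Omega}_k | \psi(\tau) \rangle$

    • largely covers one-electron operators
    • mapping to objects:
      • $P$, $\mathcal{P}$
        - physical objects, continuous, grid-independent
      • $\mathcal{P}(\tau)$
        - numerical object (discretization of $\mathcal{P}$), grid-dependent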
  • two tasks:

    • export of densities on a grid
    • integration of densities on a grid
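
As a sketch of how the integration task connects the grid-dependent and grid-independent objects, assume a quadrature grid with points $\tau_i$ and weights $w_i$ (both symbols are introduced here for illustration and do not appear in the original notes), and write $\mathcal{P}(\tau_i)$ for the property density evaluated at grid point $\tau_i$. The integrated property is then approximated by a weighted sum of density values:

$$
P \approx \sum_i w_i \, \mathcal{P}(\tau_i)
$$

Export needs only the values $\mathcal{P}(\tau_i)$, while integration also needs the weights. On a uniform 3D grid with spacing $\Delta$ along each axis, a simple rectangle rule gives $w_i = \Delta^3$ for every point.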

Programming design - layer strategy:

  • layer strategy: application layer -> property layer -> scalar field layer -> mesh layer

  • advantages of layer structure:

    • higher layer defines an interface to a lower layer
    • drives encapsulation, helps to discern modules
    • helps to formulate schema
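
The layering above could look roughly as follows in Fortran. This is a minimal sketch, not DIRAC code: the module names `mesh_layer`, `scalar_field_layer` and `property_layer`, the one-dimensional regular mesh, the rectangle-rule weights and the Gaussian stand-in density are all assumptions made for illustration. Each layer keeps its data private and is reached only through a few public procedures.

```fortran
! Hypothetical sketch: module, type and procedure names are illustrative, not DIRAC code.
module mesh_layer                                   ! lowest layer: where the points are
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  private
  public :: mesh_t, new_mesh, mesh_size, mesh_point, mesh_weight
  type :: mesh_t
    private                                         ! data hidden from higher layers
    integer  :: n = 0
    real(dp) :: origin = 0.0_dp, dx = 0.0_dp
  end type mesh_t
contains
  pure function new_mesh(n, origin, dx) result(mesh)
    integer,  intent(in) :: n
    real(dp), intent(in) :: origin, dx
    type(mesh_t) :: mesh
    mesh%n = n;  mesh%origin = origin;  mesh%dx = dx
  end function new_mesh
  pure integer function mesh_size(mesh)
    type(mesh_t), intent(in) :: mesh
    mesh_size = mesh%n
  end function mesh_size
  pure real(dp) function mesh_point(mesh, i)
    type(mesh_t), intent(in) :: mesh
    integer,      intent(in) :: i
    mesh_point = mesh%origin + (i - 1) * mesh%dx
  end function mesh_point
  pure real(dp) function mesh_weight(mesh)
    type(mesh_t), intent(in) :: mesh
    mesh_weight = mesh%dx                           ! rectangle rule: uniform weights
  end function mesh_weight
end module mesh_layer

module scalar_field_layer                           ! values of a function on a mesh
  use, intrinsic :: iso_fortran_env, only: dp => real64
  use mesh_layer, only: mesh_t, mesh_size, mesh_point, mesh_weight
  implicit none
  private
  public :: field_t, scalar_function, sample_field, integrate_field
  type :: field_t
    private
    type(mesh_t) :: mesh
    real(dp), allocatable :: values(:)
  end type field_t
  abstract interface
    pure function scalar_function(x) result(f)
      import :: dp
      real(dp), intent(in) :: x
      real(dp) :: f
    end function scalar_function
  end interface
contains
  function sample_field(mesh, func) result(field)
    type(mesh_t), intent(in) :: mesh
    procedure(scalar_function) :: func
    type(field_t) :: field
    integer :: i
    field%mesh = mesh
    allocate(field%values(mesh_size(mesh)))
    do i = 1, mesh_size(mesh)
      field%values(i) = func(mesh_point(mesh, i))
    end do
  end function sample_field
  pure real(dp) function integrate_field(field)
    type(field_t), intent(in) :: field
    integrate_field = mesh_weight(field%mesh) * sum(field%values)
  end function integrate_field
end module scalar_field_layer

module property_layer                               ! physics: which density to evaluate
  use, intrinsic :: iso_fortran_env, only: dp => real64
  use mesh_layer, only: mesh_t                      ! used only as an opaque handle
  use scalar_field_layer, only: field_t, sample_field, integrate_field
  implicit none
  private
  public :: integrated_model_density
contains
  pure function model_density(x) result(rho)        ! stand-in for a real property density
    real(dp), intent(in) :: x
    real(dp) :: rho
    rho = exp(-x * x)
  end function model_density
  function integrated_model_density(mesh) result(p)
    type(mesh_t), intent(in) :: mesh
    real(dp) :: p
    type(field_t) :: field
    field = sample_field(mesh, model_density)
    p = integrate_field(field)
  end function integrated_model_density
end module property_layer

program application_layer                           ! top layer: user-facing driver
  use, intrinsic :: iso_fortran_env, only: dp => real64
  use mesh_layer, only: mesh_t, new_mesh
  use property_layer, only: integrated_model_density
  implicit none
  type(mesh_t) :: mesh
  real(dp) :: p
  mesh = new_mesh(n=2001, origin=-10.0_dp, dx=0.01_dp)
  p = integrated_model_density(mesh)
  print '(a, f12.8)', 'integrated model density: ', p
  ! exp(-x**2) integrates to sqrt(pi); the grid sum should reproduce it closely
  if (abs(p - sqrt(acos(-1.0_dp))) > 1.0e-6_dp) error stop 'integration check failed'
end program application_layer
```

In this sketch the property layer never reads mesh data directly, so swapping the regular mesh for another mesh type would only touch `mesh_layer`. A stricter design might also hide mesh construction from the application layer.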

Programming design - other architectural drivers to consider

  • functionality, scalability, numerical accuracy

  • developer-friendliness: reduce future development time

    • design for code reuse
    • exploit templates (generic programming paradigm; paper) to formulate abstractions
    • separate model from implementation (reduce data-dependent bugs)
    • separate data computation from data I/O
    • document the process (e.g. how to extend the schema)
    • use consistent labels in all schemas (e.g. ao_matrices)
    • generalize utils/process_schema.py and gp/checkpoint.F90 to avoid duplication?
  • usability and user-friendliness

    • "from DIRAC calculations to data analysis" - showcases (tutorials?)
    • enable restarts of computations on grids
  • scalability: parallelization strategy

    • MPI vs Fortran coarrays (distributed data structures, since Fortran 2008), paper
  • testability concerns

    • improve test coverage
    • optimize timing and memory load of tests (test set timing benchmarks?)
    • select representative real-space data for testing

Modern Fortran features

  • modularity:

    • impose data privacy and functional purity (where feasible)
    • separate model from implementation
    • manipulate physical and numerical objects without access to their data
    • hide implementation details
  • derived types:

    • encapsulate numerical data and algorithms into objects
    • look for generic structures with extended type constructs, e.g.
      • type, extends(parent_type) :: my_type
      • type, abstract :: parent_type
  • "coordinate-free programming" (paper1, paper2):

    • manipulate objects without explicit dependence on grid type
    • store only what is needed, e.g.
    type regular_grid
      npoints, dx, origin ! required: all we need to construct a regular grid
      nodes(:)            ! optional: store only in specific cases (e.g. imported grid)
    end type
    
  • overloading operators:

    • simpler operations, generic programming (paper), possibly automatic code generation
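
The ideas above (abstract and extended types, grid-independent manipulation, operator overloading) can be combined in one small example. This is a minimal sketch under stated assumptions: the names `grid_types`, `grid_t`, `regular_grid_t`, `field_t` and `sample` are hypothetical, the grid is one-dimensional for brevity, and the Gaussian is only a stand-in for a real density.

```fortran
! Hypothetical sketch: type and procedure names are illustrative, not DIRAC code.
module grid_types
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  private
  public :: grid_t, regular_grid_t, field_t, operator(+)

  type, abstract :: grid_t                          ! what every grid must provide
  contains
    procedure(size_iface),  deferred :: npoints
    procedure(point_iface), deferred :: point
  end type grid_t

  abstract interface
    pure integer function size_iface(self)
      import :: grid_t
      class(grid_t), intent(in) :: self
    end function size_iface
    pure function point_iface(self, i) result(x)
      import :: grid_t, dp
      class(grid_t), intent(in) :: self
      integer,       intent(in) :: i
      real(dp) :: x
    end function point_iface
  end interface

  type, extends(grid_t) :: regular_grid_t           ! stores only n, dx, origin
    integer  :: n = 0
    real(dp) :: dx = 0.0_dp, origin = 0.0_dp
  contains
    procedure :: npoints => regular_npoints
    procedure :: point   => regular_point
  end type regular_grid_t

  type :: field_t                                   ! values attached to grid points
    real(dp), allocatable :: values(:)
  end type field_t

  interface operator(+)
    module procedure add_fields
  end interface
contains
  pure integer function regular_npoints(self)
    class(regular_grid_t), intent(in) :: self
    regular_npoints = self%n
  end function regular_npoints
  pure function regular_point(self, i) result(x)
    class(regular_grid_t), intent(in) :: self
    integer,               intent(in) :: i
    real(dp) :: x
    x = self%origin + (i - 1) * self%dx             ! node computed on the fly, not stored
  end function regular_point
  pure function add_fields(a, b) result(c)
    type(field_t), intent(in) :: a, b
    type(field_t) :: c
    c%values = a%values + b%values
  end function add_fields
end module grid_types

program coordinate_free_demo
  use, intrinsic :: iso_fortran_env, only: dp => real64
  use grid_types
  implicit none
  type(regular_grid_t) :: grid
  type(field_t) :: rho_a, rho_b, rho_total
  grid = regular_grid_t(n=5, dx=0.5_dp, origin=-1.0_dp)
  rho_a = sample(grid)
  rho_b = sample(grid)
  rho_total = rho_a + rho_b                         ! overloaded operator, no explicit loop
  print '(5f8.4)', rho_total%values
  ! the third node sits at x = 0, where each sampled field equals 1
  if (abs(rho_total%values(3) - 2.0_dp) > 1.0e-12_dp) error stop 'field sum check failed'
contains
  function sample(g) result(f)                      ! depends only on the abstract interface
    class(grid_t), intent(in) :: g
    type(field_t) :: f
    integer :: i
    allocate(f%values(g%npoints()))
    do i = 1, g%npoints()
      f%values(i) = exp(-g%point(i)**2)
    end do
  end function sample
end program coordinate_free_demo
```

Because `sample` accepts `class(grid_t)`, adding another grid type (for example one that stores imported nodes) would not require changing it.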

VISUAL schema

  • January 2022 - decision to use separate schemas for different modules (miro board)
  • labeling ideas for property densities - after src/openrsp (?)
    ```fortran
    type(prop_field_info) :: field_list(14) = &                         !nc an ba ln qu
      (/prop_field_info('EXCI', 'Generalized "excitation" field'      , 1, F, F, T, T), &
        prop_field_info('FREQ', 'Generalized "freqency" field'        , 1, F, F, T, T), &
        prop_field_info('AUX*', 'Auxiliary integrals on file'         , 1, F, F, T, F), &
        prop_field_info('PNC' , 'PNC'                                 , 1, F, F, T, F), &
        prop_field_info('EL'  , 'Electric field'                      , 3, F, F, T, F), &
        prop_field_info('VEL' , 'Velocity'                            , 3, T, F, T, F), &
        prop_field_info('MAGO', 'Magnetic field w/o. London orbitals' , 3, T, F, F, T), &
        prop_field_info('MAG' , 'Magnetic field with London orbitals' , 3, T, T, F, F), &
        prop_field_info('ELGR', 'Electric field gradient'             , 6, F, F, T, F), &
        prop_field_info('VIBM', 'Displacement along vibrational modes',-1, F, T, F, F), &
        prop_field_info('GEO' , 'Nuclear coordinates'                 ,-1, F, T, F, F), & !-1=mol-dep
        prop_field_info('NUCM', 'Nuclear magnetic moment'             ,-1, F, T, F, T), & !-1=mol-dep
        prop_field_info('AOCC', 'AO contraction coefficients'         ,-1, F, T, F, F), & !-1=mol-dep
        prop_field_info('AOEX', 'AO exponents'                        ,-1, F, T, F, F)/)  !-1=mol-dep
    ```

Refactoring discussion: return on investment and bigger picture

  • numerics, memory usage, data summarization: HDF5
  • large grids, large systems, benchmarks on molecular sets: parallelization
  • functionality: current and future interests:
    • DFCOEF, AOPROPER - replace by reading from checkpoints
    • CC/EXACORR - CCDENS (?), MCSCF - MCNATOCC (?), DMRG?
    • real-space relativistic quantum chemistry
    • data-driven workflows, e.g. back-and-forth communication between DIRAC and other analysis software
  • VISUAL in bigger picture: board

please contribute: #545