
Design patterns for real-space data handling

gosia.olejniczak@gmail.com
DIRAC meeting, June 2023
notes: https://hackmd.io/f-HgQmWARmOj2HQvW7qj1g
DIRAC issue: #545
DIRAC branch (dirac-private): gosia/visual


Outline:

  • VISUAL module - quick presentation
  • refactoring motivation
  • refactoring conceptualization and discussion
  • related issues: FDE module, schemas for real-space data and response theory

VISUAL module

  • "property density"
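    $P = \int P(\mathbf{r})\,d\mathbf{r}$, $P(\mathbf{r}) = \langle\psi(\mathbf{r})|\hat{P}(\mathbf{r})|\psi(\mathbf{r})\rangle$
    • $\hat{P}(\mathbf{r})$: chosen from a list of predefined operators, which largely covers one-electron operators
    • examples: electron density, electron localization function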
  • in VISUAL module:
    • $P(\mathbf{r})$ is discretized on a grid
    • $P(\mathbf{r})$ values on a grid can be exported to a file (e.g., for real-space analysis)
    • $P(\mathbf{r})$ can be integrated to $P$ (e.g., for a numerical evaluation of a property)
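
The discretize-then-integrate idea can be illustrated with a minimal Python sketch. This is not VISUAL code: the Gaussian model density, the grid parameters and all function names are assumptions, chosen so that the exact integral is known (2 electrons).

```python
import math

def model_density(x, y, z, n_electrons=2.0, alpha=1.0):
    """Normalized Gaussian 'density' (hypothetical stand-in for P(r))."""
    norm = n_electrons * (alpha / math.pi) ** 1.5
    return norm * math.exp(-alpha * (x * x + y * y + z * z))

def integrate_on_grid(func, origin, spacing, npoints):
    """Sum func over a uniform 3D grid, weighting each point by the cell volume."""
    dv = spacing[0] * spacing[1] * spacing[2]
    total = 0.0
    for i in range(npoints[0]):
        x = origin[0] + i * spacing[0]
        for j in range(npoints[1]):
            y = origin[1] + j * spacing[1]
            for k in range(npoints[2]):
                z = origin[2] + k * spacing[2]
                total += func(x, y, z) * dv
    return total

# 41 points per dimension covering [-5, 5] with spacing 0.25
integral = integrate_on_grid(model_density, (-5.0,) * 3, (0.25,) * 3, (41,) * 3)
print(f"integrated property: {integral:.6f}")
```

For a smooth, well-localized function such a uniform-grid sum converges quickly. Densities with nuclear cusps would typically need a finer or non-uniform grid.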

Refactoring VISUAL module - motivation/new features:

  • labeled storage and HDF5 support (performance, numerics)
  • metadata for data retrieval/regeneration
  • parallelization
  • future: possibly adapt to GPU-based architectures
  • detach calculations from analysis - quantum data from external sources
  • generic programming style, flexibility to define densities and grids

Conceptualization

[Figure: board (image not available)]


New user input

.GRIDS
3
id=1 input=create ndim=3 npoints=[3,3,3] margin=2.0 typ=1 export=yes file_out=grid1.h5
id=2 input=create ndim=3 spacing=[0.2,0.2,0.2] margin=2.0 typ=1 export=yes file_out=grid3.h5
id=3 input=import file_inp=grid2.h5
.GRIDFUNCTIONS
3
name=density id_grid=3 purpose=visualization export=yes file_out=ed.h5
name=reduced_density_gradient id_grid=1 purpose=visualization export=yes file_out=density_rdg.h5
  • test: visual_custom_output
  • documentation + tutorial - to do
  • keep a list of predefined functions, but also enable the user to define functions (e.g., as with the .OPERATOR keyword)
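
To make the structure of this input concrete, here is a small Python sketch that parses `key=value` lines like those above into dictionaries. It is not the DIRAC input reader: the helper names and the type-conversion rules are assumptions, and it relies on values containing no spaces, as in the example.

```python
def parse_value(text):
    """Convert a token value: [a,b,c] -> list, numbers -> int/float, else str."""
    if text.startswith("[") and text.endswith("]"):
        return [parse_value(item.strip()) for item in text[1:-1].split(",")]
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text

def parse_entry(line):
    """Split one 'key=value key=value ...' line into a dictionary."""
    entry = {}
    for token in line.split():
        key, _, value = token.partition("=")
        entry[key] = parse_value(value)
    return entry

grid_lines = [
    "id=1 input=create ndim=3 npoints=[3,3,3] margin=2.0 typ=1 export=yes file_out=grid1.h5",
    "id=3 input=import file_inp=grid2.h5",
]
grids = {entry["id"]: entry for entry in map(parse_entry, grid_lines)}

grid_function = parse_entry(
    "name=density id_grid=3 purpose=visualization export=yes file_out=ed.h5"
)
# Every grid function must point at a grid that was declared under .GRIDS
assert grid_function["id_grid"] in grids
print(grids[1]["npoints"], grids[3]["file_inp"])
```

Keying the grids by `id` makes the `id_grid` reference of a grid function a simple lookup.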

Labeled storage and schema

  • labels and schema in VISUAL: VISUAL_checkpoint.F90 and VISUALschema.txt - modeled on gp/checkpoint.F90 and DIRACschema.txt
  • idea: make it more generic, e.g.:
    • unify labels for GRID, VISUAL and FDE modules
    • use one checkpoint and one schema for all real-space data (gp/REALSPACE_checkpoint.F90, utils/REALSPACEschema.txt) or keep each of these modules self-contained?

Schema for real-space data (snippet):

*input
molecule         composite single    required    # topology of the molecular system
grid             composite array     required    # information about grid on input
grid_function    composite array     required    # information about grid functions on input
*end

*result
execution        composite single    required    # information about the run
grid             composite array     required    # information about grid on output
grid_function    composite array     required    # information about grid functions on output
*end

*grid
id               integer   single    required    # grid id
ndim             integer   single    optional    # grid dimensionality
nr_points        integer   array     optional    # number of points in each dimension
nr_points_total  integer   single    optional    # total number of points (if number of points in each dimension unknown)
origin           real      array     optional    # grid origin
spacing          real      array     optional    # grid spacing
margin           real      array     optional    # margin (in Angstrom) to be added around molecule when creating the grid
file_inp         string    single    optional    # name of a file for grid import
file_out         string    single    optional    # name of a file for grid export
points           real      array     optional    # grid points
*end

*grid_function
id_assigned_grid integer   single    required    # id of assigned grid
wf_order         integer   single    required    # grid function dependent on unperturbed wave function only (0), perturbed wave functions (1, 2, ...)
tensor_order     integer   single    required    # grid function represented with a scalar field (0), vector field (1), tensor field of rank 2 (2), ...
purpose          string    single    required    # visualization/integration
file_inp         string    single    optional    # name of a file for grid function import
file_out         string    single    optional    # name of a file for grid function export
values           composite single    required    # grid function values
*end
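
As an illustration of how such a schema could be used, the Python sketch below parses a few lines copied from the `*grid` group and checks a dictionary against them. It is a hypothetical validator, not the schema reader used in DIRAC. The function names are assumptions, and composite types are deliberately left unchecked.

```python
SCHEMA_TEXT = """
*grid
id               integer   single    required    # grid id
ndim             integer   single    optional    # grid dimensionality
spacing          real      array     optional    # grid spacing
file_out         string    single    optional    # name of a file for grid export
*end
"""

PYTHON_TYPES = {"integer": int, "real": float, "string": str}

def parse_schema(text):
    """Return {group: {label: (type, rank, required)}} from schema text."""
    schema, group = {}, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop the trailing comment
        if not line:
            continue
        if line == "*end":
            group = None
        elif line.startswith("*"):
            group = line[1:]
            schema[group] = {}
        else:
            label, dtype, rank, use = line.split()
            schema[group][label] = (dtype, rank, use == "required")
    return schema

def validate(record, fields):
    """List problems found when checking a dict against one schema group."""
    problems = []
    for label, (dtype, rank, required) in fields.items():
        if label not in record:
            if required:
                problems.append(f"missing required label: {label}")
            continue
        values = record[label] if rank == "array" else [record[label]]
        expected = PYTHON_TYPES.get(dtype)   # composite types are not checked here
        if expected and not all(isinstance(v, expected) for v in values):
            problems.append(f"wrong type for {label}: expected {dtype}")
    problems += [f"unknown label: {k}" for k in record if k not in fields]
    return problems

schema = parse_schema(SCHEMA_TEXT)
print(validate({"id": 1, "ndim": 3, "spacing": [0.2, 0.2, 0.2]}, schema["grid"]))
print(validate({"ndim": 3, "npoints": [3, 3, 3]}, schema["grid"]))
```

The second call reports a missing `id` and flags `npoints`, which the user input uses where the schema has `nr_points`. A shared schema would catch this kind of label mismatch between modules.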

Data storage and representation

  • data schema:

    • 2 main groups, grids and grid_functions
    • a grid function is always exported with a grid (the related grid is referred to via an index)
    • a grid can be imported/exported separately
    • many grid functions can be associated with the same grid
    • alternative schema: 1 group - grid, then grid_functions as attributes of a grid
  • data calculated as 2D arrays, data ordering: (ndim, npoints)

  • storage layout:

    • 1 file, contiguous ("flattened arrays"), incompressible data
    • chunking would enable adding more data to it (ref)
  • collective MPI-IO as done in DIRAC ("one reads, one writes")

  • to do: test whether these choices are optimal
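
The (ndim, npoints) ordering and the "flattened array" layout can be sketched in plain Python as follows. All names are hypothetical. The row-by-row flattening is an assumption made for illustration, and Fortran's column-major storage would order the same 2D array differently in memory.

```python
from itertools import product

def make_grid_points(origin, spacing, nr_points):
    """Return points as a 2D list with layout (ndim, npoints)."""
    ndim = len(nr_points)
    axes = [[origin[d] + i * spacing[d] for i in range(nr_points[d])]
            for d in range(ndim)]
    columns = list(product(*axes))             # npoints tuples of length ndim
    return [[point[d] for point in columns] for d in range(ndim)]

def flatten(rows):
    """Store a (ndim, npoints) array as one contiguous 1D list (row by row)."""
    return [value for row in rows for value in row]

def unflatten(flat, ndim):
    """Recover the (ndim, npoints) layout from the contiguous list."""
    npoints = len(flat) // ndim
    return [flat[d * npoints:(d + 1) * npoints] for d in range(ndim)]

points = make_grid_points(origin=[0.0, 0.0, 0.0],
                          spacing=[0.5, 0.5, 0.5],
                          nr_points=[3, 3, 3])
grid = {"id": 1, "ndim": 3, "points": flatten(points)}

# A scalar grid function stores only its values plus the index of its grid.
values = [x + y + z for x, y, z in zip(*points)]
grid_function = {"id_assigned_grid": grid["id"], "tensor_order": 0, "values": values}

assert unflatten(grid["points"], grid["ndim"]) == points
print(len(points), len(points[0]), len(grid["points"]))
```

Because the grid function carries only `id_assigned_grid`, several functions can share one stored grid without duplicating the point coordinates.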


Discussion - passing quantum data to VISUAL


VISUAL module under the hood

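  • $P_k(\mathbf{r}) = \langle\psi(\mathbf{r})|\hat{P}(\mathbf{r})|\psi(\mathbf{r})\rangle = \langle\chi_\kappa(\mathbf{r})|\hat{P}(\mathbf{r})|\chi_\lambda(\mathbf{r})\rangle \tilde{D}_{\lambda\kappa}$

  • Density matrices, $\tilde{D}_{\lambda\kappa}$:

    • $D^{0}_{\lambda\kappa} = c_{\lambda m} I c_{\kappa n}$
    • $D^{P}_{\lambda\kappa} = c_{\lambda m} W_{mn} c_{\kappa n}$, with $W$ holding response parameters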
  • in DIRAC, density matrices are generally computed and stored in quaternion algebra, assuming Kramers pairing of AOs

  • in (old) VISUAL

    1. MO coefficients are read from CHECKPOINT.h5 in symmetry-adapted (SA) basis
    2. different parts of MO coefficient array (occ, virt) are selected with get_C subroutines (from matrix_operations module)
    3. MO occupation vector is generated (the user can define occupations [LINK to .OCCUPATION keyword])
    4. density matrices (unperturbed and perturbed) in SA-basis are generated by backtransforming MO occupation matrix (3.) with matrices of selected coefficients (2.)
    5. AOs are generated in dirac_ao_eval_init
  • Discussion:

    • replace 5. with AOs read from CHECKPOINT.h5 (to do)
    • do not calculate quantum data in VISUAL, get this data from external sources
      • replace 1.-4. with density matrices read from checkpoints?
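
The contraction of AO values with a density matrix can be illustrated with a toy Python sketch for the electron density (unit operator). It is a strong simplification and not how VISUAL is implemented: the AOs are real, unnormalized s-type Gaussians, the coefficients and occupations are made-up numbers, and neither the quaternion algebra nor the symmetry-adapted basis used in DIRAC is modeled.

```python
import math

def gaussian_ao(center, exponent):
    """Return an unnormalized s-type Gaussian chi(r) (toy AO, real-valued)."""
    def chi(r):
        dist2 = sum((ri - ci) ** 2 for ri, ci in zip(r, center))
        return math.exp(-exponent * dist2)
    return chi

def density_matrix(coeff, occupations):
    """D[l][k] = sum_m coeff[l][m] * occ[m] * coeff[k][m] (real coefficients)."""
    nao = len(coeff)
    return [[sum(coeff[l][m] * occ * coeff[k][m]
                 for m, occ in enumerate(occupations))
             for k in range(nao)] for l in range(nao)]

def property_density(aos, dmat, r):
    """P(r) = sum over kappa, lambda of chi_kappa(r) chi_lambda(r) D[lambda][kappa]."""
    chi = [ao(r) for ao in aos]
    return sum(chi[k] * chi[l] * dmat[l][k]
               for k in range(len(chi)) for l in range(len(chi)))

aos = [gaussian_ao((0.0, 0.0, -0.7), 1.0), gaussian_ao((0.0, 0.0, 0.7), 1.0)]
coeff = [[0.6, 0.8],      # rows: AOs, columns: MOs (bonding, antibonding)
         [0.6, -0.8]]
occupations = [2.0, 0.0]  # only the bonding MO is (doubly) occupied

dmat = density_matrix(coeff, occupations)
for z in (-0.7, 0.0, 0.7):
    print(z, property_density(aos, dmat, (0.0, 0.0, z)))
```

In this picture, reading density matrices from a checkpoint would replace `density_matrix`, and reading AO values from a checkpoint would replace `gaussian_ao`, leaving only the contraction in `property_density`.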

Schema for response data:

  • labeling ideas for property densities - after src/openrsp (?)
    ```fortran
    type(prop_field_info) :: field_list(14) = &                         !nc an ba ln qu
      (/prop_field_info('EXCI', 'Generalized "excitation" field'      , 1, F, F, T, T), &
        prop_field_info('FREQ', 'Generalized "freqency" field'        , 1, F, F, T, T), &
        prop_field_info('AUX*', 'Auxiliary integrals on file'         , 1, F, F, T, F), &
        prop_field_info('PNC' , 'PNC'                                 , 1, F, F, T, F), &
        prop_field_info('EL'  , 'Electric field'                      , 3, F, F, T, F), &
        prop_field_info('VEL' , 'Velocity'                            , 3, T, F, T, F), &
        prop_field_info('MAGO', 'Magnetic field w/o. London orbitals' , 3, T, F, F, T), &
        prop_field_info('MAG' , 'Magnetic field with London orbitals' , 3, T, T, F, F), &
        prop_field_info('ELGR', 'Electric field gradient'             , 6, F, F, T, F), &
        prop_field_info('VIBM', 'Displacement along vibrational modes',-1, F, T, F, F), &
        prop_field_info('GEO' , 'Nuclear coordinates'                 ,-1, F, T, F, F), & !-1=mol-dep
        prop_field_info('NUCM', 'Nuclear magnetic moment'             ,-1, F, T, F, T), & !-1=mol-dep
        prop_field_info('AOCC', 'AO contraction coefficients'         ,-1, F, T, F, F), & !-1=mol-dep
        prop_field_info('AOEX', 'AO exponents'                        ,-1, F, T, F, F)/)  !-1=mol-dep
    ```
    
    
    

Motivation - where this is heading

  • science:

    • real-space analysis of molecular densities to study chemical bonding and reactivity:
      • electron density, reduced density gradient (RDG), electron localization function (ELF), etc.
      • large sets of data, various molecular systems with non-covalent interactions (connection to FDE)
    • exploring the use of Topological Data Analysis tools in quantum chemistry analysis, e.g.:
      • robust analysis of single functions
      • joint analysis of multiple descriptors (topological similarity)
      • proposing new scalar descriptors
  • data-driven workflows

    • discussion: back-and-forth communication between DIRAC and external analysis software - extend pam or develop in external scripts (e.g., pyADF, in-house scripts)?