---
title: DM 06/2023. Design patterns for real-space data handling
tags: Talk, DMJUNE2023
description: View the slide with "Slide Mode".
notes: notes_on_refactoring_VISUAL_in_DIRAC
---
# Design patterns for real-space data handling
gosia.olejniczak@gmail.com
DIRAC meeting, June 2023
notes: https://hackmd.io/f-HgQmWARmOj2HQvW7qj1g
DIRAC issue: [#545](https://gitlab.com/dirac/dirac-private/-/issues/545)
DIRAC branch (dirac-private): `gosia/visual`
---
## Outline:
* VISUAL module - quick presentation
* refactoring motivation
* refactoring conceptualization and discussion
* related issues: FDE module, schemas for real-space data and response theory
---
### VISUAL module
* "property density"
$P = \int {\bf P}(\vec{r}) d\vec{r}, \qquad {\bf P}(\vec{r}) = \langle \psi(\vec{r}) | \hat{P}(\vec{r}) | \psi (\vec{r})\rangle$
* $\hat{P}(\vec{r})$ - a list of predefined operators, also largely covers [one-electron operators](http://www.diracprogram.org/doc/release-22/manual/one_electron_operators.html#one-electron-operators)
* examples: electron density, electron localization function, ...
* in `VISUAL` module:
* ${\bf P}(\vec{r})$ is discretized on a grid
* ${\bf P}(\vec{r})$ values on a grid can be exported on file (e.g., for real-space analysis)
* ${\bf P}(\vec{r})$ can be integrated to $P$ (e.g., for a numerical evaluation of a property)
---
## Refactoring VISUAL module - motivation/new features:
* **labeled storage** and **HDF5 support** (performance, numerics)
* metadata for data retrieval/regeneration
* **parallelization**
* future: possibly adapt to GPU-based architectures
* **detach calculations from analysis** - quantum data from external sources
* generic programming style, flexibility to define densities and grids
---
## Conceptualization
![](https://hackmd.io/_uploads/BJWyLiBUh.png)
[board](https://miro.com/app/board/uXjVOxYWTQs=/?share_link_id=902887884886)
---
### New user input
```
.GRIDS
3
id=1 input=create ndim=3 npoints=[3,3,3] margin=2.0 typ=1 export=yes file_out=grid1.h5
id=2 input=create ndim=3 spacing=[0.2,0.2,0.2] margin=2.0 typ=1 export=yes file_out=grid3.h5
id=3 input=import file_inp=grid2.h5
.GRIDFUNCTIONS
3
name=density id_grid=3 purpose=visualization export=yes file_out=ed.h5
name=reduced_density_gradient id_grid=1 purpose=visualization export=yes file_out=density_rdg.h5
```
* test: `visual_custom_output`
* documentation + tutorial - **to do**
* keep a list of predefined function, but also enable the user to define functions, e.g., as with the [.OPERATOR keyword](https://www.diracprogram.org/doc/release-22/manual/one_electron_operators.html#operator-specification))
---
### Labeled storage and schema
- labels and schema in `VISUAL`: `VISUAL_checkpoint.F90` and `VISUALschema.txt` - developed after `gp/checkpoint.F90` and `DIRACchema.txt`
- **idea:** make it more generic, e.g.:
- unify labels for GRID, VISUAL and FDE modules
- use one checkpoint and one schema for all real-space data (`gp/REALSPACE_checkpoint.F90`, `utils/REALSPACEschema.txt`) or keep each of these modules self-contained?
---
### Schema for real-space data (snippet):
```
*input
molecule composite single required # topology of the molecular system
grid composite array required # information about grid on input
grid_function composite array required # information about grid functions on input
*end
*result
execution composite single required # information about the run
grid composite array required # information about grid on output
grid_function composite array required # information about grid functions on output
*end
*grid
id integer single required # grid id
ndim integer single optional # grid dimensionality
nr_points integer array optional # number of points in each dimension
nr_points_total integer single optional # total number of points (if number of points in each dimension unknown)
origin real array optional # grid origin
spacing real array optional # grid spacing
margin real array optional # margin (in Angstrom) to be added around molecule when creating the grid
file_inp string single optional # name of a file for grid import
file_out string single optional # name of a file for grid export
points real array optional # grid points
*end
*grid_function
id_assigned_grid integer single required # id of assigned grid
wf_order integer single required # grid function dependent on unperturbed wave function only(0), perturbed wave functions (1, 2, ..)
tensor_order integer single required # grid function represented with a scalar field (0), vector field (1), tensor field of rank 2 (2), ...
purpose string single required # visualization/integration
file_inp string single optional # name of a file for grid function import
file_out string single optional # name of a file for grid function export
values composite single required # grid function values
*end
```
---
### Data storage and representation
- data schema:
- 2 main groups, `grids` and `grid_functions`
- `grid function` always exported with a `grid` (a related grid is referred to via an index);
- `grid` can be imported/exported separately
- many grid functions can be associated to the same grid
- alternative schema: 1 group - `grid`, then `grid_functions` as attributes of a `grid`
- data calculated as 2D arrays, data ordering: `(ndim, npoints)`
- storage layout:
- 1 file, contiguous ("flattened arrays"), incompressible data
- chunking would enable adding more data to it ([ref](https://portal.hdfgroup.org/display/HDF5/Chunking+in+HDF5))
- collective MPI-IO as done in DIRAC ("one reads, one writes")
- alternative: [Parallel HDF5](https://www.alcf.anl.gov/files/Parallel_HDF5_1.pdf#[0,{%22name%22:%22XYZ%22},null,null,null])
- **to do:** test whether these choices are optimal
- option - use [hdf5-iotest](https://github.com/HDFGroup/hdf5-iotest)
---
## Discussion - passing quantum data to VISUAL
---
### VISUAL module under the hood
* $P_k(\vec{r}) = \langle \psi(\vec{r}) | \hat{P}(\vec{r}) | \psi (\vec{r})\rangle = \langle \chi_\kappa(\vec{r}) | \hat{P}(\vec{r}) | \chi_\lambda (\vec{r})\rangle \tilde{D}_{\lambda\kappa}$
* Density matrices, $\tilde{D}_{\lambda\kappa}$:
* $D_{\lambda\kappa}^0 = c_{\lambda m} I c^*_{\kappa n}$
* $D_{\lambda\kappa}^{P} = c_{\lambda m} W_{mn}c^*_{\kappa n}$, with $\mathbf{W}$ holding response parameters
* in DIRAC - density matrices are generally computed and stored in quaternion algebra, assume Kramers pairing of AOs
* in (old) VISUAL
1. MO coefficients are read from `CHECKPOINT.h5` in symmetry-adapted (SA) basis
2. different parts of MO coefficient array (occ, virt) are selected with `get_C` subroutines (from `matrix_operations` module)
3. MO occupation vector is generated (the user can define occupations [LINK to .OCCUPATION keyword])
4. density matrices (unperturbed and perturbed) in SA-basis are generated by backtransforming MO occupation matrix (3.) with matrices of selected coefficients (2.)
5. AOs are generated in `dirac_ao_eval_init`
* **Discussion:**
* replace 5. with AOs read from `CHECKPOINT.h5` (**to do**)
* do not calculate quantum data in `VISUAL`, get this data from external sources
* replace 1.-4. with density matrices read from checkpoints?
---
### Schema for response data:
* labeling ideas for property densities - after `src/openrsp` (?)
```
type(prop_field_info) :: field_list(14) = & !nc an ba ln qu
(/prop_field_info('EXCI', 'Generalized "excitation" field' , 1, F, F, T, T), &
prop_field_info('FREQ', 'Generalized "freqency" field' , 1, F, F, T, T), &
prop_field_info('AUX*', 'Auxiliary integrals on file' , 1, F, F, T, F), &
prop_field_info('PNC' , 'PNC' , 1, F, F, T, F), &
prop_field_info('EL' , 'Electric field' , 3, F, F, T, F), &
prop_field_info('VEL' , 'Velocity' , 3, T, F, T, F), &
prop_field_info('MAGO', 'Magnetic field w/o. London orbitals' , 3, T, F, F, T), &
prop_field_info('MAG' , 'Magnetic field with London orbitals' , 3, T, T, F, F), &
prop_field_info('ELGR', 'Electric field gradient' , 6, F, F, T, F), &
prop_field_info('VIBM', 'Displacement along vibrational modes',-1, F, T, F, F), &
prop_field_info('GEO' , 'Nuclear coordinates' ,-1, F, T, F, F), & !-1=mol-dep
prop_field_info('NUCM', 'Nuclear magnetic moment' ,-1, F, T, F, T), & !-1=mol-dep
prop_field_info('AOCC', 'AO contraction coefficients' ,-1, F, T, F, F), & !-1=mol-dep
prop_field_info('AOEX', 'AO exponents' ,-1, F, T, F, F)/) !-1=mol-dep
```
----
## Motivation - where this is heading
* science:
* real-space analysis of molecular densities to study chemical bonding and reactivity:
* electron density, reduced density gradient (RDG), electron localization function (ELF), etc.
* large sets of data, various molecular systems with non-covalent intractions (connection to FDE)
* exploring the use of the Topological Data Analysis tools to quantum chemistry analysis, e.g.:
* robust analysis of single functions
* joint analysis of multiple descriptors (topological similarity)
* proposing new scalar descriptors
* data-driven workflows
* **discussion:** back-and-forth communication between DIRAC and external analysis software - extend `pam` or develop in external scripts (e.g., pyADF, in-house scripts)?