--- title: DM 06/2023. Design patterns for real-space data handling tags: Talk, DMJUNE2023 description: View the slide with "Slide Mode". notes: notes_on_refactoring_VISUAL_in_DIRAC --- # Design patterns for real-space data handling gosia.olejniczak@gmail.com DIRAC meeting, June 2023 notes: https://hackmd.io/f-HgQmWARmOj2HQvW7qj1g DIRAC issue: [#545](https://gitlab.com/dirac/dirac-private/-/issues/545) DIRAC branch (dirac-private): `gosia/visual` --- ## Outline: * VISUAL module - quick presentation * refactoring motivation * refactoring conceptualization and discussion * related issues: FDE module, schemas for real-space data and response theory --- ### VISUAL module * "property density" $P = \int {\bf P}(\vec{r}) d\vec{r}, \qquad {\bf P}(\vec{r}) = \langle \psi(\vec{r}) | \hat{P}(\vec{r}) | \psi (\vec{r})\rangle$ * $\hat{P}(\vec{r})$ - a list of predefined operators, also largely covers [one-electron operators](http://www.diracprogram.org/doc/release-22/manual/one_electron_operators.html#one-electron-operators) * examples: electron density, electron localization function, ... * in `VISUAL` module: * ${\bf P}(\vec{r})$ is discretized on a grid * ${\bf P}(\vec{r})$ values on a grid can be exported on file (e.g., for real-space analysis) * ${\bf P}(\vec{r})$ can be integrated to $P$ (e.g., for a numerical evaluation of a property) --- ## Refactoring VISUAL module - motivation/new features: * **labeled storage** and **HDF5 support** (performance, numerics) * metadata for data retrieval/regeneration * **parallelization** * future: possibly adapt to GPU-based architectures * **detach calculations from analysis** - quantum data from external sources * generic programming style, flexibility to define densities and grids --- ## Conceptualization ![](https://hackmd.io/_uploads/BJWyLiBUh.png) [board](https://miro.com/app/board/uXjVOxYWTQs=/?share_link_id=902887884886) --- ### New user input ``` .GRIDS 3 id=1 input=create ndim=3 npoints=[3,3,3] margin=2.0 typ=1 export=yes file_out=grid1.h5 id=2 input=create ndim=3 spacing=[0.2,0.2,0.2] margin=2.0 typ=1 export=yes file_out=grid3.h5 id=3 input=import file_inp=grid2.h5 .GRIDFUNCTIONS 3 name=density id_grid=3 purpose=visualization export=yes file_out=ed.h5 name=reduced_density_gradient id_grid=1 purpose=visualization export=yes file_out=density_rdg.h5 ``` * test: `visual_custom_output` * documentation + tutorial - **to do** * keep a list of predefined function, but also enable the user to define functions, e.g., as with the [.OPERATOR keyword](https://www.diracprogram.org/doc/release-22/manual/one_electron_operators.html#operator-specification)) --- ### Labeled storage and schema - labels and schema in `VISUAL`: `VISUAL_checkpoint.F90` and `VISUALschema.txt` - developed after `gp/checkpoint.F90` and `DIRACchema.txt` - **idea:** make it more generic, e.g.: - unify labels for GRID, VISUAL and FDE modules - use one checkpoint and one schema for all real-space data (`gp/REALSPACE_checkpoint.F90`, `utils/REALSPACEschema.txt`) or keep each of these modules self-contained? --- ### Schema for real-space data (snippet): ``` *input molecule composite single required # topology of the molecular system grid composite array required # information about grid on input grid_function composite array required # information about grid functions on input *end *result execution composite single required # information about the run grid composite array required # information about grid on output grid_function composite array required # information about grid functions on output *end *grid id integer single required # grid id ndim integer single optional # grid dimensionality nr_points integer array optional # number of points in each dimension nr_points_total integer single optional # total number of points (if number of points in each dimension unknown) origin real array optional # grid origin spacing real array optional # grid spacing margin real array optional # margin (in Angstrom) to be added around molecule when creating the grid file_inp string single optional # name of a file for grid import file_out string single optional # name of a file for grid export points real array optional # grid points *end *grid_function id_assigned_grid integer single required # id of assigned grid wf_order integer single required # grid function dependent on unperturbed wave function only(0), perturbed wave functions (1, 2, ..) tensor_order integer single required # grid function represented with a scalar field (0), vector field (1), tensor field of rank 2 (2), ... purpose string single required # visualization/integration file_inp string single optional # name of a file for grid function import file_out string single optional # name of a file for grid function export values composite single required # grid function values *end ``` --- ### Data storage and representation - data schema: - 2 main groups, `grids` and `grid_functions` - `grid function` always exported with a `grid` (a related grid is referred to via an index); - `grid` can be imported/exported separately - many grid functions can be associated to the same grid - alternative schema: 1 group - `grid`, then `grid_functions` as attributes of a `grid` - data calculated as 2D arrays, data ordering: `(ndim, npoints)` - storage layout: - 1 file, contiguous ("flattened arrays"), incompressible data - chunking would enable adding more data to it ([ref](https://portal.hdfgroup.org/display/HDF5/Chunking+in+HDF5)) - collective MPI-IO as done in DIRAC ("one reads, one writes") - alternative: [Parallel HDF5](https://www.alcf.anl.gov/files/Parallel_HDF5_1.pdf#[0,{%22name%22:%22XYZ%22},null,null,null]) - **to do:** test whether these choices are optimal - option - use [hdf5-iotest](https://github.com/HDFGroup/hdf5-iotest) --- ## Discussion - passing quantum data to VISUAL --- ### VISUAL module under the hood * $P_k(\vec{r}) = \langle \psi(\vec{r}) | \hat{P}(\vec{r}) | \psi (\vec{r})\rangle = \langle \chi_\kappa(\vec{r}) | \hat{P}(\vec{r}) | \chi_\lambda (\vec{r})\rangle \tilde{D}_{\lambda\kappa}$ * Density matrices, $\tilde{D}_{\lambda\kappa}$: * $D_{\lambda\kappa}^0 = c_{\lambda m} I c^*_{\kappa n}$ * $D_{\lambda\kappa}^{P} = c_{\lambda m} W_{mn}c^*_{\kappa n}$, with $\mathbf{W}$ holding response parameters * in DIRAC - density matrices are generally computed and stored in quaternion algebra, assume Kramers pairing of AOs * in (old) VISUAL 1. MO coefficients are read from `CHECKPOINT.h5` in symmetry-adapted (SA) basis 2. different parts of MO coefficient array (occ, virt) are selected with `get_C` subroutines (from `matrix_operations` module) 3. MO occupation vector is generated (the user can define occupations [LINK to .OCCUPATION keyword]) 4. density matrices (unperturbed and perturbed) in SA-basis are generated by backtransforming MO occupation matrix (3.) with matrices of selected coefficients (2.) 5. AOs are generated in `dirac_ao_eval_init` * **Discussion:** * replace 5. with AOs read from `CHECKPOINT.h5` (**to do**) * do not calculate quantum data in `VISUAL`, get this data from external sources * replace 1.-4. with density matrices read from checkpoints? --- ### Schema for response data: * labeling ideas for property densities - after `src/openrsp` (?) ``` type(prop_field_info) :: field_list(14) = & !nc an ba ln qu (/prop_field_info('EXCI', 'Generalized "excitation" field' , 1, F, F, T, T), & prop_field_info('FREQ', 'Generalized "freqency" field' , 1, F, F, T, T), & prop_field_info('AUX*', 'Auxiliary integrals on file' , 1, F, F, T, F), & prop_field_info('PNC' , 'PNC' , 1, F, F, T, F), & prop_field_info('EL' , 'Electric field' , 3, F, F, T, F), & prop_field_info('VEL' , 'Velocity' , 3, T, F, T, F), & prop_field_info('MAGO', 'Magnetic field w/o. London orbitals' , 3, T, F, F, T), & prop_field_info('MAG' , 'Magnetic field with London orbitals' , 3, T, T, F, F), & prop_field_info('ELGR', 'Electric field gradient' , 6, F, F, T, F), & prop_field_info('VIBM', 'Displacement along vibrational modes',-1, F, T, F, F), & prop_field_info('GEO' , 'Nuclear coordinates' ,-1, F, T, F, F), & !-1=mol-dep prop_field_info('NUCM', 'Nuclear magnetic moment' ,-1, F, T, F, T), & !-1=mol-dep prop_field_info('AOCC', 'AO contraction coefficients' ,-1, F, T, F, F), & !-1=mol-dep prop_field_info('AOEX', 'AO exponents' ,-1, F, T, F, F)/) !-1=mol-dep ``` ---- ## Motivation - where this is heading * science: * real-space analysis of molecular densities to study chemical bonding and reactivity: * electron density, reduced density gradient (RDG), electron localization function (ELF), etc. * large sets of data, various molecular systems with non-covalent intractions (connection to FDE) * exploring the use of the Topological Data Analysis tools to quantum chemistry analysis, e.g.: * robust analysis of single functions * joint analysis of multiple descriptors (topological similarity) * proposing new scalar descriptors * data-driven workflows * **discussion:** back-and-forth communication between DIRAC and external analysis software - extend `pam` or develop in external scripts (e.g., pyADF, in-house scripts)?