
DP-Library
===

## Table of Contents

[TOC]

## Overview

Machine learning based methods are changing the way people develop interatomic potentials, or potential energy surface (PES) models. In this context, the two aspects, model and data, have become more and more unified. One needs a good model, or PES representation, to fit ab initio data with satisfactory accuracy; on the other hand, since ab initio calculations are quite expensive, one needs a way to generate a minimal set of data from which a reliable PES model can be obtained by training.

At this stage, one needs to rethink how to promote transparency and reproducibility by developing databases or model libraries that better serve the molecular simulation community.

Traditional efforts to develop systematic databases and model libraries fall into two categories:

The first one emphasizes models. A popular example is the NIST Interatomic Potentials Repository (IPR) [https://www.ctcms.nist.gov/potentials/]. In this repo, users are encouraged to download and use interatomic potentials, and developers are welcome to contribute potentials for inclusion. Potential models are uploaded together with proper citation information and systematic benchmark tests.

The second one emphasizes data. An example that serves a more general purpose is the figshare project [https://figshare.com/], and an example that is more specific to machine learning based molecular modeling is the quantum machine project [http://www.quantum-machine.org/]. In the quantum machine project, datasets are collected in the form of molecular snapshots associated with labels, namely energies, forces, dipoles, etc., obtained from quantum mechanics calculations.
However, the resulting models are benchmarked following the conventions of machine learning tasks, such as comparing the root mean square error of the predicted energies/forces, rather than the way people benchmark interatomic potentials, such as performing large-scale molecular simulations and examining mechanical/structural/dynamical properties.

[Todo] We developed DP and DPGEN.

### Our aim is to develop a DP library, in which:

1. data: the data generation protocol should be clear. This includes the setup of the ab initio calculations and the conditions under which the atomic snapshots are generated;
2. model: the model setup and the training setup; the estimated range of conditions in which the model is reliable.
3. benchmarks: general-purpose models should follow something like NIST; specific-purpose models should state the conditions under which the model is reliable and provide evidence (such as results in a paper with specific interest in certain conditions).
4. for user: can easily download and use the data and model and compare the benchmarks.
5. contributors: can easily upload data, model, and benchmarks.
6. After a project is uploaded, we should provide a place where both contributors and users can share experience and cooperate to continually improve the quality of the model.
    1. for contributors: version control (like GitHub) & instructions or tutorials (optional).
    2. for users: feedback if they run calculations using the models.
    3. for the community: a collection of subsequent works inspired by the models (citations) and other related topics.

## Logic

## Procedure

Users should have two ways to upload a project to, or download models (results) from, the DP-database: 1) a Python API integrated in deepmodeling, 2) the website deepmd.org.

### Upload a project (By users)

+ Project Name
+ Category
+ Keyword
+ Contributors
+ DOI
+ Software version
+ README (Necessary instructions, tutorial to reproduce results, scripts, etc.)
+ File (zip, URL)
+ input.json (after unzipping the uploaded data, the server should be able to execute `dp train input.json`)
+ Data: systems (deepmd.raw) or original data (recommended; currently the DP-GEN database can systematically process OUTCAR files generated by DP-GEN and transform them to the standard format)
+ Ab initio parameters (e.g. INCAR, POTCAR)
+ From DP-GEN: param.json (currently DP-GEN run/report can automatically analyze the temperature, pressure, candidate ratio, volume range, etc.)
+ Numerical results for figures (optional): like RDF, etc.

### Processing the project (By DP)

1. Automatic training for all available versions.
2. DP-test for all systems and report.
3. Auto-test if needed (EOS, elastic, phonon, surf, vacancy, ...)
4. Visualization (training result, iteration details, auto-test result, etc.)
5. Illustration of the conditions under which the model works (phase, temperature, pressure, volume, etc.).

### After accepting the project

1. Provide a specific page for discussions between contributors and new users (if the contributors are willing).
2. Allow users to download the DP-model and the original auto-test data.

## To-do list

1. More rigorous definition of the procedure.
2. Code development (See plumed-nest)
3. Better presentation and visualization (See material-project)
4. Stable server.
5. Who? (New engineers?)
6. Test-case: DP-GEN-software, Si-Pd-PRB, DP-Warm-Dense-Matter, etc.
7. More and better tutorials (from scratch if possible).
8. Ongoing exploration of advanced topics (MP-battery search)

## Question:
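
One open point is how the upload fields listed under "Upload a project" would be checked before the server attempts `dp train input.json`. The following is a minimal sketch in Python, not an existing DP-Library API: the names `ProjectManifest` and `missing_items` are hypothetical, and treating every listed field as required, with `input.json` at the top level of the archive, is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical manifest mirroring the upload fields listed in this note;
# none of these names come from an existing DP-Library or deepmd API.
@dataclass
class ProjectManifest:
    project_name: str
    category: str
    keywords: List[str]
    contributors: List[str]
    doi: str
    software_version: str
    readme: str
    # Paths inside the unzipped upload, relative to its top level.
    files: List[str] = field(default_factory=list)


def missing_items(manifest: ProjectManifest) -> List[str]:
    """Return a list of problems; an empty list means the upload looks complete."""
    problems = []
    # Assumption: every descriptive field must be non-empty.
    for name in ("project_name", "category", "keywords", "contributors",
                 "doi", "software_version", "readme"):
        if not getattr(manifest, name):
            problems.append(f"missing field: {name}")
    # The note requires that `dp train input.json` can run after unzipping,
    # so this sketch assumes input.json sits at the top level of the archive.
    if "input.json" not in manifest.files:
        problems.append("missing file: input.json")
    return problems
```

A server-side handler could call `missing_items` right after unzipping and reject the upload with the returned messages before any training job is queued.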