Hands-on: Generator on Methane
===

[TOC]

# General Introduction

This is a practical tutorial that aims to help you quickly get command of the `dpgen run` interface. The whole generator process consists of a series of iterations undertaken successively, for example while gradually heating the system to a target temperature. Each iteration contains three stages of work, namely `00.train`, `01.model_devi` and `02.fp`.

+ 00.train: `dpgen` trains several (default 4) models based on the initial data and the data generated so far. The only difference between these models is the random seed used for neural-network initialization.
+ 01.model_devi: short for model deviation. `dpgen` uses the models obtained from 00.train to run Molecular Dynamics (MD). A larger deviation between the models' predictions for a structure means the models are less accurate there. Using this criterion, a few structures are selected and passed to the next stage, `02.fp`, for accurate calculation based on first-principles methods.
+ 02.fp: The selected structures are calculated with first-principles methods. `dpgen` collects the new data and puts it together with the initial data and the data generated in previous iterations. After that, a new training is set up and `dpgen` enters the next iteration!

`dpgen` identifies the current stage by a record file, `record.dpgen`, which is created and updated automatically. Each line contains two numbers: the first is the index of the iteration, and the second, ranging from 0 to 9, records which stage of that iteration is currently running. 0, 1 and 2 correspond to `make_train`, `run_train` and `post_train`: `dpgen` writes the scripts in `make_train`, runs the tasks on the specified machine in `run_train`, and collects the results in `post_train`. The records for the `model_devi` and `fp` stages follow similar rules.

As an easy example, we will show you how to generate a uniformly accurate methane (CH4) potential energy surface (PES) at 100 K for molecular dynamics (MD), by sampling only a few new structures beyond some initial data at 50 K.

# Set up

## Basics

We assume that you have successfully installed the `deepmodeling` packages `deepmd-kit`, `dpgen` and `dpdata`. You may refer to https://github.com/deepmodeling/ or our previous tutorials (*to be completed*) for instructions. Apart from these, you will also need software for MD and ab initio calculations; in `dpgen`, the defaults are `LAMMPS` and `VASP`, respectively.

A general working process of `dpgen` is `init` $\to$ `run` $\to$ `auto_test`. For convenience, we've prepared initial data for CH4 at 50 K for you and put it in the folder `examples/init/CH4.01x01x01.POSCAR`. There are 300 frames of initial data in total (a quick check with `dpdata` is sketched at the end of this subsection).

Next we will focus entirely on the `run` interface. You may get instructions by calling `dpgen run -h`. Then you will see:

```
usage: dpgen run [-h] PARAM MACHINE

positional arguments:
  PARAM       parameter file, json format
  MACHINE     machine file, json format

optional arguments:
  -h, --help  show this help message and exit
```

In the following parts, we will give an explicit illustration of how to write `machine.json` and `param.json`.
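Before writing the two JSON files, you may want to double-check the prepared initial data. A minimal sketch with `dpdata` (the path is illustrative and assumes the data is stored in dpdata's `deepmd/npy` layout; use `fmt="deepmd/raw"` if your copy is in raw text format):

```python
import dpdata

# Load one system of deepmd-formatted initial data.
# Replace the path with your own copy of the CH4 example data.
init = dpdata.LabeledSystem(
    "CH4.POSCAR.01x01x01/02.md/sys-0004-0001/deepmd",
    fmt="deepmd/npy",
)
print(init.get_nframes(), "frames,", init.get_natoms(), "atoms per frame")
```

For the prepared example this should report 300 frames of a 5-atom (CH4) system.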
## Writing `machine.json`

When switching to a new machine, you need to modify `machine.json` according to your own settings. We've provided an example set up for `slurm` as follows:

```json
{
    "train": [
        {
            "machine": {
                "machine_type": "slurm",
                "hostname": "localhost",
                "port": 22,
                "username": "1600017784",
                "work_path": "/gpfs/share/home/1600017784/generator/Cu/work"
            },
            "resources": {
                "numb_node": 1,
                "numb_gpu": 1,
                "task_per_node": 4,
                "partition": "AdminGPU",
                "exclude_list": [],
                "source_list": [
                    "/gpfs/share/home/1600017784/env/train_tf112_float.env"
                ],
                "module_list": [],
                "time_limit": "23:0:0",
                "qos": "bigdata"
            },
            "deepmd_path": "/gpfs/share/software/deepmd-kit/0.12.4/gpu/gcc/4.9.0/tf1120-lowprec"
        }
    ],
    "model_devi": [
        {
            "machine": {
                "machine_type": "slurm",
                "hostname": "localhost",
                "port": 22,
                "username": "1600017784",
                "work_path": "/gpfs/share/home/1600017784/generator/Cu/work"
            },
            "resources": {
                "numb_node": 1,
                "numb_gpu": 1,
                "task_per_node": 2,
                "partition": "AdminGPU",
                "exclude_list": [],
                "source_list": [
                    "/gpfs/share/home/1600017784/env/lmp_tf112_float.env"
                ],
                "module_list": [],
                "time_limit": "23:0:0",
                "qos": "bigdata"
            },
            "command": "lmp_serial",
            "group_size": 10
        }
    ],
    "fp": [
        {
            "machine": {
                "machine_type": "slurm",
                "hostname": "localhost",
                "port": 22,
                "username": "1600017784",
                "work_path": "/gpfs/share/home/1600017784/generator/Cu/work"
            },
            "resources": {
                "cvasp": true,
                "task_per_node": 4,
                "numb_gpu": 1,
                "exclude_list": [],
                "with_mpi": false,
                "source_list": [],
                "module_list": [
                    "mpich/3.2.1-intel-2017.1",
                    "vasp/5.4.4-intel-2017.1",
                    "cuda/10.1"
                ],
                "time_limit": "120:0:0",
                "partition": "AdminGPU",
                "_comment": "that's All"
            },
            "command": "vasp_gpu",
            "group_size": 5
        }
    ]
}
```

For convenience, we use `TASK` to represent `train`, `model_devi` or `fp`, since they share similar parameters in the JSON file.

+ `deepmd_path` (string): the installation directory of `deepmd-kit`, which should contain `bin`, `lib` and `include`.
+ `TASK_machine` (dict): the settings of the machine used to run `TASK`.
    + `machine_type` (string): can be "slurm" or "local".
    + `hostname` & `port` (string): literal meaning. Generally you needn't change them.
    + `username` (string): needed to create an `ssh` session. Please modify it according to your own machine.
    + `work_path` (string): the remote directory where `TASK` runs on the computing nodes.
+ `TASK_resources` (dict): the resources needed for the calculation, such as CPU cores, environment settings, etc.
    + `numb_node` (integer): the number of nodes you require for the job.
    + `numb_gpu` (integer): specifies whether the job applies for GPU resources. If you want to use a GPU, we recommend setting this key to 1.
    + `task_per_node` (integer): the number of CPU cores you need per node.
    + `source_list` (list of strings): the environment files needed for the job. For example, if "env" is in the list, `source env` will be written into the script submitted to `slurm`.
    + `module_list` (list of strings): on some machines, software is installed by the administrator and users can access it through the `module` system. If "module_A" is in the list, `module load module_A` will be called before your program.
    + `time_limit` (string): the maximum wall time permitted for the job. Notice the format.
    + `mem_limit` (string): the maximum memory permitted to apply for.
+ `TASK_command` (string): the command for invoking `TASK`, such as `dp_train`, `lmp_mpi` or `vasp_std`.
+ `TASK_group_size` (integer): `dpgen` will bundle this many tasks together in one submission script.

Once you have set up `machine.json` successfully, you don't need to change it any more for jobs on the same machine.
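Because a malformed `machine.json` typically only fails after jobs have been submitted, a quick local sanity check can save a round trip. A throwaway sketch (plain Python, not part of `dpgen`), assuming the three-stage layout of the example above:

```python
import json

# Verify that each stage of machine.json defines the pieces dpgen expects.
with open("machine.json") as f:
    machine = json.load(f)

for stage in ("train", "model_devi", "fp"):
    for entry in machine[stage]:
        for key in ("machine", "resources"):
            if key not in entry:
                raise KeyError(f"'{key}' missing in the '{stage}' section")
        print(stage, "->", entry["machine"]["hostname"],
              "| partition:", entry["resources"].get("partition", "n/a"))
```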
## Writing `param.json`

In `param.json`, you can specify the tasks you expect `dpgen` to perform. We take the JSON file for CH4 as an example.

```json
{
    "type_map": ["H", "C"],
    "mass_map": [1, 12],
    "init_data_prefix": "/sharedext4/generator/example/deep.gen/data/",
    "init_data_sys": [
        "CH4.POSCAR.01x01x01/02.md/sys-0004-0001/deepmd"
    ],
    "init_batch_size": [8],
    "sys_configs": [
        ["/sharedext4/generator/example/deep.gen/data/CH4.POSCAR.01x01x01/01.scale_pert/sys-0004-0001/scale-1.000/00000*/POSCAR"],
        ["/sharedext4/generator/example/deep.gen/data/CH4.POSCAR.01x01x01/01.scale_pert/sys-0004-0001/scale-1.000/00001*/POSCAR"]
    ],
    "sys_batch_size": [8, 8, 8, 8],
    "_comment": " that's all ",
    "numb_models": 4,
    "train_param": "input.json",
    "default_training_param": {
        "_comment": "that's all",
        "use_smooth": true,
        "sel_a": [16, 4],
        "rcut_smth": 0.5,
        "rcut": 5,
        "filter_neuron": [10, 20, 40],
        "filter_resnet_dt": false,
        "n_axis_neuron": 12,
        "n_neuron": [120, 120, 120],
        "resnet_dt": true,
        "coord_norm": true,
        "type_fitting_net": false,
        "systems": [],
        "set_prefix": "set",
        "stop_batch": 40000,
        "batch_size": 1,
        "start_lr": 0.001,
        "decay_steps": 200,
        "decay_rate": 0.95,
        "seed": 0,
        "start_pref_e": 0.02,
        "limit_pref_e": 2,
        "start_pref_f": 1000,
        "limit_pref_f": 1,
        "start_pref_v": 0,
        "limit_pref_v": 0,
        "disp_file": "lcurve.out",
        "disp_freq": 1000,
        "numb_test": 4,
        "save_freq": 1000,
        "save_ckpt": "model.ckpt",
        "load_ckpt": "model.ckpt",
        "disp_training": true,
        "time_training": true,
        "profiling": false,
        "profiling_file": "timeline.json"
    },
    "model_devi_dt": 0.002,
    "model_devi_skip": 0,
    "model_devi_f_trust_lo": 0.05,
    "model_devi_f_trust_hi": 0.15,
    "model_devi_e_trust_lo": 10000000000,
    "model_devi_e_trust_hi": 10000000000,
    "model_devi_clean_traj": true,
    "model_devi_jobs": [
        {
            "sys_idx": [0],
            "temps": [100],
            "press": [1],
            "trj_freq": 10,
            "nsteps": 300,
            "ensemble": "nvt",
            "_idx": "00"
        },
        {
            "sys_idx": [1],
            "temps": [100],
            "press": [1],
            "trj_freq": 10,
            "nsteps": 3000,
            "ensemble": "nvt",
            "_idx": "01"
        }
    ],
    "fp_style": "vasp",
    "shuffle_poscar": false,
    "fp_task_max": 20,
    "fp_task_min": 5,
    "fp_pp_path": "/sharedext4/generator/example/deep.gen/data/ch4/",
    "fp_pp_files": ["POTCAR"],
    "fp_params": {
        "_comment": " that's all ",
        "ecut": 400,
        "ediff": 0.000001,
        "kspacing": 2,
        "smearing": "gauss",
        "sigma": 0.05,
        "metagga": "NONE",
        "npar": 4,
        "kpar": 1
    }
}
```

+ `type_map` (list of strings): the atom types you want to explore.
+ `mass_map` (list of floats): the corresponding standard atomic weights.
+ `init_data_prefix` (string): the location of the folder containing the initial data.
+ `init_data_sys` (list of strings): the exact directories of the initial data. You may use either absolute or relative paths here.
+ `init_batch_size` (list of integers): each number in the list is the batch size for training the corresponding system in `init_data_sys`, so the two lists should have the same length.
+ `sys_configs` (list of lists of strings): the directories of the structures to be explored in the iterations. Wildcard characters are supported here. The default file format is POSCAR. Indexing starts from 0 and corresponds to the key `sys_idx` in `model_devi_jobs`.
+ `sys_batch_size` (list of integers): each number in the list is the batch size for training the corresponding system in `sys_configs`.
+ `numb_models` (integer): the number of models to be trained in `00.train`. Default is 4.
+ `default_training_param` (dict): the training parameters for `deepmd-kit` in `00.train`. You can find instructions at https://github.com/deepmodeling/deepmd-kit.
+ `model_devi_f_trust_lo` and `model_devi_f_trust_hi` (float): the lower and upper bounds of the force deviation for the selection in `01.model_devi`. One recommended setting for the lower bound is twice the training error. Defaults are 0.05 and 0.15, respectively.
+ `model_devi_e_trust_lo` and `model_devi_e_trust_hi` (float): the bounds of the energy deviation for the selection. Nevertheless, in our experience the criterion based on forces is more accurate, so we recommend setting these to an extremely high number, such as 1e10.
+ `model_devi_clean_traj` (boolean): decides whether to clean the trajectory folders produced in MD, since they can be very large.
+ `model_devi_jobs` (list of dicts): the settings for `01.model_devi`. Each dict in the list corresponds to one iteration (see the sketch after this list). Within each dict:
    + `sys_idx` (list): selects which systems are used as the initial structures of MD and explored. The indices correspond exactly to `sys_configs`.
    + `temps` & `press` (list): the temperatures and pressures in MD.
    + `trj_freq` (integer): the frequency at which frames of the trajectory are saved in MD.
    + `nsteps` (integer): the number of MD steps.
    + `ensemble` (string): the ensemble used in MD; options include "npt" and "nvt".
+ `fp_style` (string): the software for the first-principles calculations. Options currently include "vasp", "pwscf" and "gaussian".
+ `fp_task_max` and `fp_task_min` (integers): the maximum and minimum numbers of structures to calculate in `02.fp` of each iteration.
+ `fp_pp_path` and `fp_pp_files`: the location of the pseudo-potential files to be used in `02.fp`.
+ `fp_params`: the parameters for `02.fp`.
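To see how `model_devi_jobs` drives the exploration, here is a short sketch (plain Python, not part of `dpgen`) that prints the iteration schedule; it only assumes the `param.json` shown above:

```python
import json

# Illustrative: print the exploration schedule encoded in model_devi_jobs.
# Each entry drives one iteration of 01.model_devi.
with open("param.json") as f:
    param = json.load(f)

for job in param["model_devi_jobs"]:
    systems = [param["sys_configs"][i] for i in job["sys_idx"]]
    print(f'iteration {job["_idx"]}: {job["ensemble"]} MD at '
          f'{job["temps"]} K for {job["nsteps"]} steps, '
          f'saving every {job["trj_freq"]} steps, '
          f'{len(systems)} system group(s)')
```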
An overview of the most important keys, grouped by the part of the workflow they control:

| Attribute | Key | Type | Example | Meaning |
| :--- | :--- | :--- | :--- | :--- |
| Basics | ***type_map*** | List of strings | ["H", "C"] | Atom types |
| Data | init_data_prefix | String | "/sharedext4/.../data/" | Prefix of the initial-data directories |
| Training | ***numb_models*** | Integer | 4 (recommended) | Number of models to be trained in `00.train` |
| Exploration | ***model_devi_f_trust_hi*** | Float | 0.15 | Upper bound of the force deviation for the selection |
| Labeling | ***fp_style*** | String | "vasp" | Software for the first-principles calculations; options include "vasp", "pwscf" and "gaussian" |

The parameters are listed in the following table.

| Key | Example | Meaning |
| :--- | :--- | :--- |
| type_map | ["H", "C"] | The atom types you want to explore. |
| mass_map | [1, 12] | The corresponding standard atomic weights. |
| init_data_prefix | "/sharedext4/.../data/" | The location of the folder containing the initial data. |
| init_data_sys | ["CH4.POSCAR.01x01x01/.../deepmd"] | The exact directories of the initial data; absolute or relative paths both work. |
| init_batch_size | [8] | Each number is the batch size for training the corresponding system in `init_data_sys`. |
| sys_configs | [["/sharedext4/.../POSCAR"], ["/sharedext4/.../POSCAR"]] | The directories of the structures to be explored in the iterations. Wildcard characters are supported. |
| sys_batch_size | [8, 8, 8, 8] | Each number is the batch size for training the corresponding system in `sys_configs`. |
| numb_models | 4 | The number of models to be trained in `00.train`. Default is 4. |
| default_training_param | {..., "use_smooth": true, "sel_a": [16, 4], "rcut_smth": 0.5, "rcut": 5, "filter_neuron": [10, 20, 40], ...} | The training parameters for `deepmd-kit` in `00.train`; see https://github.com/deepmodeling/deepmd-kit for instructions. |
| model_devi_f_trust_lo | 0.05 | The lower bound of the force deviation for the selection in `01.model_devi`. One recommended setting is twice the training error. |
| model_devi_f_trust_hi | 0.15 | The upper bound of the force deviation for the selection in `01.model_devi`. |
| model_devi_e_trust_lo | 1e10 | The lower bound of the energy deviation for the selection. Recommended to be set to an extremely high number, such as 1e10. |
| model_devi_e_trust_hi | 1e10 | The upper bound of the energy deviation for the selection. Recommended to be set to an extremely high number, such as 1e10. |
| model_devi_clean_traj | true | Whether to clean the trajectory folders produced in MD, since they can be very large. |
| model_devi_jobs | [{"sys_idx": [0], "temps": [100], ...}, ...] | The settings for `01.model_devi`. Each dict in the list corresponds to one iteration. |
| sys_idx | [0] | Selects which systems are used as the initial structures of MD and explored. The indices correspond exactly to `sys_configs`. |
| temps & press | "temps": [100], "press": [1] | The temperatures and pressures in MD. |
| trj_freq | 10 | The frequency at which frames of the trajectory are saved in MD. |
| nsteps | 3000 | The number of MD steps. |
| ensemble | "nvt" | The ensemble used in MD; options include "npt" and "nvt". |
| fp_style | "vasp" | The software for the first-principles calculations; options currently include "vasp", "pwscf" and "gaussian". |
| fp_task_max | 20 | The maximum number of structures to calculate in `02.fp` of each iteration. |
| fp_task_min | 5 | The minimum number of structures to calculate in `02.fp` of each iteration. |
| fp_pp_path | "/sharedext4/.../ch4/" | The directory containing the pseudo-potential files to be used in `02.fp`. |
| fp_pp_files | ["POTCAR"] | The pseudo-potential files to be used in `02.fp`. Note that the order of "H" and "C" should correspond to the order in `type_map` and `mass_map`. |
| fp_params | {"_comment": " that's all ", "ecut": 400, "ediff": 0.000001, "kspacing": 2, "smearing": "gauss", "sigma": 0.05, "metagga": "NONE", "npar": 4, "kpar": 1} | The parameters for `02.fp`. The total number of cores requested via `task_per_node` should be divisible by `npar` times `kpar`. |

If you want to get `dpgen` running successfully in a short time, you only need to make the following modifications to `param.json` beyond the example above:

1. Replace "/sharedext4/generator/example/deep.gen" with your own directory of the CH4 initial data in the keys `init_data_prefix` and `sys_configs`.
2. Put an available POTCAR in your own `fp_pp_path` and specify it in `param.json`. Note that the order of "H" and "C" should correspond to the order in `type_map` and `mass_map`.

We also summarize some common tricks and solutions to problems you may meet:

1. The most common problem is settings that must correspond with each other (a quick automatic check is sketched after this list), including:
    - the order of elements in `type_map` and `mass_map`;
    - the sizes of `init_data_sys` and `init_batch_size`;
    - the sizes of `sys_configs` and `sys_batch_size`;
    - the size of `sel_a` and the actual number of atom types in your system;
    - the indices of `sys_configs` and `sys_idx`.
2. Please verify the directories in `sys_configs`. If there isn't any POSCAR for `01.model_devi` in one iteration, you have probably written a wrong path in `sys_configs`.
3. One recommended rule for setting `sys_batch_size` and `init_batch_size` is that the batch size multiplied by the number of atoms in the structure should be larger than 32. For CH4 (5 atoms per frame), a batch size of 8 gives 8 × 5 = 40 > 32. Furthermore, we suggest not setting the batch size too large; otherwise it may exceed the GPU memory.
4. Among the training parameters, we commonly let `stop_batch` = 20 * `decay_steps`.
5. In `02.fp`, the total number of cores you request through `task_per_node` should be divisible by `npar` times `kpar`.
6. The number of frames in each system should be larger than both `batch_size` and `numb_test` in `default_training_param`. It can happen that one iteration adds only a few structures, which causes an error in the next iteration's training; in that case, set `fp_task_min` larger than `numb_test`.
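The correspondences in item 1 above are easy to check automatically. A throwaway sketch (plain Python, not part of `dpgen`), assuming the key layout of the `param.json` above:

```python
import json

# Quick consistency check for the correspondences listed above.
with open("param.json") as f:
    p = json.load(f)

assert len(p["type_map"]) == len(p["mass_map"]), "type_map vs mass_map"
assert len(p["init_data_sys"]) == len(p["init_batch_size"]), \
    "init_data_sys vs init_batch_size"
# Every system in sys_configs needs a batch size (extra entries are unused).
assert len(p["sys_configs"]) <= len(p["sys_batch_size"]), \
    "sys_configs vs sys_batch_size"

n_sys = len(p["sys_configs"])
for job in p["model_devi_jobs"]:
    assert all(i < n_sys for i in job["sys_idx"]), \
        f'sys_idx out of range in iteration {job["_idx"]}'
print("param.json looks consistent")
```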
# Start Generator

## Basics

Now that all the setup for our test case is finished, let's start the generator:

```bash
dpgen run param.json machine.json
```

In your current path, you will see the file `record.dpgen` introduced before. If the `dpgen` process stops for some reason, `dpgen` will automatically recover the main workflow from `record.dpgen`. You may also edit it manually for your own purposes, such as removing the last iterations and restarting from a checkpoint.

You will also see a series of folders `iter.*`. They contain the main results that `dpgen` generates. Each folder has three sub-folders, `00.train`, `01.model_devi` and `02.fp`, corresponding to the three stages introduced above.

## Details on Iteration

We've provided a finished example for CH4 in `ch4/ch4_previous_example`. Next we will use it to demonstrate the details of `dpgen`.

### 00.train

```bash
cd iter.000000/00.train
ls
```

You'll see:

```
000 002 data.init  graph.000.pb graph.002.pb
001 003 data.iters graph.001.pb graph.003.pb
```

Here `graph.00x.pb`, linked to `00x/frozen_model.pb`, is a model generated by `deepmd-kit`. Entering one of the numbered folders, you will find `frozen_model.pb`, `input.json` and `lcurve.out`.

+ `input.json`: the `deepmd-kit` settings for the current training task.
+ `lcurve.out`: records the training accuracy of energies and forces.

By `head -n 2 lcurve.out && tail -n 2 lcurve.out`, you will see:

```
# batch      l2_tst    l2_trn    l2_e_tst  l2_e_trn    l2_f_tst  l2_f_trn         lr
      0      8.81e+01  9.22e+01    6.62e+00  6.59e+00    2.79e+00  2.91e+00    1.0e-03
  39000      1.14e-01  1.25e-01    7.02e-04  6.02e-04    1.11e-01  1.22e-01    4.5e-08
  40000      1.13e-01  1.25e-01    6.87e-04  5.90e-04    1.11e-01  1.22e-01    3.5e-08
```

The total number of batches here corresponds to our setting of `stop_batch` in `param.json`. In `dpgen` we mainly focus on the accuracy of forces: after 40000 training steps, the test error of the forces has dropped from about 2.8 to about 0.11 (more than an order of magnitude), so the accuracy of our model is acceptable.
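If you prefer to check the learning curve programmatically rather than by eye, `lcurve.out` is plain whitespace-separated text. A small sketch with `numpy` (the column order is taken from the header above; the path assumes you are still inside `ch4_previous_example`):

```python
import numpy as np

# lcurve.out columns: batch, l2_tst, l2_trn, l2_e_tst, l2_e_trn,
#                     l2_f_tst, l2_f_trn, lr  (header line starts with '#')
data = np.loadtxt("iter.000000/00.train/000/lcurve.out")
batch, l2_f_tst = data[:, 0], data[:, 5]
print(f"force test error: {l2_f_tst[0]:.2e} -> {l2_f_tst[-1]:.2e} "
      f"after {int(batch[-1])} batches")
```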
Let's step forward.

### 01.model_devi

Go back to the folder `ch4_previous_example` and enter `iter.000000/01.model_devi`. You will see ten `task.*` folders, ranging from `task.000.000000` to `task.000.000009`.

Recall that in `param.json`, the first element of `model_devi_jobs` sets `sys_idx` to `[0]`, i.e., the system with index 0 in `sys_configs` is explored in this iteration. Checking `sys_configs`, we see that entry 0 is `/sharedext4/generator/example/deep.gen/data/CH4.POSCAR.01x01x01/01.scale_pert/sys-0004-0001/scale-1.000/00000*/POSCAR`, which matches exactly ten POSCARs. You may randomly select one of the tasks, say `task.000.000006`, and enter it. After `ls`, you will see:

```
conf.lmp input.lammps model_devi.log model_devi.out
```

+ `conf.lmp`: the structure you set in `sys_configs`, converted from POSCAR to LAMMPS format. In the current task, it serves as the initial configuration of the MD run.
+ `input.lammps`: the input file for LAMMPS, automatically generated by `dpgen`.
+ `model_devi.out`: records the model deviation of the quantities of interest, energy and force, along the MD trajectory. It serves as the criterion for selecting which structures to recalculate with ab initio methods.

By `head model_devi.out`, you will see:

```
# step max_devi_e min_devi_e avg_devi_e max_devi_f min_devi_f avg_devi_f
0 1.017324e+00 2.132917e-01 4.148666e-01 1.502898e-02 7.370744e-03 1.050770e-02
10 1.024033e+00 2.244136e-01 4.154341e-01 1.359524e-02 7.596864e-03 9.120678e-03
20 1.040794e+00 2.341178e-01 4.220795e-01 1.519385e-02 6.496701e-03 8.962266e-03
30 1.070875e+00 2.354093e-01 4.332116e-01 2.218688e-02 8.170997e-03 1.317981e-02
40 1.098053e+00 2.254422e-01 4.441248e-01 1.016788e-02 6.114264e-03 8.025430e-03
50 1.092160e+00 2.243387e-01 4.483588e-01 2.106012e-02 1.063208e-02 1.409551e-02
60 1.068275e+00 2.474324e-01 4.530141e-01 7.688441e-02 1.313477e-02 3.620158e-02
70 1.048442e+00 2.242415e-01 4.395818e-01 3.302417e-02 1.509461e-02 1.938477e-02
80 1.029923e+00 2.292606e-01 4.329449e-01 3.890682e-02 9.549126e-03 2.304373e-02
```

Now we'll concentrate on `max_devi_f`. Recall that we set `trj_freq` to 10, so a structure is saved every 10 steps. Whether a structure is selected depends on its `max_devi_f`: if it falls between `model_devi_f_trust_lo` (0.05) and `model_devi_f_trust_hi` (0.15), `dpgen` will treat the structure as a candidate. For example, among the first 80 steps shown above, only the structure at step 60 will be selected, whose `max_devi_f` is 7.688441e-02.
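The selection rule is easy to reproduce offline. A sketch with `numpy` (the handling of values exactly on the bounds is illustrative; `dpgen` keeps its own bookkeeping):

```python
import numpy as np

# model_devi.out columns: step, max_devi_e, min_devi_e, avg_devi_e,
#                         max_devi_f, min_devi_f, avg_devi_f
trust_lo, trust_hi = 0.05, 0.15  # model_devi_f_trust_lo / _hi in param.json

data = np.loadtxt("iter.000000/01.model_devi/task.000.000006/model_devi.out")
step, max_devi_f = data[:, 0].astype(int), data[:, 4]

candidate = (max_devi_f > trust_lo) & (max_devi_f < trust_hi)
print("candidate steps:", step[candidate].tolist())
print(f"{(max_devi_f <= trust_lo).sum()} accurate, "
      f"{candidate.sum()} candidate, {(max_devi_f >= trust_hi).sum()} failed")
```

For the trajectory shown above, step 60 is the only candidate among the first 80 steps.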
### 02.fp

Now let's go back to the folder `ch4_previous_example` and enter `iter.000000/02.fp`. You will see:

```
INCAR candidate.shuffled.000.out rest.shuffled.000.out data.000 task.000.0000??
```

+ `INCAR`: the input file for VASP. All these ab initio calculations share the same parameters, which you set in `fp_params` of `param.json`.
+ `candidate.shuffled.000.out`: records which structures were selected from the last step, `01.model_devi`. There are often far more candidates than the maximum number, `fp_task_max`, that you want to calculate at one time. In that case, `dpgen` randomly chooses up to `fp_task_max` structures and creates the folders `task.*.0000*` for them.
+ `rest.shuffled.000.out`: records the remaining structures, for which the model is either accurate enough (`max_devi_f` is less than `model_devi_f_trust_lo`, so no further calculation is needed) or too inaccurate (`max_devi_f` is larger than `model_devi_f_trust_hi`, so the structure is likely unreliable).
+ `data.000`: after the ab initio calculations, `dpgen` collects the results here and converts them into the format `deepmd-kit` needs. In the next iteration's `00.train`, these data will be trained on together with the initial data.

By `head candidate.shuffled.000.out`, you will see:

```
iter.000000/01.model_devi/task.000.000001 130
iter.000000/01.model_devi/task.000.000001 230
iter.000000/01.model_devi/task.000.000002 160
iter.000000/01.model_devi/task.000.000001 200
iter.000000/01.model_devi/task.000.000001 140
iter.000000/01.model_devi/task.000.000004 150
iter.000000/01.model_devi/task.000.000009 80
iter.000000/01.model_devi/task.000.000007 210
iter.000000/01.model_devi/task.000.000001 260
iter.000000/01.model_devi/task.000.000006 60
```

The last line, `task.000.000006 60`, is exactly the structure we just found in `01.model_devi` satisfying the criterion to be calculated again.

# Results

By `wc -l iter.000000/02.fp/candidate.shuffled.000.out`, we will see:

```
19 iter.000000/02.fp/candidate.shuffled.000.out
```

This means that 19 out of 300 structures (6.3%) in the first iteration need to be calculated by VASP. The remarkable thing happens in the second iteration, `iter.000001`. By `wc -l iter.000001/02.fp/candidate.shuffled.001.out`, you will find:

```
0 iter.000001/02.fp/candidate.shuffled.001.out
```

`dpgen` selects no structures at all, since all the force deviations are less than 0.05 eV/Å! We should emphasize that the MD in `iter.000001` runs for 3000 steps, compared with 300 steps in `iter.000000`, and that the initial configurations in this iteration also differ from those in the last one. After adding only 19 frames to the dataset, we achieve satisfactory accuracy for MD ten times longer than before! You may verify this by checking `iter.000001/01.model_devi/task*/model_devi.out`.

The last iteration, `iter.000002`, will only contain `00.train`. It trains a uniformly accurate PES model for CH4 at 100 K, which you can use directly.

# FAQ