Random Dropout for ABC

# Incorporating Random Dropout Method into ABC ## Seeking for Sparsity * Based on previous numerical experiments, we observed that the collective, long-wavelength motion driven by crystalline symmetry can lead to unreasonable results when identifying the new energy minimum. * Intuitively, defect motion should involve only a small fraction of particle movements within the simulation cell. * Therefore, it is reasonable to assume that the calculation of $\phi_\mathrm{p}(r)$ should not involve too many dimensions during a single minimization (downhill) step. * The new penalty function now defined as $$ \phi_\mathrm{p}(r) = W \exp{\left(-\frac{D^2}{2\sigma^2}\right)} $$ where $$ D^2 = \sum_{i\in\mathcal{R_i}} \left[(r - r_\mathrm{min})^\top (r - r_\mathrm{min})\right]_{ii} $$ Here, $\mathcal{R_i}$ is a set containing the dimensions to be considered in the calculation of the Gaussian penalty. The gradient is given by: $$ \nabla_r \phi_\mathrm{p}(r) = -\frac{W}{\sigma^2} \exp\left(-\frac{D^2}{2\sigma^2}\right) \mathbf{s} \cdot \left[r - r_\mathrm{min}\right] $$ where $\mathbf{s}$ is a vector such that $s_i=1$ for $i \in \mathcal{R_i}$, and $s_i=0$ otherwise. * Let $R_i$ be the set obtained by randomly drawing $n$ from $N$ atoms in the box. $R_i$ is updated in each minimization step. Note that $R_i$ doesn't update within one minimization step. The ABC algorithm becomes: ``` 1. Start from an energy minimum. 2. Randomly select R_i, calculate phi(r) accordingly. 3. Minimize the modified energy function. 4. Repeat ``` * The results can be found here. The vacancy now travels randomly within the FCC crystal, with no shaking or sudden collapse events observed. * However, you'll notice that the potential energy becomes serrated after identifying the first energy minimum, making it much easier for the system to overcome the energy barrier. Additionally, the potential energy gradually increases as the search steps progress. ![image](https://hackmd.io/_uploads/rkvFn03oA.png) ![image](https://hackmd.io/_uploads/SySz6RnsR.png) ![image](https://i.imgur.com/HOzoguw.gif) --- ## Modifications in Random Update Processes Since the basin climbing process is not confined to the region around the defect, successfully identifying a saddle point will also involve particle motions far from the defect. These displacements can accumulate as the search steps increase, potentially affecting the identification of subsequent energy minima. To robustify the basin climbing process, the following parameters are introduced. 1. Clear the penalty coordinate list after identifying a minimum: `n_memory` Therefore, the penalty coordinate list should be reset after identifying a significant energy minimum. However, the last `n_memory` penalty coordinates are retained to prevent reversing the search. 2. Reset search if no minimum returned: `n_search` Searching along random dimensions may not yield any saddle points, leading to an unreasonable increase in particle displacement and system energy. If no saddle points are found after `n_search` consecutive steps, the penalty coordinate list should be reset, and the random search should be restarted along different dimensions. 3. Correlations between each minimization steps: `p_keep` It is natural to consider that the selection of random dimensions at each step is correlated with the previous ones. A proportion, `p_keep`, of the entries in $\mathcal{R_i}(t-1)$ are retained in $\mathcal{R_i}(t)$. Consequently, $\mathbf{s}(t) \cdot \mathbf{s}(t-1) = \mathtt{p\_keep}$. There is an example of using parameters: ``` n_memory = 10, n_search = 100, p_keep = 0.16 ``` The modified algorithm can consistently generate saddle configurations with energy barriers less than 1 eV, without accumulating potential energy or experiencing sudden collapses, even after $10^4$ minimization steps. Clearing the penalty coordinates after identifying new minima also helps maintain a linear computational cost relative to the total number of search steps. ![image](https://hackmd.io/_uploads/S1HxLtJnA.png) ![image](https://hackmd.io/_uploads/HJopEKk3R.png) The saddle points and minima accurately correspond to the random walk of a single vacancy in an FCC crystal. [Check out the result video here!](https://i.imgur.com/ldAyMeg.mp4) --- ## To-Dos: 1. The selection of $\mathcal{R_i}$ could be influenced by the system's state. For example, particles that have been displaced more during the current minimization step are more likely to be included in $\mathcal{R_i}$ in the subsequent minimization step. 2. In addition to time correlation, the spatial correlation of the selection of $\mathcal{R_i}$ should also be taken into account. Particles that are closer to the currently selected particle are more likely to be chosen in the next step. 3. Quasi-random numbers can be employed to enhance the initial guesses for $\mathcal{R_i}$. Instead of random-uniform particle selections, these can be replaced by a distribution with minimal discrepancy from a uniform distribution. Alternatively, simple space subdivision using regular grids might also be effective. [Wikipedia: Low-discrepancy sequence](https://en.wikipedia.org/wiki/Low-discrepancy_sequence) 4. The current gradient-based minimization algorithm continues to face challenges with dimensionality as the number of particles increases, impacting both accuracy and efficiency. It is crucial to evaluate whether ABC is suitable for directly studying systems with extremely high degrees of freedom. Dimensional reduction techniques may be necessary before applying ABC.