---

---

# Miscellaneous

###### tags: `SoftQC` `self-assembly`

---

---

### Owners (the only ones with permission to edit the main text)

Frank, Giuseppe, Etienne

---

---

This page contains side stories related to the main one, but not specific to the main research focus developed in the other pages.

## Automatic detection of non-equilibrated simulations

[12/08/2021]

Our work relies heavily on measuring equilibrium quantities from simulation snapshots. These quantities are usually averages taken over a sequence of snapshots. Under the assumption that the system is ergodic and at equilibrium, the time averages taken from simulations are the same as ensemble averages. The ergodicity hypothesis is not easy to assess, and we almost never check it in practice. The equilibrium hypothesis is easier to verify, and is usually checked by plotting a property versus simulation time. If the quantity fluctuates around a mean value, the system is assumed to be at equilibrium. If a drift is observed in the curve, the simulation has not reached equilibrium.

| ![](https://i.imgur.com/919KJ6C.png) | ![](https://i.imgur.com/5B8gDAG.png) |
|:-:|:-:|
| An equilibrated MC simulation of hard disks at pressure $4\; k_BT \sigma_{L}^{-2}$ | An MC simulation of hard disks at pressure $40\; k_BT \sigma_{L}^{-2}$ that has not reached equilibrium |

This visual verification works fine, but quickly becomes impractical when the number of simulations grows. This section presents some attempts at automating the detection of non-equilibrated simulations. I have thought about three methods, two of which I actually tried:

* Using a statistical normality test on the distribution of measured points.
* Using a linear fit to all the points and monitoring its slope.
* Using the block-average data of the Flyvbjerg-Petersen method (not tried).

To benchmark the methods, I use two sets of simulation data:

* Equation-of-state measurements for a binary mixture of non-additive hard disks. The volume is recorded as a function of simulation time for various pressures in $[0.05\; k_BT \sigma^{-2}_{L}, 45\; k_BT \sigma^{-2}_{L}]$. At high pressures, the simulations are typically less equilibrated.
* Frenkel-Ladd free-energy calculations for a hexagonal crystal of hard disks. Disks are tied to their lattice sites by harmonic springs, and the squared distance to the lattice site is recorded as a function of simulation time for various spring stiffnesses. At low spring stiffness, particles tend to diffuse and the squared distance might not reach an equilibrium value.

### Normality test

The thermodynamic quantities of a system at thermal equilibrium can be represented as random variables following a Gaussian distribution whose mean and width are physically meaningful quantities. To assess the equilibration of a system, one could therefore use a statistical test to check whether the measured points can reasonably be assumed to have been drawn from a normal distribution. Several such tests are available, such as the [Lilliefors test](https://en.wikipedia.org/wiki/Lilliefors_test), the [Anderson-Darling test](https://en.wikipedia.org/wiki/Anderson%E2%80%93Darling_test) and the [Shapiro-Wilk test](https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test), the latter having the best power according to Wikipedia. In the following, I use the Shapiro-Wilk test, as implemented in the `scipy` Python library, which returns a *p-value* for the null hypothesis that the samples were drawn from a normal distribution.
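As an illustration, here is a minimal sketch of how the test can be called; the file name and column layout are placeholders, not the actual analysis scripts.

```python
import numpy as np
from scipy import stats

# Hypothetical input: one column of volumes recorded along an MC trajectory.
volumes = np.loadtxt("volume_vs_time.dat")  # placeholder file name

# scipy.stats.shapiro returns the test statistic and the p-value for the
# null hypothesis that the samples were drawn from a normal distribution.
statistic, p_value = stats.shapiro(volumes)
print(f"Shapiro-Wilk p-value: {p_value:.3g}")
```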
P-values are strange objects, and their interpretation calls for extreme care. The p-value of the Shapiro-Wilk test is the probability that a set of points drawn from a normal distribution would have displayed a behaviour at least as extreme as the observed one. The typical use of p-values involves setting a threshold a priori and rejecting the null hypothesis if the computed p-value is smaller than the threshold. Let's consider a threshold p-value of $10^{-2}$. The following figure displays the p-value as a function of pressure for the equation-of-state data set.

![](https://i.imgur.com/7ZVHLLu.png)

There is an accumulation of points with large p-values at small pressures, indicating that the dilute systems tend to equilibrate better. However, the overall result is not satisfying. Some points at low pressure are classified as not equilibrated even though they look very much so (see e.g. $P = 5.2 \; k_BT\sigma^{-2}$, red point), while at high pressures most points are poorly equilibrated and still have a high p-value (see e.g. $P = 42.2\; k_BT \sigma^{-2}$, pink point).

|![](https://i.imgur.com/yrcLipr.png)|![](https://i.imgur.com/zYTwRDP.png)|
|:-:|:-:|
|$P = 5.2 \; k_BT\sigma^{-2}$, p-value $= 0.002318$|$P = 42.2 \; k_BT\sigma^{-2}$, p-value $= 0.994072$|

This method does not work well. This might be due to time correlations between samples: when snapshots are dumped too often, the samples are not exactly drawn from a normal distribution. A way around this would be to prune the data set to keep only uncorrelated samples. The mixing time can be obtained from the Flyvbjerg-Petersen analysis, but automating this is a problem in itself, which essentially boils down to the third foreseen method. Other normality tests might also perform better on correlated samples.

### Linear fit

At equilibrium, thermodynamic quantities fluctuate around a constant mean. Therefore, a linear fit to an infinitely long measurement of an equilibrated quantity should yield a line of slope 0. Fitting an affine function to the data is essentially what one does when eyeballing the curve to check whether a drift is present. I use the `linregress` function from the `scipy.stats` module. The x-axis is arbitrary, and is simply taken as the index of the sample in the data set. The fit function returns both the fit parameters and their standard errors. The standard error on the slope is multiplied by a factor to obtain the confidence interval for the slope; the factor is computed from a Student's t-distribution adapted to the size of the data set. Finally, if zero does not lie within the slope error bars, the system is assumed to be out of equilibrium. The confidence level used in the error-bar determination gives the probability that the true slope lies within the error bars, and can be used to adapt the strictness of the test: the smaller the confidence level, the smaller the error bars, and thus the stricter the test. The following figures illustrate the method on the two simulations that are not correctly classified by the normality test:

|![](https://i.imgur.com/vi0E36Z.png)|![](https://i.imgur.com/zblRJbk.png)|
|:-:|:-:|
|$P = 5.2 \; k_BT\sigma^{-2}$, $\text{slope} = 0.0015 \pm 0.0028\; \sigma^2$, equilibrated|$P = 42.2 \; k_BT\sigma^{-2}$, $\text{slope} = 0.0016 \pm 0.0004\; \sigma^2$, not equilibrated|
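A minimal sketch of this slope test, assuming the measured quantity sits in a 1-D array; the function name, file name and the choice of $n-2$ degrees of freedom are illustrative, not taken from the actual analysis code.

```python
import numpy as np
from scipy import stats

def slope_test(samples, confidence=0.999):
    """Fit a line to the time series and check whether the slope is
    compatible with zero at the given confidence level."""
    x = np.arange(len(samples))               # arbitrary x-axis: sample index
    fit = stats.linregress(x, samples)
    # Half-width of the confidence interval on the slope, from a Student's
    # t-distribution with n - 2 degrees of freedom (assumed here).
    t_factor = stats.t.ppf(0.5 * (1.0 + confidence), df=len(samples) - 2)
    half_width = t_factor * fit.stderr
    equilibrated = abs(fit.slope) <= half_width   # zero lies within error bars
    return equilibrated, fit.slope, half_width

# Hypothetical usage on one recorded volume series.
volumes = np.loadtxt("volume_vs_time.dat")    # placeholder file name
ok, slope, err = slope_test(volumes)
print(f"slope = {slope:.4g} +/- {err:.4g} -> "
      f"{'equilibrated' if ok else 'not equilibrated'}")
```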
The following figures display the result of the analysis for all simulations in the two test data sets.

|![](https://i.imgur.com/l8SICO0.png)|![](https://i.imgur.com/SlaCtVZ.png)|
|:-:|:-:|
|Equation-of-state simulations for a confidence interval of $0.999$|Equation-of-state simulations for a confidence interval of $0.95$|

|![](https://i.imgur.com/oEHjvdb.png)|![](https://i.imgur.com/RxPehjo.png)|
|:-:|:-:|
|Frenkel-Ladd crystal simulations, confidence interval $0.999$|Frenkel-Ladd crystal simulations, confidence interval $0.95$|

For the equation-of-state data, the method clearly identifies that simulations at pressures above $\sim 33 \; k_BT \sigma^{-2}$ have not reached equilibrium. However, at lower pressures, some simulations are classified as not equilibrated although visual inspection suggests they are. Increasing the size of the confidence interval reduces this effect but does not suppress it. For the Frenkel-Ladd data, simulations at low spring stiffness are identified as poorly equilibrated, as expected. Visual inspection reveals that a confidence interval of $0.999$ yields more satisfying results than $0.99$ or $0.95$.

This method yields decent results, and corresponds to a fairly direct implementation of the manual process. It would however fail spectacularly in some pathological cases, such as a signal that is symmetric about the middle of the x-axis. This could occur with long-wavelength fluctuations for instance, although this should be pretty rare. Finally, plotting the p-value as a function of the slope shows poor correlation, supporting the observation that the p-value is a poor estimator of equilibration.

![](https://i.imgur.com/H5bqJEA.png)

[13/08/2021]

I tried using the [Theil-Sen estimator](https://en.wikipedia.org/wiki/Theil%E2%80%93Sen_estimator) for the slope instead of least-squares regression. This method is more robust in the sense that it is less sensitive to outliers in the data set. The results are essentially the same as those obtained with least-squares regression. The two figures below display the classification of the equation-of-state and Frenkel-Ladd data sets with a confidence interval of $0.999$.

|![](https://i.imgur.com/tkWA6s5.png)|![](https://i.imgur.com/9GIRDSi.png)|
|:-:|:-:|

### Flyvbjerg-Petersen block averages

In a time sequence of equilibrated simulation snapshots, samples are usually correlated. To measure meaningful fluctuations and errors on the calculated means, we routinely use the Flyvbjerg-Petersen block-average analysis. This works great, and allows us to detect whether the simulation was long enough to decorrelate by looking for the appearance of a plateau in the standard error on the mean computed from block averages. The plateau typically does not appear when the simulation is not equilibrated; instead, the standard error grows monotonically. The figures below display the Flyvbjerg-Petersen analysis for equation-of-state simulations at $P = 4.8 \; k_BT \sigma^{-2}$ and $P = 44.8 \; k_BT \sigma^{-2}$. The first would reasonably be classified as equilibrated by visual inspection but is rejected by the slope analysis, while the second looks equilibrated (and is classified as such by the slope analysis) but is in fact completely correlated.

|![](https://i.imgur.com/maZCYIU.png)|![](https://i.imgur.com/1Qski8q.png)|
|:-:|:-:|
|![](https://i.imgur.com/PFNBoiH.png)|![](https://i.imgur.com/z6HuQ5C.png)|
|$P = 4.8 \; k_BT \sigma^{-2}$|$P = 44.8 \; k_BT \sigma^{-2}$|

In both cases, the block-average plot provides a clear indication. This suggests that detecting the plateau in the block averages is a more reliable way to assess equilibration.
However, automatic plateau detection on curves with typically fewer than a dozen points seems tricky.
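For reference, a minimal sketch of the blocking transform on which such a detection would have to operate, assuming the samples sit in a 1-D array; the plateau is still judged by eye here, and the file name is a placeholder.

```python
import numpy as np

def block_average_errors(samples):
    """Flyvbjerg-Petersen blocking: repeatedly average neighbouring pairs of
    samples and record the naive standard error of the mean at each level.
    A plateau in the returned errors signals decorrelated blocks; monotonic
    growth suggests the run is too short or not equilibrated."""
    data = np.asarray(samples, dtype=float)
    errors = []
    while len(data) >= 2:
        n = len(data)
        # Naive standard error of the mean at this blocking level.
        errors.append(np.sqrt(np.var(data, ddof=1) / n))
        # Average neighbouring pairs to form the next blocking level
        # (an odd trailing sample is dropped).
        data = 0.5 * (data[: 2 * (n // 2) : 2] + data[1 : 2 * (n // 2) : 2])
    return np.array(errors)

# Hypothetical usage on one recorded volume series.
volumes = np.loadtxt("volume_vs_time.dat")    # placeholder file name
for level, err in enumerate(block_average_errors(volumes)):
    print(f"blocking level {level:2d}: standard error {err:.4g}")
```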