# RPSB summary
## TO-DO
- [x] (Guan) Debug the integrator to accelerate inference and potentially improve performance.
- [x] (Chenru) Reach out to Mathias Schreiner to triple-check that my IDPP implementation is correct (he used IDPP for the Transition1x dataset).
- [x] (Chenru) Fine-tune the RGD1-xtb checkpoint on Transition1x to see whether performance improves.
## Transition1x
Used in OA-ReactDiff. 13k reactions
### Dataset statistics
red: 1.16, blue: 0.94

red: 0.47, blue 0.36

Moderate correlation between the RMSD of R <> P and TS <> guess TS: 0.67 for both single- and multi-fragment reactions.
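
The statistic above can be sketched in a few lines. This is a minimal sketch, assuming the RMSDs are computed after optimal rigid alignment (Kabsch) with identical atom ordering, and that the reported value is a Pearson coefficient. The names `kabsch_rmsd` and `pearson_r` are illustrative and not taken from the project code.

```python
import numpy as np


def kabsch_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD between two (n_atoms, 3) arrays after optimal rigid alignment.

    Assumes both arrays list the atoms in the same order.
    """
    # Remove translation by centering both structures.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    # Kabsch: SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    a_rot = a @ rot.T
    return float(np.sqrt(((a_rot - b) ** 2).sum(axis=1).mean()))


def pearson_r(x, y) -> float:
    """Pearson correlation between two 1D sequences of per-reaction RMSDs."""
    return float(np.corrcoef(np.asarray(x), np.asarray(y))[0, 1])
```

Given per-reaction lists `rp_rmsd` (R vs. P) and `ts_rmsd` (TS vs. guess TS), the correlation would then be `pearson_r(rp_rmsd, ts_rmsd)`.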

### Performance
#### OA-ReactDiff

#### SB (ODE)
| Approach | RMSD mean | RMSD median | $\Delta E_{TS}$ mean | $\Delta E_{TS}$ median | data fraction |
| ---- | ---- | ---- | ---- | ---- | ---- |
| SB | 0.103 | 0.053 | 3.4 | 1.1 | 1 |
| SB (p>0.5) | 0.074 | 0.039 | 2.0 | 0.7 | 0.65 |
| SB (p>0.025) | 0.090 | 0.045 | 2.4 | 0.9 | 0.86 |
| RGD1-pretrained SB | 0.098 | 0.044 | 2.9 | 0.8 | 1 |
| RGD1-pretrained SB (p>0.5) | 0.070 | 0.033 | 1.4 | 0.6 | 0.66 |
| RGD1-pretrained SB (p>0.025) | 0.087 | 0.040 | 1.9 | 0.7 | 0.86 |
| RGD1-pretrained SB (xtb RP) | 0.105 | 0.048 | 3.28 | 0.79 | 1 |
| RGD1-pretrained SB (xtb RP, guess xtb) | 0.097 | 0.045 | 2.73 | 0.85 | 1 |

| Approach | RMSD mean | RMSD median | $\Delta E_{TS}$ mean | $\Delta E_{TS}$ median | data fraction |
| ---- | ---- | ---- | ---- | ---- | ---- |
| OA-ReactDiff | 0.129 | 0.058 | 4.4 | 1.6 | 1 |
| SB | 0.103 | 0.053 | 3.4 | 1.1 | 1 |
| RGD1-pretrained SB | 0.098 | 0.044 | 2.9 | 0.8 | 1 |
| RGD1-pretrained SB (xtb RP) | 0.105 | 0.048 | 3.28 | 0.79 | 1 |

SB with different NFE (number of function evaluations) gives similar RMSD distributions.

But the energy difference improves with increasing NFE and converges at NFE = 100.
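
As a toy illustration of why accuracy can keep improving with NFE until it saturates, here is a sketch of a fixed-step Euler ODE integrator in which each step costs one drift evaluation. The integrator actually used for SB is not specified in these notes; `euler_integrate` and `toy_drift` are hypothetical stand-ins, and the drift dx/dt = -x was chosen only because its exact solution is known.

```python
import numpy as np


def euler_integrate(drift, x0, nfe, t0=0.0, t1=1.0):
    """Integrate dx/dt = drift(x, t) from t0 to t1 using `nfe` Euler steps."""
    x = np.asarray(x0, dtype=float).copy()
    dt = (t1 - t0) / nfe
    t = t0
    for _ in range(nfe):
        # One drift evaluation per step, so the step count equals the NFE.
        x = x + dt * drift(x, t)
        t += dt
    return x


def toy_drift(x, t):
    """Hypothetical drift with exact solution x(t) = x0 * exp(-t)."""
    return -x


exact = np.exp(-1.0)
# Absolute error of the final state for increasing NFE.
errors = {
    nfe: abs(euler_integrate(toy_drift, [1.0], nfe)[0] - exact)
    for nfe in (10, 100, 1000)
}
```

In this toy case the error shrinks roughly in proportion to 1/NFE, so the gain from each additional evaluation becomes progressively smaller.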

Compared to OA-ReactDiff, SB can achieve a lower energy difference at similar RMSD.

---
## RGD1-xtb
Unpublished dataset from Qiyuan. 780k reactions
### Dataset Statistics
red: 0.94, blue: 0.94

red: 0.40, blue: 0.36

Weak correlation between the RMSD of R <> P and TS <> guess TS, 0.37 for single- and 0.51 for multi-fragment reactions.

The current IDPP guess results mostly lie on the parity line or worse. There may be bugs in my implementation.

RMSD between the IDPP and linear TS guesses.
red: 0.07, blue: 0.07
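
The linear baseline in this comparison can be sketched as follows. This is a minimal sketch, assuming R and P share the same atom ordering and are already in a common frame, and that the guess is taken at the midpoint of the path. `linear_ts_guess` and `rmsd` are illustrative names, and the plain (unaligned) RMSD is used here because both guesses live in the same frame.

```python
import numpy as np


def linear_ts_guess(r_pos, p_pos, frac=0.5):
    """Linearly interpolate Cartesian positions between R and P.

    frac=0.5 gives the midpoint, used here as the TS guess (an assumption).
    """
    r_pos = np.asarray(r_pos, dtype=float)
    p_pos = np.asarray(p_pos, dtype=float)
    return (1.0 - frac) * r_pos + frac * p_pos


def rmsd(a, b):
    """Plain RMSD between two (n_atoms, 3) arrays, without realignment."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

With `idpp_pos` as the positions of an IDPP guess for the same reaction, the quantity above would be `rmsd(idpp_pos, linear_ts_guess(r_pos, p_pos))`.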

- [x] (Qiyuan) Could you help me check whether the code below is problematic?
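```python
from ase import Atoms
from ase.neb import NEB


def idpp_guess(
    initial: Atoms,
    final: Atoms,
    n_images: int,
    interpolate: str = "idpp",
):
    """Return the positions of the middle image of an interpolated R -> P path."""
    # Generate blank images: copies of the initial state, closed by the final state.
    images = [initial.copy()]
    for _ in range(n_images - 2):
        images.append(initial.copy())
    images.append(final.copy())
    # No calculator is needed for interpolation.
    for image in images:
        image.calc = None
    neb = NEB(images)
    # Fill in the intermediate images with the chosen method ("linear" or "idpp").
    neb.interpolate(interpolate)
    # For odd n_images this index is the exact midpoint of the path.
    return neb.images[n_images // 2].arrays["positions"]
```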
### Performance
| Approach | RMSD mean | RMSD median | $\Delta E_{TS}$ mean | $\Delta E_{TS}$ median | data fraction |
| ---- | ---- | ---- | ---- | ---- | ---- |
| OA-ReactDiff | 0.17 | 0.10 | | | 1 |
| SB (ODE) | 0.094 | 0.065 | 3.9 | 1.3 | 1 |

---
## RGD1
There are two types of reactions. "Intended" refers to reactions that were initially planned, where the intrinsic reaction coordinate (IRC) calculation of the optimized TS falls back to the initial R and P. "xTB-IRC" refers to unintended reactions, where the IRC calculation yields an R and/or P that differs from the initial guess R and P.
### Dataset statistics
RMSD between R and P is similar for "Intended" and "xTB-IRC" reactions.
red: 0.89, blue: 0.90

RMSD between TS and the linear interpolation of R and P differs substantially between "Intended" and "xTB-IRC" reactions. The reason is that for "Intended" reactions, the initial R and P are of lower quality than the IRC results.
red: 0.73, blue: 0.41

The number of fragments in R/P does not influence the quality of the linear guess TS.
red: 0.59, blue: 0.55

Weak correlation between the RMSD of R <> P and TS <> guess TS, 0.23 for "xTB-IRC" and 0.46 for "Intended".

The current IDPP guess results mostly lie on the parity line or worse. There may be bugs in my implementation.
