# [Retro with EGN](https://arxiv.org/pdf/2112.06028.pdf)
## Objective:
Develop (multi-step) synthetic routes to a target molecule.
## Previous work
- [3N-MCTS](https://www.nature.com/articles/nature25978.pdf)
- [Depth-First-Proof-Number-Search](https://proceedings.neurips.cc/paper/2019/file/4fc28b7093b135c21c7183ac07e928a6-Paper.pdf) (DFPNS)
- [RL-based approach](https://pubs.acs.org/doi/pdf/10.1021/acscentsci.9b00055)
- [Transformer](https://chemrxiv.org/engage/api-gateway/chemrxiv/assets/orp/resource/item/60c7417af96a00cd49286446/original/a-transformer-model-for-retrosynthesis.pdf)
- [Retro*](https://arxiv.org/pdf/2006.15820.pdf) (A* search)
- Retro*+ (improvement over previous approach)
- [Self-Improved Retro*](https://arxiv.org/pdf/2106.04880.pdf)
## Later work
- [RetroGraph](https://arxiv.org/pdf/2206.11477.pdf) (graph search instead of tree search)
- [Improving one-step retrosynthesis](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-022-00594-8)
The approaches above all utilize a **search-tree** approach to tackle this problem.
The key question is how to define (or build) the **score function**.
Aside from DFPNS, which uses a human-defined score function, all the others use score functions given by **neural networks**.
## Downsides of previous approaches
- Depends heavily on existing **single step reaction database**
- **Exploitation over Exploration**
(Once a successful route is found through a reaction template $T_2$, the search will increase the probability of selecting $T_2$, reducing the probability of exploring $T_1$, which may lead to the optimal route.)
## Idea in this article
- Based on **MCTS** and a **neural network**, with a design similar to **AlphaGo Zero**.
- Incorporate **failed experience** into the training of NN to build better score function.

For example, in the figure above, the path through $T_{A3}, T_{O1}, T_{P1}$ leads to the compound $S$, which is not a commercially available building block. Thus it is a failed experience.
## Tree Design
Even though the algorithm is similar to the one in AlphaGo Zero, the game tree cannot be applied directly to this problem.
### Data Structure
In the retrosynthesis problem, instead of a game tree, we use an **AND-OR tree**.
Specifically,
- **OR node** represents a **molecule**
- **AND node** represents a **reaction template**.
### State of node
Each node is either successful, unsuccessful, or undecided.
A molecule is successful (i.e., it can be synthesized) if
- it is a building block, or
- one of its child reaction nodes is successful.
A reaction is successful if all of its child molecule nodes are successful.
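The AND-OR success rules above can be sketched with two small node classes. The class names and fields here are illustrative, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ReactionNode:  # AND node
    children: list = field(default_factory=list)  # child MoleculeNodes

    def is_successful(self, building_blocks):
        # A reaction succeeds only if ALL of its reactant molecules succeed.
        return all(m.is_successful(building_blocks) for m in self.children)

@dataclass
class MoleculeNode:  # OR node
    smiles: str
    children: list = field(default_factory=list)  # child ReactionNodes

    def is_successful(self, building_blocks):
        # A molecule succeeds if it is a building block,
        # or ANY of its child reactions succeeds.
        if self.smiles in building_blocks:
            return True
        return any(r.is_successful(building_blocks) for r in self.children)
```

Note the asymmetry: `all(...)` on AND (reaction) nodes versus `any(...)` on OR (molecule) nodes, which is exactly the success rule stated above.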
### Value of node
Each molecule node has a $V$ value, and each reaction node has a $Q$ value.
(Both $Q$ and $V$ can be seen as the **score** or **probability** of that node.)
$$Q, V \in [0, 1]$$
$Q$ and $V$ are used to decide which node to select during the selection phase.
## Neural Network
There are 2 kinds of neural networks in this work.
### Single step reaction prediction neural network
$S(·)$ is a single-step retrosynthesis model.
Given a molecule $M$, $S(·)$ returns the $k$ most probable *single-step* reaction templates $T_i$ that synthesize $M$, along with their probabilities $P_i$.
$$S(M) = [(T_i, P_i)]_{i=1}^{k}$$
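The interface of $S(\cdot)$ can be illustrated with a stub. The template names and probabilities below are invented for demonstration; a real model would be a learned policy network:

```python
# Illustrative stub of the single-step model interface S(·).
# The molecule names, templates, and probabilities are made up.
def S(molecule, k=2):
    # A real model ranks reaction templates by a learned policy;
    # here we return a fixed toy ranking for demonstration.
    toy_predictions = {
        "aspirin": [("esterification", 0.7), ("acylation", 0.2)],
    }
    # Return the k most probable (template, probability) pairs,
    # or an empty list if the model proposes no reaction.
    return toy_predictions.get(molecule, [])[:k]
```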
### Experience-Guided Neural network (EGN)
The EGN, $f_θ$, is used when building the Monte-Carlo tree. Specifically, when a new reaction node is expanded, $f_θ$ assigns an **initial** $Q$ value to it.
When tree construction is done, data are collected from the tree and fed back to train $f_θ$.
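A minimal sketch of the EGN interface: any model that maps features of an $(m, T)$ pair to a value in $[0, 1]$ fits. The linear-plus-sigmoid "network" below is a placeholder, not the paper's architecture:

```python
import math

# Placeholder EGN: a linear layer followed by a sigmoid.
# `features` would be some featurization of the (molecule, template)
# pair; `weights` are the learned parameters. Both are assumptions.
def egn(features, weights):
    z = sum(w * x for w, x in zip(weights, features))
    # The sigmoid keeps the predicted initial Q inside [0, 1].
    return 1.0 / (1.0 + math.exp(-z))
```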
## Retrosynthesis Algorithm
The overview: we repeatedly construct Monte-Carlo trees to generate training data for the EGN, and train the EGN on these data to minimize its loss function.
Once the EGN is trained, we build a final Monte-Carlo tree for the target molecule using the same procedure used during training. We then perform BFS on the tree to collect all successful routes to the target molecule.
```python
def retro_synthesis(m, B, S):
    '''
    m: target molecule
    B: set of building blocks
    S: single-step model used to expand nodes in the Monte-Carlo tree
    '''
    M_train = get_molecules()
    M_validation = get_molecules()
    # Construct search trees for M_train over and over again
    # and collect data (experience) from the trees to train the EGN
    f = train_egn(M_train, M_validation, B, S)
    # Construct the search tree for the target molecule
    T = EG_MCTS_planning(m, B, S, f)
    # Perform BFS to collect all successful routes from the tree
    reaction_list = []
    queue = [m]  # start the BFS from the target molecule
    while queue:
        product = queue.pop(0)
        for rxn in child_reaction_nodes(T, product):
            if is_successful(rxn):
                for reactant in get_reactants(rxn, product):
                    reaction_list.append((reactant, product))
                    if reactant not in B:
                        queue.append(reactant)
    return reaction_list
```
## Monte-Carlo Tree Construction
The search tree is constructed using the procedure **EG-MCTS-planning**.
The construction closely follows the one used in AlphaGo Zero, with a few differences.
There are 3 phases in each round of construction.
- Selection
- Expansion
- Update (Back-Propagation)
### Selection
Starting from the root node, we recursively select and traverse to a child node until a leaf node is reached.
At a molecule node $m$, we select a child reaction node $T$ according to the [**PUCT**](https://reurl.cc/mogDr1) policy.
$$T^* = \arg \max_{T \in\ child(m)}{\left(\bar Q(m, T) + cP(m, T) \frac{\sqrt {N(m)}}{1 + N(T)}\right)}$$
where
$N(T)$ = number of times $T$ has been visited
$N(m)$ = number of times the parent molecule node $m$ has been visited
$\bar Q(m, T)$ = average of the $Q$ values of $T$ (each time $T$ is updated, it receives a new $Q$ value)
$P(m, T)$ = probability assigned to $T$ by $S(·)$
$c$ = hyper-parameter balancing exploration against exploitation
The reaction node with **maximum** score is selected.
When we are at a reaction node, we select an unexpanded child molecule node; if there is no such child, we select one that has not yet been proven successful.
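The PUCT-style selection rule can be sketched as follows, assuming $\bar Q$ is the average $Q$ value of each child reaction node. The child statistics in the usage example are hypothetical:

```python
import math

# `children` is a hypothetical list of (reaction, Q_bar, P, N) tuples;
# `n_parent` is the visit count of the molecule node being selected from.
def select_reaction(children, n_parent, c=1.0):
    def puct(entry):
        rxn, q_bar, p, n = entry
        # Exploitation term (average Q) plus exploration bonus,
        # which is large for high-prior, rarely visited reactions.
        return q_bar + c * p * math.sqrt(n_parent) / (1 + n)
    return max(children, key=puct)[0]
```

With a large $c$, an under-visited high-prior reaction wins the bonus term; shrinking $c$ shifts the choice back toward the reaction with the best average $Q$, which is exactly the exploration/exploitation trade-off controlled by the hyper-parameter.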
### Expansion
When selection ends at a leaf molecule node, $S(·)$ is applied to expand it.
If no child node is expanded (i.e., $S(·)$ gives no reaction), the selected molecule node is marked unsuccessful.
Otherwise, each new reaction node is given an initial $Q$ value by the EGN $f_θ$ and a $P$ value by $S(·)$.
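A sketch of the expansion step, using a hypothetical node layout (a dict per reaction node holding the template, its $P$ value, the list of $Q$ values, and the visit count $N$):

```python
# `S` proposes (template, probability) pairs; `egn` assigns the initial
# Q value. Both callables and the dict layout are illustrative.
def expand(molecule, S, egn):
    proposals = S(molecule)
    if not proposals:
        return None  # no reaction found: the node is marked unsuccessful
    # Each new reaction node stores its template, the prior P from S,
    # a running list of Q values seeded by the EGN, and a visit count.
    return [{"template": t, "P": p, "Q": [egn(molecule, t)], "N": 0}
            for t, p in proposals]
```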
### Update
**Back-propagate** up the tree and update each node traversed.
At a molecule node $m$, first check whether it is successful.
If it is undecided, it takes the largest average $Q$ value among its child reaction nodes as its $V$ value.
$$V(m) = \max_{T \in child(m)}{\bar Q(m, T)}$$
At a reaction node $T$, we first update its visit count $N(T)$. Then $T$ is given a new $Q$ value from a reward function $R(m, T)$, so $T$ holds $(N(T) + 1)$ $Q$ values (the $1$ stands for the initial $Q$ given by the EGN).
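The update step can be sketched on the same kind of hypothetical node layout (dicts per node; the reward value is illustrative):

```python
# Reaction node update: bump the visit count and append the new Q
# value from the reward function, so the node holds N + 1 Q values
# (the extra one is the initial Q from the EGN).
def update_reaction(node, reward):
    node["N"] += 1
    node["Q"].append(reward)

# Molecule node update (undecided case): take the best average child Q
# as the node's V value.
def update_molecule(node):
    node["V"] = max(sum(c["Q"]) / len(c["Q"]) for c in node["children"])
```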
## Experience/Data Collection
Given a constructed tree, we collect
$$D = \{ {((m, T), \bar Q)} \}$$
as training data for our EGN.
**In fact, each $((m, T), \bar Q)$ can be seen as a measure of how good it is to apply the reaction template $T$ to obtain $m$.**
**So whether or not it was part of a successful route to the target molecule in a previous search tree, it helps the neural network learn and make better predictions. (That is the virtue of being experience-guided.)**
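A sketch of turning a finished tree into training pairs, with a squared-error loss as one plausible training objective; the nested-dict tree layout (molecule → template → list of $Q$ values) is hypothetical:

```python
# Collect ((m, T), Q_bar) pairs from every reaction node in the tree,
# successful or not: failed experience is kept on purpose.
def collect_experience(tree):
    data = []
    for m, reactions in tree.items():
        for T, q_values in reactions.items():
            q_bar = sum(q_values) / len(q_values)
            data.append(((m, T), q_bar))
    return data

# One plausible regression loss for training the EGN against Q_bar.
def mse_loss(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
```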