# Retro with 4 score MCST

## Idea

- MCST
- GCN
- 4 score design

### GCN

A 1-step reaction prediction model, used in the **expansion** and **rollout** phases.

In the expansion phase, the branching factor is determined by the **top-n accuracy**.
In the rollout phase, the branching factor is 10.

### 4 score design

For each **single-step reaction**, we consider and assign four scores:

- CDS (Convergent Disconnection)
- ASS (Available Substance)
- RDS (Ring Disconnection)
- STS (Selective Transformation)

**CDS**: refers to **convergent synthesis**. To synthesize the target molecule, it is preferable to synthesize each intermediate fragment separately and then assemble the fragments into the target molecule (as opposed to linear synthesis). In practice, a reaction gains CDS score if its reactants are approximately the same size.

**ASS**: If some reactants are **available substances** (building blocks), the reaction gains some ASS score, since this potentially reduces the number of steps in the synthesis.

**RDS**: If one or more **rings** are formed, the reaction gains some score. Intuitively, if a ring is formed, the reactants tend to be simpler.

**STS**: If **few side products** are produced, the reaction is more favorable.

### MCST

Each search tree node (state) is a set of molecules.
A molecule is called *resolved* if it is a starting material (building block).
A state is called *proven* if all molecules in the state are starting materials.

The following operations are performed in order over multiple iterations. Each iteration has four phases:

- Selection
- Expansion
- Rollout (Simulation)
- Update (Back-propagate)

#### Selection

The selection policy is nearly the same as normal PUCT, **except for one additional term $K$, which is the weighted sum of the four scores**.

$$K = \frac{1}{n}\left(w_1\,\mathrm{CDS} + w_2\,\mathrm{ASS} + w_3\,\mathrm{RDS} + w_4\,\mathrm{STS}\right)$$

The selection policy picks the child state that maximizes the PUCT value plus $K$:

$$\arg\max_{State \,\in\, child(current\ State)} \left( \frac{Q}{N} + cP\frac{\sqrt{N_{-1}}}{1+N} + K \right)$$

As in standard PUCT, $Q$, $N$, and $P$ are the accumulated value, visit count, and prior probability of the child state, $N_{-1}$ is the visit count of its parent, and $c$ is the exploration constant.
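The selection rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the `Node` class, the function names, the default `c=1.0`, and the choice to treat $Q/N$ as 0 for an unvisited child are all hypothetical. The divisor `n` in $K$ is passed in by the caller, since the note does not define it.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical container for the per-state statistics used by the formula.
    q: float = 0.0      # accumulated reward Q
    n: int = 0          # visit count N
    prior: float = 1.0  # prior probability P (assumed to come from the GCN)
    k: float = 0.0      # four-score bonus K of the reaction leading to this state
    children: list = field(default_factory=list)


def four_score_bonus(scores, weights, n):
    """K = (1/n) * (w1*CDS + w2*ASS + w3*RDS + w4*STS)."""
    return sum(w * s for w, s in zip(weights, scores)) / n


def selection_value(child, parent_visits, c=1.0):
    """Q/N + c*P*sqrt(N_parent)/(1+N) + K for one child state."""
    # Assumption: an unvisited child contributes 0 for the Q/N term.
    exploit = child.q / child.n if child.n > 0 else 0.0
    explore = c * child.prior * math.sqrt(parent_visits) / (1 + child.n)
    return exploit + explore + child.k


def select_child(parent, c=1.0):
    """Return the child state with the highest selection value."""
    return max(parent.children, key=lambda ch: selection_value(ch, parent.n, c))
```

For example, with a parent visited 4 times, a visited child with `q=2, n=2, prior=0.5, k=0` scores about 1.33, while an unvisited child with `prior=0.5, k=0.5` scores 1.5 and is selected. The $K$ term can therefore push the search toward reactions with good four-score values before they have been visited.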
#### Expansion

#### Rollout

The simulation step is performed at most 5 times. At the current state, randomly select an unresolved molecule, apply the GCN with branching factor = 10, then randomly select one of the ten predicted templates.

At the end of the rollout phase, the search checks whether the current state is *proven*. The result determines a reward value $z$, given by a **reward function** $r$:

$$r(state) = \begin{cases} 10 & \text{if state is proven} \\ -1 & \text{if state is terminated} \\ \text{ratio of resolved molecules to all molecules in the state} & \text{otherwise} \end{cases}$$

#### Update

The reward $z$ is then backpropagated up the tree and used to update each node's $Q$ value.
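The rollout, reward, and update steps can be sketched as follows. This is a minimal illustration under several assumptions that are not in the note: molecules are represented as strings, `predict(molecule, k)` is a hypothetical stand-in for the GCN that returns up to `k` candidate precursor sets, a state counts as "terminated" when the model returns no candidates, and the update adds $z$ to $Q$ and increments the visit count along the path.

```python
import random
from dataclasses import dataclass

MAX_ROLLOUT_STEPS = 5   # "performed at most 5 times"
ROLLOUT_BRANCHING = 10  # branching factor in the rollout phase


def reward(state, building_blocks, terminated=False):
    """r(state): 10 if proven, -1 if terminated, else the resolved ratio."""
    resolved = [m for m in state if m in building_blocks]
    if len(resolved) == len(state):
        return 10.0
    if terminated:
        return -1.0
    return len(resolved) / len(state)


def rollout(state, building_blocks, predict, rng=random):
    """Randomly simulate from `state` and return the reward z.

    `predict(molecule, k)` is a hypothetical stand-in for the GCN.
    """
    state = set(state)
    for _ in range(MAX_ROLLOUT_STEPS):
        unresolved = sorted(m for m in state if m not in building_blocks)
        if not unresolved:
            break  # every molecule is a building block: state is proven
        target = rng.choice(unresolved)
        candidates = predict(target, ROLLOUT_BRANCHING)
        if not candidates:
            # Assumption: no applicable template means the state is terminated.
            return reward(state, building_blocks, terminated=True)
        precursors = rng.choice(candidates)
        state = (state - {target}) | set(precursors)
    return reward(state, building_blocks)


@dataclass
class NodeStats:
    q: float = 0.0  # accumulated reward Q
    n: int = 0      # visit count N


def backpropagate(path, z):
    """Add z to Q and count one visit for every node on the selected path."""
    for stats in path:
        stats.n += 1
        stats.q += z
```

With a toy model that disconnects `"T"` into the building blocks `"A"` and `"B"`, `rollout({"T"}, {"A", "B"}, predict)` reaches a proven state after one step and returns 10.0. A state with one of two molecules resolved that is neither proven nor terminated yields a reward of 0.5.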