# LambdaZero Analysis
#### Next steps/ideas:
- [ ] Compare GFlow with Best RL - TOP100 scores - (same fixed proxy, same NN architecture, same evaluation metrics & steps)
- [ ] Improve DecisionNetworks accuracy (architecture, noise on conditioning score)
- [ ] Improve DecisionNetworks test set evaluation
- [ ] Compare quality of molecule comparison based on dockscore vs per-molecule dockscore prediction (proxy)
- [ ] Improve RL exploration (e.g. maximize diversity vs count-based intrinsic reward; population with different greediness levels; decouple the exploration policy?)
- [ ] RL auxiliary tasks (especially for the method with molecule proposal only on action stop)
#### Previous work:
<details>
<summary>DecisionNetworks (Dockscore conditioned networks)</summary>
Training: Sample trajectories ending in a molecule with a known dockscore. Learn to classify the action taken at each transition, given the molecule at that transition and the final dockscore (the stop action is the target once the trajectory reaches the molecule with the conditioned dockscore).
Evaluate: Sample trajectories starting from different blocks, conditioned on different dockscores (e.g. [-16, -14]).
Testing ideas like: (1) a loss coefficient based on dockscore, (2) distance to the target molecule, (3) class balancing for the stop action, (4) trajectory sampling balanced by dockscore.
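A minimal sketch of one training step under this setup, assuming a hypothetical `model(mol_state, final_dockscore)` that returns action logits with stop included as one class (names and shapes are assumptions, not the exact LambdaZero code):
```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, transitions):
    # transitions: list of (mol_state, action_taken, final_dockscore) tuples
    # sampled from trajectories that end in a molecule with known dockscore.
    losses = []
    for mol_state, action_taken, final_dockscore in transitions:
        # Condition the action classifier on the trajectory's final dockscore.
        logits = model(mol_state, torch.tensor([final_dockscore]))  # (num_actions,)
        losses.append(F.cross_entropy(logits.unsqueeze(0),
                                      torch.tensor([action_taken])))
    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```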
Related work: [DecisionTransformer](https://arxiv.org/abs/2106.01345)
[Report Draft](https://wandb.ai/lambdazero/cond/reports/DecisionNetworks--Vmlldzo4MjczNDc?accessToken=lvo8w64a6kq22poujwssr3cz5vjvn40lcb3hpc0ied8s0fw5bd99klvcc6bnt13g)
</details>
<details>
<summary>Evaluating LZ - training RL with proxy (1)</summary>
Evaluate training RL (PPO) with scores from a fixed proxy network that predicts dockscore.
Conclusion: although different setups help with their intended purpose, e.g. more new molecules produced (A) or higher training reward (B), the improvements do not transfer well to the "true objective". The most significant impact comes from changing the proxy scores used for training RL. (C)
(D) Evaluated the "true objective" (min/mean of the best new feasible molecules) for different experiments (e.g. different R, hyperparameters, proxy). New best dockscore: -15.9
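A minimal sketch of the reward wiring in this setup, assuming a frozen `proxy_net` that predicts dockscore from a molecule graph (names hypothetical, not LambdaZero's exact API):
```python
import torch

def proxy_reward(mol_graph, proxy_net):
    # Frozen proxy: no gradients, parameters never updated during RL training.
    with torch.no_grad():
        pred_dockscore = proxy_net(mol_graph)
    # Lower (more negative) dockscore is better, so negate it for the reward.
    return -float(pred_dockscore)
```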
[Report](https://wandb.ai/lambdazero/an_rlbo/reports/Evaluating-LZ-training-RL-with-proxy--Vmlldzo3OTYxNjY?accessToken=4366ivnvbzkbfpejtjb1xmpm33icmc3z5uayog5w8jv4p9rfucwvdm0gbrrszn3k)
</details>
<details>
<summary>NN architecture search for best action prediction</summary>
Evaluate the performance of different neural networks for classifying the best action (~RL policy networks). Target label: the action that produces the best-scoring next-state molecule. (Train on proxy score; evaluate on a proxy-score validation set and an oracle-dockscore test set.)
Important distinction vs evaluating performance on a regression problem: in this setup the architectures predict multiple values using per-atom NN heads.
Conclusions: significant performance differences between architectures. Very poor generalization to oracle dockscore given the proxy used. A roughly significant difference from filtering training molecules (by qed & synth).
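A rough sketch of the per-atom-head idea mentioned above (hypothetical names and shapes, not the exact LambdaZero architecture): instead of one regression output per molecule, each atom embedding gets a shared linear head over the candidate blocks, and the flattened logits form the per-action prediction.
```python
import torch
import torch.nn as nn

class PerAtomHead(nn.Module):
    def __init__(self, embed_dim, num_blocks):
        super().__init__()
        # One linear head shared across atoms: one value per (atom, block) pair.
        self.head = nn.Linear(embed_dim, num_blocks)

    def forward(self, atom_embeddings):
        # atom_embeddings: (num_atoms, embed_dim) from a GNN encoder
        logits = self.head(atom_embeddings)  # (num_atoms, num_blocks)
        return logits.flatten()              # one logit per attach-action
```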
[Report](https://wandb.ai/andreic/predact/reports/Generalization-per-atom-prediction--Vmlldzo3NjgzOTY?accessToken=i2zz9hnpkwim641chhzg4a5pvv64o9u7qgznz4edios3bt8t2rvk8490uhotb0g0)
</details>
<details>
<summary>1 step dataset setup</summary>
#### Env/Data:
- random_steps: 3
- max_steps: 1
- allow_removal: true
- 1087 unique start states - with oracle_dockscore & qed & synth calculated for all next states given all possible actions
- 914 of them have at least 1 good next-action candidate molecule (qed > 0.35, synth > 4.0)
$r = \mathrm{satlins}(qed, 0.2, 0.5) \cdot \mathrm{satlins}(synth, 0, 4) \cdot \mathrm{proxy\_dockscore}(dockscore)$
```python
import math
import numpy as np

def act_y(x, x_shift=-1., y_shift=1., epsilon=math.e):
    # Shifted activation: linear above the shift point, exponential below it.
    x += x_shift
    act = y_shift + x if x > 0 else y_shift + (epsilon ** x - 1)
    return act

def proxy_dockscore(dock_score):
    mean, std = -8.6, 1.1  # dataset dockscore mean/std
    # Missing dockscores fall back to the dataset mean.
    proxy_dock = mean if np.isnan(dock_score) else dock_score
    proxy_dock = (mean - proxy_dock) / std  # lower (better) dockscore -> higher value
    proxy_dock = act_y(proxy_dock)
    return proxy_dock

def satlins(x, cutoff0, cutoff1):
    # Saturating linear: 0 below cutoff0, 1 above cutoff1, linear in between.
    x = (x - cutoff0) / (cutoff1 - cutoff0)
    x = min(max(0.0, x), 1.0)
    return x
```
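For reference, a quick usage example combining the three terms into the reward $r$ defined above (the input values are illustrative, not from the dataset):
```python
# Example: combined reward for one candidate molecule.
qed, synth, dockscore = 0.42, 4.5, -10.2
r = satlins(qed, 0.2, 0.5) * satlins(synth, 0, 4) * proxy_dockscore(dockscore)
print(r)  # ~1.07: qed term ~0.73, synth term saturates at 1.0, dock term ~1.45
```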
</details>
<details>
<summary>1 step dataset analysis</summary>
- analysis [colab](https://colab.research.google.com/drive/1MPfDJBKLIkwcZ3tXi60cDE52K1C2ysK0?usp=sharing)
- Best-action block_idx has a skewed distribution (histogram below)
Block_idx histogram for best action

</details>
<details>
<summary>1 horizon RL</summary>
- RL on a random state set with oracle dockscore - evaluated on the test set (914 molecules)
- RL using the proxy pipeline - evaluated on the test set (914 molecules)
- Compare networks
- Test scaling efficiency when increasing the number of training seeds. Overfit the 1-step horizon?
Observations:
- MPNN seems to struggle to converge to the correct max-R action (~60%), whereas an MLP directly on the mol_graph observation converges
- *Preliminary*: more training seeds lead to lower scores
- Saturation seems highly correlated with very low entropy
- PPO parameters might be problematic (e.g. *30 outer loops / sets of 4800 states collected - 128 batch size - Adam*); see the config sketch after this list
- Maybe this is not a problem when the train set size is very big? - although GNNs seem biased on block_idx
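A hedged guess at how the PPO parameters quoted above map onto RLlib-style config keys; the mapping is an assumption, not taken from the experiment code:
```python
# Assumed RLlib-style PPO settings matching the parameters quoted above.
ppo_config = {
    "train_batch_size": 4800,    # states collected per outer training iteration
    "sgd_minibatch_size": 128,   # minibatch size per SGD step
    "num_sgd_iter": 30,          # "outer loops" over each collected batch
}
```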
[Plots egnn vs mpnn fixed set](https://wandb.ai/andreic/rlbo4/reports/RLBO-evaluation-1-step---Vmlldzo2ODI1OTQ?accessToken=5709n1nsth26i4v9idz1fex00bx77e7rrqjhdwgkqajp0ff8vsyrrajsorawswpb)
</details>
<details>
<summary>Classification for best action 1 step dataset</summary>
### Test Network(s) supervised for classification on max R
*Input*: unpacked mol_graph; *output*: 2107 "labels" (actions)
(Preliminary) observations:
- needs ~600 epochs / ~2000 batches / ~500k datapoints to reach max ~96% accuracy
- *levels*: for EGNN it is important to have more than 6; for MPNNet, above 8 makes little difference
- *batch size*: very important, should be > 128

[hyperparameters search experiments](https://wandb.ai/andreic/lztest/reports/Snapshot-May-12-2021-12-10am--Vmlldzo2ODI2NTE?accessToken=4voknby2j0lperx215029mhu37srrvlxdddjmrs00wd0xfz3cnqbf5nxxdvx1u27)
Experiment configuration
```yaml
criterion: CrossEntropy
optim: adam  # default
lr: 0.001
batch_size: 256
train_split: 0.8  # 870/217
num_epochs: 100000
hidden_size: 128
levels: 12
eval_freq: 50
net: "EGNNetRLBO"  # alternative: "MPNNet_v2"
seed: 13
```
</details>
<details>
<summary>Beginning of internship work</summary>
[hackmd](https://hackmd.io/@MMEV-hspTH-8FS7P3Zj1wQ/S1q9bpWUu)
</details>