# Test Run 2024 (Jan.–Feb.)
I tested some candidate options. Each candidate run plays around 130k–150k games. Each match consists of 400 games against the baseline (default settings).
<br/>
### 1. Apply a policy temperature to the Gumbel approximate Q
The original approximate Q is $Q_{\text{approx}} = \sum_{b:\,N(b) \neq 0} P(b)\,Q(b)$, summing over the children that have been visited. We apply a temperature ($t = 4.0$) to the policy before computing it. This method was inspired by shengkelong's test run. I am surprised that it can improve the performance; I have no idea why it works.
~~Win-Rate: 59%~~
After fixing some behavior and self-playing around 0.5M games on 9x9, I don't see a significant improvement, so I removed this function.
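The weighted sum above can be sketched as follows. This is a minimal illustration, assuming the temperature is applied as $P(b)^{1/t}$ with renormalization; the function name and array layout are my own, not Sayuri's actual code:

```python
import numpy as np

def approx_q(policy, q_values, visits, t=4.0):
    """Approximate Q over visited children, with the prior policy
    flattened by a temperature t (P ** (1/t), renormalized)."""
    p = np.asarray(policy, dtype=float) ** (1.0 / t)
    p /= p.sum()
    q = np.asarray(q_values, dtype=float)
    visited = np.asarray(visits) > 0  # only children with N(b) != 0
    if not visited.any():
        return 0.0
    return float(np.sum(p[visited] * q[visited]))
```

With $t > 1$ the renormalized prior becomes flatter, so low-prior but visited children contribute relatively more to the approximate Q.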
<br/>
### 2. Use the optimistic policy
The optimistic policy was introduced by KataGo, but KataGo never uses it for self-play. In my short test run it looked good. However, after longer training it introduces some bias. Not a good idea.
~~Win-Rate: 65%~~
<br/>
### 3. Forbid random moves during the fast search phase
Tested this feature in v0.6.1. The original playout cap randomization performs the fast searches without any exploration settings; since v0.6.1 I have added some randomization during the fast search phase. It looks like the random moves improve the strength.
Win-Rate: 42.25%
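For context, playout cap randomization roughly works as sketched below: a small fraction of self-play moves get a full search (recorded as policy targets), the rest get a fast search, and the tweak tested here allows occasional random exploration moves during the fast searches. All probabilities, playout counts, and names are placeholders, not Sayuri's actual settings:

```python
import random

def pick_search_config(full_prob=0.25, full_playouts=600,
                       fast_playouts=100, fast_random_prob=0.1):
    """Playout cap randomization with optional random moves in fast searches.

    Full searches are used as policy training targets; fast searches only
    advance the game. fast_random_prob > 0 re-enables some exploration
    during fast searches instead of disabling it entirely.
    """
    if random.random() < full_prob:
        return {"playouts": full_playouts,
                "record_policy": True,
                "random_move": False}
    return {"playouts": fast_playouts,
            "record_policy": False,
            "random_move": random.random() < fast_random_prob}
```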
<br/>
### 4. Optimistic policy with KLD weighting
See the [code](https://github.com/CGLemon/Sayuri/commit/e0d3a11cf8ecc247f827c183c2e11bc242991187).
This method is inspired by [Policy Surprise Weighting](https://github.com/lightvector/KataGo/blob/master/docs/KataGoMethods.md#policy-surprise-weighting). I tested this method several times; every run shows a similar improvement.
~~Win-Rate: 81.75%~~
After longer training on 19x19, I found that the optimistic policy with KLD weighting is worse than the normal optimistic policy.
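The general idea of KLD (policy-surprise) weighting can be sketched as below: weight each training sample by the KL divergence between the search policy target and the network's raw policy, so positions where the search disagreed most with the network count more. The function name and smoothing constant are my own, not the linked commit's code:

```python
import numpy as np

def kld_sample_weight(target_policy, net_policy, eps=1e-8):
    """KL(target || net): how surprised the network was by the search result.

    Typically normalized across a batch before being used as a
    per-sample training weight.
    """
    t = np.asarray(target_policy, dtype=float) + eps
    n = np.asarray(net_policy, dtype=float) + eps
    t /= t.sum()
    n /= n.sum()
    return float(np.sum(t * np.log(t / n)))
```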
<br/>
### 5. Large batch size
Use a larger batch size for the training process. We also adjust the learning rate and the steps per epoch, making sure the number of samples per epoch and the per-sample learning rate are equal to the small-batch case. It looks like we do not need the large batch size.
Win-Rate: 4.5%
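The adjustment described above (keep the per-sample learning rate and the samples per epoch fixed) reduces to linear scaling of the learning rate and inverse scaling of the step count; a hypothetical helper:

```python
def scale_for_batch(base_lr, base_batch, base_steps_per_epoch, new_batch):
    """Scale the learning rate linearly with batch size and the steps per
    epoch inversely, so per-sample learning rate and samples per epoch
    stay equal to the small-batch configuration."""
    scale = new_batch / base_batch
    return base_lr * scale, round(base_steps_per_epoch / scale)
```

For example, going from batch 256 to batch 1024 multiplies the learning rate by 4 and divides the steps per epoch by 4.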
<br/>
### 6. Territory scoring rule
For the next UEC Cup, I implemented territory scoring (a.k.a. the Japanese-like rule). After fixing some bugs, it looks like the network can understand this rule.
Because the next UEC Cup is close, I am afraid I won't have time to check the optimistic policy ideas. The next main run will start soon.
<br/>