# Main Run 2024-3-21

* 2024-3-21
    * Start the training.
    * The version is v0.7.0.
    * learning rate = 0.005
    * batch size = 256
    * The network size is 6bx96c.
    * **Area Scoring** only (T^T). Maybe the next run will play both **Area Scoring** and **Territory Scoring**.

</br>

* 2024-3-25
    * Played 475k games.
    * The loss is unstable. Drop the learning rate to 0.0025 (from 0.005).

</br>

* 2024-3-27
    * Played 590k games.
    * Accumulated around $1.707 \times 10^{8}$ 20bx256c eval queries.
    * The strength is between LZ091 and LZ092. I will update the match games later.
    * Halt the 6bx96c training.
    * The loss is still unstable after playing another 100k games. Maybe 6bx96c cannot handle the 19x19 board well? Here is the whole policy loss curve. You can see that the loss values oscillate at around 450000 steps and again at around 680000 steps.
    * ![policy-loss-6bx96c](https://hackmd.io/_uploads/SynfJBZkA.png)

</br>

* 2024-3-28
    * Start the 10bx128c training.
    * learning rate = 0.005
    * batch size = 256
    * current replay buffer is 150000 games.

</br>

* 2024-3-30
    * Played 695k games (10bx128c played 105k games).
    * Accumulated around $3.061 \times 10^{8}$ 20bx256c eval queries.
    * The strength is better than Leela Zero with the [116](https://zero.sjeng.org/networks/39d465076ed1bdeaf4f85b35c2b569f604daa60076cbee9bbaab359f92a7c1c4.gz) weights. The Elo difference is +53.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w 116.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 400 games with Leela Zero):
      ```
      Name              black won    white won    total (win-rate)
      Sayuri v0.7.0     124          106          230 (57.50%)
      Leela Zero 0.17    94           76          170 (42.50%)
      ```

</br>

* 2024-4-3
    * Played 905k games (10bx128c played 315k games).
    * Drop the learning rate to 0.0025 (from 0.005).

</br>

* 2024-4-5
    * Played 1040k games (10bx128c played 450k games).
    * Accumulated around $7.509 \times 10^{8}$ 20bx256c eval queries.
    * Halt the 10bx128c training.
    * The strength is between LZ116 and LZ117.

</br>

* 2024-4-8
    * Start the 15bx192c training.
    * learning rate = 0.0025
    * batch size = 256
    * current replay buffer is 200000 games.

</br>

* 2024-4-9
    * Played 1095k games (15bx192c played 55k games).
    * Accumulated around $9.905 \times 10^{8}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with the [127](https://zero.sjeng.org/networks/3f6c8dd85e888bec8b0bcc3006c33954e4be5df8a24660b03fcf3e128fd54338.gz) weights. The Elo difference is -6.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w 127.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 596 games with Leela Zero):
      ```
      Name              black won    white won    total (win-rate)
      Sayuri v0.7.0     143          150          293 (49.16%)
      Leela Zero 0.17   148          155          303 (50.84%)
      ```

</br>
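The Elo differences quoted in these match reports are consistent with the standard logistic conversion from the overall win rate (57.50% is roughly +53, 49.16% is roughly -6). Below is a minimal Python sketch of that conversion; it is a hypothetical helper for reading the tables, not part of Sayuri or its match tool.

```python
import math

def elo_difference(win_rate: float) -> float:
    """Elo gap implied by a head-to-head win rate under the logistic Elo model."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))

# Examples from the tables above.
print(round(elo_difference(0.5750)))  # vs LZ116: about +53
print(round(elo_difference(0.4916)))  # vs LZ127: about -6
```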
* 2024-4-12
    * Test a new network structure, mixer convolution: a transformer-like block without attention. The mixer's policy loss and WDL loss seem significantly better than the ResNet's. Each network is 6bx128c and trained on 150k games.
    * ![ploss](https://hackmd.io/_uploads/SJ8A47Ix0.png)
    * ![wloss](https://hackmd.io/_uploads/B1qTEmUlA.png)
    * [shengkelong](https://github.com/shengkelong) reported that he did not get good results with a transformer: the performance is bad and the relations in the attention map are a mess. One possible solution he proposed is Global Pooling or [Patch Embedding](https://arxiv.org/abs/2010.11929), which compresses the spatial size. But it is high risk. Think about why AlphaGo never uses Global Pooling: it may lose the local information needed by the policy head.
    * Another solution is to remove the attention mechanism, like [MLP-Mixer](https://arxiv.org/abs/2105.01601). My current version uses depthwise convolution instead of token-mixing. You may see related works [here](https://arxiv.org/abs/2203.06717) and [here](https://arxiv.org/abs/2201.09792). Note that we don't use any Patch Embedding. A rough sketch of this kind of block is shown after the 2024-5-6 entry below.
    * Relative performance: Residual (Winograd) > Mixer > Residual (Im2col).

</br>

* 2024-4-19
    * Played 1380k games (15bx192c played 285k games). Accumulated around $2.826 \times 10^{9}$ 20bx256c eval queries. The current weights' strength should be between LZ130 and LZ135. We lack GPUs now, so I only release the weights and do not provide a full match result.
    * Drop the learning rate to 0.002 (from 0.0025) because the last 80k games did not bring significant improvement. I am not really sure, so I only change it a little.
    * Test the 10b mixer network performance. After playing around 400 games, the mixer's win rate is around 54%. I think we get some benefit from the new structure and will add mixer blocks in the next training step. However, there are still some problems:
        * Although the value head performance of the SWA weights is good, the value head is unstable during the training process.
        * The evals per second are slower than conv3x3 (Winograd).

</br>

* 2024-5-6
    * Played 1640k games (15bx192c played 600k games).
    * Accumulated around $5.9118 \times 10^{9}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with the [151](https://zero.sjeng.org/networks/672342b58e62910f461cce138b8186b349539dbe98807bf202ab91a72b19d0c7.gz) weights. The Elo difference is +28.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w 151.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 686 games with Leela Zero):
      ```
      Name              black won    white won    total (win-rate)
      Sayuri v0.7.0     175          196          371 (54.08%)
      Leela Zero 0.17   147          168          315 (45.92%)
      ```
    * The strength is about the same as Leela Zero with the [154](https://zero.sjeng.org/networks/7ff174e6ecc146c30ad1d6fe60a7089eacee65dfe4cce051bf24e076abfc0b68.gz) weights. The Elo difference is -2.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w 154.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 773 games with Leela Zero):
      ```
      Name              black won    white won    total (win-rate)
      Sayuri v0.7.0     174          210          384 (49.68%)
      Leela Zero 0.17   176          213          389 (50.32%)
      ```
    * Learning rate changes:
        * Drop the learning rate to 0.0016 (from 0.002) when playing 1435k games. (2024-4-25)
        * Drop the learning rate to 0.00125 (from 0.0016) when playing 1545k games. (2024-5-1)
    * After the longer training, the pure mixer block does not perform as well as the residual block. Although the mixer block has a large receptive field, counter-intuitively it is not good at the life and death of large dragons. We tried some hybrid structures and found that residual blocks help with this.

<div id="sayuri-art" align="center">
    <br/>
    <h3>ownership of mixer-block</h3>
    <img src="https://hackmd.io/_uploads/rJPVVHLfA.png" alt="mixer-view" width="384"/>
    <h3>ownership of residual-block</h3>
    <img src="https://hackmd.io/_uploads/SygxSrIGC.png" alt="residual-view" width="384"/>
    <h3>ownership of hybrid-block (residual + mixer)</h3>
    <img src="https://hackmd.io/_uploads/HkgAHH8GC.png" alt="hybrid-view" width="384"/>
</div>
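As referenced in the 2024-4-12 entry, here is a rough PyTorch sketch of the kind of depthwise-convolution mixer block described there: a large-kernel depthwise convolution takes the place of MLP-Mixer's token-mixing MLP, and 1x1 convolutions act as the channel MLP. The block name, kernel size, expansion factor, and normalization choices are my own illustrative assumptions, not Sayuri's exact implementation.

```python
import torch
import torch.nn as nn

class DepthwiseMixerBlock(nn.Module):
    """Illustrative mixer block: a depthwise conv mixes spatially (the
    token-mixing step) and pointwise 1x1 convs mix channels, each wrapped
    in a residual connection."""
    def __init__(self, channels: int, kernel_size: int = 7, expansion: int = 2):
        super().__init__()
        # Spatial mixing: one large-kernel depthwise convolution per channel.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size,
                      padding=kernel_size // 2, groups=channels),
            nn.BatchNorm2d(channels),
        )
        # Channel mixing: pointwise convolutions acting like a per-point MLP.
        self.channel = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels * expansion, channels, 1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.spatial(x)   # token-mixing step
        x = x + self.channel(x)   # channel-mixing step
        return x

if __name__ == "__main__":
    block = DepthwiseMixerBlock(channels=128)
    out = block(torch.randn(1, 128, 19, 19))
    print(out.shape)  # torch.Size([1, 128, 19, 19])
```

Because the depthwise convolution mixes spatially while keeping the full 19x19 layout, no Patch Embedding is required and the policy head keeps its local information.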
</br>

* 2024-5-10
    * Played 1710k games (15bx192c played 670k games).
    * Remove the KLD weighting for the optimistic policy.

</br>

* 2024-5-11
    * Played 1720k games (15bx192c played 680k games).
    * Accumulated around $6.6862 \times 10^{9}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with the [157](https://zero.sjeng.org/networks/d351f06e446ba10697bfd2977b4be52c3de148032865eaaf9efc9796aea95a0c.gz) weights. The Elo difference is +10.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w 157.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 914 games with Leela Zero):
      ```
      Name              black won    white won    total (win-rate)
      Sayuri v0.7.0     213          257          470 (51.42%)
      Leela Zero 0.17   200          244          444 (48.58%)
      ```

</br>

* 2024-5-17
    * Played 1840k games (15bx192c played 800k games).
    * Accumulated around $7.8433 \times 10^{9}$ 20bx256c eval queries.
    * Now use the SWA weights for the self-play games.
    * Drop the learning rate to 0.001 (from 0.00125) when playing 1825k games. (2024-5-16)

</br>

* 2024-5-19
    * Update the experimental executable and mixer-block weights [here](https://drive.google.com/drive/folders/1wH3pdEOHq1DNYuSvQbRK4q_ly3WgjXNa?usp=drive_link). Please read the ```README.txt``` file first. Compared with the baseline, the new net is about +70 Elo better than the ResNet-only version.

</br>

* 2024-5-21
    * Played 1915k games (15bx192c played 875k games).
    * Accumulated around $8.557 \times 10^{9}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with the [173](https://zero.sjeng.org/networks/33986b7f9456660c0877b1fc9b310fc2d4e9ba6aa9cee5e5d242bd7b2fb1b166.gz) weights. The Elo difference is -6.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w 173.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 1000 games with Leela Zero):
      ```
      Name              black won    white won    total (win-rate)
      Sayuri v0.7.0     222          270          492 (49.20%)
      Leela Zero 0.17   230          278          508 (50.80%)
      ```

</br>

* 2024-5-22
    * Played 1935k games (15bx192c played 895k games).
    * Accumulated around $8.747 \times 10^{9}$ 20bx256c eval queries.
    * Halt the 15bx192c training.
    * Some future work:
        * Based on Kobayashi's reply, a Gumbel-based model has difficulty learning to pass under Territory Scoring. My current implementation shows the same result. I guess the main reason is Sequential Halving: the agent refuses to pass because the win rate of pass is 0%, so it thinks other moves have a better win rate.
        * Version 4 weights support other activation functions, like Mish or Swish.
        * Improve my GTP match tool. Hiroshi Yamashita suggested that I follow floodgate's design, but that is not what I need. I also tried the CGOS implementation, but the performance is not better. Maybe BayesElo is my next choice.

</br>

* 2024-5-30
    * Start the 20bx256c training.
    * learning rate = 0.0005
    * batch size = 128
    * cPUCT = 0.5 (from 1.25); a sketch of the selection rule this constant enters is shown after this entry.
    * current replay buffer is 300000 games.
    * Please use this [version](https://github.com/CGLemon/Sayuri/releases/tag/dev-2024-6-2) or later for the last 20b network.

</br>
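For context on the cPUCT change above: in an AlphaZero-style search, cPUCT scales the exploration bonus that the prior policy and visit counts add on top of a move's value estimate, so lowering it from 1.25 to 0.5 makes selection greedier with respect to Q. Below is a minimal sketch of the textbook PUCT rule; Sayuri's actual selection formula, utility terms, and defaults may differ.

```python
import math

def puct_score(q: float, prior: float, parent_visits: int,
               child_visits: int, cpuct: float) -> float:
    """Textbook AlphaZero-style PUCT: value estimate plus an exploration
    bonus scaled by cpuct, the prior, and the visit counts."""
    exploration = cpuct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + exploration

# With a smaller cpuct the bonus shrinks, so the search trusts Q sooner
# and spends fewer playouts on low-prior, low-visit moves.
print(puct_score(q=0.52, prior=0.10, parent_visits=400, child_visits=30, cpuct=1.25))
print(puct_score(q=0.52, prior=0.10, parent_visits=400, child_visits=30, cpuct=0.50))
```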
* 2024-6-1
    * Played 1975k games (20bx256c played 40k games).
    * Accumulated around $9.518 \times 10^{9}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with the [174](https://zero.sjeng.org/networks/c9d70c413e589d338743bfa83783e23f378dc0b9aa98940a2fbd1d852dab8781.gz) weights. The Elo difference is +8.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w 174.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 606 games with Leela Zero):
      ```
      Name              black won    white won    total (win-rate)
      Sayuri v0.7.0     149          161          310 (51.16%)
      Leela Zero 0.17   142          154          296 (48.84%)
      ```
    * The strength is better than Leela Zero with the [ELFv0](http://zero.sjeng.org/networks/62b5417b64c46976795d10a6741801f15f857e5029681a42d02c9852097df4b9.gz) weights. The Elo difference is +60.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w ELFv0.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 400 games with Leela Zero):
      ```
      Name              black won    white won    total (win-rate)
      Sayuri v0.7.0     105          129          234 (58.50%)
      Leela Zero 0.17    71           95          166 (41.50%)
      ```

</br>

* 2024-6-14
    * Played 2210k games (20bx256c played 275k games).
    * Accumulated around $1.4005 \times 10^{10}$ 20bx256c eval queries.
    * Drop the learning rate to 0.0003 (from 0.0005).
    * Progress is slow. It looks like the Gumbel effect cannot help the current weights.

</br>

* 2024-6-17
    * Played 2250k games (20bx256c played 315k games).
    * We double the playouts/visits for self-play (from 400 to 800).
    * A really weird result: the current weights have already reached ELFv1 level, yet they are weaker than LZ190. Based on [Computer Go Rating](https://github.com/breakwa11/GoAIRatings), ELFv1 should be stronger than LZ190 (see the cross-check after the 2024-6-20 entry below).

</br>

* 2024-6-20
    * Played 2285k games (20bx256c played 350k games).
    * Accumulated around $1.5722 \times 10^{10}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with the [190](https://zero.sjeng.org/networks/ef09cd530927e16599add3b4fc3215a37dce265296ccbb1f377669b3c469e60b.gz) weights. The Elo difference is -10.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w 190.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 581 games with Leela Zero):
      ```
      Name              black won    white won    total (win-rate)
      Sayuri v0.7.0     124          158          282 (48.54%)
      Leela Zero 0.17   132          167          299 (51.46%)
      ```
    * The strength is about the same as Leela Zero with the [ELFv1](http://zero.sjeng.org/networks/d13c40993740cb77d85c838b82c08cc9c3f0fbc7d8c3761366e5d59e8f371cbd.gz) weights. The Elo difference is +16.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w ELFv1.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 446 games with Leela Zero):
      ```
      Name              black won    white won    total (win-rate)
      Sayuri v0.7.0      95          138          233 (52.24%)
      Leela Zero 0.17    85          128          213 (47.76%)
      ```

![sayuri-elo](https://hackmd.io/_uploads/By6vMLXLA.png)

</br>
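A quick cross-check of the weird result noted on 2024-6-17: if Elo differences composed transitively, the two head-to-head results above would place ELFv1 roughly 26 Elo below LZ190 through Sayuri, the opposite ordering from the public rating list. A back-of-the-envelope sketch under that simplifying assumption (logistic model plus transitivity):

```python
import math

def elo_difference(win_rate: float) -> float:
    """Elo gap implied by a head-to-head win rate (logistic model)."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))

sayuri_vs_lz190 = elo_difference(0.4854)   # about -10
sayuri_vs_elfv1 = elo_difference(0.5224)   # about +16

# (Sayuri - ELFv1) - (Sayuri - LZ190) = LZ190 - ELFv1
print(round(sayuri_vs_elfv1 - sayuri_vs_lz190))  # about 26
```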
* 2024-6-27
    * Played 2385k games (20bx256c played 450k games).
    * Accumulated around $1.8455 \times 10^{10}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with the [ELFv2](http://zero.sjeng.org/networks/05dbca157002b9fd618145d22803beae0f7e4015af48ac50f246b9316e315544.gz) weights. The Elo difference is -2.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w ELFv2.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 876 games with Leela Zero):
      ```
      Name              black won    white won    total (win-rate)
      Sayuri v0.7.0     194          242          436 (49.77%)
      Leela Zero 0.17   196          244          440 (50.23%)
      ```
    * Compared with ELF OpenGo, our engine reduces the computation by around 250 times, surpassing KataGo g104's 50 times.
    * I think I will keep this run going because I am interested in the Gumbel issue on the later 20b network. However, can I afford the computation? Actually, I don't get **any** resources or help from my professor (```our Lab's budget becomes the professor's personal bonus :-)```).

</br>

* 2024-7-8
    * Played 2525k games (20bx256c played 590k games).
    * Accumulated around $2.2215 \times 10^{10}$ 20bx256c eval queries.
    * Fix the target policy issue, but it seems we don't get the benefit in the current games. Will explain it later.
    * In order to maximize strength before the UEC cup, drop the learning rate again.
        * learning rate = 0.00015 (from 0.0003)
        * batch size = 128
        * current replay buffer is 325000 games.

</br>

* Summary
    * Halt this run after UEC16 because we are busy with features for the next run, including a slight improvement of the network, fixing the Japanese-like rules, fixing the target policy, and policy surprise sampling.
    * We mention a disadvantage of the completed-Q target policy (part of Gumbel AlphaZero). Looking at the formula, the target policy is $P_{\text{target}}(a)=\text{Softmax}(P_{\text{logit}}(a) + \sigma(Q(a)))$, where $\sigma(\cdot)$ can be **any** monotonic function. In practice its scale is proportional to the total number of visits, which means we may trust Q too much in high-visit conditions. For example, the target policy becomes essentially one-hot once the number of visits exceeds 800, which makes the output policy too sharp. What's more, based on Yuki Kobayashi's results, it is hard to gain any benefit once the number of visits is over 200. Our solution is switching back to the AlphaZero target policy. (A toy numeric illustration of this sharpening effect is given at the end of this summary.)
    * The mixer block is not a novel network structure for the game of Go. For example, [Maru](https://drive.google.com/file/d/1H_yoY-dxkF0-PrGyeb0f2NozW6R3ri5h/view) at UEC16 also used the same idea to improve performance. But based on my experience, these transformer-like and transformer-inspired modules do not work well for Go, because (1) the speed (evaluations per second) is terrible, and (2) these modules may lose information. To overcome these disadvantages, (1) we remove as many conv 1x1 layers as possible and use a customized CUDA implementation instead of cuDNN, and (2) we only add the mixer blocks at the tail of the tower.
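To make the completed-Q sharpening described above concrete, here is a toy version of the target-policy formula, with a $\sigma$ whose scale simply grows linearly with the visit count. The logits, Q values, and the scale constant are made-up stand-ins for illustration, not Sayuri's or Gumbel AlphaZero's actual numbers.

```python
import numpy as np

def target_policy(logits, q, total_visits, c_scale=1.0):
    """Toy completed-Q target: softmax(logits + sigma(Q)), where sigma's
    scale grows with the total visit count. c_scale is a stand-in constant."""
    z = np.asarray(logits) + c_scale * total_visits * np.asarray(q)
    z = z - z.max()              # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = np.array([1.0, 0.8, 0.5])     # made-up prior policy logits
q      = np.array([0.52, 0.51, 0.50])  # made-up value estimates

print(target_policy(logits, q, total_visits=50))   # roughly [0.58, 0.29, 0.13]
print(target_policy(logits, q, total_visits=800))  # roughly [1.00, 0.00, 0.00]
```

With the same small Q gaps, the target stays spread out at low visit counts but collapses to a near one-hot distribution at 800 visits, which is the over-sharpening described in the summary.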