# Run 2024-9-23 (Discarded)

* 2024-9-23
    * Start the training.
    * The version is v0.8.0.
    * learning rate = 0.005
    * batch size = 256
    * The network size is 6bx96c.
    * Train with both area scoring and territory scoring.

</br>

* 2024-9-25
    * Played 275k games.
    * Accumulated around $0.473\times 10^{8}$ 20bx256c eval queries.
    * The strength is better than Leela Zero with the [91](https://zero.sjeng.org/networks/b3b00c6d75b4e74946a97b88949307c9eae2355a88f518ebf770c7758f90e357.gz) weights. The Elo difference is +42.
        * Sayuri: ```-w current_weights -t 1 -b 1 -p 800 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
        * Leela Zero: ```-w 091.gz --noponder -v 800 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
        * Game result (played 569 games against Leela Zero):
          ```
          Name             black won  white won  total (win-rate)
          Sayuri v0.8.1          150        169      319 (56.06%)
          Leela Zero 0.17        115        135      250 (43.94%)
          ```

</br>

* 2024-9-26
    * Played 340k games.
    * Accumulated around $0.732\times 10^{8}$ 20bx256c eval queries.
    * The strength is better than Leela Zero with the [92](https://zero.sjeng.org/networks/ae205d8b957e560c19cc2bc935a8ea76d08dd9f88110ea783d50829bdca45329.gz) weights. The Elo difference is +40.
        * Sayuri: ```-w current_weights -t 1 -b 1 -p 800 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
        * Leela Zero: ```-w 092.gz --noponder -v 800 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
        * Game result (played 734 games against Leela Zero):
          ```
          Name             black won  white won  total (win-rate)
          Sayuri v0.8.1          202        207      409 (55.72%)
          Leela Zero 0.17        164        161      325 (44.28%)
          ```
    * The strength is weaker than Leela Zero with the [95](https://zero.sjeng.org/networks/5b90bd32ccc835d8e08d41970d39753e5732413d75e8f4035bebb5f1da69fb87.gz) weights. The Elo difference is -24.
        * Sayuri: ```-w current_weights -t 1 -b 1 -p 800 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
        * Leela Zero: ```-w 095.gz --noponder -v 800 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
        * Game result (played 611 games against Leela Zero):
          ```
          Name             black won  white won  total (win-rate)
          Sayuri v0.8.1          129        155      284 (46.48%)
          Leela Zero 0.17        150        177      327 (53.52%)
          ```

</br>

* 2024-9-30
    * Played 590k games.
    * Accumulated around $1.712\times 10^{8}$ 20bx256c eval queries.
    * **Halt the 6bx96c training.**
    * The strength is about the same as Leela Zero with the [110](https://zero.sjeng.org/networks/4c7dd084662bfe356e572d2dca033a8e0dd30d1542bb252083ffc899cdbcb52e.gz) weights. The Elo difference is +9.
        * Sayuri: ```-w current_weights -t 1 -b 1 -p 800 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
        * Leela Zero: ```-w 110.gz --noponder -v 800 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
        * Game result (played 1000 games against Leela Zero):
          ```
          Name             black won  white won  total (win-rate)
          Sayuri v0.8.1          239        274      513 (51.30%)
          Leela Zero 0.17        226        261      487 (48.70%)
          ```
    * The strength is weaker than Leela Zero with the [116](https://zero.sjeng.org/networks/39d465076ed1bdeaf4f85b35c2b569f604daa60076cbee9bbaab359f92a7c1c4.gz) weights. The Elo difference is -14 (the win-rate to Elo conversion is sketched after this entry).
        * Sayuri: ```-w current_weights -t 1 -b 1 -p 800 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
        * Leela Zero: ```-w 116.gz --noponder -v 800 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
        * Game result (played 1000 games against Leela Zero):
          ```
          Name             black won  white won  total (win-rate)
          Sayuri v0.8.1          227        253      480 (48.00%)
          Leela Zero 0.17        247        273      520 (52.00%)
          ```
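The Elo differences quoted in these match notes follow directly from the head-to-head win counts. A minimal sketch of that conversion, assuming the standard logistic Elo model; the helper below is illustrative and not part of Sayuri's code:

```python
import math

def elo_diff(wins: int, losses: int) -> float:
    """Elo difference implied by a head-to-head result under the
    standard logistic model: 400 * log10(p / (1 - p))."""
    p = wins / (wins + losses)
    return 400.0 * math.log10(p / (1.0 - p))

# Example: the LZ110 match above, 513 wins vs. 487 losses -> about +9 Elo.
print(round(elo_diff(513, 487)))  # 9
```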
</br>

* 2024-10-1
    * Start the 10bx128c training.
    * The current replay buffer is 112500 games.

</br>

* 2024-10-4
    * Played 720k games (10bx128c played 130k games).
    * Accumulated around $3.178\times 10^{8}$ 20bx256c eval queries.
    * The strength is better than Leela Zero with the [117](https://zero.sjeng.org/networks/ba748d402af1bdd101212cbd217025dee866b3fcc996bd16d1a3134d5591a501.gz) weights. The Elo difference is +20.
        * Sayuri: ```-w current_weights -t 1 -b 1 -p 800 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
        * Leela Zero: ```-w 117.gz --noponder -v 800 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
        * Game result (played 707 games against Leela Zero):
          ```
          Name             black won  white won  total (win-rate)
          Sayuri v0.8.1          190        184      374 (52.90%)
          Leela Zero 0.17        169        164      333 (47.10%)
          ```

</br>

* 2024-10-7
    * Played 925k games (10bx128c played 335k games).
    * Accumulated around $5.466\times 10^{8}$ 20bx256c eval queries.
    * Drop the learning rate to 0.0025 (from 0.005).
    * The current replay buffer is 131250 games.
    * The strength should be slightly better than LZ125.

</br>

* 2024-10-7
    * Played 935k games (10bx128c played 345k games).
    * Accumulated around $5.578\times 10^{8}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with the [130](https://zero.sjeng.org/networks/18e6a6c53d61359a922b36a5f3fb078aa7514cfe0050ccf3cae37f8584154d89.gz) weights. The Elo difference is +9.
        * Sayuri: ```-w current_weights -t 1 -b 1 -p 800 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
        * Leela Zero: ```-w 130.gz --noponder -v 800 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
        * Game result (played 1000 games against Leela Zero):
          ```
          Name             black won  white won  total (win-rate)
          Sayuri v0.8.1          242        271      513 (51.30%)
          Leela Zero 0.17        229        258      487 (48.70%)
          ```

</br>

* 2024-10-11
    * Played 1140k games (10bx128c played 550k games).
    * Accumulated around $7.886\times 10^{8}$ 20bx256c eval queries (see the normalization sketch after this entry).
    * **Halt the 10bx128c training.**
    * The strength is better than Leela Zero with the [135](https://zero.sjeng.org/networks/05b723d0c9e1c10abf18e9ba3e55c378de42e364cfe8a506d425f2af8ca204e8.gz) weights. The Elo difference is +15.
        * Sayuri: ```-w current_weights -t 1 -b 1 -p 800 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
        * Leela Zero: ```-w 135.gz --noponder -v 800 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
        * Game result (played 1000 games against Leela Zero):
          ```
          Name             black won  white won  total (win-rate)
          Sayuri v0.8.1          246        276      522 (52.20%)
          Leela Zero 0.17        224        254      478 (47.80%)
          ```
    * The strength is weaker than Leela Zero with the [140](https://zero.sjeng.org/networks/521b086873ab560bbd34bab3b5d98a30efd5a4292e4c589aca3d9a3d03165ecf.gz) weights. The Elo difference is -35.
        * Sayuri: ```-w current_weights -t 1 -b 1 -p 800 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
        * Leela Zero: ```-w 140.gz --noponder -v 800 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
        * Game result (played 546 games against Leela Zero):
          ```
          Name             black won  white won  total (win-rate)
          Sayuri v0.8.1          106        140      246 (45.05%)
          Leela Zero 0.17        133        167      300 (54.95%)
          ```
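The "20bx256c eval queries" figures throughout this log put the self-play compute of the different net sizes on a common scale. A rough sketch of one way to do that conversion, under the assumption that a query's cost scales with blocks × channels² (Sayuri's exact accounting may differ; the function below is illustrative only):

```python
def to_20bx256c_queries(raw_queries: float, blocks: int, channels: int) -> float:
    """Express raw eval queries of a (blocks x channels) net in
    20bx256c-equivalent queries, assuming cost ~ blocks * channels^2."""
    return raw_queries * (blocks * channels ** 2) / (20 * 256 ** 2)

# Example: 1e9 raw queries from the 10bx128c net would count as about
# 1.25e8 20bx256c-equivalent queries.
print(to_20bx256c_queries(1e9, 10, 128))  # 125000000.0
```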
</br>

* 2024-10-15
    * Start the 15bx192c training.
    * The current learning rate is 0.0025.
    * The current replay buffer is 168750 games.

</br>

* 2024-10-20
    * Played 1320k games (15bx192c played 180k games).
    * Accumulated around $1.4773\times 10^{9}$ 20bx256c eval queries.
    * The strength is better than Leela Zero with the [155](https://zero.sjeng.org/networks/834f35fa55ef2f46d40021b2cffd8f30a7f389e7664ff38aaf039e4c2e17265c.gz) weights. The Elo difference is +31.
        * Sayuri: ```-w current_weights -t 1 -b 1 -p 800 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
        * Leela Zero: ```-w 155.gz --noponder -v 800 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
        * Game result (played 570 games against Leela Zero):
          ```
          Name             black won  white won  total (win-rate)
          Sayuri v0.8.1          146        164      310 (54.39%)
          Leela Zero 0.17        121        139      260 (45.61%)
          ```
    * The strength is about the same as Leela Zero with the [157](https://zero.sjeng.org/networks/d351f06e446ba10697bfd2977b4be52c3de148032865eaaf9efc9796aea95a0c.gz) weights. The Elo difference is +3.
        * Sayuri: ```-w current_weights -t 1 -b 1 -p 800 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
        * Leela Zero: ```-w 157.gz --noponder -v 800 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
        * Game result (played 615 games against Leela Zero):
          ```
          Name             black won  white won  total (win-rate)
          Sayuri v0.8.1          135        175      310 (50.41%)
          Leela Zero 0.17        132        173      305 (49.59%)
          ```

![sayuri-elo-2024-10-20](https://hackmd.io/_uploads/HJGe7fVgye.png)

</br>

* 2024-10-24
    * Played 1445k games (15bx192c played 305k games).
    * Full Visits = 400 (from 150)
    * Fast Visits = 125 (from 50)
    * Gumbel Visits = 55 (from 32)
    * Resign Visits = 75 (from 55)

</br>

* 2024-11-15
    * Played 1965k games (15bx192c played 825k games).
    * Accumulated around $5.569\times 10^{9}$ 20bx256c eval queries.
    * The strength is still weaker than ELFv0; the best weights reach only a 46% win-rate. The progress is poor, so we decided to halt the 15bx192c training.
    * We dropped the learning rate as follows (summarized in the sketch below):
        * 15bx192c played 365k games | at 2024-10-27 | drop lr to ```0.002```
        * 15bx192c played 475k games | at 2024-10-31 | drop lr to ```0.00125```
        * 15bx192c played 730k games | at 2024-11-11 | drop lr to ```0.001```
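Putting the drops above together, the 15bx192c learning-rate schedule reduces to a simple step function keyed by the number of 15bx192c self-play games. This is only a summary of the values listed above, not code from the training pipeline:

```python
def lr_15bx192c(games: int) -> float:
    """Step learning-rate schedule for the 15bx192c stage, keyed by the
    number of 15bx192c self-play games (values from the notes above)."""
    if games < 365_000:
        return 0.0025   # inherited from the 10bx128c stage
    if games < 475_000:
        return 0.002    # dropped on 2024-10-27
    if games < 730_000:
        return 0.00125  # dropped on 2024-10-31
    return 0.001        # dropped on 2024-11-11
```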