# 0L Validator (Re-)Onboarding Instructions

Note: The 0L network experienced instability in late April and early May 2022.

Acknowledgements: This information was compiled from the 0L validator Discord channels and from my troubleshooting work with @sirouk, who generously contributed hours of late-night time as well as the example validator config file and the Appendix instructions. Please contact me @0olo0 with any questions and/or edits.

Many validators fell out of the active set. A new admittance process was implemented to increase network stability by:

1 - Gradually increasing the active set. The validator set can increase by 15% per epoch.

2 - Only admitting validators with a proven history and a current commitment to actively participate in the project.

This document is a running set of instructions maintained by 0olo0#0242 to help validators who have fallen out of the active set get re-admitted. I'm documenting my own process to catch up and come back online. Note that this is my interpretation of the information I've gathered while trying to re-enter the active set.

Please let me know in the 0L [#validator-onboarding](https://discord.gg/35rjR4Yy) Discord channel if you'd like to see any edits made to this document, or leave a comment in the document using the comment function.

## 1 - Preparing the validator and VFN node

Restart your validator and VFN node using the latest software version and config files.
At the time of this writing on May 9, 2022, the latest:

- Software version is ```release-v5.1.1```
- Config files (validator and VFN) are generated using release-v5.1.1, adding [these parameters](https://discordapp.com/channels/833074824447655976/916199092789600276/972243142424285194) if they are not already in the config file:

```
mempool:
  capacity: 100
  capacity_per_user: 1
  default_failovers: 3
  max_broadcasts_per_peer: 1
  mempool_snapshot_interval_secs: 180
  shared_mempool_ack_timeout_ms: 20000
  shared_mempool_backoff_interval_ms: 3000
  shared_mempool_batch_size: 100
  shared_mempool_max_concurrent_inbound_syncs: 10
  shared_mempool_tick_interval_ms: 5000
  system_transaction_timeout_secs: 600
  system_transaction_gc_interval_ms: 500
```

[Here](https://hackmd.io/EL5WOlrrRi2jYOwRthKYbg?view) is an example of a good ``validator.node.yaml`` file.

## 2 - Qualifying and Getting Back Into the Active Set

Up to three new validators are admitted at each epoch change. Epoch changes happen at approximately 23:45 UTC and can be tracked [here](https://0l.interblockcha.in/).

Here's what's required to get back into the active set:

1 - Proofs: Generate more than 8 proofs in the previous epoch and be one of the three validators with the highest number of proofs generated in the previous epoch.

2 - Tower Height: Be one of the three validators with the highest tower height among validators who have met the proofs and vouches (see next) criteria.

3 - Vouches: Have received 4 vouches from unrelated towers in the previous epoch. Check your vouches using ```ol -a <YOUR_VALIDATOR_ADDRESS> query -r```. [See this example](https://hackmd.io/-Zi5b2HTT46FxHGWW5nSlw) to get an idea of what existing validators may look for when deciding whether or not to vouch for you.

4 - Voting Power: If all the above criteria are met, validators will be admitted in order of descending voting power.
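The two numeric thresholds above (proofs and vouches) can be checked mechanically once you have read your own counts from the explorer or from the vouch query. The following is a minimal sketch, not part of the 0L tooling: `meets_thresholds` is a hypothetical helper name, the counts are typed in by hand, and reading "4 vouches" as "at least 4" is my assumption. It does not cover the ranking criteria (top three by proofs and tower height) or the voting power ordering.

```shell
# Hypothetical helper, not part of the 0L tooling.
# Usage: meets_thresholds <proofs_in_previous_epoch> <vouches_received>
# Returns 0 only if both numeric thresholds from the list above are met.
meets_thresholds() {
  mt_ok=0
  # Proofs: strictly more than 8 in the previous epoch.
  if [ "$1" -gt 8 ]; then
    echo "proofs: ok ($1 > 8)"
  else
    echo "proofs: need more than 8, have $1"
    mt_ok=1
  fi
  # Vouches: assumed to mean at least 4 (assumption, see lead-in).
  if [ "$2" -ge 4 ]; then
    echo "vouches: ok ($2 >= 4)"
  else
    echo "vouches: need at least 4, have $2"
    mt_ok=1
  fi
  [ "$mt_ok" -eq 0 ]
}
```

For example, `meets_thresholds 12 3` reports that proofs are fine but vouches are short, and returns a non-zero status.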
Track your validator here: ``https://0l.interblockcha.in/address/<YOUR_VALIDATOR_ADDRESS>``

## 3 - Re-entering the Active Set

Assuming your validator has met the above conditions, check the [validator list here](https://0l.interblockcha.in/validators). If you are listed as Active, restart your validator in validator (rather than fullnode) mode shortly after the epoch transition.

## Appendix - Additional Troubleshooting

If you have trouble restarting in validator mode and/or find yourself not voting, consider trying the following steps.

### A note on syncing

If you need to resync your validator or VFN node, the fastest way is to sync in fullnode mode first and then switch to validator or VFN mode. Check sync status using ``ol health``. If you see a negative sync value, repeat the command until you see a positive, hopefully descending, sync value. The negative value could come from getting a response from a node with a bad state sync.

### switch off everything (node, tower, monitor)

    pkill -f diem-node
    pkill -f tower
    pkill -f 'ol serve'

### make sure you have all the binaries (answer y if needed)

    apt-get update && apt-get upgrade

### the repo should be cloned but just in case

    cd ~
    git clone https://github.com/OLSF/libra.git
    cd ~/libra/

### check https://github.com/OLSF/libra/tags for the latest release and use it in place of release-v5.1.1

    git fetch --all; git checkout -f release-v5.1.1; git reset --hard origin/release-v5.1.1; git clean -f; git pull
    make bins install

### update the monitor too

    ol serve --update
    rm ~/.0L/monitor_cache.json

### backup database (or you can remove it completely)

    mv ~/.0L/db ~/.0L/db-bak

### start up the db with a snapshot (will be updated by restore_version bash below)

    ol restore

### now overwrite with the latest snapshot

For this to work, the previous epoch (not the current one) should be listed here: https://github.com/OLSF/epoch-archive

    curl -s https://raw.githubusercontent.com/OLSF/epoch-archive/main/restore_version.sh | DURING_EPOCH=188 bash

If that doesn't work, try:

    cd ~
    wget https://raw.githubusercontent.com/OLSF/epoch-archive/main/restore_version.sh
    chmod +x restore_version.sh
    DURING_EPOCH=188 ./restore_version.sh

### bootstrap the DB

    cd ~/libra/language/diem-tools/writeset-transaction-generator && make tx

### start node (fullnode cfg if not synced recently, otherwise validator cfg), tower, and monitor

- confirm port 6178 is open to anyone on the validator firewall
- waypoint on monitor should show up and no longer be null
- sync up and switch to validator mode, votes should climb if validator is in the set
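The port check in the last step can be partly done from the node itself. The sketch below assumes a Linux host with `ss` from iproute2 and that the local address is the fourth column of `ss -ltn` output. `port_is_listening` is a hypothetical helper, not an `ol` command. It only tells you whether something is listening locally on the port; whether the firewall actually lets outside peers reach it still has to be tested from another machine.

```shell
# Hypothetical helper, not part of the 0L tooling.
# Usage: ss -ltn | port_is_listening 6178
# Reads a listening-socket table on stdin and returns 0 if any
# local address (4th column) ends in ":<port>".
port_is_listening() {
  awk -v suffix=":$1" '
    {
      addr = $4
      # Compare the tail of the address field with ":<port>".
      if (length(addr) >= length(suffix) &&
          substr(addr, length(addr) - length(suffix) + 1) == suffix) {
        found = 1
      }
    }
    END { exit !found }
  '
}

# Example (run on the validator host):
#   ss -ltn | port_is_listening 6178 && echo "6178 has a local listener"
```

Matching on the full `:<port>` suffix keeps a port such as 16178 from being mistaken for 6178.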