
Eth2 Clients Experiment Summary

Executions and tests performed

Versions

Prysm 2.0.6
Lighthouse 2.1.4
Teku 22.3.2
Nimbus 1.6.0
Lodestar 0.34.0
Grandine 0.2.0 (several beta versions)

Configurations

All machines were monitored using Prometheus Node Exporter and a custom Python script.
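
As a rough sketch (not the exact script we used), the custom Python sampler could look like the one below, assuming it relies on psutil and appends one CSV row per interval; the output path, sampling period and monitored disk path are placeholders:

import csv
import os
import time

import psutil

INTERVAL_S = 60                      # sampling period (assumption)
OUT_FILE = "host_metrics.csv"        # output path (assumption)
FIELDS = ["timestamp", "cpu_percent", "mem_used_bytes",
          "disk_used_bytes", "net_rx_bytes", "net_tx_bytes"]

def sample():
    # Take one snapshot of host resource usage.
    mem = psutil.virtual_memory()
    disk = psutil.disk_usage("/")    # monitored mount point (assumption)
    net = psutil.net_io_counters()
    return {
        "timestamp": int(time.time()),
        "cpu_percent": psutil.cpu_percent(interval=1),
        "mem_used_bytes": mem.used,
        "disk_used_bytes": disk.used,
        "net_rx_bytes": net.bytes_recv,
        "net_tx_bytes": net.bytes_sent,
    }

if __name__ == "__main__":
    new_file = not os.path.exists(OUT_FILE)
    with open(OUT_FILE, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        while True:
            writer.writerow(sample())
            f.flush()
            time.sleep(INTERVAL_S)
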
All of the machines were connected to an already synced geth running on a separate machine. This geth instance was the same for all clients and experiments.
The only exception is the Kiln experiment, as the Kiln guide describes the need to deploy a geth client on the same machine.
https://notes.ethereum.org/@launchpad/kiln
To deploy geth in the Kiln machines, we have used the following command:

./go-ethereum/build/bin/geth --datadir geth-datadir --http --http.api='engine,eth,web3,net,debug' --http.corsdomain '*' --networkid=1337802 --syncmode=full --authrpc.jwtsecret=/tmp/jwtsecret --bootnodes enode://c354db99124f0faf677ff0e75c3cbbd568b2febc186af664e0c51ac435609badedc67a18a63adb64dacc1780a28dcefebfc29b83fd1a3f4aa3c0eb161364cf94@164.92.130.5:30303 --override.terminaltotaldifficulty 20000000000000

During our experiment, the Kiln network suffered an incident in which many miners entered the network, which would have made the merge happen earlier than planned. To avoid this, Kiln node operators were asked to override the terminal total difficulty so that the merge could happen at the scheduled time.
Therefore, in some cases we had to add an additional flag, while in others just updating the config file was enough.

Prysm

Default sync (used for standard machine, fat node and Raspberry Pi)

config.yaml:

monitoring-host: 0.0.0.0
http-web3provider: http://XX.XX.XXX.XXX:8545/
slots-per-archive-point: 2048
All-topics

We have added subscribe-all-subnets: true to the configuration file.

Archival mode

We have changed the slots-per-archive-point parameter to 64.

Kiln

Following the Kiln guide, our configuration was the following:

bazel run //beacon-chain -- \
--genesis-state $PWD/../genesis.ssz \
--datadir $PWD/../datadir-prysm  \
--http-web3provider=/home/crawler/kiln/merge-testnets/kiln/geth-datadir/geth.ipc  \
--execution-provider=/home/crawler/kiln/merge-testnets/kiln/geth-datadir/geth.ipc  \
--chain-config-file=$PWD/../config.yaml \
--bootstrap-node=enr:-Iq4QMCTfIMXnow27baRUb35Q8iiFHSIDBJh6hQM5Axohhf4b6Kr_cOCu0htQ5WvVqKvFgY28893DHAg8gnBAXsAVqmGAX53x8JggmlkgnY0gmlwhLKAlv6Jc2VjcDI1NmsxoQK6S-Cii_KmfFdUJL2TANL3ksaKUnNXvTCv1tLwXs0QgIN1ZHCCIyk \
--jwt-secret=/tmp/jwtsecret \
--monitoring-host 0.0.0.0

Lighthouse

Default sync (used for standard machine, fat node and Raspberry Pi)
lighthouse bn --http --metrics --metrics-address 0.0.0.0 --eth1-endpoints http://XX.XX.XXX.XXX:8545/ --slots-per-restore-point 2048 --datadir /mnt/diskChain/.lighthouse/mainnet
All-topics

We have added the parameter --subscribe-all-subnets to the execution command.

Archival mode

We have changed --slots-per-restore-point to 64.

Kiln

Following the Kiln guide, the execution command is as follows:

lighthouse \
	--spec mainnet \
	--network kiln \
	--debug-level info \
	beacon_node \
	--datadir ./testnet-lh1 \
	--eth1 \
	--http \
	--http-allow-sync-stalled \
	--metrics --metrics-address 0.0.0.0 \
	--merge \
	--execution-endpoints http://127.0.0.1:8551 \
	--enr-udp-port=9000 \
	--enr-tcp-port=9000 \
	--discovery-port=9000 \
	--jwt-secrets=/tmp/jwtsecret

Teku

After speaking to the developer team, we were advised to configure the JVM memory allocation.
This was done using the following command:

export JAVA_OPTS="-Xmx5g -Xms5g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$CLIENT_BASE_DIR/heap_data"
Default sync (used for standard machine, fat node and Raspberry Pi)

config.yaml:

network: "mainnet"
eth1-endpoint: ["http://51.79.142.201:8545/"]
metrics-enabled: true
rest-api-docs-enabled: true
metrics-port: 8007
p2p-port: 9001
data-storage-archive-frequency: 2048
metrics-interface: "0.0.0.0"
metrics-host-allowlist: ["*"]
rest-api-enabled: true
rest-api-host-allowlist: ["*"]
rest-api-interface: "0.0.0.0"
rest-api-port: 5051
All-topics

We have added the option p2p-subscribe-all-subnets-enabled.

Archival mode

We have removed the data-storage-archive-frequency parameter from the configuration file.
We have added the option data-storage-mode: "archive" in the configuration file.

Kiln

Following the Kiln guide, the execution command is:

./teku/build/install/teku/bin/teku \
	--data-path datadir-teku \
	--network config.yaml \
	--p2p-discovery-bootnodes enr:-Iq4QMCTfIMXnow27baRUb35Q8iiFHSIDBJh6hQM5Axohhf4b6Kr_cOCu0htQ5WvVqKvFgY28893DHAg8gnBAXsAVqmGAX53x8JggmlkgnY0gmlwhLKAlv6Jc2VjcDI1NmsxoQK6S-Cii_KmfFdUJL2TANL3ksaKUnNXvTCv1tLwXs0QgIN1ZHCCIyk \
	--ee-endpoint http://localhost:8551 \
	--Xee-version kilnv2 \
	--rest-api-enabled true --metrics-enabled=true --metrics-host-allowlist=* --metrics-interface=0.0.0.0 \
	--validators-proposer-default-fee-recipient=0x2Ad2f1999A99F6Af12D4634e2C88a0891c3013e8 \
	--ee-jwt-secret-file /tmp/jwtsecret \
	--log-destination console

Nimbus

Default sync (used for standard machine, fat node and Raspberry Pi)

The execution command is:

run-mainnet-beacon-node.sh --web3-url="http://XX.XX.XXX.XXX:8545/" --metrics-address=0.0.0.0 --metrics --tcp-port=9002 --udp-port=9003 --num-threads=4 --data-dir=/home/crawler/.nimbus-db/
All-topics

We have added the parameter --subscribe-all-subnets to the execution command.

Archival mode

There is no parameter to adjust the number of slots between stored states.

Kiln

Following the Kiln guide, the execution command is as follows:

nimbus-eth2/build/nimbus_beacon_node \
    --network=./ \
    --web3-url=ws://127.0.0.1:8551 \
    --rest --validator-monitor-auto \
    --metrics --metrics-address=0.0.0.0 --data-dir=./nimbus-db \
    --log-level=INFO \
    --jwt-secret=/tmp/jwtsecret

Lodestar

Default sync (used for standard machine, fat node and Raspberry Pi)
sudo docker run -p 9596:9596 -p 8006:8006 -p 9005:9005 -v /mnt/diskBlock/lodestar:/root/.local/share/lodestar/ chainsafe/lodestar:v0.34.0 beacon --network mainnet --metrics.enabled --metrics.serverPort=8006 --network.localMultiaddrs="/ip4/0.0.0.0/tcp/9005" --network.connectToDiscv5Bootnodes true --logLevel="info" --eth1.providerUrls="http://XX.XX.XXX.XXX:8545/" --api.rest.host 0.0.0.0
All-topics

We have added the parameter --network.subscribeAllSubnets true to the execution command.

Archival mode

There is no documented archival mode for Lodestar.

Kiln

Following the Kiln guide, the execution command is as follows:

./lodestar beacon --rootDir=../lodestar-beacondata --paramsFile=../config.yaml --genesisStateFile=../genesis.ssz  --eth1.enabled=true --execution.urls=http://127.0.0.1:8551 --network.connectToDiscv5Bootnodes --network.discv5.enabled=true --jwt-secret=/tmp/jwtsecret --network.discv5.bootEnrs=enr:-Iq4QMCTfIMXnow27baRUb35Q8iiFHSIDBJh6hQM5Axohhf4b6Kr_cOCu0htQ5WvVqKvFgY28893DHAg8gnBAXsAVqmGAX53x8JggmlkgnY0gmlwhLKAlv6Jc2VjcDI1NmsxoQK6S-Cii_KmfFdUJL2TANL3ksaKUnNXvTCv1tLwXs0QgIN1ZHCCIyk --metrics.enabled --metrics.serverPort=8006

Grandine

During our Grandine experiments we were provided with several executables, each implementing different functionality.
Although all of them belong to version 0.2.0, several executables (different beta versions) were used.

Default sync (used for standard machine, fat node and Raspberry Pi)

The execution command is as follows:

grandine-0.2.0 --metrics --archival-epoch-interval 64 --eth1-rpc-urls http://XX.XX.XXX.XXX:8545/ --http-address 0.0.0.0 --network mainnet
All-topics

We have added the --subscribe-all-subnets parameter to the execution command.

Archival mode

We have changed the --archival-epoch-interval parameter from 64 to 2.

Kiln

The execution command:

sudo docker run --name grandine_container -v /home/crawler/.grandine:/root/.grandine -v /tmp/jwtsecret:/tmp/jwtsecret --network=host sifrai/grandine:latest grandine --eth1-rpc-urls http://localhost:8551/ --network kiln --jwt-secret=/tmp/jwtsecret --keystore-dir /root/.grandine/keys --keystore-password-file /root/.grandine/secrets

Tests performed

We have executed each client in default sync mode (standard machine, fat node and Raspberry Pi), in all-topics mode, and on the Kiln network.
During all these experiments the goal was to measure the performance and hardware resource consumption in each mode and on each machine.

We have executed some clients in archival mode: in this case we did not measure hardware consumption; instead, the goal was to perform an API benchmark test to check the resilience and speed of each client when receiving different numbers of queries to the Beacon API.
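
As an illustration, a minimal sketch of such a benchmark is shown below, assuming concurrent GET requests against the standard /eth/v1/beacon/states/{slot}/root endpoint; the host, slot range and concurrency level are placeholders rather than the exact parameters we used:

import time
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "http://localhost:5052"   # Beacon API of the archival node (placeholder)
SLOTS = range(0, 10_000, 100)        # slots to query (placeholder)
WORKERS = 16                         # concurrency level (placeholder)

def query_state_root(slot):
    # Time a single query for the state root at a given slot.
    t0 = time.monotonic()
    r = requests.get(f"{BASE_URL}/eth/v1/beacon/states/{slot}/root", timeout=30)
    return r.status_code, time.monotonic() - t0

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    results = list(pool.map(query_state_root, SLOTS))

latencies = [lat for status, lat in results if status == 200]
print(f"{len(latencies)}/{len(results)} successful, "
      f"avg latency {sum(latencies) / max(len(latencies), 1):.3f}s")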

Issues in the tests performed

Default sync

The first test performed was syncing all clients except Grandine (we did not have the executable yet). During this process we investigated the best way to configure each of them, including asking the developer teams directly.

During this process, we encountered several issues:

Prysm

We did not encounter any issues when executing the client.

Lighthouse

The Lighthouse database using the above configuration takes around 100-110 GB. However, the disk we were using had a capacity of 90 GB, so the client filled it. As soon as we noticed this, we created a new disk and moved the database to it, so no resyncing was needed.
We also encountered a memory problem, where the client ran out of memory and the OS killed the process.

Teku

As mentioned, the only issue we encountered the first time we executed the client was that the memory consumption would rise until the OS killed the process. After configuring the JVM, the client worked fine.

Nimbus

While compiling Nimbus we noticed the process was taking a very long time. After speaking to the developer team, we were advised to add the -j4 flag to the make command, which enables parallel compilation. This reduced the compilation time to around 9 minutes. After this was sorted out, the client ran smoothly.

Lodestar

During the installation of Lodestar, we followed the official guide: https://chainsafe.github.io/lodestar/installation/
However, we were initially unable to install the client. The issue was that the Node.js version had to be greater than or equal to 16.0.0, whereas the guide specified greater than or equal to 12.0.0. We were using version 14.8.3, and after upgrading to 16.0.0 the installation worked.
This has since been fixed in the current documentation.

After executing the default mode, we realized the client did not find any peers, so we asked the developer team and they suggested using the Docker image, which seemed more stable and easier to use. Switching to Docker worked fine and the client started syncing.

However, the database size was more than 80 GB at the time of the experiment and the disk ran out of space.
We created a new disk, moved the database and continued syncing. However, the database seemed corrupted and we were getting the following error:

Error: string encoded ENR must start with 'enr:'
    at Function.decodeTxt (/usr/app/node_modules/@chainsafe/discv5/lib/enr/enr.js:92:19)
    at readEnr (/usr/app/node_modules/@chainsafe/lodestar-cli/src/config/enr.ts:28:14)
    at persistOptionsAndConfig (/usr/app/node_modules/@chainsafe/lodestar-cli/src/cmds/init/handler.ts:86:24)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at Object.beaconHandler [as handler] (/usr/app/node_modules/@chainsafe/lodestar-cli/src/cmds/beacon/handler.ts:26:3)

We asked the Lodestar team for support: the ENR was corrupted, so we renamed the enr file to enr_corrupted; the client then generated a new ENR and worked again.

We also found a bug in the Beacon API related to the number of peers: when querying the Lodestar API, the number of peers returned was always 0. This bug was reported and an issue was opened on GitHub.
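
For context, the peer count can be read from the standard Beacon API with a query along these lines (a sketch: the /eth/v1/node/peer_count endpoint comes from the beacon-APIs specification, the host and port are placeholders, and the exact query we ran may have differed):

import requests

# 9596 is the REST port exposed in the docker command above (placeholder host).
resp = requests.get("http://localhost:9596/eth/v1/node/peer_count", timeout=10)
resp.raise_for_status()
print(resp.json()["data"])   # on the affected version this reported 0 connected peers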

Grandine

During the syncing process we realized the client would sometimes use a lot of memory and eventually get killed by the OS. After speaking to the developer team, we were provided with a new executable which no longer crashed.
Apart from this, Grandine does expose Prometheus metrics and an API, but the API is unstable and we were not able to use it in every test, as querying the endpoint sometimes crashed the client. The API exposes data about the current slot, but we were not able to retrieve the number of peers from it.

All-topics

After syncing the clients, we stopped them and added the necessary parameters to activate all-topics mode.
During this process we did not encounter any major issues, apart from having to verify that each client was in fact running in all-topics mode.
For some clients this is shown in the terminal output; for others, we checked the Prometheus metrics to verify it.
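
One way to run this kind of check (a sketch, not the exact procedure we followed) is to scrape the client's Prometheus endpoint and filter for subnet-related metric names; the port and the keyword list are assumptions and differ per client:

import requests

METRICS_URL = "http://localhost:8008/metrics"   # metrics port differs per client (placeholder)
KEYWORDS = ("subnet", "gossipsub", "topic")     # heuristic filter (assumption)

body = requests.get(METRICS_URL, timeout=10).text
for line in body.splitlines():
    # Skip HELP/TYPE comment lines and print metrics mentioning subnets or topics.
    if not line.startswith("#") and any(k in line for k in KEYWORDS):
        print(line)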

Archival mode

Prysm

During the execution of Prysm in archival mode we faced a long and slow synchronization. This process took more than 3 weeks and was also very irregular, as the metrics exposed to Prometheus were sometimes unavailable.
Once the client was synced, we were also unable to perform the API benchmark properly, as the client would stop responding after several queries.

Lighthouse

We did not encounter any issues syncing the client in archive mode using the above configuration, other than the disk space required to store the database, which turned out to be more than 1 TB.

Teku

We experienced a similar behaviour in Teku, where the sync process took longer than 4 weeks. In this case, the metrics worked well.
After the client was synced we could perform the API benchmark test and obtain a response for each query.

Nimbus

As per the developer team's suggestion, we used the same default mode we had used to sync the client, so we did not encounter any further issues executing the client in this mode.

Lodestar

There is no archival mode available, as per the developer team's answer, and therefore we have not executed the API benchmark test.

Grandine

Grandine did not implement the standard Beacon API and, therefore, we were not able to perform the API benchmark test.

Raspberry Pi

Lodestar

We were not able to execute any Docker image, as they are amd64-based, so we had to recompile the project for the Raspberry Pi. Again, we encountered issues installing Node.js and upgrading to the correct version, as well as compiling Lodestar, which sometimes compiled but would not run because of a missing (probably outdated) dependency.

Kiln

Prysm

While installing and running Prysm on the Kiln network, we were not able to connect Prysm to the Geth installed on the same machine by following the guide. After speaking to the Prysm team, we were advised not to use the JWT to connect to Geth, but the IPC file instead, which was not specified in the guide. After this fix, the client worked well.

Lodestar

We faced the following error:

    at CompositeListType.tree_setProperty (/home/crawler/kiln/merge-testnets/kiln/lodestar/node_modules/@chainsafe/ssz/src/types/composite/list.ts:494:13)
    at CompositeListTreeValue.setProperty (/home/crawler/kiln/merge-testnets/kiln/lodestar/node_modules/@chainsafe/ssz/src/backings/tree/treeValue.ts:294:22)
    at Object.set (/home/crawler/kiln/merge-testnets/kiln/lodestar/node_modules/@chainsafe/ssz/src/backings/tree/treeValue.ts:92:19)
    at DepositDataRootRepository.batchPut (/home/crawler/kiln/merge-testnets/kiln/lodestar/packages/lodestar/src/db/repositories/depositDataRoot.ts:34:27)
    at DepositDataRootRepository.batchPutValues (/home/crawler/kiln/merge-testnets/kiln/lodestar/packages/lodestar/src/db/repositories/depositDataRoot.ts:43:5)
    at Eth1DepositsCache.add (/home/crawler/kiln/merge-testnets/kiln/lodestar/packages/lodestar/src/eth1/eth1DepositsCache.ts:104:5)
    at Eth1DepositDataTracker.updateDepositCache (/home/crawler/kiln/merge-testnets/kiln/lodestar/packages/lodestar/src/eth1/eth1DepositDataTracker.ts:178:5)
    at Eth1DepositDataTracker.update (/home/crawler/kiln/merge-testnets/kiln/lodestar/packages/lodestar/src/eth1/eth1DepositDataTracker.ts:159:33)
    at Eth1DepositDataTracker.runAutoUpdate (/home/crawler/kiln/merge-testnets/kiln/lodestar/packages/lodestar/src/eth1/eth1DepositDataTracker.ts:133:29)
Mar-31 15:20:58.651[ETH1]            error: Error updating eth1 chain cache  Invalid length index

The client worked fine, produced blocks, attested and performed as expected, but the above error kept appearing. After speaking to the Lodestar team, it turned out a parameter was missing: --network kiln

Raspberry Pi checkpoint sync

As the Raspberry Pi synchronization was slow, we tried using the checkpoint sync functionality to monitor each of the clients once synced on the Raspberry Pi.

Prysm

We tried using the remote checkpoint sync, which consists of connecting to a remote synced node, obtaining the last finalized state and syncing from there. However, we constantly ran into issues doing this. After speaking with the Prysm team, it turned out there was a bug in how the remote client version was parsed and, therefore, the checkpoint sync failed.
In the end, we had to manually download the last finalized checkpoint from an already synced Prysm node and load it locally on the Raspberry Pi.
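
A download along these lines can fetch that state over the standard Beacon API (a sketch: the endpoint is the same one referenced in the Teku checkpoint sync section below, while the source host, output path and SSZ Accept header are assumptions, and the Prysm flags used to load the file are omitted here):

import requests

SOURCE = "http://XX.XX.XXX.XXX:5052"   # already synced beacon node (placeholder)

resp = requests.get(
    f"{SOURCE}/eth/v2/debug/beacon/states/finalized",
    headers={"Accept": "application/octet-stream"},   # request the state as SSZ bytes
    timeout=300,
)
resp.raise_for_status()
with open("finalized_state.ssz", "wb") as f:
    f.write(resp.content)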

Lighthouse

In this case we just needed to add the parameter --checkpoint-sync-url http://XX.XX.XXX.XXX:5052 and the client continued syncing from the last finalized checkpoint of our already synced node.

Teku

In this case we just needed to add the parameter initial-state: http://XX.XX.XXX.XXX:5052/eth/v2/debug/beacon/states/finalized and the client continued syncing from the last finalized checkpoint of our already synced node.

Nimbus

We were not able to execute the checkpoint sync using Nimbus 1.6.0, as it is not supported.
In this case we needed to update to version 1.7.0, as per the recommendation of the Nimbus team.
We just had to add:

trustedNodeSync --trusted-node-url=http://X.X.X.X:5051

Lodestar

In this case we were able to execute the client using the checkpoint sync by adding the parameters --weakSubjectivityServerUrl http://139.99.75.0:5051/ --weakSubjectivitySyncLatest

Grandine

Grandine does not support checkpoint sync, so we tried copying an already synced database onto the machine.
However, when executing the client it would start syncing from scratch, so we were not able to run Grandine as a synced node on the Raspberry Pi.

Data Points

Node Exporter (NE) data: 508.5M data points
Python script data: 225.2M data points
Eth-Pools tool Prometheus: 2.3M data points
Archival API benchmark: 10.1M data points
Total cells in the CSVs used: 146,422,234 ≈ 146.4M data points used for plotting

Execution time

1243 days ≈ 29,832 CPU hours (1243 × 24)
