For background on the test, see our previous write-up:
As part of the testing process for EIP-1559, we wanted to determine whether having “200% full” blocks on a network with a state size comparable to mainnet would cause any issues.
To test this, we created a tool which can generate a new network with an arbitrary number of accounts and contract storage slots, and a tool to then spam that network with a large number of transactions.
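The tools themselves aren't shown here, but the core loop of a transaction spammer is straightforward. Below is a minimal sketch using web3.py (v6 API); the RPC URL, private key, and amounts are placeholders, not our actual configuration.

```python
from web3 import Web3
from eth_account import Account

RPC_URL = "http://localhost:8545"  # placeholder endpoint
PRIVATE_KEY = "0x..."              # placeholder funded test key

w3 = Web3(Web3.HTTPProvider(RPC_URL))
sender = Account.from_key(PRIVATE_KEY)
nonce = w3.eth.get_transaction_count(sender.address)

# Fire a burst of simple ether transfers (legacy transactions).
for i in range(1000):
    tx = {
        "nonce": nonce + i,
        "to": "0x000000000000000000000000000000000000dEaD",
        "value": w3.to_wei(1, "gwei"),
        "gas": 21000,
        "gasPrice": w3.to_wei(10, "gwei"),
        "chainId": w3.eth.chain_id,
    }
    signed = w3.eth.account.sign_transaction(tx, PRIVATE_KEY)
    # .rawTransaction in web3.py v6; renamed to .raw_transaction in v7
    w3.eth.send_raw_transaction(signed.rawTransaction)
```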
This time, a network with 4 Besu nodes (2 Ohio, 1 Paris, 1 Sydney), 4 Geth nodes (Ohio) and 2 Nethermind nodes (1 Germany, 1 USA) was used.
Another change from the previous test is that throughout this one we sent both transactions that transferred ether and transactions that grew the state size, whereas the preliminary test used transfers only.
Prior to starting the test, we confirmed that all nodes were in sync and had a high number of peers.
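A sync/peer check like this can be scripted against each node's JSON-RPC endpoint; here is a small sketch, again assuming web3.py and placeholder node URLs.

```python
from web3 import Web3

NODES = ["http://node1:8545", "http://node2:8545"]  # placeholder endpoints

for url in NODES:
    w3 = Web3(Web3.HTTPProvider(url))
    synced = w3.eth.syncing is False  # eth_syncing returns False once synced
    peers = w3.net.peer_count
    head = w3.eth.block_number
    print(f"{url}: synced={synced} peers={peers} head={head}")
```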
To ramp up the performance test, we began by sending only legacy transactions to the network in large numbers, starting at block 167105, which Nethermind processed in 240ms.
Then, in parallel, we added transactions which increase the state size by adding storage slots to the smart contract (which started with 114736207 storage slots). The average block time on the network was ~19 seconds (PoW targeting 15 seconds, but the difficulty was too high for the mining instance).
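The storage-filling contract isn't reproduced in this write-up; as a sketch of what such a state-growing transaction could look like, here is a web3.py call against a hypothetical contract exposing a fill(uint256) function that writes n fresh storage slots per call. The address and ABI below are illustrative only.

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))  # placeholder

# Hypothetical filler contract -- not the actual test contract.
FILLER = "0x0000000000000000000000000000000000001234"
ABI = [{
    "name": "fill",
    "type": "function",
    "stateMutability": "nonpayable",
    "inputs": [{"name": "n", "type": "uint256"}],
    "outputs": [],
}]

contract = w3.eth.contract(address=FILLER, abi=ABI)

# Each call writes 100 new storage slots, permanently growing the state.
tx = contract.functions.fill(100).build_transaction({
    "from": w3.eth.accounts[0],          # assumes a node-managed account
    "gas": 5_000_000,
    "gasPrice": w3.to_wei(10, "gwei"),
})
tx_hash = w3.eth.send_transaction(tx)
```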
The two Besu nodes hosted in Paris and Sydney quickly fell behind the head of the chain, each getting stuck on a block for long stretches. They kept processing blocks, albeit extremely slowly. All other nodes on the network kept processing blocks without issue. See below for more on the problematic Besu nodes.
After the ramp-up period, we went from sending only legacy transactions to a 50/50 mix of legacy and EIP-1559 transactions, while continuing to send the transactions that grew the network state. The Besu nodes that had fallen behind were restarted, but that did not immediately fix the problem.
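An EIP-1559 (type 2) transaction differs from a legacy one by replacing gasPrice with a fee cap and a priority fee. A minimal sketch of sending one with web3.py, with placeholder endpoint, key, and sender:

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))  # placeholder
PRIVATE_KEY = "0x..."                                   # placeholder
SENDER = "0xYourFundedTestAccount"                      # placeholder

tx = {
    "type": 2,  # EIP-1559 transaction
    "nonce": w3.eth.get_transaction_count(SENDER),
    "to": "0x000000000000000000000000000000000000dEaD",
    "value": w3.to_wei(1, "gwei"),
    "gas": 21000,
    "maxFeePerGas": w3.to_wei(100, "gwei"),        # absolute per-gas ceiling
    "maxPriorityFeePerGas": w3.to_wei(2, "gwei"),  # tip to the miner
    "chainId": w3.eth.chain_id,
}
signed = w3.eth.account.sign_transaction(tx, PRIVATE_KEY)
w3.eth.send_raw_transaction(signed.rawTransaction)
```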
The first block of the main test was 167235, and here is a 1559-style transaction from that block. The test ran up to block 167468. Throughout the ramp-up and main test, 85289 storage slots were added to the smart contract, for a total of 114821496 storage slots at the end of the test.
Some statistics about the test:
Nethermind processed blocks in 150-700ms, consistently both with and without pruning. The restarted node caught up quickly (6 blocks/second, for 20-30 blocks). Memory usage looked healthy: ~3 GB of RAM, with a 1 GB cache.
Besu used ~1.7 GB of RAM.
(Per-client metrics screenshots: Besu, Geth, Nethermind.)
After the test had concluded, we tried to understand what went wrong with the Besu nodes, but after one week of trying to reproduce the issue, we have not been able to.
We have re-run a similar test multiple times (with a similar mix of nodes from different clients in different regions) and have not seen any recurrence of Besu nodes getting stuck slowly processing blocks.
See a full write-up of the follow-up test here.
https://consensys.zoom.us/rec/share/3zae8EBROau0DiR_-O7knj9moxCabqn5d-vFPapgbQnJems3tnUEvfDkFpKDnmeT.v4fKPfnYOURWwk3b Passcode: YN%a8tr+
Have nodes in >1 Region
Have metrics turned on for all nodes
Have a list of enodes of all participating nodes
Make static node connections between all the nodes upfront (see the static-nodes.json sketch after this list)
Confirm that every node is capable of sending one transaction each of the legacy and EIP-1559 types to the network and having them included in a block
Share the ethstats and block explorer addresses with all participants
Have access to machine logs and be able to retrieve them when needed
Confirm that there are no disconnections between nodes when idle
Set up more than one node for tests
Confirm a representative of each client team
Confirm that the tx generator tool works ahead of the call (execute 2 - 3 blocks)
(Nice to have) Define storage-touching transactions for the test too?
For accounts: ensure that we do not reuse the same accounts over and over, which would enable caching and skew the results
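For the static-node connections item above: Geth, Besu, and Nethermind can all read a static-nodes.json file (from the data directory for Geth and Besu; path configurable in Nethermind), which is simply a JSON array of enode URLs. A sketch with placeholder keys and addresses:

```json
[
  "enode://<node1-public-key>@10.0.0.1:30303",
  "enode://<node2-public-key>@10.0.0.2:30303"
]
```

With this file in place on every node before the test starts, peers connect to each other directly without relying on discovery.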