For background on the test, see our previous write-up:
As part of the testing process for EIP-1559, we wanted to determine whether having “200% full” blocks on a network with a state size comparable to mainnet would cause any issues.
To test this, we created a tool which can generate a new network with an arbitrary number of accounts and contract storage slots, and a tool to then spam that network with a large number of transactions.
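The tools themselves aren't shown here, but the core loop of a transaction spammer is straightforward. Below is a minimal sketch using web3.py (v6 API); the RPC URL, private key, and amounts are placeholders, not our actual configuration.

```python
from web3 import Web3
from eth_account import Account

RPC_URL = "http://localhost:8545"  # placeholder endpoint
PRIVATE_KEY = "0x..."              # placeholder funded test key

w3 = Web3(Web3.HTTPProvider(RPC_URL))
sender = Account.from_key(PRIVATE_KEY)
nonce = w3.eth.get_transaction_count(sender.address)

# Fire a burst of simple ether transfers (legacy transactions).
for i in range(1000):
    tx = {
        "nonce": nonce + i,
        "to": "0x000000000000000000000000000000000000dEaD",
        "value": w3.to_wei(1, "gwei"),
        "gas": 21000,
        "gasPrice": w3.to_wei(10, "gwei"),
        "chainId": w3.eth.chain_id,
    }
    signed = w3.eth.account.sign_transaction(tx, PRIVATE_KEY)
    # .rawTransaction in web3.py v6; renamed to .raw_transaction in v7
    w3.eth.send_raw_transaction(signed.rawTransaction)
```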
This time, a network with 4 Besu nodes (2 Ohio, 1 Paris, 1 Sydney), 4 Geth nodes (Ohio) and 2 Nethermind nodes (1 Germany, 1 USA) was used.
Another change from the previous test is that throughout this one we sent both transactions that transferred ether and transactions that grew the state size, whereas the preliminary test used transfers only.
Prior to starting the test, we confirmed that all nodes were in sync and had a high number of peers.
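A sync/peer check like this can be scripted against each node's JSON-RPC endpoint; here is a small sketch, again assuming web3.py and placeholder node URLs.

```python
from web3 import Web3

NODES = ["http://node1:8545", "http://node2:8545"]  # placeholder endpoints

for url in NODES:
    w3 = Web3(Web3.HTTPProvider(url))
    synced = w3.eth.syncing is False  # eth_syncing returns False once synced
    peers = w3.net.peer_count
    head = w3.eth.block_number
    print(f"{url}: synced={synced} peers={peers} head={head}")
```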
To ramp up the performance test, we began by sending only legacy transactions to the network in large numbers, starting at block 167105, which Nethermind processed in 240ms.
Then, in parallel, we added transactions which increase the state size by adding storage slots to the smart contract (which started with 114736207 storage slots). The average block time on the network was ~19 seconds (PoW targeting 15 seconds, but the difficulty was too high for the mining instance).
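The storage-filling contract isn't reproduced in this write-up; as a sketch of what such a state-growing transaction could look like, here is a web3.py call against a hypothetical contract exposing a fill(uint256) function that writes n fresh storage slots per call. The address and ABI below are illustrative only.

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))  # placeholder

# Hypothetical filler contract -- not the actual test contract.
FILLER = "0x0000000000000000000000000000000000001234"
ABI = [{
    "name": "fill",
    "type": "function",
    "stateMutability": "nonpayable",
    "inputs": [{"name": "n", "type": "uint256"}],
    "outputs": [],
}]

contract = w3.eth.contract(address=FILLER, abi=ABI)

# Each call writes 100 new storage slots, permanently growing the state.
tx = contract.functions.fill(100).build_transaction({
    "from": w3.eth.accounts[0],          # assumes a node-managed account
    "gas": 5_000_000,
    "gasPrice": w3.to_wei(10, "gwei"),
})
tx_hash = w3.eth.send_transaction(tx)
```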
The two Besu nodes hosted in Paris and Sydney quickly fell behind the head of the chain, each getting stuck on a block for long stretches. They kept processing blocks, albeit extremely slowly. All other nodes on the network kept processing blocks without issue. See below for more on the problematic Besu nodes.
After the ramp-up period, we went from sending only legacy transactions to a 50/50 mix of legacy and EIP-1559 transactions, while continuing to send the transactions that grew the network state. The Besu nodes that had fallen behind were restarted, but that did not immediately fix the problem.
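An EIP-1559 (type 2) transaction differs from a legacy one by replacing gasPrice with a fee cap and a priority fee. A minimal sketch of sending one with web3.py, with placeholder endpoint, key, and sender:

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))  # placeholder
PRIVATE_KEY = "0x..."                                   # placeholder
SENDER = "0xYourFundedTestAccount"                      # placeholder

tx = {
    "type": 2,  # EIP-1559 transaction
    "nonce": w3.eth.get_transaction_count(SENDER),
    "to": "0x000000000000000000000000000000000000dEaD",
    "value": w3.to_wei(1, "gwei"),
    "gas": 21000,
    "maxFeePerGas": w3.to_wei(100, "gwei"),        # absolute per-gas ceiling
    "maxPriorityFeePerGas": w3.to_wei(2, "gwei"),  # tip to the miner
    "chainId": w3.eth.chain_id,
}
signed = w3.eth.account.sign_transaction(tx, PRIVATE_KEY)
w3.eth.send_raw_transaction(signed.rawTransaction)
```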
The first block of the main test was 167235, and here is a 1559-style transaction from that block. The test ran up to block 167468. Throughout the ramp-up and main test, 85289 storage slots were added to the smart contract, for a total of 114821496 storage slots at the end of the test.
Some statistics about the test:
Nethermind processed blocks in 150-700ms, consistently both with and without pruning. The restarted node caught up quickly (6 blocks/second, for 20-30 blocks). Memory usage looked healthy: ~3 GB of RAM, with a 1 GB cache.
Besu used ~1.7 GB of RAM.
(Per-client metrics screenshots: Besu, Geth, Nethermind.)
After the test had concluded, we tried to understand what went wrong with the Besu nodes, but after one week of trying to reproduce the issue, we have not been able to.
We have re-run a similar test multiple times (with a similar mix of nodes from different clients in different regions) and have not seen any recurrence of Besu nodes getting stuck slowly processing blocks.
See a full write-up of the follow-up test here.
https://consensys.zoom.us/rec/share/3zae8EBROau0DiR_-O7knj9moxCabqn5d-vFPapgbQnJems3tnUEvfDkFpKDnmeT.v4fKPfnYOURWwk3b Passcode: YN%a8tr+
Have nodes in >1 Region
Have metrics turned on for all nodes
Have a list of enodes of all participating nodes
Make static node connections between all the nodes upfront (see the static-nodes.json sketch after this list)
Confirm that every node is capable of sending one transaction each of the legacy and EIP-1559 types to the network and having them included in a block
Share the ethstats and block explorer addresses with all participants
Have access to machine logs and be able to retrieve them when needed
Confirm that there are no disconnections between nodes when idle
Set up more than one node for tests
Confirm a representative of each client team
Confirm that the tx generator tool works ahead of the call (execute 2 - 3 blocks)
(Nice to have) Define storage-touching transactions for the test too?
For accounts: ensure that we do not reuse the same accounts over and over, which would enable caching and skew the results
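For the static-node connections item above: Geth, Besu, and Nethermind can all read a static-nodes.json file (from the data directory for Geth and Besu; path configurable in Nethermind), which is simply a JSON array of enode URLs. A sketch with placeholder keys and addresses:

```json
[
  "enode://<node1-public-key>@10.0.0.1:30303",
  "enode://<node2-public-key>@10.0.0.2:30303"
]
```

With this file in place on every node before the test starts, peers connect to each other directly without relying on discovery.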