# Post-mortem report: Gnosis liveness Oct 19th 2023

**Current status**: Mitigated. Stakers running Nethermind are strongly recommended to update to Nethermind [v1.21.1](https://github.com/NethermindEth/nethermind/releases/tag/1.21.1).

## Summary

On Oct 19th, 2023, at around 02:24:25 UTC, Gnosis chain block producers suffered a network-wide block production issue that caused nodes to produce blocks with an invalid state root. Every node in the network (including the producer itself) rejected these blocks. As a consequence, the network saw no new blocks for several epochs until the error self-corrected. A total of 4 incidents happened on the same day.

The incident resulted in a temporary loss of liveness. There was no loss of funds, no slashings, and no long re-orgs on full nodes.

![](https://hackmd.io/_uploads/ByfNLHuf6.jpg)
_Grafana dashboard showing participation rates during the first incident_

![](https://hackmd.io/_uploads/B1xSdB_Ma.jpg)
_Screen capture of [gnosischa.in](https://gnosischa.in/epochs) for the epoch range of the last incident_

## Impact

A total of 416 blocks with invalid state transitions were produced in 4 separate incidents (41, 85, 146, and 144 blocks respectively).

The absence of blocks impacted end users by increasing transaction inclusion times, with delays in the range of a few minutes. After each incident, the first valid block was 99% full, indicating a backlog of transactions.

Stakers saw a very small loss of expected consensus revenue. Without block space, attestations cannot be included, so every network participant is seen as offline.

## Root causes

Under some very specific circumstances, Nethermind can build blocks on an incorrect state root (the grandparent's) rather than on the parent's state root. A change of head during block production may cause the block improvement routine to choose the incorrect parent.

Consensus clients drive the execution client's fork-choice via FCU (Fork-Choice Update) messages.
These messages are also used to start a block production routine, in which the execution client attempts to build the best possible block in a loop until the consensus client requests the final payload. The following example sequence of events triggers the issue:

1. FCU with payload attributes for block 1000A (block building process initiated).
2. FCU for block 1000B.
3. Main chain changes (head is now 1000B).
4. A new block improvement for block 1000A begins.
5. The improvement builds on an incorrect state root (999A instead of 1000A).
6. The result is an invalid block.

For a full description of the cause and fix, refer to [NethermindEth#6212](https://github.com/NethermindEth/nethermind/pull/6212).

## Lessons learned

Gnosis chain does not have a diverse execution client stake distribution today, with most stake running Nethermind. Erigon added support for Gnosis chain this year, but it has yet to gain sufficient adoption. We will increase our efforts to motivate stakers and node runners to diversify by running minority clients (Erigon). A wider distribution of execution clients would not have resulted in a continuous absence of blocks, but only in a localized reduction of block space for some epochs.

Gnosis core devs and devops will increase the number of debugging tools running permanently to accelerate the resolution of similar incidents.

While there was alerting in place for adverse network conditions, a high number of invalid blocks received through p2p was not on the alert list. We'll review and extend the list of network alerts to cover a wider range of network conditions indicative of potential issues.

## Timeline

_All times in UTC_

**Oct 19th 2023**

- **02:24:25 - 02:27:50**: Incident 1, 41 invalid blocks, slots [11737785](https://gnosischa.in/slot/11737785) - [11737826](https://gnosischa.in/slot/11737826), epochs 733611 - 733614
- **07:21:45**: A beaconcha.in team member alerts us to irregular chain activity. All hands on deck.
- **10:14:40 - 10:21:45**: Incident 2, 85 invalid blocks, slots [11743428](https://gnosischa.in/slot/11743428) - [11743513](https://gnosischa.in/slot/11743513), epochs 733964 - 733969
- **~14:30:00**: Data from the second incident confirms that this is an invalid block issue and not a long re-org.
- **16:55:35 - 17:07:45**: Incident 3, 146 invalid blocks, slots [11748239](https://gnosischa.in/slot/11748239) - [11748385](https://gnosischa.in/slot/11748385), epochs 734264 - 734274
- **23:36:50 - 23:48:50**: Incident 4, 144 invalid blocks, slots [11753054](https://gnosischa.in/slot/11753054) - [11753198](https://gnosischa.in/slot/11753198), epochs 734565 - 734574

**Oct 21st 2023**

- **14:32:00**: Nethermind devs identify the most likely cause of the issue and open a PR with a fix, [NethermindEth#6212](https://github.com/NethermindEth/nethermind/pull/6212).

**Oct 23rd 2023**

- **14:56:00**: Nethermind devs confirm the fix [NethermindEth#6212](https://github.com/NethermindEth/nethermind/pull/6212) by recreating the incident locally.

**Oct 25th 2023**

- **13:16:00**: Nethermind version [v1.21.1](https://github.com/NethermindEth/nethermind/releases/tag/1.21.1) is released, including the fix for the incident.
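The slot, epoch, and duration figures in the timeline can be cross-checked against each other. Below is a minimal Python sketch that does this. It assumes the Gnosis beacon chain parameters of 5-second slots and 16 slots per epoch, which this report does not state. It also counts each incident as last slot minus first slot, which is how the per-incident totals above appear to be derived.

```python
# Assumed Gnosis beacon chain parameters (not stated in this report).
SECONDS_PER_SLOT = 5
SLOTS_PER_EPOCH = 16

# (first_slot, last_slot) for each incident, copied from the timeline.
INCIDENTS = [
    (11737785, 11737826),
    (11743428, 11743513),
    (11748239, 11748385),
    (11753054, 11753198),
]


def slot_to_epoch(slot: int) -> int:
    """Return the epoch that contains the given slot."""
    return slot // SLOTS_PER_EPOCH


def summarize(first_slot: int, last_slot: int) -> tuple:
    """Return (missed_blocks, duration_seconds, first_epoch, last_epoch).

    Assumption: an incident spans last_slot - first_slot blocks,
    matching the per-incident counts given in the report.
    """
    missed = last_slot - first_slot
    return (
        missed,
        missed * SECONDS_PER_SLOT,
        slot_to_epoch(first_slot),
        slot_to_epoch(last_slot),
    )


for first, last in INCIDENTS:
    missed, seconds, e0, e1 = summarize(first, last)
    minutes, secs = divmod(seconds, 60)
    print(f"slots {first}-{last}: {missed} blocks, "
          f"{minutes}m{secs:02d}s, epochs {e0}-{e1}")

# Sum of blocks across all incidents.
total = sum(last - first for first, last in INCIDENTS)
print("total blocks:", total)
```

Under these assumptions, each computed duration matches the corresponding start and end timestamps. For example, Incident 1 gives 41 slots × 5 s = 3 min 25 s, which matches 02:24:25 - 02:27:50. The computed epoch ranges match the listed ones, and the four incidents sum to the 416 blocks reported under Impact.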