This page documents the investigation into panics reported by "SnowyHitch" on the Lighthouse Discord.
There have been three panics:
cache_arena.rs:184:12 (Investigation)
cache.rs:153:14 (Discord)
cache.rs:159:14 (Discord)
Screenshots
Panic #1
Paul Hauner changed 2 years ago
The Panic
Investigation
Lighthouse is panicking with an out-of-bounds slice access at position 144115188075905177. Let's call that position p.
In binary, p isn't very interesting:
>>> math.log2(p)
57.00000000000049
>>> bin(p)
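As a quick sanity check on the `math.log2(p)` result above, the following sketch splits p into its highest set bit and whatever remains. The variable names are illustrative and are not taken from the Lighthouse code.

```python
# The out-of-bounds position from the panic message.
p = 144115188075905177

# Index of the highest set bit (consistent with log2(p) being just above 57).
high_bit = p.bit_length() - 1

# What remains once that top bit is cleared.
remainder = p - (1 << high_bit)

print(high_bit, remainder)
```

The remainder is small enough to fit in 16 bits.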
Paul Hauner changed 2 years ago
From a CL developer perspective, there are three serialization formats for consensus structures (e.g. BeaconBlock, ExecutionPayload):
Format
Primary Use
Primary Goal
Endianness
Specs
SSZ
P2P comms & hashing
Paul Hauner changed 2 years ago
Logs: https://drive.google.com/file/d/1M5rcOt_WvilZIUgxabjMrSg7kMtPRBTQ/view
What happened to 0x5e060d315a364b3cc84cd3984ddca9792057f853218e9197c3f49104361fb189?
Note:
This document refers to two blocks:
The parent: 1b28
The child: 5e06
Paul Hauner changed 3 years ago
Summary
This high-priority release contains an important fix to ensure that Lighthouse does not attempt to produce invalid blocks. Furthermore, it improves block production efficiency, thereby increasing the likelihood of a Lighthouse-produced block being included in (and rewarded by) the canonical chain.
We recommend all mainnet users update before the Bellatrix upgrade on Sept 6, 2022, 11:34:47am UTC. Testnet users should upgrade at their next convenience.
Notable changes include:
Improved handling of offline EEs during sync (#3428)
Optimisations to block production (#3312)
Improvements to how builder-enabled VCs interact with non-builder BNs (#3488)
Michael Sproul changed 3 years ago
Summary
This high-priority release contains the parameters to enable the mainnet merge scheduled for September 2022. All mainnet users must upgrade to this release (or a subsequent release) before the Bellatrix fork on Sept 6, 2022, 11:34:47am UTC.
Users who fail to upgrade their nodes before the Bellatrix fork (September 6th) will stop following the canonical chain. We recommend upgrading to v3.0.0 at your earliest convenience.
In addition to upgrading to v3.0.0 (or later), users will also need to make other changes to their nodes. These changes will be familiar to Goerli/Prater users whilst all other users should see our Merge Migration documentation.
Users will also be required to ensure that their "execution layer" client (i.e. Besu, Erigon, Geth or Nethermind) is on a version with the latest merge parameters. We expect all consensus and execution layer clients to have merge-ready releases by 2022-08-23 (UTC). We recommend that users who are already using the --execution-endpoint flag wait until their execution layer client has published a merge-compatible release and update both clients together. Updating Lighthouse before the execution layer is not harmful, but it will result in noisy ExchangeTransitionConfigurationFailed errors. The Ethereum Foundation is expected to publish an announcement on 2022-08-23 (UTC) with detailed information about which client releases are mainnet-ready.
There are also other valuable improvements and fixes in this release making it relevant to Prater/Goerli users as well:
Paul Hauner changed 3 years ago
This analysis was done by using a Lighthouse node run by Sigma Prime. Authored by @paulhauner.
The data is based upon the output of the lighthouse/proto_array HTTP endpoint and was captured at 1660184917 (Thu Aug 11 2022 02:28:37 GMT+0000), which was approximately 10 seconds before the merge transition was finalized.
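The capture timestamp quoted above can be cross-checked against its human-readable form with the standard library. This is only a convenience sketch; `CAPTURE_TIMESTAMP` is an illustrative name.

```python
from datetime import datetime, timezone

# Unix timestamp (seconds) at which the proto_array data was captured.
CAPTURE_TIMESTAMP = 1660184917

# Interpret the timestamp explicitly as UTC to avoid local-timezone surprises.
captured_at = datetime.fromtimestamp(CAPTURE_TIMESTAMP, tz=timezone.utc)

print(captured_at.isoformat())
```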
Merge Transition Blocks
There were two merge transition blocks:
Canonical
Slot: 3639527
Paul Hauner changed 3 years ago
Summary
This high-priority release contains important fixes for mainnet users.
There were two separate bugs introduced into fork choice in v2.4.0 and v2.5.0.
The first bug results in a steady memory footprint increase of 100MB per month. It has been less than two weeks since that release, so it's unlikely that a significant memory increase can be observed yet. It was fixed in #3408.
The second bug can result in an error during fork choice. This error will only be triggered in rare timing-based circumstances and will resolve itself within seconds. It was fixed in #3402.
Furthermore, an incompatibility between Lighthouse VCs running v2.5.0 and BNs running a version prior to v2.5.0 was detected and fixed in #3410.
Summary
This medium-priority release contains a fix for mainnet users experiencing slow "eth1 cache" syncing times (several hours or more). A synced eth1 cache is required for reliable block production.
For Prater/Goerli users, this release contains several new features and bug fixes. The developers kindly request that all Prater/Goerli users update to this release before the Bellatrix upgrade (2022-08-04 12:24 pm UTC). This release is very close to what will be used for the mainnet merge (presently unscheduled). This release should get as much testing as possible during the Prater/Goerli upgrade. Any Prater/Goerli users on v2.3.1 must upgrade to this release before the Bellatrix upgrade or they will follow the wrong chain.
Improvements and fixes include:
Add execution_optimistic flag to HTTP responses (#3070, #3374)
Fix slow eth1 cache syncing times (#3358)
Full support for builder specs v0.2.0 (i.e. mev-boost support) (#3134)
Michael Sproul changed 3 years ago
Summary
This low-priority release contains improvements for mainnet validators and support for the upcoming Goerli/Prater merge. This release is recommended for all validators on all networks.
Whilst this release is "low-priority" for mainnet, Prater users must upgrade to this release (or a subsequent release) before 2022-08-04 12:24 pm UTC for the Bellatrix fork. Failure to upgrade in time will leave nodes following the wrong chain.
Improvements and fixes include:
Various bugfixes (#3258, #2911, #3287, #3331, #3350, #3347)
Various optimisations (#3271, #3301, #3272, #3335)
Support for the Sepolia network (#3268, #3288)
Paul Hauner changed 3 years ago
This document describes the changes required for consensus clients to be compatible with Milestone 1 of Flashbots mev_boost.
Flashbots mev_boost is an application that builds ExecutionPayloads for post-merge Beacon Chain validators. In short, it does all the fancy MEV stuff for PoS Ethereum that Flashbots already does for PoW Ethereum.
The Flashbots Scheme
In Flashbots, there is a series of entities involved in producing an ExecutionPayload. These include searchers/users who find transactions, builders who collate these transactions into an ExecutionPayload and relays who transmit an ExecutionPayload to validators.
From a consensus client perspective, a consensus node will interact with mev_boost, a component which abstracts away searchers/users, builders and relays. There is only one factor which necessarily leaks into the consensus engine: hiding the payload transactions from the validator.
In the Flashbots scheme, a validator must commit to a set of transactions without knowing those transactions. This prevents a validator from "stealing" transactions and breaking the Flashbots economic model.
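The idea of committing to transactions without seeing them can be illustrated with a plain hash commitment. This is a toy sketch only: SHA-256 over length-prefixed bytes stands in for the Merkle root used in real consensus structures, and `commit_to_transactions` and `reveal_matches` are hypothetical names that are not part of mev_boost or any client.

```python
import hashlib


def commit_to_transactions(transactions: list[bytes]) -> bytes:
    """Return a digest that binds to the transaction list without revealing it."""
    hasher = hashlib.sha256()
    for tx in transactions:
        # Length-prefix each transaction so that boundaries are unambiguous.
        hasher.update(len(tx).to_bytes(8, "little"))
        hasher.update(tx)
    return hasher.digest()


def reveal_matches(commitment: bytes, transactions: list[bytes]) -> bool:
    """Check that revealed transactions are the ones committed to earlier."""
    return commit_to_transactions(transactions) == commitment


# The builder side knows the transactions; the validator only sees the digest.
hidden = [b"tx-a", b"tx-b"]
commitment = commit_to_transactions(hidden)
```

In this sketch the validator would sign something containing `commitment` and only later receive `hidden`, at which point `reveal_matches` confirms that nothing was swapped.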
Paul Hauner changed 3 years ago
Benching a recent mainnet block
These benchmarks involve getting a recent mainnet BeaconState (slot 1783731) and then deriving two BeaconState objects:
A "clean" state with no changes.
A "dirty" state where 512 state.balances entries have been incremented by 1. The changes are evenly distributed across the state.balances array.
Then, for each state, we perform a cached tree-hash (with an already initialized cache). Results:
Clean: 35.642 ms
Dirty: 37.109 ms
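A minimal sketch of how the "dirty" indices could be spread evenly, plus the overhead implied by the two timings. The stride-based selection and the `evenly_spaced_indices` helper are assumptions for illustration; the text above does not say how the indices were chosen, and the list length used here is a placeholder rather than the real validator count.

```python
def evenly_spaced_indices(length: int, count: int) -> list[int]:
    """Pick `count` distinct indices spread evenly across range(length)."""
    if count > length:
        raise ValueError("cannot pick more indices than elements")
    return [(i * length) // count for i in range(count)]


# Placeholder balances; the real list has one entry per validator.
balances = [32_000_000_000] * 100_000

# Build the "dirty" copy by incrementing 512 evenly spaced entries by 1.
dirty = list(balances)
for index in evenly_spaced_indices(len(dirty), 512):
    dirty[index] += 1

# Timings quoted above, in milliseconds.
clean_ms, dirty_ms = 35.642, 37.109
overhead_ms = dirty_ms - clean_ms
print(f"{overhead_ms:.3f} ms extra ({overhead_ms / clean_ms:.1%})")
```

With the quoted figures, the dirty state costs 1.467 ms more than the clean one, which is about 4% of the clean time.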
Paul Hauner changed 4 years ago
For the past ~5 months, Lighthouse has been implementing Doppelganger Protection. I believe either Dankrad or Superphiz came up with this concept; you can find a loose description here: https://github.com/sigp/lighthouse/issues/2069. The Lighthouse implementation was merged here: https://github.com/sigp/lighthouse/pull/2230.
Doppelganger Protection (DP) is a nice feature, but it's turned out to be rather difficult to implement. It sits in very critical code paths in the validator client. Failures in DP can easily result in full liveness failures on the network. The difficulty with implementation stems from two major points:
Edge cases
Difficulty in testing
I (@paulhauner) created this document to help share some of the edge-cases we found along the way.
Edge-case: Genesis
Paul Hauner changed 4 years ago
OUTDATED, see revision 2: https://hackmd.io/BdJsIOICRhGmvRywGa4SWA
This document is a review by Paul Hauner (and whoever else wants to contribute) of the Lighthouse UI Prototype delivered by Aqeel on 2021/07/09.
Missing Pages
Full validator onboarding flow. Demonstrated via existing HTML code.
Validator top-up
Validator exit
Recover Validator
Paul Hauner changed 4 years ago
Introduction
This document describes a method for tracking the performance of Eth2 validators. It tracks the following parameters for each validator:
attestation_hits: The validator had an attestation included on-chain.
attestation_misses: There was no attestation included on-chain for an epoch.
head_attestation_hits: The validator had an attestation included on-chain with a correct beacon_block_root.
head_attestation_misses: The validator had an attestation included on-chain with an incorrect beacon_block_root.
target_attestation_hits: The validator had an attestation included on-chain with a correct target.root.
target_attestation_misses: The validator had an attestation included on-chain with an incorrect target.root.
delay_avg: The average of all the attestation delays where an attestation was included on-chain.
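The counters listed above could be held in a small record type. The following is a sketch under stated assumptions, not the implementation described in the document: `ValidatorPerformance`, `record_inclusion`, `record_miss` and the `delay_sum` helper field are hypothetical names, and the unit of `delay` is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidatorPerformance:
    attestation_hits: int = 0
    attestation_misses: int = 0
    head_attestation_hits: int = 0
    head_attestation_misses: int = 0
    target_attestation_hits: int = 0
    target_attestation_misses: int = 0
    delay_sum: int = 0  # hypothetical helper used to derive delay_avg

    def record_inclusion(self, head_correct: bool, target_correct: bool, delay: int) -> None:
        """Record an attestation that was included on-chain."""
        self.attestation_hits += 1
        if head_correct:
            self.head_attestation_hits += 1
        else:
            self.head_attestation_misses += 1
        if target_correct:
            self.target_attestation_hits += 1
        else:
            self.target_attestation_misses += 1
        self.delay_sum += delay

    def record_miss(self) -> None:
        """Record an epoch with no attestation included on-chain."""
        self.attestation_misses += 1

    @property
    def delay_avg(self) -> Optional[float]:
        """Average delay over included attestations, or None if there are none."""
        if self.attestation_hits == 0:
            return None
        return self.delay_sum / self.attestation_hits
```

Note that in this sketch the head and target counters only change when an attestation was included, matching the definitions above, so an epoch-level miss affects `attestation_misses` alone.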
Paul Hauner changed 4 years ago
TL;DR
The honest-validator spec says we should broadcast attestations as soon as we get the block. However, this means attestations will likely propagate before the block does. To resolve this, the spec says we should cache attestations where we don't know the head block.
However, we can't always verify the signature of an attestation if we don't know the head, which makes it difficult to design a DoS-resistant cache.
Detail
The Honest Validator spec declares:
A validator should create and broadcast the attestation to the associated attestation subnet when either (a) the validator has received a valid block from the expected block proposer for the assigned slot or (b) one-third of the slot has transpired (SECONDS_PER_SLOT / 3 seconds after the start of slot) -- whichever comes first.
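The "whichever comes first" rule in the quoted passage can be written as a small function. This is a sketch only: `attestation_broadcast_offset` is a hypothetical name, times are expressed as seconds since the start of the slot, and the block arrival is assumed to refer to a valid block from the expected proposer.

```python
from typing import Optional

SECONDS_PER_SLOT = 12  # mainnet value


def attestation_broadcast_offset(block_arrival_offset: Optional[float]) -> float:
    """Seconds after slot start at which the quoted rule says to attest.

    `block_arrival_offset` is None when no valid block has been seen.
    """
    deadline = SECONDS_PER_SLOT / 3
    if block_arrival_offset is None:
        # Condition (b): one-third of the slot has transpired.
        return deadline
    # Condition (a) or (b), whichever comes first.
    return min(block_arrival_offset, deadline)
```

A block arriving 1.5 seconds into the slot triggers an attestation at 1.5 seconds, which is why attestations can reach peers before the block itself has finished propagating.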
Paul Hauner changed 4 years ago
There is an issue on mainnet and Pyrmont where blocks from slot % SLOTS_PER_EPOCH == 0 are arriving late. Due to a quirk in fork choice, validators are building their block atop the late block, even though they attested to an empty block in that slot. The result is that validators infrequently but consistently miss the head/target on their attestations because they built a conflicting block.
This document examines the shape and size of the problem on the Pyrmont testnet. It also details a change that can reduce block propagation times.
Note: all data in this block is from Pyrmont. Mainnet analysis can follow later.
Identifying the Problem
This graph represents the delay between when a block should have been created and when it was received.
For example, if a block with slot 1 was received half-way through slot 1, then the delay would be 6 seconds (half a slot).
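The delay measurement described here can be sketched as follows. The function names are hypothetical, timestamps are in seconds, and `SLOTS_PER_EPOCH = 32` is an assumption, as the value is not stated in the text above.

```python
SECONDS_PER_SLOT = 12  # consistent with "half a slot" being 6 seconds
SLOTS_PER_EPOCH = 32   # assumption: not stated in the text above


def block_delay(slot: int, received_at: float, genesis_time: float) -> float:
    """Seconds between the start of `slot` and the moment its block was received."""
    expected_at = genesis_time + slot * SECONDS_PER_SLOT
    return received_at - expected_at


def is_epoch_boundary_slot(slot: int) -> bool:
    """True for the slots singled out above (slot % SLOTS_PER_EPOCH == 0)."""
    return slot % SLOTS_PER_EPOCH == 0


# A block for slot 1 received half-way through slot 1 (genesis at t = 0).
example_delay = block_delay(slot=1, received_at=18.0, genesis_time=0.0)
```

Here `example_delay` is 6.0, matching the half-slot example.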
Paul Hauner changed 4 years ago
This minor release contains some important protection from excessive resource consumption in some uncommon cases (#2130). We recommend all users update to this version.
The breaking changes in this release should be insignificant for most users; see below for more information.
New Features
The most notable feature is the new "validator monitor" that allows a BN to provide additional logging and metrics for specific validators.
This provides the long-awaited "validator balance" metric and many, many others.
Additionally, you get extra logs about your validators' activities (e.g. attestation inclusion in blocks).
Paul Hauner changed 4 years ago
This document tracks progress towards the Lighthouse sec review starting on the morning of Oct 6 (Sydney time).
Items: per-person
Allocated
Paul
Validator key caching (review)
Validator top ups (review)
Paul Hauner changed 5 years ago
This document is intended to provide security reviewers with an overview of the Lighthouse project.
Project Overview
The primary goal of the https://github.com/sigp/lighthouse project is to serve the following three groups of users:
Eth2 stakers.
Those who wish to obtain a view of the eth2 network via our API.
Those who wish to run a beacon node (or boot node) for the good of the network.
We serve these users via a single binary named lighthouse.