svalbard - CL hardening summary - 30 Apr '26

**Date:** 2026 / 04 / 30

**Hosts:** Pari, Dapplion

**Time:** 13:30

# CL Hardening Sync: Meeting Summary

## Agenda

1. Fork Choice Compliance Tests (Mikhail)
2. Diamond / CL Hardening Scenarios
3. Glamsterdam Edge Cases
4. Builder Testing Tool (build0r) demo
5. Misc hardening debt (checkpoint sync, mev-boost deprecation)

---

## 1. Fork Choice Compliance Tests

**Status:** PRs are open against most CL clients to adopt the new compliance test suite. Adoption effort has historically been a blocker.

**Test generation pipeline:**

- Built on top of the original fork choice compliance work done with Alex ~2 years ago, extended for EPBS.
- Recent work (by Alex Mosters): refactor + speedup, plus support for separate payload publishing and payload timeliness votes for Glamsterdam.
- Three MiniZinc models drive generation:
  - **Supermajority link model**: generates trees of supermajority links
  - **Block tree model**: generates block trees of various shapes
  - **Block cover model**: enumerates predicate vectors for the filter block tree
- Constraint solver produces solutions (sometimes exhaustive, sometimes a subset), then test vectors are generated in the standard format.
- Output: a few thousand tests per preset (tiny / small / standard).

**New test format addition:** `viable_for_head` roots and weights, which yields blocks in fork choice that are not filtered out, with their fork choice weight. In Glamsterdam, **payload status** is also added to this check. This is the only format change clients need to adopt; it can otherwise be ignored.

**Coverage:**

- Models support Altair through Glamsterdam (including Fulu).
- Glamsterdam tests cover execution payload votes, payload attestations, and payload reorg scenarios.
- Two levels of randomization: randomized model instantiation + mutation operators (message reordering / delays).

**Validation of the models themselves:**

- Models are simple → peer review.
- Test vectors were validated against Teku initially with no model issues found.
- A test runner exists in the framework that replays vectors against the official Python spec; it is kept minimal to reduce its own bug surface.

**Known caveats raised:**

- Fork choice is the area where implementations diverge most from spec (e.g., proto-array vs. spec).
- Client-specific optimizations (e.g., Teku dropping old attestations via heuristics) are legit but not always spec-compliant → produces test failures requiring careful triage.
- Suggestion: leverage AI for initial failure triage given expected volume.
- Gaps noted: no checkpoint sync tests, no synthetic-state injection (would require feeding fork choice a starting state directly), helper functions in FCU lack their own unit tests.

**Action item:** Each client team should review and merge some version of Pari's PR, then engage with Mikhail on debugging. The included script is removable if undesired.

---

## 2. Diamond / CL Hardening Scenarios

**Repo:** `eth-clients/diamond`. Every client team should already have admin access via their org.

**Purpose:** Reproducible scenarios (Kurtosis + Assertoor) that capture mainnet-scale edge cases at smaller scale, including the failing case and the fix.

**Existing scenarios:**

- Non-finality with no blocks for an extended period
- Non-finality with blocks but no attestations
- Checkpoints during non-finality
- Fulu proposer shuffling bug for large deposits (originated as a bug bounty submission)
- EPBS non-finality (proof of concept): Lighthouse + Besu survive ~1hr without blocks then heal

**Status:** ~4–5 of the original 38 Berlin brainstorm items implemented; not top priority for any team. Acknowledged that slow progress is acceptable as long as there is *some* progress.

### New scenarios proposed in this session

| Scenario | Notes |
|---|---|
| State bloat via builder deposits | Bounded by reused builder index; would require ~10K ETH minimum to fill, hard to test directly |
| Overflows on large validator state | Active count vs. total count distinction (root cause of recent Teku/Hoodi issue); systematically search for overflow points across clients |
| Huge exited-validator set at genesis | Default for devnets/Kurtosis to stress epoch transitions and shuffling |
| Forks + attestations during non-finality | More valuable than pure non-finality; forces shuffling computation over giant validator sets, hits state-copy / reset paths |
| Deep-history block processing | Open question: should clients process valid blocks built on parents 200+ slots behind? Currently inconsistent across clients |
| Network-wide reorg testing across epoch boundaries | Including reorgs that change payload status (empty vs. full); EIP-1594 helps but client reactions still need coverage |
| Out-of-order P2P actions (payload before block, etc.) | Cache-DoS risk surface; should be added to Diamond using TISM (malicious Prysm) hooks |
| Buggy client injection causing forks | Already feasible on EL side via PK; need to exercise it in a larger network |

**Key insight (Etan):** Non-finality on its own is not interesting; what stresses clients is **forks + attestations during non-finality**, because computing shufflings outside the standard 2-epoch dependent-root window forces state copies/resets.

**Range to target:** Recovery from ~10–20 epochs of non-finality with maximal forking. Honest acknowledgment that we likely can't handle 10 today in a maximally forking scenario.

**Slashing:** Holesky-style mass slashing handling still requires lazy-slashing scenario work that hasn't been built.

---

## 3. CL "In-Between" Testing Layer

**Problem:** Gap between unit tests and full Kurtosis network. When EPBS networks fall apart in Kurtosis it is hard to localize bugs quickly.

**EL has Hive** for this: give a start state, feed a block, assert end state. CL has nothing equivalent.
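
The "start state, feed a block, assert end state" shape can be sketched as below. This is a toy illustration only, not Hive's API or any client's code: `ToyState`, `ToyBlock`, `toy_transition`, `Scenario`, and `run_scenario` are hypothetical names, and the transition is a deliberately trivial stand-in for a real beacon state transition.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ToyState:
    # Hypothetical, heavily reduced stand-in for a beacon state.
    slot: int
    latest_block_root: str


@dataclass(frozen=True)
class ToyBlock:
    # Hypothetical, heavily reduced stand-in for a signed beacon block.
    slot: int
    parent_root: str
    root: str


def toy_transition(state: ToyState, block: ToyBlock) -> ToyState:
    """Stand-in transition: accept only blocks that extend the current head."""
    if block.slot <= state.slot:
        raise ValueError("block slot must be greater than state slot")
    if block.parent_root != state.latest_block_root:
        raise ValueError("block parent does not match state head")
    return ToyState(slot=block.slot, latest_block_root=block.root)


@dataclass(frozen=True)
class Scenario:
    name: str
    start_state: ToyState
    block: ToyBlock
    # None means the block is expected to be rejected.
    expected_end_state: Optional[ToyState]


def run_scenario(
    scenario: Scenario,
    transition: Callable[[ToyState, ToyBlock], ToyState],
) -> bool:
    """Feed one block into the start state and compare against the expectation."""
    try:
        end_state = transition(scenario.start_state, scenario.block)
    except ValueError:
        return scenario.expected_end_state is None
    return end_state == scenario.expected_end_state


start = ToyState(slot=0, latest_block_root="0xaa")
scenarios = [
    Scenario("valid child", start, ToyBlock(1, "0xaa", "0xbb"), ToyState(1, "0xbb")),
    Scenario("unknown parent", start, ToyBlock(1, "0xff", "0xcc"), None),
]
results = {s.name: run_scenario(s, toy_transition) for s in scenarios}
```

In a real simulator the `transition` callable would be replaced by driving an actual client (for example over P2P, as discussed in this section), while the scenario description stays client-independent.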

**Existing partial solutions:**

- Beaconfuzz (sunset; Sigma Prime working on a replacement)
- Lighthouse's own DB fuzz
- Lodestar plans Glamsterdam fuzzing
- Prysm doesn't currently fuzz the state transition

**Historical caveat:** Multi-client fuzzing has previously diverged due to disagreements about what constitutes a "valid" starting state (e.g., proposer index optimization mismatch traced to spec-invalid states; Prysm dropped from cluster over justification-bits handling). Any fuzzer must start from states reachable via valid state transitions.

**Proposal:** Resurrect a CL Hive-like simulator, possibly built on Kurtosis for maintainability. Scenarios would be: load checkpoint state, feed signed blocks over P2P, assert end state; later extend to grow chain + reorg over alternative branches. Acknowledged as not a one-month project.

---

## 4. Glamsterdam Edge Cases

**Priority framing:** Focus on **bugs we cause ourselves** over malicious-attacker scenarios (Cristian's framing). Catastrophic scenarios > attacks.

**Race conditions / message ordering:** Identified as the most important class. Test goals:

- Verify next-slot attesters and proposers behave correctly when payload arrives late
- Late payload reorged by proposer → reorg succeeds, network stays healthy
- Skipped block + late payload → next attesters vote that payload was absent
- Payload arriving in next slot before attestation deadline → must NOT be attested for

**Diagnostic blocker:** Today in the EPBS devnet, attestations sometimes look wrong, making race-condition diagnosis hard. Not all clients have a complete fork choice yet; that needs to land before scenario-level testing is meaningful.

**Action item:** Cristian / Bharat to write a shared document expanding the existing markdown of out-of-order action implications to include proposer preferences and blobs.

---

## 5. build0r: External Builder Testing Tool

**Built by:** Bharat / EF DevOps team.
Hooks up to a normal EL+CL pair, requests payloads from EL, signs and submits bids.

**Capabilities demonstrated:**

- Configurable bid timing (e.g., reveal payload after N seconds vs. slot time)
- Configurable bid count and interval
- Configurable fee escalation
- Subsidized bids
- Multiple block variants via `extraData` mutation (different block hash, same parent)
- Auto top-up of builder balance (can be disabled)
- Kurtosis config flag for easy spin-up; Dora UI showed missed slots when reveal delay > slot time

**Bid propagation caveat:** Bids are not yet propagating well across the network, so wins are limited to slots where the local node is proposer.

### Scenarios to add (collected during demo)

- Timing games (already supported)
- Circuit breaker triggers (no payload reveals for ~10 consecutive slots)
- Payload envelope equivocations
- Epoch boundary reorgs that recompute the look-ahead; verify proposer preferences are rebroadcast
- Zero-value bids
- Bids exceeding builder balance
- Bid values > uint32 max and uint64 max (catch negative-number bugs)
- Multiple missed reveals against pending delayed payment accounting (2-epoch deduction logic)
- Spam / P2P protection limits
- Bids from exited builders, not-yet-active builders, nonexistent indices, indices > uint32
- Equivocation on objects, submission at random times
- Reveal payload but withhold data columns
- Reveal payload for the wrong bid

### Productization debate

**Cristian's position:** EF should maintain the *signing / Builder API* portion as a reference implementation so external builders (Titan, etc.) plug into it. This prevents builder-side bugs from introducing anything beyond invalid blocks. Builder-specific sequencing/simulation remains the builder's own concern.

**Bharat's position:** build0r is explicitly a *test utility*, not a production builder.

**Counter-arguments:**

- Dapplion / Bharat: don't want a mev-boost-style maintenance burden for production builders' software.
- Titan uses Rust; build0r is in Go, so it can serve as a blueprint only.
- Lighthouse (Michael) is internally discussing a separate **builder client** (analogous to validator client but for builder duties). Not committed.
- Prysm has a builder included in the validator client (tightly coupled).

**Glamsterdam-fork builder activation:** Open question whether to enforce active builders at slot 1 of Glamsterdam. Tricky because 0x03 deposits made before the fork enter the validator set and funds are unrecoverable. Bharat has a math helper website and offered to coordinate.

**Cristian's strong claim:** Hard to imagine a bug here that isn't already caught: if we can produce a Glamsterdam block with build0r at genesis, it should work with an external bid. Worth recording.

### Proposer preferences / mev-boost

**Disagreement on record:**

- Bharat: nodes must subscribe to the proposer preferences gossip topic in Fulu (fork before Glamsterdam); important to test.
- Cristian: doesn't expect anyone to use the gossiped preferences at the fork; Titan already has every validator's preferences from prior registrations, including never-proposed validators. CLs won't see bids on P2P at the fork. Failure here is not catastrophic, since it falls back to local building.
- Bharat: feels differently; debate parked.
- Acknowledgment that Cowswap and mempool dynamics broke when local building took over during the last circuit breaker, which argues for early external builder availability at the fork (not delayed).

---

## 6. Hardening Debt Status

| Item | Status |
|---|---|
| Sync from unfinalized checkpoint | Lighthouse: started. Lodestar: supported (news to Pari). Prysm: not started. Teku: not started. Nimbus: not started. |
| Berlin hardening to-do list | ~4 of 38 items implemented in the past year |
| EPF projects | Coming to help close gaps |
| Implementation notes | Michael (Lighthouse) wrote docs on unfinalized checkpoint sync; Cristian found their approach reasonable |

**Mood:** Teams are stretched.
Acknowledgment that Glamsterdam's pipeline plus Fusaka work has consumed bandwidth and a "rest period" is needed.

---

## Action Items

| # | Item | Owner |
|---|---|---|
| 1 | Review and merge fork choice compliance test PRs | Each CL team |
| 2 | Engage with Mikhail on test failures (with AI-assisted triage) | Each CL team |
| 3 | Audit for overflow surfaces beyond active-validator count | Each CL team |
| 4 | Add TISM-driven out-of-order action scenarios to Diamond | Pari / Dapplion |
| 5 | Write shared doc expanding out-of-order action implications (incl. proposer preferences + blobs) | Cristian, Bharat |
| 6 | Spec out a CL-side Hive-equivalent on Kurtosis | TBD |
| 7 | Test Lodestar's unfinalized checkpoint sync | Pari |
| 8 | Confirm/debate mev-boost deprecation path | All client teams |
| 9 | Decide on builder-client reference implementation scope | EF + client teams |
| 10 | Plan a non-finality + forking devnet with buggy CL injection | Pari + clients |

---

## Open Questions

- Should Glamsterdam force external-only building for the first N epochs as a stress test? (Bharat: no; Etan: interesting; Cristian: not at fork.)
- Is build0r the long-term reference implementation for the Builder API portion, or strictly disposable test infra?
- Do we treat valid-but-deep-history blocks (200+ slots back) as compliance failures, or as legitimate client policy decisions?
- What's the agreed deepest non-finality range we commit to handling cleanly? (~10–20 epochs floated.)
- Does mev-boost stay as a supported sidecar post-Glamsterdam, or is it formally deprecated?