## EPF Final Report - Vineet Pant

**Project:** EIP-7745 Log Value Index Implementation
**Client:** Nimbus Execution Client
**Cohort:** 6
**Report Date:** January 12, 2025
**Mentors:** [Etan](https://github.com/etan-status) and [Advaita](https://github.com/advaita-saha)

### Project Abstract

Implementation of EIP-7745 (Log Value Index) in the Nimbus Execution Client.

**Summary:** I decided to work on [EIP 7745](https://eips.ethereum.org/EIPS/eip-7745), and thankfully, with the help of my mentors from the Nimbus team, we have achieved the M0 implementation. EIP 7745 aims to replace the traditional `logsBloom` with a `logIndexSummary` structure, which offers:

- Efficient log queries
- Cryptographic proofs
- Significantly improved light client capabilities

**Links:**

- [Proposal Document](https://github.com/eth-protocol-fellows/cohort-six/blob/master/projects/pureth-eip-7919-nimbus.md)
- [EIP 7745 implementation guide](https://pureth.guide/implementations-7745/)
- [nimbus-eth1 PR](https://github.com/status-im/nimbus-eth1/pull/3820)
- [nim-eth PR](https://github.com/status-im/nim-eth/pull/827)
- [Pureth Glamsterdam](https://pureth.guide/glamsterdam/)

### Project Description

EIP-7745 addresses fundamental limitations of Ethereum's bloom filters by introducing `LogIndex` and `FilterMaps`, which are used to generate a `logIndexSummary` that replaces `logsBloom`.

- `LogIndex` answers: "What logs exist?"
- `FilterMaps` answers: "Where can I find logs from address X?"
- `LogIndexSummary` = the table of contents + index + cryptographic fingerprint

The following benefits come with the introduction of `logIndexSummary`:

- **Zero false positives**
  * `logsBloom` has a high false positive rate (approximately 40%), while `logIndexSummary` guarantees exact results.
- **Direct queries by address or topic instead of scanning blocks sequentially**
  * With the traditional `logsBloom`, all blocks are checked sequentially, so time complexity is O(n). The `LogIndexSummary` approach looks like this:

    ```nim
    proc findLogs(address: Address, blockRange: 0..100000) =
      let results = eth_getLogsByAddress(address, blockRange)
      # Returns only matching blocks immediately: [5, 42, 156, 789, ...]
    ```

  * Performance:
    1. RPC calls: 1 (direct query)
    2. Time: O(log n) or O(1), depending on the index structure
    3. Example: 100k blocks = ~1 second to query the index
- **Cryptographic proofs via Merkle roots**
  * With the traditional `logsBloom`, a light client has to trust full nodes. With `logIndexSummary`, a light client instead:
    1. Extracts the `LogIndexSummary` from the block header
    2. Gets the root from the summary
    3. Verifies the Merkle proof against the root
    4. If valid: the log is **guaranteed** to be in the block
- **Light client optimization**
  * With the traditional `logsBloom`, all block headers have to be downloaded, parsed, and verified, which costs significant bandwidth and time.
  * `logIndexSummary` returns the matching blocks with a single query, and those blocks can then be downloaded and verified with proofs.

The existing 256-byte `header.logsBloom` field is reused to carry the 256-byte `logIndexSummary`.

**Critical Implementation Requirement:** All data structures use **SSZ (Simple Serialize)** encoding to enable:

- Merkleization for cryptographic proofs
- Deterministic serialization for consensus

This required migrating Nimbus from traditional `seq[Topic]` types to SSZ-compatible `List[Topic, 4]` types throughout the codebase.
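To illustrate what that migration looks like at the type level, here is a minimal sketch. The `LegacyLog`/`SszLog` names are hypothetical stand-ins, not the real nim-eth definitions, and the bounded `List` type is assumed to come from status-im's `ssz_serialization` package:

```nim
import ssz_serialization  # status-im/nim-ssz-serialization; assumed to export List[T, N]

type
  Topic = array[32, byte]

  # Before: unbounded sequence - fine for RLP, but has no well-defined
  # SSZ hash_tree_root, so it cannot be merkleized for proofs.
  LegacyLog = object
    address: array[20, byte]
    topics: seq[Topic]

  # After: SSZ List with max length 4 (a log carries at most 4 topics,
  # LOG0..LOG4), which yields a deterministic hash_tree_root.
  SszLog = object
    address: array[20, byte]
    topics: List[Topic, 4]
```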
## Status

The core implementation is functional and verified in a Kurtosis testnet environment. All primary objectives of the [M0 milestone](https://pureth.guide/implementations-7745/) have been achieved:

- **Core LogIndex System** - core implemented with all data structures
- **Block Creation Integration** - `tx_packer.nim` generates the LogIndexSummary
- **Block Validation Integration** - `process_block.nim` validates the LogIndexSummary
- **Fork Activation Logic** - dual validation (bloom vs. LogIndexSummary) pre- and post-EIP-7745 activation
- **LogIndex Accumulation** - properly accumulates from genesis
- **Testnet Verification** - successfully validated in Kurtosis
- **Draft PRs** - created for both nim-eth and nimbus-eth1

### High-Level Design

```
┌────────────────────────────────────────────────────────────┐
│                      Block Processing                      │
│                                                            │
│  1. Receive parent's LogIndex state                        │
│  2. Process transactions → collect logs                    │
│  3. add_block_logs() → accumulate to LogIndex              │
│  4. createLogIndexSummary() → generate 256-byte summary    │
│  5. Encode summary → header.logsBloom                      │
│  6. Return (receipts, updatedLogIndex) to chain            │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                      LogIndex Storage                      │
│                                                            │
│          Store in BlockRef for next block to use           │
└────────────────────────────────────────────────────────────┘
```

### Core LogIndex Data Structures

All data structures and core functions of the LogIndex implementation live in `execution_chain/core/log_index.nim`.

#### 1. LogIndex (The Global Accumulator)

Accumulates all logs from genesis. This grows with each block.

```nim
LogIndex = object
  epochs*: seq[LogIndexEpoch]  # All log data organized by epochs
  next_index*: uint64          # Total entries since genesis
  latest_block_delimiter_index*: uint64
  latest_log_entry_index*: uint64
  latest_value_index*: uint64
```

---

#### 2. LogIndexSummary (Block Header Payload)

A 256-byte summary that replaces the bloom filter in block headers.

```nim
LogIndexSummary = object
  root*: Hash32                          # Merkle root of entire LogIndex
  epochs_root*: Hash32                   # Merkle root of epochs
  epoch_0_filter_maps_root*: Hash32      # Merkle root of filter maps
  latest_block_delimiter_index*: uint64  # Index of last block marker
  latest_block_delimiter_root*: Hash32
  latest_log_entry_index*: uint64        # Index of last log
  latest_log_entry_root*: Hash32
  latest_value_index*: uint32            # Position in filter maps
  latest_layer_index*: uint32
  latest_row_index*: uint32
  latest_column_index*: uint32
  latest_log_value*: Hash32
  latest_row_root*: Hash32
```

```
Offset  Size  Field
─────────────────────────────────────────────────
0x00    32    root (Hash32)
0x20    32    epochs_root (Hash32)
0x40    32    epoch_0_filter_maps_root (Hash32)
0x60     8    latest_block_delimiter_index (uint64)
0x68    32    latest_block_delimiter_root (Hash32)
0x88     8    latest_log_entry_index (uint64)
0x90    32    latest_log_entry_root (Hash32)
0xb0     4    latest_value_index (uint32)
0xb4     4    latest_layer_index (uint32)
0xb8     4    latest_row_index (uint32)
0xbc     4    latest_column_index (uint32)
0xc0    32    latest_log_value (Hash32)
0xe0    32    latest_row_root (Hash32)
─────────────────────────────────────────────────
Total: 256 bytes
```

---

#### 3. FilterMap (Sparse Bitmap)

Efficiently stores which addresses/topics exist without requiring massive bitmaps.

- Sparse: only the coordinates of set bits are stored, which takes a few KB of storage.

```nim
FilterMap = object
  rows*: Table[uint64, seq[uint64]]  # row_index -> [column_indices]
```

```
For value V (address hash, topic hash):
  row    = V mod 2^16   # 65,536 possible rows
  column = V div 2^16   # Position within row

Store: rows[row] = [col1, col2, col3, ...]
```
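Below is a minimal runnable sketch of this sparse mapping. The `addValue` helper is hypothetical and uses the simplified `mod`/`div` mapping shown above; the actual implementation in `execution_chain/core/log_index.nim` derives rows and columns from hashed log values as specified by the EIP.

```nim
import std/tables

type FilterMap = object
  rows: Table[uint64, seq[uint64]]  # row_index -> [column_indices]

const RowCount = 1'u64 shl 16  # 65,536 rows, matching the mod 2^16 mapping

# Hypothetical helper: record one value in the sparse map.
proc addValue(fm: var FilterMap, v: uint64) =
  let
    row = v mod RowCount  # which row the value lands in
    col = v div RowCount  # position within that row
  fm.rows.mgetOrPut(row, @[]).add col

var fm: FilterMap
fm.addValue(0x12345'u64)
doAssert fm.rows[0x2345'u64] == @[1'u64]  # 0x12345 -> row 0x2345, column 1
```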
---

#### 4. LogIndexEpoch

Organizes logs into epochs for efficient processing and storage.

```nim
LogIndexEpoch = object
  records*: Table[uint64, LogRecord]  # index -> log record
  log_index_root*: Hash32             # Merkle root of this epoch
  filter_maps*: FilterMaps            # MAPS_PER_EPOCH filter maps
```

---

### Code Implementation

#### SSZ Type Migration Challenge in `nim-eth`

Apart from the implementation in `nimbus-eth1`, there were design challenges to address in the `nim-eth` dependency. LogIndex uses `List[Topic, 4]` (an SSZ list with a maximum size), but Nimbus historically used `seq[Topic]` (an unbounded sequence).

**Resolution:**

- Updated type definitions in nim-eth to use SSZ-compatible types
- Modified serialization/deserialization logic
- Ensured backward compatibility during migration
- Resolved compilation errors and fixed tests

**Impact:** Enabled proper merkleization for proof generation while maintaining type safety.

#### `nimbus-eth1` Code Implementation

The implementation handles two scenarios:

1. **Block Import** - when receiving blocks from peers via P2P:

```
forked_chain.nim: importBlock()
  ↓
chain_private.nim: processBlock(parentBlk, ...)
  ↓
state.nim: BaseVMState.init()
  ├─ Receives logIndex parameter
  └─ logIndex = parentBlk.logIndex  ← Pass parent's state!
  ↓
process_block.nim: validate block
  ├─ add_block_logs() accumulates current block's logs
  └─ Validate LogIndexSummary matches header.logsBloom
  ↓
Return (receipts, updatedLogIndex)
  ↓
forked_chain.nim: appendBlock()
  └─ Store in BlockRef { logIndex: updatedLogIndex }
```

**Key Files:**

- `execution_chain/core/chain/forked_chain/chain_private.nim` - processBlock()
- `execution_chain/core/chain/forked_chain.nim` - appendBlock()
- `execution_chain/evm/state.nim` - BaseVMState initialization

2. **Block Creation** - when creating new blocks to propose:

```
tx_packer.nim: assembleBlock()
  ↓
tx_desc.nim: updateVmState()
  └─ setupVMState(logIndex = chain.latest.logIndex)
  ↓
tx_packer.nim: pack transactions
  ├─ Collect logs from transaction execution
  ├─ add_block_logs() accumulates logs
  ├─ createLogIndexSummary() generates 256-byte summary
  └─ Encode to header.logsBloom
  ↓
Propose new block with LogIndexSummary
```

**Key Files:**

- `execution_chain/core/tx_pool/tx_desc.nim` - setupVMState()
- `execution_chain/core/tx_pool/tx_packer.nim` - assembleBlock()

### Unfinished Work

The M0 milestone is complete and functional, but there is still work to do before this is production-ready.

#### 1. Code Cleanup

The PRs are currently in draft state. I need to go through and clean things up:

- Remove debug logging that I added while debugging accumulation issues
- Clean up commented-out code from experiments that didn't work out
- Add proper documentation comments for public functions

#### 2. RPC Integration and Testing

The M0 milestone focused on block processing, so RPC integration wasn't part of the scope. The next step is exposing LogIndex functionality through RPC calls:

1. `eth_getLogs` - currently this uses bloom filters to find matching blocks. With LogIndex, we can query the FilterMap directly to get only the blocks that actually contain logs from a specific address or topic. This should be much faster and eliminate false positives.
2. `eth_getLogProof` - this is one of the main benefits of EIP-7745: generating Merkle proofs for logs. Light clients can verify these proofs against the LogIndexSummary in the block header without trusting full nodes. This still needs to be implemented; a verification sketch follows below.
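To show why `eth_getLogProof` is attractive for light clients, here is a self-contained sketch of the branch-verification step. `verifyLogProof` and the proof layout are hypothetical, not the actual nimbus-eth1 API or the exact EIP-7745 proof format; SSZ merkleization does, however, use pairwise SHA-256 hashing as shown.

```nim
import nimcrypto/sha2

type Hash32 = array[32, byte]

proc hashPair(a, b: Hash32): Hash32 =
  # SSZ merkleization hashes the concatenation of the two child nodes.
  var ctx: sha256
  ctx.init()
  ctx.update(a)
  ctx.update(b)
  result = ctx.finish().data

# Hypothetical eth_getLogProof verification: walk the Merkle branch from
# the log-entry leaf up to the root committed in the LogIndexSummary.
proc verifyLogProof(leaf: Hash32, branch: openArray[Hash32],
                    index: uint64, root: Hash32): bool =
  var
    node = leaf
    idx = index
  for sibling in branch:
    # The low bit of the index at each depth says whether the current
    # node is a left or a right child.
    node =
      if (idx and 1) == 0: hashPair(node, sibling)
      else: hashPair(sibling, node)
    idx = idx shr 1
  node == root
```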
#### 3. Performance Testing

I haven't done proper performance comparisons yet. I still need to measure:

- Query time: LogIndex vs. bloom filters for `eth_getLogs`
- Memory usage: how much does LogIndex grow over time?
- Block processing speed: does LogIndex add noticeable overhead during block import and creation?

#### 4. Kurtosis Testing with Other Clients

Kurtosis works great for testing Nimbus in isolation, but I couldn't test interoperability with other clients since no other client has M0 implemented yet.

### Pivots and Changes

#### Pivot: LogIndex Accumulation Strategy

- Initially I stored the LogIndex in separate global state, but to accumulate all logs from the parent block into the next block I had to change this.
- Pivot: pass the LogIndex as a parameter through all vmState initialization functions.

**Impact:**

- Simpler architecture (fewer state containers)
- Clearer data flow (explicit parameter passing)
- Easier to debug (visible in function signatures)

### Current Usability

- Code compiles and runs correctly
- Technical docs available
- Verified in a testnet environment
- Kurtosis deployment successful
- Blocks process without failures
- LogIndexSummary validates correctly

### Current Impact

With EIP 7745 at the M0 milestone, the following points are worth mentioning:

- Reference implementation for other implementers
- Testnet-ready code for early adopters
- Architectural proof that replacing `logsBloom` with `logIndexSummary` works, showing that it can serve light clients in the future

### Limitations

- Pending code review from the Nimbus core team
- No RPC query extensions yet
- Not yet production-ready
- Depends on EIP-7745 acceptance into the Ethereum roadmap

## Journey with EPF

Well, as Mario and Josh kept reminding us - EPF is not a sprint, it's a marathon. They were right.

The most challenging part wasn't writing code - it was understanding where the code should go. When I started, Nimbus felt like a massive codebase I didn't understand. I remember spending two days trying to figure out where LogIndex accumulation should happen.

The SSZ type migration was brutal. Changing `seq[Topic]` to `List[Topic, 4]` in nim-eth broke compilation and tests. Kurtosis Docker builds kept failing with weird errors. I spent days debugging that instead of writing code. Sometimes protocol development feels more like DevOps than blockchain research.

The Pureth guide was a lifesaver. Whenever I got stuck, I'd go back and re-read the M0 requirements. Breaking the work into milestones prevented me from getting overwhelmed by the full scope.

To be honest, there were weeks where I questioned if I'd finish. But taking it day by day - implement one data structure, fix one test, understand one more piece of Nimbus architecture - that's what got me through. It's less about crossing a finish line and more about the daily habit of showing up and learning.

## Acknowledgments

**Mentors:**

- **Etan** - for creating the Pureth guide, providing architectural guidance, and patiently explaining Nimbus internals
- **Advaita** - for code reviews, debugging assistance, and answering countless questions about Nim

**EPF Program:**

- **Josh Davis & Mario Havel** - for organizing EPF Cohort 6 and creating a supportive learning environment
- **Fellow EPF Participants** - for cooperation and interesting conversations throughout the EPF program