[toc]

# Headers Downloaders

MVP: Headers download successfully

Where: crates/net/headers-downloaders/src/linear.rs

### Assumptions:
- Ethereum network has reverse download due to PoS
- Non-Ethereum networks may have forward download. Is any abstraction breaking due to that?
  - Maybe BSC people can build that?

## Linear reverse downloader
- [x] Downloads the headers in reverse order from the tip
  - [x] Over P2P in batches
  - What tip should we advertise to our peers?
    - Only after the Headers + Bodies have been updated. If you advertise your tip before you have the body, you'll be penalized because peers will think you have the bodies and will ask you for them without you having them.
      - Akula already did this in the headers stage and we think it's unsound.
    - [ ] What should be the default?
    - [ ] Configurable?
- [x] Checks that the response matches the header we requested (first request is the tip, then whatever we requested)
  - If the response doesn't match, we request from a different peer
- [ ] [Peer penalization](https://github.com/paradigmxyz/reth/issues/231):
  * When to penalize?
    * When the peer sends us a malformed response
    * When a consensus check fails
  * What happens if our connection drops?
    * Our peers would penalize us
    * Maybe nothing to do here?
- [x] Consensus checks
  - [ ] Unit tested?
- [x] Dispatches the next request and yields what has completed

### Questions we should figure out
* The headers stage should commit the db entries at some point, so we wouldn't keep the transaction open with too many changes. The current `commit_threshold` parameter is simply the batch size to collect from the stream.

## Bodies Downloader

MVP: Bodies download successfully

Where: crates/net/bodies-downloaders/

### Concurrent-ish bodies downloader
* Takes the first header that it does not have a body for
* Configuration
  * Total memory + bandwidth usage = peers * {requests/peer}
  * How many peers do we request things from?
  * How much do you request from each peer?
    * Currently requests 1. Can be made to request multiple bodies from each peer (ref: discussion on GH)
    * Ref: headers/bodies getting merged doesn't solve the issue of not knowing which body to tie to which header
  * The Bodies Stage can be called multiple times: how frequently do you commit to the database?
* [x] Yields an ordered stream
  * Sorts in-memory in request order
* [ ] Penalizes
  * Checks are done in the stage right now, but they should be moved to the downloader so that we can penalize
  * Geth Block Bodies handler: https://github.com/ethereum/go-ethereum/blob/1b8a392153b39fbbde17536c730d96510e57341f/eth/protocols/eth/handlers.go#L215-L234
  * Geth Block Fetcher: https://github.com/ethereum/go-ethereum/blob/262bd38fce0317f0123947add02cbb3ccde75f59/eth/fetcher/block_fetcher.go#L124

### Tests
* [x] Txns + ommers root match the header
  * State root to be validated in a separate stage
* [x] Responses out of order are sorted
* [x] Retries on timeout
* [ ] Integration test with Bodies Client

### Questions we should figure out
- How do we match bodies to headers if we request multiple bodies from a peer at a time?
  - Dragan mentioned maybe indexing by transaction root (see the sketch after this list)
  - If ommers hash + tx root is empty then bodies are not downloaded, so we are OK for empty blocks even though there is a many-to-one relationship from body to header.
  - Does this cover everything?
  - How fast is this?
- There is no guarantee that the bodies sent from a peer are in the same order as the request
  - We could just assume they are in the correct order, and if not they would fail validation and we penalize the peer
  - We should check if geth responds in order; if it does, just assume this is the standard --> should we open an EIP?
  - https://github.com/ethereum/go-ethereum/blob/262bd38fce0317f0123947add02cbb3ccde75f59/les/server_requests.go#L245
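A rough sketch of the matching-by-commitment idea above. All types and helper functions here are hypothetical placeholders, not the actual reth API; it only illustrates tying a downloaded body back to the header that commits to it instead of trusting response order.

```rust
// Rough illustration only: placeholder types and helpers, not the reth API.

/// Header fields relevant to matching (placeholder type).
pub struct Header {
    pub ommers_hash: [u8; 32],
    pub transactions_root: [u8; 32],
}

/// Body as received from a peer (placeholder type).
pub struct BlockBody {
    pub transactions: Vec<Vec<u8>>, // RLP-encoded txs, for illustration
    pub ommers: Vec<Header>,
}

/// Placeholder: keccak256 of the RLP-encoded ommers list.
fn ommers_hash(_ommers: &[Header]) -> [u8; 32] {
    unimplemented!("keccak256(rlp(ommers))")
}

/// Placeholder: ordered Merkle-Patricia trie root over the transactions.
fn transactions_root(_txs: &[Vec<u8>]) -> [u8; 32] {
    unimplemented!("ordered trie root of the txs")
}

/// Find which of the requested headers a received body belongs to by recomputing
/// the commitments the header carries. Empty blocks are never requested, so the
/// many-to-one case (identical empty commitments) should not come up in practice.
pub fn match_body_to_header(requested: &[Header], body: &BlockBody) -> Option<usize> {
    let ommers = ommers_hash(&body.ommers);
    let tx_root = transactions_root(&body.transactions);
    requested
        .iter()
        .position(|h| h.ommers_hash == ommers && h.transactions_root == tx_root)
}
```

The same recomputation doubles as the "txns + ommers root match the header" validation listed under Tests, which is why doing it in the downloader would also give us the hook for penalization.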
# Headers Stage
* It is the only stage that works in reverse order for Ethereum
* It is the only stage that doesn't pass control flow back to the pipeline while syncing; it passes back when it's at the tip
* Currently the headers stage does not commit inside of it.
  * MVP: We will just commit inside of the stage.
  * We don't know what the overhead of an "empty commit" is and what the implications are for e.g. caching in MDBX.
  * Brainstorm:
    * Pulling the stage out of the pipeline for the initial reverse sync to "save" the redundant pipeline commits
      * Sounds broken because it might happen anyway

## Tests
* Initial sync
* Initial sync + gap
* Failures from the downloader propagating to the stage

# Bodies Stage
* Uses an iterator that yields headers
* Resolves them to bodies via the downloader
* Body validation
  * Checks that txn root + ommers root match the header
* Passes control flow back to the pipeline on a configurable interval

## Questions
* Txcount bug flagged in Discord (@dragan to explain/ack the bug and we figure out the right fix in other places)

# Senders Recovery Stage
* Bottleneck: CPU bound
* Rayon-recovers the senders and verifies they match the `from` field (see the sketch after this list)
* Should return control flow back to the pipeline to commit on the right intervals (cc @roman)
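A minimal sketch of the rayon fan-out described above. `Transaction`, `Address`, and `recover_signer` are placeholders rather than the actual reth types; the point is the parallel recovery plus verification against the expected sender.

```rust
// Sketch only: placeholder types, not the reth sender-recovery implementation.
use rayon::prelude::*;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Address([u8; 20]);

struct Transaction {
    // Signature and payload elided; only what the sketch needs.
    expected_from: Option<Address>,
}

/// Placeholder for ECDSA public-key recovery from the tx signature.
fn recover_signer(_tx: &Transaction) -> Option<Address> {
    unimplemented!("ecrecover over the tx signing hash")
}

/// Recover senders for a batch of transactions in parallel.
/// Returns the index of the first failing tx if recovery fails or mismatches.
fn recover_senders(txs: &[Transaction]) -> Result<Vec<Address>, usize> {
    txs.par_iter()
        .enumerate()
        .map(|(i, tx)| {
            let signer = recover_signer(tx).ok_or(i)?;
            // Only verify when an expected sender is attached to the tx.
            if matches!(tx.expected_from, Some(from) if from != signer) {
                return Err(i);
            }
            Ok(signer)
        })
        .collect()
}
```

Chunking the input to the batch size the stage commits on would let the stage hand control back to the pipeline between batches, which is the commit-interval question flagged above.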
# Execution Stage
* Reads all
  * Headers
  * TxCount
  * Senders
  * Transactions
* Execute all
  * Serially
  * Part in interpreter -> CPU bound
  * Part in host -> I/O bound
* Generate changesets
  * TxChangeSets
  * AccountChangeSets
  * StorageChangeSets
  * Receipts/Logs
* Write all to DB
* JsonTests inside of binary on CI
* When do we commit? Based on blocks? Gas?
  * Maybe it makes sense to do it by gas because it's more uniform
    * If you commit per block, then blocks with tiny gas are going to be committed very frequently and you're going to thrash the DB
    * If you commit per "gas block", the commit sizes are more uniform
  * If your changeset table has the key of Address => Vec of Block, then it keeps growing forever and is expensive
    * e.g. the WETH storage slot
    * That is a lot of info you need to load just to get all the blocks
  * Block-level changeset: for a given block
    * You use the Address => Vec of Block table to find the block at which an account was touched
    * You use the Block => State table to get the pre-state of the tx
    * You execute all txs to get to the tx
    * You execute the tx and get the diff
  * TX-level changeset:
    * You use the Address => TxCount table to find the tx number you used
  * Question:
    * **It seems like the Tx-level indexing is going to blow up storage 10-100x, which is unacceptable, even if it makes calltracing 100x faster and removes the need for index tables.**

# Intermediate Hashes Stage

# Indexes
* Are used primarily for RPC
* We don't need to decide now on what indexes
  * We can look at what RPC calls are slow and add space/time tradeoffs for each one
  * Then we want to minimize the {time*space} function

# P2P
* Eth protocol implemented
  * headers
  * bodies
  * txs
    * getpooledtxs
    * fulltxs
  * blocks (deprecated in eth, not in other networks)
    * devp2p spec vs engine api tension
    * block message is still valid for devp2p, but post-merge this is deprecated (devp2p doesn't know what the merge is)
* Other subprotocols like LES not implemented
* Used in:
  * Stages: get Headers, get Bodies
  * TxPool:
    * receive new txs & insert new txs into the pool
      * received from other nodes and inserted into our pool
      * [RPC] created ourselves and inserted into the pool
    * receive and relay them
      * PooledTransactions: the msg variant for a vec of txs
      * FullTransactions: full txs
        * sqrt subset of all peers
* Two types of messages
  * Broadcasts:
    * Examples: PooledTransactions, FullTransactions, block announcements
    * You don't expect a response
  * Request/Response: Headers
* RLPx is overlaid on each stream
* Discv4
  * Over UDP
  * You set up some bootnodes to receive things from
  * You announce yourself to those bootnodes
  * Algo
    * You look up your own node at a regular interval
    * You find nodes that are closest to you
    * You request peers from these nodes; peers respond with a NodeRecord (see the sketch after this list):
      * PeerId
      * IP address v4/v6
      * 2 ports:
        * udp for discovery
        * tcp for rlpx to run the protocol
        * default: the same value
    * You ping them, and connect with them if they pong
  * Is there unbounded memory here due to the size of the local hashtable? Is the hashtable capped?
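For reference, a minimal sketch of the record exchanged in the discovery flow above. Field and type names are illustrative, not the exact reth structs.

```rust
// Illustrative only: the peer record returned by discv4 neighbor responses,
// as described in the Algo bullets above. Names are placeholders, not reth's types.
use std::net::{IpAddr, SocketAddr};

/// 64-byte secp256k1 public key identifying the peer.
pub struct PeerId(pub [u8; 64]);

/// Peer record handed back during discovery.
pub struct NodeRecord {
    pub id: PeerId,
    pub address: IpAddr, // v4 or v6
    pub udp_port: u16,   // discovery (discv4)
    pub tcp_port: u16,   // RLPx session; by default the same value as udp_port
}

impl NodeRecord {
    /// The address a session manager would dial for the RLPx (TCP) connection.
    pub fn rlpx_addr(&self) -> SocketAddr {
        SocketAddr::new(self.address, self.tcp_port)
    }
}
```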
* Network
  * Where do we need to block? Where are the CPU and I/O bottlenecks?
  * Swarm:
    * Incoming IO: a peer reaches out to us via their own discv4. We check whether the peer has already been discovered by us.
    * Sessions:
    * State:
      * List of peers discovered via discv4
      * Are we connected to them?
      * Direction?
      * Channel to peer task
      * Reputation
        * Depends on error
    * TxManager: handles only the pool connection
    * Request/Response Handler: for incoming bodies and headers requests
  * Algo:
    * Discv4 emits a NodeRecord update
    * This goes to the State
    * Updates the peer set
      * Idle
      * In: they discovered us
      * Out: we discovered them
      * Disconnecting
    * If there's room for more active sessions, the Swarm issues a Connect message to the best unconnected peer with the PeerId and the TCP SocketAddr
  * Attacks on the session:
    * If the channel between Session Task and Session Manager is unbounded, then a peer can cause performance degradation by spamming you with disk-I/O-heavy requests; then you time out and get disconnected from the network
    * If the channel between Session Task and Session Manager is bounded, then a peer can fill up (not overflow) your message buffer and cause you to drop messages; then you get penalized from the network
    * So we need some kind of request-cost-aware rate limit on the session task?
    * What do other nodes do?
    * Log as much info as we can about our peers
  * Fetcher needs to have logic for writing to the NetworkHandle for the penalization
    * Is it weird that the Network > NetworkHandle > Fetcher?
* [ ] Peer selection prioritization
  * [ ] By reputation
    * [ ] Unit tested?
    * [ ] Tested with other implementations?
  * [ ] Peers that are not currently already handling a request we've sent out
    * [ ] Unit tested?
    * [ ] Tested with other implementations?
  * [ ] Random
* [ ] Auto-generate JSON test vectors

## Tests

### Header Stage
- [ ] Where?
- [ ] Tests with a local network
  - [ ] Matt says we have them
  - [ ] https://github.com/paradigmxyz/reth/blob/main/crates/net/network/tests/it/requests.rs
- [ ] Tests with other clients
  - [ ] Dan is working on Geth tests

## Consensus
* Verification checks exist
  * Some are still WIP
* Engine API

# Open Questions

Things we cannot figure out today but that may clear up as we wire things up, or that the community can figure out.

## Header Stage & Downloaders
- [ ] Does this actually work in a live setting?
  - Commit threshold configuration tradeoffs
    - Network: higher bandwidth may mean we can commit more frequently.
    - Memory / Disk: if you have a lot of memory or a slow disk, you should commit less frequently
  - Network future continues to be called by the executor and does not stall
  - I have a connection, the connection dies: can this properly re-handshake with a peer?
- [ ] No spam / can have long-lived connections
  - [ ] I am peered and I am not force-disconnected
- [ ] Polling: are the Streams and Futures sound? Do they wake up properly?

## Database
- Reverse walker? Not needed
- DbCursorRO::current shouldn't be mutable
- Document that prev and next reposition the cursor
- Add a comment about append failing on non-pre-sorted data
- Document the behavior of `seek` and `seek_exact`
- Document the position of the cursor after `delete_current`

Encoding:
- Partial decompression of values would be nice

Tables:
* Is Dupsort required? (see the sketch below)
  * Account => Storage k-v looks like a Hashmap
  * Storage Slot => Value k-v looks like a btreemap
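To make the Dupsort question concrete, a conceptual sketch using std collections (not the MDBX API) of the layout the last two bullets describe: the outer key is the account, and the duplicate entries under it stay ordered by storage slot.

```rust
// Conceptual sketch only: std collections standing in for a DUPSORT table.
use std::collections::{BTreeMap, HashMap};

type Address = [u8; 20];
type StorageSlot = [u8; 32];
type StorageValue = [u8; 32];

/// Account => (Storage Slot => Value): the outer map behaves like a hashmap keyed
/// by account, the inner map like a btreemap ordered by slot (the "dup" entries).
type StorageTable = HashMap<Address, BTreeMap<StorageSlot, StorageValue>>;

fn main() {
    let mut table: StorageTable = HashMap::new();
    let account = [0u8; 20];

    // Insert two slots for the same account; the inner btreemap keeps them sorted,
    // analogous to DUPSORT ordering duplicate values under a single key.
    table.entry(account).or_default().insert([1u8; 32], [0xaa; 32]);
    table.entry(account).or_default().insert([0u8; 32], [0xbb; 32]);

    // Walking the duplicates for one account yields slots in sorted order,
    // like a cursor iterating dup values under one key.
    for (slot, value) in &table[&account] {
        println!("slot {:x?} -> value {:x?}", &slot[..2], &value[..2]);
    }
}
```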