# 44444 Plan and Summary

## Problem

**Definitions for context**

Block History: This refers to all of the blocks that have been produced since genesis.

State History: This refers to the set of all states obtained after each block has been applied to the previous state, starting from genesis.

> For example, at genesis the state is $S_0$; after block 1 has been applied, the state is $S_1$. The state history refers to the set $\{S_n\}$ for all $n \geq 0$.

Full nodes: Store the latest state and the block history.

Archive nodes: Store the state history and the block history.

*Clarification on definitions*

If we are at block 500 and I ask a full node for my balance at block 100, it cannot answer, because it does not know what my state was at block 100, only my state at block 500. This is because it does not have state history.

If I ask the same full node for block 100, it can give it to me, because it has block history.

An archive node can give me my state (balance, etc.) at any past block, in addition to the past blocks themselves, since it has both state history and block history.

### Problem Definition

Block history takes up a lot of space, and once a block has been finalized, it is only needed for a limited set of use cases that are not consensus critical. We elaborate on these in *Users of History*.

### TLDR Solution (44444)

Block history will no longer be stored permanently by full nodes. After some period of time it will be removed from nodes, and entities that need it will have to query for it elsewhere.

> Sidenote: It is not removed immediately because we want to keep history around for the weak subjectivity period. This is on the order of three months.

## Users of History

- Archive nodes
- Applications and end users

**Archive nodes**

Archive nodes need block history in order to compute state history. The state history can then be used to answer queries like "What was my balance at block 1?"
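The definitions of block history, state history, full nodes and archive nodes can be made concrete with a toy model. The sketch below assumes a heavily simplified world in which a state is just a map of balances and a block is a list of transfers; `apply_block`, `compute_state_history`, `FullNode` and `ArchiveNode` are hypothetical names, not any client's API. It shows how an archive node derives $S_0, S_1, \dots$ by replaying block history, and why a full node holding only the latest state cannot answer "what was my balance at block 100?" even though it still has block 100 itself.

```python
from dataclasses import dataclass

# Hypothetical toy model: a state maps account -> balance,
# and a block is a list of (sender, recipient, amount) transfers.
State = dict[str, int]
Block = list[tuple[str, str, int]]


def apply_block(state: State, block: Block) -> State:
    """Return S_n from S_{n-1} and block n, without mutating the input state."""
    new_state = dict(state)
    for sender, recipient, amount in block:
        if new_state.get(sender, 0) < amount:
            raise ValueError(f"{sender} cannot send {amount}")
        new_state[sender] = new_state.get(sender, 0) - amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def compute_state_history(genesis: State, blocks: list[Block]) -> list[State]:
    """Replay block history from genesis to obtain [S_0, S_1, ..., S_n]."""
    history = [genesis]
    for block in blocks:
        history.append(apply_block(history[-1], block))
    return history


@dataclass
class FullNode:
    latest_state: State   # only S_n
    blocks: list[Block]   # block history; blocks[i] is block i + 1

    def balance_at(self, account: str, height: int) -> int:
        if height != len(self.blocks):
            raise LookupError("a full node only knows the latest state")
        return self.latest_state.get(account, 0)


@dataclass
class ArchiveNode:
    states: list[State]   # state history; states[n] is S_n
    blocks: list[Block]   # block history

    def balance_at(self, account: str, height: int) -> int:
        if not 0 <= height < len(self.states):
            raise LookupError(f"no state for block {height}")
        return self.states[height].get(account, 0)
```

For example, with `genesis = {"alice": 100}` and two blocks `[("alice", "bob", 30)]` and `[("bob", "carol", 10)]`, an `ArchiveNode` built from `compute_state_history(genesis, blocks)` reports Bob's balance at block 1 as 30, while a `FullNode` holding only the final state answers 20 at block 2 and raises `LookupError` for block 1.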

> Erigon3 will distribute state history over a torrent instead of computing it.

> The status quo is that archive nodes ask the network for blocks, since full nodes and archive nodes both store block history.

~~The solution we arrive at *could* be used to solve state history too; however, we are only solving block history, so we do not elaborate on this further in this article.~~

Another thing to note: end users *could* just rely on a set of altruistic archive nodes; however, the requirements for running an archive node will grow so large (currently 2TB for Erigon and Reth) that this would be akin to having centralized history providers.

**Applications and end users**

Applications and end users need block history to answer queries like "What is the transaction that corresponds to a given transaction hash?"

> The status quo is that applications either ask full nodes or archive nodes, since these have the block history, or ask a centralised provider like Infura.

> Note: These types of queries rely on block history, not state history. Querying state history is out of scope; the status quo is unchanged in this regard.

TLDR: Archive nodes need block history. Applications and end users need to be able to efficiently query parts of block history.

## Requirements

- We want to delete the block history from full nodes. Archive nodes, by definition, will still have it.
- We want assurance that the block history will remain available once full nodes remove it, without the *need* for centralized entities.
- *Optionally*: We ideally want a strategy for archive nodes to download block history quickly, since they will be downloading hundreds of gigabytes of data.

## Solution

The solution is to use a minified version of the Portal network (the Portal history network). The Portal network can be seen as a decentralised CDN with verifiable point queries.

BitTorrent alone does not work, because when a client asks for a block with a given block hash, it could be sent a BitTorrent link whose content does not correspond to that block hash. Notably, 44444 can be done with a much simpler version of Portal than the full network.

The main downside of the Portal network is that, while it is designed for granular queries about block history like "Give me the transaction that corresponds to this transaction hash", it is not great at range queries, i.e. "Give me all blocks from block 2000 to 3000". We will address this issue with torrent files and a special P2P networking layer for archive nodes (Ahmad EIP). These are non-critical for 44444.

### Critical Action Items

Critical means that these action items are blocking for 44444.

- Figure out what we need from the Portal specs
  - How often are historical blocks synced to the Portal history network?
    - Example: If I am at block 20 million, and say six months in the past is block 15 million, then on initial startup I keep blocks 15 million to 20 million. Blocks 0 to 15 million are assumed to be on Portal.
    - A day passes: do I expire one day's worth of blocks and push them to Portal, or do I wait longer?
  - Should we first write to the Portal network, wait a few days for the data to be seeded there, and only then expire it from Ethereum? How long should we wait?
- Write and finalize these minified Portal specs.
- Figure out what a worst-case attack on the (minified) Portal network looks like and what it means for Ethereum.
- Clients will then write their own implementations of these minified Portal specs and integrate them into their clients. Clients can refer to the reference implementation, Fluffy.
  - Nethermind (ahmad@nethermind.io)
  - Nimbus (github: @kdeme)
    - github issue: #2147 (status-im/nimbus-eth1)
  - Geth (Felix*)
  - Reth (Georgios)
  - Erigon (Andrew)
  - Besu (Justin Florentine*)
- Full nodes need to set a minimum requirement for how much block history they store as part of their contribution to the Portal network.
- Keep the security team updated at each step (Fredrik).

### Non-critical Action Items

- Figure out a strategy for archive nodes to torrent large ranges of blocks. (jacek@status.im)
  - This surfaced conversations around data format that are currently unresolved.
- Figure out a strategy for archive nodes to still ask for historical blocks over the p2p network.
  - The rough idea is that archive nodes will sync historical blocks over a special wire protocol. (ahmad@nethermind.io)
  - (Georgios)
  - Archive nodes should not penalize a 44444 node if it does not have the historical data.
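The expiry question raised under *Critical Action Items*, together with the reason the *Solution* section gives for preferring Portal over plain BitTorrent (returned content must be checkable against the block hash the client asked for), can be illustrated with a small sketch. It is not the Portal wire protocol and makes several loud assumptions: one block every 12 seconds with no missed slots (under which six months is roughly 1.3 million blocks), SHA-256 as a stand-in for Ethereum's keccak256 block hash, and a plain dict standing in for the Portal history network. The helper names `retention_window`, `expire_history` and `fetch_verified` are hypothetical, and the seeding delay before expiry (still an open question) is not modelled.

```python
import hashlib
from typing import Callable, Optional

SECONDS_PER_BLOCK = 12  # assumption: one block per 12-second slot, no missed slots


def retention_window(days: int) -> int:
    """Number of most recent blocks a full node keeps locally for a given period."""
    return days * 24 * 60 * 60 // SECONDS_PER_BLOCK


def block_hash(block: bytes) -> str:
    """Stand-in for the real block hash; SHA-256 is used only because it is in the stdlib."""
    return hashlib.sha256(block).hexdigest()


def expire_history(local: dict[int, bytes], head: int, window: int,
                   portal: dict[str, bytes]) -> None:
    """Move blocks at or below head - window from local storage to the (stubbed) Portal network."""
    cutoff = head - window
    for height in sorted(h for h in local if h <= cutoff):
        block = local.pop(height)
        portal[block_hash(block)] = block  # hand it to Portal before forgetting it locally


def fetch_verified(wanted_hash: str,
                   lookup: Callable[[str], Optional[bytes]]) -> bytes:
    """Point query by hash: accept the returned block only if it hashes to what was requested."""
    block = lookup(wanted_hash)
    if block is None:
        raise LookupError("block not available")
    if block_hash(block) != wanted_hash:
        raise ValueError("peer returned content that does not match the requested hash")
    return block
```

For instance, a node at head 10 with a window of 3 blocks keeps blocks 8 to 10 and pushes blocks 1 to 7 to the stub; a later `fetch_verified(block_hash(b"block-2"), portal.get)` succeeds, while a peer that answers the same hash with different bytes is rejected. The hash check is what makes each point query verifiable regardless of which peer serves it.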