# Substrate Transaction Pool: A Closer Look ## The big picture: the transaction pool within a node This section presents the interactions between the transaction pool and other node components. The main task of the transaction pool is to provide a set of *ready* transactions that can be included in the new block to the [*block builder*](https://github.com/paritytech/polkadot-sdk/blob/c77095f51119d2eccdc54d2f3518bed0ffbd6d53/substrate/client/basic-authorship/src/basic_authorship.rs#L419). The transaction pool accepts and validates transactions received from the RPC server component, which serves the [RPC API](https://polkadot.js.org/docs/substrate/rpc/#submitandwatchextrinsicextrinsic-extrinsic-extrinsicstatus). The transaction pool's internal procedures are driven by [events](https://github.com/paritytech/polkadot-sdk/blob/c77095f51119d2eccdc54d2f3518bed0ffbd6d53/substrate/client/transaction-pool/api/src/lib.rs#L348-L365) notified from the input queue: *new best block* and *finalized*. With every event, the transaction pool needs to update its internal state, re-compute the set of *ready* and *future* transactions, remove transactions that were finalized or became invalid, and send out the corresponding events. The transaction pool uses the [`Chain API`](https://github.com/paritytech/polkadot-sdk/blob/c77095f51119d2eccdc54d2f3518bed0ffbd6d53/substrate/client/transaction-pool/src/graph/pool.rs#L58-L109) to perform blockchain-related operations, such as querying the block body, computing the tree route between blocks, or checking transaction validity. Finally, the transaction pool provides the set of ready transactions to the p2p network module, which is responsible for gossiping transactions to peers. Additionally, it also notifies every newly imported transaction to the network module so it can be gossiped immediately. All of this is illustrated in the following figure: ![overview](https://hackmd.io/_uploads/rJJLNqda0.svg) When it comes to API, the transaction pool module provides the concrete implementation of following traits: - [`TransactionPool` trait](https://github.com/paritytech/polkadot-sdk/blob/c77095f51119d2eccdc54d2f3518bed0ffbd6d53/substrate/client/transaction-pool/api/src/lib.rs#L241-L324) - [`MaintainedTransactionPool` trait](https://github.com/paritytech/polkadot-sdk/blob/c77095f51119d2eccdc54d2f3518bed0ffbd6d53/substrate/client/transaction-pool/api/src/lib.rs#L381-L386) ## What is transaction? From the perspective of the transaction pool (and also the node), the transaction is just an opaque array of bytes. How does the transaction pool know the transaction attributes like priority or longevity? How does it determine if a transaction is ready, future, or stalled? The answer lies in the runtime API, specifically the [`TaggedTransactionQueue` API](https://github.com/paritytech/polkadot-sdk/blob/c77095f51119d2eccdc54d2f3518bed0ffbd6d53/substrate/primitives/transaction-pool/src/runtime_api.rs#L25-L55). A call to its function accepts the transaction (opaque bytes) and returns a structure called [`TransactionValidity`](https://github.com/paritytech/polkadot-sdk/blob/d591b16f6b1dec88003323cdae0c3abe3b5c9cbe/substrate/primitives/runtime/src/transaction_validity.rs#L255-L284). This call is provided by the [`Chain API`](https://github.com/paritytech/polkadot-sdk/blob/c77095f51119d2eccdc54d2f3518bed0ffbd6d53/substrate/client/transaction-pool/src/graph/pool.rs#L58-L109) implementation, which is a transaction pool's specific abstraction of the rest of the node. ![Untitled-2024-09-18-1919](https://hackmd.io/_uploads/SJO8At_p0.svg =200x) Typically the transaction is submitted using `submit_and_watch` method which allows to track the [status](https://github.com/paritytech/polkadot-sdk/blob/c77095f51119d2eccdc54d2f3518bed0ffbd6d53/substrate/client/transaction-pool/api/src/lib.rs#L131-L157) of the transaction within the pool. However, it's worth noting that a fire-and-forget version also exists: the `submit` method (both methods are part of [`TransactionPool` trait](https://github.com/paritytech/polkadot-sdk/blob/c77095f51119d2eccdc54d2f3518bed0ffbd6d53/substrate/client/transaction-pool/api/src/lib.rs#L257-L279)). The transaction's lifecycle within the transaction pool is well described by the [`TransactionStatus` enum](https://github.com/paritytech/polkadot-sdk/blob/c77095f51119d2eccdc54d2f3518bed0ffbd6d53/substrate/client/transaction-pool/api/src/lib.rs#L131-L157). ## What are tags? The [`TransactionValidity`](https://github.com/paritytech/polkadot-sdk/blob/d591b16f6b1dec88003323cdae0c3abe3b5c9cbe/substrate/primitives/runtime/src/transaction_validity.rs#L255-L284) provided by [`TaggedTransactionQueue` API](https://github.com/paritytech/polkadot-sdk/blob/c77095f51119d2eccdc54d2f3518bed0ffbd6d53/substrate/primitives/transaction-pool/src/runtime_api.rs#L25-L55), called by transaction pool, contains two tags fields: `provides` and `requires`. On the node side, these tags are (again) the opaque bytes provided by the runtime. Tags allow building the dependency graph between transactions. The transaction pool does not interpret tags. Each transaction may require certain tags, and each transaction provides some tags. The runtime implements all blockchain logic, including how transactions depend on each other. On the (frame-based) runtime side, the entire `TransactionValidity` structure (including its `requires` and `provides` tags) is built by `SignedExtension`s, which are invoked during the transaction validation process. A representative example of a `SignedExtension` providing tags is [`CheckNonce`](https://github.com/paritytech/polkadot-sdk/blob/d4c426afd46f43b81115911657ccc0002a361ddb/substrate/frame/system/src/extensions/check_nonce.rs#L116-L121). ![signed-extensions](https://hackmd.io/_uploads/Sy8abqu6C.svg) The transaction is *ready* for inclusion if all *required* tags are satisfied (by another transaction or a chain of transactions that are also *ready*) or if the *required* tags are empty. Otherwise, the transaction is considered *future*. #### Example This section provides in-depth examples of tags and dependencies between transactions. Let's assume that transaction `P` has: - empty `requires` tags. This means that it is *ready* for inclusion, - `0x0A01` in its `provides` tags. If some other transaction `Q` has `0x0A01` in its `requires` tags, it means that transaction is *future*. If there is some other *ready* transaction (e.g., `P`) in the pool that has `0x0A01` in its `provides` tags, then `Q` also becomes *ready* (because if `P` gets included in the block, the prerequisite for `Q` is automatically satisfied). *Note*: notation `0x0A01` presents a simplified version of the tags actually used in the `CheckNonce` extension. The first byte represents the account (`0x0A`) and the second is the nonce value for this account (`0x01`). It is important to note that the values of `require` and `provide` tags (as well as the entire `TransactionValidity` struct) depend on the current chain state. The values may be different depending on which block the transaction was validated. The figure below depicts this: ![nonces-examples](https://hackmd.io/_uploads/H1i7EqdpR.svg) Last example, showing different *ready* and *future* set computed among different subsets of `P`,`Q`,`R`,`S`,`T` transactions assuming their tags are as follows: | tx | `provides` | `requires` | |-----|------------|------------| | `P` | `0x0A01` | empty | | `Q` | `0x0A02` | `0x0A01` | | `R` | `0x0A03` | `0x0A02` | | `S` | `0x0A04` | `0x0A03` | | `T` | `0x0A05` | `0x0A04` | The figure below presents different subsets of the above transactions and shows how they are grouped into ready (*green*) or future (*red*) sets within the transaction pool: ![ready-future-set](https://hackmd.io/_uploads/S1Uiwc_p0.svg =600x) ## Recap: current transaction pool architecture This section is for readers who may not be familiar with the current design (*old* aka *single state*) of the transaction pool. It briefly presents the state of the art. ### Single view on ready/future transactions The singular instance of the structure called [`BasePool`](https://github.com/paritytech/polkadot-sdk/blob/ea4085ab7448bb557a1558a25af164cf364e88d6/substrate/client/transaction-pool/src/graph/base_pool.rs#L211-L221) holds two sets of transactions: *ready* and *future*. This single *view* (or maybe *state* is a better term here) of the transactions within the pool is continuously updated. The update process is driven by two events: *new best block* and *finalized*. These events activate two internal processes in the transaction pool: maintenance and revalidation, which are responsible for updating the *ready* and *future* sets. ### Submit New transactions are submitted to the transaction pool at a specific block. The pool validates the transactions with the runtime against the state for this block, and subsequently places the transactions in either the *future* or *ready* set. Upon importing a new transaction into the transaction pool, a notification is dispatched to the listeners. As the transaction progresses through the pool, its [status](https://github.com/paritytech/polkadot-sdk/blob/c77095f51119d2eccdc54d2f3518bed0ffbd6d53/substrate/client/transaction-pool/api/src/lib.rs#L131-L157) is continually updated and notified via the listener. ### Maintenance This is a process triggered by *new best block* or *finalized block* notifications. This process: - removes (prunes) the transactions included in the notified block from the transaction pool, - checks what tags were provided by the transactions included in the notified block, and updates the *ready* and *future* sets within the transaction pool. *New best block* (or *finalized block*) events may be reported for blocks originating from a different fork compared to what the transaction pool is aware of. If that is the case, transactions need to be resubmitted from blocks on the retracted fork and the transactions on the enacted one should be pruned. For more details and examples, refer to the [maintenance appendix](#Appendix-dive-into-the-maintenance-details). ### Revalidation It is a process that periodically validates transactions from the pool against the tip of the chain that the transaction pool is aware of. Obviously, it updates *ready* and *future* sets. It is triggered by every maintenance. ### `ready_at` Provides a future that resolves to an iterator over the *ready* transactions for the requested block. `ready_at` operates on the block height and, by itself, does not perform any computations. It internally waits for the maintenance process to be finished. This is the main entry point used by [*block builder*](https://github.com/paritytech/polkadot-sdk/blob/c77095f51119d2eccdc54d2f3518bed0ffbd6d53/substrate/client/basic-authorship/src/basic_authorship.rs#L419). ### `import_notification_stream` Every transaction imported to the transaction pool is sent to this public stream, allowing listeners to be notified about newly imported transactions. The networking module listens to the events within this stream. ### Problems with current implementation The `ready_at` method operates on block height. It is not synced with the maintenance process in terms of forks. This results in two major issues: - When building new blocks on top of blocks that are not *best* or *finalized*, the `invalid::stale` error occurs. This is because the transaction pruning was not executed on block import. ``` B0 -- B1[u0] -- B2[..] // If the maintenance was *not* triggered for B1, // ready_at will provide u0 when building B2 , // (which is stale from B1 perspective) ``` - when constructing new blocks on an alternative fork, the `invalid::future` error might arise. This occurs when blocks on the alternative fork lack transactions that serve as prerequisites for transactions present in the ready pool. As the maintained fork contains these prerequisite transactions the ready set would comprise transactions that are considered future on the alternative fork. See the figure below: ``` B1[u0,u1]--B2[u2] //u3 is ready, after maintenance was triggered for B2 / B0 \ C1[t0]--C2[t1]--C3 //when building C3 (with maintenance triggered recently for B2) ready_at will provide u3, //which is future from B2 perspective ``` ## Fork-aware transaction pool ***Note:*** This section provides high level overview of fork-aware transaction pool, and is just excerpt from the fork-aware-txpool module, which can be generated with: ``` cargo doc --document-private-items -p sc-transaction-pool ``` ### Structures #### View The main responsibility of the `View` is to provide the valid set of *ready* transactions at the given block. The transaction pool keeps the number of recent views for all the blocks notified since the recently finalized block. The views associated with blocks at the tips of the forks are actively updated with all newly incoming transactions, while intermediate views are frozen and are not updated (due to performance reasons - we need to validate transactions at every view). ![views](https://hackmd.io/_uploads/Hkr7qhuaC.svg =500x) The view just represents the state (*ready* and *future* set) of the transaction pool at the given block. Views are created when the new chain events are notified to the pool. When the view is no longer at the tip of the forks, it is moved to the inactive views set. When the block number of the view is lower than the finalized block, the view is permanently removed. #### View store `ViewStore` is the helper structure that provides means to perform actions like submitting transactions to every view. It keeps track of both active and inactive views. The views at the tips of the forks are active, while the intermediate views are inactive. ![view-store](https://hackmd.io/_uploads/r1C7q3OaA.svg =500x) #### `TxMemPool` The main purpose of an internal `TxMemPool` (referred to as *mempool*) is to prevent a transaction from being lost, e.g., due to a race condition when a new transaction submission occurs just before the new view is created. This could also happen when a transaction is invalid on one fork but could be valid on another which is not yet fully processed by the maintenance procedure. Additionally, it allows the pool to accept transactions when no blocks have been reported yet. Once the view is created, all transactions from the *mempool* are submitted to, and validated at, this view. The *mempool* removes its transactions when they get finalized. The transactions in the *mempool* are also periodically verified at every finalized block and removed from the *mempool* if no longer valid. This process is called *mempool* revalidation. #### Multi-view listeners There are a number of event streams provided by individual views: - transaction status, - ready notification (see networking section), - dropped notification. These streams need to be merged into a single stream exposed by the transaction pool (or used internally). These aggregators are often referred to as multi-view listeners, and they implement stream-specific or event-specific logic. Every listener has different logic. The following figures show the flows. ##### Status stream Created for every watched transaction. Merges events from views and exposes a side channel allowing control commands to be sent. ![multi-view-listener-statuses](https://hackmd.io/_uploads/r1gIc3OaR.svg) ##### Ready transactions stream Aggregates events about hashes of newly imported transactions from the views, filters them to avoid duplicates, and exposes them to the networking module. ![multi-view-listener-ready-txs](https://hackmd.io/_uploads/ryxU5nOTR.svg) ##### Dropped transactions stream This listener provides a stream of transactions that were dropped from every view. ![multi-view-listener-dropped](https://hackmd.io/_uploads/SJ6ZUBF60.svg) ### Submit The submitted transaction is added to the *mempool* and if it is not rejected by it (e.g., due to size limits), it is also submitted to every active view. Every active view validates the transaction with the runtime against the state of the block associated with the view and subsequently places the transaction in either the *future* or *ready* set. Upon importing a new transaction into the transaction pool, a notification (if required) is dispatched to the listeners. ### Maintain Maintaining in a fork-aware pool is similar to the [maintain](#Maintenance) in the legacy txpool, however, it needs to handle view creation and take into account transactions from the mempool. The following example should give an overview of what happens during this process. #### Example If the new block (`B2`) actually needs to be handled, the following steps are executed: - Find the best view and clone it to create a new view (`B2'` in the figure is cloned), - Update the view with the transactions from the *mempool*, - Update the view with the transactions from the tree route: - Resubmit the transactions from the retracted blocks (`B1'`, `B2'`) to the new view, - Prune extrinsics from the enacted blocks (`B1`, `B2`), and trigger `InBlock` events, ![create-view-re-org](https://hackmd.io/_uploads/HkeOqhO6A.svg) ### Light maintenance The maintenance procedure can sometimes be quite heavy, and it may not be accomplished within the time window expected by the *block builder*. On top of that, the *block builder* may want to build a few blocks in a row, not giving the pool enough time to complete the required maintenance process. To address this, a light version of the maintenance procedure was introduced. It finds the best view, clones it, and prunes all the known transactions that were included in the enacted part of the tree route from the base view to the block at which a ready iterator was requested. No new transaction validations are required to accomplish this, so it is expected to finish immediately. The *ready* set provided by light maintenance is not perfect, but in many cases, it is good enough to reduce the number of empty blocks. ### `ready_at` The `ready_at_with_timeout` function returns a future that resolves to the ready transactions iterator. The block builder shall wait either for the future to be resolved or for the timeout to be hit. In case the maintenance process was not accomplished in the requested time window, the result of light maintenance is returned. This should reduce the number of empty blocks. ### Revalidation The maintenance procedure shall be as quick as possible, so the heavy revalidation job (which involves validating the transactions) is delegated to the background worker. View revalidation is executed for every view. All the transactions from the view are revalidated. The maintenance and view revalidation processes are executed exclusively. The view revalidation process is triggered at the very end of the maintenance process and stopped at the very beginning of the next maintenance execution. The results from the revalidation are immediately applied once the revalidation is terminated. The view that will be cloned from the revalidated view will be up-to-date in terms of transaction validity. The revalidation of the *mempool* is triggered on every finalized block. If a transaction is found to be invalid, the `Invalid` event is sent and the transaction is removed from the mempool. ## Appendix: dive into the maintenance details For every new block notified, the transaction pool needs to perform maintenance. It needs to rebuild the current state of the transaction pool, depending on which transactions are included in the enacted fork. If there was a re-org, the transactions from the retracted fork need to be resubmitted; otherwise, they could be lost and not included in any block. If any known transactions were included, finalized, or became invalid, appropriate events shall be dispatched. This process was implemented for the old transaction pool and is also used in the new, fork-aware implementation to update the newly created view. The material to follow presents examples that aim to help understand the maintenance process. ### Promoting transaction to *ready* The following figure tries to explain what happens during maintenance for the super simple scenario. ![promoting-few-blocks](https://hackmd.io/_uploads/rk4cTjOTR.svg) First, at `T0` timestamp, `P,Q,S` transactions are submitted to the empty transaction pool (we stick to the notation used earlier in this document) meaning that `S` becomes *future* while `P,Q` are *ready*. The `Future` event is emitted for `S` and `Ready` event for `P,Q`. Later, at some point `T1`, a new best block event is notified to the transaction pool. The tree route within this event is `B0,B1,B2`. Block `B1` contains transactions `P,Q`. Block `B2` contains the transaction `R` (which is a prerequisite for `S` to become *ready*). Please note that `R` at this point is not known to the transaction pool. The transaction pool needs to inspect every transaction included in every new block that was notified in the event's tree route. If the included transaction is known to the transaction pool (which is the case for `P,Q`), it will send the `InBlock` event and will prune the transaction from the pool (or from the view for fork-aware implementation). If a transaction is not known (`R`), the transaction pool needs to validate it against the runtime to get the transaction's tags. Once these tags are known, they are pruned from the pool, meaning that: - all other transactions providing these tags will be notified as invalid and removed from the pool (it is not depicted in this example) - all future transactions that require these tags can be promoted to *ready* (which is the case in our example -- `S` becomes *ready*). The `Ready` event will be dispatched. ### Handling re-org I This simple example explains why resubmission of transactions from retracted blocks is required. First, at `T0` timestamp, the `P` transaction is submitted to the empty transaction pool, meaning that `P` becomes *ready* (an event is dispatched). Later, at some point `T1`, a new best block event is notified to the transaction. The tree route within this event is `B0,B1`. Block `B1` contains transaction `P`. It is known to the transaction pool, so the `InBlock` event is emitted and the transaction is **removed** from the pool. Later, at `T2` timestamp, another best block notification arrives. The tree route contains the retracted fork `B1` and the enacted fork `B1'`. Note that `B1'` does not contain any transactions related to the transaction known by the pool. The transaction pool inspects transactions on the retracted fork (`P`) and resubmits them to the transaction pool—we don't want to lose them. Once the transaction is sent back to the transaction pool, a `Retracted` event is dispatched. In our example, the `P` transaction becomes `Ready`, which is also notified. ![re-org-simple](https://hackmd.io/_uploads/SygsuSnOT0.svg) ### Handling re-org II First, at the `T0` timestamp, `P, Q, S` transactions are submitted to the empty transaction pool, meaning that `S` becomes *future* while `P, Q` are *ready*. The *future* and *ready* events are emitted appropriately. Later, at some point `T1`, a new best block event is notified to the transaction pool. The tree route within this event is `B0, B1`. Block `B1` contains transactions `P, Q`. They are known to the transaction pool, so the `InBlock` events are emitted, and the transactions are removed from the pool. Finally, at `T2`, another best block arrives. The tree route contains the retracted fork `B1` and the enacted fork `B1', B2'`. Note that `B1'` contains transactions `P', Q'`, which means that they provide exactly the same tags as transactions `P` and `Q` but their *hash* is different (they could be sent at the same time from different wallets and included into a block by some peer on the other side of the network). In this scenario, transactions `P, Q` got retracted and later invalid, which is notified by appropriate events. Block `B2'` contains the transaction `R`, which is a prerequisite for `S` to become ready. As `R` at this point is not known to the transaction pool, it will also be validated to get tags. Once `R`'s tags are pruned, `S` will get promoted to *ready*. ![re-org](https://hackmd.io/_uploads/rkXWOodpA.svg) ## Extra reading [1] [txpool readme.md file](https://github.com/paritytech/polkadot-sdk/blob/master/substrate/client/transaction-pool/README.md) [2] [PR: Generic Transaction Queue logic in runtime](https://github.com/paritytech/substrate/issues/728)