Understanding optimal message selection algorithm [WIP doc]

# Understanding optimal message selection algorithm [WIP doc] This document describes the optimal message selection algorithm used in lotus and forest for selecting messages for inclusion in a block from a filecoin node. > Relevant PR: [#1086](https://github.com/ChainSafe/forest/pull/1086) # The Message Pool The [message pool](https://github.com/ChainSafe/forest/tree/main/blockchain/message_pool) is a component in forest that stores a collection of messages received by a filecoin node, before they are added to the filecoin blockchain. These messages maybe the node's locally added messages or received from other nodes. These messages might come in any order. So a message with a high gas premium might end up being at the very end of the list, rendering it to be processed later, while a low priority message might be at the front of the queue. As a result, some pre-processing is required by the node so that the messages are processed in a fair manner which respects the clients expectations. # Message selection When a miner is ready to create a block for a tipset, it invokes one of the implemented message selection algorithms. At present, there are two algorithms for selecting messages: ## The greedy message selection algorithm A naive algorithm which simply selects messages as they are pushed in the queue, sorts them and just maximizes for maximal gas reward. It doesn't take into account the dependencies between the next and prev nodes for a given node and the block probability. One of the drawbacks of this selection is that most nodes end up selecting the same set of messages, wasting 80% of the chain bandwidth. ## The optimal message selection algorithm As the name suggests, it's a better alterative to greedy selection. It's performed when the ticket quality is less than `0.84`. It provides a fair throughput of message selection by selecting messages based on various factors such as block probability and previous and next chain nodes. The block probability is factored in and is the probability that they are included in some prior block and is used to avoid duplicate messages. ## The message chain node abstraction The optimal and greedy selection algorithm both use an abstraction called a `MsgChain` and `MsgChainNode` to group messages based on effective gas performance. MsgChain groups together messages with a non-decreasing effective gas performance. ### What is effective gas performance? In this document, we discuss optimal message selection: # Steps in optimal message selection 1. First we compute the base fee. Every block has a base fee. Base fee is a unit price. similar to gasLimit (which is set by clients). Base fee is a dynamic value and helps protect against misuse of price setting by clients leading to DOS attacks. 3. We get the list of messages in the message pool. This comes from the `pending` hashmap. 4. Then we select any priority messages that we need to. 5. Then we create chains of message for each actor, which is explained further below. 6. We then trim and invalidate the dependendent chains based on the invariants. # Create message chain API When selecting messages, a vector of these chains are created per actor. Each actor's list of msg chain nodes are linked doubly. So the final list of chain nodes we get, looks something like this: ``` [ actor 1: [] <-> [] <-> [] <-> [] actor 2: [] <-> [] <-> [] ] ``` So it's a list of list of individual msg chains. ## 0.84 ticket quality number The ticket quality is a VRF token assigned to a miner when it is elected to mine blocks as part of the expected consensus process. `0.84` is the ticket quality for which the probability distribution maximizes the first block probability, at which point the greedy selection is sufficient. ## Partition step The partitions step after create message chain is needed to avoid duplicate messages and to increase chain throughput. The greedy algorithm results in all miners selecting the same messages and wasting 80% of the chain bandwidth. The probability numbers for each chain indicates the probability that they are included in some prior blocks. ## The prev and next links The prev and next link for each msg chain node is necessary as they enforce the invariants of the messages dependencies. The dependencies between message chains stem from the fact that you can't include a chain in a block without including the previous (dependent) chains or else you would produce an invalid block that has nonce-gaps and messages that fail to execute. This stems from the fact that miners don't communicate with each other, hence they don't know the tickets of each other. Coupled with the random nature of tickets, you can't know whether the dependent chains have been included in a prior block in the same tipset by another miner with a better ticket. So you have to include them in your block. ## Trimming Trimming is neccessary to remove negative performing messages that would be unprofittable to include compared to some other chain that fits our remaining gas limit. ## Msg Chain design The premise of message chains is that they group together messages with a non-decreasing effective gas performance. The point is that we want to avoid head of line blocking if we happen to have one or more low performing messages that are followed by a message with high enough premium to amortize the previous messages. The basic invariant is that the effective gas performance of each chain is higher than the effective gas performance of a subsequent chain, or else they could be merged into a new chain with higher performance than the first chain. The dependencies between message chains stem from the fact that you can't include a chain in a block without including the previous (dependent) chains or else you would produce an invalid block that has nonce-gaps and messages that fail to execute. ## Lotus's implementation (Golang) Lotus implements MsgChain via pointers to the msgChain structure. The [msg chain structure](https://github.com/filecoin-project/lotus/blob/86d9372b30478eca0e97e521ad952dad64c7bb6e/chain/messagepool/selection.go#L27) has next and prev fields which are pointers to the same structure. ## Forest's implementation (Rust) Forest implements MsgChain in a different way because we can't have next and prev pointers to the same MsgChain struct because of borrow checker restrictions. The access pattern of msg chain fields in optimal selection algorithm is to be able to have multiple mutable access to next and prev fields. But, Rust allows having only one mutable access to an object in a given scope. To get around the borrow checker, we use unique keys to msg chain nodes in the next and prev fields in msg chain node struct. All the nodes are kept in a hashmap (we use slotmap crate as it provides us with a unique key on insertion). We also keep a vector of keys that points to these nodes to allow for iteration in insertion order. ## Misc notes * Every block has a base fee. * Base fee is a unit price. similar to gasLimit (which is set by clients). * Base fee is dynamic and helps protect against misuse of price setting by clients leading to DOS attacks. * Gas premium is also something set by senders to incentivise miners to pick most profitable messages. * During message selection we get two tipsets: the msg pool's own curTipset and the invoking miner's tipset. * The target tipset: It's the tipset passed to msg pool select messages API, and that is the tipset the miner wants to build on top of. ## Acknowledgements I would like to thank my team mates at Chainsafe and Dimitris and Kuba (from protocol labs) who were of immense help in putting this across.