Disclaimer: This is work in progress.
Smart contract platforms like Ethereum have network parameters that dictate how often a block is produced, how much computation can be performed in a block, and how large a block can be. The selection of these parameters determines the maximum amount of computation per second that the network as a whole can perform. We want to maximize that number by choosing the right parameters.
Protocol execution consists of cycles called rounds. Each round has a leader (validator), responsible for proposing a block at the beginning of that round. By block time, denoted by $\tau$, we mean the length of a round. Block time is spent first by executing transactions and including them in a new block, and then by propagating that block throughout the network. Ideally, a leader should have enough time to receive and verify his predecessor's block before the previous round ends, so that he can declare it as the parent of his block when his round begins. If that is not the case, the blockchain forks.
To prevent validators from creating arbitrarily large blocks or blocks with arbitrarily many operations, certain limits are imposed on blocks at the protocol level. These protocol parameters are:

- the block size limit, the maximum number of bytes a block can contain, and
- the block load limit, the maximum number of operations a block can contain (e.g. Ethereum's gas limit).
The focus of this work is to enable reasoning toward our goal of maximizing the efficiency of our platform. Specifically, we strive to maximize the network power $P$, the number of operations the network can perform per second on average. At the same time, we strive to minimize time-to-finality, a quantity directly proportional to block time. Thus, if it turned out that throughput increased with increasing block time, we would have a trade-off between overall network performance and user experience[1].
We separate block time into four phases, which make sense in the given context:

1. Execution: the leader executes transactions and includes them in a new block.
2. Propagation: the new block is propagated throughout the network.
3. Verification: the other validators execute the block's transactions to verify the block.
4. Voting: the validators vote on the block.

We assume that the execution and verification phases last an equal amount of time, denoted by $\tau_e$, whereas the durations of the propagation and voting phases are denoted by $\tau_p$ and $\tau_v$ respectively. Then, block time is calculated as

$$\tau = 2\tau_e + \tau_p + \tau_v.$$

Here, we assume that $\tau_e$ depends on the block load limit, $\tau_p$ depends on the block size limit, and $\tau_v$ is a constant. We give the following reasons:

- Execution (and verification) time is determined by how many operations the block contains, which is capped by the block load limit.
- Propagation time is determined by how many bytes must be transferred over the network, which is capped by the block size limit.
- Voting involves messages whose sizes do not depend on the block's contents, so its duration can be taken as constant.
A single validator has a processing power $p$, defined as the number of operations it can perform per second. We assume that all validators have the same processing power, and that the network is used at full capacity, the block load being at the limit, which we denote by $L$:

$$\tau_e = \frac{L}{p}.$$
To find the network's overall processing power, $P$, we note the following fact: in each round, the leader dictates which transactions are to be executed. Therefore, $P$ can be found by factoring out the time that is not spent executing new transactions, i.e. the propagation, verification and voting phases:

$$P = \frac{L}{\tau} = \frac{L}{\frac{2L}{p} + \tau_p + \tau_v},$$

where we substituted the quantities defined above.
We make the simplifying assumption that propagation time depends only on the block size $S$, i.e. $\tau_p = \tau_p(S)$. To substitute it in $P$, however, we need it in terms of $L$, and we cannot use an algebraic relationship to express block size in terms of block load. That is because there is no law which dictates the maximum number of operations a program of a certain number of bytes can encode. To solve this problem, we use on-chain data to obtain the operation density $d$, an empirically observed quantity defined as the average number of operations encoded per byte of transaction:

$$d = \frac{L}{S}.$$
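As a concrete illustration, here is a minimal sketch of how $d$ could be estimated from per-transaction data. The sample values are made up; in practice they would come from on-chain records such as gas used and raw transaction sizes:

```python
# Hypothetical per-transaction samples: (operations executed, size in bytes).
# Real values would come from on-chain data, e.g. gas used and raw tx size.
txs = [(21_000, 110), (52_000, 180), (130_000, 420), (78_000, 260)]

total_ops = sum(ops for ops, _ in txs)
total_bytes = sum(size for _, size in txs)

# Operation density d: average number of operations encoded per byte.
d = total_ops / total_bytes
```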
Then we can define network power as a function of block load:

$$P(L) = \frac{L}{\frac{2L}{p} + \tau_p\!\left(\frac{L}{d}\right) + \tau_v}.$$
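A minimal sketch of this function as code, with the propagation-time function passed in as a parameter. All default parameter values ($p$, $d$, $\tau_v$) are illustrative assumptions:

```python
def network_power(L, t_p, p=10_000_000, d=250.0, t_v=0.5):
    """Network power P(L) for block load L (operations per block).

    t_p is the propagation-time function of block size S (seconds).
    The defaults are illustrative assumptions: p = per-validator
    processing power (ops/sec), d = operation density (ops/byte),
    t_v = voting time (sec).
    """
    S = L / d  # block size implied by the operation density
    return L / (2 * L / p + t_p(S) + t_v)

# Example: a constant one-second propagation time.
print(network_power(5_000_000, lambda S: 1.0))  # 2000000.0
```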
Before we can move on to finding maximizers of $P$, we have to explore the relationship between block size and propagation time.
Here, $F(S, t)$ is a cumulative distribution function which gives the ratio of the nodes that have received a block of size $S$, $t$ seconds after it is mined. Being a CDF over $t$, it satisfies the condition

$$\lim_{t \to \infty} F(S, t) = 1.$$
Moreover, we assume that $F$ satisfies the following condition:

$$S_1 < S_2 \implies F(S_1, t) > F(S_2, t) \quad \text{for all } t > 0.$$

The number of nodes that have received a smaller block is always greater than the number of nodes that have received a larger block. In other words, we assume that a smaller block always propagates faster than a larger block.
Then the probability (or rate) of a blockchain fork is calculated as[2]

$$r = 1 - F(S, \tau_p).$$
Substituting and rearranging the equation, we can express propagation time in terms of block size $S$:

$$\tau_p(S) = F_S^{-1}(1 - r),$$

where $F_S^{-1}$ denotes the inverse of $t \mapsto F(S, t)$ for fixed $S$.
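Since $F$ is generally not available in closed form, the inversion can be done numerically. Below is a sketch using bisection, with an illustrative exponential CDF (an assumption of this example) standing in for the real one:

```python
import math

def F(S, t, k=5e-6):
    """Illustrative stand-in CDF: fraction of nodes that have received a
    block of size S by time t. The exponential shape and the constant k
    are assumptions, not measurements."""
    return 0.0 if t <= 0 else 1.0 - math.exp(-t / (k * S))

def propagation_time(S, r, hi=1e6, iters=100):
    """Numerically invert F over t by bisection: the smallest t with
    F(S, t) >= 1 - r, i.e. the propagation time keeping the fork rate at r."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if F(S, mid) >= 1 - r:
            hi = mid
        else:
            lo = mid
    return hi
```

For this particular $F$ the inverse has the closed form $kS\ln(1/r)$, which the bisection reproduces; the same routine works for any CDF that is increasing in $t$.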
Corollary: Propagation time $\tau_p(S)$ is monotonically increasing in terms of block size $S$.
Proof: By algebra, using the second condition for $F$ given above.
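For completeness, one way to spell out that algebra (a sketch, using the defining relation $F(S, \tau_p(S)) = 1 - r$ and the monotonicity of $F$ in each argument):

$$
\begin{aligned}
&\text{Suppose } S_1 < S_2 \text{ but } \tau_p(S_2) \le \tau_p(S_1). \text{ Then} \\
&1 - r = F\bigl(S_2, \tau_p(S_2)\bigr) \le F\bigl(S_2, \tau_p(S_1)\bigr) < F\bigl(S_1, \tau_p(S_1)\bigr) = 1 - r,
\end{aligned}
$$

a contradiction. Hence $S_1 < S_2$ implies $\tau_p(S_1) < \tau_p(S_2)$.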
This simplifies the problem of deducing whether $P$ is increasing or decreasing as $L$ goes to infinity.
In other words, if we want to keep forking rate (a.k.a. orphan, uncle, stale rate) constant, then increasing block size or load limit would certainly mean that we would have to increase block time. This is kind of obvious, but it is useful to express it mathematically.
Propagation time's rate of change with respect to $S$ determines whether network processing power increases or decreases as we allow for bigger blocks. To illustrate this, we consider the following examples:
If $\tau_p(S) = k$ where $k$ is a constant, then $P$ is monotonically increasing, and the theoretical maximum is obtained as $L$ goes to infinity:

$$\lim_{L \to \infty} P(L) = \lim_{L \to \infty} \frac{L}{\frac{2L}{p} + k + \tau_v} = \frac{p}{2}.$$

This is a purely theoretical case: in reality, propagation time most certainly depends on block size.
If $\tau_p(S) = aS + b$ where $a$ and $b$ are some constants, then $P$ is monotonically increasing, and the theoretical maximum is obtained as $L$ goes to infinity:

$$\lim_{L \to \infty} P(L) = \lim_{L \to \infty} \frac{L}{\frac{2L}{p} + \frac{aL}{d} + b + \tau_v} = \frac{pd}{2d + ap}.$$

The profile looks the same as in the constant case.
If propagation time has linear complexity, any increase in block load would be an improvement in terms of network processing power. The bigger the block, the higher the throughput.
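A quick numeric check of the linear case, with all parameter values being illustrative assumptions: $P(L)$ stays below the limit $\frac{pd}{2d + ap}$ and approaches it for large $L$.

```python
# Illustrative parameters: p = processing power (ops/sec), d = operation
# density (ops/byte), t_v = voting time (sec), a, b = linear coefficients.
p, d, t_v = 10_000_000, 250.0, 0.5
a, b = 1e-6, 0.2

def P(L):
    # Network power with linear propagation time t_p(S) = a*S + b, S = L/d.
    return L / (2 * L / p + a * (L / d) + b + t_v)

# Theoretical supremum as L grows without bound.
limit = p * d / (2 * d + a * p)
```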
If $\tau_p(S) = aS^2 + bS + c$ where $a$, $b$ and $c$ are some constants, then $P$ attains a maximum at $L^* = d\sqrt{\frac{c + \tau_v}{a}}$. It is monotonically increasing for $L < L^*$, and monotonically decreasing for $L > L^*$.

$P$ approaches zero as $L$ goes to infinity. Thus, we have the opposite of the cases above. Beyond the maximum, the bigger the block, the lower the throughput.
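A numeric sanity check of the quadratic case (all constants below are illustrative assumptions), confirming that $L^* = d\sqrt{(c + \tau_v)/a}$, obtained by setting $dP/dL = 0$, is indeed where $P$ peaks:

```python
import math

# Illustrative constants for quadratic propagation time t_p(S) = a*S^2 + b*S + c.
p, d, t_v = 10_000_000, 250.0, 0.5
a, b, c = 1e-12, 1e-6, 0.2

def P(L):
    # Network power with quadratic propagation time, block size S = L/d.
    S = L / d
    return L / (2 * L / p + a * S**2 + b * S + c + t_v)

# Setting dP/dL = 0 yields the maximizer.
L_star = d * math.sqrt((c + t_v) / a)
```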
I gave these examples so that it is clear when we would have a trade-off between throughput and time-to-finality, and when we would not. The bottom line is: if propagation time grows at most linearly in block size, we have a trade-off situation at hand. On the other hand, if propagation time grows faster than linearly, we may or may not have a trade-off, depending on the parameters.
We define a simplistic gossip protocol as follows: the miner sends the new block to a randomly chosen node, and from then on, every node that has the block repeatedly sends it to another randomly chosen node. This goes on until everyone receives the block. The resulting "time" vs "nodes which have received the block" curve follows the logistic equation.
Following the formulation above, the given CDF

$$F(S, t) = \frac{1}{1 + (N - 1)\,e^{-t/\Delta(S)}},$$

where $N$ is the number of nodes and $\Delta(S)$ is the time it takes one node to transmit a block of size $S$ to another, yields the following propagation time:

$$\tau_p(S) = \Delta(S) \ln\frac{(1 - r)(N - 1)}{r}.$$

Since $\Delta(S)$ is proportional to $S$, this corresponds to the linear case above, with $a$ being equal to the coefficient multiplying $S$. If we actually had such a block propagation scheme, we would have a trade-off between throughput and time-to-finality.
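To make this concrete, here is a sketch assuming a logistic CDF with per-hop transmission time $\Delta(S) = S/\beta$ for some bandwidth $\beta$; both the CDF shape and the parameter values are assumptions of this example. Inverting the CDF shows that propagation time is linear in $S$:

```python
import math

N = 10_000   # number of nodes (assumed)
beta = 1e7   # per-hop bandwidth in bytes/sec (assumed), so Delta(S) = S / beta

def F(S, t):
    # Logistic CDF: fraction of nodes that have the block of size S at time t.
    return 1.0 / (1.0 + (N - 1) * math.exp(-t / (S / beta)))

def t_p(S, r):
    # Solve F(S, t) = 1 - r for t: propagation time keeping the fork rate at r.
    return (S / beta) * math.log((1 - r) * (N - 1) / r)
```

The result has zero intercept and is proportional to $S$, matching the linear case: doubling the block size doubles the propagation time.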
The gossip protocol described above is not the one used in blockchains. Block propagation in blockchains involves more message types, where nodes announce the blocks they have received and send requests to download the ones they do not have, in order to minimize bandwidth usage.
Deriving an explicit formula for this type of block propagation is not so easy, and perhaps not possible. However, even if we cannot derive such a formula, we could still run simulations to fit approximate models.
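As a starting point, such a simulation could look like the following toy round-based gossip model. All parameters are illustrative; a realistic model would also include link latencies, bandwidth limits, and the full announce/request message exchange:

```python
import random

def simulate_gossip(n_nodes=1000, fanout=8, seed=0):
    """Toy round-based gossip simulation: in each round, every informed node
    announces the block to `fanout` randomly chosen peers, which then fetch
    it. Returns the fraction of informed nodes after each round."""
    rng = random.Random(seed)
    informed = {0}  # node 0 mines the block
    curve = [len(informed) / n_nodes]
    while len(informed) < n_nodes:
        newly = set()
        for _node in informed:
            for _ in range(fanout):
                peer = rng.randrange(n_nodes)
                if peer not in informed:
                    newly.add(peer)
        informed |= newly
        curve.append(len(informed) / n_nodes)
    return curve
```

Fitting a logistic (or other) model to the resulting curve would give an approximate $F(S, t)$, from which propagation time could be derived as above.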
TBD
In other words, how fast the network can guarantee a user that a transaction is finalized and will not revert. The lower that duration, the better. ↩︎
See Fork Rate-Based Analysis of the Longest Chain Growth Time Interval of a PoW Blockchain. ↩︎