# Forecasting the sustainability of blockchain scaling
*Thanks to Ed Felten, Frederico Lacs and Patrick McCorry for comments and feedback*
Blockchains have failed to meet the throughput demanded of them. Since the end of 2016 on [Bitcoin](https://bitcoinvisuals.com/chain-block-weight) and the end of 2017 on [Ethereum](https://etherscan.io/chart/networkutilization) the frequency of full blocks has shown that demand for blockspace is outstripping supply. Transaction fees have risen, pushing out the users and use cases that can’t afford them.
The naive response is to increase the supply - bigger blocks - but this has knock-on effects on the security and decentralisation of the protocol. Bigger blocks mean that nodes in the network need better hardware to validate blocks. Node operators will need to pay more for that hardware, so fewer of them will do so. But to remain secure a [wide variety of users need to be able to validate blocks](https://vitalik.ca/general/2021/05/23/scaling.html).
Instead the search is on to find methods of scaling that do not increase the hardware costs of the average user. However some methods of increasing throughput are one-off improvements - eg a more efficient data structure for storing state data, or a cheaper method to verify user signatures. In these cases the gains will not last: growing demand will eventually consume a one-off increase in supply. At that point another improvement would need to be found to once again provide more supply.
<div style="display: flex; justify-content: center;">
<figure style="display: flex; flex-flow: column; max-width:600px">
<img style="" src="https://i.imgur.com/Gct3dKn.jpg">
<figcaption style="font-size:14px;">In ancient Greek mythology Sisyphus was doomed to push a heavy boulder to the top of a hill only for it to roll back down, and to repeat this task for eternity.</figcaption>
</figure>
</div>
This kind of consistent innovation does exist for other computational resources, leading to exponential improvements in performance - Moore’s law. However a Moore’s law for blockchain throughput has yet to emerge, or if it has, it’s still being outstripped by demand, as evidenced by climbing transaction fees.
Instead, it would be better if a single method could be found that scales sustainably into the future without the need for constant protocol innovation. To analyse whether protocols scale sustainably into the future we need to model the growing demand for blocks, the requirements the protocol makes of computational hardware, and the declining costs of that hardware.
In this article we’ll:
* Look at the scalability properties of a potential Ethereum upgrade “danksharding”
* Show that the network consumption of nodes should be sustainable for block builders, archive nodes and full nodes.
* Show that staker nodes are sustainable only if demand for blockspace grows at less than twice the rate at which hardware costs decline.
* Generalise the approach taken to analyse danksharding to a general model for forecasting blockchain scalability.
## Danksharding
[Danksharding](https://notes.ethereum.org/@dankrad/new_sharding) is a proposal that would introduce a new type of blockchain resource to improve the throughput of Ethereum as a data availability provider. In this post we'll attempt to analyse whether danksharding can scale into the future, or whether the boulder will eventually roll back down the hill, requiring the advent of another, better, data availability technique.
Danksharding defines unstructured data blobs that are provided in a new transaction type. These data blobs are then erasure coded and divided amongst groups of nodes (shards), who can then seed their portion of the data to other nodes. If enough nodes attest that they have seen enough of the data then a block is accepted as valid, and the data in the blobs is accepted as available.
Whilst danksharding impacts many underlying hardware resources eg storage, compute and memory, in this article we’ll focus only on its long term impact on **network bandwidth costs**.
We'll take the following approach to the analysis:
1. Estimate what volume of data blobs the blockchain will need to provide over time to keep up with demand.
2. Estimate how the unit price of network bandwidth may change over time.
3. Evaluate how units of blob data map to units of network bandwidth, for the different types of node in the system.
4. Combine these to get an estimate for the cost of running different types of node over time.
5. Determine any cost constraints that the protocol imposes on the different types of node
6. Evaluate whether the running costs will remain within the constraints to determine whether each of the node types is sustainable long term.
### Forecasting data blob demand
In order to provide sustainable scale, blockchain throughput needs to keep up with demand. Blockchain protocols offer fundamentally new and unique resources. Their censorship resistance, transparency and immutability enable new classes of platform and application.
<div style="display: flex; justify-content: center;">
<figure style="display: flex; flex-flow: column; max-width:800px">
<img style="" src="https://i.imgur.com/SWH3F7c.png">
<figcaption style="font-size:14px;">Broadband data usage in the UK appears to be increasing exponentially - https://www.ofcom.org.uk/__data/assets/pdf_file/0015/222324/affordability-of-communications-services-summary.pdf</figcaption>
</figure>
</div>
If transaction costs stay the same, which they should if throughput can keep up with demand, then we would expect the demand for blockchain resources to grow in a similar way to other fundamental digital resources - exponentially. So for this model we’ll assume that a “scalable” blockchain is one that can provide an exponentially increasing throughput.
$$D(t)= d_{1}e^{d_{2}t}$$
where $d_{1}$ is the initial demand and $d_{2}$ the rate of exponential growth
### Forecasting network bandwidth costs
With breakthroughs in technology and infrastructure, hardware gets more powerful. For network bandwidth the rate of these improvements is characterised by [Nielsen's law](https://www.nngroup.com/articles/law-of-bandwidth/) which predicts that network bandwidth of a high end user will grow by 50% each year. In many cases this translates to an exponentially decaying cost per unit of hardware resource, as users spend the same but get better technology. Although I haven’t found good data for cost per GB of network bandwidth over time, I have found data showing monthly spend on network bandwidth staying roughly constant.
<div style="display: flex; justify-content: center;">
<figure style="display: flex; flex-flow: column; max-width:800px">
<img style="" src="https://i.imgur.com/hcYMFBK.png">
<figcaption style="font-size:14px;">Average broadband expenditure in the UK appears to be roughly constant - https://www.ofcom.org.uk/__data/assets/pdf_file/0015/222324/affordability-of-communications-services-summary.pdf</figcaption>
</figure>
</div>
Combining this with Nielsen's law, we can intuit that the cost per GB of bandwidth is decaying exponentially.
$$C(t)=c_{1}e^{-c_{2}t}$$
where $c_{1}$ is the initial cost per unit of bandwidth and $c_{2}$ the rate of exponential decay
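To make these assumptions concrete, here's a minimal sketch of the two forecast curves in Python. The constants $d_1$, $d_2$, $c_1$ and $c_2$ are purely illustrative placeholders, not fitted values.

```python
import numpy as np

# Illustrative placeholder constants - real values would need to be fitted
# to measured blockspace demand and bandwidth pricing data.
d1, d2 = 1.0, 0.35   # initial demand (arbitrary units) and yearly growth rate
c1, c2 = 1.0, 0.40   # initial cost per unit (arbitrary units) and yearly decay rate

def demand(t):
    """Exponentially growing throughput demand D(t) = d1 * e^(d2 * t)."""
    return d1 * np.exp(d2 * t)

def unit_cost(t):
    """Exponentially decaying cost per unit of bandwidth C(t) = c1 * e^(-c2 * t)."""
    return c1 * np.exp(-c2 * t)

years = np.linspace(0, 20, 5)
for t, d, c in zip(years, demand(years), unit_cost(years)):
    print(f"t={t:4.1f}  D(t)={d:8.2f}  C(t)={c:6.3f}")
```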
### Node types
Danksharding describes a number of different types of node, each of which plays a different role in the blockchain system. For this post we’ll analyse the block builder, staker, full, and archive node types. In order to determine the network bandwidth requirements of running these node types we need to describe as accurately as possible the roles they play in the network, how that relates to bandwidth consumption, and what cost constraints these nodes have.
### Block builders
In the [PBS model](https://ethresear.ch/t/two-slot-proposer-builder-separation/10980) builder nodes are responsible for gathering transactions from users and forming them into blocks. It's questionable whether decentralisation in this role is necessary. If [cr-lists](https://notes.ethereum.org/@fradamt/H1ZqdtrBF) are implemented then even a centralised builder network shouldn't be able to censor. This needs further investigation and research, but for the purposes of this post we'll assume that a centralised builder system is acceptable, and therefore the only constraint we have on the sustainability of builders is that the builder node remains profitable to run:
$$K(t) - R(t) < 0$$
where $K(t)$ is the cost to run and $R(t)$ the revenue from running the builder at time $t$
Block builders do not make profit directly from the protocol but instead make profit by collecting and ordering transactions - MEV. We'll assume here that revenue from MEV is proportional to the shard blob throughput (although this relationship may actually be superlinear).
$$R(t) = \alpha D(t)$$
Block builders create full blocks and then transmit them over the network, therefore their network usage increases proportionally to the block size/throughput. We can therefore describe the cost of running a node as a function of throughput and network bandwidth cost as:
$$K(t) = \beta D(t) C(t) $$
where $\beta$ represents the proportionality between throughput and network usage.
Substituting this with our assumptions for throughput and network hardware costs, and comparing to revenue we have:
$$K(t) - R(t) < 0 $$ $$\beta D(t) C(t) - \alpha D(t) < 0 $$ $$\beta d_1c_1e^{(d_2-c_2)t} - \alpha d_1e^{d_2t} < 0$$
This function has two basic forms, dependent on the constants. One where the first term is initially larger than the second, yielding positive results, before the second term begins to dominate for higher $t$. And the other where the second term dominates from the beginning.
<div style="display: flex; justify-content: center;">
<figure style="display: flex; flex-flow: column; max-width:266px">
<img style="" src="https://i.imgur.com/kou2Zh7.png">
<figcaption style="font-size:14px;">The first term dominates initially, however the second term always dominates for high enough <b><i>t</i></b></figcaption>
</figure>
<figure style="display: flex; flex-flow: column; max-width:266px">
<img style="" src="https://i.imgur.com/WVwRtbx.png">
<figcaption style="font-size:14px;">The second term dominates from the beginning</figcaption>
</figure>
</div>
This means that, depending on the values of the constants, danksharding may initially be unsustainable for a block builder - their costs will outweigh their revenue. However, regardless of the constants, block building eventually becomes sustainable as the profit remains proportional to throughput, whilst their costs are lowered by decreasing hardware costs.
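As a rough illustration of the two regimes, the sketch below evaluates $K(t) - R(t)$ for the builder using the model above. All constants ($\alpha$, $\beta$, $d_1$, $d_2$, $c_1$, $c_2$) are hypothetical placeholders, chosen so that the builder starts out unprofitable.

```python
import numpy as np

def builder_net_cost(t, alpha=1.0, beta=3.0, d1=1.0, d2=0.35, c1=1.0, c2=0.40):
    """K(t) - R(t) for a builder: bandwidth cost minus MEV revenue (placeholder constants)."""
    cost = beta * d1 * c1 * np.exp((d2 - c2) * t)
    revenue = alpha * d1 * np.exp(d2 * t)
    return cost - revenue

t = np.linspace(0, 15, 300)
net = builder_net_cost(t)
if (net < 0).any():
    print(f"builder becomes profitable at roughly t = {t[np.argmax(net < 0)]:.1f} model years")
else:
    print("builder is unprofitable over the whole horizon considered")
```

Whatever the constants, the revenue term eventually wins because its exponent $d_2$ exceeds the cost exponent $d_2 - c_2$.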
### Stakers
These nodes participate in the Proof of Stake consensus protocol. They both propose the blocks that they receive from the builders, and attest to the validity of the blocks proposed by other stakers. Decentralisation is important in this role to maintain censorship resistance and protect against malicious re-orgs. We can preserve decentralisation in this role by ensuring that the **cost to run a staker node never exceeds some specified limit**, making it possible for a diverse set of participants to take part.
$$K(t) - Q_S < 0$$
where $Q_S$ is the maximum cost we expect a staker to be able to expend on network hardware
Stakers do not receive the full data from builders. The builders form the data into a square, erasure code it both horizontally and vertically, then split it into rows and columns. Each row and column gets seeded into its own sub p2p network layer. The stakers then subscribe to a fixed number of column p2p networks, and a fixed number of row p2p networks. Since the data is arranged in a square but stakers only sample rows and columns, they receive data proportional to the square root of the full data size.
<div style="display: flex; justify-content: center;">
<figure style="display: flex; flex-flow: column; max-width:495px">
<img style="" src="https://i.imgur.com/bWIeoD2.png">
<figcaption style="font-size:14px;">Stakers only sample a fixed number of rows and columns of the full data, meaning that their network usage grows proportionally to the square root of the full data size. Image taken from https://www.youtube.com/watch?v=e9oudTr5BE4</figcaption>
</figure>
</div>
Therefore their network usage should be proportional to the square root of the throughput. We then describe the cost of running a staker as:
$$K(t) = \gamma\sqrt{D(t)} C(t) $$
where $\gamma$ is a proportionality constant between shard blob data throughput and network usage.
Substituting in our assumptions about throughput and network hardware cost, and comparing to our constraint for decentralisation we have:
$$K(t) - Q_S < 0 $$ $$\gamma\sqrt{D(t)} C(t) - Q_S < 0 $$ $$\gamma\sqrt{d_1}c_1e^{(d_2/2-c_2)t} < Q_S$$
Depending on the comparative values of the constants $c_{2}$ and $d_{2}$ we get different results as to the sustainability of running a staker in a danksharding scheme. If the throughput growth rate exceeds twice the cost decay rate then the cost of running a staker will increase exponentially over time, eventually surpassing the fixed limit $Q_S$. However if throughput grows at less than twice the rate at which hardware costs decay then the cost of running a staker will decrease over time.
<div style="display: flex; justify-content: center;">
<figure style="display: flex; flex-flow: column; max-width:266px">
<img style="" src="https://i.imgur.com/HwKBeYd.png">
<figcaption style="font-size:14px;">Cost function K for stakers evaluated against an arbitrary limit Q when d<sub>2</sub>/2 > c<sub>2</sub></figcaption>
</figure>
<figure style="display: flex; flex-flow: column; max-width:266px">
<img style="" src="https://i.imgur.com/FxsZ8dG.png">
<figcaption style="font-size:14px;">Cost function K for stakers evaluated against an arbitrary limit Q when d<sub>2</sub>/2 < c<sub>2</sub></figcaption>
</figure>
</div>
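The threshold $d_2 < 2c_2$ can be checked numerically. In this sketch $\gamma$, $Q_S$ and the model constants are hypothetical placeholders; the point is only to show the two regimes either side of the threshold.

```python
import numpy as np

def staker_cost(t, gamma=2.0, d1=1.0, d2=0.35, c1=1.0, c2=0.40):
    """K(t) = gamma * sqrt(d1) * c1 * e^((d2/2 - c2) * t) with placeholder constants."""
    return gamma * np.sqrt(d1) * c1 * np.exp((d2 / 2 - c2) * t)

Q_S = 5.0                  # hypothetical bandwidth budget for a solo staker
t = np.linspace(0, 30, 300)
for d2 in (0.35, 1.0):     # below and above the 2 * c2 = 0.8 threshold
    exceeded = staker_cost(t, d2=d2) > Q_S
    print(f"d2 = {d2}: budget exceeded within 30 model years: {bool(exceeded.any())}")
```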
### Full nodes
Full nodes verify the validity of new blocks, but do not participate in their creation. It’s imperative for the safety of the network that average users can [run a full node](https://vitalik.ca/general/2021/05/23/scaling.html). For this to be the case the cost of running a full node should not exceed a specified limit, and that limit should probably be even lower than that required for staking:
$$K(t) - Q_F < 0$$
where $Q_F$ is the maximum cost we expect a full node operator to be able to expend on network hardware
The erasure coding of the data means that full nodes can get a probabilistic guarantee that the full data exists by requesting random samples of points of the data. To achieve this guarantee full nodes need only make a logarithmic number of network requests - each of a constant size.
$$K(t) = \delta\ln{(D(t))}C(t) $$
where $\delta$ is a proportionality constant between shard blob data throughput and network usage.
Combining this with assumptions for throughput and hardware costs, and comparing it to the decentralisation cost constraint we have:
$$K(t) - Q_F < 0$$ $$ \delta\ln{(D(t))}C(t) - Q_F < 0 $$ $$ \delta(\ln{d_1} + d_2t)c_1e^{-c_2t} < Q_F $$
From this we see that, whilst the cost may initially rise, eventually the reducing cost of hardware dominates, ensuring that in the long term running a full node is sustainable. In the short term, however, the cost may peak above the limit.
<div style="display: flex; justify-content: center;">
<figure style="display: flex; flex-flow: column; max-width:266px">
<img style="" src="https://i.imgur.com/jwFWJvw.png">
<figcaption style="font-size:14px;">The cost of running a full node compared to the decentralisation cost constraint Q. Although it may, or may not, initially rise above the limit Q, the falling cost of network bandwidth should eventually beat the linearly growing network requirement</figcaption>
</figure>
</div>
Note that full nodes could also decide to propagate more data than just their logarithmic samples. In this case they could propagate a fixed number of rows and columns, giving them the same network costs as stakers.
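For the sampling-only case, here is a quick numerical sketch of the full node cost curve, with purely illustrative constants (here $d_1$ is chosen small enough that the curve actually rises before hardware cost decay takes over):

```python
import numpy as np

def full_node_cost(t, delta=1.0, d1=1.5, d2=0.35, c1=1.0, c2=0.40):
    """K(t) = delta * (ln(d1) + d2*t) * c1 * e^(-c2*t) with placeholder constants."""
    return delta * (np.log(d1) + d2 * t) * c1 * np.exp(-c2 * t)

t = np.linspace(0, 40, 400)
cost = full_node_cost(t)
print(f"peak cost {cost.max():.2f} at t = {t[np.argmax(cost)]:.1f} model years; "
      f"cost at t = 40: {cost[-1]:.3f}")
```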
### Archive nodes
Archive nodes create extra indexes and supplementary data structures. They aren’t essential to the safety of the network, however they do impact its usability. A centralised archive node network only has limited power. They can censor users from accessing archive state, but as long as a user can find one archive node operator willing to provide access to archive state then the user will be able to get the data they need to transact. For that reason we'll assume here that a high degree of centralisation is acceptable in the archive node role, and that running archive nodes is sustainable as long as their costs do not outweigh their revenue.
The revenue models of archive node operators are more varied, they tend to charge for API usage and make custom deals with different projects. We’ll assume here that, like builders, archive nodes make revenue proportional to the throughput of the chain. However this may not be a valid assumption, and needs more investigation.
Archive nodes need all of the data to function, so their network bandwidth usage is proportional to throughput. They therefore have the same costs and constraints as builders, and yield the same sustainability analysis.
## General model for sustainability forecasting
We’ve analysed the network bandwidth costs incurred by different types of node for serving blob data. However, this isn’t the full picture. Nodes consume other hardware resources, eg. storage, memory and compute.
They also offer different types of resources to their users. Users can store data in state or in the history of the chain, or execute arbitrary programs. In fact any observable resource that can be publicly manipulated by a user is a potential resource - think of all the opcodes in EVM. These operations can be grouped together by the way they impact hardware, we’ll call these groupings of operations **blockchain resources**.
Each blockchain resource can be mapped to the amount of hardware it consumes via a function that describes the protocol, we’ll call that function the Impact function. The Impact function describes the number of units of hardware resource consumed for each unit of blockchain resource.
The full cost of serving a particular blockchain resource can then be calculated by using the Impact function to map the number of blockchain resource units (throughput) consumed to the number of hardware resource units consumed, and multiplying that by the cost of a unit of hardware resource. This is the exercise we've completed above for the shard blob data blockchain resource and network bandwidth hardware resource pair.
The total cost of running a node will then be the sum of doing this for each hardware and blockchain resource. This cost should be evaluated for each of the different node types involved in the protocol.
$$
K_i(t) = \Sigma_{j,k}[I_{i,j,k}(D_j(t))C_k(t)]
$$
Where:
- $i$ is the node type: eg. $i \in \{ BU, S, F, A \}$ where $BU$ is block builder, $S$ is staker, $F$ is full and $A$ is archive.
- $K(t)$ is the cost to run a node at some time $t$
- $j$ is the blockchain resource: eg. $j \in \{ H, SW, SR, C, M, B \}$ where $H$ is history, $SW$ is storage writes, $SR$ is storage reads, $C$ is compute, $M$ is memory and $B$ is shard blobs
- $D_j(t)$ is the throughput of the blockchain resource $j$ at time $t$
- $k$ is the hardware resource: eg. $k \in \{ N, SS, SIO, P, M \}$ where $N$ is network bandwidth, $SS$ is storage size, $SIO$ is storage io, $P$ is processing power/compute and $M$ is memory
- $C_k(t)$ is the cost per unit of hardware resource $k$ at time $t$
- $I_{i,j,k}$ is the Impact function for node type $i$ and blockchain-hardware resource pair $j$-$k$
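As an illustration only, the general cost model can be written down directly in code. The resource names, impact functions and constants below are hypothetical placeholders rather than protocol parameters.

```python
import math

# Throughput forecasts D_j(t), one per blockchain resource j (placeholder curves).
demand = {
    "shard_blobs": lambda t: 1.0 * math.exp(0.35 * t),
}

# Unit cost forecasts C_k(t), one per hardware resource k (placeholder curves).
unit_cost = {
    "network": lambda t: 1.0 * math.exp(-0.40 * t),
}

# Impact functions I_{i,j,k}: units of hardware k consumed per unit of
# blockchain resource j, for node type i (placeholder shapes from the analysis above).
impact = {
    ("builder", "shard_blobs", "network"): lambda x: 3.0 * x,
    ("staker", "shard_blobs", "network"): lambda x: 2.0 * math.sqrt(x),
    ("full", "shard_blobs", "network"): lambda x: 1.0 * math.log(x),
}

def node_cost(node_type, t):
    """K_i(t): total hardware cost for node type i at time t, summed over (j, k) pairs."""
    return sum(
        impact_fn(demand[j](t)) * unit_cost[k](t)
        for (i, j, k), impact_fn in impact.items()
        if i == node_type
    )

for node_type in ("builder", "staker", "full"):
    print(node_type, round(node_cost(node_type, 10.0), 3))
```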
Once the costs of running a node have been determined we then need to examine any cost constraints the node may have, due to the role it plays in the network. We can then compare these cost constraints to the forecasted cost of running a node:
$$
\overline{S_i}=\begin{cases}1&\text{if } K_i(t) - R_i(t) < 0\;\;\forall\, t\\0&\text{otherwise}\end{cases}
$$
Where:
- $K_i(t)$ is the cost to run a node of type $i$ at time $t$
- $R_i(t)$ is the cost constraint for node type $i$ at time $t$
- $\overline{S_i}$ is the sustainability metric for node type $i$
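A sketch of how the sustainability metric might be checked over a finite horizon (the formula above quantifies over all $t$; the horizon, constraint and cost curve here are hypothetical):

```python
import math

def sustainable(node_cost_fn, constraint_fn, horizon_years=30.0, steps=300):
    """Return 1 if K_i(t) - R_i(t) < 0 at every sampled t in the horizon, else 0."""
    ts = [horizon_years * s / steps for s in range(steps + 1)]
    return int(all(node_cost_fn(t) - constraint_fn(t) < 0 for t in ts))

# Hypothetical staker cost K(t) = 2 * e^((0.35/2 - 0.40) * t) against a fixed budget Q_S = 5
print(sustainable(lambda t: 2.0 * math.exp((0.35 / 2 - 0.40) * t), lambda t: 5.0))
```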
### Method summary
The method for forecasting the sustainability of a blockchain protocol is then:
1. List the different node types in the system
2. List the different types of blockchain resource in the system - resources offered to users
3. Forecast how the throughput of these blockchain resources would grow, if not capped by supply
4. List the different types of hardware consumed by the nodes
5. Forecast how the cost of these hardware types can be expected to change over time
6. Examine the exact behaviour of each node type as defined by the protocol and implementation to determine how many units of hardware are consumed for a single unit of each blockchain resource
7. Evaluate the cost function for each of the node types
8. Given the roles that each of the node types fulfils in the network, determine any constraints on running costs that these node types may have
9. Compare the cost function to the cost constraints for that node type to determine whether the node will be sustainable long term
### Danksharding in the sustainability forecasting model
Let's summarise the results of applying the sustainability forecasting model to danksharding's shard blob data - network bandwidth pair.
| Node type | Impact function $I_{i,B,N}(x)$ | Cost constraint $R_{i,B,N}(t)$ | Sustainability metric $\overline{S_{i,B,N}}$ |
|-----------|:-:|:-:|:-:|
| Builder | $\beta_{BU}x$ | $\alpha_{BU}D_{B}(t)$ | May initially be $0$ but tends to $1$ |
| Staker | $\beta_S\sqrt{x}$ | $Q_S$ | if $d_{2B}/2 < c_{2N}$ then $1$ otherwise $0$ |
| Full | $\beta_F\ln{x}$ | $Q_F$ | Very likely $1$, could be $0$ for some period dependent on constants |
| Archive | $\beta_Ax$ | $\alpha_{A}D_{B}(t)$ | May initially be $0$ but tends to $1$ |
We’ve assumed that demand (and therefore throughput) for blob data will grow exponentially, that the unit price of network bandwidth will decay exponentially, and that the revenue of builder and archive nodes will be proportional to the throughput of the data blobs.
Given these assumptions, in the long term we can expect builder, archive and full nodes to have sustainable network bandwidth costs, although there may be some short term periods where they are unsustainable.
The staker node, however, is different in that we only expect it to be sustainable long term if throughput grows at less than twice the rate at which network bandwidth costs decay.
### Exercise for the reader :)
In [this video](https://www.youtube.com/watch?v=TdsaVoJiy3g) Geth developer Péter Szilágyi explains that the current bottleneck to blockchain scaling is state access. Determine the Impact function for the (state write)-(storage io) pair on full nodes $I_{F,SW,SIO}$ and plot the possible forms of the cost function $K(t)$ to show the “wall” that Péter refers to in the video.