# Kate commitments in ETH: an engineering and implementation perspective
Disclaimer: this document just collates, links and includes work published across many articles and contributions; the respective credits (including for images extracted from these links) go to the respective authors/developers of that work.

ps: Special thanks go to the Ethereum R&D discord (especially to @vbuterin and @piper) for helping understand some of the aspects of Kate. Additional gratitude to @vbuterin for reviewing the document.

pps: This document was developed for the benefit of the 🌟 lodestar team, an awesome ETH POS client in the works based on typescript enabling ETH everywhere, but also to further the author's own understanding of the ETH ecosystem and its innovations. Hopefully this will benefit other developers/techies from all around the world. All rights to this work are waived through CC0.
## Motivation

A wholesome guide to familiarize, summarize and provide pointers to deep dive into the proposed uses of Kate commitments while staying in Ethereum's context. The aim of the document is more summarization and less rigor, but please follow the links, as they explain in detail what is being talked about.
## Some Basics

Note-1: a hash is just a commitment to the value being hashed; a proof is what checks the integrity of data w.r.t. that hash.

For example, compute `h1 = H(t1, t2, t3, ...)` and give `h1` to the verifier (e.g. in a block header). Given a tampered block `(t1, t2', t3, ...)`, one would be able to quickly check the integrity of the block against `h1` and reject it as tampered. Similarly, the root of a merkle tree is a commitment to the leaves and their indexes (paths), or in short to the map of `indexes => values`.
The proof here is the merkle branch and its sibling hashes, which can be used to check consistency all the way up to the merkle root. Trie it out here.
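To make that concrete, here is a minimal sketch (plain Python; the helper names are illustrative, not from any client codebase) of checking a leaf against a merkle root using its branch of sibling hashes:

```python
import hashlib

def H(b):
    return hashlib.sha256(b).digest()

def verify_branch(root, leaf, index, branch):
    """Walk from leaf to root in a binary merkle tree: `branch` holds the
    sibling hash at each level, the bits of `index` pick left/right order."""
    node = H(leaf)
    for sibling in branch:
        if index & 1:                  # our node was the right child
            node = H(sibling + node)
        else:                          # our node was the left child
            node = H(node + sibling)
        index >>= 1
    return node == root
```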
Note-2: correspondence between a data map and a polynomial

The map of `indexes => values` can be represented as a polynomial `f(x)` which takes the value `f(index) = value`, courtesy of Lagrange interpolation. This `f(index) = value` representation is called the evaluation form, and the more typical representation `f(x) = a0 + a1*x + a2*x^2 + ...` is called the coefficient form. Intuitively, we fit a polynomial on the `(index, value)` points.

For optimization, and for a one-to-one mapping between polynomial and data map, instead of using plain `index` as `x` in `f(x)`, `w^index` is used, i.e. `f(w^index) = value`, where `w` is the `d`th root of unity and `d` is the degree of the polynomial (the maximum index value we are trying to represent). This allows FFTs to be deployed for efficient polynomial operations like multiplication and division, which are `O(d)` in evaluation form and can be converted back into coefficient form in `O(d*log(d))`. So it is still beneficial to keep `d` small.
Note-2.1: Ethereum's state is a map from `addresses => (version, balance, nonce, codeHash, storageRoot)`.
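A minimal sketch of this evaluation-form/coefficient-form correspondence over a toy field (`p = 257` and `w = 64`, a primitive 8th root of unity mod 257, are illustrative stand-ins for the real BLS scalar-field parameters):

```python
# Toy field (illustrative, not the real BLS parameters): p = 257 is prime
# and w = 64 is a primitive 8th root of unity mod p, i.e. 64**8 % 257 == 1.
p, w, d = 257, 64, 8
values = [3, 14, 15, 9, 2, 6, 5, 35]            # the map index => value

def poly_mul(a, b):
    """Multiply two coefficient-form (constant-first) polynomials mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def lagrange_interpolate(xs, ys):
    """Coefficient form of the unique degree < len(xs) polynomial
    with f(xs[i]) = ys[i], built as a sum of Lagrange basis polys."""
    coeffs = [0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, denom = [1], 1
        for j, xj in enumerate(xs):
            if j != i:
                num = poly_mul(num, [-xj % p, 1])   # times (x - xj)
                denom = denom * (xi - xj) % p
        scale = yi * pow(denom, -1, p) % p
        coeffs = [(c + scale * n) % p for c, n in zip(coeffs, num)]
    return coeffs

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):                      # Horner's rule mod p
        acc = (acc * x + c) % p
    return acc

domain = [pow(w, i, p) for i in range(d)]           # w^index, not the raw index
f = lagrange_interpolate(domain, values)            # coefficient form of the map
assert all(poly_eval(f, domain[i]) == v for i, v in enumerate(values))
```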
## Background

Ethereum currently uses merkle trees, more specifically patricia merkle tries, to commit to EVM data (EVM state, block transactions, transaction receipts, and maybe in the near future contract code as well) in a way that can be built, updated and proven piece by piece. The trie structure here gives that piece-by-piece capability.

Given a large `d`-ary trie with `N` leaves, any leaf change will take `O(log_d(N))` node updates (all the way to the root) to compute the new `root` reflecting the new state, and requires an additional `(d-1) * O(log_d(N))` sibling node hashes/commitments as witnesses, costing both time (and space, if light clients are to be served). In a block, which is a batched update of, say, `m` random leaf updates where `m << N`, only a very small percentage of nodes can be expected to share witnesses and computations, and hence the update order wouldn't change much per update.

This problem is compounded even further (because of the size of the witness data) in the following scenarios:
- In an experiment on stateless ethereum, block proof sizes of ~1MB were observed (the majority of which were merkle proofs), and these would blow up severalfold in times of an attack.
One way to get around this is binary merkle trees, taking `d` out of the picture, but then the depth of the tree increases and the witness size still remains `O(log(N))`.
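A back-of-envelope sketch (illustrative numbers only) of that witness-size tradeoff for a single-leaf proof:

```python
import math

N = 2**28              # leaf count, roughly the account-key scale used later
HASH_BYTES = 32        # size of each sibling hash in a witness

for d in (16, 2):      # hexary (patricia-style) vs binary branching
    depth = math.ceil(math.log2(N) / math.log2(d))
    witness = (d - 1) * depth * HASH_BYTES
    print(f"d={d:2}: depth={depth}, witness per leaf ~ {witness} bytes")
# d=16: 7 levels * 15 siblings * 32B ~ 3360 bytes per leaf proof
# d= 2: 28 levels *  1 sibling * 32B ~  896 bytes per leaf proof
```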
## Why Kate?

The following properties are desirable for an ideal commitment scheme for committing to data in the block headers:

1. a small, constant-size commitment
2. small, quickly verifiable proofs/witnesses
3. the ability to prove many `index => value` points together in one small proof
4. the ability to update the commitment and witnesses to the data (`index => value` map) in an incremental way

Kate commitment based constructions are the result of the search for such an ideal scheme. Vanilla Kate excels at the first three.
## What is Kate

A Kate commitment as such is just another hashing scheme, but it doesn't hash bytes: it hashes polynomials. It actually is the evaluation of the polynomial `f(x)` at a secret (fixed) value `s`, but represented on an elliptic curve, i.e. `[f(s)] = f([s])`. This requires a trusted setup (of the sort of the zcash genesis) to generate `[s]`, `[s^2]`, ... `[s^d]` (to plug into the polynomial wherever there is an `x^i`), where `d` is the max degree we would deal in.

Here the notation `[t]` means the elliptic curve value at point `t`, which is basically `t[1]`, i.e. the generator `[1]` of the additive group on the elliptic curve added `t` times (modulo `Fp`). All operations on the curve are modulo `Fp`, where `Fp` imposes some field on the curve.

Note-3.0: any/every `value` in the `indexes => values` map also has to be represented on the elliptic curve as the curve element `[value]` to calculate the commitment (as you will see later). This imposes a restriction on the size of `value`: it must fit modulo `Fp` ~ the BLS curve order ~ somewhere between `31-32` bytes. For simplicity it gets restricted to `31` bytes; any bigger `value` will need to be chunkified and appropriately represented with its `index`es (or truncated).

Note-3.1: `[t]` can be treated as a hash of `t`, as obtaining `t` from `[t]` is the discrete log problem, known to be intractable for secure curves.

Note-3.2: `s` is a secret and should remain unknown forever to all/any, but the elliptic curve points `[s]`, `[s^2]` ... `[s^d]`, as well as the evaluation on another curve `[s]'` (whose generator is `[1]'`; only `[s]'` is required), are generated, known, shared and public at trusted setup time. These are sort of the system parameters which define the security of our entire system, as knowing `s` would allow an attacker to construct any sort of proofs. So in a multi-party trusted setup, where `N` participants come together to generate the setup in a protocol combining their own local `s`, even if `1` participant is honest and destroys their contributing `s`, the system is fully secure; i.e. the risk is ~`1/N`, and the higher the number of participants, the lower the risk!

Note-3.3: `[]` is a linear operator, i.e. `[x] + [y] = [x+y]`; also `a[x] = [ax]`.
As mentioned before, we represent our data map (`index => value`) as `f(w^index) = value`, i.e. the evaluation form of the polynomial (in other words, we fitted a polynomial on these `(w^index, value)` points). So the Kate commitment `C(f)` of a polynomial `f(x)` is the elliptic curve point `f([s])`, which can be computed by plugging `[s]`, `[s^2]` ... into the expanded form of `f(x)`.

Note-3.4: `f(s)` can't be computed, as `s` is unknown, but `C(f) = [f(s)] = f([s])` can be!

Note-3.5: the commitment of `f(x)`, `C(f) = [f(s)]`, is also a linear operator, i.e. `C(f+g) = C(f) + C(g)`. Rollups/aggregators can use this property to update commitments: in evaluation form, updating one evaluation point leads to a complete change in `f(x)`, but its commitment `C(f)` can still be easily updated using this property.

Note-3.6: if a setup `[s]`, `[s^2]` ... `[s^d]` is performed for up to `d` exponents, then it is impossible to represent the commitment of any polynomial with degree `> d`, and vice versa. As there is no way to multiply two curve points to get another curve point in secure curves, `[s^(d+k)]` can't be evaluated (ever!), and conversely it can be argued that any commitment `C(f)` can only represent a polynomial of degree `<= d`.
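Here is a minimal sketch of computing `C(f)` with the py_ecc library (an assumption of this snippet: `pip install py_ecc`). The trusted setup is faked by sampling `s` locally, which is exactly what must never happen in production; it also lets us sanity-check the linearity property from Note-3.5:

```python
import random
from py_ecc.bls12_381 import G1, add, multiply, curve_order

# TOY trusted setup: in reality s comes from an MPC ceremony and is destroyed.
d = 4
s = random.randrange(1, curve_order)
setup_g1 = [multiply(G1, pow(s, i, curve_order)) for i in range(d + 1)]  # [s^i]

def commit(coeffs):
    """C(f) = f([s]): plug the public [s^i] points into the expanded f(x)."""
    acc = None                                   # None = identity in py_ecc
    for a_i, s_i in zip(coeffs, setup_g1):
        acc = add(acc, multiply(s_i, a_i % curve_order))
    return acc

f = [3, 1, 4, 1, 5]                              # a0 + a1*x + ... + a4*x^4
g = [2, 7, 1, 8, 2]
f_plus_g = [(a + b) % curve_order for a, b in zip(f, g)]
assert commit(f_plus_g) == add(commit(f), commit(g))   # C(f+g) = C(f) + C(g)
```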
Note-3.7: proofs using Kate always try to show that `f(x) - some remainder` can be factored in a particular way, but this requires a way to multiply the factors and compare with the original commitment `C(f) = f([s])`. For this we need the pairing equation, which is just a multiplication scheme for two points on the curve, comparing the product against a candidate point, since we can't directly multiply two curve points to get the resultant curve point.

Note-3.8: the above two properties can further be used to prove that the polynomial `f(x)` a particular commitment `C(f)` represents is of a low degree `k` (`< d`).

So what's the good thing about all this?
- One can prove the evaluation `y = f(r)` of the underlying polynomial at any point `r` by providing the evaluation of the quotient polynomial `q(x) = (f(x) - y)/(x - r)` at `[s]`, i.e. `q([s])`, and comparing it with the previously provided commitment `f([s])` using the pairing equation.

This is called opening the commitment at `r`, and `q([s])` is the proof. One can easily see that `q(s)` is intuitively the quotient `f(s) - y` divided by `s - r`, which is exactly what we check using the pairing equation, i.e. check `(f([s]) - [y]) * [1]' = q([s]) * [s-r]'`. A sketch of this open/verify flow follows.
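A sketch of the single-point open and pairing check with py_ecc (the faked setup and `commit` are repeated from the previous snippet so this stands alone; pure-Python pairings are slow, so expect the final assert to take a while):

```python
import random
from py_ecc.bls12_381 import G1, G2, add, multiply, neg, pairing, curve_order

# Faked trusted setup, TOY ONLY: in reality s is generated in an MPC and destroyed.
d = 4
s = random.randrange(1, curve_order)
setup_g1 = [multiply(G1, pow(s, i, curve_order)) for i in range(d + 1)]  # [s^i]
setup_g2_1, setup_g2_s = G2, multiply(G2, s)                             # [1]', [s]'

def commit(coeffs):
    """C(f) = f([s]), from the public G1 setup points."""
    acc = None
    for a_i, s_i in zip(coeffs, setup_g1):
        acc = add(acc, multiply(s_i, a_i % curve_order))
    return acc

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):                       # Horner's rule
        acc = (acc * x + c) % curve_order
    return acc

def open_at(coeffs, r):
    """Return (y, proof) with y = f(r) and proof = q([s]),
    where q(x) = (f(x) - y) / (x - r), via synthetic division."""
    y = poly_eval(coeffs, r)
    num = coeffs[:]
    num[0] = (num[0] - y) % curve_order              # f(x) - y has root r
    q = [0] * (len(num) - 1)
    for i in range(len(num) - 1, 0, -1):
        q[i - 1] = num[i]
        num[i - 1] = (num[i - 1] + r * num[i]) % curve_order
    return y, commit(q)

def verify(C, r, y, proof):
    """Pairing check: e(C - [y], [1]') == e(q([s]), [s]' - r[1]')."""
    lhs = pairing(setup_g2_1, add(C, neg(multiply(G1, y))))
    rhs = pairing(add(setup_g2_s, neg(multiply(G2, r % curve_order))), proof)
    return lhs == rhs

f = [3, 1, 4, 1, 5]                                  # 3 + x + 4x^2 + x^3 + 5x^4
r = 123456789
y, proof = open_at(f, r)
assert verify(commit(f), r, y, proof)                # slow: pure-Python pairings
```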
Two practical notes on openings:

- `r` as randomness only matters with respect to the inputs we are trying to prove; i.e. once the commitment `C = f([s])` has been provided, `r` can be obtained by hashing all the inputs (`r = Hash(C, ...)`), and the commitment provider then has to provide the opening and proof.
- `f([s])` and `q([s])` can be computed directly from the evaluation form. To compute an opening at `r`, you would convert `f(x)` to the coefficient form `f(x) = a0 + a1*x^1 + ...` (i.e. extract `a0`, `a1`, ...) by doing the inverse FFT, which is `O(d log d)`, but there is even a substitute algorithm available to do it in `O(d)` without applying the (inverse) `FFT`.
- One can also prove multiple evaluations of `f(x)` at their respective `index`es, i.e. multiple `index => value` points (`index1 => value1`, `index2 => value2` ... `indexk => valuek`, i.e. `k` data points), using a single opening and proof. Here the quotient polynomial `q(x)` (used to calculate the proof) is the quotient from dividing `f(x)` by the zero polynomial `z(x) = (x - w^index1) * (x - w^index2) ... (x - w^indexk)`, with remainder `r(x)` (`r(x)` is a max degree `k` polynomial that interpolates `index1 => value1`, `index2 => value2` ... `indexk => valuek`). The pairing check becomes `(f([s]) - r([s])) * [1]' = q([s]) * z([s]')`, as sketched in the snippet below.
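A toy sketch of the multiproof quotient (coefficient-form arithmetic over the illustrative prime field `p = 257`, no curve operations): dividing `f(x)` by the zero polynomial `z(x)` leaves exactly the remainder `r(x)` that interpolates the opened points:

```python
p = 257                                   # toy prime field

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def poly_divmod(num, den):
    """Long division mod p (constant-first coefficients): num = q*den + rem."""
    num, q = num[:], [0] * (len(num) - len(den) + 1)
    inv_lead = pow(den[-1], -1, p)
    for i in range(len(q) - 1, -1, -1):
        q[i] = num[i + len(den) - 1] * inv_lead % p
        for j, dj in enumerate(den):
            num[i + j] = (num[i + j] - q[i] * dj) % p
    return q, num[:len(den) - 1]

def poly_eval(c, x):
    acc = 0
    for ci in reversed(c):
        acc = (acc * x + ci) % p
    return acc

f = [3, 1, 4, 1, 5, 9, 2, 6]              # the committed polynomial
xs = [2, 5, 11]                           # opened points (stand-ins for w^index_i)

z = [1]                                   # z(x) = (x - x1)(x - x2)...(x - xk)
for x in xs:
    z = poly_mul(z, [-x % p, 1])

q, r = poly_divmod(f, z)                  # f = q*z + r
# r (degree < k) automatically interpolates the opened index => value points:
assert all(poly_eval(r, x) == poly_eval(f, x) for x in xs)
```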
In the sharded setup of the POS chain, where sharded data blobs will be represented as low degree polynomials (and extended into twice their size for erasure coding using the same fitted polynomial), the Kate commitment can be checked against random chunks to validate and establish availability without needing sibling datapoints, hence enabling random sampling.

Now, for a state with `2^28` possible account keys, you would require at least a `2^28` degree polynomial (the total account-key space can be much, much bigger) for just a flat commitment construction. There are some downsides to that if updates and inserts are to be done: any change in any one of those accounts will trigger recomputation of the commitment (and, more problematically, of the witnesses/proofs).

## Updates to Kate
- Updating any `k` `index => value` points, let's say at `indexk`, will require `k` operations to update the commitment again using the corresponding Lagrange polys, ~`O(1)` per update.
- But since `f(x)` has changed, all the witnesses `q_i([s])`, i.e. the witness of the `i`th `index => value` point, will need to be updated, ~`O(N)`.
- Without precomputed `q_i([s])` witnesses, any single witness calculation from scratch requires `O(N)`.
- Tradeoff constructions between maintaining all precomputed witnesses and computing each from scratch can bring this to ~`O(sqrt(N))`.

Hence, for the 4th point of the ideal commitment scheme, we need a special construction: Verkle tries.
## Verkle tries

The ethereum state that needs to be represented is a `2^28 ~= 16^7 ~= 250m` sized `indexes => values` map. If we just build a flat commitment (the `d` we would need will be at least ~`2^28`), then despite our proof still being `O(1)` (the size of an `elliptic curve element`, 48 bytes), any insert or update will require either `O(N)` updates to all the precomputed witnesses (the `q_i(s)` of all the points, as `f(x)` has now changed) or `O(N)` on-the-fly computation per witness if no precomputed witnesses are maintained. Hence we need to move away from a flat structure to something called Verkle Trees, which you will notice is also a trie, like its merkle counterpart; i.e. we build a commitment tree in the same way as a merkle tree, where we can keep `d` low at each node (though it can still go as high as ~`256` or `1024`).

- Each node commits to an `index => child values` map, where `index` is the child's `index` at that particular parent node.
- Child values are `32` bytes in size (check note on size).
- Leaf commitments commit to the `32` byte hash of the data those leaves store, or directly to the data if it is itself only `32` byte data, as in the case of the proposed State Tree mentioned in the next section (check note on size).
- The multiproof commitments `D` and `E` can be generated along with their openings at a relatively random point `t`, again using the Fiat-Shamir heuristic.
again using fiat shamir heruristic.Order
This is an analysis of verkle branch multi proof
index=>value
will requiredlog_d(N)
commitment updates ~log_d(N)
f_i(X)/(X-z_i)
at[s]
forD
which is in totalO(d log_d N)
, but can be pushed to save precomputes at update/inserts which will push its order toO d log_d(N)
m
~O( log_d(N) )
number of terms:f_i(t)
to evaluateh(t)
which is totalO (d log_d N)
π
,ρ
which requires division of sum ofm~ log_d N
exponent-ed polys ~O(d log_d N)
to get to the evaluation form of the numerator in order to compute the divisionE
) plus verification order ~O( log_d(N) )
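For a feel of the numbers above (illustrative arithmetic only): with `d = 256` and `N = 2^28` keys, verkle paths are only about 4 levels deep, versus 28 for a binary merkle tree:

```python
import math

d, N = 256, 2**28
depth = math.ceil(math.log2(N) / math.log2(d))   # log_d(N) levels per path
print(depth)                                     # 4 (vs 28 for a binary tree)
print(depth * 48)                                # ~192 bytes of 48-byte path commitments
```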
## Verkle Trees Construction Scenarios

### Proposed ETH State Verkle Tree

A single trie structure for account headers, code chunks, as well as storage chunks, with node commitments over degree `d = 256` polys:

- Keys are a `storageKey`, which essentially is a representation of the tuple `(address, sub_key, leaf_key)`.
- `65536` chunks of `32` bytes (check note on size), i.e. `65536 * 32` bytes, are represented as a single depth-2 subtree; there can be many such subtrees in the main trie for the storage of an account.
- Accessing an `(address, sub_key, leaf_key)` is charged a `WITNESS_CHUNK_COST` for every unique access event and a `WITNESS_BRANCH_COST` for every unique `(address, sub_key)` combination.

### Code merklization
Code will automatically become part of the verkle trees as part of the unified state tree. Accessing code chunks is charged a `WITNESS_CHUNK_COST` per chunk and one main `WITNESS_BRANCH_COST` for accessing the account.

## Data sampling and sharding in POS protocol
One of the goals of ETH POS is to enable committing ~1.5MB/s of data into the chain (think of this throughput as the state-change throughput, and hence the transaction throughput, that layer 2 rollups can use, and eventually layer 1 EVM). To achieve this, many parallel proposals will be made and verified in any given ~`12` second slot, and hence multiple (~`64`) data shards will exist, each publishing its own shard blob every slot. The beacon block will include the shard blobs which have `> 2/3` voting support, and the fork choice rule will determine whether the beacon block is canonical based on the availability of all the blobs on that beacon block and its ancestors.

Note-3: shards are not chains here; any implied order will need to be interpreted by layer 2 protocols.
Kate commitments can be used to create a data validity and availability scheme without clients needing to access the full data published across shards by the shard proposers.
- Each shard blob consists of `16384` samples of `32 byte`s (check note on size), ~`512kB`, with the shard header comprising mainly the corresponding max-`16384` degree polynomial commitment to these samples.
- The evaluation domain `D` of the polynomial's evaluation representation is of size `2 * 16384`, i.e. `1`, `w^1`, ... `w^32767`, where `w` is the `32768`th root of unity and not the `16384`th.
- The proposer fits a max-`16384` degree polynomial on the data (`f(w^i) = sample-i` for `i < 16384`), and then extends (evaluates) it to `32768` places (i.e. evaluates `f(w^16384)` ... `f(w^32767)`) as erasure-coded samples. Any `16384` samples out of the `32768` can be used to fully recover `f(x)` and hence the original samples `f(1)`, `f(w^1)`, `f(w^2)` ... `f(w^16383)` (a toy sketch of this extend-then-recover idea follows below).
- `2048` chunks of these `32768` erasure-coded samples (each chunk containing `16` samples, i.e. `512 byte` chunks) are published by the shard proposer horizontally (the `i`th chunk going to the `i`th vertical subnet along with its respective proof), plus the `commitment` of the full blob is published globally.
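The promised toy sketch (a domain of 16 instead of 32768, over the illustrative field `p = 257` where `w = 249` is a primitive 16th root of unity; real blobs use the BLS scalar field):

```python
# Toy parameters: p = 257 and w = 249, a primitive 16th root of unity mod p,
# stand in for the real 32768th root over the BLS scalar field.
p, w = 257, 249
samples = [3, 14, 15, 9, 2, 6, 5, 35]               # 8 raw samples (vs 16384)

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def interpolate(xs, ys):
    """Lagrange interpolation mod p, as in the earlier evaluation-form snippet."""
    coeffs = [0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, denom = [1], 1
        for j, xj in enumerate(xs):
            if j != i:
                num = poly_mul(num, [-xj % p, 1])
                denom = denom * (xi - xj) % p
        scale = yi * pow(denom, -1, p) % p
        coeffs = [(c + scale * n) % p for c, n in zip(coeffs, num)]
    return coeffs

def evaluate(c, x):
    acc = 0
    for ci in reversed(c):
        acc = (acc * x + ci) % p
    return acc

domain = [pow(w, i, p) for i in range(16)]          # 2x extended domain
f = interpolate(domain[:8], samples)                # f(w^i) = sample_i for i < 8
extended = [evaluate(f, x) for x in domain]         # last 8 are erasure samples

# ANY 8 of the 16 extended samples recover f, hence the original samples:
keep = [0, 3, 5, 8, 9, 11, 14, 15]                  # pretend the rest were lost
g = interpolate([domain[i] for i in keep], [extended[i] for i in keep])
assert g == f
```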
of the full blob.k~20
vertical subnets for the assigned(shard,slot)
to establishavailability guarantees
~128
committee per(shard,slot)
out of which atleast~70+
should attest to availability and validity i.e.2/3
of the committee attesting for successful inclusion of shard blob in the beacon chain.~262144
validators (32
slots *64
shards *128
minimum committee size)` requiredBenchmarks
As we can see from the POC verkle go library, after the one-time building of the `verkle` trie for a state-tree-sized dataset, inserts and updates to the `verkle` will be pretty fast.

PS: if you like this document and think it adds value to the ecosystem, you could tip the author at: 0xb44ddd63d0e27bb3cd8046478b20c0465058ff04