# Zorp Primitives
## Abstract
We introduce *Zorp*, an arithmetization of *Nock*, and describe the architecture of a zero-knowledge virtual machine (zkVM) for proving Nock computations within a zk-STARK proving system. Nock is the Turing-complete combinator interpreter function which defines the lifecycle of the Urbit OS. It is defined by an extremely compact and mathematical specification, making for a natural arithmetization, and is the compilation target of the LISP-adjacent, purely functional programming language *Hoon*, giving a practical way for Hoon programmers to generate zero-knowledge proofs of arbitrary computation.
The fundamental data structure for Nock computations is the *noun*: a (full, rooted) binary tree with natural numbers decorating the leaves. Our arithmetization is based on the bijection between binary trees and *dyck words*, which define a language on a binary alphabet. The reason for this choice is the simplicity of expressing binary tree concatenation, or *cons*, within this encoding, but it also pleasantly transpires that there is a simple interpretation of the encoding in terms of depth-first traversal. The data at the leaves of the nouns are then simply encoded by a vector, where the ordering is determined by depth-first search.
After commitment to the dyck words and leaf vectors, nouns are compressed by interpreting the words and vectors as polynomials and evaluating them on verifier randomness. This makes their manipulation within the zkVM vastly more efficient. Performing cons in this compact representation becomes a near-triviality provided large powers of the verifier randomness have been computed.
Another novelty of our design is in the construction of the stack data structure. A stack is merely a polynomial; the constant term is at the top of the stack. We can then use verifier randomness to validate the polynomial equations which describe push and pop operations.
These considerations inform the design of our zkVM: a *noun table* for constructing valid compressions of nouns; an *exponent table* for making powers of randomness available for cons; a *stack table* for handling the priority-ordering of subcomputations and intermediate results; a *decoder table* which decodes Nock formulas for the stack table; and a *memory table* which performs subtree (memory) accesses into the various subjects throughout a computation. These tables are related for consistency via various multiset arguments.
## zk-STARKs
A zk-STARK, conceptually, addresses the following scenario. An entity called the *verifier* wants some computation performed but does not have the resources to do the computation for itself. It would like to outsource the computation to another entity, called the *prover*, who has more resources but, crucially, is *untrusted* by the verifier. A zk-STARK is a protocol that the prover and verifier can engage in, in which:
1. The verifier does exponentially less work than in a naive run of the computation.
2. The prover does only marginally more work than just running the computation.
3. The verifier is always convinced by the protocol if the prover is honest.
4. The verifier is only convinced by a mischievous prover with astronomically low odds.
A zk-STARK is based on an Interactive Oracle Proof (IOP) model for proving computation, which, briefly, means the prover performs the computation and sends a string of cryptographic information – e.g. commitments to some data – to the verifier, and the verifier can respond at various points by sending some randomness from a black-box function (oracle) back to the prover; after a certain number of rounds of communication, the verifier either accepts or rejects the prover's efforts.
The intuitive way that zk-STARKs ask us to think of computation is as a sequence of steps, where at each step we update a fixed finite amount of information according to a set of rules. The fixed finite amount of information at each step can be arranged in a horizontal row, and the sequential ordering of the steps is reflected by stacking the rows vertically into a two-dimensional table. To bring powerful mathematical tools to bear, the data that fills the cells of the table is drawn from a finite field, and the transition function is implicitly described by an algebraic set defined over this field by *polynomial constraints*. Additionally, there is no reason for the prover to be restricted to a single table: a fixed finite number of tables may be utilized, each with its own set of constraints, as long as the tables are appropriately related. You can think of different tables as performing different aspects of a computation, which have to obey some consistency relations; i.e. they have to be doing different aspects of the same particular computation.
A related motivation for the table model of computation described above is to address the following problem: how can the verifier tell that the step-by-step computation was done correctly without doing every step themselves? After all, as we know well from even grade school math, one small misstep in a computation can wildly affect the intended output! The verifier needs some way to magnify an error in a single location so that it's noticeable at a much larger scale, so-to-speak.
Surprisingly, the humble class of polynomials is very good at this kind of "error detection", which one can see in the following way. Suppose we fit the simplest polynomial possible to some data points over a particular input domain. If we corrupt the data by changing even a single one of the data points, then the polynomial will either look radically different outside of the input domain, or its degree – the number of coefficients needed to express the polynomial – will diverge radically from the original polynomial.
Thus in the zk-STARK protocol, the prover fits the simplest possible polynomial to the data in each column over a convenient input domain which is known to the verifier. There are more details left unstated, but it is essentially in this way that deceit by the prover is magnified and made visible to the verifier.
The zk-STARK protocol combines these two ideas with cryptographic primitives in the following rough outline (see [here](https://aszepieniec.github.io/stark-brainfuck/engine#stark-engine-workflow) for more detail):
(1) The prover populates the *base columns* of the tables.
(2) These columns are interpolated and the resulting polynomials evaluated on a larger domain; this data is put into a Merkle tree and the root hash sent to the verifier.
(3) The verifier sends randomness.
(4) The prover uses the randomness to compute extension columns and the interpolation and Merklization of (2) is repeated.
(5) The prover can continue extending the tables with randomness, interpolating, and Merklizing.
(6) Generate quotient polynomials using the constraints in the table and commit to them in a Merkle tree.
(7) Send the Merkle root to the verifier and get randomness in return.
(8) Create the nonlinear combination codeword polynomial, and create the Merkle tree of this polynomial.
(9) Send the root to the verifier and get random indices to open in the Merkle trees where consistency checks will be performed.
(10) Run the FRI protocol on the decommitment of the nonlinear combination codeword to check that it is of sufficiently low degree; if it is, the verifier accepts the prover's work.
To use a zk-STARK to prove arbitrary computation, we need a suitable Turing-complete model of computation, and we need a way to encode this model of computation in terms of finite field elements. We next describe *Nock*, our model of computation, and then describe its *arithmetization* in terms of finite field objects.
## Nock
What follows is a somewhat informal introduction to the Nock specification. For a more complete and detailed introduction, see [here](https://developers.urbit.org/reference/nock/definition).
Nock is the definition of a Turing-complete pure function: a minimal series of reduction rules that map an input noun to a product noun.
A noun is defined as either an atom, which is a natural number, or a cell, which is an ordered pair of nouns.
`[a b]` is a way to write a cell that consists of two nouns, `a` and `b`. In a cell `[a b]`, we say the head of the cell is `a` and the tail is `b`.
`[a [b c]]` is a way to write a cell that consists of a noun a in the head and a tail consisting of a cell containing `b` and `c`, which are also nouns.
The previous expression can be written as `[a b c]` with an implicit right association, and is the preferred way to write the expression.
Let's define a few logical operators that may be applied to input nouns to provide a resulting output noun:
* We define the noun type logical operator `?`, which can be applied to any noun `a`:
* `?a` evaluates to `0` if `a` is a cell, and to `1` if `a` is an atom.
* We define the increment logical operator `+`, which can be applied to any noun `a`:
* `+a` evaluates to `(1 + a)` if `a` is an atom, and crashes if `a` is a cell.
* We define the equivalence logical operator `=`, which can be applied to any two nouns:
* `=[a a]` evaluates to `0`, and `=[a b]` evaluates to `1` when `b` is a noun different from `a`.
We define the `/` logical operator that may be applied to nouns:
* `/(1 a)` will output `a`, and may be applied to any noun.
* `/(2 [a b])` will output `a` (the head of the cell), and may only be applied to a cell. If it is applied to an atom, crash.
* `/(3 [a b])` will output `b` (the tail of the cell), and may only be applied to a cell. If it is applied to an atom, crash.
* If the head atom is greater than 3, the `/` operator is applied recursively using the following reductions:
* `/(2a b)` reduces to `/(2 /(a b))`, which is then further evaluated until it reaches one of the base cases.
* `/((2a + 1) b)` reduces to `/(3 /(a b))`, which is then further evaluated until it reaches one of the base cases.
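To make the reductions above concrete, here is a minimal Python sketch of the `/` operator. The name `slot` and the representation of nouns as ints and nested 2-tuples are illustrative choices, not part of the Nock spec.

```python
def slot(axis, noun):
    """Apply the `/` operator: return the subtree of `noun` at address `axis`."""
    if axis == 1:
        return noun                          # /(1 a) -> a
    if not isinstance(noun, tuple):
        raise ValueError("crash: cannot descend into an atom")
    if axis == 2:
        return noun[0]                       # /(2 [a b]) -> a (the head)
    if axis == 3:
        return noun[1]                       # /(3 [a b]) -> b (the tail)
    # /(2a b) -> /(2 /(a b))  and  /((2a + 1) b) -> /(3 /(a b))
    return slot(2 + (axis & 1), slot(axis >> 1, noun))

# The noun [3 [[7 11] 23]] as a nested tuple:
noun = (3, ((7, 11), 23))
```

For example, `slot(6, noun)` descends right then left and yields `(7, 11)`.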
We define the Nock function as a function that takes an input noun (described as a cell of `[subject formula]`), and returns an output noun or crashes. The Nock function is defined below:
A valid Nock formula is always a cell. If the head of the formula is a cell, Nock treats both head and tail as formulas, evaluates each against the subject, and produces the cell of their products. If the head of the formula is an atom, it's an instruction from 0 to 5. If the head of the formula is higher than 5, it is not valid. Invalid Nock instructions result in a crash.
`Nock(x)` may be written as `*x`.
In the following examples, `a` is a subject and the following cell is the formula:
* `*[a 0 b]` evaluates to `/(b a)`
* `*[a 1 b]` evaluates to `b`
* `*[a 2 b c]` evaluates to `*[*[a b] *[a c]]`.
* (Note that the first subformula generates a new subject, the second subformula generates a new formula, and then the old subject and formula are discarded. Then, the new formula is evaluated against the new subject as follows: `*[new-subject new-formula]` ). Remember, 1. compute new subject, 2. compute new formula, 3. run new formula against new subject.
* `*[a 3 b]` evaluates to `?*[a b]`
* `*[a 4 b]` evaluates to `+*[a b]`
* `*[a 5 b c]` evaluates to `=[*[a b] *[a c]]`
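The rules above can be collected into a toy evaluator. This is a minimal Python sketch (nouns as ints and nested 2-tuples; all names are illustrative), not the Zorp implementation:

```python
def slot(axis, noun):
    # the `/` operator from the previous section
    if axis == 1:
        return noun
    if axis in (2, 3):
        return noun[axis - 2]
    return slot(2 + (axis & 1), slot(axis >> 1, noun))

def nock(subject, formula):
    """Evaluate the Nock instructions 0-5 described above."""
    op, arg = formula
    if isinstance(op, tuple):                # cell head: evaluate both, cons the products
        return (nock(subject, op), nock(subject, arg))
    if op == 0:                              # *[a 0 b] -> /(b a)
        return slot(arg, subject)
    if op == 1:                              # *[a 1 b] -> b
        return arg
    if op == 2:                              # *[a 2 b c] -> *[*[a b] *[a c]]
        b, c = arg
        return nock(nock(subject, b), nock(subject, c))
    if op == 3:                              # *[a 3 b] -> ?*[a b]
        return 0 if isinstance(nock(subject, arg), tuple) else 1
    if op == 4:                              # *[a 4 b] -> +*[a b]
        product = nock(subject, arg)
        assert not isinstance(product, tuple), "crash: increment of a cell"
        return product + 1
    if op == 5:                              # *[a 5 b c] -> =[*[a b] *[a c]]
        b, c = arg
        return 0 if nock(subject, b) == nock(subject, c) else 1
    raise ValueError("crash: invalid instruction")
```

For instance, `nock(42, (4, (0, 1)))` (i.e. `*[42 4 0 1]`) returns `43`.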
## Binary Tree Encoding
#### Binary Trees
A *[full rooted binary tree](https://en.wikipedia.org/wiki/Binary_tree#Types_of_binary_trees)* is a data structure like the following picture:
```
  •
 / \
    •
   / \
  •
 / \
```
Briefly, these are trees with a root node, and such that each node has either zero or two children. Note we only explicitly draw the *internal* nodes, i.e. the ones with two children. The external, or childless, nodes are called *leaves*.
For the remainder of this document, the term "binary tree" will always refer to full rooted binary trees.
In the context of Nock, the fundamental data structure is the *noun*, which in less formal terms is a binary tree decorated with natural numbers (0 is considered natural) at the leaves, like so:
```
  •
 / \
3   •
   / \
  •   23
 / \
7   11
```
The definition of a noun as a binary tree matches exactly the definition of a noun in the Nock specification: a noun is either a natural number, called an atom, or a pair of nouns.
If `a` and `b` are nouns, the pair is denoted `[a b]`. This pair is the binary tree with left subtree `a` and right subtree `b`. Our syntax is such that nouns associate to the right by default, so `[a b c]` implicitly denotes `[a [b c]]`; by contrast `[[a b] c]` has no expression with fewer brackets.
As an example, the noun above is represented as `[3 [[7 11] 23]]` with all brackets written explicitly, but we can use the abbreviation `[3 [7 11] 23]` as well.
#### Depth-First Search
A fundamental algorithm for traversing a binary tree which is relevant below is depth-first search. It can be described recursively in the following way: If the current tree being searched is a leaf, you are done. Otherwise, move down and left and search that whole subtree according to these instructions; once completed, move down and right and search that whole subtree according to these instructions.
To illustrate, here are the successive stages of the tree above being searched according to the depth-first algorithm:
```
(1)                (2)                (3)
  •                  •                  •
 /                  / \                / \
3                  3   •              3   •
                                         /
                                        •

(4)                (5)                (6)
  •                  •                  •
 / \                / \                / \
3   •              3   •              3   •
   /                  /                  / \
  •                  •                  •   23
 /                  / \                / \
7                  7   11             7   11
```
#### Dyck Words and Leaf Vectors
As mentioned above, in a zk-STARK the target computational model must be arithmetized. Thus we need an arithmetic model that will represent nouns.
Our arithmetization of a noun is a pair of vectors, each intimately related to depth-first search of the noun. The first is a vector of binary digits encoding the shape of the tree – the *dyck word* of the noun – while the second is a vector of natural numbers encoding the data at the leaves – the *leaf vector* of the noun.
##### Intuition for Dyck Words
The dyck word of a noun is, intuitively, obtained by doing a depth-first search of the noun and recording, in order, each move to the left as a 0 and each move to the right as a 1. A useful exercise for the reader is to use the example figure of depth-first search from above and convince himself that the dyck word of
```
  •
 / \
3   •
   / \
  •   23
 / \
7   11
```
is <0, 1, 0, 0, 1, 1>. Since the digits are binary there is no loss in compressing this to 010011, which is our standard practice in writing dyck words.
**Note**: Some sources use ( and ) or \[ and \] as their "alphabet" for dyck words. So 010011 in the example above would be \[\]\[\[\]\]. You should not confuse this for the notation for the nouns syntax introduced above which uses \[ and \]. For example, our syntactic representation of the noun above is `[3 [[7 11] 23]]`; omitting the leaf values yields \[\[\[\]\]\], which is not the same as \[\]\[\[\]\].
However, the dyck word can be recovered from our syntactic representation by inserting commas to separate left and right arguments like so
`[3 , [[7 , 11] , 23]]`
and ignoring the leaves *and* the \], then mapping \[ to 0 and , to 1. The reader can check that this works in this example.
The leaf vector of a noun is obtained by doing a depth-first search on the noun and writing down, in order, the natural numbers so-encountered into a vector. The reader will have no trouble convincing himself that <3, 7, 11, 23> is the leaf vector corresponding to the above example.
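Both encodings fall out of a single depth-first pass. A minimal Python sketch, representing nouns as ints and nested 2-tuples (an illustrative choice):

```python
def encode(noun):
    """Return the (dyck word, leaf vector) pair of a noun via depth-first search."""
    if not isinstance(noun, tuple):
        return "", [noun]                # a leaf: empty word, one leaf value
    lw, ll = encode(noun[0])             # 0 = move left, then search left subtree
    rw, rl = encode(noun[1])             # 1 = move right, then search right subtree
    return "0" + lw + "1" + rw, ll + rl
```

Here `encode((3, ((7, 11), 23)))` yields `("010011", [3, 7, 11, 23])`, matching the examples above.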
That this encoding scheme is reversible is easy to make a strong intuitive argument for: since a particular run of depth-first search doubles as a description of the construction of a full rooted binary tree, the dyck word allows one to reconstruct the tree shape; once we have the shape, we insert the leaf vector components into the tree according to depth-first order.
([Corroboration of the bijection on slide 14 here.](https://www.math.utah.edu/mathcircle/tessler-catalan-notes.pdf))
See the technical appendix for a different argument.
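The reconstruction argument can also be sketched directly in code: parse the dyck word according to the cons relation, drawing leaf values in depth-first order. The helper below is hypothetical, not from the Zorp codebase.

```python
def decode(word, leaves):
    """Rebuild a noun (ints and nested 2-tuples) from its dyck word and leaf vector."""
    leaves = iter(leaves)

    def parse(i):
        # parse one subtree starting at position i in the word;
        # return (subtree, index just past it)
        if i == len(word) or word[i] == "1":
            return next(leaves), i       # an empty subword here encodes a leaf
        left, i = parse(i + 1)           # consume the '0', parse L
        right, i = parse(i + 1)          # consume the matching '1', parse R
        return (left, right), i

    noun, _ = parse(0)
    return noun
```

For example, `decode("010011", [3, 7, 11, 23])` rebuilds `(3, ((7, 11), 23))`, witnessing the bijection on the running example.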
##### Cons Relation
Depth-first search is briefly: go left, search left subtree; then go right, search right subtree. This translates to the following relation for dyck words:
T = 0L1R (*)
where T, L, R are the dyck words for the tree and its left and right subtrees respectively (assuming these subtrees exist – the formula is only valid if T is non-empty). Why? Reading this formula in words directly translates to the brief description of depth-first search given above: "Depth-first search (T): (=) go left (0), search left subtree (L); go right (1), search right subtree \(R\)."
**Note**: L and/or R can be empty words.
We can represent the relation (\*) diagrammatically in a way that hopefully aids intuition in connecting it to the underlying binary trees:
```
         •
T =    0/ \1
       L   R
```
Here, the 0 and 1 label the edges; 0 for left, 1 for right. We will refer to the relation (*) as the *cons* relation for dyck words. Cons is a term adopted from LISP, where it constructs a pair from two values. In our context, we concatenate the trees L and R to form the composite tree T.
##### Definition of Dyck Words
We can in fact take (\*) as the basis for a definition of dyck words, independent of binary trees:
**Definition**: A dyck word is defined over an alphabet of two symbols – here we use 0 and 1 – and is either empty or has the form 0L1R where L and R are dyck words.
##### Cons on Leaf Vectors
Tree concatenation as it manifests on leaf vectors is trivial: it's literal concatenation of the vectors. This is because when we do depth-first search on a tree, the first leaves encountered are the leaves of the left subtree, and in the order we'd find them by doing depth-first search on the left subtree (because of the recursive nature of depth-first search); then we encounter the leaves of the right subtree in the same order as we'd find them by doing depth-first search of the right subtree.
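In the encoded representation, then, cons amounts to two concatenations. A sketch, using `(dyck word, leaf vector)` pairs as in the running example:

```python
def cons(left, right):
    """Cons two encoded nouns: T = 0L1R on dyck words,
    literal concatenation on leaf vectors."""
    (lw, ll), (rw, rl) = left, right
    return "0" + lw + "1" + rw, ll + rl

a = ("01", [7, 11])   # encoding of [7 11]
b = ("", [23])        # encoding of the atom 23
cons(a, b)            # ("0011", [7, 11, 23]), the encoding of [[7 11] 23]
```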
##### Dyck Word Properties as Binary Strings
It's easy to recognize dyck words within the general class of binary strings by two properties:
(a) Reading left-to-right, the number of 1's thus-encountered never exceeds the number of 0's.
(b) The total number of 0's is equal to the total number of 1's.
The fact that both of these properties hold for dyck words follows inductively from the cons relation (\*) and the fact that they trivially hold on the empty dyck word. Conversely, it's elementary, if a bit lengthy, to argue that any binary word with these properties has the form 0L1R where L and R *also* have these two properties, so we can again employ an induction argument. We do this in the technical appendix.
These properties give a very efficient way to check if a binary string is a dyck word by maintaining a weighted counter that increments on a 0 and decrements on a 1; as long as the counter never hits -1 and ends at 0, the binary string is a dyck word.
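The counter check takes only a few lines; a sketch:

```python
def is_dyck(word):
    """Check properties (a) and (b): the running count of 0s minus 1s
    never goes negative and ends at zero."""
    depth = 0
    for ch in word:
        depth += 1 if ch == "0" else -1
        if depth < 0:
            return False                 # a '1' outran the '0's: property (a) fails
    return depth == 0                    # property (b): the totals must balance
```

For example, `is_dyck("010011")` is `True`, while `is_dyck("0110")` fails property (a).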
## Our Field
We use the field $\mathbb{F}_{p^3}$ of extension degree 3 over the [prime](https://aszepieniec.github.io/stark-brainfuck/engine#field-and-extension-field) [field](https://cp4space.hatsya.com/2021/09/01/an-efficient-prime-for-number-theoretic-transforms/) $\mathbb{F}_p$ where $p = 2^{64} - 2^{32} + 1 = 18446744069414584321$, the latter of which is denoted the *base field*. The extension is defined by the [irreducible](https://crypto.katestange.net/finite-field-tools/) [polynomial](https://sagecell.sagemath.org) $x^3 - x + 1 \in \mathbb{F}_p[x].$
In practical terms, $\mathbb{F}_{p^3}$ is the set of triples of elements of $\mathbb{F}_p$, where addition is vector addition. Multiplication is performed by converting the triples' components into polynomials, multiplying these polynomials, keeping only the remainder upon division by $x^3 - x + 1$ – which is necessarily of degree two or less – and converting it back into a triple.
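As an illustration, here is an unoptimized schoolbook sketch of this multiplication in Python; a production implementation would of course use faster reduction tricks.

```python
P = 2**64 - 2**32 + 1                    # the base field prime

def ext_mul(a, b):
    """Multiply two triples as polynomials modulo x^3 - x + 1 over F_p."""
    c = [0] * 5                          # schoolbook product, degree <= 4
    for i in range(3):
        for j in range(3):
            c[i + j] = (c[i + j] + a[i] * b[j]) % P
    # reduce using x^3 = x - 1 and x^4 = x^2 - x
    c[0] = (c[0] - c[3]) % P
    c[1] = (c[1] + c[3] - c[4]) % P
    c[2] = (c[2] + c[4]) % P
    return (c[0], c[1], c[2])
```

For example, `ext_mul((0, 1, 0), (0, 0, 1))` computes $x \cdot x^2 = x^3 = x - 1$, i.e. `(P - 1, 1, 0)`.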
One practical motivating advantage of the base field is that we can do $a\cdot b + c$ in this field without "overflow" if $a, b, c$ are 32-bit numbers. This is because $a\cdot b + c$ is at most $(2^{32}-1)^2 + (2^{32} - 1) = (2^{32}-1)\cdot 2^{32} = p - 1$.
Note that the order of the multiplicative group of the base field is
$$\begin{align*}
p-1 &= 2^{32}(2^{2^4}+1)(2^{2^3}+1)(2^{2^2}+1)(2^{2^1}+1)(2^{2^0}+1)\\
&=2^{32}\cdot 65537 \cdot 257 \cdot 17 \cdot 5 \cdot 3
\end{align*}$$ The odd factors are, remarkably, all of the Fermat numbers that are known to be prime. The multiplicative group of the base field is cyclic (as is the multiplicative group of any finite field), so in particular it has a cyclic subgroup of order $2^{32}$, which is crucial for the FRI protocol and very useful for computing fast Fourier transforms. In particular the heights of our tables will be bounded by $2^{32}$.
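Concretely, one can exhibit a generator of the order-$2^{32}$ subgroup. The sketch below assumes that $7$ generates the full multiplicative group, a constant used by several implementations of this base field.

```python
P = 2**64 - 2**32 + 1
g = 7                                    # assumed generator of the multiplicative group
omega = pow(g, (P - 1) // 2**32, P)      # an element of order dividing 2^32

# omega generates the cyclic subgroup of order 2^32 used for FFTs and FRI
assert pow(omega, 2**32, P) == 1
```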
## Schwartz-Zippel
This section contains what might be considered a technical digression for a mathematical lemma, but the reader shouldn't be too intimidated, as the mathematics involved aren't terribly advanced.
First, a *multivariate polynomial* (*multipoly* for short) is a finite expression of the form
$$f(x_1, \dots , x_k) = \sum \, a_{i_1...i_k} x_1^{i_1}\dots x_k^{i_k}$$ where the $i_m$ are natural numbers, and the $a_{i_1...i_k}$ (the *coefficients*) are drawn from some field $\mathbb{F}$.
That is, one starts with variables $x_1, \dots , x_k$, chooses natural number powers $i_1, \dots , i_k$ to raise them to, and multiplies all the resulting quantities together, and then by a fixed scalar $a_{i_1...i_k}$ of your choosing. This is called a *term*. Repeat this as many times as you like, creating different terms, then add the results together to create a multivariate polynomial.
The *degree* of a term is $\sum i_m$. The degree of a multipoly is the maximum of the degrees of its terms.
The following lemma generalizes, and gives a convenient name to, the (hopefully familiar) fact that a polynomial of one variable cannot have more zeroes (also called roots) – that is, points where it evaluates to 0 – than its degree. The reason is simple: a root $a$ corresponds to a linear factor $x-a$ (by doing polynomial division with remainder by $x-a$), and each linear factor contributes $1$ to the degree, so the degree must be at least the number of linear factors, i.e. the number of roots.
**Lemma** (Schwartz-Zippel): Let $f(x_1, \dots , x_k)$ be a nonzero multipoly with coefficients in a field $\mathbb{F}$, let $d$ be the degree of $f$, and let $S$ be a finite subset of $\mathbb{F}$. Then the number of zeroes of $f$ in the "box" $S^k$ is at most $$d\cdot |S|^{k-1},$$ so the probability that $f(r_1, \dots , r_k)=0$ where $r_1, \dots , r_k$ are drawn randomly and uniformly from $S$ is at most
$$\frac{d\cdot |S|^{k-1}}{|S|^{k}} = \frac{d}{|S|}.$$
**Note**: When $\mathbb{F}$ is a finite field, $S$ can be all of $\mathbb{F}$!
The point is that the larger you allow the domain $S$ in $\mathbb{F}$ from which you are drawing the inputs to be, the less likely it becomes that you will randomly draw a root of $f$.
**Sketchy Proof**: For any $r_2, \dots , r_k$ we plug into $f$, we get a one variable polynomial by fixing all arguments after the first: $f_{r_2\dots r_k}(x_1) = f(x_1, r_2, \dots , r_k)$. This polynomial has degree at most $d$, so has at most $d$ roots. There are $|S|^{k-1}$ choices of $r_2, \dots , r_k$, so this gives at most $d\cdot |S|^{k-1}$ total zeroes.
The reason this is sketchy is that $f_{r_2\dots r_k}(x_1)$ may be the zero polynomial, which is zero at every point in $S$. It is possible to analyze this case and use an argument by induction to complete the proof with the same conclusion. Hopefully the sketchy version gives some quick intuition why the proposition holds.
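A toy numerical check of the lemma's bound, over the small field $\mathbb{F}_{11}$ with the degree-two polynomial $f(x, y) = xy - 1$:

```python
p = 11
zeros = sum(1 for x in range(p) for y in range(p)
            if (x * y - 1) % p == 0)
# each nonzero x pairs with exactly one y = x^(-1), so zeros == 10,
# comfortably under the bound d * |S|^(k-1) = 2 * 11 = 22
```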
## Probabilistic Data Structures & Algorithms
Surprisingly, the seemingly useless randomness that the prover receives from the verifier after committing to the base columns can drastically alleviate the burden on the prover, at the cost of a (usually astronomically low) chance that the verifier accepts an incorrect computation from a mischievous prover. It is rational for the verifier to accept this small risk if it can get the result and proof of the computation in less time.
#### p-Multisets
A standard (section 9.6 [here](https://eprint.iacr.org/2021/1063.pdf)) [example](https://aszepieniec.github.io/stark-brainfuck/engine#permutation-arguments) of [this](https://wiki.polygon.technology/docs/miden/design/multiset/) quick-but-possibly-dirty type of computation is to show that the values in two columns of the base tables are permutations of each other, or equivalently to show that these two sets of values form the same *multiset*. (A multiset is like a set but elements can be repeated.)
Denote the values in the columns by $\{a_i\}$ and $\{b_j\}$. We first reframe multiset equality as polynomial equality: $\{a_i\} = \{b_j\}$ as multisets if and only if $\prod (x-a_i) = \prod (x-b_j)$ (since the polynomials over a field are a unique factorization domain). Now, it is no easier to compute and check the equality of these polynomials than to check the equality of the multisets directly; however, we can use the most basic case of Schwartz-Zippel to obtain a probabilistic equality check: merely evaluate these two polynomials on a random $\beta$ in $\mathbb{F}$ and use the equality of the resulting field elements as a proxy for the equality of the multisets. If the evaluations are different, we know the multisets are different. If the evaluations are equal but the polynomials (equivalently, multisets) are *not* equal, then $\beta$ is a root of the difference polynomial $\prod (x-a_i) - \prod (x-b_j)$, which happens with very low probability. By Schwartz-Zippel we can even quantify the unlikeliness as at most the size of the multisets divided by the size of the field, so as long as the field is very large relative to the size of the tables, the verifier should be willing to take this risk to get the answers to his calculations faster.
**Note**: One must be careful as to how randomness is used. Above, we assumed that both the $\{a_i\}$ and $\{b_j\}$ values are in the base columns which are committed to. This commitment is the seed from which $\beta$ is produced. Now, if the mischievous prover goes fishing for $\{b_j\}$ values that aren't actually a permutation of the $\{a_i\}$ values but are capable of fooling the verifier just by luckily satisfying $\prod (\beta-a_i) = \prod (\beta-b_j)$, then he will really have to put in the work and grind; for each new set of $\{b_j\}$ produces a *new* $\beta$, since the former values are part of the seed that produces the latter randomness. However, suppose we try to use this protocol to prove that $\{a_i\}$ and $\{b_j\}$ are identical multisets, where $\{b_j\}$ comes from an extension column, i.e. a column that the prover fills in *after* receiving the randomness. Then it's completely trivial for the prover to insert values for $\{b_j\}$ that have nothing to do with $\{a_i\}$ but satisfy $\prod (\beta-a_i) = \prod (\beta-b_j)$; for since $\beta$ is now fixed relative to the $\{b_j\}$, the prover is only constrained by this one *fixed*, *explicit* relation, and so can fill in whatever he wants for all but one of the $\{b_j\}$ and still satisfy the required equation.
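A sketch of the probabilistic multiset check; the function name and field choice here are illustrative.

```python
import random

P = 2**64 - 2**32 + 1                    # the base field prime

def fingerprint(values, beta):
    """Evaluate prod (beta - a_i) in F_p; equal fingerprints imply equal
    multisets except with probability at most len(values) / P."""
    acc = 1
    for a in values:
        acc = acc * (beta - a) % P
    return acc

beta = random.randrange(P)               # stands in for verifier randomness
fingerprint([1, 2, 2, 3], beta) == fingerprint([3, 2, 1, 2], beta)  # True: same multiset
```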
#### Tuple Compression
(See [here](https://aszepieniec.github.io/stark-brainfuck/engine#permutation-arguments) and [here](https://wiki.polygon.technology/docs/miden/design/multiset/) for precedent examples.)
Suppose we take the values of a single row across multiple columns: we call the resulting vector of values a *tuple*. It is often the case that the prover needs to show that a multiset of tuples formed from a set of columns in one table is equal to a multiset of tuples formed from an equal number of columns in another table. The prover can reduce this to the case discussed in the p-multiset section above by compressing each tuple into a single value with verifier-supplied randomness.
For example, if we have a 3-tuple $(x, y, z)$ then using three random values $a$, $b$, $c$ from the verifier, the prover compresses the tuple to
$$ax + by + cz.$$ The general case can be inferred from the example (it's just a dot product).
Of course, if the tuples in the different tables are genuine permutations of each other, then the sets of compressed values will also be related by the same permutation. But is it possible for the compression to be lossy in the sense that the tuples are *not* related by a permutation but the compressed values *are*? If this is the case, then $(a, b, c, \dots)$ is in the kernel of a matrix which depends on the values in the two tables (exercise!), so $(a, b, c, \dots)$ is a common solution of a number of linear relations; this is even less likely than it being a solution of any single one of these linear relations. But the probability of sampling a zero of a linear functional, i.e. a degree-one multilinear polynomial, is, by Schwartz-Zippel, at most $\frac{1}{|\mathbb{F}|}$.
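The compression itself is just a dot product with the verifier's randomness; a sketch:

```python
import random

P = 2**64 - 2**32 + 1                    # the base field prime

def compress(tup, rand):
    """Compress a tuple to a single field element: a*x + b*y + c*z + ..."""
    return sum(r * t for r, t in zip(rand, tup)) % P

rand = [random.randrange(P) for _ in range(3)]   # verifier randomness a, b, c
compress((4, 5, 6), rand)                        # one field element stands in for the tuple
```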
This allows for the following, psychologically, when the prover is "programming" with the tables. Some computation is done in one table to produce data in multiple columns. This data is compressed and put into a p-multiset to be stored. Then, in another table where a different aspect of computation is done, the prover "retrieves" this data from the p-multiset and "unpacks" it from the tuple by "guessing" the correct input values. What actually happens, however, is that the retrieved-and-unpacked data is part of the base columns and needs to be present in the second table *before* any p-multisets are created, since this data needs to be part of the seed for the randomness to prevent prover mischief; the straight-laced prover will have no problem doing this since he's actually run the computation and has an omniscient view of it.
#### p-Stacks
Say we have a string of data $s_0...s_n$ (equivalently a vector $\langle s_0, \dots, s_n \rangle$) where each of the $s_i$ are elements of a finite field $\mathbb{F}$. It can be advantageous to have a compressed version of this data which nevertheless maintains the order properties implied by the string/vector data structure.
One way to achieve this is by including the data $s_0...s_n$ in the data committed to and getting a packet of randomness in $\mathbb{F}$ from the verifier; call it $\alpha$. Now construct
$$s(\alpha) = s_0\alpha^n + \dots + s_{n-1}\alpha + s_n.$$ This can be done in O(n) time with Horner's method of accumulation.
One should think of the concrete field element $s(\alpha)$ as a proxy for the symbolic expression $$s(x) = s_0 x^n + \dots + s_{n-1} x + s_n,$$ the latter of which would be a fully faithful equivalent of the original data.
It's also probably helpful to think of $\sum s_i \alpha^{n-i}$ as analogous to the $\prod (\beta-a_i)$ expression from the p-multiset section, where we're just putting some data into the coefficients of a polynomial instead of the roots of a polynomial, and then evaluating the polynomial on verifier-supplied randomness. Where the p-multisets allowed us to make efficient arguments about membership and multiplicity by using the roots of a polynomial, p-stacks will enable us to make efficient arguments about ordering.
(A certain degree of our use of p-stacks is similar to the evaluation arguments [here](https://aszepieniec.github.io/stark-brainfuck/engine#evaluation-arguments).)
The elements $s_i$ of the original string can be "pushed" on the stack by multiplying the existing state of the stack by $\alpha$ and adding the next $s_i$. We can also "pop" an $s_i$ from the stack $s(\alpha)$ by nondeterministically asserting the pop value $s_i$ and the state of the stack $t(\alpha)$ after the pop, then showing that pushing $s_i$ onto $t(\alpha)$ yields $s(\alpha)$. If we know that $t(\alpha)$ really is the value of a polynomial $t(x)$ at $\alpha$, then it follows that $s(x) = x\cdot t(x) + s_i$ with high probability, which is equivalent to saying that $t(x)$ is the result of popping $s_i$ from $s(x)$.
**Note**: We're glossing a technicality here since it's impossible to tell the difference between an empty p-stack and a p-stack with a bunch of $0$'s pushed on; in practice we also maintain a stack-length counter.
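A sketch of push and the nondeterministic pop check; the value of `alpha` here is an arbitrary stand-in for verifier randomness.

```python
P = 2**64 - 2**32 + 1                    # the base field prime

def push(stack, item, alpha):
    """Push: shift every coefficient up by one; item becomes the constant term."""
    return (stack * alpha + item) % P

def check_pop(stack, claimed_top, claimed_rest, alpha):
    """Pop is verified by re-pushing the claimed top onto the claimed rest."""
    return push(claimed_rest, claimed_top, alpha) == stack

# building s(alpha) = s_0 alpha^2 + s_1 alpha + s_2 is Horner's method:
alpha = 1234567                          # stand-in for verifier randomness
s = 0
for item in (5, 6, 7):
    s = push(s, item, alpha)
```

Here `check_pop(s, 7, push(push(0, 5, alpha), 6, alpha), alpha)` succeeds, while a wrong claimed top fails (with overwhelming probability for a genuinely random `alpha`).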
##### Compressed Tuples in Stacks
Just as with multisets, where we can compress data into a single field element and store it in a multiset, we can compress data and store it in a stack. The difference between the two is in storing the data as the roots of a polynomial (multisets) and the coefficients of a polynomial (stacks).
As a concrete example, suppose we have $\{x_{ij}\}$ in one table and $\{y_{kl}\}$ in another, where the first subindex labels rows and the second labels columns, and assuming that there are the same number of columns in each. We give a protocol that can prove with high probability that for each column index $j$, $\langle x_{ij}\rangle_i=\langle y_{kj}\rangle_k$ as vectors; i.e. the columns in the two tables are the same when read top to bottom.
Commit to the $\{x_{ij}\}$ and $\{y_{kl}\}$. Produce a set of randomness $\{a_p\}$ equal in number to the number of columns in one table, and also a random value $\alpha$. Check for the equality
$$\sum_i \alpha^{m-i} \Big(\sum_j a_j x_{ij}\Big) = \sum_k \alpha^{n-k} \Big(\sum_l a_l y_{kl}\Big)$$ where $m$ and $n$ are the number of rows in the first and second table respectively.
If the columns are the same, then obviously this equality holds and the prover passes. For the converse, if this equality holds, then $\alpha$ is a root of the polynomial
$$\sum_i t^{m-i} \Big(\sum_j a_j x_{ij}\Big) - \sum_k t^{n-k} \Big(\sum_l a_l y_{kl}\Big)$$ and $\alpha$ is a completely random value in the field relative to the coefficients of this polynomial. By Schwartz-Zippel, it is highly unlikely that we have randomly produced a root of this polynomial unless it is the zero polynomial, so we conclude that
$$\sum_j a_j x_{ij} = \sum_l a_l y_{il}$$ for each $i$; in particular, since the coefficients of the zero polynomial vanish degree by degree, the number of rows in the two tables must be equal.
But now we find that the vector $\langle a_j \rangle$ is in the kernel of the matrix $(x_{ij}-y_{ij})_{ij}$; that is, it is simultaneously in the kernel of the linear functionals defined by the rows $\langle x_{ij}-y_{ij} \rangle_{j}$. This is even less likely than $\langle a_j \rangle$ lying in the kernel of any single one of these linear functionals, and since $\langle a_j \rangle$ was sampled randomly relative to the coefficients of the matrix, this probability is at most $\tfrac{1}{|\mathbb{F}|}$ (by Schwartz-Zippel, since a linear functional is equivalent to a degree-one multilinear polynomial) unless the matrix is the zero matrix. Thus $x_{ij} = y_{ij}$ with high probability.
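The whole check can be sketched in a few lines; the modulus and the verifier's sampling below are illustrative stand-ins, not the protocol's parameters.

```python
import random

P = 2**61 - 1  # illustrative prime modulus

def row_compress(row, a):
    # Compress a row into one felt with the per-column randomness a_j.
    return sum(w * v for w, v in zip(a, row)) % P

def table_stack(table, a, alpha):
    # Fold the row compressions into a p-stack, top row pushed first.
    acc = 0
    for row in table:
        acc = (acc * alpha + row_compress(row, a)) % P
    return acc

X = [[1, 2, 3], [4, 5, 6]]
Y = [[1, 2, 3], [4, 5, 6]]   # identical rows in identical order
Z = [[4, 5, 6], [1, 2, 3]]   # same rows, different order

rng = random.Random(0)       # stand-in for verifier randomness
a = [rng.randrange(P) for _ in range(3)]
alpha = rng.randrange(P)

assert table_stack(X, a, alpha) == table_stack(Y, a, alpha)
assert table_stack(X, a, alpha) != table_stack(Z, a, alpha)  # w.h.p.
```

Note how the stack (coefficient) encoding distinguishes `X` from `Z` even though they agree as multisets of rows; a multiset (root) encoding would not.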
**Note**: One may reasonably ask, "Why in the above does the prover not have to make a separate commitment to the values $\sum_j a_j x_{ij}$, $\sum_l a_l y_{il}$ before putting them in stacks?" The reason is that the prover will be fully committed to these values through constraints in the table that force him to compute these values from the committed $\{x_{ij}\}$ and $\{y_{kl}\}$ values.
#### Noun Storage
We combine all of these ideas to package nouns and share them across tables.
We represent a noun by its length, dyck word, and leaf vector, where the length of a noun refers to the number of leaves; the latter two we can represent symbolically as a pair of polynomials whose coefficients are the binary dyck letters and leaf vector components:
$$(len, d(x), l(x))$$ We include the length to be able to restore the appropriate number of leading zeroes to the leaf and dyck polynomials and recover the strings. (Technically this is not necessary, since a dyck word has equal numbers of 0's and 1's, and thus we can use the 1's present to restore the correct number of 0's. The length of the dyck word then tells us how many 0's to restore to the leaf vector if necessary, via the formula
$$\mathrm{len}(dyck) = 2\cdot\mathrm{len}(leaf) - 2.$$ In any case, it's convenient to include the length.)
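A sketch of the encoding (cells as Python pairs, atoms as ints; this modeling choice is ours, purely for illustration), which also checks the length relation above:

```python
def encode(noun):
    # Return (dyck word, leaf vector) of a noun by depth-first traversal,
    # following the rule d(T) = 0 d(L) 1 d(R); an atom has an empty word.
    if isinstance(noun, int):
        return [], [noun]
    left, right = noun
    dl, ll = encode(left)
    dr, lr = encode(right)
    return [0] + dl + [1] + dr, ll + lr

dyck, leaves = encode(((1, 2), 3))       # the noun [[1 2] 3]
assert dyck == [0, 0, 1, 1]
assert leaves == [1, 2, 3]
assert len(dyck) == 2 * len(leaves) - 2  # the length relation above
```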
Once all nouns have been committed to, we can evaluate these polynomials on verifier randomness $\alpha$ to obtain the dyck stack and leaf stack of the noun.
$$noun\mbox{-}tuple = (len, d(\alpha), l(\alpha))$$
The evaluation map
$$(len, d(x), l(x)) \rightarrow (len, d(\alpha), l(\alpha))$$ can be shown to be injective with high probability, thus we are nearly assured of a unique representation of nouns in terms of triples of field elements. Intuitively, this is because we have a limited number of nouns of a given length in a program, given the height restriction of the tables, and different polynomials almost certainly have different values at a randomly selected field element. It may be conceptually beneficial to think of the evaluations of $d(x)$ and $l(x)$ at $\alpha$ as hashes of the dyck word and leaf vector.
We can then compress the tuples like so
$$tree\mbox{-}felt = a\cdot len + b\cdot d(\alpha) + c\cdot l(\alpha)$$
and include them in multisets. We can similarly argue that these tree felts are unique with high probability; a tree-felt is conceptually like a hash of a noun.
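A minimal numerical sketch of tree-felt compression; the modulus, $\alpha$, and the weights $a, b, c$ are made-up stand-ins for verifier randomness.

```python
P = 2**61 - 1  # illustrative prime modulus

def evaluate(coeffs, alpha):
    # Horner evaluation of a coefficient vector at alpha.
    acc = 0
    for c in coeffs:
        acc = (acc * alpha + c) % P
    return acc

def tree_felt(length, dyck, leaves, alpha, a, b, c):
    # Compress the (len, d(alpha), l(alpha)) triple into a single felt.
    return (a * length + b * evaluate(dyck, alpha)
            + c * evaluate(leaves, alpha)) % P

# The noun [[1 2] 3]: length 3, dyck word 0011, leaf vector (1, 2, 3).
felt = tree_felt(3, [0, 0, 1, 1], [1, 2, 3], alpha=11, a=5, b=7, c=13)
# A different dyck word (a different tree shape) yields a different felt.
assert felt != tree_felt(3, [0, 1, 0, 1], [1, 2, 3], alpha=11, a=5, b=7, c=13)
```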
#### p-Cons
A prime example of our use of p-stacks is in performing *cons* (recall from above that this refers to tree-concatenation). Let's look at how it works with dyck words.
The cons relation for dyck words is
$$d(T) = 0\,d(L)\,1\,d(R) \tag{$*$}$$
Let's write the digits of $T$ as $t_i$, zero-indexed, and let $t$ denote the number of such symbols; we do similarly with $L$ and $R$.
We can now rephrase this string identity in terms of polynomials; set
$$\begin{align*}
d_T(x) &= t_0x^{t-1} + \dots + t_{t-2}x +t_{t-1}\\
d_L(x) &= l_0x^{l-1} + \dots + l_{l-2}x +l_{l-1}\\
d_R(x) &= r_0x^{r-1} + \dots + r_{r-2}x +r_{r-1}
\end{align*}$$
Because of (*), we have
$$d_T(x) = 0\cdot x^{t-1} + l_0 x^{t-2} + \dots + l_{l-1} x^{t-l-1}\\ + 1\cdot x^{t-l-2} + r_0 x^{t-l-3} + \dots + r_{r-1}$$
but also because of (*) there's the relation
$$t = l + r + 2;$$ using this in the previous equation yields
$$\begin{align*}
d_T(x) &= 0\cdot x^{l+r+1} + (l_0 x^{l+r} + \dots + l_{l-1} x^{r+1})\\ &+ 1\cdot x^{r} + (r_0 x^{r-1} + \dots + r_{r-1})\\
&= 0\cdot x^{l+r+1} + x^{r+1}(l_0 x^{l-1} + \dots + l_{l-1})\\ &+ 1\cdot x^{r} + (r_0x^{r-1} + \dots + r_{r-1})
\end{align*}$$ which is
$$d_T(x) = 0\cdot x^{l+r+1} + x^{r+1}d_L(x) + 1\cdot x^{r} + d_R(x).$$
This equation is really quite simple. In relation (\*), symbols further to the left on the right-hand side are "deeper" in the stack, so when translating (\*) into the corresponding stack equation, it's easiest to start from the right: $d_R$ is at the top of the stack, so it directly translates to $d_R(x)$; then we have to "push" $1$ down the stack $r$ slots to fit under $d_R$, so it gets multiplied by $x^r$; then we have to push $d_L$ down past $d_R$ and $1$, so it gets multiplied by $x^{r+1}$; finally, we push $0$ past $d_R$, $1$, and $d_L$, which multiplies it by $x^{l+r+1}$.
As an application, suppose the data in $T$, $L$, and $R$ have been committed to. (This is very natural in our context because every noun's dyck word has to be validated anyway; that is, checked for its adherence to the rules which distinguish dyck words from general binary strings.) Then to prove that the dyck word $T$ is the cons of $L$ and $R$, we probabilistically prove the polynomial identity above by using verifier-supplied randomness $\alpha$ and verify that
$$d_T(\alpha) = 0\cdot \alpha^{l+r+1} + \alpha^{r+1}d_L(\alpha) + 1\cdot \alpha^{r} + d_R(\alpha).$$ Note that this is very quick: just a few field operations.
There is a similar cons relation in polynomials for the leaf vectors:
$$l_T(x) = x^{\rho}\cdot l_L(x) + l_R(x)$$ where $\rho$ is the length of the leaf vector of $R$. The reason is that the leaf vector of a tree, in depth-first search order, is just the concatenation of the leaf vector of the left and the leaf vector of the right, since depth-first search works left-to-right.
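Both cons identities can be exercised numerically. The sketch below uses a made-up value for $\alpha$ and an illustrative modulus, with nouns modeled as nested Python pairs.

```python
P = 2**61 - 1  # illustrative prime modulus

def encode(noun):
    # (dyck word, leaf vector) of a noun; cells as pairs, atoms as ints.
    if isinstance(noun, int):
        return [], [noun]
    dl, ll = encode(noun[0])
    dr, lr = encode(noun[1])
    return [0] + dl + [1] + dr, ll + lr

def evaluate(coeffs, alpha):
    acc = 0
    for c in coeffs:
        acc = (acc * alpha + c) % P
    return acc

L, R = (1, 2), (3, (4, 5))
T = (L, R)
alpha = 1234567
dT, lT = encode(T); dL, lL = encode(L); dR, lR = encode(R)
l, r, rho = len(dL), len(dR), len(lR)

# d_T(x) = 0*x^(l+r+1) + x^(r+1) d_L(x) + 1*x^r + d_R(x), checked at alpha
assert evaluate(dT, alpha) == (pow(alpha, r + 1, P) * evaluate(dL, alpha)
                               + pow(alpha, r, P) + evaluate(dR, alpha)) % P
# l_T(x) = x^rho * l_L(x) + l_R(x), checked at alpha
assert evaluate(lT, alpha) == (pow(alpha, rho, P) * evaluate(lL, alpha)
                               + evaluate(lR, alpha)) % P
```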
## Nock zkVM Table Design
![](https://hackmd.io/_uploads/SJ7gy9FBh.jpg)
The core of our Nock zkVM design has six tables.
(1) Noun Table
(2) Memory Table
(3) Stack Table
(4) Pop Table
(5) Decoder Table
(6) Exponentiation Table
For each of the tables we describe how the base columns are populated and then extended with verifier randomness, as outlined in points (1)-(5) in the section above on the zk-STARK protocol.
At a high level, it goes like this. As we run the original computation, we record metadata about the computation; for example, which subjects we do Nock 0's on. We finish the whole computation before populating the tables because information like, say, how many Nock 0 lookups of a particular subject at a particular axis occur is not necessarily available until the end of the computation.
As far as creating the tables goes, there are three phases, with two commitments, and correspondingly two "releases" of randomness from the verifier. In the first phase, the base columns of *all* tables are populated. This is any information that the honest prover can fill in without knowing the randomness that comes later. Once the base columns are committed to, the verifier releases some randomness; in our case it's a field element $\alpha$ used to compute dyck and leaf stacks, randomness $a, b, c$ used to compress noun triples, and randomness $p, q, r, s, t$ to compress other tuples; for example, to compress a tuple where some components of the tuple are already nouns that involve $a, b, c$. We cannot reuse $a, b, c$ in this second layer of tupling because these are no longer random with respect to the things being compressed! The prover then fills in any columns that require $\alpha, a, b, c, p, q, r, s, t$ and no other randomness for their computation. Once the prover commits to these new columns, the verifier returns more randomness $\beta, \gamma$ which the prover uses to create stacks in the stack table ($\gamma$) and compute multisets ($\beta$); these multisets allow certain worker tables to do computation that another master table (especially e.g. the stack table) would prefer to merely assert answers to, and then share the answers with the master table.
What we have to explain is how the relations in and between the tables express the logic of a generic Nock computation.
Let's discuss the tables in turn.
#### Noun Table
##### High Level Overview
This table takes binary strings and vectors, validates the binary strings as dyck words, computes the resulting p-stacks, and packages them into a multiset for sharing with the stack, decoder, and memory (Nock 0s) tables. The dyck word and leaf vector pairs in the table are a description of all the nouns encountered in the course of the program, *with multiplicity*.
##### Essential Details Overview
1. In a designated base column for dyck words, the prover puts strings of 0's and 1's, one character per cell. There are polynomial constraints which apply to each row and each pair of adjacent rows which validate that these dyck words are properly formed.
2. Similarly, in another designated column the prover puts the leaf vector values, and in a third designated column there is a counter that increments appropriately so that at the end of the dyck word and leaf vector the counter's value is the length of the leaf vector. A final important base column is one for recording multiplicity of each noun.
3. The prover commits to this data, and receives randomness $\alpha, a, b, c, p, q, r, s, t$ from the verifier.
4. Using $\alpha$, the prover now creates stacks $\mathrm{d}(\alpha),$ $\mathrm{l}(\alpha)$ for each dyck word $\mathrm{d}$ and leaf vector $\mathrm{l}$. These are accumulated in new dedicated extension columns.
5. The prover commits and receives randomness $\gamma, \beta$ from the verifier.
6. Finally, the tuples $(\mathrm{len}(\mathrm{l}), \mathrm{d}(\alpha), \mathrm{l}(\alpha))$ are compressed with randomness $a, b, c$ into *tree felts* and put into multisets using randomness $\beta$. There are several multisets; one for each table that uses the compressed form of nouns and needs them validated.
![](https://hackmd.io/_uploads/SyNMyqYH3.jpg)
![](https://hackmd.io/_uploads/BJ_voR5rn.jpg)
![](https://hackmd.io/_uploads/S1c_iC5S3.jpg)
##### Intuition-Building Example
As a purely pedagogical exercise (i.e. we don't actually do this or even need to do this in our Nock zkVM), suppose we had two noun tables where the nouns were allegedly in different orders in the two tables. We can prove, using the construction of the noun table outlined above, whether or not this is indeed the case.
In each table, commit to the dyck word letters, leaf vector components, and counter to the length of the leaf vector. Then in table one, create the dyck and leaf stacks $\mathrm{d}_i(\alpha)$, $\mathrm{l}_i(\alpha)$ and compute the length of the leaf vector $L_i$, and in table two do similarly and compute the dyck stacks, leaf stacks, and lengths $\delta_j(\alpha)$, $\lambda_j(\alpha)$, $\Lambda_j$. Then
$$t_i=a\cdot L_i + b\cdot\mathrm{d}_i(\alpha) + c\cdot\mathrm{l}_i(\alpha)$$ and
$$\tau_j=a\cdot \Lambda_j + b\cdot\delta_j(\alpha) + c\cdot\lambda_j(\alpha)$$ will denote the resulting *tree felts* (felt = field element).
Now form the two polynomials $\prod_i (x-t_i)$ and $\prod_j (x-\tau_j)$, and consider another random value $\beta$. Then
$$\prod_i (\beta-t_i) = \prod_j (\beta-\tau_j)$$ if and only if the $\{t_i\}$, $\{\tau_j\}$ are the same multiset, with probability $1-\tfrac{C_1}{|\mathbb{F}|}$ where $C_1$ is the number of nouns in the table with the largest number of nouns. Note that this conclusion only holds because $t_i$ and $\tau_j$ are fully committed to through our commitments to the underlying data in the tables: the value $\beta$ only appears after the $t_i$ and $\tau_j$ values can be deterministically calculated, i.e. we cannot make up values for the $t_i$ or $\tau_j$ after $\beta$ has appeared and try to force the above equality.
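Numerically, the grand-product comparison looks like this; the modulus and the felt values are illustrative stand-ins.

```python
import random

P = 2**61 - 1  # illustrative prime modulus

def grand_product(values, beta):
    # prod over values of (beta - v): the p-multiset evaluated at beta.
    acc = 1
    for v in values:
        acc = acc * (beta - v) % P
    return acc

ts     = [17, 42, 42, 99]
taus   = [42, 99, 17, 42]  # same multiset, different order
others = [17, 42, 99]      # the multiplicity of 42 differs

beta = random.Random(1).randrange(P)  # stand-in for verifier randomness
assert grand_product(ts, beta) == grand_product(taus, beta)
assert grand_product(ts, beta) != grand_product(others, beta)  # w.h.p.
```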
Once we know $\{t_i\}$ and $\{\tau_j\}$ are the same multiset, then for any given $i$ there is a corresponding $j = \sigma(i)$ where $\sigma$ is a permutation, such that
$$a\cdot L_i + b\cdot\mathrm{d}_i(\alpha) + c\cdot\mathrm{l}_i(\alpha) = a\cdot \Lambda_j + b\cdot\delta_j(\alpha) + c\cdot\lambda_j(\alpha),$$ which means that $(a, b, c)$ is in the kernel of the matrix whose $i^{th}$ row is
$$[L_i-\Lambda_j, \mathrm{d}_i(\alpha)-\delta_j(\alpha), \mathrm{l}_i(\alpha)-\lambda_j(\alpha)].$$ At its largest, this kernel is two-dimensional in $\mathbb{F}^3$ (unless it's the zero matrix, which is what we're trying to conclude anyway). This happens when the image of the matrix is one-dimensional, or equivalently when every row is a scalar multiple of any particular row. In this case, $(a, b, c)$ is in the kernel of the matrix if and only if it's in the kernel of the linear functional defined by a single non-zero row. It is highly unlikely (by Schwartz-Zippel, since a matrix with one row is equivalent to a linear polynomial) that we randomly sampled such a kernel vector, so the only conclusion is that this is the zero matrix, i.e. $L_i=\Lambda_j$, $\mathrm{d}_i(\alpha)=\delta_j(\alpha)$, $\mathrm{l}_i(\alpha)=\lambda_j(\alpha)$, and this holds for every $(i, j=\sigma(i))$ pair.
Now that we have $L_i=\Lambda_j$, $\mathrm{d}_i(\alpha)=\delta_j(\alpha)$, $\mathrm{l}_i(\alpha)=\lambda_j(\alpha)$, we see that we've improbably sampled a *common* root of all of the polynomials $\mathrm{d}_i-\delta_j$, $\mathrm{l}_i-\lambda_j$. The probability that this has occurred is at most the minimum of the degrees of all these polynomials divided by the field size. Thus with high probability $\mathrm{d}_i(x)=\delta_j(x)$ and $\mathrm{l}_i(x)=\lambda_j(x)$ for each $(i, j)$ pair which correspond via the permutation. In particular, $\mathrm{l}_i(x)=\lambda_j(x)$ implies the leaf vectors must be the same up to a prepending of one of them with $0$'s, but then $L_i=\Lambda_j$ implies exact equality. Then $\mathrm{d}_i(x)=\delta_j(x)$ implies similar with the dyck words, except since they're both dyck words there is only the possibility of exact equality: if you prepend $0$'s to a dyck word, it is never a dyck word because it would have an excess of $0$'s.
**NOTE**: We're actually trying to conclude $\mathrm{d}_i(x)=\delta_j(x)$ from $\mathrm{d}_i(\alpha)=\delta_j(\alpha)$ for every corresponding $(i, j)$ pair (and similarly with the leaf vectors), not just for a single pair. Thus we want to bound the probability that $\alpha$ is a root of *any* of these polynomials; by a union bound, this is at most the sum of the degrees of these polynomials divided by the field size. Since leaf vectors are roughly half the length of dyck words, we can roughly bound this by three halves of the height of the table divided by the field size. (**END NOTE**)
Assuming the events of success (that is, convincing the verifier of a true claim) in each successive round of "unwrapping" the randomness are independent, the probability that the verifier is convinced only of a true claim is bounded below by
$$(1-\tfrac{C_1}{|\mathbb{F}|})(1-\tfrac{C_1}{|\mathbb{F}|})(1-\tfrac{C_2}{|\mathbb{F}|}) \approx 1 - \tfrac{C}{|\mathbb{F}|}$$ where $C_1$ depends on the number of nouns, and $C_2$ depends linearly on the total number of nodes in all of the nouns.
#### Memory (Nock 0s) Table
##### High-Level Overview
A Nock 0 is a memory access into the subject which returns a subtree at a specified axis, so this table plays a role akin to RAM in standard CPU architecture. For each subtree access into the subject, there is an O(log(n))-time axis-path traversal to that subtree: starting at the root, we go left or right into the tree according to the axis path; one interesting aspect is that we "calculate" the correct subtree nondeterministically by writing down both possible answers in the form of tree-stack tuples and verify they are correct by showing that they nd-cons appropriately, then select the correct subtree according to the path and move it into the next row as the new subject.
##### Essential Details Overview
1. The prover divides up the table into segments (contiguous groupings of rows), one for each distinct (subject, axis) pair. In each segment, he puts the binary digits of the input axis, excluding the leading one and in order of most- to least-significant, in a dedicated base column. (There are other base columns but they're not pertinent here.)
2. Prover commits to the base columns (of all tables) and receives randomness $\alpha, a, b, c, p, q, r, s, t$.
3. (Honest) prover now goes row-by-row in each segment. In each row, a "triple" of a tree and its left and right subtrees is entered. Each of the trees in this triple is actually four contiguous cells in the row: three cells for (length, dyck-stack, leaf-stack), and the fourth is the tree felt which is the compression of this tuple with the randomness $(a, b, c)$. (There is a constraint in the table which enforces that the left and right subtrees nd-cons together to the tree.) Now, depending on whether the axis digit is 0 or 1, respectively, the prover moves the left or right subtree into the tree position in the next row, writes down the left and right subtrees of this tree, and continues in this fashion until the subtree at the target axis is written down after the last axis digit is read.
4. Prover commits to this and randomness $\beta, \gamma$ are received.
5. The prover then makes several multisets. One is of all the nouns (which will be compared to a multiset from the noun table), the idea being that e.g. the dyck-stacks and leaf-stacks were merely produced, not computed, so we need to check that they have indeed been well-constructed. Another is of (subject, axis, output) triples that are shared with the stack table; the idea being that the stack table needs to send requests to the memory table. It may seem weird that the stack table includes the result in the "request", but remember, the stack table is the prover too and should know the answer; it's basically a request to confirm the output with the given input.
![](https://hackmd.io/_uploads/HJVE1cYr2.jpg)
![](https://hackmd.io/_uploads/B158ua5Vn.jpg)
**Mathematical Argument**: Given that we're assuming the various opaque felts really do represent genuine dyck words and leaf vectors, the only real question here is whether using nd-cons in each row is a valid proxy for cons. We argue for this here.
Suppose nd-cons holds. There are three equations:
$$\mathrm{len}(T) = \mathrm{len}(L)+\mathrm{len}(R)$$ $$\mathrm{d}_T(\alpha) = 0\cdot\alpha^{2\mathrm{len}(T)-3} + \alpha^{2\mathrm{len}(R)-1}\mathrm{d}_L(\alpha) + 1\cdot\alpha^{2\mathrm{len}(R)-2} + \mathrm{d}_R(\alpha)$$ $$\mathrm{l}_T(\alpha) = \alpha^{\mathrm{len}(R)}\mathrm{l}_L(\alpha) + \mathrm{l}_R(\alpha) $$
Each side of the latter two equations is the evaluation of a polynomial at $\alpha$. Given that the coefficients of these polynomials were committed to (in the noun table) before $\alpha$ was produced, with high probability the equality of the evaluations implies the equality of the underlying polynomials. These polynomial identities are equivalent to cons as expressed by dyck words and leaf vectors once leading zeroes are restored appropriately using the lengths, and the length equation implies this restoration is consistent.
Thus nd-cons implies cons with high probability.
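The three nd-cons equations can be exercised on honest triples. The encoding helpers, modulus, and $\alpha$ below are illustrative stand-ins; nouns are modeled as nested Python pairs.

```python
P = 2**61 - 1  # illustrative prime modulus

def encode(noun):
    # (dyck word, leaf vector) of a noun; cells as pairs, atoms as ints.
    if isinstance(noun, int):
        return [], [noun]
    dl, ll = encode(noun[0])
    dr, lr = encode(noun[1])
    return [0] + dl + [1] + dr, ll + lr

def evaluate(coeffs, alpha):
    acc = 0
    for c in coeffs:
        acc = (acc * alpha + c) % P
    return acc

def triple(noun, alpha):
    # The (len, d(alpha), l(alpha)) representation of a noun.
    d, l = encode(noun)
    return (len(l), evaluate(d, alpha), evaluate(l, alpha))

def nd_cons_holds(T, L, R, alpha):
    len_T, dT, lT = T
    len_L, dL, lL = L
    len_R, dR, lR = R
    return (len_T == len_L + len_R
            # the 0 * alpha^(2 len(T) - 3) term vanishes and is omitted
            and dT == (pow(alpha, 2 * len_R - 1, P) * dL
                       + pow(alpha, 2 * len_R - 2, P) + dR) % P
            and lT == (pow(alpha, len_R, P) * lL + lR) % P)

alpha = 314159
L, R = (1, 2), 3
assert nd_cons_holds(triple((L, R), alpha), triple(L, alpha),
                     triple(R, alpha), alpha)
assert not nd_cons_holds(triple((R, L), alpha), triple(L, alpha),
                         triple(R, alpha), alpha)   # wrong cons fails
```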
#### Stack Table
##### High-Level Overview
This table implements a stack machine with stack height bounded by $2^{32}$ to handle the recursive Nock opcodes, and proves the flow of an entire computation. As such, it receives input from every other table.
##### Design Motivation
[This page](https://hackmd.io/3RRlYOEkQUiE-oWbXC-Llg#Basic-registers) has incredibly detailed information about how the stack table functions. It is the component of our zkVM architecture which likely looks most imposing at first glance, so let's proceed languidly.
The three core components of the stack table are: the compute stack, which holds (subject, formula) pairs to be evaluated; the product stack, which holds the output of Nock evaluations which are intermediate results of the overall computation; and the state machine, whose various modes orchestrate the movement of items on and off the two stacks.
To give – tersely – a sense of the utility of these components, we can paraphrase Turing from his *Entscheidungsproblem* paper: the behavior of the stack table at any moment is determined by what it's observing at the top of the stacks, and its "state of mind" as recorded in the state machine.
Let's, then, talk through a generic Nock computation with the opcodes 0-5, as covered in the Nock section above. We'll be tracking our states of mind as the computer, the subcomputations we need to perform, and the intermediate results we need to store; we'll indicate how these correspond to components in the stack table. This will give a solid basis for our design choice and hopefully make clear that the logic of the stack table is exactly the logic of Nock execution in the Nock spec.
We first have our state of mind (mode: %mstart) preceding any computation, observing a subject and formula (subject and formula on the computation stack), with nothing yet computed (product stack empty).
We then transition into the state of execution (mode: %m1) of a Nock computation `*[S F]` by grabbing the subject and formula (popping off the computation stack). Note that since Nock execution is fundamentally recursive, we can also enter this mode from modes other than the starting state of a computation.
What decides our state of mind next? Well, in order to proceed we have to know what species of formula `F` is: in Nock, the head of `F` is either an opcode that can be directly computed, i.e. `0` or `1`, or it's not, i.e. it's a cell or it's one of the opcodes `2`, `3`, `4`, or `5`. Each of these options involves doing an operation after the result(s) of some *other* Nock computation or computations are obtained: cons (cell), evaluate Nock (`2`), atom/cell? (`3`), increment (`4`), and equality (`5`). We clearly need a way to express these in the computation stack. Thus our computation stack uses an additional formula type with opcode `15`: `F = [15 a]` where `a` is among the atoms `99`, `2`, `3`, `4`, or `5`; `99` is the code for cons.
If we see the opcode `0`, this switches our state of mind to the evaluation of a Nock 0 (mode: %mn0). We evaluate the Nock 0 (this happens via communication with the memory table: the stack table, being the prover, asserts the answer, puts a (subject, axis, asserted-output) triple in a multiset, and this is proven equal to a (subject, axis, actual-output) triple by the memory table by a multiset argument), and save it as the latest result obtained (push it on the product stack). If the computation stack is empty (it may not be; remember, as noted above, the evaluation state %m1 could have been entered in the middle of a larger computation so we may still have more work to do) we transition to a state of recognition that the computation is over (mode: %mend); otherwise we return to clearing the compute stack (%m1 again).
If we see the opcode `1`, our state of mind becomes that of a Nock 1 evaluation (mode: %mn1). The Nock 1 evaluation is fairly trivial and is in fact computed in the process of formula decoding by the decoder table. (The stack table asserts the correct formula destructuring which contains the Nock 1 output, and the components of the destructuring are placed in a multiset which is checked against the multiset which is computed by the decoder table.) The result is pushed on the product stack and we move to clearing the compute stack (%m1) or end if it's empty (%mend).
If the opcode is not `0`, `1`, or `15`, this means one of several possibilities, but the commonality is that a certain subtree or subtrees, called *subformulas*, of the current formula will need to be executed against the current subject, and then a second operation is performed on this result or results. Thus the computer moves to a state (mode: subformula, %msf) that reflects this. The current subject and the formula `[15 op]`, where `op` is `99`, `2`, `3`, `4`, or `5`, are placed on the compute stack to register the operation that will occur *after* the execution of the subformulas.
Now the computer turns focus (mode: %msfa) to the execution of the first subformula SF by prioritizing its computation (places current subject and SF on the compute stack). If the opcode is `3` or `4` there is only one subformula, and he proceeds to compute it (goes to execution mode %m1). Otherwise there is a second subformula and his focus turns (mode: %msfb) to prioritizing it (places current subject and SF2 at the top of the compute stack above current subject and SF1). He proceeds to compute it (execution mode %m1).
The only remaining case is a formula `[15 op]` where `op` is `99`, `2`, `3`, `4`, or `5` (mode: %mn15). (We actually have another mode %mn15a but this is for technical reasons relating to the number of registers of a certain type that we have in one row of the table, so we will ignore the distinction here.) In every case but `2`, we take the requisite number of things off the product stack (two for `99` and `5`; one for `3` and `4`), and perform the corresponding operation (cons for `99`, check equality for `5`, check atom/cell? for `3`, increment if an atom for `4`), whose result is saved (pushed onto the product stack). We then check if the compute stack is empty; if it is we're done (mode: %mend), otherwise we begin evaluating (mode: %m1). For `2`, on the other hand, we take the top two things off the product stack, which are a new subject and formula, put them on the compute stack, and begin to execute (mode: %m1).
**Note**: In case you were wondering about the order of things at the top of the product stack: since subformula two (SF2) goes higher in the compute stack than SF1, the results of the evaluation of SF2 will be *below* the results of the evaluation of SF1 on the product stack when we encounter a `[15 op]`.
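To make the walkthrough concrete, here is a purely pedagogical Python model of the loop: atoms are ints, cells are pairs, opcode-2/5 formulas are flattened to triples `(op, b, c)` for readability, and the %m modes are implicit in the control flow rather than tracked explicitly. None of this is Zorp's actual implementation; it only mirrors the compute-stack/product-stack logic described above.

```python
def axis(subject, ax):
    # Nock 0: walk to the subtree at the given axis (binary digits after
    # the leading 1; 0 = left/head, 1 = right/tail).
    for bit in bin(ax)[3:]:
        subject = subject[int(bit)]
    return subject

def run(subject, formula):
    compute = [(subject, formula)]   # compute stack: (subject, formula)
    product = []                     # product stack: intermediate results
    while compute:
        s, f = compute.pop()
        if isinstance(f[0], tuple):          # cell formula: deferred cons
            compute.append((s, (15, 99)))
            compute.append((s, f[0]))        # SF1 (head)
            compute.append((s, f[1]))        # SF2 (tail), above SF1
        elif f[0] == 0:                      # Nock 0: memory access
            product.append(axis(s, f[1]))
        elif f[0] == 1:                      # Nock 1: constant
            product.append(f[1])
        elif f[0] == 15:                     # the deferred second operation
            op = f[1]
            if op == 99:                     # cons the top two products
                head, tail = product.pop(), product.pop()
                product.append((head, tail))
            elif op == 2:                    # new (subject, formula) pair
                new_subject = product.pop()
                new_formula = product.pop()
                compute.append((new_subject, new_formula))
            elif op == 3:                    # cell test: 0 = cell, 1 = atom
                product.append(0 if isinstance(product.pop(), tuple) else 1)
            elif op == 4:                    # increment
                product.append(product.pop() + 1)
            elif op == 5:                    # equality: 0 = equal
                product.append(0 if product.pop() == product.pop() else 1)
        else:                                # opcodes 2-5: push [15 op],
            compute.append((s, (15, f[0])))  # then the subformula(s)
            compute.append((s, f[1]))        # SF1
            if f[0] in (2, 5):
                compute.append((s, f[2]))    # SF2, above SF1
    return product.pop()

assert run((42, 43), (0, 2)) == 42                 # head of the subject
assert run(0, (4, (1, 7))) == 8                    # increment a constant
assert run(5, ((1, 2), (1, 3))) == (2, 3)          # cell formula = cons
assert run(0, (2, (1, 7), (1, (4, (0, 1))))) == 8  # Nock 2 re-evaluation
assert run((4, 4), (5, (0, 2), (0, 3))) == 0       # equality of subtrees
```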
##### Example
![](https://hackmd.io/_uploads/rkEdTjZHn.jpg)
![](https://hackmd.io/_uploads/ryXK6i-H3.jpg)
![](https://hackmd.io/_uploads/rk6tpo-r3.jpg)
##### How to Push and Pop
###### Push
Given a (nondeterministic) stack
$$n(\gamma) = n_0\gamma^k + \dots + n_{k-1}\gamma + n_k$$ and a field element $n_{k+1}$ to push on the stack, it's quite simple to do the push:
$$\mathrm{push}(n(\gamma), \, n_{k+1}) = \gamma\cdot n(\gamma) + n_{k+1}$$
The multiplication by $\gamma$ pushes the stack down to create an open spot at the top, and then $n_{k+1}$ can simply be slotted into this spot with an addition.
Given two stacks $m(\gamma) = \sum_{i=0}^{k+1} m_i \gamma^{k+1-i}$, $n(\gamma) = \sum_{j=0}^{k} n_j\gamma^{k-j}$, and a field element $n_{k+1}$, if
$$m(\gamma) = \mathrm{push}(n(\gamma), \, n_{k+1}) = \gamma\cdot n(\gamma) + n_{k+1}$$ then with high probability
$$m(x) = x\cdot n(x) + n_{k+1}$$ so $m$ is just $n$ with $n_{k+1}$ added at the end, which is what we want from a stack push.
###### Pop
Given a stack
$$n(\gamma) = n_0\gamma^k + \dots + n_{k-1}\gamma + n_k$$ it's not as straightforward to pop $n_k$ since $n(\gamma)$ is an opaque felt blob. The honest prover will know what should be popped, but we need to guard against the prover with mischief at heart.
The situation is not dissimilar to one we've already encountered: having a noun and wanting to produce, say, the left subtree. The easiest way to compute an (partial) inverse of a function (like push, or cons) is to write down the (full) correct answer and then show it's the correct answer by computing the original function.
Thus to compute pop, we write down $(p(\gamma), f)$ where $p(\gamma)$ is known to have the form
$$p(\gamma) = p_0\gamma^{k-1} + \dots + p_{k-1}$$ and check that $\mathrm{push}(p(\gamma), f) =n(\gamma)$. Then by the same probabilistic argument given above, $f = n_k$ with high probability, so $f = \mathrm{pop}(n(\gamma))$.
The catch here is in the fragment "known to have the form" above. There are many field elements $(p, f)$ that satisfy $n(\gamma) = \gamma\cdot p + f$; we need to know that $p$ has the form of a polynomial in $\gamma$, the coefficients of which have been committed to ahead of time, to apply the probabilistic argument and conclude that $f$ is correct with high probability.
Thus we need some apparatus in our zkVM which prepares structured field elements like $p(\gamma) = p_0\gamma^{k-1} + \dots + p_{k-1}$ so that we can check our pops. This would be somewhat analogous to the noun table; call it the *pop table*. Its purpose would be to create a multiset of all these structured field elements to share with the stack table and validate the pops.
This also means the stack table will need to have a column for the popped stack quantity $p$ in $n(\gamma) = \gamma\cdot p + f$.
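A tiny numerical sketch of the catch, with an illustrative modulus and a made-up value standing in for $\gamma$: without structural validation, any alleged pop value can be paired with a matching $p$.

```python
P = 2**61 - 1        # illustrative prime modulus
gamma = 123456789    # stand-in for verifier randomness

def push(stack, item):
    return (stack * gamma + item) % P

n = push(push(0, 7), 9)       # honest stack [7, 9]; the honest pop is 9
forged_f = 5                  # a dishonest pop value
forged_p = (n - forged_f) * pow(gamma, -1, P) % P
assert (gamma * forged_p + forged_f) % P == n   # the bare equation passes
# ...but forged_p is (w.h.p.) not the evaluation at gamma of any committed
# coefficient sequence, which is exactly what the pop table rules out.
```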
##### Commitment Timeline
1. The prover fills the table with the binary flags that define the modes and the binary selector flags that do opcode selection.
2. The prover commits to this and the verifier returns randomness $\alpha, a, b, c, p, q, r, s, t$.
3. The prover populates the table with the components (dyck and leaf) of the trees used in the computation with $\alpha$, fills in any instances of compressed nouns with $a, b, c$, and any other compressed tuples. This includes everything that is pushed to and popped from the stack.
4. The prover commits to these new columns and the verifier provides randomness $\beta$, $\gamma$.
5. The prover creates the stacks with variable $\gamma$: these are polynomial expressions in $\gamma$ whose coefficients are compressed tuples (using $a$, $b$, $c$) of tree components. This is also when the prover fills in the column that holds the popped stack quantities $p$ in the equation $n(\gamma) = \gamma\cdot p + f$, which will go in the multiset that is shared with the pop table. (**Note**: Notice that the prover was committed to this quantity since before $\gamma$ was revealed, since the states of the stack machine and the push and pop quantities determine $n(\gamma)$ and $f$ in the above equation, which is agreed to by the verifier. This means we don't have to recommit to the $p$ column before $\beta$ is produced, and allows us to avoid a third commitment.) Prover also creates any multisets; for example to validate nouns with the noun table, to request Nock 0 computations from the memory table, and to do communication about the structure of formulas with the decoder table.
#### Pop Table
##### High-Level Overview
This table computes stacks of compressed noun tuples and puts them in a multiset to be shared with the stack table. The stacks recorded are all the stack states after a pop in the stack table occurs.
##### Essential Details Overview
1. The table can be mentally divided into segments; one per stack. Each segment is labelled by a nonzero multiplicity base column which eventually decrements to 0; this is the trigger in the constraints to begin the next segment.
2. Prover commits to the base column and gets randomness $\alpha, a, b, c, p, q, r, s, t$.
3. Prover uses these random values to create the compressed nouns that will go in the stacks.
4. Prover commits again and receives randomness $\beta, \gamma$.
5. In each segment the nouns are pushed onto separate stacks with variable $\gamma$ for that segment one row at a time. Prover puts each full stack into the multiset using $\beta$ a number of times equal to the multiplicity. Prover also puts the nouns in a multiset which is shared with the noun table to show that they are properly formed. (From the perspective of this table, $\beta$ and $\gamma$ can be produced at the same time without an extra commitment since the stacks are produced deterministically from the nouns via the constraints, thus the commitment to the nouns constitutes a commitment to the stacks.)
##### Picture
![](https://hackmd.io/_uploads/B1KP19FHn.jpg)
##### Example
![](https://hackmd.io/_uploads/SyMTjR9H2.jpg)
![](https://hackmd.io/_uploads/SJxCi05Hh.jpg)
#### Decoder Table
##### High-Level Overview
This is a table that the stack table outsources to; it destructures and analyzes the formulas in the stack table and processes them so that the stack table knows how to proceed in the computation. The analysis reveals whether a formula's head is a cell or an atom, and if it's an atom, how many subformulas need to be computed.
##### Essential Details Overview
Each row of this table essentially computes a function $decode: \mathrm{formula} \rightarrow (\mathrm{formula}, \mathrm{is}\textrm{-}\mathrm{cons}, \mathrm{opcode}, \mathrm{out_1}, \mathrm{out_2})$. We explain each component below.
The $\mathrm{formula}$ in the output is identical to the one in the input. The reason for this is to link the input formula to the destructured output. The stack table makes assertions about how a particular formula destructures and puts the input *and* all output components in a multiset, while the decoder table does the same except the assertion is instead a computation, enforced by the constraints in the decoder table. After a successful multiset argument, the first $\mathrm{formula}$ input component in a stack table tuple must match the first component of some decoder table input, and thus the asserted output has to match the computed output for this formula. It's as if the first $\mathrm{formula}$ component of the tuple in the stack table serves as a dictionary key, which looks up the computed answer from the decoder table as the value paired with that key, and then checks it against the asserted answer.
Before proceeding it's worth reminding the reader that a formula in Nock is always a cell `[b c]`, but there are three subcases within this. First, if `b` is a cell, then `*[a [b c]] = [*[a b] *[a c]]`; that is, each of `b` and `c` are treated as formulas, and we cons the result of computing Nock with these formulas against the subject; this is the `yes` case of the $\mathrm{is}\textrm{-}\mathrm{cons}$ flag which is explained in the next paragraph. Then if `b` is an atom, there are two more cases: `b` could be `2` or `5`, in which case `c` must be a cell `[c_1 c_2]` and the computation of `*[a [b c]]` involves computing `*[a c_1]` and `*[a c_2]`; or `b` is `0`, `1`, `3`, or `4`, in which case we don't need to destructure `c` any further.
The $\mathrm{is}\textrm{-}\mathrm{cons}$ component, then, is simply a binary flag which indicates whether the formula's head is a cell.
The $\mathrm{opcode}$ component is either 0-5, 15, or 99. (Recall that 15 is the fake opcode used in the stack table implementation.) It's 99 exactly when $\mathrm{is}\textrm{-}\mathrm{cons}$ is `yes`.
The $\mathrm{out_1}$ and $\mathrm{out_2}$ components are both noun-triples, and are conditional on $\mathrm{opcode}$. If $\mathrm{opcode}$ is 99, $\mathrm{out_1}$ and $\mathrm{out_2}$ are the left and right nouns of the formula.
```
         •
F =     / \
    out_1   out_2
```
If $\mathrm{opcode}$ is 2 or 5, $\mathrm{out_1}$ and $\mathrm{out_2}$ are the left and right subtrees of the tail of the formula.
```
           •
          / \
F =    2/5    •
             / \
        out_1    out_2
```
Otherwise, $\mathrm{out_1} = \mathrm{out_2} = \mathrm{out}$ is the right subtree of the formula.
```
        •
F =    / \
     op   out
```
From these diagrams, it's easy to see how the function $decode$ is computed. First the left and right subtrees are found via assertion and checked correct via cons. If the length of the left subtree is greater than 1, it's the cons case and we're done. If the length is 1, the value of the leaf of the left subtree is checked. Depending on what it is, we either find the left and right subtrees of the formula's right subtree via assertion and check with cons, or we're already done.
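The three cases can be sketched as a direct function on nouns, here represented as nested Python pairs with ints for atoms (a representation chosen for illustration; the fake opcode 15 and the validation-by-cons machinery are omitted):

```python
def decode_formula(formula):
    """Sketch of the decode map: returns (is_cons, opcode, out1, out2),
    with 99 as the cons marker, following the case analysis above."""
    head, tail = formula
    if not isinstance(head, int):        # head is a cell: the cons case
        return True, 99, head, tail
    if head in (2, 5):                   # tail holds two subformulas
        return False, head, tail[0], tail[1]
    return False, head, tail, tail       # opcodes 0, 1, 3, 4: out1 == out2
```

For example, the formula `[0 1]` decodes to `(False, 0, 1, 1)`, while a formula whose head is a cell decodes to the cons case with opcode 99.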
##### Commitment Timeline
1. Prover fills in the cons flags, lengths of nouns, and opcodes which don't require randomness.
2. Prover commits and gets randomness $\alpha$ from the verifier.
3. Prover fills in the dyck and leaf stack information about all the nouns used.
4. Prover commits to these new columns and gets randomness $p, q, r, s, t, u, v, w$ and $\beta$.
5. The Latin letters are used to compress the results of the $decode$ function and the resulting field elements are put into a multiset using the variable $\beta$.
##### Example
![](https://hackmd.io/_uploads/rkRznCqr3.jpg)
![](https://hackmd.io/_uploads/HJkE2CcS3.jpg)
#### Exponentiation Table
##### High-Level Overview
This table has the narrow purpose of computing powers of the randomness $\alpha$, which is used to do fast cons, and packaging them into multisets to share with the previous tables. Why do we need it? Variable powers of a base cannot be produced in a table at will, because constraints are polynomial expressions of bounded degree; instead, the powers are built by repeated multiplication by the base, row by row, in this table.
##### Essential Details Overview
The table is divided into segments consisting of contiguous groups of rows, one segment for each natural number starting with 1. These natural numbers are the powers to which we will raise $\alpha$.
1. Prover fills in the powers just described. The prover also fills in a multiplicity column, whose implementation and purpose are as follows: at the top of each segment, it holds the number of times the prover wants to save this particular power $k$ of $\alpha$ for use in other tables; the multiplicity then decrements within the segment to 0, and at each decrement, $\alpha^k$ is saved in a multiset.
2. Prover commits to this and receives randomness $\alpha$ from the verifier.
3. Prover now fills in an exponent column, with copies of the exponential $\alpha^k$ in each segment labelled with the power $k$. There are constraints which ensure that this is what *must* be filled in.
4. Prover now commits to this column as well and receives randomness $\beta$.
5. Prover then creates multisets for different tables containing the exponentials. What goes in the multisets is a 2-tuple $(pow, exp)$ of the power and resultant exponential in $\alpha$ raised to that power.
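Functionally, the table's work amounts to the following Python sketch (this flattens the row-by-row segment layout into a loop; the one field multiplication per segment mirrors the constraint that each segment's exponential is the previous one times $\alpha$):

```python
P = 2**64 - 2**32 + 1  # illustrative STARK-friendly prime (assumption)

def exponent_table(requests, alpha):
    """Build the (power, alpha^power) pairs consumed by other tables.
    `requests` maps each needed power k to its multiplicity; each pair
    enters the multiset once per unit of multiplicity."""
    multiset = []
    exp = 1
    for k in range(1, max(requests) + 1):
        exp = (exp * alpha) % P          # one multiplication per segment
        for _ in range(requests.get(k, 0)):
            multiset.append((k, exp))
    return multiset
```

For instance, requesting $\alpha^2$ twice and $\alpha^3$ once with $\alpha = 5$ yields the multiset $\{(2, 25), (2, 25), (3, 125)\}$.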
![](https://hackmd.io/_uploads/B1bFJcFr2.jpg)
##### Relation to Memory Table
The exponent table produces a multiset of exponentials of $\alpha$ for the memory table to use. Recall that in the cons formulas for dyck words and leaf vectors in terms of polynomials, the length of the right leaf, or a quantity derived from it by affine transformation (i.e. $x \mapsto mx+b$), is utilized in the exponents of $\alpha$ needed to apply the cons formulas.
Now in the memory table, when we want to do a cons and need $\alpha^{\mathrm{len(R)}}$, the memory table puts the 2-tuple $(\mathrm{len(R)}, f)$ onto a multiset, and uses $f$ as if it were $\alpha^{\mathrm{len(R)}}$ in the cons formula, effectively asserting that $f = \alpha^{\mathrm{len(R)}}$. The multiset argument with the exponent table ensures that $(\mathrm{len(R)}, f)$ is equal to a 2-tuple of the form $(k, \alpha^k)$, and it follows that indeed $f = \alpha^{\mathrm{len(R)}}$ as asserted.
## What to Keep in Mind About Randomness
Values that are put into a "random data structure" need to be committed to before the randomness is produced. If the prover knows the random value before filling in the values that go in a multiset, then the multiset argument doesn't work mathematically.
There are subtleties, though. For example, when we compress tuples $(x_i, y_i, z_i)$ and put them in a multiset $\prod_i (\beta - (ax_i+by_i+cz_i))$, it's the $ax_i+by_i+cz_i$ that go in the multiset but we don't need to recommit to these quantities to generate $\beta$ if we've already committed to the $x_i, y_i, z_i$, since $ax_i+by_i+cz_i$ is a determined function of these committed quantities (more precisely, there would be a constraint that the verifier has already agreed to which demands that the prover form the linear combination $ax_i+by_i+cz_i$). As long as the initial commitment travels up through the layers of randomness via some deterministic computation, and the new values of randomness truly are random relative to the previous values, the prover only needs one commitment.
The need for an extra commitment when using multiple layers of randomness comes when the values being put in the outer random data structure are nondeterministically produced, but the values cannot be produced by the honest prover without knowledge of the inner randomness. This is what happens in the memory table where dyck and leaf stacks are nondeterministically produced and then put into a multiset. The honest prover needs an initial commitment to produce these values properly with $\alpha$, but the dishonest prover can just make up values and try to game the multiset equation if he already knows $\beta$.
## Picture Summary
![](https://hackmd.io/_uploads/BJfoy9tHn.jpg)
![](https://hackmd.io/_uploads/ByrnkcFBn.jpg)
![](https://hackmd.io/_uploads/rJfv30cH3.jpg)
## Concluding Summary
The Nock zkVM design elaborated here has endeavored to capture the logic of the minimal Turing-complete portion (opcodes 0-5) of a Nock computation. It's useful to describe as succinctly as possible how we think this has been accomplished by the design above.
##### An Abbreviated Look at Nock
At a high level, the essential elements of a Nock computation are as follows. We are presented with a pair of nouns: a subject and formula. The head of the formula has to be inspected, and depending on the result of the inspection, the formula is split into different subtrees. Based on the contents of the formula's head, there is either an immediate computation carried out on the subject using the other information in the formula, or new subcomputation(s) are prioritized which have to be carried out before the main computation can conclude. The only operations which are performed in the process of Nock are: Nock itself, binary tree concatenation (cons), binary tree decomposition (inverse cons), subtree access (iterated tree decomposition), the identity function, increment, and equality checking. Of these, only the operations involving cons are basically nontrivial.
Thus, to capture a Nock computation, we need: a representation of nouns, a means to inspect and decompose formulas, a way to keep track of subcomputation priority and the storing of intermediate results, and a valid way to do cons, inverse cons, and subtree accesses within our noun representation.
We'll summarize how we've addressed each of these items in turn, and how they are tied into a cohesive whole with multiset arguments.
##### Noun Representation
We initially arithmetize nouns as (dyck-word, leaf-vector) pairs. We then use a robust hashing system to succinctly represent a noun as a triple of field elements using verifier randomness; by robust we mean the size of the field and Schwartz-Zippel imply that the representation is faithful, almost certainly. The noun table then accomplishes the task of performing this encoding.
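The (dyck-word, leaf-vector) encoding is a single depth-first traversal. A minimal Python sketch, with nouns represented as ints for atoms and nested pairs for cells (a representation chosen here for illustration):

```python
def encode_noun(noun):
    """Encode a noun as a (dyck-word, leaf-vector) pair by
    depth-first traversal: a cell maps to 0 <left> 1 <right>."""
    if isinstance(noun, int):            # a leaf carries its atom
        return "", [noun]
    l_dyck, l_leaves = encode_noun(noun[0])
    r_dyck, r_leaves = encode_noun(noun[1])
    return "0" + l_dyck + "1" + r_dyck, l_leaves + r_leaves
```

For example, the noun `[1 [2 3]]` encodes to the dyck word `0101` with leaf vector `[1, 2, 3]`.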
##### Cons
Our noun representation leads to a relatively straightforward handling of cons in terms of dyck words and leaf vectors. A simple translation can be made from words/vectors into polynomial identities. These polynomial identities are then validated by the fact that they hold at some verifier-supplied randomness, which follows from Schwartz-Zippel. In order for this to work, we need arbitrary powers of the randomness, which necessitates the exponent table.
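To illustrate one such identity: assuming the big-endian coefficient convention used for the stack polynomial above (constant term last), concatenating coefficient vectors $u$ and $v$ evaluates as $u(\alpha)\alpha^{|v|} + v(\alpha)$, so the dyck word of a cons, `0 L 1 R`, evaluates as sketched below. This is a hedged sketch of the idea; the exact formulas in the tables may differ in convention.

```python
P = 2**64 - 2**32 + 1  # illustrative STARK-friendly prime (assumption)

def poly_eval(coeffs, x):
    """Horner evaluation of c_0*x^(k-1) + ... + c_(k-1) mod P."""
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % P
    return acc

def cons_dyck(dl, dr, len_r, alpha):
    """dyck(cons) = 0 L 1 R, so under the convention above its
    evaluation is dl * alpha^(len_r + 1) + alpha^len_r + dr,
    where len_r is the length of R's dyck word."""
    return (dl * pow(alpha, len_r + 1, P) + pow(alpha, len_r, P) + dr) % P
```

Consing the cell with dyck word `01` against an atom (empty dyck word) yields dyck word `0011`, and the formula agrees with direct evaluation at a sample $\alpha$.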
##### Inverse Cons
This is simple once we have cons: given a tree's representation, just write down the representations of candidate left and right subtrees and check that they cons together to the given tree.
##### Subtree Access
This is what the memory table does. Given an axis, it uses the binary digits of the axis to do repeated inverse cons operations, starting with a given subject, and where each repeated inverse cons uses the results of the last according to the current axis digit.
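In ordinary terms, on nouns as nested Python pairs, the axis logic is just:

```python
def subtree(noun, axis):
    """Subtree access: axis 1 is the whole noun; after the leading 1
    of the axis's binary expansion, each digit picks the head (0)
    or tail (1) of the current subtree."""
    for bit in bin(axis)[3:]:            # skip the '0b' prefix and leading 1
        noun = noun[0] if bit == "0" else noun[1]
    return noun
```

So on the noun `[[1 2] [3 4]]`, axis 2 gives `[1 2]`, axis 6 gives 3, and axis 7 gives 4.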
##### Formula Analysis & Decomposition
This is the function of the decoder table. Destructuring a formula into the requisite components involves nothing more than a few inverse cons operations.
##### Computation Bookkeeping
This is the purpose of the stack table: priority of subcomputations and of intermediate results is handled by giving them each their own stack. One of our technical innovations is in how to simulate a stack in the context of finite field elements. A stack can be conceptualized as a pair of (height, polynomial), where the constant term of the polynomial is the top of the stack. Just as with cons, pushing on the stack can be translated into a polynomial identity which is validated by evaluation on verifier randomness. Popping from a given stack is analogous to inverse cons, in that we provide the post-pop stack and popped value, and show that the push of the value onto the post-pop stack yields the given stack. We need to actually construct these post-pop stacks, somewhat analogous to the noun table, which necessitates the introduction of the pop table.
##### Communication
The last major component is making sure that the different tables are talking about the same computation! This is accomplished with multiset arguments. The tables that use the concise representation of nouns get those representations from the noun table via multiset. The tables that need to do cons or inverse cons get arbitrary powers of randomness from the exponent table via multiset. The decoder table, memory table, and pop table share their results with the stack table via multiset. The soundness of the multiset arguments is guaranteed to be high by Schwartz-Zippel and the large size of the field.
## Technical Appendix
#### Proof of Equivalence of Dyck Word Definitions
We have defined two types of binary word. The first type is defined by the recursive definition given for dyck words. The second type is defined by properties (a) and (b) above. We will show these define the same set of binary words.
**Proof that any type-2 word is a type-1 word**: By induction on the length of a word. (Note that any word of either type is obviously of even length.) If the type-2 word is empty, it's also a type-1 word, which is the base case. Suppose all type-2 words of length less than or equal to a certain length $2k-2$ are type-1 words. We will argue that all type-2 words of length $2k$ are type-1 words. Let $w$ be such a word.
By (a), the first character of $w$ is $0$. Now find the shortest initial subword $A$ of $w$ which has property (b), i.e. equal numbers of $0$'s and $1$'s. We call a subword *initial* if it starts at the same character as the full word. Since $w$ has property (b), $A$ must exist. Write $w=Ab$.
We claim that $1$ is the terminal character of $A$. If not, then $A=0a0$ for some word $a$. Since $A$ has property (b), it must be that $0a$ has more $1$'s than $0$'s; but since $0a$ is an initial subword of $w$, this contradicts property (a) of $w$. Thus $A=0a1$, and so $w=0a1b$.
We now claim that $a$ and $b$ also have properties (a) and (b). If this is true, the proof will be complete by the induction since $a$ and $b$ are shorter than $w$.
Now, it's obvious that $a$ has property (b) since $A=0a1$ and $A$ has property (b). So suppose $a$ doesn't have property (a). Intuitively, the first place in which $a$ violates property (a) while reading left to right will be a $1$ which "matches" with the opening $0$; but the $1$ at the end of $A$ is supposed to be this match. Precisely, the strategy is to show that $A$ has a strict initial subword with property (b), which will be an initial subword of $w$ with property (b) shorter than $A$; contradiction.
Begin by finding the shortest initial subword $C$ of $a$, so $a=Cd$, and such that $C$ *also doesn't have* property (a); $C$ exists because $a$ itself doesn't have property (a). We claim that $C$ ends in a $1$. For if it ended in $0$ we could drop the zero and still have an excess of $1$'s but a shorter word, which contradicts the definition of $C$. So $C=c1$.
Now we claim that $c$ must have an equal number of $0$'s and $1$'s. If it had more $0$'s than $1$'s, then $C$ would have either equal numbers of $0$'s and $1$'s or an excess of $0$'s, contradicting the definition of $C$; while if it had more $1$'s than $0$'s, then $c$ wouldn't have property (a) but be an initial subword of $a$ shorter than $C$, again a contradiction.
But now $w=0a1b=0Cd1b=0c1d1b$ where $0c1$ has equal numbers of $0$'s and $1$'s, and $0c1$ is shorter than $A$. But this contradicts the definition of $A$! Thus $a$ has properties (a) and (b).
It follows quite immediately from the facts that $0a1$ and $w$ have property (b) that $b$ also has property (b). And since $0a1$ has property (b) and $w$ has property (a), $b$ must have property (a).
So as noted previously, the proof is complete by induction.
**Proof that any type-1 word is a type-2 word**: This is again by induction on word length just as in the last argument. Let $w$ be a type-1 word, so that $w=0a1b$, where $a$ and $b$ are type-1 words. Then since $a$ and $b$ are shorter type-1 words than $w$, the induction hypothesis implies $a$ and $b$ both have properties (a) and (b).
It's immediate then that $w$ has property (b). Since $a$ has property (a), the initial subword $0a1$ has property (a) as well. For any initial subword of $w$ longer than $0a1$, the numbers of $0$'s and $1$'s up through $0a1$ are the same, so the relative count of these two characters past this point is entirely determined by $b$. But $b$ has property (a). Thus it follows that $w$ has property (a).
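The equivalence proved above can also be checked mechanically for small lengths. A brute-force Python sketch comparing the two definitions over all binary words up to length 8:

```python
from itertools import product

def is_type2(w):
    """Properties (a) and (b): every prefix has at least as many 0's
    as 1's, and the whole word has equally many of each."""
    depth = 0
    for ch in w:
        depth += 1 if ch == "0" else -1
        if depth < 0:
            return False
    return depth == 0

def is_type1(w):
    """Recursive definition: empty, or 0 a 1 b with a, b dyck words."""
    if w == "":
        return True
    depth = 0
    for i, ch in enumerate(w):
        depth += 1 if ch == "0" else -1
        if depth == 0:                   # first balance point
            return ch == "1" and is_type1(w[1:i]) and is_type1(w[i + 1:])
    return False

# exhaustive agreement on all binary words up to length 8
assert all(is_type1("".join(w)) == is_type2("".join(w))
           for n in range(9) for w in product("01", repeat=n))
```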
#### Proof of Dyck Word / Binary Tree Bijection
We now make an argument that dyck words perfectly encode binary trees. If the binary tree is a leaf (has no children i.e. is the trivial tree), then we map it to the empty dyck word. If not, then we map it to 0L1R where L and R are the dyck words of the left and right subtrees. Since the left and right subtrees are smaller and we know how to handle the trivial tree, this constructs the map by induction on the size of a tree. This is the encoding map.
In the reverse direction, the empty word maps to the trivial tree. Otherwise, we find the 1 which pairs with the opening 0 in the word. To do this, read left-to-right and keep track of the number of 0's and 1's; the first time they become equal in number you must be looking at a 1, otherwise we contradict property (a) above. This is the partner of the opening 0. Everything between this (0, 1) partner pair will be the dyck word of the left subtree of the image tree (it could be the empty word!), and everything to the right of 1 is the dyck word of the right subtree of the image tree (again, this could be the empty word!). But each of these words is smaller than the original and we know how to handle the base case of the empty word, so this constructs the map by induction on the length of a dyck word. This is the decoding map.
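The decoding map, sketched in Python (leaves carry no data here, since the dyck word records only the tree shape):

```python
def decode_dyck(word):
    """Rebuild the tree shape from a dyck word. Leaves are None,
    cells are (left, right) pairs."""
    if word == "":
        return None                      # the trivial tree
    depth = 0
    for i, ch in enumerate(word):
        depth += 1 if ch == "0" else -1
        if depth == 0:
            # the 1 matching the opening 0 sits at index i,
            # so word = 0 L 1 R with L = word[1:i], R = word[i+1:]
            return (decode_dyck(word[1:i]), decode_dyck(word[i + 1:]))
```

For example, `0101` decodes to a cell whose tail is itself a cell, matching the shape of `[1 [2 3]]`.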
**Lemma**: In a binary tree the leaf nodes exceed the internal nodes in number by 1; thus a binary tree has an odd number $2n-1$ of nodes, where $n$ is the number of leaf nodes. A binary tree with $n$ leaf nodes maps to a dyck word with $2n-2$ letters. Conversely, a dyck word always has an even number $2n-2$ of letters, and maps to a binary tree with $n$ leaf nodes.
The justifications of each of the claims in the lemma are simple proofs by induction.
It's now easy to argue that the maps above are inverse to each other.
First, it's clear that the trivial tree and empty word map to each other, and by the descriptions above, nothing else can map to either. This serves as the base case of an induction on the size of the tree in nodes, or equivalently the length of the image dyck word, by the lemma. So take a nontrivial tree; it maps to 0L1R where L and R are the dyck word images by the encoding map of its left and right subtrees. We suppose that the encoding map takes all trees with fewer nodes than our chosen tree bijectively to all dyck words of length less than the image dyck word of our chosen tree, and that the inverse is given by the decoding map. Now, applying the inverse map to 0L1R requires knowing which 1 in this dyck word is the match of the opening 0: *a priori* it could be internal to L or R. However, this is not the case, and the 1 which matches with the opening 0 in 0L1R is the explicit 1 separating the L and R. The reason for this is that L is a dyck word and thus its internal counts of 0's and 1's are equal; this immediately proves the claim in the previous sentence. So the inverse map applied to 0L1R is the tree whose left and right subtrees are the images of L and R by the inverse map. But L and R are shorter words than 0L1R, so by induction, L and R map back to the left and right subtrees of the starting tree. Thus the decoding map inverts the encoding map.