# zkTree Matrix representations
$\def\RR{{\bf R}}$
$\def\bold#1{{\bf #1}}$
$\def\pL{{{\mathbf{p}}^{(i)}_{\leftarrow}}}$
$\def\pLm{{{\mathbf{P}}_{\leftarrow}}}$
$\def\pR{{{\mathbf{p}}^{(i)}_{\rightarrow}}}$
$\def\pRm{{{\mathbf{P}}_{\rightarrow}}}$
$\def\fLm{{{\mathbf{F}}_{\leftarrow}}}$
$\def\fRm{{{\mathbf{F}}_{\rightarrow}}}$
$\def\Nint{{N_{int}}}$
$\def\sparse#1{{\bbox[1px, border: 1.5px solid red]{\bold{#1}}}}$
$\def\vstack#1#2{{\overset{#1}{\underset{#2}{=}}}}$
## First notes
A box is represented by $2d$ coordinates.
In total we need $N\times 2d$ coordinates, where $N$ is the total number of nodes.
We can represent these boxes through $2d$ vectors each of size $N$:
- $\pL$ is the vector of "left" edges for dimension $i \in [d]$
- $\pR$ is the vector of "right" edges for dimension $i \in [d]$
We assume the last part of these vectors represents the leaf nodes (see also the examples below).
To describe the structure of the tree in terms of left and right children we use two matrices $L,R \in \{0,1\}^{\Nint\times N}$ where $\Nint$ is the number of internal nodes.
The semantics of these matrices is as follows: for each row $L_j$ of matrix $L$, for example, $L_j\pL$ is the $\leftarrow$ edge (in the $i$-th dimension) of the left child of node $j$.
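To make this concrete, here is a minimal sketch (Python/numpy; the toy tree and all variable names are hypothetical illustrations, not from any fixed spec): a one-dimensional root box $[0,10]$ split at $4$ into two leaves, with the internal node listed first and the leaves last.
```python
# Hypothetical toy example: d = 1, N = 3 nodes, N_int = 1 internal node.
# Node 0 is the root with box [0, 10]; nodes 1 and 2 are its left/right leaves.
import numpy as np

N, N_int = 3, 1
p_left  = np.array([0, 0, 4])    # "left" edges:  root, left leaf, right leaf
p_right = np.array([10, 4, 10])  # "right" edges: root, left leaf, right leaf

# Row j of L (resp. R) is the elementary vector selecting node j's left
# (resp. right) child.
L = np.array([[0, 1, 0]])  # node 0's left child is node 1
R = np.array([[0, 0, 1]])  # node 0's right child is node 2

# L @ p_left picks out, for each internal node, the left edge of its left
# child; it must match the parent's own left edge (symmetrically for R).
assert (L @ p_left  == p_left[:N_int]).all()
assert (R @ p_right == p_right[:N_int]).all()
```
Note that the right-hand sides above restrict to the internal-node entries; this anticipates the correction discussed in the next section.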
## Consequences of the above
**NB: there is a mistake (corrected in the constraints) in the equalities below: on the RHS there should be a modified version of the $\mathbf{p}$ vectors with all zeros in the leaf rows. I solved it in the constraints by stacking and by separating the internal/leaf nodes (we'll optimize later).**
Some of the checks we need to perform can be written down as follows:
- for all $i \in [d]$: $\,L' \pL = \pL$ (left children share the left edge of their parent), where $L'$ is $L$ stacked on top of a zero matrix to pad with $N-\Nint$ rows of zeros.
- for all $i \in [d]$: $\,R' \pR = \pR$ (right children share the right edge of their parent), where $R'$ is $R$ stacked on top of a zero matrix to pad with $N-\Nint$ rows of zeros.
Each of the equations above is of the form
$$ M\mathbf{v} = \mathbf{v}$$
where $M$ is a "simple" matrix (at most one non-zero entry per row) as defined in [RZ21].
To prove this type of equation we can use CSS (checkable subspace sampling) techniques.
NB: Above we temporarily ignore the fact that we need to prove each of these equations for all $i \in [d]$. We discuss this below.
### Scenario 1: we know the structure of L,R
If the structure of $L$ and $R$ is known and public, it can be preprocessed: we set $M$ to be a "repetition" of $L$ (resp. $R$), and the vector $\mathbf{v}$ to be the concatenation of the $\pL$-s (resp. $\pR$-s), as in the sketch below.
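A minimal sketch (Python/numpy, reusing the hypothetical toy tree from above; nothing here is prescribed by the source) of this repetition: $M$ is the block-diagonal Kronecker product $I_d \otimes L$, and $\mathbf{v}$ stacks the per-dimension vectors, so one product covers all $i \in [d]$. Per the NB above, we check against the internal-node entries rather than against $\mathbf{v}$ itself.
```python
# Hypothetical example of Scenario 1's "repetition": d = 2 dimensions.
import numpy as np

d, N_int = 2, 1
L = np.array([[0, 1, 0]])         # as in the earlier sketch
p_left = np.array([[0, 0, 4],     # left edges in dimension 0
                   [1, 1, 7]])    # left edges in dimension 1 (arbitrary)

M = np.kron(np.eye(d, dtype=int), L)  # block-diagonal repetition of L
v = p_left.reshape(-1)                # concatenation of the p^(i) vectors

# A single matrix-vector product now checks the left-edge constraint in
# every dimension at once.
assert (M @ v == p_left[:, :N_int].reshape(-1)).all()
```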
### Scenario 2: we do not know the structure of L,R
In this scenario we cannot preprocess; instead, the prover will commit to the matrices at proving time and will need to prove that:
- the matrix $L$ (resp. $R$) has the appropriate "permutation"-like structure (each node is the child of exactly one parent, etc.; see the sanity-check sketch after this list). Open: how to do this?
- the expanded matrix $M$ (the repetition of $L$ (resp. $R$)) is constructed appropriately from the basic matrix of the previous item. Semi-open: we may or may not use coset tricks for this.
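As a plain-computation reference for the first item (this is only what the committed matrices must satisfy, not a proposal for how to prove it), here is a sketch over hypothetical toy data:
```python
# Structural sanity checks on L and R (hypothetical toy data from above).
import numpy as np

L = np.array([[0, 1, 0]])
R = np.array([[0, 0, 1]])
S = np.vstack([L, R])  # all parent -> child assignments together

# (1) Every row is an elementary vector: Boolean entries, row sums equal 1.
assert set(S.ravel()) <= {0, 1}
assert (S.sum(axis=1) == 1).all()

# (2) Each node is the child of at most one parent: column sums are 0 or 1.
#     For a full binary tree (N = 2 * N_int + 1) exactly one column sums to
#     0 -- the root, which has no parent.
col_sums = S.sum(axis=0)
assert (col_sums <= 1).all()
assert (col_sums == 0).sum() == 1
```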
## The matrix constraints we want
### Preliminaries
General notation:
- Lowercase letters denote vectors; uppercase letters denote matrices.
- We distinguish _sparse_ and _dense_ matrices. A sparse $n \times n$ matrix, for some $n$, is one whose number of non-zero elements is asymptotically linear in $n$. Representation through sparse matrices is helpful to simplify notation, but should not be read as an indication of efficiency: we require the prover to work not in the total size of the sparse matrix, but only in its number of non-zero entries. A dense matrix is one that is not sparse. A sparse matrix is denoted by a red box, as in $\sparse{M}$.
- All matrices are assumed to be committed at proving time, except for identity matrices and matrices of ones.
- We denote element-wise multiplication of matrices by $A\circ B$.
- $\mathsf{Pad}$ denotes padding with zeros to match the dimensions of the other matrices in an expression; this is unambiguous in context.
Decision tree notation:
- The matrices $\pLm, \pRm \in \mathbb{F}^{\Nint\times d}$ describe the left/right (i.e., $\leftarrow/\rightarrow$) boundaries of internal nodes, one column per component.
- Leaf node descriptions are denoted by $\fLm, \fRm\in \mathbb{F}^{(N-\Nint)\times d}$.
- The matrices $\sparse{L}, \sparse{R} \in \{0,1\}^{\Nint \times N}$ represent a mapping $[\Nint] \to [N]$ from parents to left/right children. Specifically, the $i$-th row of each matrix is the elementary vector $e_j \in \{0,1\}^{N}$ iff node $i$'s left/right child is node $j$.
- Matrix $\bold{E} \in \{0,1\}^{\Nint \times d}$ records the dimension in which the left and right children are "split". That is, each row $i$ of $\bold{E}$ is an elementary vector $e_j \in \{0,1\}^{d}$, encoding the fact that the left/right children of internal node $i$ have the same boundaries except in component $j$. We do not represent this matrix sparsely because we need homomorphic operations on its commitments, and its total size is not quadratic but still linear in the number of nodes (and their description).
### Constraints
// left (resp. right) boundaries of left (resp. right) children are the same as their parents'
- $\sparse{L} \cdot \left(\vstack{\pLm}{\fLm}\right) = \pLm$
- $\sparse{R} \cdot \left(\vstack{\pRm}{\fRm}\right) = \pRm$
// left and right children are the same in all coordinates except the one in E (left coordinates)
- $(\bold{1}-\bold{E})\circ \left( \pLm - \sparse{R} \cdot \left(\vstack{\pLm}{\fLm}\right) \right) = \bold{0}$ // NB: I used the equivalence $\pLm = \sparse{L} \cdot \left(\vstack{\pLm}{\fLm}\right)$ from the other constraint
// (right coordinates)
- $(\bold{1}-\bold{E})\circ \left( \pRm - \sparse{L} \cdot \left(\vstack{\pRm}{\fRm}\right) \right) = \bold{0}$ // NB: used symmetric equivalence
// coordinate in which their boundaries "split"
- $\bold{E}\circ \left( \sparse{L} \cdot \left(\vstack{\pRm}{\fRm}\right) - \sparse{R} \cdot \left(\vstack{\pLm}{\fLm}\right)\right)= \bold{0}$
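As a sanity check of the five constraints, here is a self-contained sketch (Python/numpy over the integers, on a hypothetical two-dimensional toy tree; all names are ours): the root (node 0) splits $[0,10]^2$ at $x_0 = 4$, its left child is leaf 2, its right child is internal node 1, which splits at $x_1 = 6$ into leaves 3 and 4.
```python
import numpy as np

# Internal nodes 0, 1 (rows), dimensions x0, x1 (columns).
P_left  = np.array([[0, 0], [4, 0]])
P_right = np.array([[10, 10], [10, 10]])
# Leaves 2, 3, 4.
F_left  = np.array([[0, 0], [4, 0], [4, 6]])
F_right = np.array([[4, 10], [10, 6], [10, 10]])

L = np.array([[0, 0, 1, 0, 0],    # node 0's left child is leaf 2
              [0, 0, 0, 1, 0]])   # node 1's left child is leaf 3
R = np.array([[0, 1, 0, 0, 0],    # node 0's right child is node 1
              [0, 0, 0, 0, 1]])   # node 1's right child is leaf 4
E = np.array([[1, 0],             # node 0 splits in dimension 0
              [0, 1]])            # node 1 splits in dimension 1

PL = np.vstack([P_left, F_left])    # the stacked matrices in the constraints
PR = np.vstack([P_right, F_right])

assert (L @ PL == P_left).all()                    # constraint 1
assert (R @ PR == P_right).all()                   # constraint 2
assert ((1 - E) * (P_left  - R @ PL) == 0).all()   # constraint 3
assert ((1 - E) * (P_right - L @ PR) == 0).all()   # constraint 4
assert (E * (L @ PR - R @ PL) == 0).all()          # constraint 5
```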
**TODO: finish constraints**
**Structural constraints on matrices**
- Prove that each row of $\bold{E}$ is an elementary vector (one possible algebraic encoding is sketched below)
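One possible encoding (our suggestion, not fixed anywhere above): a row of $\bold{E}$ is elementary iff its entries are Boolean and sum to one, which over a field can be expressed as
$$\bold{E}\circ\bold{E} = \bold{E} \qquad\text{and}\qquad \bold{E}\cdot\bold{1}_{d\times 1} = \bold{1}_{\Nint\times 1},$$
since $x^2 = x$ forces each entry into $\{0,1\}$, and the row sums then force exactly one non-zero entry per row.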