# Multilinear Polynomials and Tensor Products

## Preliminaries: Multilinear Polynomials

Let $F$ be a finite field. A polynomial $f(X_0, \dots, X_{n-1})$ in $n$ variables over $F$ is multilinear if the degree of every variable $X_i$ in every term is at most 1. A multilinear polynomial $f$ in $n$ variables is uniquely determined by its $2^n$ coefficients. For example, for $n=2$, any multilinear polynomial can be written as:

$$f(X_0, X_1) = c_{00} + c_{10}X_0 + c_{01}X_1 + c_{11}X_0X_1$$

We are primarily concerned with the evaluations of such polynomials over the Boolean hypercube, $H^n = \{0, 1\}^n$.

## Representation of Multilinear Polynomials

A multilinear polynomial $f$ can be represented in two primary ways:

1. **Coefficient Form:** As a list $\mathbf{c} = (c_i)_{i \in \{0,1\}^n}$ of its $2^n$ coefficients.
2. **Evaluation Form:** As a list $\mathbf{f} = (f(x))_{x \in H^n}$ of its $2^n$ evaluations over the Boolean hypercube, typically ordered lexicographically.

These two representations are linearly equivalent, and one can be converted to the other. This note focuses on the evaluation form, since that is the one used in the WHIR implementation.

## Evaluation at an Arbitrary Point

Given the evaluation form $\mathbf{f}$ (a list of $2^n$ values on the hypercube), how do we efficiently compute $f(p)$ for an arbitrary point $p = (p_0, \dots, p_{n-1}) \in F^n$? This point $p$ is generally *not* on the hypercube. We will explore two equivalent methods to solve this.

### Method 1: Recursive Linear Interpolation

This approach is based on a splitting property of multilinear polynomials. We can "split" any $n$-variable multilinear polynomial $f$ by its first variable $X_0$:

$$ f(X_0, \dots, X_{n-1}) = (1-X_0) \cdot f_0(X_1, \dots, X_{n-1}) + X_0 \cdot f_1(X_1, \dots, X_{n-1}) $$

where:

* $f_0$ is $f$ with $X_0$ fixed to 0: $f_0(\dots) = f(0, \dots)$
* $f_1$ is $f$ with $X_0$ fixed to 1: $f_1(\dots) = f(1, \dots)$

Notice that $f_0$ and $f_1$ are both multilinear polynomials in $n-1$ variables. To evaluate $f$ at $p = (p_0, \dots, p_{n-1})$, we can apply this identity:

$$ f(p) = (1-p_0) \cdot f_0(p_1, \dots, p_{n-1}) + p_0 \cdot f_1(p_1, \dots, p_{n-1}) $$

This gives a natural recursive algorithm:

1. **Base Case ($n=0$):** The polynomial is a constant $c$. The evaluation list $\mathbf{f}$ is just $[c]$. Return $c$.
2. **Recursive Step ($n > 0$):**
   * The evaluation list $\mathbf{f}$ (of size $2^n$) consists of the evaluations of $f_0$ followed by the evaluations of $f_1$.
   * Split $\mathbf{f}$ into two halves: $\mathbf{f}_0$ (the first $2^{n-1}$ elements) and $\mathbf{f}_1$ (the last $2^{n-1}$ elements).
   * Recursively compute $v_0 = f_0(p_1, \dots, p_{n-1})$ using $\mathbf{f}_0$.
   * Recursively compute $v_1 = f_1(p_1, \dots, p_{n-1})$ using $\mathbf{f}_1$.
   * Return $(1-p_0) \cdot v_0 + p_0 \cdot v_1$.

This algorithm requires $O(2^n)$ field operations.

:::info
### Example: Recursive Evaluation

Let's trace the algorithm with a concrete 2-variable polynomial.

### The Setup

* Polynomial: $f(X_0, X_1) = 2 + 5X_0 + 3X_1 + 8X_0X_1$
* Target Point $p$: We want to evaluate $f$ at the arbitrary point $p = (3, 4)$.
* Evaluation Form $\mathbf{f}$: This is the input to our algorithm. We get it by evaluating $f$ on the hypercube $H^2 = \{(0,0), (0,1), (1,0), (1,1)\}$.
  * $f(0,0) = 2$
  * $f(0,1) = 2 + 3(1) = 5$
  * $f(1,0) = 2 + 5(1) = 7$
  * $f(1,1) = 2 + 5(1) + 3(1) + 8(1)(1) = 18$

Our input evaluation list is $\mathbf{f} = [2, 5, 7, 18]$.
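Before tracing the recursion by hand, here is a minimal Python sketch of Method 1. It is only illustrative: plain integers stand in for elements of $F$, and a real implementation would reduce every product and sum modulo the field characteristic.

```python
# A minimal sketch of Method 1 (recursive linear interpolation).
# Plain Python integers stand in for elements of F; a real implementation
# would reduce every product and sum modulo the field characteristic.

def eval_multilinear(evals, point):
    """Evaluate the multilinear polynomial given by its 2^n hypercube
    evaluations `evals` (lexicographic order) at `point` = (p_0, ..., p_{n-1})."""
    if not point:                                 # base case: n = 0, a constant
        return evals[0]
    p0, rest = point[0], point[1:]
    half = len(evals) // 2
    v0 = eval_multilinear(evals[:half], rest)     # f_0(p_1, ..., p_{n-1})
    v1 = eval_multilinear(evals[half:], rest)     # f_1(p_1, ..., p_{n-1})
    return (1 - p0) * v0 + p0 * v1

# The setup above: f = [2, 5, 7, 18], p = (3, 4)
assert eval_multilinear([2, 5, 7, 18], (3, 4)) == 125
```

The hand trace below walks through exactly the calls this function would make.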
### The Algorithm Trace

We want to compute $f(3, 4)$.

**Step 1: Top-Level Call ($n=2$)**

We apply the recursive formula using $p_0 = 3$.

$$
\begin{align*}
f(\overbrace{3}^{p_0}, 4) &= (1-3) \cdot f(0, 4) + 3 \cdot f(1, 4) \\
&= -2 \cdot \underbrace{f_0(4)}_{\text{Sub-problem 1}} + 3 \cdot \underbrace{f_1(4)}_{\text{Sub-problem 2}}
\end{align*}
$$

To solve this, the algorithm splits the list $\mathbf{f}$ into two halves, according to the value of $X_0$ (each half lists the evaluations for $X_1 \in \{0, 1\}$):

* $\mathbf{f}_0 = [2, 5]$, which are the evaluations of $f_0(X_1) = f(0, X_1)$.
* $\mathbf{f}_1 = [7, 18]$, which are the evaluations of $f_1(X_1) = f(1, X_1)$.

It then recursively calls itself to solve the two sub-problems.

**Step 2: Solving sub-problem 1 (compute $f_0(4)$)**

The algorithm is now asked to evaluate an $n=1$ polynomial (represented by $\mathbf{f}_0 = [2, 5]$) at the point $(4)$.

$$
\begin{align*}
f_0(4) &= (1-4) \cdot f_0(0) + 4 \cdot f_0(1) \\
&= -3 \cdot \underbrace{f_0(0)}_{\text{Base Case, from } \mathbf{f}_0[0]} + 4 \cdot \underbrace{f_0(1)}_{\text{Base Case, from } \mathbf{f}_0[1]} \\
&= -3 \cdot (2) + 4 \cdot (5) \\
&= -6 + 20 = \mathbf{14}
\end{align*}
$$

The base cases $f_0(0)=2$ and $f_0(1)=5$ are read directly from the list $\mathbf{f}_0$.

**Step 3: Solving sub-problem 2 (compute $f_1(4)$)**

Similarly, it evaluates the $n=1$ polynomial (represented by $\mathbf{f}_1 = [7, 18]$) at the point $(4)$.

$$
\begin{align*}
f_1(4) &= (1-4) \cdot f_1(0) + 4 \cdot f_1(1) \\
&= -3 \cdot \underbrace{f_1(0)}_{\text{Base Case, from } \mathbf{f}_1[0]} + 4 \cdot \underbrace{f_1(1)}_{\text{Base Case, from } \mathbf{f}_1[1]} \\
&= -3 \cdot (7) + 4 \cdot (18) \\
&= -21 + 72 = \mathbf{51}
\end{align*}
$$

The base cases $f_1(0)=7$ and $f_1(1)=18$ are read directly from the list $\mathbf{f}_1$.

**Step 4: Final Computation**

Now we substitute the results from Steps 2 and 3 back into Step 1:

$$
\begin{align*}
f(3, 4) &= -2 \cdot \underbrace{f_0(4)}_{14} + 3 \cdot \underbrace{f_1(4)}_{51} \\
&= -2 \cdot (14) + 3 \cdot (51) \\
&= -28 + 153 \\
&= \mathbf{125}
\end{align*}
$$

### Verification

We can check this by plugging $p=(3, 4)$ directly into the original polynomial:

$$ f(3, 4) = 2 + 5(3) + 3(4) + 8(3)(4) = 2 + 15 + 12 + 96 = 125 $$

The recursive algorithm successfully found the correct value.
:::

### Method 2: The Linear Algebra and Tensor Product View

This second method provides a non-recursive perspective that connects polynomial evaluation to linear algebra.

#### The Equality Polynomial

First, we define the multilinear equality polynomial, $\text{eq}(x, p)$, for a hypercube point $x \in H^n$ and an arbitrary point $p \in F^n$:

$$ \text{eq}(x, p) = \prod_{i=0}^{n-1} \left( x_i p_i + (1-x_i)(1-p_i) \right) $$

Each factor of this product selects one of two values:

* If $x_i = 1$, the $i$-th factor is $p_i$.
* If $x_i = 0$, the $i$-th factor is $(1-p_i)$.

For example, for $n=3$ and $x=(1, 0, 1)$, $\text{eq}((1,0,1), p) = p_0 \cdot (1-p_1) \cdot p_2$. In particular, when $p$ is itself a hypercube point, $\text{eq}(x, p)$ equals 1 if $x = p$ and 0 otherwise, which is what gives the polynomial its name.

#### Evaluation as a Sum

The fundamental identity of multilinear evaluation states that $f(p)$ is the sum of all its hypercube evaluations $f(x)$, each weighted by the equality polynomial $\text{eq}(x, p)$:

$$ f(p) = \sum_{x \in \{0, 1\}^n} f(x) \cdot \text{eq}(x, p) $$

This transforms the evaluation problem into computing this sum.

#### Connection to Dot Products and Tensors

Let's represent our evaluation list as a column vector $\mathbf{f}$ of length $2^n$. Let's also define $\mathbf{eq}_p$ as the column vector of length $2^n$ containing the evaluations $\text{eq}(x, p)$ for all $x \in H^n$ in lexicographical order.
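As a quick sanity check of these definitions, here is a minimal sketch that builds $\mathbf{eq}_p$ entry by entry from the product formula and evaluates $f(p)$ as the weighted sum. As before, plain integers stand in for field elements, and the helper names are illustrative.

```python
# A minimal sketch of the sum identity f(p) = sum_x f(x) * eq(x, p).
# Plain integers stand in for field elements; itertools.product enumerates
# the hypercube in lexicographic order.
from itertools import product

def eq(x, p):
    """eq(x, p) = prod_i (x_i * p_i + (1 - x_i) * (1 - p_i))."""
    out = 1
    for xi, pi in zip(x, p):
        out *= xi * pi + (1 - xi) * (1 - pi)
    return out

def eval_as_sum(evals, p):
    """Evaluate f at p as the eq-weighted sum of its hypercube evaluations."""
    cube = list(product([0, 1], repeat=len(p)))   # (0,...,0), ..., (1,...,1)
    return sum(fx * eq(x, p) for fx, x in zip(evals, cube))

# Same polynomial and point as the first example: f = [2, 5, 7, 18], p = (3, 4)
assert eval_as_sum([2, 5, 7, 18], (3, 4)) == 125
```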
The summation $f(p) = \sum_{x} f(x) \cdot \text{eq}(x, p)$ can now be written as a simple dot product:

$$ f(p) = \mathbf{f}^T \cdot \mathbf{eq}_p $$

This is a nice reframing, and the vector $\mathbf{eq}_p$ turns out to have a very useful structure: it is a tensor product (also known as the Kronecker product $\otimes$) of $n$ vectors of length 2.

Let $\bar{p}_i = \begin{pmatrix} 1-p_i \\ p_i \end{pmatrix}$. Let's see how $\mathbf{eq}_p$ is built:

* **Case $n=1$:** $p=(p_0)$, $x \in \{0, 1\}$.
  $$\mathbf{eq}_p = \begin{pmatrix} \text{eq}(0, p) \\ \text{eq}(1, p) \end{pmatrix} = \begin{pmatrix} 1-p_0 \\ p_0 \end{pmatrix} = \bar{p}_0$$
* **Case $n=2$:** $p=(p_0, p_1)$, $x \in \{(0,0), (0,1), (1,0), (1,1)\}$.
  $$
  \begin{align*}
  \mathbf{eq}_p = \begin{pmatrix} \text{eq}((0,0), p) \\ \text{eq}((0,1), p) \\ \text{eq}((1,0), p) \\ \text{eq}((1,1), p) \end{pmatrix} &= \begin{pmatrix} (1-p_0)(1-p_1) \\ (1-p_0)p_1 \\ p_0(1-p_1) \\ p_0 p_1 \end{pmatrix} \\
  &= \begin{pmatrix} 1-p_0 \\ p_0 \end{pmatrix} \otimes \begin{pmatrix} 1-p_1 \\ p_1 \end{pmatrix} \\
  &= \bar{p}_0 \otimes \bar{p}_1
  \end{align*}
  $$

In general, $\mathbf{eq}_p$ is the tensor product of the $\bar{p}_i$ vectors:

$$ \mathbf{eq}_p = \bar{p}_0 \otimes \bar{p}_1 \otimes \dots \otimes \bar{p}_{n-1} $$

This is the key point to keep in mind: evaluating a multilinear polynomial $f$ (given by $\mathbf{f}$) at a point $p$ is *identical* to computing the dot product of $\mathbf{f}$ with the tensor product vector $\mathbf{eq}_p$ derived from $p$.

:::info
### Example: Tensor Product Evaluation

Let's use the same setup as the previous example to see how this method works.

### The Setup

* Polynomial: $f(X_0, X_1) = 2 + 5X_0 + 3X_1 + 8X_0X_1$
* Target Point $p$: $p = (p_0, p_1) = (3, 4)$
* Evaluation Form $\mathbf{f}$: This is the vector of evaluations on $H^2 = \{(0,0), (0,1), (1,0), (1,1)\}$.

$$ \mathbf{f} = \begin{pmatrix} f(0,0) \\ f(0,1) \\ f(1,0) \\ f(1,1) \end{pmatrix} = \begin{pmatrix} 2 \\ 5 \\ 7 \\ 18 \end{pmatrix} $$

### The Algorithm Trace

We want to compute $f(p) = \mathbf{f}^T \cdot \mathbf{eq}_p$.

**Step 1: Construct the Equality Vector $\mathbf{eq}_p$**

First, we build the component vectors $\bar{p}_i$:

$$ \bar{p}_0 = \begin{pmatrix} 1-p_0 \\ p_0 \end{pmatrix} = \begin{pmatrix} 1-3 \\ 3 \end{pmatrix} = \begin{pmatrix} -2 \\ 3 \end{pmatrix} $$

$$ \bar{p}_1 = \begin{pmatrix} 1-p_1 \\ p_1 \end{pmatrix} = \begin{pmatrix} 1-4 \\ 4 \end{pmatrix} = \begin{pmatrix} -3 \\ 4 \end{pmatrix} $$

Next, we compute their tensor product to get $\mathbf{eq}_p$:

$$
\begin{align*}
\mathbf{eq}_p = \bar{p}_0 \otimes \bar{p}_1 &= \begin{pmatrix} -2 \\ 3 \end{pmatrix} \otimes \begin{pmatrix} -3 \\ 4 \end{pmatrix} \\
&= \begin{pmatrix} -2 \cdot \begin{pmatrix} -3 \\ 4 \end{pmatrix} \\ 3 \cdot \begin{pmatrix} -3 \\ 4 \end{pmatrix} \end{pmatrix} \\
&= \begin{pmatrix} (-2)(-3) \\ (-2)(4) \\ (3)(-3) \\ (3)(4) \end{pmatrix} = \begin{pmatrix} 6 \\ -8 \\ -9 \\ 12 \end{pmatrix}
\end{align*}
$$

*Note: These values are precisely the $\text{eq}(x, p)$ terms. For example, $\text{eq}((0,0), p) = (1-p_0)(1-p_1) = (-2)(-3) = 6$, and $\text{eq}((1,0), p) = p_0(1-p_1) = (3)(-3) = -9$.*
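The same vector can be built mechanically as an iterated Kronecker product. Here is a minimal sketch of that construction (plain integers stand in for field elements; `kron` is just the 1-D Kronecker product written out by hand):

```python
# A minimal sketch of building eq_p as an iterated Kronecker product.

def kron(a, b):
    """1-D Kronecker product: [a_0*b_0, a_0*b_1, ..., a_1*b_0, a_1*b_1, ...]."""
    return [ai * bj for ai in a for bj in b]

def eq_vector(p):
    """eq_p = (1 - p_0, p_0) ⊗ (1 - p_1, p_1) ⊗ ... ⊗ (1 - p_{n-1}, p_{n-1})."""
    out = [1]
    for pi in p:
        out = kron(out, [1 - pi, pi])
    return out

# The vector computed by hand in Step 1:
assert eq_vector((3, 4)) == [6, -8, -9, 12]
```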
**Step 2: Compute the Dot Product**

Now we compute $f(p) = \mathbf{f}^T \cdot \mathbf{eq}_p$:

$$
\begin{align*}
f(p) &= \begin{pmatrix} 2 & 5 & 7 & 18 \end{pmatrix} \cdot \begin{pmatrix} 6 \\ -8 \\ -9 \\ 12 \end{pmatrix} \\
&= \sum_{x \in H^2} f(x) \cdot \text{eq}(x, p) \\
&= \underbrace{(2) \cdot (6)}_{f(0,0) \cdot \text{eq}((0,0),p)} + \underbrace{(5) \cdot (-8)}_{f(0,1) \cdot \text{eq}((0,1),p)} + \underbrace{(7) \cdot (-9)}_{f(1,0) \cdot \text{eq}((1,0),p)} + \underbrace{(18) \cdot (12)}_{f(1,1) \cdot \text{eq}((1,1),p)} \\
&= 12 - 40 - 63 + 216 \\
&= \mathbf{125}
\end{align*}
$$

This non-recursive method gives the exact same result, 125, as the recursive approach.
:::

## Application: Logarithmic Randomness in Proof Systems

In many protocols (like Ligero or Brakedown), a prover has a large set of $m$ vectors, $\mathbf{u}_0, \dots, \mathbf{u}_{m-1} \in F^N$. Think of these as the $m$ rows of a very large $m \times N$ matrix $\mathbf{U}$. The prover wants to convince a verifier that *all* of these vectors are "good" (e.g., they are all valid codewords of some error-correcting code).

### The Naive Solution: Random Linear Combination

Checking all $m$ vectors is too slow. A standard technique is for the verifier to check a *random linear combination* instead.

1. **Verifier:** Chooses $m$ random scalars $r_0, \dots, r_{m-1} \in F$.
2. **Verifier $\to$ Prover:** Sends the random vector $\mathbf{r} = (r_0, \dots, r_{m-1})$.
3. **Prover:** Computes the single combined vector $\mathbf{u}' = \sum_{i=0}^{m-1} r_i \cdot \mathbf{u}_i$.
4. **Prover $\to$ Verifier:** Sends $\mathbf{u}'$.
5. **Verifier:** Performs the expensive check only on $\mathbf{u}'$.

A Schwartz-Zippel-style argument ensures that if even one $\mathbf{u}_i$ is "bad," the combination $\mathbf{u}'$ will also be "bad" with high probability.

**The Cost:** This is a nice reduction, but the verifier must generate and send $m$ random scalars. If $m = 2^{20}$ (about a million), that is a million scalars of randomness and communication, which is far too much.

### Structured Logarithmic Randomness

From papers like Ligerito (https://angeris.github.io/papers/ligerito.pdf) and Diamond & Posen (https://eprint.iacr.org/2023/630.pdf), we know that $\mathbf{r}$ does not need to be *truly* random. It just needs to be "random enough" to catch a bad vector. We can replace $\mathbf{r}$ with a structured vector that *depends* on far fewer random elements.

Let $m = 2^k$. Instead of $m$ random scalars, the verifier samples only $k = \log_2 m$ random scalars $p_0, \dots, p_{k-1} \in F$. The verifier then *defines* the $m$-length combination vector $\mathbf{r}$ to be the tensor product vector from our previous section:

$$ \mathbf{r} = \mathbf{eq}_p = (1-p_0, p_0)^T \otimes \dots \otimes (1-p_{k-1}, p_{k-1})^T $$

The verifier asks the prover to compute the *same* combination $\mathbf{u}' = \sum_{i=0}^{m-1} r_i \cdot \mathbf{u}_i$, but using this special $\mathbf{r}$. This is an exponential saving in randomness: from $O(m)$ to $O(k) = O(\log m)$.

### This is a Multilinear Evaluation

This is where the two concepts merge. How does the prover compute this combination $\mathbf{u}'$?

Let's look at the $j$-th column of the prover's matrix $\mathbf{U}$. Let this column be the vector $\mathbf{f} \in F^m$:

$$ \mathbf{f} = (u_{0,j}, u_{1,j}, \dots, u_{m-1,j})^T $$

The $j$-th element of the resulting vector $\mathbf{u}'$ is simply the dot product $\mathbf{r}^T \cdot \mathbf{f}$.
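As a quick illustration of the prover's work, here is a sketch of the structured combination over a toy $4 \times 3$ matrix. The matrix values are illustrative (the first column deliberately reuses the numbers of the example below), and plain integers stand in for field elements.

```python
# A sketch of the prover's structured combination u' = sum_i r_i * u_i over a
# toy 4 x 3 matrix U. All values are illustrative.

def eq_vector(p):
    """Structured weights eq_p, built as an iterated Kronecker product."""
    out = [1]
    for pi in p:
        out = [o * w for o in out for w in (1 - pi, pi)]
    return out

U = [
    [10, 1, 0],   # u_0
    [20, 2, 1],   # u_1
    [30, 3, 1],   # u_2
    [40, 4, 0],   # u_3
]

p = (3, 5)                 # the verifier's k = log2(4) = 2 random scalars
r = eq_vector(p)           # [8, -10, -12, 15]

# Combine the rows: u'[j] = sum_i r[i] * U[i][j] for every column j.
u_prime = [sum(ri * row[j] for ri, row in zip(r, U)) for j in range(len(U[0]))]
assert u_prime[0] == 120   # the column worked out in the example below
```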
Writing this out for a single column:

$$ u'_j = \mathbf{r}^T \cdot \mathbf{f} = \underbrace{(\mathbf{eq}_p)^T}_{\text{Structured combination vector}} \cdot \underbrace{\mathbf{f}}_{\text{Column of data}} $$

As we established previously, this dot product is *mathematically identical* to evaluating a $k$-variable multilinear polynomial $f$ at the point $p = (p_0, \dots, p_{k-1})$, where $f$ is the unique multilinear polynomial whose evaluations on the $k$-dimensional hypercube $H^k$ are the entries of the vector $\mathbf{f}$.

Therefore, the task of computing a "tensor product combination" of $m$ vectors is *identical* to the task of evaluating a $k = \log_2 m$ variable multilinear polynomial (defined by those vectors) at the point $p = (p_0, \dots, p_{k-1})$. This allows protocols to switch from a high-randomness, high-communication linear combination to a low-randomness, computationally efficient multilinear evaluation.

:::info
### Example: Logarithmic Combination vs. Multilinear Evaluation

Let's show this equivalence with $m=4$ vectors, which means $k = \log_2 4 = 2$ variables.

### The Setup

* **Prover's Data:** A $4 \times N$ matrix $\mathbf{U}$. We'll just look at one column, $\mathbf{f}$.

  $$ \mathbf{f} = \begin{pmatrix} \mathbf{u}_0 \text{'s column } j \\ \mathbf{u}_1 \text{'s column } j \\ \mathbf{u}_2 \text{'s column } j \\ \mathbf{u}_3 \text{'s column } j \end{pmatrix} = \begin{pmatrix} 10 \\ 20 \\ 30 \\ 40 \end{pmatrix} $$

* **Verifier's Randomness:** $k=2$ random scalars. Let's pick $p = (p_0, p_1) = (3, 5)$.
* **Structured Vector $\mathbf{r}$:** The verifier defines $\mathbf{r} = \bar{p}_0 \otimes \bar{p}_1$.

  $$
  \begin{align*}
  \bar{p}_0 &= \begin{pmatrix} 1-p_0 \\ p_0 \end{pmatrix} = \begin{pmatrix} 1-3 \\ 3 \end{pmatrix} = \begin{pmatrix} -2 \\ 3 \end{pmatrix} \\
  \bar{p}_1 &= \begin{pmatrix} 1-p_1 \\ p_1 \end{pmatrix} = \begin{pmatrix} 1-5 \\ 5 \end{pmatrix} = \begin{pmatrix} -4 \\ 5 \end{pmatrix}
  \end{align*}
  $$

  $$ \mathbf{r} = \bar{p}_0 \otimes \bar{p}_1 = \begin{pmatrix} -2 \\ 3 \end{pmatrix} \otimes \begin{pmatrix} -4 \\ 5 \end{pmatrix} = \begin{pmatrix} -2(-4) \\ -2(5) \\ 3(-4) \\ 3(5) \end{pmatrix} = \begin{pmatrix} 8 \\ -10 \\ -12 \\ 15 \end{pmatrix} $$

### Path 1: Compute the Combination (Dot Product)

The prover computes the $j$-th element of the combined vector $\mathbf{u}'$ by taking the dot product $\mathbf{r}^T \cdot \mathbf{f}$.

$$
\begin{align*}
u'_j = \mathbf{r}^T \mathbf{f} &= \begin{pmatrix} 8 & -10 & -12 & 15 \end{pmatrix} \cdot \begin{pmatrix} 10 \\ 20 \\ 30 \\ 40 \end{pmatrix} \\
&= (8)(10) + (-10)(20) + (-12)(30) + (15)(40) \\
&= 80 - 200 - 360 + 600 \\
&= \mathbf{120}
\end{align*}
$$

### Path 2: Evaluate the Polynomial

We now interpret $\mathbf{f}$ as the evaluation form of a $k=2$ variable multilinear polynomial $f(X_0, X_1)$:

$$ \mathbf{f} = \begin{pmatrix} f(0,0) \\ f(0,1) \\ f(1,0) \\ f(1,1) \end{pmatrix} = \begin{pmatrix} 10 \\ 20 \\ 30 \\ 40 \end{pmatrix} $$

The task is to evaluate this $f$ at the point $p = (3, 5)$ using recursive interpolation.
$$
\begin{align*}
f(3, 5) &= \underbrace{(1-3)}_{-2} \cdot f(0, 5) + \underbrace{(3)}_{3} \cdot f(1, 5)
\end{align*}
$$

First, we need the sub-evaluations:

$$
\begin{align*}
f(0, 5) &= \underbrace{(1-5)}_{-4} \cdot \underbrace{f(0,0)}_{10} + \underbrace{(5)}_{5} \cdot \underbrace{f(0,1)}_{20} \\
&= -4(10) + 5(20) = -40 + 100 = 60
\end{align*}
$$

$$
\begin{align*}
f(1, 5) &= \underbrace{(1-5)}_{-4} \cdot \underbrace{f(1,0)}_{30} + \underbrace{(5)}_{5} \cdot \underbrace{f(1,1)}_{40} \\
&= -4(30) + 5(40) = -120 + 200 = 80
\end{align*}
$$

Now, substitute these back into the main equation:

$$
\begin{align*}
f(3, 5) &= -2 \cdot \underbrace{f(0, 5)}_{60} + 3 \cdot \underbrace{f(1, 5)}_{80} \\
&= -2(60) + 3(80) \\
&= -120 + 240 \\
&= \mathbf{120}
\end{align*}
$$

**Conclusion:** Both paths yield the exact same result, 120. This demonstrates that computing the structured linear combination is *identical* to evaluating the corresponding multilinear polynomial.
:::

The central takeaway is an exponential reduction in the verifier's randomness complexity.

* **Standard Method:** To securely combine $m$ vectors, the verifier must generate $m$ independent random scalars, which costs $O(m)$ randomness generation and communication.
* **Structured Method:** We replace the fully random $m$-length vector $\mathbf{r}$ with the *structured* $m$-length tensor product vector $\mathbf{eq}_p$. This new vector is "random enough" for security, but it is fully determined by only $k = \log_2 m$ random scalars $(p_0, \dots, p_{k-1})$.

The example above demonstrates this equivalence concretely: computing the structured combination (Path 1: the dot product $\mathbf{r}^T \cdot \mathbf{f}$) is *mathematically identical* to evaluating a $k$-variable multilinear polynomial (Path 2: $f(p)$). This equivalence allows protocols to swap a high-randomness linear combination task for an equivalent, low-randomness multilinear evaluation task, which is the core of the logarithmic randomness optimization.
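To tie everything together, here is a compact sketch of both paths side by side, using the numbers from the example above (plain integers stand in for field elements; the helper names are illustrative):

```python
# Both paths on the same data: the structured combination (Path 1) and the
# multilinear evaluation (Path 2).

def eval_multilinear(evals, point):
    """Path 2: recursive linear interpolation (Method 1 from above)."""
    if not point:
        return evals[0]
    half = len(evals) // 2
    v0 = eval_multilinear(evals[:half], point[1:])
    v1 = eval_multilinear(evals[half:], point[1:])
    return (1 - point[0]) * v0 + point[0] * v1

def eq_vector(p):
    """Structured combination vector eq_p as an iterated Kronecker product."""
    out = [1]
    for pi in p:
        out = [o * w for o in out for w in (1 - pi, pi)]
    return out

f = [10, 20, 30, 40]       # one column of the prover's matrix U
p = (3, 5)                 # the verifier's k = 2 random scalars

path1 = sum(ri * fi for ri, fi in zip(eq_vector(p), f))    # structured combination
path2 = eval_multilinear(f, p)                             # multilinear evaluation

assert path1 == path2 == 120
```

Both assertions pass with the same value, 120, mirroring the worked example.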