# Logup\*: Faster, Cheaper Indexed Lookups for Small Tables

In this technical note, we unpack and explain the recent note *Logup\*: faster, cheaper logup argument for small-table indexed lookups* by Lev Soukhanov (May 2025). The paper proposes a new technique to perform *indexed lookup arguments* in zero-knowledge proof systems more efficiently when the lookup tables are small. We will build up the background, provide intuition, and carefully walk through the mathematics involved.

## Introduction: The Lookup Argument

In many zero-knowledge proof systems, we often need to check that a list $X$ of witness elements is contained within some *table* $T$. This is called a **lookup argument** and arises, for example, when constraining values to be in a valid range or when using precomputed tables. For arrays $X, T$ over a large field $\mathbb{F}$, we want to prove
$$
X \subseteq T
$$

:::info
### Example
Let the table $T$ contain the allowed values
$$
T = \{3, 7, 20, 45\}
$$
and suppose we have a witness array
$$
X = \{7, 20, 3, 3\}
$$
In this case, we want to prove $X \subseteq T$, which means every element in $X$ must be present in $T$.
:::

The *logup* argument, first proposed in prior work by Haböck, provides a compact way to do this. Instead of explicitly sorting or building product polynomials, it uses a logarithmic-derivative trick to check a certain rational function identity at a random point.

## The Classic Logup Check

Let $|X| = n$ and $|T| = m$. For example, if $X = \{5, 2\}$ and $T = \{2, 5, 9\}$, then $n = 2$ and $m = 3$. We also define an array $acc$ of size $m$, where $acc[j]$ counts how many times $T[j]$ appears in $X$. We assume that $T$ has no repeats and that $n < p$ to avoid overflow.

:::info
### Example for the $acc$ array
Let
$$
T = \{2, 5, 9\}, \quad X = \{5, 2, 2\}
$$
so $m = 3$ and $n = 3$. We define an array $acc$ of size $m$ (same size as $T$), where each entry $acc[j]$ counts how many times $T[j]$ appears in $X$. Let's compute $acc$ entry by entry:

* $T[0] = 2$ appears twice in $X$, so $acc[0] = 2$.
* $T[1] = 5$ appears once in $X$, so $acc[1] = 1$.
* $T[2] = 9$ does not appear in $X$, so $acc[2] = 0$.

Thus, we have
$$
acc = [2, 1, 0]
$$
We also assume that $T$ has no repeats (which it does not here) and that $n < p$, where $p$ is the characteristic of the finite field $\mathbb{F}$. For example, $p$ could be a prime like $101$; then $3 < 101$, so there is no overflow in these counts.

#### What would an overflow look like?
Suppose we are working in a field with a very small characteristic, say $p = 3$. Now imagine
$$
X = \{\underbrace{2, 2, 2}_{3 \text{ times}}, \underbrace{5, 5}_{2 \text{ times}}, 9\}
$$
Then
$$
acc = [3, 2, 1] \rightarrow [0, 2, 1] \mod 3
$$
Since $p = 3$, the count $acc[0] = 3$ becomes zero when reduced modulo $p$. This "wrap-around" is an **overflow**: we can no longer correctly represent the actual count. To avoid this, we always require $n < p$, so all counts $acc[j]$ stay below $p$ and no reduction occurs.
:::

The prover commits to $X$, $T$, and $acc$; then, for a random challenge $c \in \mathbb{F}$, the verifier checks that
\begin{equation}
\sum_{i=0}^{n-1} \frac{1}{c - X[i]} = \sum_{j=0}^{m-1} \frac{acc[j]}{c - T[j]}
\end{equation}
The soundness error is at most $\frac{n + m}{|\mathbb{F}|}$.
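To make this concrete, here is a minimal Python sketch of the classic logup check over a prime field. The modulus `P`, the `inv` helper, and the way the challenge is sampled are illustrative choices, not part of the original note.

```python
from collections import Counter
import random

P = 2**31 - 1  # an illustrative prime modulus; any large prime field works

def inv(a):
    """Modular inverse via Fermat's little theorem (P is prime)."""
    return pow(a % P, P - 2, P)

def logup_check(X, T, c):
    """Check sum_i 1/(c - X[i]) == sum_j acc[j]/(c - T[j]) over F_P."""
    counts = Counter(X)
    acc = [counts[t] for t in T]  # acc[j] = multiplicity of T[j] in X
    lhs = sum(inv(c - x) for x in X) % P
    rhs = sum(a * inv(c - t) for a, t in zip(acc, T)) % P
    return lhs == rhs

T = [2, 5, 9]
c = random.randrange(P)              # the verifier's random challenge
print(logup_check([5, 2, 2], T, c))  # True: every element of X is in T
print(logup_check([5, 2, 4], T, c))  # False (w.h.p.): 4 is not in T
```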
:::info
### Example illustrating the rational function check
Let
$$
T = \{2, 5, 9\}, \quad X = \{5, 2, 2\}
$$
so that $acc = [2, 1, 0]$. Suppose we choose a random challenge $c = 10$ in our field $\mathbb{F}$.

#### Left-hand side (LHS)
\begin{equation}
\begin{aligned}
\sum_{i=0}^{2} \frac{1}{c - X[i]} &= \frac{1}{10 - 5} + \frac{1}{10 - 2} + \frac{1}{10 - 2} \\
&= \frac{1}{5} + \frac{1}{8} + \frac{1}{8} \\
&= \frac{1}{5} + \frac{1}{4} \\
&= \frac{4}{20} + \frac{5}{20} \\
&= \frac{9}{20}
\end{aligned}
\end{equation}

#### Right-hand side (RHS)
\begin{equation}
\begin{aligned}
\sum_{j=0}^{2} \frac{acc[j]}{c - T[j]} &= \frac{2}{10 - 2} + \frac{1}{10 - 5} + \frac{0}{10 - 9} \\
&= \frac{2}{8} + \frac{1}{5} + 0 \\
&= \frac{1}{4} + \frac{1}{5} \\
&= \frac{5}{20} + \frac{4}{20} \\
&= \frac{9}{20}
\end{aligned}
\end{equation}

#### Conclusion
Both sides evaluate to $\frac{9}{20}$, confirming that
$$
\sum_{i=0}^{n-1} \frac{1}{c - X[i]} = \sum_{j=0}^{m-1} \frac{acc[j]}{c - T[j]}
$$
If $X \subseteq T$ and $acc$ is correct, this equality holds as an identity of rational functions, hence for every $c$ away from the poles. If not, the equality fails for a random $c$ with high probability.
:::

## Indexed Lookup

Often we want an **indexed lookup**: for an index array $I$, we want to look up $T$ at positions $I[i]$. Define
$$
(I^* T)[i] = T[I[i]]
$$
We want to prove to the verifier that some witness array $X$ exactly matches the result of these indexed lookups, that is,
$$
X = I^* T
$$
The end goal is to ensure that the prover did not cheat by claiming arbitrary values in $X$: each $X[i]$ must truly be the entry at index $I[i]$ in $T$. This is useful, for example, when $T$ is a precomputed table of constants (like roots of unity, valid opcodes, or lookup tables for optimized arithmetic) and $I$ specifies which entries the computation uses.

The classic approach reduces this to an unindexed lookup using random folding. We pick a random $\gamma$ and check
\begin{equation}
\sum_{i=0}^{n-1} \frac{1}{c - (I^* T[i] + \gamma I[i])} = \sum_{j=0}^{m-1} \frac{acc[j]}{c - (T[j] + \gamma j)}
\end{equation}
This requires committing to $I$, $I^* T$, and $acc$.
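A similar Python sketch covers the folding reduction. As before, the modulus, the `inv` helper, and the fixed challenges `gamma = 7`, `c = 100` are illustrative assumptions; a real verifier would sample them at random.

```python
from collections import Counter

P = 2**31 - 1
def inv(a):  # modular inverse via Fermat's little theorem; P is prime
    return pow(a % P, P - 2, P)

def folded_indexed_check(I, T, X, gamma, c):
    """Fold each (value, index) pair into a single field element with a
    random gamma, then run the unindexed logup check on the folded values."""
    acc = Counter(I)  # acc[j] = how many times index j is looked up
    lhs = sum(inv(c - (x + gamma * i)) for x, i in zip(X, I)) % P
    rhs = sum(acc[j] * inv(c - (T[j] + gamma * j)) for j in range(len(T))) % P
    return lhs == rhs

T = [20, 30, 40]
I = [2, 0, 1]
X = [T[i] for i in I]                                            # honest: X = I^*T
print(folded_indexed_check(I, T, X, gamma=7, c=100))             # True
print(folded_indexed_check(I, T, [40, 20, 31], gamma=7, c=100))  # False: 31 != T[1]
```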
:::info
### Example for indexed lookup and random folding
Let
$$
T = \{20, 30, 40\}, \quad I = \{2, 0, 1\}
$$
Then,
\begin{equation}
\begin{aligned}
I^* T[0] &= T[2] = 40 \\
I^* T[1] &= T[0] = 20 \\
I^* T[2] &= T[1] = 30
\end{aligned}
\end{equation}
Thus $I^* T = \{40, 20, 30\}$. Suppose we have a witness array
$$
X = \{40, 20, 30\}
$$
We want to prove that $X = I^* T$.

#### Classic approach: using random folding
We choose a random challenge $\gamma$ (say $\gamma = 7$) and a random $c$ (say $c = 100$). For each $i$, we form
$$
I^* T[i] + \gamma I[i]
$$
Numerically:

- $I^* T[0] + 7 \cdot I[0] = 40 + 7 \cdot 2 = 54$
- $I^* T[1] + 7 \cdot I[1] = 20 + 7 \cdot 0 = 20$
- $I^* T[2] + 7 \cdot I[2] = 30 + 7 \cdot 1 = 37$

So we obtain the folded values $\{54, 20, 37\}$.

#### Left-hand side (LHS)
\begin{equation}
\begin{aligned}
\sum_{i=0}^{2} \frac{1}{c - (I^* T[i] + \gamma I[i])} &= \frac{1}{100 - 54} + \frac{1}{100 - 20} + \frac{1}{100 - 37} \\
&= \frac{1}{46} + \frac{1}{80} + \frac{1}{63}
\end{aligned}
\end{equation}

#### Right-hand side (RHS)
We also define $acc$, where $acc[j]$ counts how many times index $j$ appears in $I$ (equivalently, how often $T[j]$ is looked up). In this example, each index appears once, so
$$
acc = [1, 1, 1]
$$
The folded table entries are:

- $T[0] + 7 \cdot 0 = 20$
- $T[1] + 7 \cdot 1 = 37$
- $T[2] + 7 \cdot 2 = 54$

Then
\begin{equation}
\begin{aligned}
\sum_{j=0}^{2} \frac{acc[j]}{c - (T[j] + \gamma j)} &= \frac{1}{100 - 20} + \frac{1}{100 - 37} + \frac{1}{100 - 54} \\
&= \frac{1}{80} + \frac{1}{63} + \frac{1}{46}
\end{aligned}
\end{equation}
Notice that the folded terms on the RHS match the folded terms on the LHS exactly, just in a different order, so both sides evaluate to the same sum.

#### Why do we do this?
Folding with $\gamma$ combines $I^* T[i]$ and $I[i]$ into a single value, reducing an **indexed lookup** to a standard unindexed logup-style rational function check. This requires committing to:

- $I$ (the index array)
- $I^* T$ (the looked-up values)
- $acc$ (access counts)

and then performing the rational function check at a random $c$.
:::

## Challenges with Small Tables

In many practical protocols, we often have:

1. $m \ll n$: the table $T$ is small, but the witness array $X$ (and index array $I$) is large.
2. Indices $I[i]$ are small integers that can be committed to efficiently.
3. The table $T$ contains large, essentially random elements of $\mathbb{F}$ (for example, evaluations of a polynomial in SPARK-like protocols).

### Why does this cause problems?
While $I$ is cheap to commit to (since its elements are small), the looked-up values $I^* T$ are large field elements, and there are $n$ of them. In hash-based commitment schemes, committing to $I^* T$ is expensive because:

* the number of elements is large ($n$), and
* each element is big (full field size).

This significantly increases prover time, proof size, and verifier cost.

:::info
### Example illustrating challenges with small tables
Suppose
$$
T = \{50, 80\}
$$
so $m = 2$, a very small table. Now let the index array be
$$
I = \{0, 1, 0, 0, 1, 0, 1, 0, 1, 0\}
$$
so $n = 10$, much larger than $m$. The looked-up values are
$$
I^* T = \{50, 80, 50, 50, 80, 50, 80, 50, 80, 50\}
$$
Even though $I$ contains small indices (just 0 or 1), each value in $I^* T$ is a large field element (say, a 256-bit value if $\mathbb{F}$ is large).

#### Why is this expensive?
- We must commit to $I^* T$, which has 10 large elements.
- In a hash-based scheme, each of these big elements must be individually hashed and included in the Merkle tree.
- The total cost scales with $n$ and with the size of each element.

#### Key point
Despite the table $T$ being tiny and $I$ being cheap, the commitment to $I^* T$ dominates the cost. Logup\* avoids this by removing the need to commit to $I^* T$ at all.
:::

## New Idea: Trading Pullback for Pushforward

The key insight in Logup\* is to **avoid committing to $I^* T$ altogether**. This is achieved by using a duality between *pullbacks* and *pushforwards*.

### Definitions

* **Pullback**: Given $I : \{0, \dots, n-1\} \rightarrow \{0, \dots, m-1\}$ and $T$, define
$$
I^* T[i] = T[I[i]]
$$
* **Pushforward**: Given a function $A$ on $\{0, \dots, n-1\}$, define
$$
I_* A[j] = \sum_{i : I[i] = j} A[i]
$$

:::info
### Example illustrating pullback and pushforward
Let
$$
T = \{10, 20, 30\}, \quad I = \{2, 0, 1, 2\}
$$
Then $n = 4$, $m = 3$.

### Pullback
We define $I^* T[i] = T[I[i]]$ and compute for each $i$:

- $I^* T[0] = T[2] = 30$
- $I^* T[1] = T[0] = 10$
- $I^* T[2] = T[1] = 20$
- $I^* T[3] = T[2] = 30$

Thus, $I^* T = \{30, 10, 20, 30\}$.

### Pushforward
Let $A = \{5, 6, 7, 8\}$. We define
$$
I_* A[j] = \sum_{i : I[i] = j} A[i]
$$
and compute for each $j$:

- $I_* A[0]$: the only index with $I[i] = 0$ is $i = 1$, so $I_* A[0] = A[1] = 6$.
- $I_* A[1]$: the only index with $I[i] = 1$ is $i = 2$, so $I_* A[1] = A[2] = 7$.
- $I_* A[2]$: the indices with $I[i] = 2$ are $i = 0$ and $i = 3$, so $I_* A[2] = A[0] + A[3] = 5 + 8 = 13$.

Thus, $I_* A = \{6, 7, 13\}$.

#### Intuition
- **Pullback** copies table values to the positions specified by $I$.
- **Pushforward** aggregates the input values $A$ according to which table index they map to.
:::

### Duality Lemma
\begin{equation}
\langle I_* A, B \rangle = \langle A, I^* B \rangle
\end{equation}
where $\langle \cdot, \cdot \rangle$ denotes the inner product (the sum of entrywise products).

:::info
### Example illustrating the duality lemma
Let
$$
A = \{2, 3, 4\}, \quad B = \{10, 20\}, \quad I = \{1, 0, 1\}
$$
so $n = 3$, $m = 2$.

### Compute $I^* B$
Using $I^* B[i] = B[I[i]]$:

- $I^* B[0] = B[1] = 20$
- $I^* B[1] = B[0] = 10$
- $I^* B[2] = B[1] = 20$

Then, $I^* B = \{20, 10, 20\}$.

### Compute $\langle A, I^* B \rangle$
\begin{aligned}
\langle A, I^* B \rangle &= 2 \cdot 20 + 3 \cdot 10 + 4 \cdot 20 \\
&= 40 + 30 + 80 = 150
\end{aligned}

### Compute $I_* A$
Using $I_* A[j] = \sum_{i : I[i] = j} A[i]$:

- $I_* A[0]$: the only index with $I[i] = 0$ is $i = 1$, so $I_* A[0] = A[1] = 3$.
- $I_* A[1]$: the indices with $I[i] = 1$ are $i = 0$ and $i = 2$, so $I_* A[1] = A[0] + A[2] = 2 + 4 = 6$.

Then, $I_* A = \{3, 6\}$.

### Compute $\langle I_* A, B \rangle$
\begin{aligned}
\langle I_* A, B \rangle &= 3 \cdot 10 + 6 \cdot 20 \\
&= 30 + 120 = 150
\end{aligned}

### Conclusion
$$
\langle I_* A, B \rangle = \langle A, I^* B \rangle = 150
$$
This confirms the duality lemma.
:::
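Both operations, and the duality lemma itself, fit in a few lines of Python. The sketch below simply mirrors the definitions above, with plain integers standing in for field elements.

```python
def pullback(I, T):
    """I^* T: copy table values to the positions specified by I."""
    return [T[i] for i in I]

def pushforward(I, A, m):
    """I_* A: aggregate A according to which table index each position maps to."""
    out = [0] * m
    for i, a in zip(I, A):
        out[i] += a
    return out

def dot(u, v):
    """Inner product: sum of entrywise products."""
    return sum(x * y for x, y in zip(u, v))

A, B, I = [2, 3, 4], [10, 20], [1, 0, 1]
assert pullback(I, B) == [20, 10, 20]
assert pushforward(I, A, m=2) == [3, 6]
# Duality lemma: <I_* A, B> == <A, I^* B>
assert dot(pushforward(I, A, 2), B) == dot(A, pullback(I, B)) == 150
```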
## The New Protocol

As mentioned previously, in classical approaches the prover must commit to the full array $I^* T$, which is large. The key innovation here is to **avoid ever committing to $I^* T$**, while still convincing the verifier of the correctness of each lookup.

### What do we want?
We want to claim that for a random point $r$ (usually a challenge chosen by the verifier),
$$
I^* T(r) = e
$$
Here, $I^* T$ is viewed as a multilinear polynomial evaluated at the point $r$.

### Strategy breakdown

1. **Commit to $I$ and $T$ only.** We avoid committing to $I^* T$, which would have $n$ large elements. Instead, we just commit to:
   - $I$: the array of indices (small elements).
   - $T$: the table (small size $m$, but possibly large elements).
2. **Claim the value $e$.** The prover declares a value $e$ which should equal $I^* T(r)$.
3. **Define $eq_r$.** We define a special polynomial called the multilinear Lagrange kernel:
\begin{equation}
eq_r(x) = \prod_{i=0}^{k-1} (r_i x_i + (1 - r_i)(1 - x_i))
\end{equation}
where $n = 2^k$ and $x$ ranges over the Boolean hypercube $\{0,1\}^k$.
4. **Construct and commit to $I_* eq_r$.** We push $eq_r$ forward through $I$:
$$
I_* eq_r[j] = \sum_{i : I[i] = j} eq_r[i]
$$
5. **Key observation: switch the inner product.** By the duality lemma,
\begin{equation}
I^* T(r) = \langle I^* T, eq_r \rangle = \langle T, I_* eq_r \rangle
\end{equation}
6. **Use sumcheck to verify.** We use the sumcheck protocol to prove
\begin{equation}
\langle T, I_* eq_r \rangle = e
\end{equation}
After sumcheck, this reduces to a small number of evaluation claims on $T$ and $I_* eq_r$ at random points.
7. **Open commitments.** Finally, we open the necessary evaluations of $T$ and $I_* eq_r$ and check that they match the expected relations.

### Why does this help?
- We never commit to $I^* T$.
- We only commit to $I_* eq_r$, which is small (size $m$).
- The expensive commitment to a length-$n$ vector of large field elements is completely avoided.

:::info
### Simple numeric example illustrating the new strategy
Let
$$
T = \{100, 200\}, \quad I = \{0, 1, 0\}
$$
So
$$
I^* T = \{100, 200, 100\}
$$

### Suppose we want to evaluate at $r = (1, 0)$
With $n = 3$ we need $k = 2$ variables, so we work on the Boolean hypercube $\{0,1\}^2$ (padding the arrays to length 4 with zeros). With big-endian bit order, $r = (1, 0)$ is the Boolean point for index 2 (binary 10).

### Step 1: Commit to $I$ and $T$ only
We do **not** commit to $I^* T = \{100, 200, 100\}$.

### Step 2: Claim the value $e$
The prover claims
$$
e = I^* T(r) = I^* T[2] = 100
$$

### Step 3: Define $eq_r$
Since $r$ is itself a Boolean point, the kernel $eq_r$ is 1 at index 2 and 0 elsewhere: $eq_r = \{0, 0, 1, 0\}$, of which only the first three entries matter. For this example, think of $eq_r$ as an indicator that selects index 2.

### Step 4: Compute $I_* eq_r$
We compute
$$
I_* eq_r[j] = \sum_{i : I[i] = j} eq_r[i]
$$

- For $j = 0$: the indices with $I[i] = 0$ are $i = 0$ and $i = 2$; since $eq_r[0] = 0$ and $eq_r[2] = 1$, we get $I_* eq_r[0] = 1$.
- For $j = 1$: the only index with $I[i] = 1$ is $i = 1$, and $eq_r[1] = 0$, so $I_* eq_r[1] = 0$.

Thus,
$$
I_* eq_r = \{1, 0\}
$$

### Step 5: Switch the inner product
We have
$$
I^* T(r) = \langle I^* T, eq_r \rangle = \langle T, I_* eq_r \rangle
$$
Numerically,
$$
\langle T, I_* eq_r \rangle = 100 \cdot 1 + 200 \cdot 0 = 100
$$
which matches the claimed value $e$.

### Step 6: Use sumcheck
The prover proves via sumcheck that
$$
\langle T, I_* eq_r \rangle = e
$$
which here is simply $100 = 100$.

### Step 7: Open evaluations
Finally, open the commitments to $T$ and $I_* eq_r$ at the points chosen during the sumcheck.

### Key point
We never committed to $I^* T = \{100, 200, 100\}$. Instead, we only committed to $I_* eq_r = \{1, 0\}$, which is short (length $m = 2$), and we reused the commitment to $T$.
:::

## Pushforward Check via Logup

The core check for $I_* A$ can also be done using a logup-style argument. Given $X$, $Y$ with $Y = I_* X$, for a random $c$ we check
\begin{equation}
\sum_{i=0}^{n-1} \frac{X[i]}{c - I[i]} = \sum_{j=0}^{m-1} \frac{Y[j]}{c - j}
\end{equation}
In our context, we set $X = eq_r$ and $Y = I_* eq_r$. The prover, having committed to $I$ and $I_* eq_r$ (recall that $eq_r$ is publicly computable from $r$), validates
\begin{equation}
\sum_{i=0}^{n-1} \frac{eq_r[i]}{c - I[i]} = \sum_{j=0}^{m-1} \frac{I_* eq_r[j]}{c - j}
\end{equation}
This step is needed to ensure that the committed $I_* eq_r$ values are indeed the correct pushforward sums derived from $eq_r$ and $I$. Without this check, a dishonest prover could cheat by committing to a fake $I_* eq_r$ and then falsely convincing the verifier that $\langle T, I_* eq_r \rangle = e$. This guarantees that $I_* eq_r$ is computed correctly without having to commit to $I^* T$.
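Here is a minimal Python sketch of this consistency check on the running example; the modulus, the `inv` helper, and the concrete challenge `c` are illustrative assumptions. Since $r$ is a Boolean point in the example, `eq_r` is just an indicator vector (padding is irrelevant for this check, so we use the unpadded length-3 arrays); for a random $r$ it would be the vector of kernel values.

```python
P = 2**31 - 1
def inv(a):  # modular inverse via Fermat's little theorem; P is prime
    return pow(a % P, P - 2, P)

def pushforward(I, A, m):
    out = [0] * m
    for i, a in zip(I, A):
        out[i] = (out[i] + a) % P
    return out

I = [0, 1, 0]
eq_r = [0, 0, 1]                 # indicator of index 2, as in the example above
Y = pushforward(I, eq_r, m=2)    # I_* eq_r = [1, 0]

c = 12345                        # verifier's random challenge (illustrative)
lhs = sum(x * inv(c - i) for x, i in zip(eq_r, I)) % P
rhs = sum(y * inv(c - j) for j, y in enumerate(Y)) % P
assert lhs == rhs  # the committed pushforward is consistent with eq_r and I
```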
## Conclusion

The Logup\* protocol trades a *pullback* claim for a *pushforward* claim, allowing small-table indexed lookups without large additional commitments. This technique simplifies proof construction, reduces prover costs, and unlocks better performance for hash-based proofs. When $m \ll n$ and $\mathbb{F}$ is large, the new approach has major advantages:

* No need to commit to $I^* T$, saving $n$ large-field-element commitments.
* Only small commitments (size $m$) and sumchecks involving $m$ terms are required.
* The numerator overflow issues present in classical logup accumulators are avoided, since the numerators are values of $eq_r$ rather than counts.
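To close, here is a compact end-to-end toy run of the Logup\* flow with a genuinely random (non-Boolean) $r$, where commitments and the sumcheck are replaced by direct computation. The modulus and all helper names are illustrative assumptions, not code from the note.

```python
import random
from itertools import product

P = 2**31 - 1

def eq_point(r, x):
    """Multilinear Lagrange kernel eq_r(x) at a single Boolean point x."""
    acc = 1
    for ri, xi in zip(r, x):
        acc = acc * ((ri * xi + (1 - ri) * (1 - xi)) % P) % P
    return acc

def eq_table(r, k):
    """eq_r at every point of {0,1}^k, in big-endian index order."""
    return [eq_point(r, x) for x in product([0, 1], repeat=k)]

T = [100, 200]
I = [0, 1, 0, 0]                               # toy index array, n = 4 = 2^k
k = 2
r = [random.randrange(P) for _ in range(k)]    # verifier's random point

eq_r = eq_table(r, k)

# Direct evaluation of the MLE of I^*T at r (the array the prover must NOT commit to):
IT = [T[i] for i in I]
e_direct = sum(v * w for v, w in zip(IT, eq_r)) % P

# Logup* route: push eq_r forward through I, then take a length-m inner product with T.
push_eq = [0] * len(T)
for i, w in zip(I, eq_r):
    push_eq[i] = (push_eq[i] + w) % P          # I_* eq_r, only m = 2 entries

e_pushforward = sum(t * y for t, y in zip(T, push_eq)) % P
assert e_direct == e_pushforward               # <I^*T, eq_r> == <T, I_* eq_r>
```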