Partial reduction of $x \bmod (B + c)$

# Partial reduction of $x \bmod (B + c)$ To implement modular arithmetic (e.g. prime field arithmetic), we want to have an efficient reduction algorithm. Often the modulus is chosen to be $B+c$ where $B$ is a power of two, and $c$ is *very* small in magnitude compared to $B$. For example, Curve25519 uses $B = 2^{255}$ and $c = -19$. In fact it's normally sufficient to have efficient "partial reduction", which leaves the result within a range that is a few times larger than $B+c$. Given an efficient partial reduction algorithm where the bound on the output can be calculated given a bound on the input, we can easily perform full reduction when needed (e.g. when we're comparing or hashing values). You could use [Montgomery](https://en.wikipedia.org/wiki/Montgomery_reduction) or [Barrett](https://en.wikipedia.org/wiki/Barrett_reduction) reduction, but I want to describe a simpler algorithm that may be more efficient in some cases, and that does not require a special representation of numbers such as Montgomery form. I don't claim any novelty; I expect that this is just a reinvention of a folklore technique. The algorithm I'm going to describe works when $c$ is not necessarily *very* small, but just signficantly smaller than $B$ (say, half the bit length). This is useful in situations where you can't arbitrarily choose the prime $B+c$, but still have enough freedom to make $c$ small enough. For instance, that applies to the fields used for elliptic curves suited to [Halo](https://eprint.iacr.org/2019/1021), and it could also apply in cases where $B$ and $B+c$ are related by a [Hasse bound](https://en.wikipedia.org/wiki/Hasse_bound). Note that we don't need to assume that $B$ is a power of two; the important thing is that there is an efficient way to convert numbers into base-$B$ representation. In practice we would make specific choices of $L_{0,1,2}$ and $L'_{0,1}$ suited for a given implementation of the low-level arithmetic, and work out all the bounds exactly rather than asymptotically (as done in the example below). ## With signed limbs Write the input $x$ as $B^2 x_2 + B x_1 + x_0$ with $|x_i| \leq L_i$. (The $B^2$ term is optional; you can instead just increase the length of $x_1$ slightly.) $x = [(B+c)(B-c) + c^2]x_2 + [(B + c) - c]x_1 + x_0$ $\hphantom{x} \equiv c^2 x_2 - c x_1 + x_0 \pmod{B + c}$ Let $x' = c^2 x_2 - c x_1 + x_0$. Then $|x'| \leq c^2 L_2 + c L_1 + L_0$. Now write $x'$ as $B x'_1 + x'_0$ with $|x'_i| \leq L'_i$. $x \equiv x' \equiv -c x'_1 + x'_0 \pmod{B + c}$. Let $x'' = -c x'_1 + x'_0$. Then $|x''| \leq c L'_1 + L'_0$. Suppose $L_0, L_1$ are $O(B)$, $L_2$ is $O(B^{1/2})$, and $c$ is $O(B^{1/2})$. Then $|x'|$ is $O(B^{3/2})$. If $L'_0$ is $O(B)$, then $L'_1$ is $O(B^{1/2})$, and $|x''|$ is $O(B)$ as required. ## With unsigned limbs If we want to use only unsigned limbs, we have to avoid the possibility that $x'$ can be negative. The simplest way to do this is to unconditionally add $k(B+c)$ to $x'$ where $k = \left\lceil \frac{c L_1}{B+c} \right\rceil$. That is, write the input $x$ as $B^2 x_2 + B x_1 + x_0$ with $0 \leq x_i \leq L_i$. (Again, the $B^2$ term is optional.) $x = [(B+c)(B-c) + c^2]x_2 + [(B + c) - c]x_1 + x_0$ $\hphantom{x} \equiv c^2 x_2 + k(B+c) - c x_1 + x_0 \pmod{B + c}$ Let $x' = c^2 x_2 + k(B+c) - c x_1 + x_0$. Then $0 \leq x' \leq c^2 L_2 + k(B+c) + L_0$. Now write $x'$ as $B x'_1 + x'_0$ with $0 \leq x'_i \leq L'_i$. We have the same issue again that $-c x'_1 + x'_0$ could be negative, so we'll add $k'(B+c)$ where $k' = \left\lceil \frac{c L'_1}{B+c} \right\rceil$ (in practice we'll usually have $k' = 1$). $x \equiv x' \equiv k'(B+c) - c x'_1 + x'_0 \pmod{B + c}$. Let $x'' = k'(B+c) - c x'_1 + x'_0$. Then $0 \leq x'' \leq k'(B+c) + L'_0$. Suppose $L_0, L_1$ are $O(B)$, $L_2$ is $O(B^{1/2})$, and $c$ is $O(B^{1/2})$. Then $k$ is $O(B^{1/2})$, and so $x'$ is $O(B^{3/2})$. If $L'_0$ is $O(B)$, then $L'_1$ is $O(B^{1/2})$, $k'$ is $O(1)$, and $x''$ is $O(B)$ as required. ## Example with exact bounds Let's say we want to implement the field over which the [Tweedledum curve](https://github.com/daira/tweedle) from Halo is defined. Set $B = 2^{254}$ and $c = 4707489545178046908921067385359695873$ (which is 122 bits). If we have ``uint64`` arithmetic available, then we probably want to express field elements using 4 unsigned limbs. So for example, set $L_0 = L_1 = 2^{255}-1$ and $L_2 = 15$. Then we have $k = 9414979090356093817842134770719391746$, and $0 \leq x' \leq c^2 L_2 + k(B+c) + L_0$ which is 377 bits. Then, we set $L'_0 = B-1$ and obtain $L'_1 = (c^2 L_2 + k(B+c) + L_0) \gg 254 = 9414979090356093817842134770719391748$. We have $k' = 1$, so $0 \leq x'' \leq 2B + c - 1$, which is only 1 bit away from being fully reduced. Furthermore, $(2B + c - 1)^2 \gg 508 = 4$, so if we multiply two such partially reduced values and express the result as $B^2 x_2 + B x_1 + x_0$ for the next reduction, we have $0 \leq x_2 \leq 4$. So we could have chosen any $L_2 \geq 4$ (but still $O(B^{1/2})$); the reason we chose $L_2 = 15$ is so that we can add together several products as 512-bit values without immediate reduction. The cost of the reduction was: * expressing the input $x$ in base $B$; * one multiplication of a 4-bit $x_2$ with a 244-bit constant $c^2$ (maybe using a lookup table); * one multiplication of a 256-bit $x_1$ with a 122-bit constant $c$, i.e. $s = c x_1$; * a 256-bit addition $t = c^2 x_2 + x_0$; * a 377-bit subtraction from a constant and addition of 256-bit $t$, $x' = k(B+c) - s + t$; * expressing $x'$ in base $B$; * one multiplication of a 123-bit $x'_1$ with a 122-bit constant $c$, i.e. $u = c x'_1$; * a 255-bit subtraction from a constant and 256-bit addition $x'' = (B + c) - u + x'_0$. We could also have avoided the multiplication by $c^2$, by setting $L_2 = 0$ and $L_1 = 2^{261}-1$ for example. In that case the cost would be: * expressing the input $x$ in base $B$; * one multiplication of a 261-bit $x_1$ with a 122-bit constant $c$, i.e. $s = c x_1$; * a 383-bit subtraction from a constant and addition of 256-bit $t$, $x' = k(B+c) - s + t$; * expressing $x'$ in base $B$; * one multiplication of a 129-bit $x'_1$ with a 122-bit constant $c$, i.e. $u = c x'_1$; * a 255-bit subtraction from a constant and 256-bit addition $x'' = (B + c) - u + x'_0$.

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.