# VII. Message Encoding & SIMD
[TOC]
## Introduction
This is the seventh chapter of [Understanding Homomorphic Encryption](https://hackmd.io/@spockwall/rJgV4qFgZe) series. The CKKS scheme maps a vector of complex numbers into a polynomial with integer coefficients, so it can be encrypted using Ring-LWE, as the encoding scheme of BGV/BFV. The encoding scheme which packs multiple data complex points into a single ciphertext by leveraging the Complex Canonical Embedding allows parallel data processing (SIMD). Below, we use the notation of Full RNS Variant, so readers can understand the content in following chapters more easily.
Since the polynomial degree $N$ is large (e.g., $2^{16}$ or $2^{17}$), a single FHE operation (like an encrypted addition or multiplication) processes **32,768 or 65,536 data slots** in parallel. This massive parallelism is what makes schemes like CKKS practical for data analysis tasks like logistic regression.
🚪 Recall of Cyclotomic Polynomial: [I. The Mathematical Foundations of Cryptography: Abstract Algebra](https://hackmd.io/TDwmj5IlRxiDeEpwdngUEQ)
## Notations
- $\Phi_M(X)$ is the $M$-th cyclotomic polynomial of degree $N$
- $N$: The degree of the polynomial quotient ring.
- $\mathcal{R}_Q = \mathbb{Z}_Q/(\Phi_M(X))=\mathbb{Z}_Q[X]/(X^N + 1)$
- $Q$: The modulus of the ciphertext. $Q=\prod^L_{i=0}q_i$, and $\log Q \ge 200$ (very big).
- $q_i$: are primes $\in \mathbb{Z}$
- $L$: The size of current ciphertext moduli set. (explained in following chapters)
- $\vec{z}$: The input message vector $\vec{z} \in \mathbb{C}^{\frac{N}{2}}$.
- $\Delta$: The scaling factor (e.g., $2^{40}$), used to preserve precision when rounding to integers.
- For example, let $\Delta=4$, we have $1.23456\times \Delta \times \frac{1}{\Delta}=1.2345$)
- $\zeta$: A complex root of unity. The specific roots used for slots are indexed as $\zeta_i = \zeta^{5^i}$.
- $\odot$: Hadamard Multiplication
- $a=[1,2,3], b=[2,3,4]\rightarrow a\odot b=[2,6,12]$
## The Canonical Embedding
### Slot to Coefficients
The encoding is essentially a map between a **vector space of complex numbers** $\vec{z}\in \mathbb{C}^{\frac{N}{2}}$ and a **polynomial ring** $\mathcal{R}_Q$.
- **The Ciphertext Space (Polynomials):** The ring $\mathcal{R} = \mathbb{Z}[X] / (X^N + 1)$. These are polynomials of degree $N-1$ with integer coefficients.
- **The Message Space (Complex Vectors):** A vector $\vec{z} \in \mathbb{C}^{\frac{N}{2}}$. We want to operate on $\frac{N}{2}$ independent complex numbers simultaneously (SIMD).
The map of two spaces is called **Canonical Embedding**, denoted as $\sigma$.
### Canonical Embedding
- The $M$-th cyclotomic polynomial $\Phi_M(X)$ has $N$ distinct complex roots.
- Let's denote these roots as $\xi_0, \xi_1, \dots, \xi_{N-1}$. These are the primitive $M$-th roots of unity. Typically $M=2N$, so they are roots of $X^N+1=0$.
- The canonical embedding $\sigma$ takes a polynomial $m(X)$ and evaluates it at all these roots:
$$\sigma(m) = \left( m(\xi_0), m(\xi_1), \dots, m(\xi_{N-1}) \right) \in \mathbb{C}^N$$
- This map is an **isometric ring homomorphism**. This is the mathematical foundation of SIMD in CKKS:
* Adding two polynomials $m_1(X) + m_2(X)$ results in adding their evaluations: $\vec{z}_1 + \vec{z}_2$.
* Multiplying two polynomials $m_1(X) \cdot m_2(X)$ results in multiplying their evaluations component-wise: $\vec{z}_1 \odot \vec{z}_2$.
## Conjugate Symmetry
### Observation
- The embedding $\sigma$ maps to $\mathbb{C}^N$, but our input vector $\vec{z}$ only has $\frac{N}{2}$ slots. Furthermore, encryption requires the polynomial $m(X)$ to have **integer** coefficients.
- Simply pick a random vector in $\mathbb{C}^N$ and inverse-map it to a polynomial, the coefficients of that polynomial will likely be complex numbers, which are invalid for the ring $\mathcal{R} = \mathbb{Z}[X]/(X^N+1)$.
- To ensure the resulting polynomial $m(X)$ has real (and eventually integer) coefficients, the vector in the embedding space must satisfy **Conjugate Symmetry**.
### Conjugate Symmetry
- For a polynomial $m(X)$ to have real coefficients, it must satisfy:
$$m(\bar{\xi}) = \overline{m(\xi)}$$
- Since the roots of unity come in conjugate pairs (e.g., $\xi^{-1} = \bar{\xi}$), the values in the embedding vector must also come in conjugate pairs.
- We define a subspace $\mathbb{H} \subset \mathbb{C}^N$:
$$\mathbb{H} = \{ \vec{z} \in \mathbb{C}^N \mid z_j = \overline{z_{-j}} \}$$
- This explains why we only have $\frac{N}{2}$ slots.
- We choose $\frac{N}{2}$ values freely (our message $\vec{z}$).
- The other $\frac{N}{2}$ values are strictly determined by the complex conjugates of our message to satisfy the symmetry condition.
## Encoding Algorithm
The encoding function converts message vector $\vec{z} \in \mathbb{C}^{\frac{N}{2}}$ into a polynomial $m(X)$.
### Step 1: Expansion (Projection Inverse)
Map the $\frac{N}{2}$ message slots to the full $N$-dimensional space $\mathbb{H}$ using a **Natural Projection** inverse $\pi^{-1}$. We select a subgroup of indices $T$ sized $\frac{N}{2}$ to represent the slots.
* For $j \in T$, we set the value to your message: $v_j = \vec{z}[j]$.
* For the remaining indices, we set the value to the conjugate: $v_{-j} = \overline{\vec{z}[j]}$. This creates a vector $\vec{v} \in \mathbb{H}$ that satisfies conjugate symmetry.
### Step 2: Inverse Canonical Embedding (iDFT)
- We must find the polynomial $m_{\mathbb{Q}}(X)$ such that when evaluated at the roots $\xi$, it yields our vector $\vec{v}$.
$$m_{\mathbb{Q}}(X) = \sigma^{-1}(\vec{v})$$Mathematically, this is polynomial interpolation. Since the evaluation points are roots of unity, this operation is exactly the **Inverse Discrete Fourier Transform (iDFT)**.
- In matrix terms, if $V$ is the Vandermonde matrix of the roots of unity, we are solving:
$$\vec{v} = V \cdot \overrightarrow{\text{coeffs}} \implies \overrightarrow{\text{coeffs}} = V^{-1} \cdot \vec{v}$$ and $V$ is a **Vandermonde matrix**, $\zeta_j=\zeta^{5^j}$ for $0\le j<N/2$:
$$ V = \begin{pmatrix}
1 & \zeta_0 & \zeta_0^2 & \dots & \zeta_0^{N-1} \\
1 & \zeta_1 & \zeta_1^2 & \dots & \zeta_1^{N-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \zeta_{N/2-1} & \zeta_{N/2-1}^2 & \dots & \zeta_{N/2-1}^{N-1}
\end{pmatrix} $$ This calculation results in a polynomial with **real** coefficients (due to the symmetry in Step 1), but they are not yet integers.
### Step 3: Scaling and Rounding (Discretization)
- RLWE works on integer polynomials. The coefficients from Step 2 are real numbers with small decimals.
- To save the coefficients without losing precision, we multiply by the scaling factor $\Delta$ (e.g., $2^{40}$) and round to the nearest integer.
$$m(X) = \lfloor \Delta \cdot m_{\mathbb{Q}}(X) \rceil_{\sigma(\mathcal{R})}$$
- The result $m(X)$ is now an integer polynomial $\in \mathcal{R}_Q$ that can be encrypted.
## Decoding Algorithm
- Decoding is simply the forward application of the map. Firstly, we scale down the plaintext polynomial by multiplying $\Delta^{-1}$.
$$m_{\text{real}}(X) \approx \Delta^{-1} \cdot m(X)$$
- Evaluate the polynomial at the specific roots of unity indexed by $T$ (the $\frac{N}{2}$ slot indices).
$$z_j = m_{\text{real}}(\xi_j)$$
- This effectively recovers the original message vector $\vec{z}$ with small rounding error added.
## Example of Encoding Scheme
### Parameters Selection
- $M = 8$: The cyclotomic order. The polynomial ring is defined by the 8th cyclotomic polynomial $\Phi_8(X) = X^4 + 1$.
- $N = 4$: The degree of the polynomial ($\phi(8) = 4$).
- $\Delta = 64$: The scaling factor.
- $\zeta = e^{-2\pi i / 8}=\frac{1-i}{\sqrt{2}}$: The primitive 8th root of unity.
- $T = \{1, 3\}$: The subgroup of indices for the slots.
- Number of Slots $=\frac{N}{2} = 2$.
- Input message to be encoded:
- $\vec{z} = (3+4i, 2-i)\rightarrow (z_1, z_3)=(3+4i, 2-i)$
- $\vec{z}' = (1+i, 3+2i)\rightarrow (z'_1, z'_3)=(1+i,3+2i)$
### Step 1: Expand to Conjugate Symmetric Vector ($\pi^{-1}$)
To ensure the resulting polynomial has real coefficients, we must define the values for the conjugate indices.
- The primitive roots are $\zeta^1, \zeta^3, \zeta^5, \zeta^7$.
- We have $(z_1, z_3)$ and $(z'_1, z'_3)$.
- We calculate $z_7 = \overline{z_1} = 3 - 4i$, since $\zeta^7 = \zeta^{-1} = \overline{\zeta^1}$.
- We calculate $z_5 = \overline{z_3} = 2 + i$, since $\zeta^5 = \zeta^{-3} = \overline{\zeta^3}$.
- The full vector $\vec{v}, \vec{v}' \in \mathbb{H}$ corresponding to roots $(\zeta^1, \zeta^3, \zeta^5, \zeta^7)$ is:
\begin{cases}
\vec{v} = (3+4i, 2-i, 2+i, 3-4i) \\
\vec{v}' = (1+i, 3-2i, 3+2i, 1-i) \\
\end{cases}
### Step 2: Inverse Canonical Embedding (Interpolation)
- Find a real polynomial $m_{\mathbb{Q}}(X)$ of degree 3 such that $m_{\mathbb{Q}}(\zeta^j) = v_j$ for $j \in \{1, 3, 5, 7\}$. This is equal to solving the system:
\begin{cases}
m(\zeta^1) = 3+4i,\; m'(\zeta^1) = 1+i \\
m(\zeta^3) = 2-i,\; m'(\zeta^3) = 3-2i \\
m(\zeta^5) = 2+i,\; m'(\zeta^5) = 3+2i \\
m(\zeta^7) = 3-4i,\; m'(\zeta^7) = 1-i
\end{cases}
- Using the Inverse DFT (or Lagrange interpolation), the paper provides the resulting polynomial:
\begin{cases}
m_{\mathbb{Q}}(X) = \frac{1}{4}(10 + 4\sqrt{2}X + 10X^2 + 2\sqrt{2}X^3) \\
m_{\mathbb{Q}}'(X) = \frac{1}{4}(8 + \sqrt{2}X - 2X^2 + 5\sqrt{2}X^3)\\
\end{cases}
### Step 3: Scaling and Rounding
- Multiply $m_{\mathbb{Q}}(X)$ by $\Delta = 64$ to convert the coefficients to integers.
\begin{cases}
\begin{align}
64 \cdot m_{\mathbb{Q}}(X) &= 16(10 + 4\sqrt{2}X + 10X^2 + 2\sqrt{2}X^3) \\
&\approx 160 + 91X + 160X^2 + 45X^3
\end{align}
\\\\
64\cdot m_\mathbb{Q}'(X) \approx 128 + 23X - 32X^2 + 113X^3
\end{cases}
- Approximate values for rounding ($\sqrt{2} \approx 1.414$):
- $[160, 64\sqrt{2}, 160, 32\sqrt{2}] \approx [160, 91, 160, 45]$
- $[128, 16\sqrt{2}, -32, 80\sqrt{2}] \approx [128, 23, -32, 113]$
### SIMD Addition (Parallel Addition)
- Simply add $m_\mathbb{Q}(X), m_\mathbb{Q}'(X)$ together, we get:
\begin{align}
m_{sum}(X) &= m_\mathbb{Q}(X) + m_\mathbb{Q}'(X) \\
&= (160+128) + (91+23)X + (160-32)X^2 + (45+113)X^3 \\
&= 288 + 114X + 128X^2 + 158X^3
\end{align}
- If we evaluate $m_{sum}(X)$ at the roots of unity ($\zeta^1, \zeta^3$) and scale down by $\Delta=64$, we recover the slot-wise sum.
- The expected result: $$\vec{z}_1 + \vec{z}_2 = (3+4i + 1+i, \;\; 2-i + 3+2i) = \mathbf{(4+5i, \;\; 5+1i)}$$
- The decoded result:
$$\frac{1}{\Delta}\cdot (m_{sum}(\zeta), m_{sum}(\zeta^3))=(4.01 + 5.00i, \;\; 4.99 + 1.00i)$$
The decoded result matches expected one with small precision lost.
### SIMD Multiplication (Parallel Multiplication)
- Multiplying the encoded polynomials results in the slot-wise product of the messages.
- Note that in the ring $\mathcal{R}_Q$:
$$m_{mult}(X) = m_1(X) \cdot m_2(X) \pmod{X^4+1}$$
- Calculating the product of $m_\mathbb{Q}(X)$ and $m_\mathbb{Q}'(X)$:
$$m_{mult}(X) \approx 11423 + 29935X + 18239X^2 - 14215X^3$$
- We multiplied two polynomials scaled by $\Delta$, the result is scaled by $\Delta^2 = 4096$. To decode, we evaluate at the roots and divide by $4096$.
- The expected Result:
$$\vec{z}\odot\vec{z}'=(3+4i, 2-i) \odot (1+i,3+2i)=(-1 + 7i, 8 + i)$$
- The decoded result: $$\frac{1}{\Delta^2}\cdot (m_{mult}(\zeta), m_{mult}(\zeta^3))\approx (-0.99 + 7.04i, \;\; 7.96 + 1.00i)$$
The decoded result matches expected one with small precision lost.
## Reference
- [Homomorphic Encryption for Arithmetic of Approximate Numbers](https://eprint.iacr.org/2016/421)
- [A Full RNS Variant of Approximate Homomorphic Encryption](https://eprint.iacr.org/2018/931)
- [Over 100x Faster Bootstrapping in Fully Homomorphic Encryption through Memory-centric Optimization with GPUs](https://eprint.iacr.org/2021/508)