owned this note
owned this note
Published
Linked with GitHub
# SimplePIR
SimplePIR is a private information retrieval scheme described in [HHCMV23](https://eprint.iacr.org/2022/949). DoublePIR, also described in the paper, is a recursive construction of SimplePIR.
## Background
## Cryptographic primitives
### Notation
| Symbol | Meaning |
|-|-|
| $\mathbb{N}$ | The database size (which is a square) |
| $\mathbb{Z}_p$ | The set of integers moduluo $p$ |
| $\mathcal{X}$ | A probability distribution |
| $x \xleftarrow{\mathrm{R}} \mathcal{X}$ | $x$ is a random sample from $\mathcal{X}$ |
| $[x] \in \mathbb{N}$ | Denotes the set ${1 \dots x}$ |
| $n \in \mathbb{N}$ | The LWE secret |
| $m \in \mathbb{N}$ | The number of samples |
| $p$ | The plaintext modulus (see section 4.2 of HHCMV23) |
| $q$ | The integer modulus (must be $\ge 2$) |
| $\boldsymbol{\mathrm{A}}$ | A matrix such that $\boldsymbol{\mathrm{A}} \xleftarrow{\mathrm{R}} \mathbb{Z}^{m \times n}_q$ |
| $\boldsymbol{\mathrm{D}}$ or $\mathsf{db}$ | The database $\boldsymbol{\mathrm{D}} \in \mathbb{Z}^{\sqrt{N} \times \sqrt{N}}_{p}$ |
| $\boldsymbol{\mathrm{s}}$ | A secret $\boldsymbol{\mathrm{s}} \xleftarrow{\mathrm{R}} \mathbb{Z}^{n}_q$ |
| $\boldsymbol{\mathrm{e}}$ | An error vector $\boldsymbol{\mathrm{e}} \xleftarrow{\mathrm{R}} \mathcal{X}^m$, mod $q$ |
| $\boldsymbol{\mathrm{r}}$ | A random vector $\boldsymbol{\mathrm{r}} \xleftarrow{\mathrm{R}} \mathbb{Z}^{m}_q$ |
| $\mu$ | A message $\mu \in \mathbb{Z}_p$ |
| $\mathsf{hint}_s$ | A hint for the server. The base-$p$ decomposition of $\mathsf{db} \cdot \boldsymbol{\mathrm{A}}$. Used in DoublePIR. |
| $\mathsf{hint}_c$ | A hint for the client. $\mathsf{hint}_c = \mathsf{db} \cdot \boldsymbol{\mathrm{A}}$. Used in both SimplePIR and DoublePIR. |
### Learning with errors (LWE)
The assumption that the Learning with Erorrs (LWE) problem is hard forms the foundation of Regev encryption, which in turn is key to this PIR scheme.
The LWE problem stems from the assertion that the following distributions are computationally indistinguishable:
$\{(\boldsymbol{\mathrm{A}}, \boldsymbol{\mathrm{A}} \boldsymbol{\mathrm{s}} + \boldsymbol{\mathrm{e}})\} \stackrel{c}{\approx} \{(\boldsymbol{\mathrm{A}}, \boldsymbol{\mathrm{r}})\}$
### Regev encryption
Section 3.1 of HHCMV23 describes Regev encryption, which is based on the decision variant of the LWE assumption.
The encryption of a message $\mu$ with secret key $\boldsymbol{s}$ and error $e \xleftarrow{\mathrm{R}} \mathcal{X}$ is $(\boldsymbol{\mathrm{a}}, c)$ where:
$c = \boldsymbol{\mathrm{a^\top s}} + e + \lfloor q / p \rfloor \cdot \mu \in Z^{n}_q \times Z_q$,
$e$ is sampled from ${q/p/4, ... -1, 1, ... q / p / 4}$,
and $\lfloor q / p \rfloor$ is $q / p$ rounded to the nearest integer.
### Regev decryption
To decrypt a ciphertext, compute:
$c - \boldsymbol{\mathrm{a^\top s}}$ mod $q$ rounded to the nearest multiple of $\lfloor q / p \rfloor$.
#### Intuition
Due to the LWE problem, it is difficult to recover $s$ from $As + e$ even if $e$ is small.
When we subtract $\boldsymbol{\mathrm{a^\top s}}$ from $c$, we get:
$e + \lfloor q / p \rfloor \cdot \mu$
Multiplying the plaintext $\mu$ by $\lfloor q / p \rfloor$ effectively "converts" $\mu$ into multiples of $\lfloor q / p \rfloor$, so as long as $e < \frac{1}{2} \cdot \lfloor q/p \rfloor$, rounding $c - \boldsymbol{\mathrm{a^\top s}}$ to the closest $\lfloor q / p \rfloor$ eliminates $e$ and leaves $\mu$.
#### Homomorphic addition
Regev ciphertexts enjoy homomorphic addition, such that ciphertexts can be added in order to derive the encryption of the sum of their plaintext. That is, if the ciphertext $c_1$ is the encryption of $m_1$ and $c_2$ is the encryption of $m_2$, the decryption of $c_1 + c_2$ equals $m_1 + m_2$.
Note that $\boldsymbol{\mathrm{A}}$ of $c_1$ must be added to that of $c_2$ so that the result can be used to decrypt $c_3$.
#### Homomorphic multiplication
A Regev ciphertext multiplied by a plaintext yields the ciphertext of the product of both plaintexts. That is, if the ciphertext $c_1$ is the encryption of $m_1$, then $c_1 \cdot x$ is the encryption of $m_1 \cdot x$.
Note that $\boldsymbol{\mathrm{A}}$ of $c_1$ must be multiplied by $x$ so that the result can be used to decrypt $c_1 \cdot x$.
#### Noise growth
Adding ciphertexts also adds the noise vectors. If too many ciphertexts are summed, the result maybe undecryptable. Concretely, if the sum of $e_i$ exceeds $\frac{1}{2} \cdot \lfloor q / p \rfloor$ then the rounding operation will round the result in the wrong direction. The same applies to multiplication.
### A simple example
Samir's blog post, [*Explained from scratch: private information retrieval using homomorphic encryption*](https://blintzbase.com/posts/pir-and-fhe-from-scratch/) describes a simple instantiation of Regev encryption using toy parameters. This section will lay out this example.
In the blog post, the ciphertext $\mu$ is a single bit, so $m$ (the number of samples) is 1.
The plaintext modulus is implicitly set to $2$.
He sets $n$ to 512, so the size of the matrix $A$ is $m \times n = 1 \times 512$. Further, he sets $q$ to 3329 and the noise distribution to [-3, 3] mod $q$ (that is, $[3327, 3328, 0, 1, 2, 3]$).
Since $m = 1$, the error vector $e$ has only one element sampled from the noise distribution.
The secret vector $s$ has $n$ rows and 1 column ($n \times 1$).
Recall that if you multiply a matrix of $l \times m$ dimensions (rows $\times$ columns) with another matrix of $m \times n$ dimensions, the resulting matrix has dimensions $l \times n$.
The dimensions of the ciphertext $c = As + e \cdot \lfloor q / p \rfloor \cdot \mu$ is therefore $1 \times 1$ because $As$ is an $n \times n = 1 \times 1$ matrix, $e$ is a $1 \times 1$ matrix, and the rest of the terms are single elements.
The database is a 1 $N$ matrix of bits, where $N$ is set to 50.
A query from the client is a vector of $(A, c)$ values which are the encryptions of $N$ bits where the $i$th bit is set to 1 and all other bits are set to 0.
The answer from the server is a single ciphertext $(A, c)$ which is the sum of all ciphertexts from the query which correspond to entries in the database that have the value 1. The resulting ciphertext therefore the encryption of the value 1 as long as one and only one of the query ciphertexts is set to 1. This is a simplification of the SimplePIR technique of summing the multiplications of each query ciphertext with its corresponding database entry.
## The SimplePIR construction
### Preprocessing
Both the client and server may locally generate the matrix $\boldsymbol{\mathrm{A}}$ from a random seed.
The server generates $\mathsf{hint}_c = \mathsf{db} \cdot \boldsymbol{\mathrm{A}}$ and sends it to the client.
### Queries
When the client wants to privately retrieve a row from the database, they construct a query and send it to the server. A query is a vector of ciphertexts. Each item in the vector are encryptions of the value 0, except for the value at the client's desired index, which is the encryption of the value 1.
### Answers
An answer is the product of $\mathsf{db}$ and a query. Since multiplication of a plaintext with an ciphertext homomorphically results in the encryption of both plaintexts, the answer is the encryption of the desired row of the database.
### Recovery
To recover a desired row or cell from an answer, the client decrypts the value(s). Recall from above that Regev decryption is defined as:
$c - \boldsymbol{\mathrm{a^\top s}}$ mod $q$ rounded to the nearest multiple of $\lfloor q / p \rfloor$.
Also recall that the answer is a vector of encryptions of a single row of $\mathsf{db}$. It follows that each cipertext in the answer is the encryption of a single database cell. Furthermore, since homomorphic multiplication also involves the first value of the ciphertext tuple, we get as ciphertext:
$(\mathsf{db} \cdot \boldsymbol{\mathrm{A}}, c)$ where
$c = \mathsf{db} \cdot\boldsymbol{\mathrm{A s}} + e + \lfloor q / p \rfloor \cdot \mu$
Where $\mu$ is the value of the database cell.
Since the client knows $\mathsf{hint}_c = \mathsf{db} \cdot \boldsymbol{\mathrm{A}}$, decryption is as follows:
$\mathsf{ans} -\mathsf{hint}_c \cdot \boldsymbol{\mathrm{s}}$ and round each value to the nearest multiple of $\lfloor q / p \rfloor$.
### Updates
The server may update the database a row at a time and only send a fraction of the hint back to the client. The server only needs to multiply updated row $\boldsymbol{\mathrm{d}}$ with $\boldsymbol{\mathrm{A}}$ and send the result, along with the index of the updated row back to the client. The client can then simply replace each value of $\mathsf{hint}_c[i]$ and future recoveries from said row will result in the updated data.
### Benchmarks
| Task | By | Frequency | Operations | Communication |
|-|-|-|-|-|
| Pre-setup | Server and client | One-time | Generate $\boldsymbol{\mathrm{A}}$ from seed | N/A |
| Setup | Server | One-time | 1 MM ($\sqrt{N}$, $\sqrt{N}$, $n$) | $\sqrt{N} \cdot n$ |
| Query | Client | | 1 VM ($\sqrt{N}$, $n$); $\sqrt{N}$ additions | $\sqrt{N}$ |
| Answer | Server | Per query | 1 VM ($\sqrt{N}$, $n$) | $\sqrt{N}$ |
| Recover row | Client | Per query | 1 VM ($\sqrt{N}$, $\sqrt{N}$); $\sqrt{N}$ subtractions; $\sqrt{N}$ rounding to $\lfloor q / p \rfloor$ | N/A |
| Batch query | Client | | TODO |
| Batch answer | Server | Per batch query | TODO |
#### Legend:
**MM ($l$, $m$, $n$)**: matrix multiplication. This has worst-case $O(n^3)$ complexity. With a simple algorithm, matrix multiplication of a $l \times m$ matrix with a $m \times n$ matrix to yield a $l \times n$ matrix involves $l \times n \times m$ multiplications and $l \times n \times m - 1$ additions.
**VM ($l$, $m$)**: matrix multiplication by a vector. It involves $l \times m$ multiplications and $l \times m - 1$ additions.
**MA ($l$, $m$)**: matrix addition (or subtraction). Both matrices should have dimensions $l \times m$.
## The DoublePIR construction
### Benchmarks
$\kappa = \frac{\mathrm{log}_q}{\mathrm{log}_p}$
| Task | By | Frequency | Operations | Communication |
|-|-|-|-|-|
| Pre-setup | Server and client | One-time | Generate $\boldsymbol{\mathrm{A}}_1$ and $\boldsymbol{\mathrm{A}}_2$ from seed | N/A |
| Setup | Server | One-time | 2 MMs - ($n$, $m$, $l$) and ($\kappa n$, $l$, $m$)| Only $\mathsf{hint}_s$ ($\kappa n \times l$ elements) |
| Query | Client | | 2 VM ($m$, $n$) and ($l$, $n$) | $\boldsymbol{\mathrm{c}}_1 \in \mathbb{Z}^m_q$ and $\boldsymbol{\mathrm{c}}_2 \in \mathbb{Z}^l_q$ |
| Answer | Server | Per query | 2 VMs ($l$, $m$), ($\kappa (n + 1)$, $l$), 1 MM ($\kappa$, $l$, $n$) | $\kappa(2n + 1)$ |
| Recover | Client | Per query | 1 VM ($\kappa n$, $n$), 1 MA ($\kappa$, $n+1$) | N/A |
### Updates
TODO