SimplePIR without hint

# SimplePIR without hint Appendix A of https://eprint.iacr.org/2023/1438.pdf describes a way to remove hint from SimplePIR. Below I have first described SimplePIR and the described how to remove hint using BFV encryption. The trick is the perform LWE decryption of SimplePIR response using BFV homomorphically on server side. I have also taken a concrete case of retrieving txs demonstrating the costs. ### SimplePIR Let's take a database of size $N$ with values in $Z_p$ and structure the database into a rectangle matrix with dimensions $l \times m$. Let LWE secret vector dimension be $n$ and ciphertext modulus be $q$. Then to query a single element from the $db$ client chooses one of the columns $col_q$ in range $[0, m)$ in which the desired value lives, constructs a hot vector $pt$ of dimension $m$ with only $col_q^{th}$ element set to 1. Client then encrypts $pt$ under LWE as query $$qu = (c_0 = As + e + \Delta pt, c_1 = A) \in (Z_q^m, Z_q^{m \times n})$$ where $s \in Z_q^{n}$ is secret vector, $A \in Z_q^{m \times n}$, error $e \in \chi^{m}$, $\Delta = q/p$. Server processes query $qu$ as $$res = db \cdot qu = (db \cdot c_0, db \cdot c_1 ) \in (Z_q^{l}, Z_q^{l \times n})$$ Client downloads $res$ and decrypts it to find $col_q$ which contains the desired value. We can reduce the upload size of client query by fixing a public $A$ and asking client to change their secret key $s$ after every query (changing key $s$ is important for security). **Client-server communication**: $m * \log(q)$ bits **Server-client communication**: $l * \log(q) + l * n * \log(q)$ bits **Server side processing**: $N + N * n$ multiplications in $Z_q$ In practice, we can set $q$ either as 32 bit or 64 bit and not worry about modulus operations. So the server processing should be fast. The bottleneck is downloading $l * n * \log(q)$ for each query. For example with with $db$ of size $N = 2^{44}$ with dimensions $2^{20} \times 2^{24}$, $p = 527, n = 1608, q = 2^{32}$, it can be $2^{20} * 1608 * 32$ bits = 6.28GB. We can reduce the size by further structuring the $db$ a lot wider than it is tall at the risk of increase client-server communication and possibly error overflow. SimplePIR turns this into hint that client downloads before making (hopefully) many queries. However, as $db \cdot c_1$ depends on $db$, hint needs to be updated when $db$ updates. ### Removing $db \cdot c_1$ using BFV: Notice the decryption procedure of LWE ciphertext $(c_0', c_1')$ is: $$\Delta m + e = c_0' - c_1's$$ After which we can scale down $\Delta m + e$ by $\Delta$ to get $m$. The key here is we perform the decryption procedure (without scaling down) homomorphically using BFV encryption. Let's instantiate a BFV scheme with ciphertext modulus $Q$, polynomial degree $n'$ (no. of SIMD slots), plaintext space $Z_q$ (equal to ciphertext space of LWE). Client encrypts their per query LWE secret $s$ as $n$ BFV ciphertexts where $i^{th}$ BFV ciphertext encrypts $s[i]$ in all $n'$ slots and sends all $l$ ciphertexts along with query $qu$ to server. Server first calculates $res = (db \cdot c_0, db \cdot c_1)$ as before and decrypts $res$ homomorphically using $z$. It first chunks $db\cdot c_1 \in Z_q^{l \times n}$ into multiple small matrix each $\in Z_q^{n' \times n}$ and $db \cdot c_0 \in Z^l_q$ into multiple vectors each $\in Z_q^{n'}$. Let $k = {\lceil}\frac{l}{n'}{\rceil}$ and denote $t^M_i$ as the $i^{th}$ small matrix and $t^V_i$ as $i^{th}$ small vector with $i \in [0..k)$. For each pair, server computes: $$t^V_i - \sum_{j=0}^{n-1} t^M_i[j]*z[j]$$ where $t^M_i[j]$ represents $j^{th}$ column of $t^M_i$ encoded as BFV plaintext, $z[j]$ represents BFV ciphertext corresponding to element $s[j]$. This results in $k$ BFV ciphertexts. Server sends all $k$ BFV ciphertexts back to client. **New client-server communication**: $m * \log(q)$ bits + $n * n' * \log{Q}$ bits (ie $c_0$ and $z$) **New server-client communication**: $\frac{l}{n} * n * \log{Q}$ bits (ie $k$ BFV ciphertexts). **New Server processing**: $N + N * n$ multiplications in $Z_q$ + $n * \frac{l}{n}$ BFV scalar multiplications (I am not counting additions) Usually both $l$ and $n$ will be power of 2 with $n < l$ for any reasonable size $db$. So we can modify costs to: **New server-client communication**: $l * \log{Q}$ bits (ie $k$ BFV ciphertexts). **New Server processing**: $N + N * n$ multiplications in $Z_q$ + $n * l$ BFV scalar multiplications Server-client communication improves because we lose a big factor of $n$. In practice, BFV plaintext space will not equal LWE ciphertext modulus $q$. Thus we will decompose $res = (c_0, c_1)$ into multiple chunks and process them separately. This will be increase server-client communication by some factor equal to no. of chunks. For example, if BFV plaintext fits $t$ bits then server-client communication will increase by factor $q/t$. Doing this also requires to LWE secrets to not be -ve and be bounded to small value so that performing decryption procedure homomorphically does not overflow bits (probably this is the reason why paper's implementation sets LWE secret key bound to $[0,2]$). ### Concrete example Lets assume a tx is of $32 * 18$ bytes = 4608 bits (ie 18 field elements) and consider database db of dimension $2^{18} \times 2^{21}$ with each element in $Z_{p=527}$ (ie 9 bits). If we store each transaction in chunks of 9 bits in consecutive rows within a single column, a single transaction will take $4608/9=512$ rows. Thus we can pack in $2^{39}/2^{9}=2^{30}$ txs (1B txs) in the $db$. For now, let's assume client wants to retrieve a single tx and knows the column it is stored in. Set LWE parameters as $n=1608, p=527, q=2^{32},m=2^{21}$ (128 bit security). Set BFV parameters as n=2048, logQ=38, plaintext $Z_q' = 2^{16}+1$. Notice that BFV plaintext does not equal LWE ciphertext space. Thus we need to split $res$ into 32/16 = 2 chunks causing server-client communication to increase by factor of 4. **Client-server communication**: $2^{21} * 32 (/8/1024/1024) + 2048*1608*38 (/8/1024/1024) = 8+15 = 23 Mb$ **Server-client communication**: $4 *(2^{18})*38 /8/1024/1024 = 4.75 Mb$ ### Random points and Questions: 1. Client-server communication cost is dominated by BFV ciphertexts in $z$. Can we somehow reduce this or amortise across several queries? Notice that we can't use same ciphertexts for multiple secrets and can't use same secret across different queries. So this simple solution will not work. 2. I think that BFV parameters n=2048, logQ=38 are somewhat optimal since to do BFV scalar multiplications with plaintext of 9 bits will atleast require 38 bits of BFV ciphertext modulus. Reducing n to 1024 will make logQ really small, resulting error overflow. 3. Only $n * n' * \log{Q}$ bits due to (2) stays fixed in client-server communication. Otherwise, both client-server and server-client communication scale linearly with size of $db$ 4. The $db$ is lot wider than it is tall ($m > l$). This assures LWE operations take majority of processing time than BFV operations. 5. Even though the example shows client retrieving a single tx. In fact client retrieves the entire column which contains $2^{18}/2^{9}=2^9$ txs. Is it possible to structure the database such that txs that client is interested in are clustered in as few columns as possible? (this will probably require leaking a few bits of privacy). A naive idea could be to increase $l$ (ie length of single column) and assign a single column to many clients. But in this case anonymity set depends on $l$ and increasing $l$ will increase communication. 6. Paper mentions that since lower bits of LWE response $res$ will be rounded off anyways during decryption, we can potentially get rid of them. This reduces server-client communication by a factor of 2X.