Oblivious Message Detection

Problem Statement

Many agents (\(N_a \gg 1\)) post many messages (\(N_m \gg 1\)) addressed to each other on a public message board, encrypting them with recipients' public keys.

Privacy requirement: Recipents don't want any other party to know how many (if any) and which in particular messages are addressed to them.

Efficiency requirement: Assume, there are \(0 \le k \ll N_m\) messages addressed to (or "pertinent to") a particular recipient. They should be able to detect which messages on the public board are addressed to them in reasonable time using reasonable resources.

We aim to fulfill the efficiency requirement without compromising the privacy requirement.

General Principle

Setup

The system consists of a public board, agents (who can be senders and/or receivers), and detectors –- service providers that facilitate detection of pertinent messages.

During setup, recipient \(r\) generates a pair of FHE keys \(\langle \mathrm{pk}_r, \mathrm{sk}_r \rangle\). Everybody also agrees on a public parameter \(\ell\) –- the number of clues.

Sending

Sender publishes the message to the public board where it gets a numerical index \(j\). Together with the message, sender attaches \(\ell\) clues \(c_i\) (\(i \in [1,\ell]\)), each of which is an FHE encryption of number "1" (with plaintext space \(\mathbb{Z}_2\)) created using recipients public key \(\mathrm{pk}_r\).

Detection

Recipient sends \(\mathrm{pk}_r\) to detector, who, in its turn, uses it to re-randomize the clues of all messages (\(N_m \times \ell\) homomorphic operations). FHE scheme must support re-randomization of ciphertexts. Each clue will now be decryptable into \(\mathbb{Z}_2\) by \(\mathrm{pk}_r\). Clues of pertinent (to the user who requested detection) messages will decrypt to "1", others will decrypt to either 1 or 0 with equal probability.

	\(m_1\)	\(\bf m_2\)	\(m_3\)	\(m_4\)	\(m_5\)
\(c_1\)	1	1	0	0	1
\(c_2\)	0	1	0	1	0
\(c_3\)	1	1	1	1	0

Detector now holds \(\ell\) encrypted binary vectors of length \(N_m\). Positions corresponding to petinent messages should all contain an encryption of "1" with overwhelming probability. All other positions contain either "1" or "0" with equal probability. In the table above, only message \(m_2\) is pertinent.

Detector homomorphically performs bitwise AND over these \(\ell\) vectors thus obtaining a vector \(\mathrm{PV}\), in which positions corresponding to pertinent messages contain "1" with overwhelming probability and other positions contain "0" with probability \(1-2^{-\ell}\). Therefore, \(\ell\) controls the rate of false positive detections and distribution of work between detector and receiver (who will waste resources attempting to decrypt non-pertinent messages).

Compacting

We can stop at this point and just send vector \(\mathrm{PV}\) to recipient and have them decrypt it, find out which bits are set to "1", download corresponding messages and attempt to decrypt them. However, the size of vector \(\mathrm{PV}\) grows linearly with \(N_m\), and any client will have to download and decrypt potentially millions of bits even if they expect just 1-2 messages. In this section we compact the data that the client needs to download from \(O(N_m)\) to \(O(\bar{k})\), where \(\bar{k}\) is an upper bound on the expected amount of pertinent messages.

Detector creates \(m > \bar{k}\) buckets, \(m = \texttt{poly}(\bar{k})\); each bucket \(i\) has an accumulator \(\mathrm{Acc}_i\) and a counter \(\mathrm{Ctr}_i\) stored as homomorphic encryptions of their bitwise reprsentations, initialized to zero.

Detector goes through elements \(\mathrm{PV}[i]\), then for every \(i\) it chooses at random a bucket \(j\), and homomorphically executes \(\mathrm{Ctr}_j \mathrel{+}= \mathrm{PV}[i]\) and \(\mathrm{Acc}_j \mathrel{+}= (\mathrm{PV}[i] \times i)\).

Detector then sends two vectors \(\mathrm{Ctr}\) and \(\mathrm{Acc}\) of length \(m\) to the recipient, who checks decryptions of each \(\mathrm{Ctr}_i\):

If \(\mathrm{Ctr}_i = 0\), the bucket has no pertinent messages;
If \(\mathrm{Ctr}_i = 1\), the bucket has exactly one pertinent message, and \(\mathrm{Acc}_i\) decrypts to index of that message;
If \(\mathrm{Ctr}_i > 1\), an overflow happened and we need to re-do the compacting.

Overflow means that more than one pertinent message ended up in the same bucket, which is a low probability possibility.

Recipient may interactively request re-compacting or, as it is done in the paper, compacting can be non-interactively repeated multiple times and multiple sets of buckets are sent to the user at once in expectation that at least one will not have overflown. Exact probabilities are cailculated in the paper.

Practical Implementation

Implementation written in C++, based on

Palisade – an OpenFHE library, written in C++
SEAL – Microsoft Research FHE library, written in C++
HEXL – optional, provides very fast Intel-specific implementations of math typically used in homomorphic crypto
NTL – number theory library, only used for parallelization, can be easily replaced

SEAL version used in their implementation is v3.6.3, which can't currently be built with HEXL. I got it to work with a newer version v3.7.3, which does build with HEXL enabled. See performance metrics below.

Installation

To install without HEXL on an Ubuntu 20.04 instance:

cd $HOME && \
LIBDIR=$HOME/OMR && \
sudo apt update && sudo apt upgrade -y && \
sudo apt install -y autoconf cmake libgmp3-dev libntl-dev=11.4.3-1build1 git build-essential && \
git clone -b v1.11.3 https://gitlab.com/palisade/palisade-release && \
cd $HOME/palisade-release && mkdir build && cd build && cmake .. -DCMAKE_INSTALL_PREFIX=$LIBDIR && \
make -j && make install && \
cd $HOME && \
git clone -b 3.7.3 https://github.com/microsoft/SEAL && \
cd $HOME/SEAL && cmake -S . -B build -DCMAKE_INSTALL_PREFIX=$LIBDIR && \
cmake --build build && cmake --install build && \
cd $HOME/ && \
git clone https://github.com/heliaxdev/ObliviousMessageRetrieval.git && \
cd ObliviousMessageRetrieval && git checkout bazzilic && \
mkdir build && cd build && \
mkdir ../data && mkdir ../data/payloads && mkdir ../data/clues && \
cmake .. -DCMAKE_PREFIX_PATH=$LIBDIR && make && \
cd $HOME

To install with HEXL support on a Ubuntu 20.04 instance running on an Intel CPU (e.g., an m6i instance in EC2):

cd $HOME && \
LIBDIR=$HOME/OMR && \
sudo apt update && sudo apt upgrade -y && \
sudo apt install -y autoconf cmake libgmp3-dev libntl-dev=11.4.3-1build1 git build-essential && \
git clone -b v1.11.3 https://gitlab.com/palisade/palisade-release && \
cd $HOME/palisade-release && mkdir build && cd build && cmake .. -DCMAKE_INSTALL_PREFIX=$LIBDIR && \
make -j && make install && \
cd $HOME && \
git clone --branch 1.2.3 https://github.com/intel/hexl && \
cd hexl && cmake -S . -B build -DCMAKE_INSTALL_PREFIX=$LIBDIR && \
cmake --build build && cmake --install build && \
cd $HOME && \
git clone -b 3.7.3 https://github.com/microsoft/SEAL && \
cd $HOME/SEAL && cmake -S . -B build -DCMAKE_INSTALL_PREFIX=$LIBDIR -DSEAL_USE_INTEL_HEXL=ON && \
cmake --build build && cmake --install build && \
cd $HOME/ && \
git clone https://github.com/heliaxdev/ObliviousMessageRetrieval.git && \
cd ObliviousMessageRetrieval && git checkout bazzilic && \
mkdir build && cd build && \
mkdir ../data && mkdir ../data/payloads && mkdir ../data/clues && \
cmake .. -DCMAKE_PREFIX_PATH=$LIBDIR && make && \
cd $HOME

Performance

On Mac M1 CPU:

Generating clues rate: approx. 250us/clue, seems to behave linear in clues and messages.
Detector rate: ~15ms per message (500k msgs – over 2 hours) – easily parallelisable
Recipient’s rate: ~2ms overall

On Intel Xeon (Ice-Lake) single core with HEXL:

As fast as 15us/clue with some kind of batching that I suspect is done under the hood by HEXL. Generating 1 clue/msg take same time as 16 clues/message. In reality, we are likely to use 3-5 clues per message, so ~320us/message.
Detector rate: ~4ms/clue (500k msgs with 4 clues - over 2 hours)
Recipient's rate: ~3ms overall

Without HEXL, detector works ~2x slower.

(I wil re-do the benchmarks on the week of Oct 31 –VS)

Limitations & Issues

DoS Attack

Consider the system instantiated with FHEW or TFHE (the technique may be extended differently depending on schemes). Then, a malicious sender can generate a wildcard ciphertext that decrypts to 1 for almost any honesty-generated key pair. By setting all the clue ciphertexts to wildcard ciphertexts, the adversary will cause the message including this clue to be detected as pertinent for almost all recipients, and if there are many such messages, then these recipients would all overflow.

Implementation uses PVW encryption (doi, pdf), which is a "vectorized" form of Regev scheme (pdf), which in its turn is based on LWE. Authors claim, that PVW encryption is not prone to this kind of ciphertext manipulation and therefore is not vulnerable to this type of DoS attack.

Cryptography

Regev encryption is a basic LWE encryption:

Let \(n\) be the security parameter. Let \(p \ge 2\) be a prime in \([n^2,2n^2]\), \(m=(1+\varepsilon)(n+1)\log p\) and let \(\chi\) be a probability distribution on \(\mathbb{Z}_p\).

Private key: Choose \(\vec{s} \in \mathbb{Z}_p^n\) uniformly at random.
Public key: Choose \(m\) vectors \(\vec{a}_1,\ldots,\vec{a}_m \in \mathbb{Z}_p^n\) independently and uniformly at random. Also, choose \(\vec{e}_1,\ldots,\vec{e}_m \in \mathbb{Z}_p\) independently by \(\chi\). Public key is \((\vec{a}_i,b_i)_{i=1}^m\), where \(b_i = \vec{a}_i \cdot \vec{s} + e_i\).
Encryption: To encrypt one bit, choose a random set \(S\) uniformly among all \(2^m\) subsets of \([m]\). If the bit is 0, the encryption is \((\Sigma_{i \in S}\vec{a}_i,\Sigma_{i \in S}b_i)\), otherwise it's \((\Sigma_{i \in S}\vec{a}_i,\lfloor \frac{p}{2} \rfloor + \Sigma_{i \in S}b_i)\).
Decryption: To decrypt pair \((\vec{a}, b)\), calculate \(d = b - \vec{a}\cdot\vec{s} \mod p\). If \(d\) is closer to \(0\) than to \(\lfloor \frac{p}{2} \rfloor\), then decrypt to 0; otherwise, decrypt to 1.

Typos

p. 21, 5th paragraph from the top: "It also homomorphically computes \(\mathrm{PV}_i\) and adds it…" should be "computes \((\mathrm{PV}_i \times i)\) and adds it".

Sources

OMD paper: Zeyu Liu, Eran Tromer. Oblivious Message Retrieval, 2022. https://eprint.iacr.org/2021/1256.pdf
OMD source code: https://github.com/ZeyuThomasLiu/ObliviousMessageRetrieval/
My modifications to OMD code: https://github.com/heliaxdev/ObliviousMessageRetrieval/tree/bazzilic
PVW encryption: https://doi.org/10.1007/978-3-540-85174-5_31, https://link.springer.com/content/pdf/10.1007/978-3-540-85174-5_31.pdf
Regev encryption: https://cims.nyu.edu/~regev/papers/qcrypto.pdf

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`	在筆記中貼入程式碼
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.