# INSAIT_Kamen
We consider a setting with a client and a server who are communicating, while an adversary listens in on their communication. The client has some local data and wants to privately train an ML model by communicating some information about their data to the server.
Concretely, the client has training data $x$ sampled from a distribution $p(x)$. Instead of sending $x$ directly to the server, they sample a vector $g$ from a distribution $p(g|x)$, which is meant to summarize the relevant information in $x$, and transmit $g$ to the server. Assume the adversary knows both distributions $p(x)$ and $p(g|x)$. Given that they observe the vector $g$, what is the best possible reconstruction of $x$?
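For concreteness, here is a minimal sketch of one possible release mechanism, assuming a standard Gaussian prior $p(x) = \mathcal{N}(0, I)$ and an additive-noise channel $p(g|x) = \mathcal{N}(x, \sigma^2 I)$ (these distributional choices are illustrative, not fixed by the setting):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, sigma = 10, 1.0                       # illustrative dimension and noise scale

x = rng.standard_normal(dim)               # client's data, x ~ p(x) = N(0, I)
g = x + sigma * rng.standard_normal(dim)   # released vector, g ~ p(g|x) = N(x, sigma^2 I)
# the client transmits g; the adversary observes g and knows p(x) and p(g|x)
```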
We now formalize this. Let the adversary be a function $f$ that predicts the original input from $g$. The adversary incurs loss $0$ if $x = f(g)$ and loss $1$ if $x \neq f(g)$.
We define the expected risk of the adversary as:
$$R(f) = \mathbb{E}_{x \sim p(x), g \sim p(g|x)} \left[ 1_{x \neq f(g)} \right] = \mathbb{E}_g \mathbb{E}_{x|g} [ 1_{x \neq f(g)} ].$$
$$ \mathbb{E}_{x|g} [ 1_{x \neq f(g)} ] = p(x \neq f(g) | g) = 1 - p(x = f(g) | g) $$
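Minimizing this pointwise for each $g$ yields the best achievable adversarial risk,
$$ R(f^*) = 1 - \mathbb{E}_g \left[ \max_x p(x|g) \right], $$
so the adversary's success is governed by how concentrated the posterior $p(x|g)$ is.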
For each observed $g$, the optimal adversary therefore solves the maximum a posteriori (MAP) problem:
$$ \max_x p(x|g) $$
By Bayes' rule, this is equivalent to
$$ \max_x \frac{p(g|x)\,p(x)}{\int p(g, x_1)\, dx_1}. $$
The denominator does not depend on $x$, so the adversary can equivalently minimize the negative log-posterior $-\log p(x) - \log p(g|x)$. Below is a minimal sketch of this by gradient descent, using PyTorch for automatic differentiation and leaving the problem-specific log-densities as stubs:
```python
import torch

def log_prior(x):           # log p(x): log-density of the prior over inputs
    ...

def log_likelihood(g, x):   # log p(g|x): log-density of the release mechanism
    ...

def f(g, x_dim, iter_limit=1000, learning_rate=1e-2):  # adversary: MAP reconstruction
    x = torch.randn(x_dim, requires_grad=True)         # random initialization
    for _ in range(iter_limit):
        # negative log-posterior, up to an additive constant not depending on x
        loss = -(log_prior(x) + log_likelihood(g, x))
        loss.backward()
        with torch.no_grad():
            x -= learning_rate * x.grad                # x <- x - lr * d(loss)/dx
            x.grad.zero_()
    return x.detach()
```
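As a usage sketch, the stubs above can be filled in with the Gaussian example from earlier (again an illustrative choice, not the only one); for this model the MAP solution has the closed form $g / (1 + \sigma^2)$, which the gradient descent converges to:

```python
dim, sigma = 10, 1.0

def log_prior(x):             # log N(x; 0, I), up to an additive constant
    return -0.5 * (x ** 2).sum()

def log_likelihood(g, x):     # log N(g; x, sigma^2 I), up to an additive constant
    return -0.5 * ((g - x) ** 2).sum() / sigma ** 2

x_true = torch.randn(dim)               # client's data
g = x_true + sigma * torch.randn(dim)   # observed release
x_hat = f(g, x_dim=dim)                 # MAP reconstruction, converges to g / (1 + sigma**2)
print(torch.norm(x_hat - x_true))       # reconstruction error
```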