# Distributed Testing Lower Bounds (Non-Interactive)
Let $\mathcal{H}_0$, $\mathcal{H}_1$ be two disjoint classes of probability distributions on an alphabet $\mathcal{X}$.
There are $n$ distributed agents (players), each receiving an i.i.d. sample from a distribution either in $\mathcal{H}_0$ or $\mathcal{H}_1$. Every player sends information about its samples to a central server using a channel from the set of allowed channels. Based on the output from the $n$ channels, the center must decide between $\mathcal{H}_0$ and $\mathcal{H}_1$ with probability of error at most $\delta$ ($< 1/2$), and it must do so using as few players as possible.
There are three types of non-interactive protocols allowed:
- Deterministic
- Private-coin
- Public-coin
Existence of a successful distributed testing algorithm with $n$ players **implies** the following:
:::info
There exists a protocol (in the class of *allowed protocols*) such that, for every instance (in the class of *allowed instances*), the TV distance between distributions of **random variables observed by the center** is greater than or equal to $1-2\delta$. That is,
$$
\max_{\text{allowed protocols}} \
\min_{\text{allowed instances}}
d_{TV}(
\mathbb{P}^{\text{observed r.v.}}_{\mathcal{H}_1 \text{ instance}},
\mathbb{P}^{\text{observed r.v.}}_{\mathcal{H}_0\text{ instance}}
)
\geq 1-2\delta.
$$
The **observed random variables** are:
- $Y_1,\ldots,Y_n$ (outputs of channels selected by the $n$ players) for deterministic and private-coin protocols.
- $U, Y_1,\ldots,Y_n$ for public-coin protocols with public randomness $U$.
:::
:::danger
**Question:** Is the condition $\max \min d_{TV} \geq 1-2\delta$ sufficient? That is, if there exists a protocol that, for any pair of allowed instances, creates sufficient distance between them, can we use this protocol to test between two **unknown** hypotheses, one from $\mathcal{H}_0$ and one from $\mathcal{H}_1$?
This is not clear because, for optimal binary hypothesis testing, we need to know distributions $P$ and $Q$ in order to define acceptance region $\{ x : P(x) \geq Q(x)\}$. But for composite hypothesis testing, we just know the family from which distributions come, and not the exact distributions themselves.
:::
---
### What is an *allowed protocol*?
---
**Deterministic:** Let $\mathcal{W}$ be the class of deterministic channels that the players are allowed to use. Each deterministic channel is a function $w: \mathcal{X} \to \mathcal{Y}$.
A deterministic protocol is specified by an $n$-tuple of deterministic channels $(w_1,\ldots,w_n) \in \mathcal{W}^n$.
**Private-coin:** In private-coin protocols, player-$i$ is allowed access to an independent private random variable $U_i$.
For each $i \in [n]$, a private-coin protocol specifies *(i)* distribution of $U_i$'s, and *(ii)* how $U_i$ is used to select a channel $W_i := w_i(U_i) \in \mathcal{W}$. Note that the channels $w_i(U_i),\ldots,w_n(U_n)$ are independent in the non-interactive setting.
Equivalently, a private-coin protocol is specified by a product distribution $D^n = D_1 \otimes \cdots \otimes D_n \in \Delta_{\rm prod}(\mathcal{W}^n)$, where $\Delta_{\rm prod}(\mathcal{W}^n)$ is the set of product distributions on $\mathcal{W}^n$. Protocol works by sampling $(w_1,\ldots,w_n) \sim D^n$, which can be done by player-$i$ sampling $w_i \sim D_i$ (using private randomness).
:::info
A private-coin protocol can be thought of as sampling a deterministic protocol "on the fly".
:::
**Public-coin:** In addition to independent private randomness, each player and the center is allowed access to a common random variable $U$ (independent of private randomness $U_i$'s).
For each $i \in [n]$, a public-coin protocol specifies *(i)* distribution of $U$ and $U_i's$, and *(ii)* how $(U_i,U)$ is used to select a channel $W_i := w_i(U_i,U) \in \mathcal{W}$. Note that, given $U = u$, the channels $w(U_i,u),\ldots,w(U_n,u)$ are independent in the non-interactive setting.
Equivalently, a public-coin protocol is specified by a distribution $\mathcal{D} \in \Delta(\Delta_{\rm prod}(\mathcal{W}^n))$. Protocol works by sampling $D^n \sim \mathcal{D}$ (using public randomness), and then sampling $(w_1,\ldots,w_n) \sim D^{n}$ (using private randomness).
:::info
A public-coin protocol can be thought of as sampling a private-coin protocol "on the fly".
:::
---
### What is an *allowed instance*?
---
What does it mean for a testing protocol to successfully (i.e. with probability of error at most $\delta$) test $\mathcal{H}_0$ vs $\mathcal{H}_1$ using $n$ samples?
It means (by definition) that the algorithm can successfully test $p^n$ vs $q^n$ for every $p \in \mathcal{H}_0$, $q \in \mathcal{H}_1$.
It turns out that such a test can also successfully test $\mathbb{E}_{\pi_0}[p^n]$ vs $\mathbb{E}_{\pi_1}[q^n]$, for every $\pi_0 \in \Delta(\mathcal{H}_0), \pi_1 \in \Delta(\mathcal{H}_1)$.
Thus, we specify an instance with $(\pi_0,\pi_1)$, where $\pi_0 \in \Delta(\mathcal{H}_0), \pi_1 \in \Delta(\mathcal{H}_1)$.
>An instance can also be specified using a parametrized family. For e.g., let $\mathcal{P} := \{p_z : z \in \mathcal{Z}_0\} \subseteq \mathcal{H}_0$, and $\mathcal{Q} := \{q_z : z \in \mathcal{Z}_1\} \subseteq \mathcal{H}_1$. Then, an instance is specified by $((\mathcal{P},\tau_0),(\mathcal{Q},\tau_1))$, where $\tau_0 \in \Delta(\mathcal{Z}_0), \tau_1 \in \Delta(\mathcal{Z}_1)$.
---
## Information-theoretic condition for a successful test
We now instantiate the condition on TV distance in the big blue box above for deterministic, private-coin and public-coin protocols.
---
### Deterministic
---
Existence of a successful deterministic testing protocol using $n$ players implies (and is implied by) the following:
:::success
$$
\max_{w^n \in \mathcal{W}^n} \
\min_{(\pi_0,\pi_1)} \
d_{TV}(
\mathbb{E}[p^{Y^n}],
\mathbb{E}[q^{Y^n}]
)
\geq 1-2\delta, \\
\text{where }
\\\mathbb{P}(p,X^n,Y^n) \equiv \mathbb{P}(p)\times \mathbb{P}({X^n|p})\times \mathbb{P}({Y^n|X^n}),
\\\mathbb{P}(q,X^n,Y^n) \equiv \mathbb{P}(q)\times \mathbb{P}({X^n|q})\times \mathbb{P}({Y^n|X^n}),
\\p \sim \pi_0, q \sim \pi_1, Y^n|X^n \sim (w_i(X_i))_{i\in[n]}.
$$
:::
---
### Private-coin
---
Existence of a successful private-coin testing protocol using $n$ players implies (and is implied by) the following:
:::success
$$
\max_{D^n \in \Delta_{\rm prod}(\mathcal{W}^n)} \
\min_{(\pi_0,\pi_1)} \
d_{TV}(
\mathbb{E}[p^{Y^n}],
\mathbb{E}[q^{Y^n}]
)
\geq 1-2\delta, \\
\text{where }
\\\mathbb{P}(p,X^n,w^n,Y^n) \equiv \mathbb{P}(p)\times \mathbb{P}({X^n|p})\times \mathbb{P}({w^n})\times \mathbb{P}({Y^n|X^n,w^n}),
\\\mathbb{P}(q,X^n,w^n,Y^n) \equiv \mathbb{P}(q)\times \mathbb{P}({X^n|q})\times \mathbb{P}({w^n})\times \mathbb{P}({Y^n|X^n,w^n}),
\\p \sim \pi_0; \ q \sim \pi_1; \ w^n \sim D^n; \ Y^n|(X^n,w^n) = (w_i(X_i))_{i\in[n]}.
$$
:::
---
### Public-coin
---
Existence of a successful public-coin testing protocol using $n$ players implies (and is implied by) the following:
:::success
$$
\max_{\mathcal{D} \in \Delta(\Delta_{\rm prod}(\mathcal{W}^n))} \
\min_{(\pi_0,\pi_1)} \
d_{TV}(
\mathbb{E}[p^{D^n,Y^n}],
\mathbb{E}[q^{D^n,Y^n}]
)
\geq 1-2\delta, \\
\text{where }
\\\mathbb{P}(D^n,p,X^n,w^n,Y^n) \equiv \mathbb{P}({D^n})\times \mathbb{P}(p)\times \mathbb{P}({X^n|p})\times \mathbb{P}({w^n|D^n})\times \mathbb{P}({Y^n|X^n,w^n}),
\\\mathbb{P}(D^n,q,X^n,w^n,Y^n) \equiv \mathbb{P}({D^n})\times \mathbb{P}(q)\times \mathbb{P}({X^n|q})\times \mathbb{P}({w^n|D^n})\times \mathbb{P}({Y^n|X^n,w^n}),
\\ D^n \sim \mathcal{D}; \ p \sim \pi_0; \ q \sim \pi_1; \ w^n|D^n \sim D^n; \ Y^n|(X^n,w^n) = (w_i(X_i))_{i\in[n]}.
$$
:::
>In a public-coin protocol, the center observes the public randomness as well. Equivalently, center can observe $D^n$ in addition to the channel output $Y^n$. This is why we consider the TV distance between distributions of $(D^n,Y^n)$.
Noting that
$$
\mathbb{E}[p^{D^n,Y^n}] = \mathbb{E}_{D^n}[\mathbb{E}[p^{Y^n|D^n}]] , \quad
\mathbb{E}[q^{D^n,Y^n}] = \mathbb{E}_{D^n}[\mathbb{E}[q^{Y^n|D^n}]],
$$
and using the fact that
$$
d_{TV}(P_UP_{V|U},P_UQ_{V|U}) =
d_{TV}(\mathbb{E}_U[P_{V|U}],\mathbb{E}_U[Q_{V|U}]) =
\mathbb{E}_U[d_{TV}(P_{V|U},Q_{V|U})],
$$
we get
$$
d_{TV}(
\mathbb{E}[p^{D^n,Y^n}],
\mathbb{E}[q^{D^n,Y^n}]
) =
\mathbb{E}_{D^n \sim \mathcal{D}}[d_{TV}(\mathbb{E}[p^{Y^n|D^n}],\mathbb{E}[q^{Y^n|D^n}])].
$$
Thus, existence of a successful public-coin protocol with $n$ players is equivalent to
:::success
$$
\max_{\mathcal{D} \in \Delta(\Delta_{\rm prod}(\mathcal{W}^n))}
\min_{(\pi_0,\pi_1)} \
\mathbb{E}_{D^n \sim \mathcal{D}}[d_{TV}(\mathbb{E}[p^{Y^n|D^n}],\mathbb{E}[q^{Y^n|D^n}])]
\geq 1-2\delta.
$$
:::
To obtain a weaker condition, note that
$$
\mathbb{E}_{D^n \sim \mathcal{D}}[d_{TV}(\mathbb{E}[p^{Y^n|D^n}],\mathbb{E}[q^{Y^n|D^n}])]
\leq
\max_{D^n \in \Delta_{\rm prod}(\mathcal{W}^n)}
d_{TV}(\mathbb{E}[p^{Y^n|D^n}],\mathbb{E}[q^{Y^n|D^n}]).
$$
Thus, existence of a successful public-coin protocol *implies*
:::success
$$
\min_{(\pi_0,\pi_1)} \
\max_{D^n \in \Delta_{\rm prod}(\mathcal{W}^n)} \
d_{TV}(\mathbb{E}[p^{Y^n|D^n}],\mathbb{E}[q^{Y^n|D^n}])
\geq 1-2\delta.
$$
:::
>We replaced expectation with max. This makes the outermost max vacuous, and so we remove it and end up with min-max.
:::info
The weaker public-coin condition above is just the private-coin condition with max and min interchanged.
:::