---
theme: gaia
_class: lead
paginate: true
---
# **No-Dimensional Sampling Coresets for Classification**
Jeff M. Phillips* & Meysam Alishahi
University of Utah
(* visiting ScaDS.AI, Uni Leipzig, MPI for Math in Sciences)
---
## Coresets
Large dataset $X$.
Want to solve expensive $f(X)$.
Step 1: Reduce $Z \leftarrow X$ (a *coreset*)
Step 2: Solve $f(Z)$
Guarantee:
- $|f(Z) - f(X)| < \varepsilon$
- $|Z| < \mathsf{size}(1/\varepsilon)$
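A minimal sketch of the two-step pattern (uniform sampling as the reduction, the mean as a stand-in for an expensive $f$; all names illustrative):
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=1_000_000)            # large dataset
f = np.mean                               # stand-in for an expensive objective

# Step 1: reduce X to a small coreset Z (here: plain iid sampling)
Z = rng.choice(X, size=1_000, replace=False)

# Step 2: solve on Z; the guarantee is |f(Z) - f(X)| < eps w.h.p.
print(abs(f(Z) - f(X)))
```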
---
## Linear classifiers
$(X,y)$ with $X \in \mathbb{R}^{n \times d}$ and $y \in \{-1,+1\}^n$
For each $(x_i, y_i)$, the goal is $w \in \mathbb{R}^{d+1}$ so that $v_i(w) = \langle y_i x_i , w \rangle > 0$.
Objective: minimize $\sum_{i=1}^n \phi(v_i(w))$ with
- sigmoid $\phi(v) = \frac{1}{1 + e^v}$
- logistic $\phi(v) = \ln(1+e^{-v})$
- svm $\phi(v) = \max(0, 1-v)$
- ReLU $\phi(v) = \max(0,-v)$
Regularizers: Minimize $\sum_{i=1}^n \phi(v_i(w)) + \frac{1}{k}\|w\|_2^2$ with $f^{\phi,k}_w(x_i) = \phi(v_i(w)) + \frac{1}{k}\|w\|_2^2 \in F^{\phi,k}$
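The losses and the regularized objective as code, written as an average over the data to match the distributional view on the next slide (a sketch; the names are ours):
```python
import numpy as np

def sigmoid(v):  return 1.0 / (1.0 + np.exp(v))
def logistic(v): return np.log1p(np.exp(-v))
def svm(v):      return np.maximum(0.0, 1.0 - v)
def relu(v):     return np.maximum(0.0, -v)

def objective(w, X, y, phi, k):
    # v_i(w) = <y_i x_i, w>; plus the (1/k)-regularizer of f_w^{phi,k}
    v = y * (X @ w)
    return np.mean(phi(v)) + np.dot(w, w) / k
```
Here `phi` is any of the four losses, e.g. `objective(w, X, y, logistic, k)`.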
---
## Our Results
- Consider *infinite* $X$ with prob. dist. $P$ on $\mathbb{R}^d$.
Find small $Z \subset \mathbb{R}^d$ so for all $w$
$$\left|\int_x f_w^{\phi,k}(x) {\rm d}P(x) - \sum_{z \in Z} \nu(z) f_w^{\phi,k}(z) \right| \leq \varepsilon \int_x f_w^{\phi,k}(x) {\rm d}P(x)$$
- $|Z| = O(k^3/\varepsilon^2)$. Via iid samples from $P$.
No dependence on dimension $d$
Sample complexity results!
- Rademacher Complexity bound for sensitivity sampling.
- Simplification of Langberg-Schulman-Feldman sensitivity sampling bounds; new proofs via Fubini's theorem.
---
## 3 Ingredients
- Sensitivity Sampling (Langberg-Schulman-Feldman 2010,2011)
importance sampling
- Linked Range Spaces for KDE coresets (JK**P**V 2011)
- Rademacher Complexity & McDiarmid's Inequality
---
## Sensitivity Sampling
Objective functions $f_w : \mathbb{R}^d \to [0,\infty)$ in a family $F$
*upper sensitivity function*
$s$: $\sup_{f \in F} \frac{f(x)}{\int_z f(z) {\rm d}P(z)} \leq s(x)$
*total sensitivity* $S = \int_x s(x) {\rm d}P(x)$
Importance sampling: sample $Z$ from $Q(x) = P(x) \cdot \frac{s(x)}{S}$
new weighting $\nu(z) = \frac{S}{s(z)}\frac{1}{|Z|}$
Now "effect" from any $x$ bounded by $S$, not $s(x)$.
$(1\pm\varepsilon)$-relative error with $|Z| = O(\mathsf{VC} \cdot S^2/\varepsilon^2)$
$\mathsf{VC}$ is VC-dimension of $s(x)/S$-augmented $(Q,F)$.
(Feldman-Schmidt-Sohler 2020) improve to $|Z| = O(\mathsf{VC} \cdot S \log S / \varepsilon^2)$
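A sketch of the sampling step for a finite dataset with uniform $P$, assuming an upper sensitivity function `s` is given as a vectorized callable (illustrative, not the paper's code):
```python
import numpy as np

def sensitivity_sample(X, s, m, rng):
    """Draw m points from Q(x) = P(x) s(x)/S (uniform P) and weight them."""
    sx = s(X)                      # upper sensitivities s(x_i)
    S = sx.mean()                  # total sensitivity S = E_P[s(x)]
    idx = rng.choice(len(X), size=m, p=sx / sx.sum())
    nu = S / (sx[idx] * m)         # nu(z) = (S / s(z)) * (1 / |Z|)
    return X[idx], nu

# sum(nu * f(Z)) estimates E_P[f]; each point's effect is at most S/m
```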
---
## Linked Range Spaces
Kernel $K(x,z) \in [0,1]$, e.g. $K(x,z)=\exp(-\|x-z\|^2)$
$\mathsf{KDE}_P(z) = \mathbf{E}_{x \sim P} K(x,z)$
$\varepsilon$-*KDE coreset* is $Z \subset \mathbb{R}^d$ so that $|\mathsf{KDE}_P(x) - \mathsf{KDE}_Z(x)| \leq \varepsilon$ for all $x \in \mathbb{R}^d$, where $\mathsf{KDE}_Z(x) = \frac{1}{|Z|}\sum_{z \in Z} K(x,z)$
(Joshi-Kommaraju-**P**-Venkatasubramanian 2011)
For $f : \mathbb{R}^d \to [0,1]$, define superlevel set $\mathsf{super}(f,\tau) = \{x \in \mathbb{R}^d \mid f(x) \geq \tau \}$.
*linked range space* $(\mathbb{R}^d, \mathsf{super}(F))$ contains all superlevel sets $\mathsf{super}(f,\tau)$ for $f \in F$ and $\tau \in [0,1]$
Sample $Z \sim P$ with $|Z| = O(\mathsf{VC}/\varepsilon^2)$; then $Z$ is an $\varepsilon$-KDE coreset.
$\mathsf{VC}$ is VC dimension of $(\mathbb{R}^d,\mathsf{super}(K))=d$
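A numeric illustration of the sampling claim with the Gaussian kernel above (sizes and query points are arbitrary choices):
```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20_000, 2))                    # points drawn from P
Z = X[rng.choice(len(X), size=1_000, replace=False)]

def kde(data, queries):
    # KDE_data(q) = average of exp(-||x - q||^2) over x in data
    d2 = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2).mean(axis=1)

q = rng.normal(size=(50, 2))                        # query points
print(np.max(np.abs(kde(X, q) - kde(Z, q))))        # ~ eps for |Z| = O(VC/eps^2)
```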
For sensitivity sampling, we need:
$g: \mathbb{R}^d \to [0, \infty)$, define sublevel set $\mathsf{sub}(g,\tau) = \{x \in \mathbb{R}^d \mid g(x) \leq \tau\}$.
Call these range spaces: *sub-linked*.
---
## Rademacher Complexity
$m$-*Rademacher Complexity* for $(P,F)$ uses uniform $\sigma_1, \ldots, \sigma_m \sim \{-1,+1\}$
$$\mathcal{R}_m^P = \mathbf{E}_{x_{1:m} \sim P} \mathbf{E}_{\sigma_{1:m}} \sup_{f \in F} \left[ \frac{1}{m} \sum_{i=1}^m \sigma_i f(x_i) \right]$$
$f : (\mathbb{R}^d)^n \to \mathbb{R}$ satisfies $\Delta$-*bounded difference* if,
for all $x_1, \ldots, x_n$ and $x'_i$,
$$|f(x_1,\ldots,x_{i-1}, x_i, x_{i+1}, \ldots x_n) - f(x_1,\ldots,x_{i-1}, x'_i, x_{i+1}, \ldots x_n)| \leq \Delta$$
McDiarmid (1989): for $f$ with $\Delta$-bounded differences and independent RVs $X_1, \ldots, X_n$,
$$\mathbf{Pr}[f(X_1, \ldots, X_n) - \mathbf{E}[f(X_1, \ldots, X_n)] \geq \varepsilon] \leq \exp(-2 \varepsilon^2 / (n \Delta^2))$$
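A Monte-Carlo estimate of the Rademacher complexity, with the sup over $F$ replaced by a max over a small illustrative family (a sketch, not the paper's analysis):
```python
import numpy as np

def rademacher_estimate(X, F, trials, rng):
    """Estimate E_sigma sup_{f in F} (1/m) sum_i sigma_i f(x_i)."""
    m = len(X)
    vals = np.stack([f(X) for f in F])           # |F| x m table of f(x_i)
    sigma = rng.choice([-1.0, 1.0], size=(trials, m))
    return np.mean((vals @ sigma.T).max(axis=0)) / m

rng = np.random.default_rng(2)
X = rng.normal(size=100)
F = [lambda x, w=w: np.log1p(np.exp(-w * x)) for w in np.linspace(-1, 1, 25)]
print(rademacher_estimate(X, F, trials=500, rng=rng))   # shrinks like 1/sqrt(m)
```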
---
## Prior Work
**$\mu(X)$ Complexity** (Munteanu-Schwiegelshohn-Sohler-Woodruff 2018)
$$\mu(X) = \sup_w \frac{\int_x |\langle x,w \rangle| {\rm d}P_+(x)}{\int_x |\langle x,w \rangle| {\rm d}P_-(x)}$$
where $P_+$ is the distribution of the $x_i$ with $y_i=+1$, and $P_-$ with $y_i = -1$
Study $\sum_{i=1}^n \phi(v_i)$ with logistic $\phi(v) = \ln(1+e^{-v})$, no regularizer
- coreset size $|Z|$ must depend on $\mu(X)$
- $|Z| = O^*(\frac{d^3 \mu(X)^3 }{ \varepsilon^4})$
**Lewis weights** (Mai-Musco-Rao 2021)
- $|Z| = O(\frac{\mu(X)^2 d \log(d/\varepsilon) }{ \varepsilon^2})$
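A Monte-Carlo lower bound on $\mu(X)$ for finite data, with random directions standing in for the sup over $w$ (illustrative only):
```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(10_000, 4))
y = rng.choice([-1, 1], size=10_000)

def mu_lower_bound(X, y, trials=1_000):
    # max over sampled w of E_{P+}|<x,w>| / E_{P-}|<x,w>|
    Xp, Xn = X[y == 1], X[y == -1]
    W = rng.normal(size=(trials, X.shape[1]))
    return max(np.abs(Xp @ w).mean() / np.abs(Xn @ w).mean() for w in W)

print(mu_lower_bound(X, y))   # near 1 here; near-separable data drives it up
```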
---
## Prior Work
### Regularized Regression
Study $\sum_{i=1}^n \phi(v_i) + \frac{1}{k} \|w\|_2^2$ with logistic $\phi(v) = \ln(1+e^{-v})$
(Tolochinsky-Jubran-Feldman 2022)
- $|Z| = O(\frac{k d^2 \log n \log k}{\varepsilon^2})$
- logistic and svm require $\|w\| \leq 1$; sigmoid does not
- bounded $\|x_i\|$
(Curtin-Im-Moseley-Pruhs-Samadian 2020)
- $|Z| = O(\frac{d k \log k}{\varepsilon^2})$
---
## Our Main Rademacher Result
**Theorem 1:** For $(P,F)$ with upper sensitivity function $s$, total sensitivity $S$.
Take $|Z|$ sensitivity samples (from $Q$); then w.p. $> 1- \exp(-2|Z|t^2/S^2)$, for all $f \in F$
$$\left| \int_x f(x) {\rm d}P(x) - \frac{S}{|Z|} \sum_{z \in Z} \frac{f(z)}{s(z)} \right| \leq (2 \mathcal{R}_{|Z|}^Q(F_Q) + t) \int_x f(x) {\rm d}P(x)$$
Apply to classification:
- $\|x\|_2 \leq 1$ (constant goes into $C_\phi$)
- $\phi \in$ {logistic, sigmoid, svm, ReLU}
1. $S = O(k)$ ($s$ constant; depends on $\max_x \|x\|_2$, Lipschitz factor of $\phi$)
2. sensitivity Rademacher $\mathcal{R}_{|Z|}^Q(F_Q) = C_{\phi,k}\sqrt{S/|Z|} = O(k\sqrt{k/|Z|})$
3. Set $|Z| = O(k^3/\varepsilon^2)$, $t = \varepsilon/2$ yields
$$\left| \int_x f(x) {\rm d}P(x) - \frac{1}{|Z|} \sum_{z \in Z} f(z) \right| \leq \varepsilon \int_x f(x) {\rm d}P(x)$$
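A sanity check of this instantiation on synthetic data: with $s$ constant the sample is uniform, and the coreset average of the regularized logistic loss tracks the full average for every tested $w$ (sizes and the grid over $w$ are illustrative):
```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 200_000, 5, 10.0
X = rng.normal(size=(n, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1))[:, None]   # enforce ||x||_2 <= 1
y = rng.choice([-1.0, 1.0], size=n)

def loss(w, X, y):   # average regularized logistic loss, i.e. P(f_w)
    return np.mean(np.log1p(np.exp(-y * (X @ w)))) + np.dot(w, w) / k

idx = rng.choice(n, size=4_000, replace=False)   # s constant => uniform sample
errs = [abs(loss(w, X[idx], y[idx]) - loss(w, X, y)) / loss(w, X, y)
        for w in rng.normal(size=(20, d))]
print(max(errs))   # small relative error, uniformly over the tested w
```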
---
### Proof of Theorem 1.
Define $P(f)=\int_x f(x) {\rm d}P(x)$ and divide through by it, so we need to prove:
$$\left| 1 - \frac{S}{|Z|} \sum_{z \in Z} \frac{1}{P(f)}\frac{f(z)}{s(z)} \right| \leq 2 \mathcal{R}_{|Z|}^Q(F_Q) + t$$
Using that $s(x) \geq f(x)/P(f)$:
- $b_f(z_{1:|Z|}) = \frac{S}{|Z|} \sum_{z \in Z} \frac{1}{P(f)}\frac{f(z)}{s(z)} \leq S$
- $b_f(z_{1:|Z|})$ has bounded differences with $\Delta \leq S/|Z|$.
Thus by McDiarmid's, with prob $\geq 1 - \exp(-2|Z|t^2/S^2)$
$$\begin{align}\sup_{f \in F} \left|1-b_f(z_{1:|Z|}) \right|
&\leq \textbf{E}_{z_{1:|Z|} \sim Q} \left[\sup_{f \in F}\left|1-b_f(z_{1:|Z|}) \right|\right] + t
\\ &\leq 2 \mathcal{R}_{|Z|}^Q(F_Q) + t\end{align}$$
---
### Relative $(\alpha, \eta)$-Approximation
Range space $(P, \mathcal{H})$.
For $h \in \mathcal{H}$ let $P(h) = \int_x 1_{x \in h} {\rm d}P(x)$ be measure in $h$.
$\mu$ is a *relative $(\alpha, \eta)$-approximation* of $(P, \mathcal{H})$ if
$$\sup_{h \in \mathcal{H}} |P(h) - \mu(h)| \leq \alpha \max(\eta, P(h))$$
$m = O(\mathsf{VC} \frac{1}{\alpha^2}\frac{1}{\eta}\log \frac{1}{\eta})$ iid samples from $P$ suffice (Har-Peled-Sharir 2011)
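A check of the definition on the simplest range space, one-sided intervals in 1-D (VC dimension 1), with $\mu$ the empirical measure of an iid sample (all sizes illustrative):
```python
import numpy as np

rng = np.random.default_rng(4)
P = rng.normal(size=500_000)                    # stand-in for the distribution
mu = rng.choice(P, size=50_000, replace=False)  # iid sample defining mu

eta = 0.01
worst = max(abs(np.mean(P <= t) - np.mean(mu <= t)) / max(eta, np.mean(P <= t))
            for t in np.linspace(-3, 3, 25))    # ranges h_t = (-inf, t]
print(worst)   # the achieved alpha; small at this sample size
```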
---
### Sub-linked Function Approximation
Function family $G$ of functions $g : \mathbb{R}^d \to [0,\infty)$, and prob. dist. $Q$
sub-linked range space $(Q, \mathsf{sub}(G))$:
- $\mathsf{sub}(g,\tau) = \{x \in \mathbb{R}^d \mid g(x) \leq \tau\}$
- $Q(g,\tau) = \int_{x \in \mathbb{R}^d} 1_{g(x) > \tau} {\rm d}Q(x) = 1 - \int_{x \in \mathsf{sub}(g,\tau)} {\rm d}Q(x)$, non-increasing in $\tau$
Consider $g(x) = \frac{S}{s(x)P(f)}f(x)$ and $Q(x) = \frac{s(x)}{S}P(x)$
- $g(x) \leq S$ (set $s(x) = s'(x)+1$)
- $\int_x g(x) {\rm d}Q(x) = 1$
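Both bullets, checked numerically for a finite dataset (the choices of $f$ and $s$ are arbitrary illustrations satisfying $s(x) \geq f(x)/P(f)$):
```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=10_000)
f = np.exp(-x ** 2)           # any f >= 0
Pf = f.mean()                 # P(f) under uniform P
s = f / Pf + 1.0              # valid upper sensitivity (the s'(x)+1 shift above)
S = s.mean()                  # total sensitivity

Q = s / s.sum()               # Q(x) = P(x) s(x)/S
g = S * f / (s * Pf)
print(g.max() <= S, np.isclose(np.sum(g * Q), 1.0))   # True True
```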
---
## Classic Sensitivity VC-bound
**Theorem 2**: Let $\nu$ be a relative $(\alpha, \eta)$-approximation of $(Q,\mathsf{sub}(G))$.
For each $f \in F$
$$\left| P(f) - \int_x \frac{S f(x)}{s(x)} {\rm d}\nu(x) \right| \leq (\alpha + S \eta \alpha) P(f)$$
Then setting $\alpha = \varepsilon/2$ and $\eta = 1/S$ gives error $\leq \varepsilon P(f)$
---
### Proof:
Divide the goal by $P(f)$ and bound:
$$\begin{align} \frac{1}{P(f)} \left| P(f) - \int_x \frac{S f(x)}{s(x)} {\rm d}\nu(x) \right|
&=
\left| 1 - \int_x g(x) {\rm d}\nu(x) \right|
\\ &= \left| \int_x g(x) {\rm d}Q(x) - \int_x g(x) {\rm d}\nu(x) \right|
\\\mathsf{(via~ \textbf{Lemma 2})} &=\left| \int_{\tau=0}^S Q(g,\tau) - \nu(g,\tau) {\rm d}\tau\right|
\\ &\leq \int_{\tau=0}^S \left| Q(g,\tau) - \nu(g,\tau)\right| {\rm d}\tau
\\ &\leq \alpha \int_{\tau=0}^S \max(\eta, Q(g,\tau)) {\rm d}\tau
\\ &= \alpha \int_{\tau=0}^{L_g} \max(\eta, Q(g,\tau)) {\rm d}\tau + \alpha \int_{\tau=L_g}^S \max(\eta, Q(g,\tau)) {\rm d}\tau
\\ &\leq \alpha (1+ S \eta)\end{align}$$
---
### Split at $L_g$
Define $L_g = \sup\{ \tau \geq 0 \mid Q(g,\tau) \geq \eta\}$
- above the threshold $L_g$:
$$\int_{\tau = L_g}^S \max(\eta, Q(g,\tau)) {\rm d}\tau =\int_{\tau = L_g}^S \eta~ {\rm d}\tau \leq S \eta$$
- below the threshold $L_g$:
$$\begin{align}\int_{\tau=0}^{L_g} \max(\eta, Q(g,\tau)) {\rm d}\tau
&=\int_{\tau=0}^{L_g} Q(g,\tau) {\rm d}\tau
\\ \mathsf{(via ~\textbf{Lemma 1})}
&=\int_x \min(L_g, g(x)) {\rm d}Q(x) \leq \int_x g(x) {\rm d}Q(x) = 1\end{align}$$
---
### Using Fubini's theorem:
**Lemma 1:** $\int_x \min(L, g(x)){\rm d}Q(x) = \int_{\tau = 0}^L Q(g,\tau) {\rm d}\tau$
> $$\begin{align}\int_x \min(L, g(x)){\rm d}Q(x)
> &=\int_x \int_{\tau = 0}^L 1_{\tau < g(x)} {\rm d}\tau~ {\rm d}Q(x)
> \\ &=\int_{\tau = 0}^L \int_x 1_{\tau < g(x)} {\rm d}Q(x)~ {\rm d}\tau
> \\ &=\int_{\tau = 0}^L Q(g,\tau) {\rm d}\tau \end{align}$$
>
**Lemma 2:** $\left| \int_x g(x){\rm d}Q(x) - \int_x g(x){\rm d}\nu(x) \right| = \left| \int_{\tau=0}^S Q(g,\tau) - \nu(g,\tau) {\rm d}\tau\right|$
> $$\begin{align}\left| \int_x g(x){\rm d}Q(x) - \int_x g(x){\rm d}\nu(x) \right|
> &= \left| \int_x \min(S,g(x)){\rm d}Q(x) - \int_x \min(S,g(x)){\rm d}\nu(x) \right|
> \\ \mathsf{ (via \textbf{ Lemma 1}, x2)} &=\left| \int_{\tau=0}^S Q(g,\tau) - \nu(g,\tau) {\rm d}\tau\right|
> \end{align}$$
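Both lemmas rest on the integral swap in Lemma 1; a quick Monte-Carlo check of that identity, discretizing the $\tau$ integral (all choices illustrative):
```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(size=200_000)
g = 3.0 * x ** 2                     # any g >= 0; here g <= S = 3
L = 1.5

lhs = np.minimum(L, g).mean()        # int_x min(L, g(x)) dQ(x), Q uniform on [0,1]
taus = np.linspace(0.0, L, 2_000)
rhs = np.mean([(g > t).mean() for t in taus]) * L   # int_0^L Q(g,tau) dtau
print(lhs, rhs)                      # both ~0.79, equal up to discretization
```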