# $\tilde{O}(\log n)$ - Approximation Algorithm Ideas for Gaussian KDE
$$
\newcommand{\fkp}{\hat{f}_{K, P}}
\newcommand{\fkpi}{\hat{f}_{K, \Pi P}}
\newcommand{\sumpp}{\sum\limits_{p\in P}}
\newcommand{\sumppp}{\sum\limits_{p\in P'}}
\newcommand{\sumxk}{\sum\limits_{x\in K}}
\newcommand{\kxp}{k(\|x^* - p\|^2)}
\newcommand{\gxp}{g(\|x^* - p\|^2)}
\newcommand{\epstwo}{\frac{\epsilon}{2}}
\newcommand{\epsfour}{\frac{\epsilon}{4}}
\newcommand{\epssix}{\frac{\epsilon}{6}}
\newcommand{\xp}{\|x^* - p\|^2}
\newcommand{\maxxd}{\max_{x \in \mathbb{R}^d}}
\newcommand{\maxxm}{\max_{x \in \mathbb{R}^m}}
\newcommand{\norm}[1]{\|#1\|}
\newcommand{\R}{\mathbb{R}}
\newcommand{\B}{\mathcal{B}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\S}{\mathcal{S}}
\newcommand{\oc}{\overline{c}}
\newcommand{\tp}{\tilde{p}}
$$
$$
\newcommand{\gammaone}{\epstwo\cdot\frac{\sumpp \kxp}{\sumpp \gxp \cdot \xp}}
\newcommand{\sumppcondition}[1]{\sum\limits_{p\in P, \norm{x^* -p}^2 \le #1}}
\newcommand{\sumppconditionp}[1]{\sum\limits_{p\in P, \norm{x^* -p}^2 > #1}}
$$
$$
\newcommand{\gammatwo}{\epsfour\cdot\frac{\sumppcondition{\xi} \kxp}{\sumppcondition{\xi} \gxp \cdot \xp}}
\DeclareMathOperator*{\argmax}{arg\,max}
$$
## Definitions
Define $m = \log n \log\log n$ balls $\B_1, \B_2, \ldots, \B_m$ centered at the optimal mode $x^*$. The squared radii $r_1^2, r_2^2, \ldots, r_m^2$ of those balls are as follows:
$$
\begin{align}
r_1^2 &= 1\\
r_2^2 &= (1 + 1/\log n)\\
&\ \ \vdots\\
r_i^2 &= (1 + 1/\log n)\, r_{i-1}^2\\
&\ \ \vdots\\
r_m^2 &= (1 + 1/\log n)^{\log n\log\log n} \approx \log n
\end{align}
$$
Then, we define the region between balls $\B_i$ and $\B_{i-1}$ to be the shell $\S_i$, as shown in the following figure.
<div style="text-align:center; width: 240px; border: green solid 1px;">
<img src=https://i.imgur.com/koQA6Lf.png style="margin: 0 auto;"/>
</div>
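To make the construction concrete, here is a minimal NumPy sketch of the shell assignment; the names (`build_shells`, `x_star`) are ours for illustration, and the mode $x^*$ is assumed to be given.
```python
import numpy as np

def build_shells(points, x_star, n):
    """Sketch: assign each point to one of m = log(n) * log(log(n)) shells
    around x_star, with r_1^2 = 1 and r_i^2 = (1 + 1/log n) * r_{i-1}^2."""
    log_n = np.log(n)
    m = int(np.ceil(log_n * np.log(log_n)))        # number of balls/shells
    r2 = (1.0 + 1.0 / log_n) ** np.arange(m)       # squared radii r_1^2 .. r_m^2
    d2 = np.sum((points - x_star) ** 2, axis=1)    # squared distances to x_star
    # index of the first ball containing each point; value m means "outside B_m"
    shell = np.searchsorted(r2, d2, side="left")
    return r2, shell
```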
## Approach I: a "magic" algorithm that returns all points inside $\S_i$
### Claim 1
The region outside $\B_m$ contributes at most a constant fraction of the total density at $x^*$ (the common center of all balls); equivalently, the shells capture a constant fraction of it:
$$
\sum_{i=1}^m \sum_{p\in \S_i}e^{-\norm{x^* - p}^2}\ge c\cdot \sumpp e^{-\norm{x^* - p}^2} \quad \text{for some constant } c\in (0, 1].
$$
**Proof.** We have $\sumpp e^{-\norm{x^* - p}^2}\ge 1$, because $x^*$ maximizes the density, so its value is at least the density at any data point $p$, which already contains the term $e^{-\norm{p - p}^2} = 1$. For the points located outside $\B_m$, each term is less than $e^{-r_m^2}$ and there are at most $n$ of them, so their total contribution is at most $1$:
$$
\sum_{p\not\in \B_m} e^{-\norm{x^* - p}^2} < n\cdot e^{-(1 + 1/\log n)^{\log n\log\log n}} \approx n\cdot e^{-\log n} = 1.
$$
### Claim 2
For some $i$, the shell $\S_i$ captures at least an average fraction of the optimum:
$$
\sum_{p\in \S_i} e^{-\norm{x^* - p}^2} \ge \frac{c}{\log n\log\log n} \cdot OPT = \frac{1}{\tilde{O}(\log n)}\cdot OPT, \quad \text{where } OPT = \sumpp e^{-\norm{x^* - p}^2}.
$$
**Proof.** By Claim 1, the $\log n\log\log n$ shells jointly capture at least $c\cdot OPT$; by the pigeonhole principle, at least one shell must capture at least the average, i.e., a $\frac{c}{\log n\log\log n}$ fraction of the optimum estimate value.
### Claim 3
First, **we project every point onto the outer boundary** of the shell $\S_i$ it belongs to, and denote the projected points by $\tilde{p}$. Then we set $\tilde{x}^* = \frac{1}{|\S_i|}\sum_{p\in \S_i} \tilde{p}$, the mean of the projected points of shell $\S_i$. For some $i$ we still have
$$
\sum_{p\in \S_i} e^{-\norm{\tilde{x}^* - \tilde{p}}^2} \ge \frac{1}{\tilde{O}(\log n)}\cdot OPT
$$
**Proof.** Since every projected point lies on the outer boundary of $\S_i$, i.e. $\norm{x^* - \tilde{p}}^2 = r_i^2$ for all $\tilde{p}$, the mean $\tilde{x}^*$ equals one mean-shift step from $x^*$ on the projected points of $\S_i$:
$$
\tilde{x}^* = \frac{\sum_{p\in \S_i}\tilde{p}\cdot e^{-\norm{x^* - \tilde{p}}^2}}{\sum_{p\in \S_i} e^{-\norm{x^* - \tilde{p}}^2}} = \frac{1}{|\S_i|} \sum_{p\in \S_i} \tilde{p}, \quad \text{because } \norm{x^* - \tilde{p}}^2 = r_i^2 \text{ for all } \tilde{p}.
$$
Since the Gaussian kernel has a convex, decreasing profile, a mean-shift step never decreases the density, so the step starting from $x^*$ satisfies
$$
\sum_{p\in \S_i}e^{-\norm{\tilde{x}^* - \tilde{p}}^2} \ge \sum_{p\in \S_i}e^{-\norm{x^* - \tilde{p}}^2}.
$$
We also have $\norm{x^* - \tilde{p}}^2 \le (1 + 1/\log n)\cdot\norm{x^* - p}^2$, because $p$ lies in shell $\S_i$, so $\norm{x^* - \tilde{p}}^2 = r_i^2 = (1 + 1/\log n)\, r_{i-1}^2 \le (1 + 1/\log n)\norm{x^* - p}^2$. For the density this gives:
$$
\sum_{p\in \S_i}e^{-\norm{\tilde{x}^* - \tilde{p}}^2} \ge \sum_{p\in \S_i}e^{-\norm{x^* - \tilde{p}}^2} \ge \sum_{p\in \S_i}e^{-(1 + 1/\log n)\cdot\norm{x^* - p}^2} = \sum_{p\in \S_i}e^{-\norm{x^* - p}^2} \cdot e^{-\norm{x^* - p}^2 / \log n}
$$
We have
$$
e^{-\norm{x^* - p}^2 / \log n} \ge e^{-1}, \quad \text{because } \norm{x^* - p}^2 \le r_m^2 \approx \log n.
$$
Thus, based on claim 2, we have
$$
\sum_{p\in \S_i}e^{-\norm{\tilde{x}^* - \tilde{p}}^2} \ge \sum_{p\in \S_i}e^{-\norm{x^* - p}^2} \cdot e^{-\norm{x^* - p}^2 / \log n} \ge e^{-1} \cdot \sum_{p\in \S_i}e^{-\norm{x^* - p}^2} \ge \frac{1}{\tilde{O}(\log n)} \cdot OPT.
$$
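The projection and one-step mean of Claim 3 are easy to simulate. The sketch below (illustrative names; the shell's points and $x^*$ are assumed known) also checks numerically that the projected mean does not decrease the density over the projected points, as claimed.
```python
import numpy as np

def projected_mean(shell_points, x_star, r_i):
    """Project the shell's points onto the sphere of radius r_i around x_star
    and return their mean; since the projected points are equidistant from
    x_star, this mean equals one mean-shift step from x_star (Claim 3)."""
    diff = shell_points - x_star
    p_tilde = x_star + r_i * diff / np.linalg.norm(diff, axis=1, keepdims=True)
    return p_tilde.mean(axis=0), p_tilde

def density(x, pts):
    """Unnormalized Gaussian KDE value at x (bandwidth 1)."""
    return np.exp(-np.sum((pts - x) ** 2, axis=1)).sum()

rng = np.random.default_rng(0)
x_star = np.zeros(5)
pts = rng.normal(size=(50, 5))  # stand-in points; projection puts them at radius r_i anyway
x_tilde, p_tilde = projected_mean(pts, x_star, r_i=1.5)
assert density(x_tilde, p_tilde) >= density(x_star, p_tilde)  # the Claim 3 inequality
```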
### Summary
We assumed we know the points in shell $\S_i$, projected them onto the outer boundary of $\S_i$ to obtain the points $\tilde{p}$, and took $\tilde{x}^*$ to be the mean of the $\tilde{p}$. As proven above, this $\tilde{x}^*$ achieves a $\tilde{O}(\log n)$-approximation of the $OPT$ density.
## Approach II: find the ring $\S_i'$ containing the maximum number of points with outer radius $r_i$ and inner radius $r_{i-1}$
Without the assumption that the "magic" algorithm can give the points inside $\S_i$, we use the Sk-EB algorithm to obtain an approximation $\S_i'$ of $\S_i$.
### Assumption 1: we have the ring $\S_i'$ with the maximum number of points
Here, we assume we have the ring $\S_i'$ with outer radius $r_i$ and inner radius $r_{i-1}$ that contains the most points. (**How to obtain this ring needs further discussion.**)
### Claim 4
Let $x^*$ be the center of $\S_i$ and $x'$ the center of $\S_i'$. As before, we project the points $p'$ inside $\S_i'$ onto the outer boundary of the ring $\S_i'$, obtaining $\tp'$. For all $i$, we have
$$
\sum_{\tilde{p}' \in \S_i'} e^{-\norm{x' - \tilde{p}' }^2} \ge c\cdot \sum_{p\in \S_i} e^{-\norm{x^* - p}^2}
$$
**Proof.** Ring $\S_i'$ has the same radii as ring $\S_i$. Thus, for each $p$,
$$
\norm{x' - \tp'}^2 = \norm{x^* - \tp}^2 \le (1 + 1/\log n)\cdot\norm{x^* - p}^2.
$$
By its choice, ring $\S_i'$ contains at least as many points as $\S_i$. Therefore, for the density value we have, for all $i$,
$$
\sum_{\tilde{p}' \in \S_i'} e^{-\norm{x' - \tilde{p}' }^2} \ge \sum_{p\in \S_i} e^{-\norm{x^* - \tp}^2} \ge \sum_{p\in \S_i} e^{-(1 + 1/\log n)\cdot\norm{x^* - p}^2} = \sum_{p\in \S_i}e^{-\norm{x^* - p}^2} \cdot e^{-\norm{x^* - p}^2 / \log n}.
$$
The last step follows from the same derivation as in Claim 3 (last part). We have
$$
e^{-\norm{x^* - p}^2 / \log n} \ge e^{-1}, \quad \text{because } \norm{x^* - p}^2 \le \log n.
$$
Thus, based on claim 2, we have
$$
\sum_{\tilde{p}' \in \S_i'} e^{-\norm{x' - \tilde{p}' }^2} \ge \sum_{p\in \S_i}e^{-\norm{x^* - p}^2} \cdot e^{-\norm{x^* - p}^2 / \log n} \ge e^{-1} \cdot \sum_{p\in \S_i}e^{-\norm{x^* - p}^2} \ge \frac{1}{\tilde{O}(\log n)} \cdot OPT.
$$
**In fact, the constant satisfies $c \ge e^{-1}$.**
### Claim 5
For some $i$, define $\S_i'$ as the ring with outer radius $r_i$ and inner radius $r_{i-1}$ containing the maximum number of points. With $x'$ as the center of $\S_i'$, we have
$$
\sum_{\tilde{p}' \in \S_i'} e^{-\norm{x' - \tilde{p}'}^2} \ge \frac{1}{\tilde{O}(\log n)}\cdot OPT
$$
**Proof.** Follows directly by combining Claims 1, 2, and 4.
### Claim 6
A constant approximation $\S_i'$ of the optimal ring $\S_i'^*$ gives the same result as Claim 5.
**Proof.** If $\S_i'$ contains at least a $c$ fraction of the points of $\S_i'^*$, then, since all projected points sit at squared distance $r_i^2$ from their respective centers,
$$
\sum_{\tilde{p}' \in \S_i'} e^{-\norm{x' - \tilde{p}'}^2} \ge c \sum_{\tilde{p}'^* \in \S_i'^*} e^{-\norm{x'^* - \tilde{p}'^*}^2}, \quad \text{because } \norm{x' - \tilde{p}'}^2 = \norm{x'^* - \tilde{p}'^*}^2 = r_i^2.
$$
## Find the Ring $\S_i'$
We need an algorithm that returns the center maximizing the number of points satisfying two specific quadratic constraints (the inner and outer radius) in a high-dimensional space.
We are still looking for possible solutions; a brute-force baseline is sketched below.
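For reference, a brute-force placeholder (our own sketch, not a proposed solution): try each data point as a candidate center and keep the ring containing the most points. It runs in $O(n^2 d)$ time and only illustrates what the missing subroutine must return.
```python
import numpy as np

def densest_ring(points, r_inner2, r_outer2):
    """Brute-force baseline: among the data points taken as candidate centers,
    return the one whose ring r_inner2 < ||c - p||^2 <= r_outer2 contains the
    maximum number of points.  O(n^2 d); a placeholder only."""
    best_center, best_count = None, -1
    for c in points:
        d2 = np.sum((points - c) ** 2, axis=1)
        count = int(np.count_nonzero((d2 > r_inner2) & (d2 <= r_outer2)))
        if count > best_count:
            best_center, best_count = c, count
    return best_center, best_count
```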
## Sk-EB Approximation for $\S_i'$
By the definition of Sk-EB, the Sk-EB ball of a fixed radius contains at least as many points as any ring with that outer radius. Thus, let sphere $\B_i''$ with center $x''$ be the Sk-EB ball with the same radius as $\S_i'$ (we project all its points onto the boundary of $\B_i''$); then
$$
\sum_{\tp\in \B_i''} e^{-\norm{x'' - \tp}^2} \ge \sum_{\tilde{p}' \in \S_i'} e^{-\norm{x' - \tilde{p}'}^2}
$$
We cannot obtain the optimum ball $\B_i''^*$ directly, although we want
$$
\sum_{\tp\in \B_i''} e^{-\norm{x'' - \tp}^2} \ge \frac{1}{\tilde{O}(\log n)}\cdot OPT
$$
The paper [A Structural Theorem for Center-Based Clustering in High-Dimensional Euclidean Space](https://rdcu.be/cmPZK) relates the approximation factor to the resulting time complexity (mainly Theorems 1 and 3).


Then, we discuss which approximation ratio between $\sum_{\tp\in \B_i''} e^{-\norm{x'' - \tp}^2}$ and $\sum_{\tp\in \B_i''^*} e^{-\norm{x''^* - \tp}^2}$ yields a polynomial time complexity.
### Time complexity analysis of a constant approximation of the Sk-EB estimate value
Based on $N(n, \epsilon) = O((n/\epsilon)^{\frac{2}{\epsilon}\log \frac{2}{\epsilon}})$, if we want a polynomial time complexity, $\epsilon$ has to be a constant $c$.
We have $\norm{x'' - \tp}^2 \le (1 + c)\norm{x''^* - \tp}^2$. Recall that $\norm{x''^* - \tp}^2$ equals the squared radius of ring $\S_i'$, which is $\norm{x' - \tp}^2 = r_i^2$, the same for every $\tp$, so the factor below can be pulled out of the sum:
$$
\sum_{\tp\in \B_i''} e^{-\norm{x'' - \tp}^2} \ge \sum_{\tp\in \B_i''} e^{-(1 + c)\norm{x' - \tp}^2} = e^{-c\norm{x' - \tp}^2}\cdot \sum_{\tp\in \B_i''} e^{-\norm{x' - \tp}^2} \ge \frac{e^{-c\norm{x' - \tp}^2}}{\tilde{O}(\log n)}\cdot OPT
$$
Then, we analyze $e^{-c\norm{x' - \tp}^2}$: in our case $\norm{x' - \tp}^2 \le \log n$, so $e^{-c\norm{x' - \tp}^2} \ge e^{-c\log n} = n^{-c}$.
We have
$$
\sum_{\tp\in \B_i''} e^{-\norm{x'' - \tp}^2} \ge \frac{1}{O(n^c)} \cdot OPT, \quad \text{for any } c\in (0, 1).
$$
With a constant $\epsilon$, the time complexity is therefore polynomial, but the approximation factor degrades to $1/O(n^c)$. However, with $\epsilon = 1/\log n$ the time complexity is $O((n\log n)^{2\log n \log\log n}\cdot d)$.
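To see the gap concretely, the short computation below evaluates $\log_{10} N(n, \epsilon)$ with the big-$O$ constants dropped (and a natural log assumed inside the exponent) for a constant $\epsilon$ versus $\epsilon = 1/\log n$:
```python
import math

def log10_candidates(n, eps):
    """log10 of (n/eps)^((2/eps) * ln(2/eps)): the candidate-set size
    N(n, eps) with the big-O constants dropped."""
    return (2.0 / eps) * math.log(2.0 / eps) * math.log10(n / eps)

n = 10**6
print(log10_candidates(n, 0.5))              # constant eps: n^{O(1)}, polynomial
print(log10_candidates(n, 1 / math.log(n)))  # eps = 1/log n: super-polynomial
```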
One possible solution is to add the bandwidth $h = \sqrt{\log n}$. The approach above fails mainly because the earlier claims set the cut-off radius to $\sqrt{\log n}$, which forces $\epsilon$ to be $1/\log n$; with $h = \sqrt{\log n}$, the rescaled distances satisfy $\norm{(x' - \tp)/h}^2 \le O(1)$.
With this $h$, we can solve the KDE problem in polynomial time with a $\tilde{O}(\log n)$-approximation of the estimate value, since a constant $\epsilon$ now suffices.
### Directly adapting function $f$ to the KDE problem

Here, $f$ is maximized instead of minimized, so the inequality direction becomes $\ge$.
#### Epanechnikov Kernel
**Assume that after the approximation no points are cut off by the kernel's support boundary.**
For the Epanechnikov kernel, we have $k = 1$ and $f(X; c_1^*) = \sum_{x \in X} \left(1 - \norm{c_1^* - x}^2\right)$, $f(X; c_1') = \sum_{x \in X} \left(1 - \norm{c_1' - x}^2\right)$.
For the distance relation, with $\epsilon$ a constant:
$$
\norm{c_1' - x}^2 \le (1 + \epsilon) \cdot \norm{c_1^* - x}^2.
$$
Then, we want to have
$$
f(X; c_1') = \sum_{x \in X} \left(1 - \norm{c_1' - x}^2\right) \ge \sum_{x \in X} \left(1 - (1 + \epsilon)\norm{c_1^* - x}^2\right) \ge \alpha \sum_{x \in X} \left(1 - \norm{c_1^* - x}^2\right)
$$
So, per point, $\alpha$ can be taken as
$$
\alpha = \frac{1 - (1 + \epsilon)\cdot \norm{c_1^* - x}^2}{1 - \norm{c_1^* - x}^2}
$$
Here, we have $\norm{c_1^* - x}^2 \le 1$ (the kernel's support), and $\epsilon$ is also a constant.
- When $\norm{c_1^* - x}^2$ is a constant smaller than $1/(1 + \epsilon)$, $\alpha$ is a positive constant, too.
- When $\norm{c_1^* - x}^2$ is sub-constant, such as $1/n$ or $1/\log n$, the $(1 + \epsilon)$ factor is negligible and $\alpha \to 1$.
Thus, in every case $\alpha = \mu(1 + \epsilon)$ is a constant, and the time complexity is polynomial.
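A quick numeric check of the per-point factor $\alpha$ from the formula above (illustrative values only); note that $\alpha$ shrinks as $\norm{c_1^* - x}^2$ approaches $1/(1 + \epsilon)$:
```python
# Per-point alpha for the Epanechnikov kernel:
# alpha = (1 - (1 + eps) * d2) / (1 - d2), where d2 = ||c_1^* - x||^2 < 1.
eps = 0.1                             # illustrative constant epsilon
for d2 in (0.01, 0.1, 0.5, 0.9):
    alpha = (1 - (1 + eps) * d2) / (1 - d2)
    print(f"d2 = {d2:>4}: alpha = {alpha:.3f}")
```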
#### Gaussian Kernel
In our Gaussian KDE, we have $k = 1$ and $f(X; c_1^*) = \sum_{x \in X} e^{-\norm{c_1^* - x}^2}$, $f(X; c_1') = \sum_{x \in X} e^{-\norm{c_1' - x}^2}$.
For the distance relation, with $\epsilon$ a constant:
$$
\norm{c_1' - x}^2 \le (1 + \epsilon) \cdot \norm{c_1^* - x}^2.
$$
Then, we have
$$
f(X; c_1') = \sum_{x \in X} e^{-\norm{c_1' - x}^2} \ge \sum_{x \in X} e^{-(1 + \epsilon)\norm{c_1^* - x}^2} \ge e^{-\epsilon \max\limits_{x\in X}\norm{c_1^* - x}^2} \cdot \sum_{x \in X} e^{-\norm{c_1^* - x}^2}
$$
Based on $\norm{c_1^* - x}^2 \le \log n$ and $\epsilon$ being a constant, we have
$$
e^{-\epsilon \max\limits_{x\in X}\norm{c_1^* - x}^2} \ge e^{-\epsilon\log n} = n^{-\epsilon}.
$$
So the approximation factor is $\mu(1 + \epsilon) = n^{-\epsilon}$, which degrades polynomially in $n$ rather than being a constant.
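A one-line check of the identity $e^{-\epsilon \log n} = n^{-\epsilon}$ behind this factor (illustrative values):
```python
import math

n, eps = 10**6, 0.1
# the lost factor e^{-eps * log n} equals n^{-eps}: polynomial in n, not constant
print(math.exp(-eps * math.log(n)), n ** (-eps))   # both ~0.2512
```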