# A Simple Algorithm for Constant Approximation of KDE
$$
\newcommand{\fkp}{\hat{f}_{K, P}}
\newcommand{\fkpi}{\hat{f}_{K, \Pi P}}
\newcommand{\sumpp}{\sum\limits_{p\in P}}
\newcommand{\sumppp}{\sum\limits_{p\in P'}}
\newcommand{\sumxk}{\sum\limits_{x\in K}}
\newcommand{\kxp}{k(\|x^* - p\|^2)}
\newcommand{\gxp}{g(\|x^* - p\|^2)}
\newcommand{\epstwo}{\frac{\epsilon}{2}}
\newcommand{\epsfour}{\frac{\epsilon}{4}}
\newcommand{\epssix}{\frac{\epsilon}{6}}
\newcommand{\xp}{\|x^* - p\|^2}
\newcommand{\maxxd}{\max_{x \in \mathbb{R}^d}}
\newcommand{\maxxm}{\max_{x \in \mathbb{R}^m}}
\newcommand{\norm}[1]{\|#1\|}
\newcommand{\R}{\mathbb{R}}
\newcommand{\B}{\mathcal{B}}
\newcommand{\oc}{\overline{c}}
$$
$$
\newcommand{\gammaone}{\epstwo\cdot\frac{\sumpp \kxp}{\sumpp \gxp \cdot \xp}}
\newcommand{\sumppcondition}[1]{\sum\limits_{p\in P, \norm{x^* -p}^2 \le #1}}
\newcommand{\sumppconditionp}[1]{\sum\limits_{p\in P, \norm{x^* -p}^2 > #1}}
$$
$$
\newcommand{\gammatwo}{\epsfour\cdot\frac{\sumppcondition{\xi} \kxp}{\sumppcondition{\xi} \gxp \cdot \xp}}
\DeclareMathOperator*{\argmax}{arg\,max}
$$
## Reference
[Complexity and approximation of the Smallest k-Enclosing Ball problem: Algorithm 1 & proof](https://www.sciencedirect.com/science/article/pii/S0195669815000335)
## Sk-EB Problem Constant Approximation
### Algorithm 1 and Proof

## Adapt Algorithm 1 to Gaussian KDE problem
Here we define a candidate algorithm intended to give a constant-factor approximation of the KDE problem.
**Algo:** *Take every point in $P$ as a candidate center of the Gaussian-kernel KDE, and select the candidate $x'$ that maximizes the estimate $\sum_{p\in P} e^{-\norm{x' - p}^2}$.*
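A minimal sketch of this brute-force candidate search (function names are ours; it simply evaluates the unnormalized Gaussian estimate at every point of $P$):
```python
import numpy as np

def kde_value(x, P):
    """Unnormalized Gaussian-kernel KDE estimate at x: sum_p exp(-||x - p||^2)."""
    return np.exp(-np.sum((P - x) ** 2, axis=1)).sum()

def best_candidate_center(P):
    """Evaluate the estimate at every point of P and return the maximizer x'.

    Brute force: O(n^2 d) time for n points in R^d.
    """
    values = [kde_value(x, P) for x in P]
    return P[int(np.argmax(values))]
```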
We would then like to prove that $x'$ gives a constant-factor approximation of the estimate at the optimal center $x^*$, i.e.,
$$
c\sumpp e^{-\norm{x' - p}^2}\ge \sumpp e^{-\norm{x^* - p}^2} \quad \text{for some constant } c\ge 1.
$$
## Prove Algo is Constant Approximation (Fail)
By the triangle inequality, $\norm{x' - p} \le \norm{x^* - p} + \norm{x^* - x'}$, and since $(a+b)^2 \le 2a^2 + 2b^2$,
$$
\norm{x' - p}^2 \le 2\norm{x^* - p}^2 + 2\norm{x^* - x'}^2
$$
Since $e^{-x}$ is monotonically decreasing, we have
$$
e^{-\norm{x' - p}^2} \ge e^{-(2\norm{x^* - p}^2 + 2\norm{x^* - x'}^2)}
$$
For the total estimate value, we have
$$
\sumpp e^{-\norm{x' - p}^2} \ge \sumpp e^{-(2\norm{x^* - p}^2 + 2\norm{x^* - x'}^2)} = e^{-2\norm{x^* - x'}^2}\cdot\sumpp e^{-2\norm{x^* - p}^2}
$$
**To conclude a constant-factor approximation from this bound, we would need an algorithm that guarantees $\norm{x^* - x'}^2$ is bounded by a constant (and a way to relate $\sumpp e^{-2\norm{x^* - p}^2}$ back to $\sumpp e^{-\norm{x^* - p}^2}$), which we do not have.**
## Prove Algo is not a constant approximation: 1 (Fail)
Assume all points of $P$ are evenly distributed on a sphere of radius $\sqrt{\log n}$ centered at $x^*$, so that $\norm{x^* - p}^2 = \log n$ for every $p$, and by symmetry every point of $P$ is an equally good choice of $x'$. Again by symmetry of the sphere, at least a constant fraction $cn$ of the points satisfies $\norm{x' - p}^2 \ge 1.5\norm{x^* - p}^2$.
For those far points, we have
$$
\sum_{p:\,\norm{x' - p}^2 \ge 1.5\log n} e^{-\norm{x' - p}^2} \le n\, e^{-1.5\log n} = n^{-1/2} = n^{-1/2}\cdot\sumpp e^{-\norm{x^* - p}^2},
$$
using $\sumpp e^{-\norm{x^* - p}^2} = n\, e^{-\log n} = 1$.
However, this only bounds the contribution of the far points. The near points, in particular $p = x'$ itself, contribute at least $e^0 = 1$ to the algorithm's value, which already matches the center's value of $1$. So this construction does not exhibit a super-constant gap; a correct counterexample follows.
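To see this concretely, here is a small numeric sketch (the parameters are ours; "evenly distributed" is approximated by uniform random points on the sphere):
```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 50
r = np.sqrt(np.log(n))
# ~evenly spread points on a radius-r sphere: Gaussian directions, rescaled to norm r.
P = rng.standard_normal((n, d))
P *= r / np.linalg.norm(P, axis=1, keepdims=True)

def val(x):
    return np.exp(-np.sum((P - x) ** 2, axis=1)).sum()

center_value = val(np.zeros(d))     # = n * exp(-log n) = 1
best_in_P = max(val(x) for x in P)  # >= 1, since the p = x' term alone gives e^0
print(center_value, best_in_P)      # both are Theta(1): no super-constant gap
```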
## Prove Algo is not a constant approximation: 2
Consider the space $\R^d$, and let $P$ be the set of its canonical basis vectors multiplied by some constant $c$, i.e. the set of vectors $p_i$ that are $0$ everywhere except for the $i$th component, where it is $c$. Since any two distinct points of $P$ are at squared distance $2c^2$, the solution given by the algorithm has value $1+(d-1)e^{-2c^2}$, whereas the origin already achieves value $de^{-c^2}$, which lower-bounds the optimal value. Consider the ratio
$$
\frac{1+(d-1)e^{-2c^2}}{de^{-c^2}}.
$$
For the algorithm not to be a constant approximation, it suffices to find $c$ (depending on $d$) such that this ratio goes to 0 as $d$ increases (i.e., there does not exist a constant bounding this ratio from below). Indeed, let $c = (\log d)^{1/2}/2$. Then we have
$$
\begin{aligned}
\frac{1+(d-1)e^{-2c^2}}{de^{-c^2}} &= \frac{1+(d-1)e^{-2((\log d)^{1/2}/2)^2}}{de^{-((\log d)^{1/2}/2)^2}} \\
&= \frac{1+(d-1)e^{-(\log d)/2}}{de^{-(\log d)/4}} \\
&= \frac{d^{1/4}}{d}+\frac{(d-1)d^{1/4}}{d^{3/2}} \\
&= O(d^{-1/4}).
\end{aligned}
$$
Hence, the ratio between the algorithm's solution and the optimal solution is not bounded below by any positive constant, and therefore the algorithm is not a constant-factor approximation.
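This counterexample is easy to check numerically; the following sketch verifies the two closed-form values on a small instance and then evaluates the ratio for growing $d$ (by symmetry every $p \in P$ attains the same value, so one candidate suffices):
```python
import numpy as np

def val(x, P):
    return np.exp(-np.sum((P - x) ** 2, axis=1)).sum()

# Small-d sanity check of the two closed-form values.
d = 100
c = np.sqrt(np.log(d)) / 2
P = c * np.eye(d)  # canonical basis vectors scaled by c
assert np.isclose(val(P[0], P), 1 + (d - 1) * np.exp(-2 * c**2))  # algorithm's value
assert np.isclose(val(np.zeros(d), P), d * np.exp(-c**2))         # value at the origin

# The ratio (algorithm's value) / (origin's value) vanishes as d grows.
for d in [10**2, 10**4, 10**6, 10**8]:
    c2 = np.log(d) / 4  # c^2 for c = (log d)^{1/2} / 2
    print(d, (1 + (d - 1) * np.exp(-2 * c2)) / (d * np.exp(-c2)))  # ~ d^{-1/4}
```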
## Approximation Algorithms
Reference from Wikipedia: [Approximation algorithms](https://en.wikipedia.org/wiki/Approximation_algorithm).

## Something Close to Our Problem
[A Local Search Approximation Algorithm for k-Means Clustering](https://www.cs.umd.edu/users/mount/Projects/KMeans/kmlocal-cgta.pdf)
This paper uses a swap-based local search heuristic and achieves a $(9 + \epsilon)$-approximation for k-means clustering.