# Proving Randomly Sampling from a Points Evenly Distributed Sphere not Work
$$
\newcommand{\fkp}{\hat{f}_{K, P}}
\newcommand{\fkpi}{\hat{f}_{K, \Pi P}}
\newcommand{\sumpp}{\sum\limits_{p\in P}}
\newcommand{\sumppp}{\sum\limits_{p\in P'}}
\newcommand{\sumxk}{\sum\limits_{x\in K}}
\newcommand{\kxp}{k(\|x^* - p\|^2)}
\newcommand{\gxp}{g(\|x^* - p\|^2)}
\newcommand{\epstwo}{\frac{\epsilon}{2}}
\newcommand{\epsfour}{\frac{\epsilon}{4}}
\newcommand{\epssix}{\frac{\epsilon}{6}}
\newcommand{\xp}{\|x^* - p\|^2}
\newcommand{\maxxd}{\max_{x \in \mathbb{R}^d}}
\newcommand{\maxxm}{\max_{x \in \mathbb{R}^m}}
\newcommand{\norm}[1]{\|#1\|}
\newcommand{\R}{\mathbb{R}}
\newcommand{\B}{\mathcal{B}}
\newcommand{\oc}{\overline{c}}
$$
$$
\newcommand{\gammaone}{\epstwo\cdot\frac{\sumpp \kxp}{\sumpp \gxp \cdot \xp}}
\newcommand{\sumppcondition}[1]{\sum\limits_{p\in P, \norm{x^* -p}^2 \le #1}}
\newcommand{\sumppconditionp}[1]{\sum\limits_{p\in P, \norm{x^* -p}^2 > #1}}
$$
$$
\newcommand{\gammatwo}{\epsfour\cdot\frac{\sumppcondition{\xi} \kxp}{\sumppcondition{\xi} \gxp \cdot \xp}}
\DeclareMathOperator*{\argmax}{arg\,max}
$$
## Reference
[# Foundations of Data Science: Chapter 2.5, Theorem 2.9](https://www.cs.cornell.edu/jeh/book.pdf)
## Pointset Generator and Theorem
### Pointset Generator
From the book [# Foundations of Data Science](https://www.cs.cornell.edu/jeh/book.pdf) Chapter 2.5, they described a uniformly random points generating method by using spherical Gaussian.

We can use this method to generate our pointset $P$ on a sphere.
### Theorem 2.9: Gaussian Annulus Theorem

In our cases, we set $x'$ is sampling from the spherical gaussian distribution of point $p_i \in \R^d$, this theorem is about
$$
Pr(\norm{x' - p_i} > \sqrt{d} + \beta \cup \norm{x' - p_i} < \sqrt{d} - \beta) \le 3e^{-c\beta^2}
$$
This also implies that
$$
Pr(\norm{x' - p_i} < \sqrt{d} - \beta) \le 3e^{-c\beta^2}
$$
## Prove it is not a constant approximation for sampling polynomial points
### Goal
We want to prove it is not a constant approximation for sampling polynomial points from the Gaussian KDE PDF of an evenly points distributed sphere. And for a sampling point $x'$, we have the following:
*With high probability, there is **no such** point from $poly(n)$ sampling points satisfies*
$$
c\sumpp e^{-\norm{x' - p}^2}\ge \sumpp e^{-\norm{x^* - p}^2}, s.t.\ c\ge 1 \ is\ a\ constant.
$$
Also, we required $\norm{x^* - p}^2 \le \log n/10$, otherwise, $\sumpp e^{-\norm{x^* - p}^2} < o(n)$, the $x^*$ should be located at a point instead of the center of the sphere.
### Definations
#### Pointset and Sphere
Assume all points of $P = \{p_1, p_2,..., p_n\}\subset \R^{n\times d}$ are evenly distributed on a sphere with radius $r = \sqrt{\log n}$. We have $\norm{x^* - p_i}^2 = \log n$.
#### Points and Dimensions
Let $d = O(n)$ and $\beta = \sqrt{n}/2$, then we can have $\sqrt{d} - \beta > r$ when $n$ is large.
#### Mass Concentrated Annulus $A_i$ and Empty Centered Ball $\B_i$
We have each point $p_i$ is a Spherical Gaussian distribution, and $3e^{-c\beta^2}$ of the total mass is concentrated on the annulus $A_i$ between $\sqrt{d} - \beta$ and $\sqrt{d} + \beta$.
The Ball $\B_i$ which is the nearly empty ball at the center of the annulus with radius equal to $\sqrt{d} - \beta$.
### Proof
#### Part1 - only $x'$ is inside all mass empty balls there are some chance to have constant approximation
This is a fact. Only when the point $x'$ is sampling from the empty sphere which radius is $\sqrt{d} - \beta$, with some probability $x'$ is located inside the sphere.
When $x'$ is located outside the empty ball $\B_i$, we have $bn$ constant partial point which have $\norm{x' - p}^2 \ge 4\norm{x^* - p}^2$. Then, we have
$$
\sumpp e^{-\norm{x' - p}^2}\le b\sumpp e^{-4\norm{x^* - p}^2} = e^{-3\norm{x^* - p}^2}\cdot b\sumpp e^{-\norm{x^* - p}^2} = b(n)^{-3}\cdot \sumpp e^{-\norm{x^* - p}^2}
$$
#### Part 2 - sampling point $x'$ is inside the empty mass ball $\B_i$ with high probability:
For pointset $P$ can be uniformly random sampling on the surface of the sphere, we can assume that all points inside $P$ are independent.
Based on the Gaussian Annulus Theorem, we have
$$
Pr(x'\in \B_1) \le 3e^{-cn}\\
Pr(x'\in \B_2) \le 3e^{-cn}\\
...\\
Pr(x'\in \B_n ) \le 3e^{-cn}\\
$$
Then, we calculate the probability of $x'$ is inside any $\B_i$, by union bound we do not require $\B_i$ is independent anymore.
$$
\begin{align}
Pr(x' \in \B_1 \cup x' \in \B_2 \cup ...) &\le \sum_i^n Pr(x'\in \B_i)\\
&= 3ne^{-cn}\\
&= 3e^{-cn + \log n}
\end{align}
$$
#### Part 3 - sampling $poly(n)$ points is not enough for a constant approximation with high probability
We sample $q = n^{c'}$ points, and the probability of there is at least one point instead $\B_i$ is:
$$
\begin{align}
Pr(\bigcup_i^q x'_i \in \B_1 \cup x'_i \in \B_2 \cup ...) &\le \bigcup_i^q Pr(x'_i \in \B_1 \cup x'_i \in \B_2 \cup ...)\\
&\le 3q\cdot e^{-cn} \\
&= \frac{n^{c'}}{3e^{cn}}\\
&= \frac{e^{c'\log n}}{3e^{cn}}\\
&= \frac{1}{3e^{cn - c'\log n}}
\end{align}
$$
With selecting $poly(n)$ points, the probability of there is at least one point inside all $\B_i$ is less than $1/3e^{cn - c'\log n}$.