# Proving Randomly Sampling from a Points Evenly Distributed Sphere not Work $$ \newcommand{\fkp}{\hat{f}_{K, P}} \newcommand{\fkpi}{\hat{f}_{K, \Pi P}} \newcommand{\sumpp}{\sum\limits_{p\in P}} \newcommand{\sumppp}{\sum\limits_{p\in P'}} \newcommand{\sumxk}{\sum\limits_{x\in K}} \newcommand{\kxp}{k(\|x^* - p\|^2)} \newcommand{\gxp}{g(\|x^* - p\|^2)} \newcommand{\epstwo}{\frac{\epsilon}{2}} \newcommand{\epsfour}{\frac{\epsilon}{4}} \newcommand{\epssix}{\frac{\epsilon}{6}} \newcommand{\xp}{\|x^* - p\|^2} \newcommand{\maxxd}{\max_{x \in \mathbb{R}^d}} \newcommand{\maxxm}{\max_{x \in \mathbb{R}^m}} \newcommand{\norm}[1]{\|#1\|} \newcommand{\R}{\mathbb{R}} \newcommand{\B}{\mathcal{B}} \newcommand{\oc}{\overline{c}} $$ $$ \newcommand{\gammaone}{\epstwo\cdot\frac{\sumpp \kxp}{\sumpp \gxp \cdot \xp}} \newcommand{\sumppcondition}[1]{\sum\limits_{p\in P, \norm{x^* -p}^2 \le #1}} \newcommand{\sumppconditionp}[1]{\sum\limits_{p\in P, \norm{x^* -p}^2 > #1}} $$ $$ \newcommand{\gammatwo}{\epsfour\cdot\frac{\sumppcondition{\xi} \kxp}{\sumppcondition{\xi} \gxp \cdot \xp}} \DeclareMathOperator*{\argmax}{arg\,max} $$ ## Reference [# Foundations of Data Science: Chapter 2.5, Theorem 2.9](https://www.cs.cornell.edu/jeh/book.pdf) ## Pointset Generator and Theorem ### Pointset Generator From the book [# Foundations of Data Science](https://www.cs.cornell.edu/jeh/book.pdf) Chapter 2.5, they described a uniformly random points generating method by using spherical Gaussian. ![](https://i.imgur.com/QyyNxQV.png) We can use this method to generate our pointset $P$ on a sphere. ### Theorem 2.9: Gaussian Annulus Theorem ![](https://i.imgur.com/pEuDEZA.png) In our cases, we set $x'$ is sampling from the spherical gaussian distribution of point $p_i \in \R^d$, this theorem is about $$ Pr(\norm{x' - p_i} > \sqrt{d} + \beta \cup \norm{x' - p_i} < \sqrt{d} - \beta) \le 3e^{-c\beta^2} $$ This also implies that $$ Pr(\norm{x' - p_i} < \sqrt{d} - \beta) \le 3e^{-c\beta^2} $$ ## Prove it is not a constant approximation for sampling polynomial points ### Goal We want to prove it is not a constant approximation for sampling polynomial points from the Gaussian KDE PDF of an evenly points distributed sphere. And for a sampling point $x'$, we have the following: *With high probability, there is **no such** point from $poly(n)$ sampling points satisfies* $$ c\sumpp e^{-\norm{x' - p}^2}\ge \sumpp e^{-\norm{x^* - p}^2}, s.t.\ c\ge 1 \ is\ a\ constant. $$ Also, we required $\norm{x^* - p}^2 \le \log n/10$, otherwise, $\sumpp e^{-\norm{x^* - p}^2} < o(n)$, the $x^*$ should be located at a point instead of the center of the sphere. ### Definations #### Pointset and Sphere Assume all points of $P = \{p_1, p_2,..., p_n\}\subset \R^{n\times d}$ are evenly distributed on a sphere with radius $r = \sqrt{\log n}$. We have $\norm{x^* - p_i}^2 = \log n$. #### Points and Dimensions Let $d = O(n)$ and $\beta = \sqrt{n}/2$, then we can have $\sqrt{d} - \beta > r$ when $n$ is large. #### Mass Concentrated Annulus $A_i$ and Empty Centered Ball $\B_i$ We have each point $p_i$ is a Spherical Gaussian distribution, and $3e^{-c\beta^2}$ of the total mass is concentrated on the annulus $A_i$ between $\sqrt{d} - \beta$ and $\sqrt{d} + \beta$. The Ball $\B_i$ which is the nearly empty ball at the center of the annulus with radius equal to $\sqrt{d} - \beta$. ### Proof #### Part1 - only $x'$ is inside all mass empty balls there are some chance to have constant approximation This is a fact. Only when the point $x'$ is sampling from the empty sphere which radius is $\sqrt{d} - \beta$, with some probability $x'$ is located inside the sphere. When $x'$ is located outside the empty ball $\B_i$, we have $bn$ constant partial point which have $\norm{x' - p}^2 \ge 4\norm{x^* - p}^2$. Then, we have $$ \sumpp e^{-\norm{x' - p}^2}\le b\sumpp e^{-4\norm{x^* - p}^2} = e^{-3\norm{x^* - p}^2}\cdot b\sumpp e^{-\norm{x^* - p}^2} = b(n)^{-3}\cdot \sumpp e^{-\norm{x^* - p}^2} $$ #### Part 2 - sampling point $x'$ is inside the empty mass ball $\B_i$ with high probability: For pointset $P$ can be uniformly random sampling on the surface of the sphere, we can assume that all points inside $P$ are independent. Based on the Gaussian Annulus Theorem, we have $$ Pr(x'\in \B_1) \le 3e^{-cn}\\ Pr(x'\in \B_2) \le 3e^{-cn}\\ ...\\ Pr(x'\in \B_n ) \le 3e^{-cn}\\ $$ Then, we calculate the probability of $x'$ is inside any $\B_i$, by union bound we do not require $\B_i$ is independent anymore. $$ \begin{align} Pr(x' \in \B_1 \cup x' \in \B_2 \cup ...) &\le \sum_i^n Pr(x'\in \B_i)\\ &= 3ne^{-cn}\\ &= 3e^{-cn + \log n} \end{align} $$ #### Part 3 - sampling $poly(n)$ points is not enough for a constant approximation with high probability We sample $q = n^{c'}$ points, and the probability of there is at least one point instead $\B_i$ is: $$ \begin{align} Pr(\bigcup_i^q x'_i \in \B_1 \cup x'_i \in \B_2 \cup ...) &\le \bigcup_i^q Pr(x'_i \in \B_1 \cup x'_i \in \B_2 \cup ...)\\ &\le 3q\cdot e^{-cn} \\ &= \frac{n^{c'}}{3e^{cn}}\\ &= \frac{e^{c'\log n}}{3e^{cn}}\\ &= \frac{1}{3e^{cn - c'\log n}} \end{align} $$ With selecting $poly(n)$ points, the probability of there is at least one point inside all $\B_i$ is less than $1/3e^{cn - c'\log n}$.