# Possible Ideas for Dimensions Recovery
$$
\newcommand{\v}{\mathbf{v}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\norm}[1]{\|#1\|}
\newcommand{\fkp}{\hat{f}_{K, P}}
\newcommand{\pc}[2]{p(#1 | #2)}
\newcommand{\zcm}{z = c_m}
\newcommand{\summ}{\sum_{m=1}^M}
\newcommand{\vt}{\mathbf{v}^{(t)}}
\newcommand{\epstwo}{\frac{\epsilon}{2}}
$$
$$
\newcommand{\pxv}{p(x | \v )}
\newcommand{\pcv}[1]{p(#1 | \v )}
$$
## Recall from the Previous Analysis
Based on the EM-algorithm analysis, we cannot obtain a closed-form solution directly from $\frac{\partial Q}{\partial \v} = 0$; instead we get a complex iterative scheme:
$$
\v^{(\delta+1)} = \frac{\summ p_m \cdot K(0;\mu_m - \vt, \Sigma_m) \cdot \frac{K'(0;\mu_m - \v^{(\delta)}, \Sigma_m)}{K(0;\mu_m-\v^{(\delta)}, \Sigma_m)} \cdot \mu_m}{\summ p_m \cdot K(0;\mu_m - \vt, \Sigma_m) \cdot \frac{K'(0;\mu_m - \v^{(\delta)}, \Sigma_m)}{K(0;\mu_m-\v^{(\delta)}, \Sigma_m)}}
$$
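For concreteness, here is a minimal sketch of one step of this scheme. The kernel $K$ and the derivative term $K'$ are not pinned down in these notes, so they are passed in as placeholder callables, and the ratio $K'/K$ is assumed to return a scalar so that the update stays a weighted average of the $\mu_m$.

```python
import numpy as np

def fixed_point_step(v_delta, v_t, p, mus, covs, K, Kprime):
    """One step of the iterative scheme above.

    p    : mixture weights p_m
    mus  : component means mu_m
    covs : component covariances Sigma_m
    K, Kprime : placeholder callables K(x, mean, cov) and K'(x, mean, cov);
                their exact form is not fixed in these notes.

    The new iterate is a weighted average of the mu_m, with weights
    p_m * K(0; mu_m - v_t, Sigma_m) * K'(0; mu_m - v_delta, Sigma_m)
        / K(0; mu_m - v_delta, Sigma_m).
    """
    zero_t = np.zeros_like(v_t)
    zero_d = np.zeros_like(v_delta)
    num = np.zeros_like(v_delta, dtype=float)
    den = 0.0
    for p_m, mu_m, cov_m in zip(p, mus, covs):
        w = (p_m
             * K(zero_t, mu_m - v_t, cov_m)
             * Kprime(zero_d, mu_m - v_delta, cov_m)
             / K(zero_d, mu_m - v_delta, cov_m))
        num += w * mu_m
        den += w
    return num / den
```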
We also cannot simply run this iteration numerically, because it may converge to an arbitrary stationary point: a local maximum, a local minimum, or a saddle point.
For now, it is not clear how to obtain a $\v^{(t+1)}$ that satisfies
$$
Q(\v^{(t+1)}|\vt) \ge Q(\vt|\vt)
$$
based on the iteration scheme above.
We are therefore considering other approaches to the dimension recovery.
## Explicit Formula for Kirszbraun's Theorem
This idea comes from Cas Widdershoven; I greatly appreciate his support.
### Original Kirszbraun Theorem in the KDE Paper
In the KDE paper, the Kirszbraun theorem is used only as an existence result:
*Function $g:\{\Pi \vt\} \cup\Pi P\to \{\vt\}\cup P$ with $g(\Pi y) = y$ for any $y\in \{\vt\}\cup P$. By the rescaled JL lemma, we have that $g$ is 1-Lipschitz. It follows that there is some function $\tilde{g}: \R^m\to \R^d$ which agrees with $g$ on inputs $\{\Pi \vt\} \cup\Pi P$ and satisfies $\norm{\tilde{g}(s) - \tilde{g}(t)}\le \norm{s - t}$, for all $s,t\in \R^m$.*
The subsequent proof in the KDE paper depends only on the following inequality, where $\v^{(t)}$ is the solution in the low dimension and $\mu$ denotes a point in $P$:
$$
\norm{\tilde{g}(\v^{(t)}) - \mu} \le \norm{\v^{(t)} - \Pi \mu}, \forall \mu \in P
$$
### Kirszbraun Theorem with Extension
Since we do not need to require $\tilde{g}(\vt) = g(\vt)$, and we only need distances to be preserved,
we can restrict the function $g$ to $g:\Pi P\to P$ instead of $g:\{\Pi \vt\} \cup\Pi P\to \{\vt\}\cup P$. We still have $\tilde{g}(\Pi\mu) = g(\Pi\mu) = \mu$ for all points $\mu \in P$, and we can still apply the Kirszbraun extension and keep the distance bound:
$$
\norm{\tilde{g}(\v^{(t)}) - \mu} \le \norm{\v^{(t)} - \Pi \mu}, \forall \mu \in P
$$
As a consequence, the estimated value $\fkp (\tilde{g}(\vt))$ will not be smaller than $\hat{f}_{K, \Pi P}(\vt)$ (the estimated value we get in the low dimension), because the kernel is non-increasing in the distance.
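Spelling this step out (assuming, as in the KDE setting, a kernel that is non-increasing in the distance, $K(u, w) = k(\norm{u - w})$ with $k$ non-increasing, and writing $w_\mu$ for whatever per-point weights the estimator uses, e.g. $1/|P|$):
$$
\fkp\big(\tilde{g}(\vt)\big) = \sum_{\mu\in P} w_\mu\, k\big(\norm{\tilde{g}(\vt) - \mu}\big) \ge \sum_{\mu\in P} w_\mu\, k\big(\norm{\vt - \Pi\mu}\big) = \hat{f}_{K, \Pi P}(\vt),
$$
where the inequality uses $\norm{\tilde{g}(\vt) - \mu} \le \norm{\vt - \Pi\mu}$ for every $\mu \in P$ together with the monotonicity of $k$.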
### Explicit Formula of Kirszbraun Theorem
In the paper [Kirszbraun Theorem via an Explicit Formula](https://arxiv.org/abs/1810.10288), the authors give an explicit formula for $\tilde{g}$ when the function $g$ is already known. The statement is as follows:

Also, the explanation of $\mathrm{conv}(g)$:

In our problem, we can shrink the input set $E$ to $\Pi P$; thus the function $G$ here is $G: \Pi P \to P$, and we can actually obtain the function $G$, as long as it is a continuous function.
Theorem 2 above then gives the extension function $\tilde{G}$ that extends $G$ to any input set $X \subset \R^m$ with $\vt \in X$.
Thus, we can set
$$
\v^{(t+1)} = \tilde{G}(\vt)
$$
And it will give us
$$
\norm{\tilde{G}(\v^{(t)}) - \mu} \le \norm{\v^{(t)} - \Pi \mu}, \forall \mu \in P
$$
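Whatever way we end up computing $\tilde{G}(\vt)$, the only property the rest of the argument needs is the inequality above, so it is easy to verify numerically. A small sketch of that check (the function name is mine, and $\Pi$ is taken as an $m\times d$ matrix so that $\Pi\mu$ is `Pi @ mu`):

```python
import numpy as np

def distances_preserved(v_next, v_low, P, Pi, tol=1e-12):
    """Check ||v_next - mu|| <= ||v_low - Pi mu|| for every data point mu in P.

    v_next : candidate high-dimensional point, e.g. tilde_G(v^(t)), in R^d
    v_low  : current low-dimensional solution v^(t), in R^m
    P      : (n, d) array whose rows are the data points mu
    Pi     : (m, d) projection matrix
    """
    high_dists = np.linalg.norm(P - v_next, axis=1)        # ||v_next - mu||
    low_dists = np.linalg.norm(P @ Pi.T - v_low, axis=1)   # ||v^(t) - Pi mu||
    return bool(np.all(high_dists <= low_dists + tol))
```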
## Randomized Projection Matrix
This idea is inspired by the paper [Random Projections for Classification: A Recovery Approach](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.705.1234&rep=rep1&type=pdf).
In our problem, we already have the low-dimensional solution $\vt$.
We can first use the **randomized projection** to obtain $\v^{(t+1)}$ in the high dimension, and then prove that $\fkp (\v^{(t+1)})$ stays within an $\epsilon$ error of $\hat{f}_{K, \Pi P}(\vt)$.
In "Random Projections for Classification: A Recovery Approach", the authors prove, for **sparse matrices** (the data matrix is low-rank and/or the optimal solution is sparse) and for the **classification problem**, that the error stays within $\epsilon$ after recovering the dimension from the random projection. I think we can use the same idea to prove:
$$
\fkp (\v^{(t+1)}) \ge (1 - \epstwo)\cdot \hat{f}_{K, \Pi P}(\vt)
$$
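A minimal numerical sketch of the statement we would like to prove is below; the plain Gaussian KDE stands in for $\fkp$ and $\hat{f}_{K, \Pi P}$, and the recovery step (a pseudoinverse of $\Pi$) is only a hypothetical placeholder, not the recovery procedure of the cited paper:

```python
import numpy as np

def gaussian_kde(x, points, bandwidth=1.0):
    """Plain Gaussian-kernel KDE as a stand-in for f_hat_{K, points}(x)."""
    sq = np.sum((points - x) ** 2, axis=1)
    return float(np.mean(np.exp(-sq / (2.0 * bandwidth ** 2))))

def check_recovery(v_low, P, Pi, eps, bandwidth=1.0):
    """Recover a high-dimensional candidate v^(t+1) from v^(t) and test
    f_hat_{K,P}(v^(t+1)) >= (1 - eps/2) * f_hat_{K,Pi P}(v^(t)).

    The pseudoinverse recovery below is only a placeholder; the actual
    recovery in the cited paper is more involved.
    """
    v_next = np.linalg.pinv(Pi) @ v_low                  # hypothetical v^(t+1)
    f_high = gaussian_kde(v_next, P, bandwidth)          # f_hat_{K,P}(v^(t+1))
    f_low = gaussian_kde(v_low, P @ Pi.T, bandwidth)     # f_hat_{K,Pi P}(v^(t))
    return f_high >= (1.0 - eps / 2.0) * f_low, v_next
```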
One concern is that this method may only work in the sparse-matrix setting.
## LSH for Iterative Updates
Based on the analysis of the EM algorithm, we obtained the following iterative scheme:
$$
\v^{(\delta+1)} = \frac{\summ p_m \cdot K(0;\mu_m - \vt, \Sigma_m) \cdot \frac{K'(0;\mu_m - \v^{(\delta)}, \Sigma_m)}{K(0;\mu_m-\v^{(\delta)}, \Sigma_m)} \cdot \mu_m}{\summ p_m \cdot K(0;\mu_m - \vt, \Sigma_m) \cdot \frac{K'(0;\mu_m - \v^{(\delta)}, \Sigma_m)}{K(0;\mu_m-\v^{(\delta)}, \Sigma_m)}}
$$
We can write this scheme in another way:
$$
\v^{(\delta+1)} = \summ\frac{ p_m \cdot K(0;\mu_m - \vt, \Sigma_m) \cdot \frac{K'(0;\mu_m - \v^{(\delta)}, \Sigma_m)}{K(0;\mu_m-\v^{(\delta)}, \Sigma_m)} }{\sum_{m'=1}^M p_{m'} \cdot K(0;\mu_{m'} - \vt, \Sigma_{m'}) \cdot \frac{K'(0;\mu_{m'} - \v^{(\delta)}, \Sigma_{m'})}{K(0;\mu_{m'}-\v^{(\delta)}, \Sigma_{m'})}}\cdot \mu_m
$$
We can treat the first factor as a weight (a probability) for the $m$-th component.
Then we try to increase this probability based on the density of the points, until we have
$$
Q(\v^{(\delta)}|\vt) \ge Q(\vt|\vt)
$$
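A rough sketch of this last idea, under heavy assumptions: the component weights are normalized into probabilities and tilted by a density estimate around each $\mu_m$ (in practice an LSH-based KDE would supply this estimate, hence the section title), and a candidate is only accepted as $\v^{(t+1)}$ if it does not decrease $Q(\cdot\,|\,\vt)$. Both `density` and `Q` are placeholder callables here, not worked-out components.

```python
import numpy as np

def tilt_by_density(weights, mus, density):
    """Turn the per-component weights into probabilities and tilt them by an
    estimate of the point density around each mu_m (supplied by `density`;
    an LSH-based KDE would provide this estimate in practice)."""
    probs = np.asarray(weights, dtype=float)
    probs = probs / probs.sum()
    tilt = np.array([density(mu_m) for mu_m in mus])
    probs = probs * tilt
    return probs / probs.sum()

def candidate_update(probs, mus):
    """Weighted average of the mu_m under the (tilted) probabilities."""
    return np.sum(probs[:, None] * np.asarray(mus, dtype=float), axis=0)

def accept(v_candidate, v_t, Q):
    """Keep the candidate as v^(t+1) only if Q(v^(delta) | v^(t)) >= Q(v^(t) | v^(t))."""
    return Q(v_candidate, v_t) >= Q(v_t, v_t)
```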