## Iteratively reducing $\alpha$

We have a vector $x = \alpha U + \beta V$, for which we observe a partial set of entries $y = \Omega x$, where $\Omega_{ii} = 1$ if and only if $x_i$ is observed.

### Assumptions

1. $U$ is known.
2. $\Omega_{ii}$ is $1$ with prob. $p$ for some known $p$. For ease of notation we will write $\Omega_i$ in place of $\Omega_{ii}$ from here on.
3. $\alpha \gg \beta$.

### Iterative method definitions and analysis

Consider an iterative process where

$x^{(l)} = x-\left(\sum_{i=1}^{l-1} \hat{\alpha}_{i}\right)U$

$\alpha_l = \alpha-\left(\sum_{i=1}^{l-1} \hat{\alpha}_{i}\right)$

$\hat{\alpha}_l = \frac{U^\top \Omega^{(l)}x^{(l)}}{q}$

where $\Omega^{(l)}$ is sampled from $\Omega$ such that $\Omega^{(l)}_{i} = \Omega_i$ with prob. $\frac{1}{L}$ (and $0$ otherwise), and $q=p/L$, where $L$ is a known parameter which captures the number of trials of the iterative process. Also, in this process, let $x^{(1)} = x$ and $\alpha_1 = \alpha$.

Consider any $l$. We can see that

$E\left(\hat{\alpha}_l\right) = E_{l,\Omega}\left(\frac{U^\top \Omega^{(l)}x^{(l)}}{q}\right)$

and since each entry of $\Omega^{(l)}$ equals the corresponding entry of $\Omega$ with prob. $\frac{1}{L}$, we get

$E\left(\hat{\alpha}_l\right) = E_{\Omega}\frac{1}{L}\left(\frac{U^\top \Omega x^{(l)}}{q}\right) = \frac{p}{L}\left(\frac{U^\top x^{(l)}}{q}\right) = U^\top x^{(l)} = \alpha_l$

We can also see that, given $\hat{\alpha}_l = \frac{U^\top \Omega^{(l)} x^{(l)}}{q} = \sum_{i=1}^n \frac{U_i \Omega^{(l)}_i x^{(l)}_i}{q}$,

$|\frac{U_i \Omega^{(l)}_i x^{(l)}_i}{q}| \le |\frac{U_i x^{(l)}_i}{q}| \le \frac{1}{q}|U_i| |x^{(l)}_i| \le \frac{1}{q}\sqrt{\frac{2\mu}{n}} \sqrt{U_i^2+V_i^2}\sqrt{\alpha_l^2+\beta^2} \le \frac{2\mu}{nq}\|x^{(l)}\|$

and

$v(\hat{\alpha}_l):= \sum_{i=1}^n E \left(\frac{U_i^2 {\Omega^{(l)}_i}^2 {x^{(l)}_i}^2}{q^2}\right) = \sum_{i=1}^n \left(\frac{U_i^2 {x^{(l)}_i}^2}{q}\right) \le \frac{2\mu}{nq}\|x^{(l)}\|^2$

Using this and Chung et al., Theorems 3.6 and 3.7, we get

$Pr(|\hat{\alpha}_l-\alpha_l| \ge \frac{1}{2} \|x^{(l)}\|) \le 2 e^{-\frac{\frac{1}{4}\|x^{(l)}\|^2}{2\left(\frac{2\mu}{nq}\|x^{(l)}\|^2+\frac{1/2\|x^{(l)}\|}{3}\cdot \frac{2\mu}{nq}\|x^{(l)}\|\right)}} = 2e^{-\frac{3nq}{56 \mu}}$

Letting $q \ge \frac{c \mu}{n} \log \frac{2}{\delta}$ for an appropriate $c$, i.e. setting $p \ge \frac{c \mu}{n} \cdot L \cdot \log \frac{2}{\delta}$, we get that, with prob. $\ge 1-\delta$,

$|\hat{\alpha}_l-\alpha_l| \le \frac{1}{2} \|x^{(l)}\|$

Let $E_l$ be the event where $\|x^{(l+1)}\|^2 \le \frac{1}{4^{l}} \alpha^2+2\beta^2$. We can see that $Pr(E_l|E_{l-1}) \ge 1-\delta$, since if $\|x^{(l)}\|^2 \le \frac{1}{4^{l-1}} \alpha^2+2\beta^2$, then with prob. $\ge 1-\delta$,

$\|x^{(l+1)}\|^2 = |\hat{\alpha}_l-\alpha_l|^2+\beta^2 \le \frac{1}{4} \left(\frac{1}{4^{l-1}} \alpha^2+2\beta^2\right)+\beta^2 \le \frac{1}{4^{l}} \alpha^2+2\beta^2$

(which means the probability of success of event $E_l$ conditioned on $E_{l-1}$ is at least that). Given that the $\Omega^{(l)}$ are independent, a simple induction shows $Pr(E_l) \ge (1-\delta)^l \ge 1-l\delta$.

Note that we have assumed our process runs for $L$ iterations. Therefore, the probability of success of the algorithm is $Pr(E_L) \ge (1-\delta)^L \ge 1-L\delta$. Assuming we need the success probability to be $1-\frac{1}{n^\gamma}$ for some constant $\gamma$, we need $\log \frac{2}{\delta} =\gamma \log n+\log L$.
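As a sanity check, here is a minimal numpy simulation of this reduction for a single column with $U$ known exactly. The dimensions, constants, seed, and variable names are illustrative assumptions only, not part of the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

n, L, p = 10_000, 10, 0.3          # dimension, number of rounds, observation prob.
q = p / L                          # per-round effective sampling rate

# Orthonormal U, V and a column x = alpha*U + beta*V with alpha >> beta.
U = rng.standard_normal(n)
U /= np.linalg.norm(U)
V = rng.standard_normal(n)
V -= (V @ U) * U
V /= np.linalg.norm(V)
alpha, beta = 100.0, 1.0
x = alpha * U + beta * V

omega = rng.random(n) < p          # observed coordinates: Omega_i = 1 w.p. p

x_l = x.copy()                     # x^{(l)}; only its observed entries are ever used
alpha_hat_sum = 0.0
for l in range(L):
    # Subsample the observed entries: Omega^{(l)}_i = Omega_i with prob. 1/L.
    omega_l = omega & (rng.random(n) < 1.0 / L)
    # \hat{alpha}_l = U^T Omega^{(l)} x^{(l)} / q  (unbiased for alpha_l).
    alpha_hat_l = (U[omega_l] @ x_l[omega_l]) / q
    x_l = x_l - alpha_hat_l * U    # x^{(l+1)} = x^{(l)} - \hat{alpha}_l U
    alpha_hat_sum += alpha_hat_l

# After L rounds the residual U-coefficient should be down to roughly beta or below.
print("residual alpha_{L+1}:", alpha - alpha_hat_sum, " beta:", beta)
```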
Note that when our process ends, $|\alpha_{L+1}|^2 \le \frac{1}{4^L} \alpha^2 + \frac{1}{2}\beta^2$. Assuming we need this to be close to $\beta^2$, we can see that we need a maximum of $L = \hat{c} \log \frac{\alpha}{\beta}$ iterations (figuring out how we get this $L$ is the next step). Therefore, we end up with a vector $x^{(L+1)}$ that is well-spread (with high probability), assuming

$p \ge c\frac{\mu}{n} \cdot \log \frac{\alpha}{\beta} \cdot \left(\log n + \log \log \frac{\alpha}{\beta}\right)$

where $c$ is an appropriate constant.

### Learning the local $L$ ratio

Idea: check if we can identify the cases where $L \le \log n$. Assuming $L \ge \log n$, we get $\alpha \ge n\beta$.

Let $\tilde{\alpha} = \frac{U^\top \Omega^{(1)} x}{p/2} = \sum_{i=1}^n \frac{U_i \Omega^{(1)}_ix_i}{p/2}$, where $\Omega^{(1)}$ (and $\Omega^{(2)}$ below) keeps each observed entry of $\Omega$ independently with prob. $\frac{1}{2}$. We can see that

$|\frac{U_i \Omega^{(1)}_ix_i}{p/2}| \le \frac{4\mu}{np}\|x\|$

and

$v(\tilde{\alpha})=\sum_{i=1}^n E\left(\frac{U_i^2 \Omega^{(1)}_ix_i^2}{p^2/4}\right) \le \frac{4\mu}{np}\|x\|^2$

Therefore,

$Pr(|\tilde{\alpha}-\alpha| \le \frac{1}{\sqrt{\log n}}\|x\|) \ge 1-2e^{-\frac{\|x\|^2/\log n}{2\left(\frac{4\mu}{np}\|x\|^2+\frac{4\mu}{np}\|x\|\cdot\frac{\|x\|}{3\sqrt{\log n}}\right)}} = 1-2e^{-\frac{3np/\log n}{8\mu\left(3+\frac{1}{\sqrt{\log n}}\right)}}$

Assuming $p \ge c\frac{\mu}{n}\log^2 n$, with prob. $\ge 1-2e^{-\frac{c\mu\log n}{8\mu}} = 1-\frac{2}{n^{c/8}}$,

$|\tilde{\alpha}-\alpha| \le \frac{1}{\sqrt{\log n}}\|x\| \le \frac{1}{\sqrt{\log n}}\sqrt{1+\frac{1}{\log^2 n}}\alpha$

$\left(1-\frac{1}{\sqrt{\log n}}\sqrt{1+\frac{1}{\log^2 n}}\right)\alpha \le \tilde{\alpha} \le \left(1+\frac{1}{\sqrt{\log n}}\sqrt{1+\frac{1}{\log^2 n}}\right)\alpha$

Also, let $z = \frac{\|\Omega^{(2)}x\|^2}{p/2} = \sum_{i=1}^n \frac{\Omega^{(2)}_i x_i^2}{p/2}$. We can see that $|\frac{\Omega^{(2)}_i x_i^2}{p/2}| \le \frac{4\mu}{np}\|x\|^2$ and $v(z) = \sum_{i=1}^n E\left(\frac{\Omega^{(2)}_i x_i^4}{p^2/4}\right) \le \frac{4\mu}{np}\|x\|^4$. Therefore,

$Pr(|z-\|x\|^2| \le \frac{1}{\sqrt{\log n}}\|x\|^2) \ge 1-2e^{-\frac{3np/\log n}{8\mu\left(3+\frac{1}{\sqrt{\log n}}\right)}}$

Similar to before, with prob. $\ge 1-\frac{2}{n^{c/8}}$,

$\left(1 - \frac{1}{\sqrt{\log n}} \right) \|x\|^2 \le z \le \left(1 + \frac{1}{\sqrt{\log n}} \right) \|x\|^2$

$\left(1 - \frac{1}{\sqrt{\log n}} \right) \alpha^2 \le z \le \left(1 + \frac{1}{\sqrt{\log n}} \right) \left(1+\frac{1}{\log^2 n}\right)\alpha^2$

Could we calculate $z$ and $\tilde{\alpha}$ and compare the ratio $z/\tilde{\alpha}^2$? If $\alpha \gg \beta$, then with high probability the ratio lies within the range

$\left[\frac{1 - \frac{1}{\sqrt{\log n}}}{\left(1+\frac{1}{\sqrt{\log n}}\sqrt{1+\frac{1}{\log^2 n}}\right)^2},\frac{\left(1 + \frac{1}{\sqrt{\log n}} \right) \left(1+\frac{1}{\log^2 n}\right)}{\left(1-\frac{1}{\sqrt{\log n}}\sqrt{1+\frac{1}{\log^2 n}}\right)^2}\right]$

Can we then only run the process for the columns with the ratio not in this range? Or stop at $\log n$ steps for all the candidates?

### Candidate Algorithm

Assume $\max \|x\|^2 \le \Delta^2$.

```
For each y = \Omega x in the stream:
    Let \hat{y} = \Omega x / \Delta     (so effectively \alpha <- \alpha/\Delta and \beta <- \beta/\Delta)
    Let \Omega x^{(1)} = \hat{y}
    Do (for l = 1, 2, ...):
        Sample from \Omega with prob. 1/L to get \Omega^{(l)}
        Let \hat{\alpha}_l = U^\top \Omega^{(l)} x^{(l)} \cdot L/p
        Let \Omega x^{(l+1)} = \Omega x^{(l)} - \hat{\alpha}_l \Omega U
    while (| \|\Omega x^{(l)}\|^2 - \|\Omega x^{(l+1)}\|^2 | \ge \xi)
```

What is the choice for $\xi$? What is the guess for $L$?
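One possible way to address the guess for $L$ is the ratio test from the previous subsection: columns whose $z/\tilde{\alpha}^2$ ratio falls in the range above are consistent with $\alpha \gg \beta$. Below is a minimal sketch, assuming $p$ is known and the observed entries are split into two independent halves playing the role of $\Omega^{(1)}$ and $\Omega^{(2)}$; the function name and the example constants are placeholders.

```python
import numpy as np

def ratio_test(x_obs, obs_mask, U, p, rng):
    """Return z / alpha_tilde^2; entries of x_obs outside obs_mask are never read."""
    n = U.shape[0]
    half = rng.random(n) < 0.5                     # split Omega into Omega^(1), Omega^(2)
    om1, om2 = obs_mask & half, obs_mask & ~half
    alpha_tilde = (U[om1] @ x_obs[om1]) / (p / 2)  # \tilde{alpha}
    z = np.sum(x_obs[om2] ** 2) / (p / 2)          # estimate of ||x||^2
    return z / alpha_tilde ** 2

# Toy usage: a strongly U-dominated column should give a ratio close to 1,
# a balanced column a noticeably larger one.
rng = np.random.default_rng(1)
n, p = 10_000, 0.3
U = rng.standard_normal(n)
U /= np.linalg.norm(U)
V = rng.standard_normal(n)
V -= (V @ U) * U
V /= np.linalg.norm(V)
mask = rng.random(n) < p
print(ratio_test(50.0 * U + 0.5 * V, mask, U, p, rng))   # alpha >> beta
print(ratio_test(1.0 * U + 1.0 * V, mask, U, p, rng))    # alpha ~ beta
```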
### Assuming we only have a strong estimate for $U$ (Incomplete)

Assume we have an estimate $\tilde{U}$ such that $|\tilde{U}^\top U| \ge 1-\epsilon$ (taking $\tilde{U}^\top U \ge 0$ w.l.o.g., we therefore have $\|\tilde{U}-U\|^2 = \|\tilde{U}\|^2+\|U\|^2-2\tilde{U}^\top U \le 2-2(1-\epsilon) = 2\epsilon$).

Consider an iterative process where

$x^{(l)} = x-\left(\sum_{i=1}^{l-1} \hat{\alpha}_{i}\right)\tilde{U}$

$\hat{\alpha}_l = \frac{\tilde{U}^\top \Omega^{(l)}x^{(l)}}{q}$

where $\Omega^{(l)}$ is sampled from $\Omega$ such that $\Omega^{(l)}_{i} = \Omega_i$ with prob. $\frac{1}{L}$ (and $0$ otherwise), and $q=p/L$, where $L$ is a known parameter which captures the number of trials of the iterative process. Also, in this process, let $x^{(1)} = x$ and $\alpha_1 = \alpha$.

Consider any $l$. We can see that

$E\left(\hat{\alpha}_l\right) = E_{l,\Omega}\left(\frac{\tilde{U}^\top \Omega^{(l)}x^{(l)}}{q}\right)$

and since each entry of $\Omega^{(l)}$ equals the corresponding entry of $\Omega$ with prob. $\frac{1}{L}$, we get

$E\left(\hat{\alpha}_l\right) = E_{\Omega}\frac{1}{L}\left(\frac{\tilde{U}^\top \Omega x^{(l)}}{q}\right) = \frac{p}{L}\left(\frac{\tilde{U}^\top x^{(l)}}{q}\right) = \tilde{U}^\top x^{(l)} =\tilde{U}^\top x - \sum_{i=1}^{l-1}\hat{\alpha}_i$

We can also see that, given $\hat{\alpha}_l = \frac{\tilde{U}^\top \Omega^{(l)} x^{(l)}}{q} = \sum_{i=1}^n \frac{\tilde{U}_i \Omega^{(l)}_i x^{(l)}_i}{q}$,

$|\frac{\tilde{U}_i \Omega^{(l)}_i x^{(l)}_i}{q}| \le |\frac{\tilde{U}_i x^{(l)}_i}{q}| \le \frac{1}{q}|\tilde{U}_i| |x^{(l)}_i|\le \frac{\sqrt{2\mu}}{\sqrt{n}q}\sqrt{\left((\alpha-\sum_{i=1}^{l-1} \hat{\alpha}_i)^2+\beta^2+2(\sum_{i=1}^{l-1} \hat{\alpha}_i)^2 \epsilon \right)}$

and

$v(\hat{\alpha}_l):= \sum_{i=1}^n E \left(\frac{\tilde{U}_i^2 {\Omega^{(l)}_i}^2 {x^{(l)}_i}^2}{q^2}\right) = \sum_{i=1}^n \left(\frac{\tilde{U}_i^2 {x^{(l)}_i}^2}{q}\right) \le \frac{2\mu}{nq}\|x^{(l)}\|^2 \le \frac{\sqrt{2\mu}}{\sqrt{n}q}\left((\alpha-\sum_{i=1}^{l-1} \hat{\alpha}_i)^2+\beta^2+2(\sum_{i=1}^{l-1} \hat{\alpha}_i)^2 \epsilon \right)$

Let $s_l^2 = \left((\alpha-\sum_{i=1}^{l-1} \hat{\alpha}_i)^2+\beta^2+2(\sum_{i=1}^{l-1} \hat{\alpha}_i)^2 \epsilon \right)$.

Using this and Chung et al., Theorems 3.6 and 3.7, we get

$Pr(|\tilde{U}^\top x - \sum_{i=1}^{l-1}\hat{\alpha}_i-\hat{\alpha}_l| \ge \frac{1}{4} s_l) \le 2 e^{-\frac{\frac{1}{16}s_l^2}{2\left(\frac{\sqrt{2\mu}}{\sqrt{n}q}s_l^2+\frac{1/4s_l}{3}\cdot \frac{\sqrt{2\mu}}{\sqrt{n}q}s_l\right)}} = 2e^{-\frac{3\sqrt{n}q}{104 \sqrt{2\mu}}}$

Letting $q \ge \frac{c \sqrt{\mu}}{\sqrt{n}} \log \frac{2}{\delta}$ for an appropriate $c$, i.e. setting $p \ge \frac{c \sqrt{\mu}}{\sqrt{n}} \cdot L \cdot \log \frac{2}{\delta}$, we get that, with prob. $\ge 1-\delta$,
$|\tilde{U}^\top x - \sum_{i=1}^{l}\hat{\alpha}_i| \le \frac{1}{4} s_l$

$|\tilde{U}^\top U \alpha+\tilde{U}^\top V\beta - \left(\tilde{U}^\top U+\tilde{U}^\top (\tilde{U}-U)\right)\sum_{i=1}^{l}\hat{\alpha}_i| \le \frac{1}{4} s_l$

Let $t_l=\sum_{i=1}^l \hat{\alpha}_i$ and $r_l = \alpha-t_l$. Expanding the square,

$\begin{align}|\tilde{U}^\top U r_l+\tilde{U}^\top V\beta - \tilde{U}^\top (\tilde{U}-U)t_l|^2 &= \left(\tilde{U}^\top U\right)^2r_l^2+\left(\tilde{U}^\top V\right)^2\beta^2+\left(\tilde{U}^\top (\tilde{U}-U)\right)^2t_l^2 \\ & +2\tilde{U}^\top U \tilde{U}^\top V r_l\beta-2\tilde{U}^\top V\tilde{U}^\top (\tilde{U}-U)\beta t_l-2\tilde{U}^\top U\tilde{U}^\top (\tilde{U}-U)r_l t_l\end{align}$

Since $\tilde{U}^\top (\tilde{U}-U) \le \epsilon$ and $\tilde{U}^\top V \le \epsilon$, for small enough $\epsilon$,

$|\tilde{U}^\top U r_l+\tilde{U}^\top V\beta - \tilde{U}^\top (\tilde{U}-U)t_l|^2 \ge \frac{1}{2}(1-\epsilon)^2 r_l^2$

(not sure if this is true) and $3r_l^2 \ge s_{l+1}^2$. Then $\frac{1}{16}s_l^2 \ge \frac{1}{2}(1-\epsilon)^2 r_l^2 \ge \frac{1}{6}(1-\epsilon)^2 s_{l+1}^2$. Assuming $\epsilon$ is small enough, $\frac{3}{4} s_l^2 \ge s_{l+1}^2$.

Let $E_l$ be the event where $s_{l+1}^2 \le \frac{3^l}{4^{l}} \alpha^2+2\beta^2$. We can see that $Pr(E_l|E_{l-1}) \ge 1-\delta$, since if $s_l^2 \le \frac{3^{l-1}}{4^{l-1}} \alpha^2+2\beta^2$, then with prob. $\ge 1-\delta$,

$s_{l+1}^2 \le \frac{3}{4} \left(\frac{3^{l-1}}{4^{l-1}} \alpha^2+2\beta^2\right) \le \frac{3^{l}}{4^{l}} \alpha^2+2\beta^2$

(which means the probability of success of event $E_l$ conditioned on $E_{l-1}$ is at least that). Given that the $\Omega^{(l)}$ are independent, a simple induction shows $Pr(E_l) \ge (1-\delta)^l \ge 1-l\delta$.

Note that we have assumed our process runs for $L$ iterations. Therefore, the probability of success of the algorithm is $Pr(E_L) \ge (1-\delta)^L \ge 1-L\delta$. Assuming we need the success probability to be $1-\frac{1}{n^\gamma}$ for some constant $\gamma$, we need $\log \frac{2}{\delta} =\gamma \log n+\log L$.

Note that when our process ends, $|\alpha_{L+1}|^2 \le \frac{3^L}{4^L} \alpha^2 + \frac{1}{2}\beta^2$. Assuming we need this to be close to $\beta^2$, we can see that we need a maximum of $L = \hat{c} \log \frac{\alpha}{\beta}$ iterations (figuring out how we get this $L$ is the next step). Therefore, we end up with a vector $x^{(L+1)}$ that is well-spread (with high probability), assuming

$p \ge c\frac{\sqrt{\mu}}{\sqrt{n}} \cdot \log \frac{\alpha}{\beta} \cdot \left(\log n + \log \log \frac{\alpha}{\beta}\right)$

where $c$ is an appropriate constant.

### Lower bound -- dependence on $\beta$

Simple observation: if the coefficient $\beta$ is distributed as Bern$(0,\tau)$, then the number of columns seen must be $> \frac{1}{\tau}$. This, however, does not give any "regret" lower bound: after ~ $1/(p\tau)$ steps, we can estimate $u,v$ perfectly, and the total regret would only be $\approx 1/\tau$.

What if $\beta$ is a smaller-variance Gaussian? I.e., $\alpha \sim N(0,1)$ and $\beta \sim N(0,\sigma^2)$? (The magnitude ratio will be around $1/\sigma$.) In this case:

1. What is the bound given by the standard analysis of matrix completion via nuclear norm minimization? (Work out the incoherence of the obtained matrix.) This will likely require the number of columns and $p$ (observation probability) to grow with $\sigma$.
2. Assume $p$ is not growing with $\sigma$; can we obtain a lower bound on the number of columns that need to be sampled? (I.e., to obtain error $< \delta$ in $v$, do we _need_ to see a certain number of columns -- first assuming you know $u$ exactly?)
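As a quick numerical probe of the incoherence part of question 1 (a sketch only, under the Gaussian model just described; `row_space_incoherence` and the chosen values of $d$ and $\sigma$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def row_space_incoherence(a, b):
    """(d/2) * max_i ||P e_i||^2, where P projects onto span{a, b} in R^d;
    this is the incoherence of the row space of M = U a^T + V b^T."""
    d = a.shape[0]
    Q, _ = np.linalg.qr(np.column_stack([a, b]))   # orthonormal basis of span{a, b}
    leverage = np.sum(Q ** 2, axis=1)              # ||P e_i||^2 for each coordinate i
    return (d / 2) * leverage.max()

d = 2000
for sigma in [1.0, 0.3, 0.1, 0.03]:
    a = rng.standard_normal(d)            # alpha_i ~ N(0, 1)
    b = sigma * rng.standard_normal(d)    # beta_i  ~ N(0, sigma^2)
    print(f"sigma = {sigma:5.2f}   row-space incoherence ~ {row_space_incoherence(a, b):.2f}")
```

Since span$\{\hat{\alpha},\hat{\beta}\}$ does not change when $\hat{\beta}$ is rescaled, the printed incoherence should stay essentially flat in $\sigma$, which is consistent with the observation at the end of this section that the incoherence itself stays small; the dependence on $\sigma$ would instead show up in the spectrum ($\lambda_2/\lambda_1 \approx \sigma^2$ below).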
Assuming we have $m$ columns, with $\hat{\alpha}$ the vector of the $\alpha_i$s and $\hat{\beta}$ the vector of the $\beta_i$s, we have $E \left(\langle \hat{\alpha}, \hat{\beta} \rangle\right) = 0$. If $\hat{\alpha},\hat{\beta}$ are orthogonal, then $E(\lambda_1^2) = E\|\hat{\alpha}\|^2 = m$ and $E(\lambda_2^2) = E\|\hat{\beta}\|^2 = \sigma^2m$, where $\lambda_1 \ge \lambda_2$ are the singular values and $S,R$ the corresponding right singular vectors. We can see that

$\alpha_i^2+\beta_i^2 = \lambda_1^2 S_i^2+\lambda_2^2 R_i^2 \implies \lambda_2^2 ( S_i^2+ R_i^2) \le \alpha_i^2+\beta_i^2 \le \lambda_1^2 (S_i^2+ R_i^2)$

#### On Q1

Let $M = [x_1 \; x_2 \; \dots \; x_d]$ where $x_i = \alpha_i U+ \beta_i V$. Let $\hat{\alpha} = [\alpha_1 \; \alpha_2 \; \dots \; \alpha_d]$ and $\hat{\beta} = [\beta_1 \; \beta_2 \; \dots \; \beta_d]$. Consider $A = M^\top M = \hat{\alpha} \hat{\alpha}^\top +\hat{\beta} \hat{\beta}^\top$. Let $\lambda_1,\lambda_2$ be the two nonzero eigenvalues of $A$. Then

$\lambda_1+\lambda_2 = Tr(A) = \sum_{i=1}^d \alpha_i^2 + \sum_{i=1}^d \beta_i^2$

Note that

$\begin{align} \|\alpha_i^2-1\|_{\psi_1} & = \sup_{p\ge 1} \frac{\left(E|\alpha_i^2-1|^p\right)^{1/p}}{p} \\ & \le \sup_{p\ge 1} \frac{\left(E\left(\sum_{j=0}^p {p \choose j}\alpha_i^{2(p-j)}\right)\right)^{1/p}}{p} \\ & \le \sup_{p\ge 1} \frac{\left(\sum_{j=0}^p p^j(2(p-j)-1)!!\right)^{1/p}}{p} \\ & \le \sup_{p\ge 1} \frac{\left(\sum_{j=0}^p 2^{p-j}p^p\right)^{1/p}}{p} \\ & \le 4 \end{align}$

and

$\begin{align} \|\beta_i^2-\sigma^2\|_{\psi_1} & = \sup_{p\ge 1} \frac{\left(E|\beta_i^2-\sigma^2|^p\right)^{1/p}}{p} \\ & \le \sup_{p\ge 1} \frac{\left(E\left(\sum_{j=0}^p {p \choose j}\beta_i^{2(p-j)}\sigma^{2j}\right)\right)^{1/p}}{p} \\ & \le \sup_{p\ge 1} \frac{\left(\left(\sum_{j=0}^p p^j(2(p-j)-1)!!\right)\sigma^{2p}\right)^{1/p}}{p} \\ & \le \sup_{p\ge 1} \frac{\left(\left(\sum_{j=0}^p 2^{p-j}p^p\right)\sigma^{2p}\right)^{1/p}}{p} \\ & \le 4\sigma^{2} \end{align}$

Using this, we can see that

$Pr\left(\left |\sum_{i=1}^d \alpha_i^2 - d\right | \ge \epsilon d\right) \le 2e^{-c \min \left(\frac{\epsilon^2 d}{16},\frac{\epsilon d}{4}\right)}$

and

$Pr\left(\left |\sum_{i=1}^d \beta_i^2 - \sigma^2d\right | \ge \epsilon \sigma^2 d\right) \le 2e^{-c \min \left(\frac{\epsilon^2 d}{16},\frac{\epsilon d}{4}\right)}$

Hence, with prob. $\ge 1-4e^{-c \min \left(\frac{\epsilon^2 d}{16},\frac{\epsilon d}{4}\right)}$,

$\lambda_1+\lambda_2 \ge d-\epsilon d + \sigma^2 d -\epsilon \sigma^2 d = (1-\epsilon)(1+\sigma^2)d$

and

$\lambda_1+\lambda_2 \le d+\epsilon d + \sigma^2 d +\epsilon \sigma^2 d = (1+\epsilon)(1+\sigma^2)d$

Also,

$\begin{align}\lambda_1 &= \max_{w \ne 0} \frac{w^\top A w}{w^\top w} \\& \ge \frac{\hat{\alpha}^\top (\hat{\alpha}\hat{\alpha}^\top+\hat{\beta}\hat{\beta}^\top) \hat{\alpha}}{\hat{\alpha}^\top \hat{\alpha}}\\ & \ge \sum_{i=1}^d \alpha_i^2 \end{align}$

Therefore $\lambda_1 \ge (1-\epsilon)d$.
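These eigenvalue estimates are easy to check numerically; the sketch below computes $\lambda_1,\lambda_2$ via the $2\times 2$ Gram matrix of $[\hat{\alpha}\;\hat{\beta}]$, which has the same nonzero eigenvalues as $A$ (the values of $d$ and $\sigma$ are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(3)
d, sigma = 5000, 0.1

a = rng.standard_normal(d)            # alpha_i ~ N(0, 1)
b = sigma * rng.standard_normal(d)    # beta_i  ~ N(0, sigma^2)

# A = a a^T + b b^T has the same nonzero eigenvalues as the 2x2 Gram matrix
# [[<a,a>, <a,b>], [<a,b>, <b,b>]].
gram = np.array([[a @ a, a @ b],
                 [a @ b, b @ b]])
lam2, lam1 = np.linalg.eigvalsh(gram)   # eigvalsh returns ascending order

print("lam1 + lam2 =", lam1 + lam2, " vs (1 + sigma^2) d =", (1 + sigma**2) * d)
print("lam1        =", lam1,        " vs d               =", d)
print("lam2        =", lam2,        " vs sigma^2 d       =", sigma**2 * d)
```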
Consider $\sum_{i=1}^d \alpha_i \beta_i$.

$\begin{align}\|\alpha_i \beta_i\|_{\psi_1} & = \sup_{p\ge 1}\frac{\left(E|\alpha_i|^p\,E|\beta_i|^p\right)^{1/p}}{p} \\ & = \sup_{2p\ge 1}\frac{\left((2p-1)!!(2p-1)!!\sigma^{2p}\right)^{1/2p}}{2p} \le \sigma^2\end{align}$

Then

$Pr\left(|\sum_{i=1}^d \alpha_i \beta_i| \ge \epsilon \sigma^2 d\right) \le 2e^{-c\min\left(\epsilon^2 d,\epsilon d\right)}$

Letting $\epsilon = \sigma^2$, we get that, with prob. $\ge 1-4e^{-c \frac{\sigma^4 d}{16}}$,

$\lambda_1 \ge (1-\sigma^2)d$

and

$\lambda_2 \le (1+\sigma^2)^2d-\lambda_1 \le (1+\sigma^2)^2d-(1-\sigma^2)d = (3\sigma^2+\sigma^4)d \le 4\sigma^2d$

Assume $\theta$ is the angle between $\hat{\alpha}$ and $\hat{\beta}$. With prob. $\ge 1-2e^{-c\epsilon^2 d}-4e^{-c \frac{\sigma^4 d}{16}}$,

$|\sum_{i=1}^d \alpha_i \beta_i| = \sqrt{\sum_{i=1}^d \alpha_i^2}\sqrt{\sum_{i=1}^d \beta_i^2}\,|\cos\theta| \le \epsilon \sigma^2 d$

$\sqrt{(1-\sigma^2)d}\sqrt{(1-\sigma^2)\sigma^2 d}\,|\cos\theta| \le \sqrt{\sum_{i=1}^d \alpha_i^2}\sqrt{\sum_{i=1}^d \beta_i^2}\,|\cos\theta| \le \epsilon \sigma^2 d$

$|\cos\theta| \le \frac{\epsilon \sigma}{1-\sigma^2} \le 2\epsilon \sigma$

We can take

$\tilde{\alpha} = \hat{\alpha}\cos(\pi/4-\theta/2)+\hat{\beta}\sin(\pi/4-\theta/2) = \hat{\alpha}\left(\frac{1}{\sqrt{2}}\cos \frac{\theta}{2}+\frac{1}{\sqrt{2}}\sin \frac{\theta}{2}\right)+\hat{\beta}\left(\frac{1}{\sqrt{2}}\cos \frac{\theta}{2}-\frac{1}{\sqrt{2}}\sin \frac{\theta}{2}\right)$

$\tilde{\beta} = \hat{\beta}\cos(\pi/4-\theta/2)+\hat{\alpha}\sin(\pi/4-\theta/2) = \hat{\beta}\left(\frac{1}{\sqrt{2}}\cos \frac{\theta}{2}+\frac{1}{\sqrt{2}}\sin \frac{\theta}{2}\right)+\hat{\alpha}\left(\frac{1}{\sqrt{2}}\cos \frac{\theta}{2}-\frac{1}{\sqrt{2}}\sin \frac{\theta}{2}\right)$

Note that when $\cos \theta$ is small, $\cos \frac{\theta}{2} -\sin \frac{\theta}{2}$ is small. Then $\tilde{\alpha} \approx \hat{\alpha}$ and $\tilde{\beta} \approx \hat{\beta}$, and therefore the incoherence itself would be small. We can see that

$\|\tilde{\alpha}\|^2 = \|\hat{\alpha}\|^2 \left(\frac{1}{2}+\frac{\sin \theta}{2}\right)+\|\hat{\beta}\|^2 \left(\frac{1}{2}-\frac{\sin \theta}{2}\right)+\hat{\alpha}^\top\hat{\beta}\cos\theta$

$\|\tilde{\beta}\|^2 = \|\hat{\beta}\|^2 \left(\frac{1}{2}+\frac{\sin \theta}{2}\right)+\|\hat{\alpha}\|^2 \left(\frac{1}{2}-\frac{\sin \theta}{2}\right)+\hat{\alpha}^\top\hat{\beta}\cos\theta$

and $\sin \theta \ge \sqrt{1-4\epsilon^2 \sigma^2}$. Moreover,

$\begin{align}\tilde{\alpha}^\top \tilde{\beta}&=\hat{\alpha}^\top \hat{\alpha} \sin(\pi/4-\theta/2)\cos(\pi/4-\theta/2)+\hat{\beta}^\top \hat{\beta} \sin(\pi/4-\theta/2)\cos(\pi/4-\theta/2)+\hat{\alpha}^\top \hat{\beta} \\ &=\left(\hat{\alpha}^\top \hat{\alpha} +\hat{\beta}^\top \hat{\beta}\right)\frac{\cos \theta}{2}+\hat{\alpha}^\top\hat{\beta}\\ &\le \left(\|\hat{\alpha}\|+\|\hat{\beta}\|\right)^2\epsilon\sigma\end{align}$

Note that if we need the success probability to be high (i.e. $1-2e^{-c\epsilon^2 d} \ge 1-1/n^\gamma$), we would need $e^{c\epsilon^2 d} \ge 2n^\gamma \implies d = \Omega\left(\frac{\log n}{\epsilon^2}\right)$.

#### Candidate Algorithm Idea (Assuming you know $U$)

For each $x_t$:

1. Check if $x_t$ is $U$-only.
2. If $x_t$ is $U$-only, derive $\alpha$ by using the observed non-zero entries of $U$ and return $\alpha U$.
3. Else, run the reduction until we get $\hat{x}_t = \hat{\alpha}U + \beta V$ with $\hat{\alpha}\approx \beta$, and then use $\Omega_t y_t = \Omega_t \hat{x}_t/\|\Omega_t \hat{x}_t\|$ for matrix completion.

Since at step 3 we scale so that $\hat{\alpha} \approx \beta$, the resulting $y_t$ has both coefficients $\approx \frac{1}{\sqrt{2}}$. A rough sketch of this per-column procedure is given below.
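The sketch below is a minimal numpy rendering of steps 1--3, assuming $U$ is known; the `ratio_band` test, the fixed iteration count `L`, and all names are placeholder assumptions standing in for the unresolved choices above (e.g. the stopping rule and the guess for $L$), not a worked-out algorithm.

```python
import numpy as np

def process_column(x_obs, obs_mask, U, p, L, rng, ratio_band=(0.8, 1.2)):
    """Steps 1-3 above for a single observed column; entries of x_obs outside
    obs_mask are never read.  Returns (completed column, None) in the U-only
    case, and (None, normalized observed residual) otherwise."""
    n = U.shape[0]
    q = p / L

    # Step 1: crude "is this column U-only?" test via the ratio z / alpha_tilde^2.
    half = rng.random(n) < 0.5
    om1, om2 = obs_mask & half, obs_mask & ~half
    alpha_tilde = (U[om1] @ x_obs[om1]) / (p / 2)
    z = np.sum(x_obs[om2] ** 2) / (p / 2)

    if ratio_band[0] <= z / alpha_tilde ** 2 <= ratio_band[1]:
        # Step 2: treat the column as alpha * U; alpha_tilde is one estimate of alpha.
        return alpha_tilde * U, None

    # Step 3: run the iterative reduction to bring the U-coefficient down to ~ beta,
    # then hand the normalized observed residual to matrix completion.
    x_l = x_obs.copy()
    for _ in range(L):
        om_l = obs_mask & (rng.random(n) < 1.0 / L)
        a_hat = (U[om_l] @ x_l[om_l]) / q
        x_l = x_l - a_hat * U
    y = np.where(obs_mask, x_l, 0.0)
    return None, y / np.linalg.norm(y)
```

In a stream, one would call this once per column and feed the returned normalized residuals into whatever completion routine is used for the well-spread columns.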