# Convexity of the Loss Function of GAN on a Univariate Gaussian Distribution

## Problem setup

### Distributions

* Target: $N_0(x) = \frac{1}{(2 \pi)^{d/2}} e^{-\frac{1}{2}x^Tx}$
* Generator: $N_{\mu}(x) = \frac{1}{(2 \pi)^{d/2}} e^{-\frac{1}{2}(x-\mu)^T(x-\mu)}$

### Loss function

$$
L(\mu) = \sup_{D} \left(E_{x \sim N_0} \log(D(x)) + E_{x \sim N_{\mu}} \log(1 - D(x))\right )
$$

### Perfect Discriminator

$$
D^*(\mu; x) = \frac{e^{-\frac{1}{2}x^Tx}}{e^{-\frac{1}{2}(x-\mu)^T(x-\mu)} + e^{-\frac{1}{2}x^Tx}} = \frac{1}{1 + e^{-\frac{1}{2}\mu^T\mu + x^T\mu}}
$$

## Derivative for univariate normal (holding the other variable constant)

### Derivative for mean

#### First Order

Using the envelope theorem, we have:

$$
\nabla_{\mu} L(\mu) = \nabla_{\mu} E_{x \sim N_{\mu}} \log\left( 1 - D^*(\mu; x) \right) = E_{x \sim N_{\mu}}[\log\left( 1 - D^*(\mu; x) \right) (x - \mu)]
$$

Applying Stein's lemma,

\begin{align*}
\nabla_{\mu} L(\mu) &= E_{x \sim N_{\mu}} [\nabla_x \log\left( 1 - D^*(\mu; x) \right)] \\
&= E_{x \sim N_{\mu}}\left[ \frac{D^*(\mu; x)^2}{1- D^*(\mu;x)}\ \frac{N_{\mu}(x)}{N_{0}(x)}\ \mu \right] \\
&= \mu\ E_{x \sim N_{\mu}}[D^*(\mu; x)]
\end{align*}

Observe that $E_{x \sim N_{\mu}}[D^*(\mu; x)] > 0$ for any $\mu \in \mathbb{R}^d$. Therefore, the only critical point of $L(\mu)$ is $0$. Moreover, for any direction $v \in \mathbb{R}^d$, the directional derivative along $v$ at the point $\rho v$ is negative for $\rho < 0$ and positive for $\rho > 0$. Therefore, $0$ is the unique global minimum of $L(\mu)$.

#### Second Order

$$
\frac{\partial^2 L}{\partial \mu^2} = E_{x \sim N_{\mu}}[ D^*(x)] + \mu \cdot E_{x \sim N_{\mu}}[ D^*(x) \cdot (x-\mu)] - \mu \cdot E_{x \sim N_{\mu}}\left[ D^*(x)^2 \cdot \frac{N_{\mu}(x)}{N_{0}(x)} \cdot (x-\mu)\right] \\
= E_{x \sim N_{\mu}}[ D^*(x)] + \mu \cdot E_{x \sim N_{\mu}}\left[ D^*(x) \cdot (x-\mu) \cdot \left( 1 - D^*(x) \cdot \frac{N_{\mu}(x)}{N_{0}(x)} \right)\right] \\
= E_{x \sim N_{\mu}}[ D^*(x)] + \mu \cdot E_{x \sim N_{\mu}}[ D^*(x)^2 \cdot (x-\mu)]
$$

Again, applying Stein's lemma:

$$
\frac{\partial^2 L}{\partial \mu^2} = E_{x \sim N_{\mu}}[ D^*(x)] + \mu \cdot E_{x \sim N_{\mu}}\left[ 2 \cdot D^*(x) \cdot (-1) \cdot D^*(x)^2 \cdot \frac{N_{\mu}(x)}{N_{0}(x)} \cdot \mu\right] \\
= E_{x \sim N_{\mu}}\left[ D^*(x) \cdot \left(1 - 2 \cdot \mu^2 \cdot D^*(x) \cdot \left(1 - D^*(x)\right) \right)\right]
$$

A trivial bound can be obtained to ensure the positivity of this expression. Since $0 \leq D^*(x) \leq 1$, we have $D^*(x) \cdot (1 - D^*(x)) \leq \frac{1}{4}$. Thus, if $-\sqrt{2} \leq \mu \leq \sqrt{2}$, the second derivative is always positive.

I plotted the expression for different values of $\mu$ and obtained the following graph:

![](https://i.imgur.com/dSxCYCj.png)

~~So it seems like the bound is actually quite tight?~~ (I have made a mistake in my program.)
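As a quick sanity check of the closed-form gradient above, the short script below (my own addition, not part of the original derivation; it assumes NumPy/SciPy) compares $\mu\, E_{x \sim N_{\mu}}[D^*(\mu; x)]$ against a central finite difference of $L(\mu)$, both evaluated by numerical integration in one dimension.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import expit
from scipy.stats import norm

def d_star(mu, x):
    # Optimal discriminator for N(0,1) vs N(mu,1): 1 / (1 + exp(-mu^2/2 + x*mu))
    return expit(0.5 * mu * mu - x * mu)

def loss(mu):
    # L(mu) = E_{x~N(0,1)}[log D*(x)] + E_{x~N(mu,1)}[log(1 - D*(x))],
    # with the logs computed stably via logaddexp.
    def f(x):
        t = -0.5 * mu * mu + x * mu
        log_d = -np.logaddexp(0.0, t)            # log D*(mu; x)
        log_1md = t - np.logaddexp(0.0, t)       # log(1 - D*(mu; x))
        return norm.pdf(x, 0, 1) * log_d + norm.pdf(x, mu, 1) * log_1md
    return quad(f, -20, 20, limit=200)[0]

def grad_closed_form(mu):
    # mu * E_{x~N(mu,1)}[D*(mu; x)]
    return mu * quad(lambda x: norm.pdf(x, mu, 1) * d_star(mu, x), -20, 20, limit=200)[0]

h = 1e-4
for mu in [0.3, 0.8, 1.5, 2.5]:
    fd = (loss(mu + h) - loss(mu - h)) / (2 * h)
    print(f"mu={mu:.1f}  finite difference={fd:.6f}  closed form={grad_closed_form(mu):.6f}")
```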
There should be a more relaxed bound, but I am unable to obtain a symbolic solution.

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as st

# How many points are used to evaluate the expectation
sample_size = 1000
# Half-width of the integration window around mu
half_range = 100


def eva(mu):
    """Numerically evaluate E_{x~N(mu,1)}[D(x) - 2*mu^2*D(x)^2*(1 - D(x))]."""
    mean = mu
    std = 1
    xmin = mu - half_range
    xmax = mu + half_range
    x = np.linspace(xmin, xmax, sample_size)
    dx = x[1] - x[0]
    value = st.norm.pdf(x, mean, std)

    def D(pt):
        # Optimal discriminator for N(0,1) vs N(mu,1)
        return 1 / (1 + math.exp(-0.5 * mu * mu + pt * mu))

    ret = 0
    for pt, pdf in zip(x, value):
        D_pt = D(pt)
        # Riemann-sum approximation of the expectation
        ret += pdf * (D_pt - 2 * D_pt * D_pt * (1 - D_pt) * mu * mu) * dx
    return ret


# Range of mu to be used in plotting
mus = np.linspace(1, 2, 50)
res = [eva(mu) for mu in mus]

plt.scatter(mus, res, label='Second Derivative')
plt.plot(mus, np.zeros(len(mus)), label="zero")
plt.legend()
plt.show()
```

### Derivative for Variance

Target distribution: $N_t(\mu^*=0, \sigma^* = 1)$. Initialization: $N_i(0, \sigma)$.

$$
\frac{\partial L}{\partial \sigma} = \mathbb{E}_{x \sim N_i}\left[\log\left(1 - D\left(x\right) \right) \cdot \left(\frac{x^2}{\sigma^3} - \sigma^{-1}\right)\right] \\
= \mathbb{E}_{x \sim N_i}\left[ \sigma^{-2} \cdot \frac{x^2}{\sigma} \cdot \log \left(1 - D(x) \right)\right] - \mathbb{E}_{x \sim N_i}\left[ \sigma^{-1}\log(1-D(x))\right]
$$

Applying Stein's lemma with $g(x) = \frac{x}{\sigma} \cdot \log(1 - D(x))$:

$$
\frac{\partial L}{\partial \sigma} = \mathbb{E}_{x \sim N_i}\left[|\sigma|^{-1} \cdot D(x) \cdot x^2 \cdot (1 - \sigma^{-2})\right]
$$

Thus, we reach the same conclusion as in the case where $\mu$ is unknown and $\sigma=1$: the gradient is negative for $\sigma < 1$ and positive for $\sigma > 1$, so $\sigma^* = 1$ is the unique minimizer.

A graph of the gradient:

![](https://i.imgur.com/QHkJx0D.png)

The function is clearly still not convex, since the gradient vanishes as $\sigma$ moves further away from $\sigma^*=1$.

### Derivative when both mean and variance are changing

#### Mean

$$
\frac{\partial L}{\partial \mu} = \mathbb{E}\left[ D(x) \cdot \left(x \cdot (1- \sigma ^{-2}) + \frac{\mu}{\sigma^2}\right)\right]
$$

##### Case: $\sigma>1$

Since $\sigma > 1$ and (without loss of generality) $\mu > 0$, it suffices to argue that $\mathbb{E}[D(x)\cdot x]$ alone is non-negative. Dropping the constant factor $\frac{1}{\sqrt{2\pi}}$,

$$
\mathbb{E}[D(x)\cdot x]
= \int_{-\infty}^{\infty}\frac{x}{\sigma \cdot e^{\frac{(x-\mu)^2}{2\sigma^2}} + e^{\frac{x^2}{2}}}\,dx \\
= \int_{0}^{\infty}\frac{x}{\sigma \cdot e^{\frac{(x-\mu)^2}{2\sigma^2}} + e^{\frac{x^2}{2}}}\,dx + \int_{-\infty}^{0}\frac{x}{\sigma \cdot e^{\frac{(x-\mu)^2}{2\sigma^2}} + e^{\frac{x^2}{2}}}\,dx \\
= \int_{0}^{\infty}\frac{x}{\sigma \cdot e^{\frac{(x-\mu)^2}{2\sigma^2}} + e^{\frac{x^2}{2}}}\,dx + \int_{0}^{\infty}\frac{-x}{\sigma \cdot e^{\frac{(x+\mu)^2}{2\sigma^2}} + e^{\frac{x^2}{2}}}\,dx \\
= \int_{0}^{\infty}x \cdot \left(\frac{1}{\sigma \cdot e^{\frac{(x-\mu)^2}{2\sigma^2}} + e^{\frac{x^2}{2}}} - \frac{1}{\sigma \cdot e^{\frac{(x+\mu)^2}{2\sigma^2}} + e^{\frac{x^2}{2}}}\right)dx \\
= \int_{0}^{\infty}x \cdot \frac{\sigma\left(e^{\frac{(x+\mu)^2}{2\sigma^2}} - e^{\frac{(x-\mu)^2}{2\sigma^2}}\right)}{\text{Denominator}}\,dx
$$

where the third line substitutes $x \to -x$ in the second integral (using $(-x-\mu)^2 = (x+\mu)^2$) and the denominator in the last line is the positive product of the two denominators above. Since $\mu$ is positive, for $x>0$ we have $(x+\mu)^2 \geq (x-\mu)^2$. Therefore, the whole expression is non-negative.
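The following small script (again my own sanity check, assuming SciPy) integrates $\mathbb{E}_{x \sim N(\mu,\sigma)}[D(x)\, x]$ numerically over a grid of $\mu \geq 0$ and $\sigma$; every value should come out non-negative. Note that the symmetrization argument above never actually uses $\sigma > 1$, and the $\sigma < 1$ case below relies on the same fact.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import expit
from scipy.stats import norm

def e_dx(mu, sigma):
    # E_{x~N(mu,sigma)}[D(x) * x]  with  D(x) = N_0(x) / (N_0(x) + N_{mu,sigma}(x))
    def f(x):
        log_p0 = norm.logpdf(x, 0.0, 1.0)
        log_pg = norm.logpdf(x, mu, sigma)
        d = expit(log_p0 - log_pg)               # optimal discriminator, computed stably
        return np.exp(log_pg) * d * x
    return quad(f, -20, 20, limit=200)[0]

for mu in [0.0, 0.5, 1.0, 2.0]:
    for sigma in [0.5, 1.0, 1.5, 3.0]:
        # Every value should be >= 0 for mu >= 0
        print(f"mu={mu:.1f}  sigma={sigma:.1f}  E[D(x) x] = {e_dx(mu, sigma):.6f}")
```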
##### Case: $\sigma<1$

We instead argue about $\frac{\partial L}{\partial \mu} \cdot \sigma^2 = \mathbb{E}\left[ D(x)\cdot (\mu + x \cdot (\sigma^2-1))\right]$. Since we know that $\sigma^2 \cdot \mathbb{E}[D(x)\, x]$ is non-negative, we have:

$$
\frac{\partial L}{\partial \mu} \cdot \sigma^2 \geq \mathbb{E}\left[ D(x)\cdot (\mu - x)\right] \\
= \int_{-\infty}^{\infty}\frac{\mu - x}{\sigma \cdot e^{\frac{(x-\mu)^2}{2\sigma^2}} + e^{\frac{x^2}{2}}}\,dx \\
= -\int_{-\infty}^{\infty}\frac{x - \mu}{\sigma \cdot e^{\frac{(x-\mu)^2}{2\sigma^2}} + e^{\frac{x^2}{2}}}\,dx
$$

Now, substituting $y = x-\mu$, we have:

$$
\frac{\partial L}{\partial \mu} \cdot \sigma^2 \geq -\int_{-\infty}^{\infty}\frac{y}{\sigma \cdot e^{\frac{y^2}{2\sigma^2}} + e^{\frac{(y+\mu)^2}{2}}}\,dy
$$

Following an argument similar to the one in Case $\sigma > 1$, we have $\int_{-\infty}^{\infty}\frac{y}{\sigma \cdot e^{\frac{y^2}{2\sigma^2}} + e^{\frac{(y+\mu)^2}{2}}}\,dy \leq 0$. Thus, $\frac{\partial L}{\partial \mu} \cdot \sigma^2 \geq 0$.

#### Variance

$$
\frac{\partial L}{\partial \sigma} = \mathbb{E}\left[ D(x) \cdot \left(x \cdot (1- \sigma ^{-2}) + \frac{\mu}{\sigma^2}\right) \cdot \frac{x-\mu}{\sigma}\right]
$$

## Next Step

* Write the bounds obtained in terms of the target distribution's mean
* Generalize to higher dimensions
* Figure out whether there is a similar bound for the covariance/variance
* See what happens if we use a truncated normal distribution

## Truncated normal distribution

### First order

$$
\frac{\partial L}{\partial \mu} = \frac{\partial}{\partial \mu}E_{x \in P_{G}} \log\left( 1 - D^*(x) \right) \\
= E_{x \in P_{G}}\left[ \log\left( 1 - D^*(x) \right) \cdot (x - \mu)\right] - E_{x \in P_{G}}\left[\log(1-D^*(x))\right] \cdot E_{x \in P_{G}}\left[x-\mu\right] \\
= E_{x \in P_{G}}\left[ \log\left( 1 - D^*(x) \right) \cdot (x - \mu)\right]
$$

We may need an analogue of Stein's lemma for the truncated normal distribution.

## Derivative for $(\mu, \sigma)$

We have

$$
\frac{\partial}{\partial \mu} L(\mu, \sigma)= E_{x \sim N(\mu, \sigma)}[ D^*(\mu, \sigma;x)(x (1-1/\sigma^2) + \mu/\sigma^2)]
$$

We want to prove that it is positive for $\mu > 0$. First, assume that $\sigma > 1$. Then it suffices to show that $E_{x \sim N(\mu,\sigma)}[D^*(\mu, \sigma;x)x] > 0$. We have

$$
E_{x \sim N(\mu,\sigma)}[D^*(\mu, \sigma;x)x] = \int_{-\infty}^{\infty} \frac{1}{\frac{1}{N_0(x)}+ \frac{1}{N(\mu, \sigma;x)}}\, x \, d x.
$$

We can lower bound the above integral over $\mu \geq 0$ by making the contribution of the negative $x$'s as large in magnitude as possible, i.e. by making the density $N(\mu,\sigma;x)$ as large as possible for $x<0$. Given that $\mu \geq 0$, the worst case is $\mu = 0$. Therefore

$$
E_{x \sim N(\mu,\sigma)}[D^*(\mu, \sigma;x)x] \geq E_{x \sim N(0,\sigma)}[D^*(0, \sigma;x)x]
$$

Moreover, notice that $D^*(0, \sigma;x) = D^*(0, \sigma; -x)$, and therefore $E_{x \sim N(0,\sigma)}[D^*(0, \sigma;x)x] =0$.
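Below is a rough numerical check (my own, assuming SciPy) of the two closed-form partial derivatives for the $(\mu, \sigma)$ case, comparing them against central finite differences of $L(\mu, \sigma)$ computed by numerical integration; the $\partial/\partial\sigma$ expression is the one stated in the Variance paragraph above.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import expit
from scipy.stats import norm

def loss(mu, sigma):
    # L(mu, sigma) = E_{x~N(0,1)}[log D*(x)] + E_{x~N(mu,sigma)}[log(1 - D*(x))]
    def f(x):
        l0, lg = norm.logpdf(x, 0, 1), norm.logpdf(x, mu, sigma)
        lse = np.logaddexp(l0, lg)               # log(N_0(x) + N_{mu,sigma}(x))
        return np.exp(l0) * (l0 - lse) + np.exp(lg) * (lg - lse)
    return quad(f, -20, 20, limit=200)[0]

def closed_form_grads(mu, sigma):
    # dL/dmu    = E[D*(x) (x(1 - 1/sigma^2) + mu/sigma^2)]
    # dL/dsigma = E[D*(x) (x(1 - 1/sigma^2) + mu/sigma^2) (x - mu)/sigma]
    def base(x):
        l0, lg = norm.logpdf(x, 0, 1), norm.logpdf(x, mu, sigma)
        d = expit(l0 - lg)                       # D*(x) = N_0 / (N_0 + N_{mu,sigma})
        return np.exp(lg) * d * (x * (1 - 1 / sigma**2) + mu / sigma**2)
    d_mu = quad(base, -20, 20, limit=200)[0]
    d_sigma = quad(lambda x: base(x) * (x - mu) / sigma, -20, 20, limit=200)[0]
    return d_mu, d_sigma

h = 1e-4
for mu, sigma in [(0.5, 1.5), (1.0, 0.7), (2.0, 2.0)]:
    fd_mu = (loss(mu + h, sigma) - loss(mu - h, sigma)) / (2 * h)
    fd_sigma = (loss(mu, sigma + h) - loss(mu, sigma - h)) / (2 * h)
    print((mu, sigma), (fd_mu, fd_sigma), closed_form_grads(mu, sigma))
```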
## Normal variable with a ReLU layer

### Gradient with respect to $\sigma$

* Formulate with a piece-wise function
* No longer a plain expectation: the integration is only over half of the space

### Case when both $\mu$ and $\sigma$ are changing (with respect to $\mu$)

$$
\frac{ \partial L } {\partial \mu} = \exp\left(-\frac{\mu_g^2}{2\sigma_g^2}\right) \cdot \log\left(\frac{1 - \lim_{x \to 0}D(x)}{1 - D(0)}\right) + \frac{1}{\sigma_g^2} \cdot \int_{0}^{\infty} D(x) \cdot \exp\left(\frac{ - (x - \mu_g)^2}{2\sigma_g^2}\right) \cdot (\mu_g - \mu_t \cdot \sigma_g^2 + x \cdot (\sigma_g^2 - 1))\,dx
$$

where $\frac{1 - \lim_{x \to 0}D(x)}{1 - D(0)}$ can be reduced to evaluating $x^2 (1 - \sigma_g^2) + 2x(\mu_t \cdot \sigma_g^2 - \mu_g)$.

### Simpler case: when $\sigma = 1$

$$
\frac{ \partial L } {\partial \mu} = \exp\left(-\frac{\mu_g^2}{2\sigma_g^2}\right) \cdot \log\left(\frac{1 - \lim_{x \to 0}D(x)}{1 - D(0)}\right) + \frac{1}{\sigma_g^2} \cdot \int_{0}^{\infty} D(x) \cdot \exp\left(\frac{ - (x - \mu_g)^2}{2\sigma_g^2}\right) \cdot (\mu_g - \mu_t)\,dx
$$

where $\frac{1 - \lim_{x \to 0}D(x)}{1 - D(0)}$ can be reduced to evaluating $2x(\mu_t - \mu_g)$.

### Case when both $\mu$ and $\sigma$ are changing (with respect to $\sigma$)

$$
\frac{ \partial L } {\partial \sigma} = \mu_g \cdot \exp\left(-\frac{\mu_g^2}{2\sigma_g^2}\right) \cdot \log\left(\frac{1 - D(0)}{1 - \lim_{x \to 0}D(x)}\right) + \int_{0}^{\infty} D(x) \cdot \exp\left(\frac{ - (x - \mu_g)^2}{2\sigma_g^2}\right) \cdot (x - \mu_g) \cdot (\mu_g - \mu_t \cdot \sigma_g^2 + x \cdot (\sigma_g^2 - 1))\,dx
$$

#### Simpler Case of Case II: $\mu = \mu_g = \mu_t$

$$
\frac{ \partial L } {\partial \sigma} = \mu \cdot \exp\left(-\frac{\mu^2}{2\sigma_g^2}\right) \cdot \log\left(\frac{1 - D(0)}{1 - \lim_{x \to 0}D(x)}\right) + \int_{0}^{\infty} D(x) \cdot \exp\left(\frac{ - (x - \mu)^2}{2\sigma_g^2}\right) \cdot (x - \mu)^2 \cdot (\sigma_g^2 - 1)\,dx
$$

We focus on showing that $\frac{1 - D(0)}{1 - \lim_{x \to 0}D(x)} \geq 1$ when $\sigma_g > 1$, and vice versa. Expanding it (and evaluating the densities in the second factor at $x = 0$), we have:

$$
\frac{1 - D(0)}{1 - \lim_{x \to 0}D(x)} = \frac{ \sigma_g^{-1} \cdot \int_{-\infty}^{0} \exp \left( \frac{-(x - \mu)^2}{2 \sigma_g^2} \right) dx }{ \sigma_g^{-1} \cdot \int_{-\infty}^{0} \exp \left( \frac{-(x - \mu)^2}{2 \sigma_g^2} \right) dx + \int_{-\infty}^{0} \exp\left(\frac{-(x - \mu)^2}{2}\right) dx} \cdot \frac{ \sigma_g^{-1} \cdot \exp \left(\frac{- \mu^2}{2 \sigma_g^2} \right) + \exp \left( \frac{- \mu^2}{2} \right) }{ \sigma_g^{-1} \cdot \exp \left(\frac{- \mu^2}{2 \sigma_g^2} \right) }
$$

Cross-multiplying, the ratio is at least $1$ exactly when the following difference (**numerator** minus **denominator** after clearing fractions) is non-negative:

$$
\int_{-\infty}^{0} \exp\left( \frac{- (x - \mu)^2 }{2\sigma_g^2} - \frac{\mu^2}{2} \right)dx - \int_{-\infty}^{0} \exp\left( \frac{- (x - \mu)^2}{2} - \frac{\mu^2}{2\sigma_g^2} \right) dx
$$

Directly comparing the exponents gives us:

$$
-\frac{(x - \mu)^2}{2\sigma_g^2} - \frac{\mu^2}{2} + \frac{(x - \mu)^2}{2} + \frac{\mu^2}{2\sigma_g^2} = \frac{\sigma_g^2 - 1}{2\sigma_g^2} \cdot \left( (x - \mu)^2 - \mu^2 \right) = \frac{\sigma_g^2 - 1}{2\sigma_g^2} \cdot ( x^2 - 2 \mu x)
$$

In the case $\mu>0$: since the range of integration is $x \in (-\infty, 0)$, we easily have $x^2 - 2 \mu x > 0$. This gives us the conclusion we want.
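The claim for the $\mu > 0$ case can also be checked numerically. The sketch below is my own and encodes one reading of the ReLU setup: the target is $\mathrm{ReLU}(N(\mu, 1))$ and the generator is $\mathrm{ReLU}(N(\mu, \sigma_g^2))$, so $D(0)$ compares the point masses pushed onto the atom at $0$ while $\lim_{x \to 0^+} D(x)$ compares the densities just to the right of $0$. Under that reading, the ratio should be $\geq 1$ exactly when $\sigma_g \geq 1$.

```python
import numpy as np
from scipy.stats import norm

def ratio(mu, sigma_g):
    # Target: ReLU(N(mu, 1)); generator: ReLU(N(mu, sigma_g^2)).  (My assumption.)
    m_t = norm.cdf(0, loc=mu, scale=1.0)        # target mass pushed onto the atom at 0
    m_g = norm.cdf(0, loc=mu, scale=sigma_g)    # generator mass pushed onto the atom at 0
    p_t = norm.pdf(0, loc=mu, scale=1.0)        # target density just right of 0
    p_g = norm.pdf(0, loc=mu, scale=sigma_g)    # generator density just right of 0
    one_minus_D0 = m_g / (m_t + m_g)            # 1 - D(0)
    one_minus_Dlim = p_g / (p_t + p_g)          # 1 - lim_{x -> 0+} D(x)
    return one_minus_D0 / one_minus_Dlim

for mu in [0.1, 0.5, 1.0, 3.0]:
    for sigma_g in [0.3, 0.8, 1.0, 1.5, 4.0]:
        r = ratio(mu, sigma_g)
        ok = (r >= 1.0) == (sigma_g >= 1.0)
        print(f"mu={mu:3.1f}  sigma_g={sigma_g:3.1f}  ratio={r:7.4f}  {'OK' if ok else 'violated'}")
```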
In the case $\mu<0$, we need to rewrite the integrals in $\frac{1 - D(0)}{1 - \lim_{x \to 0}D(x)}$.

Denote $A = \sigma_g^{-1} \cdot \int_{-\infty}^{\infty} \exp\left( \frac{ -(x - \mu)^2 }{2 \sigma_g^2} \right)dx$ and $B = \int_{-\infty}^{\infty} \exp\left( \frac{ -(x - \mu)^2 }{2 } \right)dx$.

$$
\frac{1 - D(0)}{1 - \lim_{x \to 0}D(x)} = \frac{ A - \sigma_g^{-1} \cdot \int_{0}^{\infty} \exp \left( \frac{-(x - \mu)^2}{2\sigma_g^2} \right) dx }{ A + B - \sigma_g^{-1} \cdot \int_{0}^{\infty} \exp \left( \frac{-(x - \mu)^2}{2 \sigma_g^2} \right) dx - \int_{0}^{\infty} \exp\left(\frac{-(x - \mu)^2}{2}\right) dx} \cdot \frac{ \sigma_g^{-1} \cdot \exp \left(\frac{- \mu^2}{2 \sigma_g^2} \right) + \exp \left( \frac{- \mu^2}{2} \right) }{ \sigma_g^{-1} \cdot \exp \left(\frac{- \mu^2}{2 \sigma_g^2} \right) }
$$

Cross-multiplying as before, we need the sign of:

$$
\left( A \cdot \sigma_g \exp\left(-\frac{\mu^2}{2}\right) - B \cdot \exp\left( -\frac{\mu^2}{2\sigma_g^2} \right) \right) + \int_{0}^{\infty} \exp\left(- \frac{(x - \mu)^2}{2} - \frac{\mu^2}{2\sigma_g^2}\right) - \exp\left( -\frac{(x - \mu)^2}{2\sigma_g^2} - \frac{\mu^2}{2} \right) dx
$$

**FACT**: $A = B = \sqrt{2\pi}$, so $A \cdot \sigma_g \geq B$ when $\sigma_g \geq 1$, and vice versa. Then the first part reduces to comparing

$$
\exp\left(-\frac{\mu^2}{2}\right) < \exp\left(-\frac{\mu^2}{2\sigma_g^2}\right)
$$

The second part reduces to the sign of $(x^2 - 2x\mu) \cdot (1 - \sigma_g^2)$.

Now, we have $x > 0$ and $\mu < 0$. Both parts are greater than $0$ when $\sigma_g < 1$, and vice versa. The reversed behavior is corrected by the $\mu$ multiplied in front of the log term.

### New observations

Denote $A = \exp\left(-\frac{\mu_g^2}{2\sigma_g^2}\right) \cdot \log\left(\frac{1 - \lim_{x \to 0}D(x)}{1 - D(0)}\right)$ and $B = D(x) \cdot \exp\left(\frac{ - (x - \mu_g)^2}{2\sigma_g^2}\right)$.

#### Case I: $\frac{\partial L}{\partial \mu_g}$

If $\mu_g \geq \mu_t$, then $\frac{\partial L}{\partial \mu} \geq A + \sigma_g^{-2} \cdot \int_{0}^{\infty} B \cdot (x - \mu_g) \cdot (\sigma_g^2-1)\, dx$.

If $\mu_g \leq \mu_t$, then $\frac{\partial L}{\partial \mu} \leq A + \sigma_g^{-2} \cdot \int_{0}^{\infty} B \cdot (x - \mu_g) \cdot (\sigma_g^2-1)\, dx$.

#### Case II: $\frac{\partial L}{\partial \sigma_g}$

I still haven't figured out how to bound $\int_{0}^{\infty} B \cdot (x - \mu_g)\, dx$.

Assume $\int_{0}^{\infty} B \cdot (x - \mu_g)\, dx \geq 0$. Then:

* if $\mu_g \geq \mu_t$, $\frac{\partial L}{\partial \sigma}\geq -A + \int_{0}^{\infty} B \cdot (x - \mu_g)^2 \cdot (\sigma_g^2 - 1)\, dx$;
* if $\mu_g \leq \mu_t$, $\frac{\partial L}{\partial \sigma} \leq -A + \int_{0}^{\infty} B \cdot (x - \mu_g)^2 \cdot (\sigma_g^2 - 1)\, dx$.

Assume instead $\int_{0}^{\infty} B \cdot (x - \mu_g)\, dx \leq 0$. Then:

* if $\mu_g \leq \mu_t$, $\frac{\partial L}{\partial \sigma} \geq -A + \int_{0}^{\infty} B \cdot (x - \mu_g)^2 \cdot (\sigma_g^2 - 1)\, dx$;
* if $\mu_g \geq \mu_t$, $\frac{\partial L}{\partial \sigma} \leq -A + \int_{0}^{\infty} B \cdot (x - \mu_g)^2 \cdot (\sigma_g^2 - 1)\, dx$.

It would make a lot of sense if the gradient were

$$
\mathbb{E}_{x \sim N_g}\left[ x \cdot ( I - \Sigma^{-1} ) \cdot x^T \cdot \Sigma^{-1} \cdot (W + W^T) \cdot D^*(W;x)\right]
$$

instead of

$$
\mathbb{E}_{x \sim N_g}\left[ x^T \cdot ( I - \Sigma^{-1}) \cdot x \cdot \Sigma^{-1} \cdot (W + W^T) \cdot D^*(W;x)\right]
$$

In the first case, the resulting matrix gradient for a diagonal matrix $\Lambda$ will be $\Lambda_{ij} = \lambda_{ij} - \frac{1}{\lambda_{ij}}$.

Gradient for a diagonal matrix:

$$
\frac{\partial}{\partial \lambda_j} L(\Lambda) = \int \log(1- D(x))\, N_{\Lambda}(x) \left(\frac{x_j^2}{\lambda_j^3} - \frac{1}{\lambda_j}\right) d x
$$

Reparametrize with respect to $W^{-1}$ (this is the new matrix that we are trying to estimate). Gradient for the matrix $W^{-1}$, with $y = W^{-1}x$.
Then $y$ is distributed according to $N(0, W^{-1} (W^{-1})^T) = N(0, (W^TW)^{-1})$, i.e. the density is

$$
N_W(x) = \frac{1}{(2 \pi)^{d/2}} \exp\left(-\frac 1 2 x^T W^T W x + \frac 12 \log |W^TW|\right)
$$

Therefore, the derivative with respect to $W$ is

$$
L'(W) = \int \log(1- D(x))\, N_W(x) (-W x x^T + {(W^{-1}})^T )\, d x
$$

We have

$$
\nabla_x N_{\Sigma}(x) = - \Sigma^{-1} x N_{\Sigma}(x).
$$

Therefore, integrating by parts (Stein's Lemma), we get

$$
\int N_{\Sigma}(x)\, x f(x)\, d x= \Sigma \int N_{\Sigma}(x) \nabla_x f(x)\, d x
$$

In particular,

$$
\int N_{\Sigma}(x)\, x x^T f(x)\, dx = \Sigma \int N_{\Sigma}(x) f(x)\, d x + \Sigma \int N_{\Sigma}(x) \nabla_x f(x)\, x^T d x
$$

We have

$$
L'(W) = (W^{-1})^T \int N_{W}(x)\, \nabla_x \left[\log(1-D(x))\right]\ x^T d x
$$

We have

$$\log(1-D(x)) = \log\left(\frac{1}{1+N_0(x)/N_W(x)} \right) $$

\begin{align*}
\nabla_x [\log(1 - D(x))] &= - \frac{1}{1 + N_0(x)/N_\Sigma(x)} \frac{N_0(x)}{N_\Sigma(x)}(\Sigma^{-1} - I)x \\
&= D(x)(I - \Sigma^{-1}) x
\end{align*}

---

I don't know if I have done this correctly.

$$
\begin{align*}
L'(W) &= (W^{-1})^T \int N_{W}(x) \nabla_x \left[\log(1-D(x))\right]\ x^T d x \\
&= (W^{-1})^T \int N(\Sigma^{-1};x) \cdot (1 - D(x)) \cdot \frac{N(I^{-1};x)}{N(\Sigma^{-1};x)} \cdot (I^{-1} - \Sigma^{-1}) \cdot xx^T \\
&= (W^{-1})^T \int \frac{N(\Sigma^{-1};x) \cdot N(I^{-1};x)}{N(\Sigma^{-1};x) + N(I^{-1};x)} \cdot (I^{-1} - \Sigma^{-1}) \cdot xx^T
\end{align*}
$$

For the diagonal elements, it is obvious that the sign depends solely on $1 - \lambda_{i,j}$.

For the off-diagonal elements, assume that we have a diagonal matrix; then the density function is $N(\Lambda^{-1};x) \propto \exp\left(-\tfrac{1}{2}\sum_i \lambda_i^{-1} x_i^2\right)$, which is symmetric with respect to each entry $x_i$. Consider an entry $(i,j)$ with $i \neq j$ in the resulting gradient matrix.

Observation:

$$
\frac{N(\Sigma^{-1};x) \cdot N(I^{-1};x)}{N(\Sigma^{-1};x) + N(I^{-1};x)}= \frac{N(\Sigma^{-1};-x) \cdot N(I^{-1};-x)}{N(\Sigma^{-1};-x) + N(I^{-1};-x)}
$$

when there is no correlation between entries in the two distributions.

\begin{align*}
L'(W)_{i,j} &= \lambda_i^{-1} \cdot (1 - \lambda_i^{-1}) \int_{x_1} \int_{x_2} \cdots \int_{x_i} \int_{x_j} \frac{N(\Sigma^{-1};x) \cdot N(I^{-1};x)}{N(\Sigma^{-1};x) + N(I^{-1};x)} \cdot x_ix_j \\
&= \lambda_i^{-1} \cdot (1 - \lambda_i^{-1}) \int_{x_1} \int_{x_2} \cdots \int_{x_i} x_i \cdot \int_{x_j} \frac{N(\Sigma^{-1};x) \cdot N(I^{-1};x)}{N(\Sigma^{-1};x) + N(I^{-1};x)} \cdot x_j \\
&= \lambda_i^{-1} \cdot (1 - \lambda_i^{-1}) \int_{x_1} \int_{x_2} \cdots \int_{x_i} x_i \cdot 0 \\
&= 0 \\
\end{align*}

Consider the case of learning a non-diagonal matrix: we transform the space so that the target becomes a non-diagonal matrix and the generator distribution becomes the identity.

## Missing Gradient under imperfect discriminator

$$
\frac{\partial}{\partial p_g} \log { \frac{p_d}{p_d + p_g \epsilon_x} }
$$

### Missing Gradient

$$
\int (1 - D(x)) \cdot p_g'
$$

$$
\int p_g' \cdot \log(D(x)) + \int (1 - D(x)) \cdot p_g'
$$

$$
\nabla \int p_g \cdot \log(D(x))
$$

Objective:

$$
\nabla \int p_g \cdot \log\left( \frac{p_d}{p_d + p_g \cdot \epsilon_x} \right)
$$
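As a closing sanity check of the identity $\nabla_x [\log(1 - D(x))] = D(x)(I - \Sigma^{-1})x$ used in the derivation of $L'(W)$ above, the snippet below (my own, assuming SciPy) compares it against a finite-difference gradient at a random point for a random SPD covariance $\Sigma$.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

d = 3
rng = np.random.default_rng(0)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + np.eye(d)          # an arbitrary SPD covariance
Sigma_inv = np.linalg.inv(Sigma)

def log_one_minus_D(x):
    # log(1 - D(x)) = log( N_Sigma(x) / (N_0(x) + N_Sigma(x)) )
    l0 = mvn.logpdf(x, mean=np.zeros(d), cov=np.eye(d))
    ls = mvn.logpdf(x, mean=np.zeros(d), cov=Sigma)
    return ls - np.logaddexp(l0, ls)

def D(x):
    # D(x) = N_0(x) / (N_0(x) + N_Sigma(x)), computed stably in log space
    l0 = mvn.logpdf(x, mean=np.zeros(d), cov=np.eye(d))
    ls = mvn.logpdf(x, mean=np.zeros(d), cov=Sigma)
    return np.exp(l0 - np.logaddexp(l0, ls))

x = rng.normal(size=d)
h = 1e-6
fd = np.array([(log_one_minus_D(x + h * e) - log_one_minus_D(x - h * e)) / (2 * h)
               for e in np.eye(d)])
closed = D(x) * (np.eye(d) - Sigma_inv) @ x
print(fd)
print(closed)      # the two vectors should agree up to finite-difference error
```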