# Fréchet derivative
This document gives an overview of the Fréchet derivative.
# Interesting theorem
If $T:\mathcal{U}\rightarrow \mathcal{V}$ is a linear transformation between normed vector spaces $\mathcal{U}, \mathcal{V}$, then the following statements are equivalent:
+ $T$ is continuous
+ $T$ is continuous at $0$
+ $T$ is bounded (that is, $\|T\|_\mathrm{op}<\infty$)
# Interesting preliminary Corollaries
## Corollary 1 (Riesz representation): Suppose $\mathcal{H}$ is a Hilbert space. Then, for every continuous linear functional $T:\mathcal{H}\rightarrow \mathbb{K}$, there is $h \in \mathcal{H}$ such that $T(x) = \langle h, x\rangle$.
## Corollary 2: All continuous linear transformations of type $T:\mathbb{K}^N \rightarrow \mathbb{K}^M$ can be expressed as a matrix product so that $T(x) = A x$, $A\in \mathbb{K}^{M\times N}$.
In particular, for a scalar linear transformation $T:\mathbb{K} \rightarrow \mathbb{K}$, there exists $a \in \mathbb{K}$ such that $T(x) = ax$.
# Fréchet derivative and the generalized gradient
Suppose $\mathcal{U}$ and $\mathcal{V}$ are Banach spaces. Then the Fréchet derivative of a differentiable function $f:\mathcal{U}\rightarrow \mathcal{V}$ at a point $x\in \mathcal{U}$ is the continuous linear map $D_x[f]:\mathcal{U}\rightarrow \mathcal{V}$ such that:
\begin{equation}
\lim_{\|\Delta x\|_\mathcal{U} \rightarrow 0 } \frac{\left \| f(x+\Delta x) -f(x) - D_x[f](\Delta x) \right \|_\mathcal{V}}{\|\Delta x\|_\mathcal{U}} = 0
\end{equation}
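This defining limit can be checked numerically. Below is a minimal sketch (the choice $f(x) = x^\top A x$ and all names are illustrative assumptions): the candidate derivative $D_x[f](\Delta x) = x^\top(A + A^\top)\Delta x$ should make the quotient above shrink with $\|\Delta x\|$.

```python
import numpy as np

# Illustrative sketch: verify the defining limit for f(x) = x^T A x on R^2.
# The candidate Frechet derivative at x is the linear map dx -> x^T (A + A^T) dx.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
x = rng.standard_normal(2)

def f(v):
    return v @ A @ v

def D(dx):
    # Candidate Frechet derivative of f at x, applied to the increment dx.
    return x @ (A + A.T) @ dx

dx = rng.standard_normal(2)
ratios = []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    step = t * dx
    remainder = abs(f(x + step) - f(x) - D(step))
    ratios.append(remainder / np.linalg.norm(step))

print(ratios)  # the quotient shrinks roughly linearly in t
```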
To understand how the Fréchet derivative generalizes the ordinary derivative, consider the case where $f$ is a real function of one variable and look at its Taylor expansion:
\begin{equation}
f(x+\Delta x) = f(x) + \frac{\mathrm{d}f}{\mathrm{d}x}(x)\,\Delta x + \frac{1}{2}\frac{\mathrm{d}^2f}{\mathrm{d}x^2}(x)\,\Delta x^2 + \dots
\end{equation}
If $\mathcal{U}$ is a Hilbert space and $\mathcal{V} = \mathbb{K}$, then by Corollary 1 there exists $u \in \mathcal{U}$ such that:
\begin{equation}
D_x[f](\Delta x) = \langle u, \Delta x \rangle
\end{equation}
We call $u$ the gradient of $f$ at $x$ and write $\nabla f(x) = u$.
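As a concrete sketch (the function $f(x)=\|x\|^2$ and all names below are assumptions for illustration): its Fréchet derivative is $D_x[f](\Delta x) = 2\langle x, \Delta x\rangle$, so $\nabla f(x) = 2x$, which a directional finite difference confirms.

```python
import numpy as np

# Illustrative sketch: f(x) = ||x||^2 has D_x[f](dx) = 2 <x, dx>,
# i.e. gradient u = 2x. Compare against a directional finite difference.
rng = np.random.default_rng(1)
x = rng.standard_normal(5)
dx = rng.standard_normal(5)

f = lambda v: v @ v
grad = 2 * x                      # claimed gradient at x

t = 1e-6
fd = (f(x + t * dx) - f(x)) / t   # finite-difference directional derivative
pairing = grad @ dx               # <grad, dx>
err = abs(fd - pairing)
print(err)                        # O(t), so very small
```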
### Example: Let's find the Fréchet derivative of $(\mathbf{K}+\lambda \mathbf{I})^{-1}$ with respect to $\mathbf{K}$.
Observe the following Neumann-series expansion (proof left to the reader):
\begin{equation}
(\mathbf{I} + \Delta \mathbf{H})^{-1} = \mathbf{I} - \Delta \mathbf{H} + (\Delta \mathbf{H})^2 - (\Delta \mathbf{H})^3 + \dots,\quad \| \Delta \mathbf{H}\|_\mathrm{op} <1
\end{equation}
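A quick numerical check of this series (a sketch; the dimension, seed, and truncation order are arbitrary assumptions):

```python
import numpy as np

# Sketch: compare (I + H)^{-1} against the truncated Neumann series
# sum_k (-H)^k for a perturbation with operator norm < 1.
rng = np.random.default_rng(2)
n = 4
H = rng.standard_normal((n, n))
H *= 0.3 / np.linalg.norm(H, 2)          # enforce ||H||_op = 0.3 < 1
I = np.eye(n)

series = sum(np.linalg.matrix_power(-H, k) for k in range(20))
err = np.linalg.norm(np.linalg.inv(I + H) - series)
print(err)  # truncation tail is O(0.3^20), so err is tiny
```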
Applying this expansion with $\Delta \mathbf{H} = \Delta \mathbf{K}(\mathbf{K}+\lambda\mathbf{I})^{-1}$:
\begin{align}
(\mathbf{K} + \Delta \mathbf{K} + \lambda \mathbf{I})^{-1} &= (\mathbf{K} + \lambda \mathbf{I})^{-1}(\mathbf{I} + \Delta \mathbf{K} (\mathbf{K} + \lambda \mathbf{I})^{-1})^{-1}\\
&= (\mathbf{K} + \lambda \mathbf{I})^{-1} \left(\mathbf{I} - \Delta \mathbf{K} (\mathbf{K} + \lambda \mathbf{I})^{-1} + (\Delta \mathbf{K} (\mathbf{K} + \lambda \mathbf{I})^{-1})^2 - \dots\right)\\
&= (\mathbf{K} + \lambda \mathbf{I})^{-1} - (\mathbf{K} + \lambda \mathbf{I})^{-1}\Delta \mathbf{K} (\mathbf{K} + \lambda \mathbf{I})^{-1} + \dots
\end{align}
Keeping only the term linear in $\Delta \mathbf{K}$ (the omitted terms are $O(\|\Delta \mathbf{K}\|^2)$, so the defining limit is satisfied), the Fréchet derivative is $D_\mathbf{K}[(\mathbf{K} + \lambda \mathbf{I})^{-1}](\Delta \mathbf{K}) = - (\mathbf{K} + \lambda \mathbf{I})^{-1}\Delta \mathbf{K} (\mathbf{K} + \lambda \mathbf{I})^{-1}$.
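This linear term can be verified against a finite difference (a sketch; the matrix size, $\lambda$, and the symmetric choice of $\mathbf{K}$ are assumptions):

```python
import numpy as np

# Sketch: compare the claimed Frechet derivative -A dK A, with
# A = (K + lam*I)^{-1}, against a finite-difference approximation.
rng = np.random.default_rng(3)
n, lam = 4, 0.5
G = rng.standard_normal((n, n))
K = G @ G.T                               # symmetric PSD, so K + lam*I is invertible
dK = rng.standard_normal((n, n))

finv = lambda M: np.linalg.inv(M + lam * np.eye(n))
A = finv(K)

t = 1e-6
fd = (finv(K + t * dK) - finv(K)) / t     # finite difference in direction dK
frechet = -A @ dK @ A
rel_err = np.linalg.norm(fd - frechet) / np.linalg.norm(frechet)
print(rel_err)
```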
## Theorem: Chain rule
The Fréchet derivative's true power comes from its composability via the chain rule. Suppose $\mathcal{W}$ is also a Banach space and $g: \mathcal{V} \rightarrow \mathcal{W}$ is differentiable at $f(x)$. Then:
\begin{equation}
D_x[g \circ f](\Delta x) = D_{f(x)}[g](D_x[f](\Delta x))
\end{equation}
## Corollary 3: The Fréchet derivative of a continuous linear transformation is the transformation itself.
Suppose $T:\mathcal{U}\rightarrow \mathcal{V}$ is a continuous linear transformation. Then $D_x[T](\Delta x) = T(\Delta x)$.
## Corollary 4: The Fréchet derivative of a vector- or matrix-valued function with respect to a scalar parameter $h$ is the entrywise partial derivative multiplied by the increment $\Delta h$.
Example:
\begin{equation}
D_h[\mathbf{K}](\Delta h) = \Delta h \left [ \frac{\partial K_{y,x}}{\partial h} \right ]_{y, x \in \{1,2,\dots, N\}} = \frac{\partial \mathbf{K}}{\partial h} \Delta h
\end{equation}
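For instance (a sketch; the matrix-valued function $\mathbf{K}(h) = h\mathbf{B} + \sin(h)\,\mathbf{C}$ and all names are illustrative assumptions):

```python
import numpy as np

# Sketch: for K(h) = h*B + sin(h)*C, the entrywise partial derivative is
# dK/dh = B + cos(h)*C, and D_h[K](dh) = (dK/dh) * dh.
rng = np.random.default_rng(4)
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))
K = lambda h: h * B + np.sin(h) * C
dK_dh = lambda h: B + np.cos(h) * C

h0, dh, t = 0.7, 1.0, 1e-7
fd = (K(h0 + t * dh) - K(h0)) / t         # finite difference in the matrix K
err = np.linalg.norm(fd - dK_dh(h0) * dh)
print(err)
```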
# Practical example: quick, coordinate-free partial derivatives
As an example, let's calculate the partial derivative of the function $f(\theta) = \mathrm{Tr}\left( (\mathbf{K} + \lambda \mathbf{I})^{-1} \right)$ with respect to the parameter $\theta$ of the matrix $\mathbf{K}$. Applying the chain rule twice:
\begin{equation}
D_\theta [f] (\Delta \theta) = D_{(\mathbf{K} + \lambda \mathbf{I})^{-1}}[\mathrm{Tr}](D_\mathbf{K}[(\mathbf{K} + \lambda \mathbf{I})^{-1}](D_\theta [\mathbf{K}](\Delta \theta)))
\end{equation}
The first step is to note that $\mathrm{Tr}$ is a continuous linear transformation, so by Corollary 3 its Fréchet derivative is the transformation itself:
\begin{equation}
D_\theta [f] (\Delta \theta) = \mathrm{Tr}(D_\mathbf{K}[(\mathbf{K} + \lambda \mathbf{I})^{-1}](D_\theta [\mathbf{K}](\Delta \theta)))
\end{equation}
Then, we substitute the Fréchet derivative of the inverse we calculated earlier:
\begin{equation}
D_\theta [f] (\Delta \theta) = \mathrm{Tr}(-(\mathbf{K} + \lambda \mathbf{I})^{-1}D_\theta [\mathbf{K}](\Delta \theta)(\mathbf{K} + \lambda \mathbf{I})^{-1})=-\mathrm{Tr}((\mathbf{K} + \lambda \mathbf{I})^{-1}D_\theta [\mathbf{K}](\Delta \theta)(\mathbf{K} + \lambda \mathbf{I})^{-1})
\end{equation}
And finally, we replace the derivative of $\mathbf{K}$ with respect to the parameter, using Corollary 4 and the linearity of the trace to pull the scalar $\Delta \theta$ out:
\begin{equation}
D_\theta [f] (\Delta \theta) =-\mathrm{Tr}\left ((\mathbf{K} + \lambda \mathbf{I})^{-1}\left (\frac{\partial \mathbf{K}}{\partial \theta} \Delta \theta \right ) (\mathbf{K} + \lambda \mathbf{I})^{-1}\right ) = -\mathrm{Tr}\left ((\mathbf{K} + \lambda \mathbf{I})^{-1}\frac{\partial \mathbf{K}}{\partial \theta} (\mathbf{K} + \lambda \mathbf{I})^{-1}\right ) \Delta \theta
\end{equation}
Finally, $f$ itself is a scalar function of $\theta$, so by Corollary 2 its Fréchet derivative is multiplication by the ordinary partial derivative:
\begin{equation}
\frac{\partial \mathrm{Tr}((\mathbf{K}+\lambda \mathbf{I})^{-1})}{\partial \theta} \Delta \theta = -\mathrm{Tr}\left ((\mathbf{K}+\lambda \mathbf{I})^{-1} \frac{\partial \mathbf{K}}{\partial \theta} (\mathbf{K}+\lambda \mathbf{I})^{-1} \right )\Delta \theta,\ \forall \Delta \theta
\end{equation}
Therefore:
\begin{equation}
\frac{\partial \mathrm{Tr}((\mathbf{K}+\lambda \mathbf{I})^{-1})}{\partial \theta} = -\mathrm{Tr}\left ((\mathbf{K}+\lambda \mathbf{I})^{-1} \frac{\partial \mathbf{K}}{\partial \theta} (\mathbf{K}+\lambda \mathbf{I})^{-1} \right )
\end{equation}
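This end-to-end formula can be sanity-checked numerically (a sketch; the parametrization $\mathbf{K}(\theta) = \mathbf{B} + \theta\mathbf{C}$ and all names are assumptions):

```python
import numpy as np

# Sketch: check d/dtheta Tr((K(theta) + lam*I)^{-1}) = -Tr(A (dK/dtheta) A)
# with A = (K(theta) + lam*I)^{-1}, for K(theta) = B + theta*C.
rng = np.random.default_rng(5)
n, lam = 4, 1.0
G = rng.standard_normal((n, n))
B = G @ G.T                                # symmetric PSD part keeps things invertible
C = rng.standard_normal((n, n))
K = lambda th: B + th * C                  # so dK/dtheta = C

f = lambda th: np.trace(np.linalg.inv(K(th) + lam * np.eye(n)))

th0, t = 0.1, 1e-6
fd = (f(th0 + t) - f(th0)) / t             # finite-difference derivative
A = np.linalg.inv(K(th0) + lam * np.eye(n))
formula = -np.trace(A @ C @ A)
err = abs(fd - formula)
print(err)
```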
This means that, with a table of known Fréchet derivatives at hand, much as we use tables of integrals and derivatives for composite real functions, you can compute partial derivatives across different spaces just as easily as if all the spaces were the same.