$$ \newcommand{\D}{\partial} \newcommand{\R}{\mathbb{R}} \newcommand{\C}{\mathbb{C}} \newcommand{\norm}[1]{\left\lVert #1 \right\rVert} \newcommand{\abs}[1]{\left| #1 \right|} \newcommand{\bmat}[1]{\begin{bmatrix} #1 \end{bmatrix}} \newcommand{\T}{^\top} \newcommand{\maxnorm}[1]{\norm{#1}_\infty} \newcommand{\supnorm}[1]{\norm{#1}_\infty} $$

Differential Calculus
===

## Definitions

### Bounded Linear Functions

Let $V, W$ be Banach spaces. $f \in B(V,W)$ iff $f \in L(V,W)$ and $\exists M \geq 0 \; \forall x \in V \bigg[ \norm{f(x)}_W \leq M \norm{x}_V \bigg]$.

### Differentiable Functions and Derivatives

Given Banach spaces $V, W$ and an open set $S \subseteq V$, a function $f: S \to W$ is differentiable at $x \in S$ iff $f(x+h) - f(x) = g(h) + o(h)$ as $h \to 0$ for some $g \in B(V,W)$. If such a $g$ exists, we call it the derivative of $f$ at $x$. The derivative is well defined because it is unique whenever it exists: if $g_1, g_2$ both satisfy the condition, then $g_1 - g_2$ is a bounded linear map with $(g_1 - g_2)(h) = o(h)$, which forces $g_1 = g_2$.

$f$ is differentiable on $S$ iff $f$ is differentiable at every $x \in S$. The map $S \to B(V,W)$ that assigns to each $x$ the derivative of $f$ at $x$, if it exists, is called the derivative of $f$ on $S$ and is denoted $\partial f$.

## General Properties

### Chain Rule

Let $U, V, W$ be Banach spaces, let $U' \subseteq U$ and $V' \subseteq V$ be open sets, and let $g: U' \to V$ and $f: V' \to W$ be differentiable with $g(U') \subseteq V'$. Then for $x \in U'$,

$$ \D (f \circ g)(x) = \D f(g(x)) \, \D g(x). $$

### Proof

Two preliminary estimates, where $O(h)$ denotes any term bounded by a constant multiple of $\norm{h}$ near $0$:

$$ \frac{\norm{O(o(h))}_W}{\norm{h}_U} = \frac{\norm{O(o(h))}_W}{\norm{o(h)}_V} \frac{\norm{o(h)}_V}{\norm{h}_U} = O(1) \frac{\norm{o(h)}_V}{\norm{h}_U} \to 0 \; (h \to 0) $$

$$ \frac{\norm{o(O(h))}_W}{\norm{h}_U} = \frac{\norm{o(O(h))}_W}{\norm{O(h)}_V} \frac{\norm{O(h)}_V}{\norm{h}_U} = \frac{\norm{o(O(h))}_W}{\norm{O(h)}_V} O(1) \to 0 \; (h \to 0) $$

Since $\D f(g(x))$ and $\D g(x)$ are bounded, $\D f(g(x))(o(h)) = O(o(h))$ and $\D g(x)(h) + o(h) = O(h)$. Hence

$$ \begin{align*} & f(g(x+h)) - f(g(x)) \\ &= f(g(x) + \D g(x)(h) + o(h)) - f(g(x)) \\ &= f(g(x)) + \D f(g(x))(\D g(x)(h) + o(h)) + o(\D g(x)(h) + o(h)) - f(g(x)) \\ &= \D f(g(x))(\D g(x)(h)) + O(o(h)) + o(O(h)) \\ &= (\D f(g(x)) \D g(x))(h) + o(h). \end{align*} $$

## Differential Calculus in $\R^n$

In this setting, any norm will do, since all norms on $\R^n$ are equivalent.

### Theorem

$A \in \R^{m \times n},\ b \in \R^m,\ f(x) = Ax + b \implies f'(x) = A$

### Proof

$$ f(x+h) - f(x) = (Ax + Ah + b) - (Ax + b) = Ah $$

so the remainder is identically zero; it remains to check that $x \mapsto Ax$ is bounded. Let $a = \max_{i,j} \abs{A_{ij}}$.

$$ \maxnorm{Ax} = \max_i \abs{\sum_j A_{ij}x_j} \leq \max_i \sum_j \abs{A_{ij}}\abs{x_j} \leq \max_i \sum_j a\maxnorm{x} = na\maxnorm{x} $$

This establishes the boundedness of the derivative.

### Theorem

$A \in \R^{n \times n},\ A\T = A,\ b \in \R,\ f(x) = x\T Ax + b \implies f'(x) = 2x\T A$

### Proof

Let $a = \max_{i,j} \abs{A_{ij}}$.

$$ \abs{h\T Ah} = \abs{\sum_{i,j} h_i A_{ij} h_j} \leq \sum_{i,j} \abs{A_{ij}} \abs{h_i} \abs{h_j} \leq \sum_{i,j} a \maxnorm{h}^2 = n^2 a \maxnorm{h}^2 \\ \implies \abs{h\T Ah} / \maxnorm{h} \leq n^2 a \maxnorm{h} \rightarrow 0 \; (h \rightarrow 0) $$

Since $h\T Ax$ is a scalar and $A\T = A$, we have $h\T Ax = (h\T Ax)\T = x\T A\T h = x\T Ah$, so

$$ f(x+h) - f(x) = x\T Ah + h\T Ax + h\T Ah = 2x\T Ah + o(h). $$

Since $2x\T A \in \R^{1 \times n}$, the previous proof guarantees its boundedness.
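
### Numerical Sanity Check

As an informal check on the two theorems above (not part of the formal development), the following NumPy sketch compares the closed-form derivatives $A$ and $2x\T A$ against central finite differences. The helper `numeric_jacobian`, the random test data, and the tolerances are illustrative assumptions, not anything fixed by the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def numeric_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f: R^n -> R^m at x (illustrative helper)."""
    n = x.size
    m = np.atleast_1d(f(x)).size
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (np.atleast_1d(f(x + e)) - np.atleast_1d(f(x - e))) / (2 * eps)
    return J

m, n = 3, 4
A = rng.standard_normal((m, n))   # random test data, assumed for illustration
b = rng.standard_normal(m)
x = rng.standard_normal(n)

# Affine map: f(x) = Ax + b, derivative should be A itself.
f_affine = lambda x: A @ x + b
print(np.allclose(numeric_jacobian(f_affine, x), A, atol=1e-5))  # True

# Symmetric quadratic form: f(x) = x^T S x + c, derivative should be 2 x^T S.
S = rng.standard_normal((n, n))
S = (S + S.T) / 2                 # symmetrize so S^T = S holds
c = rng.standard_normal()
f_quad = lambda x: x @ S @ x + c
print(np.allclose(numeric_jacobian(f_quad, x), (2 * x @ S)[None, :], atol=1e-4))  # True
```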
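
The chain rule admits the same kind of check. Below, $g(x) = Ax + b$ and $f(y) = y\T S y$ are composed, and a finite-difference gradient of $f \circ g$ is compared with $\D f(g(x)) \, \D g(x) = 2\,g(x)\T S A$; again the helper and the random test data are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def numeric_grad(f, x, eps=1e-6):
    """Central-difference gradient (as a row vector) of a scalar f at x."""
    grad = np.zeros(x.size)
    for j in range(x.size):
        e = np.zeros(x.size)
        e[j] = eps
        grad[j] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

n, m = 4, 3
A = rng.standard_normal((m, n))   # random test data, assumed for illustration
b = rng.standard_normal(m)
S = rng.standard_normal((m, m))
S = (S + S.T) / 2                 # symmetrize so S^T = S holds

g = lambda x: A @ x + b           # dg(x) = A
f = lambda y: y @ S @ y           # df(y) = 2 y^T S
x = rng.standard_normal(n)

lhs = numeric_grad(lambda x: f(g(x)), x)  # d(f.g)(x), numerically
rhs = 2 * (g(x) @ S) @ A                  # df(g(x)) dg(x), from the chain rule
print(np.allclose(lhs, rhs, atol=1e-4))   # True
```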