$$
\newcommand{\D}{\partial}
\newcommand{\R}{\mathbb{R}}
\newcommand{\C}{\mathbb{C}}
\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}
\newcommand{\abs}[1]{\left| #1 \right|}
\newcommand{\bmat}[1]{\begin{bmatrix} #1 \end{bmatrix}}
\newcommand{\T}{^\top}
\newcommand{\maxnorm}[1]{\norm{#1}_\infty}
\newcommand{\supnorm}[1]{\norm{#1}_\infty}
$$
Differential Calculus
===
## Definitions
### Bounded Linear Functions
$V, W$ are Banach spaces.
$f \in B(V,W)$ iff $f \in L(V,W)$ (i.e., $f$ is linear) and $\exists M \geq 0 \; \forall x \in V \bigg[ \norm{f(x)}_W \leq M \norm{x}_V \bigg]$.
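A standard fact, recorded here for reference but not proved: the smallest such $M$ is the operator norm,
$$
\norm{f}_{B(V,W)} = \sup_{x \neq 0} \frac{\norm{f(x)}_W}{\norm{x}_V} = \sup_{\norm{x}_V = 1} \norm{f(x)}_W.
$$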
### Differentiable Functions and Derivatives
Given Banach spaces $V, W$ and an open set $S \subseteq V$, a function $f:S \to W$ is differentiable at $x \in S$ iff $f(x+h) - f(x) = g(h) + o(h)$ for some $g \in B(V,W)$. If such a $g$ exists, we call it the derivative of $f$ at $x$; this is well defined because such a $g$, when it exists, is unique.
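A sketch of that uniqueness argument: if $g_1, g_2 \in B(V,W)$ both satisfy the definition at $x$, then $(g_1 - g_2)(h) = o(h)$, and for any fixed $u \neq 0$, taking $h = tu$ with $t \to 0^+$ and using linearity gives
$$
\frac{\norm{(g_1 - g_2)(u)}_W}{\norm{u}_V}
= \frac{\norm{(g_1 - g_2)(tu)}_W}{\norm{tu}_V}
\to 0 \; (t \to 0^+),
$$
so $g_1(u) = g_2(u)$ for every $u$, i.e. $g_1 = g_2$.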
$f$ is differentiable on $S$ iff $f$ is differentiable at every $x \in S$. The map $S \to B(V,W)$ sending each $x$ to the derivative of $f$ at $x$, if it exists, is called the derivative of $f$ on $S$ and is denoted $\partial f$.
## General Properties
### Chain Rule
$U,V,W$ are Banach spaces. $U' \subseteq U, V' \subseteq V$ are open sets. $g:U' \to V$ and $f:V' \to W$ are differentiable, and $g(U') \subseteq V'$. For $x \in U'$, $\D (f \circ g)(x) = \D f(g(x)) \, \D g(x)$.
### Proof
Two preliminary estimates show that a big-$O$ of a little-$o$ is $o(h)$, and a little-$o$ of a big-$O$ is $o(h)$ (the latter using that $O(h) \to 0$ as $h \to 0$):
$$
\frac{\norm{O(o(h))}_W}{\norm{h}_U}
= \frac{\norm{O(o(h))}_W}{\norm{o(h)}_V} \frac{\norm{o(h)}_V}{\norm{h}_U}
= O(1) \frac{\norm{o(h)}_V}{\norm{h}_U}
\to 0 \; (h \to 0)
$$
$$
\frac{\norm{o(O(h))}_W}{\norm{h}_U}
= \frac{\norm{o(O(h))}_W}{\norm{O(h)}_V} \frac{\norm{O(h)}_V}{\norm{h}_U}
= \frac{\norm{o(O(h))}_W}{\norm{O(h)}_V} O(1)
\to 0 \; (h \to 0)
$$
$$
\begin{align*}
& f(g(x+h)) - f(g(x))
\\ &= f(g(x) + \D g(x)(h) + o(h)) - f(g(x))
\\ &= f(g(x)) + \D f(g(x))(\D g(x)(h) + o(h)) + o(\D g(x)(h) + o(h)) - f(g(x))
\\ &= \D f(g(x))(\D g(x)(h)) + O(o(h)) + o(O(h))
\\ &= (\D f(g(x))\D g(x))(h) + o(h) + o(h)
\\ &= (\D f(g(x))\D g(x))(h) + o(h)
\end{align*}
$$
Since $\D f(g(x))\,\D g(x)$ is a composition of bounded linear maps, it is itself bounded and linear, so it is the derivative of $f \circ g$ at $x$.
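As a sanity check outside the proof, the chain rule can be verified numerically after specializing to $\R^n$ (treated in the next section). This is a minimal sketch using central finite differences; the functions `g` and `f` below are arbitrary smooth examples, not taken from the text:

```python
import numpy as np

def jacobian_fd(func, x, eps=1e-6):
    """Central-difference Jacobian of func at x."""
    y = func(x)
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        J[:, j] = (func(x + e) - func(x - e)) / (2 * eps)
    return J

# Arbitrary smooth test functions: g: R^2 -> R^2, f: R^2 -> R^3.
g = lambda x: np.array([np.sin(x[0]) * x[1], x[0] ** 2 + x[1]])
f = lambda y: np.array([np.exp(y[0]) + y[1], y[0] * y[1], np.cos(y[1])])

x = np.array([0.3, -1.2])
lhs = jacobian_fd(lambda t: f(g(t)), x)         # D(f o g)(x)
rhs = jacobian_fd(f, g(x)) @ jacobian_fd(g, x)  # Df(g(x)) Dg(x)
print(np.max(np.abs(lhs - rhs)))                # small, ~1e-8, as the chain rule predicts
```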
## Differential Calculus in $\R^n$
In this setting, any norm will suffice, as they are all equivalent.
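As a concrete instance of this equivalence, the max norm and the Euclidean norm satisfy
$$
\maxnorm{x} \leq \norm{x}_2 \leq \sqrt{n} \, \maxnorm{x} \quad (x \in \R^n),
$$
so a remainder that is $o(h)$ in one of these norms is $o(h)$ in the other.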
### Theorem
$A \in \R^{m \times n}, b \in \R^m, f(x) = Ax + b \implies \D f(x) = A$
### Proof
$$
f(x+h) - f(x) = (Ax + Ah + b) - (Ax + b) = Ah
$$
The remainder is identically $0$, which is trivially $o(h)$; it remains to check that $h \mapsto Ah$ is bounded. Let $a = \max_{i,j} \abs{A_{ij}}$.
$$
\maxnorm{Ax}
= \max_i \abs{\sum_j A_{ij}x_j}
\leq \max_i \sum_j \abs{A_{ij}}\abs{x_j}
\leq \max_i \sum_j a\maxnorm{x}
= na\maxnorm{x}
$$
This shows that the linear map $h \mapsto Ah$ is bounded, so $A$ is indeed the derivative.
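A quick numerical illustration of this theorem, with arbitrary random data (a sketch, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
b = rng.standard_normal(3)
f = lambda x: A @ x + b

x, eps = rng.standard_normal(4), 1e-6
# Central-difference Jacobian, one column per standard basis vector e.
J = np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(4)])
print(np.max(np.abs(J - A)))  # ~1e-10: the derivative of x -> Ax + b is A
```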
### Theorem
$A \in \R^{n \times n}, A\T = A, b \in \R, f(x) = x\T Ax + b \implies \D f(x) = 2x\T A$
### Proof
Let $a = \max_{i,j} \abs{A_{ij}}$.
$$
\abs{h\T Ah}
= \abs{\sum_{i,j} h_i A_{ij} h_j}
\leq \sum_{i,j} \abs{A_{ij}} \abs{h_i} \abs{h_j}
\leq \sum_{i,j} a \maxnorm{h}^2
= n^2 a \maxnorm{h}^2
\\ \implies \abs{h\T Ah} / \maxnorm{h}
\leq n^2 a \maxnorm{h} \rightarrow 0 \; (h \rightarrow 0)
$$
so $h\T Ah = o(h)$. Since $A\T = A$ and $h\T Ax$ is a scalar, $h\T Ax = (h\T Ax)\T = x\T A\T h = x\T Ah$, hence
$$
f(x+h) - f(x)
= x\T Ah + h\T Ax + h\T Ah
= 2x\T Ah + o(h)
$$
Since $2x\T A \in \R^{1 \times n}$, the previous proof shows that $h \mapsto 2x\T Ah$ is bounded, completing the proof.
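And the same kind of finite-difference check for this theorem (a sketch with arbitrary data; `A` is symmetrized explicitly since the theorem assumes $A\T = A$):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2            # symmetrize: the theorem assumes A^T = A
b = 0.7
f = lambda x: x @ A @ x + b  # f: R^4 -> R

x, eps = rng.standard_normal(4), 1e-6
grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                 for e in np.eye(4)])
print(np.max(np.abs(grad - 2 * x @ A)))  # ~1e-9: matches 2 x^T A
```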