# Jacobian and Gradients

Quick review of derivatives and gradients in the case of

- one-dimensional input and output $f:\mathbb R\to\mathbb R$,
- two-dimensional input and one-dimensional output $f:\mathbb R^2\to\mathbb R$,
- $n$-dimensional input and $m$-dimensional output $f:\mathbb R^n\to\mathbb R^m$.

## The case $\mathbb R\to\mathbb R$

In this case the gradient is the derivative, or the slope of the tangent. Recall that the tangent line of $f$ at $x=a$ is given by the equation
$$y=f'(a)(x-a) + f(a).$$
From an algebraic point of view, the important fact here is that
$$y-f(a) = f'(a)(x-a)$$
is a **linear equation** and, writing $h$ for $x-a$, the mapping
$$h\mapsto f'(a)h$$
is a **linear map**.

## The case $\mathbb R^2\to\mathbb R$

(Picture a function $f:\mathbb R^2\to\mathbb R$ as giving the **height** of a mountainous landscape.)

We have the same equation as above, but now $x$ and $a$ are 2-dimensional vectors that can be represented as columns
$$x=\begin{pmatrix}x_1\\ x_2\end{pmatrix} \quad\quad a=\begin{pmatrix}a_1\\ a_2\end{pmatrix}$$
and $f'(a)$ is a 2-dimensional vector that can be represented as a row
$$f'(a) = \left(\frac {\partial f} {\partial x_1} (a), \frac {\partial f} {\partial x_2} (a)\right).$$

The reason to represent these vectors as columns and rows in this way is that the expression $f'(a)(x-a)$ can then be computed via **matrix multiplication**. This is nice and natural because, for computational purposes, linear maps are usually represented by matrices. For example, above, we represent $f'(a)$ as a $(1\times 2)$-matrix. This matrix is also known as the **Jacobian** $J$ of $f$.

To summarize, the change of $f$ at $a$ induced by a displacement vector $h$ is
$$J(a)\cdot h$$
where $\cdot$ is matrix multiplication.

## Gradients

Alternatively, we can also think of $f'(a)$ as a vector, the so-called **gradient** of $f$ at $a$. The gradient of $f$ is often written as $\nabla f$. The change of $f$ at $a$ induced by a displacement vector $h$ is
$$\nabla f(a)\cdot h$$
where $\cdot$ is the dot product between two vectors.

**Remark:** Computationally, there is no difference between the matrix product $J(a)\cdot h$ and the dot product $\nabla f(a)\cdot h$. In fact, $J(a)$ is the so-called **transpose** of $\nabla f(a)$, that is, the row vector obtained from the column vector by "turning it on its side".

**Important Remark:** While computationally the difference between the linear mapping $J(a)\cdot h$ and the dot product $\nabla f(a)\cdot h$ is trivial, conceptually this equivalence is a deep observation:

- The linear map $J(a)$ computes for each displacement $h$ the induced "change in height" of $f$. Note that the displacement $h$ can point in any direction on the plane $\mathbb R^2$.
- The gradient $\nabla f$ is the vector that points in the direction of "steepest ascent".
- The surprising fact is that the linear map $J(a)$ and its effect on arbitrary displacements $h$ can be represented by the single vector $\nabla f$.

## The case $\mathbb R^n\to\mathbb R^m$

The case $\mathbb R^n\to\mathbb R$ generalises in a straightforward manner to the case $\mathbb R^n\to\mathbb R^m$ with $m$ outputs.

- The function $f$ now consists of components $f_i$, ${1\le i\le m}$.
- $J$ is an $(m\times n)$-matrix with $J_{ij}=\frac{\partial f_i}{\partial x_j}$.
- The $i$-th row of the matrix $J$ is the transpose of the gradient $\nabla f_i$.
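To make the equivalence of $J(a)\cdot h$ and $\nabla f(a)\cdot h$ concrete, here is a minimal numerical sketch in Python with NumPy (assumed available). The function `f`, the point `a`, the displacement `h`, and the helper `numerical_jacobian` are illustrative choices, not part of the notes above.

```python
import numpy as np

def f(x):
    """An example 'height' function f: R^2 -> R (illustrative choice)."""
    return x[0] ** 2 + 3.0 * x[1]

def numerical_jacobian(f, a, eps=1e-6):
    """Central finite-difference approximation of the (1 x 2) Jacobian of f at a."""
    a = np.asarray(a, dtype=float)
    J = np.zeros((1, a.size))
    for j in range(a.size):
        step = np.zeros_like(a)
        step[j] = eps
        J[0, j] = (f(a + step) - f(a - step)) / (2 * eps)
    return J

a = np.array([1.0, 2.0])        # point at which we linearise
h = np.array([0.01, -0.005])    # small displacement

J = numerical_jacobian(f, a)    # Jacobian as a (1 x 2) matrix (row vector)
grad = J.reshape(-1)            # gradient: the same numbers as a plain vector

# The two ways of computing the induced change agree,
# and both approximate the actual change in f for small h:
print(J @ h)                    # matrix product   J(a) . h
print(np.dot(grad, h))          # dot product      grad f(a) . h
print(f(a + h) - f(a))          # actual change of f, approximately equal
```

The same finite-difference idea extends to $f:\mathbb R^n\to\mathbb R^m$: the Jacobian then becomes an $(m\times n)$-matrix whose $i$-th row is the transpose of $\nabla f_i$.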
## References

For a review of notation see the Wikipedia articles on [Gradient](https://en.wikipedia.org/wiki/Gradient) and [Jacobian](https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant).