# Jacobian and Gradients
Quick review of derivatives and gradients in the case of
- one-dimensional input and output $f:\mathbb R\to\mathbb R$,
- two-dimensional input and one-dimensional output $f:\mathbb R^2\to\mathbb R$,
- $n$-dimensional input and $m$-dimensional output $f:\mathbb R^n\to\mathbb R^m$.
## The case $\mathbb R\to\mathbb R$
In this case the gradient is the derivative, or the slope of the tangent. Recall that the tangent line of $f$ at $x=a$ is given by the equation
$$y=f'(a)(x-a) + f(a).$$
From an algebraic point of view, the important fact here is that
$$y-f(a) = f'(a)(x-a)$$
is a **linear equation** and, writing $h$ for $x-a$, the mapping
$$h\mapsto f'(a)h$$
is a **linear map**.
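To make the linear-map view concrete, here is a minimal numerical sketch; the function $f(x)=x^2$, the point $a=1.5$, and the displacements are arbitrary illustrative choices. The true change $f(a+h)-f(a)$ is well approximated by the linear map applied to $h$, i.e. $f'(a)h$, for small $h$.

```python
# Minimal sketch: the derivative at a as the linear map h -> f'(a) * h.
# The example function f(x) = x^2 and the point a = 1.5 are arbitrary choices.

def f(x):
    return x ** 2

def f_prime(x):
    return 2 * x   # derivative of x^2

a = 1.5
for h in (0.1, 0.01, 0.001):
    exact_change = f(a + h) - f(a)   # true change of f
    linear_change = f_prime(a) * h   # change predicted by the linear map
    print(h, exact_change, linear_change)
# The two values agree up to an error of order h^2.
```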
## The case $\mathbb R^2\to\mathbb R$
(Picture a function $f:\mathbb R^2\to\mathbb R$ as giving the **height** of a mountainous landscape.)
We have the same equation as above, but now $x$ and $a$ are 2-dimensional vectors that can be represented as columns
$$x=\begin{pmatrix}x_1\\ x_2\end{pmatrix}
\quad\quad
a=\begin{pmatrix}a_1\\ a_2\end{pmatrix}$$
and $f'(a)$ is a 2-dimensional vector that can be represented as a row
$$f'(a) = \left(\frac {\partial f} {\partial x_1} (a), \frac {\partial f} {\partial x_2} (a)\right).$$
The reason to represent these vectors as columns and rows in this way is that the expression $f'(a)(x-a)$ can then be computed via **matrix multiplication**.
This is nice and natural because, for computational purposes, linear maps are usually represented by matrices. For example, above we represent $f'(a)$ as a $(1\times 2)$-matrix.
This matrix is also known as the **Jacobian** $J$ of $f$.
To summarize, the change of $f$ at $a$ induced by a displacement vector $h$ is
$$J(a)\cdot h$$
where $\cdot$ is matrix multiplication.
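As a small sanity check, the following sketch computes the $(1\times 2)$ Jacobian matrix explicitly and compares $J(a)\cdot h$ with the actual change $f(a+h)-f(a)$. The example function $f(x_1,x_2)=\sin(x_1)+x_1x_2$, the point $a$, and the displacement $h$ are arbitrary choices for illustration.

```python
import numpy as np

# Minimal sketch for f: R^2 -> R, here f(x1, x2) = sin(x1) + x1 * x2
# (an arbitrary example); J(a) is the (1, 2)-matrix of partial derivatives.

def f(x):
    return np.sin(x[0]) + x[0] * x[1]

def J(a):
    # (1, 2)-matrix: [df/dx1, df/dx2] evaluated at a
    return np.array([[np.cos(a[0]) + a[1], a[0]]])

a = np.array([1.0, 2.0])
h = np.array([0.01, -0.02])

exact_change = f(a + h) - f(a)
linear_change = (J(a) @ h)[0]        # matrix multiplication J(a) . h
print(exact_change, linear_change)   # the two values are close for small h
```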
## Gradients
Alternatively, we can also think of $f'(a)$ as a vector, the so-called **gradient** of $f$ at $a$. The gradient of $f$ is often written as $\nabla f$.
The change of $f$ at $a$ induced by a displacement vector $h$ is
$$\nabla f(a)\cdot h$$
where $\cdot$ is the dot-product between two vectors.
**Remark:** Computationally, there is no difference between the matrix product $J(a)\cdot h$ and the dot product $\nabla f(a)\cdot h$. In fact, $J(a)$ is the so-called **transpose** of $\nabla f(a)$, that is, the row vector obtained by taking the column vector and "turning it on its side".
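To see that the two products coincide, here is a brief sketch reusing the example function from the previous snippet; the gradient is stored as a plain 1-D array, and the Jacobian is obtained by reshaping it into a row.

```python
import numpy as np

# Same example function as above, f(x1, x2) = sin(x1) + x1 * x2.
# The gradient is the (column) vector of partials; J(a) is its transpose.

def grad_f(a):
    return np.array([np.cos(a[0]) + a[1], a[0]])   # gradient as a 1-D array

a = np.array([1.0, 2.0])
h = np.array([0.01, -0.02])

J_a = grad_f(a).reshape(1, 2)    # Jacobian: the gradient "turned on its side"
print(J_a @ h)                   # matrix product, an array of shape (1,)
print(np.dot(grad_f(a), h))      # dot product, a scalar
# Both give the same number: the change of f at a induced by h.
```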
**Important Remark:** While computationally the difference between the linear mapping $J(a)\cdot h$ and the dot product $\nabla f(a)\cdot h$ is trivial, conceptually this equivalence is a deep observation:
- The linear map $J(a)$ computes for each displacement $h$ the induced "change in height" of $f$. Note that the displacement $h$ can point in any direction in the plane $\mathbb R^2$.
- The gradient $\nabla f$ is the vector that points in the direction of "steepest ascent".
- The surprising fact is that the linear map $J(a)$ and its effect on arbitrary displacements $h$ can be represented by the single vector $\nabla f$, as illustrated in the sketch below.
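The steepest-ascent property can be illustrated numerically: among all unit displacement vectors $h$, the dot product $\nabla f(a)\cdot h$ is largest when $h$ points along $\nabla f(a)$. The sketch below (again using the example function from above) samples directions around the unit circle.

```python
import numpy as np

# Sketch of the "steepest ascent" property: among all unit displacements h,
# grad_f(a) . h is largest when h points along the gradient.
# Same example function as above, f(x1, x2) = sin(x1) + x1 * x2.

def grad_f(a):
    return np.array([np.cos(a[0]) + a[1], a[0]])

a = np.array([1.0, 2.0])
g = grad_f(a)

angles = np.linspace(0, 2 * np.pi, 360, endpoint=False)
unit_dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit vectors h
changes = unit_dirs @ g                                         # grad . h for each h

best = unit_dirs[np.argmax(changes)]
print(best)                    # approximately equal to ...
print(g / np.linalg.norm(g))   # ... the normalised gradient
```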
## The case $\mathbb R^n\to\mathbb R^m$
The case $\mathbb R^n\to\mathbb R$ (treated above for $n=2$) generalises in a straightforward manner to the case $\mathbb R^n\to\mathbb R^m$ with $m$ outputs.
- The function $f$ consists now of components $f_i$, ${1\le i\le m}$.
- $J$ is an $(m\times n)$-matrix with $J_{ij}=\frac{\partial f_i}{\partial x_j}.$
- The $i$-th row of the matrix $J$ is the transpose of the gradient $\nabla f_i$ (a numerical sketch follows below).
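Here is a minimal sketch of the general case using a finite-difference approximation; the example map $f:\mathbb R^2\to\mathbb R^3$ and the helper `jacobian_fd` are illustrative choices, not a standard library routine.

```python
import numpy as np

# Minimal sketch for f: R^2 -> R^3 (an arbitrary example) showing that the
# i-th row of the Jacobian is the gradient of the i-th component f_i.

def f(x):                                # f: R^2 -> R^3
    return np.array([x[0] * x[1],        # f_1
                     np.sin(x[0]),       # f_2
                     x[0] + x[1] ** 2])  # f_3

def jacobian_fd(f, a, eps=1e-6):
    """Jacobian of f at a by central finite differences (an m x n matrix)."""
    n = len(a)
    m = len(f(a))
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(a + e) - f(a - e)) / (2 * eps)   # J_ij = d f_i / d x_j
    return J

a = np.array([1.0, 2.0])
print(jacobian_fd(f, a))
# Analytically: [[x2, x1], [cos(x1), 0], [1, 2*x2]] at a = (1, 2),
# i.e. approximately [[2, 1], [0.5403, 0], [1, 4]].
```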
## References
For a review of notation see the Wikipedia articles on [Gradient](https://en.wikipedia.org/wiki/Gradient) and [Jacobian](https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant).