---
tags: Digital Image Processing
disqus: hackmd
---
# Part 6
## Image Transformation
An image transformation converts an input image into an output image of the same size using a transformation function or operator. A useful transformation is invertible, that is, the original image can be recovered from the transformed one. Transformations are used widely in pre-processing, compression, enhancement, etc.
A transformation can be viewed as an expansion of the image over a set of unitary basis matrices. A unitary matrix $A$ is one for which $A^{-1} = A^{*T}$, where $A^{*}$ is the complex conjugate of $A$. These matrices are also known as basis matrices.
Consider a $1D$ signal $x(t)$.

This can be represented as a combination of orthogonal basis functions $\{a_n(t)\} = \{a_0(t),a_1(t),...\}$. These are orthogonal over the interval $t_0$ to $t_0 + T$ if,
\begin{equation}
\int_{T}a_m(t)a_n(t)dt = K \text{ if m = n, else 0}
\end{equation}
If $K = 1$, then the set is orthonormal. As an example, take the set $\{a_n(t)\} = \{\sin{\omega t}, \sin{2\omega t}, \sin{3\omega t}\}$ over the interval $t = 0$ to $T$, where $\omega = \frac{2\pi}{T}$. Plotting the product $\sin{\omega t}\sin{2\omega t}$ over one period shows equal positive and negative areas.

So, $\int_{T}\sin{\omega t}\sin{2\omega t}dt = 0$ in this case. The same check applies to the other two pairs; all three integrals are zero, so these functions form a set of orthogonal basis functions.
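The orthogonality of the three sine functions can be checked numerically. The sketch below (using numpy, with $T = 1$ as an illustrative choice) approximates all the pairwise inner products and collects them in a Gram matrix:

```python
import numpy as np

# Numerically check that sin(wt), sin(2wt), sin(3wt) are mutually
# orthogonal over one period T, with w = 2*pi/T (here T = 1).
T = 1.0
omega = 2 * np.pi / T
t = np.linspace(0.0, T, 100001)
dt = t[1] - t[0]

basis = [np.sin((n + 1) * omega * t) for n in range(3)]

# Gram matrix of inner products: G[m, n] ~ integral of a_m(t) a_n(t) dt
G = np.array([[np.sum(am * an) * dt for an in basis] for am in basis])
print(np.round(G, 4))
# Off-diagonal entries are ~0; diagonal entries equal K = T/2 = 0.5.
```

The off-diagonal entries vanish (orthogonality), while each diagonal entry equals $K = T/2$, so dividing the basis by $\sqrt{T/2}$ would make it orthonormal.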
So, to represent the function using orthogonal basis,
\begin{equation}
x(t) = \sum_{n = 0}^{\infty}c_na_n(t); \text{ }t_0 \leq t \leq t_0 + T
\end{equation}
$c_n$ is the $n^{th}$ coefficient. To calculate its value, multiply both sides by one of the basis functions, say $a_m(t)$, and integrate over the interval. Then,
\begin{equation}
\int_{T}x(t)a_m(t)dt = \int_{T}c_na_n(t)a_m(t)dt = c_0\int_{T}a_0(t)a_m(t)dt + c_1\int_{T}a_1(t)a_m(t)dt + ... + c_m\int_{T}a_m(t)a_m(t)dt + ...
\end{equation}
By definition of orthogonality, we have,
\begin{equation}
\int_{T}x(t)a_m(t)dt = c_mK
\end{equation}
So, the coefficient can be represented as,
\begin{equation}
c_m = \frac{1}{K}\int_{T}x(t)a_m(t)dt
\end{equation}
If the basis is orthonormal, $K = 1$.
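The coefficient formula can be verified with a signal whose expansion is known in advance. The sketch below (using numpy; the test signal and $T = 1$ are illustrative choices) builds $x(t)$ from the sine basis and recovers the coefficients via $c_m = \frac{1}{K}\int_{T}x(t)a_m(t)dt$ with $K = T/2$:

```python
import numpy as np

# Recover the coefficients c_m of a signal built from the sine basis
# {sin(m*w*t)}, using c_m = (1/K) * integral of x(t) a_m(t) dt, K = T/2.
T = 1.0
omega = 2 * np.pi / T
t = np.linspace(0.0, T, 100001)
dt = t[1] - t[0]
K = T / 2

# Test signal: a known mixture of the first and third basis functions.
x = 1.0 * np.sin(omega * t) + 0.5 * np.sin(3 * omega * t)

coeffs = []
for m in range(1, 4):
    a_m = np.sin(m * omega * t)
    c_m = np.sum(x * a_m) * dt / K      # (1/K) * integral of x(t) a_m(t) dt
    coeffs.append(c_m)
print(np.round(coeffs, 4))
```

The recovered coefficients match the mixture the signal was built from: approximately $1$, $0$, and $0.5$.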
The set of orthogonal basis functions is complete or closed if either of the following conditions holds,
1. There is no non-zero signal $x(t)$ with finite energy ($\int_{T}|x(t)|^2dt < \infty$) such that $\int_{T}x(t)a_n(t)dt = 0$ for every $n$.
2. For every signal $x(t)$ with finite energy ($\int_{T}|x(t)|^2dt < \infty$) and every $\epsilon > 0$, there exists an $N$ such that the partial sum $\hat{x}(t) = \sum_{n = 0}^{N - 1}c_na_n(t)$ satisfies $\int_{T}|x(t) - \hat{x}(t)|^2dt < \epsilon$.
So, instead of taking infinitely many basis functions, we can use a finite number, say $N$, chosen so that the energy difference between the original signal and the one reconstructed from the finite basis stays under a small error $\epsilon > 0$. This gives an approximate representation that saves memory.
Using matrix notation, consider the vector $\{u(n): 0 \leq n \leq N - 1\}$ and a transformation matrix $A_{N \times N}$. The transformed vector is $v = Au$, that is, $v(k) = \sum_{n = 0}^{N - 1}a(k,n)u(n); \text{ }k = 0,1,...,N - 1$. Since $A$ is unitary, the original vector can be recovered as $u = A^{-1}v = A^{*T}v$, giving $u(n) = \sum_{k = 0}^{N - 1}a^{*}(k,n)v(k); \text{ }n = 0,1,...,N - 1$. The $a^{*}(k,n)$ are known as the basis vectors of $A$. In column notation, the columns of a unitary matrix are orthonormal: $\langle A_i, A_j\rangle = 1 \text{ if } i = j \text{, else } 0$.
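The 1D forward and inverse transforms can be sketched in numpy. The unitary DFT matrix is used here as one concrete choice of $A$ (an illustrative choice, not the only one):

```python
import numpy as np

# 1D unitary transform v = A u and its inverse u = A^{*T} v,
# using the unitary DFT matrix as a concrete example of A.
N = 4
k = np.arange(N).reshape(-1, 1)
n = np.arange(N).reshape(1, -1)
A = np.exp(-2j * np.pi * k * n / N) / np.sqrt(N)   # unitary DFT matrix

# Unitarity check: A^{-1} = A^{*T}
assert np.allclose(A.conj().T @ A, np.eye(N))

u = np.array([1.0, 2.0, 3.0, 4.0])
v = A @ u                    # forward transform
u_rec = A.conj().T @ v       # inverse via the conjugate transpose
assert np.allclose(u_rec.real, u)
```

Note that no explicit matrix inversion is needed: the conjugate transpose plays the role of $A^{-1}$, which is the practical appeal of unitary transforms.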
The discussion so far was for a $1D$ signal. For an image ($2D$ signal) $u(m,n), 0 \leq m,n \leq N - 1$, the transformed image can be written as,
\begin{equation}
v(k,l) = \sum_{m = 0}^{N - 1} \sum_{n = 0}^{N - 1}a_{kl}(m,n)u(m,n); \text{ } 0 \leq k,l \leq N - 1
\end{equation}
The inverse transformation can be written as,
\begin{equation}
u(m,n) = \sum_{k = 0}^{N - 1} \sum_{l = 0}^{N - 1}a^{*}_{kl}(m,n)v(k,l); \text{ } 0 \leq m,n \leq N - 1
\end{equation}
Here,
\begin{equation}
\sum_{m = 0}^{N - 1} \sum_{n = 0}^{N - 1}a_{kl}(m,n)a^{*}_{k'l'}(m,n) = \delta(k - k',l - l'); \text{ } 0 \leq k,l,k',l' \leq N - 1
\end{equation}
Where $\delta$ is known as Kronecker delta function, equal to $1$ when $k = k'$ and $l = l'$, otherwise, it is $0$. Similarly,
\begin{equation}
\sum_{k = 0}^{N - 1} \sum_{l = 0}^{N - 1}a_{kl}(m,n)a^{*}_{kl}(m',n') = \delta(m - m',n - n'); \text{ } 0 \leq m,n,m',n' \leq N - 1
\end{equation}
$v = \{v(k,l)\}$ are the transform coefficients. Truncating the expansion to $P \times Q$ coefficients gives the approximate reconstructed image $\hat{u}$ as,
\begin{equation}
\hat{u}(m,n) = \sum_{k = 0}^{P - 1} \sum_{l = 0}^{Q - 1}a^{*}_{kl}(m,n)v(k,l)
\end{equation}
The error can be calculated as,
\begin{equation}
\epsilon^2 = \sum_{m = 0}^{N - 1} \sum_{n = 0}^{N - 1}|u(m,n) - \hat{u}(m,n)|^2
\end{equation}
If this error can be made arbitrarily small as $P, Q \to N$, then $\{a_{kl}\}$ is a set of complete or closed basis images. For an image of size $N \times N$, a general transformation has complexity $O(N^4)$, which is very computationally expensive! To reduce this, we use the concept of separable unitary transformations.
The basis $a_{kl}(m,n)$ can be decomposed as $a_{kl}(m,n) = a_k(m)b_l(n) = a(k,m)b(l,n)$, where $\{a_k(m), k = 0,1,...,N - 1\}$ and $\{b_l(n), l = 0,1,...,N - 1\}$ are one-dimensional orthonormal basis vectors. Represented as matrices $A$ and $B$ respectively, each is a unitary matrix in its own right. Taking $B = A$, the transformation equation can be written as,
\begin{equation}
v(k,l) = \sum_{m = 0}^{N - 1} \sum_{n = 0}^{N - 1} a(k,m)u(m,n)a(l,n)
\end{equation}
In matrix form, it is written as $v = AuA^T$. For inverse transformations,
\begin{equation}
u(m,n) = \sum_{k = 0}^{N - 1} \sum_{l = 0}^{N - 1} a^{*}(k,m)v(k,l)a^{*}(l,n)
\end{equation}
In matrix form, it is written as $u = A^{*T}vA^{*}$.
Given $v = AuA^T$, we also have $v^T = A[Au]^T$. This shows that the transformation can be done in the two steps visible in this factorization. The first matrix multiplication needs $O(N^3)$ operations, and so does the second, so the total complexity is $O(2N^3) = O(N^3)$, much less than the original $O(N^4)$.
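The equivalence of the four-index sum and the two-step separable form can be sketched as follows (a numpy sketch; the random orthogonal $A$ obtained from a QR factorization is an illustrative choice of unitary matrix):

```python
import numpy as np

# Compare the direct four-index sum (O(N^4) work) with the separable
# two-step form v = A u A^T (two O(N^3) matrix products) -- they agree.
N = 8
rng = np.random.default_rng(0)
u = rng.standard_normal((N, N))

# A real orthogonal (hence unitary) matrix from a QR factorization.
A, _ = np.linalg.qr(rng.standard_normal((N, N)))

# Direct evaluation of v(k,l) = sum_{m,n} a(k,m) u(m,n) a(l,n)
v_direct = np.einsum('km,mn,ln->kl', A, u, A)

# Separable evaluation: v = A u A^T
v_sep = A @ u @ A.T
assert np.allclose(v_direct, v_sep)

# Inverse: u = A^{*T} v A^{*} (A is real here, so just transposes)
u_rec = A.T @ v_sep @ A
assert np.allclose(u_rec, u)
```

The two results match to floating-point precision, and the inverse recovers the original image exactly because $A$ is unitary.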
### Basis Images
Let $a_k^{*}$ denote the $k^{th}$ column of $A^{*T}$. The basis images are defined as the outer products $A^*_{kl} = a_k^{*}a_l^{*T}$.
The inner product of $N \times N$ matrices $F,G$ can be defined as $\langle F,G\rangle = \sum_{m = 0}^{N - 1} \sum_{n = 0}^{N - 1} f(m,n)g^{*}(m,n)$. We have,
\begin{equation}
v(k,l) = \sum_{m = 0}^{N - 1} \sum_{n = 0}^{N - 1}a_{kl}(m,n)u(m,n); \text{ } 0 \leq k,l \leq N - 1
\end{equation}
This can be written as $\langle U, A^{*}_{kl}\rangle$. This is also known as the projection of input image on $(k,l)^{th}$ basis image $A_{kl}^{*}$. Similarly, the inverse transform can be written as,
\begin{equation}
u = \sum_{k = 0}^{N - 1} \sum_{l = 0}^{N - 1} v(k,l)A^{*}_{kl}
\end{equation}
These are known as the basis images. So, the image can be seen as a linear combination of basis images.
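This decomposition into basis images can be sketched directly: form each $A^{*}_{kl}$ as an outer product of columns of $A^{*T}$ and sum them weighted by the coefficients $v(k,l)$ (a numpy sketch; the random orthogonal $A$ is an illustrative choice):

```python
import numpy as np

# Form the N^2 basis images A*_{kl} = a*_k a*_l^T from the columns of
# A^{*T}, and reconstruct u as the linear combination sum v(k,l) A*_{kl}.
N = 4
rng = np.random.default_rng(1)
A, _ = np.linalg.qr(rng.standard_normal((N, N)))   # real orthogonal A
u = rng.standard_normal((N, N))
v = A @ u @ A.T                                    # transform coefficients

cols = A.conj().T            # a*_k is the k-th column of A^{*T}
u_rec = np.zeros((N, N))
for k in range(N):
    for l in range(N):
        basis_image = np.outer(cols[:, k], cols[:, l])   # a*_k a*_l^T
        u_rec += v[k, l] * basis_image
assert np.allclose(u_rec, u)
```

Each coefficient $v(k,l)$ is the projection of the image onto one basis image, and summing the weighted basis images reproduces $u$ exactly.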
Example: Consider the matrix,
\begin{equation}
A = \frac{1}{\sqrt{2}}
\begin{bmatrix}
1 & 1 \\ 1 & -1
\end{bmatrix}
\end{equation}
Also,
\begin{equation}
u =
\begin{bmatrix}
1 & 2 \\ 3 & 4
\end{bmatrix}
\end{equation}
The transformed image can be seen as $v = AuA^T$. Therefore,
\begin{equation}
v = \frac{1}{2}
\begin{bmatrix}
1 & 1 \\ 1 & -1
\end{bmatrix}
\begin{bmatrix}
1 & 2 \\ 3 & 4
\end{bmatrix}
\begin{bmatrix}
1 & 1 \\ 1 & -1
\end{bmatrix} =
\begin{bmatrix}
5 & -1 \\ -2 & 0
\end{bmatrix}
\end{equation}
The basis images can be computed as,
\begin{equation}
A^{*}_{0,0} = \frac{1}{2}
\begin{bmatrix}
1 \\ 1
\end{bmatrix}
\begin{bmatrix}
1 & 1
\end{bmatrix} = \frac{1}{2}
\begin{bmatrix}
1 & 1 \\ 1 & 1
\end{bmatrix}
\end{equation}
Similarly,
\begin{equation}
A^{*}_{0,1} = \frac{1}{2}
\begin{bmatrix}
1 & -1 \\ 1 & -1
\end{bmatrix} = A^{*}_{1,0}
\end{equation}
Also,
\begin{equation}
A^{*}_{1,1} = \frac{1}{2}
\begin{bmatrix}
1 & -1 \\ -1 & 1
\end{bmatrix}
\end{equation}
Computing the inverse transform as $A^{*T}vA^{*}$,
\begin{equation}
\frac{1}{2}
\begin{bmatrix}
1 & 1 \\ 1 & -1
\end{bmatrix}
\begin{bmatrix}
5 & -1 \\ -2 & 0
\end{bmatrix}
\begin{bmatrix}
1 & 1 \\ 1 & -1
\end{bmatrix} =
\begin{bmatrix}
1 & 2 \\ 3 & 4
\end{bmatrix} = u
\end{equation}
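This $2 \times 2$ worked example can be reproduced in a few lines of numpy:

```python
import numpy as np

# Reproduce the 2x2 worked example: forward transform v = A u A^T and
# inverse u = A^{*T} v A^{*} with the normalized matrix A from the text.
A = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
u = np.array([[1.0, 2.0], [3.0, 4.0]])

v = A @ u @ A.T
assert np.allclose(v, [[5, -1], [-2, 0]])   # matches the computed v

u_rec = A.T @ v @ A     # A is real, so A^{*T} = A^T and A^{*} = A
assert np.allclose(u_rec, u)                # original image recovered
```

Since $A$ is real, the conjugate transposes in the inverse reduce to ordinary transposes.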
If the two images $u$ and $v$ are written with their rows concatenated one after another (row ordering), we get two vectors of size $N^2 \times 1$, denoted $u$ and $v$ respectively. The transformation can then be written as $v = (A \otimes A)u = \mathscr{A}u$, and the inverse transformation as $u = (A \otimes A)^{*T}v = \mathscr{A}^{*T}v$, where $\otimes$ is the Kronecker product. So, a transformation is separable if its matrix $\mathscr{A}$ can be decomposed into the Kronecker product of two matrices. This separability is what reduces the complexity of the operation.
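The equivalence between the separable form and the Kronecker-product form can be sketched with the same $2 \times 2$ matrices as in the example above:

```python
import numpy as np

# Verify that row-ordering (flattening) turns the separable transform
# v = A u A^T into the single matrix product vec(v) = (A kron A) vec(u).
A = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
u = np.array([[1.0, 2.0], [3.0, 4.0]])

v = A @ u @ A.T
big_A = np.kron(A, A)                 # N^2 x N^2 transform matrix
v_vec = big_A @ u.flatten()           # row-ordered (C-order) vectors
assert np.allclose(v_vec, v.flatten())

# Inverse on the stacked vector: u = (A kron A)^{*T} vec(v)
u_vec = big_A.conj().T @ v_vec
assert np.allclose(u_vec, u.flatten())
```

Note that `flatten()` in numpy uses row-major (C) order by default, which matches the row-ordering convention used here; a column-ordering convention would instead pair with `flatten(order='F')`.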