---
###### tags: `DataCompression`
---
# JPEG Decoder (1/3)
# 簡介
Baseline JPEG Baseline編碼時依序將經過下面流程:
1. RGB to Y'CbCr
2. 2-D DCT
3. Quantization
4. Huffman encoding
最後Header、Huffman table、Quantization table和資料一起包裝成jpg。
![](https://i.imgur.com/6DdYiJU.png)
出自[An Introduction to Image Compression](http://disp.ee.ntu.edu.tw/meeting/%E7%B6%AD%E6%AF%85/An%20Introduction%20to%20Image%20Compression/An%20introduction%20to%20Image%20Compression.pdf)
:::info
以`numpy`進行矩陣運算
```python=
import numpy as np
A = A @ B # matrix multiplication
A = np.dot(A,B) # matrix multiplication
A = np.multiply(A,B) # element-wise matrix multiplication
A = A.astype(np.uint8) # Copy the array and cast to uint8 type
A = A.reshape((3,3))
```
:::
# 1. RGB-to-Y'CbCr
參考 [YCbCr](https://en.wikipedia.org/wiki/YCbCr): JPEG conversion,`LenaRGB.raw` 為 512x512 24bit RGB影像。
```python=
import numpy as np
def rgb2yuv(rgb):
rgb = rgb.astype(np.float64)
m = np.array([
[0.29900, -0.168736, 0.5],
[0.58700, -0.331264, -0.418688],
[0.11400, 0.5, -0.081312]
])
yuv = np.dot(rgb, m)
yuv[:, :, 1:] += 128
return yuv.astype(np.uint8)
def yuv2rgb(yuv):
yuv = yuv.astype(np.float64)
yuv[:, :, 1:] -= 128
m = np.array([
[1.000, 1.000, 1.000],
[0.000, -0.344136, 1.772],
[1.402, -0.714136, 0.000],
])
rgb = np.dot(yuv, m)
return rgb.astype(np.uint8)
with open("./LenaRGB.raw", "rb") as f:
original = [f.read(1)[0] for i in range(512*512*3)]
original = np.array(original, dtype=np.uint8).reshape((512, 512, 3))
image = original[:, :, :].copy()
img_yuv = rgb2yuv(image)
img_rgb = yuv2rgb(img_yuv)
# Difference between two image
Diff = np.abs(img_rgb.astype(np.int16) - image.astype(np.int16))
for idx, color in enumerate(("Red", "Green", "Blue")):
print(color)
unique, counts = np.unique(Diff[:,:,idx], return_counts=True)
for d, c in np.asarray((unique, counts)).T:
print(d, c , '{:.3f}%'.format(c/512**2*100))
assert np.allclose(image, img_rgb, atol=3, rtol=0)
```
```
# 與原圖誤差
Red
1 98459 37.559%
2 150671 57.476%
3 13014 4.964%
Green
0 137287 52.371%
1 124857 47.629%
Blue
1 76003 28.993%
2 143250 54.646%
3 42891 16.362%
```
測試$RGB\rightarrow Y'CbCr \rightarrow RGB$,雖然肉眼分不出還原的圖與原圖差異,但從數據看來,值差異最多到3,以藍色最為明顯。造成誤差的原因是float、uint8間的轉換誤差。
# 2. 2D Discrete Cosine Transform
### DCT & IDCT
> 先定義JPEG使用的DCT式子,再推導出1D-DCT和2D-DCT。
[DCT-II](https://en.wikipedia.org/wiki/Discrete_cosine_transform)為最廣泛使用的DCT形式之一,用於JPEG壓縮,DCT依據定義不同而有不同形式。DCT-II式子如下:
- $X_k=2\sum_{n=0}^{N-1}x_n cos(\frac{\pi}{N}(n+\frac{1}{2})k)$
為了在其化為矩陣時成為orthogonal matrix ,乘上 scaling factor: f 化為 orthonormalized DCT-II。
![](https://i.imgur.com/NDlaXKU.png)
DCT-III為DCT-II的inverse,orthonormalized DCT-III如下:
$x_k=\frac{X_0}{\sqrt{N}}+\sqrt{\frac{2}{N}}\sum_{n=1}^{N-1}X_n cos(\frac{\pi}{N}(k+\frac{1}{2})n)$
現在簡單驗證一下乘上$\sqrt{\frac{1}{4N}}$、$\sqrt{\frac{1}{2N}}$係數不會影響到轉換,並以`scipy.fftpack.dct`驗證結果是否一致。`scipy.fftpack.dct`有提供幾個參數,設置`norm='ortho'`使矩陣係數orthonormal。
```python=
import matplotlib.pyplot as plt
from scipy.fftpack import dct, idct
import numpy as np
def get_Xk(Xs, k):
L = len(Xs)
ns = np.arange(0, L, 1)
terms = 2*Xs*np.cos((np.pi*k*(1/2+ns))/L)
if k == 0:
terms *= np.sqrt(1/(4*L))
else:
terms *= np.sqrt(1/(2*L))
return np.sum(terms)
def get_xn(Xk, k):
L = len(Xk)
ks = np.arange(0, L, 1)
terms = Xk*np.cos((np.pi*(1/2+k)*ks)/L)
terms[0] *= np.sqrt(1/L)
terms[1:] *= np.sqrt(2/L)
return np.sum(terms)
# my data
original_data = np.random.rand(4)
L = original_data.shape[0]
print('Original data:', original_data)
# DCT
X = []
for k in range(L):
Xk = get_Xk(original_data, k)
X.append(Xk)
print('After DCT:', X)
assert np.allclose(X,
dct(original_data, norm='ortho', axis=0))
# IDCT
Y = []
for k in range(L):
xn = get_xn(X, k)
Y.append(xn)
print('After IDCT', Y)
assert np.allclose(Y,
idct(X, norm='ortho', axis=0))
# Verify
assert np.allclose(original_data, Y)
```
```
Original data: [0.55236188 0.60173472 0.6432904 0.86820949]
After DCT: [1.332798243756804, -0.21758228008826608, 0.08777313099349143, -0.058320189471026906]
After IDCT [0.5523618833360506, 0.6017347154930212, 0.6432903972702912, 0.8682094914142449]
```
結果符合預期。
### 1-D DCT
試著將DCT化成矩陣形式
$X=Cx$
- $X\in R^N$,第k項元素代表$X_k$
- $C$是NxN矩陣,為轉換時的係數,稱為DCT matrix
- $when\ k=0,C_{k,n}=\sqrt{\frac{1}{N}}cos{(\frac{\pi}{N}(n+\frac{1}{2})k)}$
- $when\ k>0,C_{k,n}=\sqrt{\frac{2}{N}}cos{(\frac{\pi}{N}(n+\frac{1}{2})k)}$
隨意列舉幾個DCT matrix:
$C_{2X2}=\left[
\begin{array}{ccc}
0.707 & 0.707 \\
0.707 & -0.707 \\
\end{array}
\right]$
$C_{3X3}=\left[
\begin{array}{ccc}
0.577 & 0.577 & 0.577 \\
0.707 & 0 & -0.707 \\
0.408 & -0.816 & 0.408 \\
\end{array}
\right]$
$C_{4X4}=\left[
\begin{array}{ccc}
0.5 & 0.5 & 0.5 & 0.5 \\
0.653 & 0.271 & -0.271 & -0.653\\
0.5 & -0.5 & -0.5 & 0.5 \\
0.271 & -0.653 & 0.653 & -0.271 \\
\end{array}
\right]$
不難看出$C$是[orthogonal matrix](https://en.wikipedia.org/wiki/Orthogonal_matrix),根據orthogonal matrix的特性:$C^TC=CC^T=I \longleftrightarrow C^T=C^{-1}$。
在進行反轉換時原本要求的$C^{-1}$可代換為$C^T$,得到$x=C^{-1}=C^TX$。
驗證$C$的orthogonality:
```python=
import matplotlib.pyplot as plt
import numpy as np
L = 8
C = np.zeros((L, L))
for k in range(L):
for n in range(L):
C[k, n] = np.sqrt(1/L)*np.cos(np.pi*k*(1/2+n)/L)
if k != 0:
C[k, n] *= np.sqrt(2)
print('C=', np.array2string(np.round(C, 3), prefix='C='))
print('C^TC=', np.array2string(np.round(np.transpose(C) @ C, 3), prefix='C^TC='))
assert np.allclose(np.transpose(C) @ C, np.eye(8))
```
```
C= [[ 0.354 0.354 0.354 0.354 0.354 0.354 0.354 0.354]
[ 0.49 0.416 0.278 0.098 -0.098 -0.278 -0.416 -0.49 ]
[ 0.462 0.191 -0.191 -0.462 -0.462 -0.191 0.191 0.462]
[ 0.416 -0.098 -0.49 -0.278 0.278 0.49 0.098 -0.416]
[ 0.354 -0.354 -0.354 0.354 0.354 -0.354 -0.354 0.354]
[ 0.278 -0.49 0.098 0.416 -0.416 -0.098 0.49 -0.278]
[ 0.191 -0.462 0.462 -0.191 -0.191 0.462 -0.462 0.191]
[ 0.098 -0.278 0.416 -0.49 0.49 -0.416 0.278 -0.098]]
C^TC= [[ 1. -0. 0. 0. -0. 0. 0. -0.]
[-0. 1. -0. -0. 0. -0. -0. 0.]
[ 0. -0. 1. -0. -0. -0. -0. -0.]
[ 0. -0. -0. 1. 0. 0. 0. 0.]
[-0. 0. -0. 0. 1. -0. 0. -0.]
[ 0. -0. -0. 0. -0. 1. 0. 0.]
[ 0. -0. -0. 0. 0. 0. 1. 0.]
[-0. 0. -0. 0. -0. 0. 0. 1.]]
```
:::success
- 由於浮點數精度的問題,用`np.allclose`容忍一定範圍內的誤差
- orthogonal matrix
1. 實數矩陣
2. 方陣
3. 行列皆為orthogonal unit vectors (orthonormal vectors)
:::
### 2-D DCT
2-D DCT其中一個特性是seperable,也就是說可拆分成兩個1-D DCT
1. 垂直方向對column進行DCT
2. 水平方向對row進行DCT
以矩陣表示之
$X=CxC^T$
- $x$為image matrix
- $X$為經過2-D DCT的image matrix
以8x8 DCT matrix大小為例,兩次矩陣乘法的運算量十分驚人,如果我們將$x$及$X$進行[向量化](https://en.wikipedia.org/wiki/Vectorization_(mathematics))成一維矩陣(以$vec(...)$表示),可以用[Kronecker product](https://en.wikipedia.org/wiki/Kronecker_product)將$C^T,C$合併成一個矩陣。Kronecker product又被稱作matrix direct product,運算子$\otimes$,不具交換性,定義如下:
![](https://i.imgur.com/uKoHYYr.png)
![](https://i.imgur.com/5gkkbl6.png)
與[向量化結合](https://en.wikipedia.org/wiki/Vectorization_(mathematics)#Compatibility_with_Kronecker_products)
$vec(ABC)=(C^T\otimes A)vec(B)$
依據以上式子改寫原式:
$X=CxC^T\\
\rightarrow vec(X)=((C^T)^T\otimes C)vec(x)\\
\rightarrow vec(X)=(C\otimes C)vec(x)$
事先計算$C\otimes C$可增加DCT轉換效率。`numpy`可協助做到Kronecker product
```python=
import numpy as np
A = np.array([[1,2],[3,4]])
B = np.array([[0,5],[6,7]])
C = np.kron(A,B)
C = array([[ 0, 5, 0, 10],
[ 6, 7, 12, 14],
[ 0, 15, 0, 20],
[18, 21, 24, 28]])
```
:::info
JPEG是由一小片一小片壓縮後的影像組合而成,以8x8 pixels為單位,進行量化、編碼,稱為Minimum Coded Unit (MCU)。因此2D-DCT矩陣以及量化矩陣也是8x8。
:::
接著使用`scipy.fftpack.dct`檢驗依`dct_2`生成的8x8 2D-DCT矩陣是否一致,`dct_2`是我們的JPEG Encoder用來計算2-D DCT的函式:
```python=
import matplotlib.pyplot as plt
import numpy as np
from scipy.fftpack import dct, idct
def dct_2(input):
L = input.shape[0]
C = np.zeros((L, L))
for k in range(L):
for n in range(L):
C[k, n] = np.sqrt(1/L)*np.cos(np.pi*k*(1/2+n)/L)
if k != 0:
C[k, n] *= np.sqrt(2)
return (np.kron(C, C) @ input.flatten()).reshape(L, L)
M0 = np.random.rand(8, 8)
D1 = dct_2(M0)
D2 = dct(dct(M0, norm='ortho', axis=0), norm='ortho', axis=1)
print('D1=', np.array2string(np.round(D1, 3), prefix='D1='))
print('D2=', np.array2string(np.round(D2, 3), prefix='D2='))
assert np.allclose(D1, D2)
```
假設$C$為8x8:
```
C⊗C= [[ 0.125 0.125 0.125 ... 0.125 0.125 0.125 ]
[ 0.1734 0.147 0.0982 ... -0.0982 -0.147 -0.1734]
[ 0.1633 0.0676 -0.0676 ... -0.0676 0.0676 0.1633]
...
[ 0.0271 -0.0478 0.0095 ... 0.0095 -0.0478 0.0271]
[ 0.0187 -0.0451 0.0451 ... -0.0451 0.0451 -0.0187]
[ 0.0095 -0.0271 0.0406 ... 0.0406 -0.0271 0.0095]]
```
# 3. Quantization
量化是為了去除視覺上不重要的資訊。quality factor越高,生成的量化矩陣(quantization table)就多元素接近1,經過DCT轉換的區塊除上量化矩陣,保留的資料就越多。各家相機廠商使用的[量化矩陣](https://www.impulseadventure.com/photo/jpeg-quantization.html),皆有所不同。
實作上由於quality factor生成的量化矩陣可能有元素為0,需要將結果限制在`[0,255]`
```python=
import numpy as np
# Standard JPEG quantization tables
# Luminance
qt_luminance = np.array([
[16, 11, 10, 16, 24, 40, 51, 61],
[12, 12, 14, 19, 26, 58, 60, 55],
[14, 13, 16, 24, 40, 57, 69, 56],
[14, 17, 22, 29, 51, 87, 80, 62],
[18, 22, 37, 56, 68, 109, 103, 77],
[24, 35, 55, 64, 81, 104, 113, 92],
[49, 64, 78, 87, 103, 121, 120, 101],
[72, 92, 95, 98, 112, 100, 103, 99],
])
# Chrominance
qt_chrominance = np.array([
[17, 18, 24, 47, 99, 99, 99, 99],
[18, 21, 26, 66, 99, 99, 99, 99],
[24, 26, 56, 99, 99, 99, 99, 99],
[47, 66, 99, 99, 99, 99, 99, 99],
[99, 99, 99, 99, 99, 99, 99, 99],
[99, 99, 99, 99, 99, 99, 99, 99],
[99, 99, 99, 99, 99, 99, 99, 99],
[99, 99, 99, 99, 99, 99, 99, 99],
])
def Quantize(input, qf, jpeg_quantiz_matrix):
# Quality factor = 1...100
factor = 5000/qf if (qf < 50) else 200-2*qf
# Prevent divide by 0 error
jpeg_quantiz_matrix = np.maximum(np.floor(jpeg_quantiz_matrix*factor/100+.5),
np.ones((8, 8)))
print("QF={}".format(qf))
print(np.array2string(np.round(jpeg_quantiz_matrix, 3), prefix=''))
return (input // jpeg_quantiz_matrix) # '//' is the floor division operator
Quantize(np.eye(8), 100, qt_luminance)
Quantize(np.eye(8), 95, qt_luminance)
Quantize(np.eye(8), 80, qt_luminance)
```
Standard JPEG quantization tables (Luminance)在不同quality factor下的量化矩陣
```
QF=100
[[1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1.]]
QF=95
[[ 2. 1. 1. 2. 2. 4. 5. 6.]
[ 1. 1. 1. 2. 3. 6. 6. 6.]
[ 1. 1. 2. 2. 4. 6. 7. 6.]
[ 1. 2. 2. 3. 5. 9. 8. 6.]
[ 2. 2. 4. 6. 7. 11. 10. 8.]
[ 2. 4. 6. 6. 8. 10. 11. 9.]
[ 5. 6. 8. 9. 10. 12. 12. 10.]
[ 7. 9. 10. 10. 11. 10. 10. 10.]]
QF=80
[[ 6. 4. 4. 6. 10. 16. 20. 24.]
[ 5. 5. 6. 8. 10. 23. 24. 22.]
[ 6. 5. 6. 10. 16. 23. 28. 22.]
[ 6. 7. 9. 12. 20. 35. 32. 25.]
[ 7. 9. 15. 22. 27. 44. 41. 31.]
[10. 14. 22. 26. 32. 42. 45. 37.]
[20. 26. 31. 35. 41. 48. 48. 40.]
[29. 37. 38. 39. 45. 40. 41. 40.]]
```
# Reference
- [JPEG Huffman Coding Tutorial](https://www.impulseadventure.com/photo/jpeg-huffman-coding.html)