JPEG Decoder (1/3)

--- ###### tags: `DataCompression` --- # JPEG Decoder (1/3) # 簡介 Baseline JPEG Baseline編碼時依序將經過下面流程： 1. RGB to Y'CbCr 2. 2-D DCT 3. Quantization 4. Huffman encoding 最後Header、Huffman table、Quantization table和資料一起包裝成jpg。 ![](https://i.imgur.com/6DdYiJU.png) 出自[An Introduction to Image Compression](http://disp.ee.ntu.edu.tw/meeting/%E7%B6%AD%E6%AF%85/An%20Introduction%20to%20Image%20Compression/An%20introduction%20to%20Image%20Compression.pdf) :::info 以`numpy`進行矩陣運算 ```python= import numpy as np A = A @ B # matrix multiplication A = np.dot(A,B) # matrix multiplication A = np.multiply(A,B) # element-wise matrix multiplication A = A.astype(np.uint8) # Copy the array and cast to uint8 type A = A.reshape((3,3)) ``` ::: # 1. RGB-to-Y'CbCr 參考 [YCbCr](https://en.wikipedia.org/wiki/YCbCr): JPEG conversion，`LenaRGB.raw` 為 512x512 24bit RGB影像。 ```python= import numpy as np def rgb2yuv(rgb): rgb = rgb.astype(np.float64) m = np.array([ [0.29900, -0.168736, 0.5], [0.58700, -0.331264, -0.418688], [0.11400, 0.5, -0.081312] ]) yuv = np.dot(rgb, m) yuv[:, :, 1:] += 128 return yuv.astype(np.uint8) def yuv2rgb(yuv): yuv = yuv.astype(np.float64) yuv[:, :, 1:] -= 128 m = np.array([ [1.000, 1.000, 1.000], [0.000, -0.344136, 1.772], [1.402, -0.714136, 0.000], ]) rgb = np.dot(yuv, m) return rgb.astype(np.uint8) with open("./LenaRGB.raw", "rb") as f: original = [f.read(1)[0] for i in range(512*512*3)] original = np.array(original, dtype=np.uint8).reshape((512, 512, 3)) image = original[:, :, :].copy() img_yuv = rgb2yuv(image) img_rgb = yuv2rgb(img_yuv) # Difference between two image Diff = np.abs(img_rgb.astype(np.int16) - image.astype(np.int16)) for idx, color in enumerate(("Red", "Green", "Blue")): print(color) unique, counts = np.unique(Diff[:,:,idx], return_counts=True) for d, c in np.asarray((unique, counts)).T: print(d, c , '{:.3f}%'.format(c/512**2*100)) assert np.allclose(image, img_rgb, atol=3, rtol=0) ``` ``` # 與原圖誤差 Red 1 98459 37.559% 2 150671 57.476% 3 13014 4.964% Green 0 137287 52.371% 1 124857 47.629% Blue 1 76003 28.993% 2 143250 54.646% 3 42891 16.362% ``` 測試$RGB\rightarrow Y'CbCr \rightarrow RGB$，雖然肉眼分不出還原的圖與原圖差異，但從數據看來，值差異最多到3，以藍色最為明顯。造成誤差的原因是float、uint8間的轉換誤差。 # 2. 2D Discrete Cosine Transform ### DCT & IDCT > 先定義JPEG使用的DCT式子，再推導出1D-DCT和2D-DCT。 [DCT-II](https://en.wikipedia.org/wiki/Discrete_cosine_transform)為最廣泛使用的DCT形式之一，用於JPEG壓縮，DCT依據定義不同而有不同形式。DCT-II式子如下： - $X_k=2\sum_{n=0}^{N-1}x_n cos(\frac{\pi}{N}(n+\frac{1}{2})k)$ 為了在其化為矩陣時成為orthogonal matrix ，乘上 scaling factor: f 化為 orthonormalized DCT-II。 ![](https://i.imgur.com/NDlaXKU.png) DCT-III為DCT-II的inverse，orthonormalized DCT-III如下： $x_k=\frac{X_0}{\sqrt{N}}+\sqrt{\frac{2}{N}}\sum_{n=1}^{N-1}X_n cos(\frac{\pi}{N}(k+\frac{1}{2})n)$ 現在簡單驗證一下乘上$\sqrt{\frac{1}{4N}}$、$\sqrt{\frac{1}{2N}}$係數不會影響到轉換，並以`scipy.fftpack.dct`驗證結果是否一致。`scipy.fftpack.dct`有提供幾個參數，設置`norm='ortho'`使矩陣係數orthonormal。 ```python= import matplotlib.pyplot as plt from scipy.fftpack import dct, idct import numpy as np def get_Xk(Xs, k): L = len(Xs) ns = np.arange(0, L, 1) terms = 2*Xs*np.cos((np.pi*k*(1/2+ns))/L) if k == 0: terms *= np.sqrt(1/(4*L)) else: terms *= np.sqrt(1/(2*L)) return np.sum(terms) def get_xn(Xk, k): L = len(Xk) ks = np.arange(0, L, 1) terms = Xk*np.cos((np.pi*(1/2+k)*ks)/L) terms[0] *= np.sqrt(1/L) terms[1:] *= np.sqrt(2/L) return np.sum(terms) # my data original_data = np.random.rand(4) L = original_data.shape[0] print('Original data:', original_data) # DCT X = [] for k in range(L): Xk = get_Xk(original_data, k) X.append(Xk) print('After DCT:', X) assert np.allclose(X, dct(original_data, norm='ortho', axis=0)) # IDCT Y = [] for k in range(L): xn = get_xn(X, k) Y.append(xn) print('After IDCT', Y) assert np.allclose(Y, idct(X, norm='ortho', axis=0)) # Verify assert np.allclose(original_data, Y) ``` ``` Original data: [0.55236188 0.60173472 0.6432904 0.86820949] After DCT: [1.332798243756804, -0.21758228008826608, 0.08777313099349143, -0.058320189471026906] After IDCT [0.5523618833360506, 0.6017347154930212, 0.6432903972702912, 0.8682094914142449] ``` 結果符合預期。 ### 1-D DCT 試著將DCT化成矩陣形式 $X=Cx$ - $X\in R^N$，第k項元素代表$X_k$ - $C$是NxN矩陣，為轉換時的係數，稱為DCT matrix - $when\ k=0,C_{k,n}=\sqrt{\frac{1}{N}}cos{(\frac{\pi}{N}(n+\frac{1}{2})k)}$ - $when\ k>0,C_{k,n}=\sqrt{\frac{2}{N}}cos{(\frac{\pi}{N}(n+\frac{1}{2})k)}$ 隨意列舉幾個DCT matrix： $C_{2X2}=\left[ \begin{array}{ccc} 0.707 & 0.707 \\ 0.707 & -0.707 \\ \end{array} \right]$ $C_{3X3}=\left[ \begin{array}{ccc} 0.577 & 0.577 & 0.577 \\ 0.707 & 0 & -0.707 \\ 0.408 & -0.816 & 0.408 \\ \end{array} \right]$ $C_{4X4}=\left[ \begin{array}{ccc} 0.5 & 0.5 & 0.5 & 0.5 \\ 0.653 & 0.271 & -0.271 & -0.653\\ 0.5 & -0.5 & -0.5 & 0.5 \\ 0.271 & -0.653 & 0.653 & -0.271 \\ \end{array} \right]$ 不難看出$C$是[orthogonal matrix](https://en.wikipedia.org/wiki/Orthogonal_matrix)，根據orthogonal matrix的特性：$C^TC=CC^T=I \longleftrightarrow C^T=C^{-1}$。在進行反轉換時原本要求的$C^{-1}$可代換為$C^T$，得到$x=C^{-1}=C^TX$。驗證$C$的orthogonality： ```python= import matplotlib.pyplot as plt import numpy as np L = 8 C = np.zeros((L, L)) for k in range(L): for n in range(L): C[k, n] = np.sqrt(1/L)*np.cos(np.pi*k*(1/2+n)/L) if k != 0: C[k, n] *= np.sqrt(2) print('C=', np.array2string(np.round(C, 3), prefix='C=')) print('C^TC=', np.array2string(np.round(np.transpose(C) @ C, 3), prefix='C^TC=')) assert np.allclose(np.transpose(C) @ C, np.eye(8)) ``` ``` C= [[ 0.354 0.354 0.354 0.354 0.354 0.354 0.354 0.354] [ 0.49 0.416 0.278 0.098 -0.098 -0.278 -0.416 -0.49 ] [ 0.462 0.191 -0.191 -0.462 -0.462 -0.191 0.191 0.462] [ 0.416 -0.098 -0.49 -0.278 0.278 0.49 0.098 -0.416] [ 0.354 -0.354 -0.354 0.354 0.354 -0.354 -0.354 0.354] [ 0.278 -0.49 0.098 0.416 -0.416 -0.098 0.49 -0.278] [ 0.191 -0.462 0.462 -0.191 -0.191 0.462 -0.462 0.191] [ 0.098 -0.278 0.416 -0.49 0.49 -0.416 0.278 -0.098]] C^TC= [[ 1. -0. 0. 0. -0. 0. 0. -0.] [-0. 1. -0. -0. 0. -0. -0. 0.] [ 0. -0. 1. -0. -0. -0. -0. -0.] [ 0. -0. -0. 1. 0. 0. 0. 0.] [-0. 0. -0. 0. 1. -0. 0. -0.] [ 0. -0. -0. 0. -0. 1. 0. 0.] [ 0. -0. -0. 0. 0. 0. 1. 0.] [-0. 0. -0. 0. -0. 0. 0. 1.]] ``` :::success - 由於浮點數精度的問題，用`np.allclose`容忍一定範圍內的誤差 - orthogonal matrix 1. 實數矩陣 2. 方陣 3. 行列皆為orthogonal unit vectors (orthonormal vectors) ::: ### 2-D DCT 2-D DCT其中一個特性是seperable，也就是說可拆分成兩個1-D DCT 1. 垂直方向對column進行DCT 2. 水平方向對row進行DCT 以矩陣表示之 $X=CxC^T$ - $x$為image matrix - $X$為經過2-D DCT的image matrix 以8x8 DCT matrix大小為例，兩次矩陣乘法的運算量十分驚人，如果我們將$x$及$X$進行[向量化](https://en.wikipedia.org/wiki/Vectorization_(mathematics))成一維矩陣(以$vec(...)$表示)，可以用[Kronecker product](https://en.wikipedia.org/wiki/Kronecker_product)將$C^T,C$合併成一個矩陣。Kronecker product又被稱作matrix direct product，運算子$\otimes$，不具交換性，定義如下: ![](https://i.imgur.com/uKoHYYr.png) ![](https://i.imgur.com/5gkkbl6.png) 與[向量化結合](https://en.wikipedia.org/wiki/Vectorization_(mathematics)#Compatibility_with_Kronecker_products) $vec(ABC)=(C^T\otimes A)vec(B)$ 依據以上式子改寫原式： $X=CxC^T\\ \rightarrow vec(X)=((C^T)^T\otimes C)vec(x)\\ \rightarrow vec(X)=(C\otimes C)vec(x)$ 事先計算$C\otimes C$可增加DCT轉換效率。`numpy`可協助做到Kronecker product ```python= import numpy as np A = np.array([[1,2],[3,4]]) B = np.array([[0,5],[6,7]]) C = np.kron(A,B) C = array([[ 0, 5, 0, 10], [ 6, 7, 12, 14], [ 0, 15, 0, 20], [18, 21, 24, 28]]) ``` :::info JPEG是由一小片一小片壓縮後的影像組合而成，以8x8 pixels為單位，進行量化、編碼，稱為Minimum Coded Unit (MCU)。因此2D-DCT矩陣以及量化矩陣也是8x8。 ::: 接著使用`scipy.fftpack.dct`檢驗依`dct_2`生成的8x8 2D-DCT矩陣是否一致，`dct_2`是我們的JPEG Encoder用來計算2-D DCT的函式： ```python= import matplotlib.pyplot as plt import numpy as np from scipy.fftpack import dct, idct def dct_2(input): L = input.shape[0] C = np.zeros((L, L)) for k in range(L): for n in range(L): C[k, n] = np.sqrt(1/L)*np.cos(np.pi*k*(1/2+n)/L) if k != 0: C[k, n] *= np.sqrt(2) return (np.kron(C, C) @ input.flatten()).reshape(L, L) M0 = np.random.rand(8, 8) D1 = dct_2(M0) D2 = dct(dct(M0, norm='ortho', axis=0), norm='ortho', axis=1) print('D1=', np.array2string(np.round(D1, 3), prefix='D1=')) print('D2=', np.array2string(np.round(D2, 3), prefix='D2=')) assert np.allclose(D1, D2) ``` 假設$C$為8x8: ``` C⊗C= [[ 0.125 0.125 0.125 ... 0.125 0.125 0.125 ] [ 0.1734 0.147 0.0982 ... -0.0982 -0.147 -0.1734] [ 0.1633 0.0676 -0.0676 ... -0.0676 0.0676 0.1633] ... [ 0.0271 -0.0478 0.0095 ... 0.0095 -0.0478 0.0271] [ 0.0187 -0.0451 0.0451 ... -0.0451 0.0451 -0.0187] [ 0.0095 -0.0271 0.0406 ... 0.0406 -0.0271 0.0095]] ``` # 3. Quantization 量化是為了去除視覺上不重要的資訊。quality factor越高，生成的量化矩陣(quantization table)就多元素接近1，經過DCT轉換的區塊除上量化矩陣，保留的資料就越多。各家相機廠商使用的[量化矩陣](https://www.impulseadventure.com/photo/jpeg-quantization.html)，皆有所不同。實作上由於quality factor生成的量化矩陣可能有元素為0，需要將結果限制在`[0,255]` ```python= import numpy as np # Standard JPEG quantization tables # Luminance qt_luminance = np.array([ [16, 11, 10, 16, 24, 40, 51, 61], [12, 12, 14, 19, 26, 58, 60, 55], [14, 13, 16, 24, 40, 57, 69, 56], [14, 17, 22, 29, 51, 87, 80, 62], [18, 22, 37, 56, 68, 109, 103, 77], [24, 35, 55, 64, 81, 104, 113, 92], [49, 64, 78, 87, 103, 121, 120, 101], [72, 92, 95, 98, 112, 100, 103, 99], ]) # Chrominance qt_chrominance = np.array([ [17, 18, 24, 47, 99, 99, 99, 99], [18, 21, 26, 66, 99, 99, 99, 99], [24, 26, 56, 99, 99, 99, 99, 99], [47, 66, 99, 99, 99, 99, 99, 99], [99, 99, 99, 99, 99, 99, 99, 99], [99, 99, 99, 99, 99, 99, 99, 99], [99, 99, 99, 99, 99, 99, 99, 99], [99, 99, 99, 99, 99, 99, 99, 99], ]) def Quantize(input, qf, jpeg_quantiz_matrix): # Quality factor = 1...100 factor = 5000/qf if (qf < 50) else 200-2*qf # Prevent divide by 0 error jpeg_quantiz_matrix = np.maximum(np.floor(jpeg_quantiz_matrix*factor/100+.5), np.ones((8, 8))) print("QF={}".format(qf)) print(np.array2string(np.round(jpeg_quantiz_matrix, 3), prefix='')) return (input // jpeg_quantiz_matrix) # '//' is the floor division operator Quantize(np.eye(8), 100, qt_luminance) Quantize(np.eye(8), 95, qt_luminance) Quantize(np.eye(8), 80, qt_luminance) ``` Standard JPEG quantization tables (Luminance)在不同quality factor下的量化矩陣 ``` QF=100 [[1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1.] [1. 1. 1. 1. 1. 1. 1. 1.]] QF=95 [[ 2. 1. 1. 2. 2. 4. 5. 6.] [ 1. 1. 1. 2. 3. 6. 6. 6.] [ 1. 1. 2. 2. 4. 6. 7. 6.] [ 1. 2. 2. 3. 5. 9. 8. 6.] [ 2. 2. 4. 6. 7. 11. 10. 8.] [ 2. 4. 6. 6. 8. 10. 11. 9.] [ 5. 6. 8. 9. 10. 12. 12. 10.] [ 7. 9. 10. 10. 11. 10. 10. 10.]] QF=80 [[ 6. 4. 4. 6. 10. 16. 20. 24.] [ 5. 5. 6. 8. 10. 23. 24. 22.] [ 6. 5. 6. 10. 16. 23. 28. 22.] [ 6. 7. 9. 12. 20. 35. 32. 25.] [ 7. 9. 15. 22. 27. 44. 41. 31.] [10. 14. 22. 26. 32. 42. 45. 37.] [20. 26. 31. 35. 41. 48. 48. 40.] [29. 37. 38. 39. 45. 40. 41. 40.]] ``` # Reference - [JPEG Huffman Coding Tutorial](https://www.impulseadventure.com/photo/jpeg-huffman-coding.html)