Try   HackMD

JPEG Decoder (1/3)

簡介

Baseline JPEG Baseline編碼時依序將經過下面流程:

  1. RGB to Y'CbCr
  2. 2-D DCT
  3. Quantization
  4. Huffman encoding

最後Header、Huffman table、Quantization table和資料一起包裝成jpg。


出自An Introduction to Image Compression

numpy進行矩陣運算

import numpy as np A = A @ B # matrix multiplication A = np.dot(A,B) # matrix multiplication A = np.multiply(A,B) # element-wise matrix multiplication A = A.astype(np.uint8) # Copy the array and cast to uint8 type A = A.reshape((3,3))

1. RGB-to-Y'CbCr

參考 YCbCr: JPEG conversion,LenaRGB.raw 為 512x512 24bit RGB影像。

import numpy as np def rgb2yuv(rgb): rgb = rgb.astype(np.float64) m = np.array([ [0.29900, -0.168736, 0.5], [0.58700, -0.331264, -0.418688], [0.11400, 0.5, -0.081312] ]) yuv = np.dot(rgb, m) yuv[:, :, 1:] += 128 return yuv.astype(np.uint8) def yuv2rgb(yuv): yuv = yuv.astype(np.float64) yuv[:, :, 1:] -= 128 m = np.array([ [1.000, 1.000, 1.000], [0.000, -0.344136, 1.772], [1.402, -0.714136, 0.000], ]) rgb = np.dot(yuv, m) return rgb.astype(np.uint8) with open("./LenaRGB.raw", "rb") as f: original = [f.read(1)[0] for i in range(512*512*3)] original = np.array(original, dtype=np.uint8).reshape((512, 512, 3)) image = original[:, :, :].copy() img_yuv = rgb2yuv(image) img_rgb = yuv2rgb(img_yuv) # Difference between two image Diff = np.abs(img_rgb.astype(np.int16) - image.astype(np.int16)) for idx, color in enumerate(("Red", "Green", "Blue")): print(color) unique, counts = np.unique(Diff[:,:,idx], return_counts=True) for d, c in np.asarray((unique, counts)).T: print(d, c , '{:.3f}%'.format(c/512**2*100)) assert np.allclose(image, img_rgb, atol=3, rtol=0)
# 與原圖誤差
Red
1 98459 37.559%
2 150671 57.476%
3 13014 4.964%
Green
0 137287 52.371%
1 124857 47.629%
Blue
1 76003 28.993%
2 143250 54.646%
3 42891 16.362%

測試

RGBYCbCrRGB,雖然肉眼分不出還原的圖與原圖差異,但從數據看來,值差異最多到3,以藍色最為明顯。造成誤差的原因是float、uint8間的轉換誤差。

2. 2D Discrete Cosine Transform

DCT & IDCT

先定義JPEG使用的DCT式子,再推導出1D-DCT和2D-DCT。

DCT-II為最廣泛使用的DCT形式之一,用於JPEG壓縮,DCT依據定義不同而有不同形式。DCT-II式子如下:

  • Xk=2n=0N1xncos(πN(n+12)k)

為了在其化為矩陣時成為orthogonal matrix ,乘上 scaling factor: f 化為 orthonormalized DCT-II。

DCT-III為DCT-II的inverse,orthonormalized DCT-III如下:

xk=X0N+2Nn=1N1Xncos(πN(k+12)n)

現在簡單驗證一下乘上

14N
12N
係數不會影響到轉換,並以scipy.fftpack.dct驗證結果是否一致。scipy.fftpack.dct有提供幾個參數,設置norm='ortho'使矩陣係數orthonormal。

import matplotlib.pyplot as plt from scipy.fftpack import dct, idct import numpy as np def get_Xk(Xs, k): L = len(Xs) ns = np.arange(0, L, 1) terms = 2*Xs*np.cos((np.pi*k*(1/2+ns))/L) if k == 0: terms *= np.sqrt(1/(4*L)) else: terms *= np.sqrt(1/(2*L)) return np.sum(terms) def get_xn(Xk, k): L = len(Xk) ks = np.arange(0, L, 1) terms = Xk*np.cos((np.pi*(1/2+k)*ks)/L) terms[0] *= np.sqrt(1/L) terms[1:] *= np.sqrt(2/L) return np.sum(terms) # my data original_data = np.random.rand(4) L = original_data.shape[0] print('Original data:', original_data) # DCT X = [] for k in range(L): Xk = get_Xk(original_data, k) X.append(Xk) print('After DCT:', X) assert np.allclose(X, dct(original_data, norm='ortho', axis=0)) # IDCT Y = [] for k in range(L): xn = get_xn(X, k) Y.append(xn) print('After IDCT', Y) assert np.allclose(Y, idct(X, norm='ortho', axis=0)) # Verify assert np.allclose(original_data, Y)
Original data: [0.55236188 0.60173472 0.6432904  0.86820949]
After DCT: [1.332798243756804, -0.21758228008826608, 0.08777313099349143, -0.058320189471026906]
After IDCT [0.5523618833360506, 0.6017347154930212, 0.6432903972702912, 0.8682094914142449]

結果符合預期。

1-D DCT

試著將DCT化成矩陣形式

X=Cx

  • XRN
    ,第k項元素代表
    Xk
  • C
    是NxN矩陣,為轉換時的係數,稱為DCT matrix
    • when k=0,Ck,n=1Ncos(πN(n+12)k)
    • when k>0,Ck,n=2Ncos(πN(n+12)k)

隨意列舉幾個DCT matrix:

C2X2=[0.7070.7070.7070.707]

C3X3=[0.5770.5770.5770.70700.7070.4080.8160.408]

C4X4=[0.50.50.50.50.6530.2710.2710.6530.50.50.50.50.2710.6530.6530.271]

不難看出

Corthogonal matrix,根據orthogonal matrix的特性:
CTC=CCT=ICT=C1

在進行反轉換時原本要求的
C1
可代換為
CT
,得到
x=C1=CTX

驗證

C的orthogonality:

import matplotlib.pyplot as plt import numpy as np L = 8 C = np.zeros((L, L)) for k in range(L): for n in range(L): C[k, n] = np.sqrt(1/L)*np.cos(np.pi*k*(1/2+n)/L) if k != 0: C[k, n] *= np.sqrt(2) print('C=', np.array2string(np.round(C, 3), prefix='C=')) print('C^TC=', np.array2string(np.round(np.transpose(C) @ C, 3), prefix='C^TC=')) assert np.allclose(np.transpose(C) @ C, np.eye(8))
C= [[ 0.354  0.354  0.354  0.354  0.354  0.354  0.354  0.354]
   [ 0.49   0.416  0.278  0.098 -0.098 -0.278 -0.416 -0.49 ]
   [ 0.462  0.191 -0.191 -0.462 -0.462 -0.191  0.191  0.462]
   [ 0.416 -0.098 -0.49  -0.278  0.278  0.49   0.098 -0.416]
   [ 0.354 -0.354 -0.354  0.354  0.354 -0.354 -0.354  0.354]
   [ 0.278 -0.49   0.098  0.416 -0.416 -0.098  0.49  -0.278]
   [ 0.191 -0.462  0.462 -0.191 -0.191  0.462 -0.462  0.191]
   [ 0.098 -0.278  0.416 -0.49   0.49  -0.416  0.278 -0.098]]
C^TC= [[ 1. -0.  0.  0. -0.  0.  0. -0.]
      [-0.  1. -0. -0.  0. -0. -0.  0.]
      [ 0. -0.  1. -0. -0. -0. -0. -0.]
      [ 0. -0. -0.  1.  0.  0.  0.  0.]
      [-0.  0. -0.  0.  1. -0.  0. -0.]
      [ 0. -0. -0.  0. -0.  1.  0.  0.]
      [ 0. -0. -0.  0.  0.  0.  1.  0.]
      [-0.  0. -0.  0. -0.  0.  0.  1.]]
  • 由於浮點數精度的問題,用np.allclose容忍一定範圍內的誤差
  • orthogonal matrix
    1. 實數矩陣
    2. 方陣
    3. 行列皆為orthogonal unit vectors (orthonormal vectors)

2-D DCT

2-D DCT其中一個特性是seperable,也就是說可拆分成兩個1-D DCT

  1. 垂直方向對column進行DCT
  2. 水平方向對row進行DCT

以矩陣表示之

X=CxCT

  • x
    為image matrix
  • X
    為經過2-D DCT的image matrix

以8x8 DCT matrix大小為例,兩次矩陣乘法的運算量十分驚人,如果我們將

x
X
進行向量化成一維矩陣(以
vec(...)
表示),可以用Kronecker product
CT,C
合併成一個矩陣。Kronecker product又被稱作matrix direct product,運算子
,不具交換性,定義如下:


向量化結合

vec(ABC)=(CTA)vec(B)

依據以上式子改寫原式:

X=CxCTvec(X)=((CT)TC)vec(x)vec(X)=(CC)vec(x)

事先計算

CC可增加DCT轉換效率。numpy可協助做到Kronecker product

import numpy as np A = np.array([[1,2],[3,4]]) B = np.array([[0,5],[6,7]]) C = np.kron(A,B) C = array([[ 0, 5, 0, 10], [ 6, 7, 12, 14], [ 0, 15, 0, 20], [18, 21, 24, 28]])

JPEG是由一小片一小片壓縮後的影像組合而成,以8x8 pixels為單位,進行量化、編碼,稱為Minimum Coded Unit (MCU)。因此2D-DCT矩陣以及量化矩陣也是8x8。

接著使用scipy.fftpack.dct檢驗依dct_2生成的8x8 2D-DCT矩陣是否一致,dct_2是我們的JPEG Encoder用來計算2-D DCT的函式:

import matplotlib.pyplot as plt import numpy as np from scipy.fftpack import dct, idct def dct_2(input): L = input.shape[0] C = np.zeros((L, L)) for k in range(L): for n in range(L): C[k, n] = np.sqrt(1/L)*np.cos(np.pi*k*(1/2+n)/L) if k != 0: C[k, n] *= np.sqrt(2) return (np.kron(C, C) @ input.flatten()).reshape(L, L) M0 = np.random.rand(8, 8) D1 = dct_2(M0) D2 = dct(dct(M0, norm='ortho', axis=0), norm='ortho', axis=1) print('D1=', np.array2string(np.round(D1, 3), prefix='D1=')) print('D2=', np.array2string(np.round(D2, 3), prefix='D2=')) assert np.allclose(D1, D2)

假設

C為8x8:

C⊗C= [[ 0.125   0.125   0.125  ...  0.125   0.125   0.125 ]
     [ 0.1734  0.147   0.0982 ... -0.0982 -0.147  -0.1734]
     [ 0.1633  0.0676 -0.0676 ... -0.0676  0.0676  0.1633]
     ...
     [ 0.0271 -0.0478  0.0095 ...  0.0095 -0.0478  0.0271]
     [ 0.0187 -0.0451  0.0451 ... -0.0451  0.0451 -0.0187]
     [ 0.0095 -0.0271  0.0406 ...  0.0406 -0.0271  0.0095]]

3. Quantization

量化是為了去除視覺上不重要的資訊。quality factor越高,生成的量化矩陣(quantization table)就多元素接近1,經過DCT轉換的區塊除上量化矩陣,保留的資料就越多。各家相機廠商使用的量化矩陣,皆有所不同。

實作上由於quality factor生成的量化矩陣可能有元素為0,需要將結果限制在[0,255]

import numpy as np # Standard JPEG quantization tables # Luminance qt_luminance = np.array([ [16, 11, 10, 16, 24, 40, 51, 61], [12, 12, 14, 19, 26, 58, 60, 55], [14, 13, 16, 24, 40, 57, 69, 56], [14, 17, 22, 29, 51, 87, 80, 62], [18, 22, 37, 56, 68, 109, 103, 77], [24, 35, 55, 64, 81, 104, 113, 92], [49, 64, 78, 87, 103, 121, 120, 101], [72, 92, 95, 98, 112, 100, 103, 99], ]) # Chrominance qt_chrominance = np.array([ [17, 18, 24, 47, 99, 99, 99, 99], [18, 21, 26, 66, 99, 99, 99, 99], [24, 26, 56, 99, 99, 99, 99, 99], [47, 66, 99, 99, 99, 99, 99, 99], [99, 99, 99, 99, 99, 99, 99, 99], [99, 99, 99, 99, 99, 99, 99, 99], [99, 99, 99, 99, 99, 99, 99, 99], [99, 99, 99, 99, 99, 99, 99, 99], ]) def Quantize(input, qf, jpeg_quantiz_matrix): # Quality factor = 1...100 factor = 5000/qf if (qf < 50) else 200-2*qf # Prevent divide by 0 error jpeg_quantiz_matrix = np.maximum(np.floor(jpeg_quantiz_matrix*factor/100+.5), np.ones((8, 8))) print("QF={}".format(qf)) print(np.array2string(np.round(jpeg_quantiz_matrix, 3), prefix='')) return (input // jpeg_quantiz_matrix) # '//' is the floor division operator Quantize(np.eye(8), 100, qt_luminance) Quantize(np.eye(8), 95, qt_luminance) Quantize(np.eye(8), 80, qt_luminance)

Standard JPEG quantization tables (Luminance)在不同quality factor下的量化矩陣

QF=100
[[1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]]

QF=95
[[ 2.  1.  1.  2.  2.  4.  5.  6.]
 [ 1.  1.  1.  2.  3.  6.  6.  6.]
 [ 1.  1.  2.  2.  4.  6.  7.  6.]
 [ 1.  2.  2.  3.  5.  9.  8.  6.]
 [ 2.  2.  4.  6.  7. 11. 10.  8.]
 [ 2.  4.  6.  6.  8. 10. 11.  9.]
 [ 5.  6.  8.  9. 10. 12. 12. 10.]
 [ 7.  9. 10. 10. 11. 10. 10. 10.]]

QF=80
[[ 6.  4.  4.  6. 10. 16. 20. 24.]
 [ 5.  5.  6.  8. 10. 23. 24. 22.]
 [ 6.  5.  6. 10. 16. 23. 28. 22.]
 [ 6.  7.  9. 12. 20. 35. 32. 25.]
 [ 7.  9. 15. 22. 27. 44. 41. 31.]
 [10. 14. 22. 26. 32. 42. 45. 37.]
 [20. 26. 31. 35. 41. 48. 48. 40.]
 [29. 37. 38. 39. 45. 40. 41. 40.]]

Reference