---
###### tags: `DataCompression`
---
# JPEG Decoder (3/3)
This post covers how the
1. Quantization table
2. Huffman table
3. Encoded data
are laid out inside a JPEG file. The walkthrough follows [Understanding and Decoding a JPEG Image using Python](https://yasoob.me/posts/understanding-and-writing-jpeg-decoder-in-python/#huffman-encoding), fills in the gaps from the specification, and implements a JPEG decoder.
JPEG has four compression modes:
- Baseline
- Extended Sequential
- Progressive
- Lossless

Each mode lays out its data differently; here we assume the image is a **Baseline** JPEG.
![](https://i.imgur.com/DwAuA7b.png)
The figure above is by [Ange Albertini](https://yasoob.me/posts/understanding-and-writing-jpeg-decoder-in-python/#huffman-encoding); it shows the important segments in a JPEG file. Most markers are followed by a 2-byte segment length that excludes the marker itself. The important markers are:
Hex|Marker|Marker Name|Description
-|-|-|-
0xFFD8|SOI|Start of Image|
0xFFE0|APP0|Application Segment 0|JFIF - JFIF JPEG image / AVI1 - Motion JPEG (MJPG)
0xFFDB|DQT|Define Quantization Table |
0xFFC0|SOF|Start of Frame 0|Baseline DCT
0xFFC4|DHT|Define Huffman Table|
0xFFDA|SOS|Start of Scan|
0xFFD9|EOI|End of Image|
# 1. File Start & File End
- `SOI`: marks the start of the image file
- `EOI`: marks the end of the image file

`SOS` does not record the length of the compressed data, so the decoder keeps reading until it reaches `EOI`.
Let's try a simple decoder. `SOI` and `EOI` need special handling; everything else follows one rule: read the 2-byte marker, then read the 2-byte segment length:
```python=
from struct import unpack

marker_mapping = {
    0xffd8: "Start of Image",
    0xffe0: "Application Default Header",
    0xffdb: "Quantization Table",
    0xffc0: "Start of Frame",
    0xffc4: "Define Huffman Table",
    0xffda: "Start of Scan",
    0xffd9: "End of Image"
}

class JPEG:
    def __init__(self, image_file):
        with open(image_file, 'rb') as f:
            self.img_data = f.read()

    def decode(self):
        data = self.img_data
        while True:
            marker, = unpack(">H", data[0:2])
            print(marker_mapping.get(marker))
            if marker == 0xffd8:    # SOI
                data = data[2:]
            elif marker == 0xffd9:  # EOI
                return
            elif marker == 0xffda:  # SOS: scan data runs until EOI
                self.decodeSOS(data[2:-2])
                data = data[-2:]
            else:
                lenchunk, = unpack(">H", data[2:4])
                chunk = data[4:2+lenchunk]
                data = data[2+lenchunk:]
                if marker == 0xffc4:
                    self.decodeHuffmanTable(chunk)
                elif marker == 0xffdb:
                    self.DefineQuantizationTables(chunk)
                elif marker == 0xffc0:
                    self.decodeFrameHeader(chunk)
            if len(data) == 0:
                break

if __name__ == "__main__":
    img = JPEG('profile.jpg')
    img.decode()
# OUTPUT:
# Start of Image
# Application Default Header
# Quantization Table
# Quantization Table
# Start of Frame
# Define Huffman Table
# Define Huffman Table
# Define Huffman Table
# Define Huffman Table
# Start of Scan
# End of Image
```
# 2. Application Default Header (`0xFFE0`)
Skipped; it does not affect decoding.
# 3. Define Quantization Table, DQT(`0xFFDB`)
[ISO/IEC 10918-1:1994](https://www.w3.org/Graphics/JPEG/itu-t81.pdf), pp. 39-40
![](https://i.imgur.com/5XMc5gU.png)
![](https://i.imgur.com/n83GffK.png)
Baseline:
Parameter | Value
-|-
Lq|2+65n
Pq|0 (8-bit $Q_k$)
Tq|0~3 (0: Luminance, 1: Chrominance)
The actual layout:
```
Lq   | Pq|Tq | Qk ...
0x43 | 0x01  | 0x01 0x02 0x03 ...
```
### 3.1 Decode DQT
```python3=
from struct import *

def GetArray(type, l, length):
    # unpack `length` values of format `type` from bytes `l`
    s = ""
    for i in range(length):
        s = s + type
    return list(unpack(s, l[:length]))

class JPEG:
    # ------
    def DefineQuantizationTables(self, data):
        offset = 0
        while offset < len(data):
            id, = unpack("B", data[offset:offset+1])
            self.quant[id] = GetArray("B", data[offset+1:offset+65], 64)
            print("#{} Table".format(id))
            print("Elements: ", self.quant[id])
            offset += 65

    def decode(self):
        # ...
        self.DefineQuantizationTables(chunk)
```
### 3.2 Encode DQT
For testing, use a quantization table that is all 1s:
```python=
def encodeQuantizationTables(self, hdr):
    # DQT, Lq, (Pq|Tq), Qk
    return pack('>HHB'+'B'*64, 0xffdb, 2+1+64, hdr, *self.quant[hdr])

jpeg.quant[0] = np.ones(64, dtype='uint8').tobytes()
print('DQT: ', jpeg.encodeQuantizationTables(0))
```
```
# b'\xff\xdb\x00C\x00\x01\x01\x01\x01.......\x01'
```
The quantization table can also be a default table scaled by a quality factor:
```python=
qf = 90
factor = 5000/qf if (qf < 50) else 200-2*qf
q = load_quantization_table('lum')
q = np.clip(q*factor/100,1,255)
```
Usage:
```
(img // _jpeg_quantiz_matrix)
```
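Putting the two snippets together, a minimal sketch of quantizing a block with a quality-scaled table might look like the following (`base_q` and `scaled_table` are names invented here; `base_q` is the Table K.1 luminance matrix, which also appears in §6.1.6):

```python
import numpy as np

# ITU-T T.81 Table K.1 luminance quantization table
base_q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def scaled_table(qf):
    # IJG-style scaling: qf < 50 coarsens the table, qf > 50 refines it
    factor = 5000 / qf if qf < 50 else 200 - 2 * qf
    return np.clip(base_q * factor / 100, 1, 255).astype(np.uint8)

block = np.full((8, 8), 128)           # a flat 8x8 sample block
quantized = block // scaled_table(90)  # element-wise quantization
```

At `qf = 50` the scale factor is 100, so the default table is used unchanged.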
# 4. Start of Frame, SOF(`0xFFC0`)
[ISO/IEC 10918-1:1994](https://www.w3.org/Graphics/JPEG/itu-t81.pdf), pp. 35-36
![](https://i.imgur.com/n6ujDPm.png)
![](https://i.imgur.com/8nNSJt8.png)
---
![](https://i.imgur.com/uZEn8WC.png)
Cross-referencing with the figures above:
Parameter|Description|Value (For Baseline DCT)
-|-|-
SOFn|Marker|0xffc0
Lf|Frame header length|Lf+P+Y+X+Nf = 8 bytes; Ci+Hi+Vi+Tqi = 3 bytes per component; Lf = 8+3×3 = 17
P|Sample precision|8
Y|Number of lines (Height)|2
X|Number of sample per line (Width)|6
Nf|Number of image components in frame|Y, Cb, Cr => Nf=3
Ci|Component identifier|1~3 (1=Y, 2=Cb, 3=Cr, 4=I, 5=Q)
Hi|Horizontal sampling factor|(see below)
Vi|Vertical sampling factor|(see below)
Tqi|Quantization table destination selector, i.e. which quantization table this component uses|Y maps to table 0; Cb and Cr share table 1
The meaning of `Y`, `X`, `Hi`, `Vi` requires [ISO/IEC 10918-1:1994](https://www.w3.org/Graphics/JPEG/itu-t81.pdf) A.1.1:
![](https://i.imgur.com/ruMaanx.png)
Suppose a 512x512 image with 3 components (`Y`, `Cb`, `Cr`); we want to know how many `xi*yi` samples make up each component (a sample is one of the "dots" that make up an image; in Baseline, 1 sample = 8 bits):
```
Component 0 H0=4 V0=1
Component 1 H1=2 V1=2
Component 2 H2=1 V2=1
```
With `X=512`, `Y=512`, `Hmax=4`, `Vmax=2`, each component's `xi`, `yi` is computed by the decoder as:
```
Component 0 x0=512 y0=256
Component 1 x1=256 y1=512
Component 2 x2=128 y2=256
```
So for the frame header in the table above, the corresponding `xi`, `yi` are:
```
Y=2 X=6 Hmax=2 Vmax=2
Component 1 H1=2 V1=2 => x1=6 y1=2
Component 2 H2=1 V2=1 => x2=3 y2=1
Component 3 H3=1 V3=1 => x3=3 y3=1
```
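The rule behind both examples is A.1.1's formula: `x_i = ceil(X * H_i / Hmax)` and `y_i = ceil(Y * V_i / Vmax)`. A small sketch (`component_size` is a name invented here):

```python
from math import ceil

def component_size(X, Y, factors):
    # ITU-T T.81 A.1.1: x_i = ceil(X * H_i / Hmax), y_i = ceil(Y * V_i / Vmax)
    hmax = max(h for h, v in factors)
    vmax = max(v for h, v in factors)
    return [(ceil(X * h / hmax), ceil(Y * v / vmax)) for h, v in factors]

# the 512x512 example
print(component_size(512, 512, [(4, 1), (2, 2), (1, 1)]))
# [(512, 256), (256, 512), (128, 256)]
# the 6x2 example
print(component_size(6, 2, [(2, 2), (1, 1), (1, 1)]))
# [(6, 2), (3, 1), (3, 1)]
```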
Notice that the chroma components carry half the information both horizontally and vertically, i.e. [YUV 4:2:0](https://www.impulseadventure.com/photo/jpeg-decoder.html):
```
MCU1 | MCU2
Y00 Y10 Y01 Y11 Cb Cr | Y...
```
### 4.1 Decode Frame Header
```python=
subsample_mapping = {
    "11": "4:4:4",
    "21": "4:2:2",
    "41": "4:1:1",
    "22": "4:2:0"
}

class JPEG:
    def decodeFrameHeader(self, data):
        # BaselineDCT
        precision, self.height, self.width, self.components = unpack(
            ">BHHB", data[0:6])
        for i in range(self.components):
            id, factor, QtbId = unpack("BBB", data[6+i*3:9+i*3])
            h, v = (factor >> 4, factor & 0x0F)
            self.horizontalFactor.append(h)
            self.verticalFactor.append(v)
            self.quantMapping.append(QtbId)
        print("size {}x{}".format(self.width, self.height))
        # ratio of the largest (Y) factor to the chroma factor
        print("subsampling {}".format(subsample_mapping.get("{}{}".format(
            int(max(self.horizontalFactor)/self.horizontalFactor[1]),
            int(max(self.verticalFactor)/self.verticalFactor[1])
        ))))
```
### 4.2 Encode Frame Header
```python=
def encodeFrameHeader(self):
    # BaselineDCT, 4:4:4, 3 components
    C = []
    for i in range(3):
        C.append(i+1)                   # Ci
        C.append(1 << 4 | 1)            # Hi=1, Vi=1
        C.append(self.quantMapping[i])  # Tqi
    return pack('>HHBHHB'+'BBB'*3, 0xffc0, 8+3*3, 8,
                self.height, self.width, 3, *C)
```
Output a short test:
```python=
img.quantMapping=[0,1,1]
img.height=2
img.width=6
print(img.encodeFrameHeader()) # b'\xff\xc0\x00\x11\x08\x00\x02\x00\x06\x03\x01\x11\x00\x02\x11\x01\x03\x11\x01'
```
# 5. Define Huffman Table, DHT(`0xFFC4`)
[ISO/IEC 10918-1:1994](https://www.w3.org/Graphics/JPEG/itu-t81.pdf), pp. 40-41
![](https://i.imgur.com/GyvQgYt.png)
![](https://i.imgur.com/KIiJTmd.png)
Parameter|Description|Value (For Baseline DCT)
-|-|-
Lh|Huffman table definition length|2+17+mt
Tc|Table class|0=DC table, 1=AC table
Th|Huffman table destination identifier|0-1 (0 = Luminance, 1 = Chroma). Not actually mandated by the spec; it only has to line up with the destination selectors used by the quantization tables and the scan header.
Li|Number of Huffman codes of length i|0-255
Vij|Value associated with each Huffman code |0-255
As mentioned earlier, JPEG stores a Huffman table as two parts, codeword lengths and decoded values, 16+n bytes in total; these are exactly Li and Vij.
```
00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00 | 00 01 02 03 04 05 06 07 08 09 0A 0B
              codeword lengths (Li)             |          decoded values (Vij)
```
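These 16+n bytes fully determine the codebook, because codewords are assigned canonically (ITU-T T.81 Annex C): codes of the same length are consecutive integers, and moving to the next length appends a 0 bit. A sketch of the reconstruction, using the DC luminance table of Table K.3 (`assign_codes` is a name invented here):

```python
def assign_codes(lengths, values):
    # ITU-T T.81 Annex C: codes of each length are consecutive;
    # the first code of length L+1 is (last code of length L + 1) << 1
    codes, code, idx = {}, 0, 0
    for bit_length, count in enumerate(lengths, start=1):
        for _ in range(count):
            codes[format(code, '0{}b'.format(bit_length))] = values[idx]
            code += 1
            idx += 1
        code <<= 1
    return codes

lengths = [0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # Table K.3 Li
values = list(range(12))                                     # 0x00 .. 0x0B
codes = assign_codes(lengths, values)
# '00' -> 0x00, '010' -> 0x01, ..., '111111110' -> 0x0B
```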
Let's revisit the earlier example and write the encoder/decoder.
![](https://i.imgur.com/4blWV32.png)
:::warning
The standard allows multiple Huffman tables to follow a single DHT marker.
:::
### 5.1 Decode Huffman Table
For the Huffman decoder, [micro-jpeg-visualizer](https://github.com/aguaviva/micro-jpeg-visualizer) does it nicely, so we simply reuse it; the whole program is only 250 lines with no external dependencies.
```python=
class HuffmanTable:
    def __init__(self):
        self.root = []
        self.elements = []

    def BitsFromLengths(self, root, element, pos):
        if isinstance(root, list):
            if pos == 0:
                if len(root) < 2:
                    root.append(element)
                    return True
                return False
            for i in [0, 1]:
                if len(root) == i:
                    root.append([])
                if self.BitsFromLengths(root[i], element, pos-1) == True:
                    return True
        return False

    def GetHuffmanBits(self, lengths, elements):
        self.elements = elements
        ii = 0
        for i in range(len(lengths)):
            for j in range(lengths[i]):
                self.BitsFromLengths(self.root, elements[ii], i)
                ii += 1

    def Find(self, st):
        r = self.root
        while isinstance(r, list):
            r = r[st.GetBit()]
        return r

    def GetCode(self, st):
        while True:
            res = self.Find(st)
            if res == 0:
                return 0
            elif res != -1:
                return res
```
`GetCode` has to read data bit by bit, so we add a `Stream` class for sequential access to every bit of the data:
```python=
class Stream:
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def GetBit(self):
        b = self.data[self.pos >> 3]
        s = 7-(self.pos & 0x7)
        self.pos += 1
        return (b >> s) & 1

    def GetBitN(self, l):
        val = 0
        for i in range(l):
            val = val*2 + self.GetBit()
        return val
```
Suppose `0b000` maps to 0x04 in the Huffman table:
```python
st = Stream([0x00])  # bit stream: 00000000
print('0b000: ', hf.GetCode(st))  # 4
print('0b000: ', hf.GetCode(st))  # 4
```
Unpack `lengths` and `elements`, feed them into `HuffmanTable()` to build the Huffman tree (`root`), and `GetCode` can then decode codewords:
```python=
def decodeHuffmanTable(self, data):
    offset = 0
    tbcount = 0
    while offset < len(data):
        tcth, = unpack("B", data[offset:offset+1])
        tc, th = (tcth >> 4, tcth & 0x0F)
        offset += 1
        # Extract the 16 bytes containing length data
        lengths = unpack("B"*16, data[offset:offset+16])
        offset += 16
        # Extract the elements after the initial 16 bytes
        elements = []
        for i in lengths:
            elements += unpack("B"*i, data[offset:offset+i])
            offset += i
        print("#{}: {} {} Table".format(tbcount, "Luma" if th == 0 else "Chroma",
                                        "DC" if tc == 0 else "AC"))
        print("lengths: ", lengths)
        print("Elements: ", elements)
        # tc = 0 (DC), 1 (AC); th = 0 (Luminance), 1 (Chroma)
        hf = HuffmanTable()
        hf.GetHuffmanBits(lengths, elements)
        self.huffman_tables[tc << 1 | th] = hf
        tbcount += 1
```
```
Define Huffman Table
Luma AC Table
lengths: (0, 1, 4, 1, 3, 4, 2, 3, 0, 1, 5, 1, 0, 0, 0, 0)
Elements: [3, 1, 2, 4, 5, 6, 7, 17, 18, 0, 8, 19, 33, 20, 34, 21, 35, 49, 50, 9, 22, 36, 65, 81, 23]
Define Huffman Table
Chroma DC Table
lengths: (0, 2, 3, 1, 0, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
Elements: [6, 7, 4, 5, 8, 3, 0, 1, 2, 9]
Define Huffman Table
Chroma AC Table
lengths: (0, 2, 1, 3, 3, 3, 2, 5, 3, 4, 1, 4, 3, 0, 0, 0)
Elements: [1, 2, 3, 4, 5, 17, 0, 18, 33, 6, 49, 65, 19, 34, 7, 50, 81, 97, 113, 20, 35, 129, 21, 66, 145, 177, 161, 22, 37, 82, 209, 130, 193, 241]
```
### 5.2 Encode Huffman Table
```python=
def encodeHuffmanTable(self, L, V, tc, th):
    # L: the 16 codeword-length counts (Li), V: the decoded values (Vij)
    # Lh counts itself (2) + Tc/Th (1) + 16 Li bytes + the Vij bytes
    Lh = 2 + 1 + 16 + len(V)
    tcth = tc << 4 | th
    return pack('>HHB'+'B'*16+'B'*len(V), 0xffc4, Lh, tcth, *L, *V)
```
# 6. Start of Scan, SOS (`0xFFDA`)
[ISO/IEC 10918-1:1994](https://www.w3.org/Graphics/JPEG/itu-t81.pdf), pp. 37-38
![](https://i.imgur.com/Ts5Y0ht.png)
![](https://i.imgur.com/aznVGVf.png)
### 6.1 Decode SOS
Before decoding, a few functions need adjusting.
After color conversion, JPEG subtracts 128 before the DCT, mapping `[0, 255]` to `[-128, 127]`; the decoder therefore has to add the 128 back first.
```python=
# Level-shifting
for i in range(3):
    self.img[i] += 128
```
#### 6.1.1 `Image`
The `Image` object stores the 2D data of Y, Cb, and Cr in a plain list `img`, stacking them into a 3D array only during color conversion. `DrawMatrix` renders each decoded block onto the full canvas.
```python3=
class Image:
    def __init__(self, height, width):
        self.width = width
        self.height = height
        # [0] = Y, [1] = Cb, [2] = Cr
        self.img = []
        for i in range(3):
            self.img.append(np.zeros((self.height, self.width)))

    def ycbcr2rgb(self):
        # Level-shifting
        for i in range(3):
            self.img[i] += 128
        # Merge multiple 2D arrays into a 3D array
        self.img = np.dstack(tuple([i for i in self.img]))
        # Convert image from YCbCr to RGB
        self.img[:, :, 1:] -= 128
        m = np.array([
            [1.000, 1.000, 1.000],
            [0.000, -0.344136, 1.772],
            [1.402, -0.714136, 0.000],
        ])
        rgb = np.dot(self.img, m)
        self.img = np.clip(rgb, 0, 255)
        # Convert a 3D array to multiple 2D arrays
        self.img = [self.img[:, :, i] for i in range(3)]

    def DrawMatrix(self, y, x, L, Cb, Cr):
        _x = x*8
        _y = y*8
        self.img[0][_y:_y+8, _x:_x+8] = L.reshape((8, 8))
        self.img[1][_y:_y+8, _x:_x+8] = Cb.reshape((8, 8))
        self.img[2][_y:_y+8, _x:_x+8] = Cr.reshape((8, 8))
```
#### 6.1.2 `decodeSOS`
> To help resiliency in the case of data corruption, the JPEG standard allows JPEG markers to appear in the huffman-coded scan data segment. Therefore, a JPEG decoder must watch out for any marker (as indicated by the `0xFF` byte, followed by a non-zero byte). If the huffman coding scheme needed to write a `0xFF` byte, then it writes a `0xFF` followed by a `0x00` -- a process known as adding a stuff byte.
JPEG allows markers (`0xFFXX`) inside the scan data to improve error resiliency. To distinguish data from markers, a literal data byte `0xFF` is written as `0xFF00` (a "stuff byte"); this detail has to be undone before Huffman decoding.
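As a standalone sketch of stuff-byte removal (the `decodeSOS` below folds the same step into its own loop; `remove_stuffing` is a name invented here):

```python
def remove_stuffing(scan):
    # A literal 0xFF in entropy-coded data is written as 0xFF 0x00;
    # drop the 0x00 stuff byte when reading the stream back
    out = bytearray()
    i = 0
    while i < len(scan):
        out.append(scan[i])
        if scan[i] == 0xFF and i + 1 < len(scan) and scan[i + 1] == 0x00:
            i += 1  # skip the stuff byte
        i += 1
    return bytes(out)

print(remove_stuffing(b'\x12\xff\x00\x34'))  # b'\x12\xff\x34'
```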
The decoding flow:
1. Read each component's Huffman table ids and quantization table id
2. Replace `0xFF00` with `0xFF`
3. Call `BuildMatrix` once per component in MCU order to decode by table lookup; under YCbCr 4:4:4 that order is Y, Cb, Cr, Y, Cb, Cr, ...
4. Hand each decoded component to `DrawMatrix` to render onto the canvas `img`
```python=
def decodeSOS(self, data):
    # BaselineDCT
    ls, ns = unpack(">HB", data[0:3])
    csj = unpack("BB"*ns, data[3:3+2*ns])
    dcTableMapping = []
    acTableMapping = []
    for i in range(ns):
        dcTableMapping.append(csj[i*2+1] >> 4)
        acTableMapping.append(csj[i*2+1] & 0x0F)
    data = data[6+2*ns:]
    # Replace 0xFF00 with 0xFF
    i = 0
    while i < len(data) - 1:
        m, = unpack(">H", data[i:i+2])
        if m == 0xff00:
            data = data[:i+1]+data[i+2:]
        i = i + 1
    img = Image(self.height, self.width)
    st = Stream(data)
    oldlumdccoeff, oldCbdccoeff, oldCrdccoeff = 0, 0, 0
    for y in range(self.height//8):
        for x in range(self.width//8):
            matL, oldlumdccoeff = self.BuildMatrix(
                st, dcTableMapping[0], acTableMapping[0],
                self.quant[self.quantMapping[0]], oldlumdccoeff)
            matCb, oldCbdccoeff = self.BuildMatrix(
                st, dcTableMapping[1], acTableMapping[1],
                self.quant[self.quantMapping[1]], oldCbdccoeff)
            matCr, oldCrdccoeff = self.BuildMatrix(
                st, dcTableMapping[2], acTableMapping[2],
                self.quant[self.quantMapping[2]], oldCrdccoeff)
            img.DrawMatrix(y, x, matL.base, matCb.base, matCr.base)
    img.ycbcr2rgb()
    self.decoded_data = img.img
```
#### 6.1.3 `BuildMatrix`
`BuildMatrix` reorders the DC/AC coefficients and performs the inverse DCT.
Two spots in [micro-jpeg-visualizer](https://github.com/aguaviva/micro-jpeg-visualizer)'s implementation deserve a closer look:
1. If the decoded DC code is `0` (`EOB`), there is no need to read additional bits. In `GetBitN`, passing `0` reads nothing from the `Stream` and returns `0`, and `DecodeNumber(0, 0)` likewise returns `0` directly, so no explicit check on whether `code` is 0 is needed.
```python3
bits = st.GetBitN(code)
dccoeff = DecodeNumber(code, bits) + olddccoeff
i.base[0] = dccoeff
```
2. For the AC coefficients, reading ZRL (`0xF0`) should add 16 to the offset (`l`) and fetch the next code, but the implementation splits this into +15 and +1, again saving a branch:
```python3=
if code > 15:
    l += code >> 4  # +15
    code = code & 0x0F
bits = st.GetBitN(code)
if l < 64:
    coeff = DecodeNumber(code, bits)
    i.base[l] = coeff * quant[l]
l += 1  # +1
```
To make the correspondence with the JPEG spec clearer, here it is deliberately rewritten as:
```python3=
if code == 0xF0:
    # ZRL
    l += 16
    continue
elif code > 15:
    l += code >> 4
    code = code & 0x0F
bits = st.GetBitN(code)
if l < 64:
    coeff = DecodeNumber(code, bits)
    i.base[l] = coeff
l += 1
```
Having missed those two tricks, my rewrite overlooked a few details, and decoding went wrong after a handful of MCUs; a bit-stream is very hard to debug, and in the end I found the problem by diffing against the original code. The full routine:
```python=
class JPEG:
    def BuildMatrix(self, st, dcTableId, acTableId, quant, olddccoeff):
        i = DCT()
        # DC coefficient
        code = self.huffman_tables[0b00 | dcTableId].GetCode(st)
        bits = st.GetBitN(code)
        dccoeff = DecodeNumber(code, bits) + olddccoeff
        i.base[0] = dccoeff
        # AC coefficients
        l = 1
        while l < 64:
            code = self.huffman_tables[0b10 | acTableId].GetCode(st)
            if code == 0:
                # EOB
                break
            if code == 0xF0:
                # ZRL
                l += 16
                continue
            elif code > 15:
                l += code >> 4
                code = code & 0x0F
            bits = st.GetBitN(code)
            if l < 64:
                coeff = DecodeNumber(code, bits)
                i.base[l] = coeff
            l += 1
        i.base = np.multiply(i.base, quant)
        i.rearrange_using_zigzag()
        i.perform_IDCT()
        return i, dccoeff
```
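`BuildMatrix` and the snippets above call `DecodeNumber`, which is not listed in this post; in micro-jpeg-visualizer it implements the spec's EXTEND procedure (ITU-T T.81 F.2.2.1), roughly:

```python
def DecodeNumber(code, bits):
    # EXTEND: `code` is the magnitude category, `bits` the extra bits read
    # as an unsigned int; a leading 0 bit signals a negative coefficient
    if code == 0:
        return 0
    l = 2 ** (code - 1)
    return bits if bits >= l else bits - (2 * l - 1)

# category 3 covers -7..-4 and 4..7
print([DecodeNumber(3, b) for b in range(8)])  # [-7, -6, -5, -4, 4, 5, 6, 7]
```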
#### 6.1.4 `DCT`
This implements the material from the previous post.
```python=
class DCT():
    def __init__(self):
        self.base = np.zeros(64)
        self.zigzag = np.array([
            [0, 1, 5, 6, 14, 15, 27, 28],
            [2, 4, 7, 13, 16, 26, 29, 42],
            [3, 8, 12, 17, 25, 30, 41, 43],
            [9, 11, 18, 24, 31, 40, 44, 53],
            [10, 19, 23, 32, 39, 45, 52, 54],
            [20, 22, 33, 38, 46, 51, 55, 60],
            [21, 34, 37, 47, 50, 56, 59, 61],
            [35, 36, 48, 49, 57, 58, 62, 63],
        ]).flatten()
        # Generate 2D-DCT matrix
        L = 8
        C = np.zeros((L, L))
        for k in range(L):
            for n in range(L):
                C[k, n] = np.sqrt(1/L)*np.cos(np.pi*k*(1/2+n)/L)
                if k != 0:
                    C[k, n] *= np.sqrt(2)
        self.dct = C

    def perform_DCT(self):
        self.base = np.kron(self.dct, self.dct) @ self.base

    def perform_IDCT(self):
        self.base = np.kron(self.dct.transpose(),
                            self.dct.transpose()) @ self.base

    def rearrange_using_zigzag(self):
        newidx = np.ones(64).astype('int8')
        for i in range(64):
            newidx[list(self.zigzag).index(i)] = i
        self.base = self.base[newidx]
```
#### 6.1.5 Reconstruct image
We need a test image whose width and height are multiples of 8:
1. Grab the [400x400 Fubuki avatar](https://twitter.com/shirakamifubuki) from Twitter
![](https://pbs.twimg.com/profile_images/1482580721466486784/q36x-Uux_400x400.jpg)
2. Open it in [GIMP](https://www.gimp.org/) and save a copy as an optimized JPG
![](https://i.imgur.com/CHSDxt3.png)
3. Decode. `decode` produces a list of per-channel 2D matrices of RGB, so we still need `dstack` to stack them into one 3D array
```python=
jpeg = JPEG('sample2.jpg')
jpeg.decode()
img = np.dstack(tuple([i for i in jpeg.decoded_data])).astype(np.uint8)
```
4. Show the result with `pyplot`
```python=
import matplotlib.pyplot as plt
img = plt.imshow(img, interpolation='nearest')
plt.axis('off')
plt.savefig("output.png", bbox_inches='tight')
```
```
real 0m5.922s
user 0m6.220s
sys 0m1.460s
```
![](https://imgur.com/rmoDoOn.jpg)
#### 6.1.6 Image resolution not a multiple of 8
For images whose dimensions are not multiples of 8, JPEG extends them to the next multiple of 8; how to extend is implementation-defined, and one approach is mirror padding. Why mirroring? Take an image padded with a black border: the hard edge produces widely spread coefficients in the frequency domain (recall the Fourier transform of a step function), which means many distinct values to encode. Mirror padding avoids creating that edge.
Let's pad an 8x2 black-and-white image with repeat (edge), mirror (reflect), and symmetric modes, implemented with [numpy.pad](https://numpy.org/doc/stable/reference/generated/numpy.pad.html):
```python=
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import entropy

class DCT():
    def __init__(self):
        self.base = np.zeros(64)
        self.zigzag = np.array([
            [0, 1, 5, 6, 14, 15, 27, 28],
            [2, 4, 7, 13, 16, 26, 29, 42],
            [3, 8, 12, 17, 25, 30, 41, 43],
            [9, 11, 18, 24, 31, 40, 44, 53],
            [10, 19, 23, 32, 39, 45, 52, 54],
            [20, 22, 33, 38, 46, 51, 55, 60],
            [21, 34, 37, 47, 50, 56, 59, 61],
            [35, 36, 48, 49, 57, 58, 62, 63],
        ]).flatten()
        # Generate 2D-DCT matrix
        L = 8
        C = np.zeros((L, L))
        for k in range(L):
            for n in range(L):
                C[k, n] = np.sqrt(1/L)*np.cos(np.pi*k*(1/2+n)/L)
                if k != 0:
                    C[k, n] *= np.sqrt(2)
        self.dct = C

    def perform_DCT(self):
        self.base = np.kron(self.dct, self.dct) @ self.base

quant = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])
image = np.array([
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255]
])
seq = ((0, 8-image.shape[0]), (0, 8-image.shape[1]))
bins = np.linspace(-128, 127, 256)
for mode in ('edge', 'reflect', 'symmetric'):
    m = DCT()
    m.base = np.pad(image, seq, mode)
    m.base = m.base.flatten()
    m.base -= 128  # level-shifting
    m.perform_DCT()
    m.base = m.base // quant.flatten()
    plt.hist(m.base, bins, alpha=0.5, label=mode)
    unique, counts = np.unique(m.base, return_counts=True)
    freq = counts/np.sum(counts)
    print(mode, entropy(freq, base=2))
plt.legend(loc='upper right')
plt.show()
```
![](https://imgur.com/lbovgCx.jpg)
The histograms show that reflect and symmetric produce far more concentrated coefficients, and computing the entropy of the three confirms that reflect and symmetric give a better compression ratio.
```
edge 1.7733292042478084
reflect 1.3967822215997983
symmetric 1.0960127271685762
```
# 7. Conclusion
This post walked through the data layout of JPEG and implemented a [decoder](); an encoder is left for when time permits. It has to handle the important markers SOI, DQT, SOF, DHT, SOS, and EOI. Since the focus is on quality factor versus image fidelity rather than compression ratio, the Luma and Chroma Huffman tables follow Tables K.3-K.6 of [ISO/IEC 10918-1:1994](https://www.w3.org/Graphics/JPEG/itu-t81.pdf) and the quantization tables follow Tables K.1-K.2, with [ghallak](https://github.com/ghallak/jpeg-python/tree/2fe1bd2244c3090543695b106866dfa0a3b48f6c) as an implementation reference.
Decoding also made one thing clear: even though processing the MCUs themselves is cheap, a JPEG decoder's speed is still bounded by the Huffman decoder, because the current symbol must be decoded before the start of the next one is known. [t0rakka](https://t0rakka.silvrback.com/jpeg-decoding-benchmark) inserts RST markers to delimit MCUs and decode on multiple threads, with impressive efficiency; unfortunately not every image contains RSTs, so the approach is limited in practice.
# Reference
- [Designing a JPEG Decoder & Source Code](https://www.impulseadventure.com/photo/jpeg-decoder.html)
- [JPEG Huffman Coding Tutorial](https://www.impulseadventure.com/photo/jpeg-huffman-coding.html)
- [從計算機編碼的角度看Entropy](https://hackmd.io/@sXG2cRDpRbONCsrtz8jfqg/ry-0k0PwH)
- [DLCV Lab-影像壓縮](http://ip.csie.ncu.edu.tw/course/IP/IP1607cp.pdf)