JPEG Decoder (3/3)

本篇介紹

Quantization table
Huffman table
Encoded data

在JPEG檔案內排列的順序，範例參考Understanding and Decoding a JPEG Image using Python，參考規格書補漏，並實作JPEG decoder。

JPEG有4種壓縮格式

Baseline
Extended Sequential
Progressive
Loseless

每種壓縮格式的佈局都不一，我們預設圖片為Baseline JPEG。

上方為Ange Albertini的圖，這顯示出JPEG內重要的區段(segment)，大部分的標記(marker)後面接著不含標記的區段長度，較為重要的標記有:

Hex	Marker	Marker Name	Description
0xFFD8	SOI	Start of Image
0xFFE0	APP0	Application Segment 0	JFIF - JFIF JPEG image / AVI1 - Motion JPEG (MJPG)
0xFFDB	DQT	Define Quantization Table
0xFFC0	SOF	Start of Frame 0	Baseline DCT
0xFFC4	DHT	Define Huffman Table
0xFFDA	SOS	Start of Scan
0xFFD9	EOI	End of Image

1. File Start & File End

SOI: 圖檔起始
EOI: 圖檔結束

SOS並沒有提及壓縮資料的長度，所以解碼器會一直讀取直到EOI。

嘗試寫個簡單的decoder，除了讀到SOI、EOI要進行特別處理，其餘只要遵循讀完2 bytes標記，再讀2 bytes的區段長度的規則：






























































from struct import unpack


marker_mapping = {
    0xffd8: "Start of Image",
    0xffe0: "Application Default Header",
    0xffdb: "Quantization Table",
    0xffc0: "Start of Frame",
    0xffc4: "Define Huffman Table",
    0xffda: "Start of Scan",
    0xffd9: "End of Image"
}


class JPEG:
    def __init__(self, image_file):
        with open(image_file, 'rb') as f:
            self.img_data = f.read()
    
    def decode(self):
        data = self.img_data
        while(True):
            marker, = unpack(">H", data[0:2])
            print(marker_mapping.get(marker))
            if marker == 0xffd8:  # soi
                data = data[2:]
            elif marker == 0xffd9:  # eoi
                return
            elif marker == 0xffda:  # sos
                self.decodeSOS(data[2:-2])
                data = data[-2:]
            else:
                lenchunk, = unpack(">H", data[2:4])
                chunk = data[4:2+lenchunk]
                data = data[2+lenchunk:]

                if marker == 0xffc4:
                    self.decodeHuffmanTable(chunk)
                elif marker == 0xffdb:
                    self.DefineQuantizationTables(chunk)
                elif marker == 0xffc0:
                    self.decodeFrameHeader(chunk)

            if len(data) == 0:
                break  

if __name__ == "__main__":
    img = JPEG('profile.jpg')
    img.decode()    

# OUTPUT:
# Start of Image
# Application Default Header
# Quantization Table
# Quantization Table
# Start of Frame
# Huffman Table
# Huffman Table
# Huffman Table
# Huffman Table
# Start of Scan
# End of Image

2. Application Default Header (`0xFFE0`)

略過，不影響解碼

3. Define Quantization Table, DQT(`0xFFDB`)

ISO/IEC 10918-1:199, 39-40

Baseline:

Parameter	Value
Lq	2+65n
Pq	0 (8-bit $Q_{k}$ )
Tq	0~3,(0: Luminance, 1:Chrominance)

實際佈局

Lq   | LqPq | Qk...
0x43  0x01   0x01 0x02 0x03

3.1 Decode DQT
























from struct import *

def GetArray(type,l, length):
    s = ""
    for i in range(length):
        s =s+type
    return list(unpack(s,l[:length]))
  
class JPEG:
    # ------

    def DefineQuantizationTables(self, data):
        offset = 0
        while offset < len(data):
            id, = unpack("B", data[offset: offset+1])
            self.quant[id] = GetArray("B", data[offset + 1:offset + 65], 64)
            print("#{} Table".format(id))
            print("Elements: ", self.quant[id])
            offset += 65


    def decode(self):

        self.DefineQuantizationTables(chunk)

3.2 Encode DQT

使用全是1的quantization table進行測試






def encodeQuantizationTables(self, hdr):
    # DQT, Lq, (Pq|Tq), Qk
    return pack('>HHB'+"B"*64, 0xffdb, 2+1+64, hdr, *self.quant[hdr])

jpeg.quant[0] = np.ones(64, dtype='uint8').tobytes()
print('DQT: ', jpeg.encodeQuantizationTables(0))

# b'\xff\xdb\x00C\x00\x01\x01\x01\x01.......\x01'

Quantization table可使用預設的表，加上quality factor




qf = 90
factor = 5000/qf if (qf < 50) else 200-2*qf
q = load_quantization_table('lum')
q = np.clip(q*factor/100,1,255)

使用方式如下：

(img // _jpeg_quantiz_matrix)

4. Start of Frame, SOF(`0xFFC0`)

ISO/IEC 10918-1:199, 35-36

與上面的圖交戶對照

Parameter	Description	Value (For Baseline DCT)
SOFn	Marker	0xffc0
Lf	Frame header length	Lf+P+Y+X+Nf=8 bytes, Ci+Hi+Vi+Tqi=3 bytes, 8+3*3(Lf)=17
P	Sample precision	8
Y	Number of lines (Height)	2
X	Number of sample per line (Width)	6
Nf	Number of image components in frame	Y, Cb, Cr => Nf=3
Ci	Component identifier	1~3 (1=Y, 2=Cb, 3=Cr, 4=I, 5=Q)
Hi	Horizontal sampling factor	略
Vi	Vertical sampling factor	略
Tqi	Quantization table destination selector, component所對應的量化表	Y對應到0，CbCr共用一張表對應到1

關於Y, X, Hi, Vi的意義，還需要參照ISO/IEC 10918-1:199 A.1.1

假設圖片512x512且有3個components (Y, Cb, Cr)，要求出每個component是由多少個xi*yi的sample組成(sample就是我們常在說構成圖片的"點"，在Baseline中1 sample=8-bit)：

Component 0   H0=4  V0=1
COmponent 1   H1=2  V1=2
Component 2   H2=1  V2=1

X=512, Y=512, Hmax=4, Vmax=2, 那每個component的xi, yi會由decoder計算，分別為：

Component 0   x0=512  y0=256
Component 1   x1=256  y1=512
Component 2   x2=128  y2=256

所以在上方表格中，對應的xi, yi:

Y=2 X=6 Hmax=2 Vmax=2
Component 1   H1=2  V1=2     x1=6 y1=2
COmponent 2   H2=1  V2=1  => x2=3 y2=1
Component 3   H3=1  V3=1     x3=3 y2=1

會發現chroma儲存的資訊水平、垂直上都減半，也就是YUV 4:2:0

MCU1                  | MCU2
Y00 Y10 Y01 Y11 Cb Cr | Y...

4.1 Decode Frame Header

























subsample_mapping = {
    "11": "4:4:4",
    "21": "4:2:2",
    "41": "4:1:1",
    "22": "4:2:0"
}

class JPEG:
    
    def decodeFrameHeader(self, data):
        # BaselineDCT
        precison, self.height, self.width, self.components = unpack(
            ">BHHB", data[0:6])
        for i in range(self.components):
            id, factor, QtbId = unpack("BBB", data[6+i*3:9+i*3])
            h, v = (factor >> 4, factor & 0x0F)
            self.horizontalFactor.append(h)
            self.verticalFactor.append(v)
            self.quantMapping.append(QtbId)

        print("size {}x{}".format(self.width, self.height))
        print("subsampling {}".format(subsample_mapping.get("{}{}".format(
            int(max(self.horizontalFactor)/self.horizontalFactor[0]),
            int(max(self.verticalFactor)/self.verticalFactor[0])
        ))))

4.2 Encode Frame Header








    def encodeFrameHeader(self):
        # BaselineDCT, 4:4:4, 3 components
        C = []
        for i in range(3):
            C.append(i+1)
            C.append(1 << 4 | 1)
            C.append(self.quantMapping[i])
        return pack('>HHBHHB'+'BBB'*3, 0xffc0, 8+3*3, 8, self.height, self.width, 3, *C)

輸出一小段測試




img.quantMapping=[0,1,1]
img.height=2
img.width=6
print(img.encodeFrameHeader()) # b'\xff\xc0\x00\x11\x08\x00\x02\x00\x06\x03\x01\x11\x00\x02\x11\x01\x03\x11\x01'

5. Define Huffman Table, DHT(`0xFFC4`)

ISO/IEC 10918-1:199, 40-41

Parameter	Description	Value (For Baseline DCT)
Lh	Huffman table definition length	2+17+mt
Tc	Table class	0=DC table, 1=AC table
Th	Huffman table destination identifier	0-1 (0=Luminance, 1=Chroma)。其實沒有規範，只要與Quantization table和Scan header的destination對應起來就好。
Li	Number of Huffman codes of length i	0-255
Vij	Value associated with each Huffman code	0-255

先前提到JPEG將Huffman table拆成codeword length、decoded value兩部份存，合計16+n bytes，就是在講Li、Vij。

01 05 01 01 01 01 01 01 00 00 00 00 00 00 00 | 01 02 03 04 05 06 07 08 09 0A 0B
             codeword length                 |            decoded value

回顧範例，試著寫encoder/decoder。

Standard允許一個marker後方接多組Huffman table

5.1 Decode Huffman Table

Huffman decoder的部份覺得micro-jpeg-visualizer寫的不錯，索幸拿來用，整份程式只有250行不依賴任何套件。








































class HuffmanTable:
	def __init__(self):
		self.root=[]
		self.elements = []
	
	def BitsFromLengths(self, root, element, pos):
		if isinstance(root,list):
			if pos==0:
				if len(root)<2:
					root.append(element)
					return True				
				return False
			for i in [0,1]:
				if len(root) == i:
					root.append([])
				if self.BitsFromLengths(root[i], element, pos-1) == True:
					return True
		return False
	
	def GetHuffmanBits(self,  lengths, elements):
		self.elements = elements
		ii = 0
		for i in range(len(lengths)):
			for j in range(lengths[i]):
				self.BitsFromLengths(self.root, elements[ii], i)
				ii+=1

	def Find(self,st):
		r = self.root
		while isinstance(r, list):
			r=r[st.GetBit()]
		return  r 

	def GetCode(self, st):
		while(True):
			res = self.Find(st)
			if res == 0:
				return 0
			elif ( res != -1):
				return res

GetCode 需要逐位元讀取data，所以另外建立Stream以便循序讀取資料上的每一位元
















class Stream:
    def __init__(self, data):
        self.data= data
        self.pos = 0

    def GetBit(self):
        b = self.data[self.pos >> 3]
        s = 7-(self.pos & 0x7)
        self.pos+=1
        return (b >> s) & 1

    def GetBitN(self, l):
        val = 0
        for i in range(l):
            val = val*2 + self.GetBit()
        return val

假設在Huffman table中0b000對應0x04：

    st = Stream([0x00]) #0b000000011
    print('0b000: ', hf.GetCode(st)) # 4
    print('0b000: ', hf.GetCode(st)) # 4

將length, elements解出來丟進HuffmanTable()建表得到Huffman tree - root，便能用GetCode將codeword解碼。





























    def decodeHuffmanTable(self, data):
        offset = 0
        tbcount = 0
        while offset < len(data):
            tcth, = unpack("B", data[offset:offset+1])
            tc, th = (tcth >> 4, tcth & 0x0F)
            offset += 1

            # Extract the 16 bytes containing length data
            lengths = unpack("BBBBBBBBBBBBBBBB", data[offset:offset+16])
            offset += 16

            # Extract the elements after the initial 16 bytes
            elements = []
            for i in lengths:
                elements += (unpack("B"*i, data[offset:offset+i]))
                offset += i

            print("#{}: {} {} Table".format(tbcount, "Luma" if th ==
                  0 else "Chroma", "DC" if tc == 0 else "AC"))
            print("lengths: ", lengths)
            print("Elements: ", elements)

            # tc = 0(DC), 1(AC)
            # th = 0(Luminance), 1(Chroma)
            hf = HuffmanTable()
            hf.GetHuffmanBits(lengths, elements)
            self.huffman_tables[tc << 1 | th] = hf
            tbcount += 1

Define Huffman Table
Luma AC Table
lengths:  (0, 1, 4, 1, 3, 4, 2, 3, 0, 1, 5, 1, 0, 0, 0, 0)
Elements:  [3, 1, 2, 4, 5, 6, 7, 17, 18, 0, 8, 19, 33, 20, 34, 21, 35, 49, 50, 9, 22, 36, 65, 81, 23]
Define Huffman Table
Chroma DC Table
lengths:  (0, 2, 3, 1, 0, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
Elements:  [6, 7, 4, 5, 8, 3, 0, 1, 2, 9]
Define Huffman Table
Chroma AC Table
lengths:  (0, 2, 1, 3, 3, 3, 2, 5, 3, 4, 1, 4, 3, 0, 0, 0)
Elements:  [1, 2, 3, 4, 5, 17, 0, 18, 33, 6, 49, 65, 19, 34, 7, 50, 81, 97, 113, 20, 35, 129, 21, 66, 145, 177, 161, 22, 37, 82, 209, 130, 193, 241]

5.2 Encode Huffman Table






    def encodeHuffmanTable(self, matrix, tc, th):
        L = []
        V = []
        Lh = 5 + 16 + len(V)
        tcth = tc << 4 | th
        return pack('>HHB'+'B'*16+'B'*len(V), 0xffc4, Lh, tcth, *L, *V)

6. Start of Scan, SOS (`0xFFDA`)

ISO/IEC 10918-1:199, 37-38

6.1 Decode SOS

在進行解碼前，先對一些函數進行修改。

JPEG在色域轉換後會減去128才進行DCT，將[0, 255]對應到[-128, 127]，因此解碼時需先補上128。



# Level-shifting
for i in range(3):
    self.img[i] += 128

6.1.1 `Image`

儲存圖像資料的物件Image只有在進行色域轉換時才以3維陣列處理，平時以1維陣列img儲存Y、Cb、Cr的2維資料。DrawMatrix將已解碼完成的部份資料渲染到整塊畫布。




































class Image:
    def __init__(self, height, width):
        self.width = width
        self.height = height
        # [0] = Y, [1] = Cb, [2]=Cr
        self.img = []
        for i in range(3):
            self.img.append(np.zeros((self.height, self.width)))

    def ycbcr2rgb(self):
        # Level-shifting
        for i in range(3):
            self.img[i] += 128

        # Merge multiple 2D arrays into a 3D array
        self.img = np.dstack(tuple([i for i in self.img]))

        # Convert image from YCbCr to RGB
        self.img[:, :, 1:] -= 128
        m = np.array([
            [1.000,  1.000, 1.000],
            [0.000, -0.344136, 1.772],
            [1.402, -0.714136, 0.000],
        ])
        rgb = np.dot(self.img, m)
        self.img = np.clip(rgb, 0, 255)

        # Convert a 3D array to multiple 2D arrays
        self.img = [self.img[:, :, i] for i in range(3)]

    def DrawMatrix(self, y, x, L, Cb, Cr):
        _x = x*8
        _y = y*8
        self.img[0][_y:_y+8, _x:_x+8] = L.reshape((8, 8))
        self.img[1][_y:_y+8, _x:_x+8] = Cb.reshape((8, 8))
        self.img[2][_y:_y+8, _x:_x+8] = Cr.reshape((8, 8))

6.1.2 `decodeSOS`

To help resiliency in the case of data corruption, the JPEG standard allows JPEG markers to appear in the huffman-coded scan data segment. Therefore, a JPEG decoder must watch out for any marker (as indicated by the 0xFF byte, followed by a non-zero byte). If the huffman coding scheme needed to write a 0xFF byte, then it writes a 0xFF followed by a 0x00 – a process known as adding a stuff byte.

JPEG允許scan data出現marker(0xFFXX)來增加容錯性，為了和marker進行區別，寫入0xFF的資料會寫入成0xFF00，這是在進行Huffman decoding前要進行留意到的細節。

解碼流程：

讀取各component的Huffman table id、quantization table id
取代0xFF00為0xFF
依照MCU內的component順序呼叫BuildMatrix查表解碼，在YCbCr 4:4:4下就是Y, Cb, Cr, Y, Cb, Cr…
解碼後的component交給DrawMatrix渲染在畫布img上


































    def decodeSOS(self, data):
        # BaselineDCT
        ls, ns = unpack(">HB", data[0:3])
        csj = unpack("BB"*ns, data[3:3+2*ns])
        dcTableMapping = []
        acTableMapping = []
        for i in range(3):
            dcTableMapping.append(csj[i*2+1] >> 4)
            acTableMapping.append(csj[i*2+1] & 0x0F)
        data = data[6+2*ns:]

        # Replace 0xFF00 with 0xFF
        i = 0
        while i < len(data) - 1:
            m, = unpack(">H", data[i:i+2])
            if m == 0xff00:
                data = data[:i+1]+data[i+2:]
            i = i + 1

        img = Image(self.height, self.width)
        st = Stream(data)
        oldlumdccoeff, oldCbdccoeff, oldCrdccoeff = 0, 0, 0
        for y in range(self.height//8):
            for x in range(self.width//8):
                matL, oldlumdccoeff = self.BuildMatrix(
                    st, dcTableMapping[0], acTableMapping[0], self.quant[self.quantMapping[0]], oldlumdccoeff)
                matCb, oldCrdccoeff = self.BuildMatrix(
                    st, dcTableMapping[1], acTableMapping[1], self.quant[self.quantMapping[1]], oldCrdccoeff)
                matCr, oldCbdccoeff = self.BuildMatrix(
                    st, dcTableMapping[2], acTableMapping[2], self.quant[self.quantMapping[2]], oldCbdccoeff)
                img.DrawMatrix(y, x, matL.base, matCb.base, matCr.base)

        img.ycbcr2rgb()
        self.decoded_data = img.img

6.1.3 `BuildMatrix`

BuildMatrix的功能是重排DC/AC coeficient的順序，並進行inverse DCT。

micro-jpeg-visualizer的實作有兩處值得檢視：

假如DC coefficient解碼後的係數是0(EOB)，那就沒有必要讀addition bits的必要。在GetBitN的實作中，傳入0不會從Stream讀取任何位元，而是返回0；DecodeNumber(0, 0)也是直接返回0，這樣的實作方式省去判斷code是否為0。
```
bits = st.GetBitN(code)
dccoeff = DecodeNumber(code, bits) + olddccoeff
i.base[0] = dccoeff
```

在AC coefficient中，讀到ZRL(0xF0)要將offset(l)加上16並讀取下一組，但實作上是分成 +15、+1兩次進行，也省下判斷式。












if code > 15:
    l += code >> 4 # +15
    code = code & 0x0F

bits = st.GetBitN(code)

if l < 64:
    coeff = DecodeNumber(code, bits)
    # print(coeff)

    i.base[l] = coeff * quant[l]
    l += 1 # +1

為了比較好釐清JPEG的實作，這邊刻意改寫成：














if code == 0xF0:
    # ZRL
    l += 16 # +16
    continue
elif code > 15:
    l += code >> 4
    code = code & 0x0F

bits = st.GetBitN(code)

if l < 64:
    coeff = DecodeNumber(code, bits)
    i.base[l] = coeff
    l += 1

自己沒有注意到兩處小技巧在改寫時忽略掉幾點，解碼時在幾個MCU後解碼錯誤，因為是Bit-stream非常難除錯，最後還是靠比對原始程式碼發現這點。整體程式碼如下：





































class JPEG:
        
    def BuildMatrix(self, st, dcTableId, acTableId, quant, olddccoeff):
        i = DCT()

        code = self.huffman_tables[0b00 | dcTableId].GetCode(st)
        bits = st.GetBitN(code)
        dccoeff = DecodeNumber(code, bits) + olddccoeff
        i.base[0] = dccoeff

        l = 1
        while l < 64:
            code = self.huffman_tables[0b10 | acTableId].GetCode(st)
            if code == 0:
                # EOB
                break

            if code == 0xF0:
                # ZRL
                l += 16
                continue
            elif code > 15:
                l += code >> 4
                code = code & 0x0F

            bits = st.GetBitN(code)

            if l < 64:
                coeff = DecodeNumber(code, bits)
                i.base[l] = coeff
                l += 1

        i.base = np.multiply(i.base, quant)
        i.rearrange_using_zigzag()
        i.perform_IDCT()

        return i, dccoeff

6.1.4 `DCT`

實作前一章的內容。



































class DCT():
    def __init__(self):
        self.base = np.zeros(64)
        self.zigzag = np.array([
            [0, 1, 5, 6, 14, 15, 27, 28],
            [2, 4, 7, 13, 16, 26, 29, 42],
            [3, 8, 12, 17, 25, 30, 41, 43],
            [9, 11, 18, 24, 31, 40, 44, 53],
            [10, 19, 23, 32, 39, 45, 52, 54],
            [20, 22, 33, 38, 46, 51, 55, 60],
            [21, 34, 37, 47, 50, 56, 59, 61],
            [35, 36, 48, 49, 57, 58, 62, 63],
        ]).flatten()
        # Generate 2D-DCT matrix
        L = 8
        C = np.zeros((L, L))
        for k in range(L):
            for n in range(L):
                C[k, n] = np.sqrt(1/L)*np.cos(np.pi*k*(1/2+n)/L)
                if k != 0:
                    C[k, n] *= np.sqrt(2)
        self.dct = C

    def perform_DCT(self):
        self.base = np.kron(self.dct, self.dct) @ self.base

    def perform_IDCT(self):
        self.base = np.kron(self.dct.transpose(),
                            self.dct.transpose()) @ self.base

    def rearrange_using_zigzag(self):
        newidx = np.ones(64).astype('int8')
        for i in range(64):
            newidx[list(self.zigzag).index(i)] = i
        self.base = self.base[newidx]

6.1.5 Reconstruct image

我們需要一張8倍數長寬的圖片作為解碼器測試用

從twitter取得400x400的Fubuki頭貼
以GIMP開啟後另存一張Optimized JPG

解碼。decode後產生的是RGB不同分量的二維矩陣的一維陣列，因此還需要用dstack疊成一個三維陣列



jpeg = JPEG('sample2.jpg')
jpeg.decode()
img = np.dstack(tuple([i for i in jpeg.decoded_data])).astype(np.uint8)

用pyplot展示結果




import matplotlib.pyplot as plt
img = plt.imshow(img, interpolation='nearest')
plt.axis('off')
plt.savefig("output.png", bbox_inches='tight')

real    0m5.922s
user    0m6.220s
sys     0m1.460s

6.1.6 Image resolution isn't mod8.

JPEG對於非8倍數寬度圖像的做法，就是將其擴展至8倍數，至於如何擴展由實作決定，有以鏡像方式填充(padding)，為何是以鏡像方式？以留黑邊影像為例，黑邊在進行轉換到頻域時會產生分佈極廣的係數（回想step function經過fourier transform的結果），這會產生大量不同的位元需要編碼，因此以鏡像方式填補。

以8x2的黑白影像分別以重複(edge)、鏡像(reflection)、對稱(symmetric)進行填補，使用numpy.pad實作。







































































import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import entropy

class DCT():
    def __init__(self):
        self.base = np.zeros(64)
        self.zigzag = np.array([
            [0, 1, 5, 6, 14, 15, 27, 28],
            [2, 4, 7, 13, 16, 26, 29, 42],
            [3, 8, 12, 17, 25, 30, 41, 43],
            [9, 11, 18, 24, 31, 40, 44, 53],
            [10, 19, 23, 32, 39, 45, 52, 54],
            [20, 22, 33, 38, 46, 51, 55, 60],
            [21, 34, 37, 47, 50, 56, 59, 61],
            [35, 36, 48, 49, 57, 58, 62, 63],
        ]).flatten()
        # Generate 2D-DCT matrix
        L = 8
        C = np.zeros((L, L))
        for k in range(L):
            for n in range(L):
                C[k, n] = np.sqrt(1/L)*np.cos(np.pi*k*(1/2+n)/L)
                if k != 0:
                    C[k, n] *= np.sqrt(2)
        self.dct = C

    def perform_DCT(self):
        self.base = np.kron(self.dct, self.dct) @ self.base


quant = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

image = np.array([
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255]
])
seq = ((0, 8-image.shape[0]), (0, 8-image.shape[1]))
bins = np.linspace(-128, 127, 256)
for mode in ('edge', 'reflect', 'symmetric'):
    m = DCT()
    if mode == 'reflect' or mode == 'symmetric':
        m.base = np.pad(image, seq, mode)
    else:
        m.base = np.pad(image, seq, mode)
    m.base = m.base.flatten()
    m.base -= 128  # level-shifting
    m.perform_DCT()
    m.base = m.base // quant.flatten()
    plt.hist(m.base, bins, alpha=0.5, label=mode)
    
    unique, counts = np.unique(m.base,return_counts=True)
    freq = counts/np.sum(counts)
    print(mode, entropy(freq, base=2))
plt.legend(loc='upper right')
plt.show()

結果顯示reflect和symmetric所產生的係數更為集中，如果我們計算一下三者的entropy，會發現reflect和symmetric能夠帶給我們更好的壓縮比。

edge 1.7733292042478084
reflect 1.3967822215997983
symmetric 1.0960127271685762

7. Conclusion

本篇介紹資料在JPEG中的佈局並實作decoder，未來有空的話將實作encoder，需要處理SOI、DQT、SOF、DHT、SOS、EOI幾個重要的marker，因為注重在quality factor和圖片差異的關係而非壓縮率，因此Luma、Chroma採用的Huffman table採用ISO/IEC 10918-1:199的Table K.3~K.6，quantization table則使用Table K.1~K.2，實作參考ghallak。

解碼過程中也發現，縱使處理MCU不用耗費太多時間，JPEG decoder的速度仍會受限於Huffman decoder的限制，因為必須解碼當前的符號才知道下一個符號的起始點。t0rakka加入RST marker區隔MCU，達到多執行緒decode，效率驚人，可惜並非所有圖片都有加入RST，使用上較為受限。

JPEG Decoder (3/3)

1. File Start & File End

2. Application Default Header (0xFFE0)

3. Define Quantization Table, DQT(0xFFDB)

3.1 Decode DQT

3.2 Encode DQT

4. Start of Frame, SOF(0xFFC0)

4.1 Decode Frame Header

4.2 Encode Frame Header

5. Define Huffman Table, DHT(0xFFC4)

5.1 Decode Huffman Table

5.2 Encode Huffman Table

6. Start of Scan, SOS (0xFFDA)

6.1 Decode SOS

6.1.1 Image

6.1.2 decodeSOS

6.1.3 BuildMatrix

6.1.4 DCT

6.1.5 Reconstruct image

6.1.6 Image resolution isn't mod8.

7. Conclusion

Reference

Read more

2022q1 Homework3 (quiz3)

2021q1 Homework4 (quiz4)

JPEG Decoder (1/3)

JPEG Decoder (2/3)