---
###### tags: `DataCompression`
---
# JPEG Decoder (3/3)
This post covers how the
1. Quantization table
2. Huffman table
3. Encoded data
are laid out inside a JPEG file. The walkthrough follows [Understanding and Decoding a JPEG Image using Python](https://yasoob.me/posts/understanding-and-writing-jpeg-decoder-in-python/#huffman-encoding), fills in the gaps from the specification, and implements a JPEG decoder.
JPEG has four compression modes:
- Baseline
- Extended Sequential
- Progressive
- Lossless

Each mode lays out its data differently; here we assume the image is a **Baseline** JPEG.
![](https://i.imgur.com/DwAuA7b.png)
The figure above is by [Ange Albertini](https://yasoob.me/posts/understanding-and-writing-jpeg-decoder-in-python/#huffman-encoding); it shows the important segments in a JPEG file. Most markers are followed by a 2-byte segment length that excludes the marker itself. The important markers are:
Hex|Marker|Marker Name|Description
-|-|-|-
0xFFD8|SOI|Start of Image|
0xFFE0|APP0|Application Segment 0|JFIF - JFIF JPEG image / AVI1 - Motion JPEG (MJPG)
0xFFDB|DQT|Define Quantization Table |
0xFFC0|SOF|Start of Frame 0|Baseline DCT
0xFFC4|DHT|Define Huffman Table|
0xFFDA|SOS|Start of Scan|
0xFFD9|EOI|End of Image|
# 1. File Start & File End
- `SOI`: marks the start of the image file
- `EOI`: marks the end of the image file

`SOS` does not record the length of the compressed data, so the decoder keeps reading until it reaches `EOI`.
Let's try a simple decoder. `SOI` and `EOI` need special handling; everything else follows one rule: read the 2-byte marker, then read the 2-byte segment length:
```python=
from struct import unpack

marker_mapping = {
    0xffd8: "Start of Image",
    0xffe0: "Application Default Header",
    0xffdb: "Quantization Table",
    0xffc0: "Start of Frame",
    0xffc4: "Define Huffman Table",
    0xffda: "Start of Scan",
    0xffd9: "End of Image"
}

class JPEG:
    def __init__(self, image_file):
        with open(image_file, 'rb') as f:
            self.img_data = f.read()

    def decode(self):
        data = self.img_data
        while True:
            marker, = unpack(">H", data[0:2])
            print(marker_mapping.get(marker))
            if marker == 0xffd8:    # SOI
                data = data[2:]
            elif marker == 0xffd9:  # EOI
                return
            elif marker == 0xffda:  # SOS: scan data runs until EOI
                self.decodeSOS(data[2:-2])
                data = data[-2:]
            else:
                lenchunk, = unpack(">H", data[2:4])
                chunk = data[4:2+lenchunk]
                data = data[2+lenchunk:]
                if marker == 0xffc4:
                    self.decodeHuffmanTable(chunk)
                elif marker == 0xffdb:
                    self.DefineQuantizationTables(chunk)
                elif marker == 0xffc0:
                    self.decodeFrameHeader(chunk)
            if len(data) == 0:
                break

if __name__ == "__main__":
    img = JPEG('profile.jpg')
    img.decode()
# OUTPUT:
# Start of Image
# Application Default Header
# Quantization Table
# Quantization Table
# Start of Frame
# Define Huffman Table
# Define Huffman Table
# Define Huffman Table
# Define Huffman Table
# Start of Scan
# End of Image
```
# 2. Application Default Header (`0xFFE0`)
Skipped; it does not affect decoding.
# 3. Define Quantization Table, DQT(`0xFFDB`)
[ISO/IEC 10918-1:1994](https://www.w3.org/Graphics/JPEG/itu-t81.pdf), pp. 39-40
![](https://i.imgur.com/5XMc5gU.png)
![](https://i.imgur.com/n83GffK.png)
Baseline:
Parameter | Value
-|-
Lq|2+65n
Pq|0 (8-bit $Q_k$)
Tq|0~3 (0: Luminance, 1: Chrominance)
The actual layout:
```
Lq   | Pq|Tq | Qk ...
0x43 | 0x01  | 0x01 0x02 0x03 ...
```
### 3.1 Decode DQT
```python3=
from struct import *

def GetArray(type, l, length):
    # unpack `length` values of format `type` from bytes `l`
    s = ""
    for i in range(length):
        s = s + type
    return list(unpack(s, l[:length]))

class JPEG:
    # ------
    def DefineQuantizationTables(self, data):
        offset = 0
        while offset < len(data):
            id, = unpack("B", data[offset:offset+1])
            self.quant[id] = GetArray("B", data[offset+1:offset+65], 64)
            print("#{} Table".format(id))
            print("Elements: ", self.quant[id])
            offset += 65

    def decode(self):
        # ...
        self.DefineQuantizationTables(chunk)
```
### 3.2 Encode DQT
For testing, use a quantization table that is all 1s:
```python=
def encodeQuantizationTables(self, hdr):
    # DQT, Lq, (Pq|Tq), Qk
    return pack('>HHB'+'B'*64, 0xffdb, 2+1+64, hdr, *self.quant[hdr])

jpeg.quant[0] = np.ones(64, dtype='uint8').tobytes()
print('DQT: ', jpeg.encodeQuantizationTables(0))
```
```
# b'\xff\xdb\x00C\x00\x01\x01\x01\x01.......\x01'
```
The quantization table can also be a default table scaled by a quality factor:
```python=
qf = 90
factor = 5000/qf if (qf < 50) else 200-2*qf
q = load_quantization_table('lum')
q = np.clip(q*factor/100,1,255)
```
Usage:
```
(img // _jpeg_quantiz_matrix)
```
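Putting the two snippets together, a minimal sketch of quantizing a block with a quality-scaled table might look like the following (`base_q` and `scaled_table` are names invented here; `base_q` is the Table K.1 luminance matrix, which also appears in §6.1.6):

```python
import numpy as np

# ITU-T T.81 Table K.1 luminance quantization table
base_q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])

def scaled_table(qf):
    # IJG-style scaling: qf < 50 coarsens the table, qf > 50 refines it
    factor = 5000 / qf if qf < 50 else 200 - 2 * qf
    return np.clip(base_q * factor / 100, 1, 255).astype(np.uint8)

block = np.full((8, 8), 128)           # a flat 8x8 sample block
quantized = block // scaled_table(90)  # element-wise quantization
```

At `qf = 50` the scale factor is 100, so the default table is used unchanged.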
# 4. Start of Frame, SOF(`0xFFC0`)
[ISO/IEC 10918-1:1994](https://www.w3.org/Graphics/JPEG/itu-t81.pdf), pp. 35-36
![](https://i.imgur.com/n6ujDPm.png)
![](https://i.imgur.com/8nNSJt8.png)
---
![](https://i.imgur.com/uZEn8WC.png)
Cross-referencing with the figures above:
Parameter|Description|Value (For Baseline DCT)
-|-|-
SOFn|Marker|0xffc0
Lf|Frame header length|Lf+P+Y+X+Nf = 8 bytes; Ci+Hi+Vi+Tqi = 3 bytes per component; Lf = 8+3×3 = 17
P|Sample precision|8
Y|Number of lines (Height)|2
X|Number of sample per line (Width)|6
Nf|Number of image components in frame|Y, Cb, Cr => Nf=3
Ci|Component identifier|1~3 (1=Y, 2=Cb, 3=Cr, 4=I, 5=Q)
Hi|Horizontal sampling factor|(see below)
Vi|Vertical sampling factor|(see below)
Tqi|Quantization table destination selector, i.e. which quantization table this component uses|Y maps to table 0; Cb and Cr share table 1
The meaning of `Y`, `X`, `Hi`, `Vi` requires [ISO/IEC 10918-1:1994](https://www.w3.org/Graphics/JPEG/itu-t81.pdf) A.1.1:
![](https://i.imgur.com/ruMaanx.png)
Suppose a 512x512 image with 3 components (`Y`, `Cb`, `Cr`); we want to know how many `xi*yi` samples make up each component (a sample is one of the "dots" that make up an image; in Baseline, 1 sample = 8 bits):
```
Component 0 H0=4 V0=1
Component 1 H1=2 V1=2
Component 2 H2=1 V2=1
```
With `X=512`, `Y=512`, `Hmax=4`, `Vmax=2`, each component's `xi`, `yi` is computed by the decoder as:
```
Component 0 x0=512 y0=256
Component 1 x1=256 y1=512
Component 2 x2=128 y2=256
```
So for the frame header in the table above, the corresponding `xi`, `yi` are:
```
Y=2 X=6 Hmax=2 Vmax=2
Component 1 H1=2 V1=2 => x1=6 y1=2
Component 2 H2=1 V2=1 => x2=3 y2=1
Component 3 H3=1 V3=1 => x3=3 y3=1
```
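The rule behind both examples is A.1.1's formula: `x_i = ceil(X * H_i / Hmax)` and `y_i = ceil(Y * V_i / Vmax)`. A small sketch (`component_size` is a name invented here):

```python
from math import ceil

def component_size(X, Y, factors):
    # ITU-T T.81 A.1.1: x_i = ceil(X * H_i / Hmax), y_i = ceil(Y * V_i / Vmax)
    hmax = max(h for h, v in factors)
    vmax = max(v for h, v in factors)
    return [(ceil(X * h / hmax), ceil(Y * v / vmax)) for h, v in factors]

# the 512x512 example
print(component_size(512, 512, [(4, 1), (2, 2), (1, 1)]))
# [(512, 256), (256, 512), (128, 256)]
# the 6x2 example
print(component_size(6, 2, [(2, 2), (1, 1), (1, 1)]))
# [(6, 2), (3, 1), (3, 1)]
```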
Notice that the chroma components carry half the information both horizontally and vertically, i.e. [YUV 4:2:0](https://www.impulseadventure.com/photo/jpeg-decoder.html):
```
MCU1 | MCU2
Y00 Y10 Y01 Y11 Cb Cr | Y...
```
### 4.1 Decode Frame Header
```python=
subsample_mapping = {
    "11": "4:4:4",
    "21": "4:2:2",
    "41": "4:1:1",
    "22": "4:2:0"
}

class JPEG:
    def decodeFrameHeader(self, data):
        # BaselineDCT
        precision, self.height, self.width, self.components = unpack(
            ">BHHB", data[0:6])
        for i in range(self.components):
            id, factor, QtbId = unpack("BBB", data[6+i*3:9+i*3])
            h, v = (factor >> 4, factor & 0x0F)
            self.horizontalFactor.append(h)
            self.verticalFactor.append(v)
            self.quantMapping.append(QtbId)
        print("size {}x{}".format(self.width, self.height))
        # ratio of the largest (Y) factor to the chroma factor
        print("subsampling {}".format(subsample_mapping.get("{}{}".format(
            int(max(self.horizontalFactor)/self.horizontalFactor[1]),
            int(max(self.verticalFactor)/self.verticalFactor[1])
        ))))
```
### 4.2 Encode Frame Header
```python=
def encodeFrameHeader(self):
    # BaselineDCT, 4:4:4, 3 components
    C = []
    for i in range(3):
        C.append(i+1)                   # Ci
        C.append(1 << 4 | 1)            # Hi=1, Vi=1
        C.append(self.quantMapping[i])  # Tqi
    return pack('>HHBHHB'+'BBB'*3, 0xffc0, 8+3*3, 8,
                self.height, self.width, 3, *C)
```
Output a short test:
```python=
img.quantMapping=[0,1,1]
img.height=2
img.width=6
print(img.encodeFrameHeader()) # b'\xff\xc0\x00\x11\x08\x00\x02\x00\x06\x03\x01\x11\x00\x02\x11\x01\x03\x11\x01'
```
# 5. Define Huffman Table, DHT(`0xFFC4`)
[ISO/IEC 10918-1:1994](https://www.w3.org/Graphics/JPEG/itu-t81.pdf), pp. 40-41
![](https://i.imgur.com/GyvQgYt.png)
![](https://i.imgur.com/KIiJTmd.png)
Parameter|Description|Value (For Baseline DCT)
-|-|-
Lh|Huffman table definition length|2+17+mt
Tc|Table class|0=DC table, 1=AC table
Th|Huffman table destination identifier|0-1 (0 = Luminance, 1 = Chroma). Not actually mandated by the spec; it only has to line up with the destination selectors used by the quantization tables and the scan header.
Li|Number of Huffman codes of length i|0-255
Vij|Value associated with each Huffman code |0-255
As mentioned earlier, JPEG stores a Huffman table as two parts, codeword lengths and decoded values, 16+n bytes in total; these are exactly Li and Vij.
```
00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 00 | 00 01 02 03 04 05 06 07 08 09 0A 0B
              codeword lengths (Li)             |          decoded values (Vij)
```
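These 16+n bytes fully determine the codebook, because codewords are assigned canonically (ITU-T T.81 Annex C): codes of the same length are consecutive integers, and moving to the next length appends a 0 bit. A sketch of the reconstruction, using the DC luminance table of Table K.3 (`assign_codes` is a name invented here):

```python
def assign_codes(lengths, values):
    # ITU-T T.81 Annex C: codes of each length are consecutive;
    # the first code of length L+1 is (last code of length L + 1) << 1
    codes, code, idx = {}, 0, 0
    for bit_length, count in enumerate(lengths, start=1):
        for _ in range(count):
            codes[format(code, '0{}b'.format(bit_length))] = values[idx]
            code += 1
            idx += 1
        code <<= 1
    return codes

lengths = [0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # Table K.3 Li
values = list(range(12))                                     # 0x00 .. 0x0B
codes = assign_codes(lengths, values)
# '00' -> 0x00, '010' -> 0x01, ..., '111111110' -> 0x0B
```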
Let's revisit the earlier example and write the encoder/decoder.
![](https://i.imgur.com/4blWV32.png)
:::warning
The standard allows multiple Huffman tables to follow a single DHT marker.
:::
### 5.1 Decode Huffman Table
For the Huffman decoder, [micro-jpeg-visualizer](https://github.com/aguaviva/micro-jpeg-visualizer) does it nicely, so we simply reuse it; the whole program is only 250 lines with no external dependencies.
```python=
class HuffmanTable:
    def __init__(self):
        self.root = []
        self.elements = []

    def BitsFromLengths(self, root, element, pos):
        if isinstance(root, list):
            if pos == 0:
                if len(root) < 2:
                    root.append(element)
                    return True
                return False
            for i in [0, 1]:
                if len(root) == i:
                    root.append([])
                if self.BitsFromLengths(root[i], element, pos-1) == True:
                    return True
        return False

    def GetHuffmanBits(self, lengths, elements):
        self.elements = elements
        ii = 0
        for i in range(len(lengths)):
            for j in range(lengths[i]):
                self.BitsFromLengths(self.root, elements[ii], i)
                ii += 1

    def Find(self, st):
        r = self.root
        while isinstance(r, list):
            r = r[st.GetBit()]
        return r

    def GetCode(self, st):
        while True:
            res = self.Find(st)
            if res == 0:
                return 0
            elif res != -1:
                return res
```
`GetCode` has to read data bit by bit, so we add a `Stream` class for sequential access to every bit of the data:
```python=
class Stream:
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def GetBit(self):
        b = self.data[self.pos >> 3]
        s = 7-(self.pos & 0x7)
        self.pos += 1
        return (b >> s) & 1

    def GetBitN(self, l):
        val = 0
        for i in range(l):
            val = val*2 + self.GetBit()
        return val
```
Suppose `0b000` maps to 0x04 in the Huffman table:
```python
st = Stream([0x00])  # bit stream: 00000000
print('0b000: ', hf.GetCode(st))  # 4
print('0b000: ', hf.GetCode(st))  # 4
```
Unpack `lengths` and `elements`, feed them into `HuffmanTable()` to build the Huffman tree (`root`), and `GetCode` can then decode codewords:
```python=
def decodeHuffmanTable(self, data):
    offset = 0
    tbcount = 0
    while offset < len(data):
        tcth, = unpack("B", data[offset:offset+1])
        tc, th = (tcth >> 4, tcth & 0x0F)
        offset += 1
        # Extract the 16 bytes containing length data
        lengths = unpack("B"*16, data[offset:offset+16])
        offset += 16
        # Extract the elements after the initial 16 bytes
        elements = []
        for i in lengths:
            elements += unpack("B"*i, data[offset:offset+i])
            offset += i
        print("#{}: {} {} Table".format(tbcount, "Luma" if th == 0 else "Chroma",
                                        "DC" if tc == 0 else "AC"))
        print("lengths: ", lengths)
        print("Elements: ", elements)
        # tc = 0 (DC), 1 (AC); th = 0 (Luminance), 1 (Chroma)
        hf = HuffmanTable()
        hf.GetHuffmanBits(lengths, elements)
        self.huffman_tables[tc << 1 | th] = hf
        tbcount += 1
```
```
Define Huffman Table
Luma AC Table
lengths: (0, 1, 4, 1, 3, 4, 2, 3, 0, 1, 5, 1, 0, 0, 0, 0)
Elements: [3, 1, 2, 4, 5, 6, 7, 17, 18, 0, 8, 19, 33, 20, 34, 21, 35, 49, 50, 9, 22, 36, 65, 81, 23]
Define Huffman Table
Chroma DC Table
lengths: (0, 2, 3, 1, 0, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
Elements: [6, 7, 4, 5, 8, 3, 0, 1, 2, 9]
Define Huffman Table
Chroma AC Table
lengths: (0, 2, 1, 3, 3, 3, 2, 5, 3, 4, 1, 4, 3, 0, 0, 0)
Elements: [1, 2, 3, 4, 5, 17, 0, 18, 33, 6, 49, 65, 19, 34, 7, 50, 81, 97, 113, 20, 35, 129, 21, 66, 145, 177, 161, 22, 37, 82, 209, 130, 193, 241]
```
### 5.2 Encode Huffman Table
```python=
def encodeHuffmanTable(self, L, V, tc, th):
    # L: the 16 codeword-length counts (Li), V: the decoded values (Vij)
    # Lh counts itself (2) + Tc/Th (1) + 16 Li bytes + the Vij bytes
    Lh = 2 + 1 + 16 + len(V)
    tcth = tc << 4 | th
    return pack('>HHB'+'B'*16+'B'*len(V), 0xffc4, Lh, tcth, *L, *V)
```
# 6. Start of Scan, SOS (`0xFFDA`)
[ISO/IEC 10918-1:1994](https://www.w3.org/Graphics/JPEG/itu-t81.pdf), pp. 37-38
![](https://i.imgur.com/Ts5Y0ht.png)
![](https://i.imgur.com/aznVGVf.png)
### 6.1 Decode SOS
Before decoding, a few functions need adjusting.
After color conversion, JPEG subtracts 128 before the DCT, mapping `[0, 255]` to `[-128, 127]`; the decoder therefore has to add the 128 back first.
```python=
# Level-shifting
for i in range(3):
    self.img[i] += 128
```
#### 6.1.1 `Image`
The `Image` object stores the 2D data of Y, Cb, and Cr in a plain list `img`, stacking them into a 3D array only during color conversion. `DrawMatrix` renders each decoded block onto the full canvas.
```python3=
class Image:
    def __init__(self, height, width):
        self.width = width
        self.height = height
        # [0] = Y, [1] = Cb, [2] = Cr
        self.img = []
        for i in range(3):
            self.img.append(np.zeros((self.height, self.width)))

    def ycbcr2rgb(self):
        # Level-shifting
        for i in range(3):
            self.img[i] += 128
        # Merge multiple 2D arrays into a 3D array
        self.img = np.dstack(tuple([i for i in self.img]))
        # Convert image from YCbCr to RGB
        self.img[:, :, 1:] -= 128
        m = np.array([
            [1.000, 1.000, 1.000],
            [0.000, -0.344136, 1.772],
            [1.402, -0.714136, 0.000],
        ])
        rgb = np.dot(self.img, m)
        self.img = np.clip(rgb, 0, 255)
        # Convert a 3D array to multiple 2D arrays
        self.img = [self.img[:, :, i] for i in range(3)]

    def DrawMatrix(self, y, x, L, Cb, Cr):
        _x = x*8
        _y = y*8
        self.img[0][_y:_y+8, _x:_x+8] = L.reshape((8, 8))
        self.img[1][_y:_y+8, _x:_x+8] = Cb.reshape((8, 8))
        self.img[2][_y:_y+8, _x:_x+8] = Cr.reshape((8, 8))
```
#### 6.1.2 `decodeSOS`
> To help resiliency in the case of data corruption, the JPEG standard allows JPEG markers to appear in the huffman-coded scan data segment. Therefore, a JPEG decoder must watch out for any marker (as indicated by the `0xFF` byte, followed by a non-zero byte). If the huffman coding scheme needed to write a `0xFF` byte, then it writes a `0xFF` followed by a `0x00` -- a process known as adding a stuff byte.
JPEG allows markers (`0xFFXX`) inside the scan data to improve error resiliency. To distinguish data from markers, a literal data byte `0xFF` is written as `0xFF00` (a "stuff byte"); this detail has to be undone before Huffman decoding.
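As a standalone sketch of stuff-byte removal (the `decodeSOS` below folds the same step into its own loop; `remove_stuffing` is a name invented here):

```python
def remove_stuffing(scan):
    # A literal 0xFF in entropy-coded data is written as 0xFF 0x00;
    # drop the 0x00 stuff byte when reading the stream back
    out = bytearray()
    i = 0
    while i < len(scan):
        out.append(scan[i])
        if scan[i] == 0xFF and i + 1 < len(scan) and scan[i + 1] == 0x00:
            i += 1  # skip the stuff byte
        i += 1
    return bytes(out)

print(remove_stuffing(b'\x12\xff\x00\x34'))  # b'\x12\xff\x34'
```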
The decoding flow:
1. Read each component's Huffman table ids and quantization table id
2. Replace `0xFF00` with `0xFF`
3. Call `BuildMatrix` once per component in MCU order to decode by table lookup; under YCbCr 4:4:4 that order is Y, Cb, Cr, Y, Cb, Cr, ...
4. Hand each decoded component to `DrawMatrix` to render onto the canvas `img`
```python=
def decodeSOS(self, data):
    # BaselineDCT
    ls, ns = unpack(">HB", data[0:3])
    csj = unpack("BB"*ns, data[3:3+2*ns])
    dcTableMapping = []
    acTableMapping = []
    for i in range(ns):
        dcTableMapping.append(csj[i*2+1] >> 4)
        acTableMapping.append(csj[i*2+1] & 0x0F)
    data = data[6+2*ns:]
    # Replace 0xFF00 with 0xFF
    i = 0
    while i < len(data) - 1:
        m, = unpack(">H", data[i:i+2])
        if m == 0xff00:
            data = data[:i+1]+data[i+2:]
        i = i + 1
    img = Image(self.height, self.width)
    st = Stream(data)
    oldlumdccoeff, oldCbdccoeff, oldCrdccoeff = 0, 0, 0
    for y in range(self.height//8):
        for x in range(self.width//8):
            matL, oldlumdccoeff = self.BuildMatrix(
                st, dcTableMapping[0], acTableMapping[0],
                self.quant[self.quantMapping[0]], oldlumdccoeff)
            matCb, oldCbdccoeff = self.BuildMatrix(
                st, dcTableMapping[1], acTableMapping[1],
                self.quant[self.quantMapping[1]], oldCbdccoeff)
            matCr, oldCrdccoeff = self.BuildMatrix(
                st, dcTableMapping[2], acTableMapping[2],
                self.quant[self.quantMapping[2]], oldCrdccoeff)
            img.DrawMatrix(y, x, matL.base, matCb.base, matCr.base)
    img.ycbcr2rgb()
    self.decoded_data = img.img
```
#### 6.1.3 `BuildMatrix`
`BuildMatrix` reorders the DC/AC coefficients and performs the inverse DCT.
Two spots in [micro-jpeg-visualizer](https://github.com/aguaviva/micro-jpeg-visualizer)'s implementation deserve a closer look:
1. If the decoded DC code is `0` (`EOB`), there is no need to read additional bits. In `GetBitN`, passing `0` reads nothing from the `Stream` and returns `0`, and `DecodeNumber(0, 0)` likewise returns `0` directly, so no explicit check on whether `code` is 0 is needed.
```python3
bits = st.GetBitN(code)
dccoeff = DecodeNumber(code, bits) + olddccoeff
i.base[0] = dccoeff
```
2. For the AC coefficients, reading ZRL (`0xF0`) should add 16 to the offset (`l`) and fetch the next code, but the implementation splits this into +15 and +1, again saving a branch:
```python3=
if code > 15:
    l += code >> 4  # +15
    code = code & 0x0F
bits = st.GetBitN(code)
if l < 64:
    coeff = DecodeNumber(code, bits)
    i.base[l] = coeff * quant[l]
l += 1  # +1
```
To make the correspondence with the JPEG spec clearer, here it is deliberately rewritten as:
```python3=
if code == 0xF0:
    # ZRL
    l += 16
    continue
elif code > 15:
    l += code >> 4
    code = code & 0x0F
bits = st.GetBitN(code)
if l < 64:
    coeff = DecodeNumber(code, bits)
    i.base[l] = coeff
l += 1
```
Having missed those two tricks, my rewrite overlooked a few details, and decoding went wrong after a handful of MCUs; a bit-stream is very hard to debug, and in the end I found the problem by diffing against the original code. The full routine:
```python=
class JPEG:
    def BuildMatrix(self, st, dcTableId, acTableId, quant, olddccoeff):
        i = DCT()
        # DC coefficient
        code = self.huffman_tables[0b00 | dcTableId].GetCode(st)
        bits = st.GetBitN(code)
        dccoeff = DecodeNumber(code, bits) + olddccoeff
        i.base[0] = dccoeff
        # AC coefficients
        l = 1
        while l < 64:
            code = self.huffman_tables[0b10 | acTableId].GetCode(st)
            if code == 0:
                # EOB
                break
            if code == 0xF0:
                # ZRL
                l += 16
                continue
            elif code > 15:
                l += code >> 4
                code = code & 0x0F
            bits = st.GetBitN(code)
            if l < 64:
                coeff = DecodeNumber(code, bits)
                i.base[l] = coeff
            l += 1
        i.base = np.multiply(i.base, quant)
        i.rearrange_using_zigzag()
        i.perform_IDCT()
        return i, dccoeff
```
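`BuildMatrix` and the snippets above call `DecodeNumber`, which is not listed in this post; in micro-jpeg-visualizer it implements the spec's EXTEND procedure (ITU-T T.81 F.2.2.1), roughly:

```python
def DecodeNumber(code, bits):
    # EXTEND: `code` is the magnitude category, `bits` the extra bits read
    # as an unsigned int; a leading 0 bit signals a negative coefficient
    if code == 0:
        return 0
    l = 2 ** (code - 1)
    return bits if bits >= l else bits - (2 * l - 1)

# category 3 covers -7..-4 and 4..7
print([DecodeNumber(3, b) for b in range(8)])  # [-7, -6, -5, -4, 4, 5, 6, 7]
```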
#### 6.1.4 `DCT`
This implements the material from the previous post.
```python=
class DCT():
    def __init__(self):
        self.base = np.zeros(64)
        self.zigzag = np.array([
            [0, 1, 5, 6, 14, 15, 27, 28],
            [2, 4, 7, 13, 16, 26, 29, 42],
            [3, 8, 12, 17, 25, 30, 41, 43],
            [9, 11, 18, 24, 31, 40, 44, 53],
            [10, 19, 23, 32, 39, 45, 52, 54],
            [20, 22, 33, 38, 46, 51, 55, 60],
            [21, 34, 37, 47, 50, 56, 59, 61],
            [35, 36, 48, 49, 57, 58, 62, 63],
        ]).flatten()
        # Generate 2D-DCT matrix
        L = 8
        C = np.zeros((L, L))
        for k in range(L):
            for n in range(L):
                C[k, n] = np.sqrt(1/L)*np.cos(np.pi*k*(1/2+n)/L)
                if k != 0:
                    C[k, n] *= np.sqrt(2)
        self.dct = C

    def perform_DCT(self):
        self.base = np.kron(self.dct, self.dct) @ self.base

    def perform_IDCT(self):
        self.base = np.kron(self.dct.transpose(),
                            self.dct.transpose()) @ self.base

    def rearrange_using_zigzag(self):
        newidx = np.ones(64).astype('int8')
        for i in range(64):
            newidx[list(self.zigzag).index(i)] = i
        self.base = self.base[newidx]
```
#### 6.1.5 Reconstruct image
We need a test image whose width and height are multiples of 8:
1. Grab the [400x400 Fubuki avatar](https://twitter.com/shirakamifubuki) from Twitter
![](https://pbs.twimg.com/profile_images/1482580721466486784/q36x-Uux_400x400.jpg)
2. Open it in [GIMP](https://www.gimp.org/) and save a copy as an optimized JPG
![](https://i.imgur.com/CHSDxt3.png)
3. Decode. `decode` produces a list of per-channel 2D matrices of RGB, so we still need `dstack` to stack them into one 3D array
```python=
jpeg = JPEG('sample2.jpg')
jpeg.decode()
img = np.dstack(tuple([i for i in jpeg.decoded_data])).astype(np.uint8)
```
4. Show the result with `pyplot`
```python=
import matplotlib.pyplot as plt
img = plt.imshow(img, interpolation='nearest')
plt.axis('off')
plt.savefig("output.png", bbox_inches='tight')
```
```
real 0m5.922s
user 0m6.220s
sys 0m1.460s
```
![](https://imgur.com/rmoDoOn.jpg)
#### 6.1.6 Image resolution not a multiple of 8
For images whose dimensions are not multiples of 8, JPEG extends them to the next multiple of 8; how to extend is implementation-defined, and one approach is mirror padding. Why mirroring? Take an image padded with a black border: the hard edge produces widely spread coefficients in the frequency domain (recall the Fourier transform of a step function), which means many distinct values to encode. Mirror padding avoids creating that edge.
Let's pad an 8x2 black-and-white image with repeat (edge), mirror (reflect), and symmetric modes, implemented with [numpy.pad](https://numpy.org/doc/stable/reference/generated/numpy.pad.html):
```python=
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import entropy

class DCT():
    def __init__(self):
        self.base = np.zeros(64)
        self.zigzag = np.array([
            [0, 1, 5, 6, 14, 15, 27, 28],
            [2, 4, 7, 13, 16, 26, 29, 42],
            [3, 8, 12, 17, 25, 30, 41, 43],
            [9, 11, 18, 24, 31, 40, 44, 53],
            [10, 19, 23, 32, 39, 45, 52, 54],
            [20, 22, 33, 38, 46, 51, 55, 60],
            [21, 34, 37, 47, 50, 56, 59, 61],
            [35, 36, 48, 49, 57, 58, 62, 63],
        ]).flatten()
        # Generate 2D-DCT matrix
        L = 8
        C = np.zeros((L, L))
        for k in range(L):
            for n in range(L):
                C[k, n] = np.sqrt(1/L)*np.cos(np.pi*k*(1/2+n)/L)
                if k != 0:
                    C[k, n] *= np.sqrt(2)
        self.dct = C

    def perform_DCT(self):
        self.base = np.kron(self.dct, self.dct) @ self.base

quant = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])
image = np.array([
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255],
    [0, 255]
])
seq = ((0, 8-image.shape[0]), (0, 8-image.shape[1]))
bins = np.linspace(-128, 127, 256)
for mode in ('edge', 'reflect', 'symmetric'):
    m = DCT()
    m.base = np.pad(image, seq, mode)
    m.base = m.base.flatten()
    m.base -= 128  # level-shifting
    m.perform_DCT()
    m.base = m.base // quant.flatten()
    plt.hist(m.base, bins, alpha=0.5, label=mode)
    unique, counts = np.unique(m.base, return_counts=True)
    freq = counts/np.sum(counts)
    print(mode, entropy(freq, base=2))
plt.legend(loc='upper right')
plt.show()
```
![](https://imgur.com/lbovgCx.jpg)
The histograms show that reflect and symmetric produce far more concentrated coefficients, and computing the entropy of the three confirms that reflect and symmetric give a better compression ratio.
```
edge 1.7733292042478084
reflect 1.3967822215997983
symmetric 1.0960127271685762
```
# 7. Conclusion
This post walked through the data layout of JPEG and implemented a [decoder](); an encoder is left for when time permits. It has to handle the important markers SOI, DQT, SOF, DHT, SOS, and EOI. Since the focus is on quality factor versus image fidelity rather than compression ratio, the Luma and Chroma Huffman tables follow Tables K.3-K.6 of [ISO/IEC 10918-1:1994](https://www.w3.org/Graphics/JPEG/itu-t81.pdf) and the quantization tables follow Tables K.1-K.2, with [ghallak](https://github.com/ghallak/jpeg-python/tree/2fe1bd2244c3090543695b106866dfa0a3b48f6c) as an implementation reference.
Decoding also made one thing clear: even though processing the MCUs themselves is cheap, a JPEG decoder's speed is still bounded by the Huffman decoder, because the current symbol must be decoded before the start of the next one is known. [t0rakka](https://t0rakka.silvrback.com/jpeg-decoding-benchmark) inserts RST markers to delimit MCUs and decode on multiple threads, with impressive efficiency; unfortunately not every image contains RSTs, so the approach is limited in practice.
# Reference
- [Designing a JPEG Decoder & Source Code](https://www.impulseadventure.com/photo/jpeg-decoder.html)
- [JPEG Huffman Coding Tutorial](https://www.impulseadventure.com/photo/jpeg-huffman-coding.html)
- [從計算機編碼的角度看Entropy](https://hackmd.io/@sXG2cRDpRbONCsrtz8jfqg/ry-0k0PwH)
- [DLCV Lab-影像壓縮](http://ip.csie.ncu.edu.tw/course/IP/IP1607cp.pdf)