多媒體技術概論

# 多媒體技術概論 ###### tags: `資工系課程` ## CH1 ### A/D Conversion Analog-to-digital conversion requires two steps: Sampling (取樣) and Quantization (量化) Sampling: - chooses discrete points at which to measure a continuous signal - The number of samples taken per unit time or unit space is called the sampling rate（同樣一張圖中採樣大小數量的差異） Quantization: - 要求每個樣本以固定數量的bits表示，稱為 bit depth（像是一格之中的bits空間大小，也就是可儲存數值的空間） ### Sampling Aliasing：在訊號頻譜上可稱作疊頻；在影像上可稱作疊影，主要來自於對連續時間訊號作取樣以數位化時，取樣頻率不夠導致採樣的離散信號無法重建原始信號。 Undersampling：sampling rate does not keep up with the rate of change in the signal. ### Nyquist Theorem Let f be the frequency of a sine wave. Let r be the minimum sampling rate that can be used in the digitization process such that the resulting digitized wave is not aliased. **Then the Nyquist frequency r is r = 2f** ### Quantization ![](https://i.imgur.com/5rhMkMM.png) #### Quantization Error ![](https://i.imgur.com/Cx4Yjfl.png) ### Fourier Transform Fourier analysis shows that any periodic signal can be decomposed into an infinite sum of sinusoidal waveforms ![](https://i.imgur.com/cSDxxaN.png) ### Compression Digital media files are usually very large, and they need to be made smaller—compressed Compression Rate： A file that is reduced by compression to half its original size: 50% compression or compression rate is 2:1 Compression algorithms can be divided into two basic types: **lossless compression and lossy compression** #### Types of compression Lossless compression： - No information is lost between the compression and decompression steps - Reduces the file size to fewer bits Method: - Run-Length encoding (RLE) - Entropy encoding - Arithmetic encoding Lossy compression： Sacrifice some information which is not important to human perception Method: - Transform encoding ### Run-Length encoding (RLE) Instead of storing value of each pixel, to store number pairs (c,n). c: grayscale value n: how many consecutive pixels have that value ![](https://i.imgur.com/ykZeYY9.png) ![](https://i.imgur.com/wEG3V4u.png) Example 3沒有compress到 ### Entropy Encoding Using fewer bits to encode symbols that occur more frequently, while using more bits for symbols that occur infrequently ![](https://i.imgur.com/YhXf7AN.png) #### Example of entropy encoding ![](https://i.imgur.com/cOVf1W4.png) ![](https://i.imgur.com/wZQC53U.png) ### Practical Solution: Shannon-Fano algorithm ![](https://i.imgur.com/m48vlPE.png) ![](https://i.imgur.com/JwTr42u.png) 使得頻率出現高的顏色以較少的bits表示上述的都是lossless compression ### Transform Encoding Lossy methods are often based upon transform encoding - Discrete Cosine Transform (DCT) - Discrete Fourier Transform (DFT) **No information is lost** in the DCT or DFT 當使用轉換時，可以在後面的步驟中丟棄冗餘或不相關的信息，從而減小文件的大小。這是該過程的有損部分。 ## CH2 ### Bitmaps **Resolution分辨率** 分辨率可以用每英寸像素數來衡量，縮寫為 ppi。200ppi圖像將使用200像素(pixels)打印出來，以確定水平和垂直方向上每英寸的顏色 **Pixel dimensions** The number of pixels horizontally (i.e., width, w) and vertically (i.e., height, h), denoted w × h ![](https://i.imgur.com/cJnkv4M.png) **Image size** The physical dimensions of an image when it is printed out or displayed on a computer - Pixel dimensions w x h - Resolution r - Printed image size a x b - a = w/r , b = h/r Example: an image is 1600 x 1200 and print it out at 200 dpi - The printed image size is 8” x 6” ### Frequency in digital image 在digital imaging領域，頻率是指顏色值變化的速率 ![](https://i.imgur.com/I6K7eI3.png) ![](https://i.imgur.com/NZR880s.png) ### Discrete Cosine Transform(DCT) ![](https://i.imgur.com/g4O696c.png) #### 1D Discrete Cosine Transform (DCT) ![](https://i.imgur.com/ajBuFKN.png) M pixels：訊號長度若使用其他8pixel訊號，黃圈的部分仍然不會改變，因為是取決於r跟u，M為8，u又跟M有關，r跟訊號無關，因此不會改變。 ![](https://i.imgur.com/3MJxrOD.png) 任何訊號應該都可以使用這八種frequency線性組合表示(? ![](https://i.imgur.com/V4b2f5b.png) ![](https://i.imgur.com/QuCQL42.png) #### Example : 4-point DCT 向量數量取決於M因為summary是r=0到R=M-1 ![](https://i.imgur.com/YX0g0lK.png) ![](https://i.imgur.com/w4IsrtY.png) ![](https://i.imgur.com/F2QwWSs.png) 因為前面的amplitude 7 跟-2比起0.9跟0.3大很多，所以只用這兩個coefficients來看結果 #### 2D DCT 看過就好 ![](https://i.imgur.com/iubDaBc.png) ![](https://i.imgur.com/9lc5veo.png) ![](https://i.imgur.com/V7hyPz0.png) ![](https://i.imgur.com/xBHJklk.png) ### JPEG Compression DCT的圖能量集中在左上，越右下數字越小，這是DCT的特性，因為f(0)的frequency最白(就是一整條水平線那個) ![](https://i.imgur.com/6YFkQQ2.png) #### Quantization 會發現右下角的數字都比較大，為了前一頁所講的右下角能量小的問題 ![](https://i.imgur.com/DzZ8QxE.png) #### Zigzag Scan ![](https://i.imgur.com/OaKP3KU.png) 到-26前沒有0 第二個-3前有1個0 在-1之前有5個0 ### Aliasing - Moiré patterns Moiré patterns：The sampling rate for the digital image is not high enough to capture the frequency of the pattern ### Bayer Color Filter 為了成本的考量，大多數的數位相機在單一像素位置上只會有單一顏色分量。最普遍使用到的色彩濾波矩陣為Bayer color filter array 使用插值法根據來自鄰近站點的信息推導出其他兩種顏色分量 #### Demosaicing（去馬賽克） photosite 上只會紀錄單一顏色，用內插法得知其他 2 個顏色的做法稱為去馬賽克 Algorithm • Nearest neighbor • Linear • Cubic • Cubic spline #### NN Example ![](https://i.imgur.com/nTKJa9u.png) black(0,0,0) white(255,255,255) red(255,0,0) blue(0,255,0) green(0,0,255) 先看A點的Bayer color部分，R為上下兩格，B為左右兩格，G看自己，所以對照actual，R的部分取A上下兩格的R值平均，紅白皆為255，B取左右兩格(白藍)皆為255，G則是看自己中心點為紅所以是0 #### Color Aliasing 插值無法完美再現所拍攝的場景，並且偶爾會在處理過程中產生顏色混疊（color aliasing），檢測為原始場景中不存在的莫爾條紋、條紋或色點。 ![](https://i.imgur.com/p1QuHLx.png) ### Anti-aliasing Anti-aliasing是一種減少由鋸齒引起的線條或邊緣鋸齒的技術這個想法是用其指定顏色的陰影為像素著色，與線條或邊緣覆蓋的像素數量成比例 ![](https://i.imgur.com/Lalm9hF.png) #### Aliasing of bitmap & vector graph 與bitmap相比，vector graph受鋸齒問題的影響較小，因為vector graph可以在不損失分辨率的情況下調整大小 • vector graph成像適用於具有純色和清晰邊緣和形狀的圖片 • bitmap成像適用於照片等連續色調的圖片 ### Color Three Characteristics of colors - Hue 色調 (essential color) - Saturation 飽和度 (color purity) - Luminance 亮度 (lightness) ### RGB color mode ![](https://i.imgur.com/NjIXkEb.png) ![](https://i.imgur.com/JJdZmxX.png) ### CMY color model ![](https://i.imgur.com/d1GtXHM.png) CMYK：在 CMY 顏色模式下，三基色的最大量應該組合成黑色，但實際上會產生深棕色，所以加上純黑 ### HSV Color Model ![](https://i.imgur.com/biMWR9X.png) ![](https://i.imgur.com/MLOuA1t.png) - 色相（H）是色彩的基本屬性，就是平常所說的顏色名稱，如紅色、黃色等。 - 飽和度（S）是指色彩的純度，越高色彩越純，低則逐漸變灰，取0-100%的數值。 - 明度（V），亮度（L），取0-100%。 ### YIQ color model 它將亮度信息捕獲在一個值中，並將顏色（色度）信息放入其他兩個值中色度：chrominance - Luminance component: Y - Chrominance components: I, Q ![](https://i.imgur.com/tHbgTZz.png) ### CIE color model 創建一個可以比較所有其他顏色模型的顏色空間 A standard color model that represents all visible colors was called CIE XYZ - 沒有三種可見的原色可以正量組合以創建所有顏色 - 使用三個“虛擬”原色——稱為X、Y 和Z——是純理論而非物理實體 - 前一個空間的線性變換 ![](https://i.imgur.com/PyKDJb1.png) ![](https://i.imgur.com/c8CZo4p.png) ![](https://i.imgur.com/7JO6lV1.png) #### CIE L* a* b* 由於 CIE L* a* b* 模型可以表示所有可見的顏色，所以它一般用作其他顏色模型之間轉換的中間顏色模型。 • L* axis gives brightness values: varying from 0 to 100, • a axis moves from red (positive values) to green(negative values), and • b axis moves from yellow (positive values) to blue (negative values). ### Bézier curves A Bézier curve is a parametric curve described by polynomials based on any number of control points ![](https://i.imgur.com/CSojj0l.png) P0-P1是curve的切線，p2-p3也是 ![](https://i.imgur.com/XPqawci.png) ![](https://i.imgur.com/mWNNdXj.png) ![](https://i.imgur.com/zJz5GLv.png) #### Example ![](https://i.imgur.com/ECZQdW9.png) ![](https://i.imgur.com/GkS6hOu.png) T就只是把多次項t^2、t、1補上 ![](https://i.imgur.com/mbrg4BF.png) #### de Casteljau algorithm 找兩點之間的中間點 ![](https://i.imgur.com/yTsk4ad.png) ## CH3 ### Digital image type – bitmap ![](https://i.imgur.com/tDz5I1B.png) alpha channel：control degree of transparency(透明度) ![](https://i.imgur.com/rVGjzhO.png) ### Color Quantization The process of reducing the number of colors in an image file is called color quantization. 減少顏色種類 ![](https://i.imgur.com/TApiLvd.png) Color quantization begins with an image file stored with a bit depth of n and reduces the bit depth to b. original file is $2^n$, and the number of colors representable in the adjusted file will be $2^b$. 步驟： 1. Determine the actual range and number of colors used in the original picture.以24bit為例，就有16,777,216種可能顏色 2. The second step entails choosing $2^b$ colors torepresent those that actually appear in the picture. 3. Map the colors in the original picture to the colors chosen for the reduced bit-depth picture. ![](https://i.imgur.com/SW2xMSo.png) actual color是透過演算法所選擇的顏色值，因為也不確定真正出現的顏色有哪些 ![](https://i.imgur.com/zphbYzJ.png) 第二排應該要是24bits的顏色值，第三排是圖片原始的值，用nn找代表的index 上方表格則是nn後的距離以及代表的index ![](https://i.imgur.com/mQgovMk.png) ### Popularity algorithm The $2^b$ colors that appear most often in the picture are chosen for the reduced-bit depth picture 留下最常出現的顏色缺點是不會選擇留下不常出現的顏色 ### Uniform Partitioning Algorithm ![](https://i.imgur.com/ky6PrQm.png) 第一步:把圖片中每種顏色值的點找出來，再找出可包含所有點的最小盒子第二步:最後公平切出幾個block，每個block選擇一種顏色代表，一樣用color centroid（找色心）缺點：如果顏色值所代表的點分配很不均勻，像是某些區域的點特別多，但只會用一種顏色去表示，但其他地方很少點的也會給一種顏色表示，會使得圖片顏色表示不好。 ### Median-cut algorithm The algorithm proceeds by a stepwise partitioning where at each step some sub-block containing $2^n = c$ colors is divided in half along its longest dimension such that c/2 of the existing colors from the image are in one half and c/2 are in the other. ![](https://i.imgur.com/llysD26.png) ![](https://i.imgur.com/S9AXuxl.png) ![](https://i.imgur.com/Ols492y.png) ![](https://i.imgur.com/ahmqvG1.png) ![](https://i.imgur.com/LB8Z7W2.png) 最後指定一個顏色(color centroid) ### Octree Algorithm ![](https://i.imgur.com/v4XdsGy.png) The idea here is to build a tree structure containing always a maximum of $2^b$ different colors. 可能透過z型去掃描整張圖，每到一個pixel就去掃octree，沒找到的話，有可能會新增不同樹(顏色)到結構中，像前一張一樣建立 The $2^b$ leaves are used as entries for the color look-up table 如果要挑八種顏色，一開始掃整個圖，就用最先建成的八個點建成樹，挑超過八個顏色之後，之後的第九個點會找最相近的點，之後兩個點的RGB值做平均，就用這顏色代表這個點。 ### Dithering 用很相近的顏色，建成視覺上很像別的顏色，像是紅跟藍建成像紫色 ![](https://i.imgur.com/8pwqOx3.png) ![](https://i.imgur.com/1Bcha0i.png) ### Thresholding You have a grayscale image that uses eight bits per pixel and decide to reduce it to a black and white bitmap that uses one bit per pixel 超過threshold就指定為白沒超過就是黑 ![](https://i.imgur.com/0q9qnjb.png) ### Noise Dithering 比前面ㄉThresholding效果好 1. Generate a random number from 0 to 255 2. If the pixel’s color value is greater than the number then it is white, otherwise black 3. Repeat step 1 for the next pixel in the image ![](https://i.imgur.com/6jf0xLb.png) ### Average Dithering 1. Calculate an average pixel value 2. If a pixel value is above this value then set it to white, else black. 3. Repeat step 2 for each pixel in the image ![](https://i.imgur.com/UWltF49.png) ### Pattern Dithering Also called ordered dithering,Bayer method Step: 1. Break the image into small blocks 2. Define a threshold matrix • Use a different threshold for each pixel of the block • Compare each pixel to its own threshold ![](https://i.imgur.com/J80S51Q.png) ![](https://i.imgur.com/pcn9she.png) ### Error Diffusion Dithering Also called Floyd-Steinberg algorithm 用這個方法是希望若p的值比較暗，用這個mask會使得p周圍的值比較亮，反之則是鄰居變黑，因為此時e的值會變負的，加上之後會讓周圍的值變暗。 ![](https://i.imgur.com/s9tpAHv.png) mask：因為是z字型的掃整張圖，所以只需要target(p)旁邊以及下面有數字假設p=10，此時e=10，就可以計算該pixel z字型的其他值，旁邊的就是7/16　*　10+p(x+1,y) ![](https://i.imgur.com/MdjAZBJ.png) ### Image Transform a process of changing the color or grayscale values of image pixels ![](https://i.imgur.com/dExTSt6.png) Image transforms can be divided into two types 1. Pixel point processing - a pixel value is changed based only on its original value, without reference to surrounding pixels 值改變只跟自己有關，沒有參考周邊的值 2. Spatial filtering - changes a pixel’s value based on the values of neighboring pixels 值改變跟鄰居有關 ### Histogram A histogram is a discrete function that describes frequency distribution 直方圖 ![](https://i.imgur.com/hlu4Htu.png) ### Transform function We define a transform as a function that changes pixel values $g(x, y) = T( f(x, y) ) , p2=T(p1)$ ![](https://i.imgur.com/lgZ54Kx.png) b. 整個curve往上移動，使得整體圖片變亮，此線為transform function ![](https://i.imgur.com/9XmagAO.png) ### Gamma Transform $s = r^γ , 0 ≤r,s ≤ 1$ ![](https://i.imgur.com/MyiRubr.png) ![](https://i.imgur.com/7GbHMwm.png) ### Filter A filter is an operation performed on digital image data to **sharpen, smooth, or enhance some feature, or in some other way modify the image** ### Convolution ![](https://i.imgur.com/vIFYGa9.png) ![](https://i.imgur.com/LjX9qIk.png) 如果周邊沒有數值(剛好在邊界)，會有四種方法，第一是周邊當成0，第二是用mean代表?，第三種是只用mask部分，第四是邊邊不做convlution ### Gaussian blur 會有馬賽克模糊的效果 ![](https://i.imgur.com/ui8TorE.png) ### An edge-detection filter 套filter要左右、上下翻轉，變成藍色的mask，因為前面57頁的公式，所以算出來應該是-255 ![](https://i.imgur.com/5nTxmN1.png) ### Unsharp mask this filter actually sharpens images ![](https://i.imgur.com/zQliL51.png) ### Replication The simplest method for upsampling is replication, a process of inserting pixels and giving them the color value of a neighboring pre-existing pixel ![](https://i.imgur.com/ZdjQvVd.png) 直接複製旁邊的 ### Row-column Deletion The simplest method of downsampling is row-column deletion, the inverse of replication 直接刪除一列或是一行 Row-column deletion throws away information about the image, so you obviously lose details. ### Interpolation Interpolation is a process of estimating the color of a pixel based on the colors of neighboring pixels #### The first two steps in resampling ![](https://i.imgur.com/Wj4xkJP.png) ![](https://i.imgur.com/K3QyTEW.png) 16/6是scale factor ### Nearest neighbor interpolation Nearest neighbor interpolation simply rounds down to find one close pixel whose value is used for fs(i, j) ### Bilinear interpolation Bilinear interpolation uses four neighbors and makes fs(i, j) a weighted sum of their color values. The contribution of each pixel toward the color of fs(i, j) is a function of how close the pixel’s coordinates are to (a, b). 距離越近的權重越大 ![](https://i.imgur.com/G08Qo7R.png) ![](https://i.imgur.com/AtHv03j.png) 用nn則是找最近的點，則gray value就是100 ### LZW compression LZW compression is a method that is applicable to both text and image compression. **It is a lossless compression** commonly applied to GIF and TIFF image files ![](https://i.imgur.com/vmIG9Dd.png) If the pixel sequence is already in the code table, the window is successively expanded by one pixel until finally a color sequence not in the table is under the window. 一格黃色的已經在表格裡了所以又再多選一格總共變兩格黃，之後下面一個則是兩格黃色有了又多選一格，發現三格黃表格裡沒有所以丟進表格裡。每次的開頭都是上一個的尾巴 ![](https://i.imgur.com/ayENs2Z.png) Code = 5 7 7 1 10 4 ### Huffman encoding Huffman encoding is another lossless compression algorithm that is used on bitmap image files. It is a variable-length encoding scheme; that is, not all color codes use the same number of bits. 1) determining the codes for the colors 2) compressing the image file by replacing each color with its code ![](https://i.imgur.com/slD1oem.png) ![](https://i.imgur.com/tc5g4e9.png) 要先sort然後把兩個最小的加起來 ![](https://i.imgur.com/1vRxWKp.png) 越少使用的顏色在樹中的node越深，需要的bit表示也越多，但是常用的則是bit需要的比較少與Shannon-Fano相似之處是常用的bit需要的也是較少，huffman是buttom-up往上建、Shannon是top-down往下分，編碼結果可能不同，但效率差不多 ![](https://i.imgur.com/4IFlcdz.png) Also, because the codes are created from the tree data structure, no code can be a prefix of another code prefix：不會有w=001b=00的情況產生，不然在碰到0001001就不知道怎麼處理，此時b為w的prefix ### JPEG Compression JPEG is a **lossy compression** method The main disadvantage to JPEG compression is that it takes longer for the encoding and decoding than other algorithms require Step1 : Divide the image into 8 × 8 pixel blocks and convert RGB to a luminance/chrominance color model 像是轉成HSV Step 2: Shift values by –128 and transform from the spatial to the frequency domain Transforming from the spatial to the frequency domain makes it possible to **remove high frequency components.** 如果顏色值在小空間內快速上升和下降，則會出現高頻分量。這些微小的變化在大多數人的視覺中幾乎無法察覺，因此移除它們並不會顯著影響圖像質量。 ![](https://i.imgur.com/0s7nq7V.png) ![](https://i.imgur.com/hslsvqv.png) Step 3: Quantize the frequency values Quantization involves dividing each frequency coefficient by an integer and rounding off. ![](https://i.imgur.com/D3s8jNn.png) Step 4: Apply DPCM to the block DPCM是differential pulse code modulation的縮寫 DCPM 只是儲存前一個 8 × 8 塊中的第一個值(DC component)與目前塊中的第一個值之間的差異。由於差異通常小於實際值，因此這一步增加了壓縮 Step 5: Arrange the values in a zigzag order and do run-length encoding. 重新排序將值從低頻分量排序到高頻分量。最後將高頻係數組合在一起。如果它們中的許多在量化後舍入為零，則run-length encoding會更加有效。 ![](https://i.imgur.com/sq3bKJF.png) 後面很多是0 Step 6: Do entropy encoding. In JPEG, Huffman coding is used. ### RGB to YC~b~C~r~ ![](https://i.imgur.com/2fgiaIg.png) 大概看看 ### Chrominance subsampling ![](https://i.imgur.com/LVYJLve.png) luminance/chrominance subsampling is represented in the form $a:b:c$ ![](https://i.imgur.com/oXZfC2Z.png) 黑色是有填充的部分 ### Transform Coding ![](https://i.imgur.com/fTTjW0p.png) ### Quantization Tables in JPEG ![](https://i.imgur.com/qevQzRS.png)