# Compression of Images and Signals [toc] --- ## Lecture 1 ![image](https://hackmd.io/_uploads/HkeUbat66.png) :::info - **Motivation:** - Make data smaller so it can be transmitted and stored efficiently. - Standardization in image and video coding, like JPEG, MPEG - In terms of coding efficiency ![image](https://hackmd.io/_uploads/rkg6dWUnp.png) > We can see the compression efficiency is gradually improved until 2020. - procedure of compression ![image](https://hackmd.io/_uploads/rkv4sb8hT.png) - Lossless compression: the original data can be recovered exactly from the compressed data; used where no difference from the original can be tolerated. - Lossy compression: the original data cannot be recovered exactly from the compressed data; used where some difference from the original can be tolerated. - Measurement of compression performance ![image](https://hackmd.io/_uploads/r1yGhW8np.png) - Modeling and coding ![image](https://hackmd.io/_uploads/ByhXTZL2T.png) - Example 1 ![image](https://hackmd.io/_uploads/BJEeyzLhp.png) - Example 2 ![image](https://hackmd.io/_uploads/rypbkMU26.png) - Example 3 ![image](https://hackmd.io/_uploads/S1yQ1zL3T.png) - Self-information ![image](https://hackmd.io/_uploads/H1kulf82p.png) - example ![image](https://hackmd.io/_uploads/rk4AgG8na.png) - Self-information and entropy ![image](https://hackmd.io/_uploads/SydU-M83a.png) - Source entropy ![image](https://hackmd.io/_uploads/HyKFGfLnT.png) - example 1: assume they are totally independent seq. ![image](https://hackmd.io/_uploads/rJZsGM82p.png) - example 2: assume they are somehow related ![image](https://hackmd.io/_uploads/BJR6MfIn6.png) - example 3: combine multiple elements into one ![image](https://hackmd.io/_uploads/HJQe4fL26.png) > if we know something about the structure (model), we can reduce the entropy - models - simplest (ignorance) model: independent and equally probable letters - model with prob. of independent alphabet letters - model with dependence of elements - Markov models - example 1 & example 2-1 ![image](https://hackmd.io/_uploads/Hy3lHMLnp.png) - example 2-2 ![image](https://hackmd.io/_uploads/H1y7HGIh6.png) - example 2-3 ![image](https://hackmd.io/_uploads/B1arSGIha.png) - In many applications it is not easy to describe the source with a single model, so we need a composite source model ![image](https://hackmd.io/_uploads/S1oWKNL2p.png) - meaning of coding ![image](https://hackmd.io/_uploads/S1Ko8GUna.png) - a fixed-length code is certainly not efficient - example ![image](https://hackmd.io/_uploads/rkG2Pz83T.png) - for prefix codes there is no need to check unique decodability ![image](https://hackmd.io/_uploads/H1heOGI2p.png) - Kraft-McMillan inequality: the condition the codeword lengths must satisfy ![image](https://hackmd.io/_uploads/HJHDPHUhT.png) - Problem in algorithmic information theory: a given string is fixed, so there is nothing probabilistic about it ![image](https://hackmd.io/_uploads/B1-IuHLhp.png) - Kolmogorov complexity provides a solution - [reference video](https://youtu.be/0cHHKDAelCo?si=f0Q_uFVPkZU5uWFI) - example ![image](https://hackmd.io/_uploads/BkUJ9BLnT.png) ::: :::success 1. Characterize lossless and lossy compression methods, list examples of compression of selected data types and signals in both groups. - lossless compression: the original data can be fully recovered from the compressed data; required when no difference from the original can be tolerated (e.g. text, FLAC audio, PNG/GIF images) - lossy compression: the original data cannot be recovered exactly from the compressed data; acceptable when some difference from the original can be tolerated (e.g. JPEG images, MP3/AAC audio, MPEG video)
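As a quick numeric illustration of the self-information and source-entropy material above (a minimal sketch over a made-up symbol sequence, not the slide example):

```python
from collections import Counter
from math import log2

# Hypothetical sequence; symbol frequencies define P(a_i).
seq = "AAAABBBCCDAA"
counts = Counter(seq)
n = len(seq)
probs = {s: c / n for s, c in counts.items()}

# Self-information of a single outcome: i(A) = -log2 P(A)  [bits]
self_info = {s: -log2(p) for s, p in probs.items()}

# First-order source entropy: H = sum_i P(a_i) * i(a_i)  [bits/symbol]
H = sum(p * (-log2(p)) for p in probs.values())

for s in sorted(probs):
    print(f"{s}: P={probs[s]:.3f}, i={self_info[s]:.3f} bits")
print(f"entropy H = {H:.3f} bits/symbol")
```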
3. Specify the quantities that are typically used to characterize the compression efficiency with respect to uncompressed data. - compression ratio: the ratio of the number of bits needed to represent the data before compression to the number of bits after compression (before divided by after) 5. Define the quantities of self-information and entropy, their relationship and meaning with respect to signal compression. - self-information: for an event A from a random experiment with probability P(A), the self-information i(A) = -log2 P(A) quantifies the surprise of that specific outcome ![image](https://hackmd.io/_uploads/B1aDz1zNC.png) - their relationship: the average self-information associated with the random experiment is called the entropy ![image](https://hackmd.io/_uploads/H1v3xJMVC.png) 7. Describe basic models used in lossless coding. - Run-Length Encoding (RLE): one of the simplest forms of lossless compression - Example: For the string "AAAABBBCCDAA", RLE would compress it to "4A3B2C1D2A" - Huffman Coding: assigns variable-length codes to input characters - if there is a set of characters with frequencies, it generates a binary tree that assigns shorter codes to more frequent characters - Arithmetic Coding: it encodes an entire message into a single number, a fraction between 0 and 1 9. Define a uniquely decodable code. - every distinct sequence of source symbols maps to a distinct sequence of codewords, so any encoded bitstream can be decoded in only one way 11. Explain Kraft-McMillan inequality for uniquely decodable code. - the codeword lengths of any uniquely decodable code satisfy the Kraft-McMillan inequality; conversely, if a set of lengths satisfies it, a uniquely decodable (even prefix) code with those lengths exists - example ![image](https://hackmd.io/_uploads/B1zI5vzEA.png) 13. Explain how to determine the Kolmogorov complexity of a sequence. - when the given string is fixed there is nothing probabilistic about it -> Kolmogorov complexity can handle this case - use the length of the shortest possible description of the sequence - example: 010101010101 -> 01\*6 15. Determine the source entropy based on the sequence with a given frequency of occurrence of individual symbols. - step 1 ![image](https://hackmd.io/_uploads/Bkk6gvfEA.png) - step 2 ![image](https://hackmd.io/_uploads/rk7CeDMV0.png) - step 3 ![image](https://hackmd.io/_uploads/HJOJWwGNR.png) 16. Based on the code table, decide whether the code is uniquely decodable and determine its average length. - uniquely decodable code ![image](https://hackmd.io/_uploads/rkKECIMNA.png) - if A with prob. 0.5, B with prob. 0.125, C with prob. 0.25, D with prob. 0.125, the average length will be 1.875 - not uniquely decodable code ![image](https://hackmd.io/_uploads/B1z5CUGNR.png) ::: --- ## Lecture 2 :::info - introduction of Shannon-Fano coding ![image](https://hackmd.io/_uploads/r16R-SJT6.png) - example of Shannon-Fano coding ![image](https://hackmd.io/_uploads/rkhc-r1pT.png) - higher prob. with shorter code - it is almost never used nowadays, because Huffman coding has better performance - it works from the top to the bottom ![image](https://hackmd.io/_uploads/Hk2WjPl6p.png) - introduction of Huffman coding ![image](https://hackmd.io/_uploads/H1a4GSy6a.png) - description of algorithm ![image](https://hackmd.io/_uploads/rkdLXBkaa.png) - example of Huffman coding - in terms of steps, see slide (a small construction sketch in code follows below)
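A minimal sketch of the bottom-up Huffman construction described here (illustrative only, using Python's heapq; the probabilities are made up, not the slide example):

```python
import heapq
import itertools

def huffman_code(probs):
    """Build a Huffman code for a dict {symbol: probability}; returns {symbol: bitstring}."""
    counter = itertools.count()            # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, low = heapq.heappop(heap)   # least probable subtree (ties arbitrary) -> prefix bit '1'
        p2, _, high = heapq.heappop(heap)  # next least probable                      -> prefix bit '0'
        merged = {s: "1" + c for s, c in low.items()}
        merged.update({s: "0" + c for s, c in high.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}   # hypothetical source
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code, f"average length = {avg_len:.3f} bits/symbol")
```

With this skewed source the average length (1.75 bits/symbol) equals the entropy, consistent with the optimality bound mentioned below.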
- it works from the bottom to the top (combining from the lowest probabilities on the rightmost side) ![image](https://hackmd.io/_uploads/BkVIsPl6p.png) - result ![image](https://hackmd.io/_uploads/SkRPSry66.png) - the average code length is always greater than or equal to the source entropy - Huffman code procedure - original sorting procedure ![image](https://hackmd.io/_uploads/SybWPHy6a.png) - modified one (used for fixed rate application) ![image](https://hackmd.io/_uploads/r1kBDBJTT.png) - comparison ![image](https://hackmd.io/_uploads/HJB8DB1ap.png) - Canonical Huffman codes ![image](https://hackmd.io/_uploads/BkkPury6T.png) - the encoding information passed to the decoder can be made more compact and memory efficient. For example, one can simply pass the bit lengths of the characters or symbols to the decoder. - example ![image](https://hackmd.io/_uploads/rkCLFHkaa.png) - another modification of Huffman code (to limit the length of codewords, because sometimes a too long codeword will not fit the hardware design) ![image](https://hackmd.io/_uploads/BJktFS1a6.png) - explanation ![image](https://hackmd.io/_uploads/SJ1Ccr1aa.png) - start explanation 1-1 ![image](https://hackmd.io/_uploads/HkvyoBkpT.png) - explanation 1-2 ![image](https://hackmd.io/_uploads/r1TloBya6.png) - explanation 1-3 ![image](https://hackmd.io/_uploads/HkxXsH16a.png) - now it successfully limits the code length, and there is a cost of limiting the length - optimality of Huffman codes ![image](https://hackmd.io/_uploads/r1hHyIkpT.png) - length of Huffman codes ![image](https://hackmd.io/_uploads/BJJDJ8yTa.png) - implementation of Huffman coding (encoding) ![image](https://hackmd.io/_uploads/SkzukIyTT.png) - implementation of Huffman coding (decoding) ![image](https://hackmd.io/_uploads/ByVo1IJp6.png) - example 1 ![image](https://hackmd.io/_uploads/SySexI1a6.png) - example 2 ![image](https://hackmd.io/_uploads/SkjGlUJ66.png) - brief discussion about Adaptive Huffman coding - traditional Huffman coding ![image](https://hackmd.io/_uploads/Sy2m-UJaT.png) - adaptive Huffman coding ![image](https://hackmd.io/_uploads/SkCSW8JTa.png) - now we can dynamically adjust the weight of each leaf based on the probability with which that leaf appears - Golomb codes ![image](https://hackmd.io/_uploads/Hycz78kpT.png) - introduction of Golomb coding ![image](https://hackmd.io/_uploads/SJMNXUy66.png) - compared to Huffman codes, the inputs are treated explicitly as natural numbers rather than abstract symbols - example of Golomb coding ![image](https://hackmd.io/_uploads/SJmL7U1pT.png) - just needed to know the terms ![image](https://hackmd.io/_uploads/S1dp4Ukaa.png) - Tunstall code ![image](https://hackmd.io/_uploads/HkFZBUyTp.png) - requirement of Tunstall code ![image](https://hackmd.io/_uploads/ByAmSL16T.png) - example ![image](https://hackmd.io/_uploads/SyJUHL16T.png) ::: :::success 1. Explain the principle of Shannon-Fano coding method utilizing Shannon-Fano tree. ![rkhc-r1pT](https://hackmd.io/_uploads/ryHGx174R.png) - sort the list of symbols by frequency, descending from left to right - divide the list into two parts, the total prob. of both sides should be as close as possible - the left side gets binary 0 and the right side binary 1 - iterate until each symbol has its own code leaf 3. Explain the principle of Huffman coding and algorithm of codeword generation using binary tree. What are the two main observations on optimum prefix codes? ![SybWPHy6a](https://hackmd.io/_uploads/SJGixJQEC.png) - procedure - sort the list from low prob. to high prob. - combine the two lowest prob. symbols, high prob.
side is 1 - iterate until each symbol has corresponding code leaf - two observations on optimum prefix codes - more frequent symbols have shorter codewords - two symbols that occur least frequently have the same codeword length 5. How does Shannon-Fano encoder differ from Huffman encoder namely regarding optimality? - Shannon Fano coding is not guaranteed to produce optimum prefix codes, especially if the prob. of symbols are not evenly distributed 7. Explain what a minimum variance Huffman code is and how this can be obtained? - explanation: suitable in fixed rate application, the difference is the combining code will be placed at higher position in list ![r1kBDBJTT](https://hackmd.io/_uploads/BJ8MOJX4C.png) - it tries to make the code length as identical as possible ![HJB8DB1ap](https://hackmd.io/_uploads/BkVpOyQN0.png) 9. What will be the optimum Huffman code average length related to the source entropy? - source entropy ![image](https://hackmd.io/_uploads/BJIa1g7NR.png) - Huffman code average length ![image](https://hackmd.io/_uploads/SJXCJlmVR.png) - relationship between code length and entropy ![image](https://hackmd.io/_uploads/Hk4flxQ4A.png) - there is example - calculate entropy ![image](https://hackmd.io/_uploads/HJnwgg7ER.png) - calculate average length ![image](https://hackmd.io/_uploads/SJOcglQNC.png) 11. What is the main problem with static methods of lossless coding (e.g. Huffman coding) and what is the advantage of the adaptive coding procedure? - traditional Huffman coding requires knowledge of source seq. prob. - if we use adaptive coding procedure, we can dynamically adjust the weights of each lead based on the prob. each leaf appears (maybe use linear approximation) ::: --- ## Lecture 3 - [Arithmetic coding (encoding and decoding procedure)](https://youtu.be/4yYgRAHtDLk?si=U49FyvkJkDKWkUv_) :::info - compared to Huffman coding ![image](https://hackmd.io/_uploads/r1stsuOp6.png) - recall Huffman coding procedure ![image](https://hackmd.io/_uploads/S1nxh_dpp.png) > because of skewed distribution of prob. of the alphabets - how to improve it? extended alphabet (good, but it is not efficient in memory requirements) ![image](https://hackmd.io/_uploads/r1kohuua6.png) > impractical in Huffman codes for long sequences of symbols - introducing Arithmetic coding ![image](https://hackmd.io/_uploads/SkaUpOOap.png) - coding seq. 
![image](https://hackmd.io/_uploads/rkT36OOTp.png) - example ![image](https://hackmd.io/_uploads/B1qz1YdTT.png) - generating tag in arithmetic coding ![image](https://hackmd.io/_uploads/HJ8nJKuap.png) - any member of the interval can be used as the **Tag** - usually the midpoint of the interval is chosen - example 1 ![image](https://hackmd.io/_uploads/ryq5lY_6T.png) - example 2 ![image](https://hackmd.io/_uploads/HJSyutuap.png) - deciphering tag ![image](https://hackmd.io/_uploads/rJgcXPqTp.png) - the key is to know where to stop deciphering ![image](https://hackmd.io/_uploads/B1koQD5aa.png) - arithmetic code properties - properties ![image](https://hackmd.io/_uploads/BJwDYYu66.png) - a truncated representation can limit the length of the code - increasing the sequence length brings the rate closer to the entropy - arithmetic coding implementation ![image](https://hackmd.io/_uploads/r1BWFYupp.png) - update formula ![image](https://hackmd.io/_uploads/rkXcVYdap.png) - example of Arithmetic encoding - step 1 ![image](https://hackmd.io/_uploads/Sy0CYKuTp.png) - step 2 ![image](https://hackmd.io/_uploads/BJkz-PqpT.png) - step 3 ![image](https://hackmd.io/_uploads/S1e7bP5aa.png) - step 4 ![image](https://hackmd.io/_uploads/SJur-v9aa.png) - step 5 ![image](https://hackmd.io/_uploads/r1JBMvcTa.png) - example of Arithmetic decoding - step 1 ![image](https://hackmd.io/_uploads/Hyz8e_5ap.png) - step 2 ![image](https://hackmd.io/_uploads/BkX6xu5a6.png) - step 3 ![image](https://hackmd.io/_uploads/S1kfbdcpa.png) - step 4 ![image](https://hackmd.io/_uploads/H1nfWO9ap.png) - step 5 ![image](https://hackmd.io/_uploads/HkY7W_q66.png) - integer implementation of Arithmetic coding ![image](https://hackmd.io/_uploads/ryE-JOc6a.png) - useful in scenarios where floating-point arithmetic is computationally expensive - it simplifies the implementation, but care must be taken to address issues related to precision, scaling, and potential overflow during long sequences - Adaptive arithmetic coding scheme ![image](https://hackmd.io/_uploads/ByZHTwcpa.png) - decoder side has the same update - **Comparison of Huffman coding and Arithmetic coding** ![image](https://hackmd.io/_uploads/By3sg5_aT.png)

| Feature | Huffman coding | Arithmetic coding |
| --------- | -------------- | ----------------- |
| Performance | optimal only for prob. in powers of two; blocking symbols is storage consuming | approaches the entropy for arbitrary (e.g. skewed) distributions |
| Implementation | harder (codebook storage requirement) | easier (no explicit codebook) |
| Understanding | easy, good for education | harder to understand |
| Complexity | lower | higher |

- Arithmetic coding adapts well to varying symbol probabilities, making it suitable for sources with changing statistics > limitations in dynamic range due to finite precision arithmetic ::: :::success 1. Explain the principle of arithmetic coding for sequences of symbols. Which fundamental model/function is needed for code generation? - if the distribution is skewed, Huffman coding is not efficient (large average code length) - arithmetic coding maps the whole sequence to a single floating-point number in the interval \[0,1); any member of the final interval can be used as the Tag, usually the midpoint is chosen - the fundamental function needed is the cumulative distribution function (cdf) of the source model, which defines how the current interval is partitioned (a small numeric sketch follows below) 3. What are the two ways of knowing that the entire sequence was decoded using arithmetic code? - the decoder knows the length of the message - there is a special end of transmission symbol
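A minimal floating-point sketch of the interval update and midpoint-tag generation described above (a hypothetical three-symbol model; not the integer implementation used in practice):

```python
def arithmetic_tag(sequence, probs):
    """Return the final (low, high) interval and the midpoint tag for a symbol sequence."""
    # Cumulative distribution: cdf[s] is the sub-range [lo, hi) assigned to symbol s.
    cdf, acc = {}, 0.0
    for s, p in probs.items():
        cdf[s] = (acc, acc + p)
        acc += p

    low, high = 0.0, 1.0
    for s in sequence:
        span = high - low
        lo_s, hi_s = cdf[s]
        low, high = low + span * lo_s, low + span * hi_s   # l(n), u(n) interval update
    return low, high, (low + high) / 2                     # midpoint used as the tag

probs = {"a": 0.7, "b": 0.2, "c": 0.1}      # hypothetical skewed source
low, high, tag = arithmetic_tag("aab", probs)
print(f"interval = [{low:.5f}, {high:.5f}), tag = {tag:.5f}")
```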
5. What is the principle of generating binary code in arithmetic coding and its integer implementation? - procedure of generating binary code - start with the entire range \[0,1) - for each symbol in the seq., divide the current interval into smaller subintervals based on the prob. of the symbols - update the interval to the subinterval of the current symbol - the binary code is obtained from the midpoint of the final interval - integer implementation: using integers instead of floating-point numbers improves efficiency and avoids precision issues - use large integers for the interval \[low,high) - set the interval to a fixed precision, such as 32-bit or 64-bit - for each symbol in the seq., divide the current interval into smaller subintervals based on the prob. of the symbols - update the interval to the subinterval of the current symbol, rescaling and outputting bits as the interval narrows 7. Explain the principle of binary arithmetic coding (coding of binary data representation). What is the fundamental model parameter related to the more probable symbol (MPS) and less probable symbol (LPS)? - it is a specific form of arithmetic coding for binary symbols - procedure - start with the entire range \[0,1) - determine the prob. of binary symbols 0 and 1 - other steps are the same as previously - for the final interval, the lower bound is chosen as the tag - MPS with higher prob. p, LPS with lower prob. denoted by 1 - p 9. Explain binary arithmetic coding using QM and M coder. - QM coder: prob. of LPS, range, current value ![image](https://hackmd.io/_uploads/HyfM3GXEA.png) - M coder: lower bound + range, current value, use a fixed set of probability models ![image](https://hackmd.io/_uploads/rJC8nzQVC.png) 11. What dictionaries are used for dictionary coding? - a form of lossless data compression - Static Dictionary: contains a predefined set of entries or patterns known to both the encoder and decoder - Adaptive Dictionary: starts with a predefined set of entries but can also update and expand based on the data 13. Compare the average code length for a symbol sequence of m symbols (block of m symbols) achievable using arithmetic and Huffman coding in relation to the entropy H. - the relation is like this ![image](https://hackmd.io/_uploads/r12-rMmNR.png) - but the Huffman procedure requires building the entire codebook for the extended alphabet ::: --- ## Lecture 4 :::info - Glossary ![image](https://hackmd.io/_uploads/Bk4vvnb0T.png) - Static dictionary ![image](https://hackmd.io/_uploads/S1xet3b0p.png) - example ![image](https://hackmd.io/_uploads/B1kQKhbAT.png) - Adaptive dictionary - LZ approach - LZ77 approach: when there is no match, the symbol still has to be sent, which is why the literal symbol is appended at the end of the triplet.
- [Referenced video -> encoding/decoding](https://youtu.be/cSyK2iCqr4w?si=TaZSdGFDDQorpOi3) - Patterns recurring over a period longer than the window -> not captured, so it is not efficient - LZ78 solves the issue we mentioned, every entry has its own index - [Referenced video -> encoding/decoding](https://youtu.be/qVmpPk1C2wc?si=D4CHRw77w_Gg12dt) - LZW -> modification of LZ78 by Terry Archer Welch -> the encoder only sends indices into the dictionary - [Referenced video](https://youtu.be/KJBZyPPTwo0?si=EOUgnkIgm6k9vAGZ) - the application of LZ dictionary techniques - LZW -> GIF -> suitable for computer generated images - Patent concerns -> increasing use of LZ77 -> PNG, gzip - Summary - for dictionary coding - effective for compressing data that contains repeated patterns or sequences - well-suited for compressing text data, where words, phrases, or sequences are often repeated - pros and cons of LZ77 > less efficient, and high memory requirement ![image](https://hackmd.io/_uploads/rJX6ZvXCT.png) - pros and cons of LZ78 > higher efficiency, but higher memory requirement ![image](https://hackmd.io/_uploads/HkIyMvQRp.png) - pros and cons of LZW > performs well on highly redundant data, but vulnerable to highly diverse or random data ![image](https://hackmd.io/_uploads/H1--GwXAT.png) - for context-based compression - effective for compressing data with predictable or structured patterns - pros and cons of Prediction with Partial Match (PPM) ![image](https://hackmd.io/_uploads/HksjAv7A6.png) - pros and cons of Burrows-Wheeler Transform (BWT) > BWT is often used as a preprocessing step in combination with MTF or RLE ![image](https://hackmd.io/_uploads/H1F6AwXAT.png) - pros and cons of Move-to-Front (MTF) Coding > Often used as a preprocessing step or in conjunction with other compression techniques ![image](https://hackmd.io/_uploads/Sygyy_mAa.png) - pros and cons of Run-Length Coding (RLE) > Can be used in combination with other compression techniques as part of a hybrid compression approach to improve overall compression ratios ![image](https://hackmd.io/_uploads/SJ2zk_QAT.png) ::: :::success 1. Explain the main principle used in dictionary coding techniques. - identify repetitive patterns within the data to achieve compression - static approach: sufficient prior knowledge is available - adaptive approach: the knowledge is acquired while encoding 3. What is the difference between static and adaptive dictionaries in dictionary techniques? - Static Dictionary: contains a predefined set of entries or patterns known to both the encoder and decoder ![B1kQKhbAT](https://hackmd.io/_uploads/rkUSMQmNR.png) - Adaptive Dictionary: starts with a predefined set of entries but can also update and expand based on the data 5. Explain the encoding process using LZ77 dictionary technique. - the encoder goes through the input seq. using a sliding window - the window consists of two buffers - search buffer: contains a portion of the recently encoded seq. - lookahead buffer: contains the next portion of the seq. to be encoded - triplet - offset: the distance from the current position in the lookahead buffer back to the start of the matching seq. in the search buffer - length: the length of the matching seq. - next symbol: the next symbol in the lookahead buffer that follows the matching seq. - [example](https://youtu.be/cSyK2iCqr4w?si=2KPPrSd_tea7eS9h) (a small encoder sketch follows below)
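A toy LZ77 encoder matching the triplet description above (illustrative only: tiny buffer sizes, greedy longest match, no real bit packing):

```python
def lz77_encode(data, search_size=7, lookahead_size=4):
    """Encode data as a list of (offset, length, next_symbol) triplets."""
    i, triples = 0, []
    while i < len(data):
        search_start = max(0, i - search_size)
        best_off, best_len = 0, 0
        # Greedy search for the longest match starting inside the search buffer.
        for j in range(search_start, i):
            length = 0
            while (length < lookahead_size and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        next_sym = data[i + best_len]            # literal symbol that closes the triplet
        triples.append((best_off, best_len, next_sym))
        i += best_len + 1
    return triples

print(lz77_encode("cabracadabrarrarrad"))
```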
7. Explain the main differences between LZ77, LZ78, and LZW dictionary coding techniques. - LZ77: patterns recurring over a period longer than the window are not captured, so it is not efficient in that case - LZ78: does not use a triplet, only a pair (dictionary index, last symbol); there is no sliding window, which solves LZ77's problem - LZW: the encoder only sends indices into the dictionary 8. Explain the main principles used in context-based compression techniques. - use the context of symbols to predict and encode data more efficiently, in the case of skewed prob. - so it is efficient for compressing data with predictable patterns 10. Explain move-to-front (mtf) and run-length encoding (RLE) - MTF: often used as a preprocessing step for data compression ![image](https://hackmd.io/_uploads/BJcRDr7E0.png) - find the position (index) of the symbol in the list - output the index - move the symbol to the front of the list - RLE: runs in which the same data value occurs in many consecutive elements are stored as a single data value and a count ![image](https://hackmd.io/_uploads/ry-VKrQER.png) - for each run, output the symbol and the count of its repetitions ::: --- ## Lecture 5 :::info - lossless compression ![image](https://hackmd.io/_uploads/r1e9Zgs0a.png) - example of digital signal ![image](https://hackmd.io/_uploads/S1MrGejCT.png) - at least need to know the meaning of variables ![image](https://hackmd.io/_uploads/HyXqNxsA6.png) - signal-to-mask ratio ![image](https://hackmd.io/_uploads/B12Gugo0a.png) - lossy compression ![image](https://hackmd.io/_uploads/B1TrugsRp.png) - perceptual entropy ![image](https://hackmd.io/_uploads/ryWdugsCp.png) - listening testing - motivation ![image](https://hackmd.io/_uploads/rkUI5eiAa.png) - recommendation ITU-R BS.1116 ![image](https://hackmd.io/_uploads/BkwkjeoAp.png) - Experimental variables ![image](https://hackmd.io/_uploads/SymGsxjCp.png) - basic binaural cues ![image](https://hackmd.io/_uploads/rke1M-iAT.png) ::: :::success 1. What is the relationship between dynamic range of the audio signal and associated number of bits used in uniform quantization? - at the same resolution (quantization step), more bits can represent a wider dynamic range of the audio signal (roughly 6 dB per added bit). [reference](https://youtu.be/UKtGMNAUlsc?si=qT4awOxqGswYWnFM) 3. Describe the basic anatomy of the human ear (peripheral part), name each of the three main parts, describe in what environment and in what form the signal spreads and briefly describe the function of the individual parts. - outer ear - pinna (auricle) collects sound waves from the surrounding environment - ear canal directs the sound waves towards the eardrum - middle ear - three tiny bones called the ossicles: the malleus (hammer), incus (anvil), and stapes (stirrup), transmit and amplify the vibrations from the eardrum to the inner ear - inner ear - cochlea is responsible for converting sound vibrations into electrical signals that can be interpreted by the brain - the semicircular canals and vestibule are involved in balance and spatial orientation 5. Explain the concept of critical bands. What "scales" are used in this context? - auditory filter and cochlear filter are the same thing, [reference](https://youtu.be/KZj1YjwJ7sE?si=_HQLVF8QJk2-_umA) - critical bands are freq. regions within the audible spectrum where the human auditory system processes sound differently, like freq. masking - the Bark scale (and the related ERB scale) divides the audible frequency range into a series of bands 7. Define the concept of sound masking. Draw an example of masking curves for different levels of masking signal. Describe the time domain masking.
- sound masking: perceive one sound that is affected by another sound - in time domain, temporal masking occurs when we perceive a sound is influenced by sounds that precede or follow it in time 9. What brings the binaural hearing above the monaural? (list and briefly describe each area) - involves the perception of sound with two ears - enable accurate localization of sound sources in space - better understand speech in noisy environment, use difference in arrival times and intensities of sound between the ears, integrate inputs in brain 11. How do people locate the direction of the incoming sound in the horizontal and vertical plane? - the brain combines multiple auditory cues with neural processing to determine the direction of incoming sound sources in both the horizontal and vertical planes - horizontal plane localization - interaural time difference (ITDs): the time delay between a sound arrives at one ear versus the other ear - interaural level difference (ILDs): differences in sound intensity between the ears - Spectral Cues: the shape of the outer ear (pinna) and head affect the spectral characteristics of sound reaching each ear, providing additional localization cues - vertical plane localization - Spectral Cues - head movement can help determine a sound is coming from above or below - neural processing in brain 13. Describe possible approaches to audio signal processing system quality assessment. List the advantages and disadvantages of individual approaches. - subjective listening tests ![image](https://hackmd.io/_uploads/BkfnqOTA6.png) - objective metrics ![image](https://hackmd.io/_uploads/SkftpOpCp.png) - psychoacoustic models ![image](https://hackmd.io/_uploads/rJaiAup0T.png) - comparative listening tests - compare the output of different systems - diagnostic tool ![image](https://hackmd.io/_uploads/Sko5xKpCp.png) 14. What are the advantages and disadvantages of subjective methods for assessing the quality of audio codecs? Provide an example of the method of quality assessment in systems with little variation in quality. How limited is the use of this method for inferior quality codecs? - very comprehensive and flexible but too subjective and time-consuming - Mean Opinion Score (MOS) test ![image](https://hackmd.io/_uploads/HkBzYKa0p.png) - limitation, especially for inferior quality codecs ![image](https://hackmd.io/_uploads/ryNe5Kp0a.png) 16. What is the basis for objective methods for evaluating the quality of audio codecs? Provide a general block diagram and examples of specific methods. - perceptual audio evaluation ![image](https://hackmd.io/_uploads/rkwB8YTA6.png) - if there is no suitable standard exists, then do statistics and experiment by ourselves ![image](https://hackmd.io/_uploads/r12rdFT0T.png) 17. What are the limits of the sound localization and why? - Frequency Dependence: High-frequency sounds tend to be more localized than low-frequency sounds. This is because high-frequency sounds have shorter wavelengths, allowing the auditory system to detect subtle differences in arrival time and intensity between the ears. However, at very high frequencies, diffraction effects become significant, making localization more challenging. - Distance: Sound localization becomes less accurate at greater distances from the listener. - interference between sounds - Head-Related Transfer Functions (HRTFs): Individual differences in the shape and size of the head, torso, and outer ears influence the way sound waves interact with the auditory system 18. 
Example on: estimation of signal to quantization noise ratio, middle ear function, masking, calculation of ITD. - Estimation of Signal-to-Quantization Noise Ratio (SQNR): ![image](https://hackmd.io/_uploads/HkEe4Y60T.png) - Middle Ear Function: ![image](https://hackmd.io/_uploads/HkQmVtpAp.png) - Masking: ![image](https://hackmd.io/_uploads/BJIvEtaRp.png) - Calculation of Interaural Time Difference (ITD): ![image](https://hackmd.io/_uploads/B1jdEtaRa.png) ::: --- ## Lecture 6 :::success 1. List and comment on the basic requirements to be considered when selecting the appropriate audio codec for given purpose. p4 - compression efficiency, e.g. bit rate - absolute achievable quality, e.g. the resolution - complexity - computational complexity - storage requirements - hardware of complexity of encoder and decoder - editability, like if we can adjust the parameters or not - error resilience, ability to recover the error - source coding, removal of redundancy(know model of generation of the signal in advance) - perceptual coding, remove irrelevant data from auditory system to minimize noise 3. Draw the general structure of the lossy audio encoder and describe the basic and auxiliary blocks. ![image](https://hackmd.io/_uploads/HyAuKU6kC.png) - filter bank: decomposes the input signal into subsampled spectral components - perceptual model: estimate the actual (time dependent) masking threshold - quantization and coding: quantizes and codes the signal with the aim of keeping the quantization noise below the masking threshold - frame packing: format the bitstream (quantized and coded spectral coefficients + side information) > Side Information: Information such as coding parameters, metadata, and error correction data 5. What is the purpose of the filter bank and what are the two basic ways of its implementation? Why is it possible to sub-sample the signal in individual bands at a filter bank? What does the concept of critical sampling mean in this context? - the purpose of the filter bank is to analyze or process different frequency components of the audio signal separately - two ways of filter bank - Time domain aliasing cancellation based filter banks (MDCT) - Fourier Transform based filter banks (DFT, DCT) - it is possible because the human auditory system is less sensitive to high-frequency details compared to low-frequency components - critical sampling refers to the sampling rate required to avoid aliasing when sub-sampling a signal that has been filtered by a bandpass filter 7. Describe the selected filter bank, its properties and utilization. - Quadrature Mirror Filter Banks (QMF) ![image](https://hackmd.io/_uploads/Hy4svpyeR.png) - Wavelet Based Filter Banks > Wavelets are termed a "brief oscillation" ![image](https://hackmd.io/_uploads/H11BWCkgA.png) - Polyphase Filter Banks – Equally Spaced Filter Banks ![image](https://hackmd.io/_uploads/r1Odz01lC.png) - Fourier Transform Based Filter Banks (DFT, DCT) ![image](https://hackmd.io/_uploads/BJhbmA1lC.png) - Time Domain Aliasing Cancellation Based Filter Banks (MDCT) ![image](https://hackmd.io/_uploads/ryXiQyge0.png) - Hybrid Filter Banks ![image](https://hackmd.io/_uploads/rJbHE1lxA.png) - Adaptive Filter Banks ![image](https://hackmd.io/_uploads/ry_UV1lgC.png) 9. Describe the compression artifact "preecho" - how it manifests and why it occurs. What are the ways of suppressing it? 
![image](https://hackmd.io/_uploads/HyAJPJglC.png) - bit reservoir: allocate extra bits dynamically to encode transient signals accurately and reduce preecho - window switching: dynamically adapt the window size and shape - hybrid/switched filterbanks: combine multiple filterbank structures - gain modifications: smooth transient peaks at the decoder - TNS (temporal noise shaping): shape the quantization noise spectrum based on the signal characteristics, using prediction across the spectral coefficients so that the noise follows the temporal envelope of the signal 10. Describe the basic and other functions of the psychoacoustic model. Describe in detail the algorithm of the selected psychoacoustic model. -> Psychoacoustic model MPEG - Spectral analysis and SPL (Sound Pressure Level) normalization: the signal is divided into freq. components and the spectral magnitude values are normalized to Sound Pressure Level (SPL) to represent perceptual loudness - Identification of tonal and noise maskers: identifies tonal components, and classifies components whose energy is spread across multiple freq. bins as noise maskers - Decimation and reorganization: to reduce computational complexity, critical bands are defined based on psychoacoustic principles and correspond to the frequency resolution of the auditory system - Calculation of individual masking thresholds: the model computes the individual masking threshold within each critical band - Calculation of global masking thresholds: individual masking thresholds are then combined to derive the global masking threshold for each critical band 12. Explain the abbreviations SNR, NMR and SMR and describe the meaning of these parameters for bit allocation and quantization. ![image](https://hackmd.io/_uploads/SyWhKbel0.png) - Signal to Noise Ratio: the higher the required SNR, the more bits are needed - Noise to Masker Ratio: the higher the NMR, the closer the quantization noise is to (or above) the masking threshold, i.e. the more audible it is, so more bits are needed - Signal to Masker Ratio: the higher the SMR, the more perceptually demanding the band, so more bits for encoding 13. Describe the basic approaches of quantization control mechanism for fixed and variable bit streams. How does quantization control influence the use of lossless compression of lossy compressed data? What is a bit reservoir? - it is basically the trade-off between bitrate and audio quality - fixed bit rate quantization control: a predetermined number of bits is allocated to encode each audio frame or subband, it conforms to a specific bitrate target, compression rate might be limited by quantization distortion during encoding - variable bit rate quantization control: dynamically adjusts the number of bits allocated to each frame or subband based on the complexity of the audio content, compression rate might be influenced by the degree of variability in the bitstream - bit reservoir: a buffer used to store excess bits during easy segments, which can be drawn on during more complex segments (that need more bits to maintain audio quality) 15. Explain the Bit allocation techniques and noise allocation approaches - what is the difference between them and for which cases are the particular methods suitable? p26 - bit allocation: no entropy coding, refers to the process of determining the number of bits allocated to encode each frequency component or subband in the audio signal, focus on distributing bits to encode signal components efficiently - noise allocation: with entropy coding, used in conjunction with entropy coding to allocate bits more efficiently, focus on shaping the quantization noise and minimizing its perceptibility 16. Discuss issues of multichannel sound compression.
For what reasons it is not suitable for the independent lossy compression of individual channels? - in multichannel audio, there is often significant redundancy or correlation between channels, especially in stereo or surround sound setups - human perception of sound in multichannel audio environments involves not only individual channel content but also interactions between channels - to address these issues, multichannel audio compression techniques often employ joint coding strategies that consider the inter-channel relationships, like Joint Stereo coding 18. Describe the use of the MS method for stereo/multichannel audio encoding. What is the principle of this method, what are its limitations, advantages, disadvantages? - used in stereo and multichannel audio encoding - method: - in the MS method, a stereo signal is decomposed into two components: the Mid channel (M) and the Side channel (S) - Mid channel contains the sum of the left and right channels, while the Side channel contains the difference between the left and right channels ![image](https://hackmd.io/_uploads/BydUTGgx0.png) - advantages - more efficient compression compared to encoding each channel independently - ensure natural listening experience - Mid channel contains the mono information of the stereo signal, making it compatible with mono playback systems - disadvantages - increased complexity - imbalance between the left and right channels can affect the accuracy of the MS decomposition - some audio playback systems or devices may not support MS-encoded audio - aggressive compression of the Side channel may introduce distortion 20. Describe how to use the intensity stereo for stereo/multichannel audio encoding. What is the principle of this method, what are its limitations, advantages, disadvantages? - it encodes one channel (typically the center or sum channel) along with information about the intensity difference between channels - advantage - lower sampling data - limitation - useful only for high freq. range - disadvantage - not perfect reconstruction ::: --- ## Lecture 7 :::success 1. What is the specific characteristic of lossless compression of the audio signal? Draw and describe a general block diagram of a lossless audio encoder. - lossless methods like LZW, Huffman, etc, not effective for audio due to long-time correlations and high range of values - linear predictive coding: - the higher model order - the higher accuracy - the slower - number of bits depends on sample rate - residuum = difference between original and model - two stage processing - encoding ![image](https://hackmd.io/_uploads/HkEnLq8eA.png) - decoding ![image](https://hackmd.io/_uploads/H17CU58e0.png) 3. Describe a selected lossless audio compression standard. - take FLAC (Free Lossless Audio Codec) as example - FLAC is commonly used lossless audio compression standard - use simple model to save data, fast but less accurate than LPC - unlike lossy compression methods like MP3, which discard some audio data to achieve compression, FLAC retains all the original audio information. - FLAC is an open-source codec - error resilience - FLAC is supported by various audio playback softwares - but actually it is still large, so not suitable to be streamed over the internet 5. Provide an example of a audio lossy standard with a hybrid bank of filters, draw and describe a block diagram. 
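Alongside the MP3 block diagram and description below, here is a tiny direct-formula MDCT of one windowed frame, just to illustrate the "additional MDCT" stage of the hybrid filter bank (a minimal O(N^2) sketch with a sine window; not the reference filter bank of any standard):

```python
import numpy as np

def mdct(frame):
    """Direct-formula MDCT: 2N input samples -> N spectral coefficients."""
    two_n = len(frame)
    n = two_n // 2
    ks = np.arange(n)
    ns = np.arange(two_n)
    # X[k] = sum_n x[n] * cos( pi/N * (n + 1/2 + N/2) * (k + 1/2) )
    basis = np.cos(np.pi / n * (ns[None, :] + 0.5 + n / 2) * (ks[:, None] + 0.5))
    return basis @ frame

N = 16                                                        # hypothetical (tiny) block length
window = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))   # sine window
frame = window * np.random.randn(2 * N)                       # one windowed frame of samples
print(mdct(frame).shape)                                      # -> (16,) spectral coefficients
```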
![image](https://hackmd.io/_uploads/B1rP1oIxC.png) > Pulse-code modulation (PCM) - MP3 is a widely used audio compression format, layer 1 and layer 2 use a polyphase filter bank, layer 3 (MP3 compression) adds an additional Modified Discrete Cosine Transform (MDCT) to the process - input audio stream: - the audio signal enters the MP3 compression process - hybrid filter bank - the filter bank divides the sound into frequency subbands - layers 1 and 2 use a polyphase filter bank - layer 3 compensates for the deficiencies of layers 1 and 2 - bit allocation and quantization - the quantization process assigns a specific number of bits to each spectral coefficient - entropy coding - quantized coefficients are further encoded using entropy coding to reduce the size - compressed bitstream - the compressed bitstream contains the encoded audio data 7. Describe the basic differences between the MPEG1 layer 3(MP3) and MPEG2 - AAC encoder. - compression efficiency: AAC generally offers better compression efficiency compared to MP3 - audio quality: AAC typically provides better audio quality than MP3, especially at lower bitrates - algorithm complexity: the AAC encoding algorithm is more complex than MP3 - support for multi-channel audio: both AAC and MP3 support stereo and multichannel audio, but AAC provides better support for multichannel audio encoding, including surround sound formats like 5.1 and 7.1 channels 9. Describe the basic differences between the MPEG1 layer 3 and Dolby AC3 encoder. - they use different compression algorithms - AC3 typically offers better audio quality than MP3 - AC3 is specifically designed to support multichannel audio - AC3 files tend to have larger file sizes compared to MP3 files 11. Explain the principle of Spectral Band Replication (SBR) ![image](https://hackmd.io/_uploads/BySFsaLeA.png) - instead of directly encoding the high-frequency bands at full bitrate, SBR generates side information that describes the spectral envelope of these bands in a compressed format - cut the sampling rate in half, and reconstruct the missing high frequencies from the side information 13. Explain the principle of Perceptual Noise Substitution (PNS) - a lossy compression technique based on the assumption that all white-noise-like signals sound similar to a human listener, because it is hard for humans to distinguish different types of noise - briefly speaking, white noise is used to replace the original noise-like components, reducing the data that needs to be carried 15. How does the USAC (MPEG-D) encoder differ from AAC (MPEG 2) (list basic differences) - AAC is primarily designed for high-quality audio compression, and USAC is designed for efficient compression - AAC employs perceptual coding techniques such as transform coding, perceptual noise shaping, and psychoacoustic modeling, USAC integrates advanced coding techniques tailored for both speech and audio signals. It combines waveform and parametric coding approaches, including Linear Prediction (LP), Sinusoidal Coding, and Harmonic and Noise (HnS) modeling - AAC is standardized in MPEG-2 and MPEG-4 - USAC is standardized in MPEG-D 17. Describe multi-channel audio encoding options in the MPEG-H 3D audio standard ![image](https://hackmd.io/_uploads/r10c41PxA.png) - Immersive Audio Object Coding: focuses on encoding individual audio objects with precise spatial and perceptual attributes - Channel-Based Coding: represents audio content using discrete channels without spatial granularity - Scene-Based Audio Coding: combination of objects and channels ::: --- ## Lecture 8 :::success 1.
Define so-called mathematically lossy/lossless and perceptually lossy/lossless coding in respect to redundancy, irrelevance, and image entropy. - lossless compression is identical to original - lossy compression is degraded in respect to original - perceptually lossless: perceive no distortion - perceptually lossy: perceive distortion - so usually, we do irrelevance reduction(lossy part) on irrelevant data, and do redundancy reduction(lossless part) on remaining data ![image](https://hackmd.io/_uploads/rJnETpkbC.png) 3. Describe the basic anatomy of the human eye and the impact of the structure of the individual parts on the design of image compression systems. Also focus on spectral sensitivity of receptors and description of receptive fields. - Cornea and Lens: focusing light onto the retina at the back of the eye, determining the sharpness and clarity of the image formed on the retina - Retina: light-sensitive tissue lining the inner surface of the eye, convert light into electrical signals that are transmitted to the brain - Spectral Sensitivity of Receptors: cones are sensitive to different wavelengths of light, corresponding to different colors - Receptive Fields: refer to the specific regions of the retina that influence the firing rate of individual neurons in the visual pathway, vary in size, across different parts of the retina and are crucial for spatial processing and feature detection 5. What are the basic functional parameters of HVS exploited in image compression, especially with respect to W-F law and CSF? - W-F(Weber-Fechner) law: describes the relationship between the physical intensity of a stimulus and the perceived intensity by a human observer ![image](https://hackmd.io/_uploads/SJjbpA1bC.png) - CSF(Contrast Sensitivity Function): describes the human eye's sensitivity to contrast at different spatial frequencies, and human vision is more sensitive to mid-range spatial frequencies than to very low or very high frequencies ![image](https://hackmd.io/_uploads/SkIO3RJWC.png) - magnitude of physical stimulus: measurable quantity of the stimulus without considering human observation - subjectively perceived intensity: how intense a stimulus is perceived subjectively by a human observer - noticeable amplitude of sinusoid: especially vision and hearing, this refers to the minimum change in amplitude of a sinusoidal waveform that can be perceived by human observer - background luminance: the level of brightness or luminance in the background 6. Describe the CSF for monochrome and color stimulus and how this feature is used in image compression. - for monochrome stimulus, CSF typically shows higher sensitivity to mid-range spatial frequencies - CSF for color stimuli follows a similar pattern of higher sensitivity to mid-range spatial frequencies, but with some variations depending on the specific color channels being considered - compression algorithms allocate more bits to preserve details in the mid-range frequencies 8. What are the two basic approaches in image quality assessment, describe their advantages and disadvantages? 
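For the subjective branch described below (and the MOS/CI procedure discussed later in this lecture), a minimal sketch of the Mean Opinion Score and 95% confidence-interval computation over a hypothetical observers-by-images rating matrix (illustrative; observer screening per BT.500 is omitted and a normal approximation is used):

```python
import numpy as np

# Hypothetical ratings: rows = observers, columns = test images, 5-point scale.
ratings = np.array([
    [5, 4, 2, 3],
    [4, 4, 1, 3],
    [5, 3, 2, 4],
    [4, 5, 2, 3],
    [5, 4, 3, 3],
], dtype=float)

n_obs = ratings.shape[0]
mos = ratings.mean(axis=0)              # Mean Opinion Score per image
std = ratings.std(axis=0, ddof=1)       # sample standard deviation per image
ci95 = 1.96 * std / np.sqrt(n_obs)      # 95% confidence-interval half-width

for img, (m, c) in enumerate(zip(mos, ci95), start=1):
    print(f"image {img}: MOS = {m:.2f} +/- {c:.2f}")
```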
- subjective assessment - advantages: - provide direct feedback from human observers - capture the overall subjective impression of image quality, taking into account various factors such as clarity, color fidelity, and visual artifacts - disadvantages: - time-consuming and resource-intensive - subject to individual differences - objective assessment - advantages - offer automated and quantitative measurement of image quality, enabling fast and efficient evaluation of large datasets - consistent and repeatable results - disadvantages - it may not fully capture the complexity of human perception and subjective judgment 10. Describe the basic procedures of statistical processing of subjective data from image quality assessment with a group of observers, especially the meaning of MOS, CI and screening of observers. - experimental designs: define the purpose of the image quality assessment, e.g., overall quality, sharpness, color accuracy, and select data set, rating scale for observers - observer screening: ensure observers have normal vision - data collection: conduct the image quality assessment experiment and collect observer ratings - statistical analysis: calculate the Mean Opinion Score (MOS) for each image by averaging the ratings provided by all observers, and compute the Confidence Interval (CI) to quantify the uncertainty or variability associated with the MOS estimate - result interpretation: interpret the statistical results 12. How is crowdsourcing used for subjective image quality assessment, what are its advantages and disadvantages? - quality assessment outside of laboratory - advantages - very fast - more comprehensive evaluations because of diverse pool of participants from different location - disadvantages - limited control over the data set and environments - more demanding on test design 14. Indicate which international recommendation are particularly used to perform image quality assessment with a group of observers. Describe at least one of the methods (scale, presentation order) that this recommendation lists. - international recommendations - International Telecommunication Union (ITU) BT.500-13: subjective assessment of quality of television pictures - ITU-T P.910: for multimedia application - ITU-R BT.1788: video quality in multimedia application - basic evaluation methods ![image](https://hackmd.io/_uploads/Sy9yNjfbA.png) - Single Stimulus (SS) method, observers are presented with one stimulus at a time and asked to rate its quality independently - Double Stimulus Impairment Scale (DSIS) method involves presenting pairs of stimuli to observers, where one stimulus is a reference (original or uncompressed) and the other is a degraded version (e.g., compressed or processed) 16. What are the three basic configurations of methods for objective image quality assessment, also describe using a block diagram. List the advantages and disadvantages of individual solutions and possible areas of use. 
![image](https://hackmd.io/_uploads/rJwyxJQb0.png) - Full-Reference (FR) - advantages: comprehensive assessment of image quality - disadvantages: more computationally intensive, and it requires access to the original reference image, which may not always be available in practical applications - areas of use: quality assessment of images and videos in multimedia compression systems - Reduced-Reference (RR) - advantages: offers a balance between accuracy and complexity by comparing a reduced set of features - disadvantages: may not provide as accurate results as FR methods - areas of use: in real-time systems with limited computational resources, or full reference image is restricted or impractical - No-Reference (NR) - advantages: suitable for scenarios where reference images are unavailable - disadvantages: less accurate than FR and RR methods due to the lack of direct comparison with the reference image - areas of use: in scenarios where access to the reference image is impossible or impractical, such as network-based image transmission 18. Describe basic pixel metrics for image quality assessment preferably by using mathematical relationships. What are their main advantages and disadvantages compared to more advanced metrics? - basic pixel metrics - Mean Squared Error (MSE): average squared difference between corresponding pixels in the reference and distorted images - Peak Signal-to-Noise Ratio (PSNR): the ratio between the maximum possible intensity of the image (peak signal) and the distortion introduced by compression (noise) - Structural Similarity Index (SSIM): similarity between two images by considering luminance, contrast, and structure - advantages: simple and easy to understand and interpret - disadvantages: do not directly reflect human visual perception, and sensitive to artifacts and outliers 20. Describe the principle of image quality metrics exploiting the HVS model. What are their main advantages and disadvantages in comparison to pixel metrics? ![image](https://hackmd.io/_uploads/SJrCo1XWC.png) - principle: combine various stages of processing from the retina to the visual cortex - advantage: better align with human perception, also higher correlation with subjective assessments - disadvantage: highly non-linear, very complex 22. What are the two basic tools for performance evaluation of objective image quality metrics in relation to subjective tests? What do these characteristics express? - correlation Analysis: indicates how closely the objective metric aligns with human perception - linear Regression Analysis: quantifies how well the objective metric scores can explain the quality in subjective ratings 24. Draw and describe the SSIM block diagram. ![image](https://hackmd.io/_uploads/Byg2G17-0.png) 23. Explain the importance of special databases containing reference and distorted images supplemented by subjective test results. - subjective test results are important resources for image quality assessment (IQA) algorithm, it provides a standardized framework for assessing image quality 25. Explain the principle of VIF (Visual Information) metric. ![image](https://hackmd.io/_uploads/HyvQvlXZC.png) - quantify the similarity between a reference image and a distorted image - the metric operates in the domain of local image patches rather than considering the entire image at once. This is because the HVS is known to be sensitive to local variations in image content 27. Calculate MSE, RMSE, and PSNR based on the presented image matrices. 
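A minimal sketch of the MSE/RMSE/PSNR calculation asked for here, using hypothetical 8-bit image matrices (not the matrices from the slides); the slide formulas and worked example follow as images below:

```python
import numpy as np

def mse_rmse_psnr(original, distorted, peak=255.0):
    """Return (MSE, RMSE, PSNR in dB) between two equally sized images."""
    original = np.asarray(original, dtype=float)
    distorted = np.asarray(distorted, dtype=float)
    mse = np.mean((original - distorted) ** 2)                      # mean squared error
    rmse = np.sqrt(mse)                                             # root mean squared error
    psnr = 10 * np.log10(peak ** 2 / mse) if mse > 0 else float("inf")
    return mse, rmse, psnr

# Hypothetical 4x4 test matrices.
ref = np.array([[52, 55, 61, 66],
                [63, 59, 55, 90],
                [62, 59, 68, 113],
                [63, 58, 71, 122]])
dist = ref + np.array([[1, 0, -2, 1],
                       [0, 2, 1, -1],
                       [-1, 0, 2, 0],
                       [1, -2, 0, 1]])
print("MSE=%.3f RMSE=%.3f PSNR=%.2f dB" % mse_rmse_psnr(ref, dist))
```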
![image](https://hackmd.io/_uploads/SJGGYgm-A.png) - example ![image](https://hackmd.io/_uploads/HJiNFl7bC.png) ::: --- ## Lecture 9 - [Compression method of JPEG](https://www.youtube.com/watch?v=Kv1Hiv3ox8I&t=926s) :::success 1. What is the principle of adaptive and nonuniform scalar quantization? - variable quantization step size based on signal distribution (signal PDF) 3. Describe the principle of vector quantization for image compression. ![image](https://hackmd.io/_uploads/Skn0mzYbC.png) - merge source output into blocks and their encoding as vectors - the index of the closest codeword is transmitted instead of the original vector, by replacing similar vectors with codewords from the codebook, VQ achieves compression by exploiting redundancy and similarity within the image. - the quality of the reconstructed image depends on the size and design of the codebook - larger codebooks generally provide better image quality but require more bits to transmit the indices - smaller distortion than scalar quantization 5. Describe the principle of differential coding for image compression. ![image](https://hackmd.io/_uploads/B1e7_zYZ0.png) - instead of encoding the pixel values directly, the difference between each pixel and its neighboring pixel is calculated and encoded - this difference, known as the prediction error, is typically smaller than the original pixel values, especially in regions with smooth transitions or low-frequency components - reduce the amount of information required to represent accurately, but it may introduce error propagation, where errors in the prediction of one pixel affect the prediction of subsequent pixels 7. Describe the principle of transformation encoding including the most commonly used transformations for image coding. What are the advantages and disadvantages of individual transformations? - principle of transform encoding - transformation: spatial information of the image into freq. or other domain representation - quantization: transformed coefficients are quantized, leading to lossy compression - entropy coding: quantized coefficients are encoded using entropy coding - commonly used transformations for image coding - Discrete Cosine Transform (DCT): is widely used in image compression algorithms such as JPEG, it converts the image into a set of frequency coefficients - Wavelet Transform: decompose the image into different frequency subbands at multiple resolutions, this allows for more efficient representation of both low and high-frequency components of the image > JPEG2000 uses it - Karhunen-Loeve Transform (KLT): optimal transformation that diagonalizes the covariance matrix of the image data, KLT requires knowledge of the image statistics, making it less practical for general-purpose compression - advantages - Efficient representation of image data in fewer coefficients, reducing redundancy - Enables high compression ratios with minimal loss of image quality - disadvantages - DCT: May introduce blocking artifacts, especially at high compression ratios. - Wavelet Transform: More computationally intensive compared to DCT, especially for multi-level decomposition. - KLT: Requires knowledge of image statistics, making it less practical for real-world applications. 9. Describe the principle of image coding in multiple bands and its main parts. - also known as subband coding, the image is decomposed into multiple freq. bands, each representing a different range of freq. 
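As a toy illustration of this band splitting (the analysis/quantization/synthesis steps continue below), a one-level 2-D Haar decomposition into four sub-bands (a minimal sketch; real codecs such as JPEG 2000 use longer biorthogonal filters):

```python
import numpy as np

def haar2d_level1(img):
    """One analysis level: split an image with even dimensions into LL, HL, LH, HH sub-bands."""
    img = np.asarray(img, dtype=float)
    a = img[0::2, 0::2]   # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]   # top-right
    c = img[1::2, 0::2]   # bottom-left
    d = img[1::2, 1::2]   # bottom-right
    ll = (a + b + c + d) / 2.0     # low-pass in both directions (coarse approximation)
    hl = (a - b + c - d) / 2.0     # high-pass across columns (horizontal direction)
    lh = (a + b - c - d) / 2.0     # high-pass across rows (vertical direction)
    hh = (a - b - c + d) / 2.0     # high-pass in both directions (diagonal detail)
    return ll, hl, lh, hh

img = np.random.randint(0, 256, size=(8, 8))     # hypothetical 8x8 image block
for name, band in zip(("LL", "HL", "LH", "HH"), haar2d_level1(img)):
    print(name, band.shape)                      # each sub-band is 4x4 (critically sampled)
```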
- subband coding consists of three main processes - analysis: filter bank decomposition covering source freq. range - quantization and coding: compression scheme and bit allocation for different bands - synthesis: signal reconstructed from quantized and encoded coefficients 11. Describe the decomposition of the video signal by wavelet transformation and how the image of the first order of decomposition will look. 12. Draw and describe the block diagram of the JPEG encoder for color images. How do compression artifacts look like? ![image](https://hackmd.io/_uploads/H1Q1VkjWC.png) - compression artifacts in JPEG images are typically observed as distortions or visual imperfections - Blockiness: blocky artifacts can appear as visible boundaries, especially with high compression ratios - Color Bleeding: chroma subsampling can lead to color bleeding or color smearing artifacts, particularly in areas with fine details or high-frequency color changes - Edge and Detail Loss: lossy compression can result in the loss of fine details and sharp edges - Ringing Artifacts: ![image](https://hackmd.io/_uploads/SJFfXkoWC.png) - Mosquito Noise: high-frequency noise patterns that can appear around edges or boundaries in the image, resembling mosquito wings 14. Describe the basic structure of the JPEG 2000 encoder and decoder. How do compression artifacts look like? ![image](https://hackmd.io/_uploads/rkwGN1jW0.png) - in terms of compression artifacts, similar to JPEG but they visuallly look much better than JPEG 16. Describe why and what is the most common preprocessing of RGB color matrix before compression? - color space conversion: converting the RGB color space to another color space, such as YCbCr, conversion separates the luminance (brightness) and chrominance (color) information, which allows for more efficient compression - chroma Subsampling: reduce the resolution of chrominance components, because HVS is more sensitive to changes in luminance than in chrominance, so we reduce subsampling chrominance data to remove redundant information - Denoising and Filtering: by reducing noise levels, the compression algorithm can allocate more bits to represent important image features 18. Explain what are the so-called R-D curves and how these describe the image compression system. ![image](https://hackmd.io/_uploads/HJ1BHYobA.png) > the R-D curve illustrates how increasing compression (lower rate) leads to higher distortion - Rate(R): it represents the bitrate or the number of bits required for encoding - Distortion (D): It quantifies the difference between the original image and the compressed image. Common measures of distortion include Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM) 20. Explain the basic principle of the EZW encoder. - combines wavelet transforms with embedded coding techniques to achieve high compression ratios - steps - wavelet transform, separate the image into different frequency bands, to enable efficient representation of both smooth and detailed image features - encoded using Embedded Zerotree Coding, this method exploits the hierarchical structure of wavelet coefficients, where insignificant coefficients are often located near significant ones - during encoding, wavelet coefficients are thresholded to identify significant coefficients that contribute to image features. The remaining insignificant coefficients are quantized to reduce their precision, leading to compression 22. Explain the basic principle of the LBG algorithm for VQ. 
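A minimal LBG-style sketch matching the steps listed below: start from one codeword, split, then refine with k-means-style nearest-neighbour and centroid passes (illustrative; the perturbation factor and stopping rule are arbitrary choices, not from the lecture):

```python
import numpy as np

def lbg(training, target_size, eps=0.01, iters=10):
    """Design a VQ codebook with the split-and-refine LBG procedure."""
    training = np.asarray(training, dtype=float)
    codebook = training.mean(axis=0, keepdims=True)        # start: one codeword (global centroid)
    while codebook.shape[0] < target_size:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])   # split step
        for _ in range(iters):                              # k-means-style refinement
            d = np.linalg.norm(training[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)                      # assign each vector to nearest codeword
            for k in range(codebook.shape[0]):              # move codewords to cluster centroids
                members = training[nearest == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
vecs = rng.normal(size=(500, 2))          # hypothetical 2-D training vectors
print(lbg(vecs, target_size=4))           # 4-entry codebook
```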
22. Explain the basic principle of the LBG algorithm for VQ.
    - it is an automated codebook-design method for vector quantization
    - steps
        - begin with an initial (small) codebook
        - split each codeword into two new codewords to double the size of the codebook
        - refine the codebook with the k-means algorithm or a similar clustering technique: each input vector is assigned to the nearest codeword, and each codeword is updated to the centroid of its assigned vectors
        - repeat the splitting and refinement steps iteratively until the desired codebook size and a convergence criterion are met
23. Explain the principle of bit allocation based on functional analysis of R-D curves.
    - allocate more bits to components with steeper R-D curves, which are more sensitive to distortion, and fewer bits to components with flatter curves
    - the goal is to reach the target bitrate while minimizing the overall distortion (maintaining acceptable perceptual quality)
24. Based on the R-D curves depicted, show which codec has better performance.
![image](https://hackmd.io/_uploads/rybdqYjZ0.png)
:::

---

## Lecture 10
- [Explanation of video compression](https://youtu.be/QoZ8pccsYo4?si=8zge2vq16RhlzVaF)

:::success
1. Describe the historical development of the basic standards for video coding with respect to time cycles of introducing new standards and amount of compression performance improvement.
![image](https://hackmd.io/_uploads/HylqN_WG0.png)
    - RD (rate/distortion) curves are used to compare the standards
    - compared with early standards such as JPEG and H.261, H.266 reduces the bitrate by more than a factor of 10 at comparable quality
    - the closer we get to the present, the longer each improvement cycle takes, since the technology is already very mature
3. What are the basic redundancy groups used in video compression and what is their meaning?
![image](https://hackmd.io/_uploads/BkGGLuWfC.png)
    - Spatial and Temporal Redundancy: correlation between neighboring pixels within a frame and between successive frames, removed by prediction and transform coding
    - Psychovisual Redundancy: the human visual system is more sensitive to certain types of information and less sensitive to others
    - Statistical Redundancy: relationships among symbols, removed by lossless (entropy) encoding
5. Describe the basic blocks of the general video encoder and their function.
![image](https://hackmd.io/_uploads/BJyNwdbGR.png)
    - temporal model: aims at temporal redundancy reduction; prediction based on past and future frames with motion compensation -> motion vectors
    - spatial model: aims at spatial redundancy reduction using transform coding and quantization -> quantized coefficients
    - entropy encoder: aims at statistical redundancy reduction by compressing the resulting symbols -> compressed bitstream
7. Describe the principle of block estimation and motion compensation in the video compression time model including its advantages and disadvantages.
    - block-based motion estimation and compensation
        - motion estimation: compare the current block with all candidate blocks in the search area of the reference frame, trying to minimize the residual energy (see the sketch after item 9)
        - motion compensation: the selected (best-matching) block is subtracted from the current block, giving the residual block
    - advantages of the block-based approach
        - relatively simple
        - corresponds to the rectangular frame dimensions and to the DCT blocks
    - disadvantages of the block-based approach
        - shapes of real objects do not match the block boundaries, especially for large blocks, and real motion is not captured precisely
9. What is a motion vector in video compression, what are the most common metrics used to determine it?
    - a motion vector indicates how much a block in one frame has moved relative to its position in the previous (reference) frame
    - for coarse estimation, macroblocks (e.g. 16x16) are compared between frames; for finer estimation, smaller blocks or sub-pixel positions are compared
    - the most common matching metrics are the Sum of Absolute Differences (SAD) and the Mean Square Error (MSE), as in item 18
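> A minimal sketch of the full-search block matching described in items 7 and 9 (and item 18 below): SAD is used as the matching metric over a (2w+1) x (2w+1) search window. The frame size, block position and displacement are made up for the example; real encoders add sub-pixel refinement and faster search patterns.

```python
import numpy as np

def full_search(cur_block, ref, top, left, w=8):
    """Exhaustive block matching: return the motion vector (dy, dx) minimizing
    the SAD between cur_block and an equally sized block of the reference frame,
    searched over a (2w+1) x (2w+1) window centred at (top, left)."""
    N = cur_block.shape[0]
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + N > ref.shape[0] or x + N > ref.shape[1]:
                continue                      # candidate falls outside the frame
            sad = np.abs(cur_block - ref[y:y + N, x:x + N]).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64)).astype(float)    # reference frame
cur = np.roll(ref, (2, -3), axis=(0, 1))              # current frame: shifted content
mv, sad = full_search(cur[24:40, 24:40], ref, top=24, left=24, w=8)
# full search recovers mv == (-2, 3) with sad == 0; the residual block
# (current block minus the matched block) is what gets transform coded
```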
10. What is a macroblock in video encoding and how does block size for motion compensation affect the compression efficiency?
    - a macroblock is the basic unit for motion compensation (e.g. 16x16 luma samples)
    - with a smaller block size
        - better prediction (a more accurate match)
        - lower energy (entropy) of the residual frame
        - but the computation is more complex
        - and more bits are allocated to motion vectors
12. Describe the basic principle of predictive coding for interframe and intraframe prediction in video coding.
    - interframe prediction: motion estimation and motion compensation between frames
    - intraframe prediction: prediction between neighboring pixels within the same frame (spatial prediction), exploiting spatial redundancy
    - both are forms of differential pulse code modulation (DPCM): only the prediction error is coded
14. What are the basic requirements on transform to be used in video transformation encoding? What are the popular transformation groups, including their advantages and disadvantages?
    - requirements on the transform
        - transform-domain coefficients should be decorrelated (minimal inter-dependence) and give a compact representation (energy concentrated in a small number of coefficients)
        - the transform should be reversible
        - the transform should have low complexity and low memory requirements
    - popular transform groups
        - block-based transforms (DCT, KLT): lower memory requirements, but blocking artifacts can appear
        - image-based transforms (DWT): process the entire image or tiles; better efficiency than the DCT but higher memory requirements
16. Describe the principle of the hybrid DPCM/DCT video encoder, preferably by using a block diagram.
![image](https://hackmd.io/_uploads/B1r1piMMA.png)
    - the hybrid DPCM/DCT video encoder combines Differential Pulse Code Modulation (DPCM) for temporal prediction and the Discrete Cosine Transform (DCT) for spatial decorrelation (a loop sketch is given after item 19)
    - procedure
        1. input video frames enter the encoder as a sequence of frames
        2. temporal prediction is performed by estimating motion vectors between the current frame and reference frames
        3. the difference between the original frame and the predicted frame (the residual) is computed
        4. the residual is divided into blocks and the DCT transforms the spatial-domain data into frequency-domain data, concentrating most of the signal energy in a few coefficients
        5. the transformed coefficients are quantized to reduce the number of bits required, and then entropy coded, e.g. with Huffman or arithmetic coding
        6. the resulting bitstream is transmitted (or stored)
18. Describe the principle of basic search algorithms for determining motion vectors.
    - the block search looks for the global minimum of the matching error
    - take full (exhaustive) search (FS) as an example: for each block, all (2w+1)^2 positions in the search window are evaluated (see the sketch after item 9)
    - the matching metrics are typically the Mean Square Error (MSE) or the Sum of Absolute Differences (SAD)
19. Describe basic pre-processing and post-processing methods to increase compression efficiency.
    - pre-processing
        - noise suppression: improves efficiency by reducing the amount of high-frequency information that has to be encoded
        - camera stabilization: the encoder sees camera shake as global motion between frames, and motion compensation cannot correct large displacements well
    - post-processing
        - suppression of visible artifacts (block structure) using motion compensation and a de-blocking filter applied to the decoded frames
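> To make the hybrid DPCM/DCT loop of item 16 concrete, here is a toy sketch in which motion compensation is replaced by a plain previous-frame prediction, a single whole-frame DCT stands in for the block transforms, and `step` is a made-up quantizer step. The point it illustrates is the local decoder inside the encoder, which keeps encoder and decoder predictions identical.

```python
import numpy as np
from scipy.fft import dctn, idctn   # n-dimensional DCT-II and its inverse

def code_inter_frame(cur, ref, step=20):
    """One inter-coded frame of a toy hybrid DPCM/DCT loop (no motion search)."""
    residual = cur - ref                                # DPCM: prediction error
    q = np.round(dctn(residual, norm="ortho") / step)   # transform + quantize (lossy)
    # local decoder inside the encoder: rebuild exactly what the real decoder
    # will see, so encoder and decoder predictions never drift apart
    recon = ref + idctn(q * step, norm="ortho")
    return q, recon                                     # q goes to the entropy coder

rng = np.random.default_rng(1)
frame0 = rng.integers(0, 256, (16, 16)).astype(float)   # reference ("I"-like) frame
frame1 = frame0 + rng.normal(0, 2, frame0.shape)         # next frame, small change
q1, recon1 = code_inter_frame(frame1, frame0)
# q1 is sparse and small because only the frame difference was transformed;
# recon1 (not frame1 itself) would serve as the reference for the following frame
```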
:::

---

## Lecture 11
:::success
1. What elements are usually specified in the standard for video compression (e.g., preprocessing, encoder, decoder, bitstream, etc.) and what are the reasons for this selective standardization?
    - only the bitstream syntax and the decoder are standardized
    - the encoder is left open, so it can be simplified or optimized freely and still produce a compliant bitstream
    - standardizing the decoder side is sufficient to guarantee interoperability, so there is no need to standardize both encoder and decoder
3. Explain and describe the GOP in MPEG 2 standard.
![image](https://hackmd.io/_uploads/rJClTe2fR.png)
    - group of pictures between two I frames: a sequence of consecutive frames treated as a single unit for compression and encoding purposes
    - I frames are encoded independently; they contain complete image information, serve as reference points for decoding subsequent frames, and are typically placed at the beginning of each GOP
    - P frames are predicted from previously encoded reference frames; they contain only the differences (motion vectors and residuals) between the predicted frame and the actual frame
    - B frames are predicted from both preceding and subsequent reference frames, which makes the prediction more accurate; they contain motion vectors and residuals relative to both past and future frames
5. What are the key improvements of the H.264/AVC standard relative to MPEG 2?
    - enhanced motion compensation
    - integer transform of smaller blocks
    - enhanced intraframe prediction
    - adaptive de-blocking filter
    - enhanced entropy coding
7. Draw a block diagram of a typical H.264/AVC encoder and explain the meaning of each block.
![image](https://hackmd.io/_uploads/rkFtmf2zA.png)
9. Explain the meaning of profiles and levels in video compression standards.
    - profile: the set of algorithmic features (coding tools) that may be used to create the bitstream
    - level: the degree of decoder capability, i.e. limits on resolution, bitrate, etc.
10. Describe the principle of intraframe directional prediction for the H.264/AVC standard.
    - intraframe prediction is done separately for luma and chroma samples
    - there are 8 directional modes plus one DC mode (prediction from the mean of the neighboring reference samples)
12. What is the main idea behind the so-called DCT-based integer transformation for H.264/AVC?
    - to represent the spatial information of the residual efficiently by transforming it into the frequency domain using only integer arithmetic, which avoids encoder/decoder mismatch caused by floating-point rounding (a small numeric sketch is given after item 20)
14. Explain the principle of the deblocking filter in the H.264/AVC standard. Is this filter inside the inter-frame prediction loop or after the decoder output and for what reason?
    - the principle is adaptive smoothing of the discontinuities between adjacent blocks within a frame, reducing the blockiness introduced by block-wise transform and quantization during the encoding process
    - in H.264/AVC it is an in-loop filter: it is applied inside the inter-frame prediction loop, before the reconstructed frame is stored as a reference, so encoder and decoder predict from the same filtered frames and the quality improvement propagates to subsequent frames
16. What is the basic principle of Multiview Video Coding (MVC) encoding in a particular H.264/AVC extension?
    - Multiview Video Coding (MVC) is an extension of the H.264/AVC video coding standard
    - designed to efficiently compress and encode multiple views of a scene
    - prediction and compensation are performed not only in time but also across the views (inter-view prediction)
![image](https://hackmd.io/_uploads/HJF2YC3zA.png)
18. What are the key changes and improvements to the H.265/HEVC video encoding standard compared to the H.264/AVC standard?
![image](https://hackmd.io/_uploads/S1Ak00nMR.png)
    - HEVC supports larger block sizes
    - HEVC uses hierarchical quad-tree partitioning, so the coding, prediction and transform units (and hence the transform sizes) are much more flexible
    - there are more intraframe prediction modes, which improves prediction accuracy
20. What type of entropy encoder is used to encode quantized transformation coefficients in H.265/HEVC? What are the advantages and disadvantages of this encoder compared to the entropy encoder used, e.g., in the JPEG standard?
![image](https://hackmd.io/_uploads/r1bvhNTG0.png)
    - Context-Adaptive Binary Arithmetic Coding (CABAC)
    - advantages
    ![image](https://hackmd.io/_uploads/HJqsh4az0.png)
    - disadvantages
    ![image](https://hackmd.io/_uploads/SksAhETzR.png)
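> As a small numeric illustration of item 12, the sketch below uses the 4x4 integer core transform matrix commonly quoted for H.264/AVC (an integer approximation of the DCT). In the standard the non-orthonormal scaling is folded into quantization and the decoder uses a matching integer inverse; here the exact matrix inverse is used only to check that the transform is reversible.

```python
import numpy as np

# 4x4 forward core transform of H.264/AVC: an integer approximation of the DCT,
# computable with additions, subtractions and shifts only (no floating point)
Cf = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]], dtype=float)

X = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 residual block
Y = Cf @ X @ Cf.T                              # forward transform of the block
Ci = np.linalg.inv(Cf)                         # exact inverse, used here only as a check
assert np.allclose(Ci @ Y @ Ci.T, X)           # the transform is exactly invertible
```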
22. What are two main examples of a series of video compression formats originally developed by On2 Technologies and now by Google? Compare the compression efficiency of the H.264/AVC, H.265/HEVC standards with these compression formats and indicate the main application area for these alternative formats.
    - VP8 and VP9 (the successor to VP8)
    - intended mainly for web applications
    - video compression using VP8 and VP9
        - VP8 was initially developed by On2 Technologies
        - VP9 targets about 50% bitrate savings with respect to VP8, but still performs somewhat worse than H.265/HEVC (while typically outperforming H.264/AVC)
24. What were the technical goals and business model of the AOMedia Video 1 (AV1) compression format design? What are the main advantages compared to H.265/HEVC standard?
    - it is open source and royalty-free, developed from the VP9 codec, targeting about 20% better compression than H.265/HEVC and VP9 and about 50% better compression than H.264/AVC
    - main advantages compared to the H.265/HEVC standard
        - AV1 has better compression efficiency
        - AV1 is developed as open source and is royalty-free (no licensing fees)
26. In what application area is the main use of Apple ProRes? List the main differences against the H.264/AVC standard, etc.
    - it is a lossy video compression technology aimed primarily at video editing
    - Apple ProRes is preferred in professional video production environments where maximum image quality and editing performance matter more than file size
    - main differences between Apple ProRes and H.264/AVC
        - compression efficiency and quality: significantly higher quality and fewer compression artifacts than H.264/AVC, at the cost of much higher bitrates
        - editing performance: specifically optimized for video editing applications, providing fast decoding and playback even for high-resolution, high-bitrate video files
        - bitrate and file size: larger file sizes than H.264/AVC at equivalent quality levels
        - intra-frame compression: every frame is coded independently, which enables fast seeking, decoding and editing
        - color depth: it supports higher color depths (up to 12-bit)
28. What video compression formats use discrete wavelet transforms, and what are their application areas? What is the main difference in encoder configuration compared to the formats that apply DCT to individual blocks?
    - JPEG 2000 and WVC
    - application areas of video compression formats using the DWT
        - medical imaging: high-quality images with minimal loss
        - remote sensing: satellite imaging and aerial photography, where large volumes of image and video data must be transmitted and stored efficiently
        - archival and preservation: where maintaining high image fidelity and resolution over long periods is required
    - the main difference in encoder configuration compared to formats that apply the discrete cosine transform (DCT) to individual blocks
        - transform stage: whole frames (or large tiles) are decomposed into multiple frequency subbands by the wavelet transform instead of being split into small blocks, which allows an efficient representation of both low- and high-frequency components and avoids blocking artifacts
        - prediction stage: DWT-based formats may still use both spatial and temporal prediction techniques, which allows an efficient representation of temporal redundancy and motion information across video frames
:::