# NCKU CSIE Multimedia Midterm Question Bank (OLD)
###### tags: `multimedia` `note`

## Q1. What's the difference between "Hypermedia" and "Multimedia"?
#### Multimedia
+ An integration of several media (e.g., images, audio, video) that needs a device (a multimedia delivery system) to be presented.
+ Can be stored linearly or non-linearly

#### Hypermedia
+ A system that links multimedia resources together, giving each user the ability to access other resources. E.g., WWW
+ Can only be stored non-linearly

## Q2. What extra information is multimedia good at conveying?
#### What can spoken text convey that written text cannot?
+ Speaking speed
+ Tone
+ Emotion and attitude

#### What can written text convey that spoken text cannot?
+ Non-linear reading
+ Layout adjustments that aid understanding
+ Some sentences sound the same but mean different things; the difference is only recognizable in writing.
    - I said "Quickly, come here."
    - I said quickly, "come here."

## Q3. Embedded information in Speech, Image and Video.
#### Speech
+ Text
+ Speaking speed
+ Tone
+ Emotion and attitude

#### Image
+ 2D luminance
+ Color information

#### Video
+ Multiple pieces of speech and image
+ Time information

## Q4-1. What is gamma correction for the display in the CRT system?
In a CRT system, the luminance is not proportional to the input voltage: the emitted light is roughly proportional to the voltage raised to the power $\gamma$. Thus, when saving, we raise the luminance to the power $1\over\gamma$ to compensate, which is called **Gamma Correction**.

## Q4-2. If the color is out of gamut on a device, please provide a method to deal with the problem.
Choose the closest color in the gamut: draw a line from the white point to the unsupported color, then choose the color that lies both on that line and on the border of the gamut.

## Q5-1. What is "Weber's Law"?
To cause a "Just Noticeable Difference", $\frac{\Delta R}{R}=\text{constant}$, where $R$ is the reference stimulus and $\Delta R$ is the change in stimulus.

## Q5-2. What is "Nyquist Theorem"?
If a signal is band-limited with lowest frequency $f$ and highest frequency $F$, the sampling frequency
must be at least $2F$ to avoid aliasing.

:::success
**Nyquist Rate:** $2F$

**Nyquist Frequency:** $1\over 2$ of the sampling frequency
:::

## Q5-3. What is "Aliasing"?
If the Nyquist Theorem is not obeyed, i.e., the sampling frequency is lower than $2F$, the signal restored from the samples is wrong: it shows up at a false (alias) frequency.

:::info
$f_{\text{alias}}=f_{\text{sample}}-f_{\text{true}} \ \ \ \ \text{for}\ \ \ \ f_{\text{true}}<f_{\text{sample}}<2f_{\text{true}}$
:::

## Q6. Why is compression necessary for multimedia activities, especially when transmitted on the internet?
Multimedia data is usually huge, so compressing it saves storage space. On the internet, compression also shortens the transmission time.

## Q7. SQNR (Signal-to-Quantization-Noise Ratio)
#### What is it?
Quantization always introduces errors. SQNR is the indicator used to estimate the quality of the quantization, given by the formula:

$$
\text{SQNR}=20\ \log_{10}\Big( \frac {V_{\text{signal}}} {V_{\text{quantization noise}}} \Big)
$$

#### What is the best SQNR an 8-bit card can achieve?
By PSQNR, $\text{SQNR}=20\ \log_{10}\Big(\frac{2^{8-1}}{1\over 2}\Big)\approx48.16$ dB

## Q8. For a quantization accuracy of N bits per sample, what is the worst case of SQNR?
Map the max signal to $2^{N-1}-1$ and the min signal to $-2^{N-1}$.

The worst case is that **the quantization error is exactly half of an interval**, so we can rewrite the SQNR formula as:

$\text{SQNR}=20\ \log_{10}\Big( \frac {2^{N-1}} {1\over2} \Big)\approx6.02N$ dB

:::success
If the input signal is a sinusoid, the formula becomes
$$
\text{SQNR}\approx6.02N+1.76
$$
because the quantization error is then no longer a constant worst-case value.
:::

## Q9. Describe the process of digitization of an analog sound. Also define the processes.
The processes are:
1. ++**Sampling**++
    + Record the analog signal at regular, discrete moments in time; the rate at which the samples occur is the **sampling frequency**.
2. ++**Quantization**++
    + Map each sampled value to one of a finite set of amplitude values.

#### How can these factors affect the quality of sound?
If the sampling frequency does not satisfy the Nyquist Theorem, aliasing occurs. If the quantization error is too large, the quality of the restored sound also suffers.

## Q10. Dithering
#### What is the dithering algorithm?
It is the algorithm used to decide whether each pixel should be colored when transforming a greyscale image to monochrome. It relies on the fact that human eyes blend neighboring dots, which produces a greyscale-like effect.

#### With the given dithering matrix, describe the ordered dithering algorithm.
:::info
$\begin{bmatrix} 0 & 2\\ 3 & 1 \end{bmatrix}$

Assuming the greyscale image has pixel values in 0 ~ 255
:::

Each entry $d$ of the $n\times n$ matrix is scaled to the threshold $(d+1)\times\frac{256}{n^2+1}$ (here $n=2$, so the step is $51.2$). The dithering threshold map will be

$M = \begin{bmatrix} 51.2 & 153.6\\ 204.8 & 102.4 \end{bmatrix}$

For every pixel $I(x,y)$ of the original image, if $I(x,y)>D=M(x \bmod n,\ y \bmod n)$, fill the pixel, where $n$ means that the threshold matrix is $n\times n$.

## Q11. How to transform an 8-bit greyscale image into a 48-bit image?
In 48-bit RGB there are 16 bits for each color channel, and every gray shade has the same R/G/B values. To make the image look the same, we compute $\text{Original Pixel Value}\times 2^{16-8}$ and store that value in all three channels.

## Q12. Describe a way to devise a Color Lookup Table to make 8-bit color lookup out of 24-bit color.
Using the Median Cut Algorithm:
1. Over all colors existing in the image, find which of R/G/B has the biggest range.
>>> E.g., with three colors (R/G/B)
>>> **100**/**30**/**150**, **180**/**45**/**120**, **230**/**40**/**95**
>>> The one with the biggest range is R (230 - 100 = 130)
2. Make a cut at the median of that channel, which divides the space into two.
3. For each of the new spaces, repeat steps 1 and 2 until we get $2^{8}=256$ different blocks.
4. For each block, the average pixel value inside it is its new color value.
>>> Assuming 3 colors in a block:
>>> **100**/**40**/**100**, **105**/**0**/**85**, **110**/**5**/**100**, the new color for them all will be
>>> **105**/**15**/**95**

## Q14. Chromaticity coordinates calculation
+ $(X,Y,Z)$ is defined as
$$
\Big(\int E(\lambda)\bar{x}(\lambda)d\lambda, \int E(\lambda)\bar{y}(\lambda)d\lambda, \int E(\lambda)\bar{z}(\lambda)d\lambda \Big)
$$
+ With the given $E(\lambda)=1 \text{ for all }\lambda$, we can calculate the values by simply adding up the $\bar{x}(\lambda)$, $\bar{y}(\lambda)$ and $\bar{z}(\lambda)$ values on the chart. The color-matching curves are normalized to have equal area, hence $(X,Y,Z)=(1,1,1)$.
+ $(x,y,z)$ (the chromaticity coordinates) can be calculated by
$$
\Big( \frac{X}{X+Y+Z},\frac{Y}{X+Y+Z},\frac{Z}{X+Y+Z} \Big)
$$
+ Therefore $(x,y)=(\frac{1}{3}, \frac{1}{3})$
+ Also, $(x,y)$ is **defined** as $(\frac{1}{3},\frac{1}{3})$ under uniform $E(\lambda)$.

## Q15. Digital video uses chroma subsampling...
#### What is the purpose of this?
By preserving the luminance information and reducing the color information, we can save storage space or transmission time.

#### Why is it feasible?
Human vision is less sensitive to color than to luminance, so it is acceptable to discard some of the color information. E.g., in TV systems, the bandwidth for luminance information is usually wider than that for color information.

## Q16. If a set of ear protectors reduces the noise level by 30dB, how much do they reduce the intensity (power)?
Every 10 dB corresponds to a factor of 10 in power, so 30 dB corresponds to a factor of $10^{3}=1000$: the intensity is reduced to $\frac{1}{1000}$ of the original.

## Q17. If the sampling frequency is 1.5x the true frequency, what is the alias frequency?
Since:

$f_{\text{alias}}=f_{\text{sample}}-f_{\text{true}} \ \ \ \ \text{for}\ \ \ \ f_{\text{true}}<f_{\text{sample}}<2f_{\text{true}}$

We get $f_{\text{alias}} = 1.5\ f_{\text{true}}-f_{\text{true}} = 0.5\ f_{\text{true}}$

## Q18. DPCM coder
:::info
Suppose we use the predictor:
$$
\hat{f}_n=trunc\big[ \frac{1}{2}(\tilde{f}_{n-1}+\tilde{f}_{n-2})\big]
$$
$$
e_n=f_n-\hat{f}_n
$$
Also, suppose we adopt the quantizer
$$
\tilde{e}_n=Q[e_n]=16\ trunc\big[ \frac{255+e_n}{16} \big]-256+8
$$
$$
\tilde{f}_n=\hat{f}_n+\tilde{e}_n
$$
If the input signal is $\text{20 38 56 74 92 110 128}$

What is the output from a **DPCM** decoder without entropy coding?

Assuming $\tilde{e}_1=0$.
:::

+ Assuming $\tilde{f}_0=\hat{f}_1 = f_1 = 20$, we get $\tilde{f}_1=20$
+ With the following calculation, the decoder output is the $\tilde{f}$ row:

| Notation    | 20 | 38 | 56 | 74 | 92 | 110 | 128 |
| :-:         | :-:| :-:| :-:| :-:| :-:| :-: | :-: |
| $\hat{f}$   | 20 | 20 | 32 | 50 | 65 | 81  | 97  |
| $e$         | 0  | 18 | 24 | 24 | 27 | 29  | 31  |
| $\tilde{e}$ | 0  | 24 | 24 | 24 | 24 | 24  | 24  |
| $\tilde{f}$ | 20 | 44 | 56 | 74 | 89 | 105 | 121 |

## Q19. DM decoder
:::info
Suppose we use the predictor and the uniform delta modulation (**DM**):
$$
\hat{f}_n=\tilde{f}_{n-1}
$$
$$
e_n=f_n-\hat{f}_n
$$
Also, suppose we adopt the quantizer
$$
\tilde{e}_n=\begin{cases}
10\ \ \ ,& \text{if }e_n>0\\
-10,& \text{otherwise}
\end{cases}
$$
$$
\tilde{f}_n=\hat{f}_n+\tilde{e}_n
$$
If the input signal is $\text{25 34 39 54 62 70 88}$

What is the output from a **DM** coder?

Assuming $\tilde{e}_1=0$.
:::

+ Assuming $\tilde{f}_0=\hat{f}_1 = f_1 = 25$, we get $\tilde{f}_1=25$

| Notation    | 25 | 34 | 39 | 54 | 62 | 70 | 88 |
| :-:         | :-:| :-:| :-:| :-:| :-:| :-:| :-:|
| $\hat{f}$   | 25 | 25 | 35 | 45 | 55 | 65 | 75 |
| $e$         | 0  | 9  | 4  | 9  | 7  | 5  | 13 |
| $\tilde{e}$ | 0  | 10 | 10 | 10 | 10 | 10 | 10 |
| $\tilde{f}$ | 25 | 35 | 45 | 55 | 65 | 75 | 85 |

## Q20. Scale down accuracy from 8 bit to 2 bit (Greyscale)
#### What is the simplest way to do it?
Use only the 2 MSBs.

#### What ranges of byte values in the original image are mapped to what quantized values?
+ **00'b**: 0~63
+ **01'b**: 64~127
+ **10'b**: 128~191
+ **11'b**: 192~255

## Q21. End-point detection algorithm
1. Find the upper/lower energy thresholds ITU/ITL, and the IZCT (zero-crossing rate threshold).
2. Search from the beginning until the energy is larger than ITU.
3. Back off until the energy is less than ITL, and set that point as N1.
4. Repeat steps 2~3, starting from the end of the speech, and set that point as N2.
5. Examine the 250ms preceding N1. If IZCT is exceeded there, move N1 to the first point where IZCT is exceeded.
6. Repeat step 5, starting from N2 and checking the following 250ms.
7. Now N1 is the beginning point and N2 is the endpoint.
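
The steps above can be sketched in Python as follows. This is only a minimal illustration, not a reference implementation: the function name `find_endpoints` is hypothetical, the per-frame energy and zero-crossing rate are assumed to be precomputed, and the thresholds are passed in rather than estimated (step 1). It also assumes 10 ms frames, so that 250 ms corresponds to `lookback=25` frames, and that step 6 mirrors step 5 by moving N2 to the last frame that exceeds IZCT.

```python
def find_endpoints(energy, zcr, itu, itl, izct, lookback=25):
    """Return (n1, n2) as frame indices, or None if no frame exceeds ITU.

    energy, zcr: per-frame energy and zero-crossing rate (same length).
    lookback: frames in 250 ms (25 assumes 10 ms frames, an assumption).
    """
    n = len(energy)
    loud = [i for i in range(n) if energy[i] > itu]
    if not loud:
        return None

    # Steps 2-3: first frame above ITU, then back off until energy < ITL.
    n1 = loud[0]
    while n1 > 0 and energy[n1] >= itl:
        n1 -= 1

    # Step 4: same search, starting from the end of the signal.
    n2 = loud[-1]
    while n2 < n - 1 and energy[n2] >= itl:
        n2 += 1

    # Step 5: look at the frames just before N1 for a high zero-crossing rate.
    before = [i for i in range(max(0, n1 - lookback), n1) if zcr[i] > izct]
    if before:
        n1 = before[0]

    # Step 6: mirrored check on the frames just after N2 (assumed symmetric).
    after = [i for i in range(n2 + 1, min(n, n2 + 1 + lookback)) if zcr[i] > izct]
    if after:
        n2 = after[-1]

    return n1, n2


if __name__ == "__main__":
    energy = [0, 0, 1, 3, 6, 9, 9, 6, 3, 1, 0, 0]
    zcr = [0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0]
    print(find_endpoints(energy, zcr, itu=5, itl=2, izct=4))
```

In this toy input the energy thresholds alone give frames 2 and 9; the zero-crossing check then widens the segment by one frame on each side, which is the purpose of steps 5 and 6 (catching weak, noisy consonants at the edges).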