# 50.012 Lecture 3: Multimedia Application Networks

## Context

Video traffic accounts for the majority of Internet bandwidth. How do we serve > 1 billion heterogeneous users (e.g. wired vs. mobile)?

## Audio Multimedia

![](https://i.imgur.com/hO73ppv.png)

* Analog audio signal sampled at a constant rate.
  * e.g. telephone: 8,000 samples/sec; CD music: 44,100 samples/sec
  * The more bits sampled per second, the more accurately the audio is captured.
* Each sample is quantized (rounded) to one of a fixed set of values (e.g. 2^8^ = 256 possible quantized values).
* Each quantized value is represented by bits (8 bits for 256 values).

## Video Multimedia

* Sequence of images displayed at a constant rate.
* Digital image: array of pixels, where each pixel is represented by bits.
* Coding exploits redundancy within and between images to decrease the number of bits used to encode an image:
  * spatial (within image): instead of sending N values of the same colour (all purple), send only 2 values: the colour value (purple) and the number of repeated values (N).
  * temporal (from one image to the next): instead of sending the complete frame at i+1, send only the differences from frame i.

## Multimedia Networking: Application Types

* Streaming stored audio, video
  * streaming: can begin playout before downloading the entire file
  * stored: server can transmit faster than the audio/video will be rendered (storing/buffering at the client).
* Conversational voice / video over IP
  * Interactive nature of human-to-human conversation limits delay tolerance.
* Streaming live audio, video

## Streaming stored video

### Concept

![](https://i.imgur.com/egZ9Wsu.png)

### Challenges

* Continuous playout constraint: once client playout begins, playback must match the original timing.
  * However, network delays are variable (jitter), so we need a client-side buffer to match playout requirements.
* Other challenges:
  * client interactivity: pause, fast-forward, rewind, jump through video.
  * lost video packets, which need to be retransmitted.

### Client-side Buffering, playout

![](https://i.imgur.com/sTj54SH.png)

* Client-side buffering and playout delay compensate for network-added delay and delay jitter.
* Frames that are yet to be played are stored in the client-side buffer.

![](https://i.imgur.com/Dts2tMR.png)

1. Initial fill of buffer until playout begins at t~p~.
2. Playout begins at t~p~.

![](https://i.imgur.com/X14qPjS.png)

3. Buffer fill level varies over time as the fill rate x(t) varies while the playout rate r is constant.

Playout buffering with average fill rate $\bar{x}$ and playout rate r:

* $\bar{x}$ < r: buffer eventually empties (causing video playout to freeze until the buffer fills again).
* $\bar{x}$ > r: buffer will not empty, provided the initial playout delay is large enough to absorb variability in x(t).
* Initial playout delay tradeoff: a longer initial delay gives a "smoother" experience, but it takes longer to fill up the buffer before playout can begin.

### Streaming Multimedia: UDP

* UDP is used because the server controls the send rate:
  * Server sends at a rate appropriate for the client.
  * Usually send rate = encoding rate = playback rate.
  * Send rate can be oblivious to congestion levels.
* Short playout delay (2-5 seconds) to remove network jitter.
* Error recovery: application-level (unlike TCP), time permitting.
* Real-time Transport Protocol (RTP) [RFC 3550]: multimedia payload types.
* UDP may not get through firewalls, which often block it.

### Streaming Multimedia: HTTP

* Multimedia file retrieved via HTTP GET.
* Send at maximum possible rate under TCP.

![](https://i.imgur.com/G7eDNEA.png)

* Fill rate fluctuates due to TCP congestion control and retransmissions (in-order delivery).
* Larger playout delay: smooths out the TCP delivery rate.
* HTTP/TCP passes more easily through firewalls.

### Streaming Multimedia: DASH

* DASH: Dynamic, Adaptive Streaming over HTTP.
* Other adaptive solutions: Apple's HTTP Live Streaming (HLS), Adobe Systems' HTTP Dynamic Streaming, Microsoft Smooth Streaming.
* Server:
  * Encodes video files into multiple versions.
  * Each version is stored and encoded at a different rate.
  * Manifest file: provides URLs for the different versions.
* Client:
  * Periodically measures server-client bandwidth.
  * Consulting the manifest, requests one chunk at a time.
  * Chooses the maximum coding rate sustainable given current bandwidth.
  * Can choose different coding rates at different points in time, depending on available bandwidth at the time.
* Benefit: "intelligence" at the client:
  * When to request a chunk, so that buffer starvation/overflow doesn't occur.
  * What encoding rate to request.
  * Where to request a chunk from (a server that is "close" to the client or has high available bandwidth).
* Can leverage the web and its existing infrastructure (proxies, caching, etc.).

## Voice-over-IP

### Challenge

VoIP end-to-end delay requirement: needed to maintain the "conversational" aspect.

* Higher delays are noticeable and can impair interactivity.
* Delay should be < 150 msec.
* Includes application-level (packetization, playout) and network delays.

### VoIP Characteristics

* Speaker's audio: alternating talk spurts and silent periods.
  * 64 kbps during talk spurt.
  * Packets generated only during talk spurts.
  * 20 msec chunks at 8 Kbytes/sec: 160 bytes of data.
* Application-layer header added to each chunk.
* Chunk + header encapsulated into a UDP or TCP segment.
* Application sends a segment into the socket every 20 msec during a talk spurt.

### VoIP: packet loss, delay

* Network loss: IP datagram lost due to network congestion (router buffer overflow).
* Delay loss: IP datagram arrives too late for playout at the receiver.
  * Delay varies due to queuing in the network and end-system (sender, receiver) delays.
  * Typical max tolerable delay: 400 ms.
* Loss tolerance: depending on the voice encoding and loss-concealment techniques, packet loss rates between 1% and 10% can be tolerated.

### VoIP: Delay Jitter

![](https://i.imgur.com/fhQQH6O.png)

End-to-end delays of two consecutive packets: the difference can be more or less than 20 msec (the transmission time difference).

### VoIP: Fixed Playout Delay

* Receiver attempts to play out each chunk exactly q msecs after the chunk was generated.

### VoIP: Adaptive Playout Delay

* Goal: low playout delay, low late-loss rate.
* Approach: adaptive playout delay adjustment:
  * Estimate network delay, adjust playout delay at the beginning of each talk spurt.
  * Silent periods are compressed and elongated.
  * Chunks are still played out every 20 msec during a talk spurt.
* Adaptively estimate packet delay with an EWMA (Exponentially Weighted Moving Average):

$$
d_{i}=(1-\alpha) d_{i-1}+\alpha\left(r_{i}-t_{i}\right)
$$

* d~i~: delay estimate after the i-th packet
* &alpha;: small constant
* r~i~: time received
* t~i~: time sent
* (r~i~-t~i~): measured delay of the i-th packet

We can also estimate the average deviation of the delay, v~i~:

$$
v_{i}=(1-\beta) v_{i-1}+\beta\left|r_{i}-t_{i}-d_{i}\right|
$$

* Estimates d~i~, v~i~ are calculated for every received packet, but used only at the start of a talk spurt.
* For the first packet in a talk spurt, the playout time is: $playoutTime_i = t_i+d_i+Kv_i$
* Remaining packets in the talk spurt are played out periodically after it.

How does the receiver determine whether a packet is in a talk spurt?

* If no loss, the receiver looks at successive timestamps:
  * difference of successive stamps > 20 msec => talk spurt begins.
* With loss possible, the receiver must look at both timestamps and sequence numbers:
  * difference of successive stamps > 20 msec and sequence numbers without gaps => talk spurt begins.

### VoIP: Recovery from Packet Loss

Challenge: recover from packet loss given the small tolerable delay between original transmission and playout.

* Each ACK/NAK takes ~ one RTT.
* Alternative: Forward Error Correction (FEC) => send enough bits to allow recovery without retransmission.
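The adaptive playout-delay estimators above can be sketched directly in code. This is a minimal sketch: the values chosen for &alpha;, &beta;, and K are illustrative assumptions (the lecture only says they are small constants / a safety factor), and the packet timings are made up.

```python
# Sketch of the adaptive playout-delay EWMA estimators described above.
# ALPHA, BETA, and K are illustrative choices, not values from the lecture.

ALPHA = 0.01  # weight on the newest delay measurement in d_i
BETA = 0.01   # weight on the newest deviation measurement in v_i
K = 4         # safety factor on the deviation term

def update_estimates(d_prev, v_prev, t_sent, r_recv):
    """Update delay estimate d_i and deviation estimate v_i for one packet."""
    measured = r_recv - t_sent                          # (r_i - t_i)
    d = (1 - ALPHA) * d_prev + ALPHA * measured         # d_i
    v = (1 - BETA) * v_prev + BETA * abs(measured - d)  # v_i
    return d, v

def playout_time(t_sent, d, v):
    """Playout time for the first packet of a talk spurt: t_i + d_i + K*v_i."""
    return t_sent + d + K * v

# Feed a stream of (send time, receive time) pairs, in msec.
d, v = 0.0, 0.0
for t, r in [(0, 100), (20, 130), (40, 115), (60, 145)]:
    d, v = update_estimates(d, v, t, r)

# At the start of the next talk spurt (first packet sent at t = 1000 msec):
print(round(playout_time(1000, d, v), 2))  # prints 1017.83
```

Note that `update_estimates` runs on every received packet, while `playout_time` is consulted only for the first packet of each talk spurt, matching the scheme above.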
### Simple FEC

* For every group of n chunks, create a redundant chunk by XOR-ing the n original chunks.
* Send n+1 chunks, increasing bandwidth by a factor of 1/n.
* Can reconstruct the original n chunks if at most one chunk out of the n+1 is lost.
* Playout delay increases with n: if we lose the first chunk, we need to wait for the entire group of n+1 chunks to arrive before the first chunk can be recovered.

### Another FEC Scheme

* "Piggyback lower quality stream":
  * Send a lower-resolution audio stream as redundant information.
  * e.g. nominal stream PCM at 64 kbps and redundant stream GSM at 13 kbps.
* Non-consecutive loss: the receiver can conceal the loss.
* Generalization: can also append the (n-1)st and (n-2)nd low-bit-rate chunks.

### Interleaving to conceal loss

* Audio chunks are divided into smaller units, e.g. four 5 msec units per 20 msec audio chunk.
* Each packet contains small units from different chunks.
* If a packet is lost, we still have most of every original chunk.
* No redundancy overhead, but increases playout delay.
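The simple XOR-based FEC scheme above can be demonstrated in a few lines. This is a minimal sketch: n = 3, the chunk size, and the chunk contents are made-up example values.

```python
# Sketch of simple XOR FEC: for every n chunks, send one extra parity chunk;
# any single lost chunk can be rebuilt by XOR-ing the remaining n chunks.

def make_parity(chunks):
    """XOR n equal-length chunks into one redundant parity chunk."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def recover(received):
    """Rebuild the single missing chunk (marked None) by XOR-ing the other
    chunks in the group -- XOR is its own inverse."""
    missing = received.index(None)
    present = [c for c in received if c is not None]
    return missing, make_parity(present)

# Sender: group of n = 3 chunks plus one parity chunk (bandwidth x 4/3).
chunks = [b"aaaa", b"bbbb", b"cccc"]
group = chunks + [make_parity(chunks)]

# Receiver: suppose chunk 1 is lost in transit.
received = [group[0], None, group[2], group[3]]
idx, rebuilt = recover(received)
print(idx, rebuilt)  # prints: 1 b'bbbb'
```

Recovery works because XOR-ing the parity chunk with the n-1 surviving data chunks cancels everything except the missing chunk; with two or more losses in a group, this scheme cannot recover.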