# Beauty and the Burst Reviews
##### What problem is this paper solving?
###### Arpit Gupta
This paper demonstrates a system that can accurately identify the content of video streams, which are generally encrypted, both with and without local access.
###### Fahed Abudayyeh
This paper demonstrates the insecurity of the modern video streaming standard DASH by exploiting a side-channel information leak to accurately predict the content being streamed. Deep learning is applied to very coarse-grained data (bursty packet transmission patterns) that uniquely identifies streamed videos.
###### Jaber Daneshamooz
Actually, the paper is not solving THE problem; it rather highlights an existing one. The paper shows that metadata and the pattern in the video's data rate can reveal the content, even if the data is encrypted. It uses ML to prove this statement.
###### Alan Roddick
The problem the paper is trying to solve is to implement a classifier to predict the movie/video title given only metadata.
###### Seif Ibrahim
This paper solves the problem of identifying certain characteristics about encrypted video streams based only on the burst pattern of the video, namely it shows that this information can be leaked using CNNs.
###### Satyam Awasthi
This paper shows that many encrypted video streams are uniquely characterized by burst patterns, and given coarse network measurements they can be identified accurately with either direct or remote access.
###### Aaron Jimenez
This paper is trying to devise a deep ML solution for classifying video streaming service traffic based on the burst pattern of the video chunk requests.
###### Samridhi Maheshwari
The paper aims to do four things:
1. They explain the root cause of packet bursts in video streaming applications
2. They show how the burst pattern is a way of identifying which video is being streamed in an open world setting
3. They develop a CNN based approach to identify videos given some burst patterns
4. They demonstrate how an attacker need not have direct observations to attack and find out which video is being streamed
###### Shereen Elsayed
The authors show that video streams are characterized by their burst patterns and demonstrate that CNN classifiers can identify these patterns.
###### Ajit Jadhav
This paper presents a Neural Networks based approach to accurately identify the user video based on packet burst patterns even for encrypted video streams, irrespective of the attacker having local access. This method can also be scaled to an open-world setting. This effectively exposes an information leak in the MPEG-DASH streaming video standard.
###### Nikunj Baid
This paper emphasizes the fact that videos streamed over a network from a given server can leak information about the video, even though the content is encrypted. The burst patterns of the video are often unique enough to identify it accurately, and such an analysis of these patterns can be done with or without direct access to the client's machine.
###### Shubham Talbar
The paper demonstrates a methodology for how an attacker can deduce what videos are watched by a targeted user on streaming services such as Netflix and Youtube, from direct and indirect measurement of encrypted network traffic. In general, any streaming service using the popular MPEG-DASH standard for streaming over HTTP(S), and specifically the MPEG-DASH content segmentation, is likely to cause an exploitable information leak.
###### Arjun Prakash
This paper presents a CNN model that uses the burst patterns in video streams to identify a streaming video.
###### Liu Kurafeeva
This paper exposes a problem with current traffic encryption by training a model that identifies a video from its packet bursts.
###### Deept Mahendiratta
It shows an interesting CNN-based approach to identifying videos using their burst patterns. It explores how an attacker can identify a video being streamed without having direct observations.
###### Rhys Tracy
This paper has 4 main purposes, all related to how the encryption of video streams can easily be defeated. First, the paper explains how the burst patterns of encrypted packets when streaming a video are highly correlated with the video being streamed. Second, the paper demonstrates that these burst patterns are unique for many YouTube videos. Third, the paper proposes and analyzes a CNN model with very high accuracy for predicting streaming content from packet bursts (on both YouTube and Netflix). And fourth, the paper shows how these burst patterns can be easily captured by a remote attacker.
###### Roman Beltiukov
This paper looks at the video re-identification process and provides a method to reliably re-identify videos in users' video streams in an 'open-world' scenario, effectively allowing an observer to know what users watch.
###### Navya Battula
This paper explains how, despite encryption, there is a certain information leak, and how we can infer the streamed content from the burst patterns in video streams using Convolutional Neural Networks.
###### Nawel Alioua
This paper uses an information leak in MPEG-DASH to characterize video streams based on their unique burst patterns.
###### Nagarjun Avaraddy
The paper highlights how encrypted video streams leak information in the form of the burst patterns they produce in transit, and how attackers can exploit that leak using a CNN-based approach.
###### Pranjali Jain
The MPEG-DASH streaming video standard contains an information leak. The paper implements a video identification method based on deep learning by leveraging this information leak. Video identification can be easily done for streaming videos by both direct and remote adversaries.
###### Achintya Desai
Because of TLS encryption, it is harder to perform traffic identification over an encrypted stream than in an unencrypted stream where data is visible to analyze for the entire network by packet capture. This paper provides a systematic way to perform encrypted video traffic identification remotely by capitalizing on the burst pattern leakage of the MPEG-DASH streaming video standard.
###### Brian Chen
This paper seeks to demonstrate that the current burst heavy network traffic patterns of many streaming services are unsafe in regards to privacy. It seeks to explain root causes of burst patterns, develop a machine learning exploit that is resilient to noise, and expand the attack to remote adversaries without direct network access.
###### Vinothini Gunasekaran
The widely used video streaming standard (MPEG-DASH) has an information leak. An attacker can identify the user's streaming video content by analyzing the video's characteristics on the network, even though the packets are encrypted.
###### Apoorva Jakalannanava
The paper highlights that the burst patterns even in encrypted video streams can be exploited to identify the videos watched by the client. These attacks can be performed in on-path, cross-site and even cross-device settings. The paper develops a robust burst pattern identification methodology based on CNNs for the same.
###### Punnal Ismail Khan
##### Why is that problem important?
###### Fahed Abudayyeh
This problem is important because it shows that content encryption isn't enough to make security guarantees for a streaming service. The paper shows why it is paramount to account for side channel information leakage.
###### Jaber Daneshamooz
Because privacy matters: mapping video content to individuals can reveal a lot about people, such as political views or sexual orientation. We bother to encrypt data for privacy reasons, but the paper shows that this alone doesn't work.
###### Alan Roddick
This problem is important because nowadays the content of packets is encrypted, but a lot of information still gets leaked, such as the "burst" download of video segments used to fill the client's buffer. Knowing the burst size gives information about the video segment size and also the bitrate pattern.
###### Seif Ibrahim
We can see that encryption alone is insufficient to hide all information about the data in a video stream. This calls for stronger security measures in cases where data privacy is crucial.
###### Satyam Awasthi
It demonstrates that encryption is not enough to safeguard user privacy in video streaming. An attacker can observe the victim's video fingerprints either directly or remotely to identify streamed videos.
###### Aaron Jimenez
This problem is important because it shows that even though network traffic is encrypted, there is still information leakage through things such as network burst patterns.
###### Samridhi Maheshwari
Video stream segmentation is usually done at the application layer; hence, even if the content is encrypted at the network layer, video content can be identified from the burst sizes and the delay between bursts. By showing that tools can be built to identify videos (which would lead to privacy breaches), the paper shows that we also need better encryption and packet-sending mechanisms at the application layer to prevent attackers from identifying the videos users watch.
###### Shereen Elsayed
There might be a security issue in video streams: if their traffic patterns are correlated with the content, an attacker may be able to identify the video being streamed.
###### Ajit Jadhav
The presented information leak in the MPEG-DASH leads to possible privacy compromise for the user. Thus, this problem is important to preserve user privacy.
###### Nikunj Baid
This paper highlights that encrypting the payload doesn't guarantee that the user's activity is private. Most of the popular video streaming services use MPEG-DASH as the standard, and the burst pattern it generates makes it relatively easy for an attacker to track down the video being streamed. This is a major security threat and needs to be addressed.
###### Shubham Talbar
Which videos we stream from the Internet holds important information about our choices, personality, socioeconomic status, and mood. A large number of parties seek to exploit this information, whether for monetization by advertisers and insurers or simply to monitor access to undesired information. Hence, the problem of this information leak is important.
###### Arjun Prakash
It's important because just by using the burst patterns an adversary can identify the video being streamed even if it is encrypted.
###### Liu Kurafeeva
The demonstrated leak is a privacy issue. A user's list of viewed videos is private information that can be misused.
###### Deept Mahendiratta
The issue highlighted is important because it brings out into the open that current video streaming encryption techniques are insufficient for maintaining user privacy, and that better security techniques need to be implemented.
###### Roman Beltiukov
This result is important because the demonstrated leak provides a stable method of video identification from encrypted traffic and therefore violates users' privacy.
###### Rhys Tracy
This paper shows that encryption of network packets is not enough to ensure that your traffic is protected and your privacy is preserved when using DASH.
###### Navya Battula
I think this is an important problem, not only from the perspective of protecting users' privacy, which of course is an area that needs improvement, but also because of the interesting intuitions we can form from these patterns: the current Internet is migrating rapidly to encryption, and a significant portion of traffic out there is encrypted. In the future, our reliance on these patterns will be critical if we are developing machine learning models that work protocol-agnostically.
###### Nawel Alioua
It exposes sensitive information about video streams and uses it to infer otherwise undisclosed information (e.g., the popularity histogram of Netflix videos), which is a security problem.
###### Nagarjun Avaraddy
The problem is important because it shows that encryption is not enough: burst patterns are still sufficient to predict the video being transported over the network, which calls attention to the need for better measures for secure transfer.
###### Pranjali Jain
The paper brings to the forefront the information leakage issue of the standard streaming video protocols even when the packets are encrypted. This leads to major security and privacy issues for the user as well as the streaming services.
###### Achintya Desai
A recent estimate by Google suggests that about 95% of the traffic over the internet is encrypted [1]. This makes a solid case for finding a solution to perform traffic identification over encrypted traffic without invading the user's privacy. Although it is presented as an attack, this paper can be seen as a step toward traffic identification over encrypted streams. It also shows a side-channel exploit on a widely used video streaming standard that leaks user information even in the presence of encryption.
Citation: [1]https://threatpost.com/decryption-improve-security/176613/#:~:text=Google%20estimates%20that%2095%20percent,data%20integrity%20and%20consumer%20privacy.
###### Brian Chen
This problem is critical to understanding the current state of streaming and network security. The paper proposes concrete attacks against burst-heavy networks and even extends them to remote adversaries. Ultimately, by demonstrating that such a thing is possible, the paper draws attention to the current privacy vulnerabilities arising from network patterns. Only by exposing the issue can it be resolved, and this paper does just that.
###### Vinothini Gunasekaran
Attackers can observe users’ video-watching patterns and can sell them for commercial purposes like targeted advertising. This information leak leads to users’ privacy invasion.
###### Punnal Ismail Khan
##### How does DASH standardize a leak?
###### Fahed Abudayyeh
DASH standardizes an information leak through its use of variable-bitrate encoding (VBR). In order to stream a video, DASH divides the content into segments of varying sizes, dependent on VBR. Since VBR dynamically adjusts the bitrate of a video based on how many pixels are changing position, we end up with differently sized segments for the same duration of video. From these differently sized segments, we can encode a unique fingerprint for each video based on the bitrate over time.
###### Jaber Daneshamooz
In DASH, clients request segments of video. Each segment contains about 5 seconds of video (in terms of duration, the segments are equal). But due to encoding and compression methods, the sizes of these segments in bytes differ (VBR). For example, when there is a lot of motion in the video, we need to send more bits for 5 seconds of video compared to the case when the picture does not change much in those 5 seconds. The pattern in the sizes of these chunks leaks information about the video.
###### Alan Roddick
The leak is standardized because DASH creates 5-second segments of video to be requested by the client. Due to variable bitrate, these 5-second segments vary in size depending on the content of the video. This affects the size/amount of packets that a client will receive for each 5 second interval of video.
###### Seif Ibrahim
Since videos are compressed using algorithms that take into account the content of the unencrypted video in order to encode more or less information for a given chunk of time, different portions of the video that have the same length in seconds may have different segment sizes in bytes. This means that the client will download varying amounts of data based on which video it is playing and which part of the video it is playing (e.g., if the video is changing from scene to scene quickly, the client may request a large burst of data, versus a stationary scene). This variation in data rate over time can be used as a fingerprint to identify a video.
###### Satyam Awasthi
DASH divides the video into segments. During a streaming session, a client sends the server requests for individual segments when its buffer is just below a threshold. <br/>Therefore, in steady state, on/off stream burst sizes correlate with the on-disk segment sizes. The on-disk sizes leak information about the encoded content due to variable-rate encoding, since DASH segments of the same duration can have very different sizes.
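The burst-to-segment correspondence described above is easy to make concrete. Below is a minimal Python sketch (not from the paper) that groups captured packets into bursts by idle gap, yielding the burst-size series that serves as the fingerprint; the 0.5 s gap threshold and the `(timestamp, size)` input format are assumptions.

```python
# Sketch: aggregate a packet trace into bursts separated by idle gaps.
# Assumptions: packets are (timestamp_sec, size_bytes) tuples sorted by
# time, and 0.5 s of silence is treated as a burst boundary.

def packets_to_bursts(packets, gap=0.5):
    bursts = []                          # list of (burst_start_sec, burst_bytes)
    cur_start, cur_size, last_ts = None, 0, None
    for ts, size in packets:
        if last_ts is None or ts - last_ts > gap:
            if cur_start is not None:    # close the previous burst
                bursts.append((cur_start, cur_size))
            cur_start, cur_size = ts, 0  # open a new burst
        cur_size += size
        last_ts = ts
    if cur_start is not None:
        bursts.append((cur_start, cur_size))
    return bursts

# Two closely spaced packets form one burst; the third starts a new one.
print(packets_to_bursts([(0.0, 1500), (0.1, 1500), (5.2, 1500)]))
# -> [(0.0, 3000), (5.2, 1500)]
```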
###### Aaron Jimenez
DASH helps standardize the leak because it standardizes chunks by content duration. This means that the only thing that varies between chunks is their size in bytes, which allows estimation of what kind of content may be playing.
###### Samridhi Maheshwari
Because of the way DASH divides the video into segments and stores them on the server, the way content is requested corresponds to the segment sizes. Content is usually requested in the form of bursts, so each burst more or less corresponds to the size of the segment being requested. Each segment uses variable-bitrate encoding, where scenes with more visual change are encoded at a higher bitrate. Hence, even for segments of the same duration, the bitrate can vary depending on the content. The paper's authors show that bursts are proportional to content by making a video with alternating action and resting scenes, and they observe that the burst size alternates between high and low every 45 seconds.
###### Ajit Jadhav
DASH video is streamed in segment sized chunks. Since video streaming services use content-based variable-bitrate (VBR) encodings, DASH segment sizes vary a lot based on the content. These segments are downloaded by the clients before the current segments are over which leads to packet bursts in the network corresponding to the on-disk size of the segments. And based on these burst sizes, the content can be identified leading to the leak. Since DASH standardized the bursty behavior of video streams, it effectively standardized the above mentioned leak.
###### Shereen Elsayed
The videos are divided into chunks of roughly the same duration, so with VBR encoding the chunk sizes vary with the content.
###### Nikunj Baid
In order to improve QoE, most major streaming services use variable-bitrate encoding (VBR), which leads to different sizes for DASH segments of similar duration. This generates a pattern, thereby making the stream prone to leaking information about the exact video being played, as different videos have different patterns.
###### Shubham Talbar
The root cause of information leaks in MPEG-DASH streaming is that the amount of information needed to represent a video segment, at a given perceived quality, depends on the content of the segment. For example, a still scene, where most of the picture is static, can be compressed to a much smaller size than a fast-paced action scene. Streaming services use variable-bitrate (VBR) compression schemes that take advantage of this to reduce the amount of transmitted data. This burst series can then be mapped to a video title using a deep neural network.
###### Arjun Prakash
DASH videos are streamed in segment-sized chunks and each of these segments are encoded using VBR. And since VBR encoded segments have different sizes for the same duration of video content, it can leak information about what video is being streamed.
###### Liu Kurafeeva
The leak is standardized by the segmented division of video, which does not change from user to user (it differs only across quality levels, which form a constant set), leading to similar patterns for each video.
###### Deept Mahendiratta
DASH works by segmenting a video into chunks, and since content is requested in the form of bursts, each packet burst more or less corresponds to the size of the segment being requested. Hence this behaviour creates a pattern that a model can be trained on.
###### Roman Beltiukov
Due to advanced video compression techniques, chunks covering the same amount of time in different videos differ in size (depending on the amount of pixels changing in the video). As DASH provides predictable timeslots for chunk requests, we can easily separate chunks and calculate each chunk's total size.
###### Rhys Tracy
DASH made it a standard to split video data into segments of roughly the same amount of video time. In each of these segments, there will be different amounts of visual information (so different sizes of sent packets). This causes easily identifiable patterns to show up in the network traffic as a standard.
###### Nawel Alioua
Packet bursts in encrypted streams correspond to segment requests from the client, and burst sizes are highly correlated with the sizes of the underlying segments.
###### Navya Battula
When we look at different frames in a video, each frame might depict a different scene, and the amount of variation between frames depends on the video. Video streaming services therefore use a variable bitrate, so DASH segments of the same duration often end up having different sizes. The paper explains that burst sizes are correlated with the on-disk segment sizes, which tend to leak information due to variable-rate encoding.
###### Nagarjun Avaraddy
DASH standardizes the leak because it divides videos into segments, and since the bitrate varies with the type of content, the packet bursts are correlated with the sizes of the segments.
###### Pranjali Jain
DASH divides video content into segments based on in-video presentation seconds, and streaming services use variable bitrate (VBR) for video encoding. Thus DASH segments of the same duration can have different sizes in bytes, and burst sizes can be correlated with on-disk segment sizes, which can in turn leak information about the video being streamed.
###### Achintya Desai
The DASH video standard produces segments with different byte sizes but the same video duration. This is due to the way VBR encoding works: it encodes scenes based on the amount of perceptually meaningful information, to be efficient. When a client downloads a segment, it is done in a burst whose size correlates with the on-disk segment size. This leaks information about the content, since VBR makes the segment size depend on the content.
###### Brian Chen
DASH fundamentally standardizes the leak in the sense of making it commonplace. The way that DASH works is by separating video into chunks and compressing these chunks where possible. This leads to variable bit rate as the data streamed across a certain amount of time will usually differ. Furthermore, buffers are filled when the existing content in the buffer is consumed to just below a threshold. This is exactly the burst like behavior that this paper targets, and hence DASH standardizes the leak.
###### Vinothini Gunasekaran
In DASH, the video streaming happens in segments which use variable bitrate encoding (VBR). The client requests the new segment based on its buffer size. Since the burst sizes are correlated with the segment sizes, one can know more about the content based on the variable bitrate.
###### Punnal Ismail Khan
###### <Your Name>
<Your response>
##### Explain Figure 2.2. and 2.3. Use **WALTER** technique to describe the figures. Here, W=Why?, A=Axes, L=lines, T=trend, R=recap/takeaway
###### Fahed Abudayyeh
Figure 2.2 provides a demonstration of how VBR can result in the ability to create a unique identifier for a video based on the bitrate over time of the streamed content. The video in this figure shows spikes in bitrate for high action sequences, which will increase the size of the streamed data segments for that period of time.
Figure 2.3 shows how the packet burst sizes while streaming form an identifier for a video because of the use of VBR. The alternating bitrate affects the size of the segments which affects the packet burst sizes.
###### Jaber Daneshamooz
For Figure 2.2: Why? It shows how VBR in DASH standardizes the leak. Axes: the x-axis is the time within the video and the y-axis is the video bitrate. Lines: they show the video bitrate at different times in the video. Trend: the lines show that when there is a lot of motion in the video (chase scene), the bitrate increases, and when there is not (the iguana is still), the bitrate drops. Takeaway: there is a noticeable correlation between the video content and the bitrate (it has a pattern).
For Figure 2.3: it is much like the previous figure. When the bigger segments (high-motion scenes) were fetched, the burst size increased, and vice versa.
###### Alan Roddick
Figure 2.2: Bitrate varies over time depending on the scene in the video. The figure plots bitrate vs. time for the "Iguana vs. Snakes" video. When there is more rapid movement in the video (rapid movement, chase scene), the bitrate is higher than when there is less movement (iguana is still, iguana is resting). This figure shows that there is a correlation between what is occurring in the video and what the bitrate is.\
Figure 2.3: Because of how media streaming services store their videos, the segment sizes, and therefore the burst sizes, are related to the bitrate of the video. The figure plots burst size in bytes vs. time. From this data one can determine when the client's buffer was initially filled and when certain periods of the video had a higher bitrate, and therefore more motion. This figure illustrates that burst sizes can be distinguished depending on the bitrate.
###### Satyam Awasthi
VBR (variable bitrate) encoding is used by the majority of video streaming services. This video compression approach utilizes the fact that different scenes contain different amounts of perceptual data. Since the on/off stream for buffering the video correlates with segment sizes, VBR causes a leak of information. <br/>
Fig 2.2 shows the bitrate for a video in time, Fig. 2.3 shows the buffer and burst sizes of the “on” periods in steady state.<br/>
During the intense scenes with more movements, we see higher bitrates thus, a larger “on” period or burst size. Similarly for resting sequences, the bitrate decreases and so does the burst size. <br/>
This gives us a pattern that can be uniquely attributed to a particular video. In a video stream with different contents, the pattern would be different.
###### Aaron Jimenez
Figure 2.2: This figure is meant to show the variability of bitrates within a single video as different things occur within it over time. The y-axis measures the bitrate in bits per second, while the x-axis measures time in seconds. As more action occurs in the video (i.e. the iguana is moving) the bitrate of the video starts to increase and as less action occurs (i.e. the iguana is standing still) the bitrate decreases. In the end, as more action occurs within a video, the bitrate will temporarily increase, before falling again when less action occurs. This cycle repeats until the video ends.<br/>
Figure 2.3: The figure is meant to show the changes in the size of individual bursts of a video over time. The y-axis represents the burst size in bytes, while the x-axis represents time measured in seconds. Bursts occur every five seconds. The graph shows an initial large burst size equal to the client's buffer size before eventually falling and stabilizing at a near constant rate, with small increases and decreases in burst size corresponding to the actions of the iguana in the video. In the end, this graph showcases the increases and decreases of burst sizes over time as the video plays. This in many ways ties back to the changes in video bitrate of a stream. As the bitrate increases, the burst size must increase to compensate for the greater data demanded.
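The shape of Figure 2.3 can be reproduced with a toy model of the client buffer: one large start-up burst, then one segment fetch per 5-second slot whose size tracks the VBR-encoded scene. This sketch is not from the paper; all sizes and counts are illustrative assumptions.

```python
# Toy buffer model: start-up fill, then steady-state fetches every 5 s.
SEG_SECONDS = 5
INITIAL_SEGMENTS = 10                  # ~50 s buffered up front (assumption)
# Alternating 45 s resting/action scenes: 9 small then 9 large segments.
segment_bytes = ([400_000] * 9 + [1_200_000] * 9) * 3

bursts = [(0, sum(segment_bytes[:INITIAL_SEGMENTS]))]   # big initial burst
for i, size in enumerate(segment_bytes[INITIAL_SEGMENTS:]):
    bursts.append(((i + 1) * SEG_SECONDS, size))        # one segment per slot

for t, b in bursts[:4]:
    print(f"t={t:3d}s  burst={b:>9,} bytes")
```

The printed series shows exactly the pattern the figure describes: a single large burst, then per-segment bursts that alternate between small and large every 45 seconds.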
###### Samridhi Maheshwari
Figure 2.2 - This figure shows bitrate over time for the Iguana v Snakes video. The idea of this graph is to showcase the proportionality between bitrate and content: action scenes carry a lot more visual information than resting scenes, so the bitrate is alternately higher during the chase scenes.
Figure 2.3 - This figure shows the burst sizes (during the on periods of steady state) vs. time. During steady state, when content is fetched in segments, burst sizes correspond to the segment sizes. For segments where the iguana is escaping, the burst size increases, and when the iguana is resting, it decreases. This is because the bitrate varies with the content, and hence the segment sizes differ. The first burst is the largest, and its size is the client's buffer size.
###### Shereen
Fig 2.2 illustrates the difference in bitrates according to the scene: if the scene is an intense/action one, the bitrate increases. The axes are bitrate and time. The video was divided into 45-second chunks of alternating high- and low-action scenes.
Fig 2.3 shows the buffer and burst sizes. At the beginning of the figure, the burst size is very large, as the first ~50 seconds of segment files are fetched before reaching the steady state of fetching every 5 seconds. Then, while scenes with high action were being fetched, the burst size increased, and the opposite for low action; the bursts alternated every 45 seconds because the high-action and low-action chunks each lasted 45 seconds.
###### Ajit Jadhav
Figure 2.2: For a sample video “Iguana vs. Snakes”, the graph shows a plot of bitrate over time (bitrate on y axis and time on x axis). The plot indicates that not all scenes require the same bit rate with scenes containing more variation requiring higher bit rate compared to scenes with little variation over time.
Figure 2.3: This figure shows the plot of burst size over time (burst size on y axis and time on x axis). It indicates the variation in burst sizes due to the difference in bit rate throughout the video thus showing us that the burst size data of a video can have a unique characteristic depending on the video.
###### Nikunj Baid
Fig 2.2 :
This is to show that different scenes, irrespective of their duration, are encoded with varying bitrates. The x-axis represents the time in seconds into the video being streamed. The y-axis represents the bitrate used to encode that particular part of the video. We notice that for the given video, the action scene, i.e., the snake chasing the iguana, is encoded with a higher bitrate compared to the scenes where the iguana is chilling. We see this trend for the entire duration of the video.
Fig 2.3:
Here we show the characteristics of the traffic generated while fetching the above video on the client side, using Wireshark. The x-axis represents the time in seconds into the video being streamed. The y-axis represents the burst sizes of the segment files being fetched, at 5-second intervals. We observe a pattern similar to Fig 2.2, where the burst size for the segments representing the action scene is higher than that of the resting scenes. This pattern of varying burst sizes can now be used by an attacker to identify the video being streamed.
###### Shubham Talbar
Figure 2.2:
This figure demonstrates the variability of bitrates used to encode different scenes of a video. The y-axis measures the bitrate in bits per second, while the x-axis measures time in seconds. Compression using variable-bitrate (VBR) encoding is designed to use the minimum amount of data needed to represent a scene at a given perceptual quality. This is highly dependent on the video content. For example, highly eventful action-scenes, such as the one in the video Iguana vs. Snakes, require a high bitrate to represent. This is easily seen in the figure, depicting the bit rate fluctuations as the video progresses through different scenes.<br/>
Figure 2.3: The figure quantifies the changes in the size of individual bursts of a video over time. The y-axis represents the burst size in bytes, while the x-axis represents time measured in seconds. During the steady state, when segments are fetched every 5 seconds, burst sizes correspond to the sizes of segment files. When the segments with an escaping iguana are being fetched, burst size increases. When the segments with a resting iguana are being fetched, it decreases.
###### Arjun Prakash
Fig 2.2 shows the variation in the bitrate as the video is being streamed. The bitrate is high when there is high action (chase scene) and low during low-action scenes.
Fig 2.3 shows burst size vs. time in the custom-built video. It shows a large initial burst, where segment files are fetched at a rate higher than the presentation rate. Once a steady state is reached, a fetch happens every 5 seconds. The burst size is high when the video is streamed at a high bitrate and low during the low-bitrate periods, and this alternates every 45 seconds.
###### Liu Kurafeeva
Fig 2.2 shows the bitrate changes (y-axis) at specific times in the video (x-axis) and explains the major bitrate changes via video events. It is clearly seen that little scene change leads to a low bitrate.
Fig 2.3 shows that the burst size (y-axis) changes at specific moments in the video (x-axis) and relates these changes to scene changes and events in the video.
###### Deept Mahendiratta
Fig 2.2 shows how the bitrate changes for different scenes in a video even though the time duration is the same. It is higher for high-action scenes and lower for resting scenes after a steady state is reached.
Fig 2.3 shows that the burst size changes over time during a video depending on the content being requested. It is high for high-action segments and low for resting segments.
###### Rhys Tracy
Fig 2.2:
This graph shows the bitrate of information streamed from a “Iguana vs. Snakes” YouTube video over time as the video is streamed. This graph is interesting because it shows the bursty nature of video streaming and how more action in a video means a higher bitrate (more visual information being sent per period of time). This is important for understanding why burst sizes vary in DASH.
Fig 2.3:
This graph shows burst sizes when streaming a video with variable bit rates over time as the video is streamed. Primarily, this demonstrates the correlation between bitrate and burst sizes (ie higher bitrate means higher burst sizes) and how the burst sizes will vary over time in a video leaving patterns. This is a primary piece behind understanding DASH's vulnerability.
###### Roman Beltiukov
Figure 2.2: This figure shows the dependency between the amount of action in a scene and the bitrate. The authors show that, due to variable-bitrate encoding, the actual bitrate is higher for scenes with more action.
Figure 2.3: This figure shows the burst-size graph. The authors show that bursts are very predictable in time and are connected to whether the scene is an action scene or not.
###### Nawel Alioua
Figure 2.2 shows the fluctuation of the bitrate throughout the duration of the “Iguana vs. Snakes” video. The y-axis represents the bitrate in bits per second, and the x-axis represents the time in seconds. A noticeable trend is the increase of bitrate when there is rapid movement in the video, and a decrease when the video is predominantly still.
Figure 2.3 shows the buffer and burst sizes of the "on" periods in the steady state throughout the length of an artificially crafted video based on scenes from the previous video. The graph shows that when segments are fetched every 5 seconds, burst sizes correspond to the sizes of segment files. The variation correlates with how the video was crafted, the burst sizes alternating from high to low and vice versa every 45 seconds.
###### Seif Ibrahim
Figure 2.2 demonstrates how bitrate changes over time while a video is playing. It puts bitrate (bits/s) on the y-axis and time in seconds on the x-axis. We see that during scenes where there is a lot of change between frames the bitrate goes up and decreases when the video is stationary.
Figure 2.3 shows the burst size of network data used to fill the buffer over time. The y-axis shows the burst size in bytes and the x-axis shows time in seconds. We can see a big burst at the start to fill up the initially empty buffer, then in situations where the buffer is emptying quickly due to a higher bitrate (we can compare with figure 2.2) another burst happens to fill up the emptying buffer.
###### Nagarjun Avaraddy
Figure 2.2 -
To show that bitrate is correlated to the kind of content playing in the video i.e. action is high bitrate & non-action is low bitrate. Bitrate(Y axis) vs Time (X axis) shows the changes in bitrate over the course of video. We can see that the bitrate is lower when the video content is non-action, and higher when there is a lot of action content.
Figure 2.3 -
To show the burst size of content during the course of time in video streaming.
Burst size (Y axis), time(X axis). We can see that the initial burst size is highest, and the burst size is higher for higher bitrate (action) content and lower for resting content.
###### Pranjali Jain
Figure 2.2 shows how the bitrate of a part of the “Iguana vs. Snakes” video changes over time. The x-axis vs y-axis is time(seconds) and bitrate(bits per second) respectively. Action scenes where frames change very fast are encoded with higher bitrate compared to slower scenes where not much happens.
Figure 2.3 shows the buffer and burst sizes of the network. The x-axis vs y-axis is time(seconds) vs burst size(bytes). The burst size increases when the action scene is being fetched and decreases when the slower scene is being fetched.
###### Achintya Desai
**Figure 2.2**
Why: The figure shows video bitrate fluctuations depending on the content of the video
Axes: The Y-axis shows bitrate per second. The X-axis shows the time elapsed in the video in seconds
Lines: The lines showcase the bitrate change over the time period of the video and how it changes according to the scene.
Trend: The trend in the figure is that whenever a high-movement/action scene is happening in the video, the bitrate goes high. On the other hand, whenever there is not much movement or action in the scene, the bitrate goes low.
Takeaway: This clearly shows that the bitrate leaks the information about the type of content. Such a graph/figure can be used to identify the content. For example, if the video is CCTV footage of a home backyard where almost everything is still, the bitrate is expected to have very low fluctuations.
**Figure 2.3**
Why: The figure shows burst size changes in bytes over the video time
Axes: The Y-axis shows burst size in bytes. The X-axis shows the time elapsed in the video in seconds.
Lines: The dots showcase the burst size change over the time period of the video and how it changes according to the scene.
Trend: The first dot indicates the client buffer size, which is the largest burst, marking the beginning of the video. Afterwards, the trend in the figure is that whenever a high-movement/action scene is happening in the video, the burst size rapidly increases beyond 10^6 bytes. As soon as the movement slows down, the burst size goes back to its stable level around 10^6 bytes.
Takeaway: This clearly shows that the burst size is affected by the content streamed in the video.
###### Brian Chen
Figure 2.2 maps out the different bitrates for a particular scene and allows for easier visualization of the example. It demonstrates that even relatively short videos can fluctuate wildly depending on content. The vertical axis is bits per second and the horizontal axis is seconds elapsed since the start. There is only one line, which depicts the current bits per second at the given time for "Iguana vs. Snakes". The trend of the graph is that there are high-bitrate bursts at action scenes and low-bitrate lulls when there is less action. Ultimately, the graph shows that there is a high degree of bitrate variability during video fetching, which will likely lead to burst behavior.
Figure 2.3 exists to visualize the number of bursts and the degree of burst for the example video. The vertical axis is burst size measured in bytes and the horizontal axis is time elapsed in seconds. There is no line, but instead there are dots that represent the peak of bursts. The trend of the graph is that every so often there are bursts, and these bursts become even greater at certain points in a fairly repetitive manner. The takeaway from the graph is that streaming videos often have clear patterns that relate to the content of the video being streamed. Presumably, these patterns are to be used in the attacks against high frequency burst networks.
###### Vinothini Gunasekaran
The DASH leak is explained with an example video where Iguana and Snake are engaged in a chase.
- Figure 2.2 shows how the bitrate changes based on the Iguana’s movement at different times. X and Y coordinates are mapped using time and bitrate respectively. The graph line spikes during the action scenes (snake chases iguana) and goes down when there is no action (iguana stays still). This trend shows how the bitrate pattern changes over time based on the video content.
- Figure 2.3 shows how the burst size changes based on the Iguana’s movement at different times (here it is for every 5 seconds). X and Y coordinates are mapped using time and burst sizes respectively. The plotted graph shows that the burst size is increasing during action scenes (snake chasing iguana) and decreasing on non-action scenes (Iguana resting). This trend is different for the first burst alone because it is the size of the receiver’s buffer size.
###### Punnal Ismail Khan
###### <Your Name>
<Your response>
##### How is data collection automated?
###### Fahed Abudayyeh
The data is collected and automated by simulating users using a web crawler to launch instances of an internet browser (Google Chrome in this case) to stream content from different streaming services. The crawler would use a rewind procedure for each service to ensure video playback starts at the beginning of a video. Wireshark is used to monitor the packet flow to the user from the streaming service and the data is saved to analyze the burst patterns.
###### Satyam Awasthi
They used an automated crawling algorithm to capture streaming network data. The crawler emulated user behavior, and the resulting traffic was captured using Wireshark.
###### Alan Roddick
The data was collected using Google Chrome. Each video was ensured to always play from the start. Once the videos were playing, the network traffic was recorded using tshark. For YouTube, they automatically clicked on the next recommended video.
###### Aaron Jimenez
Data was collected by launching an instance of Google Chrome, with a “rewind” procedure for each service so that the videos would always start at the beginning. From there, network traffic was captured via Wireshark where packets were filtered for TLS (Netflix, Amazon, Vimeo, etc.) and QUIC (YouTube) and kept the flows with the greatest amount of bits.
###### Jaber Daneshamooz
They used a web crawler to follow the recommendation links on YouTube, etc., and spawned a Chrome instance that starts each video at the beginning of the content.
###### Samridhi Maheshwari
For each title, they spawned a Chrome browser instance and used a service-specific “rewind” procedure so that playback commenced at the beginning of the content. For videos with an initial title sequence (ads for example), this sequence is downloaded as part of the initial buffering; the bursts in the on-off phase correspond to the segments of unique content. They captured the network traffic of each streaming session for a certain duration using Wireshark’s tshark. For Amazon, Netflix, and Vimeo, the application-layer protocol is TLS, for YouTube, it is either QUIC, or TLS.
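A minimal sketch of this collection loop (not the authors' actual crawler) could pair Selenium's Chrome driver with a tshark capture per title; the URL list, interface name, and 180-second duration below are assumptions, and the service-specific "rewind" step is only marked as a placeholder.

```python
# Sketch: record the traffic of one streaming session per title.
import subprocess
from selenium import webdriver

urls = ["https://www.youtube.com/watch?v=EXAMPLE"]   # hypothetical title list

for i, url in enumerate(urls):
    # tshark stops on its own after 180 s and writes one pcap per session.
    cap = subprocess.Popen(
        ["tshark", "-i", "eth0", "-a", "duration:180", "-w", f"trace_{i}.pcap"])
    driver = webdriver.Chrome()
    driver.get(url)     # playback starts; a service-specific rewind goes here
    cap.wait()
    driver.quit()
```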
###### Shereen Elsayed
They spawned Chrome browser instances and used a service-specific "rewind" procedure. For videos with an initial title sequence, that sequence was downloaded as part of the initial buffering, so the bursts in the on-off phase correspond to segments of unique content. They used Wireshark to capture network traffic.
###### Seif Ibrahim
They launched a Chrome browser that started each video from the beginning, and then used Wireshark to capture packets and plot the bitrate over time.
###### Ajit Jadhav
Data collection automation is achieved by crawling the web pages and recommendation links. The crawling was done by spawning a Chrome browser instance. Also, failed playbacks containing very few bits were discarded in the process. Corresponding network traffic for each streaming session for a certain duration was captured using Wireshark’s tshark.
###### Nikunj Baid
It is automated by spawning a Chrome instance for each of the titles being observed and using a rewind procedure specific to the service, so that the same video can be replayed multiple times and playback commences at the beginning of the content each time. The network traffic for each of these sessions was then captured using Wireshark's tshark.
###### Shubham Talbar
Data collection was automated via using a web crawler that emulated user behaviour on four popular video streaming platforms. For each title, the authors spawned a Chrome browser instance and used a service-specific “rewind” procedure so that playback commenced at the beginning of the content.
###### Arjun Prakash
The authors used a crawler to emulate the user's behavior. The crawler would start from a popular YouTube video and recursively follow the recommendation links. Each video is spawned in a new Chrome instance and the “rewind” procedure was used to commence the video from the start.
###### Liu Kurafeeva
The authors simulate user behaviour by loading and "viewing" the videos, collecting the traffic, and labeling it according to the "viewed" video.
###### Deept Mahendiratta
The authors started a new Chrome browser instance for each title, using a "rewind" to start the video from the beginning. They used a web crawler to emulate user behaviour, starting from a popular video and then following the recommendations.
###### Rhys Tracy
Data collection was automated by simulating a user with a web crawler and accessing a video on a certain platform, then using the platform's 'rewind' feature to start from the video beginning and capturing the transferred packets with WireShark.
###### Nawel Alioua
The authors focused on four popular streaming services: Netflix, YouTube, Amazon, and Vimeo. They manually chose a number of titles from each service. The automation of data collection was done by
- spawning a Chrome browser instance and using a service-specific “rewind” procedure so that playback commenced at the beginning of the content.
- capturing the network traffic of each streaming session for a certain duration using Wireshark’s tshark [60].
###### Roman Beltiukov
The data collection process is automated using a Chrome-based crawler with simulated user behaviour. The authors collect 100 iterations of each selected video and record the network traffic using tshark.
###### Nagarjun Avaraddy
Data is collected by automating the launch of a new Chrome browser instance for each video title; the titles come from Netflix, Vimeo, Amazon, and YouTube. Wireshark is used simultaneously to collect the packets. From each capture, the TCP flow with the greatest amount of bits was kept, and the time series of the following flow attributes was extracted as the final data: bytes per second, packets per second, and average packet length.
###### Navya Battula
A Chrome browser was spawned, and a service-specific rewind mechanism was used so that playback commenced right from the beginning of the video. For Netflix, Prime, and Vimeo, the videos were selected specifically for playing, while for YouTube the videos were streamed via an automated crawl. The traffic was captured using the tshark tool.
###### Pranjali Jain
Data collection is done for titles on Netflix, YouTube, Amazon, and Vimeo. The process is automated by creating Chrome browser instances for each title and starting playback at the beginning of the titles using service-specific rewind procedures. The paper also makes sure that the bursts in the on-off phase correspond to the segments of unique content. Wireshark’s tshark framework is used to capture the network traffic for the streaming sessions for the required duration.
###### Achintya Desai
The data is collected over a number of different titles from major streaming platforms: Netflix, YouTube, Amazon, and Vimeo. To select the content for YouTube specifically, they used a crawler that started at the front pages of topical channels and continued following the recommendation links.
First, in order to avoid the repeated title sequence, they used a "rewind" procedure to ensure that each video played from the beginning of the actual content. This was done in a Chrome browser instance dedicated to each title. Furthermore, the network traffic, encrypted mostly under TLS or QUIC, was captured using Wireshark's network protocol analyzer, tshark.
###### Brian Chen
Data collection was automated by opening individual Chrome browsers. Each browser had a service-specific rewind procedure that would allow the video to replay from the very beginning. Then, the videos would be run and Wireshark’s tshark would capture the network traffic.
###### Vinothini Gunasekaran
They used the Chrome browser with service-specific rewind procedures to start from the beginning of the videos, and then captured network packets using Wireshark's tshark for each streaming session.
##### What's the Bento4 MPEG-DASH toolset doing?
###### Fahed Abudayyeh
The Bento4 MPEG-DASH toolset is used to simulate a video streaming server (YouTube in this case) that uses the DASH standard to segment videos.
###### Satyam Awasthi
The Bento4 tool takes care of creating the DASH MPD document for serving DASH streams. The authors used this tool to model a server for YouTube videos streamed using standardized MPEG-DASH.
###### Alan Roddick
The Bento4 tool helps to standardize the streaming for YouTube videos. The videos were divided into 5-second time segments to be requested one at a time by the client.
###### Aaron Jimenez
The Bento4 MPEG-DASH toolset is used to process the YouTube videos collected for standardized streaming. They divided the videos into five-second time segments and manifests were created for them.
###### Jaber Daneshamooz
They used it to process the YouTube videos for standard streaming. For example, they needed to divide the videos into 5-second segments in order to create the manifests.
###### Samridhi Maheshwari
The authors used the Bento4 MPEG-DASH toolset to process the YouTube videos for standardized streaming: divide them into time segments and create the manifests. The authors opted for 5-second segments.
###### Shereen Elsayed
It is used to process the video streams and apply the standard (videos are divided into chunks/segments) and to create manifests.
###### Seif Ibrahim
This tool is used to divide videos into segments and generate manifests (metadata about which segments are available and what quality) as would be done by a DASH server.
###### Ajit Jadhav
It was used for processing all the YouTube videos for standardized streaming. As a part of the processing, it was used to divide the videos into 5-second segments and create the manifests.
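For reference, a typical Bento4 invocation (an assumed example, not taken from the paper) first fragments the file and then packages it for DASH; file names are placeholders.

```python
# Sketch: fragment a video into ~5 s pieces, then emit DASH segments + MPD.
import subprocess

subprocess.run(["mp4fragment", "--fragment-duration", "5000",   # milliseconds
                "video.mp4", "video_frag.mp4"], check=True)
subprocess.run(["mp4dash", "-o", "dash_out", "video_frag.mp4"], check=True)
# dash_out/ now holds the segment files and the MPD manifest for the server.
```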
###### Shubham Talbar
The Bento4 MPEG-DASH toolset was used to process the collected 3,558 YouTube videos for standardized streaming, i.e., to divide the video streams into time segments and create the manifests.
###### Arjun Prakash
The Bento4 MPEG-DASH toolset is used to process YouTube videos for standardized streaming. They divide them into time segments and create the manifests.
###### Nikunj Baid
It is used to process the set of YouTube videos for standardized streaming on the home server, where each video is divided into 5-second segments, which are then used to create the manifests. This is done to mimic YouTube-server-like behaviour on the authors' side.
###### Liu Kurafeeva
It is used for dividing video streams into time segments, per the rules of the standard itself, and for creating manifests.
###### Deept Mahendiratta
It converts the YouTube videos into DASH format, dividing them into 5-second segments so that the data is standardized across platforms.
###### Nawel Alioua
It standardizes the streaming, i.e., divides videos into time segments and creates the manifests.
###### Rhys Tracy
The toolset is used to process videos for streaming (divide them into segments of the same time period and create manifests).
###### Roman Beltiukov
This toolset is designed for all DASH media format needs, including streaming preparation and video segmentation.
###### Nagarjun Avaraddy
The Bento4 MPEG-DASH toolset is used to process YouTube videos for standardized streaming, i.e., divide them into time segments and create the manifests. This is used to model the behavior of a server that serves video data using DASH.
###### Pranjali Jain
The Bento4 MPEG-DASH toolset is used to process the YouTube videos. It divides the videos into time segments of 5 seconds in this case and creates manifests.
###### Achintya Desai
The Bento4 MPEG-DASH toolset was used to create the XML manifests that describe the entire media presentation, which is used to simulate the YouTube server.
###### Brian Chen
The Bento4 MPEG-DASH toolset processed the 3,558 YouTube videos for standardized streaming. Essentially, it converted the videos into DASH format by dividing them into segments and creating manifests. This allows the construction of a server that would emulate what YouTube itself likely provides, except on a much smaller scale.
###### Vinothini Gunasekaran
The Bento4 toolset is used to process the YouTube videos collected via recommendations. It divides the video content into 5-second segments and creates manifests.
##### Why did the authors use CNN? What can be a better tool to use here (if any)?
###### Fahed Abudayyeh
CNNs fit the application in this paper because the temporal locality of the data is hugely important in making predictions. I am not sure what other methods would be adequate for this problem.
###### Satyam Awasthi
* Robust: can operate on noisy and coarse measurements
* Agnostic to protocol-specific attributes (e.g., QUIC vs. TLS)
* Can learn features other than burst patterns, e.g., arrival patterns of individual packets
* Can use multiple session representations, train on all at once
###### Alan Roddick
The authors used a CNN because they believed it helped capture the local features of the data; they wanted to analyze the burst patterns at a local level. I think LSTMs or Transformers would be interesting to use here, because they have been used in NLP and time-series tasks and have been shown to perform well.
###### Aaron Jimenez
The authors used a CNN classifier as it is good at extracting spatially local features from an input to be used in later classification. Another tool that could have been used is a transformer to classify the time-series.
###### Samridhi Maheshwari
The authors use a CNN since the features are all temporally local in a time series (which is the feature set in this paper), and because the events in DASH occur in close temporal proximity and are related to one another. A better tool could be a more advanced neural network such as an LSTM, which has feedback support; LSTMs would work well since the data is both temporal and sequence-based.
###### Jaber Daneshamooz
Because a CNN is suitable for the problem (finding local features and patterns). The bursts we have in the DASH protocol occur in close temporal proximity, and in a CNN the lower layers produce representations of features that are temporally local in a time series.
###### Shereen
A CNN fits this problem, as its layers look for local features and the network events occur in close temporal proximity. An RNN could be an option; it leverages the time and sequence structure in such problems.
###### Seif Ibrahim
###### Ajit Jadhav
Due to the close temporal proximity of the DASH bursts, the representations of local features characteristic to CNNs are a good fit for the purpose of video identification. RNNs and LSTMs are other possible tools we can use since we have a time series. As an improvement over CNNs, capsule networks could show positive results.
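As a concrete illustration of the temporal-locality argument above, here is a small 1-D CNN over a burst-size time series in Keras. It is not the paper's exact architecture; the input length, layer sizes, and number of classes are assumptions.

```python
# Sketch: classify a session's burst-size time series into one of N titles.
import tensorflow as tf

NUM_CLASSES = 100       # hypothetical closed-world title set
SERIES_LEN = 256        # bursts (or per-second byte counts) per session

model = tf.keras.Sequential([
    # 1-D convolutions pick up temporally local burst patterns.
    tf.keras.layers.Conv1D(32, 8, activation="relu",
                           input_shape=(SERIES_LEN, 1)),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(64, 8, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```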
###### Shubham Talbar
It is natural to represent a streaming session as a series of burst sizes and directly search for correlations with the bursts in other sessions to check whether the same content was streamed. However, this requires an approach that is very robust to the noise and distortion introduced by encrypted protocol layers and by indirect measurement through a side channel. The authors constructed a deep convolutional network architecture that detects videos using features of their traffic. Employing a CNN resulted in an accurate and noise-adaptive detector, effective even when measurements are performed through a side channel.
###### Arjun Prakash
CNNs are better at finding local patterns, and since the network events corresponding to each DASH burst occur in close temporal proximity, the authors used a CNN. I believe RNNs/LSTMs could be used here, but they can be computationally expensive.
###### Nikunj Baid
Since a CNN is a deep neural network, its lower layers can generate embeddings for local features, especially in time-series data. As the burst events happen sequentially and at nearly fixed intervals, they accumulate into a time series that is aptly suited to CNNs.
###### Liu Kurafeeva
Since the data is a video stream, we need to extract high-level features automatically, and convolutional networks fit that nicely.
Perhaps modern image-processing or other deep-learning techniques would fit as well, such as Transformer models or architectures designed for time-series prediction.
###### Deept Mahendiratta
The features are all temporally related, and that is what a CNN trains on. A better option could be an RNN, which has feedback loops; RNNs are sequential in nature.
###### Nawel Alioua
CNNs are generally used to produce representations of local features (e.g., spatially local in an image, or temporally local in a time series). The authors argue that since the network events of interest occur in close temporal proximity, CNNs are suitable for this setting.
###### Rhys Tracy
Since DASH bursts occur in close temporal proximity, and CNNs are good at identifying patterns in close spatial or temporal proximity, CNNs are a good choice. CNNs are also reasonably fast, which makes them a good option. RNNs (like LSTMs) would likely be better at identifying temporal patterns (particularly over longer time periods), but RNNs can be much slower than CNNs, and the CNN model in the paper demonstrates good results. Therefore, a CNN was probably one of the best choices for this paper.
###### Roman Beltiukov
The authors use CNNs because of their ability to capture co-dependencies among local features and to build useful representations for this setting. Given the setup, most tools that work with temporal data (e.g., LSTM, Transformer, or WaveNet-style architectures) could probably also work fine.
###### Seif Ibrahim
CNNs are well suited to this problem because they apply linear transformations to many windows of the input data, which means they are good at finding local features (in this case, temporally local features in a time series). If the videos were very long and only subtly similar, maybe they could have used LSTMs for better recall.
###### Navya Battula
The paper makes use of CNNs to capture certain sub-patterns within the data; CNNs are very good at identifying local sub-patterns. However, I think LSTMs could be a better fit in this particular case, because LSTMs tend to remember longer sequences while capturing the sub-patterns, which would give us more information about the patterns we are trying to infer.
###### Nagarjun Avaraddy
CNNs work with filters/kernels that help identify interesting features within a locality; since the key information about bursts/segments occurs locally in the time-series data, the authors used a CNN.
Since it is time-series data, I feel we could employ LSTMs or attention-based models like Transformers to extract more from the data. Although these add global contextual information, the local context is learnt either way.
###### Pranjali Jain
The authors argue that CNNs are effective for representing spatial and temporal locality in the feature space. Since network events for each DASH burst have temporal relationships, CNNs are suitable here. RNNs, LSTMs, and more recent transformer models might also be effective models in this case because sequential packet bursts can be considered time series data.
###### Achintya Desai
The authors justify using a CNN by pointing out that its lower layers generate representations of local features, such as temporally local patterns in a time series. This is closely related to the setting described in the paper, where the network events for each DASH burst occur in quick succession. Another viable option could be an LSTM, which is well suited to classifying time-series data.
###### Brian Chen
The authors used a convolutional neural network since this variant of deep neural networks is better suited to identifying locality, either spatial or temporal. The lower layers apply the same linear transformation over many windows, which produces this behavior.
A tool that might be better here is adversarial machine learning. The goal of this paper seems to be to reduce false positives to a minimum rather than to increase overall prediction capability; in that case, adversarial machine learning could provide an avenue for minimizing unwanted complications in the final result.
###### Vinothini Gunasekaran
Theoretically, there are many possible ways for an attacker to use the traffic features and detect bursts, so the authors opted for a more sophisticated process that can build complex models over low-level features. In a CNN, the lower layers produce representations of local features in close temporal proximity, which is very suitable for the setting discussed in this paper.
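To make the "temporally local features" argument above concrete, below is a minimal sketch of a 1D CNN over traffic time series. The layer sizes, kernel widths, and class count are illustrative placeholders, not the paper's actual architecture.

```python
# Illustrative 1D CNN over k feature channels sampled every 0.25 s.
import torch
import torch.nn as nn

class BurstCNN(nn.Module):
    def __init__(self, k_features: int, n_classes: int):
        super().__init__()
        self.conv = nn.Sequential(
            # Convolutions over the time axis capture temporally local
            # patterns, e.g. the shape of a few adjacent bursts.
            nn.Conv1d(k_features, 32, kernel_size=16), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=8), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),  # global pooling over time
        )
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):  # x: (batch, k_features, n_steps)
        return self.fc(self.conv(x).squeeze(-1))

model = BurstCNN(k_features=3, n_classes=100)
logits = model(torch.randn(8, 3, 4 * 60))  # 8 sessions of 60 s at 0.25 s bins
```

An RNN or Transformer would replace the `conv` stack with a recurrent or attention encoder over the same input; the convolutional version is cheaper and directly encodes the locality assumption discussed above.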
###### <Your Name>
<Your response>
##### What's the input to the classifier?
###### Fahed Abudayyeh
k x n: "k denotes the number of feature types taken. n is the recording time in seconds divided by the time-series sampling rate" The input for the classifier is a time-series for each of the features specified.
###### Satyam Awasthi
Each feature is a time-series, sampled at 0.25-second intervals. Features considered by the author were: downstream/upstream/total values of bytes per second, packet per second, average packet length, and burst sizes.
###### Alan Roddick
The input to the classifier is a one-dimensional vector that is composed of the time-series of burst size, packet length, bytes per second, packets per second.
###### Aaron Jimenez
In this case, the input to the classifier is the set of n k-sized network traffic samples taken during the recording period. The time series is passed into the classifier as a k x n matrix.
###### Jaber Daneshamooz
It's a one-dimensional vector of features, sampled as a periodic time series.
###### Samridhi Maheshwari
The input to the classifier is a one-dimensional feature vector, aggregated by taking the max value of each feature. The features are the time series of packet length, bytes per second, burst series, and packets per second.
###### Shereen Elsayed
A vector of features: bytes/sec, packets per second, average packet length and the traffic direction (inbound, outbound, or both).
###### Ajit Jadhav
Input is a time series of features like downstream/upstream/total values of bytes per second, packet per second, average packet length, and burst sizes.
###### Shubham Talbar
Each feature is a time-series, sampled at 0.25-second intervals.
Features considered by the authors were:
1. downstream/upstream/total bytes per second
2. packets per second
3. average packet length
4. burst sizes
###### Arjun Prakash
The input features are the time series of packet length, bytes per second, burst series, and packets per second, averaged over 0.25-second intervals and measured in the up, down, and both directions.
###### Nikunj Baid
The input here would be n vectors, each having k attributes (packet length, burst sizes, etc.). n is the recording time in seconds divided by the time-series sampling interval, which in this case is 0.25 s.
###### Nawel Alioua
Time series of the following flow attributes:
* down/up/all bytes per second (BPS)
* down/up/all packets per second (PPS)
* down/up/all average packet length (PLEN)
To create uniformly sized vectors, the series is aggregated by averaging over 0.25-second intervals.
###### Liu Kurafeeva
Time series of the network features we discussed previously, by which the pattern can be identified, such as packet length, packets per second, etc.
###### Deept Mahendiratta
The input is a vector of size 1×N, where N is the number of features. The features used are bytes per second, packets per second, burst series, and average packet length.
###### Rhys Tracy
The input is a time series of input vectors where each vector has k features (bytes per second, packets per second, and average packet length).
###### Seif Ibrahim
They input four different feature vectors over time. These are bytes/s, packets/s, average packet length, burst size.
###### Navya Battula
The input to the model is time-series data about the video playing in the background. It is multivariate time-series data, with channels for bytes per second, packets per second, and packet length, in all directions.
###### Nagarjun Avaraddy
The input is the time series of the feature list, which are flow attributes, i.e., bytes per second, packets per second, and average packet length.
###### Pranjali Jain
The input to the CNN model is a k × n matrix, with k feature types recorded over n samples taken from the time series every 0.25 seconds. The features extracted from the time series include bytes per second (BPS), packets per second (PPS), and average packet length (PLEN) for the up, down, and all directions.
###### Achintya Desai
The input to the classifier is a time series with packet length, bytes per second, packets per second, and burst-size features. Each feature is taken in the up, down, and both directions, averaged over 0.25-second intervals.
###### Brian Chen
The input to the classifier is a tensor representation of the time series that characterizes the network's byte stream.
###### Vinothini Gunasekaran
From the captures, they extract features at 0.25-second intervals (bytes per second, packets per second, and average packet length) that are given as input to the classifier. With this 0.25-second sampling rate, the input consists of k feature types across n samples.
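As a concrete illustration of how such a k × n input could be built, here is a sketch that bins a packet trace into 0.25-second intervals and computes BPS, PPS, and PLEN. The `packets` list of `(timestamp, length)` pairs and the single-direction handling are simplifying assumptions; the paper also splits features by up/down/all directions.

```python
# Sketch: turn a packet trace into the k x n classifier input.
import numpy as np

def featurize(packets, duration_s: float, dt: float = 0.25) -> np.ndarray:
    n = int(duration_s / dt)
    bytes_per_bin = np.zeros(n)
    pkts_per_bin = np.zeros(n)
    for ts, length in packets:          # (seconds, bytes) per packet
        i = int(ts / dt)
        if i < n:
            bytes_per_bin[i] += length
            pkts_per_bin[i] += 1
    # Average packet length per bin, avoiding division by zero.
    avg_len = np.divide(bytes_per_bin, pkts_per_bin,
                        out=np.zeros(n), where=pkts_per_bin > 0)
    # Rows: BPS, PPS, PLEN -> a k x n matrix (k = 3 here).
    return np.stack([bytes_per_bin / dt, pkts_per_bin / dt, avg_len])

X = featurize([(0.10, 1500), (0.12, 1500), (0.60, 400)], duration_s=1.0)
print(X.shape)  # (3, 4)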
###### <Your Name>
<Your response>
##### What are the detection cascades? How can they be useful for this problem?
###### Satyam Awasthi
The cascade can be viewed as an object-specific focus-of-attention mechanism which provides statistical guarantees that discarded regions are unlikely to contain the object of interest.
A cascade thus accepts only the inputs that are accepted by all of its classifiers, and it is efficient to train because most inputs are rejected by the simple lower-level classifiers.
For this problem, the collected data has a low base rate, and cascades have demonstrated almost human-level accuracy in such cases.
###### Aaron Jimenez
Detection cascades are a series of classifiers in which each one gets progressively more complex. When training the i-th classifier, only the inputs accepted by the (i-1)-th classifier are passed to it. This makes training more efficient, as the more complex classifiers never see data that was previously rejected, and it can achieve almost human-level accuracy on complex tasks.
###### Alan Roddick
Detection cascades are a technique where classifiers are used in a pipeline. Each classifier is given a set of data points and passes along only the accepted ones to the next classifier. This makes it practical to train classifiers that become more complex later in the pipeline, since the earlier classifiers reject a large fraction of the data points before they reach the later ones.
###### Samridhi Maheshwari
Detection cascades are an optimization consisting of a series of classifiers, each of which is more complex than the previous one. During training, the (i + 1)th classifier is trained using only the samples accepted (possibly falsely) by the ith classifier. A cascade accepts only the inputs that are accepted by all of its lower level classifiers and is efficient to train because most inputs are rejected by the simple lower-level classifiers. Cascades have demonstrated almost human-level accuracy for complex tasks.
###### Ajit Jadhav
Detection cascades are a series of classifiers of increasing complexity. This helps in training as each classifier accepts only the inputs accepted by classifiers before it. This results in most inputs getting rejected by the simple lower-level classifiers resulting in a more efficient training, leading to an increased accuracy for complex tasks.
###### Arjun Prakash
A detection cascade consists of a series of classifiers, each trained on the inputs accepted by the previous one. A cascade thus accepts only the inputs that are accepted by all of its classifiers and is efficient to train, because most inputs are rejected by the simple lower-level classifiers. Given the low base rate here, detection cascades help improve efficiency.
###### Shubham Talbar
Detection cascades consist of a series of classifiers, each of which is more complex than the previous one. During training, the (i+1)-th classifier is trained using only the samples accepted by the i-th classifier. A cascade thus accepts only the inputs that are accepted by all of its classifiers and is efficient to train, because most inputs are rejected by the simple lower-level classifiers. Cascades have demonstrated almost human-level accuracy on complex tasks with a low base rate, such as face detection.
###### Nawel Alioua
Detection cascades consist of a series of classifiers, where each classifier is more complex than the previous one. A classifier in the cascade is trained only on the samples accepted by the previous one. The cascade is efficient to train, since it accepts only input that was accepted by all of its classifiers and most inputs are rejected by the lower-level classifiers.
###### Nikunj Baid
It consists of a series of classifiers, each more complex than the preceding one (a larger input feature space and more hidden-layer activations). If a sample is rejected by some classifier, it is not propagated to the succeeding classifiers. This makes training efficient, as most inputs are rejected by the simpler lower-level classifiers. The stated problem has a low base rate, and cascades have shown almost human-level accuracy on such tasks.
###### Liu Kurafeeva
Detection cascades are an approach that chains classifiers together. Each subsequent classifier is more complex than the previous ones (more features, bigger input), and each classifier sees only the examples accepted by the previous one.
###### Deept Mahendiratta
A detection cascade is made up of a sequence of classifiers, each of which is trained using the inputs from the layer before it. Because most inputs are rejected by low level classifiers, a cascade accepts just the inputs that are accepted by all of its classifiers and is thus efficient to train.
###### Rhys Tracy
A detection cascade is a series of classifiers that feed into each other and get increasingly complex. The next classifier only receives inputs accepted by the previous classifier. A detection cascade therefore works by feeding data through increasingly complex classifiers until you get an accurate final result, even with a very small base rate.
###### Seif Ibrahim
Detection cascades put the input through a series of classifiers, where the cascade accepts only if all of the classifiers accept. They are easier to train than a single large model.
###### Jaber Daneshamooz
The classifiers are cascaded in series, with the early classifiers less complex than the later ones. Each classifier only accepts input that was accepted by the previous classifiers. This method is like layering in networking, and it helps the cascade train better.
###### Nagarjun Avaraddy
Detection cascades are a series of classifiers that are progressively more complex. A classifier accepts an input only if the previous classifier did not reject it. This reduces computation, as the complex classifiers are not used on a significant portion of the input sample space.
###### Pranjali Jain
Detection cascades consist of a series of classifiers where each classifier is more complex than its predecessor. Each classifier accepts only the inputs that were accepted by all the previous classifiers, which makes the training more efficient. Detection cascades have shown very good accuracy for tasks with a low base rate, which is also the case for the problem presented in this paper.
###### Achintya Desai
Detection cascades are made up of a series of classifiers, each with a larger feature space and more hidden-layer activations than the previous one. Since a classifier in the cascade is trained using only the samples accepted by all of the previous classifiers, the cascade accepts only those inputs that are accepted by every classifier in the series. This improves efficiency, as most inputs are rejected in the early stages, making the approach well suited to dealing with a low base rate.
###### Brian Chen
Detection cascades are a series of linked classifiers, each subsequent one more complex than the last in terms of feature space and layers. The (i+1)-th classifier is trained only on what passed the i-th classifier. Detection cascades are useful for this problem as they achieve the goal of minimizing false positives: by running the input through the gauntlet of classifiers, each better trained than the last, only accurate predictions are likely. Furthermore, cascades seem to reduce the time needed to train the classifiers relative to naively training the same n classifiers separately.
###### Vinothini Gunasekaran
In detection cascades, the classifier at level i+1 is trained using the samples accepted by the classifier at level i. A cascade therefore accepts only inputs that are accepted by all of its classifiers. This is useful because it improves training efficiency and yields very high accuracy on complex tasks.
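A minimal sketch of the cascade idea described above, with placeholder stage models and thresholds (not the paper's configuration): each stage sees only the inputs accepted by all earlier, cheaper stages.

```python
# Sketch of a detection cascade over a batch of samples.
import numpy as np

class Cascade:
    def __init__(self, stages):
        # stages: list of (model, threshold), ordered cheap -> expensive.
        self.stages = stages

    def predict(self, X: np.ndarray) -> np.ndarray:
        accepted = np.ones(len(X), dtype=bool)
        for model, thresh in self.stages:
            if not accepted.any():
                break
            # Only the still-accepted samples reach the costlier model.
            scores = model(X[accepted])
            idx = np.flatnonzero(accepted)
            accepted[idx[scores < thresh]] = False
        return accepted  # True = accepted by every stage

# Toy usage: two threshold "models" on a 1-D feature.
cheap = lambda x: x[:, 0]
costly = lambda x: x[:, 0] ** 2
cascade = Cascade([(cheap, 0.5), (costly, 0.5)])
print(cascade.predict(np.array([[0.2], [0.6], [0.9]])))  # [False False  True]
```

Because the cheap first stage discards most negatives, the expensive later stages run on only a small fraction of the inputs, which is exactly what makes cascades attractive at a low base rate.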
###### <Your Name>
<Your response>
##### What other systems have leveraged VBR leaks? How is the proposed system better/worse than these systems?
###### Fahed Abudayyeh
The paper "Privacy trends in consumer ubiquitous computing" (Saponas et al.) pointed out the side channel information leakage that is a side effect of VBR. The methods in that paper use finer data granularity and result in higher false positive rates.
###### Satyam Awasthi
* Saponas et al. [Privacy trends in consumer ubiquitous computing] observed that encrypted, VBR-encoded videos leak information
* Li et al. [Wavelet Based traffic analysis] focus on detecting re-encoded content
* In [Video streaming forensic – content identification with traffic snooping], Liu et al. use aggregated traffic throughput traces (as opposed to frame-size time series)
These methods work on fine-grained measurements and capture short-term variations due to changes of picture and long-term variations due to changes of scene. Also, their false positive rates are high for “open-world” identification and none of them would work if the measurements of the attacker are noisy and coarse-grained.
###### Alan Roddick
Schuster et al. mention the following authors: Saponas et al., Li et al., and Liu et al., who leverage VBR leaks at a fine-grained level. The authors argue that these techniques would not work when an attacker has access only to coarse-grained information, such as in the sandboxed JavaScript example. These methods also have a fairly high false positive rate.
###### Aaron Jimenez
In contrast to this system, some other systems take advantage of the fact that VBR-encoded videos leak information about their content: they take the traffic trace, average it, and apply a sliding-window DFT. Another system tries to detect re-encoded content by applying a wavelet transform to the frame-size time series and cross-correlating the wavelet coefficient series of the traffic with a reference file. While these approaches do show results, their false positive rates are high, so when measurements are noisy it is hard for these systems to work properly.
###### Samridhi Maheshwari
Fine-grained video detection:
1. Saponas et al. create a signature of a video by taking its traffic trace as a bits-per-second time series and applying a Fourier transform to it. The detector then takes the Fourier transform of the attacker's captured traffic and matches it to the closest signature.
2. Li et al. focus on detecting re-encoded content: they apply a wavelet transform to the time series of frame sizes and cross-correlate the wavelet coefficient series of the observed traffic with those of a reference content file. Liu et al. use aggregated traffic throughput traces (as opposed to frame-size time series).
These methods would not work if the detection environment is noisy and coarse-grained (i.e., the bits-per-second series cannot be recovered).
VoIP: VBR leakage in encrypted VoIP communication can be used to identify the speaker's language and detect phrases. A direct comparison between voice and video cannot be made, since the encodings, content, etc. are different.
###### Ajit Jadhav
Saponas et al. used the traffic trace of a video to create a video "signature", based on the observation that encrypted, VBR-encoded videos leak information about their content, while Li et al. focused on detecting re-encoded content.
The disadvantages of these approaches are high false positive rates for open-world identification and an inability to work on noisy, coarse-grained measurements (they require fine-grained measurements).
###### Nawel Alioua
Saponas et al. (Devices that tell on you: Privacy trends in consumer ubiquitous computing) and Li et al. (Wavelet-based traffic analysis for identifying video streams over broadband networks): the false positive rate of these methods is high for real-world settings. They rely on fine-grained measurements and would not perform well in the face of noisy, coarse-grained attacker measurements.
Dubin et al. (I know what you saw last minute — the Chrome browser case): far less accurate classifiers, vulnerable to noise, and unusable by a JavaScript attacker.
Reed and Klimovski (Leaky streams: Identifying variable bitrate DASH videos streamed over encrypted 802.11n connections): the approach was not evaluated in an off-path setting, where the attacker has only noisy side-channel measurements, nor for any streaming service other than Netflix.
###### Jaber Daneshamooz
T. Scott Saponas, Jonathan Lester, Carl Hartung, Sameer Agarwal, and Tadayoshi Kohno. Devices that tell on you: Privacy trends in consumer ubiquitous computing. USENIX Security 2007.
###### Shubham Talbar
Saponas et al. observed that encrypted, VBR-encoded videos leak information about their content. To create a “signature” of a video, they take its traffic trace as a bits-per-second time series at the granularity of 100 milliseconds, average, and apply a sliding-window DFT. Their detector applies DFT to traffic traces and matches to the closest signature.
Li et al. focus on detecting re-encoded content. They apply a wavelet transform to the time series of frame sizes and cross-correlate the wavelet coefficient series of the observed traffic with those of a reference content file.
Even though these methods rely on fine-grained measurements, their false positive rates are prohibitively high for “open-world” identification. None of them would work if the measurements of the attacker (e.g., performed by sandboxed JavaScript) are noisy and coarse-grained.
Dubin et al. suggest using the (unordered) set of segment sizes as a title fingerprint. This detector is far less accurate than the proposed CNN classifiers and vulnerable to noise, and consequently cannot be used by a JavaScript attacker.
###### Liu Kurafeeva
Saponas et al., Li et al., Dubin et al.: the paper's proposed system can work with an off-device attacker, copes with comparatively high noise levels, and has extremely low false-positive rates.
Reed and Klimovski: the paper's proposed system works on services beyond Netflix and still performs well under noise.
VoIP: these works showed the possibility of language detection and phrase detection (not shown in the paper's work).
###### Deept Mahendiratta
Dubin et al.: significantly less accurate and susceptible to noise; a JavaScript attacker would be unable to exploit it.
Saponas et al.: based on the discovery that encrypted, VBR-encoded videos leak information about their content; created a video "signature" from the video's traffic trace.
Li et al.: focused on detecting re-encoded material.
These other methods don't work efficiently when the measurements are noisy and coarse-grained, which the method used in this paper handles.
###### Rhys Tracy
[30] (Li et al.) primarily focuses on detecting redistribution of video content. It uses fine-grained measurements, which lead to high false positive rates, so it isn't very usable in the real world.
[31] (Liu et al.) uses aggregated traffic throughput traces (also fine-grained). Again, not realistic to use in the real world.
###### Seif Ibrahim
In [46] Saponas et al. apply a sliding window DFT to a bits/s time series and match the closest signature. In [30] Li et al. focus on re-encoded content and uses a wavelet transform for correlation. The authors state that these models rely on very fine-grain measurements and they have high false positive rates so they would not work in the weaker adversarial setting of a Javascript ad. Most other approaches either rely on observing TCP traffic directly or on unencrypted metadata that is specific to certain applications (e.g. Netflix).
###### Nagarjun Avaraddy
1. Saponas et al. applied a DFT to windows of a time series collected at 100 ms granularity and averaged, generating a signature; the closest signature is then matched.
2. Li et al. used a wavelet transform and cross-correlated the wavelet coefficients of the observed traffic with those of a reference.
The issue with the above two is that they work best with fine-grained data (which isn't always available), and even then they don't generalize well.
3. Dubin et al. used the set of segment sizes as a fingerprint for the video. The issue is that it does not work well with noise.
4. Reed et al. use Pearson correlation to find Netflix videos, fingerprinting the entire Netflix catalog for it.
This is basically closed-world, i.e., only Netflix titles work well, and it was not evaluated on noisy traffic even for Netflix. A further major issue is that the fingerprinting relies on metadata that is not encrypted; if that metadata were encrypted, this solution would fail.
###### Pranjali Jain
Saponas et al. - created signature of a video using its traffic trace as a BPS time series and further applied a sliding-window DFT. The detector can use another DFT in order to match the closest signature. Li et al. detect re-encoded content using wavelet transformation on the time series of frame sizes. Liu et al. use aggregated traffic throughput traces.
These systems have high false-positive rates for open-world identification even with fine-grained measurements. These methods will not be effective if the measurements of the attacker are noisy or coarse-grained.
###### Nikunj Baid
1. Fine-grained video detection: Saponas et al. fingerprint a given video by taking its traffic trace as a bits-per-second time series at a granularity of 100 ms, averaging it, and applying a sliding-window DFT. It is worse in the sense that its false positive rate is significantly high in open-world scenarios, and it would fail drastically if the network environment is noisy and coarse-grained.
2. VoIP: Wright et al. demonstrated that VBR leakage in encrypted VoIP communication can be used to identify the speaker's language and to detect phrases. This approach was later extended to extract conversation transcripts.
###### Achintya Desai
Saponas et al. take the traffic trace as a bits-per-second time series at 100-millisecond granularity (finer than the proposed system's 0.25 seconds) and use a DFT to identify the closest signature match.
Li et al. use the VBR leak to detect re-encoded content. They apply a wavelet transform to the time series of frame sizes; detection is done by cross-correlating the wavelet coefficient series of the observed traffic with a reference content file.
Both of these methods have comparatively higher false-positive rates and need fine-grained measurements, which indicates that the proposed system is better, especially when the data is noisy.
Dubin et al. use the set of segment sizes for fingerprinting, which is more susceptible to noise, has lower accuracy, and is infeasible to use remotely.
Reed and Klimovski use Pearson correlation to identify Netflix streams, an approach later scaled to fingerprint the entire Netflix title selection. However, it assumes an on-path attacker who can observe TCP-layer traffic. The technique uses the .ismv file sent by Netflix at the beginning of every stream, which contains all segment sizes for all possible encodings of the title and is sent unencrypted; this is what exploits the VBR leak.
Wright et al. demonstrated that the VBR leak allows an attacker to identify the speaker's language and detect phrases, based on a Hidden Markov Model trained to identify specific phrases. It was further extended to extract conversation transcripts.
###### Brian Chen
Several fine-grained video approaches currently exist. Saponas et al. use a sliding-window DFT. Li et al. use a wavelet transform on the time series of frame sizes and cross-correlate. These systems have a much higher false-positive rate, so high that it is in fact prohibitive.
Dubin et al. suggest using the set of segment sizes, but this is less accurate and vulnerable to noise. Reed and Klimovski use Wi-Fi sniffing and implement a Pearson correlation attack which has high accuracy. Later, Reed and Kranch scale the approach up, but it relies heavily on metadata.
Wright et al. used VBR to identify language and detect phrases with a Hidden Markov Model in VoIP. White et al. extended this approach to extracting transcripts. Both of these works target a different space but are nonetheless related to VBR.
###### Vinothini Gunasekaran
VBR leak exploitation on fine-grained video:
-> Saponas et al. took the traffic trace in bits per second to create a signature for videos, observing that VBR-encoded videos leak information about their content.
-> Li et al. applied a wavelet transform to the time series to focus on detecting re-encoded content.
-> Dubin et al. use title fingerprinting, which is less accurate compared to other methods.
VBR leak exploitation on VoIP:
-> Wright et al. identified that these leaks could let attackers determine the client's language and detect phrases. They even extended this approach to extract conversation transcripts.
###### <Your Name>
<Your response>
##### How do you compare this approach with the one in [44]?
###### Satyam Awasthi
[44] assumes an on-path attacker who can observe TCP-layer traffic; it has not been evaluated in an off-path setting, where the attacker has only noisy side-channel measurements. It is also only evaluated on Netflix.
The mass fingerprinting in [44] relies on the metadata (.ismv file headers) sent by Netflix to the client at an early stage of the streaming process for all encodings of the title. These headers are sent in the clear, while the video content is DRM-encrypted. Thus, [44] may not work if these headers were DRM-protected too.
###### Aaron Jimenez
While fingerprinting all content involved and trying to identify traffic based on those fingerprints could be very effective in a "cleanroom" environment, it has a number of real-world issues. First, fingerprinting an ever-changing video library such as Netflix's means the attacker must put more work into collecting video data than into designing the actual system. Second, the approach may be susceptible to noisy side-channel measurements that affect the classification of the content.
###### Alan Roddick
Reed and Kranch have a similar paper in which the entire Netflix library was fingerprinted. Their threat model is an on-path attacker observing TCP traffic. Their fingerprints include the metadata that Netflix sends to clients, such as the segment sizes for the different encodings of a video. Schuster et al. state that this has not been evaluated for an attacker who has access only to coarse-grained information. Additionally, it is unclear whether the metadata had a significant impact on the classification and whether removing that information would weaken the performance.
###### Samridhi Maheshwari
In the paper by Reed and Kranch, they fingerprint the entire Netflix title selection using Pearson correlation. They use on-path traffic capture, i.e., directly observing access points, routers, etc. The paper does not say how the method would work for off-path traffic analysis. Their fingerprinting also relies on the metadata sent by Netflix, and does not say what would happen if that metadata (the headers listing the segment sizes and encodings) were encrypted (DRM-protected) too.
The approach suggested by Reed and Kranch is therefore restricted in two ways: the type of traffic analysis (on-path) and the streaming service used. Different streaming services may send metadata differently, and we may not get decrypted metadata from all of them for building such a classifier.
###### Ajit Jadhav
[44] uses Pearson correlation for fingerprinting the entire Netflix selection using a Wi-Fi sniffing attack. The underlying assumption was that there is an on-path attacker who can observe TCP-layer traffic so this approach has not been evaluated for using the noisy side-channel measurements in an off-path setting. Also, this approach was tested only using Netflix.
Finally, the mass fingerprinting in [44] relies on the metadata sent by Netflix (which is not DRM-protected) to the client at an early stage of the streaming process, namely the .ismv file headers that contain all segment sizes for all possible encodings of the title. There is no mention of how the approach would work if the metadata was also DRM-protected.
###### Arjun Prakash
Reed and Kranch [44] assume an on-path attacker who can observe TCP-layer traffic. This approach has not been evaluated in an off-path setting, where the attacker has only noisy side-channel measurements. They also rely on the metadata sent by Netflix to the client at an early stage of the streaming process, namely the .ismv file headers that contain all segment sizes for all possible encodings of the title. It is not clear how the approach would work if these headers were DRM-protected.
###### Nawel Alioua
The approach described in [44] is mass fingerprinting, where the metadata sent from Netflix to the client at the early stages of the streaming is collected, namely the .ismv file headers. This metadata contains all segment sizes for all possible encodings of the title and is sent in the clear. A limitation of this approach is the case where the .ismv is also encrypted. Also, this approach was not evaluated in an off-path setting, where the attacker can't observe the TCP-layer traffic and has access only to noisy side-channel measurements. It was not evaluated for services other than Netflix.
###### Nikunj Baid
Reed and Klimovski implemented a Wi-Fi sniffing attack and suggested an approach based on Pearson correlation for identifying Netflix streams.
In [44], this approach is scaled up by fingerprinting the entire Netflix title selection. The assumption, however, is an on-path attacker who can observe TCP-layer traffic. The approach has not been tested, and probably would not work, for a remote attacker who has only noisy measurements. It has also not been tested on any service other than Netflix.
The mass-fingerprinting setup also relies on the metadata sent by Netflix to the client during the early stage of the streaming process, which contains the segment sizes for all the various encodings of the title. If this header metadata were DRM-protected, just as the video content now is, the approach would probably not work.
###### Shubham Talbar
[44] Andrew Reed and Michael Kranch. Identifying HTTPS-protected Netflix videos in real-time.
Reed and Kranch propose mass fingerprinting to scale the approach, fingerprinting the entire Netflix selection. It relies on the metadata sent by Netflix to the client at an early stage of the streaming process, namely the .ismv file headers that contain all segment sizes for all possible encodings of the title. These are sent in the clear, while the video content is DRM-encrypted. The approach has not been evaluated in an off-path setting, where the attacker has only noisy side-channel measurements, nor for any streaming service other than Netflix.
###### Liu Kurafeeva
For a proper comparison we would need either to extend the approach of [44] to other streaming services (hard, since it relies on Netflix-specific metadata) or to compare only on Netflix (which immediately favors [44]), using the percentage of videos correctly classified (useful, but not sufficient). Comparing noise resistance is also difficult. Given the limitations on the attacker's position, the approach proposed in the paper seems better to me.
###### Deept Mahendiratta
[44] builds on the idea that an attacker is on-path and can monitor TCP-layer activity. This method hasn't been tested in an off-path scenario where the attacker has only noisy side-channel readings. It also relies on metadata given to the client by Netflix early in the streaming process, specifically the .ismv file headers, which contain all segment sizes for all possible encodings of the title. If these headers were DRM-protected, it's unclear how the strategy would function.
###### Seif Ibrahim
The main difference between [44] and this paper is that this paper assumes a much weaker adversary who is incapable of monitoring network traffic directly, instead the adversary can only run javascript on the end host's browser. The authors of this paper show that a malicious javascript ad can be used to leak the traffic for any device on the same local network as the targeted end host.
###### Nagarjun Avaraddy
This approach is not limited to Netflix titles and generalizes better than [44].
The correlation technique doesn't scale as well as time-series-based local-feature identification, which can also use other network-flow information to reduce the target sample space.
The solution also works well with noisy data.
###### Pranjali Jain
In [44] they fingerprint the entire Netflix title selection. The assumption is that an on-path attacker can observe the TCP-layer traffic. Their approach was not evaluated in an off-path setting or for any streaming application other than Netflix. The video content is DRM-encrypted, and their mass-fingerprinting approach relies on metadata sent to the client at an early stage of the streaming process.
###### Rhys Tracy
The approach in [44] involves fingerprinting videos based on their TCP/IP headers. It clearly works, with results similar to (if not better than) this paper's. However, the fingerprinting approach uses on-path TCP observation, so it doesn't test the much easier off-path attack that this paper does. An attacker who gained on-path access would likely mount other on-path attacks with different (and more extreme) goals, so the fingerprinting approach seems unlikely to be used in practice; the approach in this paper is much more feasible in the real world than the one in [44].
###### Achintya Desai
The main flaw in [44] is that it relies on ismv file which is sent by Netflix at the beginning of streaming containing all segment sizes for all possible encodings of the title. If this file was DRM protected like the video content then it might be difficult to use this approach. It also assumes an on-path attacker with TCP layer traffic access. The behavior of this technique is unknown for an off-path attacker.
###### Brian Chen
I believe that this approach is better than the one in [44] in the sense that this one can be utilized by remote attackers. In other words, despite [44] having a much higher accuracy, the CNN attack proposed in this paper is more generally applicable. This is especially true when considering the fact that [44] relies heavily on metadata. There are several Private Information Retrieval schemes being developed currently that aim to obscure metadata. I feel that [44] might be too metadata dependent.
###### Vinothini Gunasekaran
In Reed and Kranch [44], they scale the fingerprinting approach to the entire Netflix title selection, under assumptions about the attacker's position in the network. This method relies on the metadata that Netflix sends to the client in .ismv file headers. That dependency makes the approach fragile: if Netflix changed its metadata design, [44]'s input would be heavily affected. Unlike [44], this paper works primarily from separately collected traffic data, which can be extended as services change.
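To illustrate the fingerprint-matching style attributed to [44] in the answers above, here is a hedged sketch of Pearson-correlation matching over segment-size series. The fingerprint catalog format and the 0.9 threshold are hypothetical, not values from [44].

```python
# Sketch: match an observed segment-size series against per-title
# fingerprints by Pearson correlation.
import numpy as np

def pearson(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.corrcoef(a, b)[0, 1])

def identify(observed, fingerprints, min_r: float = 0.9):
    obs = np.asarray(observed, dtype=float)
    best_title, best_r = None, min_r
    for title, sizes in fingerprints.items():
        m = min(len(obs), len(sizes))   # compare overlapping prefixes
        r = pearson(obs[:m], np.asarray(sizes[:m], dtype=float))
        if r > best_r:
            best_title, best_r = title, r
    return best_title  # None if nothing correlates strongly enough
```

The sketch makes the contrast concrete: this style needs accurate per-segment sizes (plausible for an on-path observer with cleartext metadata), whereas the paper's CNN operates on coarse, noisy time-series features that even an off-path attacker can obtain.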