# NetMicroscope Reviews
#### What problem is this paper solving?
###### Nagarjun Avaraddy
The paper is trying to solve the problem of inferring quality metrics such as startup time and resolution for an encrypted video stream using network data. The authors also develop a single, more general model that works across different kinds of video streaming services.
###### Navya Battula
The paper dives into the existing approaches for video quality inference in encrypted traffic and proposes an approach that improves over the previous model in four aspects: deployment settings, a single composite model, predictions made at finer granularity, and scale.
###### Brian Chen
The paper is addressing the issue of inferring video quality from encrypted traffic. As the paper puts it, ISPs do not have access to decrypted network traffic and therefore cannot determine video quality. By extension, it becomes difficult to optimize the network for said video traffic. Specifically, the paper builds on existing methods and addresses the existing issues of operating on real network traffic and handling multiple services.
###### Samridhi Maheshwari
This paper aims at predicting video quality metrics, i.e., startup delay and resolution, for encrypted video streaming services using information available from encrypted network data.
###### Shereen Elsayed
The paper is targeting encrypted videos, for which it is hard to extract or predict quality metrics. The authors created a model that is claimed to be general enough to cover multiple video streaming platforms.
###### Apoorva Jakalannanavar
Internet Service Providers need to infer the quality of experience to provide better service to customers, and more specifically this paper looks at the QoE of video streaming applications. But as the network data is generally encrypted by the video streaming providers, this paper looks into how the QoE can be predicted on the encrypted data. They look at 2 specific metrics for measuring QoE, i.e., startup delay and video resolution.
###### Rhys Tracy
The paper is trying to help ISPs better infer the quality of video streams by creating a generalized model to predict video quality metrics and training it across different streaming services.
###### Seif Ibrahim
This paper is trying to solve the problem of predicting Quality of Experience for video playback on platforms like YouTube, Amazon, and Twitch using only the information that can be collected from an encrypted video stream.
###### Aaron Jimenez
This paper is trying to solve the problem of streaming video quality identification using packet traces that are encrypted. As such they are trying to develop a model that can identify video quality across multiple services based on network streaming packets in an online environment that can be filled with a large amount of noise.
###### Arjun Prakash
The authors try to build a model that can be used to infer the quality of a streaming video in encrypted traffic by focusing on startup delay and resolution metrics. Their main idea was to build a generic model that can be used with multiple streaming services.
###### Deept Mahendiratta
It is trying to solve the problem of predicting video quality using network streaming packets.
###### Shubham Talbar
The paper proposes a model that infers quality metrics more specifically - startup delay and resolution for encrypted streaming video services. The proposed model is better than the author's previous work in a threefold manner : First, the model works in a real deployment setting. Second, the model developed is a single composite model for four different video streaming applications. Third, the model performs predictions at finer granularity.
###### Satyam Awasthi
The paper is trying to infer the quality of video streaming, specifically the startup delay and resolution. Unlike previous works, this one works in deployment settings (so, with a mix of traffic). Second, it develops a composite model for multiple services. Third, it provides predictions at finer granularity.
###### Punnal Ismail Khan
The authors developed models that infer startup delay and resolution for encrypted video streaming services.
###### Nikunj Baid
The paper attempts to solve the problem of finding the quality metrics for video streaming services for a network even though the traffic is encrypted. Also, the authors have tried to come up with a single model that could be used for multiple services to determine these metrics.
###### Ajit Jadhav
The paper is trying to solve the problem of inferring the quality of streaming video applications for an encrypted video stream. They also ensure that the developed solution works in a deployment environment and is also able to handle a number of video services.
###### Alan Roddick
The paper is trying to solve the issue of encrypted video streams that inhibit ISPs from being able to infer the quality of streaming video applications.
###### Vinothini Gunasekaran
The paper focuses on inferring the video streaming quality by measuring metrics such as startup delay and video resolution. It aims to reduce the challenges that ISPs face in content delivery optimization.
###### Pranjali Jain
The paper develops models that infer quality metrics like startup delay and resolution for encrypted streaming video services.
###### Nawel Alioua
The problem is the difficulty of inferring the quality of video traffic in an increasingly encrypted internet.
###### Achintya Desai
This paper is about inferring video quality over encrypted video streaming traffic. It proposes a model to infer quality metrics of encrypted streams.
###### Liu Kurafeeva
The paper discusses the problem of calculating QoE based on encrypted traffic. It tries to get some kind of feedback from the user to apply to long-term and short-term decisions.
#### Why is that problem important?
###### Nagarjun Avaraddy
Inferring the quality of a streaming video provider is essential for ISPs and other entities involved in network packet transmission to provide a more optimal data delivery service. As video streams are encrypted, solving QoE inference for encrypted video streams becomes an essential problem to solve.
###### Navya Battula
In order to deliver suitable bandwidth depending on the resolution required by the user, the video content providers and the service providers must have access to the video quality from the client side. Video content providers can achieve this using their own software. However, it is essential that ISPs have access to the client-side video quality to make better decisions about the bandwidth to be served, and with more and more of the internet being encrypted every day, it only makes sense that we need approaches that can solve this problem.
###### Brian Chen
Encryption will only increase from now on, so an approach which can fundamentally solve the issue is necessary, assuming that network optimizations are desired. The issues of operating on real networks and multiple services are also major drawbacks to old approaches that would prevent them from ever being realistically deployed. Addressing these two issues is a big step towards practical usage.
###### Samridhi Maheshwari
Video streaming traffic is by far the dominant application traffic on today’s Internet. Optimising video delivery depends on the ability to determine the quality of video stream. By predicting quality of video, ISPs can get a better idea on how to better serve users and retain users on their platform. Hence predicting the quality of video correctly becomes important.
###### Shereen Elsayed
Video streaming traffic is the dominant application traffic today (projected to be 82% of internet traffic in 3 years). For ISPs to optimize video delivery quality, they should be able to infer video quality from traffic as it passes through the network.
###### Apoorva Jakalannanavar
It is important for internet service providers to infer QoE, as they can figure out how the experience for the clients can be optimized and also identify the inefficiencies in the network. But it is hard to do so when the data the ISPs receive is encrypted. Hence there is a need to look at the encrypted data and come up with a methodology to predict the QoE metrics.
###### Rhys Tracy
If ISPs have a better understanding of the quality of experience users have (and metrics that quantify it) when video streaming, they will potentially be able to improve the quality of experience of their customers.
###### Seif Ibrahim
The motivation for this problem is to help ISPs or network admins to get feedback on the Quality of Experience that the end users are experiencing so that they can better tune the network and improve the user experience. This problem is important because data streams for all kinds of applications are becoming encrypted so there is more need for this type of inference.
###### Aaron Jimenez
As video streaming data is now encrypted, ISPs and other network operators can no longer perform deep packet inspection to determine the quality of video streaming to optimize network conditions for this traffic. In addition, different streaming services may behave differently, and so being able to identify multiple different services can be beneficial. This is important because a degradation in streaming quality can reflect badly on the service provider, as much as the content provider, thus, lowering the customer’s quality of experience.
###### Arjun Prakash
Determining the quality of service would help the ISPs in optimizing the video content delivery. And since the contents are encrypted it makes it difficult for ISPs to get the required information for optimizing the content delivery. Hence inferring such useful metrics from encrypted traffic is important.
###### Deept Mahendiratta
Video streaming traffic is prominent traffic on the internet these days. Hence understanding the quality of experience users are having will help ISPs serve the users better.
###### Shubham Talbar
Optimizing video delivery depends on the ability to determine the quality of the video stream that a user receives. Video content providers have direct access to video quality from client software, but Internet Service Providers (ISPs) have to infer video quality from traffic as it passes through the network. Since end-to-end encryption is common as a result of HTTPS and QUIC, ISPs cannot directly infer video quality metrics.
###### Satyam Awasthi
For optimal network delivery in video streaming, the ISP, like the streaming companies (who can simply collect the essential statistics directly from clients at the application level), should have an idea of the quality of the video streaming, so that they can understand which aspects of the service they should improve. It would also allow customers to infer the best network plan for streaming, since higher throughput (more expensive plans) beyond a threshold yields low returns.
###### Punnal Ismail Khan
It is important for ISPs to infer the quality of encrypted video streams to better optimize data delivery.
###### Nikunj Baid
Video stream services have encrypted traffic, which makes it difficult for ISPs to extract and infer the network quality metrics, which hinders their ability to deliver good performance to their users. With the increasing volume of video traffic in the network, it makes sense for ISPs to be able to infer these metrics. Also, since different services have their own protocols in terms of buffer size, segment delivery, etc., it would be nice to have a single model that could handle traffic from as many services as possible.
###### Ajit Jadhav
Since video streaming makes up the bulk of internet traffic and ISPs don't have direct access to video quality metrics, it is important for ISPs to be able to infer these metrics so they can optimize video delivery to users for better quality and efficiency.
###### Alan Roddick
This problem is important because it is a subcategory of a bigger issue, which is classification of encrypted internet traffic. As encryption becomes more widely used, it is important to find good inference tools that aid in traffic analysis. For ISPs, it is important for them to understand the QoE for their users in order to enhance the overall experience.
###### Vinothini Gunasekaran
With the increased video streaming traffic in today’s internet, it is necessary to focus on content delivery optimization, which depends on the video streaming quality. Since the video content is end to end encrypted, ISPs can not directly observe the video quality metrics. If we develop more generalized models that could identify video sessions accurately from mixed traffic and if we could apply them to different video streaming services, ISPs could perform better content delivery optimization.
###### Pranjali Jain
Inferring the quality of streaming video applications is important for Internet service providers. ISPs need video quality information of the video stream that the user received in order to make optimization decisions for video delivery. However, the end-to-end encryption of popular video streaming applications poses a challenge for ISPs. Thus, it is necessary to implement mechanisms to infer video quality metrics from properties of network traffic that are directly observable.
###### Nawel Alioua
Inferring video quality is important for service providers to adjust the quality of their service to different network conditions and optimize the user’s QoE.
###### Achintya Desai
The video quality inference from video streams is an important metric for content providers as well as internet service providers. ISPs use this data to perform better content delivery optimization. On the other hand, service providers can also understand what video quality is more preferred and feasible by the end users. Additionally, most of the traffic over internet is increasingly becoming encrypted. This makes the problem of inferring video quality over encrypted traffic worth pursuing.
###### Liu Kurafeeva
QoE affects both long-term and short-term decisions, but the "people in the middle" do not have access to the user's experience or to the raw traffic.
#### How is it curating the QoE dataset? What are the fundamental challenges in the data-collection process? What are the limitations of this dataset?
###### Nagarjun Avaraddy
The QoE dataset is curated using a Google Chrome extension that monitors and assigns the video quality metrics as seen by the client.
* Netflix: parsing the overlay text that Netflix displays when a keystroke combination is input by the client.
* YouTube: YouTube's iFrame API to collect metadata for the video; the HTML5 video tag is also parsed.
* Twitch and Amazon: extend the YouTube approach and parse the HTML5 video tag.
The challenges lie in the data transmission rates and the extension's ability to process them. The other challenge is that this approach is not extensible, and the dataset may have different uncaptured features for different services because the client-server interactions differ for each service.
###### Brian Chen
The paper curates the dataset by using a Chrome extension that automatically labels the traffic traces with the corresponding video quality. Data was collected across a 14-month window from 66 different devices. Collecting the data at a sufficient granularity while still being practical is one challenge. Another is detecting the start and end of a video in the presence of cross-traffic. One limitation of the dataset is the scope of coverage. While the 66 devices were probably spread far apart, with some even in France, it is undeniable that 66 points of coverage are not sufficient to form a proper dataset at this scope. Perhaps if the scope were narrowed then the dataset might be more reliable.
###### Navya Battula
The paper makes use of 6 datasets - Netflix, Amazon, YouTube and Twitch, plus two combined datasets - one that makes use of all services and one that makes use of three out of four services. Labeling of the dataset is done using a Google Chrome extension which monitors the application-level information for these four services. The basic challenges in the data collection process could arise from client-side mismatches in the collection infrastructure and discrepancies in the Google Chrome extension.
###### Samridhi Maheshwari
The authors use a Chrome extension to get data from the four services. This Chrome extension allowed the authors to assign video quality metrics to each stream as seen by the client. They ran different collection methods for the different services, like using overlay text for Netflix, the iFrame API for YouTube, and HTML5 tag parsing for Amazon & Twitch. They used a combination of home and lab networks to collect data. Even though the data collection technique focuses on spreading the types of devices used for data collection, and the videos used are random, there is a possibility that the data collected can lead to underfitting, i.e., it can lead to many data points of the same type.
###### Apoorva Jakalannanavar
The QoE data is collected using 4 video streaming applications (YouTube, Netflix, Amazon, and Twitch) and for 2 ground-truth labels (startup delay and video resolution). They utilized 11 machines in a lab with an emulated environment and 6 machines on real-world networks to generate video traffic and collect raw packet traces, and used a Chrome extension for gathering the QoE metrics. To capture various network conditions, they emulated network conditions by varying the capacity from 50 kbps to 30 Mbps and introducing loss rates between 0% and 1% and additional latency between 0 ms and 30 ms. Although they added diversity to the data using emulated network conditions, these might not serve as good data points since they do not represent real-world network conditions. Also, although they collected data for around 13,000 video sessions, it is collected from 4 different video streaming applications for 5 resolutions each, resulting in less data for every individual setting.
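A minimal sketch, assuming Linux tc/netem driven from Python, of the kind of per-session network emulation described above (varying capacity, loss, and added latency). The interface name `eth0` and the exact parameter ranges are assumptions for illustration, not the authors' tooling.

```python
import random
import subprocess

IFACE = "eth0"  # hypothetical network interface on the collection machine

def set_conditions(rate_kbit: int, loss_pct: float, delay_ms: int) -> None:
    """Replace the root qdisc with a netem qdisc shaping rate, loss, and delay."""
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", IFACE, "root", "netem",
         "rate", f"{rate_kbit}kbit",
         "loss", f"{loss_pct}%",
         "delay", f"{delay_ms}ms"],
        check=True,
    )

def clear_conditions() -> None:
    """Remove the emulated impairment and return to the default qdisc."""
    subprocess.run(["tc", "qdisc", "del", "dev", IFACE, "root"], check=True)

if __name__ == "__main__":
    # One random condition per session, spanning roughly the ranges quoted above.
    set_conditions(rate_kbit=random.randint(50, 30000),
                   loss_pct=round(random.uniform(0.0, 1.0), 2),
                   delay_ms=random.randint(0, 30))
```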
###### Rhys Tracy
The paper made use of a Chrome extension to capture video quality information as well as a Chrome API to capture the network traffic. The authors captured data from 4 of the biggest streaming services: YouTube, Netflix, Amazon, and Twitch. One of the biggest challenges was the fact that video quality information had to be gathered differently for each service (i.e., parsing text, parsing HTML tags, or using an API). The dataset is limited by the fact that the labelled training dataset contains only 13,000 video sessions across 66 networks (an average of only about 200 sessions per network over 16 months).
###### Seif Ibrahim
The paper collects a dataset of labeled QoE data from YouTube, Netflix, Amazon, and Twitch. They are able to do this collection automatically using Chrome extensions and Selenium scripts. They did the data collection on 11 machines, 5 of which ran in a lab with simulated network conditions (e.g., varying bandwidth, latency, and packet loss) and 6 of which ran in a real network environment. They collected a total of 13,000 data points. One challenge with this dataset is that it was not diverse enough, which is why they needed to do domain adaptation by adding noise from the simulations in the lab.
###### Shereen Elsayed
The dataset covers YouTube, Netflix, Amazon and Twitch.
- Traffic traces were collected using a Chrome extension (supports any HTML5 video).
- The Chrome extension uses the Chrome WebRequest APIs to get all the information required to identify the start and the end of the video. In addition, the API exposes the HTTPS requests and responses during the video.
For each video streaming provider, there were specifics in implementing the curating algorithm:
- Netflix --> Parsing overlay text: data collected every 1 second. Statistics collected: player and buffer state information, including whether the player is playing or not, buffer levels (i.e., length of video present in the buffer), and the buffering resolution.
- YouTube --> iFrame API: data extracted periodically. They used HTML5 tags to know when the video starts or stops, due to either user interaction (e.g., pressing pause) or lack of available content in the buffer.
- Twitch and Amazon --> HTML5 tag parsing: the `<video>` tag was used to collect the required information (a rough sketch of this polling follows below).
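A minimal sketch, assuming Selenium with ChromeDriver, of how per-second playback state could be polled from the HTML5 `<video>` element as described above; the URL, the polled properties, and the fixed-length loop are illustrative stand-ins, not the authors' extension code.

```python
import time
from selenium import webdriver

driver = webdriver.Chrome()             # assumes chromedriver is on PATH
driver.get("https://www.twitch.tv/")    # hypothetical video page

POLL_JS = """
const v = document.querySelector('video');
if (!v) return null;
return {paused: v.paused,
        currentTime: v.currentTime,
        bufferedEnd: v.buffered.length ? v.buffered.end(v.buffered.length - 1) : 0,
        width: v.videoWidth,
        height: v.videoHeight};
"""

samples = []
for _ in range(10):                     # poll once per second, as described above
    samples.append(driver.execute_script(POLL_JS))
    time.sleep(1)
driver.quit()
print(samples)
```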
###### Aaron Jimenez
They curated the QoE dataset by measuring streaming traffic on 11 different devices (laptops and a desktop) over differing network connections (WiFi and wired) in different parts of the US and France. They used ChromeDriver to run the videos on YouTube (TCP and QUIC), Netflix, Twitch, and Amazon while recording the traffic using tcpdump and using a Chrome extension to capture QoE labels. They also tried to emulate varying network conditions by varying network capacity at different points. The challenges and limitations of this process and dataset are that they were captured in emulated environments that may not be completely one-to-one with real-world conditions. In addition, this dataset is limited by the fact that it randomly sampled certain videos on the different platforms, which means that it only captured a very small sample subset of the platforms' libraries.
###### Arjun Prakash
They created a Chrome extension to monitor application-level information from four major services - Netflix, YouTube, Twitch, and Amazon. The extension captures the information from Netflix by parsing the overlay text. YouTube stats were captured using the iFrame API. For Twitch and Amazon, they parsed the HTML5 tag to collect the required data. The data was collected from 66 homes, with only close to 20 homes in each speed tier. In reality, network conditions might be very different from these 20 homes, and the collected data may not generalize well.
###### Alan Roddick
The dataset comprises sessions from Netflix, YouTube, Amazon, and Twitch, plus one combined set with all services and one with sessions from three out of four services. The dataset was labeled from the Chrome browser using an extension that monitored application-level information. For Netflix, it parsed the overlay text that contains the video quality statistics. For YouTube, it used the iFrame API to extract player status information. For Twitch and Amazon, it parsed the HTML5 tags. The challenge in this data collection process is getting consistent information across different services. The limitation of this dataset is that we may not be getting a good representation of the entire distribution of network packets.
###### Deept Mahendiratta
To gather data from the four services, the authors use a chrome plugin. The authors were able to apply video quality metrics to each stream as perceived by the client using this Chrome extension. For each provider, they used different collection methods, such as overlay text for Netflix, iFrame API for YouTube, and HTML5 tag parsing for Amazon and Twitch. To acquire data, they used a combination of home and lab networks.
###### Shubham Talbar
In order to label traffic traces with appropriate video quality metrics the authors developed a Chrome extension that monitors application-level information for the four services. The extension supports any HTML 5-based video allowing a user to assign video quality metric to each stream as seen by the Client. The extension collects browsing history by parsing events available from Chrome Webrequest APIs. The collection is further tailored once the service is identified: Parsing overlay text for Netflix. Using YouTube iframe API to extract player status information and video resolution for YouTube. Use HTML 5 tag parsing for Twitch and Amazon.
Since the dataset is specifically collected for these four-services the model developed cannot be generalized for services that are not in the dataset.
###### Satyam Awasthi
Network traffic was measured on 11 different machines in varying network conditions (wired/wireless) in different home networks in the US and France. ChromeDriver was used to start the videos on YouTube, Netflix, Twitch, and Amazon; the network traffic was recorded using tcpdump, and QoE labels were captured using a Chrome extension.
The limitation of the dataset is that, since it was captured in emulated environments, it might not represent real-world network conditions.
###### Punnal Ismail Khan
The dataset is curated using a Chrome extension that monitors application-level information from Netflix, YouTube, Amazon, and Twitch. For Netflix, it parses overlay text; for YouTube, it uses the iFrame API; and for Twitch and Amazon, it parses the HTML5 tag to get the application-level information. This extension then allowed the authors to assign video quality metrics to each stream as seen by the client. They were using 60 devices in homes in the USA and 6 devices in France to collect the data. One limitation would be that the data collected from these 66 devices, while substantial, is still not generalizable enough to represent all network conditions.
###### Nikunj Baid
The QoE dataset here is being curated by using a Chrome extension that the authors developed, which monitors application-level information for the services under consideration: Netflix, Amazon Prime, YouTube, and Twitch. The extension is able to handle any HTML5-based video and assists with capturing the metrics across these services. After the service is identified by the extension, a data collection strategy is defined for each service individually,
such as parsing overlay text for Netflix, using the iFrame API for YouTube, etc.
This data was curated over a 14-month period using 66 different devices. Even though the authors tried to generalize the conditions, it is almost impossible to capture all the various possibilities that can arise w.r.t. the network conditions.
###### Ajit Jadhav
The authors developed and used a new chrome extension to monitor application-level information for four services (Netflix, Youtube, Amazon and Twitch). The challenge lies in the fact that collecting the QoE metrics is not straightforward enough on many platforms. Also, the dataset quantity seems to be a limitation and we could benefit from having more data.
###### Vinothini Gunasekaran
QoE Dataset: It has been collected for four video streaming services, Netflix, YouTube, Amazon and Twitch. They developed a chrome extension to monitor application level information for labeling the traffic traces. It uses the Chrome browser API to identify video pages and handles each service differently. 1) For Netflix, they collected player and buffer state information by parsing the overlay text. To parse the statistics information without impacting user experience, they injected the user-specific keystroke combination and rendered the text. 2) For YouTube, they collected player status information such as video resolution, available playback buffer and playing position. They used the YouTube iframe API to extract this information. 3) For Amazon and Twitch, they collected the same metrics as collected from YouTube by generalizing YouTube’s data collection method to rely on HTML5 tags.
Limitation: 1) More than half of the dataset has been collected in the lab environment using four laptops by introducing changes in network capacity, loss rates and latency. This may not be a good representation of the real deployment conditions. 2) Their YouTube data is heavily biased towards 360p resolutions, whereas all other video services were operated at diverse resolutions. This impacts the representativeness of the training dataset.
###### Pranjali Jain
The dataset is collected across a 16-month period, in 66 home networks in the United States and France, comprising a total of 216,173 video sessions and 13,000 labeled video sessions from 4 video streaming services. The traffic traces were labeled with the appropriate video quality metrics as seen by the client using a Chrome extension that monitors application-level information for the four services, and supports any HTML5-based video.
For Netflix, statistics are collected by parsing overlay text and include player and buffer state information. For YouTube, the iFrame API is used to extract player status information. Also, the video HTML tag is used to collect video playback statistics. For Twitch and Amazon, HTML5 video tag parsing is done to collect data about video playback and player status information.
While the dataset collects a good amount of QoE metrics for different video streaming services, a major limitation of this approach is that even after the data is collected in two countries and across 16 months, 66 home networks, it may still not be able to capture all real network conditions that are possible.
###### Nawel Alioua
The labeling of the application-level information for the four services (YouTube, Netflix, Twitch and Amazon) is done using a Chrome Extension. The data collection was done from different sites, in two cities in Europe and in the US. The extension collects browsing history by parsing events available from the Chrome WebRequest APIs which exposes all necessary information to identify the start and end of video sessions, as well as the HTTPS requests and responses for video segments. Then depending on the service, they used specific APIs to collect information: overlay text for Netflix, iframe API for YouTube, and HTML5 tag parsing for Amazon and Twitch. The authors mention that they filtered any session that experienced playing errors during the execution, which might be a problem since suboptimal network conditions would potentially not be represented in the dataset.
###### Achintya Desai
It uses a Chrome extension to monitor client video quality and infer quality metrics. This extension uses the WebRequest APIs to know when the video begins and ends. For Netflix, it parses overlay text every second to collect data. In this data collection stage, it collects information on whether the player is playing or paused, the current buffer level, and the resolution of the video. For YouTube, the extension uses the iFrame API, which periodically pulls the data. It uses HTML5 tags to understand the starting point and the pausing point of the video. For Twitch and Amazon, the `<video>` tag was used to collect the same information about the video as done for YouTube. A possible limitation of this dataset could be that all these data collection methods are susceptible to development changes to the content delivery services like YouTube, Twitch, etc. They may not work if there are major development changes. The paper also mentions encountering playback errors. These cases are more common when the network is in a bad state in terms of throughput. This prevents us from collecting data when the network is actually in a bad condition.
###### Liu Kurafeeva
The paper collects data from 4 different streaming services (YouTube, Netflix, Twitch and Amazon) and combines it into 6 datasets (2 of them combined). QoE estimation is made by collecting statistics from a Chrome extension. Eleven devices were used with different network situations in different areas. Data were collected for 18 months. The total number of data records is 13,000. The duration of each session was 8-12 minutes. Also, the number of measurements per laptop per day seems strange - maybe the devices were busy and used for general-purpose tasks. For 5 devices, network conditions were emulated (tc - [50 kb/s - 300 kb/s, 0-1% loss, 0-30 ms latency]).
Not all statistics can be truly representative for proper QoE estimation, and they also differ across the 4 described services as well as all the rest. Also, last-mile network conditions are not well represented, since the device locations are not dense.
#### How will you use PINOT to curate datasets considered in this paper? Will usage of PINOT address the limitations discussed above?
###### Nagarjun Avaraddy
We can define the set of features beforehand and collect them without relying on the client-based modular approach preferred here. This will remove the data schema mismatch between two different video streaming service providers.
###### Brian Chen
PINOT can be used in a nearly identical manner to the paper. By installing a Docker instance with the desired web extension, it would be possible to gather data from the different devices connected to PINOT. In this case, the scope of collection is small, so there should be a higher density of devices relative to area. This ought to make the dataset more reliable for this given area.
###### Samridhi Maheshwari
With PINOT, we can use it in the same way that authors have used their devices - deploying them in different areas and collecting data from the devices remotely. However, PINOT gives us more control over the devices, so we can monitor the videos being played more closely and make sure that we’re getting a varied input data set.
###### Aaron Jimenez
I would use PINOT to capture the data in a similar manner described above. However, I would make sure to arbitrarily add some noise and congestion to the network traffic by having the minions also perform other non-video related tasks during high network usage where it is located. In addition, by having a larger amount of minions, I can get a larger dataset. However, even this does not fully address the issues when it comes to full network condition emulation.
###### Apoorva Jakalannanavar
The data collection approach used in the paper can be scaled further using the PINOT infrastructure. Multiple minion devices can be deployed at various locations to better capture network variations. We can possibly make use of the same video streaming applications and the Chrome extensions to capture startup delay and resolution. This can help in overcoming the limitations of the paper in terms of data collection.
###### Rhys Tracy
PINOT can be used to distribute video sessions across minion devices making use of the same apis and chrome extensions from this paper (or using alternatives) to capture video information and network traffic. With uninterrupted use of PINOT, you can capture many, many video sessions in 16 months on all of the distributed minions, so with PINOT you likely will not see the same limitation with having few data points on each network.
###### Shereen Elsayed
The techniques discussed here are straightforward and can be applied on PINOT. Chrome extensions are already being used for curating YouTube QoE, and Twitch/Amazon seem to be straightforward without the need for a Chrome extension. The missing part that they didn't discuss is how they got accounts for Netflix.
###### Seif Ibrahim
We can use PINOT to do the data collection by just running the data collection software on the Raspberry Pi instead of the 11 devices they were using.
###### Deept Mahendiratta
We may utilize PINOT in the same way that authors have used their devices: by deploying them in various locations and remotely gathering data from them. PINOT, on the other hand, allows us more control over the devices, allowing us to keep a closer eye on the videos being played and ensure that we're obtaining a diverse range of input data.
###### Shubham Talbar
PINOT could be used to collect and curate dataset using the same extension that the authors developed and the same set of APIs for service specific configurations. The limitation of capturing a variety of video streams from other services could be solved using PINOT since it is deployed over a campus network where a variety of services are used by the diverse underlying crowd. Although the issue of labeling still remains since there are service level APIs used to identify a particular stream content.
###### Satyam Awasthi
A similar approach might be applied as discussed in the paper: use chrome extension to capture labels along with the same APIs. However, to create a more real-world-like session, variable traffic can be introduced by running random tasks on it like running youtube on autoplay and auto resolution on multiple tabs.
###### Punnal Ismail Khan
A similar technique can be used to collect data, i.e., running video streaming services on minions and using the Chrome extension to collect data. The data can be made more generalizable by using a lot of minions in different network settings.
###### Nikunj Baid
We can build a docker image that uses the same chrome extension and deploy it using PINOT on its constituent devices. Depending on the varying network conditions, we can collect data that is more generic. Also, PINOT will enable us to control the data curation process from one place, instead of independent installations in various homes across cities.
###### Ajit Jadhav
PINOT can be used for data collection in almost the same way that the paper used for their data collection. PINOT gives us greater control, thus allowing us to collect data in varying network conditions which can lead to a higher quality dataset.
###### Alan Roddick
We can use PINOT to curate the dataset in a similar way to what was done in the paper. We can be more selective in the periods where we collect data in order to better capture the entire distribution of the domain.
###### Pranjali Jain
PINOT can be used to create the video streaming applications dataset in the same way as described in the paper. PINOT faces similar limitations in capturing different network conditions. However, PINOT can be extended to have more RasPis, so it has the potential to be more flexible and generalizable as a network traffic data collection system.
###### Vinothini Gunasekaran
We can use PINOT in a similar way as described in the paper. We could expand the number of devices used throughout the campus which would simulate various network conditions. Even then, it would still lack the real deployment conditions.
###### Nawel Alioua
PINOT can be used to curate a similar dataset to the one the authors describe, collecting data from several locations. Since PINOT uses smaller devices, it can potentially allow more flexibility in terms of collection sites and greater diversity of the data collected.
###### Achintya Desai
It is easy to see that PINOT can easily be compatible in this case, where we can assign one PINOT slave to play the video (also attempting to collect the data using similar extensions) and another PINOT slave to measure the network condition. This could help us collect data even if there are playback issues and will help us understand the root cause of the playback issue as well, if needed.
###### Liu Kurafeeva
The PINOT usage would be much the same here, but PINOT has a dense device distribution across the last-mile network and a simple way to set tasks for devices and collect the data. Also, a PINOT-based dataset will be less affected by device-specific effects, since we can choose which (for example, nearby) devices we collect the data from.
#### What’s the precise learning problem considered in the paper? More concretely, what’s the input, what’s the output? What’s the model selection pipeline?
###### Nagarjun Avaraddy
The learning problem considered: given network flow features collected for encrypted video services along with the QoE labels of startup delay and resolution, build models that take the network features as input and predict startup delay and resolution, respectively.
Model selection for startup delay was mostly done in the regression space and its variants (regression).
Model selection for resolution was done over more classes of models, like AdaBoost, decision trees, random forest, and logistic regression (classification), with F1 score as the metric to balance out precision and recall.
There are also different models based on the composite data, service-specific data for each video streaming service, and different sets of features (network/application/transport layer).
###### Brian Chen
The input consists of different sets of features: network layer, transport layer, application layer, network and transport, network and application, and all. The input is further divided into six different datasets: Netflix, YouTube, Amazon, Twitch, all services, and one service excluded. In total, 32 models are trained, all outputting predictions for startup delay and resolution. These 32 models are then assessed based on the feature set used to identify the important feature sets. Afterwards, the dataset used is considered and composite is chosen for its representation of all services.
###### Samridhi Maheshwari
The learning problem in this paper is predicting two video quality metrics - startup delay and video resolution. The inputs used here are network-layer, transport-layer, and application-layer features that are collected from network traces, and the labels are derived from the Chrome browser extension. There are a total of 32 models that are trained for 6 different feature sets and 6 different datasets. The authors select models based on how the models perform on different feature sets, getting performance numbers for each model (e.g., RMSE for startup delay and precision/recall for resolution). Using these metrics they eliminate models which have a high error rate.
###### Aaron Jimenez
In this paper the learning problem being considered is the estimation of the QoE for four streaming services based on their network traffic patterns. The particular QoE metrics being predicted are startup delay and resolution, which are predicted based on different combinations of network, transport, and application-layer features. The authors decided to use two different models to predict the two QoE measures (both random forest). To select which model combination was best performing, they evaluated all the models and found that models that rely on network and application-layer features outperformed the other models.
###### Apoorva Jakalannanavar
The precise learning problem considered is inferring QoE metrics, startup delay and video resolution, by training a machine learning model on encrypted network packets. The inputs considered for modelling are manually engineered features from the network layer, transport layer, application layer, and their combinations. The model predicts startup delay and video resolution. For startup delay they considered regression-based modelling, and for video resolution prediction they considered classification-based modelling. They performed extensive hyperparameter tuning using grid search to obtain the best model performance based on F1 score and R2 score.
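A minimal sketch of the grid-search tuning mentioned above, scoring a random forest resolution classifier by macro-averaged F1. The synthetic feature matrix and the parameter grid are assumptions for illustration, not the authors' settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))          # placeholder traffic features
y = rng.integers(0, 5, size=500)        # placeholder resolution classes (240p..1080p)

param_grid = {"n_estimators": [100, 200, 500], "max_depth": [None, 10, 20]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, scoring="f1_macro", cv=10)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```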
###### Rhys Tracy
The input is the network traffic patterns and video information across different network layers. The output is a prediction of certain quality of experience metrics (startup delay and video resolution). The specific learning problem addressed is giving an ISP good estimates of QoE metrics using network traffic information (since it is hard to discern these metrics with encrypted traffic).
###### Shereen Elsayed
- Input: video traffic and packet traces
- Output:
- Model selection pipeline:
###### Arjun Prakash
The authors used features from the network layer, transport layer, application layer, network+application, network+transport, and all layers combined. They trained a total of 32 models with 6 different datasets and varying across these 6 feature sets. All these models were trained to infer the startup delay and resolution. For startup delay, they experimented with 5 different regression models, linear, ridge, SVR, decision tree regressor, and random forest regressor. For the resolution, they evaluated Adaboost, logistic regression, decision trees, and random forest. In both cases, the random forest was performing better.
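A minimal sketch, on synthetic placeholder data, of the model-family comparison described above: the regression candidates are scored by cross-validated RMSE for startup delay and the classification candidates by macro F1 for resolution; whichever scores best would be kept.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression, LogisticRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))              # placeholder traffic features
y_delay = rng.gamma(2.0, 2.0, size=600)     # placeholder startup delay (seconds)
y_res = rng.integers(0, 5, size=600)        # placeholder resolution class

regressors = {"linear": LinearRegression(), "ridge": Ridge(), "svr": SVR(),
              "dtree": DecisionTreeRegressor(), "rf": RandomForestRegressor()}
classifiers = {"adaboost": AdaBoostClassifier(),
               "logreg": LogisticRegression(max_iter=1000),
               "dtree": DecisionTreeClassifier(), "rf": RandomForestClassifier()}

for name, model in regressors.items():
    rmse = -cross_val_score(model, X, y_delay, cv=10,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"startup delay  {name:8s} RMSE={rmse:.2f}")

for name, model in classifiers.items():
    f1 = cross_val_score(model, X, y_res, cv=10, scoring="f1_macro").mean()
    print(f"resolution     {name:8s} F1={f1:.2f}")
```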
###### Navya Battula
The paper tries to infer the quality metrics of video streaming - startup delay and resolution. They make use of 6 datasets - Netflix, Prime, YouTube, Twitch, plus 2 combined datasets. They try to infer startup delay using regression techniques and the resolution by using a class of techniques like AdaBoost, decision trees, random forests, etc.
###### Deept Mahendiratta
Network layer, transport layer, application layer, network and transport, network and application, and all are among the features included in the input. Netflix, YouTube, Amazon, Twitch, all services, and one service excluded are all separated into six different datasets. There are 32 models in total that have been trained, and they all produce forecasts for startup delay and resolution. These 32 models are then evaluated using the feature set that was used to determine the most essential feature sets. Following that, the dataset is reviewed, and a composite is chosen to represent all services.
###### Shubham Talbar
The precise learning problem is to predict startup delay and resolution based on various input features. The authors considered network-layer features (Net), application-layer features (App), transport-layer features (Tran), as well as a combination of features from different layers: Net+Tran, Net+App and all layers combined (All). For each target quality metric, 32 different models were trained in total - varying across these six feature sets and using six different datasets. For each target quality metric 10-fold cross-validation was used. Sessions captured over UDP were ignored for models relying on transport-layer features.
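A minimal sketch of sweeping the six feature sets over the six datasets with 10-fold cross-validation, as described above. The file names, column names, and feature groupings are hypothetical placeholders, not the paper's actual feature definitions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

FEATURE_SETS = {
    "Net": ["down_throughput", "up_throughput", "pkt_count"],
    "Tran": ["retransmissions", "rtt_estimate", "flag_counts"],
    "App": ["segment_size_mean", "segment_interarrival", "segment_count"],
}
FEATURE_SETS["Net+Tran"] = FEATURE_SETS["Net"] + FEATURE_SETS["Tran"]
FEATURE_SETS["Net+App"] = FEATURE_SETS["Net"] + FEATURE_SETS["App"]
FEATURE_SETS["All"] = FEATURE_SETS["Net+Tran"] + FEATURE_SETS["App"]

DATASETS = ["netflix", "youtube", "amazon", "twitch", "all", "all_minus_one"]

results = {}
for ds in DATASETS:
    df = pd.read_csv(f"{ds}_sessions.csv")          # hypothetical labeled sessions
    for fs_name, cols in FEATURE_SETS.items():
        score = cross_val_score(RandomForestClassifier(), df[cols],
                                df["resolution"], cv=10, scoring="f1_macro").mean()
        results[(ds, fs_name)] = score

for (ds, fs_name), score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{ds:15s} {fs_name:10s} F1={score:.2f}")
```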
###### Satyam Awasthi
Inferring the video quality metrics - startup delay and video resolution for encrypted video services is the learning problem.
Inputs include the sets of network-layer features, transport-layer features, and application-layer features. There are a total of 32 models trained for 6 different feature-set combinations and 6 different datasets (Netflix, YouTube, Amazon, Twitch, all, and all-but-one [Netflix excluded]).
Model selection is based on how the models perform on different feature sets and evaluating scores for each model (like RMSE for video quality and precision-recall for resolution). Thus, high error rate models are eliminated.
###### Punnal Ismail Khan
The input features include network-layer features, transport-layer features, application-layer features, and combinations of these features (Net+Tran, Net+App, and all layers combined). The output of the model is startup delay and resolution. A total of 32 models are trained across the 6 different feature sets. Each model is evaluated on how it performs on the different feature sets, and the best models are selected.
###### Nikunj Baid
The learning problem here is QoE estimation, more specifically startup delay and video resolution. The input being used here includes combinations of various features from the network layer, transport layer, and application layer. These combinations were applied to different classifiers, using data from 6 different datasets. The feature-set/model combination that gave the best performance was used. Random forest came out as the winner for both use cases, as it gave higher precision and recall and lower false positive rates. The features that gave the best performance were from the network and application layers.
###### Ajit Jadhav
Network traces are used to get the network layer, transport layer and application layer features that are then used to predict start-up delay and video resolution from encrypted streaming video services. 32 different models are trained that vary across 6 feature sets with the data collected from 4 video streaming sources. For each task, their respective performance metrics are used to select the best models.
###### Alan Roddick
The learning problem is determining the quality metrics such as startup delay and resolution given a features in the Network Layer, Transport Layer, and Application Layer. They also combine the network and transport, and network and application layer features, and all layers combined. There were 32 models trained that were varied between the 6 different feature sets. In addition, the models were trained on six different datasets: Netflix, YouTube, Amazon, Twitch, all sessions combined, and sessions in 3 out of 4 services. They used a 10-fold cross validation to avoid overfitting the training dataset.
###### Pranjali Jain
The paper focuses on building models for the learning problem that tries to infer startup delay and resolution for video streaming applications.
Input is a set of features computed from the captured traffic at various levels of the network stack - network-layer features (Net), transport-layer features (Tran), and application-layer features (App). Combinations of these features are also used as input to models - Net+Tran, Net+App, and all layers combined.
For each target quality metric, 32 models are trained, varying across the 6 feature sets and using 6 different datasets.
The output of the models is predictions for QoE metrics - startup delay and resolution. For startup delay and resolution, the paper selects the model that gives the lowest RMSE and highest precision respectively.
###### Vinothini Gunasekaran
The paper analyzes the quality of streaming for different video streaming services under different network conditions. The model uses the traffic traces as input to predict the video quality for services. For model selection, they started with 32 models in total using six different feature sets and using six different datasets from four different services. For each target quality metric, they evaluated models using cross-validation and chose models that resulted in higher precision.
###### Nawel Alioua
The authors used different regression methods: linear, ridge, SVR, decision tree regressor, and random forest regressor. The model selected is random forests as it shows better performance metrics. Two types of models were trained: composite models (where data from all services is introduced) and specific models (proper to each of the services). The input is the curated dataset containing data from some or all of the four services, and the output is the prediction of the startup delay and the resolution. The composite model performs nearly as well as the specific models, but the constraint is that it needs to have data from all services included in the training set, i.e., it does not generalize well across new services.
###### Achintya Desai
The learning problem here is to predict the QoE based on the features gathered from an encrypted network flow.
The inputs are different features such as network-layer features (Net), transport-layer features (Tran), and application-layer features (App), as well as combinations of features from different layers. These features are collected from network traces, and labels are collected from the extension.
The output here is the estimated startup delay and one of the predicted resolution classes from 240p, 360p, 480p, 720p, and 1080p.
The authors trained 32 different models across 6 datasets and feature-set combinations. In the case of startup delay, they used ridge, linear, SVR, and decision tree regressors as regression models. In the case of resolution, they used AdaBoost, logistic regression, decision trees and random forest.
###### Liu Kurafeeva
Input: packet data features from different layers
Output: startup delay and resolution
Models: 32 different models over 6 datasets, with performance measurement for each model
#### Is this problem vulnerable to underspecification? Is the current approach capturing the underlying causal structure of the problem?
###### Nagarjun Avaraddy
Yes, this problem is vulnerable to underspecification. The problem arises due to the kind of data collected. The goal set out in the paper, to estimate QoE of encrypted video streaming services, is not achieved for other video services that are not in the data space. Even the composite model (trained without a given video streaming service) fails to predict the QoE of that service. It does not capture the underlying causal structure and would need more robust and generalized data to learn the structure.
###### Brian Chen
This problem is vulnerable to underspecification as it seeks to identify patterns among traffic flow. Without a broad enough sample, it is possible that the system might predict accurately only for a subset of locations and network conditions. The current approach seems to be capturing the underlying causal structure of the problem. By working with bitrate and individual video segments, it should be possible to predict resolution to a certain extent.
###### Samridhi Maheshwari
Yes, this problem is susceptible to underspecification since the data collected is random and can be specific to certain kinds of videos, all of which have similar feature sets in the network/app/transport layers. The current approach works well in capturing all important features and labels but does have a problem of being underspecified.
###### Navya Battula
The problem is vulnerable to the under specification problem in the sense that with only 13K flows, it is not possible to capture all the required network conditions and the dataset can essentially suffer from lack of appropriate features which could prompt in under specification problem.
###### Aaron Jimenez
The model is vulnerable to underspecification in the sense that while the model performs well for the streaming services it was trained with, when applied to other services it starts to fall apart. As such, this approach may be on the right path, but it is still not capturing the full underlying causal structure.
###### Apoorva Jakalannanavar
Although the authors have collected the data from 4 different video streaming providers for varying startup delays and video resolutions, the model still seems to be unable to learn the underlying patterns well enough to generalize to unseen video streaming applications. From Fig 7 and Fig 9, although the model performance appears high, as later highlighted by the authors, the models don't generalize well. It means that the current dataset collected is not completely representative, and hence the underspecification problem exists.
###### Rhys Tracy
This problem is certainly vulnerable to underspecification. As I detailed before, the training dataset only has 13,000 video sessions (over 66 networks); with all the features included in model inputs and wide variations in network traffic, it will be hard to create a well-specified, generalized model with this dataset (particularly for startup delay which happens once per video). The current approach does seem to capture most of the causal structure though: network burst rates and chunk sizes when streaming with DASH can certainly yield information on current video resolution, and other network traffic information can be used to estimate startup delay well.
###### Arjun Prakash
Yes, the problem is vulnerable to underspecification because the data was collected from just 66 homes, and in reality, the network conditions might be very different in different locations and the collected data does not generalize well. I believe it does not capture the causal structure and might need more data from varied network conditions for the model to learn and perform better.
###### Deept Mahendiratta
Yes, because the data collected is random and can be particular to certain types of movies with similar feature sets in the network/app/transport layers. The current method captures all important features and labels well, but it has the drawback of being underspecified.
###### Shubham Talbar
The problem is vulnerable to underspecification since the dataset collected only has videos from four video streaming services and cannot be generalized to features captured from other service providers. The current approach seems to capture the causal structure as it is using a variety of models over different combinations of the collected datasets.
###### Satyam Awasthi
Yes, since the training dataset only has 13,000 video sessions over 66 home networks. Given all the feature sets considered in the models and highly variable network conditions, this dataset does not yield a well-specified, generalized model for all possible streaming services, and not even completely for the given services.
###### Punnal Ismail Khan
This is vulnerable to underspecification. Although it works well for the tested streaming services it might not work for other services. Also, 66 devices used might not generalize to all network conditions.
###### Nikunj Baid
It seems like the given problem is indeed vulnerable to underspecification as the dataset just includes 13000 video sessions, which is probably not enough to highlight the plethora of scenarios that the network might have to face. Though it works for the given dataset, it might fail when applied to a large scale real-world dataset. Speaking of the causal structure, it does incorporate all the important features that would contribute to the QoE estimation for the given set of services. It is possible though that these features do not work for other streaming services out there.
###### Ajit Jadhav
The problem is vulnerable to underspecification due to the limited scope of the data collected. The data lacks variety because it was collected from a limited set of video streaming services and not across varying network conditions.
###### Alan Roddick
Yes, this problem is vulnerable to underspecification because the datasets given may not be representative of all possible examples that could arise. The features collected seem to be capturing the underlying structure of the problem due to the good performance shown from their results on the streaming services tested.
###### Pranjali Jain
This problem is vulnerable to underspecification since the labeled dataset used for model training is small (just 13,000 video sessions) and the data is not fully representative of different network conditions. Also, as mentioned in the paper, a truly general model is not developed with this approach, since the model cannot predict video quality for services that are not in the training set. The dataset does capture the causal structure of the problem since the models perform very well on their datasets.
###### Seif Ibrahim
This problem
###### Vinothini Gunasekaran
Yes. They have shown that their model performs well on their dataset of 13k videos, but since these were collected under manually manipulated network conditions, the model might not perform well for the many kinds of network problems seen in the real world. Considering how many other video service platforms are out there, the current performance, derived using only four video services, may not carry over to other services.
###### Nawel Alioua
The problem is vulnerable to underspecification, as the authors confirmed a drop in the performance of the composite model for a specific service as soon as that service's data is omitted from the training set. This clearly shows that the causal structure of the general problem tackled in this paper is difficult to capture across different services, which means that the more services we consider, the more service-specific data needs to be collected and curated.
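For illustration, a minimal leave-one-service-out check of this kind could look like the sketch below; the DataFrame `df`, its `service` and `resolution` columns, and the classifier choice are hypothetical stand-ins, not the paper's actual pipeline.

```python
# Sketch: leave-one-service-out evaluation to probe underspecification.
# `df` is a hypothetical per-session DataFrame with feature columns,
# a `service` column (e.g. netflix/youtube/amazon/twitch) and a
# `resolution` label; this is not the paper's actual pipeline.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def leave_one_service_out(df: pd.DataFrame, feature_cols, label_col="resolution"):
    scores = {}
    for held_out in df["service"].unique():
        train = df[df["service"] != held_out]
        test = df[df["service"] == held_out]
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(train[feature_cols], train[label_col])
        scores[held_out] = accuracy_score(test[label_col],
                                          model.predict(test[feature_cols]))
    # Markedly lower accuracy on a held-out service suggests the model
    # has not captured service-independent structure.
    return scores
```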
###### Achintya Desai
Because the data collection stage was not able to capture a wide range of network scenarios, this approach faces the problem of underspecification. The models used in this paper are already vulnerable to underspecification, and for the dataset described in the paper, the problem becomes worse. This makes the approach unable to capture the causal structure of the underlying problem.
###### Liu Kurafeeva
Yes, since emulation never guarantees good representation, because the space of possible feature values is so huge. Also, 6 real-life devices are not very representative, since they are part of different networks. The approach is good within its current limitations, but it can definitely lead to underspecification.
#### How is the paper analysing the feature importance? How is this approach different from different interpretability tools we discussed in previous lectures?
###### Nagarjun Avaraddy
The paper analyzes feature importance by dividing the features logically into Network-, Application-, and Transport-layer feature sets. It then trains models on combinations of these feature sets, and the resulting models' errors are used to rank the importance of the different feature sets. The decision-tree approach also gives a direct feature-importance measure as a corollary, in the form of the Gini index.
###### Brian Chen
The paper directly trains multiple models on different feature inputs and then compares the models on the validation data. In this manner, the paper determines which model is most accurate, and by extension which features provide the best results for this task. Instead of trying to explain the importance of certain features relative to each other for a specific model, this approach filters out the important features through the performance of multiple different models.
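A rough sketch of this selection-by-performance idea is shown below; the column groupings, data frames, and the regressor are hypothetical placeholders, not the paper's exact features or models.

```python
# Sketch: train one model per feature set (Net, App, Tran and combinations)
# and compare validation error, mirroring the selection-by-performance idea.
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical column groupings, not the paper's exact feature names.
NET_COLS = ["throughput_down", "packet_count", "byte_count"]
APP_COLS = ["segment_size", "segment_interarrival"]
TRAN_COLS = ["rtt_estimate", "retransmissions"]

FEATURE_SETS = {
    "Net": NET_COLS,
    "App": APP_COLS,
    "Tran": TRAN_COLS,
    "Net+App": NET_COLS + APP_COLS,
    "Net+Tran": NET_COLS + TRAN_COLS,
    "All": NET_COLS + APP_COLS + TRAN_COLS,
}

def compare_feature_sets(X_train, y_train, X_val, y_val):
    errors = {}
    for name, cols in FEATURE_SETS.items():
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(X_train[cols], y_train)
        errors[name] = mean_absolute_error(y_val, model.predict(X_val[cols]))
    # The feature set with the lowest validation error is deemed most useful.
    return errors
```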
###### Seif Ibrahim
This paper uses the Gini index for feature importance to determine which features are impacting the model the most whether it's in the network layer, transport layer, or application layer.
###### Samridhi Maheshwari
The paper uses the Gini index for feature importance in startup delay prediction. It also uses the error rates of different models to figure out which types of feature sets are more important (network-layer, application-layer, transport-layer, or composite features). This is different from traditional interpretability methods, where feature importance is found by removing features from the model and checking the model's output.
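For reference, a minimal sketch of Gini-based feature importance with a tree ensemble: scikit-learn's impurity-based `feature_importances_` reduces to Gini importance when the default Gini criterion is used. The data and feature names here are hypothetical.

```python
# Sketch: impurity-based (Gini) feature importance from a random forest.
# `X` (a DataFrame of features) and `y` (labels) are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def gini_importances(X: pd.DataFrame, y) -> pd.Series:
    model = RandomForestClassifier(n_estimators=100, criterion="gini",
                                   random_state=0)
    model.fit(X, y)
    # feature_importances_ is the mean decrease in Gini impurity per feature.
    return pd.Series(model.feature_importances_,
                     index=X.columns).sort_values(ascending=False)
```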
###### Aaron Jimenez
This paper is analyzing feature importance by measuring the error for different combinations of features (network, transport, and application-layer features) and selecting the features used in the models with the lowest error. This differs from other approaches which used tools such as AutoML to calculate the feature importance.
###### Apoorva Jakalannanavar
The paper uses Gini index to compute feature importances for various models trained with combinations of features selected from Network, Application and Transport layer. These feature importances are computed at global level post modelling.
###### Rhys Tracy
The paper analyzed average errors and the distribution of errors when using different feature sets as inputs. This gives some idea of which features yield better results (and so are likely more important). Additionally, the paper analyzes feature sets with the Gini index to give more information on feature importance. In contrast to the explainability approaches we've discussed previously, this paper actually trains the models on different sets of features and analyzes how their performance differs.
###### Arjun Prakash
The authors experimented with different combinations of features from the network, transport, and application layer and identified the important features based on the Gini index.
###### Deept Mahendiratta
For the purpose of predicting start-up delays, the paper employs the Gini Index. This differs from traditional interpretability models, which determine feature importance by deleting features from the model and examining the model's output.
###### Shubham Talbar
The authors considered network-layer features (Net), application-layer features (App), transport-layer features (Tran), as well as a combination of features from different layers: Net+Tran, Net+App and all layers combined (All). To further understand the effect of different types of features on the models, Gini indices were evaluated across different services.
This is in stark contrast with the AutoML tools we discussed prior in the class.
###### Satyam Awasthi
The paper uses the error rate of different models to figure out which types of feature sets are more important (Network, Transport, Application Layer, or composite features). It also uses Gini Index for feature importance for start-up delay prediction.
Traditional interpretability tools explain the importance of particular features relative to each other for a model, but this approach filters out the important features by comparing the performances of different models.
###### Punnal Ismail Khan
The authors are using different feature sets: network layer features, Transport layer features, application-layer features, and a combination of these features ( Net+Tran, Net+App, and all layers combined). Then, they select the features which give the lowest error in the model.
###### Nikunj Baid
The authors pick and choose features from the network/application/transport layers and use them with various classifiers. The Gini index is used to determine the importance of individual features, which are then selected for training the final model. This differs from earlier approaches because the Gini index directly gives the relative importance of features, instead of the elimination approach used in earlier methods, where different combinations are used to train the model and the subset giving the best performance is selected.
###### Ajit Jadhav
Models are trained using sets of features (network layer, transport layer and application layer). The paper uses Gini based index to understand the effect of different types of features on the models.
###### Alan Roddick
The paper analyzes feature importance based on the Gini index analyzed for different models trained on a different feature set. This differs from other methods such as automatically learning the feature importances with a neural network, or applying some auto machine learning to determine the best features to use.
###### Pranjali Jain
The paper uses Gini Index to understand the effect of different features on the models for startup delay prediction. For resolution prediction, it analyses the feature importance based on input features and the model performance in terms of precision and recall. This is different from interpretability tools that involve using decision trees to evaluate feature importance. Also, this approach uses model performance to evaluate feature importance instead of evaluating the benefits of one feature over the other.
###### Vinothini Gunasekaran
They analyze which features yield the highest precision in the model and choose the ones with consistently smaller errors. In this case, they chose a combination of features from the network and application layers and studied feature importance based on the Gini index across the different services.
###### Nawel Alioua
The authors quantify the feature importance using the Gini index.
###### Achintya Desai
The paper uses the Gini index for feature importance in predicting startup delay. This is different from the usual interpretability methods which, for a given model, suggest feature importance relative to other features. This paper instead evaluates features by removing feature sets and comparing model performance.
###### Liu Kurafeeva
The authors select different combinations of layers for the models and compare those models on a validation dataset. This approach is one of the traditional approaches in ML, but it is not very traditional for the networking area.
#### What domain adaptation technique did the paper use?
###### Nagarjun Avaraddy
Noise is introduced into the training data to account for the noise generated by practical monitoring, so that the training data resembles the data collected from deployment. Because the actual start time can fall anywhere within the five-second interval, the training data is preprocessed to artificially adjust each session's start time over a window of -5 s to +5 s from the actual start value, in increments of 0.5 seconds. For each new artificial start time, all metrics are recalculated based on this value for the entire session. This technique has two benefits: it makes the model more robust to noise, and it increases the volume of training data.
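A minimal sketch of this start-time-shifting augmentation follows; the session representation and the `compute_features` helper are hypothetical placeholders, not the paper's code.

```python
# Sketch: augment each session by shifting its assumed start time over
# [-5 s, +5 s] in 0.5 s steps and recomputing the features from the shifted
# start. `packets` (list of (timestamp, size) tuples) and `compute_features`
# are hypothetical placeholders.
import numpy as np

def augment_session(packets, true_start, compute_features,
                    lo=-5.0, hi=5.0, step=0.5):
    augmented = []
    for offset in np.arange(lo, hi + step, step):
        shifted_start = true_start + offset
        # Keep only traffic at or after the shifted start, then recompute
        # every metric relative to that new start time.
        kept = [(t, size) for (t, size) in packets if t >= shifted_start]
        augmented.append(compute_features(kept, shifted_start))
    return augmented  # one feature vector per artificial start time
```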
###### Brian Chen
The paper adjusted their collected data by shifting a specific data point and then recalculating from there. Specifically, the paper adjusted the start time by increments of 0.5 seconds from the range of -5 to +5.
###### Samridhi Maheshwari
The authors adjusted the start time of each session by +/-5 seconds in increments of 0.5 seconds and recalculated all other metrics, to introduce noise into the system and make it more robust. This also increased the volume of the training data.
###### Navya Battula
The paper makes use of a domain adaptation technique to better emulate real-world network conditions. They accomplish this by introducing a little noise, shifting each session's start time (by up to +/-5 s, to be precise) to recreate the real-world conditions under which the readings would actually be taken.
###### Seif Ibrahim
The authors added noise to the dataset by putting five of their eleven machines in a simulated lab environment where they varied throughput, latency and packet loss.
###### Aaron Jimenez
The authors tried to emulate network noise by introducing artificial network noise to the dataset when performing lab collection. One of the things they did was they adjusted the start time of the capture session + 5 or - 5 seconds in order to account for the 5 second window where an actual capture session may start.
###### Apoorva Jakalannanavar
Domain adaptation is used in the paper to introduce noise in the dataset collected from the controlled lab setting so that the training data more closely resembles the data collected from deployment. To do this, the start time of each session is adjusted over a window of -5 and +5 seconds from actual start value in increments of 0.5s. This technique makes the model more robust to noise and also increases the size of training data.
###### Rhys Tracy
The paper involved simulation of some noise in the system by adjusting the start time by 0.5 second intervals (from -5s to 5s).
###### Arjun Prakash
They introduced noise into the training data so that it closely resembles the data collected from deployment. They pre-processed the training data and artificially adjusted each session start time over a window of -5 seconds to +5 seconds from the actual start value in increments of 0.5 seconds. This technique makes the model more robust to noise, and it increases the volume of training data.
###### Deept Mahendiratta
To add noise into the system and make it more resilient, the authors shifted the start time of each session by up to +/-5 seconds in increments of 0.5 seconds and recalculated all other measures accordingly.
###### Shubham Talbar
In order to ensure that the training data collected resembles the data collected from deployment - The authors preprocessed training data and artificially adjusted each session start time over a window of -5 seconds to +5 seconds from the actual start value in increments of 0.5 seconds.
###### Satyam Awasthi
Because the actual start time can fall anywhere within the 5-second interval, the start time of each session was adjusted in increments of 0.5 seconds over the window of -5 to +5 seconds, and all the metrics were recalculated. This makes the model more robust to noise and also increases the volume of training data.
###### Punnal Ismail Khan
They pre-processed the training data by artificially adjusting each session start time over a window of -5 seconds to +5 seconds from the actual start value in increments of 0.5 seconds. For all the new start times they recalculated all metrics based on this value.
###### Nikunj Baid
To curate a more generic dataset with noise, the start time of each session was explicitly adjusted within a window of +/-5 seconds in increments of 0.5 seconds, and the other measures were recomputed accordingly.
###### Ajit Jadhav
Robustness and data augmentation are both achieved by pre-processing the data to include noise by artificially adjusting the session start time at 0.5 second increments in the range of -5 to +5 seconds.
###### Alan Roddick
The authors introduced noise to account for the real world differences from the lab setting. They modified the start time by +-5 seconds in increments of 0.5 seconds. Then they recalculated the other metrics based on the new value for the session. This allows the model to become more robust to noise and increases the amount of samples in the training set.
###### Pranjali Jain
In order to account for noise in data from real deployments, the paper introduces noise in the training data. This is done by preprocessing the training data to have the start time of each session adjusted over a window of -5 to +5 seconds from the actual start value in increments of 0.5 seconds. For each start time calculated like this, they also calculated the metrics based on this value for each session.
###### Vinothini Gunasekaran
The data collected from the lab environment lacks the noise that is present in practical environments. Since their model involves accounting for additional noise, they introduce the noise manually into the training data to make the lab-collected-data more closely resemble the data collected from real deployment. They do this by adjusting each session start time with +/-5 seconds and recalculating all metrics.
###### Nawel Alioua
The adaptation technique used is introducing noise into the training data so that the training data more closely resembles the data collected from deployment. The training data was thus pre-processed and the start time of each session was artificially adjusted over a window of -5 seconds to +5 seconds from the actual start value in increments of 0.5 seconds.
###### Achintya Desai
To emulate real network conditions, which are susceptible to noise, this paper adds noise to the training data by shifting each session's start time within a 5-second interval in increments of 0.5 seconds. The training data is preprocessed and adjusted according to this artificial noise, which makes the model more robust to noise.
###### Liu Kurafeeva
In the paper they adjusted the start time of each session to introduce noise, but it is not very clear why this specific change was suggested and how it adapts to different domains.
#### WALTER Figure 15 and Figure 17 in the paper.
###### Nagarjun Avaraddy
Figure 15 shows startup times for various network speeds for the different video streaming providers. The x-axis is network speed and the y-axis is startup time. The visualizations for the different services show different relationships between startup time and network speed; the startup delay stays more or less the same even at different network speeds. This non-intuitive result is explained by variations in capacity along the end-to-end path.
Figure 17 shows the relationship between resolution and network speed. The clear observation is that higher resolutions are seen a larger percentage of the time when the network speed is high.
###### Brian Chen
Figure 15 exists to visualize the range of startup times across different services and network speeds. The horizontal axis is network speed tiers grouped by Mbps, further divided into the different services. The vertical axis is the startup delay in seconds. There are a total of 16 box plots across the figure, each representing a different combination of service and speed tier, and all of them depict startup delay. Netflix and YouTube show no trend, but Amazon's delay increases and then decreases as tiers increase. Twitch's delay distribution widens as tiers increase, with no decrease in delay. Overall, the figure shows that startup delay does not decrease as one might predict when network speed increases.
Figure 17 visualizes the distribution of different qualities at different network speeds. The horizontal axis is again speed tiers grouped by Mbps and further divided by service. The vertical axis maps the distribution of video quality out of 100%. Here, each tier has five differently colored sections that correspond to the share of time a given quality occupies in that speed tier. Netflix's trend is confusing, as the proportion of the highest quality decreases while the middle quality increases. YouTube doesn't seem to have much of a trend at all. Amazon actually shows an increase in 1080p quality at higher tiers, but there is also an increase in the lowest quality at the 50-100 tier. Twitch has a trend that goes against intuition: the lower tiers seem to perform much better than the higher tiers. Ultimately, the figure displays how the different services respond to different network speeds, and some of the responses are not what one might expect.
###### Samridhi Maheshwari
Figure 15 - Figure 15 shows speed tiers on the x-axis and startup delay on the y-axis, depicted as box plots (candlesticks). The major observation from this figure is that even as speed tiers increase, the distribution of startup delay does not change by much; startup delay remains almost the same independent of speed tier.
Figure 17 - Figure 17 shows speed tiers on the x-axis and resolution on the y-axis, depicted as a stacked histogram. The major observation from this figure is that even as speed tiers increase, the distribution of higher resolutions does not change by much, except for Amazon and Twitch, which show some variance in resolution versus speed tier, especially at the higher speed tiers.
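As a purely illustrative sketch of these two plot styles (box plots of startup delay per tier and stacked resolution shares per tier), assuming a hypothetical per-session DataFrame with `speed_tier`, `startup_delay`, and `resolution` columns; the names and data are made up, not the paper's.

```python
# Illustrative only: the two plot styles described above, from a hypothetical
# per-session DataFrame `df` with columns speed_tier, startup_delay, resolution.
import matplotlib.pyplot as plt

RESOLUTIONS = ("240p", "360p", "480p", "720p", "1080p")

def plot_fig15_style(df, tiers):
    # Box plot of startup delay per speed tier (Figure 15 style).
    data = [df.loc[df["speed_tier"] == t, "startup_delay"] for t in tiers]
    plt.boxplot(data, labels=[str(t) for t in tiers])
    plt.xlabel("Speed tier (Mbps)")
    plt.ylabel("Startup delay (s)")
    plt.show()

def plot_fig17_style(df, tiers):
    # Stacked bars of the share of time at each resolution (Figure 17 style).
    bottoms = [0.0] * len(tiers)
    for res in RESOLUTIONS:
        shares = [100.0 * (df.loc[df["speed_tier"] == t, "resolution"] == res).mean()
                  for t in tiers]
        plt.bar([str(t) for t in tiers], shares, bottom=bottoms, label=res)
        bottoms = [b + s for b, s in zip(bottoms, shares)]
    plt.xlabel("Speed tier (Mbps)")
    plt.ylabel("Time at resolution (%)")
    plt.legend()
    plt.show()
```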
###### Aaron Jimenez
Figure 15 plots nominal home network speed against video startup delay. The x-axis represents the network speed tiers (Mbps), while the y-axis represents startup delay (seconds). Each box plot represents a specific speed tier. The graph shows that the median startup delay for each speed tier is largely the same across the services studied, regardless of increases in the network speed tier. The figure is meant to show that there are strongly diminishing returns in startup delay as one's home speed tier grows; in fact, for most users, regardless of tier, the startup delay will most likely remain the same.
Figure 17 plots nominal home network speed against the video resolution a user experiences. The x-axis represents the network speed tiers (Mbps), while the y-axis represents experienced resolution (percentage of time per resolution). Each bar represents a specific range of speed tiers and shows the percentage of time users experience a specific resolution (240p-1080p). The graph shows a similar trend to Figure 15: the nominal network speed tier does not play a huge role in determining video resolution. This shows that it is not network conditions alone that determine experienced resolution; other factors, such as a user's hardware, can also play a large role.
###### Apoorva Jakalannanavar
Figure 15 - Box plots between startup delay(y-axis) and nominal speed tiers(x-axis). This plot shows that the median startup delays for each service tend to be similar across the speed tiers. This is contrary to what we expect since higher speed tiers are more costly, but this study indicates that in reality a higher speed tier does not provide any benefit in terms of startup delay for video streaming applications.
Figure 17 - Plot between resolution(y-axis) and nominal speed tier(x-axis). This figure doesn't have a clear trend for nominal speed tiers. However, this figure can be used to infer that higher speed tiers do not guarantee higher resolution.
###### Rhys Tracy
Figure 15 - Shown is startup delay times as compared to network speed tiers across the different streaming services. There is no real trend across these graphs. The point of the figure is to show that speed tiers don't have the biggest influence on startup times either way, and startup delays can range from 0s to almost 20s across all services.
Figure 17 - Shown is video resolution as compared to network speed tiers across the different streaming services. Again there aren't really any consistent trends across the services. The only clear trend at all is that Twitch sees worse video quality as a whole when network speed tier increases. The point of the figure is to show that ISP speed tiers don't have much influence on video streaming resolution.
A big conclusion from both figures is that ISPs will need to address other factors in their networks to improve users' quality of experience when streaming videos (having users upgrade their speed tier most likely will not improve their quality of experience at all and can even make it worse in some cases).
###### Seif Ibrahim
Figure 15 shows how startup delay (y-axis) varies across the different speed tiers (x-axis). There is no clear trend other than the fact that the medians are around 5 or 6 seconds with a spread of about 2 seconds.
Figure 17 shows resolution versus the different speed tiers. The y-axis is resolution (e.g., 1080p) and the x-axis is the different speed tiers in Mbps. There is generally no trend across the tiers except for Twitch.
###### Arjun Prakash
Figure 15 shows us how the startup delay varies with an increase in nominal speed tiers. The y-axis represents the startup delay and the x-axis represents the nominal speeds. We can see that median startup delays tend to be similar across the subscription tiers. Netflix and YouTube achieve a median close to a 5-second startup delay across all tiers and Amazon has a startup delay of less than 6 seconds. Twitch’s startup delay differs by ± 2 seconds across tiers, but these plots exhibit no trend of decreasing startup delay as nominal speeds increase as one would expect.
Figure 17 shows resolution versus the nominal speed tier. The y-axis represents the experienced resolution and the x-axis represents the speed tiers. We can see a small increase in resolution with increased capacity for Netflix, but it is not as clear for YouTube and Amazon. The graph is almost constant for YouTube across different tiers. We can say that with increased speed tiers the experienced resolution remains roughly the same, so higher tiers offer no benefit.
###### Deept Mahendiratta
Figure 15 On the x axis, there are speed tiers, and on the y axis, there is a start-up delay. Candlesticks are used to represent the graph. The main takeaway from this graph is that, even when the speed tiers grow, the distribution of start-up delay does not change significantly.
Figure 17 On the x axis are speed tiers, while on the y axis is resolution. A stacked histogram is used to represent the graph. Except for Amazon and Twitch, which show considerable volatility in resolution vs speed levels, especially in higher speed tiers, when speed tiers increase, the distribution of higher resolutions does not change significantly.
###### Shubham Talbar
Figure 15 displays the Startup Delay Inference (y-axis) vs nominal Speed tier (x-axis). The unexpected result interpreted from the graph is that the median startup delay for each service is almost similar across the subscription tiers. One possible explanation for this anomaly is that the actual speeds vary considerably over time due to variations in available capacity along end-to-end paths due to diurnal traffic demands.
Figure 17 presents Resolution (y-axis) vs Nominal Speed Tier (x-axis). The different resolution metrics captured were 240p, 360p, 480p, 720p and 1080p highest from darkest blue to faintest blue. Netflix and YouTube had very consistent QoE across the variety of speed tiers in terms of the QoE metric distribution. Amazon’s metric distribution improved with the speed tier. But for Twitch the Resolution distribution for 1080p actually dropped with a better speed which seems unexpected.
###### Satyam Awasthi
Figure 15 shows the startup delay time inferred for different speed tiers in each streaming service (Netflix, Youtube, Amazon, Twitch). These graphs do not seem to vary much across the speed tiers. Thus, we conclude that speed tiers don’t have much influence on startup times. So, the startup delays can range from 0s to almost 20s across all services.
Figure 17 shows the video resolutions as compared for different speed tiers in each streaming service. Like the startup delay, here too there isn’t a significant variation across the speed tiers. However, Twitch seems to generally have poor video quality when the network speed tier increases.
This reveals that the speeds that consumers purchase from their ISPs have considerably diminishing returns with respect to video quality.
###### Punnal Ismail Khan
Figure 15 shows the relationship between startup delay and speed tier. The x-axis is the speed tier and the y-axis is the startup delay. The figure has box plots for Netflix, YouTube, Twitch, and Amazon across 4 different speed tiers. The startup delay stays approximately the same for all speed tiers.
Figure 17 shows the relationship between experienced resolution and speed tier. The x-axis is the speed tier and the y-axis is the experienced resolution. The resolution for Netflix and YouTube does not change much across speed tiers, but we see some variation with Amazon and Twitch. For Amazon, the share of 1080p increases with a higher speed tier. For Twitch, resolution decreases at the higher tiers, which is a bit odd.
###### Nikunj Baid
Figure 15 - It demonstrates the relation between nominal speed tier (x-axis) and startup delay inference (y-axis). One can observe that the median startup delays for the given services tend to be quite similar. This shows that startup delay is not directly related to the network speed tier, contrary to popular belief.
Figure 17 - It demonstrates the relation between nominal speed tier (x-axis) and video resolution (y-axis). Netflix's and YouTube's resolution was pretty much unaffected by the variation in speed tier, whereas Amazon's resolution only slightly improved with the increase in speed tier. Twitch, on the other hand, actually performed worse in terms of resolution at higher speed tiers.
From the trends shown in the graphs, it seems that an increase in network speed tier does not really affect the resolution of the video being played.
###### Ajit Jadhav
Fig. 15: The figure shows the range of startup times for all 4 services used across different speed tiers. The x-axis represents speed tiers (in mbps) and the y-axis represents startup delay (in seconds). From the figure, we can see that startup times don’t vary significantly based on the speed tiers.
Fig. 17: This figure presents graphs of resolution versus the nominal speed tier for the 4 streaming platforms. The x-axis represents speed tiers (in mbps) and the y-axis represents the percentage distribution of experienced resolutions. We observe higher resolutions with higher capacities though it is less evident for nominal speed tiers.
###### Alan Roddick
Figure 15 shows the startup delay based on speed tiers for the 4 different streaming platforms. The y axis is the startup delay in seconds and the x axis is the speed tiers in mbps. For Netflix and YouTube, the startup delay is very skewed to high startup delays, even though the majority of the startup delays are low. Amazon is also very skewed except for the 500 - 1000 mbps speed tier which did not have as many outliers. Twitch was very inconsistent depending on the speed tier. The lowest speed tiers had a larger percentage of samples with a startup delay close to 5 seconds, while the two fastest speed tiers had a larger percentage of samples with a higher startup delay.
Figure 17 shows the experienced resolution based on the speed tiers for the 4 streaming platforms. The y axis is the experienced resolution in percent and the x axis is the speed tier in mbps. The darkest blue is 240p, and then 360p, 480p, 720p, and 1080p as the colors get lighter. Netflix and YouTube have the most consistent QoE no matter what the speed tier category was. The largest number of samples had a resolution of 480p, and the smallest number of samples had a resolution of either 1080p or 360p. For Amazon and Twitch, the results were more unbalanced. Amazon’s results made the most sense, as the speed tier increased, so did the percentage of the samples that experienced higher resolution. For twitch, on the other hand, the lowest speed tier had the highest percentage of high resolution and the highest speed tier had negligible amount of samples with a high resolution. This is a very surprising result.
###### Vinothini Gunasekaran
Figure 15 shows the plot between startup delay (Y axis) and nominal speeds (X axis). In this figure, we can see that median startup delays are similar across the subscription tiers. YouTube and Amazon achieve startup delays of <5 sec and <6 sec respectively. Netflix and Twitch’s delays differ by +/-2 sec with the increasing nominal speeds.
Figure 17 shows the plot between resolution (Y axis) and nominal speed(X axis). The graph shows how the resolution changes over different speed tiers. Netflix gets the higher resolution when the capacity increases, but YouTube and Amazon’s trend is not consistent. Overall, the trend is not clear from this graph.
###### Pranjali Jain
Figure 15 shows the plot between Startup Delay Inference(y-axis) and Nominal Speed Tier(x-axis) for all four video streaming services. The main takeaway from this figure is that median startup delays for each service tend to be similar across the subscription tiers. We don’t see a trend of decreasing startup delay as nominal speeds increase, contrary to popular belief.
Figure 17 shows the plot between Resolution(y-axis) and Nominal Speed Tier(x-axis) for all four video streaming services. For Amazon and Twitch, there is an increased percentage of bins with higher resolution as capacity increases. Netflix and Youtube have a better resolution at higher speed tiers. However, the trend is less clear for all services in nominal speed tiers.
###### Nawel Alioua
Figure 15 presents box plots of startup delay versus nominal speeds, and shows that the median startup delay for each service is similar across the subscription tiers, especially for Netflix and YouTube. This can be explained by the variation of network conditions along the end-to-end path from servers to clients. Note that the amount of data collected in the highest tiers of Amazon and Twitch is small, so it might not be representative of the real trends for those two services.
Figure 17 presents the resolution versus the nominal speed tier for each of the four services. The figure shows a clear trend of increased percentage of bins with higher resolutions as capacity increases. Netflix and YouTube in the highest speed tier achieve about 40% more 1080p than in their lowest speed tiers.
###### Achintya Desai
Figure 15 plots speed tiers in Mbps (x) vs startup delay in seconds (y), with separate plots for each streaming service. It can be observed that the median startup delay is similar among all subscription (speed) tiers. YouTube and Amazon roughly achieve <6 seconds in all tiers. There is no trend of decreasing startup delay as the speed tier increases.
Figure 17 is similar to Figure 15 except it plots speed tiers in Mbps (x) vs resolution experienced in % (y). Except at the highest speed tier, Amazon and Twitch show an increased percentage of higher resolution as network conditions improve. In general, YouTube streams at a lower resolution than the other services under the same network conditions. However, this data seems unreliable. The authors also mention that video resolution relies on more than just network conditions: it depends on the availability of the video in higher resolution, content tailored to the device type, etc.
###### Liu Kurafeeva
Figure 15: speed tiers (x) vs startup delay (y). Startup delay is almost independent of speed tier. The result varies slightly across the different services.
Figure 17: speed tiers (x) vs resolution (y). For Amazon and Twitch the dependency is obvious, especially in the high speed tiers; for the other services the trend is not so clear.