# Pensieve Reviews
#### Question 1: What problem is this paper solving?
###### Jaber Daneshamooz
The paper tries to increase the quality of experience (QoE) of users of streaming platforms like YouTube. It discusses the shortcomings of current adaptive bitrate (ABR) algorithms used in the DASH protocol.
###### Shereen Elsayed
The current ABR algorithms do not achieve the optimal performance due to the assumptions and simplified models of the deployment environment.
###### Nagarjun Avaraddy
This paper proposes an RL-based ABR algorithm as an improvement over the existing state-of-the-art ABR algorithms, which are based on fixed control rules and susceptible to overfitting to a limited set of network conditions.
###### Aaron Jimenez
This paper is trying to create a new ABR system that is better able to predict future network conditions and provide a high QoE accordingly. The system they propose uses deep reinforcement learning to accomplish this. This is in contrast to other ABR solutions, which use fixed heuristics to determine requests such as video resolution.
###### Samridhi Maheshwari
This paper applies a reinforcement learning solution to the classical problem of ABR to optimize video quality and user experience. It uses a neural network model to select bitrates for future chunks based on observations collected by client video players. It does not rely on pre-programmed models or assumptions about the network environment, as earlier works do.
###### Liu Kurafeeva
This paper addresses the poor performance of current ABR algorithms for video streaming.
###### Achintya Desai
The paper addresses the difficulty that state-of-the-art ABR algorithms have in reaching optimal performance. This is a result of inaccurate models of the deployment environment based on assumptions about network throughput, playback buffer size, etc., which need to be known a priori and prevent the selected algorithm from scaling well to the general case.
###### Pranjali Jain
ABR algorithms are used to optimize QoE on the client side. The majority of existing ABR algorithms use fixed rules or heuristics to make bitrate decisions. These schemes require significant tuning and do not generalize to different network conditions and QoE objectives. This paper proposes an RL-based approach to generate ABR algorithms that can adapt to a wide range of network environments and QoE metrics.
###### Alan Roddick
The paper is trying to solve the problem that many ABR algorithms are unable to achieve high performance when given network conditions that differ from those they were designed and tuned for. This is because the models are built from a distribution of data that can differ drastically from the full distribution seen in deployment.
###### Arjun Prakash
The paper proposes a system that generates an ABR algorithm for predicting bitrates for future video chunks using neural network based reinforcement learning.
###### Rhys Tracy
This paper attempts to improve on current adaptive bitrate algorithms by using reinforcement learning with QoE based rewards.
###### Fahed Abudayyeh
Pensieve aims to solve the problem that previous ABR algorithms are tailored for a specific set of network conditions, resulting in poor performance when network conditions differ from the dev environment.
###### Shubham Talbar
Client-side video players employ adaptive bitrate (ABR) algorithms to optimize user quality of experience (QoE). Existing state-of-the-art ABR algorithms suffer from a key limitation: they use fixed control rules based on simplified models. These schemes fail to perform well across a broad set of network conditions. Hence, the paper tries to solve this problem with a new system called Pensieve, a deep reinforcement learning based ABR algorithm.
###### Seif Ibrahim
This paper solves the problem of choosing an ABR algorithm for videos that can generalize to many different network conditions while still giving good performance. Part of the problem is that preset algorithms only take into account the development environment and do not generalize well.
###### Vinothini Gunasekaran
Since the state-of-the-art ABR algorithms use fixed rule based models, they fail to provide optimal performance in a wide range of network conditions and QoE objectives. So, this paper proposes a system that automatically learns ABR algorithms that adapt to various network environments using Reinforcement Learning.
###### Satyam Awasthi
The current ABR algorithms are based on throughput prediction, which is hard under uncertain network conditions; they rely on an oversimplified model of the deployment environment and cannot achieve optimal performance. The paper tries to solve the problem by using an RL-based solution instead.
###### Punnal Ismail Khan
This paper presents a model that can predict the bitrate for video streaming better than the existing heuristic-based approaches.
###### Nikunj Baid
The existing schemes used to determine the bitrate for video streaming applications use fixed rules and are unable to adapt well to network conditions.
The paper proposes a deep learning solution using RL to tackle this problem and enable clients to make smarter decisions based on dynamic network conditions.
###### Ajit Jadhav
The paper presents a reinforcement learning based solution to make ABR decisions for adapting to a wide range of environments and QoE metrics.
###### Brian Chen
This paper is trying to create a machine learning solution to the problem of adaptive bitrate selection. The motivation is that network conditions are not necessarily static and may not be properly handled by heuristic-based adaptive bitrate algorithms. By using a machine learning model that learns from the results of past decisions, this paper seeks to address this inflexibility.
###### Nawel Alioua
The paper addresses a key limitation in state-of-the-art ABR algorithms: the use of fixed control rules based on simplified or inaccurate models of the deployment environment.
#### Question 2: Why is that problem important?
###### Jaber Daneshamooz
Because if users experience low QoE, they will not watch the content, and the streaming company loses revenue and customers. Experiments show that if a video pauses for 2 seconds, users tend to abandon it.
###### Shereen Elsayed
ABR is designed to give a better user experience while watching a video. Recent studies showed that users close the video if the quality is not good, e.g., it buffers a lot or has bad resolution.
###### Aaron Jimenez
This problem is important because current heuristic-based ABR systems tend to emphasize a specific metric to base their estimations on. This can lead to issues when the network experiences periods of instability that the system may not be able to account for, thus lowering the user's QoE.
###### Samridhi Maheshwari
Recent years have seen a rapid increase in the volume of HTTP-based video streaming traffic. ABR algorithms are the primary tool that content providers use to optimize video quality. The majority of existing ABR algorithms develop fixed control rules for making bitrate decisions based on estimated network throughput, playback buffer size, or a combination of the two signals. These schemes require significant tuning and do not generalize to different network conditions and QoE objectives. Thus, building a system that generalizes well and does not make assumptions about the network conditions is important.
###### Nagarjun Avaraddy
This problem is important because improvements in ABR yield a good QoE for consumers, which leads to increased revenue for video streaming services. The existing state-of-the-art ABR algorithms have fixed control rules that do not generalize well to different network conditions. Employing RL and a neural network to model the algorithm makes the solution more robust.
###### Liu Kurafeeva
QoE for users of video streaming has become a very important question: people tend to leave a streaming platform if the QoE is not good enough. Current algorithms are based on heuristics, which work poorly in general cases and very poorly in specific cases.
###### Achintya Desai
Unlike problems that deal with the cost of deploying an ML model, this problem matters directly to the end user, because the performance of the ABR algorithm has a direct and significant impact on the video streaming experience. A state-of-the-art ABR algorithm might demonstrate the best result in theory under certain assumptions; however, this does not translate well in practice under generalized network scenarios.
###### Pranjali Jain
The popularity of video streaming traffic and users' demand for high video quality make ABR algorithms an important tool that content providers can use to optimize video quality. It is useful to create a system that does not rely on pre-programmed models or assumptions while selecting bitrates and that generalizes well to different network conditions and QoE metrics.
###### Alan Roddick
This problem is important because low QoE can greatly impact a user’s satisfaction, and therefore a company’s success. If ABR continually underperforms for a user with different network conditions, they may get fed up and stop using the service.
###### Arjun Prakash
It is important because it helps in improving the quality of experience for the user. Without good quality, the content providers might lose their customers.
###### Fahed Abudayyeh
It is important to adapt ABR to a variety of network conditions when the goal is to reach as many clients as possible. Optimizing the QoE for all types of network conditions is crucial for important streaming services that should be accessible to people on any network.
###### Rhys Tracy
ABR definitely plays a large role in an end user’s overall quality of experience, so improvements to the adaptive bitrate algorithm used can have large impacts on a user’s quality of experience.
###### Shubham Talbar
In the past few years there has been a concurrent rise in the volume of HTTP-based video streaming traffic and in user demands for video quality. Many users quickly abandon video sessions due to poor quality, which leads to a significant loss in revenue for content providers. ABR algorithms are the primary tool that content providers use to optimize video quality. Hence, developing a better ABR algorithm that adapts to clients' network conditions could be extremely beneficial to content providers in improving their users' QoE.
###### Seif Ibrahim
This problem is important because improving ABR algorithms will improve QoE for users which will result in smoother video playback and more user satisfaction.
###### Satyam Awasthi
It is crucial for video streaming service providers to ensure that their users have a high QoE under the given network conditions. For example, experiments show that if a video pauses for 2 seconds, there is a high chance the user will leave the platform.
###### Vinothini Gunasekaran
Due to the increase in HTTP-based video streaming traffic, user demand for video quality has also increased. It is important for content providers to deliver high quality video to their users' satisfaction. Also, the existing ABR algorithms do not provide optimal performance over a wide range of network conditions. So, it makes sense to have a more flexible and effective model that dynamically learns a control policy for the various characteristics of the network.
###### Punnal Ismail Khan
This is important because if the bitrate is not being predicted accurately, this will lead to bad QoE for users and any video streaming company with bad QoE will lose customers.
###### Nikunj Baid
The problem is important because content providers incur immense losses when the QoE is not good for end users, and the authors propose a solution to improve QoE that performs better than the state-of-the-art ABR algorithms.
###### Ajit Jadhav
The existing approaches use fixed control rules based on simplified or inaccurate models of the deployment environment, leading to inefficient ABR control.
###### Brian Chen
User experience can directly lead users to either appreciate or spurn a service. Hence, if an efficient and more accurate quality predictor can be created, then there is money to be gained. Essentially, the problem relates directly to user retention and customer base satisfaction.
###### Nawel Alioua
ABR algorithms that are based on fixed rules fail to achieve optimal performance across a broad set of network conditions and QoE objectives, which calls for a more flexible and adaptive approach.
#### Question 3: Describe the four practical challenges in designing a good ABR algorithm.
###### Jaber Daneshamooz
First, past throughput observations cannot predict what will happen in the future (network throughput is variable and uncertain). Second, we have conflicting QoE goals like bitrate, rebuffering time, and smoothness. Also, the decision for the next chunk can affect decisions in the future (cascading effects of decisions). Finally, they need to take the coarse-grained nature of ABR decisions into account.
###### Aaron Jimenez
There are four main challenges that a designer must take into account when designing an ABR system to select the right bitrate. They must take into account network throughput variability, various conflicting QoE requirements (high bitrate, resolution, rebuffering events, etc.), the cascading effect of selecting a specific bitrate (a high bitrate may drain the buffer and force a rebuffering event in the near future), and the coarse-grained nature of the control decisions an ABR algorithm must make.
###### Shereen Elsayed
The four challenges are:
- The lack of consistency for network throughput
- Conflict between the video QoE requirements
- Consequences of bitrate decisions
- Coarse-grained nature of ABR decisions
###### Samridhi Maheshwari
- Changing networking conditions - network conditions can change over time and vary across environments. For example, on time varying cellular links, throughput prediction is inaccurate and does not account for fluctuations in the bandwidth. More stable inputs like buffer occupancy need to be prioritised.
- Balancing QoE goals - Often, requesting higher quality with limited bandwidth can increase rebuffering rates. On the other hand, choosing the highest bitrate that the network can support at that time can result in poorer quality. Hence, this balance needs to be carefully decided.
- Cascading effects of selected bitrate - selecting a high bitrate may deplete the playback buffer and force subsequent chunks to be downloaded at low bitrates to avoid rebuffering. A given bitrate selection will directly influence the next decision when quality is considered.
- Decisions are coarse grained: the estimated throughput may fall above or below the available bitrates, forcing a choice that results either in rebuffering risk or in lower quality at times.
###### Nagarjun Avaraddy
The four challenges to design a good ABR algorithm are
- The changing network conditions, which lead to unstable throughput predictions and, in turn, to prioritizing buffer occupancy.
- The variety of QoE metrics to achieve; the balance between bitrate quality, throughput, and buffer occupancy is not trivial to strike and requires dynamic fine-tuning.
- The consequences of the ABR decision output; a decision for a particular bitrate will affect the next set of chunks, as the chosen bitrate might be too demanding or too conservative for the current network conditions.
- The output space being coarse grained - this leaves only a few classes to predict, and the choice between two adjacent bitrates is again a tradeoff between better QoE and the risk of rebuffering under the network conditions.
###### Liu Kurafeeva
- Sudden fluctuations in network bandwidth (very common for mobile networks)
- Conflict between optimization goals (maximizing quality, minimizing rebuffering events, etc.)
- Risk management when bitrate selection is limited
- Cascading effects of ABR decisions - when one decision is wrong, we need to adapt to its consequences as well
###### Alan Roddick
1. The variability of network throughput
2. Conflicting video QoE requirements
3. Cascading effects of bitrate decisions
4. Coarse-grained nature of ABR decisions
###### Arjun Prakash
- Variability of network throughput - network conditions can fluctuate and vary significantly across environments which complicate bitrate selection
- Conflicting video QoE requirements - Balancing a variety of conflicting QoE goals such as maximizing video quality, minimizing rebuffering events, and maintaining video quality smoothness
- Cascading effects of bitrate decisions, like selecting a high bitrate may drain the playback buffer to a dangerous level and cause rebuffering in the future.
- Coarse-grained nature of ABR decisions. The algorithm has to make a decision based on limited available bitrates for a given video
###### Fahed Abudayyeh
- Network conditions constantly change over time and between environments
- Tradeoffs exist when optimizing for QoE. QoE should be estimated differently depending on network conditions and the type of application
- Bitrate selection for a particular chunk may impact the selected bitrate of the next chunk. Optimizing a streaming experience across many chunks becomes a difficult problem.
- The bitrates available to switch between across chunks are limited, so the ABR algorithm can be forced to make a suboptimal choice among them.
###### Achintya Desai
- Network throughput variability: Fluctuations in network conditions can vary the throughput in an unpredictable manner. Hence, it is not a reliable quantity for making bitrate decisions in practical scenarios.
- QoE requirement conflicts: During the balancing act of various QoE factors, they might be in direct or indirect conflict with each other. This calls for better trade-off between conflicting QoE requirements which also varies across different users, making it a harder task.
- Cascading effects of bitrate choice: Selecting a high bitrate could deplete the playback buffer, thereby forcing low bitrates for later parts of the video
- Coarse-grained ABR decisions: Decisions are limited to the few bitrates available for a given video.
###### Rhys Tracy
First, network conditions can vary wildly over time and across networks, making it difficult to create an algorithm that works well under all possible network conditions. Second, ABR algorithms impact many different QoE metrics (that can often conflict), so a designer needs to find a way to balance all of them well. Third, ABR algorithms need to be conscious of how quality changes impact the network and the device's buffer as well as the user's QoE (meaning an ABR algorithm that changes quality can change conditions and force itself to immediately change back to the previous quality). Lastly, ABR algorithms can only make coarse-grained decisions, limited to the available bitrates, and all of their choices are based on estimations.
###### Shubham Talbar
Following are the four major challenges in designing a good ABR algorithm-
1. the variability of network throughput
2. the conflicting video QoE requirements (high bitrate, minimal rebuffering, smoothness, etc.)
3. the cascading effects of bitrate decisions (e.g., selecting a high bitrate may drain the playback buffer to a dangerous level and cause rebuffering in the future)
4. the coarse-grained nature of ABR decisions
###### Seif Ibrahim
The first challenge is that network conditions can vary, causing certain features to fluctuate much more than others, essentially making those features less useful for prediction; the model needs to adapt by more heavily weighting other features. The second challenge is that "improving QoE" could mean multiple things because there are multiple conflicting factors, such as resolution and number of rebuffers, and we have to decide which factors to weigh more. The third challenge is that each ABR decision has cascading effects on future decisions. Finally, ABR is coarse-grained in the sense that there are only a few available resolutions to select from, so making a decision means we may need to round up or down to the nearest available resolution.
###### Satyam Awasthi
* Uncertain network conditions - past throughput observations cannot accurately predict the throughput for the next time window
* Conflicting QoE goals - Achieving higher video quality, less rebuffering, and no bitrate fluctuations at the same time is difficult. Selecting higher bitrates could result in rebuffering if there is not enough playback buffer.
* Cascading effects of bitrate decision- A given bitrate selection will directly influence the next decision when quality is considered. Choosing a high bitrate may deplete the playback buffer and force subsequent chunks to be downloaded at low bitrates to avoid rebuffering.
* Coarse-grained nature of ABR decisions: The estimated throughput may fall between the available bitrates, resulting either in frequent rebuffering or in low quality at times.
###### Pranjali Jain
ABR algorithms adapt the video bitrate to the underlying networking conditions to maximise the QoE on the client side. Selecting the bitrate can be tricky due to: 1. Variability of network throughput, 2. Conflicting video QoE requirements like high bitrate, minimal rebuffering, smoothness,etc., 3. Cascading effects of bitrate decisions, 4. Coarse-grained nature of ABR decisions
###### Punnal Ismail Khan
1) The throughput of the network varies in general network conditions.
2) Conflicting video QoE requirements. Some may want high bitrate and smoothness, others may want minimum rebuffering, etc
3) The cascading effects of bitrate decisions. For example, selecting a high bitrate may drain the playback buffer to a dangerously low level and cause rebuffering in the future.
4) ABR algorithms are coarse-grained as they are limited to the available bitrates for chunks in a given video, e.g., the estimated throughput might fall just below one available bitrate but above the next.
###### Nikunj Baid
- The extremely dynamic nature of the network conditions, and it is difficult to account for sudden fluctuations.
- The QoE metrics are inherently contradicting, as high bitrate at all times would mean more re-buffering, and immediately updating the ABR by reacting to the current network conditions would mean less smoothness.
- Selecting the bitrate for a given chunk can have a cascading effect on the remaining chunks that are fetched.
- There is only a limited set of bitrates to choose from when making a prediction, and it can be a tough choice between higher quality and more rebuffering events.
###### Ajit Jadhav
- Constantly changing network conditions
- Need to handle conflicting QoE goals
- Subsequent effects of current bitrate selection
- Coarse grained control decisions
###### Vinothini Gunasekaran
- Variability of network throughput: Network conditions can fluctuate across different environments. This complicates bitrate selections as different scenarios might require different input signals
- Conflicting video QoE requirements: Balancing a variety of goals such as achieving highest average bitrate, minimizing rebuffering events and avoiding constant bitrate fluctuations are inherently conflicting.
- Cascading effects of bitrate decisions: A given bitrate selection will directly influence the next decision since ABR algorithms are less inclined to change bitrates.
- Coarse-grained nature of ABR decisions: If the estimated throughput falls just below one bitrate but above the next available bitrate, ABR algorithm must decide between higher quality prioritization and risk of rebuffering.
###### Brian Chen
The four challenges are variability of network throughput, conflicting video quality of experience requirements, cascading effects of bitrate decisions, and coarse-grained nature of adaptive bitrate decisions.
###### Nawel Alioua
(1) Network conditions can fluctuate over time and can vary significantly across environments.
(2) ABR algorithms must balance a variety of conflicting QoE goals such as maximizing video quality, minimizing rebuffering events, and maintaining video quality smoothness.
(3) Bitrate selection for a given chunk can have cascading effects on the state of the video player. For example, the playback buffer may be depleted by high quality chunks and force the subsequent chunks to be transmitted at a lower quality.
(4) The control decisions available to ABR algorithms are coarse-grained as they are limited to the available bitrates for a given video.
#### Question 4: Why does a learning-based approach make sense for ABR algorithms? Are you satisfied with the arguments presented in the paper? Explain.
###### Jaber Daneshamooz
Yes, I'm satisfied, because the current ABR algorithms are based on heuristics. The input domain is so big that a human cannot explore all of it (an overwhelming task), and we can also get feedback from the system on our decisions (effectively labeling our data).
###### Aaron Jimenez
Learning-based approaches do make sense for ABR algorithms as there are most likely trends that we as humans cannot account for, but a ML model might be able to. This is especially important in unstable network environments where you must work to balance all of the various QoE requirements to make sure none of them lead the system to make a poor decision. I am fairly satisfied with the arguments given in the paper for the need for an RL system. I think that the argument they make about how RL models can learn to adapt its approach to ABR depending on the situation versus other solutions makes sense.
###### Shereen Elsayed
ABR problems are more approachable using a learning approach because the problem's inputs and outputs are reasonably well defined, as is the success metric (stable video on the client side with a clear image, less buffering, and a video that doesn't disconnect). I am satisfied with current ABR, especially Netflix's; for instance, when I was in a place with a very weak internet connection, I could still watch it with good resolution. The argument about an inaccurate environment may not always be true: you can design for the worst-case environment, and this will cover all the good and average cases.
###### Samridhi Maheshwari
As evident from the case studies, robustMPC has difficulty factoring throughput fluctuations and prediction errors into its decisions, and in choosing the appropriate optimization horizon. These deficiencies exist because MPC lacks an accurate model of network dynamics, so it relies on simple and sub-optimal heuristics such as conservative throughput predictions and a small future prediction window. By contrast, RL algorithms learn from the actual performance (rewards) resulting from different decisions. By incorporating this information into a policy, RL algorithms can automatically optimize for different network characteristics and QoE objectives. Thus, a learning based approach makes sense for ABR algorithms, where the current state of the network matters a lot.
###### Nagarjun Avaraddy
A learning based approach makes more sense for ABR algorithms. This argument seems fair because it is difficult to hand-design rules over the input space, i.e., the network conditions, the buffer size, the throughput, and other client-side information; fixed control-rule based models simplify these into a few discrete inputs and cannot generalize well.
###### Liu Kurafeeva
The approach makes sense, since it will be more adaptive than heuristics. In general I am satisfied with the authors' explanations, though I am not satisfied with the evidence that the RL model will adapt nicely and that no simpler model would work here. I think the introduction should include that kind of discussion as well.
###### Fahed Abudayyeh
###### Alan Roddick
Learning-based approaches make sense for ABR algorithms because fixed control rules require frequent and precise tuning while being unable to generalize or adapt to network conditions. A learning based approach will be able to continuously adapt to different network conditions.
###### Arjun Prakash
The learning-based approach makes sense because it uses information about the actual performance of past choices and optimizes its future predictions. By contrast, approaches that use fixed control rules or simplified network models are unable to optimize their bitrate decisions based on all available information about the operating environment.
###### Rhys Tracy
Yes, the paper’s arguments are satisfactory. Learning approaches clearly learn from previous behavior in a network and are adaptive, so can certainly capture distributions in the data that a human cannot. Additionally, the paper mentions that the learning model is able to generalize much better than other methods which makes sense because a ML approach learns and models general patterns in the data.
###### Shubham Talbar
Existing ABR algorithms develop fixed control rules for making bitrate decisions based on estimated network throughput, playback buffer size, or a combination of the two signals. Such schemes rely on significant tuning and do not generalize to different network conditions. State-of-the-art MPC makes bitrate decisions by solving a QoE optimization problem over a horizon of several future chunks, but MPC is sensitive to throughput prediction errors and the length of the optimization horizon. Hence, learning-based approaches that can dynamically adapt to client-side network conditions do make sense for ABR algorithms.
###### Achintya Desai
Yes. The authors mainly argue from two motivating examples for using a learning based approach. The first example is a network scenario where the throughput is highly variable. RobustMPC's overly cautious throughput estimation prevents it from reaching the highest bitrate possible, which is suboptimal compared to the reinforcement learning algorithm, which tracks throughput more accurately and achieves the highest bitrate possible. In the second example, the authors consider a new QoE metric leaning towards HD content, assigning a high reward to HD bitrates and low rewards to the rest. For such a metric, the client's buffer should be built up to a high enough level to switch to and maintain the HD bitrate as much as possible. However, robustMPC maintains a medium-sized playback buffer and requests chunks at non-HD or low-HD bitrates, since it does not plan far enough ahead to sustain HD quality. Reinforcement learning, on the other hand, is able to maintain the HD bitrate as well as smoothness.
###### Seif Ibrahim
I believe that reinforcement learning is a good match for this problem since it is a type of problem where the actions we take affect our future decisions and there is a clear reward function which is QoE. Reinforcement learning typically does well at problems like video games where it has to take actions in order to optimize for a long-term goal -- this problem is very similar.
###### Satyam Awasthi
The learning-based approach can capture the network conditions for the deployment environment much better than heuristics or rule-based approaches can. Also, they can be made to optimize the user’s QoE and a streaming service’s business logic. Under uncertain network conditions, the QoE metrics can be prioritized according to business logic: lower rebuffering with low quality or more rebuffering with high quality, high startup delay and low buffering, etc. The RL-based approach is apt for the given problem, as it will adjust itself according to changing network conditions to meet the QoE needs.
###### Punnal Ismail Khan
The learning-based approach makes sense as there is a trend to predict. The existing approaches use heuristics that have no knowledge of trends, are overly cautious at some points, and don't take advantage of the available average throughput.
###### Nikunj Baid
Learning based approach makes more sense as it is almost impossible to account for the dynamic network conditions using simple heuristics or fixed set of rules, as demonstrated with RobustMPC. This is because we need to plan for far ahead in the future, which is difficult to achieve with rule based approach.
Since different users can have different ideas for a good QoE, the learning based approach can adapt to that as well and deliver accordingly. The only caveat would be that it would then turn into a black box, and it might be difficult to justify the decisions that the model would take at times.
###### Ajit Jadhav
I am satisfied with the provided arguments. Learning-based approach makes sense due to the complexity of factors involved in the decision making process and the constant changing network conditions that can be better handled using this approach.
###### Vinothini Gunasekaran
The Reinforcement Learning based approach for ABR algorithms makes more sense. This problem is complicated because of the wide range of network conditions. Using fixed control rules, ABR algorithms are not able to make optimal bitrate decisions over the input space (buffer size, network throughput, etc.). By contrast, the Pensieve model learns from the performance of past choices to optimize its control policy. I agree that we need a non-pre-programmed model because of the problem's complexity.
###### Brian Chen
A learning-based ABR algorithm does make sense, depending on the scope and cost of deployment. The paper argues that heuristic based approaches require large amounts of effort while still being specific and difficult to generalize. It also mentions that MPC requires accurate prior knowledge to perform its predictions. I agree with both of these points, but it should be mentioned that there is no discussion of the cost or scope of deployment for either. Learning-based approaches could also be too specific depending on where and how frequently they are trained. Furthermore, the costs of such systems may not be practical in terms of both time constraints and monetary costs.
###### Nawel Alioua
One argument in favor of a learning-based approach for the ABR problem is that heuristics require their parameters to be carefully tuned, and performance can backfire if the assumptions used during parameter tuning are no longer valid in a given scenario.
The authors gave as a first example a case where the throughput prediction of a heuristic called robustMPC is overly cautious, which prevented it from reaching higher bitrates even when the buffer occupancy was increasing, while the RL-based method showed more adaptability. The second example is comparing the bitrate selection of both robustMPC and the RL-based algorithm when using a metric that favors HD bitrate level. robustMPC was unable to apply this policy because it fails to plan far enough into the future, and maintains a medium-size buffer occupancy with a bitrate confined between 300 kbps and 1850 kbps. The RL-based approach on the other hand was more successful in implementing the policy. Overall, the authors suggest that MPC experiences those deficiencies because it lacks an accurate model of network dynamics, thus it relies on simple and sub-optimal heuristics such as conservative throughput predictions and a small optimization horizon.
#### Question 5: Describe the design of the simulator used in the paper for training?
###### Aaron Jimenez
The simulator used for training the model is meant to model the environment experienced by a real client application (just in an accelerated amount of time). It maintains a representation of a client's playback buffer. For each chunk that is "downloaded", the simulator assigns a download time based on its bitrate and the input network throughput traces. It then adds to and drains the playback buffer based on the download times of the chunks. The simulator keeps track of rebuffering events in a counter, and in scenarios where the playback buffer cannot accommodate a chunk, it waits 500 ms before retrying.
###### Shereen Elsayed
They used chunk-level simulator which gives Pensieve the opportunity to experience 100 hours of video download in only 10 min. The simulator keeps the internal representation of the client's playback buffer. It sets the download time for the chunk based on the network throughput trace and the chunk's bitrate. It assumes that the chunk downloaded will use the entire throughput specified by the trace.
###### Samridhi Maheshwari
Pensieve’s simulator maintains an internal representation of the client’s playback buffer. For each chunk download, the simulator assigns a download time that is based on the chunk’s bitrate and the input network throughput traces. The simulator then depletes the playback buffer by the current chunk’s download time, to represent video playback, and adds the playback duration of the downloaded chunk to the buffer. The simulator carefully keeps track of rebuffering events that arise as the buffer occupancy changes, in scenarios where the chunk download time exceeds the buffer occupancy at the start of the download. In scenarios where the playback buffer cannot accommodate video from an additional chunk download, Pensieve simulator pauses requests for 500 ms before retrying.
###### Liu Kurafeeva
The simulator provides sped-up ABR decision experience, so the model can obtain more experience in less time. The simulator keeps track of the bad and good events that happen during a run to compute the "reward" from them later.
It simulates playback by operating on the current playback buffer.
###### Alan Roddick
Pensieve’s simulator keeps track of the client’s buffer and assigns a download time for each chunk based on the bitrate. The download time for each chunk is then used to drain the client’s buffer to simulate playback. The simulator then adds this chunk duration to the buffer. Rebuffer events are managed by the simulator and various statistics are sent to the reinforcement learning model such as buffer size filled, rebuffering time, chunk download time, size of the next chunk, and number of remaining chunks. The simulator allows the model to learn 100 hours of video in 10 minutes.
###### Fahed Abudayyeh
###### Nagarjun Avaraddy
The simulator has a representation of the client's playback buffer. It assigns each chunk a download time based on the network and the video bitrate. The simulator then drains the playback buffer by the assigned download time and adds the playback duration of the downloaded chunk to the buffer. Rebuffering events are recorded and the RL model receives this information.
In case of buffer overflow, the simulator waits for 500 ms before retrying.
###### Arjun Prakash
The simulation environment models a real client video streaming application. It maintains an internal representation of the client's playback buffer. For each chunk, the simulator assigns a download time that is based on the chunk’s bitrate and throughput traces. The simulator then drains the buffer by the current chunk’s download time and adds the playback duration of the downloaded chunk to the buffer. The rebuffering events are also tracked. In case of buffer overflow, the simulator pauses requests for 500ms before retrying. After each downloaded chunk, the simulator passes the current buffer occupancy, rebuffering time, chunk download time, size of the next chunk, and the number of remaining chunks in the video to the RL agent for processing.
###### Rhys Tracy
The simulator keeps a representation of a client's video buffer and simulates chunk downloads filling the buffer and video playback draining it. For each simulated chunk download, a download time is assigned using the bitrate and simulated network throughput traces. The simulator simulates watch time during a chunk download and adds the new chunk's playback time to the video buffer. The simulator also tracks rebuffering events and situations where a client's buffer doesn't have enough space for a new chunk (it waits 500 ms). All this simulated information is then passed to the Pensieve model.
###### Shubham Talbar
Pensieve trains ABR algorithms in a simple simulation environment that faithfully models the dynamics of video streaming with real client applications. Pensieve's simulator maintains an internal representation of the client's playback buffer. For each chunk download, the simulator assigns a download time that is solely based on the chunk's bitrate and the input network throughput traces. The simulator then drains the playback buffer by the current chunk's download time, to represent video playback during the download, and adds the playback duration of the downloaded chunk to the buffer. The simulator carefully keeps track of rebuffering events that arise as the buffer occupancy changes, i.e., scenarios where the chunk download time exceeds the buffer occupancy at the start of the download.
###### Seif Ibrahim
Pensieve uses a simulator in order to decrease training time, since training on a real network would be very slow (they can train on 100 hours of video in 10 minutes with their simulator). The simulator divides the video into chunks as a DASH server would and assigns a download time to each chunk based on simulated network conditions. It also simulates the buffer and rebuffering events. Information from the simulator is passed to the RL model.
###### Achintya Desai
The paper introduces a simulation environment that replicates video streaming with a real client application. It internally maintains a representation of the client's playback buffer. For every chunk, the simulator calculates a download time based on the bitrate and the input network throughput traces. It then drains the playback buffer by the current chunk's download time and adds the duration of the downloaded chunk to the buffer. It keeps track of rebuffering events that might arise as the buffer occupancy changes. This includes scenarios where the chunk download time exceeds the buffer occupancy at the start, or where the playback buffer cannot accommodate an additional chunk download, in which case it waits around 500 ms before retrying. After every chunk is downloaded, the simulator passes several state observations to the learning agent for processing, such as buffer occupancy, rebuffering time, and download time.
###### Satyam Awasthi
The simulator used by Pensieve for training aims to model the real deployment environment of a client. It maintains an internal representation of the client's playback buffer. For each downloaded chunk, the simulator assigns a download time based on the bitrate and network throughput traces. The buffer is then filled and depleted using the chunks' download times to simulate video playback. A counter keeps track of rebuffering events, and in cases where the playback buffer cannot hold a chunk, the simulator pauses for 500 ms before retrying.
###### Nikunj Baid
The simulator maintains an internal representation of the client's playback buffer, and each chunk is assigned a download time depending on its bitrate and the throughput traces. The playback buffer is drained by the current chunk's download time while the playback duration of the downloaded chunk is added to the buffer. While doing this, any rebuffering events are recorded. Also, when the buffer is full, the simulator pauses for 500 ms before retrying to download another chunk. The state of the system is passed on to the RL agent at regular intervals for processing. This enables Pensieve to learn from 100 hours' worth of video downloads in just 10 minutes.
###### Ajit Jadhav
Pensieve trains over a large corpus of network traces using a fast and simple chunk-level simulator, which allows Pensieve to experience 100 hours of video downloads in only 10 minutes.
###### Vinothini Gunasekaran
In the simulation environment, it represents a client video streaming app and maintains the client's playback buffer. Based on the video chunk’s bitrate and throughput traces, the simulator assigns downloading time and adds playback duration for the downloaded video chunk. The simulator stops requests for a period of time in case of buffer overflow.
###### Brian Chen
The simulator has an internal representation of the client's buffer. It assigns chunks download times based on bitrate and network throughput measurements. It then drains the buffer by the assigned download time to simulate playback and adds the chunk's playback duration to the buffer. The simulator pauses for 500 ms when the buffer is noted as full, and tracks various other state as well. It then sends state observations to the model after each chunk.
###### Nawel Alioua
Pensieve uses a simulator to save time in the training phase. The simulator maintains an internal representation of the client's playback buffer. For each chunk download, the simulator assigns a download time that is solely based on the chunk's bitrate and the input network throughput traces. The simulator then drains the playback buffer by the current chunk's download time, to represent video playback during the download, and adds the playback duration of the downloaded chunk to the buffer. As the buffer occupancy changes, the simulator keeps track of rebuffering events that arise.
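The answers above all describe the same chunk-level simulation loop. Below is a minimal sketch of that loop; the constants (4-second chunks, 60-second buffer cap, 500 ms retry wait) and the per-chunk throughput input are illustrative assumptions, not the paper's actual code.

```python
# Minimal chunk-level simulator sketch (illustrative, not Pensieve's implementation).
# Assumptions: chunk_bytes[i][b] is the size of chunk i at bitrate index b, and
# throughput_bps[i] is the average network throughput while chunk i downloads.

CHUNK_SEC = 4.0        # playback duration of one chunk (assumed)
BUFFER_CAP_SEC = 60.0  # maximum playback buffer (assumed)
RETRY_WAIT_SEC = 0.5   # pause before retrying when the buffer is full

def simulate(chunk_bytes, bitrate_choices, throughput_bps):
    buffer_sec = 0.0
    total_rebuffer_sec = 0.0
    for i, b in enumerate(bitrate_choices):
        download_sec = chunk_bytes[i][b] / throughput_bps[i]
        # Playback drains the buffer while the chunk downloads; any shortfall is rebuffering.
        total_rebuffer_sec += max(download_sec - buffer_sec, 0.0)
        buffer_sec = max(buffer_sec - download_sec, 0.0) + CHUNK_SEC
        # If the buffer cannot hold another chunk, wait before issuing the next request.
        while buffer_sec > BUFFER_CAP_SEC:
            buffer_sec -= RETRY_WAIT_SEC
        # (A real simulator would also emit the state observations s_t to the RL agent here.)
    return total_rebuffer_sec
```

Because the loop only does arithmetic on chunk metadata rather than transferring real bytes, many hours of "video" can be replayed in minutes, which is what makes the fast training described above possible.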
#### Question 6: Describe the input s_t taken by the learning agent.
###### Jaber Daneshamooz
The input is a collection of metrics related to past, current, and future chunks, such as the download times and throughput of past chunks, the current buffer level, the remaining number of chunks, etc.
###### Aaron Jimenez
The state inputs are the network throughput measurements and download times for the last k video chunks, the m available sizes for the next video chunk, the current buffer level, the number of chunks remaining in the video, and the bitrate of the last downloaded chunk.
###### Samridhi Maheshwari
The state s_t is a tuple consisting of past chunk throughputs, past chunk download times, next chunk sizes, the current buffer size, the chunks left, and the last bitrate.
###### Liu Kurafeeva
s_t = (x, tau, n, b, c, l), where x is the network throughput measurements for the past k chunks, tau is the download time of the past k chunks (the time intervals for the throughput measurements), n is the vector of available sizes for the next chunk, b is the current buffer level, c is the number of remaining chunks, and l is the last chunk's bitrate.
###### Shereen Elsayed
S_t is the state input of chunk t, it consists of:
- x_t: throughput of the past k chunks
- Tau_t: download time of the past k chunks representing time interval of the throughput measurements
- n_t: vector of the available sizes of the next video chunk
- b_t: current buffer level
- c_t: number of chunks remaining in the video
- l_t: bitrate of the last downloaded chunk
###### Alan Roddick
This is the state that is passed from the simulator that was described above: buffer size filled, rebuffering time, chunk download time, size of the next chunk, and number of remaining chunks.
###### Fahed Abudayyeh
The input takes into account the selected bitrate of the last chunk, the number of chunks remaining, the current buffer level, the potential sizes for the chunk to send to the buffer, and throughput measurements including times for previously downloaded chunks.
###### Nagarjun Avaraddy
The input s_t is a 6-dimensional tuple containing the past chunk throughputs, the past chunk download times, the available sizes for the next chunk, the current buffer level, the number of remaining chunks, and the bitrate of the last downloaded chunk.
###### Arjun Prakash
The model takes state st as its input. It consists of the network throughput measurements for the past k video chunks, the download time of the past k video chunks, a vector of m available sizes for the next video chunk, the current buffer level, the number of chunks remaining in the video and the bitrate at which the last chunk was downloaded.
###### Rhys Tracy
S_t represents the state inputs to the model (current state of the network/client application). These inputs include network throughput for previous chunks, download time for previous chunks, a vector of available sizes for the next chunk, current buffer level, number of chunks remaining in the video, and the bitrate for the previous chunk.
###### Shubham Talbar
Input s_t is composed of (xt, τt, nt, bt, ct, lt)
1. xt is the network throughput measurements for the past k video chunks
2. τt is the download time of the past k video chunks, which represents the time interval of the throughput measurements
3. nt is a vector of m available sizes for the next video chunk
4. bt is the current buffer level
5. ct is the number of chunks remaining in the video
6. lt is the bitrate at which the last chunk was downloaded
###### Seif Ibrahim
s_t is a 6-tuple representing the current video stats measured on the client side (e.g., like "stats for nerds" on YouTube). It has information like throughput, download times, chunk sizes, buffer level, bitrate, and the number of remaining chunks in the video.
###### Satyam Awasthi
Input s_t is a tuple consisting of x_t (throughput of the past k chunks), Tau_t (download time of the past k chunks), n_t (vector of the available sizes of the next video chunk), c_t (number of chunks remaining in the video), b_t (current buffer level), and l_t (bitrate of the last downloaded chunk).
###### Achintya Desai
Input s_t is composed as a tuple (x_t, tau_t, n_t, b_t, c_t, l_t) where x_t is the network throughput measurements for the past k video chunks, tau_t is the download time of the past k video chunks, which represents the time interval of the throughput measurements, n_t is a vector of m available sizes for the next video chunk, b_t is the current buffer level, c_t is the number of chunks remaining in the video, and l_t is the bitrate at which the last chunk was downloaded.
###### Punnal Ismail Khan
xt: Network throughput measurements for the past k video chunks.
τt: Download time of the past k video chunks, which represents the time interval of the throughput measurements.
nt: vector of m available sizes for the next video chunk.
bt: current buffer level.
ct: is the number of chunks remaining in the video.
lt: bitrate at which the last chunk was downloaded
###### Nikunj Baid
The input vector s_t includes the network throughput and download times for the past k chunks, a vector of the available sizes for the next video chunk, the current buffer level, the number of chunks remaining in the video, and the bitrate at which the last chunk was downloaded.
###### Ajit Jadhav
The input consists of the following: network throughput measurements for the past k video chunks, download time of the past k video chunks, vector of size m of available choices for the next video chunk, current buffer level, number of chunks remaining in the video, bitrate at which the last chunk was downloaded.
###### Vinothini Gunasekaran
Each chunk is indexed by t and the state input is represented as s_t. The state input is a tuple consisting of the following network details:
s_t = (x_t, tau_t, n_t, b_t, c_t, l_t)
where x_t is the network throughput, tau_t is the download time of past video chunks, n_t is the vector of available sizes for the next video chunk, b_t is the current buffer level, c_t is the remaining chunk count, and l_t is the last downloaded chunk's bitrate.
###### Brian Chen
s_t are states of chunks and are divided into six sections: network throughput measurements for the past k video chunks, download time of the past k video chunks, a vector of m available sizes for the next video chunk, the current buffer level, the number of chunks remaining in the video, and the bitrate at which the last chunk was downloaded.
###### Nawel Alioua
s_t is the state input, of the form s_t = (x_t, tau_t, n_t, b_t, c_t, l_t) where:
- x_t is the network throughput measurements for the past k video chunks;
- tau_t is the download time of the past k video chunks, which represents the time interval of the throughput measurements;
- n_t is a vector of m available sizes for the next video chunk;
- b_t is the current buffer level;
- c_t is the number of chunks remaining in the video; and
- l_t is the bitrate at which the last chunk was downloaded.
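For concreteness, here is a rough sketch of how the state tuple described above might be assembled on the client; the field names and example values (including k = 8 past chunks) are illustrative assumptions, not taken from the paper's code.

```python
from collections import namedtuple

# Illustrative container for the state s_t = (x_t, tau_t, n_t, b_t, c_t, l_t).
State = namedtuple("State", [
    "throughputs",       # x_t: network throughput for the past k chunks
    "download_times",    # tau_t: download time of the past k chunks
    "next_chunk_sizes",  # n_t: available sizes (one per bitrate) of the next chunk
    "buffer_level",      # b_t: current playback buffer level (seconds)
    "chunks_remaining",  # c_t: number of chunks left in the video
    "last_bitrate",      # l_t: bitrate of the last downloaded chunk
])

# Example with k = 8 past chunks and 6 candidate bitrates (made-up values):
s_t = State(
    throughputs=[1.2e6] * 8,
    download_times=[3.1] * 8,
    next_chunk_sizes=[0.4e6, 0.8e6, 1.2e6, 1.9e6, 2.8e6, 4.3e6],
    buffer_level=12.0,
    chunks_remaining=30,
    last_bitrate=1.2e6,
)
```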
#### Question 7: Describe how policy gradient training works for this problem. How is A(s,a) estimated?
###### Aaron Jimenez
Based on the advantage and the policy probability computed for each state-action pair, together with the learning rate, the actor network parameter θ is updated. The advantage A(s,a) for each state-action pair is estimated by taking the reward at step t, adding the discount factor times the estimated total reward (value) of the state at t+1, and subtracting the estimated total reward (value) of the state at t.
###### Samridhi Maheshwari
The idea in policy gradient methods is to estimate the gradient of the expected total reward by observing the trajectories of executions obtained by following the policy. A(s,a) is the advantage function, which represents the difference between the expected total reward when action a is picked deterministically in state s and the expected reward for actions drawn from the policy. The advantage function tells us how much better a specific action is compared to the average action taken according to the policy.
###### Liu Kurafeeva
Policy gradient training obtains the reward for past chunks (based on a specific QoE metric). The method estimates the gradient of the discounted reward with respect to the policy parameters. Each action is evaluated using the estimated accumulated discounted reward (by calculating the advantage function), and after obtaining the reward for each step, the parameters of the actor model are updated according to the formula, which can be read as updating in directions that increase pi (the policy's probability of good actions). A(s,a) is the advantage function, which quantifies how the expected total reward differs between deterministically selecting that action in each state and picking actions from the policy pi.
###### Shereen Elsayed
The policy gradient method estimates the gradient of the expected total reward by observing the trajectories of executions obtained by following the policy. The advantage function is the difference between the expected total reward when the action is picked deterministically and the expected reward for actions drawn from the policy. The estimated value function (the estimated total reward starting at state s and following policy pi) needs to be calculated first.
###### Alan Roddick
The policy gradient method works by estimating the gradient of the expected total reward from the trajectories of executions obtained by following that policy. A(s, a) is estimated by the agent sampling bitrate decisions from the policy and using them to compute an unbiased estimate.
###### Nagarjun Avaraddy
Policy gradient training works by optimizing the expected reward along execution paths obtained by following the policy. A(s,a) is estimated as the difference between the estimated total reward for the action selected in a state and the estimated total reward for the action the policy would select.
###### Arjun Prakash
This paper uses an actor-critic algorithm to train its policy. The main idea here is to estimate the gradient of the expected total reward by observing the trajectories of executions obtained by following the policy. They use the advantage functions and the probability of the action to determine how the parameter is updated in the actor-network. A(s,a) is calculated using the value function which is the expected total reward starting from a specific state and following the policy. The estimate of this value function is learned from the critic network.
###### Rhys Tracy
A(s,a) represents how much better an action is from a given state as compared to the standard action for the current policy at that given state. It is estimated by using the reward for action a at state s, adding an estimation for the total future reward from the resulting state following taking action a at state s (times a decay value), and subtracting an estimation for the total future reward for using the expected action at state s.
###### Shubham Talbar
A(s,a) is the advantage function, which represents the difference in the expected total reward when we deterministically pick action a in state s, compared with the expected reward for actions drawn from policy. The advantage function encodes how much better a specific action is compared to the “average action” taken according to the policy.
###### Seif Ibrahim
A(s,a) is the "advantage function" it represents how much better the action taken is than the average action taken by the network. They use something called a critic network which itself needs to be trained to give us an estimate of how good the action we just took was. If the action is good or bad then the current policy is tuned accordingly.
###### Satyam Awasthi
The policy gradient method estimates the gradient of the expected total reward by observing the trajectories of executions obtained by following the policy. The advantage A(s,a) for each state-action pair is estimated as the reward at step t plus the discounted estimated value of the next state, minus the estimated value of the current state. The advantage function tells us how much better a specific action is compared to the average action taken according to the policy.
###### Achintya Desai
The paper uses actor-critic algorithm, which is a policy gradient method, to train Pensieve policy. The main idea is to estimate the gradient of the expected total reward by observing the execution trajectory from the policy. The authors calculate an advantage function which represents the difference between expected total rewards when action is picked deterministically and when action comes from policy given. A(s,a) is calculated by estimating the value function which is expected total reward starting at the state s and following given policy.
###### Punnal Ismail Khan
The main idea is to use reward-based training. The reward or penalty depends on three factors: a higher bitrate is rewarded, rebuffering is penalized, and unsmooth playback (frequent bitrate switches) is penalized. The goal of the training is to maximize the reward. A(s, a) tells how good the action (changing bitrate) is given the state. A(s,a) is estimated by adding the step reward to the estimated total future reward of the resulting state.
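For reference, the three-term reward described here is typically written in the QoE_lin style shown below (a sketch, not the paper's exact statement; q(.) maps the chosen bitrate R_n of chunk n to a quality score, T_n is the rebuffering time incurred for chunk n, and mu weights the rebuffering penalty):

```latex
QoE = \sum_{n=1}^{N} q(R_n) \;-\; \mu \sum_{n=1}^{N} T_n \;-\; \sum_{n=1}^{N-1} \bigl| q(R_{n+1}) - q(R_n) \bigr|
```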
###### Nikunj Baid
The gradient training depends on the trajectory of the executions obtained for the given policy. The action taken is compared with the average action, and depending on whether it performed better or worse, the model is tuned. This is the main idea behind the actor-critic algorithm that the policy training is based on.
###### Ajit Jadhav
The policy gradient method estimates the gradient of the expected total reward by observing the trajectories of executions obtained by following the policy. A(s,a), the advantage function, represents the difference in the expected total reward when we deterministically pick action a in state s, compared with the expected reward for actions drawn from policy πθ. It encodes how much better a specific action is compared to the “average action” taken according to the policy.
###### Vinothini Gunasekaran
Pensieve uses an actor-critic algorithm to train its policy. The policy gradient methods’ key idea is to estimate the gradient of the expected total reward by observing the trajectories of executions that are obtained by following the policy. A(s,a) is the advantage function that represents the difference between the expected total reward picked in state ‘s’ for action ‘a’ and the expected reward drawn from the policy. They followed the standard temporal difference method to train the critic network parameters.
###### Brian Chen
The training in this problem works by performing a policy and then seeing how that policy affects the state of the chunks. A(s,a) is estimated by training a critic model that estimates the value of a policy. This estimate is what is used to compute differences between policies.
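A minimal numpy sketch of how a critic's value estimates turn into advantages and an actor loss for one rollout (illustrative only; the variable names, toy numbers, and the simple one-step TD form are assumptions, not the paper's exact implementation):

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed value)

def compute_advantages(rewards, values, bootstrap_value):
    """One-step TD advantages: A_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    next_values = np.append(values[1:], bootstrap_value)
    return rewards + GAMMA * next_values - values

def actor_loss(log_probs, advantages):
    """Policy-gradient loss: maximize sum(log pi(a_t | s_t) * A_t)."""
    return -np.sum(log_probs * advantages)

# Toy rollout of 4 chunks: per-chunk QoE rewards, critic value estimates,
# and log-probabilities of the bitrates actually chosen.
rewards   = np.array([1.0, 0.5, -0.2, 0.8])
values    = np.array([0.9, 0.7, 0.4, 0.6])
log_probs = np.array([-0.3, -0.5, -1.2, -0.4])

adv = compute_advantages(rewards, values, bootstrap_value=0.0)
print("advantages:", adv)
print("actor loss:", actor_loss(log_probs, adv))
```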
###### Nawel Alioua
The key idea is to estimate the gradient of the expected total reward by observing the trajectories of executions obtained by following the policy. The gradient is a function of A(s, a), the advantage function, which represents the difference in the expected total reward when action a in state s is deterministically picked, compared with the expected reward for actions drawn from policy pi(s_t, a_t). The advantage function encodes how much better a specific action is compared to the “average action” taken according to the policy.
#### Question 8: Why enhancement is required to generalize the learning model across multiple videos? What enhancement techniques are used in this paper?
###### Aaron Jimenez
The reason enhancement is necessary is because videos can be encoded in multiple different sets of bitrates. Some videos may not have all of the same bitrate encodings as other videos. If the input was not enhanced, then the NN must be designed to take in variable input sizes, which can be rather difficult and time consuming. As such, in this paper the authors decided to use two different enhancement techniques, masking to mask out the bitrates not available for said video and using a modified softmax function for the final NN output to normalize the probabilities after masking out the unavailable bitrates.
###### Samridhi Maheshwari
Enhancement is required because videos can come with different sets of bitrate levels, which would otherwise mean that multiple neural networks are needed to cover all possible levels. This solution is not scalable. In the paper, the enhancement technique is two-fold: first, they map each input chunk to the closest bitrate in a canonical set, giving a 0/1 value for whether each bitrate level is present or not. Second, they mask the outputs so that only bitrates actually present in the video can be selected, again represented as a 0-1 vector. This preserves the backpropagation properties of the NN.
###### Liu Kurafeeva
For each video the input parameters would differ (since videos can be encoded at different bitrates). Training a model for each possible set of properties is impossible, which is why the enhancements are introduced. First, they prepare a set of canonical bitrate levels and, for each video, zero out the levels that are not present for that video. Second, they apply a zero-one mask to the softmax output so it only covers the supported bitrates (a modified but still valid softmax). The effectiveness of these suggestions is evaluated.
###### Jaber Daneshamooz
We need the enhancement because the data is not homogeneous. For example, some videos do not have all of the encodings, and the bitrate differs among videos and even among video chunks.
###### Shereen Elsayed
The problem for the basic algorithm is the differing chunk sizes caused by encoding at different bitrates. The inputs and outputs of the NN should all be the same size, so handling this variation would otherwise require several NNs. The authors suggested two enhancements:
1. Pick a canonical input/output format spanning the maximum number of bitrate levels. To determine the input state for a specific video, we take the chunk sizes and map them to the index with the closest bitrate. The remaining input states, which pertain to the bitrates that the video does not support, are zeroed out.
2. Change how the output of the actor network is interpreted. They apply a masking technique to the output of the final layer so that the output probability distribution is only over the bitrates that the video actually supports. With this modification, the output probability is still a continuous function, so back-propagation of the gradient in the NN still holds and the algorithm works without any further modifications.
###### Alan Roddick
Enhancement is required for generalization because videos can be encoded at different bitrates, and therefore the chunk sizes can differ even within the same resolution (variable bitrate). The first enhancement is to pick input and output formats that span the entire range of bitrates expected to be seen. The second enhancement is that the output of the final softmax layer in the network is masked to only allow bitrates supported by the video. This mask is independent of the network parameters and therefore still allows back-propagation.
###### Nagarjun Avaraddy
Enhancement is required to generalize across the video bitrate space, as videos have varying bitrates and resolutions in many video streaming services. They map each video's bitrates to a canonical set, and the other bitrates are zeroed out in the input. They also apply a mask to the output vector to ensure only one of the valid bitrates for the video is selected as output.
###### Arjun Prakash
Enhancement is required because videos can be encoded at different bitrate levels and may have diverse chunk sizes due to variable bitrate encoding. Without the enhancement, we might have to train multiple neural networks for each bitrate level.
Pensieve uses two enhancements to generate a single model to handle multiple videos. First, to determine the input state for a specific video, they take the chunk sizes and map them to the index which has the closest bitrate. The remaining input states, which pertain to the bitrates that the video does not support, are zeroed out. Second, they apply a mask to the output of the final softmax layer in the actor-network, such that the output probability distribution is only over the bitrates that the video supports.
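A small numpy sketch of the two enhancements described above (the 13-level ladder follows the reviews in this section, but the specific bitrate values, function names, and example are assumptions for illustration):

```python
import numpy as np

# Canonical bitrate ladder (kbps) spanning the range expected in practice.
# 13 levels as the reviews mention; the specific values are placeholders.
CANONICAL_KBPS = np.array([150, 300, 450, 750, 1200, 1850, 2850, 4300,
                           5800, 8000, 12000, 16000, 20000], dtype=float)

def map_to_canonical(video_kbps):
    """Enhancement 1: mark the canonical level closest to each bitrate the
    video actually offers; all other levels stay 'unsupported' (zeroed out)."""
    mask = np.zeros(len(CANONICAL_KBPS))
    for rate in video_kbps:
        mask[np.argmin(np.abs(CANONICAL_KBPS - rate))] = 1.0
    return mask

def masked_softmax(logits, mask):
    """Enhancement 2: softmax over supported bitrates only, so the output
    distribution assigns zero probability to levels the video lacks."""
    exp = np.exp(logits - logits.max()) * mask
    return exp / exp.sum()

# Example: a video encoded at only four bitrates.
mask = map_to_canonical([300, 750, 1850, 4300])
logits = np.random.randn(len(CANONICAL_KBPS))   # raw actor-network outputs
probs = masked_softmax(logits, mask)
print(np.round(probs, 3))   # non-zero only at the four supported levels
```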
###### Rhys Tracy
Two of the biggest challenges are that different videos can have different ranges of bitrates allowed and can have huge variations in chunk sizes thanks to VBR. The paper tries to address issues by having a set range of 13 possible video qualities, then mapping incoming chunk sizes to the closest video quality. Additionally, the paper modifies the model’s softmax output function to only output video qualities that the given video can support (making use of a one-hot vector that represents a certain video’s available qualities).
###### Shubham Talbar
The primary challenge is that videos can be encoded at different bitrate levels and may have diverse chunk sizes due to variable bitrate encoding, e.g. chunk sizes for 720p video are not identical across videos. Handling this variation would require each neural network to take a variable-sized set of inputs and produce a variable-sized set of outputs. The naive solution to supporting a broad range of videos is to train a model for each possible set of video properties. Unfortunately, this solution is not scalable. To overcome this, the authors describe two enhancements to the basic algorithm that enable Pensieve to generate a single model to handle multiple videos.
###### Seif Ibrahim
Different videos are encoded independently, at different bitrates and with different chunk sizes. The authors use two enhancement techniques to allow the model to generalize. The first technique is a modification to the inputs so that a video's bitrates are mapped to the closest bitrates out of a pre-chosen set. Then, at the output layer, they apply a transformation such that the output distribution is only over bitrates that the video supports.
###### Satyam Awasthi
Videos can be encoded using different bitrates, thus multiple neural network models will be required to cover all of them. Training the models for each set of bitrate is not feasible and so, enhancement is used. It involves two steps - first, the input chunk is mapped to the closest possible bitrate. A 0 or 1 value is given to the bitrates depending on their presence in the video. Second, they apply a 0-1 mask to the softmax output to work only with possible bitrates. This enables backpropagation of the gradient in the NN.
###### Punnal Ismail Khan
Enhancements are required because videos can be encoded at different bitrates, and may have diverse chunk sizes due to variable bitrate encodings.
First to determine the input state for a specific video they take the chunk sizes and map them to the index with the closest bitrate. The remaining input states that the video does not support are zeroed out.
Second, they applied a mask to the output of the final softmax layer so that the output probability distribution is only over the bitrates that the video supports.
###### Nikunj Baid
This is because different videos of the same resolution can be encoded using different bitrates and may have different chunk sizes due to VBR. Training individual models for all possible combinations of video properties, with varying numbers of input features and output vectors, is neither ideal nor scalable.
To enable a single model to get the job done, we can use the following two enhancements.
- we can have buckets of predefined levels of bitrates, and then map the chunk of the given video to the closest bitrate amongst these levels, while the bitrates that the video does not support are zeroed out.
- To determine how the output vector is interpreted, a mask is applied to the output of the final softmax layer in the actor network, such that only one of the supported bitrates is chosen as the outcome.
###### Achintya Desai
In practical scenarios, videos can be encoded at different bitrates and might have diverse chunk sizes due to variable bitrate encoding. This would require each neural network to take a variable-sized set of inputs and produce a variable-sized set of outputs. In the basic solution, multiple-video support can be provided by training a model for each possible set of video properties. However, this is not a scalable solution. The following two enhancements are made to generate a single model that handles multiple videos:
- The authors select canonical input and output formats that span the minimum and maximum bitrate levels. For a specific video, the input state is determined by mapping the chunk sizes to the index with the closest bitrate. The remaining input states, which pertain to bitrates that the video does not support, are zeroed out.
- For a given video, they mask the output of the final softmax layer in the actor network such that the output probability distribution is only over the bitrates the video actually supports. This also ensures that standard back-propagation of the gradient in the NN still holds.
###### Ajit Jadhav
Enhancement is required to handle the different bitrate levels that videos can have. This is achieved by using a canonical set of all possible levels for the input, while masking the output to use only the levels available for the given video.
###### Vinothini Gunasekaran
The videos can be encoded at different bitrate levels and may have diverse chunk sizes due to variable bitrate encoding. The naive solution of training a model per set of video properties is not scalable to a broad range of videos, so the authors discuss two enhancements. First, picking canonical input and output formats that span the maximum number of bitrate levels we can expect in practice. Second, changing the interpretation of the actor network's output by applying a mask that modifies the output probability distribution.
###### Brian Chen
Variable bitrate across video chunks means that each video has the potential to be wildly different from any other video, even at a similar position in the chunk sequence. Some kind of enhancement to the algorithm is necessary as a workaround. The first enhancement they made was altering the input to support indexing 13 distinct tiers of bitrate levels. The second enhancement has to do with the output. The bitrates that the video does not support are masked so that the probability distribution only reflects relevant resolutions.
###### Nawel Alioua
Enhancement is required because of practical issues such as the diversity of chunk sizes due to variable bitrate encoding. Handling this variation would require each neural network to take a variable-sized set of inputs and produce a variable-sized set of outputs, and training a model for each set of different properties is not a scalable solution. Two enhancements were proposed: first, picking canonical input and output formats that span the maximum number of bitrate levels expected to appear in practice. The second enhancement is to apply a mask to the output of the final softmax layer in the actor network so that the output probability distribution is only over the bitrates that the video actually supports.
#### Question 9: What's the meta story for evaluations? Does it justify all the design choices with empirical results?
###### Aaron Jimenez
The meta story for the evaluations is proving that even with a synthetic dataset and emulated network conditions for training, the Pensieve model can still compete with and outperform preexisting ABR systems by respectable margins, in both emulated benchmarks and real-world conditions. This justifies the design decisions for model architecture and training chosen by the authors.
###### Samridhi Maheshwari
The meta story for the evaluations is that even when trained under emulated network conditions, the Pensieve model can still outperform existing ABR algorithms. This justifies the design choices of the authors.
###### Jaber Daneshamooz
Pensieve outperforms the state-of-the-art ABR solutions. It can be trained and still work well even on a synthetic dataset. The results justify the design choices, but we cannot say it is the optimal solution.
###### Liu Kurafeeva
The evaluations cover both general cases and cases where current ABR techniques perform poorly (different QoE objectives). The evaluations also use an offline optimal policy and show that the suggested approach is very close to that optimum. This justifies the design choices made, though I still question the performance and deployment costs relative to the current robustMPC; maybe we do not need to chase that tail.
###### Shereen Elsayed
Their evaluation compared Pensieve with the state-of-the-art ABR algorithms, and it outperformed them. They tried to cover all the conditions and environments to ensure a realistic comparison. Although they justified their choices by outperforming the baselines, I am wondering whether it would give the same results in real deployments with a huge volume of traffic.
###### Alan Roddick
The evaluations show that Pensieve outperforms existing ABR algorithms on both broadband and 3G/HSDPA networks for every QoE metric that was tested. The authors justify their design choices with their results.
###### Nagarjun Avaraddy
Evaluations show that the general use cases of the ABR prediction algorithm performs better than state of art ABR algorithms for different QoE objectives. It does justify all their design choices.
###### Arjun Prakash
Pensieve can outperform the existing state-of-the-art ABR techniques even when trained using synthetic datasets and emulated network conditions. Thus justifying their design choices.
###### Rhys Tracy
The paper shows that even when trained on simulated data, the learning model can outperform current ABR algorithms on real data (and performed very similar to pensieve trained on real data). Additionally, when trained on a certain network, it performed noticeably better than every pre-existing ABR approach tested on that network.
###### Shubham Talbar
Pensieve outperforms existing state-of-the-art ABR algorithms even on synthetic dataset and emulated network conditions. Hence, the authors justify the design choices with the empirical results.
###### Satyam Awasthi
The meta story in the paper is that even with a synthetic dataset and emulated network conditions Pensieve outperforms existing ABR algorithms for every QoE metric that was tested. This justifies the design choices of the authors.
###### Seif Ibrahim
The paper shows that the RL model can outperform existing algorithms by 10-15% at controlling ABR to improve QoE. This is because models based on heuristics do not adapt well to network conditions.
###### Punnal Ismail Khan
They showed that even after the model was trained on a synthetic dataset it still outperformed other existing ABR algorithms. This justifies the design choices.
###### Nikunj Baid
The proposed model, even though having been trained under emulated conditions and synthetic datasets, has outperformed the existing state of the art models and optimized the ABR determining strategy significantly. Hence, I think that the design choices are justified.
###### Ajit Jadhav
The meta story for evaluations is that even with synthetic dataset and emulated network conditions, the Pensieve model comfortably outperformed the existing ABR algorithms. This helps in justifying the design choices through the results.
###### Achintya Desai
In terms of QoE, Pensieve outperforms the best existing schemes, with average QoE improvements in the range of 12.1%-24.6%. Additionally, Pensieve is able to maintain high performance even under new network conditions and new network properties. Also, the performance of Pensieve is largely unaffected by parameters such as the neural network architecture and the latency between the ABR server and the video client. It does justify the design choices made by the authors, especially in training.
###### Vinothini Gunasekaran
The paper does performance testing on both the Pensieve model and state-of-the-art ABR algorithms using synthetic data. Pensieve outperforms them, with improvements in average QoE performance of 12%-25% across all considered scenarios.
###### Brian Chen
The evaluation section aimed to answer three questions: how does Pensieve compare to other ABR algorithms, is Pensieve generalizable, and how sensitive is Pensieve to various parameters. In a way, by answering these questions with their experiment, the paper does justify the design choices.
#### Question 10: How is this paper using network traces and Mahimahi tool for evaluation?
###### Aaron Jimenez
This paper is using network traces and Mahimahi in order to simulate a network environment using the broadband (provided by the FCC) and 3G/HSDPA (collected in Norway) network traffic datasets. Mahimahi is then used to emulate the network conditions from the corpus, along with an 80 ms RTT between client and server.
###### Samridhi Maheshwari
This paper builds its network traces by combining public datasets: broadband data from the FCC and 3G/HSDPA data collected in Norway. They generated 1000 traces for their corpus, with a duration of 320 seconds per trace, from the FCC dataset. For HSDPA they used a sliding-window technique to match the FCC data. They reformatted the throughput traces to be compatible with the Mahimahi network emulation tool.
###### Jaber Daneshamooz
Mahimahi emulates the network conditions. It was used to emulate the network conditions taken from the network traces of the broadband and 3G/HSDPA datasets. Mahimahi also adds an 80 ms round trip time between the client and server.
###### Liu Kurafeeva
The Mahimahi tool, along with network traces (from several public datasets), is used to emulate network conditions that resemble real ones. They had 1000 traces, each 320 seconds long. In most cases, 80% of this set was used to train the model and 20% to test all the compared models.
###### Alan Roddick
Mahimahi was used to emulate the network conditions that were taken from the network traces for the broadband and 3G/HSDPA datasets. Mahimahi also adds an 80 ms round trip time between the client and server.
###### Nagarjun Avaraddy
This paper is using network traces and Mahimahi tool to simulate network environment using the broadband and 3G/HSDPA network traffic datasets. Mahimahi is then used to emulate the network conditions from the corpus, along with an 80 ms RTT between client and server. They generated 1000 traces with a duration of 320 seconds for each trace.
###### Shereen Elsayed
They used a network trace dataset built from a combination of a broadband dataset provided by the FCC and a 3G/HSDPA mobile dataset collected in Norway. Mahimahi is an emulation tool, and they formatted their dataset to be compatible with it. It emulated the network conditions from their corpus of network traces, along with an 80 ms RTT between the client and server.
###### Arjun Prakash
They created a corpus of network traces by combining datasets provided by the FCC and a 3G/HSDPA mobile dataset. They generated 1000 traces from both datasets, each with a duration of 320 seconds. And they reformatted throughput traces from both datasets to be compatible with the Mahimahi network emulation tool.
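A hedged sketch of what "reformatting for Mahimahi" can look like: Mahimahi's mm-link expects a packet-delivery trace in which each line is a millisecond timestamp at which one MTU-sized (1500-byte) packet may be delivered, so a per-second throughput trace has to be expanded into those timestamps. The file names, the per-second input format, and the example values below are assumptions for illustration, not the authors' script.

```python
# Convert a throughput trace (one average Mbps value per second) into a
# Mahimahi packet-delivery trace: one line per 1500-byte packet opportunity,
# each line a timestamp in milliseconds from the start of the trace.
MTU_BITS = 1500 * 8

def to_mahimahi(throughput_mbps, out_path="trace.mahimahi"):
    with open(out_path, "w") as f:
        for second, mbps in enumerate(throughput_mbps):
            pkts = int(round(mbps * 1e6 / MTU_BITS))  # packets deliverable this second
            for i in range(pkts):
                # spread the packet opportunities evenly across the second
                ts_ms = int(second * 1000 + (i + 1) * 1000 / max(pkts, 1))
                f.write(f"{ts_ms}\n")

# Example: 5 seconds of a trace fluctuating around 1-3 Mbps.
to_mahimahi([1.2, 2.8, 0.6, 3.0, 1.5])
# The emulated link could then be run with something like:
#   mm-delay 40 mm-link trace.mahimahi trace.mahimahi -- <video client command>
# (mm-delay 40 adds 40 ms each way, i.e. the 80 ms RTT the reviews mention.)
```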
###### Rhys Tracy
The paper uses network traces and Mahimahi to emulate network conditions from two different sources: an FCC broadband dataset and a 3G/HSDPA dataset from Norway. The network traces capture the general conditions, and Mahimahi is then used to emulate those conditions.
###### Shubham Talbar
Mahimahi is a network emulation tool that was used to emulate the network conditions that were taken from the network traces for broadband and 3G/HSDPA network datasets. Mahimahi also adds an 80 ms round trip time between the client and server.
###### Satyam Awasthi
The paper uses network traces from public datasets: a broadband dataset from the FCC and a 3G/HSDPA dataset collected in Norway. For their corpus, they generated 1000 traces with a duration of 320 seconds per trace, and they reformatted the throughput traces from both datasets to be compatible with the Mahimahi network emulation tool.
###### Seif Ibrahim
The paper gathers network traces from multiple datasets. They then feed those traces to Mahimahi to do network emulation.
###### Punnal Ismail Khan
Mahimahi was used to emulate the network conditions from the corpus of network traces data. 80 ms RTT was used between client and server. The data used was FCC and 3G/HSDPA mobile datasets.
###### Nikunj Baid
Mahimahi can be used to emulate network conditions. The two network conditions emulated here are the FCC broadband network and 3G/HSDPA in Norway. The network traces were reformatted to be compatible with Mahimahi, and they used an 80-20 split for training/testing of the model. The only traces considered were those whose average throughput was in the range of 0.2-6 Mbps, to avoid scenarios where bitrate selection would be trivial. They generated 1000 traces for their corpus, each with a duration of 320 seconds.
###### Ajit Jadhav
A corpus of network traces was built by combining several public datasets: a broadband dataset provided by the FCC and a 3G/HSDPA mobile dataset collected in Norway. The paper then uses Mahimahi to emulate these network conditions for evaluation.
###### Achintya Desai
The Mahimahi tool is used to emulate network conditions based on network traces. They combined a broadband dataset and a 3G/HSDPA mobile dataset to generate a corpus of network traces, which Mahimahi uses to emulate network conditions. They randomly select traces from the web browsing category and concatenate them to generate 1000 network traces, and use a sliding window to generate 1000 traces from the HSDPA dataset. They also only consider original traces whose throughput is in the range of 0.2-6 Mbps, to avoid trivial bitrate choices. 80% of the network corpus was used for training and 20% for testing, amounting to about 30 hours of network traces.
###### Vinothini Gunasekaran
They created a corpus of network traces by combining several public datasets. They reformatted the throughput traces to be compatible with MahiMahi. Then, they used MahiMahi to emulate the network conditions from their corpus of network traces along with 80 ms RTT between the server and client.
###### Brian Chen
The paper used network traces to source their dataset for the model. Mahimahi was used to emulate the network based off of these traces.
###### Nawel Alioua
The authors combined several public datasets (network traces): a broadband dataset provided by the FCC and a 3G/HSDPA mobile dataset collected in Norway. The traces obtained from both datasets were then reformatted to be compatible with the Mahimahi network emulation tool.
#### Question 11: How is this paper demonstrating the generalizability of the proposed solution? Is it making a strong case?
###### Aaron Jimenez
This paper makes a strong case for generalizability by showing how the system can compete and outperform already established systems when trained on unrelated network data and a synthetic network dataset. Furthermore, when testing in the real world the authors demonstrated that their system can still outperform other ABR systems on QoE_lin.
###### Samridhi Maheshwari
To demonstrate the generalisability of Pensieve, the authors challenge themselves by building a synthetic dataset and training the model on it. They designed the dataset to cover a broad range of network conditions, with throughput ranging from 0.2 Mbps to 4.3 Mbps. The dataset was generated using a Markovian model, with state transitions performed at 1-second granularity. They then compared Pensieve to two ABR algorithms and showed that the model trained on synthetic data outperformed robustMPC. Since the results obtained with real data and synthetic data were very similar, this suggests that Pensieve can perform well even when trained only on synthetic data.
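A rough sketch of the kind of Markovian trace generator this answer describes (the number of states, the transition behavior, and the added noise are assumptions; only the 0.2-4.3 Mbps range, the 1-second transition granularity, and the 320-second trace length come from the reviews in this document):

```python
import random

# Throughput states evenly spaced between 0.2 and 4.3 Mbps.
NUM_STATES = 10
STATES = [0.2 + i * (4.3 - 0.2) / (NUM_STATES - 1) for i in range(NUM_STATES)]

def synthetic_trace(duration_s=320, seed=0):
    """Generate one synthetic throughput trace: one Mbps sample per second,
    with a Markov state transition every second to a nearby throughput level."""
    rng = random.Random(seed)
    state = rng.randrange(NUM_STATES)
    trace = []
    for _ in range(duration_s):
        # transition to a neighboring state (or stay) with equal probability
        state = min(max(state + rng.choice([-1, 0, 1]), 0), NUM_STATES - 1)
        # add small noise around the state's mean throughput
        trace.append(max(0.1, rng.gauss(STATES[state], 0.2)))
    return trace

trace = synthetic_trace()
print(len(trace), "seconds; first 5 samples:", [round(x, 2) for x in trace[:5]])
```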
###### Liu Kurafeeva
Firstly, the authors support generalizability using real-world experiments (a Verizon cellular network, public WiFi, and a wide-area network). Though the authors could use a larger variety of devices and better explain the choice of network conditions (different times of day, satellite links, etc.).
Secondly, the authors showed that Pensieve pretrained on a fully synthetic dataset (covering a wide variety of network conditions) outperforms other algorithms on real-world datasets. This should show that Pensieve adapts well (though to me it could also just mean the synthetic dataset was very good).
Thirdly, the authors used multiple videos (1000 synthetic videos). The video ranges are quite impressive, though I would also prefer real-life test videos.
The authors also showed Pensieve's generalization ability by using it in imperfect simulations.
###### Jaber Daneshamooz
In order to show the generalizability, they compare their solution with the state-of-the-art solutions on a synthetic dataset created using Markov-chain rules. The comparison showed that Pensieve is better than current ABR algorithms. We cannot definitively say that this generalization holds, because the randomly generated dataset may not resemble real networks.
###### Alan Roddick
The paper does show how Pensieve outperforms other ABR systems on two different datasets. It is somewhat unclear whether this technique would always perform better under network conditions different from those in the datasets. For example, networks that drop many packets at various times behave differently than a constant-speed network.
###### Shereen Elsayed
The authors tested their model over a huge variety of videos and over a real deployment.
###### Nagarjun Avaraddy
Yes, the model does generalize well, and this is demonstrated by training on videos with varying bitrates and by training without real data. Synthetic data generated using Markovian models still allows Pensieve to outperform the state-of-the-art ABR algorithms.
###### Rhys Tracy
The paper both tests the generalizability of the model across different video types and when trained without data. To test the ability of the model to generalize when data isn’t available, the model is trained on completely synthetic data and the results still show that for the most part it improves over all other ABR algorithms. To test generalizability across videos types, the paper uses synthetic videos with many random properties (such as the types and number of available bitrate encodings or calculating chunk sizes randomly with white noise); when comparing results for the multi-video model trained on this dataset to the single video model trained on another dataset, they were very similar in performance with the single video model only performing slightly better. Overall, yes it makes a strong case.
###### Shubham Talbar
The authors demonstrate the generalisability of Pensieve, by training the model on completely synthetic data generated using the Markovian model and show that it still outperforms existing state-of-the-art ABR algorithms (trained on real-world dataset) in terms of the QoE metrics. Hence, Pensieve adapts well to client-side custom network conditions and makes a strong case.
###### Satyam Awasthi
To demonstrate the generalisability of Pensieve, the model is trained on completely synthetic data generated using a Markovian model and still outperforms existing state-of-the-art ABR algorithms (trained on a real-world dataset) in terms of the QoE metrics. Thus, Pensieve adapts well to the network conditions and makes a strong case.
###### Seif Ibrahim
The paper tests with a variety of videos and they also simulate various network conditions when it comes to throughput, latency, and packet loss. This is evidence that their model may generalize well to real networks.
###### Punnal Ismail Khan
They trained their model using synthetic data and then tested it on real-world data. They still obtained better results than robustMPC. This shows that the model trained on a synthetic dataset can generalize to different real-world data.
###### Nikunj Baid
The generalizability of the model is demonstrated by the fact that it can adapt to different video types, and even though the training data is synthetic and emulated, it outperforms the existing ABR solutions. Also, random distortions are included in the emulated data to come closer to a real-world setting. Based on the tests performed and their outcomes, it does make a strong case in my opinion.
###### Ajit Jadhav
Yes, it does make a strong case by providing performance comparison across a large variety of datasets that includes data across varying network conditions.
###### Vinothini Gunasekaran
The authors used a wide range of videos for testing and showed how the model outperforms the existing ABR algorithms on synthetic data. Also, they have used the proposed model on real world data which demonstrates the generalizability of the proposed solution.
###### Achintya Desai
Yes, it makes a strong case for generalization because of its performance numbers compared to the state-of-the-art solutions, especially in network scenarios that are completely new to the model. During the training phase it sees a large enough variety of network conditions, which seems sufficient based on the two experiments conducted by the authors. They test Pensieve over a Verizon LTE cellular network, public WiFi, and a wide-area network between Shanghai and Boston. In all three cases, Pensieve outperformed the other algorithms. In the second experiment, the authors use a Markovian model to generate a new dataset with throughputs in the range of 0.2-4.3 Mbps. The model trained on this synthetic dataset was still able to outperform traditional ABR algorithms.
###### Brian Chen
The paper conducts three experiments to demonstrate the generalizability of Pensieve. The first one tests its ability to adapt to different networks. The second one tests its ability to adapt to inconsistent network conditions. The third one tests its ability to adapt to different video properties. I feel that the experimental design makes sense, but I'm not convinced by the results. If the paper is testing generalizability, then it needs to create broader datasets and perform more comparisons, especially in the case of synthetic data. If the data is already generated and not authentic, then why not push the boundaries of reasonable networks? Going up to 4.3 Mbps hardly seems like much of a range.
###### Nawel Alioua
The approach to show generalizability consists in first evaluating Pensieve in the wild on two real networks, and second training to perform across multiple environments using a purely synthetic dataset. The real world experiments were performed on three datasets: the Verizon LTE cellular network, a public WiFi network at a local coffee shop, and the wide area network between Shanghai and Boston. The synthetic dataset on the other hand was generated using a Markovian model in which each state represented an average throughput ranging from 0.2 Mbps to 4.3 Mbps. The authors suggest that Pensieve’s ABR algorithm that was trained on the synthetic dataset is able to generalize across these new networks, outperforming robustMPC and achieving better QoE values.
#### Question 12: How will you improve this work? What questions are left unanswered in this paper?
###### Aaron Jimenez
One of the main questions that I still had when reading this paper was in relation to the fact that the ABR decisions were done by a separate server. This raises the question of how cost effective and scalable this system will be for streaming providers as they now have to do that extra processing that is usually done by the client. If this is the case, why does the server even have to send the client ABR prediction only for the client to then send the request? Couldn’t they just automatically send the predicted chunk without informing the client beforehand? Perhaps this wouldn’t be such an issue if the model ran on the client’s own machine. This would most likely require downsizing the model for mobile and edge devices who may not have the compute power to run the model in a timely manner given their power consumption constraints.
###### Samridhi Maheshwari
The paper mentions how there is a gap between offline optimal and Pensieve performance. Maybe there can be an improvement there. The other improvement could be making scalable deployments on clients - where the server can send the trained model to the client, and the client can just run the model predictions without having to train the model (because of limited resources). This way, no information would need to be exchanged with Server and Client as this could lead to privacy violations.
###### Alan Roddick
I think testing this system on an even larger and diverse set of network conditions would be a good start. If we can show that Pensieve still outperforms other state of the art methods even on the edges of the network distribution, it is a very strong case that it should be deployed in production environments. In order to offload work on the server, more work could be done to look at allowing the clients to make ABR decisions instead of sending network observations to a server for it to make on the client’s behalf.
###### Nagarjun Avaraddy
Interpretable solutions are better for real-world scenarios, and Pensieve lacks interpretability. The other issue is that the ABR solution is server-side and processing-intensive, which leads to scalability, network reliability, and processing cost issues.
###### Jaber Daneshamooz
Using better and more general datasets that cover all types of networks (high entropy) may increase the accuracy of the model.
###### Liu Kurafeeva
I think this paper would benefit from extensive testing on different network conditions (satellites, different times of day) and different devices (is it always beneficial to connect to the server? Are there cases where the cost of this is too high? Can we put the model on the device, and how costly would that be?).
The cost and performance issues were not deeply discussed in the paper, which leaves some important questions open.
Further exploration of different QoE formulations (which were mentioned, but only briefly) could also show whether Pensieve is as adaptive as claimed.
###### Rhys Tracy
First, pensieve is still 10-15% away from optimal, which makes me question whether a deep machine learning model can better understand the relationship between the given features and QoE, or if there are more features that can give better insight into QoE and get closer to optimal. Second, I would have liked to test pensieve on more types of networks to further test generalizability (besides just using synthetic data).
###### Shubham Talbar
The authors propose a RL-based ABR algorithm generation which is an offline task. That is Pensieve uses the ABR algorithm generated a priori (during a training phase) and is never modified after deployment. However, Pensieve can naturally support an approach in which an
ABR algorithm is generated or updated periodically as new data arrives. This technique would enable ABR algorithms to further adapt to the exact conditions that video clients are experiencing at a given time. The extreme version of this approach is to train online directly on the video client. However, online training on video clients raises two challenges. First, it increases the computational overhead for the client. Second, it requires algorithms that can learn from small amounts of data and converge to a good policy quickly. Retraining frequency depends on how quickly new network behaviors emerge to which existing models do not generalize.
###### Satyam Awasthi
The way a client finally gets the ABR predictions involves a server sending it each such prediction. However, this approach might not be feasible if there are multiple clients and all in different network conditions. This might lead to delayed decisions and unreliability. To make the system more scalable, maybe the models can be run on the client itself. Also, the paper would make a stronger case had it used different networks with varying conditions.
###### Seif Ibrahim
I wonder how Pensieve would perform in real networks and in different types of networks, such as satellite networks, or wireless vs. wired networks. Maybe it's possible to have the model adjust QoE based on certain user preferences without retraining; for example, one user may care more about higher resolution while another cares more about smooth playback and few rebuffering events.
###### Punnal Ismail Khan
The results are good but there is not much work done to understand the model and why it is making certain decisions. Some work related to getting more insights about the model is something that is needed.
###### Nikunj Baid
The RL model will be a blackbox model in many ways, and it would be nice if we could translate the decisions such that they can be justified on paper. Also, whether the model would be equivalently performant at a much larger scale is something we need to find out. Testing the model in a more rigorous setting and deploying it in a real life network for an extended period would further help in proving the supremacy of the model.
###### Ajit Jadhav
While the learning based approach does give us the model, we need information about what the model is learning to deploy it in an operational setting. So, figuring out model interpretability could make a stronger case for adoption of this approach.
###### Achintya Desai
The Pensieve technique might not perform well when the network is unreliable. For example, when the policy rewards HD streaming much more than the other bitrates and the network is unreliable, there might be far more delays and lower QoE compared to a traditional approach that sticks with a lower bitrate and delivers consistent, if modest, quality.
###### Brian Chen
I would improve this work by addressing the issue of scalability. The idea behind Pensieve is to provide ABR algorithms that are themselves adaptive. The whole purpose is defeated if Pensieve isn’t frequently trained and also trained across a grid of minimal intervals. If the first condition is not met the adaptive aspect is lost and if the second condition is not met then the approach becomes too generalized and not much different from heuristics. One question that I have for the paper is just how useful do the authors suspect Pensieve to be? To what granularity of deployment do they think is necessary to outperform heuristics at scale?