# Overflow Document -- CS 293 N, Spring 22

## Lecture 15

### Reading Assignment

Read the paper, [Aurora](https://arxiv.org/pdf/1810.03259.pdf)

Questions

* What problem is this paper solving?
* What is the state of the art (read section 7)? Why are existing solutions not good enough?
* How is the paper motivating the case for RL-guided congestion control?
* What is non-congestion loss?
* What are the actions, state, and rewards for the RL-based congestion control problem?
* How many hidden layers are used for the neural network? If a very small NN architecture works for a problem, what does it tell us about the nature of the problem?
* How is reward quantified for this problem? How do the authors justify the choice of weights (e.g., 10 for throughput, etc.)?
* How do they justify choices for various other parameters? Are you satisfied with those choices? How else can these choices be justified?
* What are the success metrics for the proposed solution?
* Why does the evaluation section focus so much on robustness? Are the results satisfactory?
* How will you improve this work? What questions are left unanswered in this paper?

#### Reviews

[Aurora Reviews](/rlvutK9-Q5mRfksahkNmCA)

#### Post-lecture Blog — Team 6

##### Introduction

Internet congestion control is an important networking problem that requires modulating the data transmission rates of traffic sources to efficiently utilize network capacity. Existing congestion control protocols like TCP CUBIC - the default for Linux - rely on heuristics to make decisions about the sending rate. TCP CUBIC doesn't have access to an oracle state to determine the data transmission rate, but even with its simple heuristics-based approach it is able to achieve good performance across different network conditions. However, it is not well suited for mobile networks or networks with larger capacities where bufferbloat is a possibility, which can lead to packet loss. Given these issues and the inherent nature of congestion control, it is helpful to model it as a learning problem: quantities such as the capacity of bottleneck links, which acknowledgments are informative, and the appropriate data transmission rate can all be learned by models. This paper introduces Aurora, a framework that models internet congestion control as a machine learning task using deep RL. Unlike CUBIC, ML-based algorithms can maintain a state that stores the history of data traffic and learn sending rates accordingly.

##### Motivation for RL-based congestion control

Aurora uses a deep RL-based approach for congestion control that captures intricate patterns in the history of data traffic and network conditions and exploits them for better rate selection by learning the mapping between the two. Deep RL-based schemes like Aurora can learn useful patterns such as distinguishing between non-congestion and congestion-induced loss and adapting to variable network conditions, among others.

###### Non-congestion and congestion induced loss

State-of-the-art internet congestion control protocols use packet loss as a signal to decrease the sending rate, in turn failing to fully utilize the links' bandwidth. For example, TCP CUBIC halves the sending rate upon any occurrence of packet loss, leading to a higher cost of deployment. However, packet drops are not necessarily indicators of a decrease in network capacity. As such, heuristics-based approaches make assumptions that over-simplify the decision-making process.
However, it is important to distinguish between different types of packet loss and the reasons behind such losses to make better congestion control decisions. Non-congestion-based loss can occur due to several factors, for example, handover between mobile base stations. TCP CUBIC is not well adapted for mobile networks in general, amplifying the need for better congestion control algorithms. Figure 2 compares the throughput of TCP CUBIC and Aurora in situations with packet loss. The comparison is made for a single flow on a link with a bandwidth of 30 Mbps, where 1% of the sent packets are randomly dropped. As expected, Aurora has better bandwidth utilization than TCP CUBIC, as it is able to effectively distinguish between the two types of packet loss.

###### Adapting to variable network conditions

Network conditions can change considerably over time. Link capacity, packet loss rate, and end-to-end latency are highly dynamic, especially in mobile networks. TCP fails to adapt to variable network conditions. However, Aurora can learn that a sudden steep rise in packet loss indicates a decrease in available bandwidth, while no packet loss at an increased sending rate indicates higher available bandwidth. Figure 3 shows a comparison between TCP CUBIC and Aurora in a variable network with a single flow on a link whose capacity alternates between 20 Mbps and 40 Mbps every 5 seconds.

![](https://i.imgur.com/LYXIRMf.png)

##### Training Aurora

Aurora focuses on chasing the tail of the congestion control task, given that existing approaches like CUBIC already perform very well in most network conditions. Aurora uses deep RL to learn a policy that maps observed network statistics to sending rates. It uses a simple simulation framework to train a small neural network, which can generate congestion control policies that perform well in different network conditions. The testing suite for Aurora uses an emulated environment that sends real packets through a real Linux stack across virtual network interfaces using standard networking tools - Mininet and Pantheon. The simulation environment code for training Aurora and the testing modules are open-sourced as an OpenAI Gym repository.

###### Simulator

The simulator used for training Aurora is a simple simulation of a single traffic source on realistic internet links with various characteristics such as bandwidth, latency, random loss rate, and queue size. The simulator is sufficiently accurate and uses three abstractions - links, packets, and senders - to model a simple network. Figure 8 shows a high-level view of the training framework.

![](https://i.imgur.com/5uWjyS9.png)

###### Model architecture

Congestion control decisions are made in the Linux kernel, which provides limited flexibility in possible representations and compute time, making it hard to train a large ML model. Aurora therefore uses a small neural network architecture that enables low-cost training for RL-based congestion control. The input to the model is the bounded-length history of statistics vectors, and the output is the change in sending rate. The chosen architecture uses two hidden layers of 32→16 neurons with tanh nonlinearity.
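For concreteness, here is a minimal sketch of such a policy network, assuming PyTorch. It only illustrates the architecture described above (two hidden layers of 32 and 16 units with tanh nonlinearity); the history length and the number of per-MI statistics are placeholders, and this is not the authors' implementation.

```python
# Minimal sketch of Aurora's small policy network (illustrative only).
import torch
import torch.nn as nn

k, num_stats = 10, 3            # history length and per-MI statistics (assumed placeholders)
policy = nn.Sequential(
    nn.Linear(k * num_stats, 32), nn.Tanh(),
    nn.Linear(32, 16), nn.Tanh(),
    nn.Linear(16, 1),           # single output: the change in sending rate
)

state = torch.zeros(1, k * num_stats)   # placeholder flattened history of statistics vectors
rate_change = policy(state)
print(rate_change.shape)                # torch.Size([1, 1])
```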
###### Modeling congestion control as RL

In Aurora, congestion control is modeled as a sequential decision-making problem defined by actions, states, and a reward function.

**Actions:** Actions are changes to the sender's sending rate. The sending rate x is adjusted once per time interval t, known as a Monitor Interval (MI). The value of x, and when it should be changed, is regressed from the learned model.

**State:** States are bounded histories of network statistics calculated from received packet acknowledgments. The latency dynamics of a network over time can effectively indicate transitions in its state, so the statistics vectors that model the state use delay-based metrics:

- Latency gradient is the derivative of latency with respect to time. It is essentially the gradient over the time series of latency values at the granularity of MIs.
- Latency ratio is the ratio of the current MI's mean latency to the minimum mean latency observed in any MI in the connection's history.
- Sending ratio is the ratio of packets sent to packet acknowledgments received at the sender. This metric gives a signal about the state of the network.

State-of-the-art congestion control techniques like CUBIC maintain state with different information, such as timeouts to indicate packet loss, with no focus on delay metrics as in Aurora. The sender uses a bounded-length history (of length k) of the statistics vectors collected from the packet acknowledgments sent by the receiver to decide on the next rate change. This allows the sender to detect trends and changes in network conditions over time.

**Reward:** The reward resulting from a specific sending rate at a certain time depends on the performance requirements of the particular application in consideration. Aurora uses a linear reward function that rewards throughput while penalizing loss and latency:

reward = 10 × throughput − 1000 × latency − 2000 × loss
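The following is a minimal sketch, in plain Python, of the per-MI statistics vector and the linear reward described above. It is illustrative only: the function and variable names are our own, not the authors' code.

```python
# Illustrative sketch of Aurora's state statistics and reward (not the authors' code).

def mi_statistics(latencies, prev_mean_latency, min_mean_latency, sent, acked):
    """Per-Monitor-Interval statistics vector used to build the state."""
    mean_latency = sum(latencies) / len(latencies)
    latency_gradient = mean_latency - prev_mean_latency   # finite difference across MIs, a proxy for d(latency)/dt
    latency_ratio = mean_latency / min_mean_latency       # current MI latency vs. minimum observed in the connection
    sending_ratio = sent / max(acked, 1)                  # packets sent vs. acknowledgments received
    return [latency_gradient, latency_ratio, sending_ratio]

def reward(throughput, latency, loss):
    """Aurora's linear reward: favor throughput, penalize latency and loss.
    In the paper, throughput is in packets/sec, latency in seconds, loss a fraction."""
    return 10 * throughput - 1000 * latency - 2000 * loss

# The state fed to the policy is the concatenation of the last k statistics vectors.
k = 10
history = [[0.0, 1.0, 1.0]] * k          # placeholder history of k MIs
state = [x for vec in history for x in vec]
```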
###### Parameter tuning

**History length:** The sender makes sending-rate decisions based on the statistics vectors from the k latest MIs, where k is the history length being considered. Increasing the history length should improve performance; however, a model with k=2 has performance comparable to a model with k=10.

**Discount factor:** An action of changing the sending rate can have long-term consequences. For example, a faster sending rate can lead to buffer load and packet loss. Similarly, rewards can be delayed due to limited buffer size on the link and increased latency from higher link occupancy. This long-term decision making is captured by the discount factor γ in the RL objective, i.e., the policy is trained to optimize the discounted return $\sum_{k \ge 0} \gamma^k r_{t+k}$ rather than the immediate reward alone. The best policy is learned fastest with γ=0.99, while γ=0.5 eventually learns a reasonable policy, but a very low γ such as γ=0.0 cannot learn a useful policy.

![](https://i.imgur.com/1Yn3yAl.png)

Parameter tuning is one of the most critical tasks in building machine learning models. The parameter values used in Aurora have been empirically evaluated on the training network. However, based on the given experiments, it is unclear whether these parameters are dependent on or agnostic to network conditions. These parameters could also differ across network conditions: for example, cellular networks are very dynamic and have a notion of coherence time, whereas other wireless networks are generally more static.

##### Evaluation and Results

The authors used a simple simulator for training and an emulator for testing. They implemented a novel simulator which realistically mimics internet links with various characteristics. They conducted experiments to test the robustness of the trained model using network research tools such as Mininet and Pantheon, and demonstrated the model's behavior with bandwidth, latency, queue size, and random loss rate varied far beyond the training conditions. They ran each test for two minutes with a single sender over a single link and compared the result against TCP CUBIC and PCC Vivace. The different factors and their results are as follows:

- **Bandwidth:** The training bandwidth was between 1.2 Mbps and 6 Mbps. In testing, they configured the bandwidth differently for each test, ranging from 1 to 128 Mbps. Aurora still operates well at bandwidths more than 20x beyond its training environment.
- **Latency:** The training latency was between 50 and 500 ms, but testing was done between 1 and 512 ms, which is a good representation of real network conditions. Aurora performed poorly at 1 ms latency.
- **Queue size:** Aurora was trained with queue sizes varying between 2 and 2981 packets. For testing, they used queue sizes of 1 to 10K packets. In the figure below, you can see that Aurora has significant throughput improvements over TCP CUBIC across that range.
- **Loss rate:** Aurora's training used a random loss probability between 0% and 5%, which is a closer representation of real internet traffic. They tested the model with up to 8% random loss probability, and Aurora provided near-capacity throughput at higher loss than the other two algorithms.

![image alt](https://i.imgur.com/g4gLIqZ.png)

Overall, Aurora shows robust performance in environments outside its training scope and outperforms state-of-the-art congestion control algorithms. This suggests that applying deep RL to congestion control, even with limited training, has the potential to outperform handcrafted protocols.

###### Research Discussions

One major downside of this project is that the paper does not test and report results with real network conditions and data. Even though the performance under synthetic network conditions seems promising, testing has to be conducted under real-world network conditions to gain confidence in this model. Still, it is a good step in the right direction. By providing open-source code, this paper paves the way for the research community to explore RL-based internet congestion control solutions further.

- *Do they have compatibility limitations on kernel/OS?*
  - They use a simple neural network with only two hidden layers, but there are still trade-offs in terms of cost and time.
- *What is the cost associated with this learning problem?*
  - *Memory cost* - the model has to store the history and run a computation for each decision.
  - *Number of decision cycles* - based on the history length, the process takes time to make a decision. The appropriate history length depends on the network conditions, since the requirement would vary. For example, if we run the experiment for longer in a given network setting, then we can have longer monitor intervals and a higher history length.
  - *Time* - the model's performance could be affected by its computational time. If decision making does not happen in real time, it is a futile attempt at a CC problem.
- *What can we do differently here?*
  - We can use ML tools to understand the heuristics of the congestion control learning problem. We can train a model to get an ML solution, learn from it, and apply interpretability tools to it. To understand the derived solution, we can compare the model's performance with TCP CUBIC. Then, we can create whitebox representations based on our understanding of the ML solution.
  - Congestion control problems are complicated, with many factors involved, so a whitebox representation would be more appropriate than blackbox models.
  - The other option is to use a hybrid model: use a heuristics-based CC protocol by default and switch to the ML model when the network becomes complicated. The algorithm should be able to switch between the heuristics-based protocol and the ML model, maintain a balance over when the ML model takes over, and the scenarios for switching between the two models should be clearly defined (a minimal sketch of this switching idea follows after this list).
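Here is a minimal sketch of that hybrid idea. Everything in it (the policies, the `is_network_complicated` test, and its thresholds) is hypothetical and only illustrates the switching logic being discussed, not a concrete proposal from the paper.

```python
# Illustrative sketch of a hybrid CC controller: heuristic by default,
# learned policy only when the network looks "complicated" (assumed criteria).

def is_network_complicated(stats, loss_threshold=0.02, jitter_threshold=0.05):
    """Crude, hypothetical switching condition: high random loss or highly variable latency."""
    return stats["loss_rate"] > loss_threshold or stats["latency_jitter"] > jitter_threshold

def choose_rate(stats, heuristic_policy, ml_policy):
    if is_network_complicated(stats):
        return ml_policy(stats)        # hand over to the learned policy
    return heuristic_policy(stats)     # default: well-understood heuristic (e.g., CUBIC-like)
```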
## Lecture 16

### Reading Assignment

Read the paper, [Metis](https://zilimeng.com/papers/metis-sigcomm20.pdf)

Questions:

* What problem is this paper solving?
* What are the limitations of DRL solutions for networking?
* Why were the existing interpretability tools not suited for the problem considered in the paper?
* Describe the algorithm/methodology that the paper adopted to convert a DRL policy into a decision tree.
* Why do we need to prune decision trees? How does METIS prune the decision tree?
* What are the key takeaways from applying METIS to Pensieve?
* Why is it challenging to convert RNNs into decision trees?
* What other approaches would you take to make DRL solutions more interpretable?
* How will you improve this work? What questions are left unanswered in this paper?

#### Reviews

[Metis Reviews](/PRtvgORYSKO6jYS0Ex0t3A)

#### Post-lecture Blog — Team 7

###### What problem is METIS trying to solve?

Increase interpretability and create models that are lightweight while maintaining accuracy. The paper addresses the lack of explainability/interpretability of DL-based networking systems. It does so by designing a framework with interpretation methods based on decision trees (converting policies to rule-based controllers) and hypergraphs (highlighting the critical components based on analysis over a hypergraph).

###### What are the limitations of DRL solutions for networking?

- Difficult to debug.
- Difficult to train: they need a lot of data and training time.
- Difficult to deploy: models with millions of parameters are slower at inference than a decision tree.
- Pensieve helped to speed up training by simulating a lot of data for video playback.

###### Why were the existing interpretability tools not suited for the problem considered in the paper?

One idea: train a whitebox model from the start instead of a blackbox model. This may result in lower accuracy, but it depends on the nature of the problem.

###### Describe the algorithm/methodology that the paper adopted to convert a DRL policy into a decision tree

The paper uses a student-teacher methodology: the student learns from the answers of the teacher and tries to emulate the teacher with a smaller model. You take a subset of the samples for training because the amount of data is huge, and the goal of the subset is to maximize the diversity of the samples. Suppose you have trained a black-box model and the goal now is to make it interpretable. You take the labeled data along with the predictions from the black-box model, and use those predictions to train the decision tree. Another way is to train the whitebox model from the start; this may not perform as well as the black-box model. A decision tree is harder to train and needs more data points. If our goal is to evaluate the model, we may try to convert the black box into a white box; if our goal is to solve the problem, we may want to train a white box from the start.

The method: from a state s, the black-box policy takes an action a to get to s'. A decision tree might instead learn a' and end up at s''. If the reward deviates too much from the black-box model, the black-box model takes over and predicts the next state s''' from s'' to feed back as input to the decision tree. This is an interactive method.
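Below is a minimal sketch of the basic distillation step described above - fitting a small decision tree to imitate the actions of a trained black-box policy - assuming scikit-learn. It is illustrative only; it is not Metis's code and omits the interactive rollout correction described above.

```python
# Teacher-student distillation sketch: a small decision tree imitates the
# actions chosen by a black-box DRL policy on a (diverse) subset of states.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def distill_policy(teacher_policy, states, max_leaves=32):
    """teacher_policy: callable mapping a batch of states to discrete actions."""
    actions = teacher_policy(states)                              # teacher labels the data
    student = DecisionTreeClassifier(max_leaf_nodes=max_leaves)   # small, interpretable tree
    student.fit(states, actions)
    return student

# Hypothetical usage with placeholder states and a placeholder "black-box" policy.
states = np.random.rand(10_000, 8)
teacher = lambda s: (s[:, 0] > 0.5).astype(int)
tree = distill_policy(teacher, states)
print(tree.get_depth(), tree.get_n_leaves())
```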
###### Why do we need to prune decision trees? How does METIS prune the decision tree?

Pruning: this process may produce a very complex decision tree. Since the purpose of the decision tree is to understand the decisions, a complex tree is not what we want. We want to simplify the decision tree into one that captures a good structure of the problem and helps explain the decisions. Simpler models also improve inference time, which is important for certain time-sensitive applications.

###### What are the key takeaways from applying METIS to Pensieve?

They found some flaws in Pensieve and worked to mitigate them; this was a major takeaway from the paper. They found that simplifying the model resulted in a more interpretable model without sacrificing performance. Pensieve rarely chooses certain bitrates: for example, it oscillates between 1850 kbps and 4300 kbps instead of choosing the optimal 2850 kbps for a 3000 kbps link.

###### Why is it challenging to convert RNNs into decision trees?

The biggest challenge was carrying the idea of the RNN's memory unit over into the decision tree, since a decision tree is intrinsically incapable of holding memory across samples.

###### What other approaches would you take to make DRL solutions more interpretable?

One approach would be to reproduce the data to get realistic traces. This could be done by using physical RasPis, measuring the trace near the client, and then sampling it across the different networks, including the networks in which the solution will actually run. Another approach would be to measure the data from the server's perspective, if there is access to it, then use METIS to reproduce the results for that data and compare them with the real data. This would help us create smaller decision-tree rules that match the heuristic; in the worst case, we would at least have an idea of how the decisions are made. We could even selectively use the model only when the network conditions are suitable for it, adaptively.

## Lecture 17

### Reading Assignment

Read the paper, [GENET](https://arxiv.org/pdf/2202.05940.pdf)

Questions:

* What problem is this paper solving?
* What is curriculum learning?
* What are the challenges in applying curriculum learning to networking problems?
* What key observations did the paper make to address these limitations?
* Describe the algorithm/methodology that the paper adopted to apply curriculum learning.
* What key success metrics did the paper focus on in the evaluation section?
* Describe the testbed setup for the ABR and congestion control problems (Appendix A.4).
* Why do you think the proposed approach works better than Bayesian Optimization?
* What other (networking-related) learning problems can benefit from curriculum learning?
* How will you improve this work? What questions are left unanswered in this paper?

### Post-lecture Blog — Team 8

### Introduction

Deep reinforcement learning (DRL) has recently been used to achieve state-of-the-art results for various networking and system adaptation problems like congestion control, adaptive-bitrate streaming, load balancing, wireless resource scheduling, and cloud scheduling.
However, these RL-based techniques face two challenges that can ultimately impede their wide use in practice:

- **Training in a wide range of environments:** An RL policy may perform poorly when trained on a distribution spanning a wide variety of environments, even if tested in environments taken from that same training distribution. To show this, the paper uses three target distributions (with increasing parameter ranges), labeled RL1/RL2/RL3, of synthetic environment parameters to compare the asymptotic performance of three RL policies (with different random seeds) against rule-based baselines for three different applications, i.e., congestion control, adaptive bitrate, and load balancing. The results from the experiment are shown in the figure below.

![](https://i.imgur.com/O5REHfM.png)

- **Generalization:** RL policies trained on one distribution of synthetic or trace-driven environments may perform poorly, and even behave erroneously, when tested in a new distribution of environments. The paper tested the generalizability of RL-based CC schemes in two ways. First, an RL-based CC algorithm was trained on a range of synthetic environments and validated by confirming its performance against a rule-based baseline, BBR, in environments independently generated from the same range as training. However, when tested on real-world recorded network traces in the "Cellular" and "Ethernet" categories, the RL-based policy yields much worse performance than the rule-based baseline. The results are shown in the figure below.

![](https://i.imgur.com/Hvt366S.png)

### Curriculum learning

The paper advocates the use of curriculum learning for training RL-based network adaptation. Unlike traditional RL training, which samples training environments from a fixed distribution in each iteration, curriculum learning gradually increases the difficulty of the training environments, so that training always focuses on the environments where improvement is easiest, i.e., the most rewarding environments. Prior work has demonstrated the benefits of curriculum learning in other applications of RL, including faster convergence, higher asymptotic performance, and better generalization.

#### Challenges for curriculum learning

The challenge of employing curriculum learning lies in determining which environments are rewarding. This varies with the application, but three general approaches exist: (1) training the current model on a set of environments individually to determine in which environment training progresses faster; (2) using heuristics to quantify how easy it is to achieve model improvement in an environment; and (3) jointly training another model (typically a DNN) to select rewarding environments. Among them, the first option is prohibitively expensive and thus not widely used, whereas the third introduces the extra complexity of training a second DNN. Therefore, the paper explores the second approach.

### Genet approach

To tackle these problems, the paper presents Genet, a new training framework for learning better RL-based network adaptation algorithms. Genet is based on curriculum learning, which has proved effective against similar issues in other domains where RL is extensively employed. In curriculum learning, increasingly complex environments are introduced during training so that the RL model can make meaningful progress. The major challenge in applying curriculum learning is quantifying the difficulty of a training environment in order to structure the curriculum.
To address this, the paper uses traditional rule-based (non-RL) baselines and adds to the curriculum the training environments where the current RL model performs significantly worse than such a baseline. This eliminates the reliance on handcrafted heuristics for determining an environment's difficulty level. Using this approach, Genet automatically searches for the environments where the current model falls significantly behind a traditional baseline scheme and iteratively promotes these environments as training progresses. Evaluating Genet on three use cases - adaptive video streaming, congestion control, and load balancing - the paper shows that Genet produces RL policies that outperform both regularly trained RL policies and traditional baselines in each context, not only under synthetic workloads but also in real environments.

![](https://i.imgur.com/E2TEbaT.png)

### Design and implementation of Genet

#### Curriculum generation

Genet tries to find environments that have a large gap-to-baseline. There are three main benefits to this. First, if a rule-based baseline performs better than the RL model in an environment, the RL model can at least learn to replicate the baseline's rules during training, so a large gap-to-baseline indicates an area where the RL model has room to improve. The second benefit relates to network operators, who tend to scrutinize any performance disadvantage of an RL policy compared to the traditional rule-based baselines already employed in the system; by promoting large gap-to-baseline environments, Genet reduces the possibility of performance regressions for the RL policy. Third, this makes the RL model easier to train in these environments. This is how the authors argue for preferring large gap-to-baseline environments during curriculum generation; note that the gap has no direct relation to the absolute reward associated with an environment.

![](https://i.imgur.com/8vAUos9.png)

#### Training framework

Genet follows an iterative workflow to realize curriculum learning. Each iteration has three steps. First, the current RL model is updated for a fixed number of epochs over the current training environment distribution. Second, environments are selected where the current RL model has a large gap-to-baseline, as described in the curriculum generation section. Third, these selected environments are promoted in the training environment distribution used by the RL training process in the next iteration.

The training environments are distributed according to a probability distribution over the space of configurations used for generating network environments. Genet sets the initial training environment distribution to be uniform along each parameter and updates the distribution used in each iteration. When trace records are available, Genet categorizes each trace along bandwidth-related parameters; each time a configuration is selected by RL training to create a new environment, Genet samples a trace whose parameters fall into the range implied by the configuration. Because the number of epochs is fixed at the beginning of an iteration, Genet can reuse the traditional training methods of prior work, which makes it possible to apply Genet incrementally to existing code. After a certain number of epochs, a sequencing model searches for environments where the current RL model has a large gap-to-baseline; Bayesian optimization (BO) is used in this search. The expected gap-to-baseline over the environments created by configuration $p$ is

$$\text{gap}(p) = R(\pi^{rule}, p) - R(\pi^{rl}_{\theta}, p),$$

where $R(\pi, p)$ is the average reward of a policy $\pi$ over 10 environments randomly generated from configuration $p$. BO searches for the configuration such that gap(p) is maximized. When a new training environment is sampled, the RL training process chooses the new configuration with some probability, or uniformly samples a configuration from the old distribution with the remaining probability. Genet restarts the BO search after the model is updated with this choice, to ensure that the rewarding environments reflect the updated model.
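The iterative workflow and the gap-to-baseline search described above can be summarized with a minimal sketch. Everything here is illustrative: `rl_train`, `rollout_reward`, `config_space`, and the policy objects are hypothetical stand-ins, and plain random search stands in for the paper's Bayesian optimization.

```python
# Illustrative sketch of Genet's curriculum iteration (not the authors' implementation).
import random

def gap_to_baseline(config, rule_policy, rl_policy, rollout_reward, n_envs=10):
    """gap(p) = R(pi_rule, p) - R(pi_rl, p), each averaged over n_envs
    environments randomly generated from configuration p."""
    r_rule = sum(rollout_reward(rule_policy, config) for _ in range(n_envs)) / n_envs
    r_rl = sum(rollout_reward(rl_policy, config) for _ in range(n_envs)) / n_envs
    return r_rule - r_rl

def genet_iteration(rl_policy, rule_policy, config_space, sample_config,
                    rl_train, rollout_reward, promote_prob=0.3, n_candidates=20):
    # 1) update the RL model for a fixed number of epochs on the current distribution
    rl_policy = rl_train(rl_policy, sample_config)
    # 2) search for a configuration with a large gap-to-baseline
    #    (the paper uses Bayesian optimization; random search stands in here)
    candidates = [config_space.sample() for _ in range(n_candidates)]
    hardest = max(candidates,
                  key=lambda p: gap_to_baseline(p, rule_policy, rl_policy, rollout_reward))
    # 3) promote that configuration: sample it with some probability,
    #    otherwise keep sampling from the old distribution
    def new_sample_config():
        return hardest if random.random() < promote_prob else sample_config()
    return rl_policy, new_sample_config
```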
#### Design rationale

The authors discuss three main design decisions. The first concerns the choice of baseline: the criterion for selecting a baseline is that it should not fail in simple environments. Although the choice of baseline does not have a huge impact on performance, the authors suggest that one could also use an ensemble of rule-based heuristics and let training focus on environments where any one of the heuristics fails. The second concerns the effect of BO exploration: the authors argue, from empirical evidence, that BO is efficient within a relatively small number of steps. The third relates to the problem of the model forgetting how to handle environments it has seen before; in practice, Genet stops training after changing the training distribution 9 times.

#### Implementation

Genet is implemented in Python and Bash. Genet interacts with the RL code through Train and Test calls. Train signals the RL code to continue training using the given distribution of environment configurations and returns a snapshot of the model after a number of epochs. Test calculates the average reward of a given algorithm over a number of environments. Genet integrates with Pensieve for ABR, Aurora for CC, and Park for LB, and uses the rule-based baselines implemented in those works; the authors also implemented a few more baselines.

### Evaluation

For the evaluation, the authors trained Genet for three use cases: 1) congestion control, 2) adaptive bitrate streaming, and 3) load balancing. Training and testing used two types of environments: 1) synthetic environments, where parameters are chosen to cover a wide range of factors, and 2) trace-driven environments, where the bandwidth time series is set by real traces (collected using the Pantheon platform) with the other parameters set as in the synthetic environments. Genet was compared against both traditional RL training and rule-based algorithms. They trained three types of RL policies over the uniform distribution of synthetic environments and over trace-driven environments. For the rule-based approaches, they compared against BBA and RobustMPC for ABR; PCC-Vivace, BBR, and CUBIC for CC; and least-load-first (LLF) for LB.

#### Asymptotic performance

When Genet-trained policies were compared against baselines trained on the synthetic environments, the authors show that the Genet-trained model consistently improves over traditional RL-trained policies by 8–25% for ABR, 14–24% for CC, and 15% for LB. These results can be seen in the figure below.

![](https://i.imgur.com/QnYctSG.png)

For trace-driven environments, the models were trained while varying the ratio of real traces to synthetic environments. Even here, the Genet-trained policies outperform traditional RL training by 17–18%, regardless of the ratio of real traces.

#### Generalization

To test generalization, they compare RL policies trained entirely on a synthetic environment distribution and test them in trace-driven environments generated from the testing traces.
![](https://i.imgur.com/m9e3Zhy.png)

The figure above shows that the Genet-trained policies perform better than traditional RL baselines trained over the same synthetic environment distribution.

![](https://i.imgur.com/uwk71GE.png)

The figure above shows the comparison against the rule-based baselines; clearly, the Genet-trained policy outperforms the others. To check how likely it is that Genet outperforms the rule-based baseline, they create various versions of Genet-trained RL policies by setting the rule-based baselines to be CUBIC and BBR (for CC), and MPC and BBA (for ABR). Compared to RL1, RL2, and RL3 (which are unaware of rule-based baselines), Genet-trained policies remarkably increase the fraction of real-world traces on which the RL policy outperforms the baseline used to train it. This suggests that by specifying a rule-based baseline, Genet will train an RL policy that outperforms it with high probability.

The authors also tested Genet-trained ABR and CC policies over five real wide-area network paths and showed that Genet outperforms the alternatives in all except two cases. In one case, the Genet-trained ABR policy shows only a small improvement, because the bandwidth is always much higher than the highest bitrate and the baselines always use the highest bitrate, leaving no room for improvement. In the other case, the Genet-trained CC policy has a negative improvement, because the network has a deeper queue than used in training, which the RL policy cannot handle well.

The authors also show how well Genet's design choices perform against other curriculum learning schemes. They show that Genet's training curves ramp up faster, suggesting that with the same number of training epochs, Genet can arrive at a much better policy. They also evaluated the variant of Genet that uses BO to find environment configurations with a high gap-to-baseline: BO can identify good configurations within a small number of steps, and it is about as good as random search over many more points.

#### Reviews

[GENET Reviews](/DfViUpveS1O26st7BzvpTg)

## Lecture 18

### Guest Lecture from Dr. Ajay Mahimkar, AT&T

#### Post-lecture Blog — Team 9

In this guest lecture, Dr. Ajay Mahimkar, Lead Inventive Scientist at AT&T, covered several topics related to the application of ML to networking problems, including the suitability and necessity of AI/ML for 5G network optimization, the current status of research in this field, and the challenges that remain to be solved.

#### 5G network applications and potential for AI/ML boost

With the recent advancement of mobile networking towards 5G, new applications such as autonomous cars, virtual reality, IoT (Internet of Things), and autonomous drones are emerging. Although AI/ML has already been applied for a long time, it is expected to be a game-changer especially for QoE-sensitive applications, by enabling an enhanced user experience - and therefore increased revenue for service providers - while lowering costs such as operational expenditure. To achieve these goals, AI/ML has the potential to intervene and improve at various levels of the development loop, including:

- Data collection and curation.
- Network design, planning and building, e.g., deciding where to place new nodes, 5G site selection, etc.
- Network optimization, e.g., traffic forecasting.
- Operation automation, e.g., helping network operators make configuration decisions. One characteristic of the configuration is that it is not static, as conditions may change (e.g., in case of congestion or an outage, neighbor nodes are reconfigured to take on the traffic). Therefore, optimization should happen continuously, posing a challenge to the operators.
- Applications and experiences.

**Q:** What is the current status of AI/ML application in 5G?

**A:** Taking a step back to previous technologies, 3G did not see much performance benefit from applying ML. Conversely, ML did find application in LTE for dynamic traffic management. Now with the advent of 5G, the claims in terms of performance are bigger, and the culprit for this level of performance not being achieved yet is the RAN (Radio Access Network). One of the things that makes applying ML for performance optimization in 5G challenging is the complexity of its architecture, which comprises different layers. Currently, engineers are looking into traffic patterns of 5G networks in order to identify ways to perform optimizations, while experimenting on standalone vs. non-standalone versions of 5G.

**Q:** It seems that the discussions on 5G optimization will be more extensive, and that simple heuristics will not be able to cope with the expectations.

**A:** Absolutely, because of the sheer number of parameters we have to deal with in 5G (in the hundreds). For LTE, engineers perform parameter tuning depending on the location and the morphology of the network, but it is mostly based on trial and error. This is clearly unfeasible in the context of 5G. In terms of ML, choosing the method is another challenge. For example, Reinforcement Learning (RL) is a good fit if the problem at hand has only a few parameters whose ranges are not too wide. However, if there are too many parameters spanning a large range, then the search space becomes exponential. In addition, an important aspect of how the radio performs is the environment, which includes external factors such as noise, weather, and interference. Consequently, for the same configuration parameters, the actual experience can be very different. The question is: can we, using AI/ML, do better than what humans already do?

**Q:** One problem we encounter in the field is the sparsity of data, and collecting larger amounts of data does not solve the issue…

**A:** That's right. Configuration parameters are sparse, and might not be applicable in different settings. RL consists of exploration, where we try out different parameters, and exploitation, where we apply the parameters to a different setting. Therefore, obtaining good results during exploration cannot guarantee that exploitation will be as successful, since the optimized parameters might be tied to a specific environment.

#### AI/ML for operations automation

Open research questions include:

- Change management: how to introduce a change in configuration without hurting performance?
- Resource management: e.g., to save energy, a cell site should be put to sleep if under-used. When is it most appropriate to put the cell to sleep? And then, how can the wake-up be made fast?

![](https://i.imgur.com/7dzQlVx.png)

#### Open Network Automation Platform (ONAP) for configuration change management

To help establish a dynamic control loop, a major problem that engineers faced was the lack of interoperability between stakeholders in the field. Indeed, it is a nightmare for network operators if the vendors and the carriers use different standards and terminology. ONAP was introduced as a standardization effort to address this problem.

**Q:** What is the current status of ONAP?

**A:** A number of mobile operators, such as AT&T and Orange, are using ONAP as a common platform for configuration change management.

**Q:** Is the data based on SNMP?
**A:** Yes, some of it. Other data formats with different granularity are used as well. Besides, we need to make sure the data collection granularity allows for meaningful decision making in terms of timing.

#### Auric: ML-based configuration recommendation

While the rulebook method for managing configuration change worked well for some time due to the relative simplicity of previous technologies, the question that arises with 5G is: can we automatically derive configuration recommendations from existing ones? Can we derive implicit knowledge from existing configurations, since they represent best practices for their respective environments? To address these questions, a system similar to Amazon's recommendations, named Auric, was proposed. Auric distinguishes two types of learners: local learners and global learners. Local learners rely on voting from similar carriers restricted by geo-proximity, so the voting seeks maximum support across similar nearby carriers. Global learners rely on voting from similar carriers across the whole network.

![](https://i.imgur.com/6x2MjKo.png)

**Q:** A point we extensively discussed in class is the importance of capturing the causal structure of the problem. In practice, how do you know if the problem is well defined?

**A:** The key is to maintain the dialogue with the engineers and share progress so that it can be validated experimentally. The collaboration is also important to address the issue of hidden parameters. Domain knowledge from engineers can greatly help in identifying those parameters and thus improve our understanding of the causal structure of the problem.

![](https://i.imgur.com/qSlSM8n.png)

It is worth noting that the configuration parameters, i.e., the outputs of the learning problem, can be correlated. Therefore, it is necessary to take those correlations into account instead of learning the output parameters separately. At the initial stages, a simple correlation analysis is performed.

**Q (chat):** In general, what training methods or factors other than performance are emphasized when making the decision to deploy an ML model?

**A:** In mobile networking, there is a tradeoff between coverage and throughput, and the sweet spot differs depending on the environment. Therefore, those factors have to be considered. Other aspects include security and fairness.

#### Configuration change deployment

At the last phase of the ML pipeline, a challenge the engineers face is the deployment of the configuration change onto servers. A question could be: how many servers should the new configuration be pushed to while ensuring service continuity?

![](https://i.imgur.com/SxME5uL.png)

Another solution, named CORNET, is relevant in the context of change management. CORNET is a dynamic composition framework for change management which focuses on the modularization of change procedures into a reusable library of building blocks. CORNET also facilitates the coordination of change deployment with the engineering teams.

**Q (chat):** In case the ML model we obtain is difficult to explain, that is, the role of the different features in determining the outcome is not clear, is the model still deployable?

**A:** It is important to understand that the responsibility for deploying new models, and more generally for making changes to the current configuration, lies with the engineers. Consequently, the degree to which a model is deployable is really about how comfortable the engineers are with the new solution. Also, their bonus is tied to the network's performance.
They will usually ask for evidence that the solution is effective and reliable, also because some networks are critical in nature, such as those serving hospitals or police departments. The first trials are the hardest, but as the trials begin to show acceptable results, confidence goes up, and the engineering team will be more open to new changes and more likely to agree to further experimentation.

**Q (chat):** Typically, how long do we need to wait before we decide that the change is working for the small population and we can now deploy it at a larger scale?

**A:** We first need to make sure that the results we have are statistically meaningful rather than a one-off. It also depends on the size of the changes we intend to apply: if it's a major software release, then it will naturally take more time. Configuration changes (editor's note: in contrast to major releases, these probably represent smaller changes), on the other hand, can be applied locally or at the state level. And since the RAN (Radio Access Network) is more distributed in terms of management and engineering teams, it may take less time to push those changes to different parts of the RAN. In summary, bootstrapping trust is the tricky point.

#### Future AI/ML challenges for AT&T

- **Architecture:** bypass the EMS (Element Management System) to talk to the base station directly.
- **Decision cycle:** build faster decision loops, from data collection and curation to decision making, so that ML models can effectively be used for forecasting.
- **Explainability:** to achieve, or approach, safe trials.
- **Automation:** for example, how can we use drones in place of human operators to check towers?
