# GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks
###### tags: `GPUs`
###### paper origin: EuroSys’19
###### paper: [Link](https://dl.acm.org/doi/10.1145/3302424.3303958)
## Introduction
### Problems
* A key distinguishing aspect of the microservice architecture is the availability of pre-existing, well-defined and implemented software services from cloud providers. However, because these microservice instances are shared across jobs with different end-to-end latency requirements, it is difficult to guarantee each job's SLA while keeping utilization high.
### Proposed Solutions
* Designed an execution time estimation model to predict the latency of each microservice stage
* Reordering requests based on the microservice stage slack
* Performing dynamic batching to increase the sharing degree of microservice tasks.
## Analysis of Microservices
### Performance of Microservices

1. Sharing degree
Sharing degree defines the granularity at which requests belonging to different jobs (or applications) are batched together for execution.
2. Input size
As the input size increases, the microservice performs more computation, which increases execution time.
3. Queuing delay
Queuing delay is the final factor affecting a request's completion time: a request must wait for previously dispatched requests to finish executing before it can run.
### Execution Time Estimation Model
Accurately estimating the execution time of a request at each microservice stage is crucial as it drives the entire microservice execution framework.

We use a linear regression model to determine the $T_{compute}$ of a request, for each microservice type and input size, as a function of the sharing degree:

$$T_{compute} = \alpha \cdot X + \beta$$

where $X$ is the sharing degree (batch size) and $\alpha$, $\beta$ are coefficients fit per microservice type and input size.
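As a concrete illustration, here is a minimal sketch of fitting such a model with NumPy; the profiling numbers and names are made up for illustration, not taken from the paper.

```python
import numpy as np

# Hypothetical profiling data for one (microservice type, input size)
# pair: measured T_compute (ms) at increasing sharing degrees.
sharing_degree = np.array([1, 2, 4, 8, 16, 32])
t_compute_ms = np.array([5.1, 6.0, 8.2, 12.5, 21.0, 38.4])

# Fit T_compute = alpha * X + beta, where X is the sharing degree.
alpha, beta = np.polyfit(sharing_degree, t_compute_ms, deg=1)

def estimate_t_compute(batch_size: int) -> float:
    """Predict T_compute (ms) for a given sharing degree."""
    return alpha * batch_size + beta

print(estimate_t_compute(10))  # predicted compute time at batch size 10
```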
## GrandSLAm Design
### Building Microservice Directed Acyclic Graph

The first step in GrandSLAm's execution flow is to identify the pipeline of microservices present in each job. For this purpose, our system takes the user's job written in a high-level language such as Python, Scala, etc., and analyzes it to construct a directed acyclic graph (DAG) whose nodes are the microservice stages the job traverses.
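To make the DAG step concrete, here is a minimal Python sketch of such a pipeline graph; the `MicroserviceDAG` class and stage names are hypothetical, since the paper derives the graph from user code rather than building it by hand.

```python
from collections import defaultdict

class MicroserviceDAG:
    def __init__(self):
        self.edges = defaultdict(list)  # stage -> downstream stages

    def add_edge(self, src: str, dst: str):
        self.edges[src].append(dst)

    def topological_order(self):
        """Order in which stages must be scheduled."""
        indeg = defaultdict(int)
        for src in list(self.edges):
            for dst in self.edges[src]:
                indeg[dst] += 1
        ready = [s for s in self.edges if indeg[s] == 0]
        order = []
        while ready:
            s = ready.pop()
            order.append(s)
            for d in self.edges[s]:
                indeg[d] -= 1
                if indeg[d] == 0:
                    ready.append(d)
        return order

# Example: a hypothetical image-captioning style pipeline.
dag = MicroserviceDAG()
dag.add_edge("image_preprocess", "feature_extract")
dag.add_edge("feature_extract", "caption_generate")
print(dag.topological_order())
```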
### Calculating Microservice Stage Slack
The end-to-end latency of a request is the sum of its completion times at each microservice stage.
In other words, microservice stage slack is defined as the maximum amount of time a request can spend at a particular microservice stage without violating its end-to-end SLA. GrandSLAm allots slack to each stage in proportion to the stage's estimated execution time.
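A minimal sketch of this proportional allocation, assuming the end-to-end budget is split strictly in proportion to estimated stage times (the function name and numbers are illustrative):

```python
def stage_slacks(sla_ms: float, estimated_ms: list[float]) -> list[float]:
    """Split an end-to-end SLA budget across stages, proportionally
    to each stage's estimated execution time."""
    total = sum(estimated_ms)
    return [sla_ms * t / total for t in estimated_ms]

# Example: a 3-stage job with a 300 ms end-to-end SLA.
print(stage_slacks(300.0, [10.0, 40.0, 50.0]))  # -> [30.0, 120.0, 150.0]
```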

### Dynamic Batching with Request Reordering
GrandSLAm's final step is an online step that orchestrates requests at each microservice stage based on two main objective functions: (i) meeting end-to-end latency and (ii) maximizing throughput.

1. Request reordering
Slack based request reordering is performed at each microservice instance by our runtime system. The primary objective of our request reordering mechanism is to prioritize the execution of requests with lower slack as they possess much tighter completion deadlines.
2. Dynamic batching
At each microservice stage, once the requests have been reordered by slack, we identify the largest sharing degree (the actual batch size during execution) that can be employed such that each request's execution time stays within its allotted microservice stage slack (see the sketch after this list).
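A minimal sketch combining both steps at a single stage, assuming a compute-time model like the linear fit above; the `schedule_stage` helper and request fields are hypothetical:

```python
def schedule_stage(queue, estimate_t_compute, max_batch=32):
    # 1. Reorder: requests with the least remaining slack run first.
    queue.sort(key=lambda req: req["slack_ms"])

    # 2. Dynamic batching: grow the sharing degree as long as the
    #    estimated execution time fits within the tightest slack
    #    among the batched requests (the head of the sorted queue).
    batch_size = 1
    for n in range(2, min(max_batch, len(queue)) + 1):
        if estimate_t_compute(n) <= queue[0]["slack_ms"]:
            batch_size = n
        else:
            break
    return queue[:batch_size]

requests = [{"id": 1, "slack_ms": 25.0},
            {"id": 2, "slack_ms": 60.0},
            {"id": 3, "slack_ms": 15.0}]
print([r["id"] for r in schedule_stage(requests, lambda n: 5.0 + 1.2 * n)])
```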
### Slack Forwarding

While performing slack-based request scheduling in multi-stage applications, we observed a common scenario: many requests finish a stage with leftover slack that would otherwise go unused. GrandSLAm forwards this unused slack to subsequent stages, where it can admit larger batch sizes.
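A minimal sketch of the forwarding rule, with hypothetical field names:

```python
def forward_slack(req, allotted_ms: float, actual_ms: float):
    """Add the slack a request did not consume at one stage to its
    budget at the next stage."""
    leftover = max(0.0, allotted_ms - actual_ms)
    req["slack_ms"] = req["next_stage_slack_ms"] + leftover
    return req

req = {"id": 7, "next_stage_slack_ms": 40.0}
print(forward_slack(req, allotted_ms=30.0, actual_ms=22.0))  # slack_ms = 48.0
```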
## Evaluation
### Experimental Environments
1. Infrastructure
We evaluate GrandSLAm on a testbed consisting of 100 docker containers. Each container has a single 2.4 GHz CPU core, 2GB of RAM and runs Ubuntu 16.10.
2. Microservice types

3. Load generator/Input
We design a load generator that submits user requests following a Poisson distribution, which is widely used to mimic cloud workloads (a sketch follows).
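A minimal sketch of such a generator, exploiting the fact that inter-arrival times in a Poisson process are exponentially distributed; the rate and duration are illustrative:

```python
import numpy as np

def poisson_arrivals(rate_per_s: float, duration_s: float, seed: int = 0):
    """Generate request arrival timestamps for a Poisson process by
    drawing exponentially distributed inter-arrival times."""
    rng = np.random.default_rng(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.exponential(1.0 / rate_per_s)
        if t > duration_s:
            return arrivals
        arrivals.append(t)

print(len(poisson_arrivals(rate_per_s=100, duration_s=10)))  # ~1000 requests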
### Achieving Service Level Agreements (SLAs)
* Reducing SLA Violations
Under this experimental setup, we first measure the percentage of requests violating SLAs under a baseline scheme that executes requests (i) in a first-in-first-out (FIFO) fashion and (ii) without sharing microservices.

## Comparing with Prior Techniques

## Compared to my research
* We both use reordering and batching, but with different methods.
* My environment features GPUs with different compute capabilities.
* I use profiled latencies instead of a latency estimation model.