# Scaling and Allocation of Containers in Edge System

## Motivation

1. With the rapid development of edge computing systems and containerization technologies, more and more internet service providers have deployed their application servers onto MEC systems. However, since the computing and networking resources of an edge system are limited, it is important to develop a strategy that optimizes the utilization of both computing and networking resources.
2. With the presence of an edge system, one may deploy containers with heavier data flows on the edge. This reduces the data flowing through the internet and saves networking resources. With the assistance of a scaling and allocation strategy, one can easily decide the deployment of containers and thus optimize resource utilization.
3. On the other hand, various customizations, such as dynamically allocating the request distribution, can be developed for the local users present in the edge area. For example, an edge system can serve the requests of its local clients by scaling and allocating applications appropriately according to client demand.
4. The architecture of an edge system is suitable for multi-layer collaborative computing. Thus, the problem of scaling and allocating containers can be separated into two subproblems: one is the allocation of computing resources in each cluster, and the other is the scaling of containers according to the allocated resources.

## Scenario

Most web servers nowadays are multithreaded and can efficiently utilize more than one CPU core, so the utilization of computing resources can be optimized by finding the appropriate amount of allocated resources for each running container. However, the request distribution of the various applications in each cluster is not uniform.
This can result in underutilization of computing resources, which lowers efficiency and thus increases serving latency. Since requests can be forwarded to other clusters in the same edge area based on various policies (such as load balancing), the amount of requests handled by each cluster is not independent of the allocated resources. In this case, one can allocate resources to each cluster, and the forwarding of requests is then determined by the forwarding strategy. Thus, low latency and high resource efficiency in each cluster can be reached by considering the computing delay and the networking delay simultaneously.

## System Model

- We propose a 2-layer architecture to separate the problems of resource allocation and container scaling. That is, the orchestrator in the edge system is responsible for resource allocation among applications, whereas the control node in each cluster is in charge of scaling the containers of applications according to the allocated resources.

![](https://hackmd.io/_uploads/HkktjhPr2.png)
Fig.1: 2-layer architecture.

![workflow](https://hackmd.io/_uploads/rJVcCVBdT.png)
Fig.2: Message flow of the proposed algorithm.

### Resource Allocation

Generally, resource allocation is computed considering only the performance of each server, usually its processing latency. However, as requests may be re-distributed to other clusters by methods that usually depend on the resource distribution, a strategy considering only server performance may not lead to the global optimum. In our case, we consider a load-balancing forwarding policy which distributes requests in the system according to the ratio of allocated resources. That is, a cluster with more resources allocated to a running service should handle a higher ratio of that service's requests. To tackle this issue, we formulate the relation between request forwarding and resource allocation.
That is, a function that takes a resource allocation strategy as its input and outputs the forwarding strategy that best satisfies the load-balancing requirement. With this relation formulated, we can transform the problem into a plain resource allocation problem with a concrete optimization target.

![Load-Balancer-System](https://hackmd.io/_uploads/S1YtouB_p.png)

### Containers Scaling

To lower the processing delay, one may provide more computing resources to the application. Thanks to the containerization of applications, we have two options:

1. Vertical Scaling
    - This refers to the allocation of extra computing resources to a specific container.
    ![](https://hackmd.io/_uploads/r1YVECHr3.png)
    - Optimization of a multithreaded application:
        - The efficiency of a multithreaded application may vary with the fluctuation of the request distribution and the allocated CPUs.
        - ![](https://hackmd.io/_uploads/B1LhQJLBn.png)
        - Since each thread can have idle time, it is possible to save time through context switching whenever the idle time is longer than the time a context switch takes.
        - From the statement above, we deduce that the processing latency can be reduced by increasing the CPUs available for computing.
        - The CPU efficiency converges to a local optimum once the idle time becomes shorter than the time required for context switching.
2. Horizontal Scaling
    - This refers to the duplication of a container in a specific cluster.
    ![](https://hackmd.io/_uploads/S1sKEASrh.png)

Since the efficiency of a multithreaded application converges as the allocated computing resources increase, vertical scaling can bring each container to a better operating point. Moreover, assuming the request distribution of applications does not fluctuate dramatically, our strategy performs horizontal scaling after vertical scaling to ensure that every running container is optimized.
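The vertical-then-horizontal order can be sketched as follows. This is a minimal illustration, not our full algorithm: `optimal_res` stands for the per-container resource level at which efficiency converges (written $Res_k$ later), and the single scalar resource is a simplification.

```python
def scale(total_resource: float, optimal_res: float) -> tuple[int, float, float]:
    """Split an allocated resource budget into containers.

    Vertical scaling first: a container is grown toward the convergence
    point `optimal_res`.  Horizontal scaling second: once the budget
    exceeds `optimal_res`, replicas are added instead.
    Returns (replica count, size of full replicas, leftover size).
    """
    if total_resource < optimal_res:
        # Budget fits in one container: vertical scaling only.
        return 1, total_resource, 0.0
    full = int(total_resource // optimal_res)  # fully scaled replicas
    rest = total_resource % optimal_res        # one partially scaled replica
    return full + (1 if rest > 0 else 0), optimal_res, rest
```

With a budget of 2.5 units and convergence at 1.0 unit, this yields two fully scaled replicas plus one replica holding the 0.5-unit remainder, mirroring the replica split used in the efficiency formula later.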
## Problem Formulation

### Orchestrator-level:

$$D=avg([D_i])=\frac{\sum _{i\in Clus}(D_i\cdot \sum_{k\in App_i}Req_{ik})}{\sum _{i\in Clus}\sum_{k\in App_i}Req_{ik}};$$

- $D$: Effective delay of the system.
- $D_i$: Effective overall delay of cluster $i$.
- $Clus$: Set of all clusters in the system.
- $Req_{ik}$: Observed request density of application $k$ in cluster $i$.
- $App_{i}$: Set of applications with request density larger than 0 in cluster $i$.

Our goal is to improve the QoE of the users in the system, which here is simply the latency experienced by every user. The idea is to find an average delay representative of the whole system; thus, we compute the average delay over all clusters, weighted by their request densities.

### Cluster-level:

$$D_i=avg([D_{T_{ij}}+D_{P_{ij}}])=\frac{\sum_{j\in Clus}(T_{ij}+P_{ij})}{\sum _{k\in App_{i}}Req_{ik}};$$

- $D_{T_{ij}}$: Propagation delay of a request forwarded from cluster $i$ to $j$.
- $D_{P_{ij}}$: Processing delay of a request forwarded from cluster $i$ to $j$.
- $T_{ij}$: Product of the amount of requests forwarded from cluster $i$ to $j$ and their propagation delay.
- $P_{ij}$: Product of the amount of requests forwarded from cluster $i$ to $j$ and their processing delay.

The delay of a single request is the sum of its propagation delay and its processing delay. Here we compute the average over all requests in the cluster.

#### Processing Delay:

$$P_{ij}=\sum_{k\in App}(\eta _{jk}\cdot N_{ijk});$$

- $\eta _{jk}$: Estimated processing efficiency of application $k$ running on cluster $j$.
- $N_{ijk}$: Density of application $k$'s requests forwarded from cluster $i$ to $j$.

The processing delay is obtained with an estimation function that takes the resource allocation and the concurrent serving requests as input and outputs a processing delay. As before, the summation term covers the overall case.
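The orchestrator-level average can be checked with a small numerical sketch; the cluster names, delays, and request densities below are made-up numbers for illustration only.

```python
def system_delay(cluster_delay: dict, req: dict) -> float:
    """Request-weighted average delay D over all clusters.

    cluster_delay[i] is D_i; req[i][k] is Req_ik, the observed
    request density of application k in cluster i.
    """
    total_req = sum(sum(apps.values()) for apps in req.values())
    weighted = sum(cluster_delay[i] * sum(req[i].values()) for i in req)
    return weighted / total_req

# Two clusters, one application: D is pulled toward the busier cluster.
req = {"c0": {"app": 30.0}, "c1": {"app": 10.0}}
d = system_delay({"c0": 2.0, "c1": 6.0}, req)  # (2*30 + 6*10) / 40 = 3.0
```

The weighting matters: an unweighted mean of the two cluster delays would be 4.0, but the busier cluster drags the effective system delay down to 3.0.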
#### Processing Efficiency:

$$\eta _{jk}=\begin{cases} \epsilon _k([R_{jkl}]_{l\in resr}, \sum _{i\in Clus}N_{ijk}), & |R_{jk}|<|Res_k|;\\ \frac{(|R_{jk}|//|Res_k|)\cdot\epsilon_k([Res_{kl}]_{l\in resr}, \frac{|Res_k|}{|R_{jk}|} \sum _{i\in Clus}N_{ijk})+(|R_{jk}|\%|Res_k|)\cdot\epsilon_k([R_{jkl}\%Res_{kl}]_{l\in resr}, \rho\sum _{i\in Clus}N_{ijk})}{|R_{jk}|}, & |R_{jk}|\ge |Res_k|; \end{cases}$$

$$[a_l]_{l\in resr}=[a_0, a_1, a_2, \dots], \quad \rho =\frac{|R_{jk}|\%|Res_k|}{|R_{jk}|};$$

$$|R_{jk}|=\sqrt{\sum _{l\in resr}R_{jkl}^2}, \quad |Res_k|=\sqrt{\sum _{l\in resr}Res_{kl}^2};$$

- $\epsilon _k$: Approximation function describing the processing delay of a single server given a specific resource allocation and concurrent requests.
- $resr$: Set of available resource types, such as CPU and memory.
- $|R_{jk}|$: Allocated resources for application $k$ in cluster $j$.
- $|Res_k|$: Optimal allocated resources for a single container of application $k$.
- $\rho$: Ratio of non-optimally allocated resources.

`` The optimization of the allocated resource Res_k should be conducted as separate research. Here we simply assume it is a constant value. ``

The estimation function used to predict the computing efficiency of a single server can be trained with a machine learning regression model. The overall efficiency within a cluster is then the average efficiency under its allocated resources.

#### Propagation Delay:

$$T_{ij}=e_{ij} \cdot \sum_{k\in App} (N_{ijk}\cdot d_k);$$

- $e_{ij}$: Average propagation speed between cluster $i$ and $j$.
- $App$: Set of all running applications in the system.
- $d_k$: Average data size of a request of application $k$.

The propagation delay of a request is the product of its average data size and the average propagation speed between cluster $i$ and $j$. Our goal here is the total propagation delay of all requests forwarded from cluster $i$; thus, we sum over the forwarded amounts.
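A sketch of how we read the piecewise efficiency: below the optimum, the single container is evaluated directly; above it, the full replicas and the remainder replica are averaged with the weights from the formula. `toy_eps` is a stand-in for the learned regression estimator $\epsilon_k$, and the element-wise remainder for $R\,\%\,Res$ is our interpretation.

```python
import math

def efficiency(R, Res, N_total, eps):
    """Per-request processing efficiency eta_jk for one cluster.

    R:       allocated resource vector [R_jkl for l in resr]
    Res:     optimal per-container resource vector [Res_kl for l in resr]
    N_total: total request density forwarded to this cluster
    eps:     single-server delay estimator epsilon_k(resources, requests)
    """
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    r, res = norm(R), norm(Res)
    if r < res:
        # One under-provisioned container: no horizontal split yet.
        return eps(R, N_total)
    full = int(r // res)          # number of fully provisioned replicas
    rem = r % res                 # leftover resource magnitude
    rho = rem / r                 # request share on the remainder replica
    rest = [a % b for a, b in zip(R, Res)]
    tail = rem * eps(rest, rho * N_total) if rem > 0 else 0.0
    return (full * eps(Res, (res / r) * N_total) + tail) / r

# Toy estimator (placeholder for the regression model):
# delay grows with load and shrinks with resources.
toy_eps = lambda v, n: n / sum(v)
eta = efficiency([2.0], [1.0], 10.0, toy_eps)  # two full replicas, each at half load
```

With `R = [2.0]` and `Res = [1.0]` the budget splits into two optimal replicas, each serving half of the 10 units of load, so `eta` equals `toy_eps([1.0], 5.0)`.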
#### Re-distribution of Requests Density using Load-Balancing:

$$Req_{jk}'=\frac{|R_{jk}|}{\sum _{n\in Clus}|R_{nk}|}\left(\sum _{n\in Clus}Req_{nk}\right)$$

- $Req_{jk}'$: Adjusted request density of application $k$ that should be handled in cluster $j$.

In this section, we first introduce the re-distribution of request density using the load-balancing method. The basic concept is that a cluster with more allocated resources should handle more requests. Thus, the re-distributed density represents the amount of requests each cluster should handle.

#### Requests Density Forwarding

For the case of $i=j$,

$$N_{ijk}=\begin{cases} Req_{ik}', & Req_{ik}\ge Req_{ik}';\\ Req_{ik}, & Req_{ik}< Req_{ik}'; \end{cases}$$

For the cases of $i\neq j$,

$$N_{ijk}=\begin{cases} 0, & Req_{ik}'+\sum _{\beta \in Clus\setminus prevC_j}N_{i\beta k}\ge Req_{ik} \ \lor\ Req_{jk}'\le Req_{jk}+\sum _{\alpha\in Clus\setminus prevC_i }N_{\alpha jk};\\ \min[(Req_{jk}'-Req_{jk}-\sum _{\alpha \in Clus\setminus prevC_i}N_{\alpha jk}),\ (Req_{ik}-Req_{ik}'-\sum _{\beta \in Clus\setminus prevC_j}N_{i\beta k})], & \text{else.} \end{cases}$$

- $prevC_n$: Set of cluster(s) already iterated for cluster $n$.

With the re-distributed requests, we can compute the forwarding amounts iteratively. To retrieve a good forwarding strategy, we use a greedy approach in which cluster pairs with higher inter-cluster propagation speed are iterated with higher priority. The relation above encodes two conditions: the cluster receiving forwarded requests must have an available request slot, and the cluster forwarding requests must have excess requests. The second branch is taken only when both conditions hold simultaneously; its value is the smaller of the available request slot on the receiving cluster and the excess requests on the forwarding cluster.
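The two steps, redistribution then greedy forwarding, can be sketched for a single application as follows. This is a simplified reading of the formulas: `order` stands in for the $prevC$ bookkeeping by visiting cluster pairs in descending link-speed order, and the cluster names and densities are invented.

```python
def redistribute(resource, req):
    """Load-balanced targets Req'_jk: proportional to allocated |R_jk|."""
    total_r = sum(resource.values())
    total_q = sum(req.values())
    return {j: resource[j] / total_r * total_q for j in resource}

def forwarding(req, target, order):
    """Greedy forwarding amounts N_ij for one application.

    `order` lists (i, j) cluster pairs, fastest inter-cluster link first.
    """
    surplus = {i: max(req[i] - target[i], 0.0) for i in req}  # excess requests
    slot = {j: max(target[j] - req[j], 0.0) for j in req}     # available slots
    N = {i: {j: 0.0 for j in req} for i in req}
    for i in req:
        N[i][i] = min(req[i], target[i])   # requests served locally
    for i, j in order:                     # i != j
        move = min(surplus[i], slot[j])    # smaller of excess and slot
        N[i][j] = move
        surplus[i] -= move
        slot[j] -= move
    return N

# Cluster "a" is overloaded; "b" holds twice the resources:
target = redistribute({"a": 1.0, "b": 2.0}, {"a": 10.0, "b": 2.0})
N = forwarding({"a": 10.0, "b": 2.0}, target, [("a", "b"), ("b", "a")])
```

Here `target` becomes `{"a": 4.0, "b": 8.0}`, so "a" keeps 4 units of density, forwards its 6 units of excess to "b", and nothing flows back, matching the zero branch of the formula.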
## Proposed Solution and Algorithm

### Direct Optimization:

The problem formulated above is a function which takes only the resource allocation and the request distribution as input and outputs the average system delay. Since the problem is defined mathematically, we can solve the optimization problem with direct derivatives. The minimization of the average delay can be reached by solving the partial derivatives of the delay function $D$. In our case, the input request distribution is a constant, predicted for the next time slot by an estimation model. Therefore, the only variables are the resource allocation strategies. Computing the partial derivatives of the delay function with respect to these allocation variables yields a system of $n\cdot m$ equations in the $n\cdot m$ allocation variables, where $n$ is the number of clusters and $m$ is the number of available resource types such as CPU and memory.

\begin{align*} \frac{\partial D (CPU_{00}, Memory_{00}, CPU_{01}, Memory_{01}, ...)}{\partial CPU_{00}};\\ \frac{\partial D (CPU_{00}, Memory_{00}, CPU_{01}, Memory_{01}, ...)}{\partial CPU_{01}};\\ ... \end{align*}

### Utilization of a Sigmoid Filter:

$$\sigma (ax)=\frac{1}{1+e^{-ax}};$$

![image](https://hackmd.io/_uploads/Sy1E1vq_a.png)

So far we have not incorporated the resource constraint of each cluster into the mathematical relation defined above. Although the method of direct derivatives can solve the optimization problem mathematically, we would still have to discard solutions that violate the resource constraints. Here we introduce a filter built from the sigmoid function to handle the resource constraints. As the illustration shows, a sigmoid function acts like a smooth unit step function: as the coefficient $a$ grows, the sigmoid approaches a unit step.
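The step-approximation behavior of the sigmoid can be verified numerically; the sample points below are illustrative.

```python
import math

def sigmoid(x: float, a: float = 1.0) -> float:
    """sigma(a*x) = 1 / (1 + exp(-a*x)); larger `a` gives a sharper step."""
    return 1.0 / (1.0 + math.exp(-a * x))

# As `a` grows, sigma(a*x) approaches the unit step at x = 0:
soft = sigmoid(0.1, a=1)    # ~0.525, still far from a step
hard = sigmoid(0.1, a=100)  # ~0.99995, nearly a hard step
# Unlike the unit step, the derivative stays in closed form:
# sigma'(x) = sigma(x) * (1 - sigma(x)).
```

The smoothness is the point: a penalty built from `sigmoid` keeps the whole objective differentiable, whereas a true step function would not be.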
In that case, we can apply the sigmoid function to the model we have defined in order to satisfy the resource constraints. The advantage of using the sigmoid function as our filter is that it is easily differentiable: unlike a unit step function, a sigmoid has no discontinuity and can therefore be handled analytically. To formulate the system delay equation with resource constraints, we simply add a penalty term to the delay model, as shown below:

\begin{align*} D^*=D+\Omega \sum _{i\in constr}\sigma(\gamma_i) \end{align*}

- $D^*$: Adapted system delay with resource constraints.
- $\Omega$: A hyperparameter representing an enormous quantity.
- $constr$: Set of resource constraints.
- $\gamma _i$: Violation of resource constraint $i$ (positive when the constraint is exceeded).

The complexity of computing the equation above is nearly the same as without resource constraints, since the derivative of the sigmoid function is easily computable:

$$\frac{\partial \sigma(x)}{\partial x}=\sigma(x)[1-\sigma (x)]$$

## Experiments/Simulation Evaluation

- To be implemented in the future.

### General server model expression:

Generally, the performance of a server depends on its allocated resources and concurrent serving requests. To estimate the processing response time, we use a regression model that predicts the response time under the given allocation and request distribution, e.g.:

$$\epsilon_k=\frac{Req_{conc}}{k_{CPU}\cdot CPU}+\frac{Req_{conc}}{k_{Memory}\cdot Memory}$$

## Reference

- L. H. Phuc, M. Kundroo, D.-H. Park, S. Kim and T. Kim, "Node-Based Horizontal Pod Autoscaler in KubeEdge-Based Edge Computing Infrastructure," in IEEE Access, vol. 10, pp. 134417-134426, 2022, doi: 10.1109/ACCESS.2022.3232131.
- L. Toka, G. Dobreff, B. Fodor and B. Sonkoly, "Adaptive AI-based auto-scaling for Kubernetes," 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), Melbourne, VIC, Australia, 2020, pp.
599-608, doi: 10.1109/CCGrid49817.2020.00-33.
- M. Chen and Y. Hao, "Task Offloading for Mobile Edge Computing in Software Defined Ultra-Dense Network," in IEEE Journal on Selected Areas in Communications, vol. 36, no. 3, pp. 587-597, March 2018, doi: 10.1109/JSAC.2018.2815360.
- S. Hu and G. Li, "Dynamic Request Scheduling Optimization in Mobile Edge Computing for IoT Applications," in IEEE Internet of Things Journal, vol. 7, no. 2, pp. 1426-1437, Feb. 2020, doi: 10.1109/JIOT.2019.2955311.
- Y. Liu, H. Yu, S. Xie and Y. Zhang, "Deep Reinforcement Learning for Offloading and Resource Allocation in Vehicle Edge Computing and Networks," in IEEE Transactions on Vehicular Technology, vol. 68, no. 11, pp. 11158-11168, Nov. 2019, doi: 10.1109/TVT.2019.2935450.
- K. Ye, Y. Kou, C. Lu, Y. Wang and C.-Z. Xu, "Modeling Application Performance in Docker Containers Using Machine Learning Techniques," 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), Singapore, 2018, pp. 1-6, doi: 10.1109/PADSW.2018.8644581.
- Y. Liu, D. Lan, Z. Pang, M. Karlsson and S. Gong, "Performance Evaluation of Containerization in Edge-Cloud Computing Stacks for Industrial Applications: A Client Perspective," in IEEE Open Journal of the Industrial Electronics Society, vol. 2, pp. 153-168, 2021, doi: 10.1109/OJIES.2021.3055901.