Every request is sent to the mn1 server. After receiving a request, mn1 forwards it to the mn2 server with 20% probability. Each request creates a CPU-intensive task on the target server (computing the prime numbers from 1 to N).
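The routing rule and the CPU task above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names `count_primes` and `route_request` are hypothetical, trial division stands in for whatever prime routine the servers actually run, and the task is assumed to count the primes rather than list them.

```python
import random


def count_primes(n: int) -> int:
    """CPU-bound task: count the primes in 1..n by trial division (assumed method)."""
    count = 0
    for candidate in range(2, n + 1):
        is_prime = True
        divisor = 2
        while divisor * divisor <= candidate:
            if candidate % divisor == 0:
                is_prime = False
                break
            divisor += 1
        if is_prime:
            count += 1
    return count


def route_request(rng: random.Random, forward_probability: float = 0.2) -> str:
    """Return the server that runs the task: mn1 by default, mn2 when mn1 forwards it."""
    return "mn2" if rng.random() < forward_probability else "mn1"


if __name__ == "__main__":
    rng = random.Random(0)
    targets = [route_request(rng) for _ in range(1000)]
    # Roughly 20% of the requests should end up on mn2.
    print(targets.count("mn2") / len(targets))
    print(count_primes(100))
```

Passing the random generator in explicitly keeps the routing reproducible when a seed is fixed, which helps when comparing runs.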
Rt in the reward function is measured during the last 5 seconds of each 30-second period. In that window, one probe request per second is sent to each of the mn1 and mn2 servers and its response time is recorded; these probes use different ports from the ones used to create the CPU-intensive tasks. Rt is the average of the five response times. The one exception is a timeout: if any of the five probes times out, Rt is set to the maximum Rt (50 ms in our setting).
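The aggregation rule for Rt can be written as a small function. This is a minimal sketch, assuming each probe result arrives as a response time in milliseconds and a timeout is represented by `None`; the names `aggregate_rt` and `MAX_RT_MS` are hypothetical.

```python
from typing import Optional, Sequence

MAX_RT_MS = 50.0  # maximum Rt in our setting


def aggregate_rt(
    samples_ms: Sequence[Optional[float]],
    max_rt_ms: float = MAX_RT_MS,
) -> float:
    """Average the probe response times; any timeout (None) yields max_rt_ms."""
    if not samples_ms:
        raise ValueError("at least one probe sample is required")
    if any(sample is None for sample in samples_ms):
        return max_rt_ms
    return sum(samples_ms) / len(samples_ms)


if __name__ == "__main__":
    print(aggregate_rt([10.0, 20.0, 30.0, 40.0, 50.0]))  # plain average
    print(aggregate_rt([10.0, None, 30.0, 40.0, 50.0]))  # timeout case
```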
$c_{utilization}$ in the reward function is also measured during the last 5 seconds of each 30-second period. The container's metrics are collected with the docker stats command, and $c_{utilization}$ is the average of the five CPU usage values contained in those metrics.
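Averaging the CPU readings could look like the sketch below. It assumes each reading is a percentage string such as `12.34%`, which is the form docker stats prints in its CPU % column; how the five readings are captured is left out, and the function names are hypothetical.

```python
from typing import Sequence


def parse_cpu_percent(text: str) -> float:
    """Convert a reading such as '12.34%' to the float 12.34."""
    return float(text.strip().rstrip("%"))


def mean_cpu_utilization(readings: Sequence[str]) -> float:
    """Average the CPU percentages collected in the measurement window."""
    values = [parse_cpu_percent(reading) for reading in readings]
    if not values:
        raise ValueError("at least one CPU reading is required")
    return sum(values) / len(values)


if __name__ == "__main__":
    window = ["10.00%", "20.00%", "30.00%", "40.00%", "50.00%"]
    print(mean_cpu_utilization(window))
```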
Reward function
To calculate the reward, I use a moving average with a window of 4 to smooth the unstable $c_{utilization}$ and Rt:
\begin{align*}
Rt &= \begin{cases}
RtList_{lastfourmean} & \text{if } RtList_{size} \geq 4 \\