# Docker Resource Use Measurement

## Project description

Cloud providers currently charge for the reserved resource capacity (such as computation, storage, and networking) and do not take the actual resource usage into account. However, with containerization technology, resource use can be monitored at the container level, which can ultimately commoditize computing based on resource use instead of resource reservation.

In a system where Docker containers are executed on multiple hosts, the developed system allows the following:

- Real-time resource use monitoring of the running Docker containers
- Historical resource consumption of the Docker containers
- Total historical resource consumption for a group of containers running on multiple hosts
- Device metrics and measurement tools for the following resources:
  - CPU time
  - RAM usage
- Possible future extensions:
  - Disk input/output
  - Network input/output

### Design Considerations

#### Data

The project deals with a large amount of real-time usage data from Docker containers. The data can be categorized into two types: measurement data and user data. The measurements are timeseries data and are likely to be high volume. The user data will not be written as often, but still needs to be retrieved quickly to keep the page load time short.

For the measurements, it is acceptable if the data is not always 100% accurate, as long as it is eventually consistent. It is important that the data storage is always available to accept measurements, because losing data could lead to billing problems. For the user data, consistency is more important. User data is primarily used to serve the frontend. As such, while availability is still important, it is acceptable if failover takes a brief period.

#### UI

The website is implemented as a single-page application (SPA) that dynamically rewrites the web page with new data instead of reloading the entire page. The reason for this choice is to reduce latency, since the aim is to have high availability.

## Architecture

The architecture for our project is depicted in the following figure:

![](https://i.imgur.com/oMlmdSh.png)

As with any web application, our project has a frontend and a backend component. In addition to these components, there is also the ingest service. It receives the measurements from the ingest sender, which runs on the nodes whose container resource use needs to be measured. The ingest service is separate from the backend, which enables them to scale up or down independently of each other. For example, if a large number of measurements are coming in, the backend API will not be affected, as long as the databases can handle the traffic.

## Technology

### Kubernetes

We use Kubernetes to deploy the project. This gives us easy scaling, and allows containers to be restarted automatically if something fails.

### Databases

#### Redis

Normally, Redis is used as a cache; we have not used it in this way yet. Here, we use Redis for its stream feature: when a user is connected to the WebSocket, a stream is created in Redis. When data is received by the ingest service, it is added to this stream, at which point the WebSocket task monitoring the stream picks up the data and sends it to the frontend.

Pros: cache-level speed, ease of use, and support for different types of data structures.
Cons: limited persistence, stores everything directly in memory, and expensive.
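To make this stream flow concrete, here is a minimal sketch using the Python `redis` client (`redis-py`). The stream name, payload shape, and host below are illustrative assumptions, not the exact values used by our services:

```python
import json

import redis

# Placeholder host; inside the cluster this would be the Redis service name.
r = redis.Redis(host="redis", port=6379)

# Hypothetical per-container stream name; our actual naming scheme may differ.
stream = "measurements:abc123"

# Ingest side: append a new measurement to the container's stream.
measurement = {"container": "abc123", "cpu": 0.42, "memory_mb": 128.0}
r.xadd(stream, {"payload": json.dumps(measurement)}, maxlen=1000)

# WebSocket task side: read entries and forward them to the connected client.
# "0" reads from the beginning so this demo prints the entry added above;
# a live WebSocket task would pass "$" to receive only new entries.
entries = r.xread({stream: "0"}, block=5000, count=10)
for _stream_name, messages in entries or []:
    for entry_id, fields in messages:
        print(entry_id, json.loads(fields[b"payload"]))
```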
#### Cassandra

Cassandra is well suited for storing timeseries data. As such, we store the Docker container use measurements in Cassandra.

Pros: masterless design, availability, and horizontal scalability.
Cons: eventual consistency and dealing with tombstones.

#### MongoDB

MongoDB is where we store all data that is not timeseries data. We had intended to use it to store user data, but due to time constraints we did not get around to adding proper user management to our application.

Pros: simplicity and flexibility.
Cons: the lack of a schema can make migrations challenging, and it is more difficult to scale.

### Backend

For both the API backend and the ingest service we used Rust, with the `actix-web` framework. The backend has a number of endpoints which enable the frontend to retrieve data from the different databases and to add new data.

| Endpoint | Type | Description |
| -------- | ---- | ----------- |
| `/container/{container}` | GET | All measurements for the specified container |
| `/container` | GET | Most recent measurement for all added containers |
| `/add` | GET | All containers that data exists for, but that have not been added |
| `/add` | POST | Add a container |
| `/measurements/ws?container={container}` | WebSocket | Open a WebSocket to receive new measurements for the specified container |
| `/status` | GET | Health check |

### Ingest

#### Service

The ingest service is in essence a single endpoint where new measurements can be submitted. These measurements are then saved in Cassandra, and if a client is connected to a WebSocket for that container, the measurement is also sent to that client by adding it to the corresponding Redis stream.

| Endpoint | Type | Description |
| -------- | ---- | ----------- |
| `/stats_push` | POST | Add a new measurement |
| `/status` | GET | Health check |

#### Sender

The Python implementation of the ingest sender fetches the statistics of the currently running containers with the help of `aiodocker`. These stats are then sent to our databases through the Rust ingest service API.
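A minimal sketch of such a sender loop, using `aiodocker` and `aiohttp`, is shown below. The ingest service address and the payload shape posted to `/stats_push` are assumptions made for the example, not the exact schema our service expects:

```python
import asyncio

import aiodocker
import aiohttp

INGEST_URL = "http://ingest.example.local/stats_push"  # placeholder address


async def push_stats() -> None:
    docker = aiodocker.Docker()
    try:
        async with aiohttp.ClientSession() as session:
            for container in await docker.containers.list():
                info = await container.show()                 # container inspect data
                stats = await container.stats(stream=False)   # one-shot stats snapshot
                # Illustrative payload; the real /stats_push schema may differ.
                payload = {"container": info["Id"], "stats": stats}
                async with session.post(INGEST_URL, json=payload) as resp:
                    resp.raise_for_status()
    finally:
        await docker.close()


asyncio.run(push_stats())
```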
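The `/measurements/ws` endpoint from the backend table above can also be exercised without the React UI. The following sketch uses the third-party Python `websockets` package, with a placeholder backend address and the assumption that measurements arrive as JSON text frames:

```python
import asyncio
import json

import websockets  # pip install websockets

BACKEND_WS = "ws://backend.example.local/measurements/ws"  # placeholder address


async def watch(container_id: str) -> None:
    async with websockets.connect(f"{BACKEND_WS}?container={container_id}") as ws:
        # Print each measurement pushed by the backend as it arrives.
        async for message in ws:
            print(json.loads(message))


asyncio.run(watch("abc123"))
```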
### Frontend

The frontend of DRM.io is a single-page application made with [React](https://reactjs.org/). The UI consists of a form and a table (with an embedded graph in each row). State is maintained using a React Hook, [useState](https://reactjs.org/docs/hooks-state.html).

![](https://i.imgur.com/SKVK5LJ.png)

#### A Form

The dashboard contains a form with a list of all the containers along with checkboxes. Selecting the checkboxes allows the users to choose which containers' details they want to view in the table that follows the form.

#### A Table

The dashboard contains a [React Bootstrap Table](https://www.npmjs.com/package/react-bootstrap-table-next) with each row holding a container id. When the user clicks on a row (or container id), it expands and shows a timeseries graph of the resource usage of the container. For the initialisation of the table, [Axios](https://www.npmjs.com/package/axios) is used.

#### A Graph

The graph is a dynamically updating realtime curve, made using the [apexcharts](https://apexcharts.com/) library. Each row of the table contains three graphs displaying real-time resource usage (such as CPU, memory, and storage) of a given container. For fetching the realtime data, the standard [WebSocket](https://developer.mozilla.org/en-US/docs/Web/API/WebSocket) API is used.

## Reflection

Our approach has a number of benefits:

- Traffic is automatically load-balanced.
- Services can be scaled up or down depending on the traffic.
- Every component is fault-tolerant.

However, our choice to use three different databases in our implementation also has some downsides:

- Higher costs
- More complexity
- Increased difficulty of maintenance

## Deploying

### Creating the Kubernetes Cluster

To create the Kubernetes cluster, [Kubespray](https://kubespray.io/) is used. The OpenStack resources are provisioned using [Terraform](https://www.terraform.io/), and an [Ansible](https://www.ansible.com/) playbook is then used to configure everything, from installing the dependencies on each instance to creating the basic Kubernetes cluster. With this configuration, volumes can be provisioned automatically using Cinder, and load balancers can also be created from Kubernetes. Detailed instructions can be found in the [terraform folder](terraform/openstack/README.md).

#### Previous attempts

We also investigated other approaches, using [kOps](https://kops.sigs.k8s.io/) and [Magnum](https://wiki.openstack.org/wiki/Magnum). We abandoned kOps due to an issue where the volumes for `etcd` could not be bound. We believe that this could be caused by the [OpenStack metadata service](https://docs.openstack.org/nova/latest/user/metadata.html) not working as expected on Fuga Cloud. In the interest of time we decided to try another approach for setting up the cluster.

Magnum likely would have worked, but because it is so well integrated with OpenStack we thought that it would be too simple. In fact, if a Kubernetes cluster is created using Magnum, it shows up on the Fuga Cloud dashboard, just as if the cluster had been created by clicking through a few options on the dashboard.

### Inside Kubernetes

Once a fully working k8s cluster is available, deploying the project should be easy. Run the following commands from the project root.

#### Databases

The following script uses Helm to configure the Redis, MongoDB, and Cassandra clusters.

```shell
scripts/helm.sh
```

Cassandra requires an additional step to create the keyspace and tables. Wait until all 3 Cassandra nodes are running (`kubectl get pods -w`), then run:

```shell
scripts/exec-cql.sh db/keyspace_kube.cql
scripts/exec-cql.sh db/measurements_table.cql
```

#### Application

All the stateless resources can be created with one command:

```shell
kubectl apply -f ../../kube/cloud/
```
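After the manifests are applied, a quick way to smoke-test the deployment is to hit the `/status` and `/container` endpoints documented above. The addresses in the sketch below are placeholders for whatever load balancer IPs or hostnames your cluster exposes:

```python
import requests

# Placeholder addresses; substitute the addresses exposed by your cluster.
SERVICES = {
    "backend": "http://backend.example.local",
    "ingest": "http://ingest.example.local",
}

# Both services expose a /status health check.
for name, base in SERVICES.items():
    resp = requests.get(f"{base}/status", timeout=5)
    print(f"{name} /status -> {resp.status_code}")

# Once the ingest sender is running, recent measurements should show up here.
print(requests.get(f"{SERVICES['backend']}/container", timeout=5).json())
```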