# cgroups: The Key to Effective Resource Management in Linux Systems
Imagine you have a server that hosts several applications: frontend, backend, database and monitoring. You might have realised that each of these applications has different requirements, and you would want to allocate resources fairly across all of them for optimal performance. While containerisation and orchestration tools have greatly simplified such tasks, fine-grained control over individual processes and threads is still preferable for high-performance computing or real-time applications. In this article, let's dive deep into the world of cgroups as we explore how to effectively manage and allocate resources in a Linux system!
## Introduction
cgroups, also known as control groups, are a Linux kernel feature that allows you to manage, allocate and monitor system resources, such as CPU, memory, network and disk I/O, for a group of processes. cgroups are useful for a variety of tasks, such as limiting the resources that a process can use, prioritizing certain processes over others, and isolating processes from each other.
This article describes cgroups, their internals and hierarchies, and how you can apply them in different scenarios. Understanding the key concepts of cgroups is essential for effectively managing and allocating resources in a Linux system, so let's familiarize ourselves with some key terms!
## Some key terms
- **Tasks**: Processes are alternatively called tasks in cgroups context.
- **Subsystems**: Simply put, these are the resource controllers that cgroups act on. They can cover disk, memory, bandwidth etc. There are many subsystems in recent Linux kernels, but some of the major ones are listed below:
- **cpu**: This is used to guarantee a minimum number of CPU shares, and in later kernel versions it was extended to provide CPU bandwidth control, where we can cap the amount of CPU time allocated to the tasks in the cgroup.
- **cpuset**: This is used to assign individual CPUs (and memory nodes) to the processes/tasks in the cgroup.
- **memory**: This is used to limit the memory being used by a task in the cgroup.
- **devices**: This is used to control access to devices for the tasks in the cgroup.
- **net_prio**: This is used to set the priority of network traffic per interface.
There are many other subsystems available but we shall limit our conversation mostly to these.
- **Hierarchies**: Anyone familiar with Linux processes knows that they follow a hierarchy-based structure. cgroups follow a hierarchy in a similar sense. We can see the similarities by comparing the outputs of `pstree` and `systemd-cgls` in Figures 1.1 and 1.2:

**Fig 1.1** Output of `systemd-cgls` which shows cgroup contents
**Fig 1.2** Output of `pstree` which shows a tree of processes.
Hierarchies are collections of cgroups arranged in a tree-based structure. This hierarchy is required to set up configurations for our tasks in cgroups. The hierarchy is determined and maintained according to a set of rules:
- A hierarchy can have one or more subsystems attached to it, so a single cgroup configuration can be applied to multiple subsystems.
- Once a subsystem has been attached to one hierarchy, it cannot be attached to another hierarchy.
- In a given hierarchy, a task can be part of a single cgroup. However, the same task can be part of multiple hierarchies.
- A child task inherits the same cgroup as its parent task.
The diagram below summarises some of the terms we have visited in this section.

cpu, memory and net_cls are the subsystems in the configuration shown. We notice that the cpu and memory subsystems share the same hierarchy. There are two hierarchies, `cpu_mem_hier` and `net_cls_hier`. These hierarchies consist of multiple cgroups: `cg1`, `cg2` and `cg3`. There are three tasks: crond, its fork and httpd. As discussed, crond’s fork inherits the cgroups of its parent process/task.
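As a quick way to see hierarchies in action, every mounted hierarchy a task belongs to is listed in `/proc/<pid>/cgroup`. Here is a minimal check; the output is illustrative and will differ per system (and the daemon may be named `cron` rather than `crond` on some distributions):

```bash
# Each line reads hierarchy-ID:attached-subsystems:cgroup-path for the task
cat /proc/$(pidof crond)/cgroup
# 4:cpu,cpuacct:/cg1
# 3:net_cls:/cg2
```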
## Resource limiting and sharing through slices
Broadly, resources can be divided into three major slices:
- **System**: This slice is concerned with all daemons and services like httpd, crond etc.
- **User**: This slice is concerned with all user tasks
- **Machine**: This slice takes care of all requirements for VMs, containers etc.
Shares are the mechanism used to divide up a resource. Each slice gets a relative weight of 1024 share units by default, and the fraction of the resource a slice receives is its share divided by the total shares, so the three slices discussed above together account for 100%. A small visual example is shown below:

Using the concept of shares, we can control how big a fraction of the resource each slice gets. Imagine we have three services A, B and C, and we allocate 1024 shares to service A, 256 shares to service B and 512 shares to service C. Out of a total of 1792 shares, this means roughly 57% of the resource goes to service A, 14% to service B and 29% to service C. This is one way we can calculate how the resources will be divided.
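A quick sanity check of that arithmetic (a throwaway calculation, not a cgroup command):

```bash
# Convert shares into approximate percentages of the total
awk 'BEGIN { total = 1024 + 256 + 512;
             printf "A: %.0f%%  B: %.0f%%  C: %.0f%%\n",
                    1024*100/total, 256*100/total, 512*100/total }'
# A: 57%  B: 14%  C: 29%
```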
Now that we are familiar with the fundamentals and theoretical aspects of cgroups, we can delve into how to use them practically to manage and allocate resources such as CPU, memory and network.
## Tweaking the limits
In this section we shall look at how to create cgroups and manage our subsystems using the concepts learnt in the previous sections. Let’s begin by looking at the directory structure of cgroups. cgroups can be mounted anywhere on the filesystem, but by default they are present in `/sys/fs/cgroup`. Let’s play around with CPU limits by creating cgroups.
### CPU
Imagine your organisation has two types of users: managers and interns. The administration would want to allocate more resources to the managers than to the interns. Let’s assume the division is 75-25. Let’s begin with the experiment. I am performing these experiments on Ubuntu 20.04 using an Azure B1S VM.
We shall start by creating two new users, `manager` and `intern`, and setting their respective passwords. Remember to perform all these steps as the root user.
```bash
# Create users manager, intern with `useradd` and set password using `passwd`
useradd manager && passwd manager
useradd intern && passwd intern
# Create a home directory for each user and assign the right privileges
mkdir /home/manager && chown manager:manager /home/manager
mkdir /home/intern && chown intern:intern /home/intern
```
We use the `chown` command to give each home folder its respective ownership, which will allow us to run scripts as those users. Now that we have both our users ready, we can start creating our cgroups. Remember, cgroups can be mounted anywhere on the filesystem.
```bash
# Create a new folder 'demo_cgroup', where we will mount our cgroups
mkdir demo_cgroup
# Now we shall mount a cgroup hierarchy here with the required subsystems (cpu and cpuacct)
mkdir demo_cgroup/cpu
mount -t cgroup -o cpu,cpuacct h1 demo_cgroup/cpu
mkdir -p demo_cgroup/cpu/{manager,intern}
```
The mount command is used to attach a file system to a mount point, which here is `demo_cgroup/cpu`. The `-t cgroup` flag specifies that the file system being mounted is a cgroup file system. The `-o cpu,cpuacct` flag specifies that the mounted cgroup file system should include support for the `cpu` and `cpuacct` subsystems. `h1` is the name we give to the hierarchy that this cgroup file system belongs to. So, the command mounts a cgroup file system that supports control and accounting of CPU usage at the `demo_cgroup/cpu` mount point within the `h1` hierarchy.
On inspecting the manager and intern directories inside the cpu folder, we see that they are already filled with configuration parameters. Now we can begin with our experiment.
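For instance, listing one of the newly created directories shows the tunables the kernel exposes for the cpu and cpuacct subsystems (output truncated):

```bash
ls demo_cgroup/cpu/manager
# cgroup.clone_children  cgroup.procs  cpu.cfs_period_us  cpu.cfs_quota_us
# cpu.shares  cpu.stat  cpuacct.stat  cpuacct.usage  notify_on_release  tasks
```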
Let’s write a simple Python script which generates load and run it as both the manager and intern users.
```python
# Busy loop that keeps one CPU core fully occupied
a = 1
while True:
    a += 1
```
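Assuming the script is saved as `load.py` in each user's home directory (the filename is just an illustration), we can start one copy as each user:

```bash
# Start the load generator as each user; nohup keeps it running after su exits
su - manager -c 'nohup python3 /home/manager/load.py >/dev/null 2>&1 &'
su - intern -c 'nohup python3 /home/intern/load.py >/dev/null 2>&1 &'
```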
On the main user (alpha), let’s open `top` to see the CPU usage.
```
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2198 manager 20 0 15836 8960 5672 R 49.8 1.0 2:28.52 python3
2199 intern 20 0 15836 8868 5664 R 49.8 1.0 2:26.15 python3
```
We see that initially the CPU resources are divided equally, with approximately 50% allotted to each user. We can verify the default shares:
```bash
cat demo_cgroup/cpu/manager/cpu.shares
# 1024
cat demo_cgroup/cpu/intern/cpu.shares
# 1024
```
But as per our requirement, we would prefer the manager to have more CPU time allocated than the intern. Using the concept of shares, let’s allot 1024 shares to manager and 340 shares to intern to maintain a ~75-25 ratio between the two:
```bash
# Updating the shares for the specific user
echo 1024 > demo_cgroup/cpu/manager/cpu.shares
echo 340 > demo_cgroup/cpu/intern/cpu.shares
# Linking the task to the cgroup
echo <manager PID> > demo_cgroup/cpu/manager/tasks
echo <intern PID> > demo_cgroup/cpu/intern/tasks
```
Now, looking at the CPU usage, we find that we have succeeded in our aim:
```
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2198 manager 20 0 15836 8960 5672 R 74.8 1.0 14:32.69 python3
2199 intern 20 0 15836 8868 5664 R 24.9 1.0 11:21.72 python3
```
We see that roughly 75% of the CPU bandwidth is allocated to manager and 25% to intern. We can also verify this by looking at the cumulative time spent on each process. Since we initially ran for a while with a 50-50 split, the times won’t show an exact 75-25 division, but we can clearly see that manager now gets more CPU time, i.e. about 14 minutes versus the intern’s 11 minutes. In a similar fashion we can also limit memory usage by mounting the memory subsystem and updating the `memory.limit_in_bytes` file, as sketched below.
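A minimal sketch of a memory limit for the intern user might look like this (the `mem_hier` name and the 512M value are arbitrary choices for illustration):

```bash
# Mount the memory subsystem in its own hierarchy and create an intern cgroup
mkdir -p demo_cgroup/memory
mount -t cgroup -o memory mem_hier demo_cgroup/memory
mkdir demo_cgroup/memory/intern
# Cap the cgroup at 512 MB of memory, then attach the intern's process
echo 512M > demo_cgroup/memory/intern/memory.limit_in_bytes
echo <intern PID> > demo_cgroup/memory/intern/tasks
```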
Once we are done with the limiting, we can simply unmount the hierarchy with the `umount` command.
```bash
umount demo_cgroup/cpu
```
### Devices
In Linux, device files are special files that represent devices, such as hard drives and terminals. They allow the operating system to communicate with the devices and perform actions such as reading and writing data.
Device files are typically located in the `/dev` directory and are named according to the type of device they represent. For example, `/dev/sda1` represents the first partition of the first hard drive, and `/dev/tty` represents the current terminal. Device files can be either character devices or block devices. Character devices transfer data one character at a time, such as a terminal. Block devices transfer data in blocks, such as a hard drive.
Device files are used by the operating system to access the devices, and they can also be accessed by users through the command line or by programs that need to interact with the devices.
By using the `devices` subsystem in cgroups, we can restrict the access that certain processes have to certain device files.
Continuing the previous example, there could be scenarios where you would want to limit access to certain devices in a restricted environment. Let’s start by creating a mount point and mounting the subsystem onto our hierarchy.
```bash
mkdir demo_cgroup/devices
mount -t cgroup -o devices h1 demo_cgroup/devices
```
This command mounts the `devices` subsystem in the `demo_cgroup/devices` directory and allows you to control access to device files.
Let’s start with a simple example to restrict the device `sdb1`. To identify your partitions, run `fdisk -l`.
```bash
echo "b 8:17 rw" > demo_cgroup/devices/devices.deny
```
Since we are restricting access to `sdb1`, a block device, we specify `b`, followed by the major and minor numbers that identify the driver and device instance. We then add `rw` to make sure that tasks in the cgroup have neither read nor write access to that device. All these details can be extracted with `ls -l`, as shown below. And voila! We have restricted access to the `sdb1` device.
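For example, the major and minor numbers appear in place of the file size in the `ls -l` output (the timestamp below is just an illustration):

```bash
ls -l /dev/sdb1
# brw-rw---- 1 root disk 8, 17 Jan 10 09:15 /dev/sdb1
#            leading 'b' = block device, major 8, minor 17
```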
In another example, we will restrict write access to the `tty` device. `/dev/tty` is a device file in Linux that represents the terminal the user is currently using. When you open a terminal window and enter commands, you are interacting with the system through `/dev/tty`. In other words, `tty` is like a bridge between the user and the system, allowing you to send commands to the system and receive output in return. By restricting write access to `tty`, admins can prevent processes from accidentally or intentionally flooding the terminal or making changes that could cause the system to crash or malfunction. This can help to maintain system stability and prevent downtime.
Let’s write a simple bash script which writes to `tty`
```bash
#!/bin/bash
# Write a message to the current terminal every 2 seconds
while :
do
    echo "hello world" > /dev/tty
    sleep 2
done
```
This prints the line "hello world" to `tty` every 2 seconds. Let’s update the tunable and bind the process to our tasks file:
```bash
echo "c 5:0 w" > demo_cgroup/devices/devices.deny
echo <PID of script> > demo_cgroup/devices/tasks
```
We see that the script can no longer write to `/dev/tty`. This is how we can limit or deny access to devices.
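If you want to restore access later, the matching rule can be written to `devices.allow` (a small sketch, mirroring the deny rule above):

```bash
# Re-allow write access to /dev/tty for tasks in this cgroup
echo "c 5:0 w" > demo_cgroup/devices/devices.allow
```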

cgroups can be used for many more applications. Consider a shared host that serves both team-wide network workloads and personal network queries. In a workspace setup, it would clearly be more favourable for the team network server to get a bigger share of the network than the personal one. This is a good use case for prioritising network traffic between two types of services, and the `net_prio` subsystem can be used for exactly that, as sketched below. In this way QoS can be maintained and SLAs can be controlled efficiently using cgroups.
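A minimal sketch of how this could look with `net_prio` (the `eth0` interface name, the `net_hier` hierarchy name and the priority values are assumptions for illustration; higher numbers mean higher priority):

```bash
# Mount the net_prio subsystem and create a cgroup per traffic class
mkdir -p demo_cgroup/net_prio
mount -t cgroup -o net_prio net_hier demo_cgroup/net_prio
mkdir demo_cgroup/net_prio/team demo_cgroup/net_prio/personal
# Give team traffic a higher egress priority than personal traffic on eth0
echo "eth0 5" > demo_cgroup/net_prio/team/net_prio.ifpriomap
echo "eth0 1" > demo_cgroup/net_prio/personal/net_prio.ifpriomap
# Attach the respective server processes
echo <team server PID> > demo_cgroup/net_prio/team/tasks
echo <personal server PID> > demo_cgroup/net_prio/personal/tasks
```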
### Tooling and latest advancements
So far we have limited CPU resources manually, but in a production setup this is impractical. One major reason is that these steps need to be repeated over and over and are transient in nature, i.e. they aren’t preserved after a reboot. As admins, we must be able to automate our tasks, replicate the steps quickly, and preferably keep them persistent.
Let’s now use a few tools to improve the same workflow and limit our resources in a much better manner. We shall design a configuration file that allocates how CPU and memory must be shared among the groups manager and intern, using an example similar to the one in the previous section. RHEL users can install the `libcgroup` and `libcgroup-tools` packages to get the toolkit that simplifies these tasks; the configuration below goes in `/etc/cgconfig.conf`.
```
mount {
    cpu = demo_cgroup/cpu_mem;
    cpuacct = demo_cgroup/cpu_mem;
    memory = demo_cgroup/cpu_mem;
}

group manager {
    cpu {
        cpu.shares = "1024";
    }
    cpuacct {
        cpuacct.usage = "0";
    }
    memory {
        memory.limit_in_bytes = "1536M";
    }
}

group intern {
    cpu {
        cpu.shares = "340";
    }
    cpuacct {
        cpuacct.usage = "0";
    }
    memory {
        memory.limit_in_bytes = "512M";
    }
}
```
Once the configuration file is in place, we start the `cgconfig` service and perform the following steps:
```bash
service cgconfig start
service cgred start
# To make these configs persistent
chkconfig cgconfig on
chkconfig cgred on
```
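The `cgred` daemon needs rules telling it which users belong to which cgroups; these live in `/etc/cgrules.conf`. A small sketch matching the groups above:

```
# /etc/cgrules.conf: <user> <controllers> <destination cgroup>
manager    cpu,memory    manager
intern     cpu,memory    intern
```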
Together, these simple configuration files achieve what we described above. There are also persistent alternatives, such as `systemd`, for limiting the resources of services, which are often preferable to using `cgconfig`.
A small glimpse of how `systemd` can help solve these problems is shown below.
```bash
systemctl set-property <service name> CPUShares=1024
systemctl set-property <service name> MemoryLimit=50%
```
In this way we can tweak multiple tunables (systemd unit properties) to their desired values using `systemd`, as discussed above.
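As a quick check, the configured values can be read back with `systemctl show` (the service name is a placeholder):

```bash
systemctl show -p CPUShares -p MemoryLimit <service name>
```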
## Conclusion
In this article, we have understood the concepts of cgroups, subsystems and hierarchies, and ways to access, limit, deny and monitor the resources of users or groups in an organisation. We have briefly discussed ways to set up configuration files and keep them persistent. cgroups are widely used in various containerisation technologies to perform similar resource-limiting tasks. With so many use cases, cgroups combined with features like namespaces can give sysadmins greater control and isolation.
## References
- [World Domination using cgroups](https://www.redhat.com/en/blog/world-domination-cgroups-part-1-cgroup-basics)
- [Managing cgroups the hard way-manually](https://www.redhat.com/sysadmin/cgroups-part-three)
- [cgroups - Linux Insides](https://0xax.gitbooks.io/linux-insides/content/Cgroups/linux-cgroups-1.html)
- [Using cgroups with systemd](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/using-control-groups-version-1-with-systemd_managing-monitoring-and-updating-the-kernel)
- Linux manpages