# Linux Power Management - System Sleep States

[TOC]

This note covers the sleep states defined in [*System Sleep States*](https://www.kernel.org/doc/html/latest/admin-guide/pm/sleep-states.html#system-sleep-states) of the kernel documentation.

Also see:

1. [*Power management/Suspend and hibernate*](https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate) in the Arch Wiki.
2. [*Suspend and hibernate*](https://wiki.gentoo.org/wiki/Suspend_and_hibernate) in the Gentoo Wiki.
3. [`include/linux/pm.h`](https://elixir.bootlin.com/linux/latest/source/include/linux/pm.h#L311) in the Linux kernel.
4. [*Device Power Management Basics*](https://www.kernel.org/doc/html/latest/driver-api/pm/devices.html) in the Linux kernel documentation, which explains how devices handle system-wide suspend.

For more case studies:

1. The [Power Management](https://bugzilla.kernel.org/buglist.cgi?bug_status=__open__&component=Hibernation%2FSuspend&product=Power%20Management) board on Bugzilla.
2. The discussion in [Acheiving s3 suspend-to-ram?](https://github.com/jakeday/linux-surface/issues/571) among the linux-surface issues.
3. [*DebuggingKernelSuspend*](https://wiki.ubuntu.com/DebuggingKernelSuspend) in the Ubuntu Wiki.

## Power states in Linux

![Screenshot 2024-03-10 at 10.59.16 PM](https://hackmd.io/_uploads/B1-dsroTp.png)

Conceptually, the kernel supports 4 main *power states*, according to [*System Sleep States*](https://www.kernel.org/doc/html/latest/admin-guide/pm/sleep-states.html) in the Linux kernel documentation:

1. **Suspend-to-idle**: CPUs are put into idle states. Relies on the `cpuidle` subsystem.
2. **Suspend-to-standby**: See 02:31 of [*Is Linux Suspend ready for the next decade - Len Brown*](https://youtu.be/Pv5KvN0on0M?si=rPTXXQXiZtQ4JFsz&t=152): *...standby is sort of deprecated. I haven't seen standby on a machine for years.* (Todd Brandt from Intel, maintainer of `sleepgraph` in [`pm-graph`](https://github.com/intel/pm-graph)).
3. **Suspend-to-RAM**: All CPUs except the boot CPU are taken offline, which relies on CPU hotplug. This state most likely also relies on support from the platform firmware.
4. **Suspend-to-disk**: An image of the current state of the system is created and stored to disk, and then the system is powered off. When the system is powered on again, this image is used to restore the state. This is also called *hibernation*.

The last one (suspend-to-disk) is rather special. Compared to the other three, resuming from suspend-to-disk is closer to a reboot than to a resume. The `pm_labels[]` and `mem_sleep_labels[]` arrays in [`kernel/power/suspend.c`](https://elixir.bootlin.com/linux/latest/source/kernel/power/suspend.c#L36) don't even include it. In fact, its core functionality is implemented separately in [`kernel/power/hibernate.c`](https://elixir.bootlin.com/linux/latest/source/kernel/power/hibernate.c) rather than in [`kernel/power/suspend.c`](https://elixir.bootlin.com/linux/latest/source/kernel/power/suspend.c). Most functions in `suspend.c`, in turn, boil down to "if this is s2idle, do s2idle; otherwise, do a suspend".

## Naming Maniacs

Linux exposes the above power states through different interfaces, and every one of them except suspend-to-disk goes by a different set of names in different places:

1. The `suspend-to-{idle, standby, ram}` names are generic terms for the actual underlying mechanisms.
2. The `freeze`, `standby`, `mem` names are used by the sysfs interface. Don't let `mem` fool you! It is a configurable state and can be any of the states in item 1.
3. The `s2idle`, `shallow`, `deep` names are used to configure what `mem` does. They are the terms you use to choose which of the states in item 1 the `mem` of item 2 should perform.

### Suspend-to-idle/standby/ram: generic terms for power states

These are the generic names of the power states themselves.
### `freeze`, `standby`, `mem`: terms used by the sysfs interface

The sysfs interface adds another layer of indirection. Instead of the generic names, it speaks of `freeze`, `standby`, and `mem` (plus `disk` for hibernation).

The `/sys/power/state` file in sysfs is the Linux interface for controlling the system power state. Reading the file:

```
$ cat /sys/power/state
```

shows the power states supported by the current system. For example, on my laptop this shows:

```
freeze mem disk
```

Writing the name of a power state to this file puts the system into that state. Among these options, `freeze` refers to suspend-to-idle, while `disk` refers to suspend-to-disk.

`mem` is the tricky one, because what it does is configurable by the user. It can be configured as suspend-to-RAM if the platform supports it, but it can also be configured as suspend-to-idle, which serves as a fallback especially when suspend-to-RAM is not supported. The `/sys/power/mem_sleep` file controls this behavior, and it uses yet another set of names for the states. This is one more layer of indirection (and hopefully the last) in power state naming.

### `s2idle`, `shallow`, `deep`: terms for configuring the `mem` behavior

The `/sys/power/mem_sleep` file controls what the system does when `mem` is written to `/sys/power/state`. Reading the file:

```
$ cat /sys/power/mem_sleep
```

shows all available options. For example, on my Intel MacBook, 2 options are available:

```
s2idle [deep]
```

As you can see, this interface defines yet another set of names. As with the other interface, writing one of these options to the file changes the behavior accordingly. The current configuration is surrounded by square brackets. In this example it is `deep`, meaning that this is what the system does when `mem` is written to `/sys/power/state`. This can be changed by writing a different option to the `mem_sleep` file.
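Since the active choice is the bracketed word, a program can work out what `mem` currently means by parsing the file. Below is a minimal C sketch, assuming the file contents have already been read into a string. `parse_mem_sleep_selected()`, `is_label_listed()`, and `read_sysfs_line()` are hypothetical helpers, not a kernel or libc API. `is_label_listed()` also accepts the `/sys/power/state` format, e.g. to check whether `mem` is offered at all.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the option shown in square brackets in a
 * /sys/power/mem_sleep-style string (e.g. "s2idle [deep]\n") into out.
 * Returns 0 on success, or -1 if there is no bracketed option or it
 * does not fit into out (including the terminating NUL).
 */
int parse_mem_sleep_selected(const char *content, char *out, size_t out_len)
{
    const char *left = strchr(content, '[');
    if (left == NULL)
        return -1;
    const char *right = strchr(left + 1, ']');
    if (right == NULL)
        return -1;

    size_t len = (size_t)(right - left - 1);
    if (len == 0 || len >= out_len)
        return -1;
    memcpy(out, left + 1, len);
    out[len] = '\0';
    return 0;
}

/* Separators between labels; the brackets count as separators too. */
static int is_label_sep(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '[' || c == ']';
}

/*
 * Hypothetical helper: return 1 if label appears as a whole word in a
 * /sys/power/state ("freeze mem disk") or /sys/power/mem_sleep
 * ("s2idle [deep]") string, else 0.
 */
int is_label_listed(const char *content, const char *label)
{
    size_t n = strlen(label);
    const char *p = content;

    while (*p != '\0') {
        while (*p != '\0' && is_label_sep(*p))
            p++;
        const char *start = p;
        while (*p != '\0' && !is_label_sep(*p))
            p++;
        if (n > 0 && (size_t)(p - start) == n && strncmp(start, label, n) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: read the first line of a small sysfs file. */
int read_sysfs_line(const char *path, char *buf, size_t buf_len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, (int)buf_len, f);
    fclose(f);
    return ok != NULL ? 0 : -1;
}
```

On a real system you would call `read_sysfs_line("/sys/power/mem_sleep", buf, sizeof buf)` and pass `buf` to `parse_mem_sleep_selected()`; with the MacBook output shown above, it yields `deep`. Keeping the parsing separate from the file I/O makes the string handling easy to test without touching sysfs.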
For example, if I write `s2idle` into `/sys/power/mem_sleep`:

```
$ echo 's2idle' | sudo tee /sys/power/mem_sleep
```

and read it again:

```
$ cat /sys/power/mem_sleep
```

now `s2idle` is the one surrounded by square brackets:

```
[s2idle] deep
```

meaning that `mem` is now configured to perform suspend-to-idle.

All possible options, sorted by how deeply the system sleeps, are `s2idle`, `shallow`, and `deep`. Unsurprisingly, `s2idle` means suspend-to-idle; `shallow` means suspend-to-standby; and `deep` means suspend-to-RAM.

### The cheat sheet: `pm_labels[]` and `mem_sleep_labels[]`

Internally, these names are defined in `pm_labels[]` and `mem_sleep_labels[]` in [`kernel/power/suspend.c`](https://elixir.bootlin.com/linux/latest/source/kernel/power/suspend.c#L36):

```c
const char * const pm_labels[] = {
	[PM_SUSPEND_TO_IDLE] = "freeze",
	[PM_SUSPEND_STANDBY] = "standby",
	[PM_SUSPEND_MEM] = "mem",
};
const char *pm_states[PM_SUSPEND_MAX];
static const char * const mem_sleep_labels[] = {
	[PM_SUSPEND_TO_IDLE] = "s2idle",
	[PM_SUSPEND_STANDBY] = "shallow",
	[PM_SUSPEND_MEM] = "deep",
};
```

These two arrays are all you need to translate between the naming schemes.

### Why?

Imagine you'd like to write a script that puts the system into suspend-to-RAM. A naive script would have to check whether that power state is supported before doing anything, and add its own fallback in case it is not. Or you could just use `mem`: if suspend-to-RAM is not supported, it automatically falls back to suspend-to-idle.

## Phases of suspension

Linux divides suspension into multiple phases. Device drivers can implement a callback for each phase, and in each phase the corresponding per-device callbacks are executed.
The phases are defined in `enum suspend_stat_step` in [`include/linux/suspend.h`](https://elixir.bootlin.com/linux/latest/source/include/linux/suspend.h#L43):

```c
enum suspend_stat_step {
	SUSPEND_FREEZE = 1,
	SUSPEND_PREPARE,
	SUSPEND_SUSPEND,
	SUSPEND_SUSPEND_LATE,
	SUSPEND_SUSPEND_NOIRQ,
	SUSPEND_RESUME_NOIRQ,
	SUSPEND_RESUME_EARLY,
	SUSPEND_RESUME
};
```

Note that this enum mixes the steps of suspend and resume. The actual phases look like the figure below, with each phase on the device resume path, in theory, unwinding the corresponding phase on the suspend path.

[![Screenshot 2024-03-10 at 11.03.01 PM](https://hackmd.io/_uploads/HJKToBipp.png)](https://youtu.be/wvcM-Uf3DBU?si=p7b3edMkQ062yUtn&t=928)

## What do devices do when system-wide suspension happens?

For an individual device, the callbacks invoked in the phases of system-wide suspend/resume are members of [`struct dev_pm_ops`](https://elixir.bootlin.com/linux/latest/source/include/linux/pm.h#L62). The initializer macros shed light on which members these are: see `SYSTEM_SLEEP_PM_OPS`, `LATE_SYSTEM_SLEEP_PM_OPS`, and `NOIRQ_SYSTEM_SLEEP_PM_OPS` in [`include/linux/pm.h`](https://elixir.bootlin.com/linux/latest/source/include/linux/pm.h#L312).

Note that the callbacks a device uses for system-wide suspension are not necessarily the same as those used for runtime power management. One way to see the difference is that they have separate initializer macros. For example, `RUNTIME_PM_OPS` in the same file initializes the runtime PM members (`runtime_suspend`, `runtime_resume`, `runtime_idle`), which are different function pointers from the ones set by the system suspend/resume macros.

That being said, a driver author may still reuse one pair of functions in several places. [`DEFINE_SIMPLE_DEV_PM_OPS`](https://elixir.bootlin.com/linux/latest/source/include/linux/pm.h#L411) plugs a single suspend/resume pair into all of the system sleep callbacks, so the same functions serve both suspend-to-RAM and hibernation, while the deprecated `UNIVERSAL_DEV_PM_OPS` in the same header also installs them as the runtime PM callbacks.
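The pairing between suspend-path and resume-path callbacks can be illustrated with a small userspace model. This is a simplified sketch, not kernel code: `struct model_pm_ops` and the two functions are hypothetical, and the struct mirrors only six members of `struct dev_pm_ops` (without their `struct device *` argument, and leaving out `prepare`/`complete` and the hibernation callbacks). The model runs the suspend callbacks in phase order and, if one fails, unwinds the phases that already completed by calling their resume counterparts in reverse order.

```c
#include <stddef.h>

/*
 * Hypothetical userspace model of a subset of struct dev_pm_ops.
 * Member names follow the kernel structure; signatures do not.
 */
struct model_pm_ops {
    int (*suspend)(void);
    int (*suspend_late)(void);
    int (*suspend_noirq)(void);
    int (*resume_noirq)(void);
    int (*resume_early)(void);
    int (*resume)(void);
};

/*
 * Run suspend -> suspend_late -> suspend_noirq. If phase i fails, call
 * the resume callbacks of the phases that already completed, in reverse
 * order. Returns 0 on success, else the 1-based index of the failing phase.
 */
int model_suspend_device(const struct model_pm_ops *ops)
{
    /* down[i] is undone by up[i]. */
    int (*const down[3])(void) = { ops->suspend, ops->suspend_late, ops->suspend_noirq };
    int (*const up[3])(void) = { ops->resume, ops->resume_early, ops->resume_noirq };

    for (int i = 0; i < 3; i++) {
        if (down[i] != NULL && down[i]() != 0) {
            for (int j = i - 1; j >= 0; j--)
                if (up[j] != NULL)
                    up[j]();
            return i + 1;
        }
    }
    return 0;
}

/* Resume path: resume_noirq -> resume_early -> resume. */
void model_resume_device(const struct model_pm_ops *ops)
{
    int (*const up[3])(void) = { ops->resume_noirq, ops->resume_early, ops->resume };

    for (int i = 0; i < 3; i++)
        if (up[i] != NULL)
            up[i]();
}
```

For example, if `suspend_noirq` fails, the model calls `resume_early` and then `resume`, undoing `suspend_late` and `suspend`. The real PM core handles this per phase across all devices, which this single-device model glosses over.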
A comment next to `DEFINE_SIMPLE_DEV_PM_OPS` in `include/linux/pm.h` describes its intended use:

```
/*
 * Use this if you want to use the same suspend and resume callbacks for suspend
 * to RAM and hibernation.
 * ...
 */
```

## Waking up the system from suspension

A suspended system needs an external input to trigger a resume. These inputs are called "wakeup sources" in Linux power management terminology.

[![Screenshot 2024-03-10 at 11.06.49 PM](https://hackmd.io/_uploads/Sy1a3SoTT.png)](https://youtu.be/wvcM-Uf3DBU?si=6Iy6G9H_Cba9cp5R&t=1799)

### Devices as wakeup sources

A major category of wakeup sources is interrupts from devices. Whether a device is allowed to wake up the system is set through its sysfs node, `/sys/devices/.../power/wakeup`. Writing `enabled` to this file allows the device to wake the system; writing `disabled` prevents it. Reading the file shows the current setting. The file is only present for devices that are capable of waking up the system. See [`sysfs-devices-power`](https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-devices-power) in the kernel documentation for details.

[![Screenshot 2024-03-10 at 11.17.17 PM](https://hackmd.io/_uploads/B1hzyUja6.png)](https://youtu.be/wvcM-Uf3DBU?si=B_C4gJ12Evqp20LM&t=1275)

### Wakeup by the platform

Besides devices, the platform itself can wake the system through ACPI. For example, the EC (embedded controller) can be a wakeup source. In this case, you may see the wakeup show up as a higher count on the `acpi` line of `/proc/interrupts`.
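One way to check whether the ACPI interrupt fired around a resume is to compare its count in `/proc/interrupts` before and after suspending. Below is a minimal C sketch that sums the per-CPU counts of the line whose last word is a given name. It assumes the usual `/proc/interrupts` layout: an `IRQ:` label, one numeric column per CPU, then a chip/description part whose last word is the handler name (typically `acpi` for the ACPI SCI on x86 machines). `irq_total_for()` is a hypothetical helper.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: given the text of /proc/interrupts, return the
 * sum of the per-CPU counters on the first line whose last word equals
 * name (e.g. "acpi"), or -1 if no such line exists.
 */
long long irq_total_for(const char *text, const char *name)
{
    size_t name_len = strlen(name);
    const char *line = text;

    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        if (eol == NULL)
            eol = line + strlen(line);

        /* Lines without an "IRQ:" label (e.g. the CPU header) are skipped. */
        const char *colon = memchr(line, ':', (size_t)(eol - line));
        if (colon != NULL) {
            long long total = 0;
            const char *p = colon + 1;

            /* Sum numeric columns until the first non-numeric token. */
            for (;;) {
                while (p < eol && isspace((unsigned char)*p))
                    p++;
                if (p >= eol || !isdigit((unsigned char)*p))
                    break;
                char *end;
                total += strtoll(p, &end, 10);
                p = end;
            }

            /* Isolate the last word on the line and compare it. */
            const char *last = eol;
            while (last > p && isspace((unsigned char)last[-1]))
                last--;
            const char *word = last;
            while (word > p && !isspace((unsigned char)word[-1]))
                word--;
            if (name_len > 0 && (size_t)(last - word) == name_len &&
                strncmp(word, name, name_len) == 0)
                return total;
        }
        line = (*eol == '\n') ? eol + 1 : eol;
    }
    return -1;
}
```

An increase between two snapshots only shows that the interrupt fired at some point in between, not that it was necessarily the event that woke the system.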
[![Screenshot 2024-03-10 at 11.19.17 PM](https://hackmd.io/_uploads/SyoFJIi6a.png)](https://youtu.be/wvcM-Uf3DBU?si=6Iy6G9H_Cba9cp5R&t=1799)

## References

### [Is Linux Suspend ready for the next decade - Len Brown](https://youtu.be/Pv5KvN0on0M)

{%youtube Pv5KvN0on0M %}

### [Evolution of Suspend-to-Idle Support in The Linux Kernel - Rafael Wysocki (LCA 2021 Online)](https://youtu.be/wvcM-Uf3DBU)

{%youtube wvcM-Uf3DBU %}

### [BKK19-119 - Device power management and idle](https://youtu.be/LaFartS_dv0)

{%youtube LaFartS_dv0 %}