# Linux Power Management - Runtime Power Management

[TOC]

## Basics

The PM core provides a workqueue called `pm_wq`. Devices that support runtime power management put their suspend- and resume-related work items into this workqueue. Runtime power management also has to cooperate with the PM core in order to synchronize with system-wide power management.

### The `dev_pm_ops` - Callbacks for power management

The `runtime_suspend`, `runtime_resume`, and `runtime_idle` members of `struct dev_pm_ops` are specific to runtime power management. These are the callbacks through which a device is suspended, resumed, or notified that it appears to be idle. The `RUNTIME_PM_OPS` macro used for initializing these members hints at this:

```c
#define RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \
    .runtime_suspend = suspend_fn, \
    .runtime_resume = resume_fn, \
    .runtime_idle = idle_fn,
```

### The `dev_pm_info`

There are also certain fields that device drivers use to record how runtime power management should be done for a device, notably the `struct dev_pm_info power` member of `struct device`.

These fields are then consumed by the PM core. Before doing any suspend or resume, it checks whether the fields satisfy certain criteria, and it proceeds with the operation only when the criteria are met. For example, `autosuspend_delay` specifies how long the device must stay inactive before it may be suspended.

### `runtime.c` - Helper functions for drivers to request suspend/resume

Finally, there are helper functions in `drivers/base/power/runtime.c`. When a device driver calls one of the asynchronous helpers (for example `pm_request_resume()`), a work item is queued onto `pm_wq` in the PM core, and the PM core then calls the driver's PM callbacks when it processes that work item. The synchronous helpers (for example `pm_runtime_get_sync()`) run the callbacks directly in the caller's context instead.

## Power transitions for groups of devices

Suspend/resume functionality does not always come from individual device drivers. It may also come from layers above the devices.

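To make "layers above the devices" a little more concrete, the following user-space sketch mimics the order in which the PM core looks for a runtime PM callback: the PM domain first, then the device type, class, and bus, and finally the driver itself. Every `mock_*` type, the `pick_runtime_suspend()` helper, and the demo functions are hypothetical stand-ins written for this note, not kernel definitions. The real lookup lives in `drivers/base/power/runtime.c` and is more general.

```c
#include <assert.h>
#include <stddef.h>

/* Which layer supplied the callback (hypothetical, for the demo only). */
enum mock_source { FROM_BUS = 1, FROM_DRIVER = 2 };

typedef enum mock_source (*rpm_cb_t)(void);

/* Simplified stand-ins for dev_pm_ops, a subsystem layer, and a PM domain. */
struct mock_pm_ops { rpm_cb_t runtime_suspend; };
struct mock_layer  { const struct mock_pm_ops *pm; };  /* type/class/bus/driver */
struct mock_domain { struct mock_pm_ops ops; };        /* domain embeds its ops */

struct mock_device {
    const struct mock_domain *pm_domain;
    const struct mock_layer *type;
    const struct mock_layer *class_;
    const struct mock_layer *bus;
    const struct mock_layer *driver;
};

/* Pick the subsystem-level callback first, then fall back to the driver. */
static rpm_cb_t pick_runtime_suspend(const struct mock_device *dev)
{
    const struct mock_pm_ops *ops = NULL;
    rpm_cb_t cb = NULL;

    if (dev->pm_domain)
        ops = &dev->pm_domain->ops;
    else if (dev->type && dev->type->pm)
        ops = dev->type->pm;
    else if (dev->class_ && dev->class_->pm)
        ops = dev->class_->pm;
    else if (dev->bus && dev->bus->pm)
        ops = dev->bus->pm;

    if (ops)
        cb = ops->runtime_suspend;
    if (!cb && dev->driver && dev->driver->pm)
        cb = dev->driver->pm->runtime_suspend;
    return cb;
}

static enum mock_source bus_suspend(void)    { return FROM_BUS; }
static enum mock_source driver_suspend(void) { return FROM_DRIVER; }

static const struct mock_pm_ops bus_ops = { .runtime_suspend = bus_suspend };
static const struct mock_pm_ops drv_ops = { .runtime_suspend = driver_suspend };
static const struct mock_layer mock_bus = { .pm = &bus_ops };
static const struct mock_layer mock_drv = { .pm = &drv_ops };

/* A device with both a bus-level and a driver-level callback. */
static enum mock_source demo_bus_and_driver(void)
{
    struct mock_device dev = { .bus = &mock_bus, .driver = &mock_drv };
    return pick_runtime_suspend(&dev)();
}

/* A device whose only callback comes from its driver. */
static enum mock_source demo_driver_only(void)
{
    struct mock_device dev = { .driver = &mock_drv };
    return pick_runtime_suspend(&dev)();
}

/* Self-check: the bus layer wins over the driver when both are present. */
static void check_lookup_order(void)
{
    assert(demo_bus_and_driver() == FROM_BUS);
    assert(demo_driver_only() == FROM_DRIVER);
}
```

Calling `check_lookup_order()` from a small `main()` is enough to run the sketch. The point is only the precedence: a callback supplied by a higher layer is used in preference to the driver's own.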
### The PM domain

It is possible to suspend/resume devices in groups, so that one doesn't have to exhaustively call the PM functions of each device to be suspended or resumed.

Devices supporting power management can be grouped into **power domains**. The power domain a device belongs to can be accessed through the `dev->pm_domain` member of that device. Domain-wide suspend/resume makes it possible to perform a power transition on a whole group of devices.

### Other ways to specify groups of devices

Other than the power domain, it is also possible to select devices to be suspended/resumed by their class (`dev->class->pm`), type (`dev->type->pm`), or bus (`dev->bus->pm`).

### Dependencies between devices

A device may depend on other devices for power. Power management describes these dependencies with a *child-parent* relationship.

For example, there may be multiple devices on the same bus. In this case, the devices whose power depends on the bus are called the *children* of the bus, while the bus is called the *parent* of those devices.

Whether a parent device may be suspended while its children are still active can be specified with `pm_suspend_ignore_children()`:

```c
/**
 * pm_suspend_ignore_children - Set runtime PM behavior regarding children.
 * @dev: Target device.
 * @enable: Whether or not to ignore possible dependencies on children.
 *
 * The dependencies of @dev on its children will not be taken into account by
 * the runtime PM framework going forward if @enable is %true, or they will
 * be taken into account otherwise.
 */
static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
{
    dev->power.ignore_children = enable;
}
```

See the checks in the [`rpm_check_suspend_allowed()`](https://elixir.bootlin.com/linux/latest/source/drivers/base/power/runtime.c#L268) function.

## Initialization in drivers

The PM core initially considers every device to be suspended. This may not always be true.
In this case, `pm_runtime_set_active()` can be used to inform the PM core that the device is active.

### Enable runtime PM for a device

Use `devm_pm_runtime_enable()` to have the PM core enable runtime PM for the device. Runtime PM has to be enabled before the other helper functions in `drivers/base/power/runtime.c` can be used.

### Determine context in which the PM callbacks can run

[`pm_runtime_irq_safe()`](https://elixir.bootlin.com/linux/latest/source/drivers/base/power/runtime.c#L1604) tells the PM core that the `runtime_suspend()` and `runtime_resume()` callbacks for this device should always be invoked with the spinlock held and interrupts disabled.

Note that this also keeps the parent's usage counter permanently incremented, which prevents the parent device from runtime suspending. The purpose is to avoid a child device waiting in atomic context for an already-suspended parent, which would deadlock the system.

## Suspend/Resume at runtime

### `_get_` and `_put_`

The `struct dev_pm_info power` member of a device maintains a usage counter, `usage_count`. This counter tracks the users of the device. A device is allowed to be suspended only when this counter is `0`.

Calling `pm_runtime_resume_and_get()` increments this usage counter and tries to resume the device. One mnemonic is that whoever resumes the device is assumed to be acquiring it.

There are also other `_get_` functions that increment the usage counter.
For example, [`pm_runtime_get_sync()`](https://elixir.bootlin.com/linux/latest/source/include/linux/pm_runtime.h#L411) is used in the graphics driver for Intel processors (the `i915` driver):

```c
static intel_wakeref_t __intel_runtime_pm_get(struct intel_runtime_pm *rpm,
                                              bool wakelock)
{
    struct drm_i915_private *i915 = container_of(rpm,
                                                 struct drm_i915_private,
                                                 runtime_pm);
    int ret;

    ret = pm_runtime_get_sync(rpm->kdev);
    drm_WARN_ONCE(&i915->drm, ret < 0,
                  "pm_runtime_get_sync() failed: %d\n", ret);

    intel_runtime_pm_acquire(rpm, wakelock);

    return track_intel_runtime_pm_wakeref(rpm);
}
```

The `_put_` functions decrement the `usage_count`. See the [*Runtime Power Management Framework for I/O Devices*](https://www.kernel.org/doc/Documentation/power/runtime_pm.txt) for further explanation.

### Example: manually reset the `amdgpu` with debugfs

The `amdgpu` driver exposes a debugfs interface for poking the AMD GPUs. One of its entries triggers a GPU reset when the debugfs file is read (e.g. with `cat`). The read result won't

In `gpu_recover_get()`, the read callback for the debugfs entry, the driver tries to resume the GPU with `pm_runtime_get_sync()` before queueing the reset `work_struct` and waiting for it to finish. Note that `pm_runtime_get_sync()` leaves the usage counter incremented even when it fails, which is why the error path still calls `pm_runtime_put_autosuspend()`:

```c
static int gpu_recover_get(void *data, u64 *val)
{
    struct amdgpu_device *adev = (struct amdgpu_device *)data;
    struct drm_device *dev = adev_to_drm(adev);
    int r;

    r = pm_runtime_get_sync(dev->dev);
    if (r < 0) {
        pm_runtime_put_autosuspend(dev->dev);
        return 0;
    }

    if (amdgpu_reset_domain_schedule(adev->reset_domain, &adev->reset_work))
        flush_work(&adev->reset_work);

    *val = atomic_read(&adev->reset_domain->reset_res);

    pm_runtime_mark_last_busy(dev->dev);
    pm_runtime_put_autosuspend(dev->dev);

    return 0;
}

DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_fence_info);
DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, NULL,
                         "%lld\n");
```

## References

### [Adding Runtime Power Management Capabilities to Device Drivers - Shreeya Patel, Collabora](https://youtu.be/L_pNP9LkTOk)

{%youtube L_pNP9LkTOk %}

### [SAN19-421 Training: Device power management for idle](https://youtu.be/wbG1rXibzMY)

{%youtube wbG1rXibzMY %}

### [BKK19-119 - Device power management and idle](https://youtu.be/LaFartS_dv0)

{%youtube LaFartS_dv0 %}

### [LPC2019 - Integration of PM-runtime with System-wide Power Management](https://youtu.be/5GXjQsnH8H8)

{%youtube 5GXjQsnH8H8 %}

### [The Fall of the Legacy - Vaibhav Gupta, Open Source Contributor](https://youtu.be/MAxzhurvDTw)

{%youtube MAxzhurvDTw %}