Linux Power Management - Runtime Power Management

Basics

The PM core provides a workqueue called pm_wq. Devices that support runtime power management put their suspend- and resume-related work items into this workqueue.

Runtime power management also has to cooperate with the PM core in order to synchronize with system-wide power management.

The `dev_pm_ops` - Callbacks for power management

The runtime_suspend, runtime_resume, and runtime_idle callbacks of struct dev_pm_ops are specific to runtime power management. A device driver implements them to perform suspend and resume, and to handle idle requests.

The RUNTIME_PM_OPS macro used to initialize these callbacks hints at this:

#define RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \
	.runtime_suspend = suspend_fn, \
	.runtime_resume = resume_fn, \
	.runtime_idle = idle_fn,
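
As a rough illustration of how the macro is meant to be used, the following userspace sketch fills in a struct dev_pm_ops with it. The struct device and struct dev_pm_ops here are cut-down stand-ins, not the kernel definitions, and the foo_* names are hypothetical driver callbacks invented for this example.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel types; NOT the real definitions. */
struct device {
	int powered;
};

struct dev_pm_ops {
	int (*runtime_suspend)(struct device *dev);
	int (*runtime_resume)(struct device *dev);
	int (*runtime_idle)(struct device *dev);
};

/* Same shape as the kernel macro quoted above. */
#define RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \
	.runtime_suspend = suspend_fn, \
	.runtime_resume = resume_fn, \
	.runtime_idle = idle_fn,

/* Hypothetical driver callback: power the device down. */
static int foo_runtime_suspend(struct device *dev)
{
	dev->powered = 0;
	return 0;
}

/* Hypothetical driver callback: power the device back up. */
static int foo_runtime_resume(struct device *dev)
{
	dev->powered = 1;
	return 0;
}

/* The macro expands to three designated initializers. */
static const struct dev_pm_ops foo_pm_ops = {
	RUNTIME_PM_OPS(foo_runtime_suspend, foo_runtime_resume, NULL)
};
```

In this sketch the idle callback is left as NULL, so only suspend and resume are provided by the driver.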

The `dev_pm_info`

Device drivers also update certain fields that describe how runtime power management is done for the device. The most notable is the struct dev_pm_info power member of struct device.

The PM core then uses these fields. Before any suspend or resume, it first checks whether they satisfy certain criteria. Only when the criteria are met does the PM core proceed with the operation.

For example, autosuspend_delay specifies how long the device must stay inactive before it can be autosuspended.
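
The idea behind autosuspend_delay can be sketched with a small userspace model. This is a simplification, not the kernel's code: struct pm_info_model and autosuspend_due() are hypothetical names, and the model assumes that activity is recorded as a timestamp and that a negative delay prevents autosuspend.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the relevant dev_pm_info fields. */
struct pm_info_model {
	int autosuspend_delay_ms; /* negative: autosuspend is prevented */
	uint64_t last_busy_ms;    /* time of the last recorded activity */
};

/* Returns true once the device has been inactive for the whole delay. */
static bool autosuspend_due(const struct pm_info_model *p, uint64_t now_ms)
{
	if (p->autosuspend_delay_ms < 0)
		return false;
	return now_ms >= p->last_busy_ms + (uint64_t)p->autosuspend_delay_ms;
}
```

With a delay of 2000 ms and the last activity at 10000 ms, the model reports the device as eligible for autosuspend from 12000 ms onward.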

`runtime.c` - Helper functions for drivers to request suspend/resume

Finally, there are helper functions in drivers/base/power/runtime.c. When a device driver calls the asynchronous variants of these helpers, they queue work items on pm_wq in the PM core. The PM core then calls the driver's PM callbacks when it processes the work item. The synchronous variants invoke the callbacks directly instead of going through the workqueue.

Power transitions for groups of devices

Suspend/resume functionality does not always come from individual device drivers. It may also come from layers above the devices.

The PM domain

It is possible to suspend/resume devices in groups, so that one doesn't have to call into the PM functions of every single device to be suspended/resumed.

Devices that support power management can be grouped into power domains. The power domain a device belongs to can be accessed through dev->pm_domain. Domain-wide suspend/resume makes it possible to perform a power transition on a whole group of devices.

Other ways to specify groups of devices

Besides the power domain, the callbacks used to suspend/resume a device can also come from its sysfs class (dev->class->pm), its type (dev->type->pm), or its bus (dev->bus->pm).

Dependencies between devices

A device may depend on other devices for power. Power management describes these dependencies with parent-child relationships.

For example, there may be multiple devices on the same bus. In this case, the devices whose power depends on the bus are called the children of the bus, and the bus is called the parent of those devices. Whether a parent device can be suspended while some of its children are still active can be specified with pm_suspend_ignore_children():

/**
 * pm_suspend_ignore_children - Set runtime PM behavior regarding children.
 * @dev: Target device.
 * @enable: Whether or not to ignore possible dependencies on children.
 *
 * The dependencies of @dev on its children will not be taken into account by
 * the runtime PM framework going forward if @enable is %true, or they will
 * be taken into account otherwise.
 */
static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
{
	dev->power.ignore_children = enable;
}

See the checks in the rpm_check_suspend_allowed() function for how this flag is used.
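
The kind of check involved can be sketched in userspace as follows. This is a reduced model and not the kernel function: struct pm_model and check_suspend_allowed() are hypothetical names, only three of the conditions are modeled, and the error codes are chosen to resemble the kernel's but should be verified against the source for a given kernel version.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a few dev_pm_info fields. */
struct pm_model {
	int disable_depth;    /* > 0 while runtime PM is disabled */
	int usage_count;      /* active users of the device */
	int child_count;      /* children that are currently active */
	bool ignore_children; /* as set by pm_suspend_ignore_children() */
};

/* Returns 0 if a suspend may proceed, or a negative errno otherwise. */
static int check_suspend_allowed(const struct pm_model *p)
{
	if (p->disable_depth > 0)
		return -EACCES;  /* runtime PM is not enabled */
	if (p->usage_count > 0)
		return -EAGAIN;  /* the device still has users */
	if (!p->ignore_children && p->child_count > 0)
		return -EBUSY;   /* active children block the parent */
	return 0;
}
```

In this model, a parent with two active children is refused with -EBUSY until ignore_children is set, after which the same state is allowed to suspend.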

Initialization in drivers

The PM core initially assumes that every device is suspended. This is not always true. If the device is in fact active, the driver can call pm_runtime_set_active() to inform the PM core.

Enable runtime PM for a device

Use devm_pm_runtime_enable() to have the PM core enable runtime PM for this device. Runtime PM has to be enabled before the other helper functions in drivers/base/power/runtime.c can be used.

Determine context in which the pm callbacks can run

pm_runtime_irq_safe() tells the PM core that the runtime_suspend() and runtime_resume() callbacks for this device should always be invoked with the spinlock held and interrupts disabled.

Note that this implicitly prevents the parent device from suspending. The purpose is to keep a child device from having to wait, in atomic context, for an already-suspended parent to resume, which would deadlock the system.

Suspend/Resume at runtime

`_get_` and `_put_`

The dev_pm_info power member of a device maintains a usage counter, usage_count. This counter tracks the users of the device. A device is allowed to suspend only when this counter is 0.

Calling pm_runtime_resume_and_get() increments this usage counter and tries to resume the device. One mnemonic: whoever resumes the device is assumed to be acquiring it.

There are other _get_ functions that increment the usage counter. One example is pm_runtime_get_sync(), used here in the graphics driver for Intel processors (the i915 driver):

static intel_wakeref_t __intel_runtime_pm_get(struct intel_runtime_pm *rpm,
					      bool wakelock)
{
	struct drm_i915_private *i915 = container_of(rpm,
						     struct drm_i915_private,
						     runtime_pm);
	int ret;

	ret = pm_runtime_get_sync(rpm->kdev);
	drm_WARN_ONCE(&i915->drm, ret < 0,
		      "pm_runtime_get_sync() failed: %d\n", ret);

	intel_runtime_pm_acquire(rpm, wakelock);

	return track_intel_runtime_pm_wakeref(rpm);
}

The _put_ functions decrement usage_count.
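
The pairing of _get_ and _put_ can be pictured with a toy userspace model. It is deliberately simplified and is not how the kernel implements it: struct dev_model, model_get_sync() and model_put() are hypothetical names, the resume is assumed to always succeed, and the device suspends immediately when the last user goes away, whereas the real framework goes through an idle check first.

```c
#include <stdbool.h>

/* Hypothetical model of a device with a usage counter. */
struct dev_model {
	int usage_count; /* number of current users */
	bool active;     /* true while the device is resumed */
};

/* Take a reference and make sure the device is resumed. */
static int model_get_sync(struct dev_model *d)
{
	d->usage_count++;
	d->active = true; /* resume, assumed to always succeed here */
	return 0;
}

/* Drop a reference; the last user leaving lets the device suspend. */
static void model_put(struct dev_model *d)
{
	if (d->usage_count > 0 && --d->usage_count == 0)
		d->active = false;
}
```

With two users, the first model_put() leaves the device active, and only the second one brings the counter to 0 and lets the device suspend.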

See the Runtime Power Management Framework for I/O Devices for further explanation.

Example: manually reset the `amdgpu` with debugfs

The amdgpu driver exposes a debugfs interface for poking at AMD GPUs. One of its entries triggers a GPU reset when the debugfs file is read (e.g. with cat). The read result won't

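The read callback for the debugfs entry, gpu_recover_get(), first tries to resume the GPU with pm_runtime_get_sync(), then queues the reset work_struct and waits for it to finish. Because pm_runtime_get_sync() increments the usage counter even when the resume fails, the error path still has to drop the reference, which it does here with pm_runtime_put_autosuspend():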

static int gpu_recover_get(void *data, u64 *val)
{
	struct amdgpu_device *adev = (struct amdgpu_device *)data;
	struct drm_device *dev = adev_to_drm(adev);
	int r;

	r = pm_runtime_get_sync(dev->dev);
	if (r < 0) {
		pm_runtime_put_autosuspend(dev->dev);
		return 0;
	}

	if (amdgpu_reset_domain_schedule(adev->reset_domain, &adev->reset_work))
		flush_work(&adev->reset_work);

	*val = atomic_read(&adev->reset_domain->reset_res);

	pm_runtime_mark_last_busy(dev->dev);
	pm_runtime_put_autosuspend(dev->dev);

	return 0;
}

DEFINE_SHOW_ATTRIBUTE(amdgpu_debugfs_fence_info);
DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_gpu_recover_fops, gpu_recover_get, NULL,
			 "%lld\n");