GPE Storm

You may find that your Linux machine always has one CPU under full load. In a system monitor such as top or htop, a kernel thread named kworker takes up nearly 100% of a CPU.

Observation

This problem mostly happens on laptops. On my MSI GS60 6QD, running kernel Linux 6.8.7-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 17 Apr 2024 15:20:28 +0000 x86_64, the two biggest CPU consumers listed in top are kworker/0:1+kacpid and irq/9-acpi:

PID USER   PR  NI  VIRT  RES  SHR S  %CPU  %MEM     TIME+ COMMAND
 10 root   20   0     0    0    0 R  82.1   0.0   9:32.66 kworker/0:1+kacpid
 79 root  -51   0     0    0    0 R  15.6   0.0   1:48.09 irq/9-acpi 

kworker/0:1+kacpid constantly uses most of one CPU, and irq/9-acpi constantly takes around 15% as well. This drains the battery fast and keeps the CPU fans running at high speed. It happens right from boot.

Investigation

kworker refers to the worker threads, managed in worker pools, that consume work items from workqueues (CMWQ, the concurrency-managed workqueue). Subsystems and drivers can create and queue work items through the workqueue API functions as they see fit. The numbers after kworker/ denote the CPU core (in this case, core 0) and the specific worker id. These numbers are determined when create_worker() is called (defined in kernel/workqueue.c).

kacpid stands for Kernel ACPI Daemon. ACPI is a standard for handling power management and hardware configuration. kacpid is responsible for ACPI-related tasks such as managing power events and handling ACPI interrupts; the +kacpid suffix shows that this worker is currently running a work item from the kacpid workqueue.
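The naming scheme described above can be unpacked mechanically. The following is a minimal sketch, not kernel code: it assumes the thread name follows the layout kworker/<cpu>:<id> for per-CPU workers (optionally followed by H for high priority) and kworker/u<pool>:<id> for unbound ones, with an optional +<desc> or -<desc> suffix. The function name parse_kworker and the dictionary keys are made up for illustration.

```python
import re
from typing import Optional

# Assumed layout of the thread name (not taken from the article):
#   kworker/<cpu>:<id>[H][+|-<desc>]   per-CPU worker
#   kworker/u<pool>:<id>[+|-<desc>]    unbound worker
_KWORKER = re.compile(
    r"kworker/(?P<unbound>u?)(?P<first>\d+):(?P<worker_id>\d+)(?P<highpri>H?)"
    r"(?:(?P<state>[+-])(?P<desc>\S+))?"
)


def parse_kworker(comm: str) -> Optional[dict]:
    """Split a kworker thread name into its parts, or return None."""
    m = _KWORKER.fullmatch(comm.strip())
    if m is None:
        return None
    unbound = m["unbound"] == "u"
    return {
        "cpu": None if unbound else int(m["first"]),
        "pool": int(m["first"]) if unbound else None,
        "worker_id": int(m["worker_id"]),
        "highpri": m["highpri"] == "H",
        "desc": m["desc"],  # e.g. "kacpid", or None when absent
        # "+" is assumed to mean "running a work item right now"
        "running": None if m["state"] is None else m["state"] == "+",
    }


if __name__ == "__main__":
    print(parse_kworker("kworker/0:1+kacpid"))
```

For the thread seen in top, this yields CPU 0, worker id 1 and the description kacpid.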

irq/9-acpi is the kernel thread that handles the interrupt on IRQ line 9, which is registered under the name acpi. On x86, IRQ 9 sits on the slave PIC and is conventionally used for the ACPI System Control Interrupt (SCI) on Intel chipsets.
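One way to confirm that IRQ 9 is the busy line is to look at its counters in /proc/interrupts. The sketch below assumes the usual layout of that file (a header row naming the CPUs, then one row per interrupt with one count per CPU); the helper names irq_counts and read_irq_total are hypothetical.

```python
from typing import List


def irq_counts(proc_interrupts: str, irq: str) -> List[int]:
    """Return the per-CPU counts for one IRQ label, e.g. "9"."""
    lines = proc_interrupts.splitlines()
    if not lines:
        raise ValueError("empty /proc/interrupts text")
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if sep and label.strip() == irq:
            # The first n_cpus fields are the counters; the rest is text.
            return [int(field) for field in rest.split()[:n_cpus]]
    raise KeyError(irq)


def read_irq_total(irq: str = "9", path: str = "/proc/interrupts") -> int:
    """Sum the counters of one IRQ across all CPUs on the running system."""
    with open(path) as f:
        return sum(irq_counts(f.read(), irq))
```

Calling read_irq_total() twice a few seconds apart and subtracting gives a rough interrupt rate for the line.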

Interrupt handling has two parts: the top half and the bottom half. The top half, which receives the hardware interrupt, needs to run as quickly as possible. The bottom half, where the work triggered by the interrupt is actually done, is not as time-critical in a non-real-time system and can be deferred for later execution. There are several options for this deferral: softirqs, tasklets, workqueues and threaded interrupts. In this case, ACPI uses a workqueue.

So far we can figure out that my computer is probably handling an excessive number of ACPI interrupts. We have two options:

  1. Fix the hardware and firmware so that they stop sending an insane number of interrupts.
  2. Stop handling the interrupts.

Possible Solutions

Not handling the interrupts seems like the fairly easy option: the UEFI firmware on my laptop is closed-source software, and open-source alternatives such as coreboot don't support my laptop either. This document in the kernel explains how to use sysfs to check ACPI firmware behavior, including interrupts:

However, one of the main functions of ACPI is to make the platform understand random hardware without special driver support. So while the SCI handles a few well known (fixed feature) interrupt sources, such as the power button, it can also handle a variable number of "General Purpose Events" (GPE).

A GPE vectors to a specified handler in AML, which can do anything the BIOS writer wants from OS context. GPE 0x12, for example, would vector to a level or edge handler called _L12 or _E12. The handler may do its business and return. Or the handler may send a Notify event to a Linux device driver registered on an ACPI device, such as a battery, or a processor.

To figure out where all the SCIs are coming from, /sys/firmware/acpi/interrupts contains a file listing every possible source, and the count of how many times it has triggered.

Besides this, the user can also write specific strings to these files to enable/disable/clear ACPI interrupts in user space, which can be used to debug some ACPI interrupt storm issues.

As suggested, use grep . -r /sys/firmware/acpi/interrupts/ to list the possible sources and their trigger counts:

error:       0
ff_gbl_lock:       0         disabled     unmasked
ff_pmtimer:       0     STS invalid      unmasked
ff_pwr_btn:       0  EN     enabled      unmasked
ff_rt_clk:       0         disabled     unmasked
ff_slp_btn:       0         invalid      unmasked
gpe00:       0         invalid      unmasked
gpe0A:       0         invalid      unmasked
gpe0B:       0         invalid      unmasked
gpe0C:       0  EN     enabled      unmasked
gpe0D:       0         invalid      unmasked
gpe0E:       0         invalid      unmasked
gpe0F:       0         invalid      unmasked
gpe01:       0         invalid      unmasked
...
...
...
gpe58:       0         invalid      unmasked
gpe59:       0         invalid      unmasked
gpe60:       0         invalid      unmasked
gpe61: 2461007     STS enabled      unmasked
gpe62:       0  EN     enabled      unmasked
gpe63:       0         invalid      unmasked
gpe64:       0         invalid      unmasked
gpe65:       0         invalid      unmasked
gpe66:       9  EN     enabled      unmasked
gpe67:       0         enabled      unmasked
gpe68:       0         invalid      unmasked
gpe69:       0         disabled     unmasked
gpe70:       0         invalid      unmasked
gpe71:       0         invalid      unmasked
gpe72:       0         invalid      unmasked
gpe73:       0         invalid      unmasked
gpe74:       0         invalid      unmasked
gpe75:       0         invalid      unmasked
gpe76:       0         invalid      unmasked
gpe77:       0         invalid      unmasked
gpe78:       0         invalid      unmasked
gpe79:       0         invalid      unmasked
gpe_all: 2461163
sci: 2461131
sci_not:       4
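With around a hundred entries, the noisy source is easier to spot programmatically. The sketch below parses a listing like the one above (with or without the /sys/firmware/acpi/interrupts/ path prefix that grep prints) and ranks the GPEs by count. The function names are hypothetical, and the parser assumes each line has the shape name: count followed by optional flags.

```python
import re
from typing import Dict, List, Tuple


def parse_acpi_interrupts(text: str) -> Dict[str, Tuple[int, List[str]]]:
    """Map each source name to (count, flags) from `grep . -r` style output."""
    sources = {}
    for raw in text.splitlines():
        path, sep, value = raw.strip().rpartition(":")
        fields = value.split()
        if not sep or not fields or not fields[0].isdigit():
            continue  # skip blank lines, "..." elisions and anything odd
        name = path.rsplit("/", 1)[-1].strip()  # drop a leading directory
        sources[name] = (int(fields[0]), fields[1:])
    return sources


def busiest_gpes(sources: Dict[str, Tuple[int, List[str]]],
                 limit: int = 3) -> List[Tuple[str, int]]:
    """Return up to `limit` GPEs with a non-zero count, busiest first."""
    gpes = [
        (name, count)
        for name, (count, _flags) in sources.items()
        # gpeXX only; aggregates such as gpe_all and sci are left out
        if re.fullmatch(r"gpe[0-9A-Fa-f]{2}", name) and count > 0
    ]
    return sorted(gpes, key=lambda item: item[1], reverse=True)[:limit]
```

Fed the listing above, busiest_gpes puts gpe61 first, far ahead of gpe66.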

As listed, GPE61 was triggered 2461007 times in only a few minutes. Disable GPE61 with echo disable > /sys/firmware/acpi/interrupts/gpe61 (as root). To disable it every time the system boots, add the command as a cron job for the root user, for example with an @reboot entry.
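The same write can be wrapped in a small script, which is convenient if the boot-time job should validate its input first. This is a sketch under assumptions: set_gpe and its base parameter are hypothetical names, and the action list assumes the kernel accepts mask and unmask in addition to the enable, disable and clear strings mentioned above. It has to run as root to touch the real sysfs files.

```python
import re
from pathlib import Path

ACPI_INTERRUPTS = Path("/sys/firmware/acpi/interrupts")
# enable/disable/clear come from the text above; mask/unmask are assumed.
ALLOWED_ACTIONS = {"enable", "disable", "clear", "mask", "unmask"}


def set_gpe(name: str, action: str = "disable", base=ACPI_INTERRUPTS) -> Path:
    """Write an action string to one GPE file, like `echo action > file`."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if not re.fullmatch(r"gpe[0-9A-Fa-f]{2}", name):
        raise ValueError(f"not a GPE name: {name!r}")
    target = Path(base) / name
    if not target.exists():
        raise FileNotFoundError(target)
    # echo appends a newline, so do the same here
    target.write_text(action + "\n")
    return target


if __name__ == "__main__":
    set_gpe("gpe61", "disable")
```

The base parameter exists only so the function can be tried against a scratch directory before pointing it at sysfs.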

One may ask: why not disable the interrupt right at the PIC? Because that would also disable all the other, normal ACPI interrupts.

Beyond the investigation above, this could be a stepping stone to exploring the interrupt mechanism in the Linux kernel.

Further investigation

  • How about masking it instead of disabling it?
  • Use tools/workqueue/wq_monitor.py or tools/workqueue/wq_dump.py to examine what GPE61 on my laptop really does. Find out whether disabling it has any impact on the system. I am not sure whether my battery died so quickly because of this.
    • This post suggests that "Transactions will use polling mode" means that, to handle the storm, the OS will stop using GPEs/interrupts to be informed of ACPI events and will instead, on its own schedule, "poll" or proactively ask the ACPI EC if any events it should know of have occurred. This way, the OS can still effectively perform ACPI functions while not being overwhelmed with a GPE storm.
  • Dig further into how interrupts work by following this Lab.
  • A comment in a LWN article introducing Concurrency-managed workqueues:

    Article:
    The ACPI code had bound a workqueue thread to CPU 0 because some operations corrupt the system if run anywhere else;

    Comment:
    (reply to someone being shocked) On some HPs, at least, certain ACPI operations trigger SMIs that then appear to be run on the CPU that triggered the SMI. HP's SMI handler seems to fail to restore CPU state if it runs on anything other than CPU 0.

    • This fits the observation that kworker works on CPU 0.
  • Note that "disabling interrupts" in the top half is different from what we are doing here. Disabling interrupts in the top half means the CPU stops receiving all other maskable interrupts, which is a key latency concern for making Linux hard real-time.