
<h1>Nginx and PHP-FPM: Tracing IO Wait with awk and lsof</h1>

<div>
<p>Post-mortem report for incident ticket #8849. The trigger was a routine OS package update on the primary web nodes for a fashion apparel storefront. The <code>apt-get upgrade</code> process bumped the PHP runtime from version 8.1.18 to 8.2.x, applying the package maintainer's default configuration files. Initial smoke tests passed, and the nodes were returned to the load balancer pool. Approximately 36 hours later, our Prometheus alerting stack registered a steady increase in 502 Bad Gateway responses. The errors were not uniform; they appeared intermittently during standard traffic patterns. The CPU load average was elevated but not critical, sitting at 4.5 on 8-core instances. Memory usage was within normal bounds. The application was functional for the majority of users, but specific transactions were stalling and terminating. We needed to isolate the exact requests failing, determine the bottleneck, and restore service stability.</p>

<h2>Log Triage with awk</h2>

<p>To identify a pattern in the intermittent 502 errors, I bypassed the visualization dashboards and pulled the raw Nginx access logs from the affected nodes. Parsing millions of lines requires efficient command-line utilities. I used <code>awk</code> to filter the log file, extracting the timestamp, HTTP status code, request URI, and the upstream response time. The goal was to group the failing requests to see if a specific controller or endpoint was responsible.</p>

<pre><code># Extract 502 errors, sort by URI, and count occurrences
awk '($9 == "502") {print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -n 15

   1425 /wp-json/wc/store/v1/cart/add-item
    850 /wp-admin/admin-ajax.php?action=wc_checkout
    312 /checkout/
     45 /cart/
      2 /favicon.ico</code></pre>

<p>The <code>awk</code> output clearly indicated that the failures were concentrated on state-mutating endpoints, specifically the WooCommerce REST API cart endpoints and the legacy AJAX checkout routes. To understand the timing, I ran a secondary <code>awk</code> script to calculate the average <code>$upstream_response_time</code> for these specific 502 errors.</p>

<pre><code># Calculate average upstream response time for 502 errors on the cart endpoint
awk '($9 == "502" &amp;&amp; $7 ~ /add-item/) { sum += $NF; count++ } END { if (count &gt; 0) print sum / count }' /var/log/nginx/access.log

60.005</code></pre>

<p>The upstream response time was exactly 60 seconds. This correlates directly with the <code>fastcgi_read_timeout</code> directive configured in our Nginx server blocks. Nginx was waiting exactly 60 seconds for the PHP-FPM worker to return a response header before severing the Unix domain socket connection and returning a 502 Bad Gateway to the client. The PHP process was accepting the request but hanging indefinitely. </p>
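<p>To confirm that the failures were clustering hard at the timeout boundary rather than spreading across a latency distribution, a small <code>awk</code> histogram over the same field is useful. The sketch below runs against hypothetical sample lines in the same log shape (with <code>$upstream_response_time</code> appended as the last field); on the affected nodes the target would be <code>/var/log/nginx/access.log</code>.</p>

```shell
# Sample lines standing in for the production access log (hypothetical values)
cat > /tmp/access_sample.log <<'EOF'
10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "POST /wp-json/wc/store/v1/cart/add-item HTTP/1.1" 502 0 "-" "-" 60.003
10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "POST /wp-json/wc/store/v1/cart/add-item HTTP/1.1" 502 0 "-" "-" 60.007
10.0.0.3 - - [01/Jan/2024:00:00:03 +0000] "GET /cart/ HTTP/1.1" 200 5120 "-" "-" 0.412
EOF

# Bucket the upstream response time of every 502 into whole seconds
awk '($9 == "502") { hist[int($NF)]++ } END { for (b in hist) print b "s:", hist[b] }' /tmp/access_sample.log
# -> 60s: 2
```

<p>A histogram that shows a single spike at the timeout value, rather than a spread, is strong evidence of a hard cutoff on the proxy side rather than gradually degrading upstream latency.</p>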

<h2>Tracing Process States with Iotop and lsof</h2>

<p>With the PHP-FPM workers identified as the source of the stall, I logged into the primary node during an active error spike. I ran <code>top</code> and noticed that the <code>%wa</code> (I/O wait) metric was fluctuating between 15% and 25%. This metric counts time the CPU sits idle while at least one task is blocked waiting for a block device (disk) read or write operation to complete.</p>
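<p>The <code>%wa</code> figure that <code>top</code> displays is derived from the aggregate CPU counters in <code>/proc/stat</code>, so the raw tick counter can also be read directly, which is convenient when scripting a check. A minimal sketch, with field positions per the <code>proc(5)</code> man page:</p>

```shell
# The aggregate "cpu" line in /proc/stat lists, in USER_HZ ticks:
#   user nice system idle iowait irq softirq ...
# so field 6 (after the "cpu" label) is cumulative iowait time since boot.
awk '/^cpu / { printf "cumulative iowait ticks: %s\n", $6 }' /proc/stat
```

<p>Sampling this counter twice and diffing it over the interval gives the same signal as <code>top</code> without needing an interactive session.</p>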

<p>I launched <code>iotop</code> to observe the disk I/O per process.</p>

<pre><code># iotop -o -b -n 5
Total DISK READ:         0.00 B/s | Total DISK WRITE:       145.20 K/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:     180.45 K/s
    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO&gt;    COMMAND
  14820 be/4 www-data    0.00 B/s   12.40 K/s  0.00 %  85.20 % php-fpm: pool www
  14822 be/4 www-data    0.00 B/s   16.80 K/s  0.00 %  82.15 % php-fpm: pool www
  14825 be/4 www-data    0.00 B/s   14.20 K/s  0.00 %  88.00 % php-fpm: pool www
  14829 be/4 www-data    0.00 B/s   12.40 K/s  0.00 %  84.50 % php-fpm: pool www</code></pre>

<p>Multiple PHP-FPM worker processes were exhibiting an <code>IO&gt;</code> (I/O wait time) exceeding 80%, accompanied by continuous, small disk writes. A standard WordPress REST API request should rely almost entirely on the MySQL database over the network socket and the Redis object cache in memory. It should not be performing continuous synchronous writes to the local block device.</p>

<p>To determine exactly what files these PHP workers were writing to, I used the <code>lsof</code> (List Open Files) utility against one of the stalled Process IDs (PIDs).</p>

<pre><code># lsof -p 14820 | grep -v "\.so"
COMMAND   PID     USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
php-fpm 14820 www-data  cwd    DIR              259,1     4096   262144 /var/www/html
php-fpm 14820 www-data  rtd    DIR              259,1     4096        2 /
php-fpm 14820 www-data  txt    REG              259,1  5812344   131072 /usr/sbin/php-fpm8.2
php-fpm 14820 www-data    0u  unix 0xffff88810a1b2c00      0t0   819200 type=STREAM
php-fpm 14820 www-data    1u  unix 0xffff88810a1b2c00      0t0   819200 type=STREAM
php-fpm 14820 www-data    2u  unix 0xffff88810a1b2c00      0t0   819200 type=STREAM
php-fpm 14820 www-data    3u  IPv4             819201      0t0      TCP 10.0.1.10:48192-&gt;10.0.1.20:3306 (ESTABLISHED)
php-fpm 14820 www-data    4w   REG              259,1    14208   524288 /tmp/.opcache/d41d8cd98f00b204e9800998ecf8427e/wp-content/plugins/woocommerce/includes/class-wc-cart.php.bin
php-fpm 14820 www-data    5w   REG              259,1     8192   524289 /tmp/.opcache/d41d8cd98f00b204e9800998ecf8427e/wp-content/plugins/woocommerce/includes/class-wc-session-handler.php.bin</code></pre>

<p>File descriptors 4 and 5 revealed the issue. The PHP worker held them open in write mode (the <code>w</code> in <code>4w</code>, <code>5w</code>) on binary files located in <code>/tmp/.opcache/</code>. The Zend Engine was writing compiled opcode files directly to the temporary directory on the physical disk.</p>
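<p>When <code>iotop</code> is unavailable, the same per-process write activity can be cross-checked from the <code>/proc</code> interface: sampling <code>/proc/&lt;pid&gt;/io</code> twice and diffing <code>write_bytes</code> shows whether a worker is still issuing writes. The sketch below reads the counters for the current shell (<code>$$</code>) as a stand-in for a stalled worker PID such as 14820.</p>

```shell
# Cumulative I/O counters for a process; write_bytes counts bytes
# the process has caused to be sent to the block layer.
grep -E "^(read_bytes|write_bytes)" /proc/$$/io
```

<p>A <code>write_bytes</code> value that keeps climbing between samples, on a worker that should be network- and memory-bound, is the same red flag the <code>iotop</code> output raised.</p>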

<h2>The Mechanics of OPcache File Fallback</h2>

<p>The Zend Engine executes PHP scripts in a multi-step process: lexical analysis (tokenization), parsing (generating the Abstract Syntax Tree or AST), compilation (translating the AST into Zend opcodes), and finally execution in the virtual machine. The OPcache extension improves performance by storing the compiled opcodes in shared memory (SHM), allowing subsequent requests to bypass the parsing and compilation phases.</p>

<p>During the PHP 8.2 upgrade, the package manager installed a fresh <code>/etc/php/8.2/fpm/php.ini</code> and a default <code>/etc/php/8.2/fpm/conf.d/10-opcache.ini</code> file. The default configuration sets <code>opcache.memory_consumption=128</code> (128 Megabytes). Our production workload, utilizing a complex e-commerce stack, requires significantly more shared memory to store the compiled opcodes for the entire application.</p>

<p>When the 128MB shared memory segment became completely full, OPcache could no longer store new opcodes in RAM. Furthermore, the default configuration enabled a feature called <code>opcache.file_cache</code>. When this directive is set to a directory path (in this case, configured to <code>/tmp/.opcache</code> by the OS package defaults), the Zend Engine attempts to mitigate the RAM shortage by serializing the compiled opcodes and writing them to the physical disk as <code>.bin</code> files.</p>

<p>This creates a severe performance bottleneck. Writing to the disk is orders of magnitude slower than writing to shared memory. More importantly, when multiple PHP-FPM workers process a cache miss for the same file simultaneously, they compete for file system locks in the <code>/tmp</code> directory to write the <code>.bin</code> file. If the underlying storage is an ext4 filesystem utilizing standard journaling, creating thousands of small files under concurrent load forces the <code>jbd2</code> (Journaling Block Device) daemon to continuously flush metadata to the disk, saturating the block layer I/O queue.</p>

<p>The PHP workers servicing the cart endpoints were attempting to compile complex WooCommerce classes. They found the shared memory full, fell back to the file cache, encountered a locked file or a saturated I/O scheduler, and entered an uninterruptible sleep state. The 60-second Nginx timeout elapsed before the disk write could complete, terminating the connection.</p>
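<p>The uninterruptible sleep claim is directly verifiable: processes blocked inside the kernel on I/O show state <code>D</code> in <code>ps</code>, and the <code>wchan</code> column names the kernel function they are sleeping in. A quick filter (any D-state process matches; on the incident node these were the php-fpm workers):</p>

```shell
# Print the header plus every process currently in uninterruptible sleep (state D)
ps -eo pid,stat,wchan:30,comm | awk 'NR == 1 || $2 ~ /^D/'
```

<p>D-state processes ignore signals, which is why stalled workers of this kind cannot be cleared with a plain <code>kill</code>; the blocking I/O has to complete (or fail) first.</p>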

<h2>Application Profiling and Theme Memory Footprint</h2>

<p>Addressing the disk I/O stall requires tuning the PHP configuration, but simply increasing memory limits without auditing the application's footprint is an incomplete operational strategy. The number of PHP files parsed correlates directly with the shared memory OPcache requires.</p>

<p>We ran a custom PHP script calling <code>opcache_get_status()</code> to dump the internal memory usage statistics of the Zend Engine.</p>

<pre><code>&lt;?php
$status = opcache_get_status(false);
echo "Memory Used: " . round($status['memory_usage']['used_memory'] / 1024 / 1024, 2) . " MB\n";
echo "Memory Free: " . round($status['memory_usage']['free_memory'] / 1024 / 1024, 2) . " MB\n";
echo "Wasted Memory: " . round($status['memory_usage']['wasted_memory'] / 1024 / 1024, 2) . " MB\n";
echo "Cached Scripts: " . $status['opcache_statistics']['num_cached_scripts'] . "\n";
echo "Cache Hits: " . $status['opcache_statistics']['hits'] . "\n";
echo "Cache Misses: " . $status['opcache_statistics']['misses'] . "\n";
?&gt;</code></pre>

<p>The output indicated over 18,000 scripts whose compiled opcodes demanded nearly 240MB, far exceeding the default 128MB shared memory segment and explaining why the engine had resorted to the file cache fallback. The legacy theme previously active on the storefront bundled numerous heavy page builders, redundant utility classes, and thousands of template parts that were loaded into memory on every request, regardless of the active route.</p>

<p>To reduce the baseline opcode footprint, we migrated the storefront to the <a href="https://gplpal.com/product/khadija-fashion-woocommerce-theme/">Khadija - Fashion WooCommerce Theme</a>. This framework utilizes a strictly organized directory structure and leverages the Composer classmap for efficient autoloading, preventing the inclusion of unnecessary files. When evaluating a typical <a href="https://gplpal.com/product-category/wordpress-themes/">Free Download WordPress Theme</a>, the sheer file count is often overlooked. A theme that includes 5,000 PHP files requires proportional shared memory to function without compilation overhead. By swapping to a leaner codebase, the number of cached scripts required for a standard checkout flow dropped by 35%, significantly lowering the baseline memory consumption requirements for the OPcache segment.</p>
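<p>The file-count observation is easy to quantify when auditing a theme or plugin: counting the <code>.php</code> files under the docroot gives both a sizing input for <code>opcache.max_accelerated_files</code> and a rough proxy for the baseline footprint. A self-contained sketch using a throwaway directory in place of <code>/var/www/html</code>:</p>

```shell
# /tmp/demo_docroot stands in for the real docroot at /var/www/html
mkdir -p /tmp/demo_docroot/wp-content/plugins/woocommerce
touch /tmp/demo_docroot/index.php \
      /tmp/demo_docroot/wp-content/plugins/woocommerce/class-a.php \
      /tmp/demo_docroot/wp-content/plugins/woocommerce/class-b.php

# Total PHP files the engine could ever be asked to cache
find /tmp/demo_docroot -type f -name '*.php' | wc -l
# -> 3
```

<p>Run against the real docroot, this count should sit comfortably below <code>opcache.max_accelerated_files</code>, with headroom for plugin updates.</p>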

<h2>Tuning the OPcache Shared Memory Segment</h2>

<p>With the application footprint optimized, we adjusted the OPcache configuration in the <code>/etc/php/8.2/fpm/conf.d/10-opcache.ini</code> file to ensure the entire application could reside in shared memory, and explicitly disabled the disk fallback mechanism.</p>

<pre><code># /etc/php/8.2/fpm/conf.d/10-opcache.ini
zend_extension=opcache.so

opcache.enable=1
opcache.enable_cli=1

# Allocate 512MB of shared memory for compiled opcodes
opcache.memory_consumption=512

# Allocate 64MB specifically for interned strings. 
# PHP stores identical strings (like variable names and array keys) once in memory to save space.
# If this buffer fills, PHP allocates strings in the standard memory pool, increasing overall usage.
opcache.interned_strings_buffer=64

# Increase the maximum number of files that can be cached in the hash table
# Must be greater than the total number of PHP files in the application
opcache.max_accelerated_files=32000

# Disable timestamp validation. In a production environment, code does not change.
# If set to 1, PHP executes a stat() system call on every file per request to check for modifications,
# introducing unnecessary filesystem overhead.
opcache.validate_timestamps=0

# Explicitly disable the file cache fallback to prevent disk I/O stalls
opcache.file_cache=
opcache.file_cache_only=0

# Preserve comments in the AST, required by many libraries that rely on Doctrine annotations
opcache.save_comments=1</code></pre>

<p>Setting <code>opcache.file_cache=</code> (leaving the value empty) disables the feature. If the 512MB shared memory segment ever fills completely, OPcache will simply stop caching new scripts, forcing them to be compiled on the fly. While this increases CPU usage slightly, it completely avoids the severe I/O blocking and <code>ext4</code> journaling contention that caused the 60-second timeouts.</p>
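<p>Before reloading the pool, it is worth asserting that the fallback really is disabled in the file being shipped, since an empty value must survive config-management templating. A minimal sketch against an illustrative copy of the ini (on the node itself, the real file lives in <code>/etc/php/8.2/fpm/conf.d/</code> and the check would be followed by <code>php-fpm8.2 -t</code> and a service reload):</p>

```shell
# Illustrative copy of the tuned ini for demonstration purposes
cat > /tmp/10-opcache.ini <<'EOF'
opcache.memory_consumption=512
opcache.file_cache=
opcache.file_cache_only=0
EOF

# An exact match on "opcache.file_cache=" with nothing after the "=" confirms
# the fallback is off; the anchored $ excludes opcache.file_cache_only.
grep -c '^opcache\.file_cache=$' /tmp/10-opcache.ini
# -> 1
```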

<h2>Investigating the PHP 8.2 JIT Compiler Interference</h2>

<p>During the optimization phase, we also audited the PHP 8 Just-In-Time (JIT) compiler configuration. The JIT compiler translates intermediate opcodes into native machine code (x86_64 instructions) at runtime. The compiled machine code is stored in a separate buffer within the OPcache shared memory segment.</p>

<p>The default package configuration enabled the JIT compiler with a small buffer size.</p>

<pre><code>opcache.jit=tracing
opcache.jit_buffer_size=64M</code></pre>

<p>When the JIT compiler traces code paths and compiles them, it stores the executable machine code in the 64MB buffer. If this buffer fills, the JIT compiler simply stops compiling new traces and continues executing standard opcodes. However, a known edge case in specific PHP 8 minor versions involves shared memory fragmentation when the OPcache memory is nearly full, and the JIT compiler is actively attempting to allocate contiguous memory blocks for large compiled functions. This fragmentation can lead to internal Zend Engine locking issues.</p>

<p>For a standard e-commerce application heavily dependent on database I/O and network latency, the CPU-bound optimizations provided by the JIT compiler yield minimal real-world TTFB improvements compared to standard OPcache execution. We opted to disable the JIT compiler entirely to preserve the shared memory segment exclusively for standard opcode caching and to eliminate any possibility of memory fragmentation interfering with the worker processes.</p>

<pre><code># Disable the JIT compiler by setting the buffer size to zero
opcache.jit_buffer_size=0</code></pre>

<h2>Nginx FastCGI Buffering and Socket Queuing</h2>

<p>While resolving the PHP I/O stall fixes the root cause, the Nginx configuration must be resilient enough to handle brief upstream delays without dropping connections prematurely or exhausting the proxy node's memory.</p>

<p>When Nginx passes a request to PHP-FPM, it waits for the response. If the response from PHP is large (e.g., a heavily populated cart HTML fragment or a JSON payload), Nginx reads the data from the Unix socket and stores it in internal memory buffers. If the response exceeds the configured buffer size, Nginx writes the overflow data to temporary files in <code>/var/lib/nginx/fastcgi/</code>.</p>

<p>If the disk I/O on the server is already saturated (as was the case during the OPcache file fallback issue), Nginx attempting to write FastCGI overflow to disk exacerbates the problem. The Nginx worker process blocks on the disk write, reducing its capacity to accept new inbound client connections.</p>
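<p>Nginx announces this spill-to-disk behavior in its error log at the warn level, so its frequency can be measured before touching any buffer sizes. The sketch below greps a sample line of the kind Nginx emits; on a live node the target would be <code>/var/log/nginx/error.log</code>.</p>

```shell
# Sample warn-level line of the kind Nginx logs when a FastCGI response spills to disk
cat > /tmp/error_sample.log <<'EOF'
2024/01/01 00:00:01 [warn] 1234#1234: *99 an upstream response is buffered to a temporary file /var/lib/nginx/fastcgi/4/00/0000000004 while reading upstream
EOF

grep -c "buffered to a temporary file" /tmp/error_sample.log
# -> 1
```

<p>A high count here is the signal that responses routinely exceed the in-memory buffers and that the sizes below need revisiting.</p>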

<p>We adjusted the FastCGI buffer sizes in the Nginx server block to ensure that typical application responses fit entirely in RAM, avoiding disk interaction at the proxy layer.</p>

<pre><code># /etc/nginx/nginx.conf
http {
    # The size of the buffer used for reading the first part of the response, usually the HTTP headers.
    fastcgi_buffer_size 32k;

    # Sets the number and size of the buffers used for reading the response body.
    # 32 buffers of 32k allows for up to 1MB of response data to be held in RAM per connection.
    fastcgi_buffers 32 32k;

    # Limit the amount of data written to a temporary file if the response exceeds the buffers.
    # We set this to 0 to disable disk buffering entirely for FastCGI responses.
    # If the response exceeds 1MB, Nginx will synchronously send data to the client,
    # holding the PHP worker open longer, but avoiding local disk writes.
    fastcgi_max_temp_file_size 0;

    # Timeouts for the FastCGI connection
    fastcgi_connect_timeout 5s;
    fastcgi_send_timeout 30s;
    fastcgi_read_timeout 30s;
}</code></pre>

<p>Setting <code>fastcgi_max_temp_file_size 0</code> forces Nginx to operate strictly in memory for upstream responses. This shifts the bottleneck. If a client is on a slow 3G connection and Nginx cannot buffer the entire response to disk, it must slowly stream the data to the client while continuing to read from the PHP-FPM socket. This keeps the PHP worker process occupied for the duration of the transfer. However, in our infrastructure, this trade-off is acceptable to ensure the block layer I/O remains unhindered for database operations.</p>

<h2>Linux Kernel Socket Queueing (somaxconn)</h2>

<p>During the incident, the 60-second stalls caused the active PHP-FPM worker pool (<code>pm.max_children</code>) to fill completely. When all workers are busy, subsequent requests passed by Nginx to the Unix domain socket are placed in the socket's listen queue maintained by the Linux kernel.</p>

<p>The maximum size of this queue is governed by the <code>net.core.somaxconn</code> kernel parameter. If the queue fills up, the kernel rejects new connection attempts, and Nginx logs a <code>11: Resource temporarily unavailable</code> error while attempting the <code>connect()</code> syscall.</p>

<p>We verified the current limit using <code>sysctl</code>.</p>

<pre><code># sysctl net.core.somaxconn
net.core.somaxconn = 4096</code></pre>
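<p>The same kernel variable is exposed through procfs, which is often easier to consume from monitoring agents or container healthchecks than shelling out to <code>sysctl</code>:</p>

```shell
# Reads the identical value that "sysctl net.core.somaxconn" reports
cat /proc/sys/net/core/somaxconn
```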

<p>A limit of 4096 is standard on modern distributions, but to provide an additional buffer during sudden traffic spikes while workers are recovering, we expanded this limit.</p>

<pre><code># /etc/sysctl.d/99-socket-tuning.conf

# Increase the maximum socket listen queue size
net.core.somaxconn = 16384

# Increase the maximum number of packets queued on the input side
net.core.netdev_max_backlog = 16384

# Apply the changes
# sysctl -p /etc/sysctl.d/99-socket-tuning.conf</code></pre>

<p>It is important to note that adjusting the kernel parameter only raises the ceiling. The application creating the socket (PHP-FPM) must explicitly request the larger backlog queue when it calls the <code>listen()</code> function. We updated the PHP-FPM pool configuration to reflect the new limit.</p>

<pre><code># /etc/php/8.2/fpm/pool.d/www.conf
[www]
listen = /run/php/php8.2-fpm.sock
# Request a larger backlog queue from the kernel
listen.backlog = 16384</code></pre>

<p>After restarting the PHP-FPM service, we verified the socket queue size using the <code>ss</code> command.</p>

<pre><code># ss -lx | grep php8.2-fpm.sock
u_str LISTEN 0 16384 /run/php/php8.2-fpm.sock 142918 * 0</code></pre>

<p>The second numerical column confirms the backlog queue is now set to 16384, providing the operating system with adequate buffer space to hold inbound requests if the application workers temporarily stall.</p>

<pre><code># Final verification of OPcache status via CLI
php -r 'print_r(opcache_get_status(false)["memory_usage"]);'
Array
(
    [used_memory] =&gt; 214538240
    [free_memory] =&gt; 322340864
    [wasted_memory] =&gt; 0
    [current_wasted_percentage] =&gt; 0
)</code></pre>
</div>