WP-Cron stacking on cPanel: PHP-FPM exhaustion fix

The page came in at 09:02 local time on a Tuesday. Every WordPress site on cpanel-host was returning 500s for roughly forty seconds, then quietly recovered, then went down again at 09:07, then again at 09:12. The interval was the giveaway. A multi-site outage that ticks on a five-minute boundary is rarely a kernel bug, rarely a network fault, and almost always WordPress firing wp-cron through its own HTTP loopback. By the time the on-call engineer reached the box, ps -ef | grep wp-cron.php was returning 41 lines for a single cPanel user, each one a fat PHP-FPM child holding around 2.4GB of resident memory.

If you have arrived at this post by Googling wp-cron stacking php-fpm or ea-php81-fpm pool exhausted wp-cron, you are running WordPress on cPanel, your PHP-FPM pool keeps maxing out on the minute boundary, and the dashboards you trust are telling you nothing useful because the load average looks healthy in between the spikes.

This is the postmortem for that scenario. Three real symptoms drawn from our incident log, the diagnostic commands we ran on the box, the three-layer fix that closed the ticket, and an honest description of what ServerGuard's use case does and does not do for it today.

The symptom: PHP-FPM pool exhausted, 500s on every site

The first thing you see is not a wp-cron error. It is a wave of 500s from sites that have nothing to do with wp-cron. A photographer's brochure site on the same server returns blank pages. A static marketing site that is barely WordPress at all returns 500. The shared part of "shared hosting" is the PHP-FPM pool, and once the pool is exhausted nothing in that pool answers.

The log lines in /var/log/php-fpm/error_log follow a predictable shape. The first one is benign:

[09:02:14] WARNING: [pool lakeshor] server reached pm.max_children
setting (40), consider raising it

That warning fires once per minute that the pool stays at the ceiling. The second log line is the one that actually breaks the site:

[09:02:21] WARNING: [pool lakeshor] child 31427 said into stderr:
"PHP message: PHP Fatal error: Allowed memory size of
2147483648 bytes exhausted (tried to allocate 20480 bytes) in
/home/lakeshor/public_html/wp-includes/option.php on line 309"

A 2GB memory_limit is generous. WordPress should not need it. But when 40 wp-cron invocations are running the same scheduled action on the same site at the same time (each one loading every active plugin, each one materialising the same Action Scheduler queue) 2GB per child is what it takes. And once one child dies on that fatal error, the pool's accounting is wrong for a few seconds, the next request queues, and the cascade is underway.

The third log line, the one that confirms wp-cron is the trigger, sits in the FPM slow log if you have one enabled:

[09:02:18] script_filename = /home/lakeshor/public_html/wp-cron.php
[09:02:18] [0x00007f9b1c0a3b40] curl_exec() /home/lakeshor/public_html/wp-content/plugins/some-backup-plugin/includes/class-remote.php:114

Forty of those, all on the minute boundary, all from the same WordPress install, all running the same plugin's scheduled task. That is wp-cron stacking, and on a shared cPanel server it does not matter that only one site is misbehaving. Every site sharing the same FPM pool is going down with it.

The blast radius nobody talks about

WordPress documentation discusses wp-cron as if it were a per-site performance concern. On cPanel it is rarely per-site. The default EasyApache + PHP-FPM configuration in WHM places multiple users into the same pool unless the operator has explicitly opted into per-user pools. In our incident logs the worst single event was 27 WordPress sites returning 500s simultaneously because one site with a misconfigured backup plugin saturated a pool shared across all 27.

If you have ever wondered why the support ticket from a customer who runs a sleepy static brochure site arrived at the same moment as the ticket from your noisiest WooCommerce store, this is the answer. The brochure site did nothing wrong; it shares a pool with a site that did.

How wp-cron actually stacks

WordPress does not have a real cron. It has a polling system that pretends to be one. On every front-end request, after the page has rendered, WordPress checks the option table for a list of pending scheduled events. If any are due, WordPress makes a non-blocking HTTP request back to itself at /wp-cron.php. That second request re-enters the full WordPress bootstrap, loads every active plugin, inspects the same options table, takes a soft lock via a transient, and runs the due hooks.

There are four properties of this design that interact badly with cPanel + PHP-FPM:

The trigger is traffic. A site with one visitor per hour triggers wp-cron once per hour; a site with one hundred visitors per minute triggers wp-cron up to one hundred times per minute. The expensive case is not the small site, it is the moderately busy site whose owner thinks it is small.
The lock is a transient, not a file lock. WordPress sets a doing_cron transient to mark "a cron run is in progress". A transient is a database row with an expiry. Two PHP processes can read "no transient set" within the same millisecond, both write the transient, and both proceed. Under load the lock is racy.
The loopback is HTTP, not internal. When WordPress fires wp-cron it issues a real HTTP request to its own front-door URL. On a cPanel box that request goes through Apache, hits FPM, and spawns a new FPM child. So firing wp-cron from a front-end request costs two FPM workers, not one.
Action Scheduler turns one hook into thousands. Plugins that embed Action Scheduler (WooCommerce, every major backup plugin, every major SEO plugin) push their own jobs into the WordPress options or actionscheduler_actions table. When wp-cron fires it does not run "one" job, it runs the next slice of an unbounded queue.
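The race in the second property is a check-then-set problem, and it is easier to see outside WordPress. The sketch below is an analogy in shell, not WordPress code: racy_lock has the same shape as "read the transient, then write it", while atomic_lock uses mkdir, which either creates the directory or fails in a single step. Both function names and the lock paths are hypothetical.

```shell
#!/usr/bin/env bash
# Analogy only: contrasts a check-then-set lock (the shape of a
# transient-based lock) with an atomic one.

# Racy: the existence test and the write are two separate steps, so two
# processes can both pass the test before either one writes the marker.
racy_lock() {
  local marker="$1"
  if [ ! -e "$marker" ]; then
    : > "$marker"
    return 0
  fi
  return 1
}

# Atomic: mkdir creates the directory or fails in one system call,
# so exactly one concurrent caller can win.
atomic_lock() {
  mkdir "$1" 2>/dev/null
}
```

Run sequentially, both behave the same. The difference only shows up when two callers arrive at once, which is exactly the condition a busy site creates.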

The pathological case is a popular WordPress site whose Action Scheduler queue has thousands of pending rows, whose front page is getting fifty visitors per minute, and whose FPM pool has a pm.max_children of 40. Every visitor probabilistically triggers wp-cron. Each wp-cron run takes longer than a normal page load because it is chewing through scheduler rows. The HTTP loopback doubles the FPM cost. Within sixty seconds the pool is full of wp-cron children, every one of them holding the same option-table state in memory, and the next legitimate page request returns a queue-timeout 500.

The diagnostic flow we run

The first command we run on any "WordPress site is randomly 500ing on a cPanel server" ticket has not changed in three years:

ps -ef | grep -i wp-cron.php | grep -v grep | wc -l

A healthy server returns 0 or 1. Two is the borderline case. Anything above three means wp-cron is stacking. The worst we have ever seen on a single site was 86, during an UpdraftPlus backup window. The worst on a per-server basis was 153, spread across four sites.
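When the total is high, the next question is which cPanel user owns the processes. A minimal sketch, assuming ps output with the user in the first column and the command line after it; the function names are hypothetical, and treating a count of three as borderline is our reading of the thresholds above rather than a stated rule.

```shell
#!/usr/bin/env bash
# Reads `ps -eo user:32,args` output on stdin and prints
# "<count> <user>" for every user running wp-cron.php, busiest first.
# Lines whose command is awk or grep are skipped so the pipeline
# does not count itself.
wp_cron_by_user() {
  awk '$0 ~ /wp-cron\.php/ && $2 !~ /(^|\/)(awk|grep)$/ { n[$1]++ }
       END { for (u in n) print n[u], u }' | sort -rn
}

# Maps a process count onto the thresholds described in the text.
# Assumption: two and three are both treated as borderline.
wp_cron_verdict() {
  local count="$1"
  if [ "$count" -le 1 ]; then
    echo healthy
  elif [ "$count" -le 3 ]; then
    echo borderline
  else
    echo stacking
  fi
}
```

On the box this would be run as ps -eo user:32,args | wp_cron_by_user, with the top row naming the account to look at first.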

When the count is high we pivot to the FPM slow log to find out which site is responsible. The slow log is not on by default in cPanel; we turn it on with request_slowlog_timeout = 10s in the per-user pool file and reload FPM. Within one minute of slow log being enabled the log will tell us which wp-cron.php is the offender and which plugin's stack frame is on top:

tail -F /var/log/php-fpm/ea-php81-www.slow.log

To confirm that the wp-cron storm is correlated with a specific scheduler queue we run, as the cPanel user:

# Action Scheduler: used by WooCommerce, UpdraftPlus, MainWP, etc.
mysql --defaults-file=~/.my.cnf wpdb_022 -e "
  SELECT status, COUNT(*)
  FROM wpgt_actionscheduler_actions
  GROUP BY status;"

A healthy site has a few hundred complete rows, single-digit pending, zero failed. A site about to crash has tens of thousands of pending. The table prefix here, wpgt_, is the install pattern we use across most of our managed WordPress; on yours it will be wp_ or whatever the installer chose. The prefix does not matter, the row counts do.

We also pull the wp_options row that WordPress uses to track its own cron queue:

mysql --defaults-file=~/.my.cnf wpdb_022 -e "
  SELECT LENGTH(option_value) AS bytes
  FROM wpgt_options WHERE option_name = 'cron';"

Normal: a few kilobytes. A site whose cron has been stuck for weeks and whose Action Scheduler queue is unbounded: a megabyte or more. The size of that row matters because every wp-cron invocation fetches and deserialises it.

The last diagnostic is the one that actually tells us the FPM pool is the bottleneck rather than CPU, memory, or disk:

# FPM status page: must be enabled in the pool config
curl -s 'http://127.0.0.1:9001/status?full' | head -40

The values we care about are active processes, max active processes, and max children reached. If max children reached is a non-zero integer and active processes equals pm.max_children, the pool is the bottleneck and nothing else matters until that is fixed.
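That rule can be applied mechanically to the status output. A rough sketch, assuming the plain-text status format with "name: value" lines; fpm_pool_verdict and its three labels are hypothetical names, and the pool's pm.max_children is passed in because the status page does not report it.

```shell
#!/usr/bin/env bash
# Reads FPM status text on stdin; $1 is the pool's pm.max_children.
# Prints "bottleneck" when the pool has hit its ceiling and is full
# right now, "was-saturated" when it hit the ceiling earlier but has
# headroom now, and "ok" otherwise.
fpm_pool_verdict() {
  local max_children="$1"
  awk -F': *' -v max="$max_children" '
    $1 == "active processes"     { active = $2 + 0 }
    $1 == "max children reached" { reached = $2 + 0 }
    END {
      if (reached > 0 && active >= max) print "bottleneck"
      else if (reached > 0)             print "was-saturated"
      else                              print "ok"
    }'
}
```

Usage would look like curl -s 'http://127.0.0.1:9001/status' | fpm_pool_verdict 40, with 40 replaced by the pool's own limit.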

Why "just disable wp-cron" isn't enough on cPanel

The standard advice on the internet is two lines long. Add define('DISABLE_WP_CRON', true); to wp-config.php. Add a system cron that calls wp-cron.php every five minutes. Done.

On a single-tenant VPS that advice is correct. On a multi-tenant cPanel box it is incomplete in three ways.

First, the standard fix replaces the HTTP loopback with a system cron that still calls wp-cron.php over HTTP. Most published snippets look like this:

*/5 * * * * curl -s https://lakeshorebooks.com/wp-cron.php?doing_wp_cron > /dev/null

That curl call enters Apache, enters FPM, and spawns a new FPM child. It is one wp-cron firing instead of forty, which is the win, but it still consumes an FPM worker per site per five minutes and it does not stop a poorly written plugin from spawning further loopback requests from inside the cron run. We have seen plugins that, mid-cron-execution, issue their own internal HTTP requests back to wp-admin to "refresh" their settings page. Those still spawn FPM children.

Second, even with wp-cron disabled cleanly the FPM pool is still shared. Disabling wp-cron on the noisy site does nothing for the other 26 sites that share its pool. If the noisy site has a single Apache request that takes 90 seconds (say, a bad plugin admin screen) it still ties up a worker for 90 seconds and the pool still drains. Pool isolation is a separate problem.

Third, plugins that rely on Action Scheduler do not stop when wp-cron is disabled. Action Scheduler has its own fallback, a loopback over Ajax (admin-ajax.php?action=as_async_request_queue_runner), which exists specifically because WooCommerce learned the same lesson we are documenting here. That fallback uses the same FPM pool. Disabling wp-cron without also configuring Action Scheduler's runner mode just moves the problem one stack frame over.

A complete fix on cPanel has to address all three: stop wp-cron at the source, replace it with a runner that does not go over HTTP, and isolate pools so one runaway site cannot drown its neighbours.

Our fix, in three layers

Layer 1: disable wp-cron loopback on every site

The cheapest, fastest, most reversible step is to disable the wp-cron loopback on every WordPress install on the server. On a box with 27 WordPress sites we do not edit 27 wp-config.php files by hand. We iterate.

#!/usr/bin/env bash
# /root/scripts/disable-wp-cron-loopback.sh
# Walks every cPanel user, finds wp-config.php files, ensures
# DISABLE_WP_CRON is set to true. Idempotent.
set -euo pipefail
 
for home in /home/*/; do
  user="$(basename "$home")"
  # Skip non-cPanel system dirs.
  id "$user" >/dev/null 2>&1 || continue
 
  # Find every wp-config.php this user owns. -xdev to avoid
  # crossing into bind-mounted backup snapshots.
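  while IFS= read -r -d '' cfg; do
    if grep -q "DISABLE_WP_CRON" "$cfg"; then
      # Already managed. Make sure the value is true, not false.
      sed -i "s/define( *['\"]DISABLE_WP_CRON['\"], *[^)]*)/define('DISABLE_WP_CRON', true)/" "$cfg"
    else
      # Insert above the "stop editing" marker WordPress ships.
      sed -i "/^\/\* That's all, stop editing/i define('DISABLE_WP_CRON', true);" "$cfg"
    fi
    # Both sed calls are silent no-ops when their pattern is missing
    # (no marker line, unusual define spacing), so confirm the result.
    if grep -q "define('DISABLE_WP_CRON', true)" "$cfg"; then
      echo "patched: $cfg"
    else
      echo "NOT patched, check by hand: $cfg" >&2
    fi
  done < <(find "$home" -xdev -name wp-config.php -not -path '*/backups/*' -print0)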
done

The script is deliberately conservative. It only edits files named exactly wp-config.php, it skips paths that contain backups, and it never deletes a line. It either rewrites the existing DISABLE_WP_CRON constant or inserts a new one. The reason we use sed rather than WP-CLI here is that WP-CLI fails on installs where the database is currently down or the plugin set is broken; this script needs to work even when WordPress itself is unhappy.

If WP-CLI is available and the site is healthy, we prefer it for verification afterwards:

sudo -u lakeshor -- wp --path=/home/lakeshor/public_html config get DISABLE_WP_CRON
# true

The final validation step is the one that actually proves the loopback has stopped. We tail the FPM access log and visit the site's homepage once in a browser:

tail -F /var/log/php-fpm/ea-php81-www.access.log | grep -E "wp-cron\.php"

Before the fix, that command produces a line every few seconds. After the fix, it produces nothing. If it still produces lines, some plugin is hitting wp-cron.php directly, and Layer 2 needs to account for it.

Layer 2: replace with system cron, staggered

The wrong way to schedule wp-cron from system cron is to put 27 sites on the same minute. We have seen this in the wild:

# /var/spool/cron/root: DO NOT do this
*/5 * * * * curl -s https://cypressclinic.com/wp-cron.php > /dev/null
*/5 * * * * curl -s https://lakeshorebooks.com/wp-cron.php > /dev/null
*/5 * * * * curl -s https://valleycycle.co/wp-cron.php > /dev/null
# ... 24 more lines, all at */5 ...

Every five minutes, at the same second, 27 curl processes fire and 27 FPM children spawn. We have just rebuilt wp-cron stacking with extra steps.

The right way is to stagger each site to a unique minute within the five-minute window. We hash the cPanel username, modulo five, and use that as the minute offset:

#!/usr/bin/env bash
# /root/scripts/install-wp-cron-system.sh
# Installs a per-user system cron that runs wp-cron via WP-CLI,
# staggered by a hash of the cPanel username so 27 sites don't
# fire on the same minute.
set -euo pipefail
 
for home in /home/*/; do
  user="$(basename "$home")"
  id "$user" >/dev/null 2>&1 || continue
 
  # Only act on users that actually have a WordPress install.
  [ -f "$home/public_html/wp-config.php" ] || continue
 
  # Stagger: hash the username, mod 5, gives a minute 0-4.
  offset=$(printf '%s' "$user" | cksum | awk '{print $1 % 5}')
 
  # Write a per-user cron file. cPanel's system reads /var/spool/cron/$user.
  tmp="$(mktemp)"
  crontab -u "$user" -l 2>/dev/null | grep -v "wp cron event run" > "$tmp" || true
  echo "$offset-59/5 * * * * /usr/local/bin/wp --path=$home/public_html cron event run --due-now --quiet" >> "$tmp"
  crontab -u "$user" "$tmp"
  rm -f "$tmp"
  echo "scheduled: $user at minute $offset/5"
done

Two choices in there are worth calling out.

We run wp cron event run --due-now rather than curling wp-cron.php. WP-CLI runs the cron jobs inside a single PHP process invoked from the shell. There is no HTTP, no Apache, no FPM worker consumed; the cron run is one process that the system cron supervises directly. The FPM pool stays free for the actual front-end traffic that pays the bills.

The stagger formula is cksum % 5. It is not cryptographic and it does not have to be. We just need a stable, deterministic assignment so the same user always lands on the same minute slot. With 27 sites across five slots, five or six sites share each minute on average, which is fine; 27 sites all firing at minute zero is not.

If you have hundreds of sites, you stagger across the full five minutes more aggressively or extend the interval to ten minutes and stagger across ten. The principle is the same.
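Before installing the crons it is worth seeing how the hash spreads a real set of usernames. A small sketch using the same cksum-modulo idea, with the interval as a parameter so a ten-minute window can be tried as well; stagger_slot and stagger_histogram are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Minute slot for a username: CRC of the name modulo the interval.
# $1 is the username, $2 the interval in minutes (default 5).
stagger_slot() {
  local user="$1" interval="${2:-5}"
  printf '%s' "$user" | cksum | awk -v n="$interval" '{ print $1 % n }'
}

# Reads usernames on stdin, one per line, and prints how many land
# in each slot as "<count> <slot>" lines.
stagger_histogram() {
  local interval="${1:-5}" user
  while IFS= read -r user; do
    if [ -n "$user" ]; then
      stagger_slot "$user" "$interval"
    fi
  done | sort -n | uniq -c
}
```

Running ls /home | stagger_histogram 5 shows the spread for the current server. A hash gives a stable assignment, not a perfectly even one, so a lopsided histogram is a reason to widen the interval rather than a sign that something is broken.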

Layer 3: per-pool PHP-FPM limits

Layer 1 and Layer 2 stop the symptom. Layer 3 stops the next incident, because the next incident will not be wp-cron. It will be a long-running plugin admin screen, a crawler hitting an expensive sitemap, a runaway WP-CLI invocation. The defence is to make sure that whatever the next outlier is, it can only consume the resources of one site, not the resources of every site sharing its pool.

In cPanel WHM this is MultiPHP Manager → per-user pool. Enable per-user FPM pools for every WordPress account, then write a template that sizes each pool based on the user's memory budget. A small site gets a small pool. The noisy site gets a slightly larger pool. Neither can starve the others.

The pool config we ship is below. It lives in /var/cpanel/ApachePHPFPM/system_pool_overrides/users/<user>.yaml in cPanel's YAML format; WHM rebuilds the actual pool file from the YAML on every restart, so do not hand-edit the pool's .conf file directly.

# Per-user FPM pool override. Sized for a "moderate WordPress"
# site, roughly 100k pageviews/month on ea-php81 with a typical
# plugin set. Adjust pm.max_children for noisier or quieter sites.
pm: ondemand
pm_max_children: 12
pm_process_idle_timeout: 30s
pm_max_requests: 500
request_terminate_timeout: 60s
php_admin_value_memory_limit: 512M

Five values are doing the work here. pm = ondemand means a quiet site uses zero FPM workers when it has zero traffic. pm_max_children = 12 is the hard upper bound: even if wp-cron stacking returns (it won't, but if it did), the damage is bounded to 12 workers for this one site. pm_max_requests = 500 recycles every worker after 500 requests, which keeps per-worker memory from drifting and purges any worker that has accidentally accumulated state. The request_terminate_timeout = 60s kills any single request that runs longer than a minute, which is the actual safety net for a hung Action Scheduler job. And the per-pool memory_limit = 512M caps each worker at a quarter of the 2GB the stacked wp-cron children were each allowed. With a per-user pool, no one site can OOM the server on its own: its worst case is 12 workers at 512M each, about 6GB, and that budget is its own.

For a server with 32GB of memory and 25 WordPress sites the arithmetic is simple: 25 * 12 * 512M = 153,600M, or 150GB, in the worst absolute case, but pm = ondemand plus realistic per-site traffic patterns means actual concurrent usage rarely exceeds 4-6GB across all sites. The configuration is sized for tail events, not steady state.
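The same arithmetic is handy when resizing pools, so here it is as two small functions. This is a sketch that treats 512M as 512 MiB; under that reading the 25-site product is 153,600 MiB, which is 150 GiB. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Worst-case memory if every pool runs every worker at its memory_limit.
# Arguments: number of pools, pm_max_children per pool, memory_limit in MiB.
worst_case_mib() {
  echo $(( $1 * $2 * $3 ))
}

# Same figure in whole GiB (integer division, rounds down).
worst_case_gib() {
  echo $(( $(worst_case_mib "$1" "$2" "$3") / 1024 ))
}
```

worst_case_gib 1 12 512 gives the ceiling for a single pool, 6, which is the number to compare against the server's physical memory when deciding how large the noisiest site's pool may be.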

The plugins that make this worse

A handful of plugin families are responsible for most wp-cron incidents. Knowing which ones are on a server is half the audit.

Action Scheduler. Bundled in WooCommerce, Automatic.css, ActivityPub, and dozens of others. Action Scheduler runs its own queue inside the WordPress database (*_actionscheduler_actions). It uses wp-cron as its trigger but it also has an Ajax loopback fallback. Misconfigured, it will retry failed jobs indefinitely and your actionscheduler_actions table grows to millions of rows. The remediation is to set the runner mode to "WP CLI" via the WooCommerce filter action_scheduler_run_queue_hook and to add a weekly cleanup cron for completed rows older than 30 days.

UpdraftPlus and BackWPup. Both schedule long-running backup jobs through wp-cron. A backup that takes 14 minutes ties up an FPM worker for 14 minutes if it is running through the HTTP loopback. Both plugins support running via WP-CLI; the fix is to configure the plugin to use the WP-CLI runner and to schedule the backup as a separate system cron entry well outside business hours.

WP Cron Control plugins. A category of plugins that promise to "fix wp-cron". Some are useful (Automattic's WP-Cron-Control is the one we trust). Most are abandoned and add their own surface area. If a site already has a wp-cron control plugin installed and you are about to apply the three-layer fix above, remove the plugin first; otherwise its disabled-but-not-deactivated state will fight with your wp-config.php constant.
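The weekly cleanup mentioned under Action Scheduler can be sketched as a function that prints the DELETE statement for review before anything runs it. The status column comes from the diagnostic query earlier; the scheduled_date_gmt column name and the batch size are assumptions to check against the table on your own install (DESCRIBE the table first), and this sketch does not touch Action Scheduler's separate log table.

```shell
#!/usr/bin/env bash
# Prints a DELETE for completed Action Scheduler rows older than N days.
# $1 is the table prefix, $2 the age in days (default 30), $3 the
# maximum rows per run (default 5000) so one run cannot lock the
# table for long.
as_cleanup_sql() {
  local prefix="$1" days="${2:-30}" batch="${3:-5000}"
  printf "DELETE FROM %sactionscheduler_actions WHERE status = 'complete' AND scheduled_date_gmt < UTC_TIMESTAMP() - INTERVAL %d DAY LIMIT %d;\n" \
    "$prefix" "$days" "$batch"
}
```

Printing the statement rather than executing it keeps a human in the loop: as_cleanup_sql wpgt_ can be read, then piped into the same mysql invocation used in the diagnostic section once it looks right.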

How to detect this before it crashes you

The single metric that reliably predicts a wp-cron stacking incident is FPM pool utilisation. Not CPU, not memory, not load average. Active FPM children as a percentage of pm.max_children, sampled at one-minute resolution, watched per pool.

We pull it via the FPM status endpoint into Netdata, but any monitoring system that can curl an HTTP endpoint and graph a number will work. Two thresholds:

Warn at 70% sustained over five minutes. This is "the pool is hot, look at what is running before it gets worse". An engineer acknowledges the alert and pulls the FPM slow log.
Page at 90% sustained over two minutes. This is "the pool is about to exhaust and customer sites will start returning 500s imminently". The on-call engineer joins immediately.
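For anyone wiring these thresholds into a home-grown check rather than a monitoring product, the logic is small. A minimal sketch, assuming one utilisation sample per minute and reading "sustained" as "every sample in the window is at or above the threshold"; fpm_alert_level is a hypothetical name.

```shell
#!/usr/bin/env bash
# Reads pool utilisation percentages on stdin, one per line, one
# sample per minute, oldest first. Prints "page" if the last two
# samples are all >= 90, else "warn" if the last five are all >= 70,
# else "ok". Too few samples for a window means that level is not raised.
fpm_alert_level() {
  awk '
    { s[NR] = $1 + 0 }
    END {
      page = (NR >= 2)
      warn = (NR >= 5)
      for (i = NR; i > NR - 2 && i >= 1; i--) if (s[i] < 90) page = 0
      for (i = NR; i > NR - 5 && i >= 1; i--) if (s[i] < 70) warn = 0
      if (page) print "page"
      else if (warn) print "warn"
      else print "ok"
    }'
}
```

Requiring every sample in the window to clear the threshold is what keeps a single forty-second spike from paging anyone, while a pool that stays hot still escalates within two minutes.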

A secondary detector that costs nothing is the wp-cron process count itself, exposed as a node-exporter textfile collector:

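# /etc/cron.d/wp-cron-count: runs every minute
# Writes "wp_cron_count N", the name-plus-value line the textfile
# collector parses, and still writes it when the count is zero.
# The [w] bracket stops pgrep -f from counting the cron shell that
# carries this pattern in its own command line.
* * * * * root echo "wp_cron_count $(pgrep -fc '[w]p-cron\.php')" > /var/lib/node_exporter/wp_cron_count.prom.tmp && mv /var/lib/node_exporter/wp_cron_count.prom.tmp /var/lib/node_exporter/wp_cron_count.prom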

Graph that number alongside FPM utilisation. When wp-cron count rises and FPM utilisation rises in lockstep, the cause and the effect are both on the same screen. When FPM utilisation rises without wp-cron count rising, something else is the culprit and this is the wrong runbook.

For a related failure mode where the bot traffic that triggers the stacking is itself the deeper problem (a hostile crawler hammering faceted filter URLs on a WooCommerce store) see our writeup on the WooCommerce filter URL crawler trap. The diagnostic instincts overlap even though the cause is unrelated.

A pre-mortem checklist

Six questions to ask of every cPanel server with WordPress on it, before the page arrives instead of after.

Is DISABLE_WP_CRON set to true in every wp-config.php? Answer must be yes. Verify with the find loop in Layer 1.
Does every WordPress site have a system cron entry running wp cron event run --due-now, not curling wp-cron.php? Answer must be yes. Verify with for u in $(ls /home); do crontab -u "$u" -l 2>/dev/null | grep 'cron event run' | sed "s/^/$u: /"; done, which prefixes each matching entry with the user that owns it.
Are system crons staggered across the polling window? Answer must be yes. Verify with for u in $(ls /home); do crontab -u "$u" -l 2>/dev/null; done | grep wp | awk '{print $1}' | sort | uniq -c | sort -rn | head. The output should be a roughly flat distribution, not a single row at */5.
Is every WordPress account in a per-user FPM pool, not the shared default? Answer must be yes. Verify in WHM MultiPHP Manager → System PHP-FPM Configuration; every user's "Pool Options" should be customised.
Does each per-user pool have request_terminate_timeout below 90 seconds and pm.max_requests set? Answer must be yes. A stuck worker without request_terminate_timeout is a permanent leak.
Is FPM pool utilisation graphed and alerted at 70% and 90%? Answer must be yes. Without the alert, the first time you find out about a stacking incident is when a customer escalates.

The honest version of this checklist is that most cPanel servers in the wild answer "no" to at least four of these. The fix on each is cheap. The fix together prevents the entire incident class.

How ServerGuard handles this

ServerGuard's use case for wp-cron stacking covers the immediate remediation today; the structural remediation is on the upcoming roadmap. We are deliberate about the line between the two because editing every wp-config.php on a server is exactly the kind of change that demands a human approval.

Today, ServerGuard's safe-action use case handles the in-flight incident:

Detect. When FPM pool utilisation crosses the 90% threshold on a cPanel host, SGuard correlates the spike with the output of pgrep -fc wp-cron.php. If wp-cron count is above three on the same host within the same one-minute window, the incident is classified as wp_cron_stacking and the runbook activates.
Act, automatically. SGuard kills any wp-cron.php process that exceeds 60 seconds of wall-clock time or 1GB of resident memory. This is a Safe action under the spec. Killing a runaway PHP worker is reversible, scoped to a single process, and produces a bounded blast radius. The kill is logged to the audit trail with the PID, the cPanel user, the resident memory at kill time, and the wall-clock age.
Diagnose. SGuard pulls the FPM slow log, the actionscheduler_actions row counts for each WordPress install on the host, and the size of the cron row in wp_options. Those land in the incident ticket so the on-call engineer reads one timeline instead of running ten commands.
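The selection rule in the Act step is simple enough to reproduce by hand during an incident. The sketch below is not ServerGuard's implementation; it is a hypothetical filter that applies the same two limits to ps output and only prints the matching PIDs, leaving the decision to kill with the operator. It assumes a ps that supports the etimes (elapsed seconds) and rss (resident KiB) columns.

```shell
#!/usr/bin/env bash
# Reads `ps -eo pid=,etimes=,rss=,args=` lines on stdin and prints the
# PID of every wp-cron.php process older than $1 seconds (default 60)
# or larger than $2 KiB resident (default 1048576, i.e. 1GB).
# Lines whose command is awk or grep are skipped so the pipeline
# never selects itself.
wp_cron_kill_candidates() {
  local max_secs="${1:-60}" max_rss_kb="${2:-1048576}"
  awk -v t="$max_secs" -v m="$max_rss_kb" '
    index($0, "wp-cron.php") && $4 !~ /(^|\/)(awk|grep)$/ &&
    ($2 + 0 > t + 0 || $3 + 0 > m + 0) { print $1 }'
}
```

Typical use is ps -eo pid=,etimes=,rss=,args= | wp_cron_kill_candidates, reviewing the list before passing anything to kill.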

The structural remediation (Layer 1 and Layer 2 from this post) sits behind a Moderate-tier approval gate and is on the upcoming roadmap. Editing wp-config.php across every account on a server and installing per-user system crons is the kind of change that needs a human signing off on the diff before it ships. When this ships, SGuard will present the proposed edit set, the per-user stagger plan, and the rollback as a single approval prompt in Telegram or the web dashboard.

Layer 3, per-pool FPM limits, is server configuration policy and sits outside SGuard's automation scope by design. SGuard will flag servers that fail the pre-mortem checklist above and link this post; it will not silently rewrite WHM MultiPHP settings. Policy belongs to humans.

For the related "what happens if we ignore this entirely" failure mode where one stacked wp-cron storm cascades into full server CPU exhaustion across every pool, see CSF, lfd, and Imunify360: why your firewall is killing itself. The diagnostic pattern of "minute-boundary multi-site outage with healthy load average in between" recurs.

If you operate a fleet of cPanel servers and the pre-mortem checklist made you wince, join the ServerGuard waitlist. We are onboarding agencies in cohorts and the wp-cron-stacking use case is one of the seven runbooks we ship today.
