All ServerGuard use cases

Each entry below is one remediation use case. For each, we list the trigger that opens an incident and the response we run.

Database recovery

We watch ChkServd MySQL alerts, parse the actual error log, and either restart MariaDB cleanly or queue a dangerous recovery action for approval.

Field notes: MySQL OOM on cPanel: diagnosing innodb_buffer_pool_size →

MySQL / MariaDB down

Trigger: ChkServd mysql alert or a failed mariadb.service unit. We tail the actual error log within a minute of the alert.
Response: We diagnose four known root causes: a buffer pool sized above available RAM, slow-log permission denied, a missing auth_socket.so, and InnoDB initialisation failure. We fix and verify the first three automatically. An InnoDB initialisation failure can indicate corruption, so it routes to approval before any restore.
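
The triage above can be sketched as a priority-ordered pattern match over the error log. This is a minimal illustration, not our production matcher: the cause labels, the classify_error_log helper, and the regular expressions are assumptions, and the exact log wording varies between MariaDB and MySQL versions.

```python
import re

# (label, pattern, auto_fixable). Patterns are illustrative assumptions;
# check them against the error log of the version you actually run.
ROOT_CAUSES = [
    ("buffer_pool_too_large",
     re.compile(r"Cannot allocate memory for the buffer pool", re.I), True),
    ("slow_log_permission_denied",
     re.compile(r"slow.*log.*Permission denied", re.I), True),
    ("missing_auth_socket",
     re.compile(r"auth_socket\.so", re.I), True),
    ("innodb_init_failure",
     re.compile(r"Plugin 'InnoDB' init function returned error", re.I), False),
]


def classify_error_log(lines):
    """Return (label, auto_fixable) for the highest-priority cause found.

    Causes are checked in list order, so a buffer pool allocation failure
    wins over the generic InnoDB init error that usually follows it.
    """
    lines = list(lines)
    for label, pattern, auto_fixable in ROOT_CAUSES:
        if any(pattern.search(line) for line in lines):
            return label, auto_fixable
    return None, False
```

Checking causes in priority order, rather than line by line, keeps the generic InnoDB error from masking a more specific cause that appears in the same log tail.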

WordPress under load

PHP-FPM stacking from WP-Cron loops, xmlrpc.php abuse, and bot crawler floods that pin a single account's pool at 100 percent.

Field notes: Three real WordPress compromises and how we found them →

High CPU, PHP-FPM stacking

Trigger: Sustained CPU pressure with a backlog of php-fpm workers piling up in the same pool, or a 360 Monitoring alert.
Response: We identify the cPanel account whose pool is stacking, clear the cron entry in the wp_options table, set DISABLE_WP_CRON, and recycle that pool only. A whole-server restart, which we used to do by hand, is the wrong tool for one noisy site.
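
One way to spot the stacking pool is to count php-fpm workers per pool from the process list. The sketch below assumes worker process titles of the form "php-fpm: pool NAME" and takes the titles as input; the busiest_pool helper and its threshold of 20 workers are hypothetical, and mapping a pool name back to a cPanel account is left out.

```python
import re
from collections import Counter

# php-fpm workers set their process title to "php-fpm: pool <name>".
POOL_TITLE = re.compile(r"php-fpm: pool (\S+)")


def busiest_pool(process_titles, threshold=20):
    """Return (pool, worker_count) for the largest pool, or None if it is
    below the threshold. The threshold is an illustrative assumption."""
    counts = Counter()
    for title in process_titles:
        match = POOL_TITLE.search(title)
        if match:
            counts[match.group(1)] += 1
    if not counts:
        return None
    pool, workers = counts.most_common(1)[0]
    return (pool, workers) if workers >= threshold else None
```

The titles could come from something like ps -eo args; the function itself only parses text, so it can be tested without a running PHP-FPM.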

WordPress security hardening

Trigger: New server onboarding, or any incident that closed with a malware finding on a WordPress install.
Response: We disable the page-load-triggered WP-Cron and replace it with a server cron, deny xmlrpc.php at the .htaccess layer, and raise the PHP memory limit on ea-php81 sites that need it. The hardening profile applies per cPanel account, not server-wide.

Bot crawler mitigation

Trigger: A single IP making an abnormal volume of requests against filter URLs (?filter_, min_price=, max_price=).
Response: We match the user-agent against a known list (meta-externalagent, MJ12bot, AhrefsBot), append a Disallow rule to robots.txt, and block the active range in Imunify360. Sites without page caching get flagged for the on-call engineer.
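
The detection step can be sketched as a per-IP count of requests against filter URLs, followed by a user-agent match. This is an illustration under assumptions: requests arrive as already-parsed (ip, url, user_agent) tuples, and the flag_crawlers helper and its threshold of 100 requests are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

KNOWN_BOTS = ("meta-externalagent", "mj12bot", "ahrefsbot")
FILTER_PREFIXES = ("filter_", "min_price", "max_price")


def is_filter_url(url):
    """True when any query parameter name looks like a shop filter."""
    query = urlsplit(url).query
    return any(key.startswith(FILTER_PREFIXES)
               for key, _ in parse_qsl(query, keep_blank_values=True))


def flag_crawlers(requests, threshold=100):
    """Map each IP over the threshold to the known bot it matches, or None.

    requests: iterable of (ip, url, user_agent) tuples.
    """
    hits = Counter()
    last_agent = {}
    for ip, url, user_agent in requests:
        if is_filter_url(url):
            hits[ip] += 1
            last_agent[ip] = user_agent.lower()
    return {
        ip: next((bot for bot in KNOWN_BOTS if bot in last_agent[ip]), None)
        for ip, count in hits.items() if count >= threshold
    }
```

An IP mapped to None is hammering filter URLs without matching the known list, which is the kind of case that would go to a human rather than straight to a block.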

Disk and backup

Inode and partition pressure, JetBackup retention overruns, and the log files that quietly fill /var before anyone notices.

Field notes: cPanel disk full at 96 percent: the backup retention trap →

Disk space critical

Trigger: Backup-failed email or partition usage running hot on root or /home.
Response: We rotate oversized logs (safe, automatic), prune JetBackup sets older than the retention floor (safe), and report the ten largest paths. Anything that touches /home/{user} content waits for approval.
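
The "ten largest paths" report can be approximated with a directory walk and a bounded heap. This sketch ranks individual files only, which is an assumption; a real report might also aggregate directory totals. The largest_files helper is hypothetical.

```python
import heapq
import os


def largest_files(root, limit=10):
    """Return up to `limit` (size_in_bytes, path) tuples, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: do not follow symlinks out of the tree
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return heapq.nlargest(limit, sizes)
```

The function only reads metadata, so it fits the "report, do not touch" rule for anything under /home/{user}.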

Backup management

Trigger: Disk pressure on /home or /backup, or a JetBackup failure email.
Response: We delete backup sets older than seven days when more than four sets exist (safe, automatic after a 30-minute notification window). Pruning down to one or two remaining sets, or removing files larger than 1 GB from /home, always waits for approval.
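
The retention rule can be written down as a small planning function. This is one reading of the rule, stated as an assumption: sets older than seven days are candidates only when more than four sets exist, and the plan needs approval if deleting them would leave two sets or fewer. The plan_prune helper and its parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def plan_prune(set_dates, now, max_age_days=7, min_sets=4, approval_floor=2):
    """Return (expired_sets, needs_approval) for a list of backup timestamps.

    expired_sets is sorted oldest first. Nothing is planned when there are
    min_sets or fewer sets in total.
    """
    cutoff = now - timedelta(days=max_age_days)
    expired = sorted(d for d in set_dates if d < cutoff)
    if len(set_dates) <= min_sets or not expired:
        return [], False
    remaining = len(set_dates) - len(expired)
    return expired, remaining <= approval_floor
```

For example, five sets aged 1, 2, 3, 9, and 10 days yield two expired sets and no approval, because three sets remain. Five sets aged 1, 8, 9, 10, and 11 days would leave a single set, so that plan is flagged for approval.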

Security and brute force

SSH and wp-login brute force, csf/lfd watchdog failures, and Imunify360 malware detections that need quarantine plus an audit trail.

Field notes: SSH brute force on cPanel: the 8,127-attempt night and the fix →

SSH brute force attack

Trigger: A flood of failed SSH attempts from the same /24 inside the polling window.
Response: We check the per-server admin IP allowlist first. That always wins. Foreign subnets get an Imunify360 ip-list block. Subnets in the org's registered country route to approval, to avoid locking out a travelling admin. Every block is reversible via the rollback_command field.
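
The decision order above (allowlist, then country, then block) can be sketched with the standard ipaddress module. The country codes are taken as inputs because the GeoIP lookup is out of scope here; the decide helper and its return labels are hypothetical, and the sketch handles IPv4 only.

```python
import ipaddress


def subnet_24(ip):
    """Return the /24 network that contains an IPv4 address."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def decide(attacker_ip, admin_allowlist, source_country, org_country):
    """Return 'skip', 'approval', or 'block' for the attacker's /24.

    admin_allowlist: IPs or CIDR ranges as strings. The allowlist always
    wins: any overlap with the /24 means we never block it.
    """
    subnet = subnet_24(attacker_ip)
    for entry in admin_allowlist:
        if subnet.overlaps(ipaddress.ip_network(entry, strict=False)):
            return "skip"
    if source_country == org_country:
        return "approval"  # could be a travelling admin
    return "block"
```

Checking for overlap with the whole /24, rather than only the attacking address, is a conservative choice: a block on the range would also lock out an admin IP that sits inside it.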

CSF / lfd failure

Trigger: ChkServd lfd alert or lfd killed with signal 9 (the Imunify360 conflict signature).
Response: We restore csf.conf from the reset profile when it is missing, write version.txt when the package upgrade dropped it, and disable lfd monitoring through whmapi1 when Imunify360 is the active firewall. The fixes for all known root causes are safe-class actions.

Malware detection

Trigger: Imunify360 malware alert, or a site redirecting to an unknown domain while the server itself is clean.
Response: We run an on-demand scan against the affected docroot, list the malicious files, and check .htaccess for injected RewriteRule directives. Quarantine and file removal are logged to the audit table; database row deletions on wp_posts wait for approval.

Mail and SSL

Exim queue cleanup, DNSBL delisting workflow, and Let's Encrypt renewals that AutoSSL skipped.

Field notes: AutoSSL fails on Microsoft 365 autodiscover subdomains: the fix →

Exim mail queue buildup

Trigger: An unusual surge in the Exim queue, or a ChkServd exim alert.
Response: We identify the top sender, flush frozen, undeliverable bounces, and surface the spam and DNSBL hits in the diagnosis. Suspending a compromised account is dangerous and waits for approval.

Server IP blacklisted

Trigger: Outbound IP appears in Spamhaus ZEN, Barracuda, or SpamCop on a proactive poll.
Response: We trace the offending account through exim_mainlog, queue an account suspension for approval, flush the outbound queue, and submit delisting requests to each DNSBL with the request URL written to the audit log.
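
A DNSBL poll works by reversing the IPv4 octets, appending the list's zone, and resolving the result: an answer in 127.0.0.0/8 means the address is listed. The sketch below shows that convention. The zone names are the public ones for the three lists named above; the helper names are hypothetical, and the sketch covers IPv4 only.

```python
import ipaddress
import socket

DNSBL_ZONES = ("zen.spamhaus.org", "b.barracudacentral.org", "bl.spamcop.net")


def dnsbl_query_name(ip, zone):
    """Build the lookup name, e.g. 192.0.2.10 -> 10.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Resolve the query name; NXDOMAIN means the IP is not listed."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    # Spamhaus uses 127.255.255.x for query errors (for example, lookups
    # through a public resolver), so those answers are not listings.
    return answer.startswith("127.") and not answer.startswith("127.255.255.")
```

A poll would call is_listed for each zone in DNSBL_ZONES. Some lists rate-limit or refuse queries from shared resolvers, so a negative result through one of those is not proof of a clean IP.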

SSL / Let's Encrypt certificate failure

Trigger: A certificate expiring inside 14 days, or an AutoSSL failure email.
Response: We retry AutoSSL per user, fix .htaccess rules that block .well-known/acme-challenge, whitelist the ACME user-agent in ModSecurity when the rule trips, and alert when the domain is on a paid wildcard we cannot renew automatically.
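
The 14-day trigger reduces to a date comparison on the certificate's notAfter field. The sketch assumes the field is in the text format that Python's ssl module reports (for example, from SSLSocket.getpeercert()); the needs_renewal helper is hypothetical.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now):
    """Whole days between `now` (timezone-aware) and the notAfter string,
    e.g. 'Jun 10 00:00:00 2024 GMT'."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def needs_renewal(not_after, now, window_days=14):
    """True when the certificate expires inside the window."""
    return days_until_expiry(not_after, now) < window_days
```

Passing now explicitly, instead of reading the clock inside the function, keeps the check testable and avoids surprises on a host whose clock has drifted.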

Platform health

Apache, DNS, cPanel/WHM, account suspensions, FTP, cron, OOM, and NTP. The surface beyond MySQL that still has to stay green.

Field notes: 86 CPU spikes in 24 hours: a multi-cause cascade postmortem →

Apache / LiteSpeed failure or ModSecurity false positives

Trigger: ChkServd httpd alert, or a sudden cluster of 403/406 responses across one virtual host.
Response: We run apachectl configtest, parse the ModSecurity rule ID from the error log, and disable that single rule for the affected domain. We restart the web server on worker-pool exhaustion and port conflicts; .htaccess parse errors alert the on-call engineer with the offending line.
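
Pulling the rule ID out of the error log is a small parsing job. The sketch assumes ModSecurity 2.x style Apache log lines, which carry [id "..."] and [hostname "..."] fields; the rules_by_host helper is hypothetical.

```python
import re
from collections import Counter

RULE_ID = re.compile(r'\[id "(\d+)"\]')
HOSTNAME = re.compile(r'\[hostname "([^"]+)"\]')


def rules_by_host(error_log_lines):
    """Count ModSecurity hits per (hostname, rule_id) pair."""
    counts = Counter()
    for line in error_log_lines:
        if "ModSecurity" not in line:
            continue
        rule = RULE_ID.search(line)
        host = HOSTNAME.search(line)
        if rule and host:
            counts[(host.group(1), rule.group(1))] += 1
    return counts
```

The most common pair for a virtual host is the candidate for a per-domain exclusion, for instance with a SecRuleRemoveById directive scoped to that host's configuration.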

DNS zone failure

Trigger: A site is unreachable while the host is healthy, and DNS resolution fails on dig @localhost.
Response: We restart BIND on a hard crash and run named-checkzone against the failing zone. Zone-file corruption and registrar-side glue mismatches alert the on-call engineer; we do not modify zone files on autopilot.

cPanel / WHM specific failures

Trigger: ChkServd alerts for cpaneld, a cPanel license check failure, or a cPHulk lockout against the admin IP.
Response: We restart cpaneld on a clean crash, retry the license check against lp.cpanel.net, and whitelist the admin IP in cphulkd directly when the lockout blocks WHM access. EasyApache rebuild failures alert with the last known config diff.

Account resource limit suspension

Trigger: A cPanel account flips to SUSPENDED through cPHulk or the resource-limit watchdog.
Response: We pull the suspension reason from cphulkd_log, the bandwidth history, and the recent Exim activity. Unsuspension is dangerous, so it always waits for approval, and we never unsuspend an account whose Exim queue is still full of spam.

FTP service failure

Trigger: ChkServd pure-ftpd alert, or a connection-timeout pattern against port 21.
Response: We restart pure-ftpd on a clean crash and open the passive port range in CSF when the data-channel timeout signature appears. MaxClientsNumber tuning and port-21 conflicts queue for approval.

System cron daemon failure

Trigger: crond not running on a proactive poll, or scheduled jobs not firing.
Response: We restart crond on a clean crash and trigger missed WordPress crons through the server-side replacement we set up for WordPress hardening. Corrupt crontab entries alert with the offending line. We do not edit user crontabs automatically.

OOM killer / swap exhaustion

Trigger: OOM events in dmesg, or swap usage running hot.
Response: We read dmesg for the killed process name, tune innodb_buffer_pool_size when MySQL was the victim, reduce pm.max_children when PHP-FPM was the victim, and create a 2 GB swapfile on hosts with no swap configured.
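
Reading the victim out of dmesg comes down to one regular expression over the kernel's "Out of memory" lines. The sketch takes dmesg lines as input; the oom_victims and suggested_action helpers are hypothetical, and the fallback of alerting for any other process is an assumption.

```python
import re

# Matches both "Kill process" (older kernels) and "Killed process".
OOM_KILL = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def oom_victims(dmesg_lines):
    """Return [(pid, process_name)] in order, one entry per killed PID."""
    seen = set()
    victims = []
    for line in dmesg_lines:
        match = OOM_KILL.search(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            victims.append((int(match.group(1)), match.group(2)))
    return victims


def suggested_action(process_name):
    """Map a victim to the tuning step described above."""
    if process_name in ("mysqld", "mariadbd"):
        return "tune innodb_buffer_pool_size"
    if process_name.startswith("php-fpm"):
        return "reduce pm.max_children"
    return "alert"  # assumption: anything else goes to a human
```

De-duplicating by PID matters on older kernels, which can log both a "Kill process" and a "Killed process" line for the same victim.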

NTP / time drift

Trigger: Significant time offset, or SSL handshake errors that trace back to clock skew.
Response: We restart chronyd, run chronyc makestep to force an immediate sync, and verify the result with timedatectl status. Hosts without chrony installed alert the on-call engineer; we do not install packages without approval.