All ServerGuard use cases

Each entry below is one remediation use case. We list the trigger that opens an incident and the response we run.

Database recovery

We watch ChkServd MySQL alerts, parse the actual error log, and either restart MariaDB cleanly or queue a dangerous recovery action for approval.

MySQL / MariaDB down

Trigger
ChkServd mysql alert or a failed mariadb.service unit. We tail the actual error log within a minute of the alert.
Response
We diagnose four known root causes: buffer pool sized above available RAM, slow-log permission denied, missing auth_socket.so, and InnoDB initialisation failure. The first three we fix and verify automatically. InnoDB corruption routes to approval before any restore.
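
The classification step can be sketched roughly as follows. This is a minimal illustration, not the production parser: the cause labels, the function name classify_error_log, and the log signatures are assumptions, and the exact wording of MariaDB error messages varies between versions.

```python
import re

# Illustrative signatures only (assumed); real wording varies by MariaDB version.
# Order matters: a buffer pool failure is usually followed by an InnoDB init
# error, so the more specific causes are checked first.
ROOT_CAUSES = [
    ("buffer_pool_over_ram", re.compile(r"Cannot allocate memory for the buffer pool", re.I)),
    ("slow_log_permission", re.compile(r"slow.*Permission denied", re.I)),
    ("missing_auth_socket", re.compile(r"auth_socket\.so", re.I)),
    ("innodb_init_failure", re.compile(r"Plugin 'InnoDB' init function returned error", re.I)),
]

# The first three causes are fixed automatically; everything else needs approval.
AUTO_FIXABLE = {"buffer_pool_over_ram", "slow_log_permission", "missing_auth_socket"}


def classify_error_log(lines):
    """Return (cause, needs_approval); (None, True) when nothing matches."""
    lines = list(lines)
    for cause, pattern in ROOT_CAUSES:
        if any(pattern.search(line) for line in lines):
            return cause, cause not in AUTO_FIXABLE
    return None, True
```

An unrecognised log falls through to approval, so an unknown failure is never handled as a safe action.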

WordPress under load

PHP-FPM stacking from WP-Cron loops, xmlrpc.php abuse, and bot crawler floods that pin a single account's pool at 100 percent.

High CPU, PHP-FPM stacking

Trigger
Sustained CPU pressure with a backlog of php-fpm workers piling up in the same pool, or a 360 Monitoring alert.
Response
We identify the cPanel account whose pool is stacking, clear the cron option in wp_options, set DISABLE_WP_CRON, and recycle that pool only. The whole-server restart we used to do by hand is the wrong tool for one noisy site.
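
One way to find the stacking pool is to count worker processes per pool from the process list. The sketch below assumes the input is a list of process titles (for example from ps) and that workers carry the usual "php-fpm: pool NAME" title; the function name busiest_pool is hypothetical.

```python
import re
from collections import Counter

# PHP-FPM workers normally set their process title to "php-fpm: pool <name>".
POOL_TITLE = re.compile(r"php-fpm: pool (\S+)")


def busiest_pool(ps_lines):
    """Return (pool_name, worker_count) for the pool with the most workers."""
    counts = Counter()
    for line in ps_lines:
        match = POOL_TITLE.search(line)
        if match:
            counts[match.group(1)] += 1
    if not counts:
        return None
    return counts.most_common(1)[0]
```

Mapping the pool name back to a cPanel account is left out, because the naming scheme depends on how the pools were provisioned.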

WordPress security hardening

Trigger
New server onboarding, or any incident that closed with a malware finding on a WordPress install.
Response
We disable visit-triggered WP-Cron and replace it with a server cron, deny xmlrpc.php at the .htaccess layer, and bump the PHP memory limit on ea-php81 sites that need it. The hardening profile applies per cPanel account, not server-wide.

Bot crawler mitigation

Trigger
A single IP making an abnormal volume of requests against filter URLs (?filter_, min_price=, max_price=).
Response
We match the user-agent against a known list (meta-externalagent, MJ12bot, AhrefsBot), append a Disallow rule to robots.txt, and block the active range in Imunify360. Sites without page caching get flagged for the on-call engineer.
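
The detection side can be illustrated with a small sketch. It assumes access-log entries have already been parsed into (ip, path, user_agent) tuples; the threshold parameter and the function names are hypothetical, and the markers and bot names are the ones listed above.

```python
from collections import Counter

FILTER_MARKERS = ("?filter_", "min_price=", "max_price=")
KNOWN_BOTS = ("meta-externalagent", "MJ12bot", "AhrefsBot")


def is_filter_url(path):
    """True when the request path contains one of the filter markers."""
    return any(marker in path for marker in FILTER_MARKERS)


def match_bot(user_agent):
    """Return the known bot name found in the user-agent, or None."""
    ua = user_agent.lower()
    for bot in KNOWN_BOTS:
        if bot.lower() in ua:
            return bot
    return None


def filter_hits_by_ip(requests, threshold):
    """Count filter-URL requests per IP and keep IPs at or above threshold."""
    hits = Counter(ip for ip, path, _ua in requests if is_filter_url(path))
    return {ip: count for ip, count in hits.items() if count >= threshold}
```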

Disk and backup

Inode and partition pressure, JetBackup retention overruns, and the log files that quietly fill /var before anyone notices.

Disk space critical

Trigger
Backup-failed email or partition usage running hot on root or /home.
Response
We rotate oversized logs (safe, automatic), prune JetBackup sets older than the retention floor (safe), and report the ten largest paths. Anything that touches /home/{user} content waits for approval.

Backup management

Trigger
Disk pressure on /home or /backup, or a JetBackup failure email.
Response
We delete backup sets older than seven days when more than four sets exist (safe, automatic after a 30-minute notification window). Pruning down to one or two remaining sets, or removing files larger than 1 GB from /home, always waits for approval.
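
The retention rule can be read as a small decision function. The sketch below is one interpretation, not the actual implementation: it treats a prune as safe only when more than two sets would remain, and the function name plan_prune and its parameters are assumptions.

```python
from datetime import datetime, timedelta


def plan_prune(set_dates, now, max_age_days=7, min_sets=4):
    """Return (expired_sets, action); action is 'none', 'safe' or 'approval'."""
    if len(set_dates) <= min_sets:
        # Four sets or fewer: nothing is pruned automatically.
        return [], "none"
    cutoff = now - timedelta(days=max_age_days)
    expired = sorted(d for d in set_dates if d < cutoff)
    if not expired:
        return [], "none"
    remaining = len(set_dates) - len(expired)
    # Assumed reading: leaving two sets or fewer needs a human decision.
    action = "approval" if remaining <= 2 else "safe"
    return expired, action
```

For example, five sets aged 1, 2, 3, 9 and 10 days give two expired sets and a safe action, because three sets remain.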

Security and brute force

SSH and wp-login brute force, csf/lfd watchdog failures, and Imunify360 malware detections that need quarantine plus an audit trail.

SSH brute force attack

Trigger
A flood of failed SSH attempts from the same /24 inside the polling window.
Response
We check the per-server admin IP allowlist first. That always wins. Foreign subnets get an Imunify360 ip-list block. Subnets in the org's registered country route to approval, to avoid locking out a travelling admin. Every block is reversible via the rollback_command field.
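
The decision order described above can be sketched as follows. The country codes are assumed to come from a separate GeoIP lookup that is not shown, and the function name decide_ssh_block and its return values are hypothetical.

```python
import ipaddress


def decide_ssh_block(subnet, admin_allowlist, subnet_country, org_country):
    """Return 'skip', 'approval' or 'block' for an offending subnet."""
    network = ipaddress.ip_network(subnet)
    # The admin allowlist always wins: never block a subnet containing an admin IP.
    for entry in admin_allowlist:
        if ipaddress.ip_address(entry) in network:
            return "skip"
    # Subnets in the org's registered country could be a travelling admin.
    if subnet_country == org_country:
        return "approval"
    return "block"
```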

CSF / lfd failure

Trigger
ChkServd lfd alert or lfd killed with signal 9 (the Imunify360 conflict signature).
Response
We restore csf.conf from the reset profile when it is missing, write version.txt when the package upgrade dropped it, and disable lfd monitoring through whmapi1 when Imunify360 is the active firewall. The fixes for all four known root causes are safe-class actions.

Malware detection

Trigger
Imunify360 malware alert, or a site redirecting to an unknown domain while the server itself is clean.
Response
We run an on-demand scan against the affected docroot, list the malicious files, and check .htaccess for injected RewriteRule directives. Quarantine and file removal are logged to the audit table; database row deletions on wp_posts wait for approval.

Mail and SSL

Exim queue cleanup, DNSBL delisting workflow, and Let's Encrypt renewals that AutoSSL skipped.

Exim mail queue buildup

Trigger
An unusual surge in the Exim queue, or a ChkServd exim alert.
Response
We identify the top sender, flush frozen non-deliverable bounces, and surface the spam/DNSBL hits in the diagnosis. Suspending a compromised account is dangerous and waits for approval.
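
Finding the top sender and the frozen bounces can be illustrated by parsing queue listing text. The sketch assumes the input is the output of exim -bp in its usual layout (age, size, message ID, sender in angle brackets, recipients on following lines); the function name summarize_queue is hypothetical.

```python
import re
from collections import Counter

# Header line of a queue entry: age, size, message ID, <sender>, optional flags.
QUEUE_HEADER = re.compile(r"^\s*\S+\s+\S+\s+(\S+-\S+-\S+)\s+<([^>]*)>(.*)$")


def summarize_queue(bp_output):
    """Return ((top_sender, count) or None, list of frozen bounce message IDs)."""
    senders = Counter()
    frozen_bounces = []
    for line in bp_output.splitlines():
        match = QUEUE_HEADER.match(line)
        if not match:
            continue  # recipient lines and blank lines
        msg_id, sender, rest = match.groups()
        senders[sender or "<>"] += 1
        # Bounces have an empty sender, shown as <> in the listing.
        if sender == "" and "*** frozen ***" in rest:
            frozen_bounces.append(msg_id)
    top = senders.most_common(1)[0] if senders else None
    return top, frozen_bounces
```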

Server IP blacklisted

Trigger
Outbound IP appears in Spamhaus ZEN, Barracuda, or SpamCop on a proactive poll.
Response
We trace the offending account through exim_mainlog, queue an account suspension for approval, flush the outbound queue, and submit delisting requests to each DNSBL with the request URL written to the audit log.
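
A DNSBL poll works by reversing the IPv4 octets and appending the list's zone, then resolving the resulting name. The sketch below only builds the query names and performs no lookup; the zone names are the commonly published ones for these three lists and should be treated as assumptions, as should the function name dnsbl_query_names.

```python
import ipaddress

# Assumed zones for Spamhaus ZEN, Barracuda and SpamCop.
DNSBL_ZONES = ("zen.spamhaus.org", "b.barracudacentral.org", "bl.spamcop.net")


def dnsbl_query_names(ip, zones=DNSBL_ZONES):
    """Build the DNS names to resolve when checking an IPv4 address."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return [f"{reversed_octets}.{zone}" for zone in zones]
```

A name that resolves to an address in 127.0.0.0/8 conventionally means the IP is listed; a lookup that returns no record means it is not.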

SSL / Let's Encrypt certificate failure

Trigger
A certificate expiring inside 14 days, or an AutoSSL failure email.
Response
We retry AutoSSL per user, fix .htaccess rules that block .well-known/acme-challenge, whitelist the ACME user-agent in ModSecurity when the rule trips, and alert when the domain is on a paid wildcard we cannot renew automatically.
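
The 14-day expiry trigger reduces to a date comparison. The sketch assumes the certificate's notAfter value is available as the string form OpenSSL prints (for example "Jun 10 00:00:00 2024 GMT"); the function name expires_within is hypothetical.

```python
import ssl
from datetime import datetime, timedelta, timezone


def expires_within(not_after, now, days=14):
    """True when the certificate expires within `days` of `now` (or already has)."""
    # ssl.cert_time_to_seconds parses the notAfter string into epoch seconds.
    expiry = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return expiry - now <= timedelta(days=days)
```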

Platform health

Apache, DNS, cPanel/WHM, account suspensions, FTP, cron, OOM, and NTP. The surface beyond MySQL that still has to stay green.

Apache / LiteSpeed failure or ModSecurity false positives

Trigger
ChkServd httpd alert, or a sudden cluster of 403/406 responses across one virtual host.
Response
We run apachectl configtest, parse the ModSecurity rule ID from the error log, and disable that single rule for the affected domain. We restart the web server on worker-pool exhaustion and port conflicts; .htaccess parse errors alert the on-call engineer with the offending line.
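
Extracting the rule ID can be sketched with two regular expressions. This assumes ModSecurity 2.x style error-log lines, which carry bracketed [id "..."] and [hostname "..."] fields; the function name rules_by_host is hypothetical.

```python
import re
from collections import Counter

RULE_ID = re.compile(r'\[id "(\d+)"\]')
HOSTNAME = re.compile(r'\[hostname "([^"]+)"\]')


def rules_by_host(error_log_lines):
    """Count ModSecurity hits per (hostname, rule_id) pair."""
    hits = Counter()
    for line in error_log_lines:
        if "ModSecurity" not in line:
            continue
        rule = RULE_ID.search(line)
        host = HOSTNAME.search(line)
        if rule and host:
            hits[(host.group(1), rule.group(1))] += 1
    return hits
```

Grouping by hostname keeps the later rule exclusion scoped to the affected domain rather than the whole server.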

DNS zone failure

Trigger
A site is unreachable while the host is healthy, and DNS resolution fails on dig @localhost.
Response
We restart BIND on a hard crash and run named-checkzone against the failing zone. Zone-file corruption and registrar-side glue mismatches alert the on-call engineer; we do not modify zone files on autopilot.

cPanel / WHM specific failures

Trigger
ChkServd alerts for cpaneld, a cPanel license check failure, or a cPHulk lockout against the admin IP.
Response
We restart cpaneld on a clean crash, retry the license check against lp.cpanel.net, and whitelist the admin IP in cphulkd directly when the lockout blocks WHM access. EasyApache rebuild failures alert with the last known config diff.

Account resource limit suspension

Trigger
A cPanel account flips to SUSPENDED through cPHulk or the resource-limit watchdog.
Response
We pull the suspension reason from cphulkd_log, the bandwidth history, and the recent exim activity. Unsuspension is dangerous. It always waits for approval, and we never unsuspend an account whose Exim queue is still full of spam.

FTP service failure

Trigger
ChkServd pure-ftpd alert, or a connection-timeout pattern against port 21.
Response
We restart pure-ftpd on a clean crash and open the passive port range in CSF when the data-channel timeout signature appears. MaxClientsNumber tuning and port-21 conflicts queue for approval.

System cron daemon failure

Trigger
crond not running on a proactive poll, or scheduled jobs not firing.
Response
We restart crond on a clean crash and trigger missed WordPress crons through the server-side replacement we set up for WordPress hardening. Corrupt crontab entries alert with the offending line. We do not edit user crontabs automatically.

OOM killer / swap exhaustion

Trigger
OOM events in dmesg, or swap usage running hot.
Response
We read dmesg for the killed process name, tune innodb_buffer_pool_size when MySQL was the victim, reduce pm.max_children when PHP-FPM was the victim, and create a 2 GB swapfile on hosts with no swap configured.
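
Reading the victim from dmesg can be sketched as below. It assumes the kernel's usual "Killed process PID (name)" wording; the mapping from process name to tuning action, including the process names themselves, is an illustrative assumption.

```python
import re

# The kernel logs OOM kills as "... Killed process <pid> (<name>) ...".
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")

# Assumed process names; actual names depend on the installed packages.
TUNING = {
    "mysqld": "tune innodb_buffer_pool_size",
    "mariadbd": "tune innodb_buffer_pool_size",
    "php-fpm": "reduce pm.max_children",
}


def oom_victims(dmesg_text):
    """Return a list of (pid, process_name) for every OOM kill found."""
    return [(int(pid), name) for pid, name in KILLED.findall(dmesg_text)]


def suggested_action(process_name):
    """Return the tuning action for a known victim, or None."""
    return TUNING.get(process_name)
```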

NTP / time drift

Trigger
Significant time offset, or SSL handshake errors that trace back to clock skew.
Response
We restart chronyd, run chronyc makestep to force an immediate sync, and verify timedatectl status afterwards. Hosts without chrony installed alert the on-call engineer; we do not install packages without approval.
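
Measuring the offset before and after the sync can be sketched by parsing the "System time" line of chronyc tracking output. The function name system_offset_seconds is hypothetical, and what counts as a significant offset is left to the caller.

```python
import re

# chronyc tracking prints e.g. "System time : 0.000012 seconds slow of NTP time".
SYSTEM_TIME = re.compile(r"System time\s*:\s*([\d.]+) seconds (slow|fast) of NTP time")


def system_offset_seconds(tracking_output):
    """Return the signed offset in seconds (negative when slow), or None."""
    match = SYSTEM_TIME.search(tracking_output)
    if not match:
        return None
    value = float(match.group(1))
    return -value if match.group(2) == "slow" else value
```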