ChkServd alert field guide: reading cPanel service alerts
Decode the chkservd alert subject lines and bodies cPanel sends when a service flaps, with notes on common false positives and how to tune sensitivity.
ChkServd is the part of cPanel's tailwatchd that watches services
and emails you when one looks unhappy. Most teams treat its alerts as
noise. They are not noise: they follow a stable grammar, and once you
can read them at a glance you can sort real failures from flaps in two seconds.
What ChkServd is
ChkServd runs inside tailwatchd and polls the registered service
list every ~5 minutes. The list lives in
/etc/chkserv.d/chkservd.conf and each service has its own driver
file under /etc/chkserv.d/<service>. A driver tells ChkServd what
port to probe, what banner to expect, and what command to run if the
check fails.
ls /etc/chkserv.d/
# apache_php_fpm cpsrvd exim ftpd imap mailman mysql
# named nscd pop queueprocd spamd sshd
cat /etc/chkserv.d/mysql
# service[mysql]=x,x,x,connect,/etc/init.d/mysql restart,mysql,root
The alert format
Every alert has the same anatomy. Subject:
[chkservd] Service check on cpanel-host -- FAILED: <service> ([reason])
Body, in order:
- The service name and the failure timestamp.
- The probe ChkServd ran (port, expected banner).
- What it actually got (banner mismatch, refused connection, timeout).
- The recovery action, if any (Notice: TailWatchd has restarted <service>).
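The subject grammar is regular enough to parse mechanically. Below is a minimal bash sketch, assuming the subject arrives as one line in exactly the shape shown above; parse_failed_subject is a hypothetical helper name, not anything cPanel ships. The " on <host>" part is treated as optional because this guide quotes both forms.

```bash
#!/usr/bin/env bash
# parse_failed_subject SUBJECT
# Splits a "[chkservd] Service check ... FAILED" subject into host, service
# and reason. Prints host=unknown when the " on <host>" part is absent.
parse_failed_subject() {
  # \[ and \( \) are literal; a bare ] outside a bracket is already literal.
  local re='^\[chkservd] Service check( on ([^ ]+))? -- FAILED: ([^ ]+) \(\[(.*)]\)$'
  [[ $1 =~ $re ]] || return 1
  printf 'host=%s service=%s reason=%s\n' \
    "${BASH_REMATCH[2]:-unknown}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
}

# Example:
#   parse_failed_subject '[chkservd] Service check on cpanel-host -- FAILED: mysql ([connect failed])'
#   # prints: host=cpanel-host service=mysql reason=connect failed
```

The function returns non-zero for anything that is not a FAILED subject, so it can sit in a pipeline as a filter.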
The five alerts you will see most often
[chkservd] Service check -- FAILED: mysql ([connect failed])
MariaDB or MySQL is not accepting connections on 127.0.0.1:3306.
Either the daemon is dead (check /var/log/mariadb/mariadb.log for a
crash) or it is alive but stuck (run mysqladmin processlist).
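The dead-versus-stuck split can be scripted. This is a sketch under stated assumptions: the daemon's process is named mariadbd (MariaDB 10.5+) or mysqld, and the helper names mysql_state and mysql_triage are hypothetical. mysqladmin ping exits 0 whenever the server answers, even with an access-denied error, so credentials do not matter for this check.

```bash
#!/usr/bin/env bash
# mysql_state RUNNING PING_OK -> dead | stuck | ok
# Pure decision step, kept separate so it can be tested without a server.
mysql_state() {
  if [ "$1" != yes ]; then
    echo dead     # no daemon process: read the error log for a crash
  elif [ "$2" != yes ]; then
    echo stuck    # process exists but does not answer: check processlist
  else
    echo ok       # answering now; the alert may have been transient
  fi
}

# mysql_triage: gather both facts on a live host.
# Assumption: the daemon runs as mariadbd or mysqld.
mysql_triage() {
  local running=no ping_ok=no
  if pgrep -x mariadbd >/dev/null || pgrep -x mysqld >/dev/null; then
    running=yes
  fi
  # Probe 127.0.0.1:3306 over TCP (the address above), not the UNIX socket.
  if mysqladmin -h 127.0.0.1 -P 3306 --connect-timeout=5 ping >/dev/null 2>&1; then
    ping_ok=yes
  fi
  mysql_state "$running" "$ping_ok"
}
```

Keeping the decision in its own function makes the logic easy to reason about separately from the environment-dependent probes.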
[chkservd] Service check -- FAILED: <service> is unable to detect a connection on port <N>
The service is running according to systemctl, but the TCP probe
times out. Three usual causes: a firewall rule blocking traffic to
127.0.0.1, the daemon stuck on a long-running query, or a full iptables
connection-tracking (conntrack) table. Tail /var/log/messages and run ss -ltnp first.
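Two of those causes can be checked from a shell. A sketch, assuming iproute2's `ss -ltn` layout (column 4 is Local Address:Port) and that the nf_conntrack module is loaded so its /proc counters exist; listening_on and conntrack_pct are hypothetical helper names. A conntrack percentage near 100 means new connections, including local probes, can be dropped.

```bash
#!/usr/bin/env bash
# listening_on PORT
# Reads `ss -ltn` output on stdin and succeeds if any socket listens on
# PORT. Assumes column 4 is "Local Address:Port"; the header row is skipped.
listening_on() {
  awk -v p=":$1" '
    NR > 1 && substr($4, length($4) - length(p) + 1) == p { found = 1 }
    END { exit !found }'
}

# conntrack_pct COUNT MAX -> integer percentage of the conntrack table used.
conntrack_pct() {
  echo $(( $1 * 100 / $2 ))
}

# On a live host (the /proc files exist only while nf_conntrack is loaded):
#   ss -ltn | listening_on 3306 && echo "something is listening on 3306"
#   conntrack_pct "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#                 "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```

Matching the port as a suffix of the address column avoids false hits such as port 306 matching 3306.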
Notice: TailWatchd has restarted <service>
Informational, not a failure. ChkServd ran the recovery action from the driver file and the service came back. If you see this every hour for the same service, the underlying cause is unresolved: investigate it rather than silencing the alert.
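To tell a one-off restart from a recurring one, count the notices per service. A sketch that assumes you have exported the alert subjects to a text file, one per line; subjects.txt and restart_counts are hypothetical names.

```bash
#!/usr/bin/env bash
# restart_counts
# Reads alert subjects (one per line) on stdin and prints "COUNT SERVICE"
# for every service TailWatchd restarted, most frequent first.
restart_counts() {
  sed -n 's/.*TailWatchd has restarted \([^ ]*\).*/\1/p' |
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Usage (subjects.txt is a hypothetical export from the alert mailbox):
#   restart_counts < subjects.txt
```

Non-restart subjects are ignored by `sed -n ... p`, so the whole mailbox export can be fed in unfiltered.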
[chkservd] /var/lib/mysql is over <threshold> full
A disk-usage alarm scoped to the MySQL data directory. The threshold is configurable in WHM > Server Configuration > Tweak Settings > "Maximum percentage of space used by MySQL"; the default is 95%. It fires well before the partition itself runs out and MariaDB stops accepting writes, so treat it as your early warning.
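When the alert fires, a quick manual look at the same partition helps confirm it. A sketch assuming GNU coreutils df (for `--output=pcent`); cPanel's own measurement may differ, so treat this as a cross-check rather than a reimplementation. disk_pct and over_threshold are hypothetical names, and the 95 default mirrors the default quoted above.

```bash
#!/usr/bin/env bash
# disk_pct [PATH]
# Prints the percent used on the filesystem holding PATH (default
# /var/lib/mysql). Needs GNU coreutils df for --output=pcent.
disk_pct() {
  df --output=pcent "${1:-/var/lib/mysql}" 2>/dev/null | tail -n 1 | tr -dc '0-9'
}

# over_threshold [PATH] [LIMIT]
# Succeeds when usage is at or above LIMIT percent (default 95).
# Returns 2 if df produced no number (for example, PATH does not exist).
over_threshold() {
  local pct
  pct=$(disk_pct "$1")
  [ -n "$pct" ] || return 2
  [ "$pct" -ge "${2:-95}" ]
}

# Usage:
#   over_threshold /var/lib/mysql 95 && echo "MySQL datadir at or over 95%"
```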
[chkservd] SSL certificate is expiring on <domain>
This refers to the cPanel-issued certificate for the cPanel hostname, not a Let's Encrypt certificate. The default warning window is two weeks. If AutoSSL is failing for the cPanel host, this is the only alert that will tell you before things break.
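To check the hostname certificate yourself, `openssl x509 -checkend` does the date math. A sketch assuming OpenSSL on the PATH; expires_within is a hypothetical helper, the 14-day default mirrors the two-week warning above, and host.example.com plus /tmp/whm.pem in the usage comment are placeholders. Port 2087 is WHM's SSL port.

```bash
#!/usr/bin/env bash
# expires_within FILE [DAYS]
# Succeeds if the PEM certificate in FILE expires within DAYS days
# (default 14). Returns 2 if FILE is not a readable certificate.
expires_within() {
  local file=$1 days=${2:-14}
  openssl x509 -noout -in "$file" >/dev/null 2>&1 || return 2
  # -checkend N exits 0 when the cert will NOT expire within N seconds.
  ! openssl x509 -checkend "$((days * 86400))" -noout -in "$file" >/dev/null
}

# Fetch the certificate WHM serves (host.example.com is a placeholder):
#   openssl s_client -connect host.example.com:2087 -servername host.example.com \
#     </dev/null 2>/dev/null | openssl x509 > /tmp/whm.pem
#   expires_within /tmp/whm.pem 14 && echo "renew soon"
```

The inverted exit status is the design choice here: the function succeeds when there is something to act on, which reads naturally in `&&` chains.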
When ChkServd is right vs wrong
It is almost always right when the alert is connect failed on a
real port: the service is dead or the listener is bound to the wrong
address or port. It is often wrong when:
- The probe times out at exactly the same time cron.daily is running. The host is alive; ChkServd just lost the race.
- The service runs on an odd port and the driver still expects the default. (Same shape of bug as the Imunify360 custom SSH port issue.)
- The service accepts the connection but takes >5s to send a banner. ChkServd's TCP probe is short-fused.
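To reproduce a slow-banner or timeout false positive by hand, probe the port the way a short-fused TCP check would. A sketch using bash's /dev/tcp redirection (most Linux bash builds include it) and coreutils timeout; probe_banner is a hypothetical name, and the 5-second default is an assumption taken from the >5s figure above, not ChkServd's actual setting. It only suits line-oriented banners such as SMTP, POP, IMAP, FTP and SSH, not MySQL's binary handshake.

```bash
#!/usr/bin/env bash
# probe_banner HOST PORT [SECS]
# Opens a TCP connection and prints the first line the server sends.
# Exit status: 0 banner read, 124 timed out (coreutils timeout),
# anything else means refused or closed without a banner.
probe_banner() {
  local host=$1 port=$2 secs=${3:-5}
  timeout "$secs" bash -c '
    exec 3<>"/dev/tcp/$1/$2" || exit 1
    IFS= read -r line <&3 || exit 1
    printf "%s\n" "$line" | tr -d "\r"
  ' probe "$host" "$port" 2>/dev/null
}

# Usage: see how long exim takes to greet a client.
#   time probe_banner 127.0.0.1 25; echo "exit=$?"
```

Exit 124 corresponds to the "accepts but is slow to send a banner" case; a refused connection returns quickly with a different non-zero status.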
Tuning ChkServd
WHM > Service Manager lets you toggle which services ChkServd
monitors at all and whether it auto-restarts on failure. For per-
service probe overrides, edit the driver file under
/etc/chkserv.d/. The format is comma-delimited; the cPanel docs
linked from the WHM page are the only reliable reference.
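# Disable auto-restart for a specific service (mysql here) while keeping
# monitoring: swap the driver's restart command for /bin/true.
# Using | as the sed delimiter avoids escaping every slash.
sed -i 's|connect,/etc/init\.d/mysql restart|connect,/bin/true|' \
  /etc/chkserv.d/mysql
# Restart tailwatchd so the edited driver is picked up.
/scripts/restartsrv_tailwatchd
Related reading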
ChkServd alerts are the entry point for several incident types we write up in detail:
- MySQL OOM on cPanel and the innodb_buffer_pool_size trap
- 86 CPU spikes in 24 hours: a multi-cause cascade postmortem
- The cPanel disk full backup retention trap
For the slow log permission flap that often follows a
mysql connect failed alert, see the
MariaDB slow log permissions quickref.
How ServerGuard uses this
We parse the ChkServd alert subject line into a structured event (service, kind, port, threshold) and route it into the matching use case before paging a human. Most ChkServd alerts resolve themselves in our triage pass before anyone wakes up.
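A minimal sketch of that kind of subject-line classification, covering the five shapes listed earlier. This is an illustration, not ServerGuard's implementation; the kind labels, the function name classify_alert, and the assumption that <threshold> is a plain percentage are ours.

```bash
#!/usr/bin/env bash
# classify_alert SUBJECT
# Maps one alert subject to kind=... plus whichever fields that kind carries.
# Prints kind=unknown and returns 1 for anything it does not recognise.
classify_alert() {
  local s=$1
  local re_port='FAILED: ([^ ]+) is unable to detect a connection on port ([0-9]+)'
  local re_connect='FAILED: ([^ ]+) \(\[connect failed]\)'
  local re_restart='TailWatchd has restarted ([^ ]+)'
  local re_disk='/var/lib/mysql is over ([0-9]+)%? full'
  local re_ssl='SSL certificate is expiring on ([^ ]+)'

  if [[ $s =~ $re_port ]]; then
    echo "kind=no_listener service=${BASH_REMATCH[1]} port=${BASH_REMATCH[2]}"
  elif [[ $s =~ $re_connect ]]; then
    echo "kind=connect_failed service=${BASH_REMATCH[1]}"
  elif [[ $s =~ $re_restart ]]; then
    echo "kind=restarted service=${BASH_REMATCH[1]}"
  elif [[ $s =~ $re_disk ]]; then
    echo "kind=disk threshold=${BASH_REMATCH[1]}"
  elif [[ $s =~ $re_ssl ]]; then
    echo "kind=ssl_expiry domain=${BASH_REMATCH[1]}"
  else
    echo "kind=unknown"
    return 1
  fi
}
```

Emitting key=value pairs keeps the output easy to route with grep or awk before anything reaches a human.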