SSH brute force on cPanel: the 8,127-attempt night and the fix

A postmortem on 8,127 failed SSH logins to a cPanel server in six hours from rotating /24s, why lfd alone could not see the pattern, and the layered fix.

The first lfd alert landed at 02:14. Five failed root logins from a single address in Bulgaria, blocked at the 5/300s threshold, business as usual. By 02:31 the inbox had nine more of the same alert from nine different addresses, all in the same /24. By 03:00 the count was over eight hundred attempts, the lfd alerts had stopped being readable as individual events, and the SSH daemon on cpanel-host was spending more CPU on auditd rejections than on legitimate sessions. By the time the on-call engineer finished the morning audit at 08:00, /var/log/secure had recorded 8,127 failed login attempts across six hours from two coordinated /24 ranges, with rotating usernames, never more than ten attempts from any single source address.

This is the postmortem of why a default CSF + lfd install does not stop that kind of attack, the layered fix we deploy now, and the one guardrail we will not remove no matter what the attack pattern looks like: the admin IP allowlist is non-negotiable.

The night the alerts started

The first alert was unremarkable. lfd's default LF_SSHD threshold is five failed logins from a single IP in 300 seconds, and the address in that alert had crossed it cleanly. lfd added a csf.deny entry, sent the email, and moved on. The first nine minutes were ordinary.

By 02:31 we had fifty failed logins recorded in /var/log/secure across twelve usernames (root, admin, oracle, postgres, ubuntu, deploy, git, jenkins, centos, ftpuser, test, user) and ten source addresses. No single address other than the first had hit the five-in-300-seconds threshold; they were rotating fast enough that lfd's per-IP counters never tripped. Twelve of the alerts that did fire came from inside the same /24. There was no pattern in the lfd email summary because each address was being scored independently.

By 03:00 the count was eight hundred and the rate was climbing. csf -t showed only eleven temporary blocks in place. The attacker was paying the cost of one IP getting blocked every minute or so and was getting the other ninety percent of attempts through. The defence was per-IP; the attack was per-subnet.

That is the realisation that defines this incident: lfd is a single-attacker tool. It was designed when "brute force" meant one machine in someone's basement running hydra against your server. Modern coordinated SSH brute force runs from rented /24s and never exceeds the per-IP threshold of any one address. The defence has to move up one octet.

What the log looks like

This is the redacted shape of /var/log/secure during the worst fifteen minutes. The hostnames have been replaced per our anonymisation glossary; the attacker addresses below are illustrative — the shape of the log matters, not the specific source IPs.

Mar 22 02:47:13 server sshd[18421]: Failed password for root from 203.0.113.12 port 51234 ssh2
Mar 22 02:47:14 server sshd[18421]: Connection closed by authenticating user root 203.0.113.12 port 51234
Mar 22 02:47:18 server sshd[18443]: Failed password for invalid user oracle from 203.0.113.47 port 39112 ssh2
Mar 22 02:47:22 server sshd[18445]: Failed password for invalid user postgres from 203.0.113.91 port 44820 ssh2
Mar 22 02:47:25 server sshd[18447]: Failed password for invalid user ubuntu from 203.0.113.118 port 41003 ssh2
Mar 22 02:47:29 server sshd[18451]: Failed password for invalid user deploy from 198.51.100.22 port 60112 ssh2
Mar 22 02:47:33 server sshd[18454]: Failed password for invalid user git from 198.51.100.74 port 41884 ssh2
Mar 22 02:47:37 server sshd[18458]: Failed password for invalid user jenkins from 198.51.100.119 port 38222 ssh2
Mar 22 02:47:40 server sshd[18460]: Failed password for invalid user admin from 203.0.113.203 port 52001 ssh2
Mar 22 02:47:43 server sshd[18463]: Failed password for invalid user centos from 198.51.100.201 port 33910 ssh2
# ... 138 more lines in the same fifteen-minute window ...

Two things stand out. The first is that no single address appears more than nine times in the entire six-hour window, well under any per-IP threshold a sensible operator would set. The second is that if you group the lines by the first three octets of the source address, the first attacker subnet accounts for 4,318 of the 8,127 attempts and the second accounts for the remaining 3,809. The pattern is invisible per-IP and obvious per-subnet.

The awk one-liner that turns the log into that summary is short enough to memorise and worth memorising:

grep "Failed password" /var/log/secure \
  | awk '{for(i=1;i<=NF;i++) if($i=="from") print $(i+1)}' \
  | awk -F. '{print $1"."$2"."$3".0/24"}' \
  | sort | uniq -c | sort -rn | head -20

Run it during any suspected attack. If the top two entries account for more than half of the failed-login traffic, the attack is distributed and the fix is not per-IP.

Why CSF and lfd alone are not enough

CSF can block CIDR ranges. lfd can read /var/log/secure in real time. The defaults do not put those two things together because the defaults are tuned for the threat model of fifteen years ago. The relevant limits are:

  • LF_SSHD (default 5) counts failed logins per IP, not per subnet.
  • LF_TRIGGER_PERM controls whether a tripped IP is blocked permanently or temporarily, not how broadly.
  • DENY_IP_LIMIT (default 200) caps how many entries csf.deny can hold before old ones get rotated out. A distributed attack will fill this in an hour and start evicting genuine blocks.

There is no built-in LF_SSHD_SUBNET. The subnet check has to be added in a cron script, and the script has to know how to talk to CSF.

We layer the defence:

  1. Admin IP allowlist first, before any other change.
  2. Move SSH off port 22.
  3. Add a subnet-level detection script that calls csf -d on /24 ranges exceeding a threshold.
  4. Disable password authentication entirely.
  5. Optionally add 2FA for admin accounts that still need shell access.

Each step is independently useful. Together they take the attack surface from "anyone on the internet can probe SSH" to "the attacker has to find a non-port-22 service, brute force keys instead of passwords, and clear a 2FA challenge". The cost of probing goes up by several orders of magnitude. The cost to a legitimate admin goes up by about thirty seconds the first time and zero seconds after that.

The fix we deploy now (in this order)

Step 1: whitelist your admin IP first

The catastrophic scenario for the rest of this post is that one of the changes blocks the operator running it. Before touching anything, we add the admin's static IP to csf.allow with a comment that explains why it is there.

csf -a 198.51.100.10 "Permanent admin: do not remove"
grep 198.51.100.10 /etc/csf/csf.allow

The second line is not optional. We verify the entry exists in the file, not just that the command returned cleanly. CSF will write to the allow list even when the address is malformed; the verification grep is the only thing that catches a typo before it matters.

If the admin works from more than one static address, every address goes in before step 2. If the admin works from a dynamic address, this step is "set up a jump host on a static IP first" and step 2 waits.
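
The multi-address case is the same two commands repeated; a sketch with a second illustrative admin address, where the comment convention is the part that matters:

csf -a 198.51.100.11 "Permanent admin (home office): do not remove"
grep -c "Permanent admin" /etc/csf/csf.allow   # expect one line per admin address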

Step 2: move SSH off port 22

This is the change that produces the biggest single drop in attack volume. Most opportunistic SSH scanners only hit port 22. Moving the service to a high non-standard port reduces the noise by more than ninety percent in our measurements. It is not security (anyone who wants to find your SSH port will find it) but it is signal-to-noise relief that lets the rest of the defence breathe.

Pick a port high enough that scanners do not bother. Avoid the "obvious" alternates (2222, 22222, 2200) because the scanners that know about port changes know about those. Pick something in the 15600 to 64000 range that no other service on the box uses.
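
One check is worth running before committing to a candidate, to confirm nothing on the box already listens there (15672 is the illustrative port used in the blocks below):

ss -tlnp | grep ':15672 '   # no output means the port is free to use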

The cPanel-specific change is in three places, in this order:

# 1. Edit sshd config to listen on both old and new during the cutover.
vi /etc/ssh/sshd_config
# Add:  Port 15672
# Keep: Port 22
sshd -t                    # syntax check before restart
systemctl restart sshd
 
# 2. Open the new port in CSF.
vi /etc/csf/csf.conf
# Edit TCP_IN to include 15672
csf -r                     # restart CSF
 
# 3. Test the new port from the admin host BEFORE killing port 22.
ssh -p 15672 admin@cpanel-host

Once the new port is verified working from every admin workstation (not "I tested it once and it worked", but "every operator who needs shell access has logged in successfully via the new port"), port 22 gets removed from sshd_config and from CSF's TCP_IN. Imunify360 needs the same change if it is in the firewall path; see our Tier 3 reference on the Imunify360 custom-port settings.
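
The cutover itself is the reverse of the setup above; a sketch under the same illustrative port:

# Remove "Port 22" from /etc/ssh/sshd_config, keeping "Port 15672", then:
sshd -t && systemctl restart sshd
# Remove 22 from TCP_IN in /etc/csf/csf.conf, then:
csf -r
# Confirm the old port is dead from an external host:
ssh -p 22 admin@cpanel-host    # should be refused or time out, never prompt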

This step is the one most cPanel operators resist because cPanel itself keeps documentation that assumes port 22. The documentation is wrong for any production server.

Step 3: add subnet-level detection

The script below reads the last hour of /var/log/secure, groups failed-login source addresses into /24 ranges, and calls csf -d on any range that crossed our threshold. Threshold for our environment is fifty failed attempts from a single /24 in the last hour; this catches the kind of distributed attack the 8,127-attempt night was and produces roughly zero false positives in our weekly review.

#!/bin/bash
# /usr/local/sbin/csf-block-subnets.sh
# Block /24 subnets with more than 50 failed SSH logins in the last hour.
 
set -euo pipefail
 
THRESHOLD=50
WINDOW_MIN=60
LOG=/var/log/secure
ALLOWLIST=/etc/csf/csf.allow
 
# Build the timestamp floor for the window. Note the lexical comparison
# below is only valid within a single month; a run that straddles a month
# boundary will under-count, which is tolerable on a five-minute cron.
SINCE=$(date -d "${WINDOW_MIN} minutes ago" "+%b %_d %H:%M")
 
awk -v since="$SINCE" '
  $0 >= since && /Failed password/ {
    for (i=1;i<=NF;i++) if ($i=="from") print $(i+1)
  }
' "$LOG" \
| awk -F. '{print $1"."$2"."$3".0/24"}' \
| sort | uniq -c | sort -rn \
| awk -v t="$THRESHOLD" '$1 >= t {print $2}' \
| while read -r subnet; do
    # Refuse to block any subnet that overlaps the admin allowlist.
    # Compare on the first three octets with the dots escaped: an
    # allowlisted 198.51.100.10 must protect all of 198.51.100.0/24.
    prefix=${subnet%.0/24}
    if grep -qE "^${prefix//./\\.}\." "$ALLOWLIST"; then
      logger -t csf-block-subnets "skipping $subnet (overlaps allowlist)"
      continue
    fi
    if ! csf -g "$subnet" | grep -q "DENY"; then
      csf -d "$subnet" "ssh-bruteforce auto $(date +%F)" >/dev/null
      logger -t csf-block-subnets "blocked $subnet"
    fi
  done

Two details are load-bearing. The first is the allowlist check before the block call. The script refuses to block a /24 that overlaps the admin allowlist even if the data says it should. Locking the operator out of their own server is the failure mode that matters most.

The second is the csf -g check before the csf -d call. CSF's csf.deny file is finite (default 200 entries), and blocking the same subnet twice in quick succession rotates legitimate older blocks out of the list. We only add a new block if the subnet is not already in the deny rules.

Cron schedule, every five minutes, with output silenced because the script logs to syslog already:

*/5 * * * * /usr/local/sbin/csf-block-subnets.sh >/dev/null 2>&1
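
After the first few runs, two commands confirm the script is doing what it claims; the subnet is illustrative, and csf -g accepts CIDR ranges as search patterns:

csf -g 203.0.113.0/24                                # shows a DENY match once blocked
grep csf-block-subnets /var/log/messages | tail -5   # the script's own syslog trail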

Step 4: disable password authentication

Password auth is the entire reason brute force exists. Disabling it turns the 8,127 attempts of this incident into 8,127 attempts that cannot succeed regardless of which password was tried, because there is no password prompt. The cost is a one-time key-distribution exercise.
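
For one operator, that exercise is three commands; a sketch assuming the step 2 port, with an illustrative key filename:

# Generate on the admin workstation, never on the server.
ssh-keygen -t ed25519 -C "admin@workstation" -f ~/.ssh/id_ed25519_cpanel
# Copy the public key across while password auth still works.
ssh-copy-id -i ~/.ssh/id_ed25519_cpanel.pub -p 15672 admin@cpanel-host
# Prove key auth works on its own before anything is disabled:
ssh -i ~/.ssh/id_ed25519_cpanel -p 15672 -o PasswordAuthentication=no admin@cpanel-host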

We generate keys for every admin who needs shell access, copy each public key into the appropriate ~/.ssh/authorized_keys on the server, and verify each operator can log in with their key while password auth is still enabled. Only after every admin has confirmed working key auth do we change sshd_config:

# /etc/ssh/sshd_config
PasswordAuthentication no
ChallengeResponseAuthentication no
UsePAM yes
PermitRootLogin prohibit-password

The order matters. Disabling password auth before every admin has working key auth is the second catastrophic scenario, and it has cost people their weekend more than once. We will not flip the flag until the operator running the change can show us a list of every admin user on the box paired with a confirmed key-auth login from each.
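
Once the flag is flipped and sshd restarted, one command proves passwords are genuinely off rather than merely discouraged:

ssh -p 15672 -o PreferredAuthentications=password -o PubkeyAuthentication=no admin@cpanel-host
# Expected: "Permission denied (publickey)." with no password prompt at all.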

Step 5: optional 2FA via Google Authenticator

cPanel servers can run PAM-based 2FA for SSH with google-authenticator-libpam. Worth it for any account with sudo access on a server that handles client data. Not worth it for read-only operator accounts that already have hardware key-auth and a narrow IP allowlist; the extra friction does not buy proportionate security on those.

When we do enable it, the setup walks each admin through google-authenticator -t -d -f -r 3 -R 30 -W and the QR code, the emergency codes go into the operator's password manager, and the auth required pam_google_authenticator.so line goes into /etc/pam.d/sshd after auth substack password-auth.
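
A minimal sketch of the two files involved, assuming a distro that still spells the option ChallengeResponseAuthentication (newer OpenSSH calls it KbdInteractiveAuthentication). Note that it re-opens the keyboard-interactive path that step 4 closed; password auth itself stays off:

# /etc/pam.d/sshd, added after the "auth substack password-auth" line:
auth required pam_google_authenticator.so

# /etc/ssh/sshd_config: PAM can only prompt for the TOTP code if the
# keyboard-interactive path is open and required alongside the key.
ChallengeResponseAuthentication yes
AuthenticationMethods publickey,keyboard-interactive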

What we never recommend

Several "solutions" come up in cPanel forum threads about SSH brute force. We do not recommend any of these.

Disabling SSH "to be safe". You will need it the moment something breaks that the WHM UI cannot fix, and the something-breaks moment will not wait for you to re-enable the service.

Port-knocking. Security through obscurity that breaks every automation tool we use, including the backup runners and the monitoring agents.

Custom one-off iptables rules outside CSF. They survive CSF restarts inconsistently and they will be invisible to the next operator on the team. If the rule is worth keeping, it goes in csf.allow, csf.deny, or csfpre.sh.

"AI-powered SSH protection" SaaS that does not show you the rules it adds. We say this with a self-aware wink because we are building an AI-driven product, and we still think you should never deploy something that hides what it is doing from the operator. Every action our use case takes is logged and reversible. If a vendor will not show you the rules, the vendor is the threat.

Geolocation as a force multiplier

Most legitimate SSH on a single cPanel server comes from two or three known countries. Denying the heaviest attack-origin countries via MaxMind GeoIP and CSF's CC_DENY directive is a fast way to cut another order of magnitude off attack volume:

# /etc/csf/csf.conf
CC_DENY = "CN,RU,BR,VN,IN,KR"
CC_ALLOW = ""
CC_ALLOW_FILTER = ""
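
One operational footnote, and an assumption worth verifying against your CSF version: recent CSF builds fetch the country database from MaxMind, which now requires a free license key set in the same file, or the CC_* directives have no data to work with:

# /etc/csf/csf.conf
MM_LICENSE_KEY = "your-maxmind-key"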

The honest limit on this is that GeoIP is a heuristic, not a barrier. Attackers route through VPN endpoints in the countries you cannot deny and through rented residential proxy pools that have addresses everywhere. The 8,127-attempt night included traffic from at least four countries that no sensible CC_DENY list would cover. Geolocation reduces the volume of background scanning; it does not stop a determined attacker. We deploy it because the volume reduction is real and the operational cost is near zero, not because we trust it as a primary control.

The forensic data you should be collecting

When step 3 starts blocking subnets automatically, the post-incident question is always "did any of them succeed before we caught them?". The data that answers that question has to exist before the incident, not after.

What we keep, with retention:

  • /var/log/secure rotated weekly, kept for 90 days, never compressed away during an active incident.
  • lfd CT (connection tracking) logs at default rotation.
  • Apache access logs around any successful SSH login from an unfamiliar address, cross-referenced for control-panel activity.
  • Output of last and lastb captured nightly into a separate retention directory (the capture job is sketched after this list). last shows every successful login; lastb shows every failure. The two together answer "did anyone get in".
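
A minimal sketch of that nightly capture, run from root's crontab; the script name and retention path are ours, not a cPanel convention:

#!/bin/bash
# /usr/local/sbin/capture-logins.sh: nightly snapshot of login history.
DEST=/var/log/login-history
mkdir -p "$DEST"
last  -F > "$DEST/last.$(date +%F)"     # every successful login, full timestamps
lastb -F > "$DEST/lastb.$(date +%F)"    # every failure (reads /var/log/btmp, needs root)
find "$DEST" -type f -mtime +90 -delete # match the 90-day retention above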

The mandatory follow-up: any time a login succeeds from an address that is not on the operator allowlist, we treat it as a potential compromise until proven otherwise. The proof is matching the timestamp to an explanation: a known operator on a known travel day, a known contractor on a known engagement. If there is no explanation, the account is locked and the keys are rotated before anyone goes back to sleep.

The wider firewall context for this (why CSF and Imunify360 sometimes fight each other in ways that complicate brute-force response) is in our postmortem on the CSF, lfd, and Imunify360 conflict. And the close cousin to SSH brute force, the WordPress xmlrpc and wp-login flood that hits at the same hours from the same kinds of attackers, is the subject of the 27-site xmlrpc abuse postmortem.

The 60-second audit

Four commands. Run them now. If any of them returns more than the expected output, you have ongoing SSH probing and you should not finish reading this post before starting on step 1.

# 1. How many failed SSH logins in the previous clock hour?
#    (a coarse heuristic: it matches the hour label, not a rolling 60 minutes)
grep "Failed password" /var/log/secure \
  | awk -v cutoff="$(date -d '1 hour ago' '+%b %_d %H')" '$0 ~ cutoff' \
  | wc -l
 
# 2. Which /24 subnets account for the most failed logins today?
grep "Failed password" /var/log/secure \
  | awk '{for(i=1;i<=NF;i++) if($i=="from") print $(i+1)}' \
  | awk -F. '{print $1"."$2"."$3".0/24"}' \
  | sort | uniq -c | sort -rn | head -5
 
# 3. How full is csf.deny right now?
wc -l /etc/csf/csf.deny
 
# 4. Is the admin IP allowlist still in place?
grep -E "198\.51\.100\.10" /etc/csf/csf.allow

Healthy numbers for a quiet cPanel box: command one under fifty, command two showing the top subnet well under twenty attempts, command three under a hundred entries, command four returning exactly the expected allowlist line. Numbers that look like the 8,127-attempt night: command one over a thousand, command two showing two subnets each above a hundred, command three at or near the DENY_IP_LIMIT cap. Command four is its own emergency: if it comes back empty, stop reading and run step 1 of the fix immediately.

How ServerGuard handles this

ServerGuard's use case for distributed SSH brute force is implemented today and is the canonical second scenario in the product hero animation. The detection and the autonomous Safe action are both live; the deeper migration work to key-only auth and 2FA stays in human hands by design.

What the use case does today:

  • Detection. SGuard subscribes to lfd's event stream and tails /var/log/secure in real time. The threshold (fifty failed attempts from a single /24 in the last hour) is the same one the cron script in step 3 uses; the difference is that SGuard reacts inside thirty seconds rather than waiting for the next five-minute tick.
  • Action 1, Safe, autonomous. Block the offending /24 via CSF. Before the block call, SGuard verifies the admin allowlist is intact and refuses to add any rule that overlaps an allowlisted address. The refusal is hard-coded and cannot be overridden from the dashboard.
  • Action 2, Safe, autonomous. Increment a per-server daily counter of subnets blocked and post the summary to the operator's daily digest. The counter is one of the early signals we use to decide whether a server needs the deeper hardening of steps 2, 4, and 5.
  • Action 3, Moderate, with approval. Suggest the port-change and the key-only-auth migration when the daily counter crosses a tuneable ceiling. SGuard will draft the sshd_config diff and the CSF changes; it will not execute them without explicit operator sign-off via the approval flow, because the catastrophic-lockout scenarios in steps 2 and 4 are not risks we want the autonomous layer to take.

The non-negotiable: SGuard's admin IP allowlist is treated as a hard constraint, not a soft preference. If the data says the attack pattern is coming from the same /24 as an allowlisted admin address (and this does happen, when an operator's ISP rotates their static address into an adjacent block that has been compromised elsewhere) SGuard refuses to block the subnet and alerts the operator to choose between moving their static address or temporarily removing the allowlist. There is no automated path through that decision because there should not be one.

If you are running a cPanel server today and the 60-second audit above returned numbers that look like an attack, the first three steps of the fix are deployable in under an hour and they will hold against the shape of brute force we have seen consistently across the last three years. The ServerGuard use case automates the detection and the Safe block; it does not automate the parts of the response where being wrong locks you out of your own server.
