AutoSSL fails on Microsoft 365 autodiscover subdomains: the fix
Why cPanel AutoSSL emails nightly that autodiscover.<client-domain> failed HTTP-01 when the client uses Microsoft 365, and the two-step WHM exclusion fix.
The email arrives every night at the same time. [cpanel-host] AutoSSL Failed for User 'bayareade'. The body lists three or four
subdomains that failed Domain Control Validation, the same ones every
night, week after week. Nothing on the server has changed. The cert on
the primary domain renewed cleanly. The site loads. WHMCS is happy. And
yet at 03:00 every morning AutoSSL writes a new line in
/var/log/autossl.log saying it could not validate
autodiscover.bayareadesignco.com, and the admin inbox gets another copy
of the same failure email it got yesterday, and the day before, and the
day before that.
This is the postmortem of a slow-burn AutoSSL failure that ran on one
of our cPanel servers for six weeks before anyone correlated the email
with its root cause. The pattern is unambiguous once you know what to
look for: a client moves their mail to Microsoft 365, the M365 setup
guide tells them to CNAME autodiscover and a couple of other names to
Microsoft's endpoints, and from that moment cPanel's AutoSSL can no
longer prove domain control over those subdomains because the HTTP-01
challenge file lives on our server while DNS now points the subdomain
somewhere else. The fix is two clicks in WHM. The interesting part is
how the failure hides in plain sight for so long.
This post covers the failure mode, the DNS pattern that produces it, the two-step fix, the audit command we now run across the fleet, and an honest description of how ServerGuard handles the same class of incident.
The repeating failure email
The first signal is always the email, and the email always looks the same. Subject:
[cpanel-host] AutoSSL Failed for User 'bayareade'
Body (trimmed to the load-bearing lines):
The AutoSSL system failed to renew the SSL certificate for the
"bayareade" cPanel user account.
The following domains failed DCV (Domain Control Validation):
- autodiscover.bayareadesignco.com
- autodiscover.bayareadesignco.com (IPv6)
- mail.bayareadesignco.com
Reason: "DNS DCV" failed: The DNS query for
"_acme-challenge.autodiscover.bayareadesignco.com" returned a result that
does not match the expected challenge response.
The AutoSSL system will continue to attempt renewal.
That last line is what makes this incident class so corrosive. AutoSSL
will keep trying. It will keep emailing. The failures are partial. The
primary domain bayareadesignco.com renewed fine, only the mail-adjacent
subdomains failed, so nothing user-visible breaks. There is no broken
site to investigate, no angry client ticket, just a nightly email that
joins the queue of other AutoSSL notices, slowly trains the on-call
engineer to mark anything starting with [cpanel-host] AutoSSL
as background noise, and quietly establishes alert fatigue as the
default response to certificate emails.
The cost is not the failing cert. The cost is the day a real renewal fails (a domain transfer that did not propagate, a misconfigured DNS record on a billing-critical site) and the alert for that failure arrives in the same shape, in the same inbox, and gets handled with the same shrug as the six weeks of M365 autodiscover noise that preceded it.
Why it fails
cPanel's AutoSSL validates domain control by serving a challenge file
over HTTP. For each domain on a cPanel account, AutoSSL writes a
randomised token to
/home/<user>/public_html/.well-known/acme-challenge/<token>, asks
Sectigo or Let's Encrypt to fetch
http://<domain>/.well-known/acme-challenge/<token>, and proves
control by matching the response. This is the HTTP-01 challenge in the
ACME spec, and on a normal cPanel-hosted domain it works because the
A record for <domain> points at the cPanel server's IP, the request
hits Apache, Apache serves the file from the user's docroot, the CA
verifies the token, and the cert issues.
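You can reproduce the CA's side of that handshake by hand. A minimal sketch, using a hypothetical token file (test-token stands in for the random name AutoSSL generates):
$ mkdir -p /home/bayareade/public_html/.well-known/acme-challenge
$ echo "test-token-contents" > \
    /home/bayareade/public_html/.well-known/acme-challenge/test-token
$ curl -s http://bayareadesignco.com/.well-known/acme-challenge/test-token
test-token-contents
A 200 with the token body is exactly what the CA needs to see; anything else fails DCV.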
The flow breaks the moment a subdomain points anywhere other than the cPanel server. For Microsoft 365, the standard tenant setup instructions tell the customer to add CNAME records for at least three names:
| Name | CNAME target | Purpose |
|---|---|---|
| autodiscover.bayareadesignco.com | autodiscover.outlook.com | Outlook client autodiscovery |
| lyncdiscover.bayareadesignco.com | webdir.online.lync.com | Teams/Skype federation |
| sip.bayareadesignco.com | sipdir.online.lync.com | SIP signalling for Teams |
Once those CNAMEs are in place, a request for
http://autodiscover.bayareadesignco.com/.well-known/acme-challenge/<token>
no longer reaches our server. It reaches autodiscover.outlook.com,
which does not know what the AutoSSL challenge is, has never heard of
the token, and responds with a 404 (or worse, an HTTP redirect to an
HTTPS endpoint with an unrelated cert, which AutoSSL also treats as
failure). cPanel's AutoSSL reads the failure, logs it, emails the
admin, and queues another attempt for tomorrow night.
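The same manual fetch against the CNAMEd subdomain shows what the CA sees instead (the response shape is illustrative; Microsoft's exact answer varies between a 404 and a redirect):
$ curl -sI http://autodiscover.bayareadesignco.com/.well-known/acme-challenge/test-token | head -1
HTTP/1.1 404 Not Found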
The same thing happens for mail.bayareadesignco.com if the client has
CNAMEd that name to Microsoft's mail endpoint. Several third-party M365
deployment guides recommend that CNAME even though Microsoft's own
documentation expects Outlook to reach outlook.office365.com via
autoconfig rather than a mail.* record. We have seen the mail.*
variant in roughly half of the M365 migrations on our fleet.
The full chain looks like this in the AutoSSL log:
[2026-04-02 03:14:08 +0000] info [autossl] Domain "autodiscover.bayareadesignco.com": HTTP DCV check started
[2026-04-02 03:14:08 +0000] info [autossl] Domain "autodiscover.bayareadesignco.com": Fetching "http://autodiscover.bayareadesignco.com/.well-known/acme-challenge/q7nM..."
[2026-04-02 03:14:09 +0000] warn [autossl] Domain "autodiscover.bayareadesignco.com": HTTP DCV failed: 404 Not Found
[2026-04-02 03:14:09 +0000] info [autossl] Domain "autodiscover.bayareadesignco.com": Trying DNS DCV fallback
[2026-04-02 03:14:10 +0000] warn [autossl] Domain "autodiscover.bayareadesignco.com": DNS DCV failed: no _acme-challenge TXT record found
[2026-04-02 03:14:10 +0000] error [autossl] Domain "autodiscover.bayareadesignco.com": all DCV methods exhausted, skipping for this issuance
The fallback to DNS DCV is worth understanding. cPanel will try a
DNS-01 challenge if the HTTP-01 fails, but only if the cPanel server is
authoritative for the zone, because AutoSSL has to publish the
_acme-challenge.<domain> TXT record itself. For a domain whose DNS is
managed externally (Cloudflare, Route 53, the registrar's own panel),
AutoSSL cannot place that TXT record, and the DNS DCV fallback fails
immediately. So the two-stage failure is the normal case: HTTP-01 fails
because the CNAME points elsewhere, and DNS-01 fails because AutoSSL
cannot write to external DNS.
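A quick way to tell which side of that line a zone falls on is to compare its NS records against the server's own nameservers (the Cloudflare hostnames below are illustrative):
$ dig +short NS bayareadesignco.com
amber.ns.cloudflare.com.
kai.ns.cloudflare.com.
# External nameservers: AutoSSL cannot publish the _acme-challenge TXT
# record, so the DNS-01 fallback will always fail for this zone.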
Why this only happens with M365 (and similar)
This failure mode is specific to mail-as-a-service providers that expect customers to point named subdomains at provider endpoints via CNAME. The three common cases on our fleet:
Microsoft 365. The case in this post. Requires CNAMEs for
autodiscover, lyncdiscover, sip, and frequently enterpriseregistration
and enterpriseenrollment for Intune deployments. All five fail
AutoSSL the same way.
Google Workspace. Does not produce this failure. Google Workspace
routes mail via MX records, and its setup adds TXT records
(verification, SPF) rather than provider-pointing CNAMEs on the mail
subdomain. AutoSSL therefore still sees the cPanel server's A record
for mail.clientdomain.com, and the HTTP-01 challenge succeeds
naturally.
Zoho Mail and ProtonMail business. Both can produce this failure in
some configurations. Zoho recommends a CNAME for mail to
business.zoho.com for webmail access. That one breaks AutoSSL the
same way. ProtonMail business with custom domain plus their bridge
feature can produce the same pattern on mail.* if the customer
follows the optional setup steps.
On-server mail (Dovecot/Exim on cPanel itself). No failure. The
mail subdomains resolve to the cPanel server's IP because the
mail.clientdomain.com A record is auto-managed by cPanel and points to
the same server. HTTP-01 succeeds for the same reason it succeeds for
the primary domain.
The pattern that matters for diagnosis: if dig <subdomain> CNAME
returns a target outside the cPanel server (especially anything ending
in .outlook.com, .lync.com, .protection.outlook.com, or
.onmicrosoft.com), this incident class is the explanation.
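One pass over the usual suspects makes the pattern visible at a glance (output shown is illustrative of this client's zone; blank values mean no CNAME):
$ for sub in autodiscover lyncdiscover sip mail enterpriseregistration enterpriseenrollment; do
    printf '%-24s %s\n' "$sub" "$(dig +short "$sub.bayareadesignco.com" CNAME)"
  done
autodiscover             autodiscover.outlook.com.
lyncdiscover             webdir.online.lync.com.
sip                      sipdir.online.lync.com.
mail                     bayareadesignco-com.mail.protection.outlook.com.
enterpriseregistration
enterpriseenrollment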
The DNS verification flow
Before excluding anything, confirm the diagnosis. From the cPanel
server (or anywhere with dig):
$ dig +short autodiscover.bayareadesignco.com CNAME
autodiscover.outlook.com.
$ dig +short autodiscover.outlook.com
autodiscover-emeawest.office.com.
52.97.146.226
52.97.147.32
# ... rotating Microsoft endpoint IPs ...

The first line is the smoking gun. Any CNAME target outside our cPanel
server's domain confirms the diagnosis. If the CNAME target is
autodiscover.outlook.com specifically, the client is on M365. If the
target is autodiscover.{tenant}.onmicrosoft.com, the client uses a
direct-to-tenant CNAME (less common but valid). For the mail.*
variant:
$ dig +short mail.bayareadesignco.com CNAME
bayareadesignco-com.mail.protection.outlook.com.

mail.protection.outlook.com is the M365 Exchange Online Protection
endpoint. Same explanation, same fix.
A quick sanity check on what the cPanel server itself thinks the A record should be:
$ /usr/local/cpanel/scripts/whmapi1 --output=jsonpretty parse_dns_zone domain=bayareadesignco.com | jq '.data.dnszone[] | select(.dname=="autodiscover")'
{
"dname": "autodiscover",
"type": "CNAME",
"ttl": 14400,
"record": "autodiscover.outlook.com."
}

The local DNS zone agrees with what dig returned externally. This is
the case in every M365 migration we have seen, because the client
edits the zone in WHM's DNS Zone Editor before pointing Outlook at the
new endpoint. The CNAME is intentional and correct. Do not change
the CNAME. The client needs it for Outlook autoconfig to work. The
fix is on the AutoSSL side, not the DNS side.
The fix in two steps
Step 1: exclude the mail subdomains from AutoSSL
cPanel has a per-user "Excluded Domains" list that AutoSSL skips entirely. The list is editable in WHM (Home > SSL/TLS > Manage AutoSSL > Manage Users > Excluded Domains) and via the WHM API. For a single client:
$ whmapi1 --output=jsonpretty set_autossl_user_excluded_domains \
user=bayareade \
excluded_domains='autodiscover.bayareadesignco.com,lyncdiscover.bayareadesignco.com,sip.bayareadesignco.com,mail.bayareadesignco.com'
{
"metadata" : {
"command" : "set_autossl_user_excluded_domains",
"version" : 1,
"reason" : "OK",
"result" : 1
},
"data" : {
"excluded_domains" : [
"autodiscover.bayareadesignco.com",
"lyncdiscover.bayareadesignco.com",
"sip.bayareadesignco.com",
"mail.bayareadesignco.com"
]
}
}

The first three are always safe to exclude in an M365 deployment.
They point at Microsoft endpoints and cPanel will never be able to
serve a cert for them. The fourth (mail.*) is the case-by-case one:
exclude it only if the client has CNAMEd it to Microsoft, leave it
included if the client uses on-server webmail.
To verify the exclusion took effect, trigger a manual AutoSSL run for that user and watch the log:
$ /usr/local/cpanel/bin/autossl_check --user=bayareade
$ tail -f /var/log/autossl.log
[2026-04-15 14:22:01 +0000] info [autossl] Starting check for user "bayareade"
[2026-04-15 14:22:01 +0000] info [autossl] Skipping "autodiscover.bayareadesignco.com": user-excluded
[2026-04-15 14:22:01 +0000] info [autossl] Skipping "lyncdiscover.bayareadesignco.com": user-excluded
[2026-04-15 14:22:01 +0000] info [autossl] Skipping "sip.bayareadesignco.com": user-excluded
[2026-04-15 14:22:01 +0000] info [autossl] Skipping "mail.bayareadesignco.com": user-excluded
[2026-04-15 14:22:02 +0000] info [autossl] Issued certificate for "bayareadesignco.com, www.bayareadesignco.com"
[2026-04-15 14:22:02 +0000] info [autossl] Check complete for user "bayareade"

The nightly emails stop the following day. The cert on the primary
domain renews on its normal cycle. The mail subdomains continue to
function as M365 endpoints because they never needed a cert from our
server in the first place. The certs they need are Microsoft's, and
Microsoft serves those from autodiscover.outlook.com directly to the
Outlook client.
A note on wildcards: cPanel does not currently support an
"exclude autodiscover.* across all users" pattern at the WHM level.
The per-user exclusion is the only mechanism. We have asked cPanel for
a global pattern-based exclusion list and were told it is on the
roadmap with no committed date. Until then, every M365 migration on
every cPanel account requires its own exclusion entry.
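Until that lands, a small wrapper keeps the per-account step uniform. A sketch (exclude-m365.sh is a hypothetical helper name; the mail.* decision is deliberately left manual):
#!/bin/sh
# exclude-m365.sh <cpanel-user> <domain>
# Adds the three always-safe M365 names to the user's AutoSSL exclusions.
# NB: assuming the set_* call replaces the whole list; fetch the current
# list with get_autossl_user_excluded_domains and merge first if the
# user already has exclusions.
user=$1 domain=$2
whmapi1 set_autossl_user_excluded_domains user="$user" \
  excluded_domains="autodiscover.$domain,lyncdiscover.$domain,sip.$domain"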
Step 2: document it for next time
This is the step that prevents the slow-burn recurrence. After excluding the subdomains, write an internal note that links the cPanel user to the M365 deployment. We use a single Markdown file in our agency's runbook repo with one line per client:
goldenvi    Google Workspace    (no AutoSSL exclusions)
northwood   on-server Dovecot   (no AutoSSL exclusions)
bayareade   Microsoft 365       autodiscover, lyncdiscover, sip, mail
tallpine    Microsoft 365       autodiscover, lyncdiscover, sip
aspenroo    Zoho Mail           mail
The reason this note matters is the failure mode's other recurrence trigger: domain transfers. When a new client transfers a domain to our server and we run cPanel's transfer tool, the DNS zone comes with it, including the M365 CNAMEs. AutoSSL on the new server starts trying to validate those subdomains the same night and the failure email cycle begins again from zero. Having the per-client M365 list as a checklist item in the transfer-in runbook collapses a six-week recurrence into a three-minute fix at transfer time.
Edge case: when the client wants the cert anyway
Some compliance regimes (PCI-DSS scoped environments, certain healthcare contexts) require every subdomain under a regulated domain to have a valid TLS cert, even when the subdomain is delegated to a third-party endpoint. This is rare in practice (most auditors accept "the subdomain delegates to Microsoft and Microsoft serves a valid cert" as compliance) but when it does come up the fix is to issue the cert via DNS-01 challenge out-of-band.
The short version: install acme.sh on the cPanel server, configure
it with API credentials for the client's external DNS provider
(Cloudflare's CF_Token, Route 53's IAM keys, etc.), issue the cert
with the DNS-01 challenge against the third-party DNS, and install it
manually into the cPanel user's SSL store via whmapi1 installssl.
The cert is then renewed by a cron job that runs acme.sh --renew
rather than by AutoSSL.
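A sketch of that flow under stated assumptions: Cloudflare DNS with acme.sh's dns_cf hook and its CF_Token convention; --keylength 2048 keeps the output under ~/.acme.sh/<domain>/ (acme.sh's ECC default stores under <domain>_ecc instead); the install step feeds the resulting files to whmapi1 installssl as the post describes.
# Issue via DNS-01 against the client's external DNS (Cloudflare here).
$ export CF_Token="<scoped-api-token>"
$ acme.sh --issue --dns dns_cf --keylength 2048 -d autodiscover.bayareadesignco.com

# Install the result into the cPanel user's SSL store.
$ d=autodiscover.bayareadesignco.com
$ whmapi1 installssl domain="$d" \
    crt="$(cat ~/.acme.sh/$d/$d.cer)" \
    key="$(cat ~/.acme.sh/$d/$d.key)" \
    cab="$(cat ~/.acme.sh/$d/ca.cer)"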
cPanel's AutoSSL does not natively support DNS-01 against external
DNS providers, which is why this has to live as an out-of-band cron.
We treat this path as the exception, not the default: it adds a
renewal mechanism the cPanel admin has to remember exists, and the
first time it silently fails (Cloudflare API token expires, IAM key
rotates) the failure mode is invisible. No email, no log line in
/var/log/autossl.log, just a cert that quietly expires. If you walk
this path, monitor the cert directly via openssl s_client rather than
trusting the renewal script.
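The direct check is one line (the expiry date shown is illustrative):
$ echo | openssl s_client -connect autodiscover.bayareadesignco.com:443 \
    -servername autodiscover.bayareadesignco.com 2>/dev/null \
  | openssl x509 -noout -enddate
notAfter=Jul 12 03:14:07 2026 GMT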
A 5-minute audit
This is the script we now run quarterly across our cPanel fleet to
catch the M365 pattern on servers we have not touched in a while. From
a workstation with SSH access to each server in ~/.ssh/config:
# Step 1: list every domain (and its owning user) on every cPanel
# server, flagging any whose CNAME matches the M365 pattern.
for srv in $(awk '/^Host server[0-9]/ {print $2}' ~/.ssh/config); do
  ssh "$srv" '/usr/local/cpanel/bin/whmapi1 --output=jsonpretty get_domain_info' \
    | jq -r '.data.domains[] | "\(.domain) \(.user)"' \
    | while read -r d u; do
        cname=$(dig +short "$d" CNAME)
        case "$cname" in
          *.outlook.com.|*.lync.com.|*.protection.outlook.com.|*.onmicrosoft.com.)
            echo "M365_PATTERN $srv $d $u -> $cname"
            ;;
        esac
      done
done > /tmp/m365-autossl-audit.txt
# Step 2: for each match, check whether it is already excluded.
while read -r _ srv domain user _; do
  ssh "$srv" "whmapi1 --output=jsonpretty get_autossl_user_excluded_domains user=$user" \
    | jq -r --arg d "$domain" \
        '.data.excluded_domains // [] | if index($d) then "OK_EXCLUDED \($d)" else "NEEDS_EXCLUSION \($d)" end'
done < /tmp/m365-autossl-audit.txt

The output is two lists: domains matching the M365 CNAME pattern, and which of those are already excluded vs which are still in the failure loop. The first time we ran this across our fleet we found seventeen domains across eleven cPanel users that had been emailing failure notices nightly for periods ranging from three weeks to four months. None of them had ever been reported by a client because none of them caused a user-visible problem. They just chewed through admin attention every morning.
Two cPanel administration patterns deserve their own posts and are covered separately: the layered SSH brute-force defence on cPanel servers ("SSH brute force on cPanel: the 8,127-attempt night and the fix"), and the AutoSSL-adjacent issue of certificate expiry hiding behind a CSF, lfd, and Imunify360 conflict that masks renewal alerts behind firewall noise. Both share the same core failure shape as this incident: a quiet background subsystem producing alerts that look identical day after day, until they stop being read.
How ServerGuard handles this
The AutoSSL M365 pattern is one of the SSL-renewal use cases we cover. It is one of the highest-frequency and lowest-severity incidents in our fleet, which makes it a clean fit for ServerGuard's Safe-action tier. The failure is unambiguous when the CNAME points at a Microsoft endpoint, the fix is reversible (an exclusion can be removed in a single API call), and the cost of a wrong auto-exclusion is small (one renewal cycle of a cert we did not need anyway).
Detection. ServerGuard ingests AutoSSL signals from two sources
in parallel. The first is the same nightly email the admin gets,
parsed via the IMAP integration on the agency's central admin mailbox,
which is enough to catch the failure within hours of the first
occurrence. The second is the AutoSSL log itself, scraped via the
same SSH executor that handles every other diagnostic, looking for
the error [autossl] ... all DCV methods exhausted line and
correlating repeats across nights. Both sources land in the same
incident record so a single failure that arrived only via log (the
admin's email filter sent the AutoSSL notice to a folder) does not
fall through the cracks. The detection layer is implemented today.
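The log side of that detection reduces, at its core, to counting repeats of the exhausted-DCV line per domain. A simplified illustration of the signal, not ServerGuard's actual executor:
$ grep 'all DCV methods exhausted' /var/log/autossl.log \
    | awk '{print $7}' | sort | uniq -c | sort -rn
      6 "autodiscover.bayareadesignco.com":
      6 "mail.bayareadesignco.com":
# $7 is the quoted domain field in the log format shown earlier.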
Diagnosis. When the incident record fires, ServerGuard runs
dig <failing-subdomain> CNAME on the failing subdomain plus the
sibling subdomains that share its mail pattern (autodiscover,
lyncdiscover, sip, mail, enterpriseregistration,
enterpriseenrollment). It classifies the CNAME target against a
fixed pattern list: anything ending in .outlook.com, .lync.com,
.protection.outlook.com, or .onmicrosoft.com classifies as
unambiguous M365. Anything ending in .zoho.com classifies as Zoho
Mail. Anything ending in .protonmail.ch classifies as ProtonMail.
Anything else (a CNAME pointing at a non-mail third-party,
or no CNAME at all) classifies as ambiguous and triggers the
manual escalation path instead. The classifier is implemented today
for the M365 case (the highest-frequency one); the Zoho and ProtonMail
classifiers are upcoming and will follow the same pattern.
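The classification step itself is small. A sketch of the pattern list described above (illustrative shell, not ServerGuard's internal code):
classify_cname() {
  case "$1" in
    *.outlook.com.|*.lync.com.|*.protection.outlook.com.|*.onmicrosoft.com.)
      echo "m365" ;;          # unambiguous: Safe-action tier
    *.zoho.com.)
      echo "zoho" ;;          # upcoming classifier
    *.protonmail.ch.)
      echo "protonmail" ;;    # upcoming classifier
    *)
      echo "ambiguous" ;;     # manual escalation path
  esac
}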
Action (Safe, auto). When the classifier returns unambiguous M365
AND the failure has repeated for at least three consecutive AutoSSL
runs, ServerGuard calls whmapi1 set_autossl_user_excluded_domains
to add the failing subdomain to the per-user exclusion list. The
action is logged into the audit log with the dig output and the AutoSSL
log lines that triggered it, and the next AutoSSL run is monitored to
confirm the exclusion took effect. The Safe-action auto-exclude is
implemented today with the unambiguous-pattern + 3-failure guardrail.
Action (manual escalation). For ambiguous cases (a CNAME we have
not seen before, a mixed pattern where autodiscover points at Microsoft
but mail points at the cPanel server, or a failure on a subdomain
without any CNAME) ServerGuard does not auto-exclude. It alerts
the on-call engineer with the full diagnosis (dig output, AutoSSL log
excerpt, the proposed exclusion list, and the reason the case did not
clear the unambiguous bar), waits for a human to approve or reject,
and only then applies the exclusion. This is the standard
human-in-the-loop pattern and applies to every action
ServerGuard is not confident enough to take alone. The manual
escalation path is implemented today.
What ServerGuard does not do. It does not modify DNS records, ever. Even when the root cause of an AutoSSL failure is a misconfigured CNAME (and we have seen this exact pattern: a client adds a CNAME to a Microsoft endpoint when they should have pointed it at a different tenant), fixing it requires coordinating with the client, and the change has to happen at the client's DNS provider, not on our server. ServerGuard's role in those cases ends at the alert: it identifies the misconfiguration, includes the suggested correction in the alert body, and waits for a human to talk to the client. Modifying DNS on the customer's behalf is not on the roadmap.
The honest version of the ServerGuard story on this incident class: we collapse a six-week tail-of-the-inbox problem into an action that fires within the same nightly cycle as the failure, with a guardrail that prevents auto-excluding a cert the client actually needs. The action is small. The compounding value is in how many of these small actions happen automatically across the fleet every night without an engineer reading another lookalike email.
Related posts
- When the client changes DNS without telling you first (6 min read)
- 86 CPU spikes in 24 hours: a multi-cause cascade postmortem (15 min read)
- When you have to suspend a WooCommerce client: anatomy (6 min read)