Status: this is a design draft created by Maël Valais on 12 April 2022 and updated on 9 June 2023.
Issues:
TL;DR: We recommend supporting DNS-over-HTTPS as specified in RFC 8484 by extending the existing --dns01-recursive-nameservers flag. This will unblock at least the 12 people who reacted on the PR and issue. The aim isn't to fully support all secure DNS protocols; instead, the goal is to work around egress limitations (egress port 53 closed) in companies with restricted egress access. If someone wants further secure DNS protocols, they can instead use one of the DoH proxies listed below.
Some companies forbid traffic over UDP port 53 and only allow egress TCP traffic over port 443. In this case, cert-manager is unable to perform the ACME DNS-01 self-check.
This is a stronger version of the split-horizon DNS problem. Running cert-manager with the --dns01-recursive-nameservers flag does not solve it, since cert-manager can't do DNS lookups over UDP port 53. For example, this would not work:
--dns01-recursive-nameservers=1.1.1.1:53
Users affected:
Florian has an in-flight PR: #5003.
There are two DNS-over-HTTPS protocols: the DNS wire format, standardized in RFC 8484, and the JSON API format.
The DNS Wire format is supported by 63 providers (as of 11 April 2022, the providers are listed in the curl wiki page DNS over HTTPS).
On the other side, the JSON API format is supported by 3 providers:
Although it hasn't been standardized, the JSON API is mostly well documented in the Cloudflare Using JSON page.
The JSON API is easier to write a client for, and easier to test and debug. It supports TXT, but it doesn't support SOA, CNAME, or CAA. We also found that only 3 providers support it, and the JSON format only covers a subset of the DNS records.
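To make the difference between the two protocols concrete, here is a minimal sketch of how a GET request could be built for each. The wire-format variant follows RFC 8484 (the DNS message is base64url-encoded without padding and sent in the dns parameter); the JSON variant uses the name and type parameters accepted by the JSON APIs. The function names and the doh.example endpoint are hypothetical and only serve the illustration.

```python
import base64
import struct
from urllib.parse import urlencode

TYPE_TXT = 16  # DNS record type number for TXT
CLASS_IN = 1   # DNS class number for Internet


def build_wire_query(name: str, qtype: int = TYPE_TXT) -> bytes:
    """Encode a single-question DNS query in wire format."""
    # Header: ID=0 (RFC 8484 recommends 0 to help HTTP caching),
    # flags=0x0100 (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, CLASS_IN)


def wire_format_url(endpoint: str, name: str, qtype: int = TYPE_TXT) -> str:
    """RFC 8484 GET: the query is base64url-encoded without padding."""
    encoded = base64.urlsafe_b64encode(build_wire_query(name, qtype)).rstrip(b"=")
    return f"{endpoint}?dns={encoded.decode('ascii')}"


def json_api_url(endpoint: str, name: str, rtype: str = "TXT") -> str:
    """JSON API GET: the question is passed as plain query parameters."""
    return f"{endpoint}?{urlencode({'name': name, 'type': rtype})}"
```

The wire-format request carries a full DNS message, so any record type can be queried; the JSON request is easier to read and debug by hand, which matches the trade-off described above.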
The plan is to extend the existing global flag --dns01-recursive-nameservers, e.g.:
cert-manager-controller \
  --dns01-recursive-nameservers "https://8.8.8.8/resolve"
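Extending the flag means that each value has to be recognized as either a classic host:port nameserver or a DNS-over-HTTPS URL. The sketch below shows one way this classification could work. It assumes that the flag value is a comma-separated list and that only the https:// scheme selects DNS-over-HTTPS; classify_nameserver and parse_flag are hypothetical helpers, not cert-manager functions.

```python
from typing import List, Tuple
from urllib.parse import urlsplit


def classify_nameserver(value: str) -> Tuple[str, str]:
    """Return ("doh", url) for https:// values and ("dns", "host:port") otherwise."""
    if value.startswith("https://"):
        if not urlsplit(value).hostname:
            raise ValueError(f"DoH URL has no host: {value!r}")
        return ("doh", value)
    if "://" in value:
        # Assumption: any other scheme (e.g. http://) is rejected.
        raise ValueError(f"unsupported scheme in {value!r}")
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    return ("dns", value)


def parse_flag(flag_value: str) -> List[Tuple[str, str]]:
    """Split a comma-separated flag value and classify each entry."""
    return [
        classify_nameserver(entry.strip())
        for entry in flag_value.split(",")
        if entry.strip()
    ]
```

With this approach, existing values such as 8.8.8.8:53 keep their current meaning, and only values starting with https:// opt in to DNS-over-HTTPS.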
Later on, if someone needs to select the DNS resolver per issuer, we could also add a field to the issuer to override the global flag. We chose not to implement it for now. It would look like this:
# This is a FUTURE POSSIBLE example.
apiVersion: cert-manager.io/v1
kind: Issuer
spec:
  acme:
    solvers:
    - dns01:
        selfCheck:
          dnsOverHTTPSRFC: https://8.8.8.8/resolve
5 July 2022 (Mael, Florian): Field vs. Flag: Today we have identified the two arguments in favor of going with a field and not a flag:
--recursive-nameservers with two DNS-01 providers being in conflict. Admittedly, --recursive-nameservers is not as conflicting as having a flag for DNS-over-HTTPS, which would break any non-DNS-over-HTTPS-enabled DNS-01 providers).
We don't have strong arguments in favor of the field (over the flag). We aim to go with the field approach since it makes end-to-end testing much easier.
26 July 2022 (Sven): DNS-over-HTTPS over HTTP proxy: On top of forbidding traffic over UDP port 53, companies often require the use of an HTTP proxy (as opposed to allowing egress HTTPS traffic for specific domains). Sven Schliesing pointed out that the cloudflared solution does not deal with the HTTP proxy problem.
Another hurdle when using an HTTP proxy is that, instead of using the standard CONNECT method, some HTTP proxies do TLS re-encryption with their own root certificate. There are also HTTP proxies requiring NTLM authentication.
To conclude this section, we suggest adding dnsOverHTTPSJSONEndpoint to support the minimal set of features that will unblock at least 7 people.
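As an illustration of how small that minimal feature set could be, the sketch below checks a JSON API response for the expected challenge value. It assumes the response shape documented for the JSON APIs (a Status code and an Answer list whose entries carry type and data) and that TXT data may or may not be wrapped in double quotes depending on the provider. The function names are hypothetical.

```python
import json
from typing import List

TYPE_TXT = 16  # DNS record type number for TXT


def txt_values(body: str) -> List[str]:
    """Extract TXT values from a JSON API response body."""
    doc = json.loads(body)
    if doc.get("Status") != 0:  # 0 means NOERROR
        return []
    values = []
    for answer in doc.get("Answer", []):
        if answer.get("type") != TYPE_TXT:
            continue
        data = answer.get("data", "")
        # Assumption: some providers wrap TXT data in double quotes;
        # strip one surrounding pair if present.
        if len(data) >= 2 and data[0] == '"' and data[-1] == '"':
            data = data[1:-1]
        values.append(data)
    return values


def challenge_is_propagated(body: str, expected_key: str) -> bool:
    """Return True when the expected challenge key is among the TXT values."""
    return expected_key in txt_values(body)
```

Only the TXT lookup is needed for this check, which is why the JSON API's limited record support is acceptable for the self-check itself.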
TL;DR: although DoH proxies exist and work for the most part, we have found that this approach is challenging to implement (lack of maintained containers, requires CAP_ADMIN).
As proposed on the Kubernetes Slack by Matthew de Haast, it is possible to enable DNS-over-HTTPS with cert-manager by deploying the cloudflared proxy as a deployment/service in the same namespace as cert-manager. cert-manager is then set to point to that service for DNS queries. This workaround has been tested and solves a split DNS issue with an AWS hosted zone. To learn more about cloudflared, you can visit: https://developers.cloudflare.com/1.1.1.1/encryption/dns-over-https/dns-over-https-client/.
It is also possible to use the sidecar approach (i.e., run that container in the same pod as the cert-manager-controller container), but that requires a lot of changes to the cert-manager Helm chart.
Cloudflared is not the only alternative. In June 2023, we found that no Kubernetes-enabled DNS proxy tool fits the bill:
| Project | Stars | State |
|---|---|---|
| AdguardTeam/dnsproxy | 1800 | no image |
| aarond10/https_dns_proxy | 705 | no upstream image, unofficial image moranbw/https-dns-proxy-docker |
| DNSCrypt/doh-server | 573 | no image |
| facebookarchive/doh-proxy | 462 | archived |
| satishweb/docker-doh | 88 | no image |
| junkurihara/doh-server (DNSCrypt fork) | 1 | no upstream image, unofficial image jqtype/doh-server |
| jacobwoffenden/container-doh-proxy (cloudflared-based) | 0 | official image ghcr.io/jacobwoffenden/doh-proxy |
We tried aarond10/https_dns_proxy since it had an image available (although unofficial) and its many stars suggest that it is somewhat maintained. It worked.
That said, since no official image is available, we do not recommend using it or any of the other tools in this list. Instead, we recommend implementing DNS-over-HTTPS in cert-manager. For more elaborate use cases (such as DNS-over-HTTPS using the RFC protocol), we recommend using one of the DoH proxies that are still maintained.
TL;DR: "Authoritative nameserver lookup" and "CNAME follow" need to be disabled when using DNS-over-HTTPS (see conclusion).
For context, when solving a DNS-01 challenge, cert-manager does DNS queries at two moments:
1. Before creating the TXT record, cert-manager calls FindZoneByFqdn to find the apex domain of the zone to which it needs to add the TXT record. It finds it by looking for the first SOA record, starting with the domain on which the TXT record is meant to be inserted.
2. After creating the TXT record, cert-manager does a self-check by querying the TXT record in the function checkDNSPropagation.
Years ago, cert-manager used to do a simple DNS lookup. Nowadays, cert-manager has three different DNS schemes:
[…] NS record to fetch the TXT record.
[…] _acme-challenge.domain.com. using 8.8.8.8.
[…] CNAME it follows it, and when it finds a CAA record it does something (I can't remember).
[…] SOA record, it fetches the NS record using 8.8.8.8.
[…] TXT record.
[…] --dns01-recursive-nameservers-only=true)
[…] TXT record.
In the three modes, it is possible to change the default 8.8.8.8 nameserver by using the flag --dns01-recursive-nameservers.
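The zone discovery attributed to FindZoneByFqdn earlier in this section (looking for the first SOA record, starting with the domain on which the TXT record is meant to be inserted) can be sketched as a walk up the labels of the name. This is a simplified illustration rather than cert-manager's implementation: find_zone_by_fqdn is a hypothetical Python counterpart, and the has_soa callback stands in for whatever lookup (UDP, or DNS-over-HTTPS) is configured.

```python
from typing import Callable, Optional


def find_zone_by_fqdn(fqdn: str, has_soa: Callable[[str], bool]) -> Optional[str]:
    """Return the closest enclosing name that has an SOA record, or None."""
    labels = fqdn.rstrip(".").split(".")
    # Try the full name first, then drop the leftmost label at each step.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:]) + "."
        if has_soa(candidate):
            return candidate
    return None
```

For a challenge on _acme-challenge.delegated.domain.com., the walk tries that name, then delegated.domain.com., then domain.com., and stops at the first one for which has_soa returns True. Each step needs an SOA query, which is why a resolver limited to TXT lookups cannot support this step.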
Note: I think talking about "recursive name server" isn't a good idea. 99.9% of end-user DNS servers (e.g., 1.1.1.1) and DoH endpoints are recursive resolvers. The only non-recursive name servers are the NS servers (i.e., the authoritative name servers), which are never used by anyone (except other name servers), so there is no good reason to talk about "recursive" name servers in the documentation. I think it was a mistake to mention "recursive" in the flag --dns01-recursive-nameservers; it creates doubt for the user for no good reason. No one would think that they need to give some NS IPs.
But our current DNS-01 self-check does not simply rely on a single DNS lookup. To work around split-horizon DNS, where cert-manager and Let's Encrypt rely on different name servers, we introduced a workaround a long time ago. We call it the "authoritative nameserver lookup". The authoritative nameserver lookup does what a recursive name server would do if we were to query it. Mimicking what a recursive DNS resolver would do aims to solve two issues:
But in the face of restricted DNS environments, as James Munnelly explains in Solutions for split-horizon and restricted DNS environment issues (written in 2018), the "authoritative nameserver lookup" won't work. In 2018, the problem was stated as:
A user has configured their cluster/VPC so that all outbound traffic on port 53 is denied, except for the one cluster DNS server (i.e. kube-dns, or their route53 resolver).
In which case the "authoritative nameserver lookup" does not make sense and has to be disabled, since cert-manager can only expect to be talking to that one resolver.
But what if all outbound traffic over UDP port 53 is blocked, and there is no way to contact an outside recursive nameserver over UDP port 53?
Other cert-manager issues in which using a DNS lookup instead of mimicking the behavior of a recursive DNS resolver would solve the issue:
Comment from Jake Sanders: Richard Wall is working on Readiness gates for ACME challenges. Perhaps we could work out what the API looks like for them and see if it makes sense to add conditions for "passed DNS check", or "passed DNS over HTTPS check".
Conclusion 9 June 2023:
The reason we talk about "authoritative name servers" in cert-manager's DNS-01 code base is that cert-manager's self-check tries to be as close as possible to how the challenge will be performed by Let's Encrypt.
Let's Encrypt queries the authoritative name servers via Unbound's resolver. Unlike our laptops and smartphones, which rely on a non-authoritative name server that does the recursive lookup for us, Unbound first queries the NS record for the top-level domain (e.g., .com), and then recursively finds the name servers of the zone in which the challenged domain is located. For example, that would be delegated.domain.com for the challenge _challenge.delegated.domain.com.
Imagine a setup with 1 primary name server and 3 secondaries: you will hit 4-ns-out-of-sync if one of the 3 secondaries isn't in sync with the primary.
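To illustrate the out-of-sync case, the sketch below assumes a self-check that has already collected the TXT values returned by each authoritative name server and reports the servers that do not yet serve the expected challenge value. The function name and the input shape are hypothetical.

```python
from typing import Dict, List


def out_of_sync_nameservers(answers: Dict[str, List[str]], expected: str) -> List[str]:
    """Return the name servers whose TXT answers lack the expected value."""
    return sorted(ns for ns, values in answers.items() if expected not in values)
```

With 1 primary and 3 secondaries, a single lagging secondary is enough for this list to be non-empty. A check that queries only one recursive resolver cannot observe this, since the resolver may happen to ask a server that is already in sync.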