Performance Benefits
Introduction: causes and mitigations of DNS latency
As web pages become more complex, referencing resources from numerous domains, DNS lookups can become a significant bottleneck in the browsing experience. Whenever a client needs to query a DNS resolver over the network, the latency introduced can be significant, depending on the proximity and number of name servers the resolver has to query (more than 2 is rare, but it can happen). As an example, the following screenshot shows the timings reported by the Page Speed web performance measurement tool.
Each bar represents a resource referenced from the page; the black segments indicate DNS lookups. In this page, 13 lookups are made in the first 11 seconds in which the page is loaded.
Although several of the lookups are done in parallel, the screenshot shows that 5 serial lookup times are required, accounting for several seconds of the total 11-second page load time.

There are two components to DNS latency:
- Latency between the client (user) and the DNS resolving server.
In most cases this is largely due to the usual round-trip time (RTT) constraints of networked systems: geographical distance between the client and server machines; network congestion; packet loss and long retransmit delays (one second on average); overloaded servers; denial-of-service attacks; and so on.
- Latency between resolving servers and other name servers.
This source of latency is caused primarily by the following factors:
- Cache misses.
If a response cannot be served from the resolver's cache but requires recursively querying other name servers, the added network latency is considerable, especially if the authoritative servers are geographically remote.
- Underprovisioning.
If DNS resolvers are overloaded, they must queue DNS resolution requests and responses, and may begin dropping and retransmitting packets.
- Malicious traffic.
Even if a DNS service is overprovisioned, DoS traffic can place undue load on the servers. Similarly, Kaminsky-style attacks can involve flooding resolvers with queries that are guaranteed to bypass the cache and require outgoing requests for resolution.
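To get a feel for the first component, the client-side lookup path can be timed directly from application code. The following is a minimal Python sketch (the hostname passed in is an arbitrary example; note that `getaddrinfo` may be answered from the OS cache or a hosts file, so the numbers are a rough client-side measurement, not a guaranteed full recursive resolution):

```python
import socket
import time

def time_dns_lookup(hostname: str) -> float:
    """Time a single stub-resolver lookup, returning milliseconds elapsed."""
    start = time.perf_counter()
    socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    return (time.perf_counter() - start) * 1000.0

# Example usage (any hostname works):
# print(f"{time_dns_lookup('example.com'):.1f} ms")
```

Repeating the call for the same name typically shows the first (cold) lookup dominating, which is the latency this page is concerned with.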
We believe that the cache miss factor is the most dominant cause of DNS latency, and we discuss it further below.
Cache misses
Even if a resolver has abundant local resources, the fundamental delays associated with talking to remote name servers are hard to avoid. In other words, assuming the resolver is provisioned well enough that cache hits take zero time on the server side, cache misses remain very expensive in terms of latency.
To handle a miss, a resolver has to talk to at least one, but often two or more, external name servers.
Operating the Googlebot web crawler, we have observed an average resolution time of 130 ms for name servers that respond.
However, a full 4-6% of requests simply time out, due to UDP packet loss and servers being unreachable. If we take into account failures such as packet loss, dead name servers, and DNS configuration errors, the actual average end-to-end resolution time is 300-400 ms. However, there is high variance and a long tail.
Although the cache miss rate may vary among DNS servers, cache misses are fundamentally difficult to avoid, for the following reasons:
- Internet size and growth.
Quite simply, as the Internet grows, both through the addition of new users and of new sites, most content is of marginal interest. While a few sites (and consequently DNS names) are very popular, most are of interest to only a few users and are accessed rarely, so the majority of requests result in cache misses.
- Low time-to-live (TTL) values.
The trend toward lower DNS TTL values means that resolutions require more frequent lookups.
- Cache isolation.
DNS servers are typically deployed behind load balancers that assign queries to different machines at random. This results in each individual server maintaining a separate cache, rather than being able to reuse cached resolutions from a shared pool.
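The TTL effect can be illustrated with a toy cache model (a hypothetical sketch for illustration only, not how any production resolver is implemented): an entry expires TTL seconds after it is stored, so a steadily queried name forces one upstream lookup per TTL window.

```python
import time

class TtlCache:
    """Toy resolver cache: entries expire after the record's TTL."""

    def __init__(self):
        self._store = {}          # name -> (answer, expiry timestamp)
        self.upstream_queries = 0  # number of cache misses so far

    def resolve(self, name, ttl, now=None):
        now = time.monotonic() if now is None else now
        hit = self._store.get(name)
        if hit and hit[1] > now:
            return hit[0]                  # served from cache
        self.upstream_queries += 1         # miss: ask an upstream name server
        answer = f"addr-of-{name}"         # placeholder answer
        self._store[name] = (answer, now + ttl)
        return answer

cache = TtlCache()
# Query the same name once per second for 60 simulated seconds.
for t in range(60):
    cache.resolve("example.com", ttl=10, now=t)
print(cache.upstream_queries)  # 6: one upstream lookup per 10-second TTL window
```

Halving the TTL doubles the upstream lookup rate for such a steadily queried name, which is why the trend toward low TTLs raises the aggregate miss rate.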
Mitigations
In Google Public DNS, we have implemented several approaches to speeding up DNS lookup times. Some of these approaches are fairly standard; others are experimental:
- Provisioning servers adequately to handle the load from client traffic, including malicious traffic.
- Preventing DoS and amplification attacks.
Although this is mostly a security issue, and it affects closed resolvers less than open ones, preventing DoS attacks also benefits performance by eliminating the extra traffic burden placed on DNS servers. For information on the approaches we use to minimize the chance of attacks, see the page on security benefits.
- Load-balancing for shared caching, to improve the aggregate cache hit rate across the serving cluster.
- Providing global coverage for proximity to all users.
Provisioning serving clusters adequately
Caching DNS resolvers have to perform more expensive operations than authoritative name servers, since many responses cannot be served from memory; instead, they require communication with other name servers and thus demand a lot of network input/output. Furthermore, open resolvers are highly vulnerable to cache-poisoning attempts, which increase the cache miss rate (such attacks specifically send requests for bogus names that can't be resolved from the cache), and to DoS attacks, which add to the traffic load.
If resolvers are not provisioned adequately and cannot keep up with the load, this can have a very negative impact on performance. Packets get dropped and need to be retransmitted, name server requests have to be queued, and so on. All of these factors add to delays.
Therefore, it's important for DNS resolvers to be provisioned for high-volume input/output.
This includes handling possible DDoS attacks, for which the only effective solution is to over-provision with many machines. At the same time, however, it's important not to reduce the cache hit rate when you add machines; this requires implementing an effective load-balancing policy, which we discuss below.
Load-balancing for shared caching
Scaling resolver infrastructure by adding machines can actually backfire and reduce the cache hit rate if load balancing is not done properly. In a typical deployment, multiple machines sit behind a load balancer that distributes traffic equally to each machine, using a simple algorithm such as round robin. The result is that each machine maintains its own independent cache, so that cached content is isolated across machines. If each incoming query is distributed to a random machine, then depending on the nature of the traffic, the effective cache miss rate can increase proportionally.
For example, for names with long TTLs that are queried repeatedly, the cache miss rate can increase by a factor of the number of machines in the cluster. (For names with very short TTLs, that are queried very infrequently, or that result in uncacheable responses (0 TTL and errors), the cache miss rate is not really affected by adding machines.)
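A small simulation makes the fragmentation effect concrete (a hypothetical sketch; the name set, query counts, and cluster size are made up). With random routing, every machine eventually pays its own cold miss for each popular long-TTL name, so cold misses grow roughly with the machine count; routing by a hash of the name bounds cold misses by the number of distinct names:

```python
import random

def cold_misses(queries, n_machines, pick):
    """Count cold-cache misses when each query is routed by pick(name, n)."""
    caches = [set() for _ in range(n_machines)]
    misses = 0
    for name in queries:
        cache = caches[pick(name, n_machines)]
        if name not in cache:
            misses += 1       # first time this machine sees the name
            cache.add(name)
    return misses

random.seed(0)
names = [f"name{i}.example" for i in range(100)]
queries = [random.choice(names) for _ in range(10_000)]  # popular, long-TTL names

rand_misses = cold_misses(queries, 8, lambda n, k: random.randrange(k))
hashed_misses = cold_misses(queries, 8, lambda n, k: hash(n) % k)
print(rand_misses, hashed_misses)  # random routing: roughly 8x the cold misses
```

Real deployments would use a stable or consistent hash rather than Python's per-process `hash()`, but the proportional-increase effect is the same.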
To boost the hit rate for cacheable names, it's important to load-balance servers so that the cache is not fragmented. In Google Public DNS, we have two levels of caching. In one pool of machines, very close to the user, a small per-machine cache contains the most popular names. If a query cannot be satisfied from this cache, it is sent to another pool of machines that partition the cache by name. For this second-level cache, all queries for the same name are sent to the same machine, where the name is either cached or it isn't.
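The routing for such a name-partitioned pool can be as simple as a stable hash of the normalized query name. A sketch under assumed parameters (the pool size and hash choice here are illustrative, not Google's; production systems typically prefer consistent hashing so that adding or removing a machine only remaps a fraction of the names):

```python
import hashlib

N_BACKENDS = 16  # hypothetical size of the name-partitioned cache pool

def backend_for(name: str) -> int:
    """Pick the second-level machine that owns this query name.

    DNS names are case-insensitive, so the name is lowercased first;
    a stable hash (not Python's salted hash()) keeps the mapping
    identical across processes and restarts.
    """
    digest = hashlib.sha256(name.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % N_BACKENDS

# Every query for a given name agrees on one backend, so its cache
# entry exists exactly once in the pool instead of once per machine.
assert backend_for("example.com") == backend_for("EXAMPLE.COM")
```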
Distributing serving clusters for wide geographical coverage
For closed resolvers, this is not really an issue. For open resolvers, the closer your servers are located to your users, the less latency they will see at the client end. In addition, having sufficient geographical coverage can indirectly improve end-to-end latency, as name servers typically return results optimized for the DNS resolver's location. That is, if a content provider hosts mirrored sites around the world, that provider's name servers will return the IP address in closest proximity to the DNS resolver.
Google Public DNS is hosted in data centers worldwide, and it uses anycast routing to send users to the geographically closest data center.
In addition, Google Public DNS supports EDNS Client Subnet (ECS), a DNS protocol extension that lets resolvers forward the client's location to name servers, which can then return location-sensitive responses optimized for the actual client IP address rather than the resolver's IP address. See this FAQ for details.
Google Public DNS automatically detects name servers that support EDNS Client Subnet.
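On the wire, ECS is an EDNS0 option (option code 8, defined in RFC 7871) carrying an address family, a source prefix length, a scope prefix length (0 in queries), and the client address truncated to the prefix. A minimal encoding sketch (assuming IPv4 and a byte-aligned prefix; RFC 7871 additionally requires zeroing any bits beyond a non-byte-aligned prefix, which this sketch omits):

```python
import ipaddress
import struct

def build_ecs_option(client_ip: str, source_prefix_len: int = 24) -> bytes:
    """Encode the EDNS Client Subnet option for inclusion in an OPT record.

    Layout: OPTION-CODE (8), OPTION-LENGTH, FAMILY (1 = IPv4),
    SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH (0 in queries),
    then the address truncated to the prefix length.
    """
    addr = ipaddress.IPv4Address(client_ip)
    # Only the prefix the resolver is willing to reveal is sent upstream.
    n_bytes = (source_prefix_len + 7) // 8
    addr_bytes = addr.packed[:n_bytes]
    payload = struct.pack("!HBB", 1, source_prefix_len, 0) + addr_bytes
    return struct.pack("!HH", 8, len(payload)) + payload

opt = build_ecs_option("192.0.2.44", 24)
print(opt.hex())  # 0008000700011800c00002
```

Truncating to a /24 rather than sending the full address is the privacy trade-off ECS makes: the name server gets enough locality to pick a nearby mirror without seeing the exact client IP.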
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated (UTC): 2025-07-25.