Dynamic URLs vs. static URLs
Thursday, October 16, 2008
Posted by: Juliane Stiller and Kaspar Szymanski, Search Quality Team
Original post: Dynamic URLs vs. static URLs
Published: Monday, September 22, 2008, 3:20 PM
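Talking with webmasters often reveals widely held beliefs that may once have been accurate but no longer reflect the current situation. That happened recently when we were discussing URL structure with a few friends. One friend was worried about using dynamic URLs and even believed that "search engines can't handle dynamic URLs." Another felt that dynamic URLs were no problem at all for search engines and that those issues were a thing of the past. A third admitted never having understood the fuss about dynamic URLs compared with static URLs. For us, that was the moment we decided to take a closer look at dynamic and static URLs. First, let's define what we're talking about: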
What is a static URL?
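A static URL, as the name suggests, is a URL that does not change, so it typically contains no URL parameters. For example: https://www.example.com/archive/january.htm. You can search for static URLs on Google by typing filetype:htm in the search box. Updating pages of this kind can be time-consuming, especially when the amount of information grows quickly, because every single page has to be hard-coded. This is why webmasters who run large, frequently updated sites such as online shops, forum communities, blogs, or content management systems may use dynamic URLs.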
What is a dynamic URL?
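If a site's content is stored in a database and pages are generated on request, dynamic URLs may be used. In that case, the site essentially serves as a template for the content. A dynamic URL typically looks something like this: https://code.google.com/p/google-checkout-php-sample-code/issues/detail?id=31. You can spot dynamic URLs by looking for characters such as ?, = and &. One drawback of dynamic URLs is that different URLs can have the same content, so different users may link to URLs with different parameters that all serve the same content. That is one reason why webmasters sometimes want to rewrite their dynamic URLs as static ones.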
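Both points above can be checked mechanically. The following Python sketch uses only the standard library's urllib.parse. It treats a URL as dynamic when it carries a query string, and it shows how two different URLs can point at the same content. The helper names, the example item URLs, and the assumption that a parameter named sid never changes the page are all hypothetical and for illustration only. This is not a description of how Google processes URLs.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode


def is_dynamic(url: str) -> bool:
    """Heuristic: a URL is 'dynamic' if it has a query string (?key=value&...)."""
    return bool(urlsplit(url).query)


def content_key(url: str, ignored: frozenset[str] = frozenset({"sid"})) -> str:
    """Hypothetical normalization: drop parameters assumed not to affect content
    and sort the rest, so URLs serving the same page map to the same key."""
    parts = urlsplit(url)
    params = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in ignored)
    return f"{parts.netloc}{parts.path}?{urlencode(params)}"


print(is_dynamic("https://www.example.com/archive/january.htm"))  # False
print(is_dynamic("https://code.google.com/p/google-checkout-php-sample-code/issues/detail?id=31"))  # True

# Two hypothetical links to the same item: different parameter order and session ID.
a = "https://www.example.com/item?id=31&color=red&sid=111"
b = "https://www.example.com/item?color=red&id=31&sid=222"
print(a == b, content_key(a) == content_key(b))  # False True
```

The strings differ, yet they normalize to the same key. This is the duplicate-content situation described above: one page, many distinct URLs.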
Should I try to make my dynamic URLs look static?
Here are some key points to keep in mind when dealing with dynamic URLs:
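- It's quite hard to correctly create and maintain rewrites that turn dynamic URLs into static-looking URLs.
- It's much safer to serve us the original dynamic URL and let us handle detecting and avoiding problematic parameters.
- If you want to rewrite your URL, remove unnecessary parameters while keeping the URL dynamic-looking.
- If you want to serve a static URL instead of a dynamic one, you should actually create the equivalent static content.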
Static or dynamic URLs: which does Googlebot handle better?
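We've met many webmasters who, like our friend, believed that static or static-looking URLs give their sites an advantage in indexing and ranking. This belief rests on the assumption that search engines have trouble crawling and analyzing URLs that contain session IDs or source trackers. In fact, Google has made considerable progress in both areas. Static URLs may have a slight advantage in click-through rate because users can read them easily, but choosing a database-driven site does not imply a significant disadvantage for indexing and ranking. We would rather have sites serve their dynamic URLs to search engines directly than hide parameters to make those URLs look static.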
Now let's look at some widespread beliefs about dynamic URLs and correct some of the assumptions that spook webmasters. :)
Myth: "Dynamic URLs cannot be crawled."
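Fact: We can crawl dynamic URLs and interpret the different parameters. If you hide parameters that give Googlebot valuable information in order to make a URL look static, you may actually cause problems with crawling and ranking that URL. Our recommendation: do not reformat a dynamic URL to make it look static. Using static URLs for static content wherever possible is advisable. Where you decide to serve dynamic content, however, please don't hide parameters to make them look static, because doing so removes information that helps us analyze your URLs.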
Myth: "Dynamic URLs must have fewer than three parameters."
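Fact: There is no limit on the number of parameters. A good rule of thumb, however, is to keep your URLs short (this applies to all URLs, static or dynamic). You can remove some parameters that aren't essential for Googlebot and give your users a nicer-looking dynamic URL. If you can't tell which parameters can be removed, we recommend serving us all the parameters in your dynamic URL; our system will figure out which ones don't matter. Hiding parameters keeps us from analyzing your URLs correctly. We won't recognize the parameters as such, and valuable information may be lost.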
Here are some questions you might have at this point.
Does this mean I should avoid rewriting dynamic URLs altogether?
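That's our recommendation, unless you can be sure you are only removing redundant parameters, or you can reliably remove every parameter that might cause problems. If you arbitrarily transform your dynamic URLs to make them look static, be aware of the risk: we may not be able to interpret some of the information correctly. If you want to add a static version of your site, make sure it serves genuinely static content. For example, you could generate files that are reachable under the corresponding paths on your site. If you merely change how your dynamic URLs look without actually serving static content, you may do more harm than good. Just serve us your standard dynamic URLs, and we will automatically find the redundant parameters.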
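As a rough illustration of "genuinely static content", the sketch below writes one real HTML file per article, so each static path corresponds to an actual file on disk rather than to a rewritten dynamic request. The ARTICLES dictionary stands in for the site's database. The article/<language>/<answer>.html layout is a hypothetical choice, not a required structure. The sketch also assumes the web server serves the output directory as its document root.

```python
import html
import tempfile
from pathlib import Path

# Hypothetical stand-in for the site's database: (language, article number) -> body text.
ARTICLES = {
    ("en", 3): "How dynamic URLs work",
    ("de", 3): "Wie dynamische URLs funktionieren",
}


def export_static_site(out_dir: Path) -> list[Path]:
    """Write one real HTML file per article and return the paths written."""
    written = []
    for (language, answer), body in ARTICLES.items():
        page = out_dir / "article" / language / f"{answer}.html"
        page.parent.mkdir(parents=True, exist_ok=True)
        page.write_text(
            f'<!doctype html><html lang="{language}"><body><p>{html.escape(body)}</p></body></html>\n',
            encoding="utf-8",
        )
        written.append(page)
    return written


with tempfile.TemporaryDirectory() as tmp:
    for path in export_static_site(Path(tmp)):
        print(path.relative_to(tmp).as_posix())  # article/en/3.html, then article/de/3.html
```

Because the files are regenerated from the database, there is no rewrite rule to keep in sync with the dynamic URLs. The trade-off is that the export has to be rerun whenever content changes.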
Can you give me an example?
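If you have a dynamic URL in the standard format, such as foo?key1=value&key2=value2, we recommend leaving it unchanged and letting Google decide which parameters can be removed. Alternatively, you can remove unnecessary parameters for your users. Be careful, though, to remove only parameters that don't matter. Here is an example of a dynamic URL with several parameters: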
www.example.com/article/bin/answer.foo?language=en&answer=3&sid=98971298178906&query=URL
- language=en: indicates the language of the article
- answer=3: the article has the number 3
- sid=98971298178906: the session ID is 98971298178906
- query=URL: the query with which the article was found was [URL]
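Not all of these parameters provide additional information. Rewriting the URL as www.example.com/article/bin/answer.foo?language=en&answer=3 probably won't cause any problems, because all the irrelevant parameters have been removed.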
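The parameter breakdown and the rewrite above can be reproduced with Python's urllib.parse. Following the example, this sketch assumes that sid and query do not affect the returned article. Which parameters are actually irrelevant depends on your site, so the IRRELEVANT set is an assumption, not a general rule. Note that the result keeps the ?key=value form instead of moving the values into the path.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters assumed (for this example only) not to change the returned article.
IRRELEVANT = {"sid", "query"}


def strip_params(url: str, irrelevant: set[str] = IRRELEVANT) -> str:
    """Remove the named parameters but keep the URL dynamic (a ?key=value query string)."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k not in irrelevant]
    return urlunsplit(parts._replace(query=urlencode(kept)))


original = "www.example.com/article/bin/answer.foo?language=en&answer=3&sid=98971298178906&query=URL"
print(dict(parse_qsl(urlsplit(original).query)))
# {'language': 'en', 'answer': '3', 'sid': '98971298178906', 'query': 'URL'}
print(strip_params(original))
# www.example.com/article/bin/answer.foo?language=en&answer=3
```

Removing parameters by name from a blocklist, rather than rebuilding the URL from a fixed list of kept values, means any parameter you did not anticipate stays visible as a named pair.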
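Below are some examples of URLs that were manually altered to look static. These may cause more crawling problems than serving the dynamic URL without any rewriting.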
- www.example.com/article/bin/answer.foo/en/3/98971298178906/URL
- www.example.com/article/bin/answer.foo/language=en/answer=3/sid=98971298178906/query=URL
- www.example.com/article/bin/answer.foo/language/en/answer/3/sid/98971298178906/query/URL
- www.example.com/article/bin/answer.foo/en,3,98971298178906,URL
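The duplication problem with forms like these can be simulated in a few lines. The sketch assumes one article (language=en, answer=3) reached from three sessions and two searches. Apart from the values taken from the example URL, the session IDs and queries are invented. It also shows that a generic URL parser can see named parameters in the dynamic form but none in a static-looking path.

```python
from urllib.parse import urlsplit, parse_qsl

BASE = "www.example.com/article/bin/answer.foo"
# Hypothetical visits to the same article: only the session ID and search query vary.
SESSIONS = ["98971298178906", "11111111111111", "22222222222222"]
QUERIES = ["URL", "dynamic"]


def visible_param_names(url: str) -> list[str]:
    """Parameter names a generic URL parser can recover without site-specific rules."""
    return [name for name, _ in parse_qsl(urlsplit(url).query)]


# Static-looking rewrite: every session/query combination becomes a distinct URL.
static_looking = {f"{BASE}/en/3/{sid}/{q}" for sid in SESSIONS for q in QUERIES}
# Dynamic URL with irrelevant parameters removed: every visit yields the same URL.
dynamic_stripped = {f"{BASE}?language=en&answer=3" for sid in SESSIONS for q in QUERIES}

print(len(static_looking), len(dynamic_stripped))  # 6 1
print(visible_param_names(f"{BASE}?language=en&answer=3&sid=98971298178906&query=URL"))
# ['language', 'answer', 'sid', 'query']
print(visible_param_names(f"{BASE}/en/3/98971298178906/URL"))  # []
```

In the static-looking form, nothing marks 98971298178906 as a session ID or URL as a search query, so each combination looks like a separate page.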
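Rewriting your dynamic URL into one of the forms above could cause us to crawl the same content needlessly via many different URLs. Because these URLs contain varying values for the session ID (sid) and query parameters, they produce many URLs that look different but have identical content. These forms also make it hard for us to understand that URL and 98971298178906 have nothing to do with the actual content returned by this URL. The following rewrite, however, removes all irrelevant parameters: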
- www.example.com/article/bin/answer.foo/en/3
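Although we can process this URL correctly, we still discourage this kind of rewrite. It is hard to maintain and must be updated as soon as a new parameter is added to the original dynamic URL. Failing to update it would again produce a static-looking URL that hides parameters. The best solution is therefore often to keep your dynamic URLs as they are. If you do remove irrelevant parameters, remember to keep the URL dynamic: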
- www.example.com/article/bin/answer.foo?language=en&answer=3
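Here is one way the maintenance problem can show up. A hand-written rewrite rule built for language and answer has no slot for a parameter added later. In this sketch, a hypothetical page parameter is silently lost, so pages 1 and 2 of the article collapse onto the same static-looking URL. By contrast, a rule that only strips known-irrelevant parameters keeps page visible as a named parameter. The rewrite rule, the page parameter, and the IRRELEVANT set are all assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters assumed (for this example only) not to affect the returned article.
IRRELEVANT = {"sid", "query"}


def rewrite_to_path(url: str) -> str:
    """Hand-maintained static-looking rewrite that only knows about language and answer."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    new_path = f"{parts.path}/{params['language']}/{params['answer']}"
    return urlunsplit(parts._replace(path=new_path, query=""))


def strip_to_dynamic(url: str) -> str:
    """Drop only known-irrelevant parameters; anything new stays as a named ?key=value pair."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IRRELEVANT]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical: a "page" parameter is added to the dynamic URL after the rewrite rule was written.
page1 = "www.example.com/article/bin/answer.foo?language=en&answer=3&page=1&sid=98971298178906"
page2 = page1.replace("page=1", "page=2")

print(rewrite_to_path(page1) == rewrite_to_path(page2))  # True: both pages map to one URL
print(strip_to_dynamic(page2))  # www.example.com/article/bin/answer.foo?language=en&answer=3&page=2
```

The path rule needs a manual update for every new parameter. The stripping rule only needs updating when a new parameter turns out to be irrelevant.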
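We hope this article helps you and our friends by shedding some light on the various assumptions about dynamic URLs. If you have further questions, you're welcome to join the discussion in our Webmaster Help Forum.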
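Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.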
Last updated (UTC): 2008-10-01.