URL removal explained, Part I: URLs and directories
Tuesday, March 30, 2010
There's a lot of content on the Internet these days. At some point, something may turn up online that you would rather not have out there: anything from an inflammatory blog post you regret publishing to confidential data that was accidentally exposed. In most cases, deleting or restricting access to this content will cause it to naturally drop out of search results after a while. However, if you urgently need to remove unwanted content that has been indexed by Google and can't wait for it to disappear on its own, you can use our URL removal tool to expedite its removal from our search results, as long as it meets certain criteria (which we'll discuss below).
We've got a series of blog posts lined up for you explaining how to successfully remove various types of content, and common mistakes to avoid. In this first post, I'm going to cover a few basic scenarios: removing a single URL, removing an entire directory or site, and reincluding removed content. I also strongly recommend our previous post on managing what information is available about you online.
Removing a single URL
In general, in order for your removal request to be successful, the owner of the URL in question (whether that's you or someone else) must have indicated that it's okay to remove that content. For an individual URL, this can be indicated in any of three ways:
- block the page from crawling via a robots.txt file
- block the page from indexing via a noindex meta tag
- indicate that the page no longer exists by returning a 404 or 410 status code
Before submitting a removal request, you can check whether the URL is correctly blocked:
- robots.txt: You can check whether the URL is correctly disallowed using either the Fetch as Googlebot or Test robots.txt features in Webmaster Tools.
- noindex meta tag: You can use Fetch as Googlebot to make sure the meta tag appears somewhere between the <head> and </head> tags. If you want to check a page you can't verify in Webmaster Tools, you can open the URL in a browser, go to View > Page source, and make sure you see the meta tag between the <head> and </head> tags.
- 404 and 410 status codes: You can use Fetch as Googlebot, or tools like Live HTTP Headers or web-sniffer.net, to verify whether the URL actually returns the correct code. Sometimes "deleted" pages say "404" or "Not found" on the page but actually return a 200 status code in the page header, so it's worth double-checking with a proper header-checking tool.
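If you want to double-check the last two items above from a script, here is a minimal Python sketch (not part of the original post) that reports the status code a server actually returns and looks for a robots noindex meta tag in the HTML. The URL is a placeholder and the regular expression is deliberately rough; this is no substitute for the Webmaster Tools checks described above.

import re
import urllib.error
import urllib.request

# Placeholder URL; substitute the page you plan to request removal for.
url = "https://www.example.com/embarrassing-stuff.html"

try:
    response = urllib.request.urlopen(url)
    status = response.getcode()   # may be 200 even if the page *says* "404"
    html = response.read().decode("utf-8", errors="replace")
except urllib.error.HTTPError as e:
    status = e.code               # genuine 404/410 responses raise HTTPError
    html = e.read().decode("utf-8", errors="replace")

print("HTTP status actually returned:", status)

# Rough check for a <meta name="robots" ... content="...noindex..."> tag.
noindex = re.search(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
    html,
    re.IGNORECASE,
)
print("noindex meta tag found:", bool(noindex))

A page that prints a 200 status here while displaying a "not found" message is exactly the soft-404 case described above, and would not qualify for removal on status code grounds.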
If unwanted content has been removed from a page but the page hasn't been blocked in any of the above ways, you will not be able to completely remove that URL from our search results. This is most common when you don't own the site that's hosting the content. We cover what to do in this situation in a subsequent post, Part II of our removals series.
If a URL meets one of the above criteria, you can remove it by going to the Removal tool, entering the URL that you want removed, and selecting the "Webmaster has already blocked the page" option. Note that you should enter the URL where the content is hosted, not the URL of the Google search where it appears. For example, enter https://www.example.com/embarrassing-stuff.html rather than https://www.google.com/search?q=embarrassing+stuff.
Our Help Center article has more details about making sure you're entering the proper URL. Remember that if you don't tell us the exact URL that's troubling you, we won't be able to remove the content you had in mind.
Removing an entire directory or site
In order for a directory or site-wide removal to be successful, the directory or site must be disallowed in the site's robots.txt file. For example, in order to remove the https://www.example.com/secret/ directory, your robots.txt file would need to include:
User-agent: *
Disallow: /secret/
It isn't enough for the root of the directory to return a 404 status code, because it's possible for a directory to return a 404 but still serve files underneath it. Using robots.txt to block a directory (or an entire site) ensures that all the URLs under that directory (or site) are blocked as well. You can test whether a directory has been blocked correctly using either the Fetch as Googlebot or Test robots.txt features in Webmaster Tools.
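For a quick local sanity check of those rules (in addition to, not instead of, the Webmaster Tools features just mentioned), Python's standard urllib.robotparser module can evaluate a robots.txt file. This is only an illustrative sketch; the specific file paths below are made-up examples built on the /secret/ rule shown above.

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()  # fetches and parses the live robots.txt file

# Hypothetical example paths; anything under /secret/ should come back blocked.
for path in ("/secret/", "/secret/file.html", "/public/page.html"):
    url = "https://www.example.com" + path
    allowed = rp.can_fetch("*", url)  # evaluated against the User-agent: * rules
    print(url, "is allowed" if allowed else "is blocked")

Keep in mind this only evaluates the rules as written; the Test robots.txt feature in Webmaster Tools remains the authoritative check for how Googlebot applies them.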
Only verified owners of a site can request removal of an entire site or directory in Webmaster Tools. To request removal of a directory or site, click on the site in question, then go to Site configuration > Crawler access > Remove URL. If you enter the root of your site as the URL you want to remove, you'll be asked to confirm that you want to remove the entire site. If you enter a subdirectory, select the "Remove directory" option from the drop-down menu.
Reincluding content
You can cancel removal requests for any site you own at any time, including requests submitted by other people. In order to do so, you must be a verified owner of the site in Webmaster Tools. Once you've verified ownership, you can go to Site configuration > Crawler access > Remove URL > Removed URLs (or > Made by others) and click "Cancel" next to any requests you wish to cancel.
Still have questions? Stay tuned for the rest of our series of posts on removing content from Google's search results. If you can't wait, much has already been written about URL removals and troubleshooting individual cases in our Help Forum. If you still have questions after reading about others' experiences, feel free to ask. Note that, in most cases, it's hard to give relevant advice about a particular removal without knowing the site or URL in question. We recommend sharing your URL using a URL shortening service so that the URL you're concerned about doesn't get indexed as part of your post; some shortening services will even let you disable the shortcut later, once your question has been resolved.
Other posts in this series
- Part II: Removing and updating cached content
- Part III: Removing content you don't own
- Part IV: Tracking requests, what not to remove
Finally, you might also be interested in reading about managing what information is available about you online.
Posted by Susan Moskwa, Webmaster Trends Analyst