URL removal explained, Part III: Removing content that you don't own
Tuesday, April 20, 2010
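Welcome to the third episode of our URL removals series! In episodes one and two, we talked about expediting the removal of content that's under your control and requesting expedited cache removals. Today, we're covering how to use Google's public URL removal tool to request removal of content from Google's search results when the content originates on a website not under your control.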
Google offers two tools for requesting expedited removal of content:
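- Verified URL removal tool: for requesting removal of content from Google's search results when it's published on a site of which you're a verified owner in Webmaster Tools (like your blog or your company's site)
- Public URL removal tool: for requesting removal of content from Google's search results when it's published on a site for which you can't verify ownership (like your friend's blog)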
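Sometimes the information you want to remove originates from a site that you don't own or can't control. Since each webmaster controls their own site and its content, the best way to update or remove results from Google is for the site owner (where the content is published) to block crawling of the URL, modify the content source, or remove the page altogether. If the content isn't changed, it will simply reappear in our search results the next time we crawl it. So the first step in removing content that's hosted on a site you don't own is to contact the owner of the website and request that they remove or block the content in question.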
Removed or blocked content
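If the website owner removes a page, requests for the removed page should return a 404 Not Found response or a 410 Gone response. If they choose to block the page from search engines instead, the page should either be disallowed in the site's robots.txt file or contain a noindex meta tag. Once one of these requirements is met, you can submit a removal request using the "Webmaster has already blocked the page" option.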
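Sometimes a website owner will claim that they've blocked or removed a page when they haven't technically done so. If they claim a page has been blocked, you can double-check by looking at the site's robots.txt file to see if the page is listed there as disallowed: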
User-agent: *
Disallow: /blocked-page/
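The robots.txt check described above can also be scripted. The following is a minimal sketch, not part of the removal tool itself: it uses Python's standard `urllib.robotparser` module, and the helper name `is_disallowed`, the `example.com` URLs, and the default `Googlebot` user agent are all assumptions chosen for illustration. Python's parser may not match every detail of how Google interprets robots.txt, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser


def is_disallowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text forbids user_agent from fetching url.

    Hypothetical helper for illustration; it parses robots.txt content that
    you have already downloaded (for example from https://example.com/robots.txt).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# The same rules as the sample robots.txt file shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blocked-page/
"""

print(is_disallowed(ROBOTS_TXT, "https://example.com/blocked-page/"))
print(is_disallowed(ROBOTS_TXT, "https://example.com/other-page/"))
```

With the sample rules, the first URL is reported as disallowed and the second is not.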
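Another place to check whether a page has been blocked is the page's HTML source code itself. Visit the page and choose "View Page Source" in your browser. Is there a noindex meta tag in the HTML head section?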
<html>
<head>
<title>blocked page</title>
<meta name="robots" content="noindex">
</head>
...
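If you would rather not read the page source by eye, the same check can be sketched in a few lines of Python using the standard `html.parser` module. The class name `RobotsMetaParser` and the function `has_noindex` are hypothetical names introduced here. The sketch only looks for a `meta` tag named `robots` whose content includes `noindex`, and for simplicity it does not verify that the tag sits inside the `head` section.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag containing noindex was seen."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalize them.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    """Return True if the HTML text contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.noindex


# The same markup as the sample page shown above, closed off for completeness.
PAGE = """<html>
<head>
<title>blocked page</title>
<meta name="robots" content="noindex">
</head>
<body></body>
</html>"""

print(has_noindex(PAGE))
```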
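If they inform you that the page has been removed, you can confirm this by using an HTTP response testing tool like the Live HTTP Headers add-on for the Firefox browser. With this add-on enabled, you can request any URL in Firefox to test that the HTTP response is actually 404 Not Found or 410 Gone.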
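A browser add-on is not the only way to see the status code. As a rough alternative, the sketch below requests a URL with Python's standard `urllib` and reports whether the response is one of the two codes named above. The function names `fetch_status` and `is_removed_status` are assumptions made for this example, and the URL in the usage note is a placeholder. Note that `urllib` follows redirects, so the code returned is that of the final response.

```python
import urllib.error
import urllib.request

# The two status codes the post says a removed page should return.
REMOVED_STATUSES = {404, 410}


def is_removed_status(status: int) -> bool:
    """Return True for 404 Not Found or 410 Gone."""
    return status in REMOVED_STATUSES


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Request url and return the HTTP status code of the final response.

    urllib raises HTTPError for 4xx and 5xx responses, so the code is read
    from the exception in that case. Network failures (URLError) propagate.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
```

For example, `is_removed_status(fetch_status("https://example.com/removed-page"))` would be true only if that placeholder URL answered with 404 or 410. A page that shows a "not found" message but returns 200 OK would fail this check.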
Content removed from the page
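Once you've confirmed that the content you're seeking to remove is no longer present on the page, you can request a cache removal using the "Content has been removed from the page" option. This type of removal, usually called a "cache" removal, ensures that Google's search results will not include the cached copy or version of the old page, or any snippets of text from the old version of the page. Only the current updated page (without the content that's been removed) will be accessible from Google's search results. However, the updated page can still rank for terms related to the old content as a result of inbound links that still exist from external sites. For cache removal requests you'll be asked to enter a "term that has been removed from the page." Be sure to enter a word that is not found on the current live page, so that our automated process can confirm the page has changed; otherwise the request will be denied. Cache removals are covered in more detail in part two of the "URL removal explained" series.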
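Before submitting a cache removal request, it can help to confirm for yourself that the term you plan to enter really is gone from the live page. The sketch below is only a self-check under stated assumptions: it does not reproduce Google's automated process, whose details are not described in this post. The names `TextExtractor` and `term_absent` are hypothetical, the term is assumed to be a single word, and the check is deliberately conservative because it also scans text inside script and style elements.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects every text node in an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def term_absent(html: str, term: str) -> bool:
    """Return True if term does not appear as a whole word in the page text."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.chunks)
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is None


# Hypothetical current version of a page from which "oranges" was removed.
LIVE_PAGE = "<html><body><p>The updated page mentions apples only.</p></body></html>"

print(term_absent(LIVE_PAGE, "oranges"))
print(term_absent(LIVE_PAGE, "apples"))
```

In this example "oranges" is absent from the live page, so it would be a reasonable term to enter, while "apples" is still present.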
Removing inappropriate webpages or images that appear in our SafeSearch filtered results
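Google introduced the SafeSearch filter with the goal of providing search results that exclude potentially offensive content. If you find content that you feel should have been filtered out by SafeSearch, you can request that it be excluded from SafeSearch filtered results in the future. Submit a removal request using the "Inappropriate content appears in our SafeSearch filtered results" option.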
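If you encounter any issues with the public URL removal tool or have questions not addressed here, please post them to the Webmaster Help Forum or consult the more detailed removal instructions in our Help Center. If you do post to the forum, remember to use a URL shortening service to share any links to content you want removed.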
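Other posts of this series
- Part I: Removing URLs and directories
- Part II: Removing and updating cached content
- Part III: Removing content you don't own
- Part IV: Tracking requests, what not to remove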
Finally, you might also be interested in reading about managing what information is available about you online.
Written by Jonathan Simon, Webmaster Trends Analyst