URL removals explained, part II: Removing sensitive text from a page
Friday, August 6, 2010
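Change can happen: sometimes, as we saw in our previous post on URL removals, you may completely block or remove a page from your site. Other times you might only change parts of a page, or remove certain pieces of text. Depending on how frequently a page is crawled, it can take some time before these changes are reflected in our search results. In this blog post we'll look at the steps you can take if we're still showing old, removed content in our search results, either in the form of a "snippet" or on the cached page that's linked to from the search result. Doing this makes sense when the old content contains sensitive information that needs to be removed quickly; it's not necessary when you just update a website normally.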
As an example, let's look at the following fictitious search result:
Walter E. Coyote | < Title
Chief Development Officer at Acme Corp 1948-2003: worked on the top secret velocitus incalculii capturing device which has shown potential... | < Snippet
www.example.com/about/waltercoyote - Cached | < URL + link to cached page
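To change the content shown in the snippet (or on the linked cached page), you'll first need to change the content on the actual (live) page. Unless a page's publicly visible content is changed, Google's automatic processes will continue to show parts of the original content in our search results.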
Once the page's content has been changed, there are several options for making those changes visible in our search results:
-
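Wait for Googlebot to re-crawl and re-index the page: This is the natural way most content is updated at Google. It can sometimes take a fairly long time, depending on how frequently Googlebot currently crawls the page in question. Once we've re-crawled and re-indexed the page, the old content will usually no longer be visible, as it is replaced by the current content. Provided Googlebot is not blocked from crawling the page (either by robots.txt or by not being able to access the server properly), you don't have to do anything special for this to take place. It's generally not possible to speed up crawling and indexing, as these processes are fully automated and depend on many external factors.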
-
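Use Google's public URL removal tool to request removal of content that has been removed from someone else's webpage. With this tool, you need to enter the exact URL of the page that has been modified, select the "Content has been removed from the page" option, and then specify one or more words that have been completely removed from that page.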
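Note that none of the words you enter can appear on the page; even if a word has been removed from one part of the page, your request will be denied if that word still appears on another part of the page. Be sure to choose a word (or words) that no longer appears anywhere on the page. If, in the above example, you removed "top secret velocitus incalculii capturing device", you should submit those words and not something like "my project". However, if the word "top" or "device" still exists anywhere on the page, the request would be denied. To maximize your chances of success, it's often easiest to enter just one word that you're sure no longer appears anywhere on the page.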
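Once your request has been processed and the submitted word(s) are found to no longer appear on the page, the search result will no longer show a snippet, nor will the cached page be available. The title and the URL of the page will still be visible, and the entry may still appear in search results for searches related to the removed content (such as searches for velocitus incalculii), even if those words no longer appear in the snippet. However, once the page has been re-crawled and re-indexed, the new snippet and cached page can be visible in our search results.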
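Keep in mind that we will need to verify removal of the word(s) by viewing the page. If the page no longer exists and the server is returning a proper 404 or 410 HTTP result code, making us unable to view the page, you may be better off requesting removal of the page altogether.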
-
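Use the URL removal tool in Google Webmaster Tools to request removal of information on a page from your website. If you have access to the website in question and have verified ownership of it in Google Webmaster Tools, you can use the URL removal tool there (under Site Configuration > Crawler access) to request that the snippet and the cached page be removed until the page has been re-crawled. To use this tool, you only need to submit the exact URL of the page (you won't need to specify any removed words). Once your request has been processed, we'll remove the snippet and the cached page from search results. The title and the URL of the page will still be visible, and the page may also continue to rank in search results for queries related to content that has been removed. After the page has been re-crawled and re-indexed, the search result can show an updated snippet and cached page (based on the new content).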
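The word rule for the public removal tool can be checked locally before you submit a request. The following is a minimal Python sketch, not Google's actual matching logic: it assumes case-insensitive, whole-word matching, and the function name `words_still_present` and the sample page text are hypothetical.

```python
import re


def words_still_present(page_text: str, submitted_words: list[str]) -> list[str]:
    """Return the submitted words that still occur anywhere in the page text."""
    # Assumption: words are compared case-insensitively as whole tokens.
    # The post does not describe how the removal tool tokenizes a page.
    tokens = set(re.findall(r"\w+", page_text.lower()))
    return [word for word in submitted_words if word.lower() in tokens]


# Hypothetical text of the updated page: the sensitive phrase is gone,
# but the word "top" is still used elsewhere.
updated_page = (
    "Walter E. Coyote: Chief Development Officer at Acme Corp 1948-2003. "
    "Worked on a top project."
)

# Any word returned here would cause the request to be denied.
print(words_still_present(updated_page, ["top", "velocitus", "incalculii"]))
```

In this sketch, an empty result means none of the candidate words remain on the page. Submitting a single distinctive word, such as "velocitus", keeps the check simple.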
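Google indexes and ranks items based not only on the content of a page, but also on other external factors, such as the inbound links to the URL. Because of this, a URL may continue to appear in search results for content that no longer exists on the page, even after the page has been re-crawled and re-indexed. While the URL removal tool can remove the snippet and the cached page from a search result, it will not change or remove the title of the search result, change the URL that is shown, or prevent the page from being shown for searches based on any current or previous content. If this is important to you, you should make sure that the URL fulfills the requirements for a complete removal from our search results.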
Removing non-HTML content
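If the changed content is not in (X)HTML (for example, if an image, a Flash file, or a PDF file has been changed), you won't be able to use the cache removal tool. So if it's important that the old content no longer be visible in search results, the fastest solution is to change the URL of the file so that the old URL returns a 404 HTTP result code, and then use the URL removal tool to remove the old URL. Otherwise, if you choose to let Google refresh your information naturally, be aware that previews of non-HTML content (such as Quick View links for PDF files) can take longer to update after recrawling than normal HTML pages would.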
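Before requesting removal of an old file URL, it can help to confirm what the server actually returns for it. The sketch below uses only the Python standard library; the helper names `fetch_status` and `old_url_looks_removed` are hypothetical, and treating 410 as equivalent to 404 here is an assumption based on the post's earlier mention of both codes.

```python
import urllib.error
import urllib.request
from http import HTTPStatus


def fetch_status(url: str) -> int:
    """Return the HTTP status code the server sends for the given URL."""
    # A HEAD request avoids downloading the file; some servers may not
    # support HEAD, in which case a GET would be needed instead.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is on the error.
        return err.code


def old_url_looks_removed(status_code: int) -> bool:
    """True if the status code indicates the old URL no longer serves content."""
    return status_code in (HTTPStatus.NOT_FOUND, HTTPStatus.GONE)
```

For example, `old_url_looks_removed(fetch_status(old_url))` could be run against the file's previous address, where `old_url` is a placeholder for that address. Note that `urlopen` follows redirects, so an old URL that redirects to the new file would report the new file's status rather than a 404.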
Proactively preventing the appearance of snippets or cached versions
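As a webmaster, you have the option to use robots meta tags to proactively prevent the appearance of snippets or cached versions without using our removal tools. While we don't recommend this as a default approach (the snippet can help users recognize a relevant search result faster, and a cached page lets them view your content even in the unexpected event of your server not being available), you can use the "nosnippet" robots meta tag to prevent a snippet from being shown, or the "noarchive" robots meta tag to disable caching of a page. Note that if this is changed on existing and known pages, Googlebot will need to re-crawl and re-index those pages before the change becomes visible in search results.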
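As an illustration, the sketch below embeds a hypothetical page whose head contains a robots meta tag with both directives, and uses Python's standard `html.parser` module to confirm which directives the page declares. The class and function names are made up for this example, and the parsing is a simplified assumption: it only reads tags with `name="robots"` and splits the `content` value on commas.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # Directives are comma-separated, e.g. "nosnippet, noarchive".
            for part in attr_map.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives declared in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that opts out of both snippets and cached copies.
sample_page = """<html><head>
<meta name="robots" content="nosnippet, noarchive">
<title>Walter E. Coyote</title>
</head><body>...</body></html>"""

print(sorted(robots_directives(sample_page)))
```

A check like this only confirms what the page declares; the change still takes effect in search results only after the page is re-crawled and re-indexed.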
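We hope this blog post helps make some of the processes behind the URL removal tool for updated pages a bit clearer. In our next blog post we'll look at ways to request removal of content that you don't own; stay tuned!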
As always, we welcome your feedback and questions in our Webmaster Help Forum.
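Other posts of this series
- Part I: Removing URLs and directories
- Part II: Removing and updating cached content
- Part III: Removing content you don't own
- Part IV: Tracking requests, what not to remove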
Finally, you might also be interested to read about managing what information is available about you online.
Posted by John Mueller, Webmaster Trends Analyst, Google Switzerland