URL removals explained, part II: Removing sensitive text from a page
Friday, August 06, 2010
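Change can happen. Sometimes, as we saw in our previous post on URL removals, you may completely block or remove a page from your site. Other times you might only change parts of a page, or remove certain pieces of text. Depending on how frequently a page is crawled, it can take some time before these changes are reflected in our search results. In this blog post we'll look at the steps you can take if we're still showing old, removed content in our search results, either as a "snippet" or on the cached page linked from the search result. Doing this makes sense when the old content contains sensitive information that needs to be removed quickly; it's not necessary when you are just updating a website normally.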
As an example, let's look at the following fictitious search result:
Walter E. Coyote | < Title
Chief Development Officer at Acme Corp 1948-2003: worked on the top secret velocitus incalculii capturing device which has shown potential... | < Snippet
www.example.com/about/waltercoyote - Cached | < URL + link to cached page
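To change the content shown in the snippet (or on the linked cached page), you'll first need to change the content on the actual (live) page. Unless a page's publicly visible content changes, Google's automatic processes will continue to show parts of the original content in our search results.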
Once the page's content has been changed, there are several ways to make those changes visible in our search results:
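- Wait for Googlebot to re-crawl and re-index the page: This is the natural way most content is updated at Google. It can sometimes take a fairly long time, depending on how frequently Googlebot currently crawls the page in question. Once we've re-crawled and re-indexed the page, the old content will usually no longer be visible, because it will have been replaced by the current content. Provided Googlebot is not blocked from crawling the page (either by a robots.txt file or by being unable to access the server properly), you don't have to do anything special for this to happen. It's generally not possible to speed up crawling and indexing, as these processes are fully automated and depend on many external factors.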
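- Use Google's public URL removal tool to request removal of content that has been removed from someone else's webpage. With this tool, you need to enter the exact URL of the page that has been modified, select the "Content has been removed from the page" option, and then specify one or more words that have been completely removed from that page.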
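Note that none of the words you enter may appear on the page. Even if a word has been removed from one part of the page, your request will be denied if that word still appears in another part. Be sure to choose a word (or words) that no longer appears anywhere on the page. If, in the example above, you removed "top secret velocitus incalculii capturing device", you should submit those words and not something like "my project". However, if the word "top" or "device" still exists anywhere on the page, the request would be denied. To maximize your chances of success, it's often easiest to enter just one word that you're sure no longer appears anywhere on the page.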
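Once your request has been processed and we've confirmed that the submitted word(s) no longer appear on the page, the search result will no longer show a snippet, and the cached page will no longer be available. The title and URL of the page will still be visible, and the entry may still appear in results for searches related to the removed content (such as a search for velocitus incalculii), even though those words no longer appear in the snippet. Once the page has been re-crawled and re-indexed, the new snippet and cached page can appear in our search results.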
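Keep in mind that we need to view the page to verify that the word(s) have been removed. If the page no longer exists and the server returns a proper 404 or 410 HTTP result code, so that we can't view the page, you may be better off requesting removal of the page altogether.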
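- Use the Google Webmaster Tools URL removal tool to request removal of information on a page from your own website. If you have access to the website in question and have verified ownership of it in Google Webmaster Tools, you can use the URL removal tool there (under Site configuration > Crawler access) to request that the snippet and the cached page be removed until the page has been re-crawled. To use this tool, you only need to submit the exact URL of the page; you don't need to specify any removed words. Once your request has been processed, we'll remove the snippet and the cached page from search results. The title and URL of the page will still be visible, and the page may continue to appear in search results for queries related to the removed content. After the page has been re-crawled and re-indexed, the search result can show an updated snippet and cached page based on the new content.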
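Two conditions from the options above can be checked locally before you wait for a re-crawl or submit a request: whether robots.txt lets Googlebot fetch the page, and whether any word you plan to submit still appears on it. The following Python sketch uses only the standard library and is an approximation under stated assumptions. urllib.robotparser applies the first matching rule in file order, which can differ from Googlebot's own rule precedence. The word check only looks at text nodes (including the title, but not attribute values such as alt text) and is deliberately conservative: it also flags a word that only occurs inside a longer word, such as "top" inside "desktop". Google's own verification may work differently, and the function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def googlebot_can_crawl(robots_txt, url):
    """Return True if these robots.txt rules let Googlebot fetch url.

    Approximation: urllib.robotparser uses the first matching rule in
    file order, which may not match Googlebot's rule precedence.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


class _TextCollector(HTMLParser):
    """Gathers every text node in the document, including <title>."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def words_still_on_page(html, words):
    """Return the submitted words that still occur in the page text.

    Assumption: case-insensitive substring matching, so a word is also
    flagged when it only appears inside a longer word (errs toward caution).
    Attribute values such as alt text are not checked.
    """
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(collector.chunks).lower()
    return [word for word in words if word.lower() in text]
```

For example, an empty list from words_still_on_page(page_html, ["velocitus"]) suggests, but does not guarantee, that a request naming that word would pass verification.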
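Google indexes and ranks items based not only on the content of a page but also on other external factors, such as inbound links to the URL. Because of this, a URL can continue to appear in search results for content that no longer exists on the page, even after the page has been re-crawled and re-indexed. While the URL removal tool can remove the snippet and the cached page from a search result, it will not change or remove the title of the result, change the URL that is shown, or prevent the page from appearing in searches based on any current or previous content. If this is important to you, make sure that the URL fulfills the requirements for a complete removal from our search results.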
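The choice between these options can be summarized as a small decision function. This Python sketch only restates the guidance in this post; it is not a Google API, and the Route labels and the function's parameters are hypothetical.

```python
from enum import Enum


class Route(Enum):
    """Hypothetical labels for the options discussed in this post."""
    WAIT_FOR_RECRAWL = "wait for Googlebot to re-crawl and re-index"
    PUBLIC_TOOL_WITH_WORDS = "public URL removal tool, listing removed words"
    WEBMASTER_TOOLS = "Webmaster Tools URL removal tool, exact URL only"
    REMOVE_WHOLE_URL = "request complete removal of the URL (part I)"


def choose_route(urgent, status_code, verified_owner, hide_title_and_url):
    """Pick an option for getting outdated text out of search results.

    urgent: the old content is sensitive and must disappear quickly.
    status_code: HTTP status the page currently returns.
    verified_owner: site ownership is verified in Webmaster Tools.
    hide_title_and_url: the result must not appear at all, not just
        lose its snippet and cached page.
    """
    # A page that returns 404/410 can't be checked for removed words, and
    # the removal tools never hide the title or URL: both need a full removal.
    if status_code in (404, 410) or hide_title_and_url:
        return Route.REMOVE_WHOLE_URL
    # Ordinary updates: let re-crawling replace the snippet naturally.
    if not urgent:
        return Route.WAIT_FOR_RECRAWL
    # Verified owners can remove the snippet and cache without naming words.
    if verified_owner:
        return Route.WEBMASTER_TOOLS
    return Route.PUBLIC_TOOL_WITH_WORDS
```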
Removing non-HTML content
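If the changed content is not (X)HTML (for example, if an image, a Flash file or a PDF file has changed), you won't be able to use the cache removal tool. So if it's important that the old content no longer be visible in search results, the fastest solution is to change the URL of the file so that the old URL returns a 404 HTTP result code, and then use the URL removal tool to remove the old URL. Otherwise, if you choose to let Google refresh your information naturally, be aware that previews of non-HTML content (such as Quick View links for PDF files) can take longer to update after recrawling than normal HTML pages would.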
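On the server side, moving a changed file to a new URL and letting the old URL return 404 might look like the following Python sketch built on the standard library's http.server. The file paths and placeholder bytes are hypothetical, and a real site would more likely configure this in its web server; the point is only that the old URL stops serving the file and answers 404, while the updated file is available at its new URL.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler
from urllib.parse import urlsplit

# Hypothetical layout: the updated PDF is published under a new URL, and the
# old URL (/files/acme-bio.pdf) is intentionally left out of this mapping.
FILES = {
    "/files/acme-bio-v2.pdf": ("application/pdf", b"%PDF-1.4\n% placeholder\n"),
}


def resolve(path):
    """Map a request path to (status, content type, body)."""
    if path in FILES:
        content_type, body = FILES[path]
        return HTTPStatus.OK, content_type, body
    # Anything else, including the retired file URL, gets a plain 404:
    # no redirect to the new file and no copy of the old content.
    return HTTPStatus.NOT_FOUND, "text/plain; charset=utf-8", b"Not Found"


class FileHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore any query string when looking up the file.
        status, content_type, body = resolve(urlsplit(self.path).path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, run HTTPServer(("localhost", 8000), FileHandler).serve_forever() (HTTPServer is also in http.server) and request both the old and the new path.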
Proactively preventing the appearance of snippets or cached versions
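As a webmaster, you can use robots meta tags to proactively prevent snippets or cached versions from appearing, without using our removal tools. We don't recommend this as a default approach: the snippet helps users recognize a relevant search result faster, and a cached page lets them view your content even if your server is unexpectedly unavailable. If you do need it, you can use the "nosnippet" robots meta tag to prevent a snippet from being shown, or the "noarchive" robots meta tag to disable caching of a page. Note that if you add these tags to existing pages that we already know about, Googlebot will need to re-crawl and re-index those pages before the change becomes visible in search results.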
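For reference, the two directives named above go in a page's head as <meta name="robots" content="nosnippet"> and <meta name="robots" content="noarchive">, or combined as content="nosnippet, noarchive". Because the change only takes effect after a re-crawl, it can be useful to confirm that the live page actually serves the tags. This Python sketch collects robots meta directives with the standard library's html.parser. It also reads meta tags named "googlebot", the crawler-specific variant Google documents. The function and class names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() not in ("robots", "googlebot"):
            return
        # Directives are comma-separated; normalize case and whitespace.
        for directive in (attrs.get("content") or "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Return the set of robots meta directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```

Self-closing tags such as <meta ... /> are handled too, because HTMLParser's default handle_startendtag calls handle_starttag.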
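We hope this blog post makes some of the processes behind the URL removal tool for updated pages a bit clearer. In our next blog post we'll look at ways to request removal of content that you don't own; stay tuned!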
As always, we welcome your feedback and questions in our Webmaster Help Forum.
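Other posts of this series
- Part I: Removing URLs and directories
- Part II: Removing and updating cached content
- Part III: Removing content you don't own
- Part IV: Tracking requests, what not to remove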
Finally, you might also be interested in reading about managing what information is available about you online.
Posted by John Mueller, Webmaster Trends Analyst, Google Switzerland
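Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.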