Rendering pages with Fetch as Google
Wednesday, May 28, 2014
Original post:
Rendering pages with Fetch as Google
Author:
Shimi Salant,
Webmaster Tools team
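The Fetch as Google feature in Webmaster Tools shows webmasters what Googlebot receives when it tries to fetch their pages. The server headers and HTML it displays are useful for diagnosing technical problems and the side effects of hacking, but they can make it hard to double-check the response:
Help! What do all of these codes mean? Is this really the same page I see in my browser? Where shall we have lunch?
We can't help with that last one, but to help with the first two questions, we've recently expanded this tool to also show how Googlebot renders the page.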
Viewing the rendered page
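To render the page, Googlebot tries to find and fetch all of the external files involved. These files usually include images, CSS, and JavaScript files, as well as other files that may be embedded indirectly through the CSS or JavaScript. Googlebot then uses these files to render a preview image that shows its view of the page.
You can find the Fetch as Google feature in the Crawl section of Google Webmaster Tools. After submitting a URL with "Fetch and render," wait for it to be processed (this might take a moment for some pages). Once it's ready, click the response row to see the results.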
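To get a rough idea of which external files a page pulls in directly, here is a minimal Python sketch that lists the images, stylesheets, and scripts referenced in a page's HTML. It is an illustration only, not how Googlebot works: the names `ResourceCollector` and `list_embedded_resources` and the example.com URLs are hypothetical, and the sketch does not follow files embedded indirectly through CSS or JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect URLs of images, scripts, and stylesheets referenced directly in HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Resolve relative paths against the page URL.
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only <link rel="stylesheet"> counts as an embedded resource here.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.resources.append(urljoin(self.base_url, attrs["href"]))


def list_embedded_resources(html, base_url):
    """Return absolute URLs of directly embedded resources, in document order."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources


if __name__ == "__main__":
    sample = (
        '<html><head><link rel="stylesheet" href="/css/site.css">'
        '<script src="js/app.js"></script></head>'
        '<body><img src="https://cdn.example.net/logo.png"></body></html>'
    )
    for url in list_embedded_resources(sample, "https://www.example.com/page/"):
        print(url)
```

Ordinary hyperlinks (`<a href>`) are deliberately ignored, since they are not needed to render the page itself.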
Handling resources blocked by robots.txt
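Googlebot follows the robots.txt rules for every file it fetches. If you disallow crawling of some of these files (or if they are embedded from a third-party server that disallows Googlebot from crawling them), we won't be able to show them in the rendered view. Similarly, if the server returns an error or fails to respond, we won't be able to use those files either (you can find similar issues in the Crawl Errors section of Webmaster Tools). If we run into either kind of issue, we'll list it below the preview image.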
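As a rough local check of which embedded files a robots.txt file would block for Googlebot, the following Python sketch uses the standard library's `urllib.robotparser`. The helper name `blocked_resources`, the sample rules, and the example.com URLs are hypothetical. Note two assumptions: all URLs are taken to be on the host that the robots.txt file belongs to, and Python's parser does not implement every matching rule Googlebot uses (wildcards, for instance), so treat the output as an approximation rather than Googlebot's verdict.

```python
from urllib.robotparser import RobotFileParser


def blocked_resources(robots_txt, resource_urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent.

    Assumes every URL is served from the host this robots.txt belongs to.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in resource_urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical rules that block script and stylesheet directories.
    rules = "User-agent: *\nDisallow: /js/\nDisallow: /css/\n"
    urls = [
        "https://www.example.com/js/app.js",
        "https://www.example.com/images/logo.png",
        "https://www.example.com/css/site.css",
    ]
    for url in blocked_resources(rules, urls):
        print("blocked:", url)
```

Resources served from a third-party host would need to be checked against that host's own robots.txt file.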
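We recommend making sure Googlebot can access any embedded resource that meaningfully contributes to your site's visible content or layout. That makes Fetch as Google easier for you to use, and it lets Googlebot find and index that content. Some types of content (such as social media buttons, fonts, or website-analytics scripts) tend not to contribute meaningfully to the visible content or layout, so they can be left disallowed from crawling. For more information, see our previous blog post on how Google is working to understand the web better.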
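We hope this update makes it easier for you to diagnose these kinds of issues and to discover content that is accidentally blocked from crawling. If you have any comments or questions, let us know here or post in the webmaster help forum.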
Last updated (UTC): 2014-05-01.