URLs with HTTP errors

November 08, 2005

What can you do about URLs that we tried to crawl but couldn't because we received a 404 (not found) error? (You can see these once you've verified your site by clicking the stats link beside the Sitemap name on the My Sitemaps page.)

You don't have to do anything about them. We'll continue to crawl and index your site and will simply skip pages that return a 404. But here are some things you can do.

If we found the URLs from your Sitemap, the fix is simple. Just modify your Sitemap to list the correct pages and resubmit it.

If we found the URLs by following links, the fix isn't quite as easy. In fact, in some cases, there may be no fix. A webmaster may have liked your site and tried to link to it, but mistyped the URL. You can look for sites that link to your pages and ask webmasters to fix any broken links, but if that sounds like a lot of work, you can instead just focus on your own site.

You may not be able to control inbound links from other sites, but you can control internal links. Make sure that none of these broken links are coming from your site. You can generally check your webserver logs to see what visitors clicked on in your site that returned 404 errors.

It could be that a link points to a non-existent page because that page used to exist, but no longer does. In that case, you can:

  • Make sure that your site doesn't link to any outdated pages
  • Check to see if any of these outdated pages are in the Google index

If the page used to exist, but no longer does, it might still be listed in the Google index. (You can check this by doing a Google search for the URL.) If it is, you can either wait for subsequent crawls (if we continue to get 404 errors when we try to crawl it, it will eventually be removed), or you can request its removal using our automatic URL removal system.

In order to use this system, the outdated page must return a 404 (and if the URL is showing up on your Sitemaps Stats page, it already does). Log in and then choose the Remove an outdated link option. Type in the URL,choose anything associated with this URL and click Remove outdated link. The link will show up in a status area as pending. The page should be removed from the index within 3-5 days and the status will be updated.