Robots

You can prevent parts of your site from being indexed by web crawlers by creating a robots.txt file, by using a META tag, or by using HTTP header specifications. Google News crawls with the same robot as Google Web Search, called Googlebot.

If you would prefer not to be included in Google News but want to remain in Web Search, add a robots entry for Googlebot-News. Google News respects that entry whenever it is more restrictive than the robots entry for Googlebot. In other words:

  • If you block access to Googlebot-News, we will not index your site in Google News.
  • If you block access to Googlebot, we will not index your site in Google News or Web Search.

Creating a robots.txt file

Using a robots.txt file gives you a high level of control over what parts of your site are indexed by Google. You'll find a comprehensive guide to creating and maintaining robots.txt files at our Webmaster Help Center.

Make sure our crawler can access your robots.txt file; otherwise we cannot tell which sections of your site you don't want crawled.
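As a rough illustration of the Googlebot-News case described above, the sketch below checks a hypothetical robots.txt with Python's standard `urllib.robotparser`. The file contents, the `allowed` helper and the example.com URL are assumptions made for this example, and Python's parser only approximates how a real crawler reads the file, so treat it as a quick local check rather than a statement of Googlebot's behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block only Googlebot-News, leave other crawlers alone.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch url (per Python's parser)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/news/story.html"  # placeholder URL
    print("Googlebot-News allowed:", allowed("Googlebot-News", url))
    print("Googlebot allowed:", allowed("Googlebot", url))
```

With this file, the check reports that Googlebot-News is blocked while Googlebot is still allowed, which matches the first bullet above: the site stays in Web Search but is left out of Google News.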

Creating a META tag

Rather than use a robots.txt file to block crawler access to pages, you can add a META tag to an HTML page to tell robots not to index the page. This standard is described at http://www.robotstxt.org/wc/exclusion.html#meta.

If you would like the META tag to apply only to Google News (and not Google Web Search), use Googlebot-News rather than Googlebot in the NAME attribute. For each instruction, Google News will follow the more restrictive of Googlebot and Googlebot-News.
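One way to picture the "more restrictive" rule is as a union of the restrictive directives found in the two tags. The sketch below is only an interpretation of the sentence above, not Google's implementation; the function name `effective_directives` and the set of directives it recognizes (taken from the tags listed further down this page) are assumptions made for illustration.

```python
# Restrictive directives used in the META tag examples on this page.
RESTRICTIVE = {"noindex", "nofollow", "noimageindex"}


def _parse(content):
    """Split a CONTENT value such as "NOINDEX, NOFOLLOW" into lowercase directives."""
    return {part.strip().lower() for part in content.split(",") if part.strip()}


def effective_directives(googlebot_content, googlebot_news_content):
    """Combine both tags, keeping every restriction that either one states."""
    combined = _parse(googlebot_content) | _parse(googlebot_news_content)
    return combined & RESTRICTIVE


if __name__ == "__main__":
    # GOOGLEBOT says NOFOLLOW, GOOGLEBOT-NEWS says NOINDEX: both restrictions apply.
    print(sorted(effective_directives("NOFOLLOW", "NOINDEX")))
```

Under this reading, a restriction stated for either robot name carries over to Google News, while Web Search only sees what is stated for Googlebot.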

  • To prevent all robots from indexing a page on your site, you'd place the following meta tag into the <HEAD> section of your page:

    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

  • To allow other robots to index the page on your site, but prevent Googlebot from indexing it, you'd use the following tag:

    <META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">

  • To allow robots to index the page on your site but instruct them not to follow outgoing links, you'd use the following tag:

    <META NAME="ROBOTS" CONTENT="NOFOLLOW">

  • To allow robots to index the page on your site but instruct them not to index images on that page, you'd use the following tag:

    <META NAME="ROBOTS" CONTENT="NOIMAGEINDEX">

  • To inform us that an article will expire at a certain time, at which point it should be removed from the Google index, you'd use the following tag:

    <META NAME="GOOGLEBOT" CONTENT="unavailable_after: 25-Aug-2011 15:00:00 EST">

    The date and time must be specified in the RFC 850 format. This information is treated as a removal request: it will take about a day after the removal date passes for the page to disappear from the search results. For the tag to work, it must already be included with your article at the time that it is first crawled.
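If expiry dates come from a publishing system, the tag can be generated rather than typed by hand. The sketch below is a minimal example that reproduces the date layout shown in the tag above; the helper name `unavailable_after_tag` and the default "EST" abbreviation are assumptions made for this example, and the caller is responsible for passing a time that is actually in that zone.

```python
from datetime import datetime

# English month abbreviations, spelled out so the result does not depend on
# the locale used by strftime("%b").
_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def unavailable_after_tag(expires, tz_abbrev="EST"):
    """Build an unavailable_after META tag in the layout shown above."""
    stamp = (f"{expires.day:02d}-{_MONTHS[expires.month - 1]}-{expires.year} "
             f"{expires:%H:%M:%S} {tz_abbrev}")
    return f'<META NAME="GOOGLEBOT" CONTENT="unavailable_after: {stamp}">'


if __name__ == "__main__":
    print(unavailable_after_tag(datetime(2011, 8, 25, 15, 0, 0)))
```

Emitting the tag from the same template that renders the article helps ensure it is already present when the page is first crawled.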

Using HTTP Header Specifications

You can also provide robots instructions in the HTTP header of a response, using the X-Robots-Tag header.
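As a small sketch of the header approach, the WSGI application below attaches an X-Robots-Tag header to its response. The application name `app` and the page body are hypothetical, and a real site would more likely set the header in its web server or framework configuration; the example only shows where the header sits relative to the rest of the response.

```python
def app(environ, start_response):
    """Hypothetical WSGI app that serves one page and asks robots not to index it."""
    body = b"<html><body>Not for indexing</body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Carries the same instruction as
        # <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
        ("X-Robots-Tag", "noindex, nofollow"),
    ]
    start_response("200 OK", headers)
    return [body]
```

The header is useful for non-HTML resources such as PDF files, which have no <HEAD> section to hold a META tag. To try the sketch locally, the application can be served with `wsgiref.simple_server.make_server` from the Python standard library.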
