News (publishers) Help

Additional Tips: Robots

You can prevent parts of your site from being indexed by web crawlers by creating a robots.txt file or by using a META tag. Keep in mind that the robot we use for Google News, called Googlebot, is the same robot that we use for Google Web Search, so any settings you change for Google News also apply to Google Web Search. Our other robots, such as Googlebot-Mobile and Googlebot-Image, follow the rules you set up for Googlebot, but you can also set up additional rules for these specific bots.

Creating a robots.txt file

A robots.txt file gives you a high level of control over which parts of your site are indexed by Google. You'll find a comprehensive guide to creating and maintaining robots.txt files at our Webmaster Help Center.
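As a rough illustration of how rules in such a file are read, the following Python sketch parses a hypothetical robots.txt with the standard library's urllib.robotparser. The paths and the OtherBot name are invented for the example, and Python's parser is not Googlebot itself, so treat the results as an approximation of crawler behavior rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archive/

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot matches its own group, so only /archive/ is blocked for it.
print(parser.can_fetch("Googlebot", "/archive/2008/story.html"))  # False
print(parser.can_fetch("Googlebot", "/news/story.html"))          # True

# A crawler with no group of its own falls back to the "*" group.
print(parser.can_fetch("OtherBot", "/private/notes.html"))        # False
```

Checking a draft file this way before publishing it can catch a rule that blocks more, or less, than intended.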

Creating a META tag

Instead of using a robots.txt file to block crawler access to pages, you can add a <META> tag to an HTML page to tell robots not to index that page. This standard is described at http://www.robotstxt.org/wc/exclusion.html#meta.

To prevent all robots from indexing a page on your site, you'd place the following meta tag into the <HEAD> section of your page:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

To prevent only Google's robots from indexing a page while still allowing other robots to index it, you'd use the following tag:

<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">

To allow robots to index the page on your site but instruct them not to follow outgoing links, you'd use the following tag:

<META NAME="ROBOTS" CONTENT="NOFOLLOW">

To allow robots to index the page on your site but instruct them not to index images on that page, you'd use the following tag:

<META NAME="ROBOTS" CONTENT="NOIMAGEINDEX">
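To check which directives a page actually carries, a short Python sketch such as the one below can read them back out of the HTML. It is a simplified model, not a description of how Googlebot works: the class name RobotsMetaParser, the helper may_index, and the rule that a crawler obeys both the generic ROBOTS tag and the tag carrying its own name are assumptions made for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect robots directives from <meta> tags, keyed by lowercase name."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name in ("robots", "googlebot"):
            values = {
                item.strip().lower()
                for item in attr.get("content", "").split(",")
                if item.strip()
            }
            self.directives.setdefault(name, set()).update(values)


def may_index(html: str, crawler: str = "googlebot") -> bool:
    """Return False if a generic or crawler-specific tag says NOINDEX (assumed model)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    generic = parser.directives.get("robots", set())
    specific = parser.directives.get(crawler.lower(), set())
    return "noindex" not in (generic | specific)


page = '<html><head><META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
print(may_index(page, "googlebot"))  # False
print(may_index(page, "otherbot"))   # True
```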
To inform us that an article will expire at a certain time, at which point it should be removed from the Google News index, you'd use the following tag:

<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 25-Aug-2008 15:00:00 EST">

The date and time must be specified in the RFC 850 format. This information is treated as a removal request: it takes about a day after the removal date passes for the page to disappear from the search results. For the tag to work, it must be included with your article when the article is first crawled.
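If your publishing system generates pages programmatically, the expiry tag can be built from a timestamp. The Python sketch below reproduces the date layout of the example tag above (day, abbreviated month, four-digit year, time, time zone abbreviation). The function name unavailable_after_tag and the hard-coded month table are assumptions for illustration; the month names are spelled out so the output does not depend on the server's locale.

```python
from datetime import datetime

# English month abbreviations, listed explicitly to avoid locale-dependent output.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def unavailable_after_tag(expires: datetime, zone: str) -> str:
    """Build the expiry META tag; `zone` is a time zone abbreviation such as EST."""
    stamp = (
        f"{expires.day:02d}-{MONTHS[expires.month - 1]}-{expires.year} "
        f"{expires:%H:%M:%S} {zone}"
    )
    return f'<META NAME="GOOGLEBOT" CONTENT="unavailable_after: {stamp}">'


print(unavailable_after_tag(datetime(2008, 8, 25, 15, 0, 0), "EST"))
# <META NAME="GOOGLEBOT" CONTENT="unavailable_after: 25-Aug-2008 15:00:00 EST">
```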
