Internet Archive will ignore robots.txt files to keep the historical record accurate
By Brad Jones
April 24, 2017
The Internet Archive has announced that, going forward, it will no longer conform to the directives given in robots.txt files. These files are predominantly used to tell search engine crawlers which portions of a website should be crawled and indexed for search results.
In the past, the Internet Archive has complied with the instructions laid out in robots.txt files, according to a report from Boing Boing. However, the organization has decided that the way these files are typically configured is often at odds with the service it sets out to provide.
“Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes,” stated a blog post that the organization published last week. “Internet Archive’s goal is to create complete ‘snapshots’ of web pages, including the duplicate content and the large versions of files.”
Robots.txt files are increasingly being used to remove entire domains from search engines once a live, accessible site has been turned into a parked domain. If a site goes out of business and its domain is blocked in this way, the site also becomes unavailable for viewing via the Internet Archive's Wayback Machine. The organization apparently receives queries about such sites on a daily basis.
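To illustrate the mechanism, here is a minimal sketch (not the Internet Archive's actual crawler) that uses Python's standard `urllib.robotparser` module. It shows how a crawler that honors robots.txt treats a typical search-engine-oriented file compared with a parked domain's blanket `Disallow: /`. The robots.txt contents, the `ExampleArchiveBot` user agent, the example.com URLs, and the `should_fetch` helper with its `honor_robots` switch are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a live site: hide search and print pages from crawlers.
SEARCH_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /print/
"""

# Hypothetical robots.txt served by a parked domain: block every crawler from every path.
PARKED_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def should_fetch(robots_txt: str, user_agent: str, url: str, honor_robots: bool = True) -> bool:
    """Decide whether a crawler fetches `url`.

    With honor_robots=True the crawler obeys robots.txt; with honor_robots=False
    (the hypothetical "ignore robots.txt" policy) it fetches regardless.
    """
    if not honor_robots:
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the file's text instead of downloading it
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    bot = "ExampleArchiveBot"
    print(should_fetch(SEARCH_ROBOTS_TXT, bot, "https://example.com/news/story.html"))
    print(should_fetch(SEARCH_ROBOTS_TXT, bot, "https://example.com/print/story.html"))
    print(should_fetch(PARKED_ROBOTS_TXT, bot, "https://example.com/old-article.html"))
    print(should_fetch(PARKED_ROBOTS_TXT, bot, "https://example.com/old-article.html", honor_robots=False))
```

Run as written, this prints True, False, False, True: the search-oriented file only hides a couple of sections, the parked domain's file shuts a compliant crawler out of everything, and a crawler that ignores robots.txt fetches the page anyway.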
The Internet Archive hopes that disregarding robots.txt files will help it represent earlier points in the web's history more accurately, because instructions intended for search engines will no longer be able to muddy the waters.
The organization has already stopped consulting robots.txt files on sites and pages related to the U.S. government and the U.S. military, to account for the enormous changes that can be made to those domains between one administration and the next. This decision has caused no major problems, so there are high hopes that discontinuing the use of the files more broadly will be helpful.