Wayback Goes Way Back on Web

Imagine being able to travel back in time to an era when the digital publishing euphoria had just begun and the dot-com boom was in full swing.

Now that may be possible with a new digital library tool called the Wayback Machine, which goes "way back" in Internet time to locate archived versions of over 10 billion Web pages dating to 1996.

The Internet Archive and Alexa Internet recently unveiled the free service, which provides digital snapshots from its archives that reveal the origins of the Internet and how it has evolved over the past five years.

"This will help make use of the cultural artifacts of our day," said Brewster Kahle, founder of The Internet Archive. "It will help people make sense of the world and give accountability to what's been published before."

Archivists are attempting to create permanent, reliable access to websites that otherwise might be lost.

"It's preserving a record of something that otherwise literally vanishes," said Paul Grabowicz, assistant dean at the University of California, Berkeley's Graduate School of Journalism. "That is one of the frustrations about the Web."

To use the Wayback Machine, users go to the service's site, type a URL into the search box, select one of the listed dates and then begin surfing an archived version of the selected Web page.

Researchers at Xerox PARC are already using the archive to study new user interfaces and languages on the Web.

The archive will unlock possibilities not just for scholars, but also for Web designers, attorneys and journalists.

"For journalists, it's the equivalent of being able to preserve editions of a newspaper on microfilm," Grabowicz said. "Without this, we wouldn't have (that ability)."

Users can search for past homepages, news pages, dead dot-coms and old online technical manuals for obsolete products.

Avid politicos can pull up whitehouse.gov pages from 1996 and read the Clinton/Gore administration's statement on airport safety and terrorism.

Others can find versions of the original Heaven's Gate website from before its members committed mass suicide in 1997 as the comet Hale-Bopp approached.

The project is funded by the Library of Congress, the National Science Foundation, the Smithsonian Institution and Compaq.

With over 100 terabytes of data growing at a rate of 12 terabytes per month, The Internet Archive's digital library is the world's largest known database, eclipsing the amount of data held in every library in the world including the Library of Congress.

The Internet Archive does a sweep every two months to capture digital snapshots of the Internet. The Archive has also captured sites on a daily basis for specific digital collections, like the 2000 presidential election and the Web Archive of the Sept. 11 Attacks.

However, keeping pace with the rapidly evolving digital landscape is a formidable task.

The average life of a Web page is 100 days, Kahle said. At that rate, "a lot of the best Web pages are out of print."

"It's relatively difficult, technologically, to do this. But it's a drop in the bucket compared to what traditional libraries attempt to do."

While the project attempts to archive the entire publicly available Web, some sites may not be included because they are password-protected or otherwise inaccessible to automated crawlers.

Those who don't want their Web pages to be included in the archive can put a robots.txt file on their site and crawlers will mark all previously archived pages as inaccessible.
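
For site owners, an opt-out of this kind might look like the sketch below. It assumes the archive's crawler identifies itself as ia_archiver (the user-agent name is an assumption here, not something stated in this story), and example.com is a placeholder. The helper is_allowed is a hypothetical name; it uses Python's standard robots.txt parser to show how a well-behaved crawler would read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents a site owner might publish.
# "ia_archiver" is an assumed crawler name; check the archive's own
# documentation for the user-agent it actually honors.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "ia_archiver", page))
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

The file only expresses the site owner's request. Marking previously archived pages as inaccessible is something the archive does on its side after reading it.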

The archive has been crawling faster over the years and technology is getting cheaper over time, Kahle said. However, the project is still very much a work in progress.

"We don't know what the right things are to be collecting," Kahle admitted. "By making this collection available, we're hoping to find out what we should be collecting to create a library that is of enduring value."

"It's an incredible challenge on a variety of levels," Grabowicz agreed. "Being able to sweep sites on a regular basis, taking snapshots not just once every couple months – that's a huge challenge. How much of the Net do you try to catalog?"

"What lies ahead, as the Internet grows and grows, is more difficult to keep up with."
