Google's Total Library: Putting The World's Books On The Web

Two years ago Google, the Internet search firm, began scanning hundreds of thousands of books and making their contents available on the Web. Could this signal the end of libraries as we know them?

By Malte Herwig

March 28, 2007, 12:56 p.m.

The room is filled with an uncommon sense of peace and beauty. At Oxford University's Bodleian Library, old folios stand side-by-side on oak bookshelves in the warm afternoon light. The chains librarians once used to shackle them to the tables to prevent them from being stolen were removed long ago. The books are free, and soon they will embark on their longest journey yet.

Everything that the books have held between their worn cardboard, leather or linen covers for centuries will soon end up on a small memory chip. The books will open up and their contents will become part of that gigantic river of information known as the Internet.

Librarians are not famous for spontaneous displays of emotion, but on this morning Sarah Thomas is an exception. "The digitization of books will accelerate the emergence of new knowledge tremendously," says the 58-year-old director of the Bodleian Library. The windows in her office look out on the sun-drenched spires of the library, founded in 1602 -- a glorious backdrop exemplifying old Europe and its scholarly tradition.

But Thomas is also looking forward to a completely digitized future, one which seems within reach now that the library has joined forces with American search engine Google. Under the terms of a 2005 deal, Google will digitize the library's collection as part of its Google Book Search project -- and the dot-com firm works at an astonishing pace. "Thanks to Google," says Thomas, "we can digitize more books in 12 months than we could otherwise do in 15 years."

Though not the only one of its kind, the Google project is enormous in scope, and it transforms otherwise level-headed academics into enthusiastic utopians unafraid to display their passion. Reg Carr, Thomas's predecessor in Oxford, saw the project as an opportunity to create "a better world for all."

Google plans to scan a total of about 1 million volumes from Oxford's library shelves. Stanford, Harvard and, more recently, the Bavarian State Library have also joined the program.

It is meant to be a win-win situation for both sides. The libraries receive digital copies of hundreds of thousands of their books from Google, quickly and free of charge, and the search engine is able to improve the quality and relevance of its search results.

Constantly astonished

Klaus Ceynowa, 47, Deputy Director General of the Bavarian State Library, prepared the Munich deal in secret negotiations. "It's nice when you don't have to constantly try to desperately scrape together money," he says, sitting in his Munich office, looking visibly relaxed.

Ceynowa believes the criticism and derision Google has faced over the occasional inferior-quality scan is unfair. Nevertheless, Google has gotten away with some major slip-ups. For example, there are scanned book pages which show more of the elaborately painted fingernails of Google employees than the text itself. But, says Ceynowa, Google overcame the initial hurdles long ago. "We are constantly astonished by the innovative input coming from Google," he says. "They get better at it every day."

Google is not disclosing any of the details of the joint project and has sworn the participating libraries to secrecy. One of the few details which are known is that Google is using equipment it developed itself to scan the books in carefully guarded buildings near the respective libraries.

There is a reason for all the secrecy: The company fears copycats as much as it does critics. Instead of working with fully automated scanning robots that protect the books, Google uses an army of workers. "Quality isn't that important to them, because they're currently the top dog in this market," says an industry insider, whose company is one of the market leaders in the field of scanning technology.

"Our product philosophy," says Google executive Jens Redmer, "is that we would rather get products going and optimize them as we go along, than never start them in the first place. We can process very large quantities while maintaining high quality."

But at the end of this monumental effort, the libraries could find themselves with many unusable digital copies in which pages are missing and passages are out of focus and illegible. To protect its lead, the powerful technology company prefers speed over quality. The motto of its scanning project could be summed up as: scan first and ask questions later.

Creating a Digital Utopia

Google doesn't seem bothered by legal challenges either. The company invokes the "fair use" doctrine of American copyright law and is unperturbed by the lawsuit the Association of American Publishers (AAP) and a number of large publishing houses have sought to launch. The plaintiffs claim that Google is infringing copyrights by not obtaining permission to scan the enormous library holdings, including many books that may still be copyright-protected.

The lengthy case will likely end in a settlement. "The actions filed are a business negotiation that happens to be taking place in the courts," says a Google spokeswoman. Many of the plaintiffs are already collaborating with Google, albeit in the second phase of the project, in which books from the publishers' own lists of titles are scanned and made searchable.

However, the search engine only shows excerpts and a link to buy the book -- in effect, free advertising for the publishing houses. In any case, the suit only applies to American libraries -- in Oxford and Munich, Google is only digitizing books that are out of copyright.

If it comes to a settlement, Google could pay the publishing houses more than they would be awarded if they won their case in court, thereby creating a precedent that could deter competitors without Google's deep pockets. Given its current market value of well over $130 billion, even a generous compensation package would be practically petty cash for the California-based company.

This is not the first time Google has received negative press. The once-idealistic startup turned into a powerful corporation long ago. And power quickly arouses suspicion among Internet users, as the example of Microsoft shows. Google is anything but shy when marketing interests are at stake. The company's lawyers have already contacted the editors of the definitive German dictionary, the Duden, in an effort to change the entry for "googeln," the German verb meaning "to google." In response, say sources at the publishers, Duden made it clear to Google that changing the entry wasn't exactly on the agenda.

A powerful ally

The fundamental question remains: Isn't it a bit risky to entrust the universal wisdom stored in libraries to a private company? Can a company that has a virtual monopoly in the search engine market and guards the details of its search algorithms the way Coca-Cola protects its recipe be expected to democratize knowledge?

For the time being, at least, Google is indispensable as a powerful ally in creating a great utopia: the digital university library of the future, making humanity's entire body of knowledge accessible to everyone.

This library would represent the culmination of a democratization of knowledge that began with the invention of printing. The little Google search window would be the gateway to the content of the 32 million books, 750 million articles, 25 million songs, 500 million images, 500,000 films, 3 million television programs and 100 billion public Web pages that Wired writer Kevin Kelly estimates humanity has published since the days of Sumerian clay tablets. To store all of this gigantic volume of data -- estimated at 50 petabytes -- would still require a building the size of a small town's library, Kelly wrote in a 2006 article for the New York Times. But in the future, all of that knowledge will be only a mouse click away -- and will fit on a single iPod.

The practical aspect of the system would be that millions of Internet users could achieve what a handful of librarians would never manage -- the networking of book information through links and tags on the Internet. This digital library would be a giant collection of relationships, in which anyone could communicate with anyone else, and in which books could be disassembled into their components, linked to one another, reassembled, marked, analyzed, referenced and criticized.

But the system could also turn into an indiscriminate jumble of information. Instead of leading us into enlightenment, the random barrage of data could end in digital decadence, or what Friedrich Nietzsche called the "anarchy of atoms." In "The Case of Wagner," Nietzsche criticized the "literary decadence" of his day: "The word becomes sovereign and leaps out of the sentence, the sentence reaches out and obscures the meaning of the page, and the page gains life at the expense of the whole." The whole, Nietzsche complained, is no longer a whole. His words sound like a foreshadowing of hypertext on the Internet.

And what happens to books when the last text has been scanned, the last word stored? Will the great libraries become disembodied, empty cathedrals of knowledge where computers hum away and the pale light of monitors illuminates the faces of readers?

Munich librarian Ceynowa says that although he would never want to read Immanuel Kant's "Critique of Pure Reason" on a computer screen, today's young people are different: "If they can't find it on the Internet, they think it doesn't exist."

But Sarah Thomas thinks it's too soon to write off the book. "The book is a long-lived technology," she says, pointing to the massive walls of Oxford's old library. "For centuries people have gathered here to do research and exchange opinions. In the future the library will continue to be a place where a community meets -- just more open than it was before."

Translated from the German by Christopher Sultan
