An Elephant Backs Up Google’s Library

Google often says that it likes to take the long-term view of things. But Google’s idea of long term does not appear to be long enough for some librarians, who tend to equate long term with forever, at least when it comes to preserving books.

On Monday, a group of major libraries that are participating in Google’s Library Project said they are working together to create what amounts to a publicly accessible backup of the digital library that Google is creating. The project, which is called HathiTrust, includes libraries at 12 Midwestern universities like the University of Michigan, the University of Iowa and the University of Illinois, plus the 11 libraries of the University of California system. (Hathi is Hindi for “elephant,” an animal that is said to never forget.)

In the Google Books Library Project, the Internet giant has been scanning the collections of several large libraries. The company gives users access to the complete text of books that are in the public domain, and to snippets of books that are protected by copyrights. Google also gives each library a copy of the books it digitized from that library.

“Google is an excellent partner,” Paul Courant, university librarian and dean of libraries at the University of Michigan, said in an interview. “They are a corporation with a responsibility to its stockholders. Google could last 50 years, 100 years, 1,000 years. We are academic institutions with a commitment to the preservation and use of scholarship and the scholarship record for the indefinite future.”

Mr. Courant said that the majority of the 2 million or so volumes already in HathiTrust were digitized by Google. HathiTrust also includes fragile and hard-to-find books that the libraries have digitized on their own, he said.

HathiTrust is an agreement by several of those libraries, but not all of them, to pool their digital copies into one giant database that will be accessible online. It will not make snippets of copyrighted works available, since some authors and book publishers have argued that doing so violates copyright, but it will allow users to search those texts.

“I am hoping others will join, and I am hoping that we will be a destination for hundreds of thousands of people on the Web,” Mr. Courant said. “Google will probably be better than we are at large-scale consumer applications.” But Mr. Courant said that for some services aimed at scholars “we’ll be as good or better than them at that.”

Reader comments

I am happy that they used the word ‘hathi’. It really does mean elephant in Hindi. As an Indian, I will find it easier to remember.

I love it that this project, although originally opposed by those who wish to control free exchange of ideas, is going strong. There are tons of quality books in the public domain and out of print.

I commend the parties for their efforts. This may not be as sexy as YouTube, but its contribution of real value to the Internet will last much longer.

Now what we need are long-lasting *open* document standards for storing the scanned books. I have lost the use of too many documents because of parties like Microsoft playing with proprietary document formats (I have produced and exchanged electronic documents for a quarter-century now). It is a real pain to have to reformat your documents every few years.

What is the difference between someone going to the public library and checking out a book to read and that same person going to google.com and digitally checking out a book?

Thanks to the universities. Archiving the internet is a large problem. This knowledge should be preserved.

What an excellent project. However, I hope it will not lead to the loss of libraries as buildings where we can go with our children and grandchildren and spend a quiet hour or two, sharing some of the richness on the shelves.

This is an excellent idea and needs to be emulated in a country like India. We have few libraries, and those only in larger towns. If books in our languages are scanned and made available online, our younger generation will benefit, as computers and the Internet are reaching our countryside fast. The same will be true of other developing nations.

Another reader was right on point: While I am excited about the possibilities of having access to more books at more libraries, I hope this isn’t the beginning of the end of the joy of physically visiting the library. Part of the enjoyment of a book for me is roaming the aisles, holding the book, and flipping through the pages.

#3: “what is different between someone going to the public library and checking out a book to read and that same person going to google.com and digitally checking out a book?”

The fact that your “checked out” digital copy can be yours forever, and that multiple people can check out the same copy at the same time.

Not that I totally oppose literature digitization, but the rights of the copyright holders do need to be addressed...

A huge leap in the right direction, with all that information “just a click away.” I for one have not set foot in a library for over a decade, and now I won’t have to.
I know a lot of people are cursing me right now, thinking I am celebrating the demise of the traditional library, but I am not. The traditional library now needs to be the keeper and archive not just of the books of our world but of all our knowledge; libraries need to expand their purpose and lead the way in the global sharing of this vital knowledge base.
Don’t get rid of them! Make them better and more useful by using new technology to manage it all.

Hathi was also a character in the Jungle Books. He’s an elephant – go figure!

This is an exciting and ambitious project, with a terrific name. Of course, I am a bit biased as one of the founders of ElephantDrive – an online backup and storage service. Is there any information available on the storage architecture they plan to use? ElephantDrive would be more than happy to collaborate and we have been working on this specific problem for a very long time.

Don’t forget Project Gutenberg. They have been going strong for years with the help and effort of wonderful volunteers!

These sorts of projects usually don’t get done because so many do not consider long-term contributions to culture to generate enough immediate profit or glory. This is the sort of ‘quiet’ inspired concern for the future of humankind that makes me proud to be a librarian. Will it keep us from a Dark Age? Let’s hope.

What a wonderful project!
Is there any way that we non-academic bibliophiles can help?
I have a reference library of hundreds of art, design and photography books, some of them unusual, that I have collected over the past forty-plus years. I would be happy to share them in whatever manner it would be legal to do so.

Someone needs to define “indefinite future.” Let’s say 10,000 years. During that time this civilization will no doubt die, and a few others will rise and fall. How long will these libraries last? How long will state governments last? How long will universities last? How long will the English language last?

Read Miller’s “A Canticle for Leibowitz.”

Exciting project!
I just hope that it stays open and accessible to everyone. If successful, it will be remembered as one of the significant achievements of human civilization in the 21st century.
Just imagine the reach this kind of work could have on the world.

They should consider writing the material to stable microfilm stock and packaging it with a lens. How long will the digital equipment be available? A few dozen years at most?

I don’t see what is new in this article. Wasn’t this project always multi-institutional, always giving back to the donor library a copy of what was scanned and presenting to users snippets or whole text depending on copyright, regardless of who owned the paper?

Is it saying that now there will be a hot backup to Google? What has changed?

By using the word Haathi, Google seems to be invoking goodwill in India, home of the biggest community of developers and software people. Unlike the Bill Gates Foundation’s work, it costs nothing for a company famous for its public relations cleverness. Also, isn’t this move to online media going to help Google and other search engines tremendously in searching for content? Maybe they will create Google Gadha libraries consisting of just plain PCs running Chrome. Gadha is Hindi for donkey, an animal that is said to work hard.
//www.decisionstats.com

I think that a much better name would have been Elephant. So many more people would know right away what it means. Hathi is cute but meaningless for too many people. I for one, having reached the age where the intrusion of “senior moments” is pervasive, would have liked something easier to remember.

With respect to the project, what I understand to be most valuable is that, at least in one of the libraries involved, the actual “book-artifact” is also preserved, and the content is available independently of the current technology. The “book-artifact” has value by itself, and paper can generally be read for a long, long time. Some digital products of the past are now lost forever because they did not have a printed counterpart. Transferring to a new technology does not always preserve ALL the data; data gets lost with every “upgrade.” Finally, in my view, the project “preserves” the role of the Library as the place to go to find both the piece and the content, for us now as well as for future generations.

I am all for digital versions of most content and support the “Elephant” project as well as many others. They are crucial for the preservation of, and access to, knowledge. Most of us need access to the content and information as fast as possible. But we owe it to our future generations to preserve, and give access to, the whole experience. I cannot imagine any other institution in the future preserving knowledge as well as the Library has done.

Here’s what’s new, per:
//chronicle.com/free/2008/10/5061n.htm

Database will be backed up by contributors, lest Google go bust or lose interest.

Currently each library’s catalog/search system can only search its own contributions; this will change so all contributions can be searched by all contributors (and by anybody with a browser anywhere??)

New services will be added: a better, easier way to learn what is in the collection; services for the blind; and enhanced search and display beyond what Google provides.