Tuesday, December 14, 2004

My 2001 proposal for a digital Library of Congress comes closer to fruition

In 2001 I proposed that the nation should expend resources to digitize the holdings of the Library of Congress. Not a sample of materials on a single subject -- e.g. the American Memory project -- but every page image of every book in the collection -- tens of millions of books, full-text, full image -- terabytes of data.

Roy Tennant, a leading digital libraries expert, and I debated the concept remotely: he was at the Internet Librarian conference in California, and I was teledebating from halfway across the world, at the First Monday conference in Maastricht, the Netherlands.

My main argument in 2001 -- and today -- is simple: disk is so cheap, and digital imaging technology is so mature, that you can conceive of capturing every page of every book or journal in your collection at low cost.

Roy and I reprised this debate the next spring at the Computers in Libraries conference in Washington, D.C. Several hundred people listened to our debate. One of Roy's best lines was "but the Library of Congress contains a lot of junk!" My response: it's cheaper to digitize everything in the collection than to spend staff time deciding what is useful, and what's junk.

You have to understand that Roy is someone who actually creates digital libraries, and I am but a dreamer. So what the hell do I know? Still, I'd say this: you were right in a way, Roy. Large research libraries contain a lot of junk. But we retain the junk, and the real estate to house it, because we don't know what is junk and what is useful.

A number of people came up to me afterward, including librarians at the Library of Congress, and said I'd put forth a grand vision. Sadly, LC did not pick up the mantle. Happily, three years later, Google has.

Here's a link to my presentation from 2001/2002:

The Digital Library of Congress

And here's Roy's side of the story from our debate in 2002. I must confess, he may still win the argument: this requires a huge amount of human effort, and a huge leap of technological faith. In researching my side of the debate, I found the earlier calculations of Michael Lesk invaluable. He understood the incredibly shrinking cost of disk before most people did.

Until today, every library digitization project was small-scale. No one had taken on a whole-library project, for three familiar reasons:

  • We can't afford to digitize the entire collection
  • It will take too long
  • We can't overcome copyright concerns

Google to the rescue. It takes a company with the vision, technological prowess, and capital that Google possesses to make this happen.

One of the points I tried to make in my debate with Roy is that it would take a large-scale project, something on the scale of Kennedy proposing that the United States send a mission to the moon, in order to move digital libraries forward. When JFK proposed we send a man to the moon "before this decade is out" no one knew how the hell to accomplish that goal.

I believe this project is every bit as important as the Apollo moon program -- and this time, it's privately financed.

I believe today's news is cataclysmic, certainly in the halls of academe, if not a red-letter day in world history.
