Old Search Engine, the Library, Tries to Fit Into a Google World
By KATIE HAFNER, New York Times, June 21, 2004
SAN FRANCISCO, June 20 — Katarina Maxianova, who received her bachelor's degree in comparative literature from Columbia University in May, took a seminar last year in which the professor assigned two articles from New Left Review magazine. She found one immediately through Google; for the other, she had to trek to the library stacks.

"Everyone in class tried to get those articles online," she said, "and some people didn't even bother to go to the stacks when they couldn't Google them."

For the last few years, librarians have increasingly seen people use online search sites not to supplement research libraries but to replace them. Yet only recently have librarians stopped lamenting the trend and started working to close the gap between traditional scholarly research and the incomplete, often random results of a Google search.

"We can't pretend people will go back to walking into a library and talking to a reference librarian," said Kate Wittenberg, director of the Electronic Publishing Initiative at Columbia University.

Ms. Wittenberg's group recently finished a three-year study of research habits, including surveys of 1,233 students across the country, that concluded that electronic resources have become the main tool for information gathering, particularly among undergraduates.

"We have to respond to these new ways," Ms. Wittenberg said, "and come up with a way to make better research material available online."

That means working with commercial search engines like Google and Yahoo to make ever more digital-research materials searchable.

Undergraduates like Ms. Maxianova and her classmates are not the only ones conducting research from their computers. Faculty members also do it.

"One of the rarest things to find is a member of the faculty in the library stacks," said Paul Duguid, an information researcher who will teach a class this fall at the University of California, Berkeley, on judging the authenticity of information found on the Web.

In the Columbia survey, 90 percent of the faculty members who responded said they used electronic resources in their research several times a week or more. Nearly all said it was a valuable resource.

While the accuracy of online information is notoriously uneven, the ubiquity of the Web means that a trip to the stacks is no longer the way most academic research begins.

"The nature of discovery is changing," said Joseph Janes, associate professor and chairman of library and information science at the University of Washington. "I think the digital revolution and the use of digital resources in general is really the beginning of a change in the way humanity thinks and presents itself."

A few research librarians say Google could eventually take on more of the role of a universal library.

"If you could use Google to just look across digital libraries, into any digital library collection, now that would be cool," said Daniel Greenstein, university librarian of the California Digital Library, the digital branch of the University of California library system.

"It would help libraries achieve something that we haven't yet been able to achieve by ourselves," Dr. Greenstein said, "which is to place all of our publicly accessible digital library collections in a common pool."

The biggest problem is that search engines like Google skim only the thinnest layers of information that has been digitized. Most have no access to the so-called deep Web, where information is contained in isolated databases like online library catalogs.

Search engines seek so-called static Web pages, which generally do not have search functions of their own. Information on the deep Web, on the other hand, comes to the surface only as the result of a database query from within a particular site.

Use Google, for instance, to research Upton Sinclair's 1934 campaign for governor of California, and you will miss an entire collection of pamphlets accessible only from the University of California at Los Angeles's archive of digitized campaign literature.

"Google searches an index at the first layers of any Web site it goes to, and as you delve beneath the surface, it starts to miss stuff," said Mr. Duguid, co-author of "The Social Life of Information." "When you go deeper, the number of pages just becomes absolutely mind-boggling."

Some estimates put the number of Web pages that are hidden from the view of most search engines at 500 billion.

Reference librarians are trying to bring material from the deep Web to the surface. In recent months, dozens of research libraries began working with Google and other search engines to help put their collections within reach of a broader public.

Carnegie Mellon University, for instance, has digitally scanned 1.6 million pages of archival material from the papers of Carnegie Mellon scientists like Herbert Simon, a Nobel Prize winner in economics and a computer chess expert. Now, a Google search for "Herbert Simon and Carnegie Mellon" turns up the Simon papers.

Google has also indexed two million book titles through the Online Computer Library Center, which manages a database of catalogs from 12,000 libraries around the world.

Other search sites are striking similar deals. Yahoo recently signed an agreement with the online library center to index its catalogs, and four months ago, it started carrying out a plan to make more of the deep Web reachable through Yahoo.

Yahoo has also signed agreements with the University of Michigan to make searchable the university's compendium of academic collections from more than 250 institutions. And it has indexed a digital repository at Northwestern University of more than 2,000 hours of Supreme Court oral arguments.

Yet for every archive that has become searchable by commercial Web engines, scores are not accessible. "There's lots of great stuff that isn't available digitally and likely never will be," Dr. Janes said. Most books published before 1995 fit into this category, he said, as do many older magazines, newspapers and journals, as well as historical maps, archives, letters, diaries, older census statistics and genealogical materials.

"We have to figure out how to adapt to a world where people will prefer digital stuff," Dr. Janes said, "yet not forgo the investment in print and analog collections and the work involved in mapping and maintaining those collections."

Research institutions are investing heavily in combining the new with the old. At Columbia's Butler Library, the stacks are not only alive and well, Ms. Wittenberg said, but have been modernized to allow for better physical access to the seven million volumes in the collection.

During the renovation, work areas with network connections were placed throughout the library.

"A student or faculty member could work for a whole day in what looks and feels like a very traditional library, while accessing either the print collection or the large and rapidly growing collection of electronic resources," Ms. Wittenberg said.

Many experts, even those who specialize in digital material, say that losing the tactile experience of books and relying too heavily on electronic resources is certain to exact a price.

"How do you know it's the appropriate universe from which to draw your research materials?" said Dr. Greenstein. "It has huge ramifications for the nature of instruction and scholarship."

At the same time, many research librarians say that the new reliance on electronic resources is making their role as guides to undiscovered material more important than ever.

Thomas Mann, a reference librarian in the main reading room of the Library of Congress, was reminded of this recently while helping a visitor who was researching a famine in Greece that occurred in 1942. A Google search had yielded little useful information.

"While he was looking at newspaper articles from the 1940's that we have digitized," Dr. Mann said, "I set up a search on the terminal next to him in another database of historical abstracts and history journals."

In less than a minute, he pulled up citations for five scholarly articles about the famine and helped the visitor put in requests for the paper copies from the stacks. "We can show people things they don't ask for," Dr. Mann said. "The historical database I got into hit it right on the button."

Some library experts welcome the change with few reservations.

"Although it seems like an apocalyptic change now, over time we'll see that young people will grow up using many ways of finding information," said Abby Smith, director of programs at the Council on Library and Information Resources, a nonprofit group in Washington.

"We'll see the current generation we accuse of doing research in their pajamas develop highly sophisticated searching strategies to find high quality information on the Web," Dr. Smith said. "It's this transition period we're in, when not all high-quality information is available on the Web — that's what we lament."

Dr. Janes said that, like many others, he occasionally pined for the days spent in musty library stacks, where one could chance upon scholarly gems by browsing the shelves.

"You can think of electronic research as a more impoverished experience," Dr. Janes said. "But in some ways it's a richer one, because you have so much more access to so much more information. The potential is there for this to be a real bonus to humanity, because we can see more and read more and do more with it. But it is going to be very different in lots of ways."