Public Libraries

Tech Notes

Search Engine Update

Steven M. Cohen

I once saw a cartoon that showed a librarian sitting at the reference desk, and the sign in front of her that normally displays the word "librarian" had been replaced with the phrase "search engine." Many took this to imply that librarians will be replaced by search engines, but I interpreted it in a different, and more positive, way. I truly believe that librarians are the search engines of the Web. If we are being replaced by Google or Yahoo!, then why is the librarian still sitting at the desk in the cartoon?

The job of search engine does not come lightly, however. If we are to take on this monstrous task (and answer more than 15 million queries daily), we need to have as much knowledge about our "search engine" jobs as is humanly possible. This means keeping current with new engines and with how old standards like Yahoo! and Google are changing on a daily basis, just as we stay up to date in our field by attending conferences, reading publications, and subscribing to electronic mailing lists. Like it or not (and I love it), search engines will continue to become more and more a part of librarianship.

Every year at this time, I like to take stock of what has happened in the search engine world over the past year. What engines are still around? Which engines have changed? Which ones do I use the most for certain queries? How can I perform the best possible search without being too broad or too narrow? How can I maximize my input to retrieve a better output? After checking over the engines that I use most often, I have put together the following update.

Google, the librarians' engine of choice, has had a big year. First, they released the full version of Google Groups, which encompasses more than 700 million Usenet posts dating back to 1981. I have used this resource to answer many a reference question, so keep it in mind. Second, a beta version of Google Catalogs was shown to the public. Google has indexed more than four thousand catalogs, organized them into a directory, and made them fully searchable. As of this writing, this feature is still in beta, but it promises to be a wonderful tool. In March of this year, Google released a beta version of a news search engine. While it indexes only a portion of the content covered by more experienced news engines (see Rocket News and World News), it does have potential. (Has Google let us down yet?)

For the Web searcher, there was no better news to come from Google than the release of their Application Programming Interface (API) service, which allows programmers to search the entire Google database from within their own programs. Basically, the makers of Google gave permission to anyone with the relevant knowledge to go out and create an individual search interface using their database. The search capabilities that followed were a dream to many librarians. For example, I have always wanted to see the new sites that were added to the Google index each day but was never able to limit the search results by date. Using the Google search created by Fagan Finder, I can now limit the search to the dates on which Google indexed the sites in their database. There are two caveats to keep in mind before using this search interface. First, Google has stated that they can't be held responsible for this type of search, as it is still in beta. Second, Google reindexes more than a million of the sites already in their database every day, so a basic search using this resource may produce too many hits. The webmaster at Fagan Finder has also created the Ultimate Google Interface, which brings together all of the different Google search capabilities in one interface.

One of the great aspects of searching a commercial database such as LexisNexis is its advanced search capabilities, such as proximity searching. If I only want stories that include terms that are within four words of one another, I can do that. This greatly narrows down the results, thus reducing the amount of information to sift through and saving precious time. Until recently, there were no public Web search engines that provided this type of service. When Google released their API service, this was one of the first search features that I looked for. After a few weeks I came across a site located at http://www.staggernation.com/cgi-bin/gaps.cgi. Here users can search Google and find words or phrases that appear within one, two, or three words of one another, plus any other terms that are needed.
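
For readers who want to see what a proximity test amounts to, here is a minimal sketch in Python. It is only an illustration of the idea described above and does not show how LexisNexis or the Google API proximity site actually work. The function name within_proximity and its parameters are made up for this example, and it assumes that adjacent words count as being one word apart.

```python
import re

def within_proximity(text, term_a, term_b, max_distance):
    """Return True if term_a and term_b occur within max_distance words
    of each other in text. Adjacent words are treated as distance 1
    (an assumption made for this sketch)."""
    # Split the text into lowercase words, ignoring punctuation.
    words = re.findall(r"[a-z0-9']+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # Compare every occurrence of one term with every occurrence of the other.
    return any(abs(a - b) <= max_distance
               for a in positions_a
               for b in positions_b)

sentence = "Librarians are the search engines of the Web"
print(within_proximity(sentence, "search", "Web", 4))      # terms are 4 words apart
print(within_proximity(sentence, "librarians", "Web", 4))  # terms are 7 words apart
```

The first call reports a match and the second does not, which is exactly how a proximity limit trims a result list: documents that mention both terms, but far apart, drop out.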

While Google is the tool of choice for many, if not all, librarians, it is not the only search engine that made news over the past year (and should not be the only one used by searchers). Teoma, bought by Ask Jeeves last year, released the full version of its engine in March of this year to rave reviews. There are three parts to the results provided by this engine. First, relevant Web sites are displayed. Second, Teoma suggests ways to narrow down the search, which is of use for patrons who search the Web using broad terms. Third, "expert resources" are provided. Teoma has described these as sites that "feature lists of other authoritative sites and links relating to the search topic." Much has been written about Teoma in the popular press and online media, and many think that if there is one search company with the capability to overthrow Google as the search engine of choice, Teoma will be it. I believe this to be true. While it needs improvement (search experts have noted that the index is not as fresh as it could be, meaning many of the sites are not reindexed frequently), librarians need to be aware of its capabilities and features.

Another relatively new engine is Vivisimo, which has caught the eye of many Web searchers. Vivisimo does not consider itself a search engine, but rather a "clustering engine," in that it collects search results from other engines, organizes the results, and serves them up in an easy-to-use interface. There are two aspects of this product that are worth mentioning. First, Vivisimo uses the folder method made popular by Northernlight, which has since shut down its public search engine. After a query is submitted, Vivisimo places all of the hits into folders on the left side of the screen. Librarians in particular will like this tool, as it helps to narrow down the search, and each folder has many subfolders that contain fewer hits. The second aspect of Vivisimo is what I believe to be the future of search engine results. After performing a search, users have the option to open the URLs that are displayed in a new window (which opens a new browser), in a full window (which opens the URL in the current browser), or in a preview window (which opens the URL within the search engine interface). The preview option enables the user to view and browse sites without leaving the search results, saving precious navigation time. While the Vivisimo results are not that extensive (it does not supply hits from Google or Teoma), its ease of use makes it a good recommendation for the beginning searcher. Vivisimo has also recently integrated its search with one of my favorite news engines, World News, applying the Vivisimo search technology to queries against the World News index.
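
To make the folder idea concrete, here is a toy sketch in Python of sorting hits into folders by the terms that appear in their titles. This is not Vivisimo's method, which is not described here. The function cluster_hits, the folder terms, and the example.org addresses are all invented for illustration, and a real clustering engine would discover its folder names from the results rather than take them as input.

```python
from collections import defaultdict

def cluster_hits(hits, folder_terms):
    """Group (title, url) hits into folders named after the terms found
    in each title. Hits that match no term land in an "Other" folder.
    A hit may appear in more than one folder."""
    folders = defaultdict(list)
    for title, url in hits:
        lowered = title.lower()
        matched = [term for term in folder_terms if term.lower() in lowered]
        for term in matched or ["Other"]:
            folders[term].append(url)
    return dict(folders)

# Hypothetical hits, as if gathered from several engines.
hits = [
    ("Public library hours", "http://example.org/a"),
    ("Law library catalog", "http://example.org/b"),
    ("Search engine news", "http://example.org/c"),
]

for folder, urls in cluster_hits(hits, ["library", "news"]).items():
    print(folder, urls)
```

Each folder holds fewer hits than the full list, which is the narrowing effect that makes the folder display useful at the reference desk.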

A brief update on All the Web: During the writing of this article, this engine had a complete overhaul of its interface, making life easier for the searcher. If you are not a regular user of All the Web, I would suggest you take a closer look, as there are aspects of this engine that Google has not been able to match, such as the extensive news (Google's news search is still in beta), video, mp3, and ftp searches. I use the site primarily for news purposes, but the main engine is a great backup if Google lets you down (and that has happened to me many times). Last, their database is among the top in size and freshness according to Search Engine Showdown, a popular search engine comparison site.

There are many other engines that have been updated, launched, or shut down over the past year and are worth discussing, but the ones described above are the ones that will affect the searcher the most. Librarians need to continue to stay current with search engine developments, as they may affect the results of any search. Two sites that will help the searcher stay current are the Virtual Acquisition Shelf and News Desk and Pandia weblogs. Happy searching!

Steven M. Cohen is Assistant Librarian at the law firm of Rivkin Radler, LLP. He can be reached at Steven.Cohen@Rivkin.com.

Reference List

All the Web http://www.alltheweb.com
Ask Jeeves http://www.ask.com
Fagan Finder Google Date Search http://www.faganfinder.com/engines/google.shtml
Fagan Finder Ultimate Google Interface http://www.faganfinder.com/google.html
Google http://www.google.com
Google API Proximity Search http://www.staggernation.com/cgi-bin/gaps.cgi
Google Catalogs http://catalogs.google.com
Google Groups http://groups.google.com
Google News Search http://news.google.com
LexisNexis http://www.lexis.com
Northernlight http://www.northernlight.com
Pandia Weblog http://www.pandia.com/searchworld/index.html
Rocket News http://www.rocketnews.com
Search Engine Showdown http://www.searchengineshowdown.com
Teoma http://www.teoma.com
Virtual Acquisition Shelf and News Desk http://resourceshelf.freepint.com
Vivisimo http://www.vivisimo.com
Vivisimo World News Search http://vivisimo.com/demos/WorldNews.html
World News http://www.wn.com