Public Libraries
Tech Notes

Search Engine Update
Steven M. Cohen
I once saw a cartoon that showed a librarian sitting at the reference
desk, and the sign in front of her that normally displays the word "librarian"
had been replaced with the phrase "search engine." Many felt that this
was an implication that librarians will be replaced by search engines, but
I interpreted it in a different, and more positive, way. I truly believe
that librarians are the search engines of the Web. If we are being replaced
by Google or Yahoo!, then why is the librarian still sitting at the desk
in the cartoon?
The job of search engine does not come lightly, however. If we are to
take on this monstrous task (and answer more than 15 million queries daily),
we need to have as much knowledge about our "search engine" jobs as is
humanly possible. This means keeping current with new engines and with
how the old standards like Yahoo! and Google are changing on a daily basis,
just as we stay up to date in our field by attending conferences, reading
publications, and subscribing to electronic mailing lists. Like it or
not (and I love it), search engines will continue to become more and more
a part of librarianship.
Every year at this time, I like to take stock of what has happened in
the search engine world over the past year. What engines are still around?
Which engines have changed? Which ones do I use the most for certain queries?
How can I perform the best possible search without being too broad or
too narrow? How can I maximize my input to retrieve a better output?
After checking over the engines that I use most often, I have put together
the following update.
Google, the librarians' engine of
choice, has had a big year. First, they released the full version of Google
Groups, which encompasses more than 700 million Usenet posts dating
back to 1981. I have used this resource to answer many a reference question,
so keep it in mind. Second, a beta version of Google
Catalogs was shown to the public. Google has catalogued more than
four thousand catalogs, created and organized each one into a directory,
and made them fully searchable. As of this writing, this feature is still
in beta, but it promises to be a wonderful tool. In March of this year,
Google released a beta version of a news
search engine. While it only indexes a portion of the content of more
experienced news engines (see Rocket
News and World News), it does have
potential. (Has Google let us down yet?)
For the Web searcher, there was no better news to come from Google than
when they released their Application Programming Interface (API) service,
which allowed millions of hardcore programmers to search the entire Google
database within their own programs. Basically, the makers of Google gave
permission to anyone who has the relevant knowledge to go out and create
their own individual search interface using their database. The search
capabilities that followed were a dream to many librarians. For example,
I have always wanted to see the new sites that were added to the Google
index each day but was never able to limit the search results by date.
Using the Google search created by Fagan
Finder, I can now limit the search to the dates that Google indexed
the sites in their database. There are two important reminders that need
to be addressed before using this search interface. First, Google has
stated that they can't be held responsible for this type of search, as
it is still in beta. Second, Google reindexes more than a million of the
sites already in their database every day, so a basic search using this
resource may produce too many hits. The webmaster at Fagan Finder has
also created the Ultimate
Google Interface, which brings together all of the possible different
search capabilities using Google in one interface.
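To make the date-limited search a little more concrete, here is a minimal sketch in Python. It rests on an assumption not stated above: that the date limit is expressed with Google's daterange: query operator of that period, which expected Julian day numbers rather than calendar dates. The function names julian_day and date_limited_query are illustrative only, and this is not presented as how the Fagan Finder page is actually built.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal and the Julian day number.
JULIAN_DAY_OFFSET = 1721425


def julian_day(d):
    """Return the Julian day number for a calendar date."""
    return d.toordinal() + JULIAN_DAY_OFFSET


def date_limited_query(terms, start, end):
    """Build a query string limited to a date range (assumed daterange: syntax)."""
    return f"{terms} daterange:{julian_day(start)}-{julian_day(end)}"


# Hypothetical search: pages indexed during the first week of June 2002.
print(date_limited_query("library weblogs", date(2002, 6, 1), date(2002, 6, 7)))
```

A narrow range, such as a single day combined with specific terms, should help with the "too many hits" problem mentioned above, since daily reindexing makes broad date searches very large.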
One of the great aspects of searching a commercial database such as LexisNexis
is the intense search capabilities, such as proximity searching. If I
only want stories that include terms that are within four words of one
another, I can do that. This greatly narrows down the results, thus reducing
the amount of information to sift through, saving precious time. Until
recently, there were no public Web search engines that provided this type
of service. When Google released their API service, this was one of the
first search criteria that I looked for. After a few weeks I came across
a site located at http://www.staggernation.com/cgi-bin/gaps.cgi.
Here users can search Google and find words or phrases that appear within
one, two, or three words of one another, plus any other terms that are
needed.
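For readers who want to see what a proximity test amounts to, here is a minimal sketch in Python. It assumes a plain-text story and a very simple word tokenizer, and it counts adjacent words as one word apart. The names tokenize and within_n_words are illustrative; this is not how LexisNexis or the Google API proximity tool is implemented.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within_n_words(text, term_a, term_b, max_gap):
    """Return True if term_a and term_b occur at most max_gap words apart.

    Distance is the difference in word positions, so adjacent words are 1 apart.
    """
    words = tokenize(text)
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(a - b) <= max_gap for a in positions_a for b in positions_b)


story = "The library board voted to expand the budget for new databases."
# "library" and "budget" are six word positions apart in this sentence.
print(within_n_words(story, "library", "budget", 4))
print(within_n_words(story, "library", "budget", 6))
```

The smaller the allowed gap, the fewer stories pass the test, which is why proximity searching cuts down the results so sharply.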
While Google is the tool of choice for many, if not all, librarians,
it is not the only search engine that made news over the past year (and
should not be the only one used by searchers). Teoma,
bought by Ask Jeeves last year, released
the full version of its engine in March of this year to rave reviews.
There are three parts to the results provided by this engine. First, relevant
Web sites are displayed. Second, Teoma displays possible suggestions to
narrow down the search, which is of use for patrons who search the Web
using broad terms. Third, "expert resources" are provided. Teoma has described
these as sites that "feature lists of other authoritative sites and links
relating to the search topic." Much has been written about Teoma
in the popular press and online media, and many think that if there is
one search company that has the capabilities to overthrow Google as the
search engine of choice, then Teoma will be it. I believe this to be true.
While it needs improvement (search experts have noted that the index is
not as fresh as it could be, meaning many of the sites are not reindexed
frequently), librarians need to be aware of its capabilities and features.
Another relatively new engine is Vivisimo,
which has caught the eye of many Web searchers. Vivisimo does not consider
itself a search engine, but rather a "clustering engine," in that
it collects search results from other engines, organizes the results,
and serves them up in an easy-to-use interface. There are two aspects
of this product that are worth mentioning. First, Vivisimo uses the folder
method made popular by Northernlight,
which has since ceased its public search engine. After submitting a query,
Vivisimo will place all of the hits into these folders on the left side
of the screen. Librarians in particular will like this tool as it helps
to narrow down the search, and each folder has many subfolders that contain
fewer hits. The second aspect of Vivisimo is what I believe to be the
future of search engine results.
After performing a search, users have the option to open up the URLs that
are displayed in a new window (which opens up a new browser), full window
(which opens the URL in the current browser), and preview window (which
will open up the URL within the current browser). The preview aspect enables
the user to view and browse sites while in the search engine interface,
saving precious navigation time. While the Vivisimo results are not that
extensive (it does not supply hit results from Google or Teoma), its
ease of use makes it recommendable for the beginning searcher. They
have recently integrated their search with one of my favorite news engines,
World News, which applies the Vivisimo
search technology to query the World
News index.
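The collect-then-organize idea behind a clustering engine can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two result lists, the example.com URLs, and the fixed list of folder keywords are hypothetical, and grouping by title keyword is a deliberately crude stand-in, not Vivisimo's actual clustering method.

```python
from collections import defaultdict

# Hypothetical result lists, as if returned by two different engines.
engine_a = [
    ("Jaguar cars official site", "http://example.com/cars"),
    ("Jaguar habitat and diet", "http://example.org/animals/jaguar"),
]
engine_b = [
    ("Jaguar habitat and diet", "http://example.org/animals/jaguar"),
    ("Used Jaguar cars for sale", "http://example.net/used-cars"),
]

# Hypothetical folder labels; a real clustering engine derives these itself.
FOLDER_KEYWORDS = ["cars", "habitat"]


def merge_results(*result_lists):
    """Combine result lists, keeping only the first copy of each URL."""
    seen, merged = set(), []
    for results in result_lists:
        for title, url in results:
            if url not in seen:
                seen.add(url)
                merged.append((title, url))
    return merged


def cluster_into_folders(results, keywords):
    """File each result under the first keyword found in its title."""
    folders = defaultdict(list)
    for title, url in results:
        words = title.lower().split()
        label = next((k for k in keywords if k in words), "other")
        folders[label].append(url)
    return dict(folders)


folders = cluster_into_folders(merge_results(engine_a, engine_b), FOLDER_KEYWORDS)
for label, urls in folders.items():
    print(label, urls)
```

Even in this toy version, a broad query such as "jaguar" splits into a folder about cars and a folder about the animal, which is the narrowing effect described above.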
A brief update on All the Web:
During the writing of this article, this engine had a complete overhaul
to its interface, making life easier for the searcher. If you are not
a regular user of All the Web, I would suggest you take a closer look
as there are aspects to this engine that Google has not been able to match,
such as the extensive news (Google's news search is still in beta), video,
mp3, and ftp searches. I use the site primarily for news purposes, but
the main engine is a great backup if Google lets you down (and that has
happened to me many times). Last, their database is among the top in
size and freshness according to Search
Engine Showdown, a popular search engine comparison site.
There are many other engines that have been updated, created, and ceased
over the past year and are worth discussion, but the ones described above
are the ones that will affect the searcher the most. Librarians need to
continue to stay current with search engine developments as they may affect
the results of any search. Two sites that will help the searcher with
currency are the Virtual Acquisition
Shelf and News Desk and Pandia
weblogs. Happy searching!
Steven M. Cohen is Assistant Librarian at the law firm of Rivkin Radler,
LLP. He can be reached at Steven.Cohen@Rivkin.com.
Reference List
All the Web http://www.alltheweb.com
Ask Jeeves http://www.ask.com
Fagan Finder Google Date Search http://www.faganfinder.com/engines/google.shtml
Fagan Finder Ultimate Google Interface http://www.faganfinder.com/google.html
Google http://www.google.com
Google API Proximity Search http://www.staggernation.com/cgi-bin/gaps.cgi
Google Catalogs http://catalogs.google.com
Google Groups http://groups.google.com
Google News Search http://news.google.com
LexisNexis http://www.lexis.com
Northernlight http://www.northernlight.com
Pandia Weblog http://www.pandia.com/searchworld/index.html
Rocket News http://www.rocketnews.com
Search Engine Showdown http://www.searchengineshowdown.com
Teoma http://www.teoma.com
Virtual Acquisition Shelf and News Desk http://resourceshelf.freepint.com
Vivisimo http://www.vivisimo.com
Vivisimo World News Search http://vivisimo.com/demos/WorldNews.html
World News http://www.wn.com