|
Public
Libraries
PLDS
publications list
shared resources list
ALA Online Store
audiotapes
Tech Notes
|

Metadata: Always More Than You Think
Metadata is often defined offhandedly as data about data
or information about information which doesnt actually
tell you anything. The prefix meta comes from the Greek and
can indicate change, as in metamorphosis; or it can mean beyond
or after, as in metaphysics. In information technology usage, the
word metadata has come to be used as a definition or description of data:
a small indicator that encompasses and points to a larger piece of information.
The library card catalog is the standard metaphor for metadata: each card
represented and led the user to a much larger body of information, the
book or other item cataloged.
So, metadata is a succinct and systematic set of information that
references, and can be used to efficiently and accurately retrieve, a
larger set of information (Robert DeCandido, in his chapter on metadata,
Internet Searchers Handbook, 2d ed). Metadata includes indexing
and cataloging but goes far beyond those as it is applied to electronic
resources: metadata can be used to describe geospatial information, music,
art images. There is as yet no single standard for metadata, although
various groups, both library and non-library related, are working furiously
to develop one.
What does metadata do?
Metadata enables searchers to find information in cyberspace and to see
how it relates to other information. The traditional methods of cataloging
do not work for the Web: the proliferation of data makes that impossible.
So the use of metadata, what Milstead
and Feldman call cataloging by any other name, provides
a way to organize Web information in a way that makes it retrievable.
Librarians have a professional understanding of the need for standardization
which may not be shared by others working on metadata. Milstead and Feldman
write:
Information professionals ... are concerned both with how to write
down the descriptive information and what to write down. In contrast,
most current non-library approaches to this problem address the structure
of data rather than its contents, and this represents a severe shortcoming.
In other words, they are concerned with what fields are established, but
not particularly worried about which terms to put in them.
What does metadata look like?
The metadata set we are going to look at comes from the Dublin
Core, which was created for text-based Web documents by an OCLC sponsored
group. The Dublin Core offers papers, workshops, discussions and working
groups to facilitate the discovery of electronic resources.
The fifteen element Dublin Core set is intended to be usable across a
broad spectrum, with no more complexity than that of a library catalog
card. The Dublin Core Elements are Title, Creator, Subject, Descriptions,
Publisher, Contributor, Date, Type, Format, Identifier, Source, Language,
Relation, Coverage, and Rights, as described by Sutart Weibel in a
report in April 1999.
The Dublin Core Metadata Initiative regularly produces a status report,
and its latest to date is April
2001.
Some of these elements are familiar to anyone who has ever seen a bibliographic
record: title, author, subject. Some of these elements are technical in
nature, covering information crucial for Web-based documents, like file
size. Still others are indicative of the brave new world of cyberspace,
like what entity holds the rights to the material.
The developers of the Dublin Core and the World
Wide Web Consortium, another group working on standardization, are
closely involved with each other, but anyone can invent a metadata scheme,
and groups whose focus is on art images or visual modeling, for example,
are working on completely different element sets. Metadata is being developed
for different kinds of material, but metadata can also be mapped to other
metadata schemes. What that means is, for example, that the Dublin Core
<title> tag can be related to the MARC record title fields. This
kind of mapping holds out some hope for the many metadata schemes being
developed for different needs in differing disciplines. A related OCLC
project is CORC, the Cooperative
Online Resource Catalog, a project exploring the creation and sharing
of metadata by libraries.
On the Web, if you View Source in Netscape, metadata elements may look
like this (below are some of the Dublin Core metadata elements for this
Tech Note).
<META NAME="DC.Title" CONTENT="PLA Tech Note: Metadata">
<META NAME="DC.Subject" CONTENT="Metadata, Public Library
Association, SGML, XML">
<META NAME="DC.Description" CONTENT="A basic description,
with links, of what metadata is and does, for the Public Library Association">
<META NAME="DC.Publisher" CONTENT="Public Library Association,
a division of the American Library Association">
<META NAME="DC.Creator" CONTENT="GraceAnne A. DeCandido">
These tags provide some basic metadatasome basic cataloging informationabout
what you are reading. You will recognize elements that look like a catalog
record: title, author, publisher (PLA).
Metadata can be created when the object it describes is created or it
can be added later, as in traditional cataloging. The profusion and proliferation
of Web pages seem to almost force the former rather than the latter.
Search engines and metadata
So far, none of the major search engines reads Dublin Core metadata.
Some search engines that do (with a little tweaking) include Ultraseek,
Swish-E, Microsofts Index Server, Autonomy Knowledge Server, Blue
Angel Technologies MetaStar, and Verity Search 97 Information Server.
Search engines work by matching query terms to the words in a document
often through a complex algorithm, as Milstead and Feldman point out.
If the terms do not match, the engine doesnt find the document.
The use of metadata elements can resolve that.
The use of metadata elements, by providing consistent tagging with their
own controlled vocabulary, also resolve three specific language problems,
according to Milstead
and Feldman:
Polysemy: words that have multiple meanings like spring
or pitcher. Metadata will enable searchers who want information
about water sources, those who want information about seasonal planting,
and those looking for a new mattress to find what they are looking for
without a lot of extraneous hits. The same is true for those looking for
baseball stats or vessels for orange juice.
Synonymy: words that represent the same concept but with different
shades of meaning, like plump or fat or obese.
Ambiguity: think pitcher, or spring again. Controlled vocabularies
or standardized terms used in metadata schemes can retrieve documents
precisely even if the actual term is never used in its text.
Garden-variety, basic HTML does offer a meta-tag. Meta-tagging
allows the creator to insert words that search engines may pick up,
to increase number of hits and to provide a wider net of words for searchers
to find, but this is an uncontrolled vocabulary indeed. Meta-tags with
irrelevant words, often scurrilous or sexual in nature, can be added to
a Web document in a meta-tag in order to attract more users to the site.
Because of this kind of faux-metadata, called spamdexing, some engines
ignore meta-tags completely.
Metadata elements like the Dublin Core offer a possible method of organizing
and accessing the information on the Web. The challenge is not only to
create a set of elements that can be universally agreed upon, but to foster
widespread usage of those elements.
Large blocks of text, images, correspondence, and other materials that
libraries and researchers wish to place on the Web for access and use
by all are being made so by SGML, a metalanguage that describes a collection
of data in retrievable ways.
SGML, XML
In general discussions of metadata, SGML and lately, XML always come
up. SGML and XML are metalanguages (A
Gentle Introduction to SGML) and may include metadata elements.
SGML stands for Standard Generalized Markup Language. It employs ASCII
code to describe text; virtually all computers understand this code. HTML
is a very simple form of SGML. Using SGML, names, titles, subjects, links
between texts and images, can all be identified and searched. There
is almost no limit to the amount of information and intelligence that
can be added to text [with SGML] giving it the structural advantages of
a database without losing the discursive advantages of text. (Robert
DeCandido).
A type of SGML for encoding archival and library finding aids has been
developed, called Encoded Archival
Description, or EAD. EAD promotes consistency in finding aids in the
same institution and among institutions, so that researchers can search
many finding aids at once regardless of their locationan excellent
example of how metadata can be a vital tool in the quest for information.
SGML, however, can only be read by specialized browsers, so it must be
translated to HTML if it is to have widespread accessibility. SGMLs
limitations as a tool for the Web may be addressed by XML
(Extensible Markup Language). XML is SGML streamlined, relying on
tags that almost always come in pairs, and on a new standard called Unicode
that supports text in all major languages (see also the PLA
Tech Note on Unicode). XML tagged documents can be poured, via stylesheets,
into audio format, print, or Web pages.
Talking about metadata sometimes feels like talking about faerie. Information
about information about information moves into a deeply abstract realm,
more related, perhaps, to epistemology or physics than real words on the
page. Chris Armstrong wrote puckishly, in the January 1999 Information
World Review, that writing about metadata could be described as metametadata.
But metadata may hold the key to finding the precise piece of information
on the Web for which we were searching.
Bibliography
Milstead, Jessica and Susan Feldman. Metadata:
Cataloging by any other name ... in Online, Jan/Feb 1999,
p2431. A very lucid description of what metadata is and does, what
the term means, and what the challenges are in implementing metadata schemes.
Metadata:
Projects and standards same authors, same issue.
DeCandido, Robert, Metadata: Whats it to you? in The
Internet Searchers Handbook, 2d ed., Neal Schuman, 1999.
Bosak, Jon and Tim Bray, XML
and the Second-Generation Web, Scientific American, May
1999, p89-93. What XML is and does, by two people who worked on its development.
The indefatigable Robin Cover maintains the most complete coverage of
SGML/XML issues.
Chepesiuk, Ron. Organizing the Internet: The Core of
the Challenge, American Libraries, January 1999, p60-63.About
the Dublin Core and CORC, the Cooperative Online Resources Catalog.
Dorman, David. Metadata Musings American Libraries,
January 1999, p102. One straightforward page, clearly presented.
Weibel, Stuart. The State of the Dublin Core Metadata Initiative April
1999. Bulletin of the American Society for Information Science,
June/July 1999. p1822. A shorter version of the complete text in
D-Lib Magazine.
Christensen, Deborah. Golden Retrievers School Library Journal,
November 1999. Nice introduction to metadata concepts.
David Dorman, The Season of Metadata at the Annual Dublin Core
Workshop in Ottawa in Computers in Libraries, January 2001.
p2629.
Prepared by GraceAnne
A. DeCandido for the Public Library Association, June 13, 1999; reviewed
April 2000; links updated May 2001. ladyhawk@well.com
The Public Library Associations Tech Notes project grew out
of the desire to continue the work of Wired for the Future: Developing
Your Library Technology Plan by Diane Mayo and Sandra Nelson, published
for PLA by ALA in 1999. Each of the Tech Notes is a Web-published document
of 1,5002,500 words, providing an introduction and overview to a
specific technology topic of interest to public libraries at a particular
point in time. Topics were identified by PLAs Technology for Public
Libraries Committee. Each Note is marked with the date of its completion
and posting, and updates are noted.
Readers comments and suggestions are welcome
and should be addressed to pla@ala.org.
Please use Tech Notes in your subject line.
|