After following Dave Beckett's pointer to Stefano Mazzocchi's essay On the Quality of Metadata last week, I remembered that while we have people like Stefano and Bruce D'Arcus among us with stronger backgrounds in more classical approaches to metadata, most geeks think that technology from ten years ago is ancient history. I'd like to recommend two books I've read recently for the historical background they provide on the creation, organization, and use of metadata to locate information: Peter Morville's Ambient Findability and Elaine Svenonius's The Intellectual Foundation of Information Organization.
Morville's book focuses on "findability" as an engineering discipline. When you create something on the web, it's no use to anyone if they can't find it. While there is a seamier side to the search engine optimization efforts of people who see it as a way to get rich quickly (with yet another technology trail blazed by the porn industry), it's a real problem for respectable companies with serious offerings. He writes that
Hewlett-Packard has taken findability a step further by defining a "Findability Group" that includes an interdisciplinary team responsible for user interface design, information architecture, and search, thereby creating a vital bridge across vertical silos. Perhaps we will see more findability engineers and findability teams in the coming years.
He focuses on metadata and classification as ways to improve findability, with an eye on the implications of the new information delivery technologies cropping up around us. While many discussions of metadata that you read briefly mention card catalogs before plowing into talk of RDF and the semantic web, Morville's library science background gives him a broader perspective on the work that's gone on for over a hundred years to create usable metadata. His breathless buzzword slinging ("RFID is a disruptive technology poised to shift paradigms"..."Millions of bloggers swap memes in exchange for karma, whuffie, and other tokens of a reputation economy") makes the book read like something from Wired magazine and will make it look dated pretty quickly, but his efforts to draw on pioneers of information science to inform our approaches to new classes of content delivery systems make his book worth reading.
While "Ambient Findability" may be more of a beach read than the MIT Press book "The Intellectual Foundation of Information Organization," I had no problem reading Svenonius's book on the beach at Cape May last summer. While Morville was a student of library and information science, Elaine Svenonius is a professor of library and information science at UCLA, and provides a more sober, rigorous treatment of information organization issues in a book that, without backmatter, is roughly the same length as Morville's.
Svenonius describes practical and theoretical background in the development and use of metadata with a good historical context. The book covers milestones such as Anthony Panizzi's mid-nineteenth-century plan for organizing books in the British library, the beginning of library "science" in the 1930s (now known as "Information Science") at the Chicago Graduate Library School, Cyril Cleverdon's invention of the precision/recall distinction in the late 1950s, and the development of Dublin Core. Did you know that Colon Classification (keep your intestinal jokes to yourself and see Yahoo for an example, where they use a greater-than character instead of a colon) was invented in 1924? She shows examples of how the past can teach us lessons about dealing with new technology, especially regarding the politics of standardization:
In the 1940s and 1950s, the Library of Congress also turned to specialists to draft rules for its growing collection of motion pictures, sound recordings, and pictures. The Library of Congress rules proved difficult to use and, as a result, were rejected by most school and public libraries. This led to a proliferation of locally developed manuals to describe nonbook materials, simultaneously abrogating the standardization principle and that of integration.
Early in the 1970s reaction set in, and a swing began away from specialization and toward integration. Committees first in Canada and then in England and the United States began to formulate rules for nonbook materials that would be compatible with those used for books.
It's not all libraries and Dewey Decimal Systems; she covers many topics that are important to data geeks such as keyword searching, faceted classification, issues around creating and imposing controlled vocabularies, mapping of different names for the same entity such as "Mark Twain" and "Samuel Clemens," and especially bibliographies. The word "bibliography" may bring to mind some of the drearier aspects of reading boring books to write boring papers in school, but bibliographies are ultimately about the creation and organization of metadata to make a work easier to find, and a lot of sharp people have thought hard about this for a long time.
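For data geeks who think better in code, here is a rough sketch of the name-mapping idea: a tiny "authority file" that sends variant names to one preferred heading, so that a search under either name finds the same works. This is only my own illustration, not anything from the book; the AUTHORITY table, the RECORDS list, and the function names are all made up for the example, and a real catalog would handle far more than lowercasing and whitespace.

```python
# Hypothetical authority file: variant name -> preferred heading.
# The entries and the heading format are illustrative assumptions.
AUTHORITY = {
    "mark twain": "Twain, Mark, 1835-1910",
    "samuel clemens": "Twain, Mark, 1835-1910",
    "samuel langhorne clemens": "Twain, Mark, 1835-1910",
}

# Hypothetical catalog records, each filed under whatever name was supplied.
RECORDS = [
    {"title": "Adventures of Huckleberry Finn", "author": "Mark Twain"},
    {"title": "Life on the Mississippi", "author": "Samuel Clemens"},
    {"title": "Moby-Dick", "author": "Herman Melville"},
]


def preferred_heading(name):
    """Return the preferred heading for a name, or the name unchanged if unknown."""
    # Normalize case and collapse runs of whitespace before the lookup.
    key = " ".join(name.lower().split())
    return AUTHORITY.get(key, name)


def search(records, author):
    """Return titles whose author resolves to the same heading as the query."""
    target = preferred_heading(author)
    return [r["title"] for r in records if preferred_heading(r["author"]) == target]


if __name__ == "__main__":
    # Both titles by the same person come back, whichever name is used.
    print(search(RECORDS, "Samuel Clemens"))
```

Names that aren't in the table simply fall through unchanged, which is roughly what happens to a keyword search with no controlled vocabulary behind it.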
Her afterword shows how well her perspective extends to the future:
Two trends appear to be dominating current research and development. One is the increasing formalization of information organization as an object of study through mathematical and entity-relationship modeling, linguistic conceptualization, definitional analysis of theoretical constructs, and empirical research. The second is the increasing reach of automation to develop new means to achieve the traditional bibliographic objectives, to design intelligent search engines, and to aid in the work of cataloging and classification.
Her book doesn't read like a Wired magazine article, but it's not long, and it provides great background in how we got where we are with metadata, which is important to know if you're interested in where it's going.
Hi Bob,
FWIW, I read Elaine S.'s great book a few years back, can't remember how I got onto it, except that I had a pretty intense self-study program going for a while (in the 2000-2002 range) on library and information science, and Svenonius's book is so foundational, I just had to read it. Anyone interested in metadata and information architecture should read it, IMO.