An interview with Seth Earley about Linked Data
The role that taxonomies can play in Linked Data applications.
Earley & Associates is one of the biggest names in taxonomy development, and founder Seth Earley will be giving a talk on "Building a Practical Semantic Framework: The role of taxonomies and controlled vocabularies in data integration" at the Linked Data Planet conference next week. My recent reading makes the world of taxonomy development look a lot more mature than the ontology development that plays such a significant role in the semantic web, especially in terms of identifying concepts and relationships in a way that helps businesses achieve specific goals. I interviewed Seth via email to learn more about his company and their relationship to the burgeoning world of Linked Data techniques and practices. (As a side note about taxonomies and Linked Data, I recently learned from Kingsley Idehen's blog about a very interesting Linked Data application of one of the most important taxonomies in the US: the Library of Congress Subject Headings. If you follow the links in his bulleted list, remember to do a View Source on them.)
1. Tell me a little about your company.
Earley & Associates delivers consulting and application development services that help companies leverage their internal expertise and knowledge-creating capabilities. We specialize in:
- Enterprise taxonomy development
- Content management & Knowledge management
- Technology advisory
- Search strategy & integration
- Change management & governance
- Training & workshops
We are a small company of around 15 full-time consultants, but we work with all sizes and types of organizations. Some of our recent clients include:
- Motorola
- The Hartford
- The Ford Foundation
- Hasbro Inc.
- The Coca-Cola Company
We are recognized within the industry as thought leaders and many of our consultants speak regularly at conferences and workshops including:
- Enterprise Search Summit
- Enterprise3 Portals, Collaboration & Content
- Taxonomy Bootcamp
- KM World & Intranets
We also maintain a regular CoP call series covering a diverse range of topics from search, taxonomy & metadata to usability testing and web analytics.
"The most important aspect of the question is deciding what the real application of either taxonomy or ontology will be, and making sure you have the metrics in place to be able justify the effort it takes to develop either one."
2. What does the idea of Linked Data mean to you?
I think Linked Data is really an extension of concepts and questions that we have been dealing with in the information management field for years. Which is to say, how can we make meaningful connections between the information that we use to do our work? How can we understand it within a context?
In the case of Linked Data, we are attempting to expand this notion of connections or linking from strictly web pages and documents to structured data and other types of resources that can be represented through RDF, and making those connections explicit.
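As a rough illustration of what "making those connections explicit" can look like, the sketch below models RDF-style statements as plain subject-predicate-object tuples in Python. The example.org URIs and the two helper functions are hypothetical and exist only for this example; `dcterms:subject` and `dcterms:references` are real Dublin Core terms. A real application would more likely use an RDF library than bare tuples.

```python
# Minimal sketch: RDF-style triples as (subject, predicate, object) tuples.
# Every example.org URI below is made up for illustration.
DCTERMS = "http://purl.org/dc/terms/"
EX = "http://example.org/"

triples = [
    # A document is linked to a topic and to a structured dataset.
    (EX + "report/42", DCTERMS + "subject", EX + "topic/supply-chain"),
    (EX + "report/42", DCTERMS + "references", EX + "dataset/shipments-2008"),
    # The dataset carries its own explicit link to the same topic.
    (EX + "dataset/shipments-2008", DCTERMS + "subject", EX + "topic/supply-chain"),
]


def links_from(resource, graph):
    """Return (predicate, object) pairs for every statement about `resource`."""
    return [(p, o) for s, p, o in graph if s == resource]


def resources_about(topic, graph):
    """Return every resource explicitly linked to `topic` via dcterms:subject."""
    return sorted(
        s for s, p, o in graph if p == DCTERMS + "subject" and o == topic
    )
```

Because each link names its relationship, a document and a dataset can be found through the same topic without any guesswork about what the link means.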
3. What can Linked Data practices and technologies bring to the challenges that Earley & Associates clients are facing?
For the most part, our clients have come to recognize the incredible challenge of creating a shared semantic framework within their organization. In this case, we understand the term "semantic" not in reference to the semantic web, but in relation to a controlled vocabulary that has a particular meaning to an organization and the content it manages.
In our experience, most organizations have not yet reached a level of information management (IM) maturity at which linked data practices are really relevant to their current needs.
That being said, there is incredible potential for linked data technology to create a richer information environment both on the semantic web and in the organization. The explicit nature of the links made using RDF certainly presents a new level of granularity in defining the relationships of one item of content to another.
4. How would you distinguish "Linked Data" projects from "Semantic Web" projects? Or would you?
I suppose it’s possible to invest in linked data projects that are enterprise focused, in that the information lives outside the semantic web behind a firewall. However, the main driver around the creation of linked data is to build the semantic web and create links between disparate data sources. I think the business case is really still in its early stages.
5. Semantic Web discussions often bring up the role of ontologies. Is it possible to differentiate between the potential roles of taxonomies and ontologies in Linked Data and/or Semantic Web efforts?
The line between what is possible to represent with a taxonomy and what is possible to represent in an ontology is a fuzzy area. Taxonomies, in a traditional sense, are solely hierarchical in nature, representing a general-to-specific relationship, whereas an ontology is capable of representing a much larger range of relationships.
However, in our work with clients developing taxonomies, the inclusion of polyhierarchical relationships, as well as reciprocal "see also" relationships, has become commonplace. Now these types of relationships certainly fall outside of the most traditional taxonomy definitions but also fall short of the complexity that can be modelled with RDF and OWL.
I think the most important aspect of the question is deciding what the real application of either taxonomy or ontology will be, and making sure you have the metrics in place to be able to justify the effort it takes to develop either one.
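To make the middle ground described above concrete, here is a minimal Python sketch of a taxonomy that allows polyhierarchy (a term with more than one parent) and reciprocal "see also" links. The class names, method names, and sample terms are all hypothetical; the `broader` and `related` field names are borrowed from SKOS, but this is not a SKOS implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    label: str
    broader: set = field(default_factory=set)  # parents; more than one means polyhierarchy
    related: set = field(default_factory=set)  # reciprocal "see also" terms


class Taxonomy:
    def __init__(self):
        self.concepts = {}

    def add(self, label, broader=()):
        """Create the concept if needed and attach any broader (parent) terms."""
        concept = self.concepts.setdefault(label, Concept(label))
        for parent in broader:
            self.concepts.setdefault(parent, Concept(parent))
            concept.broader.add(parent)
        return concept

    def see_also(self, a, b):
        """Record a "see also" link in both directions."""
        self.add(a).related.add(b)
        self.add(b).related.add(a)

    def ancestors(self, label):
        """Return every broader term reachable from `label`, across all parents."""
        seen = set()
        stack = list(self.concepts[label].broader)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.concepts[current].broader)
        return seen


# Hypothetical sample terms, chosen only to show two parents for one concept.
t = Taxonomy()
t.add("Products")
t.add("Toys", broader=["Products"])
t.add("Electronics", broader=["Products"])
t.add("Electronic Toys", broader=["Toys", "Electronics"])
t.see_also("Electronic Toys", "Batteries")
```

This structure has only two kinds of relationship, which is the sense in which it falls short of an ontology: RDF and OWL allow arbitrary named relationships and constraints on them.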
6. Some enterprises have already invested in taxonomies. How can they leverage this in Linked Data projects?
This really comes down to the nature of the taxonomy itself. Proponents of the semantic web recommend the use of standard vocabularies (e.g., FOAF, SIOC, DOAP, etc.) for representing content.
If the taxonomy that an organization has already invested in represents a very specific and organization-centric domain of information, there may be a lot of work required to align it with the standardized vocabularies recommended for the semantic web.
Again, I think it comes down to planning and alignment of effort with an overall information strategy. Anytime you decide to describe a piece of information so that it can be shared, you enter a highly charged and political world. Building a taxonomy is as much about understanding people as it is content. If that understanding can be shared through a linked data project, then great. However, I would suggest that a key priority of most organizations is still understanding what the value and meaning of their own content is to them.
Comments
(Note: I usually close comments for an entry a few weeks after posting it to avoid comment spam.)
As someone who does with ontologies roughly what Seth's company does with taxonomies, I commend him for a very clear, fair, and honest assessment (by my lights) of the differences between taxonomies and ontologies. I couldn't agree more that neither is better than the other per se, and which you need depends on use cases, resources, and other engineering tradeoffs.
That said, I don't agree at all with Bob's claim about relative maturity of development tools or available taxonomies versus ontologies; but, then, I wouldn't tend to. ;>
Posted by: Kendall Clark | June 18, 2008 2:57 PM
Hi Kendall,
I never said anything about tools--in fact, the more I study the taxonomy tools out there, the more I think that the free combination of SWOOP and SKOS has a lot more value than several taxonomy development products that cost hundreds of dollars. Someone just needs to write out a bit of code to spit out some of the standard reports that those tools typically offer. (Hello, lazy semweb...)
There probably are more available ontologies than taxonomies out there, but that's part of the problem. While companies like Clark & Parsia are basing client ontologies on serious analysis of the client's business goals and needs, the "if you build it maybe they'll come" thrown-together ontologies have really multiplied like rabbits out there over the last few years. I used the term "maturity" because carefully codified taxonomies designed to aid the management of information have been around a lot longer than their ontological equivalents.
Posted by: Bob DuCharme | June 18, 2008 9:57 PM
Bob, okay, I take yr point to make more sense now that I understand what you were saying. But on the marketing level, it's a bit of suckage that *most* of the "ontologies" you are talking about are RDF Schemas, and really just, technically, taxonomies, rather than full-on ontologies. And yet, marketing-wise, you're calling them "ontologies" which implies suckage of the wrong technology! :>
More seriously, there's a lot of crap out there, OWL, RDF, XML, SQL DDL, etc. I don't think any of those technologies is any more prone to crap than any other, not in the aggregate. OWL is probably the hardest, but then it *seems* hard, which tends to warn off people who don't really know what they're doing.
That make any sense?
Posted by: Kendall Clark | June 24, 2008 3:58 PM