Anyone who follows the XML or semantic web world knows of Uche Ogbuji's work. His presentation "Linked Data: The Real Web 2.0" will be one of the first talks on the first day of the Linked Data Planet conference next week; as we prepare for it, I asked him a few questions about his work with Linked Data and the benefits it's brought to clients of his company, Zepheira.
Tell us a little about Zepheira.
Zepheira provides solutions for data integration, focusing on Semantic technology. But let me back away a bit from the straight corporate line. Think of how we traditionally deal with data in informatics. We look to fit data into neat partitions, shepherd it along neat lines and fit it into a grand unified theory. All good, hard science. The problem is that data isn't so easily quantized. It's a living, temperamental entity that absorbs bits of personality from everyone who touches it. Dealing effectively with data requires art, and at Zepheira we really look to the art of data rather than to the science of code. We think adopting the right conventions for data that accommodate its unpredictable qualities is the key to so many of the problems that have dogged IT, and we believe that the web is the most successful set of conventions in this regard. In general we look to apply web architecture to enterprise problems. This brings us right in line with the Linked Data concept, which is really just a way to distill the essential keys to web architecture in a way any developer could tick off on his fingers. At Zepheira we start with such principles as the body of art, and we bring together folks who've proven themselves as journeymen and masters in this art, and we think this positions us to offer particularly effective solutions to our customers.
What does the idea of Linked Data mean to you?
To me Linked Data means building on the basic framework of the Web, originally designed for documents. Using a set of four basic principles articulated by Tim Berners-Lee, we extend it to provide a similarly rich information space for granular data. We do so by using semantically rich hooks or translations of the essential data in Web pages, and by creating new Web information sources primarily in semantically rich formats. RDF is the format of choice for Linked Data, but more importantly it is the data model for merging information (for query, "mash-up" and much more); the physical format can be anything from which RDF-like semantics are readily extracted.
What makes Linked Data exciting is that it is a vehicle for the future (semantic web) without straying too far from what has worked so well in past and present. Whether they come in through the door of Microformats, Web feeds or JSON APIs, Web developers can ease into Linked Data ideas, whereas the original message for the semantic web was focused on a major shift to new technologies that seemed too alien and complex to the average Web developer.
What differences do you see from the idea of the "semantic web"?
I think the term 'semantic web' actually covers two separate, but related ideas. On one hand it's a goal: a web where information context is curated as carefully as the presented data. On the other it's a methodology: a specific set of techniques advocated for achieving that goal. Linked Data is just another methodology towards the same goal. Linked Data is simpler because rather than requiring sophisticated and exhaustive declaration of the data, as with OWL, it merely requires that you use links effectively, and do what you can to express the basic relationship semantics of those links. It's a much lower barrier to entry, and though the resulting context might not be rigorous enough for a logician, it's a big enough leap that I believe the result merits the term "semantic web".
Are you seeing current or near-term benefits from linked data technology with Zepheira client projects?
Definitely. In rapid prototypes for clients we're usually able to give them new analysis and decision-making capabilities. We often find that after we've produced a few deliverables, clients get much more ambitious because they see new possibilities. I think this is in large part down to web architecture, and thus Linked Data. It's not really black magic; the trick is usually to convert existing data sources using Linked Data techniques, which allows us to very quickly integrate across departments, viewpoints and specific application capabilities. It's the sort of integration that unfortunately IT is too used to associating with heavily-staffed, multi-year projects. We've found Linked Data to be a prodigious accelerator.
It may not be black magic, but again it is all about the art. We've put together a pretty reliable sequence of solutions beginning with START, which is a seminar/workshop combination to analyze the benefits and ideal targets of technology such as Linked Data at a specific client. Once the client has a target project in mind, we have 3D, which is a carefully crafted package to accelerate the use of Linked Data in the project. 3D is like taking the general ideas of a homeowner sketching out their dream home, placing them into a particular architectural school, and producing blueprints, detailed materials manifests, subcontractor plans for framing, wiring, plumbing and more. 3D itself does not include implementation (building the home) because many of our clients would prefer that we prepare their internal development teams for that. When we are called for implementation we often use Remix, a web-based application built on SIMILE and other Linked Data open source products, to which we've made many commercial enhancements. That gives us a ready platform for the sort of rapid and rich integration I mentioned above. In effect we've built an entire solutions stack on Linked Data.
At the Linked Data Planet conference, you'll be talking about the Linking Open Data initiative. Where does this fit into the larger picture of Linked Data technology?
I think it's a pretty fuzzy line, but the way I try to organize it in my head is that Linked Data is a broader concept encompassing four main principles. The LOD initiative is a project associated with a more specific range of techniques and a particular kernel of sites (most notably DBPedia), providing a practical basis for expanding the field of information available as Linked Data. At Zepheira we tend towards techniques popularized in LOD, but clearly for most of our clients the data can't be thrown into a cloud of public data. So we've added our own refinements, leading to what we've started to call Linking Enterprise Data (LED), to increase the definition of the Linked Data principles as applied to organizational data integration and decision support needs.