Making up URIs

Or not.

I love this recent quote from Jim Hendler:

If you and I decide that we will use the term "http://www.cs.rpi.edu/~hendler/elephant" to designate some particular entity, then it really doesn't matter what the other blind men think it is, they won't be confused when they use the natural language term "elephant" which is not even close, lexigraphically, to the longer term you and I are using. And if they choose to use their own URI, "http://www.other.blind.guys.org/elephant" it won't get confused with ours.

If you wrote a schema in which you defined a namespace and documented specific terms to mean specific things within that namespace, I can use those URIs to have the same meanings in my own application—but it's not always that simple.

Let's say that there's a metadata standard called xyz, and they've declared a schema somewhere. My document at http://www.snee.com/docs/mydoc1.xml, which uses this standard, begins like this:

  <document xmlns:xyz="http://www.xyz.org/schemas/docmetadata/">
    <xyz:header>
      <xyz:foo bar="56H">northwest</xyz:foo>
    </xyz:header>

I feel confident that the following RDF triple is a sensible way to say that this document has a foo value of "northwest", with no ambiguity about whose definition of "foo" I'm using:

  <http://www.snee.com/docs/mydoc1.xml>
  <http://www.xyz.org/schemas/docmetadata/foo>
  "northwest".

I'm confident, that is, unless xyz:foo appears in another context in the same document (for example, as a child of another element besides xyz:header) with a different value. In a somewhat related issue, bar is an attribute above, and while the xyz.org people defined it as having a specific meaning in their schema, in my document above that conforms to their schema it doesn't belong to any namespace, so it doesn't feel right to create an RDF predicate for the bar value that begins with http://www.xyz.org/schemas/docmetadata/.
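To make both points concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the fragment above is completed with closing tags and that the namespace is declared with the standard xmlns:xyz attribute; the names SAMPLE, split_name, foo_triples, and unqualified_attributes are made up for illustration. It derives the triple's predicate from each xyz:foo element's namespace plus local name, and it lists attributes that belong to no namespace, which is exactly the position bar is in.

```python
import xml.etree.ElementTree as ET

DOC_URI = "http://www.snee.com/docs/mydoc1.xml"
XYZ_NS = "http://www.xyz.org/schemas/docmetadata/"

# Assumption: the fragment from the post, closed off so that it parses.
SAMPLE = """<document xmlns:xyz="http://www.xyz.org/schemas/docmetadata/">
  <xyz:header>
    <xyz:foo bar="56H">northwest</xyz:foo>
  </xyz:header>
</document>"""


def split_name(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return None, tag


def foo_triples(xml_text):
    """One (subject, predicate, object) tuple per xyz:foo element, in any context."""
    root = ET.fromstring(xml_text)
    triples = []
    for elem in root.iter("{%s}foo" % XYZ_NS):
        ns, local = split_name(elem.tag)
        triples.append((DOC_URI, ns + local, (elem.text or "").strip()))
    return triples


def unqualified_attributes(xml_text):
    """(element local name, attribute name, value) for attributes in no namespace."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            ns, local = split_name(name)
            if ns is None:
                found.append((split_name(elem.tag)[1], local, value))
    return found


for s, p, o in foo_triples(SAMPLE):
    print('<%s> <%s> "%s" .' % (s, p, o))
print(unqualified_attributes(SAMPLE))
```

Because foo_triples ignores where each xyz:foo sits in the tree, a second xyz:foo under some other parent would produce a second triple with the same subject and predicate, which is the ambiguity I'm worried about.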

Is the best practice for defining URIs for such information to just make up my own URI for its predicate around a domain name that I have control over, and then use OWL to define an equivalence if the xyz.org people (or anyone else) define their own URI and triples for bar and I want to aggregate their triples with mine?

I'm guessing that this is the case based on the output of the MIT Simile RDFizer project's RDF version of a sample MODS document. (Must... fight... temptation to link to something Quadrophenia-related...) The MODS URI http://www.loc.gov/mods/v3 doesn't appear anywhere in the RDFizer representation of the MODS data, and most properties in the RDFizer version are in the namespace http://simile.mit.edu/2006/01/ontologies/mods3#. So, just as the RDFizer folk built something around the http://simile.mit.edu namespace that they had control over, I should probably do the same with my own domain name for information from the xyz.org schema instead of trying to make up my own conventions for representing attributes and different contexts for the same element from the http://www.xyz.org/schemas/docmetadata/ URI. For example, http://www.xyz.org/schemas/docmetadata/foo/bar makes sense to me as a way to represent the bar attribute, but xyz.org is not my domain name to build naming conventions around, so I'd be better off representing this as http://www.snee.com/ns/xyz/foo/bar. Right?
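As a rough sketch of what that would look like, the following Python builds a predicate URI under my own domain and writes an owl:equivalentProperty triple linking it to someone else's URI. Several things here are assumptions for illustration only: the helper names mint_predicate and ntriple, the choice of the document URI as the subject of the bar triple, and above all THEIR_BAR, a purely hypothetical URI that xyz.org has not published.

```python
OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"
MY_BASE = "http://www.snee.com/ns/xyz/"
DOC_URI = "http://www.snee.com/docs/mydoc1.xml"

# Hypothetical: a URI that xyz.org might publish for bar some day.
THEIR_BAR = "http://www.xyz.org/rdf/bar"


def mint_predicate(*path, base=MY_BASE):
    """Build a predicate URI under a base I control, e.g. .../foo/bar."""
    return base + "/".join(path)


def ntriple(subject, predicate, obj, literal=False):
    """Format one triple as an N-Triples line (plain literals or URIs only)."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_text = '"%s"' % escaped
    else:
        obj_text = "<%s>" % obj
    return "<%s> <%s> %s ." % (subject, predicate, obj_text)


my_bar = mint_predicate("foo", "bar")

# Assumption: the document is the subject of the bar value.
print(ntriple(DOC_URI, my_bar, "56H", literal=True))

# Only worth asserting once someone else really does publish a URI for bar.
print(ntriple(my_bar, OWL_EQUIVALENT_PROPERTY, THEIR_BAR))
```

An OWL-aware tool that sees the second triple could then treat triples using either predicate as interchangeable when aggregating data from both sources.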

(I made up xyz.org as an example domain name as I wrote this, and just now looked at the actual website—they're concerned with bigger issues than namespace URIs, such as the Knights Templar and the Secrets of the Bible.)

7 Comments

That's why you should use example.org, example.com, example.net, or just example: all are reserved domain names.


This immediately reminds me of the "topic merging" in topic maps, where once two topics in a topic map were identified as referring to the same thing, they were merged into a single topic. For this discussion, a topic map topic is sufficiently similar to an RDF URI.

The interesting thing is that some topic map engines implemented topic merging by doing a real merge within their internal data model, and others did it just by keeping track of the equivalence, but maintaining the original separate topics internally. So I guess both can work.

Keeping track of multiple topics or URIs for the same thing must introduce some processing overhead, but it does have the advantage that you can "unmerge" later if you find that two topics were merged but should not have been (either they weren't really the same, or the user had fat thumbs).

Cheers, Tony.


Right, that's why I mentioned the possibility of using OWL to define the equivalence. That's the kind of thing that makes OWL reasoners such as Pellet so much fun, in a semweb-geek kind of way.


Sorry, I'm confused; is the first XML fragment supposed to be RDF/XML or just some arbitrary XML?


The idea is that it's arbitrary XML, and I want to create separate RDF representations of information in that XML using the namespace defined by xyz.org's metadata schema.


"The idea is that it's arbitrary XML..." OK, thanks.

"I'm confident, that is, unless xyz:foo appears in another context in the same document (for example, as a child of another element besides xyz:header) with a different value." Finding another <xyz:foo> element implying the same subject but with a different object need not knock your confidence in this triple - it's only in contradiction with the idea that xyz:foo is a functional property.
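A small Python sketch of that point, assuming triples are held as plain (subject, predicate, object) tuples; the function name multi_valued and the second value "southeast" are invented for illustration. It reports which subject and predicate pairs carry more than one distinct object. Both triples remain perfectly valid statements; the report would only signal a conflict if xyz:foo had been declared functional.

```python
from collections import defaultdict

DOC = "http://www.snee.com/docs/mydoc1.xml"
FOO = "http://www.xyz.org/schemas/docmetadata/foo"


def multi_valued(triples):
    """Map each (subject, predicate) with several distinct objects to those objects."""
    objects = defaultdict(set)
    for s, p, o in triples:
        objects[(s, p)].add(o)
    return {key: sorted(vals) for key, vals in objects.items() if len(vals) > 1}


triples = [
    (DOC, FOO, "northwest"),
    (DOC, FOO, "southeast"),  # hypothetical value from a second xyz:foo element
]

print(multi_valued(triples))
```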

"...bar is an attribute above...it doesn't belong to any namespace, so it doesn't feel right to create an RDF predicate for the bar value that begins with http://www.xyz.org/schemas/docmetadata/." I'd definitely agree with this; it would be very presumptuous of you and not at all in the spirit of the game to define a new URI using their domain.