Last April Richard Cyganiak tweeted the following:
@iand @ldodds "use this term if available, else fall back to that one" is common when consuming RDF, not well supported by SPARQL or RDFS
I took this as a challenge (if not a very pressing one, given how long I waited to follow through). I managed to write a SPARQL query that reads the following data and sets ?label to the skos:prefLabel value if it's available and otherwise to the rdfs:label value:
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <http://rdfdata.org/whatever#> .

:thing1 rdfs:label "Robert" ;
        skos:prefLabel "Bob" .
:thing2 rdfs:label "Jane" .
:thing3 skos:prefLabel "Frank" .
Here's the output, using ARQ:
-----------
| label   |
===========
| "Frank" |
| "Bob"   |
| "Jane"  |
-----------
Here's a SPARQL 1.0 version of the query:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?label           # Bind ?label to
WHERE {
  {                     # skos:prefLabel if available
    ?s skos:prefLabel ?label .
  }
  UNION                 # and rdfs:label if not.
  {
    ?s rdfs:label ?label .
    OPTIONAL { ?s skos:prefLabel ?prefLabel . }
    FILTER (!bound(?prefLabel)) .
  }
}
It asks for the union of all skos:prefLabel values and all rdfs:label values, but filters out any rdfs:label value whose subject also has a skos:prefLabel. The query is verbose, and the FILTER(!bound()) trick is non-intuitive enough to have inspired two nicer substitutes in SPARQL 1.1: MINUS and FILTER NOT EXISTS. Here's the query with MINUS:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?label           # Bind ?label to
WHERE {
  {                     # skos:prefLabel if available
    ?s skos:prefLabel ?label .
  }
  UNION                 # and rdfs:label if not.
  {
    ?s rdfs:label ?label .
    MINUS { ?s skos:prefLabel ?prefLabel }
  }
}
You could substitute FILTER NOT EXISTS for MINUS there, and with a SPARQL 1.1 engine such as ARQ it would return the same results. (The two can differ when the patterns share no variables, but here both use ?s.)
It's one line shorter than the SPARQL 1.0 version, and a bit easier to read, but it's still a verbose way to bind ?label to the skos:prefLabel value if there is one and to the rdfs:label value otherwise. The important thing, though, is that it can be done with standard SPARQL, and that it's a little easier with 1.1.
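The logic shared by both queries above, a UNION of the skos:prefLabel values with those rdfs:label values whose subjects lack a skos:prefLabel, can be sketched in a few lines of Python. This is only a toy model over the example data, not a SPARQL engine and not how ARQ evaluates anything; `TRIPLES`, `match`, and `preferred_labels` are hypothetical names made up for illustration, and prefixed names are just strings.

```python
# Toy model of the UNION ... MINUS query; NOT a SPARQL engine.
# Hypothetical helper names; prefixed names such as ":thing1" are plain strings.
TRIPLES = [
    (":thing1", "rdfs:label", "Robert"),
    (":thing1", "skos:prefLabel", "Bob"),
    (":thing2", "rdfs:label", "Jane"),
    (":thing3", "skos:prefLabel", "Frank"),
]


def match(pred, triples):
    """One {?s, ?label} solution per triple whose predicate is pred."""
    return [{"s": s, "label": o} for s, p, o in triples if p == pred]


def preferred_labels(triples):
    # Left side of the UNION: every skos:prefLabel value.
    pref = match("skos:prefLabel", triples)
    # Right side: rdfs:label values, MINUS any solution whose ?s also has a
    # skos:prefLabel (?s is the only variable the two patterns share).
    has_pref = {row["s"] for row in pref}
    fallback = [row for row in match("rdfs:label", triples)
                if row["s"] not in has_pref]
    return pref + fallback


for row in preferred_labels(TRIPLES):
    print(row["s"], row["label"])
# :thing1 Bob
# :thing3 Frank
# :thing2 Jane
```

The row order here is just an artifact of the list handling; without ORDER BY, SPARQL solutions have no guaranteed order, so an engine's ordering (like ARQ's above) can differ.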
Can you improve on this query at all?
[Argh, captcha and validation are killing me]
What about:
{
?s ?p ?o . # or limit to a type
OPTIONAL { ?s skos:prefLabel ?label . }
OPTIONAL { ?s rdfs:label ?label . }
}
COALESCE would be nicer, however.
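To make the COALESCE idea concrete, assume a SPARQL 1.1 variant (an illustration, not Damian's query) in which each OPTIONAL binds its own variable: `OPTIONAL { ?s skos:prefLabel ?pref } OPTIONAL { ?s rdfs:label ?rdfsLabel } BIND (COALESCE(?pref, ?rdfsLabel) AS ?label)`. The Python below is a toy model of that pattern, not an engine. `left_join` follows the algebra's LeftJoin (merge each left row with every compatible right row, or keep it alone if none is compatible), and the model starts from one row per subject, much as an rdf:type pattern would. All names, including `?pref` and `?rdfsLabel`, are hypothetical.

```python
# Toy model of two OPTIONALs plus BIND(COALESCE(...)); NOT a SPARQL engine.
# All names are hypothetical; prefixed names are plain strings.
TRIPLES = [
    (":thing1", "rdfs:label", "Robert"),
    (":thing1", "skos:prefLabel", "Bob"),
    (":thing2", "rdfs:label", "Jane"),
    (":thing3", "skos:prefLabel", "Frank"),
]


def compatible(a, b):
    # Solution mappings are compatible if they agree on every shared variable.
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def left_join(left, right):
    # LeftJoin without a filter: merge each left row with every compatible
    # right row, or keep the left row unchanged if none is compatible.
    out = []
    for lrow in left:
        merged = [{**lrow, **rrow} for rrow in right if compatible(lrow, rrow)]
        out.extend(merged or [lrow])
    return out


def pattern(pred, var):
    # Solutions of { ?s <pred> ?var }.
    return [{"s": s, var: o} for s, p, o in TRIPLES if p == pred]


def coalesce(row, *names):
    # First bound argument; None stands in for "no argument was bound".
    return next((row[n] for n in names if n in row), None)


def labels(*optionals):
    # One starting row per distinct subject, then each OPTIONAL in turn.
    rows = [{"s": s} for s in dict.fromkeys(s for s, _, _ in TRIPLES)]
    for pred, var in optionals:
        rows = left_join(rows, pattern(pred, var))
    return {row["s"]: coalesce(row, "pref", "rdfsLabel") for row in rows}


pref_first = labels(("skos:prefLabel", "pref"), ("rdfs:label", "rdfsLabel"))
rdfs_first = labels(("rdfs:label", "rdfsLabel"), ("skos:prefLabel", "pref"))
print(pref_first)  # {':thing1': 'Bob', ':thing2': 'Jane', ':thing3': 'Frank'}
print(pref_first == rdfs_first)  # True
```

Swapping the two OPTIONALs changes nothing here, because each binds a different variable and neither can block the other; the preference lives in the argument order of COALESCE instead.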
Seems unfortunate that you have to repeat a whole pattern to get this to work, as the pattern you want in a real-world case could be substantially more complicated than this one. Is there a way to get both labels and then LIMIT 1, inside a subquery?
[In Thread that would be "Subject|(.prefLabel,Label:#1)", although there's also a built-in "otherwise" feature so this could be just "Subject|(.prefLabel;Label)".]
Damian,
With ARQ, that gave me "Bob" twice, so I added ?s and ?p to the select statement and got this:
That's close to what I was looking for, but obviously there's a problem. I think rdfs:label bound ?label to "Robert" and then it got overwritten with "Bob", so there are two "Bob" results.
Damian's way is the standard way to do it. The only reason you're getting duplicates is that you're selecting the predicate as well and matching ?s ?p ?o before the OPTIONALs.
If you did:
?s rdf:type <whatever>
followed by the OPTIONALs, you'd get a single result for each subject, as expected.
Hi Bob,
The explanation for that has nothing to do with my trick, but rather with the ?s ?p ?o business before it.
For this trick to work you need ?s to be bound, so (for demo purposes) I added ?s ?p ?o. What you're seeing is each triple, plus the (correct) label given the subject. There are two triples with thing1 as a subject, hence "Bob" is returned twice.
If you add types to the subjects you can try:
which gives the expected answer.
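The duplicate "Bob" rows explained above can be reproduced with a toy Python model of the query's algebra (not a SPARQL engine; every name is hypothetical). Starting from one row per ?s ?p ?o triple gives :thing1 two rows; starting from one row per subject, which is what a pattern like ?s rdf:type <whatever> would produce if each subject had exactly one type, gives one.

```python
# Toy model of { ?s ?p ?o . OPTIONAL {prefLabel} OPTIONAL {rdfs:label} }.
# NOT a SPARQL engine; all names are hypothetical.
TRIPLES = [
    (":thing1", "rdfs:label", "Robert"),
    (":thing1", "skos:prefLabel", "Bob"),
    (":thing2", "rdfs:label", "Jane"),
    (":thing3", "skos:prefLabel", "Frank"),
]


def compatible(a, b):
    # Solution mappings are compatible if they agree on every shared variable.
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def left_join(left, right):
    # LeftJoin without a filter: merge with every compatible right row,
    # or keep the left row unchanged if none is compatible.
    out = []
    for lrow in left:
        merged = [{**lrow, **rrow} for rrow in right if compatible(lrow, rrow)]
        out.extend(merged or [lrow])
    return out


def label_pattern(pred):
    # Solutions of { ?s <pred> ?label }.
    return [{"s": s, "label": o} for s, p, o in TRIPLES if p == pred]


def run(start_rows):
    rows = start_rows
    for pred in ("skos:prefLabel", "rdfs:label"):  # Damian's OPTIONAL order
        rows = left_join(rows, label_pattern(pred))
    return [(row["s"], row.get("label")) for row in rows]


# ?s ?p ?o yields one starting row per triple, so :thing1 starts twice.
per_triple = run([{"s": s, "p": p, "o": o} for s, p, o in TRIPLES])
# One starting row per subject, as a one-type-per-subject rdf:type pattern would give.
per_subject = run([{"s": s} for s in dict.fromkeys(s for s, _, _ in TRIPLES)])
print(per_triple)
# [(':thing1', 'Bob'), (':thing1', 'Bob'), (':thing2', 'Jane'), (':thing3', 'Frank')]
print(per_subject)
# [(':thing1', 'Bob'), (':thing2', 'Jane'), (':thing3', 'Frank')]
```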
Hi Lee,
That works if I assign rdf:type values to each of the resources in the data file. I assume there's no other way to do it with the data as shown?
Also, if I do it like this (with the rdfs:label pattern first),
OPTIONAL { ?s rdfs:label ?label . }
OPTIONAL { ?s skos:prefLabel ?label . }
?label gets bound to "Robert", not "Bob", I assume because it was looking for an rdfs:label value first. I didn't realize that the order could be used to control things this way. I just looked through section 6 of the SPARQL 1.1 Query spec and didn't see anything about this; where can I find something in the spec about the effect of ordering the OPTIONAL clauses?
thanks,
Bob
The order dependence of OPTIONAL clauses is an artifact of the semantics of OPTIONAL (LeftJoin in the algebra).
Lee
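Lee's point can be checked against the definition of LeftJoin: each left row is merged with every compatible right row, or kept unchanged if none is compatible. Once the first OPTIONAL has bound ?label, a second OPTIONAL that binds ?label again can only match a value equal to it, so a different label is ignored and whichever OPTIONAL comes first wins. The toy Python model below (not a SPARQL engine; all names are hypothetical) starts from one row per subject and reproduces both of Bob's results.

```python
# Toy model of OPTIONAL order dependence via LeftJoin; NOT a SPARQL engine.
# All names are hypothetical; prefixed names are plain strings.
TRIPLES = [
    (":thing1", "rdfs:label", "Robert"),
    (":thing1", "skos:prefLabel", "Bob"),
    (":thing2", "rdfs:label", "Jane"),
    (":thing3", "skos:prefLabel", "Frank"),
]


def compatible(a, b):
    # Solution mappings are compatible if they agree on every shared variable.
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def left_join(left, right):
    # LeftJoin without a filter: merge with every compatible right row,
    # or keep the left row unchanged if none is compatible.
    out = []
    for lrow in left:
        merged = [{**lrow, **rrow} for rrow in right if compatible(lrow, rrow)]
        out.extend(merged or [lrow])
    return out


def label_pattern(pred):
    # Solutions of { ?s <pred> ?label }; both OPTIONALs bind the same ?label.
    return [{"s": s, "label": o} for s, p, o in TRIPLES if p == pred]


def labels(order):
    rows = [{"s": s} for s in dict.fromkeys(s for s, _, _ in TRIPLES)]
    for pred in order:
        rows = left_join(rows, label_pattern(pred))
    return {row["s"]: row.get("label") for row in rows}


print(labels(["skos:prefLabel", "rdfs:label"]))
# {':thing1': 'Bob', ':thing2': 'Jane', ':thing3': 'Frank'}
print(labels(["rdfs:label", "skos:prefLabel"]))
# {':thing1': 'Robert', ':thing2': 'Jane', ':thing3': 'Frank'}
```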
And now I see, Lee, that your General SPARQL Discussion and this blog post that you wrote covered the very issue I was wondering about long ago!