RDF lists and SPARQL

Not great, but not terrible, and a bit better with SPARQL 1.1
I have yet to ever say to myself "what I need here is an RDF collection, which I will implement with lots of rdf:first and rdf:rest triples!"

That fact that RDF expresses everything using the same simple three-part data structure has usually been a great strength, but in the case of ordered lists (or RDF collections) it's pretty messy. The specification defines a LISP-like way of using triples to identify, for each position in a list, what the first member is and what list has the rest of them after that. When saying "and here are the rest" for every member of the list, you don't want to have to come up with a unique URI for each one, so datasets typically use blank nodes for these placeholders, and you can end up with a lot of them.

Putting all this together, you could represent the list ("one", "two", "three", "four", "five") with these triples:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . 
@prefix d:   <http://learningsparql.com/ns/data#> .

d:myList d:contents _:b1 .

_:b1 rdf:first "one" .
_:b1 rdf:rest _:b2 .

_:b2 rdf:first "two" .
_:b2 rdf:rest _:b3 .

_:b3 rdf:first "three" .
_:b3 rdf:rest _:b4 .

_:b4 rdf:first "four" .
_:b4 rdf:rest _:b5 .

_:b5 rdf:first "five" .
_:b5 rdf:rest rdf:nil .

Turtle and SPARQL include syntax that lets you write out a more human-readable version without explicit blank nodes and with the list represented as, well, a list. The following is the equivalent of the example above:

@prefix d: <http://learningsparql.com/ns/data#> .

d:myList d:contents ("one" "two" "three" "four" "five") 

To do much with these lists, though, especially in SPARQL, you still have to think in terms of rdf:first and rdf:rest.

To be honest, I've never found much need to do anything with RDF lists, but after seeing recent references to them—or, in Manu Sporny's case, the lack of them—I thought I'd play around a bit to see how difficult it was in SPARQL to do four basic list tasks:

  • Retrieve the Nth member of a list

  • Retrieve all the members of a list

  • Insert a new member at a specified position

  • Delete a member from a specified position

Update after posting my original entry: Andy Seaborne pointed me to his 2011 blog entry Updating RDF Lists with SPARQL, which includes SPARQL queries covering several additional cases. Also, more from Joshua Taylor at stackoverflow, thanks to Paul Gearon.

I found that SPARQL 1.1's property paths made it easier to concisely address a specific list member without lots of triple patterns, and of course without SPARQL 1.1 update there would be no insertion or deletion of list members. (I'm happy to take suggestions on improving the queries.)

Retrieving the Nth member

The following query retrieves the third member from the list defined above:

PREFIX d: <http://learningsparql.com/ns/data#>

SELECT ?item
WHERE {
  d:myList d:contents/rdf:rest{2}/rdf:first ?item
}

If you think of it as zero-based counting, it's simple: you just plug the number of the member you're interested in into the curly braces. Using ARQ, the query returns this:

-----------
| item    |
===========
| "three" |
-----------

But... after writing and testing that, I remembered that the ability to specify a specific number of repeated property path steps by putting a number between curly braces was dropped in the 24 July 2012 Working Draft of the SPARQL 1.1 Query spec, so it's not proper SPARQL. It works just the same when you replace rdf:rest{2} with rdf:rest/rdf:rest, which is a minor change, but specifying every step like that will be a pain if you want to retrieve the twenty-third member of the list.

I've replaced the rdf:rest{2} that was in the original draft of the insert and delete queries below with rdf:rest/rdf:rest.

Retrieving all the members

The following retrieves all of the list items. As an added bonus, ARQ displayed them in order, but that was just luck, and not something to count on, because stored triples have no order.

PREFIX d:   <http://learningsparql.com/ns/data#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?item
WHERE {
  d:myList d:contents/rdf:rest*/rdf:first ?item
}

Of course, changing the first line to SELECT (count(?item) AS ?items) would give you the number of members in the list, which is also handy.

Inserting a new member at a specific position

The main work is breaking the link where the insertion will take place and then linking the new member in.

PREFIX d: <http://learningsparql.com/ns/data#>
DELETE {
  ?insertionPoint rdf:rest ?rest . 
}
INSERT {
  _:b1 rdf:first "threePointFive" ; rdf:rest ?rest . 
  ?insertionPoint rdf:rest _:b1 . 
}
WHERE {
  d:myList d:contents/rdf:rest/rdf:rest/rdf:first ?item .
  ?insertionPoint rdf:first ?item ; rdf:rest ?rest . 
}

Here is how the dataset looks after using TopBraid Composer to run this query on the data above:

@prefix d: <http://learningsparql.com/ns/data#> .
d:myList
  d:contents (
      "one"
      "two"
      "three"
      "threePointFive"
      "four"
      "five"
    ) ;
.

Deleting a member from a specified position

The following deletes the third item from the list. As with the previous query, the main work is breaking the link and creating a new one across the gap where the deleted item was:

PREFIX d: <http://learningsparql.com/ns/data#>
DELETE {
  ?previousMember rdf:rest ?deletionPoint .
  ?deletionPoint rdf:rest ?rest . 
  ?s ?p ?item   . 
  ?item ?s ?p . 
}
INSERT {
  ?previousMember rdf:rest ?rest.
}
WHERE {
  d:myList d:contents/rdf:rest/rdf:rest/rdf:first ?item .
  ?deletionPoint rdf:first ?item ;  rdf:rest ?rest . 
  ?previousMember rdf:rest ?deletionPoint .
  ?s ?p ?item . 
  OPTIONAL { ?item ?s ?o . }
}

Running this update request after running the insertion one before it results in a dataset that looks like this:

@prefix d: <http://learningsparql.com/ns/data#> .
d:myList
  d:contents (
      "one"
      "two"
      "threePointFive"
      "four"
      "five"
    ) ;
.

So we know it worked.

Taking it further

I won't remember the syntax of these queries without reviewing them as written here, but I know that I can copy them from here and paste them elsewhere with minor modifications to perform these basic list manipulation goals.

On the other hand, in the work I've done with RDF and SPARQL, I have yet to say to myself "what I need here is an RDF collection, which I will implement with lots of rdf:first and rdf:rest triples!" So, the exercise above seems a bit academic. (In fact, my original goals above look like a homework assignment; for extra credit, modify the queries so that the targets can be specified based on their values and not their positions.) If I need to order some instances in RDF, I'm more likely to give them some property I can use to sort them. I'd love to hear pointers from anyone about places where using rdf:first and rdf:rest addressed a data modeling issue better than any alternative would.

Still, the queries above show that maybe RDF collections are not as bad as I originally thought, and that SPARQL 1.1 property paths can make certain tasks more straightforward to achieve.


Please add any comments to this Google+ post.