Retrieval of triples from relational databases is a popular topic in the semantic web world, but I was recently wondering how much trouble it would be to go in the opposite direction: to retrieve data from a SPARQL endpoint and load it into a relational database. It wasn't much trouble at all. When you retrieve the results in the SPARQL query results XML format, a straightforward XSLT spreadsheet can convert it into the necessary SQL INSERT statements. I was able to automate the data retrieval, conversion to INSERT statements, and actual insertion into a MySQL database with a three-line batch file that used no Windows-specific tricks, so I'm sure it would work on Linux just as well.
I used the following SPARQL query to retrieve the name, founding year, and equity, revenue, net income, and operating income figures of companies listed on the New York Stock Exchange according to DBpedia. I used ARQ to execute the query, so that after the inner query retrieved the raw data from the http://DBpedia.org/sparql SPARQL endpoint service, the outer query could use ARQ's SPARQL 1.1 support to format the data a bit—mostly, by using the str() function to strip language and datatype tags.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX do: <http://dbpedia.org/ontology/> SELECT (str(?name) as ?coName) (substr(str(?formationYearTyped),1,4) as ?formationYear) (str(?equityTyped) as ?equity) (str(?revenueTyped) as ?revenue) (str(?netIncomeTyped) as ?netIncome) (str(?operatingIncomeTyped) as ?operatingIncome) WHERE { SERVICE <http://DBpedia.org/sparql> { SELECT * WHERE { ?company <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Companies_listed_on_the_New_York_Stock_Exchange> . ?company rdfs:label ?name . FILTER ( lang(?name) = "en" ) OPTIONAL { ?company do:formationYear ?formationYearTyped . } OPTIONAL { ?company do:equity ?equityTyped . } OPTIONAL { ?company do:revenue ?revenueTyped . } OPTIONAL { ?company do:netIncome ?netIncomeTyped . } OPTIONAL { ?company do:operatingIncome ?operatingIncomeTyped . } } } }
The following command line told ARQ to put the results of this query in an XML file called companyData.xml. (Because the query doesn't have the FROM keyword, ARQ needs an input dataset specified, so the command names dummy.ttl as this input even though the query above ignores this file and gets its data from DBpedia using the SERVICE keyword.)
arq --results XML --query getCompanyData.spq --data dummy.ttl > companyData.xml
Next, I ran the following command to apply an XSLT stylesheet to the result of the ARQ output using libxslt's xsltproc XSLT processor. (You could use Saxon or Xalan just as easily.) This generated the SQL statements that would add the data to a MySQL database and stored them in the file insertCompanData.sql:
xsltproc SPARQLXMLtoSQL.xsl companyData.xml > insertCompanyData.sql
The XSLT stylesheet is not particularly brief, but there's no customized logic to process the output of the query above other than the use of the query's variable names and the quotes that it adds around the coName values. (The potential need for quotes depends on whether you're inserting the value into the SQL database as a string.) The trickiest part was having the stylesheet output the string "NULL" when a value was missing; I used a named template, so it wasn't too tricky.
If I had many different query results to convert to SQL INSERT statements, I'd write a more generalized version of this stylesheet (for example, setting the the name of the database and table to receive the data in variables at the top), but if I only had two or three sets of SPARQL query results to deal with, I could adapt this one for each of those pretty quickly:
<xsl:stylesheet version="1.0" xmlns:s="http://www.w3.org/2005/sparql-results#" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:strip-space elements="*"/> <xsl:output method="text"/> <xsl:template match="s:sparql"> USE testdb; <xsl:apply-templates/> </xsl:template> <xsl:template match="text()"/> <!-- all values output with xsl:value-of --> <xsl:template match="s:result"> <!-- Typical line for this template rule to create (with carriage return added here): INSERT INTO company VALUES( "Protective Life",1907,NULL,3.06E9,2.71E8,4.16E8); --> <xsl:text>INSERT INTO company VALUES("</xsl:text> <xsl:value-of select="s:binding[@name='coName']/s:literal"/> <xsl:text>",</xsl:text> <xsl:call-template name="valueOrNULL"> <xsl:with-param name="value" select="s:binding[@name='formationYear']/s:literal"/> </xsl:call-template> <xsl:text>,</xsl:text> <xsl:call-template name="valueOrNULL"> <xsl:with-param name="value" select="s:binding[@name='equity']/s:literal"/> </xsl:call-template> <xsl:text>,</xsl:text> <xsl:call-template name="valueOrNULL"> <xsl:with-param name="value" select="s:binding[@name='revenue']/s:literal"/> </xsl:call-template> <xsl:text>,</xsl:text> <xsl:call-template name="valueOrNULL"> <xsl:with-param name="value" select="s:binding[@name='netIncome']/s:literal"/> </xsl:call-template> <xsl:text>,</xsl:text> <xsl:call-template name="valueOrNULL"> <xsl:with-param name="value" select="s:binding[@name='operatingIncome']/s:literal"/> </xsl:call-template> <xsl:text>); </xsl:text> </xsl:template> <xsl:template name="valueOrNULL"> <xsl:param name="value"/> <xsl:choose> <xsl:when test=" $value != '' "> <xsl:value-of select="$value"/> </xsl:when> <xsl:otherwise>NULL</xsl:otherwise> </xsl:choose> </xsl:template> </xsl:stylesheet>
To run the created INSERT statements with a MySQL database table, I just did this, substituting my own MySQL username and password:
mysql -u myusername --password=mypass < insertCompanyData.sql
Of course, the created set of INSERT statements assumes that a database named testdb with a table named company already exists, and that the appropriate columns have been declared for that table.
After combining the command line calls to arq, xsltproc, and mysql in a three-line batch file, it was fun to see it all happen unattended. For a more serious implementation, you'd want to look into the use of APIs to the various tools as a more efficient alternative to this kind of scripting, but it's nice to see how much can be done with a little scripting.