Scripting the addition of XML files to the eXist XQuery database
This wasn't documented very well, so once I got it to work I thought I'd post it.
Saxon is great for getting to know XQuery syntax (see part one and part two of my "Getting Started with XQuery" articles on XML.com for more on this), but it reads all of the data to query into memory, and much of the point of XQuery is to work with large, indexed, disk-based collections of XML that won't fit into memory. I've started playing with the open-source eXist XML database for this.
After starting up the eXist server, you can start up the interactive client and load files from there, but if the client has any problems loading the files, it doesn't show any error messages that I could find. All I knew was that the file I tried to load wasn't showing up in the client's list of loaded files. If you want to load a lot of files, you don't want an interactive client anyway; you want a script that does it for you. Apparently, the documentation and the sample Perl, Python, and Java scripts that come with eXist are a bit behind the development of the system itself, so they don't always work. I finally found a simple way to load files using an eXist extension to XQuery, demonstrated by the code below.
(: Load the files temp2a.xml, temp2b.xml, temp2c.xml from c:\temp into the eXist database. :)
xquery version "1.0";
declare namespace xmldb="http://exist-db.org/xquery/xmldb";

<html><body>
{
  (: We'll load each file into the coll1 collection as the administrator. :)
  let $collection := xmldb:collection("xmldb:exist:///db/coll1", "admin", "")
  for $dataFilename in ("temp2a","temp2b","temp2c")
  let $name := $dataFilename
  let $URI := xs:anyURI(concat("file:///c:/temp/",$name,".xml"))
  let $retCode := xmldb:store($collection, $name, $URI)
  return <p>{$retCode}</p>
}
</body></html>
With eXist installed in c:\bin\eXist on a Windows machine and its server up and running, I stored the XQuery script above as C:\bin\eXist\webapp\xquery\loadfiles.xq. Pointing a browser at http://localhost:8080/exist/xquery/loadfiles.xq then ran the query, loaded the files, and displayed the return codes in the browser.
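If the goal is a fully scripted load, the browser step can be replaced by a small program that requests the same URL. What follows is only a rough sketch in Python, not something that ships with eXist: the function names run_loader and extract_return_codes are made up for illustration, and it assumes the server is listening at the address given above and that the query returns one <p> element per stored file, as the script is written to do.

```python
import re
import urllib.request

# URL taken from the setup described above; adjust host, port, and path
# for a different install (assumption: the server is already running).
LOADER_URL = "http://localhost:8080/exist/xquery/loadfiles.xq"


def extract_return_codes(html: str) -> list[str]:
    """Return the text inside each <p>...</p> element of the loader's output."""
    return [m.strip() for m in re.findall(r"<p>(.*?)</p>", html, flags=re.DOTALL)]


def run_loader(url: str = LOADER_URL, timeout: float = 60.0) -> list[str]:
    """Request the XQuery page, which triggers the load, and return its codes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    return extract_return_codes(body)
```

Calling run_loader() from a batch job and printing each item in the returned list gives the same information the browser displayed. The regular expression is enough for the flat page the query generates; a real HTML parser would be the safer choice if the output grew more elaborate.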
After getting this to work with simple dummy files, I found out what was wrong with the file I was originally having problems with: "The document is too complex/irregularily structured to be mapped into eXist's numbering scheme." Because the file is related to my day job, I can't describe it in much detail, but this reaction to it didn't surprise me. Still, I have plenty of ideas for eXist apps to build around less complex XML.
Comments
(Note: I usually close comments for an entry a few weeks after posting it to avoid comment spam.)
i have been using eXist in anger for the past year....a few bits of advice;
* u must use the latest build...too much volatility with older snapshots
* I would suggest using the very useful Ant tasks for doing mundane stuff like loading and exporting data to the database
* as for complex xml type errors...i havent encountered this! usually issues with getting an xml document into eXist is ...not well formed, dtd is not registered in /exist/WEB-INF/catalog, super large docs (should break up into collections/docs if possible)
* the REST interface is fine with the current perl scripts...i use them constantly
* authentication and doc ownership is a little hit and miss with eXist at the moment across various interfaces (webdav, servlet, REST, XML-RPC)
gl, Jim Fuller
Posted by: Jim Fuller | December 14, 2005 8:56 AM