
Using XHTML 2 schemas

The RELAX NG kind, and maybe the XSD kind.

I wanted to use Emacs+nxml to create some XHTML 2 documents, so I went looking for an XHTML 2 schema. The latest Working Draft says that it "includes an early implementation of XHTML 2.0 in RELAX NG, but does not include the implementations in DTD or XML Schema form. Those will be included in subsequent versions, once the content of this language stabilizes." This schema's location is not obvious, but a few web searches turned up a pointer to the ZIP archive version of the Working Draft mentioned in the spec's header.

When you unzip this file, you'll find a collection of RELAX NG rng files in the xhtml2-20060726\RELAXNG subdirectory. The xhtml2.rng file looks like the driver file mentioned in the Working Draft, so I tried parsing a simple XHTML 2 document against that with jing and got some XForms-related error messages. I commented out the xhtml2.rng div element that contained the XForms module and the sample document parsed just fine. (Make sure that your XHTML 2 document's elements are in the http://www.w3.org/2002/06/xhtml2/ namespace.)
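The namespace requirement in that parenthetical is easy to check before running a validator. Here is a minimal sketch in Python, using only the standard library; the function name `elements_outside_namespace` and the sample document are illustrative assumptions, not part of the Working Draft or of jing.

```python
import xml.etree.ElementTree as ET

# Namespace URI quoted in the post for XHTML 2 elements.
XHTML2_NS = "http://www.w3.org/2002/06/xhtml2/"


def elements_outside_namespace(xml_text, namespace=XHTML2_NS):
    """Return the tags of all elements that are NOT in `namespace`.

    Hypothetical helper for illustration: ElementTree stores a namespaced
    tag as "{uri}localname", so a prefix test is enough.
    """
    prefix = "{" + namespace + "}"
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter() if not el.tag.startswith(prefix)]


# A made-up minimal document; the default namespace applies to every element.
SAMPLE = """<html xmlns="http://www.w3.org/2002/06/xhtml2/" xml:lang="en">
  <head><title>Test</title></head>
  <body><p>Hello</p></body>
</html>"""

# An empty list means every element is in the XHTML 2 namespace.
print(elements_outside_namespace(SAMPLE))
```

This only checks namespaces; it says nothing about whether the document is valid against the schema, which is still jing's job.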

I used trang to convert the rng files to RELAX NG Compact files so that I could use them with Emacs+nxml. I zipped these up and put them at http://www.snee.com/xml/xhtml2rnc2005-07-27.zip if anyone else is interested in using them. I also tried converting the RNG files to DTDs, but trang said that there were too many fancy RELAX NG constructs in there, which makes sense: the Working Group used RELAX NG instead of DTDs because it's more expressive.
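For anyone repeating the conversion, a rough sketch of the trang invocation, wrapped in Python so it can be scripted. The jar location `trang.jar` and the helper name `trang_command` are assumptions; trang picks the input and output formats from the file extensions, so the same call shape covers the rnc, xsd and dtd attempts.

```python
import shlex


def trang_command(input_schema, output_schema, trang_jar="trang.jar"):
    """Build the command line for one trang conversion.

    Hypothetical helper: `trang_jar` is an assumed path to the trang jar.
    trang infers the formats from the extensions of the two file names.
    """
    return ["java", "-jar", trang_jar, input_schema, output_schema]


# Converting the driver file; trang follows its includes to the other modules.
cmd = trang_command("xhtml2.rng", "xhtml2.rnc")
print(shlex.join(cmd))

# To actually run it (requires Java and the jar to be present):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if the schema directory has spaces in its path.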

The story with W3C Schemas was similar to the DTD one but not as bad. Trang converted the RNG files to XSDs with a few errors. I tried validating a sample document against xhtml2.xsd with stdinparse and had some luck, but I still got some error messages. I spent a few minutes trying to track down their cause and then quit. I've always felt that, outside of the data typing, W3C Schemas are too much trouble, and this certainly didn't change my mind.

Despite the age of the RELAX NG schema, as indicated by the date in my zip filename, the rnc files worked well with Emacs+nxml. They didn't even have problems with a sample document that included the new about, role and property attributes described in my recent XML.com pieces about RDFa (Part 1, Part 2) except when an about attribute value had square brackets to indicate that it was a CURIE. (I was going to link "CURIE" in that last sentence to my second article's section on them, but somewhere in O'Reilly's process for preparing these articles they took the id attributes off of all of my block elements except for the pre elements. I put these id values in the block elements of what I write to make it easier to link to specific points—you know, web, linking, etc.—so it's odd that they would take them out.) CURIEs are recent enough that I wouldn't expect this version of the schema to support them.

When the next Working Draft comes out, I know I'll go straight to the schema in the zip file to try it out. Maybe it will have better XForms support; maybe there'll be new features to play with. I look forward to it.

Comments

(Note: I usually close comments for an entry a few weeks after posting it to avoid comment spam.)

I have struggled valiantly for over a year to get XHTML2 RNG schemas to work well (I'm using oXygen). Specifically, I would love to hear that anybody in the universe has actually used the XHTML2 RNGs to create a document with forms. Meaning, XForms.

The problem seems to be a combination of how XForms got absorbed into XHTML2 as a host language, plus a set of schemas that had that part somewhat disabled.

>how XForms got absorbed into XHTML2 as a host language

In a word, badly, in the last iteration of the schema. I commented the XForms parts out when I used the schemas.

I would work on the XForms advocates (e.g. Micah Dubinko) on this score, because it's up to them to integrate XForms better into XHTML 2 if they want people to use it.

Bob