Law metadata on the web

US laws and court decisions: fertile ground for semantic integration projects.

More and more primary law (court cases and actual laws passed by governments at any level, as opposed to secondary law such as treatises explaining the meaning of primary law) is available on the web. In the United States, the federal government and most state governments and court systems regularly publish this information on their own dot gov websites. Governments typically have laws requiring that all laws be available where citizens can see them, and putting them on the web costs less than the time-honored tradition of publishing bound books.

Each state, though, has little incentive to encourage and then follow national standards for how this information is published. The web operation at a given state capital or court system is a nonprofit organization working on a limited budget, and if they can get readable HTML, PDF, Word, or even WordPerfect files up there, they've achieved their main goal. Much of what LexisNexis and Westlaw customers pay for is integrated access to normalized, indexed versions of such data, cross-linking between court decisions and laws, and a professionally maintained taxonomy to guide them to the laws that address particular subjects. Many customers complain about the expense, but the cost doesn't surprise people who understand the work that goes into it.

Some related efforts to standardize legal XML have been up and running for a while. The OASIS LegalXML group is mostly concerned with the electronic exchange of legal data such as court filings and transcripts. Their membership is an interesting array of for-profit and nonprofit organizations from several countries. (It turns out that the Victorian Society for Computers & the Law is based in the Australian state of Victoria and not concerned with potential applications of Charles Babbage's Difference Engine to British law during the reign of Queen Victoria.) Joshua Tauberer's www.GovTrack.us (see also his XML.com article) uses semantic web technologies to track the ongoing activity of the US government as it creates laws.

A new organization called Legal-RDF.org focuses more on something I've been wondering about: the use of semantic web technologies to allow for integrated use of the free primary law on the web. Integrating the similar yet often structurally different collections of federal and state US law sounds like an ideal semantic web project; US academic researchers in the field looking for projects that would attract grant money should take a close look at the possibilities.

I recently scraped some Supreme Court decision metadata from the excellent collection at Cornell's Legal Information Institute. (I was going to use it to learn SPARQL better, querying for things like how often two justices were on the same side of a concurrence or dissent in a particular case, but this fell in priority as job-search-related projects rose.) The HTML at the LII is a bit messy, but the court decisions and opinions have plenty of META tags, and tools for cleaning up the HTML are easy enough to find, so I turned the metadata into an RDF file. (For an example of the existing metadata, do a View Source on my favorite case, 510 U.S. 569, and on Justice David Souter's opinion. Don't miss Appendix B of the latter; here's a sample quote: "Big hairy woman all that hair it ain't legit, Cause you look like Cousin It".)
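The conversion step isn't shown in the post, so here is a minimal sketch of the general idea, assuming nothing beyond Python's standard library: `html.parser` pulls `name`/`content` pairs out of META tags, and each pair becomes one N-Triples statement about the case. The namespace URL, the case URI, and the META names in the sample page are all hypothetical placeholders, not the LII's actual markup, and messier real pages may still need the HTML cleanup step first.

```python
import re
from html.parser import HTMLParser

# Hypothetical, made-up namespace, in the spirit of the quick-and-dirty RDF described here.
NS = "http://example.org/lii-meta#"


class MetaCollector(HTMLParser):
    """Collects (name, content) pairs from <meta name="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so META/NAME/CONTENT match too.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Tags without a name (e.g. http-equiv) are skipped.
        if name and content is not None:
            self.pairs.append((name.strip(), content.strip()))


def escape_literal(text):
    """Escape a string for use inside an N-Triples "..." literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def predicate_uri(meta_name):
    """Map a META name into the made-up namespace, replacing unsafe characters with _."""
    return NS + re.sub(r"[^A-Za-z0-9_.-]", "_", meta_name)


def meta_to_ntriples(html, case_uri):
    """Return one N-Triples line per META name/content pair found in html."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return [
        f'<{case_uri}> <{predicate_uri(name)}> "{escape_literal(content)}" .'
        for name, content in parser.pairs
    ]


# Hypothetical page: these META names are placeholders, not the LII's real ones.
SAMPLE = """<html><head>
<title>510 U.S. 569</title>
<meta name="casename" content="Campbell v. Acuff-Rose Music, Inc.">
<META NAME="author" CONTENT="Souter">
<meta http-equiv="Content-Type" content="text/html">
</head><body>...</body></html>"""

for line in meta_to_ntriples(SAMPLE, "http://example.org/case/510-US-569"):
    print(line)
```

Every META pair becomes a statement whose object is a plain string literal; typed values such as dates, or links between a decision and its separate opinions, would need more modeling than this.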

One of the stated goals of Legal-RDF.org is "to develop and publish domain vocabularies (that is, ontologies) used to label text within legal and related documents with their semantic meaning." For the quick-and-dirty RDF that I created, I just made up namespace URLs; Legal-RDF.org work such as this ontology project will make it easier for projects like mine to use common namespaces so that they can integrate more easily with each other and, like, you know, form a web. (A folksonomy-oriented list of topics touched on by court cases would have a more difficult time being useful, considering that legal research is the classic use case in which recall is more important than precision.)
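To make the payoff of common namespaces concrete, here is a small Python sketch that treats each dataset as a plain set of (subject, predicate, object) triples instead of using an RDF library. Merging RDF graphs is essentially a set union (ignoring blank nodes), so two datasets that happen to use the same predicate URI can be queried together with no extra work, while a third that invented its own namespace drops out of the results. All URIs and topic values below are hypothetical illustrations, not Legal-RDF.org vocabulary.

```python
# Hypothetical URIs: a shared vocabulary and one made-up, project-private namespace.
SHARED = "http://example.org/shared-legal-vocab#"
PRIVATE = "http://example.org/my-made-up-ns#"
CASE = "http://example.org/case/510-US-569"

# Three independently produced datasets describing the same case.
scraper_a = {(CASE, SHARED + "topic", "copyright")}
scraper_b = {(CASE, SHARED + "topic", "fair use")}
scraper_c = {(CASE, PRIVATE + "subject", "parody")}


def merge(*graphs):
    """Merge graphs of ground triples by set union (duplicate triples collapse)."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


combined = merge(scraper_a, scraper_b, scraper_c)
print(objects(combined, CASE, SHARED + "topic"))
```

The query on the shared `topic` predicate returns both "copyright" and "fair use" but not "parody", even though all three datasets describe the same case; the private predicate would have to be mapped by hand, which is exactly the work a common vocabulary saves.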

In general, Legal-RDF.org looks like a great place to get in touch with other people interested in taking on similar pieces of a project that looks like an obvious and useful application of semantic web technologies. I look forward to seeing the results of their work.

bobdc.blog

Bob DuCharme's weblog, mostly on technology for representing and linking information.