Bill Kent passed away on December 17th at the age of 69. Six years ago, in a discussion of a generalized definition for the word "schema" on xsl-list, Ben Pickering asked "are there any books or online documents dealing with the theory of 'information structures'? Some kind of description of the ways in which information may be structured, and the advantages of doing it a particular way?" Michael Kay replied "Yes! Though most of the ones I know of are written in the 'database' context rather than the 'document' context. Some are very academic / mathematical / philosophical, some more oriented to the practitioner. One of the best in my view, but very hard to get now, is Bill Kent's 'Data and Reality'."
Between the book's excellent title and the source of the recommendation, I had to seek it out, and at Powell's I found a 1979 North Holland printing set entirely in a Courier typeface. (It has since become available from AuthorHouse both in electronic form and as a paperback.) I later e-mailed Kent, and was thrilled to receive a reply and to correspond with him about the possibility of compiling his other writings. I was also very sorry to miss the Extreme XML 2003 conference in Montreal, because he was the keynote speaker there. I'm sure he had some insightful things to say about where XML data modeling issues fit into the larger data modeling issues that he had thought so much about over the years.
In a field where a ten-year-old book can look hopelessly out of date, "Data and Reality" has plenty of clear, prescient advice for those of us working nearly three decades after it was written. Kent talks of semiotics, set theory and realistic examples of data modeling problems that you often don't realize are problematic until he explains why. Much of what he writes anticipates fundamental notions of object-oriented development, and upon my first reading I couldn't help but wonder how he reacted to OO ideas when they came along. You don't have to wonder much; he had plenty to say, much of which you can find in the "Object orientation" section of the essays he's made available in the Document List section of his website. He wasn't simply pointing out problems that OO modeling would solve, though: "Data and Reality" also mentions issues that would point to problems with the OO model.
One important issue in his writing is object identity. For example:
What does "catching the same plane every Friday" really mean? It may or may not be the same physical airplane. But if a mechanic is scheduled to service the same plane every Friday, it had better be the same physical airplane.
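To make the two readings of "the same plane" concrete, here is a minimal Python sketch. It is not from Kent's book; the class names, the flight number, the tail numbers, and the two comparison functions are all hypothetical, chosen only to show that one record can carry two different identities depending on who is asking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Airframe:
    # Hypothetical identifier for the physical aircraft.
    tail_number: str


@dataclass(frozen=True)
class Departure:
    # Hypothetical identifier for the scheduled service a passenger books.
    flight_number: str
    date: str
    airframe: Airframe


def same_plane_for_passenger(a: Departure, b: Departure) -> bool:
    """A passenger's 'same plane' means the same scheduled service."""
    return a.flight_number == b.flight_number


def same_plane_for_mechanic(a: Departure, b: Departure) -> bool:
    """A mechanic's 'same plane' means the same physical aircraft."""
    return a.airframe == b.airframe


# Two consecutive Fridays: same scheduled flight, different physical aircraft.
first_friday = Departure("XY100", "2006-01-06", Airframe("N101"))
second_friday = Departure("XY100", "2006-01-13", Airframe("N202"))

print(same_plane_for_passenger(first_friday, second_friday))  # True
print(same_plane_for_mechanic(first_friday, second_friday))   # False
```

Neither function is the correct test of identity; which one applies depends on the purpose the data serves.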
Another issue is object boundaries. How do we, and when should we, represent two concepts separately? Once a concept is represented in software, what is the effect of the passing of time on it?
At what point is it appropriate to introduce a new representative into the system, because change has transformed something into a new and different thing?
The problem is one of identifying or discovering some essential invariant characteristic of a thing, which gives it its identity. That invariant characteristic is often hard to identify, or may not exist at all.
An important theme in "Data and Reality" is that the way we represent something has more to do with how it's used than with any intrinsic qualities of the thing:
The category of a thing (i.e., what it is) might be determined by its position, or environment, or use, rather than by its intrinsic form and composition. In the set of plastic letters my son plays with, there is an object which might be an "N" or a "Z", depending on how he holds it. Another one could be a "u" or an "n", and still another might be a "b", "p", "d", or "q".
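The plastic-letter example can be sketched as a lookup in which the category comes from the piece plus how it is held, not from the piece alone. This is only an illustration: the piece names and orientation labels below are made up, and the table covers just two of the pieces Kent mentions.

```python
# Hypothetical piece names and orientation labels, for illustration only.
READINGS = {
    ("zigzag", "upright"): "N",
    ("zigzag", "on its side"): "Z",
    ("arch", "open side down"): "n",
    ("arch", "open side up"): "u",
}


def category(piece: str, how_held: str) -> str:
    """Return the letter a piece is read as in a given orientation."""
    return READINGS[(piece, how_held)]


# The same physical object falls into different categories depending on use.
print(category("zigzag", "upright"))      # N
print(category("zigzag", "on its side"))  # Z
```

A model keyed on the piece alone would have to pick one letter per object and would be wrong half the time.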
As he put it elsewhere, "we are not modeling reality, but the way information about reality is processed, by people." Today, whether we're talking about open data shared by anyone who wants it, protected data shared by business partners according to the terms of a strict contract, or any data sharing that falls between these two scenarios, the issues Kent describes are even more important than they were when he wrote these words 29 years ago.
I drafted that last paragraph before I came across this, the ending of his book:
In an absolute sense, there is no singular objective reality. But we can share a common enough view of it for most of our working purposes, so that reality does appear to be objective and stable.
But the chances of achieving such a shared view become poorer when we try to encompass broader purposes, and to involve more people. This is precisely why the question is becoming more relevant today: the thrust of technology is to foster interaction among greater numbers of people, and to integrate processes into monoliths serving wider and wider purposes. It is in this environment that discrepancies in fundamental assumptions will become increasingly exposed.
He knew where things were headed, and he had a lot of great advice for people processing information in the seventies, today, and years from now. Take a look at his obituary and website, particularly his nature photography, "Data and Reality" excerpts, and other writing since then. He will live on in this work.