Austin Ziegler wrote: > Relational databases come from the realisation that more than one > program needs to work with a given set of data, you don't want to store > more than one copy of any given datum, and there should be a formal way > of modelling such things. Thus, you have the Relational Model of Data -- > which, as I have mentioned several times, is a model that represents > tuples, attributes, and relations that can be queried using relational > algebra. Yes, thanks for re-iterating that... full support from me. > Mr Heath's assertion that ER modelling is about physical storage is > completely incorrect; Austin, please put the rocks down again, and listen carefully. You'll see I'm supporting your view, and adding to it. Don't immediately assume that because I said something you didn't relate to, that you can't learn something new from someone who's been doing relational *and* object modeling at both conceptual and physical levels since you were in short trousers. But before I show why I'm not incorrect, let me first say that we have an unavoidable need to work with the physical models we must create, and that there exist no tools (to my knowledge) that do enough to alleviate the problems that arise from that. It's not a problem with the relational model, but it *is* a problem with our need for physical mappings. > there *is* a physical layer that can be modelled, > but it is *primarily* about the logical layer, and the physical > characteristics DO NOT MATTER at that logical layer. > SQL is, at best, an approximation of the relational model You can spit all you like about how SQL causes these problems, but it isn't really the cause; rather it's the absence of a better alternative. Until you can point to a better alternative, you're fighting a pointless battle with the practitioners who maintain that relational databases are hard to work with. You're saying they shouldn't be hard, and you're right. They're saying they *are* hard, and they're right. Read on to see why. Ok, with that out of the way, let me introduce fact-based modeling, and show how it solves problems that repeatedly even occur in properly-designed relational databases. Sit tight, this is going to take a while. I'll try to make it shorter; please don't pick nits with my shortcuts until you've seen the whole picture. Suppose we want build a simple schema to record which types of beer people like. We can record the following elementary facts: Person is known by Name. Beer is known by BeerName. Person is fond of Beer. Here we have two entity types (Person and Beer), and two things that might be value types (Name and BeerName). Each entity type has a defined reference mode ("is known by"), which each form a fact type that's a 1:1 relationship. Finally we have one fact type, with the reading "is fond of", in which Person and Beer both play a role. Now this is enough information for most folk to work out what's going on, but not yet enough to define a relational schema; we need to know whether we will record whether a person may be fond of more than one beer. I'm assuming that a beer might be liked by more than one person, but lets assume that my initial fact said "Person has favourite Beer" instead of "Person is fond of Beer". It's also the case that a given Person might not like any beer, but let's assume that we have other reasons to record such a Person. We can represent this model using the relation: Person(Name*, BeerName?) where * means primary key, and ? means null is allowed. Further normalization requires: Person(Name*) FavouriteBeer(Name*, BeerName*) which avoids the NULL value. Both are valid choices, since they can be mapped to one another without loss - though the former is a preferable physical model. Notice how in neither case did we need a Beer() relation. That's because all fact roles of Beer have been absorbed into one of the tables you see. So far so good... until we get a change request. We have to record all the beers a person likes in priority order. Now the second form, which wasn't preferred because of its additional physical cost, is preferable, because we can simply add a column "priority" to FavouriteBeer, renaming it PreferredBeers. All we did was add a fact "Person has Preference for Beer", and the new constraint allows more than one Beer per Person, yet all our relational queries are written incorrectly, and we have to create migration code to construct a new table, and revisit all our queries to map to the new tables. SQL helps a little by allowing us to construct views, but the views can't hide the fact that the Person table no longer has a BeerName attribute. The story goes on... now change things so that a Name is made up of a FamilyName and a GivenName, and again so that FamilyName is the primary key of a Family entity which has a functionally dependent attribute MedicareNumber. You can't avoid such schema migrations, and you can't avoid the fact that you must create an efficient physical schema each time, so you wind up in a constant tradeoff between keeping your schema clean and refactoring your queries. Notice that I've said *nothing* here about SQL that isn't true of any effective relational system in existence. The need to store more than one fact per tuple (for performance reasons) is the cause. It's these *compound facts* that create the schema evolution problems that SQL suffers from, yet compounding is unavoidable for performance reasons. Hence my comments about disk storage, which I stand by. The only solution is a system that enables us to completely hide the physical layer from the user (from the queries), and that's what fact-based modeling can do. The details of how are contained in Terry Halpin's book "Information Modeling and Relational Databases", and are implemented in at least four significant database design tools, three of which he designed (the other is CaseTalk, www.casetalk.com). The most recent is an open source plugin for Visual Studio called NORMA, which is available as a CTP as the "orm" project at <http://sourceforge.net/projects/orm/>. Talk about an iceberg in hell! This thing is already *way* cool. I'm visiting the team (which Terry leads) for two days in May before the Rails conference. A fact-based model is still relational, and still has all the benefits of being built on first order logic and the predicate calculus, yet it's also intrinsically different from what has come to be commonly known as "the relational model". In particular, all relevant join paths are known in the schema or can be added without breaking it. So I can say the conceptual query: Person(Name@) who has more than 4 Preference for Beer(BeerName@) where the @ sign says that I want this as part of my result set, and this will return the names and preferred beers (without priority details) of all beer connoisseurs. Leave off the @ sign from Person(Name) and you just get the beer names without repetition. I didn't need to say JOIN or WHERE anywhere there, so it was easy to write (also easy to build with a graphical tool), and extensive study has shown that such queries are highly resistant to being broken as the schema grows. The result set is not in first normal form, BTW. It's a tree, or in more complex cases, a graph. It starts to look very like a de-serialized collection of objects (what I call a fact constellation, because it selects a meaningful group of stars from a starry sky of facts), and that's how programmers need it to be - it immediately addresses O/RM requirements with no further work. This is what relational databases *should* be like, but aren't. A raw beginner can use a graphical query builder (like the one in InfoModeler - Google that) that would make an experienced DBA quake in their boots. Such queries aren't hard to translate to SQL against the underlying physical model, as long as you preserve the derivations by which the physical schema was created. I've written Ruby code that already handles some cases. The remaining problem is that the existing tools are only design tools, that generate database schemata and static data layers. What's needed in addition is a flexible runtime and query processor, written in a dynamic language, and it's to create such an animal that I registered the ActiveFacts project on RubyForge. There's no content there yet (sorry Christian), but if anyone who's willing to do the background study, they're welcome to help out. I hang out on the new Yahoo group "information_modeling", and need folk to discuss ActiveFacts with while I develop it to the point where I have enough on which to base collaboration. I plan to post a significant literature review there sometime soon, to help people to get started. It's somewhat a pity that Terry named his method Object Role Modeling, even though neither the word "Object" nor the acronym "ORM" had been used as they do now... but if we agree to call it fact-based modeling or information modeling I think we can avoid confusion. A good overview of Object Role Modeling is at <http://msdn2.microsoft.com/en-us/library/aa290383(VS.71).aspx> Clifford Heath.