How to use the relational model to do database design? by Eduardo Bellani
On a recent exchange I was asked the following:
… what you mean that you use the relation model to design? …
Let’s first start with the motivation: Why should one use the relational model(RM) to do database design? Here is my one line answer:
It makes it possible to have and to maintain the integrity of your business rules.
To illustrate this point, here are some examples of problems that one faces when one does not have such integrity:
- A status got written to ‘Done’, but the data that was expected to be there was not (eventual consistency)
- Some data that your application depends on got deleted (delete anomaly)
- A join returns more information than expected (update anomaly)
- Slow queries (optimizer problems due to duplicates)
- Ambiguous duplicates in results (duplicates)
- Wrong query results (NULLs)
Now to the point at hand: How would one use the relational model to design a database? Let me start by a definition of what is the relational model:
- An open-ended collection of scalar types, including type BOOLEAN in particular
- A type generator and an intended interpretation for relations of types generated thereby
- Facilities for defining variables of such generated relation types
- A assignment operator for assigning values to such variables
- A complete (but otherwise open-ended) collection of generic operators for deriving values from other values
Let’s define a database:
A database is a set of predicates and instatiations of such as propositions. The RM uses relation types to represent predicates. SQL uses table definitions. The RM uses relations to represent the arguments of a predicate, SQL uses rows.
In short, the RM set global constraints on any database design. Here is a (probably incomplete) list with ways that the RM drives database design:
- Never allow NULL anywhere (avoid
NULL
generating operators, such asOUTER JOIN
) - Never allow duplicates (avoid duplicates generating queries, such as
<SELECT | UNION> All
) - Never depend on position of columns or rows
- Always make sure each table represents one and only one predicate, thus being in 5NF
- Use updateable views to have logical independence (simulate them with triggers)