Substantial and accidental forms of a SQL expression by Eduardo Bellani
Multiplicity of parts, variety, and unity of plan which
combines the parts into a coherent whole, –
such are the elements of order found in all beauty.
(De Wulf 2012)
I have been working with data intensive applications for a while and of course that means exposure to a lot of SQL, both personally and training other developers.
In these experiences I have noticed that it is sometimes harder than it should be to grasp what SQL is doing, specially since its accidental form makes understanding its substantial form so hard. In this post I’m going to try to help the reader separate the two and understand SQL better.
But first, some definitions of terms:
… form is the principle of organization of a thing’s matter, or the thing’s intelligible nature, form can be of two kinds.
… (it) can be substantial, organizing the matter into the kind of thing that the substance is.
On the other hand, form can be accidental, organising some part of an already constituted substance.
substantial form always … brings a new substance into existence;
accidental form simply informs an already existing substance, and in doing so it simply modifies some substance. (Kerrl, n.d.)
One of the core problems with SQL and in particular its SELECT
expression is that the way it is written/read (and the usual
expectations of the terms such as SELECT/FROM
…) is very different
from what is actually taking place.
Let’s discuss this fact starting from an example (the example and much of the discussion are taken from (Date and Darwen 1997))
SELECT
P.PNO,
'Weight in grams =' AS TEXT1,
P.WEIGHT * 454 AS GMWT,
P.COLOR,
'Max Quantity =' AS TEXT2,
MAX(SQ.QTY) AS MQTY
FROM P, SP
WHERE
P.PNO = SP.PNO
AND (P.COLOR = 'Red' OR P.COLOR = 'Blue')
AND SP.QTY > 200
GROUP BY
P.PNO,
P.WEIGHT,
P.COLOR
HAVING SUM(SP.QTY) > 350
The difficulty here starts right at the begging, since the SELECT
clause is the first to be read and written, but it is the last to be
evaluated. Here is how to interpret this SELECT
expression:
-
FROM : The source of the data. An usually overseen point is that the
FROM
clause is actually aJOIN
(aCROSS JOIN
specifically). -
WHERE : The result of the
JOIN
of the previous step is reduced by elimination of rows (a process calledRESTRICTION
in the Relational Algebra). -
GROUP BY : This is one of the most complex parts of the expression because it goes outside the Relational Algebra. You can think of it as creating, from the restricted table, a Dictionary where the keys are a combination of the values of the defined columns and the referenced values are all the rows where the keys exist.
NOTE HOWEVER that such Dictionary is NOT a proper table. And that is why a
GROUP BY
clause will always demand a correspondingSELECT
clause that turns such Dictionary into a proper table. -
HAVING : This clause applies to the Dictionary values generated by GROUP BY, filtering all rows that do not match the condition. This is another operator that sits outside the Relational Algebra.
-
SELECT : This is called a
PROJECTION
in Relational Algebra. It is where you pick the colums of the table. It is also where each group resulting from the HAVING filter should now generate a single result row, by this process:- The part number, weight, color and maximum quantity are extracted from the Dictionary
- The weight is converted to grams
- Two literals are added (‘Weight in grams =’, ‘Max Quantity =’).
- All these insertions are ordered. The result looks like this:
PNO | TEXT1 | GMWT | COLOR | TEXT2 | MQTY |
---|---|---|---|---|---|
P1 | Weight in grams = | 5448 | Red | Max Quantity = | 300 |
P5 | Weight in grams = | 5448 | Blue | Max Quantity = | 400 |
So what
I hope the reader will leave with a better appreciation of what a SQL
SELECT
expression actually is, instead of what it looks like. I also
think that understanding the concept of substantial vs accidental form
can help the reader a lot in understanding things like such expression
in the future.
I also think a point that deserves attention is how worse the SELECT
expression becomes by SQL’s introduction of operators that don’t fit the
relational algebra (GROUP BY
and HAVING
clauses). Besides such, the
SELECT
expression is merely a JOIN->RESTRICT->PROJECT
sequence.