Almost three years after the post was originally published in October of 2005, Mark Johansen, author of A Sane Approach to Database Design (an approach I trust it’s clear I favor) has left a terrific comment on the Do Not Fear NULLs article which is worthy of your attention.
The “money quote” comes early:
Blank should mean “none” or “does not apply” or whatever makes sense in context, while null means “unknown”. Confusing these two things is huge problem in a lot of systems.
Mark then goes on to offer several examples to illustrate the pitfalls of failing to consider this issue carefully.
Each column in each of our tables is going to have a range of acceptable responses. The most poetic entity-relationship diagram you can possibly build will collapse into dust operationally if this work is not completed accurately.
As Mark says, all you need are a couple of houses with $0 selling prices in your real estate databases or an illegible data entry form to demonstrate the importance of proper NULLity and response range definition and the consequences of failing to manage these attributes properly.
“Does not apply” or “none” are perfectly reasonable range definitions, although reasonable people might choose to render “none” as NULL. The upshot, as always, is the business requirement driving the design. If we need to distinguish between unknown and none to make our application work properly (as defined by the business requirement), then we should. If the business doesn’t make this distinction, it might be confused to suddenly find it in its systems.
So, as with much of life, context is all.
Thanks, Mark, for your salient comments on a golden oldie (at least in blog years). Sane database design never goes out of style!
UPDATED 24 September 2008 for grammar.