The SQL Mark of Cain, According to Genesis

Posted on April 27, 2010


Today I came across this untitled post at one of forums (discussing an Oracle bug and the value proposition of SQL NULL vs empty strings.) The post sounds remarkably like real life in the trenches, which makes it even funnier. Enjoy.

In the beginning, there was a problem domain, and it was defined, and it was known, and it was good. And the database analysts made whole a logical design, and a physical design, and they brought them before the systems analysts, and the business analysts, and the users, and after much talk and rework, the designs were blessed. And the database analysts went forth, and caused to be made tables a hundredfold, and to each of them columns a hundredfold, each well-typed and well-constrained according to its needs, and myriad procedures.

And the developers went forth and designed user interfaces and batch processes and reports. When all was done, the data was loaded, and found to be good and well-behaved. And there was much feasting upon donuts and pizza, and drinking of beer.

And all was well for a time, and the data flourished and multipled, and the developers made mods and enhancements, and all were happy.

Then to the company came a new monthly data feed, one which knew not of the well-defined problem domain, nor of the blessed designs, nor of the works that had been made before it. And the developers looked and they saw to their horror that data was missing which should be there.

And they went to their managers and said, "This cannot be, for this data which is missing must be laid into a column designated ‘not null’ as is proper for the design; we must reject this data, and send it back to whence it came, that the makers of the data may cleanse it, and make it whole and right". But to this the managers replied "Nay, we cannot, for it is within the contract. You must abide by this data, and find a way".

And they went to the database analysts and said, "We must change the column, that it may accomodate nulls". But the database analysts averred, saying "We cannot change the design, for much has been built upon it; this is a path that is not open to you."

And the developers were anguished, for the users were demanding the data, but there was no way to provide it to them. And a dark cloud descending upon the land. And many ideas were generated by the developers, but none were found workable.

But then one developer among them said, "What if we just import the column as an empty string and then clean it up? They users can review the data through custom reports or the UI, we can do some batch updating based on what they find, and anything left over they can just update through the UI." And they presented this to the users, who agreed, and all was well again.

If the SQL engine equates an empty string to a null, then not-null columns will be populated with "UNKNOWN".

Data is dirty.