Pond's Seventh Law, Applied: Avoiding Random Elegance

Another application of Pond's Laws comes to light from an exchange on an internal SQL Server discussion list.

My good friend and colleague John Huschka, a Senior Software Development Engineer for Microsoft Business Solutions' FRx Software Corporation, alerted me to this post on the list:

I am designing a Windows Service which will create about 2,000 tables per year. I know that SQL Server's maximum number of tables is 2,147,483,647, so I don't need to worry about hitting that limit. However, I am worried about whether this design will significantly slow down SQL Server.

The application that uses the database will mostly operate on one table; there are few operations, such as joins, between different tables. We expect to back up the database and delete many of the tables every six months.

The reason I need to create tables at runtime is that this way I can greatly reduce the database space I need. If this design doesn't slow things down too much (some slowdown is acceptable, as long as it isn't unreasonable), I would still like to keep it.

Any input is greatly appreciated.

One colleague made a general suggestion to look into partitioning before I chimed in with the following:

My humble opinion, with the caveat up front that engineers love to change things:

I have serious concerns regarding the feasibility and completeness of a design which involves the ad-hoc creation of new tables at specified intervals. It seems to me that a stable solution would involve moving your data through an existing schema (Jeff’s suggestion to look into partitioning strikes me as well-founded) rather than rebuilding the schema every six months (how would your stored procedures deal with that?).

I share your concern that your anticipated design will suffer from performance issues. I’m uncertain how your design would “reduce the database space”; you’ve got to store your data somewhere, and I’d think that storing a volume of data in a single table would take up less space than storing that same data in 100 tables if all else is equal. If the data is this fluid, storing some of it in XML and querying the XML may also be an option.

In summary, at first blush it looks to me as though there’s more design analysis to be done here.

John then weighed in with his comments:

I would certainly agree.

Echoing the design considerations that Ward has outlined, I will also add that such applications—which by implication appear to be high volume—pose great difficulties in terms of administration. Rather than tables/tablespaces being “statically” managed by a DBA, algorithms must be implemented by which tables are created in line with production storage requirements…and not many DBAs are going to delegate production storage administration to an application.

…which brought another comment to my mind:

To continue my long-standing mutual admiration society with John and build further on his comments:

If you build this system as you're currently planning to, what happens when someone needs to make an architectural change to it after you've moved on (we're all going to move on at some point…)? In the scenario you've proposed, your successor has to understand both the business requirement your system addresses and the algorithm you've implemented to manage the table creation process and move the data around.

The business requirement is relatively easy for a new person to pick up. There should be plenty of resources on the team who can address those issues.

Your algorithm, though, is another matter entirely. It's not nearly as central to the team's mindset as the business requirement; the algorithm is thus an extra level of abstraction for your successor to deal with. This process would be very difficult to estimate up front, because it would depend upon how intuitive your algorithm is to your successor.

In general, code and schema design should articulate a business process as clearly and unambiguously as possible, with the goal being that a competent professional could read your code and develop a reasonable understanding of your application without undue distraction. The design you’ve proposed would impose considerable additional overhead on this process: a new programmer would have to get their head around your algorithm before they could begin to assimilate the business logic.

Not only will production support DBAs resist this design, as John suggests, but I’d think most developers who’ve done sustaining work on someone else’s code would object to it as well.

The only thing I'd add to this discussion is that a review of the five normal forms, and their application in the design process, might also be helpful in this instance.

Pond's Seventh Law states, "Random elegance that doesn't serve performance and maintainability is no virtue." This is a scenario where the developer has applied a good deal of creativity to build a design that he feels will address a business need. Unfortunately, the proposed solution has serious maintainability issues for both subsequent developers and current production support personnel. It would certainly be elegant upon implementation, but it seems to me that adding a versioning attribute to a stable schema and making use of SQL Server 2005's partitioning technology would produce a far more maintainable, performant solution.
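To make that alternative concrete, here is a minimal sketch of the "versioning attribute on a stable schema" idea. I've used Python with SQLite as a stand-in for SQL Server, and the table, column, and period names are my own illustration, not anything from the original thread: rather than creating a new table per interval, one stable table carries a period column, so every query and stored procedure targets a single schema and the six-month purge is an ordinary DELETE rather than a schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One stable table with a "period" attribute, instead of a new table per interval.
cur.execute("""
    CREATE TABLE measurements (
        period      TEXT    NOT NULL,   -- e.g. '2005-H1', '2005-H2'
        device_id   INTEGER NOT NULL,
        reading     REAL    NOT NULL
    )
""")
# Index the period column so per-period queries stay fast as volume grows.
cur.execute("CREATE INDEX ix_measurements_period ON measurements (period)")

rows = [("2005-H1", 1, 9.7), ("2005-H1", 2, 4.2), ("2005-H2", 1, 5.1)]
cur.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)

# Queries are parameterized by period; the schema never changes.
current = cur.execute(
    "SELECT COUNT(*) FROM measurements WHERE period = ?", ("2005-H2",)
).fetchone()[0]

# The six-month archive-and-purge is a DELETE, not a DROP TABLE.
cur.execute("DELETE FROM measurements WHERE period = ?", ("2005-H1",))
conn.commit()
```

In SQL Server 2005 the same idea scales further: a partition function and scheme on the period column let the engine store each period separately, so the periodic purge can become a fast partition operation instead of a row-by-row delete.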

Many thanks to John for drawing me into this discussion.

-wp