SQL Server Schema Design

I have seen some strange schemas in my time which look like a good idea on paper but not on disk. A common scenario is the schema that is created by a tool controlled by a user and so we end up with columns like user21 in usertabel7 and so on. Then I got an e-mail about the poor performance of user defined functions (UDFs) following Simon Sabin's talk on the CLR and how good they could be for replacing T-SQL UDF's at the SQL Community event last Thursday .

The database involved in the problem had several tables where generic columns where used for different purposes. Taking customer as an example there was a separate table CustomerStrings that looked like this:

Column Type
CustomerID int not null PK
CustomerStringTypeID int not null
StringValue varchar(50)

CustomerID is foreign key to the main customer table and CustomerStringTypeID is another foreign key, this time to CustomerStringType which has an ID and a description, with values in the description such as  'Fax', 'Mobile', 'e-mail' etc.

So the CustomerString table would have multiple rows for the same customer for example a row for Fax, mobile, email and so on.  The good thing about this is that rows would only be created where a given attribute for a customer was used and so would be efficient on space.

Getting back all the e-mail addresses for a customer wouldn't be too bad either depending on how this was implemented and this is where the developer tried to implement a generic solution with a UDF:

CREATE FUNCTION dbo.GetCustomerString(@customerID int, @customerStringTypeID int) RETURNS varchar(50) AS BEGIN

DECLARE @returnValue varchar(50);

SELECT @returnValue = CustomerStrings.StringValue

FROM CustomerStrings

WHERE CustomerStrings.CustomerID = @customerID

AND CustomerStrings.CustomerStringTypeID = @customerStringTypeID

RETURN @returnValue

END;

The problem arises when this function is used in-line on multiple occasions to get three attributes for every customer:

SELECT

cst.LastName,

dbo.GetCustomerString(Customers.CustomerID, 1) AS Fax,

dbo.GetCustomerString(Customers.CustomerID, 2) AS mobile,

dbo.GetCustomerString(Customers.CustomerID, 3) AS email FROM dbo.Customers cst;

So the UDF gets called three times for every row in the customer table which is always going to be much slower than using a set based operation for example:

SELECT

cst.LastName

fax.StringValue as Fax

mob.StringValue as Mobile

eml.StringValue as email

FROM dbo.Customers cst

LEFT JOIN dbo.CustomerStrings fax ON cst.CustomerID = fax.CustomerID AND fax.CustomerStringTypeID =1

LEFT JOIN dbo.CustomerStrings mob ON cst.CustomerID = mob.CustomerID AND fax.CustomerStringTypeID =2

LEFT JOIN dbo.CustomerStrings fax ON cst.CustomerID = eml.CustomerID AND eml.CustomerStringTypeID =3;,

However maintaining all the code for the possible combinations of attributes required is a big headache for this company so how can they get the speed of the second query without writing mountains of code.  In my opinion the schema is the problem here.  I would either create sufficient columns in the customer table for all the attributes required or using another group of tables with one row per customer to store contact details etc. Of course if there loads of columns required (as in SharePoint) many of which are usually empty then it's time to have a look at Sparse columns in SQL Server 2008 and that is what I will be posting about next..

Technorati Tags: SQL Server,User Defined Functions