Primary keys and Clustered Indexes

One of the many great SQL Server resources out there is SQL Server Central, and this article on there, Worst Practices - Not Using Primary Keys and Clustered Indexes, by Andy Warren caught my eye.  For the most part he’s absolutely right you’d be mad not to have a primary key on a table and to leverage the power of clustered index. BTW don’t assume that Andy is saying that the primary key is the clustered index, You can only have one of each on a table but they often shouldn’t be on the same column.

However I can think of one scenario where this is not a good idea, and that is in a large fact table in a data warehouse.  If you feel you need a primary key, the natural temptation is to make on by concatenating the foreign keys to the dimension tables.  But this primary key has little real value because it won’t take part in any but a few queries, and will slow down the loading of this fact table, indeed I have often removed the all fact table indexes while populating it so any primary key wouldn’t be enforcing referential integrity.  For data warehousing, Referential integrity is done by the ETL process. 

If a primary key is needed for a a business reason, for example because you need one to create a Reporting Services Report model, then I would suggest you create a new identity column for this purpose and use that.

Clustered indexes on fact tables can also be of little value as they can only work on one of the foreign keys to one of the dimensions, while they can seriously slow down the loading of the fact table.

So data warehouses are a special case, and for the most part do encourage force your developers and designers to use these two features.

The other thing I would encourage you to do is to check out the various community sites and forums out there, someone will have done it before or have an answer to your problem.

Technorati Tags: SQL Server 2008,clustered indexes,primary keys