Database Programming: Temp Tables or Table Variables?

On the occasion of this blog's one hundredth post, I thought I'd pass along some information I recently discussed with my youngest cousin, Tom, who is like me an otherwise rational person who's been bitten by the database development bug. In a conversation on the subject topic, Tom wrote:

My concern with using temp tables is partly superstitious - Using [table variables] sounds so much quicker than writing to disk.

The tidbit I passed along to Tom is now a tidbit for you: The contents of both table variables and temp tables are instantiated in tempdb, so you’re not really buying yourself as much as you might think you are when you try to stay off the disk by using table variables. So what’s the difference between them? Table variables don’t have statistics on them, so they are far less likely to provoke a recompile when the optimizer sees them; temp tables have statistics on them, so they generate a recompile every time they’re built. In SQL Server 2005, you can isolate the recompile to the statement rather than the entire proc (which is how SQL Server 2000 behaves), but you still get some flavor of a recompile.

SO.. if your data volume in the temp structure is low, OR if you’re certain you’ll always be scanning the entire table, use a table variable. If your data volume in the temp structure is random, or if you’re going to be performing highly selective queries against the temp structure, use a temp table and build indexes on it after you populate it if they boost performance with your highest volume.

Don’t build a clustering key, though, unless you can prove it helps overall performance and you put an appropriate ORDER BY on the INSERT statement so you’re not churning out-of-order data on the disk when you build the index. This is a temp structure, so clustering it on disk when we’re going to blow it away in a couple hundred milliseconds anyway doesn’t make much sense unless it’s done at zero overhead.

If the temp table is only referenced once, consider using a derived table instead.

In all instances, you should unit test all permutations of this with low, medium, and high volumes, weight those for the number of times each volume is run (if high volume is run once a day, you might have a different position on its performance than you would if it was run 10,000 times an hour; both are possible), and work to meet the established SLA for whatever it is you’re writing.

Here's to the next 100 posts..

     -wp