SQL Server 2008 Indexing for JOINs

Here are some notes on "SQL Server 2008 indexing for JOINs" I took during an advanced class I attended taught by Kimberly Tripp  (https://sqlskills.com/AboutKimberlyLTripp.asp).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.


Tables per select 

  • Old limit of 16 tables per select (before SQL Server 7) went up to 256
  • Lots of people needing large join worked around using temp tables
  • The limit is gone, but people still do it, sometime for simplicity
  • Try turning temp tables into views and joining the views
  • Maximum Capacity Specifications for SQL Server at https://msdn.microsoft.com/en-us/library/ms143432.aspx

Main ideas

  • Each table has a join condition and search arguments
  • For large joins, start with the most expensive table, extract and optimize it
  • one of the tables becomes the "driving table"

Finding the "driving table" 

  • For the most selective of the two tables, index on the search argument
  • For the least selective of the two tables, index on the join column foreign key
  • If you don't know which will be more selective, how to do it?
  • You could have indexes on both search argument and join column for both.
  • Then SQL can have the best indexes not matter which is more selective.

Combine the two 

  • Another step: Cover the combination of the two
  • Index on "search argument, join column" or "join column, search argument"
  • Index on "join column, search argument" is probably more useful in other cases
  • When in doubt: create both, see which one is used, drop the other one.
  • Helps when "join column, search argument" is selective, but each one is not

Covering index 

  • Ultimate step: Create a covering index (INCLUDE all data columns).
  • You will not need any bookmark lookups since all the data is in the index.
  • This is the most expensive index. Might not be worth the trouble.

Statistics 

  • SQL cannot have perfect statistics for a table at all times.
  • Think orders table with hundreds of millions of rows.
  • Average of 5 orders/customers, some corporate customers will have thousands.
  • Statics are calculated in steps, so it's possible that SQL won't know enough.
  • Statistics are by table, not by partition.
  • Having multiple partition tables with a partition view works around that.
  • You could also use filtered statistics to help SQL make the right decision.

Force? 

  • You might end up thinking about forcing the plan to use an index
  • You could make a mistake (force for small customers, query hits corporate ones)
  • Avoid forcing SQL to do something if at all possible.

DTA 

  • Database Tuning Advisor (DTA) will help here. DTA is all about coverage.
  • DTA is great for testing specific queries, suggesting options.
  • Grab a backup of your production database, don't run DTA in production.
  • Evaluate for recommendations AND select "Do not keep any existing PDS" 
  • Be careful when you're dropping things you don't want at the end of DTA...
  • DTA hypothetical indexes are actually statistics.
  • You need real data in your set, or else DTA can't figure statistics correctly.
  • Consider taking the recommended statistics, it might help in other places.
  • DMVs could be good enough, if you don't want to spend the time on DTA.
  • How much time do you want to spend on optimizing a query? It depends...