Here are some notes on “SQL Server 2008 indexing for JOINs” I took during an advanced class I attended taught by Kimberly Tripp (http://sqlskills.com/AboutKimberlyLTripp.asp).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
Tables per select
- Old limit of 16 tables per select (before SQL Server 7) went up to 256
- Lots of people needing large join worked around using temp tables
- The limit is gone, but people still do it, sometime for simplicity
- Try turning temp tables into views and joining the views
- Maximum Capacity Specifications for SQL Server at http://msdn.microsoft.com/en-us/library/ms143432.aspx
- Each table has a join condition and search arguments
- For large joins, start with the most expensive table, extract and optimize it
- one of the tables becomes the “driving table”
Finding the “driving table”
- For the most selective of the two tables, index on the search argument
- For the least selective of the two tables, index on the join column foreign key
- If you don’t know which will be more selective, how to do it?
- You could have indexes on both search argument and join column for both.
- Then SQL can have the best indexes not matter which is more selective.
Combine the two
- Another step: Cover the combination of the two
- Index on “search argument, join column” or “join column, search argument”
- Index on “join column, search argument” is probably more useful in other cases
- When in doubt: create both, see which one is used, drop the other one.
- Helps when “join column, search argument” is selective, but each one is not
- Ultimate step: Create a covering index (INCLUDE all data columns).
- You will not need any bookmark lookups since all the data is in the index.
- This is the most expensive index. Might not be worth the trouble.
- SQL cannot have perfect statistics for a table at all times.
- Think orders table with hundreds of millions of rows.
- Average of 5 orders/customers, some corporate customers will have thousands.
- Statics are calculated in steps, so it’s possible that SQL won’t know enough.
- Statistics are by table, not by partition.
- Having multiple partition tables with a partition view works around that.
- You could also use filtered statistics to help SQL make the right decision.
- You might end up thinking about forcing the plan to use an index
- You could make a mistake (force for small customers, query hits corporate ones)
- Avoid forcing SQL to do something if at all possible.
Database Tuning Advisor (DTA) will help here. DTA is all about coverage.
DTA is great for testing specific queries, suggesting options.
Grab a backup of your production database, don’t run DTA in production.
Evaluate for recommendations AND select “Do not keep any existing PDS”
Be careful when you’re dropping things you don’t want at the end of DTA…
DTA hypothetical indexes are actually statistics.
- You need real data in your set, or else DTA can’t figure statistics correctly.
Consider taking the recommended statistics, it might help in other places.
- DMVs could be good enough, if you don’t want to spend the time on DTA.
- How much time do you want to spend on optimizing a query? It depends…