Indexing Strategies

I attended a SQL Server User Group meeting earlier this year and heard a presentation about favourite DMVs. Mentioned in the discussion was sys.dm_db_index_usage_stats, and that got me thinking about indexing strategies. I have been involved in performance troubleshooting databases that have used a variety of indexing strategies, ranging from none at all to lots of narrow indexes on practically every column! However, there is no single right or wrong indexing strategy; it depends entirely on your application and the type and frequency of queries being executed against the database.

So how do you know if you have a problem? For me, it's usually when users tell me that the application is "running slow" or they get timeouts. However, poor performance can be open to interpretation. There could be a whole host of factors to take into account when you are dealing with web applications, such as network connections and web server problems, which is why you need good benchmarks in place in order to compare performance over a period of time. Other signs include high CPU, high memory usage or increased disk IO activity. The SQLCAT team has a post detailing the top OLTP performance issues on SQL Server 2005, and some basic performance counters to monitor can be found here.
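If you want a quick way to eyeball some of these counters from within SQL Server itself (2005 onwards), a rough sketch like this will do it - the counter names below are the standard ones, but pick whichever suit your own baseline:

    -- Sample a few headline counters from within SQL Server (2005+).
    -- Counter names are the standard ones; adjust the list for your baseline.
    SELECT object_name, counter_name, instance_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name IN ('Batch Requests/sec',
                           'Page life expectancy',
                           'Buffer cache hit ratio');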

So once you've determined that there is a problem, what do you do? If you know the specific query that is causing the issue, then you can examine the query execution plan. You do this by choosing the 'Display Estimated Execution Plan' option under the 'Query' menu in Management Studio. This does not execute the query, but returns the estimated execution plan based on the statistics the server holds. As a result, you need to ensure that the statistics are up to date, or you may get misleading results. It's also a good idea to turn on statistics IO.

Things to look out for in the execution plan are table or index scans and hash aggregates. Scans imply that there are no indexes for SQL Server to use, or that the indexes are not selective enough and SQL Server has decided it's less expensive to run a scan. Bear in mind that a clustered index scan is effectively the same as a table scan. Hash aggregates have to create a worktable, i.e. a temp table in TempDB. Watch out for hash aggregates, as statistics IO does not show the cost of the worktables, which can often be very expensive. As a rule of thumb, any time you see "Hash" in your plan it means worktables in TempDB, and this can usually be done better! Another cool thing about showplan is that you can force queries to use different indexes and run them side by side. This will show the execution plan for both queries and the cost of each relative to the batch, which is a quick way to see which index choices are most expensive.
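As a rough sketch of this side-by-side technique (the table and index names below are made up purely for illustration):

    -- Report logical/physical reads per table for each statement.
    SET STATISTICS IO ON;

    -- Run both versions in one batch and compare the cost of each
    -- relative to the batch in the execution plan, plus the reads
    -- reported by statistics IO. Table and index names are hypothetical.
    SELECT OrderID, OrderDate
    FROM dbo.Orders WITH (INDEX (IX_Orders_OrderDate))
    WHERE OrderDate >= '20080101';

    SELECT OrderID, OrderDate
    FROM dbo.Orders WITH (INDEX (IX_Orders_CustomerID))
    WHERE OrderDate >= '20080101';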

However, on shared systems with multiple applications running against the SQL Server instance, chances are you will not know which queries are causing the performance issues and you will need to do some digging. You can use DMVs in 2005, but as I work in a mixed environment (2000 and 2005) I prefer to use Profiler to give me an idea of what is going on. I run a server-side trace and log the results remotely to a file, as opposed to a SQL Server table, to minimise the overhead on the system. I will typically run this for short periods (10-15 mins) just to get a feel for the queries executing against the server. This is a quick way to see whether there are any expensive queries running and how frequently, and it gives me some clues as to which database is causing the problems. Once I know this, I can really focus in on that database and application. More information on running Profiler traces is available here.
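For what it's worth, a minimal server-side trace sketch looks something like this - the file path is a placeholder, and you would normally add more events and filters:

    -- Minimal server-side trace: capture completed batches to a file.
    -- The path is a placeholder; SQL Server appends .trc automatically.
    DECLARE @TraceID int, @maxfilesize bigint;
    SET @maxfilesize = 50;  -- MB per rollover file (option 2 = file rollover)

    EXEC sp_trace_create @TraceID OUTPUT, 2, N'\\remoteserver\traces\mytrace',
         @maxfilesize, NULL;

    -- Event 12 = SQL:BatchCompleted. Columns: 1 = TextData, 13 = Duration,
    -- 16 = Reads, 17 = Writes, 18 = CPU, 35 = DatabaseName.
    DECLARE @on bit;
    SET @on = 1;
    EXEC sp_trace_setevent @TraceID, 12, 1,  @on;
    EXEC sp_trace_setevent @TraceID, 12, 13, @on;
    EXEC sp_trace_setevent @TraceID, 12, 16, @on;
    EXEC sp_trace_setevent @TraceID, 12, 17, @on;
    EXEC sp_trace_setevent @TraceID, 12, 18, @on;
    EXEC sp_trace_setevent @TraceID, 12, 35, @on;

    -- Start the trace (0 = stop, 1 = start, 2 = close and delete definition).
    EXEC sp_trace_setstatus @TraceID, 1;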

Once I have identified the database, the hardest part is deciding which queries to index for. Lots of frequently run queries can give you a far bigger overall performance gain than a large query run once a month. It is important to use Profiler and run traces over longer periods to get a feel for the mix of reads and writes. Never build an index in isolation; always consider the workload over the course of time. You can then feed these traces into the Database Engine Tuning Advisor (DTA) to fully analyse your workload and come up with an indexing strategy based on your application's specific needs. It is also worth restoring a backup of your database on a test system and using that for running your DTA analysis. I cannot stress this highly enough: do not create indexes for indexes' sake. I often ask the question when interviewing, "when would dropping an index actually help to increase performance?" A lot of people are conditioned to believe that you need to build lots of indexes to increase performance, but this is not true. If you have lots of inserts, updates and deletes, then keeping all those indexes updated is a significant overhead if they are not required in the first place.
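As a rough sketch of digging into a captured trace file to find the heaviest statements (the path is a placeholder, and the CAST is needed because TextData comes back as ntext):

    -- Load a captured trace file back into SQL Server (2005 syntax) and
    -- rank the statements by total reads and execution count.
    SELECT CAST(TextData AS nvarchar(4000)) AS QueryText,
           COUNT(*)   AS Executions,
           SUM(Reads) AS TotalReads,
           SUM(CPU)   AS TotalCPU
    FROM sys.fn_trace_gettable(N'\\remoteserver\traces\mytrace.trc', DEFAULT)
    WHERE TextData IS NOT NULL
    GROUP BY CAST(TextData AS nvarchar(4000))
    ORDER BY TotalReads DESC;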

Before going off and building indexes, it is worth looking at alternative options. It is a good idea to update your statistics to ensure that SQL Server has the most up-to-date information with which to work out the optimal execution plan. If you are working with stored procedures, you can consider recompiling them. You can also consider rewriting the code, especially if you are using cursors!
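For example (the object names here are placeholders):

    -- Refresh statistics so the optimiser has current information.
    UPDATE STATISTICS dbo.Orders WITH FULLSCAN;  -- one table, full scan
    EXEC sp_updatestats;                         -- or sweep the whole database

    -- Force a stored procedure to get a fresh plan on its next execution.
    EXEC sp_recompile N'dbo.usp_GetOrders';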

Generally, SQL Server does better with fewer, wider indexes (covering several columns) than it does with lots of narrow ones. A narrow index will require a bookmark lookup to get the rest of the data if it doesn't cover the query. Bookmark lookups are expensive, and SQL Server may even decide not to use the index at all and perform a table scan instead. In this situation, the narrow indexes will not be used and are an unnecessary overhead. You can use the DMV sys.dm_db_missing_index_details to see which multi-column indexes could give better performance. A word of warning if you are using the 2005 DMVs, such as sys.dm_db_index_usage_stats: remember that the data in these DMVs is not persisted across server shutdowns or database restores. Do not assume that because the DMVs show that an index has not been used you can safely drop it; it only means it hasn't been used since the counters were last cleared. If you are going to rely on the DMVs, you need to persist the data over a period of time and then analyse it. This is easy to do, as you can select directly from the DMVs into tables. Snapshotting the data every 30 minutes or so works well. Paul Randal has a post here with information on how to persist the index usage data from DMVs.
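A simple sketch of persisting the usage data - the table name is made up, and you would schedule the insert as a SQL Agent job:

    -- One-off: create a table to hold the snapshots (name is illustrative).
    SELECT GETDATE() AS CaptureDate, *
    INTO dbo.IndexUsageHistory
    FROM sys.dm_db_index_usage_stats;

    -- Scheduled every 30 minutes or so thereafter:
    INSERT INTO dbo.IndexUsageHistory
    SELECT GETDATE(), *
    FROM sys.dm_db_index_usage_stats;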

The final thing I'd like to say about indexing (for the time being!) is that you can't just create indexes and walk away. You need to maintain them regularly in order to reduce fragmentation. Fragmentation in indexes is caused by inserts, updates and deletes, and can be a problem if you have a busy system. By default, indexes are created with a 100% fill factor, which means the pages are densely packed as soon as you create them. If you then need to insert rows or update data, SQL Server will have to carry out page splits to make room. To avoid this, it is advisable to create indexes with a fill factor of around 80-90%, although this may vary depending on the frequency of data updates. Not only will this improve the performance of inserts and updates, it will also keep fragmentation to a minimum. You can check for fragmentation using DBCC SHOWCONTIG (2000) or sys.dm_db_index_physical_stats (2005), and remove fragmentation using DBCC INDEXDEFRAG or DBCC DBREINDEX in 2000, and ALTER INDEX REORGANIZE or REBUILD in 2005. There is further information and advice here on rebuilding indexes and updating statistics.
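As a sketch on 2005 (the index and table names are placeholders, and the thresholds are common rules of thumb rather than hard rules):

    -- Check fragmentation for every index in the current database (2005).
    SELECT OBJECT_NAME(ips.object_id) AS TableName,
           i.name AS IndexName,
           ips.avg_fragmentation_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    ORDER BY ips.avg_fragmentation_in_percent DESC;

    -- Rule of thumb: reorganise between roughly 10% and 30%, rebuild above that.
    ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REORGANIZE;
    ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REBUILD WITH (FILLFACTOR = 90);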

This is a huge topic and I could go on all day, but the importance of good indexing cannot be stressed highly enough! In summary, you need to ensure that you understand your application's workload, you need to check that the indexes you have are actually being used, and that you are not missing any indexes. Finally, you need to maintain your indexes so they are kept in optimal shape. Good indexing means good performance, which means happy users!

[This was originally posted on https://sqlblogcasts.com/ in February 2008]