Here are some notes on "SQL Server 2008 Indexing" I took while attending an advanced class taught by Kimberly Tripp (http://sqlskills.com/AboutKimberlyLTripp.asp).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
- Good database design is necessary
- SQL Server does not "automatically take care of everything"
- Just throwing hardware at the problem is not a good strategy
- Good design will give you orders of magnitude improvements
- Clients are commonly doing things servers should be doing
- In maintenance plans, if you rebuild an index, you can skip reorg or update stats.
- If you just go through the wizard, it seems like you need to do all three...
- Details at http://www.sqlskills.com/BLOGS/KIMBERLY/post/Database-Maintenance-Best-Practices-Part-I-e28093-clarifying-ambiguous-recommendations-for-Sharepoint.aspx
- Primary keys are sometimes chosen for wrong reasons
- No indexes are automatically created for foreign keys
- GUIDs can be evil (as clustering keys, for instance)
- Page fragmentation also wastes space in the buffer pool (in-memory cache)
- Vertical partitioning can have a huge impact in performance
- You should know your average row size and rows per page for a table
- Leaf level of indexes contains something for every row of the table
- The slot array of the page gives you the order on the page
- The order of the pages is maintained by the links between pages
- Using SELECT COUNT(*) is OK, you don't need to use SELECT COUNT(1)
- Non-leaf level of indexes is all about navigation and scalability
- Index pages with non-null, non-variable columns = 1 byte of overhead
- Index pointers are 6 bytes - 2 for the file number, 4 for the page number
- With integer key (4 bytes), you get 622 entries in a non-leaf level page
- You can get millions of rows in a table with just a few non-leaf levels
- The cost of a lookup is equivalent to the number of levels
- SQL needs a query plan you can get to quickly, not the ideal query plan
- In non-clustered indexes for heaps, SQL uses a RID (file+page+slot=8 bytes)
- This is twice as the integer at 4 bytes, which could add up to a lot
- A well designed clustered index usually is better than a heap.
GUIDs as keys
- A GUID key is 16 bytes. You use four times the space of an 4-byte integer
- This extra space is wasted in all indexes, foreign keys, etc
- GUID keys not only waste space, but also CPU and time to perform lookups
- Joins for GUID-based keys are also a lot more expensive than with integers
- This has to be in the design. It's hard to change after the fact
- It least use a unique integer as clustered key and GUID as non-clustered
- GUIDs used everywhere as keys is a growing performance problem
- Sequential GUIDs plus page compression helps (somewhat) because of common prefix.
- Details at http://sqlkpi.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx
- Leaf level of a non-clustered index has non-clustered key and primary key (or RID)
- Assuming a PK with an integer and FK with SSN, you can fit 449 entries per page
- If the non-clustered index is not unique, non-leaf levels have additional ID
- So, if possible, have unique non-clustered index and define them as unique
- This will possibly reduce the number of non-leaf pages and index levels.
Looking at the pages
- Start with a "SELECT * FROM sys.dm_db_index_physical_stats"
- DBCC PAGE with non-leaf level pages shows the details, including key info
- If you know the root page, you can navigate down to the leaf level using DBCC PAGE
Clustering key selection
- Clustering key selection is very important
- If you choose a wide clustering key, it has big consequences
- If we can get away with a 4-byte key, it makes a big difference
- There are other things to consider there, but width is always an issue
You can read other posts on indexing on Kimberly's blog at http://sqlskills.com/BLOGS/KIMBERLY/category/Indexes.aspx