Advantages and Annoyances of Filtered Indexes

Article
04/02/2018

This weeks great post by our US Senior PFE Susan Van Eyck talks about the wonders of filtered indexes!

Check it out!

Today’s Byte is about the advantages and annoyances of filtered indexes. We’ll talk about

What they are
Where (and why) they’re useful
How to create one
How to convince the optimizer to use them

What is a filtered index?

“A filtered index is an optimized nonclustered index especially suited to cover queries that select from a well-defined subset of data. It uses a filter predicate to index a portion of rows in the table. A well-designed filtered index can improve query performance as well as reduce index maintenance and storage costs compared with full-table indexes.”

It achieves these wonderful feats mostly by being smaller than a non-filtered nonclustered index which must include one row for every row in its parent table – even when the indexed column(s) is NULL. Remember – smaller is faster!

The other big benefit of filtered indexes is filtered stats which might get you better query plans. Since the stats for a filtered index are built against only the data in the index they can be more accurate (granular) than stats built against an entire table’s data.

When might you want to use one?

When indexing a column with a lot of NULL values. Here’s an example from AdventureWorks:

[sql]CREATE INDEX ix_MiddleName ON Person.Person ( MiddleName );

CREATE INDEX ix_MiddleName_NotNULL ON Person.Person ( MiddleName )

WHERE MiddleName IS NOT NULL;[/sql]

The filtered index is 32% smaller than the one with all the NULLs.

When you want to facilitate access to subsets of data:

[sql]CREATE INDEX ix_OnlineOrderFlag ON Sales.SalesOrderHeader ( OnlineOrderFlag );

CREATE INDEX ix_OnlineOrderFlag_0 ON Sales.SalesOrderHeader ( OnlineOrderFlag )

WHERE OnlineOrderFlag = 0;[/sql]

Again, the filtered index is much smaller.

Of course, useful indexes are seldom defined against a single column. Perhaps you need to “cover” different sets of columns depending on which Status you’re querying for:

[sql]CREATE INDEX ix_Status_A ON dbo.Products ( Col1, Co2 ) INCLUDE ( Col4, Col7, Col8 ) WHERE Status = 'A';

CREATE INDEX ix_Status_F ON dbo.Products ( Col1, Co2 ) INCLUDE ( Col7, Col9, Col13 ) WHERE Status = 'F';

CREATE INDEX ix_Status_X ON dbo.Products ( Col2 ) ( Col3, Col9, Col12 ) WHERE Status = 'X';[/sql]

You might also want to filter on various date ranges, but know that this is a bit trickier to implement since you can’t do something like the highlighted code:

[sql]CREATE INDEX ix_PONumber_DueDate ON Sales.SalesOrderHeader (PurchaseOrderNumber)

INCLUDE ( DueDate, TotalDue )

WHERE Status = 4 AND DueDate > DATEADD(DAY, -5, SYSDATETIME());[/sql]

Instead, you’d have to create the index using a literal (deterministic) date:

[sql]CREATE INDEX ix_PONumber_DueDate ON Sales.SalesOrderHeader (PurchaseOrderNumber)

INCLUDE ( DueDate, TotalDue )

WHERE Status = 4 AND DueDate > '20180101';[/sql]

Creating an index with a somewhat larger date range can still result in an index that’s significantly smaller its parent table. You can get more specific when you run the queries that will use the index. Having a somewhat broader range allows you to update the index less frequently as your target date range moves. You might also create multiple different filtered indexes having differing date ranges on a single table.

So, how do you create one?

CREATE INDEX (T-SQL) Create Filtered Indexes

It’s pretty straightforward, you just tack a WHERE clause onto your CREATE INDEX statement as in the examples above. The expressions can only use the simple comparison operators ( =, >, <>, etc.) and a handful of other constructs (IN, IS NULL, IS NOT NULL). Here are a few more examples:

[sql]WHERE StartDate > '20000101' AND EndDate <= '20000630'

WHERE ComponentID IN (533, 324, 753)

WHERE StartDate IN ('20000404', '20000905') AND EndDate IS NOT NULL[/sql]

Note that the data from the columns in the WHERE clause isn’t stored in the index. If you need those values, use the column in the key or included column list.

Here are a few additional limitations…

You can’t create a filtered index against a view (although a view can leverage filtered index defined on a table)
Errors will occur if column data is implicitly or explicitly converted
You cannot reference a computed column, a UDT column, a spatial or hierarchyID data type column
Filtered indexes do not apply to XML indexes and full-text indexes

Why won’t the Optimizer use my filtered index?

In a word – parameterization – and it actually makes a lot of sense when you think about it in terms of query plan reuse (see this great blog post from Bart Duncan).

Queries are parameterized to allow reuse of cached plans. This is generally good for performance – barring the occasional case of parameter sniffing. Consider the following queries:

[sql]SELECT * FROM Person.Person WHERE LastName = N'Anderson';

SELECT * FROM Person.Person WHERE LastName = N'Baker';

SELECT * FROM Person.Person WHERE LastName = N'anderson';[/sql]

To the Optimizer they’re all different queries. We’ve got 2 different last names and a difference in capitalization. Plans for all 3 queries will be separately compiled and cached, and we don’t get to leverage plan reuse.

If, instead, that literal last name value was cached as a parameter, it would remove the variability and allow plan reuse - for any value of the parameter. We can achieve this using a stored proc:

[sql]CREATE PROC getPersonData ( @LastName VARCHAR(50) )

SELECT * FROM Person.Person WHERE LastName = @LastName;

GO[/sql]

Or let SQL help us out with auto-parameterization - SIMPLE or FORCED. Simple auto-parameterizes simple T-SQL (like the queries above) and Forced parameterizes all queries. Which is in use when set using ALTER DATABASE. Simple is the default setting. When auto-parameterization is in use you’ll see something like this in an Actual Query Plan:

[sql]SELECT * FROM Person.Person WHERE LastName = <b>@1</b>;[/sql]

So, now we’ve got a plan cached for reuse – good for any parameter value. All’s well, right?? Wrong. This is where we butt heads with the Optimizer over use of our filtered index.

If a plan were to be cached using a filtered index for, say purchase orders from 2017, that plan is useless if someone later submits a query for data from 2016 or 2018 – date ranges not covered by our filtered index. So, the optimizer simply doesn’t use it. The Optimizer needs to cache a plan that’s good for a wide range of parameters – not just 2017 values.

Coercing the Optimizer to do our bidding

Here are a few workarounds to you can try to encourage the Optimizer to use your filtered indexes:

Add literal values that align with your filtered index. These will have to updated periodically if you modify your filtered indexes – say to follow a date range.
Query a view that’s based on literal values that align with your filtered index then query the view instead of the base table. You can update the view periodically to cover a new date range that also aligns with a filtered index.
Use a RECOMPILE hint – this prevents the plan from being cached, reuse isn’t an issue, so the Optimizer is OK with using the filtered index.

In the attached script we’ll create a filtered index then look where it’s used (or not) and explore the workarounds listed above.

Happy Exploring!