SQL Server 2008 XML Indexing

Article
03/23/2009

Here are some notes on "SQL Server 2008 XML Indexing" I took while attending an advanced class on SQL Server taught by Adam Machanic (https://sqlblog.com/blogs/adam_machanic/default.aspx).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.

XML Indexing

Engine needs to XML in order to process it
Solution: XML Indexes – pre-parsed representation of XML data
Include tages, values, path from the XML instance, turned into a hidden table
You can actually see the table if connected via admin connection
XML indexes almost always speed up XML queries, but they do come at a cost
Use typed XML as much as you can. Optimizer will like it.
See https://msdn.microsoft.com/en-us/library/ms191497.aspx

XML Indexing - Details

Types: Primary (base) and Secondary (PATH, VALUE, PROPERTY – Non-clustered on the primary)
Storage: Increases XML storage cost. Estimate as 3 times the storage requirement of XML instance in base table.
To create the primary XML index, table must have a clustered primary key
Stored in the same FG/partition as the base table
Primary XML index includes column for ordpath, to keep track of hierarchy
Ordpath also used in SQL Server 2008 for HierachyID data type
Primary index has 11 columns, including id, nid (node), hid (hierarchy), value, lvalue, PK[n]
See https://msdn.microsoft.com/en-us/library/bb500237.aspx

XML Indexing – Demo

Create XML Schema, Create Table with XML column
Gathering data for products, turning into some nice XML
Put XML data into a table (no index)
Query WHERE X.Exist (‘//Name[.=”Adjustable Race”’), look at the plan
Create an index, run the query again, look at the plan again
Look at sys.xml_indexes

XML Indexing – Secondary Indexes

Path index – (‘/a/b[.=”abc”]) - Good for path queries (index on hid, value)
Value index – (‘//b[.=”abc”]) - Optimized to find the value and we don’t know the path (index on value, hid)
Property index – .value(‘/a[1]/b[1]’)='abc' - Optimized for name-value pair storage (index on PK[n], hid, value)
See https://msdn.microsoft.com/en-us/library/bb522562.aspx

XML FullText Searching

XPath and XQuery expressions – case insensitive
FTS will index element values only – tags and attributes will be ignored
FTS can do initial filter, then more exact XML methods

Shred vs. Store Decision

Performance considerations – relational query is faster, shredding and re-combining is slow
Data Structure considerations – XML is extremely flexible, but loosely structured
Data lifecycle considerations – if you input and output XML, don’t shred
Storing as XML data type – XML data type will check, strip, eliminate extra spaces.
Storing as [N]VARCHAR(MAX) – If you care about white space or never do any XML operations
There are cases where you need to store both
Schema binding – Typed XML will be validated, schema will improve XML query performance
Schema binding – Uses more space, no wildcard-based queries, schema migration can be difficult
You can add relational constraints on top of those expressed as XSD

Where to put the XML

Need a clustering key in the table
Consider using a separate table if you can’t have a clustering key
Consider using in-row XML column if smaller than 8,000 bytes
You can “promote a property” to a relational column for better indexing and relational querying using UDF

Markup choice

Element-centric - <a><b>val</b></a>
- Element values considered to be content
- Verbose, more human readable. Maybe yield simpler qiery plans
Attribute-centric - <a b=”val”/>
- Lower storage requirements

Queries

Context node is a dot (.) - X.Exist(‘/a/b[@attr=”abc”]’)=1 not same as X.Exist(‘/a/b[.@attr=”abc”]’)=1
Avoid wildcars – Node level - X.Exist(‘//b[@attr=”abc”]’)=1 – Attribute level - X.Exist(‘/a/b/@*[.=”abc”]’)=1
Singletons – /a/b - /a[1]/b[1] – (/a/b)[1]
Outer Data binding
Merging Value Executions
Favor Exist() over Value()

Shredding

OPENXML – backward compatible, will be deprecated. Always use sp_xml_removedocument
Nodes() – newer, current, faster (test both to be sure).

SQL Server 2008 XML Indexing

Additional resources