SQL Server 2008 XML Indexing

Here are some notes on "SQL Server 2008 XML Indexing" I took while attending an advanced class on SQL Server taught by Adam Machanic (https://sqlblog.com/blogs/adam_machanic/default.aspx).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.


XML Indexing

  • Engine needs to XML in order to process it
  • Solution: XML Indexes – pre-parsed representation of XML data
  • Include tages, values, path from the XML instance, turned into a hidden table
  • You can actually see the table if connected via admin connection
  • XML indexes almost always speed up XML queries, but they do come at a cost
  • Use typed XML as much as you can. Optimizer will like it.
  • See https://msdn.microsoft.com/en-us/library/ms191497.aspx

XML Indexing - Details

  • Types: Primary (base) and Secondary (PATH, VALUE, PROPERTY – Non-clustered on the primary)
  • Storage: Increases XML storage cost. Estimate as 3 times the storage requirement of XML instance in base table.
  • To create the primary XML index, table must have a clustered primary key
  • Stored in the same FG/partition as the base table
  • Primary XML index includes column for ordpath, to keep track of hierarchy
  • Ordpath also used in SQL Server 2008 for HierachyID data type
  • Primary index has 11 columns, including id, nid (node), hid (hierarchy), value, lvalue, PK[n]
  • See https://msdn.microsoft.com/en-us/library/bb500237.aspx

XML Indexing – Demo

  • Create XML Schema, Create Table with XML column
  • Gathering data for products, turning into some nice XML
  • Put XML data into a table (no index)
  • Query WHERE X.Exist (‘//Name[.=”Adjustable Race”’), look at the plan
  • Create an index, run the query again, look at the plan again
  • Look at sys.xml_indexes

XML Indexing – Secondary Indexes

  • Path index – (‘/a/b[.=”abc”]) - Good for path queries (index on hid, value)
  • Value index – (‘//b[.=”abc”]) - Optimized to find the value and we don’t know the path (index on value, hid)
  • Property index – .value(‘/a[1]/b[1]’)='abc' - Optimized for name-value pair storage (index on PK[n], hid, value)
  • See https://msdn.microsoft.com/en-us/library/bb522562.aspx

XML FullText Searching

  • XPath and XQuery expressions – case insensitive
  • FTS will index element values only – tags and attributes will be ignored
  • FTS can do initial filter, then more exact XML methods

Shred vs. Store Decision

  • Performance considerations – relational query is faster, shredding and re-combining is slow
  • Data Structure considerations – XML is extremely flexible, but loosely structured
  • Data lifecycle considerations – if you input and output XML, don’t shred
  • Storing as XML data type – XML data type will check, strip, eliminate extra spaces.
  • Storing as [N]VARCHAR(MAX) – If you care about white space or never do any XML operations
  • There are cases where you need to store both
  • Schema binding – Typed XML will be validated, schema will improve XML query performance
  • Schema binding – Uses more space, no wildcard-based queries, schema migration can be difficult
  • You can add relational constraints on top of those expressed as XSD

Where to put the XML

  • Need a clustering key in the table
  • Consider using a separate table if you can’t have a clustering key
  • Consider using in-row XML column if smaller than 8,000 bytes
  • You can “promote a property” to a relational column for better indexing and relational querying using UDF

Markup choice

  • Element-centric - <a><b>val</b></a>
  • - Element values considered to be content
  • - Verbose, more human readable. Maybe yield simpler qiery plans
  • Attribute-centric - <a b=”val”/>
  • - Lower storage requirements

Queries

  • Context node is a dot (.) - X.Exist(‘/a/b[@attr=”abc”]’)=1 not same as X.Exist(‘/a/b[.@attr=”abc”]’)=1
  • Avoid wildcars – Node level - X.Exist(‘//b[@attr=”abc”]’)=1 – Attribute level - X.Exist(‘/a/b/@*[.=”abc”]’)=1
  • Singletons – /a/b - /a[1]/b[1] – (/a/b)[1]
  • Outer Data binding
  • Merging Value Executions
  • Favor Exist() over Value()

Shredding

  • OPENXML – backward compatible, will be deprecated. Always use sp_xml_removedocument
  • Nodes() – newer, current, faster (test both to be sure).