Database Programming: Applying APPLY -- Solving Parent Axis Access Performance Issues In XML Access In SQL Server 2005

This little nugget will save you an incredible amount of time if you face the need to retrieve data from multiple levels of the same XML document in one SELECT statement.

 

The two methods we'll examine in this post are “parent axis access,” the “classic” method of addressing this requirement, and the new SQL Server 2005 CROSS APPLY operator (CROSS APPLY is fully described in the December, 2005 version of SQL Server 2005 Books OnLine at ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/0208b259-7129-4d9f-9204-8445a8119116.htm).

Consider the following script, which seeks to retrieve the data from the title elements at each level of the XML:

-- ParentAxisAccessFixDemo

-- run via Query Analyzer with Include Actual Execution Plan turned on

DECLARE @xml XML

SET @xml ='<solutionmap>

      <title>SolutionX</title>

      <level0-item><title>ScenarioGroupA</title>

            <level1-item><title>Scenario1</title>

                  <process><title>ProcessA</title></process>

            </level1-item>

            <level1-item><title>Scenario2</title>

                  <process><title>ProcessA</title></process>

                  <process><title>ProcessB</title></process>

            </level1-item>

      </level0-item>

      <level0-item><title>ScenarioGroupB</title>

            <level1-item><title>Scenario1</title>

                  <process><title>ProcessA</title></process>

            </level1-item>

            <level1-item><title>Scenario2</title>

                  <process><title>ProcessA</title></process>

                  <process><title>ProcessB</title></process>

            </level1-item>

      </level0-item>

      <title>SolutionX</title>

      <level0-item><title>ScenarioGroupA</title>

            <level1-item><title>Scenario1</title>

                  <process><title>ProcessA</title></process>

            </level1-item>

     <level1-item><title>Scenario2</title>

                  <process><title>ProcessA</title></process>

                  <process><title>ProcessB</title></process>

            </level1-item>

      </level0-item>

      <level0-item><title>ScenarioGroupB</title>

            <level1-item><title>Scenario1</title>

                  <process><title>ProcessA</title></process>

            </level1-item>

            <level1-item><title>Scenario2</title>

                  <process><title>ProcessA</title></process>

                  <process><title>ProcessB</title></process>

            </level1-item>

      </level0-item>

</solutionmap>'

SELECT  @xml

SELECT  c.value('(../../../title/text())[1]', 'nvarchar(100)') AS Solution,

        c.value('(../../title/text())[1]', 'nvarchar(100)') AS ScenarioGroup,

        c.value('(../title/text())[1]', 'nvarchar(100)') AS Scenario,

        c.value('(title/text())[1]', 'nvarchar(100)') AS Capability

FROM    @xml.nodes('/solutionmap[1]/level0-item/level1-item/process')

AS      t(c)

-- the query above is 87% of total cost on my single proc system

SELECT  sm.sm.value('(title/text())[1]', 'nvarchar(100)') AS Solution,

        l0.item.value('(title/text())[1]', 'nvarchar(100)') AS ScenarioGroup,

        l1.item.value('(title/text())[1]', 'nvarchar(100)') AS Scenario,

        p.process.value('(title/text())[1]', 'nvarchar(100)') AS Capability

FROM    @xml.nodes('/solutionmap[1]')

AS      sm(sm)

CROSS APPLY

        sm.sm.nodes('level0-item')

AS      l0(item)

CROSS APPLY

        l0.item.nodes('level1-item')

AS      l1(item)

CROSS APPLY

        l1.item.nodes('process')

AS      p(process)

-- the query above is 13% of total cost on my single proc system

If we examine the XML, we see that each node will have, at maximum, one title attribute. There can be any number of level0 nodes under the root solutionmap node, any number of level1 nodes under a particular level0 node, and so on. Each of them will have a single title attribute, and we wish to retrieve them all.

In SQL Server 2000, if we wanted to write a query to extract all of the title attributes, we would use the first SELECT above. The query drills down to the deepest level of the XML for the Capability value, as noted in the NODES() reference. The values from the higher levels (Scenario, ScenarioGroup, and Solution) are retrieved from the parent, grandparent, and great-grandparent (in this case, also the root) nodes respectively.

A look at the execution plan for this query shows that we’ve spawned thirteen XML Reader table valued function calls. The results of these calls are fed to a two-level nested series of sixteen INNER JOINs (eleven nested loops, three merges, and two hash matches) to produce the results.

With SQL Server 2005’s new APPLY syntax, we use the second query above. The execution plan for this query shows eight XML Reader table valued function calls, down from the original thirteen. These are fed through a single tier of seven JOINs (five nested loops, one merge, and one hash match) to produce the results.

The proof is in the pudding, as they say. When I run these two queries on SQL Server 2005 RTM on my hyperthreaded single-processor laptop, the parent axis access query consumes 87% of the resources for the batch, while the CROSS APPLY version consumes 13%. This is an immediate 670% performance improvement with a trivial amount of data. This performance improvement scales non-linearly with respect to the volume of XML processed -- the more XML you’ve got, the better the performance improvement is!

Parent axis access has been my sole remaining major performance complaint with XML access in SQL Server, and this CROSS APPLY syntax resolves it nicely. Pardon the double negative, but when it comes to SQL Server 2005 and XML access, there’s no longer much you can’t do.