Database Programming: Mixing TOP With UNION


I’d like to share yet another interesting conversation from an internal discussion list.


Michael sent a query that was producing inconsistent results:


select TOP 35 1 AS [PRECEDANCE], W03a.CRDATTIM, W03a.RECORDCD, W03a.CRNODE, W03a.PRTY + TM.TEAMPRTY AS [TOTAL PRIORITY]
from WL4U999S TM,
WL3U999S WRKa
,
W03U999S W03a
where W03a.LOCKSTAT = ‘ ‘
and rtrim(W03a.END_QUEUE_FLG) = ‘N’
and rtrim(W03a.SUSPFLAG)= ‘N’
and rtrim(W03a.INXFLD03) = rtrim(TM.TEAM)
and rtrim(TM.USERID) = ‘AWDSETUP’
and rtrim(W03a.QUEUECD) = rtrim(TM.QUEUECD)
and rtrim(W03a.WRKTYPE) = rtrim(WRKa.WRKTYPE)
and rtrim(WRKa.USERID) = rtrim(TM.USERID)
and rtrim(W03a.UNITCD) = rtrim(WRKa.UNITCD)
and rtrim(W03a.QUEUECD) = rtrim(WRKa.QUEUECD)


union


select top 35 2 AS [PRECEDANCE], W03b.CRDATTIM, W03b.RECORDCD, W03b.CRNODE, W03b.PRTY
AS [TOTAL PRIORITY]
from WL3U999S WRKb,
W03U999S W03b
where W03b.LOCKSTAT = ‘ ‘
and W03b.END_QUEUE_FLG = ‘N’
and W03b.SUSPFLAG = ‘N’
and WRKb.USERID = ‘AWDSETUP’
and W03b.UNITCD = WRKb.UNITCD
and W03b.WRKTYPE = WRKb.WRKTYPE
and W03b.QUEUECD = WRKb.QUEUECD
and W03b.QUEUECD not in
( select W30.QUEUECD
from W30U999S W30 where W03b.QUEUECD = W30.QUEUECD)
order by 5 desc, 1, 2
option(force order);


Michael’s intent was to produce 70 rows from this query, but he was only getting 35 back.  The issue is that the TOP from the first query is running over the entire UNIONed resultset.


Fortunately, we can resolve this issue fairly easily.  All we need to do is place parentheses around each subquery and give each subquery an alias:


select PRECEDANCE, CRDATTIM, RECORDCD, CRNODE, [TOTAL PRIORITY]


FROM (


select TOP 35 1 AS [PRECEDANCE], W03a.CRDATTIM, W03a.RECORDCD, W03a.CRNODE, W03a.PRTY + TM.TEAMPRTY AS [TOTAL PRIORITY]


from WL4U999S TM,


WL3U999S WRKa,


W03U999S W03a


where W03a.LOCKSTAT = ‘ ‘


and rtrim(W03a.END_QUEUE_FLG) = ‘N’


and rtrim(W03a.SUSPFLAG)= ‘N’


and rtrim(W03a.INXFLD03) = rtrim(TM.TEAM)


and rtrim(TM.USERID) = ‘AWDSETUP’


and rtrim(W03a.QUEUECD) = rtrim(TM.QUEUECD)


and rtrim(W03a.WRKTYPE) = rtrim(WRKa.WRKTYPE)


and rtrim(WRKa.USERID) = rtrim(TM.USERID)


and rtrim(W03a.UNITCD) = rtrim(WRKa.UNITCD)


and rtrim(W03a.QUEUECD) = rtrim(WRKa.QUEUECD)


) a


 


union


 


select PRECEDANCE, CRDATTIM, RECORDCD, CRNODE, [TOTAL PRIORITY]


FROM (


select top 35 2 AS [PRECEDANCE], W03b.CRDATTIM, W03b.RECORDCD, W03b.CRNODE, W03b.PRTY AS [TOTAL PRIORITY]


from WL3U999S WRKb,


W03U999S W03b


where W03b.LOCKSTAT = ‘ ‘


and W03b.END_QUEUE_FLG = ‘N’


and W03b.SUSPFLAG = ‘N’


and WRKb.USERID = ‘AWDSETUP’


and W03b.UNITCD = WRKb.UNITCD


and W03b.WRKTYPE = WRKb.WRKTYPE


and W03b.QUEUECD = WRKb.QUEUECD


and W03b.QUEUECD not in


( select W30.QUEUECD


from W30U999S W30 where W03b.QUEUECD = W30.QUEUECD)


order by 5 desc, 1, 2


option(force order)


) b


The revised query returns a 70 record resultset, as intended.


     -wp

Comments (7)

  1. Anonymous says:

    What’s this, you say? Useful T-SQL making a return to this blog? Yep. The first T-SQL I’ve posted since

  2. Shurik says:

    IMHO the first query returns 35 rows instead of 70 because UNION and not UNION ALL operation is used.

    Each TOP applies to its subquery and after that UNION operation eliminates duplicate records and that is why the entire query return 35 records

  3. Ward Pond says:

    Thanks for your comment, Shurik.

    I ran a quick test with your suggestion and the behavior of this query didn’t change in either case.  Can you pass along a code sample to demonstrate your idea?

  4. Hugo Kornelis says:

    Shurik’s comment is incorrect. The constants "1 AS PRECEDENCE" and "2 AS PRECEDENCE" in the two union’ed query ensure that there can never be any duplicates. Changing UNION to UNION ALL would improve performance, though, as it eliminates the step to search for the (non-existing) duplicates.

    In your revised query, you should duplicate the ORDER BY clause to the first subquery as well. In the current version, the TOP in the first subquery has no corresponding ORDER BY, meaning that you’ll get a pseudo-random selection of 35 rows.

  5. Ward Pond says:

    I agree with you on all points, Hugo.  I have an inkling that Michael’s intent is to have the ORDER BY on the master query:

    select PRECEDANCE, CRDATTIM, RECORDCD, CRNODE, [TOTAL PRIORITY]
    FROM (
    select TOP 35 1 AS [PRECEDANCE], W03a.CRDATTIM, W03a.RECORDCD, W03a.CRNODE, W03a.PRTY + TM.TEAMPRTY AS [TOTAL PRIORITY]
    from WL4U999S TM,
    WL3U999S WRKa,
    W03U999S W03a
    where W03a.LOCKSTAT = ‘ ‘
    and rtrim(W03a.END_QUEUE_FLG) = ‘N’
    and rtrim(W03a.SUSPFLAG)= ‘N’
    and rtrim(W03a.INXFLD03) = rtrim(TM.TEAM)
    and rtrim(TM.USERID) = ‘AWDSETUP’
    and rtrim(W03a.QUEUECD) = rtrim(TM.QUEUECD)
    and rtrim(W03a.WRKTYPE) = rtrim(WRKa.WRKTYPE)
    and rtrim(WRKa.USERID) = rtrim(TM.USERID)
    and rtrim(W03a.UNITCD) = rtrim(WRKa.UNITCD)
    and rtrim(W03a.QUEUECD) = rtrim(WRKa.QUEUECD)
    ) a

    union

    select PRECEDANCE, CRDATTIM, RECORDCD, CRNODE,[TOTAL PRIORITY]
    FROM (
    select top 35 2 AS [PRECEDANCE], W03b.CRDATTIM, W03b.RECORDCD, W03b.CRNODE, W03b.PRTY AS [TOTAL PRIORITY]
    from WL3U999S WRKb,
    W03U999S W03b
    where W03b.LOCKSTAT = ‘ ‘
    and W03b.END_QUEUE_FLG = ‘N’
    and W03b.SUSPFLAG = ‘N’
    and WRKb.USERID = ‘AWDSETUP’
    and W03b.UNITCD = WRKb.UNITCD
    and W03b.WRKTYPE = WRKb.WRKTYPE
    and W03b.QUEUECD = WRKb.QUEUECD
    and W03b.QUEUECD not in
    ( select W30.QUEUECD
    from W30U999S W30 where W03b.QUEUECD = W30.QUEUECD)
    ) b
    order by 5 desc, 1, 2
    option(force order)

  6. Shurik says:

    It is possible that I am missing something again, but the following query returns 70 records and I am unable to reproduce the effect when

    "…the TOP from the first query is running over the entire UNIONed resultset."

    select top 35 1 as precedence, id from sysobjects

    union

    select top 35 2 as precedence, id from sysobjects

    order by 2, 1

    select @@version

    Microsoft SQL Server  2000 – 8.00.2162 (Intel X86)

  7. Ward Pond says:

    Thanks for writing back, Shurik.

    It appears I may have slightly mis-stated Michael’s original issue.  I went back to the original email and found this, which I missed when I wrote my first post:

    — begin excerpt

    The problem is that if you run the top query all by itself without an order by it brings back 1 set of data, if you run it with the sort from the bottom(in another query window) then you get another data set that is totally different. You can view all 35 records from each query and they aren’t even close to being the same.

    Then if you run the query on the bottom you get zero rows(which you should). Run this entire thing together as is and you get another different dataset. I then moved the top query to the bottom and re-ran it, this time the sort per se affects the over all records and you get back the same data as if you ran the top query all by itself, with ORDER by. I am guessing, its either an issue, or its how the optimizer is running the query, because I don’t honestly see how this query isn’t working.

    — end excerpt

    It’s this issue that the new nested query resolves.

       -wp