Egad, a technical post! And after I’d already said goodbye for the month. A long time ago, a friend taught me that expectations are small murders. What does that make plans?
At any rate, Peter asked a FAQ on one of the internal discussions this morning, and since we haven’t discussed it here..
I have a simple question regarding the Select Top …. clause.
When I use this clause, will the result be always the same, or does SQL 2000 (in my case here) return always a different set of data?
If you don’t define an “order by”, by standard SQL semantics, you are not guaranteed to always have the same result. Most of the time, you will, but index updates and server activity can change the records returned (even if there is no data insertion/update, which of course can change the data returned in any case and I suppose it is not your scenario)
That’s not a limitation of SQL 2000 or 2005, is simply part of the SQL definition: think about sets of records, they are unordered collections unless you explicitly asks for some order.
.. which led me to post the following:
Just to expand on Bruno’s point a bit..
TOP without ORDER BY doesn’t make much sense. If one is programming a ranking function (which is basically what TOP Is), criteria by which to rank is a crucial part of the specification. As Bruno points out, if you don’t specify ranking criteria, SQL Server will simply return the first records it finds until the count is satisfied. There are several scenarios (parallelism primary among them) which could lead to different result sets in this scenario.
Bruno then wrote:
There are some scenarios that TOP without ORDER BY is useful, though – getting a simple view of the table structure and sample data, process records N at a time, when you have to process all that meet some WHERE filter.
.. to which I replied:
While a simple view of the table structure can be retrieved with a TOP query, it’s more efficient to use a WHERE 1=0 construct, which will bypass indexes completely.
In your N records example, without an ORDER BY, there’s no guarantee you wouldn’t process the same record twice.
Think about the “top 10 hits” in music. Top by what? Volume of the bass guitar? Number of drum hits? The “top 10 hits” are measured by sales. If we ordered by drum hits, we’d probably get a different list. Since I believe that one of our jobs in building queries is to give the optimizer as much information as possible (never mind for the query to produce the same results (in context) every time it’s run), for my money TOP without ORDER BY is like IF without THEN.
While this is all obviously my two cents, I just can’t see invoking a ranking function (TOP) without criteria (ORDER BY). Furthermore, this strikes me as an instance where Pond’s Tenth Law of System Design applies: your code is a communication with someone else, who will likely come after you are gone.
Would/Have you ever implemented code in production that uses TOP without ORDER BY?