SQL Tip: The DENSE and DENSE_RANK Functions


This was originally created in 2011/2012 in a series I dubbed "SQL Tips" for my team (via email and an internal blog site). I've decided to add these SQL Tips to my external blog. These are being posted exactly how they were written ~6 years ago...unless I found someone's name being called out...I may have edited that to protect the innocent. 🙂

Best Practice Tip:
Two-Part Naming. Use it. Rather than just writing the name of the view, table, stored procedure, etc include the schema. Example: Use "dbo.v_R_System" rather than "v_R_System".
Not only is this best practice for writing SQL it does give a slight (very slight) performance advantage.

 

The SQL Tip - the DENSE_RANK() Function:
Someone suggested that I use queries for ConfigMgr in my examples, so I'm going to do that this time.
I'm going to be focusing on DENSE_RANK because given the choice it's the one I prefer 99.99% of the time. However, the functions are so similar that it made sense to include both into the same tip and just show the difference between them.
I'm going to use the following query in this SQL Tip. This query returns the number of messages that were processed by the CAS per Replication Group (it rolls all hardware inventory groups up into one) and site for each day in the last 7 days (not including today).
 
SELECT  CASE WHEN rpl.ReplicationGroup LIKE 'Hardware[_]Inventory%' THEN 'Hardware_Inventory'
             ELSE rpl.ReplicationGroup
        END AS [ReplicationGroup]
       ,rpl.ReplicationPattern
       ,hst.SendingSite
       ,CONVERT(date, (hst.ProcessedTime - GETUTCDATE() + GETDATE())) AS [DayDate]
       ,SUM(hst.MessageCount) AS [MessagesProcessed]
  FROM dbo.DRSReceiveHistory hst
       INNER JOIN dbo.ReplicationData rpl
          ON hst.ReplicationGroupID = rpl.ID
 WHERE CONVERT(date, (hst.ProcessedTime)) BETWEEN DATEADD(day, -8, CONVERT(date,GETUTCDATE()))
                                              AND DATEADD(day, -1, CONVERT(date,GETUTCDATE()))
 GROUP BY  CASE WHEN rpl.ReplicationGroup LIKE 'Hardware[_]Inventory%' THEN 'Hardware_Inventory'
                ELSE rpl.ReplicationGroup
           END
          ,rpl.ReplicationPattern
          ,hst.SendingSite
          ,CONVERT(date, (hst.ProcessedTime - GETUTCDATE() + GETDATE()));
GO
 
The output looks like this:

 

This output is great, but what if I wanted to give a "rank" (or number) to these rows? Last tip we looked at generating a row number using the ROW_NUMBER() function. This time, we'll look at two related functions:  DENSE_RANK() and RANK().

 

Parts of the DENSE_RANK() (and RANK) functions:
DENSE_RANK() -- The function
             OVER( -- the OVER clause
                  PARTITION BY [Column] -- the PARTITION clause (optional)
                  ORDER BY [Column] -- the ORDER clause
                  ) AS 'RankNumber'
The keyword “OVER” must be included along with an “ORDER BY” clause. A “PARTITION BY” clause can be included but is not required.
The ORDER BY in the ROW_NUMBER function determines how the row number will be assigned to the result set. It is important to note that this ORDER BY is not the same as the regular ORDER BY used toward the end of the query.
The PARTITION BY clause allows you to define whether you want to restart the numbering based on a group.

 

Creating a "rank" for each SendingSite:
For whatever reason I want to assign a number to each primary. To do this I will only use an ORDER BY clause on the SendingSite:
DENSE_RANK() OVER (ORDER BY hst.SendingSite)
What this does is assign the number "1" to each row that has a SendingSite equal to EU1 (and 2 to NA1, 3 to RD2, and 4 to RD3).
Now if I do the same thing using RANK() I'll get something very different.
RANK() OVER (ORDER BY hst.SendingSite)
This will assign a unique number to each site just like DENSE_RANK(), but instead of being the numbers 1 through 4, the numbers assigned depend on the number of rows returned by the result set. EU1 will still get the number 1, but NA1 will get the number 161, RD2 will get 321 and RD3 gets 481. Why is this? It's because there are 160 rows for EU1. The RANK function doesn't restart the numbering when assigning the number. So, the first group will get the number 1, and the next group will get the first row number of the group...since there are 160 rows for EU1 the next site (NA1) gets the number 161 because it is the first row number of the group.

 

SELECT  CASE WHEN rpl.ReplicationGroup LIKE 'Hardware[_]Inventory%' THEN 'Hardware_Inventory'
             ELSE rpl.ReplicationGroup
        END AS [ReplicationGroup]
       ,rpl.ReplicationPattern
       ,hst.SendingSite
       ,CONVERT(date, (hst.ProcessedTime - GETUTCDATE() + GETDATE())) AS [DayDate]
       ,DENSE_RANK() OVER (ORDER BY hst.SendingSite) DenseRank_ForSite
       ,RANK() OVER (ORDER BY hst.SendingSite) Rank_ForSite
       ,SUM(hst.MessageCount) AS [MessagesProcessed]
  FROM dbo.DRSReceiveHistory hst
       INNER JOIN dbo.ReplicationData rpl
          ON hst.ReplicationGroupID = rpl.ID
 WHERE CONVERT(date, (hst.ProcessedTime)) BETWEEN DATEADD(day, -8, CONVERT(date,GETUTCDATE()))
                                              AND DATEADD(day, -1, CONVERT(date,GETUTCDATE()))
 GROUP BY  CASE WHEN rpl.ReplicationGroup LIKE 'Hardware[_]Inventory%' THEN 'Hardware_Inventory'
                ELSE rpl.ReplicationGroup
           END
          ,rpl.ReplicationPattern
          ,hst.SendingSite
          ,CONVERT(date, (hst.ProcessedTime - GETUTCDATE() + GETDATE()));
GO

 

What the data can be used for:
I thought I'd show an example of what the data can be used for:
Comments (0)

Skip to main content