Simplifying SQL Server Joins (Query and Internals)

I love to explain things in much simpler way but SQL Joins are one of those concepts which are not so simple to explain and I really spend a lot of time in scribbling on White Board for this. But finally I found an article from Jeff Atwood where he did an incredible job by simply using Venn Diagrams to explain the whole concept. I will take few extract from his article but rest you can read from his blog.

INNER JOIN:

SELECT * FROM TableA

INNER JOIN TableB

ON TableA.name = TableB.name

 

clip_image002

OUTER JOIN (1):

SELECT * FROM TableA

FULL OUTER JOIN TableB

ON TableA.name = TableB.name

clip_image004

OUTER JOIN (2):

SELECT * FROM TableA

LEFT OUTER JOIN TableB

ON TableA.name = TableB.name

 

clip_image006

OUTER JOIN (3):

SELECT * FROM TableA

LEFT OUTER JOIN TableB

ON TableA.name = TableB.name

WHERE TableB.id IS null

clip_image008

OUTER JOIN (4):

SELECT * FROM TableA

RIGHT OUTER JOIN TableB

ON TableA.name = TableB.name

clip_image009

 

OUTER JOIN (5):

SELECT * FROM TableA

FULL OUTER JOIN TableB

ON TableA.name = TableB.name

WHERE TableA.id IS null

OR TableB.id IS null

clip_image011

CROSS APPLY (1): without Where Clause

SELECT p.SalesPersonID, t.Name AS Territory

FROM Sales.SalesPerson p

CROSS JOIN Sales.SalesTerritory t

ORDER BY p.SalesPersonID

Note: This can’t be represented by Venn Diagrams but above would multiply all the contents of First Table with all the contents of the second table.

CROSS APPLY (2): with Where Clause

SELECT p.SalesPersonID, t.Name AS Territory

FROM Sales.SalesPerson p

CROSS JOIN Sales.SalesTerritory t

WHERE p.TerritoryID = t.TerritoryID

ORDER BY p.SalesPersonID

clip_image012

Now when we understood these different types of Joins, let’s talk about some of the internal on How these joins would work internally within SQL Server.

clip_image014

Nested Loop Join:

Get Row from Outer Table (1st input to Join operator in showplan)

Get Matching Row from Inner Table (2nd input to Join operator in showplan)

Output Composite Result

Loop Through Inner Table

When Inner Table Exhausted, Loop on Outer Table

 

 

 

SET STATISTICS PROFILE ON

SELECT a.ID,a.name, SUM(m.Marks)/4 FROM

TblHeap a join TblMarks m on a.id = m.id WHERE a.id = 10

GROUP BY a.id, a.name

 

clip_image020 Merge Join:

 

SET STATISTICS PROFILE ON

SELECT a.ID,a.name, SUM(m.Marks)/4 FROM

TblHeapNcl a join TblMarks m on a.id = m.id WHERE a.id < 1000

GROUP BY a.id, a.name

 

 

Hash Join:

clip_image022

SET STATISTICS PROFILE ON

SELECT a.Name FROM HashTab1 a INNER JOIN HashTab2 b

ON a.Name = b.Name

 

 

Few FAQ’s around above topics are:

Q1. Which logical join operators does the nested loops join support?

A: The nested loops join supports the following logical join operators:

  • Inner join
  • Left outer join
  • Cross join
  • Cross apply and outer apply
  • Left semi-join and left anti-semi-join

The nested loops join does not support the following logical join operators:

  • Right and full outer join
  • Right semi-join and right anti-semi-join

 

Q2. How would we decide the type of join in my query?

A: Ideally SQL Query Optimizer would decide which join is cost effective for you and implement accordingly. Alternately you may use:

option(Loop join)

option (Merge join)

option (Hash join)

 

Q3: Which join is best?

A: Neither of these is good nor bad. The join operations would take some of the system resources and depending upon your database design, it could be high or low. Generally Nested Loop is more IO intensive and we see more usage with smaller tables. While Merge is bit CPU hungry but IO wise both Merge and Hash might perform very much the same. Also Hash might take longer time in Execution as by design it supposed to populate the resultant before display.

 

Q4: What are the types of Hash?

A: There are three different types of Hash Join:

  • In-memory Hash Join - In build phase table fit completely in memory.
  • Grace Hash Join - In build phase table does not fit completely in memory and spans to disk.
  • Recursive Hash Join - In build phase table is very large and have to use many levels of merge joins.