SQL Tip: LEFT JOINs and WHERE clauses…are they really LEFT JOINs?


There are times when I review SQL queries and find people using LEFT OUTER JOINs but then use WHERE clauses that turn those joins into INNER JOINs. Recently as I mentioned this to someone they reminded me that I should blog about it...so here I am blogging about it. 🙂

First off, if you aren't familiar with the different JOINs go ahead and read this other SQL Tip I wrote. Now, let's get to learning a bit more about the LEFT OUTER JOIN (I'll refer to it simply as the LEFT JOIN from here on out).

Let's start with some sample data. The following can be used at the beginning of your own queries if you're going to run these tests for yourself. As you can see, it creates two tables (table variables).

DECLARE @Table1 TABLE (colID int, colVal varchar(5));

DECLARE @Table2 TABLE (columnID int, columnVal varchar(15));

 

INSERT @Table1 VALUES (1,'one'),(2,'two'),(3,'three'),(4,'four'),(5,'five');

INSERT @Table2 VALUES (1,'some value'),(3,'blah blah blah'),(5,'hello world'),(12,'howdy');

 

If we want to return all the records in "Table1" regardless of whether there is an associated record in "Table2" and show the data from "Table2" when there is an associated record we'd write a LEFT JOIN, like so:

SELECT *

  FROM @Table1 tb1

        LEFT OUTER JOIN @Table2 tb2

          ON tb1.colID = tb2.columnID;

Output:

However, if we now want to add a WHERE clause to the query to only get the data from "Table2" where the ID is less than 4 we might do something like this:

SELECT *

  FROM @Table1 tb1

        LEFT OUTER JOIN @Table2 tb2

          ON tb1.colID = tb2.columnID

 WHERE tb2.columnID < 4;

Output:

But...did you see that we only get the values where there is a matching ID in both tables? Hmmm...that seems to act an awful lot like an INNER JOIN right? That's because it is. Don't believe me? Take a look at the execution plan after running the LEFT JOIN with and without the WHERE condition. Here's what you'll see:

 

So how can we get around this? Well, since you're doing a LEFT JOIN to begin with you obviously need or want to return all the records from "Table1" regardless of the data in "Table2". And if you really don't want to return data for certain records in "Table2" then you can filter those records out in your JOIN predicate (the 'search criteria' in the ON portion of the JOIN). For example, the last query would be written like this instead:

SELECT *

  FROM @Table1 tb1

        LEFT OUTER JOIN @Table2 tb2

          ON tb1.colID = tb2.columnID

         AND tb2.columnID < 4;

Output:

See how we still get the 5 records from "Table1" but not the data from "Table2" that didn't meet our criteria? It just returns NULLs for the data in that table that doesn't meet the criteria. Pretty awesome right?

And...we can look at the execution plan again to prove that we are actually using a LEFT JOIN:

What about when instead of a value someone uses the IS NULL condition instead? Well, in that case you can use that in the WHERE clause.

SELECT *

  FROM @Table1 tb1

        LEFT OUTER JOIN @Table2 tb2

          ON tb1.colID = tb2.columnID

 WHERE tb2.columnID IS NULL;

Output:

And, let's look at the execution plan again:

Notice it performs the left join and then filters the data for the NULL records? That's what we want it to do and frankly, it's what we probably expected it to do.

What if we use IS NOT NULL instead? Well...that's another story. I'll spoil the surprise and just tell you it will do an INNER JOIN just like it would with an actual value. If you don't believe me go ahead and run the query yourself and look at the execution plan.

So, there you have it; that's how a LEFT JOIN may or may not actually be a LEFT JOIN. Hopefully this information helps you write better, or at least, more informed queries.

Comments (4)

  1. gotqn says:

    I really can’t see the point here? Is the goal to demonstrate the `LEFT JOIN` behavior when filtering is performed in the `WHERE` clause and the `ON` clause? Or is it the goal to demonstrate how smart the engine is because it knows that when we are filtering by the join column, it can use a hash match?

    Knowing the basics is important. Knowing that the engine will help us turning our not optimized code in effective one is good, too. But at the end, this is very simple example which I doubt will reflect real word scenario.

    You are comparing queries which are returning different output, right? If you want to get only matched by `ID` columns and filtered by `ID`, just used `INNER JOIN` (as you said). This `LEFT JOIN` stuff does not seems right at all.

    1. Right, the goal was to show the behavior of filtering when using a LEFT JOIN and to explain this to those who didn’t know the behavior. Because, as you said, knowing the basics is important. Based on the number of times I see this happening, there are a lot of folks who don’t know this basic concept. I guess it comes down to audience, and you obviously weren’t my target audience. 🙂 I do appreciate the comments though – perhaps I could have come up with some real world examples, but in the end opted for simplicity to show the behavior and let folks apply the learnings to their own real world problem.

Skip to main content