Use PowerShell to Compare Two Files


Summary: Microsoft Scripting Guy, Ed Wilson, talks about using Windows PowerShell to compare two files.

Hey, Scripting Guy! I have a script that I wrote to compare two files, but it seems really slow. I am wondering what I can do to speed things up a bit.

—JW

Hello JW,

Microsoft Scripting Guy, Ed Wilson, is here. I looked at the script you supplied, where you use Compare-Object to compare two files. Here is your script:

$fileA = "C:\fso\myfile.txt"
$fileB = "C:\fso\CopyOfmyfile.txt"
$fileC = "C:\fso\changedMyFile.txt"

if (Compare-Object -ReferenceObject $(Get-Content $fileA) -DifferenceObject $(Get-Content $fileB))
 {"files are different"}
Else {"Files are the same"}

When I run the script and compare FileA with FileB, the script returns the correct response:

Image of command output

When I change it to use FileC, the script also works:

Image of command output

The three files are shown here:

Image of files

So JW, this is a very simple test case. What is really going on when using Compare-Object?

I can use the Windows PowerShell ISE to run a portion of the code and look at it. To do this, I highlight the Compare-Object statement and press F8 to execute only that portion of the code. This is shown here:

PS C:\> Compare-Object -ReferenceObject $(Get-Content $fileA) -DifferenceObject $(Get-Content $fileC)

InputObject                                              SideIndicator
-----------                                              -------------
Additional values                                        =>

And when I compare FileA with FileB, the following appears:

PS C:\> Compare-Object -ReferenceObject $(Get-Content $fileA) -DifferenceObject $(Get-Content $fileB)

PS C:\> 

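This triggers the ELSE portion of the code, because Compare-Object returns nothing when it finds no differences, and the if statement treats that empty result as False. Although this works, it can be a bit slow. It can also be unreliable on more complex files: by default, Compare-Object ignores the order of the objects it compares, so two files that contain the same lines in a different order are reported as the same.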
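One way to see where the Compare-Object test can mislead is to feed it the same lines in a different order, which is the case a reader raises in the comments. The following is a minimal sketch, not part of JW's script: the two arrays stand in for what Get-Content would return, and the variable names are made up for illustration. It also shows the -SyncWindow parameter, which limits how far Compare-Object looks for a match. With a value of 0, only items in the same position are matched.

```powershell
# Two arrays standing in for Get-Content output: same lines, different order.
$first  = 'First line', 'Second line'
$second = 'Second line', 'First line'

# Default behavior: order is ignored, so no differences are reported.
# @() forces an array so that .Count is 0 when nothing comes back.
$defaultDiff = @(Compare-Object -ReferenceObject $first -DifferenceObject $second)

# -SyncWindow 0 matches items only at the same position, so the reordering is reported.
$strictDiff = @(Compare-Object -ReferenceObject $first -DifferenceObject $second -SyncWindow 0)

"Differences with default settings: $($defaultDiff.Count)"
"Differences with -SyncWindow 0:    $($strictDiff.Count)"
```

The first count is zero even though the files are clearly not identical, while the second is one or more. A test built on the default behavior would therefore report "Files are the same" for these two inputs.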

So a better way to do this is to use Get-FileHash (available in Windows PowerShell 4.0 and later) and compare the Hash property. Your revised script is shown here:

$fileA = "C:\fso\myfile.txt"
$fileB = "C:\fso\CopyOfmyfile.txt"
$fileC = "C:\fso\changedMyFile.txt"

if ((Get-FileHash $fileA).Hash -ne (Get-FileHash $fileC).Hash)
 {"files are different"}
Else {"Files are the same"}

Now, when I look at the portion of the code that executes, I can see that I am dealing with a Boolean, instead of trying to evaluate whether output (which is basically ignored) appears or not (as in your previous script).

In the following, I execute only the Get-FileHash portion of the script:

PS C:\> (Get-FileHash $fileA).Hash -ne (Get-FileHash $fileC).Hash
True

PS C:\> (Get-FileHash $fileA).Hash -ne (Get-FileHash $fileB).Hash
False
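Because the hash comparison produces a plain Boolean, it is easy to wrap in a small reusable function. The following is a sketch under a few assumptions: Test-FileContentEqual is a hypothetical name (it is not a built-in cmdlet), and the sample files are created in the temp folder only so that the example runs anywhere. Substitute your own paths.

```powershell
# Test-FileContentEqual is a hypothetical helper name, not a built-in cmdlet.
function Test-FileContentEqual {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [string] $OtherPath
    )
    # Get-FileHash uses SHA256 unless -Algorithm specifies something else.
    $hashA = (Get-FileHash -LiteralPath $Path).Hash
    $hashB = (Get-FileHash -LiteralPath $OtherPath).Hash
    # Emit $true when both files hash to the same value.
    $hashA -eq $hashB
}

# Hypothetical sample files, created here so the sketch is self-contained.
$tempDir  = [System.IO.Path]::GetTempPath()
$original = Join-Path $tempDir 'hash_original.txt'
$copy     = Join-Path $tempDir 'hash_copy.txt'
$changed  = Join-Path $tempDir 'hash_changed.txt'
Set-Content -Path $original -Value 'First line', 'Second line'
Set-Content -Path $copy     -Value 'First line', 'Second line'
Set-Content -Path $changed  -Value 'First line', 'Second line', 'Additional values'

$sameAsCopy    = Test-FileContentEqual -Path $original -OtherPath $copy
$sameAsChanged = Test-FileContentEqual -Path $original -OtherPath $changed

"Original equals copy:    $sameAsCopy"
"Original equals changed: $sameAsChanged"
```

The function returns True for the copy and False for the changed file, so the result can go straight into an if statement without depending on whether a cmdlet happened to emit output.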

In addition, the Get-FileHash approach is usually efficient for larger files. Get-FileHash still has to read each file in full, but it streams the bytes through the hash algorithm and leaves you with two short strings to compare. Your original script reads both files into memory as arrays of lines, and then Compare-Object has to match those lines against each other, which is a lot more work. For very small files the difference is negligible, and hashing can even be the slower of the two.
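The easiest way to check the efficiency of either approach on your own files is to time both with Measure-Command. The following is a rough sketch, not a benchmark: the file names and the 5,000-line sample content are assumptions chosen for illustration, and the numbers vary with file size, disk, and network.

```powershell
# Hypothetical sample files in the temp folder; substitute your own paths.
$tempDir = [System.IO.Path]::GetTempPath()
$fileA   = Join-Path $tempDir 'timing_a.txt'
$fileB   = Join-Path $tempDir 'timing_b.txt'
$lines   = 1..5000 | ForEach-Object { "Line number $_" }
Set-Content -Path $fileA -Value $lines
Set-Content -Path $fileB -Value $lines

# Time the line-by-line comparison from the original script.
$compareTime = Measure-Command {
    Compare-Object -ReferenceObject (Get-Content $fileA) -DifferenceObject (Get-Content $fileB)
}

# Time the hash comparison from the revised script.
$hashTime = Measure-Command {
    (Get-FileHash $fileA).Hash -ne (Get-FileHash $fileB).Hash
}

"Compare-Object: {0:N1} ms" -f $compareTime.TotalMilliseconds
"Get-FileHash:   {0:N1} ms" -f $hashTime.TotalMilliseconds
```

Measure-Command returns a TimeSpan and discards the output of the script block, so only the elapsed time is shown. Run each measurement a few times before drawing conclusions, because the first run often includes file caching effects.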

JW, that is all there is to using Windows PowerShell to compare two files. Troubleshooting Week will continue tomorrow when I will talk about more cool stuff.

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy 

Comments (8)

  1. Anonymous says:

    The links for the images are not working and are 404’ing.

  2. PetSerAl says:

    Compare-Object ignores the order of objects, so the files File1.txt and File2.txt will be considered the same by your code.

    File1.txt:
    First line
    Second line

    File2.txt:
    Second line
    First line

  3. Jonas Pedersen says:

    Hello
    I’m really new at PowerShell, coming from bash scripting. I’m working on Windows 7 with PS v3 and I’m trying to run your examples, but Get-FileHash is not recognized. What am I doing wrong, and should I upgrade to PS v4?

  4. ITGuy says:

    @Jonas Pedersen

    Get-FileHash is available on my Windows 7, PowerShell 4 Install. I don’t have a Windows 7, PowerShell 3 Install to look at right now.

  5. username says:

    Doesn’t Get-FileHash also get the content of the entire file to generate the hash? I feel like it’s not just a quick property grab, because it takes forever to run against, for example, a Windows 8.1 WIM, especially over the network (watching my net adapter chew up 10GB in 25MB chunks).

    I think a tiered approach would work better if you’re going to be working with large files. First, compare the Length property of both files, because that’s an almost instantaneous operation. If the lengths don’t match, you are obviously working with different files, and the hashes (almost 100% for sure, but not quite 100%) won’t match either. If the file lengths do match, then do a hash. This would cut your run time down if you’re working with a ton of files, especially larger files. It would only add fractions of a second if every single file you compare is different (and if they are, why bother comparing them anyway?).

  6. username says:

    So you would only bother hashing if the files to be compared are the same size. The code would look like this:

    $fileA = "C:\fso\myfile.txt"
    $fileB = "C:\fso\CopyOfmyfile.txt"
    $fileC = "C:\fso\changedMyFile.txt"

    # Length on a path string is its character count, so ask Get-Item for the file size
    if ((Get-Item $fileA).Length -eq (Get-Item $fileC).Length)
    {
        if ((Get-FileHash $fileA).Hash -ne (Get-FileHash $fileC).Hash)
        {"files are different"}
        Else {"Files are the same"}
    }
    else {"Files are different"}

    Also, yes, Get-FileHash is PS v4 or higher.
    https://technet.microsoft.com/en-us/library/dn520872.aspx

  7. Przemyslaw Wojdat says:

    Use Measure-Command -Expression { } to see the difference. Calculating the hash of a file actually takes longer than comparing small files.

  8. Anand says:

    Hi there
    Actually, I would like to compare two large text files and write the differences to an output file.
    I know that Get-Content is not efficient in this case. Could you please suggest a better option?
    If you could provide me with the actual code itself, it would be really helpful to me.

    I am just a newbie with respect to PowerShell.
