Weekend Scripter: Using PowerShell to Look for Documents


Summary: Microsoft Scripting Guy, Ed Wilson, talks about different approaches to finding documents with Windows PowerShell.

Microsoft Scripting Guy, Ed Wilson, is here. It is a lovely weekend down here in the deep south USA. I am sipping a nice cup of English Breakfast tea with lemon, lime, orange pith, and Hibiscus flower added. I decided to leave out the cinnamon stick this morning to see what would happen. What happened is that the resulting tea is mellow and tangy at the same time.

I have the Rolling Stones cranked up on my Zune 2.0, and I am hanging out enjoying the beautiful weather. I also know that more than likely in a few weeks, it will become hot and humid, so I should spend time outside as much as possible. So I took my laptop, my Zune, and my cup of tea upstairs to the deck so I can combine my favorite activities. I was sort of wondering how many Hey, Scripting Guy! Blog posts I have. I know I can go to the blog and count them, but dude…

Keep in mind I did not write all of these posts. One thing the Windows PowerShell community has been great about is writing guest blog posts. I started the Honorary Scripting Guys program to recognize those who have made an outstanding contribution to the blog and to the community.

To make it a little more challenging to count my posts, the first articles I wrote had unique document names for each post instead of following a naming pattern, such as HSG-4-3-15.docx. With this in mind, this is more of an academic exercise than an actual count of how many Hey, Scripting Guy! Blog posts I have written.

Use Get-ChildItem and –Include

Most of the time, when I need to search for specific types of files, I use the –Include parameter with Get-ChildItem. I then pipe the results to Where-Object so I can filter on other things (maybe the file name), and then I return the files. So what am I actually looking at, and what am I looking for? Well, here is an image of a typical Hey, Scripting Guy! folder:

Image of files

So in a typical HSG folder, I have pictures (.png file extension), Windows PowerShell scripts (.ps1 extension), and documents (.docx extension). What is not visible here is that the file names begin with HSG during the week and WES for the weekend. (In the past, I also wrote a Quick Hits Friday post that began with QHF.)

I know the file extension I am looking for (“DOCX”), so the only thing I need to concern myself with is the beginning of the document names (either HSG, QHF, or WES). I come up with the following command to find my documents:

Get-ChildItem E:\Data\ScriptingGuys -include "*doc*" -Recurse -file |

? {$_.BaseName -like 'HSG*' -OR $_.BaseName -like 'WES*' -OR $_.BaseName -like 'QHF*'}

The next thing I need to do is count the documents. To do this, I select the Count property from the object returned by Measure-Object. My complete command is shown here:

(Get-ChildItem E:\Data\ScriptingGuys -include "*doc*" -Recurse -file |

? {$_.BaseName -like 'HSG*' -OR $_.BaseName -like 'WES*' -OR $_.BaseName -like 'QHF*'}|

 measure).count

The command runs for a few seconds, and the following appears in the output:

PS C:\> (Get-ChildItem E:\Data\ScriptingGuys -include "*doc*" -Recurse -file |

? {$_.BaseName -like 'HSG*' -OR $_.BaseName -like 'WES*' -OR $_.BaseName -like 'QHF*'}|

 measure).count

2247

It says 2,247 documents. Cool! That is a lot of writing.
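If you want to try the same pattern without a folder full of blog posts, here is a minimal sketch that builds a throwaway folder and runs the same command shape against it. The folder name, the sample file names, and the $includeCount variable are all made up for illustration; only the Get-ChildItem, Where-Object, and Measure-Object usage mirrors the command above, written without aliases.

```powershell
# Hypothetical sandbox: a temporary folder tree so the count is predictable.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("hsg-demo-" + [guid]::NewGuid())
$sub  = Join-Path $root '2015'
New-Item -ItemType Directory -Path $sub -Force | Out-Null

# Three posts that follow the naming pattern, one document that does not, and one script.
'HSG-4-3-15.docx', 'WES-4-4-15.docx', 'QHF-4-3-15.docx', 'Notes.docx', 'HSG-4-3-15.ps1' |
    ForEach-Object { New-Item -ItemType File -Path (Join-Path $sub $_) | Out-Null }

# Same shape as the command above: -Include narrows by name,
# Where-Object keeps the three prefixes, Measure-Object counts what is left.
$includeCount = (Get-ChildItem -Path $root -Include '*doc*' -Recurse -File |
    Where-Object {
        $_.BaseName -like 'HSG*' -or $_.BaseName -like 'WES*' -or $_.BaseName -like 'QHF*'
    } |
    Measure-Object).Count

$includeCount   # 3: Notes.docx fails the prefix test, the .ps1 file fails -Include

# Remove the sandbox when finished.
Remove-Item -Path $root -Recurse -Force
```

The –File switch needs Windows PowerShell 3.0 or later, which is also true of the commands in this post.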

Use the –Filter parameter

I have a command that works OK, but now I want to use the –Filter parameter instead of –Include. The reason is that with –Filter, the FileSystem provider does the filtering before it returns the objects to Windows PowerShell, so it should be faster.

The difference is that I am more limited in what I can do. I can basically do a single filter on the path of the file. I can use an asterisk ( * ) or a question mark ( ? ) for wildcard characters, and that is about it. I also decide to convert from using –Like to the regular expression –Match in my Where-Object command. Here is my resulting command:

(Get-ChildItem E:\Data\ScriptingGuys -filter "*doc*" -Recurse -file |

 ? {$_.BaseName -match '^HSG' -OR $_.BaseName -match '^WES' -OR $_.BaseName -match '^QHF'}|

 measure).count

When I run it, it returns the same number of documents, but it seems faster. Here is the result:

PS C:\> (Get-ChildItem E:\Data\ScriptingGuys -filter "*doc*" -Recurse -file |

 ? {$_.BaseName -match '^HSG' -OR $_.BaseName -match '^WES' -OR $_.BaseName -match '^QHF'}|

 measure).count

2247 
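To make the trade-off between the two parameters a little more concrete, here is a small sketch, again against a made-up temporary folder with hypothetical file names. It relies on the parameter types of Get-ChildItem: –Filter takes a single string, and –Include takes an array of strings.

```powershell
# Hypothetical sandbox with two documents, one picture, and one script.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("filter-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root -Force | Out-Null
'HSG-1.docx', 'WES-1.docx', 'HSG-1.png', 'HSG-1.ps1' |
    ForEach-Object { New-Item -ItemType File -Path (Join-Path $root $_) | Out-Null }

# -Filter: one wildcard string, applied by the FileSystem provider.
$byFilter = @(Get-ChildItem -Path $root -Filter '*.docx' -Recurse -File)

# -Include: several patterns are allowed, applied by PowerShell after retrieval.
$byInclude = @(Get-ChildItem -Path $root -Include '*.docx', '*.png' -Recurse -File)

$byFilter.Count    # 2
$byInclude.Count   # 3

Remove-Item -Path $root -Recurse -Force
```

So when I need more than one pattern, I either stay with –Include or use –Filter for the broad cut and let Where-Object do the rest, which is what the command above does.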

Clean up the regular expression

Although the previous command works, it is not very pretty. I mean, why chain three –Match comparisons together with –Or operators when the regular expression language has a built-in Or operator: the pipe character ( | )? So I decide to modify my command:

(Get-ChildItem E:\Data\ScriptingGuys -filter "*doc*" -Recurse -file |

 ? {$_.BaseName -match '^(HSG|WES|QHF)'}|

 measure).count

This also returns the same output:

PS C:\> (Get-ChildItem E:\Data\ScriptingGuys -filter "*doc*" -Recurse -file |

 ? {$_.BaseName -match '^(HSG|WES|QHF)'}|

 measure).count

2247
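A quick way to convince yourself that the two forms are equivalent is to test them on plain strings. This is only a sketch; the names in $names are invented, and MyHSG-draft is there to show what the caret ( ^ ) anchor does.

```powershell
# Hypothetical base names: three that follow the pattern and two that do not.
$names   = 'HSG-4-3-15', 'WES-4-4-15', 'QHF-4-3-15', 'Notes', 'MyHSG-draft'
$pattern = '^(HSG|WES|QHF)'

# Three separate -match tests joined with -or.
$threeTests = $names | Where-Object { $_ -match '^HSG' -or $_ -match '^WES' -or $_ -match '^QHF' }

# One -match test that uses regular expression alternation.
$oneTest = $names | Where-Object { $_ -match $pattern }

$oneTest                                           # HSG-4-3-15, WES-4-4-15, QHF-4-3-15
($threeTests -join ',') -eq ($oneTest -join ',')   # True
```

MyHSG-draft is rejected because ^ anchors the match to the start of the string. Also keep in mind that –Match is case-insensitive by default, so hsg-4-3-15 would pass too.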

Let me run all three at the same time and see what happens:

Image of command output

Timing the commands

So I know that all three commands return the same information. Now I want to time the three operations to see how long each one takes. I use the Measure-Command cmdlet, and look at the TotalSeconds property. My revised code is shown here:

"Include LIKE"

 (Measure-Command {

 (Get-ChildItem E:\Data\ScriptingGuys -include "*doc*" -Recurse -file |

? {$_.BaseName -like 'HSG*' -OR $_.BaseName -like 'WES*' -OR $_.BaseName -like 'QHF*'}|

 measure).count

 }).TotalSeconds

 "Filter MATCH multiple -OR"

 (Measure-Command {

 (Get-ChildItem E:\Data\ScriptingGuys -filter "*doc*" -Recurse -file |

 ? {$_.BaseName -match '^HSG' -OR $_.BaseName -match '^WES' -OR $_.BaseName -match '^QHF'}|

 measure).count

 }).TotalSeconds

 "Filter MATCH Regex OR"

 (Measure-Command {

 (Get-ChildItem E:\Data\ScriptingGuys -filter "*doc*" -Recurse -file |

 ? {$_.BaseName -match '^(HSG|WES|QHF)'}|

 measure).count

 }).TotalSeconds

I run the script to see which is fastest:

Image of command output

It should be no surprise that the last command is the fastest. But what may be surprising is how much faster it was than using the –Include parameter. I am talking .7 seconds compared to 2.3 seconds, and this was for only a couple thousand files. What about a couple million files? At roughly three times faster, the seconds saved add up quickly as the number of files grows. To me, it is worth it to play around with –Filter.
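One caution about timings like these: a single run can be skewed by things such as file system caching, so the first command in a script may pay a cost that the later ones do not. Here is a minimal sketch of a helper that runs a script block several times and averages the result. Measure-AverageSeconds is a hypothetical function name I made up for this example (it is not a built-in cmdlet), and $path defaults to $PSHOME only so the sketch runs anywhere; substitute your own folder.

```powershell
# Hypothetical helper: average the TotalSeconds of several Measure-Command runs.
function Measure-AverageSeconds {
    param(
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [int] $Iterations = 5
    )
    $times = foreach ($i in 1..$Iterations) {
        (Measure-Command -Expression $ScriptBlock).TotalSeconds
    }
    ($times | Measure-Object -Average).Average
}

# Assumption: any folder will do for a trial run; replace with your own path.
$path = $PSHOME

"Include LIKE"
Measure-AverageSeconds -ScriptBlock {
    Get-ChildItem -Path $path -Include '*doc*' -Recurse -File -ErrorAction SilentlyContinue |
        Where-Object { $_.BaseName -like 'HSG*' -or $_.BaseName -like 'WES*' -or $_.BaseName -like 'QHF*' }
}

"Filter MATCH Regex OR"
Measure-AverageSeconds -ScriptBlock {
    Get-ChildItem -Path $path -Filter '*doc*' -Recurse -File -ErrorAction SilentlyContinue |
        Where-Object { $_.BaseName -match '^(HSG|WES|QHF)' }
}
```

Averaging does not remove caching effects, but it makes one slow or fast run less likely to decide the comparison. Swapping the order of the two tests is another cheap sanity check.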

Hope you have a great weekend.

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy 
