How Can I Search a Text File for Strings Meeting a Specified Pattern?

ScriptingGuy1

Hey, Scripting Guy! How can I search a text file of product IDs and retrieve just those lines that meet a specified pattern?

— WT

Hey, WT. Before we get to today’s question we were wondering if anyone else has seen the commercial for the car that features the all-new “heartbeat sensor?” The idea is that, before you open your car door, you check the sensor, which can detect the heartbeat of anyone who happens to be hiding in the car waiting to pounce on you.

As a general rule the Scripting Guys are opposed to people hiding in cars waiting to pounce on unsuspecting drivers. And we don’t doubt that this very thing has happened before: someone has opened their car door and been pounced upon. Nevertheless, we’d be curious to know how often this sort of thing happens. It makes sense to have smoke alarms in houses; houses do occasionally catch on fire. But do we really need heartbeat sensors in cars? We’re just not sure. To tell you the truth, no one ever hides in any of the Scripting Guys’ cars.

But, then again, that could simply be because no one wants to catch a Scripting Guy.

The Scripting Guy who writes this column finds this all very interesting, in part because there is plenty of research to indicate that even though our lives continue to get better and better people continue to get unhappier and more depressed. That could be due to the fact that money and material goods truly don’t buy happiness. Alternatively, it could be due to the fact that, just when things start looking up, someone invents a new menace to worry about, and provides a solution to a problem no one even knew existed.

Just wondering.

By contrast, the Scripting Guys only provide solutions to problems that do exist. (We’re also responsible for creating many of those problems in the first place. But that’s another story.) For example, some people need to be able to retrieve a list of specific products from a text file. How are they supposed to do that? Here’s how:

Const ForReading = 1

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Pattern = "^[1-9]...GRP"

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

Do Until objFile.AtEndOfStream
    strSearchString = objFile.ReadLine
    Set colMatches = objRegEx.Execute(strSearchString)
    If colMatches.Count > 0 Then
        For Each strMatch in colMatches
            Wscript.Echo strSearchString
        Next
    End If
Loop

objFile.Close

Before we explain how the script works we should note that, based on WT’s description, we have a text file similar to this:

1XXXGRPABCEFG
2YYYGRPDEF
AZZZGRPDEF
RTRRABCGRPRTY
YTHJABCPBCOP

WT is looking only for those records (lines in the text file) that meet the following criteria:

The first character is a number, 1 through 9. This character indicates a specific product type.

The second, third, and fourth characters are – well, it doesn’t matter. We don’t care about these characters.

The fifth, sixth, and seventh characters are GRP. These happen to indicate different product groups.

The remaining characters are – well, again, it doesn’t matter.

Based on these criteria the first two lines in the text file are the only two lines we’re looking for: they both begin with a number and then have GRP in the fifth, sixth, and seventh character spots. Granted, Line 3 has GRP in the designated spot; however, line 3 doesn’t begin with a number. As for lines 4 and 5, well, the less said about them the better.

So how do we go about finding the desired records? For us the easiest way to do that was to use a regular expression; after all, we’re looking for a specific pattern (a number followed by three characters followed by GRP) and regular expressions are very adept at sniffing out patterns (as opposed to finding specific words and phrases). Will our regular expression work? Let’s find out.

The script starts out by defining a constant named ForReading and setting the value to 1; we’ll need this constant when we open our text file for reading. We then use these two lines of code to create an instance of the VBScript.RegExp object and to specify a Pattern for our search:

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Pattern = "^[1-9]...GRP"

Needless to say, the Pattern is the key to getting this script to work; because of that we should take a minute or two to explain the various components of this regular expression. To begin with, we have the ^ character. That simply tells the script that the pattern must be found at the beginning of the search text; that prevents a value like this from being incorrectly tagged as a match:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1XXXGRP

As you can see, the desired pattern is there, but it occurs at the very end of the string, not at the beginning. Therefore, it doesn’t count, at least not with this script.
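To see what the ^ character buys us, here’s a small sketch that runs the pattern against that long string twice: once without the anchor and once with it. It relies on the Test method of the VBScript.RegExp object, which returns True or False instead of a collection of matches. The function name HasMatch is something we made up for this illustration; it isn’t part of the script above.

```vbscript
' Hypothetical helper (not part of the original script): returns True
' if strPattern can be found in strText, False if it can't.
Function HasMatch(strPattern, strText)
    Dim objRegEx
    Set objRegEx = CreateObject("VBScript.RegExp")
    objRegEx.Pattern = strPattern
    HasMatch = objRegEx.Test(strText)
End Function

strTail = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1XXXGRP"

' Without the anchor, the pattern is found at the end of the string.
Wscript.Echo "Without ^: " & HasMatch("[1-9]...GRP", strTail)    ' True

' With the anchor, the pattern has to start at the very first character.
Wscript.Echo "With ^:    " & HasMatch("^[1-9]...GRP", strTail)   ' False
```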

Next we have this construction: [1-9]. This simply says that the next character in the pattern must be one of the digits 1 through 9 (the square brackets enclose a set of acceptable characters, and the hyphen turns 1-9 into a range). In other words, to be a match the string must start with one of the numbers 1 through 9.
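If you’d like to try that piece on its own, the following sketch checks only the first character of a few values. The sample values are made up for the illustration, and the pattern here is just the opening part of the full one.

```vbscript
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Pattern = "^[1-9]"

' Made-up sample values; only the first character matters here.
For Each strValue In Array("5XXXGRP", "0XXXGRP", "AXXXGRP")
    If objRegEx.Test(strValue) Then
        Wscript.Echo strValue & " starts with a digit from 1 through 9"
    Else
        Wscript.Echo strValue & " does not start with a digit from 1 through 9"
    End If
Next
```

Only 5XXXGRP should pass: 0 is a digit, but it falls outside the range 1-9, and A isn’t a digit at all.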

Pretty simple so far, right? Right.

Next up we have three dots: ... What do the dots represent? In a regular expression a dot represents any single character (except for the newline character). We use the three dots here simply to indicate that, to meet our pattern, the string must have three characters following the opening digit. What three characters? It doesn’t matter.
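Here’s a quick sketch of what “any three characters” means in practice. The test values are invented for the illustration; the only thing that changes from one to the next is what sits between the opening digit and GRP.

```vbscript
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Pattern = "^[1-9]...GRP"

' Made-up test values: only the characters between the digit and GRP vary.
Wscript.Echo "1XXXGRP:  " & objRegEx.Test("1XXXGRP")    ' True: three letters
Wscript.Echo "1 !?GRP:  " & objRegEx.Test("1 !?GRP")    ' True: a space and punctuation count too
Wscript.Echo "1XXGRP:   " & objRegEx.Test("1XXGRP")     ' False: only two characters
Wscript.Echo "1XXXXGRP: " & objRegEx.Test("1XXXXGRP")   ' False: four characters
```

Each dot stands for exactly one character, so three dots means exactly three characters: no more, no fewer.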

Finally, we have the product group code: GRP. After the opening digit we need to have three characters. And after those three characters we need to have the letters GRP, in that order. Any string that fits that complete pattern will be considered a match; any string that doesn’t fit that complete pattern won’t be considered a match.
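Putting the pieces together, this sketch runs the complete pattern against the five sample lines from the text file, held in an array so you can try it without creating a file. The variable name arrLines and the match/no match wording are ours.

```vbscript
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Pattern = "^[1-9]...GRP"

' The five sample lines from the text file shown earlier.
arrLines = Array("1XXXGRPABCEFG", "2YYYGRPDEF", "AZZZGRPDEF", _
                 "RTRRABCGRPRTY", "YTHJABCPBCOP")

For Each strLine In arrLines
    If objRegEx.Test(strLine) Then
        Wscript.Echo strLine & ": match"
    Else
        Wscript.Echo strLine & ": no match"
    End If
Next
```

Only the first two lines should be reported as matches, which agrees with the criteria listed earlier.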

Note. Does all that make sense? If not, you might want to take a look at String Theory for System Administrators, Scripting Guy Dean Tsaltas’ definitive explanation for how to use regular expressions in a script.

After defining our pattern we next use these two lines of code to create an instance of the Scripting.FileSystemObject object and to open the text file C:\Scripts\Test.txt for reading:

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

At this point, the game is afoot. Our next step is to set up a Do Until loop that runs until we’ve read each and every line in the text file (technically, until the AtEndOfStream property is True). Inside that loop we use the ReadLine method to read the next line in the text file (the first line, on the first pass through the loop) and store it in a variable named strSearchString. That brings us to this line of code:

Set colMatches = objRegEx.Execute(strSearchString)

What we’re doing here is using the Execute method to determine whether or not our regular expression pattern can be found in the value of strSearchString. If it can, that information will be returned as a collection named colMatches. If it can’t, well, then colMatches will end up a collection consisting of 0 items.
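If you’re curious about what actually ends up in that collection, this sketch calls Execute on one line that fits the pattern and one that doesn’t, then echoes a few properties of each match. The variable name objMatch is ours; Value, FirstIndex, and Length are properties of the Match objects the collection contains.

```vbscript
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Pattern = "^[1-9]...GRP"

Set colMatches = objRegEx.Execute("1XXXGRPABCEFG")
Wscript.Echo "Matches found: " & colMatches.Count        ' 1

For Each objMatch In colMatches
    ' Value holds only the text the pattern matched, not the whole line.
    Wscript.Echo "Value: " & objMatch.Value               ' 1XXXGRP
    Wscript.Echo "FirstIndex: " & objMatch.FirstIndex     ' 0 (positions are zero-based)
    Wscript.Echo "Length: " & objMatch.Length             ' 7
Next

' A line that doesn't fit the pattern yields an empty collection.
Set colMatches = objRegEx.Execute("AZZZGRPDEF")
Wscript.Echo "Matches found: " & colMatches.Count        ' 0
```

Notice that the match itself is just the seven characters 1XXXGRP. That’s why the script echoes strSearchString, the entire line, rather than the match.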

With that in mind all we have to do now is check to see if the collection colMatches has anything in it. If it does, we echo back the value of strSearchString. (Why? Because that’s the value that meets our pattern.) If it doesn’t, we simply loop around and repeat the process with the next line in the text file. All of that takes place in this block of code:

If colMatches.Count > 0 Then
    For Each strMatch in colMatches   
        Wscript.Echo strSearchString 
    Next
End If
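Because all we really need to know is whether a line fits the pattern, a slightly shorter variation is possible. The sketch below is not the script we’ve been walking through; it swaps the Execute method for the Test method, which returns True or False, so there’s no collection to count or loop through. The function name GetMatchingLines is our own invention, and wrapping the loop in a function is simply one way to package it.

```vbscript
Const ForReading = 1

' Hypothetical helper: returns the lines in the file strPath that match
' strPattern, each followed by a carriage return-linefeed.
Function GetMatchingLines(strPath, strPattern)
    Dim objRegEx, objFSO, objFile, strLine, strResult

    Set objRegEx = CreateObject("VBScript.RegExp")
    objRegEx.Pattern = strPattern

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFile = objFSO.OpenTextFile(strPath, ForReading)

    strResult = ""
    Do Until objFile.AtEndOfStream
        strLine = objFile.ReadLine
        ' Test answers a yes-or-no question: does this line fit the pattern?
        If objRegEx.Test(strLine) Then
            strResult = strResult & strLine & vbCrLf
        End If
    Loop

    objFile.Close
    GetMatchingLines = strResult
End Function
```

To use it with the file from this column you would call Wscript.Echo GetMatchingLines("C:\Scripts\Test.txt", "^[1-9]...GRP").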

When we’re all done we close the file Test.txt and then sit back and admire the results:

1XXXGRPABCEFG
2YYYGRPDEF

Beautiful.

Incidentally, the Scripting Guys couldn’t resist: just a few minutes ago we all went out to test the new heartbeat sensor. Somewhat to our surprise it seemed to work pretty well, at least until the Scripting Editor hid in the car; with her in there the sensor failed to detect a heartbeat. Does that mean that, as many of us have long suspected, the Scripting Editor truly is heartless? We can’t say that for sure; she wouldn’t let us do an autopsy. However, we do know, for sure, that she at least has something in her head: emergency room staples. We don’t know the details either, but you can find out more by reading her daily dispatch from the Microsoft Management Summit.

Searching for a String Pattern Using Windows PowerShell

Another way to solve this problem, courtesy of Microsoft’s very own June Blender:

This task is exceptionally easy to do in Windows PowerShell because the PowerShell expression parser is designed to interpret regular expressions.

Here’s the same solution in Windows PowerShell. You can enter this command at the PowerShell command line or save it as a script file (.ps1). 

get-childitem file.txt | select-string -pattern ^[1-9]...GRP | foreach {$_.line}

The first command uses the Get-ChildItem cmdlet (similar to dir or ls) to find the text file. The pipeline operator (|) sends the output to the next command.

get-childitem file.txt

The second command uses the Select-String cmdlet to search for the regular expression in the File.txt file. The Pattern parameter (-pattern) specifies the regular expression. Because both VBScript and PowerShell use standard regular expressions, the syntax of the regular expression in this PowerShell command is identical to the one that you use in the VBScript script.

select-string -pattern ^[1-9]...GRP

The third command uses the ForEach-Object cmdlet (alias = foreach) to select the Line property of each match object from the output. Without it, the command displays the file name, the line number, and the text of each matching line. With it, the command displays only the text of each matching line.

foreach {$_.line}

Both the VBScript and PowerShell solutions for this task really demonstrate the power of regular expressions. Although regular expression syntax isn’t easy, it’s so useful that it’s worth taking the time to learn it.

In Windows PowerShell, start with the About_Regular_Expression topic: get-help about_regular_expression.
