Hey, Scripting Guy! How Can I Parse a Text File and Retrieve All the Information Contained Within Square Brackets?

ScriptingGuy1

Hey, Scripting Guy! I need to parse through a text file and extract the information contained within square brackets. So far, however, I haven’t been able to figure out how to do this; that’s because the data isn’t stored on separate lines in the file. Can you help?
— EI

Hey, EI. Well, today is Friday, April 18th, the day the National Basketball Association is supposed to vote on whether the league will allow the Seattle Sonics to be moved to Oklahoma City. To be honest, the Scripting Guy who writes this column couldn’t care less; after all, he doesn’t like the NBA (he likes basketball) and he hasn’t watched a Sonics game in years, not even on TV. Nevertheless, he has been fascinated by the soap opera playing out between the city of Seattle and the Sonics owners, Oklahoma natives who promised to do everything possible to keep the team in the state of Washington and then – as recently released emails have shown – pretty much did everything possible not to keep the team in the state of Washington. (In fact, former owner Howard Schultz – of Starbucks fame – found their conduct so egregious that he’s announced plans to sue in order to regain control of the franchise.)

Admittedly, the emails in which the Sonics owners discussed plans to move the team – even while they were publicly assuring everyone that they had no plans to move the team – are interesting. But the Scripting Guy who writes this column found the following email to be the most fascinating of all; this unedited excerpt is from an email that Sonics owner Clay Bennett sent to NBA commissioner David Stern:

You are among the very few, people notwithstanding our relatively brief actual physical time together that have significantly affected my life. I view you as a role model as an extraordinarily gifted executive, a deep and compassionate thinker, and a person with a rare and unique charisma that brings out the best in everyone you touch. You are just one of my favorite people on earth and I so cherish our relationship ….

Wait a second: was that the email Clay Bennett sent to David Stern, or was that the email Ken Myer sent to the Scripting Guys asking that his script, submitted as part of the 2008 Winter Scripting Games, be awarded full points even though the script didn’t actually do what it was supposed to do?

Of course, unlike David Stern, the Scripting Guys turn a deaf ear toward insincere flattery and fawning. Oklahoma City will likely get their basketball team, but Ken Myer did not get his Scripting Games points.

Well, actually he did. But that’s beside the point.

At any rate, we’ll keep you posted as to the Sonics’ whereabouts. In the meantime, there are far more important matters to attend to. Matters like what, you ask? Well, consider this. EI has a text file that includes lines like these:

Location1[apples][bananas]
Location2[bananas][cherries]
Location3[apples][oranges]
Location4[plums][strawberries]

EI needs to extract the text found between the square brackets; that is, the fruit names. If he can do that he can then compile a list similar to this one:

apples
bananas
bananas
cherries
apples
oranges
plums
strawberries

Unfortunately, though, he doesn’t know how to extract that information, in part because there might be more than one fruit per line in the text file.

So do the Scripting Guys have an answer for him? Well, we thought we did; without fully thinking the matter through we assumed we could use a simple regular expression to retrieve that data. That almost worked, but not quite:

[apples][bananas]
[bananas][cherries]
[apples][oranges]
[plums][strawberries]

How did we end up with that? Well, we initially tried searching for text found between a beginning bracket ([) and a closing bracket (]). And, if you look closely enough, you’ll see that’s what we got:

[apples][bananas]

We have a beginning bracket and a closing bracket, and all the text in between. Which – alas – happens to include another set of brackets.

Faced with this kind of adversity, did the Scripting Guys give up? You bet we did. But a few days later we returned to the problem, and started poking around the Internet to see if anyone had solved a similar problem. Our search managed to uncover a few regular expressions that removed HTML tags from a file, but we weren’t satisfied with that approach; as far as we were concerned those regular expression patterns were about as meaningful as this:

^*@%@&@(!(U&@Y^T^&R@%@R^&(I@)(@()(@)I)@I(W!@!H

Which really isn’t all that meaningful.

In other words, a super-complicated regular expression search might have worked, but we decided against it. After all, we thought, there must be an easier way to get the information EI is after.

And, as it turns out, there is:

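Const ForReading = 1

' Read the entire text file into memory
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

strContents = objFile.ReadAll
objFile.Close

Set objRegEx = CreateObject("VBScript.RegExp")

' Find every match in the text, not just the first one
objRegEx.Global = True
' Match from an opening bracket through the last closing bracket on a line
objRegEx.Pattern = "\[.{0,}\]"

Set colMatches = objRegEx.Execute(strContents)

' Join all the matches into a single string
If colMatches.Count > 0 Then
   For Each strMatch in colMatches
       strMatches = strMatches & strMatch.Value
   Next
End If

' Turn each closing bracket into a line break, then strip the opening brackets
strMatches = Replace(strMatches, "]", vbCrlf)
strMatches = Replace(strMatches, "[", "")

Wscript.Echo strMatches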

Before we go any further we should point out that our solution is designed to work with EI’s text file, which means it won’t necessarily work with a similar file. For example, suppose EI had a file that looks like this, with additional text between the fruit names:

Location1[apples]ABC[bananas]
Location2[bananas]DEF[cherries]
Location3[apples]GHI[oranges]
Location4[plums]JKL[strawberries]

Our solution won’t work with this particular file, although we could easily modify it to do so. But that’s another column for another day.
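For the curious, here is one possible way to handle that second file. Treat it as a sketch rather than the modification the column has in mind: the pattern \[([^\]]*)\] and the use of the SubMatches collection are assumptions on our part, and the sample text is simply copied from the listing above. The idea is to tell the regular expression that the text inside the brackets may contain anything except a closing bracket, so each match stops at the first ] it meets.

```vbscript
' Sketch only: this pattern is an assumption, not part of the original script.
' Sample text copied from the second version of the file.
strContents = "Location1[apples]ABC[bananas]" & vbCrLf & _
              "Location2[bananas]DEF[cherries]"

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True

' [^\]]* matches any run of characters that are not a closing bracket,
' so each match stops at the first ] instead of the last one on the line.
' The parentheses capture just the text between the brackets.
objRegEx.Pattern = "\[([^\]]*)\]"

Set colMatches = objRegEx.Execute(strContents)

For Each objMatch In colMatches
    ' SubMatches(0) holds the captured text, without the brackets
    Wscript.Echo objMatch.SubMatches(0)
Next
```

Because each fruit comes back as its own match, this variation would not need the two Replace calls; the trade-off is a pattern that is a little harder to read.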

As for today’s column, we start things out by defining a constant named ForReading and setting the value to 1; that’s the constant that tells the script which file mode (for reading) we want to use when opening the text file. And it’s not long before we put that constant to use; as soon as the constant is defined we create an instance of the Scripting.FileSystemObject object, then use this line of code to open the file C:\Scripts\Test.txt, for reading:

Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

Once the file is open we use the ReadAll method to read in the entire contents of the file, storing that information in a variable named strContents. At that point we no longer need the file, so we call the Close method and dismiss Test.txt.
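If you want the file-reading step to be a bit more forgiving, something along these lines might help. The FileExists check and the AtEndOfStream check are our additions, not part of the original script; they are there because OpenTextFile fails if the file is missing and ReadAll raises an error if the file is empty. The variable strPath is just a name we made up for the example.

```vbscript
Const ForReading = 1

' Same path the column uses; change it to match your own file
strPath = "C:\Scripts\Test.txt"

Set objFSO = CreateObject("Scripting.FileSystemObject")

' Added guard: stop early if the file is not there
If Not objFSO.FileExists(strPath) Then
    Wscript.Echo "File not found: " & strPath
    Wscript.Quit
End If

Set objFile = objFSO.OpenTextFile(strPath, ForReading)

' Added guard: ReadAll raises an error on an empty file,
' so check AtEndOfStream before calling it
If objFile.AtEndOfStream Then
    strContents = ""
Else
    strContents = objFile.ReadAll
End If

objFile.Close

Wscript.Echo "Characters read: " & Len(strContents)
```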

Our next step is to create an instance of the VBScript.RegExp object, the object that enables us to use regular expressions in a VBScript script. (And yes, we are going to use a regular expression here. But it’s a reasonably simple regular expression.) After we create the RegExp object we set the Global property to True, something that ensures that the script will find each instance of the target text. That brings us to this line of code:

objRegEx.Pattern = "\[.{0,}\]"

As you probably know by now, the Pattern property is where we define our target text. So what exactly is our target text? Well, for starters, we’re looking for a beginning bracket; that’s what the \[ is for. (We need to “escape” the [ character by prefacing it with a \; that’s because the [ is a reserved character in a regular expression.) We don’t really care too much what comes after the beginning bracket; thus we add the construction .{0,} which matches zero or more of any character other than the newline character. (That construction is greedy: it grabs as many characters as it can, which is why each match runs all the way to the last closing bracket on a line.) That gives us a beginning bracket, a series of characters, and – last but not least – the ending bracket, which must also be escaped: \].
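To get a feel for what this pattern does and does not accept, you could try it against a few strings using the RegExp object's Test method. The sample strings below are made up purely for illustration; only the pattern comes from the script.

```vbscript
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Pattern = "\[.{0,}\]"

' Made-up sample strings:
'   "[apples]"  has an opening and a closing bracket
'   "apples"    has no brackets at all
'   "[]"        has nothing between the brackets ({0,} allows zero characters)
'   "[apples"   has no closing bracket
arrSamples = Array("[apples]", "apples", "[]", "[apples")

For Each strSample In arrSamples
    If objRegEx.Test(strSample) Then
        Wscript.Echo strSample & " matches"
    Else
        Wscript.Echo strSample & " does not match"
    End If
Next
```

The first and third strings should match; the second and fourth should not.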

After we define the Pattern we call the Execute method to carry out our regular expression search:

Set colMatches = objRegEx.Execute(strContents)

When we call the Execute method, any instances of the target text that are discovered will be stored in a collection we named colMatches. If we were to echo back the value of each match in colMatches right now we’d get this:

[apples][bananas]
[bananas][cherries]
[apples][oranges]
[plums][strawberries]

As we already determined, that’s not what we want to get back.
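If you would like to see those intermediate matches for yourself, a short loop like the following should do it. This is a sketch that uses a two-line sample string in place of Test.txt; the variable name objMatch is ours.

```vbscript
' Sample text standing in for the contents of Test.txt
strContents = "Location1[apples][bananas]" & vbCrLf & _
              "Location2[bananas][cherries]"

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.Pattern = "\[.{0,}\]"

Set colMatches = objRegEx.Execute(strContents)

' One match per line of text, because the pattern cannot cross a newline
Wscript.Echo "Matches found: " & colMatches.Count

' colMatches is a collection, so echo each Match object's Value in turn
For Each objMatch In colMatches
    Wscript.Echo objMatch.Value
Next
```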

Therefore, assuming that we even have some matches (that is, assuming the value of the collection’s Count property is greater than 0) we execute the following For Each block:

For Each strMatch in colMatches   
    strMatches = strMatches & strMatch.Value 
Next

All we’re doing here is looping through the collection of matches, and appending each match to a variable named strMatches. When we’ve finished looping through the collection strMatches will look like this:

[apples][bananas][bananas][cherries][apples][oranges][plums][strawberries]

Does that do us any good? You bet it does. In our very next line of code, we use VBScript’s Replace function to replace any instance of the closing bracket with a carriage return-linefeed (vbCrLf):

strMatches = Replace(strMatches, "]", vbCrlf)

As soon as we do that strMatches will equal this:

[apples
[bananas
[bananas
[cherries
[apples
[oranges
[plums
[strawberries

And you’re right: that is very close to the list we were hoping to retrieve, isn’t it? In fact, all we have to do now is remove all instances of the [ character and we’re home free:

strMatches = Replace(strMatches, "[", "")

So now what does strMatches equal? Let’s echo back the value and see for ourselves:

apples
bananas
bananas
cherries
apples
oranges
plums
strawberries

Tah-dah! At this point EI can do whatever he wants with this list of fruits.
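For instance, if EI wanted to work with each fruit individually, one option (our suggestion, not something the original script does) would be to split the finished list into an array. The sketch below starts from a shortened copy of the joined string, applies the same two Replace calls, and then uses VBScript's Split function; arrFruits and strFruit are names we made up.

```vbscript
' Starting point: a shortened copy of the string the For Each loop builds
strMatches = "[apples][bananas][bananas][cherries]"

' The same two Replace calls the script uses
strMatches = Replace(strMatches, "]", vbCrLf)
strMatches = Replace(strMatches, "[", "")

' Split the list on line breaks so each fruit can be handled on its own
arrFruits = Split(strMatches, vbCrLf)

For Each strFruit In arrFruits
    ' The list ends with a line break, so Split leaves one empty item; skip it
    If strFruit <> "" Then
        Wscript.Echo "Fruit: " & strFruit
    End If
Next
```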

That’s all the time we have for today. Which is just as well; we’re too nervous about the upcoming NBA vote to do much work anyway. Just for the heck of it, we decided to check out the Oklahoma City home page, in order to see what might be in store for our Sonics. Here are a few of the articles featured on the home page:

City to collect ammunition, computers and tires at State Fairgrounds

What if my sewer backs up?

Be prepared this tornado season

Storm debris pickup complete

Report potholes by phone

“This City is Going on a Diet”

By comparison, here are a few of the articles we found on the home page for the city of Seattle:

City encourages everyone to go out for a drive: “After all, there’s no traffic anywhere”

Police force considers disbanding; no crimes reported since 1997

City Council to award all King County residents a $5,000 bonus

NBA teams begging for chance to move to Seattle

Seattle to host tickertape parade for the Scripting Guys

Sorry we can’t provide a link to those articles; we seem to have misplaced the URL for the city of Seattle, and for all these articles. But that’s OK; you can just take our word for it.
