Hey, Scripting Guy! I need some help in cleaning up a text file. I have a file that contains multiple columns of data, and I need to either delete or modify lines in the file based on the value of the 13th column. Can you help me?
Hey, AM. You know, usually we start off this column with some amusing (well, OK, semi-amusing) anecdote or story. Not today, though; today nothing seems particularly amusing to the Scripting Guy who writes this column. Is that because this day is the absolute worst day of his life? Yes.
No, wait; we mean: no. The absolute worst day of his life came when he was a junior high school and his baseball team was playing Hanford High, undefeated and ranked number 1 in the state. With the score tied in the bottom of the seventh, the Scripting Guy who writes this column came to bat with one out and a runner on third. The base coach gave him the signal to take the first pitch; he took a first-pitch fastball for a strike. On the second pitch the base coach called for a suicide squeeze, in which the runner from third takes off on a mad dash for home plate and it’s up to the batter to just put his bat on the ball, some way somehow.
Unfortunately, though, Hanford High suspected that something was up. The Falcons called for a pitchout, in which the pitcher throws the ball far enough away from the plate that the batter can’t possibly hit it. The Scripting Guy who writes this column gamely flailed at the ball, missing it by a good two feet. The catcher caught the pitchout and tagged out the runner who was trying to slide in with the winning run. On the next pitch the Scripting Guy who writes this column swung and missed at a fastball, the only time he struck out that entire season. Inning over, and Hanford ended up winning in extra innings.
That was the worst day of the Scripting Guy who writes this column’s life. And nothing else comes close. Except maybe the time he played in a slo-pitch softball tournament and got a base hit to begin the day and another base hit to end the day. In between those two at-bats he hit 12 line drives, each hit harder than the other, and each hit right at someone. 12 consecutive line shots, and 12 consecutive outs. The Scripting Guy who writes this column didn’t speak to anyone for two days after that debacle.
So maybe today isn’t such a bad day after all. And besides, how bad a day could it be when you have the chance to write a script that can delete or modify lines in a text file based on the value of the 13th column in each line, eh?
Before we take a look at the script let’s take a look at AM’s text file:
RD|I|42|101|0047|42603|1|1|008|099|20060123|11:00|24.00000||||||||||||||| RD|I|42|101|0047|42603|1|1|008|099|20060123|12:00|9993|||||||||||||| RD|I|42|101|0047|42603|1|1|008|099|20060123|13:00|17.80000||||||||||||||| RD|I|42|101|0047|42603|1|1|008|099|20060123|14:00|-17.30000||||||||||||||| RD|I|42|101|0047|42603|1|1|008|099|20060123|15:00|-28.90000||||||||||||||| RD|I|42|101|0047|42603|1|1|008|099|20060123|16:00|-30.80000||||||||||||||| RD|I|42|101|0047|42603|1|1|008|099|20060123|17:00|-.40000|||||||||||||||
Yes, it is a very nice little text file, isn’t it? However, we’re interested in only one small portion of the text file: we’re interested in only the 13th field (column) in each line. For example, in line 1 that’s the value 24.00000; in line 2, that’s the value 9993. What we need to do is look at this value for each line in the file and take action if:
The value begins with the characters 999.
The value is greater than -.51 but less than 0.
The value is less than -.51.
How are we going to do all that? Why, by running this script, of course:
Const ForReading = 1 Const ForWriting = 2 Set objFSO = CreateObject("Scripting.FileSystemObject") Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading) Do Until objFile.AtEndOfStream strLine = objFile.ReadLine arrItems = Split(strLine, "|") dblValue = arrItems(12) dblValue = CDbl(dblValue) i = 0 If Left(dblValue, 3) = "999" Then i = 1 End If If dblValue > -.51 AND dblValue < 0 Then i = 1 End If If dblValue < -.51 Then arrItems(12) = 0 End If If i = 0 Then strLine = Join(arrItems, "|") strNewContents = strNewContents & strLine & vbCrLf End If Loop objFile.Close Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForWriting) objFile.Write strNewContents objFile.Close
As you can see, we kick things off in the same way we kick off many of our scripts: by defining a pair of constants (ForReading and ForWriting) that we’ll need in order to open our text file in the correct mode. (That is, do we want to open the text file in order to read from it or write to it?) After defining the two constants we create an instance of the Scripting.FileSystemObject, then use the following line of code to open the file C:\Scripts\Test.txt for reading:Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)
And yes, in and of itself that is pretty exciting, isn’t it? But trust us: we can do more with this text file than just open it for reading.
Like what, you ask? Well, how about setting up a Do Until loop that loops until the file’s AtEndOfStream property is True? (That is, until we’ve read every last bit and byte in the file.) Setting up a loop like this one enables us to use the ReadLine method to read in the entire contents of the file, line-by-line. In fact, this bit of code uses ReadLine to read the first line in the file and store that information in a variable named strLine:strLine = objFile.ReadLine
And that turns out to be everything that we can do with a text file, and everything we can do with this script.
Oh, wait, never mind; now that we think about it, it turns out there are a few other things we can do, such as modify the text file to meet AM’s needs. Sorry for the misunderstanding.
In order to modify the text file to meet AM’s needs the first thing we do is use the Split function to split the file into an array named arrItems; by splitting on the pipe separator (the | character) we end up with an array with items similar to these (plus a bunch of empty items at the end of the array):
The item of most interest to us is the last item in the list, item 12. (Remember, the first item in an array always has an index number of 0. That means that the 13th item in the array – the item we care about – has an index number of 12.) In order to isolate that value, and make it easy for us to work with it, our next step is to execute these two lines of code:
dblValue = arrItems(12) dblValue = CDbl(dblValue)
In the first line we’re simply assigning the value of array item 12 to a variable named dblValue. In the second line we use the CDbl function to “cast” the value of dblValue to a numeric (double) data type; that ensures that VBScript will treat this value as a number and not as a string.
Note. Is that really necessary? Well, it was for us: we got some very strange results before we explicitly cast dblValue as a numeric variable.
After setting the value of a counter variable named i to 0 we’re ready to begin applying AM’s criteria to the current line of text. For starters, AM wants to delete any lines where the 13th column in the text file starts with the characters 999. That’s what this block of code is for:
If Left(dblValue, 3) = "999" Then i = 1 End If
All we’re doing here is using the Left function to determine whether or not the first three characters in dblValue are 999. What if they are? Well, in that case we set the value of our counter variable to 1; that’s how we’ll know that this is a line that needs to be deleted from the text file.
That brings us to this chunk of code:If dblValue > -.51 AND dblValue < 0 Then i = 1 End If
In addition to deleting any lines where the 13th column begins with 999 AM also wants to delete any lines where that 13th column has a value greater than -.51 but less than 0. (For example, a value like -.40000.) That’s what we’re checking for here: if we have a value less than 0 but greater than -.51 (that is, a negative number larger than -.51) then set the value of the counter variable i to 1. Why? You got it: because this, too is a line that we need to delete from the text file.
Last, but surely not least, we encounter this test:If dblValue < -.51 Then arrItems(12) = 0 End If
As you probably noticed in our previous check we don’t delete just any line that contains a negative value in column 13; instead, we delete only those lines that have a negative value greater than -.51. So what happens if we encounter a negative value that’s less than -.51 (for example, -17.30000)? In that case, we want to set the value of the 13th column to 0; that’s something we can do by assigning 0 to array item 12:arrItems(12) = 0
And no, in this case we don’t set the value of the counter variable i to 1. Why not? Right again: because this is not a line that we want to discard. This one’s a keeper.
Our next step is to check and see if our counter variable is equal to 0:If i = 0 Then
If it is, then arrItems represents a line of text that we want to keep. With that in mind we use the Join function to magically turn our array back into a single line of text, separating the individual columns with a pipe separator:strLine = Join(arrItems, "|")
And once we’ve done that we append this line of text (and a carriage return-linefeed character) to a variable named strNewContents:strNewContents = strNewContents & strLine & vbCrLf
And then it’s back to the top of the loop, where we repeat the process with the next line in the text file.
So what happens when the file’s AtEndOfStream property is True? Well, at that point we automatically exit the loop, then use the Close method to close the file Test.txt. No sooner do we close the file then we immediately reopen it, this time for writing:Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForWriting)
From there we call the Write method to replace the existing contents of the file with the value of strNewContents; we call the Close method to close the file once again; and then we call it a day. If we were to go ahead now and echo back the contents of Test.txt we should see this:RD|I|42|101|0047|42603|1|1|008|099|20060123|11:00|24.00000||||||||||||||| RD|I|42|101|0047|42603|1|1|008|099|20060123|13:00|17.80000||||||||||||||| RD|I|42|101|0047|42603|1|1|008|099|20060123|14:00|0||||||||||||||| RD|I|42|101|0047|42603|1|1|008|099|20060123|15:00|0||||||||||||||| RD|I|42|101|0047|42603|1|1|008|099|20060123|16:00|0|||||||||||||||
As you can see, lines where the 13th column started with the characters 999 have been deleted; so have any lines where the 13th column had a value less than 0 but greater than -.51. Oh, and any lines that had a 13th column less than -.51? In those cases, we’ve changed the value of column 13 to 0:RD|I|42|101|0047|42603|1|1|008|099|20060123|14:00|0||||||||||||||| RD|I|42|101|0047|42603|1|1|008|099|20060123|15:00|0||||||||||||||| RD|I|42|101|0047|42603|1|1|008|099|20060123|16:00|0|||||||||||||||
Which is just exactly what AM was hoping we would do.
That should do it for now; we’ll be back again tomorrow with yet another Hey, Scripting Guy! Are we concerned that tomorrow will be another bad day? Nah. What we neglected to tell you is that the Scripting Guy who writes this column got another crack at Hanford High School later in the year. In that game he had two singles and a double, scoring two runs and driving in two more; in addition, he made a dramatic running catch with two outs in the bottom of the seventh that saved the game and sent Hanford High to their first loss of the season.
Not only that, but one week after his 2-for-14 softball disaster he played in another tournament, this time getting 14 hits in 19 at-bats. So, see: nothing to worry about. Although, just in case, he’ll probably go out tonight and buy himself a new baseball bat, which is what he did the other two times in his life that he’s had a bad day.