Reading and Searching Text from a Group of Files with Regex

ScriptingGuy1

Summary: The Scripting Wife dives into the mysteries of using regular expressions to read and search a group of files in PowerShell.

Microsoft Scripting Guy, Ed Wilson, is here. I had a great trip to Seattle, Washington to teach a Windows PowerShell class to a group of Microsoft network engineers. They had some particular requirements for scripting that I had not previously thought of doing. I wrote some prototype scripts for them to accomplish their specific needs, and in a series of Hey, Scripting Guy! Blog posts that I plan to write in the future, I will go over some of the scripts I wrote for them.

In addition, my good friend, Clint Huffman, was in the classroom next to mine teaching about performance analysis. He is pretty famous as the author of his Performance Analysis of Logs (PAL) tool. Many people in his class were diehard Hey, Scripting Guy! fans, so it was a great week all around. The bad thing is that the Scripting Wife was unable to make the trip; therefore, her studies of Windows PowerShell were sort of derailed a bit while I was gone.

I am back in Charlotte, North Carolina now, having flown all night on the “red eye,” and my stomach is a bit topsy-turvy due to excessive turbulence over the Midwest and really bad tea over the entire flight. One thing that was nice about the Microsoft building where I was teaching is that there was a good selection of tea. In particular, one green tea quickly became my favorite after the obligatory first day of exploration and sampling of the wares. It was not loose leaf, but for the bagged stuff it was not bad, and it was way better than the stuff the airline was attempting to pawn off on unsuspecting passengers.

Anyway, the trip was too much fun to waste time whining about lackluster tea. At home, there is a pile of “snail-mail spam” to sort through, so much that it is flowing onto the kitchen floor. There is no place to sit right now in the kitchen, unless one wishes to begin shredding junk mail. I decide to forgo the shredding. I grab my laptop and a snack from the pantry, and head to the front porch. I am “knee deep” in email with my head slouched over my laptop when the Scripting Wife appears standing at my side.

“Oh good, you are back,” she said warmly.

“Yes, I am. Having flown all night, I am a bit tired,” I said.

“Yeah, I imagine your arms would get tired…all that flapping,” she smiled.

“So, I am trying to catch up on a few crucial emails, and then I intend to grab a bit of sleep,” I began.

“Well that is perfect, because what I need will not take you more than a few minutes to accomplish,” she said slyly.

“Just a second. You are planning to accompany me to Tech Ed 2011 this year. Correct?” I asked.

“Yes.”

“And you are going to help me people the booth,” I said.

“You are going to do what to the booth?”

“People it. You know, add staff. Have people stand around, talk to people, hand out Windows PowerShell stickers,” I said.

“Oh, I see,” she said. “You are a strange one, ’enry ’iggins,” she added with a passable Eliza Doolittle voice.

“That is pretty good. Was Henry Higgins the hero?” I asked.

“Not even close,” she said. “It is not that you are really anything like him, it is just that you have a tendency to wax on a bit too long.”

“Then perhaps you should tell me to wax off once in a while,” I said with an obscure Karate Kid reference.

“Because it is clear that you are tired, I will not harass you about not being a comedian, even though it is painfully obvious that you are not. Instead I will simply ask you one question; you can refuse to answer if you wish, but keep in mind that anything you do say can be used against you when it comes to dinner tonight,” she said half smiling. “I need to know how to search all of my friends’ contact cards that I have in the myfriends directory. I want to get my friends’ names and phone numbers from all of my contact cards in the directory,” she said.

Note that the Scripting Wife has been playing around with these files for nearly a week. On Tuesday, last week, she described the text files she uses to track her friends’ contact information as she learned how to search the files. On Wednesday, she learned how to use a regular expression to search one file and to retrieve a phone number from that file. By Thursday, she was searching a group of files for phone numbers. Now she is back again with another request to parse these contact files.

“Well, my dear Scripting Wife…the easiest way to get a listing of all your friends and their phone numbers is to use the Get-ChildItem cmdlet to get a listing of all of your files, and then pipe the files to the ForEach-Object cmdlet. Inside your ForEach-Object cmdlet, you can use the Get-Content cmdlet like you did the other day to read each file, and pipe the content of that file to the Select-Object cmdlet so you can get the first and fourth lines of the file. What do you think? You did all of this in various steps last week.”

The Scripting Wife decided to give it a shot. She used several command aliases to simplify her typing at the Windows PowerShell console. For example, she used GCI for Get-ChildItem, and % for the ForEach-Object cmdlet. You will also notice that Windows PowerShell is not case sensitive: her command mixes uppercase, lowercase, and mixed-case names. It does not matter. Following is the command she typed.

GCI c:\myfriends -recurse | % {get-content $_.fullname | Select-Object -Index 0,3 }

If she had not typed any aliases and had used all of the parameter names, the command would look like the following one-line command (which wraps to the next line after the pipe character).

Get-ChildItem -Path C:\MyFriends -Recurse | ForEach-Object { Get-Content -Path $_.FullName |
    Select-Object -Index 0,3 }

The Scripting Wife’s command and the associated output are shown in the following image.

Image of command output
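To try the same idea without touching real contact cards, here is a minimal sketch that builds two throwaway files in a temporary folder and runs the per-file pipeline against them. The folder name, file names, and card contents are made up for illustration; the only detail taken from the discussion above is that the name is on the first line and the phone number is on the fourth. The -File switch is an addition that assumes Windows PowerShell 3.0 or later; it keeps Get-ChildItem from handing subfolders to Get-Content.

```powershell
# Hypothetical sample data: name on line 1, phone number on line 4.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) 'MyFriendsDemo'
if (Test-Path $folder) { Remove-Item -Path $folder -Recurse -Force }
New-Item -Path $folder -ItemType Directory | Out-Null
Set-Content -Path (Join-Path $folder 'friend1.txt') -Value 'Ann Example', '1 Main St', 'Charlotte, NC', '555-0100'
Set-Content -Path (Join-Path $folder 'friend2.txt') -Value 'Bob Sample', '2 Oak Ave', 'Seattle, WA', '555-0101'

# Read each file separately so that -Index 0,3 applies to one file at a time.
$result = Get-ChildItem -Path $folder -File -Recurse |
    ForEach-Object { Get-Content -Path $_.FullName | Select-Object -Index 0,3 }

# Two lines (name and phone number) come back for each card.
$result
```

Because Get-Content runs once per file inside the script block, index 0 and index 3 always mean "first line of this card" and "fourth line of this card."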

“You are slipping,” the Scripting Wife said.

“How so?”

“You have me do all that convoluted Script Monkey stuff, but I don’t need to do all that. Because, I have been reading, and I saw that the Get-Content cmdlet will accept an array of paths. That means it is smart enough to read several files at once,” she said triumphantly.

“Well, you are right, but you are also wrong.”

“I hate it when you do that geek thing.”

“What geek thing?”

“You never give a straight answer. Everything is always ‘it depends,’ or ‘yes and no.’ And before you get all silly on me; no, I don’t think you are simply following the Charles Dickens literary tradition,” she said.

I was so proud of her oblique reference to A Tale of Two Cities that I could hardly contain myself, but I did not want to let her know that I had caught it.

“Go ahead and remove the Get-ChildItem cmdlet from your previous command, and just use the Get-Content cmdlet with an array of paths,” I instructed.

The Scripting Wife thought for a minute, and then she typed the following command:

get-content c:\myfriends | Select-Object -Index 0,3

When she ran the command, an error appeared. The command and its associated error are shown in the following image.

Image of command output

“Why did it do that?” she asked.

“It is trying to read the content of the directory itself. It needs to be able to find the actual files. You can use a wildcard character if you wish.”

This time, it did not take the Scripting Wife very long at all to modify the command. The revised command is shown here.

get-content c:\myfriends\* | Select-Object -Index 0,3

The output associated with the previous command is shown here.

Image of command output

“That is better. But I am missing most of my friends’ information,” she complained. “What is causing it?”

“What is happening is that you are only getting the information from one record. You are not getting anything from the other records,” I said.

“OK. I think I know what will fix that,” she announced. “I need to walk through the items, so I will use the ForEach-Object cmdlet.”

After making several attempts, the Scripting Wife had a command that she liked. The command is shown here.

get-content c:\myfriends\* | % { $_ | Select-Object -Index 0,3 }

The command and its associated output are shown in the following image.

Image of command output

At first, she was triumphant, and she jumped up and down making several different whooping types of sounds. Then despair slowly settled upon her countenance as she realized that no filtering was taking place.

“What happened?” she asked.

“Well, when the Get-Content cmdlet reads all the text in all the files in your paths, what it returns is one big array of text. When you send it to your ForEach-Object cmdlet and then to the Select-Object cmdlet, your index trick no longer works like it did because you are no longer dealing with text from individual files. Each item that reaches the ForEach-Object cmdlet is a single line of text, so index 0 always matches and nothing gets filtered out. The command that I showed you when you first came out here works. Use that for now because I need to get some rest. See you later alligator,” I said as I got up and headed toward the door.

“After a while crocodile,” she replied.
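The difference between the wildcard attempts is easy to see by counting lines. The following is a rough sketch, again using made-up files in a temporary folder (the folder name, file names, and contents are all hypothetical), that shows what each of the two wildcard commands actually receives.

```powershell
# Hypothetical sample data: two contact cards of four lines each.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) 'MyFriendsWildcardDemo'
if (Test-Path $folder) { Remove-Item -Path $folder -Recurse -Force }
New-Item -Path $folder -ItemType Directory | Out-Null
Set-Content -Path (Join-Path $folder 'friend1.txt') -Value 'Ann Example', '1 Main St', 'Charlotte, NC', '555-0100'
Set-Content -Path (Join-Path $folder 'friend2.txt') -Value 'Bob Sample', '2 Oak Ave', 'Seattle, WA', '555-0101'

# With a wildcard, Get-Content reads every file but emits one combined run of lines.
$allLines = Get-Content -Path (Join-Path $folder '*')
$allLines.Count      # 8: four lines from each of the two files

# -Index 0,3 picks the 1st and 4th lines of that combined run, so only one card shows up.
$picked = $allLines | Select-Object -Index 0,3
$picked.Count        # 2

# Inside ForEach-Object, $_ is a single line, so index 0 always matches and nothing is filtered.
$unfiltered = $allLines | ForEach-Object { $_ | Select-Object -Index 0,3 }
$unfiltered.Count    # 8
```

In both cases the file boundaries are gone by the time Select-Object sees the text, which is why reading one file at a time with Get-ChildItem and ForEach-Object is the version that works.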

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy
