PowerShell Examples: Counting words in a text file

This post is part of a series that shows example PowerShell code for those learning the language.

This time we’re using PowerShell to count lines, count words, find the longest word and find the most frequently used words in a text file. To make it interesting, we’re using a plain text version of “Alice in Wonderland” downloaded from the Project Gutenberg site.

This example explores string manipulation and the use of hash tables. It also shows the use of Write-Progress.


    #
    # Counting words in a text file
    # Uses the text from Alice in Wonderland
    # from https://www.gutenberg.org/ebooks/11.txt.utf-8
    #

    Clear-Host
    $FileName = ".Alice.TXT"
    Write-Host "Reading file $FileName..."
    $File = Get-Content $FileName
    $TotalLines = $File.Count
    Write-Host "$TotalLines lines read from the file."

    $SearchWord = "WONDERLAND"
    $Found = 0
    $WordCount = 0
    $Longest = ""
    $Dictionary = @{}
    $LineCount = 0

    # Characters that separate words. Passing a char array makes Split()
    # break on each character in both Windows PowerShell and PowerShell 7
    # (PowerShell 7 treats a single string argument as one whole separator).
    $Separators = ' .,:;?!/()[]{}-`"'.ToCharArray()

    $File | foreach {
        $Line = $_
        $LineCount++
        Write-Progress -Activity "Processing words..." -PercentComplete ($LineCount*100/$TotalLines)
        $Line.Split($Separators) | foreach {
            $Word = $_.ToUpper()
            # Only count tokens that start with a letter (this also skips empty tokens)
            If ($Word[0] -ge 'A' -and $Word[0] -le "Z") {
                $WordCount++
                If ($Word.Contains($SearchWord)) { $Found++ }
                If ($Word.Length -gt $Longest.Length) { $Longest = $Word }
                If ($Dictionary.ContainsKey($Word)) {
                    $Dictionary.$Word++
                } else {
                    $Dictionary.Add($Word, 1)
                }
            }
        }
    }

    Write-Progress -Activity "Processing words..." -Completed
    $DictWords = $Dictionary.Count
    Write-Host "There were $WordCount total words in the text"
    Write-Host "There were $DictWords distinct words in the text"
    Write-Host "The word $SearchWord was found $Found times."
    Write-Host "The longest word was $Longest"
    Write-Host
    Write-Host "Most used words with more than 4 letters:"

    $Dictionary.GetEnumerator() | ? { $_.Name.Length -gt 4 } | Sort Value -Descending | Select -First 20
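If you want to try the split-and-count idea without downloading the book, here is a minimal sketch of the same technique on a couple of made-up lines. The sample text, the shorter separator list and the variable names `$Lines`, `$Separators` and `$Counts` are all assumptions for illustration and are not part of the script above. The separators are passed as a char array because PowerShell 7 treats a single string argument to `Split()` as one whole separator, while Windows PowerShell splits on each character.

```powershell
# Hypothetical sample text, standing in for lines read from a file
$Lines = @(
    "The cat saw the dog.",
    "The dog (a big one) ran!"
)

# Split on individual characters by passing a char array
$Separators = ' .,!()'.ToCharArray()

# Hash table mapping each word to the number of times it was seen
$Counts = @{}
foreach ($Line in $Lines) {
    foreach ($Token in $Line.Split($Separators)) {
        # Adjacent separators produce empty tokens; skip them
        if ($Token.Length -eq 0) { continue }
        $Word = $Token.ToUpper()
        if ($Counts.ContainsKey($Word)) {
            $Counts[$Word]++
        } else {
            $Counts[$Word] = 1
        }
    }
}

# Show the two most frequent words
$Counts.GetEnumerator() | Sort-Object Value -Descending | Select-Object -First 2
```

With this sample, THE is counted 3 times and DOG twice, and the hash table ends up with 8 distinct words.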


In case you were wondering what the output would look like, here it is:


    Reading file .Alice.TXT...
    3339 lines read from the file.
    There were 25599 total words in the text
    There were 2616 distinct words in the text
    The word WONDERLAND was found 3 times.
    The longest word was DISAPPOINTMENT

    Most used words with more than 4 letters:

    Name                 Value
    ----                 -----
    ALICE                385
    LITTLE               128
    ABOUT                94
    AGAIN                83
    HERSELF              83
    WOULD                78
    COULD                77
    THOUGHT              74
    THERE                71
    QUEEN                68
    BEGAN                58
    TURTLE               57
    QUITE                55
    HATTER               55
    DON'T                55
    GRYPHON              55
    THINK                53
    THEIR                51
    FIRST                50
    THING                49
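As an aside, a frequency table like this one can also be built with the `Group-Object` cmdlet instead of a hand-maintained hash table. This is an alternative sketch, not what the script above does. The `$Words` list is made-up sample data, and the resulting objects expose `Name` and `Count` properties rather than `Name` and `Value`.

```powershell
# Hypothetical sample words; in the script above these would come from the file
$Words = 'ALICE', 'QUEEN', 'ALICE', 'HATTER', 'THE', 'ALICE', 'QUEEN', 'THE', 'THE', 'THE'

$Top = $Words |
    Where-Object { $_.Length -gt 4 } |      # keep words with more than 4 letters
    Group-Object |                          # one group per distinct word
    Sort-Object Count -Descending |         # most frequent first
    Select-Object -First 2 -Property Name, Count

$Top
```

Here THE is filtered out by the length test, so the top two entries are ALICE (3) and QUEEN (2). The hash table version gives you more control inside the loop, for example tracking the longest word in the same pass, which is why it suits the full script.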