Weekend Scripter: Use PowerShell to Analyze Custom Objects

Doctor Scripto

Summary: Microsoft Scripting Guy, Ed Wilson, talks about using Windows PowerShell to analyze custom objects.

Microsoft Scripting Guy, Ed Wilson, is here. Last week, I talked about using a Windows PowerShell script to collect the number of words and documents for each of several years. To do this, I wrote a Windows PowerShell script that trolled a number of folders, opened the Word documents, and gathered the word count. (See Use PowerShell to Count Words and Display Progress Bar.)

The real power of the script comes not in simply emitting the objects to the Windows PowerShell console, but in collecting the objects, and then using Windows PowerShell to process them. In fact, with a script that takes a while to run, this is the only practical solution. All I need to do is add $objects = at the point in my script that creates the objects. The revised code is shown here:

Note  Remember, the only thing I did was add $objects = to my script in the section where I created the custom objects.
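As a small aside, here is a minimal sketch of what that assignment changes. It is not part of my script; the $squares and $total variables and the Number and Square properties are made up purely for illustration. Without the assignment, the objects are written to the console and are gone; with it, they are collected in a variable that I can query again and again.

```powershell
# Hypothetical example: names here are illustrative and not from the Word script.

# Without an assignment, the custom objects are simply displayed.
1..3 | ForEach-Object { [PSCustomObject]@{ Number = $_; Square = $_ * $_ } }

# With an assignment, the same pipeline output is collected in $squares.
$squares = 1..3 | ForEach-Object { [PSCustomObject]@{ Number = $_; Square = $_ * $_ } }

# The collection can now be processed without rerunning the pipeline.
$squares.Count
$total = ($squares | Measure-Object -Property Square -Sum).Sum
$total
```

The same pattern applies to the long-running script that follows: the expensive work happens once, and the results remain available for as long as the variable exists.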

$path = "E:\Data\ScriptingGuys"
$year = $NumberOfDocs = $NumberOfWords = $null
# Number of documents processed so far (drives the progress bar)
$i = 0
# Count the matching documents up front so the percentage can be calculated
$totalDocs = (Get-ChildItem $path -Filter "*doc*" -Recurse -File |
  Where-Object {$_.BaseName -match '^(HSG|WES|QHF)'}).Count
$word = New-Object -ComObject word.application
$word.visible = $false
# Each four-character folder name is a year; collect one object per year
$objects = Get-ChildItem $path -Filter "????" -Directory |
 ForEach-Object {
   $year = $_.Name
   Get-ChildItem $_.FullName -Filter "*doc*" -Recurse -File |
     Where-Object {$_.BaseName -match '^(HSG|WES|QHF)'} |
      ForEach-Object {
      $i++
      Write-Progress -Activity "Processing $($_.BaseName)" `
       -PercentComplete (($i / $totalDocs)*100) -Status "Working on $year"
      $document = $word.documents.open($_.FullName)
      $NumberOfWords += $document.words.count
      $NumberOfDocs++
      $document.close() | Out-Null
      # Release the COM reference to the document before opening the next one
      [System.Runtime.Interopservices.Marshal]::ReleaseComObject($document) |
       Out-Null
      Remove-Variable Document }
    # Emit one custom object per year, then reset the counters
    [PSCustomObject]@{
     "NumberOfDocuments" = $NumberOfDocs
     "NumberOfWords" = $NumberOfWords
     "Year" = $year}
     $NumberOfDocs = $NumberOfWords = $year = $null }
# Shut down Word and clean up the remaining COM reference
$word.quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word) | Out-Null
Remove-Variable Word
[gc]::collect()
[gc]::WaitForPendingFinalizers()

After I have run the script and created my collection of objects, I can work with the objects until I close Windows PowerShell, change the value of $objects, or remove the variable. The first thing I do is look at the variable to see what it contains. This is shown here:

PS C:\> $objects

NumberOfDocuments NumberOfWords Year
----------------- ------------- ----
                6          9083 2008
              135        281606 2009
              387        672847 2010
              379        600970 2011
              392        598339 2012
              363        502704 2013
              388        456485 2014
              180        123584 2015

The next thing I want to do is look at some stats related to the number of words created over the years:

PS C:\> $objects | measure -Property numberofwords -Sum

Count    : 8
Average  :
Sum      : 3245618
Maximum  :
Minimum  :
Property : NumberOfWords

It was over 3 million words! I want to know: What was the average number of words per year, the maximum number in one year, and the minimum number in one year? This code is shown here:

PS C:\> $objects | Measure-Object -Property numberofwords -Sum -Average -Maximum -Minimum

Count    : 8
Average  : 405702.25
Sum      : 3245618
Maximum  : 672847
Minimum  : 9083
Property : NumberOfWords

But the first year, I only wrote six articles, and so that is skewing the results. I decide that I want to eliminate the first year. I can do that like this with the Select-Object cmdlet:

PS C:\> $objects | select -Last 7

NumberOfDocuments NumberOfWords Year
----------------- ------------- ----
              135        281606 2009
              387        672847 2010
              379        600970 2011
              392        598339 2012
              363        502704 2013
              388        456485 2014
              180        123584 2015

Now that I have eliminated the first year, I add the Measure-Object cmdlet:

PS C:\> $objects | select -Last 7 | Measure-Object -Property numberofwords -Sum -Average -Maximum -Minimum

Count    : 7
Average  : 462362.142857143
Sum      : 3236535
Maximum  : 672847
Minimum  : 123584
Property : NumberOfWords

But what if I want to see the average size of each document? Well, I did not specifically collect that, did I? No problem, I have the information. All I need to do is to create a new object with the information I need. Once again, I use the Select-Object cmdlet as shown here:

PS C:\> $objects | select Year, @{L = "AverageSize"; E = {$_.NumberOfWords / $_.NumberOfDocuments}} | ft -AutoSize

Year      AverageSize
----      -----------
2008 1513.83333333333
2009 2085.97037037037
2010 1738.62273901809
2011   1585.672823219
2012         1526.375
2013 1384.85950413223
2014 1176.50773195876
2015 686.577777777778

What if I want a more in-depth look at the average size? Well, I bring the Measure-Object cmdlet back into play. This is shown here:

PS C:\> $objects | select Year, @{L = "AverageSize"; E = {$_.NumberOfWords / $_.NumberOfDocuments}} | measure -Property AverageSize -Average -Maximum -Minimum

Count    : 8
Average  : 1462.3024099762
Sum      :
Maximum  : 2085.97037037037
Minimum  : 686.577777777778
Property : AverageSize

So, creating a custom object and saving that object in a variable makes for great offline analysis. That is all for now. Join me tomorrow for more way cool Windows PowerShell stuff.

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy
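P.S. If you want to replay this analysis without Word installed, here is a minimal sketch that rebuilds the collection by hand. The values are copied from the output printed in this post; the hand-built $objects array is only a stand-in for what the script produces, and the $stats variable is an illustrative name. The sketch also assumes that excluding 2008 by value with Where-Object is an acceptable substitute for select -Last 7, which depends on the position of the objects in the collection.

```powershell
# Stand-in for the collection built by the Word script (values from this post).
$objects = @(
    [PSCustomObject]@{ NumberOfDocuments = 6;   NumberOfWords = 9083;   Year = '2008' }
    [PSCustomObject]@{ NumberOfDocuments = 135; NumberOfWords = 281606; Year = '2009' }
    [PSCustomObject]@{ NumberOfDocuments = 387; NumberOfWords = 672847; Year = '2010' }
    [PSCustomObject]@{ NumberOfDocuments = 379; NumberOfWords = 600970; Year = '2011' }
    [PSCustomObject]@{ NumberOfDocuments = 392; NumberOfWords = 598339; Year = '2012' }
    [PSCustomObject]@{ NumberOfDocuments = 363; NumberOfWords = 502704; Year = '2013' }
    [PSCustomObject]@{ NumberOfDocuments = 388; NumberOfWords = 456485; Year = '2014' }
    [PSCustomObject]@{ NumberOfDocuments = 180; NumberOfWords = 123584; Year = '2015' }
)

# Exclude the partial first year by value rather than by position.
$stats = $objects |
    Where-Object { $_.Year -ne '2008' } |
    Measure-Object -Property NumberOfWords -Sum -Average -Maximum -Minimum

$stats
```

Filtering on the Year property keeps working even if the collection is later sorted differently, whereas -Last 7 relies on 2008 being the first object.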
