Weekend Scripter: Improve Performance When Combining PowerShell Arrays

Doctor Scripto

Summary: Microsoft premier field engineer, Chris Wu, talks about combining Windows PowerShell arrays.

Microsoft Scripting Guy, Ed Wilson, is here. Chris Wu, a Microsoft PFE, is back to share his knowledge. See previous Hey, Scripting Guy! Blog guest posts from Chris.

Here is contact information for Chris:

Twitter: https://twitter.com/chwu_ms
Facebook: https://www.facebook.com/mschwu
LinkedIn: http://ca.linkedin.com/in/mschwu

Take it away, Chris…

While teaching a Windows PowerShell workshop, I was asked how to combine two arrays of different objects (which share one key property) into a single array whose objects carry properties from both sources. One real-world scenario is merging information retrieved from Active Directory (a list of Active Directory users and their properties) with information from Exchange Server (mailboxes).

The student's current approach uses two nested loops, which performs poorly when the source arrays become very large (the following code snippet uses dummy data for demonstration purposes).

$ADList = @"
chwu,chwu@microsoft.com,Chris Wu
tst1,,Test User1
tst2,tst2@contoso.com,Test User2
"@ | ConvertFrom-Csv -Header Name,mail,CN

$EXList = @"
chwu,ex1.contoso.com
tst2,ex2.contoso.com
"@ | ConvertFrom-Csv -Header Name,MailServer

$Result = @()
foreach($ad in $ADList) {
  $Match = $false
  # Scan the whole Exchange list for every Active Directory user.
  foreach($ex in $EXList) {
    if ($ad.Name -eq $ex.Name) {
      $Result += New-Object PSObject -Property @{Name=$ad.Name; mail=$ad.mail; CN=$ad.CN; MailServer=$ex.MailServer}
      $Match = $true
    }
  }
  # Users without a mailbox are still included, just without MailServer.
  if(-not $Match) {
    $Result += New-Object PSObject -Property @{Name=$ad.Name; mail=$ad.mail; CN=$ad.CN}
  }
}


In this post, I will explore several options to improve the performance and readability of this code snippet.

The first thing we can do is remove the $Result array, which has two drawbacks:

  • Every += operation creates a new array in memory and copies all existing elements into it, so the cost of each append grows with the number of results already collected.
  • It defeats the streaming benefit of the Windows PowerShell pipeline because all objects are returned together at the end of processing. A best practice in Windows PowerShell is to emit each object as soon as it is ready.
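As a minimal sketch of these two points, the original nested loop can emit each object as soon as it is built, and the caller can capture the output of the whole loop in one assignment. The data is the same dummy data as above; capturing the foreach output into $Result is only an illustration of the idea, and the inner loop is deliberately left as it was.

```powershell
# Same dummy data as in the original snippet.
$ADList = @"
chwu,chwu@microsoft.com,Chris Wu
tst1,,Test User1
tst2,tst2@contoso.com,Test User2
"@ | ConvertFrom-Csv -Header Name,mail,CN

$EXList = @"
chwu,ex1.contoso.com
tst2,ex2.contoso.com
"@ | ConvertFrom-Csv -Header Name,MailServer

# The output of the foreach statement is collected once;
# no array is re-created and copied for each new item.
$Result = foreach ($ad in $ADList) {
  $Match = $false
  foreach ($ex in $EXList) {
    if ($ad.Name -eq $ex.Name) {
      # Emit the merged object immediately instead of appending it to an array.
      New-Object PSObject -Property @{Name=$ad.Name; mail=$ad.mail; CN=$ad.CN; MailServer=$ex.MailServer}
      $Match = $true
    }
  }
  if (-not $Match) {
    New-Object PSObject -Property @{Name=$ad.Name; mail=$ad.mail; CN=$ad.CN}
  }
}
```

If the loop body is placed in a function or script without the assignment, each object flows down the pipeline as soon as it is created.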

Another performance issue stems from the inner loop that searches the second array for a matching record: every record in the first array is compared against every record in the second, so the number of comparisons is the product of the two array sizes. We can use a hash table for faster lookup. The Group-Object cmdlet offers a convenient AsHashTable parameter that can be used here.


Be aware that the value of each hash table entry is an array of matching records. If we group records by a property with unique values (such as SamAccountName), each of those arrays contains exactly one element.
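To see this concretely, here is a small sketch that groups the dummy Exchange data and inspects one entry. The -AsString switch is my addition and is not used in the snippets in this post; it asks Group-Object to use plain strings as keys, which keeps lookups such as $h['chwu'] predictable.

```powershell
$EXList = @"
chwu,ex1.contoso.com
tst2,ex2.contoso.com
"@ | ConvertFrom-Csv -Header Name,MailServer

# Build a hash table keyed by Name; -AsString forces plain string keys.
$h = $EXList | Group-Object -Property Name -AsHashTable -AsString

# Each value is a collection of matching records, even for a single match.
$h['chwu'].Count           # number of records grouped under this key
$h['chwu'][0].MailServer   # MailServer of the first (here, only) record
$h.ContainsKey('tst1')     # users without a mailbox have no entry
```

This is why the merged object has to index into the entry with [0] before reading MailServer.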

An enhanced version of the code snippet looks like this (the code that constructs the source arrays is omitted):

# Index the Exchange records by Name for fast lookup.
$h = $EXList | Group-Object -Property Name -AsHashTable

$ADList | ForEach-Object {
  if($h[$_.Name]) {
    # Each entry is an array of matches; take the first (and only) one.
    New-Object PSObject -Property @{Name=$_.Name; mail=$_.mail; CN=$_.CN; MailServer=$h[$_.Name][0].MailServer}
  } else {
    New-Object PSObject -Property @{Name=$_.Name; mail=$_.mail; CN=$_.CN}
  }
}

The last (but not least) idea is to sort both arrays by the key property (user name) beforehand, so that we can pair records in a single pass. Note that in this particular example, the users found in Active Directory are a superset of the users in Exchange, so we need a pointer variable ($p) that tracks the current position in the Exchange list and advances only when a match is found.

$ADList = @"
chwu,chwu@microsoft.com,Chris Wu
tst1,,Test User1
tst2,tst2@contoso.com,Test User2
"@ | ConvertFrom-Csv -Header Name,mail,CN | Sort-Object -Property Name

$EXList = @"
chwu,ex1.contoso.com
tst2,ex2.contoso.com
"@ | ConvertFrom-Csv -Header Name,MailServer | Sort-Object -Property Name

# $p points at the next unmatched record in the sorted Exchange list.
$ADList | ForEach-Object -Begin {$p = 0} -Process {
  if($_.Name -eq $EXList[$p].Name) {
    New-Object PSObject -Property @{Name=$_.Name; mail=$_.mail; CN=$_.CN; MailServer=$EXList[$p].MailServer}
    # Advance only after a match, so unmatched AD users do not skip a mailbox.
    $p++
  } else {
    New-Object PSObject -Property @{Name=$_.Name; mail=$_.mail; CN=$_.CN}
  }
}

The last two snippets perform much better than the original one, but which one is faster remains a question (I haven’t tested them against large arrays just yet). I would like to hear about your results, and I welcome your thoughts and ideas.
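For anyone who wants to try, a rough timing harness might look like the following. It is only a sketch: the size $N, the generated user names, and the $UseHashTable and $UseSortedMerge script blocks are made up for illustration, -AsString and the bounds check on $p are precautions I added, and no particular result is implied.

```powershell
# Hypothetical test size; increase it to see how each approach scales.
$N = 2000

# Zero-padded names keep the string sort order unambiguous.
$ADList = 1..$N | ForEach-Object {
  New-Object PSObject -Property @{Name=('user{0:d6}' -f $_); mail=''; CN="Test User$_"}
}
# Only every second user gets a mailbox, so AD stays a superset of Exchange.
$EXList = 1..$N | Where-Object { $_ % 2 -eq 0 } | ForEach-Object {
  New-Object PSObject -Property @{Name=('user{0:d6}' -f $_); MailServer='ex1.contoso.com'}
}

# Approach 1: hash table lookup (building the table is part of the measured cost).
$UseHashTable = {
  $h = $EXList | Group-Object -Property Name -AsHashTable -AsString
  $ADList | ForEach-Object {
    if ($h[$_.Name]) {
      New-Object PSObject -Property @{Name=$_.Name; mail=$_.mail; CN=$_.CN; MailServer=$h[$_.Name][0].MailServer}
    } else {
      New-Object PSObject -Property @{Name=$_.Name; mail=$_.mail; CN=$_.CN}
    }
  }
}

# Approach 2: sort both lists, then pair records in one pass (sorting is measured too).
$UseSortedMerge = {
  $sortedAD = @($ADList | Sort-Object -Property Name)
  $sortedEX = @($EXList | Sort-Object -Property Name)
  $p = 0
  foreach ($ad in $sortedAD) {
    if ($p -lt $sortedEX.Count -and $ad.Name -eq $sortedEX[$p].Name) {
      New-Object PSObject -Property @{Name=$ad.Name; mail=$ad.mail; CN=$ad.CN; MailServer=$sortedEX[$p].MailServer}
      $p++
    } else {
      New-Object PSObject -Property @{Name=$ad.Name; mail=$ad.mail; CN=$ad.CN}
    }
  }
}

# Measure-Command discards the output and returns the elapsed time as a TimeSpan.
$HashTime  = Measure-Command { & $UseHashTable }
$MergeTime = Measure-Command { & $UseSortedMerge }

'Hash table lookup: {0:n0} ms' -f $HashTime.TotalMilliseconds
'Sorted merge:      {0:n0} ms' -f $MergeTime.TotalMilliseconds
```

Timings vary with the Windows PowerShell version and the machine, so it is worth running each measurement several times before drawing conclusions.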

~Chris

Thanks, Chris, for an excellent blog post. Join me tomorrow for more cool Windows PowerShell stuff.

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy
