Scripting Corner Volume 2 – Pipelining and the One Line command


EDIT: This post has been updated on 10/15/2007 to incorporate feedback we received since original posting.

My last post demonstrated a script that used a .NET method and some functions within the script to take care of a problem that was a little hard to solve manually. The feedback I received was generally positive, but some folks asked "Why is PowerShell useful to me?" and "Why should I learn how to script when these are all one-off situations?" To an extent I will agree with you.

If you are a smaller shop, say 1 to 2 servers, then there might not be as much value in learning to write complex PowerShell scripts. Most of what you are going to do on the server is going to be for one-off situations (create storage groups, create databases, move mailboxes, etc.). So for this installment of my scripting corner I have tried to come up with a command that meets two criteria: 1) the command can be written on "one line" within PowerShell, and 2) the command is something that I could generally see being used on more than one occasion by the same company.

The scenario that I came up with is one we see here on occasion: an end user sends an email to an unintentionally large group of users, or sends a message containing information to a distribution group or set of users that should not have received it. In both of these cases there is not a good end-user-powered way to get these messages back out of the database once they have been delivered. For Exchange 2000 and 2003 we had the ExMerge tool that could extract these for us. In Exchange 2007 we have the Export-Mailbox cmdlet.

So here is the command that you can run that will go through the organization and move all of these unexpected messages to a target mailbox.

Get-MailboxServer | foreach {Get-Mailbox -Server $_.Identity -ResultSize Unlimited | Export-Mailbox -SubjectKeywords "Bad Message" -TargetMailbox <newmailbox> -TargetFolder Deleteme -DeleteContent -Confirm:$false}

So let’s break this one-liner down into the key pieces that it uses. The biggest thing that we use here is pipelining ("|"). The pipeline allows us to take the output of one cmdlet and send it directly to another cmdlet. The cmdlet further down the pipeline needs to understand the output of the preceding cmdlet, but as long as you use a little common sense this is usually not an issue.

We are also using foreach (here an alias of the ForEach-Object cmdlet, not the foreach keyword) to set up an execution loop. When another cmdlet is pipelined into foreach, the script block executes once for each object output by the preceding cmdlet. In other words, if our Get-MailboxServer returns five objects, we will execute the foreach block five times.

Pipelining the output of a cmdlet into foreach (ForEach-Object) is one of the most common things that I end up doing when writing any script.

Within the loop itself we are doing another pipeline. In this case we are taking the output from the Get-MailboxServer cmdlet and using its Identity property ($_.X means "use the X property of the object coming down the pipe") to search for all mailboxes that are located on that server. Once we have gathered up that set of users we pipeline it into the Export-Mailbox cmdlet and use that cmdlet to do the work.
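
The same shape can be tried without an Exchange server. The sketch below is only an illustration: the server and mailbox data are made-up stand-ins (plain objects, not what Get-MailboxServer or Get-Mailbox really return), and Get-FakeMailbox is a hypothetical helper that plays the part of "Get-Mailbox -Server".

```powershell
# Made-up stand-in data; the real Exchange cmdlets return much richer objects.
$servers = @(
    [pscustomobject]@{ Identity = 'EX01' },
    [pscustomobject]@{ Identity = 'EX02' }
)
$mailboxes = @(
    [pscustomobject]@{ Alias = 'alice'; Server = 'EX01' },
    [pscustomobject]@{ Alias = 'bob';   Server = 'EX01' },
    [pscustomobject]@{ Alias = 'carol'; Server = 'EX02' }
)

# Hypothetical helper standing in for "Get-Mailbox -Server <name>".
function Get-FakeMailbox {
    param([string]$Server)
    $mailboxes | Where-Object { $_.Server -eq $Server }
}

# Outer pipeline: the script block runs once per server object.
# Inside the block, $_ is the current server, so $_.Identity is its name.
$result = $servers | ForEach-Object {
    # Inner pipeline: here $_ is the current mailbox, not the server.
    Get-FakeMailbox -Server $_.Identity |
        ForEach-Object { "$($_.Server): $($_.Alias)" }
}

$result
```

Note that $_ means something different in the outer and inner script blocks; it always refers to the object currently arriving in the nearest enclosing pipeline stage.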

The Export-Mailbox cmdlet as we have set it up will simply search all of the mailboxes we give it off the pipeline for any messages that have the specified keywords in the subject, and then move them to our target mailbox into a folder called "Deleteme".

Now the big question that some people might be asking is why I didn’t just use a much simpler PowerShell command to do this.

Get-Mailbox -ResultSize Unlimited | Export-Mailbox -SubjectKeywords "Bad Message" -TargetMailbox <newmailbox> -TargetFolder Deleteme -DeleteContent -Confirm:$false

After all, doesn’t that accomplish the same thing? The answer is yes and no. If you have a small environment with, let’s say, fewer than 5,000 users or so, then the shorter command above is perfect for you. If you have an environment that is larger than that, then you will want to use the foreach command to streamline the process.

The reason for this is that the Export-Mailbox cmdlet is a multi-threaded cmdlet, so it forces the aggregation of all of the mailbox objects before it begins operating on them. (To identify whether a cmdlet is multi-threaded, look for the -MaxThreads parameter.) Since that is a limitation of the cmdlets we are currently working with, in a large environment operating on the users on a per-server basis will get things started faster (since we won’t have to wait on AD to return all 50k objects to get started) and will be less resource intensive.
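
The difference between streaming and aggregating can be sketched with two small functions. This is not how Export-Mailbox is implemented; all three function names below are hypothetical, and the only point is to show when each function starts doing work relative to its input arriving.

```powershell
# Shared log so the order of events is visible afterwards.
$log = New-Object System.Collections.ArrayList

# Hypothetical producer: records each number just before emitting it.
function Get-Number {
    foreach ($n in 1..3) {
        [void]$log.Add("produced $n")
        $n
    }
}

# Hypothetical streaming consumer: the process block runs once per
# pipeline object, as soon as that object arrives.
function Invoke-Streaming {
    param([Parameter(ValueFromPipeline = $true)]$InputObject)
    process { [void]$log.Add("streamed $InputObject") }
}

# Hypothetical aggregating consumer: collects everything in process and
# does the real work only in end, after the producer has finished.
function Invoke-Aggregating {
    param([Parameter(ValueFromPipeline = $true)]$InputObject)
    begin   { $items = New-Object System.Collections.ArrayList }
    process { [void]$items.Add($InputObject) }
    end     { foreach ($item in $items) { [void]$log.Add("batched $item") } }
}

Get-Number | Invoke-Streaming
$streamLog = $log -join ', '

$log.Clear()
Get-Number | Invoke-Aggregating
$batchLog = $log -join ', '

$streamLog
$batchLog
```

In the first log the "produced" and "streamed" entries alternate; in the second, every "produced" entry appears before the first "batched" entry. Feeding an aggregating command smaller sets (for example one server at a time) is what lets it start work sooner.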

Hopefully I have accomplished my goal of showing everyone a little bit more about PowerShell and sharing a script that people can use more than once. Again, I would like to solicit feedback from you, the reader, on how you liked this article and whether there is anything you would like to see turned into a script. Otherwise I will just have to make something up for my next post, and who knows what I will think of.

Matthew Byrd


Comments (14)
  1. evetsleep says:

    Thanks for this, Matthew.  This is one of the reasons I really do love PowerShell.  This scenario plays out more often than you’d think, especially at large companies.  I’ve gotten into having a folder for ExMerge with a pre-configured ini file on all servers to yank stuff, so being able to do this with PoSh is great.  

    I really wish we could remove the "recall" message though from Outlook as it just doesn’t work and really confuses people.

  2. Keif Machado says:

    The ForEach-Object looping definitely makes transition and deployment processes easier.  My team and I have designed a few PowerShell one-liners to move mailboxes and set UM settings.  Below are two examples:

    Move-Mailboxes using a CSV File: (This can run in multiple Powershell sessions)

    Import-CSV Users.csv | ForEach-Object {Move-Mailbox -Identity $_.UserAlias -TargetDatabase $_.Database -BadItemLimit:10 -Confirm:$False}

    UM Enable Mailboxes with Extension Numbers:

    Import-CSV Users.csv | foreach {Enable-UMMailbox -Identity $_.UserAlias -UMMailboxPolicy 'San Leandro UM Dial Plan Default Policy' -Extensions $_.ExtNo -Pin '1234' -PinExpired $false}

    One-liners are going to save a massive amount of time for administrators, consultants, etc.

  3. KB says:

    Desperately needed (due to so few if any "complete" examples that "just work") are examples of integrating exchange + powershell with ASP.NET.  Large and small organizations both have reasons for needing admin related exchange information or methods in web pages, and months after exchange 2007’s release there is nearly no information about it vs the tons of examples in 2003 with CDOEXM, WMI, etc.

    For example, we probably aren’t going to give our first line help desk powershell access.  But we want them to be able to check the status of a user’s mailbox, find out their quota and quota status, what their settings are, folder sizes, get their activesync recovery password, etc.  All these things are in powershell but how about some examples and best practices for putting this into ASP.NET?  

    There ought to be a couple examples either here, or on Technet, and maybe in the SDK: perhaps a single user report page, an interactive table type page with a drilldown form, and a page carrying out a method/operation and returning the results, along with showing proper error handling from powershell into ASP.NET.  Best practices for running with different credentials than the web page calling user.  Best practices for manipulating the data from powershell into either raw HTML or into ASP.NET controls.

    Right now, the lack of these sort of examples makes powershell a big stumbling block in this scenario compared with a .NET based API, even though interactively, it is relatively easy to use and addictive.

  4. Well, here’s my PowerShell attempt. As Exchange 2007 is so boring (it just works), we stopped monitoring it, but we were getting worried that sometimes it had stopped working, whereas in fact the complaining users (including me) had simply received no mail.

    So I wrote this script that logs some performance counters from the SMTP send and receive connectors and the sizes, in MB, of users’ mailboxes, all redirected to a file called DailyFile.log. The file is ASCII as opposed to Unicode as I use a little utility (blat.exe) to e-mail the report to the administrators (Unicode didn’t work).

    The script is actually generated by a .CMD file and “echo…> file.ps1”, the .CMD file being scheduled to run at 6am & 6pm.

    I am sure someone could improve this, I am sure I saw a PowerShell e-mailer somewhere that talks to a SMTP port and the performance counter reading bit is a bit heavy….but it works and I know Exchange is working fine.

    $c1 = new-object System.Diagnostics.PerformanceCounter("MSExchangeTransport SmtpReceive","Messages Received Total","from POP3 forwarder")
    $c2 = new-object System.Diagnostics.PerformanceCounter("MSExchangeTransport SmtpSend","Messages Sent Total","to FastHosts")
    $c3 = new-object System.Diagnostics.PerformanceCounter("MSExchangeTransport SmtpReceive","Bytes Received Total","from POP3 forwarder")
    $c4 = new-object System.Diagnostics.PerformanceCounter("MSExchangeTransport SmtpReceive","Message Bytes Received Total","from POP3 forwarder")
    $c5 = new-object System.Diagnostics.PerformanceCounter("MSExchangeTransport SmtpSend","Message Bytes Sent Total","to FastHosts")
    $c1, $c2, $c3, $c4, $c5 | ft CounterName, RawValue -Autosize | out-file "dailyfile.log" -encoding ascii
    Get-MailboxStatistics | select-object DisplayName, ItemCount, TotalItemSize | Sort-Object -property TotalItemSize -descending | ForEach-Object -process { $_.TotalItemSize = [system.math]::round(($_.TotalItemSize)/1024.0/1024.0, 2); $_ } | ft -autosize | out-file "dailyfile.log" -encoding ascii -append
    exit

  5. Kirk Munro says:

    I’d like to formally request that people stop referring to the ForEach-Object cmdlet as a foreach loop, and that they explicitly use ForEach-Object by name and stop using the foreach alias in published examples.  There are a lot of users searching for help on foreach, and this is easily one of the most confusing cmdlets/statements in PowerShell due to the name overlap.  Blurring the line between ForEach-Object and foreach is contributing to increasing the learning curve required when working with PowerShell.  I see this in searches that lead to my blog on a regular basis.  The two articles I have written about foreach are the most popular articles by far.

    ForEach-Object is not a loop statement.  You cannot use the break statement to get out of it (it will actually break out of the loop that contains the ForEach-Object statement, or the entire script will terminate if there is no external loop statement), nor can you use the continue statement to stop processing the current iteration and continue to the next (continue will also continue the loop that contains the ForEach-Object statement, or the entire script will terminate if there is no external loop statement).  The documentation in about_foreach is wrong!  You cannot use the foreach statement inside of a pipeline (*unless it is used within a script that is a parameter of a cmdlet like Where-Object or ForEach-Object), despite what the section titled "The Foreach Statement Inside of a Command Pipeline" implies.  You can use the ForEach-Object cmdlet inside of the command pipeline, which is not the foreach statement, and while it functions in a similar manner (only in that it processes a set of objects iteratively) the two are not equivalent nor should they be treated as equivalent.

    You can read more about the differences between the ForEach-Object cmdlet and the foreach statement on my blog.  The first article, Essential PowerShell: Understanding foreach, can be found here:

    http://poshoholic.com/2007/08/21/essential-powershell-understanding-foreach/

    The follow-up article to that post, Essential PowerShell: Understanding foreach (Addendum), can be found here:

    http://poshoholic.com/2007/08/31/essential-powershell-understanding-foreach-addendum/

    If you’re working with foreach and ForEach-Object, please make sure you understand the differences between the cmdlet and the statement.

    Also, as Dmitry Sotnikov points out (http://dmitrysotnikov.wordpress.com/2007/10/02/bug-in-exchange-cmdlets/), something funny is going on here if what is written above is actually true.  I agree with Dmitry on this.  If pipelining the results of the first cmdlet into the ForEach-Object cmdlet (not the foreach loop statement!) results in the objects coming through the pipeline one at a time while not using the ForEach-Object results in the entire set being collected before anything is passed down the pipeline then something is definitely wrong here.  At least according to PowerShell documentation.  If you look at section 2.3.1 of "PowerShell in Action" by Bruce Payette, one of the creators of PowerShell (if you love PowerShell, buy this book!) in the last paragraph on page 46 he describes how pipelining cmdlets results in objects being passed from one cmdlet to another as soon as those objects are emitted from a given stage in the pipeline.  Since that is the case, you shouldn’t actually need to use the ForEach-Object cmdlet here to make the entire one-liner execute more quickly.  ForEach-Object allows you to execute multiple lines of PowerShell script in the middle of a pipeline.  It also allows you to pass the results from one stage in a pipeline to parameters of a cmdlet in another stage even if those parameters were not configured to accept input from the pipeline.  It does a whole bunch of other things as well, but my point is, according to the description of PowerShell’s streaming behaviour in pipeline execution, what this article states is actually incorrect.  That is, unless the Exchange cmdlets that are described here don’t function the way other cmdlets do in a pipeline.

    Please respond to this.  There is a lot of confusion generated by this blog entry!

    Thanks for listening!

    Kirk out.

    http://poshoholic.com

  6. Karl Prosser says:

    Yes, I would agree with Dmitry. Usually ForEach-Object ADDS overhead that can be quite significant when dealing with a large number of items; the pipeline should be FLOWING regardless.

    What it would seem like is that the Export-Mailbox cmdlet is programmed in a similar way to a PowerShell function that doesn’t have a separate process section, in that it takes in the whole pipeline before it does any action.

    Somebody could probably confirm this with Reflector if they so desired.

    -Karl

  7. Matbyrd says:

    Kirk and Dmitry,

    Thank you for your comments.  I have reviewed your posts and the available information and here is what I have found.

    You are correct that PowerShell will send information down the pipeline as soon as that information is available.  This is core to the architecture of PowerShell.  I should not have generalized the behavior of the Get-Mailbox cmdlet (and the other Exchange cmdlets) to the behavior of PowerShell.  In this instance the Get-Mailbox cmdlet (and most other Exchange cmdlets) operates by doing an LDAP query to AD and is written in such a way that it will not pass any information down the pipeline until all of the results of the query have been returned.  So this is a specific limitation of the Get-Mailbox cmdlet and is not generally how PowerShell operates.  

    In a larger environment you will start processing mailboxes sooner and with fewer resources consumed if you execute the export-mailbox command only against a sub-set of users; rather than trying to execute against the entire set of users in the organization.   Thus my “workaround” of using foreach to only operate on the users from one given server at a time.

    The behavior of the pipeline, in this case, is the same whether or not you use the foreach command.  The data is not passed from the Get-Mailbox cmdlet down the pipeline until the cmdlet has gotten all of the needed data from AD (again a limitation of the cmdlet, not of PowerShell).  I used the foreach cmdlet in my one-line script only to break my Get-Mailbox command down into manageable, less resource-intensive data sets, not to improve the speed of the command or change any behavior of PowerShell.  

    Yes, you are also correct that foreach is both an alias for the ForEach-Object cmdlet and a keyword recognized by PowerShell (like the keyword “if”).  The way to determine which one of these you are using is by its position in your statement.  You will only be using the keyword if it comes at the beginning of the statement.

    “Foreach ($value in $array) { <action> }” is going to use the keyword

    “Get-childitem | foreach { <action> }” is going to be using the alias to the foreach-object cmdlet

    Thank you for helping to clarify this for the readers of this blog by referencing people to your entry on the differences between the keyword and the cmdlet.  I will continue to strive in my documentation to call the foreach command a command and the foreach keyword a keyword when each is appropriate.

    I will shortly be making some modifications to this post that will correct the error in the second-to-last paragraph on the behavior of PowerShell and that will further call out that I am using the foreach alias and not the keyword.

    Thank you for your comments, please keep them coming.

    Special thanks to Bruce Payette and his book for helping me understand what I was missing.

    -Matthew

  8. Kirk Munro says:

    That’s fantastic Matthew, thank you so much for taking the time to clarify this!  If there are a number of cmdlets in the Exchange snapin that behave in such a manner, is that documented anywhere?  I work with customers who have very large Exchange environments on a regular basis, and as I tend to use PowerShell quite heavily these days, knowing which cmdlets behave in this fashion would go a long way to helping me use PowerShell as efficiently as possible in these environments.  It is really important that this difference in behaviour be known to PowerShell script authors.

    Further, are there plans to change these cmdlets in the future so that they return objects iteratively through the pipeline and follow the PowerShell pipelining model?

    You know, there is a really big impact to these cmdlets behaving this way.  Compare these two examples:

    foreach ($service in Get-Service) {$service}

    and

    foreach ($mailbox in Get-Mailbox -resultSize Unlimited) {$mailbox}

    In the former, the services are returned iteratively from Get-Service but they are collected and stored in a temporary variable.  This results in one collection of the objects returned by Get-Service being in memory at the time that the foreach statement starts iterating through them.

    Now look at the latter.  According to what you just posted in the comments, Get-Mailbox will gather and store the entire collection of mailboxes internally before it sends any of them back.  Then the foreach statement will use a temporary variable to store the entire collection of mailboxes.  Again.  That’s two copies, twice the memory footprint.  This could be really bad news if you don’t realize how Get-Mailbox really works under the covers.  In the current implementation, from what you have stated it seems that you should *never* use Get-Mailbox with the foreach statement if you’re collecting a large number of mailboxes (define large: more than the default size limit of 1000?) because you’ll have two copies of that in memory.  Never might seem like a strong word but I think it’s better to go that route than to have to worry about doubling the amount of storage for large collections.  This makes it all that much more important to know which Exchange cmdlets behave in this way (and it makes it all that much more important to understand the differences between the foreach statement and the ForEach-Object cmdlet :)).

    Please correct me if I am wrong anywhere in this analysis.

    Thanks again,

    Kirk out.

    http://poshoholic.com

  9. Kirk Munro says:

    One more thing on this.  I was just re-reviewing your last comments.  According to what you’ve stated in the comments, the original script you posted doesn’t even need foreach-object, correct?  Because foreach-object just adds a little processing overhead that isn’t necessary since the server parameter accepts input from the pipeline.  So I believe it could have been written like this:

    Get-MailboxServer | Get-Mailbox -ResultSize Unlimited | Export-Mailbox -SubjectKeywords "Bad Message" -TargetMailbox <newmailbox> -TargetFolder Deleteme -DeleteContent -Confirm:$false

    This would still allow you to break down the Get-Mailbox cmdlet into bite-sized chunks (per server) while avoiding the unnecessary use of ForEach-Object.  Unless there are more unexpected things going on here that require you to use ForEach-Object.

    Kirk out.

    http://poshoholic.com

  10. Evan says:

    Couple of comments here that deserve saying:

    1) The key reason for this behavior has nothing to do with PowerShell, per se. It has to do TOTALLY with the way Export-Mailbox works. Rather than operating iteratively on each pipeline entry, Export-Mailbox works just like Move-Mailbox… it collects all of the pipeline inputs (in ProcessRecord) and then waits until all inputs are collected before taking any action (in EndProcessing). This is done so that it can run multi-threaded. If it had to deal with each iteration in ProcessRecord, it would not be able to process multiple exports simultaneously. This is why it "takes a while to start", since it has to accumulate all of the entries into Export-Mailbox before it can start.

    Ok, that said, it leads into point #2:

    2) Get-Mailbox doesn’t have to collect all of the result objects before it passes any of them along on the pipeline. The comments about being limited by the way AD works *ARE* true… but the result set that is treated monolithically is the AD page size (generally 1000 objects), not the full set of objects. This means if you say "-ResultSize:Unlimited" it won’t have to collect tens of thousands of objects, potentially, before any are streamed into the pipeline. Rather, it will send the first 1000 as a block, etc.

    So, the reason why we get the ‘faster’ behavior from a ForEach(-Object) loop here is that the Export-Mailbox isn’t blocked on executing until 1000 objects are received — it only has to wait for one object; not because the Get-Mailbox is blocked until all of the (>1000) objects are returned from AD.

  11. Kirk Munro says:

    Thanks for the details Evan.  This makes more sense now.  It’s nice to know that Get-Mailbox isn’t collecting _all_ results before passing them along the pipeline.

    It’s also nice to know that Export-Mailbox and Move-Mailbox are collecting objects in ProcessRecord and then processing them in EndProcessing.  I noticed that these two cmdlets take a parameter (MaxThreads) indicating the maximum number of threads they can use.  I noticed that Restore-Mailbox and Test-SystemHealth also use the same parameter, so I suspect they function in a similar manner.

    Are these four cmdlets the only Exchange cmdlets that collect objects in ProcessRecord and process them in EndProcessing, or are there others?  I’m wondering if there is a way to identify cmdlets that behave this way through some property on them.  I can check for the MaxThreads parameter on Exchange cmdlets, but what about others?  I suspect not, but it would be nice to be able to learn about this behaviour through cmdlet properties as well as documentation.

  12. Evan says:

    Kirk – hmm.. I’m not sure of any better (i.e., foolproof) way of making this determination than inspecting for the MaxThreads parameter. Since this is fundamentally code/behavior within the cmdlet which is particular to these few Exchange cmdlets, there’s no metadata exposed to surface this, as far as I am aware.

  13. Matbyrd says:

    Hi Guys,

    Thank you for all of the great feedback.  I was unexpectedly out of the office for a few days and have just submitted the new updated version of this post.  Hopefully it will be up in the next day or two.

    I also carried on some interesting discussions with our developers on the Get-Mailbox cmdlet and the behavior that I was seeing when testing with it.

    The net result of that is that YES, the Get-Mailbox cmdlet will pass data down the pipeline as it gets a completed page back from AD (watch out, those customers that have changed the default page size).

    Also I finally got a chance to test just removing the foreach completely and going with get-mailboxserver | get-mailbox … and yes that does appear to accomplish the same thing.

    So some lessons learned on my part (thanks to the community for pointing out the errors in my observations) and some rules I am going to follow going forward:

    1) When in doubt, avoid foreach if you can.  It is a simple matter to take a look at the help on the cmdlet you are pipelining to and see if the value you are looking to send in is marked as accepting pipeline input.  

       -Server <ServerIdParameter>

           .

           .

           Accept pipeline input?       True

           Accept wildcard characters?  false

    2) When using an alias make sure to document that you are using an alias to ensure that people are clear on what is being used where.

    3) The book Windows PowerShell in Action by Bruce Payette is a wonderful read and I strongly recommend it to anyone seriously interested in learning about PowerShell.  

    I am already working on Volume 3 … it is going to be a collection of common functions I have created and placed in my profile.  So if you have any suggestions please let me know.

    -Matt

Comments are closed.