Divide Discovery and In-Place Hold searches into smaller units

This post has been in the works for longer than I would like to admit :). Since I started it, a few things have changed. For example, when I began writing in the fall, the maximum number of mailboxes in a Discovery Search was 5,000. Today it is 10,000. In time the limit will probably change again, or it may even disappear. For the moment, however, the primary focus of this post is a script that creates searches covering a specific number of primary mailboxes. (Discovery Mailboxes count against the 10,000 limit but are not listed individually among the mailboxes to be searched.)

 

Why do this? There are two reasons.

  • If a search is failing or timing out, a common troubleshooting step is to run the same search against a single mailbox. If that works, the next question is whether it works with 10 mailboxes, 100, or 500. Once you find a comfortable threshold, you might still have thousands (in some Office 365 tenants, hundreds of thousands) of mailboxes that need the same search applied. Creating all of those searches by hand is at best tedious and at worst impossible, especially since the mailbox selection dialog in the web interface shows only 500 mailboxes.
  • You need to apply the search to more than 10,000 mailboxes.

If you need to know more about the basics of Discovery Searches before reading further, you can get a great overview of how to handle Discovery Searches and In-Place Hold in parts I and II of this blog:

https://blogs.technet.com/b/exchange/archive/2012/09/26/in-place-e-discovery-and-in-place-hold-in-the-new-exchange.aspx

 

The cloud does have some differences from the on-premises world:

  • There is a maximum of 10,000 mailboxes per search
  • Only 2 searches can run simultaneously (though 32 can be queued)
  • In the on-premises world you can precisely measure the load on the server that is running the Discovery Search. In the cloud this detail is hidden. If your search is failing due to timeouts, you may have to break it into smaller pieces or run it at a time when fewer administrators of other tenants are doing the same thing. This is a little like deciding what time of day to go to the bank: if everyone else goes to the bank at the same time, the tellers (servers) will take longer to serve you. You might even give up and walk away (a timeout error).
  • The destination Discovery Mailboxes are capped at 50GB.

NOTE: Please remember that all the numbers given here are subject to change as Office 365 evolves.

 

So how do you work with a large number of users? Most administrators find the Office 365 web interface awkward for adding thousands of mailboxes to a query, because the selection dialog shows only 500 mailboxes. There are a couple of ways to deal with this:

  • Create Distribution Groups with up to 10,000 members in each. You can then base the search on the distribution group's membership instead of adding the mailboxes individually
  • Use a script like the one I provide in this blog to add the individual mailboxes to each search

 

The first approach has the advantage that, once the group is built, you need to add only one item to the Discovery Search you want to run (remember that group membership must not exceed 10,000 mailboxes). One catch is that if group membership changes after the Discovery Search has started, the change is not automatically reflected in the search results. If you want the latest membership changes accounted for, you will have to restart the search.
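If you take the distribution group route, the groups can be built from PowerShell instead of the web interface. The sketch below is one way to do it, assuming a connected Exchange Online session; the helper names `Split-IntoBatches` and `New-BatchedDistributionGroup` and the naming pattern `DiscoveryGroup1`, `DiscoveryGroup2`, ... are hypothetical, and the group-creation part has not been tested against a live tenant. It splits the mailbox list into batches of at most 10,000 and creates one group per batch.

```powershell
# Hypothetical helper: split a list into consecutive batches of at most $Size items.
function Split-IntoBatches {
    param([object[]]$Items, [int]$Size)
    $batches = @()
    for ($i = 0; $i -lt $Items.Count; $i += $Size) {
        # Last index of this batch, clipped to the end of the list.
        $last = [Math]::Min($i + $Size, $Items.Count) - 1
        # The leading comma adds the batch as ONE element instead of unrolling it.
        $batches += ,@($Items[$i..$last])
    }
    # The leading comma stops PowerShell from unrolling the outer array on return.
    return ,$batches
}

# Hypothetical: create one distribution group per batch and add the members.
# Requires a connected Exchange Online session; names are placeholders.
function New-BatchedDistributionGroup {
    param([string]$BaseName = "DiscoveryGroup", [int]$GroupSize = 10000)
    $mbxs = @(Get-Mailbox -ResultSize Unlimited -RecipientTypeDetails UserMailbox)
    $n = 0
    foreach ($batch in (Split-IntoBatches -Items $mbxs -Size $GroupSize)) {
        $n += 1
        $groupName = $BaseName + $n
        New-DistributionGroup -Name $groupName | Out-Null
        foreach ($m in $batch) {
            Add-DistributionGroupMember -Identity $groupName -Member $m.UserPrincipalName
        }
    }
}
```

Calling `New-BatchedDistributionGroup` with no arguments would aim for groups of 10,000; pass `-GroupSize 500` (for example) if you want smaller groups so that each search returns less content.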

 

Here are a couple of scenarios an administrator might encounter:

You have 33,000 users who need to be placed on In-Place Hold. In this example you would need to create 4 distribution groups (three of 10,000 members and one of 3,000 members). You need not worry about the size of the Discovery Mailbox, because a target mailbox is not a required parameter for placing an In-Place Hold.
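The group count in this example is ceiling arithmetic: divide, round up, and the last group holds whatever is left over. A small sketch that reproduces it for any population and group size (the function name `Get-BatchPlan` is hypothetical):

```powershell
# Sketch: how many groups (or searches) are needed when each may hold at most
# $PerBatch mailboxes, and how many mailboxes land in the last one.
function Get-BatchPlan {
    param([int]$Total, [int]$PerBatch)
    if ($Total -le 0) {
        return [pscustomobject]@{ Batches = 0; LastBatchSize = 0 }
    }
    # Round up: a partial batch still needs its own group.
    $batches = [int][Math]::Ceiling($Total / $PerBatch)
    # Every batch except the last is full.
    $last = $Total - ($batches - 1) * $PerBatch
    return [pscustomobject]@{ Batches = $batches; LastBatchSize = $last }
}

# The 33,000-user In-Place Hold example: 4 groups, the last with 3,000 members.
Get-BatchPlan -Total 33000 -PerBatch 10000
```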

 

You need to conduct a Discovery Search against 67,000 mailboxes. This would mean creating a minimum of 7 distribution groups (six of 10,000 members and one of 7,000). However, searches against thousands of mailboxes might run into the size limit of a Discovery Mailbox (presently 50GB). To avoid overflowing the Discovery Mailbox, you might settle on 500 mailboxes per Discovery Search, which means 134 searches (and 134 groups, if you use groups) instead of 7. It also means creating additional Discovery Mailboxes so that no single mailbox is overwhelmed with content.
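To see how those searches would be spread over several Discovery Mailboxes, here is a sketch that counts round-robin assignments, the same assignment rule the script below uses. The five Discovery Mailbox addresses at contoso.onmicrosoft.com are placeholders, not a recommendation; how many you actually need depends on how much content each search returns. The function name `Get-SearchesPerDiscoveryMailbox` is hypothetical.

```powershell
# Sketch: count how many searches each Discovery Mailbox receives when search
# number $i is assigned to Discovery Mailbox number ($i mod list length).
function Get-SearchesPerDiscoveryMailbox {
    param([int]$SearchCount, [string[]]$DiscoveryList)
    $counts = [ordered]@{}
    foreach ($d in $DiscoveryList) { $counts[$d] = 0 }
    for ($i = 0; $i -lt $SearchCount; $i++) {
        $counts[$DiscoveryList[$i % $DiscoveryList.Count]] += 1
    }
    return $counts
}

# 67,000 mailboxes at 500 per search = 134 searches.
$searchCount = [int][Math]::Ceiling(67000 / 500)
# Five placeholder Discovery Mailboxes.
$discovery = 1..5 | ForEach-Object { "DiscoveryMbx$($_)@contoso.onmicrosoft.com" }
# Four mailboxes receive 27 searches and the fifth receives 26.
Get-SearchesPerDiscoveryMailbox -SearchCount $searchCount -DiscoveryList $discovery
```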

 

To handle both of these hypothetical examples I have written a script that can create the necessary Discovery Searches and spread them over the list of Discovery Mailboxes that are supplied to the script via a TXT file:

```powershell
#
# Objective:
# Create a new Discovery Search for every $mailboxesPerSearch users. The search
# results are distributed across the Discovery Mailboxes specified in
# c:\o365\DiscoveryMbxs.txt. The distribution is round robin. The TXT file MUST
# contain the UPN of each Discovery Mailbox to be used, and each UPN should be on
# its own line of the file. No field name or column heading should be used. THERE
# MUST BE AT LEAST ONE UPN IN c:\o365\DiscoveryMbxs.txt
# Each search name will be text followed by a sequential number. For example:
# Search1, Search2, etc.
#
# The script does not start any of the Discovery Searches it creates.
#
# For demonstration purposes this script places 3 mailboxes in each search. You can
# change this by altering the value of the $mailboxesPerSearch variable.
#

# Load the list of Discovery Mailboxes into a variable. Searches will be distributed
# across the Discovery Mailboxes in round robin fashion.
# Must use [array] to handle the case where the text file lists only one mailbox.
# Without [array] it gets treated as a string and fails later.
# Blank lines in the file are skipped so they are never used as a target mailbox.
[array]$DiscoveryList = Get-Content "c:\o365\DiscoveryMbxs.txt" | Where-Object { $_.Trim() -ne "" }
$DiscoveryCount = $DiscoveryList.Count
if ($DiscoveryCount -lt 1) {
    throw "c:\o365\DiscoveryMbxs.txt must contain at least one Discovery Mailbox UPN."
}
$DiscoveryLoop = 0

# Set the number of mailboxes that will be in each search. Acceptable range is 1 to
# 10000. 3 is selected for illustration purposes in a lab. Remember that Discovery
# Mailboxes go against the 10,000 max, but are not listed individually in the search.
# If all your mailboxes have archives then treat the maximum as 5,000.
$mailboxesPerSearch = 3

$base_search_name = "Legal-TESTsearch"
$loop_count = 0

# The line below obtains the list of mailboxes to be searched. @() guarantees an
# array (and a usable .Count) even when only one mailbox is returned.
# For additional filtering options see the help for Get-Mailbox
# https://technet.microsoft.com/en-us/library/bb123685(v=exchg.150).aspx and for
# Where-Object https://technet.microsoft.com/en-us/library/ee177028.aspx
$mbxs = @(Get-Mailbox -ResultSize Unlimited -RecipientTypeDetails UserMailbox)

$MbxRemaining = $mbxs.Count
$low_bound = 0

while ($MbxRemaining -gt 0) {
    # This loop creates one search per pass and decrements $MbxRemaining by the
    # number of mailboxes included in the search.
    $loop_count += 1
    $this_search = $base_search_name + $loop_count
    Write-Host "creating search: " $this_search -ForegroundColor Cyan

    # This batch runs from $low_bound to $high_bound: $mailboxesPerSearch mailboxes,
    # or whatever is left when fewer than $mailboxesPerSearch remain.
    $high_bound = [Math]::Min($low_bound + $mailboxesPerSearch, $mbxs.Count) - 1

    # Place the UPNs of this batch in $search_list (replacing the previous batch).
    [array]$search_list = $mbxs[$low_bound..$high_bound] | ForEach-Object { [string]$_.UserPrincipalName }

    # To create an In-Place Hold customize the line below to suit your needs
    #New-MailboxSearch -Name $this_search -SourceMailboxes $search_list -InPlaceHoldEnabled $true -ItemHoldPeriod 730

    # To create an ordinary Discovery Search customize the line below to meet your needs
    New-MailboxSearch -Name $this_search -SourceMailboxes $search_list -TargetMailbox $DiscoveryList[$DiscoveryLoop] -StartDate "12/31/2012" -SearchQuery "'10-K' OR '10K' OR '10k' OR '10k' OR 'annual report' OR 'fire' NEAR(15) ('insur*' OR 'pay*' OR 'claim*') OR 'fire' NEAR(20) 'insurance'"

    # Move to the next Discovery Mailbox; after the last one, wrap back to the first.
    $DiscoveryLoop = ($DiscoveryLoop + 1) % $DiscoveryCount

    $low_bound += $mailboxesPerSearch
    $MbxRemaining -= $mailboxesPerSearch
}
```
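Before running the script against a real tenant, it can be reassuring to preview what it will create. The function below is a hypothetical dry-run helper (`Get-SearchPlan` is not part of Exchange) that repeats the script's batching and round-robin logic on a plain list of UPNs and returns one object per planned search instead of calling New-MailboxSearch. It is a sketch under the assumption that the list of Discovery Mailboxes is non-empty.

```powershell
# Hypothetical dry run: same batching and round-robin rules as the script above,
# but it only returns the plan. $DiscoveryList must contain at least one entry.
function Get-SearchPlan {
    param(
        [string[]]$Upns,
        [int]$MailboxesPerSearch,
        [string[]]$DiscoveryList,
        [string]$BaseSearchName = "Legal-TESTsearch"
    )
    $plan = @()
    $n = 0
    for ($low = 0; $low -lt $Upns.Count; $low += $MailboxesPerSearch) {
        # Last index of this batch, clipped to the end of the list.
        $high = [Math]::Min($low + $MailboxesPerSearch, $Upns.Count) - 1
        $plan += [pscustomobject]@{
            Name            = $BaseSearchName + ($n + 1)
            SourceMailboxes = @($Upns[$low..$high])
            TargetMailbox   = $DiscoveryList[$n % $DiscoveryList.Count]
        }
        $n += 1
    }
    return $plan
}
```

In a connected session you could pass it `@(Get-Mailbox -ResultSize Unlimited -RecipientTypeDetails UserMailbox | ForEach-Object { [string]$_.UserPrincipalName })` together with the contents of DiscoveryMbxs.txt, then pipe the result to `Format-Table Name, TargetMailbox` to check the names and targets before creating anything.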

 

If you are using the script above to place mailboxes on In-Place Hold, I suggest choosing a name that will cause the search to appear at the bottom of the list of searches. The reason is that a search placing mailboxes on In-Place Hold may stay around for a long time. Prefixing the name with "z" or another character further down in the sort order will place all the In-Place Hold searches at the bottom of the list when you sort by name.
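As a small illustration, changing `$base_search_name` in the script is all the prefix takes. The `zHold-Legal` base name and the other search names below are just examples; a plain sort shows where the hold searches end up.

```powershell
# Example base name for searches that only place mailboxes on In-Place Hold.
# The leading "z" pushes these names to the end of an alphabetical sort.
$base_search_name = "zHold-Legal"

# Illustrative names: two ordinary searches and two hold searches.
# Parentheses are needed because the comma binds tighter than "+".
$names = @("Legal-TESTsearch1", ($base_search_name + 1), "Audit-2013", ($base_search_name + 2))

# Sorting by name places the hold searches last.
$sorted = $names | Sort-Object
$sorted
```

Against a live tenant, `Get-MailboxSearch | Sort-Object Name` should list the hold searches last in the same way.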

 

Some Discovery Search troubleshooting tips:

  • Include fewer mailboxes in each search (modify the $mailboxesPerSearch variable in the script)
  • Tighten the criteria applied to the Get-Mailbox cmdlet the script runs, to make sure you include only the mailboxes that truly need to be searched
  • Cover a smaller date range in each search. If you searched without a date range and encountered a timeout or other failure, read the error message carefully for clues. There have been instances in the past where specifying a start and end date allowed failing queries to complete
  • Search during times when demands placed on the search servers are likely to be lower (this will vary by the region in which your mailboxes are hosted)
  • Test skipping the "Include unsearchable items" and "enable de-duplication" options. This reduces the demands of your search and should let it run faster. If this test works, try breaking the search into smaller units and running them individually with the required options. If it still doesn't work, you may need to discuss the search with Office 365 support.
  • If you are exporting to PST, make sure you have .NET Framework 4.5 and its latest updates installed.
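For the date range tip, one approach is to generate consecutive date windows and create one search per window. The sketch below assumes three-month windows and uses hypothetical helper names (`Get-DateWindows`, `New-WindowedSearch`). Only the window calculation is plain PowerShell; the second function needs a connected Exchange Online session and has not been run against a tenant. Adjacent windows share a boundary date, so depending on how the service treats the end date, an item dated exactly on a boundary might show up in two searches.

```powershell
# Hypothetical helper: split $Start..$End into consecutive windows of $Months
# months each; the last window is trimmed so that it ends at $End.
function Get-DateWindows {
    param([datetime]$Start, [datetime]$End, [int]$Months = 3)
    $windows = @()
    $from = $Start
    while ($from -lt $End) {
        $to = $from.AddMonths($Months)
        if ($to -gt $End) { $to = $End }
        $windows += [pscustomobject]@{ StartDate = $from; EndDate = $to }
        $from = $to
    }
    return $windows
}

# Hypothetical wrapper: one search per window, with de-duplication turned off
# as in the tip above. Needs a connected Exchange Online session.
function New-WindowedSearch {
    param(
        [string]$BaseName,
        [string[]]$SourceMailboxes,
        [string]$TargetMailbox,
        [string]$SearchQuery,
        [datetime]$Start,
        [datetime]$End,
        [int]$Months = 3
    )
    $n = 0
    foreach ($w in (Get-DateWindows -Start $Start -End $End -Months $Months)) {
        $n += 1
        # Splatting keeps the long parameter list readable.
        $params = @{
            Name                     = $BaseName + $n
            SourceMailboxes          = $SourceMailboxes
            TargetMailbox            = $TargetMailbox
            SearchQuery              = $SearchQuery
            StartDate                = $w.StartDate
            EndDate                  = $w.EndDate
            ExcludeDuplicateMessages = $false
        }
        New-MailboxSearch @params
    }
}
```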

 

Above I mentioned running searches at times when demands on the search servers are likely to be lower. If you want or need to run your search during a specific time window, it is best to create a scheduled task that executes your PS1 file. The PS1 file then logs on to Office 365 and executes your commands. Here is a sample script that logs on and starts a pair of searches:

```powershell
# Specify username and password. The password is stored in an encrypted text file
# in this example (see the steps below for creating c:\o365\password.txt).
$EXOAdmUser = "YourAccount@YourDomain.onmicrosoft.com"
# $securePwd is used instead of $pwd, because $PWD is PowerShell's automatic
# variable for the current directory.
$securePwd = Get-Content "c:\o365\password.txt" | ConvertTo-SecureString
$O365Cred = New-Object System.Management.Automation.PSCredential $EXOAdmUser, $securePwd

# Connect to Office 365
$session = New-PSSession -ConfigurationName Microsoft.Exchange -ConnectionUri "https://ps.outlook.com/powershell/" -Credential $O365Cred -Authentication Basic -AllowRedirection
Import-PSSession $session

Start-MailboxSearch -Identity "DiscoverySearch1" -Force
Start-MailboxSearch -Identity "DiscoverySearch2" -Force

# Close the remote session before exiting so it is not left open.
Remove-PSSession $session
Exit
```

 

 

To create password.txt, follow these steps:

  • Log in to Office 365 from a PowerShell session using the account that you will use to create and run Discovery Searches. You can use these lines to do so:
    $O365Cred = Get-Credential
    $session = New-PSSession -ConfigurationName Microsoft.Exchange -ConnectionUri "https://ps.outlook.com/powershell/" -Credential $O365Cred -Authentication Basic -AllowRedirection
    Import-PSSession $session

  • Run this command: $O365Cred.Password | ConvertFrom-SecureString | Set-Content c:\o365\password.txt

  • Make sure password.txt is created. Because ConvertFrom-SecureString (used without a key) encrypts with the Windows Data Protection API, only the same Windows user on the same computer can decrypt the file, so create it while logged on as the account the Scheduled Task will run under.

 

To log on at the time you choose and start your script, you need to create a Scheduled Task. In Scheduled Tasks you would use a command line like powershell.exe -NonInteractive -NoProfile -Command "&{<full path to your script>}".
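If you prefer to create the task from PowerShell instead of the Task Scheduler interface, the sketch below uses the ScheduledTasks module cmdlets (New-ScheduledTaskAction, New-ScheduledTaskTrigger, Register-ScheduledTask), which are available on Windows 8 / Windows Server 2012 and later. The script path, task name, and helper function names are hypothetical. Because password.txt was encrypted for one particular Windows user, the task has to run as that same user; the sketch registers the task without an explicit principal, and Register-ScheduledTask also accepts -User and -Password if the task must run while you are logged off.

```powershell
# Hypothetical location of the PS1 file that logs on and starts your searches.
$scriptPath = "c:\o365\StartSearches.ps1"

# Build the powershell.exe arguments described above: -Command "&{<full path>}".
function New-SearchTaskArgument {
    param([string]$ScriptPath)
    return "-NonInteractive -NoProfile -Command `"&{$ScriptPath}`""
}

# Hypothetical: register a one-time task that runs the script at $At.
# Requires Windows and the ScheduledTasks module; not invoked here.
function Register-SearchTask {
    param([string]$ScriptPath, [datetime]$At, [string]$TaskName = "StartDiscoverySearches")
    $action = New-ScheduledTaskAction -Execute "powershell.exe" -Argument (New-SearchTaskArgument -ScriptPath $ScriptPath)
    $trigger = New-ScheduledTaskTrigger -Once -At $At
    Register-ScheduledTask -TaskName $TaskName -Action $action -Trigger $trigger
}
```

For example, `Register-SearchTask -ScriptPath $scriptPath -At (Get-Date).Date.AddDays(1).AddHours(2)` would schedule the script for 2:00 AM tomorrow on the local clock.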