Office 365 Search is now smarter about spreadsheets

Businesses use spreadsheets for many critical functions, such as financial reporting and investment modeling, and they need to minimize the risks that using these spreadsheets introduces. The need for strong spreadsheet management is emphasized in these stories compiled by the European Spreadsheet Risk Interest Group (EuSpRIG). Office responded to this need by introducing new features in Office 2013 to help manage the use of spreadsheets for important business functions, and now it has enabled a similar capability in Office 365.

 

Now you can manage spreadsheets in Office 365

One of the tools introduced in Office 2013 is Discovery and Risk Assessment, a server (on-premises) that helps you find spreadsheets in your organization and determine which present the greatest risk, so you can take steps to reduce your risks. It does this by crawling network shares and SharePoint sites to find the spreadsheets, and then analyzes and rates each one for complexity, materiality (impact to the organization), and risk based on configurable criteria. Businesses are using this powerful tool to gain better visibility of their spreadsheet inventory.

You can now gain better visibility into your spreadsheets in Office 365, too. We integrated the spreadsheet risk assessment concepts directly into the enterprise search capability that's already crawling your document libraries and cataloging your spreadsheets. This means that you can quickly search for spreadsheets based on properties that help you understand their complexity and business impact. In addition to giving you new insights about your spreadsheets, this capability combines with SharePoint Search and other Office 365 features—like eDiscovery, archiving, and data loss prevention—to give your organization much greater control of your information. Because Office 365 does all the heavy lifting, it easier for you to leverage this search capability. Plus, we’re always improving it.

Let’s take a closer look at the new properties of this search capability, where you can use them, and how you can use them effectively to improve visibility into your spreadsheets and help manage risk. 

 

New search properties

In Office 365 you can now search for spreadsheets based on the following properties:

  • The number of unique formula sets. A unique formula set is a range of contiguous cells containing the same formula, taking relative references into account. For example, if cell B1 has the formula “=A1”, and cell B2 has “=A2”, these are considered the same, because both formulas refer to the cell one column to the left. As shown in the image below, the 5 formulas in column C are counted as one unique formula set, because they’re all multiplying the values in columns A and B.

                          The formulas in column C count as 1 “unique formula set.”

  • The number of formulas. This is the total number of cells containing formulas.
  • The number of sheets.
  • The number of data connections.
  • The number of linked workbooks.
  • The path and filename of any/all linked workbooks.

Are there other things you’d like to know or see about your spreadsheets? Please let us know by commenting on this blog post.

These new search properties can be extremely useful, giving your organization visibility into its spreadsheet inventory that it’s never had before. Here are some examples of the questions these properties enable you to ask about your organization's spreadsheets: 

  • Do we have spreadsheets with links to files on someone's local computer? For example, your colleague may have an important spreadsheet that uses links to other spreadsheets to import some of the data. If the other spreadsheets are stored on their local hard drive, the link will appear broken when you open it. You can easily find workbooks where this is a problem, because you can search for workbooks based on the location of any files they're linked to (getting data from).
  • How many spreadsheets have data connections (retrieving data from external sources)? Many people use Excel to analyze data from external data sources, like databases, online services, export files, and others. Lots of important work goes on in these spreadsheets, but you probably don't know how many there are or where they are. Now you can find out—easily.
  • If I change my spreadsheet, which other ones are affected? For example, you have a spreadsheet called "ProductList.xlsx" that's used as a data source by other spreadsheets. You need to make a change to the product list spreadsheet, and you need to know what other spreadsheets will be affected by the change. Now you can search for any spreadsheets that are linked to "ProductList.xlsx."
  • Where are our most important spreadsheets? Many organizations have so many spreadsheets that it's difficult to know which ones need attention first when it comes to better spreadsheet management practices. Since the chance for errors is greater as spreadsheets grow in complexity, it's useful to know which spreadsheets are more complex than others. The new search capabilities allow you to get an understanding of the complexity at a high level to help you prioritize.

 

Where to run searches

There are several places in Office 365 where you can use the new search capabilities. Any search box in SharePoint Online will let you use these properties, as well as any other advanced search terms. We have plans to create a search experience in Office 365 to help guide you and make it easy to use these new properties, but for now one of the best ways is to use the eDiscovery Center in your Office 365 admin center. The eDiscovery center lets you create, save and run search queries, and it has some nice features to help you work with the search results, such as exporting a copy of the files.

Here are some screen shots of what queries look like in an eDiscovery center:

You can use the eDiscovery Center in Office 365 to search for spreadsheets .

For more information, see Create and run eDiscovery queries in an eDiscovery Center.

Features of the new search properties

Because the new properties go beyond just searching for words or phrases that appear in a spreadsheet, you need to know how to use them effectively. For the new properties that count items in a workbook, the exact counts are distilled down to just a few ranges, and it's important to understand how this works so your searches are effective.

Note: These new features are only enabled for XLSX, XLSM, and XLSB files (Office Open XML spreadsheets), so you'll want to upgrade any older spreadsheets to one of these file formats.

Search based on an approximate number of items

Several of the new search properties let you search based on an approximate number of items found in the workbook. They are:

Property Name in Search Schema

Description

XLFormulaCountRngID

Formulas in the workbook (approximate count)

XLLinkedWkbkCountRngID

Linked workbooks (approximate count)

XLWorksheetCountRngID

Worksheets (approximate count)

XLDataConnCountRngID

Data Connections (approximate count)

XLUniqueFormulaSetCountRngID

Unique Formula Sets (approximate count)

Search based on the path and filename of linked workbooks

One of the new properties lets you search based on the path and filename of linked workbooks:

Property Name in Search Schema

Description

XLLinkedWorkbooksText

Path and filename of linked workbooks

Search based on the number of formulas in a spreadsheet

To search based on the number of formulas in a spreadsheet, in your search, specify the property "XLFormulaCountRngID.”s. For each document, this property has a value that indicates the low end of the range in which the actual formula count falls. For example, if a workbook has 47 formulas, we set the XLFormulaCountRngID=1, because it's in the range from 1 to 99. So searching for "XLFormulaCountRngID=1" would find all workbooks with at least 1 formula, but no more than 99. To find workbooks with zero formulas, you would search for "XLFormulaCountRngID=0".

You should always keep in mind the value will be the low end of the range, so if you want to search for spreadsheets with greater than n formulas, you need to find the correct range ID and search for "XLFormulaCountRngID >=" that range ID.

For XLFormulaCountRngID, search for the following values to find spreadsheets based on how many formulas they have:

Typically, you would search for files with greater than or equal to some number of formulas, such as "XLFormulaCountRngID>=500". It's best to stick with the values shown in the table above. The search index won't have any values for this property other than the ones shown here. For example, if you have a spreadsheet with 750 formulas, it would have a RngID value of 500, so it would not be found if you searched for ">=750". Instead, you would find spreadsheets that have at least 1000 formulas, since that’s the next higher range.

Here are all the properties and their possible values:

Search using the SharePoint 2013 Search Query Tool

Another great option is the SharePoint 2013 Search Query Tool, which is a free tool available on CodePlex. Download the Search Query Tool. Here’s how to use it.

Basic steps to run a query using the Search Query tool:

  1. Get the tool from CodePlex and install it.
  2. Run the Search Query tool.
  3. In the “Connection” fields:

a.  Enter your Office 365 URL, for example “https://microsoft.sharepoint.com”.

b.  Choose Authentication = Authenticate using a specific user account.

c.  Select Authentication Method = SharePoint Online

d.  Click Sign in to SP Online, and log into your site if prompted.  You may be logged in automatically if you’ve saved your login through your browser

4.  In the Query section:

a.  Enter your search query in the Query Text box.  For example, you can type “XLUniqueFormulaSetCountRngID>0”.  Here are some sample queries you can use:

    • XLLinkedWorkbooksText:"desktop" (this is going to find workbooks that are linked to files that are someone’s desktop on their computer)
    • XLFormulaCountRngID>=1000 OR XLUniqueFormulaSetCountRngID>=50
    • XLLinkedWkbkCountRngID>=4
    • XLFormulaCountRngID>=10000 AND XLWorksheetCountRngID>=5
    • XLFormulaCountRngID>=10 AND Author:Joe
    • XLUniqueFormulaSetCountRngID>=1
    • Title:Part_of_the_filename (this is going to find workbooks that contain what you specify in any part of the filename)

b.  Go to the Select Properties box and enter properties that you want to see in your query results.  For example:
Title, Size, LastModifiedTime, Filename, Path, Author, XLFormulaCountRngID, XLFormulaCount, XLLinkedWkbkCountRngID, XLLinkedWkbkCount, XLLinkedWorkbooksText, XLWorksheetCountRngID, XLWorksheetCount, XLDataConnCountRngID, XLDataConnCount, XLUniqueFormulaSetCount

c.  Click Run.

d.  Check the Primary Results tab to see your results.

e.  Use the URL next to the Run button to get your results in a browser using the REST API.

f.  Adjust any other parameters of your query using the other controls available in the Search Query tool. Here’s more information: - SharePoint 2013 Search REST API.

g.  Enjoy the new visibility of your organization’s spreadsheets!

 

For more information about spreadsheet management in Office, see this Excel blog post.