Build a Query to Search the Windows Index from PowerShell

Doctor Scripto

June 25th, 20120 0

Summary: Guest blogger, James O’Neill, discusses using Windows PowerShell to build a query to search the Windows Index. Microsoft Scripting Guy, Ed Wilson, is here. Today is Part One of three blogs written by guest blogger, James O’Neill. James O’Neill was born in the 1960s and used his first Microsoft product in the 1970s (and has been trying to stop ever since.) He obtained a degree in Computer
Science in the 1980s and spent most of the 1990s running an IT training company. From 2000 to 2010 he worked for Microsoft in his native England,
finishing as the evangelist for Windows platform, where he discovered PowerShell. He’s probably best known in the PowerShell community for his
library to manage Hyper-V on Windows Server 2008/2008-R2. .

Note I have two Weekend Scripter blogs where I discuss querying the Windows Search Index. The first is Querying the Windows Search Index, and the second is Using the Windows Search Index to Find Specific Files. Both of these blogs use COM-based ADO to do the query instead of using the .NET Framework classes that are used by James. The blogs provide a good background for James’ series. Take it away, James… I have spent some time developing and honing a Windows PowerShell function that gets information from the Windows Index, which is the technology behind the search that is integrated into Windows Explorer in Windows 7 and Windows Vista. The Windows Index can be queried by using SQL, and my function builds the SQL query from user input, executes it, and receives rows of data for all the matching items. Today, I’m going to explore the query process. Part Two will look at making user input easier (I don’t want to make understanding SQL a prerequisite for using the function). In Part Three, I will look at why rows of data are not the best thing for the function to return and what the alternatives might be. We will look at how at how the query is built in a moment. For now, please accept a ready-to-run query that is stored in the variable $SQL. Then it only takes a few lines of Windows PowerShell to prepare and run the query as shown here.

$Provider=”Provider=Search.CollatorDSO;Extended Properties=’Application=Windows’;”

$adapter = new-object system.data.oledb.oleDBDataadapter -argument $sql, $Provider

$ds = new-object system.data.dataset

if ($adapter.Fill($ds)) { $ds.Tables[0] } The data is fetched by using the oleDBDataAdapter and DataSet objects. The adapter is created by specifying a “provider” (which says where the data will come from) and a SQL statement (which says what is being requested). The query is run when the adapter is told to fill the dataset. The .fill() method returns a number that indicates how many data rows were returned by the query. If this is non-zero, my function returns the first table in the dataset. Windows PowerShell sees each data row in the table as a separate object, and these objects have a property for each of the table’s columns. So a search might return something like this:

SYSTEM.ITEMNAME : DIVE_1771+.JPG

SYSTEM.ITEMURL : file:C:/Users/James/pictures/DIVE_1771+.JPG

SYSTEM.FILEEXTENSION : .JPG

SYSTEM.FILENAME : DIVE_1771+.JPG

SYSTEM.FILEATTRIBUTES : 32

SYSTEM.FILEOWNER : InspironJames

SYSTEM.ITEMTYPE : .JPG

SYSTEM.ITEMTYPETEXT : JPEG Image

SYSTEM.KINDTEXT : Picture

SYSTEM.KIND : {picture}

SYSTEM.MIMETYPE : image/jpeg

SYSTEM.SIZE : 971413 There are lots of fields to choose from, so the list might be longer. The SQL query to produce it looks something like this:

SELECT System.ItemName, System.ItemUrl, System.FileExtension, System.FileName, System.FileAttributes, System.FileOwner, System.ItemType, System.ItemTypeText , System.KindText, System.Kind, System.MIMEType, System.Size

FROM SYSTEMINDEX

WHERE System.Keywords = ‘portfolio’ AND Contains(*,’stingray’) In the finished version of the function, the SELECT clause has 60 or so fields. The FROM and WHERE clauses might be more complicated than in the example, and an ORDER BY clause might be used to sort the data. The clauses are built by using parameters that are declared in my function like this:

Param ( [Alias(“Where”,”Include”)][String[]]$Filter ,

[Alias(“Sort”)][String[]]$orderby,

[Alias(“Top”)][String[]]$First,

[String]$Path,

[Switch]$Recurse

) In my functions, I try to use names that are already used in Windows PowerShell. So here I use -Filter and -First, but I also define aliases for SQL terms like WHERE and TOP. These parameters build into the complete SQL statement, starting with the SELECT clause which uses –First.

if ($First) {$SQL = “SELECT TOP $First “}

else {$SQL = “SELECT “}

$SQL += ” System.ItemName, System.ItemUrl ” # and the other 58 fields If the user specifies –First 1, $SQL will be “SELECT TOP 1 fields“; otherwise, it’s just “SELECT fields.” After the fields are added to $SQL, the function adds a FROM clause. Windows Search can interrogate remote computers, so if the -Path parameter is a UNC name in the form \computerNameshareName, the SQL FROM clause becomes FROM computerName.SYSTEMINDEX; otherwise, it is FROM SYSTEMINDEX to search the local computer.
A regular expression can recognize a UNC name and pick out the computer name, like this:

if ($Path -match “\\([^\]+)\.”) {

$sql += “FROM $($matches[1]).SYSTEMINDEX WHERE “

}

else {$sql += ” FROM SYSTEMINDEX WHERE “} The regular expression in the first line of the example breaks down as follows:

Regular expression	Meaning	Application
\\([^\]+)\.	2 characters: “” is the escape character, so each one needs to be written as \	\computerNameshareName
\\([^\]+)\.	Any non- character, repeated at least once	\computerNameshareName
“\\([^\]+)\.“	A ,followed by any character	\computerNameshareName
“\\([^\]+)\.”	Capture the section that is enclosed by the brackets as a match	$matches[0] =\computerNames $matches[1] =computerName

I allow the function to take different parts of the WHERE clause as a comma separated list, so that

-filter “System.Keywords = ‘portfolio'”,”Contains(*,’stingray’)” is equivalent to

-filter “System.Keywords = ‘portfolio’ AND Contains(*,’stingray’)” To add the filter, we simply need this:

if ($Filter) { $SQL += $Filter -join ” AND “} The folders searched can be restricted. A “SCOPE” term limits the query to a folder and all of its subfolders, and a “DIRECTORY” term limits it to a folder without subfolders. If the request is going to a remote server, the index is smart enough to recognize a UNC path and return only the files that are accessible via that path. If a -Path parameter is specified, the function extends the WHERE clause, and the –Recurse switch determines whether to use SCOPE or DIRECTORY, like this:

if ($Path){

if ($Path -notmatch “w{4}:”) {

$Path = “file:” + (resolve-path -path $Path).providerPath

}

if ($sql -notmatch “WHEREs*$”) {$sql += ” AND ” }

if ($Recurse) {$sql += ” SCOPE = ‘$Path’ ” }

else {$sql += ” DIRECTORY = ‘$Path’ “}

} In these SQL statements, paths are specified in the form file:c:/users/james, which isn’t how we normally write them (and the way I recognize UNC names won’t work if they are written as file://ComputerName/shareName). This is rectified by the first line inside the If ($Path) {} block, which checks for 4 “word” characters, followed by a colon. Doing this prevents ‘File:’ being inserted if any protocol has been specified. The same search syntax works against HTTP:// (although, not usually when searching on your workstation), MAPI:// (for Outlook items), and OneIndex14:// (for OneNote items). If a file path has been given, I ensure it is an absolute one. The need to support UNC paths forces the use of .ProviderPath here. It turns out that there is no need to convert characters in the path to /, provided file: is included. After taking care of that, the operation -notmatch “WHEREs*$” sees to it that an “AND” is added if there is anything other than spaces between WHERE and the end of the line (that is, if any conditions specified by –Filter have been inserted). If neither -Path nor –Filter was specified, there will be a dangling WHERE at the end of the SQL statement. Initially I removed this with –Replace. Then I decided that I didn’t want the function to respond to a lack of input by returning the whole index, so I changed it to write a warning and exit. With the WHERE clause completed, the final clause in the SQL statement is ORDER BY, which, like WHERE, joins a multipart condition.

if ($sql -match “WHEREs*$”) {

Write-warning “You need to specify either a path, or a filter.”

Return

}

if ($orderby) { $sql += ” ORDER BY ” + ($OrderBy -join ” , “) } When the whole function is put together, it takes three dozen lines of Windows PowerShell to handle the parameters, build and run the query, and return the result. Put together, it looks like this:

Function Get-IndexedItem{

Param ( [Alias(“Where”,”Include”)][String[]]$Filter ,

[Alias(“Sort”)][String[]]$OrderBy,

[Alias(“Top”)][String[]]$First,

[String]$Path,

[Switch]$Recurse )

if ($First) {$SQL = “SELECT TOP $First “}

else {$SQL = “SELECT “}

$SQL += ” System.ItemName, System.ItemUrl ” # and the other 58 fields

if ($Path -match “\\([^\]+)\.”) {

$SQL += “FROM $($matches[1]).SYSTEMINDEX WHERE “

}

else {$SQL += ” FROM SYSTEMINDEX WHERE “}

if ($Filter) { $SQL += $Filter -join ” AND “}

if ($Path) {

if ($Path -notmatch “w{4}:”) {$Path = “file:” + $Path}

$Path = $Path -replace “\”,”/”

if ($SQL -notmatch “WHEREs*$”) {$SQL += ” AND ” }

if ($Recurse) {$SQL += ” SCOPE = ‘$Path’ ” }

else {$SQL += ” DIRECTORY = ‘$Path’ “}

}

if ($SQL -match “WHEREs*$”) {

Write-Warning “You need to specify either a path or a filter.”

Return

}

if ($OrderBy) { $SQL += ” ORDER BY ” + ($OrderBy -join ” , ” ) }

$Provider=”Provider=Search.CollatorDSO;Extended Properties=’Application=Windows’;”

$Adapter = New-Object system.data.oledb.oleDBDataadapter -argument $SQL, $Provider

$DS = New-Object system.data.dataset

if ($Adapter.Fill($DS)) { $DS.Tables[0] }

} The -Path parameter is more user-friendly as a result of the way I handle it. But I’ve made it a general rule that you shouldn’t expect the user to know too much about the underlying syntax; and at the moment, the function requires too much knowledge of SQL. I don’t want to type this:

Get-Indexed-Item –Filter “Contains(*,’Stingray’)”, “System.Photo.CameraManufacturer Like ‘Can%'” And it seems unreasonable to expect anyone else to do so. I came up with this list that I want the function to do for me:

Don’t require the user to know whether a search term is prefixed with SYSTEM (SYSTEM.DOCUMENT, SYSTEM.IMAGE or SYSTEM.PHOTO). If the prefix is omitted, add the correct one.
Even without the prefixes, some field names are awkward; for example, “HorizontalSize” and “VerticalSize” instead of width and height. Provide aliases.
Literal text in searches needs to be enclosed in single quotation marks. Insert quotation marks if the user omits them.
A free text search over all fields is written as Contains(*,’searchTerm’). Convert “orphan” search terms into Contains conditions.
SQL uses % (not *) for a wild card. Replace * with % in filters to cope with users adding the familiar *.
SQL requires the like predicate (not =) for wildcards. Replace = with like for wildcards.

In Part Two, I’ll look at how I accomplish these things. ~James Thank you, James, for a great blog. I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy

Build a Query to Search the Windows Index from PowerShell

Doctor Scripto Scripter, PowerShell, vbScript, BAT, CMD

0 comments

Build a Query to Search the Windows Index from PowerShell

Doctor Scripto Scripter, PowerShell, vbScript, BAT, CMD

Read next

0 comments