Client Automation: SharePoint Crawl Log Validation leveraging CSOM!

Intro

Glad to be back blogging; yes, it has been a while. Part of belonging to the Search Premier Field Engineering team is coming up with creative solutions to complex Search tasks. In my case, one such request was to process thousands of URLs against the Crawl Log. While the Crawl Log UI is certainly capable of validating that items are successfully crawled/indexed, it lacks automation. For example, what if you wanted to check thousands of URLs against the crawl log to determine whether or not a page or document is crawled/indexed? Performing this task manually in the UI would take days, if not weeks, to complete. I suspect other companies have a good business case for this as well, since calls often come in complaining that various URLs are not appearing in search results. While a variety of factors can explain why search results are missing, one very common approach for a Search administrator is to validate that the particular item has been crawled. The other thought is that a Search service administrator may want to delegate this task to someone who doesn't have access to the SharePoint servers. For both scenarios, I authored a cmdlet that runs on a client and leverages CSOM. The cmdlet performs the following steps:

1. Loads URLs from a CSV file

2. Checks each URL against the Crawl Log

3. If a URL is not marked as crawled, a secondary check is made to validate that the URL is accessible by calling Invoke-WebRequest

4. The results of processing each URL are exported to a report CSV file
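
The attached module wraps all of this up in a single cmdlet, but for readers curious about the moving parts, below is a rough sketch of the core loop using the CSOM DocumentCrawlLog API. This is illustrative only and is not the shipped script; the site URL, file paths, column names, and exact GetCrawledUrls arguments are placeholder assumptions.

# Rough sketch only - the attached CrawlLogCheck module is the real implementation.
# Assumes the CSOM assemblies have already been loaded via Add-Type (see Setup Instructions).

$siteUrl = "https://intranet.contoso.com"       # placeholder: your site URL
$cred = Get-Credential                          # account with crawl log read permission

$ctx = New-Object Microsoft.SharePoint.Client.ClientContext($siteUrl)
$ctx.Credentials = $cred.GetNetworkCredential()
$crawlLog = New-Object Microsoft.SharePoint.Client.Search.Administration.DocumentCrawlLog($ctx, $siteUrl)

$results = foreach ($row in Import-Csv "c:\test\myfile.csv")
{
    $url = $row.URL                             # assumes the first column is named URL

    # Ask the crawl log whether this URL has been crawled (the real cmdlet also checks the deleted flag)
    $data = $crawlLog.GetCrawledUrls($false, 1, $url, $false, -1, -1, -1, [DateTime]::MinValue, [DateTime]::MaxValue)
    $ctx.ExecuteQuery()
    $crawled = ($data.Value.Rows.Count -gt 0)

    # If the URL is not in the crawl log, at least confirm that it responds
    $reachable = "n/a"
    if (-not $crawled)
    {
        try   { Invoke-WebRequest -Uri $url -UseDefaultCredentials -UseBasicParsing | Out-Null; $reachable = $true }
        catch { $reachable = $false }
    }

    [PSCustomObject]@{ URL = $url; Crawled = $crawled; Reachable = $reachable }
}

$results | Export-Csv "c:\test\myexport.csv" -NoTypeInformation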

 

Requirements for Running the Cmdlet

1. The cmdlet was tested against SharePoint 2013 on-premises installations

2. It is recommended to run this cmdlet on a Windows 8 client machine

3. The credentials used to run the cmdlet must be granted explicit permission to read the crawl log.

    • Run Set-SPEnterpriseSearchCrawlLogReadPermission to grant this permission (see the example after this list)

4. The cmdlet will not run properly if a proxy server is configured on the client. See the Question and Answer section below for more details.

5. The CSV file you wish to import should be configured with the first column containing all of the URLs. For example, an Import.csv file might look like the sketch below.
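
For illustration, an Import.csv might look like this, with the URLs in the first column (the header name and hostnames are placeholders):

URL
https://intranet.contoso.com/Pages/Default.aspx
https://intranet.contoso.com/Docs/Budget.xlsx
https://intranet.contoso.com/sites/hr/Pages/Benefits.aspx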

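For requirement 3 above, the crawl log read permission is granted on a SharePoint server, not on the client. A minimal sketch, run from the SharePoint Management Shell and assuming a single Search service application and a placeholder account contoso\searchreader:

$ssa = Get-SPEnterpriseSearchServiceApplication
Set-SPEnterpriseSearchCrawlLogReadPermission -SearchApplication $ssa -UserNames "contoso\searchreader"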

 

 

Setup Instructions

Before going through the setup, please review the following disclaimer:

Microsoft provides programming examples for illustration only, without warranty either expressed or implied, including, but not limited to, the implied warranties of merchantability and/or fitness for a particular purpose. This sample assumes that you are familiar with the programming language being demonstrated and the tools used to create and debug procedures. Microsoft support professionals can help explain the functionality of a particular procedure, but they will not modify these examples to provide added functionality or construct procedures to meet your specific needs. If you have limited programming experience, you may want to contact a Microsoft Certified Partner or the Microsoft fee-based consulting line at (800) 936-5200.

For more information about Microsoft Certified Partners, please visit the following Microsoft Web site: https://partner.microsoft.com/global/30000104

Author: Russ Maxwell (russmax@microsoft.com)

 

Steps are below:

1. Copy the following SharePoint assemblies from a SharePoint server to the client's c:\CSOM directory.  The client in this case will be executing the cmdlet:

Microsoft.SharePoint.Client.dll
Microsoft.SharePoint.Client.DocumentManagement.dll
Microsoft.SharePoint.Client.Publishing.dll
Microsoft.SharePoint.Client.Runtime.dll
Microsoft.SharePoint.Client.Search.Applications.dll
Microsoft.SharePoint.Client.Search.dll
Microsoft.SharePoint.Client.ServerRuntime.dll
Microsoft.SharePoint.Client.Taxonomy.dll
Microsoft.SharePoint.Client.UserProfiles.dll
Microsoft.SharePoint.Client.WorkflowServices.dll

Note 1: The above DLL files are located in the following directory on a SharePoint Server: C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\Isapi

Note 2: If you decide to copy the files into a different directory on the client, you'll need to update the .ps1 file by updating the Add-Type calls and setting the path parameter to the appropriate path.  Look at lines 40-44 within the .ps1 file; the sketch below shows the general form of those lines.
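
Those lines follow the usual Add-Type pattern, so pointing at a custom folder is just a matter of changing the path. A sketch, assuming the default c:\CSOM location (the exact list of assemblies loaded is in the attached .ps1):

Add-Type -Path "C:\CSOM\Microsoft.SharePoint.Client.dll"
Add-Type -Path "C:\CSOM\Microsoft.SharePoint.Client.Runtime.dll"
Add-Type -Path "C:\CSOM\Microsoft.SharePoint.Client.Search.dll"
Add-Type -Path "C:\CSOM\Microsoft.SharePoint.Client.Search.Applications.dll"
# ...and so on for any other assemblies the script loads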


 

2. Create a directory with the name of the module under c:\windows\system32\WindowsPowerShell\v1.0\Modules\

Note: In my case, I created a directory called CrawlLogCheck

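If you prefer to create the folder from PowerShell (run elevated, since it lives under System32), something like the following works:

New-Item -ItemType Directory -Path "C:\Windows\System32\WindowsPowerShell\v1.0\Modules\CrawlLogCheck"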

 

3. Download the attached zip file at the bottom of the page and extract the contents to the newly created module directory.  In my case, I extracted the contents of the zip file to the following directory:  C:\Windows\System32\WindowsPowerShell\v1.0\Modules\CrawlLogCheck


 

4. Import the module into PowerShell by launching PowerShell and running the following: Import-Module CrawlLogCheck

5. That's it! You should now be able to run the new cmdlet called Initialize-CrawlLogCheck

 

Question: What if I want the cmdlet available every time I launch PowerShell on my client box?

Answer:   You can do this by creating a profile and specifying the Import-Module CrawlLogCheck command in that profile's .ps1 file. The steps are as follows:

a. Launch PowerShell
b. Run: $profile
c. Run: Test-Path $profile

Note: If it returns False, proceed to step d.  If it returns True, proceed to step e.

d. Run: New-Item -Path $profile -ItemType File -Force
e. Run: powershell_ise $profile
f. Add the following: Import-Module CrawlLogCheck


g. Save and Close PowerShell ISE
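
Put together, the sequence looks roughly like this in a console session (the profile path will vary by user, and Test-Path returns True if a profile already exists):

PS C:\> $profile
C:\Users\<username>\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1
PS C:\> Test-Path $profile
False
PS C:\> New-Item -Path $profile -ItemType File -Force
PS C:\> powershell_ise $profile

# In the ISE, add this single line to the profile, then save and close:
Import-Module CrawlLogCheck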

 

 

Running the Cmdlet

Setup is complete, so let's do a demo of what this looks like.  My import CSV is located in the c:\test folder, and I will use the same directory for the export CSV, which will be created by the cmdlet.

1. Launched PowerShell and ran the following:

Initialize-CrawlLogCheck -ImportPath "c:\test\myfile.csv" -ExportPath "c:\test\myexport.csv" -SiteURL "https://intranet.contoso.com"


 

2. When prompted for credentials, enter the credentials of an account that has access to the Crawl Log


 

3. Observe it run


 

4. Upon completion, you have a new export CSV file; opening it shows something like the sketch below.

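The exact layout comes from the cmdlet, but based on the Crawled and URL Valid columns discussed in the Question and Answer section below, the report looks roughly like this (placeholder URLs):

URL,Crawled,URL Valid
https://intranet.contoso.com/Pages/Default.aspx,Yes,Yes
https://intranet.contoso.com/Docs/Budget.xlsx,Yes,Yes
http://intranet.contoso.com/sites/archive/oldpage.aspx,No,Yes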

 

 

Question and Answer

Question: What does the Crawled column indicate in the export CSV?
Answer: If Crawled is marked as Yes, it means the URL provided is marked as crawled and the item is not marked as deleted.

 

Question: What does the URL Valid column mean?
Answer: It validates that the URL provided starts with http:// or https://

 

Question: Why doesn't the cmdlet work with a proxy during the page request check?
Answer: Proxies often require authentication, and I've seen instances where one code snippet will work and authenticate against one proxy server while the same code snippet fails to authenticate against a different type of proxy server. Yes, it is very frustrating to deal with, although I do have a code sample for doing this. If you are interested in it, please leave feedback.

 

Question: What's the maximum number of URLs attempted?
Answer:   I ran the cmdlet and processed over 3,000 URLs; processing time was fast and memory consumption was minimal.

 

 

Resources

https://technet.microsoft.com/en-us/library/jj219817.aspx

https://msdn.microsoft.com/en-us/library/bb613488(VS.85,loband).aspx

 

Thanks,

Russ Maxwell, MSFT