Mystery Solved - Crawled Properties in SharePoint (Part 1)

Introduction

Crawled properties in SharePoint are metadata extracted from documents during crawls; which properties are extracted depends on the protocol handler used. Metadata includes information such as author, creation date, subject, and title. Administrators control which crawled properties are mapped to managed properties and, in doing so, can enhance the end-user search experience.

The dilemma arises when you’re trying to figure out which crawled property to map to a managed property. Several of the crawled properties have no descriptive name, only an integer (the propID). This makes it hard to know which crawled property should be mapped to a managed property.

Viewing Crawled Properties

Crawled properties can be seen in the Shared Services Provider’s Metadata Properties page.

1. Launch SharePoint 3.0 Central Administration

2. Select Shared Services Administration

3. Select the Shared Services Provider

a. In the Shared Services Provider, you can reach the categories of the Crawled Properties in one of two ways:

   i. Search Settings

   ii. Search Administration (if the Infrastructure Update is installed)

b. Select the Search Administration link

c. In the Queries and Results heading, select Metadata Properties.

Metadata Properties Link

d. On the Metadata Property Mapping page, select Crawled Properties to see the list of Crawled Property Categories.

e. From here, you can see the Crawled Properties View and the number of properties associated with each category.

NOTE: The number of crawled properties per category may differ from the screenshot due to environment differences.

Crawled Properties View 01

From the screenshot, you can see there are 11 crawled property categories that are available out of the box. Some of the categories are self-explanatory while others may not be so clear. Given that, let’s take a look at what these categories represent.

Basic Category – can contain metadata that is associated with the gatherer, search, core, and storage property sets. In my environment, there are 10 different GUIDs (property sets) in the Basic Crawled Property Category.

Business Data Category – metadata that is associated with content in the Business Data Catalog

Internal Category – metadata internal to SharePoint

Mail Category – this metadata is associated with Microsoft Exchange Server

Notes Category – metadata that is associated with Lotus Notes

Office Category – metadata contained in Microsoft Office documents such as Word, Excel, PowerPoint, etc.

People Category – metadata that is associated with the people profiles in SharePoint. The majority of these are also mapped to various managed properties from Active Directory and SharePoint information.

SharePoint Category – metadata that is part of the Microsoft Office schema available out of the box.

Tiff Category – contains metadata associated mainly with documents that have been scanned or faxed, as well as with word processing and Optical Character Recognition (OCR).

Web Category – HTML metadata associated with web pages

XML Category – includes metadata associated with the XML filter

Deep Dive

We have just completed a very high-level overview of the “out of the box” crawled properties available in Microsoft Office SharePoint Server 2007. From here on, we’re going to dive into each of the crawled property categories, one at a time, to learn exactly what metadata is available to be crawled and what it means. Remember, some of the crawled properties may be different in your environment based on the content being crawled. Also, while I’ve tried to find all of the information, there are still a few crawled properties I simply cannot find.

The best approach to a big challenge is to break it up into smaller chunks. For our first example, let’s look closer at the Crawled Properties View – Basic for the Property Name called Basic:12(Integer).

Crawled Properties View 02

There are two properties with this exact same name where one is mapped to Size and the other isn’t mapped to anything.
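A label such as Basic:12(Integer) appears to pack three pieces into one string: the category, the property name, and the data type. The Python sketch below splits a label on that assumption. The Category:Name(Type) layout is inferred from this single example, and parse_label and CrawledPropertyLabel are hypothetical names used only for illustration; they are not part of SharePoint.

```python
import re
from typing import NamedTuple


class CrawledPropertyLabel(NamedTuple):
    category: str
    name: str
    data_type: str


# Assumed layout: Category:Name(DataType), inferred from "Basic:12(Integer)".
_LABEL = re.compile(
    r"^(?P<category>[^:]+):(?P<name>.+)\((?P<data_type>[^()]+)\)$"
)


def parse_label(label: str) -> CrawledPropertyLabel:
    """Split a crawled property label into category, name, and data type."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized crawled property label: {label!r}")
    return CrawledPropertyLabel(
        match["category"].strip(),
        match["name"].strip(),
        match["data_type"].strip(),
    )


first = parse_label("Basic:12(Integer)")
second = parse_label("Basic:12(Integer) ")  # trailing space is ignored
print(first)
print(first == second)  # True: the label alone cannot tell the two apart
```

Because two crawled properties can share one label, the parsed label is not a unique key on its own; something beyond the label, presumably the Property Set ID shown on the property's detail page, is needed to tell them apart.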

Select the Property Name Basic:12(Integer) that is mapped to Size to see the details of this property.

Crawled Property 12_01

The Name and Information section is the key to finding the meaning behind any of the crawled properties, whether in this category or any of the others. Given that, let’s look closer at the six (6) elements that make up the Name and Information section.

Name and Information Section of a Crawled Property

As you can see, there are six (6) pieces of information that are associated with each crawled property: Property Name, Category, Property Set ID, Variant Type, Data Type, and Multi-valued.

Crawled Property 12_02

Property Name – This is the name the development team gave this property when the program was written. It is hard-coded in the program and cannot be changed.

Category – This is a grouping of crawled properties based on the iFilter and Protocol Handler used to extract the metadata from the content. The category name can be edited, but doing so is not recommended because search functionality will break.

Property Set ID – A GUID that identifies the property set for the crawled property. Searching for the GUID B725F130-47EF-101A-A5F1-02608C9EEBAC and filtering the results for the Property Name of 12 yields several links to related content. One such link, on MSDN, provides a tremendous amount of information. It tells us that this property set is a System property set and that the propID of 12 is the file size:

System.Size

The system-provided file system size of the item, in bytes.

  • propertyDescription
    name = System.Size
    shellPKey = PKEY_Size
    formatID = B725F130-47EF-101A-A5F1-02608C9EEBAC
    propID = 12

  • searchInfo
    inInvertedIndex = true
    isColumn = true
    isColumnSparse = false
    columnIndexType = OnDisk
    maxSize = 128

  • labelInfo
    label = Size
    sortDescription
    invitationText = Add a file size
    hideLabel = false

  • typeInfo
    type = UInt64
    groupingRange = Size
    isInnate = true
    canBePurged
    multipleValues = false
    isGroup = false
    aggregationType = Sum
    isTreeProperty = false
    isViewable = true
    isQueryable = true
    includeInFullTextQuery = false
    conditionType = String
    defaultOperation = Equal

  • aliasInfo
    sortByAlias = None
    additionalSortByAliases = None

  • displayInfo
    defaultColumnWidth = 10
    displayType = Number
    alignment = Right
    relativeDescriptionType = Size
    defaultSortDirection = Descending

  • stringFormat
    formatAs = General

  • booleanFormat
    formatAs = YesNo

  • numberFormat
    formatAs = ByteSize
    formatDurationAs = hh:mm:ss

  • dateTimeFormat
    formatAs = General
    formatTimeAs = ShortTime
    formatDateAs = ShortDate

  • enumeratedList
    defaultText
    useValueForDefault = False

  • enum
    value
    text

  • enumRange
    minValue = 134217729
    setValue = 134217729
    text = >129 MB

  • drawControl
    control = Default

  • editControl
    control = Default

  • filterControl
    control = Default

  • queryControl
    control = Default

Any of the GUIDs can be looked up on MSDN to gain better insight into each of their properties.
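Once a property set GUID and propID have been identified this way, it can help to record the result so the lookup does not have to be repeated. The following Python sketch keeps a small table keyed on the (GUID, propID) pair. Only the System.Size entry comes from this post; KNOWN_PROPERTIES and describe are hypothetical names for illustration, and the table is meant to be extended by hand as more properties are identified.

```python
import uuid

# Known (property set GUID, propID) pairs. Only the entry discussed in this
# post is filled in; add more as you identify them.
KNOWN_PROPERTIES = {
    (uuid.UUID("B725F130-47EF-101A-A5F1-02608C9EEBAC"), 12): "System.Size",
}


def describe(property_set_id: str, prop_id: int) -> str:
    """Return the canonical name for a crawled property, or "unknown"."""
    # uuid.UUID normalizes case and accepts optional surrounding braces.
    key = (uuid.UUID(property_set_id), prop_id)
    return KNOWN_PROPERTIES.get(key, "unknown")


print(describe("b725f130-47ef-101a-a5f1-02608c9eebac", 12))    # System.Size
print(describe("{B725F130-47EF-101A-A5F1-02608C9EEBAC}", 99))  # unknown
```

Keying on the pair rather than on the integer alone matters because the same propID can mean different things in different property sets.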

Variant Type – The variant type defines the type of data for a property, e.g., text, date and time, yes/no, integer, etc. The following table describes some of the variant types used in SharePoint.

Variant Types

Data Type – Corresponds to the Variant Type

Multi-valued – Describes whether this property can hold more than one value. Most of the crawled properties are not multi-valued.
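To make the relationship between Variant Type, Data Type, and Multi-valued concrete, here is a small Python sketch that decodes a numeric variant type. The VT_* codes are the standard Windows VARTYPE values from wtypes.h. The pairing of each code with a SharePoint data type label, and the idea that a multi-valued property carries the VT_VECTOR flag in this field, are assumptions made for illustration and should be checked against your own environment; decode_variant is a hypothetical helper.

```python
# Standard Windows VARTYPE codes (wtypes.h); only a few common ones listed.
VARIANT_TYPES = {
    5: "VT_R8",
    11: "VT_BOOL",
    20: "VT_I8",
    31: "VT_LPWSTR",
    64: "VT_FILETIME",
}

# Assumed pairing of variant type to the Data Type label shown in SharePoint.
DATA_TYPES = {
    "VT_R8": "Decimal",
    "VT_BOOL": "Yes/No",
    "VT_I8": "Integer",
    "VT_LPWSTR": "Text",
    "VT_FILETIME": "Date and Time",
}

VT_VECTOR = 0x1000  # Windows flag bit meaning "array of the base type"


def decode_variant(code: int):
    """Return (variant name, assumed data type, is_vector) for a VARTYPE code."""
    is_vector = bool(code & VT_VECTOR)
    base = code & ~VT_VECTOR
    name = VARIANT_TYPES.get(base, f"unknown ({base})")
    return name, DATA_TYPES.get(name, "unknown"), is_vector


print(decode_variant(20))    # ('VT_I8', 'Integer', False)
print(decode_variant(4127))  # ('VT_LPWSTR', 'Text', True)
```

The table is deliberately short; codes that are not listed come back as "unknown" rather than being guessed.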

Now that we know how to read the information on the properties, in the next post we'll see what they are all about.
