SharePoint 2013 Search: People Search – “Why are my results so bad?” Understanding Relevancy, the Rank Model, Full-Text Index, Fuzzy Matching, and Social Distance

I’ve heard this question come up enough now that I think it warrants deeper examination.  Several customers have complained about “bad results” in people search.  They search for Katherine Doe, but the number-one result is for Cathy Smith and it’s ranked higher than the Katherine Doe result.  While this seems like bad relevancy, if we step back and understand why the relevancy and ranked results are technically correct, we can make some informed choices that will bring us closer to the desired results.

The first step is understanding the difference between Recall and Precision.  The second is understanding how the default People Search Rank Model and the People Full-Text Index, called PeopleIdx, contribute to Recall and Precision. 

Recall and Precision

In the context of information retrieval, or search, Recall and Precision measure the completeness and accuracy of a query’s results.  For our purposes, Recall defines the set of all possible documents that match a query, and Precision is the ordering of that recalled set by relevancy score.  In SharePoint 2013, we can equate Recall to the Full-Text Index and Precision to the Ranking Model.

Full-Text Index

In SharePoint 2013, when users create a Managed Property and set the Searchable property to “true”, they have the opportunity to assign the property to a Full-Text Index in the Advanced Searchable Settings.  SharePoint 2013 has several Full-Text Indexes, but only two are used by default: 

  1. Default: the Full-Text Index for the “Local SharePoint Results” result source or the “Everything” vertical
  2. PeopleIdx: the Full-Text Index for the “Local People Results” result source or the “People” vertical.  All the people-relevant Managed Properties are assigned to the PeopleIdx Full-Text Index

Some simple PowerShell reveals all Managed Properties assigned to PeopleIdx:

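> $ssa = Get-SPEnterpriseSearchServiceApplication   # assumes a single Search Service Application in the farm
> Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa | ? {$_.FullTextIndex -eq "PeopleIdx"} | ft Name, FullTextIndex, Context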

Name               FullTextIndex  Context
----               -------------  -------
AccountName        PeopleIdx      9
AMEXFirstName      PeopleIdx      15
AMEXLastName       PeopleIdx      15
CombinedName       PeopleIdx      14
ContentsHidden     PeopleIdx      5
JobTitle           PeopleIdx      2
Memberships        PeopleIdx      6
NgramPhoneNumbers  PeopleIdx      0
PreferredName      PeopleIdx      1
Pronunciations     PeopleIdx      8
RankingWeightHigh  PeopleIdx      7
RankingWeightLow   PeopleIdx      4
RankingWeightName  PeopleIdx      13
Responsibilities   PeopleIdx      3
SipAddress         PeopleIdx      12
UserName           PeopleIdx      10
WorkEmail          PeopleIdx      11

 

These are all the Managed Properties used for Recall in a people search.  The Full-Text Index and the Rank Model are only loosely coupled: a Rank Model may not name every Managed Property assigned to a Full-Text Index.  Properties the Rank Model leaves out still contribute to recall; they simply carry no weighting.  Also, any Managed Property that shares a context with a property weighted in the Rank Model receives the same weighting during rank score calculation.  So, if you add a Managed Property named Interests and assign it to context 3, and the Rank Model weights Responsibilities (e.g., w="1.0"), which is also in context 3, Interests will receive the same weighting, despite not being named in the Rank Model.
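One way to spot properties that would silently share a weighting is to group the PeopleIdx properties by context. This is a rough sketch that reuses only the cmdlet and the FullTextIndex and Context properties shown in the listing above. It assumes $ssa already holds your Search Service Application and that you are running in the SharePoint 2013 Management Shell.

```powershell
# Assumes $ssa was set earlier, e.g. $ssa = Get-SPEnterpriseSearchServiceApplication
Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa |
    Where-Object { $_.FullTextIndex -eq "PeopleIdx" } |
    Group-Object Context |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object {
        # One line per shared context: the context number and the properties in it
        "Context {0}: {1}" -f $_.Name, (($_.Group | ForEach-Object { $_.Name }) -join ", ")
    }
```

Against the listing above, this should report only context 15, which is shared by AMEXFirstName and AMEXLastName.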

People Search Rank Models

The default People Search Rank Model is named “People Search application ranking model” (https://technet.microsoft.com/library/7c8ddec1-c8ff-4a90-afae-387b27a653f1.aspx#Ranking_Models).  Rank Models provide Precision or Relevancy Ranking for a recalled search result set.  Note that of the 16 out-of-the-box (OOB) rank models, six are specific to People Search.  One of these may serve your needs better than the default.  To choose the best model for you, you’ll need to look beyond each rank model’s description and understand its configuration.

PowerShell provides some nifty cmdlets that export the rank model configuration XML (https://msdn.microsoft.com/en-us/library/office/dn169052(v=office.15).aspx#sp15_using_custom_ranking_model).  If you look at the XML config for the default People Search ranking model, you should notice a couple of things.  First, the model has two stages (<RankingModel2NN>).  The first stage is a lighter calculation used to sort all the results that were matched, or recalled, from the search index.  The second stage re-sorts the top 1000 results using a more complex algorithm that adds proximity weights to the calculation.  Not all ranking models have two stages; some (e.g. People Search name ranking model) have only one.  Second, the ranking model names all the fields that will be weighted, along with their weight values and a bias value.
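A minimal sketch of that export, following the pattern in the linked MSDN article. It assumes a single Search Service Application, must be run in the SharePoint 2013 Management Shell, and writes to a hypothetical path (C:\Temp\PeopleRankModel.xml) that you would change to suit your environment.

```powershell
$ssa   = Get-SPEnterpriseSearchServiceApplication   # assumes one Search Service Application
$owner = Get-SPEnterpriseSearchOwner -Level SSA

# List every rank model so the People Search ones can be picked out by name
$models = Get-SPEnterpriseSearchRankingModel -SearchApplication $ssa -Owner $owner
$models | Format-Table Name, Id, IsDefault

# Export the configuration XML of the default People Search model
$people = $models | Where-Object { $_.Name -eq "People Search application ranking model" }
$people.RankingModelXML | Out-File "C:\Temp\PeopleRankModel.xml"   # hypothetical path
```

Changing the name in the Where-Object filter exports any of the other People Search models for side-by-side comparison.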

Here is an excerpt of the properties definition and weightings from the default people search rank model.

       <Properties>
          <Property name="RankingWeightName" w="0.5" b="0.5" propertyName="RankingWeightName" />
          <Property name="PreferredName" w="1.0" b="0.5" extractOccurrence="1" propertyName="PreferredName" />
          <Property name="JobTitle" w="2.0" b="0.5" extractOccurrence="1" propertyName="JobTitle" />
          <Property name="Responsibilities" w="1.0" b="0.85714285719" extractOccurrence="1" propertyName="Responsibilities" />
          <Property name="RankingWeightLow" w="0.2" b="0.85714285719" extractOccurrence="1" propertyName="RankingWeightLow" />
          <Property name="ContentsHidden" w="0.1" b="0.85714285719" extractOccurrence="1" propertyName="ContentsHidden" />
          <Property name="Memberships" w="0.25" b="0.85714285719" extractOccurrence="1" propertyName="Memberships" />
          <Property name="RankingWeightHigh" w="2.0" b="0.5" extractOccurrence="1" propertyName="RankingWeightHigh" />
          <Property name="Pronunciations" w="0.05" b="0.5" propertyName="Pronunciations" />
        </Properties>
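To compare the fields at a glance, the excerpt can be loaded into PowerShell and sorted by weight. This is only a sketch: it reads the w, b and propertyName attributes visible above, and the here-string stands in for a file you exported yourself.

```powershell
# The excerpt above, pasted as a here-string (on a farm, load your exported XML instead)
[xml]$excerpt = @'
<Properties>
  <Property name="RankingWeightName" w="0.5" b="0.5" propertyName="RankingWeightName" />
  <Property name="PreferredName" w="1.0" b="0.5" extractOccurrence="1" propertyName="PreferredName" />
  <Property name="JobTitle" w="2.0" b="0.5" extractOccurrence="1" propertyName="JobTitle" />
  <Property name="Responsibilities" w="1.0" b="0.85714285719" extractOccurrence="1" propertyName="Responsibilities" />
  <Property name="RankingWeightLow" w="0.2" b="0.85714285719" extractOccurrence="1" propertyName="RankingWeightLow" />
  <Property name="ContentsHidden" w="0.1" b="0.85714285719" extractOccurrence="1" propertyName="ContentsHidden" />
  <Property name="Memberships" w="0.25" b="0.85714285719" extractOccurrence="1" propertyName="Memberships" />
  <Property name="RankingWeightHigh" w="2.0" b="0.5" extractOccurrence="1" propertyName="RankingWeightHigh" />
  <Property name="Pronunciations" w="0.05" b="0.5" propertyName="Pronunciations" />
</Properties>
'@

# Turn each <Property> element into a simple object and sort by weight, heaviest first
$ranked = $excerpt.Properties.Property | ForEach-Object {
    [pscustomobject]@{
        Property = $_.GetAttribute('propertyName')
        Weight   = [double]$_.GetAttribute('w')
        Bias     = [double]$_.GetAttribute('b')
    }
} | Sort-Object Weight -Descending

$ranked | Format-Table -AutoSize
```

Sorted this way, JobTitle and RankingWeightHigh (w=2.0) sit at the top, PreferredName and Responsibilities share w=1.0, and Pronunciations (w=0.05) comes last.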

Social Distance

An important note needs to be made here about Social Distance Ranking. As you explore the default people ranking model, you will notice dynamic ranking features such as FirstLevelColleagues, SecondLevelColleagues and LevelsToTop. These are part of the Social Distance ranking feature, and their intention is to add a relevancy boost to colleagues and management close to you, with respect to your company’s organizational chart. The issue is that, out of the box, it doesn’t work correctly.** I’ve heard conflicting reports that recent patches have fixed this issue for both SPO and On-prem SharePoint environments. I have not been able to verify. Please test thoroughly.  

If it is not working, an examination of the ULS log for a “default” People search query will reveal the following entry:

07/17/2014 01:13:34.85 NodeRunnerQuery1-963b71f2-33dc- (0x1280) 0x1584 Search Query Processing aiziq Monitorable Microsoft.Office.Server.Search.Query.Pipeline.Processing.PersonalizationDataInjectionEvaluator : Field: PersonalizationData 'null'. Personalized Search queries will not work 0be6b1d5-3bd8-4aac-a1f1-08d7c75867cc

The web part is not setting the “PersonalizationData” parameter, and the null parameter prevents Social Distance Ranking from being calculated.  Fortunately, there are three workarounds, and the last is the simplest to implement.  For all of them, verify that the Web Application hosting the Enterprise Search center is associated with a UPSA proxy (this association should work OOB with a 2013 proxy, but not with a 2010 proxy).

  1. Use the Query REST API and add the “PersonalizationData” parameter with the search user’s User Profile GUID as the value.
  2. Create a custom search web part and add the User Profile GUID.  SCOM uses the parameter name “QueryPersonalizationData”.
  3. Edit the People Search results page, edit the People Search Core Results web part, and click Change Query:

Old:

{searchboxquery}

New:

{?(({searchboxquery} XRANK(cb=2) firstlevelcolleagues:{User.userprofile_guid}) XRANK(cb=1) secondlevelcolleagues:{User.userprofile_guid}) XRANK(cb=7.5) userprofile_guid:{User.userprofile_guid}}

Or if you want to get real fancy:

 {?((((((({searchboxquery} XRANK(cb=2) firstlevelcolleagues:{User.userprofile_guid}) XRANK(cb=1) secondlevelcolleagues:{User.userprofile_guid}) XRANK(cb=7.5) userprofile_guid:{User.userprofile_guid}) XRANK(cb=1) levelstotop:1) XRANK(cb=0.8) levelstotop:2) XRANK(cb=0.6) levelstotop:3) XRANK(cb=0.4) levelstotop:4) XRANK(cb=0.2) levelstotop:5}
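Both templates follow one pattern: each XRANK clause wraps everything before it in parentheses. The sketch below only assembles the string, so the nesting is easier to read and extend. It is plain string building rather than any SharePoint API, and the boost values are the ones from the first template.

```powershell
# Each entry: the boost constant (cb) and the condition that earns it
$boosts = @(
    @{ Cb = '2';   Match = 'firstlevelcolleagues:{User.userprofile_guid}' },
    @{ Cb = '1';   Match = 'secondlevelcolleagues:{User.userprofile_guid}' },
    @{ Cb = '7.5'; Match = 'userprofile_guid:{User.userprofile_guid}' }
)

$expr = '{searchboxquery}'
for ($i = 0; $i -lt $boosts.Count; $i++) {
    # Wrap everything built so far before appending the next XRANK clause
    if ($i -gt 0) { $expr = '(' + $expr + ')' }
    $expr = '{0} XRANK(cb={1}) {2}' -f $expr, $boosts[$i].Cb, $boosts[$i].Match
}

# '{?' ... '}' is the wrapper used by the templates above
$template = '{?' + $expr + '}'
$template
```

Appending five more entries for levelstotop:1 through levelstotop:5, with cb values 1, 0.8, 0.6, 0.4 and 0.2, yields the longer template.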

Note that the XRANK operator can have an impact on query latency, so test this (or any solution) before putting it into production. Also, note that latency is directly tied to index size, so what performs well in a small test environment may perform poorly in a very large production environment.
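For the first workaround (the Query REST API), the request might be assembled along these lines. Treat it as a sketch under assumptions: the site URL and GUID are placeholders, and the lower-case personalizationdata spelling follows the usual REST query-string convention, so check it against the Search REST API documentation for your build. You would also need to point the query at your People result source, for example with the sourceid parameter.

```powershell
$searchCenter = 'https://sharepoint.example.com/search'   # hypothetical site URL
$profileGuid  = '00000000-0000-0000-0000-000000000000'    # placeholder User Profile GUID
$queryText    = [uri]::EscapeDataString('Katherine Doe')  # URL-encode the query text

# Query text and GUID are passed as single-quoted values in the query string
$url = "$searchCenter/_api/search/query?querytext='$queryText'&personalizationdata='$profileGuid'"
$url

# On a farm, the request could then be sent with, for example:
# Invoke-RestMethod -Uri $url -UseDefaultCredentials
```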

**A big thanks to Mikkel Conradi for doing all the leg work on this issue.

“Why are my results so bad?”  

Back to the Katherine Doe/Cathy Smith scenario where searching for Katherine Doe produces results with Cathy Smith at the top.

In reality, the results aren’t bad.  They are exactly what they should be.  Understanding the metadata and how it’s used in the rank model and Full-Text Index will explain why.  First, some background for our scenario.  Katherine Doe is an Executive and Cathy Smith is her Assistant.  Cathy Smith’s User Profile lists “Executive Assistant for Katherine Doe” under Responsibilities, and under Memberships it shows that Cathy is a member of the “Katherine Doe Direct Reports” group.

If we refer to the default People Rank Model, we see that it weights Memberships and Responsibilities along with PreferredName.  So a search for Katherine Doe produces hits in Cathy Smith’s Memberships and Responsibilities Managed Properties, while there are no hits in the corresponding Managed Properties for Katherine Doe (Katherine’s name does not appear in these fields: she is not a member of her own direct reports group, and she is not her own Executive Assistant).  The hits in the non-name fields are often forgotten or overlooked because they are not displayed in the default Item_Person.html display template and are only partially displayed in the Item_Person_HoverPanel.html display template.  Katherine does get a hit, for both first and last name, in the PreferredName field, but so does Cathy, at least on the first name.  Cathy is a nickname for Katherine, so Fuzzy Matching, which is enabled by default on People Searches, adds “Cathy” to the query as a nickname synonym for the term Katherine.
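A toy calculation makes the effect concrete. This is deliberately not SharePoint’s ranking math (the real model is considerably more involved). It is a plain weighted count of matching fields, using the w values for PreferredName, Responsibilities and Memberships from the excerpt earlier, with “cathy” added by hand to stand in for the nickname expansion.

```powershell
# Toy illustration only: a weighted hit count, not the real rank calculation
$weights = @{ PreferredName = 1.0; Responsibilities = 1.0; Memberships = 0.25 }

# Each query term with its variants ("cathy" stands in for the nickname synonym)
$queryTerms = @(
    @('katherine', 'cathy'),
    @('doe')
)

$profiles = @(
    @{ PreferredName = 'Katherine Doe'; Responsibilities = ''; Memberships = '' },
    @{ PreferredName = 'Cathy Smith'
       Responsibilities = 'Executive Assistant for Katherine Doe'
       Memberships = 'Katherine Doe Direct Reports' }
)

function Get-ToyScore($userProfile) {
    $score = 0.0
    foreach ($field in $weights.Keys) {
        # Lower-case the field and split it into words, dropping empty entries
        $tokens = ([string]$userProfile[$field]).ToLower() -split '\W+' | Where-Object { $_ }
        foreach ($variants in $queryTerms) {
            # A term counts once per field if any of its variants appears there
            if ($variants | Where-Object { $tokens -contains $_ }) {
                $score += $weights[$field]
            }
        }
    }
    return $score
}

$profiles | Sort-Object { Get-ToyScore $_ } -Descending |
    ForEach-Object { '{0}: {1}' -f $_.PreferredName, (Get-ToyScore $_) }
```

Under these assumptions Cathy Smith scores 3.5 (1.0 for the nickname hit in PreferredName, 2.0 from Responsibilities, 0.5 from Memberships) against 2.0 for Katherine Doe, which is the ordering the customers complained about.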

Fuzzy Matching

To be honest, the documentation on the Fuzzy Matching feature in SharePoint 2013 is quite… fuzzy.  Often phonetic matching is mentioned when talking about SP 2013 Fuzzy Matching components, but to be clear, there is no intrinsic phonetic matching in SP 2013.  It did exist in SharePoint 2010, but it is not in SP 2013. 

SharePoint 2013 Fuzzy Matching consists of three services: (1) Core Fuzzy Name Search, (2) Name Suggestions, and (3) Name Intent Search.  The last, Name Intent Search, is only used as part of a Query Rule condition that identifies People Queries from a “non-People” vertical.  Core Fuzzy Name Search is a spelling or distance algorithm that measures the similarity of terms.  Results from this service can look a lot like a phonetic match, but they’re not.  For example, the query ‘katherine’ will return results for what appears to be its phonetic match ‘catherine’.  Examination of verbose ULS logs reveals the real story.  The term ‘catherine’ is added to the query because of its spelling similarity or distance from ‘katherine’, not for its phonetic equivalency.  Additionally, not only is ‘catherine’ considered, but so are ‘gathering’ and ‘katharine’.

06/04/2015 12:14:49.21 NodeRunnerQuery1-aaaaedc2-86d0- (0x0F24) 0x129C Search Fuzzy Name Search ajyg0 VerboseEx FuzzyNameSearcher : CoreCCAFuzzySearcher mined [Candidate:catherine GeometricSimilarity: 0.999740619082072 NormalizedConfidence:0.999740619082072] for Query:katherine IsComparable:true 531f0d9d-9c5c-c0a8-27b6-5ef65f3bbadb

06/04/2015 12:14:50.58 NodeRunnerQuery1-aaaaedc2-86d0- (0x0F24) 0x12B4 Search Query Processing aizf6 High Microsoft.Office.Server.Search.Query.Pipeline.Executors.LinguisticQueryProcessingExecutor : QSC: All Annotations: <Annotation ID="1" Name="token" Range="[0,9)" Attributes={normalizedForm="katherine"} NumericalAttributes={}/>,<Annotation ID="2" Name="querysegment" Range="[0,9)" Attributes={} NumericalAttributes={}/>,<Annotation ID="2001" Name="spellcheck" Range="[0,9)" Attributes={Fuzzy="catherine",Score="0.888888888888889",dynamic="",value="5"} NumericalAttributes={}/>,<Annotation ID="2002" Name="spellcheck" Range="[0,9)" Attributes={Fuzzy="gathering",Score="0.777777777777778",dynamic="",value="5"} NumericalAttributes={}/>,<Annotation ID="2003" Name="spellcheck" Range="[0,9)" Attributes={Fuzzy="katharine",Score="0.888888888888889",dynamic="",value="6"} NumericalAttributes={}/>,<Annotation ID="2004" Name="spellcheck" Range="[0,9)" Attributes={Fuzzy="katherine",Score="1",dynamic="",value="5"} NumericalAttributes={}/>,<Annotation ID="4001" Name="qsc_known_word" Range="[0,9)" Attributes={value="9"} NumericalAttributes={}/> 531f0d9d-6cb7-c0a8-27b6-5022c0ff2a0d
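The Score values in that second entry are consistent with a simple spelling-distance measure: one minus the edit distance divided by the term length. That is an observation about these four values, not a documented formula, and the GeometricSimilarity figure in the first entry evidently comes from a different calculation. A small sketch using the classic Levenshtein distance (my choice of algorithm, not necessarily what SharePoint uses internally):

```powershell
function Get-EditDistance([string]$a, [string]$b) {
    # Classic Levenshtein distance, keeping only the previous row of the table
    $prev = @(0..$b.Length)
    for ($i = 1; $i -le $a.Length; $i++) {
        $curr = @($i) * ($b.Length + 1)      # first cell is $i; the rest are overwritten below
        for ($j = 1; $j -le $b.Length; $j++) {
            $cost = if ($a[$i - 1] -eq $b[$j - 1]) { 0 } else { 1 }
            $curr[$j] = [Math]::Min(
                [Math]::Min($prev[$j] + 1, $curr[$j - 1] + 1),   # deletion, insertion
                $prev[$j - 1] + $cost)                           # substitution or match
        }
        $prev = $curr
    }
    return $prev[$b.Length]
}

function Get-Similarity([string]$a, [string]$b) {
    # 1 means identical; each edit costs 1 / (length of the longer term)
    $longest = [Math]::Max($a.Length, $b.Length)
    if ($longest -eq 0) { return 1.0 }
    return 1 - (Get-EditDistance $a $b) / $longest
}

foreach ($candidate in 'catherine', 'gathering', 'katharine', 'katherine') {
    '{0}: {1}' -f $candidate, (Get-Similarity 'katherine' $candidate)
}
```

‘catherine’ and ‘katharine’ are each one substitution away from ‘katherine’ (8/9), while ‘gathering’ is two away (7/9), which lines up with the logged 0.888… and 0.777… scores and shows why a non-name like ‘gathering’ can ride along.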

Most of the confusion comes from the now poorly named “EnablePhonetics” parameter, found in a Search web part’s DataProviderJSON configuration or in the Query Object Model.  The vestigial “EnablePhonetics” parameter enables Core Fuzzy Name Searching, which unfortunately has no phonetic matching features.  Name Suggestions is the nickname synonym service.  Names found in the nickname dictionary will have their nickname mappings added to the query.

Use the following PowerShell to see what nicknames are handled by the service:

> $owner = Get-SPEnterpriseSearchOwner -Level SSA
> Get-SPEnterpriseSearchLanguageResourcePhrase -type NickName -Language en-US -SearchApplication $ssa -Owner $owner | ft Phrase, Mapping

Conclusion

So what to do about the Katherine Doe/Cathy Smith scenario?  First, decide on the purpose of your “People” search vertical.  Is it a people name directory lookup?  An expertise search?  Should the sorting be influenced by social distance or your organizational chart?  Then pick your Ranking Model.  You might find that the “People Search name ranking model” better suits your needs than the default “People Search application ranking model.”  Take time to look at which Managed Properties each ranking model weighs most heavily.  You might even notice that the “People Search expertise ranking model” is very poorly named.