Search Visibility

 

One scenario we came across: a custom web part on the page pulls data from a PeopleSoft application called "SKILLS". The data is not always there; it is displayed only when we browse the site. The SharePoint account does not have access to the database or any permission on the PeopleSoft application.

 

Below are the concerns:

1. How are we able to crawl the data from PeopleSoft?

2. If we are able to configure it so easily, why is there a need for the BDC?

3. What happens to security if we are able to crawl data from a Finance or HR database?

 

To understand the behaviour, I decided to set up an in-house repro.

 

Repro steps

=============

Installed SQL Server 2005 SP3 and SharePoint 2007 SP2 on two different new servers

Ran the configuration wizard and created a new farm

Created a new web application and a new site collection

Then installed SharePoint Designer and inserted a data view from an AdventureWorks database which is not part of the farm (similar to the PeopleSoft application).

 

Followed the steps in the link below to get a Data View Web Part to display data from a SQL Server 2005 database on a SharePoint page:

https://blogs.technet.com/paulpaa/archive/2010/01/26/data-view-web-part-to-view-data-from-non-sharepoint-database-adventure-works.aspx 

    clip_image001

Then created a crawl rule as below

clip_image002

In Search Visibility, I configured the site as below

 

clip_image003

Note: The data comes from an external data source, and its connection string uses the SQL_SA account, which I created for testing.

 

Created a new content source as below

clip_image004

And started a full crawl

Then, when I searched for the word "Television", I was able to retrieve results.

I changed the search visibility option back to the default, "Do not index ASPX pages if this site contains fine-grained permissions".

clip_image005

Then started a full crawl and searched for the same word, "Television".

I was not able to get results.

 

As another test,

I created a new Data View Web Part to insert a table which had more than 10 rows, as shown below.

clip_image006

In the web part I had Last name as “Bjorn” and when I searched for Bjorn I did not get any results.

 

Explanation: Crawling Process

The indexing process starts with configuring the content source and the start address(es). Indexing begins when the content source is triggered to start, either manually or on a schedule. The spider governs the content discovery and retrieval. The gatherer reads the start address(es) of the content source and loads the protocol handler and IFilter. Once the protocol handler and IFilter are loaded, the content is collected as a stream of text. The data is then passed to the word breaker(s) and continues on to noise word removal before it is added to the index.
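The stages described above can be illustrated with a toy model. This is only a sketch in Python, not SharePoint's implementation: the function names, the noise word list, and the sample URL are all assumptions made up for illustration. It shows the order of the stages: rendered HTML is fetched, reduced to a stream of text, broken into words, stripped of noise words, and added to an index.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Illustrative noise word list (assumption; the real list is configurable).
NOISE_WORDS = {"the", "a", "an", "of", "and", "is"}


class TextExtractor(HTMLParser):
    """Stand-in for an IFilter: collects the text nodes of an HTML page."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(html):
    """Turn rendered HTML into a plain stream of text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def break_words(text):
    """Stand-in for a word breaker: split into lowercase tokens."""
    return [w.lower() for w in re.findall(r"\w+", text)]


def remove_noise(words):
    """Drop noise words before indexing."""
    return [w for w in words if w not in NOISE_WORDS]


def crawl(pages):
    """pages maps URL -> rendered HTML (what a protocol handler would fetch)."""
    index = defaultdict(set)
    for url, html in pages.items():
        for word in remove_noise(break_words(extract_text(html))):
            index[word].add(url)
    return index


# Hypothetical page; the URL is made up for the example.
URL = "http://server/sites/demo/default.aspx"
pages = {URL: "<html><body><h1>The Products</h1><td>Television</td></body></html>"}
index = crawl(pages)
```

In this model the crawler only ever sees the HTML it is handed, which is the point the rest of this post relies on.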

 

Go to the site, then Site Actions -> Site Settings, and under Site Administration look for Search Visibility.

Here you can set how ASPX pages are indexed.

This site does not contain fine-grained permissions. Specify the site's ASPX page indexing behavior:

  • Do not index ASPX pages if this site contains fine-grained permissions
  • Always index all ASPX pages on this site
  • Never index any ASPX pages on this site
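Read literally, the three options amount to a small decision rule. The sketch below models that reading in Python; the enum, the function name, and the `has_fine_grained_permissions` flag are hypothetical names for illustration, not a SharePoint API.

```python
from enum import Enum


class AspxIndexing(Enum):
    # Labels copied from the Search Visibility page.
    IF_NO_FINE_GRAINED = "Do not index ASPX pages if this site contains fine-grained permissions"
    ALWAYS = "Always index all ASPX pages on this site"
    NEVER = "Never index any ASPX pages on this site"


def should_index_aspx(option, has_fine_grained_permissions):
    """Return True if ASPX pages would be indexed under this simplified model."""
    if option is AspxIndexing.ALWAYS:
        return True
    if option is AspxIndexing.NEVER:
        return False
    # Default option: index only when the site has no fine-grained permissions.
    return not has_fine_grained_permissions
```

Under this model, only the "Always" option indexes ASPX pages regardless of how permissions are set on the site.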

 


 

This is what I think might be happening on your side and in the in-house repro that we did.

If we select the option “Always index all ASPX pages on this site”, the crawl engine hits the site through the web front end. The gatherer reads the start address(es) of the content source and loads the protocol handler, which processes the page as an HTML file; the URL and the rendered content are then added to the index. Anything not rendered at that moment does not reach the index, which would explain why we could not find the word Bjorn in the search results.

 

So the assumptions that we had in the beginning were not true. Whatever is displayed on the page when the crawler hits the site is converted to an HTML file when the option “Always index all ASPX pages on this site” is selected.
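A small sketch can show why a value that exists in the source table might still be missing from the index. It assumes, and this is only an assumption about the repro, that the Data View Web Part rendered just the first 10 rows of the table; the function names and sample rows are hypothetical.

```python
import re


def render_data_view(rows, page_size=10):
    """Render only the first page of rows, as a paged data view might (assumption)."""
    visible = rows[:page_size]
    cells = "".join(f"<tr><td>{name}</td></tr>" for name in visible)
    return f"<table>{cells}</table>"


def indexed_words(html):
    """Words a crawler could index from the rendered HTML alone."""
    text = re.sub(r"<[^>]+>", " ", html)
    return {w.lower() for w in re.findall(r"\w+", text)}


# Made-up rows: ten placeholder names, with "Bjorn" as the eleventh row.
rows = [f"Person{i}" for i in range(1, 11)] + ["Bjorn"]
words = indexed_words(render_data_view(rows))
```

Here "Bjorn" is in the table but not in `words`, because the crawler indexes the rendered page rather than the database behind it.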

 

The link below explains the security considerations for search:

https://technet.microsoft.com/en-us/library/cc262033.aspx

 

More details on the Search architecture

https://msdn.microsoft.com/en-us/library/ms570748.aspx

 

Learn how to exclude specific URLs from being displayed in search results.

 Remove URLs from search results (Search Server 2008) 

 

Hope this helps!