As you start to work with Search, you will notice that its architecture has not changed dramatically, so the lessons you learned in SharePoint 2013 should still help you.
Moving parts in Search
- Crawl Component
To retrieve information, the crawl component connects to the content sources by invoking the appropriate indexing connector or protocol handler. After retrieving the content, the crawl component passes crawled items to the content processing component.
For more information about crawling content sources, see Plan crawling and federation in SharePoint Server.
- Content processing component
The component transforms crawled items into artifacts that are included in the search index. The content processing component also writes information about links and URLs to the link database.
For more information about content processing, see Plan crawling and federation in SharePoint Server.
- Analytics processing component
The analytics processing component performs two types of analyses: search analytics and usage analytics.
The results from the analyses are added to the items in the search index. In addition, results from usage analytics are stored in the analytics reporting database.
For more information, see Overview of analytics processing in SharePoint Server.
- Index component
The search index can be divided into discrete portions, called index partitions. The search index is the aggregation of all index partitions. Each index partition holds one or more index replicas that contain the same information.
The index component:
- Receives processed items from the content processing component and writes those items to an index file. Index files are stored on a disk in the server that hosts the index component.
- Receives queries from the query processing component and returns result sets.
For more information about the search schema and the search index, see Overview of the search schema in SharePoint Server.
- Query processing component
The query processing component analyzes and processes queries and results.
For more information, see Plan to transform queries and order results in SharePoint Server.
- Search administration component
The search administration component runs the system processes for search.
- Crawl database
The crawl database stores tracking information and historical information about crawled items.
- Link database
The link database stores information extracted by the content processing component. In addition, it stores information about search clicks, that is, the number of times people click a search result on the search results page. This information is stored unprocessed, to be analyzed by the analytics processing component.
- Analytics reporting database
The analytics reporting database stores the results of usage analytics. In addition, it stores statistics information from the analyses. SharePoint Server uses this information to create Excel reports that show different statistics.
- Search administration database
The search administration database stores search configuration data, such as the topology, crawl rules, query rules, and the mappings between crawled and managed properties. It also stores the access control list (ACL) for the crawl component.
- Search Servers
- Index Component: 80 GB system drive, 500 GB of space for index files, 32 GB RAM, 8 cores
- Analytics Component: 80 GB system drive, 300 GB of space for processing data before it is written to the database, 8 GB RAM, 8 cores
- Other components: 80 GB system drive, 8 GB RAM, 8 cores
NOTE: If you combine any of these components (Analytics, Crawl, Content Processing, Query Processing, or Search Admin) on one server, you need 8 GB of RAM for each component. For example, a server that hosts 3 components needs 24 GB of RAM.
- Database Server: 80 GB system drive, 8-16 GB RAM, 4-8 cores
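The RAM rule in the note can be written as a small calculation. This is only a sketch: the function and component labels are hypothetical names chosen for illustration, and the 8 GB figure is taken from the note above. The index component is deliberately left out, because it has its own 32 GB requirement.

```python
# Illustrative sizing helper. Names are hypothetical; the 8 GB per
# component figure comes from the note above.
RAM_PER_COMPONENT_GB = 8

# Components the note lists as combinable at 8 GB each.
COMBINABLE = {
    "analytics",
    "crawl",
    "content_processing",
    "query_processing",
    "search_admin",
}


def required_ram_gb(components):
    """Return the RAM in GB for a server hosting the given components."""
    distinct = set(components)
    unknown = distinct - COMBINABLE
    if unknown:
        raise ValueError(f"not covered by the 8 GB rule: {sorted(unknown)}")
    return RAM_PER_COMPONENT_GB * len(distinct)


print(required_ram_gb(["analytics", "crawl", "content_processing"]))  # 24
```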
Based on the expectation of more than 20 million items and high availability across all components, I would recommend the following base architecture:
Search Server 1: Query Processing Component, Index Component (Partition 0)
Search Server 2: Index Component (Partition 0)
Search Server 3: Query Processing Component, Index Component (Partition 1)
Search Server 4: Index Component (Partition 1)
Search Server 5: Analytics Component, Content Processing Component, Crawl Component, Admin Component
Search Server 6: Analytics Component, Content Processing Component, Crawl Component, Admin Component
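One way to sanity-check a layout like this is to write it down as data and look for components that live on only one server. The sketch below does that for the six servers listed above. The dictionary, the component labels, and the function name are all illustrative; this is not a SharePoint API.

```python
from collections import Counter

# The six-server layout above, written as data. Labels are illustrative;
# "index:0" means an index replica for partition 0.
TOPOLOGY = {
    "Search Server 1": ["query_processing", "index:0"],
    "Search Server 2": ["index:0"],
    "Search Server 3": ["query_processing", "index:1"],
    "Search Server 4": ["index:1"],
    "Search Server 5": ["analytics", "content_processing", "crawl", "admin"],
    "Search Server 6": ["analytics", "content_processing", "crawl", "admin"],
}


def single_points_of_failure(topology):
    """Return the components that are hosted on fewer than two servers."""
    counts = Counter()
    for components in topology.values():
        counts.update(set(components))  # count each server at most once
    return sorted(name for name, n in counts.items() if n < 2)


print(single_points_of_failure(TOPOLOGY))  # []
```

An empty list means every component, and every index partition, is present on at least two servers.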
To improve full crawl time and result freshness: add more crawl databases and content processing components. Use the Crawl health report to determine bottlenecks, if there are any.
To improve query latency: add more index replicas so that the query load is distributed. Use the Query health report to determine bottlenecks, if there are any.
To improve query latency and throughput: split the index into multiple partitions. Use the Query health report to determine bottlenecks, if there are any.
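As a rough sketch of how an item count turns into partitions and index servers, the arithmetic looks like this. The per-partition capacity and the item count in the example are assumed inputs, not figures from this article, so check the sizing guidance for your SharePoint version before relying on them. The function names are hypothetical.

```python
def partitions_needed(total_items, items_per_partition):
    """Return how many index partitions are needed to hold total_items."""
    if total_items < 0 or items_per_partition <= 0:
        raise ValueError("item counts must be positive")
    # Integer ceiling division; always keep at least one partition.
    return max(1, -(-total_items // items_per_partition))


def index_servers_needed(partitions, replicas_per_partition=2):
    """Return the number of index components, one replica per server."""
    return partitions * replicas_per_partition


# Assumed example inputs: 25 million items, 20 million items per partition.
partitions = partitions_needed(25_000_000, 20_000_000)
print(partitions, index_servers_needed(partitions))  # 2 4
```

With two replicas per partition, each added partition costs two more index servers, which is why splitting the index is usually the more expensive of the two scaling options.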
For redundant crawling and query processing, it is not necessary to have a redundant analytics processing component. However, if the non-redundant analytics processing component fails, the search results will not have optimal relevance until the failure is recovered.
Best Practices for End Users
- Use metadata often, and make sure data is consistent across site collections
- Organize data in hierarchies that naturally flow together and use natural language; the search engine performs linguistic analysis of URLs and file names to improve ranking
- Keep the data structure shallow and simple