Thursday, March 23, 2017

Enterprise Search - SharePoint 2013

❖ What’s New in Enterprise Search

✓ Single Search Architecture
SharePoint 2013 is based on a single search architecture, which includes a single search service application. FAST Search and FAST technology are now native to the product rather than a separate product. This is a complete re-architecture, and many of the components are brand-new or revised.
✓ Search Center and Search UI
-          The new look for SharePoint 2013 is also displayed in the Search Center.
-          The Search Center includes deep refiners with exact counts, and document previews through the new “take a look inside” functionality, available in the new hover panel and powered by the new Office Web Apps.
-          Search verticals (the new name for what used to be the tabbed interface)
-          Four new web parts
✓ Relevancy Improvements
SharePoint 2013 improves relevance in areas such as freshness of search results, linguistics, and document parsing.

❖ Search Architecture

SharePoint 2013 search has been re-architected, and the goal of achieving a single enterprise search platform has introduced a number of changes. You can consider SharePoint 2013 search to be a combination of SharePoint 2010 search, FAST Search Server 2010 for SharePoint, and FAST technology.
The Search topology has several key improvements:
-          Separate crawl and indexing processes
-          A new analytics process that provides search and usage analyses, including link analysis and recommendations
-          The entire index is stored locally on disk, and it no longer uses the property database
-          Search is scalable in two dimensions, content and query load
-          The administration component can be made fault tolerant
-          Native support for repartitioning the index as part of scaling out the topology

✓ Topology
The topology can be broken down into search components and databases that work together to provide search capability as shown below.
In a multi-server farm, these components reside on application servers, and the databases exist on SQL Server database servers.
The search components can be categorized into five groups or processes:
Crawl and content – Includes the crawl and content processing components and the crawl database
Analytics – Includes the analytics processing component, and the links and analytics reporting databases
Index – Includes the index component, index partition, and index replica
Query – Includes the query processing component
Administration – Includes the administration component and the administration database



✓ Managing the crawl process and crawled properties
-          The whole process begins with the crawl component, which is also referred to as the crawler.
-          This component crawls the content sources, and it delivers the crawled content and associated metadata to the content processing component.
-          To manage crawl volume and performance, you can simultaneously crawl content using multiple crawl components.
-          As the crawler processes data, it caches the content locally in preparation for sending it to the content processing component.
-          The crawl component uses one or more crawl databases to temporarily store information about crawled items and to track crawl history.
-          There is no longer a one-to-one mapping of the crawl database to crawler as in SharePoint 2010; each crawl database can be associated with one or more crawlers, so you can scale them independently.
-          To support the need for a “fresher” index, SharePoint 2013 includes a new crawl type, the continuous crawl. The continuous crawl is applicable only to SharePoint content sources, and is a new option you can choose when you create a new content source.
-          Continuous crawl is like the incremental crawl but without the need to be scheduled. With continuous crawl, changed content is crawled every 15 minutes by default, but the frequency is configurable.
-          As in SharePoint 2010, all crawler configurations are stored in the administration database.
-          The content and metadata that have been crawled and extracted from a document or URL are represented as crawled properties. New crawled properties are created after each new crawl, as new content is added to the enterprise. Crawled properties are passed to the content processing component for further analysis.
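
The crawl history idea above can be pictured with a small Python sketch. This is only a toy model and not SharePoint code: `CrawlDatabase`, `crawl`, the `version` field, and the sample URLs are all hypothetical names invented for illustration. It shows a crawler that records what it has seen in a crawl database, emits crawled properties for each item, and picks up only changed items on the next pass, which is roughly the behaviour incremental and continuous crawls rely on.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlDatabase:
    """Toy stand-in for a crawl database: remembers the last-seen version of each item."""
    history: dict = field(default_factory=dict)


def crawl(source: dict, crawl_db: CrawlDatabase) -> list:
    """Return (url, crawled_properties) pairs for items that changed since the last crawl."""
    changed = []
    for url, item in source.items():
        if crawl_db.history.get(url) != item["version"]:
            crawl_db.history[url] = item["version"]
            # Everything extracted from the item is represented as crawled properties.
            changed.append((url, dict(item["metadata"])))
    return changed


# Hypothetical content source with two documents.
source = {
    "http://intranet/docs/a.docx": {"version": 1, "metadata": {"Author": "Kim", "Title": "Plan"}},
    "http://intranet/docs/b.docx": {"version": 1, "metadata": {"Author": "Lee", "Title": "Budget"}},
}
db = CrawlDatabase()
first_pass = crawl(source, db)    # both items are new, so both are crawled
source["http://intranet/docs/b.docx"]["version"] = 2
second_pass = crawl(source, db)   # only the changed item is picked up
```

Because the history lives in the database object rather than in the crawler function, several crawlers could share one `CrawlDatabase` in this model, which mirrors the point that crawlers and crawl databases are no longer tied one-to-one.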



✓ Content Processing
-          This is a very specialized node in the search architecture, whose purpose is to analyze and process the data and metadata that will be included in the index.
-          The processing node transforms the crawled items and crawled properties using language detection, document parsing, dictionaries, property mapping and entity extraction.
-          This component is also responsible for mapping crawled properties to managed properties.
-          When a user performs a search, and clicks a result, the click-through information is also stored unprocessed in the link database.
-          All this raw data is subsequently analyzed by the analytics processing component, which updates the index with relevancy information.
-          Once completed, the transformed data is then sent to the index component.
-          Content processing configurations are stored in the search administration database.
-          The content processing component is also highly extensible: it can call out to web services that provide information about how content should be processed.
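
As a rough illustration of the processing steps listed above, here is a minimal Python sketch of a pipeline made of small stages. It is an assumption-laden toy: the stage functions, the stop-word heuristic, the company dictionary, and the property names are all made up for this example and say nothing about how SharePoint implements these steps internally.

```python
# Hypothetical dictionary used for entity extraction.
COMPANY_DICTIONARY = ("Contoso", "Fabrikam")
# Hypothetical crawled-to-managed property mapping.
PROPERTY_MAPPING = {"Author": "author", "Title": "title"}


def detect_language(item):
    # Toy heuristic based on a few German stop words; real detection is far more involved.
    words = set(item["body"].lower().split())
    item["language"] = "de" if words & {"der", "die", "und"} else "en"
    return item


def extract_entities(item):
    # Dictionary lookup: keep every known company name that appears in the body.
    item["companies"] = [name for name in COMPANY_DICTIONARY if name in item["body"]]
    return item


def map_properties(item):
    # Only crawled properties that have a mapping end up as managed properties.
    item["managed"] = {
        managed: item["crawled"][crawled]
        for crawled, managed in PROPERTY_MAPPING.items()
        if crawled in item["crawled"]
    }
    return item


PIPELINE = [detect_language, extract_entities, map_properties]


def process(item):
    """Run a crawled item through every stage in order."""
    for stage in PIPELINE:
        item = stage(item)
    return item


processed = process({
    "body": "Contoso signed the contract",
    "crawled": {"Author": "Kim", "Title": "Deal", "InternalId": "42"},
})
```

The unmapped `InternalId` value never reaches `processed["managed"]`, which is the behaviour the next section on managed properties describes.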

Managed Properties
-          Crawled properties are mapped to managed properties to include the content and metadata in the search index.
-          Only managed properties are included in the index; therefore, users can search only on managed properties.
-          Managed properties have attributes, which determine how the contents are shown in the search results.
-          These attributes are also referred to as properties; yes, the managed properties have properties.
-          The list of default managed properties, also referred to as the search schema or index schema, contains the managed properties, their associated properties, and the mapping between crawled properties and managed properties.
-          You can edit the search schema yourself, manually mapping crawled properties to managed properties, and configuring property settings.
-          The content-processing component utilizes this schema to perform any necessary mapping.
-          A single managed property can be mapped to more than one crawled property.
-          You can also map a single crawled property to multiple managed properties.
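
The many-to-many relationship between crawled and managed properties can be sketched in a few lines of Python. The `SCHEMA` dictionary and the property names (`doc_author`, `mail_from`, `mail_size`) are hypothetical and chosen only for illustration; the sketch simply shows that one managed property can draw on several crawled properties, one crawled property can feed several managed properties, and anything left unmapped is not searchable.

```python
# Hypothetical schema: managed property -> crawled properties that feed it.
SCHEMA = {
    "Author": ["doc_author", "mail_from"],   # one managed property, two crawled properties
    "Contact": ["mail_from"],                # the same crawled property feeds a second managed property
}


def build_index_entry(crawled):
    """Keep only values that map to a managed property; unmapped crawled properties are dropped."""
    entry = {}
    for managed, sources in SCHEMA.items():
        values = [crawled[name] for name in sources if name in crawled]
        if values:
            entry[managed] = values
    return entry


def search(index, managed_property, term):
    """Return the ids of documents whose managed property contains the term."""
    return [doc_id for doc_id, entry in index.items()
            if term in entry.get(managed_property, [])]


index = {
    "mail-1": build_index_entry({"mail_from": "kim", "mail_size": "12kb"}),
    "doc-1": build_index_entry({"doc_author": "lee"}),
}
```

In this model `search(index, "Author", "kim")` finds `mail-1`, while a search on `mail_size` finds nothing, because that crawled property was never mapped and so never reached the index.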

Search Schema
The search schema is stored in the search administration database and is managed through the schema web page, which is called Search Service Application: Managed Properties, shown in the picture below.

This page is available from the Search Service Application: Search Administration page in Central Administration, using the Search Schema link in the Queries and Results section.
Key points to remember about the search schema include the following:
-          It contains the mapping between crawled properties and managed properties, including the order of mapping for those cases that have mapped multiple crawled properties.
-          It maintains the settings for which index stores the managed property.
-          It contains the settings or properties for each of the different managed properties.
-          Site collection administrators can change the search schema for a particular site collection using the site settings page, and customize the search experience for that specific site collection. This is a new capability in SharePoint 2013.
-          It is possible to have multiple search schemas.
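
The order of mapping mentioned in the first point matters when several crawled properties feed one managed property. A minimal sketch of the two behaviours, taking the first non-empty crawled property in the configured order or including the content of all of them, could look like this. The function name and the crawled property names are hypothetical.

```python
def resolve_managed_value(crawled, ordered_sources, include_all=False):
    """Pick the value(s) for a managed property from its ordered crawled properties."""
    # Skip crawled properties that are missing or empty, keeping the configured order.
    values = [crawled[name] for name in ordered_sources if crawled.get(name)]
    return values if include_all else values[:1]


# Hypothetical crawled properties for one document; the title is empty.
crawled = {"doc_title": "", "file_name": "plan.docx", "url_leaf": "plan"}
order = ["doc_title", "file_name", "url_leaf"]

first_only = resolve_managed_value(crawled, order)
everything = resolve_managed_value(crawled, order, include_all=True)
```

Here `first_only` falls through the empty `doc_title` and takes the file name, while `everything` keeps every non-empty value in mapping order.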

✓ Analytics Processing
This is a brand-new component for the search architecture. Its purpose is to analyze the content and how users interact with the content to improve search relevance, create search reports and recommendations, and create deep links.
The analytics component analyzes two different types of information:
·         Search Analytics – Information from crawled items that is stored in the index
·         Usage Analytics – Information about how users interact with the search results, such as how many times an item is viewed.
The Web Analytics capability in SharePoint 2010 has been discontinued and replaced by the analytics processing component in SharePoint 2013. This change was necessary to increase performance and scalability. The analytics component provides additional capabilities such as reports of top items, recommendations, and dynamic improvement of search result relevancy.
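
To make the usage analytics idea concrete, here is a small Python sketch that counts view events, produces a "top items" report, and nudges result order by popularity. The event list, item names, and the ranking rule are assumptions for illustration; the real relevance model is much richer than a view count.

```python
from collections import Counter

# Hypothetical usage events: (user, item viewed).
view_events = [
    ("kim", "doc-a"), ("lee", "doc-a"), ("kim", "doc-b"),
    ("ana", "doc-a"), ("lee", "doc-b"), ("ana", "doc-c"),
]

# Usage analytics: how many times each item was viewed.
view_counts = Counter(item for _user, item in view_events)


def top_items(n):
    """Report the n most viewed items."""
    return [item for item, _count in view_counts.most_common(n)]


def rerank(results):
    """Order results so that more frequently viewed items come first."""
    return sorted(results, key=lambda item: view_counts[item], reverse=True)
```

With the sample events, `top_items(2)` reports `doc-a` and `doc-b`, and `rerank` moves a heavily viewed item ahead of one that is rarely opened.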

✓ Index Processing
The index is the key to providing the best search experience, as its content determines what users find when executing search queries. SharePoint 2013 Search is a data access technology, because it provides access to information beyond just the search box query. The index component receives crawled and processed content and this information is added to the search index. This component also handles incoming queries, retrieves information from the search index, and sends back the result set to the query processing component.
The index processing architecture can be divided into:
·         Index partition
·         Index replica
·         Index component
SharePoint 2013 stores the entire index on disk. Search capability is scaled using index partitions and index replicas; the “rows and columns” terminology from SharePoint 2010 is gone.
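
The two scaling dimensions can be sketched with a toy index in Python: partitions split the content, and replicas are copies of a partition that share the query load. The `ToyIndex` class is hypothetical, and placing documents by a CRC32 hash is an assumption made for this example, not a description of how SharePoint distributes items.

```python
import zlib


class ToyIndex:
    def __init__(self, partitions, replicas):
        # self.replicas[p][r] is replica r of partition p; replicas of one partition hold the same documents.
        self.replicas = [[{} for _ in range(replicas)] for _ in range(partitions)]
        self._next = 0  # round-robin counter used to spread queries over replicas

    def _partition_for(self, doc_id):
        # Assumed placement rule: a stable hash of the document id picks the partition.
        return zlib.crc32(doc_id.encode()) % len(self.replicas)

    def add(self, doc_id, text):
        # Every replica of the chosen partition receives the document.
        for replica in self.replicas[self._partition_for(doc_id)]:
            replica[doc_id] = text

    def query(self, term):
        # A query must visit every partition, but only one replica of each.
        hits = []
        for partition in self.replicas:
            replica = partition[self._next % len(partition)]
            hits += [doc_id for doc_id, text in replica.items() if term in text.split()]
        self._next += 1
        return sorted(hits)


idx = ToyIndex(partitions=2, replicas=2)
docs = {"d1": "enterprise search", "d2": "crawl schedule", "d3": "search schema", "d4": "index replica"}
for doc_id, text in docs.items():
    idx.add(doc_id, text)

first = idx.query("search")
second = idx.query("search")  # served by the other replicas, same answer
```

Adding partitions lets the index hold more content, while adding replicas lets more queries run at once and keeps a copy available if one replica is lost.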

✓ Query Processing, Query Rules and Result Sources
The query processing component analyzes incoming queries before they are sent to the index component. It performs linguistic analysis of the query, including word breaking, which determines the boundaries of the words in the query, and stemming, which reduces the words in the query to their base or root form. Once the query is processed, it is submitted to the index component, which returns a result set from the index. The results come back to the query processing component, where they are processed further before being returned to the search front end.
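
Word breaking and stemming are easier to picture with a tiny example. The Python sketch below is deliberately naive: the regular expression and the suffix list are assumptions for illustration only, and real linguistic processing is language-aware and far more careful.

```python
import re

# Naive suffix list, checked in this order; an assumption for the example only.
SUFFIXES = ("ing", "es", "s")


def word_break(query):
    """Split the query into lowercase words at non-word characters."""
    return re.findall(r"\w+", query.lower())


def stem(word):
    """Strip the first matching suffix, as long as at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def process_query(query):
    """Word breaking followed by stemming."""
    return [stem(word) for word in word_break(query)]


terms = process_query("Searching SharePoint indexes")
```

For this input the terms come out as `search`, `sharepoint`, and `index`, so a query for "indexes" can also match content that only says "index".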

Query rules and result sources are new features in SharePoint 2013. Query rules can be used to conditionally promote certain results, display the results in blocks, and tune relevancy. Result sources are used to scope the search results.
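
A rough way to think about the two features is sketched below in Python. Everything here is hypothetical: the rule format, the URL-prefix scoping, and the crude "term appears in the URL" matching are stand-ins chosen to keep the example short, not the way SharePoint evaluates rules or sources.

```python
# Hypothetical query rule: if the query contains one of the keywords, promote a result.
QUERY_RULES = [
    {"keywords": {"holiday", "vacation"}, "promote": "http://intranet/hr/leave-policy"},
]

# Hypothetical result sources, modelled here as simple URL prefixes.
RESULT_SOURCES = {
    "Everything": "http://intranet/",
    "HR": "http://intranet/hr/",
}


def run_query(query, all_urls, source="Everything"):
    terms = set(query.lower().split())
    # The result source scopes which items can be returned at all.
    scoped = [url for url in all_urls if url.startswith(RESULT_SOURCES[source])]
    # Crude matching for the sake of the example: a term appears in the URL.
    hits = [url for url in scoped if any(term in url for term in terms)]
    # Query rules conditionally promote results to the top.
    promoted = [rule["promote"] for rule in QUERY_RULES if terms & rule["keywords"]]
    return promoted + [url for url in hits if url not in promoted]


urls = [
    "http://intranet/hr/holiday-calendar",
    "http://intranet/it/holiday-oncall",
    "http://intranet/hr/leave-policy",
]
everything = run_query("holiday schedule", urls)
hr_only = run_query("holiday schedule", urls, source="HR")
```

The same query returns the promoted leave policy first in both cases, but the HR result source leaves out the IT page.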

✓ Administration
This component is responsible for running processes that are essential to search, including new component provisioning. The search administration database stores search configuration data, such as the topology, crawl rules, and the mappings between crawled and managed properties. A search service application can have more than one search administration component for fault tolerance, but only one of them is active at any given time. The current search configuration is accessible through Central Administration, but modifying the search topology requires Windows PowerShell.

This completes the architecture overview. As you have seen, several enhancements have been made to the search architecture, and these changes have resulted in a very powerful search capability.

