Enterprise Search – SharePoint 2013
What’s New in Enterprise Search
Single Search Architecture
SharePoint 2013 is based on a single search architecture, which includes a single search service application. FAST Search and FAST technology are now native to the product rather than a separate product. This is a complete re-architecture, and many of the components are new or revised.
Search Center and Search UI
- The new SharePoint 2013 look is also applied to the Search Center.
- The Search Center includes deep refiners with exact counts. It also offers document previews (the new “take a look inside” functionality) in the new hover panel, provided by the new Office Web Apps.
- Search verticals (the new name for what used to be the tabbed interface)
- Four new Web Parts
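Deep refiners count refiner values across the entire result set rather than only the top-ranked results, which is what makes their counts exact. The Python sketch below illustrates the difference. The result dictionaries, the FileType property values and the refiner_counts helper are hypothetical and do not use any SharePoint API.

```python
from collections import Counter


def refiner_counts(results, prop, top_n=None):
    """Count the values of one managed property across search results.

    top_n=None counts every result (deep refinement, exact counts);
    otherwise only the first top_n ranked results are counted.
    """
    subset = results if top_n is None else results[:top_n]
    return Counter(r[prop] for r in subset if prop in r)


# Hypothetical ranked results, best match first.
results = [
    {"Title": "Plan", "FileType": "docx"},
    {"Title": "Budget", "FileType": "xlsx"},
    {"Title": "Deck", "FileType": "pptx"},
    {"Title": "Notes", "FileType": "docx"},
    {"Title": "Forecast", "FileType": "xlsx"},
    {"Title": "Memo", "FileType": "docx"},
]

deep = refiner_counts(results, "FileType")
shallow = refiner_counts(results, "FileType", top_n=3)
print(dict(deep))     # {'docx': 3, 'xlsx': 2, 'pptx': 1}
print(dict(shallow))  # {'docx': 1, 'xlsx': 1, 'pptx': 1}
```

Counting only the top three results would report a single Word document, while the deep count reports all three.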
Relevancy Improvements
SharePoint 2013 improves relevance in areas such as freshness of search results, linguistics, and document parsing.
Search Architecture
SharePoint 2013 search has been re-architected, and the goal of achieving a single enterprise search platform has introduced a number of changes. You can think of SharePoint 2013 search as a combination of SharePoint 2010 search, FAST Search Server 2010 for SharePoint, and FAST technology.
The search topology has several key improvements:
- Separate crawl and indexing processes
- A new analytics process that provides search and usage analyses, including link analysis and recommendations
- The entire index is stored locally on disk, and the property database is no longer used
- Search is scalable in two dimensions: content volume and query load
- The administration component can be made fault tolerant
- Native support for repartitioning the index as part of scaling out the topology
Topology
The topology can be broken down into search components and databases that work together to provide search capability, as shown below. In a multi-server farm, the components reside on application servers, and the databases reside on SQL Server database servers.
The search components can be categorized into five groups or processes:
- Crawl and content – includes the crawl and content processing components and the crawl database
- Analytics – includes the analytics processing component and the links and analytics reporting databases
- Index – includes the index component, index partitions, and index replicas
- Query – includes the query processing component
- Administration – includes the administration component and the administration database
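To make the five groups concrete, the following Python sketch models a farm as a mapping from hypothetical server names to the search components each server hosts. It then reports components that are missing or that run on only one server. The component names come from the list above. The check itself is an illustrative assumption and not a SharePoint tool; in SharePoint 2013 the topology is changed with PowerShell. Databases are left out for brevity.

```python
# Components belonging to the five groups listed above.
COMPONENT_GROUPS = {
    "crawl and content": ["crawl", "content processing"],
    "analytics": ["analytics processing"],
    "index": ["index"],
    "query": ["query processing"],
    "administration": ["administration"],
}


def check_topology(servers):
    """servers maps a server name to the set of components it hosts.

    Returns (missing, single_instance): required components hosted on no
    server, and components hosted on exactly one server (no redundancy).
    """
    hosts = {}
    for server, components in servers.items():
        for component in components:
            hosts.setdefault(component, []).append(server)
    required = [c for group in COMPONENT_GROUPS.values() for c in group]
    missing = sorted(c for c in required if c not in hosts)
    single_instance = sorted(c for c in required if len(hosts.get(c, [])) == 1)
    return missing, single_instance


# Hypothetical two-server farm: APP2 adds redundancy only for index and query.
farm = {
    "APP1": {"crawl", "content processing", "analytics processing",
             "index", "query processing", "administration"},
    "APP2": {"index", "query processing"},
}
missing, single = check_topology(farm)
print(missing)  # []
print(single)   # ['administration', 'analytics processing', 'content processing', 'crawl']
```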
Managing the Crawl Process and Crawled Properties
- The whole process begins with the crawl component, which is also referred to as the crawler.
- This component crawls the content sources and delivers the crawled content and associated metadata to the content processing component.
- To manage crawl volume and performance, you can crawl content simultaneously using multiple crawl components.
- As the crawler processes data, it caches the content locally before sending it to the content processing component.
- The crawl component uses one or more crawl databases to temporarily store information about crawled items and to track crawl history.
- There is no longer a one-to-one mapping between crawl databases and crawlers as in SharePoint 2010. Each crawl database can be associated with one or more crawlers, so you can scale them independently.
- To support the need for a “fresher” index, SharePoint 2013 includes a new crawl type, the continuous crawl. Continuous crawl applies only to SharePoint content sources, and it is a new option you can choose when you create a new content source.
- Continuous crawl is like incremental crawl but does not need to be scheduled. With continuous crawl, changed content is crawled every 15 minutes by default, and the frequency is configurable.
- As in SharePoint 2010, all crawler configuration is stored in the search administration database.
- The content and metadata that are crawled and extracted from a document or URL are represented as crawled properties. New crawled properties are created as new content is added to the enterprise and crawled. Crawled properties are passed to the content processing component for further analysis.
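The continuous crawl cadence described above can be sketched as a simple schedule. This Python example relies only on what the text states: a 15-minute default interval that can be changed, and eligibility limited to SharePoint content sources. The function names and the "sharepoint" source-type label are hypothetical. The sketch does not model how SharePoint handles a crawl pass that is still running when the next one is due.

```python
from datetime import datetime, timedelta

# Default interval stated in the text; it is configurable.
DEFAULT_INTERVAL = timedelta(minutes=15)


def continuous_crawl_starts(first_start, until, interval=DEFAULT_INTERVAL):
    """Times at which a continuous-crawl pass would start, from first_start
    up to and including until, one pass per interval."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    starts = []
    t = first_start
    while t <= until:
        starts.append(t)
        t += interval
    return starts


def supports_continuous_crawl(source_type):
    # Continuous crawl applies only to SharePoint content sources.
    # "sharepoint" is a hypothetical label for that source type.
    return source_type == "sharepoint"


starts = continuous_crawl_starts(datetime(2013, 1, 7, 9, 0),
                                 datetime(2013, 1, 7, 10, 0))
print([t.strftime("%H:%M") for t in starts])
# ['09:00', '09:15', '09:30', '09:45', '10:00']
print(supports_continuous_crawl("file share"))  # False
```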
Content Processing
- This is a highly specialized node in the search architecture. Its purpose is to analyze and process the data and metadata that will be included in the index.
- The processing node transforms the crawled items and crawled properties using language detection, document parsing, dictionaries, property mapping, and entity extraction.
- This component is also responsible for mapping crawled properties to managed properties.
- When a user performs a search and clicks a result, the click-through information is also stored, unprocessed, in the links database.
- All this raw data is later analyzed by the analytics processing component, which updates the index with relevancy information.
- Once processing is complete, the transformed data is sent to the index component.
- Content processing configuration is stored in the search administration database.
- The content processing component is also highly extensible. A web service callout (the content enrichment web service) can supply information about how content should be processed.
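The content processing steps listed above form a pipeline in which each stage enriches the crawled item before it is handed to the index. The Python sketch below chains three simplified stages: parsing, dictionary-based entity extraction, and crawled-to-managed property mapping. The dictionary, the crawled property names and the stage functions are hypothetical stand-ins for SharePoint's internal processing.

```python
import re

# Hypothetical entity dictionary and crawled-to-managed schema.
COMPANY_DICTIONARY = {"contoso", "fabrikam"}
SCHEMA = {"ows_Title": "Title", "ows_Author": "Author"}


def parse(item):
    # Document parsing: break the body text into lowercase words.
    item["words"] = re.findall(r"\w+", item["body"].lower())
    return item


def extract_entities(item):
    # Dictionary-based entity extraction.
    item["companies"] = sorted(set(item["words"]) & COMPANY_DICTIONARY)
    return item


def map_properties(item):
    # Property mapping: keep only crawled properties the schema maps.
    item["managed"] = {SCHEMA[name]: value
                       for name, value in item["crawled"].items()
                       if name in SCHEMA}
    return item


PIPELINE = [parse, extract_entities, map_properties]


def process(item):
    item = dict(item)  # work on a copy of the crawled item
    for stage in PIPELINE:
        item = stage(item)
    return item  # ready to be sent to the index component


doc = process({
    "body": "Contoso signs a supply deal with Fabrikam.",
    "crawled": {"ows_Title": "Supply deal", "ows_Internal": "x"},
})
print(doc["companies"])  # ['contoso', 'fabrikam']
print(doc["managed"])    # {'Title': 'Supply deal'}
```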
Managed Properties
- Crawled properties are mapped to managed properties so that their content and metadata are included in the search index.
- Only managed properties are included in the index; therefore, users can search only on managed properties.
- Managed properties have associated attributes, also referred to as properties (yes, managed properties have properties). These attributes determine how each property is used, for example whether it is searchable, queryable, retrievable, refinable, or sortable, and how its contents are shown in the search results.
- The list of default managed properties, also referred to as the search schema or index schema, contains the managed properties, their associated properties, and the mapping between crawled properties and managed properties.
- You can edit the search schema yourself, manually mapping crawled properties to managed properties and configuring property settings.
- The content processing component uses this schema to perform any necessary mapping.
- A single managed property can be mapped to more than one crawled property.
- You can also map a single crawled property to multiple managed properties.
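A small in-memory model can show the mapping rules above: a managed property can draw on several crawled properties, a crawled property can feed several managed properties, and only managed properties are searchable. The schema contents and property names in this Python sketch are hypothetical examples, not SharePoint defaults.

```python
# Hypothetical schema: managed property -> crawled properties mapped to it.
SCHEMA = {
    "Author": ["ows_Author", "MetadataAuthor"],  # one managed, two crawled
    "DisplayAuthor": ["ows_Author"],             # ows_Author also feeds Author
    "Title": ["ows_Title"],
}


def build_index_entry(crawled):
    """Keep only managed properties; unmapped crawled properties are dropped."""
    entry = {}
    for managed, sources in SCHEMA.items():
        values = [crawled[name] for name in sources if name in crawled]
        if values:
            entry[managed] = values
    return entry


def search(index, managed_property, value):
    # Queries can target only managed properties, because nothing else is indexed.
    return [e for e in index if value in e.get(managed_property, [])]


entry = build_index_entry({
    "ows_Author": "Ann",
    "MetadataAuthor": "A. Smith",
    "ows_Title": "Plan",
    "ows_Notes": "draft",  # not mapped, so never indexed
})
print(entry)  # {'Author': ['Ann', 'A. Smith'], 'DisplayAuthor': ['Ann'], 'Title': ['Plan']}
print(len(search([entry], "Author", "A. Smith")))  # 1
print(len(search([entry], "ows_Notes", "draft")))  # 0
```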
Search Schema
The search schema is stored in the search administration database. It is managed through a page titled Search Service Application: Managed Properties, shown in the picture below. This page is available from the Search Service Application: Search Administration page in Central Administration, through the Search Schema link in the Queries and Results section.
Key points to remember about the search schema include the following:
- It contains the mapping between crawled properties and managed properties, including the order of mapping for managed properties that are mapped to multiple crawled properties.
- It maintains the settings that determine which index stores the managed property.
- It contains the settings, or properties, for each managed property.
- Site collection administrators can change the search schema for a particular site collection from the Site Settings page, customizing the search experience for that site collection. This is a new capability in SharePoint 2013.
- It is possible to have multiple search schemas.
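When a managed property is mapped to several crawled properties, the mapping order matters. The SharePoint 2013 managed property settings let you include content from all mapped crawled properties, or only from the first non-empty one in the specified order. The Python sketch below imitates both behaviors. The property names and helper functions are hypothetical, and skipping empty values in the "all" mode is a simplifying assumption.

```python
def first_non_empty(crawled, ordered_sources):
    """Value of the first mapped crawled property, in mapping order,
    that is present and non-empty; None if there is none."""
    for name in ordered_sources:
        value = crawled.get(name)
        if value:
            return value
    return None


def all_values(crawled, ordered_sources):
    """Every non-empty mapped value, in mapping order."""
    return [crawled[name] for name in ordered_sources if crawled.get(name)]


crawled = {"ows_Title": "", "MetadataTitle": "Q3 Plan", "FileName": "plan.docx"}
order = ["ows_Title", "MetadataTitle", "FileName"]
print(first_non_empty(crawled, order))                          # Q3 Plan
print(all_values(crawled, order))                               # ['Q3 Plan', 'plan.docx']
print(first_non_empty(crawled, ["FileName", "MetadataTitle"]))  # plan.docx
```

Changing only the order changes which value the managed property receives, which is why the schema records the mapping order.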
Analytics Processing
This is a brand-new component in the search architecture. Its purpose is to analyze content, and how users interact with that content, in order to improve search relevance, create search reports and recommendations, and create deep links.
The analytics component analyzes two different types of information:
· Search Analytics – information from crawled items that is stored in the index
· Usage Analytics – information about how users interact with the search results, such as how many times an item is viewed
The Web Analytics capability in SharePoint 2010 has been discontinued and replaced by the analytics processing component in SharePoint 2013. This change was necessary to increase performance and scalability. The analytics component provides additional capabilities such as reports on top items, recommendations, and dynamic improvement of search result relevancy.
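Usage analytics turns raw view events into reports and recommendations. As a rough illustration only, the Python sketch below counts views to produce a top-items report and derives simple "users who viewed this also viewed" counts. The (user, item) event format and the co-occurrence rule are assumptions; SharePoint's actual analytics algorithms are more elaborate.

```python
from collections import Counter, defaultdict


def top_items(events, n):
    """events are (user, item) pairs, one per view; returns the n most viewed."""
    return Counter(item for _, item in events).most_common(n)


def co_viewed(events, item):
    """Count how often other items were viewed by users who also viewed item."""
    viewed_by = defaultdict(set)
    for user, viewed in events:
        viewed_by[user].add(viewed)
    counts = Counter()
    for items in viewed_by.values():
        if item in items:
            counts.update(items - {item})
    return dict(counts)


# Hypothetical view log.
events = [("ann", "a"), ("ann", "b"), ("bob", "a"), ("bob", "b"),
          ("bob", "c"), ("cy", "a"), ("cy", "c")]
print(top_items(events, 1))                 # [('a', 3)]
print(sorted(co_viewed(events, "a").items()))  # [('b', 2), ('c', 2)]
```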
Index Processing
The index is the key to providing the best search experience, because its content determines what users find when they run search queries. SharePoint 2013 search is also a data access technology, because it provides access to information beyond a search box query. The index component receives crawled and processed content and adds it to the search index. This component also handles incoming queries, retrieves information from the search index, and sends the result set back to the query processing component.
The index processing architecture can be divided into:
· Index partition
· Index replica
· Index component
SharePoint 2013 stores the entire index on disk. Search capacity is scaled using index partitions and index replicas: each partition holds a distinct portion of the index, and each replica is a copy of a partition. Adding partitions accommodates more content, and adding replicas accommodates more query load and provides fault tolerance. The “rows and columns” terminology from SharePoint 2010 is gone.
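The following Python sketch illustrates the two scaling dimensions. Partitions split the index so that each holds a share of the documents, and replicas copy each partition so that more queries can be served. Assigning documents to partitions by a CRC32 hash of the document ID is an assumption made for illustration; the sketch does not reproduce SharePoint's internal placement logic.

```python
import zlib


def partition_for(doc_id, partitions):
    """Pick the index partition for a document (hypothetical CRC32 rule)."""
    return zlib.crc32(doc_id.encode("utf-8")) % partitions


def index_components(partitions, replicas_per_partition):
    """One (partition, replica) pair per index component that must be hosted."""
    return [(p, r) for p in range(partitions) for r in range(replicas_per_partition)]


# More content -> add partitions; more query load -> add replicas.
layout = index_components(partitions=4, replicas_per_partition=2)
print(len(layout))  # 8
print(partition_for("http://intranet/docs/plan.docx", 4) in range(4))  # True
```

Under this toy rule, changing the number of partitions moves many documents to a different partition. That gives a hint of why repartitioning the index is a distinct step when scaling out.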
Query Processing, Query Rules, and Result Sources
The query processing component analyzes incoming queries before they are sent to the index component. It performs linguistic analysis of the query, including word breaking, which determines the boundaries of the words in the query, and stemming, which reduces the words in the query to their base or root form. Once the query is processed, it is submitted to the index component, which returns results from the index. The results are returned to the query processing component, where they are further processed before being returned to the search front end.
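Word breaking and stemming can be illustrated with a toy query processor. The regular-expression word breaker and the suffix-stripping stemmer in this Python sketch are simplifications assumed for illustration. SharePoint uses language-specific word breakers and stemmers.

```python
import re

# Toy suffix list; real stemmers are language specific.
SUFFIXES = ("ing", "ed", "es", "s")


def word_break(query):
    """Split a query into lowercase words at non-word characters."""
    return re.findall(r"\w+", query.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def process_query(query):
    return [stem(w) for w in word_break(query)]


print(process_query("Crawling indexed documents"))  # ['crawl', 'index', 'document']
```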
Query rules and result sources are new features in SharePoint 2013. Query rules can be used to conditionally promote certain results, display results in blocks, and tune relevancy. Result sources are used to scope the search results.
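Result sources can also be applied from code. SharePoint 2013 exposes a REST search endpoint at /_api/search/query. It accepts parameters such as querytext, sourceid (the GUID of a result source), selectproperties (the managed properties to return) and rowlimit. The Python sketch below only builds such a URL; it does not send the request or handle authentication. The site URL and GUID in the usage are placeholders, and the sketch assumes OData-style doubling of embedded single quotes.

```python
from urllib.parse import quote


def odata_literal(value):
    # Wrap in single quotes and double any embedded single quote (OData style).
    return "'" + value.replace("'", "''") + "'"


def search_query_url(site_url, query_text, source_id=None,
                     select_properties=None, row_limit=None):
    """Build a SharePoint 2013 REST search URL (the request is not sent)."""
    params = ["querytext=" + quote(odata_literal(query_text), safe="'")]
    if source_id:
        # Restrict the query to a result source, identified by its GUID.
        params.append("sourceid=" + quote(odata_literal(source_id), safe="'"))
    if select_properties:
        # Managed properties to return for each result.
        joined = ",".join(select_properties)
        params.append("selectproperties=" + quote(odata_literal(joined), safe="',"))
    if row_limit is not None:
        params.append("rowlimit=%d" % row_limit)
    return site_url.rstrip("/") + "/_api/search/query?" + "&".join(params)


url = search_query_url(
    "https://intranet.example.com/",                   # placeholder site URL
    "budget report",
    source_id="00000000-0000-0000-0000-000000000000",  # placeholder GUID
    select_properties=["Title", "Path"],
    row_limit=10,
)
print(url)
```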
Administration
This component is responsible for running processes that are essential to search, including provisioning new components. The search administration database stores search configuration data, such as the topology, crawl rules, and the mappings between crawled and managed properties. Each search service application has only one search administration database, although the administration component can be deployed on more than one server for fault tolerance. The current search configuration is accessible through Central Administration, but modifying the search topology requires PowerShell.
This completes the architecture overview. As you have seen, several
enhancements have been made to the search architecture, and these changes have
resulted in a very powerful search capability.