Smartsite

Smartsite 7.9 - ...

Purpose

The Smartsite provider lets you use the Smartsite content management system (CMS) as a source of Enterprise Search (ES) documents. One content item typically results in one ES document.

The provider assembles an ES document from a content item, applying a hybrid approach:

  • It uses content item fields as ES document fields. For example, it uses content item field Title as ES document field doc_title. The field values are obtained by querying the CMS database.
  • It renders the content item using the Smartsite render engine, and uses the resulting HTML page as the main content of the ES document, in field doc_body. Rendering an item makes it possible to include application logic when producing the page.

Configuration

Common configuration

Part of the configuration is common to all sources. Refer to All providers.

Item selection

Item selection applies during discovery.

Specify which items will lead to ES documents.

  • Select items by selecting one or more folders in the CMS item tree. The provider will select each folder item, and items recursively below each folder. The provider restricts selected items to the channel, for example channel DEF, by selecting from view vwActive_DEF.
  • Or specify a query, bypassing the folder selection and bypassing the restriction to a channel.

The provider further restricts selected items to the selected content types, in both cases.

When using a query, include columns Nr, Channel, ActivationDate and ContentType. These are native columns of table Contents, except for Channel, which must be supplied as an alias. For example:

SELECT Nr, 'DEF' AS Channel, ActivationDate, ContentType FROM vwActive_DEF

The provider uses column ActivationDate as field SystemLocationModified. This field helps establish whether an item must be reindexed. If the value differs from the timestamp recorded when the item was last indexed, the item was updated and is reindexed. If the timestamps are equal, the index information for the item should still be valid and reindexing is skipped.
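
The timestamp comparison described above can be sketched as follows. This is a minimal illustration, not the provider's actual code: the function name needs_reindex and the convention that None means "never indexed" are assumptions made for the example.

```python
from datetime import datetime
from typing import Optional


def needs_reindex(activation_date: datetime,
                  recorded_modified: Optional[datetime]) -> bool:
    """Return True when the item should be (re)indexed.

    recorded_modified is the timestamp stored when the item was last
    indexed, or None if it was never indexed (assumed convention).
    """
    if recorded_modified is None:
        return True
    # Equal timestamps mean the index information should still be valid.
    return activation_date != recorded_modified


last_indexed = datetime(2024, 5, 1, 9, 30)
unchanged = needs_reindex(datetime(2024, 5, 1, 9, 30), last_indexed)
updated = needs_reindex(datetime(2024, 5, 2, 14, 0), last_indexed)
```

Here unchanged is False because the timestamps are equal, and updated is True because the activation date differs from the recorded timestamp.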

Content selection

Content selection applies when (re)indexing documents.

Specify how to select content, in particular how to obtain ES document field doc_body, containing the main document text.

  • Body based on a default url. The provider uses a default url to render the content item. For example, it renders https://docs.seneca.nl/Home/Examples/Using-Virtual-Assemblies.html and uses the resulting page as the main document text. The url can be relative to the current site. A relative url has to start with a forward slash (/).
  • Body based on a custom url. A custom url lets you add url parameters that steer the rendering. For example, https://docs.seneca.nl/{channel.defaultdocument}?{channel.id}={item.nr}&es=indexing could request special rendering for indexing purposes: url parameter es=indexing asks the page to omit menus and other navigation, limiting it to the content needed for indexing. The url can be relative to the current site. A relative url has to start with a forward slash (/).
  • Body based on content. The provider queries the database and uses the value of content item column Body as ES document field doc_body. No page is rendered for this.
  • Body based on an SQL query. Specify a query that yields the ES document body, for example: SELECT Body FROM vwContent WHERE Nr={item.nr}. No page is rendered for this. The column can be any column or composition of columns. If necessary, alias the column as Body.

The provider resolves placeholders in urls and queries. 

Placeholder               | Example       | Replacement
{channel.code}            | DEF           | Channel code.
{channel.defaultdocument} | smartsite.net | Channel default document.
{channel.id}              | id            | Query parameter name used to specify the item number.
{channel.nr}              | 60            | Channel number.
{item.code}               | HOME          | Item code.
{item.nr}                 | 1234          | Item number.
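
As an illustration of placeholder resolution, the sketch below substitutes the example values from the table into the custom url and the SQL query shown earlier. The function name resolve_placeholders is hypothetical, and leaving unknown placeholders untouched is an assumption; the provider's actual behavior for unknown placeholders is not documented here.

```python
import re
from typing import Mapping

# Matches tokens such as {channel.code} or {item.nr}.
_PLACEHOLDER = re.compile(r"\{((?:channel|item)\.[a-z]+)\}")


def resolve_placeholders(template: str, values: Mapping[str, str]) -> str:
    """Replace each known placeholder with its value.

    Unknown placeholders are left as-is (an assumption for this sketch).
    """
    def substitute(match):
        return values.get(match.group(1), match.group(0))

    return _PLACEHOLDER.sub(substitute, template)


# Example values taken from the placeholder table.
values = {
    "channel.code": "DEF",
    "channel.defaultdocument": "smartsite.net",
    "channel.id": "id",
    "channel.nr": "60",
    "item.code": "HOME",
    "item.nr": "1234",
}

url = resolve_placeholders(
    "https://docs.seneca.nl/{channel.defaultdocument}"
    "?{channel.id}={item.nr}&es=indexing",
    values,
)
query = resolve_placeholders(
    "SELECT Body FROM vwContent WHERE Nr={item.nr}", values
)
print(url)    # https://docs.seneca.nl/smartsite.net?id=1234&es=indexing
print(query)  # SELECT Body FROM vwContent WHERE Nr=1234
```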

The provider selects from standard content item columns in order to produce standard ES document fields.

Content item column | Notes                            | Example  | ES document field
ActivationDate      |                                  |          | doc_modified
ActivationDate      |                                  |          | system_location_modified
AddDate             |                                  |          | doc_created
Author              |                                  |          | doc_authors
Body                | Unless rendered                  |          | doc_body
-                   | Channel code                     | DEF      | extra_sms_channel_code
Code                | Through {item.code} placeholders | HOME     |
ContentType         | From number to code              | CWP      | extra_sms_contenttype_code
Description         |                                  |          | doc_description
Nr                  | Channel code is included         | DEF:1234 | system_location
Nr                  | Channel specific friendly name   |          | doc_url
Nr                  | Through {item.nr} placeholders   | 1234     |
Title               |                                  |          | doc_title
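
The mapping in the table can be read as a simple transformation from a database row to ES document fields. The sketch below is illustrative only: the function to_es_fields, its parameters, and the use of a plain dictionary for the row are assumptions. Field doc_url is left out because the lookup of the channel specific friendly name is not described here.

```python
from typing import Any, Dict, Optional


def to_es_fields(row: Dict[str, Any],
                 channel_code: str,
                 contenttype_code: str,
                 rendered_body: Optional[str] = None) -> Dict[str, Any]:
    """Map standard content item columns to standard ES document fields.

    contenttype_code is the content type already translated from number
    to code; rendered_body is the rendered page, if rendering was used.
    """
    return {
        "doc_title": row["Title"],
        "doc_description": row["Description"],
        "doc_authors": row["Author"],
        "doc_created": row["AddDate"],
        "doc_modified": row["ActivationDate"],
        "system_location_modified": row["ActivationDate"],
        # The channel code is included in the system location.
        "system_location": f"{channel_code}:{row['Nr']}",
        "extra_sms_channel_code": channel_code,
        "extra_sms_contenttype_code": contenttype_code,
        # Column Body is used unless the item was rendered.
        "doc_body": rendered_body if rendered_body is not None else row["Body"],
    }


row = {
    "Nr": 1234, "Title": "Home", "Description": "Start page",
    "Author": "Editor", "AddDate": "2024-01-10", "ActivationDate": "2024-05-01",
    "Body": "<p>Welcome</p>",
}
fields = to_es_fields(row, channel_code="DEF", contenttype_code="CWP")
```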

Content relation dependencies

You can optionally specify content relation dependencies by selecting content relation types. This is part of the mechanism that invalidates an ES document when it depends on a set of content items and any item in that set is modified.

For example, you could select content relation type AIMIF, the AIM type for item-to-folder relations. Active Integrity Maintenance (AIM) performs real-time maintenance of content relations during content management.

Suppose the provider indexes an ES document based on content item 1234. It then also records dependencies for the document: for relation type AIMIF, it takes every content relation in which item 1234 appears on the from side and records the item on the to side as a dependency.

The CMS detects modification of items. It marks an ES document as dirty if the document depends on a modified item. This is performed in real time.

An ES document marked as dirty will be reindexed. Whether this is done immediately or with a delay depends on the number of documents currently queued for (re)indexing.
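
The dependency mechanism can be pictured with the following sketch. It is a simplified in-memory model, not the CMS implementation: the class DependencyIndex, its method names, and the representation of relations as (from_nr, to_nr, relation_type) tuples are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Relation = Tuple[int, int, str]  # (from_nr, to_nr, relation_type)


class DependencyIndex:
    """Tracks which ES documents depend on which content items."""

    def __init__(self) -> None:
        self._documents_by_item = defaultdict(set)
        self.dirty: Set[str] = set()

    def record(self, system_location: str, item_nr: int,
               relations: Iterable[Relation],
               selected_types: Set[str]) -> None:
        # The document always depends on its own underlying item.
        self._documents_by_item[item_nr].add(system_location)
        for from_nr, to_nr, relation_type in relations:
            # Follow relations from the indexed item, for selected types only.
            if from_nr == item_nr and relation_type in selected_types:
                self._documents_by_item[to_nr].add(system_location)

    def on_item_modified(self, item_nr: int) -> None:
        # Mark every document that depends on the modified item as dirty.
        self.dirty |= self._documents_by_item.get(item_nr, set())


relations = [
    (1234, 1200, "AIMIF"),   # item 1234 relates to folder 1200
    (1234, 1300, "OTHER"),   # relation type not selected
    (5678, 1200, "AIMIF"),   # relation of a different item
]
index = DependencyIndex()
index.record("DEF:1234", 1234, relations, selected_types={"AIMIF"})

index.on_item_modified(1300)
dirty_after_unrelated = set(index.dirty)
index.on_item_modified(1200)
dirty_after_dependency = set(index.dirty)
```

Modifying item 1300 leaves the document clean because relation type OTHER was not selected, while modifying item 1200 marks DEF:1234 as dirty.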

XML document rendering

The rendering described above results in an HTML page that is used for ES document field doc_body only. Rendering can also result in an XML page in the document data format, which makes it possible to produce doc_body and/or several other fields.

The Smartsite provider supports these additional fields.

The database and the xml may both yield a value for a particular ES document field. For example, column Title may be obtained from the item in the database and element <doc_title> may be obtained from the xml, resulting in two values for ES document field doc_title. In that case the provider gives precedence to the xml value: it replaces the value obtained from the query with the value obtained from the xml.
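
The precedence rule can be illustrated with a short sketch. The exact document data format is not shown on this page, so the xml layout used here (a root element named document with one child element per ES field) is an assumption, as is the function name merge_fields.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def merge_fields(db_fields: Dict[str, str], xml_text: str) -> Dict[str, str]:
    """Combine database fields with xml fields; xml values take precedence."""
    merged = dict(db_fields)
    root = ET.fromstring(xml_text)
    for element in root:
        text = (element.text or "").strip()
        if text:
            # An xml value replaces the value obtained from the query.
            merged[element.tag] = text
    return merged


db_fields = {"doc_title": "Title from database", "doc_description": "Summary"}
xml_text = (
    "<document>"
    "<doc_title>Title from xml</doc_title>"
    "<doc_body>Main text</doc_body>"
    "</document>"
)
merged = merge_fields(db_fields, xml_text)
```

In this example doc_title takes the xml value, doc_description keeps the database value, and doc_body is added from the xml.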

Maintenance of documents

New documents

The Smartsite provider detects new content items during a discovery cycle. A new qualifying content item results in a new ES document. A content item is considered new if the combination of channel and item number is new. The provider records this combination as the system location for the ES document.

After completing a discovery cycle, the provider waits for the discovery interval (for example, one hour) and then starts the next cycle.

Recording a new ES document schedules the document for indexing. Whether indexing occurs immediately or with a delay depends on the number of ES documents currently queued for processing.

Modified documents

The provider detects a modified content item during a discovery cycle. The content item is considered modified if its activation date differs from the system location modified timestamp recorded for the ES document. The provider marks the document as dirty if the content item is modified.

In addition, the document is marked as dirty in real time if the underlying content item is modified.

The document is also marked as dirty in real time if a content item it depends on is modified, provided that the content relation type dependencies described above are properly configured.

The provider schedules reindexing of the modified document. Whether indexing occurs immediately or with a delay depends on the number of ES documents currently queued for processing.

Removed documents

The provider detects a removed document during a discovery cycle. At the end of the discovery cycle it checks each known ES document. It deletes the ES document if the underlying content item no longer exists.
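
Taken together, a discovery cycle classifies documents as new, modified, or removed. The sketch below models this with two dictionaries keyed by system location. It is a simplification under stated assumptions: the function classify_discovery is hypothetical, and the sketch treats any known document that is absent from the discovered set as removed, whereas the provider checks whether the underlying content item still exists.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def classify_discovery(
    discovered: Dict[str, datetime],
    known: Dict[str, datetime],
) -> Tuple[List[str], List[str], List[str]]:
    """Compare discovered items with known ES documents.

    discovered maps system location to the item's activation date;
    known maps system location to the recorded modified timestamp.
    Returns (new, modified, removed) lists of system locations.
    """
    new = sorted(loc for loc in discovered if loc not in known)
    modified = sorted(
        loc for loc in discovered
        if loc in known and discovered[loc] != known[loc]
    )
    removed = sorted(loc for loc in known if loc not in discovered)
    return new, modified, removed


t1 = datetime(2024, 5, 1)
t2 = datetime(2024, 5, 2)
known = {"DEF:1": t1, "DEF:2": t1, "DEF:3": t1}
discovered = {"DEF:1": t1, "DEF:2": t2, "DEF:4": t2}

new, modified, removed = classify_discovery(discovered, known)
print(new, modified, removed)  # ['DEF:4'] ['DEF:2'] ['DEF:3']
```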