SQL database
Purpose
A SQL database provider supports to use a database as a source of Enterprise Search documents. This for example allows to use a database table such that each table row results in an ES document.
The database provider currently supports SQL Server, database version SQL Server 2008 R2 and higher.
Configuration
Common configuration
Part of the configuration is common to all sources. Refer to All providers.
Database
Select a database connection name from the list. Database connection strings are maintained in configuration file Smartsite.Data.config, which resides on the webserver that runs the Smartsite manager. For security reasons it is not possible to interactively enter a connection string.
Discovery query
The discovery query yields the set of documents for the source. Each query row results in one Enterprise Search document. A database application may delete a particular row. Enterprise Search deletes the ES document if the discovery query no longer yields that row.
The following is a discovery query for table Products of the Microsoft Northwind example database. Column Modified was added to the table for example purposes.
SELECT
'Product-'+CAST(ProductID AS nvarchar) AS SystemLocation,
Modified AS SystemLocationModified
FROM Products
Columns:
Column name | Description |
---|---|
SystemLocation | This column is mandatory. A value uniquely identifies the document within the source. Just ProductID is also a unique key and would also be suitable. Prefix 'Product-' is included such that additional tables can be added to this source, using an UNION query. |
SystemLocationModified |
This column is optional, although recommended. A value indicates when the document was modified last, at the source. The provider records the value, at the ES side. The provider assumes the document is not modified if both values are available and the values are equal. The provider skips reindexing in that case, resulting in a performance gain. The provider reindexes the document if either value is not available or if the values differ. Reindexing occurs every discovery cyle in that case. |
After completing a discovery and after the configured discovery interval of for example 60 minutes Enterprise Search runs a next discovery.
Primary query
The primary query as well as the secondary queries yield fields for one ES document. Enterprise Search runs these queries whenever it is necessary to index or reindex the document. ES supplies the above SystemLocation value as @Key, allowing to select the appropriate database data. Example:
SELECT
ProductName AS DocTitle,
'Northwind' AS DocAuthors,
'http://localhost/products/'+CAST(ProductID AS nvarchar) AS DocUrl,
Modified AS DocModified,
Modified AS SystemLocationModified
FROM Products
WHERE ProductID=RIGHT(@Key, LEN(@Key)-LEN('Product-'));
All columns are opional. Columns used in the example:
Column name | Description |
---|---|
DocTitle | The product name is used as document title. Column DocTitle corresponds to document field doc_title. For standard fields refer to standard fields. |
DocAuthors | A fixed value Northwind is used as author for all documents obtained from this source. |
DocUrl | This URL will apear in the search result for a found document. The user can navigate to the page addressed by the URL. The URL could for example be http://localhost/products/123. The page should present document details and/or the document itself. |
DocModified | The product modified timestamp is used as the date time the document is modified last. |
SystemLocationModified | The product modified timestamp is also used for the mechanism to skip reindexing if the timestamp at the source is equal to the timestamp recorded by ES. If available it is recommended to include this timestamp, for performance. |
Secondary queries
The secondary queries as well as the primary query yield fields for one ES document. Example:
SELECT
c.CategoryName AS DocKeywords
FROM Products p JOIN Categories c ON p.CategoryID=c.CategoryID
WHERE ProductID=RIGHT(@Key, LEN(@Key)-LEN('Product-'));
All columns are opional. Columns used in the example:
Column name | Description |
---|---|
DocKeywords | The product category is used as a keyword for the document. The primary query and secondary queries can return multiple rows. For a multivalue field such as doc_keywords ES will use each row to add one keyword to the field. |
Primary query versus secondary queries
The primary query and secondary queries behave similarly:
- Both types of queries yield fields for the ES document identified by @Key.
- Both types of queries can return no rows, one row or multiple rows.
The primary query and secondary queries differ as follows:
- ES considers the document deleted if the primary query yields no rows for key @Key. Secondary queries may yield no rows without causing a delete of the ES document.
- The primary query can yield a SystemLocationModified and ES willl use it. ES ignores any SystemLocationModified in the result of secondary queries.
Document data accumulation
Enterprise search builds-up document data using the primary and secondary queries. Both the primary and secondary queries may yield multiple rows. As a result there may be multiple values for a particular document field: provenient from more than one query, and/or provenient from more than one row. ES accumulates field data as follows.
Column name | Field name | Description |
---|---|---|
DocAuthors | doc_authors | Add. This field can contain multiple values, and ES adds the value to the collection of values being built up for the document. |
DocBody | doc_body | Merge. This field contains text. ES merges values by concatenating texts, separated by a space. |
DocCreated | doc_created | Replace. ES replaces the current value by the next value. Effectively the last value applies. |
DocDescription | doc_description | Merge. |
- | doc_fileformat | Not supported. |
DocIdentifier | doc_identifier | Merge. |
DocKeywords | doc_keywords | Add. |
DocLanguage | doc_language | Replace. |
DocModified | doc_modified | Replace. |
DocPublisher | doc_publisher | Merge. |
DocTitle | doc_title | Merge. |
DocUrl | doc_url | Replace. |
ExtraAbc: extra field with a value of type string | extra_abc | Merge. |
ExtraAbc: extra field with a value of other type than string | extra_abc | Replace. |
SystemLocationModified | Replace. This applies to the timestamps returned by rows of the primary query only. |
Special behaviour:
- Merge: it is possible to request the replace behaviour instead. ES performs a replace rather than a merge if the next text starts with _replace_, underscores included, comma excluded.
Maintenance of documents
New documents
The sql database provider uses the discovery query during a discovery cycle, resulting in a set of rows for new and existing ES documents. A new row results in a new ES document. A row is new if the SystemLocation in the row is new given the system locations of the ES documents currently recorded for the source.
Having completed a discovery the provider waits the discovery interval, for example configured as one hour, and then starts a next discovery.
Recording a new ES document schedules the document for indexing. Whether indexing occurs immediately or with a delay depends on the number of ES documents currently queued for processing.
Modified documents
The provider detects a modified document during a discovery cycle. The discovery interval may for example be configured as one hour. The provider schedules reindexing of the document. Whether indexing occurs immediately or with a delay depends on the number of ES documents currently queued for processing.
Removed documents
The provider detects a removed document during an aged information check cycle. This is a check if the age of the information recorded for a document exceeds a configured age of for example 48 hours. The provider marks the document as dirty if exceeded, scheduling a recheck of the document. The recheck causes removal of the document. Whether the recheck occurs immediately or with a delay depends on the number of ES documents currently queued for processing.