All providers

Smartsite 7 - ...

Purpose

A provider definition is used to configure a source of documents. For example, a source may be configured for:

  • A set of sites to crawl, using the web crawler provider
  • A set of folders to search for files, using the file system provider
  • A database to query, using the SQL database provider

A single provider definition can be used to configure several sources, each of which uses that provider.

Configuration

All sources share the following configuration.

General

  • Name. Readable name of the source. Names need not be unique, although unique, descriptive names are recommended.
  • Code. Code that uniquely identifies the source. The code is included in each indexed document originating from the source, as the field system_source_code. At search time this makes it possible to filter on the source, using facets.
  • Enable indexer. Whether to enable discovery and indexing for the source. Documents already stored in the Elastic Search index remain indexed regardless of whether indexing is enabled.
  • Discovery interval, in minutes. Wait time between the end of the last discovery and the start of the next discovery. Discovery is the process of detecting new and modified documents at the source. A typical discovery interval is in the order of hours or days.
  • Maximum index age for known document, in hours. Maximum age of the document information, relative to the last index date and time registered for the document. When a document reaches the maximum index age, Enterprise Search (ES) reconsiders it. ES may remove the document because it no longer exists at the source, reindex it because it has been modified at the source, or keep the index information because the document has not been modified at the source. A typical maximum index age is several days.
  • Maximum file size, in bytes. Maximum size of a disk file, downloaded file, HTML page or other document. Enterprise Search skips indexing of a document that exceeds the maximum size and records an error for the document. ES attempts to avoid downloading or processing the document if the size can be established beforehand.
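
The maximum index age and maximum file size rules above can be sketched as follows. This is an illustration only, not the Enterprise Search implementation: the names Action, is_due, reconsider and exceeds_max_size are hypothetical, and the sketch assumes that the provider can report whether a document still exists and whether it has been modified.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Action(Enum):
    """Possible outcomes when a known document is reconsidered (hypothetical names)."""
    KEEP = "keep"
    REINDEX = "reindex"
    REMOVE = "remove"


def is_due(last_indexed: datetime, max_index_age_hours: float, now: datetime) -> bool:
    """Return True when the document has reached the maximum index age."""
    return now - last_indexed >= timedelta(hours=max_index_age_hours)


def reconsider(exists_at_source: bool, modified_at_source: bool) -> Action:
    """Decide what to do with a document that has reached the maximum index age."""
    if not exists_at_source:
        return Action.REMOVE      # document no longer exists at the source
    if modified_at_source:
        return Action.REINDEX     # document has changed at the source
    return Action.KEEP            # unchanged: keep the index information


def exceeds_max_size(size_in_bytes: Optional[int], max_file_size: int) -> bool:
    """Size check that can run before download when the size is known (None = unknown)."""
    return size_in_bytes is not None and size_in_bytes > max_file_size


# Example: a document last indexed nine days ago, with a maximum index age of 72 hours.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
last_indexed = datetime(2024, 1, 1, tzinfo=timezone.utc)
if is_due(last_indexed, 72, now):
    print(reconsider(exists_at_source=True, modified_at_source=False))
```

An unknown size is treated as "not too large" here, so that the size can still be checked after download; that choice is an assumption of the sketch.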

Security

Lets you select the user groups that can find and read documents from this source. Notes:

  • Groups are currently Smartsite user groups.
  • Enterprise Search stores the group codes, as opposed to group names or group numbers.
  • ES stores group codes with documents, using field system_usergroups.
  • ES stores group codes for new documents and updated documents. A change to the configured groups is not reflected in documents already in the Elastic Search index. Reindex the source if necessary.
  • ES adds the group code EVERYONE if no groups are configured. This group code may also exist as a Smartsite group code, but this is not required.
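
As an illustration of the notes above, the following sketch shows how group codes and the source code could end up in an indexed document. Only the field names system_usergroups and system_source_code and the group code EVERYONE come from this page; the function names, the example codes and the dictionary representation of a document are assumptions.

```python
from typing import Iterable, List

EVERYONE = "EVERYONE"  # fallback group code named on this page


def usergroups_for_document(configured_group_codes: Iterable[str]) -> List[str]:
    """Return the group codes to store, falling back to EVERYONE when none are configured."""
    # Drop empty values and duplicates while keeping the configured order.
    codes = list(dict.fromkeys(code for code in configured_group_codes if code))
    return codes if codes else [EVERYONE]


def stamp_document(document: dict, source_code: str,
                   configured_group_codes: Iterable[str]) -> dict:
    """Return a copy of the document with the source code and group codes added."""
    stamped = dict(document)
    stamped["system_source_code"] = source_code
    stamped["system_usergroups"] = usergroups_for_document(configured_group_codes)
    return stamped


# A document indexed while no groups are configured gets EVERYONE.
doc = stamp_document({"title": "Annual report"}, "intranet_files", [])

# Changing the configuration later affects only documents stamped afterwards;
# 'doc' keeps its stored group codes until the source is reindexed.
newer = stamp_document({"title": "Budget"}, "intranet_files", ["STAFF", "EDITORS"])
print(doc["system_usergroups"], newer["system_usergroups"])
```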

Groups can serve the following purposes:

  • At search time it is possible to specify a group or a set of groups, limiting the search results to the documents the user is allowed to find and read. The system preselects the groups, based on the user's authorization.
  • At search time groups could be used for other filtering or faceting purposes.
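
A search request that filters on groups and on a source might look like the sketch below. It builds a standard Elasticsearch query body as a Python dictionary. The function build_search_body and its parameters are hypothetical, and the sketch assumes that system_usergroups and system_source_code are indexed as exact-match (keyword) fields. Whether EVERYONE has to be added to the user's groups at search time is not stated here, so it is exposed as an option.

```python
from typing import Iterable, Optional


def build_search_body(text: str,
                      user_group_codes: Iterable[str],
                      source_code: Optional[str] = None,
                      include_everyone: bool = True) -> dict:
    """Build an Elasticsearch query body limited to the given groups and source."""
    groups = list(user_group_codes)
    if include_everyone:
        groups.append("EVERYONE")          # assumption: also match unrestricted documents
    groups = list(dict.fromkeys(groups))   # remove duplicates, keep order

    # A document matches when it carries at least one of the group codes.
    filters = [{"terms": {"system_usergroups": groups}}]
    if source_code is not None:
        filters.append({"term": {"system_source_code": source_code}})

    return {
        "query": {
            "bool": {
                "must": [{"simple_query_string": {"query": text}}],
                "filter": filters,
            }
        },
        # Facet on the source, so the user can narrow the result by source.
        "aggs": {"sources": {"terms": {"field": "system_source_code"}}},
    }


body = build_search_body("annual report", ["STAFF"], source_code="intranet_files")
print(body["query"]["bool"]["filter"])
```

The filter clauses do not influence relevance scoring, which suits authorization checks: a document is either visible to the user or it is not.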