Standard fields
Smartsite 7.9 - ...
Purpose
An ES document includes a number of standard fields. These fields are predefined and are sufficiently complete for a production search solution.
Standard fields consist of:
- doc fields, such as doc_title and doc_authors, which pertain to the document itself
- system fields, such as system_number and system_location, which the system adds for document registration and processing.
Standard fields
The table below specifies the following information.
- The type. The type String is followed by (analyzed) if word breaking and language-specific stemming are applied to the field content, which usually modifies the content. It is followed by (normalized) if the content is only normalized, which preserves it except for modifications such as conversion to lowercase or ASCII folding (mapping é to e). It is followed by (plain) if the content is used as is, without normalization.
- The indicator 1 if the field can have at most one value, for example a title, or the indicator n if the field can have multiple values, for example keywords.
- Whether the field is a DC field. A field marked as a DC field is a Dublin Core field.
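The difference between normalized and plain content can be illustrated with a minimal Python sketch. It is an approximation only: the function name normalize_value is hypothetical, and the Unicode decomposition used here is an assumption standing in for the ASCII folding that the index actually applies, which may map more characters (for example ø) than this sketch does.

```python
import unicodedata


def normalize_value(value: str) -> str:
    """Approximate a 'normalized' field: lowercase plus ASCII folding.

    Hypothetical helper for illustration; not the product's implementation.
    """
    # Decompose accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks, so that "é" becomes "e".
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.lower()


plain = "André Café"               # a (plain) field keeps this value unchanged
normalized = normalize_value(plain)  # a (normalized) field would hold a folded form
print(normalized)
```

An analyzed field goes further than this, since word breaking and stemming replace the content with a list of stemmed terms.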
Field name | Type | 1/n | Description |
---|---|---|---|
doc_authors | String (normalized) | n | Authors. Corresponds to the DC field Creators. |
doc_body | String (analyzed) | 1 | Main text of the document. |
doc_created | DateTime | 1 | Creation date. A DC field. The earliest date is used if a document yields multiple creation dates. |
doc_description | String (analyzed) | 1 | Description. A DC field. The description is intended for the reader of the document, not for the content editor. |
doc_fileformat | FileFormat | 1 | File format, for example pdf or html. A DC field. |
doc_identifier | String (normalized) | 1 | Identifier of the document, as assigned by the publisher or authors. A DC field. |
doc_keywords | String (analyzed) | n | Keywords. Corresponds to the DC field Subject, for which DC recommends using a controlled vocabulary. |
doc_language | DocumentLanguage | 1 | Language. Contains one of the supported languages, for example Dutch, English, French or German. A DC field. |
doc_modified | DateTime | 1 | Modification date. A DC field. The latest date is used if a document yields multiple modification dates. |
doc_publisher | String (plain) | 1 | Publisher, for example a company name. A DC field. |
doc_title | String (analyzed) | 1 | Title. A DC field. A provider may supply a fallback title if no document title can be established. For example, the file provider uses the document file name and the web crawler provider uses the last segment of the URL. |
doc_url | Uri | 1 | Uniform resource locator used to open the original document, for example the URL of an HTML page. |
system_autocomplete | String (normalized) | n | Internal field used to send a set of autocomplete terms to the Elasticsearch index. One set of terms is built from each document. ES accumulates the term sets of all documents and uses them to support autocomplete while the user types search terms. |
system_document_size | LongInteger | 1 | Document size. This is the total number of characters in the relevant document fields. |
system_guid | Guid | 1 | System-assigned globally unique identifier. |
system_location | String (normalized) | 1 | Provider-specific specification of the document location. A file system provider may, for example, use this field for the file system path of the document. |
system_number | LongInteger | 1 | System-assigned number. This number is both the key in the database document registration table esDocuments and the document number in the Elasticsearch index. |
system_phrase_suggester | String (plain) | 1 | Internal field used to send a phrase suggester text to the Elasticsearch index. The text is built from the document. ES accumulates the texts of all documents and uses them to support "did you mean" suggestions after a search by the user. |
system_provider_code | String (normalized) | 1 | Code of the provider type used for the discovery and indexing of the document. |
system_source_code | String (normalized) | 1 | Code of the source that was responsible for the discovery and indexing of the document. |
system_usergroups | String (normalized) | n | Codes of authorization groups that are allowed to find and view the document. |
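Some of the rules in the table, namely the earliest creation date, the latest modification date and the URL-based fallback title, can be sketched in Python as follows. This is an illustration under stated assumptions, not the providers' actual code: all function names are hypothetical, and URL decoding of the last segment is an assumption that the table does not confirm.

```python
from datetime import datetime
from typing import Iterable, Optional
from urllib.parse import unquote, urlparse


def resolve_created(candidates: Iterable[datetime]) -> Optional[datetime]:
    """doc_created: the earliest of the creation dates a document yields."""
    dates = list(candidates)
    return min(dates) if dates else None


def resolve_modified(candidates: Iterable[datetime]) -> Optional[datetime]:
    """doc_modified: the latest of the modification dates a document yields."""
    dates = list(candidates)
    return max(dates) if dates else None


def fallback_title_from_url(url: str) -> Optional[str]:
    """doc_title fallback for a crawled page: the last segment of the URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Decoding percent-escapes is an assumption made for readability.
    return unquote(segments[-1]) if segments else None


dates = [datetime(2020, 5, 1), datetime(2019, 3, 12), datetime(2021, 1, 7)]
print(resolve_created(dates))
print(resolve_modified(dates))
print(fallback_title_from_url("https://example.com/docs/manual.html"))
```

A URL without a path segment, such as https://example.com/, yields no fallback title in this sketch; how the web crawler provider handles that case is not specified here.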