Standard fields

Smartsite 7.9 - ...

Purpose

An ES (Elastic Search) document includes a number of standard fields. These fields are predefined and are sufficiently complete for a production search solution.

Standard fields consist of:

  • doc fields, such as doc_title and doc_authors, which describe the document itself.
  • system fields, such as system_number and system_location, which are added by the system for document registration and processing.

Standard fields

The table below specifies the following information.

  • The type String is followed by (analyzed) if word breaking and language-specific stemming are applied to the field content, which usually modifies the field content. The type is followed by (normalized) if the field content is only normalized: the content is preserved apart from modifications such as conversion to lowercase or ASCII folding (mapping é to e). The type is followed by (plain) if the content is used as is, without normalization.
  • The indicator 1 means that the field can have at most one value, for example a title. The indicator n means that the field can have multiple values, as is the case for keywords.
  • A field marked as a DC field is a Dublin Core field.
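The three String variants and the 1/n indicator correspond closely to Elasticsearch mapping concepts. The Python sketch below shows one possible correspondence; it is an assumption for illustration, not the mapping Smartsite actually generates. The normalizer name `folding`, the use of the built-in `english` analyzer (a real system would pick the analyzer per document language), and the helpers `string_mapping` and `check_cardinality` are hypothetical. The `text` and `keyword` field types, the `normalizer` setting and the `lowercase` and `asciifolding` token filters are standard Elasticsearch features.

```python
import json

# Hypothetical index settings: a custom normalizer that lowercases and
# ASCII-folds keyword values (é -> e), as described for "(normalized)".
SETTINGS = {
    "analysis": {
        "normalizer": {
            "folding": {
                "type": "custom",
                "filter": ["lowercase", "asciifolding"],
            }
        }
    }
}

# Hypothetical: a few of the fields marked 1 (single-valued) in the table.
SINGLE_VALUED = {"doc_title", "doc_body", "doc_created", "doc_publisher"}


def string_mapping(variant: str, analyzer: str = "english") -> dict:
    """Return an Elasticsearch field mapping for one String variant (sketch)."""
    if variant == "analyzed":
        # Word breaking and stemming: indexed terms differ from the raw text.
        return {"type": "text", "analyzer": analyzer}
    if variant == "normalized":
        # One term per value, lowercased and ASCII-folded.
        return {"type": "keyword", "normalizer": "folding"}
    if variant == "plain":
        # One term per value, indexed exactly as supplied.
        return {"type": "keyword"}
    raise ValueError(f"unknown String variant: {variant!r}")


def check_cardinality(doc: dict) -> None:
    """Reject documents that give a single-valued field more than one value."""
    for name in SINGLE_VALUED:
        value = doc.get(name)
        if isinstance(value, list) and len(value) > 1:
            raise ValueError(f"{name} allows at most one value")


MAPPINGS = {
    "properties": {
        "doc_title": string_mapping("analyzed"),
        "doc_authors": string_mapping("normalized"),
        "doc_publisher": string_mapping("plain"),
    }
}

if __name__ == "__main__":
    print(json.dumps({"settings": SETTINGS, "mappings": MAPPINGS}, indent=2))
```

Elasticsearch has no dedicated array type and accepts multiple values for any field, so a rule such as "at most one value" would have to be enforced before a document is sent to the index, which is what `check_cardinality` attempts here.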
Field name | Type | 1/n | Description
doc_authors | String (normalized) | n | Authors. Corresponds to the DC field Creator.
doc_body | String (analyzed) | 1 | Main text of the document.
doc_created | DateTime | 1 | Creation date. A DC field. If a document yields multiple creation dates, the earliest date is used.
doc_description | String (analyzed) | 1 | Description intended for the reader of the document, as opposed to a description intended for the content editor. A DC field.
doc_fileformat | FileFormat | 1 | File format, for example pdf or html. A DC field.
doc_identifier | String (normalized) | 1 | Identifier of the document as assigned by the publisher or authors. A DC field.
doc_keywords | String (analyzed) | n | Keywords. Corresponds to the DC field Subject, for which DC recommends using a controlled vocabulary.
doc_language | DocumentLanguage | 1 | Language. Contains one of the supported languages, for example Dutch, English, French or German. A DC field.
doc_modified | DateTime | 1 | Modification date. A DC field. If a document yields multiple modification dates, the latest date is used.
doc_publisher | String (plain) | 1 | Publisher, for example a company name. A DC field.
doc_title | String (analyzed) | 1 | Title. A DC field. A provider may supply a fallback title if no document title can be established. For example, the file provider uses the document file name and the web crawler provider uses the last segment of the URL.
doc_url | Uri | 1 | Uniform resource locator used to open the original document, for example the URL of an HTML page.
system_autocomplete | String (normalized) | n | Internal field used to send a set of autocomplete terms to the Elastic Search index. Each document contributes one set of terms. ES accumulates the term sets of all documents and uses them to offer autocomplete suggestions while the user types search terms.
system_document_size | LongInteger | 1 | Document size: the total number of characters in the relevant document fields.
system_guid | Guid | 1 | System-assigned globally unique identifier.
system_location | String (normalized) | 1 | Provider-specific document location. A file system provider may, for example, use this field for the file system path of the document.
system_number | LongInteger | 1 | System-assigned number. This number is both the key in the database document registration table esDocuments and the document number in the Elastic Search index.
system_phrase_suggester | String (plain) | 1 | Internal field used to send a phrase suggester text to the Elastic Search index. Each document contributes one text. ES accumulates the texts of all documents and uses them to offer "did you mean" suggestions after a user's search.
system_provider_code | String (normalized) | 1 | Code of the provider type used for the discovery and indexing of the document.
system_source_code | String (normalized) | 1 | Code of the source that was responsible for the discovery and indexing of the document.
system_usergroups | String (normalized) | n | Codes of the authorization groups that are allowed to find and view the document.
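Several rules in the table (earliest creation date, latest modification date, fallback title, document size) can be expressed as small functions. The Python sketch below is hypothetical: the function names, the provider codes `file` and `web`, and the set of fields counted toward system_document_size are assumptions, since the table does not name them.

```python
from datetime import datetime
from pathlib import PureWindowsPath
from urllib.parse import unquote, urlparse

# Assumed set of "relevant" fields for system_document_size (hypothetical).
SIZE_FIELDS = ("doc_title", "doc_description", "doc_body", "doc_keywords")


def resolve_dates(created: list[datetime], modified: list[datetime]) -> dict:
    """Keep the earliest creation date and the latest modification date."""
    result = {}
    if created:
        result["doc_created"] = min(created)
    if modified:
        result["doc_modified"] = max(modified)
    return result


def fallback_title(location: str, provider: str) -> str:
    """Derive a title when the document itself does not supply one."""
    if provider == "file":
        # PureWindowsPath accepts both backslash and slash as separators.
        return PureWindowsPath(location).name
    if provider == "web":
        parsed = urlparse(location)
        segment = parsed.path.rstrip("/").rsplit("/", 1)[-1]
        # For a bare URL such as https://example.com/ use the host name.
        return unquote(segment) or parsed.netloc
    raise ValueError(f"no fallback title rule for provider {provider!r}")


def document_size(doc: dict) -> int:
    """Sum the character counts of the assumed relevant fields."""
    total = 0
    for name in SIZE_FIELDS:
        value = doc.get(name)
        if value is None:
            continue
        # Multi-valued (n) fields arrive as lists; count every value.
        values = value if isinstance(value, list) else [value]
        total += sum(len(v) for v in values)
    return total
```

For example, a document without a title found at `C:\docs\report.pdf` by the file provider would get the fallback title `report.pdf`, and a page at `https://example.com/news/annual-report/` would get `annual-report`.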