Nested documents

Smartsite 8.0 - ...

Purpose

A web page, a document, a database row or some other unit of information can be used to create an ES document, which can then be searched and found. Nested documents add support for an ES document, the root document, with further ES documents nested below it. Nested documents can in turn contain further levels of nested documents. Both the root document and its nested documents are searched, but a hit always returns the root document: nested documents cannot be found individually.

Use case

A typical use case is the bestuurlijk informatiesysteem (BIS, administrative information system) or raadsinformatiesysteem (RIS, council information system). Such a system is usually organized as a hierarchy of vergaderingen, agendapunten and documenten: meetings, each meeting having agenda items, and each agenda item having associated office documents.

A RIS provider creates one root ES document per meeting. The root document contains a nested ES document per agenda item, and each agenda item document contains a nested ES document per office document. An office document is not necessarily included in its entirety; it may be represented by its title and other metadata.

ES searches all levels of this hierarchy. A hit on the meeting, on an agenda item or on an office document returns the meeting.
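The hit semantics described above can be illustrated with a small in-memory simulation. This is not Elasticsearch and not the Smartsite searcher: it is a sketch that assumes nested documents are stored under a field named `nested` (the name used in the index mapping example on this page), and the identifiers, titles and the helpers `matches` and `search` are hypothetical. A simple case-insensitive substring test stands in for real full-text matching.

```python
# Hypothetical RIS root document: a meeting with agenda items and office documents.
meeting = {
    "doc_identifier": "meeting-2024-03-12",  # hypothetical identifier
    "doc_title": "Council meeting 12 March",
    "nested": [
        {
            "doc_title": "Agenda item: budget 2025",
            "nested": [
                {"doc_title": "Budget proposal 2025"},
                {"doc_title": "Advice of the audit committee"},
            ],
        },
        {"doc_title": "Agenda item: cycling route"},
    ],
}


def matches(doc: dict, term: str) -> bool:
    """True if term occurs in doc_title at this level or at any nested level."""
    if term.lower() in doc.get("doc_title", "").lower():
        return True
    return any(matches(child, term) for child in doc.get("nested", []))


def search(roots: list[dict], term: str) -> list[str]:
    """Return the identifiers of the root documents that contain a hit.

    Nested documents are never returned on their own: a hit at any level
    yields the root document.
    """
    return [root["doc_identifier"] for root in roots if matches(root, term)]


print(search([meeting], "audit"))    # hit in an office document -> the meeting
print(search([meeting], "cycling"))  # hit in an agenda item -> the meeting
```

Both searches print `['meeting-2024-03-12']`, since whichever level contains the hit, only the root document is reported.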

Nested fields

Nested documents are supported by means of nested fields. The root document consists of

  • doc fields such as doc_title and doc_identifier,
  • extra fields as required for the search solution, and
  • system fields such as system_provider_code.

Nested documents contain only the fields needed for searching:

  • doc fields such as doc_title, but not doc_identifier: that field identifies the root document, including its nested documents,
  • extra fields,
  • no system fields: these are used to manage the root document, including its nested documents.
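A provider could check these rules before handing a document to ES. The sketch below is a hypothetical helper, not part of Smartsite; it assumes that nested documents are stored under the field `nested` (as in the index mapping example) and that system fields can be recognized by the prefix `system_`, which is only inferred from the example `system_provider_code`. Levels are counted as in the next section: the root document is level 1.

```python
def check_nested(doc: dict, level: int = 1) -> list[str]:
    """Return rule violations found in the nested documents below doc.

    Assumptions (hypothetical, for illustration): nested documents are stored
    under the field "nested", and system fields start with "system_".
    """
    problems = []
    for child in doc.get("nested", []):
        child_level = level + 1
        for name in child:
            if name == "doc_identifier":
                problems.append(f"level {child_level}: doc_identifier belongs on the root only")
            elif name.startswith("system_"):
                problems.append(f"level {child_level}: system field {name} belongs on the root only")
        # Recurse into deeper levels of nesting.
        problems.extend(check_nested(child, child_level))
    return problems


root = {
    "doc_identifier": "meeting-1",  # hypothetical values
    "doc_title": "Council meeting",
    "system_provider_code": "RIS",
    "nested": [
        {"doc_title": "Agenda item", "system_provider_code": "RIS"},
    ],
}
print(check_nested(root))
```

Here the agenda item wrongly carries a system field, so the check reports one violation at level 2.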

Silo configuration, field mapping, index creation

Enterprise Search uses strict field mapping, so a document with fields that are not in the mapping is rejected. An Elasticsearch index must therefore be prepared for the maximum number of levels used by any provider that adds data to the index, for example:

  • 1 when nested documents are not used,
  • 3 for the use case above.
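The rule "the maximum over all providers" can be made concrete by counting levels in sample documents. In practice the number of levels comes from the silo configuration, not from the data; this sketch only shows the arithmetic. The helpers `depth` and `required_levels`, the provider names and the sample documents are hypothetical, and nested documents are again assumed to be stored under the field `nested`.

```python
def depth(doc: dict) -> int:
    """Levels in one document: 1 for a root document without nested documents."""
    return 1 + max((depth(child) for child in doc.get("nested", [])), default=0)


def required_levels(docs_per_provider: dict[str, list[dict]]) -> int:
    """Levels a shared index must support: the maximum over all providers."""
    return max(
        (depth(doc) for docs in docs_per_provider.values() for doc in docs),
        default=1,
    )


samples = {
    "web": [{"doc_title": "Home page"}],  # one root document per web page
    "ris": [{"doc_title": "Council meeting",
             "nested": [{"doc_title": "Agenda item",
                         "nested": [{"doc_title": "Budget proposal"}]}]}],
}
print(required_levels(samples))  # 3
```

A web page provider alone would need 1 level; adding the RIS provider to the same index raises the requirement to 3.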

A silo configuration may look like this:

<data ...>
  <entry>
    <languagefilterenabled>true</languagefilterenabled>
    <docfields>
      <list>
        <item name="doc_title" datatype="text" />
        <item name="doc_keywords" datatype="text" />
        <item name="doc_body" datatype="text" />
      </list>
    </docfields>
    <extrafields>
      <list>
        ...
      </list>
    </extrafields>
    <completion_thesaurus>KWD</completion_thesaurus>
    <entry>
      <docfields>
        <list>
          <item name="doc_title" datatype="text" />
        </list>
      </docfields>
      <extrafields>
        <list>
          ...
        </list>
      </extrafields>
      <entry>
        <docfields>
          <list>
            <item name="doc_title" datatype="text" />
          </list>
        </docfields>
      </entry>
    </entry>
  </entry>
</data>

The outermost <entry> element describes the root document; it is also the only level needed when nested documents are not used. The <entry> element nested inside it describes the first level of nested documents, and so on.
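The number of levels follows directly from how deeply the <entry> elements are nested. The sketch below reads them with Python's standard `xml.etree.ElementTree`. It is not how Smartsite processes the configuration; the helper `entry_levels` is hypothetical, and the sample is simplified: the attributes of <data> (elided above) and the extra fields are left out.

```python
import xml.etree.ElementTree as ET

# Simplified silo configuration: only the <entry> nesting and doc fields matter here.
CONFIG = """
<data>
  <entry>
    <docfields><list>
      <item name="doc_title" datatype="text" />
      <item name="doc_keywords" datatype="text" />
      <item name="doc_body" datatype="text" />
    </list></docfields>
    <entry>
      <docfields><list><item name="doc_title" datatype="text" /></list></docfields>
      <entry>
        <docfields><list><item name="doc_title" datatype="text" /></list></docfields>
      </entry>
    </entry>
  </entry>
</data>
"""


def entry_levels(config_xml: str) -> list[list[str]]:
    """Return the doc field names per level; index 0 is the root document."""
    levels = []
    entry = ET.fromstring(config_xml.strip()).find("entry")
    while entry is not None:
        names = [item.get("name") for item in entry.findall("docfields/list/item")]
        levels.append(names)
        entry = entry.find("entry")  # descend to the next level, if any
    return levels


levels = entry_levels(CONFIG)
print(len(levels))  # 3
print(levels)
```

Each iteration moves one <entry> deeper, so the length of the result is the number of levels the index must support.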

Increasing the number of levels requires the Elasticsearch index to be recreated.

The silo configuration is used to create an Elasticsearch index whose field mappings include the configured number of levels of nested field mappings. This may look like this:

"mappings" : {
  "dynamic" : "strict",
  "properties" : {
    "doc_title" : {
      "type" : "text",
      ...
    },
    ...,
    "nested" : {
      "type" : "nested",
      "properties" : {
        "doc_title" : {
          "type" : "text",
          ...
        },
        "nested" : {
          "type" : "nested",
          "properties" : {
            "doc_title" : {
              "type" : "text",
              ...
            }
          }
        }
      }
    },
    "system_provider_code" : {
      "type" : "keyword",
      ...
    },
    ...
  }
}
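The shape of this mapping can be generated from the per-level field lists of the silo configuration. The sketch below is a hypothetical helper (`build_mapping`), not the Smartsite implementation: it maps every doc field to `text` and every system field to `keyword`, and it leaves out analyzers and the other settings elided as "..." in the mapping above.

```python
import json


def build_mapping(levels: list[list[str]], system_fields: list[str]) -> dict:
    """Build a strict mapping with one "nested" field per additional level.

    levels[0] lists the root document's fields, levels[1] those of the first
    nested level, and so on. System fields are added to the root level only.
    """
    def properties(depth: int) -> dict:
        props = {name: {"type": "text"} for name in levels[depth]}
        if depth + 1 < len(levels):
            props["nested"] = {"type": "nested", "properties": properties(depth + 1)}
        return props

    root = properties(0)
    for name in system_fields:
        root[name] = {"type": "keyword"}
    return {"mappings": {"dynamic": "strict", "properties": root}}


mapping = build_mapping([["doc_title"], ["doc_title"], ["doc_title"]], ["system_provider_code"])
print(json.dumps(mapping, indent=2))
```

Recursion keeps the structure uniform: each level is a `nested` field whose `properties` contain that level's fields plus, if another level follows, the next `nested` field.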

Provider

Using nested documents requires a provider that produces ES documents containing nested documents; for the use case above, that is a RIS provider. Several providers produce just one root ES document per unit of information, for example one per web page.
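As an illustration of what a RIS provider produces, here is a sketch that turns a meeting into one root ES document with two nested levels. It does not reflect the actual Smartsite provider API: the classes `Meeting`, `AgendaItem` and `OfficeDocument`, the function `to_es_document` and the provider code are hypothetical. Nested documents are assumed to be stored under the field `nested`, as in the index mapping example.

```python
from dataclasses import dataclass, field


@dataclass
class OfficeDocument:
    title: str


@dataclass
class AgendaItem:
    title: str
    documents: list[OfficeDocument] = field(default_factory=list)


@dataclass
class Meeting:
    identifier: str
    title: str
    agenda_items: list[AgendaItem] = field(default_factory=list)


def to_es_document(meeting: Meeting, provider_code: str) -> dict:
    """Turn one meeting into one root ES document with two nested levels.

    Only the root carries doc_identifier and system fields; agenda items and
    office documents contribute their title (metadata, not full content).
    """
    return {
        "doc_identifier": meeting.identifier,
        "doc_title": meeting.title,
        "system_provider_code": provider_code,
        "nested": [
            {
                "doc_title": item.title,
                "nested": [{"doc_title": doc.title} for doc in item.documents],
            }
            for item in meeting.agenda_items
        ],
    }


meeting = Meeting("meeting-1", "Council meeting",
                  [AgendaItem("Budget 2025", [OfficeDocument("Budget proposal")])])
print(to_es_document(meeting, "RIS"))
```

Because the whole hierarchy ends up in a single root document, a hit on any office document or agenda item returns the meeting.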

Searching

When an Elasticsearch index is searched, the searcher automatically creates a query containing the appropriate nested queries, that is, queries that access the nested fields. The number of nesting levels depends on the index being queried and is established from the field mapping of that index. If multiple indexes are queried, the maximum number of nesting levels of the involved indexes is used.
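Such a query can be assembled roughly as follows. This is a sketch, not the Smartsite searcher: the helpers `mapping_levels` and `build_query` are hypothetical, only one field (`doc_title`) is searched, and the properties are assumed to come from the `mappings.properties` part of the Elasticsearch get mapping response. Setting `ignore_unmapped` keeps a nested query from failing on an index with fewer levels; whether the searcher does this is an assumption.

```python
import json


def mapping_levels(properties: dict) -> int:
    """Levels in a mapping: 1 plus the depth of the "nested" field, if any."""
    nested = properties.get("nested")
    if nested is None or nested.get("type") != "nested":
        return 1
    return 1 + mapping_levels(nested.get("properties", {}))


def build_query(text: str, field_name: str, levels: int) -> dict:
    """Match text in field_name on the root and on every nested level.

    Nested paths and field names use the full dotted path, for example
    "nested.nested.doc_title" for the third level.
    """
    def clauses(path: str, remaining: int) -> list[dict]:
        prefix = f"{path}." if path else ""
        result = [{"match": {prefix + field_name: text}}]
        if remaining > 1:
            child = prefix + "nested"
            result.append({
                "nested": {
                    "path": child,
                    "query": {"bool": {"should": clauses(child, remaining - 1)}},
                    "ignore_unmapped": True,  # assumption: tolerate shallower indexes
                }
            })
        return result

    return {"query": {"bool": {"should": clauses("", levels)}}}


web_props = {"doc_title": {"type": "text"}}
ris_props = {
    "doc_title": {"type": "text"},
    "nested": {"type": "nested", "properties": {
        "doc_title": {"type": "text"},
        "nested": {"type": "nested", "properties": {"doc_title": {"type": "text"}}},
    }},
}
levels = max(mapping_levels(p) for p in [web_props, ris_props])  # 3
query = build_query("budget", "doc_title", levels)
print(json.dumps(query, indent=2))
```

Each level adds one `nested` clause wrapping the clauses of the level below, so a match at any depth makes the root document a hit.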