General information

Release 1.3 - ...

Full-text catalog

The Smartsite iXperion Faceted Search module extensively uses the database's full-text technology in order to gain rich search functionalities while preserving performance. SQL Server's FTS/iFTS uses a so-called full-text catalog as a core of its technology. Therefore, Faceted Search adds a full-text catalog SmsSearch during its setup.

SQL CopyCode image Copy Code
CREATE FULLTEXT CATALOG SmsSearch
WITH ACCENT_SENSITIVITY = OFF
AUTHORIZATION dbo;

Mind that, when omitting WITH ACCENT_SENSITIVITY = OFF, SQL Server lets the accent sensitivity handlings depend on the used collation. Typically this collation has _AS_ in its name which literally means Accent Sensitive. Especially for languages like English and Dutch, this behavior usually is not desired.

Predicates and functions

Simply put, a predicate is a comparison operator in the WHERE clause of a SQL statement. In SQL Server FTS, predicates take a format similar to the SQL EXISTS predicate, with no explicit comparison operators required.

SQL Server FTS provides two search predicates, both based on SearchSQL, which is an extension to the SQL language promoted by the ISO and ANSI committees. These predicates are the following:

  • The CONTAINS predicate provides access to phrase search, proximity search, strict search, and advanced query capability.
  • The FREETEXT predicate provides fuzzy search and basic query capabilities.

SQL CopyCode image Copy Code
/* Simple FTS/iFTS queries using predicates */
SELECT Nr, Title FROM vwtsContentPUB WHERE FREETEXT (*, N'Smartsite iXperion');
SELECT Nr, Title FROM vwtsContentPUB WHERE CONTAINS (*, N'Seneca and Smartsite');

The functionality of the FREETEXT and CONTAINS predicates are also exposed by the rowset functions, FREETEXTTABLE and CONTAINSTABLE. These are complementary functions that accept parameters including a search phrase. Unlike their predicate counterparts, these functions return rowsets consisting of two columns: a key column named KEY and a rank column named RANK. The rowsets returned by these functions can be returned to the client application or used server-side to join against the source table (or another related table). You can sort the results in descending order to push the most relevant results to the top of the result set.

SQL CopyCode image Copy Code
/* Simple FTS/iFTS query using (rowset) functions */
SELECT c.Nr, c.Title, k.[RANK] FROM vwtsContentPUB c
INNER JOIN FREETEXTTABLE (vwTsContentPUB, *, N'Smartsite iXperion') k ON c.Nr = k.[KEY]
ORDER BY k.[RANK] DESC;
SQL CopyCode image Copy Code
/* Simple FTS/iFTS query using (rowset) functions */
SELECT c.Nr, c.Title, k.[RANK] FROM vwtsContentPUB c
INNER JOIN CONTAINSTABLE (vwTsContentPUB, *, N'Seneca and Smartsite') k ON c.Nr = k.[KEY]
ORDER BY k.[RANK] DESC;

If you return the rank, you’ll see that it’s a value between 0 and 1000. The rank value is derived from a statistical method that attempts to assign a value to the relevance of each result. The higher the rank, the more relevant your result; 1000 indicates a perfect match. SQL Server iFTS uses a formula known as the Jaccard coefficient in the rank calculation.

Unicode search phrases

Search phrases should always be declared as Unicode (nvarchar, nchar). When you don’t declare them as Unicode, SQL Server implicitly converts them to Unicode, so it’s a good idea to explicitly declare them as Unicode from a performance perspective. Declaring them as Unicode also helps prevent parameter sniffing. Furthermore, with the trend toward globalization, restricting your full-text-indexed columns to specific non-Unicode collations can prove short-sighted and result in the need for a drastic and expensive system overhaul down the line.