Synchronizing custom database tables using timestamps

Release 1.1 - ...

One of the more advanced possibilities within Outscaling, is the functionality which enables you to synchronize a custom database table between two cms instances (each having their own database) using an (Outscaling) Scheduled event. This article describes how to accomplish this. Some of the links within this article refer to the Smartsite 5 support site, so if you want to read all the background information, make sure you have an account for this site as well.

Concept

The example used in this article is a real-life use case. To publish (a selection of) the public summaries from our internal work items list to the iXperion community site, Outscaling has been configured to synchronize the custom table PublicWorkItemSummaries by way of creating a datasource.

Since the data included within this table isn't being managed using the cms manager, no events are generated whenever a record is added or updated. But no problem, Outscaling's concept of Scheduled events can be used to synchronize custom tables.

Furthermore, the primary keys of the records must be kept in sync (both ends must use the same primary key). This involves configuring SmartNumbering. And since we only wanted to synchronize changed records, the configuration specifies that a timestamp column must be used.

Each of these concepts will be discussed in detail in the remainder of this article.

Custom table

Outscaling is able to synchronize almost any table between cms instances, each having their own database. However, it is only capable of synchronizing records, the table itself must already exists. Also, schema changes to a table (e.g. adding/removing columns, changing column defaults, etc.) must be applied manually on all database instances. The table itself must, of course, have a (single) primary key column.

Any table for which a default datasource definition doesn't exists is considered a custom table. However, you don't need to create a datasource for every table. Cross-tables storing relations between two other tables don't require a datasource definition. The relations stored within this cross-table will be synchronized as part of the table/datasource to which this cross-table relates. For example, the tables UserGroupUsers and UserGroupsUserRoles do not have a corresponding datasource, and the relations stored within these tables are automatically synchronized when a particular UserGroup is synchronized.

Datasource

First of all, whenever you want Outscaling to synchronize records from a certain database table, you need to create a datasource for it. Without a datasource for a specific table, Outscaling will never synchronize records from that table.

An Outscaling datasource specifies which table must be synchronized (the datasource name), from which source server instance to which target server instance, and whether or not this should be allowed to take place automatically or if the datasource is intended to be used for manual or scheduled synchronization.

Datasource for PublicWorkItemSummaries

In this example, the source server is our internal Doc6 cms instance and the target server is the iXperion community site cms instance. The code field, which must be unique, is just to identify each datasource and is used later on in the definition of the scheduled event.

The records, in this case, must be pushed (Direction is set to push) from the source to the target server. Pull is the other option, in which case the source server will collect changed records from the target server and update its own database accordingly.

The update mode has been set to Scheduled, which is an indication for Outscaling that this datasource must exclusively be used in combination with a scheduled event.

When a record is received at the target server, Outscaling checks (again) if the specified table has a corresponding datasource. If not, it rejects the update request. So, when creating a datasource always make sure it exists on both ends, either by creating the definition manually on both ends or by configuring the table osDatasources to be synchronized automatically.

The Datasource configuration will be shown later on.
More background information about Datasources can be found here.

Scheduled Event

Whenever e.g. an user, usergroup or userrole is being added/edited, using the standard cms actions within Configure Security, an so-called DataEditor event is raised. These DataEditor events are the backbone for Outscaling's mechanism to automatically synchronize changed records. They act as a signal and tell Outscaling which record from which table must be synchronized. It's then the datasource configuration(s) which determines to which other cms instance(s) these changes must be send.

However, our custom table PublicWorkItemSummaries isn't being managed using an DataEditor action. In fact, it is directly filled and updated from another (internal) database using an SQL job. So, there are no DataEditor events to be used as signal.

Although it is possible to generate appropriate DataEditor events ourselves when a record changes (using a trigger), we've chosen to use the Outscaling concept of Schedules Events, in combination with the use of a timestamp.

An Scheduled Event is a specific Outscaling mechanism and in fact just holds configuration information about the underlying event which must be added to Outscaling's eventqueue (the table osEventQueue) periodically. This may be one of the standard Smarsite events, as well as another specific Outscaling event.

You can create an Scheduled Event directly from the Outscaling Console or you can use the Configure Scheduled Events action from the Outscaling menu (recommended).

Below, screenshots are shown from the actual created event, after using the Configure Scheduled Events action.

Event Information

Scheduled event - event information

In this case, we want to raise the Outscaling.SynchronizeDatasoure event periodically. This event, which can be seen as an Outscaling internal processing instruction, tells Outscaling's background component to start synchronizing records from the datasource as specified within the event data (the third tab).

The source and target server must be specified separately here (the datasource definition includes the same config), since Outscaling must be able to query which scheduled events needs to be processed on a particular cms instance.

Scheduling

Scheduled event - scheduling

The Scheduling tab displays the settings for when the event should be added to the Outscaling eventqueue. This can be done on a regular interval (e.g. every 2 hours) or on a specific time. You can use Schedule Mode None to (temporarily) disable an Scheduled Event. The Last Run Date displays the date/time the Scheduled Event has been added to the Outscaling eventqueue for the last time, which of course (approximately) equals the last time the specified datasource has been synchronized.

Event Data

Scheduled Event - Event Data

The Event Data tab contains information about the target instance (the eventqueue table doesn't have a target instance field, so this information must be included within the eventdata as well) and the code of the datasource that must be used when executing the Outscaling.SynchronizeDatasource event.

SmartNumbering

Now everything is in place to let Outscaling synchronize the specified table PublicWorkItemSummaries every two hours. However, the primary key field, which is usually an identity field as well, must be kept in sync. (Within this example, we're dealing with SQL Server on both ends.)

When on the source server instance an record has been added, that record must get the same primary key on the target server instance (for referential integrity at the least). When an identity is in place on the primary key field, this is not possible (at least not without reverting to use identity inserts, but that's a mechanism not being used within Smartsite).

This is where SmartNumbering comes in. SmartNumbering is in fact a kind of replacement for the identity. When SmartNumbering is configured for a specific table, an instead of insert trigger is created which inspects the record being added. When it does contain a value for the primary key, this value is used, otherwise a new number is determined based on the numberranges defined in the SmartNumberRanges table.

So, in a scenario like this, the identity of the primary key field on the target server instance's database must be removed and SmartNumbering must be configured for the table in question.

Timestamp

Almost there, records will now keep the same primary key on both ends, which also makes it a lot easier to update the appropriate record on the target server instance when a record on the source server instance has been updated.

But now, we also wanted to synchronize only those records which have been either newly added or updated since the last time synchronization of the datasource took place.

Outscaling includes a mechanism to use a Timestamp column (or a similar replacement in Oracle using varbinary), which can be specified within the datasource configuration. In the datasource configuration example shown below, a timestampcolumn with the name timestamp has been specified.

Datasource Configuration

A column using the SQL Server data type timestamp is being managed by SQL Server itself. The value in the timestamp column is updated every time a row containing a timestamp column is inserted or updated. It contains a unique sequential binary number which can be used to track which records have been added or updated.

When the datasource configuration specifies a timestampcolumn to be used, Outscaling keeps track of the latest/highest timestamp value used (it is stored within the osDatasources table itself in the column LastTimestamp).

The synchronize datasource mechanism uses this value to get the records that has changed since the last synchronization (ordered by the timestampcolumn). Then, each record is send to the target server instance, and if it succeeds adding/updating the record, the LastTimestamp column of the appropriate datasource is updated with the timestamp value of the record send. This way, LastTimestamp always contains the timestamp value of the latest record that has been send.

The next time the Outscaling.SynchronizeDatasource event is triggered, the process starts again. The LastTimestamp value of the datasource is used to get any records which have been added or updated since the last synchronization.

Note however, on initial setup you must supply an appropriate/valid value for the LastTimestamp column within the osDatasources table. You can just insert 0, but then every record within the table will be send to the target server instance within the first run.

When the table contains a lot of records, or you're certain the table on the target server instance is initially in sync, it is better to use the maximum value of the timestampcolumn as value for the LastTimestamp column within the associated datasource record.

Summary

As this article explains, Outscaling can be used to synchronize (almost) any database table. Which database table(s) must be synchronized to subscribed target server instance(s) and which not, is just a matter of Outscaling Datasources configuration.

Records within a database table which are being managed using the DataEditor are synchronized automatically (when configured accordingly). Other tables can be synchronized using the concept of Scheduled Events.

To keep primary keys in sync without intervention of identity configuration, SmartNumbering must be configured. And to limit the number of records to synchronize, Outscaling's timestamp support can be used to only synchronize those records that have actually been added or changed since the last time synchronization of a particular datasource took place.