Data Distribution Summary

How is Distribution Accomplished

Architecture

The techniques described in this book have some specific architectural requirements. The first is that there is a central database that is the collection point and arbiter for all updates as well as being the distribution point for database updates. Updates to the other databases in the enterprise are considered provisional until the updated record is returned to the remote database from the central database. Conversely, any update that is made to the central database can override any pending updates from the remote databases.

Figure 1
Data Distribution Architecture

The database update processes occur at discrete times and are not continuous. They are instituted by the remote databases and not by the central database. This will allow mobile databases to not be continuously connected to the network.

Any update that is applied to the central database must not overwrite another update. This means that the row that is to be updated must have been the row that is currently in the central database. The methodology for doing this will be discussed further in the section on Central Database Updates.

All rows in the database will have a row version. This will be discussed in the section on Central Database Updates.

Remote databases will connect to the central database using a guaranteed delivery protocol such as TCP. The process will be performed in a series of synchronized steps. These steps are:

Data Update Collection
Update Submission
Central Database Updates
Completed Update Collection
Update Reception
Remote Data Update
Collision Detection

Each of these steps will be discussed following the section on Data Update Detection.

Data Update Detection

This part of the process is the only part that will actually be done by the applications. There are two basic methods for detecting updates, marking the rows with an update flag or writing the updates to an update table. If the database supports triggers, this can be done by the database itself. If the database doesn’t support triggers, the application will have to be modified to perform this function.

There are two methods for using a database update table. The first would be to create a table that contains columns for a table id or table name, primary key value, the date/time of the update and the database
transaction id of the update transaction. The second method would be to have the trigger or application code create a row in the update table that has columns for the table id or table name, primary key value, date/time of the update, the transaction id and the update to be sent to the central database.

Data Update Detection must also be performed in the central database. If update marking is performed, the marking is done in the central database by updating the row version. If a separate update table is maintained, the row version is still updated and the row version is added to the update table.

Data Update Collection

This is the first step of the synchronization process. The synchronization process will query the database to collect the updates that have been made since the last synchronization and place them in a file to be submitted to the central database. If updates are collected in an update table, the process reads this table
and writes the formatted updates to a file. If only the key values of the updated records are stored, this would also involve the reading of the data records and formatting the results.

If updated records were marked with an update flag, the process would have to read all of the required tables in the database to find the rows that have been updated. Care must be taken to avoid attempting to order these updates so that any referential constraints in the database are not violated. One method of doing this would be to flag the updated rows with the transaction id of the update transaction when the rows are updated. Another method that has been used is to order the table scan so that updates to required data is read before the data that is required is read. This would require that the process know what order the tables are to be scanned.

Other data should also be placed in the update file. The schema version of the database should be provided. This is to insure that the data is compatible with the other databases in the enterprise. It is hoped that the application that uses the database has a method for reading the schema version from the database that I have found to be the rule. If the application supports customization of the schema, the user-defined portion of the schema version must be maintained in a consistent manner.

The update should also indicate what the latest update was to the local database. This is to create an update that will synchronize the local database with the central database without having row updates sent that were already submitted.

Update Submission

This is the second step in the synchronization process. This step involves sending the update to the central database server and then notifying the central database server that the update is there.

Using ftp to send the data to the central database server and then connecting to the server through a known socket and sending a message with the filename of the update could do this. It is advised that the enterprise use the trusted host facility of TCP/IP to insure that only those systems that are part of the enterprise be allowed to connect to the ftp directory and the communications socket. The filename should be made up of the hostname and an update identifier from the remote system to insure that the remote systems will not overwrite their own or other remote systems updates.

Central Database Updates

The update process on the central database server opens the specified update file and verifies that the schema version of the remote database is compatible with the schema version of the central database. If this is not the case, the update process will notify the remote system and close the connection. The remote synchronization process should alert its manager that the database schema versions are incompatible. In this event, the manager should determine if any application and database updates are required.

The update process will now apply the updates to the central database. The updates will only be completed if the primary key value and the row version of the update record and the record in the database are the same. Note that these updates will create updates in the database update table or mark the rows as being updated. The updated rows will also be provided with a new row version.

Completed Update Collection

The update process will now use the last update information from the remote database update file and collect the pertinent rows from the database. As with the remote database update collection, this is done by reading the database update table, the database update table and the updated data from the database or by scanning the database.

Update Reception

The file is now sent to the remote database system using either a known directory or by reading the location from the socket or the update file. When transmission is complete, the remote system is notified through the socket that the update is available and what the file name is. The socket can now be closed.

Remote Data Update

The synchronization process applies the updates to the remote database. When this is done, the update table in the database is cleared of the updates that were sent to the central database.

Collision Detection

A collision would occur when to databases attempt to update the same row in the database. The principle that is applied is that the first update to the central database will win and super-cede later updates to that row if they don’t have the proper row version. The alternative is called Last in Wins. The problem with last in wins is that old data can be applied to the central database losing valid updates. Consider the case of the laptop user. An FST is working on a trouble ticket and goes on vacation but doesn’t run the synchronization process on the laptop. Another FST takes over the call, resolves the issue and closes the trouble ticket.
The first FST returns from vacation, performs the synchronization process and likely will overwrite the second FST’s data. The trouble ticket could be re-opened in the database.

When the update is applied to the central database, the row version of the row submitted by the remote database must be the same as the row version in the central database. Two databases that synchronized at similar times would have the same row version in their database. The first update to the central database would update the row version when it is applied and the second update would fail. The user that applied the failed update should be notified that the update failed and be provided with a mechanism to reapply the update.

Collision detection can occur at two places in the process. The first would be at the central database update. If an update fails this would be detected by the number of rows updated being zero, the update process could format a collision record for the completed update file or when the updates are applied to the remote database, the updates can be checked to determine if there was a collision. The update from the central database would be applied to the remote database and the update record would be flagged as being a collision or the original record would be moved to a collision table.

Delete Processing

Fortunately, most applications do not allow the deletion of data from the database. If deletes are allowed, the processing to support deletes can be very tricky.

The first concern is what to do if the delete fails. If the application does not perform cascaded deletes, then the row that would have been deleted would be reapplied to the remote database. This should be considered a collision and has to be detected but the remote database would be complete.

The real problem occurs when cascaded deletes are supported. In this situation, the user deletes a row from the database and all of the related rows are deleted. In the CRM space a trouble ticket would likely have several status change notes, user notes and other rows that are related back to the trouble call. If a user deletes the trouble call, the cascading delete will also delete these other records. In this situation, the update file will contain a delete. When the central database is updated, this delete will delete the trouble call and all of the related records. Consider if a user on another remote database updated one of the related records.
When the parent is deleted, that update is deleted which violates the principle of First in Wins.

The solution is to record the primary key and row version values of the parent and related rows in the remote database and verify that all these rows exist at the specified row version and that there are not other related rows in the central database. If this is not the case, the remote database has to be notified that the delete failed and the data restored in the database. This requires that the rows either be marked as having been deleted without actually deleting them or saving the data in a pending delete table. Marking the rows as having been deleted would be easier for the update processing, but the application may have to be modified to not display these records.

The other concern is when a remote database updates a record. The update in the central database will fail with no records updated or a referential integrity constraint violation. The remote database should then receive the deletes in it’s next update file from the central database but also needs to know that the update failed so that the offending row can be deleted and a collision detected.

Larry Wakeman - Consulting and Hosting