~~Title: Replication Concepts~~
~~NOTOC~~

<html><font color=#990000 size="+2"><b>Change Capture + Replication Concepts</b></font></html>

In database parlance, [[wp>Change_data_capture|change data capture]] (CDC) is a set of software design patterns used to determine (and track) data that has changed, so that action can be taken using the changed data.

Most database management systems manage a transaction log that records changes made to the database contents and to metadata. By scanning and interpreting the contents of the database transaction log one can capture the changes made to the database in a non-intrusive manner.

=== Transaction Logging ===

Using transaction logs for change data capture offers a challenge in that the structure, contents and use of a transaction log is specific to a database management system. Unlike data access, no standard exists for transaction logs. Most database management systems do not document the internal format of their transaction logs, although some provide programmatic interfaces to their transaction logs (for example: Oracle, DB2, SQL/MP, SQL/MX and SQL Server 2008).

Other challenges in using transaction logs for change data capture include:

  * Coordinating the reading of the transaction logs and the archiving of log files (database management software typically archives log files off-line on a regular basis).
  * Translating between the physical storage formats recorded in the transaction logs and the logical formats typically expected by database users (e.g., some transaction logs save only minimal buffer differences that are not directly useful to change consumers).
  * Dealing with changes to the format of the transaction logs between versions of the database management system.
  * Eliminating uncommitted changes that the database wrote to the transaction log and later rolled back.
  * Dealing with changes to the metadata of tables in the database.

CDC solutions based on transaction log files have distinct advantages that include:

  * Minimal impact on the database (even more so if one uses log shipping to process the logs on a dedicated host).
  * No need for programmatic changes to the applications that use the database.
  * Low latency in acquiring changes.
  * Transactional integrity: log scanning can produce a change stream that replays the original transactions in the order they were committed. Such a change stream includes changes made to all tables participating in the captured transaction.
  * No need to change the database schema.
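The transactional-integrity point above can be made concrete with a small sketch. The log format below (transaction-tagged records with begin/change/commit/rollback markers) is invented for illustration; real transaction logs are far more complex, but the filtering principle is the same: buffer each transaction's changes and emit them only at commit time, in commit order.

```python
# Illustrative sketch: producing a committed-order change stream from a
# transaction log. The record format here is hypothetical.
log = [
    ("T1", "begin", None),
    ("T2", "begin", None),
    ("T1", "change", "row A"),
    ("T2", "change", "row B"),
    ("T2", "commit", None),
    ("T1", "rollback", None),   # T1's change must never reach consumers
]

def committed_changes(records):
    open_txns = {}              # transaction id -> buffered changes
    stream = []
    for txn, kind, payload in records:
        if kind == "begin":
            open_txns[txn] = []
        elif kind == "change":
            open_txns[txn].append(payload)
        elif kind == "commit":
            stream.extend(open_txns.pop(txn))   # emit at commit time
        elif kind == "rollback":
            open_txns.pop(txn)                  # discard uncommitted work
    return stream

print(committed_changes(log))  # ['row B']
```

Note that T2's change is emitted even though T1 started first: ordering follows commit, not begin, and the rolled-back T1 never appears in the stream.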

=== Change Capture Triggers ===

StreamScape’s Reactive Data Fabric™ does not make use of log-based change capture, preferring a low-impact combination of triggers and shared disk instead. Changes are stored externally to the source database.

Log-based change capture requires substantial processing power and often imposes additional limitations on source databases, such as restrictions on log truncation, keys or unique data values. Log replication can become a costly solution, as it requires additional hardware and specialists to set up and manage the environment.

StreamScape makes use of specialized triggers that push data changes into external log files (Journal File Tables), offloading replication processing to the data broker and minimizing impact on source systems. Triggers support a much broader set of technologies than log reading and are much easier to set up.
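The trigger-based capture pattern can be sketched with standard SQL triggers. The example below uses SQLite via Python for a self-contained demonstration; the table and column names are illustrative, not StreamScape-specific, and a real Journal File Table would live outside the source database rather than alongside it.

```python
import sqlite3

# Minimal sketch of trigger-based change capture: AFTER triggers on a
# hypothetical "accounts" table push every change into a journal table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);

-- Journal table: one row per change, separate from application tables.
CREATE TABLE journal (
    seq        INTEGER PRIMARY KEY AUTOINCREMENT,
    op         TEXT,      -- 'I', 'U' or 'D'
    row_id     INTEGER,
    balance    INTEGER,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Triggers fire inside the source transaction, so journal rows commit
-- or roll back together with the change that produced them.
CREATE TRIGGER accounts_ins AFTER INSERT ON accounts BEGIN
    INSERT INTO journal (op, row_id, balance) VALUES ('I', NEW.id, NEW.balance);
END;
CREATE TRIGGER accounts_upd AFTER UPDATE ON accounts BEGIN
    INSERT INTO journal (op, row_id, balance) VALUES ('U', NEW.id, NEW.balance);
END;
CREATE TRIGGER accounts_del AFTER DELETE ON accounts BEGIN
    INSERT INTO journal (op, row_id, balance) VALUES ('D', OLD.id, OLD.balance);
END;
""")

conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.execute("UPDATE accounts SET balance = 150 WHERE id = 1")
conn.execute("DELETE FROM accounts WHERE id = 1")
conn.commit()

ops = [row[0] for row in conn.execute("SELECT op FROM journal ORDER BY seq")]
print(ops)  # ['I', 'U', 'D']
```

Because the journal writes happen in the same transaction as the application's own statements, no change can appear in the journal without having been committed to the source table, and vice versa.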

Typically, database backups are used to store and retrieve historic information. A database backup, however, is primarily a safeguard against data loss, rather than an effective way to retrieve ready-to-use historic information.

A (full) database backup is only a snapshot of the data at a specific point in time: we know the state of the data at each snapshot, but nothing about what happened between them. Information in database backups is discrete in time.

With a log trigger, the captured information is continuous rather than discrete: we can reconstruct the exact state of the data at any point in time, limited only by the granularity of the DATETIME data type of the RDBMS used.
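Point-in-time reconstruction from a trigger-maintained change log can be sketched as a simple replay. The journal format below (operation, value, timestamp) is hypothetical and tracks a single row for brevity; a real journal would key entries by table and row identity.

```python
from datetime import datetime

# Hypothetical journal for one row: (op, new value, change timestamp).
journal = [
    ("I", 100, datetime(2024, 1, 1, 9, 0)),
    ("U", 150, datetime(2024, 1, 1, 12, 0)),
    ("D", None, datetime(2024, 1, 2, 8, 0)),
]

def state_at(entries, when):
    """Replay journal entries up to `when`; None means the row did not exist."""
    value = None
    for op, new_value, ts in entries:
        if ts > when:
            break
        value = new_value if op in ("I", "U") else None
    return value

print(state_at(journal, datetime(2024, 1, 1, 10, 0)))  # 100
print(state_at(journal, datetime(2024, 1, 1, 13, 0)))  # 150
print(state_at(journal, datetime(2024, 1, 3, 0, 0)))   # None (row deleted)
```

Unlike a chain of backups, which could only answer "what was the value at snapshot time?", the replay answers the question for any instant, down to the timestamp granularity the journal records.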

Advantages of this approach include:

  * It is simple.
  * It is an extension of database capabilities; it works with features available in common RDBMSs.
  * It is automatic: once created, it works with no further human intervention.
  * It does not require deep knowledge of the tables in the database or of the data model.
  * No changes to existing application code are required.
  * No changes to the current tables are required, because the log data for a table is stored in a separate one.
  * It works for both programmed and ad hoc statements.
  * Only changes (INSERT, UPDATE and DELETE operations) are recorded, so the growth rate of the history tables is proportional to the rate of change.
  * The trigger need not be applied to every table in the database; it can be applied to selected tables, or to selected columns of a table.

=== Journal File Tables ===

Journal File Tables are log files that contain data for table change operations in a transportable, platform-independent format. Journal File Tables are a critical component of StreamScape’s change capture mechanism. They can reside on the source or target server but exist outside of the database to ensure heterogeneity, simplicity and minimal impact to source data systems.

Journal tables share data access and permissions. They are opened for read and write simultaneously by the source system and by StreamScape’s data broker, allowing source systems to populate the contents in a transacted fashion using triggers. A data fabric node continuously reads the change file and provides full SQL access to the transaction log content. As such, users can configure Journal File Tables to be replication sources, or write triggers that react to data changes and execute outside the source database. This architecture minimizes source system impact, as no additional query or database I/O is needed to enable the change capture process.

Journal Tables move captured data to a replication queue managed by a broker node for delivery to target system(s). In the event of an outage at the target, the queue contains the most recent data up to the point of the outage and will attempt re-delivery once target systems are online again. Flexible recovery and restart options simplify exception handling automation.
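The queue-and-redeliver behavior described above can be sketched as follows. The `Target` and `ReplicationQueue` classes are stand-ins invented for the example, not StreamScape APIs; they show only the retention-and-retry pattern.

```python
import collections

class Target:
    """Stand-in for a target system that may be temporarily unreachable."""
    def __init__(self):
        self.online = False
        self.applied = []

    def apply(self, change):
        if not self.online:
            raise ConnectionError("target unavailable")
        self.applied.append(change)

class ReplicationQueue:
    """Broker-side queue: retains changes across target outages."""
    def __init__(self, target):
        self.pending = collections.deque()
        self.target = target

    def enqueue(self, change):
        self.pending.append(change)

    def deliver(self):
        # Deliver in commit order; on failure, stop without losing data.
        while self.pending:
            try:
                self.target.apply(self.pending[0])
            except ConnectionError:
                return False          # retry later; queue is retained
            self.pending.popleft()    # drop only after successful apply
        return True

target = Target()
queue = ReplicationQueue(target)
queue.enqueue({"op": "I", "id": 1})
queue.enqueue({"op": "U", "id": 1})

queue.deliver()        # target offline: nothing applied, nothing lost
target.online = True   # outage ends
queue.deliver()        # queued changes re-delivered, in order
print(target.applied)  # [{'op': 'I', 'id': 1}, {'op': 'U', 'id': 1}]
```

A change is removed from the queue only after the target acknowledges it, which is what makes re-delivery after an outage safe: at worst a change is delivered twice, never zero times.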
