Validating Replicated Data

Problems encountered during replication often occur because data at the Subscriber is not in synchronization with data at the Publisher. Microsoft® SQL Server™ 2000 replication can validate the replicated data at a Subscriber as the replication process is occurring to ensure that data at the Subscriber matches data at the Publisher.

You do not need to stop updates to the Publisher and wait for the Subscriber to become fully synchronized before testing that data has been received and applied correctly. You can validate the data in snapshot replication, transactional replication, or merge replication. Validation can be performed for specific subscriptions or for all subscriptions to a publication.

How Inline Data Validation Works

SQL Server validates data by calculating a rowcount and/or a checksum at the Publisher and then comparing those values to the rowcount and/or checksum calculated at the Subscriber. One value is calculated for the entire publication table and one value is calculated for the entire subscription table, but data in text or image columns is not included in the calculations.

While the calculations are performed, shared locks are placed temporarily on tables for which rowcounts or checksums are being run, but the calculations are completed quickly and the shared locks removed, usually in a matter of seconds.

When validating replicated data, consider the following:

Is the fact that validation failed really a problem? Some validation failures are explainable, and you may not want to reinitialize.
If the validation failure is an issue, consider the different options for synchronizing the data, including a full reinitialization, a partial reinitialization a previous state, or manually updating the data so that it is synchronized.

Validating Replicated Data for Transactional Replication

Validation can be performed on transactional replication, subscriptions that use immediate updating or queued updating, and on horizontal and vertical partitions of data.

You can choose any of the following methods for validation:

Rowcount only.
Rowcount and checksum.
Rowcount and binary checksum (this is available only for Subscribers running Microsoft SQL Server 2000).

You can configure validation using SQL Server Enterprise Manager or Transact-SQL system stored procedures. Regardless of which you use, when you run validation, stored procedures are executed at the Publisher. The stored procedure sp_publication_validation calls sp_article_validation for each article that is being validated, and sp_article_validation calls sp_table_validation for each table, which then generates the rowcount or checksum calculations. The sp_table_validation command is posted as a replication command to the Subscriber using the Log Reader Agent and Distribution Agent, and the calculations are then made at the Subscriber.

Note Subscribers running SQL Server 6.5 can use rowcount only validation, but not checksum validation. You can validate based on a binary checksum calculation if Subscribers are running SQL Server 2000.

You can validate replicated data on a schedule by creating a Transact-SQL job that calls sp_publication_validation or sp_article_validation.

Unless you are a member of the sysadmin or db_owner roles, you must have SELECT permissions on all columns of the base table used in the article (even if the article is vertically partitioned) in order to execute sp_publication_validation.

Validation with Checksums

When checksums are used, 32-bit redundancy check (CRC) occurs on a column-by-column basis rather than a CRC on the physical row on the data page. This allows the columns with the table to be in any order physically on the data page, but still compute to the same CRC for the row. Checksum validation can be used when there are row (horizontal) or column (vertical) filters on the publication. Because checksums can require large amounts of processor resources when validating a large data set, you may want to schedule validation to occur when there is the least activity on the servers used in replication.

Subscribers running SQL Server 7.0 use the checksum routines released in SQL Server 7.0, which generate CRC values that are different than those generated with SQL Server 2000. The checksum routines released in SQL Server 7.0 cannot validate vertical partitions, or logical table structures where column offsets differ (due to ALTER TABLE statements that DROP and ADD columns).

Setting the Rowcount_only Parameter

The @rowcount_only parameter is a smallint and accepts the following values.

Value	Description
0	Execute checksum functionality released with SQL Server 7.0.
1 (Default)	Execute a rowcount check only.
2	Execute checksum functionality released with SQL Server 2000.

Because Subscribers running SQL Server 7.0 will use this parameter as a bit data type, not a smallint, SQL Server will interpret the parameter as 'ON'. Setting the parameter to a value of 2 with a Subscriber running SQL Server 7.0 will result in a rowcount only validation at the Subscriber. If you need to run a checksum validation for a Subscriber running SQL Server 7.0, use the value of 0 for this parameter. Subscribers running SQL Server 2000 could use the same value (0), but the checksum functionality would have the SQL Server 7.0 limitations.

To validate transactional data using SQL Server Enterprise Manager

At the Distributor, expand Replication Monitor, expand Publishers, and then expand a specific Publisher.
Right-click a transactional publication, and then click Validate subscriptions.
Choose whether you want to validate all subscriptions or just specific subscriptions, and if you want to validate specific subscriptions, select those in the text box.
To choose the type of validation, click Validation Options.
Choose whether you want to compute a fast rowcount based on cached table information, compute an actual row count by querying the tables directly, or compute a fast row count and if differences are found, compute an actual row count.
You can also choose to enable Compare checksums to validate data, a binary checksum (if the Subscriber is running SQL Server 2000), and you can choose to stop the Distribution Agent after the validation has completed.

To validate transactional data using Transact-SQL system stored procedures

To validate all articles in a publication and specify rowcount only (the default) or checksum validation, execute sp_publication_validation. This will call sp_article_validation for each article in the publication.
To validate specific articles and specify rowcount only or checksum validation, execute sp_article_validation.

Validation and Immediate Updating

When using inline publication validation (sp_publication_validation) on immediate updating subscriptions, there is a period of time when a change on the Subscriber will cause the publication validation to fail. This occurs when a data change is made on the Subscriber after a publication validation has been run on the Publisher, but before the publication validation can be performed on the Subscriber.

With transactional replication (without updatable subscriptions), changes can be made only at the Publisher, so changes made to the Publisher after sp_publication_validation has been executed will be applied at the Subscriber after the validation is run on the Subscriber.

However, when using immediate updating subscriptions, data modifications can be made at the Subscriber. Any changes made at the Subscriber after validation was run on the Publisher are reflected immediately at the Subscriber. Validation will fail because the checksum and rowcount calculations were based on data in the publication table before changes were made at the Subscriber. To avoid this, stop all data modifications at the Subscriber during the validation process.

Considerations when Validating Replicated Data for Transactional Replication

The following are validation restrictions when using validation for transactional replication:

Checksum validations are not supported for transformable subscriptions because values are likely to be transformed between Publisher and Subscriber and checksum values would not be the same.
Rowcount validation is not supported for an article that is configured as a DTS horizontal partition because the filter criteria is saved as part of a DTS package, not in a view at the Publisher like replication filters.
Validation for replicated data to heterogeneous Subscribers is not supported.

Validation Failure and Alerts

If validation between data at the Publisher and data at the Subscriber fails, you can configure replication alerts to notify you of the failure (with a message sent through e-mail or to a pager) and you can have the subscriptions reinitialized automatically.

To configure automatic reinitialization of subscriptions that fail validation

At the Distributor, expand Replication Monitor, click Replication Alerts, right-click the Replication: Subscription has failed data validation alert, and then click Properties.
On the General tab, select the Enabled check box.
On the Response tab, select Execute job, and then in the drop down box, click Reinitialize subscriptions having data validation failures.
To send a reinitialize confirmation message to the event log, right-click the Replication: Subscription reinitialized after validation failure alert, and click Properties.
On the General tab, select the Enabled check box.

Validating Replicated Data for Merge Replication

Using SQL Server Enterprise Manager, you can choose to validate all subscriptions to a merge publication. Using Transact-SQL system stored procedures, you can validate all subscriptions to a merge publication or specified subscriptions.

You can choose any of the following methods for validation:

Rowcount only
Checksum

To request validation of replicated data at a merge Subscriber, you can use:

SQL Server Enterprise Manager, which allows you to validate all subscriptions to a publication.
The Merge Agent command line or the Merge Agent Command Prompt Utility specifying the –Validate parameter. If the Merge Agent is running in continuous mode, the –Validate parameter run at an agent command prompt will conduct validation until the -ValidationInterval value is reached. Validation will occur after the merge process is complete.
The sp_validatemergepublication Transact-SQL system stored procedure. This will conduct a publication-wide validation for which all subscriptions (push, pull, and anonymous) will be validated once each.
The sp_validatemergesubscription Transact-SQL system stored procedure, which runs validation once on the Merge Agent for the specified subscription.
Running the Merge Agent with the -Validate parameter causes SQL Server to temporarily lock the Subscriber tables to prevent further changes. SQL Server then computes either a rowcount or checksum of each replicated table at the Subscriber and at the Publisher. If there is a difference, SQL Server locks the discrepant table at the Publisher and any new data changes are downloaded to the Subscriber. After downloading is complete, SQL Server recalculates the rowcount or checksum at the Subscriber and Publisher and compares them again. After validation is complete, SQL Server removes all locks on Subscriber and Publisher tables.

You can validate your data on a regular schedule by adding -Validate to the Merge Agent profile at a specified time. Because inline validation may be time-consuming or may result in undesirable contention between the Publisher and Subscriber, you should schedule validation for a time when Publisher and Subscriber activity is at a minimum.
In case of merge validation failure, you can respond to the failure by using SQL Server Enterprise Manager to configure the replication alert named Replication: Subscriber has failed data validation so that you are notified of the failure or you can reinitialize the subscription to ensure that data at the Subscriber is in synchronization with data at the Publisher. Reinitializing the subscription should be performed with caution because it can be a resource-intensive process for the Publisher, Distributor and Subscribers, and users may not be able to update data while the initial snapshot is reapplied at Subscribers.

When validating merge replication, another option is to validate data, and if data is not converged, to conduct a partial reinitialization of the subscription. This partial reinitialization will return the Subscriber back to a previous state when data was in synchronization. Using the Validate and Resynchronize Subscription option in SQL Server Enterprise Manager or sp_resyncmergesubscription, you can resynchronize a merge subscription to a known validation state that you specify. This allows you to force convergence or synchronize the subscription database to a specific point in time, such as the last time there was a successful validation, or to a specified date. When resynchronizing a subscription using this method, the snapshot is not reapplied.

To validate merge data using SQL Server Enterprise Manager

At the Distributor, expand Replication Monitor, expand Publishers, and then expand a specific Publisher.
Right-click a merge publication, and then click Validate all subscriptions.
Choose whether you want to validate replicated data using rowcounts only, rowcounts and checksums, or rowcounts and comparing binary checksums (all Subscribers must be running SQL Server 2000 to use this option). Validation will occur the next time the Merge Agent runs with results displayed in Replication Monitor.

To validate and resynchronize subscriptions

Expand Replication Monitor, expand the Publishers folder, and then expand a registered Publisher. Right-click a publication, and then click Validate and Resynchronize Subscriptions.
Execute sp_resyncmergesubscription at the Publisher on the publication database or at the Subscriber on the subscription database.

To validate merge data using Transact-SQL system stored procedures

To mark all named and anonymous subscriptions for validation the next time the Merge Agent runs, execute sp_validatemergepublication.
To mark specific subscriptions for validation, execute sp_validatemergesubscription or sp_validatemergepullsubscription.

To validate merge data using a Merge Agent command line parameter

At the Distributor, expand Replication Monitor, click the Merge Agents folder, right-click an agent, and then click Agent Properties.
On the Steps tab, double click the Run agent step.
In the command text box, type –validate and specify 1 for rowcount-only validation, 2 for rowcount and checksum validation. Validation will occur the next time the Merge Agent runs and success or failure messages are logged in the Merge Agent History.
If you want to schedule validation, set the –ValidateInterval parameter on the Merge Agent command line to the number of minutes when you want the validation to occur (the default is to validate every 60 minutes).