SQL Server 2008 Transactional Replication

Article
03/26/2009

Here are some notes on “SQL Server 2008 Transactional Replication” I took while attending an advanced class on SQL Server taught by Greg Low (from https://sqlblog.com/blogs/greg_low/ and https://www.sqldownunder.com/).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without some pre-requisite knowledge and the right context. Reading the post from top to bottom will help.

Transactional Replication

Agents: Snapshot, Log Reader (like CDC), Distribution
Incremental flow: Into distribution DB, reliable store/forward queue, minimal latency (seconds)
Can run continuously or scheduled intervals
Can have updatable subscriptions
Consider – Would asynchronous database mirroring fit the bill?
Components: Published, Distributor, Subscriber

Initial Dataset

Subscribers need to initialize: schema, tables
Created by snapshot agent, applied by distribution agent
Store in snapshot folder, can be done via backup/restore
Script it: look at the decisions the wizards are making on your behalf
Pay attention to the part about articles and options
Locking: places shared locks all tables published
Careful – Default location of snapshot folder is on C:SQLData. UNC path instead?

Log Reader

Runs on distributor
Runs continuously but can also be scheduled
Scans the transaction logs for changes (data, schema)
Only copies commited transactions
Calls sp_rpldone to mark where it’s up to
If it falls behind, your log will grow

Distribution

Runs on Distributor for push subscriptions, Subscriber for pull
Copies transactions from distribution database to subscriber

DDL replication

Introduced in SQL Server 2005
Supported: ALTER for TABLE, VIEW, PROCEDURE, FUNCTION, TRIGGER (no DDL)
Not supported: Adding/dropping/altering indexes, alter XML columns
Careful – Mixed versions of SQL Servers
Careful – Name your constraints to avoid differences between publisher and subscriber

Stored Procedure Execution Replication

You can publish a stored procedure
Can have a positive effect on performance
Scenario: Batch load
Scenario: Send specific changes as a stored procedure
Consider the intent. Maybe sue SQL Server Broker queues instead

Partitioning of Data

Transactional Replication works best where data is cleanly partitioned
Scenario: Point-of-sale data partitioned by store ID
Bi-directional options in particular benefit from this

Demo - Creating distributor

Use wizard, generate to script, look at script
Sp_adddistributor - @password
Sp_adddistributiondb - @max_distretention, @security_mode
Not a scary script at all

Demo - Creating new publication

Use wizard, transactional, select table, article properties
Article properties – What do you need to copy? Defaults may not work for you, like copy collation.
Table filtering, create immediately, schedule, agent security (credentials for snapshot and log reader agents)
Save script, look at the script
sp_addpublication has lots of options, some of them not in the wizard
sp_addpublication_snapshot
sp_addarticle, what does @schema_option mean?
See https://msdn.microsoft.com/en-us/library/ms147887.aspx
Careful – Identity columns – NOT FOR REPLICATION means “turn identity column into regular integer”
Careful – some data types are not supported
SQL Server 2000 – sp_repladdcolum, sp_repldropcolum – no longer needed in SQL Server 2005
Note – Where is the option to do peer-to-peer?

Demo – Publication Properties

Subscriptions, Articles
Snapshot – Run additional scripts
FTP Snapshot – configuration options – need credentials
Subscription options
- Allow peer-to-peer subscriptions?
- Conflict detection – Only in SQL Server 2008, define originator ID,
- Careful – Default for “peer-to-peer”, “continue after conflict” is false. Assumes partitioning will avoid conflict.
After that, “Configure Peer-to-Peer Topology” option appears
Configure Peer-to-Peer Topology – add additional nodes, how the agent connects
Careful – Use “refresh topology” on this window if it’s not updated

Updatable Subscriptions

Immediate is OK, queued is deprecated
Republishing is not supported
Adds msrepl_tran_version column to data
Not used so much

Considerations

Transaction log space – keeps stuff around for replication
Distribution db – consider retention period (3 days might not be enough)
Careful – Ensure regular snapshot creation, avoid having to retaining more data, longer initialize
Careful – Published table must have PK
Careful – Default trigger of XACT_ABORT can cause entire batch to be aborted (use skip errors?)
Careful – Avoid explicit transaction in triggers at subscribers
Partial updates on LOB columns are supported (fragment of column is replicated)
Careful – There are limitations around LOB column. Should you even do this?
Peer-to-peer – partition data
Conflict detection – in SQL Server 2008 – make sure it does not happen with partitioning
Thresholds and Alerts – Distributor alerts (expiration, re-initialization, failed data validation)

Backup

Needs special attention, cannot restore to alternate server
Careful – If you renamed the server, run sp_dropserver, sp_addserver
Backup publication, distribution, subscription (if can’t re-initalize), master, msdb
“Sync with backup” option – log not trunctated until backed up (distribution, publisher) – careful!
In restore, can use sp_replrestart to resynchronize with publisher and distribution metadata
If all else fail, tear down and reconfigure (specify that subscribers already have data)

Complex scenarios – In Books Online

Updating subscribers
Restore of distributor
Peer to Peer
Restore of Subscriber

Replication Scripting

It call all be scripted
Do it as part of DR planning, automating steps
More options than the GUI options, as mentioned
Use option to “generate scripts…” in the GUI

Replication Monitor

Helps with common questions:
Is it all healthy? Why is it slow? How far behind? Why agent not running?
Uses caching, there’s a SQL Agent job to refresh the data every few seconds (4-30 secs)

Performance

Baseline – latency, throughput, concurrency, sync duration, resources
Look at what it looks like when it’s running normally. Don’t wait until it’s slow.
Set expectations right. Know what you can get out of the system.

Validation

Row count only, row cont and binary checksum (SQL 7 only), row count and checksum
Must stop all update activity at subscriber, adds CPU load
Cannot validate with column filters, column offsets, LOBs
Careful – watch for false positives – timestamp turns into binary, for instance

SQL Server 2008 Transactional Replication

Additional resources