Hi folks, my name is Umair Khan. I am on the Configuration Manager support team here at Microsoft and I wanted to take a minute to go through transaction based replication between sites in Configuration Manager 2007. Keep in mind that what I’ve written below assumes that you already know the basic flow of Site to Site communication so you may want to review this before reading further:
Site to site communication in System Center Configuration Manager 2007 (ConfigMgr) uses two types of replication. The first is transaction based replication and the other is non-transaction based replication. Replication keeps every site supplied with the latest data, and it gives the receiving site a way to reject old data.
1. Transaction based replication
For this type of replication, Replication Manager creates an .RPT file in the replmgr\ready folder.
Objects that have transaction based replication:
Section name as in sitecode.TRS file
Site control files
SDM Package Information
Individual CI information
2. Non-transaction based replication
For this type of replication, Replication Manager creates an .RPL file in the replmgr\ready folder. Items not listed in the transaction based table above use non-transaction based replication.
Basic Flow of Transaction based Replication:
Every site has a replication ID located in the registry at the following location:
When creating a replication job, the sending site marks it with a number based on the replication ID, and the replication ID is then incremented by 1.
When the file arrives, the receiving site verifies the number attached to the file against the number stored locally in the replmgr\history file known as sitecode.TRS.
· If the number is less than the number stored in the sitecode.TRS file for the corresponding object then the file is rejected.
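The check described above can be sketched in Python. This is only a toy model under stated assumptions, not how Replication Manager is actually implemented: the `history` dictionary stands in for one sitecode.TRS file, and the name `should_accept` and the (object type, object ID) key are hypothetical. The text only says that a smaller number is rejected, so accepting an equal number here is an assumption.

```python
def should_accept(history, object_type, object_id, transaction_id):
    """Return True if an incoming replication file passes the check.

    history maps (object_type, object_id) to the last recorded
    transaction ID, mimicking one sitecode.TRS file (hypothetical model).
    """
    stored = history.get((object_type, object_id))
    # Reject only when the incoming number is lower than the stored one.
    if stored is not None and transaction_id < stored:
        return False
    # Record the accepted number so later files are compared against it.
    history[(object_type, object_id)] = transaction_id
    return True


# Numbers match the sample rejection message in this post.
history = {("SITECTRLCT1SRCSITE", "CS1"): 1113}
print(should_accept(history, "SITECTRLCT1SRCSITE", "CS1", 1112))  # False
print(should_accept(history, "SITECTRLCT1SRCSITE", "CS1", 1114))  # True
```

After the second call the stored value becomes 1114, which is the number the next file for that object would be compared against.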
A sample rejection message from the replmgr.log on the receiving site would look something like this:
Replication file C:\Program Files\Microsoft Configuration Manager\inboxes\replmgr.box\incoming\1goq1xsi.RPT has an old transaction ID (Object Type = SITECTRLCT1SRCSITE, Object ID = CS1, Transaction ID = 1112), the current transaction ID is 1113, delete it.
Here the number in the TRS file is 1113, which is greater than the number attached to the file (1112), hence the file is rejected.
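If you need to pull these numbers out of many such lines, a regular expression built from the sample message can help. The sketch below assumes the message text always follows the wording of the sample above; the pattern and the helper name `parse_rejection` are my own, not part of ConfigMgr.

```python
import re

# Pattern built from the sample rejection message in this post.
REJECTION = re.compile(
    r"Replication file (?P<path>.+?) has an old transaction ID "
    r"\(Object Type = (?P<object_type>[^,]+), "
    r"Object ID = (?P<object_id>[^,]+), "
    r"Transaction ID = (?P<received>\d+)\), "
    r"the current transaction ID is (?P<current>\d+)"
)


def parse_rejection(line):
    """Return the fields of a rejection line, or None if it does not match."""
    match = REJECTION.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["received"] = int(fields["received"])
    fields["current"] = int(fields["current"])
    return fields


sample = (
    r"Replication file C:\Program Files\Microsoft Configuration Manager"
    r"\inboxes\replmgr.box\incoming\1goq1xsi.RPT has an old transaction ID "
    "(Object Type = SITECTRLCT1SRCSITE, Object ID = CS1, Transaction ID = 1112), "
    "the current transaction ID is 1113, delete it."
)
info = parse_rejection(sample)
print(info["object_type"], info["object_id"], info["received"], info["current"])
```

A file is rejected whenever `received` is lower than `current`, so comparing the two fields shows how far behind the rejected file was.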
I am taking an example of the hierarchy as shown below (which is also the hierarchy in my lab):
Assuming this type of hierarchy means that the PS1 site has the following files in replmgr.box\history: CS1.TRS and SS1.TRS.
I am making a change in the properties of the child primary (PS1) site from the central site console (say by adding a comment).
1. An .RPT file will be created at the CS1 site in the replmgr\ready folder and will be stamped with the number stored in the registry at HKLM\Software\Microsoft\SMS\Components\SMS_REPLICATION_MANAGER\Transaction ID.
2. It will follow the normal site to site process. The despooler will receive the .PCK file (don’t confuse this with a package .PCK, since every file is converted to a .PCK at the receiving site) and the .SNI file, verify the signature, and then move the file to replmgr\incoming as an .RPT file.
3. The replmgr of the PS1 site will compare the number attached to the file with the CS1.TRS file located in the replmgr.box\history folder. As this is a site control file replication, the section that will be checked in the CS1.TRS file is [SITECTRLCT1SRCSITE]:
This is the snapshot of the CS1.TRS file in the PS1 site. So if the received file has an attached value that is less than 1112 it will be rejected. Also notice the scroll bar for this file to get an idea as to the size.
To verify that we have the latest number in the CS1.TRS file of the PS1 site, we can check the Replication ID registry value (at the path mentioned previously) on the CS1 site. A snapshot of this is shown below:
We see that the registry has the number 1113 (incremented by 1) at the CS1 site. This also means that the next transaction based replication to any other site will contain the number 1113.
Important note: Taking this example of the hierarchy, we cannot have a number for any stored object in the CS1.TRS file for the PS1 site that is greater than the number 1113 (assuming the site is working of course).
Format of a .TRS File
A .TRS file contains many sections. The basic format is shown below:
Here, the number that is assigned to each of the objects is the replication ID registry number of the sending site at the time when the object was last replicated (the last modified change in the object).
An example of a CS1.TRS file from the PS1 site:
Considering an example from the [CIOBJHANDLER] section, we can say that the last time the object BB727F46-94A3-4D23-9704-DDE66A3674BB replicated to the PS1 site was when the Replication ID of the CS1 site was 163.
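To read such a file programmatically, something like the following could work. Note the big assumption: the sample file is not reproduced in this text, so the sketch assumes an INI-style layout (bracketed section names followed by `ObjectID=number` lines). Verify that against a real .TRS file before relying on it. The helper names `load_trs` and `highest_transaction_id` are hypothetical, and the two entries in `sample` reuse the numbers mentioned in this post.

```python
import configparser


def load_trs(text):
    """Parse TRS-style text into {section: {object_id: transaction_id}}.

    Assumes an INI-like layout, which is an assumption of this sketch.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep object IDs exactly as written
    parser.read_string(text)
    return {
        section: {obj: int(value) for obj, value in parser[section].items()}
        for section in parser.sections()
    }


def highest_transaction_id(trs):
    """Return the largest number stored anywhere in the parsed file."""
    return max(value for objects in trs.values() for value in objects.values())


sample = """
[SITECTRLCT1SRCSITE]
CS1=1112

[CIOBJHANDLER]
BB727F46-94A3-4D23-9704-DDE66A3674BB=163
"""
trs = load_trs(sample)
print(trs["CIOBJHANDLER"]["BB727F46-94A3-4D23-9704-DDE66A3674BB"])  # 163
print(highest_transaction_id(trs))  # 1112
```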
Note: All of the questions below assume the previously mentioned hierarchy, to make them easier to follow.
1. Is it necessary to have the .TRS file in the receiving site?
· Yes. It’s required for storing the serial number data for the objects when they are received from the sender site.
2. What if I delete the CS1.TRS file from the PS1 site? Will site to site replication from CS1 to PS1 be broken, or will the CS1.TRS file be recreated? If it’s recreated, then from where and how?
· Yes, a new CS1.TRS file will be created on the PS1 site, but it will contain only the data that is replicated from that point on. For example, my earlier CS1.TRS file at the PS1 site was about 4MB since it contained information about all the packages, CIs and advertisements, whereas the newly created file is only a few bytes and contains only the [SITECTRLCT1SRCSITE] section, because the replication that recreated it was a site control file replication.
The first thing to note is that the CS1.TRS file on the PS1 site grows gradually as time passes. This is because of the changes made in the CS1 site, such as packages created, advertisements, CIs synched and so on.
Whenever an object is replicated from CS1, it is given the Replication ID of the CS1 site at the time of replication. At any point in time, the objects stored in the CS1.TRS file therefore carry various serial numbers depending on when they were synched, but the largest of these numbers will always be less than the Replication ID at the central site.
3. What are the consequences of deleting the CS1.TRS file from the PS1 site?
· After deleting a CS1.TRS file on the PS1 site server, the first transaction-based replication job received from the CS1 site server will be processed and its transaction ID will be recorded in a new CS1.TRS file on the PS1 site. This might result in replication jobs being processed out of order, as the processed replication job may or may not be the first job sent to the PS1 site server after the file was deleted (we no longer have the information needed to compare against the latest ID). If replication jobs are processed out of order, valid replication jobs can be discarded.
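A small sketch makes the difference visible. It models the history file as a dictionary and uses a hypothetical stale job numbered 1100; the function name `process` and all numbers other than 1113 are invented for illustration.

```python
def process(history, key, transaction_id):
    """Apply the receiving-site check to one job and report the outcome."""
    stored = history.get(key)
    if stored is not None and transaction_id < stored:
        return "rejected"
    history[key] = transaction_id
    return "accepted"


key = ("SITECTRLCT1SRCSITE", "CS1")
stale_job = 1100  # hypothetical old job that is still in transit

intact = {key: 1113}  # CS1.TRS as it was before deletion
recreated = {}        # CS1.TRS deleted, so there is nothing to compare against

print(process(intact, key, stale_job))     # rejected
print(process(recreated, key, stale_job))  # accepted
print(recreated[key])                      # 1100
```

With the intact file the stale job is thrown away, but after the deletion the same job is accepted without any comparison, and its number becomes the new baseline for that object.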
4. The new CS1.TRS file contains only the new information, so what about the previous information? How will the other objects be checked when they are replicated?
· The answer is that the previous data is lost, but this will not affect the flow of communication between the sites. As explained in the previous question, when an object arrives that is not present in CS1.TRS, it is inserted with the Replication ID that was attached to it, and no verification check is made because the CS1.TRS file has nothing to compare against. Only when the same object is replicated again (because the object changes in the CS1 site) is the attached number checked against the value in the CS1.TRS file. If it is greater, the file is accepted and CS1.TRS is updated with the latest value. This means that even if the file is deleted, it will gradually grow again as objects are replicated down.
5. I have restored my PS1 site and now I find that the files are rejected by the CS1 site and the SS1 site. What can I do to make the PS1 site work again?
· The first thing to note is that if I restore the PS1 site, the incoming connections to the PS1 site will not be rejected, as the values of the objects in the CS1.TRS or SS1.TRS files will still be older than the current Replication ID registry values of the CS1 and SS1 sites. The communication that would be hampered is the outbound connections (e.g. connections to the CS1 and the SS1 site). This is because the objects stored in the PS1.TRS file in the CS1 site and the SS1 site will have a higher value than the current Replication ID registry value of the PS1 site (as it was restored). Thus, any files replicated from the PS1 site to CS1 or SS1 will be rejected.
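The restore scenario can be played through with a toy model. Everything here is a sketch under assumptions: the `Site` class, its method names, and the numbers 5000, 4000 and 1000 are hypothetical, and only the stamp-then-increment and reject-if-lower rules come from the text above.

```python
class Site:
    """Toy model of a site: a Transaction ID counter plus per-sender history."""

    def __init__(self, code, transaction_id):
        self.code = code
        self.transaction_id = transaction_id  # models the registry value
        self.history = {}  # sender code -> {object: last accepted ID}

    def send(self, obj):
        """Stamp a job with the current ID, then increment the counter."""
        job = (self.code, obj, self.transaction_id)
        self.transaction_id += 1
        return job

    def receive(self, job):
        """Accept the job unless its ID is lower than the recorded one."""
        sender, obj, tid = job
        seen = self.history.setdefault(sender, {})
        if obj in seen and tid < seen[obj]:
            return False
        seen[obj] = tid
        return True


cs1 = Site("CS1", 1113)
ps1 = Site("PS1", 5000)
before_restore = cs1.receive(ps1.send("site control"))  # CS1 records 5000

# PS1 is restored from a backup taken when its counter was 4000
# and its CS1 history held an older value.
ps1.transaction_id = 4000
ps1.history = {"CS1": {"site control": 1000}}

outbound = cs1.receive(ps1.send("site control"))  # 4000 < 5000
inbound = ps1.receive(cs1.send("site control"))   # 1113 >= 1000
print(before_restore, outbound, inbound)  # True False True
```

Inbound replication keeps working because the restored history is older than what CS1 sends, while outbound jobs are rejected until PS1's counter is higher than what CS1 has recorded.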
To resolve this situation we can do one of two things:
The first option is to delete the PS1.TRS files from all of the sites PS1 is connected to. Here the connected sites are CS1 and SS1. So how does this fix the problem? Because the PS1.TRS file is deleted from both the CS1 and SS1 sites, a new PS1.TRS file is created on both sites. Whenever an object is replicated to CS1 or SS1 from the PS1 site, an entry is created with the current value of the Replication ID, and verification checks are only made from the second replication of the same object onwards.
Why is this not a good solution?
We already know the consequences of deleting the .TRS file, as explained in question 3. Also, if the PS1 site has many secondary sites then it’s tedious to go to every site and delete the PS1.TRS file from the replmgr\history folder.
The second option is to change the Replication ID registry value in the PS1 site to a value that is higher than the value of any object in the PS1.TRS file of any site connected to the PS1 site. Making the value larger will cause newly replicated objects to carry a number that is greater than the value stored in the PS1.TRS file on any of those sites (CS1, SS1).
But here comes the million dollar question: how will I know which number to select so that it is greater than any stored value for any object in the PS1.TRS files on the CS1 and SS1 sites? This also becomes impractical when there are many sites, since you can’t open each PS1.TRS file and look for the highest number.
So what can you do in such a scenario? Typical methods to arrive at a number for this registry value are as follows:
When recovering sites restored with an SMS Backup or ConfigMgr backup, multiply the number of days the site was down by 1,000, and then add this to the current value of the Transaction ID.
When you recover sites without an SMS Backup or ConfigMgr Backup, if there are only a few sites in the hierarchy, open the \SMS\Inboxes\Replmgr.box\history\sitecode.trs files on the recovered site's parent and all child sites and look for the highest number. Add 20 to this number. If there are many sites in the hierarchy, open a 5 to 10 percent random sample of \SMS\Inboxes\Replmgr.box\history\sitecode.trs files on other sites, looking for the highest number. Use both parent and child sites as references. Make sure that your sample sites have connected to the recovering site as recently as any other sites; check the client agent time in the resource explorer on these computers, as they are also clients. The highest number found in the .TRS files should be doubled or increased by 1 million, whichever makes a smaller increase.
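The arithmetic behind these rules of thumb is simple enough to write down. The function names below are my own and the input numbers are made up; the formulas just restate the guidance above.

```python
def id_after_backup_restore(current_id, days_down):
    """Backup-based recovery: add 1,000 for each day the site was down."""
    return current_id + days_down * 1000


def id_from_few_sites(highest_seen):
    """Few sites, no backup: highest number found in the .TRS files plus 20."""
    return highest_seen + 20


def id_from_sample(highest_seen):
    """Many sites, no backup: double the number or add 1 million,
    whichever makes the smaller increase."""
    return highest_seen + min(highest_seen, 1_000_000)


print(id_after_backup_restore(4000, 3))  # 7000
print(id_from_few_sites(5000))           # 5020
print(id_from_sample(5000))              # 10000
print(id_from_sample(2_500_000))         # 3500000
```

Doubling is the smaller increase whenever the highest number found is below 1 million, so adding 1 million only comes into play for very large transaction IDs.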
Reference for questions 3 and 5:
Hope this helps!
Umair Khan | System Center Configuration Manager Support Engineer