Measuring Transactional Replication Latency Without Tracer Tokens
SQL Server 2005 introduced Tracer Tokens (SQL 2005 | SQL 2008), a new methodology for programmatically measuring replication latency in transactional replication. To measure latency with a tracer token, you simply insert a tracer token at the publisher. The replication process will trace the token as it moves through the steps of the process and report back how long it took for the token to reach the distributor and the subscriber.
Sounds great, right? Do you see the problem I see?
What’s wrong with tracer tokens?
There is an inherent problem with tracer tokens. In order to get the results back, replication has to be working well, or you have to wait a long time. For example, if the replication subscriber is 4 hours behind guess how long it will take for the tracer token to get from the distributor to the subscriber.
You guessed it: 4 hours!! If replication is latent, and you need to determine by how much, can you really afford to wait for 4 hours to find out? So tracer tokens aren’t really very effective when you need them most.
It’s All in the Distributor
There is no great alternative for tracking latency from Publisher to Distributor because you’d have to be able to read the transaction log to determine if there are transactions that have not been read by the log reader. You can calculate replication latency from Distributor to all Subscribers by querying the tracking tables in the distribution database.
The simplest way to check latency is simply to let the replication monitor procedures do the work for you and then pull the data from the replication monitor tables. To be sure that we’re getting current data, we must first refresh the monitor data by executing sp_replmonitorrefreshjob.
The Query
Just modify the query below to set your publisher server and publisher database and execute the query in the distribution database.
Declare @Publisher sysname, @PublisherDB sysname; -- Set Publisher server name Set @Publisher = 'Publisher server name'; -- Set Publisher database name Set @PublisherDB = 'Publisher database name'; -- Refresh replication monitor data Exec sys.sp_replmonitorrefreshjob @iterations = 1; With MaxXact (ServerName, PublisherDBID, XactSeqNo) As (Select S.name, DA.publisher_database_id, max(H.xact_seqno) From dbo.MSdistribution_history H with(nolock) Inner Join dbo.MSdistribution_agents DA with(nolock) On DA.id = H.agent_id Inner Join master.sys.servers S with(nolock) On S.server_id = DA.subscriber_id Where DA.publisher_db = @PublisherDB Group By S.name, DA.publisher_database_id) , OldestXact (ServerName, OldestEntryTime) As (Select MX.ServerName, Min(entry_time) From dbo.msrepl_transactions T with(nolock) Inner Join MaxXact MX On MX.XactSeqNo < T.xact_seqno And MX.PublisherDBID = T.publisher_database_id Group By MX.ServerName) Select [Replication Status] = Case MD.status When 1 Then 'Started' When 2 Then 'Succeeded' When 3 Then 'In progress' When 4 Then 'Idle' When 5 Then 'Retrying' When 6 Then 'Failed' End, Subscriber = SubString(MD.agent_name, Len(MD.publisher) + Len(MD.publisher_db) + Len(MD.publication) + 4, Charindex('-', MD.agent_name, Len(MD.publisher) + Len(MD.publisher_db) + Len(MD.publication) + 5) - (Len(MD.publisher) + Len(MD.publisher_db) + Len(MD.publication) + 4)), [Subscriber DB] = A.subscriber_db, [Publisher DB] = MD.publisher_db, Publisher = MD.publisher, [Current Latency (sec)] = MD.cur_latency, [Current Latency (hh:mm:ss)] = Right('00' + Cast(MD.cur_latency/3600 As varchar), 2) + ':' + Right('00' + Cast((MD.cur_latency%3600)/60 As varchar), 2) + ':' + Right('00' + Cast(MD.cur_latency%60 As varchar), 2), [Latency Threshold (min)] = Cast(T.value As Int), [Agent Last Stopped (sec)] = DateDiff(hour, agentstoptime, getdate()) - 1, [Agent Last Sync] = MD.last_distsync, [Last Entry TimeStamp] = OX.OldestEntryTime From dbo.MSreplication_monitordata MD with(nolock) Inner Join dbo.MSdistribution_agents A with(nolock) On A.id = MD.agent_id Inner Join dbo.MSpublicationthresholds T with(nolock) On T.publication_id = MD.publication_id And T.metric_id = 2 -- Latency Inner Join OldestXact OX On OX.ServerName = SubString(MD.agent_name, Len(MD.publisher) + Len(MD.publisher_db) + Len(MD.publication) + 4, Charindex('-', MD.agent_name, Len(MD.publisher) + Len(MD.publisher_db) + Len(MD.publication) + 5) - (Len(MD.publisher) + Len(MD.publisher_db) + Len(MD.publication) + 4)) Where MD.publisher = @Publisher And MD.publisher_db = @PublisherDB And MD.publication_type = 0 -- 0 = Transactional publication And MD.agent_type = 3; -- 3 = distribution agent
SQLDBA2013
Code returns zero data!
SQLDBA2013
Code returns nothing!
SQLSoldier
Can you provide some details of how you are running it?
sqldba2013
I changed the 2 parameters, and executed using the distribution database. Additional comments in your code would be very helpful. I am running on sql 2008 and 2012. 35 instances total, and multiple transactional replications.
SQLSoldier
I’ve been meaning to revisit this query for a long time to make it usable for more than just transactional replication and the distribution agent. I will comment it better when I do that.
sanjaykumar.rajarao
code returned no data.
— Set Publisher server name
Set @Publisher = select @@servername;
— Set Publisher database name
Set @PublisherDB = ‘distribution’;
SQLSoldier
Are you sure that the Publisher DB is named distribution? That’s generally the name of the distributor DB, not the publisher DB.
Taiob
Running on 2008 R2.
Confirmed both servername and dbname is correct.
I know current latency is few hours (it is on purpose, we call it managed replication).
But shows no data and getting this error:
Msg 537, Level 16, State 3, Line 12
Invalid length parameter passed to the LEFT or SUBSTRING function.
SQLSoldier
Can you tell me what the values in the agent_name column in dbo.MSreplication_monitordata is?
Taiob
I am sorry those names have servername that I cannot send because comments publish automatically in on your blog.
I can send to you on your personal email if you send me.
I think the problem is here:Len(MD.publication)
Publication name is not included in the agent name where agent type is 3.
Taiob
I think your code is written for individual agent (one distribution agent for each publication). We use shared agent. One distribution agent per publisher and subscriber.
SQLSoldier
Yes. I was sharing a script I wrote for a specific situation. Was never meant to be a fully generic script that would work for all environments. It’s just not possible to think of every possible scenario and test it.
Taiob
So I modified the code to give the correct subscriber name still no record.
SubString(MD.agent_name,
Len(MD.publisher) + Len(MD.publisher_db) + 3,
Charindex(‘-‘, MD.agent_name,
Len(MD.publisher) + Len(MD.publisher_db) + 4) –
(Len(MD.publisher) + Len(MD.publisher_db) + 3))
SQLThinker
Thanks for the script. I was having the same error “Invalid length parameter passed to the LEFT or SUBSTRING function”. I believe it happened because my server names have dashes “-” and so my publication names. The names of my publications are big so they also get truncated in the “agent_name” column of MSreplication_monitordata. Below are some suggested changes to better find the subscriber server. I am also displaying the publication name since I have multiple publication for the same database.
Declare @Publisher sysname,
@PublisherDB sysname;
— Set Publisher server name
Set @Publisher = ‘Publisher server name’;
— Set Publisher database name
Set @PublisherDB = ‘Publisher database name’;
— Refresh replication monitor data
Exec sys.sp_replmonitorrefreshjob @iterations = 1;
With MaxXact (ServerName, PublisherDBID, XactSeqNo)
As (Select S.name, DA.publisher_database_id, max(H.xact_seqno)
From dbo.MSdistribution_history H with(nolock)
Inner Join dbo.MSdistribution_agents DA with(nolock)
On DA.id = H.agent_id
Inner Join master.sys.servers S with(nolock)
On S.server_id = DA.subscriber_id
Where DA.publisher_db = @PublisherDB
Group By S.name, DA.publisher_database_id)
, OldestXact (ServerName, OldestEntryTime)
As (Select MX.ServerName, Min(entry_time)
From dbo.msrepl_transactions T with(nolock)
Inner Join MaxXact MX
On MX.XactSeqNo <= T.xact_seqno
And MX.PublisherDBID = T.publisher_database_id
Group By MX.ServerName)
Select [Replication Status] = Case MD.status
When 1 Then 'Started'
When 2 Then 'Succeeded'
When 3 Then 'In progress'
When 4 Then 'Idle'
When 5 Then 'Retrying'
When 6 Then 'Failed'
End,
Subscriber = S.srvname,
[Subscriber DB] = A.subscriber_db,
[Publisher DB] = MD.publisher_db,
Publisher = MD.publisher,
A.Publication,
[Current Latency (sec)] = MD.cur_latency,
[Current Latency (hh:mm:ss)] = Right('00' + Cast(MD.cur_latency/3600 As varchar), 2) +
':' + Right('00' + Cast((MD.cur_latency%3600)/60 As varchar), 2) +
':' + Right('00' + Cast(MD.cur_latency%60 As varchar), 2),
[Latency Threshold (min)] = Cast(T.value As Int),
[Agent Last Stopped (sec)] = DateDiff(hour, agentstoptime, getdate()) – 1,
[Agent Last Sync] = MD.last_distsync,
[Last Entry TimeStamp] = OX.OldestEntryTime
From dbo.MSreplication_monitordata MD with(nolock)
Inner Join dbo.MSdistribution_agents A with(nolock) On A.id = MD.agent_id
Inner Join dbo.MSpublicationthresholds T with(nolock) On T.publication_id = MD.publication_id And T.metric_id = 2 — Latency
Inner Join master.dbo.sysservers S ON S.srvid = A.subscriber_id
Left Join OldestXact OX On OX.ServerName = S.srvname
Where MD.publisher = @Publisher
And MD.publisher_db = @PublisherDB
And MD.publication_type = 0 — 0 = Transactional publication
And MD.agent_type = 3; — 3 = distribution agent
SQLSoldier
Thanks for sharing your script. I’m sure there are many other ways this script could go wrong.
Taiob
SQLThinker thank you for the post. It works like a charm.
I suggest add this piece in your select statement so you have the publication names too.
Publication=MD.publication