Our third-party monitoring solution collects blocking information, but not for system threads, so no additional information was available for this blocking incident. I could see, however, that the system thread was a background process with the command “UNKNOWN TOKEN” and that it was sitting in a wait type of “HADR_WORK_QUEUE”. It was clearly the worker thread for the AG of a specific database.
A little later, we had blocking again involving that same thread, but this time the AG worker thread was blocking the log backup thread. It seemed logical that if the worker thread could block the log backup, the log backup could also have blocked the worker thread, but it still did not make sense to me.
I did some checking of our log backup jobs to see what was going on and found the culprit. We had recently done some maintenance on this database that included taking it out of the AG. While it was out of the AG, we set up a log backup job to back up the log for this database on the primary server. Our normal log backup job only runs on the secondary server.
When the maintenance was done and the database was added back into the AG, the regular log backup job automatically picked it up and started backing up the log again. However, we forgot to disable the temporary log backup job we had created on the primary. The log backup job on one server was blocking the log backup job on the other server.
When you back up the log on the secondary, the backup has to be able to reach across the AG and update the log on the primary to indicate that the backed-up portion can be reused. This is the work the AG worker thread was trying to do when it was either being blocked by or blocking the local log backup job.
Naturally, the fix was to delete the unneeded temporary log backup job on the primary.
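A check along the following lines might have caught the leftover job sooner. This is only a minimal sketch, not what we actually ran: it assumes you have already collected an inventory of backup jobs from every replica (for example, from each server's SQL Agent job definitions) into a list of dictionaries. The field names, server names, and database names are all hypothetical.

```python
from collections import defaultdict


def find_dueling_log_backups(jobs):
    """Return {database: [servers]} for every database whose log is
    backed up by enabled jobs on more than one server.

    `jobs` is a hypothetical inventory: each entry is a dict with the
    keys "server", "database", "backup_type", and "enabled".
    """
    servers_by_db = defaultdict(set)
    for job in jobs:
        # Only enabled log backup jobs can duel with each other.
        if job["enabled"] and job["backup_type"] == "LOG":
            servers_by_db[job["database"]].add(job["server"])
    # Keep only databases with log backups on two or more servers.
    return {
        db: sorted(servers)
        for db, servers in servers_by_db.items()
        if len(servers) > 1
    }


# Hypothetical inventory resembling the situation described above.
jobs = [
    {"server": "SQLPRIMARY", "database": "SalesDB", "backup_type": "LOG", "enabled": True},
    {"server": "SQLSECONDARY", "database": "SalesDB", "backup_type": "LOG", "enabled": True},
    {"server": "SQLSECONDARY", "database": "OtherDB", "backup_type": "LOG", "enabled": True},
    {"server": "SQLPRIMARY", "database": "OtherDB", "backup_type": "FULL", "enabled": True},
]

dueling = find_dueling_log_backups(jobs)
print(dueling)
```

Any database that shows up in the result has its log backed up from more than one replica and is worth a closer look.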
sqlproddba
Excellent info SQLSoldier!
SQLSoldier
Thank you!
Dueling Log Backup Jobs – Curated SQL
[…] Robert Davis ran into HADR_WORK_QUEUE waits recently: […]