I worked with Vicky Harp (blog|@vickyharp), SQLdm dev lead and an excellent developer, on this issue. She asked me to validate whether they were calculating and interpreting the data correctly or if it was a bug in SQL Server.
In the beginning, all I had to go on were some numbers and anecdotal information being relayed from the product users. I devised a theory based on the minimal information I had on-hand. The theory did not pan out when trying to reproduce the issue on an internal server.
When I was able to reproduce the issue in a controlled environment, I noticed that sessions that worked with large objects (tables and indexes with greater than 128 extents) that get dropped, truncated, or rebuilt, as appropriate to the object. I also noticed that the Resource Monitor thread was showing no allocations and extremely high deallocations. When considering Resource Monitor deallocations and user allocations and deallocations as a whole, the numbers came pretty close to settling out. It appeared from my observations that deallocations were occurring, but they were being attributed to a system session.
This posed the question of, “why?”.
What’s Happening
So the answer was deferred drops. A deferred drop is when a two-phase drop is used for dropping large objects or a large number of pages. This optimization was introduced in SQL Server 2005 to speed up large deallocations.
Phase 1 of a deferred drop — the logical phase — simply marks the pages as deallocated. This greatly minimizes the length of time required to complete the commit of the transaction. This also greatly speeds up a rollback as it simply needs to reallocate the pages it deallocated. No movement of data occurs during this phase.
Phase 2 of a deferred drop — the physical phase — physically deallocates the pages. Since the pages are not allocated to any object, it does not require taking extensive locks to perform the physical drop. The physical phase is executed in batches by a background process called Resource Monitor. This means they are deallocated by a system process instead of the session that allocated them.
This clearly explains why the deallocations were way out of line for user sessions. The user sessions were not performing the deallocations for large objects.
Learn More
I came up with the theory, and provided my findings to Vicky. Vicky got the pleasure of working with CSS at Microsoft to prove out that it actually was a bug. It wouldn’t be enough to simply tell the SQLdm users that it was a bug in SQL Server. We needed to have hard evidence. Too many people blame Microsoft if they have an unusual experience with SQL Server. Vicky spent many hours over many months working with CSS to prove out that it is a bug. Vicky created an excellent repro for the issue. go to post on the topic to get a look at and try the repro for yourself. Vicky also offers some additional technical details to which I no longer have access.
We finally got confirmation from CSS and the SQL Server product team that it is indeed a bug in how deallocations are being reported. A fix for this is not available yet, but at least you now know to consider deallocations by system threads when the numbers don’t look right.
Please go visit Vicky’s blog to get the full picture of what happened.
SQLSoldier
Thanks to Mike West (blog: http://www.sqlwaitstats.com) for pointing out that I spelled Microsoft as Microoft. 😀 Corrected.
mike
I wasn’t going to tell anyone. I thought it might have been Freudian slip that wasn’t really a slip. 🙂
SQLSoldier
Haha! When someone helps me out, I like to acknowledge it. 🙂
Chris Yates
Nice post; I’ve taken this and shared it with the rest of the team; appreciate you taking the time to spread the awareness. Have a good one!
SQLSoldier
Thanks Chris! 🙂