Warning: Not a valid checkpoint file name

One of the other DBAs I work with noticed a warning message flooding the log files on one of our servers and asked me to look into it. A Bing search for the warning message turned up no useful links.

This is the warning message:

[WARNING] HkHostBackupGetCheckpointFileInfoV2(). Database ID: [<database ID>]. Not a valid checkpoint file name. FileName: fffeaca6-ffff519f-fffc.00015359-0000ae60-0003.c1242a5d-8a93-46ec-9e21-cf41c32179fa.0-0.1000016. (d:\b\s1\sources\sql\ntdbms\hekaton\sqlhost\sqlmin\hkhostbackup.cpp : 2958)

The database ID was that of the database in which we had recently deployed in-memory OLTP. This jibed with the message itself: in-memory OLTP uses checkpoint files, and the feature was codenamed “Hekaton” prior to release (note the hekaton folder in the source path at the end of the message). Clearly, the warning was related to the in-memory OLTP feature.

When I looked at the log as a whole, I saw that the warning was always part of the same series of messages. First came this informational message:

[INFO] getMaxUnrecoverableCheckpointId(). Database ID: [<database ID>]. Start of Log LSN: 00015343:000128D8:0002 used to trim unrecoverable checkpoint files tables during full backup.

That was followed by hundreds of the aforementioned warnings about invalid checkpoint file names and, finally, by this informational message:

[INFO] HkHostBackupDeleteContext(). Database ID: [<database ID>]. Cleaned up all the allocated buffers.

And yes, the database ID was the same in all three messages (we currently use in-memory OLTP in only a single database). The series of messages starts shortly after midnight, while our full backup process is running, and the first message even says it is trimming unrecoverable checkpoint files “during full backup.” Clearly this is a system maintenance process triggered by a full backup. My theory is that each of the hundreds of warnings corresponds to an in-memory buffer that is still allocated but no longer associated with a real checkpoint file (that is, no longer in use).
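To track whether the number of warnings per full backup changes over time (for example, while testing the buffer theory above), it helps to count them per cycle rather than eyeball the log. The following is a minimal Python sketch, assuming the error log has been exported to text with one message per line and that the three messages appear as quoted in this post, possibly after a timestamp prefix. The names `summarize_backup_cycles` and `BackupCycle` are hypothetical and not part of any SQL Server tooling.

```python
import re
from dataclasses import dataclass

# Patterns keyed on the function names in the three messages quoted above.
# Assumption: one message per line, possibly after a timestamp/spid prefix,
# so re.search (not re.match) is used.
START_RE = re.compile(r"getMaxUnrecoverableCheckpointId\(\)\. Database ID: \[([^\]]+)\]")
WARN_RE = re.compile(
    r"HkHostBackupGetCheckpointFileInfoV2\(\)\. Database ID: \[([^\]]+)\]\. "
    r"Not a valid checkpoint file name"
)
END_RE = re.compile(r"HkHostBackupDeleteContext\(\)\. Database ID: \[([^\]]+)\]")


@dataclass
class BackupCycle:
    """One start-to-cleanup sequence for a single database (hypothetical type)."""
    database_id: str
    warnings: int = 0
    completed: bool = False


def summarize_backup_cycles(lines):
    """Group log lines into cycles and count the invalid-file-name warnings in each."""
    cycles = []
    open_cycles = {}  # database_id -> cycle that has started but not yet cleaned up
    for line in lines:
        if m := START_RE.search(line):
            cycle = BackupCycle(m.group(1))
            cycles.append(cycle)
            open_cycles[cycle.database_id] = cycle
        elif m := WARN_RE.search(line):
            cycle = open_cycles.get(m.group(1))
            if cycle is not None:  # ignore warnings outside a recognized cycle
                cycle.warnings += 1
        elif m := END_RE.search(line):
            cycle = open_cycles.pop(m.group(1), None)
            if cycle is not None:
                cycle.completed = True
    return cycles


if __name__ == "__main__":
    sample = [
        "[INFO] getMaxUnrecoverableCheckpointId(). Database ID: [7]. Start of Log LSN: ...",
        "[WARNING] HkHostBackupGetCheckpointFileInfoV2(). Database ID: [7]. Not a valid checkpoint file name. FileName: a",
        "[WARNING] HkHostBackupGetCheckpointFileInfoV2(). Database ID: [7]. Not a valid checkpoint file name. FileName: b",
        "[INFO] HkHostBackupDeleteContext(). Database ID: [7]. Cleaned up all the allocated buffers.",
    ]
    for c in summarize_backup_cycles(sample):
        print(c)  # BackupCycle(database_id='7', warnings=2, completed=True)
```

Run against one night's exported log, this should report one completed cycle per full backup of the in-memory database; a cycle left with `completed=False` would suggest the cleanup message never appeared.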

Other Related Observations

The server on which we run in-memory OLTP is a very hefty server with 128 logical cores and 1.5 TB of RAM (1.4 TB allocated to SQL Server). We limit in-memory OLTP’s memory usage with Resource Governor, which also makes it easy to see how much it is using. Needless to say, even a limited percentage of 1.4 TB of RAM is still a lot of memory. The highest in-memory usage I have seen for this one database, at peak activity levels, is ~43 GB. In production, when the heavy in-memory OLTP processes complete, I see the system reclaim the in-memory buffers fairly quickly, though not completely. On a normal day, I often see in-memory OLTP memory usage hovering between 1 and 3 GB even when there is virtually no traffic.

Before deploying to production, I tested in-memory OLTP on a dev server that only I was using, and I noticed that memory usage stayed at whatever high level it had reached. This makes me believe that in-memory buffers are cleaned up and reclaimed only as needed; if nothing needs the memory, they simply stay allocated as in-memory buffers. That would explain why, in production, some of the buffers end up hanging around. Perhaps they wouldn’t if the server were memory starved, but I have not tested that theory.

It would seem that the SQL Server development team anticipated the need for a memory cleanup job. A DBA will typically schedule maintenance, such as backup jobs, during a system’s slow period, so why not have a full backup trigger the memory cleanup? It seems sensible.

Things to Do

There is still more I would like to test to confirm this theory:

I know that full backups trigger this and log backups do not. What about other backup types like differential, filegroup, or file backups? What about SAN snapshots?
How does this affect availability group (AG) nodes where the full backups do not occur? I see that the messages are logged only on the primary, where we perform our backups. Does this process also run on the secondaries? If I run the backup on a secondary, does it run on the primary as well?
Observe in-memory OLTP memory usage during this process to see whether memory allocations are being released.
Test under memory starvation to see if all buffers are being reclaimed.