This is the warning message:
[WARNING] HkHostBackupGetCheckpointFileInfoV2(). Database ID: [<database ID>]. Not a valid checkpoint file name. FileName: fffeaca6-ffff519f-fffc.00015359-0000ae60-0003.c1242a5d-8a93-46ec-9e21-cf41c32179fa.0-0.1000016. (d:\b\s1\sources\sql\ntdbms\hekaton\sqlhost\sqlmin\hkhostbackup.cpp : 2958)
The database ID was the ID of the database where we had recently deployed in-memory OLTP. That fit the warning: in-memory OLTP uses checkpoint files, and the feature was codenamed "hekaton" prior to release (note the hekaton path in the message's source file reference). Clearly, it was related to the in-memory OLTP feature.
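To pull every occurrence of these messages out of the error log for analysis, one option is to filter with xp_readerrorlog. This is a sketch; the search strings are taken from the warning above and may need adjusting for your environment.

```sql
-- Search the current SQL Server error log (0 = current log file,
-- 1 = SQL Server error log) for the checkpoint file warning.
EXEC master.dbo.xp_readerrorlog
    0,                                       -- log file number (0 = current)
    1,                                       -- log type (1 = SQL Server error log)
    N'HkHostBackupGetCheckpointFileInfoV2',  -- first search string
    N'Not a valid checkpoint file name';     -- second search string
```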
When I looked at all of the messages in the log as a whole, I saw that the same series of messages was always logged. First there was this informational message:
[INFO] getMaxUnrecoverableCheckpointId(). Database ID: [<database ID>]. Start of Log LSN: 00015343:000128D8:0002 used to trim unrecoverable checkpoint files tables during full backup.
Followed by hundreds of the aforementioned warnings about invalid checkpoint file names. And then finally, this informational message:
[INFO] HkHostBackupDeleteContext(). Database ID: [<database ID>]. Cleaned up all the allocated buffers.
And yes, the database ID was the same in all of the messages (currently we are only using in-memory OLTP in a single database). The series of messages starts shortly after midnight, while our full backup process is running. Clearly this is a system maintenance process triggered by a full backup. My theory is that the hundreds of warnings are logged for every in-memory buffer that is still allocated but no longer associated with a real checkpoint file (no longer in use).
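One way to probe this theory is to look at the checkpoint file metadata directly. The sys.dm_db_xtp_checkpoint_files DMV exposes the state of each checkpoint file in the current database; files sitting in states like FREE or WAITING FOR LOG TRUNCATION are the ones awaiting cleanup. A sketch (run it in the database that hosts the in-memory objects):

```sql
-- Summarize In-Memory OLTP checkpoint files by type and state,
-- with the total space they occupy.
SELECT file_type_desc,
       state_desc,
       COUNT(*)                          AS file_count,
       SUM(file_size_in_bytes) / 1048576 AS total_size_mb
FROM   sys.dm_db_xtp_checkpoint_files
GROUP  BY file_type_desc, state_desc
ORDER  BY file_type_desc, state_desc;
```

Comparing the counts before and after the nightly full backup should show whether that run of warnings corresponds to files being swept out of the cleanup-pending states.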
Other Related Observations
The server on which we are running in-memory OLTP is a really hefty server with 128 logical cores and 1.5 TB of RAM (1.4 TB allocated to SQL Server). We are limiting in-memory OLTP's memory usage with Resource Governor, which also makes it easy to see how much it is using. Needless to say, even a limited percentage of 1.4 TB of RAM is still a lot of memory. The highest I have seen in-memory usage for this one database reach at peak activity levels is ~43 GB. In production, when the heavy in-memory OLTP processes complete, I see the system reclaim the in-memory buffers pretty quickly, though not completely. During a normal day, I often see the in-memory memory usage hovering between 1 and 3 GB even when there is virtually no traffic.
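Since the memory cap is enforced through Resource Governor, the pool's usage can be checked with a quick DMV query. This is a sketch; the pool name below is hypothetical, so substitute whatever pool is bound to the in-memory database in your environment.

```sql
-- Current vs. maximum memory for the Resource Governor pool that
-- caps In-Memory OLTP. 'InMemoryPool' is a placeholder name.
SELECT pool_id,
       name,
       used_memory_kb / 1024 AS used_memory_mb,
       max_memory_kb / 1024  AS max_memory_mb
FROM   sys.dm_resource_governor_resource_pools
WHERE  name = N'InMemoryPool';  -- hypothetical pool name
```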
When testing in-memory OLTP on a dev server that only I was using, before deploying to production, I noticed that memory usage would stay at whatever high level it reached. This leads me to believe that in-memory buffers are cleaned up and reclaimed as needed; if they are not needed, they simply hang around as in-memory buffers. And it appears that some of the buffers do end up hanging around. Perhaps they wouldn't if the server were memory starved. I have not tested that theory.
It would seem that the SQL Server development team anticipated a need for a memory cleanup job. It seems logical that if there is a slow period for a system, that is when the DBA will schedule maintenance like backup jobs. So why not have a full backup trigger the memory cleanup job? Seems sensible.
Things to Do
There is still more I would like to test on this theory.
- I know that full backups trigger this and log backups do not. What about other backup types like differential, filegroup, or file backups? What about SAN snapshots?
- How does this affect AG nodes where the full backups do not occur? I do see that the messages are only being logged on the primary, where we are performing backups. Does this process also run on the secondaries? If I run the backup on a secondary, does it run on the primary as well?
- Observe in-memory memory usage during this process to see if I can watch memory allocations being removed.
- Test under memory starvation to see if all buffers are being reclaimed.
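For the observation and memory-starvation tests above, the sys.dm_db_xtp_memory_consumers DMV gives a per-consumer breakdown of what the in-memory engine has allocated in the current database. A sketch of the capture query; the idea is to snapshot it before, during, and after a full backup and diff the results:

```sql
-- Per-consumer memory allocated and used by the In-Memory OLTP
-- engine in the current database, largest allocations first.
SELECT memory_consumer_type_desc,
       memory_consumer_desc,
       allocated_bytes / 1048576 AS allocated_mb,
       used_bytes / 1048576      AS used_mb
FROM   sys.dm_db_xtp_memory_consumers
ORDER  BY allocated_bytes DESC;
```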