If you missed any of the earlier posts in my DR series, you can check them out here:
- 31 Days of disaster Recovery
- Does DBCC Automatically Use Existing Snapshot?
- Protection From Restoring a Backup of a Contained Database
- Determining Files to Restore Database
- Back That Thang Up
- Dealing With Corruption in a Nonclustered Index
- Dealing With Corruption in Allocation Pages
- Writing SLAs for Disaster Recover
- Resolutions for All DBAs
CHECKSUM Page Verification
There are 3 page verification options in SQL Server 2005 and newer. Checksum is the current default setting and the most comprehensive. Torn page detection was the default in SQL Server 2000. Torn page detection is less comprehensive because the way it works is to write a 2-bit pattern to the header of the page and then an alternating 2-bit pattern every 512 byte sector. This means it is possible for corruption to occur in a very small area of the page and not be detected. Checksum on the other hand generates a different value if even one bit of data is different. Any amount of corruption will result in a different value and the page will be flagged as corrupted.
Since the default page verification option changed, new databases are protected, but if an older database is upgraded, the page verification option is not changed. If you have really old databases, they may not be fully protected. You should check the page verification option in sys.databases and update those databases.
Here’s the tricky part. If you change the page verification option, you are not automatically protected. It doesn’t automatically rewrite every page to have checksum values when you change it. The next time a page is written to disk, the checksum value is written to the page. My recommendation when changing to checksum page verification is during the next index maintenance period, simply rebuild every index or heap (SQL Server 2008+) rather than rebuilding selectively. This will cause every table and index to be rewritten to disk and ensure that all data structures are protected by checksums.
The third page verification option is NONE. This has NEVER been the default page verification option and if you have databases using this option then it is time for someone to find a new job. This option is only set if someone changes it and whether the change was done maliciously or due to not knowing better, that person should not be touching SQL Server.
Backup and Restore With Checksum
The final 2 ways to use checksum is with the backup and restore commands. When you use WITH CHECKSUM with the backup and restore commands, it performs extra checks. If checksums exist on a page (see previous section), it will recalculate the checksum values and ensure that they are still corruption free. This does NOT replace the need to run regular integrity checks with DBCC CHECKDB, but it gives you an extra opportunity to catch corruption. Furthermore, if you backup a database that has become corrupted without using this option, it may complete successfully with no warnings or errors. This often leads to corruption getting backed up and restored to other servers or environments without anyone realizing it. Additionally, when the backup completes, it will generate a checksum for the entire backup.
When restoring a backup using WITH CHECKSUM, it performs the same checks. The command does require that the backup was created using the checksum option, but it ensures that you don’t restore a corrupted database without being aware you are doing so. Also, if the backup became corrupt after it was created, then the restore is able to detect that and fail the restore almost immediately rather than waiting until it hits the corrupt page. If the backup file became corrupt, the checksum for the backup will be different and for a very large database, this can save you hours of time.
If you did hit corruption when restoring or backing up with the checksum option, you can complete the backup or restore using the CONTINUE_AFTER_ERROR option. Creating a backup of a corrupt database using this option will mark the database as being damaged and will ensure that the backup cannot be restored unless the continue after error option is also used for the restore. This will prevent someone from unwittingly restoring a backup of a corrupt database without realizing it.
EDIT: recommendation from Paul Randal (blog|@PaulRandal)
A great way to check a backup for corruption is to run RESTORE VERIFYONLY … WITH CHECKSUM; to perform the same checks (check the checksum of every page with checksum values and check the checksum of the backup file) without actually restoring it.
See it in Action
I put together a demo so you can prove to yourself the value of using all 3 checksum options. Rather than explain the demo in the blog post here, the demo code is heavily commented. It uses the two sample databases I’ve used in 2 earlier posts in this series. You may already have them downloaded, or you can download them again. I have provided the demo script and the 2 sample corrupt databases in separate zip files so you can download only what you need.
Sample corrupt databases: SampleCorruptDBs.zip (12.33 MB)
Demo code: Demo_BackupWithCheckSum.zip (2 KB)
-- Check page verify setting -- Note that both databases have Checksums enable Select name, page_verify_option_desc From sys.databases Where name in ('PFSCorruption', 'AdventureWorksDW2012'); -- Check to see if the databases are corrupted -- Note that both databases are DBCC CheckDB(AdventureWorksDW2012) With No_InfoMsgs, All_ErrorMsgs, TableResults; DBCC CheckDB(PFSCorruption) With No_InfoMsgs, All_ErrorMsgs, TableResults; -- Backup PFSCorruption with Checksum -- Note that it suceeds even though we know it is corrupt -- Pages have not had checksum values written yet Backup Database PFSCorruption To Disk = 'C:bakPFSCorruption_checksum.bak' With Init, Checksum; -- Backup AdventureWorksDW2012 with Checksum -- Note that this backup fails and notifies us that it is corrupt -- Pages have had the checksum values written to them Backup Database AdventureWorksDW2012 To Disk = 'C:bakAdventureWorksDW2012_checksum.bak' With Init, Checksum; -- Backup AdventureWorksDW2012 without Checksum -- Suceeds with no warning or errors Backup Database AdventureWorksDW2012 To Disk = 'C:bakAdventureWorksDW2012_nochecksum.bak' With Init; -- Use Continue_After_Error option to get backup of corrupt database Backup Database AdventureWorksDW2012 To Disk = 'C:bakAdventureWorksDW2012_checksum.bak' With Init, Checksum, Continue_After_Error; -- What happens if we restore AdventureWorksDW2012_nochecksum.bak? -- Database restores with no errors. Another lost opportunity to catch corruption. -- We have potentially spread corruption to different server/environment Restore Database CorruptDB_nochecksum From Disk = 'C:bakAdventureWorksDW2012_nochecksum.bak' With Move 'AdventureWorksDW2008R2_Data' To 'c:bakAdventureWorksDW2008R2_Data.mdf', Move 'AdventureWorksDW2008R2_Log' To 'c:bakAdventureWorksDW2008R2_Log.ldf'; -- Let's try it again with the checksum option If DB_ID('CorruptDB_nochecksum') Is Not Null Drop Database CorruptDB_nochecksum; -- Restore fails because we cannot use checksum for restore unless -- we used it for the backup Restore Database CorruptDB_nochecksum From Disk = 'C:bakAdventureWorksDW2012_nochecksum.bak' With Checksum, Move 'AdventureWorksDW2008R2_Data' To 'c:bakAdventureWorksDW2008R2_Data.mdf', Move 'AdventureWorksDW2008R2_Log' To 'c:bakAdventureWorksDW2008R2_Log.ldf'; -- What happens if we restore AdventureWorksDW2012_checksum.bak -- Restore fails because it was marked as a corrupt database Restore Database CorruptDB_checksum From Disk = 'C:bakAdventureWorksDW2012_checksum.bak' With Move 'AdventureWorksDW2008R2_Data' To 'c:bakAdventureWorksDW2008R2_Data.mdf', Move 'AdventureWorksDW2008R2_Log' To 'c:bakAdventureWorksDW2008R2_Log.ldf'; -- Let's try it again with the checksum option -- Restore fails because it was marked as a corrupt database -- Requires using the continue after error option Restore Database CorruptDB_checksum From Disk = 'C:bakAdventureWorksDW2012_checksum.bak' With Checksum, Move 'AdventureWorksDW2008R2_Data' To 'c:bakAdventureWorksDW2008R2_Data.mdf', Move 'AdventureWorksDW2008R2_Log' To 'c:bakAdventureWorksDW2008R2_Log.ldf'; -- Let's try it again with the checksum and Continue_After_Error options -- Suceeds with a low level warning telling us that the database -- was damaged and should be checked out Restore Database CorruptDB_checksum From Disk = 'C:bakAdventureWorksDW2012_checksum.bak' With Checksum, Move 'AdventureWorksDW2008R2_Data' To 'c:bakAdventureWorksDW2008R2_Data.mdf', Move 'AdventureWorksDW2008R2_Log' To 'c:bakAdventureWorksDW2008R2_Log.ldf', Continue_After_Error; -- Cleanup extra database(s) If DB_ID('CorruptDB_checksum') Is Not Null Drop Database CorruptDB_checksum; If DB_ID('CorruptDB_nochecksum') Is Not Null Drop Database CorruptDB_nochecksum;
Summary
You can see that the checksum options I’ve described in this post are very useful for protecting the integrity of your data and for increasing the chances that you will find corruption early as well as being the quickest way to detect a corrupt backup file. I highly encourage you to step through the demo code and see how it protects you and can save you a lot of hardship down the line.
SQLSoldier
Thanks to Paul Randal again for pointing out that the 2-bit pattern used for torn page detection uses an alternating pattern rather than repeating the same pattern every 512 bytes. Also, thanks to Paul for the recommendation of using RESTORE VERIFYONLY … WITH CHECKSUM;.
See edits in post.
Sqlchow
For the torn page detection, How does it determine what pattern to write?
SQLSoldier
As far as I know, the details of the bit pattern are not documented. Paul Randal may know, but I’ve never seen that info made public.
Paul Randal
Starts with 10, 01, 10, 01 alternating down the blocks, then flips them next time it writes the page.
SQLSoldier
Thanks again Paul!
Angie
I just scripted out the restore code from the GUI and it does not use CheckSum. That would have been nice if they coded for it.
SQLSoldier
Agreed. Sadly, the GUI doesn’t support a lot of good features.
Martin Catherall
It seems amazing that SSMS uses the checksum option differently when scripting backups if you choose compression verses np compression.
Using compression I get checksums scripted using SSMS 2008 / 2012 and when compression is left off the backup then checksums are not scripted – I checked BOL and it stats this also. Yet another reason to go the TSQL route.
Great series Robert.
SQLSoldier
Thanks Martin! I wish they would make checksum the default option for backups and restores, but that would probably break backwards compatibility rules.
Day 10 of 31 Days of Disaster Recovery: Monitoring for Corruption Errors | SQLSoldier
[…] Use All the Checksums […]
Day 11 of 31 Days of Disaster: Converting LSN Formats | SQLSoldier
[…] Use All the Checksums […]
David
Hi, i have a vendor DB that is set to torn page detection. We are running on SQL 2008 R2, I am guessing that if the default is with checksum the vendor must have changed the setting to torn page detection when they setup the DB. Are there any possible reason they would do this? Are there any risks to changing it to checksum?
SQLSoldier
Hi David. The vendor’s DB was probably developed initially on SQL 2000 or earlier version when torn page detection was the default and then they generated scripts that hard-coded the torn page option.
There is no risk to changing it to checksum, but I would first verify that this would not break your support agreement with the vendor.
SQLSoldier
Sorry David. I saw you posted a reply, but it accidentally got deleted. Can you repost it?
David
No worries, I will follow up with the Vendor regarding checksum. My question was around the need to do a complete index rebuild after turning on checksum. Our vendor provide all the DB maint jobs that they say need to be run against the DB. Can i just turn on checksum and leave SQL to just create the checksums as the data is accessed or? I understand that the pages will not have a checksum created until the are changed but this would be better then just staying on torn_page_detection.
SQLSoldier
I generally recommend doing a full rebuild of all indexes during your next index maintenance period. Otherwise, There could be areas of your database not properly protected. I guess you need to weigh the risks and benefits and decide if you think it is okay to wait.
Day 15 of 31 Days of Disaster Recovery: Running DBCC CheckTable in Parallel Jobs | SQLSoldier
[…] Use All the Checksums […]
Day 19 of 31 Days of Disaster Recovery: How Much Log Can a Backup Log | SQLSoldier
[…] Use All the Checksums […]
Day 24 of 31 Days of Disaster Recovery: Handling Corruption in a Clustered Index | SQLSoldier
[…] Use All the Checksums […]
Shaun
Is there any way to detect the number of pages or percentage of pages which do not have a checksum for a database which has been changed from torn page detection (if the maintenance window didn;t allow for a full index rebuild for example)?
SQLSoldier
Unfortunately, that isn’t tracked anywhere as an aggregate. You would have to dump every page using DBCC PAGE() to see if it had a checksum value. The closest you could come in estimating this without dumping every page would be to dump just the differential change map (DCM) pages and try to estimate how much of the database had changed since the last full backup. This would still be inaccurate because the DCMs track changed extents, not pages. So an extent could be changed but have anywhere between 1 and 8 pages in the extent that have been changed.
It may be better to figure out a way to do the index rebuild rather than trying to calculate how much of it is protected.
Who Owns SQL Backups? | SQLSoldier
[…] their backup medium support backing up with the CHECKSUM option? This option is very important for making sure that your backups do not contain corrupted data. It […]