Day 15 of 31 Days of Disaster Recovery: Running DBCC CheckTable in Parallel Jobs

Welcome back to my 31 Days of Disaster Recovery series. Today is day 15, and I want to answer a question I was asked a while back. Paul Randal (blog|@PaulRandal) wrote a blog post explaining alternative options for checking integrity of a very large database if you are not able to run the full CHECKDB process, and the question was borne out of one of the recommendations by Paul. One of the tactics Paul recommends is breaking the process up over multiple nights.

The person I was talking to was planning to run DBCC CHECKALLOC and DBCC CHECKCATALOG the first night of the week and then spread out DBCC CHECKTABLE executions for all of the tables across other nights. His database has a mixture of some very big tables and lots of small tables. His question to me was can he save time by running the DBCC CHECKTABLE commands in parallel threads and process multiple tables at the same time. His thought was that while one job is running DBCC CHECKTABLE against a very large table, another job could be running it against the smaller tables.

At the time, I wasn’t sure how well multiple jobs would co-exist, but I said I would run some tests and see. The results of my findings are below.

If you missed any of the earlier posts in my DR series, you can check them out here:

31 Days of disaster Recovery

Holy Double DBCC, Batman!

I decided to use my AdventureWorksDW2012 database for these tests. I have a very large table that I added to the database that I use for testing quite often named dbo.FactInternetSalesBig. dbo.FactInternetSalesBig has 30,923,776 rows in it and is basically just a copy of dbo.FactInternetSales table with the data re-inserted repeatedly until it was sufficiently big enough. The original dbo.FactInternetSales table has 60,398 rows in it, so it is a fraction of the size of the big versions. Another difference between the two tables is indexes. The small table has 9 total indexes, a clustered index, 7 nonclustered indexes, and a columnstore index (remnant of a different test) whereas the big table is just a heap.

I already knew from experience that the big table is about 7 GB of data that takes a fair amount of time to load into memory. First thing I did was run DBCC CHECKTABLE on both tables to make sure they both had their data in memory. Next, I gathered run times for the DBCC CHECKTABLE runs individually for my baseline values. Then I captured the run times for running both commands run at the same time in separate query windows. I toyed around with several other combinations of factors, and the ones that seemed to make the most compelling story was playing around with limiting maximum degree of parallelism (max DOP), and creating the database snapshot ahead of time. I had expected that running the DBCC commands against the database would not be ideal as each job would create it’s own database snapshot (see day 1 of this series: Does DBCC Automatically Use Existing Snapshot?).

The code for the DBCC CHECKTABLE commands are shown below. The commands were the same for running against the database snapshot (named AWSnap) except the USE statement point to AWSnap.

Use AdventureWorksDW2012;

Declare @StartTime datetime2 = getdate(),
    @TimeLapse int;

DBCC CheckTable('dbo.FactInternetSalesBig') with No_InfoMsgs;

Set @TimeLapse = DATEDIFF(ms, @StartTime, getdate());

Select @TimeLapse;

Use AdventureWorksDW2012;

Declare @StartTime datetime2 = getdate(),
    @TimeLapse int;

DBCC CheckTable('dbo.FactInternetSales') with No_InfoMsgs;

Set @TimeLapse = DATEDIFF(ms, @StartTime, getdate());

Select @TimeLapse;

One of the interesting things I noticed was that running the jobs in parallel caused a relatively small increase in run time for the large table and a large increase in run time for the small table. Initially, I noticed a lot of IOCOMPLETION and PAGEIOLATCH_SH (shared page IO latch waits) waits on the the hidden snapshot that gets created (e.g., 13:1:1444304) and the occasional PREEMPTIVE_OS_FILEOPS (calling out to OS for a file operations) wait. This quickly gave way to CXPACKET (parallelism exchange event) and 1 thread waiting on WRITE_COMPLETION. Clearly there is contention between the parallel jobs.

Altogether, I ended with 8 different sets of numbers. one thing to note is although I show a single set of numbers for each category, each test was run multiple times and the run times were averaged. Additionally, the cache was pre-warmed for both the database and the pre-existing database snapshot. Some generalizations can be made about my results based on these numbers.

Running the jobs in parallel tends to take slightly longer than running individually
Running the jobs with limited max DOP was faster on the database directly, but slower against the snapshot
The best results were achieved with higher max DOP on the pre-existing snapshot

Though not shown in the numbers, I did try with various levels of max DOP. The test machine has 8 logical CPUs, and the max DOP used for the reported numbers were 0 (number of logical CPUs or 8) and 1 (parallelism disabled). 1 was chosen as the test value because that value demonstrated the best results for running the jobs in parallel against the database and degraded incrementally as max DOP is increased. Running the jobs against the pre-existing snapshot got the worst results with max DOP = 1 and improved incrementally as max DOP is increased.

	Run Time (ms)
	FactInternetSalesBig	FactInternetSales	Run Time
Baseline (Individually on database)	234274	3614	237888
Parallel on database	239547	237027	239547
Individually on database w/limited DOP	147240	2467	149707
Parallel on database w/limited DOP	153836	114123	153836
Individually on snapshot	12853	263	13116
Parallel on snapshot	14617	1647	14617
Individually on snapshot w/limited DOP	49710	660	50370
Parallel on snapshot w/limited DOP	50520	950	50520

Summary

These tests showed that you can indeed run DBCC CHECKTABLE command in parallel jobs to fit more checks into your maintenance window. Creating the database snapshot ahead of time and running all of the checks against the snapshot directly gives the best results speed wise. And as for the best degree of relativity, I believe that your mileage may vary and recommend testing different levels of max DOP to determine the optimal settings.

As Paul notes in the comments below, you should never run DBCC commands in parallel jobs unless you are creating the snapshot ahead of time and running it directly against the snapshot. And it’s important to remember that it’s not enough just to create the snapshot. You have to run the DBCC command in the context of the snapshot. It won’t automatically use the snapshot just because one exists.

Edit: A couple of people have asked about the script I used to create the dbo.FactInternetSalesBig table. The script I used was for testing out columnstore indexes and is from Kalen Delaney’s (blog|@SQLQueen) blog: Geek City: Build a Big Table with a Columnstore Index.

Comments (3)

January 17, 2013 at 9:59 AM

Paul Randal


An explanation of your results:

Firstly, running DBCC against the tables to load the data into memory does nothing – DBCC uses 100 buffers in the buffer pool and marks them for disfavoring. You can’t preload data into memory using DBCC.

The runs against the snapshot are so much faster because DBCC doesn’t need to create the underlying database snapshot, nothing to do with DBCC itself. There’s nothing different in the DBCC algorithms or the I/O it does – it just doesn’t have to create the snapshot itself. When running against a snapshot, you still have to take the time and disk space to create the snapshot.

I would never advocate running DBCC in parallel EXCEPT on a snapshot, as each parallel DBCC run against the database has to create its own snapshot in that case. If there is activity in the database, running in parallel NOT against the snapshot will likely bring the system to it’s knees quickly. You really need to make that recommendation clear.

Thanks

Loading...
1. January 17, 2013 at 10:13 AM
  
  SQLSoldier
  
  
  Thanks for the additional information, Paul!
  
  I discounted the first run of DBCC CHECKTABLE because the first run always took significantly longer to run than follow-up runs. And when when I run it against the very large table, SQL Server’s memory usage increases my almost 7 GB, which is the size of the table. I’m not disagreeing, I’m just saying that this was my observation.
  
  Loading...
March 11, 2013 at 11:18 PM

Day 30 of 31 Days of Disaster Recovery (T-SQL Tuesday #40): Using Partial Availability and Initialize from Backup to Replicate a Partial Database | SQLSoldier


[…] Running DBCC CheckTable in Parallel Jobs […]

Loading...