SQLU VLDB Week – Index Maintenance

Welcome back for another exciting session on working with VLDBs, or Very Large DataBases. Today is day 2 of VLDBweek, and we are going to dive into the biggest challenge, as voted on by the people, with working with VLDBs: Index Maintenance <cue spooky music>.

The reason index maintenace is challenging is pretty obvious. Big tables have big indexes. Big indexes require more time and use more resources to rebuild or reorganize. There are some strategies you can use for easing the pain. You can’t get completely alleviate it unless you opt to never do index maintenance, and then you’re just swapping one set of troubles for another.

Be Selective

The first way you can ease the pain is to not perform index maintenance on everything. A lot of people just do a blanket rebuild on everything every night or once week. This is a perfectly fine approach if the indexes are all small. When working with a very large database, this tactic will not work. You have to pick and choose what index maintenance to do. You can start by using sys.dm_db_index_physical_stats to examine the fragmentation levels of the indexes and making informed decisions about whether to rebuild or reorganize or simply to leave alone.

The standard levels that most people use are to rebuild if fragmentation is greater than 30%, reorganize if fragmentation is between 10% and 30%, and leave alone if less than 10% fragmented. Additionally, most people skip indexes with less than 1000 pages.

You should also pay attention to whether an index can be built online or you must rebuild offline. Know the rules for whether an index can be rebuilt online and rebuild online if possible. If rebuilding online is not possible, you will need a downtime for the index maintenance. If a downtime is not possible, you could attempt to perform a reorganize instead. Reorganizing an index is ALWAYS an online operation. In some cases where there is high fragmentation, a reorganize may be able to bring down fragmentation enough to buy you some time until you can get downtime to rebuild the index.

Here is a downand dirty query I wrote for checking index fragmentation levels:

With BuildOffline
As (
    -- Identify NCL indexes with included LOB/Spatial columns
    Select ic.object_id, ic.index_id
    From sys.index_columns ic
    Inner Join sys.columns c on c.object_id = ic.object_id
        And c.column_id = ic.column_id
        And ic.is_included_column = 1
    Inner Join sys.types ty on ty.user_type_id = c.user_type_id
        And ((ty.name In ('varchar', 'nvarchar', 'varbinary')
                And c.max_length = -1)
            Or ty.name In ('geography', 'geometry', 'ntext', 'image', 'text', 'xml'))
    Union
    -- Identify CL indexes with LOB/Spatial columns
    Select c.object_id, 1
    From sys.columns c
    Inner Join sys.types ty on ty.user_type_id = c.user_type_id
        And ((ty.name In ('varchar', 'nvarchar', 'varbinary')
                And c.max_length = -1)
            Or ty.name In ('geography', 'geometry', 'ntext', 'image', 'text', 'xml'))
)
Select TableName = OBJECT_NAME(PS.object_id),
    AvgFragPercent = PS.avg_fragmentation_in_percent,
    PageCount = PS.page_count,
    IndexName = i.name,
    RebuildCode = 'ALTER INDEX ' + QUOTENAME(i.name) + '
        ON ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + '.' + QUOTENAME(OBJECT_NAME(PS.object_id)) + CHAR(10) +
        Case When PS.avg_fragmentation_in_percent > 30 Then
                'REBUILD WITH (ONLINE = ' +
                Case When bo.object_id is not null Then 'Off'
                    Else 'On' End + ', MAXDOP = 2, SORT_IN_TEMPDB = ON)'
            When avg_fragmentation_in_percent > 10 Then
                'REORGANIZE'
        End + ';'
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'Detailed') PS
Inner Join sys.indexes i On i.object_id =  PS.object_id
    And i.index_id = PS.index_id
    And i.index_id > 0
Inner Join sys.tables t on t.object_id = i.object_id
Left Join BuildOffline bo On bo.object_id = PS.object_id
    And i.index_id = bo.index_id
Where t.is_ms_shipped = 0
And avg_fragmentation_in_percent > 10
And PS.page_count >= 1000
Order By PS.object_id, PS.index_id
OPTION (MAXDOP 2);

Perform Maintenance at the Partition Level

You can also perform index maintenance in a partitioned table at the partition level. Often very large tables have a large amount of historical data that never changes. When partitions hold static data, fragmentation doesn’t increase. Once they are defragmented, they stay defragmented. So why rebuild the whole table when only the newest 5 to 10% is fragmented?

There is a catch. Rebuilding an index at the partition level is ALWAYS an offline operation. An offline rebuild on 1 or 2 partitions might still be preferable to an online rebuild of the whole table. If you already have to rebuild the index offline, then rebuilding at the partition level is a no-brainer.

The good side is that reorganizing at the partition level is still an online operation. There’s no penalty for reorganizing a partition. Again, no-brainer.

Don’t Check Fragmentation Levels

Iused to work on the operations team of an application with a 1.8 TB OLTP database. There was not much in this database in terms of old, historical data. It was architected for a much smaller database, and there are certain large tables that are always fragmented when they perform the monthly index maintenance. Simply checking the fragmentation levels on some tables can take 4 to 6 hours, and they always have to be rebuilt. Why perform the fragmentation level check? Skip that step and simply rebuild those indexes if you know you already know you will have to do it.

Design with Index Maintenance in Mind

DBAs may not always get a say during the design phase of an application. If you are able to guide the design of the database, and the database is expected to get very large, here are some tips to design for index maintenance. The two key tips I want to touch on are storing LOB columns in a child table and using partitioned views and tables.

Storing LOB Columns in a Child Table

Having LOB columns in the clustered index means that the index cannot be rebuilt online. They must be rebuilt offline. You can preserve the ability to rebuild online by placing LOB columns in a child table. This is especially true if the LOB columns are going to be sparsely populated.

Combine Partitioned Views with Partitioned Tables

As stated before, if all of your activity and fragmentation is in the last 1 or 2 partitions, then you can get a lot of benefit by separating active data into a stand-alone table and historical data into a partitioned table. You can use different indexing strategies on the two tables. You can perform index maintenance separately on the two tables. You can rebuild the stand-alone table online whereas rebuilding it as a partition of a larger table would be offline.

Over top of the two tables, you can place a partitioned view that combines the two tables via a UNION ALL. End users can hit the view, and as far as they see, it is still just a single table. The picture below will help explain how to implement this: