Comments (8)

  1. SQLSoldier,

    Thanks for this useful series. Some of your readers may want code examples that show/expand on what you suggest. Following is a code snippet I often use in my current job. We mainly consolidate data from lots of different data sources into a central data/reporting store (not a true Data Warehouse) so our recovery mode for all our databases is simple (no need for point-in-time recovery). Most of our larger tables are partitioned but we still sometimes need to perform DELETEs which as you indicate can cause many issues with VLDBs.

    The following script causes frequent small commits leading to frequent checkpoints, thus as you say minimizing log file growth.

    Obvious note: WHERE clauses in the command must match.

    WHILE EXISTS (SELECT * FROM MyTable WITH (NOLOCK) WHERE MyColumn = 12345)
    DELETE TOP (1000) MyTable WHERE MyColumn = 12345 ;

    If the WHERE column is your clustered index or is the first column of your clustered index and if you can determine some way to evenly “chunk” or group your data into even non-overlapping sets then you can parallelize this DELETE operation possibly leading to better performance. This might look something like the following where you would run the commands simultaneously and possibly see a performance boost.

    Command #1:
    WHILE EXISTS (SELECT * FROM MyTable WITH (NOLOCK) WHERE MyColumn >= 12345 AND MyColumn < 23456)
    DELETE TOP (1000) MyTable WHERE MyColumn = 12345 AND MyColumn = 23456 AND MyColumn < 34567)
    DELETE TOP (1000) MyTable WHERE MyColumn = 23456 AND MyColumn = 23456 AND MyColumn < 34567 ;

    If I see problems I can easily cancel these WHILE EXISTS DELETE TOP (N) commands. Because these looping structures are issuing multiple small, frequent commits any rollback operation on the currently running DELETE TOP (N) command is also small. If you were midstream of issuing a single huge DELETE command with no TOP (N) and had to cancel, the necessary rollback can take a very long time.

    I would love to see scripts that you might have or scripts that your other readers have related to this discussion.

    Again thanks for sharing these very useful ideas related to VLDBs.

  2. SQLSoldier. Sorry for the follow up. It looks like WordPress mangled my post somewhat. Not sure what happened. Following is the chunk that is mangled. If it mangles it again and you are interested I can send via email? Thanks, Stephen

    Command #1:

    WHILE EXISTS (SELECT * FROM MyTable WITH (NOLOCK) WHERE MyColumn >= 12345 AND MyColumn < 23456)
    DELETE TOP (1000) MyTable WHERE MyColumn = 12345 AND MyColumn = 23456 AND MyColumn < 34567)
    DELETE TOP (1000) MyTable WHERE MyColumn = 23456 AND MyColumn = 23456 AND MyColumn < 34567 ;

    If I see problems… etc

    1. Thanks for the feedback, Stephen!! I’m seeing your posts just fine. Nothing is mangled. What does it look like to you?

      I’m going to add in a little more info on this. Thanks for the suggestion!!

      1. Hi Robert,

        The mangling is in the Command #1 clause above. I think it is the less thans and greater thans that are messing things up a bit. WordPress is possibly seeing an html element somehow. Should read like the following (will use GT and LT instead of the characters).

        —-

        Command #1:
        WHILE EXISTS (SELECT * FROM MyTable WITH (NOLOCK) WHERE MyColumn GT= 12345 AND MyColumn LT 23456)
        DELETE TOP (1000) MyTable WHERE MyColumn GT= 12345 AND MyColumn LT 23456 ;

        Command #2:

        WHILE EXISTS (SELECT * FROM MyTable WITH (NOLOCK) WHERE MyColumn GT= 23456 AND MyColumn LT 34567)
        DELETE TOP (1000) MyTable WHERE MyColumn GT= 23456 AND MyColumn LT 34567 ;

        When I am doing something like this, I frequently use Adam Machanic’s sp_whoisactive to monitor that help ensure that I am not causing problems. I also use sp_spaceused on MyTable to watch progress. And I also use a query like the following to monitor progress.

        SELECT COUNT(*)
        FROM MyTable WITH (NOLOCK)
        WHERE MyColumn GT= 23456 AND MyColumn LT 34567 ;

        If I see problems… etc

  3. I’ve been waiting all week for this article! I’m in a situation where I need to delete data from two huge tables – 60 million+ rows each and growing by 80K rows per day. Trying to find the best way. Been using your version of the “hard method”, but it’s still pretty slow. I’d love to see an example of using partitions to delete data.

    1. Is the table partitioned? If so, you switch out the whole partition that you want to delete into an empty table and then just drop the table or truncate it.

      1. It’s not currently partitioned 🙁 But that’s what I was thinking I would do. Does adding partitions to an unpartitioned table cause a big disk IO hit if the partitions are in the same filegroup? (running SQL 2005, btw).

        1. Yes. Unfortunately, it is a huge IO hit. It has to move the data to different files. Obviously, it’s best to plan for partitioning in the design phase, but tha tis so often not the cards we are dealt. Unfortunately, people often let it go because it will be such a big pain to implement. Even if they set the existing data as a single partition and then starting added partitions at se time intervals, the problem will at least not keep getting worse.

Leave a Reply to Stephen HorneCancel reply