Using batching to do large operations without filling the transaction log

Deleting large amounts of data from a table is a common task, but if you don't have a maintenance window to work within you can end up causing locking and blocking against the other processes which are accessing the database.  Not to mention you will write a very large amount of data into the transaction log, no matter what recovery model your database is set to.

Say you have a table with a date column and you need to delete a million plus records.  Doing this in a single transaction will put all million plus deletes into your transaction log, and will block any other processes which are trying to access the table.
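For comparison, the single-statement version of that delete would look something like the sketch below, using a placeholder table name and date column; every row it removes is part of one transaction, so it all sits in the log until the delete commits.

-- One big delete: every removed row is logged in a single transaction,
-- and locks are held until the whole statement finishes.
DELETE FROM [Table]
WHERE CreateDate < DATEADD(yy, -1, GETDATE())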

However, if we break the operation into many smaller transactions our transaction log will not fill up, as we can back up the log using our normal log backup methods throughout the process; or, if the database uses the SIMPLE recovery model, completed transactions will be cleared from the log automatically.

In SQL 2000 and below you have to set the ROWCOUNT session setting to a number above 0, which causes SQL Server to delete only the first n records that it comes across.  In SQL 2005 we can use the TOP clause as part of our DELETE command to the same effect, without having to reset the session setting afterward.

In SQL 2000 or below you can use a syntax like this one.

DECLARE @c BIT, @d DATETIME
SET @d = DATEADD(yy, -1, GETDATE())
SET @c = 0
SET ROWCOUNT 1000
WHILE @c = 0
BEGIN
    -- Delete the next batch of rows older than one year
    DELETE FROM [Table]
    WHERE CreateDate < @d

    -- When no rows are left to delete, exit the loop
    IF @@ROWCOUNT = 0
        SET @c = 1
END
SET ROWCOUNT 0

If you are using SQL 2005 you can use this very similar syntax.

DECLARE @c BIT, @d DATETIME
SET @d = DATEADD(yy, -1, GETDATE())
SET @c = 0
WHILE @c = 0
BEGIN
    -- Delete the next 1000 rows older than one year
    DELETE TOP (1000) FROM [Table]
    WHERE CreateDate < @d

    -- When no rows are left to delete, exit the loop
    IF @@ROWCOUNT = 0
        SET @c = 1
END

Both pieces of code work the same way. Declare a variable which tells the loop when to exit, then start deleting the data in batches. When a batch deletes no records, set the variable to 1 so the loop exits. This will usually take a little longer to complete than a single delete statement, but the system will remain responsive while it runs.
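If your database is in the FULL or BULK_LOGGED recovery model, you can keep the log from growing by taking log backups while the loop runs, as mentioned above. A rough sketch based on the SQL 2005 version is below; the database name, backup path, and the choice of a log backup every 50 batches are all placeholders to adjust for your environment.

DECLARE @c BIT, @d DATETIME, @i INT
SET @d = DATEADD(yy, -1, GETDATE())
SET @c = 0
SET @i = 0
WHILE @c = 0
BEGIN
    DELETE TOP (1000) FROM [Table]
    WHERE CreateDate < @d

    IF @@ROWCOUNT = 0
        SET @c = 1

    -- Every 50 batches take a log backup so the space used by the
    -- deleted rows can be cleared from the transaction log.
    SET @i = @i + 1
    IF @i % 50 = 0
        BACKUP LOG MyDatabase TO DISK = 'D:\Backups\MyDatabase_log.trn'
END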

The number of records you delete per batch should be adjusted based on the width of your records, the load on the database at the time of the deletion, and the speed of the drives which hold the data files and transaction logs. I usually start at 1000 records per batch and see how the system responds. For narrow tables holding just a few numeric columns I'll put it up as high as 50k or 100k records; for very wide tables I'll drop it down to 100 or 500 records if the system can't handle 1000.
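If you end up tuning the batch size frequently, one option in SQL 2005 is to put it in a variable, since TOP accepts a variable in parentheses. The sketch below does that, and also adds an optional WAITFOR DELAY between batches as one way to ease the load on other processes; the batch size and delay values shown are just starting points, not tested recommendations.

DECLARE @c BIT, @d DATETIME, @BatchSize INT
SET @d = DATEADD(yy, -1, GETDATE())
SET @c = 0
SET @BatchSize = 1000   -- tune based on row width, load, and disk speed

WHILE @c = 0
BEGIN
    DELETE TOP (@BatchSize) FROM [Table]
    WHERE CreateDate < @d

    IF @@ROWCOUNT = 0
        SET @c = 1

    -- Optional: pause briefly between batches to give other
    -- processes hitting the table more breathing room.
    WAITFOR DELAY '00:00:01'
END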

Denny

Update:
Sorry this post didn't look very good at first. The blog site seems to have eaten it, and I didn't notice until just now.

2 Responses

  1. Is it possible to apply this same concept of minimal impact to the Log file when performing maintenance tasks like index rebuilds? We have a 100GB DB with 2 very volatile tables which require Index Rebuilds weekly on a total of 6 indexes. The Log file has 60GB of space to work with and under any other DB activity the log file seldom gets to a fourth of that size. But when we rebuild these indexes, all 60GB are consumed.

    We have the RECOVERY MODEL set to Bulk-Logged, something we can’t change due to requirements by Microsoft’s DPM software which we use for backups. Currently I’m changing the Recovery Model to SIMPLE, running the Rebuilds, and switching back to Bulk-Logged once the job is done. This occurs outside normal user hours so there’s nothing else going on with the DB when the rebuilds occur. Even though this approach works, I’d love to find a way to do this without changing the Recovery Model and without always having such a large amount of drive space available for the log file.

    Thanks

  2. No it’s not. There’s no way to batch index rebuild operations.

    By changing the recovery model from bulk-logged to simple you are breaking your transaction log chain, which will stop you from being able to restore your transaction log backups.

    Instead you should trigger a transaction log backup as soon as your reindex job is complete.

    About all you can do is take a log backup between each index rebuild command. You’ll need to keep that amount of drive space reserved for your transaction log.
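    As a rough sketch, log backups between rebuilds could look something like the lines below; the index, table, database, and backup path names are placeholders you would swap for your own six indexes and backup location.

    -- Rebuild one index, then back up the log before moving to the next,
    -- so the log space used by each rebuild can be reused.
    ALTER INDEX IX_MyTable_Index1 ON dbo.MyTable REBUILD
    BACKUP LOG MyDatabase TO DISK = 'D:\Backups\MyDatabase_log.trn'

    ALTER INDEX IX_MyTable_Index2 ON dbo.MyTable REBUILD
    BACKUP LOG MyDatabase TO DISK = 'D:\Backups\MyDatabase_log.trn'

    -- ...repeat for the remaining indexes...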
