How can I remove duplicate records in my tables?

All to often we end up with duplicate rows in a table.  The best way to keep duplicate rows out of the database is to not let them in.  But assume that they are there.  This bit of sample code shows how to delete those duplicate rows quickly and easily in a single statement.  No temp tables required (I use a temp table to put the data into for example purposes).  This code is for SQL 2005 and up as it uses some features which were introduced in SQL Server 2005.  SQL Server 2000 would require a totally different technique.

CREATE TABLE #DuplicateRows /*Create a new table*/
(Col1 INT,
Col2 INT,
Col3 INT)

INSERT INTO #DuplicateRows /*Load up duplicate rows*/
SELECT 1,1,1
UNION ALL
SELECT 1,1,1
UNION ALL
SELECT 1,1,1
UNION ALL
SELECT 2,2,2
UNION ALL
SELECT 2,2,2
UNION ALL
SELECT 2,2,2

SELECT *
FROM #DuplicateRows; /*Check that the data is actually hosed*/

WITH Cleaning AS (SELECT ROW_NUMBER() OVER(ORDER BY Col1, Col2, Col3) as row,
Col1,
Col2,
Col3
FROM #DuplicateRows)

DELETE FROM Cleaning /*Delete the rows which are duplicates*/
WHERE Row NOT IN (SELECT row FROM  (SELECT Col1, Col2, Col3, MIN(row) row
FROM Cleaning a
GROUP BY Col1, Col2, Col3) b)

SELECT * /*Check the table to see that it is clean*/
FROM #DuplicateRows

DROP TABLE #DuplicateRows /*Clean up the table*/

Hopefully you find this code useful.

Denny

Share

One Response

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trust DCAC with your data

Your data systems may be treading water today, but are they prepared for the next phase of your business growth?