Duplicate records in SQL Server can lead to inaccurate reporting, data inconsistencies, and performance issues. In this article, we’ll go over how to identify and safely remove duplicate rows while keeping at least one unique record.
Detecting Duplicates
To find duplicate records in a table, use the GROUP BY and HAVING clauses to count occurrences of each unique combination of values:
SELECT column1, column2, COUNT(*)
FROM YourTable
GROUP BY column1, column2
HAVING COUNT(*) > 1;
Replace column1, column2 with the columns that define a duplicate in your dataset.
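For instance, if duplicates in a hypothetical Customers table were defined by matching Email and Phone values (the table and column names here are purely illustrative), the query would look like this:
-- Hypothetical example: find customers sharing the same Email and Phone
SELECT Email, Phone, COUNT(*) AS duplicate_count
FROM Customers
GROUP BY Email, Phone
HAVING COUNT(*) > 1;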
If you need to see the actual duplicate rows, use a ROW_NUMBER() approach:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS row_num
FROM YourTable
) t
WHERE row_num > 1;
Here, id should be a unique column used to order the duplicates within each group.
Removing Duplicates
Method 1: Using ROW_NUMBER()
The safest way to remove duplicates while keeping one unique record is by using ROW_NUMBER().
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS row_num
FROM YourTable
)
DELETE FROM CTE WHERE row_num > 1;
This deletes all duplicate records while keeping the first occurrence in each group (the row with the lowest id).
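Before running the DELETE, you can preview exactly which rows it will remove by keeping the same CTE and swapping the DELETE for a SELECT. The ORDER BY inside ROW_NUMBER() controls which row survives: ORDER BY id keeps the lowest id in each group, while ORDER BY id DESC would keep the highest.
-- Preview the rows the DELETE above would remove
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS row_num
FROM YourTable
)
SELECT * FROM CTE WHERE row_num > 1;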
Method 2: Using DISTINCT INTO a New Table
If you want to be extra cautious, create a new table with only unique records:
SELECT DISTINCT * INTO NewTable FROM YourTable;
Keep in mind that SELECT DISTINCT * only treats rows as duplicates when every column matches, so if the table has a unique id column, list the other columns explicitly instead of using *. Then, drop the old table and rename NewTable back to YourTable.
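One way to do that final swap is sketched below. It assumes nothing else references YourTable, and note that SELECT ... INTO does not copy indexes, constraints, or triggers, so those would need to be recreated on the new table.
-- Replace the original table with the deduplicated copy (table names as above)
DROP TABLE YourTable;
EXEC sp_rename 'NewTable', 'YourTable';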
Final Thoughts
Always back up your data before running delete operations to prevent accidental data loss. By regularly cleaning up duplicates, you can keep your SQL Server database efficient and reliable.
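As an extra safeguard on top of a backup, you can wrap the delete from Method 1 in an explicit transaction and commit only after checking how many rows were removed. Here is a minimal sketch using the same placeholder table and column names as above:
BEGIN TRANSACTION;

WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY column1, column2 ORDER BY id) AS row_num
FROM YourTable
)
DELETE FROM CTE WHERE row_num > 1;

-- Inspect the number of rows removed by the DELETE above
SELECT @@ROWCOUNT AS rows_deleted;

-- COMMIT TRANSACTION;   -- run this if the count is what you expect
-- ROLLBACK TRANSACTION; -- run this instead to undo the delete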