Skip to content
Home » SQL Server » How to Find and Remove Duplicate Records in SQL and Keep Only One Record

How to Find and Remove Duplicate Records in SQL and Keep Only One Record

5/5 - (1 vote)

When working with databases in real-life scenarios, you often encounter duplicate records. These duplicates can lead to inaccurate reporting, double counting, or performance issues if they are not removed. In this post, we will explain how to identify duplicate records in SQL and how to remove them while keeping only one copy of each record.

Follow these steps to find and remove duplicate records while keeping just one copy.

Step-1: Create a Sample Table
We will start by creating a table named Employees.

CREATE TABLE Employees (
EmployeeID INT,
EmployeeName VARCHAR(50),
Department VARCHAR(50)
);

Step-2: Insert Sample Records with Duplicates
We will insert some employee data, including duplicate records.

INSERT INTO Employees (EmployeeID, EmployeeName, Department) VALUES
(1, 'Sandeep', 'Human Resources'),
(2, 'Ravi', 'Finance'),
(3, 'Ankit', 'Information Technology'),
(3, 'Ankit', 'Information Technology'), -- duplicate
(4, 'Neha', 'Human Resources'),
(2, 'Ravi', 'Finance'); -- duplicate
SQL Tables
SQL Tables

Step-3: Identify Duplicate Records

To identify duplicate records, we can use the GROUP BY statement with the HAVING clause:

SELECT EmpID, EmpName, Department, COUNT(*) AS DuplicateCount
FROM Employees
GROUP BY EmpID, EmpName, Department
HAVING COUNT(*) > 1;
Duplicate records in sql
Duplicate records in sql

When you execute the above query, you will see that EmployeeID 2 and EmployeeID 3 are duplicate records.



Step-4: Remove Duplicate Records and Keep One Record

To remove duplicates while keeping only one copy, we can use a Common Table Expression with the ROW_NUMBER() function:

WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY EmpID, EmpName, Department
ORDER BY EmpID
) AS RowNum
FROM Employees
)
DELETE FROM CTE WHERE RowNum > 1;
Remove duplicate and keep only one
Remove duplicate and keep only one

This method gives a number to each duplicate row and then deletes all rows after the first one, keeping only the first record from each set of duplicates.

Final result of query
Final result of query

Summary Points:

  • Use the GROUP BY statement with the HAVING clause to identify duplicate records.
  • Use the ROW_NUMBER() function with a Common Table Expression to remove duplicates while keeping one record.
  • Always test queries on a copy of the table or create a backup before deleting records in a live database.

Loading

Leave a Reply

Discover more from Power BI Docs

Subscribe now to keep reading and get access to the full archive.

Continue reading