How to Find and Remove Duplicate Records in SQL and Keep Only One Record

5/5 - (1 vote)

When working with databases in real-life scenarios, you often encounter duplicate records. These duplicates can lead to inaccurate reporting, double counting, or performance issues if they are not removed. In this post, we will explain how to identify duplicate records in SQL and how to remove them while keeping only one copy of each record.

Follow these steps to find and remove duplicate records while keeping just one copy.

Step-1: Create a Sample Table
We will start by creating a table named Employees.

CREATE TABLE Employees (
EmployeeID INT,
EmployeeName VARCHAR(50),
Department VARCHAR(50)
);

Step-2: Insert Sample Records with Duplicates
We will insert some employee data, including duplicate records.

INSERT INTO Employees (EmployeeID, EmployeeName, Department) VALUES
(1, 'Sandeep', 'Human Resources'),
(2, 'Ravi', 'Finance'),
(3, 'Ankit', 'Information Technology'),
(3, 'Ankit', 'Information Technology'), -- duplicate
(4, 'Neha', 'Human Resources'),
(2, 'Ravi', 'Finance'); -- duplicate

Step-3: Identify Duplicate Records

To identify duplicate records, we can use the GROUP BY statement with the HAVING clause:

SELECT EmpID, EmpName, Department, COUNT(*) AS DuplicateCount
FROM Employees
GROUP BY EmpID, EmpName, Department
HAVING COUNT(*) > 1;

When you execute the above query, you will see that EmployeeID 2 and EmployeeID 3 are duplicate records.

Step-4: Remove Duplicate Records and Keep One Record

To remove duplicates while keeping only one copy, we can use a Common Table Expression with the ROW_NUMBER() function:

WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY EmpID, EmpName, Department
ORDER BY EmpID
) AS RowNum
FROM Employees
)
DELETE FROM CTE WHERE RowNum > 1;

This method gives a number to each duplicate row and then deletes all rows after the first one, keeping only the first record from each set of duplicates.

Summary Points:

Use the GROUP BY statement with the HAVING clause to identify duplicate records.
Use the ROW_NUMBER() function with a Common Table Expression to remove duplicates while keeping one record.
Always test queries on a copy of the table or create a backup before deleting records in a live database.

How to Find and Remove Duplicate Records in SQL and Keep Only One Record

Follow these steps to find and remove duplicate records while keeping just one copy.

Summary Points:

Share this:

Leave a ReplyCancel reply

Discover more from Power BI Docs