Find Duplicates in MS SQL Server

Last Updated : 30 Aug, 2024

Finding duplicate values in a database is a common task when managing data integrity. In SQL, several methods can be employed to identify and handle duplicate entries.

In this article, We will explore two effective techniques for locating duplicates using SQL queries: the GROUP BY clause and the ROW_NUMBER() function.

Find Duplicate Values Using the GROUP BY Clause

The GROUP BY clause is a straightforward way to identify duplicates by grouping rows based on specified columns and using the HAVING clause to filter groups with more than one occurrence.

This method is useful for detecting repeated combinations of column values.

Syntax:

SELECT col1, col2, ...COUNT(*)
FROM table_name
GROUP BY col1, col2, ...
HAVING COUNT(*) > 1;

In this method, we group the column values, and values that have a count greater than 1 are the duplicate values in that column.

Example of Finding duplicate values using the GROUP BY clause

Let us create a table named Geek that contains three columns: ID, A, and B.

CREATE TABLE Geek (
    ID INT IDENTITY(1, 1),
    A INT,
    B INT,
    PRIMARY KEY (ID)
);

Let us add some values to the table Geek

INSERT INTO Geek (A, B)
VALUES
    (1, 1),
    (1, 2),
    (1, 3),
    (2, 1),
    (1, 2),
    (1, 3),
    (2, 1),
    (2, 2);

Let's write an query to find the duplicate rows using GROUP BY clause in the Geek table :

SELECT
    A,
    B,
    COUNT(*) AS num
FROM
    Geek
GROUP BY
    A,
    B
HAVING
    COUNT(*) > 1;

Output:

A	B	num
2	1	2
1	2	2
1	3	2

Explanation: The query identifies combinations of columns `A` and `B` in the `Geek` table that appear more than once. It groups the rows by `A` and `B`, counts the occurrences of each combination, and uses the `HAVING` clause to filter out groups with a count of 1 or less. The result is a list of duplicate combinations along with their counts.

To find the full row details for each duplicate row, JOIN the output of the above query with the Geek table using CTE :

WITH CTE AS (
    SELECT A, B, COUNT(*) AS num
    FROM Geek
    GROUP BY A, B
    HAVING COUNT(*) > 1
)
SELECT Geek.ID, Geek.A, Geek.B
FROM Geek
JOIN CTE ON CTE.A = Geek.A AND CTE.B = Geek.B
ORDER BY Geek.A, Geek.B;

Output:

ID	A	B
2	1	2
5	1	2
6	1	3
3	1	3
4	2	1
7	2	1

Explanation: The query uses a Common Table Expression (CTE) to identify the combinations of columns `A` and `B` in the `Geek` table that occur more than once. It groups by `A` and `B`, and the `HAVING` clause filters to include only those combinations with a count greater than 1. The outer query then joins this CTE with the original `Geek` table to retrieve all rows that have these duplicate combinations, and it orders the results by `A` and `B`.

Find Duplicate Values Using ROW_NUMBER() Function

To find the duplicate values using the ROW_NUMBER() function, follow the given syntax.

Syntax :

WITH cte AS (
    SELECT 
        col, 
        ROW_NUMBER() OVER (PARTITION BY col ORDER BY col) AS row_num
    FROM 
        table_name
)
SELECT * FROM cte WHERE row_num > 1;

MS SQL Server query to find the duplicate rows using ROW_NUMBER() function in the Geek table :

Query:

WITH CTE AS (
    SELECT A, B, 
    ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY A, B) 
    AS rownum FROM Geek
)
SELECT * FROM CTE WHERE rownum > 1;

Output:

A	B	rownum
1	2	2
1	3	2
2	1	2

Explanation: The query uses a Common Table Expression (CTE) to assign a unique row number (`rownum`) to each row in the `Geek` table based on the partition of columns `A` and `B`, ordered by `A` and `B`. The outer query then selects rows where `rownum` is greater than 1, effectively retrieving duplicate rows based on the combination of columns `A` and `B`.

Conclusion

Identifying and handling duplicate data is crucial for maintaining data quality in databases. The GROUP BY clause and the ROW_NUMBER() function offer powerful techniques for finding duplicates, each with its own advantages. The GROUP BY method is efficient for detecting repeated combinations, while ROW_NUMBER() provides a detailed approach to pinpoint specific duplicates.

List All Databases in SQL Server

khushboogoyal499

Improve

Article Tags :

Find Duplicates in MS SQL Server

Find Duplicate Values Using the GROUP BY Clause

Example of Finding duplicate values using the GROUP BY clause

Find Duplicate Values Using ROW_NUMBER() Function

Conclusion

Similar Reads

Thank You!

What kind of Experience do you want to share?