Find Duplicates in MS SQL Server
Last Updated : 30 Aug, 2024
Finding duplicate values in a database is a common task when managing data integrity. In SQL, several methods can be employed to identify and handle duplicate entries.
In this article, we will explore two effective techniques for locating duplicates using SQL queries: the GROUP BY clause and the ROW_NUMBER() function.
Find Duplicate Values Using the GROUP BY Clause
The GROUP BY clause is a straightforward way to identify duplicates by grouping rows based on specified columns and using the HAVING clause to filter groups with more than one occurrence.
This method is useful for detecting repeated combinations of column values.
Syntax:
SELECT col1, col2, ..., COUNT(*)
FROM table_name
GROUP BY col1, col2, ...
HAVING COUNT(*) > 1;
In this method, we group the rows by the chosen columns; any group with a count greater than 1 represents a combination of values that appears more than once, that is, a duplicate.
Example of Finding duplicate values using the GROUP BY clause
Let us create a table named Geek that contains three columns: ID, A, and B.
CREATE TABLE Geek (
ID INT IDENTITY(1, 1),
A INT,
B INT,
PRIMARY KEY (ID)
);
Let us add some values to the Geek table:
INSERT INTO Geek (A, B)
VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 1),
(1, 2),
(1, 3),
(2, 1),
(2, 2);
Let’s write a query to find the duplicate rows in the Geek table using the GROUP BY clause:
SELECT
A,
B,
COUNT(*) AS num
FROM
Geek
GROUP BY
A,
B
HAVING
COUNT(*) > 1;
Output:
A | B | num
2 | 1 | 2
1 | 2 | 2
1 | 3 | 2
Explanation: The query identifies combinations of columns `A` and `B` in the `Geek` table that appear more than once. It groups the rows by `A` and `B`, counts the occurrences of each combination, and uses the `HAVING` clause to filter out groups with a count of 1 or less. The result is a list of duplicate combinations along with their counts.
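Which rows count as duplicates depends on the columns listed in GROUP BY. As a minimal sketch against the same Geek table, grouping by column A alone treats any repeated value of A as a duplicate, regardless of B:

```sql
-- Sketch: duplicates judged on column A only (B is ignored).
SELECT
    A,
    COUNT(*) AS num      -- how many rows share this value of A
FROM
    Geek
GROUP BY
    A
HAVING
    COUNT(*) > 1         -- keep only values of A that repeat
ORDER BY
    A;
```

With the eight sample rows inserted earlier, this returns A = 1 with num = 5 and A = 2 with num = 3, because every value of A occurs more than once even though only three (A, B) combinations repeat.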
To retrieve the full details of each duplicate row, join the output of the above query back to the Geek table using a CTE:
WITH CTE AS (
SELECT A, B, COUNT(*) AS num
FROM Geek
GROUP BY A, B
HAVING COUNT(*) > 1
)
SELECT Geek.ID, Geek.A, Geek.B
FROM Geek
JOIN CTE ON CTE.A = Geek.A AND CTE.B = Geek.B
ORDER BY Geek.A, Geek.B;
Output:
ID | A | B
2 | 1 | 2
5 | 1 | 2
6 | 1 | 3
3 | 1 | 3
4 | 2 | 1
7 | 2 | 1
Explanation: The query uses a Common Table Expression (CTE) to identify the combinations of columns `A` and `B` in the `Geek` table that occur more than once. It groups by `A` and `B`, and the `HAVING` clause filters to include only those combinations with a count greater than 1. The outer query then joins this CTE with the original `Geek` table to retrieve all rows that have these duplicate combinations, and it orders the results by `A` and `B`.
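The same full-row result can also be obtained without a join. The following is a sketch, not part of the original method, that uses COUNT(*) as a window function to attach the group size to every row; the alias `counted` is just an illustrative name:

```sql
-- Sketch: attach the size of each (A, B) group to every row,
-- then keep the rows whose group has more than one member.
SELECT ID, A, B
FROM (
    SELECT
        ID, A, B,
        COUNT(*) OVER (PARTITION BY A, B) AS num  -- rows sharing this (A, B)
    FROM Geek
) AS counted
WHERE num > 1
ORDER BY A, B, ID;  -- ID added so rows within a group come back in a fixed order
```

Adding ID to the ORDER BY makes the order of rows inside each duplicate group deterministic, which ordering by A and B alone does not guarantee.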
Find Duplicate Values Using ROW_NUMBER() Function
To find the duplicate values using the ROW_NUMBER() function, follow the given syntax.
Syntax :
WITH cte AS (
SELECT
col,
ROW_NUMBER() OVER (PARTITION BY col ORDER BY col) AS row_num
FROM
table_name
)
SELECT * FROM cte WHERE row_num > 1;
MS SQL Server query to find the duplicate rows using ROW_NUMBER() function in the Geek table :
Query:
WITH CTE AS (
SELECT A, B,
ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY A, B) AS rownum
FROM Geek
)
SELECT * FROM CTE WHERE rownum > 1;
Output:
A | B | rownum
1 | 2 | 2
1 | 3 | 2
2 | 1 | 2
Explanation: The query uses a Common Table Expression (CTE) to assign a sequential row number (`rownum`) to each row in the `Geek` table, restarting at 1 for every distinct combination of `A` and `B`. The first row of each combination gets `rownum` = 1, and any further copies get 2, 3, and so on. The outer query then selects rows where `rownum` is greater than 1, so it returns only the extra copies of each duplicated combination, not the first occurrence.
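Because the window is ordered by the same columns it is partitioned by, which copy receives `rownum` = 1 is arbitrary. A hedged variation, assuming the `ID` column reflects insertion order, orders each partition by `ID` so that the earliest row is always treated as the original:

```sql
-- Sketch: order each (A, B) partition by ID so the lowest ID gets rownum = 1
-- and the later copies are the ones reported as duplicates.
WITH CTE AS (
    SELECT
        ID, A, B,
        ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY ID) AS rownum
    FROM Geek
)
SELECT ID, A, B, rownum
FROM CTE
WHERE rownum > 1   -- skip the first occurrence of each combination
ORDER BY ID;
```

If the sample rows received IDs 1 to 8 in insertion order, as the earlier output suggests, this returns the rows with ID 5, 6, and 7, each with `rownum` = 2.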
Conclusion
Identifying and handling duplicate data is crucial for maintaining data quality in databases. The GROUP BY clause and the ROW_NUMBER() function offer powerful techniques for finding duplicates, each with its own advantages. The GROUP BY method is efficient for detecting repeated combinations and counting them, while ROW_NUMBER() numbers the rows within each combination, making it possible to pinpoint the specific extra copies.