Deleting Duplicate Rows in SQL
Deleting Duplicate Rows in SQL
.com
DELETE
DUPLICATE ROWS
IN SQL
Duplicate rows in a database can cause data
integrity issues and affect the accuracy of data
analysis. It’s essential to identify and remove
duplicates to ensure the dataset is clean and
reliable.
Shah Jahan
linkedin.com/in/shah907
deepdecide
.com
Example
This query groups the data based on Column1 and Column2 and counts the number of
occurrences of each group. If the count is greater than 1, those rows are duplicates.
linkedin.com/in/shah907
deepdecide .com
Example
WITH CTE AS (
SELECT Column1, Column2, ROW_NUMBER() OVER (PARTITION BY
Column1, Column2 ORDER BY Column1) AS RowNum
FROM your_table
)
DELETE FROM CTE
WHERE RowNum > 1;
The ROW_NUMBER() function assigns a unique row number to each row within a
partition of the result set. Rows that have the same values in Column1 and Column2
are partitioned together.
The DELETE statement removes all rows where the RowNum is greater than 1,
effectively keeping only one instance of each duplicate.
linkedin.com/in/shah907
deepdecide.com
Example
CREATE TABLE temp_table AS
SELECT DISTINCT Column1, Column2
FROM your_table;
Then, the original table is truncated (which removes all rows) and repopulated with
the distinct data.
linkedin.com/in/shah907
deepdecide
4. Deleting Duplicates Based on
.com
a Single Column
If you want to delete duplicates based on a specific column, you can use the following
method:
Example
The MIN(id) function selects the row with the smallest ID for each group of duplicates.
The DELETE query removes all other rows that are not in this set, effectively keeping
one row per duplicate.
linkedin.com/in/shah907
deepdecide.com
JOIN NOW
linkedin.com/in/shah907
deepdecide .com
Shah Jahan