
Find and Remove Duplicates from a Table in Oracle
Problem Statement:
You want to find and remove duplicates from a table in Oracle.
Solution: We can use Oracle’s internal ROWID pseudocolumn to uniquely identify each row in a table, even when all of its column values are duplicated. The general syntax looks like this:
delete from table where rowid in (... query here ...)
To demonstrate the usage, we will begin by creating sample data.
Example
-- table with tennis player rankings
DROP TABLE atp_stats;

CREATE TABLE atp_stats (
  player_rank NUMBER NOT NULL,
  player_name VARCHAR2(100) NOT NULL,
  time_range  TIMESTAMP(6)
);

-- sample records
INSERT INTO atp_stats VALUES (1,'ROGER FEDERER',CURRENT_TIMESTAMP);
INSERT INTO atp_stats VALUES (2,'RAFAEL NADAL',CURRENT_TIMESTAMP);
INSERT INTO atp_stats VALUES (3,'NOVAK DJOKOVIC',CURRENT_TIMESTAMP);
INSERT INTO atp_stats VALUES (4,'ANDY MURRAY',CURRENT_TIMESTAMP);
INSERT INTO atp_stats VALUES (1,'ROGER FEDERER',CURRENT_TIMESTAMP);
INSERT INTO atp_stats VALUES (2,'RAFAEL NADAL',CURRENT_TIMESTAMP);
INSERT INTO atp_stats VALUES (3,'NOVAK DJOKOVIC',CURRENT_TIMESTAMP);

COMMIT;
Let us look at the data we just created.
Example
SELECT * FROM atp_stats ORDER BY 2;
player_rank | player_name
4 | ANDY MURRAY
3 | NOVAK DJOKOVIC
3 | NOVAK DJOKOVIC
2 | RAFAEL NADAL
2 | RAFAEL NADAL
1 | ROGER FEDERER
1 | ROGER FEDERER
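Before removing anything, it can help to confirm which combinations are actually duplicated. The query below is a minimal sketch of one common way to do that with GROUP BY and HAVING; the column alias copies is a name chosen here for illustration and is not part of the original example.

```sql
-- Sketch: list each (player_rank, player_name) combination
-- that occurs more than once in the table.
SELECT player_rank,
       player_name,
       COUNT(*) AS copies   -- "copies" is an illustrative alias
FROM atp_stats
GROUP BY player_rank, player_name
HAVING COUNT(*) > 1
ORDER BY player_name;
```

With the sample data above, this should list NOVAK DJOKOVIC, RAFAEL NADAL and ROGER FEDERER, each with two copies.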
So, we have inserted 3 duplicates, which we now want to remove. Before we write the DELETE statement, let us understand the inner query that uses ROWID.
Example
SELECT rowid
FROM (
  SELECT player_rank,
         player_name,
         rowid,
         row_number() over (partition BY player_rank, player_name
                            order by player_rank, player_name) AS rnk
  FROM atp_stats
)
WHERE rnk > 1;
I intentionally added the columns player_rank and player_name to the innermost subquery to make the logic easier to follow. The subquery could be written without them to the same effect. If we execute just the innermost query, with the extra columns selected for clarity, we see these results.
player_rank | player_name | rowid | rnk
4 | ANDY MURRAY | AAAPHcAAAAAB/4TAAD | 1
3 | NOVAK DJOKOVIC | AAAPHcAAAAAB/4TAAC | 1
3 | NOVAK DJOKOVIC | AAAPHcAAAAAB/4TAAG | 2
2 | RAFAEL NADAL | AAAPHcAAAAAB/4TAAB | 1
2 | RAFAEL NADAL | AAAPHcAAAAAB/4TAAF | 2
1 | ROGER FEDERER | AAAPHcAAAAAB/4TAAE | 1
1 | ROGER FEDERER | AAAPHcAAAAAB/4TAAA | 2
The innermost query returns the rowid of every row in the table. The ROW_NUMBER() function then works over sets of player_rank and player_name, driven by the PARTITION BY clause. This means that for every unique combination of player_rank and player_name, ROW_NUMBER keeps a running count of rows, which we have aliased as rnk. When a new combination of player_rank and player_name is encountered, the rnk counter resets to 1. The outer query then keeps only the rows with rnk > 1, which are the duplicates.
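Because the ORDER BY inside the window uses the same columns as the PARTITION BY, all rows in a partition tie, so the query itself does not determine which copy receives rnk = 1. If it matters which copy survives, one option is to order by a column that distinguishes the copies. The following is a minimal sketch, assuming we want the earliest time_range in each group to be numbered first, with rowid as a tie-breaker; that ordering is an assumption made here for illustration, not part of the original example.

```sql
-- Sketch: number the rows so that the earliest time_range in each
-- (player_rank, player_name) group gets rnk = 1.
-- Ordering by time_range, then rowid, is an illustrative assumption.
SELECT player_rank,
       player_name,
       time_range,
       ROW_NUMBER() OVER (
           PARTITION BY player_rank, player_name
           ORDER BY time_range, rowid
       ) AS rnk
FROM atp_stats;
```

Rows with rnk > 1 in this result would be the later copies of each combination.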
Now we can use a DELETE statement to remove the duplicate rows, as shown below.
SQL: Remove duplicates
Example
DELETE FROM atp_stats
WHERE rowid IN (
  SELECT rowid
  FROM (
    SELECT player_rank,
           player_name,
           rowid,
           row_number() over (partition BY player_rank, player_name
                              order by player_rank, player_name) AS rnk
    FROM atp_stats
  )
  WHERE rnk > 1
);
Output
3 rows deleted.
player_rank | player_name
4 | ANDY MURRAY
3 | NOVAK DJOKOVIC
2 | RAFAEL NADAL
1 | ROGER FEDERER
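For comparison, a commonly used alternative relies on ROWID with an aggregate instead of ROW_NUMBER(). The statement below is a sketch of that alternative, not the approach used in this article: it keeps the row with the smallest rowid in each group and deletes the rest. It is meant to be run against the original sample data, before the DELETE above has been applied.

```sql
-- Alternative sketch (not the ROW_NUMBER() approach shown above):
-- keep the row with the smallest rowid in each
-- (player_rank, player_name) group and delete every other row.
DELETE FROM atp_stats
WHERE rowid NOT IN (
    SELECT MIN(rowid)
    FROM atp_stats
    GROUP BY player_rank, player_name
);
```

As with any DELETE in Oracle, the change can be undone with ROLLBACK until it is committed, so it is worth checking the remaining rows before issuing COMMIT.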