
Remove Rows in R Data Frame with Duplicate Values
To remove the rows of a data frame whose value in a given column occurs more than a certain number of times, we can subset the data frame to keep only the rows whose value occurs fewer than that number of times. For this purpose, we first count how often each value occurs with table(), select the values whose count is below the threshold with which() and names(), and then subset the data frame on that column with %in%, as shown in the examples below.
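The one-line expression used in the examples combines table(), which(), names() and %in%. As a minimal sketch, the same steps can be written out one at a time on a small hand-made data frame. The data frame df and the names counts, keep_values and result are assumptions made for illustration and do not appear in the examples.

```r
# Hypothetical data frame, chosen by hand so the counts are easy to verify:
# g = 1 occurs 3 times, g = 2 occurs twice, g = 3 occurs once
df <- data.frame(g = c(1, 1, 1, 2, 2, 3), v = c(10, 20, 30, 40, 50, 60))

# Step 1: count how often each value of g occurs
counts <- table(df$g)

# Step 2: select the values that occur fewer than 3 times;
# names() returns them as character strings
keep_values <- names(which(counts < 3))

# Step 3: keep the rows whose g is one of those values
# (%in% coerces the numeric column to character for the comparison)
result <- df[df$g %in% keep_values, ]
print(result)
```

Here only the rows with g equal to 2 or 3 remain, because 1 occurs three times.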
Example 1
Consider the following data frame −
> x1<-rpois(20,1)
> x2<-rpois(20,1)
> df1<-data.frame(x1,x2)
> df1
Output
   x1 x2
1   0  0
2   0  0
3   1  0
4   0  1
5   0  0
6   1  1
7   0  1
8   1  1
9   1  2
10  0  0
11  1  1
12  0  0
13  1  1
14  2  2
15  1  1
16  1  0
17  1  1
18  0  3
19  2  0
20  0  0
Removing the rows whose x1 value occurs 3 or more times, that is, keeping only the rows whose x1 value occurs fewer than 3 times −
Example
> df1[df1$x1 %in% names(which(table(df1$x1)<3)),]
Output
   x1 x2
14  2  2
19  2  0
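Because rpois() draws random numbers, rerunning the code above produces a different data frame, and therefore a different result, each time. A minimal sketch of a reproducible variant follows. The seed value 123 and the name kept are assumptions for illustration (the original sets no seed), so the values printed will not match the output shown above.

```r
# Arbitrary seed, assumed here only so that repeated runs give the same data
set.seed(123)

x1 <- rpois(20, 1)
x2 <- rpois(20, 1)
df1 <- data.frame(x1, x2)

# Same subsetting expression as in the example
kept <- df1[df1$x1 %in% names(which(table(df1$x1) < 3)), ]
print(kept)

# Sanity check: every x1 value that survives occurs fewer than 3 times in df1
print(all(table(df1$x1)[as.character(kept$x1)] < 3))
```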
Example 2
> y1<-rpois(20,2)
> y2<-rpois(20,2)
> y3<-rpois(20,2)
> df2<-data.frame(y1,y2,y3)
> df2
Output
   y1 y2 y3
1   2  2  1
2   1  2  0
3   1  2  3
4   3  1  4
5   2  1  1
6   2  1  2
7   1  0  1
8   0  3  5
9   6  1  3
10  2  2  2
11  0  3  0
12  2  2  3
13  3  2  0
14  2  2  4
15  1  0  1
16  1  1  2
17  3  1  3
18  2  4  1
19  0  1  2
20  0  0  0
Removing the rows whose y2 value occurs 2 or more times, that is, keeping only the rows whose y2 value is unique −
Example
> df2[df2$y2 %in% names(which(table(df2$y2)<2)),]
Output
   y1 y2 y3
18  2  4  1
Example 3
> z1<-rpois(20,2)
> z2<-rpois(20,2)
> z3<-rpois(20,2)
> z4<-rpois(20,2)
> df3<-data.frame(z1,z2,z3,z4)
> df3
Output
   z1 z2 z3 z4
1   5  1  3  3
2   1  1  3  3
3   1  1  2  5
4   1  1  2  6
5   3  5  0  1
6   1  3  1  1
7   0  2  0  0
8   2  0  1  2
9   4  1  3  1
10  3  2  1  1
11  1  0  1  1
12  2  3  0  4
13  0  1  2  1
14  2  3  3  2
15  4  2  0  4
16  1  4  2  2
17  0  2  2  3
18  2  1  2  1
19  4  3  4  1
20  3  3  5  2
Removing the rows whose z1 value occurs 2 or more times, that is, keeping only the rows whose z1 value is unique −
Example
> df3[df3$z1 %in% names(which(table(df3$z1)<2)),]
Output
  z1 z2 z3 z4
1  5  1  3  3
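Since all three examples repeat the same pattern with a different column and threshold, it can be wrapped in a small function. The sketch below is one possible alternative and not the method used above: it swaps in ave() to attach a count to every row, which avoids the numeric-to-character comparison done by names() and %in%. The function name keep_rare_rows, its arguments and the sample data frame are hypothetical.

```r
# Hypothetical helper (not from the examples above): keep the rows whose
# value in `column` occurs fewer than `n` times in that column.
keep_rare_rows <- function(data, column, n) {
  values <- data[[column]]
  # ave() applies length() within each group of equal values,
  # returning one count per row
  counts <- ave(seq_along(values), values, FUN = length)
  data[counts < n, , drop = FALSE]
}

# Small hand-made data frame: in z1, the values 1 and 3 occur twice,
# while 5 and 0 occur once
df <- data.frame(z1 = c(5, 1, 1, 3, 3, 0), z2 = c(1, 1, 2, 5, 2, 2))

res <- keep_rare_rows(df, "z1", 2)
print(res)
```

With a threshold of 2, only the first and last rows remain, since 5 and 0 are the only z1 values that occur once.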