
Find Unique Rows in an R Data Frame
A unique row in an R data frame is a row whose combination of values does not occur in any other row of the data frame. For example, if a data frame df has 3 columns and 5 rows, a row is unique when no other row holds the same three values in the same columns. Finding such rows is useful when a data set contains many duplicate rows. To do this, we can use the group_by_all function of the dplyr package together with count, which lists each distinct combination of values along with the number of rows that contain it (n). Combinations with n equal to 1 are the unique rows, as shown in the examples below.
Example 1
Consider the below data frame −
> x1 <- rpois(20, 1)
> x2 <- rpois(20, 1)
> x3 <- rpois(20, 1)
> df1 <- data.frame(x1, x2, x3)
> df1
Output
   x1 x2 x3
1   1  0  2
2   2  1  2
3   1  0  1
4   0  1  0
5   0  0  1
6   1  1  1
7   0  0  0
8   0  1  1
9   0  0  0
10  1  0  1
11  2  2  2
12  1  2  1
13  2  0  2
14  0  1  0
15  0  1  1
16  1  0  1
17  0  0  2
18  1  1  1
19  4  2  0
20  2  2  0
Loading dplyr package and finding unique rows in df1 −
> library(dplyr)
> df1 %>% group_by_all() %>% count()
Output
# A tibble: 14 x 4
# Groups: x1, x2, x3 [14]
      x1    x2    x3     n
   <int> <int> <int> <int>
 1     0     0     0     2
 2     0     0     1     1
 3     0     0     2     1
 4     0     1     0     2
 5     0     1     1     2
 6     1     0     1     3
 7     1     0     2     1
 8     1     1     1     2
 9     1     2     1     1
10     2     0     2     1
11     2     1     2     1
12     2     2     0     1
13     2     2     2     1
14     4     2     0     1
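The count table lists every distinct combination, including those that occur more than once. To keep only the rows that occur exactly once, one option is to filter on n. The following is a minimal sketch, assuming dplyr is installed; the data frame toy and its columns a and b are hypothetical and were chosen so that the result is reproducible.

```r
library(dplyr)

# Hypothetical toy data: rows 1 and 3 are identical, rows 2 and 4 occur once
toy <- data.frame(a = c(1, 2, 1, 3), b = c("x", "y", "x", "y"))

# Count how often each distinct combination of values occurs
counts <- toy %>% group_by_all() %>% count()

# Keep only the combinations that occur exactly once
once <- counts %>% filter(n == 1)
print(once)
```

Here once holds the two combinations (2, "y") and (3, "y"), while the repeated combination (1, "x") is dropped.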
Example 2
> y1 <- sample(c("Yes", "No"), 20, replace = TRUE)
> y2 <- sample(c("Yes", "No"), 20, replace = TRUE)
> df2 <- data.frame(y1, y2)
> df2
Output
    y1  y2
1   No Yes
2   No Yes
3   No  No
4  Yes  No
5   No  No
6  Yes Yes
7   No  No
8  Yes Yes
9   No  No
10  No  No
11  No Yes
12  No Yes
13 Yes  No
14  No Yes
15  No  No
16 Yes  No
17 Yes  No
18  No Yes
19  No Yes
20 Yes  No
Finding unique rows in df2 −
> df2 %>% group_by_all() %>% count()
Output
# A tibble: 4 x 3
# Groups: y1, y2 [4]
  y1    y2        n
  <chr> <chr> <int>
1 No    No        6
2 No    Yes       7
3 Yes   No        5
4 Yes   Yes       2
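The same question can also be answered without dplyr. The sketch below uses the base R functions duplicated and unique; the data frame toy is hypothetical and only serves to make the result reproducible.

```r
# Hypothetical toy data: rows 1 and 3 are identical
toy <- data.frame(y1 = c("No", "Yes", "No", "Yes"),
                  y2 = c("Yes", "No", "Yes", "Yes"))

# Flag every row that has a duplicate, scanning from the top and from the bottom
is_repeated <- duplicated(toy) | duplicated(toy, fromLast = TRUE)

# Rows that occur exactly once in the data frame
once <- toy[!is_repeated, ]

# One copy of every distinct row (repeated rows are kept once)
distinct_rows <- unique(toy)

print(once)
print(distinct_rows)
```

Scanning in both directions matters here: duplicated alone marks only the second and later copies of a row, so the first copy would otherwise be kept.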
Example 3
> z1 <- sample(1:4, 20, replace = TRUE)
> z2 <- sample(1:4, 20, replace = TRUE)
> df3 <- data.frame(z1, z2)
> df3
Output
   z1 z2
1   1  4
2   2  3
3   1  4
4   1  3
5   4  3
6   2  3
7   3  2
8   1  3
9   1  3
10  1  4
11  4  1
12  2  1
13  4  4
14  4  4
15  3  3
16  4  2
17  4  1
18  4  2
19  2  1
20  1  3
Finding unique rows in df3 −
> df3 %>% group_by_all() %>% count()
Output
# A tibble: 10 x 3
# Groups: z1, z2 [10]
      z1    z2     n
   <int> <int> <int>
 1     1     3     4
 2     1     4     3
 3     2     1     2
 4     2     3     2
 5     3     2     1
 6     3     3     1
 7     4     1     2
 8     4     2     2
 9     4     3     1
10     4     4     2
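In dplyr 1.0.0 and later, group_by_all is superseded, and the same grouping can be written with across. The sketch below assumes such a dplyr version is installed and also shows distinct, which returns one copy of each row without counts. The data frame toy is hypothetical.

```r
library(dplyr)

# Hypothetical toy data: rows 1 and 3 are identical
toy <- data.frame(z1 = c(1, 2, 1, 4), z2 = c(3, 3, 3, 1))

# Group by every column using across(), then count rows per combination
counts <- toy %>% group_by(across(everything())) %>% count()

# One copy of each distinct row, in order of first appearance
distinct_rows <- distinct(toy)

print(counts)
print(distinct_rows)
```

Both results have three rows here. counts additionally reports that the combination (1, 3) occurs twice.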