
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Count Non-Missing Values by Group in R Data Frame
To find the number of non-missing values in each column by group in an R data frame, we can use summarise_each function of dplyr package with negation of is.na function.
For Example, if we have a data frame called df that contains a grouping column say G and some other columns having few NAs then we can find the number of non-missing values in each column by grouping column G with the help of below command −
df%%group_by(G)%%summarise_each(funs(sum(!is.na(.))))
Example 1
Following snippet creates a sample data frame −
Group<-sample(LETTERS[1:4],20,replace=TRUE) x1<-sample(c(NA,1,2),20,replace=TRUE) x2<-sample(c(NA,round(rnorm(3),2)),20,replace=TRUE) df1<-data.frame(Group,x1,x2) df1
The following dataframe is created
Group x1 x2 1 C 2 -0.49 2 B 1 1.86 3 A NA NA 4 A NA NA 5 D 1 1.12 6 A NA 1.12 7 B 1 1.86 8 C 1 -0.49 9 D 2 NA 10 A NA 1.12 11 A NA 1.12 12 C NA 1.86 13 A 2 -0.49 14 A 1 -0.49 15 C NA NA 16 C NA 1.86 17 A NA 1.12 18 D 2 -0.49 19 C 2 1.86 20 C 2 NA
To load dplyr package and find the number of non-missing values in each column by Group column in df1 on the above created data frame, add the following code to the above snippet −
Group<-sample(LETTERS[1:4],20,replace=TRUE) x1<-sample(c(NA,1,2),20,replace=TRUE) x2<-sample(c(NA,round(rnorm(3),2)),20,replace=TRUE) df1<-data.frame(Group,x1,x2) library(dplyr) df1%%group_by(Group)%%summarise_each(funs(sum(!is.na(.)))) # A tibble: 4 x 3
Output
If you execute all the above given snippets as a single program, it generates the following Output −
Group x1 x2 <chr <int <int 1 A 2 6 2 B 2 2 3 C 4 5 4 D 3 2
Example 2
Following snippet creates a sample data frame −
Class<-sample(c("I","II","III"),20,replace=TRUE) Score<-sample(c(NA,1:5),20,replace=TRUE) Rank<-sample(c(NA,0:2),20,replace=TRUE) df2<-data.frame(Class,Score,Rank) df2
The following dataframe is created
Class Score Rank 1 II NA NA 2 III 5 2 3 II 5 2 4 II 4 NA 5 II NA NA 6 II 4 2 7 II 2 1 8 III 4 2 9 II 2 NA 10 III NA 0 11 I 1 2 12 II NA NA 13 I 2 1 14 I 5 NA 15 I 3 0 16 I 4 0 17 I 3 2 18 II 2 NA 19 II 2 NA 20 II 2 1
To find the number of non-missing values in each column by Class column in df2 on the above created data frame, add the following code to the above snippet −
Class<-sample(c("I","II","III"),20,replace=TRUE) Score<-sample(c(NA,1:5),20,replace=TRUE) Rank<-sample(c(NA,0:2),20,replace=TRUE) df2<-data.frame(Class,Score,Rank) df2%%group_by(Class)%%summarise_each(funs(sum(!is.na(.)))) # A tibble: 3 x 3
Output
If you execute all the above given snippets as a single program, it generates the following Output −
Class Score Rank <chr <int <int 1 I 6 5 2 II 8 4 3 III 2 3