
Remove Rows with Coded Missing Values in R Data Frame
Sometimes missing values are coded with a placeholder value, such as 99. If we analyze the data without handling these codes, the results become difficult to interpret, especially for first-time readers.
Therefore, we might want to remove rows that contain coded missing values. To do so, we can replace the coded missing values with NA and then remove the rows in which every column is NA, as shown in the examples below.
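The two steps described above can be sketched on a small hand-written data frame. This is only an illustration: the data frame, its column names, and the code -9 are assumptions chosen for this example and do not come from the examples that follow.

```r
# Small hand-written data frame; -9 is a hypothetical missing-value code
df <- data.frame(a = c(3, -9, 5, -9), b = c(-9, -9, 2, 7))

# Step 1: recode the placeholder value as NA
df[df == -9] <- NA

# Step 2: keep only the rows where at least one column is not NA
result <- df[rowSums(is.na(df)) < ncol(df), ]
print(result)
```

Only the second row is dropped here, because it is the only row in which both columns held the code.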
Example 1
The following snippet creates a data frame in which missing values are assumed to be coded as 1. Because rpois() draws random values, your data will differ from run to run −
x1<-rpois(20,1)
x2<-rpois(20,1)
df1<-data.frame(x1,x2)
df1
The following dataframe is created −
   x1 x2
1   1  0
2   1  2
3   1  3
4   1  1
5   0  1
6   0  1
7   1  0
8   0  1
9   2  1
10  1  2
11  0  3
12  1  0
13  1  2
14  2  2
15  0  0
16  2  3
17  1  1
18  2  0
19  0  0
20  1  1
To replace the coded missing value 1 with NA in all columns of the data frame, add the following code to the above snippet. Do not re-run the rpois() calls here, since that would generate new random data −
df1[df1==1]<-NA
df1
Output
Executing the above snippets as a single program generates the following output −
   x1 x2
1  NA  0
2  NA  2
3  NA  3
4  NA NA
5   0 NA
6   0 NA
7  NA  0
8   0 NA
9   2 NA
10 NA  2
11  0  3
12 NA  0
13 NA  2
14  2  2
15  0  0
16  2  3
17 NA NA
18  2  0
19  0  0
20 NA NA
To remove the rows in which every column is NA (that is, the rows that held the coded missing value in all columns), add the following code to the above snippet −
df1[rowSums(is.na(df1))<ncol(df1),]
Output
Executing the above snippets as a single program generates the following output −
   x1 x2
1  NA  0
2  NA  2
3  NA  3
5   0 NA
6   0 NA
7  NA  0
8   0 NA
9   2 NA
10 NA  2
11  0  3
12 NA  0
13 NA  2
14  2  2
15  0  0
16  2  3
18  2  0
19  0  0
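Note that the filter in Example 1 drops only rows 4, 17 and 20, where both columns are NA. Rows with a single NA are kept. If the goal is instead to drop every row that contains at least one NA, base R's complete.cases() can be used. The sketch below contrasts the two filters on a small hand-written data frame, whose values are an assumption made for illustration.

```r
# Hand-written data with NA already in place of the coded value
df <- data.frame(x1 = c(NA, NA, 0, 2), x2 = c(0, NA, NA, 3))

# Drop only the rows where every column is NA (the filter used in Example 1)
all_na_dropped <- df[rowSums(is.na(df)) < ncol(df), ]

# Drop the rows where any column is NA
any_na_dropped <- df[complete.cases(df), ]

print(all_na_dropped)
print(any_na_dropped)
```

The first result keeps three rows, while the second keeps only the fully observed fourth row. Which filter is appropriate depends on whether partially observed rows are still useful for the analysis.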
Example 2
The following snippet creates a data frame in which missing values are coded as 99. Because sample() draws random values, your data will differ from run to run −
y1<-sample(c(1,99),20,replace=TRUE)
y2<-sample(c(5,99),20,replace=TRUE)
df2<-data.frame(y1,y2)
df2
The following dataframe is created −
   y1 y2
1  99  5
2  99  5
3  99  5
4   1 99
5   1 99
6   1  5
7   1 99
8  99 99
9  99 99
10 99 99
11 99 99
12 99  5
13  1 99
14 99  5
15 99  5
16 99 99
17 99  5
18 99 99
19 99 99
20 99  5
To replace the coded missing value 99 with NA in all columns of the data frame, add the following code to the above snippet. Do not re-run the sample() calls here, since that would generate new random data −
df2[df2==99]<-NA
df2
Output
Executing the above snippets as a single program generates the following output −
   y1 y2
1  NA  5
2  NA  5
3  NA  5
4   1 NA
5   1 NA
6   1  5
7   1 NA
8  NA NA
9  NA NA
10 NA NA
11 NA NA
12 NA  5
13  1 NA
14 NA  5
15 NA  5
16 NA NA
17 NA  5
18 NA NA
19 NA NA
20 NA  5
To remove the rows in which every column is NA (that is, the rows that held the coded missing value in all columns), add the following code to the above snippet −
df2[rowSums(is.na(df2))<ncol(df2),]
Output
Executing the above snippets as a single program generates the following output −
   y1 y2
1  NA  5
2  NA  5
3  NA  5
4   1 NA
5   1 NA
6   1  5
7   1 NA
12 NA  5
13  1 NA
14 NA  5
15 NA  5
17 NA  5
20 NA  5
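Since Example 2 draws its data with sample(), the tables shown above cannot be reproduced exactly on another machine. One way to make such an example repeatable is to fix the random seed before sampling. The sketch below assumes a seed of 42 and a wrapper named make_df2; both are hypothetical choices for illustration and are not part of the original example.

```r
# Hypothetical wrapper so the same random data can be rebuilt on demand
make_df2 <- function(seed = 42) {
  set.seed(seed)  # fix the random number generator state
  y1 <- sample(c(1, 99), 20, replace = TRUE)
  y2 <- sample(c(5, 99), 20, replace = TRUE)
  data.frame(y1, y2)
}

df2 <- make_df2()

# Recode 99 as NA, then drop the rows where every column is NA
df2[df2 == 99] <- NA
cleaned <- df2[rowSums(is.na(df2)) < ncol(df2), ]
print(cleaned)

# Rebuilding with the same seed gives identical data
print(identical(make_df2(), make_df2()))
```

With the seed fixed, the data are built once and reused for both the recoding and the filtering step, so the printed result stays the same across runs of the script.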