
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Replace NA Values in R Data Frame with Column Mean
In the whole world, the first step people teach to impute missing values is replacing them with the relevant mean. That means if we have a column which has some missing values then replace it with the mean of the remaining values. In R, we can do this by replacing the column with missing values using mean of that column and passing na.rm = TRUE argument along with the same.
Consider the below data frame −
Example
set.seed(121) x<-sample(c(0:2,NA),20,replace=TRUE) y<-sample(c(0:10,NA),20,replace=TRUE) z<-sample(c(rnorm(2,1,0.40),NA),20,replace=TRUE) df<-data.frame(x,y,z) df
Output
x y z 1 NA 1 1.525471 2 NA 10 1.525471 3 NA 0 NA 4 2 1 NA 5 NA 3 NA 6 0 4 1.525471 7 2 9 NA 8 0 5 NA 9 2 7 NA 10 2 6 1.296308 11 2 1 1.296308 12 0 NA 1.525471 13 NA 8 1.296308 14 0 5 NA 15 1 7 1.296308 16 NA 1 1.525471 17 0 1 NA 18 NA 5 1.525471 19 0 8 1.296308 20 1 1 1.296308
Replacing NA’s in column x with mean of the remaining values −
Example
df$x[is.na(df$x)]<-mean(df$x,na.rm=TRUE) df
Output
x y z 1 0.9230769 1 1.525471 2 0.9230769 10 1.525471 3 0.9230769 0 NA 4 2.0000000 1 NA 5 0.9230769 3 NA 6 0.0000000 4 1.525471 7 2.0000000 9 NA 8 0.0000000 5 NA 9 2.0000000 7 NA 10 2.0000000 6 1.296308 11 2.0000000 1 1.296308 12 0.0000000 NA 1.525471 13 0.9230769 8 1.296308 14 0.0000000 5 NA 15 1.0000000 7 1.296308 16 0.9230769 1 1.525471 17 0.0000000 1 NA 18 0.9230769 5 1.525471 19 0.0000000 8 1.296308 20 1.0000000 1 1.296308
Replacing NA’s in column y with mean of the remaining values −
Example
df$y[is.na(df$y)]<-mean(df$y,na.rm=TRUE) df
Output
x y z 1 0.9230769 1.000000 1.525471 2 0.9230769 10.000000 1.525471 3 0.9230769 0.000000 NA 4 2.0000000 1.000000 NA 5 0.9230769 3.000000 NA 6 0.0000000 4.000000 1.525471 7 2.0000000 9.000000 NA 8 0.0000000 5.000000 NA 9 2.0000000 7.000000 NA 10 2.0000000 6.000000 1.296308 11 2.0000000 1.000000 1.296308 12 0.0000000 4.368421 1.525471 13 0.9230769 8.000000 1.296308 14 0.0000000 5.000000 NA 15 1.0000000 7.000000 1.296308 16 0.9230769 1.000000 1.525471 17 0.0000000 1.000000 NA 18 0.9230769 5.000000 1.525471 19 0.0000000 8.000000 1.296308 20 1.0000000 1.000000 1.296308
Replacing NA’s in column z with mean of the remaining values −
Example
df$z[is.na(df$z)]<-mean(df$z,na.rm=TRUE) df
Output
x y z 1 0.9230769 1.000000 1.525471 2 0.9230769 10.000000 1.525471 3 0.9230769 0.000000 1.410890 4 2.0000000 1.000000 1.410890 5 0.9230769 3.000000 1.410890 6 0.0000000 4.000000 1.525471 7 2.0000000 9.000000 1.410890 8 0.0000000 5.000000 1.410890 9 2.0000000 7.000000 1.410890 10 2.0000000 6.000000 1.296308 11 2.0000000 1.000000 1.296308 12 0.0000000 4.368421 1.525471 13 0.9230769 8.000000 1.296308 14 0.0000000 5.000000 1.410890 15 1.0000000 7.000000 1.296308 16 0.9230769 1.000000 1.525471 17 0.0000000 1.000000 1.410890 18 0.9230769 5.000000 1.525471 19 0.0000000 8.000000 1.296308 20 1.0000000 1.000000 1.296308
Advertisements