
Replace Missing Values with Row Means in an R Data Frame
If the columns of an R data frame measure comparable quantities, a missing value can reasonably be replaced with the mean of the other values in its row. The na.aggregate function of the zoo package replaces each NA with the mean of its column by default, so to get row means we transpose the data frame with t(), apply na.aggregate, and transpose the result back.
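As a quick illustration of why the transpose trick works, the sketch below uses only base R and a small hypothetical data frame named toy (not part of the original examples). After t(), each original row is a column, so a column-wise mean of the transposed data is the same as a row-wise mean of the original.

```r
# Hypothetical toy data frame, used only for this illustration
toy <- data.frame(a = c(1, NA), b = c(3, 10), c = c(5, 20))

# t() returns a 3 x 2 matrix: each original row is now a column
m <- t(toy)

# Column means of the transposed matrix, ignoring NAs
col_means_of_t <- colMeans(m, na.rm = TRUE)

# Row means of the original data frame, ignoring NAs
row_means <- rowMeans(toy, na.rm = TRUE)

# Both give 3 for row 1 and 15 for row 2
print(col_means_of_t)
print(row_means)
```

This is why a function that fills NAs column by column can be reused for rows once the data is transposed.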
Example 1
Consider the data frame below (the values are drawn at random, so your output will differ):
x1 <- sample(c(NA, 1, 5), 20, replace = TRUE)
x2 <- sample(c(NA, 10, 25), 20, replace = TRUE)
x3 <- rpois(20, 5)
df1 <- data.frame(x1, x2, x3)
df1
Output
   x1 x2 x3
1   5 10  4
2   1 NA  2
3  NA NA  5
4   5 NA  2
5   1 25  8
6   1 10  2
7   1 NA  4
8   5 NA  4
9   5 25  3
10  1 NA  5
11  1 NA  7
12  5 NA  6
13  1 25  4
14  5 NA  8
15  1 25  6
16 NA 10  6
17  5 10  5
18  5 25  8
19 NA 25  3
20 NA 25  5
Load the zoo package and replace the missing values with row means:
Example
library(zoo)
df1[] <- t(na.aggregate(t(df1)))
df1
Output
   x1   x2 x3
1   5 10.0  4
2   1  1.5  2
3   5  5.0  5
4   5  3.5  2
5   1 25.0  8
6   1 10.0  2
7   1  2.5  4
8   5  4.5  4
9   5 25.0  3
10  1  3.0  5
11  1  4.0  7
12  5  5.5  6
13  1 25.0  4
14  5  6.5  8
15  1 25.0  6
16  8 10.0  6
17  5 10.0  5
18  5 25.0  8
19 14 25.0  3
20 15 25.0  5
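The same result can be cross-checked without zoo. The following is a minimal base R sketch, not the method used in this article: fill_row_means is a hypothetical helper name, and it assumes every column is numeric. It is applied to rows 2, 3 and 16 of df1 as printed above.

```r
# Hypothetical helper (not part of zoo): replace each NA with the mean of
# the non-missing values in the same row. Assumes all columns are numeric.
fill_row_means <- function(df) {
  m <- as.matrix(df)
  means <- rowMeans(m, na.rm = TRUE)       # one mean per row, ignoring NAs
  idx <- which(is.na(m), arr.ind = TRUE)   # (row, col) position of each NA
  m[idx] <- means[idx[, "row"]]            # fill each NA with its row's mean
  df[] <- m
  df
}

# Rows 2, 3 and 16 of df1 as printed above
ex <- data.frame(x1 = c(1, NA, NA), x2 = c(NA, NA, 10), x3 = c(2, 5, 6))
filled <- fill_row_means(ex)
print(filled)
```

The filled values (1.5 in the first row, 5 and 5 in the second, 8 in the third) agree with rows 2, 3 and 16 of the output above.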
Example 2
y1 <- sample(c(NA, 525, 235, 401), 20, replace = TRUE)
y2 <- rnorm(20, 500, 51.24)
y3 <- sample(c(NA, 35, 47), 20, replace = TRUE)
df2 <- data.frame(y1, y2, y3)
df2
Output
    y1       y2 y3
1  525 555.4212 47
2  401 508.7781 47
3  401 488.3973 47
4   NA 546.6707 35
5  401 497.5346 47
6  235 460.7668 35
7   NA 495.0879 35
8  401 441.4254 47
9   NA 446.8322 47
10 235 484.8106 NA
11 235 517.4665 47
12  NA 450.1524 NA
13 525 485.2432 47
14 525 506.0650 35
15 525 470.7504 47
16  NA 370.8190 35
17 525 509.6385 35
18 525 471.0552 35
19 235 468.6052 35
20 401 472.6163 47
Replace the missing values with row means:
Example
df2[] <- t(na.aggregate(t(df2)))
df2
Output
         y1       y2       y3
1  525.0000 555.4212  47.0000
2  401.0000 508.7781  47.0000
3  401.0000 488.3973  47.0000
4  290.8353 546.6707  35.0000
5  401.0000 497.5346  47.0000
6  235.0000 460.7668  35.0000
7  265.0440 495.0879  35.0000
8  401.0000 441.4254  47.0000
9  246.9161 446.8322  47.0000
10 235.0000 484.8106 359.9053
11 235.0000 517.4665  47.0000
12 450.1524 450.1524 450.1524
13 525.0000 485.2432  47.0000
14 525.0000 506.0650  35.0000
15 525.0000 470.7504  47.0000
16 202.9095 370.8190  35.0000
17 525.0000 509.6385  35.0000
18 525.0000 471.0552  35.0000
19 235.0000 468.6052  35.0000
20 401.0000 472.6163  47.0000
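In row 12 above, only y2 was observed, so the "row mean" is just that single value copied into all three columns. A small base R sketch for spotting such rows before filling is shown below; it uses rows 10 to 12 of df2 as printed above, and the threshold of two observed values is an arbitrary assumption chosen for illustration.

```r
# Rows 10 to 12 of df2 as printed above
ex2 <- data.frame(y1 = c(235, 235, NA),
                  y2 = c(484.8106, 517.4665, 450.1524),
                  y3 = c(NA, 47, NA))

# Number of non-missing values in each row
n_obs <- rowSums(!is.na(ex2))

# Rows whose mean would rest on fewer than two observed values
# (the cutoff of 2 is an assumption, adjust as needed)
few <- which(n_obs < 2)

print(n_obs)
print(few)
```

Here only the third row of ex2 (row 12 of df2) is flagged, since it has a single observed value.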