
Replace Missing Values with Median in R Data Frame Column
To replace missing values with the median, we can use the same approach that is used to replace missing values with the mean. For example, if a data frame df contains columns x and y, both of which have some missing values, the missing values in x can be replaced with df$x[is.na(df$x)]<-median(df$x,na.rm=TRUE), and those in y in the same way with df$y[is.na(df$y)]<-median(df$y,na.rm=TRUE). The na.rm=TRUE argument is needed because median returns NA when the column still contains missing values.
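As a minimal sketch of this pattern, assume a small made-up data frame df with columns x and y (the values here are illustrative only and are not taken from the examples that follow):

```r
# Hypothetical data frame with missing values in both columns
df <- data.frame(x = c(1, NA, 3, 10), y = c(NA, 4, 6, NA))

# Replace the NAs in each column with the median of that column's observed values
df$x[is.na(df$x)] <- median(df$x, na.rm = TRUE)
df$y[is.na(df$y)] <- median(df$y, na.rm = TRUE)

df
```

The logical index is.na(df$x) selects only the missing positions, so the observed values are left unchanged.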
Example
Consider the below data frame −
set.seed(1112)
x1<-LETTERS[1:20]
x2<-sample(c(NA,rpois(19,8)),20,replace=TRUE)
df1<-data.frame(x1,x2)
df1
Output
   x1 x2
1   A 10
2   B 11
3   C  8
4   D  6
5   E  6
6   F NA
7   G 10
8   H  8
9   I  8
10  J  7
11  K NA
12  L 12
13  M  7
14  N  6
15  O 10
16  P  7
17  Q  7
18  R  8
19  S 11
20  T  4
median(df1$x2,na.rm=TRUE)
[1] 8
Replacing missing values in x2 with the median of the remaining values −
df1$x2[is.na(df1$x2)]<-median(df1$x2,na.rm=TRUE)
df1
Output
   x1 x2
1   A 10
2   B 11
3   C  8
4   D  6
5   E  6
6   F  8
7   G 10
8   H  8
9   I  8
10  J  7
11  K  8
12  L 12
13  M  7
14  N  6
15  O 10
16  P  7
17  Q  7
18  R  8
19  S 11
20  T  4
Let’s have a look at another example −
Example
ID<-1:20
Ratings<-sample(c(NA,1,2,3,4,5),20,replace=TRUE)
df2<-data.frame(ID,Ratings)
df2
Output
   ID Ratings
1   1       3
2   2       1
3   3       1
4   4       4
5   5       1
6   6       4
7   7       2
8   8       3
9   9       2
10 10       2
11 11       3
12 12       5
13 13       5
14 14       1
15 15       4
16 16       1
17 17       4
18 18      NA
19 19       1
20 20      NA
median(df2$Ratings,na.rm=TRUE)
[1] 2.5
Replacing missing values in Ratings with the median of the remaining values −
Example
df2$Ratings[is.na(df2$Ratings)]<-median(df2$Ratings,na.rm=TRUE)
df2
Output
   ID Ratings
1   1     3.0
2   2     1.0
3   3     1.0
4   4     4.0
5   5     1.0
6   6     4.0
7   7     2.0
8   8     3.0
9   9     2.0
10 10     2.0
11 11     3.0
12 12     5.0
13 13     5.0
14 14     1.0
15 15     4.0
16 16     1.0
17 17     4.0
18 18     2.5
19 19     1.0
20 20     2.5
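When several columns need the same treatment, the replacement can be wrapped in a small function and applied column by column. The sketch below is one possible way to do this; the helper name impute_median, the data frame df3 and its values are assumptions made for illustration and do not come from the examples above.

```r
# Hypothetical helper: replace the NAs in a numeric vector with its median
impute_median <- function(v) {
  v[is.na(v)] <- median(v, na.rm = TRUE)
  v
}

# Made-up data frame: two numeric columns with NAs and one character column
df3 <- data.frame(ID = 1:4,
                  a = c(2, NA, 4, 8),
                  b = c(NA, 1, 3, NA),
                  g = c("p", NA, "q", "r"))

# Apply the helper only to the chosen numeric columns
cols <- c("a", "b")
df3[cols] <- lapply(df3[cols], impute_median)

df3
```

Listing the columns explicitly in cols keeps identifier and character columns such as ID and g out of the imputation, so the NA in g is left as it is.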