
How to combine the levels of a factor variable in an R data frame?
An R data frame can have numeric as well as factor variables. Occasionally, the same category is recorded under different names in the raw data, for instance as synonyms or in different languages. For example, a factor variable may have hot and cold as levels, but a native Hindi speaker might record hot as garam, which is the Hindi word for hot. In such cases, we need to combine the equivalent levels into one so that the variable does not carry unnecessary factor levels.
Example
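Consider the data frame below:
set.seed(109)
x1<-rep(c("Sweet","Meetha","Bitter","Salty"),times=5)
x2<-sample(1:100,20)
x3<-rpois(20,5)
# stringsAsFactors=TRUE makes x1 a factor; since R 4.0.0 strings stay character by default
df1<-data.frame(x1,x2,x3,stringsAsFactors=TRUE)
df1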
Output
       x1 x2 x3
1   Sweet  8  4
2  Meetha 22  6
3  Bitter 25  3
4   Salty 85 10
5   Sweet 90 13
6  Meetha 10  0
7  Bitter 55  7
8   Salty 92  7
9   Sweet 95  4
10 Meetha 31  4
11 Bitter  5  4
12  Salty 56  6
13  Sweet 32  4
14 Meetha 78  6
15 Bitter 16 10
16  Salty 48  9
17  Sweet 49  4
18 Meetha 35  4
19 Bitter 37  9
20  Salty 11  8
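Renaming levels only works when the column is actually a factor. The following is a minimal sketch of how one might check this and convert a character column; the data frame `df` and the column `taste` are hypothetical names used only for illustration.

```r
# Hypothetical column created as plain character data
taste <- c("Sweet", "Meetha", "Bitter", "Salty")
df <- data.frame(taste, stringsAsFactors = FALSE)

is.factor(df$taste)   # FALSE: a character column is not a factor
levels(df$taste)      # NULL: character vectors have no levels

# Convert explicitly so that levels() can be used on the column
df$taste <- factor(df$taste)
levels(df$taste)
```

After the conversion, the levels are the unique values of the column in sorted order.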
Since Meetha is the Hindi word for Sweet, we can merge the Meetha level into Sweet by renaming it, as shown below:
Example
levels(df1$x1)[levels(df1$x1)=="Meetha"]<-"Sweet"
df1
Output
       x1 x2 x3
1   Sweet  8  4
2   Sweet 22  6
3  Bitter 25  3
4   Salty 85 10
5   Sweet 90 13
6   Sweet 10  0
7  Bitter 55  7
8   Salty 92  7
9   Sweet 95  4
10  Sweet 31  4
11 Bitter  5  4
12  Salty 56  6
13  Sweet 32  4
14  Sweet 78  6
15 Bitter 16 10
16  Salty 48  9
17  Sweet 49  4
18  Sweet 35  4
19 Bitter 37  9
20  Salty 11  8
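The same idea extends to several synonyms at once. The sketch below assumes a hypothetical factor `taste` in which both Meetha and Madhur should be treated as Sweet; these values are invented for illustration and are not part of the data frame above.

```r
# Hypothetical factor with two synonyms of "Sweet"
taste <- factor(c("Sweet", "Meetha", "Bitter", "Madhur", "Salty", "Sweet"))

# Rename every synonym level to "Sweet"; levels that end up
# with the same name are merged into a single level
synonyms <- c("Meetha", "Madhur")
levels(taste)[levels(taste) %in% synonyms] <- "Sweet"

levels(taste)
table(taste)
```

Calling `table()` afterwards is a quick way to confirm that the counts of the merged levels were added together.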
Let's have a look at another example:
Example
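ID<-1:20
Class<-rep(c("First","Second","Third","Fourth","One"),each=4)
# stringsAsFactors=TRUE makes Class a factor; since R 4.0.0 strings stay character by default
df2<-data.frame(ID,Class,stringsAsFactors=TRUE)
df2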
Output
   ID  Class
1   1  First
2   2  First
3   3  First
4   4  First
5   5 Second
6   6 Second
7   7 Second
8   8 Second
9   9  Third
10 10  Third
11 11  Third
12 12  Third
13 13 Fourth
14 14 Fourth
15 15 Fourth
16 16 Fourth
17 17    One
18 18    One
19 19    One
20 20    One
Example
levels(df2$Class)[levels(df2$Class)=="One"]<-"First"
df2
Output
   ID  Class
1   1  First
2   2  First
3   3  First
4   4  First
5   5 Second
6   6 Second
7   7 Second
8   8 Second
9   9  Third
10 10  Third
11 11  Third
12 12  Third
13 13 Fourth
14 14 Fourth
15 15 Fourth
16 16 Fourth
17 17  First
18 18  First
19 19  First
20 20  First
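As an alternative sketch, base R also accepts a named list on the right-hand side of `levels<-`, where each name is a new level and each element lists the old levels to fold into it. The standalone factor `Class` below mirrors the example above. Note that with the list form every level to be kept must be listed, because omitted levels become NA.

```r
# Standalone factor mirroring the Class column above
Class <- factor(rep(c("First", "Second", "Third", "Fourth", "One"), each = 4))

# Each list name is a new level; its value gives the old levels to merge
levels(Class) <- list(
  First  = c("First", "One"),
  Second = "Second",
  Third  = "Third",
  Fourth = "Fourth"
)

levels(Class)
table(Class)
```

The list form also fixes the order of the resulting levels, which can be convenient when the levels have a natural ordering.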