
Standardize Numerical Columns in R Data Frame with Categorical Columns
A single numerical column can be standardized with the scale function. To standardize every numerical column of a data frame that also contains categorical columns, use the mutate_if function of the dplyr package, which applies a function only to the columns that satisfy a given condition. For example, for a data frame df, this can be done with df %>% mutate_if(is.numeric, scale).
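To make concrete what scale does with its default arguments, the sketch below compares it with a manual calculation: subtract the mean, then divide by the sample standard deviation. The vector v and the names z_manual and z_scale are made up for illustration and do not come from the examples in this article.

```r
# Hypothetical example vector (an assumption, not part of the article's data)
v <- c(2, 4, 6, 8)

# Manual standardization: subtract the mean, divide by the sample standard deviation
z_manual <- (v - mean(v)) / sd(v)

# scale() returns a one-column matrix, so flatten it to a plain vector first
z_scale <- as.numeric(scale(v))

# Both approaches should agree up to floating-point tolerance
all.equal(z_manual, z_scale)  # TRUE
```

Because scale returns a matrix rather than a plain vector, wrapping it in as.numeric is a common way to get an ordinary numeric column back.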
Example 1
Consider the below data frame −
> x1 <- sample(letters[1:4], 20, replace = TRUE)
> x2 <- rpois(20, 2)
> df1 <- data.frame(x1, x2)
> df1
Output
   x1 x2
1   c  4
2   c  1
3   a  4
4   a  1
5   b  0
6   c  4
7   c  2
8   a  1
9   c  2
10  d  2
11  b  0
12  b  3
13  c  0
14  d  1
15  a  2
16  d  1
17  a  2
18  d  2
19  c  1
20  a  3
Loading dplyr package and standardizing numerical columns in df1 −
> library(dplyr)
> df1 %>% mutate_if(is.numeric, scale)
Output
   x1         x2
1   c  1.7168098
2   c -0.6242945
3   a  1.7168098
4   a -0.6242945
5   b -1.4046626
6   c  1.7168098
7   c  0.1560736
8   a -0.6242945
9   c  0.1560736
10  d  0.1560736
11  b -1.4046626
12  b  0.9364417
13  c -1.4046626
14  d -0.6242945
15  a  0.1560736
16  d -0.6242945
17  a  0.1560736
18  d  0.1560736
19  c -0.6242945
20  a  0.9364417
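As a quick sanity check on the output above, a standardized column should have a mean of about 0 and a standard deviation of about 1. The sketch below re-enters the x2 values by hand from the printed df1, since the original sample and rpois calls are random and will not reproduce the same data. The names z and first_value are illustrative only.

```r
# x2 values copied by hand from the printed df1 output
# (the article's random draws cannot be reproduced without a seed)
x2 <- c(4, 1, 4, 1, 0, 4, 2, 1, 2, 2, 0, 3, 0, 1, 2, 1, 2, 2, 1, 3)

# Standardize and flatten the one-column matrix returned by scale()
z <- as.numeric(scale(x2))

# A standardized column has mean ~0 and standard deviation ~1
mean(z)
sd(z)

# The first value should match row 1 of the standardized output
first_value <- z[1]
first_value
```

The same check can be run on any column produced by mutate_if(is.numeric, scale).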
Example 2
> y1 <- sample(c("S1", "S2", "S3"), 20, replace = TRUE)
> y2 <- rnorm(20, 34, 2.3)
> y3 <- rnorm(20, 500, 47.1)
> df2 <- data.frame(y1, y2, y3)
> df2
Output
   y1       y2       y3
1  S2 33.67237 511.9535
2  S2 30.47941 509.6286
3  S3 35.19967 605.8329
4  S2 27.82392 590.1114
5  S2 33.91328 485.1736
6  S1 38.26157 449.6714
7  S3 32.46148 495.2131
8  S3 32.06987 477.6192
9  S2 33.32162 448.6335
10 S2 37.55487 544.3631
11 S2 34.84706 462.9035
12 S1 34.59332 532.0554
13 S2 32.36337 501.9207
14 S2 32.26520 516.7858
15 S3 33.62168 530.5313
16 S3 33.06213 515.0878
17 S1 35.09752 454.7614
18 S3 31.79898 499.8527
19 S1 32.85342 509.8768
20 S3 33.72336 503.8084
Standardizing numerical columns in df2 −
> df2 %>% mutate_if(is.numeric, scale)
Output
   y1          y2          y3
1  S2  0.09796633  0.11297890
2  S2 -1.30368623  0.05666468
3  S3  0.76842187  2.38692048
4  S2 -2.46939699  2.00611458
5  S2  0.20372057 -0.53568372
6  S1  2.11253906 -1.39561547
7  S3 -0.43359265 -0.29250727
8  S3 -0.60550146 -0.71866529
9  S2 -0.05600808 -1.42075459
10 S2  1.80231017  0.89800290
11 S2  0.61363310 -1.07510811
12 S1  0.50224659  0.59988493
13 S2 -0.47666141 -0.13003510
14 S2 -0.51975777  0.23002594
15 S3  0.07571152  0.56296787
16 S3 -0.16991946  0.18889687
17 S1  0.72358127 -1.27232444
18 S3 -0.72441871 -0.18012673
19 S1 -0.26153720  0.06267550
20 S3  0.12034948 -0.08431193
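In dplyr 1.0.0 and later, mutate_if is superseded by across, so the same idea can also be written as below. This is a sketch of an alternative, not the method used in the examples above. It assumes dplyr 1.0.0 or newer is installed, and the data frame df with columns g, u and w is hypothetical. Wrapping scale in as.numeric keeps the results as plain numeric vectors instead of one-column matrices.

```r
library(dplyr)

# Small hypothetical data frame: one categorical and two numeric columns
df <- data.frame(
  g = c("a", "b", "a", "b"),
  u = c(1, 2, 3, 4),
  w = c(10, 20, 30, 50)
)

# across(where(is.numeric), ...) targets only the numeric columns;
# as.numeric() drops the matrix attributes that scale() attaches
df_std <- df %>%
  mutate(across(where(is.numeric), ~ as.numeric(scale(.x))))

# Inspect the column types: g is unchanged, u and w are plain numeric vectors
str(df_std)
```

The categorical column g passes through untouched because it does not satisfy is.numeric.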