
Subset Rows Containing Maximum Values in R Data Frame
To find the maximum value of a numerical column for each level of another column in an R data frame, we can follow the steps below −
- First, create a data frame with one numerical column and one categorical column.
- Then, use the tapply function with the max function to find the maximum of the numerical column within each category of the other column.
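The two steps above can be sketched on a small, hand-made data frame. The names scores, value, and group are hypothetical and chosen only for illustration; fixed values are used instead of random ones so that the result is reproducible.

```r
# Step 1: a data frame with one numerical and one categorical column
# (hypothetical toy values, not taken from the examples in this article)
scores <- data.frame(
  value = c(3, 9, 4, 7, 1, 8),
  group = c("a", "a", "b", "b", "c", "c")
)

# Step 2: tapply() splits value by group and applies max() to each piece
group_max <- tapply(scores$value, scores$group, max)

# One maximum per group: a = 9, b = 7, c = 8
print(group_max)
```

The result is a named array with one element per group, so a single maximum can be read by name, for example group_max["b"].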
Example 1
Create the data frame
Let's create a data frame as shown below −
x<-rnorm(20)
factor1<-sample(LETTERS[1:4],20,replace=TRUE)
df1<-data.frame(x,factor1)
df1
Executing the above script generates the output below (this output will vary on your system due to randomization) −
             x factor1
1  -1.21231516       A
2  -0.01576519       B
3   0.59032593       D
4  -0.41583339       C
5  -0.38508102       A
6  -0.61177209       C
7  -0.52961795       C
8   0.30561837       A
9  -0.58067776       A
10  0.62246173       C
11 -0.58479709       C
12  0.09817433       B
13  1.11240042       C
14  0.29007306       B
15 -0.66345792       B
16 -1.80789902       A
17  0.33419804       C
18 -0.15665767       A
19  1.56775923       C
20  1.49345799       B
Find the maximum for each group of another column
Use the tapply function to find the maximum of column x for each level of the factor1 column in df1; the result holds one maximum value per group rather than the full rows −
tapply(df1$x,df1$factor1,max)
Output
        A         B         C         D 
0.3056184 1.4934580 1.5677592 0.5903259
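tapply returns only the maximum values. If the complete rows are needed, one possible approach (an addition to the steps described here, not the only way) is to compare each value with its group maximum as computed by ave(). The sketch below rebuilds df1 from the values printed above so that it is reproducible; the names is_group_max and max_rows are hypothetical. If a group had tied maxima, this approach would return every tied row.

```r
# df1 rebuilt from the printed output so the result is reproducible
df1 <- data.frame(
  x = c(-1.21231516, -0.01576519, 0.59032593, -0.41583339, -0.38508102,
        -0.61177209, -0.52961795, 0.30561837, -0.58067776, 0.62246173,
        -0.58479709, 0.09817433, 1.11240042, 0.29007306, -0.66345792,
        -1.80789902, 0.33419804, -0.15665767, 1.56775923, 1.49345799),
  factor1 = c("A", "B", "D", "C", "A", "C", "C", "A", "A", "C",
              "C", "B", "C", "B", "B", "A", "C", "A", "C", "B")
)

# ave() returns, for every row, the maximum of x within that row's group
is_group_max <- df1$x == ave(df1$x, df1$factor1, FUN = max)

# Keep only the rows whose x equals the group maximum, ordered by group
max_rows <- df1[is_group_max, ]
max_rows <- max_rows[order(max_rows$factor1), ]
print(max_rows)
```

The selected rows are rows 8, 20, 19 and 3 of df1, whose x values match the A, B, C and D maxima reported by tapply.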
Example 2
Create the data frame
Let's create a data frame as shown below −
y<-sample(1:50,20)
factor2<-sample(c("Low","Medium","High"),20,replace=TRUE)
df2<-data.frame(y,factor2)
df2
Executing the above script generates the output below (this output will vary on your system due to randomization) −
    y factor2
1  45     Low
2   2  Medium
3   5    High
4  33     Low
5  28    High
6  37  Medium
7   7    High
8  21    High
9  48     Low
10 18    High
11 15    High
12 38    High
13 20  Medium
14  4     Low
15 22  Medium
16 34     Low
17 32     Low
18 29     Low
19 24    High
20 17  Medium
Find the maximum for each group of another column
Use the tapply function to find the maximum of column y for each level of the factor2 column in df2 −
tapply(df2$y,df2$factor2,max)
Output
  High    Low Medium 
    38     48     37
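As an alternative sketch for retrieving the full rows rather than only the maxima, the data frame can be split by group and which.max() applied to each piece. This returns exactly one row per group (the first one if there are ties). df2 is rebuilt here from the values printed above; the names pieces and top_rows are hypothetical.

```r
# df2 rebuilt from the printed output so the result is reproducible
df2 <- data.frame(
  y = c(45, 2, 5, 33, 28, 37, 7, 21, 48, 18,
        15, 38, 20, 4, 22, 34, 32, 29, 24, 17),
  factor2 = c("Low", "Medium", "High", "Low", "High", "Medium", "High",
              "High", "Low", "High", "High", "High", "Medium", "Low",
              "Medium", "Low", "Low", "Low", "High", "Medium")
)

# Split df2 into one data frame per level of factor2
pieces <- split(df2, df2$factor2)

# From each piece keep the row with the largest y, then stack the rows
top_rows <- do.call(rbind, lapply(pieces, function(g) g[which.max(g$y), ]))
print(top_rows)
```

The y column of top_rows holds 38, 48 and 37 for High, Low and Medium, which agrees with the tapply output.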