
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Find Sum Based on Categorical Variable in R Data Frame
Finding group-wise mean is a common thing but if we go for step-by-step analysis then sum of values are also required when we have a categorical variable in our data set. This can be easily done with the help of group_by and summarise_each function of dplyr package.
Example
Consider the below data frame:
> Group<-sample(LETTERS[1:4],20,replace=TRUE) > Salary<-sample(21000:50000,20) > Emp<-data.frame(Group,Salary) > Emp
Output
Group Salary 1 D 28256 2 B 31092 3 A 23147 4 C 28209 5 B 37676 6 C 33374 7 D 44864 8 B 40152 9 A 25843 10 A 40946 11 D 23321 12 A 42854 13 C 36960 14 A 35285 15 B 44478 16 B 36173 17 C 35077 18 A 39319 19 D 49204 20 A 41597
Loading dplyr package:
> library(dplyr)
Finding the sum of salaries in each group:
Example
> Emp %>% group_by(Group) %>% summarise_each(funs(sum)) # A tibble: 4 x 2
Output
Group Salary 1 A 248991 2 B 189571 3 C 133620 4 D 145645 Warning message: `...` is not empty.
We detected these problematic arguments:
* `needs_dots`
These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?
Do not worry about the warning message here, it occurred due to the version of dplyr. Our output is correct. If you want to ignore this warning then we need to install the latest version of tibble from CRAN. We are showing this here because you are likely to face the same problem but not necessarily.
Let’s have a look at another example:
Example
> x<-sample(c("India","USA","China","Canada"),20,replace=TRUE) > y<-rpois(20,5) > df<-data.frame(x,y) > df
Output
x y 1 India 5 2 India 3 3 India 4 4 China 5 5 Canada 3 6 USA 5 7 Canada 7 8 China 6 9 China 4 10 USA 7 11 China 5 12 India 8 13 Canada 3 14 India 6 15 Canada 5 16 Canada 3 17 Canada 7 18 USA 12 19 China 2 20 India 5
Example
> df %>% group_by(x) %>% summarise_each(funs(sum)) # A tibble: 4 x 2
Output
x y 1 Canada 28 2 China 22 3 India 31 4 USA 24 Warning message: `...` is not empty.
We detected these problematic arguments:
* `needs_dots`
These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?