
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Combine GroupBy and Multiple Aggregate Functions in Pandas
The groupby() and aggregate() are the two functions available in the pandas library.
The groupby() function
The groupby() function allows you to group a DataFrame by one or more columns. It internally performs a combination of operations such as splitting the object, applying a function, and combining the results, on the dataframe object.
This function returns DataFrameGroupBy object which contains information about the groups. Once we obtain this object we can perform various operations such as calculating the mean, calculating the sum and average etc?
Syntax
Following is the syntax of the groupby() function -
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)
The aggregate() function
The aggregate() function is used to apply one or more operations (functions) over a particular axis on the entire DataFrame or on specific columns in the DataFrame. These mathematical operations are used to summarize and compute the statistics for the given set of data. agg is the alias of the aggregate() function.
Syntax
The following is the syntax of groupby -
DataFrame.aggregate(func=None, axis=0, *args, **kwargs)
Where,
func ? This is the function or, list of functions used to calculate the aggregate values.
axis ? This value specifies the axis (column: 0 or 'index' and row ? 1 or 'columns').
*args? Positional arguments to pass to func
**kwargs ? Keyword arguments to pass to func
Combining Groupby and Multiple Aggregate Functions
We can perform multiple aggregate functions such as sum, mean, min max etc., on the results of a Groupby clause using the aggregate() or agg() function as shown below -
pandas.groupby(column_name).agg(column)
Example
In the following example, we use groupby function in pandas to group a Dataframe by the column name Fruits and perform the aggregate operation mean on two different columns 'Dozens' and 'Cost'. This will return the combined output of groupby and aggregate functions.
import pandas as pd data = {'Fruits': ['Papaya','Apple','Banana','Grapes','Orange','Watermelon'], 'Dozens': [25, 30, 35, 27, 32, 37], 'Cost': [500, 6000, 270, 5500, 6500, 7500]} df = pd.DataFrame(data) grouped_df = df.groupby('Fruits').agg({'Dozens': ['mean'], 'Cost': ['mean']}) print(grouped_df)
Output
Following is the output of the combination of groupby and aggregate functions.
Dozens Cost mean mean Fruits Apple 30.0 6000.0 Banana 35.0 270.0 Grapes 27.0 5500.0 Orange 32.0 6500.0 Papaya 25.0 500.0 Watermelon 37.0 7500.0
Example
In the following example we are considering the same dataset and grouping it by the same column (fruits) and calculating the "min", "max", "sum", "count", "mean" values of the column ?cost' using the agg() function -
import pandas as pd data = {'Fruits': ['Papaya','Apple','Banana','Grapes','Orange','Watermelon'], 'Dozens': [25, 30, 35, 27, 32, 37], 'Cost': [500, 6000, 270, 5500, 6500, 7500]} df = pd.DataFrame(data) grouped_df = df.groupby('Fruits').Cost.agg(["min", "max", "sum", "count", "mean"]) print(grouped_df)
Output
When we run the above code, following output will be displayed -
min max sum count mean Fruits Apple 6000 6000 6000 1 6000.0 Banana 270 270 270 1 270.0 Grapes 5500 5500 5500 1 5500.0 Orange 6500 6500 6500 1 6500.0 Papaya 500 500 500 1 500.0 Watermelon 7500 7500 7500 1 7500.0
Example
Let's see another example for combining groupby and aggregate functions. Here we are considering the dataset "titanic.csv" dataset and performing the operations "mean", "max", "sum" on the columns 'Survived' and 'Pclass' grouped by the column 'fare' -
import pandas as pd data = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv") data_groupby = data.groupby('Fare') data_agg = data_groupby.agg({'Survived':["mean", "max", "sum"],'Pclass':["min", "max", "sum"]}) print(data_agg)
Output
When we run the above code, following output will be generated -Survived Pclass mean max sum min max sum Fare 0.0000 0.066667 1 1 1 3 29 4.0125 0.000000 0 0 3 3 3 5.0000 0.000000 0 0 1 1 1 6.2375 0.000000 0 0 3 3 3 6.4375 0.000000 0 0 3 3 3 ... ... .. .. ... .. .. 227.5250 0.750000 1 3 1 1 4 247.5208 0.500000 1 1 1 1 2 262.3750 1.000000 1 2 1 1 2 263.0000 0.500000 1 2 1 1 4 512.3292 1.000000 1 3 1 1 3 [248 rows x 6 columns]