How to Select Column Values to Display in Pandas Groupby

Last Updated : 11 Jul, 2024

Pandas is a powerful Python library used extensively in data analysis and manipulation. One of its most versatile and widely used functions is groupby, which allows users to group data based on specific criteria and perform various operations on these groups. This article will delve into the details of how to select column values to display in pandas groupby, providing practical examples and technical explanations.

Table of Content

Understanding GroupBy in Pandas
Grouping Data with GroupBy

Example 1: Grouping by a Single Column
Example 2: Grouping by Multiple Columns

Selecting Column Values to Display in Pandas GroupBy

Example 1: Selecting Columns After GroupBy Using Double Brackets
Example 2: Selecting Columns from a GroupBy Object
Example 3: Iterating Over Groups

Understanding GroupBy in Pandas

The groupby() function in Pandas splits a DataFrame into groups based on some criteria. It is usually followed by an aggregation function that performs an operation on each group.

Types of GroupBy Operations:

Single Column Grouping: Grouping data based on a single column.
Multiple Column Grouping: Grouping data based on multiple columns.

GroupBy Syntax:

The general form chains an operation onto the grouped object:

df.groupby('column_name').operation()
Where,
df: The DataFrame to be grouped.
'column_name': The column or columns to group by.
operation(): The operation to be applied to each group, such as mean(), sum(), count(), etc.

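When you aggregate a groupby result, the group keys become the index of the output by default, and every remaining column that supports the operation is included. In many cases, you will want to display values from specific columns only, or keep the group keys as ordinary columns. Both can be done by selecting columns on the grouped object or on the aggregated result.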
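As a minimal sketch of column selection on a grouped object, the snippet below picks one column before aggregating. The 'Weight' column is a hypothetical addition used only to show that unselected columns are left out of the result.

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cheetah', 'Cheetah', 'Lion', 'Lion'],
    'Max Speed': [100, 95, 80, 85],
    'Weight': [60, 55, 190, 180],  # hypothetical column, not part of the article's data
})

# Single brackets select one column and return a Series
speed_series = df.groupby('Animal')['Max Speed'].mean()

# Double brackets select a list of columns and return a DataFrame
speed_frame = df.groupby('Animal')[['Max Speed']].mean()

print(speed_series)
print(speed_frame)
```

In both results 'Weight' is absent, because only 'Max Speed' was selected before the mean was computed.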

Grouping Data with GroupBy

We can use the groupby function in Pandas to split the data into different groups based on some criteria. For example, we might group data by a single column or multiple columns to analyze subsets of the data independently.

Example 1: Grouping by a Single Column

Python

import pandas as pd

# Sample data
data = {
    'Animal': ['Cheetah', 'Cheetah', 'Lion', 'Lion', 'Tiger', 'Tiger'],
    'Max Speed': [100, 95, 80, 85, 65, 70]
}

df = pd.DataFrame(data)

# Group by 'Animal' and calculate mean speed
mean_speed = df.groupby('Animal').mean()
print(mean_speed)

Output:

         Max Speed
Animal            
Cheetah       97.5
Lion          82.5
Tiger         67.5

Example 2: Grouping by Multiple Columns

We can also group by multiple columns for more detailed analysis. Here the data is grouped by both the 'Animal' and 'Color' columns, and the sum of 'Max Speed' is calculated for each group.

Python

import pandas as pd

# Sample data with multiple columns
data = {
    'Animal': ['Cheetah', 'Cheetah', 'Lion', 'Lion', 'Tiger', 'Tiger'],
    'Max Speed': [100, 95, 80, 85, 65, 70],
    'Color': ['Yellow', 'Yellow', 'Tan', 'Tan', 'Orange', 'Orange']
}

df = pd.DataFrame(data)

# Group by 'Animal' and 'Color', and calculate the sum
grouped = df.groupby(['Animal', 'Color']).sum()
print(grouped)

Output:

                Max Speed
Animal  Color            
Cheetah Yellow        195
Lion    Tan           165
Tiger   Orange        135
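Grouping by two columns produces a result with a two-level index. A short sketch of how individual values might be read back from it with .loc, reusing the sample data above:

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['Cheetah', 'Cheetah', 'Lion', 'Lion', 'Tiger', 'Tiger'],
    'Max Speed': [100, 95, 80, 85, 65, 70],
    'Color': ['Yellow', 'Yellow', 'Tan', 'Tan', 'Orange', 'Orange'],
})

grouped = df.groupby(['Animal', 'Color']).sum()

# Look up one group by its (Animal, Color) key
lion_speed = grouped.loc[('Lion', 'Tan'), 'Max Speed']
print(lion_speed)

# Select all rows for one animal using the first index level
cheetah_rows = grouped.loc['Cheetah']
print(cheetah_rows)
```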

Selecting Column Values to Display in Pandas GroupBy

After performing a groupby operation and aggregating the data, we can select specific columns to display by indexing the result with double square brackets, which takes a list of column names and returns a DataFrame.

Example 1: Selecting Columns After GroupBy Using Double Brackets

In this example, we group by Category and display the sum of Value1 and the mean of Value2 for each category.

Python

import pandas as pd

# Sample data
data = {
    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value1': [10, 20, 30, 40, 50, 60],
    'Value2': [100, 200, 300, 400, 500, 600]
}

df = pd.DataFrame(data)

# Group by 'Category'
grouped = df.groupby('Category')

# Aggregate the data
aggregated = grouped.agg({'Value1': 'sum', 'Value2': 'mean'})

# Select specific columns to display
selected_columns = aggregated[['Value1', 'Value2']]

print(selected_columns)

Output:

          Value1  Value2
Category                
A             30   150.0
B             70   350.0
C            110   550.0
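An alternative worth considering is named aggregation, where each output column is declared together with its source column and function, so only the chosen columns appear in the result. This is a sketch on the same sample data; the output names total_value1 and mean_value2 are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value1': [10, 20, 30, 40, 50, 60],
    'Value2': [100, 200, 300, 400, 500, 600],
})

# Each keyword becomes an output column: name=(source column, function)
result = df.groupby('Category').agg(
    total_value1=('Value1', 'sum'),
    mean_value2=('Value2', 'mean'),
)

print(result)
```

The values match the example above; only the column labels differ.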

Example 2: Selecting Columns from a GroupBy Object

Aggregating a GroupBy object places the group keys in the index. To display them as regular columns alongside the aggregated values, call the reset_index method on the result:

Python

import pandas as pd
df = pd.DataFrame({
    'a': [1, 1, 3],
    'b': [4.0, 5.5, 6.0],
    'c': [7, 8, 9],
    'name': ['hello', 'hello', 'foo']
})

# Group by columns a and name
gb = df.groupby(['a', 'name'])

# Calculate the median
median_result = gb.median().reset_index()

print(median_result)

Output:

   a   name     b    c
0  1  hello  4.75  7.5
1  3    foo  6.00  9.0
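A similar flat layout can likely be obtained without reset_index by passing as_index=False to groupby. The sketch below also selects only column b on the grouped object, so column c is left out of the display.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 3],
    'b': [4.0, 5.5, 6.0],
    'c': [7, 8, 9],
    'name': ['hello', 'hello', 'foo'],
})

# as_index=False keeps the group keys as ordinary columns;
# [['b']] limits the aggregated output to column b
flat = df.groupby(['a', 'name'], as_index=False)[['b']].median()

print(flat)
```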

Example 3: Iterating Over Groups

To iterate over the groups and access the corresponding sub-DataFrames, you can use a loop:

Python

import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar'] * 3,
    'B': [1, 2, 3, 4, 5, 6],
    'C': [7, 8, 9, 10, 11, 12]
})

# Group by column A
gb = df.groupby('A')

# Iterate over the groups
for name, group in gb:
    print(f"Group: {name}")
    print(group)
    print()

Output:

Group: bar
     A  B   C
1  bar  2   8
3  bar  4  10
5  bar  6  12

Group: foo
     A  B   C
0  foo  1   7
2  foo  3   9
4  foo  5  11
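When only one column's values are of interest, the loop can index each sub-DataFrame directly, and get_group retrieves a single group by its key. A small sketch using the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar'] * 3,
    'B': [1, 2, 3, 4, 5, 6],
    'C': [7, 8, 9, 10, 11, 12],
})

gb = df.groupby('A')

# Collect the values of column B for each group
values_by_group = {}
for name, group in gb:
    values_by_group[name] = group['B'].tolist()
    print(name, values_by_group[name])

# Fetch a single group by key and display one of its columns
foo_c = gb.get_group('foo')['C'].tolist()
print(foo_c)
```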

Conclusion

The groupby function in Pandas is a versatile tool for data analysis. It allows you to group data based on one or more columns and perform operations on each group. By aggregating the groups, selecting specific columns from the result, resetting the index, or iterating over the groups, you can control exactly which values are displayed.

abhaystriver
