How to flatten a hierarchical index in Pandas DataFrame columns?
Last Updated :
13 Oct, 2022
In this article, we will see how to flatten a hierarchical index in the columns of a Pandas DataFrame. A hierarchical index (MultiIndex) usually appears as a result of groupby() aggregation: the grouping keys become the row index, and when several aggregation functions are applied, the function names form a second level in the column index of the resulting dataframe.
Using reset_index() function
Pandas provides a method called reset_index() that moves the index created by a groupby aggregation back into ordinary columns.
Syntax: pandas.DataFrame.reset_index(level=None, drop=False, inplace=False)
Parameters:
- level - resets only the specified index levels; by default all levels are reset
- drop - if True, discards the index values instead of inserting them as columns
- inplace - if True, modifies the dataframe in place instead of returning a new one.
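As a minimal sketch of how the level and drop parameters behave, the snippet below groups by two keys so that the result has a two-level row index. The sales dataframe and its "year" column are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical sample data, similar in shape to the car sales example.
sales = pd.DataFrame({
    "cars": ["bmw", "bmw", "benz", "benz"],
    "year": [2021, 2022, 2021, 2022],
    "sale": [20, 22, 24, 26],
})

# Grouping by two keys produces a two-level row index (cars, year).
grouped = sales.groupby(["cars", "year"]).sum()

# Move every index level back into ordinary columns.
all_flat = grouped.reset_index()

# Move only the "year" level; "cars" stays as the index.
year_flat = grouped.reset_index(level="year")

# Discard the index values instead of keeping them as columns.
dropped = grouped.reset_index(drop=True)

print(all_flat)
print(year_flat)
print(dropped)
```

With drop=True the group labels are lost, so it is usually only appropriate when the index carries no information you need.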
Example:
In this example, we use the pandas groupby function to sum the quarterly car sales for each car brand, and the reset_index() function to turn the resulting "cars" index back into a regular column.
Python3
# import the python pandas package
import pandas as pd
# create a sample dataframe
data = pd.DataFrame({"cars": ["bmw", "bmw", "benz", "benz"],
                     "sale_q1 in Cr": [20, 22, 24, 26],
                     "sale_q2 in Cr": [11, 13, 15, 17]},
                    columns=["cars", "sale_q1 in Cr",
                             "sale_q2 in Cr"])
# group by cars and sum the sales
# of quarter 1 and quarter 2
grouped_data = data.groupby(by="cars").agg("sum")
print(grouped_data)
# use reset_index to flatten
# the grouped dataframe
flat_data = grouped_data.reset_index()
print(flat_data)
Output:
The first print shows "cars" as the row index. After reset_index(), "cars" is an ordinary column and the rows are numbered 0 and 1.
Using the as_index parameter
The groupby() function accepts a boolean parameter called as_index. By default (as_index=True) the group labels become the index of the aggregated result. When as_index=False, the group labels are kept as regular columns, so the resulting dataframe is already flat.
Syntax: pandas.DataFrame.groupby(by, level, axis, as_index)
Parameters:
- by - specifies the column name(s), mapping, or function used to determine the groups
- level - if the index is a MultiIndex, specifies the index level(s) to group by
- axis - specifies whether to split along rows (0) or columns (1); deprecated in recent pandas releases
- as_index - if True (default), returns an object with the group labels as the index of the aggregated output; if False, the group labels are returned as columns.
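The effect of as_index can be sketched by running the same aggregation both ways and comparing the results. The sales dataframe and its shortened column name are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical sample data with a single sales column.
sales = pd.DataFrame({
    "cars": ["bmw", "bmw", "benz", "benz"],
    "sale_q1": [20, 22, 24, 26],
})

# Default behaviour: the group labels become the row index.
indexed = sales.groupby("cars").sum()

# as_index=False: the group labels stay in a regular column.
flat = sales.groupby("cars", as_index=False).sum()

print(indexed.index.name)   # cars
print(list(flat.columns))   # ['cars', 'sale_q1']
```

Passing as_index=False gives the same columns as calling reset_index() on the default result, without the extra step.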
Example:
In this example, we use the pandas groupby function to sum the quarterly car sales for each car brand and pass as_index=False, which keeps "cars" as a regular column so that the grouped dataframe is already flat.
Python3
# group by cars, sum the sales of
# quarter 1 and quarter 2, and pass
# as_index=False to keep "cars" as a column
grouped_data = data.groupby(by="cars", as_index=False).agg("sum")
# display
print(grouped_data)
Output:
The printed dataframe has "cars" as a regular column and a default integer index.
Flattening hierarchical index in pandas dataframe using groupby
When we use the groupby function and apply multiple aggregation functions to a column, the columns of the result become a two-level MultiIndex: the first level holds the original column name and the second level holds the aggregation function name. In such cases, the two levels have to be joined into a single level of column names.
Syntax: pandas.DataFrame.groupby(by=None, axis=0, level=None)
Parameters:
- by - column name(s), mapping, or function that determines the groups
- axis - 0 splits along rows and 1 splits along columns
- level - if the axis is multi-indexed, groups by the specified level (int or level name)
Syntax: pandas.DataFrame.agg(func=None, axis=0)
Parameters:
- func - specifies the function(s) to be used for aggregation (min, max, sum, etc.)
- axis - 0 applies the function to each column and 1 applies it to each row.
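To make the two-level column structure concrete, here is a small sketch that inspects the columns produced by a multi-function aggregation. The sales dataframe and its shortened column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with two sales columns.
sales = pd.DataFrame({
    "cars": ["bmw", "bmw", "benz", "benz"],
    "sale_q1": [20, 22, 24, 26],
    "sale_q2": [11, 13, 15, 17],
})

# Two aggregation functions per column give two-level column labels.
grouped = sales.groupby("cars").agg(
    {"sale_q1": ["sum", "max"], "sale_q2": ["sum", "min"]}
)

# Number of levels in the column index.
levels = grouped.columns.nlevels

# Iterating over a MultiIndex yields one tuple per column.
labels = list(grouped.columns)

print(levels)
print(labels)
```

Each tuple pairs an original column name with an aggregation name, and these tuples are what the flattening approaches below turn into single strings.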
Example
Import the pandas package and create a sample dataframe showing the car sales in two quarters, q1 and q2. Use the pandas groupby function to compute the sum and max of the sales in quarter 1 and the sum and min of the sales in quarter 2 for each car brand. The grouped dataframe has multi-indexed columns, and iterating over them yields one tuple per column. Use a for loop to iterate through these tuples and join each one into a single string, appending the joined strings to the flat_cols list. Finally, assign the flat_cols list to the columns of the grouped dataframe.
Python3
# group by cars and compute the sum and max
# of sales in quarter 1 and the sum and min
# of sales in quarter 2; string names are used
# instead of the built-in sum, max and min,
# which recent pandas versions warn about
grouped_data = data.groupby(by="cars").agg(
    {"sale_q1 in Cr": ["sum", "max"],
     "sale_q2 in Cr": ["sum", "min"]})
# create an empty list to save the
# names of the flattened columns
flat_cols = []
# iterate through the column tuples and
# join each one into a single string
for i in grouped_data.columns:
    flat_cols.append(i[0] + '_' + i[1])
# now assign the list of flattened
# names to the grouped columns
grouped_data.columns = flat_cols
# print the grouped data
print(grouped_data)
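Output:
The printed dataframe keeps "cars" as the index and has the single-level columns "sale_q1 in Cr_sum", "sale_q1 in Cr_max", "sale_q2 in Cr_sum" and "sale_q2 in Cr_min".
Flattening hierarchical index using to_records() function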
In this example, we use the to_records() function of the pandas dataframe, which converts the rows of the dataframe into a NumPy record array. This record array is then passed to pandas.DataFrame, which turns the index into a regular column. Each two-level column label becomes a single field name, the string form of its tuple, such as "('sale_q1 in Cr', 'sum')".
Syntax: pandas.DataFrame.to_records(index=True, column_dtypes=None)
Parameters:
- index - if True, includes the index in the resulting record array
- column_dtypes - sets the columns of the record array to the specified data type(s).
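The following sketch shows what to_records() produces for a dataframe with two-level columns, and how the index parameter changes the fields. The sales dataframe and its shortened column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with a single sales column.
sales = pd.DataFrame({
    "cars": ["bmw", "bmw", "benz", "benz"],
    "sale_q1": [20, 22, 24, 26],
})
grouped = sales.groupby("cars").agg({"sale_q1": ["sum", "max"]})

# The record array has one field for the index plus one per column.
records = grouped.to_records()
field_names = list(records.dtype.names)

# Rebuilding a dataframe from the records gives single-level columns.
flat = pd.DataFrame(records)

# With index=False the "cars" labels are left out of the records.
no_index_names = list(grouped.to_records(index=False).dtype.names)

print(field_names)
print(list(flat.columns))
print(no_index_names)
```

Because the field names are the string form of each tuple, you may still want to rename the columns afterwards if you prefer names such as sale_q1_sum.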
Code:
Python3
# group by cars and compute the sum
# and max of sales in quarter 1
# and the sum and min of sales in quarter 2
grouped_data = data.groupby(by="cars").agg({"sale_q1 in Cr": ["sum", "max"],
                                            "sale_q2 in Cr": ["sum", "min"]})
# use to_records function on grouped data
# and pass this to the Dataframe function
flattened_data = pd.DataFrame(grouped_data.to_records())
print(flattened_data)
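Output:
The printed dataframe has a default integer index, a "cars" column, and one column per aggregation whose name is the string form of the original tuple, such as "('sale_q1 in Cr', 'sum')".
Flattening hierarchical columns using join() and rstrip()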
In this example, we use the string methods join() and rstrip() to flatten the columns. When a dataframe has hierarchical columns, each column label is stored as a tuple with one element per level.
Syntax: str.join(iterable)
Explanation: Returns the strings in the iterable concatenated with str as the separator; raises a TypeError if the iterable contains non-string values.
Syntax: str.rstrip([chars])
Explanation: Returns a copy of the string with the trailing (rightmost) characters in chars removed; if chars is omitted, trailing whitespace is removed.
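One case where rstrip('_') matters is a column whose second level is empty, which happens when reset_index() is called on a dataframe with two-level columns. The sketch below illustrates this under the assumption of a hypothetical sales dataframe with a shortened column name.

```python
import pandas as pd

# Hypothetical sample data with a single sales column.
sales = pd.DataFrame({
    "cars": ["bmw", "bmw", "benz", "benz"],
    "sale_q1": [20, 22, 24, 26],
})
grouped = sales.groupby("cars").agg({"sale_q1": ["sum", "max"]})

# reset_index() pads the new "cars" label with an empty second level.
with_cars = grouped.reset_index()
labels = list(with_cars.columns)

# Joining alone leaves a trailing underscore on the padded label.
joined = ["_".join(label) for label in labels]

# rstrip("_") removes that trailing underscore.
stripped = ["_".join(label).rstrip("_") for label in labels]

print(labels)
print(joined)
print(stripped)
```

When every tuple has a non-empty second level, as in the example that follows, rstrip('_') leaves the names unchanged and acts only as a safeguard.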
Code:
Here, we iterate through these tuples, join the column name and the aggregation function name of each tuple into a single string, and store the resulting flattened column names in a list. This list of flattened column names is then assigned to the columns of the grouped dataframe.
Python3
# group by cars and compute the sum
# and max of sales in quarter 1
# and the sum and min of sales in quarter 2
grouped_data = data.groupby(by="cars").agg({"sale_q1 in Cr": ["sum", "max"],
                                            "sale_q2 in Cr": ["sum", "min"]})
# use join() and rstrip() to
# flatten the hierarchical columns
grouped_data.columns = ['_'.join(i).rstrip('_')
                        for i in grouped_data.columns.values]
print(grouped_data)
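Output:
The printed dataframe keeps "cars" as the index and has the single-level columns "sale_q1 in Cr_sum", "sale_q1 in Cr_max", "sale_q2 in Cr_sum" and "sale_q2 in Cr_min".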