Pandas DataFrame itertuples() Method
Last Updated: 18 Dec, 2024
itertuples() is a DataFrame method that iterates over the rows and returns each one as a lightweight namedtuple, so values can be read by position or by field name. It is usually faster than other row iteration methods such as iterrows(), because it does not build a Series object for every row. Consider a simple example.
Python
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})
# Iterate using itertuples()
for row in df.itertuples():
    print(row)
Output:
Pandas(Index=0, Name='Alice', Age=25, City='New York')
Pandas(Index=1, Name='Bob', Age=30, City='Los Angeles')
Pandas(Index=2, Name='Charlie', Age=35, City='Chicago')
The output shows one namedtuple per row, with the row index as the first field (Index) followed by the column values.
Pandas DataFrame itertuples() Method
itertuples() iterates over the rows of a DataFrame and yields lightweight namedtuples, meaning each element can be accessed by its field name as well as by position. It is an alternative to iterrows() that avoids creating a Series for every row, which generally makes it faster.
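One practical difference from iterrows() is how value types are handled. The sketch below uses a small made-up DataFrame (the qty and price columns are illustrative only) to show that iterrows() upcasts an integer to a float when the row also holds a float, while itertuples() leaves the integer as it is.

```python
import pandas as pd

# Illustrative data: one integer column and one float column
df = pd.DataFrame({'qty': [1, 2], 'price': [1.5, 2.5]})

# iterrows() packs each row into a single Series, so mixed numeric
# values are upcast to a common float dtype
_, first_series = next(df.iterrows())
series_value = first_series['qty']

# itertuples() reads each column separately, so the integer is kept
first_tuple = next(df.itertuples())
tuple_value = first_tuple.qty

print(type(series_value).__name__)
print(type(tuple_value).__name__)
```

If the original types matter, for example when an ID column must stay an integer, this is a reason to prefer itertuples() over iterrows().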
Syntax:
DataFrame.itertuples(index=True, name='Pandas')
- DataFrame is the DataFrame object on which the method is called.
- index=True (the default) includes the row index as the first element of each tuple or namedtuple. With index=False the index is left out.
- name='Pandas' (the default) sets the name of the returned namedtuples. If it is set to None, plain tuples with no field names are returned.
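The effect of the two parameters can be seen side by side in a short sketch. The DataFrame and the class name 'Fruit' below are made up for illustration; only the index and name parameters come from the syntax above.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Apple', 'Banana'], 'price': [1.2, 0.5]})

# Defaults: index included, namedtuple class called 'Pandas'
default_row = next(df.itertuples())

# Custom class name ('Fruit' is a hypothetical choice), index left out
fruit_row = next(df.itertuples(index=False, name='Fruit'))

# name=None: a plain tuple, index still included as the first element
plain_row = next(df.itertuples(name=None))

print(default_row._fields)
print(type(fruit_row).__name__, fruit_row._fields)
print(plain_row)
```

The _fields attribute is part of every namedtuple and is a convenient way to check which field names a given call produces.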
Here we have a DataFrame and we need to iterate over its rows. We will use itertuples() and set index=False.
Python
import pandas as pd
# Sample fruit data
data = {
    'name': ['Apple', 'Banana', 'Cherry'],
    'color': ['Red', 'Yellow', 'Red'],
    'price': [1.2, 0.5, 2.5]
}
# Create a DataFrame
df = pd.DataFrame(data)
# Iterate over rows using itertuples
for row in df.itertuples(index=False):
    print(row)
Output:
Pandas(name='Apple', color='Red', price=1.2)
Pandas(name='Banana', color='Yellow', price=0.5)
Pandas(name='Cherry', color='Red', price=2.5)
From the output we can see that the index has been excluded because we set index=False. Since name is left at its default, 'Pandas', each row is a namedtuple called Pandas whose field names are the column names.
How to access a particular field in namedtuple while using itertuples()
We can access a particular field of the namedtuple with the dot operator: the row variable, followed by a dot and the field name. In the example below we use this to print each row in a readable format instead of as a raw namedtuple.
Python
import pandas as pd
# Sample flower data
data = {
    'name': ['Rose', 'Tulip', 'Daisy'],
    'color': ['Red', 'Yellow', 'White'],
}
# Create a DataFrame
df = pd.DataFrame(data)
# Iterate over rows using itertuples
for row in df.itertuples():
    print(f"Flower: {row.name}, Color: {row.color}")
Output:
Flower: Rose, Color: Red
Flower: Tulip, Color: Yellow
Flower: Daisy, Color: White
From the output we can see that the column names can be used to extract the values from the namedtuple.
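Dot access only works when the column name is a valid Python identifier. The sketch below, which uses a made-up 'petal count' column, shows what happens otherwise: pandas replaces such a name with a positional one (an underscore followed by the field's position), so the value is read through that name or by position.

```python
import pandas as pd

# 'petal count' contains a space, so it cannot be used as an attribute name
df = pd.DataFrame({'name': ['Rose', 'Tulip'], 'petal count': [5, 6]})

first = next(df.itertuples())
print(first._fields)  # shows the field names actually in use

# The renamed field sits at position 2 (after Index and name)
petals_by_name = first._2
petals_by_position = [row[2] for row in df.itertuples()]
print(petals_by_name, petals_by_position)
```

Renaming such columns before iterating (for example to petal_count) keeps the loop body readable.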
If we set name=None, we get plain tuples as output, and the values are accessed by position. Because index=True by default, position 0 holds the row index and the column values follow from position 1.
Python
import pandas as pd
# Sample flower data
data = {
    'color': ['Red', 'Yellow', 'White', 'Yellow'],
    'bloom_season': ['Spring', 'Spring', 'Summer', 'Summer']
}
# Create a DataFrame
df = pd.DataFrame(data)
# Iterate over rows using itertuples (plain tuples, no namedtuples)
for row in df.itertuples(name=None):  # name=None yields plain tuples instead of namedtuples
    print(f"Index: {row[0]}, Color: {row[1]}, Bloom Season: {row[2]}")
Output:
Index: 0, Color: Red, Bloom Season: Spring
Index: 1, Color: Yellow, Bloom Season: Spring
Index: 2, Color: White, Bloom Season: Summer
Index: 3, Color: Yellow, Bloom Season: Summer
From the output we can see that the items of the tuple can be accessed by position. The drawback of plain tuples is that the values no longer carry field names, so the code has to rely on the column order.
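With plain tuples, positional indexing can be avoided by unpacking each tuple directly in the for statement. This is a minimal sketch on a shortened version of the flower data; the variable names idx, color and season are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'color': ['Red', 'Yellow'],
    'bloom_season': ['Spring', 'Summer'],
})

lines = []
# Each plain tuple is (index, color, bloom_season), so it unpacks into three names
for idx, color, season in df.itertuples(name=None):
    lines.append(f"{idx}: {color} blooms in {season}")

print("\n".join(lines))
```

Unpacking gives the values readable names again, but it must list one variable per element, so it needs updating whenever columns are added or reordered.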
Some other operations using itertuples()
itertuples() can also be used for row-wise tasks such as filtering, calculations, grouping and building a dictionary of rows.
1. Filtering Rows
Because each row is a namedtuple, we can apply a comparison operator to any field and keep only the rows that satisfy the condition. The example below illustrates this.
Python
import pandas as pd
# Sample student data
data = {
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 42, 78],
    'Subject': ['Math', 'Science', 'English']
}
# Create a DataFrame
df = pd.DataFrame(data)
# Filter rows with Marks greater than 75 using itertuples
print("Students scoring more than 75 marks:")
for row in df.itertuples(name='Pandas'):
    if row.Marks > 75:  # read 'Marks' by field name
        print(f"Student: {row.Student}, Subject: {row.Subject}")
Output:
Students scoring more than 75 marks:
Student: Alice, Subject: Math
Student: Charlie, Subject: English
Here we have filtered the rows based on marks using the comparison operator.
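The same filter can be written more compactly as a list comprehension over itertuples(). For comparison, the sketch below also shows a boolean mask, which is the usual pandas way to filter when no row-by-row logic is needed. The data repeats the student example; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 42, 78],
})

# Row-wise filter: keep the names of students with more than 75 marks
passed_loop = [row.Student for row in df.itertuples(index=False) if row.Marks > 75]

# Vectorized alternative: boolean mask on the 'Marks' column
passed_mask = df.loc[df['Marks'] > 75, 'Student'].tolist()

print(passed_loop)
print(passed_mask)
```

Both approaches select the same students here. The loop is handy when each row needs extra Python logic, while the mask tends to scale better on large DataFrames.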
2. Performing Calculations
We can also iterate over the rows and perform a calculation on each one. Here we iterate over the DataFrame and add the values of columns A and B for every row.
Python
import pandas as pd
# Sample data with columns A and B
data = {
    'A': [10, 20, 30, 40, 50],
    'B': [5, 15, 25, 35, 45],
}
# Create a DataFrame
df = pd.DataFrame(data)
# Perform addition using itertuples
print("Sum of A and B for each row:")
for row in df.itertuples(name='Pandas'):  # namedtuples, so fields are read by name
    sum_ab = row.A + row.B  # add the values of columns 'A' and 'B'
    print(f"Row {row.Index}: {sum_ab}")
Output:
Sum of A and B for each row:
Row 0: 15
Row 1: 35
Row 2: 55
Row 3: 75
Row 4: 95
3. Grouping based on specific column
We can also group data by a particular column without using groupby() and compute aggregates such as min, max, count and sum. The example below sums the values in each group.
Python
import pandas as pd
# Sample data
data = {
    'Group': ['A', 'B', 'A', 'B', 'A', 'C'],
    'Value': [10, 20, 30, 40, 50, 60]
}
# Create a DataFrame
df = pd.DataFrame(data)
# Group data by the 'Group' column and calculate the sum of 'Value'
grouped_data = {}
for row in df.itertuples(name=None):  # Use plain tuples
    group = row[1]  # 'Group' column at position 1 (position 0 is the row index)
    value = row[2]  # 'Value' column at position 2
    # Aggregate the values by group
    if group not in grouped_data:
        grouped_data[group] = 0
    grouped_data[group] += value
# Print the grouped results
for group, total in grouped_data.items():
    print(f"Group: {group}, Total Value: {total}")
Output:
Group: A, Total Value: 90
Group: B, Total Value: 60
Group: C, Total Value: 60
Here we iterate over the rows and accumulate the sum of the values for each group. If the group name is not yet in the dictionary, we add it as a key with an initial value of 0, and then add the row's value to the running total.
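A manual grouping loop like this can be checked against pandas' built-in groupby(). The sketch below repeats the sample data, uses collections.defaultdict to drop the explicit "key not present" check, and compares the totals with groupby(); the variable names are illustrative.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'B', 'A', 'B', 'A', 'C'],
    'Value': [10, 20, 30, 40, 50, 60],
})

# defaultdict(int) starts every new group at 0 automatically
totals = defaultdict(int)
for row in df.itertuples(index=False):
    totals[row.Group] += row.Value

# Built-in equivalent for comparison
groupby_totals = df.groupby('Group')['Value'].sum().to_dict()

print(dict(totals))
print(groupby_totals)
```

For plain aggregations groupby() is normally the better choice; the loop is mainly useful when the per-row logic does not map onto a built-in aggregation.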
4. Creating Dictionary of rows
We can also create a dictionary of rows. This technique is useful when we need to store the rows in a JSON-like structure.
Python
import pandas as pd
# Sample DataFrame
data = {
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 42, 78],
    'Subject': ['Math', 'Science', 'English']
}
df = pd.DataFrame(data)
# Create a dictionary of rows
rows_dict = {}
for row in df.itertuples():  # namedtuples with the row index as the first field
    key = row.Index  # use the row index as the dictionary key
    rows_dict[key] = {
        'Marks': row.Marks,  # value of the 'Marks' column
        'Subject': row.Subject,  # value of the 'Subject' column
    }
# Print the resulting dictionary
for k, v in rows_dict.items():
    print(k, v)
Output:
0 {'Marks': 85, 'Subject': 'Math'}
1 {'Marks': 42, 'Subject': 'Science'}
2 {'Marks': 78, 'Subject': 'English'}
From the output we can see that the row index is the key and each value is a dictionary mapping column names to that row's values. The structure is similar to the JSON format.
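When every column should end up in the dictionary, the namedtuple's own _asdict() method saves listing the fields by hand. The sketch below, on a shortened version of the student data, builds one dictionary per row and serializes the list with the standard json module; the variable names records and payload are illustrative.

```python
import json

import pandas as pd

df = pd.DataFrame({
    'Student': ['Alice', 'Bob'],
    'Marks': [85, 42],
    'Subject': ['Math', 'Science'],
})

# _asdict() converts a namedtuple row into a {field name: value} dictionary
records = [row._asdict() for row in df.itertuples(index=False)]

# Serialize the list of row dictionaries to a JSON string
payload = json.dumps(records)
print(payload)
```

This assumes the column names are valid identifiers; otherwise the dictionary keys would be the positional names that itertuples() substitutes.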