Pandas DataFrame itertuples() Method
Last Updated: 18 Dec, 2024
itertuples() is a pandas DataFrame method that iterates over the rows of a DataFrame and returns each row as a lightweight namedtuple holding the index and the column values. It is generally faster and lighter than other row iteration methods such as iterrows(), because it does not build a Series object for every row. Consider the following example.
Python
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})
# Iterate using itertuples()
for row in df.itertuples():
    print(row)
Output:
Pandas(Index=0, Name='Alice', Age=25, City='New York')
Pandas(Index=1, Name='Bob', Age=30, City='Los Angeles')
Pandas(Index=2, Name='Charlie', Age=35, City='Chicago')
From the output we can see that namedtuples have been returned for each row.
Pandas DataFrame itertuples() Method
itertuples() iterates over the rows of a DataFrame and yields one lightweight namedtuple per row. A namedtuple lets us access each element by its field name as well as by its position. It is an alternative to iterrows() that avoids creating a Series for every row, which makes it faster and lighter, and it keeps each value's own type instead of converting the whole row to a common dtype.
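The difference from iterrows() can be seen in a small sketch. The column names qty and ratio are made up for illustration. The idea is that iterrows() packs each row into a single Series, so a row mixing integers and floats is converted to one common dtype, while itertuples() leaves each value as it is.

```python
import pandas as pd

# Hypothetical data: one integer column and one float column
df = pd.DataFrame({'qty': [1, 2], 'ratio': [0.5, 1.5]})

# iterrows() yields (index, Series); the Series stores the row under one dtype
_, series_row = next(df.iterrows())

# itertuples() yields a namedtuple; each field keeps its own type
tuple_row = next(df.itertuples(index=False))

print(series_row.dtype)       # dtype shared by the whole row Series
print(type(tuple_row.qty))    # type of the integer field
print(type(tuple_row.ratio))  # type of the float field
```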
Syntax:
DataFrame.itertuples(index=True, name='Pandas')
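- DataFrame is the DataFrame object whose rows we want to iterate over.
- index (bool, default True): if True, the index of the row is returned as the first element of the tuple or namedtuple.
- name (str or None, default 'Pandas'): the name given to the returned namedtuples. If it is set to None, plain tuples with no field names are returned.
- Returns: an iterator that yields one namedtuple (or plain tuple) per row.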
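As a quick illustration of how the two parameters change the result, here is a minimal sketch. The DataFrame and the type name 'Person' are assumptions chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# A custom name changes the namedtuple's type name (here the made-up 'Person')
named_rows = list(df.itertuples(name='Person'))

# index=False drops the leading Index field
no_index_rows = list(df.itertuples(index=False))

# name=None yields plain tuples without field names
plain_rows = list(df.itertuples(name=None))

print(named_rows[0])
print(no_index_rows[0])
print(plain_rows[0])
```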
Here we have a DataFrame and we need to iterate over its rows. We will use itertuples() with index=False so that the index is left out of each tuple.
Python
import pandas as pd
# Sample fruit data
data = {
    'name': ['Apple', 'Banana', 'Cherry'],
    'color': ['Red', 'Yellow', 'Red'],
    'price': [1.2, 0.5, 2.5]
}
# Create a DataFrame
df = pd.DataFrame(data)
# Iterate over rows using itertuples
for row in df.itertuples(index=False):
    print(row)
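Output:
Pandas(name='Apple', color='Red', price=1.2)
Pandas(name='Banana', color='Yellow', price=0.5)
Pandas(name='Cherry', color='Red', price=2.5)
From the output we can see that the index has been excluded because we set index=False. Since name keeps its default value 'Pandas', each row is a namedtuple called Pandas whose field names are the column names.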
How to access a particular field in namedtuple while using itertuples()
We can access a particular field of the namedtuple by writing the row variable, then the dot operator, then the field name (for example, row.name). In the example below we have a DataFrame and want to print each row as a formatted string instead of printing the namedtuple itself.
Python
import pandas as pd
# Sample flower data
data = {
    'name': ['Rose', 'Tulip', 'Daisy'],
    'color': ['Red', 'Yellow', 'White'],
}
# Create a DataFrame
df = pd.DataFrame(data)
# Iterate over rows using itertuples
for row in df.itertuples():
    print(f"Flower: {row.name}, Color: {row.color}")
Output:
Flower: Rose, Color: Red
Flower: Tulip, Color: Yellow
Flower: Daisy, Color: White
From the output we can see that the column names can be used as field names to extract the values from the namedtuple.
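Dot access relies on the column name being a valid Python identifier. When it is not (for example, it contains a space), the field is renamed to a positional name such as _1. The sketch below uses a made-up column 'unit price' to show this, along with getattr() for cases where the field name is held in a variable.

```python
import pandas as pd

# 'unit price' contains a space, so it cannot be used as an attribute name
df = pd.DataFrame({'name': ['Rose', 'Tulip'], 'unit price': [3.5, 2.0]})

first_row = next(df.itertuples(index=False))

# _fields lists the field names actually assigned to the namedtuple
print(first_row._fields)

# Look a field up by a name stored in a variable
field = 'name'
print(getattr(first_row, field))

# The renamed field is still reachable by position
print(first_row[1])
```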
If we set name=None, each row is returned as a plain tuple. A plain tuple has no field names, so we access its values by position. Indexing starts from 0, and because index=True by default, position 0 holds the row index and the columns follow from position 1.
Python
import pandas as pd
# Sample flower data
data = {
    'color': ['Red', 'Yellow', 'White', 'Yellow'],
    'bloom_season': ['Spring', 'Spring', 'Summer', 'Summer']
}
# Create a DataFrame
df = pd.DataFrame(data)
# Iterate over rows using itertuples (plain tuples, no namedtuples)
for row in df.itertuples(name=None):  # name=None gives plain tuples instead of namedtuples
    print(f"Index: {row[0]}, Color: {row[1]}, Bloom Season: {row[2]}")
Output:
Index: 0, Color: Red, Bloom Season: Spring
Index: 1, Color: Yellow, Bloom Season: Spring
Index: 2, Color: White, Bloom Season: Summer
Index: 3, Color: Yellow, Bloom Season: Summer
From the output we can see that the items of the tuple can be accessed by position. The drawback is that a plain tuple carries no field names, so we have to remember which position corresponds to which column.
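One way to soften that drawback is to unpack each plain tuple directly in the for statement, so every position gets a readable variable name. This is a small sketch with made-up sample data, not the only way to do it.

```python
import pandas as pd

df = pd.DataFrame({
    'color': ['Red', 'Yellow'],
    'bloom_season': ['Spring', 'Summer']
})

labels = []
# Unpack each plain tuple: the index comes first, then the columns in order
for idx, color, season in df.itertuples(name=None):
    labels.append(f"{idx}: {color} ({season})")

print(labels)
```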
Some other operations using itertuples()
itertuples() can also be used for row-wise tasks such as filtering, calculation, grouping and building a dictionary of rows.
1. Filtering Rows
Because each row is a namedtuple, we can compare any of its fields against a value and keep only the rows that satisfy the condition. The example below illustrates this.
Python
import pandas as pd
# Sample student data
data = {
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 42, 78],
    'Subject': ['Math', 'Science', 'English']
}
# Create a DataFrame
df = pd.DataFrame(data)
# Filter rows with Marks greater than 75 using itertuples
print("Students scoring more than 75 marks:")
for row in df.itertuples(name='Pandas'):
    if row.Marks > 75:  # Compare the 'Marks' field of each row
        print(f"Student: {row.Student}, Subject: {row.Subject}")
Output:
Students scoring more than 75 marks:
Student: Alice, Subject: Math
Student: Charlie, Subject: English
Here we have filtered the rows on Marks using a comparison operator and printed the student and subject of each matching row.
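For comparison, the same filter can be written as a list comprehension over itertuples() or as a vectorized boolean mask. The sketch below assumes the same student data and shows that both give the same names. The mask form is usually the more idiomatic pandas choice when no per-row logic is needed.

```python
import pandas as pd

df = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 42, 78],
    'Subject': ['Math', 'Science', 'English']
})

# Row-by-row filter with itertuples, collected into a list
top_loop = [row.Student for row in df.itertuples(index=False) if row.Marks > 75]

# Vectorized equivalent using a boolean mask
top_mask = df.loc[df['Marks'] > 75, 'Student'].tolist()

print(top_loop)
print(top_mask)
```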
2. Performing Calculations
We can also iterate over the rows and compute a value for each row. Here we iterate over the DataFrame and add the values of columns A and B in every row.
Python
import pandas as pd
# Sample data with columns A and B
data = {
    'A': [10, 20, 30, 40, 50],
    'B': [5, 15, 25, 35, 45],
}
# Create a DataFrame
df = pd.DataFrame(data)
# Perform addition using itertuples
print("Sum of A and B for each row:")
for row in df.itertuples(name='Pandas'):  # Using namedtuples
    sum_ab = row.A + row.B  # Access columns 'A' and 'B' by field name
    print(f"Row {row.Index}: {sum_ab}")
Output:
Sum of A and B for each row:
Row 0: 15
Row 1: 35
Row 2: 55
Row 3: 75
Row 4: 95
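A per-row sum like this can also be expressed as a single column operation. The sketch below, using a shorter made-up DataFrame and a hypothetical 'Total' column, checks that the loop and the vectorized addition agree.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [5, 15, 25]})

# Row-wise sums collected with itertuples
loop_sums = [row.A + row.B for row in df.itertuples(index=False)]

# Vectorized column addition; 'Total' is a made-up column name
df['Total'] = df['A'] + df['B']

print(loop_sums)
print(df['Total'].tolist())
```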
3. Grouping based on specific column
We can also group data by a particular column without using groupby() and compute aggregates such as min, max, count and sum for each group. The example below computes a sum per group.
Python
import pandas as pd
# Sample data
data = {
    'Group': ['A', 'B', 'A', 'B', 'A', 'C'],
    'Value': [10, 20, 30, 40, 50, 60]
}
# Create a DataFrame
df = pd.DataFrame(data)
# Group data by the 'Group' column and calculate the sum of 'Value'
grouped_data = {}
for row in df.itertuples(name=None):  # Use plain tuples
    group = row[1]  # 'Group' column at index 1
    value = row[2]  # 'Value' column at index 2
    # Aggregate the values by group
    if group not in grouped_data:
        grouped_data[group] = 0
    grouped_data[group] += value
# Print the grouped results
for group, total in grouped_data.items():
    print(f"Group: {group}, Total Value: {total}")
Output:
Group: A, Total Value: 90
Group: B, Total Value: 60
Group: C, Total Value: 60
Here we iterate over the rows and accumulate the sum of the values for each group. If the group name is not yet present in the dictionary, we add it as a key with an initial value of 0, and then add the row's value to that group's running total.
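The membership check can be avoided with collections.defaultdict, which starts every new key at 0. The sketch below is one possible variant on the same data and also compares the result with groupby(), which is normally the more direct tool for this job.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'B', 'A', 'B', 'A', 'C'],
    'Value': [10, 20, 30, 40, 50, 60]
})

# defaultdict(int) creates missing keys with the value 0 on first access
totals = defaultdict(int)
for row in df.itertuples(index=False):
    totals[row.Group] += row.Value

# The same aggregation using groupby
grouped = df.groupby('Group')['Value'].sum().to_dict()

print(dict(totals))
print(grouped)
```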
4. Creating Dictionary of rows
We can also build a dictionary of rows. This is useful when the rows need to be stored or passed on in a JSON-like structure.
Python
import pandas as pd
# Sample DataFrame
data = {
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 42, 78],
    'Subject': ['Math', 'Science', 'English']
}
df = pd.DataFrame(data)
# Create a dictionary of rows
rows_dict = {}
for row in df.itertuples():  # Use namedtuples (the default)
    key = row.Index  # Use the row index as the key
    rows_dict[key] = {
        'Marks': row.Marks,  # 'Marks' column, accessed by field name
        'Subject': row.Subject,  # 'Subject' column, accessed by field name
    }
# Print the resulting dictionary
for k, v in rows_dict.items():
    print(k, v)
Output:
0 {'Marks': 85, 'Subject': 'Math'}
1 {'Marks': 42, 'Subject': 'Science'}
2 {'Marks': 78, 'Subject': 'English'}
From the output we can see that the row index is the key and each value is a dictionary that maps the selected column names to that row's values. The structure is similar to JSON.
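If the goal is an actual JSON string, each namedtuple can be converted with its _asdict() method and the result passed to json.dumps(). The following is a minimal sketch on the same student data. The variable names records and payload are chosen only for illustration.

```python
import json

import pandas as pd

df = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 42, 78],
    'Subject': ['Math', 'Science', 'English']
})

# _asdict() converts each namedtuple into a dict keyed by field name
records = [row._asdict() for row in df.itertuples(index=False)]

# Serialize the list of row dictionaries to a JSON string
payload = json.dumps(records)
print(payload)
```

When no per-row logic is needed, df.to_dict(orient='records') produces the same list of dictionaries directly.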