Pandas DataFrame itertuples() Method

Last Updated : 18 Dec, 2024

itertuples() is a pandas DataFrame method that iterates over the rows and yields each row as a lightweight namedtuple, so values can be read by attribute name or by position. It is generally faster than other row-iteration methods such as iterrows(), because it does not build a Series object for every row. Consider the following example.

Python
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

# Iterate using itertuples()
for row in df.itertuples():
    print(row)

Output
Pandas(Index=0, Name='Alice', Age=25, City='New York')
Pandas(Index=1, Name='Bob', Age=30, City='Los Angeles')
Pandas(Index=2, Name='Charlie', Age=35, City='Chicago')

The output shows one namedtuple per row, with the row index as the first field (Index) followed by the column values.

Pandas DataFrame itertuples() Method

itertuples() iterates over the rows of a DataFrame and returns each row as a lightweight namedtuple, which means that each element of the tuple can be accessed by its field name. It is an alternative to iterrows() that is usually faster and keeps the original type of each value.
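One practical difference from iterrows() is how values are typed. As a minimal sketch (the DataFrame and its column names qty and ratio are made up for illustration), the snippet below suggests that iterrows() packs each row into a Series, which upcasts the integer column to float, while itertuples() keeps the original integer.

```python
import pandas as pd

# Hypothetical data: one integer column and one float column
df = pd.DataFrame({'qty': [1, 2], 'ratio': [0.5, 1.5]})

# iterrows() yields (index, Series); a Series has a single dtype,
# so the integer in 'qty' is upcast to float
_, series_row = next(df.iterrows())

# itertuples() yields a namedtuple; each value keeps its own type
tuple_row = next(df.itertuples())

print(type(series_row['qty']).__name__)
print(type(tuple_row.qty).__name__)
```

If the exact types matter downstream (for example when writing integers to a file), this is one reason to prefer itertuples() over iterrows().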

Syntax:

DataFrame.itertuples(index=True, name='Pandas')

  • DataFrame is the DataFrame object on which the method is called.
  • index=True (the default) includes the row index as the first element of each tuple or namedtuple. With index=False the index is omitted.
  • name='Pandas' (the default) is the type name of the returned namedtuples. If it is set to None, plain tuples with no field names are returned.
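The two parameters can be illustrated with a short sketch. The data and the type name 'Fruit' are arbitrary choices for this example; only the index and name parameters come from the syntax above.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Apple', 'Banana'], 'price': [1.2, 0.5]})

# Custom namedtuple type name ('Fruit' is an arbitrary choice)
first_named = next(df.itertuples(name='Fruit'))
print(first_named)

# No index and no field names: a plain tuple of the column values
first_plain = next(df.itertuples(index=False, name=None))
print(first_plain)
```

The first call still includes the Index field because index defaults to True. The second call returns only the column values.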

In the following example we iterate over the rows of a DataFrame with itertuples() and set index=False.

Python
import pandas as pd

# Sample fruit data
data = {
    'name': ['Apple', 'Banana', 'Cherry'],
    'color': ['Red', 'Yellow', 'Red'],
    'price': [1.2, 0.5, 2.5]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Iterate over rows using itertuples
for row in df.itertuples(index=False):
    print(row)

Output:

Pandas(name='Apple', color='Red', price=1.2)
Pandas(name='Banana', color='Yellow', price=0.5)
Pandas(name='Cherry', color='Red', price=2.5)

The index is excluded from the output because we set index=False. Since name keeps its default value 'Pandas', each row is a namedtuple called Pandas whose field names are the column names.

How to access a particular field in namedtuple while using itertuples()

A particular field of the namedtuple can be accessed with the loop variable (row in the example), followed by the dot operator and the field name. In the following example we use this to print each row in a readable format instead of printing the raw namedtuple.

Python
import pandas as pd

# Sample flower data
data = {
    'name': ['Rose', 'Tulip', 'Daisy'],
    'color': ['Red', 'Yellow', 'White']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Iterate over rows using itertuples
for row in df.itertuples():
    print(f"Flower: {row.name}, Color: {row.color}")

Output:

Flower: Rose, Color: Red
Flower: Tulip, Color: Yellow
Flower: Daisy, Color: White

The output shows that values can be extracted from the namedtuple by using the column names as attributes.
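Attribute access only works when a column name is a valid Python identifier. As a sketch with made-up data, the column 'unit price' below contains a space, so pandas replaces it with a positional field name. The value can then be read through that positional name or by position.

```python
import pandas as pd

# Hypothetical data: 'unit price' is not a valid Python identifier
df = pd.DataFrame({'name': ['Apple', 'Banana'], 'unit price': [1.2, 0.5]})

first = next(df.itertuples())

# Inspect the field names that were actually generated
print(first._fields)

# The renamed field is reachable by its positional name or by position
print(first._2, first[2])
```

Checking row._fields is a quick way to see which names are available before writing attribute access in a loop.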

If we set name=None, each row is returned as a plain tuple, and values are accessed by position. Tuple indexing starts from 0, and because index=True by default, position 0 holds the row index and the column values start at position 1.

Python
import pandas as pd

# Sample flower data
data = {
    'color': ['Red', 'Yellow', 'White', 'Yellow'],
    'bloom_season': ['Spring', 'Spring', 'Summer', 'Summer']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Iterate over rows using itertuples (name=None yields plain tuples instead of namedtuples)
for row in df.itertuples(name=None):
    print(f"Index: {row[0]}, Color: {row[1]}, Bloom Season: {row[2]}")

Output:

Index: 0, Color: Red, Bloom Season: Spring
Index: 1, Color: Yellow, Bloom Season: Spring
Index: 2, Color: White, Bloom Season: Summer
Index: 3, Color: Yellow, Bloom Season: Summer

The output shows that the items of a plain tuple can be accessed by position. The drawback is that a plain tuple carries no field names, so the code has to rely on the column order.
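One way to keep positional code readable is to unpack each plain tuple directly in the for statement. The sketch below uses a smaller, made-up version of the flower data; the variable names idx, color, and season are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'color': ['Red', 'Yellow'],
    'bloom_season': ['Spring', 'Summer']
})

lines = []
# Each plain tuple is (index, color, bloom_season), matching the column order
for idx, color, season in df.itertuples(name=None):
    lines.append(f"{idx}: {color} blooms in {season}")

print(lines)
```

Unpacking still depends on the column order, so it is best suited to DataFrames whose layout is known and stable.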

Some other operations using itertuples()

itertuples() can also be used for common row-wise operations such as filtering, calculations, grouping, and creating a dictionary of rows.

1. Filtering Rows

Because itertuples() yields one namedtuple per row, we can apply a comparison operator to any field and keep only the rows that satisfy the condition. The example below illustrates this.

Python
import pandas as pd

# Sample student data
data = {
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 42, 78],
    'Subject': ['Math', 'Science', 'English']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Filter rows with Marks greater than 75 using itertuples
print("Students scoring more than 75 marks:")
for row in df.itertuples(name='Pandas'):
    if row.Marks > 75:  # Access the 'Marks' field by name
        print(f"Student: {row.Student}, Subject: {row.Subject}")

Output:

Students scoring more than 75 marks:
Student: Alice, Subject: Math
Student: Charlie, Subject: English

Here we have filtered the rows based on marks using the comparison operator.
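The same filter can be written more compactly. The sketch below, using a reduced copy of the student data, collects the matching names with a list comprehension over itertuples() and, for comparison, with a boolean mask, which avoids an explicit loop.

```python
import pandas as pd

df = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 42, 78],
})

# Collect matching students with itertuples()
via_itertuples = [row.Student for row in df.itertuples(index=False) if row.Marks > 75]

# Equivalent selection with a boolean mask (no explicit loop)
via_mask = df.loc[df['Marks'] > 75, 'Student'].tolist()

print(via_itertuples)
print(via_mask)
```

For a simple condition like this one, the boolean mask is usually the more idiomatic choice. The loop form is mainly useful when each row needs custom logic.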

2. Performing Calculations

We can also iterate over the rows and perform a calculation on each one. Here we iterate over the DataFrame and add the values of columns A and B for each row.

Python
import pandas as pd

# Sample data with columns A and B
data = {
    'A': [10, 20, 30, 40, 50],
    'B': [5, 15, 25, 35, 45],
}

# Create a DataFrame
df = pd.DataFrame(data)

# Perform addition using itertuples
print("Sum of A and B for each row:")
for row in df.itertuples(name='Pandas'):  # Rows are namedtuples
    sum_ab = row.A + row.B  # Access columns 'A' and 'B' by field name
    print(f"Row {row.Index}: {sum_ab}")

Output:

Sum of A and B for each row:
Row 0: 15
Row 1: 35
Row 2: 55
Row 3: 75
Row 4: 95
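The per-row results can also be stored instead of printed. As a minimal sketch, the snippet below builds the sums with itertuples() and assigns them to a new column. The column name 'Sum' is an assumption made for this example, and the last lines check the result against the vectorized expression df['A'] + df['B'].

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [5, 15, 25]})

# Build the row-wise sums with itertuples() and store them as a column
# ('Sum' is a column name chosen for this sketch)
df['Sum'] = [row.A + row.B for row in df.itertuples(index=False)]

# The vectorized expression produces the same values
matches = (df['Sum'] == df['A'] + df['B']).all()

print(df)
print(matches)
```

For plain arithmetic the vectorized form is simpler. The loop form becomes more useful when the calculation involves branching or calls that cannot be applied to whole columns.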

3. Grouping based on specific column

We can also group data by a particular column without using groupby() and compute aggregates such as min, max, count, and sum. In the following example we compute the sum of Value for each Group.

Python
import pandas as pd

# Sample data
data = {
    'Group': ['A', 'B', 'A', 'B', 'A', 'C'],
    'Value': [10, 20, 30, 40, 50, 60]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Group data by the 'Group' column and calculate the sum of 'Value'
grouped_data = {}

for row in df.itertuples(name=None):  # Use plain tuples
    group = row[1]  # 'Group' column at index 1
    value = row[2]  # 'Value' column at index 2
    
    # Aggregate the values by group
    if group not in grouped_data:
        grouped_data[group] = 0
    grouped_data[group] += value

# Print the grouped results
for group, total in grouped_data.items():
    print(f"Group: {group}, Total Value: {total}")

Output:

Group: A, Total Value: 90
Group: B, Total Value: 60
Group: C, Total Value: 60

Here we iterate over the rows and accumulate the sum of the values for each group. If the group name is not yet present in the dictionary, we create a key for it with an initial value of 0, and then add the current value to the running total.
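The "create the key if it is missing" step can be delegated to collections.defaultdict. The sketch below is one possible variant of the loop above, using namedtuple fields instead of positions, and it compares the totals with those from groupby() as a sanity check.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'B', 'A', 'B', 'A', 'C'],
    'Value': [10, 20, 30, 40, 50, 60]
})

# defaultdict(int) starts every missing key at 0
totals = defaultdict(int)
for row in df.itertuples(index=False):
    totals[row.Group] += row.Value

# Same aggregation with groupby(), for comparison
groupby_totals = df.groupby('Group')['Value'].sum().to_dict()

print(dict(totals))
print(groupby_totals)
```

Both approaches should produce the same totals. groupby() is generally the preferred tool for larger data.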

4. Creating Dictionary of rows

We can also create a dictionary of rows. This technique is useful when the rows need to be stored in a JSON-like structure.

Python
import pandas as pd

# Sample DataFrame
data = {
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Marks': [85, 42, 78],
    'Subject': ['Math', 'Science', 'English']
}

df = pd.DataFrame(data)

# Create a dictionary of rows
rows_dict = {}
for row in df.itertuples():  # Rows are namedtuples
    key = row.Index  # Use the row index as the key
    rows_dict[key] = {
        'Marks': row.Marks,      # Value of the 'Marks' column
        'Subject': row.Subject,  # Value of the 'Subject' column
    }

# Print the resulting dictionary
for k,v in rows_dict.items():
    print(k,v)

Output:

0 {'Marks': 85, 'Subject': 'Math'}
1 {'Marks': 42, 'Subject': 'Science'}
2 {'Marks': 78, 'Subject': 'English'}

The output shows that the row index is the key and each value is a dictionary that maps column names to the row's values. The structure is similar to JSON.
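When every column should be included, each namedtuple can be converted with its _asdict() method instead of listing the fields by hand. The sketch below assumes a reduced copy of the student data and serializes the resulting list of dictionaries with the standard json module.

```python
import json

import pandas as pd

df = pd.DataFrame({
    'Student': ['Alice', 'Bob'],
    'Marks': [85, 42],
})

# _asdict() converts each namedtuple into a dictionary keyed by field name
records = [row._asdict() for row in df.itertuples(index=False)]

# Serialize the list of row dictionaries to a JSON string
json_text = json.dumps(records)
print(json_text)
```

This relies on the column names being valid identifiers. Otherwise the dictionary keys would be the positional field names rather than the original column names.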

