Pivot N DF Pandas-II
DataFrame
• In data analysis with pandas, the pivot function is used to
reshape data in a DataFrame. Essentially, it transforms
data from a long format to a wide format, allowing you to
reorganize the data based on specific columns.
Basic Concept
• Suppose you have a DataFrame in long format, with one row per Date and Category pair and the measurement stored in a Value column.
• You can use pivot to rearrange this data so that each category
becomes a column, with the Date as the index, and the Value
entries filling the table.
Syntax
• pivot(index=None, columns=None, values=None)
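• index: The column to use as the new frame's index. If not given, the existing index is used.
• columns: The column whose unique values become the new frame's columns.
• values: The column to use for populating the new frame's values. If not given, all remaining columns are used and the result has hierarchical columns.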
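As a small sketch of how the three arguments interact, the snippet below pivots a hypothetical frame with and without values. The Qty column is an assumption added only for illustration; it is not part of the example data used elsewhere in these notes.

```python
import pandas as pd

# Hypothetical long-format data; 'Qty' is added only for illustration
df = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 15, 25],
    'Qty': [1, 2, 3, 4],
})

# values given: one output column per Category
single = df.pivot(index='Date', columns='Category', values='Value')

# values omitted: every remaining column is pivoted, so the columns
# become a two-level index with entries such as ('Value', 'A')
wide = df.pivot(index='Date', columns='Category')

print(single)
print(wide)
```

Selecting wide['Qty'] afterwards returns the same kind of table as passing values='Qty' directly.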
Example
import pandas as pd
# Creating the DataFrame (long format: one row per Date/Category pair)
data = {
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 15, 25],
}
df = pd.DataFrame(data)
# Pivoting the DataFrame: dates become rows, categories become columns
pivot_df = df.pivot(index='Date', columns='Category', values='Value')
print(pivot_df)
Result
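The printed output is not reproduced in this text. As a sketch, the values the pivot should produce can be checked cell by cell; the frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 15, 25],
})
pivot_df = df.pivot(index='Date', columns='Category', values='Value')

# Rows are dates, columns are categories:
#   2024-01-01 -> A = 10, B = 20
#   2024-01-02 -> A = 15, B = 25
result = pivot_df.to_dict(orient='index')
print(result)
```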
Pivot Table
• In pandas, the pivot_table function is a versatile tool used
for reshaping and summarizing data in a DataFrame.
Unlike pivot, which requires unique values for the
combination of index and columns, pivot_table can handle
duplicate entries by aggregating data.
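The difference can be seen in a minimal sketch, assuming a small frame in which one Date and Category pair occurs twice: pivot raises a ValueError, while pivot_table aggregates the duplicates.

```python
import pandas as pd

# The ('2024-01-01', 'A') pair appears twice
df = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-01'],
    'Category': ['A', 'B', 'A'],
    'Value': [10, 20, 5],
})

try:
    df.pivot(index='Date', columns='Category', values='Value')
    pivot_failed = False
except ValueError as exc:
    # pivot cannot decide which of the duplicate values to keep
    pivot_failed = True
    print('pivot failed:', exc)

# pivot_table aggregates the duplicates instead (mean by default)
table = df.pivot_table(index='Date', columns='Category', values='Value')
print(table)
```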
Basic Concept
• pivot_table creates a summary table: you group the data by one or more keys (used as the index and, optionally, the columns) and specify a function to apply to the values in each group. It is particularly useful for cross-tabulations and summary statistics.
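A short sketch of grouping by more than one key is shown below. The sales frame and its Region column are hypothetical, introduced only to illustrate the idea.

```python
import pandas as pd

# Hypothetical sales records; names and numbers are for illustration only
sales = pd.DataFrame({
    'Region': ['North', 'North', 'North', 'South', 'South'],
    'Category': ['A', 'A', 'B', 'A', 'B'],
    'Value': [10, 5, 20, 15, 25],
})

# Two keys on the index, one aggregation function applied to Value
summary = sales.pivot_table(
    values='Value',
    index=['Region', 'Category'],
    aggfunc='sum',
)
print(summary)
```

Each row of summary is one (Region, Category) group, and the Value column holds the total for that group.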
Syntax
• pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, dropna=True)
• data: The DataFrame you want to pivot. It is omitted when you call the method form, df.pivot_table(...).
• values: The column(s) to aggregate. If not specified, all numeric columns are
aggregated.
• index: The column(s) to use as the new index of the pivot table.
• columns: The column(s) to use as the columns of the pivot table.
• aggfunc: The aggregation function(s) to use, such as 'mean', 'sum', 'count', or a custom
function. Default is 'mean'.
• fill_value: Value to replace missing values in the pivot table.
• dropna: Whether to drop columns that do not have any data (default is True).
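To illustrate aggfunc and fill_value, here is a minimal sketch on hypothetical data in which one Date and Category combination has no rows at all.

```python
import pandas as pd

# Hypothetical data: there is no row for category 'B' on 2024-01-02
df = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-01', '2024-01-02'],
    'Category': ['A', 'A', 'B', 'A'],
    'Value': [10, 5, 20, 15],
})

# With the default fill_value=None the missing combination stays NaN
with_nan = df.pivot_table(values='Value', index='Date',
                          columns='Category', aggfunc='sum')

# fill_value=0 replaces that NaN with 0
filled = df.pivot_table(values='Value', index='Date',
                        columns='Category', aggfunc='sum', fill_value=0)

print(with_nan)
print(filled)
```

Using aggfunc='sum' adds the two 'A' values on 2024-01-01 rather than averaging them.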
Example
import pandas as pd
# Creating the DataFrame; ('2024-01-01', 'A') appears twice
data = {
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02',
             '2024-01-01'],
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 20, 15, 25, 5],
}
df = pd.DataFrame(data)
# Duplicate Date/Category pairs are averaged
pivot_table_df = df.pivot_table(values='Value', index='Date',
                                columns='Category',
                                aggfunc='mean')
print(pivot_table_df)
Result
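The printed output is not reproduced in this text. As a sketch, the expected values can be checked directly; the frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02',
             '2024-01-01'],
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 20, 15, 25, 5],
})
pivot_table_df = df.pivot_table(values='Value', index='Date',
                                columns='Category', aggfunc='mean')

# 2024-01-01 -> A = 7.5 (mean of 10 and 5), B = 20.0
# 2024-01-02 -> A = 15.0, B = 25.0
result = pivot_table_df.to_dict(orient='index')
print(result)
```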