Analyzing Data Activity with Pandas



Pandas is a Python library designed for data manipulation and analysis. It provides two main data structures:

  • Series: It is a one-dimensional labelled array (like a column in a spreadsheet).
  • DataFrame: It is a two-dimensional labelled data structure (like a table), allowing storage of multiple columns with different data types.
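As a minimal sketch of the two structures described above, the snippet below builds a Series and a DataFrame by hand. The labels and values are hypothetical sample data chosen only for illustration.

```python
import pandas as pd

# A Series: one-dimensional, labelled values (hypothetical sample data)
amounts = pd.Series([450, 700, 300], index=["a", "b", "c"], name="Purchase_Amount")

# A DataFrame: a table whose columns can hold different data types
df = pd.DataFrame({
    "Product_Category": ["Electronics", "Fashion", "Electronics"],
    "Purchase_Amount": [450, 700, 300],
})

print(amounts["b"])                  # look up a value by its label
print(df.dtypes)                     # one data type per column
print(type(df["Purchase_Amount"]))   # a single DataFrame column is a Series
```

Selecting one column of a DataFrame returns a Series, which is why the two structures are usually introduced together.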

These data structures make complex data manipulation straightforward, and Pandas can read and write several file formats, such as CSV and Excel. In this article, we will learn how to analyze data activity using Pandas.

Analyzing Data Activity with Pandas

Here, we use sample customer purchase data stored in a file named "demo_3.csv" and perform different types of analysis on it.

demo_3.csv file:
Customer_ID  Product_Category  Purchase_Amount
101          Electronics       450
102          Fashion           700
103          Electronics       300
104          Furniture         1200
105          Fashion           550

You can load the dataset with the read_csv() function, which reads the CSV file and returns a DataFrame. Loading the data and previewing the first few rows with head() is a useful first step, because it shows the structure of the data before any analysis begins.

import pandas as pd

# Read the CSV file into a DataFrame
x = pd.read_csv('demo_3.csv')
# Show the first five rows
print(x.head())

The output of the above program is as follows:

   Customer_ID Product_Category  Purchase_Amount
0          101      Electronics              450
1          102          Fashion              700
2          103      Electronics              300
3          104        Furniture             1200
4          105          Fashion              550
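Beyond head(), a few other attributes and methods help in understanding the structure of a freshly loaded DataFrame. The sketch below assumes that demo_3.csv is comma-separated and uses an inline copy of the sample data, so that it runs without the file on disk.

```python
import io
import pandas as pd

# Inline copy of the sample data (assumption: demo_3.csv is comma-separated)
csv_text = """Customer_ID,Product_Category,Purchase_Amount
101,Electronics,450
102,Fashion,700
103,Electronics,300
104,Furniture,1200
105,Fashion,550
"""
x = pd.read_csv(io.StringIO(csv_text))

print(x.shape)       # (number of rows, number of columns)
print(x.dtypes)      # data type of each column
print(x.describe())  # summary statistics for the numeric columns
```

To run this against the real file, replace io.StringIO(csv_text) with the path 'demo_3.csv'.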

Using Boolean Indexing

Boolean indexing is a technique used in Python, particularly in libraries like NumPy and Pandas, for filtering and selecting data based on a condition. It is also known as Boolean masking.

In this case, we apply the condition directly to the DataFrame and extract the rows that match it (Purchase_Amount > 300). Because the comparison is strict, a row whose amount is exactly 300 is excluded.

Example

Consider the following example, where we are going to filter the data based on the condition:

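import pandas as pd

# Load the sample data into a DataFrame
x = pd.read_csv('demo_3.csv')
# Keep only the rows where Purchase_Amount is greater than 300
y = x[x['Purchase_Amount'] > 300]
print(y)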

Following is the output of the above program:

   Customer_ID Product_Category  Purchase_Amount
0          101      Electronics              450
1          102          Fashion              700
3          104        Furniture             1200
4          105          Fashion              550
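To see what happens inside the filter, it can help to look at the Boolean mask on its own and then combine it with a second condition. The following is a minimal sketch that assumes the same sample data, rebuilt inline so the snippet is self-contained. The second condition (Product_Category equal to "Fashion") is added here only for illustration.

```python
import pandas as pd

# Inline copy of the sample data so the snippet runs without demo_3.csv
x = pd.DataFrame({
    "Customer_ID": [101, 102, 103, 104, 105],
    "Product_Category": ["Electronics", "Fashion", "Electronics", "Furniture", "Fashion"],
    "Purchase_Amount": [450, 700, 300, 1200, 550],
})

# The condition alone produces a Series of True/False values, one per row
mask = x["Purchase_Amount"] > 300
print(mask)

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
fashion_over_300 = x[mask & (x["Product_Category"] == "Fashion")]
print(fashion_over_300)
```

The parentheses matter because & binds more tightly than comparison operators in Python, so leaving them out changes how the expression is evaluated.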

Using groupby() Method

The Pandas groupby() method splits a DataFrame into groups based on the values in one or more columns, so that a calculation can be applied to each group separately.

We are going to combine groupby() with the sum() function to total the purchases. Pandas groups the rows by category and calculates the sum of the purchase amounts in each group.

Example

In the following example, we are going to calculate the total sales amount for each Product_Category from the data:

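import pandas as pd

# Load the sample data into a DataFrame
x = pd.read_csv('demo_3.csv')
# Group rows by category, then total the purchase amounts in each group
result = x.groupby('Product_Category')['Purchase_Amount'].sum()
print(result)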

If we run the above program, it will generate the following output:

Product_Category
Electronics     750
Fashion        1250
Furniture      1200
Name: Purchase_Amount, dtype: int64
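The same grouping can produce several statistics at once through the agg() method. The sketch below assumes the same sample data, rebuilt inline so it is self-contained, and computes the total, average, and number of purchases per category.

```python
import pandas as pd

# Inline copy of the sample data so the snippet runs without demo_3.csv
x = pd.DataFrame({
    "Customer_ID": [101, 102, 103, 104, 105],
    "Product_Category": ["Electronics", "Fashion", "Electronics", "Furniture", "Fashion"],
    "Purchase_Amount": [450, 700, 300, 1200, 550],
})

# agg() accepts a list of aggregation names and returns one column per statistic
summary = x.groupby("Product_Category")["Purchase_Amount"].agg(["sum", "mean", "count"])
print(summary)
```

The result is a DataFrame indexed by Product_Category, so a single value can be read with, for example, summary.loc["Fashion", "sum"].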

Conclusion

Pandas simplifies data activity analysis through easy-to-use tools such as read_csv() for loading data, Boolean indexing for filtering rows, and groupby() for aggregating values. Using these techniques, we can efficiently process and analyze datasets.

Updated on: 2025-07-24T18:21:03+05:30