
Analyzing Data Activity with Pandas
Pandas is a Python library designed for data manipulation and analysis. It provides two core data structures:
- Series: It is a one-dimensional labelled array (like a column in a spreadsheet).
- DataFrame: It is a two-dimensional labelled data structure (like a table), allowing storage of multiple columns with different data types.
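To make the two structures concrete, here is a minimal sketch that builds one of each by hand. The variable names (`amounts`, `df`) and the three sample rows are chosen only for illustration; they are not part of the dataset file used later in the article.

```python
import pandas as pd

# Series: a one-dimensional labelled array (a single column of values)
amounts = pd.Series([450, 700, 300], name="Purchase_Amount")

# DataFrame: a two-dimensional table whose columns may hold different types
df = pd.DataFrame({
    "Customer_ID": [101, 102, 103],
    "Product_Category": ["Electronics", "Fashion", "Electronics"],
    "Purchase_Amount": [450, 700, 300],
})

print(amounts.ndim)  # 1
print(df.ndim)       # 2
print(df.shape)      # (3, 3)

# Selecting a single column of a DataFrame returns a Series
print(isinstance(df["Purchase_Amount"], pd.Series))  # True
```

In other words, a DataFrame can be thought of as a collection of Series that share the same row labels.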
These data structures make complex data manipulation straightforward, and Pandas can read data from several file formats, including CSV and Excel. In this article, we will learn how to analyze data activity using Pandas.
Here, we are going to use sample customer purchase data stored in a file named "demo_3.csv". Using this data, we will perform different types of analysis.
demo_3.csv file:

Customer_ID | Product_Category | Purchase_Amount |
---|---|---|
101 | Electronics | 450 |
102 | Fashion | 700 |
103 | Electronics | 300 |
104 | Furniture | 1200 |
105 | Fashion | 550 |
You can load the dataset by using the read_csv() function, which reads the CSV file and returns its contents as a DataFrame. Loading the data and looking at the first few rows with head() is a useful first step, because it shows the structure of the data before any analysis begins.
    import pandas as pd

    x = pd.read_csv('demo_3.csv')
    print(x.head())

The output of the above program is as follows:

       Customer_ID Product_Category  Purchase_Amount
    0          101      Electronics              450
    1          102          Fashion              700
    2          103      Electronics              300
    3          104        Furniture             1200
    4          105          Fashion              550
Using Boolean Indexing
Boolean indexing is a technique used in Python, particularly within libraries like NumPy and Pandas, for filtering and selecting data based on a condition. It is also known as Boolean masking. The condition is evaluated for every row, producing a sequence of True/False values, and only the rows marked True are kept.
Here, we apply the condition directly to the DataFrame and extract the rows that match it (Purchase_Amount > 300).
Example
Consider the following example, where we are going to filter the data based on the condition:
    import pandas as pd

    x = pd.read_csv('demo_3.csv')
    y = x[x['Purchase_Amount'] > 300]
    print(y)

Following is the output of the above program:

       Customer_ID Product_Category  Purchase_Amount
    0          101      Electronics              450
    1          102          Fashion              700
    3          104        Furniture             1200
    4          105          Fashion              550
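Boolean masks can also be combined. The following sketch goes one step beyond the example above and is an illustration rather than part of the original walkthrough: it joins two conditions with `&` and uses `isin()` to match several categories. The data is rebuilt inline (as an assumption about the contents of demo_3.csv) so that the snippet runs on its own.

```python
import pandas as pd

# Same rows as demo_3.csv, rebuilt inline so the example is self-contained
x = pd.DataFrame({
    "Customer_ID": [101, 102, 103, 104, 105],
    "Product_Category": ["Electronics", "Fashion", "Electronics",
                         "Furniture", "Fashion"],
    "Purchase_Amount": [450, 700, 300, 1200, 550],
})

# Each comparison yields a Boolean Series; wrap each one in parentheses
# and combine them with & (and), | (or) or ~ (not)
mask = (x["Purchase_Amount"] > 300) & (x["Product_Category"] == "Fashion")
y = x[mask]
print(y["Customer_ID"].tolist())  # [102, 105]

# isin() builds a mask that is True where the value is in the given list
z = x[x["Product_Category"].isin(["Electronics", "Furniture"])]
print(z["Customer_ID"].tolist())  # [101, 103, 104]
```

The parentheses matter: `&` binds more tightly than `>` and `==` in Python, so leaving them out usually raises an error or gives an unintended result.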
Using groupby() Method
The Pandas groupby() method splits a DataFrame into groups based on the values in one or more columns, so that a calculation can be applied to each group separately.
We are going to use the groupby() method together with the sum() function to total the purchase amounts. Pandas groups the rows by category and calculates the sum of purchases in each group.
Example
In the following example, we are going to calculate the total sales amount for each Product_Category from the data:
    import pandas as pd

    x = pd.read_csv('demo_3.csv')
    result = x.groupby('Product_Category')['Purchase_Amount'].sum()
    print(result)

If we run the above program, it will generate the following output:

    Product_Category
    Electronics     750
    Fashion        1250
    Furniture      1200
    Name: Purchase_Amount, dtype: int64
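The same grouping can produce more than one statistic at a time. As a sketch that extends the example above (not something the original walkthrough covers), `agg()` accepts a list of aggregation names and returns one column per statistic. The data is again rebuilt inline, on the assumption that it matches demo_3.csv.

```python
import pandas as pd

# Same rows as demo_3.csv, rebuilt inline so the example is self-contained
x = pd.DataFrame({
    "Customer_ID": [101, 102, 103, 104, 105],
    "Product_Category": ["Electronics", "Fashion", "Electronics",
                         "Furniture", "Fashion"],
    "Purchase_Amount": [450, 700, 300, 1200, 550],
})

# One row per category, one column per aggregation
summary = (
    x.groupby("Product_Category")["Purchase_Amount"]
     .agg(["sum", "mean", "count"])
)
print(summary)

# Individual values can be looked up by category and statistic
print(summary.loc["Fashion", "sum"])        # 1250
print(summary.loc["Electronics", "mean"])   # 375.0
```

Here `count` shows how many purchases fall into each category, which helps when judging whether a mean is based on several rows or only one.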
Conclusion
Pandas simplifies data activity analysis through easy-to-use tools such as read_csv() for loading data, Boolean indexing for filtering rows, and groupby() for summarizing by category. By using these techniques, we can efficiently process and analyze datasets.