Rajendra Task-2

Name : AKKALA RAJENDRA REDDY

Roll No : 21F01A0503
College : ST.ANN'S COLLEGE OF ENGINEERING AND TECHNOLOGY
Mail id: [email protected]
Course : Datascience
Task No : Task 2

1)Plot the count of males and females in the dataset.

PROGRAM:-

import pandas as pd
import matplotlib.pyplot as plt
from gender_guesser.detector import Detector

file_path = 'Diwali Sales Data.csv'
try:
    df = pd.read_csv(file_path, encoding='latin-1')
except FileNotFoundError:
    print('file not found error')
    exit()

detector = Detector()

# Map the detector's result to 'male', 'female' or 'unknown'.
def infer_gender_from_name(name):
    gender = detector.get_gender(name)
    if gender == 'male' or gender == 'mostly_male':
        return 'male'
    elif gender == 'female' or gender == 'mostly_female':
        return 'female'
    else:
        return 'unknown'

# 'Cust_name' is the customer-name column used in the later programs.
df['gender'] = df['Cust_name'].apply(infer_gender_from_name)
gender_counts = df['gender'].value_counts()
plt.figure(figsize=(8, 6))
gender_counts.plot(kind='bar', color=['blue', 'pink'])
plt.title("Count of Males and Females")
plt.xlabel('Gender')
plt.ylabel('Count')
plt.show()

OUTPUT:-

2) Give the sum of amounts spent by each gender and plot the corresponding graph.

PROGRAM:-

import pandas as pd
import matplotlib.pyplot as plt
from gender_guesser.detector import Detector

file_path = 'Diwali Sales Data.csv'
try:
    df = pd.read_csv(file_path, encoding='latin-1')
except FileNotFoundError:
    print('file not found error')
    exit()

detector = Detector()

# Map the detector's result to 'male', 'female' or 'unknown'.
def infer_gender_from_name(name):
    gender = detector.get_gender(name)
    if gender == 'male' or gender == 'mostly_male':
        return 'male'
    elif gender == 'female' or gender == 'mostly_female':
        return 'female'
    else:
        return 'unknown'

df['gender'] = df['Cust_name'].apply(infer_gender_from_name)
gender_spent = df.groupby('gender')['Amount'].sum()
print(gender_spent)
gender_spent.plot(kind='bar', title='Sum of Amount Spent by Gender',
                  ylabel='Amount Spent', xlabel='Gender')
plt.show()
OUTPUT:-

gender
female     18725393.49
male       22052081.95
unknown    65471656.99
Name: Amount, dtype: float64
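
The 'unknown' total above is larger than the male and female totals combined, which suggests that name-based inference leaves most customers unclassified. The output of task 3 shows that the dataset also carries a Gender column with the values F and M, so one possible alternative is to group on that column directly. A minimal sketch, assuming the columns are named Gender and Amount and using a small made-up frame in place of the real CSV:

```python
import pandas as pd

# Made-up rows standing in for 'Diwali Sales Data.csv' (illustrative values only).
sample = pd.DataFrame({
    'Gender': ['F', 'M', 'F', 'M', 'F'],
    'Amount': [100.0, 250.0, 50.0, None, 300.0],
})

# sum() skips missing amounts (NaN) by default, so no explicit dropna is needed.
amount_by_gender = sample.groupby('Gender')['Amount'].sum()
print(amount_by_gender)
```

With the real file, `sample` would be replaced by the DataFrame returned from pd.read_csv.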

3)Count each age group and provide individual counts grouped by gender.

PROGRAM:-

import pandas as pd
import matplotlib.pyplot as plt
from gender_guesser.detector import Detector

file_path = 'Diwali Sales Data.csv'
try:
    df = pd.read_csv(file_path, encoding='latin-1')
except FileNotFoundError:
    print('file not found error')
    exit()

detector = Detector()

def infer_gender_from_name(name):
    gender = detector.get_gender(name)
    if gender == 'male' or gender == 'mostly_male':
        return 'male'
    elif gender == 'female' or gender == 'mostly_female':
        return 'female'
    else:
        return 'unknown'

df['gender'] = df['Cust_name'].apply(infer_gender_from_name)

# Age-group edges and labels (pd.cut needs one more edge than labels).
bins = [0, 18, 26, 36, 46, 51, 56, float('inf')]
labels = ['0-17', '18-25', '26-35', '36-45', '46-50', '51-55', '55+']

# The counts use the dataset's own 'Age Group' and 'Gender' columns.
grouped = df.groupby(['Age Group', 'Gender']).size().unstack(fill_value=0)
print(grouped)
grouped.plot(kind='bar', stacked=True, title='Count of Age Groups by Gender',
             ylabel='Count', xlabel='Age Group')
plt.show()

OUTPUT:-

Gender        F     M
Age Group
0-17        162   134
18-25      1305   574
26-35      3271  1272
36-45      1581   705
46-50       696   291
51-55       554   278
55+         273   155
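
Raw counts make it hard to compare age groups of different sizes. The sketch below re-enters the counts from the output above and converts each row to percentages, so the gender split within every age group can be read directly. The variable names are illustrative only.

```python
import pandas as pd

# Counts copied from the printed output of this task.
counts = pd.DataFrame(
    {'F': [162, 1305, 3271, 1581, 696, 554, 273],
     'M': [134, 574, 1272, 705, 291, 278, 155]},
    index=pd.Index(['0-17', '18-25', '26-35', '36-45', '46-50', '51-55', '55+'],
                   name='Age Group'),
)

# Divide each row by its own total to get the share per gender in that age group.
share = counts.div(counts.sum(axis=1), axis=0) * 100
print(share.round(1))
```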
4)Plot the total amount spent by each age group.

PROGRAM:-

import pandas as pd
import matplotlib.pyplot as plt

file_path = 'Diwali Sales Data.csv'
try:
    df = pd.read_csv(file_path, encoding='latin-1')
except FileNotFoundError:
    print('file not found error')
    exit()

# pd.cut needs one more bin edge than labels; with right=False each
# interval is [left, right), so [0, 18) covers ages 0-17, and so on.
bins = [0, 18, 26, 36, 46, 51, 56, float('inf')]
labels = ['0-17', '18-25', '26-35', '36-45', '46-50', '51-55', '55+']
df['Age Group'] = pd.cut(df['Age'], bins, labels=labels, right=False)
age_group_spent = df.groupby('Age Group', observed=False)['Amount'].sum()
age_group_spent.plot(kind='bar', title='Total Amount Spent by Age Group',
                     ylabel='Amount Spent', xlabel='Age Group')
plt.show()
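
pd.cut expects exactly one more bin edge than labels. The sketch below uses made-up ages to show one way of defining edges for the seven age groups. Placing age 55 in '51-55' rather than '55+' is an assumption, since the two labels overlap at that value.

```python
import pandas as pd

# Made-up ages chosen to hit the boundaries of each group.
ages = pd.Series([12, 18, 25, 26, 45, 50, 55, 70])

# Eight edges define seven half-open intervals [left, right) when right=False.
bins = [0, 18, 26, 36, 46, 51, 56, float('inf')]
labels = ['0-17', '18-25', '26-35', '36-45', '46-50', '51-55', '55+']

age_group = pd.cut(ages, bins=bins, labels=labels, right=False)
print(age_group.tolist())
```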
OUTPUT:-
5)Plot a graph depicting the total number of orders from the top 10 states.

PROGRAM:-

import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv('Diwali Sales Data.csv',encoding='unicode_escape')
state_order_counts=df['State'].value_counts()
top_10_states=state_order_counts.head(10)
plt.figure(figsize=(10,6))
top_10_states.plot(kind='bar')
plt.title('Total Number of Orders from Top 10 States')
plt.xlabel('State')
plt.ylabel('Number of Orders')
plt.xticks(rotation=45)
plt.show()
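
value_counts() counts rows per state. If a single row can represent more than one order, summing the Orders column may give a different ranking. A small sketch with made-up rows (the State and Orders column names are the ones used by the programs in this document; the values are invented):

```python
import pandas as pd

# Made-up rows: Delhi has the most rows, Kerala the most orders.
sample = pd.DataFrame({
    'State': ['Kerala', 'Kerala', 'Delhi', 'Delhi', 'Delhi', 'Gujarat'],
    'Orders': [4, 3, 1, 1, 2, 2],
})

rows_per_state = sample['State'].value_counts()                         # number of rows
orders_per_state = sample.groupby('State')['Orders'].sum().nlargest(10)  # summed orders

print(rows_per_state)
print(orders_per_state)
```

Which of the two matches "total number of orders" depends on how the Orders column is defined in the dataset.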

OUTPUT:-
6)Determine the total amount spent in the top 10 states.

PROGRAM:-

import pandas as pd

df=pd.read_csv('Diwali Sales Data.csv',encoding='unicode_escape')
state_totals=df.groupby('State')['Amount'].sum()
top_10_states=state_totals.nlargest(10)
total_amount_spent=top_10_states.sum()
print(f"Total amount spent in the top 10 states:${total_amount_spent}")
OUTPUT:-
Total amount spent in the top 10 states:$92220541.44

7)Plot a comparison graph between the number of married and unmarried


individuals.
PROGRAM:-

import pandas as pd
import matplotlib.pyplot as plt

df=pd.read_csv('Diwali Sales Data.csv',encoding='unicode_escape')
num_married=df['Marital_Status'].sum() #Assuming 1 represents married
num_unmarried=(df['Marital_Status']==0).sum() #Assuming 0 represents unmarried
Statuses=['Married','Unmarried']
counts=[num_married,num_unmarried]
plt.figure(figsize=(8,6))
plt.bar(Statuses,counts,color=['blue','orange'])
plt.title('Comparison between Married and Unmarried Individuals')
plt.xlabel('Marital Status')
plt.ylabel('Count')
plt.show()

OUTPUT:-

8)Plot the amount spent by males and females based on marital status.

PROGRAM:-

import pandas as pd
import matplotlib.pyplot as plt

df=pd.read_csv('Diwali Sales Data.csv',encoding='unicode_escape')
grouped=df.groupby(['Gender','Marital_Status'])['Amount'].sum().reset_index()
fig,ax=plt.subplots(figsize=(8,6))
male_data=grouped[grouped['Gender']=='M']
female_data=grouped[grouped['Gender']=='F']
bar_width=0.35
bar_positions_male=male_data['Marital_Status']
bar_positions_female=female_data['Marital_Status']+bar_width
ax.bar(bar_positions_male,male_data['Amount'],width=bar_width,label='Male')
ax.bar(bar_positions_female,female_data['Amount'],width=bar_width,label='Female')
ax.set_xlabel('Marital Status(0: Single, 1: Married)')
ax.set_ylabel('Amount Spent')
ax.set_title('Amount Spent by Marital Status and Gender')
#Place one tick at the centre of each pair of bars before labelling it
ax.set_xticks([bar_width/2,1+bar_width/2])
ax.set_xticklabels(['Single','Married'])
ax.legend()
plt.tight_layout()
plt.show()

OUTPUT:-
9)Plot the count of each occupation present in the dataset.

PROGRAM:-

import pandas as pd
import matplotlib.pyplot as plt

df=pd.read_csv('Diwali Sales Data.csv',encoding='unicode_escape')
occupation_counts=df['Occupation'].value_counts()
plt.figure(figsize=(10,6))
occupation_counts.plot(kind='bar',color='skyblue')
plt.title('Count of Each Occupation')
plt.xlabel('Occupation')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

OUTPUT:-

10)Plot the amount spent by each occupation in descending order.

PROGRAM:-
import pandas as pd
import matplotlib.pyplot as plt

df=pd.read_csv('Diwali Sales Data.csv',encoding='unicode_escape')
occupation_amounts=df.groupby('Occupation')['Amount'].sum()
occupation_amounts_sorted=occupation_amounts.sort_values(ascending=False)
plt.figure(figsize=(10,6))
occupation_amounts_sorted.plot(kind='bar',color='green')
plt.title('Total Amount Spent by Each Occupation(Descending Order)')
plt.xlabel('Occupation')
plt.ylabel('Total Amount Spent')
plt.xticks(rotation=45)
plt.show()

OUTPUT:-

11)Provide a statistical analysis of each product category based on the percentage of orders
completed.

PROGRAM:-

import pandas as pd
import matplotlib.pyplot as plt

df=pd.read_csv('Diwali Sales Data.csv',encoding='unicode_escape')
grouped=df.groupby('Product_Category')
total_orders=grouped.size()
#Assuming orders completed per category (assuming orders>2 are completed)
orders_completed=grouped['Orders'].apply(lambda x:(x>2).sum())
percentage_completed=(orders_completed/total_orders)*100
summary_df=pd.DataFrame({
    'Total Orders':total_orders,
    'Orders Completed':orders_completed,
    'Percentage Completed':percentage_completed
})
print(summary_df)
plt.figure(figsize=(10,6))
percentage_completed.plot(kind='bar',color='brown')
plt.title('Percentage of Orders Completed by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Percentage Completed (%)')
plt.xticks(rotation=45)
plt.grid(axis='y')
plt.tight_layout()
plt.show()

OUTPUT:-

                       Total Orders  Orders Completed  Percentage Completed
Product_Category
Auto                            100                48             48.000000
Beauty                          422               226             53.554502
Books                           103                48             46.601942
Clothing & Apparel             2655              1314             49.491525
Decor                            96                42             43.750000
Electronics & Gadgets          2087              1026             49.161476
Food                           2493              1190             47.733654
Footwear & Shoes               1064               547             51.409774
Furniture                       353               181             51.274788
Games & Toys                    386               172             44.559585
Hand & Power Tools               26                19             73.076923
Household items                 520               266             51.153846
Office                          113                49             43.362832
Pet Care                        212               109             51.415094
Sports Products                 356               175             49.157303
Stationery                      112                56             50.000000
Tupperware                       72                31             43.055556
Veterinary                       81                44             54.320988
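
The program above treats rows with Orders > 2 as completed, which is an assumption rather than a recorded completion status. Another possible reading of the task is to summarise each category's order quantities and its share of all orders. A minimal sketch with made-up rows (the `share_of_orders_pct` column name is illustrative):

```python
import pandas as pd

# Made-up rows standing in for the real dataset.
sample = pd.DataFrame({
    'Product_Category': ['Food', 'Food', 'Auto', 'Auto', 'Auto', 'Books'],
    'Orders': [1, 3, 2, 4, 4, 2],
})

# Per-category row count, summed orders and mean orders per row.
stats = sample.groupby('Product_Category')['Orders'].agg(['count', 'sum', 'mean'])

# Each category's share of all orders, in percent.
stats['share_of_orders_pct'] = stats['sum'] / stats['sum'].sum() * 100
print(stats.sort_values('share_of_orders_pct', ascending=False))
```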
12)Determine the budget spent on each product category in descending order.

PROGRAM:-

import pandas as pd
import matplotlib.pyplot as plt

df=pd.read_csv('Diwali Sales Data.csv',encoding='unicode_escape')
df_sorted=df.sort_values(by='Amount',ascending=False)
print("Budget Spent on Each Product Category in Descending Order:")
print(df_sorted[['Product_Category','Amount']])

OUTPUT:-

Budget Spent on Each Product Category in Descending Order:
     Product_Category   Amount
0                Auto  23952.0
1                Auto  23934.0
2                Auto  23924.0
3                Auto  23912.0
4                Auto  23877.0
..                ...      ...
344         Furniture      NaN
345  Footwear & Shoes      NaN
452              Food      NaN
464              Food      NaN
493              Food      NaN

[11251 rows x 2 columns]
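
The listing above is sorted row by row, so the same category appears many times and rows with a missing Amount end up at the bottom. To get one total per category, the amounts can be summed per Product_Category before sorting. A minimal sketch with made-up rows:

```python
import pandas as pd

# Made-up rows; the None mimics the missing amounts seen in the output above.
sample = pd.DataFrame({
    'Product_Category': ['Auto', 'Food', 'Auto', 'Food', 'Furniture'],
    'Amount': [200.0, 150.0, 100.0, None, 50.0],
})

# Sum the amount per category (NaN is skipped), then sort largest first.
category_budget = (
    sample.groupby('Product_Category')['Amount']
    .sum()
    .sort_values(ascending=False)
)
print(category_budget)
```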
13)Conclude with a detailed explanation of the insights gained from the
dataset.

ANSWER:-

Analyzing the Diwali Sales Dataset can provide valuable insights into
consumer behaviour and market trends. Here’s a detailed explanation of
the insights that could be gained from each column.

1)User_Id and Cust_Name:-

These columns primarily identify unique customers. Insights can be
derived regarding customer retention, frequency of purchases, and
personalized marketing strategies based on individual customer
preferences.

2)Product_Id and Product_Category:-

These columns help in understanding which products are most popular
during Diwali Sales. Analysis of product categories can reveal trends
such as preferences for electronics, apparel, home goods, etc., during
festive seasons.

3)Gender, Age Group, Age:-

Demographic information such as gender and age group allows
segmentation analysis. Insights into which demographic groups are the
primary buyers during Diwali can inform targeted marketing.
4)Marital Status:-

Understanding the marital status of the customers can influence product
recommendations and promotions. For instance, married couples might
prefer different products compared to single individuals during festive
shopping.

5)Zone:-

Geographic segmentation (by zone) helps in understanding regional
preferences and variations in purchasing behaviour. It can also guide
logistics and inventory management strategies based on demand patterns
across different regions.

6)Occupation:-

Occupation data provides insights into the purchasing power and
preferences of different professional groups. For example, executives
might have different buying behaviours compared to students or
homemakers during Diwali Sales.

7)Orders and Amount:-


These columns provide quantitative insights into sales volume and
revenue generated. Analysis of order frequency, average order value,
and total sales can help in assessing overall business performance
during the festive period.
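
As a rough illustration of these measures, the sketch below computes total sales and an average order value from made-up rows. It assumes that Amount is the total value of a row and that Orders is the number of orders in that row; if the dataset defines these columns differently, the ratio would need to change.

```python
import pandas as pd

# Made-up rows standing in for the Orders and Amount columns.
sample = pd.DataFrame({
    'Orders': [2, 1, 4, 1],
    'Amount': [400.0, 100.0, 800.0, 300.0],
})

total_sales = sample['Amount'].sum()               # revenue across all rows
total_orders = sample['Orders'].sum()              # number of orders across all rows
average_order_value = total_sales / total_orders   # revenue per order

print(total_sales, total_orders, average_order_value)
```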

Insights and Analysis:-

1)Popular Products:-

Identify which product categories or specific products sell the most
during the Diwali Festive Period. This insight can guide inventory
stocking decisions and promotional strategies.

2)Demographic Preferences:-

Determine whether specific demographic groups (based on age, gender,
marital status, etc.) contribute more significantly to sales. This
information can be used to tailor marketing messages and product
offerings accordingly.

3)Regional Variances:-

Analyse sales data across different zones to uncover regional
preferences and adapt marketing strategies to local tastes.

4)Seasonal Trends:-

Track year-over-year sales data to identify seasonal trends in customer
behaviour. For instance, are there changes in spending patterns or
product preferences compared to non-festive periods?

5)Customer Lifetime Value:-

Utilize User_Id data to calculate Customer Lifetime Value (CLV) and
understand which customer segments are the most valuable to the
business. This can inform customer retention strategies and loyalty
programs.
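
A full CLV estimate needs information such as retention and time horizon that a single festive-period dataset may not contain. As a simple historical proxy, total spend and purchase count per customer can be computed as sketched below. The rows are made up, and the User_Id column name follows the spelling used in this section.

```python
import pandas as pd

# Made-up purchases for three customers.
sample = pd.DataFrame({
    'User_Id': [101, 102, 101, 103, 102, 101],
    'Amount': [500.0, 200.0, 300.0, 100.0, 250.0, 200.0],
})

# Total spend and number of purchases per customer, highest spenders first.
spend_per_customer = (
    sample.groupby('User_Id')['Amount']
    .agg(total_spent='sum', purchases='count')
    .sort_values('total_spent', ascending=False)
)
print(spend_per_customer)
```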

6)Optimization Opportunities:-

Identify opportunities to optimize marketing spend, promotional offers,
and product placements based on data-driven insights from the dataset.
