
The Role of Cultural Traditions: A Predictive Study On Husband's Age and Karwa Chauth

Uploaded by

jmegh03
The Role of Cultural Traditions: A Predictive Study on Husband's Age and Karwa Chauth

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv('karwa_chauth_dataset_new.csv')

df.head()

   Year  Wife's Age  Marriage Duration (Years)  Number of Children  \
0  2022          27                          4                   4
1  2023          32                         18                   4
2  2022          31                          5                   4
3  2022          40                         16                   3
4  2022          40                         14                   4

   Husband's Age Educational Qualification Occupation Cultural Background  \
0             30                  Master's    Teacher        South Indian
1             33                       PhD    Teacher         West Indian
2             35               High School   Engineer        North Indian
3             46                       PhD   Engineer        South Indian
4             33                Bachelor's   Engineer        North Indian

  City/Region Economic Status  ... Cultural Activities Gifts Exchanged  \
0     Kolkata             Low  ...      Cultural dance        Clothing
1     Kolkata            High  ...     Local festivals          Sweets
2     Chennai             Low  ...   Traditional games          Sweets
3   Bangalore            High  ...   Family activities        Clothing
4      Mumbai             Low  ...   Family activities      Cash gifts

   Health Impact   Social Media Trends Public Celebrations  \
0  Positive mood     Trending hashtags   Public gatherings
1   Neutral mood  Engaging discussions   Public gatherings
2   Neutral mood     Trending hashtags         Local fairs
3  Negative mood  Engaging discussions   Public gatherings
4  Positive mood           Viral posts    Cultural parades

        Food Prepared                    Common Myths Economic Impact  \
0  Traditional dishes              Fasting is harmful             Low
1       Savory snacks              Fasting is harmful            High
2  Traditional dishes  Fasting improves relationships        Moderate
3  Traditional dishes  Fasting improves relationships        Moderate
4              Sweets    Fasting leads to weight loss             Low

      Emotional Impact Longevity Influence of Karwa Chauth
0       Mixed emotions        Moderate belief in influence
1         Satisfaction             Low belief in influence
2  Increased happiness             Low belief in influence
3         Satisfaction        Moderate belief in influence
4          Frustration          Strong belief in influence

[5 rows x 27 columns]

df.tail()

      Year  Wife's Age  Marriage Duration (Years)  Number of Children  \
9995  2023          33                         11                   4
9996  2022          29                          2                   2
9997  2023          37                         13                   3
9998  2023          32                          1                   2
9999  2022          31                          5                   3

      Husband's Age Educational Qualification Occupation Cultural Background  \
9995             51                Bachelor's    Teacher         East Indian
9996             55                       PhD    Teacher        North Indian
9997             58               High School    Laborer         East Indian
9998             29                  Master's     Doctor         East Indian
9999             35                Bachelor's    Teacher         West Indian

     City/Region Economic Status  ... Cultural Activities Gifts Exchanged  \
9995      Mumbai          Middle  ...      Cultural dance          Fruits
9996     Chennai            High  ...   Community prayers      Cash gifts
9997      Mumbai             Low  ...   Traditional games         Jewelry
9998     Chennai             Low  ...   Family activities        Clothing
9999     Kolkata          Middle  ...   Family activities          Sweets

      Health Impact   Social Media Trends Public Celebrations  \
9995  Negative mood           Viral posts         Local fairs
9996  Negative mood           Viral posts   Public gatherings
9997  Negative mood  Engaging discussions   Public gatherings
9998  Negative mood  Engaging discussions   Public gatherings
9999  Positive mood  Engaging discussions         Local fairs

           Food Prepared                    Common Myths Economic Impact  \
9995       Savory snacks           Fasting causes stress        Moderate
9996              Sweets    Fasting leads to weight loss        Moderate
9997       Rice, Lentils              Fasting is harmful            High
9998       Savory snacks         Fasting promotes health            High
9999  Traditional dishes  Fasting improves relationships             Low

         Emotional Impact Longevity Influence of Karwa Chauth
9995  Joyful celebrations             Low belief in influence
9996  Joyful celebrations             Low belief in influence
9997  Joyful celebrations        Moderate belief in influence
9998       Mixed emotions        Moderate belief in influence
9999          Frustration             Low belief in influence

[5 rows x 27 columns]

df.shape

(10000, 27)

df.columns

Index(['Year', 'Wife's Age', 'Marriage Duration (Years)', 'Number of Children',
       'Husband's Age', 'Educational Qualification', 'Occupation',
       'Cultural Background', 'City/Region', 'Economic Status',
       'Husband's Health Status', 'Lifestyle Factors',
       'Participation Frequency', 'Perceived Longevity Factors', 'Age Group',
       'Marital Status', 'Traditions Observed', 'Cultural Activities',
       'Gifts Exchanged', 'Health Impact', 'Social Media Trends',
       'Public Celebrations', 'Food Prepared', 'Common Myths',
       'Economic Impact', 'Emotional Impact',
       'Longevity Influence of Karwa Chauth'],
      dtype='object')

df.duplicated().sum()

df.isnull().sum()
Year 0
Wife's Age 0
Marriage Duration (Years) 0
Number of Children 0
Husband's Age 0
Educational Qualification 0
Occupation 0
Cultural Background 0
City/Region 0
Economic Status 0
Husband's Health Status 0
Lifestyle Factors 0
Participation Frequency 0
Perceived Longevity Factors 0
Age Group 0
Marital Status 0
Traditions Observed 0
Cultural Activities 0
Gifts Exchanged 0
Health Impact 0
Social Media Trends 0
Public Celebrations 0
Food Prepared 0
Common Myths 0
Economic Impact 0
Emotional Impact 0
Longevity Influence of Karwa Chauth 0
dtype: int64

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 27 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Year 10000 non-null int64
1 Wife's Age 10000 non-null int64
2 Marriage Duration (Years) 10000 non-null int64
3 Number of Children 10000 non-null int64
4 Husband's Age 10000 non-null int64
5 Educational Qualification 10000 non-null object
6 Occupation 10000 non-null object
7 Cultural Background 10000 non-null object
8 City/Region 10000 non-null object
9 Economic Status 10000 non-null object
10 Husband's Health Status 10000 non-null object
11 Lifestyle Factors 10000 non-null object
12 Participation Frequency 10000 non-null object
13 Perceived Longevity Factors 10000 non-null object
14 Age Group 10000 non-null object
15 Marital Status 10000 non-null object
16 Traditions Observed 10000 non-null object
17 Cultural Activities 10000 non-null object
18 Gifts Exchanged 10000 non-null object
19 Health Impact 10000 non-null object
20 Social Media Trends 10000 non-null object
21 Public Celebrations 10000 non-null object
22 Food Prepared 10000 non-null object
23 Common Myths 10000 non-null object
24 Economic Impact 10000 non-null object
25 Emotional Impact 10000 non-null object
26 Longevity Influence of Karwa Chauth 10000 non-null object
dtypes: int64(5), object(22)
memory usage: 2.1+ MB

df.describe()

               Year    Wife's Age  Marriage Duration (Years)  \
count  10000.000000  10000.000000               10000.000000
mean    2022.498700     32.533700                  10.614000
std        0.500023      4.606689                   5.793717
min     2022.000000     25.000000                   1.000000
25%     2022.000000     29.000000                   6.000000
50%     2022.000000     33.000000                  11.000000
75%     2023.000000     37.000000                  16.000000
max     2023.000000     40.000000                  20.000000

       Number of Children  Husband's Age
count        10000.000000   10000.000000
mean             2.059600      42.423300
std              1.420017      10.389221
min              0.000000      25.000000
25%              1.000000      33.000000
50%              2.000000      42.000000
75%              3.000000      51.000000
max              4.000000      60.000000

df.nunique()

Year 2
Wife's Age 16
Marriage Duration (Years) 20
Number of Children 5
Husband's Age 36
Educational Qualification 4
Occupation 5
Cultural Background 4
City/Region 5
Economic Status 3
Husband's Health Status 3
Lifestyle Factors 3
Participation Frequency 2
Perceived Longevity Factors 5
Age Group 3
Marital Status 2
Traditions Observed 4
Cultural Activities 5
Gifts Exchanged 5
Health Impact 3
Social Media Trends 3
Public Celebrations 3
Food Prepared 4
Common Myths 5
Economic Impact 3
Emotional Impact 5
Longevity Influence of Karwa Chauth 3
dtype: int64

object_columns = df.select_dtypes(include=['object']).columns
print("Object type columns:")
print(object_columns)

numerical_columns = df.select_dtypes(include=['int64',
'float64']).columns
print("\nNumerical type columns:")
print(numerical_columns)

Object type columns:
Index(['Educational Qualification', 'Occupation', 'Cultural Background',
       'City/Region', 'Economic Status', 'Husband's Health Status',
       'Lifestyle Factors', 'Participation Frequency',
       'Perceived Longevity Factors', 'Age Group', 'Marital Status',
       'Traditions Observed', 'Cultural Activities', 'Gifts Exchanged',
       'Health Impact', 'Social Media Trends', 'Public Celebrations',
       'Food Prepared', 'Common Myths', 'Economic Impact', 'Emotional Impact',
       'Longevity Influence of Karwa Chauth'],
      dtype='object')

Numerical type columns:
Index(['Year', 'Wife's Age', 'Marriage Duration (Years)', 'Number of Children',
       'Husband's Age'],
      dtype='object')

def classify_features(df):
    # Split columns by dtype and by number of distinct values (threshold: 10)
    categorical_features = []
    non_categorical_features = []
    discrete_features = []
    continuous_features = []

    for column in df.columns:
        if df[column].dtype == 'object':
            if df[column].nunique() < 10:
                categorical_features.append(column)
            else:
                non_categorical_features.append(column)
        elif df[column].dtype in ['int64', 'float64']:
            if df[column].nunique() < 10:
                discrete_features.append(column)
            else:
                continuous_features.append(column)

    return (categorical_features, non_categorical_features,
            discrete_features, continuous_features)

categorical, non_categorical, discrete, continuous = classify_features(df)

print("Categorical Features:", categorical)
print("Non-Categorical Features:", non_categorical)
print("Discrete Features:", discrete)
print("Continuous Features:", continuous)

Categorical Features: ['Educational Qualification', 'Occupation',
'Cultural Background', 'City/Region', 'Economic Status',
"Husband's Health Status", 'Lifestyle Factors', 'Participation Frequency',
'Perceived Longevity Factors', 'Age Group', 'Marital Status',
'Traditions Observed', 'Cultural Activities', 'Gifts Exchanged',
'Health Impact', 'Social Media Trends', 'Public Celebrations',
'Food Prepared', 'Common Myths', 'Economic Impact', 'Emotional Impact',
'Longevity Influence of Karwa Chauth']
Non-Categorical Features: []
Discrete Features: ['Year', 'Number of Children']
Continuous Features: ["Wife's Age", 'Marriage Duration (Years)', "Husband's Age"]

# Histogram with KDE overlay for each continuous feature
for i in continuous:
    plt.figure(figsize=(15, 6))
    sns.histplot(df[i], bins=20, kde=True)
    plt.xticks(rotation=90)
    plt.show()

# Density-scaled histogram with KDE overlay
for i in continuous:
    plt.figure(figsize=(15, 6))
    sns.histplot(df[i], bins=20, kde=True, stat='density')
    plt.xticks(rotation=90)
    plt.show()

# Box plot for each continuous feature
for i in continuous:
    plt.figure(figsize=(15, 6))
    sns.boxplot(x=i, data=df, palette='hls')
    plt.xticks(rotation=90)
    plt.show()

for i in discrete:
    print(i)
    print(df[i].unique())
    print()

Year
[2022 2023]

Number of Children
[4 3 0 1 2]

for i in discrete:
    print(i)
    print(df[i].value_counts())
    print()
Year
Year
2022 5013
2023 4987
Name: count, dtype: int64

Number of Children
Number of Children
4 2151
3 2026
2 2009
0 1918
1 1896
Name: count, dtype: int64

# Count plot with the count annotated above each bar
for i in discrete:
    plt.figure(figsize=(15, 6))
    ax = sns.countplot(x=i, data=df, palette='hls')

    for p in ax.patches:
        height = p.get_height()
        ax.annotate(f'{height}',
                    xy=(p.get_x() + p.get_width() / 2., height),
                    xytext=(0, 10),
                    textcoords='offset points',
                    ha='center', va='center')

    plt.show()

import plotly.express as px

for i in discrete:
    counts = df[i].value_counts()
    fig = px.pie(counts, values=counts.values, names=counts.index,
                 title=f'Distribution of {i}')
    fig.show()

for i in categorical:
    print(i)
    print(df[i].unique())
    print()

Educational Qualification
["Master's" 'PhD' 'High School' "Bachelor's"]

Occupation
['Teacher' 'Engineer' 'Doctor' 'Laborer' 'Businessman']

Cultural Background
['South Indian' 'West Indian' 'North Indian' 'East Indian']

City/Region
['Kolkata' 'Chennai' 'Bangalore' 'Mumbai' 'Delhi']

Economic Status
['Low' 'High' 'Middle']

Husband's Health Status
['Good' 'Poor' 'Average']

Lifestyle Factors
['Non-smoker, Non-drinker' 'Smoker, Non-drinker' 'Occasional drinker']

Participation Frequency
['Occasionally' 'Every year']

Perceived Longevity Factors
['Nutritional supplements' 'Healthy diet, Regular exercise'
 'Balanced diet, Active lifestyle' 'Family history of longevity'
 'Stress management']

Age Group
['25-30' '36-40' '31-35']

Marital Status
['Married' 'Divorced']

Traditions Observed
['Fasting, Offerings' 'Fasting, Evening Puja' 'Fasting, Prayer'
'Fasting, Rituals']

Cultural Activities
['Cultural dance' 'Local festivals' 'Traditional games'
'Family activities' 'Community prayers']

Gifts Exchanged
['Clothing' 'Sweets' 'Cash gifts' 'Jewelry' 'Fruits']

Health Impact
['Positive mood' 'Neutral mood' 'Negative mood']

Social Media Trends
['Trending hashtags' 'Engaging discussions' 'Viral posts']

Public Celebrations
['Public gatherings' 'Local fairs' 'Cultural parades']

Food Prepared
['Traditional dishes' 'Savory snacks' 'Sweets' 'Rice, Lentils']

Common Myths
['Fasting is harmful' 'Fasting improves relationships'
'Fasting leads to weight loss' 'Fasting causes stress'
'Fasting promotes health']

Economic Impact
['Low' 'High' 'Moderate']

Emotional Impact
['Mixed emotions' 'Satisfaction' 'Increased happiness' 'Frustration'
'Joyful celebrations']

Longevity Influence of Karwa Chauth
['Moderate belief in influence' 'Low belief in influence'
 'Strong belief in influence']

for i in categorical:
    print(i)
    print(df[i].value_counts())
    print()
Educational Qualification
Educational Qualification
High School 2510
Bachelor's 2508
PhD 2506
Master's 2476
Name: count, dtype: int64

Occupation
Occupation
Businessman 2041
Teacher 2033
Engineer 2015
Doctor 1970
Laborer 1941
Name: count, dtype: int64

Cultural Background
Cultural Background
West Indian 2538
North Indian 2526
South Indian 2492
East Indian 2444
Name: count, dtype: int64

City/Region
City/Region
Mumbai 2050
Chennai 2021
Kolkata 1999
Delhi 1976
Bangalore 1954
Name: count, dtype: int64

Economic Status
Economic Status
Low 3391
Middle 3332
High 3277
Name: count, dtype: int64

Husband's Health Status
Husband's Health Status
Poor 3374
Good 3347
Average 3279
Name: count, dtype: int64

Lifestyle Factors
Lifestyle Factors
Non-smoker, Non-drinker 3362
Smoker, Non-drinker 3357
Occasional drinker 3281
Name: count, dtype: int64

Participation Frequency
Participation Frequency
Occasionally 5027
Every year 4973
Name: count, dtype: int64

Perceived Longevity Factors
Perceived Longevity Factors
Family history of longevity 2066
Balanced diet, Active lifestyle 2008
Stress management 2008
Nutritional supplements 1976
Healthy diet, Regular exercise 1942
Name: count, dtype: int64

Age Group
Age Group
25-30 3427
36-40 3326
31-35 3247
Name: count, dtype: int64

Marital Status
Marital Status
Divorced 5029
Married 4971
Name: count, dtype: int64

Traditions Observed
Traditions Observed
Fasting, Rituals 2591
Fasting, Offerings 2511
Fasting, Evening Puja 2449
Fasting, Prayer 2449
Name: count, dtype: int64

Cultural Activities
Cultural Activities
Traditional games 2006
Community prayers 2006
Local festivals 2002
Family activities 1998
Cultural dance 1988
Name: count, dtype: int64
Gifts Exchanged
Gifts Exchanged
Cash gifts 2019
Clothing 2010
Sweets 2000
Jewelry 1988
Fruits 1983
Name: count, dtype: int64

Health Impact
Health Impact
Neutral mood 3356
Positive mood 3334
Negative mood 3310
Name: count, dtype: int64

Social Media Trends
Social Media Trends
Viral posts 3395
Trending hashtags 3319
Engaging discussions 3286
Name: count, dtype: int64

Public Celebrations
Public Celebrations
Cultural parades 3343
Local fairs 3338
Public gatherings 3319
Name: count, dtype: int64

Food Prepared
Food Prepared
Savory snacks 2571
Rice, Lentils 2487
Sweets 2477
Traditional dishes 2465
Name: count, dtype: int64

Common Myths
Common Myths
Fasting causes stress 2066
Fasting improves relationships 2030
Fasting is harmful 1993
Fasting leads to weight loss 1984
Fasting promotes health 1927
Name: count, dtype: int64

Economic Impact
Economic Impact
Moderate 3350
High 3333
Low 3317
Name: count, dtype: int64

Emotional Impact
Emotional Impact
Joyful celebrations 2044
Mixed emotions 2017
Satisfaction 1997
Increased happiness 1979
Frustration 1963
Name: count, dtype: int64

Longevity Influence of Karwa Chauth
Longevity Influence of Karwa Chauth
Low belief in influence 3358
Strong belief in influence 3332
Moderate belief in influence 3310
Name: count, dtype: int64

for i in categorical:
    plt.figure(figsize=(15, 6))
    ax = sns.countplot(x=i, data=df, palette='hls')

    for p in ax.patches:
        height = p.get_height()
        ax.annotate(f'{height}',
                    xy=(p.get_x() + p.get_width() / 2., height),
                    xytext=(0, 10),
                    textcoords='offset points',
                    ha='center', va='center')

    plt.show()

for i in categorical:
    counts = df[i].value_counts()
    fig = px.pie(counts, values=counts.values, names=counts.index,
                 title=f'Distribution of {i}')
    fig.show()

for feature in categorical:
    df[feature] = df[feature].astype('category')

sns.set(style="whitegrid")

continuous_features = ["Wife's Age", 'Marriage Duration (Years)', "Husband's Age"]

# One grid of box plots per continuous feature, one panel per categorical feature
for continuous_feature in continuous_features:
    num_features = len(categorical)
    cols = 4
    rows = num_features // cols + (num_features % cols > 0)

    plt.figure(figsize=(20, 5 * rows))
    for i, feature in enumerate(categorical):
        plt.subplot(rows, cols, i + 1)
        sns.boxplot(x=feature, y=continuous_feature, data=df)
        plt.title(f'{continuous_feature} by {feature}')
        plt.xticks(rotation=45)

    plt.tight_layout()
    plt.show()

from scipy import stats

# One-way ANOVA of Husband's Age across the levels of each categorical feature
anova_results = {}
for feature in ['Educational Qualification', 'Occupation', 'Cultural Background']:
    groups = [group[continuous_features[2]] for name, group in df.groupby(feature, observed=True)]
    anova_results[feature] = stats.f_oneway(*groups)

for feature, result in anova_results.items():
    print(f'ANOVA results for {feature}: F-statistic = {result.statistic}, p-value = {result.pvalue}')
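
A one-way ANOVA of this kind can be tried on synthetic data first. The sketch below is only an illustration: the three groups are hypothetical, drawn from the same normal distribution, and the seed, mean, spread, and group size are assumptions rather than values from the dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Three hypothetical groups with the same underlying mean (no real group effect)
groups = [rng.normal(loc=42, scale=10, size=200) for _ in range(3)]

# f_oneway takes one array per group and returns the F-statistic and p-value
result = stats.f_oneway(*groups)
print(f"F-statistic = {result.statistic:.3f}, p-value = {result.pvalue:.3f}")
```

A small p-value would suggest that at least one group mean differs. With groups drawn from a single distribution, as here, a large p-value is the usual outcome.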

corr = df[["Wife's Age", 'Marriage Duration (Years)', "Husband's Age"]].corr()
plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, fmt=".2f", cmap='coolwarm', square=True)
plt.title('Correlation Matrix for Continuous Variables')
plt.show()
# Ordered levels for participation; only 'Occasionally' and 'Every year' occur in this dataset
participation_order = ['Never', 'Rarely', 'Occasionally', 'Every year', 'Sometimes', 'Often', 'Always']

df['Participation Frequency'] = pd.Categorical(df['Participation Frequency'],
                                               categories=participation_order,
                                               ordered=True)

plt.figure(figsize=(10, 6))
sns.stripplot(x='Participation Frequency', y="Husband's Age", data=df, jitter=True)
plt.title("Husband's Age vs Participation Frequency in Karwa Chauth")
plt.xlabel('Participation Frequency')
plt.ylabel("Husband's Age")
plt.xticks(rotation=45)
plt.show()

plt.figure(figsize=(10, 6))
sns.boxplot(x='Participation Frequency', y="Husband's Age", data=df)
plt.title("Distribution of Husband's Age by Participation Frequency in Karwa Chauth")
plt.xlabel('Participation Frequency')
plt.ylabel("Husband's Age")
plt.xticks(rotation=45)
plt.show()

plt.figure(figsize=(10, 6))
sns.violinplot(x='Participation Frequency', y="Husband's Age", data=df)
plt.title("Distribution of Husband's Age by Participation Frequency in Karwa Chauth")
plt.xlabel('Participation Frequency')
plt.ylabel("Husband's Age")
plt.xticks(rotation=45)
plt.show()

# errorbar='sd' draws one standard deviation around the mean (seaborn >= 0.12)
plt.figure(figsize=(10, 6))
sns.pointplot(x='Participation Frequency', y="Husband's Age", data=df, errorbar='sd')
plt.title("Average Husband's Age by Participation Frequency in Karwa Chauth")
plt.xlabel('Participation Frequency')
plt.ylabel("Husband's Age")
plt.xticks(rotation=45)
plt.show()

plt.figure(figsize=(10, 6))
sns.barplot(x='Participation Frequency', y="Husband's Age", data=df, errorbar='sd')
plt.title("Mean Husband's Age by Participation Frequency in Karwa Chauth")
plt.xlabel('Participation Frequency')
plt.ylabel("Husband's Age")
plt.xticks(rotation=45)
plt.show()

plt.figure(figsize=(10, 6))
sns.swarmplot(x='Participation Frequency', y="Husband's Age", data=df)
plt.title("Husband's Age Distribution by Participation Frequency in Karwa Chauth")
plt.xlabel('Participation Frequency')
plt.ylabel("Husband's Age")
plt.xticks(rotation=45)
plt.show()

sns.catplot(x='Participation Frequency', y="Husband's Age", data=df,
            kind='box', height=6, aspect=2)
plt.title("Box Plot of Husband's Age by Participation Frequency in Karwa Chauth")
plt.xlabel('Participation Frequency')
plt.ylabel("Husband's Age")
plt.xticks(rotation=45)
plt.show()

plt.figure(figsize=(10, 6))
sns.boxplot(x='Participation Frequency', y="Husband's Age",
            hue='City/Region', data=df)
plt.title("Husband's Age by Participation Frequency in Karwa Chauth (by Region)")
plt.xlabel('Participation Frequency')
plt.ylabel("Husband's Age")
plt.xticks(rotation=45)
plt.legend(title='Region', bbox_to_anchor=(1, 1))
plt.show()

plt.figure(figsize=(10, 6))
sns.barplot(x='Participation Frequency', y="Husband's Age",
            hue='City/Region', data=df, errorbar='sd')
plt.title("Mean Husband's Age by Participation Frequency in Karwa Chauth (by Region)")
plt.xlabel('Participation Frequency')
plt.ylabel("Husband's Age")
plt.xticks(rotation=45)
plt.legend(title='Region', bbox_to_anchor=(1, 1))
plt.show()

pivot_1 = df.pivot_table(values="Husband's Age",
index='Participation Frequency',
aggfunc='mean')
print(pivot_1)

Husband's Age
Participation Frequency
Occasionally 42.440024
Every year 42.406395

pivot_2 = df.pivot_table(values="Husband's Age",
                         index='Participation Frequency',
                         aggfunc=['mean', 'min', 'max', 'count'])
print(pivot_2)

                                 mean           min           max  \
                        Husband's Age Husband's Age Husband's Age
Participation Frequency
Occasionally                42.440024          25.0          60.0
Every year                  42.406395          25.0          60.0
Never                             NaN           NaN           NaN
Rarely                            NaN           NaN           NaN
Sometimes                         NaN           NaN           NaN
Often                             NaN           NaN           NaN
Always                            NaN           NaN           NaN

                                count
                        Husband's Age
Participation Frequency
Occasionally                     5027
Every year                       4973
Never                               0
Rarely                              0
Sometimes                           0
Often                               0
Always                              0
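
The empty rows for Never, Rarely, Sometimes, Often and Always come from category levels that were declared but never occur in the data. A minimal sketch on a toy frame (the `toy` data and its values are made up for illustration) shows the same effect with a grouped count, and how `Series.cat.remove_unused_categories()` drops the unused levels:

```python
import pandas as pd

# Hypothetical toy data: only two of the four declared levels are present
toy = pd.DataFrame({
    "freq": ["Occasionally", "Every year", "Occasionally"],
    "age": [30, 40, 50],
})
order = ["Never", "Occasionally", "Every year", "Always"]
toy["freq"] = pd.Categorical(toy["freq"], categories=order, ordered=True)

# observed=False keeps every declared category, so unused levels show a count of 0
counts_all = toy.groupby("freq", observed=False)["age"].count()

# Dropping unused levels leaves only the categories that actually appear
toy["freq"] = toy["freq"].cat.remove_unused_categories()
counts_used = toy.groupby("freq", observed=False)["age"].count()

print(counts_all)
print(counts_used)
```

Whether to keep the unused levels is a design choice: they document the intended scale, but they also add empty rows and empty axis positions to every summary and plot.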

pivot_3 = df.pivot_table(values="Husband's Age",
                         index='Participation Frequency',
                         columns='City/Region',
                         aggfunc='mean')
print(pivot_3)

City/Region              Bangalore    Chennai      Delhi    Kolkata     Mumbai
Participation Frequency
Occasionally             42.084275  42.566440  42.445771  42.266599  42.805582
Every year               42.135576  42.777778  42.622039  42.700000  41.802176

pivot_4 = df.pivot_table(values=["Husband's Age", "Wife's Age"],
                         index='Participation Frequency',
                         aggfunc='mean')
print(pivot_4)

                         Husband's Age  Wife's Age
Participation Frequency
Occasionally                 42.440024   32.480008
Every year                   42.406395   32.587975

pivot_5 = df.pivot_table(values="Husband's Age",
                         index='Participation Frequency',
                         columns='City/Region',
                         aggfunc='mean',
                         margins=True,
                         margins_name='Overall')
print(pivot_5)

City/Region              Bangalore    Chennai      Delhi    Kolkata  \
Participation Frequency
Occasionally             42.084275  42.566440  42.445771  42.266599
Every year               42.135576  42.777778  42.622039  42.700000
Overall                  42.110031  42.669965  42.532389  42.487744

City/Region                 Mumbai    Overall
Participation Frequency
Occasionally             42.805582  42.440024
Every year               41.802176  42.406395
Overall                  42.310732  42.423300

from sklearn.model_selection import train_test_split


from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.impute import SimpleImputer

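# pd.cut uses right-closed intervals by default: (0, 40], (40, 60], (60, 100].
# The '<40' label therefore includes age 40, '40-60' covers ages 41-60,
# and '>60' stays empty because the maximum Husband's Age is 60.
bins = [0, 40, 60, 100]
labels = ['<40', '40-60', '>60']
df['Husband_Age_Category'] = pd.cut(df["Husband's Age"], bins=bins, labels=labels)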
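
How `pd.cut` treats the bin edges decides which class the boundary ages 40 and 60 fall into. The sketch below uses the same bins and labels on four made-up ages (the `ages` values are illustrative, not taken from the dataset) and compares the default `right=True` with `right=False`:

```python
import pandas as pd

ages = pd.Series([39, 40, 41, 60])  # hypothetical boundary cases
bins = [0, 40, 60, 100]
labels = ['<40', '40-60', '>60']

# Default: intervals are closed on the right, i.e. (0, 40], (40, 60], (60, 100]
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: intervals are closed on the left, i.e. [0, 40), [40, 60), [60, 100)
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'age': ages,
                    'right=True': right_closed,
                    'right=False': left_closed}))
```

With the default, age 40 lands in '<40' and age 60 in '40-60'. With `right=False`, age 40 moves to '40-60' and age 60 to '>60'.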

features = ["Wife's Age", 'Marriage Duration (Years)', 'Number of Children',
            'Cultural Background', 'Lifestyle Factors', 'Economic Status',
            'Participation Frequency', 'Traditions Observed', 'Health Impact']

X = df[features]
y = df['Husband_Age_Category']

categorical_features = ['Cultural Background', 'Lifestyle Factors', 'Economic Status',
                        'Participation Frequency', 'Traditions Observed', 'Health Impact']
numerical_features = ["Wife's Age", 'Marriage Duration (Years)', 'Number of Children']

# Scale the numeric columns and one-hot encode the categorical ones
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numerical_features),
        ('cat', OneHotEncoder(drop='first', handle_unknown='ignore'),
         categorical_features)
    ])

pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(random_state=42))
])

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    test_size=0.2, random_state=42)

pipeline.fit(X_train, y_train)

Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('num', StandardScaler(),
                                                  ["Wife's Age",
                                                   'Marriage Duration (Years)',
                                                   'Number of Children']),
                                                 ('cat',
                                                  OneHotEncoder(drop='first',
                                                                handle_unknown='ignore'),
                                                  ['Cultural Background',
                                                   'Lifestyle Factors',
                                                   'Economic Status',
                                                   'Participation Frequency',
                                                   'Traditions Observed',
                                                   'Health Impact'])])),
                ('classifier', RandomForestClassifier(random_state=42))])

y_pred = pipeline.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[762 348]
 [597 293]]
              precision    recall  f1-score   support

       40-60       0.56      0.69      0.62      1110
         <40       0.46      0.33      0.38       890

    accuracy                           0.53      2000
   macro avg       0.51      0.51      0.50      2000
weighted avg       0.51      0.53      0.51      2000

from sklearn.tree import DecisionTreeClassifier


dt_pipeline = Pipeline(steps=[
('preprocessor', preprocessor),
('classifier', DecisionTreeClassifier(random_state=42))
])

print("\nTraining Decision Tree Classifier...")


dt_pipeline.fit(X_train, y_train)
y_pred_dt = dt_pipeline.predict(X_test)

Training Decision Tree Classifier...

print("\nDecision Tree Model Evaluation:")


print(confusion_matrix(y_test, y_pred_dt))
print(classification_report(y_test, y_pred_dt))

Decision Tree Model Evaluation:
[[615 495]
 [486 404]]
              precision    recall  f1-score   support

       40-60       0.56      0.55      0.56      1110
         <40       0.45      0.45      0.45       890

    accuracy                           0.51      2000
   macro avg       0.50      0.50      0.50      2000
weighted avg       0.51      0.51      0.51      2000

from sklearn.linear_model import LogisticRegression

lr_pipeline = Pipeline(steps=[
('preprocessor', preprocessor),
('classifier', LogisticRegression(max_iter=1000, random_state=42))
])

print("\nTraining Logistic Regression Classifier...")


lr_pipeline.fit(X_train, y_train)
y_pred_lr = lr_pipeline.predict(X_test)

Training Logistic Regression Classifier...

print("\nLogistic Regression Model Evaluation:")


print(confusion_matrix(y_test, y_pred_lr))
print(classification_report(y_test, y_pred_lr))

Logistic Regression Model Evaluation:
[[1105    5]
 [ 885    5]]
              precision    recall  f1-score   support

       40-60       0.56      1.00      0.71      1110
         <40       0.50      0.01      0.01       890

    accuracy                           0.56      2000
   macro avg       0.53      0.50      0.36      2000
weighted avg       0.53      0.56      0.40      2000
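
To put the three accuracies in context, they can be compared with a majority-class baseline, that is, always predicting '40-60'. The sketch below recomputes accuracy from the confusion matrices printed above and compares it with that baseline. The helper names `accuracy` and `majority_baseline` are introduced here for illustration only.

```python
# Confusion matrices copied from the outputs above
# (rows: true '40-60', true '<40'; columns: predicted '40-60', predicted '<40')
confusion = {
    'Random Forest': [[762, 348], [597, 293]],
    'Decision Tree': [[615, 495], [486, 404]],
    'Logistic Regression': [[1105, 5], [885, 5]],
}

def accuracy(cm):
    # Share of samples on the diagonal (correct predictions)
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

def majority_baseline(cm):
    # Accuracy obtained by always predicting the most frequent true class
    class_sizes = [sum(row) for row in cm]
    return max(class_sizes) / sum(class_sizes)

for name, cm in confusion.items():
    print(f'{name}: accuracy = {accuracy(cm):.4f}, '
          f'majority baseline = {majority_baseline(cm):.4f}')
```

The baseline is 1110 / 2000 = 0.555. Random Forest (0.5275) and Decision Tree (0.5095) fall below it, and Logistic Regression matches it by predicting '40-60' for almost every sample.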

models = ['Random Forest', 'Decision Tree', 'Logistic Regression']

# Scores for the '40-60' class, copied from the classification reports above
precision = [0.56, 0.56, 0.56]
recall = [0.69, 0.55, 1.00]
f1_score = [0.62, 0.56, 0.71]

metrics_df = pd.DataFrame({
'Model': models,
'Precision': precision,
'Recall': recall,
'F1-Score': f1_score
})

sns.set(style="whitegrid")

plt.figure(figsize=(10, 6))

metrics_df_melted = metrics_df.melt(id_vars='Model',
var_name='Metric', value_name='Score')

sns.barplot(data=metrics_df_melted, x='Model', y='Score',


hue='Metric', palette='muted')

plt.title('Comparative Evaluation of Classification Models',


fontsize=16)
plt.xlabel('Models', fontsize=14)
plt.ylabel('Scores', fontsize=14)
plt.ylim(0, 1)
plt.legend(title='Metrics')
plt.grid(axis='y')

plt.show()
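
Instead of typing the scores by hand, they could be read from `classification_report(..., output_dict=True)`. The sketch below is a minimal illustration: `y_true` and `y_hat` are made-up labels standing in for `y_test` and a model's predictions, not results from the dataset.

```python
import pandas as pd
from sklearn.metrics import classification_report

# Hypothetical labels standing in for y_test and one model's predictions
y_true = ['40-60', '40-60', '40-60', '<40', '<40', '<40']
y_hat = ['40-60', '40-60', '<40', '<40', '40-60', '40-60']

# output_dict=True returns nested dicts keyed by class label and metric name
report = classification_report(y_true, y_hat, output_dict=True)

rows = [
    {'Class': label,
     'Precision': report[label]['precision'],
     'Recall': report[label]['recall'],
     'F1-Score': report[label]['f1-score']}
    for label in ['40-60', '<40']
]
metrics = pd.DataFrame(rows)
print(metrics)
```

Collecting the rows for both classes also makes it easy to plot the '<40' scores next to the '40-60' scores.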
Thanks !!!
