Analytics Group Assignment

1. The document discusses analyzing a marketing campaign dataset to understand customer personalities through factors like education, marital status, income, spending habits, and campaign responses. 2. Key attributes in the dataset include customer demographics, spending amounts in various categories, campaign response data, and purchase behaviors. The analysis will examine relationships between these factors. 3. Preliminary EDA found some attributes like income had outliers and missing data, while others like spending amounts were right-skewed. New features like total spending and age were generated for further analysis.

Uploaded by

Alok Patil
Analytics Foundation Group Assignment
Customer Personality Analysis through Marketing Campaign dataset

Amey Lokhande - 22020343043


Ayushi Sharma - 22020343023
Sagnik Mukherjee - 22020343061
Sufiyan Shaikh - 22020343064
Team Lambda
MBA Business Analytics
PROBLEM DEFINITION

Customer Personality Analysis is a detailed analysis of a company’s ideal customers.
The business wants to understand its customers better so that it can more easily modify
products according to the specific needs, behaviors and concerns of different types of
customers.

Customer Personality Analysis will help the business to predict the total spending of a
customer based on various factors. The business wants to understand the association
between marketing campaigns and various attributes of customers, and how to run
successful marketing campaigns for them. It also wants to know customer characteristics,
such as the number of children a customer has, and how buying habits vary with them.

The business wants to understand the spending patterns of customers based on various
factors like Education, Marital status, Number of children, Income, and so on. They want to
find out the correlation between the number of visits to the store by a customer and factors
like income, number of purchases, etc.

EXPLORATORY DATA ANALYSIS

Description of the attributes

● People
○ ID: Customer's unique identifier
○ Year_Birth: (Scale) Customer's birth year
○ Education: (Categorical) Customer's education level
○ Marital_Status: (Categorical) Customer's marital status
○ Income: (Scale) Customer's yearly household income
○ Kidhome: (Scale) Number of children in customer's household
○ Teenhome: (Scale) Number of teenagers in customer's household
○ Dt_Customer: (Categorical) Date of customer's enrollment with the company
○ Recency: (Scale) Number of days since customer's last purchase
○ Complain: (Categorical) 1 if the customer complained in the last 2 years, 0
otherwise
● Products
○ MntWines: (Scale) Amount spent on wine in last 2 years
○ MntFruits: (Scale) Amount spent on fruits in last 2 years

○ MntMeatProducts: (Scale) Amount spent on meat in last 2 years
○ MntFishProducts: (Scale) Amount spent on fish in last 2 years
○ MntSweetProducts: (Scale) Amount spent on sweets in last 2 years
○ MntGoldProds: (Scale) Amount spent on gold in last 2 years
● Promotion
○ NumDealsPurchases: (Scale) Number of purchases made with a discount
○ AcceptedCmp1: (Categorical) 1 if customer accepted the offer in the 1st
campaign, 0 otherwise
○ AcceptedCmp2: (Categorical) 1 if customer accepted the offer in the 2nd
campaign, 0 otherwise
○ AcceptedCmp3: (Categorical) 1 if customer accepted the offer in the 3rd
campaign, 0 otherwise
○ AcceptedCmp4: (Categorical) 1 if customer accepted the offer in the 4th
campaign, 0 otherwise
○ AcceptedCmp5: (Categorical) 1 if customer accepted the offer in the 5th
campaign, 0 otherwise
○ Response: (Categorical) 1 if customer accepted the offer in the last campaign, 0
otherwise
● Place
○ NumWebPurchases: (Scale) Number of purchases made through the company’s
website
○ NumCatalogPurchases: (Scale) Number of purchases made using a catalogue
○ NumStorePurchases: (Scale) Number of purchases made directly in stores
○ NumWebVisitsMonth: (Scale) Number of visits to company’s website in the last
month

Description of dataset

● Number of attributes/columns = 29
● Number of instances/rows = 2240
● No column/attribute has missing data except the Income column
● There are 26 attributes with integer datatype & 3 attributes with factor datatype
● Education - 5 unique values. These will need to be reduced/replaced with something more
meaningful. About half of the customers are graduates.

● Marital_Status - 8 unique values. These will need to be reduced/replaced with something
more meaningful. Approximately 40% of customers are married and 60% are not.
● Income - The maximum and minimum values lie very far from the mean, which suggests
that there are outliers in Income. The standard deviation is also very high, which
means that the data is highly dispersed.
● Kidhome, Teenhome - Maximum value is 2
● MntWines, MntFruits, MntMeatProducts, MntFishProducts, MntSweetProducts,
MntGoldProds - There is a noticeable gap between Q3 & the maximum value, which means
that there may be outliers in the data.
● NumDealsPurchases, NumWebPurchases, NumCatalogPurchases,
NumStorePurchases, NumWebVisitsMonth - There is a noticeable gap between Q3 & the
maximum value, which means that there may be outliers in the data.
● More statistical inferences can be added here

Handling missing/null values and Outliers

Only the Income column has missing data: it contains 24 missing values.

As there are only 24 missing values under the Income column out of a total of 2240
records, we will delete those rows.
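
The row deletion described above can be sketched with pandas as follows. This is a minimal illustration on a toy table, not the report's actual code; the variable names (`df`, `df_clean`) and the sample values are assumptions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the campaign data (the real dataset has 2240 rows).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Income": [58138.0, np.nan, 71613.0, 26646.0, np.nan],
})

# Count missing values per column to confirm which columns are affected.
missing_per_column = df.isna().sum()

# Drop only the rows where Income is missing; other columns are untouched.
df_clean = df.dropna(subset=["Income"]).reset_index(drop=True)

print(missing_per_column)
print(len(df), "rows before,", len(df_clean), "rows after")
```

Deleting rows is reasonable here because the missing share is small (24 of 2240 records); with a larger share, imputation might be preferable.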

Outlier detection & removal

Outliers are present in the following attributes - Income, Age & Total_Spending
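
The report does not state which rule was used to detect and remove outliers. One common option, shown here purely as an assumption, is the 1.5 x IQR fence. The helper names `iqr_bounds` and `drop_outliers` and the toy Income values are hypothetical.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower, upper) fences placed k * IQR beyond Q1 and Q3."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(df, columns, k=1.5):
    """Keep only rows that lie inside the fences for every listed column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        lower, upper = iqr_bounds(df[col], k)
        mask &= df[col].between(lower, upper)
    return df[mask]

# Toy data with one extreme income value (assumed for illustration).
toy = pd.DataFrame(
    {"Income": [30000, 42000, 51000, 58000, 64000, 72000, 666666]}
)
filtered = drop_outliers(toy, ["Income"])
print(filtered)
```

The same function could be applied to Age and Total_Spending once those columns have been generated.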

Feature Generation

Age - Generating a new scale variable called Age from Year_Birth (Age = 2014 -
Year_Birth). We use 2014 as the reference year because this dataset is from 2014.

Total_Spending - Generating a new scale variable for the total amount spent by a customer

Total_Children - Generating a new scale variable called Total_Children from Teenhome &
Kidhome

Has_Child - Generating a new categorical variable called Has_Child that indicates whether
the customer has a child or not
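
A minimal pandas sketch of these generated features follows. Only the Age formula is given in the text; defining Total_Spending as the sum of the six Mnt* columns, Total_Children as Kidhome + Teenhome, and Has_Child as a 0/1 flag are assumptions made for illustration, as are the toy values.

```python
import pandas as pd

# Toy rows using the column names from the attribute description.
toy = pd.DataFrame({
    "Year_Birth": [1957, 1984, 1965],
    "Kidhome": [0, 1, 0],
    "Teenhome": [0, 1, 2],
    "MntWines": [635, 11, 426],
    "MntFruits": [88, 1, 49],
    "MntMeatProducts": [546, 6, 127],
    "MntFishProducts": [172, 2, 111],
    "MntSweetProducts": [88, 1, 21],
    "MntGoldProds": [88, 6, 42],
})

REFERENCE_YEAR = 2014  # stated in the text: the dataset is from 2014
SPEND_COLUMNS = [
    "MntWines", "MntFruits", "MntMeatProducts",
    "MntFishProducts", "MntSweetProducts", "MntGoldProds",
]

features = toy.copy()
features["Age"] = REFERENCE_YEAR - features["Year_Birth"]
# Assumption: total spending is the sum over all product categories.
features["Total_Spending"] = features[SPEND_COLUMNS].sum(axis=1)
# Assumption: children and teenagers at home are simply added together.
features["Total_Children"] = features["Kidhome"] + features["Teenhome"]
# Assumption: Has_Child is encoded as 1 (has at least one child) or 0.
features["Has_Child"] = (features["Total_Children"] > 0).astype(int)

print(features[["Age", "Total_Spending", "Total_Children", "Has_Child"]])
```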

Marital_Status - Updating the levels under Marital_Status & bringing them down from 8 to 2
levels (married and non-married).

Education - Updating the levels under Education and bringing them down from 5 to 2 levels
(Grad and Post Grad).
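
The level reduction can be sketched as below. The report does not list the original levels or the exact mapping, so the level names in the toy data, the rule "only Married counts as married", and the choice of which education levels count as Post Grad are all assumptions.

```python
import pandas as pd

# Toy data; the level names are assumed for illustration.
toy = pd.DataFrame({
    "Marital_Status": ["Married", "Together", "Single", "Divorced", "Widow"],
    "Education": ["Graduation", "PhD", "Master", "Basic", "Graduation"],
})

# Assumption: only the literal level "Married" is treated as married;
# every other level is collapsed into "Non-married".
toy["Marital_Group"] = toy["Marital_Status"].where(
    toy["Marital_Status"] == "Married", "Non-married"
)

# Assumption: PhD and Master are post-graduate; everything else is Grad.
POSTGRAD_LEVELS = {"PhD", "Master"}
toy["Education_Group"] = toy["Education"].isin(POSTGRAD_LEVELS).map(
    {True: "Post Grad", False: "Grad"}
)

print(toy)
```

Writing the result to new columns, rather than overwriting the originals, keeps the raw levels available for checking the mapping.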

Nature of distribution using Kurtosis & Skewness

Income - As the Skewness and Kurtosis for Income are very high, we can say that its
distribution is distorted (not normally distributed).

MntFruits, MntMeatProducts, MntFishProducts, MntSweetProducts, MntGoldProds - All these
are scale variables & they have a slightly higher value of Kurtosis, which means that
they are Leptokurtic in nature.
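
Skewness and kurtosis can be computed with SciPy as sketched below. The data here is synthetic (one roughly normal sample and one right-skewed sample), and the helper name `shape_summary` is hypothetical. Note that `scipy.stats.kurtosis` with `fisher=True` reports excess kurtosis, so a normal distribution scores about 0, positive values indicate a leptokurtic shape, and negative values a platykurtic one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_like = rng.normal(loc=50, scale=10, size=5000)   # symmetric sample
right_skewed = rng.exponential(scale=50, size=5000)     # long right tail

def shape_summary(values):
    """Return skewness and excess kurtosis (normal distribution = 0)."""
    return {
        "skewness": float(stats.skew(values)),
        "excess_kurtosis": float(stats.kurtosis(values, fisher=True)),
    }

normal_summary = shape_summary(normal_like)
skewed_summary = shape_summary(right_skewed)
print("normal-like :", normal_summary)
print("right-skewed:", skewed_summary)
```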

STATISTICAL ANALYSIS

1. Chi-Square

Problem 1 -

Null Hypothesis - There is no association/relation between the Education of a
customer and whether the customer accepts the offer in the first campaign or not

Alternate Hypothesis - There is some association/relation between the Education
of a customer and whether the customer accepts the offer in the first campaign or
not

Conclusion - As the p-value is greater than the 5% level of significance, we do not
reject the Null Hypothesis.

Hence, we can conclude that there is weak or no association between the
education of a customer and whether the customer accepts the offer in the first
campaign or not
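
The chi-square test of independence used in Problem 1 can be sketched as follows. The counts are synthetic and chosen only to illustrate the mechanics; they are not the report's data. The sketch uses `pandas.crosstab` to build the contingency table and `scipy.stats.chi2_contingency` for the test, with a 5% significance level as in the text.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Synthetic data: 100 customers per education group, with similar
# acceptance rates for the first campaign (values are assumed).
toy = pd.DataFrame({
    "Education": ["Grad"] * 100 + ["Post Grad"] * 100,
    "AcceptedCmp1": [1] * 7 + [0] * 93 + [1] * 6 + [0] * 94,
})

# Rows: education level, columns: accepted (1) or not (0).
contingency = pd.crosstab(toy["Education"], toy["AcceptedCmp1"])
chi2, p_value, dof, expected = chi2_contingency(contingency)

ALPHA = 0.05  # level of significance used in the report
reject_null = p_value < ALPHA

print(contingency)
print(f"chi2={chi2:.3f}, dof={dof}, p={p_value:.3f}, reject H0: {reject_null}")
```

The same pattern applies to Problems 2 and 3 by swapping the first column for the marital status or the has-child variable.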

Problem 2 -

Null Hypothesis - There is no association/relation between the marital status of a
customer and whether the customer accepts the offer in the first campaign or not

Alternate Hypothesis - There is some association/relation between the marital
status of a customer and whether the customer accepts the offer in the first
campaign or not

Conclusion - As the p-value is greater than the 5% level of significance, we do not
reject the Null Hypothesis.

Hence, we can conclude that there is weak or no association between the marital
status of a customer and whether the customer accepts the offer in the first
campaign or not

Problem 3 -

Null Hypothesis - There is no association/relation between whether the customer
has a child or not and whether the customer accepts the offer in the first
campaign or not

Alternate Hypothesis - There is some association/relation between whether the
customer has a child or not and whether the customer accepts the offer in the first
campaign or not

Conclusion - As the p-value is well below the 5% level of significance, we reject
the Null Hypothesis.

Hence, we can conclude that there is some association between whether the customer
has a child or not and whether the customer accepts the offer in the first campaign
or not

2. T-test

Problem 1 -

Null hypothesis: The mean value of Total spending of Grad Students = The mean
value of Total spending of Post Grad Students

Alternate hypothesis: The mean value of Total spending of Grad Students ≠ The
mean value of Total spending of Post Grad Students

Since the ratio of the variance of spending of grad students to the variance of
spending of post-grad students is less than 4, we can use the equal-variance t-test.

Conclusion - As the p-value is well below the 5% level of significance, we reject
the Null Hypothesis.

Thus the mean value of Total spending of Grad Students ≠ The mean value of
Total spending of Post Grad Students
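
The procedure in this problem (check the variance ratio against 4, then run an equal-variance two-sample t-test) can be sketched as below. The two samples are synthetic, and the helper name `variance_ratio` is hypothetical; the sketch uses `scipy.stats.ttest_ind`.

```python
import numpy as np
from scipy import stats

# Synthetic spending samples for two groups (values are assumed).
rng = np.random.default_rng(42)
grad = rng.normal(loc=600, scale=100, size=300)
post_grad = rng.normal(loc=700, scale=120, size=300)

def variance_ratio(a, b):
    """Ratio of the larger sample variance to the smaller one."""
    var_a = np.var(a, ddof=1)
    var_b = np.var(b, ddof=1)
    return max(var_a, var_b) / min(var_a, var_b)

# Rule of thumb from the text: a ratio below 4 allows the pooled test.
ratio = variance_ratio(grad, post_grad)
equal_var = bool(ratio < 4)

# Two-sided test; equal_var=False would switch to Welch's t-test.
t_stat, p_value = stats.ttest_ind(grad, post_grad, equal_var=equal_var)

ALPHA = 0.05
print(f"variance ratio={ratio:.2f}, t={t_stat:.2f}, p={p_value:.4g}")
print("reject H0" if p_value < ALPHA else "do not reject H0")
```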

Problem 2 -

Null hypothesis: The mean value of Total spending of a customer having children
= The mean value of Total spending of a customer not having children

Alternate hypothesis: The mean value of Total spending of a customer having
children ≠ The mean value of Total spending of a customer not having children

Since the ratio of the variance of spending of customers having children to the
variance of spending of customers not having children is less than 4, we can use
the equal-variance t-test.

Conclusion - As the p-value is well below the 5% level of significance, we reject
the Null Hypothesis.

Thus the mean value of Total spending of customers having children ≠ The mean
value of Total spending of customers not having children.

Problem 3 -

Null hypothesis: The mean value of Total spending of non-married customers =
The mean value of Total spending of married customers.

Alternate hypothesis: The mean value of Total spending of non-married customers
≠ The mean value of Total spending of married customers

Since the ratio of the variance of married customers’ spending to the variance of
non-married customers’ spending is less than 4, we can use the equal-variance
t-test.

Conclusion - As the p-value is well below the 5% level of significance, we reject
the Null Hypothesis.

Thus the mean value of Total spending of non-married customers ≠ The mean value of
Total spending of married customers

3. ANOVA

Problem 1 -

Null hypothesis: mean of total spending of people having 0 children = mean of
total spending of people having 1 child = mean of total spending of people
having 2 children = mean of total spending of people having 3 children

Alternate hypothesis: At least one of the group means is unequal to the others

Standard F value = 3.84568895
Calculated F value = 731

Conclusion - For degrees of freedom (1, 2200), the calculated F value is greater than
the standard F value. Hence we reject the null hypothesis.

Thus the mean value of at least one of the groups is unequal.
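
The comparison of a calculated F value against a standard (critical) F value can be sketched as below, using `scipy.stats.f_oneway` and `scipy.stats.f.ppf` on synthetic groups. The group sizes, means, and spreads are invented for illustration and are not the report's data. Note that a one-way ANOVA over four groups has k - 1 = 3 numerator degrees of freedom, so its critical value differs from the one quoted for (1, 2200); the last lines reproduce that quoted value separately.

```python
import numpy as np
from scipy import stats

# Synthetic total-spending samples for customers with 0, 1, 2, 3 children.
rng = np.random.default_rng(1)
groups = [
    rng.normal(loc=1100, scale=300, size=150),  # 0 children
    rng.normal(loc=470, scale=300, size=150),   # 1 child
    rng.normal(loc=250, scale=300, size=150),   # 2 children
    rng.normal(loc=270, scale=300, size=150),   # 3 children
]

# One-way ANOVA: are all group means equal?
f_stat, p_value = stats.f_oneway(*groups)

# Degrees of freedom for k groups and N observations in total.
k = len(groups)
n_total = sum(len(g) for g in groups)
df_between, df_within = k - 1, n_total - k

# Critical F value at the 5% level of significance.
f_critical = stats.f.ppf(0.95, df_between, df_within)
reject_null = bool(f_stat > f_critical)
print(f"F={f_stat:.1f}, critical F({df_between}, {df_within})={f_critical:.3f}")

# Critical value for the degrees of freedom quoted in the text.
f_critical_1_2200 = stats.f.ppf(0.95, 1, 2200)
print(f"critical F(1, 2200)={f_critical_1_2200:.4f}")
```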

Problem 2 -

Null hypothesis: The mean of total spending of undergraduates = The mean of
total spending of postgraduates

Alternate hypothesis: The two means are unequal

Standard F value = 3.84568895
Calculated F value = 30.43

Conclusion - For degrees of freedom (1, 2200), the calculated F value is greater than
the standard F value. Hence we reject the null hypothesis.

Thus the two mean values are unequal.

Problem 3 -

Null hypothesis: The mean of total spending of parents = The mean of total
spending of non-parents

Alternate hypothesis: The two means are unequal

Standard F value = 3.84568895
Calculated F value = 819

Conclusion - For degrees of freedom (1, 2200), the calculated F value is greater than
the standard F value. Hence we reject the null hypothesis.

Thus the two mean values are unequal.

4. Correlation

Problem 1 -

Objective - To find out the correlation between Income and total spending of a
customer.

Null Hypothesis - Income is not correlated with the total spending of a customer.

Alternate Hypothesis - Income is correlated with the total spending of a customer.

Conclusion - With 95% confidence, we can conclude that Income is correlated with
total spending.
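
A correlation test of this kind can be sketched as below. The report does not name the coefficient used, so Pearson's r via `scipy.stats.pearsonr` is an assumption, and the income and spending values are synthetic.

```python
import numpy as np
from scipy import stats

# Synthetic data: spending rises with income plus random noise (assumed).
rng = np.random.default_rng(7)
income = rng.normal(loc=52000, scale=20000, size=500)
total_spending = 0.02 * income + rng.normal(loc=0, scale=200, size=500)

# Pearson correlation and the two-sided p-value for H0: no correlation.
r, p_value = stats.pearsonr(income, total_spending)

ALPHA = 0.05  # corresponds to the 95% confidence level in the text
direction = "direct" if r > 0 else "inverse"
print(f"r={r:.3f}, p={p_value:.4g}")
if p_value < ALPHA:
    print(f"reject H0: the variables show a {direction} correlation")
else:
    print("do not reject H0")
```

The sign of r gives the direction of the relationship: positive for a direct relationship, negative for an inverse one.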

Problem 2 -

Null Hypothesis - Income is not correlated with the number of visits per month of
a customer

Alternate Hypothesis - Income is correlated with the number of visits per month
of a customer

Conclusion - With 95% confidence, we conclude that Income is correlated with the
number of visits per month

Problem 3 -

Null Hypothesis - The number of store purchases by a customer is not correlated
with the total spending of a customer

Alternate Hypothesis - The number of store purchases by a customer is
correlated with the total spending of a customer

Conclusion - With 95% confidence, we conclude that the number of store
purchases is correlated with total spending

5. Regression

STATISTICAL CONCLUSION

1. Chi-Square test conclusions -


● There is a weak or no association between the education of a customer and
whether the customer accepts the offer in the first campaign or not.
● There is a weak or no association between the marital status of a customer
and whether the customer accepts the offer in the first campaign or not.
● There is some association between whether the customer has a child or not
and whether the customer accepts the offer in the first campaign or not.
2. T-Test conclusions -

○ The mean value of Total spending of Grad Students ≠ The mean value of
Total spending of Post Grad Students.

○ The mean value of Total spending of customers having children ≠ The
mean value of Total spending of customers not having children.

○ The mean value of Total spending of non-married customers ≠ The mean value of
Total spending of married customers
3. ANOVA conclusions -

○ At least one of the mean values of total spending of people having 0, 1, 2, 3
children is unequal.

○ The mean values of total spending of undergraduates and postgraduates are
unequal.

○ The mean values of total spending of parents and non-parents are unequal.
4. Correlation conclusions -
○ Income of the customers is directly related to total spending
○ Income of the customers is inversely related to the number of visits per
month
○ Total number of store purchases is directly related to total spending
5. Regression analysis conclusions -
