
ET - Project Presentation Solution

This document discusses analyzing customer data to predict which customers are likely to purchase a new travel package being offered by a tourism company. Key points discussed include: - Customers with passports, from tier 1 cities, who are younger, single, and have been contacted multiple times are more likely to purchase packages. - Higher income customers and those in high positions are less likely to purchase. Basic and standard packages have higher conversion rates. - The document outlines data preprocessing steps like dropping outliers and rare values in the dataset to prepare the data for modeling to predict customers likely to purchase the new wellness travel package.

Uploaded by

Sugrib K Shaha
Copyright
© All Rights Reserved

Travel Package Purchase Prediction

Visit with Us: Ensemble Technique

This file is meant for personal use by [email protected] only.


Sharing or publishing the contents in part or full is liable for legal action.
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Contents / Agenda
● Executive Summary

● Business Problem Overview and Solution Approach

● EDA Results

● Data Preprocessing

● Model Performance Summary

● Appendix

Executive Summary
● Our analysis shows that only a minority of customers (about 30%) have passports, and these
customers are more likely to purchase the travel package. The company should customize more
international packages to attract more such customers.

● We have customers from tier 1 and tier 3 cities but very few from tier 2 cities. The company
should expand its marketing strategies to increase the number of customers from tier 2 cities.

● We saw in our analysis that people with higher income or in senior positions such as AVP or VP are
less likely to buy the product. The company can offer short-term travel packages and customize
the package for higher-income customers with added luxuries to target such customers.

● When implementing a marketing strategy, external factors, such as the number of follow-ups,
time of call, should also be carefully considered as our analysis shows that the customers who
have been followed up more are the ones buying the package.

Executive Summary
● After identifying a potential customer, the company should pitch packages according to the
customer's monthly income. For example, King packages should not be pitched to low-income
customers; they are better suited to higher-income customers.

● We saw in our analysis that young and single people are more likely to buy the offered packages.
The company can offer discounts or customize the package to attract more couples, families, and
customers above 30 years of age.
Business Problem Overview and Solution Approach
● Visit with Us is a tourism company, and the policymaker wants to enable and establish a viable
business model to expand the customer base. A viable business model is a central concept that
helps you understand the existing ways of doing business and how to change the ways for the
benefit of the tourism sector.

● One of the ways to expand the customer base is to introduce a new offering of packages.
Currently, there are 5 types of packages the company is offering - Basic, Standard, Deluxe, Super
Deluxe, and King. However, it was difficult to identify the potential customers because customers
were contacted at random without looking at the available information.

● The company is now planning to launch a new product, the Wellness Tourism Package. Wellness
Tourism is defined as travel that allows the traveler to maintain, enhance, or kick-start a healthy
lifestyle, and support or increase one's sense of well-being. This time the company wants to harness
the available data of existing and potential customers to target the right customers.

● The task is to analyze the data and build a model to predict which customer is potentially going
to purchase the newly introduced travel package.
EDA Results


● The distribution of monthly income shows that most of the values lie between 20,000 and 40,000.

● Income is one of the important factors to consider while approaching a customer with a certain package.
We can explore this further in bivariate analysis.

● There are some observations on the left and some observations on the right of the boxplot which can be
considered as outliers.

EDA Results

● Approximately 70% of customers reached out to the company first, i.e. self-inquiry.

● This shows the positive outreach of the company, as most of the inquiries are initiated from the
customer's end.


● The company pitches Deluxe or Basic packages to its customers more than the other packages.

● This might be because the company makes more profit from Deluxe or Basic
packages or these packages are less expensive, so preferred by the majority
of the customers.
EDA Results

● We have seen that married people are the most common customer
for the company but this graph shows that the conversion rate is
higher for single and unmarried customers as compared to the
married customers.

● The company can target single and unmarried customers more and
can modify packages as per these customers.

● The conversion rate for large business owners is higher than salaried or
small business owners.

● This might be because large business owners have high income.

● Freelancers have a 100% conversion rate, but there are just 2 such observations, so this cannot
give any conclusive insights.
EDA Results


● The conversion rate of customers is higher if the product pitched is Basic. This might be because
the basic package is less expensive.

● We saw earlier that the company pitches the deluxe package more than the standard package, but the
standard package shows a higher conversion rate than the deluxe package. The company can pitch
standard packages more often.
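
The conversion rates discussed in these slides can be reproduced with a simple group-by. The sketch below is only an illustration: the rows are invented toy data, and the column name `ProductPitched` and the helper `conversion_rate` are assumptions (only `ProdTaken` appears in the slides).

```python
import pandas as pd

# Toy data invented for illustration only; not the company's records.
df = pd.DataFrame({
    "ProductPitched": ["Basic"] * 4 + ["Standard"] * 3 + ["Deluxe"] * 4,  # assumed column name
    "ProdTaken": [1, 1, 0, 0,  1, 0, 0,  1, 0, 0, 0],
})

def conversion_rate(frame, by, target="ProdTaken"):
    """Per category: number of pitches and share of customers who bought."""
    summary = frame.groupby(by)[target].agg(pitches="count", conversion="mean")
    return summary.sort_values("conversion", ascending=False)

rates = conversion_rate(df, "ProductPitched")
print(rates)
```

Reporting the pitch count next to the rate helps avoid over-reading categories with very few observations, such as the two freelancers noted earlier.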
EDA Results


● The number of trips and age have a weak positive correlation, which makes sense: as age increases,
the number of trips is expected to increase.

● Age and monthly income are positively correlated.

● ProdTaken has a weak negative correlation with age which agrees with our earlier observation that
as age increases the probability for purchasing a package decreases.

● No other variables have a high correlation among them.


Data Preprocessing
● There are only two observations where the duration of pitch is greater than 37, so we will drop
these rows.

● There are only four observations where the monthly income is greater than 40,000 or less than
12,000. We checked these observations and they appear to be outliers.

● The share of observations with 19 or more trips is very low. We can consider these values as
outliers. There are just four observations with a number of trips of 19 or greater, so we will
drop these rows.

● There are missing values in a few of the numeric variables Age, Monthly income, and Number of
trips, so we will impute these values with a median.

● There are missing values in a few of the categorical variables Type of contact, Preferred property
star, and Number of children visiting, so we will impute these values with mode / most frequent.

● There are 6 categorical variables having string values, so we will be encoding these variables
with dummies.
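
The preprocessing steps above can be sketched in pandas. This is a minimal illustration under stated assumptions, not the project's actual notebook: the rows are invented, the column names are assumed, and dropping the income outliers is an assumption (the slides only say they look like outliers).

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration; column names are assumptions.
df = pd.DataFrame({
    "Age": [34, np.nan, 45, 29, np.nan, 40, 50],
    "MonthlyIncome": [21000, 25000, np.nan, 98000, 30000, 22000, 28000],
    "NumberOfTrips": [2, 3, 4, 1, np.nan, 21, 6],
    "DurationOfPitch": [10, 120, 15, 8, 12, 9, 20],
    "TypeofContact": ["Self Enquiry", None, "Company Invited", "Self Enquiry",
                      None, "Self Enquiry", "Self Enquiry"],
    "ProdTaken": [0, 1, 0, 1, 0, 1, 1],
})

def preprocess(frame):
    # Keep rows inside the thresholds from the slides; missing values pass
    # through here so that they can be imputed afterwards.
    keep = (
        (frame["DurationOfPitch"].isna() | (frame["DurationOfPitch"] <= 37))
        & (frame["MonthlyIncome"].isna() | frame["MonthlyIncome"].between(12000, 40000))
        & (frame["NumberOfTrips"].isna() | (frame["NumberOfTrips"] < 19))
    )
    out = frame[keep].copy()
    # Numeric columns: impute with the median.
    for col in ["Age", "MonthlyIncome", "NumberOfTrips"]:
        out[col] = out[col].fillna(out[col].median())
    # Categorical columns: impute with the most frequent value.
    for col in ["TypeofContact"]:
        out[col] = out[col].fillna(out[col].mode()[0])
    # Encode string categories as dummy variables.
    return pd.get_dummies(out, columns=["TypeofContact"], drop_first=True)

clean = preprocess(df)
print(clean)
```

In a real pipeline the medians and modes would normally be computed on the training split only, so that no information from the test set leaks into the imputation.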
Model Performance Summary
● We want to predict whether a customer will buy the newly introduced travel package or not
using the information provided to us.

● We will use Recall as the performance metric for our model because the model can be wrong in two ways:

● Predicting a customer will buy the product and the customer doesn't buy - Loss of resources
● Predicting a customer will not buy the product and the customer buys - Loss of opportunity

● We would want Recall to be maximized. The greater the Recall, the higher the chances of
minimizing false negatives (lost opportunities).

● The tuned XGBoost model indicates that the most significant predictors of buying a travel package are:

○ Passport
○ Designation
○ Marital Status
○ City tier
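
To make the choice of Recall concrete, here is a small sketch that computes it from true and predicted labels. The labels are invented for illustration and `recall` is a hypothetical helper, not code from the project.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN): the share of actual buyers the model catches."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Invented labels: 1 = bought the package, 0 = did not buy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# One buyer is missed (false negative = loss of opportunity) and one
# non-buyer is flagged (false positive = loss of resources).
print(recall(y_true, y_pred))  # 0.75
```

The false positive does not affect Recall at all, which is why precision is usually checked alongside it.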
Model Performance Summary


Best performing model


APPENDIX

Data Background and Contents
● The data contains information about 4,888 customers.

● The attributes include Age, Occupation, Income, Gender, ProdTaken, Passport, and more.

● The average age of customers is 37 years, with a wide range from 18 to 61 years.

● Monthly income variable has some outliers at both ends.


● The average income of customers is 25k dollars. Income has a wide range from 1k dollars to 98k
dollars. The distribution of income is skewed to the right.

● Half of the customers are married.

● 70% of the customers do not have a passport.

Model Building: Decision Tree

Training Performance

● 0 errors on the training set, each sample has been classified correctly.
● Model has performed very well on the training set.
● As we know, a decision tree will continue to grow and classify each data point correctly if no
restrictions are applied, as the trees will learn all the patterns in the training set.

Testing Performance

● The decision tree model is overfitting the data as expected and not able to generalize well on
the test set.
● We will have to use hyperparameter tuning with the decision tree.
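
The behaviour described on this slide can be illustrated with scikit-learn. The sketch below uses synthetic data from `make_classification` as a stand-in for the customer data, so the numbers it prints are not the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (an assumption; not the real dataset).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           weights=[0.8, 0.2], flip_y=0.05, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

# No depth or leaf-size restriction: the tree grows until every training
# sample is classified correctly.
tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

train_recall = recall_score(y_train, tree.predict(X_train))
test_recall = recall_score(y_test, tree.predict(X_test))
print(f"train recall: {train_recall:.3f}, test recall: {test_recall:.3f}")
```

A perfect training score combined with a clearly lower test score is the overfitting signature that motivates the tuning step.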
Model Improvement: Decision Tree

Training Performance Testing Performance

● After hyperparameter tuning, the model generalizes better.
● We are getting a Recall of 0.663 and 0.652 for training and test set, respectively.

● Let’s try building some ensemble models and see if the metrics improve.
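
One common way to tune a decision tree for Recall is a grid search with `scoring="recall"`. The sketch below is an assumption about how such tuning could look: the parameter grid is hypothetical and the data is synthetic, so it will not reproduce the 0.663 / 0.652 figures above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the real dataset).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           weights=[0.8, 0.2], flip_y=0.05, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

# Hypothetical grid: restrict tree growth and optionally reweight classes.
param_grid = {
    "max_depth": [2, 4, 6],
    "min_samples_leaf": [5, 20],
    "class_weight": [None, "balanced"],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=1), param_grid,
                      scoring="recall", cv=5)
search.fit(X_train, y_train)

best_tree = search.best_estimator_
train_recall = recall_score(y_train, best_tree.predict(X_train))
test_recall = recall_score(y_test, best_tree.predict(X_test))
print(search.best_params_)
print(f"train recall: {train_recall:.3f}, test recall: {test_recall:.3f}")
```

Limiting depth and leaf size trades some training performance for a smaller gap between training and test Recall.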

Model Building: Random Forest

Training Performance Testing Performance

● With default parameters, random forest is overfitting the training data.


● We'll try to reduce overfitting and improve the performance by hyperparameter tuning.

Model Improvement: Random Forest

Training Performance Testing Performance

● We are getting a Recall of 0.881 and 0.662 for training and test set, respectively.
● After tuning the hyperparameters, the random forest is still overfitting.

Model Building: Bagging

Training Performance Testing Performance

● We are getting a Recall of 0.951 and 0.510 for training and test set, respectively, which is
a very big difference.

● We'll try to reduce overfitting and improve the performance by hyperparameter tuning.
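
The train/test comparison used throughout these model slides can be automated with a small helper. This is a sketch on synthetic data with default or near-default scikit-learn settings; `recall_report` is a hypothetical helper, and the printed numbers are not the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the real dataset).
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           weights=[0.8, 0.2], flip_y=0.05, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=1),
    "bagging": BaggingClassifier(random_state=1),
    "adaboost": AdaBoostClassifier(random_state=1),
    "gradient_boosting": GradientBoostingClassifier(random_state=1),
}

def recall_report(models, X_train, X_test, y_train, y_test):
    """Fit each model and return train recall, test recall, and their gap."""
    report = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        train = recall_score(y_train, model.predict(X_train))
        test = recall_score(y_test, model.predict(X_test))
        report[name] = {"train": train, "test": test, "gap": train - test}
    return report

report = recall_report(models, X_train, X_test, y_train, y_test)
for name, row in report.items():
    print(f"{name:18s} train={row['train']:.3f} test={row['test']:.3f} gap={row['gap']:.3f}")
```

A large positive gap points to overfitting, while low Recall on both sets points to a model that misses buyers regardless of the split.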

Model Improvement: Bagging

Training Performance Testing Performance

● After tuning the hyperparameters the bagging classifier is still overfitting.


● There's a big difference in the training and the test recall.

Model Building: AdaBoost

Training Performance Testing Performance

● The recall on both the train and test sets is very low.

● We'll try to improve the performance by hyperparameter tuning.

Model Improvement: AdaBoost

Training Performance Testing Performance

● The recall of both train and test set is improved but there is a big difference between both
the sets.
Model Building: Gradient Boosting

Training Performance Testing Performance

● The recall on both the train and test sets is very low.

● We'll try to improve the performance by hyperparameter tuning.

Model Improvement: Gradient Boosting

Training Performance Testing Performance

● The recall of both train and test set is improved but there is a difference between both the
sets.
Model Building: XGBoost

Training Performance Testing Performance

● The XGBoost model on the training set has performed very well but it is not able to
generalize on the test set.

● Let's try and tune the hyperparameters and see if the performance can be generalized.

Model Improvement: XGBoost

Training Performance Testing Performance

● Overfitting has reduced after hyperparameter tuning, but the model is still overfit.
Model Building: Stacking

Training Performance Testing Performance

● For the Stacking Classifier, the tuned random forest, the tuned gradient boosting classifier
and the decision tree models were used as the initial estimators while the tuned xgboost
classifier was used as the final estimator.
● We have received recall scores of 0.878 and 0.735 on the training and test set,
respectively.
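
A stacking setup of this shape can be sketched with scikit-learn's `StackingClassifier`. Note the substitutions: to stay within scikit-learn, a `GradientBoostingClassifier` stands in for the tuned XGBoost final estimator, the hyperparameters are placeholders rather than the tuned values, and the data is synthetic, so the scores will differ from those reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the real dataset).
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           weights=[0.8, 0.2], flip_y=0.05, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

# Initial (base) estimators; hyperparameters are placeholders, not tuned values.
estimators = [
    ("rf", RandomForestClassifier(n_estimators=50, max_depth=5, random_state=1)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=1)),
    ("dt", DecisionTreeClassifier(max_depth=4, random_state=1)),
]

# Final estimator: gradient boosting used here in place of XGBoost.
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=GradientBoostingClassifier(n_estimators=50, random_state=1),
    cv=5,
)
stack.fit(X_train, y_train)

train_recall = recall_score(y_train, stack.predict(X_train))
test_recall = recall_score(y_test, stack.predict(X_test))
print(f"train recall: {train_recall:.3f}, test recall: {test_recall:.3f}")
```

The final estimator is trained on cross-validated predictions of the base models, which is what lets stacking combine their strengths without simply memorizing their training-set outputs.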
