
PM Guided Project Sample Business Report

Uploaded by

monikasreee
Copyright
© © All Rights Reserved
Predictive Modelling
Business Report

This file is meant for personal use by [email protected] only.


Sharing or publishing the contents in part or full is liable for legal action.
Contents

S.no Topics

ExtraaLearn
1.1 Problem Definition
1.2 Univariate Analysis
1.3 Multivariate Analysis
1.4 Data Pre-processing
1.5 Model Building
1.6 Model Performance Comparison
1.7 Actionable Insights and Recommendations


List of Tables

No Name of the Table

1 Basic Information of dataset
2 Statistical Summary
3 Model Comparison on train data
4 Model Comparison on test data

List of Figures

No Name of Figure

1 Univariate Analysis on Categorical features
2 Univariate Analysis on Age
3 Univariate Analysis on Website_visits
4 Univariate Analysis on Number of time spent on website
5 Univariate Analysis on Number of page views per visit
6 Univariate Analysis on Number of adults
7 Univariate Analysis on Number of children
8 Univariate Analysis on profile complete

ExtraaLearn Project
Define the problem and Perform Exploratory Data Analysis

Context
The EdTech industry has surged over the past decade. According to one forecast, the online
education market would be worth $286.62bn by 2023, with a compound annual growth rate (CAGR)
of 10.26% from 2018 to 2023. The modern era of online education has driven much of this growth
and expansion. With dominant features such as ease of information sharing, personalized
learning experience, and transparency of assessment, it is now preferable to traditional
education.

In the present scenario, due to Covid-19, the online education sector has witnessed rapid
growth and is attracting a lot of new customers. Due to this rapid growth, many new
companies have emerged in this industry. With the availability and ease of use of digital
marketing resources, companies can reach out to a wider audience with their offerings. The
customers who show interest in these offerings are termed leads. There are various
sources of leads for EdTech companies, such as:
● The customer interacts with the marketing front on social media or other online
platforms
● The customer browses the website/app and downloads the brochure
● The customer connects through emails for more information


The company then nurtures these leads and tries to convert them to paid customers. For this,
a representative from the organization connects with the lead on a call or through email to
share further details.

Objective

ExtraaLearn is an early-stage startup that offers programs on cutting-edge technologies to
students and professionals to help them upskill/reskill. With a large number of leads being
generated regularly, one of the issues faced by ExtraaLearn is identifying which leads are
more likely to convert so that it can allocate resources accordingly. You, as a data scientist
at ExtraaLearn, have been provided the leads data to:
● Analyze and build an ML model to help identify which leads are more likely to convert to
paid customers,
● Find the factors driving the lead conversion process
● Create a profile of the leads which are likely to convert

Data Description
The data contains the different attributes of leads and their interaction details with
ExtraaLearn. The detailed data dictionary is given below.
Data Dictionary
ID: ID of the lead
age: Age of the lead
current_occupation: Current occupation of the lead. Values include 'Professional',
'Unemployed', and 'Student'
first_interaction: How the lead first interacted with ExtraaLearn. Values include 'Website',
'Mobile App'
profile_completed: What percentage of the profile has been filled by the lead on the
website/mobile app. Values include Low - (0-50%), Medium - (50-75%), High - (75-100%)
website_visits: How many times the lead has visited the website
time_spent_on_website: Total time spent on the website
page_views_per_visit: Average number of pages on the website viewed during the visits
last_activity: Last interaction between the lead and ExtraaLearn. Values include:
Email Activity: Sought details about the program through email, representative shared
information with the lead such as a brochure of the program, etc.
Phone Activity: Had a phone conversation with a representative, had a conversation over SMS
with a representative, etc.
Website Activity: Interacted on live chat with a representative, updated profile on the website,
etc.
print_media_type1: Flag indicating whether the lead had seen the ad of ExtraaLearn in the
Newspaper.
print_media_type2: Flag indicating whether the lead had seen the ad of ExtraaLearn in the
Magazine.
digital_media: Flag indicating whether the lead had seen the ad of ExtraaLearn on the digital
platforms.
educational_channels: Flag indicating whether the lead had heard about ExtraaLearn in the
education channels like online forums, discussion threads, educational websites, etc.
referral: Flag indicating whether the lead had heard about ExtraaLearn through reference.
status: Flag indicating whether the lead was converted to a paid customer or not.

Data Overview
The data is imported, and the following are the observations:


Table 1: Data information

The data has 4612 rows and 15 columns: 10 columns of object type, 4 of int type, and 1 of
float type.

● There are no duplicate rows in the data.

● All the values in the ID column are unique. Hence, we can drop this column.
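
The checks above can be sketched with pandas as follows. This is a minimal illustration on a hypothetical four-row frame, not the report's 4612-row dataset; the variable names (leads, n_duplicates, id_is_unique) and the sample values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical miniature of the leads data (the real data has 4612 rows, 15 columns).
leads = pd.DataFrame({
    "ID": ["EXT001", "EXT002", "EXT003", "EXT004"],
    "age": [57, 56, 52, 23],
    "current_occupation": ["Unemployed", "Professional", "Professional", "Student"],
    "status": [1, 0, 0, 1],
})

n_rows, n_cols = leads.shape                      # number of rows and columns
dtype_counts = leads.dtypes.astype(str).value_counts()  # how many columns per data type
n_duplicates = int(leads.duplicated().sum())      # count of fully duplicated rows
id_is_unique = leads["ID"].is_unique              # True when every lead has its own ID

# An identifier that is unique for every row carries no predictive signal, so drop it.
if id_is_unique:
    leads = leads.drop(columns="ID")

print(n_rows, n_cols, n_duplicates, id_is_unique)
```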

Statistical Summary

Table 2: Statistical Summary

● The average age of leads in the data is 46 years. The distribution of age seems fine.
● On average, a lead visits the website 3 times. Some leads have never visited the website.
● On average, the leads spent 724 seconds (about 12 minutes) on the website. There is also a
very large gap between the 75th percentile and the maximum value, which indicates that there
might be outliers in this column.
● The distribution of the average page views per visit suggests that there might be
outliers in this column.
Exploratory Data Analysis
Univariate Analysis

Observations on Categorical Variables:

● Most of the leads are working professionals.
● Most of the leads first interacted with ExtraaLearn through the website.
● Most of the leads have completed at least 50% of their profile.
● The most common last activity with the leads was email.
● Very few leads came through print media, digital media, and referrals.

● Observations on Age


Age appears to be fairly evenly distributed, with the mean and median age of the leads
very close to each other at ~50 years.
● Observations on website_visits


The distribution of website visits is skewed to the right. There are many outliers on the
right side.
Leads who have visited the website more than 9 times are represented as outliers.
Some leads have never visited the website, while others have visited it more than 10 times.
174 leads have not visited the website.
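
A boxplot conventionally flags points above Q3 + 1.5 × IQR as outliers, which is presumably how the cut-off quoted above arises. The sketch below shows that rule on a small hypothetical list of visit counts (not the report's data), using only the standard library.

```python
import statistics

# Hypothetical visit counts for illustration only.
visits = [0, 1, 2, 2, 3, 3, 4, 5, 7, 12, 20]

# Quartiles with linear interpolation between data points ("inclusive" method).
q1, _median, q3 = statistics.quantiles(visits, n=4, method="inclusive")
iqr = q3 - q1
upper_whisker = q3 + 1.5 * iqr        # values above this are drawn as outliers

outliers = [v for v in visits if v > upper_whisker]
never_visited = sum(1 for v in visits if v == 0)

print(upper_whisker, outliers, never_visited)
```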


● Observations on time_spent_on_website


50% of the leads have spent less than 500 seconds on the website.
Some leads have spent more than 1800 seconds or 30 minutes on the website.
● Observations on page_views_per_visit


The distribution of page views per visit is skewed to the right.
50% of the leads have viewed fewer than 3 pages of the website.
There are many outliers on the right side of the boxplot. Some leads have viewed more than
10 pages on the website.
● Observations on current_occupation

● 56.7% of the leads are working professionals, followed by 31.2% of the leads
who are currently unemployed.
● 12% of the leads are still pursuing their education.

● Observations on first_interaction

● 55% of the leads first interacted through the website, while the rest first interacted
through the mobile application.
● Observations on profile_completed

● Almost equal percentages of profile completions are categorized as high and medium,
at 49.1% and 48.6%, respectively.
● Only 2.3% of the profile completions are categorized as low.

● Observations on last_activity

● 49.4% of the leads had their last activity over email, followed by 26.8% with phone
activity.
● Observations on print_media_type1

● 89.2% of the leads didn't see the ads in the Newspaper.

● Observations on print_media_type2

● ~95% of the leads didn't see the ads in the Magazine.
● Observations on digital_media

● 11.4% of the leads saw the ads on digital media.

● Observations on educational_channels

● 15.3% of the leads heard about ExtraaLearn on educational channels.
● Observations on referral

● Only 2% of the leads were referred.

● Observations on status

● 30% of the leads were converted.
Multivariate Analysis


There is a positive correlation between status and time spent on the website.
There is no notable correlation among the independent variables.
Leads will have different expectations from the outcome of the course and the current
occupation may play a key role for them to take the program. Let's analyze it


● Working professionals showed the highest conversion percentage of the three categories.
Around 35% of the working professionals were converted.
● Students showed the lowest conversion percentage - only around 15% of these leads were
converted.
● Currently unemployed leads showed a conversion rate of around 30%.

This shows that the currently offered program is more oriented towards working
professionals or unemployed individuals.

The program might be suitable for working professionals who want to transition to
a new role or take up more responsibility in their current role. It is also focused on skills
that are in high demand, making it more suitable for currently unemployed leads.
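
Conversion percentages per category like the ones above can be obtained with a row-normalized cross-tabulation. The snippet is a minimal sketch on hypothetical rows; the column names follow the data dictionary, while the values and the variable names (leads, conversion_rate) are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; status is 1 for a converted lead and 0 otherwise.
leads = pd.DataFrame({
    "current_occupation": ["Professional"] * 4 + ["Unemployed"] * 2 + ["Student"] * 2,
    "status": [1, 1, 0, 0, 1, 0, 0, 0],
})

# normalize="index" turns counts into within-category shares, so each row sums to 1.
conversion = pd.crosstab(leads["current_occupation"], leads["status"], normalize="index")

# Column 1 holds the share of converted leads in each occupation.
conversion_rate = conversion[1].sort_values(ascending=False)
print(conversion_rate)
```

The same call works for any other categorical column, such as first_interaction or profile_completed.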

Age can be a good factor to differentiate between such leads


● The age of students ranges from 18 to 25 years.
● The age of professionals ranges from 25 to 60 years.
● The age of currently unemployed leads ranges from 32 to 63 years.
● The average age of working professionals and of unemployed leads is almost the same, at
about 50 years.
The company's first interaction with leads should be compelling and persuasive. Let's see if
the channels of the first interaction have an impact on the conversion of leads

● The website seems to be doing a good job compared to the mobile app, as there is a
huge difference in the conversion percentage between the leads who first interacted with
the company through the website and those who first interacted through the mobile application.
● Close to 60% of the leads who first interacted through the website were converted to paid
customers, while only around 20% of the leads who first interacted through the mobile app
converted.
We saw earlier that there is a positive correlation between status and time spent on the
website. Let's analyze it further


● There is a good separation between the leads who were converted into paid customers
and those who were not.
● The median time for the leads who were not converted to paid customers is close to
300 seconds (5 minutes).
● The median time for the leads who were converted to paid customers is close to 790
seconds (about 13 minutes).
● The time spent on the website can be a good predictor of which leads will be
converted.
Let's do a similar analysis for website visits and page views per visit.


There is not much difference between the distributions of the number of website visits for
the leads who were converted and those who were not.

The median page views per visit for leads who were converted to paid customers is
slightly higher than for leads who were not converted.
People browsing the website or the mobile app are generally required to create a profile by
sharing their personal details before they can access more information. Let's see if the
profile completion level has an impact on lead status


● The leads who shared their complete details with the company converted more than
those at other levels of profile completion. Around 60% of the leads with a high
level of profile completion were converted.
● The medium and low levels of profile completion saw around 20% and 10%
conversions, respectively.

The high level of profile completion might indicate a lead's intent to pursue the course,
which results in high conversion.
After a lead shares their information by creating a profile, there may be interactions
between the lead and the company to proceed with the process of enrollment. Let's see how
the last activity impacts lead conversion status

● Leads interacting on the website and through emails have around 60% conversion.
● Interaction over the phone has around 20% conversion. This is an area of
improvement for the company.

Let's see how advertisements and referrals impact the lead status
Observations:
● All the advertisement channels show almost equal percentages of conversions and
non-conversions, except for referrals.
● There are very few referrals, but their conversion percentage is high.
● The company should try to get more leads through referrals by promoting rewards for
existing customers when they refer someone.
Data Pre-processing

Outlier Check

Observations
● There are quite a few outliers in the data.
● However, we will not treat them, as they are genuine values.

Data Preparation for modeling

● We want to predict which leads are more likely to be converted.
● Before we proceed to build a model, we'll have to encode the categorical features.
● We'll split the data into train and test sets so that we can evaluate the model built on
the train data.
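
The two preparation steps can be sketched as below. This is an illustration under stated assumptions rather than the report's exact code: the ten rows are hypothetical, and the one-hot encoding with drop_first=True, the 70/30 split, the stratification, and random_state=1 are all choices made for the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical rows using column names from the data dictionary.
leads = pd.DataFrame({
    "age": [57, 56, 52, 23, 21, 45, 38, 60, 19, 49],
    "current_occupation": ["Unemployed", "Professional", "Professional", "Student", "Student",
                           "Professional", "Unemployed", "Professional", "Student", "Professional"],
    "first_interaction": ["Website", "Mobile App", "Website", "Website", "Mobile App",
                          "Website", "Mobile App", "Website", "Mobile App", "Website"],
    "status": [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
})

# One-hot encode the categorical features; drop_first avoids a redundant dummy per feature.
X = pd.get_dummies(leads.drop(columns="status"), drop_first=True)
y = leads["status"]

# Hold out 30% of the rows for testing; stratify keeps the conversion share similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)
print(list(X.columns), X_train.shape, X_test.shape)
```

With drop_first=True the first category of each feature (alphabetically) becomes the baseline, which yields dummy names such as current_occupation_Student and first_interaction_Website.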
Model Building

Model evaluation criterion


The model can make wrong predictions in two ways:
1. Predicting that a lead will not convert to a paid customer when, in reality, the lead
would have converted (false negative).
2. Predicting that a lead will convert to a paid customer when, in reality, the lead would
not have converted (false positive).

Which case is more important?

● If we predict that a lead will not get converted and the lead would have converted, then
the company will lose a potential customer.
● If we predict that a lead will get converted and the lead doesn't get converted, the
company might lose resources by nurturing false positive cases.
Losing a potential customer is a greater loss.

How to reduce the losses?

● The company would want Recall to be maximized: the greater the Recall score, the higher
the chances of minimizing False Negatives.
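
Recall is TP / (TP + FN), so every missed converter (false negative) lowers it directly. The short sketch below computes it by hand on hypothetical labels, with 1 meaning "converted"; the label lists are made up for illustration.

```python
# Hypothetical true and predicted labels (1 = converted, 0 = not converted).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # converters the model caught
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # converters the model missed
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-converters flagged as converters

recall = tp / (tp + fn)        # share of actual converters that were identified
precision = tp / (tp + fp)     # share of predicted converters that really converted

print(tp, fn, fp, recall, precision)
```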
Logistic Regression

After data pre-processing, a Logistic Regression model is fitted on the train dataset and
evaluated on both the train and test datasets, using default hyper-parameters with the solver
set to 'newton-cg'.
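
A minimal sketch of this step with scikit-learn is shown below. It runs on synthetic data from make_classification as a stand-in for the encoded leads data; the sample size, class weights, split ratio, and random_state values are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded leads data (roughly 30% positives).
X, y = make_classification(n_samples=500, n_features=8, weights=[0.7, 0.3], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Default hyper-parameters except for the solver.
log_reg = LogisticRegression(solver="newton-cg")
log_reg.fit(X_train, y_train)

# Recall on both sets; a large gap between them would suggest overfitting.
train_recall = recall_score(y_train, log_reg.predict(X_train))
test_recall = recall_score(y_test, log_reg.predict(X_test))
print(round(train_recall, 2), round(test_recall, 2))
```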

● Checking model performance on training set


● Checking model performance on testing set

ROC-AUC
Using GridSearch for Hyperparameter tuning of our logistic regression model
We will be tuning the logistic regression model. We checked the following parameters and their
different values:
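
As a rough sketch of how such a search can be set up with scikit-learn's GridSearchCV, assuming a hypothetical grid over C and class_weight (illustrative values, not the ones used in this report) and recall as the selection metric:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the encoded leads data.
X, y = make_classification(n_samples=500, n_features=8, weights=[0.7, 0.3], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Hypothetical grid: regularization strength and class weighting.
param_grid = {
    "C": [0.01, 0.1, 1, 10],
    "class_weight": [None, "balanced"],
}

# scoring="recall" makes the search prefer settings that miss fewer converters.
grid = GridSearchCV(
    LogisticRegression(solver="newton-cg", max_iter=1000),
    param_grid,
    scoring="recall",
    cv=3,
)
grid.fit(X_train, y_train)

best_model = grid.best_estimator_   # refitted on the full training set with the best settings
print(grid.best_params_)
```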

● Checking performance on training set


● Checking model performance on test set

Feature Importance


● The coefficients of current_occupation_Student, current_occupation_Unemployed,
profile_completed_Low, profile_completed_Medium, and last_activity_Phone Activity are
negative: an increase in these will lead to a decrease in the chances of a lead getting
converted.
● The coefficients of time_spent_on_website, first_interaction_Website, last_activity_Website
Activity, print_media_type2_Yes, and referral_Yes are positive: an increase in these will
lead to an increase in the chances of a lead getting converted.
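
The sign of a logistic regression coefficient maps to an odds ratio through exp(coefficient): values above 1 raise the odds of conversion and values below 1 lower them. The sketch below uses hypothetical coefficient values chosen only to show the arithmetic; they are not the fitted values from this report.

```python
import math

# Hypothetical coefficients for two dummy features (illustrative values only).
coefficients = {
    "first_interaction_Website": 1.2,
    "last_activity_Phone Activity": -0.8,
}

# exp(coef) is the multiplicative change in the odds of conversion when the feature
# goes from 0 to 1, holding the other features fixed.
odds_ratios = {name: math.exp(coef) for name, coef in coefficients.items()}

for name, ratio in odds_ratios.items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name}: odds ratio {ratio:.2f} ({direction} the odds of conversion)")
```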

Observations from Logistic Regression model
● We have been able to build a predictive model that the company can use to predict which
leads are likely to be converted to paid customers, with a recall score of 0.60 on the
training set, and to formulate marketing strategies for the leads accordingly.
● The logistic regression models give a generalized performance on the training and
test sets.
● With the default threshold, the model gives a low recall but a good precision score.
The company will be able to predict which leads will be converted and save resources,
but it will lose potential customers.
Linear Discriminant Analysis
The LDA model is also built with default parameters. The default cut-off value of 0.5 is
considered for prediction.

This model is also further evaluated with Recall, along with the confusion matrix. The
AUC-ROC curve is plotted for both the Train and Test data.
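
The steps just described can be sketched with scikit-learn as follows. The data is synthetic (make_classification) and stands in for the encoded leads data; the split ratio and random_state values are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded leads data.
X, y = make_classification(n_samples=500, n_features=8, weights=[0.7, 0.3], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

lda = LinearDiscriminantAnalysis()      # default parameters
lda.fit(X_train, y_train)

# Predicted probability of the positive class, then the 0.5 cut-off made explicit.
proba = lda.predict_proba(X_test)[:, 1]
pred = (proba >= 0.5).astype(int)

cm = confusion_matrix(y_test, pred)     # rows: actual class, columns: predicted class
recall = recall_score(y_test, pred)
auc = roc_auc_score(y_test, proba)      # area under the ROC curve, uses probabilities
print(cm, round(recall, 2), round(auc, 2))
```

Lowering the cut-off below 0.5 would trade precision for recall, which matters when false negatives are the costlier error.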

● Checking model performance on training set

● Checking model performance on test set

ROC-AUC

Using GridSearch for Hyperparameter tuning of our LDA model
We will be tuning the LDA model. We checked the following parameters and their different values:

● Checking model performance on training set


● Checking model performance on test set


Feature Importance


● The coefficients of current_occupation_Student, current_occupation_Unemployed,
profile_completed_Low, profile_completed_Medium, and last_activity_Phone Activity are
negative: an increase in these will lead to a decrease in the chances of a lead getting
converted.
● The coefficients of first_interaction_Website, last_activity_Website Activity,
print_media_type1_Yes, print_media_type2_Yes, and referral_Yes are positive: an
increase in these will lead to an increase in the chances of a lead getting converted.
Decision Tree Model

● Checking model performance on training set


● There are almost no errors on the training set; nearly every sample has been classified
correctly.
● The model has performed very well on the training set.
● A decision tree will continue to grow and classify each data point correctly
if no restrictions are applied, as the tree will learn all the patterns in the training set.
● Let's check the performance on test data to see if the model is overfitting.
● Checking model performance on test set


● The decision tree model is overfitting the data, as expected, and is not able to generalize
well on the test set.
● We will have to prune the decision tree.

ROC-AUC

Pruning the tree
We will be doing pre-pruning of the CART model. We checked the following parameters and their
different values:

Using GridSearch, we got the following parameter combination as the best one.
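
A minimal sketch of such a pre-pruning search is given below. The grid over max_depth, min_samples_leaf, and criterion is hypothetical (illustrative values, not the ones used in this report), and the data is a synthetic stand-in for the encoded leads data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded leads data.
X, y = make_classification(n_samples=500, n_features=8, weights=[0.7, 0.3], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Hypothetical pre-pruning grid: limit depth and require a minimum number of samples per leaf.
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_leaf": [5, 10, 20],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=1), param_grid, scoring="recall", cv=3
)
search.fit(X_train, y_train)

pruned_tree = search.best_estimator_
train_recall = recall_score(y_train, pruned_tree.predict(X_train))
test_recall = recall_score(y_test, pruned_tree.predict(X_test))
print(search.best_params_, round(train_recall, 2), round(test_recall, 2))
```

Restricting depth and leaf size before the tree is grown is what distinguishes pre-pruning from cost-complexity (post-) pruning, which trims a fully grown tree afterwards.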

● Checking performance on training set
● Checking performance on test set


ROC-AUC
Feature Importance


Observations from decision tree

● We can see that the tree has become simpler and its rules are readable.
● The performance of the model is good and generalized.
● We are getting a recall of 0.89 on the training set.
● We observe that the most important features are:
○ First interaction - Website
○ Time spent on website
○ Profile completed - Medium
○ Current Occupation - Student

The rules obtained from the decision tree can be interpreted as:
● The rules show that the first interaction channel plays a key role in identifying whether
a lead will get converted.
Leads who first interacted through the website:
● If the lead has spent more than 415 seconds (~7 minutes) on the website and the lead
is not a student, then the lead is likely to get converted to a paid customer. But if the
lead is a student and the last interaction was through phone, they are less likely to get
converted.
● If the lead has spent less than or equal to 415 seconds (~7 minutes) on the
website and the profile completion is at the medium level, the lead is not likely to convert
to a paid customer. But if the profile completion of the lead is not at the medium level and
the age is more than 27 years, then the lead is likely to get converted. Leads aged less than
27 years are less likely to get converted.
Leads who did not first interact through the website:
● If the lead has spent less than 419 seconds (~7 minutes) on the website, the lead is less
likely to be converted.
● If the lead has spent more than 419 seconds (~7 minutes) on the website and the last
activity was on the website, the lead is more likely to get converted. But if
the last activity was not on the website, the lead is less likely to get converted.
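
Readable rules of this kind can be printed from a fitted scikit-learn tree with export_text. The sketch below is an illustration only: the data is synthetic, the labelling rule (first interaction on the website and more than 415 seconds spent) is a hypothetical simplification, and the resulting thresholds are not the report's.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)

# Synthetic features named after two of the report's important variables.
X = pd.DataFrame({
    "time_spent_on_website": rng.integers(0, 2500, size=300),
    "first_interaction_Website": rng.integers(0, 2, size=300),
})

# Hypothetical labelling rule for illustration: long website sessions from website-first leads convert.
y = ((X["time_spent_on_website"] > 415) & (X["first_interaction_Website"] == 1)).astype(int)

# A shallow tree keeps the printed rules short enough to read.
tree = DecisionTreeClassifier(max_depth=2, random_state=1).fit(X, y)

# Each indented line is a split condition; leaf lines show the predicted class.
rules = export_text(tree, feature_names=list(X.columns))
print(rules)
```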
Model Performance Comparison and Final Model Selection
Let’s compare the performances of all the models we have built.

Training

Table 3: Model Comparison on train data

Test

Table 4: Model Comparison on test data

Observations

● The decision tree model with default parameters is overfitting the training data and is
not able to generalize well.
● The pre-pruned tree has given a generalized performance, with recall scores of 0.89 and
0.86 on the training and test sets, respectively.
● The post-pruned tree is also giving good results, but the recall from the pre-pruned tree
is higher.
● The company will be able to minimize false negatives better using the pre-pruned
tree.

Actionable Insights and Recommendations
Insights
● Overall, we can see that the Decision Tree model performs better on the dataset.
● Looking at the important variables based on p-values in Logistic Regression and
feature importance in the Decision Tree model:
○ Time spent on website, First Interaction, Profile completion, and Current
occupation are important in both models
○ From the Logistic Regression model, we observe that Time spent on
website and First interaction - Website have a positive relation with leads
getting converted

Business Recommendations
The company can focus on leads with the following profile:

● Those who first interacted with the company through the website.
● Those who have spent around 7 minutes exploring the website.
This shows that the current website user interface is working well and is able to
engage visitors. The company should keep improving the browsing experience
of the visitors and make it easy for them to access information about the different
programs.
● Those who have a high level of profile completion - a high level of profile
completion might indicate a lead's intent to pursue the course, which results in
high conversion.
● Those who are currently unemployed or working professionals - the program
might be suitable for working professionals who want to transition to a
new role or take up more responsibility in their current role. It is also focused on
skills that are in high demand, making it more suitable for currently unemployed
leads. Students can be offered a different program from the company's portfolio.
● Those with whom the last interaction was on the website or through emails.
Areas of improvement for the company:
● In our analysis, we saw that there are fewer conversions when the leads interacted
through phone. This can be investigated further by the company, and the
communication done via this channel can be improved.
● We also saw that when a lead is referred by an existing customer, there is a high
chance of conversion. The company should try to get more leads through referrals
by promoting rewards for existing customers when they refer someone.
● The advertisement channels are not contributing much to conversion, and the
company can focus on improving its engagement on these channels.
● The company can get more data on the educational background of the leads,
which might help in getting better insights.