Lead Scoring Assignment Summary

An education company wanted to increase its low lead conversion rate. It provided a dataset for building a logistic regression model that assigns each lead a score between 0 and 100. The approach included data cleaning, EDA, feature selection, and model evaluation. The final model used about 10 features to predict lead conversion and reached an AUC of 0.87 on the train data set.

Uploaded by

Akshay Patil

Lead Scoring Case Study Summary:

Problem Description:

An education company named X Education sells online courses to industry professionals. Although X Education gets
a lot of leads, its lead conversion rate is poor, at around 30%.

X Education needs help building a logistic regression model that assigns a lead score between 0 and 100 to
each lead, which the company can use to target potential customers. A higher score means that the lead is hot,
i.e. most likely to convert, whereas a lower score means that the lead is cold and will most likely not
convert. The CEO, in particular, has given a ballpark target lead conversion rate of around 80%.

Approach:

o Reading & understanding the data:

✓ In this step we took a first look at the dataset and inspected the following:
✓ The first few and last few rows
✓ The shape of the data
✓ The data type of each column
✓ The descriptive statistics for the numerical columns
✓ We also did basic research to get a better understanding of the domain
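
The inspection steps above can be sketched with pandas as follows. The small DataFrame and its column names are hypothetical stand-ins, since the summary does not show the actual dataset or how it was loaded.

```python
import pandas as pd

# Hypothetical stand-in for the leads dataset (not the real data).
leads = pd.DataFrame({
    "Lead Source": ["Google", "Direct Traffic", "Select", "Google"],
    "TotalVisits": [5.0, 2.0, None, 8.0],
    "Converted": [1, 0, 0, 1],
})

print(leads.head())    # first few rows
print(leads.tail())    # last few rows
print(leads.shape)     # (number of rows, number of columns)
print(leads.dtypes)    # data type of each column

# describe() summarizes only the numerical columns by default.
summary = leads.describe()
print(summary)
```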

o Data Cleaning:

✓ Converted ‘Select’ values to null values.

✓ Missing value treatment:

✓ Further dropped columns with only one unique value:

✓ Dropped columns with exactly 2 unique values, after confirming a data imbalance of > 85% in them
✓ Checked for duplicates, none were found.
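
A minimal sketch of these cleaning steps, assuming pandas and a toy DataFrame with hypothetical column names. Reading "data imbalance of > 85%" as "the dominant value covers more than 85% of rows" is an interpretation, not something the summary spells out.

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical columns; the real dataset is not shown in the summary.
leads = pd.DataFrame({
    "Lead Number": list(range(10)),
    "Specialization": ["Select", "Finance", "Select", "Marketing", "Finance",
                       "IT", "Select", "HR", "IT", "Finance"],
    "Country": ["India"] * 10,              # only one unique value
    "Do Not Call": ["No"] * 9 + ["Yes"],    # two values, 90% "No"
    "Converted": [0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
})

# Treat the placeholder 'Select' as a missing value.
cleaned = leads.replace("Select", np.nan)

# Drop columns that carry a single unique value.
single_valued = [c for c in cleaned.columns if cleaned[c].nunique(dropna=True) == 1]
cleaned = cleaned.drop(columns=single_valued)

# Drop two-valued columns whose dominant value exceeds 85% (target excluded).
imbalanced = [
    c for c in cleaned.columns
    if c != "Converted"
    and cleaned[c].nunique(dropna=True) == 2
    and cleaned[c].value_counts(normalize=True).max() > 0.85
]
cleaned = cleaned.drop(columns=imbalanced)

# Count fully duplicated rows.
n_duplicates = int(cleaned.duplicated().sum())
print(single_valued, imbalanced, n_duplicates)
```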

o Exploratory Data Analysis:

✓ Performed basic EDA to identify patterns in the data.
✓ Performed bivariate analysis on the categorical columns to see how they vary w.r.t. the Converted column.
✓ Dropped the column ‘Last Notable Activity’ because the feature is generated by the sales team.
✓ Performed bivariate analysis on the numerical columns by plotting box plots.
✓ Also used a heatmap to identify highly correlated numerical columns.
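
The correlation check behind the heatmap can be illustrated without plotting. This is a sketch on synthetic data: the column names and the 0.8 threshold are assumptions for illustration, not values taken from the summary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
visits = rng.poisson(4, size=200).astype(float)

# Synthetic numerical columns (hypothetical names); the second is built from the first.
numeric = pd.DataFrame({
    "TotalVisits": visits,
    "Page Views Per Visit": visits * 0.5 + rng.normal(0, 0.2, size=200),
    "Total Time Spent on Website": rng.normal(500, 150, size=200),
})

corr = numeric.corr()   # the matrix a heatmap would display
threshold = 0.8         # assumed cut-off for "highly correlated"

cols = corr.columns
high_pairs = [
    (cols[i], cols[j], round(float(corr.iloc[i, j]), 2))
    for i in range(len(cols))
    for j in range(i + 1, len(cols))
    if abs(corr.iloc[i, j]) > threshold
]
print(high_pairs)
```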

o Data Preparation:

✓ Created dummy variables for the categorical columns with more than 2 categories using the
pd.get_dummies function
✓ Performed a 70-30 split of the leads dataset into Train and Test sets respectively
✓ Performed feature scaling using the standard scaler.
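
The preparation steps above might look like the following sketch, assuming scikit-learn for the split and the scaler. The data, the column names, `drop_first=True`, and `random_state=42` are illustrative assumptions; the summary does not state them.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data with hypothetical columns.
leads = pd.DataFrame({
    "Lead Source": ["Google", "Direct", "Referral", "Google", "Direct",
                    "Referral", "Google", "Direct", "Referral", "Google"],
    "TotalVisits": [5.0, 2.0, 0.0, 8.0, 3.0, 1.0, 4.0, 6.0, 2.0, 7.0],
    "Converted": [1, 0, 0, 1, 0, 1, 0, 1, 0, 1],
})

# Dummy variables for a categorical column with more than 2 categories.
# drop_first=True (an assumption) avoids a redundant, perfectly collinear column.
dummies = pd.get_dummies(leads["Lead Source"], prefix="Lead Source",
                         drop_first=True, dtype=int)
X = pd.concat([leads[["TotalVisits"]], dummies], axis=1)
y = leads["Converted"]

# 70-30 split into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42
)
X_train, X_test = X_train.copy(), X_test.copy()

# Fit the scaler on the train set only, then reuse it on the test set.
scaler = StandardScaler()
X_train[["TotalVisits"]] = scaler.fit_transform(X_train[["TotalVisits"]])
X_test[["TotalVisits"]] = scaler.transform(X_test[["TotalVisits"]])
print(X_train)
```

Fitting the scaler on the train set alone keeps information from the test set out of the model inputs.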

o Model Building:

✓ We shortlisted the top 15 features using the Recursive Feature Elimination (RFE) technique to build
our first model.
✓ In the next few iterations, we further fine-tuned our model by eliminating features with p-values > 0.05
and Variance Inflation Factor (VIF) values > 5. Using VIF helps reduce the impact of multicollinearity in
the data.
✓ Once the model was less complex, with ~10 features, we predicted probabilities on the train set and
created a new column, predicted, set to 1 if the probability is greater than 0.5 and 0 otherwise.
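
A minimal sketch of RFE followed by VIF-based pruning, on synthetic data with hypothetical feature names. It assumes scikit-learn's `RFE` with `LogisticRegression`, and computes VIF as the diagonal of the inverse correlation matrix (equivalent to 1 / (1 - R²) for each feature regressed on the others). The p-value step is not shown, and the summary does not say which libraries the authors used.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame(rng.normal(size=(n, 6)), columns=[f"f{i}" for i in range(6)])
X["f5"] = X["f0"] + rng.normal(scale=0.1, size=n)   # nearly collinear with f0
y = (X["f0"] + 0.5 * X["f1"] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Step 1: shortlist features with RFE (4 here; the source shortlisted 15).
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4)
rfe.fit(X, y)
selected = list(X.columns[rfe.support_])

def vif_table(frame):
    """VIF per column, taken from the diagonal of the inverse correlation matrix."""
    corr = np.atleast_2d(np.corrcoef(frame.to_numpy(), rowvar=False))
    return pd.Series(np.diag(np.linalg.inv(corr)), index=frame.columns)

# Step 2: repeatedly drop the feature with the highest VIF while it exceeds 5.
kept = list(selected)
while len(kept) > 1:
    vif = vif_table(X[kept])
    if vif.max() <= 5:
        break
    kept.remove(vif.idxmax())

final_vif = vif_table(X[kept])
print(selected, kept)
print(final_vif.round(2))
```

Dropping one feature at a time matters because removing a single collinear feature usually lowers the VIF of its partner as well.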

o Model Evaluation:

✓ We also calculated the metrics sensitivity, specificity, precision, and accuracy.

✓ To make predictions on the train dataset, we found an optimum cut-off of 0.34 at the intersection
of the sensitivity, specificity, and accuracy curves, as shown in the figure below.
✓ We also plotted the ROC curve to find the area under the curve (0.87 for the train data set).
✓ We also tried finding the optimal cut-off using the Precision vs. Recall trade-off curve. However, the
model's sensitivity and precision went below the 75% mark with that cut-off, so it was not used as the
final cut-off.
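
The cut-off search described above can be sketched as a sweep over candidate thresholds. The labels and probabilities below are synthetic, so the resulting cut-off will not match the 0.34 reported for the real data; the function name `metrics_at` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=1000)
# Synthetic predicted probabilities: converted leads tend to score higher.
y_prob = np.clip(0.35 + 0.3 * y_true + rng.normal(0, 0.2, size=1000), 0, 1)

def metrics_at(cutoff):
    """Return (sensitivity, specificity, accuracy, precision) at a cut-off."""
    pred = (y_prob > cutoff).astype(int)
    tp = int(((pred == 1) & (y_true == 1)).sum())
    tn = int(((pred == 0) & (y_true == 0)).sum())
    fp = int(((pred == 1) & (y_true == 0)).sum())
    fn = int(((pred == 0) & (y_true == 1)).sum())
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, specificity, accuracy, precision

cutoffs = np.arange(0.0, 1.0, 0.01)
table = np.array([metrics_at(c) for c in cutoffs])

# The curves cross where sensitivity and specificity are closest to each other.
gap = np.abs(table[:, 0] - table[:, 1])
best_cutoff = float(cutoffs[gap.argmin()])
print(best_cutoff, table[gap.argmin()].round(3))
```

Accuracy is a weighted average of sensitivity and specificity, so wherever those two curves meet, the accuracy curve passes through the same point. That is why a single intersection of all three curves exists.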

o Predictions on the Test Set:

✓ After finalizing the optimum cut-off of 0.34 and calculating the metrics on the train set, we made
predictions on the test data set. Below are the observations:
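
As an illustration of how the cut-off turns predicted probabilities into lead scores and labels, here is a small sketch. The probabilities are made up, and mapping the score as probability × 100 is an assumption; the summary only states that scores range from 0 to 100.

```python
import numpy as np

CUTOFF = 0.34   # optimum cut-off reported in the summary

# Hypothetical predicted conversion probabilities for five test leads.
test_prob = np.array([0.05, 0.34, 0.35, 0.62, 0.91])

# Assumed mapping: lead score = probability scaled to the 0-100 range.
lead_score = np.round(test_prob * 100).astype(int)

# A lead is predicted to convert when its probability exceeds the cut-off.
predicted = (test_prob > CUTOFF).astype(int)

print(lead_score.tolist())
print(predicted.tolist())
```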

o Final Observations:

Below are the predictor variables that we used in our final model and their relative importance:
