Python - Project 2 Problem Statement

The document outlines a project focused on predicting customer interest in caravan insurance using a dataset with 86 variables. It includes a formal problem statement, evaluation criteria for two parts of the project, and general guidelines for model building and submission. Successful completion requires passing a quiz and achieving a specific Fbeta score on test data predictions.

Uploaded by

Hem Kuniyal

16/01/2019 Project2

To Mail or Not to Mail

Direct mailings to a company’s potential customers – “junk mail” to many – can be a very effective way for them
to market a product or a service. However, as we all know, much of this junk mail is really of no interest to the
people that receive it. Most of it ends up thrown away, not only wasting the money that the company spent on it,
but also filling up landfill waste sites or needing to be recycled.

If the company had a better understanding of who their potential customers were, they would know more
accurately who to send it to, so some of this waste and expense could be reduced.

Data Files

Train Dataset = carvan_train.csv

Test Dataset = carvan_test.csv

Formal Problem Statement

We want you to predict whether a customer is interested in a caravan insurance policy from other data about the
customer. Information about each customer consists of 86 variables and includes product usage data and
socio-demographic data derived from zip area codes. The data was supplied based on a real-world business problem.
The training set contains over 5000 descriptions of customers, including whether or not they have a caravan
insurance policy. The test set contains 4000 customers whose target variable is not shared with you.

Target Variable is V86.

You need to use the train data to build the model and then use that model to predict the outcome for the given
test data. The test dataset does not have a response column; you need to predict those values and submit them in
csv format. We expect outcomes to be either 0 or 1.

Evaluation Criterion

Part 1:

You will first attempt Part 1 of this project which is a quiz. You can access it through LMS. This quiz needs to be
answered based on exploration of the dataset given and some generic questions about algorithms discussed in
the course. Consider only the training dataset for data cleaning and exploration to answer the quiz questions.
There will be 10 questions of which you need to get at least 7 correct in order to pass the project.

Part 2:

Here you work on creating the machine learning models and choosing the one which gives the best
performance. You can refer to the Project Process Guides provided in LMS to understand how to approach and
work on a project.

To get a passing grade in this project, you need an F-beta score greater than 0.26 (beta = 2) on your test data
predictions.
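
As a rough illustration of the metric, the sketch below computes the F-beta score from true positives, false positives and false negatives using the standard formula (1 + beta^2) * precision * recall / (beta^2 * precision + recall). With beta = 2, recall counts for more than precision. The function name `fbeta` and the labels are made up for this example and are not taken from the caravan data; scikit-learn's `fbeta_score(y_true, y_pred, beta=2)` should give the same value on binary labels.

```python
def fbeta(y_true, y_pred, beta=2.0):
    """F-beta for binary 0/1 labels; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged buyers
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged, but not buyers
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # buyers that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up labels for illustration only (assumption, not real project data).
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]
score = fbeta(y_true, y_pred, beta=2)
print(round(score, 3))  # precision 0.5, recall 2/3 -> 0.625
```

Because beta = 2 weights recall more heavily, a model that misses most of the rare positive cases is penalised more than one that raises some false alarms.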

Submission:

You need to use the train data to build the model and then use that model to predict the outcome for the given
test data. We expect outcomes to be either 0 or 1. Your submission will be a csv file with a single column
containing your predictions for the target. The order of these predictions should be the same as the order of
the observations in the test data to which they correspond.
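
One possible way to produce such a file is sketched below using only the standard library. The function name `write_submission`, the output file name and the column header `V86` are assumptions made for this example; the statement above only asks for a single column of 0/1 predictions in test-data order, so check the LMS instructions for the exact header expected.

```python
import csv
import os
import tempfile


def write_submission(predictions, path, header="V86"):
    """Write 0/1 predictions as a single-column csv, keeping their order."""
    rows = []
    for p in predictions:
        if p not in (0, 1):
            raise ValueError(f"expected 0 or 1, got {p!r}")
        rows.append([int(p)])
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([header])  # header name is an assumption
        writer.writerows(rows)     # one prediction per line, same order as the test rows


# Made-up predictions, written to a temporary folder for illustration.
preds = [0, 1, 0, 0]
out_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
write_submission(preds, out_path)
```

The predictions are never sorted or shuffled before writing, so row i of the file corresponds to row i of the test data.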

You can make as many submissions as you want. [We might ask you to submit the script which was used to generate
the submission at any time.]

General Guidelines for the project

- Since it's a small dataset and you can quickly run many experiments, we are not providing any benchmark
  script for you to get started.
- Another reason for not providing a benchmark script is that the entire dataset is conveniently numeric, so
  you need to spend very little time preparing the data.
- You will find the data details in the 'data dictionary.txt' file.
- You will notice that many variables are numeric in the data but should really be categorical. Handling those
  variables properly might improve your model.
- The real catch in this problem is the very low number of responses equal to 1. Simpler models will perform
  very poorly on this data. You will have to focus on parameter tuning; since the dataset is fairly small, that
  should not be an issue.
- As mentioned in Project 1, do break your train data into two parts: use one part to build your model and the
  other to assess its performance, so that when submitting your results you know how your model performs
  rather than waiting for our evaluations.
- While breaking your data into two parts, make sure that you use stratified sampling so that both parts have
  the same percentages of 0/1 as the original data. This way you'll avoid falling into the trap of severe
  over/underfit while assessing the performance of your model.
- In case of any doubt, feel free to reach out to us.
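
The stratified split recommended above can be sketched as follows: shuffle the row indices within each class separately, then take the same fraction of each class for the validation part. The function name `stratified_split`, the 20% validation fraction, the seed and the labels are all assumptions chosen for illustration. In practice, scikit-learn's `train_test_split(..., stratify=y)` does the same job.

```python
import random
from collections import defaultdict


def stratified_split(labels, valid_fraction=0.2, seed=42):
    """Return (train_idx, valid_idx) with roughly equal class shares in both parts."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                          # shuffle within one class only
        n_valid = round(len(idx) * valid_fraction)
        valid_idx.extend(idx[:n_valid])
        train_idx.extend(idx[n_valid:])
    return sorted(train_idx), sorted(valid_idx)


# Made-up, heavily imbalanced labels (assumption, not the real class ratio).
labels = [1] * 60 + [0] * 940
train_idx, valid_idx = stratified_split(labels, valid_fraction=0.2)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
valid_rate = sum(labels[i] for i in valid_idx) / len(valid_idx)
print(train_rate, valid_rate)  # both parts keep the same share of 1s
```

With a plain random split, the few rows with target 1 could end up unevenly divided, which makes the validation score a poor guide to how the model will do on the test data.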

In order to clear this project, you are required to clear both, Part 1 as well as Part 2 of this assignment.

Wish you all the best!

