Business Report On Data Mining: By: Aditya Janardan Hajare Batch: PGPDSBA Mar'C21 Group 1

This business report analyzes insurance claim data using machine learning models. It imports claim data and cleans outliers. Univariate analysis finds relationships between variables. CART, random forest, and neural network classifiers are built on train and test splits. Neural networks have the best performance. Recommendations include collecting more demographic data, offering online discounts, improving underperforming agencies, and analyzing claim processing.

BUSINESS REPORT ON

DATA MINING
By: Aditya Janardan Hajare
Batch: PGPDSBA Mar’C21 Group 1
Index

1. Problem 2: CART-RF-ANN
1.1 Importing Data
1.2 Understanding the data

2. Question 1
2.1 Univariate Analysis for Non-Categorical variables
2.2 Univariate Analysis for Categorical variables
2.3 Multivariate Analysis for non-categorical variables

3. Question 2
3.1 Building a CART classifier
3.2 Building a Random Forest classifier
3.3 Building a Neural Network classifier

4. Question 3
4.1 CART – AUC and ROC for the training data.
4.2 Random Forest model evaluation on training data
4.3 Neural Network model evaluation on training data

5. Question 4

6. Question 5
1) Problem 2: CART-RF-ANN

An Insurance firm providing tour insurance is facing higher claim frequency. The management decides to
collect data from the past few years. You are assigned the task to make a model which predicts the claim
status and provide recommendations to management. Use CART, RF & ANN and compare the models'
performances in train and test sets.

Dataset for Problem 2: insurance_part2_data-1.csv

Attribute Information:
1. Target: Claim Status (Claimed)
2. Code of tour firm (Agency_Code)
3. Type of tour insurance firms (Type)
4. Distribution channel of tour insurance agencies (Channel)
5. Name of the tour insurance products (Product)
6. Duration of the tour (Duration)
7. Destination of the tour (Destination)
8. Amount of sales of tour insurance policies (Sales)
9. The commission received for tour insurance firm (Commission)
10. Age of insured (Age)

1.1 Importing the data

1.2 Understanding the data

Using df.info(), the following points can be interpreted about the dataset:

1) There are 3000 rows and 10 columns in total.

2) The data has different data types, viz. int64, object and float.
3) All variables except Age, Commission, Duration and Sales need to be changed to the categorical
type. The remaining variables are treated as categorical for building and interpreting the model.
4) There are no null values present.
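
The conversion described in point 3 can be sketched as below. This is a minimal illustration, not the code used in the report: the three rows are made-up stand-ins for insurance_part2_data-1.csv, and the column names are assumed from the attribute list above.

```python
import pandas as pd

# Hypothetical stand-in rows; the real data would be loaded with
# pd.read_csv("insurance_part2_data-1.csv").
df = pd.DataFrame({
    "Age": [48, 36, 39],
    "Agency_Code": ["C2B", "EPX", "CWT"],
    "Type": ["Airlines", "Travel Agency", "Travel Agency"],
    "Claimed": ["No", "No", "Yes"],
    "Commission": [0.70, 0.00, 5.94],
    "Channel": ["Online", "Online", "Online"],
    "Duration": [7, 34, 3],
    "Sales": [2.51, 20.00, 9.90],
    "Product": ["Customised Plan", "Customised Plan", "Customised Plan"],
    "Destination": ["ASIA", "ASIA", "Americas"],
})

# Keep the four numeric variables as they are; convert everything else.
numeric_cols = ["Age", "Commission", "Duration", "Sales"]
for col in df.columns.difference(numeric_cols):
    df[col] = df[col].astype("category")

df.info()                        # shows row count, dtypes and non-null counts
null_counts = df.isnull().sum()  # per-column count of missing values
print(null_counts)
```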

The above data can be fetched using df.describe().T. The following points can be interpreted from the output.

1) The mean age of the insured customers is 38 years and the median age is 36 years. The minimum and
maximum ages are 8 and 84 respectively.
2) The standard deviation is high for Duration of the tour.

The above table shows the unique values of each variable.

There are 139 duplicate rows in total. These rows are not treated, as they account for only about 4.6% of
the entries (139 of 3000) and, since the dataset has no customer identifier, identical rows may belong to
different customers.
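
A duplicate check of this kind can be sketched as follows. The four-row frame is a hypothetical example, not the report's data; only the figures 139 and 3000 are taken from the text above.

```python
import pandas as pd

# Hypothetical rows: rows 1 and 3 repeat row 0 exactly.
df = pd.DataFrame({
    "Age": [36, 36, 48, 36],
    "Sales": [20.0, 20.0, 9.5, 20.0],
    "Claimed": ["No", "No", "Yes", "No"],
})

# duplicated() flags every repeat of an earlier row (the first copy is not flagged).
n_dupes = int(df.duplicated().sum())
share = n_dupes / len(df)
print(f"{n_dupes} duplicate rows ({share:.1%} of the data)")

# Share reported for the insurance data: 139 duplicates in 3000 rows.
reported_share = 139 / 3000
print(f"{reported_share:.1%}")
```

If the duplicates were to be removed instead, df.drop_duplicates() would keep the first copy of each repeated row.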
2. Question 1

2.1 Univariate Analysis for Non-Categorical variables

Below points can be interpreted from the univariate analysis

1) Heavy outliers are present in Age, Commission, Duration and Sales.

2) The outliers need to be treated for better results.
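
The report does not state which outlier treatment was applied. One common option, shown here purely as an assumption, is to cap values at 1.5 times the interquartile range beyond the quartiles. The helper name cap_outliers_iqr and the sample values are hypothetical.

```python
import pandas as pd

def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values lying more than k * IQR outside the first/third quartile."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Hypothetical tour durations with one extreme value.
duration = pd.Series([5.0, 7.0, 10.0, 12.0, 15.0, 20.0, 400.0])
capped = cap_outliers_iqr(duration)
print(capped.tolist())
```

Capping keeps all rows, which matters for a dataset of only 3000 records; dropping the outlying rows would be the alternative.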

2.2 Univariate Analysis for Categorical variables

The following points can be interpreted from the univariate analysis of the categorical variables.

1) EPX (Agency_Code) has the highest number of policies sold, and its claim rate is relatively low.
2) The claim rate is high for insurance provided by Airlines, and the online channel is preferred the
most for claiming the insurance.
3) Customers prefer the customized plan over the standard plans. The claim rate is high for the
Gold Plan even though this plan is the least preferred, followed by the Silver Plan.
4) The highest number of customers travel to Asian countries, and the number of claims is also high there.
However, when the ratio of claims to number of customers is compared, the claim rate is highest for
customers travelling to America, followed by Europe and Asia.

2.3 Multivariate Analysis for non-categorical variables

The following points can be interpreted from the above pairplot and heat map of Age, Commission, Duration and Sales.

1) The pairplot shows no strong relationship between most pairs of the non-categorical variables.

2) The heat map quantifies the relationships between these variables and shows one exception:
there is a strong relationship between Sales and Commission.

The above table is the output after changing the data type to categorical for Agency_Code, Type,
Claimed, Channel, Product Name and Destination.
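
The correlation check behind the heat map in 2.3 can be sketched as below. The data is synthetic and deliberately constructed so that Commission depends on Sales. It illustrates the mechanics only and is not evidence for the relationship reported above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
sales = rng.uniform(10, 200, size=200)

# Synthetic stand-in: Commission is built from Sales plus noise,
# Age and Duration are independent of everything else.
demo = pd.DataFrame({
    "Age": rng.integers(18, 80, size=200),
    "Duration": rng.integers(1, 120, size=200),
    "Sales": sales,
    "Commission": 0.3 * sales + rng.normal(0, 3, size=200),
})

corr = demo.corr()  # Pearson correlation between every pair of columns
print(corr.round(2))
# With seaborn installed, sns.heatmap(corr, annot=True) would draw the heat map.
```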
3. Question 2

Below is the Train and Test data split
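
A split of this kind can be sketched with scikit-learn as follows. The 70:30 ratio, the random_state and the stratification are assumptions for illustration, since the report does not state them, and the features are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic stand-in for the predictors and the Claimed target.
X = pd.DataFrame({
    "Age": rng.integers(18, 80, size=100),
    "Sales": rng.uniform(10, 200, size=100),
})
y = pd.Series(rng.integers(0, 2, size=100), name="Claimed")

# stratify=y keeps the claimed / not-claimed ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y
)
print(X_train.shape, X_test.shape)
```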

A Decision Tree classifier was built and the tree was plotted using http://webgraphviz.com/

Link to open the Decision Tree Classifier: Click Here

3.1 Building a CART classifier

Shown above are the predicted classes and probabilities using the CART classifier.
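
A minimal sketch of a CART workflow with scikit-learn is given below. The hyperparameters, feature names and synthetic data are assumptions and not the values used in the report. The DOT text produced at the end is the kind of input that a Graphviz viewer such as webgraphviz accepts.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic two-class data standing in for the insurance features.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Hypothetical settings; depth and leaf size limit over-fitting.
cart = DecisionTreeClassifier(
    criterion="gini", max_depth=4, min_samples_leaf=10, random_state=0
)
cart.fit(X, y)

pred_class = cart.predict(X)       # predicted class label per row
pred_prob = cart.predict_proba(X)  # one probability column per class

# out_file=None returns the tree as DOT text instead of writing a file.
dot_text = export_graphviz(
    cart,
    out_file=None,
    feature_names=[f"f{i}" for i in range(5)],
    class_names=["No", "Yes"],
)
print(dot_text[:60])
```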

3.2 Building a Random Forest classifier

# n_estimators values are kept small because the kernel failed multiple times

Shown above are the predicted classes and probabilities using the Random Forest classifier.
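
A Random Forest with a deliberately small number of trees can be sketched as follows. All hyperparameter values and the synthetic data are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data standing in for the insurance features.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# A small n_estimators keeps memory and run time low.
rf = RandomForestClassifier(
    n_estimators=50, max_depth=6, min_samples_leaf=10, random_state=0
)
rf.fit(X, y)

rf_class = rf.predict(X)       # predicted class label per row
rf_prob = rf.predict_proba(X)  # class probabilities averaged over the trees
print(rf.feature_importances_.round(3))
```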
3.3 Building a Neural Network classifier

Shown above are the predicted classes and probabilities using the Neural Network classifier.
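
A neural network classifier can be sketched with scikit-learn's MLPClassifier as below. The layer size, iteration limit and the scaling step are assumptions, since the report does not list its settings. Scaling is included because neural networks are sensitive to the range of the inputs.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data standing in for the insurance features.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Standardise the inputs, then fit a network with one hidden layer.
nn = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0),
)
nn.fit(X, y)

nn_class = nn.predict(X)       # predicted class label per row
nn_prob = nn.predict_proba(X)  # one probability column per class
print(nn_prob[:3].round(3))
```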

4. Question 3

4.1 CART – AUC and ROC for the training data.

AUC Score – 0.845

4.1.1 CART – AUC and ROC for the test data

AUC Score – 0.798

4.1.2 CART - Confusion Matrix and Classification report for the training data.

Accuracy of the train data – 0.8

Classification of the CART training data

4.1.3 CART - Confusion Matrix and Classification report for the testing data.

Accuracy of the test data – 0.7555

Classification of the CART testing data

4.1.4 Inferences from the CART Training and Testing data.

1) The accuracy value for both train and test data has no major difference
2) The precision value for train data is higher than test data
3) The f1-score also has no major difference.
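
The measures used in this section (AUC, confusion matrix, accuracy and the classification report) can be computed as sketched below. The labels and probabilities are a small hand-made example, not the report's predictions, and the 0.5 cut-off is an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)

# Hand-made example: 1 = claimed, 0 = not claimed.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]  # predicted P(claimed)

# AUC uses the probabilities; the other measures need hard class labels.
auc = roc_auc_score(y_true, y_prob)
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

cm = confusion_matrix(y_true, y_pred)  # rows = actual, columns = predicted
acc = accuracy_score(y_true, y_pred)

print("AUC:", auc)
print(cm)
print("Accuracy:", acc)
print(classification_report(y_true, y_pred))
```

Running the same calls once on the training set and once on the test set gives the train/test comparison made in the inferences above.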

4.2 Random Forest model evaluation on training data

4.2.1 Random Forest – AUC and ROC for the training data.

Area under the curve is 0.8665081

4.2.2 Random Forest – AUC and ROC for the testing data.

Area under the curve is 0.818548

4.2.3 Random Forest - Confusion Matrix and Classification report for the training data.

Accuracy of the train data – 0.77888

Classification of the Random Forest training data

4.2.4 Random Forest - Confusion Matrix and Classification report for the testing data.

Accuracy of the test data – 0.788571

Classification of the Random Forest testing data

4.2.5 Inferences from the Random Forest Training and Testing data.

1) The accuracy value for both train and test data has no major difference
2) The precision value for train data is higher than test data
3) The f1-score also has no major difference.

4.3 Neural Network model evaluation on training data

4.3.1 Neural Network – AUC and ROC for the training data.

Area under the curve is 0.815826

4.3.2 Neural Network – AUC and ROC for the testing data.

Area under the curve is 0.782790

4.3.3 Neural Network - Confusion Matrix and Classification report for the training data.

Accuracy of the training data – 0.788571

Classification of the Neural Network training data

4.3.4 Neural Network - Confusion Matrix and Classification report for the testing data.

Accuracy of the testing data – 0.764444

Classification of the Neural Network testing data

4.3.5 Inferences from the Neural Network Training and Testing data.

1) The accuracy value for both train and test data has no major difference
2) The precision value for train and test data is the same
3) The f1-score for train and test data is the same
5. Question 4

Combined ROC curve for train data using CART, Random Forest and Neural Network models

Combined ROC curve for test data using CART, Random Forest and Neural Network models
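
The data behind a combined ROC plot can be produced as sketched below: each model is fitted once and its ROC points are collected under the model's name. The synthetic data, the 70:30 split and all hyperparameters are assumptions, so the AUC values printed here are unrelated to those in the report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

models = {
    "CART": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Neural Network": MLPClassifier(
        hidden_layer_sizes=(20,), max_iter=1000, random_state=0
    ),
}

curves = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]  # P(class 1) on the test set
    fpr, tpr, _ = roc_curve(y_test, prob)
    curves[name] = (fpr, tpr, roc_auc_score(y_test, prob))
    # With matplotlib, plt.plot(fpr, tpr, label=name) draws one line per model.

for name, (_, _, auc) in curves.items():
    print(f"{name}: test AUC = {auc:.3f}")
```

Replacing X_test and y_test with X_train and y_train in the loop gives the corresponding curves for the training data.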

Summary: Neural Network model has better accuracy, precision, recall and better f1-score than CART
and Random Forest. Hence I am selecting Neural Network model.
6. Question 5

Based on the dataset available for the analysis, more data related to age group, time, incident, location,
airline names etc. would help to find more correlations in the data and to analyze it in a more
detailed manner.

The online channel is the most preferred for buying insurance. Hence the company should give discounts
to customers enrolling online. Doing so will reduce offline registrations, which in turn will reduce
offline overheads and mistakes.

The JZI agency has the lowest sales, which is hurting the business. The company should either help the
agency grow with a market penetration plan to reach the maximum possible number of customers, or find an
alternative to JZI.

Most sales are made by agencies, but claims are processed by Airlines. This needs a deeper analysis to
understand the reason.
