
Machine Learning - Project

Ashit Debdas
BACP-2020

Table of Contents
1 Project Objective
1.1 Problem One: Clustering
1.2 Problem Two: CART-RF-ANN
2 Exploratory Data Analysis – Step by Step Approach
2.1 Install Necessary Packages and Invoke Library
2.2 Set up Working Directory
2.3 Import and Read the Data Sets
3 Variable Identification
4 Missing Value Treatment
5 Insights from Problem One
5.1 Read the data and do exploratory data analysis
5.2 Do you think scaling is necessary for clustering in this case?
5.3 Apply hierarchical clustering to scaled data, identify the optimum number of clusters using the dendrogram and briefly describe them
5.4 Apply K-Means clustering on scaled data and determine the optimum clusters
5.5 Describe cluster profiles for the clusters defined and recommend different promotional strategies for different clusters
6 Insights from Problem Two
6.1 Read the dataset, do the descriptive statistics and the null value condition check, and write an inference
6.2 Data Split: Split the data into test and train, build classification models CART, Random Forest and ANN
6.3 Performance Metrics: Check the performance of predictions on train and test sets using the confusion matrix
6.4 Final Model: Compare all the models and write an inference on which model is optimized
6.5 Inference: Based on these predictions, what are the business insights and recommendations

1 Project Objective
1.1 Problem One
A leading bank wants to develop a customer segmentation to give promotional offers to its customers. They collected a sample that summarizes the activities of users during the past few months. You are given the task to identify the segments based on credit card usage. The data of the last few months is in "bank_marketing_part1_Data (2).csv".

1.2 Problem Two
An insurance firm providing tour insurance is facing a higher claim frequency. The management decides to collect data from the past few years. You are assigned the task of building a model that predicts the claim status and of providing recommendations to the management. Use CART, RF & ANN and compare the models' performance on the train and test sets. The data of the last few years is in "insurance_part2_data (1).csv".

2 Exploratory Data Analysis – Step by step approach


A typical data exploration activity consists of the following steps.
2.1 Install necessary packages and invoke Library
Before starting this section, the necessary packages are installed and the associated libraries are invoked. Keeping all the package loads in the same place makes the code more readable and efficient.
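A minimal sketch of this step is shown below; the package list is an assumption based on the techniques used later in the report (clustering, CART, random forest, ANN and confusion matrices), since the report does not list the exact packages.

# Install once (commented out after the first run), then load the libraries
# install.packages(c("cluster", "factoextra", "rpart", "rpart.plot",
#                    "randomForest", "neuralnet", "caret"))
library(cluster)        # hierarchical clustering, silhouette
library(factoextra)     # cluster visualisation, fviz_nbclust
library(rpart)          # CART decision trees
library(rpart.plot)     # decision tree plots
library(randomForest)   # random forest, tuneRF
library(neuralnet)      # artificial neural network
library(caret)          # confusionMatrix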
2.2 Set up Working Directory
Setting a working directory at the start of the R session makes importing and exporting data files and code files easier. The working directory is the location/folder on the PC where the data and code related to the project are kept, which keeps the project organised.
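A minimal sketch, using a placeholder path that should be replaced with the actual project folder:

# Set the working directory to the project folder (placeholder path)
setwd("C:/Projects/Machine-Learning-Project")
getwd()   # confirm the current working directory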
2.3 Import and Read the data
The given dataset is in .csv format. Hence, the command ‘read.csv’ is used for importing the file.
 Problem One for Clustering: bank_marketing_part1_Data (2).csv
 Problem Two for CART-RF-ANN: insurance_part2_data (1).csv
3 Variable Identification
The dataset is analyzed for a basic understanding of the features and the data it contains. It is usually an activity by which data is explored and organized.
• Variable classes:
Problem One has 210 rows and 7 columns.
Problem Two has 3000 rows and 10 columns.
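A short sketch of how these dimensions and classes can be checked, assuming the data frames customer_segm and Insurance created with read.csv as shown in the next section:

# Rows and columns of each dataset
dim(customer_segm)   # expected: 210 7
dim(Insurance)       # expected: 3000 10
# Class of each variable
sapply(customer_segm, class)
sapply(Insurance, class)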

4 Missing Value Treatment


Missing value treatment is an important step in exploratory data analysis. Missing data in the training set can reduce the power of a model or lead to a biased model, because the behavior of and relationships with other variables have not been analyzed correctly, and this can lead to wrong predictions or classifications. Neither of the two datasets under scrutiny has any missing values.

Problem one:

> customer_segm = read.csv("bank_marketing_part1_Data (2).csv", header = TRUE)


> anyNA(customer_segm)
[1] FALSE

Problem two:

> Insurance = read.csv("insurance_part2_data (1).csv", header = TRUE)


> anyNA(Insurance)
[1] FALSE

5 Insights from Problem one


5.1 Read the data and do exploratory data analysis.
> summary(customer_segm)
spending advance_payments probability_of_full_payment current_balance credit_limit min_payment_amt
Min. :10.59 Min. :12.41 Min. :0.8081 Min. :4.899 Min. :2.630 Min. :0.7651
1st Qu.:12.27 1st Qu.:13.45 1st Qu.:0.8569 1st Qu.:5.262 1st Qu.:2.944 1st Qu.:2.5615
Median :14.36 Median :14.32 Median :0.8734 Median :5.524 Median :3.237 Median :3.5990
Mean :14.85 Mean :14.56 Mean :0.8710 Mean :5.629 Mean :3.259 Mean :3.7002
3rd Qu.:17.30 3rd Qu.:15.71 3rd Qu.:0.8878 3rd Qu.:5.980 3rd Qu.:3.562 3rd Qu.:4.7687
Max. :21.18 Max. :17.25 Max. :0.9183 Max. :6.675 Max. :4.033 Max. :8.4560
max_spent_in_single_shopping
Min. :4.519
1st Qu.:5.045
Median :5.223
Mean :5.408
3rd Qu.:5.877
Max. :6.550
As we can see from the data summary, there are 7 columns. Every column has a unique name, and the minimum, quartiles, median, mean and maximum can be read off for each variable.

> str(customer_segm)
'data.frame': 210 obs. of 7 variables:
$ spending : num 19.9 16 18.9 10.8 18 ...
$ advance_payments : num 16.9 14.9 16.4 13 15.9 ...
$ probability_of_full_payment : num 0.875 0.906 0.883 0.81 0.899 ...
$ current_balance : num 6.67 5.36 6.25 5.28 5.89 ...
$ credit_limit : num 3.76 3.58 3.75 2.64 3.69 ...
$ min_payment_amt : num 3.25 3.34 3.37 5.18 2.07 ...
$ max_spent_in_single_shopping: num 6.55 5.14 6.15 5.18 5.84 ...

The accompanying correlation plot helps in understanding the strength of the correlation between the variables.

5.2 Do you think scaling is necessary for clustering in this case?
Normalization is used to eliminate redundant data and ensures that good-quality clusters are generated, which can improve the efficiency of clustering algorithms. It therefore becomes an essential step before clustering, because Euclidean distance is very sensitive to differences in scale, and all dimensions should be treated as equally important.

In this market segmentation data, the variables are measured on different scales:
1. spending: amount spent by the customer per month (in 1000s)
2. advance_payments: amount paid by the customer in advance by cash (in 100s)
3. probability_of_full_payment: probability of payment done in full by the customer to the bank
4. current_balance: balance amount left in the account to make purchases (in 1000s)
5. credit_limit: limit of the amount on the credit card (in 10000s)
6. min_payment_amt: minimum amount paid by the customer while making payments for purchases made monthly (in 100s)
7. max_spent_in_single_shopping: maximum amount spent in one purchase (in 1000s)

The snapshot below shows the data after scaling.

> head(customer_segm_scale)
spending advance_payments probability_of_full_payment current_balance credit_limit min_payment_amt
[1,] 1.7501726 1.8076485 0.1778050 2.3618888 1.3353877 -0.2980937
[2,] 0.3926441 0.2532349 1.4981931 -0.5993122 0.8561898 -0.2422262
[3,] 1.4099313 1.4247880 0.5036700 1.3981443 1.3142077 -0.2209434
[4,] -1.3807350 -1.2246066 -2.5856995 -0.7911583 -1.6351103 0.9855289
[5,] 1.0800003 0.9959842 1.1934881 0.5901336 1.1527101 -1.0855596
[6,] -0.7380569 -0.8800322 0.6941106 -1.0055745 -0.4437341 3.1630318
max_spent_in_single_shopping
[1,] 2.3234463
[2,] -0.5372979
[3,] 1.5055095
[4,] -0.4538765
[5,] 0.8727275
[6,] -0.8302902
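For reference, a minimal sketch of how such a scaled matrix can be produced in base R (the exact scaling call is not shown in the report):

# Standardise each variable to mean 0 and standard deviation 1
customer_segm_scale <- scale(customer_segm)
head(customer_segm_scale)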

5.3 Apply hierarchical clustering to scaled data. Identify the number of optimum clusters using
Dendrogram and briefly describe them.

Since clustering is unsupervised learning, after computing the distance matrix and plotting the dendrogram we can see that three clusters would be the optimum number.

The dendrogram gives a clear picture: looking at the merge heights returned by hclust and at the visual graph, we can see where the various merges happen. The last merges show significant drops in height, and after the third merge there are no further significant drops, so we can consider three clusters optimum.

The clusplot visualization also shows that the first two components explain 88.93% of the variability, so we can conclude that three clusters would be the best fit.
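A minimal sketch of the hierarchical clustering step; Euclidean distance and Ward's linkage are assumptions, since the report does not state the linkage method used:

# Distance matrix on the scaled data
dist_mat <- dist(customer_segm_scale, method = "euclidean")
# Hierarchical clustering and dendrogram
hc <- hclust(dist_mat, method = "ward.D2")
plot(hc, labels = FALSE, main = "Dendrogram of scaled customer data")
rect.hclust(hc, k = 3, border = "red")   # highlight the 3-cluster solution
# Assign each customer to one of the three clusters
hc_clusters <- cutree(hc, k = 3)
table(hc_clusters)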

5.4 Apply K-Means clustering on scaled data and determine the optimum clusters.

K-Means clustering gives 3 clusters of sizes 72, 71 and 67. Various graphical plots, shown below, support this: the cluster plot, the WSS (elbow) method, the silhouette method, and the gap statistic (bootstrapping) method. Every graphical method shows that three clusters is the optimal number.
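A minimal sketch of the K-Means step and the graphical checks mentioned above, assuming the factoextra package; the seed and nstart values are assumptions:

set.seed(123)
km <- kmeans(customer_segm_scale, centers = 3, nstart = 25)
km$size                      # cluster sizes (report: 72, 71, 67)
# Graphical checks for the optimal number of clusters
fviz_nbclust(customer_segm_scale, kmeans, method = "wss")         # elbow / WSS
fviz_nbclust(customer_segm_scale, kmeans, method = "silhouette")  # silhouette
fviz_nbclust(customer_segm_scale, kmeans, method = "gap_stat")    # gap statistic
fviz_cluster(km, data = customer_segm_scale)                      # cluster plot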

The Hubert index is a graphical method of determining the number of clusters. In the plot of the Hubert index, we seek a significant knee that corresponds to a significant increase of the value of the measure, i.e. the significant peak in the Hubert index second differences plot.

The D index is a graphical method of determining the number of clusters. In the plot of the D index, we seek a significant knee (the significant peak in the D index second differences plot) that corresponds to a significant increase of the value of the measure.

According to the majority rule, the best number of clusters is 3.

5.5 Describe cluster profiles for the clusters defined. Recommend different promotional strategies
for different clusters.

In hierarchical clustering, the first group of clusters shows clearly different values on the variables from the second and third groups.

Running the silhouette function, we can observe each cluster's size and average silhouette width, and that the clusters do not overlap. We can also observe that cluster 1's closest neighbour is cluster 2, and cluster 2's closest neighbour is cluster 3. Based on the hierarchical clustering, the cluster 1 group of customers spends more, usually makes more advance payments, and has a higher probability of full payment compared with the cluster 3 group.

From a business perspective, we can target the cluster 1 customers with the most attractive offers, followed by cluster 2 and then cluster 3.

K-Means clustering is one of the simplest and most popular unsupervised machine learning algorithms. The K-Means algorithm identifies k centroids and then allocates every data point to the nearest cluster. For this problem statement, just as in hierarchical clustering, the group 1 customers spend more money and make more advance payments compared with the other clusters.
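A minimal sketch of how the cluster profiles and silhouette values described above can be produced, assuming the K-Means object km and the scaled data from the earlier steps:

# Mean of each original variable within each cluster (cluster profile)
aggregate(customer_segm, by = list(cluster = km$cluster), FUN = mean)
# Silhouette widths: cluster sizes, average widths and nearest neighbours
sil <- silhouette(km$cluster, dist(customer_segm_scale))
summary(sil)
plot(sil)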

6 Insights from Problem Two


6.1 Read the dataset. Do the descriptive statistics and do null value condition check, write an
inference?

As we can see, the data frame has 3000 observations of 10 variables.

The summary reveals the 10 columns with their mean, median and quartile values, and all the necessary outputs can be viewed from it.

So far, we do not have any null values in this data set.

6.2 Data Split: Split the data into test and train, build classification model CART, Random Forest and
Artificial Neural Network
The data set is successfully split in an 80:20 ratio. We now have a training set and a test set: the training set has 2400 observations and the test set has 600 observations.

We can observe an almost similar percentage of claimed status in both data sets. Overall, there are 924 claimed records (30.80%) and 2076 not-claimed records (69.20%).
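A minimal sketch of an 80/20 split, assuming a data frame named Insurance with the target column Claimed; the report does not show the exact splitting code, so the seed and method are assumptions:

set.seed(123)
# 80% of row indices for training, the rest for testing
train_idx <- sample(seq_len(nrow(Insurance)), size = floor(0.8 * nrow(Insurance)))
train_set <- Insurance[train_idx, ]
test_set  <- Insurance[-train_idx, ]
nrow(train_set)   # expected: 2400
nrow(test_set)    # expected: 600
# Proportion of claimed vs not-claimed records in each set
prop.table(table(train_set$Claimed))
prop.table(table(test_set$Claimed))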
Model Building – CART
Decision trees are commonly used in data mining with the objective of creating a model that predicts the value of a target (or dependent) variable, here Claimed, based on the values of several input (or independent) variables. Classification trees are used where the target variable is categorical, to identify the "class" into which the target variable would most likely fall; regression trees are used where the target variable is continuous, to predict its value.

The arguments passed to rpart.control are checked against the list of valid arguments when creating the decision tree model. The visual plot represents the decision tree model.

Here we can see the various nsplit and xerror values. After the 7th split there is a significant increasing trend in xerror, from 0.71448 up to 0.72936 at the 10th split.

Using a post-pruning technique, we can cut the tree back, since xerror was increasing after the 7th split.
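A minimal sketch of the CART model and the post-pruning step described above, assuming the rpart package and the train/test split sketched earlier; the control parameters are assumptions:

# Grow a full classification tree for Claimed on all predictors
cart_model <- rpart(Claimed ~ ., data = train_set, method = "class",
                    control = rpart.control(minsplit = 20, cp = 0))
printcp(cart_model)      # cp table: nsplit, rel error and xerror
rpart.plot(cart_model)
# Prune at the cp value with the lowest cross-validated error (xerror)
best_cp <- cart_model$cptable[which.min(cart_model$cptable[, "xerror"]), "CP"]
cart_pruned <- prune(cart_model, cp = best_cp)
rpart.plot(cart_pruned)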
Model Building - Random Forest
A supervised classification algorithm which, as the name suggests, creates a forest of many trees grown on random samples. In general, the more trees in the forest, the more robust the forest is; in the random forest classifier, a higher number of trees gives higher-accuracy results.
Some advantages of using random forest are as follows:
• The same random forest algorithm (random forest classifier) can be used for both classification and regression tasks.
• The random forest classifier can handle missing values.
• With more trees in the forest, the random forest classifier is less likely to overfit the model.
• The random forest classifier can also model categorical values.
The model is built with Claimed as the dependent variable, considering all independent variables.

The random forest algorithm is a classifier based primarily on two methods: bagging and the random subspace method.

Out-of-bag (OOB) error, also called the out-of-bag estimate, is a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models that use bootstrap aggregating to subsample the data used for training. Out-of-bag estimates help avoid the need for an independent validation dataset.

In this model the OOB estimate of the error rate is 21.96%, and the model shows a significant decrease in the error rate as the number of trees is increased. OOB error is a combined measure across the claim statuses (yes and no). It is observed that as the number of trees increases, the OOB error rate starts decreasing.

In random forests, the number of variables available for splitting at each tree node is referred to as the mtry parameter. The optimum number of variables is obtained using the tuneRF function; the optimum mtry here is 9.
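A minimal sketch of the random forest and the mtry tuning step, assuming the randomForest package and that Claimed is a factor; ntree and the tuneRF settings are assumptions:

set.seed(123)
rf_model <- randomForest(Claimed ~ ., data = train_set,
                         ntree = 500, importance = TRUE)
print(rf_model)          # includes the OOB estimate of error rate
plot(rf_model)           # OOB error rate versus number of trees
# Tune mtry: the number of variables tried at each split
set.seed(123)
tuned <- tuneRF(x = train_set[, setdiff(names(train_set), "Claimed")],
                y = train_set$Claimed,
                stepFactor = 1.5, improve = 0.01,
                ntreeTry = 500, trace = TRUE, plot = TRUE)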

Model Building – Artificial Neural Network


Artificial neural networks (ANNs) are statistical models directly inspired by, and partially modeled on biological neural
networks. They are capable of modeling and processing nonlinear relationships between inputs and outputs in parallel.
Artificial neural networks are characterized by containing adaptive weights along paths between neurons that can be tuned
by a learning algorithm that learns from observed data in order to improve the model. In addition to the learning algorithm
itself, one must choose an appropriate cost function.
The cost function is what is used to learn the optimal solution to the problem being solved. This involves determining the best values for all of the tunable model parameters, with the neuron-path adaptive weights being the primary target, along with algorithm tuning parameters such as the learning rate. It is usually done through optimization techniques such as gradient descent or stochastic gradient descent.
These optimization techniques try to make the ANN solution as close as possible to the optimal solution; when this is successful, the ANN is able to solve the intended problem with high performance.

In this artificial neural network, training converged after 6312 steps, when the error had reduced to 144.49637 at the minimum threshold; the corresponding network visualization graph is shown as well.
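A minimal sketch of the neural network step, assuming the neuralnet package; the hidden layer size, the 0/1 encoding of Claimed and the scaling of predictors are assumptions, since the report only quotes the final step count and error:

# neuralnet needs numeric inputs: one-hot encode predictors, scale them,
# and encode the target as 0/1 (assuming levels "Yes"/"No")
train_x <- scale(model.matrix(Claimed ~ . - 1, data = train_set))
colnames(train_x) <- make.names(colnames(train_x))
train_nn <- data.frame(train_x, Claimed = as.numeric(train_set$Claimed == "Yes"))
# Build the formula from the predictor names, then fit the network
f <- as.formula(paste("Claimed ~", paste(colnames(train_x), collapse = " + ")))
nn_model <- neuralnet(f, data = train_nn, hidden = 5,
                      linear.output = FALSE, stepmax = 1e5)
plot(nn_model)                                    # network diagram with weights
nn_model$result.matrix[c("error", "steps"), ]     # final error and step count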
6.3 Performance Metrics: Check the performance of Predictions on Train and Test sets using Confusion Matrix
The visualization below reflects the CART model confusion matrices, which show an accuracy of 79% on the training set and 77% on the test set. Since the insurer is facing a higher claim frequency, it is worth noting that the majority claim status is "No" in both the train and test data sets, even though the current study shows a significant increase in insurance claims.
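A minimal sketch of how these confusion matrices and accuracies can be produced, assuming the caret package and the pruned CART model sketched earlier:

# CART: predicted classes on train and test sets
cart_pred_train <- predict(cart_pruned, newdata = train_set, type = "class")
cart_pred_test  <- predict(cart_pruned, newdata = test_set,  type = "class")
confusionMatrix(cart_pred_train, train_set$Claimed)   # training accuracy
confusionMatrix(cart_pred_test,  test_set$Claimed)    # test accuracy
# The same pattern applies to the random forest and ANN predictions,
# with class probabilities thresholded where needed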

Confusion Matrix = CART

Confusion Matrix = Random Forest

The random forest model shows a different accuracy on the train and test data: the training data gives an accuracy of 90%, but the test data gives 77%. The training data therefore shows noticeably better accuracy.
Confusion Matrix = Artificial Neural Network

In the artificial neural network we can observe a similar kind of trend: the test data has 77% accuracy and the training data has 81%.

6.4 Final Model: Compare all the models and write an inference on which model is best/optimized

The CART method has given poor performance compared with Random Forest and ANN. Looking at the percentage deviation between the training and testing datasets, it looks like the model is overfit. The Random Forest method has the best performance (best accuracy) among all three models, and its percentage deviation between the training and testing datasets is also reasonably under control, suggesting a robust model. The Neural Network has given relatively secondary performance compared with Random Forest, although better than CART; however, its percentage deviation between the training and testing datasets is the smallest among the three models.

6.5 Inference: Basis on these predictions, what are the business insights and recommendations
The main objective of the project was to develop a predictive model for an insurance firm providing tour insurance that is facing a higher claim frequency, so that claim status can be predicted in advance. Based on the current AUC (area under the ROC curve), there is a reasonable probability of identifying, using machine learning tools, which customers are likely to claim and which customers will respond positively to a promotion or an offer.

