Project Report On Customer Lifetime Value

This document discusses a team's analysis of customer lifetime value (CLV) for an auto insurance company. They analyzed industry trends, selected important features like monthly premium and total claims amount, then applied various machine learning models like linear regression, decision trees, XGBoost, neural networks and random forests. The random forest model performed best with an R2 of 0.96 on the training set and 0.71 on the test set. The team also cleaned the data by removing outliers and selecting important variables to improve the model.


 

  

Open IIT Data Analytics 


09.09.2019 
─ 

Team Name - RunTime Terror
Shubham Ekapure 
Naman Paharia 
Varun Madhavan 
Arusarka Bose 
Umang Aditya 

 
 
 
 

INDEX 
 
The Industry analysis 
The features 
Models applied   
Multivariate Regression
Decision Trees
Gradient Boosting Regressor
XGBoost
Neural Network
Random Forest Regression
Data Cleansing and Variable Selection
Data Cleansing
Variable Selection
Model Used
Statistical Tests
Insights For The Company 
 
 
 
 
 
 
 
 
 

THE INDUSTRY ANALYSIS 


 
Vehicle insurance (commonly known as car insurance, motor insurance, or auto insurance) is insurance for cars, trucks, motorcycles, etc. Its primary use is to provide financial protection against physical damage or bodily injury resulting from traffic collisions, and against liability that could also arise from incidents in a vehicle. Vehicle insurance additionally offers financial protection against theft of the vehicle, and against damage to the vehicle sustained from events other than traffic collisions, such as keying, weather or natural disasters, and damage sustained by colliding with stationary objects.
 
The Automobile Insurance industry consists of Personal liability insurance, personal 
collision/comprehensive insurance, commercial liability insurance, commercial 
collision/comprehensive insurance and investment income. 
 
The industry earns its revenue by covering the public's risk: broadly, a higher number of insurance buyers and fewer claims mean more revenue for the industry. Hence, the terms and conditions attached to any auto insurance policy play a very important role. The future looks promising for the insurance industry, with several changes in the regulatory framework that will further change the way the industry conducts its business and engages with its customers.
 
 
 
 
 
 
 
 
 
 
 

THE FEATURES 
 
To predict Customer Lifetime Value, the most important features are:

Monthly Premium Auto - The premium paid by the customer to the company per month.
It has a correlation of 0.62 with CLV.
It has a mean value of 93.29.

Months Since Policy Inception - The total number of months the customer has been paying the premium to the company.
It has a mean of 48.06.

Income of the customer - Income varies from 0 to 99,981, with a standard deviation of 30,380, a mean value of 37,657 and a skewness of 0.2868.

Total Claim Amount - The money claimed by the policyholder from the company.
Mean value of 434.09, standard deviation of 290.5, correlation with CLV of 0.54.

Number of Open Complaints - The company earns the maximum profit from customers who have 0 or 1 open complaints, with an aggregate of $55M. Clearly, the number of open complaints reflects how well the company treats its customers.

Vehicle - The type and size of the vehicle affect CLV prominently. 50% of the insurance holders have a four-door car, and this corresponds to 33% of the aggregate money made by the company. The average CLV of luxury vehicles is higher than that of normal ones. 70% of the customers have a mid-size vehicle, and large vehicles are prone to higher claims.

Gender - The gender composition is almost equal, with 51% females in the dataset.
 
 
 
 
 
 

 
 
This is the Pearson's r correlation plot. The Pearson product-moment correlation coefficient is a measure of the strength of the linear relationship between two variables; it is often referred to simply as Pearson's correlation or the correlation coefficient. With only the numerical variables taken into consideration, this plot justifies the choice of variables in the final model. A few categorical variables, such as Response and Coverage, were also quantified and proved to be good measures for evaluation.
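A minimal sketch of how such a Pearson correlation matrix can be computed and plotted with pandas and seaborn is given below; the dataframe name df and the file name are assumptions, not taken from the project code.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# df is assumed to hold the insurance dataset (the file name is illustrative)
df = pd.read_csv("auto_insurance.csv")

# Pearson's r between all numerical columns (categoricals are ignored)
corr = df.select_dtypes(include="number").corr(method="pearson")

# Heatmap of the correlation matrix, annotated with the coefficients
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Pearson correlation between numerical features")
plt.tight_layout()
plt.show()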
 
 
 
 
 
 
 
 
 
 
 
MODELS APPLIED 
 
CLV is a continuous variable, and based on its relationship with other features of the data-set, such as income and employment status, we can apply various machine learning models to predict it.
 
Multivariate Linear Regression - The model didn't perform as expected because of the non-linear relationship observed with most of the features, such as Months Since Last Claim and Number of Open Complaints. However, a positive R2 score indicated that some features, such as Monthly Premium, do follow an approximately linear relationship with CLV.
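As an illustration, a baseline multivariate linear regression can be fitted and scored as sketched below; X and y are assumed to be the preprocessed feature matrix and the CLV target described later in the report.

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# X: encoded and standardised features, y: Customer Lifetime Value (assumed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

lin_reg = LinearRegression().fit(X_train, y_train)
print("train R2:", r2_score(y_train, lin_reg.predict(X_train)))
print("test  R2:", r2_score(y_test, lin_reg.predict(X_test)))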
 
Decision Tree Model - Decision trees can also be used for regression. Nodes are split so as to minimise the squared error, giving a regression tree. Informative features like Location Code, Marital Status and Employment Status entered the scheme of model development, and that increased the accuracy by 40%. The prediction error was also reduced, but the chances of high variance and overfitting increase as the depth is increased. Even a small change in the input data can at times cause large changes in the tree. Moreover, decision trees examine only a single feature at a time, leading to rectangular decision regions.

This model gave us an R2 score of 0.99 (overfitting) on the train set and 0.47 on the test set.
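The depth/overfitting trade-off mentioned above can be illustrated with a sketch like the following; X_train, X_test, y_train, y_test are assumed to be the same splits as before, and the depths shown are illustrative, not the ones actually tuned by the team.

from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

# An unconstrained tree fits the training data almost perfectly (overfitting);
# limiting the depth trades training accuracy for better generalisation.
for depth in (None, 10, 5):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    print("max_depth =", depth,
          "train R2 =", round(r2_score(y_train, tree.predict(X_train)), 2),
          "test R2 =", round(r2_score(y_test, tree.predict(X_test)), 2))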
 
Gradient Boosting Regressor - A model that uses boosting over simple decision trees.

This model gave us an R2 score of 0.73 on the train set and 0.68 on the test set.
XGBoost - XGBoost is an implementation of gradient-boosted decision trees that has given spectacular performance in Kaggle competitions.

This model gave us an R2 score of 0.96 on the train set and 0.71 on the test set.
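A sketch of how such a model can be fitted with the xgboost package follows; the hyperparameter values are illustrative assumptions, since the report does not list the ones actually used.

from xgboost import XGBRegressor
from sklearn.metrics import r2_score

# Illustrative settings only
xgb = XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.1,
                   subsample=0.8, random_state=0)
xgb.fit(X_train, y_train)

print("train R2:", r2_score(y_train, xgb.predict(X_train)))
print("test  R2:", r2_score(y_test, xgb.predict(X_test)))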
 
Neural Network - We also tried to fit a fully connected neural network to the data, choosing the number of layers and the number of units per layer with hyperopt, and training the network to minimise the root mean squared error.
This model gave us an R2 score of 0.68 on the train set and 0.47 on the test set.
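The team's actual network was built in TensorFlow (see the Annexure); as a simple stand-in, a fully connected network can also be sketched with scikit-learn's MLPRegressor. The architecture below is a placeholder, not the hyperopt-selected one.

from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

# Placeholder architecture: two hidden layers of 64 units each; inputs are
# assumed to be standardised already.
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), activation="relu",
                   solver="adam", max_iter=500, random_state=0)
mlp.fit(X_train, y_train)

print("train R2:", r2_score(y_train, mlp.predict(X_train)))
print("test  R2:", r2_score(y_test, mlp.predict(X_test)))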
 
Random Forest Regressor - Random forests are a strong modelling technique and much more robust than a single decision tree. They aggregate many decision trees to limit overfitting as well as error due to bias. The performance improved by 60%, the MSE dropped significantly, and overall, without any feature scaling, this was the best model in terms of accuracy. Including many trees built on uncorrelated categorical features such as Gender, Response and Policy Type enhanced the performance.
We also used the hyperopt library for Bayesian hyperparameter optimisation of the Random Forest model.
This model gave us an R2 score of 0.96 on the train set and 0.71 on the test set.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
DATA CLEANSING AND VARIABLE SELECTION 
 

DATA CLEANING:
We analysed the data and calculated the Z-score for each numeric variable. Outliers with a z-score above 3 were removed, but as a consequence the overall performance of the model decreased, so this step was ultimately omitted.
The Customer ID and Effective To Date variables were dropped. Customer ID, being unique to each record, carries no predictive relationship whatsoever. Effective To Date corresponds to policies whose effective dates all fall in January and February, i.e. basically a narrow slice of the whole data that the company provided.
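A sketch of the z-score filter and the column drops described above, assuming the raw data is loaded as df (the exact column labels are assumptions and may differ in the original file):

import numpy as np
from scipy import stats

# z-scores are computed only for the numeric columns
num_cols = df.select_dtypes(include="number").columns
z = np.abs(stats.zscore(df[num_cols]))

# Keep rows whose numeric values all lie within 3 standard deviations
df_filtered = df[(z < 3).all(axis=1)]
print("rows removed:", len(df) - len(df_filtered))

# Drop the identifier and the uninformative date column (names assumed)
df = df.drop(columns=["Customer ID", "Effective To Date"])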
 
 
VARIABLE SELECTION: 
We examined the model weights and found that Number of Policies, Monthly Premium Auto, Vehicle Class, Total Claim Amount and Income had the highest correlation with CLV.
We standardised the data as a preprocessing step.
Based on the monthly premium and the number of months since policy inception, we engineered a new variable called Aggregate Amount Paid, which is basically the product of the total months and the monthly premium. This takes into account the fact that some people extend their policies as well. This resulted in an accuracy improvement of 0.22%.
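A sketch of the derived feature and the standardisation step, with column names assumed to match the report's wording:

from sklearn.preprocessing import StandardScaler

# Aggregate Amount Paid = monthly premium x months since policy inception
df["Aggregate Amount Paid"] = (df["Monthly Premium Auto"]
                               * df["Months Since Policy Inception"])

# Standardise the numeric predictors (zero mean, unit variance); the CLV
# target itself is left untouched
predictors = df.select_dtypes(include="number").columns.drop("Customer Lifetime Value")
df[predictors] = StandardScaler().fit_transform(df[predictors])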
 
 
 
 
 
 
 
 
 
 
MODEL USED  
Random Forest Regressor - The Random Forest Regressor has been used as the final model in the Python code.
Random Forest is an ensemble machine learning technique capable of performing both regression and classification tasks, using multiple decision trees and a statistical technique called bagging. Bagging and boosting are two of the most popular ensemble techniques, aiming to tackle high variance and high bias respectively.
Rather than just averaging the predictions of independently built trees, a random forest uses two key concepts that give it the name "random":

1. Random sampling of training observations when building trees 


2. Random subsets of features for splitting nodes 
In other words, a random forest builds multiple decision trees and merges their predictions to get a more accurate and stable prediction than relying on any individual decision tree.

Random sampling of training observations: 


Each tree in a random forest learns from a random sample of the training observations. The samples are drawn with replacement (known as bootstrapping), which means that some samples will be used multiple times in a single tree. The idea is that by training each tree on a different sample, although each tree might have high variance with respect to a particular set of training data, the forest as a whole will have lower variance, and not at the cost of increased bias. In the scikit-learn implementation of random forests, the sub-sample size for each tree is always the same as the original input sample size, and the samples are drawn with replacement if bootstrap=True. If bootstrap=False, each tree uses exactly the same dataset without any randomness.

Random Subsets of features for splitting nodes: 


The other main concept in the random forest is that each tree sees only a subset of all the features when deciding how to split a node. In scikit-learn this can be set by specifying max_features="sqrt", meaning that if there are 16 features, only 4 randomly chosen features will be considered for splitting at each node in each tree.

General Working Steps: 


 
Step 1: Samples are drawn repeatedly from the training data so that each data point has an equal probability of being selected, and every sample has the same size as the original training set.
Let's say we have the following data:
x = 0.1, 0.5, 0.4, 0.8, 0.6 and y = 0.1, 0.2, 0.15, 0.11, 0.13, where x is an independent variable with 5 data points and y is the dependent variable.
Bootstrap samples are now drawn with replacement from the above data set. With n_estimators set to 3 (the number of trees in the random forest):
The first tree will have a bootstrap sample of size 5 (same as the original dataset), for example:
x1 = {0.5, 0.1, 0.1, 0.6, 0.6}, and likewise
x2 = {0.4, 0.8, 0.6, 0.8, 0.1}
x3 = {0.1, 0.5, 0.4, 0.8, 0.8}
 
Step 2: A regression tree is trained on each bootstrap sample drawn in the above step, and a prediction is recorded for each tree.
 
 
 
 
 
Step 3:​ Now the ensemble prediction is calculated by averaging the predictions of 
the above trees producing the final prediction. 
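A minimal sketch of Steps 1-3 on the toy data above, using one regression tree per bootstrap sample (the random draws shown earlier were just examples, so the samples generated below will differ):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

x = np.array([0.1, 0.5, 0.4, 0.8, 0.6]).reshape(-1, 1)
y = np.array([0.1, 0.2, 0.15, 0.11, 0.13])

rng = np.random.default_rng(0)
n_estimators = 3
trees = []

# Steps 1 and 2: draw a bootstrap sample (with replacement, same size as the
# original data) and fit one regression tree per sample
for _ in range(n_estimators):
    idx = rng.integers(0, len(x), size=len(x))
    trees.append(DecisionTreeRegressor(random_state=0).fit(x[idx], y[idx]))

# Step 3: the ensemble prediction is the average over the individual trees
x_new = np.array([[0.55]])
print("ensemble prediction:", np.mean([t.predict(x_new)[0] for t in trees]))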
 
Parameters used in our model:

Number of trees in the forest = 100
Maximum depth of each tree = 20
Minimum number of samples required to split an internal node = 2
Seed for the random number generator = 0
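With scikit-learn these parameters translate directly into the model below (X_train and Y_train denote the preprocessed training split; the variable names are assumptions):

from sklearn.ensemble import RandomForestRegressor

regressor = RandomForestRegressor(n_estimators=100,     # trees in the forest
                                  max_depth=20,         # maximum depth of each tree
                                  min_samples_split=2,  # min samples to split a node
                                  random_state=0)       # seed for the random number generator
regressor.fit(X_train, Y_train)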
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 

 
STATISTICAL TESTS 
R-squared Score - A statistical measure of how close the data are to the fitted regression line; it is the percentage of the variation in the response variable that is explained by the model.

R-squared = Explained variation / Total variation

The R-squared of the model is 0.7107
 
 
RMSE Score - The RMSE is the square root of the mean of the squared residuals; it can be interpreted as the standard deviation of the unexplained variance.

The RMSE of the model is 3839.76
 
MAE Score - Mean Absolute Error (MAE) measures the average absolute difference between the predicted and the actual values.

The MAE of the model is 1470.28
 
 
 
 
MSE Score - Mean Squared Error measures the average squared error of our predictions.

The MSE of the model is 14743785.17
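The reported values above come from the team's evaluation; for reference, all four metrics can be computed with scikit-learn as sketched below (y_test and the fitted regressor are assumed from the earlier steps):

import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_pred = regressor.predict(X_test)

r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)                      # RMSE is the square root of the MSE
mae = mean_absolute_error(y_test, y_pred)

print("R2:", r2, "RMSE:", rmse, "MAE:", mae, "MSE:", mse)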
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
INSIGHTS FOR THE COMPANY 
 
Effective marketing in the semi-urban and rural sectors can be more profitable for the company, because these customers pay a higher monthly premium and the rural public claims less money than people in the urban sector. The suburban region has the largest population of insurance takers, and that corresponds to the high CLV.
The graphs here quantify the same:

 
*The Monthly Premium Auto is the aggregate sum of all premiums falling in that category. 
 
 
 
 
 
 
 
 
An educated person is generally financially stable because of his/her employment, but the bigger difference lies in the gender of the educated person. Among males, those with a Bachelor's degree and those with High School or Below education generally have a higher tendency to take insurance. For the female population, the focus must be on those with a Bachelor's degree and those with College education.

 
 
 
 
It is not worth spending resources on doctors, because most of them receive such insurance benefits from the government. Employed people with a Master's degree are the ones who should be focused on, as they currently correspond to very little of the total profitability.

The customers with Premium coverage are the best to focus on. Providing some discounts on renewal for Premium insurance holders can be strategically very fruitful. As the insurance type changes from Basic to Extended, a growth of 10% can be observed, and for Premium the growth rate is 20%.
 
With the enormous reach that the internet has today, going online is one of the best options for an insurance company. Until then, engaging more and cheaper agents and developing branches are going to be of great help to the company.

 
 
 
Generally, a larger segment of customers own four-door cars and SUVs, so these vehicles are more profitable because of the higher number of insurance policies sold for them. The luxury vehicle segment is not profiting the company because the claims are generally higher; hence, to make this sector profitable, a different policy type with a higher premium needs to be issued. Medium-size and large-size vehicles are generally the ones that are more profitable. On the contrary, large vehicles are prone to more frequent and costly claims.

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 
ANNEXURE 
 

IMPORTS/LIBRARIES USED 
import numpy as np 
import pandas as pd 
from sklearn.preprocessing import LabelEncoder, OneHotEncoder 
from sklearn.model_selection import train_test_split 
from sklearn.preprocessing import StandardScaler 
from sklearn.ensemble import RandomForestRegressor 
from sklearn.ensemble import GradientBoostingRegressor 
from sklearn.tree import DecisionTreeRegressor 
from xgboost import XGBRegressor 
from sklearn.metrics import r2_score 
from sklearn.metrics import mean_squared_error 
from hyperopt import fmin,tpe,hp, STATUS_OK 
 

HYPEROPT 
Hyperopt is a library used for Bayesian Hyperparameter Optimization, 
to select the hyperparameters that give the best results. We used 
hyperopt for the best performing models, RandomForests and the fully 
connected Neural Network. RandomForests performed better as the 
Neural Network tended to overfit the data.  
 
(Code for hyperopt for RandomForests attached) 
best_r2 = -10

def f(space):
    global best_nest, best_m_d, best_r2, rmse
    nest = space['nest']
    # lr = space['lr']/100
    # max_depth must be an int for scikit-learn; hyperopt's quniform returns floats
    m_d = int(space['m_d'])

    regressor = RandomForestRegressor(n_estimators=int(nest), random_state=0,
                                      max_depth=m_d,
                                      min_samples_split=2).fit(X_train, Y_train)
    # train_test is the team's helper that returns the test R2 and RMSE
    results = train_test(regressor, X_train, X_test, Y_train, Y_test)

    r2 = results['r2']
    if r2 > best_r2:
        best_r2 = r2
        best_nest = nest
        # best_lr = lr
        best_m_d = m_d
        rmse = results['RMSE']
        print('best_nest = {}'.format(best_nest))
        print('best_r2 = {}'.format(best_r2))
        print('best_m_d = {}'.format(best_m_d))

    return {'loss': rmse, 'status': STATUS_OK}

space = {
    'nest': hp.quniform('nest', 100, 1000, 10),
    'm_d': hp.quniform('i', 5, 20, 5),
}

best = fmin(fn=f, space=space, algo=tpe.suggest, max_evals=150)
 
Data Information 
 
 
 
 
 
 
 
 
 
 
PANDAS PROFILING REPORT 
 

 
 
 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
CORRELATION HEATMAP 

NORMALITY TEST/STANDARDIZATION OF DATA 


Normality tests are used to determine whether a data set is well-modelled by a normal distribution, and to compute how likely it is for a random variable underlying the data set to be normally distributed. Since we have standardized the data, the mean of each column is zero and its standard deviation is 1.
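A sketch of this step, assuming the dataset is loaded as df (variable and column handling are assumptions):

from scipy import stats
from sklearn.preprocessing import StandardScaler

# Standardise each numeric column to zero mean and unit standard deviation
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = StandardScaler().fit_transform(df[num_cols])

# D'Agostino-Pearson normality test per column: a small p-value suggests the
# column is unlikely to be normally distributed
for col in num_cols:
    stat, p = stats.normaltest(df[col])
    print(col, "statistic =", round(stat, 2), "p-value =", round(p, 4))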
 
 
 
 
PRINCIPAL COMPONENT ANALYSIS (PCA) 
 
PCA is done to find the most relevant features for prediction. The features Number of Policies, Monthly Premium Auto, Vehicle Class, Total Claim Amount and Income were the most important.
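A sketch of how such an analysis can be run on the standardised numeric features; the number of components and the inspection of loadings are illustrative choices, not taken from the project code:

from sklearn.decomposition import PCA

num_cols = df.select_dtypes(include="number").columns
pca = PCA(n_components=5).fit(df[num_cols])

print("explained variance ratio:", pca.explained_variance_ratio_)

# The loadings show which original features dominate each principal component
for i, comp in enumerate(pca.components_):
    top = sorted(zip(num_cols, comp), key=lambda t: abs(t[1]), reverse=True)[:3]
    print("PC" + str(i + 1), "top features:", top)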
 

APPLYING A NEURAL NETWORK TO THE GIVEN DATA


import pandas as pd
import tensorflow as tf                      # TensorFlow 1.x (tf.contrib API)
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

# train, FEATURES, COLUMNS, LABEL and col_train are assumed to be defined
# earlier in the team's notebook (not shown in this report): train is the raw
# DataFrame, FEATURES/COLUMNS the selected columns, LABEL the CLV target.

# Remove outliers with an Isolation Forest before fitting the network
clf = IsolationForest(max_samples=100, random_state=42)
clf.fit(train)
y_noano = pd.DataFrame(clf.predict(train), columns=['Top'])   # 1 = inlier, -1 = outlier
train = train.iloc[y_noano[y_noano['Top'] == 1].index.values]
train.reset_index(drop=True, inplace=True)
print("Number of Outliers:", y_noano[y_noano['Top'] == -1].shape[0])
print("Number of rows without outliers:", train.shape[0])

# Feature columns for TensorFlow
feature_cols = [tf.contrib.layers.real_valued_column(k) for k in FEATURES]

# Training set and prediction target
training_set = train[COLUMNS]
prediction_set = train['Customer Lifetime Value']

# Train and test split
x_train, x_test, y_train, y_test = train_test_split(
    training_set[FEATURES], prediction_set, test_size=0.33, random_state=42)
y_train = pd.DataFrame(y_train, columns=[LABEL])
training_set = pd.DataFrame(x_train, columns=FEATURES).merge(
    y_train, left_index=True, right_index=True)
training_set.head()

# Training frame used for the final fit
training_sub = training_set[col_train]
 
 
