PM Mock Test

SVKM’s NMIMS University

Date: Total Marks: 30


Time: 3 Hours
Instructions:
1. All questions are compulsory.
2. The answer to each new question is to be started on a fresh page.
3. Figures in brackets on the right-hand side indicate full marks.
4. Create a new project, with your name inscribed on your answer book.
5. Whenever the best model is asked for, use the appropriate assessment criterion and state its value for the best model; for example, if the criterion is MSE, state the best MSE value in the answer. If the target is categorical, the validation misclassification rate is the assessment criterion; if the target is continuous, use average squared error. (A small sketch of how each criterion is computed appears after this list.)
   - If you are predicting claim occurrence (Target_ClaimsInd, binary): use the validation misclassification rate.
   - If you are predicting claim amount (Target_Claim_Amount, rejected): use average squared error (ASE).
6. Open data source carinsure1 and score file carinsuretest1.
7. Make any assumptions wherever necessary.
8. Follow the instructions carefully; failure to do so may lead to incorrect results.
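
A small illustrative sketch in Python (not SAS Miner code) of how the two assessment criteria are computed; the function names and sample values below are hypothetical.

import numpy as np

def misclassification_rate(y_true, y_pred):
    """Fraction of validation cases whose predicted class is wrong."""
    return np.mean(np.asarray(y_true) != np.asarray(y_pred))

def average_squared_error(y_true, y_hat):
    """Mean squared residual (ASE) for a continuous target."""
    y_true, y_hat = np.asarray(y_true, float), np.asarray(y_hat, float)
    return np.mean((y_true - y_hat) ** 2)

# Binary target (Target_ClaimsInd): validation misclassification rate.
print(misclassification_rate([1, 0, 1, 0], [1, 1, 1, 0]))    # 0.25
# Continuous target (Target_Claim_Amount): ASE.
print(average_squared_error([100.0, 250.0], [120.0, 240.0]))  # 250.0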

CarZuma: Car Insurance Claim Case Study


A car insurance company, CarZuma, has collected past data about its customers, covering both the customers' attributes and the attributes of the vehicles they insured. The company has heard a lot about data analytics and is confident that its competitors are already using such techniques to gain a competitive edge, so it must stay focused on its target segment. You are expected to predict which customers will make a claim.

Variable              Role      Measurement Level   Description
Policy Number         ID        Nominal             Unique policy identifier
Year                  Input     Nominal             Year of manufacture
IDV                   Input     Interval            Insured Declared Value of car
City                  Input     Nominal             City of registration of vehicle
State                 Input     Nominal             State of registration
Cubic Capacity        Input     Nominal             Capacity of engine
Mfr_Model             Input     Nominal             Manufacturer and model of car
Premium               Input     Interval            Total premium paid for the policy at the beginning of the term
Type                  Input     Binary              Source of lead
Gender                Input     Binary              M/F
Channel               Input     Binary              Lead generation channel
Age                   Input     Nominal             Age of applicant
Cover Type            Input     Binary              Third Party or Comprehensive
PaymentFrequency      Input     Binary              Annual payment or monthly instalments
Target_ClaimsInd      Target    Binary              Claims Y/N
Target_Claim_Amount   Rejected  Interval            Total claim amount
Create a new project with the name PMMock.

Open data source carinsure1 and score file carinsuretest1 from the sasuser library, with roles and measurement levels as given in the table above. Make sure you open the main file with role Raw and the score file with role Score. Create a new diagram.
Q. 1 Answer the following questions

For this question, add a StatExplore node after the File Import node and run it.

A. For each quantitative (interval) variable, fill in the following table:


Variable | Role | Mean | Standard Deviation | Missing | Minimum | Median | Maximum | Skewness | Kurtosis

B. What is the number of levels for each of the following variables: Age, City, State, Cubic Capacity and Mfr_Model? What is the mode of each of these variables?

C. Which variable has the highest variable worth? Which variable has the lowest variable worth?

D. What is the percentage of the primary target in the dataset? (Give up to four decimal places.)

Level 1 is the primary target and level 2 is the secondary. (A pandas sketch of these summaries appears below.)
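
A rough pandas equivalent of the StatExplore summaries asked for in Q.1; the filename carinsure1.csv and the column names are assumptions based on the variable table above.

import pandas as pd

df = pd.read_csv("carinsure1.csv")

# A: descriptive statistics for the interval inputs
interval_vars = ["IDV", "Premium"]
summary = pd.DataFrame({
    "Mean": df[interval_vars].mean(),
    "Std Dev": df[interval_vars].std(),
    "Missing": df[interval_vars].isna().sum(),
    "Minimum": df[interval_vars].min(),
    "Median": df[interval_vars].median(),
    "Maximum": df[interval_vars].max(),
    "Skewness": df[interval_vars].skew(),
    "Kurtosis": df[interval_vars].kurtosis(),
})
print(summary)

# B: number of levels and mode of the nominal inputs
for col in ["Age", "City", "State", "Cubic Capacity", "Mfr_Model"]:
    print(col, df[col].nunique(), df[col].mode().iloc[0])

# D: percentage of the primary target (level 1), to four decimal places
print(round((df["Target_ClaimsInd"] == 1).mean() * 100, 4))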

Perform the following steps. (A rough sklearn analogue of steps (a) and (b) appears after this list.)

a) Sample -> Data Partition next to File Import -> partition the data: Train = 70%, Validation = 30%.

b) Add a Decision Tree node after partitioning, with a maximum of 2 branches. (Change the assessment measure to misclassification throughout the paper; if the target were continuous, use ASE.) This is because the dependent variable (Target_ClaimsInd) is categorical.

How to see the branches: View -> Model -> Subtree Assessment Plot; the output also reports the misclassification rate.

c) Add a Decision Tree node after partitioning, with a maximum of 3 branches, using the same assessment measure.

How to see the branches: View -> Model -> Subtree Assessment Plot; the output also reports the misclassification rate.

d) Make sure you use the appropriate model assessment method for a binary target. If two trees have the same assessment value, compare their complexity.
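
A minimal sklearn sketch of the stratified 70/30 partition and a misclassification-assessed decision tree; the filename and column names are assumptions, and note that sklearn trees are binary-split only, so the 3-branch tree of step (c) has no direct equivalent here.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("carinsure1.csv")                      # assumed filename
y = df["Target_ClaimsInd"]
X = pd.get_dummies(df.drop(columns=["Policy Number", "Target_ClaimsInd",
                                    "Target_Claim_Amount"])).fillna(0)

# (a) stratified 70/30 train/validation partition
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, train_size=0.70, stratify=y, random_state=1)

# (b) decision tree assessed on validation misclassification
tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
misclass = 1 - tree.score(X_valid, y_valid)
print(f"leaves: {tree.get_n_leaves()}, validation misclassification: {misclass:.4f}")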

Q. 2 Answer the following questions

A. How many leaves are in the final model of the 2-branch tree?

2-branch tree -> Results -> View -> Model -> Subtree Assessment Plot -> number of leaves = 5 (the output also shows the misclassification rate).

B. How many leaves are in the final model of the 3-branch tree?

3-branch tree -> Results -> View -> Model -> Subtree Assessment Plot -> misclassification rate.

C. Which of the above trees is the better model?

The two trees have exactly the same misclassification values, but since the 2-branch model has fewer branches, it is the simpler and therefore better model.

D. Do you see any abnormality in the data? Do you need to do any replacements, imputations or transformations before using trees? Why?

Abnormality here means missing data and high skewness in the quantitative/interval inputs (two such variables: IDV and Premium).

None of these steps is needed: trees are non-parametric, handle missing data, and are not affected by skewness or outliers.

In this case we do no imputation; in general, if there are outliers, transform first and then impute.

Perform the following steps. (A rough Python sketch of steps (a)-(c) appears after this list.)

If any imputation is needed, do it after transformation, since outliers need to be handled first. Typical order: Transform -> Replace -> Impute -> parametric models such as neural networks and regressions.

a) Transformation: use the Maximum Normal transformation (Modify -> Transform Variables -> set interval inputs to Maximum Normal). This feeds the answer to Q.3 A.

b) Regression: make sure that in stepwise selection the entry significance level is 1 and the stay significance level is 0.5, and that no more than 20 variables are allowed in the final model. (Model -> Stepwise -> place after the Transform node -> validation misclassification -> Use Selection Defaults: No -> open Selection Options (the three dots) and change the settings according to the question; if a maximum number of variables is specified, change the maximum number of steps at the end.)

c) Polynomial regression of degree 2: use the same stepwise settings (entry probability 1, stay probability 0.5, no more than 20 variables in the final model). (Copy the Regression node -> Polynomial Terms = Yes -> Two-Factor Interactions = Yes.)
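
A hedged Python sketch of these three steps: SAS Miner's Maximum Normal transformation chooses the power transform that makes an input most nearly normal, for which Box-Cox is a close analogue, and stepwise selection is approximated here with forward sequential selection; the filename, column names and the use of logistic regression are all assumptions.

import pandas as pd
from scipy import stats
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

df = pd.read_csv("carinsure1.csv")

# (a) Box-Cox as a stand-in for Maximum Normal on the interval inputs;
# also answers Q.3 A by checking skewness after the transform
for col in ["IDV", "Premium"]:
    transformed, _ = stats.boxcox(df[col].dropna() + 1)  # shift keeps values positive
    print(col, "skewness after transform:", stats.skew(transformed))

# (b) forward selection capped at 20 inputs (stand-in for stepwise)
y = df["Target_ClaimsInd"]
X = pd.get_dummies(df.drop(columns=["Policy Number", "Target_ClaimsInd",
                                    "Target_Claim_Amount"])).fillna(0)
selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=min(20, X.shape[1] - 1),
    direction="forward", scoring="accuracy")
selector.fit(X, y)
print("selected:", list(X.columns[selector.get_support()]))

# (c) degree-2 polynomial / two-factor interaction terms (first few
# columns only, to keep the demo small)
X_poly = PolynomialFeatures(degree=2).fit_transform(X.iloc[:, :5])
print("polynomial design matrix shape:", X_poly.shape)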

Q. 3 Answer the following questions

A. Which variables have skewness greater than 2 after transformation?

Add a Transform Variables node directly after the Data Partition node, set IDV and Premium to Maximum Normal, then run.
Ans. None; the output shows 0 such variables.

B. How many variables are in the final model of the Regression node?

In the output, go to the end and look for the stepwise selection summary.

7 variables.

C. How many variables are in the final model of the Polynomial Regression node? Are there any interaction terms in the final model? If yes, which terms?

Read the final model from the output selected on misclassification.

D. Which of the above two models is the better model?

Assess -> Model Comparison -> set the selection statistic to misclassification rate -> set the selection table to validation.

Perform the following steps. (Please note that in neural networks convergence needs to be achieved; if a network does not converge in the default number of iterations, you may increase the maximum to 500 iterations to achieve convergence. A rough sklearn sketch of these steps appears after this list.)

MODEL: NEURAL NETWORK (must converge)


a) Insert a Neural Network model after the Regression node. Do not enable preliminary training; set the number of hidden units to 3 and the optimization maximum iterations to 500; set the model selection criterion to misclassification. Verify that the run converged (search the output with Ctrl+F).

b) Insert a Neural Network model after the Regression node. Do not enable preliminary training; set the number of hidden units to 6.

c) Insert an AutoNeural network model after the Regression node with number of hidden units = 1, Tolerance = Low, and only the tanh activation function selected.

d) Insert a Neural Network model after the 2-branch tree node. Do not enable preliminary training; set the number of hidden units to 3.

e) Insert a Neural Network model after the 2-branch tree node. Do not enable preliminary training; set the number of hidden units to 6.

f) Insert an AutoNeural network model after the 2-branch tree node with number of hidden units = 1, Tolerance = Low, and only the tanh activation function selected.
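
A minimal sklearn analogue of the 3- and 6-hidden-unit networks, assuming the X_train/X_valid split from the partition sketch above; tanh mirrors the AutoNeural activation setting and max_iter=500 mirrors raising the iteration cap. Inputs are standardized first, since MLPs converge poorly on unscaled data.

from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

for hidden in (3, 6):
    nn = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(hidden,), activation="tanh",
                      max_iter=500, random_state=1))
    nn.fit(X_train, y_train)
    print(hidden, "hidden units, validation misclassification:",
          1 - nn.score(X_valid, y_valid))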
Q. 4 Answer the following questions (8 Marks)

A. Which of the above six models is best? (4 Marks)

B. Would you insert a neural network after a decision tree? Why or why not?
Answer:
Yes, in this case, because there was no missing data and we did not perform any imputations.

In general, no: you typically would not insert a neural network after a decision tree in SAS Miner if any imputations were performed.
Reasoning:
- Decision trees are non-parametric models that segment data into rules and conditions. Once a decision tree is built, it does not provide a continuous transformation of the input variables that a neural network can benefit from.
- Neural networks work well with raw, numeric, or transformed data, but are not typically applied to tree-generated outputs.
- If the decision tree is pruned, it might lose some important patterns, and applying a neural network afterward would not regain the lost information.
- Instead, you could use ensemble techniques like boosting or bagging, or you could try feature engineering and preprocessing before applying a neural network.
C. Would you insert a neural network after the polynomial regression? Why or why not?
Answer: No, you typically would not insert a neural network after polynomial regression.
Reasoning:
- Polynomial regression already models non-linearity by introducing polynomial terms (e.g., x^2, x^3).
- Neural networks are also designed to capture complex, nonlinear relationships, so applying a neural network after polynomial regression is redundant.
- Overfitting risk: polynomial regression may already overfit if the degree is high; adding a neural network might amplify this issue.
- Better approach: instead of chaining them, use either:
  o polynomial regression for simpler problems with clear curve fitting, or
  o neural networks for complex relationships when polynomial regression is insufficient.

Use an Ensemble node and connect (1) the 2-branch tree, (2) the regression, (3) the neural network with 6 hidden units after the decision tree node, (4) the AutoNeural after the regression and (5) the polynomial regression node to this node; set the criterion to voting. (A rough sklearn sketch follows.)
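
A small sklearn sketch of hard voting (the "voting" criterion above); the three component models are stand-ins for the five Miner nodes, and X_train/X_valid are assumed from the partition sketch.

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=1)),
        ("reg", make_pipeline(StandardScaler(),
                              LogisticRegression(max_iter=1000))),
        ("nn6", make_pipeline(StandardScaler(),
                              MLPClassifier(hidden_layer_sizes=(6,),
                                            max_iter=500, random_state=1))),
    ],
    voting="hard")  # each model casts one vote; the majority class wins
ensemble.fit(X_train, y_train)
print("validation misclassification:", 1 - ensemble.score(X_valid, y_valid))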

Use a Model Comparison node and connect all of the above models (including the ensemble) to it, using the appropriate selection statistic for a binary target.

Use a Score node to score the data and find the accuracy on the scored data. Answer the following questions.
Q. 5 Answer the following questions

1. Which is the best model for the binary target? Neural3.

From the Model Comparison node.

2. Without changing the assessment criteria within the existing models, which model would you prefer if you want to maximize cumulative lift on the validation set? Why is this method not appropriate for this data? Neural4.
Lift is used when we are interested in only a top fraction of the population, whereas our objective here is to identify every observation correctly. Lift should be used only when the problem is a marketing or buyer-targeting type of problem.

Imbalanced data issue:

- If claim occurrences (Target_ClaimsInd) are rare (which is often the case in insurance datasets), cumulative lift may be misleading.

- A high lift may come from overfitting to a small group of claimants rather than generalizing well.

Business decision perspective:

- Lift is useful for marketing applications where ranking matters (e.g., who is most likely to respond to an offer).

- In insurance claims prediction, accuracy (misclassification rate) and false positive/false negative control are more important than just ranking.

Existing assessment criteria conflict:

- The dataset already defines the validation misclassification rate as the primary criterion.

- Switching to lift for model selection might lead to a model that ranks well but misclassifies claims, which is risky in an insurance context. (A short sketch of how cumulative lift is computed follows.)
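
A short sketch making cumulative lift concrete: rank cases by predicted claim probability and compare the claim rate in the top depth fraction to the overall rate; the arrays below are hypothetical.

import numpy as np

def cumulative_lift(y, p, depth=0.2):
    """Claim rate in the top depth fraction (by score) over the base rate."""
    order = np.argsort(p)[::-1]                    # highest scores first
    top = np.asarray(y)[order][: int(len(y) * depth)]
    return top.mean() / np.mean(y)

y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]                # 20% base claim rate
p = [0.9, 0.8, 0.1, 0.2, 0.7, 0.1, 0.3, 0.2, 0.1, 0.05]
print(cumulative_lift(y, p))  # 2.5: top 20% has 2.5x the base claim rate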

3. Give the classification table for the score data. What is the accuracy of the best model on the score data?

Drag a Score node and connect it to the score dataset -> change the role to Score in the input data -> run it -> open the exported data -> copy it to Excel and build a pivot table from it. (A pandas alternative is sketched below.)
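
A pandas sketch of the classification table without the Excel pivot; the exported filename is an assumption, and I_Target_ClaimsInd follows the usual SAS Miner naming for the predicted-class column.

import pandas as pd

scored = pd.read_csv("scored.csv")  # assumed export of the Score node
table = pd.crosstab(scored["Target_ClaimsInd"], scored["I_Target_ClaimsInd"],
                    rownames=["Actual"], colnames=["Predicted"])
print(table)

accuracy = (scored["Target_ClaimsInd"] == scored["I_Target_ClaimsInd"]).mean()
print(f"Accuracy on score data: {accuracy:.4f}")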
4. If you create a profit/cost matrix where the cost of not identifying a claim is 8000, which is the best model? What is the total expected cost for the score data, assuming a cutoff of 0.5? Neural2.

For this part, do not make any changes to the original file; copy it and make the changes in the copy.

Import data -> select the role as Score -> then bring a Score node from Assess -> correct the paths of both inputs.

Connect File Import -> Decisions -> Data Partition.

Decisions node -> Apply Decisions: Yes -> Custom Editor (three dots) -> Build -> Decisions -> Decision Weights -> minimize costs (or maximize profits).

Change the assessment criterion of all decision trees to average squared error.

For regression and polynomial regression, use validation profit/loss.

For all neural networks, use profit/loss.

Model comparison:

Ensemble: voting. (A sketch of the cost calculation follows.)
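
A sketch of the Q.5.4 cost calculation: with a cost of 8000 for each claim the model fails to identify (a false negative) and a 0.5 cutoff, the total expected cost is 8000 times the number of missed claimants; y and p below are assumed arrays of actual outcomes and predicted claim probabilities.

import numpy as np

COST_MISSED_CLAIM = 8000.0

def total_expected_cost(y, p, cutoff=0.5):
    """Cost of false negatives at the given probability cutoff."""
    pred = (np.asarray(p) >= cutoff).astype(int)
    false_negatives = np.sum((np.asarray(y) == 1) & (pred == 0))
    return false_negatives * COST_MISSED_CLAIM

y = [1, 0, 1, 0, 0]
p = [0.9, 0.2, 0.3, 0.6, 0.1]
print(total_expected_cost(y, p))  # 8000.0: one claimant scored below 0.5

Note that if the cost matrix pushes a model to classify everyone as a claimant, false negatives (and hence this cost) drop to zero, which is exactly the pathology noted in the next answer.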
5. Based on the above question, can you conclude that the model can be deployed?

No. The cost optimization drives the model to classify everyone as a claimant: the last column of the classification table contains no cases classified as non-claimants, and hence there is no false-negative cost. The model should not be deployed.

Best of luck!

Sensitivity = TP / (TP + FN)

Specificity = TN / (TN + FP)

You might also like