PM Mock Test
Variable             Role      Measurement Level  Description
Policy Number        ID        Nominal            Unique policy identifier
Year                 Input     Nominal            Year of manufacture
IDV                  Input     Interval           Insured Declared Value of car
City                 Input     Nominal            City of registration of vehicle
State                Input     Nominal            State of registration
Cubic Capacity       Input     Nominal            Capacity of engine
Mfr_Model            Input     Nominal            Manufacturer and model of car
Premium              Input     Interval           Total premium paid for the policy at the beginning of the term
Type                 Input     Binary             Source of lead
Gender               Input     Binary             M/F
Channel              Input     Binary             Lead generation channel
Age                  Input     Nominal            Age of applicant
Cover Type           Input     Binary             Third Party or Comprehensive
PaymentFrequency     Input     Binary             Annual payment or monthly instalments
Target_ClaimsInd     Target    Binary             Claims Y/N
Target_Claim_Amount  Rejected  Interval           Total claim amount
Create a new project named PMMock.
Open the data source carinsure1 and the score file carinsuretest1 from the sasuser library, with the roles and measurement levels given in the table above. Make sure you open the main file with the role Raw and the score file with the role Score. Create a new diagram.
Q. 1 Answer the following questions
For this question, attach a StatExplore node to the input data node and run it.
B. What is the number of levels for each of the following variables: Age, City, State, Cubic_Capacity & Mfr_Model? What is the mode of each of these variables?
C. Which variable has the highest variable worth? Which variable has the lowest variable worth?
D. What is the percentage of the primary target in the dataset? (give up to four decimals)
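The same summaries can be reproduced outside Miner; a minimal pandas sketch, assuming the training data has been exported to a CSV (the file name and column spellings are assumptions):

import pandas as pd

df = pd.read_csv("carinsure1.csv")  # assumed export of the training data

# Question B: number of levels and mode of each nominal input.
for col in ["Age", "City", "State", "Cubic_Capacity", "Mfr_Model"]:
    print(col, "levels:", df[col].nunique(), "mode:", df[col].mode().iloc[0])

# Question D: percentage of the primary target, to four decimals.
pct = (df["Target_ClaimsInd"] == "Y").mean() * 100
print(f"Primary target: {pct:.4f}%")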
c) Add a decision tree node after the data partition node with 3 branches (change the assessment measure to Misclassification throughout the paper; if the target were continuous, use ASE instead).
To see the branches: Results -> View -> Model -> Subtree Assessment Plot; switch the plot to Misclassification Rate.
d) Make sure you use an appropriate model assessment method for the binary target.
View -> Model -> Subtree Assessment Plot -> number of leaves = 5
2-branch tree -> Results -> View -> Model -> Subtree Assessment Plot -> Misclassification Rate
Both trees show exactly the same values, but since the 2-branch model has fewer branches, it is the simpler and therefore better model.
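For intuition, the same comparison outside Miner (a sketch; the file name, column names, and the use of leaf counts as a stand-in for branch counts are all assumptions, since sklearn trees are always binary):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("carinsure1.csv")  # assumed export of the training data
# Drop the target and the rejected claim-amount variable from the inputs.
X = pd.get_dummies(df.drop(columns=["Target_ClaimsInd", "Target_Claim_Amount"])).astype(float)
y = (df["Target_ClaimsInd"] == "Y").astype(int)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=1)

for leaves in (2, 3):  # contrast a simpler vs. a more complex tree
    tree = DecisionTreeClassifier(max_leaf_nodes=leaves, random_state=1).fit(X_tr, y_tr)
    print(leaves, "leaves, validation misclassification:",
          round(1 - tree.score(X_va, y_va), 4))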
D. Do you see any abnormality in the data? Do you need to do any replacements, imputations or transformations before using trees? Why?
Abnormality here means missing data and high skewness in the quantitative/interval inputs (two such variables: IDV and Premium).
None are applicable: trees are non-parametric, handle missing data natively, and are not affected by skewness or outliers, so no replacements, imputations or transformations are needed before using them.
In this case we do not impute, but in general, when outliers are present, transform first and then impute.
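A quick check for both kinds of abnormality (a sketch against the same assumed CSV export):

import pandas as pd

df = pd.read_csv("carinsure1.csv")   # assumed export of the training data
print(df.isna().sum())               # missing values per variable
print(df[["IDV", "Premium"]].skew()) # skewness of the two interval inputs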
Perform the following steps.
If any imputation is needed, do it after the transformation, since outliers need to be handled first; a sketch of this ordering follows.
Order of nodes: Transform -> Replace -> Impute -> then the model nodes (neural networks, regressions).
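The same ordering expressed as a pipeline (a sketch; log1p stands in for whichever transform is chosen, and the Replace step for outliers is omitted):

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

# Same ordering as the Miner flow: transform first, then impute, then model.
# log1p passes NaN through, so imputation still happens after the transform.
flow = Pipeline([("transform", FunctionTransformer(np.log1p)),
                 ("impute", SimpleImputer(strategy="median")),
                 ("model", LogisticRegression(max_iter=1000))])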
a) Transformation – use the Maximum Normal transformation (Modify -> Transform Variables -> interval inputs -> Maximum Normal) -> answer 3a
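Maximum Normal tries a small family of simple power/log transformations and keeps the one whose result is closest to normal. Box-Cox is not the same search, but it pursues the same goal, so a rough analogue (a sketch; the file name and the +1 shift are assumptions):

import pandas as pd
from scipy import stats

df = pd.read_csv("carinsure1.csv")  # assumed export of the training data
for col in ["IDV", "Premium"]:
    # Box-Cox picks the power transform that best normalizes the variable;
    # +1 guards against zero values.
    transformed, lam = stats.boxcox(df[col].dropna() + 1)
    print(col, "Box-Cox lambda:", round(lam, 3))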
b) Regression – make sure that in stepwise selection the entry significance level is 1 and the stay significance level is 0.5, and that the final model has no more than 20 variables; a selection sketch follows. (Add a Regression node after the Transform node -> Selection Model = Stepwise -> Selection Criterion = Validation Misclassification -> Use Selection Defaults = No -> open Selection Options (the three dots) and change the entry/stay levels according to the question -> if a number of variables is specified, also change the maximum number of steps at the end.)
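With an entry level of 1 every candidate enters, so this stepwise run behaves like backward elimination at the 0.5 stay level. A sketch with statsmodels (the file name and the small column subset are assumptions):

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("carinsure1.csv")  # assumed export of the training data
X = pd.get_dummies(df[["IDV", "Premium", "Age"]], drop_first=True).astype(float)
y = (df["Target_ClaimsInd"] == "Y").astype(int)

cols = list(X.columns)
while len(cols) > 1:
    fit = sm.Logit(y, sm.add_constant(X[cols])).fit(disp=0)
    pvals = fit.pvalues.drop("const")
    worst = pvals.idxmax()
    # Stop once every term meets the 0.5 stay level and the model
    # is within the 20-variable cap; otherwise drop the weakest term.
    if pvals[worst] <= 0.5 and len(cols) <= 20:
        break
    cols.remove(worst)
print("Selected:", cols)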
c) Poly regression of degree 2 – make sure that in stepwise selection the entry significance level is 1 and the stay significance level is 0.5, and that the final model has no more than 20 variables; see the sketch below. (Copy the regression node -> Polynomial Terms = Yes -> Two-Factor Interactions = Yes.)
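An analogue of degree-2 terms with two-factor interactions (a sketch; the two-column subset and file name are assumptions, chosen for brevity):

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

df = pd.read_csv("carinsure1.csv")  # assumed export of the training data
X = df[["IDV", "Premium"]]
y = (df["Target_ClaimsInd"] == "Y").astype(int)

# Degree-2 terms (squares plus the IDV*Premium interaction), analogous to
# Polynomial Terms = Yes and Two-Factor Interactions = Yes on the node.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LogisticRegression(max_iter=1000)).fit(X, y)
print(model.named_steps["polynomialfeatures"].get_feature_names_out())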
Now add a Transform Variables node directly after the data partition, set IDV and Premium to Maximum Normal, then run.
Ans. 0, since the output shows 0 variables.
7 variables.
C. How many variables are in the final model of the poly regression node? Are there any interaction terms in the final model? If yes, which terms?
Assess -> Model Comparison -> set Selection Statistic to Misclassification Rate -> Selection Table = Validation
Perform the following steps. (Please note that neural networks need to reach convergence; if a network does not converge within the default number of iterations, you can increase the maximum to up to 500 iterations to achieve convergence.)
b) Insert a neural network model after the regression node – do not enable preliminary training, and set the number of hidden units to 6 (under the Network properties).
c) Insert an AutoNeural model after the regression node with number of hidden units = 1, Tolerance = Low, and only the tanh activation function selected.
d) Insert a neural network model after the 2-branch tree node – do not enable preliminary training, and set the number of hidden units to 3.
e) Insert a neural network model after the 2-branch tree node – do not enable preliminary training, and set the number of hidden units to 6.
f) Insert an AutoNeural model after the 2-branch tree node with number of hidden units = 1, Tolerance = Low, and only the tanh activation function selected.
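For intuition, an analogue of one of these networks (a sketch; sklearn's MLP is not Miner's Neural Network node, and the file and column names are assumptions):

import pandas as pd
from sklearn.neural_network import MLPClassifier

df = pd.read_csv("carinsure1.csv")  # assumed export of the training data
X = pd.get_dummies(df.drop(columns=["Target_ClaimsInd", "Target_Claim_Amount"])).astype(float)
y = (df["Target_ClaimsInd"] == "Y").astype(int)

# One hidden layer of 6 units, tanh activation, up to 500 iterations,
# mirroring the hidden-unit and convergence settings described above.
nn = MLPClassifier(hidden_layer_sizes=(6,), activation="tanh",
                   max_iter=500, random_state=1).fit(X, y)
print("stopped after", nn.n_iter_, "iterations")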
Q. 4 Answer the following questions (8 Marks)
B. Would you insert a neural network after the decision tree? Why or why not?
Answer:
Yes, in this case, because there is no missing data and we did not perform any imputations.
In general, however, you would not insert a neural network after a decision tree in SAS Miner if any imputations had been performed.
Reasoning:
Decision trees are non-parametric models that segment data into rules and conditions. Once a decision
tree is built, it does not provide a continuous transformation of the input variables that a neural network
can benefit from.
Neural networks work well with raw, numeric, or transformed data but are not typically applied to
tree-generated outputs.
If the decision tree is pruned, it might lose some important patterns, and applying a neural network
afterward would not regain lost information.
Instead, you could use ensemble techniques like boosting or bagging, or you could try feature
engineering and preprocessing before applying a neural network.
C. Would you insert a neural network after the poly regression? Why or why not?
Answer: No, you typically would not insert a neural network after polynomial regression.
Reasoning:
Polynomial regression already models non-linearity by introducing polynomial terms (e.g., x^2, x^3).
Neural Networks are also designed for capturing complex, nonlinear relationships, so applying a neural
network after polynomial regression is redundant.
Overfitting Risk: Polynomial regression may already lead to overfitting if the degree is high. Adding a
neural network might amplify this issue.
Better approach: instead of chaining them, use either:
- Polynomial regression for simpler problems with clear curve fitting.
- Neural networks for complex relationships when polynomial regression is insufficient.
Use an Ensemble node and connect (1) the 2-branch tree, (2) the regression, (3) the neural network with 6 hidden units after the decision tree node, (4) the AutoNeural after the regression, and (5) the poly regression node to it; set the combination criterion to Voting.
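The same idea outside Miner (a sketch; the component models are stand-ins for the five nodes above, not exact equivalents, and the file and column names are assumptions):

import pandas as pd
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

df = pd.read_csv("carinsure1.csv")  # assumed export of the training data
X = pd.get_dummies(df.drop(columns=["Target_ClaimsInd", "Target_Claim_Amount"])).astype(float)
y = (df["Target_ClaimsInd"] == "Y").astype(int)

# Hard voting mirrors the Ensemble node's Voting combination method.
ensemble = VotingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_leaf_nodes=2)),
                ("reg", LogisticRegression(max_iter=1000)),
                ("nn", MLPClassifier(hidden_layer_sizes=(6,), max_iter=500))],
    voting="hard").fit(X, y)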
Use a Model Comparison node and connect all of the above models (including the ensemble) to it; use the appropriate selection statistic for a binary target.
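Champion selection by validation misclassification can be sketched like this (the models are stand-ins and the file name is an assumption):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("carinsure1.csv")  # assumed export of the training data
X = pd.get_dummies(df.drop(columns=["Target_ClaimsInd", "Target_Claim_Amount"])).astype(float)
y = (df["Target_ClaimsInd"] == "Y").astype(int)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=1)

# Validation misclassification per model, mirroring the selection statistic.
models = {"tree": DecisionTreeClassifier(max_leaf_nodes=2, random_state=1),
          "regression": LogisticRegression(max_iter=1000)}
miscls = {n: 1 - m.fit(X_tr, y_tr).score(X_va, y_va) for n, m in models.items()}
print("Champion:", min(miscls, key=miscls.get))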
Use a Score node to score the data and find the accuracy on the score data. Answer the following questions.
Q. 5 Answer the following questions
Model comparison
2. Without changing the assessment criteria within the existing models, which model would you prefer if you want to maximize cumulative lift on the validation set? Why is this method not appropriate for this data?
Neural 4.
Lift is appropriate when we are interested in only a portion of the population, whereas the objective here is to classify every observation correctly. Use cumulative lift only when the problem is a marketing or buyer-targeting type of problem.
If claim occurrences (Target_ClaimsInd) are rare (which is often the case in insurance datasets),
cumulative lift may be misleading.
A high lift may come from overfitting to a small group of claimants rather than generalizing well.
Lift is useful for marketing applications where ranking matters (e.g., who is most likely to respond to an
offer).
In insurance claims prediction, accuracy (Misclassification Rate) and false positive/false negative
control are more important than just ranking.
The paper already specifies Validation Misclassification Rate as the primary criterion.
Switching to Lift for model selection might lead to a model that ranks well but misclassifies claims,
which is risky in an insurance context.
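For reference, cumulative lift at a given depth is the event rate in the top-scored fraction divided by the overall event rate; a self-contained toy sketch (all data here is synthetic):

import numpy as np

def cumulative_lift(y_true, scores, depth=0.1):
    # Event rate in the top `depth` fraction (ranked by score) vs. base rate.
    order = np.argsort(scores)[::-1]
    top = y_true[order][: max(1, int(len(y_true) * depth))]
    return top.mean() / y_true.mean()

rng = np.random.default_rng(0)
y = (rng.random(10_000) < 0.05).astype(int)  # rare target, like claims
s = y * 0.5 + rng.random(10_000)             # scores correlated with the target
print(round(cumulative_lift(y, s, 0.1), 2))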
3. Give the classification table for the score data. What is the accuracy of the best model on the score data?
Drag a Score node and connect it to the score dataset -> change the role to Score in that input data node -> run it -> open the exported data -> copy it into Excel and build a pivot table of actual vs. predicted.
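The same pivot table in pandas (a sketch; the scored-output file name and the I_ classification column follow Miner's usual naming but are assumptions here):

import pandas as pd

scored = pd.read_csv("carinsuretest1_scored.csv")  # assumed export of the scored data

# Classification table: actual vs. predicted class.
print(pd.crosstab(scored["Target_ClaimsInd"], scored["I_Target_ClaimsInd"]))

accuracy = (scored["Target_ClaimsInd"] == scored["I_Target_ClaimsInd"]).mean()
print(f"Accuracy: {accuracy:.4f}")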
4. If you create a profit/cost matrix where the cost of not identifying a claim is 8000, which is the best model? What is the total expected cost for the score data, assuming a cutoff of 0.5? Neural 2.
For this part, do not make any changes to the original file; copy it and make the changes in the copy.
Import the data -> select the role as Score -> then bring a Score node from the Assess tab -> correct the paths of both inputs.
Decisions -> Apply Decisions = Yes -> Custom Editor (the three dots) -> Build -> Decisions -> Decision Weights -> choose minimize costs (or maximize profits).
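Computing the total cost at the 0.5 cutoff directly (a sketch; the P_ posterior column follows Miner's usual naming but is an assumption, as is the file name):

import pandas as pd

scored = pd.read_csv("carinsuretest1_scored.csv")  # assumed export of the scored data
pred_claim = scored["P_Target_ClaimsIndY"] >= 0.5  # the 0.5 cutoff from the question
actual_claim = scored["Target_ClaimsInd"] == "Y"

# Cost matrix: 8000 per claim the model fails to identify (false negative);
# every other cell costs 0.
total_cost = 8000 * (actual_claim & ~pred_claim).sum()
print("Total expected cost on score data:", total_cost)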
Model comparison:
Ensemble voting
5. Based on the above question, can you conclude that the model can be deployed?
Because the decision weights optimize for cost, the model classifies everyone as a claimer; the last column of the classification table shows nothing classified as a non-claimer, and hence no cost. A model that never predicts a non-claim should not be deployed.
Best of luck!