GLM Sol

The document discusses fitting different GLMs to model claim numbers using Poisson distribution. It finds that factors like age, car group, area and no-claim discount have significant impact on claim numbers. It also shows that male policyholders report higher claim numbers than females.

Uploaded by

Akshita Jain

CHAPTER 12 SOLUTION PAPER B

Solution 1
> #(i)Open file in R
> indices<-read.csv(file.choose(),header=TRUE);indices
Month Sensex BM CD EN FM FI HC
1 Feb-06 0.0444 0.0364 0.0291 -0.0179 0.1233 0.0103 0.0769
2 Mar-06 0.0841 0.1632 0.0783 0.1032 0.1158 0.0185 0.0822
3 Apr-06 0.0654 0.1513 0.0415 0.1518 0.0445 -0.0093 0.0095
4 May-06 -0.1468 -0.1939 -0.0884 -0.1015 -0.2018 -0.0885 -0.1394
5 Jun-06 0.0201 -0.0252 -0.0691 0.0347 0.0314 -0.0806 -0.0784
6 Jul-06 0.0126 -0.0177 -0.0280 -0.0478 -0.0361 0.0536 0.0298
(….continue)

> # New column


> indices$Sensex_direction<-ifelse(indices$Sensex>0,"Positive","Negative")
> View(indices)
> class(indices$Sensex_direction)
[1] "character"

> # GLM MODEL
> glmmodel<-glm(indices$Sensex_direction~indices$BM+indices$CD+indices$EN+indices$FM+indices$FI+indices$HC+indices$IN+indices$IT+indices$TE+indices$UT,family = binomial(link="logit"))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
> summary(glmmodel)

Call:
glm(formula = indices$Sensex_direction ~ indices$BM + indices$CD +
indices$EN + indices$FM + indices$FI + indices$HC + indices$IN +
indices$IT + indices$TE + indices$UT, family = binomial(link = "logit"))

Deviance Residuals:
Min 1Q Median 3Q Max
-2.27544 -0.00117 0.00000 0.01354 1.75651

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.0086 0.7315 -1.379 0.16796
indices$BM 7.7977 16.5255 0.472 0.63703
indices$CD -87.5335 42.6785 -2.051 0.04027 *
indices$EN 93.9675 38.3193 2.452 0.01420 *
indices$FM 41.1745 20.1436 2.044 0.04095 *
indices$FI 172.8807 60.8192 2.843 0.00448 **
indices$HC -6.4294 13.9394 -0.461 0.64463
indices$IN 4.1735 18.2152 0.229 0.81877
indices$IT 78.3494 30.9307 2.533 0.01131 *
indices$TE 29.9111 13.4184 2.229 0.02581 *
indices$UT -14.4767 23.0602 -0.628 0.53015
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 223.213 on 163 degrees of freedom


Residual deviance: 32.905 on 153 degrees of freedom
AIC: 54.905

Number of Fisher Scoring iterations: 11

(iv)

The sectors with a significant effect on the direction of Sensex returns at the 5%
significance level are CD, EN, FM, FI, IT and TE. Only FI is also significant at the
1% significance level.
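
The selection in (iv) can be checked by filtering the p-values in the coefficient table. The sketch below retypes the Pr(>|z|) column of the summary output into a named vector; the names p_values, sig_5pct and sig_1pct are illustrative only and are not part of the original solution.

```r
# p-values copied by hand from the Pr(>|z|) column of summary(glmmodel)
p_values <- c(BM = 0.63703, CD = 0.04027, EN = 0.01420, FM = 0.04095,
              FI = 0.00448, HC = 0.64463, IN = 0.81877, IT = 0.01131,
              TE = 0.02581, UT = 0.53015)

# Sectors significant at the 5% and 1% levels
sig_5pct <- names(p_values)[p_values < 0.05]
sig_1pct <- names(p_values)[p_values < 0.01]

print(sig_5pct)
print(sig_1pct)
```

With the fitted model object available, the same column can be read directly with coef(summary(glmmodel))[, "Pr(>|z|)"].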
(v) # Residual Plot


> plot(glmmodel$residuals)

[Plot of glmmodel$residuals against observation index; one point is labelled as an outlier]

> outlier_position<-which(glmmodel$residuals==min(glmmodel$residuals));outlier_position
41
41
> indices$Month[outlier_position]
[1] Jun-09
164 Levels: Apr-06 Apr-07 Apr-08 Apr-09 Apr-10 Apr-11 ... Sep-19
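
Note that the residuals component of a glm object holds the working residuals, which are not the same as the deviance residuals shown in the summary output. The sketch below illustrates the difference on simulated data, since the indices data set is not reproduced here; the variables x and y and all object names are hypothetical.

```r
set.seed(1)                                    # simulated data, for illustration only
x <- rnorm(50)
y <- rbinom(50, size = 1, prob = plogis(0.5 * x))
fit <- glm(y ~ x, family = binomial(link = "logit"))

working_res  <- fit$residuals                      # the component used in the solution
deviance_res <- residuals(fit, type = "deviance")  # the type shown under "Deviance Residuals"

# The stored component is the same as residuals(fit, type = "working")
same <- isTRUE(all.equal(unname(working_res),
                         unname(residuals(fit, type = "working"))))
print(same)

# Position of the most negative residual under each definition
print(which.min(working_res))
print(which.min(deviance_res))
```

The two definitions need not flag the same observation, so it is worth stating which residual type an outlier search is based on.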

(vi)
Interpretation [2]
The residual deviance (32.90) is far below the null deviance (223.21), so the
sector returns classify the direction of the Sensex well.

One large outlier (Jun-09) may distort the results (removing it may reduce the
residual deviance further).

The explanatory variables are not independent of one another: correlations among
the sector returns are very high. Hence the standard errors may be unreliable.

(vii)
> # Refit the model
> model2<-glm(indices$Sensex_direction~indices$CD+indices$EN+indices$FI+indices$IT+indices$TE+indices$FM,family = binomial(link="logit"))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

(viii)
> summary(glmmodel)

Call:
glm(formula = indices$Sensex_direction ~ indices$BM + indices$CD +
indices$EN + indices$FM + indices$FI + indices$HC + indices$IN +
indices$IT + indices$TE + indices$UT, family = binomial(link = "logit"))

Deviance Residuals:
Min 1Q Median 3Q Max
-2.27544 -0.00117 0.00000 0.01354 1.75651

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.0086 0.7315 -1.379 0.16796
indices$BM 7.7977 16.5255 0.472 0.63703
indices$CD -87.5335 42.6785 -2.051 0.04027 *
indices$EN 93.9675 38.3193 2.452 0.01420 *
indices$FM 41.1745 20.1436 2.044 0.04095 *
indices$FI 172.8807 60.8192 2.843 0.00448 **
indices$HC -6.4294 13.9394 -0.461 0.64463
indices$IN 4.1735 18.2152 0.229 0.81877
indices$IT 78.3494 30.9307 2.533 0.01131 *
indices$TE 29.9111 13.4184 2.229 0.02581 *
indices$UT -14.4767 23.0602 -0.628 0.53015
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 223.213 on 163 degrees of freedom


Residual deviance: 32.905 on 153 degrees of freedom
AIC: 54.905

Number of Fisher Scoring iterations: 11


(ix)
> # model comparison
> anova(glmmodel,model2)
Analysis of Deviance Table

Model 1: indices$Sensex_direction ~ indices$BM + indices$CD + indices$EN +
indices$FM + indices$FI + indices$HC + indices$IN + indices$IT +
indices$TE + indices$UT
Model 2: indices$Sensex_direction ~ indices$CD + indices$EN + indices$FI +
indices$IT + indices$TE + indices$FM
Resid. Df Resid. Dev Df Deviance
1 153 32.905
2 157 33.646 -4 -0.74102

Interpretation [1]
Dropping the four terms raises the deviance by only 0.741 on 4 degrees of freedom.
The p-value of the comparison is about 0.946 > 0.05, so we do not reject the null
hypothesis of no significant difference between the two models. So the model did
not improve significantly based on the friend's suggestion.
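
The anova() call above was run without a test argument, so the table prints no p-value. A minimal sketch of how the quoted p-value can be recovered from the printed deviances, assuming the usual chi-square approximation for the difference in deviance (the variable names are illustrative):

```r
# Values read from the Analysis of Deviance Table above
dev_diff <- 33.646 - 32.905   # increase in residual deviance after dropping 4 terms
df_diff  <- 157 - 153         # change in residual degrees of freedom

# Upper-tail probability of a chi-square distribution with 4 degrees of freedom
p_value <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)
print(round(p_value, 3))
```

Calling anova(glmmodel, model2, test = "Chisq") would add the corresponding Pr(>Chi) column to the table directly.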
Solution 2
(i) All values are non-negative integers, with some values greater than 1, so use the Poisson distribution as the
error structure.

(ii)
model <- glm(formula = Claim.number ~ Age + factor(Car.Group) + Area + factor(NCD) + Gender, data = datatrain, family = poisson())
summary(model)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3203 -0.5624 -0.4445 -0.3185 3.4254

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.895413 0.245584 -7.718 1.18e-14 ***
Age -0.008371 0.001231 -6.802 1.03e-11 ***
factor(Car.Group)2 0.283162 0.285122 0.993 0.320648
factor(Car.Group)3 0.311122 0.282852 1.100 0.271356
factor(Car.Group)4 0.116037 0.292479 0.397 0.691563
factor(Car.Group)5 0.578672 0.265600 2.179 0.029352 *
factor(Car.Group)6 0.912683 0.251178 3.634 0.000279 ***
factor(Car.Group)7 0.260364 0.287397 0.906 0.364968
factor(Car.Group)8 0.914236 0.253831 3.602 0.000316 ***
factor(Car.Group)9 0.877120 0.252408 3.475 0.000511 ***
factor(Car.Group)10 0.914426 0.250011 3.658 0.000255 ***
factor(Car.Group)11 0.799044 0.250434 3.191 0.001420 **
factor(Car.Group)12 1.025303 0.245168 4.182 2.89e-05 ***
factor(Car.Group)13 1.011678 0.248305 4.074 4.61e-05 ***
factor(Car.Group)14 1.118560 0.242695 4.609 4.05e-06 ***
factor(Car.Group)15 1.103179 0.244711 4.508 6.54e-06 ***
factor(Car.Group)16 0.996932 0.247455 4.029 5.61e-05 ***
factor(Car.Group)17 1.128584 0.242389 4.656 3.22e-06 ***
factor(Car.Group)18 1.198728 0.239721 5.001 5.72e-07 ***
factor(Car.Group)19 1.422781 0.238307 5.970 2.37e-09 ***
factor(Car.Group)20 1.317913 0.238579 5.524 3.31e-08 ***
AreaEast Midlands 0.146664 0.132611 1.106 0.268739
AreaLondon 0.318305 0.126773 2.511 0.012045 *
AreaNI 0.393303 0.125065 3.145 0.001662 **
AreaNorth East -0.060812 0.138426 -0.439 0.660439
AreaNorth West -0.193799 0.143745 -1.348 0.177590
AreaSouth East -0.323157 0.151830 -2.128 0.033303 *
AreaSouth West -0.097663 0.142546 -0.685 0.493256
AreaWales -0.309704 0.148283 -2.089 0.036744 *
AreaWest Midlands -0.068206 0.141474 -0.482 0.629730
AreaYorkshire and the Humber 0.100272 0.132276 0.758 0.448421
factor(NCD)1 -0.456078 0.086523 -5.271 1.36e-07 ***
factor(NCD)2 -0.679426 0.091303 -7.441 9.96e-14 ***
factor(NCD)3 -0.885118 0.096849 -9.139 < 2e-16 ***
factor(NCD)4 -0.963244 0.101502 -9.490 < 2e-16 ***
factor(NCD)5 -1.097532 0.104299 -10.523 < 2e-16 ***
GenderMale 0.259982 0.064879 4.007 6.14e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 4569.9 on 7999 degrees of freedom
Residual deviance: 4085.6 on 7963 degrees of freedom
AIC: 6455.7 [10]
(ii) Male policyholders have a higher mean number of reported claims than female policyholders, by exp(0.259982) − 1 = 29.7%, with the other factors held fixed.
The difference is significant (p-value = 6.14e-05). [10]
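
The 29.7% figure comes from exponentiating the GenderMale coefficient, because the model uses a log link. The sketch below repeats that arithmetic and adds an approximate 95% Wald interval for the rate ratio; the interval is an addition for illustration and is not part of the original solution.

```r
# Estimate and standard error of GenderMale from the summary output
beta <- 0.259982
se   <- 0.064879

rate_ratio <- exp(beta)               # male mean claims relative to female
pct_higher <- 100 * (rate_ratio - 1)  # percentage increase

# Approximate 95% Wald interval on the rate-ratio scale (illustrative)
wald_ci <- exp(beta + c(-1, 1) * qnorm(0.975) * se)

print(round(pct_higher, 1))
print(round(wald_ci, 3))
```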

(iii) Compared with the null model, the deviance is reduced by 484.3 while the degrees of freedom reduce by 36.
The observed difference in deviance (484.3) is very high compared with the upper 5% point of the $\chi^2_{36}$
distribution (about 51), so the fitted model is significant/good (alternatively, compare the deviance of the fitted
model (4085.6) to the $\chi^2_{7963}$ distribution).
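
The comparison with the null model can be written out as a short calculation. This is a sketch using only the deviances and degrees of freedom printed in the summary output; the variable names are illustrative.

```r
# Null and residual deviance of the fitted Poisson model
dev_drop <- 4569.9 - 4085.6   # reduction in deviance
df_drop  <- 7999 - 7963       # number of parameters added to the null model

# Upper 5% point of the chi-square distribution with 36 degrees of freedom
critical <- qchisq(0.95, df = df_drop)

print(dev_drop)
print(round(critical, 2))
print(dev_drop > critical)    # TRUE means the fitted model improves on the null model
```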

(iv)
(a) datatrain$Age2 = datatrain$Age^2 [2]
(b)
model1 <- glm(formula = Claim.number ~ Age + Age2 +
factor(Car.Group) + Area + factor(NCD) + Gender, data = datatrain,
family = poisson())
summary(model1)

Deviance Residuals:
Min 1Q Median 3Q Max
-1.2760 -0.5532 -0.4261 -0.3063 3.2856
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.118e-01 2.816e-01 -1.462 0.143613
Age -7.268e-02 6.329e-03 -11.484 < 2e-16 ***
Age2 5.603e-04 5.405e-05 10.366 < 2e-16 ***
factor(Car.Group)2 2.913e-01 2.852e-01 1.022 0.307011
factor(Car.Group)3 2.854e-01 2.829e-01 1.009 0.313049
factor(Car.Group)4 1.361e-01 2.925e-01 0.465 0.641617
factor(Car.Group)5 5.608e-01 2.656e-01 2.111 0.034750 *
factor(Car.Group)6 8.935e-01 2.512e-01 3.557 0.000376 ***
factor(Car.Group)7 2.729e-01 2.875e-01 0.949 0.342393
factor(Car.Group)8 9.135e-01 2.539e-01 3.598 0.000320 ***
factor(Car.Group)9 8.766e-01 2.524e-01 3.473 0.000515 ***
factor(Car.Group)10 9.127e-01 2.501e-01 3.650 0.000263 ***
factor(Car.Group)11 8.012e-01 2.505e-01 3.198 0.001383 **
factor(Car.Group)12 1.039e+00 2.452e-01 4.239 2.25e-05 ***
factor(Car.Group)13 9.643e-01 2.483e-01 3.883 0.000103 ***
factor(Car.Group)14 1.109e+00 2.427e-01 4.568 4.93e-06 ***
factor(Car.Group)15 1.111e+00 2.445e-01 4.543 5.54e-06 ***
factor(Car.Group)16 9.760e-01 2.474e-01 3.945 7.99e-05 ***
factor(Car.Group)17 1.131e+00 2.424e-01 4.667 3.06e-06 ***
factor(Car.Group)18 1.188e+00 2.397e-01 4.957 7.14e-07 ***
factor(Car.Group)19 1.420e+00 2.383e-01 5.962 2.49e-09 ***
factor(Car.Group)20 1.322e+00 2.386e-01 5.540 3.03e-08 ***
AreaEast Midlands 9.654e-02 1.327e-01 0.727 0.466981
AreaLondon 3.190e-01 1.268e-01 2.516 0.011863 *
AreaNI 3.837e-01 1.251e-01 3.067 0.002161 **
AreaNorth East -6.318e-02 1.385e-01 -0.456 0.648255
AreaNorth West -2.015e-01 1.438e-01 -1.401 0.161092
AreaSouth East -3.268e-01 1.519e-01 -2.151 0.031438 *
AreaSouth West -1.148e-01 1.426e-01 -0.805 0.420806
AreaWales -3.166e-01 1.484e-01 -2.134 0.032820 *
AreaWest Midlands -8.999e-02 1.415e-01 -0.636 0.524886
AreaYorkshire and the Humber 1.060e-01 1.323e-01 0.801 0.423004
factor(NCD)1 -4.530e-01 8.657e-02 -5.233 1.67e-07 ***
factor(NCD)2 -6.705e-01 9.130e-02 -7.344 2.07e-13 ***
factor(NCD)3 -8.753e-01 9.692e-02 -9.031 < 2e-16 ***
factor(NCD)4 -9.487e-01 1.016e-01 -9.342 < 2e-16 ***
factor(NCD)5 -1.089e+00 1.043e-01 -10.439 < 2e-16 ***
GenderMale 2.657e-01 6.496e-02 4.091 4.30e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 4569.9 on 7999 degrees of freedom
Residual deviance: 3981.3 on 7962 degrees of freedom
AIC: 6353.4

(c) The p-value of the age squared coefficient (< 2e-16) shows that it is significant. Also, the deviance is reduced
by 104.3 (from 4085.6 to 3981.3) for one extra parameter, which is far more than twice the change in degrees of
freedom. So the variable is significantly associated with the number of reported claims.
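
The deviance argument in (c) can be checked numerically, and the two age coefficients can be combined to see where the fitted age effect turns. Both calculations below are a sketch based only on the printed output; the turning age is an illustration added here and is not part of the original solution.

```r
# Age coefficients from summary(model1)
b_age  <- -7.268e-02
b_age2 <-  5.603e-04

# Age at which the quadratic b_age * Age + b_age2 * Age^2 is lowest
turning_age <- -b_age / (2 * b_age2)

# Drop in residual deviance from adding Age2, tested on 1 degree of freedom
dev_drop <- 4085.6 - 3981.3
p_value  <- pchisq(dev_drop, df = 7963 - 7962, lower.tail = FALSE)

print(round(turning_age, 1))
print(dev_drop)
print(p_value)
```

Under the fitted model, the expected number of claims falls with age up to roughly the turning age and rises after it, other factors held fixed.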
Solution 3
Linear predictor for modelling:
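(a) $\alpha_i + \beta \times \text{temp}$, where the intercept $\alpha_i$, $i = 1, 2$, depends on the semester [2]
(b) $\alpha_i + \beta_i \times \text{temp}$, where $\alpha_i$ is as above and $\beta_i$, $i = 1, 2$, also depends on the semester [2]
(c) $\alpha_i + \beta_i \times \text{temp} + \gamma_j$, with $\alpha_i$, $\beta_i$ as above, where $\gamma_j$, $j = 1, 2$, depends on the route [2]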
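
In R formula notation these three linear predictors correspond to temp + semester, temp * semester and temp * semester + route. The sketch below counts the parameters each formula implies, using a small made-up data frame; the values in toy and the level name "break" are assumptions for illustration only.

```r
# Hypothetical data frame with the same variable names as the solution
toy <- data.frame(
  temp     = c(1, 2, 3, 4),
  semester = factor(c("break", "semester", "break", "semester")),
  route    = factor(c("8am", "8am", "9am", "9am"))
)

# Number of columns in the design matrix = number of free parameters
n_a <- ncol(model.matrix(~ temp + semester, data = toy))          # (a)
n_b <- ncol(model.matrix(~ temp * semester, data = toy))          # (b)
n_c <- ncol(model.matrix(~ temp * semester + route, data = toy))  # (c)

print(c(a = n_a, b = n_b, c = n_c))
```

The count for (c) agrees with the five coefficients reported in the Model1 summary.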

> Model1 <- glm(Passengers~temp*semester + route, family=poisson(link="log"))
> summary(Model1) [2]

Call:
glm(formula = Passengers ~ temp * semester + route, family = poisson(link = "log"))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.8128 -0.6263 -0.1566 0.5162 1.3991

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.40210 0.31155 1.291 0.1968
temp -0.07878 0.03576 -2.203 0.0276 *
semestersemester 0.53514 0.46691 1.146 0.2517 [1]
route9am 0.17370 0.44520 0.390 0.6964
temp:semestersemester 0.10779 0.05741 1.878 0.0604 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 30.406 on 19 degrees of freedom
Residual deviance: 13.833 on 15 degrees of freedom
AIC: 62.187

(b)
Temperature (temp) is significant at the 5% significance level.
Semester is not significant.
Route is not significant.
The interaction between temperature (temp) and semester is not significant at the 5% significance level,
but it is close to being significant (p-value = 0.0604).

(c)
> Model2 <- update(Model1, ~ . - route) [2]
Or,
> Model2 <- glm(Passengers ~ temp * semester, family = poisson(link = "log"))
> summary(Model2)
Call:
glm(formula = Passengers ~ temp + semester + temp:semester, family = poisson(link
= "log"))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.84542 -0.66323 -0.06209 0.43732 1.34790
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.44284 0.29121 1.521 0.1283
temp -0.07452 0.03387 -2.200 0.0278 *
semestersemester 0.54602 0.46390 1.177 0.2392
temp:semestersemester 0.10012 0.05316 1.883 0.0597 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 30.406 on 19 degrees of freedom
Residual deviance: 13.982 on 16 degrees of freedom
AIC: 60.33

(b) The AIC has fallen from 62.187 to 60.336, so removing route has improved on the initial model.
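
The fall in AIC can be reconciled with the two residual deviances, because both models are fitted to the same data and so differ in AIC only through the deviance and twice the number of parameters. A minimal sketch using the printed figures (variable names are illustrative):

```r
# AIC and residual deviance read from the two summaries
aic_model1 <- 62.187; dev_model1 <- 13.833; n_par1 <- 5
aic_model2 <- 60.336; dev_model2 <- 13.982; n_par2 <- 4

# Change in AIC reported by R
aic_change <- aic_model2 - aic_model1

# The same change rebuilt from deviance and parameter count
rebuilt <- (dev_model2 - dev_model1) + 2 * (n_par2 - n_par1)

print(aic_change)
print(rebuilt)
```

Dropping route costs only 0.149 in deviance but saves a penalty of 2, which is why the AIC improves.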

(iv)(a)
>Model3<- glm(Passengers~temp+temp:semester,family=poisson(link="log"))
>Modela<- glm(Passengers~temp+ semester,family=poisson(link="log")) [2]
>Modelb<- glm(Passengers~temp*semester,family=poisson(link="log")) [2]
>Model3$aic
59.65976 [1]
>Modela$aic
62.03591

>Modelb$aic
60.33588 [1]
Model3 has the lowest AIC compared with the other models. We conclude that Model3
outperforms the other models considered here [1]

(b)
Model3 does not include both of the main effects: it keeps temp and the temp:semester interaction but omits the
semester main effect. Despite this, the model still fits the data well [1]

(v)(a)
> plot(Model3,1) [2]
(b)
The residuals plot shows no patterns - exhibiting a fairly random scatter around zero with
constant variance and no outliers [2]
The plot suggests that the model is appropriate [1]

(vi)
> predict(Model3, data.frame(temp=0,semester="semester",route="8am"),type = "response") [3]
Predicted number is: 1.866568 [1]
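
Because Model3 contains only temp and temp:semester, every term multiplies temp, so at temp = 0 the prediction reduces to the exponential of the intercept, and the route column in the new data is ignored. The sketch below shows this on simulated data, since the original data set is not reproduced here; the data frame toy, its values and the level name "break" are hypothetical.

```r
set.seed(42)                                   # simulated data, for illustration only
toy <- data.frame(
  temp     = runif(20, min = -5, max = 15),
  semester = factor(rep(c("break", "semester"), each = 10)),
  route    = factor(rep(c("8am", "9am"), times = 10))
)
toy$Passengers <- rpois(20, lambda = 3)

# Same model structure as Model3 in the solution
fit <- glm(Passengers ~ temp + temp:semester,
           family = poisson(link = "log"), data = toy)

new_obs <- data.frame(temp = 0, semester = "semester", route = "8am")
pred    <- predict(fit, new_obs, type = "response")

# At temp = 0 the prediction equals exp(intercept)
matches <- isTRUE(all.equal(unname(pred), unname(exp(coef(fit)[1]))))
print(matches)
```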

Solution 4

# Open an R data file: File -> Open -> select the file -> check that the data have been added to the
global environment, or

> policydata<-load(file.choose());policydata
[1] "n.policies" "sex.code" "class.code"

> plot(class.code,log(n.policies),main = "Number of policies against class of business")

(b) There seems to be some dependence of the number of policies on class of business,
with lower numbers for classes 4 and 5.
The relationship is not clear, though.
(ii)

It now seems that the number of policies also depends on the gender of policyholders. [1]
The numbers are generally higher for males. [1]
(iii)
> class.code = as.factor(class.code)
> sex.code = as.factor(sex.code)
> glm1 = glm(n.policies ~ class.code, family = "poisson")
> summary(glm1)

Call:
glm(formula = n.policies ~ class.code, family = "poisson")

Deviance Residuals:
1 2 3 4 5 6
0.6865 1.1476 2.1526 -0.1806 0.1879 -0.7118
7 8 9 10
-1.2165 -2.3899 0.1787 -0.1901

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.7257 0.1098 33.943 <2e-16 ***
class.code2 0.1029 0.1514 0.680 0.4965
class.code3 0.2540 0.1463 1.736 0.0825 .
class.code4 -0.2917 0.1679 -1.738 0.0822 .
class.code5 -0.3935 0.1729 -2.275 0.0229 *
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 36.761 on 9 degrees of freedom


Residual deviance: 14.256 on 5 degrees of freedom
AIC: 79.136

Number of Fisher Scoring iterations: 4

Business class 1 is used as the baseline category (intercept level).


The effect of class 5 on the number of policies appears to be significantly different from that of class 1, and there
is some (weak) evidence that classes 3 and 4 also have a different effect.

(iv)

> glm2 = glm(n.policies ~ class.code + sex.code, family = "poisson")


> summary(glm2)
Call:
glm(formula = n.policies ~ class.code + sex.code, family = "poisson")
Deviance Residuals:
1 2 3 4 5 6
-0.2213 0.1825 1.0919 -0.9478 -0.5494 0.2530
7 8 9 10
-0.2133 -1.3375 1.0333 0.6127

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.8611 0.1180 32.732 < 2e-16 ***
class.code2 0.1029 0.1514 0.680 0.49648
class.code3 0.2540 0.1463 1.736 0.08248 .
class.code4 -0.2917 0.1679 -1.738 0.08225 .
class.code5 -0.3935 0.1729 -2.275 0.02288 *
sex.code2 -0.2921 0.1011 -2.890 0.00386 **
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)


Null deviance: 36.7607 on 9 degrees of freedom
Residual deviance: 5.8163 on 4 degrees of freedom
AIC: 72.696

Number of Fisher Scoring iterations: 4

Numbers of policies depend on both business class and gender of policyholder.

Business class 5 has the strongest effect on number of policies when compared to class 1, and this effect is
negative (reducing number of policies). Male policyholders give the baseline here, so being female has a
significant negative effect on number of policies.
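
The size of these effects is easier to read on the original scale. Since the model uses the log link, exponentiating a coefficient gives a multiplicative effect on the expected number of policies. A minimal sketch using the glm2 estimates (variable names are illustrative):

```r
# Coefficients read from summary(glm2)
b_class5 <- -0.3935   # class 5 relative to class 1
b_sex2   <- -0.2921   # sex.code 2 relative to sex.code 1

ratio_class5 <- exp(b_class5)
ratio_sex2   <- exp(b_sex2)

# Percentage reduction in expected number of policies
print(round(100 * (1 - ratio_class5), 1))
print(round(100 * (1 - ratio_sex2), 1))
```

So the fitted model implies roughly a third fewer policies for class 5 than for class 1, and about a quarter fewer for sex.code 2 than for sex.code 1.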

(v)
The null hypothesis is that the second model (including both factors) is not an improvement over the first
model.

> anova(glm1,glm2,test = "Chisq")
Analysis of Deviance Table

Model 1: n.policies ~ class.code
Model 2: n.policies ~ class.code + sex.code
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 5 14.2560
2 4 5.8163 1 8.4397 0.003671 **
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The p-value is 0.003671.
Therefore we have strong evidence against the null hypothesis. We conclude that the second model gives
significant improvement.
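
The p-value in the table is the upper tail of a chi-square distribution evaluated at the drop in deviance. A short sketch reproducing it from the printed residual deviances (variable names are illustrative):

```r
# Residual deviances and degrees of freedom from the Analysis of Deviance Table
dev_diff <- 14.2560 - 5.8163   # drop in deviance from adding sex.code
df_diff  <- 5 - 4              # one extra parameter

p_value <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)
print(signif(p_value, 4))
```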

(vi)
> predict(glm2, data.frame(class.code="2", sex.code="1"), type="response")
1
52.67

Based on model 2 (glm2), we predict 52.67 policies for class.code 2 and sex.code 1.
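
The prediction can be checked by hand from the glm2 coefficients. For class.code 2 and the baseline sex.code 1, the linear predictor is the intercept plus the class.code2 coefficient, and the log link is inverted with exp(). A minimal sketch (variable names are illustrative):

```r
# Coefficients read from summary(glm2)
intercept <- 3.8611
b_class2  <- 0.1029

eta  <- intercept + b_class2   # linear predictor; sex.code 1 is the baseline
pred <- exp(eta)               # inverse of the log link

print(round(pred, 2))
```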
