CS1 R Summary Sheets

Uploaded by

Pranav Sharma

Basic R commands Summary

Functions
$x!$              factorial(x)
$\binom{n}{x}$    choose(n,x)
$\Gamma(x)$       gamma(x)

$e^x$             exp(x)
$\ln x$           log(x)
$\sqrt{x}$        sqrt(x)
$x^n$             x^n

Statistics
Given a data set stored in a vector x:

$n$                        length(x)

$\sum x$                   sum(x)

$\sum x^2$                 sum(x^2)

$\bar{x}$                  mean(x)
median                     median(x)

$\sum (x - \bar{x})^2$     sum((x-mean(x))^2)

sd sd(x)
variance var(x)

Q1 quantile(x,0.25)
Q3 quantile(x,0.75)
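
As a quick check of how these summaries relate, the sketch below uses a small made-up vector (the values are illustrative only, not from the course notes) to confirm that var(x) is the sum of squared deviations divided by n − 1:

```r
# Made-up data for illustration only
x <- c(1, 2, 3, 4, 5)

n   <- length(x)                 # sample size
sxx <- sum((x - mean(x))^2)      # sum of squared deviations from the mean

# var() and sd() use the n - 1 divisor (sample variance)
all.equal(var(x), sxx / (n - 1))
all.equal(sd(x), sqrt(var(x)))

# quantile() interpolates between order statistics by default
quantile(x, c(0.25, 0.75))
```

Note that quantile() has several interpolation rules (the type argument), so hand calculations may differ slightly from its default.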

Logical Tests
not equal !=
equal ==
greater than >
greater than or equal >=

or |
and &

The Actuarial Education Company © IFE: 2019 Examinations


Tables and graphs
Frequency table (suitable for discrete data)

table(<object>)

Bar chart (suitable for discrete data)

barplot(<heights>,names=<x values>)

Example: bar chart of Bin(20,0.3) probabilities:


x <- 0:20
barplot(dbinom(x,20,0.3),names=x)

Histogram

hist(<object>)

Additional options:
prob=TRUE histogram of probabilities (or probability densities) instead of frequencies
breaks= specifies where you want the breaks between the bars to be.
useful for discrete data so bars are centred on values
eg breaks=-0.5:5.5 gives breaks at -0.5,0.5,1.5,…,4.5,5.5

Graphs

plot(x,y) or plot(<object>)

By default will plot points (ie a scattergraph), otherwise need to specify the type:

type="p" points (default option)


type="l" line
type="o" overplot (draw line with points on top)
type="s" step graph (suitable for discrete CDF)
type="n" no plot (suitable if want to set up the axes and then add lots of graphs to it)

Example: PDF of N(10,4), ie mean 10 and standard deviation 2:

x <- seq(5,15,by=0.01)
plot(x,dnorm(x,10,2),type="l")

Additional options:
main="…" title of graph
xlab="…" x-axis label
ylab="…" y-axis label

xlim= x-axis limits, eg xlim=c(0,20)


ylim= y-axis limits, eg ylim=c(-5,5)

pch= plot character (ie what character we use for the point)

lty= line type, eg 1=solid, 2=dashed, 3=dotted, etc


lwd= line width, default=1, higher number is thicker
col="…" colour, eg col="red"



Additional graphs on same diagram

lines(x,y) adds line to existing plot, has same options for line types, widths and colours.
points(x,y) adds points to existing plot, has same options for points character and colours.

Plot y=a+bx

abline(a,b) plots straight line with intercept a and gradient b
            has same options for line types, widths and colours
abline(model) if a linear model is inserted, plots the regression line
abline(h=…) plots a horizontal line at …
abline(v=…) plots a vertical line at …

QQ Plots

qqnorm(x) plots quantiles of sample (on vertical) against normal (on horizontal)
qqline(x) adds a reference line to qqnorm (through the lower and upper quartiles) to show the "true" position

qqplot(sim,x) plots quantiles of sample (on vertical) against simulations of theoretical distribution (on horizontal), eg sim <- rgamma(1000,3,2)
abline(0,1) adds the correct diagonal line to a qqplot



Ch 1 Distributions

Overview
All of these functions have the form:
<letter><distribution>(<x>, <parameters>)
where:
d<distribution>(x, <parameters>) gives the value of the PDF/PF at x, ie f(x) or P(X = x)
p<distribution>(x, <parameters>) gives the CDF F(x) = P(X ≤ x)
    option: lower.tail=FALSE (can be abbreviated to lower=FALSE) to give P(X > x)
q<distribution>(p, <parameters>) gives the smallest value of x such that F(x) = P(X ≤ x) ≥ p
    option: lower.tail=FALSE to give the smallest value of x such that P(X > x) ≤ p
r<distribution>(n, <parameters>) gives n random simulations from the distribution
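
To make the four prefixes concrete, here is a minimal sketch using X ~ Bin(10, 0.5). The choice of distribution and the seed are arbitrary assumptions for illustration:

```r
# Illustration with X ~ Bin(10, 0.5); the distribution is an arbitrary choice
dbinom(3, 10, 0.5)                      # P(X = 3)
pbinom(3, 10, 0.5)                      # P(X <= 3)
pbinom(3, 10, 0.5, lower.tail = FALSE)  # P(X > 3)
qbinom(0.5, 10, 0.5)                    # smallest x with P(X <= x) >= 0.5

set.seed(1)                             # seed chosen arbitrarily
sims <- rbinom(5, 10, 0.5)              # 5 simulated values from Bin(10, 0.5)
sims
```

The same pattern applies to every distribution listed in this chapter: only the distribution name and its parameters change.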

Discrete Distributions
The names and parameters for discrete distributions are:
Binomial binom(…, <n>,<p>)
Poisson pois(…, <mu>)
Type 2 geometric geom(…, <p>)
Type 2 Negative Binomial nbinom(…, <k>,<p>)
Hypergeometric hyper(…, <success in pop>,<failure in pop>,<sample size>)

Continuous Distributions
The names and parameters for continuous distributions are:
Exponential exp(…, <lambda>)
Gamma gamma(…, <alpha>,<lambda>)
Chi square chisq(…, <dof>)
Uniform unif(…, <min>,<max>)
Beta beta(…, <alpha>,<beta>)
Normal norm(…, <mean>,<sd>)
Log normal lnorm(…, <mu>,<sigma>)
t t(…, <dof>)
F f(…, <dof1>,<dof2>)



Ch5 The Central Limit Theorem
Suppose we have a sample of size n = 10 from a Poisson distribution with mu = 5, so that $\mu = 5$ and $\sigma^2 = 5$. Under the Central Limit Theorem, approximately:

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) = N\left(5, \frac{5}{10}\right) = N(5, 0.5)$

We can generate an empirical distribution of the sample means as follows:

xbar <- rep(0,1000)
set.seed(24)
for (i in 1:1000)
{xbar[i] <- mean(rpois(10,5))}

For this particular seed value we get a mean of 4.9899 and variance of 0.5091171.

We can plot a histogram (with height up to 0.6) of the probabilities and superimpose (in red) the empirical
density function:

hist(xbar,ylim=c(0,0.6),prob=TRUE)
lines(density(xbar),col="red")

We can superimpose (in blue) the PDF of the N(5,0.5) distribution given by the Central Limit Theorem:

xvals <- seq(2.5,7.5,0.01)
lines(xvals,dnorm(xvals,5,sqrt(0.5)),col="blue")

We can compare with a normal distribution by drawing a QQ plot:

qqnorm(xbar)
qqline(xbar)

We can calculate probabilities, say $P(\bar{X} \le 4)$, empirically and using the Central Limit Theorem:

length(xbar[xbar<=4])/length(xbar)
pnorm(4,5,sqrt(0.5))

In this case, we get 0.099 and 0.0786496.

Interpreting the qqnorm plot:
#Close to normal in the middle and fairly good in the upper tail
#However, the 'banana shape' indicates skewness
#The sample quantiles lie above the line in both tails - they would need to be lower to match the normal
#Noticeably lighter lower tail - the values are not as low as they should be
#Slightly heavier upper tail - the values are higher than expected
#A lighter lower tail and a heavier upper tail indicate positive skew



Ch4 Conditional to unconditional moments
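Suppose $\lambda \sim \text{Gamma}(5,2)$ and $X|\lambda \sim \text{Poi}(\lambda)$ then:

$E(X|\lambda) = \lambda$ and $\text{var}(X|\lambda) = \lambda$

Hence:

$E(X) = E[E(X|\lambda)] = E[\lambda] = \frac{5}{2} = 2.5$

$\text{var}(X) = E[\text{var}(X|\lambda)] + \text{var}[E(X|\lambda)] = E[\lambda] + \text{var}[\lambda] = \frac{5}{2} + \frac{5}{2^2} = 3.75$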

R code for the above:

x <- rep(0,10000)
set.seed(24)
for (i in 1:10000)
{lambda <- rgamma(1,5,2)
x[i] <- rpois(1,lambda)}

Now we can find the unconditional mean and variance:

mean(x)

var(x)

For this particular seed value we get a mean of 2.4955 and variance of 3.733353.
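
The same simulation can also be sketched without a loop, since rpois() accepts a vector of rates. This is an illustrative variant rather than the course's code; because the random numbers are drawn in a different order, the results will differ slightly from the figures quoted above even with the same seed:

```r
# Vectorised alternative to the loop (illustrative variant)
set.seed(24)
lambda <- rgamma(10000, 5, 2)      # 10,000 draws of lambda ~ Gamma(5, 2)
x <- rpois(10000, lambda)          # one Poisson draw for each simulated lambda

mean(x)   # should be close to E(X) = 2.5
var(x)    # should be close to var(X) = 3.75
```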



Ch13 Bayesian estimation
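Suppose $\lambda \sim \text{Gamma}(5,2)$ and we have a sample of size n where $X|\lambda \sim \text{Poi}(\lambda)$ then:

$\lambda | X \sim \text{Gamma}\left(5 + \sum x_i, 2 + n\right)$

The Bayesian estimate under quadratic loss is the posterior mean:

$\frac{5 + \sum x_i}{2 + n}$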

R code for the above with sample size 10:

pm <- rep(0,10000)
set.seed(24)
for (i in 1:10000)
{lambda <- rgamma(1,5,2)
x <- rpois(10,lambda)
pm[i] <- (5+sum(x))/(2+10)}

The average of the Bayesian estimates is:

mean(pm)

For this particular seed value we get a mean of 2.508583.
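
For a single observed sample, the posterior can be worked with directly. The sketch below assumes a made-up sample of 10 claim counts; the 95% equal-tailed interval from qgamma is an extra illustration that is not part of the notes above:

```r
# Hypothetical observed claim counts (made up for illustration)
x <- c(3, 1, 4, 2, 0, 2, 3, 1, 2, 2)
n <- length(x)

# Prior lambda ~ Gamma(5, 2) with Poisson data gives
# posterior lambda | x ~ Gamma(5 + sum(x), 2 + n)
alpha.post <- 5 + sum(x)
beta.post  <- 2 + n

# Bayesian estimate under quadratic loss = posterior mean
post.mean <- alpha.post / beta.post

# 95% equal-tailed interval for lambda from the posterior distribution
cred.int <- qgamma(c(0.025, 0.975), alpha.post, beta.post)

post.mean
cred.int
```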



Ch14 Credibility Theory
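Suppose $\lambda \sim \text{Gamma}(5,2)$ and we have a sample of size n where $X|\lambda \sim \text{Poi}(\lambda)$ then:

$\lambda | X \sim \text{Gamma}\left(5 + \sum x_i, 2 + n\right)$

The Bayesian estimate under quadratic loss is:

$\frac{5 + \sum x_i}{2 + n}$

This can be expressed in the form of a credibility estimate:

$\frac{5 + \sum x_i}{2 + n} = \frac{n}{2 + n} \times \frac{\sum x_i}{n} + \frac{2}{2 + n} \times \frac{5}{2}$

where:

$Z = \frac{n}{2 + n}$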

R code for the above with sample size 10:

cp <- rep(0,10000)
set.seed(24)
for (i in 1:10000)
{lambda <- rgamma(1,5,2)
x <- rpois(10,lambda)
Z <- 10/(2+10)
cp[i] <- Z*mean(x)+(1-Z)*(5/2)}

The average of these credibility premiums is:

mean(cp)

Note: This will be the same as the posterior mean as they are algebraically equivalent.

Hence, for this particular seed value we also get a mean of 2.508583.
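
The algebraic equivalence can also be checked on a single sample. This is a minimal sketch assuming a made-up vector of 10 claim counts:

```r
# Hypothetical sample (made up for illustration)
x <- c(3, 1, 4, 2, 0, 2, 3, 1, 2, 2)
n <- length(x)

Z <- n / (2 + n)                               # credibility factor
cred.prem <- Z * mean(x) + (1 - Z) * (5 / 2)   # credibility form
post.mean <- (5 + sum(x)) / (2 + n)            # posterior mean form

all.equal(cred.prem, post.mean)                # the two forms agree
```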



EBCT Model 1
Given a data frame, data, consisting of rows of risks and columns of years.

Number of years

n<-ncol(data)

Estimates of $E[m(\theta)]$, $E[s^2(\theta)]$, $\text{var}[m(\theta)]$

m <-mean(rowMeans(data))
s <-mean(apply(data,1,var))
v<-var(rowMeans(data))-s/n

Credibility factor

Z<-n/(n+s/v)

Credibility Premiums

Z* rowMeans(data)+(1-Z)*m
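
A small worked run of the steps above, assuming a made-up data frame of 3 risks observed over 4 years (the figures and column names are purely illustrative):

```r
# Hypothetical claims data: rows are risks, columns are years
data <- data.frame(
  y1 = c(10, 20, 30),
  y2 = c(12, 22, 28),
  y3 = c(11, 19, 31),
  y4 = c(13, 23, 29)
)

n <- ncol(data)                          # number of years

m <- mean(rowMeans(data))                # estimate of E[m(theta)]
s <- mean(apply(data, 1, var))           # estimate of E[s^2(theta)]
v <- var(rowMeans(data)) - s / n         # estimate of var[m(theta)]

Z <- n / (n + s / v)                     # credibility factor (same for every risk)
prem <- Z * rowMeans(data) + (1 - Z) * m # credibility premium for each risk

Z
prem
```

In this example the risks differ far more between rows than within rows, so Z comes out close to 1 and each premium is close to that risk's own average.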



EBCT Model 2
Given a data frame, data, of claims and a second data frame, volume, of the corresponding numbers of policies (both with rows of risks and columns of years).

Number of years/risks

n<-ncol(data)
N<-nrow(data)

Claims per policy $X_{ij}$ and averages $\bar{X}_i$

X <- data/volume
Xibar<-rowSums(data)/rowSums(volume)

Policy totals, Pi , P and P*

Pi <-rowSums(volume)
P <-sum(Pi)
Pstar <-sum(Pi*(1-Pi/P))/(N*n-1)

Estimates of $E[m(\theta)]$, $E[s^2(\theta)]$, $\text{var}[m(\theta)]$

m<-sum(data)/P
s <-mean(rowSums(volume*(X-Xibar)^2)/(n-1))
v<-(sum(rowSums(volume*(X-m)^2))/(n*N-1)-s)/Pstar

Credibility factor

Zi<-Pi/(Pi+s/v)

Credibility Premiums per policy

Zi*Xibar+(1-Zi)*m

Credibility Premiums

Given a vector of volumes for the coming year, new.volume:

cred.prem <- Zi*Xibar+(1-Zi)*m
cred.prem*new.volume



Ch8 & 9 Inference Summary

Overview
All confidence interval/test functions boil down to the following:

<name>.test(<data>, alternative="two.sided", conf.level=0.95)

<data> can be a vector or data frame, for 2 samples you'll need 2 vectors
alternative is alternative hypothesis, choose from default "two.sided", or "less", "greater"
conf.level is size of the confidence interval, default = 95%
data is an additional option to specify the data frame where variable names in a formula come from (if it's not attached); it only applies when the first argument is a formula, eg t.test(extra~group,data=sleep)

Tests additionally require the null hypothesis parameter value to be specified:

<name>.test(<data>, <null param>=…, alternative="two.sided", conf.level=0.95)

<null param> is mu for mean, ratio for 2 variances, p for binomial and r for Poisson.

The following results can be extracted from the functions:

$method        type of test
$statistic     statistic (# of successes for binomial, # of events for Poisson)
$parameter     dof (# of trials for binomial, time base for Poisson)
$p.value       p-value
$alternative   alternative hypothesis
$null.value    H0 value
$conf.int      confidence intervals
$estimate      MLE
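
As an illustration of extracting these components, the sketch below stores the result of a one-sample t test on a made-up vector (the data and the null value mu = 5 are assumptions for the example):

```r
# Made-up sample for illustration
x <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)

# Store the test result so that its components can be extracted
res <- t.test(x, mu = 5, alternative = "two.sided", conf.level = 0.95)

res$statistic   # observed t statistic
res$parameter   # degrees of freedom (n - 1)
res$p.value     # p-value
res$conf.int    # 95% confidence interval for the mean
res$estimate    # sample mean
```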

One sample confidence intervals/tests


Mean (known variance) No function – do it from 1st principles.
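
A first-principles calculation for the known-variance case might look like the following sketch. The sample, the known standard deviation sigma and the null value mu0 are all hypothetical:

```r
# Made-up sample; sigma and mu0 are hypothetical values for illustration
x     <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
sigma <- 0.5    # population standard deviation, assumed known
mu0   <- 5      # value of the mean under the null hypothesis

n  <- length(x)
se <- sigma / sqrt(n)                          # standard error of the mean

# 95% confidence interval for the mean using the normal distribution
ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se

# Two-sided z test of H0: mu = mu0
z       <- (mean(x) - mu0) / se
p.value <- 2 * pnorm(abs(z), lower.tail = FALSE)

ci
z
p.value
```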

Mean (unknown variance)

CI t.test(x, alternative="two.sided", conf.level=0.95)


Test t.test(<data>,mu=0,alternative="two.sided",conf.level=0.95)

mu is value of mu in null hypothesis, default =0

Variance No function – do it from 1st principles.
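
For the variance, a first-principles sketch based on the chi-square distribution of (n − 1)S²/σ² for a normal sample is shown below. The data and the null value sigma0.sq are made up for illustration:

```r
# Made-up sample; sigma0.sq is a hypothetical null value for the variance
x         <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
sigma0.sq <- 0.1

n <- length(x)

# 95% confidence interval for the variance:
# ((n-1)s^2 / upper chi-square point, (n-1)s^2 / lower chi-square point)
ci <- (n - 1) * var(x) / qchisq(c(0.975, 0.025), n - 1)

# One-sided test of H0: sigma^2 = sigma0.sq against H1: sigma^2 > sigma0.sq
statistic <- (n - 1) * var(x) / sigma0.sq
p.value   <- pchisq(statistic, n - 1, lower.tail = FALSE)

ci
statistic
p.value
```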

Binomial

CI binom.test(<x>,<n>, alternative="two.sided", conf.level=0.95)
Test binom.test(x,n,p=0.5, alternative="two.sided",conf.level=0.95)

x is number of successes (or a vector of length 2 giving the numbers of successes and failures)
n is number of trials (not needed if x is such a vector)
p is the value of p in the null hypothesis, default = 0.5



Poisson

CI poisson.test(x, T=1,alternative="two.sided",conf.level=0.95)
Test poisson.test(x,T=1,r=1, alternative="two.sided",conf.level=0.95)

x is number of events
T is time base for events that occurred, default =1
r is the value of the rate (lambda) in the null hypothesis, default = 1

Two sample confidence intervals/tests


Mean (known variance) No function – do it from 1st principles

Means (unknown variance)

CI t.test(<data1>,<data2>, alt="two.sided", var.equal=FALSE, conf=0.95)
Test t.test(<data1>,<data2>,mu=0,alt="two.sided",var.equal=FALSE,conf=0.95)

mu is value of μ1 − μ2 in the null hypothesis, default = 0
var.equal is the option for whether we assume the variances are equal, default = FALSE

Means (paired data)

t.test(<after>,<before>,mu=0,alt="two.sided",conf=0.95,paired=TRUE)

Variances ratio var(<data1>)/var(<data2>)

CI var.test(<data1>,<data2>, alt="two.sided", conf=0.95)


Test var.test(<data1>,<data2>,ratio=1,alt="two.sided",conf=0.95)

ratio is value of ratio var(<data1>)/var(<data2>) in null hypothesis, default = 1

Binomial

CI prop.test(<successes>,<trials>, conf.level=0.95,correct=TRUE)
x is a vector of the numbers of successes in each sample, or a matrix of successes and failures (not trials)
n is a vector of the numbers of trials (not needed if x is a matrix)

Test prop.test(x,n,p=NULL, alternative="two.sided",conf.level=0.95,correct=TRUE)

With the default p=NULL, the null hypothesis is that the proportions are equal, ie p1 − p2 = 0
(actually does chi square 2x2 contingency table test with Yates continuity correction)
(equiv to Core Reading normal approx if continuity correction ignored)

Poisson RATIO (not difference)

CI poisson.test(x,T=1, conf.level=0.95)
x is vector of events
T is vector of time base for events that occurred, default =1

Test poisson.test(x,T=1,r=1, alt="two.sided",conf.level=0.95)

r is value of the ratio of the lambdas in the null hypothesis, default = 1



Chi-squared goodness of fit tests
Goodness of fit

Given expected probabilities:

chisq.test(<obs freq>, p=<exptd probabilities>)

Default exptd probabilities are uniform
Note: naming the argument p= is important, otherwise R treats the second vector as y and carries out a contingency table test on the 2 vectors

Given expected frequencies:

chisq.test(<obs freq>, p=<exptd freq>,rescale.p=TRUE)

If any exptd freq < 5 then simulate the p-value rather than use the chi-square approximation:

chisq.test(f,p=exptd,simulate.p.value=TRUE)

From 1st principles:

statistic <- sum((obs-exptd)^2/exptd)
statistic
pchisq(statistic, <dof>,lower.tail=FALSE)
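
The first-principles calculation can be compared with chisq.test on a small example. The observed frequencies below are made up, and the null hypothesis is assumed to be uniform with no parameters estimated (so dof = number of categories − 1):

```r
# Made-up observed frequencies; null hypothesis is a uniform distribution
obs   <- c(18, 22, 20, 25, 15)
exptd <- rep(sum(obs) / length(obs), length(obs))   # expected frequencies

# From first principles
statistic <- sum((obs - exptd)^2 / exptd)
dof       <- length(obs) - 1          # no parameters estimated from the data
p.value   <- pchisq(statistic, dof, lower.tail = FALSE)

# Using the built-in function with expected probabilities
res <- chisq.test(obs, p = rep(1 / 5, 5))

all.equal(unname(res$statistic), statistic)
all.equal(res$p.value, p.value)
```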

The following results can be extracted from the functions:

$method type of test


$statistic statistic
$parameter dof
$p.value p-value
$observed observed counts
$expected exptd counts under null hyp
$residuals residuals - Pearson residuals = (obs-exptd)/sqrt(exptd)

Contingency Table

Given matrix of observed frequencies:

chisq.test(<obs freq matrix>)

Note for a 2×2 it applies Yates continuity correction by default, to remove:

chisq.test(<obs freq matrix>,correct=FALSE)

Fisher's exact test

Given matrix of observed frequencies, obs:

fisher.test(obs)



Bootstrap Inference
Parametric bootstrap (mean)

For a sample x of size n from a normal distribution, the bootstrap empirical distribution of the sample mean $\bar{X}$ is bm:

bm <- rep(0,1000)
set.seed(17)
for(i in 1:1000)
{bm[i]<-mean(rnorm(n,mean=mean(x),sd=sd(x)))}

Or using replicate:

set.seed(17)
bm <- replicate(1000,mean(rnorm(n,mean=mean(x),sd=sd(x))))

95% CI quantile(bm,c(0.025,0.975))
Test – either use equivalence with CIs, compare with critical values or calculate p-value using length.

Non-parametric bootstrap (mean)

For a sample x of size n from an unknown distribution, the bootstrap empirical distribution of the sample mean $\bar{X}$ is bm:

bm <- rep(0,1000)
set.seed(17)
for(i in 1:1000)
{bm[i]<-mean(sample(x,replace=TRUE))}

Or using replicate:

set.seed(17)
bm <- replicate(1000,mean(sample(x,replace=TRUE)))



Non-parametric permutation tests
Test two means are equal – all permutations

For samples x1 and x2 of sizes n1 and n2:

n1 <- length(x1)
results <- c(x1,x2)
index <- 1:length(results)
#each column of p is one way of choosing the n1 values allocated to sample 1
p<-combn(index,n1)
n<-ncol(p)
dif<-rep(0,n)
for (i in 1:n)
{dif[i]<-mean(results[p[,i]])-mean(results[-p[,i]])}

Find p-value (one-sided, for the alternative that the first mean is greater):

ObsT <- mean(x1)-mean(x2)
length(dif[dif>=ObsT])/length(dif)
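
A self-contained run of the all-permutations test on two small made-up samples is sketched below. It uses apply instead of the loop, and the two-sided p-value is an extra illustration not given in the notes above:

```r
# Made-up samples for illustration
x1 <- c(12, 15, 14, 17)
x2 <- c(10, 11, 13)

n1      <- length(x1)
results <- c(x1, x2)

# Each column of p holds the indices allocated to sample 1
p <- combn(length(results), n1)

# Difference in means for every possible allocation
dif <- apply(p, 2, function(idx) mean(results[idx]) - mean(results[-idx]))

ObsT <- mean(x1) - mean(x2)

p.one <- mean(dif >= ObsT)             # one-sided: first mean greater
p.two <- mean(abs(dif) >= abs(ObsT))   # two-sided: means differ

p.one
p.two
```

Using mean() on a logical vector gives the proportion of TRUE values, which is equivalent to the length(...)/length(dif) calculation.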

Test two means are equal – resampling

For samples x1 and x2 of sizes n1 and n2:

n1 <- length(x1)
results <- c(x1,x2)
index <- 1:length(results)

Repeatedly resample from our combined vector of values:

dif<-rep(0,10000)
set.seed(123)
for (i in 1:10000)
{p<-sample(index, n1, replace=FALSE)
dif[i]<-mean(results[p])-mean(results[-p])}

Find p-value (one-sided, for the alternative that the first mean is greater):

ObsT <- mean(x1)-mean(x2)
length(dif[dif>=ObsT])/length(dif)



Non-parametric permutation tests
Paired test of two means – all permutations

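For a sample of differences D of size nD:

nD <- length(D)
sign <- c(-1,1)
library(gtools)
#each row of p is one of the 2^nD possible assignments of signs to the differences
p<-permutations(2,nD,sign,repeats.allowed=TRUE)
n<-nrow(p)
dif<-rep(0,n)
for (i in 1:n)
{dif[i]<-mean(D*p[i,])}

Find p-value (one-sided, for the alternative that the mean difference is positive):

ObsD <- mean(D)
length(dif[dif>=ObsD])/length(dif)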

Paired test of two means - resampling

For a sample of differences D of size nD:

nD <- length(D)
sign <- c(-1,1)

Repeatedly resample from the vector of signs:

dif<-rep(0,10000)
set.seed(123)
for (i in 1:10000)
{p<-sample(sign, nD, replace=TRUE)
dif[i]<-mean(D*p)}

Find p-value (one-sided, for the alternative that the mean difference is positive):

ObsD <- mean(D)
length(dif[dif>=ObsD])/length(dif)



PCA Summary
From scratch                               Using prcomp function

Prepare data

Given a data frame DF:

data <- as.matrix(DF)
X <- scale(data, scale=FALSE)              pca <- prcomp(DF)

Eigenvectors of (scaled) variance-covariance matrix of X

New principal components (efficient orthogonal co-ordinate system) based on old components (rotate axes):

W <- eigen(t(X) %*% X)$vectors             pca$rotation

Principal components decomposition P of X

Co-ordinates of points in new PC co-ordinate system:

P <- X %*% W                               pca$x

(Scaled) variance-covariance matrix of P

S <- t(P) %*% P                            pca$sdev
diag(S)                                    or summary(pca)

Here "scaled" means multiplied by n − 1: t(X) %*% X is n − 1 times the sample variance-covariance matrix, so diag(S)/(nrow(X)-1) equals pca$sdev^2. The signs of eigenvectors are arbitrary, so columns of W and pca$rotation may differ by a factor of −1.

Reconstruct original (centred) data X

P %*% t(W)
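
The correspondence between the from-scratch route and prcomp can be checked numerically. The sketch below uses the built-in USArrests data set purely as an example; absolute values are compared because the sign of each eigenvector is arbitrary:

```r
# Built-in data set, used only as an example
DF <- USArrests

# From scratch
X <- scale(as.matrix(DF), scale = FALSE)   # centre each column
W <- eigen(t(X) %*% X)$vectors             # eigenvectors (new axes)
P <- X %*% W                               # co-ordinates in the new system
S <- t(P) %*% P                            # (n-1) x variance-covariance of P

# Using prcomp
pca <- prcomp(DF)

# Compare the two approaches (up to sign)
all.equal(abs(W), abs(pca$rotation), check.attributes = FALSE)
all.equal(abs(P), abs(pca$x), check.attributes = FALSE)
all.equal(diag(S) / (nrow(X) - 1), pca$sdev^2)

# Reconstructing the centred data from all components
all.equal(P %*% t(W), X, check.attributes = FALSE)
```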

Deciding which components to remove

• consider the variances – keep the components that together explain 99%/95%/90% of the variation
• use a scree plot – keep the components before the plot levels off (the "elbow")
plot(pca,type="line",main="scree plot")
• use Kaiser criterion – keep components whose variance > 1 if the data are scaled
pca<-prcomp(DF,scale=TRUE)
summary(pca)



All study material produced by ActEd is copyright and is sold for the exclusive use of the purchaser. The copyright is owned by Institute and Faculty Education Limited, a subsidiary of the Institute and Faculty of Actuaries.

Unless prior authority is granted by ActEd, you may not hire out, lend, give out, sell, store or transmit electronically or photocopy any part of the study material.

You must take care of your study material to ensure that it is not used or copied by anybody else.

Legal action will be taken if these terms are infringed. In addition, we may seek to take disciplinary action through the profession or through your employer.

These conditions remain in force after you have finished using the course.



Correlation Summary
Scattergraph

For bivariate data use:

plot(<x>,<y>) or plot(<dataframe>)

Options include: main, xlim, ylim, xlab, ylab, pch, col.

For multivariate data use:

plot(<dataframe>) or pairs(<dataframe>)

Options for pairs include: labels=c("…","…",…), panel=panel.smooth

Correlation

cor(<x>,<y>,method="pearson") or cor(<dataframe>,method="pearson")

method can be: "pearson" (default), "spearman" or "kendall"

Correlation Test

cor.test(<x>,<y>,method="pearson", alt="two.sided", conf=0.95)

Tests whether the correlation coefficient is zero
alternative is alternative hypothesis, choose from default "two.sided", or "less", "greater"
conf.level is size of the confidence interval, default = 95%

For spearman if <50 pairs uses exact, otherwise uses $N\left(0, \frac{1}{n-1}\right)$ with continuity correction
For kendall if <50 pairs uses exact, otherwise uses $N\left(0, \frac{2(2n+5)}{9n(n-1)}\right)$ with continuity correction
Can specify always exact using option exact=TRUE



Linear Regression Summary
Scattergraph

plot(<x>,<y>) or plot(<dataframe>)

Options include: main, xlim, ylim, xlab, ylab, pch, col.

Linear regression model

lm(<formula>)

Y~1 null model ($Y = \alpha$)
Y~X linear model ($Y = \alpha + \beta x$)
Y~X-1 linear model through origin ($Y = \beta x$)

Options include: data, subset, na.action.


Use data to specify the data frame where variable names in the formula come from (if it’s not attached)
Use subset to choose a subset of the data (using logical commands) to carry out the regression on.

All the functions below assume the linear regression model is stored in <model>.

Add regression line (to existing plot)

abline(<model>)

Options include: lty, lwd, col.

Linear regression model

<model> displays fitted coefficients

Results can be extracted using: $coef, $fitted, $resid and $df.
Or by using the functions: coef(<model>), fitted(<model>), resid(<model>)

summary(<model>) displays a summary of the residuals, the coefficient table (estimates, standard errors, t statistics and p-values), the residual standard error, R², adjusted R² and the F statistic.

Results can be extracted using: $resid, $coef, $df, $sigma, $r.squared, $fstatistic
However coef and df extract more from summary(model) than they do from model.



Confidence intervals for coefficients

confint(<model>, parm=<parameters>, level=0.95)

Use parm to specify which parameters (default = all) to create a confidence interval for.
Specify by number (1=alpha, 2=beta) or by name (eg "claim") or skip argument and use indexing.

Sum of squares (ANOVA)

anova(<model>) displays the ANOVA table (Df, Sum Sq, Mean Sq, F value and Pr(>F)) for the regression and the residuals.

Extract indiv results using indexing, eg anova(<model>)[1,2].

Fitted values

fitted(<model>) or <model>$fitted
points(<x>,fitted(<model>))

Mean and individual response

predict(<model>, newdata, interval, level=0.95)

Use newdata to give a data frame of explanatory variables to be used to predict response variables.
eg newdata <- data.frame(<x>=30)
Use interval set to "confidence" for mean response and "prediction" (can be abbreviated to "predict") for individual response.
Predict without newdata specified gives the fitted values (ie it uses existing explanatory variables).
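
A minimal sketch of predicting the mean and individual response, assuming made-up data and a hypothetical new value x = 30. Note that the column name in newdata must match the explanatory variable used in the formula:

```r
# Made-up data for illustration
x <- c(10, 20, 30, 40, 50)
y <- c(12, 25, 29, 41, 52)

model <- lm(y ~ x)

# New explanatory value; the column name must match the formula variable
newdata <- data.frame(x = 30)

# Interval for the mean response at x = 30
conf.resp <- predict(model, newdata, interval = "confidence", level = 0.95)

# Interval for an individual response at x = 30 (wider than the above)
ind.resp <- predict(model, newdata, interval = "prediction", level = 0.95)

conf.resp
ind.resp
```

Both calls return a matrix with columns fit, lwr and upr; the fitted value is the same but the individual response interval is wider.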

Residuals and checking the fit

resid(<model>) or <model>$resid

plot(<model>,1) residuals vs fitted values
plot(<model>,2) Q-Q plot of residuals



Multiple Linear Regression Summary
Scattergraph

plot(<dataframe>) or pairs(<dataframe>)

Options for pairs include: labels=c("…","…",…), panel=panel.smooth

Linear regression model

lm(<formula>)

Y~X1+…+Xk linear model ($Y = \alpha + \beta_1 x_1 + \dots + \beta_k x_k$)
Y~X1+X2+X1:X2 linear model with interaction ($Y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \gamma_{12} x_1 x_2$)
Y~X1*X2 same model as above (X1*X2 is shorthand for X1+X2+X1:X2)
Y~I(X^2) linear model ($Y = \alpha + \beta x^2$)
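
The formula shorthand can be checked on simulated data. In this sketch the variables, coefficients and seed are all made up for illustration:

```r
# Simulated data for illustration (seed and coefficients are arbitrary)
set.seed(1)
X1 <- runif(20)
X2 <- runif(20)
Y  <- 1 + 2 * X1 - X2 + 0.5 * X1 * X2 + rnorm(20, sd = 0.1)

# Main effects plus interaction, written in full and using the * shorthand
m1 <- lm(Y ~ X1 + X2 + X1:X2)
m2 <- lm(Y ~ X1 * X2)
all.equal(coef(m1), coef(m2))    # identical fitted coefficients

# I() protects ^ so that it means arithmetic squaring inside a formula
m3 <- lm(Y ~ I(X1^2))
coef(m3)
```

Without I(), the ^ operator in a formula is interpreted as crossing terms rather than raising to a power.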

All the functions below assume the linear regression model is stored in <model>.

Multiple Linear regression model

<model> displays fitted coefficients

Results can be extracted using: $coef, $fitted, $resid and $df.
Or by using the functions: coef(<model>), fitted(<model>), resid(<model>)

summary(<model>) displays a summary of the residuals, the coefficient table (estimates, standard errors, t statistics and p-values), the residual standard error, R², adjusted R² and the F statistic.

Results can be extracted using: $resid, $coef, $df, $sigma, $adj.r.sq, $fstatistic
However coef and df extract more from summary(model) than they do from model.



Confidence intervals for coefficients

confint(<model>, parm=<parameters>, level=0.95)

Use parm to specify which parameters (default = all) to create a confidence interval for.
Specify by number (1=alpha, 2=beta) or by name (eg "claim") or skip argument and use indexing.

Sum of squares (ANOVA)

anova(<model>) displays the ANOVA table (Df, Sum Sq, Mean Sq, F value and Pr(>F)):

Splits SSREG between the explanatory variables
First row gives results for model of just that variable
Subsequent rows give results if that variable is added to the previous lines
Overall F test in summary
Extract indiv results using indexing, eg anova(<model>)[1,2].

Fitted values

fitted(<model>) or <model>$fitted

Mean and individual response

predict(<model>, newdata, interval, level=0.95)

Use newdata to give a data frame of explanatory variables to be used to predict response variables.
Use interval set to "confidence" for mean response and "prediction" (can be abbreviated to "predict") for individual response.
Predict without newdata specified gives the fitted values (ie it uses existing explanatory variables).

Residuals and checking the fit

resid(<model>) or <model>$resid

plot(<model>,1) residuals vs fitted values
plot(<model>,2) Q-Q plot of residuals

Updating models

update(<model>,.~.+<new variable>)

Comparing models

Compare adjusted R2
Check all parameters significant
anova(<model1>,<model2>,test="F")



GLMs Summary
Fitting a Generalised Linear Model

glm(<formula>, family = <…>(link="…"))

Family and (default) canonical link function and other available link functions:

gaussian : "identity" "log" "inverse"
binomial : "logit" "log" "probit"
Gamma : "inverse" "log" "identity"
poisson : "log" "sqrt" "identity"

Functions as for multiple linear regression.
Use factor(<x>) to force a numeric variable to be a factor

All the functions below assume the GLM is stored in <model>.

GLM analysis

<model> displays fitted coefficients, dof, deviance, AIC

Results can be extracted using: $coef, $fitted, $resid, $df.resid, $deviance, $aic
Or by using the functions: coef(<model>), fitted(<model>), resid(<model>)

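summary(<model>) displays the coefficient table (estimates, standard errors, test statistics and p-values), the dispersion parameter, the null and residual deviances with their dof, and the AIC.

Results can be extracted using: $coef, $df, $dispersion, $deviance, $null.deviance, $aic
However coef and df extract more from summary(model) than they do from model.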



Confidence intervals for coefficients

confint(<model>, parm=<parameters>, level=0.95)

Use parm to specify which parameters (default = all) to create a confidence interval for.
Specify by number (1=alpha, 2=beta) or by name (eg "claim").

Analysis of deviance (ANOVA)

anova(<model>) displays the analysis of deviance table (Df, Deviance, Resid. Df and Resid. Dev), with terms added sequentially.

Extract indiv results using indexing, eg anova(<model>)[1,2].

Fitted values

fitted(<model>) or <model>$fitted

Predicted values

predict(<model>, newdata, type="link")

Gives values of the linear predictor by default.
Use type set to "response" to give the values on the scale of the response variable.
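
The difference between the two prediction types can be seen in a small Poisson example. The data and the new value x = 7 below are made up; with a log link the response-scale prediction is the exponential of the linear predictor:

```r
# Made-up count data for illustration
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 3, 6, 7, 12, 15)

# Poisson GLM with the canonical log link
model <- glm(y ~ x, family = poisson(link = "log"))

newdata <- data.frame(x = 7)

eta <- predict(model, newdata, type = "link")       # linear predictor
mu  <- predict(model, newdata, type = "response")   # predicted mean

# With a log link, the mean is exp(linear predictor)
all.equal(exp(eta), mu)
```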

Residuals and checking the fit

resid(<model>,type="deviance")

Use type set to "response" for raw residuals, "deviance" (default) or "pearson"

plot(<model>,1) residuals vs fitted values
plot(<model>,2) Q-Q plot of residuals

Updating models

update(<model>,.~.+<new variable>)

Comparing models

Compare AIC
Check all parameters significant
anova(<model1>,<model2>,test="F")
or anova(<model1>,<model2>,test="Chisq")
