CS1 R Summary Sheets
Functions
x!	factorial(x)
nCx	choose(n,x)
Γ(x)	gamma(x)
e^x	exp(x)
ln x	log(x)
√x	sqrt(x)
x^n	x^n
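A quick sanity check of these functions at the console (standard values):

```r
factorial(4)    # 4! = 24
choose(5, 2)    # 5C2 = 10
gamma(5)        # Gamma(5) = 4! = 24
log(exp(3))     # log is the natural log, so this returns 3
sqrt(16)        # 4
2^10            # 1024
```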
Statistics
Given a data set stored in a vector x:
n length(x)
Σx	sum(x)
Σx²	sum(x^2)
x̄	mean(x)
median	median(x)
Σ(x − x̄)²	sum((x-mean(x))^2)
sd sd(x)
variance var(x)
Q1 quantile(x,0.25)
Q3 quantile(x,0.75)
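For example, with a small made-up data set:

```r
x <- c(2, 4, 4, 5, 7, 9)
length(x)                   # n = 6
sum(x)                      # 31
mean(x)                     # 31/6
median(x)                   # 4.5
var(x)                      # sample variance (divides by n - 1)
sum((x - mean(x))^2) / (length(x) - 1)   # same as var(x)
quantile(x, 0.25)           # lower quartile
```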
Logical Tests
not equal !=
equal ==
greater than >
greater than or equal >=
or |
and &
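These tests are mainly used to subset vectors, eg:

```r
x <- c(1, 5, 8, 3, 10)
x > 4                 # FALSE TRUE TRUE FALSE TRUE
x[x > 4 & x < 10]     # 5 8
sum(x >= 5)           # counts how many elements are >= 5, here 3
x[x == 3 | x == 10]   # 3 10
```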
Bar Chart
table(<object>)
barplot(<heights>,names=<x values>)
Histogram
hist(<object>)
Additional options:
prob=TRUE histogram of probabilities (or probability densities) instead of frequencies
breaks= specifies where you want the breaks between the bars to be.
useful for discrete data so bars are centred on values
eg breaks=-0.5:5.5 gives breaks at -0.5,0.5,1.5,…,4.5,5.5
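For example, with simulated discrete data (illustrative only), centring the bars on the values 0,1,…,5 and showing probabilities; the returned object holds the breaks and bar heights:

```r
set.seed(1)
x <- rbinom(100, 5, 0.4)                 # made-up discrete data on 0..5
h <- hist(x, breaks = -0.5:5.5, prob = TRUE)
h$breaks        # -0.5 0.5 1.5 2.5 3.5 4.5 5.5
sum(h$density)  # 1, since each bar has width 1
```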
Graphs
plot(x,y) or plot(<object>)
By default will plot points (ie a scattergraph); otherwise specify the type, eg type="l" lines, type="b" both points and lines, type="h" vertical lines, type="s" steps, type="n" no plotting.
Additional options:
main="…" title of graph
xlab="…" x-axis label
ylab="…" y-axis label
pch= plot character (ie what character we use for the point)
lines(x,y) adds line to existing plot, has same options for line types, widths and colours.
points(x,y) adds points to existing plot, has same options for points character and colours.
Plot y=a+bx	abline(a,b) adds a line with intercept a and slope b to an existing plot.
QQ Plots
qqnorm(x) plots quantiles of sample (on vertical) against normal (on horizontal)
qqline(x) adds a diagonal line to qqnorm to show “true” position
qqplot(sim,x) plots quantiles of sample (on vertical) against simulations of theoretical distribution
(on horizontal), eg sim <- rgamma(1000,3,2)
abline(0,1) adds the correct diagonal line to a qqplot
Overview
All of these functions have the form:
<letter><distribution>(<x>, <parameters>)
where:
d<distribution>(<x>, <parameters>) gives the value of the PDF/PF at x, ie f(x) or P(X = x)
p<distribution>(<x>, <parameters>) gives the CDF F(x) = P(X ≤ x)
option: lower=FALSE to give P(X > x)
q<distribution>(<p>, <parameters>) gives the smallest value of x such that F(x) = P(X ≤ x) = p or greater
option: lower=FALSE to give the largest value of x such that P(X > x) = p
r<distribution>(<n>, <parameters>) gives n random simulations from the distribution
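For example, with the binomial distribution:

```r
dbinom(2, size = 10, prob = 0.5)    # P(X = 2)
pbinom(2, 10, 0.5)                  # P(X <= 2)
pbinom(2, 10, 0.5, lower = FALSE)   # P(X > 2)
qbinom(0.5, 10, 0.5)                # smallest x with F(x) >= 0.5, here 5
rbinom(3, 10, 0.5)                  # 3 simulated values
```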
Discrete Distributions
The names and parameters for discrete distributions are:
Binomial binom(…, <n>,<p>)
Poisson pois(…, <mu>)
Type 2 geometric geom(…, <p>)
Type 2 Negative Binomial nbinom(…, <k>,<p>)
Hypergeometric
hyper(…,<success in pop>, <failure in pop>,<sample size>)
Continuous Distributions
The names and parameters for continuous distributions are:
Exponential exp(…, <lambda>)
Gamma gamma(…, <alpha>,<lambda>)
Chi square chisq(…, <dof>)
Uniform unif(…, <min>,<max>)
Beta beta(…, <alpha>,<beta>)
Normal norm(…, <mean>,<sd>)
Log normal lnorm(…, <mu>,<sigma>)
t t(…, <dof>)
F f(…, <dof1>,<dof2>)
X̄ ∼ N(μ, σ²/n) = N(5, 10/20) = N(5, 0.5)
For this particular seed value we get a mean of 4.9899 and variance of 0.5091171.
We can plot a histogram (with height up to 0.6) of the probabilities and superimpose (in red) the empirical
density function:
hist(xbar,ylim=c(0,0.6),prob=TRUE)
lines(density(xbar),col="red")
qqnorm(xbar)
qqline(xbar)
We can calculate probabilities, say P( X ≤ 4) , empirically and using the Central Limit Theorem:
length(xbar[xbar<=4])/length(xbar)
pnorm(4,5,sqrt(0.5))
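The xbar vector above is assumed to hold simulated sample means. A sketch of how it could be generated (the sample size 20, the N(5, 10) population and the seed are assumptions here, chosen so that the variance of the mean is 10/20 = 0.5):

```r
set.seed(53)   # hypothetical seed
xbar <- replicate(1000, mean(rnorm(20, mean = 5, sd = sqrt(10))))
mean(xbar)     # close to 5
var(xbar)      # close to 10/20 = 0.5
# Empirical P(Xbar <= 4) vs the CLT approximation:
length(xbar[xbar <= 4]) / length(xbar)
pnorm(4, 5, sqrt(0.5))
```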
Interpreting the qqnorm plot:
#Close to normal in the middle and fairly good in the upper tail
#However, the 'banana shape' indicates skewness
#Sample quantiles are above the line in both tails - they need to be lower to match the normal
#Seriously lighter lower tail - values are not as low as they should be
#Slightly heavier upper tail - values are higher than expected
#A lighter lower tail and heavier upper tail indicate positive skew
E(X | λ) = λ and var(X | λ) = λ
Hence:
x <- rep(0,10000)
set.seed(24)
for (i in 1:10000)
{lambda <- rgamma(1,5,2)
x[i] <- rpois(1,lambda)}
mean(x)
var(x)
For this particular seed value we get a mean of 2.4955 and variance of 3.733353. These are close to the theoretical values E(X) = E(λ) = 5/2 = 2.5 and var(X) = E(λ) + var(λ) = 5/2 + 5/4 = 3.75.
λ | X ∼ gamma(5 + Σxᵢ, 2 + n), so the posterior mean is (5 + Σxᵢ)/(2 + n).
pm <- rep(0,10000)
set.seed(24)
for (i in 1:10000)
{lambda <- rgamma(1,5,2)
x <- rpois(10,lambda)
pm[i] <- (5+sum(x))/(2+10)}
mean(pm)
λ | X ∼ gamma(5 + Σxᵢ, 2 + n)
The posterior mean can be written in credibility form:
(5 + Σxᵢ)/(2 + n) = n/(2 + n) × (Σxᵢ/n) + 2/(2 + n) × 5/2 = Z x̄ + (1 − Z) × 5/2
where:
Z = n/(2 + n)
cp <- rep(0,10000)
set.seed(24)
for (i in 1:10000)
{lambda <- rgamma(1,5,2)
x <- rpois(10,lambda)
Z <- 10/(2+10)
cp[i] <- Z*mean(x)+(1-Z)*(5/2)}
mean(cp)
Note: This will be the same as the posterior mean as they are algebraically equivalent.
Hence, for this particular seed value we also get a mean of 2.508583.
Number of years
n<-ncol(data)
Estimates of E[m(θ)], E[s²(θ)], var[m(θ)]
m <-mean(rowMeans(data))
s <-mean(apply(data,1,var))
v<-var(rowMeans(data))-s/n
Credibility factor
Z<-n/(n+s/v)
Credibility Premiums
Z* rowMeans(data)+(1-Z)*m
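A worked sketch of these steps on a small made-up data set (3 risks over 4 years; all numbers illustrative):

```r
data <- matrix(c(10, 12, 11, 13,
                 20, 22, 21, 23,
                 30, 28, 29, 33), nrow = 3, byrow = TRUE)
n <- ncol(data)
m <- mean(rowMeans(data))          # estimate of E[m(theta)]
s <- mean(apply(data, 1, var))     # estimate of E[s^2(theta)]
v <- var(rowMeans(data)) - s/n     # estimate of var[m(theta)]
Z <- n/(n + s/v)                   # credibility factor
Z * rowMeans(data) + (1 - Z) * m   # credibility premiums, one per risk
```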
Number of years/risks
n<-ncol(data)
N<-nrow(data)
X <- data/volume
Xibar<-rowSums(data)/rowSums(volume)
Pi <-rowSums(volume)
P <-sum(Pi)
Pstar <-sum(Pi*(1-Pi/P))/(N*n-1)
Estimates of E[m(θ)], E[s²(θ)], var[m(θ)]
m<-sum(data)/P
s <-mean(rowSums(volume*(X-Xibar)^2)/(n-1))
v<-(sum(rowSums(volume*(X-m)^2))/(n*N-1)-s)/Pstar
Credibility factor
Zi<-Pi/(Pi+s/v)
Credibility Premiums
Zi*Xibar+(1-Zi)*m
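A worked sketch with made-up claims and volumes (3 risks over 4 years):

```r
data <- matrix(c(10, 13,  9, 11,            # claims (illustrative)
                 40, 36, 44, 41,
                 45, 50, 41, 46), nrow = 3, byrow = TRUE)
volume <- matrix(c(100, 110,  90, 100,      # risk volumes (illustrative)
                   200, 190, 210, 200,
                   150, 160, 140, 150), nrow = 3, byrow = TRUE)
n <- ncol(data); N <- nrow(data)
X <- data/volume
Xibar <- rowSums(data)/rowSums(volume)
Pi <- rowSums(volume); P <- sum(Pi)
Pstar <- sum(Pi*(1 - Pi/P))/(N*n - 1)
m <- sum(data)/P
s <- mean(rowSums(volume*(X - Xibar)^2)/(n - 1))
v <- (sum(rowSums(volume*(X - m)^2))/(n*N - 1) - s)/Pstar
Zi <- Pi/(Pi + s/v)
Zi*Xibar + (1 - Zi)*m     # credibility premiums per unit of risk volume
```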
Overview
All confidence interval/test functions boil down to the same general form:
<test function>(<data>, <null param>, alternative=…, conf.level=…)
where:
<data> can be a vector or data frame, for 2 samples you’ll need 2 vectors
alternative is alternative hypothesis, choose from default "two.sided", or "less", "greater"
conf.level is size of the confidence interval, default = 95%
data is an additional option to specify the data frame where variable names in the formula come from (if
it’s not attached), eg t.test(weight,data=chickwts)
<null param> is mu for mean, ratio for 2 variances, p for binomial and r for Poisson.
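For example, a one-sample t-test on made-up data:

```r
x <- c(4.8, 5.1, 5.4, 4.9, 5.3, 5.0)   # illustrative sample
res <- t.test(x, mu = 5, alternative = "two.sided", conf.level = 0.95)
res$conf.int    # 95% confidence interval for the mean
res$p.value     # p-value for H0: mu = 5
```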
Poisson
CI poisson.test(x, T=1,alternative="two.sided",conf.level=0.95)
Test poisson.test(x,T=1,r=1, alternative="two.sided",conf.level=0.95)
x is number of events
T is time base for events that occurred, default =1
r is the value of the rate (lambda) in the null hypothesis, default = 1
Paired t-test:
t.test(<after>,<before>,mu=0,alt="two.sided",conf=0.95,paired=TRUE)
Binomial
CI prop.test(<successes>,<trials>, conf.level=0.95,correct=TRUE)
x is number of successes (or a vector of successes and trials)
n is number of trials (not needed if included in x)
or matrix of successes and failures (not trials)
CI poisson.test(x,T=1, conf.level=0.95)
x is vector of events
T is vector of time base for events that occurred, default =1
If any expected frequency is < 5 then simulate the p-value rather than use the chi-square approximation:
chisq.test(f,p=exptd,simulate.p.value=TRUE)
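For example, a goodness-of-fit test on made-up frequencies:

```r
f <- c(18, 32, 50)           # observed frequencies (illustrative)
exptd <- c(0.2, 0.3, 0.5)    # null-hypothesis probabilities
chisq.test(f, p = exptd)
# With small expected frequencies, simulate the p-value instead:
res <- chisq.test(f, p = exptd, simulate.p.value = TRUE)
res$p.value
```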
Contingency Table
fisher.test(obs)
For a sample x of size n from a normal distribution, the bootstrap empirical distribution of X̄ is bm:
bm <- rep(0,1000)
set.seed(17)
for(i in 1:1000)
{bm[i]<-mean(rnorm(n,mean=mean(x),sd=sd(x)))}
Or using replicate:
set.seed(17)
bm <- replicate(1000,mean(rnorm(n,mean=mean(x),sd=sd(x))))
95% CI quantile(bm,c(0.025,0.975))
Test – either use equivalence with CIs, compare with critical values or calculate p-value using length.
For a sample x of size n from an unknown distribution, the bootstrap empirical distribution of X̄ is bm:
bm <- rep(0,1000)
set.seed(17)
for(i in 1:1000)
{bm[i]<-mean(sample(x,replace=TRUE))}
Or using replicate:
set.seed(17)
bm <- replicate(1000,mean(sample(x,replace=TRUE)))
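Usage with a concrete made-up sample, finishing with the percentile confidence interval:

```r
x <- c(12, 15, 9, 20, 14, 11, 17, 13)   # illustrative sample
set.seed(17)
bm <- replicate(1000, mean(sample(x, replace = TRUE)))
quantile(bm, c(0.025, 0.975))           # 95% bootstrap CI for the mean
```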
n1 <- length(x1)
index<-1:length(results)
p<-combn(index,n1)
n<-ncol(p)
dif<-rep(0,n)
for (i in 1:n)
{dif[i]<-mean(results[p[,i]])-mean(results[-p[,i]])}
Find p-value:
length(dif[dif>=ObsT])/length(dif)
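A complete run on tiny made-up samples, small enough that all choose(6,3) = 20 group allocations can be enumerated:

```r
x1 <- c(5, 6, 7); x2 <- c(1, 2, 3)       # hypothetical samples
results <- c(x1, x2)
ObsT <- mean(x1) - mean(x2)              # observed test statistic, 4
index <- 1:length(results)
n1 <- length(x1)
p <- combn(index, n1)                    # all 20 possible allocations
n <- ncol(p)
dif <- rep(0, n)
for (i in 1:n)
  {dif[i] <- mean(results[p[,i]]) - mean(results[-p[,i]])}
length(dif[dif >= ObsT])/length(dif)     # one-sided p-value, here 1/20 = 0.05
```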
n1 <- length(x1)
dif<-rep(0,10000)
set.seed(123)
for (i in 1:10000)
{p<-sample(1:length(results),n1)
dif[i]<-mean(results[p])-mean(results[-p])}
Find p-value:
length(dif[dif>=ObsT])/length(dif)
nD <- length(D)
library(gtools)
sign<-c(-1,1)
p<-permutations(2,nD,sign,repeats.allowed=TRUE)
n<-nrow(p)
dif<-rep(0,n)
for (i in 1:n)
{dif[i]<-mean(D*p[i,])}
Find p-value:
length(dif[dif>=ObsD])/length(dif)
nD <- length(D)
dif<-rep(0,10000)
set.seed(123)
for (i in 1:10000)
{p<-sample(c(-1,1),nD,replace=TRUE)
dif[i]<-mean(D*p)}
Find p-value:
length(dif[dif>=ObsD])/length(dif)
Prepare data:
New principal components (efficient orthogonal co-ord syst) based on old components (rotate axes):
P %*% t(W)
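A sketch with prcomp, assuming P denotes the scores (data in the new coordinate system) and W the loadings (prcomp's rotation matrix); the data here is made up:

```r
set.seed(7)
X <- matrix(rnorm(50), ncol = 2)   # illustrative data: 25 obs, 2 variables
pr <- prcomp(X)
W <- pr$rotation                   # columns = principal component loadings
P <- pr$x                          # scores in the new coordinate system
# Scores are the centred data rotated onto the loadings:
max(abs(P - sweep(X, 2, colMeans(X)) %*% W))         # ~0
# Rotating back with t(W) recovers the centred data:
max(abs(sweep(X, 2, colMeans(X)) - P %*% t(W)))      # ~0
```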
plot(<x>,<y>) or plot(<dataframe>)
plot(<dataframe>) or pairs(<dataframe>)
Correlation
cor(<x>,<y>,method="pearson") or cor(<dataframe>,method="pearson")
Correlation Test
For spearman, if < 50 pairs uses the exact distribution, otherwise uses N(0, 1/(n−1)) with a continuity correction.
For kendall, if < 50 pairs uses the exact distribution, otherwise uses N(0, 2(2n+5)/(9n(n−1))) with a continuity correction.
Can specify always exact using option exact=TRUE
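For example, on made-up pairs:

```r
x <- c(1, 3, 5, 7, 9); y <- c(2, 1, 6, 8, 7)   # illustrative pairs
cor(x, y, method = "spearman")                 # rank correlation, here 0.8
res <- cor.test(x, y, method = "kendall", exact = TRUE)
res$estimate    # Kendall's tau, here 0.6
res$p.value
```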
plot(<x>,<y>) or plot(<dataframe>)
lm(<formula>)
All the functions below assume the linear regression model is stored in <model>.
abline(<model>)
Results can be extracted using: $resid , $coef, $df, $sigma, $r.squared, $fstatistic
However coef and df extract more from summary(model) than they do from model.
Use parm to specify which parameters (default = all) to create a confidence interval for.
Specify by number (1=alpha, 2=beta) or by name (eg “claim”) or skip argument and use indexing.
Fitted values
fitted(<model>) or <model>$fitted
points(<x>,fitted(<model>))
Use newdata to give a data frame of explanatory variables to be used to predict response variables.
eg newdata <- data.frame(<x>=30)
Use interval set to "confidence" for the mean response and "prediction" for an individual response.
Predict without newdata specified gives the fitted values (ie it uses existing explanatory variables).
resid(<model>) or <model>$resid
plot(<model>,1)	residuals vs fitted values
plot(<model>,2)	Q-Q plot of residuals
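A worked sketch on made-up data (variable names are illustrative):

```r
claims <- data.frame(x = c(1, 2, 3, 4, 5),
                     y = c(5.1, 6.9, 9.2, 10.8, 13.1))   # illustrative data
model <- lm(y ~ x, data = claims)
coef(model)                       # intercept and slope
confint(model)                    # 95% CIs for both parameters
predict(model, newdata = data.frame(x = 30),
        interval = "prediction")  # individual response at x = 30
summary(model)$r.squared
```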
plot(<dataframe>) or pairs(<dataframe>)
lm(<formula>)
All the functions below assume the linear regression model is stored in <model>.
Results can be extracted using: $resid , $coef, $df, $sigma, $adj.r.sq, $fstatistic
However coef and df extract more from summary(model) than they do from model.
Use parm to specify which parameters (default = all) to create a confidence interval for.
Specify by number (1=alpha, 2=beta) or by name (eg “claim”) or skip argument and use indexing.
Fitted values
fitted(<model>) or <model>$fitted
Use newdata to give a data frame of explanatory variables to be used to predict response variables.
Use interval set to "confidence" for the mean response and "prediction" for an individual response.
Predict without newdata specified gives the fitted values (ie it uses existing explanatory variables).
resid(<model>) or <model>$resid
Updating models
update(<model>,.~.+<new variable>)
Comparing models
Compare adjusted R2
Check all parameters significant
anova(<model1>,<model2>,test="F")
Family and (default) canonical link function and other available link functions:
gaussian	(identity); also log, inverse
binomial	(logit); also probit, cauchit, log, cloglog
poisson	(log); also identity, sqrt
Gamma	(inverse); also identity, log
GLM analysis
Results can be extracted using: $coef, $fitted, $resid, $df.resid, $deviance, $aic
Or by using the functions: coef(<model>), fitted(<model>), resid(<model>)
Results can be extracted using: $resid , $coef, $df, $sigma, $r.squared, $fstatistic
However coef and df extract more from summary(model) than they do from model.
Use parm to specify which parameters (default = all) to create a confidence interval for.
Specify by number (1=alpha, 2=beta) or by name (eg “claim”).
Fitted values
fitted(<model>) or <model>$fitted
Predicted values
predict(<model>,type="response")
Use type set to "link" (default, linear predictor scale) or "response" (original scale).
Residuals
resid(<model>,type="deviance")
Use type set to "response" for raw residuals, "deviance" (default) or "pearson".
Updating models
update(<model>,.~.+<new variable>)
Comparing models
Compare AIC
Check all parameters significant
anova(<model1>,<model2>,test="F")
or anova(<model1>,<model2>,test="Chisq")
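A worked GLM sketch on made-up count data, pulling these pieces together:

```r
dat <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                  y = c(2, 3, 5, 8, 12, 19))      # illustrative claim counts
model <- glm(y ~ x, data = dat, family = poisson) # canonical log link
coef(model)
fitted(model)                        # on the response (count) scale
resid(model, type = "pearson")
model2 <- update(model, . ~ . - x)   # drop x: the null model
anova(model2, model, test = "Chisq")
AIC(model) < AIC(model2)             # TRUE if including x improves the fit
```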