CS1B R Programming

The document provides a guide for installing R and RStudio, essential tools for statistical programming in the actuarial profession. It includes step-by-step instructions for downloading and setting up both software, as well as an overview of basic operations and functionalities within R and RStudio. Additionally, it emphasizes the importance of practice and offers tips for using the command line effectively.


R1: Getting started Page 1

Getting started
Covered in R1

• Installing R and RStudio
• Working with R and RStudio

The Actuarial Education Company © IFE: 2019 Examinations


1 Installing R
R is an open‐source programming language that is increasingly being used in the actuarial
profession with many applications in the world of statistics. If you haven’t used R before then
your first job is probably to install it on your computer or device. This section gives you a rough
guide on how to download and install R, although you may find that the process is slightly
different depending on the device you are using. You may also need to ask someone with
administrative privileges for help (eg your IT Service Desk) if you are installing R on a work
computer.

If you encounter any problems, then please do not contact ActEd for help. You will probably find
a solution much more quickly if you search the internet. Many people have published installation
guides as well as numerous problem solving tips in discussion forums.

But here’s our guide:

1. Visit http://www.stats.bris.ac.uk/R/. (If you are IT savvy then you can probably find the
relevant file to download and install R without following the instructions below.)

2. Click on “Download R for Windows” (assuming you are using a Windows‐based system).

3. Click on “base”.

4. Click on “Download R 3.5.1 for Windows” or whatever the latest version is.

5. Next you will probably need to “Save File”.

6. Find the downloaded file (eg R‐3.3.0‐win.exe) where it was saved and double click on it.

7. You may then have a security warning which you’ll need to dismiss with “Run”.

8. You can quickly progress through the next four pop‐ups with one click of OK and three
clicks of Next. However, please note the information in the third popup about
administrator rights.

9. Choose where you want to install R (or use the default location) and click Next. If you are
using a work computer, it may be easiest to use a folder in your personal area, for example in
My Documents.

10. Select the installation type appropriate for your device.

11. Click No and then Next to select the defaults.

12. A couple more clicks of Next (after changing the options if you wish) and then Finish and
you’re done.

13. You should then be able to run R using the desktop icon or via the Start menu.

We will look at how to use R later.

2 Installing RStudio
We will have a very brief look at working directly in R in the next section, but most of the time we
will instead be working in RStudio. This is a more user‐friendly interface which you will probably
find easier to use. So you now need to install this as well:

1. Visit https://www.rstudio.com/products/rstudio/download/#download and download the
relevant open source RStudio Desktop for your operating system.

2. Run the .exe file and follow the installation instructions. (You may need to ask your IT
service desk for help if you need administrator privileges.)

3. If you run into any difficulties then we recommend that you browse the internet for help.

3 Working directly in R
Although we will be working most of the time in RStudio, it might be useful to have a quick look at
R itself.

When you load up R you’ll be greeted with R’s graphical user interface (or GUI for short).

Inside this you’ll see one open window which is called the R console.

Everything interesting happens in this window.

You can change the display preferences by Edit/GUI preferences, where you can change features
like the font, size and style (normal, bold, italic) of the text:

Clearing the screen


We can clear the console window either by pressing CTRL+L or by choosing Clear console from
the Edit menu.

Entering commands
Unlike modern mouse driven programs, R is a command based programming language.

So rather than choosing options from menus or clicking on icons we’ll be typing commands into
the console window that tell R what we want it to do.

We’ll then execute those commands by pressing enter.

R will return the results of our instruction in this console window.

For example if we type 2+3 and then press enter we’ll get
5, as shown.

In this introduction, we will write the command you will enter in red and the results of executing
that command in blue.

So for the above we would have written:

2+3

The trouble with R is going to be remembering the names of the commands, which is made
harder by the fact that R is case sensitive…

We’ll cover the essential commands later.

Graphics window

If we produce any graphics then they will appear in a separate window to the console, called the
graphics window.

To see for yourself type:

demo(graphics)

You’ll need to hit enter each time to move onto the next graphic.

Use the standard windows icons to maximise, minimise or close the graphics window.

Script window
Rather than entering commands directly in the console window we can use another window
called the script window (or script editor).

You can open this window first by clicking in the Console (to get the right menus at the top of the
GUI), and then choosing New script from the File menu:

A script works just like the script for a play or movie, which contains the lines that you read
out. It holds lines of commands which can be “read out” (put into the console window) either
using copy and paste, or more quickly by clicking on the line and pressing CTRL+R. We’ll talk
about scripts more later on.

Other interfaces
This is the basic graphical user interface (GUI) for R. As already mentioned, other packages are
also available which offer user-friendly features, for example RStudio, which we will be using
shortly. Another example is R Commander, a GUI which expands the menus to include standard
commands such as importing data, producing graphs, carrying out tests and fitting models to
the data set.

Quitting R
To end your session in R you could type the command quit( ) or just q( ) in the console.

Alternatively choose exit from the File menu or just click on the close window icon in the corner:

R will then ask you if you want to save the workspace image:

We’ll talk about workspaces more later. Suffice to say, if you have created any objects (that is,
important things assigned a special name) that you want next time then you may wish to click on
Yes. If you haven’t done anything you wish to save, then just click No.

4 Working in RStudio
Now you have had a quick look at R, it won’t take long to become familiar with the basics of
RStudio.

Start/run RStudio and you will probably see something like this:

The panel on the left hand side is simply R’s Console, which we have already met. The panel at
the top on the right has a number of tabs. The first is the Environment (or Workspace) which will
prove very useful as it displays the values of variables and contents of datasets that we are using.
The second tab, History, not surprisingly displays a history of your work in R. We won’t worry
about the third tab for now.

The panel at the bottom on the right also has a number of useful tabs. One displays recently used
files, allowing you to access them quickly. Another, Plots, is simply the graphics window and will
display the plots/graphs that you ask R to produce. There are also important tabs called Packages
and Help which we will look at later.

If you open up a Script in RStudio, using File, New File, R Script, or by clicking on
the drop-down arrow of the New File button and then R Script (found in the top left-hand
corner of the screen), then RStudio will display all four panels neatly arranged.

You can drag the edges of the panels to size them as you wish:

Commands
Just like in R, to clear the Console press Ctrl L from within the prompt of the Console.

To run lines of code from the Script window, press Ctrl Enter (and not Ctrl R). Alternatively use
the Run button on the top bar of the Script window.

Quitting RStudio
You can exit RStudio in the same way you exit R, or just press Ctrl Q.

5 Summary

Key terms
GUI               Graphical User Interface. The name given to the appearance of the
                  R program on your computer.

Console window    The window where commands are entered and then executed by hitting
                  the enter key.

Graphics window   The window where graphics are displayed. You can then export these to
                  put in any documentation you produce.

Script window     A window where commands can be written but not executed. We can
                  transfer them to the console window and execute them using CTRL+R.
                  This will be covered in a later chapter.

Menus
R                      RStudio                             Action

File, Exit             File, Quit Session                  Exit/quit the R program

Edit/Clear console     Edit/Clear console                  Clear the console window

Edit/GUI preferences   Tools, Global Options, Appearance   Change the font size, type, etc

Key commands
R               RStudio                   Action

Ctrl L          Ctrl L                    Clear the console

quit() or q()   quit() or q() or Ctrl Q   Quit the program

Ctrl R          Ctrl Enter                Used in a Script window to run a line or
                                          selected lines of code

6 Have a go
You will only get proficient at R by practising.

Try the following in R or RStudio:

• Start a new session
• Clear the console screen
• Use R to calculate 3+5 (or something more daring)
• Quit R using a command, not the menu or windows icons.

All study material produced by ActEd is copyright and is sold
for the exclusive use of the purchaser. The copyright is
owned by Institute and Faculty Education Limited, a
subsidiary of the Institute and Faculty of Actuaries.

Unless prior authority is granted by ActEd, you may not hire
out, lend, give out, sell, store or transmit electronically or
photocopy any part of the study material.

You must take care of your study material to ensure that it
is not used or copied by anybody else.

Legal action will be taken if these terms are infringed. In
addition, we may seek to take disciplinary action through
the profession or through your employer.

These conditions remain in force after you have finished
using the course.

R2: Basic arithmetic Page 1

Basic arithmetic
Covered in R2

• Using R for basic arithmetic
• The index numbering in the R Console
• Scrolling back in the Console
• Non-numeric output

1 Arithmetic operators
Load up RStudio and clear the Console (using Ctrl L from within the Console).

Recall from the previous chapter that R is a command based programming language.

So we have to type in commands rather than clicking on icons or through menus.

In the Console you’ll see a little greater than symbol which is the prompt:

>

We’ll type in a command at the prompt and then press enter to ask R to execute the command.

R will then return the result, or produce the graphical output or send the output to a file or
device.

Throughout this and later chapters we will use red for the commands we enter and blue for the
output/results that we get by executing that command.

Addition
So let’s get R to carry out some simple arithmetic. Type:

2+3

We hit enter to execute this command.

R returns the correct answer as the output of this command:

5

Index numbers

However, note that the output line is preceded by a [1]:

[1] 5

This is an index number of the first answer on that line.

If the output consists of many values over several lines then each new line starts with the index
number of the first element on that line.

So supposing I had five answers, three answers on the 1st line (say 5, 7 and 2) and two on the 2nd
line (say 8 and 3). Then the 1st line would have [1] and the 2nd line would have [4], eg:

[1] 5 7 2

[4] 8 3
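To see index numbers on more than one line for yourself, here is a small sketch. It uses the colon operator, which has not been covered yet and is assumed here only as a quick way of producing the whole numbers from 1 to 30. Exactly where the lines break depends on how wide your Console is.

```r
# Display the whole numbers 1 to 30; the colon operator builds the sequence
1:30
```

Each line of output after the first should start with the index number of its first value, just like the [4] in the example above.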

Subtraction
Similarly we can subtract numbers. Enter the following:

-3--5

Spaces

Note that R ignores spaces so typing the same command with lots of spaces will give the same
answer. Try it again with spaces between the numbers, signs and operator:

- 3 - - 5

Spaces are often useful to help us read complicated commands.

Multiplication and division


Just like Excel and other software, multiplication uses the asterisk and division uses the forward
slash:

3*2-1/4

5.75

We can see that R uses algebraic logic to calculate expressions – that is it calculates multiplication
and division before addition and subtraction.

So to do something different we’ll need to use brackets:

3*(2-1/4)

5.25

Note that RStudio automatically enters the closing bracket for you.
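As a quick check of the order of operations, the sketch below repeats the two calculations above and adds one further case that often catches people out: a minus sign in front of a power. The last two lines are our own illustration rather than part of the original examples.

```r
3 * 2 - 1 / 4     # multiplication and division first: 6 - 0.25 = 5.75
3 * (2 - 1 / 4)   # brackets first: 3 * 1.75 = 5.25
-2^2              # the power is applied before the minus sign: -(2^2) = -4
(-2)^2            # brackets make the minus sign part of the base: 4
```

If in doubt about the order, adding brackets never does any harm.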

Editing previous commands

Suppose you make a mistake, and you actually wanted to calculate 1/5 not 1/4.

Rather than retyping the expression, you can use the up arrow key (↑) to move back to
previous commands and then edit that command and re-execute it.

Try it out yourself. Use ↑ to bring up the previous command then ← to go back into that
command to edit it to read:

3*(2-1/5)

5.4

Alternatively you could use the mouse to copy and paste the expression.

Powers
Powers can be obtained using the ^ key (or even a double asterisk, **):

2^3

2**3
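For reference, here is a short sketch of what these commands return, together with two extra cases (a negative power and a fractional power) that we have added for illustration.

```r
2^3     # 8
2**3    # 8, the double asterisk is treated in the same way as ^
2^-1    # 0.5, a negative power gives the reciprocal
9^0.5   # 3, a power of a half gives the square root
```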

Non‐numeric outputs
Let’s look at some non‐numeric outputs you might see in the future. Try entering the following:

5/0

You’ll get the output:

Inf

which, unsurprisingly, stands for infinity.

However if we try to calculate:

0/0

we get the output:

NaN

which isn’t a misspelling of naan but stands for “Not a Number”, ie it’s undefined.
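The sketch below collects these special values in one place. The functions is.finite() and is.nan() have not been introduced in this chapter; they are included here only as an assumption about how you might check for these values later on.

```r
5/0              # Inf
-5/0             # -Inf, infinity can be negative too
0/0              # NaN
Inf - Inf        # NaN, since this is also undefined
is.finite(5/0)   # FALSE
is.nan(0/0)      # TRUE
```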

Mistakes and correcting them


Finally, what happens if we make a mistake?

Suppose we execute the following incomplete command:

2^

R realises that we have clearly not finished our command and so it helpfully puts a + prompt to
say “and…”.

We can then enter the bit we’ve forgotten.

This + prompt also appears if our expression runs onto the next line. We can use this feature
deliberately to split longer expressions over several lines, making them easier to read.
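Here is a rough sketch of a command split over several lines. It stores the answer under the name total using the assignment arrow <-, which has not been covered yet; both the name and the use of assignment are our own choices for illustration.

```r
# The first two lines end with an operator, so R knows more input is coming
total <- 1 +
  2 +
  3
total   # 6
```

Note that the break needs to come after the operator. If a line already makes sense on its own then R will execute it straight away rather than waiting for more.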

If you make a mistake and want to cancel what you are doing and type something else then just
press the Esc button.

2 Summary

Key terms
Prompt         The icon that appears to the left of each line:
               >  means R awaits the next command
               +  means R is expecting more input for the current command.

Index number   The start of each line of output has an index number, eg [1].
               This gives the position of the first output on that line.

Inf            An output from R which is infinite

NaN            An output from R which is “not a number”, ie undefined

Key commands
+     add

-     subtract

*     multiply

/     divide

^     power

**    power

↑     cycle through previous commands

ESC   stop a command from finishing executing

3 Have a go
You will only get proficient at R by practising.

1. Without referring to the previous chapter, try the following:

– start a new session
– clear the console screen.

2. Use R to calculate the following:

$3 - (-8)$

$-7 + 5$

$3 \times (2 + 5)$

$4^5$

$8 \div 0$

$\frac{1 - 1.08^{-3}}{0.08}$

3. Quit R using an R command, not using the menu or windows icons.

R3: Basic functions Page 1

Basic functions
Covered in R3

• Functions
• Log and exponential
• Square root
• Trigonometric functions
• Factorial and Choose

1 Introduction
In the previous chapter we covered the common arithmetic operators. In this chapter we’re
going to look at functions for simple mathematical operations (such as logs and trig functions)
that we can find on a scientific calculator.

There are literally hundreds of functions included in the standard R program that cover
mathematical operations, statistical analysis, graphing and many other purposes.

We can also get extra functions for specific features (such as time series analysis, twitter data
mining or prettier graphics) by downloading something called packages. We’ll cover how to do
this in a later chapter.

The form of a function


All functions in R operate on the following principle:

<function name> (<value1>, <value2>, … , <option1>, <option2>, …)

arguments

You have the name of the function, followed by brackets. Inside the brackets you put the
arguments of the function in a specified order separated by commas. There are two types of
arguments: the values you’re going to put into the function and then the options. If we omit any
options then R will use the default settings.
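As a small sketch of this pattern, consider the function round(), which is not covered in this chapter and is used here only as an illustration. Its value is the number to be rounded and its option, digits, is the number of decimal places.

```r
round(3.14159, digits = 2)   # value first, then a named option: 3.14
round(3.14159)               # option omitted, so the default of 0 decimal places is used: 3
```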

Note that both brackets are needed even if there are no arguments. For example, in an earlier
chapter, we introduced the command for quitting R which was quit( ) or q( ). We gave it no
arguments, but it still needs both brackets. RStudio will remind you of this by automatically
inserting the closing bracket.

2 Log, exponential and square root


Let’s start with the log function.

The function name is, unsurprisingly, log. The value is, unsurprisingly, the value we wish to find
the log of and there is only one option which is the base of the log.

log(1000, base = 10)

We see the answer is 3 as 10^3 = 1000.

Again note the index number [1] at the beginning of the output line – which, as mentioned
previously, gives the index number of the first answer on that output line.

Recall that R ignores spaces in commands, whereas spaces make life a bit easier for us humans to
read the commands. So we could put spaces after the comma, or around the equals or anywhere
that helps.

We could abbreviate argument names (as long as the abbreviation is unique):

log (1000, b=10)

or (being quite risky for an actuary) we could omit the option names altogether.

Just make sure you’ve got all the arguments in the brackets in the correct order!

log (1000, 10)

Let’s now omit the option altogether.

log(1000)

6.907755

R uses its default option setting which is base e, ie natural log (in this case ln 1000).
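The different ways of calling the log function can be sketched side by side. The functions log10() and log2() are dedicated shortcuts for two common bases; they are not mentioned above and are included here as an extra.

```r
log(1000, base = 10)       # 3
log(base = 10, x = 1000)   # 3, named arguments can be given in any order
log10(1000)                # 3, dedicated base 10 function
log2(8)                    # 3, dedicated base 2 function
log(exp(1))                # 1, the natural log of e
```

Naming the arguments takes a little more typing but removes any risk of putting them in the wrong order.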

Non‐numeric outputs
Let’s just remind ourselves of some unusual outputs that we might get.

log(0)

-Inf

This obviously stands for minus infinity.

log(-5)

NaN

Recall that this stands for “not a number” as the log function is undefined for negative numbers.

Errors in commands
Finally recall that R is case sensitive. So let’s try that command again (you can use the up cursor
key to get this command to reappear). If you type log with a capital L you’ll get this output:

So remembering the names of functions and their arguments is important and we’ll look at some
more ways to do this in the next chapter. However, you have probably already noticed that as
you type in functions in RStudio, it tries to predict what you are trying to do and also displays
information about the function selected. You can scroll down through the suggestions to find the
one you’re looking for.

If the function you are looking for is selected before you have finished typing, you can press enter
to jump to the arguments.
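If you would like to see the effect of case sensitivity without the error stopping a longer piece of code, the sketch below may help. The function tryCatch() and the name msg are not part of this chapter; they are used here only as one possible way of capturing the error message as text.

```r
# Log with a capital L is not a function that R knows about
msg <- tryCatch(Log(1000), error = function(e) conditionMessage(e))
msg        # R reports that it could not find a function called Log

log(1000)  # the correctly spelt function works as before
```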

Exponential
The exponential function is exp(x).

The only argument is the value you’re finding the exponential of. It doesn’t have any options.

exp(-5)

0.006737947

Square root
We covered powers in the previous chapter so we could calculate the square root either by taking
the power of a half:

36^(1/2)

Or we can use the dedicated sqrt function:

sqrt(36)
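Both approaches give the same answer, as the sketch below shows. We have also added two cases of our own: what happens if the brackets around the half are forgotten, and the square root of a negative number.

```r
sqrt(36)    # 6
36^(1/2)    # 6
36^1/2      # 18, because the power is applied before the division
sqrt(-1)    # NaN (with a warning), as the square root of a negative number is undefined here
```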

3 Trigonometric functions
As you’d expect these are given by sin( ), cos( ) and tan ( ).

The only argument is the value you’re inputting and there are no options.

sin(30)

-0.9880316

As you can see the function assumes the input is in radians not degrees
($\text{radians} = \text{degrees} \times \frac{\pi}{180}$).

We can enter π as pi in R:

pi

3.141593

sin(pi/6)

0.5

cos(pi)

-1

tan(pi/2)

1.633178e+16

This should be undefined. Why is it not? Because the value stored for pi is a little limited. There
are ways of getting more accurate values that we won’t cover here. Similarly:

sin(pi)

1.224606e-16

This should be zero but is again hindered by the rounded value of pi.
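Putting these two points together, here is a rough sketch of converting an angle from degrees to radians and of comparing results that are only approximately equal. The names degrees and radians are our own, and all.equal() is a function not covered here that compares numbers to within a small tolerance.

```r
degrees <- 30                   # an angle in degrees
radians <- degrees * pi / 180   # radians = degrees x pi / 180
sin(radians)                    # displays 0.5, the sine of 30 degrees

sin(pi) == 0                    # FALSE, because of the tiny rounding error
isTRUE(all.equal(sin(pi), 0))   # TRUE, equal to within a small tolerance
```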

The inverse trig functions arcsine, arccosine and arctangent are: asin( ), acos( ) and atan( ).

Again there are no options.

acos(1)

Hyperbolic trigonometric functions

Hyperbolic functions are defined in a similar way:

sinh(x) cosh(x) tanh(x)

asinh(x) acosh(x) atanh(x)
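As a brief sketch, the inverse functions return angles in radians, so we need to convert if we want degrees. The particular values below are chosen by us purely for illustration.

```r
acos(1)               # 0
asin(0.5) * 180 / pi  # displays 30, the angle in degrees whose sine is 0.5
atanh(tanh(0.5))      # displays 0.5, the inverse function undoes the original
```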

4 Factorials and combinations


To calculate the factorial we, unsurprisingly, use the function factorial(x). The only argument is
the value we’re finding the factorial of. There are no options:

factorial(6)

720

Let’s find the factorial of a different number. Recall that we can use the up arrow (↑) to cycle
through the previous commands. Use this to bring up the previous factorial command and
change it to calculate 0!

factorial(0)

Finally to calculate combinations, nCk , we use the choose function:

choose(n, k)

This has two arguments which are the values n and k. There are no options. For example:

choose (10,3)

120
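We can check this answer against the usual formula for combinations, n!/(k!(n-k)!), using the factorial function. This is just a sketch to show how the two functions relate.

```r
factorial(10) / (factorial(3) * factorial(7))   # 120, from the formula
choose(10, 3)                                   # 120, using the dedicated function
choose(10, 7)                                   # 120, choosing 7 from 10 is the same as leaving out 3
```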

Errors in commands
What happens if we enter only one of the two values? Say choose(20)?

R tells us helpfully that we’ve missed an argument and there’s no default. Which is a polite way
of telling us we’ve made a mess of things.

5 Summary

Key terms
Function         An R command that has inputs <value1>, <value2>, … and also has
                 options for how it works. It has the following form:

                 <function name> (<value1>, <value2>, … , <option1>, <option2>, …)

Arguments        The values and options of a function

Default option   If an option is omitted in a function it will take the default

Key commands
log(x, base = b)               $\log_b x$ (default base = exp(1), ie it gives $\ln x$)

log(x)                         $\ln x$

exp(x)                         $e^x$

sqrt(x)                        $\sqrt{x}$

sin(x), cos(x), tan(x)         trig functions sine, cosine and tangent, in radians

asin(x), acos(x), atan(x)      inverse trig functions, in radians

sinh(x), cosh(x), tanh(x)      hyperbolic trig functions

asinh(x), acosh(x), atanh(x)   inverse hyperbolic trig functions

factorial(x)                   $x!$

choose(n, k)                   ${}^{n}C_{k}$

6 Have a go
You will only get proficient at R by practising:

1. Without referring to the previous chapter, use R to calculate the following:

3×(2+5)

4‐5

8÷0

2. Without referring to this chapter, use R to calculate the following:

log(100)

ln5

$\log_2 32$

ln(10) what does this output mean?

log0 what does this output mean?

e 3

3  7
4

sin(4 ) , cos( 6 ) , tan( 2 )

$\sin^{-1}(1)$ , $\cos^{-1}(0)$ , $\tan^{-1}(½)$

$\sinh(5)$ , $\cosh^{-1}(3)$

$10!$

${}^{8}C_{4}$

R4: Getting help Page 1

Getting help

Covered in R4

• Generic help with R and RStudio
• Specific help with functions
  – Finding the arguments of a function
  – Finding functions starting with a phrase
  – Finding functions including a phrase
  – Finding help for a function

1 Generic help
In the last chapter we introduced some basic mathematical functions in R. However because R is
a command based programming language – if you don’t know how the commands work you’re
going to have trouble getting R to do what you want.

General help

When you start up R or RStudio it suggests you could type

help.start( )

In RStudio, this will display, in the bottom-right window, a general HTML help page which links
to all the manuals, reference documents and other miscellaneous material:

The most useful general item is “An Introduction to R” which is the starting R manual. However,
the manuals and FAQ documents are clearly intended for experts and so aren’t going to help you
much until you get much more proficient.

RStudio has its own help page which you can access by clicking the home button in the Help
window:

This links to lots of useful information and you might wish to try a few of the links yourself.

In particular, the first link, Learning R Online, recommends lots of free resources that you could
use, instead of this one if you wish, to get to grips with R.

2 More specific help

Finding the arguments of a known function


Supposing I was working with the log function and had forgotten its arguments. Working in
RStudio, we can just type in the function in the Console and it will display some useful
information, including the possible arguments and options on the first line of the yellow box:

(Alternatively use the function args, eg try typing args(log) in the Console.)

The arguments listed are:


• the input value, x (ie the value we want to calculate the log of)
• the option for the base. Note it says base = exp(1). That tells us that the default base is e
  (ie what the base is if we miss out this argument). So just writing log(10) will give us ln(10).
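The sketch below shows args in action. The second line uses formals(), a function not covered in this chapter, as one possible way of pulling out just the argument names.

```r
args(log)                   # displays the function's arguments and any defaults
names(formals(args(log)))   # just the argument names: "x" and "base"
```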

Getting help on a known function


If you want help on a particular function, you can use the help function, or just use a question
mark followed by the function name (without any brackets). For example:

help(log)

or

?log

Either will open up the relevant help page in the help window of RStudio:

The help page gives us lots of information, probably beyond what we were looking for, about the
function.

Another way of reaching the same page is by using the search box in the help window.

Getting examples for known functions


At the bottom of the help page there are some examples:

Rather than reading about them you can run them in R, either by copying and pasting or by using
the example function:

example(<function name>)

However, some of these examples can often be a bit obscure. So if in doubt you may just end up
searching the internet (eg looking on YouTube) to see if someone can explain it more simply.

Finding an unknown function that starts with a given phrase


One of the main issues you’ll face in R is finding the correct command for what you want R to do.
That’s sometimes a little bit trickier to solve.

If you know the function’s name starts with a given phrase, say log, you can type that into the
Console to reveal the list of functions, as mentioned previously:

This will list all the commands that start with log (with a brief explanation of each). In this case
there are 8. You can scroll through the list and hit enter when you find the one you want.

If you’re still not sure which of these it is then you could look up each of them using the help
function as we did above.

Finding an unknown function that contains a given phrase


Suppose you don’t know what the function starts with but you do know that it contains a
particular phrase, say “log”. In which case you use the apropos function.

apropos("<phrase>")

This comes from the French à propos which means literally “to purpose”, ie with reference or with
regard to this purpose.

The only frustration is you have to put the letters that you’re searching for in speech marks:

You’ll see we get all the 8 functions starting with log that we saw earlier and a further 15
functions. (You may see even more depending on versions/packages installed.)
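Here is a short sketch of apropos in use. The phrase is treated as a pattern (a regular expression), so, as an extra that is not covered above, putting ^ at the front restricts the search to names that start with the phrase. The names found and starts are our own.

```r
found <- apropos("log")     # every available name containing "log"
found

starts <- apropos("^log")   # only the names that start with "log" (ignoring case)
starts
```

The exact lists you get will depend on your version of R and on which packages are loaded.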


Finding an unknown function on a particular topic


If none of the approaches above help you find the function you want then you’ll have to do a
more general search using:

help.search("<phrase>") (you need to put the phrase in speech marks)

or

??<phrase> (no need for speech marks)

or type the phrase followed by a question mark in the search box of the help window, which
displays a slightly different set of results:

However, you may just end up searching the internet for “How do I … in R?”.
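
For instance, a general search for anything to do with logarithms might look like the sketch below. The search phrase and the object name results are just illustrative, and we assume here that the matches are returned with a Package column; the results will vary with the packages you have installed.

```r
# Search the help system for a phrase and store the results
results <- help.search("logarithm")

# List the packages in which matching help pages were found
unique(results$matches$Package)
```

Typing results on its own (or running ??logarithm) displays the full list of matching help pages.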

The beginning of each entry in the search results gives the name of the package where that
function can be found:

You can see above that the function log is in the base package – ie the standard package that is
included in R. We’ll talk more about packages later.

Clicking on the link takes us to the help(log) page we saw earlier.

Finally, you may wish to explore the other features that can be found in the Help menu of
RStudio, including a list of keyboard shortcuts, which might help you save time as you become
more proficient in R. For example, Ctrl+2 takes you to the Console.


3 Summary

Key commands
help.start( )          Takes you to the general html help page

args(<function>)       Lists the arguments of <function>

?<function>            Takes you to the html help page for <function>

example(<function>)    Runs the examples for <function> given at the bottom of its html help page

apropos("abc")         Lists the functions that contain the phrase abc in their name

??abc                  Lists all functions that have something to do with abc


4 Have a go
You will only get proficient at R by practising.

1. Without referring to this chapter, use R to:

– find the arguments of the choose function


– find the help page for the choose function
– run the examples for the choose function
– find a list of all functions that start with “choose”
– find a list of all functions which contain the word “choose”
– find a list of all functions that have something to do with “choose”.


Using scripts

Covered in R5

• Using scripts to enter and run commands
• Using comments in a script
• Saving scripts
• Opening scripts
• Editing scripts outside of R
• Sourcing scripts


1 Using scripts to enter commands


In previous chapters we executed commands in R by typing the commands directly into the
console window and then hitting enter (ie return) to execute them.

However, this is not ideal for a number of reasons. It’s harder to spot errors (as commands are
interspersed with the outputs). Commands have to be run line by line, so using the up arrow to
re-run multi-line commands causes issues. It’s also harder to share our work with our colleagues.

A better method is to use something called a Script. A Script is simply a text file with R commands
in it.

Starting a new script


We can start a new script in a number of ways:

• using RStudio’s menu: File, New File, R Script
• using the drop-down menu next to the plus sign underneath the File menu
• pressing Ctrl+Shift+N.

RStudio opens up a new script window in the top‐left corner of the screen. If a script is already
open then it will open up a new tab in the same window so you can have multiple scripts open at
the same time.


Running commands from a script


We can type our commands in the editor, but pressing enter does not execute them. For example,
we could type:

log(10)
sqrt(36)
sin(pi/2)

A script in R is just like a script for a play or movie, which contains the lines that you read out:
it holds the lines of commands which can be “read out” into the console.

We can do this by copying commands line by line or in blocks (using Ctrl+C or Edit/Copy) from the
editor and then pasting them into the console window (using Ctrl+V or Edit/Paste) and then
pressing enter to execute them. If you want to copy everything then first select all using the
standard shortcut command Ctrl+A.

This makes scripts a very efficient way of running the same code again.

However, a quicker way than copying and pasting is to “run” the commands from the script
editor.

To run a single command line in the Script, place the cursor in that line and then do one of:

• press Ctrl+Enter
• click on the Run icon at the top of the Script window
• use the Menu: Code, Run Selected Line(s).

This “reads” the script into the console window and executes it. Note that each line of code, and
its output, will appear in the Console as we run it:


Even more useful is that our cursor moves onto the next line in the script window ready for the
next command. So if we press Ctrl+Enter three times, we will run the three lines of code
displayed in this script:

Similarly, to run several lines in the script we just select at least part of those lines and press
Ctrl+Enter.

Alternatively, there are useful options in the Code menu, as well as some more short‐cut keys
which you may find useful:

Later in this chapter we’ll give another way of running the whole script without even opening it.


2 Using comments
Since we may be coming back to our work some time later or sharing our scripts with colleagues,
it might be helpful to add comments as an audit trail so that we, or our colleagues, can follow
what we are doing.

We enter comments by preceding them with a #. R then ignores anything to the right of the #.

You can put comments on a separate line or after a command at the end of a line.

So let’s put a comment before our three commands in our script and also one on the same line as
a command, after it:

#This script is a quick demo


log(10)
sqrt(36)
sin(pi/2) #NB trig functions are in radians

When we run all of this script we obtain the following:

The comments are “executed” but there is no output, ie R ignores them.

Writing comments as an audit trail is a really good habit to foster – especially for the CP2 exam
which tests audit trails.


3 Saving a script
There are a number of ways of saving a script:

• click on one of the save icons, either in the toolbar or at the top of the Script window
• use the Menu: File, Save or Save As
• press Ctrl+S.

(Note that if you are working in R and not RStudio, and your cursor is in the console window, then
using Ctrl+S or the save icon won’t save the script but something called the workspace, which
we’ll cover in a later chapter.)

If you try to close down a script editor window before trying to save it then you will be prompted
to save it.

However, if you close down RStudio without saving a script then you won’t be given a prompt.
But don’t worry, the script should still be there the next time you open RStudio.

If you don’t change the file location, by default R will save the script in what is called the “working
directory” (ie folder for those of you too young to remember DOS).

We will see how to change the working directory later.

Let’s save the current script as “test” (note where it is being saved).

If you look in the folder where you saved the file – you can see that it has been saved with the
extension “.R”:

This lets R know it’s a script file.


4 Opening a script
There are several ways of opening a script:

• use the menu: File, Open File
• use the shortcut key Ctrl+O
• use the open file icon in the toolbar. If you click on the drop-down arrow you will see a
list of recently used files:


5 Editing a script outside of R


Remember a script is just a text file containing R commands. So we could create such a file
outside of R using a text editor such as Notepad.

Let’s type some commands into Notepad:

And then when we save this file let’s put a “.R” extension on the end of the file name, for
example, “notepad test.R”.

If we try to open a script in RStudio, we can see the file is there:

Clicking on it will open it successfully. Try it out just to be sure.

In fact, R and RStudio can read normal text files (which have the .txt extension) as scripts. So we
didn’t have to put the “.R” extension on the end. If the file had been saved with a .txt extension
then we could just as easily have opened it in the same way using RStudio.

If you are using a word processor such as Word to create scripts then you will need to save the
script as a “.txt” file to strip out all the “invisible” coding.

If you don’t, RStudio will probably just open up the file in the program it was written in.


6 Sourcing a script
We can actually run a whole script without opening it.

Choose Code, Source File from the menu:

Find the test.R file we created earlier (or a different file you might have created) and open it.

(Note that we could again source a text file, ie one that ends in “.txt”.)

Even though it runs, nothing seems to happen in the console – all we have is the “source”
command:

Note the forward slashes in the filename path – more on that later.

By default, R doesn’t display anything unless we explicitly tell it to.

There are two ways of doing this. The first is to use the print command.

print(<object>)

Object is the term used in R to describe a “thing” that we perform commands on. We’ll talk about
objects more in the next chapter.

Let’s load up the “test.R” script and put print( ) around each of the commands as follows:

Now we’ll save this script as before (Ctrl+S). Putting the cursor in the console we’ll source the file
as before using the menu item Code, Source File.

This time something does happen in the console window:


We can see that it has printed the output of the three commands.

The second way to display the outputs from the commands in the source script is to use the
source command:

source("<filename including filepath>", …, echo = …, …)

The input value is the filename which includes the file path, for example C:\Documents\R. (The
easiest way of getting this is to copy it from the address bar in Windows Explorer.)

So let’s try running:

source("C:\R\notepad test.txt")

Unfortunately, we get the following error:

This is because R doesn’t like single backslashes (as they are used for other commands). So either
we need to change each of them to double backslashes (ie C:\\R\\test.R) or to forward (normal)
slashes (ie C:/R/test.R).
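
The sketch below shows the two acceptable ways of writing a path, using the example folder from above. The object names path_double and path_forward are made up for this illustration, and no file is actually opened, so the path does not need to exist on your computer.

```r
# A single backslash starts a special sequence inside an R string,
# so Windows paths need doubled backslashes or forward slashes
path_double  <- "C:\\R\\test.R"   # each \\ is stored as one backslash
path_forward <- "C:/R/test.R"

# cat displays the text as stored, ie with single backslashes
cat(path_double, "\n")

# Swapping backslashes for forward slashes gives the same path text
identical(gsub("\\", "/", path_double, fixed = TRUE), path_forward)
```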

Let’s try again with forward slashes, but also set one of the arguments echo = TRUE. We need to
do this so that the output is printed in the console. Try leaving it out and you will see the same
problem we had earlier with no output.

In fact, the input as well as the output has been displayed, which is even more useful.

There are a number of other arguments that you could use with the source command but given
that we don’t think you’ll use it very much in your CS1 or CS2 studies, we won’t discuss them
here. You can always use the help menus if you find you need them.
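
To see the difference between the approaches without worrying about file paths, here is a minimal sketch that writes a small script to a temporary file and then sources it in different ways. The temporary file simply stands in for test.R, and the object names script_path and last are made up for this example.

```r
# Write a small script to a temporary file (this stands in for test.R)
script_path <- tempfile(fileext = ".R")
writeLines(c("log(10)", "sqrt(36)", "sin(pi/2)"), script_path)

# Nothing is displayed: results are not shown automatically when sourcing
source(script_path)

# echo = TRUE displays each command followed by its output
source(script_path, echo = TRUE)

# Alternatively, put print( ) around the commands inside the script itself
writeLines(c("print(log(10))", "print(sqrt(36))"), script_path)
source(script_path)

# source also hands back (invisibly) the value of the last command it ran
last <- source(script_path)$value
```

Using a temporary file avoids the slash problem altogether, but in practice you would give the path to your own script.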


7 Summary

Key terms
Script/editor window A window where commands can be written but not executed. We
can transfer them to the console window and execute them using
Ctrl+Enter

Comment Text following a #, which R ignores (ie does not execute) but we
can use to leave an audit trail

Object Something R performs commands on

Run Transfer a command from a script to the console and execute

Menus
File, New File, R Script Opens a new script editor window

File, Open File Opens an existing script file

File, Save Saves the script currently in view

Code, Run Selected Line(s) Runs (ie transfers to the console and executes) the line of script
the cursor is on or the lines if several are selected.

Code, Run Region, Run All Runs (ie transfers to the console and executes) all the lines of the
script in the console

Code, Source File Runs a whole script file without opening it. Only displays outputs
in the console if explicitly told to do so (by using the print
command)

Key commands
print(<object>) Displays the <object> (or its output) in the R console.

source("<file>", echo=TRUE) Runs the script <file> without opening it

<file> includes the file path (eg C:\\test.R)

# Comment: R then ignores anything to the right of the #

Ctrl+Enter Runs the line or selected lines of script in the console

Ctrl+O Opens an existing script file

Ctrl+Shift+N Starts a new script file

Ctrl+S Saves the script file


8 Have a go
You will only get proficient at R by practising.

1. Use R to calculate the following:

– $\log 1000$
– $^{10}C_6$

2. Without referring to R4 (unless you get stuck), use R to:

– find the arguments of the source function


– find the html help page for the source function
– run the examples for the print function
– find a list of all 42 functions which contain the word “print”.

3. Without referring to this chapter (unless you get stuck), use R to:

– start a new script


– write the comment “This is my R5 script” and then put the commands from Q1 into it
– using any method, run the commands in the script (both individually and all together)
– clear the console screen
– save the script as “My R5 script” and close it
– load the script back into R
– close the script again and then run the script without opening it so it displays all the
commands
– open up notepad and write a script with the comment “This is my text R5 script”.
Then put in the commands above but include a command that will ensure the output
is displayed. Save as a text file with the title “My text R5 script”
– load up this script into R and run it
– close this script and then run the script without opening it.


Using objects

Covered in R6

• Assigning values to objects
• Using objects in commands
• Other useful information about objects


1 Assigning values to objects


In this chapter we will be looking at how we can store values from our calculations and then use
them in further calculations.

Just as a calculator can store values in its memories, we can store values in (or assign values to)
what are called variables (or objects).

Suppose we want to store the value 5 in the variable/object capital A. There are a few ways of
doing this. One is to write A=5:

R has executed this command but nothing else is displayed in the Console. If we want to see
what’s assigned or stored in the variable A (in the Console) we can use the function
print(<object name>). Alternatively, a quicker method is to type the name of the object (in this
case A) and execute it:

We can see that these return the value 5 as that is what has been assigned to the object A.

Note that we can’t assign the value using 5=A:

This returns an error as R is trying to assign the value A to 5.

However, most users of R prefer to use the assignment operator, which is made up of a dash and an
inequality sign which together make an arrow. We can do this either way round:

8 -> B

This assigns the value 8 to the variable B. The arrow goes from the 8 to the B.

Alternatively:

C <- 10

This assigns 10 to the variable C. The arrow goes from the 10 to the C.

Ensure that you don’t enter a space between the dash and the inequality sign, because together
the two symbols are treated as one operation.
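
The three ways of assigning, and the effect of an accidental space, can be tried out with the short sketch below. It uses the same objects A, B and C as above.

```r
A = 5        # assignment using the equals sign
8 -> B       # arrow pointing right: the value goes into B
C <- 10      # arrow pointing left: the value goes into C

A            # typing a name displays the stored value
B
C

# With a space between < and -, R no longer sees an assignment.
# It asks whether C is less than -10, which is FALSE, and C is unchanged.
C < - 10
C
```

The last two lines show why the space matters: no error is reported, so a mistake like this is easy to miss.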


Again, using the print command or just typing the object (or variable) name displays the values
that have been assigned (ie stored) in those objects:

As we’ll discover later, we can assign all sorts of things to an object including characters (such as
names of policyholders) or even the results of functions. Since a function could take up many
lines, many users of R tend to use the second version which puts the object first. This makes it
easier to see it in the coding.

RStudio’s Environment window


One of the many useful features of RStudio is that it displays the values of the variables you are
using in the Environment window:

You can clear all the values you have allocated using the icon circled in red above. (You can also
use the menu: Session, Clear Workspace.) This effectively wipes R’s memory. But be careful, as
this can’t be undone.

Objects are case sensitive


Note that like commands, objects are case sensitive. We have assigned 5 to object A but if we
type lower case a – R has no idea what we’re talking about:
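
One way of checking which spelling R knows about is the exists function, as in the sketch below. The object name Premium is hypothetical and chosen only for this illustration.

```r
Premium <- 250       # hypothetical object, spelt with a capital P

exists("Premium")    # TRUE: this object has been created
exists("premium")    # FALSE, assuming nothing called premium has been created
```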


2 Using objects in commands


R is an object orientated language. This means that everything in R is an object (even unassigned
numbers are treated as objects with no name) and we perform operations on these objects.

We can use these named objects we’ve created in, for example, simple mathematical operations
such as addition, subtraction, multiplication and powers:

We can also put objects in functions such as square roots, logs, exponentials or combinations:

We can even store the outputs of functions in other objects:

And we’ll see later that we can store all sorts of other things in objects including characters and
even functions.
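
A short sketch of these ideas, using the values 5 and 8 assigned to A and B earlier, is given below. The object D is just an example name for a stored result.

```r
A <- 5
B <- 8

A + B              # 13
A - B              # -3
A * B              # 40
A^2                # 25

sqrt(A * B + 9)    # square root of 49, ie 7
log(exp(B))        # taking the log of the exponential gets us back to 8

D <- A * B         # store the result of a calculation in a new object
D                  # 40
```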


3 More about objects

Naming objects
You can name objects using letters (a‐z, A‐Z), numbers (0‐9), full stops (.) and underscores (_) and
they can be as long as you like. However, they must start with a letter (or with a full stop that is
not followed by a number).

For example you could call an object “bob” or “larry123” or “go.west” or “age_at_death”. But
you can’t use spaces. For example, if we try to assign 5 to the object “bob larry” we get:

The unexpected symbol is the space.

Just like on a calculator, if we assign another value to the same memory it overwrites it.

For example, we currently have the number 5 assigned to A. If we now assign 6 to A:

We can see that R has overwritten the value 5 that was stored with the value 6 and it didn’t even
warn us that it had done so.

You can use the Environment window in RStudio to keep track of the objects you’ve assigned
values to.
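
The naming rules and the overwriting behaviour can be checked with the sketch below. The names are taken from the examples above, and make.names is a standard R function that shows how R would convert an invalid name into a valid one.

```r
age_at_death <- 81     # valid name: letters and underscores
go.west <- 1           # valid name: contains a full stop

A <- 5
A <- 6                 # replaces the 5 without any warning
A                      # 6

# The space in "bob larry" is not allowed, so it is swapped for a full stop
make.names("bob larry")
```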

Naming objects for an audit trail


A sensible approach to avoid overwriting objects and to make a clear audit trail is to use
descriptive names that tell us what values are assigned to them.

For example if we’ve got data on age of death then we should store it in an object with this name.
Given that we can’t use spaces in an object name, sensible alternatives could include full stops,
underscores or capitals:
• age.of.death
• age_of_death
• AgeOfDeath.


Removing objects
Once we’ve assigned a value to an object it will stay in R’s memory (called the workspace) until we
exit R or manually remove it.

We do this using the remove command, remove(<object>). So to get rid of A, we type:

remove(A)

and A will disappear from the Environment window.

An abbreviation for the remove command is rm. So typing rm(B) removes object B.

Rather than repeating this for each of the objects we want to remove, we could just list all the
objects we want to remove as the arguments. For example:

remove(<object 1>, <object 2>, ….)

rm(<object 1>, <object 2>, ….)

So we could remove objects C and D using:

remove(C, D)

Removing all objects


As mentioned earlier, you can clear/remove all objects you have assigned using the broom icon in
the Environment window. (You can also use the menu: Session, Clear Workspace.)
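
The removal commands can be tried out with the sketch below, which creates four objects and then removes three of them. The final comment mentions rm(list = ls( )), a commonly used command that does roughly the same job as clearing the workspace from the menu; it is deliberately not run here.

```r
A <- 5; B <- 8; C <- 10; D <- 12

rm(A)        # remove a single object
rm(B, C)     # remove several objects in one go

exists("A", inherits = FALSE)    # FALSE: A has gone
exists("D", inherits = FALSE)    # TRUE: D is still in the workspace

# rm(list = ls()) would remove every object in the workspace (not run here)
```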

Keeping objects for the next session


Recall that when we quit RStudio it asks whether we want to save the workspace. This is asking
whether we want to keep all the objects we have created. We’ll cover this in the next chapter,
but we suggest you do not click “yes” until you have read that chapter.


4 Summary

Key terms
Object Something which stores data which R can perform commands on (also
called a variable).

Object names can include letters (a‐z, A‐Z), numbers (0‐9), full stops (.)
and underscores (_). They must start with a letter (or a full stop not followed by a number).

Assign The name given to the process of storing something in an object.

Menus
Session, Clear Workspace Remove all objects in the current working environment

Key commands
<- Assigns a value (on the right) to an object (on the left)

-> Assigns a value (on the left) to an object (on the right)

print(<object>) Displays the <object> in the R console. For objects which have a
value assigned to them it displays the value.

If the <object> is a function then it displays the output.

remove(<list of objects>) Removes all the objects listed

rm(<list of objects>) Removes all the objects listed


5 Have a go
You will only get proficient at R by practising.

Without referring to this chapter (unless you get stuck), use R to:

• Store the values 5, 10, 15, 20 and 25 in the objects V, W, X, Y and Z, respectively.
• Calculate V – W, V/W, X*Y, ln(Z), exp(W)
• Remove the object V
• Remove the objects W and X in one go.
• Remove all the objects in the working memory.


Workspaces

Covered in R7

• Workspaces
• Working directory
• RStudio Projects


1 Workspaces
When we open up R we start what is called a session. In this chapter we will look at how we can
keep a history of the commands we entered and the objects we’ve created so that they can be
used in another session.

The workspace and command history


We’ll start by using a new “clean” session. If you’ve just opened RStudio and you don’t have any
scripts open or values assigned to variables then you have one already. If not, then click on
Session, New Session from the menu to start afresh.

We’ll now enter some of the commands we used in the last chapter:

A <- 5
A
8 -> B
C <- 10
A+B
A-B
A*B

We have created three objects (A, B and C) and these are stored in R’s local environment or
working memory. This is called the workspace.

You can see the values assigned to A, B and C in RStudio’s Environment window.

We can also see a record of all the commands entered into the console so far in the History
window, a different tab next to Environment:

You can delete some or all of your history using the icons circled above.


Closing down without saving


Let’s close down RStudio choosing “Don’t Save” when it asks if we want to save the workspace
image:

If we reopen RStudio you will see that (assuming you didn’t save the workspace after working
through an earlier chapter) we have lost the objects, but retained the history record, which is
stored in a file called “.Rhistory”.

Saving the workspace image


Let’s re‐enter the same commands and quit RStudio again, but this time click “Save” when it asks
if we want to save the workspace image:

R then saves a second file called “.RData”, which contains all the objects we created.

When we open up a new session of R we see in the Console it says:

So it has loaded up the workspace called .RData from the folder ~/R/. We’ll see where that folder
is saved in a minute.

You should still now be able to see the values assigned to A, B and C as well as the history of
previous commands.

Rather than save your data in the working directory on exit, you may wish to save it in a different
folder. You can do this using the Session menu, or by using the save icon in the Environment
window. You can also save the history record somewhere other than the default location using
the save icon in the History window.

You can then open an old workspace using Session, Load Workspace from the menu, or using the
open icon from the Environment window.


Overwriting warning!
A brief word of warning about loading .RData files. Doing so overwrites any objects with the same
name. So if you had a different value assigned to object A in this new .RData file, it would replace
whatever was in object A currently, without any warning.
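
The overwriting can be demonstrated from the command line using the standard save and load functions, as in the sketch below. The temporary file workspace_file is a made-up stand-in for a saved .RData file, and we are assuming the menu options behave in the same way as these commands.

```r
A <- 5
workspace_file <- tempfile(fileext = ".RData")   # stand-in for a saved workspace
save(A, file = workspace_file)                   # save the object A to the file

A <- 99                  # change A after saving
load(workspace_file)     # load the saved file back in
A                        # 5: the value 99 has been overwritten without warning
```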


2 The working directory


The place where R stores the two files (.RData and .Rhistory) on the computer is called the
working directory (or folder).

We can find the location of this directory/folder using the “get working directory” command
getwd( ).

It is also displayed at the top of the Console window:

In this example, the working directory is a folder called ~/R.

The tilde sign, ~, in a path name is a shortcut used by R. We can find what the shortcut is using
the path.expand command:

path.expand("~")

[1] "C:/Users/username/Documents"

Depending on your computer setup, tilde (~) and the working directory may be in a folder named
R in Program Files or elsewhere.

If you click on the icon next to the working directory name near the top of the Console, then you
can see the files stored in the working directory in the Files window, which is similar to Windows
Explorer. The file list includes the two files mentioned above:


Changing the working directory


There are a number of ways of doing this. We can use the menu: Session, Set Working Directory,
Choose Directory:

Alternatively, you can use the set working directory function whose argument is the new
directory/folder in speech marks:

setwd("<folder address>")

For example:

setwd("C:/Users/username/Documents/R")

or more quickly:

setwd("~/R")

Have a go and check it works by using the getwd() command afterwards.
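
If you would like to try this without affecting your own folders, the sketch below switches to R’s temporary folder and back again. The object name old_wd is made up for this example; tempdir is a standard R function that returns a folder created for the current session.

```r
old_wd <- getwd()        # remember the current working directory

setwd(tempdir())         # switch to R's temporary folder for this session
getwd()                  # confirms that the working directory has changed

setwd(old_wd)            # switch back to where we started
identical(getwd(), old_wd)
```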

Another way of setting the working directory is through the Files window menu:

You can navigate to the folder you want to use and then click on: More, Set as Working Directory.

Note that the Home folder, circled above, is the path of the tilde, ~.

If you wish to set your working directory to a folder that isn’t a sub‐folder of Home, then you will
need to use one of the other methods above.


All of these methods only change the working directory temporarily. If you close down RStudio
and reopen it, it reverts to the default directory. If you wish to change it permanently you need
to use the Menu: Tools, Global Options, General:


3 Projects
If you save your work in the default working directory, you will automatically pick up where you
left off. But you will probably want to keep work in different places with different names for easy
reference.

One way of doing this is to save your scripts, workspace and/or history record, as discussed
previously, and open them again when you want them.

A neater way in RStudio is to use a Project.

A project will store a workspace, history file and scripts, as well as the working directory, all together.

So if you’re about to start a significant piece of work, for example, an assignment, it’s probably a
good idea to start a new project. This is easily done from the menu using File, New Project.

After clicking on New Directory and then New Project you can enter the project name, chosen
here to be Test Project:

A final click on Create Project and you will be working in your new project.

RStudio has automatically set the working directory to the new location, which you can see at the
top of the Console window:

Let’s create a new object, E, for this project and assign 200 to it:


We’ll now close the project down using File, Close Project, clicking “Save” when it gives us the
prompt:

Note that the prompt tells us where we are saving the data.

If we then reopen RStudio, and open our project using the File menu:

our data and history will be restored and we can carry on with our work.

Good working practice


As mentioned above, it’s a good idea to store your work in specified folders using projects. You
may also wish to keep your default workspace free of clutter.

If it’s not free now, then you have a couple of options:

1. Clear the Environment of its objects, clear the history, close down any scripts and then
save the workspace as you exit RStudio.

2. Go to the working directory in Windows Explorer and delete the .RData and .Rhistory
files.


4 Summary
Key terms

Workspace R’s working memory which contains the objects created during
that session (as well as any data or packages that are loaded)

Working directory The default folder where R loads files from or saves files to

Session The period of time when R is open

Slashes Used to denote a file path. R uses \\ or / (not \)

Menus

Session, Load Workspace Loads a previously saved workspace

Session, Save Workspace As Saves the current workspace (ie objects created)

Session, Set Working Directory Changes the working directory (ie the default location where R
loads files from or saves files to) for this session

File, New Project Opens a new project (where you can store all work for an exercise
or assignment)

File, Open Project Opens an existing project (can also use the Recent Projects
option)

File, Close Project Closes the current project, will prompt you to save on exit

Tools, Global Options You can change the default working directory in the General tab

Key commands

getwd( ) Displays the current working directory

setwd("<folder address>") Sets the working directory to the specified folder (eg "C:\\")

path.expand("~") Gives the file path that the shortcut ~ goes to


5 Have a go
You will only get proficient at R by practising.

1. Create a new project called R7.

2. Open a new script and save it as R7.R.

3. Enter code in the script that will assign the values 10, 50 and 100 to the objects X, Y and Z
respectively.

4. Also, in the script, set a new object Answer to equal X*Y/Z and then output Answer.

5. Run the script.

6. Check the History window matches the Console and then view the Environment window.

7. Close the project and then RStudio.

8. Open RStudio again and open the R7 project.

Your screen should look something like this:

All study material produced by ActEd is copyright and is sold
for the exclusive use of the purchaser. The copyright is
owned by Institute and Faculty Education Limited, a
subsidiary of the Institute and Faculty of Actuaries.

Unless prior authority is granted by ActEd, you may not hire
out, lend, give out, sell, store or transmit electronically or
photocopy any part of the study material.

You must take care of your study material to ensure that it
is not used or copied by anybody else.

Legal action will be taken if these terms are infringed. In
addition, we may seek to take disciplinary action through
the profession or through your employer.

These conditions remain in force after you have finished
using the course.


Packages

Covered in R8

• Loading packages
• Help with packages
• Unloading and uninstalling packages
• Installing packages


1 Loading packages

What are packages?


We have mentioned in previous chapters that we can obtain additional features or additional
functions by downloading packages.

A package is a collection of related R functions, data and code stored in a particular format that
R can load.

In this chapter we will look at how we can install packages and get help on them.

Overview
To use a package in R we have to go through two steps:

CRAN --(install)--> Library --(load)--> Workspace

First we have to install the package onto our computer’s hard drive. We only do this once. The
directory or folder where packages are stored is called the library.

Second we have to load the package into R's workspace so we can use its features. We have to do
this in every session in which we want to use the package because, to save time and memory, R only
loads the standard (or base) packages at start-up. However, we will see that RStudio simplifies the
process greatly.

As well as containing the values assigned to any objects we’ve made, the workspace also contains
the functions/data/code from any packages that are loaded.

Finding the packages loaded into the current workspace


We can find a list of the packages which are currently loaded into our workspace by using the
search( ) function:

search( )


This returns the following results:

We can see that a number of packages (stats, graphics, grDevices, utils, datasets, methods) are
loaded up in addition to the standard (base) package. We won’t worry about the other items now.
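To see this for ourselves, the following sketch prints the search path and then picks out just the package entries. The object names attached and loaded_pkgs are only illustrative, and the exact list will depend on the R installation and on what has been loaded in the session.

```r
# List everything currently attached to R's search path
attached <- search()
print(attached)

# Package entries start with "package:", so keep those and strip the prefix
loaded_pkgs <- sub("^package:", "", grep("^package:", attached, value = TRUE))
print(loaded_pkgs)
```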

Finding the packages installed on your machine


When R is installed it automatically installs a number of packages on your machine in what we call
R’s library directory (ie the library folder). We can find out the list of these packages installed in the
library using the library function with no arguments, ie library().

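[Diagram: CRAN --(install)--> Library --(load)--> Workspace, where library( ) lists the contents of the library and search( ) lists what is loaded into the workspace]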

This opens up a new window listing all the packages installed and available in the library.

However, RStudio makes things much easier because there is a tab called Packages in the
bottom-right window which displays the packages installed in your library.
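If we want the same information inside a script rather than in a window, one option is the installed.packages function. The sketch below assumes nothing beyond a standard R installation. The object name pkgs is illustrative.

```r
# installed.packages() returns a matrix with one row per installed package
pkgs <- installed.packages()

# The row names are the package names
print(head(rownames(pkgs)))

# Check whether a particular package is installed
print("stats" %in% rownames(pkgs))

# The library folder(s) that R searches for packages
print(.libPaths())
```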


Loading packages
We’ll now look at how we can load packages (that have been installed on our computer) into R’s
workspace (so that we can use their features).

All you need to do in RStudio is to use the tick box next to the package in the Packages window. For
example, here we have loaded the Graphics package:

You’ll see in the Console that R has used the library function to load the package:

If you ever need to use this in a program then you can copy the code into your script.
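As a sketch of what that code looks like in a script, the example below loads the tools package. This package is used here only because it is installed with R but is not normally loaded at start-up, so the effect of loading is visible.

```r
# Is the package attached before we load it?
print("package:tools" %in% search())

# Load the package into the workspace
library(tools)

# It now appears on the search path
print("package:tools" %in% search())
```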


2 Getting help with packages


Once we load a package we may need some help to make the most of the new functions and
datasets that are included.

The first way of getting help is to use the help files that accompany the package. One way to access
these uses the library function. For example, for help with the package called MASS, we use:

library(help=MASS)

This opens a window which contains an introduction (which includes the dependencies, ie other
packages which are necessary for this package to work):

Scrolling down it gives lists of the new functions and datasets included in the package:
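Some of this information can also be retrieved in code. The sketch below uses the stats package (chosen only because it is always installed) together with the packageDescription and getNamespaceExports functions. The object names info and exported are illustrative.

```r
# Read the description file of an installed package
info <- packageDescription("stats")
print(info$Package)
print(info$Version)

# List some of the functions the package makes available
exported <- sort(getNamespaceExports("stats"))
print(head(exported))
```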


Alternatively, click on the name of the package in the Packages window to open up the relevant
pages in the Help window:

Clicking on “description” gives the information we saw above in the introduction. And then
underneath it lists all the package’s functions and datasets intermingled in alphabetical order.

Clicking on the links will take us to the help page on that specific function or dataset.


Some packages also come with demonstrations which can be accessed via the demo command. In
the same way that using the library command with no arguments lists all the packages installed on
our machine, using the demo command with no arguments lists all the demos available for all the
packages installed on our machine:

We can then run a demo using:

demo(<demo name>)

For example:

And use Return to step through the demo.
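As a rough illustration, the demo function can also list the demos for a single package without running any of them. The sketch below restricts the list to the base package. The demo named in the final comment is an assumption about what the list contains and should be checked against the printed output before running it.

```r
# List the demos in one package (nothing is run)
d <- demo(package = "base")

# The results are held in a matrix with one row per demo
print(d$results[, c("Item", "Title")])

# To run one of the listed demos, pass its name, for example:
# demo("is.things", package = "base")
```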


3 Unloading and uninstalling packages

Unloading
You are unlikely to need to unload a package, as its presence in R rarely affects the running of
other functions. However, if you do wish to unload (or detach) a package, all you need to do is
uncheck its box in the Packages window.
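In a script, the equivalent of unticking the box is the detach function. A minimal sketch, using the tools package (which is installed with R) purely as an illustration:

```r
# Load a package, then detach it again
library(tools)
print("package:tools" %in% search())   # TRUE

detach("package:tools")
print("package:tools" %in% search())   # FALSE
```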

Uninstalling
Again, you probably won't need to uninstall a package during your CS1 or CS2 studies, but
it's possible in RStudio using the small cross on the package line in the Packages window.

This deletes the package from your library and if you decide you want it again you will need to
reinstall it using the instructions in the next section.


4 Installing Packages

Finding out about available packages


At the time of writing there are over 9,000 packages available. We can find out about them by
going to www.cran.r-project.org and then clicking on CRAN in the download menu on the left:

This will take us to a page where we need to choose what is called a CRAN mirror:

Recall that CRAN stands for the Comprehensive R Archive Network. Essentially there are lots of
sites on the network which hold identical contents, hence the term “mirror”. You can scroll down
the page and choose one near to you or just select the first option on the list “0‐cloud” which should
automatically direct you to the nearest one.


Next click on packages on the left hand menu and then we can choose to view the packages sorted
by name or by date of publication:

If we choose to view them by name, we see the packages listed in alphabetical order, each with a
brief sentence describing its contents.

Clicking on the name of a package gives you some more detail.

If you need to use a package in your CS1 or CS2 studies then you should be told what package that
is and so you won’t need to look too far. However, you do need to know how to install a package
if it isn’t one of the default packages automatically installed with R.


Installing packages
RStudio enables us to install packages with ease. Say we are asked to install the package called
ggplot2. First click on the Install button in the Packages window:

Start typing the name of the package and then select it from those displayed before clicking Install.

Once the relevant files have been downloaded you will be able to see the package listed in the
Packages window.

If you look back in the Console you will see that R has used this command to install the package:

install.packages("ggplot2")

You can then load it using the checkbox in the Packages window or the library function we saw
earlier:

library("ggplot2", lib.loc="~/R/R-3.5.1/library")
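In a script we may prefer to install a package only if it is missing, so that it is not downloaded again every time the script runs. The sketch below shows one possible approach. The helper install_if_missing is hypothetical (it is not part of R), and it is demonstrated on the stats package so that nothing is actually downloaded.

```r
# Hypothetical helper: install a package only if it is not already in the library
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from the configured CRAN mirror
  }
  # Report whether the package is now available
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# stats is always installed, so this call downloads nothing
ok <- install_if_missing("stats")
print(ok)

# For the package in the text we would use:
# install_if_missing("ggplot2")
# library(ggplot2)
```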


5 Summary

Key terms

Package Collection of related R functions, data and code

CRAN "Comprehensive R Archive Network" – a network of sites that hold
packages, help files and so on for R.

Install Get a package from the web onto the computer’s hard drive (library)

Library Directory on the computer’s hard drive where packages are stored.

Load Get a package from the hard drive’s library into R’s workspace

Key commands
install.packages("<package name>", lib = "<library location>")

installs the chosen package into the given library (or default library
location if not specified)

library( ) Lists all packages installed and available in the library

library("<package name>", lib.loc = "<library location>")

loads the chosen package from the given library (or default library
if omitted) into R's workspace

library(help = "<package>") Opens the help file for the specified package

search( ) Lists all packages currently loaded into R's workspace

demo( ) Lists all demos available for all packages installed

demo(<demo>) Runs the specified demo


6 Have a go
You will only get proficient at R by practising.

1. Choose another package and install, then load it. Use the help pages to use one of its
functions.

2. Unload the package.

3. Uninstall the package.


Data Types

Covered in R9

• Types of data
• Numeric data
• Character data
• Logical data
• Complex data
• Raw data


1 Types of data
We mentioned in a previous chapter that R stores information in data structures called objects.
We looked at how we could store (assign) numbers in these objects by using the assignment
operator, <-.

For example, to assign the value 5 to the object capital A, we type:

A <- 5

To see what has been assigned to an object we can either use the print function:

print(A)

or just type the name of the object:

A

or just look in the Environment window.

However, in addition to numbers there are other types of data that can be assigned to (ie stored
in) objects. The five data types (sometimes called the “atomic modes”, “atomic vectors” or
"primitive objects") are:
• numeric
• character
• logical
• complex
• raw.


2 Numeric data
Numeric data are real numbers such as -2, 3.7 and pi. We can use the is.numeric function to
determine whether data is a real number or not. For example:

is.numeric(-2)

TRUE

-2 is a real number so the result is true.

is.numeric(2+3i)

FALSE

2+3i is a complex number not a real number so the result is false.

is.numeric("Bob")

FALSE

"Bob" is a name not a number so the result is false.

is.numeric(A)

TRUE

We stored the numerical value 5 in the object A and so it is numeric.

Numeric can be subdivided further into: integer and double, depending on how the number is
stored in R’s working memory.

Double stands for double precision, that is a floating point number such as 3.78e+12, which is
stored in 2 pieces – the significand (ie the 3.78) and the exponent (ie the 12). The word "double"
refers to the storage used (64 bits, twice the 32 bits of single precision). This is the default
type of numeric data.

is.double(3)

TRUE

is.integer(3)

FALSE

So even though 3 is an integer, it is stored in R’s working memory as a floating point number.
Integer in R means that it is an integer and it is not stored as a floating point number. We can
force R to store an integer as a non‐floating point number by placing a capital L after the number:

is.integer(3L)

TRUE
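The difference between the two ways of storing a whole number can be seen with the typeof function, which reports how a value is held in memory. A short sketch (the object names x and y are illustrative):

```r
x <- 3     # stored as a double by default
y <- 3L    # the L suffix forces integer storage

print(typeof(x))        # "double"
print(typeof(y))        # "integer"

# Both count as numeric data
print(is.numeric(x))    # TRUE
print(is.numeric(y))    # TRUE

# They are equal in value but not identical objects
print(x == y)           # TRUE
print(identical(x, y))  # FALSE
```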


3 Character data
Character data (sometimes called strings or character strings) are qualitative data such as names
of policyholders or cities. We use single or double quotes to specify data as character:

is.character("Bob")

TRUE

is.character('Larry')

TRUE

is.character("2")

TRUE

Even though 2 is a number, by using quotes we are telling R to treat it as a character, just as
we might type '2 in Excel to make it text.

is.character(Alice)

Error: object 'Alice' not found

Because Alice has not got quotes and is not a number, R assumes it must be an object. But since
we haven't assigned anything to Alice, R cannot find it and returns an error.

We can store character data in objects. For example, to store the name "Bob" in object B we
type:

B <- "Bob"

"Bob"

is.character(B)

TRUE

A useful function is nchar( ) which counts the number of characters in character data. For
example:

nchar("actuary")

7

If our qualitative data are categorical, that is if they can only take a specified number of categories
(eg policy type, gender, etc), then we would want to store them in a special object called a factor.
We’ll cover this in the next chapter.


4 Logical data
Logical data refers to data values which take one of the two Boolean states TRUE and FALSE.

is.logical(TRUE)

TRUE

is.logical(FALSE)

TRUE

is.logical(true)

Error: object 'true' not found

Only the upper case TRUE and FALSE are used, so when R encounters true it thinks it must be an
object. But since we haven’t assigned anything to it, R has got confused.

We can store logical data in objects. For example:

C <- FALSE

FALSE

is.logical(C)

TRUE

We can use the abbreviations T and F to stand for TRUE and FALSE:

is.logical(T)

TRUE

is.logical(F)

TRUE

But unlike TRUE and FALSE, T and F are not reserved words in R. If we try to assign something to
the object FALSE we get the following:

That’s because FALSE is a reserved word and so cannot be used as an object name.
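We can reproduce this safely in a script by catching the error rather than letting it stop the program. A minimal sketch using tryCatch (the object name msg is illustrative):

```r
# Try to assign to the reserved word FALSE and capture the error message
msg <- tryCatch({
  FALSE <- 5
  "no error"
}, error = function(e) conditionMessage(e))

# msg now holds R's error message rather than "no error"
print(msg)
```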


However, since T and F are not reserved we can treat them as objects that we can assign things
to. For example:

T <- 10

is.logical(T)

FALSE

If we had a wicked sense of humour we could even assign TRUE to F and FALSE to T and cause all
sorts of havoc! Hence, since T and F are not reserved it would be wise to always use TRUE and
FALSE to prevent such problems arising.
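If T or F has been overwritten by accident, removing our own object restores the original meaning. A short sketch:

```r
T <- 10                 # overwrite the abbreviation with a number
print(is.logical(T))    # FALSE

rm(T)                   # remove our object called T
print(is.logical(T))    # TRUE, as T stands for TRUE again
```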

It should be mentioned that NA, standing for “not available”, is also treated as logical data:

is.logical(NA)

TRUE

is.na(NA)

TRUE

We will make extensive use of NA for missing data. Like TRUE and FALSE, NA is a reserved word
and so cannot be used as an object name.
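As a brief illustration of NA standing in for a missing value, the sketch below uses a made-up vector of three ages in which one is missing:

```r
# Illustrative data: the second age is missing
ages <- c(34, NA, 47)

print(is.na(ages))                # FALSE TRUE FALSE

# Calculations involving NA give NA unless we ask R to ignore missing values
print(mean(ages))                 # NA
print(mean(ages, na.rm = TRUE))   # 40.5
```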


5 Complex data
Complex data are used for complex numbers such as 2+3i.

is.complex(2+3i)

TRUE

is.complex(2)

FALSE

is.complex(2+0i)

TRUE

is.complex(i)

Error: object 'i' not found

R only recognises i as an imaginary number if there is a number before it. Otherwise it thinks it is
an object.

We can store complex data in objects. For example:

D <- 5-2i

5-2i

is.complex(D)

TRUE
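A few of R's built-in functions for complex numbers are sketched below, including how to write the imaginary unit on its own as 1i. The object name z is illustrative.

```r
z <- 5-2i

print(Re(z))            # real part: 5
print(Im(z))            # imaginary part: -2
print(Conj(z))          # complex conjugate: 5+2i
print(Mod(3+4i))        # modulus: 5

# The imaginary unit needs a number in front of the i
print(is.complex(1i))   # TRUE
```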


6 Raw data and Coercion

Raw data
This final data type stands for raw byte data in hexadecimal. For example R stores the word
“actuary” as the following raw bytes 61 63 74 75 61 72 79.

We’re not going to look at this here.

For the purposes of our actuarial studies we will only be working with numeric, character and
logical data.
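For the curious, the raw bytes quoted above can be reproduced with the charToRaw function, and converted back with rawToChar. A short sketch (the object name bytes is illustrative):

```r
# Convert a character string into its raw bytes (shown in hexadecimal)
bytes <- charToRaw("actuary")
print(bytes)             # 61 63 74 75 61 72 79

print(is.raw(bytes))     # TRUE

# Convert the bytes back into a character string
print(rawToChar(bytes))  # "actuary"
```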

Coercion
When R imports data it will make what it thinks is a sensible decision as to what type of data it
is. If we want to tell R that it is a particular type of data we can coerce it using the "as." functions.

For example, earlier we saw that R will, by default, store all numbers (eg 3) using double precision
(ie floating point values). If we want it to store 3 as an integer we would type as.integer(3).

We could also swap between data types using coercion as the following examples show:

as.integer(4.5)

4

as.character(5)

"5"

as.logical(1)

TRUE

as.complex(7)

7+0i

as.numeric(TRUE)

1

as.character(TRUE)

"TRUE"

as.complex(TRUE)

1+0i

as.numeric("2.5")

2.5


as.logical("true")

TRUE

as.complex("5-4i")

5-4i

Sometimes R will do the coercion but give a warning message:

However, sometimes this just doesn’t make any sense and R will return an NA (not available)
result. For example:
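The following sketch illustrates both situations. The values are chosen only for illustration, and the comments describe the warnings R gives when the lines are run.

```r
# Coercion that works but gives a warning: the imaginary part is discarded
x1 <- as.numeric(2+3i)
print(x1)     # 2

# Coercion that makes no sense: the result is NA, again with a warning
x2 <- as.numeric("Bob")
print(x2)     # NA

# Some coercions return NA without any warning
x3 <- as.logical("Bob")
print(x3)     # NA
```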

In the next chapter we’ll look at the data structures (objects) that R stores data in.


7 Summary

Key terms
Data type The information that can be stored in objects – can be one of the
following: numeric, character, logical, complex or raw

Object Something which stores data which R can perform commands on

NA Not available – used for missing data or function results if it’s not
possible to perform the action

Numeric Data type that consists of real numbers

Character Data type for qualitative data, treated as text

Complex Data type for complex numbers

Logical Data type for the logical results TRUE, FALSE and NA

Raw Data type for raw data bytes

Double Stands for double precision, a floating point value (eg 2.78e-12) held
using 64 bits, twice as many as single precision – default way
numeric data is stored in R's memory

Integer Integer numeric data which is not stored as floating point values
in R’s memory

Key commands
is.<data type>(<object>) Logical test of whether <object> has the <data type>

Returns TRUE or FALSE

Includes: is.numeric( ), is.integer( ), is.double( ), is.character( ),
is.logical( ), is.na( ), is.complex( )

as.<data type>(<object>) Coerces the R <object> into the required <data type> if possible

Includes: as.numeric( ), as.integer( ), as.double( ), as.character( ),
as.logical( ), as.complex( )

nchar(<object>) Counts the number of characters <object> has


8 Have a go
You will only get proficient at R by practising.

1. Classify each of the following data types:

FALSE "5" 5 5+0i

Use the is.<data type> functions to check your answers in R.

2. Determine whether each of the following would be true or false:

is.numeric(5+0i)

is.integer(5)

is.double(5L)

is.logical(0)

Check your answers in R.

3. What would be the outcome of each of the following?

as.integer(3.47)

as.integer("3.47")

as.numeric(3i-2)

as.logical(0)

as.complex("3i-2")

as.character(3.47)

as.character(FALSE)

as.character(NA)

Check your answers in R.

4. What would be the answer to the following:

nchar("hello")

nchar(3.47)

nchar(FALSE)

nchar(3i-2)

Check your answers in R. The last one may surprise you!


Objects

Covered in R10

• Types of objects
• Vectors and matrices
• Arrays, data frames, lists and factors


1 Types of objects
R is an object orientated language. It stores information in data structures called objects.
Everything in R is an object (even unassigned numbers are treated as objects with no name) and
we perform operations on these objects.

In the previous chapter we looked at the types of data that can be stored in objects. These were
numeric (ie real numbers such as 2.7 or pi), character (ie qualitative data such as policyholders'
names), logical (ie TRUE, FALSE and NA), complex (ie complex numbers such as 3 + 2i) and raw
(which was raw data bytes).

In this chapter we’re going to look at six different types (or classes) of objects (ie data structures)
that we can store the data in, which are:
• vectors
• matrices
• arrays
• data frames
• lists
• factors.

The type of object is called its class. We can find the class of an object by using class(<object>).
The class of object tells R how functions interact with it. For example using a print command on a
vector object just displays its contents but using it on a function returns its output:

A <- 5

print(A)

print(log(A))
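To get a feel for the class function, we can apply it to a few simple objects. The sketch below uses made-up values. The results are collected in a named vector called classes so they can be printed together.

```r
# Apply class( ) to several different kinds of object
classes <- c(
  number     = class(5),
  text       = class("Bob"),
  list       = class(list(1, "a")),
  data.frame = class(data.frame(x = 1)),
  factor     = class(factor("a")),
  func       = class(log)
)
print(classes)
```

Note that a single number reports the type of data it holds ("numeric") rather than the word "vector".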


2 Vectors and matrices


The default object type is a vector, which is a one‐dimensional ordered collection of data of the
same type. You can find out whether an object is a vector by using is.vector(<object>). The
dimension is called length and can be found by using the length(<vector object>) command.

Everything we have worked with so far has been a vector. Even unassigned values are considered
vectors with a single element. For example:

is.vector(5)

TRUE

length(5)

1

Unlike a vector in maths, which contains only numbers, vectors in R can contain any of the five
types of data that we met in the previous chapter (numeric, character, logical, complex and raw).
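Examples of vectors would include:

$$
\begin{pmatrix} 3.7 \\ 1.4 \end{pmatrix}
\qquad
\begin{pmatrix} \text{"bob"} \\ \text{"larry"} \\ \text{"ginger"} \end{pmatrix}
\qquad
\begin{pmatrix} \text{TRUE} \\ \text{TRUE} \\ \text{FALSE} \\ \text{TRUE} \end{pmatrix}
\qquad
\begin{pmatrix} 2+3i \\ 5i \\ 7 \\ 4+8.7i \end{pmatrix}
$$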

However R displays vectors horizontally rather than vertically. We will look at vectors in detail in
the next chapter.

A matrix is a two-dimensional object containing data of the same type. It is essentially composed
of several vectors of the same length. You can find out whether an object is a matrix by using
is.matrix(<object>). The dimensions are called rows and columns and the numbers of each can be
found by using nrow(<matrix object>) and ncol(<matrix object>), respectively. Alternatively,
you could use the dimensions command dim(<object>) to get both the number of rows and
columns.

Again, unlike matrices in maths which contain only numbers, matrices in R can contain any of the
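five types of data (numeric, character, logical, complex and raw). Examples include:

$$
\begin{pmatrix} 3 & 2.1 \\ 4.9 & 8.6 \end{pmatrix}
\qquad
\begin{pmatrix} \text{"barry"} & \text{"alice"} \\ \text{"harry"} & \text{"belinda"} \\ \text{"larry"} & \text{"chelsea"} \end{pmatrix}
$$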

More on matrices in a later chapter.
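As a preview, here is a minimal sketch that builds a 2 by 2 numeric matrix and checks its dimensions. The matrix function used to create it is an assumption here, as it has not yet been introduced in these notes, and the object name m is illustrative.

```r
# Build a 2 x 2 matrix (values are filled in column by column)
m <- matrix(c(3, 4.9, 2.1, 8.6), nrow = 2)
print(m)

print(is.matrix(m))   # TRUE
print(nrow(m))        # 2
print(ncol(m))        # 2
print(dim(m))         # 2 2
```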


3 Arrays, data frames, lists and factors


An array is an n-dimensional object containing data of the same type. For example, it might have
3 dimensions: length, width and height. You can find out whether an object is an array by using
is.array(<object>). We can find the dimensions of an array by using the dimensions command
dim(<object>).
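A small sketch of a three-dimensional array follows. The array function and the object name a are used here for illustration only.

```r
# A 2 x 3 x 4 array holding the integers 1 to 24
a <- array(1:24, dim = c(2, 3, 4))

print(is.array(a))   # TRUE
print(dim(a))        # 2 3 4

# Pick out a single element by giving its position in each dimension
print(a[2, 3, 4])    # 24
```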

A data frame is a two‐dimensional object (like a matrix). However, whilst each column (ie vector)
contains data of the same type the different columns (ie vectors) can be a different data type.
This will be most useful for statistical analysis where each row represents a single observation (eg
a single policyholder). For example, a data frame could include policyholders’ names, their ages
and their smoker status:

Name      Age   Smoker
Alfie     34    TRUE
Belinda   28    FALSE
Charlie   31    FALSE
Delilah   38    TRUE

You can find out whether an object is a data frame by using is.data.frame(<object>). We will look
at data frames in more detail in a later chapter.
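As a preview, the policyholder example could be set up roughly as follows. This is only a sketch: the column names name, age and smoker and the object name policyholders are chosen for illustration.

```r
# Each column is a vector of a single data type
policyholders <- data.frame(
  name   = c("Alfie", "Belinda", "Charlie", "Delilah"),
  age    = c(34, 28, 31, 38),
  smoker = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep the names as character data
)
print(policyholders)

print(is.data.frame(policyholders))   # TRUE
print(dim(policyholders))             # 4 3

# The data type held in each column
print(sapply(policyholders, class))
```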

A list is a one‐dimensional ordered collection of data (like a vector) but the data items don’t have
to be the same type. We can have lists of things like vectors, matrices and data frames and even
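lists! An example might be:

$$
\left(
\begin{pmatrix} 3.7 \\ 2+3i \\ \text{"Bob"} \\ \text{FALSE} \end{pmatrix},
\quad
\begin{pmatrix} \text{"Open"} & \text{TRUE} \\ \text{"Closed"} & \text{FALSE} \end{pmatrix},
\quad
\begin{pmatrix} 3.1 \\ 4.3 \end{pmatrix}
\right)
$$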

You can find out whether an object is a list by using is.list(<object>). Like a vector, the dimension
is called length and can be found by using the length(<list object>) command.
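A minimal sketch of a list holding items of different types (the values and the object name mixed are illustrative):

```r
# A list can mix data types, and can even contain whole vectors
mixed <- list(3.7, 2+3i, "Bob", FALSE, c(3.1, 4.3))

print(is.list(mixed))        # TRUE
print(length(mixed))         # 5

# The class of each item in the list
print(sapply(mixed, class))
```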

Factors are vectors of characters where the entries are categorical data (eg gender, insurance
group, country). Each entry can only take one of a specified number of categories (eg
male/female, or groups 1‐15, or UK, US, etc). We call these categories the levels of the factor. By
default, R will assign the levels alphabetically (so female=1 and male=2). If the categorical data
are ordinal (eg high/medium/low), then we use an ordered factor.

We have to be a bit careful when importing data into R as it often assumes that character data are
factors (for example policyholders' names). This was the default behaviour in versions of R before
4.0.0. So we might need to use coercion to tell R what type of data values they are. We met
coercion in the last chapter.
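The sketch below creates a factor and an ordered factor from made-up data, and shows coercion back to character data. The object names gender and risk are illustrative.

```r
# An unordered factor: the levels are assigned alphabetically
gender <- factor(c("male", "female", "female", "male"))
print(levels(gender))       # "female" "male"
print(as.integer(gender))   # 2 1 1 2

# An ordered factor: we state the order of the levels ourselves
risk <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"),
               ordered = TRUE)
print(risk[1] < risk[2])    # TRUE, as low comes before high

# Coerce a factor back to character data
print(as.character(gender))
```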


4 Summary

Key terms
Object Something which stores data which R can perform commands on

Vector A one‐dimensional, ordered collection of data of the same type

Matrix A two‐dimensional object containing data of the same type

Array An n-dimensional object containing data of the same type

Data frame A two-dimensional object, each column containing data of the
same type – different columns can be of different data types

List A one-dimensional, ordered collection of data (of different types)

Factor A vector of characters where the entries are categorical data (eg
gender, insurance group, country)

Key commands
class(<object>) Displays the class of an object

is.<object type>(<object>) Logical test of whether <object> has the <object type>

Returns TRUE or FALSE.

Includes: is.vector( ), is.matrix( ), is.array( ), is.data.frame( ),
is.list( ) and is.factor( ).

dim(<object>) Displays the dimensions of a matrix, data frame or array

nrow(<object>) Displays the number of rows in a matrix or data frame

ncol(<object>) Displays the number of columns in a matrix or data frame

There is not a “Have a go” section in this chapter as we explore the key types of objects in more
detail in the next few chapters.


Vectors

Covered in R11

• Creating vectors
• Naming vectors
• Indexing vectors
• Vector arithmetic


1 Creating vectors
In the last chapter we briefly described the six types of objects (called classes) that R uses to store
data. In this chapter we look at the most fundamental of these which is the vector.

A vector is a one‐dimensional ordered collection of data of the same type (numeric, character,
logical, complex or raw). Vectors are also called atomic vectors as there is no object more basic
than this ‐ even unassigned values (eg 5) are considered vectors with a single element.

You can find out whether an object is a vector by using is.vector(<object>). The dimension is
called length and can be found by using the length(<vector object>) command.

The simplest way to create a vector is to use the c( ) function. The c stands for “concatenate”
which means “combine” or “join together”. Suppose we want to make a numeric vector, called v,
containing the numbers 1 to 10. We could do this as follows:

v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

Typing in the object name returns the following:

Even though a vector in maths is usually written as a column, R displays vectors horizontally,
which is why it measures the number of elements by length.

We can see that v is a vector of 10 elements:

But when we look at its class we obtain the following:

Rather than saying vector it gives the type of data it holds, as a vector is the default object and is
defined by its elements.
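The checks described above can be run together as follows:

```r
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

print(v)              # the elements are displayed horizontally
print(is.vector(v))   # TRUE
print(length(v))      # 10
print(class(v))       # "numeric"
```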


Another way of obtaining all this information in one go is to use the str(<object>) command which
displays the structure of an R object:

It says it is numeric data, gives the dimensions and the contents (in this case all of the contents,
but for larger objects it will only give some of them).

We can even create vectors from other vectors. For example:

Note that because of the length of this vector, when we look at its structure it won’t display all of
its contents:
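A sketch of both steps is given below. The longer vector w is made up for illustration by joining three copies of v.

```r
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Structure of v: data type, dimensions and contents
str(v)

# Create a new vector by combining existing vectors
w <- c(v, v, v)
print(length(w))   # 30

# For a longer vector, str( ) only shows the first few elements
str(w)
```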

Since a vector is a collection of data of the same type, if we try to create a vector of different data
types R will try to coerce them all to the same type. For example:

In this case it has converted the logical data into numeric data (TRUE becomes 1 and FALSE
becomes 0) so that it is now a vector of numeric data.
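The sketch below shows this kind of coercion with made-up values, along with a second case where including character data turns everything into characters. The object names are illustrative.

```r
# Numeric and logical data together: the logicals become numbers
mixed1 <- c(5, TRUE, FALSE)
print(mixed1)          # 5 1 0
print(class(mixed1))   # "numeric"

# Adding character data: everything becomes character
mixed2 <- c(5, "Bob", TRUE)
print(mixed2)          # "5" "Bob" "TRUE"
print(class(mixed2))   # "character"
```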


Alternative methods of creating numeric vectors


If we want to create a numeric vector containing consecutive integers between a and b inclusive
we could use the shorthand a:b. For example, either of the following would create a vector object
containing the values 1, 2, 3, …., 10:

1:10

c(1:10)

If we want some other sequence of numbers we could use the sequence generation command:

seq(from, to, by, length)

• "from" is the value the sequence starts at. The default value is 1.
• "to" is the value it finishes at. If only one argument is given, R will assume this is the "to"
and the "from" argument will take the default value of 1.
• "by" is an optional argument which gives the step the sequence increases by. Its default is
±1 unless the length option is used.
• "length" is an optional argument that gives the required length of the sequence.

The following examples illustrate the use of this function:
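A few illustrative calls are sketched below. The full name of the length argument in R is length.out, which is used here. The object names s1 to s4 are only for illustration.

```r
# Only one argument: treated as "to", starting from the default of 1
s1 <- seq(10)
print(s1)    # 1 2 3 4 5 6 7 8 9 10

# Even numbers from 2 to 20
s2 <- seq(2, 20, by = 2)
print(s2)    # 2 4 6 8 10 12 14 16 18 20

# Counting down: the default step becomes -1
s3 <- seq(10, 1)
print(s3)    # 10 9 8 7 6 5 4 3 2 1

# Five equally spaced values between 0 and 1
s4 <- seq(0, 1, length.out = 5)
print(s4)    # 0.00 0.25 0.50 0.75 1.00
```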

Finally, we could use the many functions that produce simulations from common distributions,
such as runif(n), which returns n values from the U(0,1) distribution, or rnorm(n), which returns
n values from the N(0,1) distribution:
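A minimal sketch is shown below. The call to set.seed is an addition for illustration: it fixes the random number generator so that the same values are produced each time the code is run. The seed value 123 is arbitrary.

```r
# Fix the random number generator so the results are reproducible
set.seed(123)

# Five simulated values from the U(0,1) distribution
u <- runif(5)
print(u)

# Five simulated values from the N(0,1) distribution
z <- rnorm(5)
print(z)
```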


2 Naming vectors
It may be the case that we wish to name the elements in our vectors. For example, suppose we
want to create a vector, age, which contains the ages (34, 28, 47) of three policyholders (bob,
larry and ginger). If we wish to keep the names of the policyholders associated with their ages in
the vector we could do this as follows:

We can see that when the vector is displayed it also includes the names. Similarly the names are
given if we use the structure command:

It says that it is a named numeric vector with 3 elements (34, 28, 47). Then it gives the attributes
of the vector (ie its names attribute) which are the character data (“bob”, “larry”, “ginger”).

An alternative way of performing this action would be to use the names function. Suppose we
have another vector, claim.free, which gives the number of claim free years (3, 0, 8) for the three
policyholders (bob, larry and ginger). We can assign the names to the vector claim.free as
follows:

Note that you don’t have to name every element in a vector, you could use “” for those elements
to which you wish to give no name.
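
A sketch of the names approach is given below. The object partly.named is a hypothetical name of ours, used only to illustrate leaving one element unnamed.

```r
claim.free <- c(3, 0, 8)
names(claim.free) <- c("bob", "larry", "ginger")
claim.free

# Using "" leaves the second element without a name
partly.named <- c(3, 0, 8)
names(partly.named) <- c("bob", "", "ginger")
partly.named
```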

3 Indexing vectors
Sometimes we may be interested in only some of the elements in a vector. To do that we use
indexing.

Recall that in a previous chapter we said that the [1] at the start of the output line referred to the
index value of the first answer. The square brackets tell R it’s an index and the number gives the
position of the element.

Earlier we defined the vector v to contain the numbers 1, 2, 3, …., 10. So to select the third
element we type v[3]:

Suppose we want to select the second and fifth elements of the vector v. If we try v[2,5], R
returns an error:

That’s because R interprets v[2,5] as a request for the element in the second row and the fifth
column. Since a vector only has one dimension, this cannot work. So to specify both elements in
just one dimension we need to use the combine function as follows v[c(2,5)]:

The following show clever ways of specifying the third to seventh elements and all the elements
from the fourth value to the end:
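
The indexing calls discussed so far can be sketched as follows. The last two lines are our guess at the kind of shortcut intended, using a:b and length.

```r
v <- 1:10
v[3]             # third element: 3
# v[2, 5] would fail with "incorrect number of dimensions"
v[c(2, 5)]       # second and fifth elements: 2 5
v[3:7]           # third to seventh elements: 3 4 5 6 7
v[4:length(v)]   # fourth element to the end: 4 5 6 7 8 9 10
```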

To specify all elements except the specified ones we use a negative in front of their positions:

We can use the results of logical tests to select elements. For example we could select all the
values of v whose values are between 4 and 8 inclusive as follows:

We can see that all the elements for which the test is TRUE are selected.
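
A brief sketch of negative and logical indexing, again using v containing 1 to 10:

```r
v <- 1:10
v[-3]                # all elements except the third
v[-c(2, 5)]          # all elements except the second and fifth
v >= 4 & v <= 8      # the logical test, element by element
v[v >= 4 & v <= 8]   # 4 5 6 7 8
```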

Finally, if we have a named vector then we can select the elements using their names. For
example, using our vector of policyholders’ ages we could just select Larry’s age as follows:
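
For instance (a minimal sketch, recreating the age vector so that the example stands alone):

```r
age <- c(bob = 34, larry = 28, ginger = 47)
age["larry"]               # Larry's age, printed with its name
age[c("bob", "ginger")]    # several names can be supplied using c
```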

4 Vector arithmetic
The standard arithmetic operations work on vectors but on an element by element basis:

Whilst this is not quite the same as how vectors work in mathematics, it provides a powerful way
of performing the same operation on many values at once.

Suppose we define two vectors v1 and v2 as follows:

We can see from the following examples that operations between two vectors are also carried
out on an element by element basis:
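
The sketch below illustrates element by element arithmetic. The contents of v1 and v2 are our assumption (the notes only tell us that v1 has four elements).

```r
v1 <- c(1, 2, 3, 4)   # assumed values
v2 <- c(5, 6, 7, 8)   # assumed values

v1 * 2     # 2 4 6 8
v1 ^ 2     # 1 4 9 16
v1 + v2    # 6 8 10 12
v1 * v2    # 5 12 21 32
```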

Let’s have a look at what happens if vectors are of different lengths by first defining vectors v3
and v4 as the values (1,2) and (1, 2, 3), respectively:

Then when we add vectors v1 and v3 together we get the following:

We can see that the shorter vector v3 has been extended by repetition to (1, 2, 1, 2). This is
called recycling. This works fine as the length of v1 is a multiple of the length of v3.

Let’s look at what happens when this is not the case by adding vectors v1 and v4 together:

We can see that the shorter vector v4 has been extended by repetition to (1, 2, 3, 1) and a
warning message is displayed.

This process of recycling the shorter vector explains why v1 + 3 returns the following:

Recall that the default object is a vector. So 3 is treated as a vector of length 1. This is recycled to
(3, 3, 3, 3) and added to vector v1.
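
Recycling can be sketched as follows, assuming (our choice) that v1 contains 1, 2, 3, 4:

```r
v1 <- c(1, 2, 3, 4)   # assumed values
v3 <- c(1, 2)
v4 <- c(1, 2, 3)

v1 + v3   # v3 recycled to (1, 2, 1, 2): 2 4 4 6
v1 + v4   # v4 recycled to (1, 2, 3, 1): 2 4 6 5, with a warning
v1 + 3    # 3 recycled to (3, 3, 3, 3): 4 5 6 7
```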

5 Summary

Key terms
Vector The default object type is a vector which is a one‐dimensional
ordered collection of data of the same type – it has one dimension
which is called length

Length The number of elements in a vector – called length as vectors are
displayed horizontally even though they are actually vertical

Concatenate Join together

Indexing The process of selecting an element from a vector object

Recycling The process of extending a shorter vector by repetition to make
sense of operations with other vectors – returns a warning if the
length of the longer vector is not a multiple of the length of the
shorter vector

Key commands

is.vector(<object>) Logical test of whether <object> is a vector

Returns TRUE or FALSE

length(<vector object>) Gives the number of elements in a vector

class(<object>) Gives the class of an object. Since atomic vectors are defined by
their data type it gives their data type instead

c(<element1>, …) Combines <element1>, … together into a single object.

str(<object>) Displays the structure of the <object>

a:b Returns the integers a, a+1, a+2, …, b

seq(from, to, by, length) Returns a sequence starting at “from”, finishing at “to”, either
increasing in steps of “by” (default ±1) or equally spaced so that
there are “length” elements in the vector

runif(n) Returns n simulated values from a U(0,1) distribution

rnorm(n) Returns n simulated values from a N(0,1) distribution

names(<vector object>) Names the elements in the <vector object>

<object>[<position>] Returns the element from <object> at the given <position>

6 Have a go
You will only get proficient at R by practising.

1. Create a vector, v, containing the 10 numbers 21, 22, …, 30 using 3 methods:

Using the c function

Using :

Using the seq function.

2. What type of vector (numeric, logical, character) would be formed from each of the
following:

c(3, 2, FALSE)

c(TRUE, “bob”, FALSE)

c(“larry”, 7, 2)

c(5, TRUE, “ginger”).

Check your answers using the is.numeric, is.logical, is.character functions.

3. Use the seq function to generate the following:

70, 71, …., 85

50, 47, 44, …., 14

7 values equally spread between 1 and 31 inclusive.

4. Create a named vector of the temperatures 18, 20, 15 for the cities London, Paris and
Stockholm using the c function.

Create a named vector for the indices 6125.7, 17140.20 and 15323.10 for the FTSE 100,
Dow Jones and Nikkei 225 using the names function.

5. Use indexing on the vector v from Q1 to display:

3rd element

4th and 7th elements

6th‐10th elements

all but the 5th element

all but the 2nd and 8th elements

all but the 3rd to 6th elements

all elements which are smaller than 27.

6. Create a vector a of (1, 2, 3, 4, 5, 6), a vector b of (0, 1) and a vector c of (5, 1, 3, 2).

What will be the result of:

b‐1

b*c

a+b

a^b

a/c

7. Create a vector n containing 1,000 simulated N(0,1) values.

Use indexing to obtain all the values which are greater than 2 and store this in m.

Use vectors m and n to obtain an empirical estimate of P(Z > 2) (ie the probability that
Z > 2).

(Hint: use the lengths of the vectors).

Factors

Covered in R12

• Creating factors
• Specifying the order of the categories
• Abbreviating the names of arguments
• Changing the name of categories
• Indexing and arithmetic

1 Creating factors
In the last chapter we looked at vector objects, which are one‐dimensional ordered collections of
data of the same type (numeric, character, logical, etc). These can be used to store results of one
particular variable that are of interest to us, say a policyholder’s name, age, gender, etc.

In this chapter we look at a special vector used for storing categorical data (such as gender,
occupation, make of car, country, etc) called a factor. Unlike, for example, the policyholder’s
name, categorical data can only take one of a limited number of categories. For example, gender
can only take the categories male or female.

In R, the different categories are called levels and they are assigned the values 1, 2, 3, …, n. This
allows R to store them more efficiently (rather than treating each as unique) and use the
categories for graphing or as inputs in a statistical model, such as a generalised linear model.

Suppose we collect the gender of six policyholders:

Male Female Female Male Female Female

We could store these in a vector, gender, using the concatenate function:

The object gender is a character vector of length 6:

We can examine the structure of the object gender:

As expected, we see it contains character data (“chr”), has six values and that they are stored in
R’s memory as “Male”, “Female”, etc.

To convert a vector into a factor we use the factor command:

factor(<vector object>)

Let’s take the data stored in the vector object gender and put it in a factor object, which we’ll call
gender.factor and print it out:

We can see that it prints the elements (but no longer as character values in speech marks) and it
also gives the levels (ie the categories) that the data can take. Note that the levels are, by default,
sorted into alphabetical order.

We can see that gender.factor is no longer a vector object but a factor object, although it still
has length 6:

We can examine the structure of gender.factor:

We can see that it is a factor with two levels (“Female” and “Male”) which are by default in
alphabetical order. However, when it displays the elements we no longer see “Male”, “Female”,
“Female”, … but 2, 1, 1, … . This is because R assigns positive integers to each level/category. The
female category is assigned a value of 1 and the male category is assigned a value of 2, and R
stores these numbers in its memory instead. So essentially we can see that R has converted our
categorical data to an equivalent numeric vector. This saves memory (1 and 2 use less space than
“Female” and “Male”) and means we can put these numbers into functions.
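
The steps in this section can be sketched as follows, using the six genders given earlier:

```r
gender <- c("Male", "Female", "Female", "Male", "Female", "Female")
class(gender)              # "character"
length(gender)             # 6

gender.factor <- factor(gender)
gender.factor              # elements printed without quotes, then the levels
is.vector(gender.factor)   # FALSE
class(gender.factor)       # "factor"
length(gender.factor)      # 6
str(gender.factor)         # Factor w/ 2 levels "Female","Male": 2 1 1 2 1 1
as.integer(gender.factor)  # the underlying integer codes: 2 1 1 2 1 1
```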

We can use levels(<factor object>) to display the levels and nlevels(<factor object>) to display
the number of levels of a factor object:

In the example above we converted a categorical character vector into a factor. We could also
have entered the data directly using the factor command.

Suppose we collect the occupations (which can take the categories of blue collar, white collar and
professional) of the same six policyholders. Abbreviating them as bc, wc and prof they were:

wc wc wc bc wc prof

We can put these in a factor object, which we’ll call occupation, as follows:

Again we see that the levels are assigned alphabetically.

Looking at the structure:

We can again see that the numbers have been assigned alphabetically to each category, so bc is
level 1, prof is level 2 and wc is level 3.
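
As a sketch, entering the occupations directly with the factor command and inspecting the levels:

```r
occupation <- factor(c("wc", "wc", "wc", "bc", "wc", "prof"))
occupation
levels(occupation)       # "bc" "prof" "wc" (alphabetical)
nlevels(occupation)      # 3
str(occupation)          # integer codes 3 3 3 1 3 2
```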

2 Specifying the order of the categories


In this section we look at how we can specify the order of the levels (ie the categories) to be
something other than alphabetical.

For an object that already exists we can change the levels using the levels(<factor object>)
command. For example the gender.factor object has two levels currently in alphabetical order
(Female, Male). To change them to (Male, Female) we do the following:

We can examine the printout and structure:

The levels have changed order, male is first instead of female but the assignment of 1 to female
and 2 to male is unchanged. This is unfortunate as the data values were stored internally as 2, 1,
1, … and so whereas before that was Male, Female, Female,… it now says Female, Male, Male. So
what we have done is relabelled the levels of the factors and by doing this have changed our data
set! This is obviously not a good idea.
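
The problem can be reproduced with the short sketch below, which recreates the factor first so that the example stands alone:

```r
gender <- c("Male", "Female", "Female", "Male", "Female", "Female")
gender.factor <- factor(gender)

# Assigning to levels() relabels the codes; it does not reorder them
levels(gender.factor) <- c("Male", "Female")
gender.factor              # now reads Female Male Male Female Male Male
as.integer(gender.factor)  # codes are unchanged: 2 1 1 2 1 1
```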

So we have to specify the order of the levels when we create the factor. Levels is an optional
argument of the factor command:

factor(<vector object>, levels=<levels vector>)

So let’s redefine the object gender.factor using the factor command but this time we’ll specify the
order of the levels to be Male then Female:

Now when we print it out and look at its structure we get the following:

We can see that the levels are now in the specified order (Male then Female) and the assignment
of the numbers now follows this order, so Male = 1 and Female = 2. Hence the data values are as
they should be Male, Female, Female, etc.

Let’s now redefine the factor object occupation but this time we’ll specify the order of the levels
to be bc, then wc and then prof:

Note that if you are entering the above command in an RStudio Script, rather than in the Console,
then you won’t need to enter the “+” symbol. This is just R telling us that it needs more input,
which happens if we press enter before completing the command.

Now when we print it out and look at its structure we get the following:

We can see that the levels are now in the order we’ve specified and the assignment of the
numbers to the levels is in this order, so bc is now 1, wc is now 2 and prof is now 3.
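
Both redefinitions can be sketched as follows, with the levels argument fixing the order at creation:

```r
gender <- c("Male", "Female", "Female", "Male", "Female", "Female")
gender.factor <- factor(gender, levels = c("Male", "Female"))
str(gender.factor)         # codes 1 2 2 1 2 2, so the data are unchanged

occupation <- factor(c("wc", "wc", "wc", "bc", "wc", "prof"),
                     levels = c("bc", "wc", "prof"))
str(occupation)            # codes 2 2 2 1 2 3
```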

3 Abbreviating the names of the arguments of a function


Recall from an earlier chapter that when specifying the arguments of a function we could use a
unique abbreviation for the argument’s name or omit it altogether as long as we put the
arguments in the correct order. The factor command has another possible argument called
“labels” and so we can’t uniquely abbreviate the argument “levels” to “l”:

However, we can uniquely abbreviate it to, say, “le”:

Levels is actually the second argument and so we could omit its name altogether:
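
The three cases can be sketched as below. The tryCatch wrapper and the object names f1, f2 and res are our own additions, used so that the failing call does not stop the script.

```r
gender <- c("Male", "Female", "Female", "Male", "Female", "Female")

# "l" matches both levels and labels, so R reports an error
res <- tryCatch(factor(gender, l = c("Male", "Female")),
                error = function(e) "ambiguous abbreviation")
res

f1 <- factor(gender, le = c("Male", "Female"))  # unique abbreviation of levels
f2 <- factor(gender, c("Male", "Female"))       # levels supplied by position
identical(f1, f2)                               # TRUE
```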

4 Changing the name of the categories


For the occupations we used bc, wc and prof as abbreviations because it was quicker than
entering blue collar, white collar and professional for every policyholder. It’s possible to enter
the data as abbreviations, to save time, but still display the full names. The way to
do this is to use the third argument of the factor command which is “labels”.

factor(<vector object>, levels=<levels vector>, labels=<labels vector>)

We’ll now re‐enter the previous command but this time with the labels argument with the full
names in the same order as the levels argument:

Now when we print it and look at its structure we get:

We can see that the data are now printed in full. Unfortunately because there are two words in
the first two categories it’s a little confusing when they’re displayed to differentiate between the
policyholders. So it would be wise to use a full stop or underscore to separate the words.
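
A sketch of the labels argument, with the labels given in the same order as the levels:

```r
occupation <- factor(c("wc", "wc", "wc", "bc", "wc", "prof"),
                     levels = c("bc", "wc", "prof"),
                     labels = c("blue collar", "white collar", "professional"))
occupation           # data printed using the full names
str(occupation)      # codes are still 2 2 2 1 2 3
```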

We could re‐enter the whole command again but as we saw earlier we can change the labels of
an existing factor using the levels(<factor object>) command. This is a bit confusing as we’d
expect to use a “labels(<factor object>)” command but this is unfortunately the way R works. So,
being careful to ensure we keep the correct order of the levels/categories, we’ll change the labels
of the factor object “occupation” to “blue.collar”, “white.collar” and “professional” as follows:

Printing occupation and looking at its structure we get:

We can see that not only have the levels been relabelled but using the full stops makes it much
easier to differentiate between the different data values. So the levels argument inside the factor
command specifies the order only, whereas levels(<factor object>) replaces the category names
(ie the labels) with the new names.
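
The relabelling step can be sketched as follows (the factor is recreated first so that the example stands alone):

```r
occupation <- factor(c("wc", "wc", "wc", "bc", "wc", "prof"),
                     levels = c("bc", "wc", "prof"),
                     labels = c("blue collar", "white collar", "professional"))

# New names must be given in the same order as the existing levels
levels(occupation) <- c("blue.collar", "white.collar", "professional")
occupation
str(occupation)
```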

5 Indexing and arithmetic

Indexing
Just like for vectors we can select some of the elements using indexing. Here are some examples
on the occupation factor:

Factor arithmetic
Whilst each category/level is stored as a positive integer, factors are, for all intents and
purposes, character data. As such, we can’t apply arithmetic operations to them. For example:
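
A sketch of both points is given below. The particular index choices are ours.

```r
occupation <- factor(c("wc", "wc", "wc", "bc", "wc", "prof"),
                     levels = c("bc", "wc", "prof"))

occupation[2]                    # second policyholder
occupation[c(1, 4)]              # first and fourth policyholders
occupation[-1]                   # all but the first
occupation[occupation == "wc"]   # only the white collar policyholders

occupation + 1                   # NA for every element, with a warning
```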

6 Ordered factors
The categorical data we’ve been looking at in this chapter so far (gender and occupation) has no
intrinsic order to it. As such, if we try to compare policyholders using < or >, R returns NA with
a warning rather than a meaningful answer. For example, comparing the first and second
policyholder’s occupations (white.collar and white.collar) gives the following:

This kind of categorical data is often called nominal data.
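
As a sketch of this behaviour (using the abbreviated occupation levels for brevity), equality tests still work on nominal factors but ordering comparisons do not:

```r
occupation <- factor(c("wc", "wc", "wc", "bc", "wc", "prof"),
                     levels = c("bc", "wc", "prof"))

occupation[1] == occupation[2]   # TRUE: testing for equality is allowed
occupation[1] < occupation[2]    # NA, with a warning that < is not meaningful
```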

However, some categorical data do have an inherent order. For example, we might have the
categories small, medium or large, which have the following order:

small < medium < large

Or the categories strongly disagree, disagree, all the way up to strongly agree:

strongly disagree < disagree < neutral < agree < strongly agree

This kind of categorical data is called ordinal data and, in R, we store ordinal data in an ordered
factor. To do this we use the factor command as before with the optional argument “ordered”
set to TRUE (by default if it’s omitted it is set to FALSE which gives us nominal data).

factor(<vector object>, levels=<levels vector>, labels=<labels vector>, ordered = TRUE)

Suppose our six policyholders are asked to describe their general health and the ordered
categories are poor, average and good:

good poor average average good good

We can put these in an ordered factor object, which we’ll call health, as follows:

However, because R sets the levels alphabetically by default, the order it gives is not the most
sensible:

It says average < good < poor. So poor is the best health category! The lesson here is to always
specify the desired order of the levels!

Re‐entering the command with the levels option and the ordered option:

We can see that they are now in the correct ascending order poor < average < good.
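
The two versions of the health factor can be sketched as follows. The name health.raw is a hypothetical helper of ours.

```r
health.raw <- c("good", "poor", "average", "average", "good", "good")

# Without levels, the order is alphabetical: average < good < poor
health <- factor(health.raw, ordered = TRUE)
levels(health)

# Specifying levels gives the intended order: poor < average < good
health <- factor(health.raw, levels = c("poor", "average", "good"),
                 ordered = TRUE)
health
```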

Comparing elements from ordered factors


Now the order has been specified, it makes sense to compare elements in a factor.
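
For example, a few comparisons on the health factor might look like this (a sketch; the choice of comparisons is ours):

```r
health <- factor(c("good", "poor", "average", "average", "good", "good"),
                 levels = c("poor", "average", "good"), ordered = TRUE)

health[1] > health[2]         # good > poor: TRUE
health[3] < health[4]         # average < average: FALSE
health[health >= "average"]   # everyone in average health or better
```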

7 Summary

Key terms
Factor A special type of vector used for storing categorical data

Categorical data Data which can only take one of a number of specified categories, eg
gender taking only Male or Female

Levels The categories that data can take in a factor

Labels Names given to a factor’s levels

Key commands
factor(<vector object>) Turns a vector into a factor. Has optional arguments of levels,
labels and ordered

is.factor(<object>) Logical test of whether <object> is a factor

Returns TRUE or FALSE.

levels(<factor object>) Displays the levels of a factor object

nlevels(<factor object>) Displays the number of levels of a factor object

8 Have a go
You will only get proficient at R by practising.

1. Create an ordered factor, results, containing some maths test results from 7 students:

(A, C, C, E, D, B, B)

Label the grades A‐E as Excellent, Good, Average, Below Average and Poor.

Hint: remember the lowest category goes first.

2. Use a command in R to check that the second student performed better than the fifth
student.

All study material produced by ActEd is copyright and is sold
for the exclusive use of the purchaser. The copyright is
owned by Institute and Faculty Education Limited, a
subsidiary of the Institute and Faculty of Actuaries.

Unless prior authority is granted by ActEd, you may not hire
out, lend, give out, sell, store or transmit electronically or
photocopy any part of the study material.

You must take care of your study material to ensure that it
is not used or copied by anybody else.

Legal action will be taken if these terms are infringed. In
addition, we may seek to take disciplinary action through
the profession or through your employer.

These conditions remain in force after you have finished
using the course.

Matrices

Covered in R13

• Creating matrices
• Naming matrices
• Indexing matrices
• Matrix arithmetic
• Other matrix functions

1 Creating matrices
Recall from a previous chapter that a vector is a one‐dimensional ordered collection of data of the
same type (numeric, character, logical, complex or raw). In this chapter we look at matrices.

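A matrix is a two‐dimensional ordered collection of data of the same type. For example:

\[
\begin{pmatrix} 3 & 2.1 \\ 4.9 & 8.6 \end{pmatrix}
\qquad
\begin{pmatrix} \text{"barry"} & \text{"alice"} \\ \text{"harry"} & \text{"belinda"} \\ \text{"larry"} & \text{"chelsea"} \end{pmatrix}
\qquad
\begin{pmatrix} \text{TRUE} & \text{TRUE} & \text{FALSE} \\ \text{FALSE} & \text{TRUE} & \text{FALSE} \end{pmatrix}
\]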

We can find out whether an object is a matrix by using is.matrix(<object>). The dimensions are
called rows and columns and the numbers of each can be found by using the nrow(<matrix
object>) and ncol(<matrix object>), respectively. Alternatively, you could use the dimensions
command dim(<object>) to get both the number of rows and columns.

To create a matrix we use the matrix( ) command:

matrix(<data>, nrow = .. , ncol = ..)

Suppose we want to create a 2×2 matrix called A containing the values 3, 2, 4 and 1. We’ll need
to use the concatenate, c( ), function in the first argument to let R know these 4 values are the
data. Otherwise it will think we have one value 3 in a matrix with 2 rows and 4 columns!

A <- matrix(c(3,2,4,1), nrow = 2, ncol = 2)

Or as long as we enter the arguments in the correct order we can omit their names:

A <- matrix(c(3,2,4,1), 2, 2)

Notice how by default R will fill the elements of the matrix by column:

This is because R thinks of a matrix as a collection of vectors of the same length.
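
Printing A shows the column-wise filling:

```r
A <- matrix(c(3, 2, 4, 1), nrow = 2, ncol = 2)
A
#      [,1] [,2]
# [1,]    3    4
# [2,]    2    1
```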

To specify that we want to fill the elements of the matrix by row we need to specify another
argument of the matrix function, byrow, as TRUE.

To create a matrix B with these same data values but filled by row, we would type:

B <- matrix(c(3,2,4,1), nrow = 2, ncol = 2, byrow = TRUE)

Or again, as long as we enter the arguments in the correct order we can omit their names:

Let’s look at the properties of matrix A:

Recall that another way of obtaining all this information in one go is to use the str(<object>)
command which displays the structure of an R object:

It says it is numeric data, gives the dimensions (rows, columns) and the contents listed in column
order (in this case all of the contents, but for larger objects it will only give some of them).
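
The unnamed form of the byrow call and the property checks can be sketched as:

```r
A <- matrix(c(3, 2, 4, 1), 2, 2)
B <- matrix(c(3, 2, 4, 1), 2, 2, TRUE)   # fourth argument is byrow
B                # rows (3, 2) and (4, 1)

is.matrix(A)     # TRUE
nrow(A)          # 2
ncol(A)          # 2
dim(A)           # 2 2
str(A)           # num [1:2, 1:2] 3 2 4 1
```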

Coercing
Since a matrix is a collection of data of the same type, if we try to create a matrix of different data
types, R will try to coerce them all to the same type. For example:

In this case it has converted the logical data into numeric data (TRUE becomes 1 and FALSE
becomes 0) so that it is now a matrix of numeric data.
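
A minimal sketch of coercion in a matrix, using values and an object name (C) of our own choosing:

```r
# Numeric and logical values mixed together
C <- matrix(c(1, 2, TRUE, FALSE), nrow = 2, ncol = 2)
C            # rows (1, 1) and (2, 0)
str(C)       # num [1:2, 1:2] 1 2 1 0
```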

Alternative ways of specifying the elements


Just like we did with vectors, we can specify the elements using the consecutive integer function
a:b or the sequence function seq(from, to, by, length) or the simulation functions such as runif(n)
or rnorm(n):
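
For instance (a sketch with object names m1, m2 and m3 of our own):

```r
m1 <- matrix(1:6, nrow = 2, ncol = 3)                # consecutive integers
m2 <- matrix(seq(2, 12, by = 2), nrow = 2, ncol = 3) # a sequence
m3 <- matrix(runif(4), nrow = 2, ncol = 2)           # simulated U(0,1) values
m1
m2
m3
```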

Creating matrices from other objects


We can create matrices from vectors or other matrices. We can do this using the matrix function.
This will take the elements from the objects and use them in order regardless of the length of the
vectors or the sizes of the matrices. Again, by default it will assign the elements in columns:

So in the above example we can see that all 4 elements from Matrix A have been read in columns
and then used to fill up the first column of the new matrix. Likewise all 4 elements from Matrix B
form the second column.

Here’s an example where we form a matrix from 2 vectors:
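
Both cases can be sketched as follows. The vectors x and y, and the object names AB and xy, are hypothetical examples of ours.

```r
A <- matrix(c(3, 2, 4, 1), 2, 2)
B <- matrix(c(3, 2, 4, 1), 2, 2, byrow = TRUE)

# Elements of A (read by column), then elements of B (read by column)
AB <- matrix(c(A, B), nrow = 4, ncol = 2)
AB          # first column 3 2 4 1, second column 3 4 2 1

x <- c(1, 2, 3)
y <- c(4, 5, 6)
xy <- matrix(c(x, y), nrow = 3, ncol = 2)
xy          # x forms the first column and y the second
```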

Creating matrices using cbind and rbind


An alternative method of creating matrices from vectors or other matrices is to use the column
bind (cbind) or row bind (rbind) functions. This binds the columns or rows together to form a new
matrix. First let’s use column bind:

We can see the new matrix is made up of the two original matrices stuck side by side (ie by column).
Similarly when we column bind two vectors:

We can see that the vectors are combined side by side (ie by columns) to form a matrix. This is
what we would expect as they are column vectors even though they are displayed horizontally.
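
A sketch of column binding, where x and y are hypothetical vectors of ours:

```r
A <- matrix(c(3, 2, 4, 1), 2, 2)
B <- matrix(c(3, 2, 4, 1), 2, 2, byrow = TRUE)
AB <- cbind(A, B)
AB             # a 2 x 4 matrix: rows (3, 4, 3, 2) and (2, 1, 4, 1)

x <- c(1, 2, 3)
y <- c(4, 5, 6)
cbind(x, y)    # a 3 x 2 matrix with x and y as its columns
```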

Let’s now use row bind on the two matrices:

We can see the new matrix is made up of the two original matrices stacked one on top of the
other (ie by row).

Combining the two vectors using row bind gives:

We can see that the vectors have been treated as row vectors and put on top of each other.

We can also use these functions if we want to add another row or column to a matrix:
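
Row binding, and adding a row or column to an existing matrix, can be sketched as follows (x, y and the values 9, 9 are our own illustration):

```r
A <- matrix(c(3, 2, 4, 1), 2, 2)
B <- matrix(c(3, 2, 4, 1), 2, 2, byrow = TRUE)
x <- c(1, 2, 3)
y <- c(4, 5, 6)

rbind(A, B)          # a 4 x 2 matrix, A on top of B
rbind(x, y)          # a 2 x 3 matrix with x and y as its rows

A.row <- rbind(A, c(9, 9))   # A with an extra row
A.col <- cbind(A, c(9, 9))   # A with an extra column
A.row
A.col
```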

2 Naming matrices
Recall that we could name each of the elements in our vectors. For matrices, we can name the
rows and columns. There are two ways of doing this. The first is to use the option dimnames in
the matrix command:

matrix(<data>, nrow = .., ncol = .. , byrow = FALSE, dimnames = list(<row names>, <col names>))

The dimension names are supplied as a “list” object because dimnames is just one argument of
the command but we need names for both dimensions. Within the list we use the concatenate, c,
command to combine the row names into one vector and the column names into another.
Finally, since the names are characters they should be enclosed in quotes.

For example suppose we want to create a matrix, N, that contains the expenditure on rent, food
and bills for two individuals A and B:

\[
N = \begin{array}{l|cc}
 & \text{A} & \text{B} \\ \hline
\text{rent} & 500 & 750 \\
\text{food} & 100 & 150 \\
\text{bills} & 75 & 100
\end{array}
\]

Using the dimnames(<matrix object>) command lists out the row and column names:

These are also displayed when using the structure command, str(<object>):
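
Creating N with the dimnames option and inspecting it can be sketched as:

```r
N <- matrix(c(500, 100, 75, 750, 150, 100), nrow = 3, ncol = 2,
            dimnames = list(c("rent", "food", "bills"), c("A", "B")))
N
dimnames(N)   # a list: row names first, then column names
str(N)        # the dimnames appear as an attribute
```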

An alternative way of naming the rows and columns of a matrix is to use the rownames or
colnames functions.

For example suppose we want to now create a matrix, P, that contains the expenditure on rent,
food and bills for two different individuals C and D:

\[
P = \begin{array}{l|cc}
 & \text{C} & \text{D} \\ \hline
\text{rent} & 600 & 525 \\
\text{food} & 75 & 100 \\
\text{bills} & 100 & 150
\end{array}
\]
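
Using rownames and colnames, the matrix P could be built as in this sketch:

```r
P <- matrix(c(600, 75, 100, 525, 100, 150), nrow = 3, ncol = 2)
rownames(P) <- c("rent", "food", "bills")
colnames(P) <- c("C", "D")
P
```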

Note that if you create matrices from vector objects using the cbind or rbind functions then the
names of those vectors will be used for the columns or rows, respectively:

We see here that the names of the vectors have become the names of the columns.

And in this second case the names of the vectors have become the names of the rows.
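
A short sketch of this naming behaviour, with hypothetical vectors x and y:

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)

by.col <- cbind(x, y)
colnames(by.col)     # "x" "y"

by.row <- rbind(x, y)
rownames(by.row)     # "x" "y"
```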

3 Indexing (or subsetting) matrices


Recall that indexing is the process of selecting only some of the elements of an object. We give
the position of the element in square brackets after the name of the object.

Since matrices have two dimensions we will need to give the row and column of the element we
want. So let’s define a bigger matrix:

\[
M = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix}
\]

We can choose the element in the first row and second column by writing M[1,2]:

To display all the elements of, say, the first row, we omit the figure for the column and type M[1,]:

Similarly, to display all the elements of the second column, we omit the figure for the row and
type M[ ,2]:

Notice that even though we have selected a column, R displays it horizontally, as it does for
vectors.
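
These three selections can be sketched as:

```r
M <- matrix(1:9, nrow = 3, ncol = 3)
M[1, 2]    # element in row 1, column 2: 4
M[1, ]     # whole first row: 1 4 7
M[, 2]     # whole second column, shown horizontally: 4 5 6
```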

To display more than one row or more than one column we simply enter the multiple values using
the c( ) command. For example, to display the elements in the 1st and 2nd rows of the 3rd
column:

Or the 2nd and 3rd rows of the 1st and 3rd columns:

We could also select multiple rows or columns using a:b, the consecutive integer function:

We can use the dimension commands nrow( ) or ncol( ) to specify all the rows or columns until
the end:
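
The selections described above can be sketched as follows. The a:b and nrow/ncol examples are our own choices of rows and columns.

```r
M <- matrix(1:9, nrow = 3, ncol = 3)

M[c(1, 2), 3]         # rows 1 and 2 of column 3: 7 8
M[c(2, 3), c(1, 3)]   # rows 2 and 3 of columns 1 and 3
M[1:2, 2:3]           # rows 1 to 2 of columns 2 to 3
M[2:nrow(M), ]        # from row 2 to the last row
M[, 2:ncol(M)]        # from column 2 to the last column
```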

To specify all row/column elements except the specified ones we use a negative in front of their
positions:

We can select elements using the results of logical tests. For example we could select all the
values of M whose value is between 4 and 8 inclusive as follows:

We can see it returns all the elements from the matrix for which the test is TRUE. These are
collected together in a vector. However, it doesn’t give their original positions in the matrix.
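
Negative and logical indexing on M can be sketched as:

```r
M <- matrix(1:9, nrow = 3, ncol = 3)

M[-1, ]               # all rows except the first
M[, -c(1, 3)]         # all columns except the first and third: 4 5 6
M[M >= 4 & M <= 8]    # a vector of the matching values: 4 5 6 7 8
```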

Finally, if we have named our rows and columns, then we can select the elements using their
names. For example, using our matrix N from earlier which had row names of rent, food and bills,
and column names of A and B we have:
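
A sketch of selecting by name, recreating N so that the example stands alone:

```r
N <- matrix(c(500, 100, 75, 750, 150, 100), nrow = 3, ncol = 2,
            dimnames = list(c("rent", "food", "bills"), c("A", "B")))

N["food", "B"]    # B's expenditure on food: 150
N["rent", ]       # rent for both individuals
N[, "A"]          # all of A's expenditure
```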

4 Matrix arithmetic
We multiply or divide a non‐character matrix by a scalar as you’d expect:

\[
A = \begin{pmatrix} 3 & 1 \\ 5 & 4 \end{pmatrix}
\quad\Rightarrow\quad
2A = \begin{pmatrix} 6 & 2 \\ 10 & 8 \end{pmatrix}
\]

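We can add and subtract matrices in the usual way:

\[
A = \begin{pmatrix} 3 & 1 \\ 5 & 4 \end{pmatrix}, \quad
B = \begin{pmatrix} 4 & -2 \\ -3 & 1 \end{pmatrix}
\quad\Rightarrow\quad
A + B = \begin{pmatrix} 7 & -1 \\ 2 & 5 \end{pmatrix}, \quad
A - B = \begin{pmatrix} -1 & 3 \\ 8 & 3 \end{pmatrix}
\]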

Just like in maths, you can only perform this operation on matrices of the same dimensions:

However, unlike maths, if you add a scalar to a matrix or subtract a scalar from it, R applies the operation to each element:
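
The scalar and addition rules can be sketched as follows. The entries of A and B here, in particular the signs in B, are assumptions of ours for illustration.

```r
A <- matrix(c(3, 5, 1, 4), nrow = 2)     # rows (3, 1) and (5, 4)
B <- matrix(c(4, -3, -2, 1), nrow = 2)   # rows (4, -2) and (-3, 1), assumed

2 * A     # rows (6, 2) and (10, 8)
A + B     # rows (7, -1) and (2, 5)
A - B     # rows (-1, 3) and (8, 3)
# A + matrix(1:6, 2, 3) would fail with "non-conformable arrays"
A + 1     # 1 is added to every element: rows (4, 2) and (6, 5)
```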

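To multiply matrices you should use the operator %*% rather than * :

\[
A = \begin{pmatrix} 3 & 1 \\ 5 & 4 \end{pmatrix}, \quad
B = \begin{pmatrix} 4 & 2 \\ 3 & 1 \end{pmatrix}
\quad\Rightarrow\quad
AB = \begin{pmatrix} (3 \times 4) + (1 \times 3) & (3 \times 2) + (1 \times 1) \\ (5 \times 4) + (4 \times 3) & (5 \times 2) + (4 \times 1) \end{pmatrix}
= \begin{pmatrix} 15 & 7 \\ 32 & 14 \end{pmatrix}
\]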

Using just * multiplies the matrices on an element by element basis:
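
The difference between the two operators can be sketched as follows, assuming A has rows (3, 1), (5, 4) and B has rows (4, 2), (3, 1):

```r
A <- matrix(c(3, 5, 1, 4), nrow = 2)   # rows (3, 1) and (5, 4)
B <- matrix(c(4, 3, 2, 1), nrow = 2)   # rows (4, 2) and (3, 1)

A %*% B   # matrix product: rows (15, 7) and (32, 14)
A * B     # element by element: rows (12, 2) and (15, 4)
```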

Row and column sums


There are some handy functions available in R to calculate the row and column sums of a matrix.
These are rowSums(<matrix>) and colSums(<matrix>). Note that the capital letter in Sums is
necessary:

Transpose
To obtain the transpose of a matrix we use the t(<matrix object>) function:
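
Row sums, column sums and the transpose can be sketched on a small matrix (entries assumed by us):

```r
A <- matrix(c(3, 5, 1, 4), nrow = 2)   # rows (3, 1) and (5, 4)

rowSums(A)   # 4 9
colSums(A)   # 8 5
t(A)         # rows (3, 5) and (1, 4)
```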

5 Other matrix functions

Determinants
These can be found using, unsurprisingly, the det(<matrix object>) function:

\[
A = \begin{pmatrix} 3 & 1 \\ 5 & 4 \end{pmatrix}
\quad\Rightarrow\quad
\det A = (3 \times 4) - (1 \times 5) = 7
\]

Inverses
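We can find the inverse of a matrix using the solve(<matrix object>) function. Recall for a 2×2
matrix:

\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\quad\Rightarrow\quad
A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
\]

Hence, we have:

\[
M = \begin{pmatrix} 3 & 2 \\ 5 & 4 \end{pmatrix}
\quad\Rightarrow\quad
M^{-1} = \frac{1}{2} \begin{pmatrix} 4 & -2 \\ -5 & 3 \end{pmatrix}
\]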

The solve function can be used more generally to solve a set of equations of the form:

\[
A\mathbf{x} = \mathbf{b}
\]

For example to solve:

\[
\begin{aligned}
2x + 3y &= 14 \\
3x - 4y &= 4
\end{aligned}
\]

We write the equations in vector/matrix form as follows:

\[
A\mathbf{x} = \mathbf{b}: \qquad
\begin{pmatrix} 2 & 3 \\ 3 & -4 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 14 \\ 4 \end{pmatrix}
\]

Then:

\[
\mathbf{x} = A^{-1}\mathbf{b}: \qquad
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 2 & 3 \\ 3 & -4 \end{pmatrix}^{-1}
\begin{pmatrix} 14 \\ 4 \end{pmatrix}
= \begin{pmatrix} 4 \\ 2 \end{pmatrix}
\]

To solve this in R we would type solve(A,b):
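
A sketch of this call for the system 2x + 3y = 14, 3x − 4y = 4 (the name sol is our own):

```r
A <- matrix(c(2, 3, 3, -4), nrow = 2)   # rows (2, 3) and (3, -4)
b <- c(14, 4)
sol <- solve(A, b)
sol          # x = 4, y = 2
A %*% sol    # reproduces b
```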

Eigenvalues and eigenvectors


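Recall that if $A\mathbf{v} = \lambda\mathbf{v}$ where $\lambda$ is a constant, then $\lambda$ is an eigenvalue and $\mathbf{v}$ is an eigenvector of
matrix $A$.

We can obtain the eigenvalues by finding the values of $\lambda$ for which $\det(A - \lambda I) = 0$.

\[
A = \begin{pmatrix} 2 & -1 \\ 2 & 5 \end{pmatrix}
\]

\[
\det \begin{pmatrix} 2 - \lambda & -1 \\ 2 & 5 - \lambda \end{pmatrix} = 0
\]

\[
(2 - \lambda)(5 - \lambda) + 2 = 0
\]

\[
\lambda^2 - 7\lambda + 12 = 0
\]

\[
(\lambda - 3)(\lambda - 4) = 0 \quad\Rightarrow\quad \lambda = 3, 4
\]

For $\lambda = 3$ this means we have:

\[
\begin{pmatrix} -1 & -1 \\ 2 & 2 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix} -x - y \\ 2x + 2y \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\quad\Rightarrow\quad x = -y
\]

So the eigenvectors corresponding to $\lambda = 3$ are of the form $k \begin{pmatrix} 1 \\ -1 \end{pmatrix}$.

For $\lambda = 4$ this means we have:

\[
\begin{pmatrix} -2 & -1 \\ 2 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix} -2x - y \\ 2x + y \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\quad\Rightarrow\quad y = -2x
\]

So the eigenvectors corresponding to $\lambda = 4$ are of the form $k \begin{pmatrix} 1 \\ -2 \end{pmatrix}$.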

To obtain the eigenvalues and eigenvectors in R we use the function eigen(<matrix object>):

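Notice that it gives the “normalised” version of the eigenvectors – that is, a vector with a modulus
of 1. So, up to sign, the eigenvectors are scaled to:

$$\begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \end{pmatrix}$$

that is, $k = 1/\sqrt{2}$ and $k = 1/\sqrt{5}$ respectively.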

Notice how it says $values and $vectors? If you just wanted the eigenvalues you could type:

eigen(A)$values

Similarly, if we just wanted the eigenvectors we could have typed eigen(A)$vectors.
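The following is a sketch only. It assumes a matrix A with rows 2, 1 and −2, 5, whose characteristic equation is λ² − 7λ + 12 = 0, so the eigenvalues are 3 and 4:

```r
# Assumed example matrix (rows 2, 1 and -2, 5), filled by column
A <- matrix(c(2, -2, 1, 5), nrow = 2)

e <- eigen(A)
e$values     # just the eigenvalues
e$vectors    # just the eigenvectors (one per column, each of modulus 1)

# Check the defining property A v = lambda v for the first eigenvalue
v1 <- e$vectors[, 1]
A %*% v1
e$values[1] * v1
```

R may return an eigenvector multiplied by −1 compared with a hand calculation; it is still an eigenvector of modulus 1.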

We’ll discuss this $ notation more in the next chapter on data frames.


6 Summary

Key terms
Matrix A two‐dimensional (row, column) ordered collection of data of the
same type

Concatenate Join together

Coercing The process of changing the data types so that they are all the
same

Indexing The process of selecting an element from a matrix object

Key commands
matrix(<data>, nrow = , ncol = , byrow = FALSE, dimnames = list(<row names>, <col names>))

Creates a matrix of dimensions nrow by ncol out of the data

Last two arguments are optional

By default it fills the matrix by column; set byrow=TRUE to fill it by rows

Use dimnames to give names to the rows and columns

is.matrix(<object>) Logical test of whether <object> is a matrix

Returns TRUE or FALSE

dim(<matrix>) Gives the dimensions (ie rows and columns) of <matrix>

nrow(<matrix>) Gives the number of rows of <matrix>

ncol(<matrix>) Gives the number of columns of <matrix>

str(<object>) Displays the structure of the <object>

cbind(<object1>, <object2>, …) Creates a matrix by combining the objects in columns

rbind(<object1>, <object2>, …) Creates a matrix by combining the objects in rows

rownames(<matrix>) Names the rows of the <matrix>

colnames(<matrix>) Names the columns of the <matrix>

<matrix>[<row>,<column>] Returns the element from <matrix> given in <row> and <column>.

t(<matrix>) Returns the transpose of <matrix>

det(<matrix>) Gives the determinant of <matrix>


%*% Matrix multiplication operator

solve(<matrix>) Gives the inverse of <matrix>

solve(A,b) Gives the vector x that solves Ax = b.

eigen(<matrix>) Gives the eigenvalues and eigenvectors of <matrix>

rowSums(<matrix>) Displays a vector of the sums of each row

colSums(<matrix>) Displays a vector of the sums of each column


7 Have a go
You will only get proficient at R by practising.

1. Create a matrix C:

$$C = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{pmatrix}$$

(a) using the c function

(b) using the colon function

(c) using the seq function

2. Use an appropriate function (other than matrix( )) to create a new matrix, D , which is the
same as matrix C but with an additional row containing the elements 13, 14, 15, 16.

3. What type of matrix (numeric, logical, character) would be formed from each of the
following:

(a) matrix(c(3, FALSE, 1, TRUE), 2, 2)

(b) matrix(c(TRUE, "bob", FALSE, "harry"), 2, 2)

(c) matrix(c("larry", 7, 2, -1), 2, 2)

(d) matrix(c(5, TRUE, "ginger", 0), 2, 2)

Check your answers using the is.numeric, is.logical, is.character functions.

4. (a) Create the following named matrix of temperatures using the dimnames option:

                 Mon    Tue
    London        18     20
T = Paris         20     19
    Stockholm     15     13

(b) Create the following named matrix of indices using the rownames and colnames
functions:

                 Mon         Tue
    FTSE          6,125.7     6,078.2
T = Dow Jones    17,140.2    17,160.7
    Nikkei       15,323.1    15,323.8


5. Use indexing on the matrix C from Q1 to display:

(a) the element 7

(b) the third row

(c) the first column

(d) all but the 2nd row

(e) a 2×2 matrix of the bottom right $\begin{pmatrix} 7 & 8 \\ 11 & 12 \end{pmatrix}$

(f) the 2nd and 3rd columns of the 1st and 2nd rows

(g) all elements which are smaller than 10.

6. Create the following matrix in R:

$$M = \begin{pmatrix} 1 & 3 \\ 2 & 5 \\ 0 & 9 \end{pmatrix}$$

(a) Use R to find its dimensions, $3M$, $M^T$ and $MM^T$.

(b) Perform an operation in R to subtract 3 from each element in matrix M .

(c) Use R to obtain the column sums and row sums for matrix M .

7. Create the following matrices in R:

$$A = \begin{pmatrix} 1 & 9 \\ 2 & 10 \end{pmatrix} \qquad B = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$

(a) Use R to calculate $AB$ and $BA$. Hence show that matrix multiplication is not commutative
(ie $AB \neq BA$).

(b) Use R to find the determinant of $B$ and the inverse $B^{-1}$, and hence show that $BB^{-1} = I$.

(c) Use R to find the eigenvalues and eigenvectors of B .

8. Use matrices in R to solve the following set of simultaneous equations:

$$2p + 4q = 14$$
$$2q + 3p = 13$$



All study material produced by ActEd is copyright and is sold
for the exclusive use of the purchaser. The copyright is
owned by Institute and Faculty Education Limited, a
subsidiary of the Institute and Faculty of Actuaries.

Unless prior authority is granted by ActEd, you may not hire
out, lend, give out, sell, store or transmit electronically or
photocopy any part of the study material.

You must take care of your study material to ensure that it
is not used or copied by anybody else.

Legal action will be taken if these terms are infringed. In
addition, we may seek to take disciplinary action through
the profession or through your employer.

These conditions remain in force after you have finished
using the course.


Data frames

Covered in R14

 Creating data frames


 Indexing
 Adding a row or column to a data frame
 Names versus objects


1 Creating data frames


In previous chapters we described one‐dimensional objects: vectors (which contain numeric,
character or logical data) and factors (which contain categorical data – either ordinal or nominal).
We also looked at matrices which were two‐dimensional objects. Recall that all of these objects
had to contain data of the same type. So a vector could not, for example, contain numbers and
characters.

In this chapter we look at data frames which are two‐dimensional objects like matrices. However,
they can contain different types of data. Essentially a data frame is a collection of column
vectors/factors of the same length. The columns are the vectors/factors which contain data for a
single variable (eg policyholder’s names, ages and smoker status). The rows represent a single
observation (eg a single policyholder).

Alfie      34   TRUE
Belinda    28   FALSE
Charlie    31   FALSE
Delilah    38   TRUE

So we can see that whilst each column (ie vector/factor) contains data of the same type, the
different columns (ie vectors/factors) can be different data types. So in the example above we
have character data for the first column, numeric data for the second column and logical data for
the third column.

You can find out whether an object is a data frame by using is.data.frame(<object>). The
dimensions are called rows and columns and the numbers of each can be found by using the
nrow(<data frame object>) and ncol(<data frame object>) commands respectively. Alternatively,
you could use the dimensions command dim(<object>) to get both the number of rows and
columns.

In this section we look at how to create a data frame from scratch. This will be rather long‐
winded and so you will often use scripts to prevent laborious retyping. In reality we will usually
import data from a spreadsheet or other source to create a data frame. We cover this in the next
chapter.

To create a data frame from scratch we use the data.frame( ) command. This will create a data
frame out of a list of vectors (or other data frames):

data.frame(<vector1>, <vector2>, ….)

As already mentioned, vectors don’t have to contain the same data type but they do need to be
the same length. If not, then R will recycle/extend the shorter ones to make them the same
length as the longest vector.

We could either create the vectors separately and then put them together in a data frame or we
can create them inside this function. We’ll look at both to show you the pros and cons of each
approach.


Let’s create a data frame for the example mentioned above:

Alfie      34   TRUE
Belinda    28   FALSE
Charlie    31   FALSE
Delilah    38   TRUE

To do this and store it in object A, we would type:

A <- data.frame(c("Alfie", "Belinda", "Charlie", "Delilah"), c(34, 28, 31, 38),
                c(TRUE, FALSE, FALSE, TRUE))

If you prefer to enter this function over two lines, simply continue it on the next
line.

However, when R displays this data frame it is not pretty. The reason is that a data frame
automatically takes its column names from the vectors. This is great if we create a data frame
out of existing vectors (as we shall see a bit later) but if, as we did above, we create it from
scratch it causes rather messy names.

We can name the vectors either within the data.frame( ) command itself or separately using the
colnames( ) command or the names( ) command (which assumes you’re talking about columns
when applying it to a data frame).

To define the names inside the data.frame( ) command is similar to naming elements in a vector.
We put the name equal to the vector as follows:

data.frame(<name1>=<vector1>, <name2>=<vector2>, ….)


Let’s create this data frame again but this time name the columns – name, age and smoker:

A <- data.frame(name = c("Alfie", "Belinda", "Charlie", "Delilah"), age = c(34, 28, 31, 38),
                smoker = c(TRUE, FALSE, FALSE, TRUE))

This is much more beautiful.
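As a sketch, the named version can be created and inspected as follows (the data are those of the example above):

```r
# Name each column as it is created
A <- data.frame(name   = c("Alfie", "Belinda", "Charlie", "Delilah"),
                age    = c(34, 28, 31, 38),
                smoker = c(TRUE, FALSE, FALSE, TRUE))

A           # display the data frame
names(A)    # the column names
```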

Had we wished to do this via the colnames or names functions we would have typed one of the
following:

colnames(A) <- c("name", "age", "smoker")

names(A) <- c("name", "age", "smoker")

Note that the numbers 1, 2, 3, 4 down the lefthand side of the data frame are not index values
referring to the first, second, third and fourth rows but the names of the rows and are, therefore,
characters “1”, “2”, “3”, “4”. We can rename them using the rownames( ) function. However,
there is little point in this case as the names of the individuals are included in the data frame
itself.
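A minimal sketch of naming the columns after the event, and of checking that the row names really are characters:

```r
# Created without names, so R generates (messy) column names automatically
A <- data.frame(c("Alfie", "Belinda", "Charlie", "Delilah"),
                c(34, 28, 31, 38),
                c(TRUE, FALSE, FALSE, TRUE))
names(A)

# Assign tidy column names afterwards
colnames(A) <- c("name", "age", "smoker")
names(A)

# The row names are the characters "1", "2", "3", "4"
rownames(A)
is.character(rownames(A))
```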

Properties
Let’s look at the properties of the data frame A:


And let’s look at its names:

Recall that another way of obtaining (nearly) all this information in one go is to use the
str(<object>) command which displays the structure of an R object:

So the data frame is the default object for data input which, like our example, assumes that the
columns are the variables we are observing and the rows are the observations. Hence it says that
we have 4 observations of 3 variables.
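The commands discussed in this section can be sketched as follows, using the data frame A from the example above:

```r
A <- data.frame(name   = c("Alfie", "Belinda", "Charlie", "Delilah"),
                age    = c(34, 28, 31, 38),
                smoker = c(TRUE, FALSE, FALSE, TRUE))

is.data.frame(A)   # is it a data frame?
dim(A)             # number of rows and columns
nrow(A)            # number of rows (observations)
ncol(A)            # number of columns (variables)
names(A)           # the column names
str(A)             # structure: observations, variables and their types
```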

The names of the variables are given at the start of each line, followed by their data type,
followed by their contents (or some of the contents if there are too many to display).

We can see that age is numeric data and smoker status is logical; however, the names are stored
not as characters but as factors. The reason for this is that, in the version of R described here,
data.frame( ) converts character data to factors unless we tell it otherwise (from R 4.0.0 onwards
the default is stringsAsFactors = FALSE, so character data stays as characters). The older default is
understandable as much of the data we observe is categorical (eg policy type, car type, postcode,
etc). We can turn this feature off by setting the stringsAsFactors option in the data frame to
FALSE (instead of TRUE).

A <- data.frame(name = c("Alfie", "Belinda", "Charlie", "Delilah"), age = c(34, 28, 31, 38),
                smoker = c(TRUE, FALSE, FALSE, TRUE), stringsAsFactors = FALSE)


When we do this we can see that the structure command now lists names as character data:
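To see the effect regardless of which version of R is installed, the option can be set explicitly both ways. This is a sketch; the object names A_factor and A_char are made up for illustration:

```r
people <- c("Alfie", "Belinda", "Charlie", "Delilah")

# Character data converted to factors
A_factor <- data.frame(name = people, age = c(34, 28, 31, 38),
                       stringsAsFactors = TRUE)
# Character data left as characters
A_char <- data.frame(name = people, age = c(34, 28, 31, 38),
                     stringsAsFactors = FALSE)

class(A_factor$name)
class(A_char$name)
str(A_char)
```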

Creating data frames from other objects


Just like we could create a matrix from vectors, we can also create a data frame from vectors.
Let’s demonstrate this by creating vectors containing the heights and weights of the four
individuals we used in our previous data frame.

We can see that, by default, the names of the vectors will be the column names.
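A sketch of this step follows. The heights and weights below are invented purely for illustration, as the original values are not reproduced here:

```r
# Hypothetical heights (cm) and weights (kg) for the four individuals
height <- c(180, 165, 175, 170)
weight <- c(85, 60, 78, 63)

B <- data.frame(height, weight)
B
names(B)   # the vector names have become the column names
```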


Similarly we can create a new data frame from other data frames. Let’s now combine our two
data frames A and B together:

Since the arguments of the data frame function are the (column) vectors, it combines the two
data frame objects side by side (ie as new columns).
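As a sketch (the height and weight values are assumptions made up for illustration), combining two data frames might look like this:

```r
A <- data.frame(name   = c("Alfie", "Belinda", "Charlie", "Delilah"),
                age    = c(34, 28, 31, 38),
                smoker = c(TRUE, FALSE, FALSE, TRUE))
# Hypothetical heights (cm) and weights (kg)
B <- data.frame(height = c(180, 165, 175, 170),
                weight = c(85, 60, 78, 63))

C <- data.frame(A, B)   # the columns of B are placed to the right of those of A
C
dim(C)
```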

Creating data frames using cbind and rbind


Just as we did in the matrix chapter we can use column bind (cbind) or row bind (rbind) functions
to combine vectors together. However, by default this combines vectors to form a matrix (with
coercion, if necessary). But if we cbind or rbind a data frame with another object (such as a
vector) then it creates a data frame instead of a matrix. We’ll look at this later in the chapter.

Coercing
Since each column of a data frame is a vector then each column must contain data of the same
type. So if there is a mix of different data types in a column then R will try to coerce them all to
the same type.

For example suppose we create a data frame, D, with two columns, c1 and c2, as follows:

We can see that in the first column R has converted the logical data into numeric data (TRUE
becomes 1 and FALSE becomes 0) so that it is now a vector of numeric data. We can see that the
second column has been converted to factors. Had we specified stringsAsFactors = FALSE
then it would have been coerced into a character vector. Try it out to see for yourself.
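A sketch of such a data frame follows. The contents of c1 and c2 are assumptions chosen to mix data types, and stringsAsFactors is set explicitly so that the result does not depend on the version of R:

```r
# c1 mixes numeric and logical data; c2 mixes character, numeric and logical data
D <- data.frame(c1 = c(5, TRUE, 0, FALSE),
                c2 = c("a", 1, TRUE, "b"),
                stringsAsFactors = TRUE)
D
str(D)   # c1 is numeric, c2 is a factor

# With stringsAsFactors = FALSE the second column is a character vector
D2 <- data.frame(c1 = c(5, TRUE, 0, FALSE),
                 c2 = c("a", 1, TRUE, "b"),
                 stringsAsFactors = FALSE)
str(D2)
```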


Recycling
Since a data frame is a collection of column vectors/factors of the same length then if we try to
create a data frame from vectors of differing lengths then it will recycle shorter vectors. The
following data frame is made from 3 vectors (v1, v2 and v3) of lengths 1, 2 and 4:

Since the length of the longest vector is 4, the shorter vectors have been recycled (ie extended by
repetition) until they are also of length 4.

However, suppose that the length of the longest vector is not a multiple of the length of one or
more of the other vectors (eg lengths 1, 2 and 5). When we did vector arithmetic R displayed a
warning message but still performed the operation. However, for data frames it displays an error
message and stops:
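Both cases can be sketched as follows. The contents of the vectors are made up for illustration, and tryCatch is used only so that the error message can be displayed without halting a script:

```r
v1 <- 1                      # length 1
v2 <- c("a", "b")            # length 2
v3 <- c(10, 20, 30, 40)      # length 4

E <- data.frame(v1, v2, v3)  # v1 and v2 are recycled to length 4
E

v5 <- 1:5                    # length 5 is not a multiple of 2
result <- tryCatch(data.frame(v1, v2, v5),
                   error = function(e) conditionMessage(e))
result                       # the error message, captured as text
```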


2 Indexing (or subsetting) data frames


Recall that indexing is the process of selecting only some of the elements of an object. We give
the position of the element in square brackets after the name of the object.

Since data frames have two dimensions we will need to give the row and column of the element
we want. Let’s apply this to our first data frame object A:

Alfie      34   TRUE
Belinda    28   FALSE
Charlie    31   FALSE
Delilah    38   TRUE

We can choose the element in the first row and second column by writing A[1,2]:

To display all the elements of, say, the first row, we omit the figure for the column A[1,]:

Similarly, to display all the elements of the second column, we omit the figure for the row A[ ,2]:

Notice that even though we have selected a column, R displays it horizontally, as it does for
vectors.
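These three selections can be sketched as follows, using the data frame A from the start of the chapter:

```r
A <- data.frame(name   = c("Alfie", "Belinda", "Charlie", "Delilah"),
                age    = c(34, 28, 31, 38),
                smoker = c(TRUE, FALSE, FALSE, TRUE))

A[1, 2]    # first row, second column
A[1, ]     # all of the first row (still a data frame)
A[, 2]     # all of the second column (returned as a vector)
```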


To display more than one row or more than one column we simply enter the multiple values using
the c( ) command. For example, to display the elements in the 1st and 2nd rows of the 3rd
column:

Or the 2nd and 3rd rows of the 1st and 3rd columns:

We could also select multiple rows or columns using the colon operator a:b, which generates consecutive integers:

We can use the dimension commands nrow( ) or ncol( ) to specify all the rows or columns until
the end:
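A sketch of selecting several rows or columns at once, again using the data frame A:

```r
A <- data.frame(name   = c("Alfie", "Belinda", "Charlie", "Delilah"),
                age    = c(34, 28, 31, 38),
                smoker = c(TRUE, FALSE, FALSE, TRUE))

A[c(1, 2), 3]          # 1st and 2nd rows of the 3rd column
A[c(2, 3), c(1, 3)]    # 2nd and 3rd rows of the 1st and 3rd columns
A[2:3, ]               # rows 2 to 3 using the colon operator
A[2:nrow(A), ]         # from row 2 to the last row
A[, 2:ncol(A)]         # from column 2 to the last column
```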


To select all rows/columns except the specified ones we put a negative sign in front of their
positions:

We can select elements using the results of logical tests. However, since our data frame consists
of different types of data we will look at how we can do logical tests just on one variable (ie
column) in a later chapter.

We can also use the names of rows or columns to select elements. In our data frame we had
variable/column names of name, age and smoker:
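Excluding positions and selecting by name can be sketched as follows:

```r
A <- data.frame(name   = c("Alfie", "Belinda", "Charlie", "Delilah"),
                age    = c(34, 28, 31, 38),
                smoker = c(TRUE, FALSE, FALSE, TRUE))

A[-1, ]                     # everything except the first row
A[, -2]                     # everything except the second column
A[, "age"]                  # the column called "age"
A[, c("name", "smoker")]    # the columns called "name" and "smoker"
```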


Subsetting using the $ notation


An alternative way of specifying a column is to use the $ notation.

<data frame>$<column name>

For example, our data frame object is called A and the first of its column names is “name”. So we
can specify that column immediately using A$name. Similarly for the other columns:

This is a very useful way of obtaining a vector object from the data frame to use in calculations (eg
mean or correlation). We’ll make use of this in a later chapter.

Note that just like with function arguments, you can abbreviate the column name to the fewest
letters that uniquely define it. Since our columns all begin with different letters we could just use
the first letters:
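A sketch of the $ notation, including abbreviated column names and a simple calculation on the resulting vector:

```r
A <- data.frame(name   = c("Alfie", "Belinda", "Charlie", "Delilah"),
                age    = c(34, 28, 31, 38),
                smoker = c(TRUE, FALSE, FALSE, TRUE))

A$name         # the name column
A$age          # the age column
mean(A$age)    # use the column as a vector in a calculation

A$a            # abbreviated: matches age
A$s            # abbreviated: matches smoker
```

Abbreviating is convenient at the Console, but writing the full column name is usually clearer in scripts.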


3 Adding new columns or rows to a data frame

Adding a new column


There are three ways of adding a new column to an existing data frame. The first is by combining
the objects using the data.frame command. We did this earlier in the chapter. In the example
below we combine our original data frame with a vector of heights to add this variable:

The second way is to use the cbind function. So we could have obtained the same result as
follows:


The third method is to use the dollar notation to define a new column. For example let’s add the
weight vector as another column to the data frame H above:
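The three methods can be sketched side by side. The height and weight values are assumptions invented for illustration, and H1 and H2 are hypothetical object names:

```r
A <- data.frame(name   = c("Alfie", "Belinda", "Charlie", "Delilah"),
                age    = c(34, 28, 31, 38),
                smoker = c(TRUE, FALSE, FALSE, TRUE))
height <- c(180, 165, 175, 170)   # hypothetical heights (cm)
weight <- c(85, 60, 78, 63)       # hypothetical weights (kg)

H1 <- data.frame(A, height)   # method 1: data.frame
H2 <- cbind(A, height)        # method 2: cbind

H <- H1
H$weight <- weight            # method 3: dollar notation
H
```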

Adding a new row


Adding a new row (ie a new observation for each of the variables) is trickier. We can’t use a
vector as it is likely that the different variables (eg name, age, smoker status, etc) are different
data types, as they are in our example.

The only way to do it is to use the row bind function on the original data frame and a new data
frame containing the observations for that individual.

Suppose we wish to add Eddie to the data frame who is aged 24, is not a smoker, is 172cm tall
with a weight of 82kg. We’ll put these results in a data frame called I and then add this row to
data frame H above:

There is a problem as the names of our new observation data frame do not match those of the
original. So we have to define the same names first, and then we can combine them:
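A sketch of this process follows. Eddie’s details are those given above, whereas the heights and weights of the other four individuals are invented for illustration:

```r
H <- data.frame(name   = c("Alfie", "Belinda", "Charlie", "Delilah"),
                age    = c(34, 28, 31, 38),
                smoker = c(TRUE, FALSE, FALSE, TRUE),
                height = c(180, 165, 175, 170),   # hypothetical values
                weight = c(85, 60, 78, 63),       # hypothetical values
                stringsAsFactors = FALSE)

# A one-row data frame holding the new observation
I <- data.frame("Eddie", 24, FALSE, 172, 82, stringsAsFactors = FALSE)

# Give it the same column names as H, then bind the rows together
names(I) <- names(H)
H <- rbind(H, I)
H
```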


4 The important difference between column names and objects


Let’s have a look at the objects we have created during this chapter:

We can see that we have the objects “height” and “weight” that were vectors we used to create
more columns in some of our data frames such as C:

Let’s look at what happens to the data frames if we change the weight vector.

You can see that the last result in the vector weight has been changed but not the values of the
weight column in the data frame.


The data frame doesn’t update its weight column when the weight vector is changed after the
data frame has been created. To change the column you would have to use the weight subset of
the data frame as follows:
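The distinction can be sketched as follows (the weights are assumptions made up for illustration):

```r
weight <- c(85, 60, 78, 63)   # hypothetical weights (kg)
C <- data.frame(name = c("Alfie", "Belinda", "Charlie", "Delilah"), weight)

weight[4] <- 65      # changes the vector object only
weight
C$weight             # the column in the data frame is unchanged

C$weight[4] <- 65    # changes the column in the data frame itself
C$weight
```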

This may seem like a silly point at the moment – but it will be very important in the next chapter
when we look at attaching data frames.


5 Summary

Key terms
Data frame A two‐dimensional (row, column) ordered collection of column
vectors/factors of the same length – each column (vector) must
contain data of the same type but different columns can have
different types.

Coercing The process of changing the data types so that they are all the
same

Indexing The process of selecting an element from a data frame object

Subsetting Another name for indexing – usually used for where more than
one element is obtained from an object

Recycling The process of extending a smaller vector by repetition to make
sense of operations with other vectors – returns a warning if the
length of the smaller vector is not a multiple of the length of the
larger vector

Key commands
data.frame(<name1>=<vector1>, <name2>=<vector2>, … , stringsAsFactors=TRUE)

Creates a data frame of dimensions nrow by ncol out of the data
with column names <name1>, <name2>, …

If no name is given the name of the vector is used

Last argument is optional

By default it assumes that all character data are factors

Use the row.names argument to give names to the rows

is.data.frame(<object>) Logical test of whether <object> is a data frame

Returns TRUE or FALSE

dim(<data frame>) Gives the dimensions (ie rows and columns) of <data frame>

nrow(<data frame>) Gives the number of rows of <data frame>

ncol(<data frame>) Gives the number of columns of <data frame>

class(<object>) Gives the class of an object

str(<object>) Displays the structure of the <object>

cbind(<object1>, <object2>, …) Creates a data frame by combining the objects in columns (provided at least one of them is a data frame)


rbind(<object1>, <object2>, …) Creates a data frame by combining the objects in rows (provided at least one of them is a data frame)

names(<data frame>) Gives the names of the column vectors in the <data frame>

Can also be used to assign column names:

names(<data frame>) <- c(<name1>, <name2>, …)

rownames(<data frame>) Gives the names of the rows of the <data frame>

Can also be used to assign row names:

rownames(<data frame>) <- c(<name1>, <name2>, …)

colnames(<data frame>) Gives the names of the column vectors in the <data frame>

Can also be used to assign column names:

colnames(<data frame>) <- c(<name1>, <name2>, …)

<data frame>[<row>,<column>]
Returns the element from <data frame> given in <row> and
<column>

Omitting one of the values returns the whole row/column


6 Have a go
You will only get proficient at R by practising.

1. Create a data frame, cars, containing the following data:

Ford     Red      9250
Skoda    Blue     6500
VW       Silver   2300

2. Label the columns, Make, Colour and Price.

3. Add an additional column called Year containing the data: 2017, 2015, 2012.

4. Display only the colours from the data frame.

5. Display all the data except the colours.

You will have ample opportunity to create and manipulate data frames in your study of CS1 or
CS2.


Importing data

Covered in R15

 Overview
 Using datasets in packages
 Importing data frames from other programs
 Importing data from CSV files
 Importing data from Excel files and elsewhere


1 Overview
In the last chapter we looked at the data frame, which is going to be the standard object used for
most data analysis.

In this chapter we look at importing data from various sources into R.

There are three fundamental ways of getting data into R:


 manually
 using datasets included in R’s packages
 importing it from another program such as Excel.

We have already covered how to manually enter data into vectors or data frames in previous
chapters. So this chapter will cover the other two methods.


2 Using datasets in packages

Inbuilt datasets
It may be helpful to review Chapter 8 before reading this section.

We’re going to look at the datasets package which, unsurprisingly, contains a variety of datasets.

To find out more about the contents of this package we can click on its name in the Packages
window.


Clicking on the links will take us to the help page on that specific dataset. For example, clicking on
rivers gives:

Accessing inbuilt datasets


Since these inbuilt datasets are already loaded into R’s workspace we can access them by simply
typing their name:

Here R displays the vector of 141 observations (lengths in miles of 141 major rivers in North
America).
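As a sketch, the dataset can be inspected directly, since the datasets package is available by default:

```r
length(rivers)    # number of observations
head(rivers)      # the first few river lengths (in miles)
summary(rivers)   # a quick numerical summary
```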


Datasets from other packages


Other packages often contain datasets.

Let’s load up the MASS package (using the tick box in the Packages window).

If we use library(help=MASS) or help(package="MASS") we can see that the MASS package
contains lots of datasets (87 in fact).

Let’s take a look at one of these, Cars93, which contains data from the different types of cars sold
in the USA in 1993. From the help page we can see it is a data frame with 93 types of cars (rows)
and 27 variables (columns):

Once we’ve loaded the package into the workspace we can access the dataset simply by typing its
name:


Because of the size of this data frame it is not possible to display it easily in the Console. Using
the structure command can tell us about the different variables (columns):
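The same steps can be sketched from the Console rather than the tick box. This assumes the MASS package is installed (it normally comes with R); the check with requireNamespace is there only so that the sketch does nothing if it is not:

```r
if (requireNamespace("MASS", quietly = TRUE)) {
  library(MASS)                 # load the package, as ticking its box would
  print(dim(Cars93))            # rows (types of car) and columns (variables)
  str(Cars93[, 1:5])            # structure of the first five variables only
}
```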

It’s easier to view the dataset by selecting it in the Environment window. First select the package
by changing Global Environment to package:MASS using the dropdown menu:

Then select the dataset:


This will open up a new window displaying the dataset in a grid:


3 Importing data frames from other programs


We can import data from many sources, including:
 text documents
 CSV files
 Excel
 other statistical packages such as SAS or SPSS.

Importing data from text files


Let’s create a simple text document. Open up Notepad on your machine and enter the numbers 1
to 5 on separate lines. Note that you need to hit return after entering the last number:

Save this file in your working directory as “Data.txt”.

You can import datasets in RStudio using the Environment window. Click on Import Dataset and
select the first option: From Text (base).


Open the text file you have just created and RStudio will open a window where you can select
some options to use for the import (feel free to experiment with the options now or later).

We can see that it has placed a V1 above the numbers. That’s because R places the data by
default into a data frame and has given the column the name “V1”. When you click on Import
RStudio will display the data in a window and you’ll also see the object listed in the Environment
window.

Let’s now create a three‐column dataset in Notepad. We can separate the columns with spaces or
tabs – both will work. Let’s save this dataset as “data2”:


We will now load this into object C (by changing the Name in the Import window):

Again, we can see that it is placed in a data frame and by default it has column headings of V
followed by the column number.

Suppose we want to have column headings “name”, “age” and “smoker”. One way to do this is
to use the colnames (or names) function like we did in Chapter 14. For example:

colnames(C) <- c("name", "age", "smoker")
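The import can also be sketched entirely in code. To keep the sketch self-contained it writes a small made-up file to a temporary location first (the file contents and the use of tempfile are assumptions for illustration):

```r
# Write a small three-column text file, with columns separated by spaces
path <- tempfile(fileext = ".txt")
writeLines(c("Alfie 34 TRUE", "Belinda 28 FALSE", "Charlie 31 FALSE"), path)

C <- read.table(path)   # no headings in the file, so columns are named V1, V2, V3
names(C)

colnames(C) <- c("name", "age", "smoker")
C
```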

An alternative way is to add the column names in the original text document (calling this
data3.txt):


When we reach the Import window we can use the Heading option.

By default the row names are “1”, “2”, “3”, etc. We could change these using the rownames
function like we did in Chapter 14. Alternatively, we could use the row names option in the
window above and use the first column of data.

Let’s look at the structure of this data frame:

Just like in the previous chapter, R assumes non‐numeric data are factors unless we specify
otherwise. We can do this by unchecking the Strings as Factors box in the Import window.
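The options in the Import window correspond to arguments of read.table. A sketch, using a small made-up file with headings written to a temporary location:

```r
path <- tempfile(fileext = ".txt")
writeLines(c("name age smoker",
             "Alfie 34 TRUE",
             "Belinda 28 FALSE",
             "Charlie 31 FALSE"), path)

# Heading option ticked, Strings as Factors unticked
D <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
str(D)

# Using the first column of data as the row names
D2 <- read.table(path, header = TRUE, row.names = 1)
rownames(D2)
```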


Missing data
Missing data is often an issue. R uses the logical value NA to tell its functions that the data is
missing. However, our text file might use something else. In which case we need to tell R what it
is. We do this via the na.strings option in the Import window. For example, we have specified
“n/a” and R will replace any occurrences of this with NA:

If you keep an eye on the Console when you import data you will see that R is using the read.table
(or possibly the read.delim) function. For example, the last instruction might have read:

Data5 <- read.table("~/R/Data3.txt", quote="\"", comment.char="", na.strings="n/a")

If the dataset contained more than one way of indicating missing data, we could adjust this line of
code to include the alternatives. For example, here we are looking for "n/a" and "-":

Data5 <- read.table("~/R/Data3.txt", quote="\"", comment.char="", na.strings=c("n/a", "-"))
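A self-contained sketch of the same idea, using a small made-up file in which missing values are marked in two different ways:

```r
path <- tempfile(fileext = ".txt")
writeLines(c("name age smoker",
             "Alfie 34 TRUE",
             "Belinda n/a FALSE",
             "Charlie 31 -"), path)

# Treat both "n/a" and "-" as missing
E <- read.table(path, header = TRUE, na.strings = c("n/a", "-"))
E
is.na(E$age)      # which ages are missing
is.na(E$smoker)   # which smoker statuses are missing
```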


4 Importing data from CSV files


CSV stands for Comma Separated Value and is to Excel what Notepad is to Word. Essentially it’s a
stripped down Excel file that removes all the formatting and separates the values in the cells with,
unsurprisingly, commas.

We can import CSV files in the same way as other text files. RStudio will usually recognise the file
type and automatically change the Separator option to “Comma”. For example:

All the other features are the same as before.
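From the Console, a CSV file might be read with read.csv, which is read.table with the separator set to a comma and headings expected by default. A sketch using a small made-up file:

```r
path <- tempfile(fileext = ".csv")
writeLines(c("name,age,smoker",
             "Alfie,34,TRUE",
             "Belinda,28,FALSE"), path)

csv_data <- read.csv(path)
csv_data
str(csv_data)
```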


5 Importing data from Excel files and elsewhere


To use RStudio’s import menu to import Excel files you first need to download and install a package
called “openxlsx”:

You can select From Excel from the Import Dataset drop‐down menu:

RStudio will then open a window from where you can open the Excel file and select a number of
options. Experiment with the options so you can understand what they do.

Importing data from other statistical packages


It is also possible to import data from other statistical packages such as SPSS, SAS or Stata and you
can see options for these in the drop‐down menu.

You may be prompted to install or update some packages when you first use these options and if
you receive an error then search the internet for some help to find out what you might need to
install first.


6 Summary

Key terms
Import Load data stored elsewhere into R

CSV Comma Separated Value – a format often used to store data sets,
with each field of a record separated by a comma

Menus
Import dataset Located in the Environment window

Key commands
read.table Imports data into R. View its help for more information on its
options and arguments.


7 Have a go
You will only get proficient at R by practising.

1. Create a dataset in Notepad and import it into R.

2. Create a dataset in Excel, including column and row headings, and import it into R.

3. Alternatively, search the internet for some datasets and load one or two that interest you
into R.



R16: Exporting data Page 1

Exporting data

Covered in R16

• Overview
• Exporting vector data to Windows clipboard
• Exporting data to a text file
• Using write.table with data frames
• Using write.csv
• Other export commands


1 Overview
In this chapter we’ll look at how to export data to text files, CSV files or other formats that can
be used in other programs, for example to produce reports.

Exporting data to other programs

There are a variety of places that we can export data to. These include:
• Windows clipboard
• a text document
• a CSV file
• Excel
• other statistical packages such as SAS or SPSS.


2 Exporting vector data to Windows clipboard


If you only have a small amount of data, it’s possible to view the data, highlight it and just copy
and paste it where you want it:

You can also copy data to the Windows clipboard using the writeClipboard function. We can then
paste this into any other program such as Notepad, Word or Excel. However, it only works well
with vectors of character data.

writeClipboard(<character object>)

Let’s create a quick character vector of policyholders’ names:

name <- c("Alfie", "Belinda", "Charlie", "Delilah")

Then we’ll use the writeClipboard command on object “name” to copy the contents of “name”
into Windows clipboard:

Now if we open Word or Excel and paste it using CTRL+V or Edit/Paste we find that we have the
column of data from “name”:


The writeClipboard command only works with character data. Let’s look at what happens if we
try to apply it to a numeric vector.

Let’s create a vector of policyholders’ ages:

age <- c(34, 28, 31, 38)

Then we’ll use the writeClipboard command on object “age” to copy the contents of “age” into
Windows clipboard:

R returns an error telling us it’s not a character vector and so it can’t do it.

We can fix this by using the as.character command. This coerces the data type into character
data:

We could put this in a new object and then apply writeClipboard on this new object. For example:

age2 <- as.character(age)

writeClipboard(age2)

Alternatively, we could just use as.character inside the writeClipboard function:

We are now able to paste it in Word or Excel.

Incidentally, if our vectors had named elements, the names would not be included in the clipboard.
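
The steps in this section can be put together in a short sketch. The object name age_text is an assumption for illustration, and the guard around writeClipboard is there because that function is only available in R for Windows.

```r
age <- c(34, 28, 31, 38)

# Coerce the numeric vector to character before copying it
age_text <- as.character(age)
age_text

# writeClipboard only exists in R for Windows, so guard the call
if (.Platform$OS.type == "windows") {
  writeClipboard(age_text)
}
```

On other operating systems the guarded call is simply skipped, so the script still runs without an error.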


3 Exporting data to a text file


To write (ie export) data to a text file (*.txt) or CSV file (*.csv) we use the write.table( ) command.
This function is particularly suited to exporting two-dimensional objects. In this section we’ll look
at exporting data to a text file.

write.table(<object>,file="<filename>")

If the filename does not specify the file path then R will save the file in the current working
directory.

Recall that you can find out the current working directory using the getwd( ) command and you can
change it using the setwd( ) command or the menu option Session/Set Working Directory.

Also remember that we can use the tilde, ~, shortcut to specify the location. You can find out
what the shortcut is on your computer by using path.expand("~").
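
As a quick sketch of these commands, the following shows where a file would be saved. The object names wd, home and target, and the file name name.txt, are illustrative assumptions.

```r
# Current working directory, where files go if no path is given
wd <- getwd()
wd

# Full location that the tilde shortcut stands for on this computer
home <- path.expand("~")
home

# Build a full path explicitly rather than relying on the working directory
target <- file.path(home, "name.txt")
target
```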

Using write.table to export vectors to text file


Let’s use this command to save the “name” vector to a file called “name”:

write.table(name, file="name")

or since file is the second argument we could omit the file= as long as we put it in the second
position:

write.table(name, "name")

If we look inside our working directory, we’ll find the file:

However, because we didn’t specify the file type, Windows doesn’t have a clue what it is. We can
still open it but we’ll have to tell Windows which program to use:


Since this is a bit annoying let’s save the file correctly this time by adding the extension .txt:

When we look in our working directory we can see this file correctly identified as a text file:

Clicking on it will open it in Notepad:

There are two odd things about our text file. First, we notice that there is an “x” at the top. This
is because by default write.table adds column headings. Since our vector hasn’t got a column
heading it calls it “x”. Secondly, it has added the row names “1”, “2”, “3” and “4”. Had we named
the elements in our vector these would have appeared here instead of the default row names.
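
The default layout can also be inspected from within R. The sketch below writes to a temporary file (an assumption made here so that nothing in the working directory is touched) and reads the raw lines back.

```r
name <- c("Alfie", "Belinda", "Charlie", "Delilah")

# Write to a temporary file with all the default settings
f_default <- tempfile(fileext = ".txt")
write.table(name, f_default)

# Read the raw lines back to see the "x" heading and the row names
default_lines <- readLines(f_default)
default_lines
```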

Both of these features can be turned off by using the optional arguments col.names and
row.names. By default both of these options are set to TRUE.

Let’s experiment to see what happens if we try the logical options FALSE or NA. First, let’s set
col.names to FALSE:

write.table(name, "name.txt", col.names=FALSE)

Hopefully you are shocked that R has just overwritten our previous file without even a warning.
Hopefully you’ll be careful about saving files from now on. Once you’ve recovered from the shock
and opened this file in Wordpad or Notepad you’ll see the following:

We can see that it has removed the default column name of “x”.


Next let’s see what happens when we set col.names to NA:

write.table(name, "name.txt", col.names=NA)

Opening this file in Wordpad reveals the following:

We can see that it puts in the default column name “x” but it also adds a blank column name for
the row names. This is a useful output if we were going to paste this into, say, Excel as it ensures
that everything lines up correctly in a grid.
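
A quick way to see the extra blank column name is to read the first line of the file back into R. This sketch again uses a temporary file, which is an assumption made so that name.txt is left alone.

```r
name <- c("Alfie", "Belinda", "Charlie", "Delilah")

f_na <- tempfile(fileext = ".txt")
write.table(name, f_na, col.names = NA)

# First line: an empty name for the row-name column, followed by "x"
header_na <- readLines(f_na)[1]
header_na
```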

Finally, suppose we want to call the column “name” rather than the default “x”. We simply set the
col.names option to this:

write.table(name, "name.txt", col.names="name")

Next let’s see what happens when we set row.names to FALSE:

write.table(name, "name.txt", row.names=FALSE)

Opening this file in Wordpad reveals the following:

We can see that, as expected, it has removed the row names “1”, “2”, “3” and “4”.

Let’s see what happens when we set row.names to NA:

write.table(name, "name.txt", row.names=NA)

We get an error. Clearly NA is not an option here.


Suppose we want to name the rows “P1”, “P2”, “P3” and “P4” rather than the default “1”, “2”,
“3” and “4”. We simply set the row.names option to equal this:

write.table(name, "name.txt", row.names=c("P1","P2","P3","P4"))

For a vector we will probably not actually want either the row or the column names, in which
case we would type the following:

write.table(name, "name.txt", row.names=FALSE, col.names=FALSE)

This gives:
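
As a sketch of what that file should contain, the lines can be read straight back into R. The temporary file and the object name plain_lines are assumptions for illustration.

```r
name <- c("Alfie", "Belinda", "Charlie", "Delilah")

f_plain <- tempfile(fileext = ".txt")
write.table(name, f_plain, row.names = FALSE, col.names = FALSE)

# Only the four names remain, each still wrapped in double quotes
plain_lines <- readLines(f_plain)
plain_lines
```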

Before we move on to applying this function to matrices and data frames let’s show a couple of
other optional arguments.

The quote option


The first is quote, which refers to whether the character or factor data have double quote marks
around them. It can take the logical values of TRUE or FALSE and by default this is set to TRUE.

Let’s set it to FALSE to see what happens:

write.table(name, "name.txt", quote=FALSE)

default (quote=TRUE) quote=FALSE


The sep option


Just as with the read.table function we can specify the separator that goes between different
entries in each row. By default this is set to a single space " ". Other options could include a
comma "," a semi‐colon ";" or a tab "\t".

Here are examples of a comma and a tab separator:

write.table(name, "name.txt", sep=",")

write.table(name, "name.txt", sep="\t")
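
The effect of the quote and sep options can be compared side by side with a short sketch. The temporary files and object names below are illustrative assumptions.

```r
name <- c("Alfie", "Belinda", "Charlie", "Delilah")

f_comma <- tempfile(fileext = ".txt")
f_tab   <- tempfile(fileext = ".txt")

# Same vector, no quote marks, two different separators
write.table(name, f_comma, quote = FALSE, sep = ",")
write.table(name, f_tab,   quote = FALSE, sep = "\t")

comma_lines <- readLines(f_comma)
tab_lines   <- readLines(f_tab)

comma_lines[2]   # row name and value separated by a comma
tab_lines[2]     # row name and value separated by a tab
```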


4 Using write.table with data frames


First let’s create a data frame, A:

Let’s now export this using the following:

write.table(A, "data frame.txt")

This gives the following:

By now, we know how to get rid of the quote marks, set col.names to NA and separators to tabs
to make it all look pretty as follows:

write.table(A, "data frame.txt", quote=FALSE, sep="\t", col.names=NA)

Or we could get rid of the row names, in which case it would be pointless setting col.names to NA
(and you’ll get a lovely error message from R if you try):

write.table(A, "data frame.txt", quote=FALSE, sep="\t", row.names=FALSE)
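
To try this end to end, here is a minimal sketch with a stand-in data frame. The contents of A below are invented for illustration (the data frame used in this chapter may differ), and the file is written to a temporary location.

```r
# A stand-in data frame; the values are made up for this example
A <- data.frame(name = c("Alfie", "Belinda"),
                age  = c(34, 28),
                stringsAsFactors = FALSE)

f_df <- tempfile(fileext = ".txt")
write.table(A, f_df, quote = FALSE, sep = "\t", row.names = FALSE)

# Inspect the raw file, then read it back in to check the round trip
df_lines <- readLines(f_df)
df_lines
A2 <- read.table(f_df, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
A2
```

Reading the file back with matching header and sep settings is a useful check that the export options chosen can be reversed on import.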


5 Using write.csv

Exporting a vector to a csv file


In the previous chapter we said that CSV stands for Comma Separated Value and is to Excel what
Notepad is to Word. It is a stripped-down Excel file that removes all the formatting and separates
the values in the cells with commas.

To write (ie export) data to a csv file (*.csv) we use the write.csv( ) command:

write.csv(<object>, file="<filename>")

The write.csv function is actually the same as the write.table command; it just has fixed defaults
for some of the optional arguments to ensure that it is properly formatted as a csv file that can be
read by any spreadsheet program such as Excel.

These defaults include commas for the separator (which tells spreadsheets to put the value in the
next column). It always has a header row (ie column names) and will assign default ones if none
are specified. If row.names is TRUE then it sets col.names=NA so that everything lines up
beautifully as we saw earlier.
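
The relationship between the two functions can be checked with a short sketch. It assumes that the fixed defaults are a comma separator, col.names=NA when row names are kept, and doubled embedded quotes (qmethod="double"); the temporary files are used only for illustration.

```r
name <- c("Alfie", "Belinda", "Charlie", "Delilah")

f_csv   <- tempfile(fileext = ".csv")
f_table <- tempfile(fileext = ".csv")

write.csv(name, f_csv)
write.table(name, f_table, sep = ",", col.names = NA, qmethod = "double")

# The two files should contain exactly the same lines
csv_lines <- readLines(f_csv)
same_output <- identical(csv_lines, readLines(f_table))
same_output
```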

Let’s apply this to our vector name.

write.csv(name, "name.csv")

Because we gave the file extension .csv we can see in our working directory that Windows
recognises it as a spreadsheet file:

If we click on this file it opens in Excel to give:

We can see that we have the default row names “1”, “2”, “3” and “4” and the default vector
column name “x”.

We can remove the row names by setting row.names to FALSE:

write.csv(name, "name.csv", row.names=FALSE)


If you get an error message it’s because we are saving a file called “name.csv” whilst that file is
open in Excel. Simply close the Excel file and re‐execute the command and you’ll get the
following:

However if we try to get rid of the column name as well:

write.csv(name, "name.csv", row.names=FALSE, col.names=FALSE)

We get a warning that R ignores this setting:

This is because write.csv always has a header row to match the convention. That’s why when
we used read.csv in the previous chapter it always assumed the file had a header row.

Similarly if you tried to change the separator to something other than a comma you’d get a
warning too.
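
The message R produces when a fixed setting is changed can be captured and inspected. This is a sketch only; the temporary file and the object name msg are assumptions for illustration.

```r
name <- c("Alfie", "Belinda", "Charlie", "Delilah")
f_sep <- tempfile(fileext = ".csv")

# Capture the condition message instead of letting R print it
msg <- tryCatch(
  write.csv(name, f_sep, sep = ";"),
  warning = function(w) conditionMessage(w)
)
msg
```

If a different separator really is needed, write.table with the sep argument can be used instead.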

Exporting a data frame to a csv file


Finally let’s write our data frame A from earlier to a csv file:

write.csv(A, "data frame.csv")

Again, everything is beautifully lined up with the row and column names.

If we had no row or column names, the function would automatically assign them as we have
seen.


6 Other export commands

Exporting data to Excel files


We have already been able to export data from R into Excel by saving it as a csv file. It is possible
to export it directly to an xlsx Excel file by using the package “openxlsx”.

You can then use the new command:

write.xlsx(<object>, "<filename>", sheetName= …)

to export the object to an xlsx file. Additionally you can even specify the formatting and other
features.

Exporting data to other statistical packages


It is possible to export data to other statistical packages such as SAS, SPSS or Stata by using the
package “foreign”. You can then use the new command:

write.foreign(<data object>, "<data filename>", "<code filename>", package="<package name>")

where package="SAS" or package="SPSS" or package="Stata".
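
As a hedged sketch of how this might look in practice, the code below exports a small made-up data frame for SPSS. The data values, object names and temporary file names are all assumptions, and the call is only attempted if the foreign package is installed.

```r
if (requireNamespace("foreign", quietly = TRUE)) {
  # Illustrative numeric data only
  claims_df <- data.frame(age = c(34, 28, 31, 38), claims = c(1, 0, 2, 0))

  dat_file  <- tempfile(fileext = ".txt")   # the data itself
  code_file <- tempfile(fileext = ".sps")   # code the other package runs to read it

  foreign::write.foreign(claims_df, dat_file, code_file, package = "SPSS")
  file.exists(c(dat_file, code_file))
}
```

Two files are produced because the other package needs both the raw data and a short script telling it how to read that data.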


7 Summary

Key terms
Export Move data from R into another format, eg a text or Excel file

Key commands
writeClipboard(<character object>)
Copies data stored in a character vector to the clipboard

write.table(<object>,file="<filename>")
Exports data into a text file. Optional arguments include
row.names, col.names, quote and sep

write.csv(<object>,file="<filename>")
Exports data into a csv file

write.xlsx(<object>, "<filename>", sheetName= …)
Exports data into an Excel file – needs the openxlsx package

write.foreign(<data object>, "<data filename>", "<code filename>", package="<package name>")
Exports data to other statistical packages – needs the foreign package

There is no ‘Have a go’ section in this chapter.



CS1B-01: Probability Distribution – Exercises Page 1

Discrete Distributions

Exercises


Data requirements
These exercises do not require you to upload any data files.


Exercise 1.01
A European roulette wheel has the numbers 0 to 36. Each play involves spinning the wheel one
way and a ball the other. The result is the number the ball lands on.

(i) Store the outcomes in the object R.

(ii) Use the length function and logical operators to calculate the following probabilities:

(a) P(R < 20)

(b) P(R ≥ 10)

(c) P(3  R  9)

A mathematician plays roulette 1,000 times.

(iii) Use set.seed(37) and the sample function to simulate the mathematician’s results
and store them in the object S1.

(iv) Use the table function to obtain a frequency table of the results.

(v) Use the function hist to plot a histogram of the results, ensuring that the labels on the
horizontal axis are in the centres of the bars.

(vi) Use the results of the simulation to calculate the empirical probabilities in part (ii).

(vii) Use the results of the simulation to calculate empirical values of the:

(a) mean

(b) median

(c) standard deviation

(d) coefficient of skewness.


Exercise 1.02
A group consists of 10 people who have each been independently infected by a serious disease.
The survival probability for the disease is 70%.

(i) Calculate the probability that six people survive:

(a) from scratch using the factorial function

(b) from scratch using the choose function

(c) using the dbinom function.

(ii) Use dbinom to calculate the probability that:

(a) at least 9 people survive

(b) fewer than 5 people survive.

(iii) Draw a labelled bar chart showing the number of people surviving from the group of 10
people using the barplot function.

(iv) Use the bar chart from part (iii) to obtain the modal number of people in the group that
will survive.

(v) Using Σ x P(X = x), calculate the mean number of people who will survive.


Exercise 1.03
A group consists of 10 people who have each been independently infected by a serious disease.
The survival probability for the disease is 70%.

(i) Use pbinom to calculate the probability that:

(a) no more than 7 people survive

(b) more than 5 people survive

(c) at least 8 people survive

(d) fewer than 6 people survive

(e) exactly 9 people survive.

(ii) Check your answer to part (i)(a) using the dbinom function.

(iii) Draw a labelled bar chart showing the CDF of the number of people surviving from the
group of 10 people using the barplot function.

(iv) Draw a stepped graph of the CDF in part (iii) using the plot function.


Exercise 1.04
A group consists of 10 people who have each been independently infected by a serious disease.
The survival probability for the disease is 70%.

(i) Use qbinom to calculate the minimum number of survivors, x , such that:

(a) P( X  x)  0.8

(b) P( X  x)  0.95

(c) P( X  x)  0.4

(ii) Obtain the median number of survivors.

(iii) Obtain the interquartile range for the number of survivors.

(iv) Draw a stepped graph showing the number of survivors (percentiles) against the
cumulative probability for the group of 10 people using the plot function.


Exercise 1.05
A group consists of 10 people who have each been independently infected by a serious disease.
The survival probability for the disease is 70%.

(i) Use set.seed(37) and rbinom to simulate the number of survivors 500 times. Store
this in the object B.

(ii) (a) Use the table function on B to obtain a frequency table for the survivors.

(b) Hence, calculate the empirical probabilities.

(c) Compare the results of (b) with the actual probabilities from dbinom (round
them to 3DP using the round function).

(d) Use length to obtain the empirical probability of at most 6 survivors and
compare with the actual probability using pbinom.

(iii) (a) Draw a histogram of the results obtained from the simulation, centring .

(b) Superimpose on the histogram a line graph of the expected frequencies for the
binomial distribution using the lines function.

(c) Comment on the differences.

(iv) Compare the following statistics for the distribution and simulated values:

(a) mean

(b) standard deviation

(c) IQR (use the quantile function).

(v) (a) Create a vector StdDev which contains 500 zeros.

(b) Use a loop to store the standard deviation of the first i values in the object B in
the ith element of StdDev.

(c) Plot a graph of the object StdDev showing how the standard deviation of the
simulations changes over the 500 values compared to a horizontal line showing
the true value.


Exercise 1.06
The probability of having a male child can be assumed to be 0.51 independently from birth to
birth.

(i) Calculate the probability that a woman’s fourth child is her first son:

(a) from scratch

(b) using the dgeom function.

(ii) Draw a labelled bar chart showing the probability of obtaining 0 to 10 daughters before
her first son using the barplot function.

(iii) Use pgeom to calculate the probability that the woman has:

(a) at most 4 daughters before her first son

(b) more than 2 daughters before her first son

(c) one daughter before her first son.

(iv) Draw a stepped graph of the CDF using the plot function.

(v) Use qgeom to calculate the smallest number of daughters, x , before the first son such
that:

(a) P( X  x)  0.9

(b) P( X  x)  0.4

(vi) Use set.seed(47) and rgeom to simulate the number of daughters before the first
son 1,000 times. Store this in the object G.

(vii) (a) Use length to obtain the empirical probabilities for part (iii) and comment.

(b) Use quantile to calculate the empirical results for part (v) and comment.

(viii) Compare the following statistics for the distribution and simulated values:

(a) mean

(b) variance.


Exercise 1.07
The probability that a person will believe a rumour about a scandal in politics is 0.8.

(i) Calculate the probability that the ninth person to hear the rumour will be the fourth
person to believe it:

(a) from scratch using the gamma function

(b) using the dnbinom function.

(ii) (a) Use the par function and mfrow to prepare the plot area to display 4 graphs in a
2 by 2 grid.

(b) Use the barplot function to draw 4 bar charts of the probability function for
negative binomial distributions with p = 0.8 and k = 1, 2, 3 and 4, with titles
showing the value of k.

(c) Reset the graphics display area using the par function and mfrow.

(iii) Use pnbinom to calculate the probability that:

(a) at most 2 people didn’t believe the rumour before the fourth person did

(b) more than 3 people didn’t believe the rumour before the fourth person did.

(iv) Use qnbinom to calculate the smallest number of people who didn’t believe the rumour,
x , before the fourth person did such that:

(a) P( X  x)  0.75 (b) P( X  x)  0.75

(v) Use set.seed(57) and rnbinom to simulate the number of people who didn’t
believe the rumour before the fourth person did 2,000 times. Store this in the object N.

(vi) (a) Draw a histogram of the results obtained from the simulation in part (v).

(b) Superimpose on the histogram a line graph of the theoretical expected
frequencies using the lines function.

(c) Comment on the graphs obtained.

(vii) Use length to obtain the empirical probabilities for part (iii) and comment.

(viii) Compare the following statistics for the distribution and simulated values:

(a) standard deviation

(b) IQR (use the quantile function and the results from part (iv)).

(ix) Obtain one simulated value for the number of people who didn’t believe the rumour
before the fourth person did using set.seed(57) and the rgeom function.


Exercise 1.08
Among the 58 people applying for a job, only 30 have a particular qualification. 5 of the group are
randomly selected for a survey about the job application procedure.

(i) Calculate the probability that none of the group selected have the qualification:

(a) from scratch using the choose function

(b) using the dhyper function

(c) using a binomial approximation and the dbinom function.

(ii) (a) Use the par function and mfrow to prepare the plot area to display 4 graphs in a
2 by 2 grid.

(b) Use the barplot function to draw 4 bar charts showing the number of the
group selected having the qualification from samples of size 5, 10, 15 and 20.

(c) Reset the graphics display area using the par function and mfrow.

(iii) Use phyper to calculate the probability that more than 2 people in the group selected
have the qualification.

(iv) Draw a stepped graph of the CDF using the plot function.

(v) Use qhyper to calculate the upper quartile of the number of people in the group
selected who have the qualification.

(vi) Use set.seed(67) and rhyper to simulate the number of people who have the
qualification in the group selected 2,000 times. Store this in the object H.

(vii) Use length to obtain the empirical probability for part (iii) and comment.

(viii) Compare the following statistics for the distribution and simulated values:

(a) mean

(b) upper quartile (use the quantile function and the result from part (v)).

(ix) (a) Draw a line graph of the binomial approximation to the probabilities of the
number of people selected who have the qualification using the plot function.

(b) Superimpose the actual probabilities using dhyper and the lines function.

(c) Superimpose the actual probabilities when 116 people apply for the job with the
same proportion having the particular qualification.


Exercise 1.09
A home insurance company receives claims at a rate of 2 per month.

(i) Calculate the probability that the company receives 4 claims in a month:

(a) from scratch using the exp and factorial functions

(b) using the dpois function.

(ii) (a) Use the par function and mfrow to prepare to display 4 graphs in a 2 by 2 grid.

(b) Use the barplot function to draw 4 bar charts showing the numbers of claims
received in a month if they occur at rates of 2, 5, 10 and 20 per month.

(c) Comment on the shape and position of the distribution for larger values of the
mean.

(d) Reset the graphics display area using the par function and mfrow.

(iii) Use ppois to calculate the probability that the company receives at least 3 claims in a
month.

(iv) Draw a stepped graph of the CDF using the plot function.

(v) Use qpois to calculate the interquartile range of the number of claims received in a
month.

(vi) Use set.seed(77) and rpois to simulate the number of claims received in 2,000
separate months. Store this in the object P and plot a histogram.

(vii) Use length to obtain the empirical probability for part (iii) and comment.

(viii) Compare the following statistics for the distribution and simulated values:

(a) standard deviation

(b) IQR (use the quantile function and the result from part (v)).

(ix) (a) Create a vector average which contains 2,000 zeros.

(b) Use a loop to store the mean of the first i values in the object P in the ith
element of average.

(c) Plot a graph of the object average showing how the mean of the simulations
changes over the 2,000 values compared to the true value.



CS1B-01: Probability Distributions – Answers Page 1

Discrete Distributions

Answers


Exercise 1.01
(ii) (a) 0.5405405

(b) 0.7297297

(c) 0.1621622

(v)

(vi) (a) 0.563

(b) 0.704

(c) 0.175

(vii) (a) 17.424

(b) 17

(c) 10.73636

(d) 0.04215146


Exercise 1.02
(i) 0.2001209

(ii) (a) 0.1493083

(b) 0.04734899

(iii)

(iv) 7

(v) 7
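
The figures above can be reproduced with a sketch along the following lines. The object names are illustrative choices and this is one possible approach rather than a model solution.

```r
n <- 10
p <- 0.7

# (i) P(X = 6) calculated three ways
p_fact   <- factorial(n) / (factorial(6) * factorial(n - 6)) * p^6 * (1 - p)^4
p_choose <- choose(n, 6) * p^6 * (1 - p)^4
p_dbinom <- dbinom(6, n, p)

# (ii) sums of individual probabilities
at_least_9   <- sum(dbinom(9:10, n, p))
fewer_than_5 <- sum(dbinom(0:4, n, p))

# (iv) mode and (v) mean from the full probability function
probs  <- dbinom(0:n, n, p)
mode_x <- (0:n)[which.max(probs)]
mean_x <- sum((0:n) * probs)

c(p_fact, p_choose, p_dbinom, at_least_9, fewer_than_5, mode_x, mean_x)
```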


Exercise 1.03
(i) (a) 0.6172172

(b) 0.8497317

(c) 0.3827828

(d) 0.1502683

(e) 0.1210608

(iii)

(iv)
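
The probabilities in part (i) can be checked with a short sketch based on pbinom. The object names are illustrative and other routes to the same answers are possible.

```r
n <- 10
p <- 0.7

no_more_than_7 <- pbinom(7, n, p)                     # (a) P(X <= 7)
more_than_5    <- 1 - pbinom(5, n, p)                 # (b) P(X > 5)
at_least_8     <- 1 - pbinom(7, n, p)                 # (c) P(X >= 8)
fewer_than_6   <- pbinom(5, n, p)                     # (d) P(X < 6)
exactly_9      <- pbinom(9, n, p) - pbinom(8, n, p)   # (e) P(X = 9)

# (ii) check part (i)(a) by summing the probability function
check_a <- sum(dbinom(0:7, n, p))

c(no_more_than_7, more_than_5, at_least_8, fewer_than_6, exactly_9, check_a)
```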


Exercise 1.04
(i) (a) 8

(b) 9

(c) 7

(ii) 7

(iii) 2

(iv)
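
Some of these values can be checked with qbinom, which returns the smallest x for which P(X ≤ x) is at least the probability supplied. The sketch below assumes that parts (i)(a) and (i)(b) are of that form; part (i)(c) is left out, and the object names are illustrative.

```r
n <- 10
p <- 0.7

x_80   <- qbinom(0.80, n, p)   # (i)(a), assuming P(X <= x) >= 0.8
x_95   <- qbinom(0.95, n, p)   # (i)(b), assuming P(X <= x) >= 0.95
med    <- qbinom(0.5, n, p)    # (ii) median
iqr_bn <- qbinom(0.75, n, p) - qbinom(0.25, n, p)   # (iii) interquartile range

c(x_80, x_95, med, iqr_bn)
```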


Exercise 1.05
(ii) (a)

survivors 2 3 4 5 6 7 8 9 10
Freq 3 4 18 48 83 142 110 72 20

(b)

survivors 2 3 4 5 6 7 8 9 10
Prob 0.006 0.008 0.036 0.096 0.166 0.284 0.220 0.144 0.040

(c)

survivors 2 3 4 5 6 7 8 9 10
Prob 0.001 0.009 0.037 0.103 0.200 0.267 0.233 0.121 0.028

Fairly similar

(d) 0.312 vs 0.350, so the simulation has slightly fewer small values.

(iii) (a) (b)

(c) We underestimate 3, 5, 6 and 8, and overestimate 2, 7, 9 and 10, so it doesn’t seem to be
particularly skewed, so possibly just random variation

(iv) (a) 7.1 vs true value of 7, very similar

(b) 1.517434 vs true value of 1.449138, slightly more spread


(c) 2 vs true value of 2, identical

(v) (c)


Exercise 1.06
The probability of having a male child can be assumed to be 0.51 independently from birth to
birth.

(i) 0.06000099

(ii)

(iii) (a) 0.9717525

(b) 0.117649

(c) 0.2499

(iv)

(v) (a) 3

(b) 1

(vii) (a) 0.967, 0.139, 0.27 – all fairly similar


(b) 3, 1 – both identical to actual values

(viii) (a) 1.04 vs true value of 0.9607843

(b) 2.138539 vs true value of 1.883891

Simulations overestimate the mean and variance
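
The non-simulated values for this exercise can be reproduced with the sketch below. The object names are illustrative, and part (v)(a) is assumed to ask for the smallest x with P(X ≤ x) of at least 0.9, which is what qgeom returns.

```r
p <- 0.51

# (i) three daughters followed by a son
first_son_4th <- (1 - p)^3 * p
first_son_fn  <- dgeom(3, p)

# (iii) probabilities from the CDF
at_most_4    <- pgeom(4, p)
more_than_2  <- 1 - pgeom(2, p)
exactly_one  <- pgeom(1, p) - pgeom(0, p)

# (v)(a), under the assumption stated above
x_90 <- qgeom(0.9, p)

# (viii) theoretical mean and variance of the number of daughters
geom_mean <- (1 - p) / p
geom_var  <- (1 - p) / p^2

c(first_son_4th, at_most_4, more_than_2, exactly_one, x_90, geom_mean, geom_var)
```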


Exercise 1.07
The probability that a person will believe a rumour about a scandal in politics is 0.8.

(i) 0.007340032

(iii) (a) 0.90112

(b) 0.033344

(iv) (a) 2

(b) 0

(vi) (a) (b)

(c) 2,0 simulated values very close to theoretical expected frequencies

(vii) (a) 0.893

(b) 0.042

There are slightly more values at the upper end in the simulation.

(viii) (a) 1.155412 vs true value of 1.118034, simulation slightly more spread out

(b) 2 vs true value of 2, IQR identical – ties in with part (vii) – slightly more extreme
values produced

(ix) 3
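
The theoretical values above can be reproduced as follows. This is a sketch with illustrative object names; it counts the number of people who did not believe the rumour before the fourth believer, and part (iv)(a) is assumed to ask for the smallest x with P(X ≤ x) of at least 0.75.

```r
k <- 4
p <- 0.8

# (i) ninth person is the fourth believer, ie 5 non-believers first
p_gamma   <- gamma(k + 5) / (gamma(k) * gamma(5 + 1)) * p^k * (1 - p)^5
p_dnbinom <- dnbinom(5, k, p)

# (iii) probabilities from the CDF
at_most_2   <- pnbinom(2, k, p)
more_than_3 <- 1 - pnbinom(3, k, p)

# (iv)(a), under the assumption stated above
x_75 <- qnbinom(0.75, k, p)

# (viii) theoretical standard deviation and interquartile range
nb_sd  <- sqrt(k * (1 - p)) / p
nb_iqr <- qnbinom(0.75, k, p) - qnbinom(0.25, k, p)

c(p_gamma, p_dnbinom, at_most_2, more_than_3, x_75, nb_sd, nb_iqr)
```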


Exercise 1.08
(i) (a) 0.02144861

(b) 0.02144861

(c) 0.02622106

(iii) 0.5334928

(iv)

(v) 3

(vii) 0.5155, slightly fewer values at upper end than the true distribution

(viii) (a) 2.573 vs true value of 2.586207

(b) 3 vs true value of 3

(ix)
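
The non-simulated answers for this exercise can be checked with the following sketch. The object names are illustrative; in the hypergeometric functions the arguments are the number with the qualification (30), the number without (28) and the sample size (5).

```r
m  <- 30   # applicants with the qualification
nn <- 28   # applicants without it
k  <- 5    # number selected

# (i) probability that none of those selected have the qualification
p_choose <- choose(nn, k) / choose(m + nn, k)
p_dhyper <- dhyper(0, m, nn, k)
p_approx <- dbinom(0, k, m / (m + nn))

# (iii) more than 2 have the qualification
more_than_2 <- 1 - phyper(2, m, nn, k)

# (v) upper quartile and (viii)(a) theoretical mean
upper_q    <- qhyper(0.75, m, nn, k)
hyper_mean <- k * m / (m + nn)

c(p_choose, p_dhyper, p_approx, more_than_2, upper_q, hyper_mean)
```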


Exercise 1.09
(i) 0.09022352

(ii)(c) For higher values of the mean the distribution shifts to the right and becomes more symmetrical.

(iii) 0.3233236

(v) 2

(vii) 0.3165. The probability is similar but slightly underestimates the true value.

(viii) (a) 1.392815 vs true value of 1.414214

(b) 2 vs true value of 2

(ix) (c)
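
The theoretical values for this exercise can be reproduced with a short sketch. The object names are illustrative choices.

```r
rate <- 2   # claims per month

# (i) P(X = 4) from scratch and with dpois
p_scratch <- exp(-rate) * rate^4 / factorial(4)
p_dpois   <- dpois(4, rate)

# (iii) at least 3 claims in a month
at_least_3 <- 1 - ppois(2, rate)

# (v) interquartile range and (viii)(a) theoretical standard deviation
pois_iqr <- qpois(0.75, rate) - qpois(0.25, rate)
pois_sd  <- sqrt(rate)

c(p_scratch, p_dpois, at_least_3, pois_iqr, pois_sd)
```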



CS1B-01: Probability Distribution – Exercises Page 1

Continuous Distributions

Exercises


Exercise 1.11
Claims to a general insurance company’s 24-hour call centre occur at a rate of 3 per hour.
Accordingly the waiting time between calls is modelled using an exponential distribution with
λ = 3.

(i) Calculate the PDF when x = 2:

(a) from scratch

(b) using the dexp function.

(ii) Draw a labelled graph of the PDF for this exponential distribution over the range x ∈ (0,6)
using the:

(a) plot function

(b) curve function

(c) plot function to draw a blank set of axes and then the lines function to draw
the PDF.

(iii) Use the lines function to add to any one of your graphs from part (ii) the following:

(a) a red dotted line showing the PDF of an exponential distribution with λ = 6

(b) a green dashed line showing the PDF of an exponential distribution with λ = 1.5.

(iv) Add a legend to your plot in part (iii).


Exercise 1.12
Claims to a general insurance company’s 24-hour call centre occur at a rate of 3 per hour.
Accordingly the waiting time between calls is modelled using an exponential distribution with
λ = 3.

(i) Use pexp to calculate the probability of waiting:

(a) more than half an hour

(b) less than 3 hours

(c) between 90 and 150 minutes

(ii) Draw a labelled graph of the CDF for this exponential distribution over the range x ∈ (0, 5)
using either the plot function, the curve function, or the plot function to draw a
blank set of axes and then the lines function to draw the CDF.
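
A minimal sketch for Exercise 1.12, assuming times are measured in hours (so 90 and 150 minutes become 1.5 and 2.5). The names rate, p_a, p_b and p_c are illustrative.

```r
rate <- 3  # calls per hour

p_a <- pexp(0.5, rate = rate, lower.tail = FALSE)       # more than half an hour
p_b <- pexp(3, rate = rate)                             # less than 3 hours
p_c <- pexp(2.5, rate = rate) - pexp(1.5, rate = rate)  # between 90 and 150 minutes

# Part (ii): CDF over (0, 5) drawn with curve()
curve(pexp(x, rate = rate), from = 0, to = 5,
      xlab = "x (hours)", ylab = "F(x)", main = "CDF of the waiting time")
```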


Exercise 1.13
Claims to a general insurance company’s 24-hour call centre occur at a rate of 3 per hour.
Accordingly the waiting time between calls is modelled using an exponential distribution with
λ = 3.

(i) Use qexp to calculate the number of hours waited, x , such that:

(a) P(X < x) = 0.8

(b) P(X < x) = 0.95

(c) P(X > x) = 0.3.

(ii) Obtain the median waiting time.

(iii) Obtain the interquartile range for the waiting time.
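
A short sketch for Exercise 1.13. The inequality directions are an assumption: parts (i)(a) and (i)(b) are read as lower-tail probabilities and part (i)(c) as an upper-tail probability, which is the reading consistent with the listed answers.

```r
rate <- 3

x_a <- qexp(0.80, rate = rate)                      # P(X < x) = 0.8
x_b <- qexp(0.95, rate = rate)                      # P(X < x) = 0.95
x_c <- qexp(0.30, rate = rate, lower.tail = FALSE)  # P(X > x) = 0.3

med <- qexp(0.5, rate = rate)                            # median
iqr <- qexp(0.75, rate = rate) - qexp(0.25, rate = rate) # interquartile range
```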


Exercise 1.14
Claims to a general insurance company’s 24-hour call centre occur at a rate of 3 per hour.
Accordingly the waiting time between calls is modelled using an exponential distribution with
λ = 3.

(i) Use set.seed(37) and rexp to simulate 500 waiting times. Store this in the object W.

(ii) (a) Draw a labelled histogram of the densities of the 500 simulated waiting times.

(b) Superimpose on the histogram a graph of the actual PDF of the waiting times
using the lines function.

(iii) Use length to obtain the empirical probability of waiting and compare to results from
Exercise 1.12(i):

(a) more than half an hour

(b) less than 3 hours

(c) between 90 and 150 minutes

(iv) Compare the simulated and actual:

(a) mean

(b) standard deviation

(c) IQR (use the quantile function)

(d) 95th percentile.

(v) (a) Create a vector middle which contains 500 zeros.

(b) Use a loop to store the median of the first i values in the object W in the ith
element of middle.

(c) Plot a graph of the object middle showing how the median of the simulations
changes over the 500 values and show the median of the distribution on the same
graph.
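
The simulation steps in Exercise 1.14 might be sketched as follows. This assumes the seed is set once, immediately before the call to rexp; simulated values depend on the seed and the R version, so no outputs are quoted here.

```r
set.seed(37)
W <- rexp(500, rate = 3)

# Part (ii): density histogram with the true PDF superimposed
hist(W, freq = FALSE, xlab = "Waiting time (hours)",
     main = "Simulated waiting times")
xs <- seq(0, max(W), by = 0.01)
lines(xs, dexp(xs, rate = 3), col = "blue")

# Part (iii): empirical probabilities as proportions of the 500 values
p_a <- length(W[W > 0.5]) / length(W)
p_b <- length(W[W < 3]) / length(W)
p_c <- length(W[W > 1.5 & W < 2.5]) / length(W)

# Part (v): running median of the first i simulated values
middle <- rep(0, 500)
for (i in 1:500) {
  middle[i] <- median(W[1:i])
}
plot(middle, type = "l", xlab = "Number of simulations", ylab = "Median")
abline(h = qexp(0.5, rate = 3), col = "red", lty = 2)  # true median
```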


Exercise 1.15
The annual claim amounts (in $m) due to damage caused by sinkholes in a certain American state
are modelled by a gamma distribution with parameters α = 2 and λ = 1.5.

(i) Calculate the PDF of this gamma distribution when x = $3m:

(a) from scratch using the gamma function

(b) using the dgamma function.

(ii) Draw a labelled graph of the PDF for this gamma distribution over the range x ∈ (0, 8)
using either plot, curve or plot and lines.

(iii) Use the lines function to add the following to your graph from part (ii):

(a) a red dotted line showing the PDF of a gamma distribution now with   1

(b) a green dashed line showing the PDF of a gamma distribution now with   0.5 .

(iv) Add a legend to your plot in part (iii).

(v) Use pgamma to calculate the probability for the original gamma distribution that:

(a) annual claims are greater than $2m

(b) annual claims are between $1m and $3m.

(vi) Use qgamma to calculate the IQR of the annual sinkhole claim amounts.

(vii) Use set.seed(57) and rgamma to simulate 2,000 annual damage amounts. Store this
in the object C.

(viii) (a) Draw a labelled histogram of the densities of the 2,000 simulated annual claims.

(b) Superimpose on the histogram a graph of the actual PDF of the claim amounts
using the lines function.

(ix) Use length to obtain the empirical probabilities for part (v) and comment.

(x) Compare the simulated and actual:

(a) standard deviation

(b) IQR (use the quantile function and the result from part (vi)).
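
A sketch of the non-simulation parts of Exercise 1.15. It assumes the two parameters are the shape (2) and the rate (1.5), which is the reading consistent with the listed answers; the object names are illustrative.

```r
alpha <- 2      # shape
lambda <- 1.5   # rate
x <- 3

# Part (i): PDF from the formula, using the gamma function, and from dgamma
pdf_scratch <- lambda^alpha / gamma(alpha) * x^(alpha - 1) * exp(-lambda * x)
pdf_builtin <- dgamma(x, shape = alpha, rate = lambda)

# Part (v): probabilities from pgamma
p_gt2  <- pgamma(2, shape = alpha, rate = lambda, lower.tail = FALSE)
p_1to3 <- pgamma(3, shape = alpha, rate = lambda) -
          pgamma(1, shape = alpha, rate = lambda)

# Part (vi): interquartile range from qgamma
q25 <- qgamma(0.25, shape = alpha, rate = lambda)
q75 <- qgamma(0.75, shape = alpha, rate = lambda)
iqr <- q75 - q25

# True standard deviation for part (x)(a): sqrt(alpha) / lambda
sd_true <- sqrt(alpha) / lambda
```

Naming the shape and rate arguments explicitly avoids confusion with the scale parameterisation that dgamma also accepts.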


Exercise 1.16
There is no Exercise 1.16.


Exercise 1.17
(i) Claim amounts, X , are modelled using a continuous distribution with CDF given by:

F(x) = 1 − e^{−3x^{0.5}},  x > 0

(a) Use set.seed(47) and runif to simulate 1,000 random numbers between 0
and 1. Store this in the object U.

(b) Use the random numbers from part (i)(a) to obtain 1,000 simulated claims.

(c) Hence obtain an estimate of the mean and standard deviation of the claims.

(ii) Draw the PDFs of the following beta distributions in blue, green and red, respectively, on
the same axes using plot and lines and adding a legend:

(a) beta(0.5,2)

(b) beta(2,0.5)

(c) beta(0.5,0.5)

(iii) Use pbeta to calculate:

(a) P(0.2 < beta(2,0.5) < 0.8)

(b) P(beta(0.5,0.5) > 0.7)

(iv) Use qbeta to find x such that P(X > x) = 0.65, where X has a beta(0.5,2) distribution.
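
A sketch of parts (i), (iii) and (iv) of Exercise 1.17. It assumes the CDF is F(x) = 1 − exp(−3x^0.5), so that inverting u = F(x) gives x = (−log(1 − u)/3)^2, and it assumes the inequality directions that agree with the listed answers.

```r
# Part (i): inverse transform method
set.seed(47)
U <- runif(1000)
X <- (-log(1 - U) / 3)^2   # solves U = 1 - exp(-3 * sqrt(X)) for X
claim_mean <- mean(X)
claim_sd <- sd(X)

# Part (iii): beta probabilities
p_a <- pbeta(0.8, 2, 0.5) - pbeta(0.2, 2, 0.5)     # P(0.2 < beta(2,0.5) < 0.8)
p_b <- pbeta(0.7, 0.5, 0.5, lower.tail = FALSE)    # P(beta(0.5,0.5) > 0.7)

# Part (iv): x with P(X > x) = 0.65 for beta(0.5, 2)
x_iv <- qbeta(0.65, 0.5, 2, lower.tail = FALSE)
```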


Exercise 1.18
(i) Calculate the value of the PDF when x = 120 for:

(a) a normal distribution with mean 100 and variance 50.

(b) a lognormal distribution with parameters μ = 4.5 and σ² = 0.005.

(ii) Calculate the following probabilities:

(a) P(80 < N(100,50) < 110)

(b) P(logN(4.5,0.005) > 100).

(iii) Find the 95th percentile for:

(a) a normal distribution with mean 100 and variance 50.

(b) a lognormal distribution with parameters μ = 4.5 and σ² = 0.005.

(iv) Use set.seed(58) and rlnorm to simulate 1,000 values from a lognormal
distribution with parameters μ = 4.5 and σ² = 0.005. Store
them in the object L.

(v) (a) Plot the PDF of a lognormal with parameters μ = 4.5 and σ² = 0.005 for
x ∈ (60, 130).

(b) Use lines and density(L) to superimpose the empirical PDF in red.

(vi) Use the simulations from part (iv) to calculate the empirical value of:

(a) part (ii)(b)

(b) part (iii)(b).

and compare them to the true values.
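
A sketch of parts (i) to (iii) of Exercise 1.18. The point to watch is that R's normal and lognormal functions take a standard deviation, so the variances 50 and 0.005 must be square-rooted first. The inequality directions are assumed to be those that agree with the listed answers.

```r
mu <- 100;  sigma <- sqrt(50)       # N(100, 50): sd is sqrt(variance)
ml <- 4.5;  sl <- sqrt(0.005)       # logN(4.5, 0.005)

# Part (i): PDFs at x = 120
d_norm  <- dnorm(120, mean = mu, sd = sigma)
d_lnorm <- dlnorm(120, meanlog = ml, sdlog = sl)

# Part (ii): probabilities
p_norm  <- pnorm(110, mean = mu, sd = sigma) - pnorm(80, mean = mu, sd = sigma)
p_lnorm <- plnorm(100, meanlog = ml, sdlog = sl, lower.tail = FALSE)

# Part (iii): 95th percentiles
q_norm  <- qnorm(0.95, mean = mu, sd = sigma)
q_lnorm <- qlnorm(0.95, meanlog = ml, sdlog = sl)
```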


Exercise 1.19
(i) Calculate the value of the PDF when x = 0.5 for:

(a) a t distribution with 5 degrees of freedom

(b) an F distribution with 2 and 7 degrees of freedom.

(ii) Use plot and lines to draw a labelled graph of the PDF of the following distributions over
the range x ∈ (−3.5, 3.5):

(a) t5

(b) t10

(c) N(0,1)

(iii) Comment on the shape of the PDF of a t distribution with n degrees of freedom as n → ∞.

(iv) Calculate the following probabilities:

(a) P(t5 < 1.2)

(b) P(−1 < t20 < 1)

(c) P(F2,7 > 1.5)

(v) Find x such that P(X < x) = 0.38 where X has:

(a) a t distribution with 5 degrees of freedom

(b) an F distribution with 2 and 7 degrees of freedom.

(vi) Use set.seed(59) and rf to simulate 500 values from an F distribution with 2 and 7
degrees of freedom. Store this in the object S.

(vii) Use the simulated values from part (vi) to calculate the empirical value of:

(a) part (iv)(c).

(b) part (v)(b).
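
A sketch of the non-simulation parts of Exercise 1.19. The inequality directions are an assumption (lower tail in parts (iv)(a) and (v), upper tail in part (iv)(c)); under this reading the quantile for the t distribution in part (v)(a) is negative.

```r
# Part (i): PDFs at x = 0.5
d_t <- dt(0.5, df = 5)
d_f <- df(0.5, df1 = 2, df2 = 7)

# Part (ii): t PDFs against the standard normal
xs <- seq(-3.5, 3.5, by = 0.01)
plot(xs, dnorm(xs), type = "l", xlab = "x", ylab = "f(x)",
     main = "t and N(0,1) PDFs")
lines(xs, dt(xs, df = 5), col = "red", lty = 2)
lines(xs, dt(xs, df = 10), col = "blue", lty = 3)
legend("topright", legend = c("N(0,1)", "t, 5 df", "t, 10 df"),
       col = c("black", "red", "blue"), lty = 1:3)

# Part (iv): probabilities
p_a <- pt(1.2, df = 5)
p_b <- pt(1, df = 20) - pt(-1, df = 20)
p_c <- pf(1.5, df1 = 2, df2 = 7, lower.tail = FALSE)

# Part (v): quantiles with P(X < x) = 0.38
x_t <- qt(0.38, df = 5)
x_f <- qf(0.38, df1 = 2, df2 = 7)
```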

CS1B-01: Probability distributions – Answers Page 1

Continuous distributions
Answers


Exercise 1.11
(i) (a)(b) 0.0074363

(ii),(iii),(iv)


Exercise 1.12
(i) (a) 0.22313

(b) 0.99988

(c) 0.010556

(ii)


Exercise 1.13
(i) (a) 0.53648

(b) 0.99858

(c) 0.40132

(ii) 0.23105

(iii) 0.36620


Exercise 1.14
(ii) (a)(b)

(iii) (a) simulated 0.25 vs actual 0.2231302

(b) simulated 1 vs actual 0.9998766

(c) simulated 0.012 vs actual 0.01055591

All simulated values greater than actual so simulations are more spread out.

(iv) (a) simulations 0.357328 vs distribution 0.333333

(b) simulations 0.346578 vs distribution 0.333333

(c) simulations 0.3994169 vs distribution 0.3662041

(d) simulations 1.078866 vs distribution 0.9985774.

All simulated values greater than distribution. So simulated values higher and more spread out.

(v) (c)

After 500 simulations the median has still not settled down to its long-term value.


Exercise 1.15
(i) 0.07498573

(ii)-(iv)

(v) (a) 0.1991483

(b) 0.4967259

(vi) 1.154237

(viii)

(ix) (a) 0.205 vs true value of 0.1991483

(b) 0.4905 vs true value of 0.4967259

The probabilities are very similar – unsurprising given that we've got 2,000 simulations.

(x) (a) 0.9316344 vs true value of 0.942809

(b) 1.189478 vs true value of 1.154237

Spread of simulations is very similar to actual spread.


Exercise 1.16
There is no Exercise 1.16.


Exercise 1.17
(i) (c) mean 0.2298496, standard deviation 0.4498087

(ii)

(iii) (a) 0.3577709

(b) 0.3690101

(iv) 0.05655679


Exercise 1.18
(i) (a) 0.001033349

(b) 1.209859e-05

(ii) (a) 0.9190115

(b) 0.0684637

(iii) (a) 111.6309

(b) 101.1201

(v)

(vi) (a) 0.064 vs true value of 0.0684637 – smaller tail probability

(b) 100.6061 vs true value of 101.1201 – very similar


Exercise 1.19
(i) (a) 0.3279185

(b) 0.5483227

(ii)

(iii) The PDF of the t distribution approaches that of the N(0,1) as degrees of freedom gets
larger.

(iv) (a) 0.8580545

(b) 0.6707434

(c) 0.2869744

(v) (a) 0.322672

(b) 0.5122197

(vii) (a) 0.292

(b) 0.5353863

CS1B-05: The Central Limit Theorem – Exercises Page 1

The Central Limit Theorem


Exercises


Data requirements
These exercises do not require you to upload any data files.


Exercise 5.01
Claim amounts for a particular policy are modelled using an exponential distribution with mean
£1,000.

(i) Use set.seed(27) and rexp to simulate the claim amounts 50 times. Store this in
the object E.

(ii) (a) Draw a histogram of the results obtained from the 50 simulations.

(b) Comment on the shape of the distribution.

(iii) (a) Create a vector xsum which contains 1,000 zeros.

(b) Use a loop to store the sum of the results obtained in the ith sample of 50
simulations (using set.seed(27)) in the ith element of xsum.

(iv) (a) Draw a labelled histogram of the probabilities of the results in xsum.

(b) Superimpose on the histogram a graph of the PDF of an appropriate normal
distribution of the form N(nμ, nσ²) using the lines function (or the curve
function).

(v) Calculate the probability of the sum of 50 claims being greater than £60,000:

(a) empirically from the results in xsum

(b) using the central limit theorem and the pnorm function.

(vi) (a) Use the par function and mfrow to prepare the plot area to display 4 graphs in a
2 by 2 grid.

(b) Repeat parts (iii) and (iv) for sample sizes of 5, 10, 50 and 100 claims.

(c) Comment on your results.

(d) Reset the graphics display area using the par function and mfrow.
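
A sketch of how Exercise 5.01 could be approached. It assumes the seed is set once before the loop, and that an exponential distribution with mean 1,000 has rate 1/1000 and standard deviation 1,000; the names n50, mu and old_par are illustrative. Simulated results depend on the seed and R version, so only the central limit theorem probability is deterministic.

```r
mu <- 1000   # mean (and standard deviation) of a single claim
n50 <- 50

# Part (iii): sums of 50 simulated claims, 1,000 times
set.seed(27)
xsum <- rep(0, 1000)
for (i in 1:1000) {
  xsum[i] <- sum(rexp(n50, rate = 1 / mu))
}

# Part (iv): histogram with the N(n*mu, n*sigma^2) PDF superimposed
hist(xsum, freq = FALSE, xlab = "Sum of 50 claims", main = "Sum of 50 claims")
curve(dnorm(x, mean = n50 * mu, sd = sqrt(n50) * mu), add = TRUE, col = "blue")

# Part (v): P(sum > 60,000), empirically and from the CLT
p_emp <- length(xsum[xsum > 60000]) / length(xsum)
p_clt <- pnorm(60000, mean = n50 * mu, sd = sqrt(n50) * mu, lower.tail = FALSE)

# Part (vi): 2 by 2 grid for sample sizes 5, 10, 50 and 100
old_par <- par(mfrow = c(2, 2))
for (n in c(5, 10, 50, 100)) {
  s <- replicate(1000, sum(rexp(n, rate = 1 / mu)))
  hist(s, freq = FALSE, main = paste("n =", n), xlab = "Sum of claims")
  curve(dnorm(x, mean = n * mu, sd = sqrt(n) * mu), add = TRUE, col = "blue")
}
par(old_par)   # reset the graphics display area
```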


Exercise 5.02
Claim amounts for a particular policy are modelled using an exponential distribution with mean
£1,000.

If you have already created xsum in your current R session then proceed directly onto part (ii).

(i) (a) Create a vector xsum which contains 1,000 zeros.

(b) Use a loop to store the sum of the results obtained in the ith sample of 50
simulations (using set.seed(27)) in the ith element of xsum.

(ii) (a) Calculate the mean and variance of the simulations of the sum of 50 claims.

(b) Compare part (a) with the mean and variance of an appropriate normal
distribution of the form N(nμ, nσ²).

(iii) (a) Calculate the median, lower and upper quartiles of the simulations of the sum of
50 claims using either quantile or summary.

(b) Compare part (a) with the median, lower and upper quartiles of a N(nμ, nσ²)
distribution.

(iv) (a) Use qqnorm to obtain a QQ plot for the simulations of the sum of 50 claims and a
normal distribution.

(b) Use qqline to add a line.

(c) Comment on the skewness of the distribution of the sum of 50 claims and how
close it is to a normal distribution.
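
A sketch for Exercise 5.02, regenerating xsum with replicate (an alternative to the loop) under the assumption that the seed is set once beforehand. The theoretical quartiles come from qnorm with mean 50 × 1,000 and variance 50 × 1,000².

```r
set.seed(27)
xsum <- replicate(1000, sum(rexp(50, rate = 1 / 1000)))

# Part (ii): simulated mean and variance against 50,000 and 50,000,000
sim_moments <- c(mean = mean(xsum), var = var(xsum))

# Part (iii): simulated quartiles against those of N(50000, 5e7)
q_sim <- quantile(xsum, c(0.25, 0.5, 0.75))
q_clt <- qnorm(c(0.25, 0.5, 0.75), mean = 50000, sd = sqrt(5e7))

# Part (iv): QQ plot against the normal distribution, with reference line
qqnorm(xsum)
qqline(xsum, col = "red")
```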


Exercise 5.03
The numbers of claims are modelled by a Poisson distribution with mean 10 per day.

(i) (a) Obtain 1,000 simulations from this distribution using set.seed(29). Store this
in the object P.

(b) Plot a histogram of the probabilities of the results, ensuring that the labels on the
horizontal axis are in the centres of the bars.

(c) Superimpose the PDF of the approximate normal distribution.

(ii) (a) Use length to obtain the empirical probability of more than 10 claims in a day.

(b) Calculate the equivalent probability using the normal approximation.

(c) Compare parts (a) and (b) to the exact Poisson probability using ppois.

(iii) (a) Use qqnorm to obtain a QQ plot for the simulations and a normal distribution.

(b) Use qqline to add a line.

(c) Comment on how close the normal distribution approximation is to the Poisson.
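
A sketch for Exercise 5.03. It assumes a continuity correction is wanted in the normal approximation, so P(X > 10) is approximated by P(N(10, 10) > 10.5); the histogram breaks are placed at half-integers so that each bar is centred on an integer.

```r
set.seed(29)
P <- rpois(1000, lambda = 10)

# Part (i): histogram with bars centred on the integers, plus normal PDF
hist(P, freq = FALSE, breaks = seq(min(P) - 0.5, max(P) + 0.5, by = 1),
     xlab = "Number of claims", main = "Simulated daily claim numbers")
curve(dnorm(x, mean = 10, sd = sqrt(10)), add = TRUE, col = "blue")

# Part (ii): empirical, approximate and exact P(X > 10)
p_emp   <- length(P[P > 10]) / length(P)
p_norm  <- pnorm(10.5, mean = 10, sd = sqrt(10), lower.tail = FALSE)
p_exact <- ppois(10, lambda = 10, lower.tail = FALSE)

# Part (iii): QQ plot with reference line
qqnorm(P)
qqline(P, col = "red")
```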

CS1B-05: The Central Limit Theorem – Answers Page 1

The Central Limit Theorem


Answers


Exercise 5.01
(ii) (a)

(b) The distribution is positively skewed.

(iv) (a)(b)

(v) (a) 0.082

(b) 0.07865

(vi) (c) As the sample size increases, the normal approximation approaches the empirical
distribution more closely.


Exercise 5.02
(ii) (a) mean 49,909 and variance 50,999,108.

(b) mean 50,000 and variance 50,000,000.

Simulated mean slightly smaller whereas simulated variance slightly larger.

(iii) (a) LQ=44,673, M=49,616, UQ=54,711

(b) LQ=45,231, M=50,000, UQ=54,769

All xsum quartiles are similar but slightly smaller

(iv) (a)(b)

(c) Close to normal in the middle and fairly good in upper tail.

However, the ‘banana shape’ indicates skewness. The sample quantiles lie above the
line in both tails, so they would need to be lower to match the normal distribution.

The lower tail is much lighter, so it’s not as low as it should be, and the upper tail is
slightly heavier, so it’s higher than expected.

Lighter lower tail and heavier upper tail indicates positive skew.


Exercise 5.03
The numbers of claims are modelled by a Poisson distribution with mean 10 per day.

(i) (b)(c)

(ii) (a) 0.409

(b) 0.43718

(c) The probability from the Poisson simulation is closer than that from the normal
approximation to the true value of 0.41696.


(iii) (a)(b)

(c) The ‘banana shape’ of the QQ plot shows signs of positive skew. The mean of the Poisson
distribution is not large enough to ensure that the normal distribution is a good
approximation.

CS1B-06: Sampling distributions – Exercises Page 1

Sampling distributions
Exercises


Data requirements
These exercises do not require you to upload any data files.


Exercise 6.01
Heights of a particular group of women are normally distributed with mean 162 cm and standard
deviation 9 cm.

(i) (a) Create a vector xvar which contains 1,000 zeros.

(b) Use a loop to obtain 1,000 sample variances from a sample of 20 women, using
set.seed(27) and storing the sample variance of the ith sample of 20 women
in the ith element of xvar.

Recall that (n − 1)S²/σ² ~ χ²_{n−1} for samples of size n from a N(μ, σ²) distribution.

(ii) Create a new vector X from xvar which is equal to (20 − 1) × xvar / 9².

(iii) (a) Draw a labelled histogram of the densities of vector X.

(b) Superimpose on the histogram the empirical PDF of vector X using the functions
density and lines.

(c) Superimpose on the histogram a graph of the PDF of a χ²_{n−1} distribution using
the lines function.

(d) Comment on how close our empirical distribution is to the χ²_{n−1} distribution.

(iv) Calculate the mean and variance of X and compare them to the mean and variance of a χ²_{n−1}
distribution.

(v) Calculate the median, lower and upper quartiles of the vector X using either quantile
or summary and compare them to the median, lower and upper quartiles of a χ²_{n−1}
distribution.

(vi) (a) Simulate 1,000 values from a χ²_{n−1} distribution and store them in the vector chi.

(b) Use qqplot to obtain a QQ plot for the vectors X and chi.

(c) Use abline to add a line.

(d) Comment on how close the distribution of X is to a χ²_{n−1} distribution.

(vii) (a) Calculate the probability of a value of X being greater than 15.

(b) Compare this to the corresponding probability from a χ²_{n−1} distribution.
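
A sketch of Exercise 6.01, assuming the seed is set once before the loop and that the reference distribution is chi-square with n − 1 = 19 degrees of freedom. Simulated values depend on the seed and R version, so only the chi-square probability is deterministic.

```r
n <- 20
set.seed(27)
xvar <- rep(0, 1000)
for (i in 1:1000) {
  xvar[i] <- var(rnorm(n, mean = 162, sd = 9))
}

# Part (ii): scaled sample variances
X <- (n - 1) * xvar / 9^2

# Part (iii): histogram, empirical PDF and chi-square PDF
hist(X, freq = FALSE, xlab = "X", main = "Scaled sample variances")
lines(density(X), col = "red")
xs <- seq(0, max(X), by = 0.1)
lines(xs, dchisq(xs, df = n - 1), col = "blue")

# Part (vi): QQ plot against simulated chi-square values
chi <- rchisq(1000, df = n - 1)
qqplot(chi, X)
abline(0, 1, col = "red")

# Part (vii): P(X > 15), empirically and from the chi-square distribution
p_emp  <- length(X[X > 15]) / length(X)
p_true <- pchisq(15, df = n - 1, lower.tail = FALSE)
```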

CS1B-06: Sampling distributions – Answers Page 1

Sampling distributions
Answers


Exercise 6.01
(iii) (a),(b),(c)

(d) Fairly similar – simulated peak slightly to the left and also there's a lump in the
positive tail

(iv) Mean and variance of simulated values are 18.888 and 37.623, respectively.

Mean and variance of the χ²_{19} distribution are 19 and 38, respectively.

The results are fairly close.

(v)
                 Simulated   Actual
Lower quartile   14.352      14.562
Median           18.139      18.338
Upper quartile   22.612      22.718

The results are close – but quantiles of the simulated are all slightly lower than the true values.


(vi) (b)(c)

(d) Fairly close – can see the lump towards the upper end.

Slightly weaker at the tails – but both sides of the line so not skewed.

(vii) (a) 0.705

(b) 0.72260

Empirical probability is about 2.5% smaller.

CS1B-07: Estimation – Exercises Page 1

Estimation
Exercises


Data requirements
These exercises require the following data files:

 motor.txt
 lifetime.csv


Exercise 7.01
A motor insurance portfolio produces claim incidence data for 100,000 policies over one year. The
table below shows the observed number of policyholders making 0, 1, 2, 3, 4, 5, and 6 or more
claims in a year.

No. of claims   No. of policies
0               87,889
1               11,000
2               1,000
3               100
4               10
5               1
6 or more       –
Total           100,000

These data values are contained in the data file ‘motor.txt’.

(i) (a) Use the read.table function to load up the data.

(b) Extract it as a vector and store it as the object claims.

(ii) Plot a histogram of claims ensuring the bars line up correctly.

It is thought the data could either be modelled as a Poisson distribution with mean 0.13345
or as a Type 2 negative binomial distribution with k = 1.8569 and p = 0.93295.

(iii) (a) List out the expected frequencies for each of these fitted distributions to the
nearest whole number.

(b) Obtain the differences between the observed and expected frequencies for the
two fitted distributions.

(c) Hence, comment on the fit of these two distributions to the observed data.
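
Part (iii) of Exercise 7.01 can be sketched without the data file by typing in the observed frequencies from the table. This assumes that R's dnbinom, with arguments size and prob, corresponds to the Type 2 negative binomial with k and p, and it treats the last category as exactly 6 claims (the tail beyond 6 rounds to zero for both fitted models).

```r
n_claims <- 0:6
observed <- c(87889, 11000, 1000, 100, 10, 1, 0)
total <- sum(observed)

# Expected frequencies to the nearest whole number
exp_pois <- round(total * dpois(n_claims, lambda = 0.13345))
exp_nb   <- round(total * dnbinom(n_claims, size = 1.8569, prob = 0.93295))

# Differences between observed and expected frequencies
diff_pois <- observed - exp_pois
diff_nb   <- observed - exp_nb
```

Smaller absolute differences indicate the better fit; sum(abs(diff_pois)) and sum(abs(diff_nb)) give a quick overall comparison.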


Exercise 7.02
The lifetimes (in hours) of 250 incandescent bulbs are contained in the CSV data file ‘lifetime’.

(i) (a) Use the read.table function to load up the data.

(b) Extract it as a vector and store it as the object bulbs.

(ii) Plot a labelled histogram of the lifetimes.

(iii) Plot a labelled empirical PDF of the lifetimes.

It is thought the data could either be modelled as an exponential distribution with parameter
λ = 0.00049724 or as a gamma distribution with α = 0.86280 and λ = 0.00042902.

(iv) Superimpose the PDFs of these two distributions on the empirical PDF from part (iii).

(v) (a) Use set.seed(71) and rexp to generate 1,000 values from the fitted
exponential distribution and store them in the object xexp.

(b) Plot a QQ plot using qqplot on xexp and bulbs.

(c) Add abline(0,1) to the QQ plot and comment on the fit.

(vi) (a) Use set.seed(71) and rgamma to generate 1,000 values from the fitted
gamma distribution and store them in the object xgamma.

(b) Plot a QQ plot using qqplot on xgamma and bulbs.

(c) Add abline(0,1) to the QQ plot and comment on the fit.

(vii) State which model is most appropriate using your results of parts (v) and (vi).


Exercise 7.03
There is currently no Exercise 7.03.


Exercise 7.04
A random sample of eight observations from an unknown distribution is given below:

4.8 7.6 1.2 3.5 2.9 0.8 0.5 2.3

(i) Store these values in the vector data.

(ii) (a) Making no distributional assumption, use set.seed(19), sample and either
replicate or a loop to obtain the mean of 1,000 re-samples of size 8.

(b) Plot a labelled histogram of the densities of this empirical distribution.

(c) Use lines and density to superimpose the empirical distribution of the
estimators.

(d) Calculate the mean and standard deviation of this empirical distribution.

It is now assumed that the random sample has been taken from a χ²_ν distribution.

(iii) Obtain the method of moments estimate for ν.

(iv) (a) Use set.seed(19), rchisq and either replicate or a loop to obtain the
mean of 1,000 samples of size 8.

(b) Plot a labelled histogram of the densities of this empirical distribution.

(c) Use lines and density to superimpose the empirical distribution of the
estimators.

(d) Calculate the mean and standard deviation of this empirical distribution.

(v) Compare the parametric and non-parametric bootstrap distributions.
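
A sketch of the two bootstrap procedures in Exercise 7.04, using replicate. It assumes the fitted distribution is chi-square, whose mean equals its degrees of freedom, so the method of moments estimate is the sample mean. The names np_means, p_means and nu_hat are illustrative.

```r
data <- c(4.8, 7.6, 1.2, 3.5, 2.9, 0.8, 0.5, 2.3)

# Non-parametric bootstrap: re-sample the data with replacement
set.seed(19)
np_means <- replicate(1000, mean(sample(data, replace = TRUE)))

# Method of moments: E[X] = degrees of freedom for a chi-square distribution
nu_hat <- mean(data)

# Parametric bootstrap: sample from the fitted chi-square distribution
set.seed(19)
p_means <- replicate(1000, mean(rchisq(length(data), df = nu_hat)))

# Histogram with empirical density for each bootstrap distribution
hist(np_means, freq = FALSE, xlab = "Mean", main = "Non-parametric bootstrap")
lines(density(np_means), col = "red")
hist(p_means, freq = FALSE, xlab = "Mean", main = "Parametric bootstrap")
lines(density(p_means), col = "red")

c(mean(np_means), sd(np_means))
c(mean(p_means), sd(p_means))
```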

CS1B-07: Estimation – Answers Page 1

Estimation
Answers


Exercise 7.01
(ii)

(iii) (a)(b) Fitted Poisson distribution

# claims 0 1 2 3 4 5 6
Expected 87,507 11,678 779 35 1 0 0
Difference 382 678 221 65 9 1 0

Fitted Type 2 negative binomial

# claims 0 1 2 3 4 5 6
Expected 87,908 10,945 1,048 90 7 1 0
Difference 19 55 48 10 3 0 0

(c) Negative binomial expected frequencies are closer to the observed frequencies,
hence it is the better fit to the number of claims.


Exercise 7.02
(ii)

(iii)


(iv) (a)

(b) Hard to comment on which is the better fit from this graph.

(v) (c)(d)

Middle to upper sample values are higher than the model, so there is a heavier upper tail – more
positively skewed than the model.


(vi) (b)(c)

Middle to higher values get worse (with the highest value very poor) but better in middle than the
exponential since both sides of line.

(vii) Both have good fit at lower end but worse elsewhere. Despite the single extreme value in
the gamma the middle has a better fit than the exponential.


Exercise 7.03
There currently is no Exercise 7.03.


Exercise 7.04
(ii) (b)(c)

(d) mean 2.9750, standard deviation 0.76294

(iii) 2.95


(iv) (b)(c)

(d) mean 2.9483, standard deviation 0.86432

(v) The longer tail in the parametric bootstrap leads to larger standard deviation.

CS1B-08: Confidence Intervals – Exercises Page 1

Confidence intervals
Exercises


Data requirements
These exercises require the following data file:

 water.txt


Exercise 8.01
Heights of males with classic congenital adrenal hyperplasia (CAH) are assumed to be normally
distributed with a standard deviation of 8.4 cm.

A sample of 10 men with CAH has a mean height of 162.8 cm.

(i) Use qnorm to obtain:

(a) a 95% confidence interval for the mean height of men

(b) a 99% confidence interval for the mean height of men

(c) a 90% confidence interval for the mean height of men of the form (0, L).

The weights of women in the US are assumed to be normally distributed with standard deviation
12.1 kg.

The weights of a sample of 6 women are:

64.30, 68.93, 74.91, 52.71, 59.70, 73.60

(ii) (a) Store these values in the vector data.

(b) Obtain a 95% confidence interval for the mean weight of women in the US.
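
With a known standard deviation, the intervals in Exercise 8.01 take the form sample mean ± z × σ/√n. A minimal sketch, with illustrative object names:

```r
# Part (i): heights, known sigma
xbar <- 162.8; sigma <- 8.4; n <- 10
se <- sigma / sqrt(n)

ci95 <- xbar + c(-1, 1) * qnorm(0.975) * se   # two-sided 95%
ci99 <- xbar + c(-1, 1) * qnorm(0.995) * se   # two-sided 99%
upper90 <- xbar + qnorm(0.90) * se            # one-sided 90%, interval (0, L)

# Part (ii): weights, known sigma = 12.1
data <- c(64.30, 68.93, 74.91, 52.71, 59.70, 73.60)
ci_w <- mean(data) + c(-1, 1) * qnorm(0.975) * 12.1 / sqrt(length(data))
```

The one-sided interval puts the whole 10% in the upper tail, hence qnorm(0.90) rather than qnorm(0.95).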


Exercise 8.02
The annual rainfall in centimetres at a certain weather station over the last ten years has been as
follows:

17.2, 28.1, 25.3, 26.2, 30.7, 19.2, 23.4, 27.5, 29.5, 31.6

(i) Store these values in the vector rain.

(ii) Obtain a 99% confidence interval for the average annual rainfall

(a) from scratch using qt

(b) using t.test.

A sample of 100 claims (in £) for damage due to water leakage on an insurance company’s
household contents policies are contained in the file ‘water.txt’.

(iii) Use t.test to obtain a 95% confidence interval for the mean water leakage damage.

The built in data object, women, contains the average heights (in) and weights (lbs) for American
women aged 30–39.

(iv) Use t.test to obtain a 90% confidence interval for the average weights.
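
A sketch of parts (i), (ii) and (iv) of Exercise 8.02. Part (iii) needs the file water.txt, which is not reproduced here, so it is omitted. The column name weight is the one used by the built-in women data frame.

```r
rain <- c(17.2, 28.1, 25.3, 26.2, 30.7, 19.2, 23.4, 27.5, 29.5, 31.6)
n <- length(rain)

# Part (ii)(a): 99% interval from scratch with qt
ci_scratch <- mean(rain) + c(-1, 1) * qt(0.995, df = n - 1) * sd(rain) / sqrt(n)

# Part (ii)(b): the same interval from t.test
ci_ttest <- t.test(rain, conf.level = 0.99)$conf.int

# Part (iv): 90% interval for the average weight in the women data set
ci_women <- t.test(women$weight, conf.level = 0.90)$conf.int
```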


Exercise 8.03
The annual rainfall in centimetres at a certain weather station over the last ten years has been as
follows:

17.2, 28.1, 25.3, 26.2, 30.7, 19.2, 23.4, 27.5, 29.5, 31.6

(i) Obtain a 90% confidence interval for the standard deviation of the annual rainfall from
scratch using qchisq.

A sample of 100 claims (in £) for damage due to water leakage on an insurance company’s
household contents policies are contained in the file ‘water.txt’.

(ii) Obtain a 95% confidence interval for the variance of the water leakage damage.
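
Part (i) of Exercise 8.03 uses the result that (n − 1)S²/σ² has a chi-square distribution with n − 1 degrees of freedom. A minimal sketch follows; part (ii) would follow the same pattern once the water.txt data have been loaded.

```r
rain <- c(17.2, 28.1, 25.3, 26.2, 30.7, 19.2, 23.4, 27.5, 29.5, 31.6)
n <- length(rain)

# 90% interval for the variance: divide by the upper, then the lower, quantile
ci_var <- (n - 1) * var(rain) / qchisq(c(0.95, 0.05), df = n - 1)

# Take square roots for the standard deviation
ci_sd <- sqrt(ci_var)
```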


Exercise 8.04
The annual rainfall in centimetres at a certain weather station over the last ten years has been as
follows:

17.2, 28.1, 25.3, 26.2, 30.7, 19.2, 23.4, 27.5, 29.5, 31.6

(i) Store these values in the vector rain.

(ii) (a) Assuming that rainfall is normally distributed use set.seed(19), rnorm and
either replicate or a loop to obtain the mean of 1,000 samples of size 10.

(b) Hence, obtain a 99% parametric bootstrap confidence interval for the average
annual rainfall.

(iii) (a) Making no distributional assumption, use set.seed(19), sample and either
replicate or a loop to obtain the mean of 1,000 re-samples of size 10.

(b) Hence, obtain a non-parametric 99% confidence interval for the average annual
rainfall.

(iv) Using the method in part (ii) and the same seed, obtain a 90% confidence interval for the
standard deviation of the annual rainfall.

(v) Using the method in part (iii) and the same seed, obtain a non-parametric 90% confidence
interval for the standard deviation of the annual rainfall.
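
A sketch of the bootstrap intervals in Exercise 8.04, using replicate and percentile intervals from quantile. It assumes the parametric bootstrap samples from a normal distribution fitted with the sample mean and sample standard deviation, and that the seed is reset before each set of simulations. Results depend on the seed and R version, so none are quoted.

```r
rain <- c(17.2, 28.1, 25.3, 26.2, 30.7, 19.2, 23.4, 27.5, 29.5, 31.6)
n <- length(rain)

# Part (ii): parametric bootstrap 99% interval for the mean
set.seed(19)
pm <- replicate(1000, mean(rnorm(n, mean = mean(rain), sd = sd(rain))))
ci_param <- quantile(pm, c(0.005, 0.995))

# Part (iii): non-parametric bootstrap 99% interval for the mean
set.seed(19)
npm <- replicate(1000, mean(sample(rain, replace = TRUE)))
ci_nonparam <- quantile(npm, c(0.005, 0.995))

# Parts (iv) and (v): the same two methods applied to the standard deviation
set.seed(19)
psd <- replicate(1000, sd(rnorm(n, mean = mean(rain), sd = sd(rain))))
ci_sd_param <- quantile(psd, c(0.05, 0.95))

set.seed(19)
npsd <- replicate(1000, sd(sample(rain, replace = TRUE)))
ci_sd_nonparam <- quantile(npsd, c(0.05, 0.95))
```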


Exercise 8.05
An opinion poll of 1,000 voters found that 450 favoured Party P.

(i) (a) Use binom.test to calculate a 99% confidence interval for the proportion of
voters who favour Party P.

(b) Comment on the likelihood of more than 50% of the voters voting for Party P in
an election.
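
A minimal sketch for Exercise 8.05. binom.test returns an exact interval, which can be extracted from the conf.int component of the result.

```r
# 450 of 1,000 voters favour Party P
ci_p <- binom.test(450, 1000, conf.level = 0.99)$conf.int

# Does the interval include a 50% share?
contains_half <- ci_p[1] <= 0.5 && ci_p[2] >= 0.5
```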


Exercise 8.06
A sample of 30 values from the Poisson distribution has a mean of 2.

Use poisson.test to calculate an exact 90% confidence interval for the mean rate.
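
For Exercise 8.06, a sample of 30 values with mean 2 has a total count of 60. A minimal sketch, passing the total as x and the number of observations as the time base T:

```r
total_count <- 30 * 2   # sum of the 30 observations
ci_rate <- poisson.test(total_count, T = 30, conf.level = 0.90)$conf.int
```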


Exercise 8.07
The average blood pressure (in mmHg) for a control group C of 10 patients and a similar group T
of 10 patients on a special diet are given in the table below:

C 73.0, 76.9, 82.8, 74.8, 83.0, 79.7, 78.2, 73.9, 74.5, 73.2

T 72.4, 76.2, 73.9, 72.2, 84.6, 75.4, 78.2, 72.8, 72.0, 72.3

(i) (a) Use two vectors and t.test to calculate a 90% confidence interval for the
difference in average blood pressures for the two groups.

(b) Comment on the result.

(ii) It is now known that both groups were made up of the same patients at different times.
Repeat part (i)(a) given this new information.

The built in data set iris contains measurements (in cm) of various features of 3 species of iris:
setosa, versicolor and virginica.

(iii) (a) Extract the Petal.Length of the setosa and virginica species and store them in the
vectors PS and PV.

(b) Obtain a 99% confidence interval for the difference between the mean petal
lengths of the two species in part (a), assuming equal variances.
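
A sketch for Exercise 8.07. The vector names ctrl and diet are illustrative (a vector named T would mask R's shorthand for TRUE). Part (i) is assumed to use t.test's default unequal-variance (Welch) interval, while part (iii) sets var.equal = TRUE as the question requires.

```r
ctrl <- c(73.0, 76.9, 82.8, 74.8, 83.0, 79.7, 78.2, 73.9, 74.5, 73.2)
diet <- c(72.4, 76.2, 73.9, 72.2, 84.6, 75.4, 78.2, 72.8, 72.0, 72.3)

# Part (i): independent samples, 90% interval for the difference in means
ci_indep <- t.test(ctrl, diet, conf.level = 0.90)$conf.int

# Part (ii): the same patients measured twice, so use a paired test
ci_paired <- t.test(ctrl, diet, paired = TRUE, conf.level = 0.90)$conf.int

# Part (iii): petal lengths of two iris species, equal variances assumed
PS <- iris$Petal.Length[iris$Species == "setosa"]
PV <- iris$Petal.Length[iris$Species == "virginica"]
ci_iris <- t.test(PS, PV, var.equal = TRUE, conf.level = 0.99)$conf.int
```

Swapping the order of the two vectors reverses the signs of the interval limits, which is why an answer may be quoted either way round.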


Exercise 8.08
There is no Exercise 8.08.


Exercise 8.09
The average blood pressure (in mmHg) for a control group C of 10 patients and a similar group T
of 10 patients on a special diet are given in the table below:

C 73.0, 76.9, 82.8, 74.8, 83.0, 79.7, 78.2, 73.9, 74.5, 73.2

T 72.4, 76.2, 73.9, 72.2, 84.6, 75.4, 78.2, 72.8, 72.0, 72.3

(i) (a) Use two vectors and var.test to calculate a 99% confidence interval for the
ratio of the variances of the two groups' blood pressures.

(b) Comment on the result.

The built in data set iris contains measurements (in cm) of various features of 3 species of iris:
setosa, versicolor and virginica.

(ii) (a) Extract the Petal.Length of the setosa and virginica species and store them in the
vectors PS and PV.

(b) Obtain a 90% confidence interval for the ratio of the variances of the petal lengths
of the two species in part (a).
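
A sketch for Exercise 8.09, again using the illustrative names ctrl and diet for the two groups. var.test reports an interval for the ratio of the first variance to the second.

```r
ctrl <- c(73.0, 76.9, 82.8, 74.8, 83.0, 79.7, 78.2, 73.9, 74.5, 73.2)
diet <- c(72.4, 76.2, 73.9, 72.2, 84.6, 75.4, 78.2, 72.8, 72.0, 72.3)

# Part (i): 99% interval for var(ctrl) / var(diet)
ci_bp <- var.test(ctrl, diet, conf.level = 0.99)$conf.int

# Part (ii): 90% interval for the ratio of petal length variances
PS <- iris$Petal.Length[iris$Species == "setosa"]
PV <- iris$Petal.Length[iris$Species == "virginica"]
ci_petal <- var.test(PS, PV, conf.level = 0.90)$conf.int
```

If an interval contains 1, the data are consistent with the two variances being equal.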


Exercise 8.10
A sample of 100 claims on household policies made during the year just ended showed that 62
were due to burglary. A sample of 200 claims made during the previous year had 115 due to
burglary.

(i) Use two vectors and prop.test to calculate a 90% confidence interval for the
difference in proportions of home claims due to burglary between the two years.

(ii) Repeat part (i) using a matrix for the results instead of two vectors.
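
A minimal sketch of both parts follows. The object names are illustrative, and correct = FALSE (no continuity correction) is an assumption made here, not something the exercise states; it is the setting under which the interval is a plain normal-approximation interval for the difference.

```r
# Burglary claims and sample sizes: year just ended, then the previous year.
x <- c(62, 115)
n <- c(100, 200)

# (i) Two vectors: counts of "successes" and numbers of trials.
pt_vec <- prop.test(x, n, conf.level = 0.90, correct = FALSE)

# (ii) A two-column matrix of successes and failures.
results <- matrix(c(x, n - x), ncol = 2,
                  dimnames = list(c("this year", "last year"),
                                  c("burglary", "other")))
pt_mat <- prop.test(results, conf.level = 0.90, correct = FALSE)

print(pt_vec$conf.int)
print(pt_mat$conf.int)
```

Both calls describe the same data, so the two intervals should agree.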


Confidence intervals
Answers


Exercise 8.01
(i) (a) (157.59, 168.00)

(b) (155.96, 169.64)

(c) (0, 166.20)

(ii) (b) (56.010, 75.374)


Exercise 8.02
(ii) (20.987, 30.753)

(iv) (293.72, 330.88)

(iv) (129.69, 143.78)


Exercise 8.03
(i) (3.4652, 7.8166)

(ii) (6757.0, 11828)


Exercise 8.04
(ii) (b) (21.714, 29.312)

(iii) (b) (22.050, 28.900)

(iv) (2.9338, 6.4592)

(v) (2.5546, 5.7980)


Exercise 8.05
(i) (a) (0.40930, 0.49119)

(b) Since the 99% CI for p doesn't contain p = 0.5 (or higher values of p), it is unlikely
that Party P will gain more than 50% of the votes.


Exercise 8.06
(i) (1.5951, 2.4797)


Exercise 8.07
(i) (a) (1.0063, 5.0063) or (5.0063, 1.0063)

(b) The confidence interval contains 0 so the means could be equal.

(ii) (0.31393, 3.68607) or (−3.68607, −0.31393)

(iii) (b) (4.3049,  3.8751) or (3.8751, 4.3049)


Exercise 8.08
There is no Exercise 8.08.


Exercise 8.09
(i) (a) (0.14049, 6.0111) or (0.16636, 7.1178)

(b) The confidence interval contains 1 so the variances could be equal.

(ii) (b) (0.061605, 0.15915) or (6.2835, 16.233)


Exercise 8.10
(i) (a) (0.14339, 0.053387) or ( 0.053387,0.14339)


9
Hypothesis Tests
Exercises


Data requirements
These exercises require the following data file:

• water.txt


Exercise 9.01
Heights of males with classic congenital adrenal hyperplasia (CAH) are assumed to be normally
distributed with a standard deviation of 8.4 cm.

A sample of 10 men with CAH has a mean height of 162.8 cm.

(i) Carry out the following tests in R using pnorm to obtain the p-value:

(a) H0 : µ = 165 vs H1 : µ < 165

(b) H0 : µ = 158 vs H1 : µ ≠ 158

The weights of women in the US are assumed to be normally distributed with standard deviation
12.1kg.

The weights of a sample of 6 women are:

64.30, 68.93, 74.91, 52.71, 59.70, 73.60

(ii) (a) Store these values in the vector data.

(b) Test whether the mean weight of women in the US is 64.2kg.
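
The tests in this exercise are z-tests with a known standard deviation, so they can be sketched directly with pnorm. Names such as se, z_a and p_a are illustrative; part (ii)(b) is treated as two-sided, which is an interpretation of "test whether the mean is 64.2kg".

```r
# (i) CAH heights: known sigma, sample mean from n = 10 men.
xbar  <- 162.8
sigma <- 8.4
n     <- 10
se    <- sigma / sqrt(n)

# (a) One-sided (lower tail) test of mu = 165.
z_a <- (xbar - 165) / se
p_a <- pnorm(z_a)

# (b) Two-sided test of mu = 158.
z_b <- (xbar - 158) / se
p_b <- 2 * pnorm(-abs(z_b))

# (ii) Weights of six women, known sigma = 12.1 kg, two-sided test of mu = 64.2.
data <- c(64.30, 68.93, 74.91, 52.71, 59.70, 73.60)
z_w  <- (mean(data) - 64.2) / (12.1 / sqrt(length(data)))
p_w  <- 2 * pnorm(-abs(z_w))

print(c(z_a = z_a, p_a = p_a, z_b = z_b, p_b = p_b, z_w = z_w, p_w = p_w))
```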


Exercise 9.02
The annual rainfall in centimetres at a certain weather station over the last ten years has been as
follows:

17.2, 28.1, 25.3, 26.2, 30.7, 19.2, 23.4, 27.5, 29.5, 31.6

(i) Store these values in the vector rain.

(ii) Test whether the average annual rainfall has increased from its former long-term value of
22 cm:

(a) from scratch using pt to obtain the p-value

(b) using t.test.

A sample of 100 claims (in £) for damage due to water leakage on an insurance company’s
household contents policies are contained in the file ‘water.txt’.

(iii) Use t.test to test whether the mean water leakage damage is £300.

The built in data object, women, contains the average heights (in) and weights (lbs) for American
women aged 30–39.

(iv) Use t.test to test whether the mean height is less than 70 inches.
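
Parts (i) and (ii) can be sketched as follows; the names t_stat, p_val and tt are illustrative. The from-scratch calculation and t.test should give the same one-sided p-value.

```r
# (i) Annual rainfall (cm) over the last ten years.
rain <- c(17.2, 28.1, 25.3, 26.2, 30.7, 19.2, 23.4, 27.5, 29.5, 31.6)
n <- length(rain)

# (ii)(a) From scratch: t statistic and upper-tail probability on n - 1 df.
t_stat <- (mean(rain) - 22) / (sd(rain) / sqrt(n))
p_val  <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# (ii)(b) The same one-sided test using t.test.
tt <- t.test(rain, mu = 22, alternative = "greater")

print(c(statistic = t_stat, p_value = p_val))
print(tt)
```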


Exercise 9.03
The annual rainfall in centimetres at a certain weather station over the last ten years has been as
follows:

17.2, 28.1, 25.3, 26.2, 30.7, 19.2, 23.4, 27.5, 29.5, 31.6

(i) Test whether the standard deviation of annual rainfall is equal to 10cm from scratch using
pchisq to obtain the p-value.

A sample of 100 claims (in £) for damage due to water leakage on an insurance company’s
household contents policies are contained in the file ‘water.txt’.

(ii) Test whether the standard deviation of the water leakage damage is less than £100.
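
A sketch of part (i), assuming a two-sided alternative: under H0 the statistic (n − 1)s²/σ² has a chi-square distribution with n − 1 degrees of freedom, and the two-sided p-value is taken here as twice the smaller tail probability. The object names are illustrative.

```r
rain <- c(17.2, 28.1, 25.3, 26.2, 30.7, 19.2, 23.4, 27.5, 29.5, 31.6)
n <- length(rain)
sigma0 <- 10                      # hypothesised standard deviation (cm)

# Test statistic: (n - 1) * s^2 / sigma0^2, chi-square on n - 1 df under H0.
stat <- (n - 1) * var(rain) / sigma0^2

# Two-sided p-value: double the smaller of the two tail probabilities.
lower <- pchisq(stat, df = n - 1)
p_val <- 2 * min(lower, 1 - lower)

print(c(statistic = stat, p_value = p_val))
```

For a one-sided alternative such as the one in part (ii), only the relevant tail probability would be used.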


Exercise 9.04
There is currently no Exercise 9.04.


Exercise 9.05
A new gene has been identified that makes carriers particularly susceptible to a particular
degenerative disease. In a random sample of 250 adult males born in the UK, 8 were found to be
carriers of the gene.

(i) Use binom.test to test whether the proportion of adult males born in the UK carrying
the gene is less than 10%.

(ii) Extract the p-value for the test in (i).
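
Both parts can be sketched with a single call; bt and p_value are illustrative names. The components of the returned object can be listed with names(bt).

```r
# (i) Exact binomial test of p = 0.1 against p < 0.1, with 8 carriers out of 250.
bt <- binom.test(8, 250, p = 0.1, alternative = "less")
print(bt)

# (ii) The p-value is stored as a component of the returned object.
p_value <- bt$p.value
print(p_value)
```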


Exercise 9.06
A random sample of 500 policies of a particular kind revealed a total of 116 claims during the last
year. Assume the annual claim frequency per policy has a Poisson distribution with mean λ .

(i) Test the null hypothesis H0 : λ = 0.18 against the alternative H1 : λ > 0.18 .

(ii) Extract the estimate for the mean rate in (i).
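
One way to approach this, assuming poisson.test is the intended tool (the exercise does not name a function): the 116 claims are treated as a single Poisson count observed over T = 500 policy-years, tested against a rate of r = 0.18.

```r
# (i) One-sample exact Poisson test: 116 events over a time base of 500.
pois <- poisson.test(116, T = 500, r = 0.18, alternative = "greater")
print(pois)

# (ii) The estimated rate (events per policy) is stored in the estimate component.
rate_hat <- unname(pois$estimate)
print(rate_hat)
```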


Exercise 9.07
The average blood pressure (in mmHg) for a control group C of 10 patients and a similar group T
of 10 patients on a special diet are given in the table below:

C 73.0, 76.9, 82.8, 74.8, 83.0, 79.7, 78.2, 73.9, 74.5, 73.2

T 72.4, 76.2, 73.9, 72.2, 84.6, 75.4, 78.2, 72.8, 72.0, 72.3

(i) Use two vectors and t.test to test the hypothesis that patients on the special diet have
a lower average blood pressure than the control group (you may assume that the
variances of the 2 groups of patients are equal).

(ii) It is now known that both groups were made up of the same patients at different times.
Repeat part (i) given this new information.

The built in data set iris contains measurements (in cm) of various features of 3 species of iris:
setosa, versicolor and virginica.

(iii) (a) Extract the Petal.Length of the setosa and virginica species and store them in the
vectors PS and PV.

(b) Obtain the p-value for a test that the difference between the means of the two
species in part (a) is equal to 4cm.
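
Parts (i) and (ii) differ only in whether the observations are treated as independent or paired. In this sketch the vector names control and diet are illustrative; putting diet first with alternative = "less" tests whether the diet mean is lower than the control mean.

```r
control <- c(73.0, 76.9, 82.8, 74.8, 83.0, 79.7, 78.2, 73.9, 74.5, 73.2)
diet    <- c(72.4, 76.2, 73.9, 72.2, 84.6, 75.4, 78.2, 72.8, 72.0, 72.3)

# (i) Independent samples with a pooled (equal) variance.
pooled <- t.test(diet, control, alternative = "less", var.equal = TRUE)

# (ii) Same patients measured twice: paired test on the differences.
paired <- t.test(diet, control, alternative = "less", paired = TRUE)

print(c(pooled = pooled$p.value, paired = paired$p.value))
```

Pairing removes the between-patient variation, which is why the paired p-value is much smaller.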


Exercise 9.08
The average blood pressure (in mmHg) for a control group C of 10 patients and a similar group T
of 10 patients on a special diet are given in the table below:

C 73.0, 76.9, 82.8, 74.8, 83.0, 79.7, 78.2, 73.9, 74.5, 73.2

T 72.4, 76.2, 73.9, 72.2, 84.6, 75.4, 78.2, 72.8, 72.0, 72.3

(i) Use two vectors and var.test to test the hypothesis that patients on the special diet
have a higher variance between their blood pressures.

The built in data set iris contains measurements (in cm) of various features of 3 species of iris:
setosa, versicolor and virginica.

(ii) (a) Extract the Petal.Length of the setosa and virginica species and store them in the
vectors PS and PV.

(b) Calculate the p-value for a test that the variances of the two species in part (a) are
equal and comment.


Exercise 9.09
There is currently no Exercise 9.09.


Exercise 9.10
A sample of 100 claims on household policies made during the year just ended showed that 62
were due to burglary. A sample of 200 claims made during the previous year had 115 due to
burglary.

(i) Use two vectors and prop.test to test the hypothesis that the underlying proportion
of home claims that are due to burglary is higher in the second year than in the first. Do
not use a continuity correction.

(ii) Repeat part (i) using a matrix for the results instead of two vectors.

An actuary claims that the rates of burglary should be modelled using a Poisson distribution.

(iii) Use poisson.test to test whether the rates of burglary have changed between the
two years.


Exercise 9.11
The average blood pressure (in mmHg) for a control group C of 10 patients and a similar group T
of 10 patients on a special diet are given in the table below:

C 73.0, 76.9, 82.8, 74.8, 83.0, 79.7, 78.2, 73.9, 74.5, 73.2

T 72.4, 76.2, 73.9, 72.2, 84.6, 75.4, 78.2, 72.8, 72.0, 72.3

(i) Store these results in two vectors and the value of the difference between their means in
the object ObsT.

(ii) Carry out a permutation test to test the hypothesis that patients on the special diet have a
lower average blood pressure than the control group:

(a) Store the combined vector of results in the object results.

(b) Create an object index that gives the positions of the values in results.

(c) Use the function combn on the object index to calculate all the combinations of
patients in the control group and store this in the object p.

(d) Use a loop to store the differences in the average blood pressures of the two
groups in the object dif.

(iii) (a) Plot a labelled histogram of the differences in the average blood pressures of the
two groups for every combination.

(b) Use the function abline to add a dotted vertical blue line to show the critical
value.

(c) Use the function abline to add a dashed vertical red line to show the observed
statistic.

(iv) (a) Calculate the p-value of the test based on this permutation test.

(b) The p-value calculated under the normality assumption was 13.19%. Comment on
your result.

(v) Repeat part (ii) but with 10,000 resamples from the object results using the function
sample and set.seed(77).

(vi) Calculate the p-value of the test using resampling and compare it to the answer using all
the combinations calculated in part (iv).
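
Parts (i), (ii) and (iv)(a) can be sketched as below. The vector names control and diet are illustrative, and the statistic is taken as mean(diet) − mean(control), which is an assumption about the direction. There are choose(20, 10) = 184,756 combinations, so the loop takes a few seconds. Many combinations tie exactly with the observed difference, so the p-value is slightly sensitive to floating-point rounding in the comparison.

```r
control <- c(73.0, 76.9, 82.8, 74.8, 83.0, 79.7, 78.2, 73.9, 74.5, 73.2)
diet    <- c(72.4, 76.2, 73.9, 72.2, 84.6, 75.4, 78.2, 72.8, 72.0, 72.3)

# (i) Observed statistic (assumed direction: diet minus control).
ObsT <- mean(diet) - mean(control)

# (ii)(a)-(c) Pool the results and list every choice of 10 positions for the control group.
results <- c(control, diet)
index   <- seq_along(results)
p       <- combn(index, length(control))   # one combination per column

# (ii)(d) Difference in means for every relabelling of the 20 results.
dif <- numeric(ncol(p))
for (i in seq_len(ncol(p))) {
  dif[i] <- mean(results[-p[, i]]) - mean(results[p[, i]])
}

# (iv)(a) One-sided p-value: share of relabellings at least as extreme as observed.
p_value <- mean(dif <= ObsT)
print(p_value)
```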


Exercise 9.12
The average blood pressure (in mmHg) for a group of 10 patients under a controlled diet (C) and a
special diet (T) are given in the table below:

C 73.0, 76.9, 82.8, 74.8, 83.0, 79.7, 78.2, 73.9, 74.5, 73.2

T 72.4, 76.2, 73.9, 72.2, 84.6, 75.4, 78.2, 72.8, 72.0, 72.3

(i) Store the differences of pairs of results in the vector D and the mean value of these
differences in the object ObsD.

(ii) Carry out a permutation test to test the hypothesis that patients on the special diet have a
lower average blood pressure than the control group:

(a) Store the values (−1,1) in the vector sign.

(b) Use the function permutations from the package gtools to calculate all the
permutations of the signs of the differences in object D and store these
permutations in the object p.

(c) Use a loop to store the mean differences in the average blood pressures of the
two groups in the object dif.

(iii) (a) Calculate the p-value of the test based on this permutation test.

(b) The p-value calculated under the normality assumption was 2.885%. Comment on
your result.

(iv) Repeat part (ii) but with 10,000 resamples from the object sign using the function
sample in the loop and set.seed(79).

(v) Calculate the p-value of the test using resampling and compare it to the answer using all
the combinations calculated in part (iii).


Exercise 9.13
The number of employees in a particular company who are absent for each work day are given in
the table below:

Day         Mon   Tue   Wed   Thu   Fri   Total
Frequency    73    52    52    55    68     300

(i) (a) Store these frequencies in the vector obs.

(b) Give the expected frequencies if the number of employees absent was
independent of the day (ie uniformly distributed).

(c) Use chisq.test to determine whether the observed results fit a uniform
distribution.

According to genetic theory the number of colour-strains (pink, white and blue) of a certain
flower should appear in the ratio 2:3:5.

For 120 plants, the results were as follows:

Colour      Pink   White   Blue   Total
Frequency     17      29     74     120

(ii) (a) Use chisq.test to determine whether the observed results are consistent
with genetic theory.

(b) Extract the degrees of freedom from the results of chisq.test.

An insurer believes that the distribution of the number of claims on a particular type of policy is
binomial with parameters n = 3 and p . A random sample of the number of claims on 153 policies
revealed the following results:

Number of claims 0 1 2 3
Number of policies 60 75 16 2

(iii) (a) Show that the method of moments estimate for p is 0.246.

(b) Use chisq.test to carry out a goodness of fit test for the specified binomial
model for the number of claims on each policy, ensuring that the expected
frequencies are greater than 5.

(c) Use pchisq to find the correct p-value.
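
The three goodness-of-fit tests can be sketched as follows; all object names other than obs are illustrative. In part (iii) the last two cells are merged so that every expected frequency exceeds 5. chisq.test does not know that p was estimated from the data, so the degrees of freedom are reduced by one by hand (3 cells − 1 − 1 estimated parameter = 1).

```r
# (i) Absences by weekday; chisq.test() defaults to equal probabilities.
obs <- c(Mon = 73, Tue = 52, Wed = 52, Thu = 55, Fri = 68)
uniform_fit <- chisq.test(obs)
print(uniform_fit$expected)

# (ii) Flower colours against the 2:3:5 ratio.
colours <- c(Pink = 17, White = 29, Blue = 74)
genetic_fit <- chisq.test(colours, p = c(2, 3, 5) / 10)
print(genetic_fit$parameter)          # degrees of freedom

# (iii) Binomial(3, p) model, with p estimated by the method of moments.
policies <- c(60, 75, 16, 2)
p_hat  <- sum(0:3 * policies) / (3 * sum(policies))
probs  <- dbinom(0:3, size = 3, prob = p_hat)
obs3   <- c(policies[1:2], sum(policies[3:4]))   # merge the "2" and "3" cells
probs3 <- c(probs[1:2], sum(probs[3:4]))
binom_fit <- chisq.test(obs3, p = probs3)

# Correct p-value: one fewer degree of freedom because p was estimated.
p_correct <- pchisq(binom_fit$statistic, df = 1, lower.tail = FALSE)
print(p_correct)
```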


Exercise 9.14
In an investigation into the effectiveness of car seat belts, 292 accident victims were classified
according to the severity of their injuries and whether they were wearing a seat belt at the time
of the accident. The results were as follows:

                 Wearing a seat belt   Not wearing a seat belt
Death                      3                      47
Severe injury             78                      32
Minor injury             103                      29

(i) (a) Store these names and frequencies in the matrix obs.

(b) Use chisq.test to determine whether the severity of injuries sustained is
dependent on whether the victims are wearing a seat belt.

The eye colour and hair colour for a group of Caucasian people were noted:

              Fair hair   Dark hair
Blue eyes        37          15
Brown eyes       12          50

(ii) (a) Store these names and frequencies in the matrix obs2.

(b) Use chisq.test to determine whether eye colour and hair colour are
independent.

(c) Repeat part (ii)(b) without the Yates’ continuity correction.
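
A sketch of both contingency tables follows. matrix fills column by column, so each column of counts is entered in turn; the row and column labels are typed as in the tables above, and the names seat_fit, yates_fit and plain_fit are illustrative.

```r
# (i) Injury severity by seat belt use.
obs <- matrix(c(3, 78, 103, 47, 32, 29), nrow = 3,
              dimnames = list(c("Death", "Severe injury", "Minor injury"),
                              c("Wearing a seat belt", "Not wearing a seat belt")))
seat_fit <- chisq.test(obs)

# (ii) Eye colour by hair colour (a 2 x 2 table).
obs2 <- matrix(c(37, 12, 15, 50), nrow = 2,
               dimnames = list(c("Blue eyes", "Brown eyes"),
                               c("Fair hair", "Dark hair")))
yates_fit <- chisq.test(obs2)                   # Yates' correction is the default for 2 x 2
plain_fit <- chisq.test(obs2, correct = FALSE)  # without the correction

print(seat_fit)
print(yates_fit)
print(plain_fit)
```

The continuity correction only applies to 2 × 2 tables, so it has no effect on the seat belt test.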


Exercise 9.15
The eye colour and hair colour for a group of Caucasian people were noted:

              Fair hair   Dark hair
Blue eyes        37          15
Brown eyes       12          50

(i) Store these names and frequencies in the matrix obs2.

(ii) Use fisher.test to determine whether eye colour and hair colour are independent
and give the exact p-value.

(iii) Compare the p-value obtained in part (ii) using chisq.test both with and without
Yates’ continuity correction.

(iv) Get R to display only the p-value from part (ii), rather than the whole test.
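
The exact test and the two chi-square approximations can be compared side by side, as in this sketch; the names ft and p_values are illustrative.

```r
# (i) Eye colour by hair colour.
obs2 <- matrix(c(37, 12, 15, 50), nrow = 2,
               dimnames = list(c("Blue eyes", "Brown eyes"),
                               c("Fair hair", "Dark hair")))

# (ii) Fisher's exact test.
ft <- fisher.test(obs2)

# (iii) Chi-square p-values with and without Yates' continuity correction.
p_values <- c(fisher   = ft$p.value,
              yates    = chisq.test(obs2)$p.value,
              no_yates = chisq.test(obs2, correct = FALSE)$p.value)
print(p_values)

# (iv) Only the exact p-value.
print(ft$p.value)
```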


9
Hypothesis Tests
Answers


Exercise 9.01
(i) (a) Statistic= −0.8282156 , p-value = 0.2037742, hence do not reject H0 and conclude
µ = 165 .

(b) Statistic= 1.807016, p-value = 0.07075981, hence do not reject H0 and conclude
µ = 158 .

(ii) (b) Statistic= 0.3019688, p-value = 0.7626759, hence do not reject H0 and conclude
mean weight is 64.2kg.


Exercise 9.02
(ii) Statistic= 2.5758, p-value = 0.01495, hence sufficient evidence to reject H0 , annual
rainfall has increased.

(iii) Statistic= 1.3138, p-value = 0.192, hence insufficient evidence to reject H0 , mean water
leakage damage is £300.

(iv) Statistic = −4.3301 , p-value = 0.000346, hence sufficient evidence to reject H0 , mean
height is less than 70 inches.


Exercise 9.03
(i) Statistic= 2.03161, p-value = 0.01808439, hence sufficient evidence to reject H0 , the
standard deviation of annual rainfall is not equal to 10cm.

(ii) Statistic= 86.7747, p-value = 0.195035, hence insufficient evidence to reject H0 , the
standard deviation of the water leakage damage is not less than £100.


Exercise 9.04
There is currently no Exercise 9.04.


Exercise 9.05
(i) p-value = 3.977e-05 (ie 0.004%), hence sufficient evidence to reject H0 , the proportion of
adult males born in the UK carrying the gene is less than 10%.

(ii) 3.977226e-05 (ie 0.004%)


Exercise 9.06
(i) p-value = 0.004787, hence sufficient evidence to reject H0 , we conclude that λ > 0.18 .

(ii) 0.232


Exercise 9.07
(i) p-value = 0.1319, hence insufficient evidence to reject H0 , patients on the special diet
have the same average blood pressure as the control group.

(ii) p-value = 0.02885, hence sufficient evidence to reject H0 , patients on the special diet
have lower average blood pressure than the control group.

(iii) (b) p-value = 0.2759, hence insufficient evidence to reject H0 , the difference
between the means of the two species is equal to 4cm.


Exercise 9.08
(i) p-value = 0.4509, hence insufficient evidence to reject H0 , patients in the two groups
have the same variance between their blood pressures.

(ii) (b) The p-value is as good as zero, hence there is sufficient evidence to reject H0 , the
variances of the two species are not equal.


Exercise 9.09
There is currently no Exercise 9.09.


Exercise 9.10
(i) p-value = 0.2275, hence insufficient evidence to reject H0 , the underlying proportion of
home claims that are due to burglary is the same in both years.

(iii) p-value = 0.633, hence insufficient evidence to reject H0 , the rates of burglary have not
changed between the two years.


Exercise 9.11
(iii)

(iv) (a) p-value = 0.1322988, hence insufficient evidence to reject H0 , no difference in
mean blood pressure under special diet.

(b) This p-value is very close to the value under the normality assumption.

(vi) p-value = 0.1337, hence insufficient evidence to reject H0 , no difference in mean blood
pressure under special diet.

(p-value = 0.1287 for C – T)

This is very close to the value using all combinations.


Exercise 9.12
(iii) (a) p-value = 0.015625, hence sufficient evidence to reject H0 , blood pressure drops
under special diet.

(b) The non-parametric p-value is lower than the one under the normality
assumption.

(v) (a) p-value = 0.0153, hence sufficient evidence to reject H0 , blood pressure drops
under special diet.

(b) This p-value is very close to the value using all combinations.


Exercise 9.13
(i) (b) 300/5 = 60 each day

(c) Statistic= 6.4333 on chi-square 4, p-value = 0.169, hence do not reject H0 and
conclude absent employees are evenly distributed.

(ii) (a) Statistic= 6.6694 on chi-square 2, p-value = 0.03562, hence reject H0 and
conclude that the observed results are not consistent with genetic theory.

(b) 2

(iii) (b) Combining the last two groups so expected frequencies are greater than 5 gives a
statistic of 3.4675 on chi-square 2, p-value = 0.1766, hence do not reject H0 and
conclude that the binomial model is a good fit.

(c) Statistic of 3.4675 on chi-square 1, p-value = 0.0625, hence do not reject H0 and
conclude that the binomial model is a good fit.


Exercise 9.14
(i) (b) Statistic= 85.449 on chi-square 2, p-value = 0, hence reject H0 and conclude injury
NOT independent of wearing seatbelt.

(ii) (b) Statistic= 28.885 on chi-square 1, p-value = 0, hence reject H0 and conclude eye
colour not independent of hair colour.

(c) Statistic= 30.962 on chi-square 1, p-value = 0, hence reject H0 and conclude eye
colour not independent of hair colour.


Exercise 9.15
(ii) p-value = 2.36e-08, we reject H0 and conclude eye colour depends on hair colour.

(iii) With Yates’ continuity correction: p-value = 7.68e-08, we reject H0 and conclude eye
colour depends on hair colour.

Without Yates’ continuity correction: p-value = 2.63e-08, we reject H0 and conclude eye
colour depends on hair colour.

Surprisingly, using the continuity correction is not as accurate here.

(iv) 2.362163e-08


10
Data Analysis
Exercises


Data requirements
These exercises require the following data files:

• baby weights.txt
• AIDS.csv


Exercise 10.01
A new computerised ultrasound scanning technique has enabled doctors to monitor the weights
of unborn babies. The table below shows the estimated weights for one particular baby at
fortnightly intervals during the pregnancy.

Gestation period (weeks) 30 32 34 36 38 40

Estimated baby weight (kg) 1.6 1.7 2.5 2.8 3.2 3.5

This data is contained in the text file: ‘baby weights’

(i) (a) Load the data and store it in the data frame baby.

(b) Obtain the following scattergraph:

(c) Comment on the linear relationship.


The numbers of new AIDS cases recorded in the US in successive years during the early part of the
AIDS epidemic are shown in the table below.

Year 81 82 83 84 85 86 87 88

Number of cases (000s) 0.34 1.20 3.15 6.37 12.04 19.40 29.11 36.13

Source: www.avert.org

This data is contained in the CSV file: ‘AIDS’

(ii) (a) Load the data and store it in the data frame AIDS.

(b) Plot a scattergraph and comment on the linear relationship.

(iii) (a) Create a new data frame AIDS2 which contains the log of the number of cases.

(b) Obtain the following scattergraph of the logged data:

(c) Comment on the linear relationship.


Exercise 10.02
This question uses the ‘baby weights’ data from Exercise 10.01 that should be stored in the data
frame baby.

(i) Use the cor function to obtain the following correlation coefficients:

(a) Pearson

(b) Spearman

(c) Kendall.

(ii) (a) Store the gestation period in vector x and the weight in vector y.

(b) Create objects Sxx, Syy and Sxy which contain the sum of squares for the baby
weights data.

(c) Hence calculate the (Pearson) correlation coefficient.

(iii) (a) Calculate the Pearson correlation coefficient of the ranks of the gestation period
and ranks of the weights.

(b) Check this gives the same result as the formula:

$$r_s = 1 - \frac{6\sum_i d_i^2}{n(n^2 - 1)}$$

This question uses the ‘AIDS’ data from Exercise 10.01 that should be stored in the data frame
AIDS and the log data stored in the data frame AIDS2.

(iv) (a) Use cor to calculate the correlation coefficient for the AIDS data before and after
logging.

(b) Comment on the effect of logging the data on the linear correlation.
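
Parts (ii) and (iii) can be sketched as follows. So that the sketch is self-contained, the data frame is typed in from the table in Exercise 10.01 instead of being read from the file; the column names gestation and weight are an assumption about the file's headings.

```r
# Baby weights data typed in from the table (column names assumed).
baby <- data.frame(gestation = c(30, 32, 34, 36, 38, 40),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))

# (ii) Sums of squares and the Pearson coefficient built from them.
x <- baby$gestation
y <- baby$weight
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
r <- Sxy / sqrt(Sxx * Syy)

# (iii) Spearman: Pearson correlation of the ranks, and the d_i formula.
r_ranks <- cor(rank(x), rank(y))
d  <- rank(x) - rank(y)
n  <- length(x)
rs <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

print(c(pearson = r, spearman_ranks = r_ranks, spearman_formula = rs))
```

The d_i formula matches the rank correlation exactly only when there are no tied ranks, which is the case here.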


Exercise 10.03
This question uses the ‘baby weights’ data from Exercise 10.01 that should be stored in the data
frame baby.

(i) (a) Use cor.test and the Pearson correlation coefficient to test whether ρ = 0 .

(b) Extract the statistic for the test in part (i)(a).

(c) Use the statistic from part (i)(b) to obtain the p-value for the test in part (i)(a).

(ii) Use cor.test and Spearman’s correlation coefficient to test H0 : ρs = 0 vs H1 : ρs > 0 at
the 1% significance level.

(iii) Use cor.test to test if the true value of Kendall’s correlation coefficient is less than
zero.

(iv) Use Fisher’s transformation to test H0 : ρ = 0.9 vs H1 : ρ > 0.9, stating the
p-value.
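
Parts (i) and (iv) can be sketched as below, again with the data typed in from the table in Exercise 10.01 (column names assumed). For part (iv) the sketch uses the standard result that atanh(r) is approximately normal with mean atanh(ρ) and variance 1/(n − 3).

```r
baby <- data.frame(gestation = c(30, 32, 34, 36, 38, 40),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))
n <- nrow(baby)

# (i)(a) Two-sided Pearson test of rho = 0.
ct <- cor.test(baby$gestation, baby$weight, method = "pearson")

# (i)(b) Extract the t statistic, then (i)(c) rebuild the p-value on n - 2 df.
t_stat   <- unname(ct$statistic)
p_from_t <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# (iv) Fisher's transformation for H0: rho = 0.9 vs H1: rho > 0.9.
r <- unname(ct$estimate)
z <- (atanh(r) - atanh(0.9)) * sqrt(n - 3)
p_fisher <- pnorm(z, lower.tail = FALSE)

print(c(t = t_stat, p_pearson = p_from_t, p_fisher = p_fisher))
```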


Exercise 10.04
The built in data set iris contains measurements (in cm) of the variables sepal length, sepal width,
petal length and petal width, respectively, for 50 flowers from each of 3 species (Iris setosa,
versicolor, and virginica) of iris.

Sepal.Length Sepal.Width Petal.Length Petal.Width Species


5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
… … … … …
6.5 3.0 5.2 2.0 virginica
6.2 3.4 5.4 2.3 virginica
5.9 3.0 5.1 1.8 virginica

(i) Extract the four measurements for the setosa species only and store them in the 50 × 4
data frame, SDF.

(ii) Use plot to obtain a scattergraph of each pair of measurements for the setosa species.

(iii) Use pairs to produce the following scattergraph:

(iv) Comment on the relationship between Petal Width and the other measurements.


Exercise 10.05
This question uses the setosa iris data from Exercise 10.04 that should be stored in the data frame
SDF.

(i) Use the cor function to obtain the following correlation coefficients between all
pairs of the four variables:

(a) Pearson

(b) Spearman

(c) Kendall.

(ii) Obtain the Spearman correlation coefficient between the Sepal Length and the Petal
Length only.


Exercise 10.06
This question uses the setosa iris data from Exercise 10.04 that should be stored in the data frame
SDF.

(i) (a) Use cor.test and the Pearson correlation coefficient to test H0 : ρ = 0 vs
H1 : ρ > 0 between Sepal Length and Petal Length.

(b) Extract the statistic and the degrees of freedom for the test in part (i)(a).

(c) Use the statistic from part (i)(b) to obtain the p-value for the test in part (i)(a).

(ii) Use cor.test and Kendall’s correlation coefficient to test whether the true value of τ
is zero between Sepal Width and Petal Length.
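
A sketch of both parts follows; SDF is rebuilt here so that the block runs on its own, and the other names are illustrative. The setosa measurements contain ties, so exact = FALSE is passed to the Kendall test to request the normal approximation explicitly.

```r
# Setosa measurements, as in Exercise 10.04.
SDF <- iris[iris$Species == "setosa", 1:4]

# (i)(a) One-sided Pearson test between Sepal Length and Petal Length.
ct <- cor.test(SDF$Sepal.Length, SDF$Petal.Length,
               method = "pearson", alternative = "greater")

# (i)(b) Statistic and degrees of freedom; (i)(c) p-value rebuilt from them.
t_stat   <- unname(ct$statistic)
df       <- unname(ct$parameter)
p_from_t <- pt(t_stat, df = df, lower.tail = FALSE)

# (ii) Two-sided Kendall test between Sepal Width and Petal Length.
kt <- cor.test(SDF$Sepal.Width, SDF$Petal.Length,
               method = "kendall", exact = FALSE)

print(c(t = t_stat, df = df, p = p_from_t))
print(kt)
```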


Exercise 10.07
This question uses the ‘baby weights’ data from Exercise 10.01 that should be stored in the data
frame baby.

(i) Use prcomp to carry out PCA on the baby weights data and store it in the object pca1.

(ii) (a) Obtain the eigenvectors (matrix W) of each principal component in pca1.

(b) Explain what they represent.

(iii) (a) Obtain the principal components decomposition (matrix P) for the baby weight
data from pca1.

(b) Interpret what this represents.

(iv) (a) Obtain the centre and scale of pca1.

(b) Explain what these mean.

(v) (a) Obtain the percentage of the total variance explained by each principal
component using the summary of the prcomp function.

(b) Using part (a) determine which components, if any, should be dropped.

The scattergraph of the centred baby weight data is shown by the + signs below:

The circles show the points obtained when the second principal component is removed.

(vi) Comment on the fit.
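
Parts (i) to (v) mostly involve pulling components out of the object returned by prcomp, as sketched below. The data frame is typed in from the table in Exercise 10.01 (column names assumed), and the signs of the eigenvector columns can differ between platforms.

```r
baby <- data.frame(gestation = c(30, 32, 34, 36, 38, 40),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))

# (i) PCA on the centred (but unscaled) data, which is the prcomp default.
pca1 <- prcomp(baby)

# (ii) Eigenvectors (matrix W) and (iii) principal components (matrix P).
W <- pca1$rotation
P <- pca1$x

# (iv) Column means used for centring; scale is FALSE when no scaling was applied.
print(pca1$center)
print(pca1$scale)

# (v) Proportion of the total variance carried by each component.
print(summary(pca1)$importance)
```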


Exercise 10.08
This question uses the setosa iris data from Exercise 10.04 that should be stored in the data frame
SDF.

(i) Using scale or otherwise, obtain a scaled matrix of the 50 observations of 4 variables
which have zero mean and store in the matrix object X.

(ii) Use eigen to obtain the eigenvectors of $X^T X$ and store them in the matrix object W.

(iii) Obtain P = XW , the principal components decomposition P of X.

(iv) (a) Obtain the diagonal matrix $S = P^T P$.

(b) Calculate what percentage each of the variances in matrix S are of the total.

(c) State which principal component(s) should be dropped to simplify the
dimensionality.

(v) (a) Obtain the matrix P using the prcomp function.

(b) Obtain the percentages in part (iv)(b) using the summary of the prcomp
function.

(c) Draw a scree diagram using plot on the result of the prcomp function and hence
state which principal component(s) should be dropped to simplify the
dimensionality.

(vi) (a) Carry out PCA with scaling of the data using prcomp.

(b) Using the Kaiser Criterion state which principal component(s) should be dropped
to simplify the dimensionality.

(vii) (a) Using cbind and rep, or otherwise, obtain a new matrix P1 which has only the
first two principal components and vectors of zeroes for the removed
components.

(b) Obtain the reduced data set X1 using $X_1 = P_1 W^T$.

(c) Use pairs to plot X1 and compare to the original data.
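
Parts (i) to (v) can be sketched as follows. The sketch assumes that part (i) means centring only (zero mean, no rescaling), and it uses crossprod(X), which computes t(X) %*% X; names such as pct and pca are illustrative. Eigenvector columns are only defined up to sign, so P may differ from the prcomp output by a factor of −1 in some columns.

```r
# Setosa measurements, as in Exercise 10.04.
SDF <- iris[iris$Species == "setosa", 1:4]

# (i) Centre each column (assumption: no rescaling).
X <- scale(SDF, center = TRUE, scale = FALSE)

# (ii) Eigenvectors of X^T X.
W <- eigen(crossprod(X))$vectors

# (iii) Principal components decomposition P = XW.
P <- X %*% W

# (iv) S = P^T P is diagonal up to rounding; its diagonal gives the
#      share of the total variance carried by each component.
S   <- t(P) %*% P
pct <- 100 * diag(S) / sum(diag(S))
print(round(pct, 2))

# (v) Cross-check against prcomp.
pca <- prcomp(SDF)
pct_prcomp <- 100 * pca$sdev^2 / sum(pca$sdev^2)
print(round(pct_prcomp, 2))
```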


10
Data Analysis
Answers


Exercise 10.01
(i) (c) Apart from the result at 32 weeks it is nearly a perfect straight line so there is a
very strong linear relationship.

(ii) (b) There appears to be an exponential, not a linear, relationship.

(iii) (c) It’s much more linear but it still appears to have some curvature.


Exercise 10.02
(i) (a) 0.984336

(b) 1

(c) 1

(ii) (c) 0.984336

(iii) (a) 1

(iv) (a) 0.9603627 and 0.9725912

(b) The correlation coefficient has increased – so logging the data has improved the
linearity.


Exercise 10.03
(i) (a) p-value = 0.0003661 reject H0 and conclude ρ ≠ 0 .

(b) 11.16642

(ii) p-value = 0.001389 reject H0 and conclude that ρ s > 0 .

(iii) p-value = 1 do NOT reject H0 and conclude Kendall’s correlation coefficient is not less
than zero.

(iv) p-value is 0.050186, do NOT reject H0 and conclude that ρ = 0.9 .


Exercise 10.04
(ii)

(iv) It looks like there might be weak positive correlation between Petal Width and all the
other variables. For example there are few values in the top left quadrant.


Exercise 10.05
(i) (a)

             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000   0.7425467    0.2671758   0.2780984
Sepal.Width     0.7425467   1.0000000    0.1777000   0.2327520
Petal.Length    0.2671758   0.1777000    1.0000000   0.3316300
Petal.Width     0.2780984   0.2327520    0.3316300   1.0000000

(b)

             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000   0.7553375    0.2788849   0.2994989
Sepal.Width     0.7553375   1.0000000    0.1799110   0.2865359
Petal.Length    0.2788849   0.1799110    1.0000000   0.2711414
Petal.Width     0.2994989   0.2865359    0.2711414   1.0000000

(c)

             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000   0.5972530    0.2173273   0.2311459
Sepal.Width     0.5972530   1.0000000    0.1426453   0.2342674
Petal.Length    0.2173273   0.1426453    1.0000000   0.2217029
Petal.Width     0.2311459   0.2342674    0.2217029   1.0000000

(ii) 0.2788849


Exercise 10.06
(i) (a) p-value = 0.03035, reject H0 and conclude that there is an increasing monotonic
relationship between Sepal Length and Petal Length.

(b) 1.920876, 48

(ii) p-value = 0.1876, do not reject H0 and conclude that there is no monotonic relationship
between Sepal Width and Petal Length.


Exercise 10.07
(ii) (a)

PC1 PC2
gestation 0.9797143 -0.2003992
weight 0.2003992 0.9797143

(b) These are the orthogonal vectors of the new co-ordinate system (which is a
rotation of the old co-ordinate system).

(iii) (a)

PC1 PC2
[1,] -5.0889509 0.07126724
[2,] -3.1094823 -0.23155967
[3,] -0.9897343 0.15141345
[4,] 1.0298141 0.04452941
[5,] 3.0694025 0.03561680
[6,] 5.0889509 -0.07126724

(b) Matrix P expresses the 6 points in terms of the two new principal components.
This is effectively a new co-ordinate system with the most important one across
the horizontal axis.

(iv) (a) Center

gestation weight
35.00 2.55

Scale: FALSE

(b) prcomp subtracted 35 from gestation and 2.55 from weight to make the means
zero. prcomp did not change the scale, i.e. it did not divide the data to standardise the variances.

(v) (a) 99.88% and 0.12%.

(b) We should clearly drop the second component.

(vi) We have essentially removed one dimension and so left a straight line which is the trend
line. We can see it is a good fit to most of the points.


Exercise 10.08
(ii)
[,1] [,2] [,3] [,4]
[1,] 0.66907840 0.5978840 0.4399628 -0.03607712
[2,] 0.73414783 -0.6206734 -0.2746075 -0.01955027
[3,] 0.09654390 0.4900556 -0.8324495 -0.23990129
[4,] 0.06356359 0.1309379 -0.1950675 0.96992969

(iii) The first six rows are:

          [,1]         [,2]         [,3]         [,4]
1  0.106842367 -0.024893980  0.082169737 -0.034541755
2 -0.394047228  0.165865927  0.131480917 -0.017551195
3 -0.390687734 -0.126851118  0.071811819  0.009744303
4 -0.511701577 -0.026561059 -0.111213611 -0.032673214
5  0.113349309 -0.146749722  0.010712713 -0.032889070
6  0.642900908  0.079406116 -0.184432770  0.068830552

(iv) (a)

        [,1]  [,2]  [,3]   [,4]
[1,] 11.5863 0.000 0.000 0.0000
[2,]  0.0000 1.809 0.000 0.0000
[3,]  0.0000 0.000 1.313 0.0000
[4,]  0.0000 0.000 0.000 0.4426

(b) 76.47%, 11.94%, 8.67% and 2.92%

(c) The first two PCs explain 88.41% and the first three PCs explain 97.08%, which is more
than 90%. So we would probably drop just the fourth PC.

(v) (a) The first six rows are:

           PC1          PC2          PC3          PC4
1 -0.106842367 -0.024893980  0.082169737 -0.034541755
2  0.394047228  0.165865927  0.131480917 -0.017551195
3  0.390687734 -0.126851118  0.071811819  0.009744303
4  0.511701577 -0.026561059 -0.111213611 -0.032673214
5 -0.113349309 -0.146749722  0.010712713 -0.032889070
6 -0.642900908  0.079406116 -0.184432770  0.068830552

Note, if you used matrix P rather than the output of prcomp (as asked for) you
will have the opposite signs for PC1.


(c)

It seems to level off after just the first PC.

(vi) (b)

                      PC1    PC2    PC3     PC4
Standard deviation 1.4348 1.0110 0.8172 0.50146

The Kaiser Criterion keeps only those components whose variance (or, equivalently, standard
deviation) for the scaled data is greater than 1. Hence, it would suggest keeping only the
first two PCs.

(vii) (b) The first six rows of X1 are:

          [,1]         [,2]         [,3]          [,4]
1 -0.086369633 -0.062987060 -0.022514413 -0.0100508505
2  0.362817076  0.186340345  0.119326381  0.0467651959
3  0.185558470  0.365555668 -0.024445583  0.0082238963
4  0.326488042  0.392150345  0.036385270  0.0290477419
5 -0.163578887  0.007868503 -0.082858706 -0.0264199914
6 -0.382675466 -0.521269571 -0.023154749 -0.0304678215


(c) The reduced data gives:

The original data is:

It captures sepal length and width relationship well but not the other
relationships.


11
Linear Regression
Exercises


Data requirements
These exercises require the following data files:

• baby weights.txt
• growth.csv


Exercise 11.01
There is currently no Exercise 11.01.

However, you may wish to revisit the exercises from the data analysis chapter if it has been a
while since you looked at them. The exercises in this chapter will assume you are able to recall
and use the R code from that chapter.


Exercise 11.02
A new computerised ultrasound scanning technique has enabled doctors to monitor the weights
of unborn babies. The table below shows the estimated weights for one particular baby at
fortnightly intervals during the pregnancy.

Gestation period (weeks)      30    32    34    36    38    40
Estimated baby weight (kg)   1.6   1.7   2.5   2.8   3.2   3.5

This data is contained in the file: ‘baby weights.txt’

(i) (a) Load the data in the file ‘baby weights.txt’, and store it in the data frame baby.

(b) Fit a linear regression model, model1, of weight on gestation period.

(ii) (a) Obtain the slope and intercept parameters.

(b) Plot a labelled scattergraph of the data and add a red dashed regression line onto
your scatterplot.

(iii) Obtain the fitted values:

(a) by extracting them from model1

(b) using the fitted command

(c) using the predict command.

(iv) Add blue points to the scatterplot to show the fitted values.

(v) Obtain the expected baby’s weight at 42 weeks (assuming it hasn’t been born by then):

(a) from first principles by extracting the coefficients from model1

(b) using the predict function.
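
The steps in this exercise could be sketched as follows. This is one possible approach rather than a model solution: the data are typed in from the table above instead of being read from the file, and the column names gestation and weight are assumptions about how the file is laid out.

```r
# Data typed in from the table above. With the file to hand, something like
# read.table("baby weights.txt", header = TRUE) could be used instead
# (assuming the file has a header row).
baby <- data.frame(gestation = c(30, 32, 34, 36, 38, 40),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))

# (i)(b) Regression of weight on gestation period
model1 <- lm(weight ~ gestation, data = baby)

# (ii)(a) Intercept and slope
coef(model1)

# (ii)(b) Labelled scatterplot with a red dashed regression line
plot(baby$gestation, baby$weight,
     xlab = "Gestation period (weeks)",
     ylab = "Estimated baby weight (kg)",
     main = "Baby weight against gestation period")
abline(model1, col = "red", lty = 2)

# (iii) Three equivalent ways to get the fitted values
fit_a <- model1$fitted.values
fit_b <- fitted(model1)
fit_c <- predict(model1)

# (iv) Fitted values as blue points
points(baby$gestation, fit_b, col = "blue", pch = 16)

# (v) Expected weight at 42 weeks
w42_manual  <- coef(model1)[["(Intercept)"]] + coef(model1)[["gestation"]] * 42
w42_predict <- predict(model1, newdata = data.frame(gestation = 42))
```

Note that predict needs a data frame whose column name matches the explanatory variable used in the model formula.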


Exercise 11.03
This question uses the ‘baby weights’ linear regression model, model1, of weight on gestation
period, created in an earlier exercise.

(i) Obtain the total sum of squares in the baby weights model together with its split between
the residual sum of squares and the regression sum of squares:

(a) using the anova command

(b) from first principles using the functions sum, mean, fitted and residuals.

(ii) Obtain the coefficient of determination, $R^2$:

(a) from the linear regression model, model1

(b) from the correlation coefficient

(c) by calculation from the values in the ANOVA table.

(iii) Obtain the correlation coefficient from the extracted coefficient of determination.
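
A minimal sketch of the sums of squares and the routes to the coefficient of determination, assuming the baby data frame has columns named gestation and weight (the data are re-entered from the table in Exercise 11.02 so that the block runs on its own). The object names SSTOT, SSRES and SSREG are illustrative.

```r
baby <- data.frame(gestation = c(30, 32, 34, 36, 38, 40),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))
model1 <- lm(weight ~ gestation, data = baby)

# (i)(a) ANOVA table: the "Sum Sq" column holds SSREG then SSRES
anova(model1)

# (i)(b) First principles
SSTOT <- sum((baby$weight - mean(baby$weight))^2)
SSRES <- sum(residuals(model1)^2)
SSREG <- sum((fitted(model1) - mean(baby$weight))^2)

# (ii) Three routes to the coefficient of determination
R2_model <- summary(model1)$r.squared
R2_cor   <- cor(baby$gestation, baby$weight)^2
R2_anova <- SSREG / SSTOT

# (iii) Correlation coefficient from R^2, taking the sign of the slope
r <- sign(coef(model1)[["gestation"]]) * sqrt(R2_model)
```

The square root alone only gives the magnitude of the correlation coefficient, which is why the sign of the slope is attached in the last step.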


Exercise 11.04
This question uses the ‘baby weights’ linear regression model, model1, of weight on gestation
period, created in an earlier exercise.

(i) Obtain the statistic and p-value for a test of H0 : β = 0 vs H1 : β ≠ 0 .

(ii) Use confint to:

(a) obtain a 99% confidence interval for the slope parameter

(b) test, at the 5% level, whether β = 0.24 .

(iii) Extract the estimated value of beta, the standard error of beta and the degrees of
freedom and store them in the objects b, se and dof.

(iv) Using the objects created in part (iii), use a first principles approach to:

(a) obtain a 90% confidence interval for the slope parameter.

(b) obtain the statistic and p-value for a test of H0 : β = 0.25 vs H1 : β < 0.25 .

(c) obtain the statistic and p-value for a test of H0 : β = 0.18 vs H1 : β ≠ 0.18 .
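
The first principles calculations in parts (iii) and (iv) could be sketched as below, again assuming columns named gestation and weight. The names ci90, t_b, p_b, t_c and p_c are illustrative only.

```r
baby <- data.frame(gestation = c(30, 32, 34, 36, 38, 40),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))
model1 <- lm(weight ~ gestation, data = baby)

# (i) t statistic and p-value for H0: beta = 0 are in the coefficient table
coefs <- summary(model1)$coefficients
coefs["gestation", c("t value", "Pr(>|t|)")]

# (ii) Confidence intervals for the slope
confint(model1, parm = "gestation", level = 0.99)
confint(model1, parm = "gestation", level = 0.95)  # check whether 0.24 lies inside

# (iii) Extract the estimate, its standard error and the degrees of freedom
b   <- coefs["gestation", "Estimate"]
se  <- coefs["gestation", "Std. Error"]
dof <- model1$df.residual

# (iv)(a) 90% confidence interval: b +/- t(0.95; dof) * se
ci90 <- b + c(-1, 1) * qt(0.95, dof) * se

# (iv)(b) One-sided test of H0: beta = 0.25 vs H1: beta < 0.25
t_b <- (b - 0.25) / se
p_b <- pt(t_b, dof)

# (iv)(c) Two-sided test of H0: beta = 0.18 vs H1: beta != 0.18
t_c <- (b - 0.18) / se
p_c <- 2 * pt(abs(t_c), dof, lower.tail = FALSE)
```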


Exercise 11.05
This question uses the ‘baby weights’ linear regression model, model1, of weight on gestation
period, created in an earlier exercise.

(i) Obtain the results of an F-test to test the ‘no linear relationship’ hypothesis using the:

(a) anova command

(b) summary command.

(ii) Calculate the F statistic and p-value from first principles by extracting the mean sum of
squares and degrees of freedom from the ANOVA table.

(iii) Obtain a 95% confidence interval for the error variance, $\sigma^2$, from first principles.
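
One way to carry out these calculations is sketched below, assuming columns named gestation and weight. The interval for the error variance uses the result that SSRES divided by the error variance follows a chi-square distribution with n − 2 degrees of freedom; the object names are illustrative.

```r
baby <- data.frame(gestation = c(30, 32, 34, 36, 38, 40),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))
model1 <- lm(weight ~ gestation, data = baby)

# (i) F-test from the ANOVA table and from the model summary
tab <- anova(model1)
tab
summary(model1)$fstatistic

# (ii) F statistic and p-value from the mean sums of squares
MSREG  <- tab["gestation", "Mean Sq"]
MSRES  <- tab["Residuals", "Mean Sq"]
F_stat <- MSREG / MSRES
p_val  <- pf(F_stat, tab["gestation", "Df"], tab["Residuals", "Df"],
             lower.tail = FALSE)

# (iii) 95% confidence interval for the error variance
SSRES <- tab["Residuals", "Sum Sq"]
dof   <- tab["Residuals", "Df"]
ci_var <- SSRES / qchisq(c(0.975, 0.025), dof)
```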


Exercise 11.06
This question uses the ‘baby weights’ linear regression model, model1, of weight on gestation
period, created in an earlier exercise.

(i) Estimate the mean weight of a baby at 33 weeks:

(a) from first principles by extracting the coefficients from model1.

(b) using the predict function.

(ii) Obtain a 90% confidence interval for the:

(a) mean weight of a baby at 33 weeks.

(b) weight of an individual baby at 33 weeks.

(iii) Obtain the estimated mean weight of a baby at 0 weeks:

(a) from the model1 parameters

(b) using the predict function,

and comment on the value obtained.

(iv) Obtain a 99% confidence interval for the mean weight of a baby at 0 weeks:

(a) using the confint function

(b) using the predict function.

(v) In one go, use the predict function to obtain:

(a) the mean weights of babies at 20, 21, 22, 23, 24 weeks

(b) 95% confidence intervals for the mean weight of a baby at 20, 21, 22, 23, 24
weeks.
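
A sketch of how predict could be used throughout this exercise, assuming columns named gestation and weight. The interval argument switches between an interval for the mean response ("confidence") and one for an individual response ("prediction"); the object names are illustrative.

```r
baby <- data.frame(gestation = c(30, 32, 34, 36, 38, 40),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))
model1 <- lm(weight ~ gestation, data = baby)

# (i) Mean weight at 33 weeks
new33 <- data.frame(gestation = 33)
w33_manual  <- sum(coef(model1) * c(1, 33))
w33_predict <- predict(model1, newdata = new33)

# (ii) 90% intervals for the mean and for an individual weight
ci_mean  <- predict(model1, new33, interval = "confidence", level = 0.90)
ci_indiv <- predict(model1, new33, interval = "prediction", level = 0.90)

# (iii), (iv) At 0 weeks the mean response is just the intercept
new0 <- data.frame(gestation = 0)
ci0_confint <- confint(model1, parm = "(Intercept)", level = 0.99)
ci0_predict <- predict(model1, new0, interval = "confidence", level = 0.99)

# (v) Several gestation periods in one call
new20_24  <- data.frame(gestation = 20:24)
pred20_24 <- predict(model1, new20_24, interval = "confidence", level = 0.95)
```

The prediction interval is wider than the confidence interval because it also allows for the variability of an individual observation about the mean.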


Exercise 11.07
This question uses the ‘baby weights’ linear regression model, model1, of weight on gestation
period, created in an earlier exercise.

(i) Obtain the residuals for the regression model:

(a) from first principles using the fitted command

(b) using the residuals function.

(ii) (a) Obtain a plot of the residuals against the fitted values.

(b) Comment on the constancy of the variance and whether a linear model is
appropriate.

(iii) (a) Obtain a Q-Q plot of the residuals.

(b) Comment on the normality assumption.

(iv) Examine the final two graphs obtained by plot(model1) and comment.
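
The residual checks could be sketched as follows, assuming columns named gestation and weight. By default plot(model1) draws four diagnostic graphs; the which argument is used here to pick out the last two (scale-location and residuals against leverage) individually.

```r
baby <- data.frame(gestation = c(30, 32, 34, 36, 38, 40),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))
model1 <- lm(weight ~ gestation, data = baby)

# (i) Residuals = observed minus fitted
res_manual <- baby$weight - fitted(model1)
res_fn     <- residuals(model1)

# (ii)(a) Residuals against fitted values, with a reference line at zero
plot(fitted(model1), res_fn,
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# (iii)(a) Q-Q plot of the residuals with a reference line
qqnorm(res_fn)
qqline(res_fn)

# (iv) The final two default diagnostic plots
plot(model1, which = 3)  # scale-location
plot(model1, which = 5)  # residuals vs leverage
```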


Exercise 11.08
Part (i) of this question uses the ‘baby weights’ linear regression model, model1, of weight on
gestation period, created in an earlier exercise.

(i) (a) Obtain a new linear regression model, model2, based on the data without the
second data point (gestation of 32 weeks).

(b) By examining the new value of $R^2$, comment on the fit of model2 compared to
that of model1, which had $R^2 = 0.9689$.

The table below shows the results from a growth experiment:

x     1     2     3     4     5     6     7     8     9    10
y  0.33  0.51  0.75  1.16  1.90  2.59  5.14  7.39  11.3  17.4

This data is contained in the csv file: ‘growth’

(ii) It is thought that a suitable model is $y_i = a e^{b x_i + e_i}$.

(a) Load the csv file and store it in the data frame growth.

(b) Obtain a scatterplot of y vs x and comment whether a linear model is appropriate
for these data.

(c) Obtain a scatterplot of ln y vs x and comment whether a linear model is
appropriate for this transformation.

(iii) Fit a linear regression model, model3, of $\ln y_i$ on $x_i$.

(iv) (a) Obtain estimates for the slope and intercept parameters for model3.

(b) Add a red dashed regression line to your scatterplot of lny vs x from part (ii)(c).

(v) (a) Use part (iv)(a) to obtain estimates for a and b .

(b) Re-plot the scatterplot of y vs x and this time add blue points to the scatterplot
to show the fitted values of y using model3.

(c) Add a dashed red regression curve that passes through the fitted points.

(vi) Obtain a 95% confidence interval for the mean value of y when x = 8.5 .
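
A sketch of both halves of this exercise is given below. It is one possible approach under stated assumptions: both data sets are typed in from the tables rather than read from the files, the column names are assumed, and the growth values in the table may be rounded, so results can differ slightly from those obtained with the csv file. The interval in part (vi) is obtained here by back-transforming the interval for the mean of ln y.

```r
# (i) Refit the baby weights model without the second data point
baby <- data.frame(gestation = c(30, 32, 34, 36, 38, 40),
                   weight    = c(1.6, 1.7, 2.5, 2.8, 3.2, 3.5))
model2 <- lm(weight ~ gestation, data = baby[-2, ])
R2_model2 <- summary(model2)$r.squared

# (ii) Growth data typed in from the table
growth <- data.frame(x = 1:10,
                     y = c(0.33, 0.51, 0.75, 1.16, 1.90,
                           2.59, 5.14, 7.39, 11.3, 17.4))
plot(growth$x, log(growth$y), xlab = "x", ylab = "ln y")

# (iii), (iv) Linear model for ln y, with its line added to the ln y plot
model3 <- lm(log(y) ~ x, data = growth)
abline(model3, col = "red", lty = 2)

# (v)(a) ln y = ln a + b x, so a = exp(intercept) and b = slope
a_hat <- exp(coef(model3)[["(Intercept)"]])
b_hat <- coef(model3)[["x"]]

# (v)(b), (c) Original scale: fitted points and fitted curve
plot(growth$x, growth$y, xlab = "x", ylab = "y")
points(growth$x, exp(fitted(model3)), col = "blue", pch = 16)
curve(a_hat * exp(b_hat * x), from = 1, to = 10,
      add = TRUE, col = "red", lty = 2)

# (vi) Interval for ln y at x = 8.5, transformed back to the y scale
ci_y <- exp(predict(model3, data.frame(x = 8.5),
                    interval = "confidence", level = 0.95))
```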


11
Linear Regression
Answers


Exercise 11.01
There is currently no Exercise 11.01.


Exercise 11.02
(ii) (a) slope parameter = 0.2043 , intercept parameter = −4.6000 .

(b) See part (iv) but ignore the points on the regression line.

(iii)
1 2 3 4 5 6
1.528571 1.937143 2.345714 2.754286 3.162857 3.571429

(iv)

(v) 3.98 kg


Exercise 11.03
(i) SSTOT = 3.015 , SSRES = 0.09371 and SSREG = 2.92129 .

(ii) R2 = 0.9689173

(iii) r = 0.984336


Exercise 11.04
(i) Statistic = 11.17 and p-value = 0.000366, hence reject H0 and conclude β ≠ 0 .

(ii) (a) (0.120, 0.289)

(b) Since the 95% confidence interval, (0.153, 0.255), contains β = 0.24 , we do not
reject H0 .

(iv) (a) (0.165, 0.243)

(b) Statistic = −2.499 and p-value = 0.0334, hence reject H0 and conclude β < 0.25 .

(c) Statistic = 1.327 and p-value = 0.2550457, hence do not reject H0 and conclude
β = 0.18 .


Exercise 11.05

(i) F statistic = 124.69 and p-value = 0.0003661, hence reject H0 and conclude there is a
linear relationship between gestation and weight.

(iii) (0.00841, 0.193)


Exercise 11.06
(i) 2.141429 kg

(ii) (a) (1.987, 2.296)

(b) (1.780, 2.502)

(iii) −4.6 kg, clearly the linear relationship does not continue backwards to conception.

(iv) (−7.562, −1.638)

(v) (a) −0.514, −0.310, −0.106, 0.0986, 0.303

(b) (−1.30, 0.267) , (−1.04, 0.422) , (−0.788, 0.577) , (−0.535, 0.732) , (−0.282, 0.888)


Exercise 11.07
(i)
          1           2          3          4          5           6
 0.07142857 -0.23714286 0.15428571 0.04571429 0.03714286 -0.07142857

(ii) (a)

(b) It’s hard to tell with so few values – but point 2 looks like an outlier. If we include
point 2 then there is no discernible pattern, however if we omit point 2 then there
could be a possible pattern.

The variance appears constant (in the sense it's not increasing).

(iii) (a)


(b) It’s hard to tell with so few values – but point 2 looks like an outlier. If we don’t
include that value then it seems OK.

If we include point 2, then it appears to make a ‘banana’ shape which indicates
skewness.

(iv)

Again, it’s hard to tell with so few values but it looks like constant variance.

The diagram shows that points 1 and 6 have the most influence. However, point 2 is a
combination of an outlier and high influence and therefore should be removed.


Exercise 11.08
(i) (b) $R^2 = 0.9935$, which is greater than the 0.9689 for model1; hence model2 has the better fit.

(ii) (b)

(c)


(iv) (a) slope parameter = 0.4467 , intercept parameter = −1.5998 .

(b)

(v) (a) a = 0.2019412 and b = 0.4466943 .

(b)(c)

(vi) (8.41, 9.63)


11b
Multivariate Linear
Regression
Exercises


Data requirements
These exercises do not require you to upload any data files.


Exercise 11b.01
There is currently no Exercise 11b.01.

However, you may wish to revisit the exercises from the data analysis and linear regression
chapters if it has been a while since you looked at them. The exercises in this chapter will assume
you are able to recall and use the R code from those chapters.


Exercise 11b.02
The built in data set iris contains measurements (in cm) of the variables sepal length, sepal
width, petal length and petal width, respectively, for 50 flowers from each of 3 species (Iris
setosa, versicolor, and virginica) of iris.

Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
         5.1          3.5           1.4          0.2  setosa
         4.9          3.0           1.4          0.2  setosa
           …            …             …            …  …
         6.2          3.4           5.4          2.3  virginica
         5.9          3.0           5.1          1.8  virginica

(i) Extract the four measurements for the versicolor species only and store them in the 50×4
data frame, VDF.

A scattergraph of the VDF shows positive correlation between Petal.Width and the other
variables:

(ii) Using the versicolor iris data fit a linear regression model, model2, with Petal.Width as
the response variable and Sepal.Length, Sepal.Width and Petal.Length as explanatory
variables:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$


(iii) Obtain the slope and intercept parameters.

(iv) Obtain the fitted Petal Width values:

(a) by extracting them from model2

(b) using the fitted command

(c) using the predict command.

(v) Obtain the expected petal width on a versicolor iris with sepal length 5.1cm, sepal width
3.5cm and petal length 1.4cm:

(a) from first principles by extracting the coefficients from model2.

(b) using the predict function.
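
A minimal sketch of this exercise using the built-in iris data set. It shows one possible way to subset the data and fit the model; the names new_iris, pw_manual and pw_predict are illustrative.

```r
# (i) The four measurements for the versicolor species only
VDF <- iris[iris$Species == "versicolor", 1:4]

# (ii) Multiple linear regression with three explanatory variables
model2 <- lm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length,
             data = VDF)

# (iii) Intercept and slope parameters
coef(model2)

# (iv) Three equivalent ways to get the fitted values
fit_a <- model2$fitted.values
fit_b <- fitted(model2)
fit_c <- predict(model2)

# (v) Expected petal width for the given measurements
new_iris  <- data.frame(Sepal.Length = 5.1, Sepal.Width = 3.5,
                        Petal.Length = 1.4)
pw_manual  <- sum(coef(model2) * c(1, 5.1, 3.5, 1.4))
pw_predict <- predict(model2, newdata = new_iris)
```

The vector c(1, 5.1, 3.5, 1.4) must list the values in the same order as the coefficients, with the leading 1 multiplying the intercept.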


Exercise 11b.03
This question uses the versicolor iris linear regression model, model2, with Petal.Width ($y$) as
the response variable and Sepal.Length ($x_1$), Sepal.Width ($x_2$) and Petal.Length ($x_3$) as
explanatory variables:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

(i) Obtain the total sum of squares in model2 together with its split between the residual
sum of squares and the regression sums of squares using the anova command.

(ii) Obtain the coefficient of determination, $R^2$:

(a) from the linear regression model, model2

(b) by calculation from the values in the ANOVA table.

(iii) Obtain the adjusted coefficient of determination, $R^2_{adj}$:

(a) from the linear regression model, model2

(b) by calculation from the values in the ANOVA table.
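
The calculations from the ANOVA table could be sketched as below. With several explanatory variables, anova reports a separate sum of squares for each variable as it is added, so the regression sum of squares is the total of those rows. The adjusted value uses 1 − (SSRES/(n − k − 1))/(SSTOT/(n − 1)) with k explanatory variables; the object names are illustrative.

```r
VDF <- iris[iris$Species == "versicolor", 1:4]
model2 <- lm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length,
             data = VDF)

# (i) Sums of squares from the ANOVA table
tab   <- anova(model2)
SSRES <- tab["Residuals", "Sum Sq"]
SSREG <- sum(tab[rownames(tab) != "Residuals", "Sum Sq"])
SSTOT <- SSREG + SSRES

# (ii) Coefficient of determination
R2_model <- summary(model2)$r.squared
R2_anova <- SSREG / SSTOT

# (iii) Adjusted coefficient of determination
n <- nrow(VDF)   # number of observations
k <- 3           # number of explanatory variables
R2adj_model <- summary(model2)$adj.r.squared
R2adj_anova <- 1 - (SSRES / (n - k - 1)) / (SSTOT / (n - 1))
```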


Exercise 11b.04
This question uses the versicolor iris linear regression model, model2, with Petal.Width ($y$) as
the response variable and Sepal.Length ($x_1$), Sepal.Width ($x_2$) and Petal.Length ($x_3$) as
explanatory variables:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

(i) State the statistic and p-value for a test of H0 : β1 = 0 vs H1 : β1 ≠ 0 .

(ii) Use confint to:

(a) obtain a 90% confidence interval for β2

(b) test, at the 5% level, whether β3 = 0.24 .

(iii) Extract the value of β2 , the standard error of β2 and the degrees of freedom and store
them in the objects b2, se2 and dof.

(iv) Using the objects created in part (iii), use a first principles approach to:

(a) obtain a 90% confidence interval for β2 and compare to part (ii)(a).

(b) obtain the statistic and p-value for a test of H0 : β2 = 0.3 vs H1 : β2 < 0.3 .


Exercise 11b.05
This question uses the versicolor iris linear regression model, model2, with Petal.Width ($y$) as
the response variable and Sepal.Length ($x_1$), Sepal.Width ($x_2$) and Petal.Length ($x_3$) as
explanatory variables:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

(i) Carry out an F-test to test $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ using the summary command, stating
the test statistic and p-value clearly.

(ii) Obtain a 95% confidence interval for the error variance, $\sigma^2$, from first principles.


Exercise 11b.06
This question uses the versicolor iris linear regression model, model2, with Petal.Width ($y$) as
the response variable and Sepal.Length ($x_1$), Sepal.Width ($x_2$) and Petal.Length ($x_3$) as
explanatory variables:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

(i) Obtain the expected petal width on a versicolor iris with sepal length 5.94cm, sepal
width 2.77cm and petal length 4.26cm.

(ii) Obtain a 90% confidence interval for the:

(a) mean petal width for versicolor irises with sepal length 5.94cm, sepal
width 2.77cm and petal length 4.26cm.

(b) petal width of an individual versicolor iris with sepal length 5.94cm, sepal
width 2.77cm and petal length 4.26cm.

(iii) Obtain the estimated mean value of α :

(a) from the model2 parameters

(b) using the predict function.

(iv) Obtain a 99% confidence interval for the mean value of α :

(a) using the confint function

(b) using the predict function.


Exercise 11b.07
This question uses the versicolor iris linear regression model, model2, with Petal.Width ($y$) as
the response variable and Sepal.Length ($x_1$), Sepal.Width ($x_2$) and Petal.Length ($x_3$) as
explanatory variables:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

(i) Obtain the residuals for the regression model:

(a) from first principles using the fitted command

(b) using the residuals function.

(ii) (a) Obtain a plot of the residuals against the fitted values.

(b) Comment on the constancy of the variance and whether a linear model is
appropriate.

(iii) (a) Obtain a Q-Q plot of the residuals.

(b) Comment on the normality assumption.

(iv) Examine the final two graphs obtained by plot(model2) and comment.


Exercise 11b.08
This question uses the versicolor iris data from Exercise 11b.02 which should be stored in the data
frame VDF.

We are fitting multiple linear regression models with Petal.Width as the response variable and a
combination of Sepal.Length, Sepal.Width and Petal.Length as explanatory variables.

Forward selection

(i) Fit the null regression model, fit0, to the Petal.Width data.

(ii) Obtain the (Pearson) linear correlation coefficient between all the pairs of variables.

(iii) Fit a linear regression model, fit1, with Petal.Width as the response variable and the
variable with the greatest correlation with Petal.Width as the explanatory variable.

(iv) (a) Fit a linear regression model, fit2, with Petal.Width as the response variable
and the variable from part (iii) and the variable with the next highest correlation
with Petal.Width as the two explanatory variables.

(b) Compare the adjusted R2 of fit1 and fit2 and comment.

(v) (a) Fit a linear regression model, fit3, with Petal.Width as the response variable
and the variables from part (iv) plus the last variable as the explanatory variables.

(b) Compare the adjusted R2 of fit2 and fit3 and comment.

(vi) Comment on the output of the fit3 model and the results of the ANOVA output.

Backward selection

Start with the versicolor iris linear regression model, model2, with Petal.Width ($y$) as the response
variable and Sepal.Length ($x_1$), Sepal.Width ($x_2$) and Petal.Length ($x_3$) as explanatory variables:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

(vii) (a) Update the model to create model2b by removing the variable whose $\beta_j$ is not
significantly different from zero.

(b) Compare the adjusted R2 of model2 and model2b and comment.
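
Forward and backward selection could be sketched with update as shown below. The order in which variables are added follows the correlations with Petal.Width (Petal.Length first, then Sepal.Width, then Sepal.Length); the helper adj_r2 is a hypothetical convenience function, not something the exercise requires.

```r
VDF <- iris[iris$Species == "versicolor", 1:4]

# Hypothetical helper: adjusted R^2 of a fitted model
adj_r2 <- function(model) summary(model)$adj.r.squared

# Forward selection
fit0 <- lm(Petal.Width ~ 1, data = VDF)        # null model (intercept only)
cor(VDF)                                       # correlations between all pairs
fit1 <- update(fit0, . ~ . + Petal.Length)
fit2 <- update(fit1, . ~ . + Sepal.Width)
fit3 <- update(fit2, . ~ . + Sepal.Length)
forward <- c(fit1 = adj_r2(fit1), fit2 = adj_r2(fit2), fit3 = adj_r2(fit3))

# Significance of each parameter, and ANOVA comparison of the nested models
summary(fit3)
anova(fit0, fit1, fit2, fit3)

# Backward selection: start with all three variables and remove one
model2  <- lm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length,
              data = VDF)
model2b <- update(model2, . ~ . - Sepal.Length)
backward <- c(model2 = adj_r2(model2), model2b = adj_r2(model2b))
```

In the update formulas, the dot stands for whatever was on that side of the formula in the model being updated, so only the change needs to be written out.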


Exercise 11b.09
This question uses the versicolor iris linear regression model, fit3, from Exercise 11b.08, with
Petal.Width ($y$) as the response variable and Petal.Length ($x_1$), Sepal.Width ($x_2$) and
Sepal.Length ($x_3$) as explanatory variables:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

Forward selection

(i) (a) Fit a linear regression model, fit4, with Petal.Width as the response variable,
by adding to fit3 a two-way interaction term between the two most significant variables.

(b) Compare the adjusted R2 of fit3 and fit4. Comment on these values and the
results of the ANOVA output.

(ii) Create two further models, fit5 and fit6, each containing the three explanatory
variables from fit3 plus a single two-way interaction term. Show that only one of them
improves the value of the adjusted R2 but the ANOVA output shows that there is no
significant improvement in fit.

(iii) Explain why we would not consider adding a three-way interaction term in this case.

Backward selection

Start with the versicolor iris linear regression model, fitA, with Petal.Width (y) as the response
variable and Sepal.Length (x1 ) , Sepal.Width (x2 ) and Petal.Length (x3 ) as explanatory variables,
together with all two and three way interactions.

(iv) Update the model fitA to create fitB, fitC, etc by removing:

(a) any insignificant three way interaction terms

(b) any insignificant two way interaction terms

(c) any insignificant main effect terms.

Each time compare only the adjusted R2 of the models to ensure only those models
which improve the fit are kept.

(v) Comment on the limitations of only using adjusted R2 as a basis for model fit.
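
Interaction terms could be added and removed as sketched below. In an R model formula, a:b denotes the interaction between a and b alone, whereas a*b expands to both main effects plus their interaction. Only the first steps of each selection procedure are shown, and the helper adj_r2 is hypothetical.

```r
VDF <- iris[iris$Species == "versicolor", 1:4]

# Hypothetical helper: adjusted R^2 of a fitted model
adj_r2 <- function(model) summary(model)$adj.r.squared

# Forward selection: add one two-way interaction to the main effects model
fit3 <- lm(Petal.Width ~ Petal.Length + Sepal.Width + Sepal.Length,
           data = VDF)
fit4 <- update(fit3, . ~ . + Petal.Length:Sepal.Width)
c(fit3 = adj_r2(fit3), fit4 = adj_r2(fit4))
anova(fit3, fit4)   # tests whether the extra term significantly improves the fit

# Backward selection: start with all two-way and three-way interactions
fitA <- lm(Petal.Width ~ Sepal.Length * Sepal.Width * Petal.Length,
           data = VDF)
summary(fitA)

# Remove the three-way interaction term
fitB <- update(fitA, . ~ . - Sepal.Length:Sepal.Width:Petal.Length)
c(fitA = adj_r2(fitA), fitB = adj_r2(fitB))
```

Further models (fitC and so on) follow the same pattern, removing one term at a time with update and comparing the result with the previous model.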


Exercise 11b.10
There is currently no Exercise 11b.10.


11b
Multivariate Linear
Regression
Answers


Exercise 11b.01
There is currently no Exercise 11b.01.


Exercise 11b.02
(iii) α = −0.1686, β1 = −0.07398, β2 = 0.2233, β3 = 0.3088

(iv) 1.4791476, ....., 1.3007574

(v) 0.6678075cm


Exercise 11b.03
(i) SSTOT = 1.9162, SSRES = 0.56164, SSREG = 1.354562

(ii) R2 = 0.7069

(iii) adjusted R2 = 0.6878


Exercise 11b.04
(i) Statistic = −1.560 and p-value = 0.125599 hence do not reject H0 and conclude β1 = 0 .

(ii) (a) (0.119, 0.327)

(b) Since the 95% confidence interval (0.201, 0.416) contains 0.24 we do not reject
H0 and conclude β3 = 0.24 .

(iv) (a) (0.119, 0.327). The confidence interval is the same as part (ii)(a).

(b) Statistic = −1.240004 and p-value = 0.1106 hence do not reject H0 and conclude
β2 = 0.3 .


Exercise 11b.05
(i) Statistic = 36.98 and p-value = 2.571e-12, hence reject H0 and conclude there is at least
one non-zero slope parameter.

(ii) (0.00843, 0.0193)


Exercise 11b.06
(i) 1.325704cm

(ii) (a) (1.299, 1.352)

(b) (1.138, 1.513)

(iii) −0.16864

(iv) (−0.6777, 0.3404)


Exercise 11b.07
(i) −0.0791475663,..., −0.0007574191

(ii) (a)

(b) The variance appears to start increasing towards the end, so it may not be
constant.

(iii) (a)

(b) The middle section follows the line well, but the extremes depart from it, which
detracts from the normality assumption. The residuals appear to have ‘fat’ tails.


(iv)

This diagram appears to display constant variance.

Point 99 has the most influence, but there is no point that is both an outlier and has a high
influence.


Exercise 11b.08
(ii)
Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length 1.0000000 0.5259107 0.7540490 0.5464611
Sepal.Width 0.5259107 1.0000000 0.5605221 0.6639987
Petal.Length 0.7540490 0.5605221 1.0000000 0.7866681
Petal.Width 0.5464611 0.6639987 0.7866681 1.0000000

(iii) The variable with the greatest correlation with Petal.Width is Petal.Length. The fitted
model has an adjusted R2 = 0.6109 .

(iv) (a) The variable with the next highest correlation with Petal.Width is Sepal.Width.
The fitted model has an adjusted R2 = 0.6783 .

(b) The adjusted R2 of fit2 is greater than that of fit1. Therefore we should keep
both variables.

(v) (a) The fitted model has an adjusted R2 = 0.6878 .

(b) The adjusted R2 of fit3 is greater than that of fit2. Therefore we should keep
all three variables.

(vi) In the summary the Sepal.Length parameter is not significant – which suggests we should
remove it.

The ANOVA printout shows there is not a significant improvement in fit when we add the
last variable which also suggests that we should not include Sepal.Length.

However, the adjusted $R^2$ does increase marginally. The conflicting signals arise because
the explanatory variables are correlated with one another and so overlap in what they
explain; PCA would remove this issue.

(vii) (a) The Sepal.Length parameter is not significantly different from zero.

(b) The adjusted R2 has fallen from 0.6878 to 0.6783. Hence, we should not remove
Sepal.Length.


Exercise 11b.09
(i) (a) Petal.Length and Sepal.Width have greatest significance.

(b) The adjusted $R^2$ has fallen from 0.6878 to 0.6814. The ANOVA printout confirms
there is not a significant improvement. Therefore we should not add the
interaction term.

(ii) Adding Petal.Length:Sepal.Length increases the adjusted $R^2$ from 0.6878 to 0.6919.
However, the parameter is not significant in the summary, which is confirmed by the
ANOVA printout.

Adding Sepal.Width:Sepal.Length reduces the adjusted $R^2$ from 0.6878 to 0.681. The
parameter is also not significant in the summary, which is confirmed by the ANOVA
printout.

(iii) Since no two-way terms have been included, we should not consider adding a three-way
interaction term.

(iv) (a) The three way interaction term is not significant. Removing it increases the
adjusted R2 from 0.6863 to 0.691.

(b) Sepal.Width:Petal.Length is least significant. Removing this increases the adjusted
$R^2$ from 0.691 to 0.6975.

The next least significant two-way interaction term is Sepal.Length:Sepal.Width.
However, removing this decreases the adjusted $R^2$ from 0.6975 to 0.6919. So we
should not remove this term.

Similarly, removing the insignificant Sepal.Length:Petal.Length parameter causes
the adjusted $R^2$ to decrease from 0.6975 to 0.681. So we should not remove this
term either.

(c) It is not appropriate to remove single terms when we have 2 way interactions that
involve them.

(v) Even though we have maximised the adjusted R2 none of the coefficients are significant.
So we need a better method to use. Hence we tend to use the ANOVA test between
models to check improvement instead.


Exercise 11b.10
There is currently no Exercise 11b.10.


12
GLMs
Exercises


Data requirements
These exercises do not require you to upload any data files.


Exercise 12.01
The built in data set iris contains measurements (in cm) of the variables sepal length, sepal
width, petal length and petal width, respectively, for 50 flowers from each of 3 species (Iris
setosa, versicolor, and virginica) of iris.

Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
5.1           3.5          1.4           0.2          setosa
4.9           3.0          1.4           0.2          setosa
…             …            …             …            …
6.2           3.4          5.4           2.3          virginica
5.9           3.0          5.1           1.8          virginica

(i) Extract the four measurements for the versicolor species only and store them in the 50×4
data frame, VDF.

(ii) Using the versicolor iris data fit a linear regression model, lmodel, with Petal.Width as
the response variable and Sepal.Length, Sepal.Width and Petal.Length as explanatory
variables:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$

State the coefficients of the fitted model.

(iii) (a) Use the function glm to fit an equivalent generalised linear model, glmodel, to
the versicolor iris data. State explicitly the appropriate family and the link
function in the arguments.

(b) Confirm that the estimated parameters are identical to the linear model in
part (ii).

(c) Give a shortened version of the R code from part (iii)(a) that fits the same GLM
but makes use of the default settings of the glm function.


Exercise 12.02
The built in data set iris contains measurements (in cm) of the variables sepal length, sepal
width, petal length and petal width, respectively, for 50 flowers from each of 3 species (setosa,
versicolor, and virginica) of iris.

(i) (a) Assuming that the measurements are normally distributed, use the function glm
to fit a generalised linear model, glmodel1, with Petal.Width as the response
variable and Sepal.Length (x1), Sepal.Width (x2), Petal.Length (x3) and
Species (γi) as explanatory variables:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_i$$

(b) Obtain the coefficients by extracting them from glmodel1.

(c) Explain what has happened to the coefficient for the setosa species.

(ii) State the code for a linear predictor which also includes a quadratic effect from
Petal.Length.

The built in data set esoph contains data from a case-control study of oesophageal cancer in Ille-
et-Vilaine, France. agegp contains 6 age groups, alcgp contains 4 alcohol consumption groups,
tobgp contains 4 tobacco consumption groups, ncases gives the number of observed cases of
oesophageal cancer out of the group of size ncontrols.

(iii) Fit a binomial generalised linear model, glmodel2, with a logit link function to estimate
the probability of obtaining oesophageal cancer as the response variable and a linear
predictor containing the main effects of agegp (αi), alcgp (βj) and tobgp (γk):

$$y = \alpha_i + \beta_j + \gamma_k$$

(iv) State the code for a linear predictor which also has interaction between alcohol and
tobacco.


Exercise 12.03
The first two parts of this exercise use the iris generalised linear model, glmodel1, with
Petal.Width (y) as the response variable and Sepal.Length (x1), Sepal.Width (x2), Petal.Length
(x3) and Species (γi) as explanatory variables:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_i$$

(i) (a) State the statistic, p-value and conclusion for a test of H0: β2 = 0 vs H1: β2 ≠ 0.

(b) Use R to extract only the p-value for this test.

(ii) Use confint to:

(a) obtain a 90% confidence interval for the versicolor coefficient.

(b) test, at the 5% level, whether β1 = −0.2 .

The next two parts of this exercise use the oesophageal cancer binomial probability generalised
linear model, glmodel2, with the probability of obtaining oesophageal cancer as the response
variable and a linear predictor containing the main effects of agegp (αi), alcgp (βj) and
tobgp (γk):

$$y = \alpha_i + \beta_j + \gamma_k$$

(iii) State the p-value and conclusion for a test that the second non-base category in the age
group is zero.

(iv) Use confint to:

(a) obtain a 99% confidence interval for the third non-base coefficient in the alcohol
group.

(b) test, at the 5% level, whether the first non-base coefficient in the tobacco group is
equal to 0.5.


Exercise 12.04
The first three parts of this exercise use the iris generalised linear model, glmodel1, with
Petal.Width (y) as the response variable and Sepal.Length (x1), Sepal.Width (x2), Petal.Length
(x3) and Species (γi) as explanatory variables:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_i$$

(i) (a) Obtain the residual degrees of freedom and residual deviance for this model.

(b) Use R to extract only these numbers from the model.

(ii) Find and extract the AIC for this model.

(iii) (a) Create a new GLM, glmodel01, which does not contain Species as an
explanatory variable.

(b) Use the AIC to compare glmodel01 and glmodel1.

(c) Use anova to carry out a formal F test to compare these two models.

The last part of this exercise uses the oesophageal cancer binomial probability generalised linear
model, glmodel2, with the probability of obtaining oesophageal cancer as the response variable
and a linear predictor containing the main effects of agegp (αi), alcgp (βj) and tobgp (γk):

$$y = \alpha_i + \beta_j + \gamma_k$$

(iv) (a) Create a new GLM, glmodel02, which does not contain tobgp as an explanatory
variable.

(b) Use the AIC to compare glmodel02 and glmodel2.

(c) Use anova to carry out a formal χ² test to compare these two models.


Exercise 12.05
We are fitting generalised linear models to the iris data with Petal.Width as the response variable
and a combination of Sepal.Length, Sepal.Width, Petal.Length and Species as explanatory
variables, assuming the measurements are normally distributed.

Forward selection

(i) Fit the null generalised linear model, fit0, to the iris data.

First covariate

(ii) (a) By examining the scatterplot of all the pairs of variables explain why either
Species or Petal.Length should be chosen as our first explanatory variable.

(b) Fit a linear regression model, fit1a, with Petal.Width as the response variable
and Species as the only explanatory variable. Determine the AIC for fit1a.

(c) Fit a linear regression model, fit1b, with Petal.Width as the response variable
and Petal.Length as the only explanatory variable. Determine the AIC for fit1b.

(d) By examining the AIC of fit1a and fit1b choose the model that provides the
best fit to the data.

(e) Use the anova function to carry out an F test comparing fit0 and the model
chosen in part (ii)(d).

Second covariate

(iii) (a) Fit a linear regression model, fit2, with Petal.Width as the response variable
and both Species and Petal.Length as the explanatory variables.

(b) By examining the AIC and carrying out an F test compare fit2 and the model
chosen in part (ii)(d).

Third covariate

(iv) (a) Fit a linear regression model, fit3a, with Petal.Width as the response variable
and Species, Petal.Length and Sepal.Length as explanatory variables. Determine
the AIC for fit3a.

(b) Fit a linear regression model, fit3b, with Petal.Width as the response variable
and Species, Petal.Length and Sepal.Width as explanatory variables. Determine
the AIC for fit3b.

(c) By examining the AIC of fit3a and fit3b choose the model that provides the
best fit to the data.

(d) Use the anova function to carry out an F test comparing fit2 and the model
chosen in part (iv)(c).


Fourth covariate

(v) (a) Fit a linear regression model, fit4, with Petal.Width as the response variable
and all four covariates as the explanatory variables.

(b) By examining the AIC and carrying out an F test compare fit4 and the model
chosen in part (iv)(c).

Fifth covariate

(vi) (a) Fit a linear regression model, fit5a, with Petal.Width as the response variable,
all four covariates as main effects and an interactive term between Species and
Sepal.Width as explanatory variables. Determine the AIC for fit5a.

(b) Fit a linear regression model, fit5b, with Petal.Width as the response variable,
all four covariates as main effects and an interactive term between Petal.Length
and Sepal.Width as explanatory variables. Determine the AIC for fit5b.

(c) By examining the AIC of fit5a and fit5b choose the best fit to the data.

(d) Use the anova function to carry out an F test comparing fit4 and the model
chosen in part (vi)(c).

Sixth covariate

(vii) (a) Fit a linear regression model, fit6, with Petal.Width as the response variable, all
four covariates as main effects, the interactive terms between Species and
Sepal.Width, and between Petal.Length and Sepal.Width as explanatory variables.

(b) By examining the AIC and carrying out an F test compare fit6 and the model
chosen in part (vi)(c).

Seventh covariate

(viii) Show that adding interaction between Petal.Length and Sepal.Length to fit6 leads to a
drop in the AIC and a significant improvement in the residual deviance.

It can be shown that adding other two-way interaction terms neither improves the AIC nor leads to
a significant improvement in residual deviance.

(ix) Explain why we should not add any three-way interaction terms at this stage.

Backward selection

(x) Fit the full generalised linear model, fitA, to the iris data to model Petal.Width using
Species*Petal.Length*Sepal.Length*Sepal.Width and show the AIC is −109.79 .

(xi) Show that the generalised linear model, fitB, which removes the four-way interaction
term leads to an improvement in the AIC.


It can be shown that two three-way interaction terms have parameters that are insignificant.

(xii) (a) Update the model fitB to create fitC1 by removing the three-way interaction
between Species, Petal.Length and Sepal.Width. Determine the AIC for fitC1.

(b) Update the model fitB to create fitC2 by removing the three-way interaction
between Species, Petal.Length and Sepal.Length. Determine the AIC for fitC2.

Let fitC be the model from parts (xii)(a) and (b) which produces the biggest improvement in the
AIC.

(xiii) It can be shown that another three-way interaction term has insignificant parameters at
the 10% level. Use the summary function to determine which interaction term this is.
Create the generalised linear model, fitD, which removes it and show that there is an
improvement in the AIC.

(xiv) Show that the generalised linear model, fitE, which removes another insignificant
three-way interaction term also leads to an improvement in the AIC.

(xv) Use the summary function to show that the parameter of the final three-way interaction
term is still significant but that the two-way interaction term between Species and
Sepal.Length is not. Update the model fitE to create fitF by removing this two-way
interaction and show it leads to an improvement in the AIC.

(xvi) Use the summary function to show that the parameters of three of the two-way
interaction terms are insignificant at the 5% level. Show that removing any of these
interaction terms leads to no improvement in the AIC.


Exercise 12.06
The first three parts of this exercise use the iris generalised linear model, glmodel1, with
Petal.Width (y) as the response variable and Sepal.Length (x1), Sepal.Width (x2), Petal.Length
(x3) and Species (γi) as explanatory variables:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_i$$

(i) Obtain the value of the linear predictor for glmodel1 for a versicolor iris with sepal
length 5.1cm, sepal width 3.5cm and petal length 1.4cm:

(a) from first principles by extracting the coefficients from glmodel1.

(b) using the predict function.

(ii) (a) Explain why the expected petal width of a versicolor iris will be the same as the
linear predictor in part (i).

(b) Show that this is the case by using the predict function.

(iii) Explain why there is no constant for the setosa species in the linear predictor.

(iv) Obtain the expected petal width of a setosa iris with sepal length 5.1cm, sepal width
3.5cm and petal length 1.4cm:

(a) from first principles by extracting the coefficients from glmodel1.

(b) using the predict function.


Exercise 12.07
The first two parts of this exercise use the iris generalised linear model, glmodel1, with
Petal.Width (y) as the response variable and Sepal.Length (x1), Sepal.Width (x2),
Petal.Length (x3) and Species (γi) as explanatory variables:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \gamma_i$$

(i) Obtain the raw residuals for the generalised linear model:

(a) from first principles using the fitted command

(b) using the residuals function.

(ii) Show that the raw residuals are the same as the:

(a) deviance residuals

(b) Pearson residuals.

(iii) By examining the median, lower and upper quartiles of the residuals, comment on their
skewness.

(iv) (a) Obtain a plot of the residuals against the fitted values.

(b) Comment on the constancy of the variance of the residuals and whether a normal
model is appropriate.

(v) (a) Obtain a Q-Q plot of the residuals.

(b) Comment on the normality assumption.

(vi) Examine the final two graphs obtained by plot(glmodel1) and comment.


12
GLMs
Answers


Exercise 12.01
(ii) Coefficients of the fitted model:

(Intercept)  Sepal.Length  Sepal.Width  Petal.Length
-0.16864     -0.07398      0.22328      0.30875

(iii) (a) glmodel <- glm(Petal.Width~Sepal.Length+Sepal.Width+Petal.Length,
data=VDF,family=gaussian(link="identity"))

(c) glm(Petal.Width~Sepal.Length+Sepal.Width+Petal.Length,
data=VDF)
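
The steps of Exercise 12.01 can be sketched end to end as follows. This is a minimal illustration using only base R and the built-in iris data; the object names VDF, lmodel and glmodel come from the exercise, while the way the versicolor rows are extracted is just one possible approach.

```r
# (i) Keep the four measurement columns for the versicolor rows only.
VDF <- iris[iris$Species == "versicolor", 1:4]

# (ii) Ordinary linear regression on the versicolor data.
lmodel <- lm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length, data = VDF)

# (iii)(a) Equivalent GLM: normal errors with the identity link stated explicitly.
glmodel <- glm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length,
               data = VDF, family = gaussian(link = "identity"))

# (iii)(b) The two sets of estimates should agree to numerical precision.
coef(lmodel)
coef(glmodel)
all.equal(coef(lmodel), coef(glmodel))
```

The shortened call in part (iii)(c) works because gaussian with the identity link is the default family for glm.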


Exercise 12.02
(i) (b)

(Intercept)   Sepal.Length       Sepal.Width
-0.47313802   -0.09293364        0.24220047
Petal.Length  Speciesversicolor  Speciesvirginica
0.24220288    0.64811253         1.04637025

(c) The setosa coefficient has been absorbed into the ‘intercept’ coefficient.

(ii) glm(Petal.Width~Sepal.Length+Sepal.Width+Petal.Length+Species
+I(Petal.Length^2))

(iv) Either agegp+alcgp+tobgp+alcgp:tobgp or agegp+alcgp*tobgp

ie glm(cbind(ncases,ncontrols) ~
agegp+alcgp+tobgp+alcgp:tobgp, family=binomial)

or glm(cbind(ncases,ncontrols) ~ agegp+alcgp*tobgp,
family=binomial).
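
As a sketch of how the two models in this exercise might be fitted, the code below uses the built-in iris and esoph data. The response for the binomial model follows the cbind(ncases, ncontrols) form used in the answer to part (iv); the comments about factor coding are our own observations rather than part of the source answer.

```r
# (i) Normal GLM for petal width with three covariates and the Species factor.
glmodel1 <- glm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length + Species,
                data = iris, family = gaussian)

# (i)(b) Extract the coefficients. There is no "Speciessetosa" entry because
# setosa is the first (base) level of the factor and sits in the intercept.
coef(glmodel1)
levels(iris$Species)[1]

# (iii) Binomial GLM with a logit link and main effects only.
# Note: agegp, alcgp and tobgp are ordered factors in esoph, so R labels
# their coefficients with polynomial contrasts (.L, .Q, .C, ...).
glmodel2 <- glm(cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp,
                data = esoph, family = binomial(link = "logit"))
coef(glmodel2)
```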


Exercise 12.03
(i) (a) Statistic = 5.072, p-value = 1.20e-06 hence we reject H0 and conclude that
β2 ≠ 0.

(ii) (a) (0.446, 0.851)

(b) A 95% confidence interval for β1 is (−0.180, −0.00555). Since it does not contain
−0.2 we reject the null hypothesis at the 5% level and conclude that β1 ≠ −0.2.

(iii) p-value = 0.02362 hence we reject H0 and conclude that the coefficient is not zero.

(iv) (a) (−0.153, 0.668)

(b) A 95% confidence interval for the first non-base coefficient in the tobacco group is
(0.209, 0.972). Since it contains 0.5 we have insufficient evidence to reject the
null hypothesis at the 5% level, so a coefficient of 0.5 is consistent with the data.
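
One way the iris parts of this exercise could be carried out is sketched below. It assumes glmodel1 is fitted as in Exercise 12.02; the row and column labels used for indexing are those R prints in the coefficient table of a gaussian GLM summary.

```r
glmodel1 <- glm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length + Species,
                data = iris, family = gaussian)

# (i)(a) Full row of the coefficient table for Sepal.Width (beta2):
# estimate, standard error, t value and p-value.
coefs <- summary(glmodel1)$coefficients
coefs["Sepal.Width", ]

# (i)(b) Only the p-value.
p_value <- coefs["Sepal.Width", "Pr(>|t|)"]
p_value

# (ii)(a) 90% confidence interval for the versicolor coefficient.
confint(glmodel1, parm = "Speciesversicolor", level = 0.90)

# (ii)(b) 95% interval for beta1; reject beta1 = -0.2 if -0.2 lies outside it.
ci_beta1 <- as.numeric(confint(glmodel1, parm = "Sepal.Length", level = 0.95))
ci_beta1
(-0.2 < ci_beta1[1]) || (-0.2 > ci_beta1[2])
```

The same pattern applies to glmodel2, except that the p-value column of a binomial GLM summary is labelled "Pr(>|z|)".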


Exercise 12.04
(i) (a) The residual degrees of freedom are 144 and the residual deviance is 3.998.

(ii) The AIC is −104.06.

(iii) (b) The AIC for glmodel01 is −63.5. Since the AIC for glmodel1 is lower it is
considered a better fit. Therefore we should include Species in our model.

(c) The p-value is 5.143e-10 which is far less than 5% so we would reject H0. The
model with Species significantly reduces the scaled deviance and hence is a better
fit.

(iv) (b) The AIC for glmodel2 is 225.5, whereas the AIC for glmodel02 is 230.1. Since
the AIC for glmodel2 is lower it is considered the better model. Therefore we
should include tobacco in our model.

(c) The p-value is 0.0141 which is smaller than 5% so we would reject H0 . The model
with tobacco significantly reduces the scaled deviance and is a better fit.
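
The quantities quoted above can be pulled out of the fitted objects directly. The following is a sketch only: it refits glmodel1 and glmodel2 as in Exercise 12.02 and uses update to drop a term, which is one convenient way (not necessarily the intended one) of building glmodel01 and glmodel02.

```r
glmodel1 <- glm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length + Species,
                data = iris, family = gaussian)

# (i) Residual degrees of freedom and residual deviance.
df.residual(glmodel1)
deviance(glmodel1)

# (ii) AIC, either via the extractor function or the stored component.
AIC(glmodel1)
glmodel1$aic

# (iii) Drop Species, then compare by AIC and by an F test.
glmodel01 <- update(glmodel1, . ~ . - Species)
c(glmodel01 = AIC(glmodel01), glmodel1 = AIC(glmodel1))
anova(glmodel01, glmodel1, test = "F")

# (iv) Binomial case: drop tobgp and compare with a chi-squared test.
glmodel2  <- glm(cbind(ncases, ncontrols) ~ agegp + alcgp + tobgp,
                 data = esoph, family = binomial)
glmodel02 <- update(glmodel2, . ~ . - tobgp)
c(glmodel02 = AIC(glmodel02), glmodel2 = AIC(glmodel2))
anova(glmodel02, glmodel2, test = "Chisq")
```

The F test is used for the normal model because its dispersion parameter is estimated, whereas the binomial model has a known dispersion and so uses the chi-squared test. The numerical output for glmodel2 may depend on the R version, since the esoph data set has been revised over time.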


Exercise 12.05
First covariate

(ii) (a) The strongest correlation is between Petal.Width and Species or Petal.Width and
Petal.Length.

(b) −45.29

(c) −43.59

(d) The AIC for fit1a (−45.29) is the lower out of fit1a and fit1b. So it is
considered the better model.

(e) The p-value is 2.2e-16 which is much lower than 5% so we would reject H0 . The
model with Species significantly reduces the residual deviance and is a better fit.

Second covariate

(iii) (b) Since the AIC for fit2, −83.41, is lower than fit1a it is considered the better
model so we should also include Petal.Length in our model.

The p-value is 4.409e-10 which is much lower than 5% so we would reject H0.
The model with Petal.Length significantly reduces the residual deviance and is a
better fit.


Third covariate

(iv) (a) −81.41

(b) −101.60

(c) The AIC for fit3a is −81.41 (which is worse than fit2) and the AIC for fit3b
is −101.6 which is an improvement. The best model is fit3b.

(d) The p-value is 1.03e-05 which is much lower than 5% so we would reject H0 . The
model with Sepal.Width significantly reduces the residual deviance and is a better
fit.

Fourth covariate

(v) (b) Since the AIC for fit4, −104.06 , is lower than fit3b it is considered the better
model so we should also include Sepal.Length in our model.

The p-value is 0.03889 which is less than 5% so we would reject H0 . The model
with Sepal.Length significantly reduces the residual deviance and is a better fit.

Fifth covariate

(vi) (a) −109.88

(b) −107.46

(c) The AIC for fit5a is −109.88 and the AIC for fit5b is −107.46 . Both are an
improvement but fit5a is lower and so is the better fit.

(d) The p-value is 0.009593 which is less than 5% so we would reject H0 . The model
with Species:Sepal.Width significantly reduces the residual deviance and is a
better fit.

Sixth covariate

(vii) (b) Since the AIC for fit6, −114.26 , is lower than fit5a it is considered the better
model so we should also include Petal.Length:Sepal.Width in our model.

The p-value is 0.0145 which is less than 5% so we would reject H0 . The model
with Petal.Length:Sepal.Width significantly reduces the residual deviance and is a
better fit.

Seventh covariate

(viii) The AIC drops to −116.55 and the p-value is 0.04566 so there is a significant
improvement.

(ix) We can’t add Petal.Length:Sepal.Width:Sepal.Length at this stage as there are no 2 way
interactions between the last two. Nor can we add any three way interaction involving
species as there are no 2 way interactions between species and the other covariates.
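
The first few forward-selection steps above can be sketched as follows. This is an illustrative outline only; it uses update to add one term at a time, and the later steps (fit5a onwards) follow the same pattern with interaction terms such as Species:Sepal.Width.

```r
# Null model: intercept only, normal errors (the glm default).
fit0 <- glm(Petal.Width ~ 1, data = iris)

# A scatterplot matrix helps to pick the first covariate (interactive use only).
if (interactive()) pairs(iris)

# First covariate: compare Species and Petal.Length by AIC.
fit1a <- update(fit0, . ~ . + Species)
fit1b <- update(fit0, . ~ . + Petal.Length)
c(fit1a = AIC(fit1a), fit1b = AIC(fit1b))
anova(fit0, fit1a, test = "F")

# Second covariate: add Petal.Length to the Species model.
fit2 <- update(fit1a, . ~ . + Petal.Length)
AIC(fit2)
anova(fit1a, fit2, test = "F")

# Third covariate: try Sepal.Length and Sepal.Width in turn.
fit3a <- update(fit2, . ~ . + Sepal.Length)
fit3b <- update(fit2, . ~ . + Sepal.Width)
c(fit3a = AIC(fit3a), fit3b = AIC(fit3b))
anova(fit2, fit3b, test = "F")

# Fourth covariate: all four main effects.
fit4 <- update(fit3b, . ~ . + Sepal.Length)
AIC(fit4)
anova(fit3b, fit4, test = "F")
```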


Backward selection

(xi) The AIC falls to −112.49 .

(xii) (a) The AIC for fitC1 is −113.76 .

(b) The AIC for fitC2 is −115.13 .

(xiii) The AIC for fitD falls to −117.22 .

(xiv) The AIC for fitE falls to −117.64 .

(xv) The AIC for fitF falls to −120.62 .
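
A sketch of the first backward-selection steps is given below. It again relies on update, this time removing terms; the later models (fitC to fitF) depend on which terms the summary output flags as insignificant, so they are not reproduced here.

```r
# (x) Full model with all main effects and interactions up to four-way.
fitA <- glm(Petal.Width ~ Species * Petal.Length * Sepal.Length * Sepal.Width,
            data = iris)
AIC(fitA)

# (xi) Remove the four-way interaction term.
fitB <- update(fitA, . ~ . - Species:Petal.Length:Sepal.Length:Sepal.Width)
AIC(fitB)

# (xii) Remove one three-way interaction at a time from fitB.
fitC1 <- update(fitB, . ~ . - Species:Petal.Length:Sepal.Width)
fitC2 <- update(fitB, . ~ . - Species:Petal.Length:Sepal.Length)
c(fitC1 = AIC(fitC1), fitC2 = AIC(fitC2))

# The summary table shows which remaining terms have insignificant parameters.
summary(fitB)$coefficients
```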


Exercise 12.06
(i) 0.8877986 cm

(ii) (a) The canonical link function for the normal distribution is the identity function.
Hence the mean response variable is equal to the linear predictor.

(iii) It has been absorbed into the intercept parameter.

(iv) 0.2396861 cm
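
The two routes to these answers can be sketched as below, assuming glmodel1 is fitted as in Exercise 12.02. The data frame name new_flower is our own; it simply holds the measurements given in the question.

```r
glmodel1 <- glm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length + Species,
                data = iris, family = gaussian)
b <- coef(glmodel1)

# (i)(a) First principles: intercept + slopes * measurements + versicolor constant.
eta_versicolor <- unname(b["(Intercept)"] + b["Sepal.Length"] * 5.1 +
                         b["Sepal.Width"] * 3.5 + b["Petal.Length"] * 1.4 +
                         b["Speciesversicolor"])
eta_versicolor

# (i)(b) and (ii)(b) The same value from predict, on the link and response scales.
new_flower <- data.frame(Sepal.Length = 5.1, Sepal.Width = 3.5, Petal.Length = 1.4,
                         Species = factor("versicolor", levels = levels(iris$Species)))
predict(glmodel1, newdata = new_flower, type = "link")
predict(glmodel1, newdata = new_flower, type = "response")

# (iv) setosa is the base level, so no species constant is added.
eta_setosa <- unname(b["(Intercept)"] + b["Sepal.Length"] * 5.1 +
                     b["Sepal.Width"] * 3.5 + b["Petal.Length"] * 1.4)
eta_setosa
new_flower$Species <- factor("setosa", levels = levels(iris$Species))
predict(glmodel1, newdata = new_flower, type = "response")
```

Because the link is the identity, the "link" and "response" predictions coincide.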

The Actuarial Education Company © IFE: 2019 Examinations


Page 10 CS1B-12: GLMs – Answers

Exercise 12.07
(i) −0.0396860931, ..., −0.1867598541

(iii)
Min       1Q        Median    3Q       Max
−0.59239  −0.08288  −0.01349  0.08773  0.45239

The median is nearly zero and the lower and upper quartiles have nearly equal absolute
values, so the middle 50% of the data is nearly symmetrical.

(iv) (a)

(b) The line is fairly horizontal – so variance of residuals is fairly constant and hence
the normal model is appropriate.


(v) (a)

(b) The middle section is good but there are issues in the extremes.

The residuals at the lower end are more negative than expected – so the fitted
values are too large.

The residuals at the upper end are more positive than expected – so the fitted
values are too small.

So the current model has very ‘fat’ tails which is not ideal.


(vi)

The variance of the residuals is increasing – implying a defect in our model. Interaction
terms may resolve this problem.

No data points have undue influence on our model.
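
The residual calculations in this exercise could be reproduced along the following lines. This is a sketch under the assumption that glmodel1 is fitted as in Exercise 12.02; the plotting calls are wrapped in interactive() purely so that the snippet also runs in a non-graphical session.

```r
glmodel1 <- glm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length + Species,
                data = iris, family = gaussian)

# (i) Raw residuals: observed minus fitted, and the built-in equivalent.
raw          <- iris$Petal.Width - fitted(glmodel1)
res_response <- residuals(glmodel1, type = "response")

# (ii) For a normal model with identity link, the deviance and Pearson
# residuals coincide with the raw residuals.
res_deviance <- residuals(glmodel1, type = "deviance")
res_pearson  <- residuals(glmodel1, type = "pearson")
all.equal(unname(raw), unname(res_deviance))
all.equal(unname(raw), unname(res_pearson))

# (iii) Quartiles of the residuals, for the skewness comment.
quantile(raw, c(0, 0.25, 0.5, 0.75, 1))

# (iv), (v), (vi) Diagnostic plots.
if (interactive()) {
  plot(fitted(glmodel1), raw, xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
  qqnorm(raw)
  qqline(raw)
  plot(glmodel1)
}
```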


13
Bayesian Statistics
Exercises


Data requirements
These exercises do not require you to upload any data files.


Exercise 13.01
The probability of a person dying from a particular disease is p . The prior distribution of p is
beta with parameters a = 2 and b = 3 .

(i) (a) Create a vector x which contains 1,000 zeros.

(b) Use a loop to obtain 1,000 simulations of the posterior outcome (where 1 denotes
death and 0 denotes survival) for a single person. Use the functions
set.seed(77), rbeta and rbinom and store the ith outcome in the ith
element of x.

(c) Hence, obtain an empirical Bayesian estimate for p under quadratic loss.

The Bayesian estimate for p under quadratic loss for a single outcome x is:

$$\frac{x + a}{a + b + 1}$$

(ii) (a) Create a vector pm which contains 1,000 zeros.

(b) Repeat part (i)(b) but also store the ith theoretical Bayesian estimate in the ith
element of pm.

(c) Compare the average empirical and theoretical Bayesian estimates under
quadratic loss.

A biologist is now going to analyse samples of 12 people.

(iii) (a) Create a vector xp which contains 1,000 zeros.

(b) Use a loop to obtain 1,000 simulations of the posterior probability of death, based
on 1,000 samples each of 12 people. Use the functions set.seed(79),
rbeta and rbinom and store the estimate of the probability from the ith
sample in the ith element of xp.

(c) Hence, obtain an empirical Bayesian estimate for p under quadratic loss.

The Bayesian estimate for p under quadratic loss, given x deaths in a sample of size n is:

$$\frac{x + a}{a + b + n}$$
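
As a brief sketch of where this estimate comes from (standard conjugate-prior reasoning that the exercise takes as given): with a Beta(a, b) prior and x deaths out of n, the posterior is again a beta distribution, and under quadratic loss the Bayesian estimate is the posterior mean.

```latex
\begin{aligned}
f(p \mid x) &\propto p^{x}(1-p)^{n-x} \times p^{a-1}(1-p)^{b-1}
             = p^{a+x-1}(1-p)^{b+n-x-1} \\
p \mid x &\sim \text{Beta}(a+x,\; b+n-x) \\
E[p \mid x] &= \frac{a+x}{(a+x)+(b+n-x)} = \frac{x+a}{a+b+n}
\end{aligned}
```

Setting n = 1 recovers the single-outcome formula given earlier, and with a = 2, b = 3 and n = 12 the estimate is (x + 2)/17.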

(iv) (a) Repeat part (iii)(b) but also store the ith theoretical Bayesian estimate in the ith
element of pm.

(b) Compare the average empirical and theoretical Bayesian estimates under
quadratic loss.


13
Bayesian Statistics
Answers


Exercise 13.01
(i) (c) 0.382

(ii) (c) Average empirical estimate = 0.382, theoretical Bayesian estimate = 0.397. There
is about a 4% difference between the two quadratic loss estimates.

(iii) (c) 0.4097 (or 0.4023 if you use rbinom(n,1,p))

(iv) (b) Average empirical estimate = 0.4097 (or 0.4023 if you use rbinom(n,1,p)),
theoretical Bayesian estimate = 0.4068 (or 0.4016 if you use rbinom(n,1,p)).
There is less than a 1% difference between the two quadratic loss estimates.
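
One possible form of the simulation loops is sketched below. The exact figures obtained depend on the order in which the random number functions are called, so the output may or may not match the values quoted above. The names deaths and pm12 are our own (the exercise reuses pm for part (iv)).

```r
a <- 2
b <- 3

# Parts (i) and (ii): one person per simulation.
set.seed(77)
x  <- rep(0, 1000)
pm <- rep(0, 1000)
for (i in 1:1000) {
  p     <- rbeta(1, a, b)              # draw p from the prior
  x[i]  <- rbinom(1, 1, p)             # 1 = death, 0 = survival
  pm[i] <- (x[i] + a) / (a + b + 1)    # theoretical Bayesian estimate
}
mean(x)    # empirical estimate
mean(pm)   # average theoretical estimate

# Parts (iii) and (iv): samples of 12 people.
n <- 12
set.seed(79)
xp   <- rep(0, 1000)
pm12 <- rep(0, 1000)
for (i in 1:1000) {
  p       <- rbeta(1, a, b)
  deaths  <- rbinom(1, n, p)           # number of deaths in the sample
  xp[i]   <- deaths / n                # sample proportion
  pm12[i] <- (deaths + a) / (a + b + n)
}
mean(xp)
mean(pm12)
```

The alternative mentioned in the answers, rbinom(n,1,p), simulates the 12 people individually; summing those outcomes would replace the single rbinom(1, n, p) draw.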


14
Credibility Theory
Exercises


Data requirements
These exercises do not require you to upload any data files.


Exercise 14.01
The probability of a person dying from a particular disease is p . The prior distribution of p is
beta with parameters a = 2 and b = 3 .

A statistician is going to calculate a credibility estimate of p:

$$Z \times \frac{x}{n} + (1 - Z) \times \frac{a}{a + b}$$

where the credibility factor is:

$$Z = \frac{n}{n + a + b}$$
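
As a short check (our own algebra, not part of the exercise), substituting the credibility factor into the estimate shows that it coincides with the Bayesian posterior mean for a beta prior:

```latex
Z \cdot \frac{x}{n} + (1 - Z) \cdot \frac{a}{a+b}
= \frac{n}{n+a+b} \cdot \frac{x}{n} + \frac{a+b}{n+a+b} \cdot \frac{a}{a+b}
= \frac{x + a}{n + a + b}
```

With a = 2, b = 3 and n = 5 this gives Z = 0.5, so the credibility estimate is 0.1x + 0.2 and can only take the values 0.2, 0.3, ..., 0.7.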

The statistician is going to take samples of 5 people to calculate the credibility estimate.

(i) (a) Create a vector cp which contains 1,000 zeros.

(b) Use a loop to obtain 1,000 simulations of the posterior probability of death, based
on 1,000 random samples each containing 5 people. Use the functions
set.seed(79), rbeta and rbinom and store the credibility estimate from
the ith sample in the ith element of cp.

(ii) Plot a labelled bar chart of the simulated credibility estimates for p using the functions
barplot and table.

(iii) Calculate the mean and standard deviation of the empirical credibility estimates.


14
Credibility Theory
Answers


Exercise 14.01
(ii)

(iii) mean = 0.403, standard deviation = 0.1393931 (or 0.4046 and 0.1402088 if you use
rbinom(n,1,p))
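
A minimal sketch of the simulation is given below. As with the Bayesian exercise, the figures produced depend on how the random draws are ordered, so they may differ slightly from those quoted; the plot labels are our own choice.

```r
a <- 2
b <- 3
n <- 5
Z <- n / (n + a + b)   # credibility factor

# (i) Simulate 1,000 credibility estimates.
set.seed(79)
cp <- rep(0, 1000)
for (i in 1:1000) {
  p     <- rbeta(1, a, b)                       # draw p from the prior
  x     <- rbinom(1, n, p)                      # deaths in a sample of 5
  cp[i] <- Z * x / n + (1 - Z) * a / (a + b)    # credibility estimate
}

# (ii) Labelled bar chart of the simulated estimates (interactive use only).
if (interactive()) {
  barplot(table(cp), xlab = "Credibility estimate of p", ylab = "Frequency",
          main = "Simulated credibility estimates")
}

# (iii) Mean and standard deviation.
mean(cp)
sd(cp)
```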


15
EBCT
Exercises


Data requirements
These exercises require the following data files:

• insurance claims.txt
• insurance volumes.txt


Exercise 15.01
The table below shows the aggregate claim amounts (in £m) for an international insurer’s fire
portfolio for a 5-year period. The claim amounts are subdivided by country of origin.

Total claim amount by year

Country   1   2   3   4   5
A        48  53  42  50  59
B        64  71  64  73  70
C        85  54  76  65  90
D        44  52  69  55  71

This data is contained in the file: “insurance claims.txt”

(i) Load the data frame and store it in the matrix ins.claim.

(ii) Store the number of years and number of countries in the objects n and N, respectively.

An actuary is using EBCT Model 1 to set premiums for the coming year.

(iii) (a) Use mean and rowMeans (or otherwise) to calculate an estimate of E[m(θ)] and
store it in the object m.

(b) Use apply, var and mean to calculate an estimate of E[s²(θ)] and store it in
the object s.

(c) Use var and rowMeans (or otherwise) and your result from part (iii)(b) to
calculate an estimate of var[m(θ)] and store it in the object v.

(iv) Use your results from parts (ii) and (iii) to calculate the credibility factor and store it in the
object Z.

(v) Calculate the EBCT premiums for each of the four countries.


Exercise 15.02
This question uses the aggregate claim amounts (in £m) for an international insurer’s fire
portfolio for a 5-year period from the previous exercise, which should be stored in the matrix
ins.claim.

The table below shows the volumes of business for each country in each year for the international
insurer.

Volume by year

Country   1   2   3   4   5
A        12  15  13  16  10
B        20  14  22  15  30
C         5   8   6  12   4
D        22  35  30  16  10

This data is contained in the file: “insurance volumes.txt”

(i) Load the data frame of volumes and store it in the matrix ins.volume.

An actuary is using EBCT Model 2 to set premiums for the coming year.

(ii) Calculate the claims per unit of risk volume and store them in the matrix X.

(iii) (a) Use rowSums to calculate the total policies for each country and store them in
the object Pi.

(b) Use sum to calculate the overall total policies for all countries and store it in the
object P.

(c) Use ncol, nrow and sum to calculate

$$P^* = \frac{1}{Nn - 1} \sum_{i=1}^{N} P_i \left(1 - \frac{P_i}{P}\right)$$

and store it in the object Pstar.

(iv) (a) Calculate E[m(θ)] and store it in the object m.

(b) Use rowSums to calculate the mean claims per policy for each country and store
it in the object Xibar.

(c) Use rowSums and mean to calculate E[s²(θ)] and store it in the object s.

(d) Use sum and rowSums and your result from part (iii)(c) to calculate var[m(θ)]
and store it in the object v.

(v) Use your results from parts (iii) and (iv) to calculate the credibility factor for each country
and store the values in the object Zi.

(vi) If the volumes of business for each country for the coming year are 20, 25, 10 and 12,
respectively, calculate the EBCT Model 2 premiums for each of the four countries.


15
EBCT
Answers


Exercise 15.01
(iii) (a) m = 62.75

(b) s = 101.2

(c) v = 90.33

(iv) 0.8169485

(v) Country A premium = 52.66, Country B premium = 67.37, Country C premium = 71.94,
Country D premium = 59.03.
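
The EBCT Model 1 calculation can be sketched as follows. So that the snippet is self-contained, the claims table is typed in directly rather than read from "insurance claims.txt"; that construction of ins.claim is our own assumption about the layout of the data.

```r
# Claims by country (rows) and year (columns), entered from the table.
ins.claim <- matrix(c(48, 53, 42, 50, 59,
                      64, 71, 64, 73, 70,
                      85, 54, 76, 65, 90,
                      44, 52, 69, 55, 71),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(c("A", "B", "C", "D"), as.character(1:5)))

# (ii) Number of years and number of countries.
n <- ncol(ins.claim)
N <- nrow(ins.claim)

# (iii) Parameter estimates.
m <- mean(rowMeans(ins.claim))            # estimate of E[m(theta)]
s <- mean(apply(ins.claim, 1, var))       # estimate of E[s^2(theta)]
v <- var(rowMeans(ins.claim)) - s / n     # estimate of var[m(theta)]

# (iv) Credibility factor.
Z <- n / (n + s / v)

# (v) Credibility premiums for each country.
premium <- Z * rowMeans(ins.claim) + (1 - Z) * m
round(premium, 2)
```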


Exercise 15.02

(iii) (a) P1 = 66, P2 = 101, P3 = 35, P4 = 113

(b) P = 315

(c) P* = 11.80852

(iv) (a) m = 3.984127

(b) X̄1 = 3.818, X̄2 = 3.386, X̄3 = 10.57, X̄4 = 2.575

(c) s = 104.642

(d) v = 6.538782

(v) Z1 = 0.8048, Z2 = 0.8632, Z3 = 0.6862, Z4 = 0.8759

(vi) The credibility premiums for countries A, B, C and D are 77.0, 86.7, 85.0, 33.0,
respectively.
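
A sketch of the EBCT Model 2 calculation follows. Both tables are entered by hand instead of being read from the data files, and new.volume and premium are our own object names; the estimators are the usual Model 2 formulae, written out in the comments.

```r
countries <- c("A", "B", "C", "D")
ins.claim <- matrix(c(48, 53, 42, 50, 59,
                      64, 71, 64, 73, 70,
                      85, 54, 76, 65, 90,
                      44, 52, 69, 55, 71),
                    nrow = 4, byrow = TRUE, dimnames = list(countries, NULL))
ins.volume <- matrix(c(12, 15, 13, 16, 10,
                       20, 14, 22, 15, 30,
                        5,  8,  6, 12,  4,
                       22, 35, 30, 16, 10),
                     nrow = 4, byrow = TRUE, dimnames = list(countries, NULL))

# (ii) Claims per unit of risk volume.
X <- ins.claim / ins.volume
n <- ncol(X)
N <- nrow(X)

# (iii) Volume totals and P*.
Pi    <- rowSums(ins.volume)
P     <- sum(Pi)
Pstar <- sum(Pi * (1 - Pi / P)) / (N * n - 1)

# (iv) Parameter estimates.
m     <- sum(ins.volume * X) / P                          # E[m(theta)]
Xibar <- rowSums(ins.volume * X) / Pi                     # weighted country means
s     <- mean(rowSums(ins.volume * (X - Xibar)^2) / (n - 1))          # E[s^2(theta)]
v     <- (sum(rowSums(ins.volume * (X - m)^2)) / (N * n - 1) - s) / Pstar  # var[m(theta)]

# (v) Credibility factor for each country.
Zi <- Pi / (Pi + s / v)

# (vi) Premiums for next year's volumes.
new.volume <- c(20, 25, 10, 12)
premium    <- new.volume * (Zi * Xibar + (1 - Zi) * m)
round(premium, 1)
```

Subtracting the vector Xibar from the matrix X relies on R recycling the vector down the columns, so each row has its own country mean removed.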



Subject CS1: Assignment Y1
2019 Examinations

The time allowed for this assignment is 1¾ hours.

Attempt all of the questions, as far as possible under exam conditions.

If you are having your assignment marked by ActEd, please follow these instructions carefully:

– Download and open the Word document ‘CS1 Assignment Y1 Answer Booklet 12345’.
Follow the instructions provided in the template and enter your answers where
indicated.

– In your submission include sufficient R code for the markers to work out how you
arrived at your answers.

– Begin your answer to each question on a new page. Only send ActEd one Word file
(created using the template) when you have completed the assignment.

– When submitting your script, email your Word file to [email protected].

– Assignment marking is not included in the price of the course materials. Please
purchase Series Y Marking or a Marking Voucher before submitting your script.

– We only accept the current version of assignments for marking, and so you can only
submit this assignment in the sessions leading to the 2019 exams.

– We only accept Word files produced in Office 2007, 2010 or 2013 format. Submitted
assignments will not be marked if any of the files are suspected to have been affected
by a computer virus or to have been corrupted.

– You should aim to submit this script for marking by the recommended submission
date. The recommended and deadline dates for submission of this assignment are
listed on the summary page at the back of this pack and on our website at
www.ActEd.co.uk.

– Scripts received after the deadline date will not be marked, unless you are using a
Marking Voucher. It is your responsibility to ensure that scripts reach ActEd in good
time. If you are using Marking Vouchers, then please make sure that your script
reaches us by the Marking Voucher deadline date to give us enough time to mark
and return the script before the exam.

– In addition to this paper, you should have available actuarial tables and an electronic
calculator.



All study material produced by ActEd is copyright and is sold for the exclusive use of the
purchaser. The copyright is owned by Institute and Faculty Education Limited, a subsidiary of
the Institute and Faculty of Actuaries.

Unless prior authority is granted by ActEd, you may not hire out, lend, give out, sell, store or
transmit electronically or photocopy any part of the study material.

You must take care of your study material to ensure that it is not used or copied by anybody else.

Legal action will be taken if these terms are infringed. In addition, we may seek to take
disciplinary action through the profession or through your employer.

These conditions remain in force after you have finished using the course.

CS1B: Assignment Y1 Questions

Y1.1 In a particular portfolio of 1,000 life assurance policyholders, deaths are assumed to occur
independently with a probability of 0.05.

(i) Calculate the median number of deaths. [4]

(ii) Calculate the probability that the number of deaths, D , lies between 45 and 59 inclusive:

(a) exactly

(b) using a Poisson approximation

(c) using a normal approximation. [16]


[Total 20]

Y1.2 (i) (a) Calculate 1,000 simulated values from a U(0,1) distribution using set.seed(13).

(b) Hence, determine 1,000 simulations from the distribution which has cumulative
distribution function:

F(x) = 1 − 1/(1 + x), x > 0 [7]

(ii) Use your simulations from part (i)(b) to:

(a) plot a labelled graph of the empirical PDF of the simulations for the range
x ∈(0,200)

(b) calculate the empirical mean, standard deviation and coefficient of skewness and
comment on the shape of the distribution. [13]
[Total 20]

Y1.3 A company that makes Gizmos™ is trying to ascertain the percentage of consumers who are
aware of the existence of its product. A study is to be carried out in which a random sample of
the population will be interviewed and asked whether or not they are aware of it.

(i) In a sample of 20 people, 10 had heard of Gizmos™. Determine the width of an exact 95%
confidence interval for the underlying population proportion. [5]

(ii) Show exhaustively for a sample of size 20 that the greatest width of an exact binomial
confidence interval occurs when half of the sample have heard of Gizmos™. [8]
[Total 13]


Y1.4 An insurer is measuring the inter-arrival times between notification of consecutive claims from a
portfolio of policies with a low claim rate. The insurer believes that these inter-arrival times may
have an exponential distribution with unknown parameter λ . A random sample gives the
following time periods (in days) between consecutive claims:

14, 4, 3, 2, 3, 1, 5, 10, 4, 23

(i) Derive a 99% confidence interval for the exponential parameter λ using a non-parametric
bootstrap and set.seed(17), based on a sample of 1,000 values. [10]

After extensive analysis it is decided that the inter-arrival times have an exponential distribution
with parameter 0.145.

(ii) (a) Determine 1,000 simulated means from samples of size 10 from this exponential
distribution using set.seed(19).

(b) Plot a histogram of the densities of these sample means.

(c) Use the results of part (ii)(a) to calculate the empirical probability that the sample
mean is less than 5. [14]

A statistician points out that if X ~ Exp(λ), then X̄ ~ Gamma(n, nλ).

(iii) Plot the PDF of the appropriate gamma distribution on the histogram of part (ii)(b) and
comment. [5]

(iv) Calculate the exact probability that the sample mean is less than 5 using this result and
compare to part (ii)(c). [5]

(v) (a) Determine 1,000 simulated values from the appropriate gamma distribution using
set.seed(21).

(b) Plot a Q-Q plot of the sample means from part (ii)(a) and the simulations from
part (v)(a) and comment on the result. [13]
[Total 47]

END OF PAPER



Assignment deadlines

For the session leading to the April 2019 exams – CS1B, CS2B, CM1B & CM2B Subjects

Marking vouchers

Subjects        Assignments
CS1B            6 March 2019
CS2B, CM1B      13 March 2019
CM2B            20 March 2019

Series Y Assignments

Subjects        Assignment   Recommended submission date   Final deadline date
CS1B            Y1           2 January 2019                30 January 2019
CS2B, CM1B      Y1           9 January 2019                6 February 2019
CM2B            Y1           16 January 2019               13 February 2019
CS1B            Y2           13 February 2019              6 March 2019
CS2B, CM1B      Y2           20 February 2019              13 March 2019
CM2B            Y2           27 February 2019              20 March 2019

We encourage you to work to the recommended submission dates where possible.

If you submit your assignment on the final deadline date you are likely to receive your script back less than a
week before your exam.


Assignment deadlines

For the session leading to the September 2019 exams – CS1B, CS2B, CM1B & CM2B Subjects

Marking vouchers

Subjects        Assignments
CS1B, CS2B      21 August 2019
CM1B, CM2B      28 August 2019

Series Y Assignments

Subjects        Assignment   Recommended submission date   Final deadline date
CS2B            Y1           26 June 2019                  24 July 2019
CS1B            Y1           19 June 2019                  31 July 2019
CM1B            Y1           3 July 2019                   31 July 2019
CM2B            Y1           26 June 2019                  7 August 2019
CS2B            Y2           31 July 2019                  14 August 2019
CS1B            Y2           31 July 2019                  21 August 2019
CM1B            Y2           7 August 2019                 21 August 2019
CM2B            Y2           7 August 2019                 28 August 2019

We encourage you to work to the recommended submission dates where possible.

If you submit your assignment on the final deadline date you are likely to receive your script back less than a
week before your exam.

CS1B: Assignment Y1 Solutions

Solution Y1.1

(i) Median

n <- 1000

p <- 0.05

qbinom(0.5,n,p) [3]

This gives a median of 50. [1]

Alternatively, students may do it directly:

qbinom(0.5,1000,0.05)

(ii)(a) Exact probability

pbinom(59,n,p) - pbinom(44,n,p) [2]

This gives a probability of 0.69858. [1]

Alternatively, students may do it directly:

pbinom(59, 1000,0.05) - pbinom(44, 1000,0.05)

(ii)(b) Poisson approximation

Since Bin(n, p) ≈ Poi(np) we have a Poisson distribution with mean 50. [2]

ppois(59,n*p) - ppois(44,n*p) [2]

This gives a probability of 0.68669. [1]

Alternatively, students may do it directly:

ppois(59,50) - ppois(44,50)

(ii)(c) Normal approximation

Since Bin(n, p) ≈ N(np, npq) we have a N(50, 47.5) distribution. [2]



pnorm(59.5,n*p,sqrt(n*p*(1-p))) - pnorm(44.5,n*p,sqrt(n*p*(1-p)))

[2 for continuity correction, 3 for code = 5]

This gives a probability of 0.70353. [1]

Alternatively, students may do it directly:

pnorm(59.5,50,sqrt(47.5)) - pnorm(44.5,50,sqrt(47.5))
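
To see how the two approximations compare with the exact answer, the three probabilities can be collected in one vector. This is only a convenience sketch built from the calls above; the name probs is illustrative.

```r
n <- 1000
p <- 0.05

# P(45 <= D <= 59) under each approach
exact   <- pbinom(59, n, p) - pbinom(44, n, p)
poisson <- ppois(59, n * p) - ppois(44, n * p)
normal  <- pnorm(59.5, n * p, sqrt(n * p * (1 - p))) -
           pnorm(44.5, n * p, sqrt(n * p * (1 - p)))   # with continuity correction

probs <- c(exact = exact, poisson = poisson, normal = normal)
round(probs, 5)
round(probs - exact, 5)   # error of each approximation relative to the exact value
```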


Solution Y1.2

(i)(a) Simulate U(0,1)

set.seed(13) [1]

u <- runif(1000,0,1) [2]

(i)(b) Simulate random variable

Rearranging u = F(x) to get x = F⁻¹(u):

u = 1 − 1/(1 + x)  ⇒  1 + x = 1/(1 − u)  ⇒  x = 1/(1 − u) − 1 [2]

So the R code is:

x <- (1-u)^(-1)-1 [2]
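
A quick way to check an inversion like this is to push the simulated values back through the CDF and confirm that the original uniforms are recovered. The function names F_cdf and F_inv below are illustrative.

```r
F_cdf <- function(x) 1 - 1 / (1 + x)   # CDF from the question
F_inv <- function(u) 1 / (1 - u) - 1   # inverse derived above

set.seed(13)
u <- runif(1000, 0, 1)
x <- F_inv(u)

all.equal(F_cdf(x), u)   # TRUE if F_inv really is the inverse of F_cdf
all(x > 0)               # simulated values lie in the support x > 0
```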

(ii)(a) Labelled graph of empirical PDF of simulations

plot(density(x),xlim=c(0,200),main="Empirical PDF of simulations",
xlab="x",col="blue")
[4]
[Subtract 1 mark per error, colour not needed]

(ii)(b) Empirical mean, standard deviation and coefficient of skewness

mean(x)

This gives an answer of 14.550. [2]

sd(x)

This gives an answer of 199.76. [2]

skew <- sum((x-mean(x))^3)/length(x)

skew/(sd(x)^3)

This gives an answer of 27.723. [3]

The huge standard deviation for such a small mean indicates that we have a very long tail (as the
values must be greater than zero). [1]

This is confirmed by the very large positive coefficient of skewness. [1]
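
The length of the tail can also be seen from the quantiles. For this CDF the survival function is 1/(1 + x), whose integral over x > 0 diverges, so the theoretical mean is infinite and the empirical mean and standard deviation are driven by a few extreme simulations. The sketch below is an optional illustration, not part of the marking scheme, and its names are illustrative.

```r
F_inv <- function(u) 1 / (1 - u) - 1

# Theoretical quartiles from the inverse CDF: 1/3, 1 and 3
q_theory <- F_inv(c(0.25, 0.5, 0.75))
q_theory

set.seed(13)
x <- F_inv(runif(1000))

# Empirical quartiles should be of a similar size to the theoretical ones
quantile(x, c(0.25, 0.5, 0.75))

# Share of the total accounted for by the 10 largest simulated values
sum(sort(x, decreasing = TRUE)[1:10]) / sum(x)
```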


Solution Y1.3

(i) Width of exact confidence interval

test <- binom.test(10,20)

test$conf[2]-test$conf[1]

This gives an answer of 45.6%. [5]

Alternatively, students could just use binom.test(10,20) to obtain a confidence interval of
(0.27196, 0.72804).

(ii) Show greatest width occurs when there are 10 successes

width <- rep(0,20)

for (i in 1:20)

{test <-binom.test(i,20);width[i]<-test$conf[2]-test$conf[1]}

width

max(width)
[7]

By examining the widths, we can see the greatest width (0.4560843) occurs for 10 successes. [1]
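
The loop above covers 1 to 20 successes. A variant that also includes the case of 0 successes, and that names each width by its number of successes, might look like the following sketch (successes and width are illustrative names):

```r
successes <- 0:20

# Width of the exact (Clopper-Pearson) 95% interval for each possible outcome
width <- sapply(successes, function(k) diff(binom.test(k, 20)$conf.int))
names(width) <- successes

round(width, 4)
successes[which.max(width)]   # number of successes giving the widest interval
```

Naming the vector makes it easier to read off which outcome gives which width, and the symmetry of the widths about 10 successes is then visible directly in the output.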


Solution Y1.4

(i) Non-parametric 99% confidence interval

Since λ̂ = 1/X̄, we could obtain a 99% confidence interval for the mean and then take its
reciprocal.

x <- c(14, 4, 3, 2, 3, 1, 5, 10, 4, 23)

bm <- rep(0,1000)

set.seed(17)

for(i in 1:1000)

{y <- sample(x,replace=TRUE); bm[i] <- mean(y)}

[6]

Alternatively, students could use the replicate command:

set.seed(17)

bm <- replicate(1000,mean(sample(x,replace=TRUE)))

Hence a 99% confidence interval for the mean is given by:

ci <- quantile(bm,c(0.005,0.995)) [2]

Therefore a 99% confidence interval for λ is given by:

1/ci [1]

This is (0.08129, 0.3705) to 4 SF. [1]

Note that you will have to swap the numbers to get the confidence interval in the correct form.
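
The swap can be done in code by sorting the reciprocals. The sketch below uses illustrative names. Note also that R 3.6.0 changed the default algorithm used by sample(), so more recent versions of R may not reproduce the quoted figures exactly; the commented-out RNGkind() line selects the older algorithm.

```r
x <- c(14, 4, 3, 2, 3, 1, 5, 10, 4, 23)

# Uncomment to use the pre-3.6.0 sampling algorithm (R gives a warning when this is set)
# RNGkind(sample.kind = "Rounding")

set.seed(17)
bm <- replicate(1000, mean(sample(x, replace = TRUE)))   # bootstrap sample means

ci_mean   <- quantile(bm, c(0.005, 0.995))   # 99% interval for the mean
ci_lambda <- sort(1 / unname(ci_mean))       # taking reciprocals reverses the order
ci_lambda
```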

(ii)(a) Simulate 1,000 sample means from Exp(0.145)

xbar <- rep(0,1000)

set.seed(19)

for (i in 1:1000)

{x<-rexp(10,0.145);xbar[i] <- mean(x)}


[6]

Alternatively students could use the replicate command:

set.seed(19)

xbar <- replicate(1000,mean(rexp(10,0.145)))


(ii)(b) Histogram

hist(xbar, prob=TRUE, main="Histogram of sample means from Exp(0.145)",
xlab="Sample mean")
[4]

(ii)(c) Empirical probability

length(xbar[xbar<5])/length(xbar) [3]

This gives an answer of 0.183. [1]

(iii) Superimpose gamma PDF

Using X̄ ~ Gamma(n, nλ) we have X̄ ~ Gamma(10, 1.45). [1]

xvals <- seq(0,20,by=0.01)

lines(xvals,dgamma(xvals,10,1.45),type="l",col="blue") [3]
[Subtract 1 mark per error, colour not needed]

The histogram has a similar shape to the gamma distribution. [1]

(iv) Exact probability

pgamma(5,10,1.45) [2]

This gives an answer of 0.19573. [1]

There’s about a 7% difference in the answers so they’re not that close. [2]
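
One way to put the size of this difference in context is to compare it with the sampling error of an empirical probability based on 1,000 simulations. The sketch below is an optional extra using the binomial standard error; the value 0.183 is the figure quoted in part (ii)(c) and the names are illustrative.

```r
p_exact <- pgamma(5, shape = 10, rate = 1.45)   # exact probability from part (iv)
p_emp   <- 0.183                                # empirical probability quoted in part (ii)(c)
n_sim   <- 1000

# Standard error of an empirical proportion based on n_sim simulations
se <- sqrt(p_exact * (1 - p_exact) / n_sim)
se

# Difference measured in standard errors
(p_emp - p_exact) / se
```

On this measure the empirical value is roughly one standard error below the exact value, which is within what 1,000 simulations would typically produce.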

(v)(a) Simulate 1,000 values from gamma

set.seed(21)

sim <- rgamma(1000,10,1.45) [2]

(v)(b) QQ plot

qqplot(sim,xbar,xlab="gamma quantiles",ylab="sample mean quantiles") [4]

abline(0,1,col="red",lty=2,lwd=2) [2]
[Colour, line type and width not needed]

The fit appears fairly good in the middle. The lower end sample means are slightly higher than
expected – so we have a lighter lower tail. [2]

The upper end sample means are much lower than expected – so we have a lighter upper tail
except for a handful of extremely large sample mean values. [2]

So it’s not a very good fit, possibly because of the sample size not being large enough. [1]



Subject CS1: Assignment Y2
2019 Examinations

The time allowed for this assignment is 1¾ hours.

Attempt all of the questions, as far as possible under exam conditions.

If you are having your assignment marked by ActEd, please follow these instructions carefully:

– Download and open the Word document ‘CS1 Assignment Y2 Answer Booklet 12345’.
Follow the instructions provided in the template and enter your answers where
indicated.

– In your submission include sufficient R code for the markers to work out how you
arrived at your answers.

– Begin your answer to each question on a new page. Only send ActEd one Word file
(created using the template) when you have completed the assignment.

– When submitting your script, email your Word file to [email protected].

– Assignment marking is not included in the price of the course materials. Please
purchase Series Y Marking or a Marking Voucher before submitting your script.

– We only accept the current version of assignments for marking, and so you can only
submit this assignment in the sessions leading to the 2019 exams.

– We only accept Word files produced in Office 2007, 2010 or 2013 format. Submitted
assignments will not be marked if any of the files are suspected to have been affected
by a computer virus or to have been corrupted.

– You should aim to submit this script for marking by the recommended submission
date. The recommended and deadline dates for submission of this assignment are
listed on the summary page at the back of this pack and on our website at
www.ActEd.co.uk.

– Scripts received after the deadline date will not be marked, unless you are using a
Marking Voucher. It is your responsibility to ensure that scripts reach ActEd in good
time. If you are using Marking Vouchers, then please make sure that your script
reaches us by the Marking Voucher deadline date to give us enough time to mark
and return the script before the exam.

– In addition to this paper, you should have available actuarial tables and an electronic
calculator.





CS1B: Assignment Y2 Questions

Y2.1 An investigation is to be carried out into the spreading properties of two brands of paint, Brand A
and Brand B . Samples of 5 cans (of the same size) of each type are analysed, and the area of wall
covered by the paint in each can is measured (in square metres), with the following results:

Brand A   55.4   53.2   56.0   50.1   51.8
Brand B   49.2   47.9   52.2   50.8   48.3

(i) Test whether the variances of the 2 brands can be considered to be equal. [8]

(ii) Based on your answer to part (i), test the hypothesis that Brand A covers a greater area
than Brand B , against the hypothesis that both brands are equally effective. State the
probability value of your test statistic. [6]

(iii) It is decided that the assumption of normality is not appropriate. Repeat part (ii) using an
appropriate non-parametric test without resampling. [10]
[Total 24]

Y2.2 The aggregate claims X each year, from a portfolio of insurance policies, are assumed to have a
normal distribution with unknown mean θ and variance τ² = 400. Prior information is such that
θ is assumed to have a normal distribution with mean μ = 270 and variance σ² = 225.
Independent claim amounts over the past 5 years have been obtained.

Simulate 1,000 samples of 5 aggregate claims using a seed value of 13 and calculate the mean of
each sample. Hence, obtain an empirical Bayesian estimate for θ under quadratic loss. [10]


Y2.3 A statistician is analysing the fall in fertility rates that occurred in Switzerland in 1888. The
standardised fertility measure for each of 47 French-speaking provinces is obtained along with
five other socio-economic factors:

• Agriculture (percentage of males involved in agriculture as occupation)
• Examination (percentage of draftees who received the highest mark on army examination)
• Education (percentage of draftees who received education beyond primary school)
• Catholic (percentage of province who were Catholic)
• Infant.Mortality (percentage of live births who lived less than 1 year)

This information can be found in the built-in dataset, swiss.

(i) Show that Education has the second strongest Spearman’s correlation with Fertility. [6]

(ii) Plot a labelled scattergraph of Fertility against Education. [6]

(iii) (a) Fit a linear regression model, using Fertility as the response variable and
Education as the explanatory variable. State the intercept and gradient
parameters and comment on their p-values.

(b) Add a red dotted fitted regression line to the scattergraph in part (ii).

(c) By considering the coefficient of determination and the fitted line from part (b),
explain the limitations of this model. [12]

Neuchatel, the 42nd Swiss province, has 17.6% of men occupied in agriculture, 35% of draftees
receiving the highest mark on the army examination, 32% of draftees receiving education beyond
primary school, 16.92% who are Catholic and 23.0% of live births who live less than 1 year.

(iv) Calculate the residual and a 90% confidence interval for the fertility rate of the Neuchatel
province, based on the model in part (iii). [8]


To improve the model it is decided to use forward selection to add other variables and interaction
terms that meet the following criteria:

• Main effects are considered before interaction terms.
• Only interactions between Education and one other variable (that was included as a main
effect) are to be considered.
• The variable that most improves the adjusted R² out of the remaining possibilities is to be
added first.
• The variable is then only kept if all the resulting parameters in the model are significant.

(v) Derive the best model for fertility that meets all these criteria, recording your adjusted
R² for each model considered. Comment on how each model meets the criteria. [26]

(vi) (a) Repeat part (iv) for your model from part (v).

(b) Hence, comment on the fit of the second model compared to the first. [8]
[Total 66]

END OF PAPER



CS1B: Assignment Y2 Solutions

Solution Y2.1

(i) Test equality of variances

A <- c(55.4,53.2,56.0,50.1,51.8) [1]

B <- c(49.2,47.9,52.2,50.8,48.3) [1]

var.test(A,B) [4]

This gives a p-value of 0.5576. Hence we have insufficient evidence to reject H0 . Therefore it is
reasonable to assume that the variances are equal. [2]

Alternatively, students may reverse the A and B in the function, which gives the same answer.

(ii) Test whether Brand A covers more than Brand B

Since the variances can be considered equal we can use the equal variance t test:

t.test(A,B,var.equal=TRUE,alt="greater") [4]

This gives a p-value of 0.01446. Hence we have sufficient evidence to reject H0 . Therefore it is
reasonable to assume that Brand A covers a greater area than Brand B.

Alternatively, students may reverse the A and B in the function, and test ‘less’ which gives the
same answer.

Subtract 2 marks for students who carry out a 2 sided test and get a p-value of 0.02891.

Subtract 2 marks for students who don’t specify that the variances are equal and get a p-value of
0.01564.
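
To see what the equal variance t test is doing, the test statistic can be rebuilt by hand from the pooled sample variance. This is a sketch for checking purposes only; the names sp2, t_stat and p_value are illustrative.

```r
A <- c(55.4, 53.2, 56.0, 50.1, 51.8)
B <- c(49.2, 47.9, 52.2, 50.8, 48.3)
nA <- length(A)
nB <- length(B)

# Pooled estimate of the common variance
sp2 <- ((nA - 1) * var(A) + (nB - 1) * var(B)) / (nA + nB - 2)

# Two-sample t statistic and one-sided (greater) p-value on nA + nB - 2 degrees of freedom
t_stat  <- (mean(A) - mean(B)) / sqrt(sp2 * (1 / nA + 1 / nB))
p_value <- pt(t_stat, df = nA + nB - 2, lower.tail = FALSE)

c(t = t_stat, p = p_value)
```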

(iii) Non-parametric test

ObsT <- mean(A)-mean(B) [1]

results <- c(A,B)

index <- 1:length(results)

p<-combn(index,length(A)) [2]

n <- ncol(p)

dif<-rep(0,n) [1]

for (i in 1:n)

{dif[i]<-mean(results[p[,i]])-mean(results[-p[,i]])} [2]


length(dif[dif>=ObsT])/length(dif) [2]

This gives an empirical p-value of 0.01984127. Hence we have sufficient evidence to reject H0 .
Therefore it is reasonable to assume that Brand A covers a greater area than Brand B. [2]

Alternatively, students may reverse the A and B in the function, and test dif<=ObsT which gives the
same answer.
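
The same permutation test can be written without an explicit loop. Because the two groups are the same size and the overall total is fixed, the difference in means is at least as large as the observed difference exactly when the sum of the values allocated to Brand A is at least the observed sum for Brand A. The sketch below relies on that equivalence; the names and the small tolerance are illustrative choices.

```r
A <- c(55.4, 53.2, 56.0, 50.1, 51.8)
B <- c(49.2, 47.9, 52.2, 50.8, 48.3)
results <- c(A, B)

obs_sum <- sum(A)

# Sum of the 5 values allocated to "Brand A" under every possible split of the 10 cans
group_sums <- combn(results, length(A), FUN = sum)

# Proportion of splits at least as extreme as the one observed
# (the tolerance guards against floating-point ties)
p_value <- mean(group_sums >= obs_sum - 1e-9)
p_value
```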


Solution Y2.2

Storing the distribution variance:

dvar <- 400

Storing the prior mean and variance:

pmean <- 270

pvar <- 225

Storing the number of years:

n <- 5

Obtaining the means of 1,000 samples of size n:

xp <- rep(0,1000)

set.seed(13)

for (i in 1:1000)

{mu <- rnorm(1,pmean,sqrt(pvar))

x <- rnorm(n,mu,sqrt(dvar))

xp[i] <- sum(x)/n}

[8, 2 marks each for the last 3 lines, 2 marks for everything else]

Alternatively, students could place the values directly into the loop:

xp <- rep(0,1000)

set.seed(13)

for (i in 1:1000)

{mu <- rnorm(1,270,sqrt(225))

x <- rnorm(5,mu,sqrt(400))

xp[i] <- sum(x)/5}

The average of these is the estimate:

mean(xp) [1]

Hence the empirical Bayesian estimate for θ under quadratic loss is 268.9667. [1]
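
For comparison, the normal/normal model has a standard closed-form posterior mean, which is a credibility-weighted average of the sample mean and the prior mean. The sketch below assumes that standard result; the observed mean of 280 is a hypothetical value chosen purely for illustration.

```r
tau2   <- 400   # variance of the aggregate claims
mu     <- 270   # prior mean
sigma2 <- 225   # prior variance
n      <- 5     # number of years

# Credibility factor for the normal/normal model
Z <- n / (n + tau2 / sigma2)
Z

# Posterior mean (Bayesian estimate under quadratic loss) for a given sample mean
posterior_mean <- function(xbar) Z * xbar + (1 - Z) * mu
posterior_mean(280)   # hypothetical observed sample mean of 280
```

Since each simulated sample mean has expectation equal to the prior mean, an average of the simulated means close to 270 is what we would expect.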


Solution Y2.3

(i) Spearman’s rank correlation coefficient

cor(swiss,method="spearman") [4]

The first row is as follows:

Agriculture  Examination    Education     Catholic  Infant.Mortality
  0.2426643  -0.66090300  -0.44325769   0.41364556        0.43713670

Hence, we can see that Education has the second strongest correlation coefficient (−0.44) with
Fertility after Examination (−0.66) . [2]

Subtract 2 marks for students who calculate the default Pearson or the Kendall correlation.
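
Rather than reading the ranking off the full matrix, the Fertility row can be extracted and ordered by absolute size. A minimal sketch, with illustrative names:

```r
# Spearman correlations of Fertility with each of the other five variables
rho <- cor(swiss, method = "spearman")["Fertility", -1]

# Order from strongest to weakest, ignoring sign
ranked <- rho[order(abs(rho), decreasing = TRUE)]
round(ranked, 4)
```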

(ii) Labelled scattergraph

attach(swiss)

plot(Education,Fertility,pch=3,
main="Scattergraph of Fertility rate vs education")
[6]

Alternatively, students need not attach the data and instead use either of the following:

plot(swiss$Education,swiss$Fertility,pch=3,xlab="Education",
ylab="Fertility",main="Scattergraph of Fertility rate vs education")

plot(swiss[,4],swiss[,1],pch=3,xlab="Education",
ylab="Fertility",main="Scattergraph of Fertility rate vs education")


(iii)(a) Fit a linear regression model

If the data is attached then students can use:

model <- lm(Fertility~Education) [2]

Otherwise they’ll need to use either of the following:

lm(Fertility ~ Education, data = swiss)

lm(swiss$Fertility ~ swiss$Education)

Note, hereafter we will only use the attached version.

summary(model)

This gives the following output:

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 79.6101 2.1041 37.836 < 2e-16 ***
Education -0.8624 0.1448 -5.954 3.66e-07 ***

So the intercept parameter = 79.6101 and slope parameter = −0.8624 . [2]

Both p-values are very small (well below 0.1%), so there is strong evidence that each parameter is non-zero. [1]

(iii)(b) Add a fitted regression line

abline(model,col="red",lty=3) [4]


(iii)(c) Coefficient of determination and explain the limitations

The coefficient of determination is given in the summary to part (iii)(a):

R² = 0.4406 [1]

Half marks for students who give the adjusted R² of 0.4282, as the question asks for R².

The low coefficient of determination is due to the large spread of results around the line. [1]

There are also very few provinces with high levels of education. [1]
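
Both figures can also be rebuilt from the residuals, which shows how the adjusted value penalises the number of explanatory variables. The sketch below fits the model with data = swiss rather than attaching the data; the names rss, tss, r2 and adj_r2 are illustrative.

```r
model <- lm(Fertility ~ Education, data = swiss)

rss <- sum(resid(model)^2)                              # residual sum of squares
tss <- sum((swiss$Fertility - mean(swiss$Fertility))^2) # total sum of squares

r2 <- 1 - rss / tss

n <- nrow(swiss)   # number of provinces
k <- 1             # number of explanatory variables
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

c(r2 = r2, adj_r2 = adj_r2)
```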

(iv) Residual and 90% confidence interval

The residual for Neuchatel which is the 42nd province can be obtained from either of the
following:

resid(model)[42] [1]

model$resid[42]

This gives 12.38515. [1]

Creating the data frame:

newdata <-data.frame(Education=32) [2]

Alternatively, students could enter all the explanatory variables:

newdata <-
data.frame(Agriculture=17.6,Examination=35,Education=32,
Catholic=16.92,Infant.Mortality=23.0)

Obtaining the confidence interval for the individual response:

predict(model, newdata, interval="predict", level=0.9)

[3, −1 per error]

This gives a 90% confidence interval of (35.2, 68.8). [1]

Students who use the mean response will get (46.4, 57.6). Lose 1 mark.

Students who obtain a 95% confidence interval will get individual (31.8, 72.2) or mean (45.3, 58.7).
Lose 1 mark and 2 marks, respectively.
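
The difference between the two kinds of interval is easiest to see side by side. In the sketch below the model is refitted with data = swiss so that the block is self-contained, and the interval types are written out in full; the names pred_int and conf_int are illustrative.

```r
model   <- lm(Fertility ~ Education, data = swiss)
newdata <- data.frame(Education = 32)

# Interval for an individual response (wider)
pred_int <- predict(model, newdata, interval = "prediction", level = 0.9)

# Interval for the mean response (narrower)
conf_int <- predict(model, newdata, interval = "confidence", level = 0.9)

rbind(prediction = pred_int[1, ], confidence = conf_int[1, ])
```

Both intervals are centred on the same fitted value; the individual response interval is wider because it also allows for the variability of a single observation about the regression line.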


(v) Forward selection

Our starting model, model, uses Education only and has an adjusted R² of 0.4282.

Adding the second covariate

fit1a <- update(model,.~.+Agriculture) [1]

summary(fit1a)

The adjusted R² is 0.4242, which is slightly worse, and not all parameters are significant. [1]

fit1b <- update(model,.~.+Catholic) [1]

summary(fit1b)

The adjusted R² is 0.5552, which is an improvement. [1]

fit1c <- update(model,.~.+Examination) [1]

summary(fit1c)

The adjusted R² is 0.483, which is an improvement. [1]

fit1d <- update(model,.~.+Infant.Mortality) [1]

summary(fit1d)

The adjusted R² is 0.545, which is an improvement. [1]

So fit1b (+Catholic) improves the adjusted R² the most (to 0.5552) and has all parameters
significant. [1]
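
Each step of the forward selection repeats the same pattern, so the candidate models for a step can be fitted in one pass. The sketch below does this for the first step only. The names base, candidates and step_results are illustrative, and the largest p-value in each model (including the intercept) is reported so that it can be compared with whatever significance level is being used, which is assumed here to be 5%.

```r
base <- lm(Fertility ~ Education, data = swiss)
candidates <- c("Agriculture", "Catholic", "Examination", "Infant.Mortality")

# Fit each one-variable extension of the base model and record two summaries
step_results <- sapply(candidates, function(v) {
  fit <- update(base, as.formula(paste(". ~ . +", v)))
  s <- summary(fit)
  c(adj_r2 = s$adj.r.squared,                # adjusted R-squared of the extended model
    max_p  = max(coef(s)[, "Pr(>|t|)"]))     # largest coefficient p-value in that model
})

round(t(step_results), 4)
```

The same code can be reused for later steps by changing base and candidates, although the models should still be inspected individually with summary() as in the solution.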

Adding the third covariate

fit2a <- update(fit1b,.~.+Agriculture)

summary(fit2a)

The adjusted R² is 0.6173, which is an improvement. [1]

fit2b <- update(fit1b,.~.+Examination)

summary(fit2b)

The adjusted R² is 0.5452, which is worse, and not all parameters are significant. [1]

fit2c <- update(fit1b,.~.+Infant.Mortality)

summary(fit2c)

The adjusted R² is 0.639, which is an improvement. [1]


So fit2c (+Infant.Mortality) improves the adjusted R² the most (to 0.639) and has all
parameters significant. [1]

Adding the fourth covariate

fit3a <- update(fit2c,.~.+Agriculture)

summary(fit3a)

The adjusted R² is 0.6707, which is an improvement. [1]

fit3b <- update(fit2c,.~.+Examination)

summary(fit3b)

The adjusted R² is 0.6319, which is worse, and not all parameters are significant. [1]

So fit3a (+Agriculture) is the only model that improves the adjusted R² (to 0.6707) and
has all parameters significant. [1]

Adding the fifth covariate

fit4a <- update(fit3a,.~.+Examination)

summary(fit4a)

The adjusted R² is 0.671, which is a marginal improvement, but not all the parameters are
significant. [1]

Hence, we do not add this covariate and stick with model fit3a with an adjusted R² of 0.6707.
[1]

Adding the first Education interaction term

fit5a <- update(fit3a,.~.+Education:Agriculture) [1]

summary(fit5a)

The adjusted R² is 0.6628, which is worse, and not all parameters are significant. [1]

fit5b <- update(fit3a,.~.+Education:Catholic)

summary(fit5b)

The adjusted R² is 0.699, which is an improvement. [1]

Examination was not a significant main effect and therefore should not be considered.

fit5c <- update(fit3a,.~.+Education:Infant.Mortality)

summary(fit5c)

The adjusted R² is 0.6779, which is an improvement, but not all parameters are significant. [1]


So fit5b (+Education:Catholic) is the only model that improves the adjusted R² (to
0.699) and has all parameters significant. [1]

Adding the second Education interaction term

fit6a <- update(fit5b,.~.+Education:Agriculture)

summary(fit6a)

The adjusted R² is 0.6917, which is worse, and not all parameters are significant. [1]

fit6b <- update(fit5b,.~.+Education:Infant.Mortality)

summary(fit6b)

The adjusted R² is 0.6957, which is worse, and not all parameters are significant. [1]

Neither of these models meets the criteria so the best model is fit5b which has the following
covariates:

Education + Catholic + Infant.Mortality + Agriculture + Education:Catholic [1]

Students could try Examination as a main effect again after the first interaction term to see if it is
now significant. If they do, they’ll get an adjusted R² of 0.7073, which is an improvement, but
not all parameters are significant. So again, it would not be added.


Forward selection marker summary grid:

model: Education                                                 adjusted R² = 0.4282
  +Agriculture                                                   adjusted R² = 0.4242   worse
  +Catholic                                                      adjusted R² = 0.5552   best
  +Examination                                                   adjusted R² = 0.483    better
  +Infant.Mortality                                              adjusted R² = 0.545    better

model: Education+Catholic                                        adjusted R² = 0.5552
  +Agriculture                                                   adjusted R² = 0.6173   better
  +Examination                                                   adjusted R² = 0.5452   worse
  +Infant.Mortality                                              adjusted R² = 0.639    best

model: Education+Catholic+Infant.Mortality                       adjusted R² = 0.639
  +Agriculture                                                   adjusted R² = 0.6707   best
  +Examination                                                   adjusted R² = 0.6319   worse

model: Education+Catholic+Infant.Mortality+Agriculture           adjusted R² = 0.6707
  +Examination                                                   adjusted R² = 0.671    marginally better (not significant parameters)

model: Education+Catholic+Infant.Mortality+Agriculture           adjusted R² = 0.6707
  +Education:Agriculture                                         adjusted R² = 0.6628   worse
  +Education:Catholic                                            adjusted R² = 0.699    best
  +Education:Examination                                         adjusted R² = 0.6644   worse
  +Education:Infant.Mortality                                    adjusted R² = 0.6779   better (not significant parameters)

model: Education+Catholic+Infant.Mortality+Agriculture+Education:Catholic   adjusted R² = 0.699
  +Examination                                                   adjusted R² = 0.7073   marginally better (not significant parameters)
  +Education:Agriculture                                         adjusted R² = 0.6917   worse
  +Education:Examination                                         adjusted R² = 0.6916   worse
  +Education:Infant.Mortality                                    adjusted R² = 0.6957   worse


(vi)(a) Residual and 90% confidence interval

The residual for Neuchatel which is the 42nd province can be obtained from either of the
following:

resid(fit5b)[42] [1]

fit5b$resid[42]

This gives 3.568406. [1]

Creating the data frame:

newdata <-
data.frame(Agriculture=17.6,Examination=35,Education=32,
Catholic=16.92,Infant.Mortality=23.0) [2]

Obtaining the confidence interval for the individual response:

predict(fit5b, newdata, interval="predict", level=0.9)

[2, −1 per error]

This gives a 90% confidence interval of (47.6, 74.1).

Students who use the mean response will get (54.3, 67.3). Lose 1 mark.

Students who obtain a 95% confidence interval will get individual (44.9, 76.7) or mean (53.0, 68.6).
Lose 1 mark and 2 marks, respectively.

(vi)(b) Comment

For the second model (fit5b), the residual is smaller and the confidence interval is narrower.
Both of these indicate that the model is a better fit. [2]
