MTH5120 Statistical Modelling I Tutorial 1


School of Mathematical Sciences


You should hand in answers to Questions 2 and 3 by Monday, 3 February, 1.00pm.
Please put your coursework into coursework box 8. The work should be stapled if necessary.
In order to have your work properly recorded and to get it back, please write your lab
group number and time on the work.

Introduction to R
Most of you have used R before but a few may not.
R is a language for doing statistical work. It is open source and so free to download. We are
going to use RStudio, which gives a better interface to R. RStudio is also open source, so you
can download it to try using at home.
Log on to the computer and open RStudio (not Rstudio Geography).
You will see a window with probably three panes. Drag down the one on the upper left
labelled console so you can see one labelled source. This is for entering commands. After
entering a command, click on run. This copies the command to the console window below
on the left and executes it. (You can also type a number of commands, highlight them and
click on run.) Whilst it is possible to type the commands directly into the console, you are
recommended to type into the source pane because if you make a mistake you can easily correct
it there. You cannot correct commands once they have been run in the console.
The top right pane can show the environment, i.e. which variables are defined, and also a
history of the commands you have entered.
The bottom right pane can be used for looking at plots you will produce in future weeks and
at the help which is available. It has other possibilities we will explore later.
I hope you will experiment with R and find out for yourselves some of the many features it
has.
To start with I will remind you about some of the basics. To save a value in a variable we use
the assignment operator <- (a less-than sign followed by a hyphen, with no space between them).

> x <- 3

We can do arithmetic as follows

> x <- 3
> y <- 4
> z <- sqrt(x^2 + y^2)
> print(z)
[1] 5

The variables x, y, z will keep the values you have assigned to them until you assign a
new value or delete them by using the remove function. For example rm(x) will delete x.
You can see which variables are currently defined by using the ls function. Note that all
function names have to be followed by a pair of brackets. Often these contain the arguments
which the function needs, but the brackets are still required even if there are no arguments.
So you type ls() to use it.
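As a small illustration of ls() and rm(), the sketch below defines two throwaway variables, lists what is defined, removes one and checks that it has gone. The names a and b are made up for this example and are not part of the tutorial.

```r
# Hypothetical variable names, used only for this illustration
a <- 10
b <- 20

print(ls())              # names currently defined, including "a" and "b"

rm(a)                    # delete a

print("a" %in% ls())     # FALSE: a has been removed
print("b" %in% ls())     # TRUE: b is still defined
```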
To create a vector we use the c(...) function, so

> x <- c(1, 2, 3)
> y <- c(4, 5, 6)
> c(x, y)
[1] 1 2 3 4 5 6

To calculate basic summary statistics we can use the following functions (we assume x, y
are vectors of the same length).

sum(x)
mean(x)
median(x)
sd(x)
var(x)
cor(x, y)
cov(x, y)

As an exercise, define two six-element vectors and find their means, variances and correlation.
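To see what these functions compute, it can help to reproduce a few of them by hand. The sketch below uses two made-up six-element vectors (illustrative values only) and relies on the fact that var, cov and sd in R use the divisor n - 1.

```r
# Made-up six-element vectors, for illustration only
x <- c(2, 4, 6, 8, 10, 12)
y <- c(1, 3, 2, 5, 4, 6)

n <- length(x)

# Sample variance by hand: sum of squared deviations divided by n - 1
var_by_hand <- sum((x - mean(x))^2) / (n - 1)

# Sample covariance and correlation by hand
cov_by_hand <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
cor_by_hand <- cov_by_hand / (sd(x) * sd(y))

# Each comparison should print TRUE
print(all.equal(var_by_hand, var(x)))
print(all.equal(cov_by_hand, cov(x, y)))
print(all.equal(cor_by_hand, cor(x, y)))
```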
Given two (or more) vectors we can add, subtract, multiply or divide them with the operations
being used element-wise

> v <- c(11, 12, 13, 14, 15)
> w <- c(1, 2, 3, 4, 5)
> v + w
[1] 12 14 16 18 20
> v - w
[1] 10 10 10 10 10
> v * w
[1] 11 24 39 56 75
> v / w
[1] 11.000000 6.000000 4.333333 3.500000 3.000000

If one operand is a scalar then the operation is performed between every vector element and
the scalar

> w + 2
[1] 3 4 5 6 7
> w / 2
[1] 0.5 1.0 1.5 2.0 2.5
> mean(w)
[1] 3
> w - mean(w)
[1] -2 -1 0 1 2
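
Subtracting the mean in this way is the first step in computing a corrected sum of squares. The following sketch continues with the same vector w. The name Sww is chosen here by analogy with the usual Sxx notation and is an assumption of this example.

```r
w <- c(1, 2, 3, 4, 5)

# Deviations from the mean; these always sum to zero
d <- w - mean(w)
print(sum(d))                              # 0

# Corrected sum of squares: sum of (w_i - mean(w))^2
Sww <- sum(d^2)
print(Sww)                                 # 10

# Equivalent shortcut form: sum of w_i^2 minus n * mean(w)^2
print(sum(w^2) - length(w) * mean(w)^2)    # 10
```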

Question 1
In lectures we had data on sparrows' age (x) and wing size (y). Use R to check the values
given for $\sum x_i$, $\sum y_i$, $\sum x_i y_i$, $\sum x_i^2$ and $\sum y_i^2$.
Find the mean of x and y and $S_{yy}$, $S_{xx}$ and $S_{xy}$ and check that they agree with the figures
given in the lecture.
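
One possible way to organise these calculations is sketched below. The vectors are placeholder values, not the sparrow data from the lecture, and the code assumes the usual definitions of the corrected sums, e.g. $S_{xy} = \sum x_i y_i - n\bar{x}\bar{y}$. Replace x and y with the lecture data before comparing figures.

```r
# Placeholder data, NOT the sparrow data from the lecture
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 3, 5, 4, 6, 7)
n <- length(x)

# Raw sums and sums of squares and products
sums <- c(sum_x = sum(x), sum_y = sum(y), sum_xy = sum(x * y),
          sum_x2 = sum(x^2), sum_y2 = sum(y^2))
print(sums)

# Means
print(c(mean_x = mean(x), mean_y = mean(y)))

# Corrected sums of squares and products (assumed standard definitions)
Sxx <- sum(x^2) - n * mean(x)^2
Syy <- sum(y^2) - n * mean(y)^2
Sxy <- sum(x * y) - n * mean(x) * mean(y)
print(c(Sxx = Sxx, Syy = Syy, Sxy = Sxy))
```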

Question 2
Use the command
plot(x,y, main="Plot of Y versus X")
to produce a scatterplot of the data. The text after main= is the title of your plot. The plot
appears in the lower right pane. To save it you should Export it as an image to your directory.
Does the relationship between y and x seem to be linear?
To get just the estimates of the intercept and slope type
lm(y ~ x)

We normally want more information than this. So we name the model sparrow
sparrow <- lm(y ~ x)

To see the details of our fitted model we use
summary(sparrow)
Save the results.
We saw such a summary in the first lecture for the widget data. We will learn what all the
output means in the next few lectures. For now you need to know how to get these results.
Add the fitted line to your scatterplot by the following commands
plot(x,y, main="Fitted line plot of Y versus X")
abline(sparrow)
Save your plot to your directory.
You can predict the mean value of y when x=7 by
predict(sparrow, newdata=data.frame(x=7))
Hand in the results from the summary and your plots.
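
To see what predict is doing, the sketch below fits a line to placeholder data (not the sparrow data) and compares the output of predict with the value computed directly from the estimated intercept and slope. The model name fit is hypothetical; coef is the standard R function for extracting the estimates from a fitted model.

```r
# Placeholder data, NOT the sparrow data
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 3, 5, 4, 6, 7)

fit <- lm(y ~ x)          # hypothetical model name
b <- coef(fit)            # named vector with elements "(Intercept)" and "x"
print(b)

# Predicted mean of y at x = 7 using predict()
p1 <- predict(fit, newdata = data.frame(x = 7))

# The same value computed directly: intercept + slope * 7
p2 <- b[["(Intercept)"]] + b[["x"]] * 7

print(all.equal(unname(p1), p2))   # TRUE
```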

Question 3
Derive the least squares estimates of α and β for the centred form of the simple linear
regression model given by

$$y_i = \alpha + \beta(x_i - \bar{x}) + \varepsilon_i, \qquad i = 1, 2, \ldots, n.$$

Check that the estimates do give a minimum in the same way as we saw for the standard
form of the simple linear regression model.
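
Once you have derived formulas for the estimates, you may wish to check them numerically. The sketch below is one possible way to do so, using placeholder data and a hypothetical variable xc for the centred explanatory variable. It fits both the standard and the centred form with lm so that the printed coefficients can be compared with your derived expressions.

```r
# Placeholder data, for illustration only
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 3, 5, 4, 6, 7)

xc <- x - mean(x)             # centred explanatory variable (hypothetical name)

standard <- lm(y ~ x)         # standard form: intercept + slope * x_i
centred  <- lm(y ~ xc)        # centred form: alpha + beta * (x_i - xbar)

print(coef(standard))
print(coef(centred))

# Both forms describe the same fitted line, so the fitted values agree
print(all.equal(fitted(standard), fitted(centred)))   # TRUE
```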
