MTH5120 Statistical Modelling I Tutorial 1
Introduction to R
Most of you have used R before but a few may not.
R is a language for doing statistical work. It is open source and so free to download. We are
going to use RStudio, which gives a better interface to R. RStudio is also open source, so you
can download it to use at home.
Log on to the computer and open RStudio (not RStudio Geography).
You will see a window with probably three panes. Drag down the one on the upper left
labelled console so that you can see one labelled source. This is for entering commands. After
entering a command click on run. This copies the command to the console window below
on the left and executes it. (You can also type a number of commands, highlight them and
click on run.) Whilst it is possible to type the commands directly into the console, you are
recommended to type into the source pane because, if you make a mistake, you can easily correct
it. You cannot correct the commands in the console.
The top right pane can show the environment, i.e. what variables are defined and also a
history of what commands you have entered.
The bottom right pane is used for looking at plots, which you will produce in future weeks, and
at the help which is available. It has other uses which we will explore later.
I hope you will experiment with R and find out for yourselves some of the many features it
has.
To start with I will remind you about some of the basics. To save a value in a variable we use
the assignment operator <- (a less-than sign followed by a hyphen, with no space between them).
> x <- 3
> y <- 4
> z <- sqrt(x^2 + y^2)
> print(z)
[1] 5
The variables x, y, z will keep the values you have assigned to them until you assign a
new value or delete them by using the remove function. For example rm(x) will delete x.
You can see which variables are currently defined by using the ls function. Note that all
functions have to be followed by a pair of brackets. Often these contain the arguments which the
function needs, but even if there are no arguments the brackets are still needed. So you type
ls() to use it.
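The behaviour of ls and rm described above can be sketched as follows. This is a minimal illustration that reuses the variables x, y and z from the earlier example; the check with %in% is just one convenient way to confirm that a name has gone.

```r
# Recreate the variables from the earlier example
x <- 3
y <- 4
z <- sqrt(x^2 + y^2)

ls()             # names of the variables currently defined (includes "x", "y", "z")

rm(x)            # delete x from the workspace
"x" %in% ls()    # FALSE: x is no longer defined
"y" %in% ls()    # TRUE: y is untouched
```

Typing x after rm(x) gives an error saying the object is not found, which is a quick way to confirm that it has been removed.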
To create a vector we use the c(...) function, which combines its arguments into a single vector.
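A minimal sketch of building a vector with c(). The name w and the values 1 to 5 are assumed here so that they agree with the outputs of w + 2 and w / 2 shown further on.

```r
# Assumed example vector: chosen to match the outputs shown later in this sheet
w <- c(1, 2, 3, 4, 5)

length(w)   # number of elements in w
w[2]        # second element of w (indexing starts at 1)
```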
To calculate basic summary statistics we can use the following functions (we assume x, y
are vectors of the same length).
sum(x)
mean(x)
median(x)
sd(x)
var(x)
cor(x, y)
cov(x, y)
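As a short illustration of these functions, the sketch below applies them to two small vectors. The values are made up for the example and are not taken from the lectures.

```r
# Hypothetical data, for illustration only
x <- c(2, 4, 6, 8)
y <- c(1, 3, 2, 5)

sum(x)      # total of the elements
mean(x)     # sample mean
median(x)   # sample median
sd(x)       # sample standard deviation (divisor n - 1)
var(x)      # sample variance (divisor n - 1), equal to sd(x)^2
cor(x, y)   # sample correlation between x and y
cov(x, y)   # sample covariance, equal to cor(x, y) * sd(x) * sd(y)
```

Note that var and sd use the divisor n - 1 rather than n.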
Define two six-element vectors and find their means, variances and correlation.
Given two (or more) vectors we can add, subtract, multiply or divide them, with the operations
being applied element-wise.
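A minimal sketch of element-wise arithmetic on two vectors of the same length. The names u and v and their values are chosen purely for illustration.

```r
# Hypothetical vectors, for illustration only
u <- c(1, 2, 3)
v <- c(4, 5, 6)

u + v        # 5 7 9      each element of u added to the matching element of v
v - u        # 3 3 3
u * v        # 4 10 18    element-wise product, not a dot product
v / u        # 4.0 2.5 2.0

sum(u * v)   # 32         summing the element-wise products gives the sum of u_i v_i
```

The last line is the pattern needed for quantities such as the sum of the products of two variables.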
If one operand is a scalar then the operation is performed between every vector element and
the scalar:
> w + 2
[1] 3 4 5 6 7
> w / 2
[1] 0.5 1.0 1.5 2.0 2.5
> mean(w)
[1] 3
> w - mean(w)
[1] -2 -1 0 1 2
Question 1
In lectures we had data on Sparrow's age (x) and wing size (y). Use R to check the values
given for $\sum x_i$, $\sum y_i$, $\sum x_i y_i$, $\sum x_i^2$ and $\sum y_i^2$.
Find the mean of x and y and $S_{yy}$, $S_{xx}$ and $S_{xy}$ and check that they agree with the figures
given in the lecture.
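The calculations in this question might be organised as in the sketch below. The data values are hypothetical placeholders, not the sparrow data from the lecture, and the code assumes the usual definitions $S_{xx} = \sum x_i^2 - n\bar{x}^2$, $S_{yy} = \sum y_i^2 - n\bar{y}^2$ and $S_{xy} = \sum x_i y_i - n\bar{x}\bar{y}$.

```r
# Hypothetical placeholder data: replace with the sparrow data from the lecture
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
n <- length(x)

# The five basic sums
sum(x)
sum(y)
sum(x * y)
sum(x^2)
sum(y^2)

# Means
mean(x)
mean(y)

# Corrected sums of squares and products (assumed standard definitions)
Sxx <- sum(x^2) - n * mean(x)^2
Syy <- sum(y^2) - n * mean(y)^2
Sxy <- sum(x * y) - n * mean(x) * mean(y)

# The same quantities written as sums of centred terms
all.equal(Sxx, sum((x - mean(x))^2))
all.equal(Sxy, sum((x - mean(x)) * (y - mean(y))))
```

Both all.equal calls should return TRUE, since the two forms of each quantity are algebraically identical.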
Question 2
Use the command
plot(x,y, main="Plot of Y versus X")
to produce a scatterplot of the data. The text after main= is the title of your plot. The plot
appears in the lower right pane. To save it you should Export it as an image to your directory.
Does the relationship between y and x seem to be linear?
To get just the estimates of the intercept and slope type
lm(y ~ x)
We normally want more information than this. So we name the model sparrow
sparrow <- lm(y ~ x)
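Once a model has been saved under a name, further information can be extracted from it. The sketch below uses hypothetical data and the illustrative name fit, so the numbers it produces are not those of the sparrow model.

```r
# Hypothetical placeholder data: not the sparrow data from the lecture
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit <- lm(y ~ x)   # fit and store the simple linear regression

coef(fit)          # estimated intercept and slope
summary(fit)       # estimates with standard errors, t values, R-squared, etc.
fitted(fit)        # fitted values
residuals(fit)     # residuals y - fitted values

# The slope agrees with Sxy / Sxx, and the intercept with mean(y) - slope * mean(x)
Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
slope <- Sxy / Sxx
intercept <- mean(y) - slope * mean(x)
all.equal(unname(coef(fit)), c(intercept, slope))
```

The same calls can be applied to sparrow once x and y hold the lecture data.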
Question 3
Derive the least squares estimates of α and β for the centred form of the simple linear
regression model given by
$$y_i = \alpha + \beta(x_i - \bar{x}) + \varepsilon_i, \qquad i = 1, 2, \ldots, n.$$
Check that the estimates do give a minimum in the same way as we saw for the standard
form of the simple linear regression model.
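After deriving the estimates by hand, a numerical check in R can be useful. The sketch below fits both the standard and the centred form to hypothetical data; the names xc, standard and centred are illustrative only. Compare the output with the expressions you derived.

```r
# Hypothetical placeholder data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

xc <- x - mean(x)         # centred explanatory variable

standard <- lm(y ~ x)     # standard form of the model
centred  <- lm(y ~ xc)    # centred form of the model

coef(standard)            # intercept and slope, standard form
coef(centred)             # estimates of alpha and beta, centred form
mean(y)                   # compare with the estimates printed above
```

The two fits give the same fitted values, so any difference between them is only in how the intercept is parameterised.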