Introduction
Introductory Material
ST419
Computational Statistics
This course teaches the fundamental computing skills required by practicing statisticians. We
focus on analysis of data using a computer and simulation as a tool to improve understanding
of statistical models. The software package R is taught throughout. On successful completion
of the course you will be a competent R user with skills that adapt readily to other statistical
software packages. A small subset of the things you will be able to do is listed below:
• generate detailed descriptive analyses of data sets (both numbers and graphics),
• fit statistical models and interpret the output from model fitting,
• write your own R functions to simplify commonly performed tasks or to implement novel
statistical methods.
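As a rough illustration of the three skills listed above, the following sketch uses `cars`, a data set that ships with R. It is not course data, and the helper name `describe` is hypothetical; treat this as one possible way of doing each task rather than the approach taken in the notes.

```r
# Illustrative sketch only: 'cars' is a data set shipped with R, not course data.

# 1. Descriptive analysis: numbers and graphics
summary(cars)                       # quartiles and mean for each column
plot(dist ~ speed, data = cars)     # scatter plot of stopping distance against speed

# 2. Fit a statistical model and inspect the output
fit <- lm(dist ~ speed, data = cars)
summary(fit)                        # coefficients, standard errors, R-squared

# 3. Write a function to simplify a commonly performed task
# describe() is a hypothetical helper name, not part of R.
describe <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}
describe(cars$speed)
```

Wrapping the summary statistics in a function means the same description can be applied to any numeric vector with a single call.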
Course Administration
Prerequisites
This course is intended primarily for MSc Statistics, MSc Social Research Methods (Statistics) and MSc
Operational Research students. You are required to have a good basic knowledge of statistics
(to the level of ST203 Statistics for Management Sciences). The course makes extensive use
of computers in teaching and assessment. Basic familiarity with machines running a Windows
operating system is assumed.
Note: for this year’s course, I have slightly modified the original course notes written by Dr. J. Penzer, which can
be found at http://stats.lse.ac.uk/penzer/CS.html
Timetabling
You are required to attend four hours per week for this course.
On Tuesday of week 10, we will be in H102 from 0900-1200 for students’ presentations (see
assessment below).
Assessment
This course is assessed by coursework (50%) and by a two-hour examination during the summer
term (50%). There are two pieces of coursework:
Detailed instructions for each of the projects will be given. There will be a practice written test.
Books
This course has detailed lecture notes. It should not be necessary to buy a book for this course.
Main texts:
• Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S, (Fourth Edition),
Springer. (QA276.4 V44)
• Venables, W. N., Smith, D. M. and the R Development Core Team (2001) An Introduction to R,
freely available from http://cran.r-project.org/doc/manuals/R-intro.pdf.
New R books are appearing all the time – the library will get these soon:
Course content
Week 1 Data, Session Management and Simple Statistics – R objects: vectors, data
frames and model objects; assignment; descriptive analysis; managing objects;
importing data; t-tests; simple regression; financial series.
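To give a flavour of the Week 1 topics, here is a minimal sketch covering assignment, vectors, data frames, importing data, a t-test and managing objects. All object names and values are made up for illustration, and the file is a temporary one created by the script itself rather than a course data file.

```r
# Illustrative sketch only: all object names and values below are made up.

# Assignment and vectors
height <- c(1.62, 1.75, 1.80, 1.68, 1.71, 1.77)
group  <- c("a", "a", "a", "b", "b", "b")

# A data frame collects equal-length vectors as columns
people <- data.frame(height = height, group = group)
str(people)

# Importing data: write a temporary CSV file, then read it back in
path <- tempfile(fileext = ".csv")
write.csv(people, path, row.names = FALSE)
imported <- read.csv(path)

# Two-sample t-test comparing mean height between the two groups
tt <- t.test(height ~ group, data = imported)
tt$p.value

# Managing objects: list the workspace, then remove what is no longer needed
ls()
rm(height, group)
```

The result of `t.test` is itself an object, so individual components such as the p-value can be extracted with `$` instead of being read off the printed output.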
On Mondays we will give details of the application of R to new material. On Tuesdays we will
go through the exercises from the previous week - you will be expected to have attempted to
complete the exercises so that you can contribute to the class. Any common problems with the
exercises or in the use of R will be dealt with during these sessions. The background ideas for
new material will also be introduced during the Tuesday sessions.
Computer practicals
The only way to learn a computer language is to actually use it. The computer practicals provide
time when you can work on material from the lecture notes and exercises. Using a combination
of the notes, the R help system and experimentation you should be able to work out most of the
commands. The R commands in the text are there to illustrate certain points. You do not have
to follow them rigidly. There is often more than one way to do something and you may be able
to find a better solution than the one given. If you find something interesting or difficult, spend
more time on it; if you get the idea straight away, move on. If you can’t work something out
by experimenting with it, ask me or Limin. Please don’t mindlessly type in commands
from the notes without understanding what they do.
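The combination of help pages and experimentation described above might look like the following sketch, which uses the base R function `mean` as an arbitrary example; the vector `x` is made up for illustration.

```r
# Exploring an unfamiliar command, here mean() from base R
help("mean")       # open the help page; ?mean is equivalent
args(mean)         # show the argument list
example("mean")    # run the examples from the help page
apropos("mean")    # names of objects containing "mean"

# Experiment: what does the trim argument do?
x <- c(1, 2, 3, 4, 100)
mean(x)              # 22: the outlier pulls the mean upwards
mean(x, trim = 0.2)  # 3: the smallest and largest values are dropped first
```

Trying a command on a small vector where the answer can be checked by hand is often the quickest way to confirm what an argument does.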
Lecture notes
The course attempts to convey a large amount of information in a short space of time. Some
of the material is of a technical nature and may not be covered explicitly in the lectures and
classes. You are expected to read the lecture notes thoroughly. The syllabus is defined
by the contents of the lecture notes with the exception of topics marked with a †. Additional
reading is suggested at the end of each week’s notes. Some of the notational conventions adopted
in the notes are described below.
italic typewriter font – things to be replaced with an appropriate value, identifier or expression.
If you miss a set of notes, additional copies are available from three sources:
• Hand out boxes in Statistics department – on the sixth floor of Columbia House there is
a set of hand out boxes. I will put any spare hard copies of lecture notes into these boxes.
• Public folder – to find copies of the notes in PDF format, open Outlook and go to
Public Folders → All Public Folders → Departments → Statistics → ST419 → Notes
The versions of the notes in the public folders and on the website will be updated to include
corrections. Despite my best efforts, there will be mistakes in the notes. If you spot
something that looks wrong please let me know.
Asides
Interesting issues that arise from the topic under consideration are placed
in boxes in the text. These may be hints on using R effectively, quick
questions or suggestions for further work.
R software is freely available under the GNU General Public License. The R project homepage
is http://www.r-project.org/. You can download the software to install on your own machine
from http://www.stats.bris.ac.uk/R/. Data files associated with the course can be found in
the ST419 public folder under Data.
Communicating with me
By far the best way to communicate with me is via email. I will respond to sensible emails
relating to:
The ST419 projects are unsupervised; I will not provide any direct assistance with projects.