
0. Introductory Material

ST419
Computational Statistics

Lecturer: Erik Baurdoux¹, B604


Assistant: Limin Wang

Course Aims and Objectives

This course teaches the fundamental computing skills required by practicing statisticians. We
focus on analysis of data using a computer and simulation as a tool to improve understanding
of statistical models. The software package R is taught throughout. On successful completion
of the course you will be a competent R user with skills that adapt readily to other statistical
software packages. A small subset of the things you will be able to do is listed below:

• generate detailed descriptive analyses of data sets (both numbers and graphics),

• fit statistical models and interpret the output from model fitting,

• design and perform simulation experiments to explore the properties of statistics,

• write your own R functions to simplify commonly performed tasks or to implement novel
statistical methods.

Course Administration

Prerequisites

This course is intended primarily for MSc Statistics, MSc Social Research Methods (Statistics) and MSc
Operational Research students. You are required to have a good basic knowledge of statistics
(to the level of ST203 Statistics for Management Sciences). The course makes extensive use
of computers in teaching and assessment. Basic familiarity with machines running a Windows
operating system is assumed.
¹ For this year’s course, I have slightly modified the original course notes written by Dr. J. Penzer, which can
be found at http://stats.lse.ac.uk/penzer/CS.html

ST419 Computational Statistics © J Penzer 2006



Timetabling

You are required to attend four hours per week for this course.

Mondays   1400-1500  H102  Weeks 1-10  Lecture
Mondays   1500-1700  S175  Weeks 1-10  Computer Workshop
Tuesdays  1000-1100  H101  Weeks 1-9   Lecture/Problem Class

On Tuesday of week 10, we will be in H102 from 0900-1200 for students’ presentations (see
assessment below).

Assessment

This course is assessed by coursework (50%) and by a two-hour examination during the summer
term (50%). There are two pieces of coursework:

                    Handed out  Due in                  Marks
Group project       22/10/07    Written: 07/12/07       10%
                                Presentation: 11/12/07  10%
Individual project  19/11/07    14/01/08                30%

Detailed instructions for each of the projects will be given. There will be a practice written test.

Books

This course has detailed lecture notes. It should not be necessary to buy a book for this course.

Main texts:

• Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S (Fourth Edition),
Springer. (QA276.4 V44)
• Venables, W. N., Smith, D. M. and the R Development Core Team (2001) An Introduction to R,
freely available from http://cran.r-project.org/doc/manuals/R-intro.pdf.

New R books are appearing all the time – the library will get these soon:

• Dalgaard, P. Introductory Statistics with R
• Faraway, J. J. Linear Models with R
• Maindonald, J. and Braun, J. Data Analysis and Graphics using R


Course content

A rough timetable and outline of course contents are given below.

Week 1 Data, Session Management and Simple Statistics – R objects: vectors, data
frames and model objects; assignment; descriptive analysis; managing objects;
importing data; t-tests; simple regression; financial series.

Week 2 More R Essentials and Graphics – R objects: logical vectors, character
sequences, lists; regular sequences; subset selection; descriptive plots: histogram,
qq-plot, boxplot; interactive graphics; low level graphics.

Week 3 Writing Functions in R – R objects: functions, arrays, matrices; function syntax;
flow control: conditioning, loops; R code from other sources; libraries.

Group project allocated. Individual project instructions given.

Week 4 Distributions and Simulation – statistical concepts: distributions, moments,
sample moments, properties of statistics; distributions in R: probability density,
quantiles; pseudo-random number generation; interpretation of simulation results.

Week 5 Monday: Written test 1 and Group project work.
Tuesday: Feedback on written test 1.

Week 6 Linear Models I – statistical concepts: multiple regression, model selection,
diagnostics; model formulae; interpretation of linear model output.

Week 7 Linear Models II – statistical concepts: factors as explanatory variables, logistic
regression; manipulating model objects.

Individual project allocated.

Week 8 Time Series Analysis – statistical concepts: auto-correlation, ARMA models,
GARCH models; R time series packages; graphics for exploratory analysis and
diagnostics.

Week 9 Monday: Written test 2 and Individual project work.
Tuesday: Feedback on written test 2.

Week 10 Monday: Revision.
Tuesday: Student presentations (0900-1200, H102).


Teaching methods and materials

Lectures and problem classes

On Mondays we will give details of the application of R to new material. On Tuesdays we will
go through the exercises from the previous week. You will be expected to have attempted
the exercises so that you can contribute to the class. Any common problems with the
exercises or in the use of R will be dealt with during these sessions. The background ideas for
new material will also be introduced during the Tuesday sessions.

Computer practicals

The only way to learn a computer language is to actually use it. The computer practicals provide
time when you can work on material from the lecture notes and exercises. Using a combination
of the notes, the R help system and experimentation you should be able to work out most of the
commands. The R commands in the text are there to illustrate certain points. You do not have
to follow them rigidly. There is often more than one way to do something and you may be able
to find a better solution than the one given. If you find something interesting or difficult, spend
more time on it; if you get the idea straight away, move on. If you can’t work something out
by experimenting with it, ask me or Limin. Please don’t mindlessly type in commands
from the notes without understanding what they do.

Lecture notes

The course attempts to convey a large amount of information in a short space of time. Some
of the material is of a technical nature and may not be covered explicitly in the lectures and
classes. You are expected to read the lecture notes thoroughly. The syllabus is defined
by the contents of the lecture notes with the exception of topics marked with a †. Additional
reading is suggested at the end of each week’s notes. Some of the notational conventions adopted
in the notes are described below.

† – additional non-examinable material for those who are interested.

* – material that you may want to skip first time around.

typewriter font – R related quantities and commands.

italic typewriter font – things to be replaced with an appropriate value, identifier
or expression.

italic font – a new term or idea.


If you miss a set of notes, additional copies are available from three sources:

• Hand out boxes in Statistics department – on the sixth floor of Columbia House there is
a set of hand out boxes. I will put any spare hard copies of lecture notes into these boxes.

• Public folder – to find copies of the notes in pdf format, open Outlook and go to
Public Folders → All Public Folders → Departments → Statistics → ST419 → Notes

• Website – I will also put copies of the notes on my website at
http://stats.lse.ac.uk/baurdoux/CS.html

The versions of the notes in the public folders and on the website will be updated to include
corrections. Despite my best efforts, there will be mistakes in the notes. If you spot
something that looks wrong please let me know.

Asides
Interesting issues that arise from the topic under consideration are placed
in boxes in the text. These may be hints on using R effectively, quick
questions or suggestions for further work.

R Software and data

R software is freely available under the GNU General Public License. The R project homepage
is http://www.r-project.org/. You can download the software to install on your own machine
from http://www.stats.bris.ac.uk/R/. Data files associated with the course can be found in
the ST419 public folder under Data.

Communicating with me

By far the best way to communicate with me is via email. I will respond to sensible emails
relating to:

• problems with material in the course,

• problems with exercises,

• mistakes in the notes.

The ST419 projects are unsupervised; I will not provide any direct assistance with projects.
