BIO503: Introduction To Programming and Statistical Modeling in R

This course provides an introduction to using R, a flexible statistical programming language, for statistical computation, modeling, and graphics. The course will cover basic R syntax and programming, writing functions, importing and managing data, and fitting several common statistical models like linear regression. The intended audience is graduate students in non-statistics fields who need R for research. No prior R experience is required, but a basic understanding of statistics and linear regression is expected. The course involves five 3-hour sessions combining lecture, demonstration, and hands-on practice to help students use R independently in their own work.

Uploaded by

Michael Botta
Copyright
© Attribution Non-Commercial (BY-NC)

BIO503: Introduction to Programming and Statistical Modeling in R

Course Description

This course is an introduction to R, a powerful and flexible statistical language and environment that also provides more flexible graphics capabilities than other popular statistical packages. The course will introduce students to the basics of using R for statistical programming, computation, graphics, and modeling. We will start with a basic introduction to the R language, reading and writing data, and graphics. We then discuss writing functions in R and tips on programming in R. Finally, the latter part of the course will focus on using R to fit some important types of statistical models, including linear regression. Our goal is to get students up and running with R so that they can use R in their research and are in a good position to expand their knowledge of R on their own. Basic knowledge of statistics, at the level of a basic understanding of linear regression, is required. The first 3 lectures will focus on R basics. Depending on course progress, I am happy to tailor the last lectures to students' interests.

Learning Objectives

After taking the course, students will be able to:
1. Use R for statistical programming, computation, graphics, and modeling,
2. Write functions and use R in an efficient way,
3. Fit some basic types of statistical models,
4. Use R in their own research,
5. Expand their knowledge of R on their own.

Intended Audience and prerequisites

There are no formal prerequisites, but in order to appreciate the abilities of R, and for the later classes that explore various statistical models, we expect that students will have some basic knowledge of statistics, at the level of a basic understanding of linear regression. The intended audience is doctoral students in departments other than biostatistics who need a flexible statistical environment for their research. Master's students are also allowed. We do not expect any prior experience with R, but experience with another programming or statistical language may be helpful to a limited extent. Beginning R users with basic knowledge may also find the course useful.

Instructors

Primary classroom and grading instructor: Aedin Culhane
Dana-Farber Cancer Institute, Smith 822C (8th floor of the Smith building at the end of Shattuck St)
Phone: (617) 617-2468
e-mail: [email protected]

Faculty sponsor: Chris Paciorek
Room 407, Building 2
Phone: (617) 432-4912
e-mail: [email protected]

Course Material
Course text: Students may use either of these two books depending on their needs and background.

1. Peter Dalgaard. Introductory Statistics with R (Paperback). 1st Edition. Springer-Verlag New York, Inc. ISBN 0-387-95475-9
   http://www.amazon.com/Introductory-Statistics-R-Peter-Dalgaard/dp/0387954759
   Introductory Statistics with R provides a very basic introduction to R, targeting nonstatistician scientists. It may be sufficient for students who will use R for basic statistics. Answers to the examples in the book are available on the author's webpage: http://staff.pubhealth.ku.dk/~pd/ISwR.htm
2. W. N. Venables and B. D. Ripley. 2002. Modern Applied Statistics with S. 4th Edition. Springer. ISBN 0-387-95457-0
   http://www.amazon.com/Modern-Applied-Statistics-W-N-Venables/dp/0387954570
   Modern Applied Statistics with S is a more comprehensive introduction to statistical computing using S and R.

Other useful references:
   An Introduction to R. Online manual at the R website at http://cran.r-project.org/manuals.html
   Andreas Krause, Melvin Olson. 2005. The Basics of S-PLUS. 4th edition. Springer-Verlag, New York. ISBN 0-387-26109-5
   Materials suggested at http://cran.r-project.org/manuals.html
   Jose Pinheiro, Douglas Bates. 2000. Mixed-effects models in S and S-PLUS. Springer-Verlag, Berlin. ISBN 0-387-98957-9

Software

R is available for free from http://cran.r-project.org/ for UNIX/Linux, Windows, and Mac. It is also available in the IT microlabs.

Class Format

There will be five 3-hour class sessions. They will be held in the microlab and will combine lecture, demonstration, and laboratory components, with an emphasis on demonstration and hands-on experience.

Grading/Assessment

Course note: Pass/Fail or audit grading option only. There will be 3 practical assignments, requiring students to use and expand on the material discussed in class. Pass/Fail grading will be based on the return of, and performance on, these assignments, and on attendance.

Course topics
1. Introduction to the R language
   SAS versus R
   R, S, and S-plus
   Obtaining and managing R
   Objects: types of objects, classes, creating and accessing objects
   Arithmetic and matrix operations
   Introduction to functions
2. More details on working with R
   Reading and writing data
   R libraries
   Functions and R programming: the if statement; looping (for, repeat, while); writing functions; function arguments and options
3. Graphics
   Basic plotting
   Manipulating the plotting window
   Advanced plotting using the lattice library
   Saving plots
4. Standard statistical models in R
   Model formulae and model options
   Output and extraction from fitted models
   Models considered:
      Linear regression: lm()
      Logistic regression: glm()
      Poisson regression: glm()
      Survival analysis: Surv(), coxph()
      Linear mixed models: lme()
5. Advanced R
   Extensions of topics discussed in lectures 1, 2 and 3, based on a course survey:
      Data management (importing, subsetting, merging, new variables, missing data, etc.)
      Plotting
      Loops and functions
   Further topics will be determined by student interest and requirements, but may include:
      Migration from SAS to R
      Plotting and graphics in R
      Writing R functions, optimizing R code
      Bioconductor, analysis of gene expression and genomics data
      More on linear models
      Multivariate analysis, cluster analysis, dimension reduction methods (PCA)
