Presentation of R
An Introduction to R
John Verzani
Outline
What is R?
The many faces of R
Data
Manipulating data
Applying functions to data
Vectorization of data
Graphics
Model formulas
Inference
Significance tests
Confidence intervals
Models
Simple linear regression
Multiple linear regression
Analysis of variance models
Logistic regression models
What is R?
The structure of R
[Diagram: the structure of R. At the center is the Kernel. Around it sit the Base packages (base, grid, stats, stats4, ...), the Recommended packages (MASS, nlme, ...), and Contributed packages from CRAN (UsingR, gregmisc, ...), loaded into a session with library(). Interfaces include the CLI, Batch mode, DCOM, ...; output goes to text and graphics devices.]
The many faces of R
[Screenshots: the Windows interface and the Mac OS X interface.]
The command line:
> 2 + 2
[1] 4
Data
Data types
> somePrimes
[1] 2 3 5 7 11 13 17
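The vector above might have been created with c(), the combine function (one possibility):
> somePrimes = c(2, 3, 5, 7, 11, 13, 17)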
Matrices
Matrices, cont.
Operations: matrix multiplication with %*% (* is entry-by-entry multiplication); solve() inverts a matrix.
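The 2-by-2 matrix M used below is not defined in this excerpt; its inverse on the next lines pins it down, so it could have been entered as:
> M = matrix(c(1, 0, 1, 1), nrow = 2)   # rows (1, 1) and (0, 1)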
> M %*% M
[,1] [,2]
[1,] 1 2
[2,] 0 1
> solve(M)
[,1] [,2]
[1,] 1 -1
[2,] 0 1
Matrices, cont.
> x = 1:5
> y = c(2, 3, 1, 4, 5)
> ones = rep(1, length(x))
> X = cbind(ones, x)
> solve(t(X) %*% X, t(X) %*% y)
[,1]
ones 0.9
x 0.7
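For comparison, the built-in lm() function solves the same least-squares problem and returns the same coefficients:
> coef(lm(y ~ x))
(Intercept)           x
        0.9         0.7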
Lists
Defining lists
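The printed output below starts at component $b; a definition consistent with it and with the later lst[[1]] example might be (the component names are assumptions):
> lst = list(a = somePrimes, b = M, c = mean)
> lst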
$b
[,1] [,2]
[1,] 1 1
[2,] 0 1
$c
function (x, ...)
UseMethod("mean")
<environment: namespace:base>
Data frames
[Diagram: a data frame as a rectangular arrangement of equal-length vectors, one vector per column.]
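A data frame collects such equal-length vectors as columns. A small made-up example:
> d = data.frame(x = 1:3, y = c("a", "b", "c"))
> d
  x y
1 1 a
2 2 b
3 3 c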
Reading in data
Data can be built-in, entered at the keyboard, or read in from
external files. These may be formatted using fixed-width fields,
comma-separated values, tables, etc. For instance, this command
reads in a data set from a URL:
Reading urls
> f = "http://www.math.csi.cuny.edu/st/R/crackers.csv"
> crackers = read.csv(f)
> names(crackers)
[1] "Company" "Product"
[3] "Crackers" "Grams"
[5] "Calories" "Fat.Calories"
[7] "Fat.Grams" "Saturated.Fat.Grams"
[9] "Sodium" "Carbohydrates"
[11] "Fiber"
Manipulating data
Assignment, Extraction
Values in vectors, matrices, lists, and data frames can be accessed
by their components:
By index
> google[1:3]
[1] 100.2 132.6 196.0
By index cont.
> M[1, ]
[1] 1 1
> lst[[1]]
[1] 2 3 5 7 11 13 17
Access by name
by name
> theSimpsons[["role"]]
[1] Comic relief Parent troublemaker
[4] Goody two-shoes Cute baby
5 Levels: Comic relief Cute baby ... troublemaker
Recycling values
When making assignments in R we might have a situation where
many values are to be replaced by just one value, or by a few. R
recycles the values on the right-hand side of an assignment to fill in the size mismatch.
replace coded values with NA
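The code chunk for this is not shown; a typical idiom, with 99 standing in as a hypothetical missing-value code, is
> x[x == 99] = NA   # the single NA is recycled over every matching position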
The median
> median(fat)
[1] 3.25
Applying functions to data
Functions
generic functions
Interacting with R from the command line requires one to
remember a lot of function names, although R helps out
somewhat. In practice, many tasks may be viewed generically:
E.g., “print” the values of an object, “summarize” values of an
object, “plot” the object. Of course, different objects should yield
different representations.
R has method systems (S3, S4) for declaring a function to be generic. This
allows a different function (a “method”) to be “dispatched” based on the “class” of
the first argument.
A basic template is:
methodName( object, extraArguments)
> summary(somePrimes)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.00 4.00 7.00 8.29 12.00 17.00
> summary(gender)
Female Male
2 3
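A minimal sketch of S3 dispatch (the class "grades" and its print method are made up for illustration):
> print.grades = function(x, ...) cat("Grades:", unclass(x), "\n")
> g = structure(c("A", "B", "A"), class = "grades")
> g                       # auto-printing dispatches to print.grades()
Grades: A B A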
Vectorization of data
Vectorizing a simulation
It is often faster in R to vectorize the simulation above by
generating all of the random data at once, and then applying the
mean() function to the data. The natural way to store the data is
a matrix.
Simulation using a matrix
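The code chunk is not shown; a sketch of the idea, with 1000 samples of size 10 from a standard normal (numbers chosen only for illustration):
> m = matrix(rnorm(1000 * 10), nrow = 1000)   # one sample per row
> res = apply(m, 1, mean)                     # 1000 sample means at once
(rowMeans(m) is an even faster equivalent.)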
Graphics
[Figure: density estimate of fat — Density (0.00 to 0.20) plotted against fat (0 to 8).]
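The figure was presumably produced by something like a histogram with a density estimate added (an assumption about the exact call):
> hist(fat, probability = TRUE)
> lines(density(fat))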
Graphics: cont.
Quantile-Quantile plots
> qqnorm(fat)
Boxplots
[Figure: Quantile-Quantile plot — a Normal Q-Q plot of fat, Sample Quantiles (0 to 8) against Theoretical Quantiles (-2 to 2).]
Boxplots
[Figures: example boxplots — one of values roughly 20 to 50 with a few outliers, and one relating Number of Vessels (annotated 234 (65%) and 159 (44%)) and Completeness to Sampling Fraction.]
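Boxplots are made with boxplot(); given a model formula, one box is drawn per group. A sketch using the crackers data read in earlier (not the data shown in the figure):
> boxplot(Calories ~ Company, data = crackers)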
3-d graphics
[Figure: two 3-d displays with axes labelled X1, X2, X3.]
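3-d displays can be made with persp() in base graphics or cloud() from the lattice package; a sketch with made-up data matching the axis labels X1-X3:
> library(lattice)
> d = data.frame(X1 = rnorm(50), X2 = rnorm(50), X3 = rnorm(50))
> cloud(X3 ~ X1 * X2, data = d)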
Model formulas
y ~ 1       y_i  = β_0 + ε_i
y ~ x       y_i  = β_0 + β_1 x_i + ε_i
y ~ x - 1   y_i  = β_1 x_i + ε_i              (removes the intercept)
y ~ x | f   y_i  = β_0 + β_1 x_i + ε_i        (grouped by levels of f)
y ~ f       y_ij = τ_i + ε_ij
Lattice graphics
[Figures: lattice graphics examples — an xyplot of values (20 to 50) in panels by vehicle type (Van, Large, Midsize), and a dotplot of the barley data: yield (20 to 60) for each variety (Svansota, No. 462, Manchuria, No. 475, Velvet, Peatland, Glabron, No. 457, Wisconsin No. 38, Trebi) in panels by site (Grand Rapids, Duluth, University Farm, Morris, Crookston, Waseca), grouped by year (1931, 1932).]
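The barley dotplot is a standard lattice example; it was likely produced by a call along these lines:
> library(lattice)
> dotplot(variety ~ yield | site, data = barley, groups = year,
+         auto.key = TRUE)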
Inference
Significance tests
The Bumpus data set (Ramsey and Schafer) contains data from an
1898 lecture supporting evolution (some birds survived a harsh
winter storm).
two-sample t test
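The call is not shown; given the plot of humerus by factor(code) below, it was presumably something like
> t.test(humerus ~ factor(code), data = Bumpus)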
Diagnostic plot
[Figure: boxplots of humerus (about 660 to 780) by factor(code), levels 1 and 2.]
t.test() output
Diagnostic plot
[Figure: boxplots comparing affected and unaffected groups (values about 1.0 to 2.0).]
Confidence intervals
> t.test(Bumpus$humerus)
...
95 percent confidence interval:
728.2 739.6
...
Chi-square tests
Goodness of fit tests are available through chisq.test() and
others. For instance, data from Rosen and Jerdee (1974, from
R&S) on the promotion of candidates based on gender:
gender data
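The code that builds the two-way table rj is not shown; such a table can be entered directly (the counts here are placeholders, not the actual Rosen and Jerdee data):
> rj = matrix(c(18, 14, 6, 10), nrow = 2,
+             dimnames = list(gender = c("M", "F"), promoted = c("Y", "N")))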
sieveplot(rj)
Sieve diagram
[Figure: sieve diagram of gender (F, M) by promoted (Y, N).]
> chisq.test(rj)$p.value
[1] 0.05132
> source("http://www.math.csi.cuny.edu/st/R/fat.R")
> names(fat)
[1] "case" "body.fat" "body.fat.siri"
[4] "density" "age" "weight"
[7] "height" "BMI" "ffweight"
[10] "neck" "chest" "abdomen"
[13] "hip" "thigh" "knee"
[16] "ankle" "bicep" "forearm"
[19] "wrist"
Models
Simple linear regression
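The output below, and the summary() of res on a later slide, come from a fit along these lines:
> res = lm(body.fat ~ BMI, data = fat)
> res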
Call:
lm(formula = body.fat ~ BMI, data = fat)
Coefficients:
(Intercept) BMI
-20.41 1.55
[Figure: scatterplot of body.fat against BMI (20 to 50) with the least-squares line and a resistant lqs line.]
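A plot like the one above could be drawn from the fit res together with a resistant fit from lqs() in the MASS package (an assumption about the exact commands used):
> plot(body.fat ~ BMI, data = fat)
> abline(res)                                      # least-squares line
> library(MASS)
> abline(coef = coef(lqs(body.fat ~ BMI, data = fat)), lty = 2)   # lqs line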
> summary(res)
Call:
lm(formula = body.fat ~ BMI, data = fat)
Residuals:
Min 1Q Median 3Q Max
-21.4292 -3.4478 0.2113 3.8663 11.7826
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -20.40508 2.36723 -8.62 7.78e-16 ***
BMI 1.54671 0.09212 16.79 < 2e-16 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
> summary(residuals(res))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-2.14e+01 -3.45e+00  2.11e-01  3.52e-17  3.87e+00  1.18e+01
Residual plots
[Figures: residual plots for res — resid(res) against fitted values, a Normal Q-Q plot of the residuals (Sample Quantiles against Theoretical Quantiles), and R's standard lm diagnostics (Residuals, Standardized residuals), with observations 9, 39, and 81 flagged.]
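The diagnostic plots above come from the fitted model; a sketch of how such plots are produced (the exact calls are not shown):
> plot(fitted(res), resid(res))   # residuals against fitted values
> qqnorm(resid(res))              # normal quantile plot of the residuals
> par(mfrow = c(2, 2))
> plot(res)                       # R's standard lm diagnostics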
Multiple linear regression
Call:
lm(formula = body.fat ~ age + weight + height + chest + abdomen + hip + thigh, data = fat)
Coefficients:
(Intercept) age weight height
-33.27351 0.00986 -0.12846 -0.09557
chest abdomen hip thigh
-0.00150 0.89851 -0.17687 0.27132
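The smaller model shown below could be reached by hand or with helpers such as update() or step(); for instance (a sketch; res.full is a hypothetical name for the fit above):
> res.small = update(res.full, . ~ weight + abdomen + thigh)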
Call:
lm(formula = body.fat ~ weight + abdomen + thigh, data = fat)
Coefficients:
(Intercept) weight abdomen thigh
-48.039 -0.170 0.917 0.209
[Figure: boxplots of Months (roughly 10 to 50) by treatment.]
Analysis of variance models
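The table below is anova() output for a one-way fit of lifetime on treatment; a call along these lines (the data-frame name is a placeholder):
> res.aov = lm(lifetime ~ treatment, data = lifetimes)   # 'lifetimes' is a hypothetical name
> anova(res.aov)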
Response: lifetime
Df Sum Sq Mean Sq F value Pr(>F)
treatment 5 12734 2547 57.1 <2e-16 ***
Residuals 343 15297 45
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Logistic regression models
[Figure: binary responses plotted against Temperature (55 to 80).]
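Logistic regression models are fit with glm() and family = binomial; a sketch matching the temperature figure (the data-frame and variable names are assumptions):
> res.glm = glm(failure ~ Temperature, family = binomial, data = orings)
> summary(res.glm)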
Mixed effects
Mixed-effects models are fit using the nlme package (or its newer
replacement, the lmer() function in the lme4 package, which can also fit
logistic-regression-type models); a small sketch follows this list.
- Need to specify the fixed and random effects.
- Can optionally specify structure beyond independence for the error terms.
- The implementation is well documented in Pinheiro and Bates (2000).
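A sketch of such a fit for the orthodontic growth data used on the following slides (Orthodont ships with the nlme package; the random-effects specification here is just one possibility):
> library(nlme)
> res.lme = lme(distance ~ I(age - 11), random = ~ 1 | Subject,
+               data = Orthodont)
> summary(res.lme)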
Mixed-effects example
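The coefficients below match an ordinary least-squares fit that ignores the grouping by subject (the object appears as res.lm in a later residual plot); presumably something like
> res.lm = lm(distance ~ I(age - 11), data = Orthodont)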
Coefficients:
(Intercept) I(age - 11)
24.02 0.66
[Figure: dotplot of resid(res.lm) (about -5 to 5) by subject (F01-F11, M01-M16).]
[Figures: per-subject interval plots (subjects M04-M16) and lattice panels of Distance from pituitary to pterygomaxillary fissure (mm) against Age (yr), ages 8 to 14.]
[Figures: residuals (mm) plotted for the Male and Female groups, per-subject intervals (F01-F10, M01-M16), and lattice panels of Distance from pituitary to pterygomaxillary fissure (mm) against Age (yr), ages 8 to 14.]
helper functions
plotCircle = function(x, radius = 1, ...) {
  ## draw a circle of the given radius centered at x = c(x, y)
  t = seq(0, 2 * pi, length.out = 100)
  polygon(x[1] + radius * cos(t), x[2] + radius * sin(t), ...)
}
doPlot = function(x, r, R, ...) {
  ## draw the small circle only if it fits entirely inside the big one
  if (sqrt(sum(x^2)) + r < R) plotCircle(x, radius = r, ...)
}
Extras
Key points
Calling function
plotPizza = function(n, R = n, r = 0.2) {
  par(mai = c(0, 0, 0, 0))                    # no margins
  plot.new(); plot.window(xlim = c(-n, n), ylim = c(-n, n), asp = 1)
  plotCircle(c(0, 0), radius = R, lwd = 2)    # the big circle
  ## a grid of candidate centers over -n:n by -n:n
  x = rep(-n:n, rep(2 * n + 1, 2 * n + 1))
  y = rep(-n:n, length.out = (2 * n + 1)^2)
  apply(cbind(x, y), 1, function(x) doPlot(x, r, R, col = gray(0.5)))
}
plotPizza(5)
Extending R with add-on packages
> install.packages("Rcmdr")
This GUI (the R Commander), available for the three main platforms, allows one to
select variables and fill in function arguments with a mouse.
Called without arguments, install.packages() lets one browse the
list of available packages.
Learning more