MDPN460 Lecture05
Engineering Lab
Lecture 5
ANOVA Using R
Today’s Lecture
●
Basic R programming (Continued)
– Logical vectors and relational operators.
– Data frames and lists.
– Data input and output.
●
ANOVA using R
●
Applying ANOVA using R for Example 1 of Lecture 4
Logical Vectors
●
We have used the c() function to build numeric vectors
as well as character vectors.
●
R also supports logical vectors. Their elements take one of
two values:
– TRUE and
– FALSE,
– as well as NA for missing values.
Boolean Algebra
●
The idea of Boolean algebra is to formalize a
mathematical approach to logic.
●
Boolean algebra tells us how to evaluate the truth of
compound statements.
– A ← “sky is clear”
– B ← “it is raining”
– “A and B” is the statement that it is both clear and
raining.
Boolean Algebra – Truth Table
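The truth table for this slide can be regenerated in R. The following is a minimal sketch, assuming the table covers the NOT, AND, and OR operations for two statements A and B; the column names are illustrative choices.

```r
# Enumerate every combination of truth values for statements A and B
tt <- expand.grid(A = c(TRUE, FALSE), B = c(TRUE, FALSE))

# Add one column per Boolean operation
tt$not_A   <- !tt$A
tt$A_and_B <- tt$A & tt$B
tt$A_or_B  <- tt$A | tt$B

print(tt)
```

"A and B" is TRUE in only one of the four rows, while "A or B" is TRUE in three of them.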
Logical Operations in R
●
You can use a Boolean vector to access selected
elements in any vector.
> a <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
> v <- 1:8
> v[a]
[1] 1 4 6 7
●
We can do some arithmetic operations on Boolean
vectors:
> sum(a)
[1] 4
> mean(a)
[1] 0.5
> v + a
[1] 2 2 3 5 5 7 8 8
Boolean Algebra in R
> a
[1] TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE
> b <- sample(rep(c(TRUE, FALSE), 4), size = 8)
> b
[1] TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE
> !a
[1] FALSE TRUE TRUE FALSE TRUE FALSE FALSE TRUE
> a | b
[1] TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE
> a & b
[1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
> !(a | b)
[1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
> !a | !b
[1] FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
> !(a & b)
[1] FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
> xor(a, b)
[1] FALSE TRUE FALSE TRUE FALSE FALSE TRUE TRUE
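The results above illustrate De Morgan's laws: !(a & b) equals !a | !b, and !(a | b) equals !a & !b. The short check below uses the vectors a and b as printed above; b is typed in directly because sample() gives a different result on each run.

```r
a <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
b <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)

# De Morgan's laws hold element by element
law1 <- identical(!(a & b), !a | !b)
law2 <- identical(!(a | b), !a & !b)

# xor(a, b) is TRUE where exactly one of a and b is TRUE
xor_check <- identical(xor(a, b), (a | b) & !(a & b))

print(c(law1, law2, xor_check))
```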
Relational Operators
●
It is often necessary to test relations when
programming. R allows for equality and inequality
relations to be tested using the relational operators:
< , > , == , >= , <= , !=
●
Some simple examples follow.
> x <- sample(1:10, size=5, replace=TRUE)
> x
[1] 4 10 9 8 4
> x == 4
[1] TRUE FALSE FALSE FALSE TRUE
> x != 8
[1] TRUE TRUE TRUE FALSE TRUE
> x / 2 <= 4
[1] TRUE FALSE FALSE TRUE TRUE
> x[x/3 >= 3]
[1] 10 9
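Relational tests can be combined with the logical operators from the previous slides. The sketch below reuses the x values printed above (typed in directly, since sample() is random); which() is a base R function that returns the positions of the TRUE elements.

```r
x <- c(4, 10, 9, 8, 4)

# Elements that are greater than 4 AND at most 9
middle <- x[x > 4 & x <= 9]

# Positions (not values) of the elements equal to 4
pos <- which(x == 4)

# Number of elements different from 8
n_not8 <- sum(x != 8)

print(middle)
print(pos)
print(n_not8)
```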
Try it Yourself!
Data Frames, Tibbles, and Lists
●
Data sets frequently consist of more than one column of
data, where each column represents measurements of a
single variable. Each row usually represents a single
observation.
●
Most data sets are stored in R as data frames or tibbles.
Tibbles are very similar to data frames.
●
Both are like matrices, but with the columns having their
own names.
●
Several data frames come with R.
– Examples are women and mtcars.
Summary Content of Data frames
●
In R, the head() function displays the first few rows of a data frame.
●
Summary statistics and plots of data frames have their own dedicated
functions.
> head(women)
height weight
1 58 115
2 59 117
3 60 120
4 61 123
5 62 126
6 63 129
> summary(women)
     height         weight
 Min.   :58.0   Min.   :115.0
 1st Qu.:61.5   1st Qu.:124.5
 Median :65.0   Median :135.0
 Mean   :65.0   Mean   :136.7
 3rd Qu.:68.5   3rd Qu.:148.0
 Max.   :72.0   Max.   :164.0
Summary Content of Data frames
●
You can also display the structure of a data frame using the str() function:
> str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
Dimensions of Data Frames
●
The number of rows and number of columns can be
determined for any data frame using the following
functions:
> nrow(mtcars)
[1] 32
> ncol(mtcars)
[1] 11
> dim(mtcars)
[1] 32 11
Simple Plots for Data in Data frames
●
Simple plots of the data in a data frame are easy to produce.
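The plotting commands for this slide are not reproduced here. As a sketch of what such commands could look like (the choice of data frames and of these particular plots is an assumption, not necessarily what the slide showed):

```r
# Scatter plot of the two columns of the women data frame
plot(weight ~ height, data = women)

# Calling plot() on a data frame with more than two columns
# draws a scatter plot matrix; three columns keep the figure small
plot(mtcars[, c("mpg", "wt", "hp")])

# Histogram of a single column; hist() also returns the bin counts
h <- hist(mtcars$mpg)
print(h$counts)
```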
Extracting data frame elements and
subsets
●
We can extract elements from data frames using similar
syntax to what was used with matrices. Consider the
following examples:
> mtcars[3, 5]
[1] 3.85
> mtcars[2,]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
> mtcars[,4]
[1] 110 110 93 110 175 105 245 62 95 123 123 180 180 180 205
215 230 66 52 65 97 150
[23] 150 245 175 66 91 113 264 175 335 109
Extracting data frame columns
●
Data frame columns can also be addressed by name
using the $ operator. For example, the weight
column can be extracted as follows:
> mtcars$wt
[1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440
3.440 4.070 3.730 3.780 5.250
[16] 5.424 5.345 2.200 1.615 1.835 2.465 3.520 3.435 3.840 3.845
1.935 2.140 1.513 3.170 2.770
[31] 3.570 2.780
> women$weight
[1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164
> women$height[women$weight > 130]
[1] 64 65 66 67 68 69 70 71 72
Using “with”
●
The with() function allows us to access columns of a data
frame directly without using the $ operator. For example, we can
divide the weights by the heights in the women data
frame using with(women, weight / height).
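A minimal sketch of a with() call for this computation, checked against the equivalent $ form:

```r
# Divide weights by heights without repeating the data frame name
ratio_with <- with(women, weight / height)

# The same computation written with the $ operator
ratio_dollar <- women$weight / women$height

print(identical(ratio_with, ratio_dollar))
print(round(head(ratio_with, 3), 3))
```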
Taking Random Samples
●
The sample() function can be used to take samples (with
or without replacement) from larger finite populations
whose data are stored in data frames.
> s <- sample(1:nrow(mtcars), size=8, replace=FALSE)
> mtcars[s,]
mpg cyl disp hp drat wt qsec vs am gear carb
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Constructing Data Frames
●
Use the data.frame() function to construct data
frames from vectors that already exist in your
workspace:
> x <- 2 ^ seq(1,15)
> y <- seq(1,15) ^ 2
> z <- x > y
> f <- data.frame(x, y, z)
> head(f)
x y z
1 2 1 TRUE
2 4 4 FALSE
3 8 9 FALSE
4 16 16 FALSE
5 32 25 TRUE
6 64 36 TRUE
Non-numeric columns in data frames
●
Columns of data frames can be of different types. For
example, the built-in data frame chickwts has a numeric
column and a factor. Again, the summary() function provides a
quick peek at this data set.
> summary(chickwts)
     weight             feed
 Min.   :108.0   casein   :12
 1st Qu.:204.5   horsebean:10
 Median :258.0   linseed  :12
 Mean   :261.3   meatmeal :11
 3rd Qu.:323.5   soybean  :14
 Max.   :423.0   sunflower:12
> nrow(chickwts)
[1] 71
Lists in R
●
Data frames are actually a special kind of list, or structure.
●
Lists in R can contain any other objects. You won’t often
construct these yourself, but many functions return
complicated results as lists.
●
The list() function is one way of organizing multiple pieces of
output from functions. For example,
> x <- c(2, 4, 6)
> y <- c(8, 9)
> z <- list(x = x, y = y)
> z
$x
[1] 2 4 6
$y
[1] 8 9
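List elements can be extracted by name with $ or [[ ]], in the same way as data frame columns. The sketch below rebuilds the list z from above and also confirms that a data frame is itself a list.

```r
x <- c(2, 4, 6)
y <- c(8, 9)
z <- list(x = x, y = y)

# Two equivalent ways to extract one element by name
print(z$x)
print(z[["y"]])

# Unlike data frame columns, list elements may have different lengths
lens <- lengths(z)
print(lens)

# A data frame is a list of equal-length columns
print(is.list(women))
```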
Working with lists
●
There are several functions which make working with lists
easy. Two of them are lapply() and vapply() .
●
The lapply() function “applies” another function to every
element of a list and returns the results in a new list; for
example,
> z
$x
[1] 2 4 6
$y
[1] 8 9
> lapply(z, mean)
$x
[1] 4
$y
[1] 8.5
Working with lists
●
In a case like this, it might be more convenient to have the
results in a vector; the vapply() function does that. It takes a
third argument to tell R what kind of result to expect from the
function. In this case each result of mean() should be a number,
so we could use vapply(z, mean, 1), where the 1 just serves as an
example of the type of output expected.
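A sketch of a vapply() call of this kind. The third argument is only a template for a single result; numeric(1) is an equivalent and common way of saying "one number".

```r
z <- list(x = c(2, 4, 6), y = c(8, 9))

# Apply mean() to each element and collect the results
# in a named numeric vector instead of a list
m <- vapply(z, mean, 1)
print(m)

# Same call with the template written as numeric(1)
m2 <- vapply(z, mean, numeric(1))
print(identical(m, m2))
```

If the function returned something other than a single number, vapply() would stop with an error rather than silently return a different type.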
Data Input and Output
●
When in an R session, it is possible to read and
write data to files outside of R, for example on
your computer’s hard drive.
●
You can get and set the working directory in
which files are stored.
> getwd()
[1] "/home/tamer"
> setwd("/home/tamer/work")
Error in setwd("/home/tamer/work") : cannot change working
directory
> setwd("/home/tamer/Work")
dump() and source()
●
To write your data to a file in the working directory, use the dump()
function. To read data stored in the working directory, use the
source() function.
●
Calling source("wa.R") reads and executes the commands in wa.R, resulting in the
creation of the wa object in your global environment.
●
If there was an object of the same name there before, it will be
replaced.
●
To save all of the objects that you have created during a session,
type:
> dump(list = objects(), "all.R")
●
This produces a file called all.R on your computer’s hard drive. Using
source("all.R") at a later time will allow you to retrieve all of these
objects.
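A minimal round trip with dump() and source(). The object name wa and the file name wa.R follow the text above; the values stored in wa, and the use of tempdir() so that the sketch does not touch your working directory, are assumptions made for illustration.

```r
# A small object to save (illustrative values)
wa <- c(115, 117, 120)

# Write R code that recreates wa into a file called wa.R
f <- file.path(tempdir(), "wa.R")
dump("wa", file = f)

# Remove wa, then restore it by executing the file
rm(wa)
source(f)
print(wa)
```

The file written by dump() is plain R code, so it can be opened and inspected in any text editor.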
Redirecting R Output
●
By default, R directs the output of most of its functions to
the screen. Output can be directed to a file with the sink()
function.
●
Consider the greenhouse data in solar.radiation . The
command mean(solar.radiation) prints the mean of the
data to the screen. To print this output to a file called
solarmean.txt instead, run
sink("solarmean.txt") # Create a file solarmean.txt for output
mean(solar.radiation) # Write mean value to solarmean.txt
●
All subsequent output will be printed to the file
solarmean.txt until the command
sink()
is run with no arguments, which returns output to the screen.
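A runnable sketch of the same pattern. Because solar.radiation may not be available in your session, the built-in women data frame is used instead, and the output goes to a temporary file; both choices are assumptions for illustration.

```r
out_file <- file.path(tempdir(), "heightmean.txt")

sink(out_file)               # start redirecting output to the file
print(mean(women$height))    # print() makes the output explicit inside scripts
sink()                       # restore output to the screen

# Read the file back to confirm what was written
contents <- readLines(out_file)
print(contents)
```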
The read.table() function
●
Consider the following text data table:
x y z
61 13 4
175 21 18
111 24 14
124 23 18
●
If such a data set is stored in a file called pretend.dat in the
folder /home/tamer/Work, then it can be read into an R data
frame by typing
> pretend.df <- read.table("/home/tamer/Work/pretend.dat", header = TRUE)
> pretend.df
x y z
1 61 13 4
2 175 21 18
3 111 24 14
4 124 23 18
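To try read.table() without having the file on disk, the table can first be written to a temporary file. This is a sketch; the temporary path stands in for the real location of pretend.dat.

```r
# Write the example table to a temporary file
path <- file.path(tempdir(), "pretend.dat")
writeLines(c("x y z",
             "61 13 4",
             "175 21 18",
             "111 24 14",
             "124 23 18"), path)

# header = TRUE tells R that the first line holds the column names
pretend.df <- read.table(path, header = TRUE)
print(pretend.df)
str(pretend.df)
```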
Reading csv files
●
Comma-separated values (csv) files are text files that can be
exported from spreadsheet applications.
●
You can read csv files into R using the function
read.table() with the argument sep = ",".
●
Now, load the “MDPN460-Lecture04.csv” file that
contains the formatted data found in Example 1 in
Lecture 4.
●
First, find the directory in the server’s shared folder
in which the file is stored.
> papfl1 <- read.table("????/MDPN460-Lecture04.csv", header = TRUE, sep = ",")
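The sep = "," option can be tried on any csv file. The sketch below creates one from the built-in women data frame (an assumption for illustration, since the Lecture 4 file is not reproduced here) and reads it back. read.csv() is a base R shortcut whose defaults already include header = TRUE and sep = ",".

```r
# Create a small csv file from a built-in data frame
path <- file.path(tempdir(), "women.csv")
write.csv(women, path, row.names = FALSE)

# Read it back in two equivalent ways
w1 <- read.table(path, header = TRUE, sep = ",")
w2 <- read.csv(path)

print(head(w1, 3))
print(all.equal(w1, w2))
```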
ANOVA using R
> dryingData <- read.table("/home/tamer/Work/MDPN460-Lecture04.csv",
header = TRUE, sep = ",")
> dryingData$Applicator <- as.factor(dryingData$Applicator)
> levels(dryingData$Applicator)
[1] "Brush" "Pad" "Roller"
> boxplot(DryingTime ~ Applicator, data = dryingData)
ANOVA using R
Response: dryingData$DryingTime
                      Df Sum Sq Mean Sq F value  Pr(>F)
dryingData$Applicator  2 108.97  54.483  4.2748 0.03255 *
Residuals             16 203.92  12.745
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
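The command that produced the table above is not shown. Its layout matches what R prints for anova() applied to a fitted lm() model, so a call of the form anova(lm(dryingData$DryingTime ~ dryingData$Applicator)) is a reasonable assumption, though not a confirmed one. Because the csv file is not available here, the runnable sketch below applies the same steps to the built-in chickwts data frame.

```r
# One-way ANOVA: does mean chick weight differ between feed types?
fit <- lm(weight ~ feed, data = chickwts)
tab <- anova(fit)
print(tab)

# Extract the F statistic and p-value for the feed factor
f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]

# The F value is the ratio of the factor and residual mean squares
ratio <- tab[["Mean Sq"]][1] / tab[["Mean Sq"]][2]
print(all.equal(f_value, ratio))
```

The drying-time table reads the same way: 54.483 / 12.745 is approximately 4.27, and the p-value of 0.03255 is below 0.05, so the applicator effect is significant at the 5% level.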
Lab Assignment 4