MDPN460 – Industrial Engineering Lab
Lecture 5

ANOVA Using R
Today’s Lecture

Basic R programming (Continued)
– Logical vectors and relational operators.
– Data frames and lists.
– Data input and output.

ANOVA using R

Applying ANOVA using R for Example 1 of Lecture 4

Logical Vectors

We have used the c() function to put numeric vectors
together as well as character vectors.

R also supports logical vectors. These contain two
different elements:
– TRUE and
– FALSE ,
– as well as NA for missing values.

> l <- c(TRUE, FALSE, FALSE, TRUE, TRUE, NA, TRUE)


> l
[1] TRUE FALSE FALSE TRUE TRUE NA TRUE
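Because NA stands for an unknown value, it affects counting and logical operations. The following is a minimal sketch using the same vector as above; the functions is.na(), which() and the na.rm argument of sum() are standard base R, and the choice of what to demonstrate is ours.

```r
# Same logical vector as on the slide, including one missing value.
l <- c(TRUE, FALSE, FALSE, TRUE, TRUE, NA, TRUE)

is.na(l)                 # TRUE only at the missing position
which(is.na(l))          # index of the missing element
sum(l)                   # NA: the missing value propagates
sum(l, na.rm = TRUE)     # counts the TRUE values, ignoring NA

# With NA, a logical operation returns NA only when the answer
# really depends on the unknown value.
TRUE & NA                # NA
FALSE & NA               # FALSE
TRUE | NA                # TRUE
```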

Boolean Algebra

The idea of Boolean algebra is to formalize a
mathematical approach to logic.

Boolean algebra tells us how to evaluate the truth of
compound statements.
– A ← “sky is clear”
– B ← “it is raining”
– “A and B” is the statement that it is both clear and
raining.

Boolean Algebra – Truth Table
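
A truth table for two statements A and B can be generated in R itself. The sketch below is one possible way to do it, assuming expand.grid() is used to enumerate the combinations; the column names (not_A, A_and_B, and so on) are illustrative only.

```r
# Enumerate all four combinations of two logical statements A and B.
tt <- expand.grid(A = c(TRUE, FALSE), B = c(TRUE, FALSE))

# Add one column per Boolean operation.
tt$not_A   <- !tt$A
tt$A_and_B <- tt$A & tt$B
tt$A_or_B  <- tt$A | tt$B
tt$A_xor_B <- xor(tt$A, tt$B)

print(tt)
```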

Logical Operations in R

You can use a Boolean vector to access selected
elements in any vector.
> a <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
> v <- 1:8
> v[a]
[1] 1 4 6 7


We can do some arithmetic operations on Boolean
vectors:
> sum(a)
[1] 4
> mean(a)
[1] 0.5
> v + a
[1] 2 2 3 5 5 7 8 8
Boolean Algebra in R
> a
[1] TRUE FALSE FALSE TRUE FALSE TRUE TRUE FALSE
> b <- sample(rep(c(TRUE,FALSE),4),size=8)
> b
[1] TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE
> !a
[1] FALSE TRUE TRUE FALSE TRUE FALSE FALSE TRUE
> a | b
[1] TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE
> a & b
[1] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
> !(a | b)
[1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
> !a | !b
[1] FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
> !(a & b)
[1] FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
> xor(a, b)
[1] FALSE TRUE FALSE TRUE FALSE FALSE TRUE TRUE
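
The outputs of !a | !b and !(a & b) above are the same, which is one of De Morgan's laws. A short sketch that checks both laws on these vectors follows; b is entered by hand here (copied from the printed sample) so that the example is reproducible, and the use of identical(), any() and all() is our choice for illustration.

```r
a <- c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
# b copied from the output above instead of being drawn at random.
b <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)

# De Morgan's laws, checked element by element.
identical(!(a & b), !a | !b)   # TRUE
identical(!(a | b), !a & !b)   # TRUE

# any() and all() reduce a logical vector to a single value.
any(a & b)                     # TRUE: at least one position where both hold
all(a | b)                     # FALSE: some positions where neither holds
```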
Relational Operators

It is often necessary to test relations when
programming. R allows for equality and inequality
relations to be tested using the relational operators:
< , > , == , >= , <= , !=

Some simple examples follow.
> x <- sample(1:10, size=5, replace=TRUE)
> x
[1] 4 10 9 8 4
> x == 4
[1] TRUE FALSE FALSE FALSE TRUE
> x != 8
[1] TRUE TRUE TRUE FALSE TRUE
> x / 2 <= 4
[1] TRUE FALSE FALSE TRUE TRUE
> x[x/3 >= 3]
[1] 10 9
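
Relational and logical operators are often combined to select, count or locate elements. The sketch below reuses the values of x printed above (typed in directly, so no random sampling is involved); the particular conditions are arbitrary examples.

```r
# Values copied from the sample shown above.
x <- c(4, 10, 9, 8, 4)

x[x > 4 & x < 10]   # elements strictly between 4 and 10: 9 8
sum(x == 4)         # how many elements equal 4: 2
which(x >= 9)       # positions of elements that are at least 9: 2 3
mean(x > 5)         # proportion of elements greater than 5: 0.6
```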
Try it Yourself!

Data Frames, Tibbles, and Lists

Data sets frequently consist of more than one column of
data, where each column represents measurements of a
single variable. Each row usually represents a single
observation.

Most data sets are stored in R as data frames or tibbles.
Tibbles are very similar to data frames.

Both are like matrices, but with the columns having their
own names.

Several data frames come with R.
– Examples are women and mtcars.

Summary Content of Data frames

In R, we can preview the first rows of a data frame using the head() function.

Summary statistics and plots of data frames are produced by dedicated
functions such as summary() and plot().
> head(women)
  height weight
1     58    115
2     59    117
3     60    120
4     61    123
5     62    126
6     63    129
> summary(women)
     height         weight
 Min.   :58.0   Min.   :115.0
 1st Qu.:61.5   1st Qu.:124.5
 Median :65.0   Median :135.0
 Mean   :65.0   Mean   :136.7
 3rd Qu.:68.5   3rd Qu.:148.0
 Max.   :72.0   Max.   :164.0
Summary Content of Data frames

You can also display the structure of a data frame using the str() function:

> str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...

Dimensions of Data Frames


The number of rows and number of columns can be
determined for any data frame using the following
functions:

> nrow(mtcars)
[1] 32
> ncol(mtcars)
[1] 11
> dim(mtcars)
[1] 32 11

Simple Plots for Data in Data frames

Making simple plots of data in data frames is easy.

> plot(wt ~ hp, data = mtcars)

Extracting data frame elements and subsets

We can extract elements from data frames using similar
syntax to what was used with matrices. Consider the
following examples:
> mtcars[3, 5]
[1] 3.85
> mtcars[2,]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
> mtcars[,4]
[1] 110 110 93 110 175 105 245 62 95 123 123 180 180 180 205
215 230 66 52 65 97 150
[23] 150 245 175 66 91 113 264 175 335 109

Extracting data frame columns


Data frame columns can also be addressed by name with the
$ operator. For example, the weight column can be extracted
as follows:
> mtcars$wt
[1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440
3.440 4.070 3.730 3.780 5.250
[16] 5.424 5.345 2.200 1.615 1.835 2.465 3.520 3.435 3.840 3.845
1.935 2.140 1.513 3.170 2.770
[31] 3.570 2.780
> women$weight
[1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164
> women$height[women$weight > 130]
[1] 64 65 66 67 68 69 70 71 72
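
The same logical indexing works on whole rows of a data frame. The sketch below selects the four-cylinder cars in mtcars with fuel economy above 30 mpg; the thresholds and the chosen columns are arbitrary, and subset() is shown only as an equivalent base R alternative.

```r
# Rows where both conditions hold, and only three of the columns.
fast <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, c("mpg", "hp", "wt")]
print(fast)

# subset() evaluates the condition inside the data frame,
# so the column names can be used without the $ operator.
same <- subset(mtcars, cyl == 4 & mpg > 30, select = c(mpg, hp, wt))
all.equal(fast, same)
```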

Using “with”


The with() function allows us to access the columns of a data
frame directly, without using the $ operator. For example, we can
divide the weights by the heights in the women data
frame using

> with(women, weight/height)


[1] 1.982759 1.983051 2.000000 2.016393 2.032258 2.047619
2.062500 2.076923 2.106061 2.119403
[11] 2.147059 2.173913 2.200000 2.239437 2.277778
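
As a quick check, with() gives the same result as the equivalent $ expression, and it accepts any expression that uses the column names. This is a small sketch; the grouped mean computed with tapply() is our own illustrative example and is not part of the original slide.

```r
# The two forms compute the same ratios.
ratio_with   <- with(women, weight / height)
ratio_dollar <- women$weight / women$height
identical(ratio_with, ratio_dollar)

# Any expression using the column names can be evaluated,
# e.g. the mean mpg for each number of cylinders in mtcars.
mpg_by_cyl <- with(mtcars, tapply(mpg, cyl, mean))
print(mpg_by_cyl)
```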

Taking Random Samples

The sample() function can be used to take samples (with
or without replacement) from larger finite populations
whose data are stored in data frames.
> s <- sample(1:nrow(mtcars), size=8, replace=FALSE)
> mtcars[s,]
mpg cyl disp hp drat wt qsec vs am gear carb
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
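
Each call to sample() normally returns a different result. If a sample must be reproducible (for a lab report, say), the random number generator can be seeded first with set.seed(). The sketch below assumes an arbitrary seed value of 460, which has no special meaning.

```r
# Seeding the generator makes the "random" rows repeatable.
set.seed(460)                       # arbitrary seed, chosen for illustration
s1 <- sample(nrow(mtcars), size = 8)

set.seed(460)                       # same seed again
s2 <- sample(nrow(mtcars), size = 8)

identical(s1, s2)                   # TRUE: the same rows are drawn
sample_df <- mtcars[s1, ]           # the sampled rows as a data frame
dim(sample_df)
```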

Constructing Data Frames

Use the data.frame() function to construct data
frames from vectors that already exist in your
workspace:
> x <- 2 ^ seq(1,15)
> y <- seq(1,15) ^ 2
> z <- x > y
> f <- data.frame(x, y, z)
> head(f)
x y z
1 2 1 TRUE
2 4 4 FALSE
3 8 9 FALSE
4 16 16 FALSE
5 32 25 TRUE
6 64 36 TRUE
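
Columns can also be named when the data frame is created, and new columns can be added afterwards. The following is a sketch of the same construction with explicit names; the names n, power, square and power_exceeds_square are invented for this example.

```r
n <- 1:15

# Name the columns explicitly instead of relying on the variable names.
f <- data.frame(n = n, power = 2^n, square = n^2)

# Add a derived logical column after construction.
f$power_exceeds_square <- f$power > f$square

# Rows where 2^n does not exceed n^2 (n = 2, 3, 4).
f[!f$power_exceeds_square, ]
str(f)
```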
Non-numeric columns in data frames

Columns of data frames can be of different types. For
example, the built-in data frame chickwts has a numeric
column and a factor. Again, the summary() function provides a
quick peek at this data set.

> summary(chickwts)
     weight             feed
 Min.   :108.0   casein   :12
 1st Qu.:204.5   horsebean:10
 Median :258.0   linseed  :12
 Mean   :261.3   meatmeal :11
 3rd Qu.:323.5   soybean  :14
 Max.   :423.0   sunflower:12

> nrow(chickwts)
[1] 71
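
A factor stores a categorical variable as a fixed set of levels, which is what allows summary() to count the observations per feed type. The sketch below inspects the feed column with standard base R functions; computing the group means with tapply() is our own addition for illustration.

```r
is.factor(chickwts$feed)     # TRUE: feed is a factor
levels(chickwts$feed)        # the six feed types
nlevels(chickwts$feed)       # number of levels

# Observations per level (matches the counts in summary()).
counts <- table(chickwts$feed)
print(counts)

# Mean chick weight for each feed type.
tapply(chickwts$weight, chickwts$feed, mean)
```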
Lists in R

Data frames are actually a special kind of list, or structure.

Lists in R can contain any other objects. You won’t often
construct these yourself, but many functions return
complicated results as lists.

The list() function is one way of organizing multiple pieces of
output from functions. For example,
> x <- c(2, 4, 6)
> y <- c(8, 9)
> z <- list(x = x, y = y)
> z
$x
[1] 2 4 6

$y
[1] 8 9
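
List elements can be extracted in several ways, and the difference between [[ ]] and [ ] matters in practice. The sketch below uses the same list z; the added element total is a made-up name used only to show assignment.

```r
x <- c(2, 4, 6)
y <- c(8, 9)
z <- list(x = x, y = y)

names(z)        # "x" "y"
length(z)       # 2 elements
z$x             # the vector stored under the name x
z[["y"]]        # same idea, using double brackets
z["x"]          # single brackets return a list of length 1

# New elements can be added by assignment.
z$total <- sum(z$x) + sum(z$y)
z$total         # 29

# A data frame is itself a list of columns.
is.list(women)  # TRUE
```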
Working with lists

There are several functions which make working with lists
easy. Two of them are lapply() and vapply() .

The lapply() function “applies” another function to every
element of a list and returns the results in a new list; for
example,
> z
$x
[1] 2 4 6
$y
[1] 8 9
> lapply(z, mean)
$x
[1] 4
$y
[1] 8.5
Working with lists

In a case like this, it might be more convenient to have the
results in a vector; the vapply() function does that. It takes a
third argument to tell R what kind of result to expect from the
function. In this case each result of mean should be a single number,
so we could use the call below, where the 1 just serves as an example
of the type of output expected.

> vapply(z, mean, 1)


x y
4.0 8.5
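
The third argument of vapply() is a template: its type and length are checked against every result. A brief sketch follows, assuming numeric(1) and numeric(2) as templates; the tryCatch() wrapper and its message string are ours, added only so that the deliberate error does not stop the script.

```r
z <- list(x = c(2, 4, 6), y = c(8, 9))

# numeric(1) states the expectation more explicitly than a bare 1.
means <- vapply(z, mean, numeric(1))
print(means)

# A function returning two numbers per element needs a length-2 template;
# the result is then a matrix with one column per list element.
ranges <- vapply(z, range, numeric(2))
print(ranges)

# If the template does not match the actual result, vapply() stops with an error.
res <- tryCatch(vapply(z, range, numeric(1)),
                error = function(e) "length mismatch")
print(res)
```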

Data Input and Output

When in an R session, it is possible to read and
write data to files outside of R, for example on
your computer’s hard drive.

You can get and set the working directory in
which files are stored.

> getwd()
[1] "/home/tamer"

> setwd("/home/tamer/work")
Error in setwd("/home/tamer/work") : cannot change working
directory
> setwd("/home/tamer/Work")
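
The error above occurs because the directory name does not exist exactly as typed (the name is case-sensitive on this system). One way to avoid it is to test for the directory first. The sketch below uses tempdir() as a stand-in for a real folder, which is an assumption made so the example runs anywhere.

```r
old_wd <- getwd()        # remember where we started
target <- tempdir()      # stand-in directory, assumed for illustration

# Only change directory if the target really exists.
if (dir.exists(target)) {
  setwd(target)
} else {
  message("Directory not found: ", target)
}

list.files()             # files in the new working directory
setwd(old_wd)            # return to the original directory
```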
dump() and source()

To write R objects to a file in the current working directory, use the
dump() function. To read them back from that file, use the
source() function.

> w <- women


> dim(w)
[1] 15 2
> age <- sample(40:70, size=nrow(w), replace=TRUE)
> wa <- data.frame(w, age)
> head(wa)
height weight age
1 58 115 45
2 59 117 45
3 60 120 63
4 61 123 42
5 62 126 69
6 63 129 44
> dump("wa", "wa.R")
dump() and source()

To retrieve the data frame in a future session, type
> source("wa.R")


This reads and executes the command in wa.R, resulting in the
creation of the wa object in your global environment.

If there was an object of the same name there before, it will be
replaced.

To save all of the objects that you have created during a session,
type:
> dump(list = objects(), "all.R")


This produces a file called all.R on your computer’s hard drive. Using
source("all.R") at a later time will allow you to retrieve all of these
objects.
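
A complete round trip can be sketched as follows. To keep the example self-contained and repeatable, it assumes a temporary file instead of wa.R, a fixed age column instead of a random one, and a separate environment (here called restored) so that nothing in the workspace is overwritten.

```r
# Build a small data frame like wa, with fixed (made-up) ages.
age <- rep(c(45L, 63L, 52L), length.out = nrow(women))
wa  <- data.frame(women, age = age)

# Write the R code that recreates wa to a temporary file.
dump_file <- tempfile(fileext = ".R")
dump("wa", file = dump_file)

# Read it back into its own environment rather than the workspace.
restored <- new.env()
source(dump_file, local = restored)

all.equal(restored$wa, wa)   # TRUE if the round trip preserved the data
```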
Redirecting R Output

By default, R directs the output of most of its functions to
the screen. Output can be directed to a file with the sink()
function.

Consider the greenhouse data in solar.radiation . The
command mean(solar.radiation) prints the mean of the
data to the screen. To print this output to a file called
solarmean.txt instead, run
sink("solarmean.txt") # Create a file solarmean.txt for output
mean(solar.radiation) # Write mean value to solarmean.txt

All subsequent output will be printed to the file
solarmean.txt until sink() is called with no arguments,
which returns output to the screen:
sink()
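
The whole sequence can be tried safely with a temporary file. In this sketch the vector radiation is a made-up stand-in for the solar.radiation data, and print() is called explicitly so the example also works when run as a script.

```r
# Hypothetical measurements standing in for solar.radiation.
radiation <- c(10, 12, 8, 10)

out_file <- tempfile(fileext = ".txt")

sink(out_file)            # start redirecting output to the file
print(mean(radiation))    # written to the file, not the screen
sink()                    # stop redirecting; output returns to the screen

readLines(out_file)       # "[1] 10"
```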
The read.table() function

Consider the following text data table:
x y z
61 13 4
175 21 18
111 24 14
124 23 18


If such a data set is stored in a file called pretend.dat in the
/home/tamer/Work folder, then it can be read into an R data frame
by typing
> pretend.df <- read.table("/home/tamer/Work/pretend.dat", header = TRUE)
> pretend.df
    x  y  z
1  61 13  4
2 175 21 18
3 111 24 14
4 124 23 18
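
To experiment with read.table() without creating a file, the same table can be supplied through its text argument. This is a small sketch; the variable name pretend_text is ours.

```r
# The table from the slide, held in a character string.
pretend_text <- "x y z
61 13 4
175 21 18
111 24 14
124 23 18"

# header = TRUE tells R that the first line holds the column names.
pretend.df <- read.table(text = pretend_text, header = TRUE)

str(pretend.df)        # three integer columns, four rows
colMeans(pretend.df)   # mean of each column
```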
Reading csv files

Comma separated files (csv) are text files that can be
obtained from spreadsheet applications.

You can read csv files into R using the function
read.table() with the argument sep = ",".

Now, load the “MDPN460-Lecture04.csv” file that
contains the formatted data found in Example 1 in
Lecture 4.

First, find the directory in which the file is stored in
the server’s shared folder.
> papfl1 <- read.table("????/MDPN460-Lecture04.csv", header = TRUE, sep = ",")
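
Base R also provides read.csv(), which is read.table() with defaults suited to csv files. The sketch below writes a tiny made-up csv file to a temporary location and reads it both ways; the data values are invented and are not those of Example 1.

```r
# Made-up data, written to a temporary csv file for illustration.
example <- data.frame(Applicator = c("Brush", "Pad", "Roller", "Brush"),
                      DryingTime = c(40.1, 38.2, 41.5, 39.7))
csv_file <- tempfile(fileext = ".csv")
write.csv(example, csv_file, row.names = FALSE)

# Two ways of reading the same file.
via_table <- read.table(csv_file, header = TRUE, sep = ",")
via_csv   <- read.csv(csv_file)

all.equal(via_table, via_csv)   # TRUE: both give the same data frame
```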

ANOVA using R
> dryingData <- read.table("/home/tamer/Work/MDPN460-Lecture04.csv",
header = TRUE, sep = ",")
> dryingData$Applicator <- as.factor(dryingData$Applicator)
> levels(dryingData$Applicator)
[1] "Brush" "Pad" "Roller"
> boxplot(DryingTime ~ Applicator, data = dryingData)
ANOVA using R

> anova(lm(dryingData$DryingTime ~ dryingData$Applicator, dryingData))


Analysis of Variance Table

Response: dryingData$DryingTime
                      Df Sum Sq Mean Sq F value  Pr(>F)
dryingData$Applicator  2 108.97  54.483  4.2748 0.03255 *
Residuals             16 203.92  12.745
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
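
The same steps can be tried without the course file. The sketch below runs a one-way ANOVA on the built-in chickwts data (weight by feed type) instead of the drying-time data; it is an illustration of the technique under that substitution, and the 0.05 significance level is an assumption.

```r
# One-way ANOVA: does mean chick weight differ between feed types?
fit <- lm(weight ~ feed, data = chickwts)
tab <- anova(fit)
print(tab)

# Extract the p-value of the feed effect from the table.
p_value <- tab[["Pr(>F)"]][1]
p_value < 0.05            # TRUE means H0 (equal means) is rejected at the 5% level

# aov() fits the same model and reports the same F statistic.
f_aov <- summary(aov(weight ~ feed, data = chickwts))[[1]][["F value"]][1]
all.equal(tab[["F value"]][1], f_aov)
```

Using the formula with bare column names and the data argument keeps the row labels in the table short (feed rather than chickwts$feed).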
Lab Assignment 4

Use the results of last week’s paper airplane flight
experiments to test the hypothesis that the two teams
achieved the same mean flight length.

To be done this Thursday.
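
With only two groups, a one-way ANOVA is equivalent to a two-sample t-test with equal variances: the F statistic equals the square of the t statistic and the p-values agree. The sketch below shows this on invented numbers; the data frame flights, its columns team and distance, and all values are hypothetical and are not the class measurements.

```r
# Hypothetical flight distances for two teams (made-up values).
flights <- data.frame(
  team     = factor(rep(c("A", "B"), each = 5)),
  distance = c(5.2, 6.1, 4.8, 5.9, 5.5, 6.4, 6.0, 7.1, 6.6, 5.8)
)

# One-way ANOVA with two levels.
tab <- anova(lm(distance ~ team, data = flights))
print(tab)

# Equal-variance two-sample t-test on the same data.
tt <- t.test(distance ~ team, data = flights, var.equal = TRUE)

F_value <- tab[["F value"]][1]
t_value <- unname(tt$statistic)

all.equal(F_value, t_value^2)             # F equals t squared
all.equal(tab[["Pr(>F)"]][1], tt$p.value) # identical p-values
```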
