R Statistical Package

The document provides a comprehensive overview of the R Statistical Package, detailing its capabilities for statistical analysis and graphics. It includes essential commands, data modes and types, operators, and methods for creating vectors, matrices, lists, and data frames, as well as importing data and performing descriptive statistics. Additionally, it covers how to install and use R packages, assign values to variables, and utilize various functions for data manipulation and analysis.

Uploaded by

bikilaanole
R Statistical Package

Dec 2024
Wolaita Sodo,
Ethiopia
Outline
Introduction
Important Commands in R
Data Modes & Types in R
Operators in R
Creating Vectors, Matrices, Lists & Data Frames in R
Importing Data into R
Descriptive Statistics & Graphics in R
Statistical Models in R
Introduction
R is a powerful computer program for performing statistical analysis and graphics.
It is free and easy to learn.
R is both a programming language and a platform for object-oriented statistical computing.
R has an excellent built-in help system.
R has excellent graphing capabilities.
R was initially written by Ross Ihaka and Robert Gentleman at Dep. of Statistics of
It is "open-source" software (which for our purposes means that it can be freely downloaded).
Download R at http://cran.r-project.org/
It is available for a number of different operating systems, including Windows, Linux, and Macintosh.
By itself it is fairly powerful, and it is extensible (meaning that procedures for analyzing data
Getting Started
Once you have installed R, there will be an
icon on your desktop. Double click it and R
will start up.
Type 'q()' to quit R.
R does have a few pull-down menus, but
mostly commands in R are entered on the
command line (>).
The > is a prompt symbol displayed by R, not
typed by you. This is R’s way of telling you it’s
ready for you to type a command
To see the list of available datasets, call the data function with no arguments:
> data()
Installing and using R packages
Installing R packages:
install.packages("pkg_name") # To install a package from CRAN
installed.packages() # To view the list of installed packages
library(pkg_name) # To load and use an R package
search() # To view loaded R packages
detach("package:pkg_name", unload = TRUE) # To unload an R package
remove.packages("pkg_name") # To remove installed packages
update.packages() # To update installed packages
Some Important Commands
>sort() :- Sorts a vector of values in ascending order (use decreasing=TRUE for descending order).
rev(x): reverses the elements of x
rev(sort(x)): sorts the elements of x in decreasing order
>rank() :- Returns the ranks of the values in a vector.
>rep() :- Repeats the same value several times, e.g., rep(pi,12)
>seq() :- Generates regular sequences of values, e.g., seq(from=5,to=30,by=5)
>print() :- Prints an object using a different format than simply typing its name, e.g., print(pi,digits=20)
>rm() :- Removes (i.e., deletes) an object
>ls() :- Lists all existing objects
round(x, n): rounds the elements of x to n decimals
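A minimal sketch of these commands on a small vector. The vector x and the result names are made up for illustration only.

```r
# Illustrative vector (values chosen only for this sketch)
x <- c(7, 2, 9, 4)

sorted_up   <- sort(x)                        # ascending: 2 4 7 9
sorted_down <- sort(x, decreasing = TRUE)     # descending: 9 7 4 2
reversed    <- rev(x)                         # 4 9 2 7
ranks       <- rank(x)                        # 3 1 4 2
repeated    <- rep(x[1], 3)                   # 7 7 7
steps       <- seq(from = 5, to = 30, by = 5) # 5 10 15 20 25 30
rounded     <- round(pi, 2)                   # 3.14
```

Note that rev(sort(x)) gives the same result as sort(x, decreasing = TRUE).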
Cont’d
>length():- Returns the number of values in an R
object
>mean():- Returns the arithmetic average of a vector
>median():- Returns the median of a vector
>max():- Returns the maximum value of a vector
>min():- Returns the minimum value of a vector
>range():- Returns the range of a vector, i.e.,
return the minimum and maximum values
>var():- Returns the variance of a vector
(computed with n-1 as a denominator)
>sd():- Returns the standard deviation of the
vector, i.e., the square root of the variance
>cor():- Returns the correlation coefficient
between two vectors
>summary():- Gives several descriptive statistics of an
object
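The descriptive functions above can be tried on a short vector. This is only a sketch: the vectors v and w are invented, and the "by hand" variance is included to show the n-1 denominator.

```r
# Illustrative data (not from the slides)
v <- c(2, 4, 4, 4, 5, 5, 7, 9)

n   <- length(v)   # 8
m   <- mean(v)     # 5
med <- median(v)   # 4.5
r   <- range(v)    # 2 9
s2  <- var(v)      # sample variance
s   <- sd(v)       # square root of the variance

# var() computed by hand, with n - 1 as the denominator
s2_manual <- sum((v - m)^2) / (n - 1)

# a second vector that is an exact linear function of v
w <- 2 * v + 1
r_vw <- cor(v, w)  # perfectly correlated

summary(v)         # minimum, quartiles, mean, maximum
```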
Cont’d
>sqrt():- Returns the square root of values of a vector
>log():- Returns the natural logarithm of values of a
vector
>log10():- Returns the base-10 logarithm of values of
a vector
>exp():- Returns the exponential of values of a vector
>abs():- Returns the absolute values of a vector
>sin():- Returns the sine of values of a vector (in radians)
>cos():- Returns the cosine of values of a vector (in radians)
>tan():- Returns the tangent of values of a vector (in radians)
Finding help (e.g., on functions)
You will also find several help resources in the "Help" menu of the main R window.
Cont’d
For help on a specific function or command, use the function help(), which opens a help window, e.g.,
>help(mean) or ?mean
>help(var) or ?var
>help(t.test) or ?t.test, etc.
This gives you all the information, with examples, references, etc., on how to use the functions mean(), var(), etc.
Data Modes
Logical - Binary data mode, with values represented as TRUE or FALSE (abbreviated T or F).
Numeric - Numeric data mode; includes integer and double-precision representations of numeric values.
Complex - Complex numeric values (real and imaginary parts).
Character - Character values represented
Data Types
Vector : A set of elements in a specified
order.
Matrix : is a two-dimensional array of
elements of the same mode.
Factor : is a vector of categorical data.
Data frame : is a two-dimensional array
whose columns may represent data of
different modes.
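A small sketch of how the modes and types above can be inspected with mode() and the is.*() functions. All object names here are invented for illustration.

```r
# One value of each data mode
flag <- TRUE
num  <- 3.5
cplx <- 2 + 3i
txt  <- "abc"
modes <- c(mode(flag), mode(num), mode(cplx), mode(txt))
# "logical" "numeric" "complex" "character"

# One object of each data type
vec <- c(1, 2, 3)
mat <- matrix(1:6, nrow = 2)
fac <- factor(c("low", "high", "low"))
df  <- data.frame(id = 1:3, grp = fac)

types <- c(is.vector(vec), is.matrix(mat), is.factor(fac), is.data.frame(df))

# columns of a data frame may have different classes
col_classes <- sapply(df, class)
```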
R as a calculator
Arithmetic: R can function as a calculator for scalar arithmetic, performing addition +, subtraction -, multiplication *, division /, exponentiation ^, modulus %%, and integer division %/%. Parentheses () specify the order of operations.
Example
> (17*0.35)^(1/3)
> log(10)
> exp(1)
> 3^-1
> 2+5
> (3+5/78)^3*7
[1] 201.3761
> 89%%13 # modulus
[1] 11
> 89%/%13 # integer division
[1] 6
Assigning Values to variables
Values are assigned to variables using '<-' or '=':
> x<-12.6
> x
[1] 12.6
Variables can also hold many values (vectors), created e.g. with the concatenate function c():
> y<-c(3,7,9,11)
> y
[1] 3 7 9 11
Assigning Values to variables
The operator ':' means "a series of integers between":
> x<-1:6
> x
[1] 1 2 3 4 5 6
Object names cannot contain 'strange' symbols like !, +, -, #.
A dot (.) and an underscore (_) are allowed; a name may also start with a dot.
Object names can contain a number but cannot start with a number.
R is case sensitive: X and x are two different objects, as are temp and temP.
> x = sin(9)/75
> y = log(x) + x^2
> x
> y
> m <- matrix(c(1,2,4,1), ncol=2)
> m
> solve(m)
To list the objects that you have in your current R session, use the function ls or the function objects.
> ls()
[1] "m" "x" "y"
Operators in R
I. Arithmetic Operators
* : Multiply
+ : Add
- : Subtract
/ : Divide
^ : Exponentiation
%% : Modulus
II. Comparison Operators

!= Not Equal To
< Less Than
<= Less Than or Equal to
== Equal
> Greater Than
>= Greater Than or Equal to
III. Logical Operators
! : Not
| : Or (for calculating vectors and arrays of logicals)
|| : Sequential Or (for evaluating conditionals)
& : And (for calculating vectors and arrays of logicals)
&& : Sequential And (for evaluating conditionals)
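The difference between the element-wise operators (&, |) and the sequential ones (&&, ||) can be seen in a short sketch. The vectors a and b and the value x are illustrative only.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

elementwise_and <- a & b   # TRUE FALSE FALSE
elementwise_or  <- a | b   # TRUE TRUE TRUE
negated         <- !a      # FALSE TRUE FALSE

x <- 5
# && and || work on single values and stop as soon as the answer is known
in_range <- (x > 0) && (x < 10)                   # TRUE
safe     <- is.null(NULL) || stop("not reached")  # TRUE, stop() is never evaluated

# comparison operators return logical vectors
is_big <- c(1, 5, 10) >= 5   # FALSE TRUE TRUE
```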


Creating Vectors
Vectors in the mathematical sense are one-dimensional arrays.
Let's create two small vectors with data and a scatter plot.
>z1 <- c(1,2,3,4,5,6)
>z2 <- c(6,8,3,5,7,1)
>plot(z1,z2)
>title("My first scatterplot")
>a <- c(1,2,5.3,6,-2,4) # numeric vector
>b <- c("one","two","three") # character vector
>c <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) #logical vector
Refer to elements of a vector using subscripts.
>a[c(2,4)] # 2nd and 4th elements of vector
Alternatively create a vector as follows:
>d=1:12 #making vector
>e=seq(10,20,0.1) # making vector
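A short sketch of vector subscripts, reusing the numeric vector a from the slide. The negative and logical subscripts are additions for illustration.

```r
a <- c(1, 2, 5.3, 6, -2, 4)

second_and_fourth <- a[c(2, 4)]  # 2 6
without_first     <- a[-1]       # drops the 1st element
positives         <- a[a > 0]    # logical subscript: 1 2 5.3 6 4

d <- 1:12                # integers 1 to 12
e <- seq(10, 20, 0.1)    # 10.0, 10.1, ..., 20.0
```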
MATRIX
Matrices are two-dimensional arrays; arrays with more dimensions are also possible.
All columns in a matrix must have the same mode (numeric, character, etc.) and the same length.
The general format is:
>mymatrix <- matrix(vector, nrow=r, ncol=c, byrow=FALSE,
dimnames=list(char_vector_rownames, char_vector_colnames))
byrow=TRUE indicates that the matrix should be filled by rows.
byrow=FALSE indicates that the matrix should be filled by columns (the default).
dimnames provides optional labels for the rows and columns.
Creating Matrix
An array is simply a vector with an associated dimension attribute, which gives its shape.
Arrays are similar to matrices but can have more than two dimensions. See help(array) for details.
Example
Use the following vector to create a matrix:
>cm <- c(35,14,11,1,4,11,3,0,12,9,38,4,2,5,12,2)
>cm
>dim(cm) <- c(4, 4)
>cm
Cont’d
     [,1] [,2] [,3] [,4]
[1,]   35    4   12    2
[2,]   14   11    9    5
[3,]   11    3   38   12
[4,]    1    0    4    2
> dim(cm)
[1] 4 4
Cont’d
>m <- matrix(1:15, 5, 3, byrow=TRUE) # making a 5 x 3 matrix, filled by rows
# generates 5 x 4 numeric matrix
>y<-matrix(1:20, nrow=5,ncol=4)
# another example
>cells <- c(1,26,24,68)
>rnames <- c("R1", "R2")
>cnames <- c("C1", "C2")
>mymatrix <- matrix(cells, nrow=2, ncol=2,
byrow=TRUE, dimnames=list(rnames, cnames))
#Identify rows, columns or elements using subscripts.
>y[,4] # 4th column of matrix
>y[3,] # 3rd row of matrix
>y[2:4,1:3] # rows 2,3,4 of columns 1,2,3
>y[2,3] # entry in row 2 and column 3
cbind and rbind can also be used to create matrices:
> x1 = 1:3
> x2 = c(7,6,6)
> x3 = c(12,19,21)
> A = cbind(x1,x2,x3) # bind vectors x1, x2, and x3 into a matrix, treating each as a column
> A = rbind(x1,x2,x3) # bind vectors x1, x2, and x3 into a matrix, treating each as a row
Other matrix commands are:
> dim(A) # get the dimensions of a matrix
> nrow(A) # number of rows
> ncol(A) # number of columns
> apply(A,1,sum) # apply the sum function to the rows of A
> apply(A,2,sum) # apply the sum function to the columns of A
> sum(diag(A)) # trace of A
> A = diag(1:3)
> solve(A) # inverse of A
> det(A) # determinant of A
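The matrix commands can be checked on the vectors x1, x2 and x3 from the slide. This is a sketch: the names B, A_inv and check are introduced here, and the inverse is verified by multiplying it back with %*%.

```r
x1 <- 1:3
x2 <- c(7, 6, 6)
x3 <- c(12, 19, 21)

A <- cbind(x1, x2, x3)   # 3 x 3 matrix, vectors as columns
B <- rbind(x1, x2, x3)   # 3 x 3 matrix, vectors as rows (the transpose of A)

row_sums <- apply(A, 1, sum)   # 20 27 30
col_sums <- apply(A, 2, sum)   # 6 19 52
trace_A  <- sum(diag(A))       # 28
det_A    <- det(A)             # non-zero, so A is invertible

A_inv <- solve(A)              # inverse of A
check <- A %*% A_inv           # matrix product, should be the identity
```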
Data frames
A data frame is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.).
>d <- c(1,2,3,4)
>e <- c("red", "white", "red", NA)
>f <- c(TRUE,TRUE,TRUE,FALSE)
>mydata <- data.frame(d,e,f)
>names(mydata) <- c("ID","Color","Passed") # variable names
There are a variety of ways to identify the elements of a data frame.
>mydata[2:3] # columns 2 and 3 of data frame
>mydata[c("ID","Color")] # columns ID and Color from data frame
>mydata$Passed # variable Passed in the data frame
Cont’d
Example
# create a data frame
>age <- c(25, 30, 56)
>gender <- c("male", "female", "male")
>weight <- c(160, 110, 220)
>mydata <- data.frame(age,gender,weight)
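A brief sketch of working with this data frame once it exists. The selections shown (by name, by position, by condition) are illustrative choices.

```r
age    <- c(25, 30, 56)
gender <- c("male", "female", "male")
weight <- c(160, 110, 220)
mydata <- data.frame(age, gender, weight)

dims         <- dim(mydata)                  # 3 rows, 3 columns
col_by_name  <- mydata$weight                # one column as a vector
col_by_index <- mydata[[3]]                  # the same column by position
sub_cols     <- mydata[c("age", "weight")]   # a smaller data frame
males        <- mydata[mydata$gender == "male", ]  # rows meeting a condition
mean_male_weight <- mean(males$weight)       # (160 + 220) / 2
```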
Creating Spreadsheet to input data from keyboard
# enter data using editor
>mydata <- data.frame(age=numeric(0), gender=
character(0), weight=numeric(0))
>mydata <- edit(mydata)
# note that without the assignment in the line above,
# the edits are not saved!
Data frame

Create the following data frame:

Use attach to make the variables accessible by name:
> attach(worms)
Use names to get a list of variable names:
> names(worms)
Data frame
Selecting Parts of a Data frame: Subscripts
Subscripts within square brackets select part of a data frame: [, means "all the rows" and ,] means "all the columns".
To select the first three columns of the data frame worms:
> worms[,1:3]
                Area Slope Vegetation
Nashs.Field      3.6    11  Grassland
Silwood.Bottom   5.1     2     Arable
Nursery.Field    2.8     3  Grassland
Rush.Meadow      2.4     5     Meadow
Gunness.Thicket  3.8     0      Scrub
(…)
Lists
Creating Lists
Lists can be created using the list function. Like data frames, they can incorporate a mixture of modes, and each component can be of a different length or size.
For example, here is how we might create a list from scratch.
> L1 <- list(x = sample(1:5, 20, replace=TRUE), y = rep(letters[1:5], 4), z = rpois(20, 1))
> L1
> L1[1] # indexing: a list holding the first component
Working with Lists
The length of a list is equal to the number of components in that list.
>length(L1)
To determine the names assigned to a list, the names function can be used. Names of lists can also be altered in a similar way to that shown for data frames.
> names(L1) <- c("Item1","Item2","Item3")
Two lists can be joined using the concatenation function c():
> L2 <- list(x=c(1,5,6,7), y=c("apple","orange","melon","grapes"))
> c(L1,L2)
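A small sketch of list indexing and joining. The lists L and extra are made up here; the point is the difference between [ ], which returns a list, and [[ ]] or $, which return the component itself.

```r
L <- list(x = c(1, 5, 6, 7), y = c("apple", "orange"), z = TRUE)

sub_list  <- L[1]        # still a list, of length 1
component <- L[["x"]]    # the numeric vector itself
same      <- L$x         # same as L[["x"]]

extra  <- list(w = 1:3)
joined <- c(L, extra)    # concatenation gives a list with 4 components
```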
List vs Vector vs Data frame

list: an ordered collection of data of arbitrary types.
vector: an ordered collection of data of the same type.
data frame: represents the typical data table that researchers come up with, like a spreadsheet. It is a rectangular table with rows and columns; data within each column has the same type (e.g. number, text, logical), but different columns may have different types.
Subsetting:
Individual elements of a vector, matrix, array or data frame are accessed with "[ ]" by specifying their index or their name.
Useful Functions
>length(object) # number of elements or components
>names(object) # names
>c(object,object,...) # combine objects into a vector
>cbind(object, object, ...) # combine objects as columns
>rbind(object, object, ...) # combine objects as rows
>ls() # list current objects
>rm(object) # delete an object
>newobject <- edit(object) # edit copy and save a new object
>fix(object) # edit in place
Data Import
From the keyboard, one by one
c( )
weight <- c(160, 110, 220)
From a file
read.table(); read.csv(); read.dta(); read.spss(); …
# To read data saved as tab-delimited text on local disk D
# (folder name = Rtraining, file name = datatry, exported from Excel)
dat1<-read.table("D:/Rtraining/datatry.txt", header=TRUE)
dat1
attach(dat1)
# To import Excel data saved as comma-separated values (csv)
getwd() # to get the working directory
dat2<-read.csv("D:/Rtraining/mastmo.csv") # saved on local disk D, folder = Rtraining, file name = mastmo, 54 African data
dat2
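The paths above exist only on the presenter's machine. As a self-contained sketch, the round trip below writes a small, made-up data frame to a temporary csv file and reads it back with read.csv.

```r
# a temporary file stands in for a real path such as "D:/Rtraining/..."
tmp <- tempfile(fileext = ".csv")

original <- data.frame(id = 1:3, weight = c(160, 110, 220))
write.csv(original, tmp, row.names = FALSE)  # save as comma-separated values

dat <- read.csv(tmp)   # header = TRUE is the default for read.csv
str(dat)               # check column names and types after import

unlink(tmp)            # remove the temporary file
```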
Data Import…
# How to import data from SPSS into R
library(foreign) # first load the package foreign
# We need to know the current working directory
getwd()
dat3<-read.spss("D:/Rtraining/employee.sav", to.data.frame=TRUE) # saved on local disk D, folder = Rtraining, file name = employee
dat3
attach(dat3) # to make the variables accessible by name
By a spreadsheet
data.entry()
edit()
Value Labels
You can use the factor function to create your own value labels.
# variable v1 is coded 1, 2 or 3
# we want to attach value labels 1=red, 2=blue, 3=green
>mydata$v1 <- factor(mydata$v1,
levels = c(1,2,3),
labels = c("red", "blue", "green"))
# mydata$sex <- factor(mydata$sex, levels = c(1,2), labels = c("male", "female"))
# variable y is coded 1, 3 or 5
# we want to attach value labels 1=Low, 3=Medium, 5=High
>mydata$y <- ordered(mydata$y, levels = c(1,3,5),
labels = c("Low", "Medium", "High"))
Note: factor and ordered are used the same way, with the same arguments. The former creates factors and the latter creates ordered factors.
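A minimal sketch of value labels on two small coded vectors. The codes in v1 and y are invented; the labels follow the slide.

```r
v1 <- c(1, 3, 2, 1)   # coded 1, 2 or 3
colour <- factor(v1, levels = c(1, 2, 3),
                 labels = c("red", "blue", "green"))

y <- c(5, 1, 3)       # coded 1, 3 or 5
size <- ordered(y, levels = c(1, 3, 5),
                labels = c("Low", "Medium", "High"))

colour_text <- as.character(colour)   # "red" "green" "blue" "red"
counts      <- table(colour)          # frequency of each label
high_gt_low <- size[1] > size[2]      # comparisons work for ordered factors
```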
Creating new variables
Use the assignment operator <- to create new variables.
A wide array of operators and functions are available
here.
# Three examples for doing the same computations
>mydata$sum <- mydata$x1 + mydata$x2
>mydata$mean <- (mydata$x1 + mydata$x2)/2
>attach(mydata)
>mydata$sum <- x1 + x2
>mydata$mean <- (x1 + x2)/2
>detach(mydata)
>mydata <- transform(mydata, sum = x1 + x2, mean = (x1 + x2)/2)
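The $ and transform approaches can be compared on a tiny data frame. This is a sketch with made-up values for x1 and x2.

```r
mydata <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 6, 8))

# approach 1: refer to columns with $
by_dollar <- mydata
by_dollar$sum  <- by_dollar$x1 + by_dollar$x2
by_dollar$mean <- (by_dollar$x1 + by_dollar$x2) / 2

# approach 2: transform() evaluates the expressions inside the data frame
by_transform <- transform(mydata, sum = x1 + x2, mean = (x1 + x2) / 2)
```

Both approaches give the same columns; transform avoids the need for attach and detach.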
Recoding variables
In order to recode data, you will probably use
one or more of R's control structures.
# create 2 age categories
>mydata$agecat <- ifelse(mydata$age > 70, "older", "younger")
# another example: create 3 age categories
>attach(mydata)
>mydata$agecat[age > 75] <- "Elder"
>mydata$agecat[age > 45 & age <= 75] <- "Middle Aged"
>mydata$agecat[age <= 45] <- "Young"
>detach(mydata)
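A sketch of the three-category recode on a made-up age vector. The cut function is shown as an alternative that is not used in the slides; with its default right-closed intervals it gives the same categories.

```r
age <- c(30, 45, 46, 75, 76, 90)   # illustrative ages

# logical subscripts, as in the slide
agecat <- rep(NA_character_, length(age))
agecat[age > 75] <- "Elder"
agecat[age > 45 & age <= 75] <- "Middle Aged"
agecat[age <= 45] <- "Young"

# alternative (not from the slides): cut() with intervals (-Inf,45], (45,75], (75,Inf]
agecat_cut <- cut(age, breaks = c(-Inf, 45, 75, Inf),
                  labels = c("Young", "Middle Aged", "Elder"))
```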
Merging Data frames/files/data
To merge two data frames (datasets) horizontally, use the merge function. In most cases, you join two data frames by one or more common key variables (i.e., an inner join).
# merge two data frames by ID
>total <- merge(dataframeA, dataframeB, by="ID")
# merge two data frames by ID and Country
>total <- merge(dataframeA, dataframeB, by=c("ID","Country"))
ADDING ROWS
To join two data frames (datasets) vertically, use the rbind function.
The two data frames must have the same variables, but they do not have to be in the same order.
>total <- rbind(dataframeA, dataframeB)
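A small sketch of merge and rbind. All data frames and values below are invented; the all = TRUE argument is an addition that keeps unmatched rows (an outer join).

```r
dataframeA <- data.frame(ID = c(1, 2, 3), score = c(90, 80, 70))
dataframeB <- data.frame(ID = c(2, 3, 4), group = c("a", "b", "c"))

inner <- merge(dataframeA, dataframeB, by = "ID")              # only IDs 2 and 3
outer <- merge(dataframeA, dataframeB, by = "ID", all = TRUE)  # IDs 1 to 4, NA where missing

# rbind matches data frame columns by name, so the order may differ
top     <- data.frame(ID = 1:2, score = c(90, 80))
bottom  <- data.frame(score = c(70, 60), ID = 3:4)
stacked <- rbind(top, bottom)
```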
Statistical Hypothesis Tests
>t.test(): -> Student t-tests (one and two samples)
>var.test(): -> Fisher F test (equality of two variances)
>cor.test(): -> correlation tests
>chisq.test(): -> χ² test
>prop.test(): -> proportion tests (one proportion & difference of two proportions)
>wilcox.test(): -> Wilcoxon tests (one and two samples)
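A sketch of how some of these tests are called and how results are extracted. The two samples g1 and g2 are made-up numbers, and the hand-computed t statistic is only there to show what t.test reports.

```r
g1 <- c(43, 40, 35, 41, 47)   # illustrative sample 1
g2 <- c(54, 39, 34, 37, 50)   # illustrative sample 2

# one-sample t-test of H0: mean = 40
one_sample <- t.test(g1, mu = 40)
t_manual   <- (mean(g1) - 40) / (sd(g1) / sqrt(length(g1)))

two_sample <- t.test(g1, g2)       # two-sample t-test (Welch version by default)
equal_var  <- var.test(g1, g2)     # F test for equality of two variances
rank_test  <- wilcox.test(g1, g2)  # Wilcoxon rank-sum test

p_values <- c(one_sample$p.value, two_sample$p.value,
              equal_var$p.value, rank_test$p.value)
```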
Graphical Procedures
> plot(x) # the plot function is used to draw basic graphs
> plot(xvalues,yvalues)
Histograms
Histograms are a useful graphic for displaying univariate data.
>hist(x)
>boxplot(x) # to produce a box plot
Q-Q Plot: to check normality
> qqnorm(resid(fit), main="Normal Q-Q plot") # resid(fit): residuals of a fitted model
Changing the Look of Graphics
> plot(xvalues, yvalues, ylab = "Label for y axis", xlab = "Label for x axis", las = 1, cex.lab = 1.5)
las: numeric in {0,1,2,3}; changes the orientation of the axis labels
cex.lab: magnification to be used for x and y labels
To see the full range of graphical parameters:
>?par
Cont’d
Each function has its own set of arguments. The most common ones are:
xlim, ylim: range of the variable plotted on the x and y axis respectively
pch, col, lty: plotting character, colour and line type
xlab, ylab: labels of the x and y axis respectively
main, sub: main title and sub-title of the graph
type: "l" (line), "p" (point), "h" (vertical line), …
Example:
## plot the graph of f(x)=x^2+2x+9 between x=-3 and x=3
> x<-seq(-3,3,0.01)
> f<-x^2+2*x+9
> plot(x,f,type="l",main="Graph of Quadratic",xlab="Xvalue",ylab="funalvalue",col="red")
Cont’d
[Figure: line plot titled "Graph of Quadratic"; x axis "Xvalue" from -3 to 3, y axis "funalvalue"]

> #plot of histogram


> xx<-rnorm(100,36,10)
> hist(xx,main="Histogram",nclass=25,col=5)

[Figure: histogram titled "Histogram"; x axis "xx", y axis "Frequency"]
Example
The plot of x^3 −3x between x=−2 and x=2:
>curve(x^3-3*x, -2, 2)
Here is the more cumbersome code to do the
same thing using plot:
>x<-seq(-2,2,0.01)
>y<-x^3-3*x
>plot(x,y,type="l")
More Graphical Parameters
Color options and their descriptions
>col # default plotting color; some functions (e.g. lines) accept a vector of values that are recycled
>col.axis # color for axis annotation
>col.lab # color for x and y labels
>col.main # color for titles
>col.sub # color for subtitles
>fg # plot foreground color (axes, boxes; also sets col= to same)
>bg # plot background color
Scatterplot Matrices
# Basic Scatterplot Matrix
>pairs(~mpg+disp+drat+wt,data=mtcars,
main="Simple Scatterplot Matrix")
Statistical Models in R
Regression Model in R
>fit1<-lm(y ~ x): -> Simple regression
>lm(y ~ 1+x): -> Explicit intercept
>lm(y ~ -1 + x): -> Through the origin
>fit<-lm(y ~ x + I(x^2)): -> Quadratic regression
>fit<-lm(y ~ x1 + x2 + x3): -> Multiple regression
>coef(fit) -> to find regression coefficients
>resid(fit) -> to find residuals
>fitted(fit) -> to find fitted values
>summary(fit) -> to find analysis summary
>predict(fit, newdata) -> to predict for new data
>anova(fit) # to get anova table
>deviance(fit) -> residual sum of squares
>plot(fitted(fit), resid(fit)) # to check constant variance assumption
>qqnorm(resid(fit)) # to check normality assumption
>X <- model.matrix(~ y - 1, Data)
Cont’d
Fitting the Model
# Multiple Linear Regression Example
>fit <- lm(y ~ x1 + x2 + x3, data=mydata)
>summary(fit) # show results
# Other useful functions
>coefficients(fit) # model coefficients
>confint(fit, level=0.95) # CIs for model parameters
>fitted(fit) # predicted values
>residuals(fit) # residuals
>anova(fit) # anova table
>vcov(fit) # covariance matrix for model parameters
>influence(fit) # regression diagnostics
# diagnostic plots provide checks for heteroscedasticity, normality, and influential observations.
>plot(fit) #Diagnostic Plots.
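A runnable sketch of these functions on the built-in mtcars data. The choice of mpg as response with wt and hp as regressors, and the new_car values, are assumptions made only for illustration.

```r
# multiple linear regression on a built-in data set
fit <- lm(mpg ~ wt + hp, data = mtcars)

b  <- coef(fit)                    # model coefficients
ci <- confint(fit, level = 0.95)   # confidence intervals for the parameters

# prediction for a hypothetical new observation
new_car <- data.frame(wt = 3, hp = 110)
pred <- predict(fit, newdata = new_car)

# the same prediction computed directly from the coefficients
pred_manual <- b["(Intercept)"] + b["wt"] * 3 + b["hp"] * 110

# fitted values plus residuals reproduce the observed response
rebuilt <- fitted(fit) + residuals(fit)
```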
Example of Simple LRM from R data set
>data() # to view data sets available in R
>edit(cars) # to view the data frame named cars (close the editor to continue)
>names(cars)
[1] "speed" "dist"
> y<-cars$speed
> x<-cars$dist
> fit<-lm(y~x)
> fit
>plot(resid(fit),fitted(fit),main="CCVA",ylab="fitted",xlab="resid")
>qqnorm(resid(fit),main="QQ plot")
Example of Multiple LRM from R data set
>data() # to view data sets available in R
>edit(rock) # to view the data frame named rock (close the editor to continue)
>names(rock) # names in data frame rock
[1] "area" "peri" "shape" "perm"
> Y<-rock$area
> X1<-rock$peri
> X2<-rock$shape
> X3<-rock$perm
> fit1<-lm(Y~X1+X2+X3) # fitting multiple linear regression model
> fit1
Tree data example
>data() # to view data set in R
>edit(trees) # to view data trees
> names(trees)
[1] "Girth" "Height" "Volume"
> Y<-trees$Girth
> x1<-trees$Height
> x2<-trees$Volume
> fit1<-lm(Y~x1+x2)
>fit1
>coef(fit1)
>anova(fit1)
Extracting Statistics from the Regression
The most important statistics and parameters of a
regression are stored in the lm object or the summary
object.
> output <- summary(result)
> SSR <- deviance(result)
> LL <- logLik(result)
> DegreesOfFreedom <- result$df.residual
> Yhat <- result$fitted.values
> Coef <- result$coefficients
> Resid <- result$residuals
> s <- output$sigma
> RSquared <- output$r.squared
> CovMatrix <- s^2*output$cov.unscaled
> aic <- AIC(result)
> vcov(result) # variance-covariance matrix of the coefficients
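A sketch that extracts these quantities from a fitted model and checks them against direct calculations. The model dist ~ speed on the built-in cars data is an assumption chosen only to make the example runnable.

```r
result <- lm(dist ~ speed, data = cars)
output <- summary(result)

s   <- output$sigma      # residual standard error
rss <- deviance(result)  # residual sum of squares
r2  <- output$r.squared

# the same quantities computed directly
rss_manual <- sum(resid(result)^2)
s_manual   <- sqrt(rss / result$df.residual)
r2_manual  <- 1 - rss / sum((cars$dist - mean(cars$dist))^2)

# covariance matrix of the coefficients, two ways
cov_manual  <- s^2 * output$cov.unscaled
cov_builtin <- vcov(result)
```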
linear model
>lm(y~x): -> to fit a simple linear regression model
>lm(y~x1+x2): -> to fit a multiple linear regression model using two regressors x1 & x2
>aov(y~x): -> to fit a one-way ANOVA model
>f=as.factor(f): -> transforms f into a factor
>lm(y~f): -> one-factor ANOVA
>lm(y~f1+f2): -> two-factor ANOVA
>lm(y~x+f): -> analysis of covariance
Families:
?family # to see the available model families
Logistic regression
glm.out=glm(y~x, binomial)
Poisson regression
glm.out=glm(y~x, poisson)
Remark:
>lm(y~x) is equivalent to >glm(y~x, gaussian)
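The remark can be checked numerically. This sketch assumes the built-in cars data with dist as response; any numeric response and regressor would do.

```r
fit_lm  <- lm(dist ~ speed, data = cars)
fit_glm <- glm(dist ~ speed, family = gaussian, data = cars)

coef(fit_lm)    # intercept and slope from lm
coef(fit_glm)   # the same estimates from glm with the gaussian family

# for the gaussian family the deviance is the residual sum of squares
dev_lm  <- deviance(fit_lm)
dev_glm <- deviance(fit_glm)
```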
ANOVA MODEL
Partition of variation into
Between groups
Within groups
The model (One Way ANOVA Model):
$Y_{ij} = \mu + \alpha_j + \varepsilon_{ij}$
Assumptions:
Normality
Independence
Homogeneity
$\mathrm{Var}(Y) = \mathrm{Var}(\mu) + \mathrm{Var}(\alpha) + \mathrm{Var}(\varepsilon) = \mathrm{Var}(\alpha) + \mathrm{Var}(\varepsilon)$
Example: Perform one way ANOVA for the data given in the table below:
A   B   C
43  41  39
40  47  34
35  54  37
>treat <- c(1,1,1,2,2,2,3,3,3)
>y <- c(43, 40, 35, 41, 47, 54, 39, 34, 37)
>treat <- as.factor(treat)
>fit <- aov(y ~ treat)
>summary(fit)
>anova(fit)
>treat=c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
>yield=c(13.25,25.61,37.9,38.65,2.14,14.05,23.8,37.5,38.93,1.91,14.21,24.76,36.44,37.8,1.78)
>treat <- as.factor(treat)
>fit <- aov(yield ~ treat)
>summary(fit)
>anova(fit)
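The partition of variation into between-group and within-group parts can be reproduced by hand. This sketch uses the A, B, C data (groups coded 1, 2, 3, three observations each); the names ss_between, ss_within and f_manual are introduced here.

```r
treat <- factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3))
y     <- c(43, 40, 35, 41, 47, 54, 39, 34, 37)

fit <- aov(y ~ treat)
tab <- anova(fit)

grand       <- mean(y)
group_means <- tapply(y, treat, mean)

ss_between <- sum(3 * (group_means - grand)^2)  # 3 observations per group
ss_within  <- sum((y - group_means[treat])^2)   # deviations from own group mean
ss_total   <- sum((y - grand)^2)

# F = between mean square / within mean square (2 and 6 degrees of freedom)
f_manual <- (ss_between / 2) / (ss_within / 6)
```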
ANOVA - Fit a Model
# One Way Anova (Completely Randomized Design)
>fit <- aov(y ~ group)
# Randomized Block Design (B is the blocking factor)
>fit <- aov(y ~ A + B)
# Two Way Factorial Design
>fit <- aov(y ~ A + B + A*B, data=mydataframe)
>fit <- aov(y ~ A*B, data=mydataframe) # same thing
# Analysis of Covariance
>fit <- aov(y ~ A + x, data=mydataframe)
For within-subjects designs, the data frame has to be rearranged so that each measurement on a subject is a separate observation.
# One Within Factor
>fit <- aov(y~A+Error(Subject/A),data=mydataframe)
# Two Within Factors W1 W2, Two Between Factors B1 B2
>fit <- aov(y~(W1*W2*B1*B2)+Error(Subject/(W1*W2))+(B1*B2), data=mydataframe)
Factorial ANOVA
Example: Perform factorial ANOVA for the following data
Variety  Pesticide              Total
         1     2     3     4
B1       29    50    43    53   175
B2       41    58    42    73   214
B3       66    85    69    85   305
Total    136   193   154   211  694
Model: product = α (mean) + β (variety) + γ (pesticide) + ε
>variety <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
>pesticide <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
>product <- c(29,50,43,53,41,58,42,73,66,85,69,85)
>variety <- as.factor(variety)
>pesticide <- as.factor(pesticide)
>fit<-aov(product~variety+pesticide)
>anova(fit)
Cont’d
>analysis <- aov(product ~ variety + pesticide)
>anova(analysis)
Analysis of Variance Table
Response: product
          Df  Sum Sq Mean Sq F value   Pr(>F)
variety    2 2225.17 1112.58  44.063 0.000259 ***
pesticide  3 1191.00  397.00  15.723 0.003008 **
Residuals  6  151.50   25.25
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
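The sums of squares in this table can be recomputed from the row and column totals. This is a sketch using the product vector from the R code above; the correction-factor formula is the usual textbook shortcut, and the names cf, ss_variety, ss_pesticide and ss_resid are introduced here.

```r
variety   <- factor(rep(1:3, each = 4))
pesticide <- factor(rep(1:4, times = 3))
product   <- c(29, 50, 43, 53, 41, 58, 42, 73, 66, 85, 69, 85)

fit <- aov(product ~ variety + pesticide)
tab <- anova(fit)

# correction factor: (grand total)^2 / number of observations
cf <- sum(product)^2 / length(product)

# each variety total is based on 4 values, each pesticide total on 3
ss_variety   <- sum(tapply(product, variety, sum)^2) / 4 - cf
ss_pesticide <- sum(tapply(product, pesticide, sum)^2) / 3 - cf
ss_resid     <- sum(product^2) - cf - ss_variety - ss_pesticide
```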
Lab Activity on One Way ANOVA
Perform one-way ANOVA using the following data obtained from four treatments:
Low Sugar -   Low Sugar -    High Sugar -  High Sugar -
Low Acidity   High Acidity   Low Acidity   High Acidity
65            63             72            60
72            69             80            61
77            65             83            55
68            73             75            62
81            71             78            64
59            55             62            50
72            65             84            57
80            66             91            58
66            73             79            60
77            64             84            65
71            60             79            60
91            72             88            63
74            80             86            65
69            62             78            57
73            59             84            64
The Logistic Model
Data:
Binary outcomes (e.g. disease status)
Aim:
To identify which factors influence the outcome
$\mathrm{Prob}(Y_i = 1) = \exp(\eta_i)/(1+\exp(\eta_i))$
$\eta_i = \sum_j x_{ij}\beta_j$ - Linear Predictor
$x_{ij}$ - Design Matrix (genotypes etc.)
$\beta_j$ - Model Parameters (to be estimated)
The model is investigated by
estimating the $\beta_j$'s by maximum likelihood
testing if the estimates are different from 0
Fitting the Model
>afit <- glm(y ~ additive(x), family="binomial")
Model Comparison
> afit <- glm(t$y ~ additive(t$m20), family="binomial")
> gfit <- glm(t$y ~ genotype(t$m20), family="binomial")
> anova(afit,gfit)
R> plasma_glm_1 <- glm(ESR ~ fibrinogen, data = plasma, family = binomial()) # simple logistic regression
R> data("womensrole", package = "HSAUR2")
R> fm1 <- cbind(agree, disagree) ~ gender + education
R> womensrole_glm_1 <- glm(fm1, data = womensrole, family = binomial())
> no.yes <- c("No","Yes")
> smoking <- gl(2,1,8,no.yes)
> obesity <- gl(2,2,8,no.yes)
> snoring <- gl(2,4,8,no.yes)
> n.tot <- c(60,17,8,2,187,85,51,23)
> n.hyp <- c(5,2,1,0,35,13,15,8)
> data.frame(smoking,obesity,snoring,n.tot,n.hyp)
The gl function is used to "generate levels" of a factor.
R is able to fit logistic regression analyses for tabular data in two different ways.
The first way is to give the response as a two-column matrix of counts (diseased, not diseased):
> hyp.tbl <- cbind(n.hyp,n.tot-n.hyp)
> glm(hyp.tbl~smoking+obesity+snoring,family=binomial("logit"))
The other way is to give the proportion of diseased in each cell, with the cell totals as weights:
> prop.hyp <- n.hyp/n.tot
> glm.hyp <- glm(prop.hyp~smoking+obesity+snoring, binomial, weights=n.tot)
> summary(glm.hyp)
> confint(glm.hyp)
> exp(confint(glm.hyp))
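The two ways of fitting a logistic regression to tabular data give the same estimates, which the sketch below checks using the counts from the slide. The names fit_counts, fit_props and odds_ratios are introduced here for illustration.

```r
no.yes  <- c("No", "Yes")
smoking <- gl(2, 1, 8, no.yes)
obesity <- gl(2, 2, 8, no.yes)
snoring <- gl(2, 4, 8, no.yes)
n.tot <- c(60, 17, 8, 2, 187, 85, 51, 23)
n.hyp <- c(5, 2, 1, 0, 35, 13, 15, 8)

# way 1: response is a two-column matrix (diseased, not diseased)
fit_counts <- glm(cbind(n.hyp, n.tot - n.hyp) ~ smoking + obesity + snoring,
                  family = binomial("logit"))

# way 2: response is the proportion diseased, weighted by the cell total
fit_props <- glm(n.hyp / n.tot ~ smoking + obesity + snoring,
                 family = binomial, weights = n.tot)

# exponentiated coefficients are odds ratios
odds_ratios <- exp(coef(fit_counts))
```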
Thank You!
