R Programming Slides
Introduction to R-Program
R/R Studio-program installation
• R Packages
• R packages can also be downloaded from this
site or, alternatively, they can be obtained via
R once the package has been installed.
data=read.csv(file.choose(),header=TRUE)
Datasets in R
• There are some built-in datasets in R. These
datasets are stored as data frames. To see the list
of datasets, type
• data()
• To open the dataset called trees, simply type
• data(trees)
• You can access single variables in a data frame by
using the $ operator.
• trees$Height
• sum(trees$Height) # sum of just these values: 2356
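As a quick sketch of the $ access described above, using only the built-in trees data (the variable name heights is introduced here purely for illustration):

```r
data(trees)                 # load the built-in data frame
names(trees)                # column names: Girth, Height, Volume
heights <- trees$Height     # extract one column as a numeric vector
sum(heights)                # 2356, matching the value on the slide
mean(heights)               # average height
```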
c() Function
• The c() function, used together with the assignment operator, is a useful command in R for
entering small data sets. This function combines terms together into a vector.
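A minimal sketch of entering a small data set with c() and the assignment operator; the values and the names x and y are illustrative only:

```r
x <- c(2, 3, 4, 5)      # combine four values into a numeric vector
y <- c(x, 10, 11)       # c() can also extend an existing vector
y                       # 2 3 4 5 10 11
length(y)               # 6
```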
• Help
• There is text help available from within R using
the function help() or the ? character typed
before a command.
• Basic Math
• One of the simplest (but very useful) ways to use R is as a
powerful number cruncher.
• Examples
– 2+3
– [1] 5
– 3/2
– [1] 1.5
– 2^3
– [1] 8
– 4^2 -3*2
– [1] 10
– (56-14)/6 - 4*7*10/(5^2-5) # this is more complicated
– [1] -7
R as a calculator
• Example
– a <- c(1:10)
– A <- matrix(a, nrow = 5, ncol = 2) # fill in by column
– B <- matrix(a, nrow = 5, ncol = 2, byrow = TRUE) # fill in by row
– C <- matrix(a, nrow = 2, ncol = 5, byrow = TRUE)
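The difference between column-wise and row-wise filling, and a few common matrix operations, can be sketched as follows. This is an illustration built on the matrices defined above, not an exhaustive list of operations:

```r
a <- 1:10
A <- matrix(a, nrow = 5, ncol = 2)                 # filled by column
B <- matrix(a, nrow = 5, ncol = 2, byrow = TRUE)   # filled by row

dim(A)        # 5 2
A[1, ]        # first row of A: 1 6
B[1, ]        # first row of B: 1 2
t(A)          # transpose: a 2 x 5 matrix
A + B         # element-wise addition (dimensions must match)
P <- t(A) %*% B   # matrix product: (2 x 5) times (5 x 2) gives 2 x 2
P
```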
Matrix Operations
• plot(x, y)
– x = c(2,3,4,5,7,8,9,1)
– y = c(3,4,5,2,5,8,7,2)
– plot(x, y)
– data(trees)
– plot(trees$Height, trees$Volume) # or call attach(trees) first and use plot(Height, Volume)
Graphs in R
• hist()
• This function will plot a histogram that is typically used
to display continuous-type data. As an example,
consider the faithful dataset in R, which is a famous
dataset that exhibits natural bimodality. The variable
eruptions gives the duration of the eruption (in
minutes) and waiting is the time between eruptions for
the Old Faithful geyser:
– data(faithful)
– attach(faithful)
– hist(eruptions, main = "Old Faithful data", prob = TRUE)
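A variant of the same histogram that avoids attach() by referring to the column directly. The overlaid density curve and the axis label are additions made for this sketch; with probability = TRUE the bar areas sum to 1, which the last line checks:

```r
# Density-scaled histogram of eruption durations
h <- hist(faithful$eruptions,
          main = "Old Faithful data",
          xlab = "Eruption duration (minutes)",
          probability = TRUE)

# Overlay a kernel density estimate to make the two modes easier to see
lines(density(faithful$eruptions))

# Total area of the bars (bar height times bar width)
area <- sum(h$density * diff(h$breaks))
area
```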
Graphical Summaries
• boxplot()
• This function constructs a boxplot. Given a data
frame, it draws one box per variable, side by side.
For the two variables in the Old Faithful dataset:
– boxplot(faithful) # same as boxplot(eruptions, waiting)
• Thus, the waiting time for an eruption is
generally much larger and has higher
variability than the actual eruption time.
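The visual comparison can be backed up numerically. The sketch below uses the median and interquartile range, which are the quantities a boxplot displays; the names centre and spread are illustrative:

```r
centre <- sapply(faithful, median)   # median of each column
spread <- sapply(faithful, IQR)      # interquartile range of each column
centre
spread

# Side-by-side boxplots, one box per column of the data frame
boxplot(faithful, main = "Old Faithful data")
```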
Exercises
• data(stackloss)
• mean(stack.loss)
• fivenum(stack.loss)
• var(stack.loss)
• hist(stack.loss)
• boxplot(stack.loss)
Statistical Analysis
– One-sample T-test.
– Two sample independent T-test.
– Paired sample T-test.
– ANOVA.
Number of groups      Test
One group             One-sample T-test
Two groups            Independent-samples T-test; Paired-sample T-test
3 or more groups      One-way ANOVA
One-sample T-test
• Example
An outbreak of Salmonella-related illness was
attributed to ice cream produced at a certain
factory. Scientists measured the level of Salmonella
in 9 randomly sampled batches of ice cream. The
levels (in MPN/g) were:
(0.59, 0.14, 0.32, 0.69, 0.23, 0.79, 0.52, 0.39, 0.42).
Is there evidence that the mean level of Salmonella
in the ice cream is greater than 0.3 MPN/g?
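One way to set this question up in R is a one-sided, one-sample t-test with null value 0.3. The sketch below uses the nine values listed above; the names salmonella and res are illustrative:

```r
salmonella <- c(0.59, 0.14, 0.32, 0.69, 0.23, 0.79, 0.52, 0.39, 0.42)

# H0: mean = 0.3   versus   H1: mean > 0.3
res <- t.test(salmonella, mu = 0.3, alternative = "greater")
res

res$statistic   # t statistic
res$parameter   # degrees of freedom: n - 1 = 8
res$p.value     # compare with the chosen significance level, e.g. 0.05
```

A p-value below the chosen significance level would count as evidence that the mean level exceeds 0.3 MPN/g.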
Two-sample T-test
• Compare the means of two groups under the assumption that both samples
are random, independent, and normally distributed with unknown but
equal variances.
• Example
• 6 subjects were given a drug (treatment
group) and an additional 6 subjects a placebo
(control group). Their reaction time to a
stimulus was measured (in ms). We want to
perform a two-sample t-test for comparing
the means of the treatment and control
groups.
• t.test(Control, Treat, alternative = "less")
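The reaction-time data for this example did not survive in the text, so the sketch below uses illustrative values that are assumed purely for demonstration. It also sets var.equal = TRUE to match the equal-variance assumption stated earlier; without that argument, t.test() runs the Welch test, which does not assume equal variances.

```r
# ASSUMED, illustrative reaction times in ms (not the data from the slides)
Control <- c(91, 87, 99, 77, 88, 91)
Treat   <- c(101, 110, 103, 93, 99, 104)

# H1: mean(Control) < mean(Treat), using the pooled-variance t-test
res <- t.test(Control, Treat, alternative = "less", var.equal = TRUE)
res

res$parameter   # degrees of freedom: 6 + 6 - 2 = 10
res$estimate    # the two sample means
```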
• Example
• A study was performed to test whether cars get
better mileage on premium gas than on regular gas.
Each of 10 cars was first filled with either regular or
premium gas, decided by a coin toss, and the mileage
for that tank was recorded. The mileage was
recorded again for the same cars using the other kind
of gasoline. We use a paired t-test to determine
whether cars get significantly better mileage with
premium gas.
Paired Sample T-test
– reg = c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
– prem = c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)
– t.test(prem, reg, alternative = "greater", paired = TRUE)
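A paired t-test is equivalent to a one-sample t-test on the per-car differences. The sketch below runs both forms on the mileage data above to make that connection explicit; the names d, paired_res and diff_res are illustrative:

```r
reg  <- c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
prem <- c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)

# Paired test: H1 is that premium gives higher mileage than regular
paired_res <- t.test(prem, reg, alternative = "greater", paired = TRUE)
paired_res

# Same test written as a one-sample test on the differences
d <- prem - reg
diff_res <- t.test(d, mu = 0, alternative = "greater")

mean(d)                  # mean difference: 2
paired_res$parameter     # degrees of freedom: 10 - 1 = 9
all.equal(unname(paired_res$statistic), unname(diff_res$statistic))
```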
• Assumptions:
• Subjects are randomly assigned to one of k
groups.
• The distribution of the means by group is
normal with equal variances.
• Sample sizes between groups do not have to
be equal, but large differences in sample sizes
by group may affect the outcome of the
multiple comparisons tests.
Analysis of Variance
• In the ANOVA table
– mean_group1 = with(InsectSprays, tapply(count, spray, mean)) # assumes count and spray are the columns of the built-in InsectSprays data
– mean_group1
– diet = read.csv(file.choose(), header = TRUE) # file.choose() works on all platforms; choose.files() is Windows-only
– attach(diet)
– aov.out = aov(loss_weight ~ Diet, data = diet)
– aov.out
– summary(aov.out)
– mean_group = tapply(loss_weight, Diet, mean)
– mean_group
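Because the diet file is chosen interactively and is not part of R, the same workflow can be tried on a built-in dataset. The sketch below uses InsectSprays (columns count and spray) as a stand-in; that substitution is an assumption made for illustration, and the names fit, tab and group_means are hypothetical:

```r
data(InsectSprays)

# Group means, one per spray type
group_means <- with(InsectSprays, tapply(count, spray, mean))
group_means

# One-way ANOVA: does mean insect count differ between sprays?
fit <- aov(count ~ spray, data = InsectSprays)
tab <- summary(fit)[[1]]   # the ANOVA table as a data frame
tab

tab$Df                     # k - 1 and N - k degrees of freedom
tab[["Pr(>F)"]][1]         # p-value for the spray effect
```

With 6 sprays and 72 observations, the degrees of freedom are 5 for the spray effect and 66 for the residuals.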
Shapiro-Wilk Normality test
• shapiro.test()
• The null hypothesis is that the samples came from a normal distribution.
• If the p-value <= 0.05, you would reject the null hypothesis.
• A p-value > 0.05 implies that the distribution of the data is not
significantly different from a normal distribution. In other words, we can
assume normality.
– shapiro.test(loss_weight)
– hist(loss_weight, probability = TRUE, breaks = 15, main = "Histogram of normal data", xlab = "Approximately normally distributed data")
– lines(density(loss_weight))
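The decision rule above can be wrapped in a small helper and tried on built-in data. The function name looks_normal and the default alpha = 0.05 are hypothetical choices for this sketch. The strongly bimodal eruption durations from the faithful dataset serve as an example where normality is rejected:

```r
# Hypothetical helper: TRUE when the Shapiro-Wilk test does NOT reject normality
looks_normal <- function(x, alpha = 0.05) {
  shapiro.test(x)$p.value > alpha
}

sw <- shapiro.test(faithful$eruptions)
sw$statistic                       # W statistic, between 0 and 1
sw$p.value                         # very small for this bimodal variable
looks_normal(faithful$eruptions)   # FALSE: reject the null hypothesis

# The same check applied to another built-in variable
looks_normal(trees$Height)
```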