R Programming Slides

The document provides an introduction to performing statistical analysis in R. It discusses downloading and installing R/RStudio, loading and working with data sets in R, performing basic math and calculations, creating vectors and matrices, plotting graphs, summarizing data through measures like mean and variance, and performing hypothesis tests. The key topics covered include data import, exploration, visualization, and basic statistical analysis capabilities in R.

Uploaded by

Yan Jun Ho


Workshop

Introduction to R Programming
R/RStudio Installation

• The latest copy of R can be downloaded from
the CRAN (Comprehensive R Archive Network):
https://cloud.r-project.org/
• RStudio is an integrated development
environment (IDE) for R programming. Download
and install it from
http://www.rstudio.com/download
R Packages

• R packages can also be downloaded from the CRAN
site or, alternatively, obtained from within R (once
R has been installed) with the install.packages() function.

• The library() function is used to load packages (libraries),
i.e. groups of functions and data sets that are not
included in the base R distribution.
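A minimal sketch of installing a package only when it is missing and then loading it with library(). MASS is used purely as an example package (it ships with most R installations as a "recommended" package); any CRAN package name would work the same way.

```r
# Example package name (an assumption; substitute any CRAN package)
pkg <- "MASS"

# Install from CRAN only if the package is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package so its functions and data sets can be used
library(pkg, character.only = TRUE)

# The loaded package now appears on the search path
search()
```

Installing is a one-time step per machine; library() has to be called again in every new R session.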
Importing Data

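data = read.csv(file.choose(), header = TRUE) # file.choose() opens a dialog to pick the CSV file; header = TRUE means the first row holds the column names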
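In scripts it is often more convenient to pass a file path instead of using the interactive dialog. The sketch below writes a small made-up CSV file to a temporary location so it runs anywhere; the file contents and the object name mydata are illustrative assumptions.

```r
# Create a small CSV file in a temporary location (stand-in for your own file)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,height,weight",
             "Ann,165,58",
             "Bob,180,75",
             "Cat,172,64"), csv_path)

# header = TRUE: the first row supplies the column names
mydata <- read.csv(csv_path, header = TRUE)

str(mydata)           # 3 obs. of 3 variables
mean(mydata$height)   # columns are accessed with $
```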
Datasets in R
• There are some built-in datasets in R. These
datasets are stored as data frames. To see the list
of datasets, type
• data()
• To load the dataset called trees, simply type
• data(trees)
• You can access single variables in a data frame by
using the $ operator.
• trees$Height
• sum(trees$Height) # sum of just these values
• [1] 2356
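A few more base-R functions are handy for a first look at a built-in data frame such as trees; this is a short illustrative sketch rather than part of the original slide.

```r
data(trees)            # load the built-in data set (a data frame)
dim(trees)             # number of rows and columns: 31 3
names(trees)           # variable names: "Girth" "Height" "Volume"
head(trees$Height)     # first few values of one variable
sum(trees$Height)      # 2356, as above
mean(trees$Height)     # 76
```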
c() Function

• The c() function and the assignment operator <- are useful commands in R for
entering small data sets. The c() function combines (concatenates) its arguments into a vector.

• To enter a set of dice rolls into an R session, we type

• diceroll <- c(2,5,1,6,5,5,4,1)
• diceroll
• [1] 2 5 1 6 5 5 4 1
c() Function
• All variables or objects created in R are stored in
the workspace.
• To see what variables are in the workspace, you
can use the function ls() to list them.

• If we define a new variable as a simple function of
the variable diceroll, it will be added to the
workspace:
• newdiceroll <- diceroll/2 # divide every element by two
• newdiceroll
• [1] 1.0 2.5 0.5 3.0 2.5 2.5 2.0 0.5
The Workspace

• You can add a comment to a command line by
beginning it with the # character; R ignores everything
after the # on that line.

• To remove objects from the workspace, use the rm()
function:

• rm(newdiceroll) # this was a silly variable anyway

• ls()
• [1] "diceroll"
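The c() and workspace commands above can be run together as one short script; exists() is used here as an extra check (not on the original slide) that rm() really removed the object.

```r
diceroll <- c(2, 5, 1, 6, 5, 5, 4, 1)   # c() combines values into a vector
newdiceroll <- diceroll / 2             # element-wise division creates a new object
ls()                                    # lists diceroll and newdiceroll (plus any other objects)

rm(newdiceroll)                         # remove one object from the workspace
exists("newdiceroll")                   # FALSE after removal
exists("diceroll")                      # TRUE: the original vector is still there
```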
help()

• Help
• Text help is available from within R using
the function help() or the ? character typed
before a command.

• For example, suppose you would like to learn
more about the function log() in R.
• help(log)
• ?log
R as a Calculator

• Basic Math
• One of the simplest (but very useful) ways to use R is as a
powerful number cruncher.
• Examples
– 2+3
– [1] 5
– 3/2
– [1] 1.5
– 2^3
– [1] 8
– 4^2 -3*2
– [1] 10
– (56-14)/6 - 4*7*10/(5^2-5) # this is more complicated
– [1] -7
R as a Calculator

• Other standard functions that are found on
most calculators are available in R:
[Tables: Arithmetic Operators; Logical Operators]
• Example
– sqrt(2)
– [1] 1.414214
– abs(2-4)
– [1] 2
– cos(4*pi)
– [1] 1
– log(0) # undefined; R returns -Inf
– [1] -Inf
– factorial(6) # 6!
– [1] 720
– choose(52,5) # this is 52!/(47!*5!)
– [1] 2598960
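For reference, a sketch of R's standard arithmetic and logical operators; these are ordinary base-R operators, listed here as an illustration of what such operator tables usually cover.

```r
# Arithmetic operators
7 + 2     # 9    addition
7 - 2     # 5    subtraction
7 * 2     # 14   multiplication
7 / 2     # 3.5  division
7 ^ 2     # 49   exponentiation
7 %% 2    # 1    remainder (modulus)
7 %/% 2   # 3    integer division

# Logical (comparison and Boolean) operators return TRUE or FALSE
7 > 2              # TRUE
7 <= 2             # FALSE
7 == 2             # FALSE  equality test (not assignment)
7 != 2             # TRUE
(7 > 2) & (3 > 5)  # FALSE  "and"
(7 > 2) | (3 > 5)  # TRUE   "or"
!(7 > 2)           # FALSE  "not"
```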
Vector Arithmetic Operators
• Vectors can be manipulated in a similar manner to
scalars by using the same functions introduced in the
last section.
– x <- c(1,2,3,4)
– y <- c(5,6,7,8)
– x*y
– [1] 5 12 21 32
– y/x
– [1] 5.000000 3.000000 2.333333 2.000000
– y-x
– [1] 4 4 4 4
– x^y
– [1] 1 64 2187 65536
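Vector arithmetic works element by element. When the two operands have different lengths, R "recycles" the shorter one; the sketch below shows this with a scalar and with a length-2 vector (recycling is standard R behaviour, added here as an illustration).

```r
x <- c(1, 2, 3, 4)
y <- c(5, 6, 7, 8)

# Operations are applied element by element
x + y          # 6 8 10 12

# A scalar is recycled to the length of the vector
x * 10         # 10 20 30 40

# A shorter vector is recycled too: c(10, 20) acts like c(10, 20, 10, 20)
x + c(10, 20)  # 11 22 13 24
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.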
Arithmetic Operators

• Other useful functions that pertain to vectors include:
Arithmetic Operators
• Some examples using these functions:
– s <-c(1,1,3,4,7,11)
– length(s)
– [1] 6
– sum(s) # 1+1+3+4+7+11
– [1] 27
– prod(s) # 1*1*3*4*7*11
– [1] 924
– cumsum(s)
– [1] 1 2 5 9 16 27
– diff(s) # 1-1, 3-1, 4-3, 7-4, 11-7
– [1] 0 2 1 3 4
– diff(s, lag = 2) # 3-1, 4-1, 7-3, 11-4
– [1] 2 3 4 7
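Other common summary and ordering functions for vectors, applied to the same vector s; this is an illustrative selection of base-R functions, not a reproduction of the slide's function table.

```r
s <- c(1, 1, 3, 4, 7, 11)

mean(s)                     # 4.5
median(s)                   # 3.5, the average of the two middle values 3 and 4
min(s); max(s)              # 1 and 11
range(s)                    # 1 11
sort(s, decreasing = TRUE)  # 11 7 4 3 1 1
rev(s)                      # 11 7 4 3 1 1 (reverses the original order)
var(s); sd(s)               # sample variance and standard deviation
```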
Matrix Operations
• Matrix Operations
• Among the many powerful features of R is its ability
to perform matrix operations. You can create matrix
objects from vectors of numbers using the matrix()
command.
• a <- c(1,2,3,4,5,6,7,8)
• A <- matrix(a, nrow=2, ncol=4, byrow=FALSE) # R is case-sensitive: a is different from A
• Note that we could have left off the
byrow=FALSE argument, since this is the
default value.
• A <- matrix(a,2,4)
Matrix Operations

• Example
– a <- c(1:10)
– A <- matrix(a, nrow = 5, ncol = 2) # fill in by column
– B <- matrix(a, nrow = 5, ncol = 2, byrow = TRUE) # fill in by row
– C <- matrix(a, nrow = 2, ncol = 5, byrow = TRUE)
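Indexing the three matrices makes the fill order visible: A is filled column by column, while B and C are filled row by row. A short sketch:

```r
a <- 1:10
A <- matrix(a, nrow = 5, ncol = 2)                # filled column by column
B <- matrix(a, nrow = 5, ncol = 2, byrow = TRUE)  # filled row by row
C <- matrix(a, nrow = 2, ncol = 5, byrow = TRUE)

A[, 1]    # first column of A: 1 2 3 4 5
B[1, ]    # first row of B: 1 2
dim(C)    # 2 rows, 5 columns
C[2, ]    # second row of C: 6 7 8 9 10
```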
Matrix Operations

• Matrix operations (multiplication, transpose,
etc.) can easily be performed in R using a few
simple functions.
Matrix Operations

• Using the matrices A, B, and C just created, we
can carry out some linear algebra calculations using
the above functions.
– t(C) # this is the same as A
– B %*% C
– D <- C %*% B
– det(D)
– solve(D) # this is the inverse of D
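Checking the results numerically helps confirm what each function does. In the sketch below, multiplying D by solve(D) should give the 2 x 2 identity matrix (up to floating-point rounding), and det(D) is non-zero, which is why the inverse exists.

```r
a <- 1:10
B <- matrix(a, nrow = 5, ncol = 2, byrow = TRUE)
C <- matrix(a, nrow = 2, ncol = 5, byrow = TRUE)

dim(B %*% C)      # (5 x 2) %*% (2 x 5) gives a 5 x 5 matrix
D <- C %*% B      # (2 x 5) %*% (5 x 2) gives a 2 x 2 matrix
D                 # rows: 95 110 / 220 260
det(D)            # 500 (non-zero, so D is invertible)
Dinv <- solve(D)  # solve() with a single argument returns the inverse
D %*% Dinv        # the 2 x 2 identity matrix, up to rounding error
```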
Exercises

• Use R to compute the following:

Exercises - Answers
• 1. abs(2^3-3^2)
– [1] 1
• 2. exp(exp(1))
– [1] 15.15426
• 3. (2*3)^8+log(7.5)-cos(pi/sqrt(2))
– [1] 1679619
• 4.
– a=c(1,2,3,2,2,1,6,4,4,7,2,5)
– b=c(1,3,5,2,0,1,3,4,2,4,7,3,1,5,1,2)
– A=matrix(a, nrow=3, ncol=4, byrow = TRUE)
– B=matrix(b, nrow = 4, ncol=4, byrow = TRUE)
– A
– B
– A%*%solve(B)
– B%*%t(A)
• 5. prod(2,5,6,7)*prod(-1,3,-1,-1)
– [1] -1260
Graphs in R
• The plot() Function

• The most common function used to graph anything in R is

the plot() function. This is a generic function that can be
used for scatterplots, time-series plots, function graphs,
etc.

– plot(x, y)
– x=c(2,3,4,5,7,8,9,1)
– y=c(3,4,5,2,5,8,7,2)
– plot(x,y)
– data(trees)
– plot(trees$Height, trees$Volume) # or plot(Volume ~ Height, data = trees)
Graphs in R

• The curve() Function

• To graph a continuous function over a range of
values, the curve() function can be used.

– curve(sin(x), from = 0, to =2*pi)

Graphs in R
• Additional Features on Graphs
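Some commonly used extra features (a title, axis labels, point style and colour, a fitted line, a legend, and an overlaid curve) can be sketched with standard base-graphics arguments; the particular choices below are illustrative assumptions.

```r
data(trees)

# Scatterplot with a title, axis labels, solid points and a colour
plot(Volume ~ Height, data = trees,
     main = "Black cherry trees",
     xlab = "Height (ft)", ylab = "Volume (cubic ft)",
     pch = 19, col = "darkgreen")

# Add a least-squares line to the existing plot
fit <- lm(Volume ~ Height, data = trees)
abline(fit, col = "red", lty = 2)

# Add a legend in the top-left corner
legend("topleft", legend = c("Observed", "Fitted line"),
       pch = c(19, NA), lty = c(NA, 2), col = c("darkgreen", "red"))

# curve(..., add = TRUE) overlays a second function on an existing plot
curve(sin(x), from = 0, to = 2 * pi, main = "sin and cos")
curve(cos(x), add = TRUE, lty = 2)
```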
Summarizing Data

• R includes several functions for computing
sample statistics for both numerical (continuous
and discrete) and categorical data.
Summarizing Data
• Example
• Let's consider the dataset mtcars in R, which contains measurements on 11
aspects of automobile design and performance for 32 automobiles
(1973-74 models).

– data(mtcars) # load in dataset

– attach(mtcars) # add mtcars to search path
– mtcars
– mean(hp) # 146.6875
– var(mpg) # 36.3241
– quantile(qsec, probs = c(.20, .80)) # 20th and 80th percentiles (about 16.7 and 19.3)
– cor(wt, mpg) # not surprising that this is negative
– For the discrete variables, we can get summary counts:
– table(cyl)
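attach() works, but it can silently mask other objects with the same names. A sketch of the same summaries using with() and the $ operator instead, plus relative frequencies and group means (the extra summaries are illustrative additions):

```r
data(mtcars)

# with() evaluates an expression inside the data frame, so attach() is not needed
with(mtcars, mean(hp))                            # 146.6875
with(mtcars, quantile(qsec, probs = c(0.2, 0.8)))
with(mtcars, cor(wt, mpg))                        # negative

# Counts and relative frequencies of a discrete variable
table(mtcars$cyl)                                 # 4: 11, 6: 7, 8: 14
prop.table(table(mtcars$cyl))

# Group-wise summary: mean mpg for each number of cylinders
tapply(mtcars$mpg, mtcars$cyl, mean)
summary(mtcars$mpg)
```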
Graphical Summaries

• For discrete or categorical data, we can display

the information given in a table command in a
picture using the barplot() function.

• barplot(table(cyl)/length(cyl)) # use relative frequencies on the y-axis
Graphical Summaries

• hist()
• This function will plot a histogram that is typically used
to display continuous-type data. As an example,
consider the faithful dataset in R, which is a famous
dataset that exhibits natural bimodality. The variable
eruptions gives the duration of the eruption (in
minutes) and waiting is the time between eruptions for
the Old Faithful geyser:
– data(faithful)
– attach(faithful)
– hist(eruptions, main = "Old Faithful data", prob = TRUE)
Graphical Summaries

• We can give the picture a slightly different
look by changing the number of bins (R treats
breaks = 18 as a suggestion and picks nearby "pretty" break points)

– hist(eruptions, main = "Old Faithful data", prob = TRUE, breaks = 18)
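hist() also returns the numbers behind the picture. With plot = FALSE nothing is drawn; the sketch shows the bin boundaries R actually chose and confirms that, on the prob = TRUE scale, the bar areas sum to 1.

```r
data(faithful)

# plot = FALSE computes the histogram without drawing it
h <- hist(faithful$eruptions, breaks = 18, plot = FALSE)

h$breaks    # bin boundaries actually used
h$counts    # number of eruptions in each bin
h$density   # bar heights used when prob = TRUE

# Density = count / (n * bin width), so the bar areas add up to 1
sum(h$density * diff(h$breaks))
```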
Graphical Summaries

• boxplot()
• This function constructs boxplots; given a data frame,
it draws one box per variable. For the two variables in
the Old Faithful dataset:
– boxplot(faithful) # same as boxplot(eruptions, waiting)
• Thus, the waiting time between eruptions is
generally much larger and has higher
variability than the eruption duration.
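The visual comparison can be backed up with numbers. fivenum() gives Tukey's five-number summary (the box spans the lower and upper hinges), and sd() and IQR() measure spread; boxplot.stats() returns the values a boxplot is drawn from.

```r
data(faithful)

# Tukey five-number summaries: min, lower hinge, median, upper hinge, max
fivenum(faithful$eruptions)
fivenum(faithful$waiting)

# Spread of each variable: waiting time varies much more
sapply(faithful, sd)
sapply(faithful, IQR)

# Whisker ends, hinges and median used to draw the waiting-time box
boxplot.stats(faithful$waiting)$stats
```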
Exercises

• Using the stackloss dataset that is available
in R:

• Compute the mean, variance, and five-number
summary of the variable stack.loss.

• Create a histogram and boxplot for the
variable stack.loss.
Exercises - Answers

• data(stackloss)
• mean(stack.loss)
• fivenum(stack.loss)
• var(stack.loss)

• hist(stack.loss)
• boxplot(stack.loss)
Statistical Analysis Welead

• Make sure you have a good data set, then:

1. First describe and present your data, e.g. frequency
distributions in tables or charts.
2. Calculate basic statistics where possible, e.g. means,
standard deviations, quantiles etc.
3. Start to interpret your data: what might it mean?
4. Select specific items for closer attention (based on your
research hypotheses).
5. Select and carry out the right kind of test.
6. Interpret your findings in terms of significance levels.
7. Modify and repeat the analysis if necessary.
The Hypothesis

• A hypothesis is a statement relating to an
observation that may be true but for which a
proof (or disproof) has not yet been found.

• Null hypothesis (H0)

– Usually a statement of "no effect" or "no difference";
typically the opposite of the result the researcher hopes to show.

• Alternative hypothesis (H1)

– The opposite of the null hypothesis.
Hypothesis Testing

• Hypothesis testing is a procedure, based on
sample evidence, used to determine whether

– the hypothesis is a reasonable statement and
should not be rejected,
or
– it is unreasonable and should be rejected.
Statistical Comparison Tests

– One-sample t-test.
– Two-sample independent t-test.
– Paired sample t-test.
– ANOVA.
• Which test to use:
– One group: one-sample t-test.
– Two groups: independent samples t-test or paired sample t-test.
– Three or more groups: one-way ANOVA.
One-sample T-test

• Compare the sample mean with a known
value when the variance of the population is
unknown.

• The R function t.test() can be used to perform
both one- and two-sample t-tests on vectors of
data.
One-sample T-test

• The function has a variety of options and can be
called as follows:

• t.test(x, y = NULL, alternative = c("two.sided", "less",
"greater"), mu = 0, paired = FALSE, var.equal = FALSE,
conf.level = 0.95)

– x is a numeric vector of data values.

– y is an optional numeric vector of data values.
– If y is excluded, the function performs a one-sample t-test
on the data contained in x;
– if it is included, it performs a two-sample t-test using both
x and y.
One-sample T-test

• mu provides a number indicating the true value
of the mean (or difference in means if you are
performing a two-sample test) under the null
hypothesis.
• The option alternative is a character string
specifying the alternative hypothesis, and must
be one of the following: "two.sided" (which is the
default), "greater" or "less", depending on
whether the alternative hypothesis is that the
mean is different from, greater than or less than
mu, respectively.
One-sample T-test

• The option var.equal is a logical variable indicating
whether or not to assume the two variances are
equal when performing a two-sample t-test.

• If TRUE, the pooled variance is used to estimate
the variance; otherwise the Welch (or Satterthwaite)
approximation to the degrees of freedom is used. If
you leave this option out, it defaults to FALSE.

• Finally, the option conf.level determines the confidence
level of the reported confidence interval for µ in the one-sample
case and µ1 - µ2 in the two-sample case.
One-sample T-test

• Example

• t.test(x, alternative = "less", mu = 10)

• performs a one-sample t-test on the data contained in x where the null
hypothesis is that µ = 10 and the alternative is that µ < 10.
One-sample T-test

• Example
An outbreak of Salmonella-related illness was attributed
to ice cream produced at a certain factory. Scientists
measured the level of Salmonella in 9 randomly sampled
batches of ice cream. The levels (in MPN/g) were:
(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418).
Is there evidence that the mean level of Salmonella
in the ice cream is greater than 0.3 MPN/g?
One-sample T-test

• Let µ be the mean level of Salmonella in all batches of ice
cream.
• Here the hypothesis of interest can be expressed as:
– H0: µ = 0.3
– Ha: µ > 0.3

• Hence, we will need to include the options
alternative="greater", mu=0.3. Below is the relevant R code:
– x = c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
– t.test(x, alternative="greater", mu=0.3)
One-sample T-test

• From the output we see that the p-value =
0.029. Hence, there is moderately strong
evidence that the mean Salmonella level in
the ice cream is above 0.3 MPN/g.
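To see where t.test()'s numbers come from, the sketch below computes the one-sample t statistic t = (mean(x) - mu) / (sd(x) / sqrt(n)) and its one-sided p-value by hand, then compares them with the components stored in the object returned by t.test().

```r
x <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
mu0 <- 0.3
n <- length(x)

# t statistic: (sample mean - hypothesised mean) / standard error
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# One-sided ("greater") p-value: upper tail of the t distribution with n - 1 df
p_val <- pt(t_stat, df = n - 1, lower.tail = FALSE)

res <- t.test(x, alternative = "greater", mu = mu0)
c(manual = t_stat, t.test = unname(res$statistic))
c(manual = p_val, t.test = res$p.value)   # both about 0.029
res$conf.int   # one-sided interval: a lower bound for the mean
```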
Two-sample Independent T-test

• Compare the means of two groups under the assumption that both samples
are random, independent, and normally distributed with unknown but
equal variances (Welch's version of the test drops the equal-variance assumption).
Two-sample T-test

• Example
• 6 subjects were given a drug (treatment
group) and an additional 6 subjects a placebo
(control group). Their reaction time to a
stimulus was measured (in ms). We want to
perform a two-sample t-test for comparing
the means of the treatment and control
groups.
Two-sample T-test

• Let µ1 be the mean reaction time of the untreated (control)
population and µ2 the mean of the population taking the drug.
Here the hypothesis of interest can be expressed as
– H0: µ1 - µ2 = 0
– Ha: µ1 - µ2 < 0

• We will need to include the data for the control
group in x and the data for the treatment group in y.
• We will also need to include the options
alternative="less", mu=0.
• Finally, we need to decide whether or not the standard
deviations are the same in both groups.
Two-sample T-tests

• Below is the relevant R code when assuming
equal standard deviations.

– Control = c(91, 87, 99, 77, 88, 91)

– Treat = c(101, 110, 103, 93, 99, 104)
– t.test(Control, Treat, alternative="less", var.equal=TRUE)
Two-sample t-tests

The output
Two-sample t-tests

• Below is the relevant R code when not
assuming equal standard deviations.

• t.test(Control,Treat,alternative="less")
Two-sample t-tests

• Here the pooled t-test and the Welch t-test
give roughly the same results (p-value =
0.00313 and 0.00339, respectively).
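Running both versions side by side makes the difference visible: the pooled test uses n1 + n2 - 2 = 10 degrees of freedom, while Welch's approximation gives fewer (non-integer) degrees of freedom because the two sample variances differ.

```r
Control <- c(91, 87, 99, 77, 88, 91)
Treat   <- c(101, 110, 103, 93, 99, 104)

pooled <- t.test(Control, Treat, alternative = "less", var.equal = TRUE)
welch  <- t.test(Control, Treat, alternative = "less")

# Compare statistic, degrees of freedom and p-value for the two versions
rbind(pooled = c(t = unname(pooled$statistic), df = unname(pooled$parameter),
                 p = pooled$p.value),
      welch  = c(t = unname(welch$statistic), df = unname(welch$parameter),
                 p = welch$p.value))

# Sample means and variances of each group
c(control = mean(Control), treat = mean(Treat))
c(control = var(Control), treat = var(Treat))
```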
Paired sample T-test

• There are many experimental settings where
each subject in the study is in both the treatment
and control group.
• For example, in a matched pairs design, subjects
are matched in pairs and different treatments
are given to each subject in the pair.
• The outcomes are thereafter compared pairwise.
Alternatively, one can measure each subject
twice, before and after a treatment.
• In either of these situations we can't use two-sample
t-tests, since the independence
assumption is not valid.
Paired sample T-test

• Compare the means of two sets of paired
samples, taken from two populations with
unknown variance.

• The option paired indicates whether or not
you want a paired t-test (TRUE = yes and
FALSE = no). If you leave this option out, it
defaults to FALSE.
Paired sample T-test

• Example
• A study was performed to test whether cars get
better mileage on premium gas than on regular gas.
Each of 10 cars was first filled with either regular or
premium gas, decided by a coin toss, and the mileage
for that tank was recorded. The mileage was
recorded again for the same cars using the other kind
of gasoline. We use a paired t-test to determine
whether cars get significantly better mileage with
premium gas.
Paired Sample T-test

• Below is the relevant R code

– reg = c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
– prem = c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)
– t.test(prem, reg, alternative="greater", paired=TRUE)
Paired sample T-test

• The output

The results show that the t-statistic is equal to 4.47
and the p-value is 0.00075. Since the p-value is very
low, we reject the null hypothesis. There is strong
evidence of a mean increase in gas mileage with
premium gasoline compared with regular gasoline.
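A paired t-test is the same as a one-sample t-test on the within-pair differences. The sketch checks this equivalence with the gasoline data: both calls give the same statistic (here exactly sqrt(20), about 4.47) and the same p-value.

```r
reg  <- c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
prem <- c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)

paired <- t.test(prem, reg, alternative = "greater", paired = TRUE)

# The same test written as a one-sample t-test on the differences
d <- prem - reg
one_sample <- t.test(d, alternative = "greater", mu = 0)

mean(d)                        # average gain of 2 with premium gas
unname(paired$statistic)       # about 4.47
unname(one_sample$statistic)   # identical
```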
Analysis of Variance (ANOVA)

• The t-test is limited to comparing two groups; to
compare three or more groups at once, you need analysis
of variance (ANOVA).
• The test statistic is an F statistic with k - 1 and N - k degrees
of freedom, where k is the number of groups and N is the
total number of subjects.
• A p-value < 0.05 for this test indicates evidence
to reject the null hypothesis in favor of the
alternative hypothesis. In other words, there is
evidence that at least one pair of means is not
equal.
Analysis of Variance

• The hypotheses for the comparison of
independent groups are:
• H0: µ1 = µ2 = ... = µk (the means of all groups are
equal).
• Ha: not all of µ1, µ2, ..., µk are equal (the means of two
or more groups differ).
– Reject the null if at least one population has a mean
that differs from the others.
Analysis of Variance

• Assumptions:
• Subjects are randomly assigned to one of k
groups.
• Within each group the outcome is normally
distributed, with equal variances across groups.
• Sample sizes between groups do not have to
be equal, but large differences in sample sizes
by group may affect the outcome of the
multiple comparisons tests.
Analysis of Variance

• In the ANOVA table

– Sources of variation. The analysis of variance requires the estimation
of two variances: between groups and within groups.
– SS. Sum of squared deviations.
– df. Degrees of freedom.
– MS. Mean square of deviations (variance estimates), which is equal to
SS/df.
– F. The test statistic: the ratio of the between-groups and within-groups
mean squares. Under the null hypothesis it follows an F distribution.
– P-value. This is the value that answers your question: we are interested
in knowing whether there is some sort of relationship.
– ANOVA assumes by default (the null hypothesis) that there is no relationship.
– As a general rule, a p-value greater than 0.05 means we cannot reject
that assumption of no relationship.
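Each column of the ANOVA table can be computed directly from its definition. The sketch uses the built-in InsectSprays data (k = 6 sprays, N = 72 counts): it computes the between- and within-group sums of squares, divides by their degrees of freedom to get mean squares, forms F, and takes the upper-tail F probability, then compares with R's own ANOVA table.

```r
data(InsectSprays)
y <- InsectSprays$count
g <- InsectSprays$spray

k <- nlevels(g)   # number of groups (6)
N <- length(y)    # total number of observations (72)

# Between-groups SS: group size * (group mean - grand mean)^2, summed over groups
ss_between <- sum(tapply(y, g, function(v) length(v) * (mean(v) - mean(y))^2))
# Within-groups SS: ave() replaces each value by its own group mean
ss_within  <- sum((y - ave(y, g))^2)

# Mean squares = SS / df, and F = MS_between / MS_within
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (N - k)
F_stat <- ms_between / ms_within

# p-value: upper tail of the F distribution with (k - 1, N - k) df
p_val <- pf(F_stat, k - 1, N - k, lower.tail = FALSE)

# Compare with R's ANOVA table
anova(lm(count ~ spray, data = InsectSprays))
c(F = F_stat, p = p_val)
```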
Example

We're going to use a data set called InsectSprays. Six different
insect sprays (one independent variable with 6 levels) were tested
to see if there was a difference in the number of insects found in
the field after each spraying (dependent variable).
data(InsectSprays)
attach(InsectSprays)
str(InsectSprays)
boxplot(count ~ spray)
oneway.test(count~spray)
Example

• By default, oneway.test() does not assume equal variances
(homogeneity of variance): Welch's correction is applied.
• oneway.test() corrects for non-homogeneity, but doesn't give
much information: just F, the p-value, and the dfs for numerator
and denominator (no MS etc.).
• aov.out = aov(count ~ spray, data=InsectSprays)
• summary(aov.out)
• The tapply() function (or selecting each group in turn) can be used
to get standard deviations and sample sizes for each group.
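A sketch of the group summaries with tapply(), and of the link between the two functions: with var.equal = TRUE, oneway.test() reproduces the classical F value reported by summary(aov.out).

```r
data(InsectSprays)

# Group sizes and standard deviations with tapply()
n_group  <- with(InsectSprays, tapply(count, spray, length))
sd_group <- with(InsectSprays, tapply(count, spray, sd))
n_group
sd_group

# Welch one-way test (the default) versus the classical equal-variance F test
welch_ow   <- oneway.test(count ~ spray, data = InsectSprays)
classic_ow <- oneway.test(count ~ spray, data = InsectSprays, var.equal = TRUE)
welch_ow

# With var.equal = TRUE the F value matches the aov() table
aov.out <- aov(count ~ spray, data = InsectSprays)
summary(aov.out)
unname(classic_ow$statistic)
```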
Example

– mean_group1 = tapply(count, spray, mean)
– mean_group1

• Post hoc tests

• Tukey's HSD (Honestly Significant Difference) test is
available in base R:
– TukeyHSD(aov.out)
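TukeyHSD() returns a matrix of all pairwise differences for each factor, with confidence limits and adjusted p-values. The sketch filters that matrix to the pairs of sprays that differ at the 0.05 level.

```r
data(InsectSprays)
aov.out <- aov(count ~ spray, data = InsectSprays)

tk <- TukeyHSD(aov.out, conf.level = 0.95)
tk                        # all 15 pairwise differences with adjusted p-values

# Keep only the pairs whose adjusted p-value is below 0.05
diffs <- tk$spray
diffs[diffs[, "p adj"] < 0.05, ]

plot(tk)                  # confidence interval for each difference
```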
Example 2

• The data set contains information on 76
people who undertook one of three diets
(referred to as diet A, B and C). There is
background information such as age, gender,
and height. The aim of the study was to see
which diet was best for losing weight.
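Example 2

– diet = read.csv(file.choose(), header=TRUE) # choose.files() also works, but only on Windows
– attach(diet)
– aov.out = aov(loss_weight ~ factor(Diet), data=diet) # factor() ensures Diet is treated as a grouping variable even if coded as numbers
– aov.out
– summary(aov.out)

– mean_group = tapply(loss_weight, Diet, mean)
– mean_group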
Shapiro-Wilk Normality Test

• shapiro.test()
• The null hypothesis is that the sample came from a normal distribution.
• If the p-value <= 0.05, you would reject the null hypothesis.

• A p-value > 0.05 implies that the distribution of the data is not
significantly different from a normal distribution. In other words, we can
assume normality.

– shapiro.test(loss_weight)
– hist(loss_weight, probability=TRUE, breaks = 15, main="Histogram of normal data", xlab="Approximately normally distributed data")
– lines(density(loss_weight))
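Since the diet data file is not included here, the sketch below uses simulated data (an assumption made only for illustration) to show how shapiro.test() behaves on roughly normal versus strongly skewed samples.

```r
set.seed(1)   # fixed seed so the example is reproducible

normal_data <- rnorm(100, mean = 50, sd = 10)   # simulated, roughly normal
skewed_data <- rexp(100, rate = 1)              # simulated, strongly right-skewed

shapiro.test(normal_data)   # usually a large p-value: no evidence against normality
shapiro.test(skewed_data)   # very small p-value: normality is rejected

# Histogram with a density curve, in the same style as above
hist(skewed_data, probability = TRUE, breaks = 15, main = "Histogram of skewed data")
lines(density(skewed_data))
```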
Normality test

• Draw the QQ plot of the data, using pch=19
to produce solid circles.
– qqnorm(loss_weight, main="QQ plot of normal data", pch=19)
• Add a reference line (qqline() draws the line through the first
and third quartiles) to help assess how closely the points
follow a straight line.
– qqline(loss_weight)
Normality test

• What if the data are not normally distributed?

• Transform the dependent variable (repeating the
normality checks on the transformed data): common
transformations include taking the log or square root
of the dependent variable.
• Use a non-parametric test: non-parametric tests are
often called distribution-free tests and can be used
instead of their parametric equivalents.
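The transformation strategy can be sketched with simulated right-skewed (log-normal) data, an assumption chosen because taking the log makes such data exactly normal by construction; the normality check is then repeated on each transformed version.

```r
set.seed(2)
y <- rlnorm(100, meanlog = 0, sdlog = 1)   # simulated right-skewed data

p_raw  <- shapiro.test(y)$p.value          # normality clearly rejected
p_log  <- shapiro.test(log(y))$p.value     # log(y) is normal by construction
p_sqrt <- shapiro.test(sqrt(y))$p.value    # square root: a milder transformation

c(raw = p_raw, log = p_log, sqrt = p_sqrt)
```

Which transformation is appropriate depends on the data; the log requires strictly positive values.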
Non-parametric Tests
Commands for Non-parametric Tests in R
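A sketch pairing each parametric test above with a commonly used base-R non-parametric counterpart, reusing the data from the earlier examples. exact = FALSE requests the normal approximation, because exact p-values cannot be computed when the data contain ties.

```r
# Data from the earlier examples
x       <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
Control <- c(91, 87, 99, 77, 88, 91)
Treat   <- c(101, 110, 103, 93, 99, 104)
reg     <- c(16, 20, 21, 22, 23, 22, 27, 25, 27, 28)
prem    <- c(19, 22, 24, 24, 25, 25, 26, 26, 28, 32)
data(InsectSprays)

# One-sample t-test  -> Wilcoxon signed-rank test
wilcox.test(x, mu = 0.3, alternative = "greater")

# Independent t-test -> Wilcoxon rank-sum (Mann-Whitney) test
mw <- wilcox.test(Control, Treat, alternative = "less", exact = FALSE)
mw

# Paired t-test      -> Wilcoxon signed-rank test on the pairs
wilcox.test(prem, reg, alternative = "greater", paired = TRUE, exact = FALSE)

# One-way ANOVA      -> Kruskal-Wallis rank-sum test
kw <- kruskal.test(count ~ spray, data = InsectSprays)
kw
```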
