Introduction to R

For STAT 605/703

Instructor
Samuel Iddi (PhD)

Department of Statistics and Actuarial Science

University of Ghana

February 10, 2020

Dr. S. Iddi (UG) R Training February 10, 2020 1 / 130

Course Information

Learning Objectives:
Provide an introduction to R environment
Create R objects; list, factor and data frame
Subset and index objects
Import and export data
Create new columns, rename, sort and subset dataset
Merge two or more datasets and aggregate data
Perform simulations and bootstrapping
Use R to fit linear models, interpret results and draw
conclusions.


Course Information

Textbook:
1 Maindonald, J. and Braun, J. (2003). Data Analysis and Graphics Using
R. New York: Cambridge University Press.
Reference:
1 Chambers (2008). Software for Data Analysis. Springer.
2 Venables W.N. and Ripley B.D (1997). Modern Applied Statistics with
S-PLUS. 2nd Ed. New York: Springer.


Introduction

In statistics, we study data for some purpose.

Learn statistical methods to analyze data and draw conclusions.
Preparing data, applying methods to data, and interpreting and presenting results
receive the least amount of attention in teaching and studying statistics.
Examples will be taken from various fields; medicine, biology,
economics, finance etc.
Application through R scripts and packages.
R integrates the tasks of preparing, analyzing and presenting data in a
powerful and flexible way.


Basic R

Introduction to R

Learning a new language does not come easy, and the learning curve of R
is steep.

Knowledge of basic to more advanced techniques is an advantage.

The objective of this section is to demystify some things about R


Basic R Preliminaries

History of R

R is a dialect of the S language developed by John Chambers and others
at Bell Labs.
The S system provides a very flexible and powerful environment for
implementing new statistical ideas
The software facility is also used for data analysis and graphical display.
Ross Ihaka and Robert Gentleman created R in New Zealand in 1991.
It was made free software under the GNU General Public License in 1995.
The R Core Group was formed in 1997 (involving some people associated with
S-PLUS, an enhanced version of S).
The first version of R (1.0.0) was released in 2000.


Basic R Preliminaries

Overview of R

R syntax is very similar to S, so it is easy for S-PLUS users to switch
over.
The basic R installation is quite lean because functionality is divided into
modular packages.
R has sophisticated graphics capabilities, better than most statistical packages.
It contains a powerful programming language for developing and
implementing new ideas.
Very active and vibrant user community: R-help and R-devel mailing
lists and Stack Overflow.
It's free (who hates a free lunch?).


Basic R Preliminaries

Advantages and Drawbacks

Advantages
Run program for any purpose.
Study how the program works and adapt it to your own needs (free access to
source code).
Redistribute copies to anyone.
Improve program and share with public.
Drawbacks
Based on an old technology.
Functionality is based on consumer demand and user contribution. If
your favorite method is not implemented, you have to do the job yourself.
Not ideal for all possible situations (a drawback of all software
packages).


Basic R Preliminaries

Design of the R System

There are two conceptual parts.

The ‘base’ R system, which contains the base packages.
Everything else.
R functionality is divided into a number of packages
Base package: is required to run R and contains the most fundamental
functions.
Example packages in the ‘base’ system: utils, stats, datasets, graphics,
grid, tools, parallel, compiler, splines, stats4.
Recommended packages: tidyverse, dplyr, Rcmdr, Hmisc, plotly,
ggplot2, boot, class, cluster, foreign, KernSmooth, lattice, mgcv,
nlme, lme4, rpart, survival, MASS, spatial, nnet, Matrix,
randomForest.
CRAN hosts thousands of contributed packages (well over 10,000 by 2020).


Basic R Preliminaries

How to install packages?

How are R packages installed? We use the R function

install.packages()

Example: install.packages('ggplot2')

You can also install multiple packages by forming a vector.

Example: install.packages(c('ggplot2', 'dplyr'))

Note that install.packages('ggplot2', 'dplyr') will not work: the second
argument is interpreted as the library directory (lib), not as another package.
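A minimal sketch of a common pattern, assuming you only want to install packages that are not already present. The helper name missing_pkgs is hypothetical (it is not part of R), and base packages are used so that nothing is actually downloaded:

```r
# Hypothetical helper: return the packages in 'pkgs' that are not yet installed
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

wanted <- c("stats", "utils")        # base packages, so nothing needs installing
to_install <- missing_pkgs(wanted)   # character vector, possibly empty

# Pass the whole vector as the FIRST argument of install.packages()
if (length(to_install) > 0) install.packages(to_install)
```

Replacing wanted with c('ggplot2', 'dplyr') would install whichever of the two is missing.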


Basic R Preliminaries

R Resources and Getting Help

The following resources can be found on CRAN (visit
http://cran.r-project.org)
An introduction to R.
Writing R extensions.
R data import/export.
etc.
Find help by
searching the web.
reading the manual.
reading the FAQ.
asking experienced and skilled friends.
reading source code.
asking questions via mailing lists.
Basic R Preliminaries

Getting Help with R Functions

Within R
Access help file: ?rnorm.

Search help files: help.search("rnorm").

Get arguments: args(rnorm).

View source code: simply type rnorm (the function name without parentheses).
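As a small illustration of the help tools above, the sketch below inspects rnorm without opening a help page. The variable names arg_names and found are only illustrative:

```r
# Show the argument list of rnorm at the console
args(rnorm)

# The same information as a character vector
arg_names <- names(formals(rnorm))
arg_names

# List objects on the search path whose names start with "rnorm"
found <- apropos("^rnorm")
found
```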


Basic R Getting Started with R

Expressions and Assignment

Elementary commands consist of either expressions or assignments.

Assignments are indicated by the assignment operator <-.
The # character is used to indicate a comment. Anything including and
to the right of # is ignored.
5+10
## [1] 15

sqrt(4^2+2/3)*pi^3

## [1] 126.5826

sum(4,2,5,1) # sum of all elements

## [1] 12

prod(4,2,5,1) # product of all elements

## [1] 40


Basic R Getting Started with R

Expressions and Assignment

x<-mean(c(3,5,8,1)) #nothing is printed

x #auto-printing occurs

## [1] 4.25

print(x) #explicit printing

## [1] 4.25

(y<-exp(1)) # exponential function; the outer parentheses print the result

## [1] 2.718282

(m<-max(4,2,5,1)) # maximum of all elements

## [1] 5


Basic R Getting Started with R

Expressions and Assignment

(y<-sqrt(x)) #another way to print results

## [1] 2.061553

msg<-"hello"
print(msg)

## [1] "hello"

(z<-1:6) #The operator : is used to create integer sequences.

## [1] 1 2 3 4 5 6

(seq(from=1, to=2, by=0.2)) # create a sequence from 1 to 2 with 0.2 increment

## [1] 1.0 1.2 1.4 1.6 1.8 2.0

(seq(from=1, to=2, length.out=6)) # specify length of sequence

## [1] 1.0 1.2 1.4 1.6 1.8 2.0
Objects

Objects

R works with objects.

Understanding objects is key to using R effectively.
Objects include vectors, lists, matrices, data frames, arrays and
functions.
The most basic object is a vector, which consists of elements of the same
class or mode.
Exception: a list, which is represented as a vector, can contain objects of different
classes.
Five basic classes of objects: character, numeric (real number), integer,
complex, logical (TRUE/FALSE).


Objects Object Attributes

Object Attributes

View a list of currently defined objects with ls() or objects()

Remove objects from the workspace with the function rm().
Object attributes can be examined and set using various functions.
◦ name: names(), dimnames().
◦ dimensions (eg. matrices, arrays): dim().
◦ class: class(), typeof(), mode().
◦ length: length().
◦ other user-defined.


Objects Object Attributes

Object Attributes in R

x<- c(1:5) #vector

names(x)<-c("a","c","c","d","e")
names(x)

## [1] "a" "c" "c" "d" "e"

length(x)#length

## [1] 5

z<-c("Male","Female")#character vector
mode(x)

## [1] "numeric"

class(z)

## [1] "character"


Objects Coercion

Coercion

When different objects are mixed in a vector, coercion occurs so that
every element in the vector is of the same mode.
An object can be coerced from one class to another using the as.*()
functions, if available. For example, as.numeric, as.logical,
as.character, as.matrix, as.factor etc.
Nonsensical coercion results in NA, eg. coercing a character object to
a numeric or logical object.


Objects Coercion

Examples: Coercion

z <- c(1.7, "a") ## character

class(z)

## [1] "character"

z <- c(TRUE, 2) ## numeric

class(z)

## [1] "numeric"

z <- c("a", TRUE) ## character

class(z)

## [1] "character"

x<-0:5
class(x)

## [1] "integer"

x<-as.numeric(x)
class(x)

## [1] "numeric"


Objects Coercion

Examples: Coercion

as.logical(x)

## [1] FALSE TRUE TRUE TRUE TRUE TRUE

##Nonsensical Coercion
y<-c("NPP","NDC","CPP")
as.numeric(y)

## Warning: NAs introduced by coercion

## [1] NA NA NA

as.logical(y)

## [1] NA NA NA

as.complex(y)

## Warning: NAs introduced by coercion

## [1] NA NA NA
Operators and Special Values

Arithmetic and Logic Operators

Arithmetic operators consist of: + (addition), - (subtraction), *
(multiplication), / (division) and ^ (power) operators.
They operate on numbers, vectors, matrices etc.
Logical operators: "and" and "or", denoted by & and |.
Comparison operators: > (greater than), >= (greater than or equal to), < (less than), <=
(less than or equal to), == (equal to) and != (not equal to).
Upon evaluation, logical and comparison operators return the logical values TRUE or
FALSE.
If an operation cannot be accomplished, NA is returned.
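A short sketch of the comparison and logical operators on a vector containing a missing value. The vector a and the result names are illustrative only:

```r
a <- c(1, 5, NA, 8)

ne  <- a != 5            # not equal to 5:      TRUE FALSE NA TRUE
rng <- a >= 1 & a <= 5   # between 1 and 5:     TRUE TRUE  NA FALSE
out <- a < 2 | a > 6     # below 2 or above 6:  TRUE FALSE NA TRUE

ne
rng
out
```

The comparison with the missing element cannot be decided, so the result in that position is NA.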


Operators and Special Values

Special Values

Special values:
◦ Logical values: TRUE/FALSE or T/F
◦ Missing values: NA (not available), NaN (not a number)
◦ Inf is a special number which represents infinity, eg. 1/0.
◦ NaN represents the result of an undefined mathematical operation, eg. 0/0.
is.na() is used to test whether objects are NA.
is.nan() is used to test for NaN.
NA values have a class also, so there are integer NA and character NA.
A NaN value is also NA (is.na() returns TRUE for it) but the converse is not true.
If operation cannot be accomplished, NA is returned.
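The point that NA values carry a class can be checked directly. A minimal sketch using the typed NA constants that R provides:

```r
class(NA)              # "logical": the default NA
class(NA_integer_)     # "integer"
class(NA_real_)        # "numeric"
class(NA_character_)   # "character"

# NaN counts as NA, but NA does not count as NaN
na_is_nan <- is.nan(NA)    # FALSE
nan_is_na <- is.na(NaN)    # TRUE
```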


Operators and Special Values

Examples in R

x <- c(TRUE, FALSE, TRUE, FALSE)

class(x)

## [1] "logical"

(y<-seq(from=-5, to=10, by=1))

## [1] -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10

z<-y<=0
z

## [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE

(p<-rnorm(10,0,1))

## [1] 1.01953555 -0.61850147 -0.70058543 -1.66749774 -1.48370835

## [6] -0.05437555 0.46999880 -2.01273952 0.66474585 0.26280083

(q<-p>= -1 & p<=1)

## [1] FALSE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE TRUE
Operators and Special Values

Examples in R

0/0

## [1] NaN

1/0

## [1] Inf

(r<-c(1,2,4, NA, NaN, 5, Inf))

## [1] 1 2 4 NA NaN 5 Inf

is.na(r)

## [1] FALSE FALSE FALSE TRUE TRUE FALSE FALSE

is.nan(r)

## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE


Numbers and Vectors

Numbers are treated as numeric objects.

For an integer, specify the L suffix, eg. entering 1 gives you a numeric
object, entering 1L gives you an integer.
We can create integers from floating point numbers by going to
◦ the next larger integer: ceiling().
◦ the next smaller integer: floor().
◦ the next integer closer to zero: trunc().
Round a number to a number of decimal places: round().
R considers a number as a vector of length one.
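A brief sketch of these points. It uses a negative number because that is where floor() and trunc() differ:

```r
class(1)        # "numeric"
class(1L)       # "integer"

floor(-2.7)     # -3: next smaller integer
ceiling(-2.7)   # -2: next larger integer
trunc(-2.7)     # -2: next integer closer to zero

length(3.14)    # 1: a single number is a vector of length one
```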


Numbers and Vectors

R Examples on Numbers

(x<-20/sqrt(2))

## [1] 14.14214

floor(x)

## [1] 14

ceiling(x)

## [1] 15

trunc(x)

## [1] 14

round(x,digit=2)

## [1] 14.14

round(x,4)

## [1] 14.1421
Numbers and Vectors

Vectors

A vector consists of an ordered collection of elements.

All elements must have the same class (or mode) eg. logical, numeric,
character etc.
Exception: the special element NA (not NaN) can be mixed with any mode.
The length of a vector is the number of its elements.
Construct a vector: c() - concatenate elements or vectors.
Vector of length zero (empty vector): vector(). Default mode is
logical.
To create a vector of a specific mode, simply call the mode name as a function,
eg. v=integer(), w=character().
Generate sequences and repeated values: seq(), rep() etc.


Numbers and Vectors

R Examples on Vectors

(x<-c(1,3,4, sqrt(5), 10, -2))

## [1] 1.000000 3.000000 4.000000 2.236068 10.000000 -2.000000

class(x)

## [1] "numeric"

(y<-integer(5))

## [1] 0 0 0 0 0

class(y)

## [1] "integer"

(z<-character(4))

## [1] "" "" "" ""


Numbers and Vectors

R Examples on Vectors

class(z)

## [1] "character"

(a=rep(3,9))

## [1] 3 3 3 3 3 3 3 3 3

(b=seq(3,8,0.5))

## [1] 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0

(c=1:5)

## [1] 1 2 3 4 5

(v=c(x,c))

## [1] 1.000000 3.000000 4.000000 2.236068 10.000000 -2.000000 1.000

## [8] 2.000000 3.000000 4.000000 5.000000
Numbers and Vectors

Useful Vector Functions

Working with data frequently requires manipulation of vectors. Some
functions that help manipulate vectors in R:
sort() - returns a vector which is a sorted version of the input.
order() - returns an integer vector containing the permutation that will
sort the input into ascending order.
rank() - ranks the input vector.
unique() - return unique values of input vector.
diff() - creates a vector of differences, x[i] - x[i-k] for lag k.
length() - returns length of input vector.
mean() - return the mean of the input vector.
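The difference between sort() and order() is easiest to see when one vector is used to rearrange another. A small sketch with made-up data (the names and scores are hypothetical):

```r
score <- c(72, 55, 90, 55)
name  <- c("Ama", "Kofi", "Esi", "Yaw")

idx <- order(score)        # positions that sort 'score' ascending: 2 4 1 3
by_score <- name[idx]      # names rearranged by score: "Kofi" "Yaw" "Ama" "Esi"

# Indexing a vector by its own order() reproduces sort()
same <- identical(score[idx], sort(score))   # TRUE
```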


Numbers and Vectors

Examples

x<-c(1,1,7,3,3,4,3,2,3,2,1,4,1,1,6,4)
sort(x)

## [1] 1 1 1 1 1 2 2 3 3 3 3 4 4 4 6 7

order(x)

## [1] 1 2 11 13 14 8 10 4 5 7 9 6 12 16 15 3

rank(x)

## [1] 3.0 3.0 16.0 9.5 9.5 13.0 9.5 6.5 9.5 6.5 3.0 13.0 3.0 3.0
## [15] 15.0 13.0


Numbers and Vectors

Examples

unique(x)

## [1] 1 7 3 4 2 6

length(x)

## [1] 16

diff(x,lag=2)

## [1] 6 2 -4 1 0 -2 0 0 -2 2 0 -3 5 3


Numbers and Vectors

Vector arithmetics

Arithmetic operations, addition, subtraction, multiplication and division
(+, -, *, and / respectively) can be applied to vectors.
To avoid printing many digits, especially when dividing, set:
options(digits= ).
Square root function: sqrt().
(x=seq(2,10,by=2))
## [1] 2 4 6 8 10
(y=c(2.2:6.4))
## [1] 2.2 3.2 4.2 5.2 6.2
(z=c(2,4))
## [1] 2 4
x+y
## [1] 4.2 7.2 10.2 13.2 16.2


Numbers and Vectors

Vector arithmetics

x*y

## [1] 4.4 12.8 25.2 41.6 62.0

x-y

## [1] -0.2 0.8 1.8 2.8 3.8

2*x+y-5*sqrt(x)

## [1] -0.8710678 1.2000000 3.9525513 7.0578644 10.3886117

y/x

## [1] 1.10 0.80 0.70 0.65 0.62

x/z

## Warning in x/z: longer object length is not a multiple of shorter object length

## [1] 1 1 3 2 5
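The warning above comes from R's recycling rule: in element-wise arithmetic the shorter vector is repeated until it matches the longer one. A minimal sketch using the same x and z (the warning is suppressed here only to keep the output clean):

```r
x <- seq(2, 10, by = 2)   # length 5: 2 4 6 8 10
z <- c(2, 4)              # length 2

# z is recycled as 2 4 2 4 2; 5 is not a multiple of 2, hence the warning
r <- suppressWarnings(x / z)
r                         # 1 1 3 2 5

# No warning when the longer length is a multiple of the shorter one
r4 <- x[1:4] / z          # 1 1 3 2
```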
Numbers and Vectors

Character Vectors

Data, reports and figures require frequent manipulation of characters.

Character strings are delineated by double or single quotes.
Create a single string: paste().
(x<-c('NPP',"NDC","CPP"))
## [1] "NPP" "NDC" "CPP"
(y <- c("Volta Region is NPP's",'nightmare'))
## [1] "Volta Region is NPP's" "nightmare"
paste(y[1],y[2])
## [1] "Volta Region is NPP's nightmare"
(z<-letters[1:5])
## [1] "a" "b" "c" "d" "e"
(Z<- LETTERS[20:25])
## [1] "T" "U" "V" "W" "X" "Y"
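paste() also works element-wise on vectors, and its sep and collapse arguments control how the pieces are joined. A short sketch:

```r
parties <- c("NPP", "NDC", "CPP")

labelled <- paste(parties, 1:3, sep = "-")   # "NPP-1" "NDC-2" "CPP-3"
one_line <- paste(parties, collapse = ", ")  # "NPP, NDC, CPP"
no_space <- paste0("region", 1:2)            # "region1" "region2"
```

sep joins the arguments within each element, while collapse joins the elements into a single string.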


Numbers and Vectors

Subsets and Index Vectors

Extracting subsets of vectors is frequently required.

To extract a subset, we specify the indices of elements we wish to extract
or exclude.
The index vector specifies the elements to return.
x=34:45 #create a vector x
c(x[2],x[6]) #create a new vector with the second and sixth element of x
## [1] 35 39
x[c(2,6)] #an alternative way to extract the second and sixth element
## [1] 35 39
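Excluding elements, mentioned above, is done with negative indices. A minimal sketch on the same vector:

```r
x <- 34:45

drop_first <- x[-1]         # everything except the first element
drop_two   <- x[-c(2, 6)]   # everything except the second and sixth elements
drop_two                    # 34 36 37 38 40 41 42 43 44 45
```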


Numbers and Vectors

Index vector of logical values and missing data

Index vectors can be created by logical vectors.

Some numerical data contain missing cases represented by NA.

Arithmetic operations involving NA result in NA.

So we need a way to extract values that are not NA.

Identify NA values with is.na().


Numbers and Vectors

R Examples: Index vector of logical values and missing data

x <- c(10, 20, NA, 4, NA, 2)

sum(x)/length(x)

## [1] NA

mean(x)

## [1] NA

(i=is.na(x))#identify NA's

## [1] FALSE FALSE TRUE FALSE TRUE FALSE

y=x[!i] #obtain values of x that are not NA

j=complete.cases(x) #identify values of x that are not NA
yy=x[j] #subset with no NA
sum(y)/length(y) #compute mean

## [1] 9

mean(x, na.rm = TRUE) #yields the same result

## [1] 9
Numbers and Vectors

R Examples: Index vector of logical values and missing data

airquality[1:6, ]#using a built-in dataset

## Ozone Solar.R Wind Temp Month Day

## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6

good <- complete.cases(airquality) #flag rows that contain no NA

airquality[good, ][1:6, ] #subset data without NAs

## Ozone Solar.R Wind Temp Month Day

## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 7 23 299 8.6 65 5 7
## 8 19 99 13.8 59 5 8


Matrices

Creating Matrices

A two dimensional object.

Matrices are vectors with a dimension attribute.
The dimension attribute is itself an integer vector of length 2 (nrow,
ncol).
The function matrix() creates a matrix from a vector.
The general call for the function has the form: matrix(vector,
number of rows, number of columns, byrow=T (or F))
By default, matrices are filled column-wise. For row-wise filling, use the
option byrow=T.
A matrix can also be created from a vector by adding a dimension attribute.
Also with: cbind() and rbind().
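The effect of byrow is easiest to see side by side. A minimal sketch with illustrative object names:

```r
m_col <- matrix(1:6, nrow = 2, ncol = 3)                # filled column-wise (default)
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled row-wise

m_col[1, ]   # 1 3 5
m_row[1, ]   # 1 2 3

# Adding a dim attribute to a vector gives the same column-wise matrix
v <- 1:6
dim(v) <- c(2, 3)
identical(v, m_col)   # TRUE
```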


Matrices

Matrices in R

(x <- matrix(1:4, nrow = 2, ncol = 2))

## [,1] [,2]
## [1,] 1 3
## [2,] 2 4

dim(x)

## [1] 2 2

(y <- matrix(c(2, 4, 5, -1,0,-4), nrow = 3, ncol = 2))

## [,1] [,2]
## [1,] 2 -1
## [2,] 4 0
## [3,] 5 -4


Matrices

Matrices in R

rbind(x,y)

## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
## [3,] 2 -1
## [4,] 4 0
## [5,] 5 -4

(z<-c(2, 4, 5, -1,0,-4))

## [1] 2 4 5 -1 0 -4

dim(z)=c(3,2)#create matrix from vector using dim attribute
z

## [,1] [,2]
## [1,] 2 -1
## [2,] 4 0
## [3,] 5 -4


Matrices

Matrices in R

(p <- 1:3)

## [1] 1 2 3

(q <- 10:12)

## [1] 10 11 12

cbind(p, q)

## p q
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12

(r <- cbind(letters[1 : 4], LETTERS[1 : 4]))

## [,1] [,2]
## [1,] "a" "A"
## [2,] "b" "B"
## [3,] "c" "C"
## [4,] "d" "D"
Matrices

Matrix operations

R contains many functions and operators for matrices.

functions:
◦ t() - transpose the input matrix
◦ nrow(), ncol() - returns the number of rows and columns of a
matrix respectively.
◦ solve() - returns inverse of a square matrix.
◦ solve(A,b) - gives the solution of the system of equations Ax = b
◦ diag(n) - for a positive integer n, generates an n × n identity
matrix
◦ det() - calculate determinant of square matrix.
◦ sum(diag()) - returns the trace of an input matrix.
Arithmetic operators also work with matrices.
◦ A * B - multiplies corresponding elements of two matrices A and B.
◦ A %*% B - does proper matrix multiplication.
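As an illustration of solve(A, b), the sketch below solves a small 2 x 2 system and checks the answer with %*%. The matrix A and vector b are made up for the example:

```r
# System: 2*x1 + 1*x2 = 3
#         1*x1 + 3*x2 = 5
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(3, 5)

sol <- solve(A, b)     # solution of A x = b
sol                    # 0.8 1.4

# Check: A %*% sol reproduces b (up to floating point error)
ok <- isTRUE(all.equal(drop(A %*% sol), b))

# Element-wise vs matrix product give different results
A * A      # squares each entry
A %*% A    # proper matrix product
```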
Matrices

Functions and Operators for matrices in R

(x<-c(10,2,5,7,125,3,0,1,1))

## [1] 10 2 5 7 125 3 0 1 1

(y<-matrix(x,3,3))

## [,1] [,2] [,3]

## [1,] 10 7 0
## [2,] 2 125 1
## [3,] 5 3 1

det(y)#calculate determinant of y

## [1] 1241

sum(diag(y))#return trace of y

## [1] 136


Matrices

Functions and Operators for matrices in R

t(y)#transpose of y

## [,1] [,2] [,3]

## [1,] 10 2 5
## [2,] 7 125 3
## [3,] 0 1 1

solve(y)#inverse of y

## [,1] [,2] [,3]

## [1,] 0.098307816 -0.005640612 0.005640612
## [2,] 0.002417405 0.008058018 -0.008058018
## [3,] -0.498791297 0.004029009 0.995970991


Matrices

Functions and Operators for matrices in R

eigen(y)#eigenvalues and eigenvectors of y

## eigen() decomposition
## $values
## [1] 125.148195 9.844519 1.007286
##
## $vectors
## [,1] [,2] [,3]
## [1,] -0.06065781 0.87316411 -0.006357337
## [2,] -0.99780533 -0.01939437 0.008167102
## [3,] -0.02655459 0.48704034 -0.999946440

eigen(y)$values

## [1] 125.148195 9.844519 1.007286

eigen(y)$vectors

## [,1] [,2] [,3]

## [1,] -0.06065781 0.87316411 -0.006357337
## [2,] -0.99780533 -0.01939437 0.008167102
## [3,] -0.02655459 0.48704034 -0.999946440
Matrices

Functions and Operators for matrices in R

y*t(y) #elementwise multiplication

## [,1] [,2] [,3]

## [1,] 100 14 0
## [2,] 14 15625 3
## [3,] 0 3 1

solve(y)%*%y #proper matrix multiplication

## [,1] [,2] [,3]

## [1,] 1.000000e+00 4.163336e-17 0
## [2,] -1.387779e-17 1.000000e+00 0
## [3,] 8.881784e-16 4.440892e-16 1


Matrices

Functions and Operators for matrices in R

diag(3) #returns 3x3 identity matrix

## [,1] [,2] [,3]

## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1

5+y

## [,1] [,2] [,3]

## [1,] 15 12 5
## [2,] 7 130 6
## [3,] 10 8 6


Matrices

Subsetting a matrix

A matrix has subscripts mat[i,j].

By default, when a single element of a matrix is retrieved, it is returned
as a vector of length 1 rather than 1 × 1 matrix.
This behaviour can be turned off by setting drop=FALSE
(m <- matrix(c(2, 4, 5, -1,0,-4), nrow = 2, ncol = 3))
## [,1] [,2] [,3]
## [1,] 2 5 0
## [2,] 4 -1 -4
m[1,2]#subsetting a single element of a matrix
## [1] 5


Matrices

Subsetting a matrix

m[1,2,drop=F] #return 1x1 matrix

## [,1]
## [1,] 5

m[,2] #subsetting column 2, returns a vector

## [1] 5 -1

m[,2 ,drop=FALSE] #subsetting column 2, returns a matrix

## [,1]
## [1,] 5
## [2,] -1


Arrays

A matrix is a two-dimensional array, but larger arrays can be defined as well.
Arrays have k dimensions.
Each element of an array is accessed with k indices, x[i1,...,ik].
Eg. an array of 3 matrices, each 2 × 3, is defined by dim=c(2,3,3).


Arrays

(x<-array(c(1:18),dim=c(2,3,3)))

## , , 1
##
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 7 9 11
## [2,] 8 10 12
##
## , , 3
##
## [,1] [,2] [,3]
## [1,] 13 15 17
## [2,] 14 16 18


Arrays

x[,,1] #subset the first matrix in the array

## [,1] [,2] [,3]

## [1,] 1 3 5
## [2,] 2 4 6

x[,1,2] #subset first column in the second matrix

## [1] 7 8

dim(x)

## [1] 2 3 3


List

Creating, subsetting a list and R "output" as a List

A list is a special type of vector that can contain different objects.

Each component can be an object of a different type (i.e. vectors and
matrices in the same list) and length.
In contrast, matrices and data frames contain vectors of the same length.
To construct a list, use the function list().
Components of a list can be accessed by name and by index.
R 'output' is usually a list.
For example, the result of eigen(x) is a list that contains the eigenvalues and
the eigenvectors.
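One detail worth seeing in code is that single brackets return a sub-list, while double brackets (or $) return the component itself. A minimal sketch with an illustrative list:

```r
z <- list(a = 1, b = c("Male", "Female"))

sub  <- z["b"]      # a list of length 1 that still has the name "b"
comp <- z[["b"]]    # the character vector stored in component b

class(sub)    # "list"
class(comp)   # "character"
z$b[2]        # "Female": index into the extracted component
```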


List

Examples on lists

(x <- list(1, "a", TRUE, 1 + 4i))

## [[1]]
## [1] 1
##
## [[2]]
## [1] "a"
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] 1+4i


List

Examples on lists

(y <- list(1, c("Male","Female"), matrix(1:4, 2,2)))

## [[1]]
## [1] 1
##
## [[2]]
## [1] "Male" "Female"
##
## [[3]]
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4

z <- list(a=1, b=c("Male","Female"), d=matrix(1:4, 2,2)) #list with names

z[2] #subsetting a list

## $b
## [1] "Male" "Female"

z$d #subset with name

## [,1] [,2]
## [1,] 1 3
## [2,] 2 4


List

Examples on lists

z[["d"]] #alternatively

## [,1] [,2]
## [1,] 1 3
## [2,] 2 4

z[c(2,3)] #subset the last two components

## $b
## [1] "Male" "Female"
##
## $d
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4

(ol<-eigen(z$d))

## eigen() decomposition
## $values
## [1] 5.3722813 -0.3722813
##
## $vectors
## [,1] [,2]
## [1,] -0.5657675 -0.9093767
## [2,] -0.8245648 0.4159736

summary(ol)
Factors

Factors

Factors are used to define groups in vectors.

Each level of a factor defines a group.
Many statistical model functions make use of factors. eg. ANOVA and
regression.
Factors are stored as numerical values with values 1, 2, . . . , k where k is
the number of levels.
Character vectors can be converted to factors with
the function factor().
The levels of a factor can be accessed using levels().


Factors

When a vector of strings is included as a column of a data frame, R (before
version 4.0.0) by default turns the vector into a factor in which the distinct
strings are the level names. From R 4.0.0 onwards the default is
stringsAsFactors=FALSE, so set stringsAsFactors=TRUE to get this behaviour.
There are some contexts in which factors become numeric vectors. To
obtain the vector of strings, use as.character().
To extract the codes 1, 2, . . . , use as.numeric().
For a factor whose levels are character string representations of numeric
values, eg. "10", "2", "3.2", use as.numeric(as.character()) to
extract the numerical values.
When the index variable in a for loop takes factor values, the values are
the integer codes.
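The difference between the integer codes and the numeric values of the levels can be sketched as follows, using the level strings from the text:

```r
f <- factor(c("10", "2", "3.2", "10"))

levels(f)                             # "10" "2" "3.2" (sorted as strings)
codes  <- as.numeric(f)               # integer codes: 1 2 3 1
values <- as.numeric(as.character(f)) # actual numbers: 10.0 2.0 3.2 10.0
```

Calling as.numeric() directly on such a factor silently returns the codes, which is a common source of errors.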


Factors

Example: Factors

(gender<-c(rep("F",3),rep("M",5)))

## [1] "F" "F" "F" "M" "M" "M" "M" "M"

levels(gender)

## NULL

(gender<-factor(gender))

## [1] F F F M M M M M
## Levels: F M

levels(gender)

## [1] "F" "M"


Factors

Example: Factors

as.numeric(gender)

## [1] 1 1 1 2 2 2 2 2

x<-data.frame(sex=c(rep("M",3),rep("F",4)),
height=ceiling(rnorm(7,10,2)))
levels(x$sex)

## [1] "F" "M"
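The output above reflects the default of R versions before 4.0.0. A sketch that gives the same result in any version by setting stringsAsFactors explicitly (the object names x_fac and x_chr are illustrative):

```r
sex <- c(rep("M", 3), rep("F", 4))

x_fac <- data.frame(sex = sex, stringsAsFactors = TRUE)   # column stored as a factor
x_chr <- data.frame(sex = sex, stringsAsFactors = FALSE)  # column stays character

levels(x_fac$sex)   # "F" "M"
levels(x_chr$sex)   # NULL
```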


Data Frame

Data frame

Used to store tabular data.

These objects fit somewhere in between matrices and lists.
They are not as rigid as matrices: columns can be of different classes.
A data frame can contain both numerical and character vectors.
They are not as loose as lists: they have a rectangular structure.
Many analysis functions in R require a data frame.
They are constructed with the function, data.frame().
Appropriate objects can be coerced into data frame with
as.data.frame().


Data Frame Creating a Data Frame

Create Data frame

A data frame can also be constructed by reading saved data from a text file with
read.table() or read.csv().
Columns of a data frame can be referenced by index or name.
Use the $ operator to extract a column (vector) from the data frame.
names() can be used to see the column names of a data frame.
Alternatively, attach the data frame with attach() and refer to columns
directly by name.
Detach the data frame with detach().
Data frames can be converted into a matrix with data.matrix() or
as.matrix().


Data Frame Creating a Data Frame

R examples: data frame

#From Vectors#
scores<-c(50,45,90)
exams<-c("maths","english","science")
(dat1<-data.frame(exams,scores))

## exams scores
## 1 maths 50
## 2 english 45
## 3 science 90

#From Matrix#
(dat2<-data.frame(matrix(1 : 24, nrow = 4, ncol = 6)))

## X1 X2 X3 X4 X5 X6
## 1 1 5 9 13 17 21
## 2 2 6 10 14 18 22
## 3 3 7 11 15 19 23
## 4 4 8 12 16 20 24


Data Frame Creating a Data Frame

R examples: data frame

(dat3<-as.data.frame(matrix(1 : 24, nrow = 4, ncol = 6)))

## V1 V2 V3 V4 V5 V6
## 1 1 5 9 13 17 21
## 2 2 6 10 14 18 22
## 3 3 7 11 15 19 23
## 4 4 8 12 16 20 24

mat<-data.matrix(dat1) #convert data frame to matrix

mat

## exams scores
## [1,] 2 50
## [2,] 1 45
## [3,] 3 90


Data Frame Creating a Data Frame

R examples: data frame

##Subset Dataframe##
dat1$exams #subset by name

## [1] maths english science

## Levels: english maths science

dat1[,1] #subset by index

## [1] maths english science

## Levels: english maths science

dat1[,'scores'] #subset by name-index

## [1] 50 45 90


Data Frame Creating a Data Frame

R examples: data frame

attach(dat1) #attach data frame

## The following objects are masked _by_ .GlobalEnv:

##
## exams, scores

scores #prints the global 'scores', which masks the copy attached from dat1

## [1] 50 45 90


Data Frame Reading and Exporting Data

Import data

Data can be entered directly or imported into R.

Data from database management system (DBMS) can be imported
directly into R without first exporting from the system.
To import text or from foreign systems:
◦ Text data: read.table(), read.csv().
◦ Excel data: load library(xlsx) and read.xlsx().
◦ SAS data: load library(Hmisc) and sasxport.get().
◦ SPSS data: load library(Hmisc) and spss.get().
◦ STATA data: load library(foreign) and read.dta().
◦ SYSTAT data: load library(foreign) and
read.systat().


Data Frame Reading and Exporting Data

Important arguments in read.table()

The read.table function is one of the most commonly used functions for
reading data. It has a few important arguments:
file, the name of a file, or a connection.
header, logical indicating if the file has a header line.
sep, a string indicating how the columns are separated.
colClasses, a character vector indicating the class of each column in
the dataset.
nrows, the number of rows in the dataset.
comment.char, a character string indicating the comment character.
skip, the number of lines to skip from the beginning.
stringsAsFactors, should character variables be coded as factors?
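A minimal sketch of these arguments in use. To stay self-contained it reads from an in-memory string through the text argument of read.table() instead of a file, and the three data rows are taken from the airquality example shown earlier:

```r
txt <- "Ozone Solar.R Wind Temp Month Day
41 190 7.4 67 5 1
36 118 8.0 72 5 2
NA NA 14.3 56 5 5"

dat <- read.table(text = txt,
                  header = TRUE,            # first line holds the column names
                  sep = "",                 # columns separated by white space
                  colClasses = c("integer", "integer", "numeric",
                                 "integer", "integer", "integer"),
                  nrows = 3,                # number of data rows to read
                  stringsAsFactors = FALSE) # keep any character columns as character
str(dat)
```

For a real file, the text argument would be replaced by the file name as the first argument.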


Export data

There are several ways to export R objects into other formats.

For SPSS, SAS and STATA, first load the package foreign.
For Excel, load the xlsx or openxlsx package, which provide write.xlsx().
To export:
◦ To tab delimited text file: write.table(mydata,
"C:/dat.txt", sep="\t")
◦ Excel spreadsheet: write.xlsx(mydata, "C:/dat.xlsx").
◦ SAS: write.foreign(mydata, "C:/dat.txt",
"C:/dat.sas", package="SAS").
◦ SPSS: write.foreign(mydata, "C:/dat.txt",
"C:/dat.sps", package="SPSS").
◦ STATA: write.dta(mydata, "C:/dat.dta").
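
A quick way to check an export is to read the file back. The sketch below assumes a small made-up data frame and writes to a temporary file (txt_file is a hypothetical name) so that it does not depend on a C:/ drive.

```r
# Small made-up data frame to export
mydata <- data.frame(id = 1:3, score = c(50, 45, 90))

# Write to a temporary file so the sketch runs on any machine
txt_file <- tempfile(fileext = ".txt")
write.table(mydata, txt_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the file back to confirm that the export worked
check <- read.table(txt_file, header = TRUE, sep = "\t")
check
```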


Activity

Enter the following data in a .txt file (NA marks a missing value).

Ozone Solar.R Wind Temp Month Day
41    190     7.4  67   5     1
36    118     8.0  72   5     2
12    149     12.6 74   5     3
18    313     11.5 62   5     4
NA    NA      14.3 56   5     5
28    NA      14.9 66   5     6


R examples: import data into R

#mydata<- read.table("C:/Datasets/mydata.txt",header=TRUE)
mydata<-head(airquality)
head(mydata,n=3) #print first 3 rows of mydata

## Ozone Solar.R Wind Temp Month Day

## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3

tail(mydata,n=3) #print last 3 rows of mydata

## Ozone Solar.R Wind Temp Month Day

## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6


R examples: export data from R

##Export complete cases

expdat<-mydata[complete.cases(mydata),]
#the target folder must already exist, e.g. dir.create("./Datasets");
#otherwise R reports the warning and error shown below
write.table(expdat, "./Datasets/expdat.txt",sep="," )

## Warning in file(file, ifelse(append, "a", "w")): cannot
## open file './Datasets/expdat.txt': No such file or directory
## Error in file(file, ifelse(append, "a", "w")): cannot open
## the connection

#install.packages('xlsx')
#install.packages('openxlsx')
library(openxlsx)
write.xlsx(expdat,"./Datasets/expdat.xlsx" )

## Note: zip::zip() is deprecated, please use zip::zipr()
## instead
## Warning in file.create(to[okay]): cannot create file
## './Datasets/expdat.xlsx', reason 'No such file or directory'


Activity

Enter the following data into Excel

Save data as .csv file
Import data into R

Names Ages Sex

Ben 14 M
Jullie 17 F
Fred 13 M
Ama 14 F
Vic 13 F
Joe 16 M
Sam 9 M
Ellen 11 F


Data Manipulation Creating and Renaming Datasets

Creating new variable

A new variable in a dataset is created by using the assignment operator <-.

##Create new variable##
hospital <- c("Kolebu", "37 Military", "Police", "Legon")
patients <- c(150, 350, 200,500)
costs <- c(3.1, 2.5, 2.9,2.0)
(HosDat <- data.frame(hospital, patients, costs))
## hospital patients costs
## 1 Kolebu 150 3.1
## 2 37 Military 350 2.5
## 3 Police 200 2.9
## 4 Legon 500 2.0


Creating new variable

##Create new variable##

HosDat$totcosts <- HosDat$patients *HosDat$costs
HosDat
## hospital patients costs totcosts
## 1 Kolebu 150 3.1 465
## 2 37 Military 350 2.5 875
## 3 Police 200 2.9 580
## 4 Legon 500 2.0 1000

##Alternatively, using the 'transform()' function

(mydata <- transform(HosDat, totalcost2 = patients*costs))
## hospital patients costs totcosts totalcost2
## 1 Kolebu 150 3.1 465 465
## 2 37 Military 350 2.5 875 875
## 3 Police 200 2.9 580 580
## 4 Legon 500 2.0 1000 1000


Recoding variables

We can also recode a variable.

## For 2-categories
HosDat$costs.cat<-ifelse(HosDat$costs <= 2.5,
"Cheap","Expensive")

##For more than 2-categories
#note the comparison operator '==' (a single '=' does not test equality)
HosDat$costs.cat2[HosDat$costs==2.5]<-"Normal"
HosDat$costs.cat2[HosDat$costs<2.5]<-"Cheap"
HosDat$costs.cat2[HosDat$costs>2.5]<-"Expensive"
HosDat
## hospital patients costs totcosts costs.cat costs.cat2
## 1 Kolebu 150 3.1 465 Expensive Expensive
## 2 37 Military 350 2.5 875 Cheap Normal
## 3 Police 200 2.9 580 Expensive Expensive
## 4 Legon 500 2.0 1000 Cheap Cheap
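
When a numeric variable is recoded into several ordered bands, the base function cut() is a possible alternative to repeated indexed assignments. The sketch below reuses the cost values from the slides; the cut points 2.4 and 2.9 and the name costs.cat3 are assumptions chosen only for illustration, so the bands differ slightly from costs.cat2 above.

```r
# Cost values copied from the slides
costs <- c(3.1, 2.5, 2.9, 2.0)

# Hypothetical bands: (-Inf, 2.4] Cheap, (2.4, 2.9] Normal, (2.9, Inf) Expensive
costs.cat3 <- cut(costs,
                  breaks = c(-Inf, 2.4, 2.9, Inf),
                  labels = c("Cheap", "Normal", "Expensive"))
data.frame(costs, costs.cat3)
```

By default the intervals are closed on the right, so a cost of exactly 2.9 falls in the "Normal" band.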


Renaming variables

Variables can be renamed interactively with the function fix() or
programmatically with the rename() function from the reshape
package.
##Rename variable##
#fix(HosDat) #rename interactively (Change hospital = Hospital,
#patients=Patient)

#install.packages('reshape')
library(reshape)
HosDat<-rename(HosDat, c(costs="Costs",totcosts="TotalCosts"))
HosDat
## hospital patients Costs TotalCosts costs.cat costs.cat2
## 1 Kolebu 150 3.1 465 Expensive Expensive
## 2 37 Military 350 2.5 875 Cheap Normal
## 3 Police 200 2.9 580 Expensive Expensive
## 4 Legon 500 2.0 1000 Cheap Cheap
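
Renaming is also possible in base R, without an extra package, by assigning to names(). The sketch below uses a cut-down, two-row copy of the hospital data for illustration.

```r
# Cut-down copy of the hospital data (two rows only, for illustration)
HosDat <- data.frame(hospital = c("Kolebu", "Legon"),
                     patients = c(150, 500),
                     costs = c(3.1, 2.0))

# Rename the column 'costs' to 'Costs' by matching on its current name
names(HosDat)[names(HosDat) == "costs"] <- "Costs"
names(HosDat)
```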


Activity

Assume that we have registered the height and weight for seven people:
Heights in cm are 180, 165, 160, 193, 163, 145, 200; weights in kg are 87, 58,
65, 100, 150, 100, 75. Make two vectors, height and weight, with the data. The
body mass index (BMI) is defined as
$$\mathrm{BMI} = \frac{\text{weight in kg}}{(\text{height in m})^2}$$
Create a data frame. Make a column with the BMI values for the seven people,
and a column with the natural logarithm of the BMI values.

Make a column that classifies the BMI values according to the following classification.

BMI Classification
< 18.5 Underweight
18.5 − 24.9 Normal weight
25.0 − 29.9 Overweight
≥ 30 Obese

Export your data as a csv file.

Data Manipulation Sorting and Subsetting Datasets

sorting and subsetting data sets

To sort a data frame by a variable, use the order() function (optionally inside with()).

Sorting is ASCENDING by default. To sort in DESCENDING order, prefix
the sorting variable with a minus sign (this does not work directly for factors).
#Using the 'iris' dataset
irisdata<- iris

##Sort ascending by Sepal.Length

irisdata<-irisdata[order(irisdata$Sepal.Length),]
head(irisdata)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 14 4.3 3.0 1.1 0.1 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 39 4.4 3.0 1.3 0.2 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 4 4.6 3.1 1.5 0.2 setosa


sorting and subsetting data sets

#Sort ascending by Species

irisdata<-irisdata[order(irisdata$Species),]
head(irisdata)

## Sepal.Length Sepal.Width Petal.Length Petal.Width Species

## 14 4.3 3.0 1.1 0.1 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 39 4.4 3.0 1.3 0.2 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 4 4.6 3.1 1.5 0.2 setosa


sorting and subsetting data sets

##Sort descending
irisdata<-irisdata[order(-irisdata$Sepal.Length),]
head(irisdata)

## Sepal.Length Sepal.Width Petal.Length Petal.Width Species

## 132 7.9 3.8 6.4 2.0 virginica
## 118 7.7 3.8 6.7 2.2 virginica
## 119 7.7 2.6 6.9 2.3 virginica
## 123 7.7 2.8 6.7 2.0 virginica
## 136 7.7 3.0 6.1 2.3 virginica
## 106 7.6 3.0 6.6 2.1 virginica


sorting and subsetting data sets

##Sort with two variables

irisdata<-irisdata[order(irisdata$Species, irisdata$Petal.Length),]
head(irisdata)

## Sepal.Length Sepal.Width Petal.Length Petal.Width Species

## 23 4.6 3.6 1.0 0.2 setosa
## 14 4.3 3.0 1.1 0.1 setosa
## 15 5.8 4.0 1.2 0.2 setosa
## 36 5.0 3.2 1.2 0.2 setosa
## 37 5.5 3.5 1.3 0.2 setosa
## 17 5.4 3.9 1.3 0.4 setosa
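
Because the minus sign cannot be applied to a factor directly, one possible workaround is to convert the factor to its integer codes with xtfrm() first. The sketch below sorts iris by Species in descending order and by Sepal.Length in ascending order within each species; the name sorted is illustrative.

```r
# Species descending (virginica first), Sepal.Length ascending within species
sorted <- iris[order(-xtfrm(iris$Species), iris$Sepal.Length), ]
head(sorted, n = 3)
```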


Activity

Enter the data below in notepad and import in R.

id h1 h2 h3 w1 w2 w3 sex
1 1 11 101 5 25 35 male
2 2 12 102 6 26 36 male
3 3 13 103 7 27 37 male
4 4 14 104 8 28 38 female
5 5 15 105 9 29 39 female
The following code can be used to import the data.
wide<-read.table("./Datasets/wide.txt", header=T)
#the warning and error below mean that the file was not found at this path

## Warning in file(file, "rt"): cannot open file
## './Datasets/wide.txt': No such file or directory
## Error in file(file, "rt"): cannot open the connection

##OR wide<-read.table(file.choose(), header=T)
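
If the file is not available, the same data can be supplied inline through the text argument of read.table(), as sketched below. Since the header line has one field fewer than the data lines, read.table() treats the first field of each data line as a row name.

```r
# The activity data typed inline instead of being read from ./Datasets/wide.txt
wide <- read.table(header = TRUE, text = "
id h1 h2 h3 w1 w2 w3 sex
1 1 11 101 5 25 35 male
2 2 12 102 6 26 36 male
3 3 13 103 7 27 37 male
4 4 14 104 8 28 38 female
5 5 15 105 9 29 39 female")

str(wide)
```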


subsetting data sets

##Keep and drop variables

#select with variable name
(newWide<-wide[c('h1','h2', 'w1','w2', 'sex', 'id')])

## Error in eval(expr, envir, enclos): object ’wide’ not found

(newWide1<-wide[c(1:2,7:8)]) #select with variable position

## Error in eval(expr, envir, enclos): object ’wide’ not found


subsetting data sets

##Keep and drop variables

(newWide2<-wide[-c(1:2,7)]) #exclude with variable position

## Error in eval(expr, envir, enclos): object ’wide’ not found


subset() function

We can delete or keep variables and observations.

An easier way to select variables and observations is by using the
subset() function.
attach(wide)
## Error in attach(wide): object 'wide' not found
#selecting observations
wide[(sex=="male" & w3 > 35),]
## Error in eval(expr, envir, enclos): object 'wide' not
## found
#selecting variables and observations
subset(wide, w3 > 35 & sex=='male', select=id:sex)
## Error in subset(wide, w3 > 35 & sex == "male", select =
## id:sex): object 'wide' not found
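
The errors above arise only because wide was never read in. As a sketch of what the two selections return, the example below rebuilds the activity data with data.frame() and selects without attach(). The name males and the choice of columns in select are illustrative assumptions.

```r
# Activity data rebuilt inline so that the example is self-contained
wide <- data.frame(id = 1:5,
                   h1 = 1:5, h2 = 11:15, h3 = 101:105,
                   w1 = 5:9, w2 = 25:29, w3 = 35:39,
                   sex = c("male", "male", "male", "female", "female"))

# Selecting observations with logical indexing (no attach() needed)
wide[wide$sex == "male" & wide$w3 > 35, ]

# Selecting observations and variables in one step with subset()
males <- subset(wide, w3 > 35 & sex == "male", select = c(id, w3, sex))
males
```

Only the males with id 2 and 3 are kept, because the first male has w3 equal to 35, which is not greater than 35.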


Activity

For the BMI data above make a vector with the weights for those people who
have a BMI larger than 25.


Data Manipulation Merging and Aggregating

Merging and Aggregating

Two datasets can be merged by one or more common keys with the
command merge(data.frameA, data.frameB,
by=c("common key variable")).
By default, merge() returns only the rows of the left table that have
matching keys in the right table. This is called an inner join.
Other types of merge:
◦ An outer join of data.frameA and data.frameB:
Returns all rows from both tables, joining records from the left that have matching keys in the
right table.
◦ A left outer join (or simply left join) of data.frameA and
data.frameB:
Returns all rows from the left table, and any rows with matching keys from the right table.
◦ A right outer join of data.frameA and data.frameB:
Returns all rows from the right table, and any rows with matching keys from the left table.


Data Manipulation Merging and Appending

Merging and Aggregating

Append datasets with the rbind(data.frameA, data.frameB)
command.
Examples
##Merging data###
df1<-data.frame(CustomerId = c(1:6),
Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2<-data.frame(CustomerId = c(2, 4, 6),
State = c(rep("Alabama",2), rep("Ohio", 1)))

#Inner join
merge(df1, df2,by="CustomerId")
## CustomerId Product State
## 1 2 Toaster Alabama
## 2 4 Radio Alabama
## 3 6 Radio Ohio
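
Appending with rbind() stacks the rows of two data frames that share the same column names. A minimal sketch with two made-up data frames (df_a and df_b are hypothetical names):

```r
# Two made-up data frames with identical column names
df_a <- data.frame(CustomerId = 1:2, Product = c("Toaster", "Radio"))
df_b <- data.frame(CustomerId = 3:4, Product = c("Radio", "Toaster"))

# Stack the rows of df_b underneath those of df_a
both <- rbind(df_a, df_b)
both
```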


Merging and Aggregating

#Outer join
merge(x = df1, y = df2, by = "CustomerId", all = TRUE)
## CustomerId Product State
## 1 1 Toaster <NA>
## 2 2 Toaster Alabama
## 3 3 Toaster <NA>
## 4 4 Radio Alabama
## 5 5 Radio <NA>
## 6 6 Radio Ohio


Merging

#Left outer
merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)

## CustomerId Product State

## 1 1 Toaster <NA>
## 2 2 Toaster Alabama
## 3 3 Toaster <NA>
## 4 4 Radio Alabama
## 5 5 Radio <NA>
## 6 6 Radio Ohio

#Right outer
merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)
## CustomerId Product State
## 1 2 Toaster Alabama
## 2 4 Radio Alabama
## 3 6 Radio Ohio

#Cross join
#merge(x = df1, y = df2, by = NULL)

Activity

Create and import the following datasets into R.

Table: Books
Name      Title                          Other Author
Tukey     Exploratory Data Analysis
Venables  Modern Applied Statistics ...  Ripley
Tierney   LISP-STAT
Ripley    Spatial Statistics
Ripley    Stochastic Simulation
McNeil    Interactive Data Analysis
R Core    An Introduction to R           Venables & Smith

Table: Authors
Surname   Nationality  Deceased
Tukey     US           yes
Venables  Australia    no
Tierney   US           no
Ripley    UK           no
McNeil    Australia    no

Merge the two datasets using the various merge types.

Data Manipulation Aggregating

Aggregating

It is relatively easy to collapse data in R using one or more BY variables
and a defined function.
This is demonstrated in the example below using the aggregate()
function.
(aggdata<-aggregate(irisdata[,c(-5)],by=list(irisdata$Species),
FUN=mean, na.rm=TRUE))
## Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1 setosa 5.006 3.428 1.462 0.246
## 2 versicolor 5.936 2.770 4.260 1.326
## 3 virginica 6.588 2.974 5.552 2.026
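
aggregate() also has a formula interface, which some readers may find easier to read and which names the grouping column after the BY variable instead of Group.1. A minimal sketch on the built-in iris data (agg is an illustrative name):

```r
# Mean of every other column, grouped by Species
agg <- aggregate(. ~ Species, data = iris, FUN = mean)
agg
```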


Aggregating

attach(mtcars)
# ?mtcars
(agdata <-aggregate(mtcars, by=list(cyl,vs),FUN=mean, na.rm=TRUE))

## Group.1 Group.2 mpg cyl disp hp drat wt qsec

## 1 4 0 26.00000 4 120.30 91.0000 4.430000 2.140000 16.70000
## 2 6 0 20.56667 6 155.00 131.6667 3.806667 2.755000 16.32667
## 3 8 0 15.10000 8 353.10 209.2143 3.229286 3.999214 16.77214
## 4 4 1 26.73000 4 103.62 81.8000 4.035000 2.300300 19.38100
## 5 6 1 19.12500 6 204.55 115.2500 3.420000 3.388750 19.21500
## vs am gear carb
## 1 0 1.0000000 5.000000 2.000000
## 2 0 1.0000000 4.333333 4.666667
## 3 0 0.1428571 3.285714 3.500000
## 4 1 0.7000000 4.000000 1.500000
## 5 1 0.0000000 3.500000 2.500000


Data Manipulation Reshaping data sets

R Examples on reshaping data sets from Wide to Long

Reshaping a dataset from wide to long, or from long to wide, is easily done with
the base function reshape(), or with the melt() and cast() functions from the
reshape package.
str(reshape)

## function (data, varying = NULL, v.names = NULL, timevar = "time",

## idvar = "id", ids = 1L:NROW(data), times = seq_along(varying[[1L]]),
## drop = NULL, direction, new.row.names = NULL, sep = ".", split = if (sep ==
## "") {
## list(regexp = "[A-Za-z][0-9]", include = TRUE)
## } else {
## list(regexp = sep, include = FALSE, fixed = TRUE)
## })


Simple R Example on reshape from Wide to Long

#Simple Example: Wide to Long#

(wide<-read.table("./Datasets/wide.txt", header=T))

## Warning in file(file, "rt"): cannot open file
## './Datasets/wide.txt': No such file or directory
## Error in file(file, "rt"): cannot open the connection


Simple R Example on reshape from Wide to Long

#Simple Example: Wide to Long#

long<-reshape(wide,varying=list(c("h1","h2","h3"), c("w1","w2","w3")),
v.names=c("h","w"), times=1:3,direction="long")

## Error in idvar %in% names(data): object ’wide’ not found

head(long)

## Error in head(long): object ’long’ not found
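
The errors above come only from the missing input file. To show what the call produces, the sketch below rebuilds the activity data inline with data.frame() and applies the same reshape() call.

```r
# Activity data rebuilt inline so that the example is self-contained
wide <- data.frame(id = 1:5,
                   h1 = 1:5, h2 = 11:15, h3 = 101:105,
                   w1 = 5:9, w2 = 25:29, w3 = 35:39,
                   sex = c("male", "male", "male", "female", "female"))

# Stack h1..h3 into one column 'h' and w1..w3 into one column 'w'
long <- reshape(wide,
                varying = list(c("h1", "h2", "h3"), c("w1", "w2", "w3")),
                v.names = c("h", "w"), times = 1:3, direction = "long")
head(long)
```

Each subject now contributes three rows, one per value of the new time variable.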


Simple R Example on reshape from Long to Wide

#Long to Wide#
long<-long[order(long$id),]

## Error in eval(expr, envir, enclos): object ’long’ not found

head(long)

## Error in head(long): object ’long’ not found


Simple R Example on reshape from Long to Wide

#Long to Wide#
w<-reshape(long, timevar = "time", idvar = c("id", "sex"),
direction = "wide")

## Error in reshape(long, timevar = "time", idvar = c("id",

"sex"), direction = "wide"): object ’long’ not found

## Error in eval(expr, envir, enclos): object ’w’ not found
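
For the reverse direction, the sketch below builds a small long-format data frame by hand (same values as the activity data) and spreads it back to wide format with the call used above.

```r
# Long-format data built by hand: 5 subjects measured at 3 time points
long <- data.frame(id   = rep(1:5, times = 3),
                   sex  = rep(c("male", "male", "male", "female", "female"), times = 3),
                   time = rep(1:3, each = 5),
                   h    = c(1:5, 11:15, 101:105),
                   w    = c(5:9, 25:29, 35:39))

# One row per subject; h and w are spread into h.1, w.1, ..., h.3, w.3
w <- reshape(long, timevar = "time", idvar = c("id", "sex"),
             direction = "wide")
w
```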


Activity

Convert the following data into the long format.

Code Country 1950 1951 1952 1953 1954
1 GH Ghana 20,249 21,352 22,532 23,557 24,555
2 NGR Nigeria 8,097 8,986 10,058 11,123 12,246
3 SA South Africa 12,004 23,024 30,345 32,100 44,456

Convert back to the wide format.


Exercise

Below is a dataset in wide format. Students have been measured using five
metrics: read, write, math, science, and socst.
id female race ses schtyp prog read write math science socst
70 0 4 1 1 1 57 52 41 47 57
121 1 4 2 1 3 68 59 53 63 61
86 0 4 3 1 1 44 33 54 58 31
141 0 4 3 1 3 63 44 47 53 56
172 0 4 2 1 2 47 52 57 53 61
113 0 4 2 1 2 44 52 51 63 61
50 0 3 2 1 1 50 59 42 53 61
11 0 1 2 1 2 34 46 45 39 36
84 0 4 2 1 1 63 57 54 58 51
48 0 3 2 1 2 57 55 52 50 51

Reformat this dataset into long form, using the reshape function.


Special Functions

The apply family of functions

Loops are useful in programming but can be particularly cumbersome when
working interactively on the command line.
The apply family of functions implements looping in a compact way.
Examples of these functions are lapply(), sapply(), tapply(), apply() and
mapply().
They work as follows:
◦ lapply: loop over a list and evaluate a function on each element.
◦ sapply: same as lapply but try to simplify the result.
◦ apply: apply a function over the margins of an array (rows or
columns).
◦ tapply: apply a function over subsets of a vector.
◦ mapply: multivariate version of sapply.
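
mapply() is the only member of the family listed above that walks over several vectors in parallel. A minimal sketch (sums and reps are illustrative names):

```r
# Element-wise sums: 1+4, 2+5, 3+6
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
sums

# rep(1, 3), rep(2, 2), rep(3, 1): lengths differ, so a list is returned
reps <- mapply(rep, 1:3, 3:1)
reps
```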


lapply() and sapply()

Applies to a list or data frame (since a data frame has the structure of a list
of columns).
Used to apply a function to each column of a data frame in turn.
The result of lapply() is always a list.
sapply() is used to simplify the result into a vector or matrix.
◦ If the result is a list where every element is of length 1, then a vector
is returned.
◦ If the result is a list where every element is a vector of the same
length (>1), a matrix is returned.
◦ If it can't figure things out, a list is returned.


R Examples on lapply() and sapply()

x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1),

d = rnorm(100, 5))
str(lapply)

## function (X, FUN, ...)

lapply(x, mean)

## $a
## [1] 2.5
##
## $b
## [1] -0.6856285
##
## $c
## [1] 0.9688946
##
## $d
## [1] 5.045283


R Examples on lapply() and sapply()

str(sapply)

## function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

sapply(x,mean)

## a b c d
## 2.5000000 -0.6856285 0.9688946 5.0452832


R Examples on lapply() and sapply()

lapply(iris[,-5],mean,na.rm=TRUE)

## $Sepal.Length
## [1] 5.843333
##
## $Sepal.Width
## [1] 3.057333
##
## $Petal.Length
## [1] 3.758
##
## $Petal.Width
## [1] 1.199333

sapply(iris[,-5],mean,na.rm=TRUE)

## Sepal.Length Sepal.Width Petal.Length Petal.Width

## 5.843333 3.057333 3.758000 1.199333


R Examples on lapply() and sapply()

x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))

lapply(x, function(col) col[,1])

## $a
## [1] 1 2
##
## $b
## [1] 1 2 3

sapply(x, function(col) col[,1])

## $a
## [1] 1 2
##
## $b
## [1] 1 2 3
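
In the example above sapply() falls back to a list because the two results have different lengths. For contrast, the sketch below shows the matrix case, where every result has the same length greater than 1. The list x and the name r are made up for illustration.

```r
# range() returns two values (minimum and maximum) for every element
x <- list(a = 1:4, b = c(10, 20, 30))
r <- sapply(x, range)
r   # a 2 x 2 matrix with one column per list element
```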


apply() function

Used to evaluate a function over the margins of an array.

Can also be applied to a data frame, which is first coerced to a matrix; for
column summaries of numeric columns it gives the same results as sapply().

Its first argument is an array or data frame, the second specifies the
margin.

Specify MARGIN=1 to apply a function to each row in turn, 2 when the
function is to be applied to each column in turn and a number greater
than 2 if the argument is an array of more than two dimensions.

There are shortcuts to find the sums and means of matrix dimensions.
◦ rowSums(x) = apply(x, 1, sum)
◦ rowMeans(x) = apply(x, 1, mean)
◦ colSums(x) = apply(x, 2, sum)
◦ colMeans(x) = apply(x, 2, mean)

R Examples on apply() function

attach(iris)
x <- matrix(rnorm(200), 20, 10)
apply(x, 1, quantile, probs = c(0.25, 0.75))

## [,1] [,2] [,3] [,4] [,5] [,6]

## 25% -0.4084939 -0.1802337 -0.18559466 -0.3762959 0.03392191 -0.8586708
## 75% 0.9235477 0.5131000 0.09425262 0.5606906 0.51882212 0.5261400
## [,7] [,8] [,9] [,10] [,11] [,12]
## 25% -0.4185235 -0.95517398 -0.4549109 -0.6282301 -0.5888163 -1.3137031
## 75% 1.0384544 -0.09248019 0.9809233 0.1271469 0.7732049 0.4801687
## [,13] [,14] [,15] [,16] [,17] [,18]
## 25% 0.2723333 -0.5674557 -0.7895098 -1.056187 -0.4533377 -0.9828001
## 75% 0.8751587 0.1775895 1.0717022 1.563150 1.3608930 0.1409344
## [,19] [,20]
## 25% -0.4863923 -0.06651604
## 75% 0.4251499 0.82322584

apply(iris[,-5],2,mean)

## Sepal.Length Sepal.Width Petal.Length Petal.Width

## 5.843333 3.057333 3.758000 1.199333
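
The equivalence between the shortcut functions and apply() can be checked on a small made-up matrix, as in the sketch below (m is an illustrative name):

```r
# A 2 x 3 matrix filled column by column with 1, ..., 6
m <- matrix(1:6, nrow = 2)

# Row means via the shortcut and via apply() with MARGIN = 1
all.equal(rowMeans(m), apply(m, 1, mean))

# Column sums via the shortcut and via apply() with MARGIN = 2
all.equal(colSums(m), apply(m, 2, sum))
```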


tapply() function

Arguments are a variable, a list of factors and a function that operates on a
vector to return a single value.
The output is an array with as many dimensions as there are factors.

library(MASS)
attach(cabbages)
attach(iris)

## The following objects are masked from iris (pos = 5):

##
## Petal.Length, Petal.Width, Sepal.Length, Sepal.Width, Species

str(tapply)

## function (X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)

x <- c(rnorm(10), runif(10), rnorm(10, 1))


tapply() function

f <- gl(3, 10)

tapply(x, f, mean)

## 1 2 3
## -0.2104579 0.5430863 1.1316653

tapply(iris$Sepal.Length,list(iris$Species), mean)

## setosa versicolor virginica

## 5.006 5.936 6.588

tapply(HeadWt,list(Cult, Date),mean)

## d16 d20 d21

## c39 3.18 2.80 2.74
## c52 2.26 3.11 1.47


Graphical Functions

Exploratory graphs

Graphs are very useful in data analysis.

They are used for
◦ understanding data properties
◦ finding underlying patterns in data
◦ suggesting modeling strategies
◦ communicating results, etc.
Univariate plots: stem-and-leaf plot, histograms, boxplots, barplots,
density plots, qqplots.
Bivariate plots: scatterplots, line plots.
Several figures can be plotted on one page with the function
par(mfrow=c(n,k)).
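
A minimal sketch of a multi-panel page with par(mfrow = ...), using the built-in faithful data. Saving the previous settings in old_par (an illustrative name) makes it possible to restore the one-plot layout afterwards.

```r
# Split the plotting page into 1 row and 2 columns, keeping the old settings
old_par <- par(mfrow = c(1, 2))

hist(faithful$waiting, main = "Histogram", xlab = "Waiting time")
boxplot(faithful$waiting, main = "Boxplot")

# Restore the previous layout (one plot per page)
par(old_par)
```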


Stem-and-leaf

Textual graph that classifies data items according to their most significant
numeric digits.
Used to study the distribution of a continuous random variable.
Created using the function stem().
attach(faithful)
stem(faithful$waiting)

##
## The decimal point is 1 digit(s) to the right of the |
##
## 4 | 3
## 4 | 55566666777788899999
## 5 | 00000111111222223333333444444444
## 5 | 555555666677788889999999
## 6 | 00000022223334444
## 6 | 555667899
## 7 | 00001111123333333444444
## 7 | 555555556666666667777777777778888888888888889999999999
## 8 | 000000001111111111111222222222222333333333333334444444444
## 8 | 55555566666677888888999
## 9 | 00000012334

Histogram or Ogives

A histogram consists of vertical bars that show graphically the frequency
distribution of a quantitative variable.

A standard histogram is created with hist(x,...).

Knowing a few options can help make your histogram look exactly how
you want it.


Histogram or Ogives

BMI<-rnorm(n=1000, mean=24.2, sd=2.2)

hist(BMI)

[Figure: histogram titled "Histogram of BMI"; x-axis: BMI, y-axis: Frequency]

Histogram or Ogives

hist(BMI, breaks=20, main="Breaks=20")

[Figure: histogram titled "Breaks=20"; x-axis: BMI, y-axis: Frequency]

Histogram or Ogives

#plotting with densities instead of frequencies

hist(BMI, freq=FALSE, main="Density plot")

[Figure: histogram titled "Density plot"; x-axis: BMI, y-axis: Density]

Histogram or Ogives

hist(BMI, freq=FALSE, xlab="Body Mass Index",

main="Distribution",
col="lightgreen", xlim=c(15,35), ylim=c(0, .20))
# Add a normal curve
curve(dnorm(x, mean=mean(BMI), sd=sd(BMI)),
add=TRUE, col="darkblue", lwd=2)

[Figure: density histogram titled "Distribution" with a normal curve overlaid; x-axis: Body Mass Index, y-axis: Density]

QQ-Plot and Box-Plot

QQ-Plot: To check if univariate data is close to being normal.

The R command qqnorm() is used to generate it and qqline() adds a
reference line through the first and third quartiles.
It plots the sample quantiles vs the theoretical quantiles.
Boxplot: graphical way to summarize data.
Automatically computes the median and the first and third quartiles; with
notch=TRUE it also marks an approximate 95% CI of the median.
Can detect outliers and compare distributions.
Generated with the function boxplot().


Example: QQ-Plot and Box-Plot

##QQPlot
qqnorm(faithful$waiting)
qqline(faithful$waiting)

[Figure: "Normal Q-Q Plot" of faithful$waiting; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

Example: QQ-Plot and Box-Plot

##Boxplot
boxplot(faithful$waiting)
[Figure: vertical boxplot of faithful$waiting]

Example: QQ-Plot and Box-Plot

##Boxplot
boxplot(faithful$waiting, main="Boxplot of Time Waited",
xlab="Time waited", horizontal=TRUE)

[Figure: horizontal boxplot titled "Boxplot of Time Waited"; x-axis: Time waited]

Example: QQ-Plot and Box-Plot

##Histogram
hist(faithful$waiting,main="Histogram of Time Waited",
xlab="Time waited")

[Figure: histogram of faithful$waiting; x-axis: Time waited, y-axis: Frequency]

Scatter plots and line plots

Scatter Plot: There are many ways to create a scatter plot of two
quantitative variables.
Created with the basic plot function plot(x,y,...).
It displays the pairs of values of the vectors x and y in a Cartesian
diagram.
Used to reveal the relationship between the variables.
There are several options to change the default plot. For example,
◦ pch - change symbol of points
◦ cex - change size of points and text
◦ adj - shift title to left or right, etc.
Line plot: control the type of line connecting points with the option
type=.
Example: "p" for points, "l" for lines, "b" for both, "s" for stair
steps, "n" for no plotting.

Codes: Scatter plots and line plots

##Scatter plots
par(mfrow=c(1,2))
plot(eruptions, waiting, # plot the variables
xlab="Eruption duration", # x-axis label
ylab="Time waited",main="Scatterplot1")
abline(lm(waiting~eruptions))
#change symbol of points and size of points.
plot(eruptions, waiting,main="Scatterplot2",adj=0,
xlab="Eruption duration", cex=2, ylab="Time waited",pch=2,col=3)

[Figure: two side-by-side scatterplots titled "Scatterplot1" and "Scatterplot2"; x-axis: Eruption duration, y-axis: Time waited]

Codes: Scatter plots and line plots

##Line plot
x<-rnorm(25,0,1)
x<-sort(x)
Fn<-order(x)/length(x)
#cbind(x,order(x),Fn)
par(mfrow=c(1,2))
plot(x,Fn)
plot(x,Fn,type="l")
[Figure: two panels plotting Fn against x, as points (left) and as a line (right)]

Codes: Scatter plots and line plots

##Line plot
par(mfrow=c(1,2))
plot(x,Fn,type="b")
plot(x,Fn,type="s")
[Figure: two panels plotting Fn against x, with type="b" (left) and type="s" (right)]

Symmetry2011 1 K Horn
100% (1)
Symmetry2011 1 K Horn
31 pages
MATH 223: Calculus II: Dr. Joseph K. Ansong
No ratings yet
MATH 223: Calculus II: Dr. Joseph K. Ansong
18 pages
Linear Algebra Matrices
No ratings yet
Linear Algebra Matrices
20 pages
10 - 19UMABS302 - C - 7 - 3219UMABS302 - M III (DM) Unit - 5 Module
No ratings yet
10 - 19UMABS302 - C - 7 - 3219UMABS302 - M III (DM) Unit - 5 Module
35 pages
Target Mathematics By:-Agyat Gupta: Pre-Board Examination 2010 - 11 Class - Xii Cbse Mathematics
No ratings yet
Target Mathematics By:-Agyat Gupta: Pre-Board Examination 2010 - 11 Class - Xii Cbse Mathematics
4 pages
Multivariate Final PDF
No ratings yet
Multivariate Final PDF
261 pages
Multivariate Final PDF
No ratings yet
Multivariate Final PDF
261 pages
Linear Algebra Applications
No ratings yet
Linear Algebra Applications
25 pages
Resolving 3D Forces
No ratings yet
Resolving 3D Forces
5 pages
Critical Thinking Book
No ratings yet
Critical Thinking Book
327 pages
MATH 223: Calculus II: Dr. Joseph K. Ansong
No ratings yet
MATH 223: Calculus II: Dr. Joseph K. Ansong
14 pages
Complete - CCEA C3 January 2014 Unofficial Mark Scheme
No ratings yet
Complete - CCEA C3 January 2014 Unofficial Mark Scheme
9 pages
MATH 223: Calculus II: Dr. Joseph K. Ansong
No ratings yet
MATH 223: Calculus II: Dr. Joseph K. Ansong
18 pages
STAT301 Notes
No ratings yet
STAT301 Notes
168 pages
Composite Functions (Y11)
No ratings yet
Composite Functions (Y11)
8 pages
STAT 606 - Some Empirical Methods Slides - 2019 - 2020
No ratings yet
STAT 606 - Some Empirical Methods Slides - 2019 - 2020
39 pages
Alm Gren
No ratings yet
Alm Gren
72 pages
MATH 223: Calculus II: Dr. Joseph K. Ansong
No ratings yet
MATH 223: Calculus II: Dr. Joseph K. Ansong
38 pages
Dipmaths
No ratings yet
Dipmaths
2 pages
Math 10 Peta 3rd Quarter
No ratings yet
Math 10 Peta 3rd Quarter
2 pages
Espinosa Mark Joven D
No ratings yet
Espinosa Mark Joven D
4 pages
MATH 223: Calculus II: Dr. Joseph K. Ansong
No ratings yet
MATH 223: Calculus II: Dr. Joseph K. Ansong
34 pages
ch5 Linear Equations in Two Unknowns
No ratings yet
ch5 Linear Equations in Two Unknowns
12 pages
USA Schools
No ratings yet
USA Schools
30 pages
Matlab - Variables: and With
No ratings yet
Matlab - Variables: and With
5 pages
334 Exx3
No ratings yet
334 Exx3
7 pages
Yaliserves: Yaliserves Business Business Analysis Analysis Workbook Workbook
No ratings yet
Yaliserves: Yaliserves Business Business Analysis Analysis Workbook Workbook
13 pages
Fourier - Series - Fact - Sheet - Corrected
No ratings yet
Fourier - Series - Fact - Sheet - Corrected
3 pages
Understanding High School Mathematics
No ratings yet
Understanding High School Mathematics
11 pages
Stat 441 & Stat 601 I.A 2 Solns 17-18
No ratings yet
Stat 441 & Stat 601 I.A 2 Solns 17-18
8 pages
STAT 630: Advanced Data Analysis: Procedure For Project May 29, 2020
No ratings yet
STAT 630: Advanced Data Analysis: Procedure For Project May 29, 2020
3 pages
07MA6005 Mathematical Methods in Structural Engineering DEC 2015
No ratings yet
07MA6005 Mathematical Methods in Structural Engineering DEC 2015
3 pages
MSREQ
No ratings yet
MSREQ
2 pages
General Leibniz Rule - Wikipedia
No ratings yet
General Leibniz Rule - Wikipedia
1 page
STAT630 - Exam Project
No ratings yet
STAT630 - Exam Project
3 pages
Contest Whiteboard Flyer
No ratings yet
Contest Whiteboard Flyer
1 page
Beginner's Guide to R Programming
From Everand
Beginner's Guide to R Programming
Agasti Khatri
No ratings yet
Beginning R: The Statistical Programming Language
From Everand
Beginning R: The Statistical Programming Language
Mark Gardener
4.5/5 (4)
R Programming Unlocked: Easy Learning
From Everand
R Programming Unlocked: Easy Learning
Md. Sifat Hossain
No ratings yet
R Programming - a Comprehensive Guide: Software
From Everand
R Programming - a Comprehensive Guide: Software
Editor IJSMI
No ratings yet
Learn R Programming in 24 Hours
From Everand
Learn R Programming in 24 Hours
Alex Nordeen
No ratings yet

STAT630 RSlide

Uploaded by

STAT630 RSlide

Uploaded by

In statistics, we study data for some purpose.

Knowledge of basic to more advanced techniques is an advantage.

The objective of this section is to demystify some things about R

R is a dialect of the S language developed by John Chambers and others

R syntax is very similar to S, so it is easier for S-PLUS users to switch

Advantages and Drawbacks

Design of the R System

There are two conceptual parts.

How to install packages?

How are R packages installed? We use the R function

You can also install multiple packages by forming a vector.

Example: install.packages(c('ggplot2', 'dplyr'))

Note that: install.packages('ggplot2', 'dplyr') will not
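The vector form can be sketched as follows, assuming an internet connection and a configured CRAN mirror. The names pkgs and missing_pkgs are illustrative and do not come from the slides. In install.packages(), the second positional argument is lib (the library directory), so a second package name passed on its own is not read as a package to install.

```r
# Packages to install, collected in one character vector
pkgs <- c("ggplot2", "dplyr")

# Keep only the packages that are not installed yet (illustrative helper variable)
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install only in an interactive session, so the script is safe to source
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Once installed, a package is loaded in each new session with library(), for example library(ggplot2).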

R Resources and Getting Help

The following resources can be found from CRAN (visit

Getting Help with R Functions

Search help files: help.search("rnorm").

Get arguments: args(rnorm).

Access the source code: simply type rnorm.
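The help tools above can be tried on rnorm as in this short sketch. The variable names arg_names and src are illustrative; formals() and deparse() are used here only so the results can be stored and inspected, and are not part of the slides.

```r
# Print the argument list of rnorm
args(rnorm)

# The same argument names, stored as a character vector
arg_names <- names(formals(rnorm))
print(arg_names)

# Typing a function name without parentheses prints its definition;
# deparse() captures that source text as a character vector
src <- deparse(rnorm)
head(src, 1)

# In an interactive session, ?rnorm or help("rnorm") opens the help page
```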

Expressions and Assignment

Elementary commands consist of either expressions or assignments.

sum(4,2,5,1) # sum of all elements

prod(4,2,5,1) # product of all elements

x <- mean(c(3, 5, 8, 1))  # nothing is printed

print(x)  # explicit printing

y <- exp(1)  # exponential function

(m <- max(4, 2, 5, 1))  # maximum of all elements

(y <- sqrt(x))  # another way to print results

z <- 1:6  # the operator : is used to create integer sequences

(seq(from = 1, to = 2, by = 0.2))  # create a sequence from 1 to 2 with 0.2 increment

## [1] 1.0 1.2 1.4 1.6 1.8 2.0
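The sequence tools can be compared side by side. This is a small sketch; the names s1 to s4 are illustrative, and length.out and seq_len() are standard alternatives that are not shown in the slides.

```r
s1 <- 1:6                                     # integer sequence 1, 2, ..., 6
s2 <- seq(from = 1, to = 2, by = 0.2)         # fixed step size
s3 <- seq(from = 1, to = 2, length.out = 6)   # fixed number of points
s4 <- seq_len(6)                              # same values as 1:6

length(s2)   # number of elements generated
```

Use by when the step matters and length.out when the number of points matters.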

R works with objects.

View a list of currently defined objects with ls() or objects()
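A minimal sketch of listing and removing objects. It works in a separate environment (env, an illustrative name) so that it does not disturb the current workspace. At the prompt, ls() and rm() without the envir argument act on the workspace itself.

```r
# A fresh environment keeps the example independent of the session
env <- new.env()
assign("x", 1:5, envir = env)
assign("y", "a", envir = env)

ls(envir = env)                 # names of the objects defined in env
rm(list = "x", envir = env)     # remove one object
remaining <- ls(envir = env)    # what is left after the removal
```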

x <- c(1:5)  # vector

## [1] "a" "c" "c" "d" "e"

When different objects are mixed in a vector, coercion occurs so that

z <- c(1.7, "a") ## character

z <- c(TRUE, 2) ## numeric

z <- c("a", TRUE) ## character
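The resulting classes can be checked directly. In this sketch the names z1 to z3 are illustrative and mirror the three cases above.

```r
z1 <- c(1.7, "a")    # numeric and character
z2 <- c(TRUE, 2)     # logical and numeric
z3 <- c("a", TRUE)   # character and logical

class(z1)
class(z2)
class(z3)
z2                   # TRUE has been converted to 1
```

The coercion order is logical, then numeric, then character: the vector takes the most general type among its elements.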

## [1] FALSE TRUE TRUE TRUE TRUE TRUE

## Warning: NAs introduced by coercion

## Warning: NAs introduced by coercion
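Output of this kind arises from explicit coercion with the as.* functions. The sketch below is an assumed reconstruction, not the code from the slides; the vectors v and w are illustrative.

```r
v <- 0:5
as.logical(v)      # 0 becomes FALSE, every other number becomes TRUE
as.character(v)    # numbers become strings

w <- c("a", "b", "3")
# Strings that are not numbers become NA; without suppressWarnings(),
# R warns that NAs were introduced by coercion
num <- suppressWarnings(as.numeric(w))
num
```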

Arithmetic and Logic Operators

Arithmetic operators consist of: + (addition), - (subtraction), *
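The list above is cut off in this copy. As a sketch, these are R's standard arithmetic operators applied to two illustrative values, a and b.

```r
a <- 7
b <- 2

a + b     # addition
a - b     # subtraction
a * b     # multiplication
a / b     # division
a ^ b     # exponentiation
a %% b    # remainder after division (modulus)
a %/% b   # integer division
```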

x <- c(TRUE, FALSE, TRUE, FALSE)

(y <- seq(from = -5, to = 10, by = 1))

## [1] 1.01953555 -0.61850147 -0.70058543 -1.66749774 -1.48370835

(q <- p >= -1 & p <= 1)

(r <- c(1, 2, 4, NA, NaN, 5, Inf))
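The definition of p does not appear in this copy, so the sketch below assumes it is a small sample from rnorm(). The seed and the names inside and outside are illustrative. It combines comparisons with & and |, then classifies the special values in r.

```r
set.seed(1)      # arbitrary seed so the draw is reproducible
p <- rnorm(5)    # assumed stand-in for the vector p

inside <- p >= -1 & p <= 1    # TRUE where both conditions hold
outside <- p < -1 | p > 1     # TRUE where at least one condition holds
sum(inside)                   # TRUE counts as 1, so this counts the matches

r <- c(1, 2, 4, NA, NaN, 5, Inf)
is.nan(r)       # TRUE only for NaN
is.finite(r)    # FALSE for NA, NaN and Inf
```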

Numbers are treated as numeric objects.
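A quick check of how R stores numbers. This is a sketch: the L suffix for integers is standard R but is not mentioned in this copy of the slides.

```r
a <- 1     # a plain number is stored as a double-precision numeric
b <- 1L    # the L suffix requests an integer

class(a)
class(b)
typeof(a)
is.numeric(b)   # integers also count as numeric
```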

A vector consists of an ordered collection of elements.

Arithmetic operations, addition, subtraction, multiplication and division
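In R these operations act element by element on vectors. A minimal sketch with two illustrative vectors, u and v:

```r
u <- c(1, 2, 3, 4)
v <- c(10, 20, 30, 40)

u + v    # element-by-element addition
u * v    # element-by-element multiplication
v / u    # element-by-element division
u * 2    # the shorter operand (here a single number) is recycled
```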

Data, reports and figures require frequent manipulation of characters.
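A few base R functions cover most everyday character handling. The strings and variable names in this sketch are illustrative.

```r
first <- "Data"
second <- "Analysis"

label <- paste(first, second)         # joins with a space by default
code <- paste0("fig", 1:3)            # no separator, applied to each of 1:3
nchar(label)                          # number of characters
toupper(first)                        # convert to upper case
substr(label, start = 1, stop = 4)    # extract characters 1 to 4
```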

Extracting subsets of vectors is frequently required.

Index vectors can be created by logical vectors.
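Subsetting by position and by a logical index vector can be sketched as follows. The data in x and the name keep are illustrative.

```r
x <- c(12, 7, 3, 18, 5)

x[2]          # second element
x[c(1, 4)]    # first and fourth elements
x[-1]         # everything except the first element

keep <- x > 6    # logical index vector
x[keep]          # elements where keep is TRUE
```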

Some numerical data contain missing cases represented by NA.

Arithmetic operations involving NA result in NA.

So we need a way to extract values that are not NA.

Identify NA: is.na().
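A short sketch of working around missing values. The data are illustrative. The na.rm argument of mean() is a standard alternative to subsetting by hand.

```r
x <- c(4, NA, 7, 1, NA)

x + 1                    # NA stays NA
miss <- is.na(x)         # TRUE where a value is missing
clean <- x[!miss]        # keep only the observed values
mean(x)                  # NA, because missing values propagate
mean(x, na.rm = TRUE)    # drop missing values inside the function
```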

A matrix is a two-dimensional object.

R contains many functions and operators for matrices.
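A few of these functions and operators are shown in this sketch. The matrix A and the names tA and P are illustrative.

```r
A <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
dim(A)                                 # number of rows and columns
A[2, 3]                                # element in row 2, column 3
tA <- t(A)                             # transpose, a 3 x 2 matrix
P <- A %*% tA                          # matrix product, a 2 x 2 matrix
P
```

Note that * multiplies matrices element by element, while %*% is the matrix product.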