R Training by Emma Mba


PROGRAMMING WITH R

The Basics of R Programming


By
Mba Emmanuel Ikechukwu,
Dept. of Statistics.
UNN.
Course objectives
Our target in this course is for students to
know how to:
1. Construct and manipulate data structures.
2. Generate data and compute probabilities from
different distributions.
3. Analyze basic datasets.
4. Create graphics.
5. Write program code.
Contents
• Help facility
• Data object
• R operators
• Data structure
• Statistical distribution in R
• Graphics
• Programming tools (if statements, loops, writing
functions)
How to install R?

The R statistical software can be downloaded free of charge from:

http://cran.r-project.org/bin/windows/base/

www.r-project.org
Help Facilities
There are several ways to start the help system:

For general help:
– help()
– Click on [Help]

For a specific command or function:
– help(command name), for example,
> help(mean)
– ?function name, for example,
> ?mean
(R is case-sensitive, so the name must be typed exactly: mean, not Mean.)
Help Facilities

To search for entries, use the help.search command, for example,
> help.search("linear models")
> help.search("ANOVA")

The examples on a help topic can normally be run with
> example(t.test)
Data objects
Data modes:
A data object is a collection of values. The
modes of values are as follows:
• Logical: the values T (or TRUE) and F (or FALSE).
• Numeric: real numbers and integers, in decimal or scientific notation.
• Complex: complex numbers of the form a+bi (for example, 3+1.23i), where a and b are
numeric.
• Character: enclosed in double quotes (") or single quotes ('), such as
"Sara" or 'Sara'.

– To find the mode of any object, use the mode( )
function.
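As a quick illustration of the four modes listed above, the following sketch calls mode() on one value of each kind. The values themselves are arbitrary examples chosen here, not taken from the slides.

```r
# One example value per data mode (values are illustrative only)
mode(TRUE)        # "logical"
mode(3.14)        # "numeric"
mode(3 + 1.23i)   # "complex"
mode("Sara")      # "character"
```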
Types of data objects
There are seven basic types of data objects in R:
• Vector (a set of values): a one-way array of data.
• Matrix (a two-way array).
• Array (a matrix with more than two dimensions).
• Data frame (a generalized matrix that allows a mix
of columns with different data modes).
• Factor (categorical data).
• List (a list of components, where each component
can be a data object of a different data type).
• Time series.
Operators in R

I. Names and Assignment:

The assignment operator (<- or =) is used to associate names with values.
For example:

x <- 7 or x = 7 # stores the value 7 in an object named x

You can check the value of the object x by typing either x or print(x).

Note:
All assignments in R remain until removed or overwritten. The rm() command is used to remove a
variable.
Example:
> print(x)
[1] 7
> rm(x) # remove x
> x
Error: object 'x' not found
Operators in R

II. Arithmetic operators

Operator   Description            Priority
()         Parentheses            1
** or ^    Exponentiation         2
:          Sequences of numbers   3
* /        Multiply, divide       4
+ -        Add, subtract          5
II. Arithmetic operators

Example 1:

> 2+3*4 # Multiplication is done first, answer=14
> (2+3)*4 # Addition is done first, answer=20
> 3/2+1 # Division is done first, answer=2.5
> 4*3^2 # Powers (exponentiation) are done before *, answer=36
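The priority of the sequence operator ( : ) is easy to overlook. The short sketch below, using an arbitrary variable n introduced here for illustration, shows how it interacts with the other arithmetic operators.

```r
# ':' binds tighter than * and +/-, but looser than ^
1:3 * 2      # (1:3) * 2  ->  2 4 6
1:(3 * 2)    # 1 2 3 4 5 6

n <- 5       # hypothetical value for illustration
1:n - 1      # (1:n) - 1  ->  0 1 2 3 4
1:(n - 1)    # 1 2 3 4

# Exponentiation is done before the unary minus
-2^2         # -(2^2)  ->  -4
(-2)^2       # 4
```

When in doubt, adding parentheses makes the intended order explicit.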
Operators in R

III. Logical and comparison operators:

Operator   Description                Operator   Description
<          Less than                  &          Vectorized (element-wise) AND
>          Greater than               |          Vectorized (element-wise) OR
==         Equal to                   !          NOT
>=         Greater than or equal to   !=         Not equal to
<=         Less than or equal to
III. Logical and comparison operators

Example 2:
> 3<4
[1] TRUE

> 3==4
[1] FALSE

> x<- -3:3; x
[1] -3 -2 -1 0 1 2 3
> y<- -1; z<-3
> y>0 & z>0
[1] FALSE
> y>0 | x>0
[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE
III. Logical and comparison operators

# TRUE and FALSE as numbers: R converts TRUE to 1 and FALSE to 0

> x<- -3:3
> x
[1] -3 -2 -1 0 1 2 3

> x<2
[1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE
> sum(x<2)
[1] 5
> sum(x>=2)
[1] 2
Missing values

When an element or value is "not available"
or a "missing value", it is represented
by the special symbol

NA

(for example, missing data). Note that the square root or
logarithm of a negative number produces NaN rather than NA.
Missing values
There is a second kind of “missing” values which are produced by
numerical computation; it is called Not a Number, NaN, values.

Example 3:
> 0/0
[1] NaN

> Inf - Inf
[1] NaN

> Inf/Inf
[1] NaN

> log(-2)
[1] NaN
Warning message:
NaNs produced in: log(x)
Missing values

In general, an arithmetic operation on NA yields NA.

The function is.na(x) gives a logical vector of the same length as x, with value TRUE if and
only if the corresponding element of x is NA (or NaN).

> x<-c(1:3,NA) ; x
[1] 1 2 3 NA

> is.na(x)
[1] FALSE FALSE FALSE TRUE

> sum(x)
[1] NA

> x[!is.na(x)]
[1] 1 2 3
Missing values

> x<-c(1,2,3,NaN,4,5,NaN,7); x
[1] 1 2 3 NaN 4 5 NaN 7

> is.na(x)
[1] FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE

> sum(x)
[1] NaN
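The examples above remove missing values by indexing with !is.na(x). As a hedged sketch of two related tools, the code below uses is.nan() to tell NaN apart from NA, and the na.rm argument that many summary functions accept. The vector is made up for illustration.

```r
# Illustrative vector containing both kinds of missing value
x <- c(1, 2, 3, NaN, 4, 5, NA, 7)

is.na(x)     # TRUE for both the NaN and the NA entry
is.nan(x)    # TRUE only for the NaN entry

# na.rm = TRUE drops NA and NaN before computing
total   <- sum(x, na.rm = TRUE)    # 1+2+3+4+5+7 = 22
average <- mean(x, na.rm = TRUE)   # 22 / 6
total
average
```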
Use of Brackets

Name of bracket    Bracket   Function
Round brackets     ( )       For function calls, as in mean(x), and to set priorities
Square brackets    [ ]       Index brackets, as in x[3]; used to access or extract data
Curly brackets     { }       Block delimiter for grouping sequences of commands, as in functions or if statements
DATA STRUCTURE
I. Vector

A. Creating a vector
Function              Symbol                  Description                         Example
Concatenate command   c( )                    Combines values of any mode         X <- c(2,3,8,0,-7)
Sequence command      seq(from= ,to= ,by= )   Regular sequences of numbers        X <- seq(1,10,1)
                      from:to                                                     X <- 1:10
Replicate command     rep(x, times= )         Takes a pattern and replicates it   X <- rep(1, 5)
DATA STRUCTURE
I. Vector
A. Creating a vector
Example 4:
> c(1,7:9)
[1] 1 7 8 9

> x<- c(1,7:9)


> c(1:5, 10.5, "next")
[1] "1" "2" "3" "4" "5" "10.5" "next"

> c("This", "is","Sta","371")
[1] "This" "is" "Sta" "371"

 
DATA STRUCTURE
I. Vector
Example 5:
> 1:4
[1] 1 2 3 4

> seq(0,1, length=5)


[1] 0.00 0.25 0.50 0.75 1.00

> seq(1,9, by = 2)
[1] 1 3 5 7 9

> seq(1,by=0.05,length=10)
[1] 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45
DATA STRUCTURE
I. Vector
Example 6:
> rep(1:4, 2)
[1] 1 2 3 4 1 2 3 4

> rep(1:4, c(2,1,3,2))


[1] 1 1 2 3 3 3 4 4

> rep(c("yes","no"), c(4,2))


[1] "yes" "yes" "yes" "yes" "no" "no"
DATA STRUCTURE
I. Vector
B. Vector Arithmetic
Example 7:
> x<-1:10
>x
[1] 1 2 3 4 5 6 7 8 9 10

> x*2
[1] 2 4 6 8 10 12 14 16 18 20

> y<-6:2
>y
[1] 6 5 4 3 2

> y+x
[1] 7 7 7 7 7 12 12 12 12 12

 
DATA STRUCTURE
I. Vector
C. Accessing elements in a vector

• To select elements of a vector, use square brackets: [indices].
• To delete elements from a vector, use the minus sign: [-indices].
• To extract elements using logical values, use [logical condition].
DATA STRUCTURE
I. Vector
C. Accessing elements in a vector

Example 8: (Logical index vector)

> y<- -2:3
> log(y)
[1] NaN NaN -Inf 0.0000000 0.6931472 1.0986123
Warning message:
NaNs produced in: log(x)

> log(y[y>0])
[1] 0.0000000 0.6931472 1.0986123

> z=c(2,5,4,NA,3,-2)
> z[!is.na(z)]
[1] 2 5 4 3 -2

> mean(z[!is.na(z)])
[1] 2.4
DATA STRUCTURE
I. Vector
C. Accessing elements in a vector

Example 9: (Integer index vector)

> X<-seq(2,10,2)
> X
[1] 2 4 6 8 10

> X[2]
[1] 4

> X[3:5]
[1] 6 8 10

> X[c(1,3,5)]
[1] 2 6 10

> X[6]
[1] NA

> y<-2:6; y
[1] 2 3 4 5 6

> y[-c(1:3)]
[1] 5 6

You can also use rep( ) or seq( ) inside [ ]:
> X[seq( )]; X[rep( )]
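To make the last line concrete, here is a small sketch with the arguments of seq( ) and rep( ) filled in. The particular index values are arbitrary choices for illustration.

```r
X <- seq(2, 10, 2)               # 2 4 6 8 10

odd.pos  <- X[seq(1, 5, by = 2)] # elements 1, 3, 5  ->  2 6 10
repeated <- X[rep(2, 3)]         # element 2 three times  ->  4 4 4
dropped  <- X[-seq(1, 5, by = 2)]# everything except 1, 3, 5  ->  4 8

odd.pos
repeated
dropped
```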
Some Arithmetic and Statistical R Functions
R Function                                  Notes
log(x), log10(x), exp(x), sqrt(x)           ln(x), log10(x), e^x, √x
sin(x), cos(x), tan(x)                      Trigonometric functions
max(x), min(x), length(x), range(x)         Maximum, minimum, number of elements, and range of a vector
sign(x), abs(x), sort(x), sum(x), prod(x)   Sign, absolute value, sort in ascending order, summation,
                                            product of the elements of a vector x
ceiling(x)                                  Rounds up to the next higher integer
floor(x)                                    Rounds down to the next lower integer
trunc(x)                                    Cuts off all digits after the decimal point
round(x), round(x, 3), round(x, -1)         Rounds to the nearest integer. The second argument is the
                                            number of decimal places desired; a negative value rounds
                                            a large number to the nearest 10, 100, etc.
cor(x,y), mean(x), var(x), quantile(x),     Statistical functions
median(x), summary( )
%/%, %%                                     Quotient (integer division) and modulo (remainder).
                                            %/% and %% always satisfy e1 == (e1 %/% e2)*e2 + e1 %% e2
cumsum(x), cumprod(x)                       Returns an object which, for each element, is the sum
                                            (product) of all of the elements up to that point
gamma(x)                                    Gamma function
Statistical function in R
Measure                           Function in R
Mean                              mean(x)
Median                            median(x)
Range                             range(x) returns a vector of two elements, (min(x), max(x));
                                  the actual range is range(x)[2] - range(x)[1] or diff(range(x))
Variance                          var(x)
Standard deviation                sd(x)
Quartiles                         quantile(x) returns a vector of five elements;
                                  quantile(x, probs) returns the quantiles for the given probabilities
Pearson correlation coefficient   cor(x, y)
Sample function                   sample(x, size, replace = FALSE, prob = NULL)
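A brief sketch of the functions in this table applied to a small data set. The vectors x and y and the seed are made up for illustration, so only the general pattern matters.

```r
# Illustrative data (not from the slides)
x <- c(2, 4, 4, 5, 7, 9)
y <- c(1, 3, 2, 5, 6, 8)

mean(x)                 # 31 / 6
median(x)               # 4.5
range(x)                # 2 9
diff(range(x))          # the actual range: 7
var(x)
sd(x)                   # square root of var(x)
quantile(x)             # minimum, quartiles, maximum
quantile(x, 0.9)        # 90th percentile
cor(x, y)               # Pearson correlation

set.seed(42)            # arbitrary seed so the draw is reproducible
sample(x, size = 3)     # three values drawn without replacement
```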


Arithmetic and Statistical R Functions
Example 10:
> y<- c(-1,2,NA,5,4); y
[1] -1 2 NA 5 4

> x<-log(y); x
Warning message:
NaNs produced in: log(x)
[1] NaN 0.6931472 NA 1.6094379 1.3862944

> mean(x)
[1] NA

> mean(x,na.rm =TRUE)


[1] 1.229626
DATA STRUCTURE
II. Matrix

A. Creating matrix

Function    Description                              Example 11
matrix( )   Creates a matrix: takes a vector         matrix(1:12,3,4)
            argument and turns it into a matrix.     matrix(1:12,3)
            matrix(data, nrow, ncol, byrow = F)      matrix(1:12,ncol=4)
                                                     matrix(1:12,3,4,byrow=T)
cbind( )    Combines vectors column by column        x <- cbind(c(1:4),c(5:8))
rbind( )    Combines vectors row by row              x <- rbind(c(1:4),c(5:8))
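The effect of the byrow argument is easiest to see side by side. This is a minimal sketch with a small 2 x 3 matrix chosen for illustration.

```r
a <- matrix(1:6, nrow = 2)                # filled column by column (default)
b <- matrix(1:6, nrow = 2, byrow = TRUE)  # filled row by row

a
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

b
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

dim(a)   # 2 3: number of rows, number of columns
```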


DATA STRUCTURE
II. Matrix

B. Matrix arithmetic

The arithmetic operators ( +, -, *, / ) are applied
element-wise (element by element); the
matrices should have the same dimensions.

The %*% operator performs matrix multiplication on two
conformable matrices.
DATA STRUCTURE
II. Matrix
Example 12:
x <- matrix(1:4,2)
1 3
2 4

y <- matrix(c(1:2, 1:2),2)
1 1
2 2

Calculate: x+y; x-y; x*y; x/y
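One way to work through the calculations requested in Example 12 is sketched below. The matrix product x %*% y is added here only for contrast with the element-wise product x * y; it is not part of the original exercise.

```r
x <- matrix(1:4, 2)
y <- matrix(c(1:2, 1:2), 2)

x + y   # element-wise sum
x - y   # element-wise difference
x * y   # element-wise product:  rows (1, 3) and (4, 8)
x / y   # element-wise quotient: rows (1, 3) and (1, 2)

# Matrix multiplication is a different operation
x %*% y # rows (7, 7) and (10, 10)
```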
DATA STRUCTURE
II. Matrix

Function           Description
nrow( ), ncol( )   Returns the number of rows or columns of a matrix
dimnames( )        Returns or changes the dimnames attribute of a matrix or array
diag( )            Either creates a diagonal matrix or extracts the diagonal
                   elements of a matrix
solve( )           Calculates the inverse
var( )             Covariance matrix of the columns
t(x)               Transpose of x
eigen(x)           Eigenvalues and eigenvectors of x
apply( )           Applies a function to each row or column of the matrix
outer( )           Outer product: outer(X, Y, FUN="*", ...), or X %o% Y
DATA STRUCTURE
II. Matrix
Example 13:
> x<- matrix(c(1,2,3, 11,12,13), 2, 3, byrow=TRUE)
> dimnames(x)<-list(c("row1", "row2"), c("C.1", "C.2", "C.3"))
> x
     C.1 C.2 C.3
row1   1   2   3
row2  11  12  13
DATA STRUCTURE
II. Matrix
Example 14:
The function solve( ) can be used to solve a system of linear equations:
Z1+2Z2+3Z3=3
2Z1+3Z2+2Z3=0
3Z1+2Z2+Z3=1
> mat=rbind(c(1,2,3),c(2,3,2),c(3,2,1))
> y=c(3,0,1)
> z=solve(mat,y)
> mat%*%z # check: the product reproduces y
     [,1]
[1,]    3
[2,]    0
[3,]    1
DATA STRUCTURE
II. Matrix
Example 15:
> x <- cbind(ClassA = 3, ClassB = c(4:1, 2:5))
> dimnames(x)[[1]] <- letters[1:8]
> x
  ClassA ClassB
a      3      4
b      3      3
c      3      2
d      3      1
e      3      2
f      3      3
g      3      4
h      3      5

> class.m<-apply(x, 2, mean); class.m
ClassA ClassB
     3      3

> x<-rbind(x,class.m); x
        ClassA ClassB
a            3      4
b            3      3
c            3      2
d            3      1
e            3      2
f            3      3
g            3      4
h            3      5
class.m      3      3
DATA STRUCTURE
II. Matrix
C) Matrix indexing
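To select elements of a matrix, use square brackets: [row index, column index].
You can also use the row and column labels to access elements.

Access the elements in the x matrix (a, b, c, and d stand for indices):

x[a,]        row a
x[,b]        column b
x[-a,]       all rows except row a
x[,-b]       all columns except column b
x[a:b,c:d]   rows a to b, columns c to d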
DATA STRUCTURE
II. Matrix
C) Matrix indexing
> x
        ClassA ClassB
a            3      4
b            3      3
c            3      2
d            3      1
e            3      2
f            3      3
g            3      4
h            3      5
class.m      3      3

> x[1,2]; x["a","ClassB"]
[1] 4

> x[1,]; x["a",]
ClassA ClassB
     3      4

> x[,2]; x[,"ClassB"]
a b c d e f g h class.m
4 3 2 1 2 3 4 5 3
DATA STRUCTURE
III. Data frame
Data frames are very similar to matrices, except that they allow the columns to contain different types of data
of the same length, whereas a matrix is restricted to one type of data only. Like matrices, data frames
must be rectangular.

A) Creating a data frame

1) Read data from external files using read.table() for .txt files,
or using the functions in the foreign package.
2) data.frame() binds together objects of different types:

data.frame(data1,data2,…)

B) Data frame arithmetic

You can only apply numeric computations to the numeric variables in a data frame.

C) Data frame indexing

The same tools used for matrix indexing can be used; you can also use $ to extract a column as a vector.
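As a rough sketch of reading a data frame from a text file, the code below first writes a tiny made-up table to a temporary file and then reads it back with read.table(). The file contents, the column names, and the use of tempfile() are assumptions made purely so the example is self-contained.

```r
# Create a small text file to read (hypothetical data)
path <- tempfile(fileext = ".txt")
writeLines(c("name score passed",
             "Ada 71.5 TRUE",
             "Ben 48.0 FALSE",
             "Chi 90.0 TRUE"), path)

# header = TRUE tells read.table that the first line holds column names
scores <- read.table(path, header = TRUE)

str(scores)           # columns of different types in one object
mean(scores$score)    # numeric computation on a numeric column
```

In practice the path would point to an existing file on disk rather than a temporary one.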
DATA STRUCTURE
III. Data frame
Example 16:
Create a data frame containing the following information using
data.frame(data1,data2,…):

Price   Car Model
27      Proton
25      Saga
20      Werra
DATA STRUCTURE
III. Data frame
car.inf<-data.frame(model=c("Proton","Saga","Werra"),price=c(27,25,20))
> car.inf
model price
1 Proton 27
2 Saga 25
3 Werra 20

> car.inf$model
[1] Proton Saga Werra

> car.inf[1,2]
[1] 27
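Building on the car.inf data frame, the sketch below shows a few further indexing and arithmetic patterns: selecting rows with a logical condition, computing on a numeric column, and adding a derived column. The price threshold and the 10% markup are hypothetical values chosen for illustration.

```r
car.inf <- data.frame(model = c("Proton", "Saga", "Werra"),
                      price = c(27, 25, 20))

car.inf[, "price"]                      # same vector as car.inf$price
expensive <- car.inf[car.inf$price > 22, ]  # rows meeting a condition
expensive

mean(car.inf$price)                     # (27 + 25 + 20) / 3 = 24

# Add a derived numeric column (hypothetical 10% markup)
car.inf$price.marked.up <- car.inf$price * 1.1
car.inf
```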
DATA STRUCTURE
IV. List
A list allows a programmer to tie together related
data that do not have the same structure (different
lengths or modes).
A) Creating a list:
Use the list() function.
B) List indexing:
To access the elements of a list, use double square
brackets [[ ]], then access the sub-elements using single
square brackets.
DATA STRUCTURE
IV. List
Example 16:
Create a list with two components: car.inf data frame and no.model
vector
> no.model<-c(1990,2002,2005)
> car.list<-list(car.inf,no.model); car.list
[[1]]
model price
1 Proton 27
2 Saga 25
3 Werra 20

[[2]]
[1] 1990 2002 2005
DATA STRUCTURE
IV. List
> names(car.list)<-(c("car.inf","no.model"))
> car.list[[1]]; car.list$car.inf;
model price
1 Proton 27
2 Saga 25
3 Werra 20

> car.list$car.inf[,"price"]; car.list$car.inf$"price"


[1] 27 25 20

> car.list[[2]]; car.list$no.model


[1] 1990 2002 2005
Statistical Distribution in R
R provides four important functions for each commonly used
statistical distribution:

• The density function (d)
• The probability function (p), i.e. the cumulative distribution function
P(X≤x)=F(x)
• The quantile function (q), the inverse of the probability function: p(q(x))=x
and q(p(x))=x
• The random number generation function (r), which generates random
numbers from the specified distribution
Statistical Distribution in R
For normal distribution
1) To compute the density at x:
dnorm(x, mean=0, sd=1), where x is a vector of quantiles.

2) To compute the cumulative probability P(X ≤ q):
pnorm(q, mean=0, sd=1), where q is a vector of quantiles.

3) To compute the pth quantile (the inverse of the probability function):
qnorm(p, mean=0, sd=1), where p is a vector of probabilities.

4) To generate a random sample of size n from the normal distribution:
rnorm(n, mean=0, sd=1), where n is the sample size.
Statistical Distribution in R
Example 17:
> rnorm(10) # generate 10 numbers from normal(0,1)
[1] 0.805614270 1.306613806 0.005207416 -1.422705501 -1.597293862
[6] 0.358613353 -0.143352219 -0.688565383 0.367230807 0.425260159
> rnorm(10,12,3) # generate 10 numbers from a normal with mean 12 and sd 3 (variance 9)
[1] 15.761257 15.591548 12.526016 15.917080 11.271831 7.833598 10.349292
[8] 13.136388 9.738675 10.905414
Statistical Distribution in R
Example 17:
> pnorm(1,0,2) # P(X ≤ 1) where X ~ N(mean 0, sd 2)
[1] 0.6914625

> pnorm(1.96) # P(X ≤ 1.96) where X ~ N(0,1)
[1] 0.9750021
> pnorm(seq(-2,2,1))
[1] 0.02275013 0.15865525 0.50000000 0.84134475 0.97724987
> qnorm(0.05) # the quantile corresponding to probability 0.05
[1] -1.644854
> qnorm(0.95)
[1] 1.644854
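The p and q functions are inverses of each other, and differences of p values give interval probabilities. The following sketch checks both facts for the standard normal; the test points are arbitrary.

```r
x <- c(-1.5, 0, 0.7, 1.96)       # arbitrary quantiles

p <- pnorm(x)                    # cumulative probabilities
back <- qnorm(p)                 # should recover x
all.equal(back, x)               # TRUE (up to rounding error)

# P(-1.96 < X < 1.96) for X ~ N(0,1), about 0.95
inside <- pnorm(1.96) - pnorm(-1.96)
inside

# Upper-tail probability P(X > 1.96), two equivalent ways
1 - pnorm(1.96)
pnorm(1.96, lower.tail = FALSE)
```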
Statistical Distribution in R
Distribution   R root name   Parameters
Normal         norm          mean=0, sd=1
Student's t    t             df
Chi-square     chisq         df
F              f             df1, df2
Gamma          gamma         shape
Beta           beta          shape1, shape2
Uniform        unif          min=0, max=1
Lognormal      lnorm         meanlog=0, sdlog=1
Logistic       logis         location=0, scale=1
Cauchy         cauchy        location=0, scale=1
Exponential    exp           rate=1
Binomial       binom         size, prob
Poisson        pois          lambda
Weibull        weibull       shape
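Each root name in the table is combined with the prefixes d, p, q, and r. The sketch below applies that pattern to a few of the distributions; the parameter values and the seed are arbitrary choices for illustration.

```r
# Binomial(size = 10, prob = 0.5)
d3 <- dbinom(3, size = 10, prob = 0.5)   # P(X = 3)  = 120/1024
p3 <- pbinom(3, size = 10, prob = 0.5)   # P(X <= 3) = 176/1024

# Exponential(rate = 1): the median is log(2)
med <- qexp(0.5, rate = 1)

# Uniform(0, 1): five random numbers
set.seed(123)                            # arbitrary seed
u <- runif(5, min = 0, max = 1)

d3; p3; med; u
```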
Statistical Distribution in R
Example 18:
> qt(p=0.975,df=9) # the 5% critical value for a two-sided t-test on 9 d.f.
[1] 2.262157

> dpois(2,lambda=5) # P(X=2) where X has a Poisson distribution with lambda=5
[1] 0.08422434

> dnorm(-2:2,2,2) # f(x) at x=-2,-1,0,1,2 where X is normal with mean 2 and sd 2
[1] 0.02699548 0.06475880 0.12098536 0.17603266 0.19947114
Graphics In R
A. Graphical data exploration function
Description                                      R function
Box plot                                         boxplot(x); boxplot(x~y,data)
Stem-and-leaf plot                               stem(x)
Histogram                                        hist(x)
Pie chart                                        pie(frequency, name of categories)
Quantile-quantile plot for normal distribution   qqnorm
Bar graph                                        barplot(frequency, name of categories)
Graphics In R
The plot command:
plot( vector x, vector y, . . . )

where . . . stands for options, some of which include the following:
Graphics In R
Argument            Description
type=               plot type: "p" for points, "l" for lines, "b" for both, "o" for
                    overlaid, "n" for nothing, "s" for stair steps, and "h" for height
                    bars
pch=                plotting character at the points: square (0); circle (1); triangle (2);
                    cross (3); x (4); diamond (5); inverted triangle (6); or a
                    "character"
lty=                line type: 1 for solid, 2 for dashed, 3 for dotted, etc.
lwd=                line width: 1 = default, 2 = twice as thick, etc.
xlab, ylab          x-axis and y-axis labels
xlim, ylim          x-axis and y-axis limits (min, max)
frame.plot=T / F    draw or do not draw a box around the plot
axes=T / F          with or without axes
main=" ", sub=" "   add a main title and a subtitle to the plot
Graphics In R
Example 19:
Draw the probability density function for the
following distribution.
1- N(0,1)
2- F(5,5)
3- t df(3)
4- chi-square(5)
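One possible way to approach Example 19 is sketched below, using plot( ) with type = "l" and the density functions from the distribution table. The grid ranges, the number of grid points, and the 2 x 2 layout via par(mfrow = ...) are choices made here, not part of the exercise statement.

```r
op <- par(mfrow = c(2, 2))                 # 2 x 2 grid of plots

z <- seq(-4, 4, length.out = 200)          # grid for symmetric densities
w <- seq(0.01, 15, length.out = 200)       # grid for positive-valued densities

plot(z, dnorm(z, mean = 0, sd = 1), type = "l",
     main = "N(0,1)", xlab = "x", ylab = "density")
plot(w, df(w, df1 = 5, df2 = 5), type = "l",
     main = "F(5,5)", xlab = "x", ylab = "density")
plot(z, dt(z, df = 3), type = "l",
     main = "t, df = 3", xlab = "x", ylab = "density")
plot(w, dchisq(w, df = 5), type = "l",
     main = "Chi-square(5)", xlab = "x", ylab = "density")

par(op)                                    # restore the previous layout
```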
Graphics In R
Function Description
par ( mfrow=c(2, 3)) create a 2 × 3 layout of figures
lines( ) add lines to existing graph
points( ) add points to existing graph
axis(n) add an axis to side n, n=1 for x-axis, 2 for y-axis
text( ) add text at a specified location
title( ) add title
abline ( v=pos ) add vertical line at a specified position
abline ( h=pos ) add horizontal line at a specified position
abline(a, b) add line with intercept a, slope b
mtext( ) add text on the margins
Graphics In R
Example 20:
Compare standard normal distribution with
t distributions with (d.f) 2, 10, and 50
Study the effect of increasing degrees of
freedom on the t distribution.
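A possible sketch for Example 20 overlays the t densities on the standard normal curve with lines( ). The plotting range, line types, and legend placement are assumptions made for illustration.

```r
x <- seq(-4, 4, length.out = 200)
dfs <- c(2, 10, 50)                        # degrees of freedom to compare

# Standard normal as the reference curve
plot(x, dnorm(x), type = "l", lwd = 2,
     xlab = "x", ylab = "density",
     main = "Standard normal vs. t distributions")

# Add one t density per degrees-of-freedom value
for (k in seq_along(dfs)) {
  lines(x, dt(x, df = dfs[k]), lty = k + 1)
}

legend("topright",
       legend = c("N(0,1)", paste("t, df =", dfs)),
       lty = 1:4, lwd = c(2, 1, 1, 1))

# Largest gap between each t density and the normal density
gaps <- sapply(dfs, function(d) max(abs(dt(x, df = d) - dnorm(x))))
gaps
```

The gaps shrink as the degrees of freedom grow, which is the effect the exercise asks about: the t distribution approaches the standard normal.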
Programming Tools
I. Iteration

Form of loop   Syntax
for loop       for (index in range) { expressions to be executed }
while loop     while (condition) { expressions to be executed }
repeat loop    repeat { expressions to be executed; if (condition) break }
Programming Tools
I.Iteration
Example 21:
Calculate the sum over 1, 2, 3, . . . until the sum
equals 100 or larger by using while and repeat
loops.
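A minimal sketch of Example 21, written once with while and once with repeat. The variable names are chosen here for illustration.

```r
# while loop: keep adding until the running sum reaches 100
total <- 0
i <- 0
while (total < 100) {
  i <- i + 1
  total <- total + i
}
total   # 105
i       # 14 terms were needed

# repeat loop: same logic, with an explicit break
total2 <- 0
j <- 0
repeat {
  j <- j + 1
  total2 <- total2 + j
  if (total2 >= 100) break
}
total2  # 105
```

The while loop tests its condition before each pass, whereas repeat runs until break is reached inside the body.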
Programming Tools
I.Iteration
Example 22:
Calculate the sum over 1, 2, 3, . . . , 10 using for
loop.
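A minimal sketch of Example 22 with a for loop, with sum( ) shown only as a cross-check.

```r
total <- 0
for (i in 1:10) {
  total <- total + i    # add the current value to the running sum
}
total        # 55
sum(1:10)    # same result without a loop
```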
Programming Tools
I.Iteration
The looping variable i can take values of any mode.
Example 23:
A) Numeric looping variable:
for ( i in c(3, 2, 9, 6))
print(i^2)
or
x <- c(3, 2, 9, 6)
for ( i in 1:4)
print(x[i]^2)

B) Character looping variable:
transport.media <- c("car", "bus", "train")
for ( i in transport.media)
print(i)
Programming Tools
II. Conditional Execution ( The if statement )

A) if ( condition ) { expression 1 }
B) if ( cond 1 ) { expr 1 } else if ( cond 2 ) { expr 2 } else { last expr }
(when typed at the prompt, else must follow the closing brace on the same line)
C) ifelse ( condition, expression for true,
expression for false )

Example 24:
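The following is an illustrative sketch of the three forms above with made-up values; it is offered as a stand-in and is not the example from the original slide.

```r
x <- -3                       # hypothetical input

# A) plain if: nothing happens because the condition is FALSE
if (x > 0) {
  print("x is positive")
}

# B) if / else if / else: exactly one branch runs
if (x > 0) {
  label <- "positive"
} else if (x < 0) {
  label <- "negative"
} else {
  label <- "zero"
}
label                         # "negative"

# C) ifelse is vectorized: one answer per element
v <- c(-2, 0, 5)
flags <- ifelse(v >= 0, "non-negative", "negative")
flags
```

Note that if ( ) expects a single condition, while ifelse( ) works element by element on a whole vector.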
 
Programming Tools
III.Writing Function

Syntax:
function_name <- function ( input arguments )
{
function body ( R expressions )
return ( list ( output argument ))
}
 
You call the function by typing its name followed by its arguments:
function_name ( argument )
Programming Tools
III.Writing Function

Note that:
1. All variables declared inside the body of a
function are local and vanish after the
function is executed.
2. To return more than one value from a function,
collect the values in a list and return that list with return().
Programming Tools
III.Writing Function

Example 25:
1) Construct a function that determines the
sign of a number.

2) Construct a function that computes the mean
and standard error.
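One possible sketch for Example 25 follows. The function names sign_of and mean_and_se are hypothetical, and the standard error is taken here to mean the standard error of the mean, sd(x)/sqrt(n), which is an assumption about what the exercise intends.

```r
# 1) Sign of a single number (hypothetical name: sign_of)
sign_of <- function(x) {
  if (x > 0) {
    result <- "positive"
  } else if (x < 0) {
    result <- "negative"
  } else {
    result <- "zero"
  }
  return(result)
}

# 2) Mean and standard error of the mean (hypothetical name: mean_and_se)
mean_and_se <- function(x) {
  x <- x[!is.na(x)]            # drop missing values first
  n <- length(x)
  m <- mean(x)
  se <- sd(x) / sqrt(n)        # assumed definition of standard error
  return(list(mean = m, se = se))
}

sign_of(-3)                    # "negative"
out <- mean_and_se(c(2, 4, 6, NA))
out$mean                       # 4
out$se                         # 2 / sqrt(3)
```

Returning a list lets the caller pick out each value by name with $.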
Simulation
The central limit theorem states that, given a
distribution with mean μ and variance σ²,
the sampling distribution of the mean
approaches a normal distribution with mean
μ and variance σ²/N as the sample size N
increases.
Simulation
Algorithm:
1. Generate 1000 samples of size 30 from
each distribution (N(0,1), exp(10), chi-square(10)).
2. Compute the mean of each sample.
3. Compute the mean and variance of the
sample means.
4. Draw the histogram and normal Q-Q plot of the
sample means for each distribution.
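The algorithm above can be sketched as follows. Several details are assumptions: exp(10) is read as an exponential distribution with rate = 10, the seed is arbitrary, and the 3 x 2 plot layout and variable names are chosen here for illustration.

```r
set.seed(1)                       # arbitrary seed for reproducibility
n.samples <- 1000                 # number of samples per distribution
n <- 30                           # size of each sample

# One random generator per distribution (exp(10) assumed to mean rate = 10)
generators <- list(
  "N(0,1)"         = function(k) rnorm(k),
  "Exp(rate=10)"   = function(k) rexp(k, rate = 10),
  "Chi-square(10)" = function(k) rchisq(k, df = 10)
)

op <- par(mfrow = c(3, 2))        # histogram and Q-Q plot side by side
results <- list()
for (name in names(generators)) {
  # Step 1: each row of the matrix is one sample of size n
  samples <- matrix(generators[[name]](n.samples * n), nrow = n.samples)
  # Step 2: mean of each sample
  xbar <- apply(samples, 1, mean)
  # Step 3: mean and variance of the sample means
  results[[name]] <- c(mean = mean(xbar), variance = var(xbar))
  # Step 4: histogram and normal Q-Q plot
  hist(xbar, main = paste("Sample means:", name))
  qqnorm(xbar, main = paste("Q-Q plot:", name))
  qqline(xbar)
}
par(op)

results
```

Under the central limit theorem, the mean of the sample means should be close to μ and their variance close to σ²/30 for each distribution, for example about 0 and 1/30 for N(0,1).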
http://www.r-project.org/
http://stat.ethz.ch/R-manual/R-patched/doc/html/index.html
http://www.statmethods.net/index.html
