
Introduction To R 2022

R is a system for statistical computation and graphics. It provides a programming language, high-level graphics, interfaces to other languages, and debugging facilities. R contains functions and data to implement common statistical procedures like regression, ANOVA, and tests. Users can interact with R through commands entered at the prompt or via scripts. Packages containing additional functions can be loaded into R sessions.

Uploaded by

lynn zigara

HSTS204/HASC216/HASTS202
STATISTICAL COMPUTING
Introduction to R
R is a system for statistical computation and graphics. R provides, among other things, a
programming language, high-level graphics, interfaces to other languages and debugging
facilities.
The R language is a dialect of S which was designed in the 1980s and has been in widespread
use in the statistical community since. The language syntax has a superficial similarity with C.

R is open source software and its home page is http://www.R-project.org/. The base R
distribution contains functions and data to implement and illustrate most common statistical
procedures, including regression and ANOVA, classical parametric and nonparametric tests,
cluster analysis, density estimation and much more.

The system processes commands entered by the user, who types the commands at the
command prompt, or submits the commands from a file called a script to save retyping and to
separate commands from results. In a window system, users interact with R through the R
console.

Each command or expression to be evaluated is typed at the command prompt, and is
immediately evaluated when the Enter key is pressed at the end of a syntactically complete
statement.

-Press the up-arrow key to recall commands and edit them.


-Use the Esc (Escape) key to cancel a command.

Packages

An R installation contains a library of packages. Some of these packages are part of the basic
installation and others can be downloaded from the Comprehensive R Archive Network (CRAN)
through its mirror sites; we use the South African mirror site for downloads. You can also create
your own packages.

A package is loaded into R using the library command, e.g. library(survival). The loaded
packages are not considered part of the workspace and if you terminate your session you need
to load them again when you start a new session.
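A minimal sketch of loading a package and checking that it is attached without becoming part of the workspace. It uses the base package splines instead of survival, an assumption made only because splines ships with every R installation; library(survival) behaves the same way.

```r
# splines is a base package, so no download is needed
library(splines)

# The package now appears on the search path...
print("package:splines" %in% search())   # TRUE

# ...but its functions (e.g. bs) are not objects in the workspace
print("bs" %in% ls())                    # FALSE
```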

Basics

The command prompt

The R commands are entered at the prompt in the R console window. The prompt character is >
and when a line is continued the prompt changes to +.

L siziba(UZ) 2022

R is case sensitive.

Comments: in R, comments begin with a # symbol; everything after it on the line is ignored.

Assignments

In every computer language, ‘variables’ provide a means of accessing the data stored in memory,
so you have to name the ‘things’ that you want to use or refer to later.

The right-to-left assignment operators are the left arrow <- and the equals sign =.
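A short sketch of comments, case sensitivity and the two assignment operators; the object names a, b and A are illustrative.

```r
# Both operators assign right to left
a <- 5      # left arrow
b = 3       # equals sign

# R is case sensitive: A is a different name from a
A <- 100
print(a + b)    # 8
print(A == a)   # FALSE
```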

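Please NOTE: when specifying a file path, use / (forward slash) or \\ (a doubled backslash), not a single \. Inside an R string a single backslash starts an escape sequence (for example "\n" is a newline).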

Working directory:

To view the current working directory, type getwd(). To change the working directory, type
setwd("pathname"). Change your working directory to the HSTS204 course folder you created.
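A sketch of viewing and changing the working directory. The course folder here is hypothetical: it is created under tempdir() so the code runs anywhere, and file.path() builds the path with forward slashes.

```r
old_wd <- getwd()                                 # remember the current working directory

# Hypothetical course folder, created under the session's temporary directory
course_dir <- file.path(tempdir(), "HSTS204")     # file.path joins with forward slashes
dir.create(course_dir, showWarnings = FALSE)

setwd(course_dir)                                 # change the working directory
print(getwd())

setwd(old_wd)                                     # change back
```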

Built in data

R has many built-in data sets, and further data sets are contained in the ISwR package, which is
not part of the base installation.

To install it, you need to be connected to the internet; type the following command in an R
session:

install.packages("ISwR", lib = .libPaths()[1])

Once installed, load it with library(ISwR).
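Built-in data sets can be used straight away; as a small illustration (cars is chosen arbitrarily from the datasets package, which is attached in every session):

```r
print(head(cars))   # first six rows of the built-in cars data frame
print(dim(cars))    # 50 rows, 2 columns (speed and dist)
```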

The R Help System

The R Graphical User Interface has a Help menu to find and display online documentation for R
objects, methods, datasets, and functions. Through the Help menu one can find several manuals
in PDF form, an HTML help page, and help search utilities. The help search utilities are also
available at the command line: help("keyword") displays the help page for "keyword", and
help.search("keyword") searches for all topics containing "keyword". The corresponding
shortcuts are ? and ??, respectively. The quotes are optional in help(), but are required for
special characters, and they are always required in help.search().

Example: type

?barplot   # displays the help page for barplot

??plot     # searches for anything containing "plot"

R example/Tutorial

R also provides a function, example, which runs the examples (if any exist) from the help page
for a keyword. To see the examples for the function mean, type example(mean).


Session management

The workspace

All variables created in R are stored in a common workspace. To see the variables that are
defined in the workspace, type ls() (list).

Objects can be deleted with rm() (remove), e.g. rm(x, y, z).

The workspace can be saved at any time using save.image(); by default it is saved in the
working directory to a file named .RData.

The commands typed in an R session are saved on exit (when you choose to save the workspace)
in a file called .Rhistory in the working directory. You can use a text editor to edit the .Rhistory file.
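A sketch of the workspace commands; the objects and the file name session.RData are hypothetical, and the file is written to the temporary directory rather than the working directory.

```r
x <- 1; y <- 2; z <- 3
print(ls())                 # includes "x" "y" "z"

rm(x, y)                    # remove x and y from the workspace
print(ls())                 # "z" remains

# Save the workspace to a hypothetical file in the temporary directory
ws_file <- file.path(tempdir(), "session.RData")
save.image(file = ws_file)
print(file.exists(ws_file)) # TRUE
```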

R objects

R does not provide direct access to the computer’s memory; rather, it provides a number of
specialized data structures, which we will refer to as objects. The entities that R creates and
manipulates are known as objects, and they are referred to through symbols or variables. In R,
however, the symbols are themselves objects and can be manipulated in the same way as any
other object.

During an R session, objects are created and stored by name. The R command
> objects()
(alternatively, ls()) can be used to display the names of (most of) the objects which are
currently stored within R. The collection of objects currently stored is called the workspace.

The list below gives some of the basic R objects:

1) Vectors
Vectors can be thought of as contiguous cells containing data. Cells are accessed through
indexing operations; for example, x[5] is the 5th element of the vector x.
R has six basic (‘atomic’) vector types: logical, integer, real, complex, string (or character)
and raw.
Single numbers, such as 4.2, and strings, such as "four point two" are still vectors, of length
1; there are no more basic types. Vectors with length zero are possible (and useful).
String vectors have mode and storage mode "character". A single element of a character
vector is often referred to as a character string.
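A sketch of indexing and of the six atomic types. Note that typeof() reports real numbers as "double".

```r
v <- c(10, 20, 30, 40, 50)
print(v[5])                      # 50: the 5th element

print(typeof(TRUE))              # "logical"
print(typeof(1L))                # "integer"
print(typeof(4.2))               # "double" (real)
print(typeof(2+3i))              # "complex"
print(typeof("four point two"))  # "character"
print(typeof(as.raw(255)))       # "raw"

print(length(4.2))               # 1: a single number is a vector of length 1
print(length(numeric(0)))        # 0: a zero-length vector
```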

2) Lists
Lists (“generic vectors”) are another kind of data storage. Lists have elements, each of which
can contain any type of R object, i.e. the elements of a list do not have to be of the same type.
List elements are accessed through three different indexing operations: [ ], [[ ]] and $.
Lists are vectors, and the basic vector types are referred to as atomic vectors where it is
necessary to exclude lists.
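The three indexing operations on a list, sketched with a hypothetical household record:

```r
# A hypothetical household record mixing types
hh <- list(name = "Dube Betty", size = 5, incomes = c(1200, 380))

print(hh[1])        # [ ]  : returns a sub-list (still a list)
print(hh[[2]])      # [[ ]]: returns the element itself, 5
print(hh$incomes)   # $    : returns the element by name
```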
3) Language objects


There are three types of objects that constitute the R language. They are calls, expressions,
and names. These objects have modes "call", "expression", and "name", respectively.
They can be created directly from expressions using the quote mechanism and converted to
and from lists by the as.list and as.call functions.
Symbol objects
Symbols refer to R objects. The name of any R object is usually a symbol. Symbols can be
created through the functions as.name and quote.
4) Expression objects
An expression contains one or more statements.
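A sketch of call, symbol (name) and expression objects using quote(), as.name() and expression():

```r
cl <- quote(mean(x))            # a call; nothing is evaluated yet
print(mode(cl))                 # "call"
print(as.list(cl))              # the function name and its argument
print(as.call(as.list(cl)))     # back to the call mean(x)

s <- as.name("x")               # a symbol
print(mode(s))                  # "name"

ex <- expression(1 + 2, 3 * 4)  # an expression object holding two statements
print(length(ex))               # 2
print(eval(ex[[2]]))            # 12
```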
5) Function objects
In R functions are objects and can be manipulated in much the same way as any other object.
Functions (or more precisely, function closures) have three basic components:
-a formal argument list: the argument list is a comma-separated list of arguments;
-a body: the body is a parsed R statement, which is usually a collection of statements in braces
(‘{’ and ‘}’), but it can be a single statement, a symbol or even a constant; and
-an environment: a function’s environment is the environment that was active at the time
that the function was created.
The syntax for writing a function is function ( arglist ) body. The declaration begins with the
keyword function, which indicates to R that you want to create a function.
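The three components of a function can be inspected with formals(), body() and environment(); sq is a hypothetical example function.

```r
sq <- function(x) x^2    # formal argument list (x) and body x^2

print(formals(sq))       # the formal argument list
print(body(sq))          # the body: x^2
print(environment(sq))   # the environment where sq was created
print(sq(3))             # 9
```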
6) NULL
There is a special object called NULL. It is used whenever there is a need to indicate or specify
that an object is absent. It should not be confused with a vector or list of zero length.
The NULL object has no type and no modifiable properties.
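A quick check of how NULL differs from zero-length objects:

```r
print(is.null(NULL))          # TRUE
print(length(NULL))           # 0
print(is.null(character(0)))  # FALSE: a zero-length vector is not NULL
print(is.null(list()))        # FALSE: neither is an empty list
```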
7) Builtin objects and special forms
These two kinds of object contain the builtin functions of R, i.e., those that are displayed as
.Primitive in code listings (as well as those accessed via the .Internal function and hence not
user-visible as objects). The difference between the two lies in the argument handling. Builtin
functions have all their arguments evaluated and passed to the internal function, in accordance
with call-by-value, whereas special functions pass the unevaluated arguments to the internal
function.
Other kinds of object include promise objects, dot-dot-dot (...), pairlist objects and environments.
Environments can be thought of as consisting of two things. A frame, consisting of a set of
symbol-value pairs, and an enclosure, a pointer to an enclosing environment.
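typeof() distinguishes builtin, special and ordinary (closure) functions, and an environment's frame and enclosure can be explored with assign(), get(), ls() and parent.env(). A sketch, where the environment e and the value 10 are illustrative:

```r
print(typeof(sum))     # "builtin": arguments evaluated before the internal code runs
print(typeof(`if`))    # "special": arguments passed unevaluated
print(typeof(mean))    # "closure": an ordinary R function

e <- new.env(parent = globalenv())    # frame + enclosure (here the global environment)
assign("a", 10, envir = e)            # add a symbol-value pair to the frame
print(get("a", envir = e))            # 10
print(ls(e))                          # "a"
print(identical(parent.env(e), globalenv()))  # TRUE: the enclosure
```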

8) Special compound Objects

(i)Factors
Factors are used to describe items that can have a finite number of values (categorical
variables). A factor may be purely nominal or may have ordered categories.
(ii)Data frame objects
Data frames are the R structures which most closely mimic the SAS or SPSS data set, i.e. a
“cases by variables” matrix of data.
A data frame is a list of vectors, factors, and/or matrices all having the same length (number
of rows in the case of matrices). In addition, a data frame generally has names for the
variables.
Objects Attributes
All objects except NULL can have one or more attributes attached to them. Attributes are stored
as a pairlist where all elements are named, but should be thought of as a set of name=value
pairs.
The following are the basic attributes of an object:
Names: a names attribute, when present, labels the individual elements of a vector or list. When
an object is printed, the names attribute, when present, is used to label the elements.


Dimensions: The dim attribute is used to implement arrays. The content of the array is stored in
a vector in column-major order and the dim attribute is a vector of integers specifying the
respective extents of the array. R ensures that the length of the vector is the product of the
lengths of the dimensions. For example, matrices and arrays are simply vectors with the
attribute dim attached to the vector. A dimension vector is a vector of non-negative integers.
Dimnames: arrays may name each dimension separately using the dimnames attribute, which is
a list of character vectors.
Classes: R has an elaborate class system, principally controlled via the class attribute. This
attribute is a character vector containing the list of classes that an object inherits from. This
forms the basis of the “generic methods” functionality in R.
Time series attributes: The tsp attribute is used to hold parameters of time series, start, end,
and frequency. This construction is mainly used to handle series with periodic substructure
such as monthly or quarterly data.
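A sketch of the basic attributes; the vector v, the matrix m and the monthly series sales are hypothetical.

```r
# names
v <- c(a = 1, b = 2, c = 3)
print(names(v))                # "a" "b" "c"

# dim: a vector becomes a 2 x 3 matrix, filled in column-major order
m <- 1:6
dim(m) <- c(2, 3)
print(m[, 1])                  # 1 2

# dimnames: a list of character vectors, one per dimension
dimnames(m) <- list(c("r1", "r2"), c("A", "B", "C"))
print(m["r2", "C"])            # 6

# class and tsp: a hypothetical monthly series starting January 2020
sales <- ts(1:24, start = c(2020, 1), frequency = 12)
print(class(sales))            # "ts"
print(tsp(sales))              # start, end and frequency
print(attributes(sales))
```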
Execution of commands in R

When a user types a command at the prompt (or when an expression is read from a file), the
command is transformed by the parser/compiler into an internal representation and the
evaluator executes parsed R expressions and returns the value of the expression. All
expressions have a value. This is the core of the language.
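The parse-then-evaluate cycle can be made explicit with parse() and eval(), as a sketch:

```r
e <- parse(text = "1 + 2 * 3")   # parser: text -> expression object
print(class(e))                  # "expression"
print(eval(e[[1]]))              # evaluator: 7
```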

Data Entry

Basics

Recall that R has objects and modes. Objects are anything that you can give a name. There are
many different classes of objects. The main classes of interest here are vector, matrix, factor, list,
and data frame. The mode of an object tells what kind of things are in it. The main modes of
interest here are logical, numeric, and character.

(i) Typing

Creating Vectors (atomic)

There are 3 functions which are used for creating vectors:

(a) c (combine), e.g. c(2,4,6,8,10)

There are three types of vectors created this way:

(i) Numerical vectors, e.g. c(2,4,6,8,10): a vector of numerical elements.

(ii) Character vectors, e.g. c("Gerald", "Peter", "Alfred", "Mildred", "Tafadzwa"): a vector of text
string elements, which are specified and printed in quotes; single and double quotes both work.

(iii) Logical vectors, e.g. c(T,T,F,T,F): each element can take the value TRUE, FALSE or NA.

(b) seq (sequence): used for equidistant series of numbers, e.g. seq(2, 10), or seq(4, 20, 2)

(c) rep (replicate): used to generate repeated values, e.g. x <- c(5,10,15), then rep(x, 4), rep(x, 1:3) or
rep(1:3, c(8,10,12))
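The three vector-creating functions, sketched with the values from the text:

```r
x <- c(5, 10, 15)

print(seq(2, 10))                     # 2 3 4 5 6 7 8 9 10
print(seq(4, 20, 2))                  # 4 6 8 10 12 14 16 18 20
print(rep(x, 4))                      # the whole of x repeated 4 times (12 values)
print(rep(x, 1:3))                    # 5 10 10 15 15 15: elements repeated 1, 2, 3 times
print(table(rep(1:3, c(8, 10, 12))))  # eight 1s, ten 2s, twelve 3s
```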

(ii) Reading from a text file

This is the most convenient way of reading data into R. Use the command read.table(path,
header=T). It requires the data to be in an ASCII (American Standard Code for Information
Interchange) text file, a format produced by any plain-text editor such as Windows Notepad. The
result is a data frame. The first line of the data can contain a header with the variable names.

The read.table command assumes fields are separated by white space. Variants of the
command are read.csv and read.csv2, which assume that fields are separated by commas and
semicolons, respectively. Other variants are read.delim and read.delim2, for reading delimited
files for which the default delimiter is the Tab character.
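A sketch of reading plain-text data; since no course data file is assumed to exist, a small hypothetical file is first written to the temporary directory.

```r
# Write a small hypothetical whitespace-separated file
txt <- tempfile(fileext = ".txt")
writeLines(c("id height", "1 1.75", "2 1.80", "3 1.65"), txt)

d <- read.table(txt, header = TRUE)   # first line holds the variable names
str(d)                                # a data frame with 3 obs. of 2 variables

# The same data as a comma-separated file, read back with read.csv
csv <- tempfile(fileext = ".csv")
write.csv(d, csv, row.names = FALSE)
d2 <- read.csv(csv)                   # header = TRUE is the default for read.csv
print(d2)
```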

(iii)Reading data from other statistical packages and spreadsheets

The simplest way is to request the package to export the data as a text file (one of the forms
stated in (ii) above). Alternatively, the foreign package is recommended for handling other formats
such as SPSS, SAS, Stata, Minitab etc.

Vectorised arithmetic

The construct c(...) is used to define vectors. Example:

> height <- c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91)

You can do calculations with vectors just like ordinary numbers, and operations are applied
element by element.

> weight <- c(60, 72, 57, 90, 95, 72)

> bmi <- weight/height^2

> bmi

Data frames

A data frame corresponds to what is commonly referred to as a data matrix or a data set. It is a
list of vectors and/or factors of the same length which are related across: elements in the same
position belong to the same case.

Creating a data frame manually:

Enter each variable as a vector, then combine the vectors as columns using data.frame().

e.g. > y1 = c(1,2,3,4,5)

> y2 = c("Y", "Y", "N", "N", "Y")

> y3 = c(7,8,9,10,11)

> ydata = data.frame(y1, y2, y3)

Handling categorical vectors (factors)

A factor is a vector object used to specify a discrete classification (grouping) of the components
of other vectors of the same length. R provides both ordered and unordered factors. A factor is
created using the factor() function applied to a vector of numbers or characters.
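Ordered factors are created by adding ordered = TRUE. The size band variable below is hypothetical, not a column of the table.

```r
# Hypothetical ordinal variable: household size band
band <- c("small", "large", "medium", "small")
bandf <- factor(band, levels = c("small", "medium", "large"), ordered = TRUE)

print(bandf)               # levels printed as small < medium < large
print(is.ordered(bandf))   # TRUE
print(bandf < "large")     # TRUE FALSE TRUE TRUE: comparisons respect the order
```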

Example: We want to capture the sex of the respondents in the data set in the table below.

Quesid | Head of Household name | Sex of Head of HH | Year of Birth of HHH | Marital Status of HH | Household size | Annual Income ($) | Beneficiary Status
1 | Dube Betty | Female | 1952 | Married | 5 | 1200 | Non Beneficiary
2 | Hove Tom | Male | 1988 | Single | 7 | 380 | Beneficiary
3 | Sadza Hama | Male | 1942 | Widowed | 12 | 900 | Beneficiary
4 | Hope Alice | Female | 1981 | Separated | 8 | 482 | Beneficiary
5 | Ndlovu Thuli | Female | 1988 | Married | 4 | 2400 | Non Beneficiary
6 | Sibanda Iso | Male | 1972 | Married | 11 | 800 | Beneficiary
7 | Chaipa Helna | Female | 1982 | Single | 5 | 680 | Beneficiary
8 | Moyo Alpha | Female | 1992 | Widowed | 4 | 800 | Beneficiary
9 | Donga Zet | Male | 1971 | Married | 6 | 450 | Beneficiary
10 | Ncube Mark | Male | 1938 | Widowed | 10 | 720 | Beneficiary
We would type the following at the command prompt:

> sex = c("Female", "Male", "Male", "Female", "Female", "Male", "Female", "Female", "Male", "Male")

> sexf = factor(sex)

> sexf

The command below can be used to get the levels of the factor directly without listing the
factor.

> levels(sexf)

Alternatively, we can create the factor by specifying the vector of values, the levels and the labels.

> ben = c(2,1,1,1,2,1,1,1,1,1)

> possible.ben = c(1,2)

> labels.ben = c("Beneficiary", "Non Beneficiary")

> benf = factor(ben, levels=possible.ben, labels=labels.ben)

Importing a data frame from plain text files

> dataframe_name = read.csv(path, header=T)  # to import from a csv file


The data Editor

R provides 2 ways of editing data interactively.

(i) Using the command data.entry: allows you to edit variables in the workspace.

e.g. > data.entry(x, y, z)   NB: this works only if the variables already exist in the workspace.

(ii) Using the data editor functions:

(a) fix: this command displays the data frame in the editor and saves your changes back to it:

> fix(dataframe_name)

Alternatively, in the R GUI you can use the Edit menu, then Data editor; you are then prompted to
select the data frame name.

(b) edit: data() was originally intended to allow users to load data sets from packages for use in
their examples. Typed on its own,

> data()

lists the example data sets available in the packages on the search path, and

> data(package="survival")

lists all example data sets in the survival package.

Then type newname <- edit(dataframe_name). This brings up a spreadsheet-like editor with a column
for each variable in the data frame. Inside the editor, you can move around with a mouse or
cursor keys and edit the current cell by typing in data. The edited copy is stored in newname; the
original data frame is left intact.

Missing values

R allows vectors to contain a special value, NA, for missing values; computations and operations
involving NA generally yield NA as the result.
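A sketch of how NA propagates, together with the na.rm argument that many summary functions (such as mean) accept for dropping missing values:

```r
w <- c(60, NA, 57)
print(w + 1)                  # 61 NA 58: NA propagates
print(mean(w))                # NA
print(mean(w, na.rm = TRUE))  # 58.5: missing values dropped first
print(is.na(w))               # FALSE TRUE FALSE
```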

Some basics on handling data frames

> nrow(dataframe_name)  # number of cases (rows) in the data frame

> str(dataframe_name)  # structure of the data frame, i.e. its dimensions as well as the name and
type of each variable
> names(dataframe_name)  # displays variable names

> dim(dataframe_name)  # dimensions of the data frame: number of rows and number of variables

> summary(dataframe_name)  # appropriate summary statistics for each variable

> dataframe_name$variable_name  # extracts a variable from a data frame
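The functions above, sketched on three variables typed in from the household table:

```r
hh <- data.frame(
  sex    = factor(c("Female", "Male", "Male", "Female", "Female",
                    "Male", "Female", "Female", "Male", "Male")),
  size   = c(5, 7, 12, 8, 4, 11, 5, 4, 6, 10),
  income = c(1200, 380, 900, 482, 2400, 800, 680, 800, 450, 720)
)

print(nrow(hh))     # 10 cases
str(hh)             # 10 obs. of 3 variables, with their types
print(names(hh))    # "sex" "size" "income"
print(dim(hh))      # 10 3
print(summary(hh))  # counts for the factor, quartiles and mean for numbers
print(hh$income)    # extract one variable
```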

Attaching and detaching a dataframe

If we attach the data frame, the variables can be referenced directly by name, without the dollar
sign operator.

> attach(dataframe_name)

> variable_name

We can detach the data frame when it is no longer needed:

> detach(dataframe_name)
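A sketch of attach() and detach() on a hypothetical data frame scores:

```r
scores <- data.frame(score = c(12, 15, 9), group = c("a", "b", "a"))

attach(scores)
total <- sum(score)     # score is found without writing scores$score
detach(scores)

print(total)            # 36
print(exists("score"))  # FALSE once the data frame is detached
```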

Indexing

Indexing is used to select data in a vector, e.g. for z <- 5:12, z[6] gives the element in position
6 of the vector z.

Indexing can also be used to select data in a data frame, e.g. d[6,5] reports the value of the
5th variable for the 6th subject in the data frame d, and d[6,] displays the elements in the 6th
row of d.

> d[d$v1>1,] can be used to select cases with v1>1 in the data set.

The transform function can be used to add transformed variables to a data set; assign the result
to keep the new variable, e.g.

> dataframe = transform(dataframe, newv1=log(v1))

Indexing can also be used to modify values in a vector or data frame, e.g. z[6] <- 25
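A sketch of the indexing operations on hypothetical objects z and d. Note that transform() returns a new data frame, so its result has to be assigned for the new variable to be kept.

```r
z <- 5:12
print(z[6])            # 10: the element in position 6

d <- data.frame(v1 = c(0.5, 2, 3), v2 = c("a", "b", "c"))
print(d[2, 1])         # 2: row 2, variable 1
print(d[2, ])          # the whole of row 2
print(d[d$v1 > 1, ])   # the cases with v1 > 1

d <- transform(d, newv1 = log(v1))  # assign the result to keep the new column
z[6] <- 25                          # modify a value by indexing
print(z)
```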

Deleting and adding variables to a data frame


The subset function can be used to select or delete variables from a data frame. Suppose n1 is a
data frame with variables x, y and z.

> n2 = subset(n1, select=c(x, z)) or n2 = subset(n1, select=-y) would produce a data frame n2 with
variable y dropped.

The subset command can also be used to select the cases that satisfy a condition:

e.g. > subset(dataframe, condition)

> n3 = subset(n1, subset=x>5) would produce a data frame n3 with only the cases where x>5

Appending a column to a dataframe

> n1$newvar = c(1,2,3,4,5,5). Please note that the number of elements in the new vector should be
equal to the number of rows (cases) in the data frame.
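The subset and column-appending steps, sketched on a hypothetical data frame n1 with variables x, y and z:

```r
n1 <- data.frame(x = c(2, 6, 8), y = c("a", "b", "c"), z = c(1.5, 2.5, 3.5))

n2 <- subset(n1, select = -y)      # drop y (same as select = c(x, z))
n3 <- subset(n1, subset = x > 5)   # keep only the cases with x > 5

n1$newvar <- c(10, 20, 30)         # one value per row of n1
print(n2); print(n3); print(n1)
```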

Descriptive Statistics

The function table() allows frequency tables to be calculated from factors of equal length.

> s <- table(sexf)

> table(x, y)  # prints a cross-tabulation of the two variables x and y

The commands table, xtabs and ftable are used for tabulating numeric vectors as well as
factor variables when the data are presented in a unit-wise database (one record per unit).

Marginal tables and relative frequency

Two-way or multiway tables need to be stored as a matrix (or table) object.

The margin.table and prop.table commands are used to get marginal and relative frequencies,
respectively, for multiway tables. E.g. margin.table(a, 2) would give the marginal frequencies for
the columns of a.
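A sketch of table(), margin.table() and prop.table() on the sex and beneficiary variables from the household table:

```r
sexf <- factor(c("Female", "Male", "Male", "Female", "Female",
                 "Male", "Female", "Female", "Male", "Male"))
benf <- factor(c(2, 1, 1, 1, 2, 1, 1, 1, 1, 1), levels = c(1, 2),
               labels = c("Beneficiary", "Non Beneficiary"))

a <- table(sexf, benf)       # two-way frequency table
print(a)
print(margin.table(a, 1))    # row totals (by sex)
print(margin.table(a, 2))    # column totals (by status)
print(prop.table(a))         # each cell as a proportion of the grand total
print(prop.table(a, 1))      # proportions within each row
```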

Matrices and arrays

In R, matrices and arrays are represented as vectors with dimensions. An array can be
considered as a multiply subscripted collection of data entries. Matrices can be created using
different functions:

(i) dim sets or changes the dimension attribute of an object, say y,

e.g. type the following

y <- 1:12

dim(y) <- c(3,4)

(ii) the matrix function, e.g. matrix(1:12, nrow=3, byrow=T)

(iii) cbind and rbind ‘glue’ vectors together columnwise or rowwise, respectively,

e.g. cbind(A=1:4, B=5:8, C=9:12), rbind(A=1:4, B=5:8, C=9:12)
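The matrix-creating functions, sketched to show the difference between column-wise and row-wise filling:

```r
y <- 1:12
dim(y) <- c(3, 4)                             # filled column by column
m2 <- matrix(1:12, nrow = 3, byrow = TRUE)    # filled row by row

print(y[1, ])    # 1 4 7 10
print(m2[1, ])   # 1 2 3 4

cb <- cbind(A = 1:4, B = 5:8, C = 9:12)       # vectors become columns
rb <- rbind(A = 1:4, B = 5:8, C = 9:12)       # vectors become rows
print(dim(cb))   # 4 3
print(dim(rb))   # 3 4
```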

Matrix operations:

The matrix product of A and B is given by:

> A %*% B

> eigen(x)  # eigenvalues and eigenvectors

> solve(x)  # inverse matrix

> t(x)  # transpose of a matrix
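A sketch of the matrix operations on two small hypothetical 2 x 2 matrices:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # a small symmetric matrix
B <- matrix(c(1, 0, 4, 1), nrow = 2)

print(A %*% B)        # matrix product (not the element-wise A * B)
print(t(B))           # transpose
Ainv <- solve(A)      # inverse
print(A %*% Ainv)     # the 2 x 2 identity, up to rounding
ev <- eigen(A)
print(ev$values)      # eigenvalues
print(ev$vectors)     # eigenvectors in the columns
```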

Graphics

One of the most attractive features of R is that it gives fine control over graphic components. You
can specify plotting parameters such as plotting characters, line types etc. Learn more using the
help function, e.g. ?par.

Plot

e.g. plot(x, y), using any two variables of your choice, or plot(x, y, pch=3, col="red", type="o")

type: "p" for points (the default), "l" for lines, "o" for overplotted points and lines, "b" for both
points and lines
For a:

vector: values plotted as a function of position

matrix: second column as a function of the first

data.frame: each column plotted as a function of the others
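A sketch of the plot types; the output goes to a temporary PDF file (an assumption, so the code also runs without a screen), and the data are arbitrary.

```r
out <- file.path(tempdir(), "plot-types.pdf")   # hypothetical output file
pdf(out)

x <- 1:10
y <- x^2
par(mfrow = c(2, 2))                            # 2 x 2 panels
plot(x, y, type = "p", main = 'type = "p"')
plot(x, y, type = "l", main = 'type = "l"')
plot(x, y, type = "o", main = 'type = "o"')
plot(x, y, type = "b", main = 'type = "b"')

dev.off()                                       # close the PDF device
print(file.exists(out))                         # TRUE
```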

The barplot()

For a:

vector: produces one bar for each position

matrix: one bar for each column, with successive values stacked in different shades

data.frame: error

Boxplot

For a:

vector: one box for whole vector

matrix: one box for each column (in current versions of R)

data.frame: one boxplot for each variable(column)
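A sketch of barplot() and boxplot() on each kind of argument, again drawing to a temporary PDF file; the data are hypothetical. In current versions of R, boxplot() on a matrix draws one box per column.

```r
out <- file.path(tempdir(), "bar-box.pdf")   # hypothetical output file
pdf(out)                                     # draw off-screen
par(mfrow = c(2, 3))

v  <- c(3, 5, 2)
m  <- matrix(1:6, nrow = 2)                  # 2 rows, 3 columns
df <- data.frame(a = c(1, 4, 2), b = c(7, 5, 6))

barplot(v, main = "vector")                  # one bar per element
barplot(m, main = "matrix")                  # one stacked bar per column
boxplot(v, main = "vector")                  # one box for the whole vector
boxplot(m, main = "matrix")                  # one box per column
boxplot(df, main = "data frame")             # one box per variable

# barplot() does not accept a data frame
bar_df <- tryCatch(barplot(df), error = function(e) "error")
dev.off()
print(bar_df)                                # "error"
```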

Multiple figures on one screen


The par() function is used to specify the number and layout of multiple figures on a page.

e.g. >par(mfrow=c(3,2)) sets the display for 3 rows and 2 columns, filled one row at a time
while >par(mfcol=c(3,2)) fills in the entries column wise.

E.g.:

> par(mfrow=c(2,2))

> brk = 10*1:10

> v1 = c(1,2,3,4,5,6,7,8,9,10)

> v2 = c(11,12,13,14,15,16,17,18,19,20)

> v3 = c(1,12,31,41,15,6,7,8,9,10)

> plot(v1, brk, pch=2, col="red3", type="o")

> plot(v2, brk, pch=3, col="green", type="b")

> plot(v3, brk, pch=7, col="yellow", type="l")

> plot(v3, v2, pch=12, col="blue", xlab="X axis", ylab="y axis", main="The Graph")

> dev.new()  # opens a new graphics device (to have multiple graphics windows)

> dev.off()  # closes the active graphics device (window)

Probability functions:

R provides functions for the density, cumulative distribution function (CDF), percentiles, and for
generating random variates for many commonly applied distributions. For the Poisson
distribution these functions are dpois, ppois, qpois, and rpois, respectively:

dpois(x, lambda, log = FALSE)
ppois(q, lambda, lower.tail = TRUE, log.p = FALSE)
qpois(p, lambda, lower.tail = TRUE, log.p = FALSE)
rpois(n, lambda)

where:
x: vector of (non-negative integer) quantiles.
q: vector of quantiles.
p: vector of probabilities.
n: number of random values to return.
lambda: vector of (non-negative) means.
log, log.p: logical; if TRUE, probabilities p are given as log(p).
lower.tail: logical; if TRUE (default), probabilities are P[X ≤ x], otherwise P[X > x].
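A sketch of the four Poisson functions with lambda = 3 (an arbitrary choice):

```r
lambda <- 3
print(dpois(2, lambda))                       # P(X = 2) = exp(-3) * 3^2 / 2!
print(ppois(2, lambda))                       # P(X <= 2)
print(ppois(2, lambda, lower.tail = FALSE))   # P(X > 2) = 1 - P(X <= 2)
print(qpois(0.5, lambda))                     # median: 3
set.seed(1)                                   # makes the random draws reproducible
print(rpois(5, lambda))                       # five random counts
```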
For the normal distribution these functions are dnorm, pnorm, qnorm, and rnorm:

dnorm(x, mean = 0, sd = 1, log = FALSE)
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean = 0, sd = 1)

where:
x, q: vector of quantiles.
p: vector of probabilities.
n: number of observations. If length(n) > 1, the length is taken to be the number required.
mean: vector of means.
sd: vector of standard deviations.
log, log.p: logical; if TRUE, probabilities p are given as log(p).
lower.tail: logical; if TRUE (default), probabilities are P[X ≤ x], otherwise P[X > x].
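A sketch of the normal-distribution functions; the mean of 100 and standard deviation of 15 in one call, and the values used for rnorm, are arbitrary.

```r
print(dnorm(0))                          # density at 0: 1 / sqrt(2 * pi)
print(pnorm(1.96))                       # P(Z <= 1.96), about 0.975
print(pnorm(1.96, lower.tail = FALSE))   # P(Z > 1.96), about 0.025
print(qnorm(0.975))                      # about 1.96
print(pnorm(110, mean = 100, sd = 15))   # a non-standard normal
set.seed(42)
z <- rnorm(1000, mean = 10, sd = 2)      # 1000 random draws
print(c(mean(z), sd(z)))                 # close to 10 and 2
```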
For the binomial distribution the functions are:

dbinom(x, size, prob, log = FALSE)
pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE)
qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE)
rbinom(n, size, prob)

where:
x, q: vector of quantiles.
p: vector of probabilities.
n: number of observations. If length(n) > 1, the length is taken to be the number required.
size: number of trials (zero or more).
prob: probability of success on each trial.
log, log.p: logical; if TRUE, probabilities p are given as log(p).
lower.tail: logical; if TRUE (default), probabilities are P[X ≤ x], otherwise P[X > x].
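A sketch of the binomial functions for 10 trials with success probability 0.5 (an arbitrary example):

```r
print(dbinom(3, size = 10, prob = 0.5))    # P(X = 3) = choose(10, 3) / 2^10
print(pbinom(3, size = 10, prob = 0.5))    # P(X <= 3)
print(qbinom(0.5, size = 10, prob = 0.5))  # median: 5
set.seed(1)
print(rbinom(5, size = 10, prob = 0.5))    # five random counts out of 10 trials
```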

R programming

A key feature of R is that it is possible to write your own R functions. R is a true
programming language that allows conditional execution and looping constructs.

Functions

R users interact with the software primarily through functions. The syntax of a function definition is

f <- function(x, ...) {
  body
}

where f is the name of the function, x is the name of the first argument (there can be several
arguments), and ... indicates possible additional arguments.

Functions can also be defined with no arguments. The curly brackets enclose the body of the
function. The return value of a function is the value of the last expression evaluated.
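A sketch of user-written functions; cv and greet are hypothetical names. It shows ... passing extra arguments on to other functions, a function with no arguments, and the last expression becoming the return value.

```r
# Hypothetical function: coefficient of variation; arguments in ... are passed on
cv <- function(x, ...) {
  s <- sd(x, ...)
  m <- mean(x, ...)
  s / m                  # last expression evaluated is the return value
}

# A function with no arguments
greet <- function() {
  "Hello"
}

print(cv(c(1, NA, 3), na.rm = TRUE))   # na.rm = TRUE reaches sd() and mean() through ...
print(greet())
```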

Conditional Statements

Conditional statements perform different computations or actions depending on whether a specified
Boolean condition evaluates to TRUE or FALSE.

if(cond) expr
If the condition is TRUE, the expression is executed.

if(cond) cons.expr else alt.expr

An if-else statement evaluates the condition and executes the code in the if block if the condition
is TRUE, and executes the code in the else block if the condition is FALSE.

####### IF STATEMENTS

# IF

x <- 1
if (x == 1) x <- x + 1
print(x)

if (x < 10) {
  t <- x^2
  print(t + x)
}

x <- 7
if (x <= 10 & 5 <= x) {
  print("X is between 5 and 10")
}

x <- 7
if (x > 10 | 5 > x) {
  print("X is outside 5 and 10")
}

# Else statements

x <- 5
if (x > 10 | 5 > x) {
  print("X is outside 5 and 10")
} else {
  print("X is between 5 and 10")
}

# If/Else ladders

x <- 0
if (x < 0) {
  print("Negative number")
} else if (x > 0) {
  print("Positive number")
} else {
  print("Zero")
}

LOOPS

Loop: cycling or iterating.

A loop is a programming structure that repeats a sequence of instructions until a specific condition is
met.

Loop statements usually have four components: initialization (usually of a loop control variable),
a continuation test on whether to do another iteration, an update step, and a loop body.

The control statement is a combination of conditions that directs the body of the loop to
execute until the specified condition becomes false.

A loop statement tests a condition and enters the body of the loop or exits the loop based on the
test result (true or false). Typical loops in programming languages are for, while and do-while;
R provides for, while and repeat.

Loops can be finite or infinite and can be classified into 2 categories:

1. Entry-controlled loop: a condition is checked before executing the body of the loop. It is also
called a pre-checking loop.

2. Exit-controlled loop: a condition is checked after executing the body of the loop. It is also
called a post-checking loop.

The use of some common loop statements:

WHILE loop

while (test_cond) { expr }    # entry controlled

If the condition is true, then and only then is the body of the loop executed. After the body has
been executed, control goes back to the beginning and the condition is checked again; while it is
true, the same process is repeated. Once the condition becomes false, control passes out of the
loop.

[Flow chart: while loop]

FOR loop
for (var in sequence) expr
(Compare the C-style form for (initial value; condition; increment or decrement) { statements; },
which R does not use.)

The sequence is a vector.

A do...while loop in C is similar to the while loop except that the condition is always checked
after the body of the loop has executed. It is also called an exit-controlled loop. R has no
do...while statement; the same effect is obtained with repeat and break.

[Flow chart]

REPEAT loop

repeat expr

A repeat loop is used to iterate over a block of code multiple times.

There is no condition check in a repeat loop to exit the loop. Therefore we must put a condition
explicitly inside the body of the loop and use the break statement to exit the loop.

[Flow chart: repeat loop]
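R has no do...while statement; a sketch of a repeat loop with break, which behaves as an exit-controlled loop (the counter i and the limit 5 are arbitrary):

```r
i <- 0
repeat {
  i <- i + 1            # the body always runs at least once
  if (i >= 5) break     # the exit test comes after the body, as in a do...while loop
}
print(i)                # 5
```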

Jump Statements in a loop


Depending on the loop type, it may be necessary to include a statement to terminate the loop at a
particular iteration or to skip a particular iteration.

(i) A next statement is useful when we want to skip the current iteration of a loop without
terminating it. On encountering next, R skips further evaluation and starts the
next iteration of the loop. That is, next halts the processing of the current iteration and
advances the looping index.

(ii) A break statement is used inside a loop to stop the iterations and pass control
outside the loop, i.e. control is transferred to the first statement after the innermost
loop.

In a nested looping situation, where there is a loop inside another loop, break exits only
from the innermost loop that is being evaluated.
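A sketch of next and break, including break inside a nested loop; the limits used are arbitrary.

```r
kept <- c()
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip the rest of this iteration for even i
  if (i > 7) break        # leave the loop completely once i exceeds 7
  kept <- c(kept, i)
}
print(kept)               # 1 3 5 7

count <- 0
for (i in 1:3) {
  for (j in 1:3) {
    if (j == 2) break     # exits only the inner loop
    count <- count + 1
  }
}
print(count)              # 3: the outer loop ran all three times
```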

Examples (adapted from the R pdf manual)

for(i in 1:5) print(1:i)


for(n in c(2,5,10,20,50)) {
x <- stats::rnorm(n)
cat(n, ": ", sum(x^2), "\n", sep = "")
}
f <- factor(sample(letters[1:5], 10, replace = TRUE))
for(i in unique(f)) print(i)

Martin examples

for(i in 1:10){
print(i)
}
vec1<-numeric(0)
for(i in 1:10){
vec1[i]<-i^2
}

for(i in c(1,3,6,7)){
print(vec1[i])
}

vec2<-numeric(0)

for(i in c(1,3,5,7,9)){
vec2[i]<-i
}

## WHILE loops


counter<-1

while(counter<10){
counter<-counter+1
}

print(counter)

counter<-1

while(counter %% 5 != 0){
counter<-counter+1
}
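for (i in seq_along(x)){....}   # seq_along(x) gives 1, 2, ..., length(x), and nothing if x is empty
In summary for R: vectorised calculations are usually much faster than applying the same function
to each element of the vector individually in a loop, so explicit loops are best avoided where a
vectorised alternative exists.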
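A sketch comparing a loop with the equivalent vectorised calculation; timings are machine dependent, so only the equality of the results is checked.

```r
x <- 1:10000

# Loop version: preallocate the result, then fill it element by element
loop_time <- system.time({
  sq_loop <- numeric(length(x))
  for (i in seq_along(x)) {
    sq_loop[i] <- x[i]^2
  }
})

# Vectorised version: one operation on the whole vector
vec_time <- system.time(sq_vec <- x^2)

print(identical(sq_loop, sq_vec))   # TRUE: the results agree
print(loop_time)                    # timings vary by machine
print(vec_time)
```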

Apply functions in R
# sapply - apply a function over a vector / over an object and return a simplified object (a vector or
array) if possible

# lapply - apply a function over a list / over an object and return a list

# apply - apply a function over the margins of an array (e.g. the rows or
columns of a matrix)

# mapply - apply a multivariate function over two (or more) vectors, element by element

# tapply - allows one to do analysis by a categorising variable (disaggregated analysis), i.e.
tapply() applies a function to each group of components of the first argument, where the
groups are defined by the levels of the second argument.

Examples

x<-1:10

sapply(X = x, FUN = sqrt)

y<-matrix(1:9,nrow = 3)

apply(X = y,MARGIN = 1,FUN = sqrt)

z<-list(-1,2,5,-7,5,8,10,-3)

lapply(X = z, FUN = abs)

x<-2:5

n<-3:6


mapply(FUN = log,x = x,base = n)
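A sketch contrasting the shapes of the results returned by lapply, sapply, apply and mapply; the inputs are arbitrary.

```r
z <- list(-1, 2, 5, -7)
r1 <- lapply(z, abs)      # always a list
r2 <- sapply(z, abs)      # simplified to a numeric vector
print(class(r1))          # "list"
print(r2)                 # 1 2 5 7

y <- matrix(1:6, nrow = 2)
print(apply(y, 1, sum))   # row sums: 9 12
print(apply(y, 2, sum))   # column sums: 3 7 11
print(mapply(function(a, b) a + b, 1:3, 4:6))   # 5 7 9: pairs elements of two vectors
```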

Example: we may require the average income by sex of the HHH.

Enter the income vector in the same order as the categorising variable:

> income = c(1200,380,900,482,2400,800,680,800,450,720)

> incmeans <- tapply(income, sexf, mean)
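The tapply example run end to end with all ten households from the table:

```r
sexf <- factor(c("Female", "Male", "Male", "Female", "Female",
                 "Male", "Female", "Female", "Male", "Male"))
income <- c(1200, 380, 900, 482, 2400, 800, 680, 800, 450, 720)

incmeans <- tapply(income, sexf, mean)
print(incmeans)                      # Female 1112.4, Male 650.0
print(tapply(income, sexf, length))  # number of households in each group
```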

Regression and ANOVA

The lm function is used to fit linear models. The argument of lm is a model formula, in which
the tilde symbol (~) stands for “described by”. E.g.

model = lm(y ~ x1 + x2)  # regression of y on x1 and x2; an intercept is included automatically.
To exclude the intercept, add -1:

e.g. model3 = lm(y ~ x1 + x2 - 1)

The lm function produces a model object, and more information about the model can be
obtained using extractor functions.

The most basic extractor function is summary(model), which gives the fitted model coefficients
and other summary statistics.

Other extractor functions include:

anova(model)  # gives the ANOVA table

model$coefficients or alternatively coef(model)  # model coefficients

model$residuals or alternatively resid(model)  # model residuals

model$fitted.values or alternatively fitted(model)  # fitted values

predict(model)  # with no further arguments, just produces the fitted values

predict(model, int="c")  # confidence bands (narrow bands), which reflect
uncertainty about the line itself (int is short for the interval argument)

predict(model, int="p")  # prediction bands (wide bands), which also include
uncertainty about future observations

EXAMPLE

Consider a simple linear regression model where

> y <- c(38,39,36,45,33,43,38,38,27,34,24,32,31,21,28)

> x <- c(21,26,22,28,19,34,26,29,18,25,23,29,30,16,29)

> model = lm(y~x)

We may want to predict for future values of x, say from 35 to 40:

> pred.frame = data.frame(x=35:40)

> pp = predict(model, int="p", newdata=pred.frame)

> pc = predict(model, int="c", newdata=pred.frame)

> plot(x, y, xlim=range(x, pred.frame$x), ylim=range(y, pp))

> abline(lm(y~x))

> pred.x = pred.frame$x

> matlines(pred.x, pc, lty=c(1,2,2), col="red")

> matlines(pred.x, pp, lty=c(1,3,3), col="black")
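A sketch of the extractor functions applied to the model from this example, written with the full argument name interval (which int abbreviates); the new x values 20 and 30 are arbitrary.

```r
y <- c(38,39,36,45,33,43,38,38,27,34,24,32,31,21,28)
x <- c(21,26,22,28,19,34,26,29,18,25,23,29,30,16,29)

model <- lm(y ~ x)
print(summary(model))   # coefficients, standard errors, R-squared, ...
print(anova(model))     # ANOVA table
print(coef(model))      # same as model$coefficients

# Fitted values plus residuals reproduce the observations
print(all.equal(unname(fitted(model) + resid(model)), y))

# The same model without an intercept has a single coefficient
model0 <- lm(y ~ x - 1)
print(coef(model0))

new <- data.frame(x = c(20, 30))
pc <- predict(model, newdata = new, interval = "confidence")
pp <- predict(model, newdata = new, interval = "prediction")
print(pp[, "upr"] - pp[, "lwr"] > pc[, "upr"] - pc[, "lwr"])  # prediction intervals are wider
```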

R Scripts

R commands can be placed in a file, called an R script, and run either with the source function or
by copying and pasting. The source function causes R to accept input from the named source,
such as a file.

In the R GUI, users can open a new script window through the File menu. R scripts are saved
with the extension .R.

When a script is run with source, expressions are not auto-printed, so we need to add print
statements to the script so that the values of objects are printed. The command you
type is

source("filename.R"). This assumes the script file is saved in your working directory.

Example

Create a script file and name it trialdata.R


Type the following in the script file:
# trialdata
k=c(0,1,2,3,4)
x =c(109,65, 22, 3, 1)
p =x / sum(x) #relative frequencies


print(p)
r =sum(k *p) #mean
v =sum(x *(k - r)^2) / 199 #variance
print(r)
print(v)
f =dpois(k, r)
print(cbind(k, p, f))
On the R console, type the command source("trialdata.R")
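A sketch of the difference between auto-printing and print() under source(). The script written here is a cut-down, hypothetical version of trialdata.R, placed in the temporary directory.

```r
script <- file.path(tempdir(), "trialdata.R")
writeLines(c(
  "k <- c(0, 1, 2, 3, 4)",
  "x <- c(109, 65, 22, 3, 1)",
  "r <- sum(k * x) / sum(x)   # mean",
  "r                          # NOT printed by source() without echo",
  "print(r)                   # printed"
), script)

source(script)                 # shows only the print(r) output
source(script, echo = TRUE)    # echoes each command and auto-prints values
```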

Alternatively a user may:

• Select lines and click the button ‘Run line or selection’ on the toolbar.
• Copy the lines, and then paste the lines at the command prompt.
• (For Windows users:) To execute one or more lines of the file in the R
GUI editor, select the lines and type Ctrl-R.
