Unit 2 R

R Programming NEP 5th SEM BCA

Uploaded by

sabnam pradhan

Faculty of Computer Applications

DSC14- Statistical Computing And R Programming

V Semester BCA

Prepared By
Sabnam Pradhan
Professor and Faculty of Computer Applications
Unit-2

Reading and writing files, Programming, Calling Functions, Conditions and Loops: stacking statements, coding loops, Writing Functions, Exceptions, Timings, and Visibility.

----------------------------------------------------------------------------------------------
Reading in External Data Files
The Table Format Files:
Table-format files are best thought of as plain-text files with three key features that fully define how R should
read the data.
Header: If a header is present, it’s always the first line of the file. This optional feature is used to
provide names for each column of data.
Delimiter: The all-important delimiter is a character used to separate the entries in each line.
Missing value: This is another unique character string used exclusively to denote a missing value.
When reading the file, R will turn these entries into the form it recognizes: NA.

Typically, these files have a .txt extension (highlighting the plain-text style).

R> mydatafile <- read.table(file="/Users/tdavies/mydatafile.txt",header=TRUE,sep=" ",
na.strings="*",stringsAsFactors=FALSE)

• header is a logical value telling R whether the file has a header (TRUE in this case).
• sep takes a character string providing the delimiter (a single space, " ", in this case).
• na.strings specifies the character string used to denote missing values ("*" in this case).
• stringsAsFactors=FALSE keeps character data as strings; it prevents R from converting all non-numeric columns to factors. (From R 4.0.0 onward, FALSE is already the default.)
Or for CSV files:
data <- read.csv("filename.csv")
Reading Excel Files:
library(readxl)
data <- read_excel("filename.xlsx")

Web-Based Files:
R> dia.url <- "http://www.amstat.org/publications/jse/v9n2/4cdata.txt"
R> diamonds <- read.table(dia.url)

Writing Out Data Files


The function for writing table-format files to your computer is write.table.

R> write.table(x=mydatafile,file="/Users/tdavies/somenewfile.txt",
sep="@",na="??",quote=FALSE,row.names=FALSE)

• This command creates a new table-format file called somenewfile.txt in the specified folder location, delimited by @ and with missing values denoted by ??.
• Because mydatafile has variable names, these are automatically written to the file as a header.
• The optional logical argument quote determines whether to enclose each non-numeric entry in double quotes.
• The optional logical argument row.names determines whether to include the row names of mydatafile.
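The reading and writing functions can be checked against each other with a round trip. The following is only a sketch: the data frame scores and its contents are made up for illustration, and tempfile() is used so that no particular folder is assumed.

```r
# Hypothetical data frame with one missing value (not from the notes).
scores <- data.frame(name=c("Asha","Ravi","Meena"),
                     marks=c(78,NA,91),
                     stringsAsFactors=FALSE)

# Write to a temporary file: "@" as delimiter, "??" for missing values.
path <- tempfile(fileext=".txt")
write.table(x=scores,file=path,sep="@",na="??",
            quote=FALSE,row.names=FALSE)

# Read it back, giving read.table the same header, delimiter and NA code.
scores2 <- read.table(file=path,header=TRUE,sep="@",
                      na.strings="??",stringsAsFactors=FALSE)

print(scores2)
unlink(path)  # remove the temporary file
```

If the delimiter or the missing-value string given to read.table differs from the one used when writing, the file will not be read back as intended, so the two calls should be kept consistent.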

Ad Hoc Object Read/Write Operations


If you need to read or write other kinds of R objects, such as lists or arrays, you'll need the dput and dget commands: dput writes a plain-text representation of an object to a file, and dget reads it back.

eg.,

R> somelist <- list(foo=c(5,2,45),bar=matrix(data=c(T,T,F,F,F,F,T,F,T),nrow=3,ncol=3),
baz=factor(c(1,2,2,3,1,1,3),levels=1:3,ordered=T))
R> somelist
$foo
[1] 5 2 45
$bar
[,1] [,2] [,3]
[1,] TRUE FALSE TRUE
[2,] TRUE FALSE FALSE
[3,] FALSE FALSE TRUE
$baz
[1] 1 2 2 3 1 1 3
Levels: 1 < 2 < 3

R> dput(x=somelist,file="/Users/tdavies/myRobject.txt")
R> newobject <- dget(file="/Users/tdavies/myRobject.txt")
R> newobject
$foo
[1] 5 2 45
$bar
[,1] [,2] [,3]
[1,] TRUE FALSE TRUE
[2,] TRUE FALSE FALSE
[3,] FALSE FALSE TRUE
$baz
[1] 1 2 2 3 1 1 3
Levels: 1 < 2 < 3

PROGRAMMING

CONDITIONS AND LOOPS


if Statements:
An if statement runs a block of code only if a certain condition is TRUE.
Stand-Alone Statement:
if(condition){
do any code here
}
Eg.,
R> a <- 3
R> mynumber <- 4
if(a<=mynumber){
a <- a^2
}

To illustrate a more complicated if statement, consider the following Objects:


R> myvec <- c(2.73,5.40,2.15,5.29,1.36,2.16,1.41,6.97,7.99,9.52)
R> myvec
[1] 2.73 5.40 2.15 5.29 1.36 2.16 1.41 6.97 7.99 9.52
R> mymat <- matrix(c(2,0,1,2,3,0,3,0,1,1),5,2)
if(any((myvec-1)>9)||matrix(myvec,2,5)[2,1]<=6){
cat("Condition satisfied --\n")
new.myvec <- myvec
new.myvec[seq(1,9,2)] <- NA
mylist <- list(aa=new.myvec,bb=mymat+0.5)
cat("-- a list with",length(mylist),"members now exists.")
}

O/P
Condition satisfied --
-- a list with 2 members now exists.

R> mylist
$aa
[1] NA 5.40 NA 5.29 NA 2.16 NA 6.97 NA 9.52
$bb

[,1] [,2]
[1,] 2.5 0.5
[2,] 0.5 3.5
[3,] 1.5 0.5
[4,] 2.5 1.5
[5,] 3.5 1.5

else Statements:
if(condition){
do any code in here if condition is TRUE
} else {
do any code in here if condition is FALSE
}

if(a<=mynumber){
cat("Condition was",a<=mynumber)
a <- a^2
} else {
cat("Condition was",a<=mynumber)
a <- a-3.5
}
a

Using ifelse for Element-wise Checks


An if statement can check the condition of only a single logical value. To check every element of a logical vector at once, use ifelse:
R> x <- 5
R> y <- -5:5
R> y
[1] -5 -4 -3 -2 -1 0 1 2 3 4 5

R> result <- ifelse(test=y==0,yes=NA,no=x/y)


R> result
[1] -1.000000 -1.250000 -1.666667 -2.500000 -5.000000 NA 5.000000 2.500000
[9] 1.666667 1.250000 1.000000
Three arguments must be specified: test takes a logical-valued data structure, yes provides the element to
return if the condition is satisfied, and no gives the element to return if the condition is FALSE.

Nesting and Stacking Statements


An if statement can itself be placed within the outcome of another if statement.

if(a<=mynumber){
  cat("First condition was TRUE\n")
  a <- a^2
  if(mynumber>3){
    cat("Second condition was TRUE")
    b <- seq(1,a,length=mynumber)
  } else {
    cat("Second condition was FALSE")
    b <- a*mynumber
  }
}
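Stacking, as opposed to nesting, chains several conditions with else if so that at most one branch is executed. A minimal sketch follows; the function name grade_for and the cut-off marks are hypothetical and chosen only for illustration.

```r
# grade_for is a hypothetical helper; the cut-offs are illustrative only.
grade_for <- function(marks){
  if(marks>=75){
    grade <- "Distinction"
  } else if(marks>=60){
    grade <- "First class"
  } else if(marks>=40){
    grade <- "Pass"
  } else {
    grade <- "Fail"
  }
  return(grade)
}

cat(grade_for(82),grade_for(61),grade_for(12),"\n")
```

The conditions are tested from top to bottom, and the first one that is TRUE decides the result; the final else catches everything that is left.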

The switch Function


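When a decision depends on which of several possible values an object takes, a long chain of stacked if-else statements becomes cumbersome. R can handle this type of multiple-choice decision in a far more compact form via the switch function.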
Syntax:
switch(expression, case1, case2, case3....)
eg.,
R> mystring <- "Lisa"
R> foo <- switch(EXPR=mystring,Homer=12,Marge=34,Bart=56,Lisa=78,Maggie=90,NA)
R> foo
[1] 78

R> mynum <- 3


R> foo <- switch(mynum,12,34,56,78,NA)
R> foo
[1] 56
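Two further behaviours of switch are worth trying out. In the character form, an entry left empty falls through to the next non-empty entry, and a final untagged entry acts as the default. In the numeric form there is no default: an index outside the range of options returns NULL. The sketch below uses a hypothetical helper, day_type, to show both.

```r
# day_type is a hypothetical helper built on switch.
day_type <- function(day){
  switch(EXPR=day,
         Saturday=,          # empty entry: falls through to the next one
         Sunday="weekend",
         "weekday")          # untagged final entry is the default
}

cat(day_type("Saturday"),day_type("Sunday"),day_type("Monday"),"\n")

# Numeric form: index 7 is out of range, so the result is NULL.
out.of.range <- switch(7,12,34,56)
print(is.null(out.of.range))
```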

Coding Loops
for Loops:
Syntax:
for(loopindex in loopvector){
do any code in here
}

• The loopindex is a placeholder that represents an element in the loopvector: it starts off as the first element in the vector and moves to the next element with each loop repetition.

Eg 1.,
for(myitem in 5:7){
cat("--BRACED AREA BEGINS--\n")
cat("the current item is",myitem,"\n")
cat("--BRACED AREA ENDS--\n\n")
}

O/P:
--BRACED AREA BEGINS--
the current item is 5
--BRACED AREA ENDS--
--BRACED AREA BEGINS--
the current item is 6
--BRACED AREA ENDS--
--BRACED AREA BEGINS--
the current item is 7
--BRACED AREA ENDS--

Eg 2.,
R> foo <- list(aa=c(3.4,1),bb=matrix(1:4,2,2),cc=matrix(c(T,T,F,T,F,F),3,2),
dd="string here",ee=matrix(c("red","green","blue","yellow")))

R> name <- names(foo)


R> name
[1] "aa" "bb" "cc" "dd" "ee"
R> is.mat <- rep(NA,length(foo))
R> is.mat
[1] NA NA NA NA NA
R> nr <- is.mat
R> nc <- is.mat
R> data.type <- is.mat
for(i in 1:length(foo)){
member <- foo[[i]]
if(is.matrix(member)){
is.mat[i] <- "Yes"
nr[i] <- nrow(member)
nc[i] <- ncol(member)
data.type[i] <- class(as.vector(member))
} else {
is.mat[i] <- "No"
}
}
bar <- data.frame(name,is.mat,nr,nc,data.type,stringsAsFactors=FALSE)
R> bar
name is.mat nr nc data.type
1 aa No NA NA <NA>
2 bb Yes 2 2 integer
3 cc Yes 3 2 logical
4 dd No NA NA <NA>
5 ee Yes 4 1 character

Nesting for Loops


• When a for loop is nested in another for loop, the inner loop is executed in full before the outer loop's loopindex is incremented, at which point the inner loop is executed all over again.
R> loopvec1 <- 5:7
R> loopvec2 <- 9:6
R> foo <- matrix(NA,length(loopvec1),length(loopvec2))
R> for(i in 1:length(loopvec1)){
+ for(j in 1:length(loopvec2)){
+ foo[i,j] <- loopvec1[i]*loopvec2[j]
+ }
+ }
R> foo
[,1] [,2] [,3] [,4]
[1,] 45 40 35 30
[2,] 54 48 42 36
[3,] 63 56 49 42

while Loops

A while loop repeats the braced code for as long as loopcondition evaluates to TRUE.
while(loopcondition){
do any code in here
}
myval <- 5
while(myval<10){
myval <- myval+1
cat("\n'myval' is now",myval,"\n")
cat("'mycondition' is now",myval<10,"\n")
}

Implicit Looping with apply


The apply function is the most basic form of implicit looping: it takes a function and applies it to each margin of an array.

Syntax: apply(X, MARGIN, FUN)

• X: the input array (a matrix is a two-dimensional array).
• MARGIN: the dimension to operate over. If MARGIN is 1 the function is applied to each row; if MARGIN is 2 it is applied to each column.
• FUN: the function to apply to each row or column.
R> foo <- matrix(1:12,4,3)
R> foo
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
R> row.totals2 <- apply(X=foo,MARGIN=1,FUN=sum)
R> row.totals2
[1] 15 18 21 24
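To see what MARGIN=2 does and why apply counts as implicit looping, the column totals of the same matrix can be computed both ways. This is only a sketch; the object names col.totals and col.totals.loop are not from the notes.

```r
foo <- matrix(1:12,4,3)

# MARGIN=2 applies the function to each column instead of each row.
col.totals <- apply(X=foo,MARGIN=2,FUN=sum)

# The same column totals computed with an explicit for loop.
col.totals.loop <- rep(NA,ncol(foo))
for(j in 1:ncol(foo)){
  col.totals.loop[j] <- sum(foo[,j])
}

print(col.totals)
print(col.totals.loop)
```

Both approaches give one total per column; apply simply hides the loop index and the pre-allocated result vector.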

Declaring break or next


• You can preemptively terminate a loop by declaring break.
• You can use next to skip the rest of the current iteration and advance to the next one.

R> foo <- 5
R> bar <- c(2,3,1.1,4,0,4.1,3)
R> loop1.result <- rep(NA,length(bar))
R> loop1.result
[1] NA NA NA NA NA NA NA
R> for(i in 1:length(bar)){
+ temp <- foo/bar[i]
+ if(is.finite(temp)){
+ loop1.result[i] <- temp
+ } else {
+ break
+ }
+ }
R> loop1.result
[1] 2.500000 1.666667 4.545455 1.250000 NA NA NA

R> loop2.result <- rep(NA,length(bar))


R> loop2.result
[1] NA NA NA NA NA NA NA
R> for(i in 1:length(bar)){
+ if(bar[i]==0){
+ next
+ }
+ loop2.result[i] <- foo/bar[i]
+ }
R> loop2.result
[1] 2.500000 1.666667 4.545455 1.250000 NA 1.219512 1.666667

The repeat Statement


• A repeat statement doesn't include any kind of loopindex or loopcondition.
• To stop repeating the code inside the braces, you must use a break declaration inside the braced area.
Syntax:
repeat{
statements
....
if(expression){
break
}
}
Example:
x = 1
# Print 1 to 5
repeat{
print(x)
x = x + 1
if(x > 5){
break
}
}
WRITING FUNCTIONS

A function definition always follows this standard format:

functionname <- function(arg1,arg2,arg3,...){
do any code in here when called
return(returnobject)
}
Eg.,
myfib <- function(){
fib.a <- 1
fib.b <- 1
cat(fib.a,", ",fib.b,", ",sep="")
repeat{
temp <- fib.a+fib.b
fib.a <- fib.b
fib.b <- temp
cat(fib.b,", ",sep="")

if(fib.b>150){
cat("BREAK NOW...")
break
}
}
}

R> myfib()
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, BREAK NOW...

Adding Arguments

myfib2 <- function(thresh){


fib.a <- 1
fib.b <- 1
cat(fib.a,", ",fib.b,", ",sep="")
repeat{
temp <- fib.a+fib.b
fib.a <- fib.b
fib.b <- temp
cat(fib.b,", ",sep="")
if(fib.b>thresh){

cat("BREAK NOW...")
break
}
}
}

R> myfib2(thresh=150)
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, BREAK NOW...

Returning Results
myfib3 <- function(thresh){
fibseq <- c(1,1)
counter <- 2
repeat{
fibseq <- c(fibseq,fibseq[counter-1]+fibseq[counter])
counter <- counter+1
if(fibseq[counter]>thresh){
break
}
}
return(fibseq)
}

R> foo <- myfib3(150)


R> foo
[1] 1 1 2 3 5 8 13 21 34 55 89 144 233

CALLING FUNCTIONS
Global Environment
The global environment is the compartment set aside for user-defined objects. It holds all the objects, variables, and user-defined functions in the active workspace.

Package Environments and Namespaces


Package environment rather loosely refers to the items made available by each package in R.
Each package environment actually represents several environments that control different aspects
of a search for a given object. A package namespace, for example, essentially defines the visibility
of its functions. (A package can have visible functions that a user is able to use and invisible
functions that provide internal support to the visible functions.) Another part of the
package environment handles import designations, dealing with any functions or objects from
other libraries that the package needs to import for its own functionality.

Local Environments
Each time a function is called in R, a new environment is created called the local environment, sometimes referred to as the lexical environment. This local environment contains all the objects and variables created in and visible to the function, including any arguments you've supplied to the function upon execution.
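The separation between the global and local environments can be observed directly. In this sketch the function double_it and the object names are hypothetical: the function changes its own copy of x and creates a temporary object, and neither change is visible once the call has finished.

```r
x <- 10                        # lives in the global environment

double_it <- function(x){      # hypothetical function for illustration
  x <- x*2                     # changes only the local copy of x
  tmp.local.only <- "scratch"  # exists only while the function runs
  return(x)
}

y <- double_it(x)
cat("x is still",x,"and y is",y,"\n")
cat("Does 'tmp.local.only' exist after the call?",exists("tmp.local.only"),"\n")
```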

Search Path
The search path is the ordered list of environments that R searches when an object is requested.
R> search()
[1] ".GlobalEnv" "tools:RGUI" "package:stats"
[4] "package:graphics" "package:grDevices" "package:utils"
[7] "package:datasets" "package:methods" "Autoloads"
[10] "package:base"

Argument Matching

Exact
Exact matching is where each argument tag is written out in full.

Benefits of exact matching include the following:


• Exact matching is less prone to mis-specification of arguments than other matching styles.
• The order in which arguments are supplied doesn’t matter.
• Exact matching is useful when a function has many possible arguments but you want to specify only
a few.
The main drawbacks of exact matching are clear:
• It can be cumbersome for relatively simple operations.
• Exact matching requires the user to remember or look up the full, case sensitive tags.
Eg.,
R> bar <- matrix(data=1:9,nrow=3,ncol=3,dimnames=list(c("A","B","C"), c("D","E","F")))
R> bar <-
matrix(nrow=3,dimnames=list(c("A","B","C"),c("D","E","F")),ncol=3, data=1:9)

R> bar
D E F
A 1 4 7
B 2 5 8
C 3 6 9

Partial
Partial matching lets you identify arguments with an abbreviated tag. This can shorten your code, and it still
lets you provide arguments in any order.

R> bar <- matrix(nr=3,di=list(c("A","B","C"),c("D","E","F")),nc=3,dat=1:9)
R> bar
D E F
A 1 4 7
B 2 5 8
C 3 6 9
Partial matching has the following benefits:
• It requires less code than exact matching.
• Argument tags are still visible (which limits the possibility of misspecification).
• The order of supplied arguments still doesn’t matter.
Drawbacks of partial matching include the following:
• The user must be aware of other potential arguments that can be matched by the shortened tag (even if they
aren’t specified in the call or have a default value assigned).

• Each tag must have a unique identification, which can be difficult to remember.

Positional
The most compact mode of function calling in R is positional matching. This is when you supply arguments
without tags, and R interprets them based solely on their order.
The args function shows the order of a function's arguments:
R> args(matrix)
function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
NULL
Once you know the order, you can supply the arguments by position.
R> bar <- matrix(1:9,3,3,F,list(c("A","B","C"),c("D","E","F")))
R> bar
D E F
A 1 4 7
B 2 5 8
C 3 6 9
The benefits of positional matching are as follows:
• Shorter, cleaner code, particularly for routine tasks
• No need to remember specific argument tags
Drawbacks of positional matching:
• You must look up and exactly match the defined order of arguments.
• Reading code written by someone else can be more difficult, especially when it includes unfamiliar functions.

Mixed
Since each matching style has pros and cons, it’s quite common, and perfectly legal, to mix these three styles
in a single function call.
R> bar <- matrix(1:9,3,3,dim=list(c("A","B","C"),c("D","E","F")))
R> bar
D E F
A 1 4 7
B 2 5 8
C 3 6 9
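A quick way to confirm that the four matching styles produce the same object is to build the matrix each way and compare the results with identical. The object names below (dn, exact, partial, positional, mixed) are introduced only for this sketch.

```r
dn <- list(c("A","B","C"),c("D","E","F"))

exact      <- matrix(data=1:9,nrow=3,ncol=3,dimnames=dn)  # full tags
partial    <- matrix(nr=3,di=dn,nc=3,dat=1:9)             # abbreviated tags
positional <- matrix(1:9,3,3,FALSE,dn)                    # order only
mixed      <- matrix(1:9,3,3,dim=dn)                      # order plus one tag

print(identical(exact,partial))
print(identical(exact,positional))
print(identical(exact,mixed))
```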

Dot-Dot-Dot: Use of Ellipses


Many functions exhibit variadic behaviours. That is, they can accept any number of arguments, and it’s up to
the user to decide how many arguments to provide. The functions c, data.frame, and list are all like this. This
flexibility is achieved in R through the special dot-dot-dot designation(...), also called the ellipsis. Ellipses are
a convenient programming tool for writing variadic functions or functions where an unknown number of
arguments may be supplied.

R> args(data.frame)
function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, stringsAsFactors =
default.stringsAsFactors())
NULL

When you call a function and supply an argument that can’t be matched with one of the function’s defined
argument tags, normally this would produce an error. But if the function is defined with an ellipsis, any
arguments that aren’t matched to other argument tags are matched to the ellipsis.
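The following sketch shows the two usual ways of working with the ellipsis inside your own function: collecting the extra arguments, and passing them on to another function. The function names describe_all and mean_of are hypothetical.

```r
# describe_all is a hypothetical variadic function.
describe_all <- function(label,...){
  values <- c(...)                 # gather everything matched to the ellipsis
  cat(label,":",length(values),"values, mean",mean(values),"\n")
  return(invisible(values))
}
describe_all("marks",40,55,70)

# Unmatched arguments can also be forwarded to another function.
mean_of <- function(x,...){
  mean(x,...)                      # e.g. na.rm=TRUE is passed on to mean
}
print(mean_of(c(1,NA,3),na.rm=TRUE))
```

In mean_of, the tag na.rm is not one of the function's own arguments, so it is matched to the ellipsis and handed to mean unchanged.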

EXCEPTIONS, TIMINGS, AND VISIBILITY

Formal Notifications: Errors and Warnings


An error forces the function to immediately terminate at the point it occurs. A warning is less severe. In R, you
can issue warnings with the warning command, and you can throw errors with the stop command.
Eg.,

warn_test <- function(x){


if(x<=0){
warning("'x' is less than or equal to 0 but setting it to 1 and continuing")
x <- 1
}
return(5/x)
}

error_test <- function(x){


if(x<=0){
stop("'x' is less than or equal to 0... TERMINATE")
}
return(5/x)
}
• In warn_test, if x is nonpositive, the function issues a warning, x is overwritten to be 1, and execution continues.
• In error_test, on the other hand, if x is nonpositive, the function throws an error and terminates immediately.
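The difference between the two functions can be observed by calling both with x=0. The sketch below repeats the two definitions so that it runs on its own, and it uses tryCatch, a base R function that is not covered in these notes, purely to capture the messages as strings; the object names warn.value, warn.msg and err.msg are made up for this example.

```r
warn_test <- function(x){
  if(x<=0){
    warning("'x' is less than or equal to 0 but setting it to 1 and continuing")
    x <- 1
  }
  return(5/x)
}

error_test <- function(x){
  if(x<=0){
    stop("'x' is less than or equal to 0... TERMINATE")
  }
  return(5/x)
}

# The warning does not stop warn_test: it still returns 5/1.
# suppressWarnings is used here only to keep the console quiet.
warn.value <- suppressWarnings(warn_test(0))

# tryCatch intercepts the condition and returns the handler's value instead.
warn.msg <- tryCatch(warn_test(0),warning=function(w) conditionMessage(w))
err.msg <- tryCatch(error_test(0),error=function(e) conditionMessage(e))

cat("warn_test(0) returned:",warn.value,"\n")
cat("warning message:",warn.msg,"\n")
cat("error message:",err.msg,"\n")
```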

Catching Errors with try Statements


• When a function terminates from an error, it also terminates any parent functions.
• To avoid this severe consequence, you can use a try statement to attempt a function call and check whether it produces an error.
• You can then use an if statement to specify alternative operations, rather than allowing all processes to cease.
Eg.,

myfibrec2 <- function(n){


if(n<0){
warning("Assuming you meant 'n' to be positive -- doing that instead")
n <- n*-1
} else if(n==0){
stop("'n' is uninterpretable at 0")
}
if(n==1||n==2){
return(1)
} else {
return(myfibrec2(n-1)+myfibrec2(n-2))
}
}

R> attempt1 <- try(myfibrec2(0),silent=TRUE)


• The error is not displayed because the function is called within try with silent=TRUE. The error message is stored in attempt1, which is of class "try-error".
• If silent=FALSE, the error message is displayed and also stored in attempt1.

Using try in the Body of a Function

Eg.,

myfibvectorTRY <- function(nvec){


nterms <- length(nvec)
result <- rep(0,nterms)
for(i in 1:nterms){
attempt <- try(myfibrec2(nvec[i]),silent=T)
if(class(attempt)=="try-error"){
result[i] <- NA
} else {
result[i] <- attempt
}
}
return(result)
}
• Here, within the for loop, you use attempt to store the result of trying each call to myfibrec2.
• Then, you inspect attempt. If this object's class is "try-error", that means myfibrec2 produced an error, and you fill the corresponding slot in the result vector with NA. Otherwise, attempt will represent a valid return value from myfibrec2, so you place it in the corresponding slot of the result vector.
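A usage sketch of the same idea is given below. It is a variant, not the function from the notes: the name myfibvector_safe is hypothetical, and the check uses inherits(attempt,"try-error") instead of comparing class(attempt) with ==, on the assumption that a single TRUE/FALSE is wanted even for objects that carry more than one class. myfibrec2 is repeated so that the block runs on its own.

```r
myfibrec2 <- function(n){
  if(n<0){
    warning("Assuming you meant 'n' to be positive -- doing that instead")
    n <- n*-1
  } else if(n==0){
    stop("'n' is uninterpretable at 0")
  }
  if(n==1||n==2){
    return(1)
  } else {
    return(myfibrec2(n-1)+myfibrec2(n-2))
  }
}

# Hypothetical variant of myfibvectorTRY using inherits for the class check.
myfibvector_safe <- function(nvec){
  result <- rep(NA,length(nvec))        # slots stay NA unless a call succeeds
  for(i in seq_along(nvec)){
    attempt <- try(myfibrec2(nvec[i]),silent=TRUE)
    if(!inherits(attempt,"try-error")){
      result[i] <- attempt
    }
  }
  return(result)
}

baz <- myfibvector_safe(c(3,2,7,0,9,13))
print(baz)
```

The fourth element of the input is 0, which makes myfibrec2 throw an error; the loop records NA for that slot and carries on with the remaining elements.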
Suppressing Warning Messages
The silent argument of try affects only error messages; warnings are still printed. To suppress warning messages, wrap the call in suppressWarnings:
R> attempt4 <- suppressWarnings(myfibrec2(-3))

Progress and Timing


• A progress bar shows how far along R is as it executes a set of operations.
• The Sys.sleep command makes R pause for a specified amount of time, in seconds, before continuing.
R> Sys.sleep(3)
• You can implement a textual progress bar in three steps:
1. Initialize the bar object with txtProgressBar.
2. Update the bar with setTxtProgressBar.
3. Terminate the bar with close.
Eg.,

prog_test<- function(n){
result<- 0
progbar<- txtProgressBar(min=0,max=n,style=1,char="=")
for(i in 1:n){
result <- result + 1
Sys.sleep(0.5)
setTxtProgressBar(progbar,value=i)
}
close(progbar)
return(result)
}
• This call to txtProgressBar supplies four arguments.
• The min and max arguments are numeric values that define the limits of the bar.
• The style argument (integer, either 1, 2, or 3) and the char argument (character string, usually a single character) govern the appearance of the bar.
• The bar actually progresses during execution through the call to setTxtProgressBar.
• You pass in the bar object to update (progbar) and the value it should update to (in this case, i).
• Once complete (after exiting the loop), the progress bar must be terminated with a call to close, passing in the bar object of interest.
R> prog_test(8)

Measuring Completion Time:


• If you want to know how long a computation takes to complete, you can use the Sys.time command, which outputs an object that details current date and time information based on your system.
R> Sys.time()
[1] "2016-03-06 16:39:27 NZDT"
R> t1 <- Sys.time()
R> Sys.sleep(3)
R> t2 <- Sys.time()
R> t2-t1
Time difference of 3.012889 secs
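An alternative to subtracting two Sys.time values is system.time, a base R function not covered above, which evaluates an expression and reports how long it took. The loop being timed here is an arbitrary example, and the object names elapsed and total are made up; actual timings will differ from machine to machine, so no output is shown.

```r
# Time an arbitrary computation: adding up the integers 1 to 100000 in a loop.
elapsed <- system.time({
  total <- 0
  for(i in 1:100000){
    total <- total+i
  }
})

print(total)               # the result of the timed computation
print(elapsed["elapsed"])  # wall-clock time in seconds
```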

Masking
Masking occurs when, for example, you define a function with the same name as a function in an R package that you have already loaded. R responds by masking one of the objects: one object or function takes precedence over the other and assumes the object or function name, while the masked function must be called with an additional command. This protects objects from overwriting or blocking one another.
Function and Object Distinction
• When two functions or objects in different environments have the same name, the object that comes earlier in the search path will mask the later one.
Eg.,
This is how the built-in sum function from the base package works:
R> foo <- c(4,1.5,3)
R> sum(foo)
[1] 8.5
Now, suppose you were to enter the following function
sum <- function(x){
result <- 0
for(i in 1:length(x)){

result <- result + x[i]^2


}
return(result)
}

• Now, after defining this function, if you make a call to sum, your version is used rather than the built-in sum.
• This happens because the user-defined function is stored in the global environment (.GlobalEnv), which always comes first in the search path. R's built-in function is part of the base package, which comes at the end of the search path.
R> sum(foo)
[1] 27.25

• To call the base version of sum, you have to include the name of its package in the call, with a double colon.
R> base::sum(foo)
[1] 8.5
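The mask lasts only as long as the user-defined object exists. As a sketch of the whole cycle, the block below defines the masking function, calls both versions, and then removes the user-defined sum with rm so that the base version is found again; the object names masked, original and restored are introduced only for this example.

```r
foo <- c(4,1.5,3)

# User-defined sum (sum of squares) masks the base version.
sum <- function(x){
  result <- 0
  for(i in 1:length(x)){
    result <- result+x[i]^2
  }
  return(result)
}

masked <- sum(foo)          # user-defined version is found first
original <- base::sum(foo)  # explicit call to the base version

rm(sum)                     # delete the user-defined function
restored <- sum(foo)        # the base version is found again

cat(masked,original,restored,"\n")
```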
When Package Objects Clash
When you load a package, R will notify you if any objects in the package clash with other objects that are
accessible in the present session.
R> library("spatstat")
spatstat 1.40-0 (nickname: 'Do The Maths')
For an introduction to spatstat, type 'beginner'
R> library("car")
Attaching package: 'car'
The following object is masked from 'package:spatstat':
ellipse
This indicates that the two packages each have an object with the same name: ellipse. Now when you type ellipse, the car version will be executed, because car was attached more recently and therefore comes earlier in the search path. To use spatstat's version, you must type spatstat::ellipse.
Unmounting Packages
The detach function unmounts the named package from the search path.

R> detach("package:car",unload=TRUE)
R> search()
[1] ".GlobalEnv" "package:MASS" "package:spatstat"
[4] "tools:RGUI" "package:stats" "package:graphics"
[7] "package:grDevices" "package:utils" "package:datasets"
[10] "package:methods" "Autoloads" "package:base"

Data Frame Variable Distinction


There’s one other common situation in which you’ll be explicitly notified of masking: when you add a data
frame to the search path.

R> foo <- data.frame(surname=c("a","b","c","d"),
sex=c(0,1,1,0),height=c(170,168,181,180),
stringsAsFactors=F)
R> foo
surname sex height
1 a 0 170
2 b 1 168
3 c 1 181
4 d 0 180

• The data frame foo has three column variables: surname, sex, and height. To access one of these columns, normally you need to use the $ operator and enter something like foo$surname.
• However, you can attach a data frame directly to your search path, which makes it easier to access a variable. You can then access it by name alone, like surname in this example.

R> attach(foo)
R> search()
[1] ".GlobalEnv" "foo" "package:MASS"
[4] "package:spatstat" "tools:RGUI" "package:stats"
[7] "package:graphics" "package:grDevices" "package:utils"
[10] "package:datasets" "package:methods" "Autoloads"
[13] "package:base"
R> surname
[1] "a" "b" "c" "d"

• If you then create another data frame, say bar, that also has a column named height and call attach(bar), R notifies you that the height column of foo is masked by the one in bar.
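One way to sidestep this kind of masking is to leave the search path alone and use with, a base R function not covered in these notes, which evaluates an expression using the columns of a given data frame. The sketch below is illustrative only: the contents of bar and the object names mean.foo and mean.bar are made up.

```r
foo <- data.frame(surname=c("a","b","c","d"),
                  sex=c(0,1,1,0),height=c(170,168,181,180),
                  stringsAsFactors=FALSE)

# Hypothetical second data frame that also has a column named height.
bar <- data.frame(height=c(150,160),weight=c(50,58))

# Each call names the data frame explicitly, so neither height masks the other.
mean.foo <- with(foo,mean(height))
mean.bar <- with(bar,mean(height))

cat("mean height in foo:",mean.foo,"\n")
cat("mean height in bar:",mean.bar,"\n")
```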
