Unit 2 R
V Semester BCA
Prepared By
Sabnam Pradhan
Professor and Faculty of Computer Applications
Unit-2
Reading and writing files, Programming, Calling Functions, Conditions and Loops: stacking statements, coding
loops, Writing Functions, Exceptions, Timings, and Visibility.
----------------------------------------------------------------------------------------------
Reading in External Data Files
The Table Format Files:
Table-format files are best thought of as plain-text files with three key features that fully define how R should
read the data.
Header: If a header is present, it’s always the first line of the file. This optional feature is used to
provide names for each column of data.
Delimiter: The all-important delimiter is a character used to separate the entries in each line.
Missing value: This is another unique character string used exclusively to denote a missing value.
When reading the file, R will turn these entries into the form it recognizes: NA.
Typically, these files have a .txt extension (highlighting the plain-text style).
They are read with read.table, whose main arguments are:
header is a logical value telling R whether the file has a header (for example, TRUE)
sep takes a character string providing the delimiter (for example, a single space, " ")
na.strings takes the characters used to denote missing values (for example, "*")
To keep your character data saved as strings, set
stringsAsFactors=FALSE, which prevents R from treating all nonnumeric elements as factors
(this is already the default from R 4.0.0 onward).
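The three features of a table-format file and the arguments described above can be tried out with a small self-made file. The following is only a sketch: the file contents and the column names (name, age, city) are made up for illustration, and a temporary file is used instead of a real path.

```r
# Build a small table-format file in a temporary location (hypothetical data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("name age city",
             "Asha 21 Pune",
             "Ravi * Delhi"), tmp)

# header=TRUE: the first line holds the column names
# sep=" ": entries are separated by a single space
# na.strings="*": entries equal to "*" are read as NA
mydatafile <- read.table(file = tmp, header = TRUE, sep = " ",
                         na.strings = "*", stringsAsFactors = FALSE)
print(mydatafile)
unlink(tmp)   # remove the temporary file
```

The missing age for the second row should appear as NA in the printed data frame.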
Or for CSV files:
data <- read.csv("filename.csv")
Reading Excel Files:
library(readxl)
data <- read_excel("filename.xlsx")
Web-Based Files:
R> dia.url <- "http://www.amstat.org/publications/jse/v9n2/4cdata.txt"
R> diamonds <- read.table(dia.url)
Writing Files:
R> write.table(x=mydatafile,file="/Users/tdavies/somenewfile.txt",
+ sep="@",na="??",quote=FALSE,row.names=FALSE)
This command creates a new table-format file called somenewfile.txt in the specified folder location,
delimited by @ and with missing values denoted by ??.
If mydatafile has variable names, these are automatically written to the file as a header.
The optional logical argument quote determines whether to encapsulate each
non-numeric entry in double quotes.
row.names is a logical argument that determines whether to include the row names of mydatafile in the file.
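To see what write.table actually puts in the file, the call can be repeated on a tiny data frame and the result read back as raw text. This is a sketch only: the data frame scores and its contents are hypothetical, and a temporary file replaces the folder path used above.

```r
# Hypothetical data frame with one missing value.
scores <- data.frame(name = c("a", "b"), score = c(1.5, NA),
                     stringsAsFactors = FALSE)
out_file <- tempfile(fileext = ".txt")   # temporary path instead of a fixed folder

# Same argument choices as in the notes: @ delimiter, ?? for NA,
# no quotes around strings, no row names.
write.table(x = scores, file = out_file, sep = "@", na = "??",
            quote = FALSE, row.names = FALSE)

written <- readLines(out_file)           # inspect the raw text that was written
print(written)
unlink(out_file)
```

The first line of the file is the header built from the variable names, and the NA entry is written as ??.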
PROGRAMMING
O/P
Condition satisfied --
-- a list with 2 members now exists.
R> mylist
$aa
[1] NA 5.40 NA 5.29 NA 2.16 NA 6.97 NA 9.52
$bb
     [,1] [,2]
[1,]  2.5  0.5
[2,]  0.5  3.5
[3,]  1.5  0.5
[4,]  2.5  1.5
[5,]  3.5  1.5
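The if statement that produced the output above is not shown in these notes. The following is one possible sketch of it. The inputs myvec and mymat and the exact condition are assumptions, chosen only so that the result is consistent with the printed list.

```r
# Assumed inputs (not given in the notes).
myvec <- c(2.73, 5.40, 2.15, 5.29, 1.36, 2.16, 1.41, 6.97, 7.99, 9.52)
mymat <- matrix(c(2, 0, 1, 2, 3, 0, 3, 0, 1, 1), nrow = 5, ncol = 2)

# The braced code runs only when the condition evaluates to a single TRUE.
if (any((myvec - 1) > 9) || matrix(myvec, 2, 5)[2, 1] <= 6) {
  cat("Condition satisfied --\n")
  new.myvec <- myvec
  new.myvec[seq(1, 9, 2)] <- NA              # overwrite the odd positions with NA
  mylist <- list(aa = new.myvec, bb = mymat + 0.5)
  cat("-- a list with", length(mylist), "members now exists.\n")
}
```

Here the first part of the condition is FALSE and the second is TRUE, so the combined condition is TRUE and mylist is created.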
else Statements:
if(condition){
  do any code in here if condition is TRUE
} else {
  do any code in here if condition is FALSE
}
Eg.,
a <- 3          # example starting values (assumed; not given in the notes)
mynumber <- 4
if(a<=mynumber){
  cat("Condition was",a<=mynumber)
  a <- a^2
} else {
  cat("Condition was",a<=mynumber)
  a <- a-3.5
}
a
if(a<=mynumber){
cat("First condition was TRUE\n")
a <- a^2
if(mynumber>3){
cat("Second condition was TRUE")
b <- seq(1,a,length=mynumber)
} else {
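The nested example above breaks off after the inner else in these notes. The following is a complete sketch of a nested if/else of the same shape. The starting values and the contents of both else branches are assumptions added only to make the example runnable.

```r
a <- 3          # assumed starting values
mynumber <- 4

if (a <= mynumber) {
  cat("First condition was TRUE\n")
  a <- a^2
  # The inner if is only reached when the outer condition is TRUE.
  if (mynumber > 3) {
    cat("Second condition was TRUE\n")
    b <- seq(1, a, length.out = mynumber)
  } else {
    cat("Second condition was FALSE\n")   # assumed branch
    b <- a * mynumber
  }
} else {
  cat("First condition was FALSE\n")      # assumed branch
  a <- a - 3.5
  b <- NA
}
print(a)
print(b)
```

With these starting values both conditions are TRUE, so a is squared and b becomes an evenly spaced sequence of mynumber values from 1 to a.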
Coding Loops
for Loops:
Syntax:
for(loopindex in loopvector){
do any code in here
}
The loopindex is a placeholder that represents an element in the loopvector: it starts off as the first
element in the vector and moves to the next element with each loop repetition.
Eg 1.,
for(myitem in 5:7){
cat("--BRACED AREA BEGINS--\n")
cat("the current item is",myitem,"\n")
cat("--BRACED AREA ENDS--\n\n")
}
O/P:
--BRACED AREA BEGINS--
the current item is 5
--BRACED AREA ENDS--
--BRACED AREA BEGINS--
the current item is 6
--BRACED AREA ENDS--
--BRACED AREA BEGINS--
the current item is 7
--BRACED AREA ENDS--
Eg 2.,
R> foo <- list(aa=c(3.4,1),bb=matrix(1:4,2,2),cc=matrix(c(T,T,F,T,F,F),3,2),
+ dd="string here",ee=matrix(c("red","green","blue","yellow")))
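The loop that works through this list is not included in the notes. As a sketch of the idea, a for loop can visit each member by index and record something about it. What is recorded here (whether the member is a matrix) and the name is.mat are illustrative choices.

```r
foo <- list(aa = c(3.4, 1), bb = matrix(1:4, 2, 2),
            cc = matrix(c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE), 3, 2),
            dd = "string here",
            ee = matrix(c("red", "green", "blue", "yellow")))

# Storage vector, filled in one element per iteration.
is.mat <- rep(NA, length(foo))
for (i in seq_along(foo)) {
  member <- foo[[i]]                  # the i-th member of the list
  is.mat[i] <- is.matrix(member)
  cat(names(foo)[i], "is a matrix:", is.mat[i], "\n")
}
```

Looping over the index i, rather than over the members themselves, makes it possible to store a result in the matching position of is.mat.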
while Loops
while(loopcondition){
do any code in here
}
myval <- 5
while(myval<10){
myval <- myval+1
cat("\n'myval' is now",myval,"\n")
cat("'mycondition' is now",myval<10,"\n")
}
Inside a loop, you can use next to skip the rest of the current iteration, advance to the next iteration, and continue execution.
+ next
+ }
+ loop2.result[i] <- foo/bar[i]
+ }
R> loop2.result
[1] 2.500000 1.666667 4.545455 1.250000 NA 1.219512 1.666667
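Only the end of this loop survives in the notes. The following is a complete sketch. The values of foo and bar and the test for a zero denominator are assumptions, inferred from the printed result.

```r
foo <- 5                              # assumed values, inferred from the output above
bar <- c(2, 3, 1.1, 4, 0, 4.1, 3)

loop2.result <- rep(NA, length(bar))  # NA stays wherever an iteration is skipped
for (i in seq_along(bar)) {
  if (bar[i] == 0) {
    next                              # skip the division for a zero denominator
  }
  loop2.result[i] <- foo / bar[i]
}
print(loop2.result)
```

Unlike break, next does not leave the loop: only the fifth iteration is skipped, and the remaining elements are still computed.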
repeat Statements:
x <- 1          # starting value (assumed; not given in the notes)
repeat{
  print(x)
  x <- x + 1
  if(x > 5){
    break
  }
}
WRITING FUNCTIONS
if(fib.b>150){
cat("BREAK NOW...")
break
}
}
}
R> myfib()
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, BREAK NOW...
Adding Arguments
cat("BREAK NOW...")
break
}
}
}
R> myfib2(thresh=150)
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, BREAK NOW...
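The beginning of these function definitions is cut off in the notes. The following is a sketch of what a complete myfib2 could look like. Everything before the if statement is an assumption, reconstructed from the printed output and from the variable name fib.b.

```r
# thresh is the argument: the loop stops after the first term greater than thresh.
myfib2 <- function(thresh) {
  fib.a <- 1
  fib.b <- 1
  cat(fib.a, ", ", fib.b, ", ", sep = "")
  repeat {
    temp <- fib.a + fib.b     # next term of the sequence
    fib.a <- fib.b
    fib.b <- temp
    cat(fib.b, ", ", sep = "")
    if (fib.b > thresh) {
      cat("BREAK NOW...")
      break
    }
  }
}
myfib2(thresh = 150)
```

Replacing the fixed value 150 with an argument means the same function can be reused with a different limit, for example myfib2(thresh = 1000).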
Returning Results
myfib3 <- function(thresh){
  fibseq <- c(1,1)
  counter <- 2
  repeat{
    fibseq <- c(fibseq,fibseq[counter-1]+fibseq[counter])
    counter <- counter+1
    if(fibseq[counter]>thresh){
      break
    }
  }
  return(fibseq)
}
CALLING FUNCTIONS
Global Environment
The global environment is the compartment set aside for user-defined objects. It contains
all the objects, variables, and user-defined functions in the active
workspace.
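A quick way to look at the global environment is ls(). This is a sketch assuming the code is run at the top level of an R session or script, and the object names are hypothetical.

```r
# Two user-defined objects (hypothetical names) created at the top level.
unit_label <- "Unit 2"
double_it <- function(x) 2 * x

# ls() lists the names stored in an environment;
# globalenv() refers to the global environment.
user_objects <- ls(envir = globalenv())
print(c("unit_label", "double_it") %in% user_objects)
```

Objects created at the prompt are stored in the global environment, which is why they remain available for the rest of the session.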
Local Environments
Each time a function is called in R, a new environment is created called the local
environment, sometimes referred to as the lexical environment. This local environment contains
all the objects and variables created in and visible to the function, including any arguments
you’ve supplied to the function upon execution.
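The short-lived nature of a local environment can be checked directly. In this sketch, add_offset and local_offset are hypothetical names.

```r
add_offset <- function(x) {
  local_offset <- 100        # created inside the function's local environment
  x + local_offset
}

result <- add_offset(5)

# Once the call has finished, the local object cannot be found from outside.
still_there <- exists("local_offset")
print(result)
print(still_there)
```

The argument x and the object local_offset exist only while the function runs, so exists() reports that local_offset is not visible afterwards.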
Search Path
The search path is basically a list of the environments that R will search when an object is requested.
R> search()
 [1] ".GlobalEnv"        "tools:RGUI"        "package:stats"
 [4] "package:graphics"  "package:grDevices" "package:utils"
 [7] "package:datasets"  "package:methods"   "Autoloads"
[10] "package:base"
R> bar
  D E F
A 1 4 7
B 2 5 8
C 3 6 9
Partial
Partial matching lets you identify arguments with an abbreviated tag. This can shorten your code, and it still
lets you provide arguments in any order.
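Drawback of partial matching:
• Each abbreviated tag must identify exactly one argument, so you need to know which other arguments a shortened tag could match, and this can be difficult to remember.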
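Exact and partial matching can be compared on the same call to matrix. This is a sketch; the object names are illustrative.

```r
row_col_names <- list(c("A", "B", "C"), c("D", "E", "F"))

# Exact matching: every argument is supplied with its full tag.
bar_exact <- matrix(data = 1:9, nrow = 3, ncol = 3, dimnames = row_col_names)

# Partial matching: abbreviated tags, given in a different order.
# Each abbreviation matches only one argument of matrix.
bar_partial <- matrix(nr = 3, di = row_col_names, nc = 3, dat = 1:9)

print(identical(bar_exact, bar_partial))
```

A tag such as d would not work here, because it could refer to either data or dimnames.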
Positional
The most compact mode of function calling in R is positional matching. This is when you supply arguments
without tags, and R interprets them based solely on their order.
The args function shows the order of a function’s arguments.
R> args(matrix)
function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
NULL
Once you know the order, you can supply the arguments by position.
R> bar <- matrix(1:9,3,3,F,list(c("A","B","C"),c("D","E","F")))
R> bar
  D E F
A 1 4 7
B 2 5 8
C 3 6 9
The benefits of positional matching are as follows:
• Shorter, cleaner code, particularly for routine tasks
• No need to remember specific argument tags
Drawbacks of positional matching:
• You must look up and exactly match the defined order of arguments.
• Reading code written by someone else can be more difficult, especially when it includes unfamiliar functions.
Mixed
Since each matching style has pros and cons, it’s quite common, and perfectly legal, to mix these three styles
in a single function call.
R> bar <- matrix(1:9,3,3,dim=list(c("A","B","C"),c("D","E","F")))
R> bar
  D E F
A 1 4 7
B 2 5 8
C 3 6 9
R> args(data.frame)
function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, stringsAsFactors =
default.stringsAsFactors())
NULL
When you call a function and supply an argument that can’t be matched with one of the function’s defined
argument tags, normally this would produce an error. But if the function is defined with an ellipsis, any
arguments that aren’t matched to other argument tags are matched to the ellipsis.
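The behaviour of the ellipsis can be seen with a small user-defined function. This is a sketch only: label_values is a hypothetical helper, not a function from any package.

```r
# Only 'label' is a named argument; anything else is collected by the ellipsis.
label_values <- function(label, ...) {
  extras <- list(...)                 # arguments not matched to 'label' end up here
  cat(label, "received", length(extras), "extra argument(s)\n")
  names(extras)
}

extra_names <- label_values("demo", first = 1:3, second = "text")
print(extra_names)
```

The tags first and second are not defined arguments of label_values, yet the call does not fail, because both are matched to the ellipsis.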
Function
Eg.,
prog_test<- function(n){
result<- 0
progbar<- txtProgressBar(min=0,max=n,style=1,char="=")
for(i in 1:n){
result <- result + 1
Sys.sleep(0.5)
setTxtProgressBar(progbar,value=i)
}
close(progbar)
return(result)
}
Four arguments are passed to txtProgressBar here.
The min and max arguments are numeric values that define the limits of the bar.
The style argument (integer, either 1, 2, or 3) and the char argument (character string,
usually a single character) govern the appearance of the bar.
The bar is instructed to actually progress during execution with a call to setTxtProgressBar.
You pass in the bar object to update (progbar) and the value it should update to (in this case, i).
Once complete (after exiting the loop), the progress bar must be terminated with a call to close,
passing in the bar object of interest.
R> prog_test(8)
Masking
Masking occurs when you define a function with the same name as a function in an R package that you have already loaded. R
responds by masking one of the objects: one object or function takes precedence over the other
and assumes the object or function name, while the masked function must be called with an additional
command. This protects objects from overwriting or blocking one another.
Function and Object Distinction
When two functions or objects in different environments have the same name, the object that comes
earlier in the search path will mask the later one.
Eg.,
This is how the built-in sum function from the base package works:
R> foo <- c(4,1.5,3)
R> sum(foo)
[1] 8.5
Now, suppose you were to enter the following function
sum <- function(x){
result <- 0
for(i in 1:length(x)){
Now, after defining this function, if you make a call to sum, your version is used rather than the built-in sum.
This happens because the user-defined function is stored in the global environment (.GlobalEnv), which
always comes first in the search path. R’s built-in function is part of the base package, which comes at the
end of the search path.
R> sum(foo)
[1] 27.25
To call the base version of sum, you have to include the name of its package in the call, followed by a double colon.
R> base::sum(foo)
[1] 8.5
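The user-defined sum is cut off in these notes, so the following is a sketch of the whole masking example. The loop body (adding the square of each element) is an assumption, inferred from the result 27.25 shown above.

```r
foo <- c(4, 1.5, 3)

# User-defined sum; the squaring step is assumed, not taken from the notes.
sum <- function(x) {
  result <- 0
  for (i in seq_along(x)) {
    result <- result + x[i]^2
  }
  return(result)
}

masked_result <- sum(foo)        # the user-defined version is found first
base_result <- base::sum(foo)    # explicit call to the version in the base package
rm(sum)                          # remove the user-defined copy so sum() is the base function again
print(c(masked_result, base_result))
```

Removing the user-defined function with rm is the simplest way to undo the masking once it is no longer wanted.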
When Package Objects Clash
When you load a package, R will notify you if any objects in the package clash with other objects that are
accessible in the present session.
R> library("spatstat")
spatstat 1.40-0 (nickname: 'Do The Maths')
For an introduction to spatstat, type 'beginner'
R> library("car")
Attaching package: 'car'
The following object is masked from 'package:spatstat':
ellipse
This indicates that the two packages each have an object with the same name: ellipse. Now when you type
ellipse, the car version will be executed, because car was loaded most recently and so comes first in the search path. To use spatstat’s
version, you must type spatstat::ellipse.
Unmounting Packages
The detach function unmounts the specified package from the search path.
R> detach("package:car",unload=TRUE)
R> search()
 [1] ".GlobalEnv"        "package:MASS"      "package:spatstat"
 [4] "tools:RGUI"        "package:stats"     "package:graphics"
 [7] "package:grDevices" "package:utils"     "package:datasets"
[10] "package:methods"   "Autoloads"         "package:base"
The data frame foo has three column variables: surname, sex, and height. To access one of these columns,
normally you need to use the $ operator and enter something like foo$surname.
However, you can attach a data frame directly to your search path, which makes it easier to access a
variable: you can then refer to it just by name, like surname in this example.
R> attach(foo)
R> search()
 [1] ".GlobalEnv"        "foo"               "package:MASS"
 [4] "package:spatstat"  "tools:RGUI"        "package:stats"
 [7] "package:graphics"  "package:grDevices" "package:utils"
[10] "package:datasets"  "package:methods"   "Autoloads"
[13] "package:base"
R> surname
[1] "a" "b" "c" "d"
• If you then attach another data frame, for example bar, that also has a column named height, the height
column in foo will be masked.
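Attaching and detaching a data frame can be tried with a small made-up example. This is a sketch: the data frame people stands in for foo, and all of its values are illustrative.

```r
# Hypothetical data frame with the same three columns as foo.
people <- data.frame(surname = c("a", "b", "c", "d"),
                     sex = c("M", "F", "F", "M"),
                     height = c(170, 160, 165, 180),
                     stringsAsFactors = FALSE)

attach(people)                           # adds "people" to the search path
on_path <- "people" %in% search()
first_surname <- surname[1]              # column found by name, without the $ operator
detach(people)                           # removes it from the search path again
still_on_path <- "people" %in% search()
print(c(on_path, still_on_path))
```

Detaching the data frame when it is no longer needed keeps the search path short and reduces the chance of one column masking another.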