Unit-2 R-Programming
Reading in External Data Files
The Table Format Files:
Table-format files are best thought of as plain-text files with three key features
that fully define how R should read the data.
Header: If a header is present, it’s always the first line of the file. This
optional feature is used to provide names for each column of data.
Delimiter: The all-important delimiter is a character used to separate the
entries in each line.
Missing value: This is another unique character string used exclusively
to denote a missing value. When reading the file, R will turn these
entries into the form it recognizes: NA.
The read.table function uses the following arguments to describe these features:
header is a logical value telling R whether the file has a header (for
example, TRUE).
sep takes a character string providing the delimiter (for example, a single
space, " ").
na.strings takes the character string used to denote missing values (for
example, "*").
To keep character data as strings, set stringsAsFactors=FALSE, which prevents
R from converting nonnumeric columns to factors. Since R 4.0.0 this is the
default.
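A minimal sketch of a read.table call that uses these arguments. The file contents and column names are made up for illustration, and the file is written to a temporary path so the example is self-contained.

```r
# Write a small space-delimited file with a header line.
# "*" marks a missing value. (Hypothetical contents, for illustration only.)
tmp <- tempfile(fileext = ".txt")
writeLines(c("person age sex",
             "Peter * male",
             "Lois 40 female"), tmp)

# Describe the header, delimiter, and missing-value string to R.
mydata <- read.table(file = tmp, header = TRUE, sep = " ",
                     na.strings = "*", stringsAsFactors = FALSE)
mydata
is.na(mydata$age[1])   # the "*" entry was read as NA
class(mydata$person)   # character, because stringsAsFactors=FALSE
```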
Or for CSV files:
data <- read.csv("filename.csv")
Reading Excel Files:
library(readxl)
data <- read_excel("filename.xlsx")
Web-Based Files:
R> dia.url <- "http://www.amstat.org/publications/jse/v9n2/4cdata.txt"
R> diamonds <- read.table(dia.url)
INNAHAI ANUGRAHAM
BCA V SEM R PROGRAMMING RAJADHANI DEGREE COLLGE
To write a data frame to a table-format file, use write.table:
R> write.table(x=mydatafile,file="/Users/tdavies/somenewfile.txt",
               sep="@",na="??",quote=FALSE,row.names=FALSE)
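The call above can be tried end to end with a small round trip. This is only a sketch: the data frame contents are hypothetical, and a temporary file stands in for the path used in the notes.

```r
# A small data frame with one missing value (hypothetical data).
mydatafile <- data.frame(id = 1:3, score = c(2.5, NA, 4.1))

# Write it out with "@" as the delimiter and "??" for missing values.
outfile <- tempfile(fileext = ".txt")
write.table(x = mydatafile, file = outfile,
            sep = "@", na = "??", quote = FALSE, row.names = FALSE)

readLines(outfile)   # inspect the raw lines of the file

# Read it back using the matching delimiter and missing-value string.
readback <- read.table(outfile, header = TRUE, sep = "@", na.strings = "??")
readback
```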
Other R objects, such as lists, can be written to a file with dput and read
back with dget.
Eg.,
R> dput(x=somelist,file="/Users/tdavies/myRobject.txt")
R> newobject <- dget(file="/Users/tdavies/myRobject.txt")
R> newobject
$foo
[1]  5  2 45

$bar
     [,1]  [,2]  [,3]
[1,] TRUE FALSE  TRUE
[2,] TRUE FALSE FALSE
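A self-contained sketch of the same dput/dget round trip. The definition of somelist is not shown in the notes, so the list below is an assumption chosen to match the printed output, and a temporary file replaces the original path.

```r
# A list holding a numeric vector and a logical matrix
# (assumed contents, chosen to match the output shown above).
somelist <- list(foo = c(5, 2, 45),
                 bar = matrix(c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE), 2, 3))

objfile <- tempfile(fileext = ".txt")
dput(x = somelist, file = objfile)   # write a text representation of the object
newobject <- dget(file = objfile)    # rebuild the object from that file

newobject
identical(somelist, newobject)       # check that the round trip preserved the object
```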
PROGRAMMING
if(any((myvec-1)>9)||matrix(myvec,2,5)[2,1]<=6){
  cat("Condition satisfied --\n")
  new.myvec <- myvec
  new.myvec[seq(1,9,2)] <- NA
  mylist <- list(aa=new.myvec,bb=mymat+0.5)
  cat("-- a list with",length(mylist),"members now exists.")
}
O/P
Condition satisfied --
-- a list with 2 members now exists.
R> mylist
$aa
[1] NA 5.40 NA 5.29 NA 2.16 NA 6.97 NA 9.52
$bb
     [,1] [,2]
[1,]  2.5  0.5
[2,]  0.5  3.5
[3,]  1.5  0.5
[4,]  2.5  1.5
[5,]  3.5  1.5
else Statements:
if(condition){
  do any code in here if condition is TRUE
} else {
  do any code in here if condition is FALSE
}
if(a<=mynumber){
  cat("Condition was",a<=mynumber)
  a <- a^2
} else {
  cat("Condition was",a<=mynumber)
  a <- a-3.5
}
a
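The if-else block above needs values for a and mynumber before it can run. A minimal sketch, assuming a is 3 and mynumber is 4 (these starting values are not given in the notes):

```r
# Assumed starting values (hypothetical, for illustration).
a <- 3
mynumber <- 4

if (a <= mynumber) {
  cat("Condition was", a <= mynumber, "\n")
  a <- a^2            # runs here, because 3 <= 4 is TRUE
} else {
  cat("Condition was", a <= mynumber, "\n")
  a <- a - 3.5        # would run only if the condition were FALSE
}
a                     # 9
```

Only one of the two braced areas is ever executed, so a is either squared or reduced by 3.5, never both.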
if(a<=mynumber){
  cat("First condition was TRUE\n")
  a <- a^2
  if(mynumber>3){
    cat("Second condition was TRUE")
    b <- seq(1,a,length=mynumber)
  } else {
    cat("Second condition was FALSE")
    b <- a*mynumber
  }
}
Coding Loops
for Loops:
Syntax:
for(loopindex in loopvector){
do any code in here
}
Eg 1.,
for(myitem in 5:7){
cat("--BRACED AREA BEGINS--\n")
cat("the current item is",myitem,"\n")
cat("--BRACED AREA ENDS--\n\n")
}
O/P:
--BRACED AREA BEGINS--
the current item is 5
--BRACED AREA ENDS--
--BRACED AREA BEGINS--
the current item is 6
--BRACED AREA ENDS--
--BRACED AREA BEGINS--
the current item is 7
--BRACED AREA ENDS--
Eg 2.,
R> foo <- list(aa=c(3.4,1),bb=matrix(1:4,2,2),cc=matrix(c(T,T,F,T,F,F),3,2),
dd="string here",ee=matrix(c("red","green","blue","yellow")))
R> bar
  name is.mat nr nc data.type
1   aa     No NA NA      <NA>
2   bb    Yes  2  2   integer
3   cc    Yes  3  2   logical
4   dd     No NA NA      <NA>
5   ee    Yes  4  1 character
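The loop that builds bar from foo is not shown in the notes. One way it could be written is sketched below; the storage-vector approach and the use of is.matrix are assumptions, chosen so that the result matches the table above.

```r
foo <- list(aa = c(3.4, 1), bb = matrix(1:4, 2, 2),
            cc = matrix(c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE), 3, 2),
            dd = "string here",
            ee = matrix(c("red", "green", "blue", "yellow")))

# Storage vectors, one slot per list member, initially all NA.
name <- names(foo)
is.mat <- rep(NA, length(foo))
nr <- is.mat
nc <- is.mat
data.type <- is.mat

for (i in seq_along(foo)) {
  member <- foo[[i]]
  if (is.matrix(member)) {
    is.mat[i] <- "Yes"
    nr[i] <- nrow(member)
    nc[i] <- ncol(member)
    data.type[i] <- class(as.vector(member))  # type of the matrix entries
  } else {
    is.mat[i] <- "No"                          # other slots stay NA
  }
}

bar <- data.frame(name, is.mat, nr, nc, data.type, stringsAsFactors = FALSE)
bar
```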
while Loops
while(loopcondition){
do any code in here
}
myval <- 5
while(myval<10){
myval <- myval+1
cat("\n'myval' is now",myval,"\n")
cat("'mycondition' is now",myval<10,"\n")
}
+     next
+   }
+   loop2.result[i] <- foo/bar[i]
+ }
R> loop2.result
[1] 2.500000 1.666667 4.545455 1.250000 NA 1.219512 1.666667
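Only the end of this loop survives in the notes. The sketch below shows a complete loop that uses next in the same way; the values of foo and bar are assumptions, chosen because they reproduce the printed result.

```r
foo <- 5                             # assumed numerator
bar <- c(2, 3, 1.1, 4, 0, 4.1, 3)    # assumed divisors; note the zero
loop2.result <- rep(NA, length(bar))

for (i in seq_along(bar)) {
  if (bar[i] == 0) {
    next                             # skip this iteration, leaving NA in place
  }
  loop2.result[i] <- foo / bar[i]
}
loop2.result
```

Unlike break, next does not leave the loop. It abandons the current iteration and moves on to the next value of the loop index.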
    if(fib.b>150){
      cat("BREAK NOW...")
      break
    }
  }
}
R> myfib()
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, BREAK NOW...
Adding Arguments
R> myfib2(thresh=150)
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, BREAK NOW...
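The definition of myfib2 is not shown in the notes. A possible version is sketched below, assuming it follows the same repeat-and-break pattern as the fragment of myfib above, with the fixed limit of 150 replaced by the argument thresh. The names fib.a and temp are assumptions.

```r
myfib2 <- function(thresh) {
  fib.a <- 1
  fib.b <- 1
  cat(fib.a, ", ", fib.b, ", ", sep = "")
  repeat {
    temp <- fib.a + fib.b        # next term of the sequence
    fib.a <- fib.b
    fib.b <- temp
    cat(fib.b, ", ", sep = "")
    if (fib.b > thresh) {        # stop once a term exceeds the threshold
      cat("BREAK NOW...")
      break
    }
  }
}

myfib2(thresh = 150)
```

Because the threshold is now an argument, the same function can print the sequence up to any limit without editing its body.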
Returning Results
myfib3 <- function(thresh){
  fibseq <- c(1,1)
  counter <- 2
  repeat{
    fibseq <- c(fibseq,fibseq[counter-1]+fibseq[counter])
    counter <- counter+1
    if(fibseq[counter]>thresh){
      break
    }
  }
  return(fibseq)
}
CALLING FUNCTIONS
Global Environment
The global environment is the compartment set aside for user-defined objects.
It holds all the objects, variables, and user-defined functions in the active
workspace.
Local Environments
Each time a function is called in R, a new environment is created called the local
environment, sometimes referred to as the lexical environment. This local
environment contains all the objects and variables created in and visible to the
function, including any arguments you’ve supplied to the function upon
execution.
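The difference between the global and local environments can be seen in a short sketch. The names total and addone are hypothetical.

```r
total <- 100             # created in the global environment

addone <- function(x) {
  total <- x + 1         # a separate 'total', local to this call
  total
}

addone(5)                # 6
total                    # still 100; the local 'total' no longer exists
```

The assignment inside addone affects only the local environment created for that call, so the global total is left unchanged.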
Search Path
The search path is basically a list of the environments that R will search when an
object is requested.
R> search()
[1] ".GlobalEnv" "tools:RGUI" "package:stats"
[4] "package:graphics" "package:grDevices" "package:utils"
[7] "package:datasets" "package:methods" "Autoloads"
[10] "package:base"
Argument Matching
Exact:
Exact matching of arguments, where each argument tag is written out in full.
R> bar
  D E F
A 1 4 7
B 2 5 8
C 3 6 9
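The call that created bar is not shown here. A call using exact matching that would produce this matrix is sketched below, with every argument tag of matrix written out in full.

```r
# Exact matching: each tag (data, nrow, ncol, dimnames) is spelled out in full.
bar <- matrix(data = 1:9, nrow = 3, ncol = 3,
              dimnames = list(c("A", "B", "C"), c("D", "E", "F")))
bar
```

Because every argument is tagged, the arguments could be given in any order with the same result.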
Partial
Partial matching lets you identify arguments with an abbreviated tag. This can
shorten your code, and it still lets you provide arguments in any order.
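A sketch of partial matching with the matrix function. The abbreviations nr, nc, di, and dat are each long enough to identify exactly one argument (nrow, ncol, dimnames, and data).

```r
# Partial matching: abbreviated tags, supplied in an arbitrary order.
bar <- matrix(nr = 3, di = list(c("A", "B", "C"), c("D", "E", "F")),
              nc = 3, dat = 1:9)
bar
```

An abbreviation must be unambiguous. A tag such as n would match both nrow and ncol, and R would report an error.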
Positional
The most compact mode of function calling in R is positional matching. This
is when you supply arguments without tags, and R interprets them based
solely on their order.
The args function shows the order of a function's arguments:
R> args(matrix)
function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
NULL
Once you know the positions, you can supply the arguments in that order
without tags.
R> bar <- matrix(1:9,3,3,F,list(c("A","B","C"),c("D","E","F")))
R> bar
  D E F
A 1 4 7
B 2 5 8
C 3 6 9
The benefits of positional matching are as follows:
• Shorter, cleaner code, particularly for routine tasks
• No need to remember specific argument tags
Drawbacks of positional matching:
• You must look up and exactly match the defined order of arguments.
• Reading code written by someone else can be more difficult, especially
when it includes unfamiliar functions.
Mixed
Since each matching style has pros and cons, it’s quite common, and perfectly
legal, to mix these three styles in a single function call.
R> bar <- matrix(1:9,3,3,dim=list(c("A","B","C"),c("D","E","F")))
R> bar
  D E F
A 1 4 7
B 2 5 8
C 3 6 9
R> args(data.frame)
function (..., row.names = NULL, check.rows = FALSE, check.names =
TRUE, stringsAsFactors = default.stringsAsFactors())
NULL
When you call a function and supply an argument that can’t be matched with
one of the function’s defined argument tags, normally this would produce an
error. But if the function is defined with an ellipsis, any arguments that aren’t
matched to other argument tags are matched to the ellipsis.
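A small sketch of a function defined with an ellipsis. The function describe is hypothetical. It counts the extra arguments it receives and passes them on to paste.

```r
# 'describe' is a hypothetical function written for this illustration.
describe <- function(label, ...) {
  extras <- list(...)    # arguments not matched to 'label' are collected here
  cat(label, "received", length(extras), "extra argument(s)\n")
  paste(label, ..., sep = "-")   # pass the extras along to another function
}

describe("id", 1, "b")
```

Here "id" is matched to label by position, while 1 and "b" have no matching tag and so are matched to the ellipsis.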
The error is not displayed because the function is called within try with
silent=TRUE. The error message is instead stored in attempt1, which is an
object of class "try-error".
If silent=FALSE, the error message is displayed and also stored in attempt1.
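The behaviour described above can be sketched with a small function that throws an error. The function checkpositive is hypothetical and stands in for the function used in the notes.

```r
# 'checkpositive' is a hypothetical function that throws an error when n <= 0.
checkpositive <- function(n) {
  if (n <= 0) stop("'n' must be positive")
  n
}

attempt1 <- try(checkpositive(0), silent = TRUE)   # error occurs, nothing printed
class(attempt1)                                    # "try-error"
cat(attempt1)                                      # the stored error message

attempt2 <- try(checkpositive(3), silent = TRUE)   # no error: holds the value 3
attempt2
```

Execution continues after the failed call, which is the point of wrapping it in try.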
Eg.,
myfibvectorTRY <- function(nvec){
  nterms <- length(nvec)
  result <- rep(0,nterms)
  for(i in 1:nterms){
    attempt <- try(myfibrec2(nvec[i]),silent=TRUE)
    if(class(attempt)=="try-error"){
      result[i] <- NA
    } else {
      result[i] <- attempt
    }
  }
  return(result)
}
Here, within the for loop, you use attempt to store the result of trying each
call to myfibrec2.
Then, you inspect attempt. If this object’s class is "try-error", that means
myfibrec2 produced an error, and you fill the corresponding slot in the
result vector with NA. Otherwise, attempt will represent a valid return
value from myfibrec2, so you place it in the corresponding slot of the result
vector.
Suppressing Warning Messages
The silent argument of try affects only error messages, so warnings are still
displayed. To suppress a warning message, wrap the call in suppressWarnings.
R> attempt4 <- suppressWarnings(myfibrec2(-3))
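A self-contained sketch of suppressWarnings. The function halve is hypothetical. It issues a warning for negative input and then carries on.

```r
# 'halve' is a hypothetical function that warns about negative input.
halve <- function(n) {
  if (n < 0) {
    warning("negative input; using abs(n)")
    n <- abs(n)
  }
  n / 2
}

result <- suppressWarnings(halve(-3))   # the warning is not displayed
result                                  # 1.5
```

The function still runs to completion and returns its value. Only the warning message is hidden.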
Masking
Masking happens when, for example, you define a function with the same name as
a function in an R package that you have already loaded. R responds by masking
one of the objects: one object or function takes precedence over the other and
assumes the object or function name, while the masked function must be called
with an additional command. This protects objects from overwriting or blocking
one another.
Function and Object Distinction
When two functions or objects in different environments have the same
name, the object that comes earlier in the search path will mask the later
one.
Eg.,
This is how the built-in sum function from the base package works:
R> foo <- c(4,1.5,3)
R> sum(foo)
[1] 8.5
Now, suppose you were to enter the following function
sum <- function(x){
result <- 0
for(i in 1:length(x)){
Now, after defining this function, a call to sum uses your version rather than
the built-in sum.
To call the base version of sum, include the name of its package in the call,
followed by a double colon.
R> base::sum(foo)
[1] 8.5
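The masking of sum can be reproduced with the sketch below. The body of the user-defined function (a sum of squares) is an assumption made for illustration. Any function named sum in the global environment would mask the base version in the same way.

```r
foo <- c(4, 1.5, 3)

# A user-defined 'sum' in the global environment
# (hypothetical body: the sum of squared entries).
sum <- function(x) {
  result <- 0
  for (i in seq_along(x)) {
    result <- result + x[i]^2
  }
  result
}

masked.result <- sum(foo)   # user-defined version: 16 + 2.25 + 9 = 27.25
base::sum(foo)              # base version, reached with the double colon: 8.5

rm(sum)                     # remove the user-defined version
sum(foo)                    # the base version is found again: 8.5
```

Removing the user-defined object ends the masking, because the search then continues along the search path until it reaches package:base.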
When Package Objects Clash
When you load a package, R will notify you if any objects in the package clash
with other objects that are accessible in the present session.
R> library("spatstat")
spatstat 1.40-0 (nickname: 'Do The Maths')
For an introduction to spatstat, type 'beginner'
R> library("car")
Attaching package: 'car'
The following object is masked from 'package:spatstat':
ellipse
This indicates that the two packages each have an object with the same name,
ellipse. Now when you type ellipse, the car version is executed, because car
was loaded most recently and so comes first in the search path. To use
spatstat’s version, you must type spatstat::ellipse.
Unmounting Packages
The detach function removes the named package from the search path.
R> detach("package:car",unload=TRUE)
R> search()
[1] ".GlobalEnv" "package:MASS" "package:spatstat"
[4] "tools:RGUI" "package:stats" "package:graphics"
[7] "package:grDevices" "package:utils" "package:datasets"
[10] "package:methods" "Autoloads" "package:base"
The data frame foo has three column variables: surname, sex, and height. To
access one of these columns, normally you need to use the $ operator and
enter something like foo$surname.
However, you can attach a data frame directly to your search path, which
makes its variables accessible by name alone, such as surname in this
example.
R> attach(foo)
R> search()
[1] ".GlobalEnv" "foo" "package:MASS"
[4] "package:spatstat" "tools:RGUI" "package:stats"
[7] "package:graphics" "package:grDevices" "package:utils"
[10] "package:datasets" "package:methods" "Autoloads"
[13] "package:base"
R> surname
[1] "a" "b" "c" "d"
• If you then attach another data frame, for example bar, that also has a
column named height, the height column of foo will be masked.
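The attach behaviour, including the masking of height, can be sketched as follows. Only the column names come from the notes. The values in both data frames are hypothetical.

```r
# Hypothetical data frames; only the column names come from the notes.
foo <- data.frame(surname = c("a", "b", "c", "d"),
                  sex = c("F", "M", "M", "F"),
                  height = c(170, 168, 181, 180),
                  stringsAsFactors = FALSE)
bar <- data.frame(height = c(150, 160))

attach(foo)
s <- surname            # found via the search path, same as foo$surname
s

attach(bar)             # R reports that 'height' is masked from foo
h <- height             # bar comes first in the search path: 150 160
h
foo$height              # the masked column is still reachable with $

detach(bar)             # remove both data frames from the search path
detach(foo)
```

Detaching the data frames when you are finished keeps the search path short and avoids accidental masking later in the session.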