R Content
➢ R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.
➢ The core of R is an interpreted computer language which allows branching and looping as
well as modular programming using functions. R allows integration with the procedures
Comments are explanatory text in your R program; the interpreter ignores them while
executing the program. A single-line comment is written using # at the beginning of
the statement, as follows:
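A minimal sketch of single-line comments; the variable name my_string and its value are placeholders chosen for illustration.

```r
# This whole line is a comment and is ignored by the interpreter.
my_string <- "Hello, World!"  # a comment can also follow code on the same line
print(my_string)
```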
➢ Vectors
➢ Lists
➢ Matrices
➢ Arrays
➢ Factors
➢ Data Frames
Data Type    Example
Logical      TRUE, FALSE
Numeric      12.3, 5, 999
Integer      2L, 34L, 0L
Complex      3 + 2i
Character    'a', "good", "TRUE", '23.4'
Raw          "Hello" is stored as 48 65 6c 6c 6f
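As a rough check of the table, class() can be called on one example value per row. This is only a sketch; the values are taken from the table.

```r
# class() reports the data type of each example value.
print(class(TRUE))                # "logical"
print(class(12.3))                # "numeric"
print(class(2L))                  # "integer"
print(class(3 + 2i))              # "complex"
print(class("good"))              # "character"
print(class(charToRaw("Hello")))  # "raw"
print(charToRaw("Hello"))         # the raw bytes 48 65 6c 6c 6f
```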
Raw Data Types
v <- charToRaw("Hello")
print(class(v))
x = 1; y = 2
z = x > y
print(z)
print(class(z))
u = TRUE; v = FALSE
z = u & v    # also try u | v and !u
print(z)
Decimal values are called numerics in R. It is the default computational data type. If we
assign a decimal value to a variable x as follows, x will be of numeric type.
> x = 10.5     # assign a decimal value
> x            # print the value of x
[1] 10.5
> class(x)     # print the class name of x
[1] "numeric"
The fact that k is not an integer can be confirmed with the is.integer function.
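The assignment of k is not shown here. The sketch below assumes k was given a whole-number literal such as 1, which R still stores as numeric. The name k_int is hypothetical.

```r
k <- 1                    # assumed assignment; a plain number literal is numeric
print(class(k))           # "numeric"
print(is.integer(k))      # FALSE

k_int <- as.integer(k)    # explicit conversion to the integer type
print(is.integer(k_int))  # TRUE
```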
elements of different classes as shown above. Please note in R the number of classes is not
confined to only the above six types. For example, we can use many atomic vectors and
can store an atomic vector, a group of atomic vectors or a combination of many R objects. A
valid variable name consists of letters, numbers and the dot or underscore characters. The
variable name must start with a letter, or with a dot that is not followed by a number.
Variable Name          Validity   Reason
var_name2.             valid      Has letters, numbers, dot and underscore
var_name%              invalid    Has the character '%'. Only dot (.) and underscore are allowed.
2var_name              invalid    Starts with a number
.var_name, var.name    valid      Can start with a dot (.), but the dot should not be followed by a number.
.2var_name             invalid    The starting dot is followed by a number, making it invalid.
_var_name              invalid    Starts with _, which is not valid
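One way to test these rules in code is to compare each candidate with the result of make.names(). This is a sketch, not part of the original material, and the names candidates and is_valid are illustrative. make.names() also rewrites reserved words, so it is slightly stricter than the table.

```r
# Candidate names taken from the table above.
candidates <- c("var_name2.", "var_name%", "2var_name",
                ".var_name", "var.name", ".2var_name", "_var_name")

# make.names() rewrites a string into a syntactically valid name,
# so a string that comes back unchanged was already valid.
is_valid <- make.names(candidates) == candidates
print(data.frame(name = candidates, valid = is_valid))
```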
Variable Assignment
Variables can be assigned values using the leftward, rightward and equal-to operators. The
values of the variables can be printed using the print() or cat() function. The cat() function
combines multiple items into a continuous print output.
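A short sketch of the three assignment forms and the two printing functions. The variable names and values are chosen for illustration.

```r
var.1 = c(0, 1, 2, 3)      # assignment using the equal-to operator
var.2 <- c("learn", "R")   # assignment using the leftward operator
c(TRUE, 1) -> var.3        # assignment using the rightward operator

print(var.1)
cat("var.1 is", var.1, "\n")
cat("var.2 is", var.2, "\n")
cat("var.3 is", var.3, "\n")  # TRUE is coerced to 1 next to a number
```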
In R, a variable itself is not declared with any data type; rather, it takes the data type of the
R object assigned to it. R is therefore called a dynamically typed language, which means that we
can change the data type of the same variable again and again when using it in a
program.
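The effect can be seen by reassigning one variable and checking its class each time. The name var_x and the values are illustrative.

```r
var_x <- "Hello"
cat("The class of var_x is", class(var_x), "\n")  # character

var_x <- 34.5
cat("Now the class of var_x is", class(var_x), "\n")  # numeric

var_x <- 27L
cat("Next the class of var_x becomes", class(var_x), "\n")  # integer
```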
Finding Variables
To list all the variables currently available in the workspace, we use the ls() function.
print(ls())
The ls() function can use patterns to match the variable names.
print(ls(pattern = "var"))
Variables starting with a dot (.) are hidden; they can be listed by passing all.names = TRUE to ls().
print(ls(all.names = TRUE))
Deleting Variables
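Variables can be deleted by using the rm() function. Below we delete the variable var.3.
Printing the variable afterwards raises an error, because the object no longer exists.
rm(var.3)
print(var.3)    # fails: object 'var.3' not found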
if statement
happens.
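Here, test_expression can be a logical or numeric value. In older versions of R only the first
element of a longer vector was taken into consideration; since R 4.2.0 a condition of length
greater than one is an error.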
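A minimal sketch of an if statement. The names x and sign_label are illustrative.

```r
x <- 5
sign_label <- "not positive"
if (x > 0) {
  sign_label <- "positive"  # runs only when the condition is TRUE
}
print(sign_label)
```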
ifelse(test_expression,x,y)
Here, test_expression must be a logical vector (or an object that can be coerced to logical).
The returned vector has an element from x if the corresponding value of test_expression is
TRUE, and an element from y if it is FALSE.
a = c(5,7,2,9)
ifelse(a %% 2 == 0,"even","odd")
Looping in R
statement
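Here, sequence is a vector and val takes on each of its values during the loop. In each
x <- c(2,5,3,9,8,11,6)
count <- 0
for (val in x) {
# The loop body is missing in the source; counting even values is an assumption.
if (val %% 2 == 0) count <- count + 1
}
print(count)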
While loop
while (test_expression) {
statement
}
Here, test_expression is evaluated and the body of the loop is entered if the result is TRUE.
The statements inside the loop are executed and the flow returns to evaluate the
test_expression again.
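A small sketch of a while loop that sums the numbers 1 to 5. The names i and total are illustrative.

```r
i <- 1
total <- 0
while (i < 6) {       # the condition is re-evaluated before every pass
  total <- total + i
  i <- i + 1          # without this update the loop would never end
}
print(total)
```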
Break Statement
A next statement is useful when we want to skip the current iteration of a loop without
terminating it. On encountering next, R skips the rest of the current iteration and starts the
next iteration of the loop.
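The difference between next and break can be sketched with two loops over the same vector. The names kept and seen are illustrative.

```r
x <- 1:5

kept <- c()
for (val in x) {
  if (val == 3) next    # skip this iteration, continue with 4
  kept <- c(kept, val)
}
print(kept)             # every value except 3

seen <- c()
for (val in x) {
  if (val == 3) break   # leave the loop entirely
  seen <- c(seen, val)
}
print(seen)             # only the values before 3
```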
A repeat loop is used to iterate over a block of code multiple times.
We must put a condition explicitly inside the body of the loop ourselves and use the break
statement to exit the loop. Failing to do so will result in an infinite loop.
repeat {
statement
}
Example of repeat
x <- 1
repeat {
print(x)
x = x+1
if (x == 6){
break
}
}
Sequence Generation
The third form generates a sequence of length.out equally spaced values, running from the from argument to the to argument.
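The first two forms are not described in the surviving text. The sketch below assumes they are the from/to and from/to/by forms of seq(), and shows them next to the length.out form for contrast.

```r
print(seq(2, 6))                  # from and to only: 2 3 4 5 6
print(seq(1, 10, by = 3))         # fixed step: 1 4 7 10
print(seq(0, 1, length.out = 5))  # five equally spaced values from 0 to 1
```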
When you want to create a vector with more than one element, you should use the c() function,
which combines the elements into a vector.
# Create a vector.
apple <- c('red','green',"yellow")
print(apple)
Value Coercion
In the code snippet above, notice how the numeric values are being coerced into character
strings when the two vectors are combined. This is necessary so as to maintain the same
primitive data type for members in the same vector.
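The snippet that this paragraph refers to is not reproduced here. A minimal sketch of the same effect, with made-up vectors, could look like this:

```r
numbers <- c(1, 2, 3)
labels <- c("a", "b")

combined <- c(numbers, labels)  # numeric values are coerced to character
print(combined)
print(class(combined))          # "character"
```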
Vector Arithmetics
Lists
A list is an R object which can contain many different types of elements, such as vectors,
functions and even another list.
# Create a list.
list1 <- list(c(2,5,3),21.3,sin)
# Print the list.
print(list1)
# Output:
# [[1]]
# [1] 2 5 3
#
# [[2]]
# [1] 21.3
#
# [[3]]
# function (x)  .Primitive("sin")
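As a follow-up sketch, individual elements of the same list can be pulled out with double square brackets. The stored function can then be called like any other.

```r
list1 <- list(c(2, 5, 3), 21.3, sin)

print(list1[[1]])      # the numeric vector stored first
print(list1[[2]])      # the single number 21.3
print(list1[[3]](0))   # calls the stored sin function on 0
print(length(list1))   # number of top-level elements
```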
Matrices
A matrix is a two-dimensional rectangular data set. It can be created using a vector input to
the matrix function.
# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
Arrays
While matrices are confined to two dimensions, arrays can have any number of dimensions.
The array function takes a dim attribute which creates the required number of dimensions. In
the example below we create an array with two elements, each of which is a 3x3 matrix.
# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)
Data Frames
Data frames are tabular data objects. Unlike in a matrix, each column of a data frame can contain
a different mode of data: the first column can be numeric while the second column can be
character and the third column can be logical. A data frame is a list of vectors of equal length.
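A small sketch of such a data frame, with one numeric, one character and one logical column. The column names and values are made up for illustration.

```r
df <- data.frame(
  score  = c(81.5, 93, 78),        # numeric column
  name   = c("Ann", "Bob", "Cy"),  # character column
  passed = c(TRUE, TRUE, FALSE),   # logical column
  stringsAsFactors = FALSE         # keep strings as character in older R versions
)
print(df)
print(sapply(df, class))           # one class per column
```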
#Calling of function
pow(8, 2)
Named Arguments
In the above function calls, the matching of formal arguments to actual arguments takes
place in positional order.
This means that, in the call pow(8, 2), the formal arguments x and y are assigned 8 and 2
respectively.
We can also call the function using named arguments. When calling a function in this way,
the order of the actual arguments doesn't matter. For example, all of the function calls
given below are equivalent.
pow(x = 8, y = 2)
pow(y = 2, x = 8)
We can also mix named and unnamed arguments in a single call.
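The definition of pow() is not shown here. The sketch below assumes it raises x to the power y, which is consistent with the calls above, and tries positional, named and mixed calls.

```r
# Assumed definition: pow() is not defined in the surviving text.
pow <- function(x, y) {
  result <- x^y
  print(paste(x, "raised to the power", y, "is", result))
  invisible(result)
}

pow(8, 2)           # positional matching: x = 8, y = 2
pow(x = 8, y = 2)   # named arguments
pow(y = 2, x = 8)   # named arguments in a different order, same result
pow(2, x = 8)       # mixed: x is named, so the unnamed 2 goes to y
```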
Default Values for Arguments
We can assign default values to arguments in a function in R.
This is done by providing an appropriate value to the formal argument in the function
declaration.
Here is the above function with a default value for y.
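The function itself is missing at this point. A sketch under the same assumption as before (pow() raises x to the power y), with y defaulting to 2:

```r
# Assumed definition: the original function body is not shown in the text.
pow <- function(x, y = 2) {
  result <- x^y
  print(paste(x, "raised to the power", y, "is", result))
  invisible(result)
}

pow(3)      # y is omitted, so the default value 2 is used
pow(3, 3)   # an explicit y overrides the default
```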
The return() function can return only a single object. If we want to return multiple values in
R, we can use a list (or other objects) and return it.
Following is an example.
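The example is not reproduced here. One possible sketch, with a hypothetical function multi_return() and made-up values, packs several values into a named list:

```r
multi_return <- function() {
  my_list <- list(color = "red", size = 20, shape = "round")
  return(my_list)   # a single list object carries all three values
}

a <- multi_return()
print(a$color)
print(a$size)
print(a$shape)
```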
By default the read.csv() function gives the output as a data frame. This can be easily
checked as follows. Also we can check the number of columns and rows.
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))
When we execute the above code, it produces the following result −
[1] TRUE
[1] 5
[1] 8
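The CSV file behind the output above is not included. The following self-contained sketch writes a small made-up file to a temporary location and runs the same checks, so its counts differ from the 5 columns and 8 rows shown.

```r
# Write a tiny illustrative CSV file; the contents are made up.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,salary",
             "1,Rick,623.3",
             "2,Dan,515.2",
             "3,Michelle,611"), csv_path)

data <- read.csv(csv_path)
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))
```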
CSV Files
Once we read data into a data frame, we can apply all the functions applicable to data frames, as
explained in the subsequent section.
getwd()
# Create vector objects.
city <- c("Tampa","Seattle","Hartford","Denver")
state <- c("FL","WA","CT","CO")
zipcode <- c(33602,98104,06161,80294)  # stored as numeric, so 06161 loses its leading zero
# Print a header.
cat("# # # # The First data frame\n")
# Print a header.
cat("# # # The Second data frame\n")
# Print a header.
cat("# # # The combined data frame\n")
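The fragments above appear to come from an example that builds two data frames and stacks them. The sketch below works under that assumption: the second data frame and the names addresses, new_addresses and all_addresses are made up, and the zip codes are stored as character strings so that the leading zero survives.

```r
# First data frame, built from vectors of equal length.
city <- c("Tampa", "Seattle", "Hartford", "Denver")
state <- c("FL", "WA", "CT", "CO")
zipcode <- c("33602", "98104", "06161", "80294")
addresses <- data.frame(city, state, zipcode, stringsAsFactors = FALSE)
cat("# # # The First data frame\n")
print(addresses)

# Second data frame with the same columns (illustrative values).
new_addresses <- data.frame(
  city = c("Lowry", "Charlotte"),
  state = c("CO", "FL"),
  zipcode = c("80230", "33949"),
  stringsAsFactors = FALSE
)
cat("# # # The Second data frame\n")
print(new_addresses)

# rbind() stacks the rows of data frames that share column names.
all_addresses <- rbind(addresses, new_addresses)
cat("# # # The combined data frame\n")
print(all_addresses)
```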
We can merge two data frames by using the merge() function. The data frames must have
In the example below, we consider the data sets about diabetes in Pima Indian women
available in the library named "MASS". We merge the two data sets based on the values of
blood pressure ("bp") and body mass index ("bmi"). On choosing these two columns for
merging, the records where values of these two variables match in both data sets are