All Unit R - Programming Notes PDF
R Script File:
• We create a text file and save it with the .R
extension.
Comments
Program 2:
# Write a program in R to print the name and age of a student without using print()
• class() function
• ls() function
• rm() function
• print()
• paste() and paste0()
• cat()
class() function
• This built-in function is used to determine the
data type of the variable provided to it. The R
variable to be checked is passed to it as an
argument, and it returns the data type.
Syntax:
class(variable)
Example:
var1 = "hello"
print(class(var1))
ls() function
• This built-in function lists all the variables currently present in the workspace.
This is generally helpful when dealing with a large number of variables at once and
helps prevent overwriting any of them.
Syntax
ls()
Example
# using equal to operator
var1 = "hello"
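As a small illustration of how ls() and rm() work together, the sketch below creates two variables, checks that ls() lists them, removes one, and checks again. The variable names are made up for this example.

```r
# Hypothetical variables, created only for this illustration
var1 <- "hello"
var2 <- 42

# ls() returns the names of the objects in the current environment
vars_before <- ls()
print(c("var1", "var2") %in% vars_before)

# rm() removes an object; afterwards it no longer appears in ls()
rm(var1)
var1_listed <- "var1" %in% ls()
print(var1_listed)
```

Checking ls() before creating a new variable is one way to avoid overwriting an existing one by accident.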
Example:
x = "R language"
cat(x, "is best\n")
x <- 1:9
cat(x, sep =" + ")
# print normal string
cat("This is R language")
The Difference Between cat() and
paste() in R
• The cat() function will output the
concatenated string to the console, but it
won't store the results in a variable.
• The paste() function returns the
concatenated string as a character value, so the
results can be stored in a variable. At the console
the value is printed only when it is not assigned.
• EXAMPLE 1 :
#concatenate several strings together using cat()
results <- cat("hey", "there", "everyone")
#attempt to view concatenated string
results
Note: results is NULL because the cat() function prints to the console and does not store results.
EXAMPLE 2:
results <- paste("hey", "there", "everyone")
#view concatenated string
results
[1] "hey there everyone" #output of program
Finding Variables
# global variable
global = 5
func = function(){
# this variable is local to the
# function func() and cannot be
# accessed outside this function
age = 18
print(age)
}
cat("Age is:\n")
func()
Difference between local and global variables in R
3. Logical: It is a special data type for data with only two possible
values which can be construed as true/false.
• TRUE and FALSE
Ex: logical_value <- TRUE
4. Complex: A complex value in R has a real part and an imaginary part; the imaginary part is written with the suffix i.
• Set of complex numbers
Ex: complex_value <- 1 + 2i
6.raw: A raw data type is used to hold raw bytes. To save and work with data
at the byte level in R, use the raw data type.
• as.raw()
Ex: single_raw <- as.raw(255) #raw ff
• object.size()
• memory.profile()
Ex:
• # numeric value
• a = 11
• print(object.size(a))
Find data type of an object in R
# complex
x <- 9i + 3
class(x)
# Sample values
x=4
y=3
Syntax
as.data_type(object)
Example:
• as.numeric()
• as.integer()
• as.complex()
• as.character()
• as.logical()
# A simple R program
# convert data type of an object to another
# Logical
print(as.numeric(TRUE))
# Integer
print(as.complex(3L))
# Numeric
print(as.logical(10.5))
# Complex
print(as.character(1+2i))
# Not possible: returns NA with a warning
print(as.numeric("12-04-2020"))
# Possible
print(as.numeric("12"))
x <- 1L # integer
y <- 2 # numeric
z <- "hello"
z <- as.numeric(z)
# convert from integer to numeric:
a <- as.numeric(x)
# convert from numeric to integer:
b <- as.integer(y)
# print values of x and y
x
y
z
# print the class name of a and b
class(a)
class(b)
Keywords/Reserved Words
• Reserved words in R programming are a set of
words that have a special meaning and cannot
be used as an identifier.
• The list of reserved words can be viewed by
typing ?reserved or help(reserved) at the R
command prompt.
Identifiers
• The unique name given to a variable, function
or object is known as an identifier.
• Following are the rules for naming an identifier.
1. Identifiers can be a combination of letters,
digits, period(.) and underscore.
2. It must start with a letter or a period. If it starts
with a period, it can not be followed by a digit.
3. A reserved word in R cannot be used as an identifier.
Ex. Total1, sum, .date.of.birth, Sum_of_two etc.
Constants
• Constants, or literals, are entities whose value
cannot be altered. Basic types of constants are
numeric constants and character constants.
• Numeric constants: all numbers fall
under this category.
• They can be of type integer, double and complex.
• There are built-in constants also, such as pi,
LETTERS and month.name.
• But it is not good to rely on these, as they are
implemented as variables whose values can be
changed.
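The points above can be sketched in a few lines. The example checks how numeric constants are stored and shows why built-in constants should not be relied on: a built-in name such as pi can be masked by an ordinary assignment. The variable names masked_pi and restored_pi are made up for the illustration.

```r
# typeof() reports how a numeric constant is stored
print(typeof(5L))     # integer constant
print(typeof(5))      # double constant
print(typeof(3+2i))   # complex constant

# Some built-in constants that ship with base R
print(pi)
print(LETTERS[1:3])
print(month.name[1])

# Built-in constants behave like variables and can be masked
pi <- 3               # masks the built-in value in this workspace
masked_pi <- pi
rm(pi)                # removing the mask makes the built-in value visible again
restored_pi <- pi
print(c(masked_pi, restored_pi))
```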
Taking Input from User in R
Programming
• It's also possible to take input from the user.
For doing so, there are two methods in R: readline() and scan().
Syntax:
var1 = readline(prompt = "Enter any number : ");
or,
var1 = readline("Enter any number : ");
# R program to illustrate taking input from the user
Syntax:
{
var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
}
# R program to illustrate taking input from the user to taking multiple
inputs using braces
{
var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
}
# character input
var2 = readline(prompt = "Enter any character : ");
# convert to character
var2 = as.character(var2)
# printing values
print(var1)
print(var2)
Using scan() method
• Syntax:
x = scan()
The scan() method reads input continuously. To
terminate the input process, press the Enter key twice
on the console.
Example 1:
# taking input from the user
x = scan()
# print the inputted values
print(x)
Example 2:
# double input using scan()
d = scan(what = double())
# string input using 'scan()'
s = scan(what = " ")
• Arithmetic Operators
• Logical Operators
• Relational Operators
• Assignment Operators
• Miscellaneous Operator
Arithmetic Operations
• Every statistical analysis involves a lot of
calculations, and calculation is what R is
designed for — the work that R does best.
1. Basic arithmetic operators-
• These operators are used in just about every
programming language.
1. Basic Arithmetic Operator
# Arithmetic operators
x <- 5
y <- 16
x+y
x-y
x*y
y/x
y^x
y%%x
2. USING MATHEMATICAL FUNCTIONS
3. Relational Operators
x <- 5
y <- 16
x<y
x>y
x <= 5
y >= 20
y == 16
x != 5
4. Logical Operators
5. Assignment operator-
6. Vector operations-
• Vector operations are functions that make
calculations on a complete vector, like sum().
• Each result depends on more than one value of the
vector.
7. Matrix operations-
• These functions are used for operations and
calculations on matrices.
Objects
• Unlike in many other programming languages, variables
in R are assigned to objects rather than declared
with data types.
• Instead of declaring data types, as done in C++
and Java, the user assigns objects to variables.
The most popular objects in R are:
• Vectors
• Factors
• Lists
• Data Frames
• Matrices
• The data type of the object in R becomes the
data type of the variable by definition.
• R's basic data types are character, numeric,
integer, complex, and logical.
Data Structure
• A data structure is a particular way of
organizing data in a computer so that it can be
used effectively.
• The idea is to reduce the space and time
complexities of different tasks.
• Data structures in R programming are tools
for holding multiple values.
• R’s base data structures are often organized by
their dimensionality (1D, 2D, or nD)
• The most essential data structures used in R
include:
• Vectors
• Lists
• Dataframes
• Matrices
• Arrays
• Factors
vector
• A vector is the simplest type of data structure
in R.
• Vectors are single-dimensional,
homogeneous data structures. To create a
vector, use the c() function.
• A vector is a sequence of data elements of the
same basic type.
• If you want to check the variable type,
use class().
• Vectors in R are the same as the arrays in C
language which are used to hold multiple data
values of the same type.
• One major key point is that in R the indexing
of the vector will start from ‘1’ and not from
‘0’. We can create numeric vectors and
character vectors as well.
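A minimal sketch of the points above: creating numeric and character vectors with c(), checking their type with class(), and indexing from 1. The values and variable names are made up for the example.

```r
# Create a numeric and a character vector with c() (hypothetical values)
marks <- c(72, 85, 90)
student_names <- c("Asha", "Ravi", "Meena")

# class() reports the type shared by all elements
print(class(marks))
print(class(student_names))

# Indexing starts at 1, so marks[1] is the first element
first_mark <- marks[1]
print(first_mark)
print(length(marks))
```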
Types of vectors
• Vectors are of different types which are used in R.
Following are some of the types of vectors:
• There are 5 data types of the simplest object -
vector:
1. Logical
2. Numeric
3. Character
4. Raw
5. Complex
Numeric vectors
Numeric vectors are those which contain numeric values
such as integer, float, etc.
• x <- c(1,2,3,4)
• There are several other ways of creating a vector:
1. Using the : operator
• x <- 1:5
• The : operator can also count down:
• y <- 5:-5
2. Create R vector using seq() function
• There are also two ways in this. The first way is to
set the step size and the second method is by
setting the length of the vector.
1) Setting step size with ‘by’ parameter:
seq(2,4, by = 0.4)
• (2.0,2.4,2.8,3.2,3.6,4.0)
2) Specifying length of vector with the ‘length.out’
feature:
• seq(1,4, length.out = 5)
• (1.00,1.75,2.50,3.25,4.00)
EXAMPLE :
# R program to create Vectors
# We can use the c function to combine the values as a vector.
# By default the type will be double
X <- c(61, 4, 21, 67, 89, 2)
cat('using c function', X, '\n')
# seq() function for creating
# a sequence of continuous values.
# length.out defines the length of vector.
Y <- seq(1, 10, length.out = 5)
cat('using seq() function', Y, '\n')
# use':' to create a vector
# of continuous values.
Z <- 2:7
cat('using colon', Z)
How to Access Elements of R Vectors?
Output:
Using Subscript operator 5
Using combine() function 1 4
Using Logical indexing 5
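The three access methods named in the output above can be sketched as follows. This is an illustration with made-up values, not the program that produced that output.

```r
# Hypothetical vector used only for this illustration
X <- c(2, 5, 18, 1, 12)

# Subscript operator: a single position
second <- X[2]

# c() inside the brackets: several positions at once
picked <- X[c(4, 1)]

# Logical indexing: keep the elements for which the condition is TRUE
large <- X[X > 10]

cat('Using subscript operator', second, '\n')
cat('Using c() function', picked, '\n')
cat('Using logical indexing', large, '\n')
```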
Modifying a vector
• Modification of a Vector is the process of
applying some operation on an individual
element of a vector to change its value in the
vector. There are different ways through which
we can modify a vector:
# R program to modify elements of a Vector
# Creating a vector
X <- c(2, 7, 9, 7, 8, 2)
# modify a specific element
X[3] <- 1
X[2] <-9
cat('subscript operator', X, '\n')
# Modify using different logics.
X[X>5] <- 0
cat('Logical indexing', X, '\n')
# Modify by specifying
# the position or elements.
X <- X[c(3, 2, 1)]
cat('combine() function', X)
fruits <- c("banana", "apple", "orange", "mango", "lemon")
# Access the first and third item (banana and orange)
fruits[c(1, 3)]
# Access all items except for the first item
fruits[c(-1)]
Output
subscript operator 2 9 1 7 8 2
Logical indexing 2 0 1 0 0 2
combine() function 1 0 2
Change an Item
To change the value of a specific item, refer to the index
number:
Example
fruits <-
c("banana", "apple", "orange", "mango", "lemon")
# Change "banana" to "pear"
fruits[1] <- "pear"
# Print fruits
fruits
• Deleting a vector
Deletion of a Vector is the process of deleting all of
the elements of the vector. This can be done by
assigning it to a NULL value.
• Delete by rm()
# R program to delete a Vector Creating a Vector
M <- c(8, 10, 2, 5)
# set NULL to the vector
M <- NULL
cat('Output vector', M)
rm(M)#delete entire vector from memory
Output:
Output vector
(cat() prints nothing for NULL. After rm(M), referring to M gives the error: object 'M' not found.)
FUNCTIONS OF VECTOR
• v<-c(10,20,30,40,50)
• min(v)
• max(v)
• sum()
• prod()
• sort()
• length()
Sorting elements of a Vector
sort() function is used with the help of which we can sort the values in
ascending or descending order.
# R program to sort elements of a Vector Creation of Vector
X <- c(8, 2, 7, 1, 11, 2)
# Sort in ascending order
A <- sort(X)
cat('ascending order', A, '\n')
# sort in descending order
# by setting decreasing as TRUE
B <- sort(X, decreasing = TRUE)
cat('descending order', B)
Output:
ascending order 1 2 2 7 8 11
descending order 11 8 7 2 2 1
order()
Sorting by order()
x <- c(10, 40, 50, 20, 30)
x[order(x)]   # increasing order
x[order(-x)]  # decreasing order
Reverse the order
x <- c(10, 20, 30, 40)
rv <- rev(x)
print(rv)
seq() or Generating Sequenced Vectors
One of the examples above showed how to create a
vector with numerical values in a sequence with
the : operator:
Example
numbers<- 1:10
numbers
1. To make bigger or smaller steps in a sequence, use
the seq() function:
Example
numbers<- seq(from = 0, to = 100, by = 20)
numbers
Note: The seq() function has three parameters: from is
where the sequence starts, to is where the sequence
stops, and by is the interval of the sequence.
• To create a vector with numerical values in a
sequence, use the : operator:
Example
• # Vector with numerical values in a sequence
numbers <- 1:10
numbers
• You can also create numerical values with decimals in
a sequence, but note that if the last element does not
belong to the sequence, it is not used:
Example
• # Vector with numerical decimals in a sequence
numbers1 <- 1.5:6.5
numbers1
Example:
p<-c(1,2,4,5,7,8)
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
r<-c(p,q)
Output
[1] "1" "2" "4" "5" "7" "8"
[7] "shubham" "arpita" "nishka" "gunjan" "vaishali" "sumit"
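The numbers in the output above are shown in quotes because a vector can hold only one type, so c() converts every element to the most general type present. A short sketch with made-up values:

```r
# Numeric + character: everything becomes character
num_chr <- c(1, 2, "a")

# Logical + numeric: TRUE becomes 1 and FALSE becomes 0
lgl_num <- c(TRUE, FALSE, 3)

print(class(num_chr))
print(num_chr)
print(class(lgl_num))
print(lgl_num)
```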
2. Arithmetic Operations
Output:
vector 1 : 2 3 4 5
vector 2 : 2 3 4 5
vector 3 : 2 3 4 5
vector 4 : 2 3 4 5
vector 5 : 2 3 4 5
Miscellaneous Operations
• x
It is the input vector which is to be transformed into a factor.
• levels
It is an optional vector of the unique values which can be
taken by x.
• labels
It is an optional character vector of labels for the levels.
• exclude
It is used to specify the values which we want to be excluded when forming the levels.
• ordered
It is a logical value which determines if the levels are ordered.
• nmax
It is used to specify an upper bound on the maximum number of
levels.
How to create a factor?
Output
[1] "female" "male" "male" "female"
[1] female male male female
Levels: female male
# Creating a factor with levels defined by
programmer
gender <- factor(c("female", "male", "male",
"female"),levels = c("female", "transgender",
"male"));
gender
Output
[1] female male male female
Levels: female transgender male
Checking for a Factor in R
Output
[1] "factor"
Accessing elements of a Factor in R
Like we access elements of a vector, the same way
we access the elements of a factor. If gender is a
factor then gender[i] would mean accessing an
ith element in the factor.
Example
gender <- factor(c("female", "male", "male",
"female"));
gender[3]
Output
[1] male
Levels: female male
• More than one element can be accessed at a
time.
Example
gender <- factor(c("female", "male", "male",
"female"));
gender[c(2, 4)]
Output
[1] male female
Levels: female male
Exclude one element at a time.
Example
gender <- factor(c("female", "male", "male",
"female" ));
gender[-3]
Output
[1] female male female
Levels: female male
How to Create a Factor
# Apply the factor function with the required order of the level.
new_order_factor <- factor(factor_data,
levels = c("Gunjan", "Nishka", "Arpita", "Shubham", "Sumit"))
print(new_order_factor)
• Output
• [1] Nishka Gunjan Shubham Arpita Arpita
Sumit Gunjan Shubham
• Levels: Arpita Gunjan Nishka Shubham Sumit
• [1] Nishka Gunjan Shubham Arpita Arpita
Sumit Gunjan Shubham
• Levels: Gunjan Nishka Arpita Shubham Sumit
Generating Factor Levels
# Importing package
library(stringr)
Output
• 5
• Using nchar() function
# R program to find length of string
# Using nchar() function
nchar("hel'lo")
• Output
• [1] 6
• Accessing portions of an R string
• The individual characters of a string can be
extracted by using R's string indexing
functions. R has two built-in
functions for accessing both single
characters and substrings of a string.
• The substr() or substring() function in R extracts
substrings out of a string, beginning with the start
index and ending with the end index. It can also
replace the specified substring with a new set of
characters.
• Syntax
• substr(..., start, end) or substring(..., start, end)
• Using substr() function
# R program to access
# characters in a string
# Accessing characters
# using substr() function
substr("Learn Code Tech", 1, 1)
• Output
• [1] "L"
• If the starting index is equal to the ending
index, the corresponding character of the
string is accessed. In this case, the first
character, ‘L’ is printed.
• Using substring() function
# R program to access characters in string
str <- "Learn Code"
# counts the characters in the string
len <- nchar(str)
# Accessing character using
# substring() function
print(substring(str, len, len))
# Accessing elements out of index
print(substring(str, len+1, len+1))
• Output
• [1] "e"
• [1] ""
• The number of characters in the string is 10. The first print
statement prints the last character of the string, "e", which is
the 10th character. The second print statement asks for the 11th character of the
string, which doesn't exist, but the code doesn't throw an error and
prints "", that is, an empty string.
• The following R code indicates the mechanism of String Slicing,
where in the substrings of a R string are extracted:
# R program to access characters in string
str <- "Learn Code"
# counts the number of characters of str = 10
len <- nchar(str)
print(substr(str, 1, 4))
print(substr(str, len-2, len))
• Output
• [1] "Lear"
• [1] "ode"
Case Conversion
• The R string characters can be converted to
upper or lower case by R’s inbuilt
function toupper() which converts all the
characters to upper case, tolower() which
converts all the characters to lower case,
and casefold(…, upper=TRUE/FALSE) which
converts on the basis of the value specified to
the upper argument. All these functions can
take in as arguments multiple strings too. The
time complexity of all the operations is
O(number of characters in the string).
• Example
• # R program to Convert case of a string
• str <- "Hi LeArn CodiNG"
• print(toupper(str))
• print(tolower(str))
• print(casefold(str, upper = TRUE))
• Output
• [1] "HI LEARN CODING"
• [1] "hi learn coding"
• [1] "HI LEARN CODING"
By default, the value of upper in casefold() function is set to FALSE. If we set it to
TRUE, the R string gets printed in upper case.
• Concatenation of R Strings
• Using R’s paste function, you can concatenate strings. Here is a
straightforward example of code that joins two strings together:
# Create two strings
string1 <- "Hello"
string2 <- "World"
# Concatenate the two strings
result <- paste(string1, string2)
# Print the result
print(result)
• Output
• [1] "Hello World"
• # Concatenate three strings
• result <- paste("Hello", "to", "the World")
•
• # Print the result
• print(result)
• R String formatting
• String formatting in R is done via the sprintf
function. An easy example of code that prepares
a string using a variable value is provided below:
• # Create two variables with values
• x <- 42
• y <- 3.14159
•
• # Format a string with the two variable values
• result <- sprintf("The answer is %d, and pi is %.2f.", x, y)
•
• # Print the result
• print(result)
• Updating R strings
• The characters, as well as substrings of a
string, can be manipulated to new string
values. The changes are reflected in the
original string. In R, the string values can be
updated in the following way:
• substr(..., start, end) <- newstring
• substring(..., start, end) <- newstring
• # Create a string
• string <- "Hello, World!"
•
• # Replace "World" with "Universe"
• string <- gsub("World", "Universe", string)
•
• # Print the updated string
• print(string)
• Output
• "Hello, Universe!"
• Multiple strings can be updated at once, with the
start <= end.
• If the length of the substring is larger than the
new string, only the portion of the substring
equal to the length of the new string is replaced.
• If the length of the substring is smaller than the
new string, the position of the substring is
replaced with the corresponding new string
values.
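The replacement rules above can be checked with the assignment form of substr(). This is a small sketch with made-up strings; s1 and s2 are hypothetical names.

```r
s1 <- "abcdef"
# The selected span (3 characters) is longer than the new string (2 characters):
# only the first 2 characters of the span are replaced
substr(s1, 1, 3) <- "XY"
print(s1)

s2 <- "abcdef"
# The selected span (2 characters) is shorter than the new string (4 characters):
# only as many characters of the new string as fit in the span are used
substr(s2, 1, 2) <- "WXYZ"
print(s2)
```

In both cases the total length of the string stays the same, which is the main difference from gsub(), where the replacement may be longer or shorter than the matched text.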
String Function
• R provides various string functions to perform
tasks. These string functions allow us to
extract sub string from string, search pattern
etc. There are the following string functions in
R:
1.substr(x, start=n1,stop=n2)
It is used to extract substrings in a character vector.
a <- "987654321"
substr(a, 3, 3)
Output: [1] "7"
2.grep(pattern, x , ignore.case=FALSE, fixed=FALSE)
It searches for pattern in x.
st1 <- c('abcd','bdcd','abcdabcd')
pattern<- '^abc'
print(grep(pattern, st1))
Output: [1] 1 3
• 3.sub(pattern, replacement, x, ignore.case
=FALSE, fixed=FALSE)
• It finds pattern in x and replaces it with
replacement (new) text.
st1 <- "England is beautiful but not a part of EU"
sub("England", "UK", st1)
Output: [1] "UK is beautiful but not a part of EU"
4. paste(..., sep=" ")
It concatenates strings, using the sep string to
separate them.
paste('one',2,'three',4,'five')
Output: [1] "one 2 three 4 five"
5.strsplit(x, split)
It splits the elements of character vector x at
split point.
a <- "Split all the character"
print(strsplit(a, " "))
Output:
[[1]]
[1] "Split" "all" "the" "character"
6.tolower(x)
It is used to convert the string into lower case.
st1<- "shuBHAm"
print(tolower(st1))
Output: [1] "shubham"
7.toupper(x)
It is used to convert the string into upper case.
st1<- "shuBHAm"
print(toupper(st1))
Output: [1] "SHUBHAM"
Reading Strings
• We can read strings from the keyboard using the
readline() function.
• It lets the user enter a one-line string at the
terminal.
• value <- readline(prompt = "string")
• Ex. print(n <- readline(prompt = "Enter the subject : "))
• Enter the subject : R
• [1] "R"
List
• Lists are the objects of R which contain
elements of different types such as number,
vectors, string and another list inside it.
• It can also contain a function or a matrix as its
elements.
• A list is a data structure which has components
of mixed data types. We can say, a list is a
generic vector which contains other objects.
• A list in R is a generic object consisting of an
ordered collection of objects.
• Lists are one-dimensional, heterogeneous
data structures.
• The list can be a list of vectors, a list of
matrices, a list of characters and a list of
functions, and so on.
• A list is a vector but with heterogeneous data
elements.
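A minimal sketch of a heterogeneous list, assuming made-up contents. It also shows the difference between [[ ]], which extracts a component itself, and [ ], which returns a shorter list.

```r
# A list holding a numeric vector, a character vector and a single number
# (hypothetical contents)
my_list <- list(c(1, 2, 3), c("a", "b"), 4)

print(length(my_list))        # number of components

# [[ ]] extracts the component itself
first_component <- my_list[[1]]
print(class(first_component))

# [ ] returns a list that holds the selected component
first_as_list <- my_list[1]
print(class(first_as_list))
```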
Lists creation
Output:
[[1]]
[1] 1 2 3 4
[[2]]
[1] "Debi" "Sandeep" "Subham" "Shiba"
[[3]]
[1] 4
Giving a name to list elements
• R provides a very easy way for accessing
elements, i.e., by giving the name to each
element of a list.
• By assigning names to the elements, we can
access the element easily. There are only three
steps to print the list data corresponding to the
name:
1. Creating a list.
2. Assign a name to the list elements with the help
of names() function.
3. Print the list data.
Example
# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham","Nishka","Gunjan"),
matrix(c(40,80,60,70,90,80), nrow = 2),
list("BCA","MCA","B.tech"))
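The three steps listed above (create, name, print) can be sketched as follows. The element names Students, Marks and Course are assumptions chosen for this illustration.

```r
# Step 1: create the list
list_data <- list(c("Shubham", "Nishka", "Gunjan"),
                  matrix(c(40, 80, 60, 70, 90, 80), nrow = 2),
                  list("BCA", "MCA", "B.tech"))

# Step 2: assign a name to each element (hypothetical names)
names(list_data) <- c("Students", "Marks", "Course")

# Step 3: print the data by name, using $ or [[ ]]
print(list_data$Students)
print(list_data[["Marks"]])
```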
length(thislist)
• Check if Item Exists
• To find out if a specified item is present in a
list, use the %in% operator:
• Example
• Check if "apple" is present in the list:
• thislist<- list("apple", "banana", "cherry")
append(thislist, "orange")
• To add an item to the right of a specified
index, add "after=index number" in
the append() function:
• Example
• Add "orange" to the list after "banana" (index
2):
• thislist<- list("apple", "banana", "cherry")
newlist<- thislist[-1]
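The list operations mentioned above (membership test, appending at the end, appending after an index, and removing an item) can be put together in one short sketch. Apart from thislist and newlist, the variable names are made up for the example.

```r
thislist <- list("apple", "banana", "cherry")

# %in% tests whether an item is present in the list
has_apple <- "apple" %in% thislist
print(has_apple)

# append() returns a new list; thislist itself is not changed
at_end <- append(thislist, "orange")
after_banana <- append(thislist, "orange", after = 2)

# A negative index drops the item at that position
newlist <- thislist[-1]

print(length(at_end))
print(after_banana[[3]])
print(newlist[[1]])
```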
Output:
The 3x3 matrix:
  c d e
a 1 2 3
b 4 5 6
c 7 8 9
Creating special matrices
• R allows the creation of various types of matrices through the
arguments passed to the matrix() function.
Matrix where all rows and columns are filled by a single constant 'k':
To create such a matrix, the syntax is given below:
Syntax:
matrix(k, m, n)
Parameters:
k: the constant
m: no of rows
n: no of columns
Example:
# R program to illustrate
# special matrices
# Matrix having 3 rows and 3 columns
# filled by a single constant 5
print(matrix(5, 3, 3))
Output:
• Syntax:
• diag(k, m, n)
Parameters:
k: the constants/array
m: no of rows
n: no of columns
Example:
# R program to illustrate
# special matrices
# Diagonal matrix having 3 rows and 3 columns
# filled by array of elements (5, 3, 3)
print(diag(c(5, 3, 3), 3, 3))
Output:
Syntax:
• diag(k, m, n)
Parameters:
k: 1
m: no of rows
n: no of columns
Example:
# R program to illustrate
# special matrices
# Identity matrix having
# 3 rows and 3 columns
print(diag(1, 3, 3))
Output:
cat("Number of rows:\n")
print(nrow(A))
cat("Number of columns:\n")
print(ncol(A))
cat("Number of elements:\n")
print(length(A))
# OR
print(prod(dim(A)))
Output:
# Accessing 2
print(A[1, 2])
# Accessing 6
print(A[2, 3])
• Logical vector as index - Two logical vectors
can be used to index a matrix. In such a
situation, the rows and columns where the value
is TRUE are returned.
• These indexing vectors are recycled if
necessary and can be mixed with integer
vectors.
• M = matrix(c(1:12), nrow = 4, byrow = TRUE)
• M[c(TRUE, FALSE, TRUE), c(TRUE, TRUE, FALSE)]
# the row index is recycled to TRUE, FALSE, TRUE, TRUE, so rows 1, 3 and 4 are returned
     [,1] [,2]
[1,]    1    2
[2,]    7    8
[3,]   10   11
• Character vector as index – If we assign
names to the rows and columns of a matrix,
then we can access the elements by names.
• This can be mixed with integers or logical
indexing.
• M <- matrix(c(1:12), nrow = 4, byrow = TRUE,
dimnames = list(c("r1","r2","r3","r4"), c("c1","c2","c3")))
• M["r2", "c3"] # element at 2nd row, 3rd column
• M[ , "c1"] # elements of the column named c1
• M[TRUE, c("c1","c2")] # all rows and columns c1 & c2
• M[2:3, c("c1","c3")] # 2nd & 3rd row, columns c1 & c3
Output (the matrix M, followed by the four results):
   c1 c2 c3
r1  1  2  3
r2  4  5  6
r3  7  8  9
r4 10 11 12
[1] 6
r1 r2 r3 r4
 1  4  7 10
   c1 c2
r1  1  2
r2  4  5
r3  7  8
r4 10 11
   c1 c3
r2  4  6
r3  7  9
Modifying elements of a Matrix
To delete a row or a column, access that row or column and
put a negative sign before its index.
The negative index means that the row or column is to be dropped.
Row deletion:
# R program to illustrate
# row deletion in a matrix
# Create a 3x3 matrix
A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9),nrow = 3, ncol = 3, byrow = TRUE)
cat("Before deleting the 2nd row\n")
print(A)
# 2nd-row deletion
A = A[-2, ]
Column deletion:
# R program to illustrate
# column deletion in a matrix
# Create a 3x3 matrix
A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
cat("Before deleting the 2nd column\n")
print(A)
# 2nd-column deletion
A = A[, -2]
cat("After deleting the 2nd column\n")
print(A)
Matrix Arithmetic
• The dimensions ( no of rows and columns)
should be same for the matrices involved in the
operation.
• Matrix1 <- matrix(c(10,20,30,40,50,60), nrow=2)
• Matrix2 <- matrix(c(1,2,3,4,5,6), nrow=2)
• Sum <- Matrix1 + Matrix2
• Difference <- Matrix1 - Matrix2
• Product <- Matrix1 * Matrix2
• Quotient <- Matrix1 / Matrix2
[,1] [,2] [,3]
• [1,] 10 20 30
• [2,] 40 50 60
4. Diagonal Matrix –
• A <- matrix (1:9 , nrow =3)
• diag(A) # prints the diagonal element
• diag(3) # create an identity matrix of order 3
• diag( c(1,2,3) ,3) # create a matrix of order 3
with diagonal elements 1,2,3.
[,1] [,2] [,3]
• [1,] 1 4 7
• [2,] 2 5 8
• [3,] 3 6 9
• [1] 1 5 9
V1= c(1,2,3)#RECYCLE
V2= c(10,20,30,40,50,60)
A<- array(c(V1,V2),dim=c(3,3,2))
print(A)
OUTPUT
,,2
[,1] [,2] [,3]
[1,] 8 10 12
[2,] 9 11 13
NOTE
• Vectors of different lengths can also be fed as
input into the array() function.
• If the vectors together hold fewer elements
than the array needs, the values are recycled
to fill it. The
elements are arranged in the order in which
they are specified in the function.
Naming rows and columns
• [1] 4
• [1] 24
• [1] 3 6 9 12
• [1] 22 23 24
Adding elements to array
2 %in% multiarray
• OUTPUT:
• [1] TRUE
Loop Through an Array
• for(x in multiarray){
• print(x)
• }
OUTPUT
• [1] 1 [1] 2 [1] 3 [1] 4 [1] 5 [1] 6 [1] 7 [1] 8 [1] 9
[1] 10 [1] 11 [1] 12 [1] 13 [1] 14 [1] 15 [1] 16
[1] 17 [1] 18 [1] 19 [1] 20 [1] 21 [1] 22 [1] 23
[1] 24
Amount of Rows and Columns
OUTPUT:
• [1] 4 3 2
Array Length
• OUTPUT:
• [1] 24
calculations across the elements
• We can do calculations across the elements in an
array using the apply() function.
• Syntax- apply(x, margin,func)
• X is an array, margin is the dimension over which the function
is applied (1 for rows, 2 for columns), and func is the function to be applied.
• V1 <- c(1,2,3)
• V2 <- c(10,20,30,40,50,60)
• A <- array(c(V1,V2), dim=c(3,3,2))
• B <- apply(A, c(1), sum)
• C <- apply(A, c(2), sum)
EXAMPLE
• #Creating two vectors of different lengths
• vec1 <-c(1,3,5)
• vec2 <-c(10,11,12,13,14,15)
•
• #Taking the vectors as input to the array1
• res1 <- array(c(vec1,vec2),dim=c(3,3,2))
• print(res1)
•
• #using apply function
• result <- apply(res1,c(1),sum)
• print(result)
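To make the role of the margin argument concrete, the sketch below rebuilds the same array and sums over each of its three dimensions. The names row_sums, col_sums and slice_sums are made up for the illustration.

```r
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)

# 9 values are supplied for 18 cells, so they are recycled:
# both 3x3 slices of the array hold the same numbers
res1 <- array(c(vec1, vec2), dim = c(3, 3, 2))

# Margin 1: one sum per row, taken across all columns and both slices
row_sums <- apply(res1, 1, sum)

# Margin 2: one sum per column, taken across all rows and both slices
col_sums <- apply(res1, 2, sum)

# Margin 3: one sum per slice
slice_sums <- apply(res1, 3, sum)

print(row_sums)
print(col_sums)
print(slice_sums)
```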
Array Arithmetic
• To perform arithmetic operations on arrays, we
extract two-dimensional matrices from the
multi-dimensional array and operate on those matrices.
• V1 <- c(1,2,3)
• V2 <- c(10,20,30,40,50,60)
• A<- array(c(V1,V2), dim=c(3,3,2))
• mat.a <- A[ , , 1]
• mat.b <- A[ , ,2]
• mat.a + mat.b
• mat.a - mat.b
• mat.a * mat.b
• mat.a / mat.b
MATHS
Math Functions
• R provides the various mathematical functions
to perform the mathematical calculation.
• These mathematical functions are very helpful
to find absolute value, square value and much
more calculations.
• In R, there are the following functions which
are used:
Example
1.abs(x)It returns the absolute value of input x.
x<- -4
print(abs(x))
Output
[1] 4
2.sqrt(x)It returns the square root of input x.
x<- 4
print(sqrt(x))
Output
[1] 2
3.ceiling(x)It returns the smallest integer which is
larger than or equal to x.
x<- 4.5
print(ceiling(x))
Output
[1] 5
4.floor(x)It returns the largest integer, which is
smaller than or equal to x.
x<- 2.5
print(floor(x))
Output
[1] 2
5.trunc(x)It returns the truncate value of input x.
x<- c(1.2,2.5,8.1)
print(trunc(x))
Output
[1] 1 2 8
6.round(x, digits=n)It returns x rounded to n decimal places.
x<- 3.14159
print(round(x, digits=2))
Output
[1] 3.14
7.cos(x), sin(x), tan(x)It returns the cosine, sine and tangent of input x (x in radians).
x<- 4
print(cos(x))
print(sin(x))
print(tan(x))
Output
[1] -0.6536436
[1] -0.7568025
[1] 1.157821
8.log(x)It returns natural logarithm of input x.
x<- 4
print(log(x))
Output[1] 1.386294
9.log10(x)It returns common logarithm of input x.
x<- 4
print(log10(x))
Output[1] 0.60206
10.exp(x)It returns e raised to the power x.
x<- 4
print(exp(x))
Output[1] 54.59815
Factors
• A factor is a data structure used for fields that take
only a predefined, finite number of values
(categorical data).
• Factors are used to categorize the data and store it
as levels.
• They can store both strings and integers.
• For example, a data field such as marital status may
contain only values from single, married,
separated, divorced and widowed. In such a case,
the possible values are predefined and distinct,
and are called levels.
Creating factors
• factors are created with the help
of factor() functions, by taking a vector as input.
• Factor contains a predefined set value called
levels. By default, R always sorts levels in
alphabetical order.
• directions <- c("North", "North", "West", "South")
• factor(directions)
• Output: Levels: North South West
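A short sketch of what factor() stores for the directions vector above: the sorted levels, and an integer code per element that points into those levels. The name f is made up for the example.

```r
directions <- c("North", "North", "West", "South")
f <- factor(directions)

# Levels are the distinct values, sorted alphabetically by default
print(levels(f))

# Internally each element is stored as the position of its level
print(as.integer(f))

# table() counts how often each level occurs
print(table(f))
```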
Accessing Factor
• There are various ways to access the elements
of a factor in R. Some of the ways are as
follows:
• data <- factor(c("East", "West", "East", "North"))
• data[4]
• data[c(2,3)]
• data[-1]
• data[c(TRUE, FALSE, TRUE, TRUE)]
Modifying Factor
• When modifying a factor, we can assign only
values that belong to the predefined
levels.
• print(data)
• data[2] <- "North"
• data[3] <- "South" # gives NA with a warning if "South" is not a level
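The restriction described above can be seen directly. The sketch assumes data is a factor built from the four directions used earlier; it shows that assigning a value outside the levels produces NA, and that the value is accepted once the level has been added.

```r
data <- factor(c("East", "West", "East", "North"))

# "North" is an existing level, so this assignment works
data[2] <- "North"

# "South" is not a level: R stores NA and gives a warning
# (suppressWarnings() only hides the warning for this illustration)
suppressWarnings(data[3] <- "South")
is_missing <- is.na(data[3])
print(is_missing)

# Add the new level first, then the assignment is accepted
levels(data) <- c(levels(data), "South")
data[3] <- "South"
print(data)
```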
Data Frames
# using str()
print(str(friend.data))
OUTPUT
• X <- data.frame("roll"=1:2,"name"=c("jack","jill"),"age"=c(20,22))
• print(X)
• names(X)
• nrow(X)
• ncol(X)
• str(X)
• summary(X)
Summary of Data in Data Frame
The statistical summary and nature of the data
can be obtained by
applying summary() function.
# Print the summary.
print(summary(emp.data))
# using summary()
print(summary(friend.data))
OUTPUT
Extract Data from Data Frame
Extract specific column from a data frame using
column name.
# Extract specific columns.
result <- data.frame(emp.data$emp_name, emp.data$salary)
print(result)
• X["name"]
• X[1:2,]
• X[, 2:3]
• X[c(1,2),c(2,3)]
• X[,-1]
• X[-1,]
• X[X$age>21,]
• head(X,2)
Expand Data Frame
A data frame can be expanded by adding
columns and rows.
Add Column
• Just add the column vector using a new
column name.
Example 1:
Example 2:
• Data_Frame1 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
Data_Frame2 <- data.frame (
Training = c("Stamina", "Stamina", "Strength"),
Pulse = c(140, 150, 160),
Duration = c(30, 30, 20)
)
New_Data_Frame<- rbind(Data_Frame1, Data_Frame2)
New_Data_Frame
OUTPUT
• And use the cbind() function to combine two or more data frames
in R horizontally:
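A small sketch of horizontal combination with cbind(). The data frames Data_Frame3 and Data_Frame4 and their Steps and Calories columns are assumptions made up for this example.

```r
Data_Frame3 <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120)
)
Data_Frame4 <- data.frame(
  Steps = c(3000, 6000, 2000),
  Calories = c(300, 400, 300)
)

# cbind() places the columns side by side;
# both inputs need the same number of rows.
Combined <- cbind(Data_Frame3, Data_Frame4)
Combined
```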
Example 3:
#sort by name
• newdata <- X[order(X$name),]
• print(df1)
• print(df2)
• # inner join
• merge(df1,df2, by= "CustomerId")
• # outer join
• merge(x=df1,y=df2, by= "CustomerId",all=TRUE)
• #cross join
• merge(x=df1,y=df2, by= NULL)
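The joins above can be tried on two small data frames. This is a sketch only: the contents of df1 and df2 below are invented, since the notes do not show them.

```r
# Hypothetical customer tables sharing the key column CustomerId.
df1 <- data.frame(CustomerId = 1:4,
                  Product = c("TV", "Radio", "Phone", "Laptop"))
df2 <- data.frame(CustomerId = c(2L, 4L, 6L),
                  State = c("Odisha", "Kerala", "Goa"))

inner <- merge(df1, df2, by = "CustomerId")                # keys in both
outer <- merge(df1, df2, by = "CustomerId", all = TRUE)    # keys in either
left  <- merge(df1, df2, by = "CustomerId", all.x = TRUE)  # all rows of df1
cross <- merge(df1, df2, by = NULL)                        # every combination

nrow(inner); nrow(outer); nrow(left); nrow(cross)
```

Rows without a partner in the other table get NA in the missing columns for the outer and left joins.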
Reshaping Data
• R provides a variety of methods for reshaping
data prior to analysis.
• Two important functions for reshaping data are
the melt() and cast() functions.
• These functions are available in reshape package.
• Before using these functions, make sure that the
package is properly installed in your system.
• We can “melt” the data so that each row is a
unique id-variable combination. Then we can
“cast” the melted data into any shape we would
like.
• y <- data.frame("id"=c(1,2,1,2,1), "age"=c(20,20,21,21,19),
"marks1"=c(80,60,70,80,90),"marks2"=c(100,98,99,75,80))
• print(y)
• library(reshape) # provides melt() and cast()
• #melting data
• mdata <- melt(y, id=c("id","age"))
• newdata <-
subset(X,age>=25&age<30,select=c(roll,name,age))
• print(newdata)
• newdata <-
subset(X,name=="smith"|name=="john",select=roll:age)
• print(newdata)
Unit 3
Conditions and loops
• Decision making structures are used by the
programmer to specify one or more
conditions to be evaluated or tested by the
program.
• A statement or statements need to be
executed if the condition is TRUE and
optionally other statements to be executed if
the condition is FALSE.
• Control statements are expressions used to
control the execution and flow of the program
based on the conditions provided in the
statements.
• These structures are used to make a decision
after assessing the variable.
• In R programming, there are 8 types of control
statements as follows:
• if condition
• if-else condition
• for loop
• nested loops
• while loop
• repeat and break statement
• return statement
• next statement
Decision Making
• R provides the following types of decision
making statements which includes if
statement, if..else statement, nested if…else
statement, ifelse() function and switch
statement.
• if condition
• This control structure checks whether the expression
provided in parentheses is true. If true,
the execution of the statements in braces {}
continues.
• Syntax:
if(expression)
{ statements .... ....
}
x <- 100
if(x > 10){
print(paste(x, "is greater than 10"))
}
Output:
[1] "100 is greater than 10"
if Statement
• An if statement consists of a boolean
expression followed by one or more
statements. The syntax is-
• if (boolean_expression)
{
# statement will execute if the boolean
# expression is true.
}
• If the boolean_expression evaluates to TRUE,
then the block of code inside the if statement
will be executed.
• If boolean_expression evaluates to FALSE,
then the first set of code after the end of if
statement will be executed.
• Here boolean_expression should be a single logical or
numeric value. Since R 4.2.0, a condition of length
greater than one is an error; older versions used only
the first element.
• In the case of numeric vector, zero is taken as
FALSE, rest as TRUE.
• x <- 10
if (x > 0)
{
cat(x, "is a positive number\n")
}
• x <- 5
# Check value is less than or greater than 10
if(x > 10){
print(paste(x, "is greater than 10"))
}else{
print(paste(x, "is less than 10"))
}
Output:
[1] "5 is less than 10"
• x <- -5
if (x > 0) {
cat(x, "is a positive number\n")
} else {
cat(x, "is a negative number\n")
}
• We can write the if…else statement in a single
line if the “if and else” block contains only one
statement as follows.
• if (x > 0) cat(x, "is a positive no\n") else cat(x, "is a negative no\n")
Nested if…else Statement
• An if statement can be followed by an optional
else if..else statement, which is very useful to
test various conditions using single if…else if
statement.
• We can nest as many if..else statement as we
want.
• Only one statement will get executed
depending upon the boolean_expression.
• if (boolean_expression 1) {
# execute when expression 1 is true.
} else if (boolean_expression 2) {
# execute when expression 2 is true.
} else if (boolean_expression 3) {
# execute when expression 3 is true.
} else {
# execute when none of the above conditions is
# true.
}
• x <- 19
if (x < 0)
{
cat(x, "is a negative number")
} else if (x > 0)
{
cat(x, "is a positive number")
} else {
print("zero")
}
ifelse() function
• Most functions in R take a vector as input and
return a vector as output.
• Such vectorized code is much faster
than applying the same function to each element
of the vector individually.
• There is an easier way to use the if..else statement
specifically for vectors in R.
• We can use the ifelse() function instead, which is the
vector equivalent of the if..else statement.
• ifelse(boolean_expression, x, y)
• Here, boolean_expression must be a logical
vector.
• The return value is a vector with the same length
as boolean_expression.
• This returned vector has element from x if the
corresponding value of boolean_expression is
TRUE or from y if the corresponding value of
boolean_expression is FALSE.
• For example, the ith element of result will be x[i],
if boolean_expression[i] is TRUE else it will take
the value of y[i].
• The vectors x and y are recycled whenever
necessary.
• a = c(5,7,2,9)
ifelse(a %% 2 == 0, "even", "odd")
• o/p = ?
• In the above example, the boolean_expression
is a %% 2 ==0 which will result into the
vector(FALSE, FALSE,TRUE,FALSE).
• Similarly, the other two vectors in the function
arguments get recycled to ("even", "even",
"even", "even") and ("odd", "odd", "odd",
"odd") respectively.
• Hence the result is evaluated accordingly.
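To make the recycling concrete, here is a short sketch comparing ifelse() with an explicit loop. The marks vector and the pass mark of 40 are assumptions chosen only for illustration.

```r
marks <- c(35, 72, 48, 90)

# Vectorized form: "pass" and "fail" are recycled to the length of marks.
result <- ifelse(marks >= 40, "pass", "fail")
print(result)

# Equivalent element-by-element loop, shown only for comparison.
loop_result <- character(length(marks))
for (i in seq_along(marks)) {
  if (marks[i] >= 40) {
    loop_result[i] <- "pass"
  } else {
    loop_result[i] <- "fail"
  }
}
identical(result, loop_result)
```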
switch Statement
• A switch statement allows a variable to be tested
for equality against a list of values.
• Each value is called a case, and the variable being
switched on is checked for each case.
• switch( expression, case1, case2, case3….)
• If the value of expression is not a character string,
it is coerced to integer.
• We can have any no of case statements within a
switch.
• Each case is written as an argument to switch(),
optionally named in the form name = value.
• If the value of the integer is between 1 and
nargs()-1 (the maximum number of arguments), then the
corresponding element of case condition is
evaluated and the result is returned.
• If expression evaluates to a character string
then the string is matched(exactly) to the
names of the elements.
• If there is more than one match, the first
matching element is returned.
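• For a numeric expression no default is available: an
out-of-range value returns NULL invisibly. For a character
expression, a final unnamed argument acts as the default.
• switch(2, "red", "green", "blue")
• switch("color", "color" = "red", "shape" = "square",
"length" = 5)
• Output- [1] "green"
[1] "red"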
• If the value evaluated is a number, that item of the list
is returned.
• In the above example, "red", "green", "blue" form a
three item list. The switch() function returns the
corresponding item to the numeric value evaluated.
• In the above example, green is returned.
• The result of the statement can be a string as well.
• In this case, the matching named item’s value is
returned.
• In the above example, “color” is the string that is
matched and its value “red” is returned.
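A brief sketch of both forms of switch(). The helper describe_shape() and its shape names are hypothetical and serve only to show string matching, fall-through and a default value.

```r
# Character switch: names are matched exactly; the final unnamed
# argument is used when nothing matches.
describe_shape <- function(shape) {
  switch(shape,
         "circle"    = "no corners",
         "triangle"  = "three corners",
         "square"    = ,               # empty: falls through to the next case
         "rectangle" = "four corners",
         "unknown shape")
}
describe_shape("square")
describe_shape("star")

# Numeric switch: the value selects an argument by position.
pick <- switch(2, "red", "green", "blue")
out_of_range <- switch(5, "red", "green", "blue")  # no match gives NULL
```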
Question
• Write a program in R to check if a given year
is a leap year or not.
• Write a program in R to find the largest of
three numbers using if-else.
• Write a program in R to check if a given
character is a vowel or consonant.
• Write a program in R to check if a given
number is a prime number.
Loops
• In General, statements are executed
sequentially.
• Loops are used in programming to repeat a
specific block of code.
• R provides various looping structures like for
loop, while loop and repeat loop.
for loop
• A for loop is a repetition control structure that allows us
to efficiently write a loop that needs to execute a
specific number of times.
• A for loop is used to iterate over a vector in R
programming.
for ( value in sequence)
{
statements
}
• Here sequence is a vector and value takes on each of
its value during the loop.
• In each iteration, statements are evaluated.
• for loop
• It is a type of loop or sequence of statements
executed repeatedly until exit condition is
reached.
Syntax:
for(value in vector)
{ statements .... ....
}
x <- letters[4:10]
for(i in x){
print(i)
}
Output:
[1] "d"
[1] "e"
[1] "f"
[1] "g"
[1] "h"
[1] "i"
[1] "j"
• X <- c(2,5,3,9,8,11,6)
count <- 0
for(val in X)
{
if (val %% 2 == 0)
count = count+1
}
cat("no of even numbers in", X, "is", count, "\n")
• o/p = ?
• The for loop in R is flexible that they are not
limited to integers in the input.
• We can pass character vector, logical vector,
lists or expressions.
• Ex-
• V <- c("a", "e", "i", "o", "u")
for ( vowel in V)
{
print(vowel)
}
• o/p- ?
• Nested loops
• Nested loops are similar to simple loops.
Nested means loops inside loop. Moreover,
nested loops are used to manipulate the
matrix.
# Defining matrix
m <- matrix(2:15, 2)
for (r in seq(nrow(m))) {
for (c in seq(ncol(m))) {
print(m[r, c])
}
}
• Output:
• [1] 2 [1] 4 [1] 6 [1] 8 [1] 10 [1] 12 [1] 14 [1] 3
[1] 5 [1] 7 [1] 9 [1] 11 [1] 13 [1] 15
• while loop
• A while loop is another kind of loop, repeated
as long as a condition holds. The test
expression is checked before each execution
of the loop body.
• Syntax:
while(expression)
{ statement .... ....
}
x=1
# Print 1 to 5
while(x <= 5){
print(x)
x=x+1
}
• Output:
• [1] 1 [1] 2 [1] 3 [1] 4 [1] 5
while loop
• A while loop is used to repeat a block of code until a specific condition is
met.
• Syntax-
while ( test_expression)
{ statement
}
• Here, test expression is evaluated and the body of the
loop is entered if the result is TRUE.
• The statements inside the loop are executed and the
flow returns to evaluate the test_expression again.
• This is repeated each time until test_expression
evaluated to FALSE, in which case, the loop exits.
num=5
sum=0
while(num>0)
{ sum= sum + num
num= num - 1
}
cat("the sum is", sum, "\n")
• repeat loop and break statement
• repeat is a loop which can be iterated many
number of times but there is no exit condition to
come out from the loop. So, break statement is
used to exit from the loop. break statement can
be used in any type of loop to exit from the loop.
Syntax:
repeat {
statements .... ....
if(expression) {
break
}
}
x=1
# Print 1 to 5
repeat{
print(x)
x=x+1
if(x > 5){
break
}
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
repeat loop
• A repeat loop is used to iterate over a block of
code multiple number of times.
• There is no condition check in repeat loop to
exit the loop. We must ourselves put a
condition explicitly inside the body of the loop
and use the break statement to exit the loop.
• Otherwise it will result in an infinite loop.
repeat {
statements
if (condition)
{
break
}
}
Example
count <- 1
repeat {
print(count)
count <- count + 1
if (count > 5) {
break # Exit the loop when count is greater than 5
}
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Loop Control Statements
• Loop control statements are also known as
jump statements.
• Loop control statements change execution
from its normal sequence.
• Unlike in some other languages, a loop in R does
not create a new scope: variables assigned inside
the loop remain available after it ends.
• The loop control statements in R are break
statement and next statement.
break statement
• A break statement is used inside a loop
(repeat, for, while) to stop the iterations and
flow the control outside of the loop.
• In a nested looping situation, where there is a
loop inside another loop, this statement exits
from the innermost loop that is being
evaluated.
• x<- 1:10
for( val in x) {
if (val == 3) {
break
}
print(val) }
• o/p = ?
• In the above example, we iterate over the vector
x, which has consecutive numbers from 1 to 10.
• Inside the for loop we have used an if condition
to break if the current value is equal to 3.
next statement
• A next statement is useful when we want to
skip the current iteration of a loop without
terminating it.
• On encountering next, R skips
further evaluation and starts the next iteration of the
loop.
• This is equivalent to the continue statement in
C, Java and Python.
• X <- 1:10
for( val in X) {
if ( val == 3) {
next
}
print( val)
}
• We use the next statement inside a condition to
check if the value is equal to 3.
• If the value is equal to 3, the current evaluation
stops( value is not printed) but the loop continues
with the next iteration.
• return statement
• return statement is used to return the result
of an executed function and returns control to
the calling function.
• Syntax:
• return(expression)
• Example:
func(1)
func(0)
func(-1)
• Output:
• [1] "Positive"
• [1] "Zero"
• [1] "Negative"
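The definition of func() is not shown in these notes. A minimal sketch that would produce the output above, assuming the function simply classifies the sign of its argument, is:

```r
# Hypothetical definition: classify a number by its sign.
func <- function(x) {
  if (x > 0) {
    return("Positive")
  } else if (x < 0) {
    return("Negative")
  } else {
    return("Zero")
  }
}

func(1)
func(0)
func(-1)
```

Each return() ends the function immediately and hands the value back to the caller.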
Question
• Write a program in R to print the numbers
from 1 to 10 using a for loop.
• Write a program in R to find the sum of all
numbers from 1 to 100 using a for loop.
• Write a program in R to print the odd
numbers between 1 and 20 using a for loop.
• Write a program in R to calculate the factorial
of a given number using a for loop.
R Functions
• Function Name
The function name is the actual name of the function. In R, the function is stored as an
object with its name.
• Arguments
In R, an argument is a placeholder. In function, arguments are optional means a
function may or may not contain arguments, and these arguments can have
default values also. We pass a value to the argument when a function is invoked.
• Function Body
The function body contains a set of statements which defines what the function does.
• Return value
• It is the last expression in the function body which is to be evaluated.
Function Types
new.function()
Built-in function
• [1] 1 2
• [1] 1 2 3
• [1] 1
• [1] 1 2 3
2. char.expand()- This function seeks a unique
match of its first argument among the elements
of its second.
• If successful, it returns this element; otherwise, it
performs the action specified by the third
argument. The syntax is as follows-
char.expand(input, target, nomatch = stop("no match"))
• Here input is the character string to be
expanded, target is the character vector with the
values to be matched against, and nomatch is an R
expression to be evaluated in case expansion was
not possible, for example a call to warning() that prints a
warning message when there is no match.
• The match is searched for only at the beginning of each string.
• x <- c("sand", "and", "land")
• char.expand("an", x, warning("no expand"))
• char.expand("a", x, warning("no expand"))
3. charmatch()- This function finds matches
between two arguments and returns the index
position.
• charmatch( x, table, nomatch= NA_integer_)
• Where x gives the value to be matched, table
gives the value to be matched against and
nomatch gives the value to be returned at non
matching positions.
• charmatch("an", c("and", "sand"))
• charmatch("an", "sand")
• [1] 1
• [1] NA
4. charToRaw()- This function converts character
strings to "raw" objects (ASCII byte codes).
• x <- charToRaw("a")
• Y <- charToRaw("AB")
• [1] 61
• [1] 41 42
5. chartr() – this function is used for character
substitutions.
• chartr(old, new, x)
• x <- "apples are red"
• chartr("a", "g", x)
6. dQuote()- this function is used for putting double
quotes on a text.
• x <- '2013-06-12'
• dQuote(x)
7. format()- numbers and strings can be formatted
to a specific style using format() function.
• Ex- format(x, digits, nsmall, scientific, width,
justify = c("left", "right", "centre", "none"))
8. gsub()- this function replaces all matches of a
string, if the parameter is a string vector, returns
a string vector of the same length and with the
same attributes.
• gsub(pattern, replacement, x, ignore.case=FALSE)
Ex- x <- "apples are red"
gsub("are", "were", x)
o/p- "apples were red"
9. nchar() & nzchar()- nchar() determines
the size of each element of a character
vector. nzchar() tests whether elements of a
character vector are non-empty strings.
Syn- nchar(x, type = "chars", allowNA = FALSE)
Syn- nzchar(x)
10. noquote()- This function prints out strings
without quotes. The syntax is noquote(x)
where x is a character vector.
Ex- letters
noquote(letters)
11. paste()- Strings in R are combined using the
paste() function. It can take any number of string
arguments to be combined together.
Syn- paste(..., sep = " ", collapse = NULL)
• Where ... represents any number of arguments
to be combined and sep represents the separator
placed between the arguments. It is optional.
• collapse is used to join the elements of the resulting
vector into a single string, separated by the given
collapse string.
• Ex- a <- "hello"
• b <- "everyone"
• print(paste(a, b))
• print(paste(a, b, sep = "-"))
• print(paste(a, b, sep = "", collapse = ""))
12. replace()- This function replaces the values in X
with indices given in list by those given in values.
If necessary, the values in ‘values’ are recycled.
syn- replace( x, list, values)
Ex- x <- c("green", "red", "yellow")
y <- replace(x, 1, "black")
13. sQuote()- This function is used for putting single
quotes on a text.
X <- "2013-06-12 19:18:05"
sQuote(X)
14. strsplit()- This function splits the elements of a
character vector x into substrings according to
the matches to substring split within them.
Syn- strsplit( x, split)
15. substr()- This function extracts or replace
substrings in a character vector.
Syn- substr( x, start, stop)
substr( x, start, stop) <- value
Ex- substr("programming", 2, 3)
x = c("red", "blue", "green", "yellow")
substr(x, 2, 3) <- "gh"
16. tolower()- This function converts a string to
lower case.
Syn- tolower("R Programming")
17. toString()- This function produces a single
character string describing an R object.
Syn- toString(x)
toString(x, width = NULL)
18. toupper()- This function converts a string to
upper case.
Syn- toupper("r programming")
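Several of the string functions listed above can be tried together. The following is a small sketch using only base R; the variable names are chosen for illustration.

```r
x <- "apples are red"

replaced   <- gsub("are", "were", x)          # replace all matches of a pattern
swapped    <- chartr("a", "g", x)             # character-by-character substitution
piece      <- substr("programming", 2, 3)     # characters 2 to 3
n_chars    <- nchar(x)                        # number of characters
words      <- strsplit(x, " ")[[1]]           # split on spaces
upper      <- toupper(x)
joined     <- paste("hello", "everyone", sep = "-")
collapsed  <- paste(c("a", "b", "c"), collapse = "+")

print(replaced)
print(words)
print(collapsed)
```

Note the difference between sep, which separates the arguments, and collapse, which joins the elements of the result into one string.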
Statistical Function
1. mean()- The function mean() is used to
calculate average or mean in R.
Syn- mean(x, trim= 0, na.rm = FALSE)
Trim is used to drop some observation from
both end of the sorted vector and na.rm is
used to remove the missing values from the
input vector.
2. median()- the middle most value in a data
series is called the median. The median() fun
is used in R to calculate this value.
Syn- median(x, na.rm= FALSE)
3. var()- returns the estimated variance of the
population from which the no in vector x are
sampled.
Syn- x<- c(10,2,30,2,5,8)
var(x, na.rm= TRUE)
4. sd()- returns the estimated standard deviation of
the population from which the no in vector x are
sampled.
Syn- sd(x, na.rm= TRUE)
5. scale()- returns the standard scores (z-scores) for
the numbers in vector x. Also used to standardize the columns of a
matrix.
Syn- x<- matrix(1:9, 3,3)
scale(x)
6. sum()- adds up all elements of a vector.
Syn- sum(X)
sum(c(1:10))
7. diff(x,lag=1)- returns suitably lagged and iterated
differences.
Syn- diff(x, lag, differences)
Where X is a numeric vector or matrix containing the
values to be differenced, lag is an integer indicating
which lag to use and difference is an integer indicating
the order of the difference.
• For ex., if lag=2, the difference between third and first
value, between the fourth and the second value are
calculated.
• The attribute differences returns the differences of
differences.
8. range()- returns a vector of the minimum and
maximum values.
Syn- x<- c(10,2,14,67,86,54)
range(x)
o/p- 2 86
9. rank()- This function returns the rank of the
numbers( in increasing order) in vector x.
Syn- rank(x, na.last = TRUE)
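10. skewness()- measures how asymmetric a distribution is
compared with the normal distribution. It is not part of base R;
packages such as e1071 or moments provide it.
Syn- skewness(x)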
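A short sketch exercising the base R statistical functions described above. The sample vectors are arbitrary values picked for illustration.

```r
x <- c(10, 2, 30, 2, 5, 8, NA)

avg     <- mean(x, na.rm = TRUE)      # NA is dropped before averaging
mid     <- median(x, na.rm = TRUE)
spread  <- range(x, na.rm = TRUE)     # minimum and maximum

# trim = 0.2 drops 20% of the sorted values from each end.
trimmed <- mean(c(1, 2, 3, 4, 100), trim = 0.2)

y <- c(10, 2, 14, 67, 86, 54)
d1 <- diff(y)            # differences between neighbours
d2 <- diff(y, lag = 2)   # third minus first, fourth minus second, ...
r  <- rank(y)            # position of each value in increasing order
z  <- scale(y)           # z-scores: mean 0, standard deviation 1

# sd() is the square root of var().
all.equal(sd(y), sqrt(var(y)))
```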
Date and Time Functions
• R provides several options for dealing with date and
date/time.
• Three date/time classes commonly used in R are Date,
POSIXct and POSIXlt.
1. Date – date() returns the current date and time as a
character string. Sys.Date() and Sys.time() return the
system's date and time.
Syn- date()
Sys.Date()
Sys.time()
• We can create a date as follows-
• Dt <- as.Date("2012-07-22")
• While creating a date, a non-standard format must be
specified.
• Dt2 <- as.Date("04/20/2011", format = "%m/%d/%Y")
• Dt3 <- as.Date("October 6, 2010", format = "%B %d, %Y")
2. POSIXct- If we have times in our data, this is
usually the best class to use. In POSIXct, "ct"
stands for calendar time.
• We can create some POSIXct objects as follows.
Tm1 <- as.POSIXct("2013-07-24 23:55:26")
o/p – "2013-07-24 23:55:26 PDT"
Tm2 <- as.POSIXct("25072012 08:32:07", format =
"%d%m%Y %H:%M:%S")
• We can specify the time zone as follows.
Tm3 <- as.POSIXct("2010-12-01 11:42:03",
tz = "GMT")
• Times can be compared as follows.
• Tm2 > Tm1
• We can add or subtract seconds as follows.
• Tm1 + 30
• Tm1 - 30
• Tm2 - Tm1
3. POSIXlt- This class enables easy extraction of
specific components of a time. In POSIXlt, "lt"
stands for local time.
• "lt" also helps one remember that POSIXlt objects
are lists.
• Tm1.lt <- as.POSIXlt("2013-07-24 23:55:26")
• o/p- "2013-07-24 23:55:26"
• We can extract the components in time as follows.
• unlist(Tm1.lt)
sec min hour mday mon year wday yday isdst
 26  55   23   24   6  113    3  204     1
• mday, wday, yday stands for day of the month, day of
the week and day of year resp.
• A particular component of a time can be extracted as
follows.
• Tm1.lt$sec
• we can truncate or round off the times as given below.
• trunc(Tm1.lt, "days") o/p - "2013-07-24"
• trunc(Tm1.lt, "mins") o/p - "2013-07-24 23:55:00"
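The three classes can be compared side by side in a small sketch. The time zone is fixed to "GMT" here, an assumption made so that the results do not depend on the machine's local settings.

```r
# Date: calendar dates without a time of day.
dt1 <- as.Date("2012-07-22")
dt2 <- as.Date("04/20/2011", format = "%m/%d/%Y")  # non-standard input format
format(dt2, "%d-%m-%Y")                            # print in another layout
dt1 - dt2                                          # difference in days

# POSIXct: seconds since 1970-01-01, convenient for arithmetic.
tm1 <- as.POSIXct("2013-07-24 23:55:26", tz = "GMT")
tm2 <- as.POSIXct("25072012 08:32:07",
                  format = "%d%m%Y %H:%M:%S", tz = "GMT")
tm1 > tm2
tm1 + 30                                           # thirty seconds later

# POSIXlt: a list of components that can be extracted by name.
lt <- as.POSIXlt(tm1)
lt$hour
lt$mon + 1        # months are counted from 0
lt$year + 1900    # years are counted from 1900
```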
Other Functions
1. rep( x, ntimes) – This function repeats x n
times.
Ex.- rep( 1:3,4)
• Class Attribute:
– Classes are determined by the class() attribute of an object.
– You can assign a class to an object using class(object) <- "class_name".
– Checking the class of an object is done with class(object).
• Output:
• Name: John
• Age: 30
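The code behind this output is not shown in the notes. One way an S3 object could produce it, offered as a sketch with a hypothetical class name person and print method, is:

```r
# Create a plain list and attach a class attribute to it.
person <- list(name = "John", age = 30)
class(person) <- "person"

# S3 method: print() dispatches to print.person for objects of class "person".
print.person <- function(x, ...) {
  cat("Name:", x$name, "\n")
  cat("Age:", x$age, "\n")
  invisible(x)
}

print(person)
class(person)
```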
Inheritance in S3 Class
• Validity Checking
• S4 classes also allow you to define validity checks
to ensure that objects are created with valid data
• validity checks can be added with setValidity.
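A minimal sketch of a validity check. The class name Student and the rule that Roll_No must be a single positive number are assumptions made for the example; the slot names follow the output shown elsewhere in these notes.

```r
library(methods)

# Formal class definition with two typed slots.
setClass("Student", slots = c(name = "character", Roll_No = "numeric"))

# The validity function returns TRUE or a message describing the problem.
setValidity("Student", function(object) {
  if (length(object@Roll_No) != 1 || object@Roll_No <= 0) {
    "Roll_No must be a single positive number"
  } else {
    TRUE
  }
})

a <- new("Student", name = "Adam", Roll_No = 20)   # passes the check

# An invalid object is rejected when it is created.
bad <- tryCatch(new("Student", name = "Eve", Roll_No = -1),
                error = function(e) conditionMessage(e))
print(bad)
```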
characteristics
• Formal Structure: S4 classes have a formal definition that includes
slots (similar to fields in other programming languages) that hold
the data and methods (functions) that operate on that data.
# Calling object
a
• Output:
Slot "name":
[1] "Adam"
Slot "Roll_No":
[1] 20
• Example:
# Calling object
stud
Reference Class
• In R, the Reference Class system (also known
as Reference Classes or RC) is another form of
object-oriented programming (OOP) that
provides more flexibility and control
compared to S3 and S4 classes. Reference
Classes were introduced to R in version 2.12.0.
• Reference Classes in R provide a more flexible
and mutable approach to object-oriented
programming.
• Reference Classes are useful when you need
more control over objects' mutability, private
fields, and inheritance at the object level.
However, they can be more complex and have
a bit more overhead compared to S3 and S4
classes.
characteristics
• Mutable Objects: Objects created from Reference Classes are
mutable, meaning you can modify their fields directly.
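Mutability can be seen in a short sketch. The Account class, its fields and its deposit() method are hypothetical names used only for illustration.

```r
library(methods)

Account <- setRefClass("Account",
  fields = list(owner = "character", balance = "numeric"),
  methods = list(
    deposit = function(amount) {
      balance <<- balance + amount   # <<- updates the field in place
      invisible(.self)
    }
  )
)

acc <- Account$new(owner = "Adam", balance = 100)
acc$deposit(50)
after_deposit <- acc$balance

# Assignment copies the reference, not the object:
# changing 'alias' also changes 'acc'.
alias <- acc
alias$balance <- 0
acc$balance
```

Use acc$copy() when an independent copy is needed.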
• S3:
– Lightweight, simple, and informal.
– Flexible for quick prototyping.
– Limited support for formal inheritance and encapsulation.
• S4:
– Formal and strict class definition.
– Strong typing, encapsulation, and inheritance.
– More complex and best suited for larger projects and packages.
• R6:
– Encapsulated object-oriented programming.
– Strong encapsulation, inheritance, and control over privacy.
– Useful for creating reusable and well-structured code.
Debugging
• A syntactically correct program may give us incorrect
results due to logical errors. If such errors (i.e.
bugs) occur, we need to find out why and where they
occur so that we can fix them. The procedure to
identify and fix bugs is called “debugging”.
• There are a number of R debug functions, such as:
• traceback()
• debug()
• browser()
• trace()
• recover()
• Debugging is the process of removing bugs from
program code so that it runs successfully.
• While writing code, some mistakes or
problems appear only when the code is
run and are hard to diagnose, especially when
they occur several levels deep in nested function
calls. Fixing them can take a lot of time.
• Debugging in R is done through warnings,
messages, and errors. Debugging in R means
debugging functions. Various debugging
functions are available.
Fundamental principles of Debugging
Output:
Error in "100" + "200" : non-numeric argument to binary
operator
Example:
# Class of success
class(success)
• Output
• [1] "try-error"
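The code that created success is not shown above. A sketch that reproduces both the error message and the "try-error" class, assuming success holds the result of try(), is:

```r
# try() evaluates an expression and lets the script continue on failure.
success <- try("100" + "200", silent = TRUE)

class(success)   # "try-error" when the expression failed

if (inherits(success, "try-error")) {
  # The original condition is kept in the "condition" attribute.
  message("Addition failed: ",
          conditionMessage(attr(success, "condition")))
}

# When nothing goes wrong, try() simply returns the value.
ok <- try(100 + 200, silent = TRUE)
ok
```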
• tryCatch() specifies handler functions that control what happens when a condition
is signaled. One can take different actions for warnings, messages, and interrupts.
• The tryCatch() function is used for error handling in R. It allows you to "try"
executing a block of code and "catch" any errors that occur, enabling you to handle
them gracefully.
• Syntax:
• tryCatch(expr, error = function(e) {
• # Handle the error
• })
• expr: The expression or block of code to be evaluated.
• error: A function that specifies how to handle errors.
• Example:
tryCatch( sqrt("hello"),
error = function(e) {
print("An error occurred!")
})
• Error Messages:
• Handling Specific Errors: You can catch specific
types of errors by using the condition parameter
in tryCatch(). This allows you to handle different
types of errors differently.
• Example:
• tryCatch( sqrt("hello"), error = function(e) {
• if (inherits(e, "simpleError")) {
• print("Error: Invalid input for sqrt!")
• } else if (inherits(e, "error")) {
• print("Some other error occurred!")
• }})
• Custom Error Messages: When an error
occurs, you can provide custom error
messages to give more meaningful feedback
to the user.
• Example:
• tryCatch( sqrt("hello"), error = function(e) {
message("An error occurred: ",
conditionMessage(e))
• })
• Example:
# Using tryCatch()
display_condition <- function(inputcode)
{
tryCatch(inputcode,
error = function(c) "Unexpected error occurred",
warning = function(c) "warning message, but
take precautions")
}
Output:
Important message is caught!
• try-catch-finally in R
• Unlike other programming languages such as
Java, C++, and so on, the try-catch-finally
statements are used as a function in R. The
main two conditions to be handled in
tryCatch() are “errors” and “warnings”.
• Syntax:
check = tryCatch({
expression }, warning = function(w){
code that handles the warnings
}, error = function(e){
code that handles the errors }, finally = {
clean-up code })
# R program illustrating error handling
# Applying tryCatch
tryCatch(
# Specifying expression
expr = {
1+1
print("Everything was fine.")
},
# Specifying error message
error = function(e){
print("There was an error message.")
},
warning = function(w){
print("There was a warning message.")
},
finally = {
print("finally Executed")
}
)
• Output:
• [1] "Everything was fine."
• [1] "finally Executed"
• withCallingHandlers() in R
• In R, withCallingHandlers() is a variant
of tryCatch(). The only difference is tryCatch()
deals with exiting handlers while
withCallingHandlers() deals with local
handlers.
• Example:
# R program illustrating error handling
# Evaluation of withCallingHandlers
check <- function(expression){
withCallingHandlers(expression,
warning = function(w){
message("warning:\n", w)
},
error = function(e){
message("error:\n", e)
},
finally = {
message("Completed")
})
}
check({10/2})
check({10/0})
check({10/'noe'})
Unit-4
• Files in R Programming
• So far the operations using the R program are
done on a prompt/terminal which is not stored
anywhere. But in the software industry, most of
the programs are written to store the information
fetched from the program. One such way is to
store the fetched information in a file. So the two
most common operations that can be performed
on a file are:
• Importing/Reading Files in R
• Exporting/Writing Files in R
• Reading Files in R Programming Language
• When a program is terminated, the entire data is
lost. Storing in a file will preserve our data even if
the program terminates. If we have to enter a
large number of data, it will take a lot of time to
enter them all. However, if we have a file
containing all the data, we can easily access the
contents of the file using a few commands in R.
You can easily move your data from one
computer to another without any changes. Such
files can be stored in various formats: as a
.txt (tab-separated values) file, in a tabular
format such as a .csv (comma-separated values)
file, or on the internet or cloud. R provides
easy methods to read those files.
TEXT File reading in R
• One of the important formats to store a file is in a text
file. R provides various methods that one can read data
from a text file.
• read.delim(): This method is used for reading “tab-
separated value” files (“.txt”). By default, point (“.”) is
used as decimal point.
• Syntax: read.delim(file or file.choose(), header = TRUE,
sep = "\t", dec = ".", ...)
• Parameters:
• file.choose(): In R it’s also possible to choose a file
interactively using the function file.choose(), and if
you’re a beginner in R programming then this method
is very useful for you.
• file: the path to the file containing the data to
be read into R.
• header: a logical value. If TRUE, read.delim()
assumes that your file has a header row, so
row 1 is the name of each column. If that’s not
the case, you can add the argument header =
FALSE.
• sep: the field separator character. “\t” is used
for a tab-delimited file.
• dec: the character used in the file for decimal
points.
# R program reading a text file
print(myData)
• Output:
• 1 A computer science portal for Ks.
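Since the reading step is not shown above, here is a self-contained sketch. It first writes a small tab-separated file to a temporary location (the file contents are invented) and then reads it back with read.delim().

```r
# Create a small tab-separated file to read (made-up contents).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Name\tAge", "Amiya\t18", "Niru\t23"), tmp)

# header = TRUE: the first row supplies the column names.
myData <- read.delim(tmp, header = TRUE, sep = "\t", dec = ".")
print(myData)
str(myData)

unlink(tmp)   # remove the temporary file
```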
• read_tsv(): This method is also used to read
tab-separated ("\t") values, with the help
of the readr package.
• Syntax: read_tsv(file, col_names = TRUE)
• Parameters:
• file: the path to the file containing the data to be
read into R.
• col_names: Either TRUE, FALSE, or a character
vector specifying column names. If TRUE, the first
row of the input will be used as the column
names.
• # R program to read text file
• # using readr package
• Output:
• # A tibble: 1 x 1
X1
1 A computer science portal for ks.
• Reading one line at a time
• read_lines(): This method reads
as many lines as you choose, whether one, two,
or ten lines at a time. To use this method we have
to load the readr package.
• Syntax: read_lines(file, skip = 0, n_max = -1L)
• Parameters:
• file: file path
• skip: Number of lines to skip before reading data
• n_max: Number of lines to read. If n_max is -1, all
lines in the file will be read.
# R program to read one line at a time
EXAMPLE
library(readr)
# read_file() to read the whole file
myData = read_file("ks.txt")
print(myData)
• # R program to read a file in table format
• # Using read.table()
• myData = read.table("basic.csv")
• print(myData)
• Output:
• 1 Name,Age,Qualification,Address
2 Amiya,18,MCA,BBS
3 Niru,23,Msc,BLS
4 Debi,23,BCA,SBP
5 Biku,56,ISC,JJP
• Reading a file in a table format
• Another popular format to store a file is in a tabular format.
R provides various methods that one can read data from a
tabular formatted data file.
• read.table(): read.table() is a general function that can be
used to read a file in table format. The data will be
imported as a data frame.
• Syntax: read.table(file, header = FALSE, sep = "", dec = ".")
• Parameters:
• file: the path to the file containing the data to be imported
into R.
• header: logical value. If TRUE, read.table() assumes that
your file has a header row, so row 1 is the name of each
column. If that’s not the case, you can add the argument
header = FALSE.
• sep: the field separator character
• dec: the character used in the file for decimal points.
READ CSV FILE IN R
• read.csv(): read.csv() is used for reading “comma separated
value” files (“.csv”). In this also the data will be imported as
a data frame.
• Syntax: read.csv(file or file.choose(), header = TRUE, sep =
",", dec = ".", ...)
• file.choose(): You can also
use file.choose() with read.csv() just like before.
• Parameters:
• file: the path to the file containing the data to be imported
into R.
• header: logical value. If TRUE, read.csv() assumes that your
file has a header row, so row 1 is the name of each column.
If that’s not the case, you can add the argument header =
FALSE.
• sep: the field separator character
• dec: the character used in the file for decimal points.
# R program to read a file in table format
# Using read.csv()
myData = read.csv("basic.csv")
print(myData)
• Output:
   Name Age Qualification Address
1 Amiya  18           MCA     BBS
2  Niru  23           Msc     BLS
3  Debi  23           BCA     SBP
4  Biku  56           ISC     JJP
• read.csv2(): read.csv2() is a variant of read.csv() for
countries that use a comma "," as decimal point and a
semicolon ";" as field separator.
• Syntax: read.csv2(file, header = TRUE, sep = ";", dec =
",", …)
• Parameters:
• file: the path to the file containing the data to be
imported into R.
• header: logical value. If TRUE, read.csv2() assumes that
your file has a header row, so row 1 is the name of
each column. If that’s not the case, you can add the
argument header = FALSE.
• sep: the field separator character
• dec: the character used in the file for decimal points.
# R program to read a file in table format
# Using read.csv2()
myData = read.csv2("basic.csv")
print(myData)
• Output:
  Name.Age.Qualification.Address
1               Amiya,18,MCA,BBS
2                Niru,23,Msc,BLS
3                Debi,23,BCA,SBP
4                Biku,56,ISC,JJP
• Note: basic.csv is comma-separated, so read.csv2() finds no
semicolons and reads each line as a single column.
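• Example: a short sketch of read.csv2() on the kind of file it is meant for. The semicolon-separated sample data (with decimal commas) and the temporary file are assumptions made for illustration.

```r
# Hypothetical file using ";" between fields and "," as the decimal mark
path <- tempfile(fileext = ".csv")
writeLines(c("Name;Height;City",
             "Amiya;1,72;BBS",
             "Niru;1,65;BLS"), path)

# read.csv2() defaults to sep = ";" and dec = ","
euro <- read.csv2(path)
print(euro)

# The decimal commas are converted, so Height is numeric
print(class(euro$Height))
```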
read_csv() from readr package
• read_csv(): This function also reads comma (",")
separated values, using the readr package.
• Syntax: read_csv(file, col_names = TRUE)
• Parameters:
• file: the path to the file containing the data to be
read into R.
• col_names: Either TRUE, FALSE, or a character
vector specifying column names. If TRUE, the first
row of the input will be used as the column
names.
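• Example: a minimal sketch of read_csv() with both forms of col_names. It assumes the readr package is installed. The sample data, the temporary file and the replacement column names are made up for illustration.

```r
library(readr)

# Hypothetical sample data, written to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("Name,Age,Qualification,Address",
             "Amiya,18,MCA,BBS",
             "Niru,23,Msc,BLS"), path)

# col_names = TRUE: the first row supplies the column names
myData <- read_csv(path, col_names = TRUE)
print(myData)

# col_names as a character vector: our own names are used,
# and skip = 1 drops the header row that is already in the file
renamed <- read_csv(path,
                    col_names = c("name", "age", "degree", "city"),
                    skip = 1)
print(names(renamed))
```

Unlike read.csv(), read_csv() returns a tibble, which prints the column types along with the data.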
• Example:
# R program to read a comma-separated file chosen interactively
# Using read.table() with file.choose()
# import and store the dataset in data2
data2 <- read.table(file.choose(), header = TRUE, sep = ",")

# display data
data2
• Using RStudio
• Here we import data through RStudio with the
following steps.
• Steps:
1. From the Environment tab, click on the Import
Dataset menu.
2. Select the file type from the options.
3. A pop-up box will appear. Either enter the file
name or browse to the file.
4. The selected file will be displayed in a new
window along with its dimensions.
5. To see the data on the console, type the name
of the imported dataset.
Working with Excel Files in R
Programming
• Excel works with files of extension .xls, .xlsx and
.csv (comma-separated values). To start working
with Excel files in the R Programming Language, we
first need to import them in RStudio or any
other IDE (integrated development environment)
that supports R.
• Reading Excel Files in R Programming Language
• First, install the readxl package in R to load Excel files.
The steps are demonstrated below.
• Reading Files:
• The two Excel files Sample_data1.xlsx and
Sample_data2.xlsx are read from the working
directory.
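# Working with Excel Files
# Installing required package
install.packages("readxl")
# Loading the package
library(readxl)
# Importing the Excel files
Data1 <- read_excel("Sample_data1.xlsx")
Data2 <- read_excel("Sample_data2.xlsx")
# Loading package for writing (install.packages("writexl") if needed)
library(writexl)
# Writing Data1
write_xlsx(Data1, "New_Data1.xlsx")
# Writing Data2
write_xlsx(Data2, "New_Data2.xlsx")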
Writing to Files in R Programming
• Output:
The code for this example creates a new file and redirects its
output there. After executing the code, the file contains:
[1] 4.6
[1] "numeric"
[1] 4
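• Example: the code that produced this output is not reproduced in these notes. The sketch below yields the same three lines, assuming the redirection is done with base R's sink(). The vector x and the temporary output file are assumptions chosen to match the output shown.

```r
# Hypothetical output file
out_file <- tempfile(fileext = ".txt")

# Assumed data: its mean is 4.6 and its median is 4
x <- c(1, 3, 4, 5, 10)

sink(out_file)      # start redirecting console output to the file
print(mean(x))
print(class(x))
print(median(x))
sink()              # stop redirecting; output goes to the console again

# Read the file back to check what was written
written <- readLines(out_file)
print(written)
```

Every print() call between the two sink() calls goes to the file instead of the console.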
• Writing to CSV files
• A matrix or data-frame object can be
written to a CSV file
using the write.csv() function.
• Syntax: write.csv(x, file)
• Parameters:
x is the matrix or data frame to be written
file specifies the file name used for writing
• To learn about all the arguments
of write.csv(), execute the command below in R:
help("write.csv")
# Create vectors
x <- c(1, 3, 4, 5, 10)
y <- c(2, 4, 6, 8, 10)
z <- c(10, 12, 14, 16, 18)
# Create matrix
data <- cbind(x, y, z)
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
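• Example: a sketch that completes the step above by writing the matrix and the data frame with write.csv(). The output file names in the temporary directory are made up for illustration. row.names = FALSE is one possible choice that keeps the row numbers out of the file.

```r
# Same objects as in the notes
x <- c(1, 3, 4, 5, 10)
y <- c(2, 4, 6, 8, 10)
z <- c(10, 12, 14, 16, 18)
data <- cbind(x, y, z)
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Hypothetical output files
matrix_file <- file.path(tempdir(), "matrix_demo.csv")
df_file <- file.path(tempdir(), "df_demo.csv")

# Write both objects without the row names
write.csv(data, file = matrix_file, row.names = FALSE)
write.csv(df, file = df_file, row.names = FALSE)

# Inspect the results
matrix_lines <- readLines(matrix_file)
print(matrix_lines)
back <- read.csv(df_file)
print(back)
```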
• Parameters of write.table(), the general export function:
x: a matrix or a data frame to be written.
file: a character string naming the result file.
sep: the field separator string, e.g., sep = "\t" (for tab-separated
values).
dec: the string to be used as decimal separator. Default
is "."
row.names: either a logical value indicating whether
the row names of x are to be written along with x, or a
character vector of row names to be written.
col.names: either a logical value indicating whether
the column names of x are to be written along with x,
or a character vector of column names to be written.
# R program to illustrate
# Exporting data from R
# Creating a dataframe
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45)
)
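• Example: a sketch of the export step for the data frame above, using write.table() with the sep, dec, row.names and col.names parameters. The tab separator and the temporary output file are choices made for illustration.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

# Hypothetical output file
out <- tempfile(fileext = ".txt")

# Tab-separated export with column names but without row names
write.table(df, file = out, sep = "\t", dec = ".",
            row.names = FALSE, col.names = TRUE)

# Look at the raw lines, then read the file back
out_lines <- readLines(out)
print(out_lines)
back <- read.table(out, header = TRUE, sep = "\t")
print(back)
```

Character values are quoted in the file by default. Passing quote = FALSE to write.table() would leave them unquoted.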
?mtcars
• Display R datasets
• To display a dataset, we simply write the
name of the dataset inside
the print() function. For example,
# display airquality dataset
print(airquality)
rownames(Data_Cars)
Handling large data sets in R
• The problem with large data sets in R:
• R reads the entire data set into RAM all at once.
Other programs can read file sections on
demand.
• R objects live entirely in memory.
• R does not have an int64 data type.
• It is not possible to index objects with huge
numbers of rows and columns, even on 64-bit
systems (2 billion vector index limit). R hits a file
size limit at around 2-4 GB.
• How big is a large data set:
• We can group large data sets in R into two
broad categories:
• Medium sized files that can be loaded in R
(within the memory limit, but processing is
cumbersome; typically in the 1-2 GB range).
• Large files that cannot be loaded in R due to the R /
OS limitations discussed above. We can further
split this group into 2 subgroups:
– Large files (typically 2-10 GB) that can still be
processed locally using some workaround solutions.
– Very large files (> 10 GB) that need distributed
large scale computing.
• Medium sized datasets (< 2 GB)
1. Try to reduce the size of the file before loading it into R
• If you are loading xls files, you can select the specific columns
required for analysis instead of the entire data set.
• Base readers such as read.csv() parse every column of a csv or text
file by default, so you might want to pre-process the data on the
command line using cut or awk and keep only the data required for analysis.
2. Pre-allocate number of rows and pre-define column classes
• Read optimization example:
• Read in a few records of the input file, identify the class of each
column, and pass those classes through colClasses when
reading the entire data set.
• Calculate the approximate row count of the data set from the size
of the file and the number of fields per row (or using wc on the
command line) and set the nrows parameter.
• Set the comment.char parameter (comment.char = "" turns off comment scanning).
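• Example: the read optimization steps above can be sketched as follows. The generated data set and the temporary file stand in for a real large file and are assumptions. Counting lines with readLines() is only reasonable for this small demo. For a genuinely large file, wc -l on the command line gives the row count without loading the data.

```r
# Hypothetical input file standing in for a large data set
path <- tempfile(fileext = ".csv")
big <- data.frame(id = 1:1000,
                  score = runif(1000),
                  label = sample(letters, 1000, replace = TRUE))
write.csv(big, path, row.names = FALSE)

# Step 1: read a few records and record the class of each column
sample_rows <- read.csv(path, nrows = 5)
classes <- sapply(sample_rows, class)
print(classes)

# Step 2: row count = number of lines minus the header line
n_rows <- length(readLines(path)) - 1

# Step 3: full read with predefined classes, a known row count,
# and comment scanning switched off
full <- read.table(path, header = TRUE, sep = ",",
                   colClasses = classes,
                   nrows = n_rows,
                   comment.char = "")
print(dim(full))
```

Supplying colClasses spares R from guessing the type of every column, and nrows lets it allocate memory for the result up front.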
• Alternatively, use fread() from the data.table package.
• Described as a "fast and friendly file finagler", fread() from the popular
data.table package is extremely useful and easy to use. It is meant
to import data from regular delimited files directly into R,
without any detours or nonsense.
• One of the great things about this function is that all controls,
expressed in arguments such as sep, colClasses and nrows, are
automatically detected.
• bit64::integer64 types are also detected and read directly
without needing to be read as character before converting.
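• Example: a small sketch of fread() detecting the separator and the column classes on its own. It assumes the data.table package is installed. The semicolon-separated sample file is made up for illustration.

```r
library(data.table)

# Hypothetical delimited file; the separator is deliberately not a comma
path <- tempfile(fileext = ".csv")
writeLines(c("id;score;label",
             "1;0.5;a",
             "2;1.5;b",
             "3;2.5;c"), path)

# No sep, header or colClasses given: fread() works them out
dt <- fread(path)
print(dt)
print(sapply(dt, class))

# select reads only the columns that are needed
small <- fread(path, select = c("id", "score"))
print(names(small))
```

The select argument also offers a way to load only some columns of a delimited file without pre-processing it on the command line.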
• Parallel Processing - The parallelism approach runs several
computations at the same time and takes advantage of
multiple cores or CPUs on a single system or across
systems. The following R packages are used for parallel
processing in R.
• bigmemory - bigmemory is part of the "big" family,
which consists of several packages that perform
analysis on large data sets. bigmemory uses several
matrix objects but we will only focus on big.matrix.
• big.matrix is an R object that uses a pointer to a C++
data structure. The location of the pointer to the C++
matrix can be saved to the disk or RAM and shared
with other users in different sessions.
• By loading the pointer object, users can access the data
set without reading the entire set into R.
• ff - ff is another package dealing with large data sets, similar to
bigmemory. It also uses a pointer, but to a flat binary file
stored on disk, and it can be shared across different sessions.
• One advantage ff has over bigmemory is that it supports
multiple data class types in the data set.
• Very large datasets
• There are two options to process very large data
sets (> 10 GB) in R:
• Use integrated environment packages
like Rhipe to leverage the Hadoop MapReduce
framework.
• Use RHadoop directly on a Hadoop distributed
system.
• Storing large files in databases and connecting
through DBI/ODBC calls from R is also an option
worth considering.
Unit-5
• Regular Expressions - Regular expressions
(regex) are a set of pattern matching
commands used to detect string sequences in
large text data. These commands are
designed to match a family (alphanumeric,
digits, words) of text, which makes them
versatile enough to handle any text / string
class.
• In short, using regular expressions you can get
more out of text data while writing shorter
code.
• String Manipulation - In R, we have packages such
as stringr and stringi which are loaded with string
manipulation functions.
• In addition, R also has several base functions for
string manipulation. These functions are designed to
complement regular expressions.
• The practical differences between string manipulation
functions and regular expressions are:
• We use string manipulation functions to do simple
tasks such as splitting a string, extracting the first three
letters, etc. We use regular expressions to do more
complicated tasks such as extracting email IDs or dates
from a set of text.
• String manipulation functions are designed to respond
in a certain way. They don't deviate from their natural
behavior, whereas we can customize regular
expressions in any way we want.
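• Example: a brief sketch of the difference described above, using base R only. The sample sentence and the two patterns are made up for illustration. The email pattern is deliberately simple and is not a complete validator.

```r
# Hypothetical sample text
note <- "Contact priya@example.com or raj@example.org before 2024-05-01"

# String manipulation functions: fixed, simple tasks
first_three <- substr(note, 1, 3)     # first three letters
words <- strsplit(note, " ")[[1]]     # split on spaces
print(first_three)
print(words)

# Regular expressions: patterns we design ourselves
email_pattern <- "[[:alnum:]._-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"
date_pattern <- "[0-9]{4}-[0-9]{2}-[0-9]{2}"
emails <- regmatches(note, gregexpr(email_pattern, note))[[1]]
dates <- regmatches(note, gregexpr(date_pattern, note))[[1]]
print(emails)
print(dates)
```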
List of String Manipulation Functions
List of Regular Expression Commands
#non greedy
regmatches(number, gregexpr(pattern = "1.?1",text = number))
[1] "101"
• It works like this: the greedy match starts from the first digit, moves
ahead, and stumbles on the second '1' digit. Being greedy, it
continues to search for '1' and stumbles on the third '1' in the
number. It then checks further but finds no more.
Hence, it returns the result as "1010000000001". On the other
hand, the non-greedy quantifier stops at the first match, thus
returning "101".
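• Example: a self-contained sketch comparing the two behaviours. The value of number is not shown in this section, so the one below is an assumption chosen to be consistent with the results quoted above. The last pattern uses the lazy quantifier *?, which is the usual way to ask for the shortest match. It is added here for comparison and is not part of the original example.

```r
# Assumed input, consistent with the results quoted in the notes
number <- "101000000000100"

# Greedy: .* takes as many characters as possible between two 1s
greedy <- regmatches(number, gregexpr(pattern = "1.*1", text = number))[[1]]
print(greedy)

# .? allows at most one character between the two 1s
optional <- regmatches(number, gregexpr(pattern = "1.?1", text = number))[[1]]
print(optional)

# Lazy: .*? takes as few characters as possible
lazy <- regmatches(number, gregexpr(pattern = "1.*?1", text = number, perl = TRUE))[[1]]
print(lazy)
```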
• Let's look at a few more examples of quantifiers:
names <- c("anna","crissy","puerto","cristian","garcia","steven","alex","rudy")
#match a non-digit
gsub(pattern = "\\D+",replacement = "_",x = string)
regmatches(string,regexpr(pattern = "\\D+",text = string))
#extract numbers
regmatches(x = string,gregexpr("[0-9]+",text = string))
#remove punctuations
gsub(pattern = "[[:punct:]]+",replacement = "",x = string)
#replace spaces with hyphens
gsub(pattern = "[[:blank:]]",replacement = "-",x = string)
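• Example: the commands above need a value for string, which is not defined in this section. The sketch below runs them on a made-up sentence (an assumption for illustration) and stores each result.

```r
# Hypothetical input text
string <- "Room 12, Floor 3: open 24 hours!"

# Replace every run of non-digits with "_"
no_text <- gsub(pattern = "\\D+", replacement = "_", x = string)
print(no_text)

# First run of non-digits only (regexpr stops at the first match)
first_run <- regmatches(string, regexpr(pattern = "\\D+", text = string))
print(first_run)

# Extract all numbers
numbers <- regmatches(x = string, gregexpr("[0-9]+", text = string))[[1]]
print(numbers)

# Remove punctuation
no_punct <- gsub(pattern = "[[:punct:]]+", replacement = "", x = string)
print(no_punct)

# Replace spaces with hyphens
hyphenated <- gsub(pattern = "[[:blank:]]", replacement = "-", x = string)
print(hyphenated)
```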