Introduction To Analytics and R-Programming

The document provides an introduction to R programming including key features of R like its programming language, data handling capabilities, and graphical facilities. It discusses various data types in R, important data structures, and basic functions to create, save and execute R files. The document also covers R console, variables, comments, keywords and basic operations in R code.

Uploaded by

ANMOL AGGARWAL
Copyright
© © All Rights Reserved

Introduction to Analytics and R-Programming
Introduction to R

► R is a programming language and software environment for
statistical analysis, graphical representation and reporting.
► R was created by Ross Ihaka and Robert Gentleman at the
University of Auckland, New Zealand, and is currently developed
by the R Core Team.
► The core of R is an interpreted computer language which allows
branching and looping as well as modular programming using
functions.
► R allows integration with the procedures written in the C, C++,
.Net, Python or FORTRAN languages for efficiency.
Features of R

R is a programming language and software environment for statistical analysis, graphical
representation and reporting. The following are the important features of R:
► A well-developed, simple and effective programming language which includes
conditionals, loops, user defined recursive functions and input and output facilities.
► Has an effective data handling and storage facility.
► Provides a suite of operators for calculations on arrays, lists, vectors and matrices.
► Provides a large, coherent and integrated collection of tools for data analysis.
► Provides graphical facilities for data analysis and display, either directly on screen
or as printed output.
► Before you can ask your computer to save some numbers, you’ll
need to know how to talk to it.
► That’s where R and RStudio come in. RStudio gives you a way to
talk to your computer. R gives you a language to speak in.
► To get started, open RStudio just as you would open any other
application on your computer.
► When you do, a window should appear on your screen like the
one shown here.
R screen
The RStudio interface is simple. You type R code into the bottom line of the
RStudio console pane and then press Enter to run it.
The code you type is called a command, because it will command your
computer to do something for you. The line you type it into is called
the command line.
When you type a command at the prompt and hit Enter, your computer executes
the command and shows you the results. Then RStudio displays a fresh prompt
for your next command. For example, if you type 2 + 3 and hit Enter, RStudio
will display:
> 2 + 3
[1] 5
Did you notice that a [1] appears next to your result?
R is just letting you know that this line begins with the first value in your result.
Some commands return more than one value, and their results may fill up multiple
lines. For example, the command 100:150 returns 51 values; it creates a sequence of
integers from 100 to 150. Notice that new bracketed numbers appear at the start of
the second and third lines of output. These numbers just mean that the second line
begins with the 25th value in the result, and the third line begins with the 49th value.
You can mostly ignore the numbers that appear in brackets:
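Where the bracketed numbers fall depends on the width of your console, so the exact line breaks may differ from the description above. As a minimal sketch (the variable name `s` is only illustrative), the count and individual positions can be checked directly:

```r
# Create the sequence of integers from 100 to 150
s <- 100:150

length(s)   # how many values the sequence holds
s[1]        # the first value
s[25]       # the 25th value
```

The bracketed number at the start of each printed line is simply the position of the first value on that line.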
► If you type an incomplete command and press Enter, R will display a
+ prompt, which means it is waiting for you to type the rest of your
command. Either finish the command or hit Escape to start over:

► If you type a command that R doesn’t recognize, R will return an
error message. If you ever see an error message, don’t panic. R is just
telling you that your computer couldn’t understand or do what you
asked it to do. You can then try a different command at the next
prompt:
The R Console

► The R console is the most important tool for using R. The R console is
a tool that allows you to type commands into R and see how the R
system responds. The commands that you type into the console are
called expressions. A part of the R system called the interpreter will
read the expressions and respond with a result or an error message.
Sometimes, you can also enter an expression into R through the
menus.
► By default, R will display a greater-than sign (“>”) in the console (at
the beginning of a line, when nothing else is shown) when R is
waiting for you to enter a command into the console. R is
prompting you to type something, so this is called a prompt.
Everything in R is an object.
R has 6 basic data types. (In addition to the five listed below, there is
also raw which will not be discussed in this workshop.)
► character
► numeric (real or decimal)
► integer
► logical
► complex
Data structures in R

R has many data structures. These include:
► vector
► list
► matrix
► data frame
► factors
Data Type   Example                        Syntax
Logical     TRUE, FALSE                    a <- TRUE; class(a)
Numeric     12.3, 5, 999                   b <- 13.5; class(b)
Integer     2L, 34L, 0L                    c <- 3L; class(c)
Complex     3 + 2i                         d <- 2 + 5i; class(d)
Character   'a', "good", "TRUE", '23.4'    e <- "23"; class(e)

► Atomic vectors come in six data types, also called the six
classes of vectors. The other R objects are built upon the atomic
vectors.
Basic Functions in R console

► Creating an R file
► Saving an R file
► Clearing the Console: The console can be cleared using the shortcut key Ctrl + L.
Execution of an R file:

There are several ways to execute the commands in an R file.
► Using the Run command: Click the Run button in the GUI, or use the
shortcut key Ctrl + Enter.
► What does it do? It executes the line where the cursor is, or the
selected lines.
► Using the Source with Echo command: Click the Source with Echo button
in the GUI, or use the shortcut key Ctrl + Shift + Enter.
► What does it do? It runs the whole file and prints each command along
with its output.
► Clearing the Environment: Variables in the R environment can be cleared in
two ways:
► Using the rm() command: To clear a single variable from the R
environment, call rm() with the name of the variable you want to
remove.
► Typing rm(variable) deletes that variable. To delete all the variables
in the environment, pass the argument list = ls(), that is,
rm(list = ls()).
► Using the GUI: We can also clear all the variables in the environment
with the brush button in the Environment pane.
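The rm() usage described above can be sketched as follows. The variable names are arbitrary, and `inherits = FALSE` is used only so that exists() checks the current environment and nothing else:

```r
# Create three illustrative variables
x <- 1
y <- 2
z <- 3

# Remove a single variable by name
rm(x)
exists("x", inherits = FALSE)   # FALSE, x is gone

# Remove several variables at once through the 'list' argument
rm(list = c("y", "z"))
exists("y", inherits = FALSE)   # FALSE

# rm(list = ls()) would remove every variable in the current environment
```

Passing `ls()` to `list` works because ls() returns the names of all variables as a character vector.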
Run command over Source command:

► Run can be used to execute the selected lines of R code.
► Source with Echo can be used to run the whole file.
► The advantage of using Run is that you can troubleshoot or debug the
program when something is not behaving as you expect.
► The disadvantage of using Run is that it fills the console with
output and can make it messy.
Syntax of R program

► A program in R is made up of three things: Variables, Comments, and
Keywords. Variables are used to store data, Comments are used to
improve code readability, and Keywords are reserved words that hold a
specific meaning to the interpreter.
Variables

► Like most other languages, R lets you assign values to variables and refer to them
by name.
► In R, the assignment operator is <-. Usually, this is pronounced as “gets.” For
example, the statement:
x <- 1
is usually read as “x gets 1.”
► After you assign a value to a variable, the R interpreter will substitute that value in
place of the variable name when it evaluates an expression.
Comments in R

► Comments are a way to improve your code’s readability. They are
meant only for the reader, so the interpreter ignores them. Only
single-line comments are available in R; they are written by placing
# at the beginning of the comment.
Keywords in R

Keywords are words reserved by the language because they have a special
meaning, so a keyword can’t be used as a variable name, function name, etc. We
can view these keywords by using either help(reserved) or ?reserved.
► Here’s a simple example:
> x <- 1
> y <- 2
> z <- c(x,y)
> z
[1] 1 2
Notice that the substitution is done at the time that the value is
assigned to z, not at the time that z is evaluated.
Suppose that you were to type in the preceding three expressions and
then change the value of y. The value of z would not change:
> y <- 4
> z
[1] 1 2

Now try assigning a new value to z. Does the previous value still
show up?
► R provides several different ways to refer to a member (or set of members) of
a vector.
► You can refer to elements by location in a vector:
> b <- c(1,2,3,4,5,6,7,8,9,10,11,12)
> b
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> b[7]
[1] 7
> b[1:6]
[1] 1 2 3 4 5 6
> b[b %% 3 == 0]
[1] 3 6 9 12
Exercise

That’s the basic interface for executing R code in RStudio. Try doing these
simple tasks. If you execute everything correctly, you should end up with the
same number that you started with:
► Choose any number and add 2 to it.
► Multiply the result by 3.
► Subtract 6 from the answer.
► Divide what you get by 3.
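One way to work through the exercise, as a sketch only: the starting value 7 and the intermediate names are arbitrary choices, not part of the original exercise.

```r
n <- 7               # any starting number

step1 <- n + 2       # add 2
step2 <- step1 * 3   # multiply the result by 3
step3 <- step2 - 6   # subtract 6
result <- step3 / 3  # divide by 3

result == n          # TRUE if every step was executed correctly
```

Algebraically, ((n + 2) * 3 - 6) / 3 simplifies to n, which is why the final value matches the starting one.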
Variables can be assigned values using the leftward (<-), rightward (->) and equals (=)
operators. The values of the variables can be printed using the print() or
cat() function.

# Assignment using equal operator.
var.1 = c(0,1,2,3)
# Assignment using leftward operator.
var.2 <- c("learn","R")
# Assignment using rightward operator.
c(TRUE,1) -> var.3
print(var.1)
cat ("var.1 is ", var.1 ,"\n")
cat ("var.2 is ", var.2 ,"\n")
cat ("var.3 is ", var.3 ,"\n")
Deleting Variables

► Variables can be deleted by using the rm() function. Let us delete the
variable var.3.
► Printing the variable after it has been deleted throws an error.
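A minimal sketch of the deletion and the resulting error follows. tryCatch() is used here only so that the script keeps running after the error; `msg` is an illustrative name.

```r
# Recreate the variable, then delete it
var.3 <- c(TRUE, 1)
rm(var.3)

# Printing a deleted variable raises an error;
# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch(
  print(var.3),
  error = function(e) conditionMessage(e)
)
cat("Error caught:", msg, "\n")
```

Typed directly at the console without tryCatch(), print(var.3) would simply stop with the error.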
Basic Operations in R
► When you perform an operation on two vectors, R will match the elements of the
two vectors pairwise and return a vector. For example:

> c(1, 2, 3, 4) + c(10, 20, 30, 40)
[1] 11 22 33 44
> c(1, 2, 3, 4) * c(10, 20, 30, 40)
[1] 10 40 90 160
> c(1, 2, 3, 4) - c(1, 1, 1, 1)
[1] 0 1 2 3
If the two vectors aren’t the same size, R will repeat the smaller sequence
multiple times:
> c(1, 2, 3, 4) + 1
[1] 2 3 4 5
> 1 / c(1, 2, 3, 4, 5)
[1] 1.0000000 0.5000000 0.3333333 0.2500000 0.2000000
> c(1, 2, 3, 4) + c(10, 100)
[1] 11 102 13 104
> c(1, 2, 3, 4, 5) + c(10, 100)
[1] 11 102 13 104 15
Warning message:
In c(1, 2, 3, 4, 5) + c(10, 100) :
longer object length is not a multiple of shorter object length
Note the warning when the length of the longer vector isn’t a multiple of
the length of the shorter one.
► In R, the operations that do all of the work are called functions. We’ve already
used a few functions above (you can’t do anything interesting in R without
them).
► Functions are just like what you remember from math class. Most functions are
in the following form: f(argument1, argument2, ...) Where f is the name of the
function, and argument1, argument2, . . . are the arguments to the function.
► Here are a few more examples:
> exp(1)
[1] 2.718282
> cos(3.141593)
[1] -1
> log2(1)
[1] 0
In each of these examples, the functions only took one argument.
► Many functions require more than one argument. You can specify
the arguments by name:
> log(x=64, base=4)
[1] 3
Or, if you give the arguments in the default order, you can omit the
names:
> log(64,4)
[1] 3
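As a small sketch of why argument names matter: named arguments can be given in any order, while unnamed arguments are matched by position, so swapping them changes the calculation.

```r
# Named arguments may appear in any order
by_name <- log(base = 4, x = 64)

# Unnamed arguments are matched by position: this is log of 4 in base 64
swapped <- log(4, 64)

by_name
swapped
```

The first call gives the same result as log(64, 4); the second computes a different quantity (one third).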
R operators
Vectors

► When you want to create vector with more than one element, you
should use c() function which means to combine the elements into
a vector.
apple <- c('red','green',"yellow")
apple
class(apple)
► When we execute the above code, it produces the following result:
[1] "red" "green" "yellow"
[1] "character"
Types of vectors

► R uses several types of vectors. Some of them are:
► Numeric vectors: Numeric vectors contain numeric values, such as
integers and doubles (floating-point numbers).
► Character vectors: Character vectors contain alphanumeric values and
special characters.
► Logical vectors: Logical vectors contain the boolean values TRUE and
FALSE, plus NA for missing values.
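The three vector types can be checked with class(). This is a short sketch; the vector names are illustrative.

```r
num_vec <- c(1.5, 2, 3)          # numeric vector
chr_vec <- c("a", "b2", "#")     # character vector
lgl_vec <- c(TRUE, FALSE, NA)    # logical vector with one missing value

class(num_vec)    # "numeric"
class(chr_vec)    # "character"
class(lgl_vec)    # "logical"

# is.na() reports which elements are missing
is.na(lgl_vec)
```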
Creating a vector

► There are different ways of creating vectors. Generally, we use ‘c’
to combine different elements together.

# we can use the c function
# to combine the values as a vector.
# By default the type will be double
X <- c(61, 4, 21, 67, 89, 2)
cat('using c function', X, '\n')

# seq() function for creating
# a sequence of continuous values.
# length.out defines the length of vector.
Y <- seq(1, 10, length.out = 5)
cat('using seq() function', Y, '\n')

# use ':' to create a vector
# of continuous values.
Z <- 2:7
cat('using colon', Z)
Accessing vector elements

► Accessing elements in a vector is the process of performing an
operation on an individual element of a vector. There are many ways
through which we can access the elements of the vector. The most
common is using the ‘[]’ symbol.

# R program to access elements of a Vector

# accessing elements with an index number.
X <- c(2, 5, 18, 1, 12)
cat('Using Subscript operator', X[2], '\n')

# by passing a range of values
# inside the vector index.
Y <- c(4, 8, 2, 1, 17)
cat('Using combine() function', Y[c(4, 1)], '\n')

# using logical expressions
Z <- c(5, 2, 1, 4, 4, 3)
cat('Using Logical indexing', Z[Z>4])
Modifying a vector

► Modification of a Vector is the process of applying some operation on
an individual element of a vector to change its value in the vector.
There are different ways through which we can modify a vector:

# Creating a vector
X <- c(2, 7, 9, 7, 8, 2)

# modify a specific element
X[3] <- 1
X[2] <- 9
cat('subscript operator', X, '\n')

# Modify using different logics.
X[X>5] <- 0
cat('Logical indexing', X, '\n')
Deleting a vector

► Deletion of a Vector is the process of deleting all of the
elements of the vector. This can be done by assigning NULL
to it. (Deleting through the rm() function can also be done.)

# Creating a Vector
M <- c(8, 10, 2, 5)

# set NULL to the vector
M <- NULL
cat('Output vector', M)
Sorting elements of a Vector

The sort() function sorts the values in ascending or descending order.

# Creation of Vector
X <- c(8, 2, 7, 1, 11, 2)

# Sort in ascending order
A <- sort(X)
cat('ascending order', A, '\n')

# sort in descending order
# by setting decreasing as TRUE
B <- sort(X, decreasing = TRUE)
cat('descending order', B)
Lists

► A list in R is a generic object consisting of an ordered
collection of objects. Lists are one-dimensional,
heterogeneous data structures. The list can be a list of
vectors, a list of matrices, a list of characters and a list of
functions, and so on.
► A list contains many different types of elements inside it
like vectors, functions and even another list inside it.
list1 <- list(c(2,5,3),21.3,sin)
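To see that the three elements of list1 really are different kinds of objects, here is a short sketch that inspects each one with class() and then calls the stored function:

```r
list1 <- list(c(2, 5, 3), 21.3, sin)

class(list1[[1]])   # "numeric": a vector of three numbers
class(list1[[2]])   # "numeric": a single number
class(list1[[3]])   # "function": the sin function itself

# The stored function can be called straight from the list
list1[[3]](0)       # same as sin(0)
```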
Creating a List

► To create a List in R you need to use the function called “list()”.
In other words, a list is a generic vector containing other objects. To
illustrate how a list looks, we take an example here. We want to build
a list of employees with the details. So for this, we want attributes
such as ID, employee name, and the number of employees.

# The first attribute is a numeric vector
# containing the employee IDs which is created
# using the command here
empId = c(1, 2, 3, 4)
# The second attribute is the employee name
# which is created using this line of code here
# which is the character vector
empName = c("Debi", "Sandeep", "Subham", "Shiba")
# The third attribute is the number of employees
# which is a single numeric variable.
numberOfEmp = 4

# We can combine all these three different
# data types into a list
# containing the details of employees
# which can be done using a list command
empList = list(empId, empName, numberOfEmp)
print(empList)
Accessing components of a list

We can access components of a list in two ways.

► Access components by names: All the components of a list can be
named, and we can use those names to access the components of the
list using the dollar ($) operator.

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
print(empList)
# Accessing components by names
cat("Accessing name components using $ command\n")
print(empList$Names)
Access components by indices:

► We can also access the components of the list using indices. To
access the top-level components of a list we have to use the double
slicing operator “[[ ]]”, which is two square brackets, and if we want
to access the lower or inner level components of a list we have to use
another square bracket “[ ]” along with the double slicing operator
“[[ ]]”.

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
print(empList)

# Accessing a top level component by indices
cat("Accessing name components using indices\n")
print(empList[[2]])

# Accessing an inner level component by indices
cat("Accessing Sandeep from name using indices\n")
print(empList[[2]][2])

# Accessing another inner level component by indices
cat("Accessing 4 from ID using indices\n")
print(empList[[1]][4])
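The difference between single and double brackets on a list is easy to miss, so here is a small sketch. The list is rebuilt so the block runs on its own; `sub_list` and `names_vec` are illustrative names.

```r
empList <- list(
  ID    = c(1, 2, 3, 4),
  Names = c("Debi", "Sandeep", "Subham", "Shiba")
)

sub_list  <- empList[2]     # single brackets: a list containing one component
names_vec <- empList[[2]]   # double brackets: the component itself

class(sub_list)    # "list"
class(names_vec)   # "character"
names_vec[2]       # "Sandeep"
```

Because empList[2] is still a list, a further index such as [2] only makes sense after extracting the component with [[ ]].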
Modifying components of a list

► A list can also be modified by accessing the components and
replacing them with the ones which you want.

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
cat("Before modifying the list\n")
print(empList)

# Modifying the top-level component
empList$`Total Staff` = 5

# Modifying inner level component
empList[[1]][5] = 5
empList[[2]][5] = "Kamala"

cat("After modifying the list\n")
print(empList)
Concatenation of lists

Two lists can be concatenated using the concatenation function c().
Syntax:
list = c(list, list1)
list = the original list
list1 = the new list

empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
cat("Before concatenation of the new list\n")
print(empList)

# Creating another list
empAge = c(34, 23, 18, 45)
empAgeList = list(
  "Age" = empAge
)

# Concatenation of list using the concatenation function
empList = c(empList, empAgeList)

cat("After concatenation of the new list\n")
print(empList)
Deleting components of a list

► To delete components of a list, first access those components and
then put a negative sign before their index. The negative index tells
R to drop that component from the result.

empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
cat("Before deletion the list is\n")
print(empList)
# Deleting a top level component
cat("After Deleting Total staff components\n")
print(empList[-3])
# Deleting an inner level component
cat("After Deleting sandeep from name\n")
print(empList[[2]][-2])
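Note that a negative index returns a new object and leaves the original list unchanged unless the result is assigned back. A sketch of both this and an alternative (assigning NULL to a component) follows; `trimmed` is an illustrative name.

```r
empList <- list(
  ID            = c(1, 2, 3, 4),
  Names         = c("Debi", "Sandeep", "Subham", "Shiba"),
  `Total Staff` = 4
)

# Negative indexing returns a copy without the third component
trimmed <- empList[-3]
length(trimmed)    # 2
length(empList)    # still 3, the original is untouched

# Assigning NULL removes the component from the list itself
empList[["Total Staff"]] <- NULL
names(empList)     # "ID" "Names"
```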
Merging list

► We can merge lists by placing all the lists into a single list.

# Create two lists.
lst1 <- list(1,2,3)
lst2 <- list("Sun","Mon","Tue")

# Merge the two lists.
new_list <- c(lst1,lst2)

# Print the merged list.
print(new_list)
Converting List to Vector

► Here we are going to convert a list to a vector. For this,
we will create a list first and then unlist the list into a
vector.

# Create lists.
lst <- list(1:5)
print(lst)

# Convert the lists to vectors.
vec <- unlist(lst)

print(vec)
R List to matrix

► We will create matrices using the matrix() function in R
programming. Another function that will be used is the unlist()
function, to convert the lists into a vector.

# Defining list
lst1 <- list(list(1, 2, 3),
             list(4, 5, 6))

# Print list
cat("The list is:\n")
print(lst1)
cat("Class:", class(lst1), "\n")

# Convert list to matrix
mat <- matrix(unlist(lst1), nrow = 2, byrow = TRUE)

# Print matrix
cat("\nAfter conversion to matrix:\n")
print(mat)
cat("Class:", class(mat), "\n")
Matrices

► Matrix is a rectangular arrangement of numbers in rows
and columns. In a matrix, as we know rows are the ones
that run horizontally and columns are the ones that run
vertically. In R programming, matrices are
two-dimensional, homogeneous data structures. These
are some examples of matrices:
► To create a matrix in R you need to use the function
called matrix(). The arguments to this matrix() are the
set of elements in the vector. You have to pass how
many numbers of rows and how many numbers of
columns you want to have in your matrix.
M = matrix( c('a','a','b','c','b','a'), nrow=2, ncol=3, byrow = TRUE)
print(M)
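The byrow argument decides the order in which the elements fill the matrix. A minimal sketch of the difference, with illustrative names `by_col` and `by_row`:

```r
# Default: elements fill the matrix column by column
by_col <- matrix(1:6, nrow = 2, ncol = 3)

# byrow = TRUE: elements fill the matrix row by row
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

by_col[1, ]   # first row is 1 3 5
by_row[1, ]   # first row is 1 2 3
```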
# R program to create a matrix
A = matrix(
  # Taking sequence of elements
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  # No of rows
  nrow = 3,
  # No of columns
  ncol = 3,
  # By default matrices are in column-wise order
  # So this parameter decides how to arrange the matrix
  byrow = TRUE
)
# Naming rows
rownames(A) = c("a", "b", "c")
# Naming columns
colnames(A) = c("c", "d", "e")
cat("The 3x3 matrix:\n")
print(A)
Creating special matrices

R allows the creation of various types of matrices through the
arguments passed to the matrix() function.

► Matrix where all rows and columns are filled by a single constant ‘k’:
To create such a matrix the syntax is given below:
Syntax: matrix(k, m, n)
Parameters:
k: the constant
m: no of rows
n: no of columns

# R program to illustrate
# special matrices

# Matrix having 3 rows and 3 columns
# filled by a single constant 5
print(matrix(5, 3, 3))
Diagonal matrix:

A diagonal matrix is a matrix in which the entries outside the main diagonal
are all zero. To create such a matrix the syntax is given below:
Syntax: diag(k, m, n)
Parameters:
k: the constants/array
m: no of rows
n: no of columns

# R program to illustrate
# special matrices

# Diagonal matrix having 3 rows and 3 columns
# filled by array of elements (5, 3, 3)
print(diag(c(5, 3, 3), 3, 3))
Identity matrix:

► A square matrix in which all the elements of the
principal diagonal are ones and all other elements are
zeros. To create such a matrix the syntax is given below:
Syntax: diag(k, m, n)
Parameters:
k: 1
m: no of rows
n: no of columns

# R program to illustrate
# special matrices

# Identity matrix having
# 3 rows and 3 columns
print(diag(1, 3, 3))
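As a side note, diag() called with a single whole number n also returns the n x n identity matrix. The sketch below compares the two forms and checks the defining property of an identity matrix; `I3` and `A` are illustrative names.

```r
# Shorthand: a single number n gives the n x n identity matrix
I3 <- diag(3)

# Same values as the three-argument form used above
all.equal(I3, diag(1, 3, 3))   # TRUE

# Multiplying by the identity matrix leaves a matrix unchanged
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)
all.equal(A %*% I3, A)         # TRUE
```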
Matrix metrics

Matrix metrics answer the following questions once a matrix is created:
► How can you know the dimension of the matrix?
► How can you know how many rows are there in the matrix?
► How many columns are in the matrix?
► How many elements are there in the matrix?

# R program to illustrate
# matrix metrics

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

cat("Dimension of the matrix:\n")
print(dim(A))

cat("Number of rows:\n")
print(nrow(A))

cat("Number of columns:\n")
print(ncol(A))

cat("Number of elements:\n")
print(length(A))
# OR
print(prod(dim(A)))
Accessing elements of a Matrix

► We can access elements in a matrix using the matrix name followed by
square brackets with a comma inside. The value before the comma selects
rows and the value after the comma selects columns. Let’s illustrate
this with some simple R code.
Accessing rows:

# R program to illustrate
# accessing rows in a matrix

# Create a 3x3 matrix
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

# Accessing first and second row


cat("Accessing first and second row\n")
print(A[1:2, ])
Accessing columns:

# R program to illustrate
# accessing columns in a matrix

# Create a 3x3 matrix
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

# Accessing first and second column


cat("Accessing first and second column\n")
print(A[, 1:2])
Accessing elements of a matrix:

# R program to illustrate
# accessing an entry in a matrix

# Create a 3x3 matrix
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

# Accessing 2
print(A[1, 2])

# Accessing 6
print(A[2, 3])
Accessing Submatrices:

► We can access a submatrix in a matrix using the colon (:) operator.

# R program to illustrate
# access submatrices in a matrix

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

cat("Accessing the first three rows and the first two columns\n")
print(A[1:3, 1:2])
Modifying elements of a Matrix

► In R you can modify the elements of the matrices by a direct
assignment.

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

# Editing the 3rd row and 3rd column element
# from 9 to 30
# by direct assignment
A[3, 3] = 30

cat("After editing the matrix\n")
print(A)
Matrix Concatenation

Matrix concatenation refers to the merging of rows or columns of an
existing matrix.
► Concatenation of a row: The concatenation of a row to a matrix is
done using rbind().

A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

# Creating another 1x3 matrix
B = matrix(
  c(10, 11, 12),
  nrow = 1,
  ncol = 3
)
cat("The 1x3 matrix:\n")
print(B)

# Add a new row using rbind()
C = rbind(A, B)

cat("After concatenation of a row:\n")
print(C)
Concatenation of a column:

► The concatenation of a column to a matrix is done using cbind().

A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("The 3x3 matrix:\n")
print(A)

# Creating another 3x1 matrix
B = matrix(
  c(10, 11, 12),
  nrow = 3,
  ncol = 1,
  byrow = TRUE
)
cat("The 3x1 matrix:\n")
print(B)

# Add a new column using cbind()
C = cbind(A, B)

cat("After concatenation of a column:\n")
print(C)
► Dimension inconsistency: Note that you have to make sure the
dimensions of the matrices are consistent before you do this matrix
concatenation.

# Create a 3x3 matrix
A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
cat("The 3x3 matrix:\n")
print(A)

# Creating another 1x3 matrix
B = matrix(
  c(10, 11, 12),
  nrow = 1,
  ncol = 3
)
cat("The 1x3 matrix:\n")
print(B)

# This will give an error
# because of dimension inconsistency
C = cbind(A, B)

cat("After concatenation of a column:\n")
print(C)
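A sketch of how the mismatch could be handled: tryCatch() captures the cbind() error so the script keeps going, and rbind() is shown as the call that does fit a 1x3 matrix under a 3x3 one. The fallback message text is made up for illustration.

```r
A <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
B <- matrix(c(10, 11, 12), nrow = 1, ncol = 3)

# cbind() needs the same number of rows; A has 3 and B has 1
res <- tryCatch(
  cbind(A, B),
  error = function(e) "cbind failed: the row counts differ"
)
print(res)

# rbind() needs the same number of columns; both have 3
C <- rbind(A, B)
dim(C)   # 4 3
```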
Deleting rows and columns of a Matrix

► To delete a row or a column, first access that row or column
and then put a negative sign before its index. The negative index
tells R to drop that row or column.
► Row deletion:

# Create a 3x3 matrix
A = matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE
)
cat("Before deleting the 2nd row\n")
print(A)
# 2nd-row deletion
A = A[-2, ]
cat("After deleting the 2nd row\n")
print(A)
Column deletion:

A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
cat("Before deleting the 2nd column\n")
print(A)

# 2nd-column deletion
A = A[, -2]

cat("After deleted the 2nd column\n")


print(A)
► Make a 3*3 matrix of the letters a to i.
► Check its dimensions, number of rows, number of columns and length.
► Create a diagonal matrix with 5 as the constant.
► Create a diagonal matrix with 5 and 3 as the constants.
► Create a 1*3 matrix of letters.
► Create a 3*1 matrix of the next letters.
► Combine them.
Operations on Matrices

► There are four basic operations, i.e. DMAS (Division, Multiplication,
Addition, Subtraction), that can be done with matrices. Both
matrices involved in the operation should have the same number
of rows and columns.
► Order of a Matrix: The order of a matrix is defined in terms of its
number of rows and columns.
Order of a matrix = No. of rows × No. of columns
For example, a matrix with 3 rows and 3 columns is a matrix of order 3 × 3.
Matrices Addition

► The addition of two same-ordered matrices M (r × c) and N (r × c)
yields a matrix R (r × c) where every element is the sum of
corresponding elements of the input matrices.
► In the code, nrow(B) gives the number of rows in B and ncol(B)
gives the number of columns. Here, sum is an empty matrix of the same
size as B and C. The elements of sum are the addition of the
corresponding elements of B and C through nested for loops.

# Creating 1st Matrix
B = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)

# Getting number of rows and columns
num_of_rows = nrow(B)
num_of_cols = ncol(B)

# Creating matrix to store results
sum = matrix(, nrow = num_of_rows, ncol = num_of_cols)

# Printing Original matrices
print(B)
print(C)
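The nested for loops mentioned in the description are not shown in the listing above. One way they might look is sketched below; this is an assumption about the missing part, and the result matrix is called `total` here (rather than sum) to avoid masking R's built-in sum() function.

```r
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
C <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)

# Empty (NA-filled) matrix of the same size to hold the result
total <- matrix(NA_real_, nrow = nrow(B), ncol = ncol(B))

# Visit every row i and column j, adding the matching elements
for (i in seq_len(nrow(B))) {
  for (j in seq_len(ncol(B))) {
    total[i, j] <- B[i, j] + C[i, j]
  }
}

print(total)
```

In practice the + operator does the same work in a single step, so the loop is mainly useful for seeing what element-wise addition means.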
Using ‘+’ operator for matrix addition:

# R program for matrix addition


# using '+' operator

# Creating 1st Matrix


B = matrix(c(1, 2 + 3i, 5.4, 3, 4, 5), nrow = 2, ncol = 3)

# Creating 2nd Matrix


C = matrix(c(2, 0i, 0.1, 3, 4, 5), nrow = 2, ncol = 3)

# Printing the resultant matrix


print(B + C)
Properties of Matrix Addition:

► Commutative: B + C = C + B
► Associative: For matrices A, B and C, A + (B + C) = (A + B) + C
► Order of the matrices involved must be same.
Matrix Subtraction

► The subtraction of two same-ordered matrices M (r × c) and N (r × c)
yields a matrix R (r × c) where every element is the difference of
corresponding elements of the second input matrix from the first.

# R program to subtract two matrices

# Creating 1st Matrix
B = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)

# Getting number of rows and columns
num_of_rows = nrow(B)
num_of_cols = ncol(B)

# Creating matrix to store results
diff = matrix(B-C, nrow = num_of_rows, ncol = num_of_cols)

# Printing Original matrices
print(B)
print(C)
Using ‘-‘ operator for matrix
subtraction:

# R program for matrix subtraction


# using '-' operator

# Creating 1st Matrix


B = matrix(c(1, 2 + 3i, 5.4, 3, 4, 5), nrow = 2, ncol = 3)

# Creating 2nd Matrix


C = matrix(c(2, 0i, 0.1, 3, 4, 5), nrow = 2, ncol = 3)

# Printing the resultant matrix


print(B - C)
Properties of Matrix Subtraction:

► Non-Commutative: B – C != C – B
► Non-Associative: For matrices A, B and C, A – (B – C) != (A – B) – C
► Order of the matrices involved must be same.
Matrices Multiplication

► The multiplication of two same-ordered matrices M (r × c) and
N (r × c) yields a matrix R (r × c) where every element is the
product of corresponding elements of the input matrices.

# R program to multiply two matrices

# Creating 1st Matrix
B = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)

# Getting number of rows and columns
num_of_rows = nrow(B)
num_of_cols = ncol(B)

# Creating matrix to store results
prod = matrix(, nrow = num_of_rows, ncol = num_of_cols)

# Printing Original matrices
print(B)
print(C)
Using ‘*’ operator for matrix
multiplication:

# R program for matrix multiplication


# using '*' operator

# Creating 1st Matrix


B = matrix(c(1, 2 + 3i, 5.4), nrow = 1, ncol = 3)

# Creating 2nd Matrix


C = matrix(c(2, 1i, 0.1), nrow = 1, ncol = 3)

# Printing the resultant matrix


print (B * C)
Properties of Matrix Multiplication:

► Commutative: B * C = C * B
► Associative: For n number of matrices A * (B * C) = (A *
B) * C
► Order of the matrices involved must be same.
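The properties listed here describe element-wise multiplication with the '*' operator. R also has a separate operator, %*%, for the algebraic (row-by-column) matrix product, which is generally not commutative. The sketch below contrasts the two; the matrices A and B are made-up values used only for illustration.

```r
# Two 2x2 matrices, filled column by column (illustrative values)
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2)

# '*' multiplies the elements that sit at the same position
elementwise <- A * B

# '%*%' multiplies rows of A by columns of B
algebraic <- A %*% B

print(elementwise)
print(algebraic)

# The element-wise product commutes, the algebraic product usually does not
print(identical(A * B, B * A))
print(identical(A %*% B, B %*% A))
```

For %*% the requirement is different from the one above: the number of columns of the first matrix must equal the number of rows of the second.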
Matrices Division

► The division of two same-ordered matrices M (r × c) and N (r × c) yields a matrix R (r × c) where every element is the quotient of the corresponding elements: the element of the first matrix divided by that of the second.

# R program to divide two matrices

# Creating 1st Matrix
B = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Creating 2nd Matrix
C = matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)

# Getting number of rows and columns
num_of_rows = nrow(B)
num_of_cols = ncol(B)

# Creating matrix to store results
div = matrix(B / C, nrow = num_of_rows, ncol = num_of_cols)

# Printing Original matrices
print(B)
print(C)
Using ‘/’ operator for matrix division:

# R program for matrix division


# using '/' operator

# Creating 1st Matrix


B = matrix(c(4, 6i, -1), nrow = 1, ncol = 3)

# Creating 2nd Matrix


C = matrix(c(2, 2i, 0), nrow = 1, ncol = 3)

# Printing the resultant matrix


print (B / C)
Properties of Matrix Division:

► Non-Commutative: B / C != C / B
► Non-Associative: For n number of matrices A / (B / C) !=
(A / B) / C
► Order of the matrices involved must be same.
Arrays

► Arrays are essential data storage structures defined by a fixed number of dimensions.
Arrays are used for the allocation of space at contiguous memory locations.
Uni-dimensional arrays are called vectors, with the length being their only dimension.
Two-dimensional arrays are called matrices, consisting of fixed numbers of rows and
columns. All elements of an array have the same data type. One or more vectors are
supplied as input to the array() function, which arranges their values according to
the requested dimensions.
► While matrices are confined to two dimensions, arrays can have any number of
dimensions. The array() function takes a dim argument that sets the size of each
dimension. In the example below we create an array made of two 3x3 matrices; the
two input values are recycled to fill all 18 positions.

a <- array(c('green','yellow'),dim=c(3,3,2))
print(a)
Creating an Array

An array in R can be created with the array() function. The elements
are passed to array() along with the required dimensions. Syntax:
array(data, dim = c(nrow, ncol, nmat), dimnames = names)
where, nrow : Number of rows; ncol : Number of columns
nmat : Number of matrices of dimensions nrow * ncol
dimnames : Default value = NULL.
Uni-Dimensional Array
A vector is a uni-dimensional array, which is specified by a
single dimension, length. A vector can be created using the
c() function. A list of values is passed to the c() function to
create a vector.

vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print (vec1)

# cat is used to concatenate
# strings and print it.
cat ("Length of vector : ", length(vec1))
Multi-Dimensional Array

► A two-dimensional matrix is an array specified by a fixed
number of rows and columns, each containing the
same data type. A matrix is created by
using array() function to which the values and the
dimensions are passed.

# arranges data from 2 to 13
# in two matrices of dimensions 2x3
arr = array(2:13, dim = c(2, 3, 2))
print(arr)
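It can help to see exactly where each value lands when array() fills its dimensions. The sketch below reuses the same call and checks a few positions; the variable names are only illustrative. R fills arrays column by column within each matrix, and finishes the first matrix before starting the second.

```r
arr <- array(2:13, dim = c(2, 3, 2))

# dim() reports rows, columns and number of matrices
d <- dim(arr)

# Column-major fill: the second value goes to row 2, column 1
second_value <- arr[2, 1, 1]

# The third value starts the second column of the first matrix
third_value <- arr[1, 2, 1]

# The seventh value is the first element of the second matrix
seventh_value <- arr[1, 1, 2]

print(d)
print(c(second_value, third_value, seventh_value))
```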
Vectors of different lengths can also be fed as input into
the array() function. However, the total number of
elements in all the vectors combined should be equal
to the number of elements in the matrices. The elements
are arranged in the order in which they are specified in
the function.

vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
vec2 <- c(10, 11, 12)

# elements are combined into a single vector,
# vec1 elements followed by vec2 elements.
arr = array(c(vec1, vec2), dim = c(2, 3, 2))
print (arr)
Naming of Arrays

► The row names, column names and matrix names are each specified as a
vector whose length equals the number of rows, columns and matrices
respectively. By default, the rows, columns and matrices are named by
their index values.

row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")

# the naming of the various elements
# is specified in a list and
# fed to the function
# (2 x 3 x 2 = 12 values are needed, so the data is 2:13)
arr = array(2:13, dim = c(2, 3, 2),
            dimnames = list(row_names,
                            col_names,
                            mat_names))
print (arr)
Accessing arrays

► The arrays can be accessed by using indices for different dimensions
separated by commas. Different components can be specified by any
combination of elements’ names or positions.
► Accessing Uni-Dimensional Array: The elements can be accessed by
using indexes of the corresponding elements.

vec <- c(1:10)

# accessing entire vector
cat ("Vector is : ", vec)

# accessing elements
cat ("Third element of vector is : ", vec[3])
Accessing entire matrices
vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
vec2 <- c(10, 11, 12)
row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")
arr = array(c(vec1, vec2), dim = c(2, 3, 2),
dimnames = list(row_names,
col_names, mat_names))

# accessing matrix 1 by index value


print ("Matrix 1")
print (arr[,,1])

# accessing matrix 2 by its name


print ("Matrix 2")
print(arr[,,"Mat2"])
Accessing specific rows and columns of matrices

► Rows and columns can also be accessed by both names as well as indices.

vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
vec2 <- c(10, 11, 12)
row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")
arr = array(c(vec1, vec2), dim = c(2, 3, 2),
            dimnames = list(row_names,
                            col_names,
                            mat_names))

# accessing the 1st column of matrix 1 by index values
print ("1st column of matrix 1")
print (arr[, 1, 1])

# accessing the 2nd row of matrix 2 by names
print ("2nd row of matrix 2")
print(arr["row2",,"Mat2"])
Accessing elements individually

► Elements can be accessed by using both the row and column numbers or names.

vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
vec2 <- c(10, 11, 12)
row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")
arr = array(c(vec1, vec2), dim = c(2, 3, 2),
            dimnames = list(row_names, col_names, mat_names))

# accessing an element of matrix 1 by a mix of index and name
print ("2nd row 3rd column matrix 1 element")
print (arr[2, "col3", 1])

# accessing an element of matrix 2 by names only
print ("2nd row 1st column element of matrix 2")
print(arr["row2", "col1", "Mat2"])
Accessing subset of array elements

► A smaller subset of the array elements can be accessed by defining a range
of row or column limits.

row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3", "col4")
mat_names <- c("Mat1", "Mat2")
# 2 x 4 x 2 = 16 values are needed to fill the array
arr = array(1:16, dim = c(2, 4, 2),
            dimnames = list(row_names,
                            col_names, mat_names))

# print elements of both the rows and columns 2 and 3 of matrix 1
print (arr[, c(2, 3), 1])
Example

► Make a vector from range 2 to 10


► Add 11 in the starting and 12 in the end using
c() fn
► Add 13 using append
► Add 14,15,16 using length
► Add 17 after 5th value using append
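One possible way to work through the steps of this exercise is sketched below. The variable names v and n are arbitrary, and other solutions are equally valid.

```r
# Make a vector from the range 2 to 10
v <- 2:10

# Add 11 at the start and 12 at the end using c()
v <- c(11, v, 12)

# Add 13 at the end using append()
v <- append(v, 13)

# Add 14, 15, 16 by assigning past the current length
n <- length(v)
v[(n + 1):(n + 3)] <- c(14, 15, 16)

# Add 17 after the 5th value using append()
v <- append(v, 17, after = 5)

print(v)
```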
Adding elements to array

► Elements can be appended at different positions in the
array. The sequence of elements is retained in order of their
addition to the array. The time complexity required to add
new elements is O(n) where n is the length of the array. The
length of the array increases by the number of element
additions. There are various in-built functions available in R to
add new values:
► c(vector, values): c() function allows us to append values to
the end of the array. Multiple values can also be added
together.
► append(vector, values): This method allows the values
to be appended at any position in the vector. By
default, this function adds the element at end.
► append(vector, values, after=length(vector)) adds the new
values after the position given in the last argument of the
function.
► Using the length function of the array:
Elements can be added at length+x indices where x>0
# creating a uni-dimensional array
x <- c(1, 2, 3, 4, 5)

# addition of element using c() function
x <- c(x, 6)
print ("Array after 1st modification ")
print (x)

# addition of element using append function
x <- append(x, 7)
print ("Array after 2nd modification ")
print (x)

# adding elements after computing the length
len <- length(x)
x[len + 1] <- 8
print ("Array after 3rd modification ")
print (x)

# adding on length + 3 index
x[len + 3] <- 9
print ("Array after 4th modification ")
print (x)

# append a vector of values to the array after length + 3 of array
print ("Array after 5th modification")
x <- append(x, c(10, 11, 12), after = length(x) + 3)
print (x)

# adds new elements after 3rd index
print ("Array after 6th modification")
x <- append(x, c(-1, -1), after = 3)
print (x)
► The original length of the array was 7, and after third
modification elements are present till the 8th index
value. Now, at the fourth modification, when we add
element 9 at the tenth index value, the R’s inbuilt
function automatically adds NA at the missing value
positions.
► At 5th modification, the array of elements [10, 11, 12]
are added beginning from the 11th index.
At 6th modification, array [-1, -1] is appended after the
third position in the array.
Removing Elements from Array

► Elements can be removed from arrays in R, either one at a time
or several together. A logical condition is written inside the index
brackets: the array values satisfying the condition are
retained and the rest are removed. The comparison for removal is
based on array values. Multiple conditions can also be
combined to remove a range of elements. Another
way to remove elements is the %in% operator, which returns TRUE
for every element whose value belongs to a given set; negating
it keeps only the elements outside that set.
# creating an array of length 9
m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print ("Original Array")
print (m)

# remove a single value element:3 from array
m <- m[m != 3]
print ("After 1st modification")
print (m)

# removing elements based on condition
# where the element should be
# greater than 2 and less than equal to 8
m <- m[m > 2 & m <= 8]
print ("After 2nd modification")
print (m)

# remove sequence of elements using another array
remove <- c(4, 6, 8)

# check which element satisfies the remove property
print (m %in% remove)
print ("After 3rd modification")
print (m [! m %in% remove])
At 1st modification, all the element values that are not equal to 3 are retained. At 2nd modification,
the range of elements that are between 2 and 8 are retained, rest are removed. At 3rd
modification, the elements satisfying the FALSE value are printed, since the condition involves the
NOT operator.
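The removals above are all driven by element values. Elements can also be removed by position using negative indices, as in the minimal sketch below; the vector and variable names are made up for illustration.

```r
m <- c(10, 20, 30, 40, 50)

# A negative index drops the element at that position
without_second <- m[-2]

# Several positions can be dropped at once
without_ends <- m[-c(1, length(m))]

# which() turns a condition into positions, which can then be dropped
pos <- which(m > 25)
kept <- m[-pos]

print(without_second)
print(without_ends)
print(kept)
```

One caveat with the last form: if no element matches, which() returns an empty vector and m[-pos] is then empty as well, so the logical form m[!(m > 25)] is usually the safer choice.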
Updating Existing Elements of Array

► The elements of the array can be updated with new
values by assignment of the desired index of the array
with the modified value. The changes are retained in
the original array. If the index value to be updated is
within the length of the array, then the value is
changed, otherwise, the new element is added at the
specified index. Multiple elements can also be updated
at once, either with the same element value or multiple
values in case the new values are specified as a vector.
# creating an array of length 9
m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print ("Original Array")
print (m)

# updating single element
m[1] <- 0
print ("After 1st modification")
print (m)

# updating sequence of elements
m[7:9] <- -1
print ("After 2nd modification")
print (m)

# updating two indices with two different values
m[c(2, 5)] <- c(-1, -2)
print ("After 3rd modification")
print (m)

# this adds a new element to the array
m[10] <- 10
print ("After 4th modification")
print (m)

At 2nd modification, the elements at indexes 7 to 9
are updated with -1 each. At 3rd modification, the
second element is replaced by -1 and fifth element
by -2 respectively. At 4th modification, a new
element is added since 10th index is greater than
the length of the array.
Factors

► Factors are the R-objects which are created using a vector. It stores the vector
along with the distinct values of the elements in the vector as labels.
► The labels are always character irrespective of whether it is numeric or character
or Boolean etc. in the input vector. They are useful in statistical modeling.
► Factors are created using the factor() function. The nlevels() function gives the
count of levels.
apple_colors <- c('green','green','yellow','red','red','red','green')
factor_apple <- factor(apple_colors)
factor_apple
nlevels(factor_apple)
► Factors in R Programming Language are data structures that
are implemented to categorize the data or represent
categorical data and store it on multiple levels.
► They can be stored as integers with a corresponding label to
every unique integer. Though factors may look similar to
character vectors, they are integers and care must be taken
while using them as strings. The factor accepts only a restricted
number of distinct values. For example, a data field such as
gender may contain values only from female, male, or
transgender.
► In the above example, all the possible cases are known
beforehand and are predefined. These distinct values
are known as levels. After a factor is created it only
consists of levels that are by default sorted
alphabetically.
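Because a factor stores integer codes rather than the original values, converting a factor of number-like strings directly to numbers returns the codes. The sketch below illustrates this with a made-up factor named sizes; it is an illustration of the point above, not part of the original slides.

```r
# A factor built from number-like strings (illustrative data)
sizes <- factor(c("10", "5", "10", "20"))

# Levels are sorted as text, not as numbers
print(levels(sizes))

# Direct conversion returns the internal integer codes
codes <- as.integer(sizes)
print(codes)

# Converting through character recovers the original values
values <- as.numeric(as.character(sizes))
print(values)
```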
Attributes of Factors in R Language

► x: The vector that needs to be converted into a factor.
► levels: The set of distinct values that the input vector x may take.
► labels: A character vector of display labels, one for each level.
► exclude: The values to be excluded when the levels are formed.
► ordered: A logical flag that decides whether the levels are ordered.
► nmax: An upper bound on the number of levels.
Creating a Factor in R Programming
Language

► The command used to create or modify a factor in R
language is factor(), with a vector as input.
The two steps to creating a factor are:
► Creating a vector
► Converting the vector created into a factor using
function factor()
Let us create a factor gender with
levels female and male.

# Creating a vector
x <- c("female", "male", "male", "female")
print(x)

# Converting the vector x into a factor
# named gender
gender <- factor(x)
print(gender)
Levels can also be predefined by the
programmer.

# Creating a factor with levels defined by programmer


gender <- factor(c("female", "male", "male", "female"),
levels = c("female", "transgender", "male"));
gender
Checking for a Factor in R

► The function is.factor() is used to check whether the
variable is a factor and returns “TRUE” if it is a factor.

gender <- factor(c("female", "male", "male", "female"));
print(is.factor(gender))

► Function class() is also used to check whether the
variable is a factor and if true returns “factor”.

gender <- factor(c("female", "male", "male", "female"));
class(gender)
Accessing elements of a Factor in R

► Like we access elements of a vector, the same way we
access the elements of a factor. If gender is a factor
then gender[i] would mean accessing ith element in the
factor.

gender <- factor(c("female", "male", "male", "female"));
gender[3]

► More than one element can be accessed at a time.

gender <- factor(c("female", "male", "male", "female"));
gender[c(2,4)]
Modification of a Factor in R

► After a factor is formed, its components can be modified, but the new values
that are assigned must be among the predefined levels.
gender <- factor(c("female", "male", "male", "female" ));
gender[2]<-"female"
gender
► To select all the elements of the factor gender except the ith element, use
gender[-i]. To assign a value that is not among the predefined levels, first
add it to the levels.
gender <- factor(c("female", "male", "male", "female" ));
# add new level
levels(gender) <- c(levels(gender), "other")
gender[3] <- "other"
gender
Factors in Data Frame

The data frame is similar to a 2D array with the columns containing all
the values of one variable and the rows having one set of values from
every column. There are a few things to remember about data frames:
► Column names are compulsory and cannot be empty.
► Row names must be unique.
► Columns typically hold factor, numeric, or character data (other
vector types such as logical are also allowed).
► The same number of data items must be present in each column.
► When a data frame is created with stringsAsFactors = TRUE, character
columns are treated as categorical data and converted to factors
automatically. This was the default before R 4.0.0; from R 4.0.0 onward the
default is FALSE, so it has to be requested explicitly.
We can create a data frame and check if its column is a factor.

age <- c(40, 49, 48, 40, 67, 52, 53)
salary <- c(103200, 106200, 150200,
            10606, 10390, 14070, 10220)
gender <- c("male", "male", "transgender",
            "female", "male", "female", "transgender")
employee <- data.frame(age, salary, gender, stringsAsFactors = TRUE)
print(employee)
print(is.factor(employee$gender))
Data frames

► R Programming Language is an open-source
programming language that is widely used as a
statistical software and data analysis tool. Data frames
in R are generic data objects which are
used to store tabular data. Data frames can also be
thought of as matrices in which each column
can have a different data type. A data frame is made
up of three principal components: the data, rows, and
columns.
Data Frames

► Data frames are tabular data objects. Unlike a matrix in data frame each column
can contain different modes of data. The first column can be numeric while the
second column can be character and third column can be logical. It is a list of
vectors of equal length. Data Frames are created using the data.frame() function.
BMI <- data.frame( gender = c("Male", "Male","Female"),
height = c(152, 171.5, 165),
weight = c(81,93, 78),
Age =c(42,38,26)
)
print(BMI)
Create Dataframe in R Programming Language

► To create a data frame in R, use the data.frame() command and pass
each of the vectors you have created as arguments to the function.

# R program to create dataframe

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# print the data frame
print(friend.data)
Get the Structure of the R – Data Frame

► One can get the structure of the data frame using the str() function in R. It
can display even the internal structure of large lists which are nested. It
provides one-liner output for the basic R objects, letting the user know about
the object and its constituents.

# R program to get the
# structure of the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# using str(); it prints the structure itself and returns NULL,
# so wrapping it in print() is not needed
str(friend.data)
Summary of data in the data frame

► In an R data frame, the statistical summary and nature of the data can be
obtained by applying the summary() function. It is a generic function used to
produce summaries of the results of various model fitting functions. The
function invokes particular methods which depend on the class of the first
argument.

# R program to get the
# summary of the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# using summary()
print(summary(friend.data))
Extract Data from Data Frame in R Language

► Extracting data from a data frame means accessing its rows or columns.
One can extract a specific column from a data frame using its column name.

# R program to extract
# data from the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)

# Extracting friend_name column
result <- data.frame(friend.data$friend_name)
print(result)
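Besides the $ operator, rows and columns can be picked out with square brackets. The sketch below shows a few common forms on the same friend.data example; the result variable names are only illustrative.

```r
friend.data <- data.frame(
  friend_id = 1:5,
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

# [[ ]] returns the column itself as a plain vector
names_vec <- friend.data[["friend_name"]]

# [rows, columns] with an empty column slot keeps all columns
first_two <- friend.data[1:2, ]

# A logical condition selects rows; a column name selects the column
late_names <- friend.data[friend.data$friend_id > 3, "friend_name"]

print(names_vec)
print(first_two)
print(late_names)
```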
Expand Data Frame

► A data frame in R can be expanded by adding new columns and rows to
the already existing data frame.

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)

# Expanding data frame
friend.data$location <- c("Kolkata", "Delhi",
                          "Bangalore", "Hyderabad",
                          "Chennai")
resultant <- friend.data
# print the modified data frame
print(resultant)
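The example above adds a column. Rows can be added with rbind(), provided the new rows have the same column names as the existing data frame. The following is a minimal sketch with a shortened, made-up version of the data.

```r
# A small data frame to start from (illustrative data)
friend.data <- data.frame(
  friend_id = 1:2,
  friend_name = c("Sachin", "Sourav"),
  stringsAsFactors = FALSE
)

# The new row must use the same column names
new_row <- data.frame(
  friend_id = 3L,
  friend_name = "Dravid",
  stringsAsFactors = FALSE
)

# rbind() stacks the rows of the second data frame under the first
expanded <- rbind(friend.data, new_row)
print(expanded)
```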
► In R, data frames are generic data objects
which are used to store tabular data. Data frames
are considered to be the most popular data objects in
R programming because it is more comfortable to
analyze data in tabular form. Data frames can
also be thought of as matrices in which each column
can have a different data type. A data frame is
made up of three principal components: the data,
rows, and columns.
Merging Data frames
In R we use the merge() function to merge two data frames. It is part of
base R; the dplyr package offers a separate family of join functions
(for example semi_join() and anti_join()). The most important condition
for joining two data frames is that the key columns on which the merging
happens have the same type. merge() works much like a join in a DBMS.
The types of merging available in R are:
1.Natural Join or Inner Join
2.Left Outer Join
3.Right Outer Join
4.Full Outer Join
5.Cross Join
6.Semi Join
7.Anti Join
Basic syntax of merge() function in R:

► Syntax:
merge(x, y, by, by.x, by.y, all, all.x, all.y, sort = TRUE)
► Parameters:
x: one dataframe
y: another dataframe
by, by.x, by.y: The names of the key columns. by is used when the key has
the same name in both data frames; by.x and by.y are used when the names differ.
all, all.x, all.y: Logical values that specify which type of merge is
performed.
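When the key column is named differently in the two data frames, by.x and by.y name the key on each side. The sketch below uses two small made-up data frames (students and cities are illustrative names, not part of the slides).

```r
# Key column is called StudentId here ...
students <- data.frame(
  StudentId = c(101, 102, 103),
  Subject = c("Hindi", "English", "Maths")
)

# ... and Id here
cities <- data.frame(
  Id = c(102, 103, 104),
  State = c("Mysore", "Pune", "Delhi")
)

# Inner join on StudentId (left) and Id (right)
joined <- merge(x = students, y = cities,
                by.x = "StudentId", by.y = "Id")
print(joined)
```

The result keeps the key under the name used in x, followed by the remaining columns of x and then those of y.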
► First of all, we will create two dataframes that will help
us to understand each join easily.

df1 = data.frame(StudentId = c(101:106),
                 Product = c("Hindi", "English",
                             "Maths", "Science",
                             "Political Science",
                             "Physics"))
df1

df2 = data.frame(StudentId = c(102, 104, 106,
                               107, 108),
                 State = c("Manglore", "Mysore",
                           "Pune", "Dehradun", "Delhi"))
df2
Natural Join or Inner Join

► Inner join keeps only those rows that are matched in both
data frames; it corresponds to the argument all = FALSE, which is the
default. In terms of set theory, it performs the intersection
operation. For example:
► A = [1, 2, 3, 4, 5] B = [2, 3, 5, 6] Then the output of natural join will be
(2, 3, 5)
► It is the simplest and most common type of join available in R. Now
let us try to understand this using R program:
# Joining of dataframes

df = merge(x = df1, y = df2, by = "StudentId")


df
Left Outer Join

► Left Outer Join includes all the rows of your
dataframe x and only those from y that match; for this, we
specify the argument all.x = TRUE. In terms of basic set
theory, we are displaying the complete set x. Now let us try to
understand this using R program:
# R program to illustrate
# Joining of dataframes

df = merge(x = df1, y = df2, by = "StudentId",


all.x = TRUE)
df
Right Outer Join

► Right Outer Join includes all the rows of
your dataframe y and only those from x that match; for
this, we specify the argument all.y = TRUE. In terms of
basic set theory, we are displaying the complete set y.
Now let us try to understand this using R program:
# Joining of dataframes

df = merge(x = df1, y = df2, by = "StudentId",


all.y = TRUE)
df
Full Outer Join

► Full Outer Join keeps all rows from both
dataframes; for this, we specify the argument all =
TRUE. In terms of basic set theory, we are performing
the union operation.
Now let us try to understand this using R program:

# Joining of dataframes

df = merge(x = df1, y = df2, by = "StudentId",


all = TRUE)
df
Cross Join

► A Cross Join, also known as a cartesian join, results in every
row of one dataframe being joined to every row
of the other dataframe. In set theory, this type of join is
known as the cartesian product of two sets. Now
let us try to understand this using R program:

df = merge(x = df1, y = df2, by = NULL)


df
Semi Join

► This join is somewhat like an inner join, except that only the columns
and values of the left data frame are returned. Now let
us try to understand this using R program:
# R program to illustrate
# Joining of dataframes

# Import required library


library(dplyr)

df = df1 %>% semi_join(df2, by = "StudentId")


df
Anti Join

► In terms of set theory, an anti-join is the set difference
operation. For example, if A = (1, 2, 3, 4) and B = (2, 3, 5) then the
output of A-B will be the set (1, 4). This join is somewhat like df1 – df2,
as it selects all rows from df1 that are not
present in df2. Now let us try to understand this using R program:
# Import required library
library(dplyr)

df = df1 %>% anti_join(df2, by = "StudentId")


df
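If dplyr is not available, the same semi join and anti join results can be approximated in base R by testing key membership with %in%. This is a sketch of an equivalent, not how dplyr implements the joins; the names in_df2, semi and anti are illustrative.

```r
df1 <- data.frame(
  StudentId = 101:106,
  Product = c("Hindi", "English", "Maths", "Science",
              "Political Science", "Physics")
)
df2 <- data.frame(
  StudentId = c(102, 104, 106, 107, 108),
  State = c("Manglore", "Mysore", "Pune", "Dehradun", "Delhi")
)

# TRUE for every row of df1 whose key also appears in df2
in_df2 <- df1$StudentId %in% df2$StudentId

# Semi join: rows of df1 that have a match, df1 columns only
semi <- df1[in_df2, ]

# Anti join: rows of df1 that have no match in df2
anti <- df1[!in_df2, ]

print(semi)
print(anti)
```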
Matrix v/s Dataframe
Any Function
The any function in R tells you whether ANY element of your vector meets
a given condition. It returns either TRUE or FALSE. To demonstrate this function,
let's create a quick vector that goes from -3 to 5, incrementing by 1.
y <- seq(-3, 5, by = 1)
Now we can use the any function. There are several ways to use this; the
simplest is to enter the function and provide the condition. In our case, we'll
check for any negative numbers (i.e., y < 0):
any(y < 0)
Since you only have to provide the name of the vector and the condition, this
is the implicit iteration. After entering the previous code and hitting Enter, R will
display the following:
[1] TRUE
This means that the vector y contains negative values.
Another option is to create an if statement to check for any
negative values in the vector.
if(any(y < 0)) cat("Negative Values Found")
Instead of TRUE, the result will now display the message defined
in the if statement.
So far we have determined if any value is negative; next we
can check if ALL values meet the condition.
All function

We can run a test to see if all values meet a condition. Since R is
a tool for statistics and data science, you may not know what
values you have in a given vector.
The following code is a bit advanced, but it creates a sample from a
normal distribution; it is displayed here to demonstrate the generation of
a sequence in which you don't know the end result.
range(q <- sort(round(stats::rnorm(15), 1)))
Now we can check to see if ALL the numbers are negative.
Here the vector is small enough to inspect by eye, but in real
data analysis that is often not possible. This is where
the all function comes in handy: We can use it just as we
used the any function.
if(all(q < 0)) cat("All Negative Values Found")
In our case, the result of all(q < 0) is almost certainly FALSE
(the values are random), and our little message won't display.
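Missing values change how any() and all() behave, which matters when the contents of a vector are not known in advance. The sketch below uses made-up vectors v and w to show the cases; na.rm is a standard argument of both functions.

```r
v <- c(-2, NA, 3)

# One TRUE is enough for any(), even with an NA present
any_neg <- any(v < 0)

# One FALSE is enough for all(), even with an NA present
all_neg <- all(v < 0)

w <- c(-2, NA)

# Here the NA could go either way, so the result is NA
undecided <- all(w < 0)

# na.rm = TRUE drops missing values before testing
decided <- all(w < 0, na.rm = TRUE)

print(c(any_neg, all_neg, undecided, decided))
```

Because if() cannot work with NA as its condition, passing na.rm = TRUE (or checking with is.na() first) is a sensible precaution before using these results in an if statement.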
Example

► employee data frame


► structure and summary
► access age & salary from employee
► add basic salary package 10000,12000,14000,16000,18000
► second data frame need to be created emp2 combination of employee
id, location, cab
► Merge the two dfs
► both matching keep employee df first & emp2 dataframe first
Taking Input from User in R
Programming
Developers often need to interact with users, either to get
data or to provide some sort of result. Most programs today use a
dialog box as a way of asking the user to provide some type of
input. As in other programming languages, it is also possible in R to
take input from the user. There are two methods for doing so in R.

► Using readline() method
► Using scan() method
Using readline() method
► In R, the readline() method takes input in string format. If one inputs
an integer, say 255, it is read as the string "255". So one needs to
convert the input value to the required format; in this case, the string "255" is
converted to the integer 255. To convert the input value to the desired
data type, there are some functions in R:
► as.integer(n) -> convert to integer
► as.numeric(n) -> convert to numeric type (float, double etc)
► as.complex(n) -> convert to complex number (i.e 3+2i)
► as.Date(n) -> convert to date …, etc
► One can also show a message in the console window to tell the user what
to input. To do this, use the argument named prompt inside the readline()
function. (This argument is unrelated to the separate prompt() function,
which generates documentation files.) The prompt argument is optional.
► var1 = readline(prompt = "Enter any number : ");
or,
var1 = readline("Enter any number : ");

var1 = readline("Enter any Number: ")
var1 <- as.integer(var1)
var1
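as.integer() returns NA (with a warning) when the text is not a number, and silently truncates text such as "2.5". A small validation step can make this explicit. The helper parse_int() below is hypothetical, written only for this illustration; it is not a base R function.

```r
# Hypothetical helper: returns an integer, or NA if the text
# is not a whole number
parse_int <- function(txt) {
  txt <- trimws(txt)
  if (grepl("^[+-]?[0-9]+$", txt)) as.integer(txt) else NA_integer_
}

ok <- parse_int("255")
padded <- parse_int(" 42 ")
bad <- parse_int("abc")

print(c(ok, padded, bad))

# readline() only waits for input in an interactive session
if (interactive()) {
  n <- parse_int(readline("Enter any number : "))
  if (is.na(n)) cat("That was not a whole number\n") else print(n)
}
```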
Taking multiple inputs in R

► Taking multiple inputs in R is the same as taking a single input; one just needs
multiple readline() calls. The calls can be grouped inside braces so that they run as one block.
► var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
or,
{
var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
}

# converting each value


var1 = as.integer(var1);
var2 = as.integer(var2);
var3 = as.integer(var3);
var4 = as.integer(var4);

# print the sum of the 4 number


print(var1 + var2 + var3 + var4)
► string:
var1 = readline(prompt = "Enter your name : ");

► character:
var1 = readline(prompt = "Enter any character : ");
var1 = as.character(var1)
# string input
var1 = readline(prompt = "Enter your name : ");

# character input
var2 = readline(prompt = "Enter any character : ");
# convert to character
var2 = as.character(var2)

# printing values
print(var1)
print(var2)
Using scan() method

► Another way to take user input in R is the
scan() method. This method takes input
from the console. It is very handy
when inputs need to be taken quickly for a
mathematical calculation or for a dataset. This
method reads data in the form of a vector or list. It
can also be used to read input from a file.
► x = scan()
scan() keeps taking input continuously; to terminate
the input process, press the Enter key 2 times on the
console.
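scan() can also read from a string through its text argument, which makes it easy to try out without typing at the console. The sketch below relies on that; what sets the type of the values read, and quiet suppresses the "Read N items" message. The variable names are illustrative.

```r
# Read numbers from a string instead of the console
nums <- scan(text = "10 20 30", quiet = TRUE)

# 'what' sets the type of the values that are read
words <- scan(text = "red green blue", what = character(), quiet = TRUE)

total <- sum(nums)

print(nums)
print(words)
print(total)
```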
Decision making is an important part of programming. This can be
achieved in R programming using the conditional if…else statement.
R if statement
The syntax of if statement is:

if (test_expression) {
statement
}

If the test_expression is TRUE, the statement gets executed. But if it’s
FALSE, nothing happens.

Here, test_expression should be a single logical or numeric value. Versions
of R before 4.2.0 used only the first element of a longer vector (with a
warning); from R 4.2.0 onward a condition of length greater than one is an error.

In the case of a numeric value, zero is taken as FALSE, rest as TRUE.
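Examples

x <- 5
if(x > 0){ print("Positive number") }

y <- -1
if(y > 0){ print("Positive number") }

# the two conditions below have length 2: R >= 4.2.0 stops with an error,
# older versions test only the first element and give a warning
z <- c(x,y)
if(z > 0){ print("Positive number") }

m <- c(y,x)
if(m > 0){ print("Positive number") }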
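When the value being tested is a vector, the condition can be reduced to a single TRUE or FALSE before it reaches if(), either with all() and any() or by picking one element explicitly. A minimal sketch with a made-up vector z:

```r
z <- c(5, -1)

# Collapse the vector condition to one logical value
msg <- if (all(z > 0)) {
  "all positive"
} else if (any(z > 0)) {
  "some positive"
} else {
  "none positive"
}

# Or test one chosen element explicitly
first_only <- if (z[1] > 0) "first is positive" else "first is not positive"

print(msg)
print(first_only)
```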
if…else statement

► The syntax of if…else statement is:

if (test_expression) {
statement1
} else {
statement2 }

► The else part is optional and is only evaluated if
test_expression is FALSE.

► It is important to note that else must be in the same line
as the closing braces of the if statement.
x <- -5
if(x > 0){
print("Non-negative number")
} else {
print("Negative number") }

x <- c("what","is","truth")
if("Truth" %in% x) {
print("Truth is found")
} else {
print("Truth is not found") }

x <- -5
y <- if(x > 0) 5 else 6
y
if…else Ladder

► The if…else ladder (if…else…if) statement allows you to execute a block
of code among more than 2 alternatives.
► The syntax of the if…else ladder is:

if ( test_expression1) {
statement1
} else if ( test_expression2) {
statement2
} else if ( test_expression3) {
statement3
} else {
statement4
}

► Only one statement will get executed
depending upon the test_expressions.
x <- 0
if (x < 0) {
print("Negative number")
} else if (x > 0) {
print("Positive number")
} else {
print("Zero") }

x <- c("what","is","truth")
if("Truth" %in% x) {
print("Truth is found the first time")
} else if ("truth" %in% x) {
print("truth is found the second time")
} else {
print("No truth found") }
# if_else() is provided by the dplyr package
library(dplyr)

x <- c(-5:5, NA)
if_else(x < 0, NA_integer_, x)
#> [1] NA NA NA NA NA 0 1 2 3 4 5 NA
if_else(x < 0, "negative", "positive", "missing")

# Unlike ifelse, if_else preserves types
x <- factor(sample(letters[1:5], 10, replace = TRUE))
ifelse(x %in% c("a", "b", "c"), x, factor(NA))
#> [1] 1 NA 1 NA 2 2 2 NA 1 1
if_else(x %in% c("a", "b", "c"), x, factor(NA))
#> [1] b <NA> b <NA> c c c <NA> b b
#> Levels: b c d e
# Attributes are taken from the `true` vector

# if() itself is not vectorized: x has 100 elements here, so this call is an
# error in R 4.2.0 and later (earlier versions used only x[1], with a warning)
x <- seq(0.1,10,0.1)
y <- if (x < 5) 1 else 2
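For element-by-element decisions without any extra package, base R's ifelse() can be used. The following is a small sketch with made-up values; the names sign_label and ladder are illustrative.

```r
x <- c(-2, 0, 3, NA)

# ifelse() tests every element and returns a vector of the same length;
# an NA in the test stays NA in the result.
sign_label <- ifelse(x < 0, "negative", "non-negative")
print(sign_label)

# Calls can be nested to mimic an if...else ladder element by element.
ladder <- ifelse(x < 0, "negative", ifelse(x > 0, "positive", "zero"))
print(ladder)
```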
R Factors and Tables

• One often has to deal with categorical variables in statistics (i.e.,
variables at the nominal or ordinal level of measurement). In R,
these are best dealt with through the use of factors.
• For example, fertilizers typically have three main ingredients,
nitrogen (N), phosphorus (P), and potassium (K). Perhaps one is
conducting an experiment to determine which of these
ingredients best promotes root development, and has four
treatment groups (one for each ingredient, and a control group
that receives none of the ingredients).
Plants numbered 1 through 12 are randomly assigned to one of the
four treatment groups so that each group ends up with 3 members.
We could represent this process with the vector named f, as shown
below -- where the treatment given to plant i corresponds to
the ith element of the vector:
f = c("K","K","none","N","P","P","N","N","none","P","K","none")
To make R aware that the values listed are values associated with a
categorical variable (which are called levels in R), we convert this
vector into a factor with the factor() function:
fertilizer = factor(f)
Asking R to show the contents of f and fertilizer suggests
there is a subtle difference between the two variables, as
shown below:
f
[1] "K" "K" "none" "N" "P" "P" "N" "N" "none" "P" "K" "none"

fertilizer
[1] K K none N P P N N none P K none
Levels: K N none P
First, it is clear that R is no longer considering the
elements of the factor as strings of characters, given the
absence of double-quotes. Second (and more
importantly), additional information in the form of
"Levels: K N none P" is given. The levels shown
correspond to the unique values seen in the
vector f (i.e., the categories that represent the
treatment groups).
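The levels can also be inspected directly. The sketch below rebuilds the fertilizer factor from the text and uses the base R functions levels(), nlevels(), and table() to confirm that each treatment group has 3 plants.

```r
f <- c("K","K","none","N","P","P","N","N","none","P","K","none")
fertilizer <- factor(f)

print(levels(fertilizer))    # the distinct categories
print(nlevels(fertilizer))   # how many categories there are
print(table(fertilizer))     # number of plants in each treatment group
```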
There are other differences between a vector and a factor, which we can
see if we use the str(x) function. This function in R displays a compact
representation of the internal structure of any R variable. Let's see what
happens when we apply it to both f and fertilizer:

str(f)
# chr [1:12] "K" "K" "none" "N" "P" "P" "N" "N" "none" "P" "K" "none"
str(fertilizer)
# Factor w/ 4 levels "K","N","none",..: 1 1 3 2 4 4 2 2 3 4 ...

Note how in the factor fertilizer, the levels "K", "N", "none", and "P" are
replaced by numbers 1, 2, 3, and 4, respectively. So internally, R only stores
the numbers (indicating the level of each vector element) and
(separately) the names of each unique level. (Interestingly, even if the
vector's elements had been numerical, the levels are stored as strings of
text.)
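The internal storage described above can be checked with a short sketch. The names codes and num.fac are illustrative, and the numeric factor is a made-up example.

```r
f <- c("K","K","none","N","P","P","N","N","none","P","K","none")
fertilizer <- factor(f)

codes <- as.integer(fertilizer)   # the integer codes stored for each element
print(codes)
print(levels(fertilizer))         # the level names, stored once

# Indexing the levels by the codes rebuilds the original character vector.
print(identical(levels(fertilizer)[codes], f))

# A factor built from numbers still stores its levels as text, so convert
# through as.character() to get the numbers back.
num.fac <- factor(c(10, 20, 10))
print(levels(num.fac))
print(as.numeric(as.character(num.fac)))
```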
The way R internally stores factors is important when we want to
combine them. Consider the following failed attempt to combine
factors a.fac and b.fac:

a.fac = factor(c("X","Y","Z","X"))
b.fac = factor(c("X","X","Y","Y","Z"))
factor(c(a.fac,b.fac))
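# [1] 1 2 3 1 1 1 2 2 3 Levels: 1 2 3   (R versions before 4.1.0)

Notice how we lost the names associated with the different
levels. (Since R 4.1.0, c() applied to factors returns a factor with the
combined levels, so the same call keeps the names X, Y, and Z.)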
There is a way to restore them -- but it would be better not to lose
them in the first place! The as.character() function can help here. This
function can be used to force a factor back into a vector whose
elements are the corresponding strings of text associated with its
levels. For example, as.character(factor(c("X","Y"))) returns a vector
equivalent to c("X","Y").
To combine two factors (with the same levels), we force them both
back to vectors in the way just described, combine the vectors
with c(), and then convert the result back into a factor -- as shown
below:
a.fac = factor(c("X","Y","Z","X"))
b.fac = factor(c("X","X","Y","Y","Z"))
factor(c(as.character(a.fac),as.character(b.fac)))
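Running the combination and inspecting the result confirms that the level names survive. In this sketch, combined and d.fac are illustrative names; d.fac is a made-up factor showing that the same approach also handles differing level sets.

```r
a.fac <- factor(c("X","Y","Z","X"))
b.fac <- factor(c("X","X","Y","Y","Z"))

# Convert both factors to character vectors, join them, and re-factor.
combined <- factor(c(as.character(a.fac), as.character(b.fac)))
print(combined)
print(levels(combined))

# The same approach works when the level sets differ.
d.fac <- factor(c("W","X"))
print(levels(factor(c(as.character(a.fac), as.character(d.fac)))))
```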
One can also change the levels associated with a factor,
using levels() as the following suggests.
a.fac = factor(c("X","Y","Z","X"))
a.fac
# [1] X Y Z X Levels: X Y Z
levels(a.fac) = c("A","B","C")
a.fac
# [1] A B C A Levels: A B C
Tables

Factors can also be used to create tables in R, another important data type
in terms of its relationship to statistics.
As an example, suppose that a sample of 7 people is asked the following
questions in a study of workplace risk of tetanus infections:
●Q1: "Does your work put you in contact with soil, dust, or manure?" (Always,
Never, or Sometimes)
●Q2: "On any given day, is there a risk that you might be cut at work?" (Yes,
No, or Maybe)
The answers to each question for subjects 1 through 7 are given by the
following factors:
Q1 = factor(c("Sometimes","Sometimes","Never","Always","Always","Sometimes","Sometimes"))
Q2 = factor(c("Maybe","Maybe","Yes","Maybe","No","Yes","No"))
Thinking that there might be a relationship between these two
variables, we wish to construct a contingency table -- where the
levels of one variable form the column headers and the levels of
the other variable form the row headers, with the body of the table
indicating how many subjects were associated with each possible
pair of levels.
To create such a table in R, we simply use the table() command, as
shown below:
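A minimal sketch of the call, using the Q1 and Q2 factors defined above. The name tab is an illustrative choice; it avoids masking base R's t() function.

```r
Q1 <- factor(c("Sometimes","Sometimes","Never","Always","Always","Sometimes","Sometimes"))
Q2 <- factor(c("Maybe","Maybe","Yes","Maybe","No","Yes","No"))

# Rows come from the first argument, columns from the second.
tab <- table(Q1, Q2)
print(tab)
```

Each cell counts the subjects who gave that pair of answers, so the cells add up to the 7 subjects in the sample.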
Accessing the elements of the table
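A table can be indexed like a matrix, either by level name or by position. The sketch below assumes the Q1 and Q2 factors from the example above; tab is an illustrative name.

```r
Q1 <- factor(c("Sometimes","Sometimes","Never","Always","Always","Sometimes","Sometimes"))
Q2 <- factor(c("Maybe","Maybe","Yes","Maybe","No","Yes","No"))
tab <- table(Q1, Q2)

print(tab["Sometimes", "Maybe"])   # one cell, by level names
print(tab[3, 1])                   # the same cell, by row and column position
print(tab["Always", ])             # one row
print(tab[, "Yes"])                # one column
print(margin.table(tab, 1))        # row totals
```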
► One can also produce new tables from existing ones.
For example, suppose we wanted to see a table of
relative frequencies instead of counts. Much like one
might do with a vector, we simply divide the table by
the sum of its elements:
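The division can be sketched as follows, again assuming the Q1 and Q2 factors from the example above. The names tab and rel are illustrative; prop.table() is a base R function that performs the same calculation.

```r
Q1 <- factor(c("Sometimes","Sometimes","Never","Always","Always","Sometimes","Sometimes"))
Q2 <- factor(c("Maybe","Maybe","Yes","Maybe","No","Yes","No"))
tab <- table(Q1, Q2)

# Divide every count by the total number of subjects.
rel <- tab / sum(tab)
print(round(rel, 3))

# prop.table() performs the same division.
print(all.equal(rel, prop.table(tab)))
```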
Practice exercises

► Create an empty vector “a” and then add a sequence of 1 to 20 to it.
Print the vector before and after the addition has been made and also
check for the class of vector a.
► Create three empty vectors named “a”, “b” and “d”. Print the vectors
and display their types. Include (10,20,14.5, 89.000) in “a”; (TRUE, FALSE,
FALSE) in “b” and (“Garima”, “Nishant”, “Tanaya”) in “d” respectively
using the c function. Check for the type and print the vectors. Repeat
the exercise using the index[].
► Using the above vectors, access first element of “a”, second element
of “b” and first and third elements of “d” separately. Modify the third
element of “b” vector as TRUE. Delete the vector “d” and sort the
vector a in a decreasing order and store the results in “A”.
Finding sum, mean and product of
vector in R: practice
► Create a vector named vec with values 1:5. Calculate
sum(vec), mean(vec), prod(vec).
► sum(), mean(), and prod() methods are available in R which are used to
compute the specified operation over the arguments specified in
the method. In case a single vector is specified, the
operation is performed over individual elements.

vec = c(1, 2, 3, 4, 5)
print("Sum of the vector:")
# inbuilt sum method
print(sum(vec))
# using inbuilt mean method
print("Mean of the vector:")
print(mean(vec))
# using inbuilt product method
print("Product of the vector:")
print(prod(vec))
Vector with NA values
► If we have NA values in the vector, sum(), mean() and prod() return NA
unless the missing values are removed with na.rm = TRUE.

# declaring a vector
vec = c(1.1, NA, 2, 3.0, NA)

print("Sum of the vector:")
# inbuilt sum method
print(sum(vec))
# ignoring missing values
print("Sum of the vector without NA values:")
print(sum(vec, na.rm = TRUE))

# using inbuilt mean method
print("Mean of the vector with NA values:")
# not ignoring NA values
print(mean(vec))
# ignoring missing values
print("Mean of the vector without NA values:")
print(mean(vec, na.rm = TRUE))

# using inbuilt product method
print("Product of the vector:")
print(prod(vec))
# ignoring missing values
print("Product of the vector without NA values:")
print(prod(vec, na.rm = TRUE))
Practice

► Create a Vector X with these values: 1.1, 2, 3.0, 5.7 and repeat the
exercise.
► Create a Vector Y with these values: 7,NA,9,8,NA,75,NA,65 and
repeat the exercise.
► Make two vectors A and B, A being a sequence of range 1:10 and B
in the range of 10:15 sorted in decreasing order. Calculate
A+B,A-B,B-A,A*B,A/B,B/A, A^B,B^A, remainder and quotient when A/B
and B/A.
► Make two vectors C and D, C being a sequence of range 11:20 and
D in the range of 5:14 sorted in decreasing order. Repeat the previous
exercise.
Matrix exercise

► Create a matrix data1 having 5 rows with a sequence of 1 to 10.
Create a matrix data2 having 5 rows with a sequence of 11 to 20.
Print the two matrices and calculate A+B,A-B,B-A,A*B,A/B,B/A;
where A is data1 and B corresponds to data2.
► Create data3 as 21:30 in the similar manner. Calculate product of
all three matrices.
► Create a vector “a” using 3, 4, 5, 6, 7, 8 and vector “b” using 1, 3,
0, 7, 8, 5. Make a matrix M using vector a having 3 columns and
allocate the values row wise. Make a matrix N using vector b in the
similar manner. Calculate M+N,M-N,M*N,M/N
► Create a 4*4 matrix X using numbers from 1 to 16. Name the rows
a,b,c and d respectively. Name the columns m,n,o and p. Access
the 3rd row and the 2nd column separately. Access the 3rd row 2nd column
element. Access the 1st and 3rd rows together. Change the 2nd row 2nd
element to 50.
► Make another 4*1 matrix S using values 17,18,19,20. Combine S matrix
into the matrix X. Make another 1*4 matrix R using values 21,22,23,24.
Combine R matrix into the matrix X. Delete the 2nd row and 2nd
column of matrix X.
► Create a 4*4 matrix Y using 33:48. Name the rows e,f,g,h
respectively. Name the columns q,r,s and t. Access the 2nd row and the 1st
column separately. Access the 2nd row 3rd column element. Access the
1st and 3rd columns together. Change the 2nd row 1st element to 1.
► Calculate X+Y,X-Y,Y-X, X*Y,X/Y, Y/X
List examples

► Create a random list of 10 elements using a sequence of numbers
from 1 to 10 without replacement. Repeat the same exercise with
replacement.
► Create a random list of 15 elements using a sequence of numbers
from 1 to 5 with associated probabilities as 0.02,0.2,0.25,0.5,0.9.
► Create a list of 10 elements using a sequence of numbers from 1 to
10. Convert this list into an array of dimensions 3*3*3.
► Create a list using 3 names: Nitin, Priyanshu, Harshal. Convert this
list to array of dimensions 1*3*2
List2 examples

► Create a list of 3 components, element 1 as a sequence
of 1 to 5; False and True as the second element and
letters from d to i as third element of the list.
► Now access second element components; TRUE; third
element third value; G.
► Delete 3 from the list and print the results.
► Add a fourth element, “Sun”, “Mon”, “Tue”, to the previous list
and print the results.
► Create a dataframe with id 1 to 5 and names “Pulkita”,
“Ritesh”, “Ashish”, “Rahul”, “Akansha”.
► Add the programme name PGDDA to the existing data
frame and print the results. Add the course code
101,103,105,107 and 109 to the dataframe. Get the
structure and summary of the dataframe.
► Construct a data frame named authors and books like this. Merge
the two by author id
