Data Analysis using R Programming
BY
Dr. Anisha P Rodrigues
Associate Professor
Department of Computer Science and Engineering
NMAMIT, Nitte
Dr. Anisha P Rodrigues, Dept of CSE, NMAMIT, Nitte
9/21/2024
Topics
◼ Features of R
◼ Environment setup with RStudio
◼ R Commands
◼ R Script file
◼ Variables and Data Types
◼ Operators
◼ Decision making, Loops, Strings, Vectors, Lists,
Matrices, Arrays, Factors, Data Frames
◼ Functions
◼ R packages
◼ Data Re-shaping

R-Overview
◼ R is a programming language and software environment
for statistical analysis, graphics representation and
reporting.

◼ R was created by Ross Ihaka and Robert Gentleman at
the University of Auckland, New Zealand in 1993, and is
currently developed by the R Core Team.

◼ The language was named R after the first letter of the
first names of its two authors (Robert Gentleman and
Ross Ihaka).

R-Overview
◼ The core of R is an interpreted computer language which
allows branching and looping as well as modular
programming using functions.
◼ For efficiency, R allows integration with procedures written
in C, C++, .NET, Python or FORTRAN.
◼ R is freely available under the GNU General Public
License and is an official part of the GNU project, called
GNU S. Pre-compiled binary versions are provided for
various operating systems such as Linux, Windows and Mac.

Features of R
◼ The following are the important features of R:
❑ R is a well-developed, simple and effective programming
language which includes conditionals, loops, user defined
recursive functions and input and output facilities.
❑ R has an effective data handling and storage facility.
❑ R provides a suite of operators for calculations on arrays, lists,
vectors and matrices.
❑ R provides a large, coherent and integrated collection of tools for
data analysis.
❑ R provides graphical facilities for data analysis and display, either
directly on screen or printed on paper.
◼ In summary, R is one of the world's most widely used programming
languages for statistics.

Applications of R Programming
◼ 1. Finance
◼ 2. Banking
◼ 3. Healthcare
◼ 4. Social Media
◼ 5. E-Commerce
◼ 6. Manufacturing
◼ 7. Interactive web application visualizations
◼ 8. Machine learning and deep learning research
Real-Life Use Cases of R Language
◼ 1. Facebook – Facebook uses R to analyze status updates and its social network
graph. It also uses R to predict colleague interactions.
◼ 2. Google – Google uses R to calculate ROI on advertising campaigns,
to predict economic activity and to improve the efficiency of online
advertising.
◼ 3. Twitter – R is part of Twitter’s Data Science toolbox for sophisticated
statistical modeling.
◼ 4. National Weather Service – The National Weather Service uses R at its
River Forecast Centers to generate graphics for flood
forecasting.
◼ 5. New York Times – R is used in the news cycle at The New York Times
to crunch data and prepare graphics before they go to print.
◼ 6. Mozilla – Mozilla, the foundation behind the Firefox web browser, uses R
to visualize web activity.
◼ 7. Ford Motor Company – Ford relies on Hadoop, and on R for
statistical analysis and data-driven decision
support.

R – Environment Setup
◼ Try It Online
❑ You do not need to set up your own environment to start
learning the R programming language. You can use an online R compiler:

◼ https://www.mycompiler.io/online-r-compiler
◼ https://www.w3schools.com/r/r_compiler.asp

Try the following example:


❑ # Print Hello World.
❑ print("Hello World")
❑ # Add two numbers.
❑ print(23.9 + 11.6)
Local Environment Setup

Windows Installation
◼ Go to the CRAN (Comprehensive R Archive
Network) website.
❑ https://cran.r-project.org/
◼ You can download the Windows installer
version of R and save it in a local directory.
◼ R-4.3.1 for Windows

Linux Installation
◼ R is available as a binary for many versions of Linux at
the location R Binaries.
◼ The instructions for installing R on Linux vary from flavor to flavor.
The steps are listed under each type of Linux
version at the mentioned link.
◼ However, if you are in a hurry, you can use the yum
command to install R as follows:
❑ $ yum install R
◼ The above command installs the core functionality of R
along with the standard packages. You can then launch R:
❑ $ R
◼ This launches the R interpreter and gives you a prompt >
where you can start typing your program.
R – Basic Syntax
◼ As a convention, we will start learning R programming by
writing a "Hello, World!" program.
◼ R Command Prompt:
◼ Once you have the R environment set up, it is easy to
start the R command prompt by typing the following
command at your command prompt (Linux):
❑ $ R
◼ This launches the R interpreter and gives you a prompt >
where you can start typing your program as follows:
❑ > myString <- "Hello, World!"
❑ > print(myString)
❑ [1] "Hello, World!"

◼ Open command prompt and type R:
❑ >R

◼ You will get a prompt > where you can start typing your program as
follows:
◼ > myString <- "Hello, World!"
◼ > print(myString)
◼ [1] "Hello, World!"

◼ To quit
❑ >q()

R Script File
◼ Usually, you write your programs in
script files and then execute those scripts at your command
prompt with the help of the R interpreter called Rscript.
◼ So, let's start by writing the following code in a text file called test.R.
❑ # My first program in R Programming
❑ myString <- "Hello, World!"
❑ print(myString)
◼ Save the above code in the file test.R and execute it at the command prompt
as follows:
◼ $ Rscript test.R
◼ When we run the above program, it produces the following result:
❑ [1] "Hello, World!"

Comments
◼ Comments are like helping text in your R program and
they are ignored by the interpreter while executing your
actual program.

◼ A single-line comment is written using # at the beginning of the
statement as follows:
◼ # My first program in R Programming

◼ R does not support multi-line comments

◼ R does not support multi-line comments, but you can
use a trick such as the following:
if(FALSE){
"This is a demo for multi-line comments and it should
be put inside either a single or double quote"
}
myString <- "Hello, World!"
print(myString)
◼ Although the block above is parsed by the R
interpreter, its body is never evaluated because the condition
is FALSE, so it does not interfere with your actual program.
◼ You should put such comments inside either single or
double quotes.

R – Data Types
◼ Generally, while doing programming in any programming
language, you need to use various variables to store
various information.
◼ Variables are nothing but reserved memory locations to
store values.
❑ This means that, when you create a variable you reserve some
space in memory.
◼ You may like to store information of various data types
like character, integer, floating point, double floating
point, Boolean etc.
❑ Based on the data type of a variable, the operating system
allocates memory and decides what can be stored in the
reserved memory.

R – Data Types
◼ In contrast to other programming languages like C and
Java, in R variables are not declared with a data
type.
❑ Variables are assigned R-objects, and the data type
of the R-object becomes the data type of the variable.
◼ There are many types of R-objects –
❑ Vectors
❑ Lists
❑ Matrices
❑ Arrays
❑ Factors
❑ Data Frames
◼ The simplest of these objects is the vector object

R - Vector
◼ The simplest of these objects is the vector object and
there are six data types of these atomic vectors, also
termed as six classes of vectors.
◼ The other R-Objects are built upon the atomic vectors.
◼ In R programming, the very basic data types are the R-
objects called vectors
◼ Six data types
❑ Logical
❑ Numeric
❑ Integer
❑ Complex
❑ Character
❑ Raw
You can also use the capital ‘L’ notation as a suffix to denote that a
particular value is of the integer data type.
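
◼ As a small illustration of the L suffix, the sketch below compares a plain number with its L-suffixed form. The variable names x and y are only illustrative and do not come from the slides.

```r
# A plain number is stored as a double (class "numeric")
x <- 5
# The L suffix makes the value an integer
y <- 5L

print(class(x))       # "numeric"
print(class(y))       # "integer"
print(is.integer(x))  # FALSE
print(is.integer(y))  # TRUE
```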

◼ When you want to create vector with more than one
element, you should use c() function which means to
combine the elements into a vector.

❑ # Create a vector.
❑ apple <- c('red','green',"yellow")
❑ print(apple)
❑ # Get the class of the vector.
❑ print(class(apple))

◼ Result:
❑ [1] "red" "green" "yellow"
❑ [1] "character"

Lists
◼ An R list is an object that can contain elements of different types, such as
strings, numbers, vectors and even another list.
◼ An R list can also contain a matrix or a function as its elements. A list
is created using the list() function in R.
❑ # Create a list.

❑ list1 <- list(c(2,5,3),21.3)

❑ # Print the list.

❑ print(list1)

❑ print(list1[1])

◼ When we execute the above code, it produces the following result:


❑ [[1]]
❑ [1] 2 5 3
❑ [[2]]
❑ [1] 21.3
❑ [[1]]
❑ [1] 2 5 3
Example

◼ vec <- c(1,2,3)


◼ char_vec <- c("Hadoop", "Spark", "Flink", "Mahout")
◼ logic_vec <- c(TRUE, FALSE, TRUE, FALSE)
◼ out_list <- list(vec, char_vec, logic_vec)
◼ print(out_list)
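
◼ The earlier example printed list1[1] with single brackets. The sketch below, reusing two of the vectors from this example, shows what may be the most common point of confusion with lists: single brackets return a sub-list, while double brackets return the element itself. The names sub_list and element are illustrative.

```r
vec <- c(1, 2, 3)
char_vec <- c("Hadoop", "Spark", "Flink", "Mahout")
out_list <- list(vec, char_vec)

# Single brackets return a list of length one
sub_list <- out_list[2]
print(class(sub_list))   # "list"

# Double brackets return the stored element itself
element <- out_list[[2]]
print(class(element))    # "character"
print(element[3])        # "Flink"
```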

Matrices
◼ A matrix is a two-dimensional rectangular data set.
◼ It can be created using a vector input to the matrix() function.
◼ byrow=TRUE indicates that the matrix should be filled by rows.
◼ byrow=FALSE indicates that the matrix should be filled by columns
(the default).
❑ # Create a matrix.

❑ M = matrix( c('a','a','b','c','b','a'), nrow=2,ncol=3,byrow = TRUE)

❑ print(M)

◼ When we execute the above code, it produces the following result:


     [,1] [,2] [,3]
[1,] "a"  "a"  "b"
[2,] "c"  "b"  "a"

◼ # Create a matrix.
◼ M = matrix( c('a','a','b','c','b','a'), nrow=2,ncol=3,byrow = FALSE)
◼ print(M)
◼ When we execute the above code, it produces the
following result:
     [,1] [,2] [,3]
[1,] "a"  "b"  "b"
[2,] "a"  "c"  "a"

Arrays
◼ While matrices are confined to two dimensions, arrays can be of any
number of dimensions.
◼ An array can be created using the array() function.
◼ The array() function takes a dim argument that specifies the size of
each dimension.
◼ dim = c(nrow, ncol, nmat)
◼ where,
❑ nrow : Number of rows
❑ ncol : Number of columns
❑ nmat : Number of matrices of dimensions nrow * ncol
◼ Example:
❑ # Create an array.

❑ a <- array(c('green','yellow'),dim=c(3,3,2))

❑ print(a)
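
◼ The result of the array example is not shown on the slide. As a rough sketch, the code below inspects the same array: the two input values are recycled to fill all 18 cells, and elements are addressed as [row, column, matrix].

```r
a <- array(c('green', 'yellow'), dim = c(3, 3, 2))

print(dim(a))      # 3 3 2
print(length(a))   # 18 cells, filled by recycling the two values

# Row 1, column 2 of the first matrix
print(a[1, 2, 1])  # "yellow"

# The whole second 3 x 3 matrix
print(a[, , 2])
```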

Factors
◼ Factors are the R-objects which are created using a
vector.
◼ Factors are the data objects which are used to
categorize the data and store it as levels. Tell R that a
variable is nominal by making it a factor.
◼ The factor stores the nominal values as a vector of
integers in the range [ 1... k ] (where k is the number of
unique values in the nominal variable), and an internal
vector of character strings (the original values) mapped
to these integers.
◼ Factors are created using the factor() function
◼ The nlevels() function gives the number of levels.

❑ # Create a vector.
❑ apple_colors <- c('green','green','yellow','red','red','red','green')
❑ # Create a factor object.
❑ factor_apple <- factor(apple_colors)
❑ # Print the factor.
❑ print(factor_apple)
❑ print(nlevels(factor_apple))

◼ When we execute the above code, it produces the


following result:
❑ [1] green green yellow red red red green
❑ Levels: green red yellow
◼ # applying the nlevels function we can know the number
of distinct values
❑ [1] 3
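
◼ To make the integer-plus-labels storage described earlier more concrete, the sketch below looks inside the same factor using levels(), as.integer() and table(), which are standard base R functions.

```r
apple_colors <- c('green','green','yellow','red','red','red','green')
factor_apple <- factor(apple_colors)

# The labels, sorted alphabetically by default
print(levels(factor_apple))      # "green" "red" "yellow"

# The integer codes that point into the levels
print(as.integer(factor_apple))  # 1 1 3 2 2 2 1

# Frequency of each level
print(table(factor_apple))
```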
◼ # Create a vector as input.
◼ data <- c("East","West","East","North","North","East","West","West","West",
"East","North")
◼ print(data)

◼ print(is.factor(data))

◼ # Apply the factor function.

◼ factor_data <- factor(data)

◼ print(factor_data)

◼ print(is.factor(factor_data))

Result:
❑ [1] "East" "West" "East" "North" "North" "East" "West" "West" "West" "East"
"North"
❑ [1] FALSE
❑ [1] East West East North North East West West West East North
❑ Levels: East North West
❑ [1] TRUE

Data Frames
◼ Data frames are tabular data objects. Unlike in a
matrix, each column of a data frame can contain
a different mode of data.
◼ The first column can be numeric while the
second column can be character and third
column can be logical.
◼ It is a list of vectors of equal length.

◼ A data frame is a table or a two-dimensional array-like
structure in which each column contains values of one
variable and each row contains one set of values from
each column.
◼ It stores data in tabular form.
◼ Following are the characteristics of a data frame.
❑ The column names should be non-empty.
❑ The row names should be unique.
❑ The data stored in a data frame can be of numeric,
factor or character type.
❑ Each column should contain the same number of data
items.
◼ Data frames are created using the data.frame()
function.
Example

❑ # Create the data frame.
❑ BMI <-data.frame(
❑ gender = c("Male", "Male", "Female"),
❑ height = c(152, 171.5, 165),
❑ weight = c(81,93, 78),
❑ Age =c(42,38,26)
❑ )
❑ print(BMI)

◼ When we execute the above code, it produces the following result:

  gender height weight Age
1   Male  152.0     81  42
2   Male  171.5     93  38
3 Female  165.0     78  26
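
◼ Once a data frame exists, its columns and rows can be inspected and filtered. The sketch below reuses the BMI data frame. The derived bmi column is an illustration only and assumes height is in centimetres and weight in kilograms, which the slides do not state.

```r
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age = c(42, 38, 26)
)

# Show the type of every column
str(BMI)

# Access one column by name
print(BMI$height)

# Keep only the rows where Age is above 30
print(BMI[BMI$Age > 30, ])

# Add a derived column (assumed units: cm and kg)
BMI$bmi <- BMI$weight / (BMI$height / 100)^2
print(round(BMI$bmi, 1))
```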

R – Variables

◼ A variable provides us with named storage that
our programs can manipulate.
◼ A valid variable name consists of letters,
numbers and the dot or underscore characters.
◼ A variable name starts with a letter, or with a dot
not followed by a number.
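
◼ One way to check these naming rules is with base R's make.names(), which rewrites a string into a syntactically valid name; a string that comes back unchanged was already valid. The candidate names below are illustrative examples, not taken from the slides.

```r
candidates <- c("var_name2.", ".var_name", "var.name",
                "var_name%", "2var_name", ".2var_name", "_var_name")

# A name is valid if make.names() leaves it unchanged
is_valid <- make.names(candidates) == candidates

print(data.frame(name = candidates, valid = is_valid))
```

◼ The first three names are valid; the others contain a forbidden character, start with a digit or underscore, or start with a dot followed by a digit.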

R Variable Assignment

◼ Variables can be assigned values using the
leftward, rightward and equal-to operators.
❑ <-, ->, =
◼ The values of variables can be printed using the
print() or cat() function.
◼ The cat() function combines multiple items into a
continuous print output.
# Assignment using equal operator.
var.1 = c(0,1,2,3)
# Assignment using leftward operator.
var.2 <- c("learn","R")
# Assignment using rightward operator.
c(TRUE,1) -> var.3

❑ print(var.1)
❑ cat ("var.1 is ", var.1 ,"\n")
❑ cat ("var.2 is ", var.2 ,"\n")
❑ cat ("var.3 is ", var.3 ,"\n")
◼ When we execute the above code, it produces the
following result:
❑ [1] 0 1 2 3
❑ var.1 is 0 1 2 3
❑ var.2 is learn R
❑ var.3 is 1 1
◼ Note:
❑ The vector c(TRUE,1) mixes logical and numeric values, so the
logical value is coerced to numeric, turning TRUE into 1.
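
◼ The same coercion applies to other mixtures: when types are combined in one vector, R converts everything to the most general type present. A brief sketch:

```r
# logical + integer -> integer
print(class(c(TRUE, 2L)))         # "integer"
# logical + double -> numeric
print(class(c(TRUE, 1)))          # "numeric"
# anything + character -> character
print(class(c(1, "a")))           # "character"

# Coercion also lets us count TRUE values
print(sum(c(TRUE, FALSE, TRUE)))  # 2
```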

Data Type of a Variable
◼ In R, a variable itself is not declared of any data type, rather it gets
the data type of the R -object assigned to it.
◼ So, R is called a dynamically typed language, which means that we
can change the data type of the same variable again and
again when using it in a program.
❑ var_x <- "Hello"
❑ cat("The class of var_x is ",class(var_x),"\n")
❑ var_x <- 34.5
❑ cat("Now the class of var_x is ",class(var_x),"\n")
❑ var_x <- 27L
❑ cat("Next the class of var_x becomes ",class(var_x),"\n")

◼ When we execute the above code, it produces the following result:


❑ The class of var_x is character
❑ Now the class of var_x is numeric
❑ Next the class of var_x becomes integer

Finding Variables
◼ To know all the variables currently available in the
workspace we use the ls() function.

❑ print(ls())

◼ When we execute the above code, it produces a result like the
following (the exact output depends on the variables defined in your workspace):
❑ [1] "my var" "my_new_var" "my_var" "var.1"
❑ [5] "var.2" "var.3" "var_name2."
❑ [8] "var_x" "varname"

◼ The ls() function can use patterns to match the variable
names.

◼ # List the variables with the pattern "var".


❑ print(ls(pattern="var"))
◼ When we execute the above code, it produces a result like the
following:
❑ [1] "my var" "my_new_var" "my_var" "var.1"
❑ [5] "var.2" "var.3" "var_name2."
❑ [8] "var_x" "varname"

Deleting Variables

◼ Variables can be deleted by using the rm() function.


◼ Below we delete the variable var.3.
◼ Printing the value of the variable afterwards throws an error.
❑ rm(var.3)
❑ print(var.3)
◼ When we execute the above code, it produces the
following result:

❑ Error in print(var.3) : object 'var.3' not found

R- Operators

We have the following types of operators in R
programming:
◼ Arithmetic Operators (+, -, *, /, %%, %/%, ^)
◼ Relational Operators (<, >, <=, >=, ==, !=)
◼ Logical Operators (&, |, !, &&, ||)
◼ Assignment Operators (=, <-, ->, <<-, ->>)
◼ Miscellaneous Operators (:, %in%, %*%)

Arithmetic Operators
◼ Following table shows the arithmetic operators supported by R
language. The operators act on each element of the vector.

# R program to illustrate the use of arithmetic operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)

# Performing operations on operands
cat("Addition of vectors :", vec1 + vec2, "\n")
cat("Subtraction of vectors :", vec1 - vec2, "\n")
cat("Multiplication of vectors :", vec1 * vec2, "\n")
cat("Division of vectors :", vec1 / vec2, "\n")
cat("Modulo of vectors :", vec1 %% vec2, "\n")
cat("Power operator :", vec1 ^ vec2, "\n")
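
◼ The integer-division operator %/% listed among the arithmetic operators is not exercised in the program above. The sketch below pairs it with %% on illustrative values (x and y are not from the slides) and shows that quotient and remainder together reconstruct the original number. Note that in R the remainder takes the sign of the divisor.

```r
x <- c(7, -7, 7.5)
y <- 3

quotient  <- x %/% y   # 2 -3 2
remainder <- x %% y    # 1 2 1.5

print(quotient)
print(remainder)

# quotient * y + remainder gives back x
print(quotient * y + remainder)
```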

Relational Operators
◼ Following table shows the relational operators supported by R language.
Each element of the first vector is compared with the corresponding element
of the second vector. The result is TRUE where the element of the first vector
satisfies the relation with the element of the second, and FALSE otherwise.

# R program to illustrate the use of relational operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)

# Performing operations on operands
cat("Vector1 less than Vector2 :", vec1 < vec2, "\n")
cat("Vector1 less than equal to Vector2 :", vec1 <= vec2, "\n")
cat("Vector1 greater than Vector2 :", vec1 > vec2, "\n")
cat("Vector1 greater than equal to Vector2 :", vec1 >= vec2, "\n")
cat("Vector1 not equal to Vector2 :", vec1 != vec2, "\n")

Logical Operators
◼ Following table shows the logical operators supported by R
language. They apply only to vectors of type logical, numeric or
complex. Any non-zero number, real or complex, is treated as
TRUE.
◼ For the element-wise operators, each element of the first vector is
combined with the corresponding element of the second vector. The result
is a logical value.

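◼ The logical operators && and || work on single values and return a
single logical value. Older versions of R considered only the first element of
longer vectors; since R 4.3.0, passing a vector of length greater than one is an error.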

# R program to illustrate the use of logical operators
vec1 <- c(0, 2)
vec2 <- c(TRUE, FALSE)

# Performing operations on operands
cat("Element wise AND :", vec1 & vec2, "\n")
cat("Element wise OR :", vec1 | vec2, "\n")
# && and || need length-one operands (an error otherwise since R 4.3.0),
# so the first elements are compared explicitly
cat("Logical AND :", vec1[1] && vec2[1], "\n")
cat("Logical OR :", vec1[1] || vec2[1], "\n")
cat("Negation :", !vec1, "\n")

Assignment Operators

◼ These operators are used to assign values to
vectors.

# R program to illustrate the use of Assignment operators
vec1 <- c(2:5)
c(2:5) ->> vec2
vec3 <<- c(2:5)
vec4 = c(2:5)
c(2:5) -> vec5

# Performing operations on Operands


cat ("vector 1 :", vec1, "\n")
cat("vector 2 :", vec2, "\n")
cat ("vector 3 :", vec3, "\n")
cat("vector 4 :", vec4, "\n")
cat("vector 5 :", vec5)
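
◼ At the top level of a script all five assignment forms above behave the same way. The difference between <- and <<- only shows up inside a function, as the sketch below suggests. The names counter, increment_local and increment_global are hypothetical and chosen for illustration.

```r
counter <- 0

increment_local <- function() {
  # <- creates a new variable that is local to the function
  counter <- counter + 1
  counter
}

increment_global <- function() {
  # <<- modifies the variable in the enclosing environment
  counter <<- counter + 1
  counter
}

local_result <- increment_local()
print(counter)   # 0, the outer variable is unchanged

global_result <- increment_global()
print(counter)   # 1, the outer variable was updated
```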
Miscellaneous Operators
◼ These operators are used for specific purposes rather than general
mathematical or logical computation.

# R program to illustrate the use of miscellaneous operators
mat <- matrix(1:4, nrow = 1, ncol = 4)
print("Matrix elements using : ")
print(mat)

# Matrix product of mat with its transpose
product = mat %*% t(mat)
print("Product of matrices")
print(product)
cat("does 1 exist in prod matrix :", 1 %in% product, "\n")

How to Read User Input in R?

◼ scan()
◼ readline()

readline() function
◼ We can read input typed by the user in the terminal with the
readline() function.
◼ readline() returns the input as a character string.
◼ To use the input as a number, convert it: as.integer(n) converts n to an integer.

# R program to illustrate taking input from the user using readline()


var = readline()
# convert the inputted value to integer
var = as.integer(var)
# print the value
print(var)

◼ You can also show a message in the console to tell the user
what to enter.
◼ Syntax:
❑ var1 = readline(prompt = "Enter any number : ")
❑ or,
❑ var1 = readline("Enter any number : ")

# R program to illustrate taking input with showing the message


var = readline(prompt = "Enter any number : ")
# convert the inputted value to an integer
var = as.integer(var)
# print the value
print(var)
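
◼ Because readline() returns text, the conversion step can fail when the user types something that is not a number. The sketch below wraps the conversion in a hypothetical helper, parse_integer (not part of R or the slides), and feeds it fixed strings so it can be run without a keyboard. Note also that readline() only waits for input in an interactive session; in non-interactive use it returns an empty string immediately.

```r
# Hypothetical helper: convert text to an integer,
# returning NA (without a warning) for invalid input
parse_integer <- function(text) {
  suppressWarnings(as.integer(text))
}

print(parse_integer("42"))    # 42
print(parse_integer("abc"))   # NA
print(parse_integer("3.9"))   # 3, the fractional part is dropped

# Typical use with a value obtained from readline()
value <- parse_integer("17")
if (is.na(value)) {
  cat("That was not a whole number\n")
} else {
  cat("You entered", value, "\n")
}
```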

scan() function
◼ The scan() function in R reads data into a vector or list,
taking input from the console or from a file.
◼ By default, scan() reads numeric values
and returns a numeric vector; if a non-numeric input is
given, it raises an error. Other types can be read by setting the
what argument, for example what = character().
❑ scan() keeps reading input until it is terminated: press the Enter key
twice on the console (that is, enter an empty line), or pass the
nlines argument to scan().
◼ # Take the input from the keyboard
◼ a <- scan(nlines=1)
◼ # Print the output to the console
◼ print(a)
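
◼ To try scan() without typing at the console, the text argument can supply the input as a string. The sketch below reads numbers with the default settings and then reads words by setting what = character(); quiet = TRUE only suppresses the "Read N items" message.

```r
# Default behaviour: read numeric values
nums <- scan(text = "10 20 30", quiet = TRUE)
print(nums)    # 10 20 30

# Read character values instead
words <- scan(text = "red green blue", what = character(), quiet = TRUE)
print(words)   # "red" "green" "blue"
```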

seq()
◼ The seq() function in R is used to create a
sequence of elements in a vector. It takes the step
between values and the length as optional arguments.

◼ Syntax:
seq(from, to, by, length.out)

◼ Parameters:
❑ from: Starting element of the sequence
❑ to: Ending element of the sequence
❑ by: Difference between the elements
❑ length.out: Desired length of the sequence

Example
vec1 <- seq(1, 10, by = 2)
vec2 <- seq(1, 10, length.out = 7)
print(vec1)
print(vec2)

◼ Output:

[1] 1 3 5 7 9
[1] 1.0 2.5 4.0 5.5 7.0 8.5 10.0

sort() function

◼ The sort() function in R is used to sort a vector by
its values.
◼ It takes a Boolean argument to choose between ascending
and descending order.
Syntax:
◼ sort(x, decreasing)

◼ Parameters:
❑ x: Vector to be sorted
❑ decreasing: TRUE sorts in descending order; the default, FALSE, sorts in ascending order

◼ # R program to sort a vector
◼ # Creating a vector
◼ x <- c(7, 4, 3, 9, 1.2, -4, -5, -8, 6)
◼ # Calling sort() function
◼ sort(x)

◼ Output:
◼ [1] -8.0 -5.0 -4.0 1.2 3.0 4.0 6.0 7.0 9.0

# R program to sort a vector
# Creating a vector
x <- c(7, 4, 3, 9, 1.2, -4, -5, -8, 6)
# Calling sort() function to print in decreasing order
sort(x, decreasing = TRUE)
Output:
◼ [1] 9.0 7.0 6.0 4.0 3.0 1.2 -4.0 -5.0 -8.0

rev() function
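◼ The rev() function in R is used to return the
reversed version of a data object.
◼ It is most commonly applied to vectors. Applied to a list or data frame it
reverses the order of the elements (columns); applied to a matrix it returns
the elements as a reversed plain vector.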
◼ Syntax:
◼ rev(x)

◼ Parameter:
❑ x: Data object

◼ Returns: Reverse of the data object passed

Example

# R program to reverse a vector

# Create a vector
vec <- 1:5
vec

# Apply rev() function to vector


vec_rev <- rev(vec)
vec_rev
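
◼ The behaviour of rev() on objects other than vectors can be surprising, so here is a short sketch on an illustrative matrix and data frame (m and df are made up for this example). Reversing the rows of a data frame is done with indexing rather than rev().

```r
# On a matrix, rev() returns a plain reversed vector
m <- matrix(1:6, nrow = 2)
print(rev(m))            # 6 5 4 3 2 1

# On a data frame, rev() reverses the column order
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
print(names(rev(df)))    # "name" "id"

# To reverse the rows, index them in descending order
df_rows_reversed <- df[nrow(df):1, ]
print(df_rows_reversed$id)   # 3 2 1
```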

R – Vectors
◼ Vectors are the most basic R data objects and there are
six types of atomic vectors.

◼ They are logical, integer, double, complex, character
and raw.

Vector Creation
◼ Even when you write just one value in R, it becomes a
vector of length 1 and belongs to one of the above vector
types.

# Atomic vector of type character.
print("abc")
# Atomic vector of type double.
print(12.5)
# Atomic vector of type integer.
print(63L)
# Atomic vector of type logical.
print(TRUE)
# Atomic vector of type complex.
print(2+3i)
# Atomic vector of type raw.
print(charToRaw('hello'))

When we execute the above code, it produces
the following result:
◼ [1] "abc"

◼ [1] 12.5

◼ [1] 63

◼ [1] TRUE

◼ [1] 2+3i

◼ [1] 68 65 6c 6c 6f

Multiple Elements Vector
◼ Using colon operator with numeric data
# Creating a sequence from 5 to 13.
v <- 5:13
print(v)
# Creating a sequence from 6.6 to 12.6.
v <- 6.6:12.6
print(v)
# If the final element specified does not belong to the sequence,
# it is discarded.
v <- 3.8:11.4
print(v)
◼ When we execute the above code, it produces the following
result:
❑ [1] 5 6 7 8 9 10 11 12 13
❑ [1] 6.6 7.6 8.6 9.6 10.6 11.6 12.6
❑ [1] 3.8 4.8 5.8 6.8 7.8 8.8 9.8 10.8
◼ Using the seq() function
# Create vector with elements from 5 to 9 incrementing by 0.4.
print(seq(5, 9, by=0.4))
◼ When we execute the above code, it produces the following
result:
◼ [1] 5.0 5.4 5.8 6.2 6.6 7.0 7.4 7.8 8.2 8.6 9.0

◼ Using the c() function
Non-character values are coerced to character type if one of the
elements is a character.
# The logical and numeric values are converted to characters.
s <- c('apple','red',5,TRUE)
print(s)
◼ When we execute the above code, it produces the following
result:
◼ [1] "apple" "red" "5" "TRUE"
Accessing Vector Elements
◼ The [ ] brackets are used for indexing.
❑ Indexing starts at position 1.
❑ A negative index drops that element from the result.
❑ A logical vector (TRUE/FALSE) can also be used for indexing; numeric 0 and 1
are treated as positions, not as FALSE and TRUE.
◼ # Accessing vector elements using position.
t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
print(t[1])
print(t[-1])
u <- t[c(2,3,6)]
print(u)

◼ [1] "Sun"
◼ [1] "Mon" "Tue" "Wed" "Thurs" "Fri" "Sat"
◼ [1] "Mon" "Tue" "Fri"
◼ # Accessing vector elements using logical indexing.
v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)

When we execute the above code, it produces the following result:


◼ [1] "Sun" "Fri"

◼ # Accessing vector elements using negative indexing.


x <- t[c(-2,-5)]
print(x)

When we execute the above code, it produces the following result:


◼ [1] "Sun" "Tue" "Wed" "Fri" "Sat"
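
◼ A few further indexing cases may be worth trying. The sketch below uses a vector named days (an illustrative name, to avoid reusing t) to show that a short logical index is recycled, that a numeric 0 selects nothing, and that a comparison can generate the logical index.

```r
days <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")

# A short logical index is recycled: keep every other element
print(days[c(TRUE, FALSE)])    # "Sun" "Tue" "Thurs" "Sat"

# Numeric 0 is a position that selects nothing, so only element 1 is returned
print(days[c(0, 1)])           # "Sun"

# A comparison produces a logical index
print(days[nchar(days) > 3])   # "Thurs"
```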

Vector Manipulation
◼ Vector Arithmetic
❑ Two vectors of same length can be added, subtracted,
multiplied or divided giving the result as a vector
output.

# Create two vectors.


v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11,0,8,1,2)
# Vector addition.
add.result <- v1+v2
print(add.result)

# Vector subtraction.
sub.result <- v1-v2
print(sub.result)
# Vector multiplication.
multi.result <- v1*v2
print(multi.result)
# Vector division.
divi.result <- v1/v2
print(divi.result)

When we execute the above code, it produces the following result:


◼ [1] 7 19 4 13 1 13

◼ [1] -1 -3 4 -3 -1 9

◼ [1] 12 88 0 40 0 22

◼ [1] 0.7500000 0.7272727 Inf 0.6250000 0.0000000 5.5000000

◼ Vector Element Recycling
❑ If we apply arithmetic operations to two vectors of
unequal length, then the elements of the shorter
vector are recycled to complete the operations.
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11)
# v2 becomes c(4,11,4,11,4,11)
add.result <- v1+v2
print(add.result)
sub.result <- v1-v2
print(sub.result)
When we execute the above code, it produces the
following result:
❑ [1] 7 19 8 16 4 22
❑ [1] -1 -3 0 -6 -4 0
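
◼ In the example above the longer length (6) is a multiple of the shorter one (2). When it is not, R still recycles but issues a warning. The sketch below uses an illustrative vector v3 of length 4 and catches the warning so the message can be seen alongside the result.

```r
v1 <- c(3, 8, 4, 5, 0, 11)
v3 <- c(1, 2, 3, 4)   # length 4 does not divide length 6

# v3 is recycled as c(1, 2, 3, 4, 1, 2); R warns about the partial recycling
result <- withCallingHandlers(
  v1 + v3,
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(result)   # 4 10 7 9 1 13
```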
◼ Vector Element Sorting
❑ Elements in a vector can be sorted using the sort()

function.
v <- c(3,8,4,5,0,11, -9, 304)
# Sort the elements of the vector.
sort.result <- sort(v)
print(sort.result)
# Sort the elements in the reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)
# Sorting character vectors.
v <- c("Red","Blue","yellow","violet")
sort.result <- sort(v)
print(sort.result)
# Sorting character vectors in reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)
◼ When we execute the above code, it
produces the following result:
❑ [1] -9 0 3 4 5 8 11 304
❑ [1] 304 11 8 5 4 3 0 -9
❑ [1] "Blue" "Red" "violet" "yellow"
❑ [1] "yellow" "violet" "Red" "Blue"

R-Lists
◼ Lists are R objects that can hold elements of different types, such as
numbers, strings, vectors and even other lists.
◼ A list can also contain a matrix or a function as an element. A list is
created using the list() function.

◼ Creating a List
◼ Following is an example that creates a list containing strings, numbers,
a vector and a logical value.

# Create a list containing strings, numbers, a vector and a logical value.
list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1)
print(list_data)

◼ When we execute the above code, it produces the following result:
◼ [[1]]
◼ [1] "Red"
◼ [[2]]
◼ [1] "Green"
◼ [[3]]
◼ [1] 21 32 11
◼ [[4]]
◼ [1] TRUE
◼ [[5]]
◼ [1] 51.23
◼ [[6]]
◼ [1] 119.1

Naming List Elements
◼ The list elements can be given names and they can be
accessed using these names.
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8),
nrow=2), list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st_Quarter", "A_Matrix", "A_Inner_list")
# Alternatively, name the elements while creating the list.
list_data <- list('1st_Quarter'=c("Jan","Feb","Mar"), A_Matrix
=matrix(c(3,9,5,1,-2,8), nrow=2), A_Inner_list=list("green",12.3))
# Show the list.
print(list_data)

When we execute the above code, it produces the following result:
◼ $`1st_Quarter`

◼ [1] "Jan" "Feb" "Mar"

◼ $A_Matrix

◼ [,1] [,2] [,3]

◼ [1,] 3 5 -2

◼ [2,] 9 1 8

◼ $A_Inner_list

◼ $A_Inner_list[[1]]

◼ [1] "green"

◼ $A_Inner_list[[2]]

◼ [1] 12.3

Accessing List Elements
◼ Elements of the list can be accessed by the index of the element in
the list.
◼ In case of named lists it can also be accessed using the names.
◼ We continue to use the list in the above example:
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow=2),
list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st_Quarter", "A_Matrix", "A_Inner_list")
# Access the first element of the list.
print(list_data[1])
# Access the third element. As it is also a list, all its elements will be printed.
print(list_data[3])
# Access the list element using the name of the element.
print(list_data$'1st_Quarter')
print(list_data$A_Matrix)
◼ When we execute the above code, it produces the
following result:
❑ $`1st_Quarter`
❑ [1] "Jan" "Feb" "Mar"
❑ $A_Inner_list
❑ $A_Inner_list[[1]]
❑ [1] "green"
❑ $A_Inner_list[[2]]
❑ [1] 12.3
❑ [1] "Jan" "Feb" "Mar"
❑ [,1] [,2] [,3]
❑ [1,] 3 5 -2
❑ [2,] 9 1 8
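Single brackets return a sub-list, while double brackets (or $) return the element itself. The sketch below illustrates the difference on the same list; the variable names sub_list and months are only illustrative.

```r
# Same structure as list_data in the example above.
list_data <- list(c("Jan", "Feb", "Mar"),
                  matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
names(list_data) <- c("1st_Quarter", "A_Matrix", "A_Inner_list")

sub_list <- list_data[1]     # single brackets: a list of length 1
months   <- list_data[[1]]   # double brackets: the character vector itself

print(class(sub_list))   # "list"
print(class(months))     # "character"

# Double brackets also accept a name, and the result can be indexed further.
print(list_data[["A_Matrix"]][2, 3])   # element in row 2, column 3
print(list_data$A_Inner_list[[1]])     # first element of the inner list
```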

Manipulating List Elements
◼ We can add, delete and update list elements as shown
below.
◼ Assigning to a new index adds an element at the end of a list; to
insert at another position, use append() with its after argument.
Any existing element can be updated.

# Create a list containing a vector, a matrix and a list.


list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8),
nrow=2), list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

#Add element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])
# Remove the last element.
list_data[4] <- NULL
# Print the 4th Element.
print(list_data[4])
# Update the 3rd Element.
list_data[3] <- "updated element"
print(list_data[3])

When we execute the above code, it produces the following result:


◼ [[1]]
◼ [1] "New element"
◼ $<NA>
◼ NULL
◼ $`A Inner list`
◼ [1] "updated element"
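Elements can also be inserted in the middle of a list with the base function append(). This is a small sketch on a made-up list (lst is not from the slides):

```r
lst <- list("a", "b", "c")   # hypothetical example list

# Insert a new element after position 1; the later elements shift right.
lst2 <- append(lst, list("new"), after = 1)
print(unlist(lst2))

# Remove the element at position 2 again by assigning NULL.
lst2[2] <- NULL
print(length(lst2))
```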
Merging Lists
◼ You can merge several lists into one list by passing all the
lists to the c() function. Wrapping them in list() instead would
create a nested list.

# Create two lists.


list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")
# Merge the two lists.
merged.list <- c(list1,list2)
# Print the merged list.
print(merged.list)

◼ When we execute the above code, it produces the
following result :
❑ [[1]]
❑ [1] 1
❑ [[2]]
❑ [1] 2
❑ [[3]]
❑ [1] 3
❑ [[4]]
❑ [1] "Sun"
❑ [[5]]
❑ [1] "Mon"
❑ [[6]]
❑ [1] "Tue"

Converting List to Vector
◼ A list can be converted to a vector so that the elements of the vector
can be used for further manipulation.
◼ All the arithmetic operations on vectors can be applied after the list is
converted into vectors.
◼ To do this conversion, we use the unlist() function. It takes the list as
input and produces a vector.

# Create lists.
list1 <- list(1:5)
print(list1)
list2 <-list(10:14)
print(list2)
# Convert the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)
print(v1)
print(v2)
# Now add the vectors
result <- v1+v2
print(result)

When we execute the above code, it produces the following result :


◼ [[1]]

◼ [1] 1 2 3 4 5

◼ [[1]]

◼ [1] 10 11 12 13 14

◼ [1] 1 2 3 4 5

◼ [1] 10 11 12 13 14

◼ [1] 11 13 15 17 19

R – Matrices
◼ Matrices are the R objects in which the elements are
arranged in a two-dimensional rectangular layout.

Example

◼ rownames() and colnames() allow us to add row names and/or
column names to this matrix.
◼ dimnames(): returns all dimension names

rownames(x) <- c("Row 1", "Row 2", "Row 3")


colnames(x) <- c("Col A", "Col B", "Col C")
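The two assignments above assume that a 3x3 matrix x already exists. A self-contained sketch, with x built from made-up values 1 to 9, could look like this:

```r
# Hypothetical 3x3 matrix; byrow = TRUE fills it row by row.
x <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

rownames(x) <- c("Row 1", "Row 2", "Row 3")
colnames(x) <- c("Col A", "Col B", "Col C")
print(x)

# dimnames() returns a list: row names first, then column names.
print(dimnames(x))

# Once named, elements can be selected by name as well as by position.
print(x["Row 2", "Col C"])
```

With byrow = FALSE (the default) the same values would be filled column by column instead.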

Matrix metrics
# Create a 3x3 matrix
A = matrix( c(1, 2, 3, 4, 5, 6, 7, 8, 9),nrow = 3, ncol = 3, byrow = TRUE )
cat("The 3x3 matrix:\n")
print(A)

cat("Dimension of the matrix:\n")


print(dim(A))

cat("Number of rows:\n")
print(nrow(A))

cat("Number of columns:\n")
print(ncol(A))

cat("Number of elements:\n")
print(length(A))

Accessing Elements of a Matrix
◼ Elements of a matrix can be accessed by using the
row and column index of the element. We use the
matrix A created above to find the specific elements below.

# Accessing first and second column
cat("Accessing first and second column\n")
print(A[, 1:2])
# Accessing first and second row
cat("Accessing first and second row\n")
print(A[1:2, ])
# Accessing the first three rows and the first two columns
cat("Accessing the first three rows and the first two columns\n")
print(A[1:3, 1:2])
Matrix Computations
◼ Various mathematical operations are performed on the matrices
using the R operators. The result of the operation is also a matrix.
◼ For element-wise operations (+, -, *, /), the matrices involved must
have the same dimensions (number of rows and columns).
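As a minimal sketch of these element-wise operations, the code below uses two made-up 2x3 matrices (m1 and m2 are illustrative names and values):

```r
# Two hypothetical 2x3 matrices, filled column by column.
m1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
m2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)

add.result  <- m1 + m2   # element-wise addition
sub.result  <- m1 - m2   # element-wise subtraction
mult.result <- m1 * m2   # element-wise product, not the matrix product
div.result  <- m1 / m2   # element-wise division; dividing by 0 gives Inf or -Inf

print(add.result)
print(sub.result)
print(mult.result)
print(div.result)
```

Note that * multiplies matching elements; the algebraic matrix product uses the %*% operator instead.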

Editing matrix
# Create a 3x3 matrix
A = matrix( c(1, 2, 3, 4, 5, 6, 7, 8, 9),nrow = 3, ncol = 3, byrow = TRUE )
cat("The 3x3 matrix:\n")
print(A)

# Edit the element in the 3rd row and 3rd column
# from 9 to 30 by direct assignment
A[3, 3] = 30
cat("After editing the matrix\n")
print(A)

Deleting rows and columns of a Matrix
◼ To delete a row or a column, index the matrix with a negative sign
before the number of that row or column.
◼ A negative index excludes that row or column from the result; assign
the result back to the variable to keep the change.

cat("Before deleting the 2nd row\n")


print(A)
# 2nd-row deletion
A = A[-2, ]
cat("After deleting the 2nd row\n")
print(A)

# 2nd-column deletion
A = A[, -2]
cat("After deleting the 2nd column\n")
print(A)
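One caveat worth knowing: if a deletion leaves only a single row or column, R simplifies the result to a plain vector unless drop = FALSE is given. A short sketch on a fresh 3x3 matrix (B is an illustrative name):

```r
B <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

# Deleting rows 2 and 3 leaves one row, which R simplifies to a vector.
as_vector <- B[-c(2, 3), ]
print(is.matrix(as_vector))

# drop = FALSE keeps the result as a 1x3 matrix.
as_matrix <- B[-c(2, 3), , drop = FALSE]
print(dim(as_matrix))
```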
Finding the sum of rows, columns, and total in a
matrix in R
◼ The sum of each row, the sum of each column, and the total of a matrix
can be found with the functions rowSums(), colSums(), and sum()
respectively.
◼ Example:

M1 <- matrix(1:25, nrow = 5)
print(M1)

Output:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25
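Applying the three functions to the same matrix M1 can be sketched as follows (the result variable names are illustrative):

```r
M1 <- matrix(1:25, nrow = 5)

row.totals  <- rowSums(M1)   # one sum per row
col.totals  <- colSums(M1)   # one sum per column
grand.total <- sum(M1)       # sum of all elements

print(row.totals)
print(col.totals)
print(grand.total)
```

For this matrix the row sums are 55, 60, 65, 70 and 75, the column sums are 15, 40, 65, 90 and 115, and the total is 325.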
Calculations Across Array Elements

◼ We can do calculations across the elements in
an array using the apply() function, called as apply(X, MARGIN, FUN).

• X: the input array (a matrix is a two-dimensional array).
• MARGIN: if the margin is 1 the function is applied across each row,
if the margin is 2 it is applied across each column.
• FUN: the function that is to be applied
on the input data.
Example
◼ We use the apply() function below to calculate the sum
of the elements in each row and the mean of each column
of a matrix.
# create sample data
sample_matrix <- matrix(1:10,nrow=3, ncol=10)
print( "sample matrix:")
sample_matrix

# Use apply() function across row to find sum


print("sum across rows:")
apply( sample_matrix, 1, sum)

# use apply() function across column to find mean


print("mean across columns:")
apply( sample_matrix, 2, mean)
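apply() also accepts a user-written function. The sketch below, on a small made-up matrix m, computes the range (max minus min) of every row and checks that apply() with sum agrees with rowSums():

```r
m <- matrix(1:6, nrow = 2)   # hypothetical 2x3 matrix: rows are 1 3 5 and 2 4 6

# Anonymous function applied to each row (MARGIN = 1).
row.range <- apply(m, 1, function(r) max(r) - min(r))
print(row.range)

# apply(m, 1, sum) gives the same values as rowSums(m).
print(all(apply(m, 1, sum) == rowSums(m)))
```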
Product of matrix

mat <- matrix(1:4, nrow = 2)  # example square matrix
product <- mat %*% mat
print("Product of matrices")
print(product)

Lab Program 1

◼ 1) Perform the following using R
a. Create a vector of multiples of 5 using the sequence function
starting from 5 to 60 and display its values. Create a filtered vector
whose values are greater than 15 and less than 45.
b. Create a list containing a vector, a matrix, and another list. Display
the class of each element in the list. Count the number of objects in
a given list and find the length of the first vector of a given list.
c. Create matrices in R and perform Addition, Subtraction of two
matrices and product of matrix and its transpose.
d. Calculate the Column sum, Mean across rows, Total Sum of a
matrix and Sort the matrix elements across columns in ascending
order.
Solution:
◼ (a) Create a vector of multiples of 5 using the sequence function starting from
5 to 60 and display its values. Create filtered vector whose values are greater
than 15 and less than 45.
# Creating a vector with multiples of 5 using seq
vector <- seq(5, 60, by = 5)

# Displaying the values vector


cat("Value of vector:", vector, "\n")

# Logical indexing to find elements greater than 15 and less than 45


filtered_vector <- vector[vector > 15 & vector < 45]

# Displaying the filtered vector


cat("Elements greater than 15 and less than 45:", filtered_vector, "\n")

◼ (b) Create a list containing a vector, a matrix, and another list. Display the
class of each element in the list. Count the number of objects in a given list and
find the length of the first vector of a given list.
# Creating a vector
my_vector <- c(1, 2, 3, 4, 5)
# Creating a matrix
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
# Creating a nested list
my_nested_list <- list( sub_vector = c("a", "b", "c"), sub_number = 42)
# Creating the main list and naming the elements
my_list <- list(vector = my_vector,matrix = my_matrix, nested_list = my_nested_list)

# Displaying the class of each element in the list


for (name in names(my_list)) {
element_class <- class(my_list[[name]])
cat("Class of", name, ":", element_class, "\n")
}
print("Number of objects in the said list:")
length(my_list)
print("Length of the vector 'vector' of the said list")
print(length(my_list$vector))
◼ (c) Create matrices in R and perform Addition, Subtraction of two matrices
and product of matrix and its transpose.
#Create matrices
matrix1 <- matrix(c(2,1,1,1,1,-1,1,1,2), nrow=3,ncol=3)
print(matrix1)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4,2,3,2), nrow=3,ncol=3)
print(matrix2)
# Add the matrices.
result <- matrix1 + matrix2
cat("Result of addition","\n")
print(result)
# Subtract the matrices
result <- matrix1 - matrix2
cat("Result of subtraction","\n")
print(result)
# Product of matrix and its transpose.
result <- matrix1 %*% t(matrix1)
cat("Result of product of matrix and its transpose","\n")
print(result)
◼ (d)Calculate the Column sum, Mean across rows, Total Sum of a matrix and
Sort the matrix elements across columns in ascending order.
print("Column sum:")
colSums(matrix1)
print("Mean across rows:")
apply( matrix1, 1, mean)
print("Total sum:")
sum(matrix1)
print("Matrix elements in sorted order column wise:")
apply(matrix1, 2, sort)

R – Decision Making
◼ Decision making structures require the programmer to specify one
or more conditions to be evaluated or tested by the program, along
with a statement or statements to be executed if the condition is
determined to be true, and optionally, other statements to be
executed if the condition is determined to be false.
◼ Following is the general form of a typical decision-making structure
found in most of the programming languages:

◼ R provides the following types of decision-making
statements.

R - If Statement
◼ An if statement consists of a Boolean expression
followed by one or more statements.
Syntax
◼ The basic syntax for creating an if statement in R is:

if(boolean_expression) {
# statement(s) will execute if the Boolean expression is true.
}
◼ If the Boolean expression evaluates to be true, then the block of
code inside the if statement will be executed.
◼ If Boolean expression evaluates to be false, then the first set of
code after the end of the if statement (after the closing curly brace)
will be executed.

Example:
x <- 30L
if(is.integer(x)){
print("x is an Integer")
}

◼ When the above code is compiled and executed, it


produces the following result:

❑ [1] "x is an Integer"

R – If...Else Statement
◼ An if statement can be followed by an optional else
statement which executes when the Boolean expression
is false.
Syntax
◼ The basic syntax for creating an if...else statement in R
is:
if(boolean_expression) {
# statement(s) will execute if the Boolean expression is true.
} else {
# statement(s) will execute if the Boolean expression is false.
}
◼ If the Boolean expression evaluates to be true, then the if block of code will
be executed, otherwise else block of code will be executed.
Example 1:
x <- c("what","is","truth")
if("Truth" %in% x){
print("Truth is found")
} else {
print("Truth is not found")
}
◼ When the above code is compiled and executed, it
produces the following result:

[1] "Truth is not found"

Example 2:

x <- -5
if(x >= 0){
print("Non-negative number")
} else {
print("Negative number")
}

The if...else if...else Statement
◼ An if statement can be followed by an optional else
if...else statement, which is very useful to test various
conditions using single if...else if statement.
◼ When using if, else if, else statements there are few
points to keep in mind.
❑ An if can have zero or one else and it must come after
any else if's.
❑ An if can have zero to many else if's and they must
come before the else.
❑ Once an else if succeeds, none of the remaining else
if's or else's will be tested.

Syntax:
◼ The basic syntax for creating an if...else if...else statement in R is:

if(boolean_expression 1) {
# Executes when the boolean expression 1 is true.
}else if( boolean_expression 2) {
# Executes when the boolean expression 2 is true.
}else if( boolean_expression 3) {
#Executes when the boolean expression 3 is true.
}else {
# executes when none of the above condition is true.
}

Example 1
x <- c("what","is","truth")
if("Truth" %in% x){
print("Truth is found the first time")
} else if ("truth" %in% x) {
print("truth is found the second time")
} else {
print("No truth found")
}
◼ When the above code is compiled and executed, it
produces the following result:

[1] "truth is found the second time"


◼ Example 2:
x <- 0
if (x < 0) {
print("Negative number")
} else if (x > 0) {
print("Positive number")
} else
print("Zero")
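The same chain can be wrapped in a function and applied to several values. This is only a sketch; the function name classify and the test values are illustrative.

```r
# if / else if / else returns the value of the branch that runs.
classify <- function(x) {
  if (x < 0) {
    "Negative number"
  } else if (x > 0) {
    "Positive number"
  } else {
    "Zero"
  }
}

# if() tests a single value, so sapply() is used to call the
# function once per element of the vector.
labels <- sapply(c(-5, 0, 7), classify)
print(labels)
```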

R – Switch Statement
◼ A switch statement allows a variable to be tested for equality against
a list of values.
◼ Each value is called a case, and the variable being switched on is
checked for each case.
1) Based on Index
❑ If the cases are unnamed values, such as character strings, and the expression evaluates to
a number, then the expression's result is used as an index to select the case.
2) Based on Matching Value
❑ When the cases are given as name-value pairs like "case_1"="value1",
the expression value is matched against the case names. If there is a match with
a case, the corresponding value is the output.
Syntax:
◼ The basic syntax for creating a switch statement in R is :

switch(expression, case1, case2, case3....)

◼ The following rules apply to a switch statement:
❑ If the expression evaluates to a character string, it is
matched exactly against the names of the listed cases.
❑ If the expression is not a character string, it is
coerced to an integer and used as an index.
❑ For multiple matches, the first matching element is
used.
❑ R's switch has no default keyword; instead, one unnamed
case can be supplied, and it is used when no named case matches.
❑ If nothing matches and no unnamed case is given, switch
returns NULL invisibly.

Example 1
x <- switch( 3, "Shubham", "Nishka", "Gunjan", "Sumit" )
print(x)
When the above code is compiled and executed, it
produces the following result:
[1] "Gunjan"

Example 2

y = "18"
x = switch(
y,
"9"="Hello Arpita",
"12"="Hello Vaishali",
"18"="Hello Nishka",
"21"="Hello Shubham"
)

print (x)
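A fallback case and shared cases can be sketched as follows. The function day_type and its labels are made up for illustration; the empty case ("Sat" = ,) falls through to the next case that has a value.

```r
day_type <- function(day) {
  switch(day,
         "Sat" = ,            # empty case: falls through to the next value
         "Sun" = "Weekend",
         "Weekday")           # unnamed last case: used when nothing matches
}

print(day_type("Sat"))
print(day_type("Mon"))

# An out-of-range index with no fallback yields NULL.
no_match <- switch(5, "a", "b")
print(is.null(no_match))
```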
R – Loops
◼ There may be a situation when you need to execute a block of code
several times. In general, statements are executed
sequentially: the first statement in a function is executed first,
followed by the second, and so on.
◼ Programming languages provide various control structures that
allow for more complicated execution paths.
◼ A loop statement allows us to execute a statement or group of
statements multiple times and the following is the general form of a
loop statement in most of the programming languages:

◼ R programming language provides the following kinds of
loop to handle looping requirements: repeat, while and
for.

R - Repeat Loop
◼ The Repeat loop executes the same code again and
again until a stop condition is met.
Syntax
◼ The basic syntax for creating a repeat loop in R is:

repeat {
commands
if(condition) {
break
}
}

Example
v <- c("Hello","loop")
cnt <- 2
repeat{
print(v)
cnt <- cnt+1
if(cnt > 5){
break
}
}
◼ When the above code is compiled and executed, it produces the
following result:
❑ [1] "Hello" "loop"
❑ [1] "Hello" "loop"
❑ [1] "Hello" "loop"
❑ [1] "Hello" "loop"
R - While Loop
◼ The while loop executes the same code again and
again as long as its test condition remains TRUE; the
condition is checked before each iteration.
Syntax
◼ The basic syntax for creating a while loop in R is :

while (test_expression) {
statement
}

Example
v <- c("Hello","while loop")
cnt <- 2
while (cnt < 7){
print(v)
cnt = cnt + 1
}
◼ When the above code is compiled and executed, it produces the
following result :
❑ [1] "Hello" "while loop"
❑ [1] "Hello" "while loop"
❑ [1] "Hello" "while loop"
❑ [1] "Hello" "while loop"
❑ [1] "Hello" "while loop"

R – For Loop
◼ A for loop is a repetition control structure that allows you
to efficiently write a loop that needs to execute a specific
number of times.
◼ R loops over the entire vector, element by
element.

Syntax

◼ The basic syntax for creating a for loop statement in R is:


for (value in vector) {
statements
}
◼ R’s for loops are particularly flexible in that they are not limited to
integers, or even numbers in the input.
◼ We can pass character vectors, logical vectors, lists or expressions.
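As a brief sketch of this flexibility, the loops below run over a character vector and over a list with mixed types (fruits and mixed are made-up data):

```r
fruits <- c("apple", "banana", "kiwi")   # hypothetical character vector
for (f in fruits) {
  cat(f, "has", nchar(f), "characters\n")
}

mixed <- list(1L, "two", TRUE)           # hypothetical list of mixed types
classes <- character(0)
for (item in mixed) {
  # Each iteration receives the element itself, not a sub-list.
  classes <- c(classes, class(item))
}
print(classes)
```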

Example
for (val in 1: 5)
{
print(val)
}
When the above code is compiled and executed, it
produces the following result:
❑ [1] 1

❑ [1] 2

❑ [1] 3

❑ [1] 4

❑ [1] 5

Loop Control Statements
◼ Loop control statements change execution from its
normal sequence. A loop in R does not create a new scope,
so variables assigned inside a loop remain available after
the loop ends.
◼ R supports the following control statements: break and next.

R – Break Statement

◼ The break statement in R programming language has
the following usage:
❑ When the break statement is encountered inside a loop, the loop
is immediately terminated and program control resumes at the
next statement following the loop.
◼ Syntax
◼ The basic syntax for creating a break statement in R is:

break

Example
v <- c("Hello","loop")
cnt <- 2
repeat{
print(v)
cnt <- cnt+1
if(cnt > 5){
break
}
}
◼ When the above code is compiled and executed, it produces the
following result:
❑ [1] "Hello" "loop"
❑ [1] "Hello" "loop"
❑ [1] "Hello" "loop"
❑ [1] "Hello" "loop"
R – Next Statement
◼ The next statement in R programming language is useful
when we want to skip the current iteration of a loop
without terminating it.
◼ On encountering next, the R interpreter skips the rest of the
current iteration and starts the next iteration of the loop.

◼ Syntax
next

Example
v <- LETTERS[1:6]
for ( i in v){
if (i == "D"){
next
}
print(i)
}
◼ When the above code is compiled and executed, it produces the
following result:
❑ [1] "A"
❑ [1] "B"
❑ [1] "C"
❑ [1] "E"
❑ [1] "F"
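break and next can be combined in one loop. The sketch below (with illustrative names and limits) collects odd numbers and stops once a value exceeds 7:

```r
odds <- c()
for (i in 1:10) {
  if (i %% 2 == 0) {
    next      # skip even numbers and continue with the next i
  }
  if (i > 7) {
    break     # leave the loop entirely at the first odd number above 7
  }
  odds <- c(odds, i)
}
print(odds)
```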

R – Function
◼ A function is a set of statements organized together to perform a
specific task. R has a large number of in-built functions and the user
can create their own functions.
◼ In R, a function is an object so the R interpreter is able to pass
control to the function, along with arguments that may be necessary
for the function to accomplish the actions.
◼ The function in turn performs its task and returns control to the
interpreter as well as any result which may be stored in other
objects.
◼ Function Definition

◼ An R function is created by using the keyword function. The basic
syntax of an R function definition is as follows:
function_name <- function(arg_1, arg_2, ...) {
# Function body
}
Function Components
◼ The different parts of a function are:

❑ Function Name: This is the actual name of the function. It is
stored in R environment as an object with this name.
❑ Arguments: An argument is a placeholder. When a function is
invoked, you pass a value to the argument. Arguments are
optional; that is, a function may contain no arguments. Also
arguments can have default values.
❑ Function Body: The function body contains a collection of
statements that defines what the function does.
❑ Return Value: The return value of a function is the last
expression in the function body to be evaluated.


◼ R has many in-built functions which can be directly called in the
program without defining them first.
◼ We can also create and use our own functions referred as user
defined functions.
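The components listed above can be seen in a small sketch. The function rect_area and its arguments are made up for illustration.

```r
# Function name: rect_area; arguments: width and height (height has a default).
rect_area <- function(width, height = 1) {
  area <- width * height   # function body
  area                     # last evaluated expression is the return value
}

a1 <- rect_area(3, 4)   # both arguments supplied
a2 <- rect_area(5)      # height falls back to its default of 1
print(a1)
print(a2)

# The argument names can be inspected with formals().
print(names(formals(rect_area)))
```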

Built-in Function
◼ Simple examples of in-built functions are seq(), mean(), max(),
sum() and paste().
◼ They are directly called by user written programs.
# Create a sequence of numbers from 32 to 44.
print(seq(32,44))
# Find mean of numbers from 25 to 82.
print(mean(25:82))
# Find sum of numbers from 41 to 68.
print(sum(41:68))

When we execute the above code, it produces the following result:


❑ [1] 32 33 34 35 36 37 38 39 40 41 42 43 44
❑ [1] 53.5
❑ [1] 1526

User-defined Function
◼ We can create user-defined functions in R.
◼ They are specific to what a user wants and once created
they can be used like the built-in functions.
◼ Below is an example of how a function is created and
used.
# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
for(i in 1:a) {
b <- i^2
print(b)
}
}
Calling a Function
# Call the function new.function supplying 6 as an argument.
new.function(6)

◼ When we execute the above code, it produces the


following result:
❑ [1] 1

❑ [1] 4

❑ [1] 9

❑ [1] 16

❑ [1] 25

❑ [1] 36

Calling a Function without an Argument
# Create a function without an argument.
new.function <- function() {
for(i in 1:5) {
print(i^2)
}
}
# Call the function without supplying an argument.
new.function()
◼ When we execute the above code, it produces the following result:
❑ [1] 1
❑ [1] 4
❑ [1] 9
❑ [1] 16
❑ [1] 25

Calling a Function with Argument Values (by
position and by name)
◼ The arguments to a function call can be supplied in the
same sequence as defined in the function or they can be
supplied in a different sequence but assigned to the
names of the arguments.

# Create a function with arguments.


new.function <- function(a,b,c) {
result <- a*b+c
print(result)
}

# Call the function by position of arguments.
new.function(5,3,11)
# Call the function by names of the arguments.
new.function(a=11,b=5,c=3)
◼ When we execute the above code, it produces the
following result:
◼ [1] 26

◼ [1] 58

Calling a Function with Default Argument
◼ We can define the value of the arguments in the function definition
and call the function without supplying any argument to get the
default result. But we can also call such functions by supplying new
values of the argument and get non default result.
# Create a function with arguments.
new.function <- function(a = 3,b =6) {
result <- a*b
print(result)
}
# Call the function without giving any argument.
new.function()
# Call the function with giving new values of the argument.
new.function(9,5)
◼ When we execute the above code, it produces the following result:
❑ [1] 18

❑ [1] 45

Lazy Evaluation of Function
◼ Arguments to functions are evaluated lazily, which means they
are evaluated only when needed by the function body.
# Create a function with arguments.
new.function <- function(a, b) {
print(a^2)
print(a)
print(b)
}
# Evaluate the function without supplying one of the arguments.
new.function(6)
◼ When we execute the above code, it produces the following
result:
◼ [1] 36

◼ [1] 6

◼ Error in print(b) : argument "b" is missing, with no default
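Another way to see lazy evaluation is to pass an argument that would fail if it were ever evaluated. In this sketch (lazy_demo is an illustrative name) the function never uses b, so the call succeeds:

```r
lazy_demo <- function(a, b) {
  a * 2   # b is never used, so it is never evaluated
}

# stop() would raise an error if evaluated, but the call returns normally.
result <- lazy_demo(5, stop("b was evaluated"))
print(result)
```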

Lab Program 2

◼ 2) Write an R program for the following


a. Check the number is even or odd.
b. Print squares of numbers in sequence.
c. Create vector of integers and sort them in
ascending/descending order.

Solution:
(a) Check the number is even or odd
even_odd <-function(a ) #Function Definition
{
if(a %% 2 == 0)
{
print("The number is Even Number")
}
else
{
print("The number is Odd Number")
}
}

print("Enter the number to be checked")


a <- scan(nlines=1) # scan() reads the number typed at the console
even_odd(a) #Function Call

(b) Print squares of numbers in sequence.

sqr <- function (n)


{
print ("The Square of Numbers is:")
for(i in 0:n)
print(i^2)
}

print("Enter the Range:")


n <- scan(nlines=1)
sqr(n)

(c) Create vector of integers and sort them in
ascending/descending order.

srt<-function(a){
v<-sort(a, decreasing = TRUE)
print("DESCENDING ORDER")
print(v)
x<-sort(a, decreasing = FALSE)
print("ASCENDING ORDER")
print(x)
}

a <-scan(nlines=6)
srt(a)

R – Strings
◼ Any value written within a pair of single quotes or double quotes in R is treated as a
string. R displays every string within double quotes, even when you create
it with single quotes.

◼ Rules Applied in String Construction


❑ The quotes at the beginning and end of a string should be both double quotes or
both single quote. They can not be mixed.

❑ Double quotes can be inserted into a string starting and ending with single quote.

❑ Single quote can be inserted into a string starting and ending with double quotes.

❑ Double quotes can not be inserted into a string starting and ending with double
quotes, unless each one is escaped with a backslash (\").

❑ Single quote can not be inserted into a string starting and ending with single
quote, unless it is escaped with a backslash (\').
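Escaping a quote with a backslash can be sketched as follows (s1 and s2 are made-up strings). print() shows the escape sequence, while cat() shows the raw text:

```r
s1 <- "She said \"hello\""   # double quotes inside a double-quoted string
s2 <- 'It\'s fine'           # single quote inside a single-quoted string

print(s1)        # print() displays the backslash escapes
cat(s1, "\n")    # cat() displays the text as stored
cat(s2, "\n")

# The backslash is not part of the string, so it is not counted.
print(nchar(s1))
print(nchar(s2))
```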

Examples of Valid Strings
◼ a<- 'Start and end with single quote'
◼ print(a)

◼ b <- "Start and end with double quotes"

◼ print(b)

◼ c <- "single quote ' in between double quotes"

◼ print(c)

◼ d <- 'Double quotes " in between single quote'

◼ print(d)

When the above code is run we get the following output:


◼ [1] "Start and end with single quote"

◼ [1] "Start and end with double quotes"

◼ [1] "single quote ' in between double quotes"

◼ [1] "Double quotes \" in between single quote"

Examples of Invalid Strings
◼ e <- 'Mixed quotes"
◼ print(e)
◼ f <- 'Single quote ' inside single quote'
◼ print(f)
◼ g <- "Double quotes " inside double quotes"
◼ print(g)
◼ When we run the script, it fails with the following errors.
◼ ...: unexpected INCOMPLETE_STRING
◼ .... unexpected symbol
◼ 1: f <- 'Single quote ' inside
◼ unexpected symbol
◼ 1: g <- "Double quotes " inside
String Manipulation
Concatenating Strings - paste() function
◼ Many strings in R are combined using the paste()
function. It can take any number of arguments to be
combined together.
◼ The basic syntax for paste function is :

◼ paste(..., sep = " ", collapse = NULL)


◼ Following is the description of the parameters used:
❑ ... represents any number of arguments to be combined.
❑ sep is the separator placed between the arguments. It is
optional and defaults to a single space.
❑ collapse is an optional string used to join the elements of the result
into a single string. It does not change the spaces inside an individual string.

Example
a <- "Hello"
b <- 'How'
c <- "are you? "
print(paste(a,b,c))
print(paste(a,b,c, sep = "-"))
print(paste(a,b,c, sep = "", collapse = ""))
◼ When we execute the above code, it produces the
following result:
❑ [1] "Hello How are you? "
❑ [1] "Hello-How-are you? "
❑ [1] "HelloHoware you? "

◼ sep is the string placed between the corresponding elements of the arguments, whereas
collapse is the string used to join the elements of the resulting vector into
one single string.
whales <- c("C","D","C","D","D")
quails <- c("D","D","D","D","D")
paste(whales, quails)
paste(whales, quails, sep = "")
paste(whales, quails, sep = " ")
paste(whales, quails, sep = "", collapse = "")
paste(whales, quails, sep = "", collapse = " ")
paste(whales, quails, sep = " ", collapse = " ")
When we execute the above code, it produces the following result:
❑ [1] "C D" "D D" "C D" "D D" "D D"
❑ [1] "CD" "DD" "CD" "DD" "DD"
❑ [1] "C D" "D D" "C D" "D D" "D D"
❑ [1] "CDDDCDDDDD"
❑ [1] "CD DD CD DD DD"
❑ [1] "C D D D C D D D D D"
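A common use of collapse is turning a vector into one delimited string. The sketch below uses made-up values (fruits, labels) and also shows paste0(), which is paste() with sep = "":

```r
fruits <- c("apple", "banana", "cherry")

# collapse joins all elements of the vector into one string
joined <- paste(fruits, collapse = ", ")

# paste0() pastes without a separator; it is vectorised like paste()
labels <- paste0("item", 1:3)

# sep (here the empty string) is applied first, collapse afterwards
one_label <- paste0("item", 1:3, collapse = "+")

print(joined)
print(labels)
print(one_label)
```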
Counting number of characters in a string -
nchar() function
◼ This function counts the number of characters including spaces in a
string.
Syntax
◼ The basic syntax for nchar() function is :

◼ nchar(x)

◼ Following is the description of the parameters used:


❑ x is the vector input.
◼ Example
◼ result <- nchar("Count the number of characters")
◼ print(result)
◼ When we execute the above code, it produces the following
result:
◼ [1] 30
Changing the case - toupper() & tolower()
functions
◼ These functions change the case of characters of a
string.
Syntax
◼ The basic syntax for toupper() & tolower() function is :

◼ toupper(x)

◼ tolower(x)

◼ Following is the description of the parameters used:


❑ x is the vector input.

Example
# Changing to Upper case.
result <- toupper("Changing To Upper")
print(result)
# Changing to lower case.
result <- tolower("Changing To Lower")
print(result)
When we execute the above code, it produces the
following result:
◼ [1] "CHANGING TO UPPER"

◼ [1] "changing to lower"

Extracting parts of a string - substring() function
◼ This function extracts parts of a String.
Syntax
◼ The basic syntax for substring() function is :
❑ substring(x,first,last)

◼ Following is the description of the parameters used:


❑ x is the character vector input.
❑ first is the position of the first character to be extracted.
❑ last is the position of the last character to be extracted.

◼ Example
# Extract characters from 5th to 7th position.
result <- substring("Extract", 5, 7)
print(result)
◼ When we execute the above code, it produces the following result:
◼ [1] "act"
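substring() is vectorised over first and last, which is what the palindrome program later relies on. A small sketch (variable names are illustrative):

```r
word <- "Extract"

# One start/end position per character splits the string into letters
chars <- substring(word, 1:nchar(word), 1:nchar(word))

# substr() extracts a single piece between two positions
first3 <- substr(word, 1, 3)

# If last is omitted, substring() reads to the end of the string
tail_part <- substring(word, 5)

print(chars)
print(first3)
print(tail_part)
```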

Lab Program 3

◼ 3) Create an R program that performs the following operations on strings.
a. Take the user name as an input string, display the
number of characters present in the string, convert
the string into uppercase and display the middle
character of the string.
b. Create a function called is_palindrome() that
determines whether or not a given string is a
palindrome. The function should take a single
parameter.

◼ a) Take the user name as an input string, display the number of
characters present in the string, convert the string into
uppercase and display the middle character of the string.
midf<-function(str)
{
print(nchar(str))
print(toupper(str))
n1<-nchar(str)+1
mc<-substring(str,n1%/%2,(n1+1)%/%2)
print(mc)
}
name<- readline("Enter your name:")
midf(name)

(b) Create a function called is_palindrome() that determines
whether or not a given string is a palindrome. The function
should take a single parameter.

is_palindrome <- function(x){


a <- substring(x,seq(1,nchar(x)) , seq(1,nchar(x)))
paste(rev(a),sep="",collapse="") == paste(a,sep="",collapse="")
}

str<- readline("Enter string:")


print(is_palindrome(str))
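The function above compares the characters exactly, so "Madam" is reported as not a palindrome. A possible variant, assuming that case, spaces and punctuation should be ignored (is_palindrome_loose is a hypothetical name, not part of the lab solution):

```r
# Hypothetical variant: ignore case and every non-alphanumeric character.
is_palindrome_loose <- function(x) {
  cleaned <- tolower(gsub("[^[:alnum:]]", "", x))  # keep letters and digits only
  chars <- strsplit(cleaned, "")[[1]]              # split into single characters
  identical(chars, rev(chars))                     # same forwards and backwards?
}

print(is_palindrome_loose("Madam"))
print(is_palindrome_loose("A man, a plan, a canal: Panama"))
print(is_palindrome_loose("R language"))
```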

R – Data Frames
# Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11","2015-03-27")))

# Print the data frame.


print(emp.data)

Get the Structure of the Data Frame
◼ The structure of the data frame can be seen by using
str() function.

# Create the data frame.


emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11","2015-03-27")))

# Get the structure of the data frame.


str(emp.data)

Summary of Data in Data Frame
◼ The statistical summary and nature of the data can be
obtained by applying summary() function.

# Create the data frame.


emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11","2015-03-27")))

# Print the summary.


print(summary(emp.data))

◼ When we execute the above code, it produces the
following result:

Extract Data from Data Frame
◼ Extract specific columns from a data frame using the column
names.

# Create the data frame.


emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11","2015-03-27")))

# Extract Specific columns.


result <- data.frame(emp.data$emp_name,emp.data$salary)
print(result)

◼ Extract the first two rows and then all columns

# Create the data frame.


emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11","2015-03-27")))

# Extract first two rows.


result <- emp.data[1:2,]
print(result)

◼ Extract 3rd and 5th row with 2nd and 4th column

# Create the data frame.


emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11","2015-03-27")))

# Extract 3rd and 5th row with 2nd and 4th column.
result <- emp.data[c(3,5),c(2,4)]
print(result)
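Besides row and column positions, rows can be selected with a logical condition. A minimal sketch on a reduced copy of the employee data (emp, high and high2 are illustrative names):

```r
emp <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25)
)

# Logical index in the row position, column names in the column position
high <- emp[emp$salary > 700, c("emp_name", "salary")]
print(high)

# subset() expresses the same selection without repeating emp$
high2 <- subset(emp, salary > 700, select = c(emp_name, salary))
print(high2)
```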

Expand Data Frame

◼ A data frame can be expanded by adding columns and rows.

◼ Add Column
❑ Just add the column vector using a new column name.

# Create the data frame.
emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11","2015-03-27")))

# Add the "dept" column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
print(emp.data)
Or
emp.data <- cbind(emp.data, dept=c("IT","Operations","IT","HR","Finance"))
print(emp.data)

Add Row
◼ To add more rows permanently to an existing data frame, we need
to bring in the new rows in the same structure as the existing data
frame and use the rbind() function.
◼ In the example below we create a data frame with new rows and
merge it with the existing data frame to create the final data frame.

# Create the first data frame.


emp.data <- data.frame(
emp_id = c (1:5),
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11","2015-03-27")),
dept=c("IT","Operations","IT","HR","Finance"))

# Create the second data frame
emp.newdata <- data.frame(
emp_id = c (6:8),
emp_name = c("Rasmi","Pranab","Tusar"),
salary = c(578.0,722.5,632.8),
start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
dept = c("IT","Operations","Finance"))

# Bind the two data frames.


emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)

Merging Data Frames
◼ We can merge two data frames by using the merge()
function.
◼ The R merge function allows merging two data frames
by common columns or by row names.
◼ By default the merging happens on the columns that have the same
name in both data frames; key columns with different names can be
given with the by.x and by.y arguments.
◼ R merge data frames
❑ Inner join
❑ Full (outer) join
❑ Left (outer) join in R
❑ Right (outer) join in R
❑ Cross join

Inner join
◼ An inner join (actually a natural join) is the most usual
join of data sets that you can perform.
◼ It consists of merging two data frames into one that
contains only the rows whose key values are present in both.

◼ dataframe_AB = merge(dataframe_A, dataframe_B, by="ID")
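The join types listed earlier differ only in the all, all.x, all.y and by arguments of merge(). A sketch with two small made-up data frames (the names and values are assumptions for illustration):

```r
dataframe_A <- data.frame(ID = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
dataframe_B <- data.frame(ID = c(2, 3, 4), dept = c("IT", "HR", "Ops"))

inner <- merge(dataframe_A, dataframe_B, by = "ID")                # IDs in both
full  <- merge(dataframe_A, dataframe_B, by = "ID", all = TRUE)    # IDs in either
left  <- merge(dataframe_A, dataframe_B, by = "ID", all.x = TRUE)  # all rows of A
right <- merge(dataframe_A, dataframe_B, by = "ID", all.y = TRUE)  # all rows of B
cross <- merge(dataframe_A, dataframe_B, by = NULL)                # every A row with every B row

print(inner)
print(full)
print(left)
print(right)
print(nrow(cross))
```

Rows without a partner in the other data frame get NA in the columns that come from it.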

R – Packages
◼ R packages are a collection of R functions, compiled
code and sample data.
◼ They are stored under a directory called "library" in the
R environment.
◼ By default, R installs a set of packages during
installation.
◼ More packages are added later, when they are needed
for some specific purpose.

Check Available R Packages
◼ Get library locations containing R packages

◼ .libPaths()

Get the list of all the packages installed


◼ library()

Install a New Package

◼ There are two ways to add new R packages.


❑ One is installing directly from the CRAN directory
and
❑ Another is downloading the package to your local
system and installing it manually.
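Both routes go through install.packages(). The sketch below wraps the CRAN route in a hypothetical helper, ensure_package(), that installs only when the package is missing; the file path in the manual route is a placeholder, not a real file:

```r
# Hypothetical helper: install a package from CRAN only if it is not installed yet.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call does not install anything
ok <- ensure_package("stats")
print(ok)

# Manual route: install from a downloaded source archive (placeholder path)
# install.packages("path/to/package.tar.gz", repos = NULL, type = "source")
```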

R Package:
https://cran.r-project.org/web/packages/available_packages_by_name.html
Load Package to Library

◼ Before a package can be used in the code, it must be
loaded into the current R environment.
◼ You also need to load a package that was installed
previously but is not available in the current environment.

◼ library("package Name")

R – CSV Files
◼ In R, we can read data from files stored outside the R
environment.
◼ We can also write data into files which will be stored and
accessed by the operating system.
◼ R can read and write into various file formats like csv,
excel, xml etc.
◼ We will learn to read data from a csv file and then write
data into a csv file.
◼ The file should be present in current working directory so
that R can read it.
◼ Of course, we can also set our own directory and read
files from there.

Getting and Setting the Working Directory

# Get and print current working directory.


print(getwd())

# Set current working directory.


setwd("/home/exam/R/workd")
or
setwd("./R/workd")

# Get and print current working directory.


print(getwd())

Input as CSV File
◼ The csv file is a text file in which the values in the columns are separated by
a comma.
◼ Let's consider the following data present in the file named input.csv.
◼ You can create this file using Windows Notepad by copying and pasting this
data. Save the file as input.csv using the Save As dialog with the file type set to
All Files (*.*).

id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Michelle,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
,Gary,843.25,2015-03-27,Finance
6,Nina,578,2013-05-21,IT
7,Simon,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance

Writing into a CSV File
◼ R can create a csv file from an existing data frame. The
write.csv() function is used to create the csv file. This
file gets created in the working directory.
# Read the input file into a data frame.
data <- read.csv("input.csv")
# Write the data frame into a new file.
write.csv(data,"output.csv")
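By default write.csv() also writes the row names as an extra first column, so the file read back has one more column than the original. A self-contained sketch (the data values and the temporary file location are assumptions) that avoids this with row.names = FALSE and filters the data after reading it back:

```r
emp <- data.frame(
  id = 1:3,
  name = c("Rick", "Dan", "Michelle"),
  salary = c(623.3, 515.2, 611)
)

# Write to a temporary folder so the working directory stays untouched
out_file <- file.path(tempdir(), "output.csv")
write.csv(emp, out_file, row.names = FALSE)   # no extra row-name column

# Read the file back and keep only salaries above 600
back <- read.csv(out_file)
high <- subset(back, salary > 600)

print(back)
print(high)
```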

Aggregating and analyzing data with
dplyr
◼ dplyr is a package for making data manipulation easier.
◼ Packages in R are basically sets of additional functions
that let you do more stuff in R.
❑ Apply common dplyr functions to manipulate data in R.
❑ Employ the ‘pipe’ operator to link together a sequence of
functions.

◼ install.packages("dplyr") ## install
◼ library("dplyr") ## load

◼ Some of the most common dplyr functions: select(),
filter(), mutate(), group_by(), and summarize()

Pipe Infix Operator %>%
◼ All verbs in the dplyr package take a data frame as the first argument.
◼ When we use the dplyr package, we mostly use the pipe (infix)
operator %>%, which comes from the magrittr package and is made
available by dplyr. It passes the left-hand side of the operator to the first
argument of the right-hand side of the operator.
◼ For example,
❑ x %>% f(y) is converted into f(x, y), so the result from the left-hand side
is then “piped” into the right-hand side.
◼ This pipe can be used to write multiple operations that you can read
left-to-right.

◼ We can write exp(1) with pipes as 1 %>% exp
◼ log(exp(1)) as 1 %>% exp %>% log

◼ exp(1)
◼ ## [1] 2.718282
◼ 1 %>% exp
◼ ## [1] 2.718282
◼ 1 %>% exp %>% log
◼ ## [1] 1

How to read pipes: multiple arguments
◼ Now for multi-arguments functions, we interpret
something like:
❑ x %>% f(y) as f(x,y)

◼ Simple example
◼ mtcars %>% head(4)

◼ And what’s the “old school” (base R) way?

◼ head(mtcars, 4)

dplyr::filter() Examples

◼ By using the dplyr filter() function you can filter the rows of an R data
frame by row name, by column value, by multiple conditions, etc.

◼ Here, %>% is an infix operator which acts as a pipe:
it passes the left-hand side of the operator to the
first argument of the right-hand side of the operator.

Example

# Create DataFrame
df <- data.frame(
id = c(10,11,12,13,14,15,16,17),
name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
gender = c('M','M','F','F','M','M','M','F'),
dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
'1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df

◼ # Load dplyr library
◼ library('dplyr')

◼ # filter() by row name


◼ df %>% filter(rownames(df) == 'r3')

◼ # filter() by column Value


◼ df %>% filter(gender == 'M')

◼ # filter() by list of values


◼ df %>% filter(state %in% c("CA", "AZ", "PH"))

◼ # filter() by multiple conditions


◼ df %>% filter(gender == 'M' & id > 15)

dplyr::select() Examples
◼ dplyr select() function is used to select the columns or
variables from the data frame.

◼ This takes the first argument as the data frame and the
second argument is the variable name or vector of
variable names.
◼ # select() single column
◼ df %>% select('id')
◼ # select() multiple columns
◼ df %>% select(c('id','name'))
◼ # Select multiple columns by id
◼ df %>% select(c(1,2))
dplyr::mutate()

◼ Use the mutate() function and its scoped variants mutate_all(),
mutate_if() and mutate_at() from the dplyr package to
replace/update the values of a column (string, integer,
or any type) in an R data frame (data.frame). In dplyr 1.0.0 and
later the scoped variants are superseded by across().

◼ library(stringr)
# Replace on selected column
df %>%
mutate(name = str_replace(name, "sai", "SaiRam"))

#mutate(df, totalMarks = Math + Eng)


dplyr::group_by()
◼ group_by() function in R is used to group the rows in a DataFrame
by single or multiple columns and perform the aggregations.
◼ # Read CSV file into DataFrame
◼ df = read.csv('emp.csv')
◼ df

summarize() function
◼ The summarize() function offers the summary that is
based on the action done on grouped or ungrouped
data.
Summarize grouped data
◼ The summarize() function takes the grouped
dataframe/table as input and performs the summarize
functions.
◼ There are several aggregation functions you can use
with summarize().
◼ All these functions are used to calculate aggregations on
grouped data.

library(dplyr)

# group_by() on department
grp_tbl <- df %>% group_by(department)
grp_tbl

# summarise on grouped data.


agg_tbl <- grp_tbl %>%
summarise(sum(salary))
agg_tbl
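Several aggregations can be computed in one summarise() call, and naming them gives readable column names. A self-contained sketch, assuming the dplyr package is installed and using made-up department and salary values:

```r
library(dplyr)

emp <- data.frame(
  department = c("IT", "IT", "HR", "Finance", "HR"),
  salary = c(600, 700, 500, 800, 550)
)

# One output row per department, one named column per aggregation
agg_tbl <- emp %>%
  group_by(department) %>%
  summarise(
    total_salary = sum(salary),
    mean_salary = mean(salary),
    employees = n()
  )

print(agg_tbl)
```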
Lab Program 6

◼ 6) Program for Data Manipulation:


◼ a. Read multiple datasets from different files or sources.
◼ b. Merge or join the datasets based on common variables
or keys.
◼ c. Perform aggregation operations, such as calculating
sums, means, or counts, by groups or categories.
◼ d. Filter the data based on specific conditions or criteria.
◼ e. Create new variables or transform existing variables
using functions or mathematical operations.

Solution:

(a) Read multiple datasets from different files or sources.

library(dplyr)
# Read datasets from different files or sources
dataset1 <- read.csv("emp.csv")
print(dataset1)
dataset2 <- read.csv("salary.csv")
print(dataset2)

(b) Merge or join datasets based on common variables or
keys
merged_data <- merge(dataset1, dataset2, by = "ID")

print("Merged Data:")

print(merged_data)

(c) Perform aggregation operations, such as calculating sums, means, or
counts, by groups or categories.

aggregated_data <- merged_data %>%

group_by(Gender) %>%
summarise(
total_salary = sum(Salary),
average_age = mean(Age),
count = n()
)
print("Aggregated Data:")
print(aggregated_data )

(d) Filter the data based on specific conditions or criteria

filtered_data <- merged_data %>%filter(Age > 25)

print("Filtered Data:")

print(filtered_data)

(e) Create new variables or transform existing variables using
functions or mathematical operations

transformed_data <- merged_data %>%

mutate(
doubled_salary = Salary * 2,
seniority = ifelse(Age > 28, "Senior", "Junior")
)
print("Transformed Data:")
print(transformed_data)

Web Scraping with R
◼ HTML(Hyper Text Markup Language) : the standard markup language for creating
web pages and web applications.
◼ Contents : Headings, Paragraphs, Lists
◼ Structure:
<html>
<head>
<title> .... </title>
</head>
<body>
<h1> ....</h1>
<p> ........ </p>
<li> .... </li>
</body>
</html>
◼ CSS(Cascading Style Sheets) : A style sheet language used for describing the
presentation of a document written in markup language like HTML.
❑ Presentation : Font, Color, Background color, Border

HTML Overview
◼ HTML (HyperText Markup Language) is designed to be
easily machine-readable and parsable.
◼ In other words, HTML follows a tree-like structure of
nodes and their attributes, which we can easily navigate
programmatically.
◼ Let's start off with a small example page and illustrate its
structure

◼ In this basic example of a simple web page, we can see
that the document already resembles a data tree.
◼ Let's go a bit further and illustrate this:
HTML tree is made of nodes which can contain
attributes such as classes, ids and text itself.

rvest package
◼ rvest helps you scrape (or harvest) data from web
pages.
◼ To convert a website into an XML object, you use the
read_html() function.
◼ You need to supply a target URL and the function calls
the webserver, collects the data, and parses it.
◼ To extract the relevant nodes from the XML object you
use html_nodes(), whose argument is the class
descriptor, prepended by a dot (.) to signify that it is a class.

Important Functions in rvest Package

The basic functions in rvest package are listed below –

◼ read_html() : reads html document from a URL


◼ html_nodes() : finds HTML tags (nodes) using CSS selectors or
XPath expressions; in rvest 1.0.0 and later it is superseded by html_elements().
◼ html_elements() : extracts pieces out of HTML documents.
◼ html_elements(".class") : calls node based on CSS class
◼ html_elements("#id") : calls node based on id
◼ html_text() : extracts only the text from HTML tag
◼ html_attr() : extracts contents of a single attribute
◼ html_table() : extracts table from a website
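The functions above can be tried without a network connection. The sketch below assumes a recent rvest (1.0.0 or later) and uses its minimal_html() helper to parse a small made-up page; the page content and the example.org link are placeholders:

```r
library(rvest)

# Parse an inline HTML snippet instead of downloading a page
page <- minimal_html('
  <h1 id="top">Languages</h1>
  <p class="lang">English</p>
  <p class="lang">Deutsch</p>
  <a href="https://example.org/more">More</a>
  <table>
    <tr><th>code</th><th>name</th></tr>
    <tr><td>en</td><td>English</td></tr>
    <tr><td>de</td><td>Deutsch</td></tr>
  </table>
')

langs   <- page %>% html_elements(".lang") %>% html_text()       # by CSS class
heading <- page %>% html_elements("#top") %>% html_text()        # by id
link    <- page %>% html_elements("a") %>% html_attr("href")     # one attribute
tbl     <- page %>% html_element("table") %>% html_table()       # table as data frame

print(langs)
print(heading)
print(link)
print(tbl)
```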

read_html()
◼ The web scraping process starts with read_html()
◼ This returns an xml_document object which you’ll then
manipulate using rvest functions:

library(rvest)
Wiki_page <- read_html("https://www.wikipedia.org/")
class(Wiki_page)
Wiki_page
Output:
[1] "xml_document" "xml_node"

Parsing HTML content
◼ As we saw in the last chunk of code, Wiki_page holds
the HTML of the page, which is not so easily readable when printed.
◼ To work with it from R, the raw HTML has to be parsed,
which means generating a Document Object Model
(DOM) from it; read_html() performs this step.
◼ DOM is what connects scripts and web pages by
representing the structure of a document in memory.

◼ rvest provides 2 ways to select HTML elements:


❑ XPath
❑ CSS selectors

◼ Selecting elements with rvest is simple, for XPath we
use the following syntax:

Wiki_page %>% html_nodes(xpath = "")

◼ while for a CSS selector we need:


Wiki_page %>% html_nodes(css = "")

# Two equivalent ways to load the title
url_title1 <- html_nodes(Wiki_page, xpath = "/html/head/title") %>%
html_text() # using XPath
print(url_title1)

url_title2 <- html_nodes(Wiki_page, css = "title") %>%
html_text() # using CSS
print(url_title2)

A Simple Way to get XPath and CSS Selector

◼ To scrape data from a webpage, you first have to identify
the tag and attribute combinations that you are interested
in gathering.
◼ For XPath:
❑ Right click + inspector mode -> right click on element
of interest -> copy -> copy XPath.

◼ For CSS selector:
❑ Right click + inspector mode -> right click on element
of interest -> copy -> copy selector.

Example:
◼ Extracting language names from the Wikipedia
website.
Wiki_page <- read_html("https://www.wikipedia.org/")
Wiki_page %>% html_elements(".central-featured strong") %>%
html_text()

# Output
# [1] "English" "Español" "Русский" "日本語" "Deutsch" "Français"
# [7] "Italiano" "中文" "فارسی" "Português"

◼ ".central-featured strong" is basically saying "Show me all the bold
text within the section of the webpage that has the 'central-featured'
CSS class". Bold text is defined by the <strong> tag.

Extracting Links
◼ Let's understand the sample HTML of a link -
<a href="https://www.example.com">Sample Link</a>

◼ A link is defined by the <a> html tag with an href attribute pointing to
"https://www.example.com" and the link name is "Sample Link".
◼ You can fetch the details of an html attribute by using the html_attr() function.

◼ In the code below, we are pulling "href" of "a" tag.

Wiki_page <- read_html("https://www.wikipedia.org/")
Wiki_page %>% html_elements(".central-featured a") %>%
html_attr("href")
◼ # Output

# [1] "//en.wikipedia.org/" "//es.wikipedia.org/" "//ru.wikipedia.org/"


# [4] "//ja.wikipedia.org/" "//de.wikipedia.org/" "//fr.wikipedia.org/"
# [7] "//it.wikipedia.org/" "//zh.wikipedia.org/" "//fa.wikipedia.org/"
# [10] "//pt.wikipedia.org/"
Xpath(XML path Language) Syntax
Overview
◼ Xpath selectors are usually referred to as "xpaths" and
a single xpath indicates a destination from the root to the
desired endpoint.
◼ A typical xpath selector in web scraping often looks
something like this:

◼ In this example, XPath would select the href attribute of an
<a> node that has a class "button" and is
directly under a <div> node:

<div>
<a class="button" href="http://scrapfly.io">ScrapFly</a>
</div>

◼ //div/a[contains(@class, "button")]/@href

◼ output:
❑ http://scrapfly.io
Example
<div>
<p class="socials">
Follow us on
<a href="https://twitter.com/@scrapfly_dev">Twitter!</a>
</p>
</div>

div/p/a

output:

<a href="https://twitter.com/@scrapfly_dev">Twitter!</a>

◼ Here, our simple xpath describes a path from the
root to the a node.
◼ All we used is the / direct child syntax; however, with big
documents direct xpaths are often unreliable, as any
changes to the tree structure or order will break our path.

Refer to this link for more details:
◼ https://scrapfly.io/blog/parsing-html-with-xpath/

Example

<div>
<p class="socials">
Follow us on
<a href="https://twitter.com/@scrapfly_dev">Twitter!</a>
</p>
</div>

//p[@class='socials']/a

output:
<a href="https://twitter.com/@scrapfly_dev">Twitter!</a>

◼ Further, we can combine the constraint option ([]) with value
testing functions such as contains() to make our xpath
even more reliable.
◼ Using the XPath contains() text function, we
can filter out any results that don't contain
a piece of text.

<div>
<p class="socials">
Follow us on
<a href="https://twitter.com/@scrapfly_dev">Twitter!</a>
</p>
</div>

//p[@class='socials']/a[contains(@href, 'twitter.com')]

output:
<a href="https://twitter.com/@scrapfly_dev">Twitter!</a>
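The same XPath can be run from R. A minimal sketch, assuming rvest 1.0.0 or later; the snippet is parsed with minimal_html() so no web request is made, and the variable names are illustrative:

```r
library(rvest)

snippet <- minimal_html('
  <div>
    <p class="socials">
      Follow us on
      <a href="https://twitter.com/@scrapfly_dev">Twitter!</a>
    </p>
  </div>
')

# Attribute test on <p>, contains() test on the href of <a>
link <- snippet %>%
  html_elements(xpath = "//p[@class='socials']/a[contains(@href, 'twitter.com')]")

link_text <- html_text(link)          # text inside the <a> tag
link_href <- html_attr(link, "href")  # value of the href attribute

print(link_text)
print(link_href)
```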
Lab Program 7

◼ 7) Program for Web Scraping and Data Extraction:


◼ Use R packages like rvest to scrape data from a specific website.
◼ a. Define the target website and specify the data to be
extracted.
◼ b. Retrieve the HTML content from the website.
◼ c. Parse and extract the desired data using CSS selectors or
Xpath parsing techniques.
◼ d. Save the extracted data to a file or perform further
analysis on it.

Solution:

◼ a. Define the target website and specify the data to
be extracted.
# Load necessary libraries
library(rvest)

# Specify the URL of the website to scrape
url <- "http://books.toscrape.com/"

(b) Retrieve the HTML content from the website.

# Download the HTML content


html_content <- read_html(url)

# Define XPath selectors to extract data


title_xpath <- '//*[@class="product_pod"]/h3/a'
price_xpath <- '//*[@class="product_pod"]/div[2]/p[1]'

(c) Parse and extract the desired data using CSS selectors or
Xpath parsing techniques.

# Extract data using XPath selectors
# Remove leading/trailing whitespaces with trimws()
titles <- html_content %>% html_nodes(xpath = title_xpath) %>%
html_text() %>% trimws()

prices <- html_content %>% html_nodes(xpath = price_xpath) %>%
html_text() %>% trimws()

trimws() Function
◼ trimws() function in R Language is used to
trim leading and trailing white spaces (both by default).

◼ Syntax:
❑ trimws(x)

❑ Parameters:
◼ x: Object or character string
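trimws() also has a which argument that chooses the side to trim; the default is "both". A short sketch with an illustrative string:

```r
my_text <- "  Geeks_For_Geeks  "

both  <- trimws(my_text)                   # default: which = "both"
left  <- trimws(my_text, which = "left")   # leading spaces only
right <- trimws(my_text, which = "right")  # trailing spaces only

print(both)
print(left)
print(right)
```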

# R program to illustrate
# removing leading and trailing whitespace

# Create example character string


my_text <- " Geeks_For_Geeks "

# Apply trimws function in R


print(trimws(my_text))

Output:

"Geeks_For_Geeks "

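The which argument of trimws() selects the side to trim. A short sketch with an illustrative string (the variable names are chosen here for the example only):

```r
# Example string with two spaces on each side
padded <- "  Geeks_For_Geeks  "

both_sides <- trimws(padded)                   # default: which = "both"
left_only  <- trimws(padded, which = "left")   # leading whitespace only
right_only <- trimws(padded, which = "right")  # trailing whitespace only

print(both_sides)   # [1] "Geeks_For_Geeks"
print(left_only)    # [1] "Geeks_For_Geeks  "
print(right_only)   # [1] "  Geeks_For_Geeks"
```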
◼ (d) Save the extracted data to a file or perform
further analysis on it.
# Combine the extracted data into a data frame
book_data <- data.frame(Title = titles, Price = prices)

# Print the extracted data
print(book_data)

# Save the extracted data to a CSV file
write.csv(book_data, "book_data.csv", row.names = FALSE)
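For the "further analysis" part of step (d), the scraped prices are character strings and must be converted to numbers first. A minimal sketch, assuming the prices look like a currency symbol followed by digits; the three rows below are invented sample values, not scraped results:

```r
# Hypothetical scraped values; "\u00a3" is the pound sign
sample_books <- data.frame(
  Title = c("Book A", "Book B", "Book C"),
  Price = c("\u00a351.77", "\u00a353.74", "\u00a350.10"),
  stringsAsFactors = FALSE
)

# Drop everything except digits and the decimal point, then convert to numeric
sample_books$PriceNum <- as.numeric(gsub("[^0-9.]", "", sample_books$Price))

# Simple follow-up analysis: summary statistics and cheapest-first ordering
print(summary(sample_books$PriceNum))
cheapest_first <- sample_books[order(sample_books$PriceNum), ]
print(cheapest_first$Title)
```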

ggplot2 Package
◼ ggplot2 is a plotting package that provides helpful commands to
create complex plots from data in a data frame.
◼ It provides a more programmatic interface for specifying which
variables to plot, how they are displayed, and general visual
properties. The "gg" in the name stands for the "grammar of
graphics" used to construct the figures.
◼ The concept behind ggplot2 divides a plot into three
fundamental parts:
◼ Plot = Data + Aesthetics + Geometry.

◼ The principal components of every plot can be defined as follows:

❑ Data is a data frame.
❑ Aesthetics indicate the x and y variables. They can also be used to control the
color, the size or the shape of points, the height of bars, etc.
❑ Geometry defines the type of graphic (histogram, box plot, line plot, density plot,
dot plot, etc.)

◼ ggplot() function.

❑ The ggplot() function is flexible and robust for
building a plot piece by piece.
In terms of the layers concept in ggplot2,
there are four primary layers:
◼ Data: The dataset, or subset of a dataset, used
to create the plot.
◼ Aesthetics: The mappings of variables to visual
properties of the plot.
◼ Geometries: The geom function used to represent
data points.
◼ Theme: The visual style of the plot.
Installation

◼ install.packages("ggplot2")

◼ library(ggplot2)

◼ To build a ggplot, we will use the following basic
template that can be used for different types of plots:

◼ ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>()
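One way to fill in the template is sketched below, using the mpg data set that ships with ggplot2. The choice of variables, geom, labels and theme is only an illustration of the four layers (data, aesthetics, geometry, theme), not the only valid combination.

```r
library(ggplot2)

# <DATA> = mpg, <MAPPINGS> = x and y positions, <GEOM_FUNCTION> = geom_point
template_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +                                   # geometry layer
  labs(x = "Displacement", y = "Highway mileage") + # axis labels
  theme_minimal()                                  # theme layer

print(template_plot)
```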

Example
ggplot(data = <DATA>) is the first layer.
◼ Use the mpg dataset, which is included in the ggplot2
library and will be available when you load it.
◼ Below are the columns of the mpg dataset.

data(mpg)
colnames(mpg)

◼ "manufacturer" "model" "displ" "year" "cyl" "trans" "drv" "cty" "hwy" "fl" "class"

str(mpg)

aesthetic mapping
◼ Define an aesthetic mapping (using the aes()
function) by selecting the variables to be plotted and
specifying how to present them in the graph,

❑ e.g., as x/y positions or characteristics such as size, shape, color, etc.

◼ ggplot(mpg, aes(displ, hwy, colour = class))

geoms
◼ Add 'geoms': graphical representations of the data in
the plot (points, lines, bars).
◼ ggplot2 offers many different geoms; we will use some
common ones today, including:

❑ geom_point() for scatter plots, dot plots, etc.
❑ geom_boxplot() for boxplots
❑ geom_line() for trend lines, time series, etc.
◼ To add a geom to the plot, use the + operator.
◼ Because we have two continuous variables, let's use
geom_point() first:

Scatter Plot
◼ geom_point(): adds a layer of points to your plot, which creates a
scatterplot. The colour aesthetic can be spelled colour, col or color:
❑ ggplot(mpg, aes(displ, hwy, colour = class)) + geom_point()
❑ or
❑ ggplot(mpg, aes(displ, hwy, col = class)) + geom_point()
❑ or
❑ ggplot(mpg, aes(displ, hwy, color = class)) + geom_point()

Adding labels using labs
◼ ggplot(mpg, aes(displ, hwy, colour = class)) + geom_point() +
labs(title = "Highway mileage vs Displacement", x = "Displacement",
y = "highway mileage")

ggplot(mpg, aes(displ, hwy, size = class)) + geom_point()
or
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, size = class))
◼ Note: ggplot2 warns that using size for a discrete variable is not advised.

◼ Or we could have mapped class to the alpha aesthetic,
which controls the transparency of the points, or the
shape of the points:
ggplot(mpg, aes(displ, hwy, alpha = class)) + geom_point()

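◼ ggplot(mpg, aes(displ, hwy, shape = class)) +
geom_point()
◼ Note: the default shape palette has only 6 shapes, but class has 7
levels, so ggplot2 issues a warning and the points of the last level
(suv) are not drawn.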

ggplot(data = mpg, aes(x = displ, y = hwy, color = factor(cyl))) + geom_point() +
labs(title = "Highway mileage vs Displacement", x = "Displacement",
y = "highway mileage")
[Figure: scatter plot "Highway mileage vs Displacement"; x-axis: Displacement; y-axis: highway mileage; colour legend: factor(cyl) with levels 4, 5, 6, 8]
Change colors manually
A custom color palette can be specified using the
function:
scale_color_manual()
◼ This function allows you to specify your own set of
mappings from levels in the data to aesthetic values.

◼ scale_colour_manual(values = c("red", "blue", "green"))

◼ # It's recommended to use a named vector

◼ scale_colour_manual(values = c("4" = "red", "5" = "blue",
"6" = "darkgreen", "8" = "orange"))

◼ ggplot(data = mpg, aes(x = displ, y = hwy, color = factor(cyl))) +
geom_point() + labs(title = "Highway mileage vs Displacement",
x = "Displacement", y = "highway mileage") +
scale_color_manual(values = c("red", "blue", "yellow", "green"))

[Figure: the same scatter plot with the manual colours applied; x-axis: Displacement; y-axis: highway mileage; colour legend: factor(cyl) with levels 4, 5, 6, 8]
◼ # Adding shape and color
ggplot(data = mpg, aes(x = displ, y = hwy, col = factor(cyl),
shape = factor(year))) + geom_point() +
labs(title = "Highway mileage vs Displacement", x = "Displacement",
y = "highway mileage")

Bar Plot
geom_bar()
◼ The main function for creating bar plots or bar charts in
ggplot2 is geom_bar. By default, this function counts the
number of occurrences for each level of a categorical
variable.
◼ ggplot(mpg, aes(drv)) +geom_bar()

◼ ggplot(mpg, aes(class, fill = drv)) +geom_bar()

## Side by side
ggplot(mpg, aes(class, fill = drv)) +geom_bar(position = "dodge")

geom_bar(position = "dodge")
◼ Dodging preserves the vertical position of a geom while
adjusting the horizontal position.

scale_fill_manual()
◼ This function allows you to specify your own set of
mappings from levels in the data to aesthetic values.
◼ Use values to set the colors used for the levels of the
variable mapped to fill (drv in this example)

## Side by side
ggplot(mpg, aes(class, fill = drv)) + geom_bar(position = "dodge") +
scale_fill_manual(values = c("4" = "red", "f" = "blue", "r" = "green"))
[Figure: side-by-side bar chart of count by class (2seater, compact, midsize, minivan, pickup, subcompact, suv); fill legend: drv (4, f, r)]
Density plot
geom_density()
◼ A density plot is a type of plot that uses a single smooth curve to
help us visualize the distribution of values in a dataset.
ggplot(mpg, aes(x = displ)) + geom_density()

[Figure: density curve of displ; x-axis: displ; y-axis: density]
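The smooth curve drawn by geom_density() is a kernel density estimate, and the area under such a curve is approximately 1. The sketch below checks this with the base R function density(); it uses mtcars$disp as a stand-in for mpg$displ (an assumption made here so that no extra package is needed).

```r
# mtcars$disp stands in for mpg$displ so that only base R is required
disp_values <- mtcars$disp

# Kernel density estimate (Gaussian kernel and automatic bandwidth by default)
dens <- density(disp_values)

# Trapezoidal rule: approximate the area under the estimated curve
area <- sum(diff(dens$x) * (head(dens$y, -1) + tail(dens$y, -1)) / 2)
print(round(area, 2))
```

Because the area is fixed at about 1, the y-axis shows density rather than counts, which is why its values are small when the x values are spread over a wide range.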
◼ You can also supply the following arguments to the
geom_density() function to customize the appearance
of the plot:

◼ color: The outline color to use for the density curve


◼ fill: The fill color to use inside the density curve
◼ alpha: The transparency of the fill color (0 = invisible, 1
= fully visible)

◼ ggplot(mpg, aes(x=displ)) + geom_density(fill='red',
color='blue', alpha=0.7)

2D histogram
stat_bin_2d():
◼ This function calculates the count of observations that
fall into 2D bins, and then it maps that count to a color
scale.

◼ Creates a two-dimensional histogram in R. It divides the
data into bins along both the x and y axes, similar to how
a regular histogram divides data along a single axis.

Parameters used in stat_bin_2d():

◼ bins = 10
❑ The bins argument specifies the number of bins along each axis
(both x and y). Here, the data will be divided into 10 bins along
the x-axis and 10 bins along the y-axis, forming a 10x10 grid.
◼ aes(fill = ..count..):
❑ This is an aesthetic mapping that specifies how the color of each
bin will be determined. ..count.. refers to the number of data
points in each bin.
◼ The fill aesthetic applies a color based on the count, so the
more data points in a bin, the darker or more intense the color
(depending on the scale used).
❑ The special variables in ggplot2 with double periods around them
(..count.., ..density.., etc.) are computed by a stat transformation
of the original data set. This notation is deprecated since
ggplot2 3.4.0; the current form is after_stat(count).
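The same mapping written with after_stat() is sketched below on the mpg data. The variables displ and hwy are chosen here only for illustration; scale_fill_gradient() is used to set the low and high colours.

```r
library(ggplot2)

# 10 x 10 grid of bins; fill colour is mapped to the per-bin count
bin2d_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  stat_bin_2d(bins = 10, aes(fill = after_stat(count))) +
  scale_fill_gradient(name = "Count", low = "white", high = "blue")

# The computed counts can be inspected; together they cover every row of mpg
bin_counts <- layer_data(bin2d_plot)$count
print(sum(bin_counts))
```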

◼ scale_fill_continuous():
❑ Continuous and binned colour scales
❑ The scales scale_colour_continuous() and
scale_fill_continuous() are the default colour
scales ggplot2 uses when continuous data values
are mapped onto the colour or fill aesthetics,
respectively.

Complete themes
ggplot2 comes with a number of built-in themes.
theme_minimal()
◼ A minimalistic theme with no background annotations.

theme_classic()
◼ A classic-looking theme, with x and y axis lines and no
gridlines.
theme_dark()
◼ The dark cousin of theme_light(), with similar line sizes
but a dark background. Useful to make thin coloured
lines pop out.
theme_void()
◼ A completely empty theme.
Saving the plot
# 1. Create a plot: displayed on the screen (by default)
ggplot(mpg, aes(displ, hwy, colour = class)) + geom_point()
# 2. Save the last displayed plot to a pdf
ggsave("myplot.pdf")
# 2. OR save it to a png file
ggsave("myplot.png")
or
myplot <- ggplot(mpg, aes(displ, hwy, colour = class)) + geom_point()
ggsave("myplot.pdf", myplot)
or
myplot <- ggplot(mpg, aes(displ, hwy, colour = class)) + geom_point()
png("myplot.png")
print(myplot)
dev.off()

◼ # Create some plots
library(ggplot2)
myplot1 <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_point()
myplot2 <- ggplot(iris, aes(Species, Sepal.Length)) +
geom_boxplot()

# Print plots to a pdf file


pdf("ggplot.pdf")
print(myplot1) # Plot 1 --> in the first page of PDF
print(myplot2) # Plot 2 ---> in the second page of the PDF
dev.off()
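ggsave() also accepts the size and resolution of the output. A minimal sketch, assuming an 8 x 6 inch image at 300 dpi is wanted; the file name is hypothetical and the file is written to the session's temporary directory so that nothing in the working directory is overwritten.

```r
library(ggplot2)

size_demo_plot <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Hypothetical output path inside the temporary directory
out_file <- file.path(tempdir(), "myplot_example.png")

# width and height are interpreted in the given units; dpi sets the resolution
ggsave(out_file, plot = size_demo_plot, width = 8, height = 6,
       units = "in", dpi = 300)

print(file.exists(out_file))
```

Passing the plot explicitly through the plot argument avoids depending on whichever plot was displayed last.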

Plotly in R
◼ The plotly package helps create interactive and intuitive
plots and graphs.
◼ It also provides the ability to embed these graphs in web
pages and save them on your computer.
◼ It is used extensively along with the ggplot2 package to
make complex, intricate, and attractive data
visualizations.
◼ plotly provides a function called ggplotly(), which
converts ggplot2 charts and plots into interactive plots
and graphs that support:
❑ Zoom by selecting an area of interest
❑ Hover over a point or line to see its exact values
❑ Export to png
❑ Slide axis
❑ Double click to re-initialize.
Example
p <- ggplot(mpg, aes(displ, hwy, colour = class)) + geom_point()

However, the plot is not interactive.

◼ We use the ggplotly() function to make it
interactive, passing the plot to the
function as an argument. ggplotly()
provides options like zoom-in, zoom-out,
lasso-select, etc.
ggplotly(p)
Lab Program 4

◼ 4) Program for Data Visualization:


◼ Use packages like ggplot2 or plot to create various types of
charts, such as bar charts, density plots, scatter plots, or
histograms.
◼ a. Read a dataset from a CSV file or other data sources.
◼ b. Customize the charts by adding labels, titles, changing
the colors and themes.
◼ c. Create interactive visualizations with tooltips,
zooming, or filtering options.
◼ d. Export the visualizations to different file formats

◼ install.packages("titanic")

◼ titanic is an R package containing data sets with
information about passengers on the
Titanic, including their survival status, age, gender,
class, and more
◼ library(titanic)

Dataset information
Titanic dataset
◼ 1309 passengers (891 for train & 418 for test) on the
Titanic
◼ titanic_train

◼ titanic_test

◼ training set: (891, 12)


◼ test set: (418, 11) - no "Survived" column
Solution:
◼ (a) Read a dataset from a CSV file or other data sources.

# Load necessary packages
library(ggplot2)
library(plotly)
library(titanic)
# Read Titanic dataset from a CSV file
titanic <- read.csv("train.csv")
# Or use the data frame shipped with the titanic package
titanic <- titanic_train
str(titanic) # here 'Survived' is an integer type
# Convert Survived to a factor
titanic$Survived <- factor(titanic$Survived)
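factor() can also attach readable labels to the 0/1 codes, which then appear in plot legends. A small sketch on an invented vector; the labels "Died" and "Survived" are a naming choice made here, not part of the data set.

```r
# Invented 0/1 codes standing in for the Survived column
survived_codes <- c(0, 1, 1, 0, 1)

# levels fixes the order of the categories; labels renames them
survived_factor <- factor(survived_codes,
                          levels = c(0, 1),
                          labels = c("Died", "Survived"))

print(levels(survived_factor))
print(table(survived_factor))
```

If labels are used, any named vector passed to scale_fill_manual() or scale_color_manual() must use the new names ("Died", "Survived") instead of "0" and "1".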

(b) Customize the charts by adding labels, titles, changing the colors
and themes.

# Customise the charts

# Bar chart showing count of survivors by passenger class
bar_plot <- ggplot(titanic, aes(x = Pclass, fill = Survived)) +
geom_bar(position = "dodge") +
labs(title = "Survivors by Passenger Class", x = "Passenger Class",
y = "Count",
fill = "Survived") +
scale_fill_manual(values = c("0" = "red", "1" = "blue"))

[Figure: dodged bar chart "Survivors by Passenger Class"; x-axis: Passenger Class (1, 2, 3); y-axis: Count; fill legend: Survived (0, 1)]
# Density plot showing age distribution of passengers
density_plot <- ggplot(titanic, aes(x = Age)) +
geom_density(fill = "blue", alpha = 0.5) +
labs(title = "Age Distribution of Passengers", x = "Age",
y = "Density")

# Scatter plot showing fare vs age with color indicating
# survival status
scatter_plot <- ggplot(titanic, aes(x = Age, y = Fare, color =
Survived)) + geom_point() +
labs(title = "Fare vs Age", x = "Age",
y = "Fare") +
scale_color_manual(values = c("0" = "red", "1" = "blue"))

[Figure: scatter plot "Fare vs Age"; x-axis: Age; y-axis: Fare; colour legend: Survived (0, 1)]
#2D histogram
hist_plot <- ggplot(titanic, aes(x = Pclass, y = Survived)) +
stat_bin_2d(bins = 10, aes(fill = ..count..)) +
labs(title = "Titanic Survival Heatmap", x = "Pclass", y =
"Survived") + scale_fill_continuous(name = "Frequency", low =
"white", high = "blue") +theme_minimal()

[Figure: 2D histogram "Titanic Survival Heatmap"; x-axis: Pclass; y-axis: Survived; fill legend: Frequency]
◼ c. Create interactive visualizations with tooltips, zooming, or
filtering options.
library(plotly)
# Interactive bar chart
interactive_bar_plot <- ggplotly(bar_plot)

# Interactive density plot
interactive_density_plot <- ggplotly(density_plot)

# Interactive scatter plot
interactive_scatter_plot <- ggplotly(scatter_plot)

# Interactive heat map
interactive_hist_plot <- ggplotly(hist_plot)
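The tooltip shown on hover can be restricted to selected aesthetics through the tooltip argument of ggplotly(). A minimal sketch on the mpg data (used here instead of the Titanic data so the example is self-contained):

```r
library(ggplot2)
library(plotly)

tooltip_base <- ggplot(mpg, aes(displ, hwy, colour = class)) + geom_point()

# Show only the x value, the y value and the colour group when hovering
tooltip_plot <- ggplotly(tooltip_base, tooltip = c("x", "y", "colour"))

print(class(tooltip_plot))
```

Printing tooltip_plot in RStudio or in an R Markdown document displays the interactive widget.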

◼ d. Export the visualizations to different file formats
# Export visualizations to png files
ggsave("bar_plot.png", plot = bar_plot, width = 8, height = 6)
ggsave("density_plot.png", plot = density_plot, width = 8, height = 6)
ggsave("scatter_plot.png", plot = scatter_plot, width = 8, height = 6)
ggsave("hist_plot.png", plot = hist_plot, width = 8, height = 6)

Lab Program 5

◼ 5) Program for Data Analysis:


◼ a. Read a dataset from a CSV file and perform
exploratory data analysis, including summary statistics,
and identifying missing values.
◼ b. Data cleaning by removing duplicates, handling
missing values, and transforming variables if necessary.
◼ c. Perform data manipulation operations such as
filtering and sorting based on certain criteria.
◼ d. Generate reports or visualizations to present the
analysis results.

Solution:
(a) Read a dataset from a CSV file and perform exploratory data analysis,
including summary statistics, and identifying missing values.
library(dplyr)
library(ggplot2)
#Read dataset from CSV file
data <- read.csv("train.csv")
#Exploratory Data Analysis (EDA)
# Summary statistics
summary_stats <- summary(data)
print(summary_stats)

# Data visualization
# For example, let's create a histogram for age
ggplot(data, aes(x = Age)) +
geom_histogram(binwidth = 5, fill = "blue", color = "black") +
labs(title = "Distribution of Age on Titanic",
x = "Age",
y = "Frequency")
# Identifying missing values
missing_values <- colSums(is.na(data))
print(missing_values)

[Figure: histogram "Distribution of Age on Titanic"; x-axis: Age; y-axis: Frequency]
Handling Missing Values

is.na() function:
◼ is.na() is used to detect missing values in a dataset
or data frame. We can use it to find the NA
(Not Available) entries in a data frame so that they can be removed or filled in.
◼ It returns TRUE for each missing value and FALSE otherwise.

# is.na in R example
x = c(1, 2, NA, 4, NA, 6, 7)
# invoking is.na() to flag the NA positions
print(is.na(x))
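The logical vector returned by is.na() can be turned into positions and counts. A short sketch using the same vector, plus a small invented data frame to show the per-column count used in the lab solution:

```r
na_demo <- c(1, 2, NA, 4, NA, 6, 7)

na_flags <- is.na(na_demo)    # FALSE FALSE TRUE FALSE TRUE FALSE FALSE
na_index <- which(na_flags)   # positions of the missing values: 3 5
na_count <- sum(na_flags)     # number of missing values: 2

# Invented data frame: one NA in each column
na_df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
na_per_col <- colSums(is.na(na_df))

print(na_index)
print(na_count)
print(na_per_col)
```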

◼ (b) Data cleaning by removing duplicates, handling missing
values, and transforming variables if necessary.
# Remove duplicates
data <- distinct(data)
# Handle missing values: replace missing ages with the mean age
data$Age[is.na(data$Age)] <- mean(data$Age, na.rm = TRUE)
missing_values <- colSums(is.na(data))
print(missing_values)
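Mean imputation can be followed step by step on a small invented vector. The median variant is shown only as a common alternative; it is not part of the lab solution.

```r
# Invented ages with two missing values
age_demo <- c(22, 38, NA, 35, NA, 54)

# Mean of the observed values: (22 + 38 + 35 + 54) / 4 = 37.25
age_mean_imputed <- age_demo
age_mean_imputed[is.na(age_demo)] <- mean(age_demo, na.rm = TRUE)

# Alternative: the median (36.5 here) is less sensitive to extreme ages
age_median_imputed <- age_demo
age_median_imputed[is.na(age_demo)] <- median(age_demo, na.rm = TRUE)

print(age_mean_imputed)
print(age_median_imputed)
```

Imputing with the mean keeps the overall mean unchanged but reduces the variance of the column, which affects any later test that uses the standard deviation.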

distinct()
◼ distinct() [dplyr package] removes duplicate
rows from a data frame.
data <- data.frame(
category = c("A", "B", "A", "B", "A"),
value = c(10, 15, 8, 12, 10)
)
# Extract distinct rows based on the 'category' and 'value' columns
distinct_result <- distinct(data, category, value)

# Display the distinct rows


print(distinct_result)

◼ Output:
  category value
1        A    10
2        B    15
3        A     8
4        B    12

mean()

◼ The mean is calculated by taking the sum of the values and
dividing by the number of values in a data series.

◼ The function mean() is used to calculate this in R.

◼ Syntax
◼ The basic syntax for calculating the mean in R is:
❑ mean(x, na.rm = FALSE, ...)
❑ x is the input vector.
❑ na.rm = TRUE removes the missing values from the input vector before the mean is computed.

# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5,NA)
# Find mean.
result.mean <- mean(x)
print(result.mean)
# Find mean dropping NA values.
result.mean <- mean(x,na.rm = TRUE)
print(result.mean)

◼ When we execute the above code, it produces the following result:
[1] NA
[1] 8.22
(c) Perform data manipulation operations such as filtering and sorting based on
certain criteria.
# Filtering: Select passengers with age greater than 18
adult_passengers <- filter(data, Age > 18)
#print(adult_passengers)
# Sorting: Sort data by Fare in descending order
sorted_titanic <- arrange(data, desc(Fare))
#print(sorted_titanic)
# Merging datasets (if applicable)
# Example: If you have another dataset named "additional_data"
file_path <- "Adata.csv"
A <- read.csv(file_path)
# Check column names in both datasets
print(colnames(data))
print(colnames(A))
# Merge based on 'PassengerId'
merged_data <- merge(data, A, by.x = "PassengerId", by.y = "PassengerId")
print(merged_data)
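The filtering, sorting and merging steps above can be checked on a toy example. The sketch below uses base R equivalents of filter() and arrange(); both data frames and the Deck column are invented for illustration.

```r
# Invented passenger data
toy_passengers <- data.frame(
  PassengerId = 1:4,
  Age  = c(22, 15, 40, 18),
  Fare = c(7.25, 30.00, 71.28, 8.05)
)
# Invented additional data; PassengerId 5 has no match above
toy_extra <- data.frame(PassengerId = c(1, 3, 5), Deck = c("C", "E", "B"))

# Filtering: Age > 18 is strict, so the 18-year-old is excluded
toy_adults <- toy_passengers[toy_passengers$Age > 18, ]

# Sorting: descending Fare
toy_by_fare <- toy_passengers[order(-toy_passengers$Fare), ]

# Merging: by default only matching ids are kept (inner join);
# all.x = TRUE keeps every passenger and fills unmatched columns with NA
toy_inner <- merge(toy_passengers, toy_extra, by = "PassengerId")
toy_left  <- merge(toy_passengers, toy_extra, by = "PassengerId", all.x = TRUE)

print(toy_adults$PassengerId)
print(toy_by_fare$PassengerId)
print(nrow(toy_inner))
print(nrow(toy_left))
```

With the default inner join, passengers without a matching row in the second data set are silently dropped, so it is worth comparing nrow() before and after merging.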
(d) Generate reports or visualizations to present the analysis results.
# Example: Hypothesis testing (t-test)
# Check the assumptions (visualize the distribution of ages for each group)
boxplot(Age ~ Survived, data = data, col = c("red", "blue"),
main = "Boxplot of Age by Survived")
# Conduct t-test
t_test_result <- t.test(Age ~ Survived, data = data)
# Print the t-test result
print(t_test_result)
# Calculate the correlation coefficient between 'Age' and 'Pclass'
correlation_coefficient <- cor(data$Age, data$Pclass)
# Print the result
print(correlation_coefficient)
ggplot(data, aes(x = factor(Survived), fill = factor(Survived))) +
geom_bar() +
labs(title = "Number of Survivors on Titanic",
x = "Survived",
y = "Count") +
scale_fill_manual(values = c("red", "green"))

[Figure: "Boxplot of Age by Survived"; x-axis: Survived (0, 1); y-axis: Age]
> print(t_test_result)

Welch Two Sample t-test

data: Age by Survived
t = 2.0385, df = 669.03, p-value = 0.04189
alternative hypothesis: true difference in means between group 0 and
group 1 is not equal to 0
95 percent confidence interval:
0.06862884 3.66201421
sample estimates:
mean in group 0 mean in group 1
30.41510 28.54978

[Figure: bar chart "Number of Survivors on Titanic"; x-axis: Survived (0, 1); y-axis: Count; fill legend: factor(Survived)]
t-test
◼ The t-test is a statistical hypothesis test that uses samples from
two groups to determine whether there is a significant difference
between the means of the two groups.

t.test() Function in R
◼ R provides a simple built-in t.test()
function for one-sample, two-sample, and paired t-tests.
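The three variants of t.test() can be sketched on simulated data. All numbers below are generated with rnorm() for illustration; they are not Titanic values.

```r
set.seed(1)  # makes the simulated samples reproducible

# Simulated measurements (illustrative only)
before <- rnorm(20, mean = 50, sd = 5)
after  <- before + rnorm(20, mean = 2, sd = 3)
other  <- rnorm(25, mean = 53, sd = 6)

# One-sample: is the mean of 'before' different from 50?
one_sample <- t.test(before, mu = 50)

# Two-sample: Welch test by default (var.equal = FALSE)
two_sample <- t.test(before, other)

# Paired: the same 20 units measured twice
paired <- t.test(after, before, paired = TRUE)

print(one_sample$method)
print(two_sample$method)
print(paired$method)
print(two_sample$p.value)
```

The result is a list, so the p-value, test statistic and confidence interval can be extracted with $p.value, $statistic and $conf.int instead of being read from the printed output.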

◼ H2: The Titanic survivors were younger than the
passengers who died.
◼ Null hypothesis: There is no significant
difference between the ages of survivors and the ages of
people who died.

◼ Analysis of Titanic data to compare the ages of
passengers who survived and those who
died using a t-test (assuming that the ages of survivors
and the ages of passengers who died are
independent).

◼ Boxplots show the ages of survivors and those of
people who died.

◼ The given code snippet performs a Welch two-sample t-test.
◼ The test is conducted on the "Age" variable, which
is compared between two groups: "Survivors" and
"Died".
❑ Group 0 is "Died" and Group 1 is "Survivors"
◼ The output of the test includes the t-statistic (t = 2.0385),
the degrees of freedom (df = 669.03), and the p-value
(p-value = 0.04189).
❑ Since the p-value of the test is 0.04189, p < 0.05, we reject the null
hypothesis that there is no significant difference between the ages of
survivors and the ages of people who died.
◼ So, we can conclude that there is a significant
difference between the ages of survivors and the ages of
people who died. Since the mean age of the survivors is lower, the
Titanic survivors were, on average, younger than the passengers who died.
◼ The alternative hypothesis is that the true difference in
means between the two groups is not equal to 0.
◼ The 95% confidence interval for the difference in means
is also provided (0.06862884 to 3.66201421),
◼ as well as the sample means for each group (30.41510
for Died and 28.54978 for Survivors).
❑ Average age of people who died
◼ 30.41510
❑ Average age of survivors of Titanic
◼ 28.54978
◼ The average age of survivors is 28.55 and the average
age of people who died is 30.42. So, judging by the
average age, the survivors are younger than the people who died.
◼ Overall, this test is used to determine if
there is a significant difference in the
mean Age between the two groups, and
the results suggest that there is a
significant difference.
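The quantities reported by the Welch test can be reproduced by hand. With group means m1, m2, variances s1^2, s2^2 and sizes n1, n2, the statistic is t = (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2), and the degrees of freedom follow the Welch-Satterthwaite formula. The sketch below uses simulated ages (not the Titanic data) and compares the manual values with t.test().

```r
set.seed(42)  # reproducible simulated ages (illustrative, not Titanic data)
ages_died     <- rnorm(60, mean = 30, sd = 14)
ages_survived <- rnorm(40, mean = 28, sd = 15)

# Per-group variance of the mean: s^2 / n
v1 <- var(ages_died) / length(ages_died)
v2 <- var(ages_survived) / length(ages_survived)

# Welch t statistic
t_manual <- (mean(ages_died) - mean(ages_survived)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom
df_manual <- (v1 + v2)^2 /
  (v1^2 / (length(ages_died) - 1) + v2^2 / (length(ages_survived) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

welch <- t.test(ages_died, ages_survived)
print(c(manual = t_manual, t.test = unname(welch$statistic)))
print(c(manual = df_manual, t.test = unname(welch$parameter)))
print(c(manual = p_manual, t.test = welch$p.value))
```

The non-integer degrees of freedom in the lab output (df = 669.03) come from this formula, which does not assume equal variances in the two groups.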

Correlations
◼ Correlation is a standardized version
of covariance.
◼ It measures the strength of the linear relationship
between two variables.

◼ Aside from looking at the characteristics of our variables,
we may want to see if there is a relationship between the
variables in our data, i.e. a correlation.

◼ In addition we can use the correlation coefficient (Pearson’s correlation
coefficient), denoted by r.
◼ r is a measure that varies from -1 to 1.
❑ A value of 1 indicates a perfect positive linear relationship, a value of -1
indicates a perfect negative linear relationship, and a value of 0 indicates
no linear relationship.
❑ Between 0 and 1: Positive correlation

❑ 0: No correlation

❑ Between 0 and –1:Negative correlation

◼ Calculating correlations in R for Pearson's correlation coefficient:

> cor(data$Age, data$Fare)
[1] 0.09156609
> cor(data$Age, data$Pclass)
[1] -0.3313388

Age and Pclass are moderately negatively correlated, since passengers in
the higher (lower-numbered) classes are generally older.

◼ Covariance can be converted to correlation by dividing it
by the product of the standard deviations of the two
variables.
◼ This mathematical process is called normalization;
correlation is a normalized version of
covariance.
❑ Normalization is the process of transforming data so that it
conforms to a specific scale or distribution.
◼ In R, the same value is obtained by taking the covariance
and dividing by the product of the standard deviations:

> cov(data$Age, data$Fare) / (sd(data$Age) * sd(data$Fare))
[1] 0.09156609
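The identity r = cov(x, y) / (sd(x) * sd(y)) can be verified on any pair of numeric vectors. The sketch below uses two short invented vectors and also shows what happens when a value is missing, which matters if Age has not been imputed.

```r
# Invented example vectors
x_demo <- c(2, 4, 6, 8, 10)
y_demo <- c(1, 3, 2, 5, 4)

r_direct <- cor(x_demo, y_demo)
r_from_cov <- cov(x_demo, y_demo) / (sd(x_demo) * sd(y_demo))
print(c(r_direct, r_from_cov))

# With a missing value, cor() returns NA unless incomplete pairs are dropped
x_missing <- c(2, 4, NA, 8, 10)
r_na <- cor(x_missing, y_demo)
r_complete <- cor(x_missing, y_demo, use = "complete.obs")
print(r_na)
print(r_complete)
```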

Thank You..
