Data Science Unit-4

This document provides an overview of R programming, including installation, data types, objects, and basic operations. It covers the R Studio interface, how to set working directories, create and execute R scripts, and manage variables and comments. Additionally, it explains the fundamental data structures in R, such as vectors, lists, and data frames, along with their usage and characteristics.

PE 515 CS – DATA SCIENCE

UNIT – IV
19.10.2022
 INTRODUCTION TO ‘R’ PROGRAMMING
 GETTING STARTED WITH ‘R’
 INSTALLATION OF ‘R’ SOFTWARE AND USING THE INTERFACE
 VARIABLES AND DATA TYPES
 ‘R’ OBJECTS
 VECTORS & LISTS
 OPERATIONS: ARITHMETIC, LOGICAL & MATRIX OPERATIONS
 DATA FRAMES
 FUNCTIONS
 CONTROL STRUCTURES
 DEBUGGING & SIMULATION IN ‘R’

 R & R Studio
 How to set the working directory
 How to create an R file and save it
 How to execute an R file
 How to execute pieces of code

 R is an open source programming language that is widely used as statistical software and as a data analysis tool.
 R generally comes with a command line interface.
 R is available across widely used platforms: Windows, Linux and Mac OS.
 R Studio is an integrated development environment (IDE) for R.
 An integrated development environment is a GUI where you can write your code, see the results and also see the variables that are generated during the course of programming.
 R Studio is available as both open source and commercial software.
 R Studio is also available in both Desktop and Server versions.
 R Studio is also available for various platforms, such as Windows, Linux and Mac OS.

This is how the R Studio interface looks.

When you first run the application, the Console panel is on the left; here you type commands and see the results they produce.

At the top right is the Environment/History pane. It contains 2 tabs: the Environment tab shows the variables generated during the course of programming in the workspace, which is temporary, and the History tab shows all the commands used so far, from the beginning of your usage of R Studio.

At the bottom right is another panel, which contains multiple tabs, such as Files, Plots, Packages and Help.

The Files tab shows the files and directories available in the default workspace of R. The Plots tab shows the plots generated during the course of programming.
The Packages tab shows the packages that are already installed and also gives a user interface to install new packages.
The Help tab is one of the most important: it gives help from the R documentation on the functions that are built into R.
The last tab is the Viewer tab, which can be used to see local web content generated using R or some other application.

The working directory in R Studio can be set in 2 ways. The first way is to use the console with the function setwd() (set working directory).
Pass to setwd() the path of the directory which you want to be the working directory for R Studio, in double quotes.

Or, to set the working directory from the GUI, click on the 3 dots button.
This opens a file browser, which helps you to choose your working directory.
Once you choose your working directory, click the settings button labelled More; in the menu that pops up, select Set As Working Directory.
This makes the directory you have chosen in the file browser your working directory.
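
The console route can be sketched as below. This is a minimal illustration, not the notes' own example: the target directory is a hypothetical stand-in (R's temporary directory, chosen only because it always exists), and the original directory is restored at the end.

```r
# Remember the current working directory so it can be restored afterwards
old_dir = getwd()

# Hypothetical target directory; replace with your own path in double quotes,
# e.g. setwd("C:/Users/me/project") on Windows or setwd("/home/me/project") on Linux
target_dir = tempdir()

setwd(target_dir)   # set the working directory
print(getwd())      # confirm the change

setwd(old_dir)      # restore the previous working directory
```

getwd() is the companion function that reports the current working directory, which is a quick way to check that setwd() did what you expected.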

Once you set the working directory, you are ready to program in R Studio.

Let us illustrate how to create an R file and write some code. There are 2 ways to create an R file. The first way is to click the File tab; this gives a drop-down menu, where you select New File and then R Script, so that a new file opens.

The other way is to use the + button just below the File tab and choose R Script from there to open a new R script file.

Once you open an R script file, this is how R Studio looks with the script file open.
The 3 panels (Console, Environment/History, and Files/Plots) are still there. On top of them, a new window has opened for the script file. Now you are ready to write a script or program in R Studio.

Let us illustrate this with a small example. The first line of the code assigns a value of 11 to a. The second line evaluates a times 10 and assigns the result to b. The third statement, print(c(a, b)), concatenates a and b and prints the result.
This is how you write a script file in R.
Once you write a script file, you have to save it before you execute it.
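
The three statements described above can be written out as follows. This is a reconstruction from the description, since the screenshot of the script is not reproduced in these notes.

```r
# Assign the value 11 to a
a = 11
# Multiply a by 10 and store the result in b
b = a * 10
# Concatenate a and b into one vector and print it (the values 11 and 110)
print(c(a, b))
```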

Let us see how to save the R file. From the File menu you can click Save, which automatically saves the file as Untitled x, where x can be 1 or 2 depending upon how many R scripts you have already opened. A better idea is to use the Save As button, just below Save, so that you can name the script file as you wish.
Let us suppose we have clicked the Save As button.

This pops up a window where you can rename the script file as test.R, or whatever name you intend. Once you have renamed it, click Save, and that saves the script file.

So now we have seen how to open an R script and how to write some code in the script file. The next task is to execute the R file.
There are several ways to execute the commands in the R file.
The first way is to use the Run command.

The Run command can be executed from the GUI by pressing the Run button, or you can use the shortcut key Control + Enter. It executes the line on which the cursor is placed.
The other way is to run the R code using Source or Source with Echo.
The difference between Source and Source with Echo is the following:
Source executes the whole R file and prints only the output that you asked to print.
Source with Echo prints the commands as well, along with the output you are printing.

This is an example where the R file was executed using Source with Echo. In the console you can see that it printed the command a = 11, the command b = a * 10, and the output of print(c(a, b)) with the values. Since a = 11, b = 11 times 10, which is 110. This is how the output is printed in the console.
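
The two buttons correspond to the base R function source(), with and without echo = TRUE. The sketch below assumes the three-line script from earlier and writes it to a temporary file (a hypothetical location) so that it runs anywhere.

```r
# Write the example script to a temporary file (hypothetical location)
script_path = file.path(tempdir(), "test.R")
writeLines(c("a = 11", "b = a * 10", "print(c(a, b))"), script_path)

# Source: runs the whole file and shows only what the script prints
source(script_path)

# Source with Echo: also shows each command before its output
source(script_path, echo = TRUE)
```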

Now, let us see how to execute pieces of code in R.

You can use the Run command to run a single line.

Let us assign the value 14 to a and then run it.
How do you do this?
Take your cursor to the line you want to edit, replace 11 by 14 and then use Control + Enter or the Run button.
This executes only the line where the cursor is placed.

In the Environment pane you can see that only the value of a has changed and the value of b remains the same. This is because we executed only the line that changes the value of a, and did not execute the line that computes b. So b remains as it was: the value of a has changed, but not the value of b.

In summary, Run can be used to execute the selected lines of R code.

Source and Source with Echo can be used to run the whole file.

The advantage of using Run is that you can troubleshoot or debug the program when something is not behaving according to your expectations.

The disadvantage of using the Run command is that it populates the console and makes it unnecessarily messy.

28.10.2022

 How to
o add comments
o clear the environment
o save the workspace

Why add comments?

 Improve the readability of code
o The purpose of the code
o Explain the algorithm used to accomplish the purpose
 To generate documentation external to the source code itself, using documentation generators

 To add a comment to a single line in an R script, use the hash key (#) at the start of the comment.
 Commenting makes the script file more readable.
 To make a line of code inert, insert ‘#’ at the start of the line.
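
A short sketch of the three uses of ‘#’ listed above; the variable names are illustrative only.

```r
# Purpose: compute a multiple of a starting value (this whole line is a comment)
a = 11           # a comment may also follow code on the same line
# b = a * 10     # this line is inert because it starts with '#'
b = a * 2        # only this assignment to b is executed
print(b)
```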

There are 2 ways to comment multiple lines:
Select the lines which you want to comment using the cursor, and then use the key combination Control + Shift + C to comment or uncomment the selected lines.

The other way is to use the GUI: select the lines which you want to comment using the cursor, click on the Code menu and, in the menu that pops up, select Comment/Uncomment Lines.

 The console can be cleared using the shortcut key Control + L.

 To clear variables from the R environment, use the rm command.

 To clear a single variable from the R environment, use rm followed by the variable to be removed, for example rm(a).

 To delete all the variables in the environment, use rm with the argument list = ls(), that is, rm(list = ls()).
Or
 Clear all the variables in the environment using the GUI: press the brush button in the Environment/History pane, and a confirmation dialog pops up.

 If we select the option ‘YES’, it clears all the variables.

 The environment is empty now.
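
The two rm() forms can be sketched as follows, assuming the code is run at the top level of a session (for example from the console or a script). Note that rm(list = ls()) removes every visible variable in the workspace, so try it only in a session you do not mind clearing.

```r
a = 11
b = a * 10

# Remove a single variable
rm(a)
print(exists("a", inherits = FALSE))   # a is gone from the workspace
print(exists("b", inherits = FALSE))   # b is still there

# Remove all variables in the workspace
rm(list = ls())
print(ls())                            # nothing left to list
```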

Saving data from workspace

Workspace data

 Workspace information is temporary
 It is not retained after the session
o If the R session is closed
o If R Studio is restarted

 It is sometimes necessary to save the data that is in the current session

To save the data from the R environment:

 There are 2 ways
 The first one is the automatic option
 When you close the R Studio application, it asks whether you want to save the workspace image
 If ‘yes’ is selected, it saves all the variables that are in the workspace
 If ‘no’ is selected, R Studio exits and the workspace information is not saved.
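
The notes describe only the automatic option. As an assumption about what a manual alternative might look like, base R provides save.image() to write the workspace to a file and load() to read it back; the file name below is hypothetical and the sketch is meant to be run at the top level of a session.

```r
a = 11
b = a * 10

# Save every variable in the workspace to a file (hypothetical file name)
image_path = file.path(tempdir(), "session.RData")
save.image(file = image_path)

# Simulate a fresh session by removing the variables, then restore them
rm(a, b)
load(image_path)
print(c(a, b))
```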

Variables and Data Types in R

 Variables
 Basic Data Types
 R Objects
o Vectors
o Lists

The rules for naming the variables in R

1. A variable name in R must consist of alphanumeric characters; the only special characters that can be used in variable names are the underscore (_) and the period (.).

2. A variable name must always start with a letter (R also accepts a leading period, provided it is not followed by a digit); it cannot start with a digit or an underscore.

3. Some examples of correct variable names that can be used in R:

The first one, b2 = 7, assigns the value 7 to the variable b2. This is a valid variable name because it starts with a letter and has only alphanumeric characters.

Similarly, the second one, Manoj_GDPL = "scientist", is also a valid variable name: it has a special character, but it is the underscore, which is allowed in variable names.

Examples where the variable names are not correct: 2b = 7 gives an error because the variable name starts with a numeric character, which does not follow the rules for variable names in R.
R also contains some predefined constants, such as pi; letters, the lowercase letters a to z; LETTERS, the uppercase letters A to Z; and the months of the year, with full month names in month.name and abbreviated month names in month.abb.
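
The naming rules and the predefined constants can be checked with a short sketch. The invalid name is tested by parsing it as text inside try(), so that the error does not stop the script; that wrapping is an illustration device and not part of the notes.

```r
# Valid names: start with a letter; may contain digits, underscore and period
b2 = 7
Manoj_GDPL = "scientist"

# Invalid name: starts with a digit, so R cannot even parse the statement
invalid = inherits(try(parse(text = "2b = 7"), silent = TRUE), "try-error")
print(invalid)

# Predefined constants
print(pi)
print(letters[1:3])     # first three lowercase letters
print(LETTERS[1:3])     # first three uppercase letters
print(month.name[1])    # full month name
print(month.abb[1])     # abbreviated month name
```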

The data types that are available in R:

 R has the following basic data types; each data type can take the values listed.
 Logical: takes a value of either TRUE or FALSE.
 Integer: the set of all integers. Numeric: the set of all real numbers.
 Complex: the set of all complex numbers.
 Character: all the letters and special characters, written in quotes.
 There are several tasks that can be done using data types.

 Each task below is given with its action and the syntax for doing it.
 For example, the first task is to find the data type of an object.
 To find the data type of an object, use typeof().
 The syntax is typeof(object): pass the object as an argument to typeof().

 The second task is to verify whether an object is of a certain data type.
 To do that, prefix the data type with “is.” as the command.
 The syntax is is.data_type(object), where object is the object to be verified.
 For example, if the variable a is defined as an integer, the command “is.integer(a)” returns TRUE.
 If the variable ‘a’ is not defined as an integer, it returns FALSE.
The third task is to convert the data type of one object to another. To do that, prefix the data type with “as.” as the command.

as.data_type(object) – the syntax is as.data_type of the object which is to be coerced.

Note that not all coercions are possible; if an impossible coercion is attempted, it returns NA (not available) rather than a converted value.

A numeric variable can be coerced into a complex variable by using as.complex().

For example, as.complex(2) converts the numeric value 2 into the complex value 2 + 0i.

Coercing a character that does not represent a number into a numeric variable using as.numeric() gives NA, together with the warning message “NAs introduced by coercion”.

This means that the coercion from such characters to numeric variables is not possible.
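
A small sketch of coercion, covering one case that succeeds and one that does not. The strings used are illustrative, and suppressWarnings() is used only to keep the console clean where the warning is expected.

```r
# Numeric to complex: always possible
z = as.complex(2)
print(z)

# Character to numeric: works when the text represents a number
n = as.numeric("3.5")
print(n)

# Character to numeric: gives NA when the text is not a number
# (without suppressWarnings, R warns "NAs introduced by coercion")
bad = suppressWarnings(as.numeric("abc"))
print(bad)
```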

Of the basic objects of R, the most important are vectors, lists and data frames.

A vector is an ordered collection of elements of the same data type, a list is an ordered collection of objects, and a data frame is a generic tabular object; data frames are among the most important and most widely used objects in the R programming language.

Basic Objects

Object       Values
Vector       Ordered collection of same data types
List         Ordered collection of objects
Data Frame   Generic tabular object

Vectors

A vector is an ordered collection of basic data types of a given length.

The key point is that all the elements of a vector must be of the same data type.

Example – a vector is created in R using the concatenation command, c().

Define a vector containing four numeric values and assign it to a variable X.

The code is X = c(2.3, 4.5, 6.7, 8.9), followed by print(X).

When this piece of code is executed, it creates a vector X with the values 2.3, 4.5, 6.7, 8.9 and prints them in the console.
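
The vector described above, written out as code. The second half is an added illustration (not from the notes) of the same-data-type rule: when values of different types are combined, R converts them all to one common type.

```r
# Create a numeric vector with the concatenation function and print it
X = c(2.3, 4.5, 6.7, 8.9)
print(X)

# Mixing types: every element is converted to a single common type
mixed = c(1, "a", TRUE)
print(typeof(mixed))   # all elements have become character strings
```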

Lists
 A list is a generic object consisting of an ordered collection of objects.
 A list can be a list of vectors, a list of matrices, a list of characters, a list of functions and so on.
 Example – to illustrate how a list looks:
o To build a list of employees with their details, we need attributes such as ID, employee name and number of employees.
o So, create a vector for each of those attributes.

 The first attribute, ID, is a numeric vector containing the employee IDs.
 The second attribute, emp.name, is a character vector containing the employee names.
 The third attribute, num.emp, is the number of employees, a single numeric value.

Combine these three objects of different data types into a list containing the details of the employees, using the list command.

The command emp.list = list(ID, emp.name, num.emp) creates the variable emp.list, which is a list of the ID, emp.name and num.emp defined above.

 Once the list is created, print it and see how the output looks.
 When this code is executed, the list is printed in the console.
 The first element holds the IDs 1, 2, 3, 4.
 The second element of the list contains the names of the employees.
 The third element of the list gives the number of employees.
 The list is created.

How to access the components (indices) of the list?

 All the components of a list can be named, and those names can be used to access the components of the list.
 For example, take the same list created earlier, using the same ID, emp.name and num.emp.
 Instead of creating the list directly, names can be given to these attributes: id, names and Total Staff, as shown in the code here.
 Once this code is executed, the list is created. To access a component of the list by name, use the dollar operator ($): emp.list is the list, and emp.list$names accesses the component named names.
 When the result of this command is printed, the names of the employees are printed.

# Attribute vectors
ID = c(1,2,3,4)
emp.name=c("man","rag","sha","din")
num.emp=4
# Unnamed list: components are accessible only by index
emp.list=list(ID,emp.name,num.emp)
# Named list (replaces the unnamed one): components are accessible by name
emp.list=list("id"=ID, "names"=emp.name, "Total Staff"=num.emp)
print(emp.list$names)

Output
[1] "man" "rag" "sha" "din"

The components of the list can also be accessed using indices.

To access the top level components of a list, use the double slicing operator, which is two square brackets, [[ ]].
To access the lower or inner level components of a list, use another square bracket along with the double slicing operator.

The code here illustrates how to access the top level components.
For example, to access the IDs, use print(emp.list[[1]]); the double slicing operator gives the first top level component, which is ID.

The second component can be accessed similarly with emp.list[[2]]. To access the first sub element, or inner component, of the component ID, use emp.list with the double slicing operator followed by the element number in another square bracket: emp.list[[1]][1].

Similarly, to access the first employee name, use emp.list[[2]][1], which prints the value man from the employee list.
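
The index-based access described above can be sketched with the same employee list; the list is rebuilt here so that the example runs on its own.

```r
ID = c(1, 2, 3, 4)
emp.name = c("man", "rag", "sha", "din")
num.emp = 4
emp.list = list("id" = ID, "names" = emp.name, "Total Staff" = num.emp)

# Top level components: double square brackets
print(emp.list[[1]])      # the IDs
print(emp.list[[2]])      # the employee names

# Inner components: one more square bracket after the double brackets
print(emp.list[[1]][1])   # first ID
print(emp.list[[2]][1])   # first employee name, "man"
```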

A list can also be modified by accessing its components and replacing them with the required ones.
For example, to change the total number of staff to 5, assign the value 5 to the Total Staff component.
To add a new employee name to the list, note that the component of the list holding the employee names is the second one.
The new name Nir is added as a new employee to that component.

So, directly assign the character value Nir to the fifth sub component of the second component of the list.
Now, to give this new employee the ID 5, access the fifth sub element of the first component and assign it the value 5.

Print the list: now there are 5 IDs, the total staff is 5, and the name Nir has been added to the list.
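
The three modifications can be sketched as below. The exact statements in the original slides are not reproduced in these notes, so the indexing forms used here are one reasonable way to write them.

```r
emp.list = list("id" = c(1, 2, 3, 4),
                "names" = c("man", "rag", "sha", "din"),
                "Total Staff" = 4)

# Change the total number of staff to 5
emp.list[["Total Staff"]] = 5

# Add a fifth name to the second component (the employee names)
emp.list[[2]][5] = "Nir"

# Add a fifth ID to the first component (the employee IDs)
emp.list[[1]][5] = 5

print(emp.list)
```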
How to concatenate the list?

Two lists can be concatenated using the concatenation function, c().
The syntax is c(list1, list2).
The first list already contains three attributes; suppose we want to add another attribute, emp.ages.
For this, create a new list which contains the ages of the five employees.

To concatenate the new list, emp.ages, with the original list, emp.list, use the concatenation function.

This command concatenates the two lists and assigns the result back to emp.list.
When emp.list is printed, another attribute, ages, has been added to the original list.
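
A sketch of the concatenation step. The age values are hypothetical, since the notes do not give them.

```r
emp.list = list("id" = c(1, 2, 3, 4, 5),
                "names" = c("man", "rag", "sha", "din", "Nir"),
                "Total Staff" = 5)

# New list holding the ages of the five employees (hypothetical values)
emp.ages = list("ages" = c(23, 48, 54, 30, 32))

# Concatenate the two lists and assign the result back to emp.list
emp.list = c(emp.list, emp.ages)
print(names(emp.list))    # the list now has a fourth component, ages
```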

01.11.2022
Data Frames

Data frame – object of R

 Data frame
 Create
 Access rows and columns
 Edit
 Add new rows and columns

vec1=c(1,2,3)
vec2=c("r","scilab","java")
vec3=c("for prototyping", "for prototyping", "for scale up")
df=data.frame(vec1,vec2,vec3)
print(df)

 Data frames are generic data objects of R which are used to store tabular data.

 Data frames are the most popular data objects in R programming because we are comfortable seeing data in tabular form.

 Data frames can also be thought of as matrices in which each column can be of a different data type.

 vec1 = numeric vector
 vec2 = character vector
 vec3 = character vector

 To create a data frame, use the command data.frame() and pass each vector as an argument to this function.

 The names of the variables are taken as the column names.

 Data frames can also be created by importing data from a text file, using the function read.table().

 Syntax:
new_data_frame = read.table(file = "path of the file", sep = "separator")

 Content of the text file: the data, separated by spaces
 Path of the file: the file from which data will be imported
o The path specification is OS dependent; take care whether to use the backslash or the slash depending upon the OS.
 A separator can also be specified to distinguish between entries; the default separator between the entries of data is white space.
 The separator can also be a comma or a tab etc.
 Data frames can be created either on the go or using data that already exists in some format.
 Thus a data frame is created.
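
A runnable sketch of read.table(). The file names and contents are hypothetical and are written to a temporary directory first; header = TRUE is an extra argument (not mentioned in the notes) telling R that the first line holds the column names.

```r
# Create a small space-separated text file (hypothetical content)
data_path = file.path(tempdir(), "tools.txt")
writeLines(c("id tool", "1 r", "2 scilab", "3 java"), data_path)

# Default separator (sep = "") means any white space
new_data_frame = read.table(file = data_path, header = TRUE, sep = "")
print(new_data_frame)

# The same data separated by commas, read with sep = ","
csv_path = file.path(tempdir(), "tools.csv")
writeLines(c("id,tool", "1,r", "2,scilab", "3,java"), csv_path)
csv_data_frame = read.table(file = csv_path, header = TRUE, sep = ",")
print(csv_data_frame)
```

file.path() builds the path with the right separator for the operating system, which sidesteps the backslash versus slash issue.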

Accessing Rows and Columns

 To access rows and columns, index the data frame with 2 arguments in square brackets: df[rows, columns].
 The first argument refers to the rows of the data frame, and the 2nd argument refers to the columns of the data frame.
 Each argument can be a single value or an array of values; leaving an argument empty selects all rows or all columns.

RUN – 1: rows 1 and 2, all columns

vec1=c(1,2,3)
vec2=c("r","scilab","java")
vec3=c("for prototyping", "for prototyping", "for scale up")
df=data.frame(vec1,vec2,vec3)
print(df[1:2,])

Output
  vec1   vec2            vec3
1    1      r for prototyping
2    2 scilab for prototyping

RUN – 2: all rows, columns 1 and 2

vec1=c(1,2,3)
vec2=c("r","scilab","java")
vec3=c("for prototyping", "for prototyping", "for scale up")
df=data.frame(vec1,vec2,vec3)
print(df[,1:2])

Output
  vec1   vec2
1    1      r
2    2 scilab
3    3   java

RUN – 3: a single index (no comma) selects columns

vec1=c(1,2,3)
vec2=c("r","scilab","java")
vec3=c("for prototyping", "for prototyping", "for scale up")
df=data.frame(vec1,vec2,vec3)
print(df[1:2])

Output
  vec1   vec2
1    1      r
2    2 scilab
3    3   java

Subset

The command subset is used to get the subset of data frame.

#data frame example 2


pd=data.frame("Name"=c("Senthil","Senthil","Sam","Sam"),
"month"=c("Jan","Feb","Jan","Feb"),
"BS"=c(141.2,139.3,135.2,160.1),
"BP"=c(90,78,80,81))
print(pd)
pd2=subset(pd,Name=="Senthil" | BS>150)
print("new subset pd2")
print(pd2)

Output
     Name month    BS BP
1 Senthil   Jan 141.2 90
2 Senthil   Feb 139.3 78
3     Sam   Jan 135.2 80
4     Sam   Feb 160.1 81
[1] "new subset pd2"
     Name month    BS BP
1 Senthil   Jan 141.2 90
2 Senthil   Feb 139.3 78
4     Sam   Feb 160.1 81

Editing Data Frames

Like lists, data frames can be edited by direct assignment.

vec1=c(1,2,3)
vec2=c("R","Scilab","Java")
vec3=c("For prototyping","For prototyping","For Scale up")
df=data.frame(vec1,vec2,vec3)
print(df)
df[[2]][2]="R"
print(df)

Output
  vec1   vec2            vec3
1    1      R For prototyping
2    2 Scilab For prototyping
3    3   Java    For Scale up
  vec1 vec2            vec3
1    1    R For prototyping
2    2    R For prototyping
3    3 Java    For Scale up

 To use the edit command, first create an instance of a data frame using the data.frame() command.
 This creates an empty data frame; the edit command is then used to edit the entries in the data frame.
 Create a data frame called “mytable” and assign to it the result of the edit command.
 When the command is executed, a window pops up; fill in the new details, and on closing the window the data is saved as the data frame named “mytable”.
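
The steps above can be sketched as follows. edit() opens a data editor window, so it only makes sense in an interactive session; the interactive() guard is an addition here so that the sketch also runs unattended, in which case the data frame simply stays empty.

```r
# Create an empty data frame
mytable = data.frame()

# Open the data editor and store the edited result back in mytable
if (interactive()) {
  mytable = edit(mytable)
}

print(mytable)
```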

How to add extra rows and columns to the data frame

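To add an extra row use the command rbind, and to add an extra column use the command cbind.

Syntax for rbind: rbind(data frame to which the new rows are to be added, new rows as a data frame)
Syntax for cbind: cbind(data frame to which the new column is to be added, new column)

Note: the data type in each column of the new rows should match the data type of that column in the existing rows.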

vec1=c(1,2,3)
vec2=c("R","Scilab","Java")
vec3=c("For prototyping","For prototyping","For Scale up")
df=data.frame(vec1,vec2,vec3)
df=rbind(df,data.frame(vec1=4,vec2="C",vec3="For Scale up"))
print("Adding Extra Row")
print(df)
df=cbind(df,vec4=c(10,20,30,40))
print("Adding Extra Column")
print(df)

Output
[1] "Adding Extra Row"
  vec1   vec2            vec3
1    1      R For prototyping
2    2 Scilab For prototyping
3    3   Java    For Scale up
4    4      C    For Scale up
[1] "Adding Extra Column"
  vec1   vec2            vec3 vec4
1    1      R For prototyping   10
2    2 Scilab For prototyping   20
3    3   Java    For Scale up   30
4    4      C    For Scale up   40

Deleting Rows and Columns

There are several ways to delete rows and columns.

Access the row or column first and insert the negative (-) sign before its index.

For example:
To delete the 3rd row, select it and insert a negative sign: df[-3,].

Rows and columns can also be deleted conditionally. In the conditional example below, the exclamation mark (!) negates the condition, so the column with the name vec3 is left out.

vec1=c(1,2,3)
vec2=c("R","Scilab","Java")
vec3=c("For prototyping","For prototyping","For Scale up")
df=data.frame(vec1,vec2,vec3)
df=rbind(df,data.frame(vec1=4,vec2="C",vec3="For Scale up"))
#print("Adding Extra Row")
#print(df)
df=cbind(df,vec4=c(10,20,30,40))
#print("Adding Extra Column")
#print(df)
df2=df[-3,-1]
print(df2)
Output
    vec2            vec3 vec4
1      R For prototyping   10
2 Scilab For prototyping   20
4      C    For Scale up   40

vec1=c(1,2,3)
vec2=c("R","Scilab","Java")
vec3=c("For prototyping","For prototyping","For Scale up")
df=data.frame(vec1,vec2,vec3)
df=rbind(df,data.frame(vec1=4,vec2="C",vec3="For Scale up"))
#print("Adding Extra Row")
#print(df)
df=cbind(df,vec4=c(10,20,30,40))
#print("Adding Extra Column")
#print(df)
#df2=df[-3,-1]
#print(df2)
#conditional deletion
df3=df[,!names(df)%in%c("vec3")]
print(df3)
df4=df[!df$vec1==3,]
print(df4)
Output
  vec1   vec2 vec4
1    1      R   10
2    2 Scilab   20
3    3   Java   30
4    4      C   40
  vec1   vec2            vec3 vec4
1    1      R For prototyping   10
2    2 Scilab For prototyping   20
4    4      C    For Scale up   40

05.11.2022
Data Entry with R’s Text Editor
https://youtu.be/sAit4ctcX2Q

vec1=c(1,2,3)
vec2=c("R","Scilab","Java")
vec3=c("For prototyping","For prototyping","For Scale up")
df=data.frame(vec1,vec2,vec3)
print(df)
df[[2]][2]="R"
print(df)
df=rbind(df,data.frame(vec1=4,vec2="C",vec3="For Scale up"))
df=cbind(df,vec4=c(10,20,30,40))
print(df)
df2=df[-3,-2]
print(df2)
df3=df[,!names(df) %in% c("vec3")]
print(df3)
df4=df[!df$vec1==3,]
print(df4)
df[3,1]=3.1
df[3,3]="others"
print(df)

Output
  vec1   vec2            vec3
1    1      R For prototyping
2    2 Scilab For prototyping
3    3   Java    For Scale up
  vec1 vec2            vec3
1    1    R For prototyping
2    2    R For prototyping
3    3 Java    For Scale up
  vec1 vec2            vec3 vec4
1    1    R For prototyping   10
2    2    R For prototyping   20
3    3 Java    For Scale up   30
4    4    C    For Scale up   40
  vec1            vec3 vec4
1    1 For prototyping   10
2    2 For prototyping   20
4    4    For Scale up   40
  vec1 vec2 vec4
1    1    R   10
2    2    R   20
3    3 Java   30
4    4    C   40
  vec1 vec2            vec3 vec4
1    1    R For prototyping   10
2    2    R For prototyping   20
4    4    C    For Scale up   40
  vec1 vec2            vec3 vec4
1  1.0    R For prototyping   10
2  2.0    R For prototyping   20
3  3.1 Java          others   30
4  4.0    C    For Scale up   40

Factor issue – R has an inbuilt characteristic of assigning data types to the data we enter; character data may be stored as categories, or factor levels.

It then assumes that these are the only levels available for now.
When the 3rd row, 3rd column is changed to “others”, R displays a warning message saying that the category others is not available, and it replaces the value with NA (note the word factor in the warning message).

New entries in R should be consistent with the factor levels that are already defined; if not, this warning message is printed.

To avoid this, while defining the data frame itself, pass another argument, stringsAsFactors = FALSE. In versions of R before 4.0.0 this argument is TRUE by default, which is the reason for the warning message; from R 4.0.0 onwards the default is FALSE.

If the same operations are carried out now, there won’t be any warning messages.
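
The factor issue can be reproduced on any R version by setting stringsAsFactors explicitly. The sketch below builds the data frame both ways; suppressWarnings() is used only to keep the expected "invalid factor level" warning out of the console.

```r
vec1 = c(1, 2, 3)
vec2 = c("R", "Scilab", "Java")
vec3 = c("For prototyping", "For prototyping", "For Scale up")

# Character columns stored as factors: "others" is not an existing level
df_f = data.frame(vec1, vec2, vec3, stringsAsFactors = TRUE)
suppressWarnings({ df_f[3, 3] = "others" })
print(df_f[3, 3])    # the entry has been replaced with NA

# Character columns kept as plain strings: the assignment simply works
df_s = data.frame(vec1, vec2, vec3, stringsAsFactors = FALSE)
df_s[3, 3] = "others"
print(df_s[3, 3])
```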

Recasting and Joining of Data Frames

 Recasting
 Need to recast data frames
 Recast in 2 steps: Melt, Cast
 Recast in 1 step – recast
 Joining of 2 data frames – left join, right join, inner join

What is recasting of data frames?

Recasting is the process of manipulating a data frame in terms of its variables.

Why would one want to recast data frames?

Recasting helps in reshaping the data, which can bring more insight when the data is seen from a different perspective.

#data frame example 2


pd=data.frame("Name"=c("Senthil","Senthil","Sam","Sam"),
"month"=c("Jan","Feb","Jan","Feb"),
"BS"=c(141.2,139.3,135.2,160.1),
"BP"=c(90,78,80,81))
print(pd)

Output
     Name month    BS BP
1 Senthil   Jan 141.2 90
2 Senthil   Feb 139.3 78
3     Sam   Jan 135.2 80
4     Sam   Feb 160.1 81

To recast the data frame into another form, 2 steps are used: the first is melt and the second is cast.

To use the melt and cast commands, the identifier variables and measurement variables of the data frame must be identified.

Most discrete variables can be identifier variables and the numeric variables can be measurement variables. There are certain rules for measurement variables; for example, categorical and date variables cannot be measurement variables.

This melt command is available in the reshape2 library.

#data frame example 2


pd=data.frame("Name"=c("Senthil","Senthil","Sam","Sam"),
"month"=c("Jan","Feb","Jan","Feb"),
"BS"=c(141.2,139.3,135.2,160.1),
"BP"=c(90,78,80,81))
#print(pd)
library(reshape2)
df=melt(pd,id.vars=c("Name","month"), measure.vars=c("BS","BP"))
print(df)

Output
     Name month variable value
1 Senthil   Jan       BS 141.2
2 Senthil   Feb       BS 139.3
3     Sam   Jan       BS 135.2
4     Sam   Feb       BS 160.1
5 Senthil   Jan       BP  90.0
6 Senthil   Feb       BP  78.0
7     Sam   Jan       BP  80.0
8     Sam   Feb       BP  81.0

For the melt command, give the data frame as the first argument.

Then specify the identifier variables (id.vars) and the measurement variables (measure.vars) of the data frame.

The melt command was used to melt the data frame into the structure shown above.

The next step is the cast.

The dcast() function is also available in the reshape2 library.

During this step, the column from which the values are going to be taken must be specified.

The data frame ‘df’ has already been melted.

Using dcast(), another data frame, df2, is created, in which variable and month are kept as they are; blood sugar (BS) and blood pressure (BP) are the variables of importance.

The Name variable is converted into columns (the number of columns depends upon the number of categories in Name).

The columns variable and month remain as they are, and the categories in Name become new variables.

There are 2 categories in the example, Sam and Senthil, and they become the new columns; the values for those columns are picked from the column value.

#data frame example 2


pd=data.frame("Name"=c("Senthil","Senthil","Sam","Sam"),
"month"=c("Jan","Feb","Jan","Feb"),
"BS"=c(141.2,139.3,135.2,160.1),
"BP"=c(90,78,80,81))
#print(pd)
library(reshape2)
df=melt(pd,id.vars=c("Name","month"), measure.vars=c("BS","BP"))
#print(df)
df2=dcast(df,variable+month~Name, value.var="value")
print(df2)

Output
variable month Sam Senthil
1 BS Feb 160.1 139.3
2 BS Jan 135.2 141.2
3 BP Feb 81.0 78.0
4 BP Jan 80.0 90.0

[Execution complete with exit code 0]

Recasting in a single step

Recasting can be performed in a single step using the recast function.

It takes the input arguments of both melt and cast.

Specify only the Id variables; the rest of the variables are taken by default as the measurement variables.

#data frame example 2

pd=data.frame("Name"=c("Senthil","Senthil","Sam","Sam"),
"month"=c("Jan","Feb","Jan","Feb"),
"BS"=c(141.2,139.3,135.2,160.1),
"BP"=c(90,78,80,81))
library(reshape2)
recast(pd,variable+month~Name, id.var=c("Name","month"))

Output
variable month Sam Senthil
1 BS Feb 160.1 139.3
2 BS Jan 135.2 141.2
3 BP Feb 81.0 78.0
4 BP Jan 80.0 90.0

[Execution complete with exit code 0]

Melt and Cast operations can be done together using the recast command.

09.11.2022

 To create a new variable, use mutate(): load the dplyr library, then call mutate() with the data frame as the first argument and the definition of the new variable as the second.

pd=data.frame("Name"=c("Senthil","Senthil","Sam","Sam"),
"month"=c("Jan","Feb","Jan","Feb"),
"BS"=c(141.2,139.3,135.2,160.1),
"BP"=c(90,78,80,81))
library(dplyr)
pd2 <- mutate(pd,log_BP=log(BP))
print(pd2)

Name month BS BP log_BP


1 Senthil Jan 141.2 90 4.499810
2 Senthil Feb 139.3 78 4.356709
3 Sam Jan 135.2 80 4.382027
4 Sam Feb 160.1 81 4.394449

How to join 2 data frames?
 Joining is very important in data analysis, because part of the data may come from one source and part from another; common Ids are used to match these data.

 Data frames can be combined using the dplyr package.
 The general syntax is a join function (left join, right join, inner join and so on) applied to two data frames.
 Pass the first data frame and the second data frame.
 Also specify the Id variable by which the 2 data frames are to be joined.

 The Id variable is common to both data frames; that is, the variable is available in both of the data frames to be combined.
 This variable provides the identifiers for combining the 2 data frames.
 The nature of the combination depends on the function that is used.

 Load the ‘dplyr’ package using the library() command.
 Several functions are available in the dplyr package to combine data frames; a few of them are left join, right join, inner join, full join, semi join and anti join.

Example: Create 2 data frames
1) pd
2) pd_new

Data frame 1: pd
pd=data.frame("Name"=c("Senthil","Senthil","Sam","Sam"),
"month"=c("Jan","Feb","Jan","Feb"),
"BS"=c(141.2,139.3,135.2,160.1),
"BP"=c(90,78,80,81))
print(pd)

Name month BS BP
1 Senthil Jan 141.2 90
2 Senthil Feb 139.3 78
3 Sam Jan 135.2 80
4 Sam Feb 160.1 81

Data frame 2: pd_new


pd_new=data.frame("Name"=c("Senthil","Ramesh","Sam"),
"dept"=c("PSE","Data Analytics","PSE"))
print(pd_new)
Name dept
1 Senthil PSE
2 Ramesh Data Analytics
3 Sam PSE

 The left join joins matching rows of data frame 2 to data frame 1 based on the Id variable.

 The variable dept is added to the final data frame only for Senthil and Sam; Ramesh does not appear in data frame 1, so there is no row for him in the result.

pd=data.frame("Name"=c("Senthil","Senthil","Sam","Sam"),
"month"=c("Jan","Feb","Jan","Feb"),
"BS"=c(141.2,139.3,135.2,160.1),
"BP"=c(90,78,80,81))
#print(pd)
pd_new=data.frame("Name"=c("Senthil","Ramesh","Sam"),
"dept"=c("PSE","Data Analytics","PSE"))
#print(pd_new)
library(dplyr)
# new dataframe name
pd_left_join1<-left_join(pd,pd_new,by="Name")
print(pd_left_join1)

Name month BS BP dept


1 Senthil Jan 141.2 90 PSE
2 Senthil Feb 139.3 78 PSE
3 Sam Jan 135.2 80 PSE
4 Sam Feb 160.1 81 PSE

pd=data.frame("Name"=c("Senthil","Senthil","Sam","Sam"),
"month"=c("Jan","Feb","Jan","Feb"),
"BS"=c(141.2,139.3,135.2,160.1),
"BP"=c(90,78,80,81))
#print(pd)
pd_new=data.frame("Name"=c("Senthil","Ramesh","Sam"),
"dept"=c("PSE","Data Analytics","PSE"))
#print(pd_new)
library(dplyr)
# new dataframe name
pd_right_join1<-right_join(pd,pd_new,by="Name")
print(pd_right_join1)

Name month BS BP dept


1 Senthil Jan 141.2 90 PSE
2 Senthil Feb 139.3 78 PSE
3 Sam Jan 135.2 80 PSE
4 Sam Feb 160.1 81 PSE
5 Ramesh <NA> NA NA Data Analytics

 The right join joins matching rows of data frame 1 to data frame 2 based on the Id variable.
 The 2 data frames are passed as arguments and the 2nd data frame is the reference; where there are no matching values, the result is filled with NAs.

pd=data.frame("Name"=c("Senthil","Senthil","Sam","Sam"),
"month"=c("Jan","Feb","Jan","Feb"),
"BS"=c(141.2,139.3,135.2,160.1),
"BP"=c(90,78,80,81))
#print(pd)
pd_new=data.frame("Name"=c("Senthil","Ramesh","Sam"),
"dept"=c("PSE","Data Analytics","PSE"))
#print(pd_new)
library(dplyr)
# new dataframe name
pd_right_join1<-right_join(pd_new,pd,by="Name")
print(pd_right_join1)

Name dept month BS BP


1 Senthil PSE Jan 141.2 90
2 Senthil PSE Feb 139.3 78
3 Sam PSE Jan 135.2 80
4 Sam PSE Feb 160.1 81

 The order of the arguments can be changed. In the above example, the data frame pd is the reference, so the output contains the same rows as the left join.

Name month BS BP dept
1 Senthil Jan 141.2 90 PSE
2 Senthil Feb 139.3 78 PSE
3 Sam Jan 135.2 80 PSE
4 Sam Feb 160.1 81 PSE

 Inner join merges the two data frames and retains only those rows whose Ids are present in both data frames.
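The inner join can be sketched without extra packages. The example below is a minimal sketch that swaps in base R's merge() (whose default all = FALSE keeps only matching keys) for the dplyr call; with dplyr loaded, the corresponding call would be inner_join(pd, pd_new, by = "Name"). The name pd_inner is only illustrative, and merge() sorts the result by the key, so the row order may differ from the dplyr output.

```r
# Same two data frames as in the examples above
pd <- data.frame(Name = c("Senthil", "Senthil", "Sam", "Sam"),
                 month = c("Jan", "Feb", "Jan", "Feb"),
                 BS = c(141.2, 139.3, 135.2, 160.1),
                 BP = c(90, 78, 80, 81))
pd_new <- data.frame(Name = c("Senthil", "Ramesh", "Sam"),
                     dept = c("PSE", "Data Analytics", "PSE"))

# Base R stand-in for an inner join: only Names present in BOTH frames survive
pd_inner <- merge(pd, pd_new, by = "Name")
print(pd_inner)
```

Ramesh is dropped because he has no rows in pd, which is the difference from the right join shown earlier.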

11.11.2022
Arithmetic, Logical and Matrix operations in R

Arithmetic Operations in R

 R supports all the basic arithmetic operations; the first operator to know is the assignment operator.
 Either = or the back arrow <- can be used to assign a value to a variable. The standard addition (+), subtraction (-), multiplication (*), division (/), integer division (%/%) and remainder (%%) operations are also available in R.
 Both = and the back arrow <- are valid assignment operators in R, whether the code is run from the R console or from RStudio; the back arrow is the form conventionally preferred in R code.
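The operators listed above can be tried directly at the console. This is a small sketch with illustrative variable names a and b (not taken from the notes):

```r
a <- 7        # assignment with the back arrow
b = 2         # assignment with =

print(a + b)    # addition: 9
print(a - b)    # subtraction: 5
print(a * b)    # multiplication: 14
print(a / b)    # division: 3.5
print(a %/% b)  # integer division: 3
print(a %% b)   # remainder: 1
print(a ^ b)    # exponent: 49
```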

Hierarchy of operations

 The hierarchy of operations in R arithmetic is similar to the usual BODMAS rule: brackets come first, exponents second, followed by division and multiplication, and then addition and subtraction.
 Value of A in the given example is 5
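The ordering can be checked on small expressions. The values below are illustrative only and are not the expression the notes refer to:

```r
x <- 2 + 3 * 4      # multiplication before addition: 14
y <- (2 + 3) * 4    # brackets first: 20
z <- 2 * 3 ^ 2      # exponent before multiplication: 18
w <- 10 - 4 / 2     # division before subtraction: 8
print(c(x, y, z, w))
```

When in doubt about the order, adding brackets makes the intent explicit and costs nothing.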

Logical Operations in R

 Standard logical (comparison) operations such as <, <=, >, >=, == (equal to), != (not equal to) and so on are available
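A short sketch of these comparisons follows. The variables x and y are illustrative, and the combining operators & (and), | (or) and ! (not) are included here as an assumption about what "and so on" covers:

```r
x <- 5
y <- 8

print(x < y)     # TRUE
print(x >= y)    # FALSE
print(x == y)    # FALSE: == tests equality, a single = would assign
print(x != y)    # TRUE

# Combining conditions
print((x < y) & (y > 10))   # FALSE
print((x < y) | (y > 10))   # TRUE
print(!(x < y))             # FALSE
```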

 An important class of operations needed for data analysis problems is matrix operations.
 Most of the data will be treated as matrices.

 A matrix is a rectangular arrangement of numbers in rows and columns. Rows run horizontally and columns run vertically. These are examples of matrices.

A=matrix(c(1,2,3,4,5,6,7,8,9),nrow=3,ncol=3,byrow=TRUE)
print(A)

Output
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9

 To create a matrix in R, use the function called matrix.
 The arguments to matrix() are the set of elements of the matrix, the number of rows (nrow), the number of columns (ncol), and byrow. With byrow=TRUE the elements are filled in row by row; the default is byrow=FALSE, which fills them column by column (refer to the example below).

A=matrix(c(1,2,3,4,5,6,7,8,9),nrow=3,ncol=3)
print(A)

Output
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9

matrix(3,3,4)

Output
[,1] [,2] [,3] [,4]
[1,] 3 3 3 3
[2,] 3 3 3 3
[3,] 3 3 3 3

diag(c(4,5,6),3,3)
Output
[,1] [,2] [,3]
[1,] 4 0 0
[2,] 0 5 0
[3,] 0 0 6

diag(1,3,3)
Output
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1

Matrix Metrics

 To find the dimension of a matrix, that is, how many rows, columns and elements it has, the following commands can be used.

A=matrix(c(1,2,3,4,5,6,7,8,9),nrow=3,ncol=3,byrow=TRUE)
dim(A)
nrow(A)
ncol(A)
length(A)

Output
[1] 3 3
[1] 3
[1] 3
[1] 9

 dim(A) returns the size of the matrix (number of rows, number of columns)
 nrow(A) returns the number of rows and ncol(A) returns the number of columns
 Either length(A) or the product of the dimensions, prod(dim(A)), returns the number of elements in the matrix

A=matrix(c(1,2,3,4,5,6,8,9,1),3,3,byrow=T)
colnames(A)<-c("a","b","c")
rownames(A)<-c("d","e","f")
print(A)

Output
  a b c
d 1 2 3
e 4 5 6
f 8 9 1

A=matrix(c(1,2,3,4,5,6,8,9,1),3,3,byrow=T)
colnames(A)<-c("a","b","c")
rownames(A)<-c("d","e","f")
A[,1:2]

Output
  a b
d 1 2
e 4 5
f 8 9

A=matrix(c(1,2,3,4,5,6,8,9,1),3,3,byrow=T)
colnames(A)<-c("a","b","c")
rownames(A)<-c("d","e","f")
A[,c("a","c")]

Output
  a c
d 1 3
e 4 6
f 8 1

A=matrix(c(1,2,3,4,5,6,8,9,1),3,3,byrow=T)
colnames(A)<-c("a","b","c")
rownames(A)<-c("d","e","f")
A[c("d","f"),]

Output
  a b c
d 1 2 3
f 8 9 1

A=matrix(c(1,2,3,4,5,6,7,8,9),3,3,byrow=T)
colnames(A)<-c("a","b","c")
rownames(A)<-c("d","e","f")
A[ ]
A[1,2]
A[2,3]

Output
  a b c
d 1 2 3
e 4 5 6
f 7 8 9
[1] 2
[1] 6

A=matrix(c(1,2,3,4,5,6,7,8,9),3,3,byrow=T)
colnames(A)<-c("a","b","c")
rownames(A)<-c("d","e","f")
A[]
A[,1]

Output
  a b c
d 1 2 3
e 4 5 6
f 7 8 9
d e f
1 4 7

A=matrix(c(1,2,3,4,5,6,7,8,9),3,3,byrow=T)
colnames(A)<-c("a","b","c")
rownames(A)<-c("d","e","f")
A[]
A[2,]

Output
  a b c
d 1 2 3
e 4 5 6
f 7 8 9
a b c
4 5 6

A=matrix(c(1,2,3,4,5,6,7,8,9),3,3,byrow=T)
colnames(A)<-c("a","b","c")
rownames(A)<-c("d","e","f")
A[]
A[,-2]

Output
  a b c
d 1 2 3
e 4 5 6
f 7 8 9
  a c
d 1 3
e 4 6
f 7 9

A=matrix(c(1,2,3,4,5,6,7,8,9),3,3,byrow=T)
colnames(A)<-c("a","b","c")
rownames(A)<-c("d","e","f")
A[]
A[-2,]

Output
  a b c
d 1 2 3
e 4 5 6
f 7 8 9
  a b c
d 1 2 3
f 7 8 9

Colon operator
1:10
10:1

Output
[1] 1 2 3 4 5 6 7 8 9 10
[1] 10 9 8 7 6 5 4 3 2 1

 The colon operator is used to create a sequence of numbers with equal spacing.
 For example, typing 1:10 creates the numbers from 1 to 10 with a gap of 1.
 The order can also be reversed: 10:1 gives the numbers from 10 down to 1 with a gap of 1.

A=matrix(c(1,2,3,4,5,6,7,8,9),3,3,byrow=T)
A[]
A[1:3,1:2]
A[1:3,-3]
A[,1:2]

Output
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[,1] [,2]
[1,] 1 2
[2,] 4 5
[3,] 7 8
[,1] [,2]
[1,] 1 2
[2,] 4 5
[3,] 7 8
[,1] [,2]
[1,] 1 2
[2,] 4 5
[3,] 7 8

A=matrix(c(1,2,3,4,5,6,7,8,9),3,3,byrow=T)
A[]
A[c(1,3),1:2]
A[c(1,3),c(1,2)]

Output
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[,1] [,2]
[1,] 1 2
[2,] 7 8
[,1] [,2]
[1,] 1 2
[2,] 7 8

Matrix Concatenation
 Matrix concatenation refers to merging of a row or column to a matrix
 Concatenation of a row to a matrix is done using rbind()
 Concatenation of a column to a matrix is done using cbind()
 Consistency of the dimensions between the matrix and the vector should be
checked before concatenation

A=matrix(c(1,2,3,4,5,6,7,8,9),3,3,byrow=T)
B=matrix(c(10,11,12),1,3,byrow=T)
C=rbind(A,B)
A
B
C

Output
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[,1] [,2] [,3]
[1,] 10 11 12
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[4,] 10 11 12

A=matrix(c(1,2,3,4,5,6,7,8,9),3,3,byrow=T)
B=matrix(c(10,11,12),3,1,byrow=T)
C=cbind(A,B)
A
B
C

Output
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[,1]
[1,] 10
[2,] 11
[3,] 12
[,1] [,2] [,3] [,4]
[1,] 1 2 3 10
[2,] 4 5 6 11
[3,] 7 8 9 12

A=matrix(c(1,2,3,4,5,6,7,8,9),3,3,byrow=T)
B=matrix(c(10,11,12),1,3,byrow=T)
C=cbind(A,B)
A
B
C

Output
Error in cbind(A, B) : number of rows of matrices must match (see arg 2)
Execution halted

[Execution complete with exit code 1]

 To resolve this dimension inconsistency, transpose B so that it becomes 3 by 1; A is 3 by 3, so the number of rows now matches. Then perform the column bind with cbind(A, B) on the transposed B and assign the result to C.
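The fix described above can be sketched with the base R transpose function t(), using the same A and B as in the failing example:

```r
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), 3, 3, byrow = TRUE)  # 3 x 3
B <- matrix(c(10, 11, 12), 1, 3, byrow = TRUE)                 # 1 x 3

# t(B) is 3 x 1, so its row count now matches A
C <- cbind(A, t(B))
print(dim(C))
print(C)
```

The result has 3 rows and 4 columns, with 10, 11, 12 as the new last column.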

Deleting a Column
A=matrix(c(1,2,3,4,5,6,7,8,9),3,3,byrow=T)
A=A[,-2]
A

Output
[,1] [,2]
[1,] 1 3
[2,] 4 6
[3,] 7 9

 To delete a column, put a negative sign before the index of the column to be deleted and assign the result back to A; the required output is then obtained.

Deleting a Row

A=matrix(c(1,2,3,4,5,6,7,8,9),3,3,byrow=T)
A=A[-2,]
A

Output
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 7 8 9

Matrix Algebra

 Addition
 Subtraction
 Multiplication
 Matrix operations in R
 Matrix Division

Addition

A=matrix(c(1,2,3,4,5,6,8,9,1),3,3,byrow=T)
B=matrix(c(3,1,3,4,2,1,5,1,2),3,3,byrow=T)
A
B
A+B

Output
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 8 9 1
[,1] [,2] [,3]
[1,] 3 1 3
[2,] 4 2 1
[3,] 5 1 2
[,1] [,2] [,3]
[1,] 4 3 6
[2,] 8 7 7
[3,] 13 10 3

Subtraction

A=matrix(c(1,2,3,4,5,6,8,9,1),3,3,byrow=T)
B=matrix(c(3,1,3,4,2,1,5,1,2),3,3,byrow=T)
A
B
A-B

Output
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 8 9 1
[,1] [,2] [,3]
[1,] 3 1 3
[2,] 4 2 1
[3,] 5 1 2
[,1] [,2] [,3]
[1,] -2 1 0
[2,] 0 3 5
[3,] 3 8 -1

Multiplication (Regular Matrix Multiplication)

A=matrix(c(1,2,3,4,5,6,8,9,1),3,3,byrow=T)
B=matrix(c(3,1,3,4,2,1,5,1,2),3,3,byrow=T)
A
B
A%*%B

Output
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 8 9 1
[,1] [,2] [,3]
[1,] 3 1 3
[2,] 4 2 1
[3,] 5 1 2
[,1] [,2] [,3]
[1,] 26 8 11
[2,] 62 20 29
[3,] 65 27 35

Multiplication (Element-wise Multiplication)

A=matrix(c(1,2,3,4,5,6,8,9,1),3,3,byrow=T)
B=matrix(c(3,1,3,4,2,1,5,1,2),3,3,byrow=T)
A
B
A*B

Output
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 8 9 1
[,1] [,2] [,3]
[1,] 3 1 3
[2,] 4 2 1
[3,] 5 1 2
[,1] [,2] [,3]
[1,] 3 2 9
[2,] 16 10 6
[3,] 40 9 2

Matrix Division

A=matrix(c(4,9,16,25),2,2,byrow=T)
B=matrix(c(2,3,4,5),2,2,byrow=T)
A
B
A/B

Output
[,1] [,2]
[1,] 4 9
[2,] 16 25
[,1] [,2]
[1,] 2 3
[2,] 4 5
[,1] [,2]
[1,] 2 3
[2,] 4 5
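Note that A/B in the example above divides element by element. As a hedged aside, the sketch below contrasts this with multiplying by the inverse of B, computed with the base R function solve(); whether the course intends this second meaning of "division" is an assumption, and the names elementwise and right_div are illustrative.

```r
A <- matrix(c(4, 9, 16, 25), 2, 2, byrow = TRUE)
B <- matrix(c(2, 3, 4, 5), 2, 2, byrow = TRUE)

elementwise <- A / B          # each A[i,j] divided by B[i,j]
right_div <- A %*% solve(B)   # A times the inverse of B (B must be invertible)

print(elementwise)
print(right_div)
print(right_div %*% B)        # recovers A, up to rounding error
```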

16.11.2022
Advanced programming in R: Functions

 How to load or source the functions?


 How to call or invoke the functions?

 Functions in R – a function accepts input arguments and produces output by executing the valid R commands present in the function
 Function names and file names need not be the same
 A file can have one or more function definitions
 Functions are created using the command function()

F = function (arguments) {
Statements
}

 Functions are useful when the task is to be performed many times.

 Creating a function file is similar to opening an R script


 Either use file button in the toolbar or use the + button just below the file tab to
create an R script, and then save it with the required name.

 After saving the function, it should be loaded before invoking or executing in R.
 Using the source button available in R Script menu, a function can be loaded.

 After loading, the function can be invoked from the console.

Passing arguments to the functions
 Variables are passed as arguments to functions
 Arguments are passed in the same order as in the function definition
 The names of the arguments can be used to pass their values in any order
 Default values are used if some or all arguments are not passed

 Whenever the function definition is changed or RStudio is restarted, load the function again; otherwise an error may occur or the correct output may not be generated.
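The rules for passing arguments can be illustrated with the volcylinder function that these notes use later for mapply (default diameter 5, default length 100). The variable names v1 to v4 are only illustrative:

```r
volcylinder <- function(dia = 5, len = 100) {
  vol <- pi * dia^2 * len / 4
  return(vol)
}

v1 <- volcylinder(2, 10)               # positional: dia = 2, len = 10
v2 <- volcylinder(len = 10, dia = 2)   # named arguments, any order
v3 <- volcylinder(2)                   # len falls back to its default, 100
v4 <- volcylinder()                    # both defaults are used

print(c(v1, v2, v3, v4))
```

v1 and v2 are equal because naming the arguments removes the dependence on their order.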

 Functions with multiple inputs and multiple outputs (MIMO)


 How to source and call functions?
 Inline functions
 How to loop over objects using the commands apply, lapply and tapply

 Since an R function returns only one object, first calculate the volume and the surface area, then create an object called result, which is a list holding both, and ask the function to return that one object.
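A minimal sketch of such a multiple-output function follows. It assumes the object is a cylinder and takes the surface area to be the curved (lateral) surface only; the function name volcylinder_mimo and both formulas are assumptions made for illustration, not taken from the notes.

```r
volcylinder_mimo <- function(dia, len) {
  volume <- pi * dia^2 * len / 4     # volume of the cylinder
  surface_area <- pi * dia * len     # curved surface only (assumption)
  # Bundle both outputs into a single list, since only one object is returned
  result <- list(volume = volume, surface_area = surface_area)
  return(result)
}

res <- volcylinder_mimo(dia = 2, len = 10)
print(res$volume)
print(res$surface_area)
```

The caller then picks out each output by name with the $ operator.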

Inline functions
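As a hedged illustration of the idea, a function whose body is a single expression can be written on one line without braces, and can also be defined in place where it is used. The names sq_sum and m are hypothetical:

```r
# One-line function: no braces needed for a single expression
sq_sum <- function(x, y) x^2 + y^2
print(sq_sum(3, 4))

# A function defined in place, passed directly to apply()
m <- matrix(1:4, 2, 2)
row_sq <- apply(m, 1, function(r) sum(r^2))
print(row_sq)
```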

Looping Functions

 apply function – applies a given function over the margins of a given array
 Syntax – apply(array, margins, function, …)
 Here margins refers to the dimensions of the array along which the function needs to be applied (1 for rows, 2 for columns).

a = matrix(1:9,3,3)
a
apply(a,1,sum)
apply(a,2,sum)

Output
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
[1] 12 15 18
[1] 6 15 24
 lapply function – is used to apply a function over a list
 lapply always returns a list of the same length as the input list
 Syntax: lapply(list, function, ...)

a=matrix(1:9,3,3)
b=matrix(10:18,3,3)
Mylist=list(a,b)
a
b
determinant=lapply(Mylist,det)
determinant

Output
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
[,1] [,2] [,3]
[1,] 10 13 16
[2,] 11 14 17
[3,] 12 15 18
[[1]]
[1] 0

[[2]]
[1] 5.329071e-15

 mapply function – mapply is a multivariate version of lapply
 A function can be applied over several lists simultaneously
 Syntax: mapply(fun, list1, list2, …)

volcylinder=function(dia=5,len=100)
{
vol=pi*dia^2*len/4
return(vol)
}

dia=c(1,2,3,4)
len=c(7,4,3,2)
vol=mapply(volcylinder,dia,len)
vol

Output
[1] 5.497787 12.566371 21.205750 25.132741

 tapply – tapply is used to apply a function over subsets of a vector, where the subsets are given by a combination of factors
 Syntax – tapply(vector, factors, function, …)

id=c(1,1,1,1,2,2,2,3,3)
values=c(1,2,3,4,5,6,7,8,9)
tapply(values,id,sum)

Output
1 2 3
10 18 17

18.11.2022
Control structures

 If-else-if family
 for loop
 nested for loops
 for loop with if break
 while

Control structures can be divided into 2 categories.


1. if-then-else: execute certain commands only when certain condition(s) is
satisfied
2. for, while loops: execute certain commands repeatedly and use a certain logic to
stop the iteration

#if-elseif-else example
x=6
if(x>7){
x=x+1
}else if(x>8){
x=x+2
}else{
x=x+3
}
x

Output
[1] 9

seq(from=1,to=10,by=3)
Output
[1] 1 4 7 10

seq(from=1,to=10,length=4)
Output
[1] 1 4 7 10

seq(from=1,to=10,by=4)
Output
[1] 1 5 9

For loop, nested for loop

n=5
sum=0
for(i in seq(1,n,1)){
sum=sum+i
print(c(sum,i))
}

Output
[1] 1 1
[1] 3 2
[1] 6 3
[1] 10 4
[1] 15 5

For loop with if-break



n=100
sum=0
for(i in seq(1,n,1)){
sum=sum+i
print(c(sum,i))
if(sum>15){
break}
}

Output
[1] 1 1
[1] 3 2
[1] 6 3
[1] 10 4
[1] 15 5
[1] 21 6



sum=0
i=0
fin_sum=15
while(sum<fin_sum){
i=i+1
sum=sum+i
print(c(i,sum))
}

Output
[1] 1 1
[1] 2 3
[1] 3 6
[1] 4 10
[1] 5 15

Debugging and Simulation in R



https://www.geeksforgeeks.org/debugging-in-r-programming/

Debugging is the process of removing bugs from program code so that it runs successfully. While writing code, some mistakes or problems only show up when the code is run and are harder to diagnose, especially when the error occurs after multiple levels of function calls, so fixing them can take a lot of time.
Debugging in R works through warnings, messages, and errors, and it mostly means debugging functions. Various debugging tools are:
 Editor breakpoint
 traceback()
 browser()
 recover()

Editor Breakpoints
Editor Breakpoints can be added in RStudio by clicking to the left of the line in
RStudio or pressing Shift+F9 with the cursor on your line.
A breakpoint is the same as browser() but it doesn’t involve changing the code.
Breakpoints are denoted by a red circle on the left side, indicating that debug
mode will be entered at this line after the source is run.



traceback() Function
The traceback() function gives information on how your function arrived at an error. It displays all the functions called before the error occurred; this is called the “call stack” in many languages, while R favors the name traceback.

# Function 1
function_1 <- function(a){
a+5
}

# Function 2
function_2 <- function(b)
{
function_1(b)
}

# Calling function
function_2("s")

# Call traceback()
traceback()

Output: (from website)
2: function_1(b) at #1
1: function_2("s")

Output (from R compiler)
Error in a + 5 : non-numeric argument to binary operator
Calls: function_2 -> function_1
Execution halted

[Execution complete with exit code 1]

The traceback() function displays the call stack at the time of the error. The call stack is read from the function that was run (at the bottom) to the function that was running when the error occurred (at the top). We can also use traceback() as an error handler, which displays the call stack immediately, without calling traceback() afterwards.

# Function 1
function_1 <- function(a){
a+5
}

# Function 2
function_2 <- function(b){
function_1(b)
}

# Calling error handler
options(error = traceback)
function_2("s")

Output: (from website)
Error in a + 5 : non-numeric argument to binary operator
2: function_1(b) at #1
1: function_2("s")

Output (from R compiler)
Error in a + 5 : non-numeric argument to binary operator
Calls: function_2 -> function_1
No traceback available

[Execution complete with exit code 0]



browser() Function
The browser() function is inserted into functions to open the R interactive debugger. It stops the execution of the function, and you can examine the function within its own environment. In debug mode, we can modify objects, look at the objects in the current environment, and also continue executing.

The browser[1]> prompt in the console confirms that you are in debug mode. Some commands to follow:
 ls(): Objects available in current environment.
 print(): To evaluate objects.
 n: To examine the next statement.
 s: To examine the next statement by stepping into function calls.
 where: To print a stack trace.
 c: To leave debugger and continue with execution.
 Q: To exit debugger and go back to R prompt.

Also, the debug() function automatically inserts a browser() statement at the beginning of the function.



recover() Function
recover() is used as an error handler rather than called directly as a statement. With recover(), R prints the whole call stack and lets you select which function's frame you would like to browse. The debugging session then starts at the selected location.

Example:

# Calling recover
options(error = recover)

# Function 1
function_1 <- function(a){
a+5
}

# Function 2
function_2 <- function(b)
{
function_1(b)
}

# Calling function
function_2("s")

Output: (from website)
Enter a frame number, or 0 to exit

1: function_2("s")
2: #2: function_1(b)

Selection:

The debugging session starts at the selected location.

Output (from R compiler)
Error in a + 5 : non-numeric argument to binary operator
Calls: function_2 -> function_1
recover called non-interactively; frames dumped, use debugger() to view

[Execution complete with exit code 0]

Simulation Basics

https://pubs.wsb.wisc.edu/academics/analytics-using-r-2019/simulation-basics.html

Simulation is a method used to examine the “what if” without having real data. We just make it up! We can use pre-programmed functions in R to simulate data from different probability distributions or we can design our own functions to simulate data from distributions not available in R.

A few built-in functions are available to generate values from particular probability distributions.

When we do a simulation, we have to make many assumptions. One major assumption is the choice of the distribution to use for a particular variable. Each particular distribution has parameters that are integral to generating data from the distribution. We need to set a value for these parameters to simulate a value from a distribution.

Given a particular distribution and known parameters, we can generate values from that distribution. However, in reality, we never know the true distribution, and we never know the exact parameter values needed to generate values from that distribution. Practical statistical analysis helps us to identify a good choice of the distribution and estimate reasonable parameters.

Values can be generated from the Normal, Bernoulli, Uniform, Poisson, and Gamma distributions.

Normal Distribution

The normal distribution is used if the variable is continuous. We usually refer to the density of a normal random variable as a bell-shaped curve. We require a value for the mean and another for the standard deviation to simulate a value from a normal distribution. (The mean and standard deviation (or variance) are the parameters of a normal distribution.)

x <- rnorm(n = 1000, mean = 10, sd = 4)

Notice the notation n = (number of values to be simulated), mean = , and sd = . If we plot a histogram, we can see a somewhat bell-shaped curve.

hist(x)
summary(x)
sd(x)
library(psych)
describe(x)

Setting the Seed

Technically, when we simulate from a particular distribution, we are using a pre-programmed function to generate the numbers. Thus, we may call these pseudo-random numbers.
Initialize the seed of the data-generating process so that the same values are generated each time the code is run:

set.seed(1)
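The effect of the seed can be checked directly. In this small sketch (variable names x1, x2, x3 are illustrative), resetting the same seed reproduces the same draws, while continuing without resetting gives new ones:

```r
set.seed(1)
x1 <- rnorm(n = 5, mean = 10, sd = 4)

set.seed(1)                      # same seed again
x2 <- rnorm(n = 5, mean = 10, sd = 4)

x3 <- rnorm(n = 5, mean = 10, sd = 4)   # seed not reset before this draw

print(identical(x1, x2))   # the two seeded draws match
print(identical(x1, x3))   # the unseeded continuation does not
```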

Indicator (Bernoulli) Variables

A special case of a categorical variable is an indicator variable, sometimes referred to as a binary or dummy variable. The underlying distribution of an indicator variable is called a Bernoulli distribution.

Suppose we are interested in evaluating whether a flip of a coin would be a head or a tail. Here we could define Head as the variable of interest.

$$\text{Head} = \begin{cases} 1 & \text{if the flip is a head} \\ 0 & \text{if the flip is a tail} \end{cases}$$

If you are more interested in evaluating Tails, you could define the random variable as:

$$\text{Tail} = \begin{cases} 1 & \text{if the flip is a tail} \\ 0 & \text{if the flip is a head} \end{cases}$$

We can simulate this random variable using a Binomial distribution. (Technically, the Bernoulli distribution is a special case of a Binomial.) We need to set values for n = , size = , and prob = , where n is the number of values you want to simulate, size in this case is 1 (as we want to simulate an indicator variable), and prob is the probability that you will flip a head (or tail, depending on your random variable).

Simulating indicator variables is completed using the rbinom function. Here, we simulate 6 values of heads with a probability of 1/2 of getting a head on each flip (a fair coin).

rbinom(n = 6, size=1, prob=0.5)

You can also use a function called rbernoulli that is part of the purrr package.

Uniform (Continuous Version)

To simulate test scores of students, where any score between 0 and 100 is possible, use the continuous uniform distribution with density

$$f_X(x) = \begin{cases} \dfrac{1}{100 - 0} & \text{for } 0 \le x \le 100 \\ 0 & \text{for } x < 0 \text{ or } x > 100 \end{cases}$$

In the fraction I include the two endpoints (0 and 100) to illustrate how you would adjust for other bounds. These bounds are defined as the parameters of the uniform distribution.
To simulate one exam score we would use the following code:
runif(n = 1, min = 0, max = 100)

You could save the results in another variable for use


dat <- runif(n = 100, min = 0, max = 100)

Then you can check on the mean (average) of the sample.


mean(dat)

And provide a histogram of the data to see the frequency of occurrence.


hist(dat)

Poisson Variables (Optional)

Another discrete distribution that you may learn is called the Poisson distribution, which is used to predict counts of some event that occurs within a given time interval. For example, recording the number of car accidents that you have in a year.
As you would guess, there is an R function for simulating this random variable. Here, in addition to the number of values to simulate, we just need the parameter for the mean (called lambda).
rpois(n = 6, lambda = 20)

Gamma Variables (Optional)

Another continuous distribution that you may learn is called the Gamma distribution. This distribution is used for random variables that have some skewness and are not symmetrical, unlike the Normal distribution.

There is an R function for simulating this random variable. Here, in addition to the number of values to simulate, we just need two parameters, one for the shape and one for either the rate or the scale. The rate is the inverse of the scale. The general formula is: rgamma(n, shape, rate = 1, scale = 1/rate).

rgamma(n = 5, shape = 3, scale = 2)
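Because the rate is the inverse of the scale, the call above can equally be written with rate = 1/2. The sketch below checks this by fixing the seed before each call; the names g_scale and g_rate are illustrative.

```r
set.seed(1)
g_scale <- rgamma(n = 5, shape = 3, scale = 2)

set.seed(1)
g_rate <- rgamma(n = 5, shape = 3, rate = 1/2)   # rate = 1/scale

print(g_scale)
print(all.equal(g_scale, g_rate))   # same values from either parameterisation
```

Only one of rate and scale should be supplied in a given call.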

*****
