
R Manual

Uploaded by rahul59985
Copyright © All Rights Reserved

WEEK-1

Download and Install R and RStudio

R Download and Installation

1. Go to Google and search for CRAN (the Comprehensive R Archive Network), or go to https://cran.r-project.org/
2. Click on Download R for Windows
3. Click on "install R for the first time"
4. Download the latest version of R for Windows
5. Install R by double-clicking the downloaded R .exe file
6. Click on Run
7. Choose the language of your choice
8. Click on Next
9. Select the folder to install R in and click Next
10. Review the options and then click Next
11. Use the default settings while installing
12. Click Next to install
13. When the installation is finished, click Finish
14. Click on the R shortcut to start R
15. Go to Google and search for RStudio, or go to https://www.rstudio.com/
16. Click on Download Free desktop IDE
17. Click on Download and download RStudio
18. Double-click the downloaded RStudio .exe file
19. Click on Next
20. Choose the installation location and click on Next
21. Click on Install
22. Installation will start
23. Installation will be finished
24. Click on the RStudio shortcut

R PROGRAMMING
R is an interpreted computer programming language which was created by Ross Ihaka and
Robert Gentleman at the University of Auckland, New Zealand.

R is currently developed by "The R Development Core Team". R is also a software environment used for statistical analysis, graphical representation, reporting, and data modeling.

R is an implementation of the S programming language.

• R is an interpreted programming language widely used for statistical analysis and graphical representation.
• R programming is popular in the field of data science among data analysts, researchers, statisticians, etc.
• R is used to retrieve data from datasets, clean it, analyze and visualize it, and present it in the most suitable way.
About R Programming
• Open-source Language - R is freely available to the public. You can make improvements to the language and access thousands of useful packages created by others for free.
• Domain-specific Language - R is a domain-specific language, and its specialty lies in statistical and data analysis.
• Strong Graphical Capabilities - R can also be used for data visualization. It provides extended libraries that help you produce high-quality interactive graphics.
• Interpreted Language - R is an interpreted language like Python.
• Integration with Other Technologies - You can integrate your R code or project with programs written in C, C++, Python, .Net, etc.
History of R Programming

The language's name is taken from the first letter of the first names of both developers (Ross and Robert). The project was conceived in 1992, the initial version was released in 1995, and in 2000 the first stable version (1.0) was released.

Version  Release date  Description
0.49     1997-04-23    First time R's source code was released, and CRAN (Comprehensive R Archive Network) was started.
0.60     1997-12-05    R officially becomes part of the GNU Project.
0.65.1   1999-10-07    The update.packages() and install.packages() functions are included.
1.0      2000-02-29    The first production-ready version is released.
1.4      2001-12-19    The first version for Mac OS X is made available.
2.0      2004-10-04    Lazy loading is introduced, which allows fast loading of data with minimal use of system memory.
2.1      2005-04-18    Support for UTF-8 encoding, internationalization, localization, etc. is added.
2.11     2010-04-22    Support for 64-bit Windows systems is added.
2.13     2011-04-14    A compiler function that converts R code to byte code is added.
2.14     2011-10-31    The new parallel package is added, and namespaces become mandatory for packages.
2.15     2012-03-30    Serialization speed for long vectors is improved.
3.0      2013-04-03    Support for long vectors (index values of 2^31 and larger) on 64-bit systems.
3.4      2017-04-21    Just-in-time (JIT) compilation is enabled by default.
3.5      2018-04-23    New features such as a compact internal representation of integer sequences and a new serialization format are added.

Features of R programming
R is a domain-specific programming language aimed at data analysis. It has some unique features that make it very powerful. The most important is arguably its vector notation: vectors allow us to perform a complex operation on a set of values in a single command. R programming has the following features:
1. It is a simple, effective, and well-developed programming language.
2. It is data analysis software.
3. It is a well-designed, easy, and effective language that includes user-defined functions, loops, conditionals, and various I/O facilities.
4. It has a consistent and integrated set of tools for data analysis.
5. It has a suite of operators for different types of calculations on arrays, lists, and vectors.
6. It provides effective data handling and storage facilities.
7. It is open-source, powerful, and highly extensible software.
8. It provides highly extensible graphical techniques.
9. It allows us to perform multiple calculations using vectors.
10. R is an interpreted language.
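The following minimal sketch illustrates the vector notation described above. It applies one formula to a whole vector in a single command and then computes the same result with an explicit loop for comparison. The names celsius and fahrenheit are illustrative only.

```r
# A vector of temperatures in degrees Celsius (illustrative data)
celsius <- c(0, 25, 37.5, 100)

# Vectorized: one command converts every element at once
fahrenheit <- celsius * 9 / 5 + 32
print(fahrenheit)

# Equivalent explicit loop, processing one element at a time
fahrenheit_loop <- numeric(length(celsius))
for (i in seq_along(celsius)) {
  fahrenheit_loop[i] <- celsius[i] * 9 / 5 + 32
}
print(fahrenheit_loop)
```

Both versions produce the same values. The vectorized form is shorter and is the idiomatic style in R.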
Comparison between R and Python

Comparison Index: Overview
R: R is an interpreted computer programming language that was created by Ross Ihaka and Robert Gentleman. R is a software environment used for statistical analysis, graphical representation, reporting, and data modeling.
Python: Python is an interpreted, high-level programming language used for general-purpose programming. Guido van Rossum created it, and it was first released in 1991. Python has a very simple and clean code syntax. It emphasizes code readability and debugging.

Comparison Index: Specialties for data science
R: R packages have advanced techniques that are very useful for statistical work. There are many useful R packages, covering everything from psychometrics to genetics to finance.
Python: For finding outliers in a data set, R and Python are equally good. For developing a web service that allows people to upload datasets and find outliers, however, Python is better.

Comparison Index: Functionalities
R: For data analysis, R has built-in functionalities.
Python: Most data analysis functionalities are not built in. They are available through packages like NumPy and Pandas.

Comparison Index: Key domains of application
R: Data visualization is a key aspect of analysis. R packages such as ggplot2, ggvis, lattice, etc. make data visualization easier.
Python: Python is better for deep learning because Python packages such as Caffe, Keras, OpenNN, etc. allow the development of deep neural networks in a very simple way.

Comparison Index: Availability of packages
R: There are hundreds of packages and ways to accomplish data science tasks.
Python: Python has a few main packages, such as viz, Scikit-learn, and Pandas, for data analysis and machine learning.
R Variables

A variable is a named memory location where data is stored. Variables are used to store information that is manipulated and referenced in the R program.
Data Types in R Programming Language

Each variable in R has an associated data type. Each R data type requires a different amount of memory and supports specific operations that can be performed on it.

Basic Data Types

Numeric
Description: Decimal values are called numeric in R. This type represents the set of all real numbers and includes integers and doubles.
Example: numeric_value <- 3.14

Double
Description: A double allows you to store numbers as decimals. This is the default treatment for numbers.
Example: decimal_value <- 8.42

Integer
Description: The set of all integers. The L suffix tells R to store the value as an integer.
Example: integer_value <- 42L

Character
Description: A character is used to represent string values such as "a", "b", "c", ..., "@", "#", "$", ..., "1", "2", etc.
Example: character_value <- "Hello"

Logical
Description: A special data type for data with only two possible values, TRUE or FALSE.
Example: logical_value <- TRUE

Complex
Description: A complex value in R has a real part and an imaginary part, which is written with the suffix i.
Example: complex_value <- 1 + 2i
Numeric Data type

Decimal values are called numerics in R. Numeric is the default R data type for numbers. If you assign a decimal value to a variable x as follows, x will be of numeric type.

1. A simple R program to illustrate Numeric data type

# Assign a decimal value to x
> x = 5.6

# print the class name of variable
> print(class(x))

# print the type of variable
> print(typeof(x))

OUTPUT
[1] "numeric"
[1] "double"

//When R stores a number without the L suffix in a variable, it stores it as a double-precision floating-point value. This means that a value such as 5 is stored with a type of double and a class of numeric, even though it has no decimal part.//

------------------------------------------------------------------------------------------------------------------

1. A simple R program to check Numeric data type

# Assign a whole number to y (without the L suffix)
>y=5

# is y an integer?
> print(is.integer(y))

OUTPUT
[1] FALSE
------------------------------------------------------------------------------------------------------------------

// The is.integer( ) function confirms that y is not stored as an integer, even though 5 is a whole number.//


Integer Data type

R supports an integer data type, which represents the set of all integers. The capital 'L' is used as a suffix to denote that a particular value is of the integer data type.

1. Declare an integer by appending an L suffix.
> y = 5L

# print the class name of y
> print(class(y))

# print the type of y
> print(typeof(y))

OUTPUT
[1] "integer"
[1] "integer"
------------------------------------------------------------------------------------------------------------------

You can create as well as convert a value into an integer type using the as.integer() function.

1. Create an integer value
> x = as.integer(5)

# print the class name of x
> print(class(x))

# print the type of x
> print(typeof(x))

OUTPUT
[1] "integer"
[1] "integer"

------------------------------------------------------------------------------------------------------------------
Logical Data type

R has a logical data type for Boolean values, which have only two possible values: TRUE or FALSE. A logical value is often created via a comparison between variables.

1. A simple R program to illustrate logical data type

# Sample values
>x=4
>y=3

# Comparing two values
>z=x>y

# print the logical value
> print(z)

# print the class name of z
> print(class(z))

# print the type of z
> print(typeof(z))

OUTPUT
[1] TRUE
[1] "logical"
[1] "logical"

------------------------------------------------------------------------------------------------------------------
Complex Data type

R supports a complex data type, which represents the set of all complex numbers. The complex data type is used to store numbers with an imaginary component.

1. A simple R program to illustrate complex data type

# Assign a complex value to x
> x = 4 + 3i

# print the class name of x
> print(class(x))

# print the type of x
> print(typeof(x))

OUTPUT
[1] "complex"
[1] "complex"
------------------------------------------------------------------------------------------------------------------
Character Data type

R supports a character data type, which holds character values, or strings. Strings in R can contain letters, numbers, and special characters. The easiest way to show that a value is of character type is to wrap it inside single or double quotes.

1. A simple R program to illustrate character data type

# Assign a character value to x
> x = "My Name is RAHUL"

# print the class name of x
> print(class(x))

# print the type of x
> print(typeof(x))

OUTPUT
[1] "character"
[1] "character"
------------------------------------------------------------------------------------------------------------------
Find data type of an object

To find the data type of an object, use the class( ) function and pass the object to it as an argument. (The typeof( ) function used in the earlier examples returns the internal storage type instead, for example "double" for a numeric value.)

Syntax:

class(object)

1. A simple R program to find data type of an object

# Logical
> print(class(TRUE))

# Integer
> print(class(3L))

# Numeric
> print(class(10.5))

# Complex
> print(class(1+2i))

# Character
> print(class("01-01-2024"))

OUTPUT
[1] "logical"
[1] "integer"
[1] "numeric"
[1] "complex"
[1] "character"

------------------------------------------------------------------------------------------------------------------
Type verification

To verify whether an object is of a certain data type, put the prefix "is." before the data type name. The syntax is is.datatype(object), where object is the value you want to check.

1. A simple R program to Verify if an object is of a certain data type

# Logical
> print(is.logical(TRUE))

# Integer
> print(is.integer(3L))

# Numeric
> print(is.numeric(10.5))

# Complex
> print(is.complex(1+2i))

# Character
> print(is.character("01-01-2024"))

> print(is.integer("a"))

> print(is.numeric(2+3i))

OUTPUT
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] TRUE
[1] FALSE
[1] FALSE
------------------------------------------------------------------------------------------------------------------
Type Conversion
You can convert from one type to another with the following functions:
as.numeric( )
as.integer ( )
as.complex ( )
Example Program
x <- 1L # integer
y <- 2 # numeric
# convert from integer to numeric:
a <- as.numeric(x)
# convert from numeric to integer:
b <- as.integer(y)
# print values of x and y
x
y
# print the class name of a and b
class(a)
class(b)
Output

[1] 1
[1] 2
[1] "numeric"
[1] "integer"
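The conversion functions listed above can also change the value itself. The following sketch uses values chosen only for illustration. It shows that as.integer( ) truncates decimals toward zero, that as.numeric( ) can parse numeric strings, that as.complex( ) adds a zero imaginary part, and that a string which is not a number becomes NA.

```r
# as.integer() drops the fractional part (truncates toward zero)
print(as.integer(3.9))    # 3
print(as.integer(-3.9))   # -3

# A character string that looks like a number can be converted
n <- as.numeric("12.5")
print(n)                  # 12.5
print(class(n))           # "numeric"

# Converting a number to complex adds a zero imaginary part
z <- as.complex(2)
print(z)                  # 2+0i

# A string that is not a number becomes NA (R normally also warns)
bad <- suppressWarnings(as.numeric("abc"))
print(is.na(bad))         # TRUE
```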
WEEK-2
1. To output text in R, use single or double quotes:

"Hello World!"

'Hello ……'

2. To output numbers, just type the number (without quotes):

5
10
25

3. To do simple calculations, add numbers together:

5+5
R Print Output:

Unlike many other programming languages, you can output values in R without using a print function:

"Hello World!"

However, R does have a print ( ) function available if you want to use it:

print ("Hello World!")

There are also times when you must use the print ( ) function to output values, for example inside for loops, because R only prints values automatically at the top level of the console:

for (x in 1:10) {
  print(x)
}
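The following small sketch shows why print( ) is needed inside a loop. It uses the base function capture.output( ) to collect what each loop writes to the console, so the two cases can be compared directly.

```r
# Inside a for loop, a bare expression is NOT printed automatically
silent <- capture.output(for (x in 1:3) x)
print(length(silent))   # 0: nothing was written to the console

# With print(), each value is written to the console
shown <- capture.output(for (x in 1:3) print(x))
print(shown)            # "[1] 1" "[1] 2" "[1] 3"
```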
R Variables

Variables are containers for storing data values. R does not have a command for declaring a variable. A variable is created the moment you first assign a value to it.
In other programming languages, it is common to use = as an assignment operator.
In R, both = and <- can be used as assignment operators, but <- is the conventional choice.
To assign a value to a variable, use the <- sign.

PROGRAM
name <- "John"
age <- 40
print(name)
print(age)
OUTPUT
[1] "John"
[1] 40

R Operators

Operators are used to perform operations on variables and values. An operator is a symbol that tells the interpreter to perform specific logical or mathematical manipulations.
R is very rich in built-in operators. R divides the operators into the following groups:
• Arithmetic operators
• Assignment operators
• Comparison operators
• Logical operators
• Miscellaneous operators
Arithmetic Operators

1. +   This operator is used to add two vectors element by element.
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)
print(a+b)

2. -   This operator is used to subtract the second vector from the first, element by element.
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)
print(a-b)

3. *   This operator is used to multiply two vectors element by element.
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)
print(a*b)

4. /   This operator divides each element of the first vector by the corresponding element of the second vector.
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)
print(a/b)

5. %%   This operator gives the remainder (modulus) of dividing the first vector by the second vector.
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)
print(a%%b)

6. %/%   This operator gives the quotient of the integer division of the first vector by the second vector.
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)
print(a%/%b)

7. ^   This operator raises each element of the first vector to the power given by the corresponding element of the second vector.
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)
print(a^b)
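All of the arithmetic operators above work element by element. The following sketch reuses the vectors a and b from those examples. It checks how %/% and %% fit together and shows what happens when the two vectors have different lengths: R recycles the shorter one.

```r
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)

# The quotient (%/%) and the remainder (%%) together rebuild the original
# values: a == b * (a %/% b) + a %% b, element by element
rebuilt <- b * (a %/% b) + a %% b
print(all.equal(rebuilt, a))      # TRUE

# If the vectors differ in length, the shorter one is recycled
print(c(1, 2, 3, 4) + c(10, 20))  # [1] 11 22 13 24
```

If the longer length is not a multiple of the shorter one, R still recycles the shorter vector but issues a warning.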
Assignment Operators

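1. <-, = and <<-   These operators are known as left assignment operators.
a <- c(3, 0, TRUE, 2+2i)
b <<- c(2, 4, TRUE, 3)
d = c(1, 2, FALSE, 7)
print(a)
print(b)
print(d)

2. -> and ->>   These operators are known as right assignment operators.
c(3, 0, FALSE, 2) -> a
c(2, 4, TRUE, 2+3i) ->> b
print(a)
print(b)

The <<- and ->> operators are super-assignment operators. Inside a function, they assign to a variable in an enclosing (parent) environment instead of creating a local variable. At the top level, they assign in the global environment, just like <- and ->.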
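The difference between <- and <<- only shows up inside a function. In this minimal sketch, the names count, add_one_local and add_one_global are hypothetical. Using <- inside a function creates a local copy that disappears when the function returns. Using <<- updates the variable in the enclosing environment, which here is the global one.

```r
count <- 0

add_one_local <- function() {
  count <- count + 1   # creates a local variable; the outer count is unchanged
  count
}

add_one_global <- function() {
  count <<- count + 1  # assigns to count in the enclosing environment
  count
}

add_one_local()
print(count)   # still 0
add_one_global()
print(count)   # now 1
```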

Logical Operators

1. &   This operator is known as the element-wise logical AND operator. It compares each element of the first vector with the corresponding element of the second vector and returns TRUE where both elements are TRUE. (Non-zero numbers count as TRUE and zero counts as FALSE.)
a <- c(3, 0, FALSE, 2)
b <- c(2, 4, TRUE, 3)
print(a&b)

2. |   This operator is called the element-wise logical OR operator. It compares the corresponding elements of both vectors and returns TRUE where at least one of them is TRUE.
a <- c(3, 0, TRUE, 2)
b <- c(2, 4, TRUE, 3)
print(a|b)

3. !   This operator is known as the logical NOT operator. It returns the opposite logical value of each element of the vector.
a <- c(3, 1, TRUE, 2+2i)
print(!a)

4. &&   This operator works on single values. It takes one element from each side and gives TRUE as a result only if both are TRUE.
a <- c(3, 0, TRUE, 2)
b <- c(2, 4, TRUE, 2+3i)
print(a[1]&&b[1])

5. ||   This operator also works on single values. It gives the result TRUE if either of them is TRUE.
a <- c(3, 0, FALSE, 2)
b <- c(2, 4, TRUE, 2+3i)
print(a[1]||b[1])

Since R 4.3.0, using && or || on a vector with more than one element is an error. This is why the examples above pass only the first element of each vector.
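&& and || also differ from & and | in that they short-circuit: the right-hand side is evaluated only when it is needed. The following sketch relies on this to avoid comparing a value that is empty or not a number. The function name is_positive_number is hypothetical.

```r
is_positive_number <- function(x) {
  # If x is not a single number, the comparison x > 0 is never evaluated
  is.numeric(x) && length(x) == 1 && x > 0
}

print(is_positive_number(5))           # TRUE
print(is_positive_number(-2))          # FALSE
print(is_positive_number(numeric(0)))  # FALSE: x > 0 is skipped
print(is_positive_number("a"))         # FALSE
```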

Comparison / Relational Operators


1. >   This operator compares the two vectors element by element. It returns TRUE for each position where the element of the first vector is greater than the corresponding element of the second vector, and FALSE otherwise.
a <- c(1, 3, 5)
b <- c(2, 4, 6)
print(a>b)

2. <   This operator returns TRUE for each position where the element of the first vector is less than the corresponding element of the second vector.
a <- c(1, 9, 5)
b <- c(2, 4, 6)
print(a<b)

3. <=   This operator returns TRUE for each position where the element of the first vector is less than or equal to the corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 3, 6)
print(a<=b)

4. >=   This operator returns TRUE for each position where the element of the first vector is greater than or equal to the corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 3, 6)
print(a>=b)

5. ==   This operator returns TRUE for each position where the element of the first vector is equal to the corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 3, 6)
print(a==b)

6. !=   This operator returns TRUE for each position where the element of the first vector is not equal to the corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 3, 6)
print(a!=b)
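Each comparison returns a logical vector rather than a single answer. The base functions all( ) and any( ) ask whether a condition holds for every element or for at least one element, and which( ) returns the positions where it holds. A short sketch using the vectors from the examples above:

```r
a <- c(1, 3, 5)
b <- c(2, 4, 6)

comparison <- a < b
print(comparison)        # TRUE TRUE TRUE (one result per element)
print(all(comparison))   # TRUE: every element of a is smaller
print(any(a > b))        # FALSE: no element of a is greater

# which() gives the positions where the comparison is TRUE
print(which(c(1, 9, 5) > c(2, 4, 6)))   # 2
```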

Miscellaneous Operators

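1. :   The colon operator is used to create a series of numbers in sequence for a vector.
v <- 1:8
print(v)

2. %in%   This operator is used to identify whether an element belongs to a vector.
a1 <- 8
a2 <- 12
d <- 1:10
print(a1%in%d)
print(a2%in%d)

3. %*%   This operator performs matrix multiplication. The example multiplies a matrix by its transpose, t(M), and compares the result with *, which multiplies element by element.
M=matrix(c(1,2,3,4,5,6), nrow=2, ncol=3, byrow=TRUE)
M
# t( ) returns the transpose of M (3 x 2)
X=t(M)
X
# * multiplies element by element (2 x 3)
Y=M*M
Y
# %*% is true matrix multiplication: (2 x 3) %*% (3 x 2) gives 2 x 2
# (P is used instead of T, because T is a built-in shorthand for TRUE)
P=M%*%t(M)
print(P)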
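M %*% t(M) and M * M give results of different shapes. This sketch checks the shapes with dim( ). %*% needs the number of columns of the left matrix to equal the number of rows of the right matrix, while * needs both matrices to have the same shape.

```r
M <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

print(dim(M))        # 2 3
print(dim(t(M)))     # 3 2
print(dim(M * M))    # 2 3: the element-wise product keeps the shape
P <- M %*% t(M)
print(dim(P))        # 2 2: (2 x 3) %*% (3 x 2)
print(P)
# Each entry is a row of M times a row of M, e.g. P[1, 2] = 1*4 + 2*5 + 3*6 = 32
```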

b) Implement R script to read a person's age from the keyboard and display whether the person is eligible for voting or not

Program:1

{
age <- as.integer(readline(prompt = "Enter your age :"))
if (age >= 18){
print(paste("You are eligible for voting :", age))
} else{
print(paste("You are not eligible for voting :", age))
}
}

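The readline( ) function reads a line of text typed by the user at the console and returns it as a character string. It only waits for input in an interactive session. To read lines from a text file, use readLines( ) instead.

The paste( ): converts its arguments to character strings and concatenates them into a single string, separated by a space by default.

The as.integer( ): converts its argument to the integer data type. For example, the string "18" returned by readline( ) becomes the integer 18.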
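In a script run with Rscript, readline( ) does not wait for input and returns an empty string. For that reason, it can help to put the eligibility rule in a function that takes the age as an argument. This is a sketch, and the function name voting_status is hypothetical.

```r
voting_status <- function(age) {
  # Accept "18" as well as 18; non-numbers become NA (warning suppressed)
  age <- suppressWarnings(as.integer(age))
  if (is.na(age)) {
    return("Invalid age")
  }
  if (age >= 18) {
    paste("You are eligible for voting :", age)
  } else {
    paste("You are not eligible for voting :", age)
  }
}

print(voting_status(20))
print(voting_status("16"))
print(voting_status("abc"))

# In an interactive session, the same function can be fed from the keyboard
if (interactive()) {
  print(voting_status(readline(prompt = "Enter your age :")))
}
```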

Program:2

name = readline(prompt="Input your name: ")
age = readline(prompt="Input your age: ")
print(paste("My name is", name, "and I am", age, "years old."))
print(R.version.string)

c) Implement R script to find the biggest number between two numbers

Program: 1

a=readline(prompt="enter first number :")
b=readline(prompt="enter second number :")
# as.numeric keeps decimals (as.integer would truncate 2.7 to 2)
a<-as.numeric(a)
b<-as.numeric(b)
if(a>b)
{
print(paste( a,"is biggest"))
}else
{
print(paste( b,"is biggest"))
}

d) Implement R script to check whether the given year is a leap year or not


Program
# Function to check whether a year is a leap year or not
# A year is a leap year if it is divisible by 4
# It is not a leap year if it is divisible by 100, unless it is also divisible by 400
is_leap_year = function(year) {
if (year %% 4 == 0 && (year %% 100 != 0 || year %% 400 == 0)) {
return(TRUE)
} else {
return(FALSE)
}
}
year <- 2024
if (is_leap_year(year)) {
print(paste(year, "IS A LEAP YEAR."))
} else {
print(paste(year, "IS NOT A LEAP YEAR."))
}
Another Way
year = as.integer(readline(prompt="Enter a year: "))
if((year %% 4) == 0) {
if((year %% 100) == 0) {
if((year %% 400) == 0) {
print(paste(year,"is a leap year"))
} else {
print(paste(year,"is not a leap year"))
}
} else {
print(paste(year,"is a leap year"))
}
} else {
print(paste(year,"is not a leap year"))
}
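The first version above uses && and ||, which only accept single values. The following sketch is a vectorized variant. It replaces those operators with the element-wise & and | so that it can test several years at once. The function name leap_years is hypothetical.

```r
leap_years <- function(years) {
  # Same rule as above, applied element by element
  (years %% 4 == 0 & years %% 100 != 0) | years %% 400 == 0
}

years <- c(1900, 2000, 2023, 2024)
result <- leap_years(years)
print(result)          # FALSE TRUE FALSE TRUE
print(years[result])   # 2000 2024
```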

WEEK-3

List:

• A list in R can contain many different data types inside it.
• A list is a collection of data which is ordered and changeable.

1. List of strings

To create a list, use the list( ) function:

x <- list("apple", "banana", "cherry")

2. Access Lists

You can access the list items by referring to their index number inside brackets [ ]

x <- list("apple", "banana", "cherry")
x[1]

3. List Length

To find out how many items a list has, use the length( ) function:

x <- list("apple", "banana", "cherry")


length(x)

4. Add List Items

To add an item to the end of the list, use the append( ) function:
x <- list("apple", "banana", "cherry")
x
z=append(x ,"orange")
z

5. Join Two Lists

There are several ways to join, or concatenate, two or more lists in R.

The most common way is to use the c( ) function, which combines two elements together:

Example:
list1 <- list("a", "b", "c")
list2 <- list(1,2,3)
list3 <- c(list1,list2)
list3
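In the Access Lists example above, x[1] returns a smaller list rather than the item itself. This sketch compares single brackets with double brackets, x[[1]], which return the element stored at that position.

```r
x <- list("apple", "banana", "cherry")

first_as_list <- x[1]    # a list of length 1 that contains "apple"
first_item <- x[[1]]     # the character string "apple" itself

print(class(first_as_list))   # "list"
print(class(first_item))      # "character"
print(length(x[2:3]))         # 2: single brackets can select several items
```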

a) Implement R script to create a list


Program:
# The first attribute is a numeric vector
EmpId = c(1, 2, 3, 4)
# The second attribute is the employee names
EmpName = c("Raju", "Sandeep", "Subham", "Rani")
# The third attribute is the number of employees
NumberofEmp = 4
# We can combine all these three different data types into a list
EmpList = list(EmpId, EmpName, NumberofEmp)
print(EmpList)

b) Implement R script to access element in the list

We can access components of an R list in two ways.

• Access components by names: All the components of a list can be named, and we can use those names to access the components of the R list with the dollar ($) operator.
• Access components by indices: We can also access the components of an R list using indices. To access the top-level components of an R list, we use the double slicing operator "[[ ]]", which is two square brackets. To access the lower or inner-level components of an R list, we use another square bracket "[ ]" along with the double slicing operator "[[ ]]".
The cat( ) function writes the concatenated text directly to the console and returns NULL, so its result cannot be saved in a variable. The paste( ) function returns the concatenated string as a character value, which can be saved in a variable. When paste( ) is called at the top level, the returned string is printed in quotes, and any \n in it is shown as the characters \n rather than as a line break.

Program: 1
EmpId = c(1, 2, 3, 4)
EmpName = c("Raju", "Sandeep", "Subham", "Rani")
NumberofEmp = 4
EmpList=list("ID" = EmpId,"Names" = EmpName,"TotalStaff" = NumberofEmp)
print(EmpList)
cat("Accessing name components using $ command\n")
print(EmpList$Names)
print(EmpList$ID)
print(EmpList$TotalStaff)

Program: 2
# Uses EmpList created in Program 1
# Accessing top-level components by indices
paste("Accessing name components using indices\n")
print(EmpList[[2]])
# Accessing inner-level components by indices
paste("Accessing Sandeep from name using indices\n")
print(EmpList[[2]][2])
# Accessing another inner-level component by indices
paste("Accessing 4 from ID using indices\n")
print(EmpList[[1]][4])

c) Implement R script to merge two or more lists

• R provides two built-in functions, c( ) and append( ), to combine two or more lists.
Method 1: Using c( ) function
The c( ) function accepts two or more lists as arguments and returns another list with the elements of all the lists.
Syntax: c(list1, list2)
Method 2: Using append( ) function
The append( ) function accepts two lists as arguments and returns another list with the elements of the second list added after the elements of the first list.
Syntax: append(list1, list2)
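c( ) always adds the second list at the end. append( ) also has an after argument that controls where the new elements are inserted. A minimal sketch with illustrative values:

```r
list1 <- list("a", "b", "c")
list2 <- list(1, 2)

joined <- c(list1, list2)                    # "a" "b" "c" 1 2
inserted <- append(list1, list2, after = 1)  # "a" 1 2 "b" "c"

print(length(joined))   # 5
str(inserted)
```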

Program:
List1 <- list(1:5)
List1
List2 <- list(6:10)
List2
List3 = append(List1, List2)
print(List3)

Program:
EmpId = c(1, 2, 3, 4)
EmpName = c("Raju", "Sandeep", "Subham", "Rani")
NumberofEmp = 4
EmpList=list("ID" = EmpId,"Names" = EmpName,"TotalStaff"=NumberofEmp)
print("BEFORE MERGING TWO LISTS")
print(EmpList)
Empage=c(35,45,55,25)
EmpageList=list("AGE"=Empage)
# Merge the two lists (EmpageList, not the plain vector Empage)
Emp=c(EmpList,EmpageList)
print("AFTER MERGING TWO LISTS")
print(Emp)

d) Implement R script to perform matrix operations

Matrices

• A matrix is a two-dimensional data set with rows and columns.
• A row is a horizontal and a column is a vertical representation of data.
• To create a matrix, we use the matrix( ) function.
Specify the nrow and ncol parameters to set the number of rows and columns. By default, the values fill the matrix column by column (use byrow = TRUE to fill it row by row):
Example:
1. x <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)
print(x)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
2. x <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
print(x)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

Number of Rows and Columns


Use the dim( ) function to find the number of rows and columns in a matrix:
Example:
x<-matrix(c("apple", "banana", "cherry", "orange"),nrow= 2,ncol = 2)
dim(x)
Add Rows and Columns
1. Use the cbind( ) function to add additional columns to a matrix:
Example:
matrix1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3)
matrix2 <- cbind(matrix1, c(10,11,12))
matrix2

2. Use the rbind( ) function to add additional rows to a matrix:
Example:
matrix1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3)
matrix2 <- rbind(matrix1, c(10,11,12))
matrix2

Operations on Matrices

There are four basic operations, i.e. DMAS (Division, Multiplication, Addition, Subtraction), that can be done with matrices. These operators work element by element, so both matrices involved in the operation should have the same number of rows and columns.

Program:
B = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
C = matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)
print(B)
print(C)
print(B + C)
print(B-C)
print(B*C)
print(B/C)
print(t(B))
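The operators above are applied element by element, which is why the shapes must match. A single number, however, is applied to every element. This sketch reuses B and C. It introduces a hypothetical 2 x 2 matrix D and uses tryCatch( ) to capture the error R raises when shapes differ. The exact message text is not checked here.

```r
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
C <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)

print(B * 2)                           # every element doubled
print(identical(dim(B + C), dim(B)))   # TRUE: the result keeps the 2 x 3 shape

D <- matrix(1:4, nrow = 2)             # 2 x 2, so its shape differs from B
msg <- tryCatch(B + D, error = function(e) conditionMessage(e))
print(msg)                             # the error message as a string
```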

WEEK-4
Implement R Script to perform various operations on Vectors

Vectors:
A vector is simply a sequence of items that are all of the same type.

To combine items into a vector, use the c( ) function and separate the items with commas.

Example:

# Vector of strings
fruits <- c("banana", "apple", "orange")

# Vector of numerical values
numbers <- c(1, 2, 3)

# Vector with numerical values in a sequence
numbers <- 1:10

# Vector of logical values
logvalues <- c(TRUE, FALSE, TRUE, FALSE)

Vector Length
To find out how many items a vector has, use the length( ) function:
Example:
fruits <- c("banana", "apple", "orange")
length(fruits)
Sort a Vector
To sort items in a vector alphabetically or numerically, use the sort( ) function:
Example:
fruits <- c("banana", "apple", "orange", "mango", "lemon")
numbers <- c(13, 3, 5, 7, 20, 2)
sort(fruits)
sort(numbers)

Access Vectors
You can access the vector items by referring to their index number inside brackets [ ]
Example:
# Access the element using position number
X<-c(7,2,5,6,1,9)
X[3]
# Access the 1st and 3rd item (banana and orange)
fruits <- c("banana", "apple", "orange", "mango", "lemon")
fruits[c(1, 3)]
# Logical indexing
Z=c(7,8,1,2,6,9)
Z[Z>3]

Change an Item

To change the value of a specific item, refer to the index number:

Example
fruits <- c("banana", "apple", "orange", "mango", "lemon")
# Change "banana" to "pear"
fruits[1] <- "pear"
fruits
Repeat Vectors
To repeat vectors, use the rep( ) function:
1) Repeat each value
Example: x <- rep(c(1,2,3), each = 3)
x
2) Repeat the sequence of the vector:
Example: x<- rep(c(1,2,3), times = 3)
x
3) Repeat each value independently (here 5 times, 2 times and 1 time):
Example: x <- rep(c(1,2,3), times = c(5,2,1))
x

Generating Sequenced Vectors

1) To create a vector with numerical values in a sequence, use the : operator (seq( ) with two arguments gives the same result):
Example: numbers <- 1:10
numbers
numbers <- seq(1,100)
numbers
2) To make bigger or smaller steps in a sequence, use the seq( ) function with the by argument:
Example: numbers <- seq(from = 0, to = 100, by = 20)
numbers

Arithmetic operations

The arithmetic operations are performed element by element on vectors. We can add, subtract, multiply, or divide two vectors.

Example: a<-c(1,3,5,7)
b<-c(2,4,6,8)
x=a+b
y=a-b
z=a/b
w=a*b
x
y
z
w
Deleting an R vector
Deleting a vector removes all of its elements. This can be done by assigning the NULL value to it.
Example:

# Creating a Vector
M<- c(8, 10, 2, 5)
# set NULL to the vector
M<- NULL
cat('Output vector', M)
Find the Sum, Mean and Product of a vector in R

The sum( ), mean( ), and prod( ) functions are available in R to compute the sum, mean (average), and product of their arguments. When a single vector is passed, the operation is applied across all of its elements. This is equivalent to accumulating the result with a for loop.

Example:
vec = c(1, 2, 3 , 4)
print("Sum of the vector:")
print(sum(vec))
print("Mean of the vector:")
print(mean(vec))
print("Product of the vector:")
print(prod(vec))

Vector with NA values

na.rm: a logical value; if TRUE, NA (missing) values are removed before the computation

Example:
vec = c(1.1,NA, 2, 3.0,NA )
print("Sum of the vector:")
print(sum(vec,na.rm = TRUE))
print("Mean of the vector with NA values:")
print(mean(vec))
print("Mean of the vector without NA values:")
print(mean(vec,na.rm = TRUE))
print("Product of the vector:")
print(prod(vec,na.rm = TRUE))
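NA and NaN are related but not the same. NA marks a missing value, while NaN ("not a number") is the result of an undefined calculation such as 0/0. The following short sketch, using illustrative values, shows how to tell them apart and that na.rm = TRUE drops both.

```r
v <- c(1, NA, 0 / 0, 2)   # 0 / 0 produces NaN

print(is.na(v))    # FALSE TRUE TRUE FALSE: is.na() is TRUE for NA and NaN
print(is.nan(v))   # FALSE FALSE TRUE FALSE: is.nan() is TRUE only for NaN
print(sum(v, na.rm = TRUE))   # 3: both NA and NaN are removed
```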

Reverse the order of given vector in R


To reverse the order of the elements in a vector in R, use the rev( ) function. It returns a reversed copy of the data object.
Example:
vec = c("SRAVAN", "MOHAN", "SUDHEER","RADHA", "VANI", "RAJU")
print("Original vector-1:")
print(vec)
rv = rev(vec)
print("The said vector in reverse order:")
print(rv)
Create multiple vectors and then reverse them.
Example:

name = c("sravan", "mohan", "sudheer", "radha", "vani", "raju")
subjects = c(".net", "Python", "java", "dbms", "os", "dbms")
marks = c(98, 97, 89, 90, 87, 90)
height = c(5.97, 6.11, 5.89, 5.45, 5.78, 6.0)
weight = c(67, 65, 78, 65, 81, 76)
data = c(name, subjects, marks, height, weight)
print("Original vector-1:")
print(data)
rv = rev(data)
print("The said vector in reverse order:")
print(rv)
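A vector created with c( ) can hold only one data type. When name, subjects, marks, height and weight are combined into data in the example above, the numbers are therefore converted to character strings. This sketch uses shortened illustrative vectors to check that, and shows a list as one way to keep the original types.

```r
name <- c("sravan", "mohan")
marks <- c(98, 97)

data <- c(name, marks)
print(class(data))   # "character": the numbers became "98" and "97"
print(data[3])       # "98"

records <- list(name = name, marks = marks)   # a list keeps each type
print(class(records$marks))   # "numeric"
print(rev(records$marks))     # 97 98
```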

How to Find Min and Max Values Using the Range Function
The range( ) function returns a vector that contains the minimum and maximum values of the given argument. (In statistics, the range is the difference between these two values, which can be computed as diff(range(x)).)
Example: x=c(-10,-15,5,19,27,0)
range(x)
Example: x=c(-10,-15,5,NA,19,27,NA,0)
range(x,na.rm=TRUE)

Arrays
Unlike matrices, arrays can have more than two dimensions.
We can use the array( ) function to create an array, and the dim parameter to specify the dimensions.
How does dim=c(4,3,2) work?
The first and second numbers in the bracket specify the number of rows and columns.
The last number in the bracket specifies how many matrices (layers) we want.
Example: x <- c(1:24)
y<- array(x, dim=c(4,3,2))
y
Access Array Items
You can access the array elements by referring to the index position.
You can use the [ ] brackets to access the desired elements from an array
The syntax is as follows: array[row position, column position, matrix level]
Example: x <- c(1:24)
y <- array(x, dim = c(4, 3, 2))
y[2, 3, 2]

Naming Columns and Rows


We can give names to the rows, columns and matrices in the array by using the dimnames
parameter.
Example:

vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")
result <- array(c(vector1,vector2),dim = c(3,3,2),
  dimnames = list(row.names,column.names,matrix.names))

print(result)

b) Find the sum and average of given numbers using arrays


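To find the sum of all array elements in R, we can use the Reduce( ) function with the plus sign:
the command Reduce("+", ARRAY) adds the elements one after another.
Example:
Array1 <- array(1:25, c(5,5,1))   # a 5 x 5 x 1 array; array(1:25) would also work
Array1
x = Reduce("+", Array1)
x
avg = x / length(Array1)   # length(Array1) is 25, the number of elements
avg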
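Reduce( ) works, but base R's sum( ) and mean( ) already treat an array as one long vector, whatever its dimensions. apply( ) can produce totals along a single dimension. Here is a minimal sketch with the same 5 x 5 x 1 array. The variable names are illustrative.

```r
Array1 <- array(1:25, dim = c(5, 5, 1))

total   <- sum(Array1)    # adds every element, whatever the dimensions
average <- mean(Array1)   # total divided by length(Array1)

# Row totals: MARGIN = 1 sums everything that shares the same row index
row_totals <- apply(Array1, 1, sum)

print(total)
print(average)
print(row_totals)
```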
c) Display the elements of a list in reverse order
Program 1:
vec = c("sravan", "mohan", "sudheer","radha", "vani", "mohan")
print("Original vector-1:")
print(vec)
rv = rev(vec)
print("The said vector in reverse order:")
print(rv)
Program 2:
name = c("sravan", "mohan", "sudheer", "radha", "vani", "mohan")
subjects = c(".net", "Python", "java", "dbms", "os", "dbms")
marks = c(98, 97, 89, 90, 87, 90)
height = c(5.97, 6.11, 5.89, 5.45, 5.78, 6.0)
weight = c(67, 65, 78, 65, 81, 76)
data = c(name, subjects, marks, height, weight)
print("Original vector-1:")
print(data)
rv = rev(data)
print("The said vector in reverse order:")
print(rv)
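Both programs above reverse vectors, although the exercise title mentions a list. rev( ) also reverses a list and keeps each element's name. Here is a small sketch with a hypothetical list named student.

```r
# A hypothetical list mixing element types
student <- list(name = "sravan", subject = "Python", marks = 98)

rev_student <- rev(student)   # reverses the order of the list elements
print(names(rev_student))
print(rev_student)
```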

d) Find the maximum and minimum elements in array


In R, we can find the minimum or maximum value of a vector or data frame.
We use the min( ) and max( ) function to find minimum and maximum value respectively.
The min( ) function returns the minimum value of a vector or data frame.
The max( ) function returns the maximum value of a vector or data frame.

In R Programming Language range( ) function is used to get the minimum and maximum
values of the vector passed to it as an argument.
# Creating a vector
x <- c(8, 2, Inf, 5, 4, NA, 9, 54, 18)
# Calling range( ) function
range(x)
# Calling range() function
# excluding NA values
range(x, na.rm = TRUE)
# Calling range( ) function
# excluding non-finite values (Inf, -Inf, NaN and NA)
range(x, na.rm = TRUE, finite = TRUE)
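The examples above use a vector. For an array, min( ) and max( ) work the same way because they look at every element. To also find where the maximum sits, which( ) with arr.ind = TRUE reports its [row, column, matrix] position. This sketch uses a hypothetical 2 x 3 x 2 array named arr.

```r
arr <- array(c(8, 2, 5, 4, 9, 54, 18, 7, 1, 30, 11, 6), dim = c(2, 3, 2))

print(max(arr))   # largest value anywhere in the array
print(min(arr))   # smallest value anywhere in the array

# which(..., arr.ind = TRUE) reports the [row, column, matrix] position
where_max <- which(arr == max(arr), arr.ind = TRUE)
print(where_max)
```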

WEEK-5
a) Implement R script to perform various operations on matrices

Create a matrix
R provides the matrix( ) function to create a matrix.
This function plays an important role in data analysis.
The syntax is:
matrix(data, nrow, ncol, byrow, dimnames)

#Arranging elements sequentially by row.


P <- matrix(c(5:16), nrow = 4, byrow = TRUE)
print(P)

# Arranging elements sequentially by column.


Q <- matrix(c(5:16), nrow = 4, byrow = FALSE)
print(Q)

# Defining the column and row names.


row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")
R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)

Access a matrix
1. We can access the element present in the nth row and mth column.
2. We can access all the elements of the matrix present in the nth row.
3. We can also access all the elements of the matrix present in the mth column.

#Accessing element present on 3rd row and 2nd column


print(R[3,2])
#Accessing element present in 3rd row
print(R[3,])
#Accessing element present in 2nd column
print(R[,2])

Modification of the matrix


Assign a single element
The basic syntax for it is as follows:
matrix[n, m]<-y
#Assigning value 20 to the element at 3rd row and 2nd column
R[3,2]<-20
print(R)
Use of Relational Operator
#Replacing elements whose values are greater than 12 with 0
R[R>12]<-0
Addition of Rows and Columns
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")
R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)
Adding row
rbind(R,c(17,18,19))
Adding column
cbind(R,c(17,18,19,20))
Transpose of the matrix using the t( ) function:
t(R)
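Modifying the dimension of the matrix using the dim( ) function
The new dimensions must multiply to the number of elements (here 12); assigning them also removes the row and column names.
dim(R)<-c(1,12)   # one row
print(R)
dim(R)<-c(2,6)    # two rows
print(R)
dim(R)<-c(3,4)    # three rows
print(R)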
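Besides creating and reshaping matrices, the exercise title asks for operations on them. Here is a minimal sketch with two hypothetical 2 x 2 matrices, A and B. It shows the most common source of confusion: * multiplies element by element, while %*% is the matrix product from linear algebra.

```r
A <- matrix(1:4, nrow = 2)   # columns (1, 2) and (3, 4)
B <- matrix(5:8, nrow = 2)   # columns (5, 6) and (7, 8)

print(A + B)     # element-wise addition
print(A * B)     # element-wise multiplication, NOT the matrix product
print(A %*% B)   # matrix product: rows of A times columns of B
```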

Dataframe
A data frame is a two-dimensional, table-like structure in which each column contains the values
of one variable and each row contains one set of values, one from each column. A data frame is a
special case of a list in which every component (column) has the same length.
Data frames are used to store data tables: each column is a vector, and the columns are held
together as a list.

In R, data frames are created with the data.frame( ) function. Its columns can be vectors of
any type, such as numeric, character, or integer.
Creating the data frame.
emp_data=data.frame( employee_id = c (1:5),
employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
sal = c(623.3,915.2,611.0,729.0,843.25),
starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")))
print(emp_data)

Structure of our data frame

R provides a built-in function called str( ), which compactly displays the structure of an object: its type, its dimensions, and the first few values of each column.
str(emp_data)

Extracting data from Data Frame

We can extract the data in three ways which are as follows:

1. We can extract the specific columns from a data frame using the column name.
2. We can extract the specific rows also from a data frame.
3. We can extract the specific rows corresponding to specific columns.

Extracting specific columns from a data frame


final <- data.frame(emp_data$employee_id,emp_data$sal)
print(final)

Extracting the specific rows from a data frame

Extracting first row from a data frame


final <- emp_data[1, ]
print(final)
Extracting specific rows corresponding to specific columns

Extracting the 2nd and 3rd rows corresponding to the 1st and 4th columns
final <- emp_data[c(2,3),c(1,4)]
print(final)

A data frame can be expanded by adding rows and columns, and reduced by deleting them.

1. Add a column by binding a new column vector, with a new column name, using the
cbind( ) function.
2. Add rows with the same structure as the existing data frame using the
rbind( ) function.
3. Delete columns by assigning NULL to them.
4. Delete rows by re-assigning the data frame without them (for example, using negative indices).
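The four operations listed above can be sketched as follows. This uses a smaller, hypothetical three-employee version of emp_data and an illustrative dept column. It is one way to do it, not the only one.

```r
emp_data <- data.frame(
  employee_id   = 1:3,
  employee_name = c("Shubham", "Arpita", "Nishka"),
  sal           = c(623.3, 915.2, 611.0)
)

# 1. Add a column with cbind() (a hypothetical "dept" column)
emp_data <- cbind(emp_data, dept = c("IT", "HR", "Ops"))

# 2. Add a row with rbind(); the new row must have the same columns
new_row  <- data.frame(employee_id = 4, employee_name = "Gunjan",
                       sal = 729.0, dept = "IT")
emp_data <- rbind(emp_data, new_row)

# 3. Delete a column by assigning NULL to it
emp_data$dept <- NULL

# 4. Delete a row by re-assigning the data frame without it (negative index)
emp_data <- emp_data[-2, ]

print(emp_data)
```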

Week 6
a) Write an R script to find basic descriptive statistics using the summary( ), str( ), and quantile( )
(quartile) functions on the mtcars and cars datasets.

R programming language provides us with lots of simple, effective functions to perform
descriptive statistics and gain more knowledge about data. Summarizing the data, calculating
average measures, finding cumulative measures, summarizing rows/columns of data
structures, etc. are all possible with simple commands.

Let’s start simple with the summarizing functions str ( ) and summary ( ).

The str( ) function takes a single object as an argument and compactly shows us the structure of
the input object. It shows us details like length, data type, names and other specifics about the
components of the object.

The mtcars dataset is a built-in dataset in R that contains measurements on 11 different
attributes for 32 different cars.

Since the mtcars dataset is a built-in dataset in R, we can load it by using the following
command:

data(mtcars)

model               mpg   cyl  disp   hp   drat  wt     qsec   vs  am  gear  carb
Mazda RX4           21    6    160    110  3.9   2.62   16.46  0   1   4     4
Mazda RX4 Wag       21    6    160    110  3.9   2.875  17.02  0   1   4     4
Datsun 710          22.8  4    108    93   3.85  2.32   18.61  1   1   4     1
Hornet 4 Drive      21.4  6    258    110  3.08  3.215  19.44  1   0   3     1
Hornet Sportabout   18.7  8    360    175  3.15  3.44   17.02  0   0   3     2
Valiant             18.1  6    225    105  2.76  3.46   20.22  1   0   3     1
Duster 360          14.3  8    360    245  3.21  3.57   15.84  0   0   3     4
Merc 240D           24.4  4    146.7  62   3.69  3.19   20     1   0   4     2
Merc 230            22.8  4    140.8  95   3.92  3.15   22.9   1   0   4     2
Merc 280            19.2  6    167.6  123  3.92  3.44   18.3   1   0   4     4
Merc 280C           17.8  6    167.6  123  3.92  3.44   18.9   1   0   4     4
Merc 450SE          16.4  8    275.8  180  3.07  4.07   17.4   0   0   3     3
Merc 450SL          17.3  8    275.8  180  3.07  3.73   17.6   0   0   3     3
Merc 450SLC         15.2  8    275.8  180  3.07  3.78   18     0   0   3     3
Cadillac Fleetwood  10.4  8    472    205  2.93  5.25   17.98  0   0   3     4
Lincoln Continental 10.4  8    460    215  3     5.424  17.82  0   0   3     4
Chrysler Imperial   14.7  8    440    230  3.23  5.345  17.42  0   0   3     4
Fiat 128            32.4  4    78.7   66   4.08  2.2    19.47  1   1   4     1
Honda Civic         30.4  4    75.7   52   4.93  1.615  18.52  1   1   4     2
Toyota Corolla      33.9  4    71.1   65   4.22  1.835  19.9   1   1   4     1
Toyota Corona       21.5  4    120.1  97   3.7   2.465  20.01  1   0   3     1
Dodge Challenger    15.5  8    318    150  2.76  3.52   16.87  0   0   3     2
AMC Javelin         15.2  8    304    150  3.15  3.435  17.3   0   0   3     2
Camaro Z28          13.3  8    350    245  3.73  3.84   15.41  0   0   3     4
Pontiac Firebird    19.2  8    400    175  3.08  3.845  17.05  0   0   3     2
Fiat X1-9           27.3  4    79     66   4.08  1.935  18.9   1   1   4     1
Porsche 914-2       26    4    120.3  91   4.43  2.14   16.7   0   1   5     2
Lotus Europa        30.4  4    95.1   113  3.77  1.513  16.9   1   1   5     2
Ford Pantera L      15.8  8    351    264  4.22  3.17   14.5   0   1   5     4
Ferrari Dino        19.7  6    145    175  3.62  2.77   15.5   0   1   5     6
Maserati Bora       15    8    301    335  3.54  3.57   14.6   0   1   5     8
Volvo 142E          21.4  4    121    109  4.11  2.78   18.6   1   1   4     2

mpg - miles per (US) gallon
cyl - number of cylinders
disp - displacement, in cubic inches
hp - gross horsepower
drat - rear axle ratio
wt - weight (1000 lbs)
qsec - 1/4 mile time; a measure of acceleration
vs - engine shape (0 = V-shaped, 1 = straight)
am - transmission (0 = automatic, 1 = manual)
gear - number of forward gears
carb - number of carburetors

We can take a look at the first six rows of the dataset by using the head( ) function:

head(mtcars)

We can use the summary( ) function to quickly summarize each variable in the dataset:

summary(mtcars)

For each of the 11 variables we can see the following information:

• Min: The minimum value.
• 1st Qu: The value of the first quartile (25th percentile).
• Median: The median value.
• Mean: The mean value.
• 3rd Qu: The value of the third quartile (75th percentile).
• Max: The maximum value.
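The exercise title also mentions quartiles. quantile( ) returns the same quartiles that summary( ) reports (before rounding), and IQR( ) gives the spread between the 1st and 3rd quartiles. Here is a minimal sketch on the mpg column. q and q1 are illustrative names.

```r
data(mtcars)

# Quartiles (0%, 25%, 50%, 75%, 100%) of miles per gallon
q <- quantile(mtcars$mpg)
print(q)

# A single quantile can be requested with probs
q1 <- quantile(mtcars$mpg, probs = 0.25)
print(q1)

# The interquartile range is the 3rd quartile minus the 1st
print(IQR(mtcars$mpg))
```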

We can use the dim( ) function to get the dimensions of the dataset in terms of number of rows
and number of columns:

dim(mtcars)

We can also use the names( ) function to display the column names of the data frame:

names(mtcars)

Visualize the mtcars Dataset

We can also create some plots to visualize the values in the dataset.

We can use the hist( ) function to create a histogram of the values for a certain variable:

Create a histogram for mpg

hist(mtcars$mpg, col='steelblue', main='Histogram', xlab='mpg',ylab='Frequency')

We can also use the plot( ) function to create a scatter plot of any pairwise combination of variables:

plot(mtcars$mpg, mtcars$wt, col='steelblue', main='Scatterplot', xlab='mpg',ylab='wt', pch=19)

Getting the Average Measures


R provides a number of functions that give us different summary measures (averages, spread,
extremes and totals) for given data. These measures include:
Mean: The mean of a given set of numeric or logical values (it may be a vector or a row or
column of any other data structure) can be easily found using the mean( ) function.
Median: Finding the median of a set of numeric or logical values is also very easy by using
the median( ) function.
Standard deviation: The standard deviation of a set of numerical values can be found using
the sd( ) function.
Variance: The var( ) function gives us the variance of a set of numeric or logical values.
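Median Absolute Deviation: The median absolute deviation of a set of numeric values can be found
by using the mad( ) function (by default it is multiplied by 1.4826, which makes it comparable to the standard deviation for normally distributed data).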
Maximum: In a given set of numeric or logical values, we can use the max( ) function to find
the maximum or the largest value in the set.
Minimum: The min( ) function is a very handy way to find out the smallest value in a set of
numeric values.
Sum: The sum of a set of numerical values can be found by simply using the sum( ) function.
Length: The length or the number of values in a set is given by the length( ) function.

mean(mtcars$mpg)
median(mtcars$mpg)

sd(mtcars$mpg)

var(mtcars$mpg)

mad(mtcars$mpg)

max(mtcars$mpg, na.rm = TRUE)

min(mtcars$mpg, na.rm = TRUE)

sum(mtcars$mpg)

length(mtcars$mpg)

Cumulative measures in R
Cumulative measures are statistical measures that are calculated sequentially.
These measures evolve with the data.
They provide insight into the progression and growth of the data.
R provides a few functions that calculate cumulative measures with ease. These functions are
Cumulative sum: The cumsum( ) function calculates the cumulative sum of a given vector.
Cumulative max: To find the cumulative maximum value of an input vector, you can use
the cummax( ) function.
Cumulative min: You can find the cumulative minimum values in a vector by using the cummin( )
function.
Cumulative product: Using the cumprod( ) function, you can find the cumulative product of a
vector.
a <- c(1:9,4,2,4,5:2)

cumsum(a)

cummax(a)

cummin(a)

cumprod(a)
Row and Column Summary Functions in R
There are certain functions in R that give summary statistics for the rows or columns of
data frames, matrices, or arrays with two or more dimensions.
These functions are:
rowMeans: The rowMeans( ) function returns the mean of each row of a data structure.
rowSums: The rowSums( ) function returns the sum of each row of a data structure.
colMeans: The colMeans( ) function returns the mean of each column of a data structure.
colSums: The colSums( ) function returns the sum of each column of a data structure.
To summarize only selected rows or columns, pass a subset, as in rowMeans(mtcars[2,]).

rowMeans(mtcars[2,])

rowSums(mtcars[2,])

colMeans(mtcars)

colSums(mtcars)
Sorting and Ordering the Data
The sort( ) and the order( ) functions are included in the base package of R and are used to sort
or order the data in the desired order.
1. The sort function
The sort( ) function sorts the elements of a vector or a factor in increasing or decreasing order.
The syntax of the sort function is:
sort(x, decreasing = FALSE, na.last = NA)
• x is the input vector or factor to be sorted.
• decreasing is a logical value that controls whether the input vector or factor is sorted in
decreasing order (when set to TRUE) or in increasing order (when set to FALSE).
• na.last is an argument that controls the treatment of the NA values present inside the input
vector/factor. If na.last is set to TRUE, the NA values are put last. If it is set
to FALSE, the NA values are put first. Finally, if it is set to NA, the NA values
are removed.
sort(c(3,16,34,77,29,95,24,47,92,64,43), decreasing = FALSE)

2. The order function


The order( ) function returns the indices of the elements of the input objects in ascending or
descending order.
Here is the syntax of the order function.
order(..., na.last = TRUE, decreasing = FALSE, method = c("auto", "shell", "radix"))
Where:
... is a sequence of numeric, character, logical or complex vectors, or a classed R object.
This is the first argument of the function: the object(s) to be ordered. When several vectors are
given, later ones break ties in earlier ones.
na.last is the argument that controls the treatment of NA values.
decreasing controls whether the order will be decreasing or increasing.
method is a character string that specifies the sorting algorithm: "auto", "radix", or "shell".
a <- c(20,40,70,10,50,30,90,60)
order(a)
a[order(a)]
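In practice, order( ) is most often used to sort the rows of a data frame: you index the rows with the positions it returns. This sketch sorts mtcars by mpg, and separately by cyl and then hp, both descending. Negating a numeric column reverses its order.

```r
data(mtcars)

# order() returns row positions; indexing with them sorts the whole data frame
by_mpg <- mtcars[order(mtcars$mpg), ]
print(head(by_mpg[, c("mpg", "cyl", "hp")]))

# Sort by cyl (descending), breaking ties by hp (also descending)
by_cyl_hp <- mtcars[order(-mtcars$cyl, -mtcars$hp), ]
print(head(by_cyl_hp[, c("mpg", "cyl", "hp")]))
```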
Subsetting a Dataset in R

There are multiple ways to make subsets of a dataset in R. Depending on the shape and size of
the subset you need, you can use different operators to index parts of a dataset and assign
those parts to a variable. These operators are:
1. The $ operator
The $ sign accesses a single variable (column) of a dataset by name. The result is a vector
with one element per row.
2. The [[ operator
The [[ operator selects a single element, like the $ notation. Unlike the $ operator, the [[ operator
can also select the target by position instead of by name.
3. The [ operator
The [ operator takes a numeric, character, or logical vector to identify its targets. It can
return multiple elements; applied to a data frame with a single index, it returns a data frame.
mtcars$hp
mtcars[[4]]
mtcars[4]
The sample function

The sample( ) function returns a random sample of the given data. Its arguments specify what to
sample from and how big the sample should be. Applied to a data frame, sample( ) picks whole
columns, so the command below returns three randomly chosen columns of mtcars.
sample(mtcars, 3)
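Two further arguments of sample( ) are worth knowing. This sketch shows that set.seed( ) makes a random draw reproducible. It also shows that replace = TRUE allows a value to be drawn more than once, here simulating ten rolls of a die. draw1, draw2 and rolls are illustrative names.

```r
# Same seed, same "random" draw: useful when results must be reproducible
set.seed(1)
draw1 <- sample(1:10, 4)
set.seed(1)
draw2 <- sample(1:10, 4)
print(draw1)
print(draw2)

# Without replacement (the default) each value appears at most once;
# replace = TRUE allows repeats, e.g. simulating 10 rolls of a die
rolls <- sample(1:6, 10, replace = TRUE)
print(rolls)
```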
Merging Datasets
There are multiple ways to merge/combine datasets in R. We will take a look at
the cbind( ), rbind( ), and merge( ) functions of R that allow us to do so.
1. The cbind function
The cbind( ) function combines two datasets (matrices or data frames) along their columns.
m1 <- matrix(1:9, nrow = 3)
m2 <- matrix(10:18, nrow = 3)
cbind(m1,m2)
2. The rbind function
The rbind() function combines two data frames along their rows. If the two data frames have
identical variables, then rbind is the easiest way to combine them into one data frame with a
larger number of rows.
rbind(m1,m2)
3. The merge function
The merge( ) function performs what is called a join operation in databases. This function
combines two data frames based on common columns.
names <- c('v1','v2','v3')
colnames(m1) <- names
colnames(m2) <- names
merge(m1,m2, by = names, all = TRUE)
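In the matrix example above, no row of m1 matches a row of m2, so merge( ) with all = TRUE simply stacks them. A join is easier to see with a shared key column. Here is a minimal sketch using two hypothetical data frames, students and marks, joined on id.

```r
# Two small hypothetical tables sharing the key column "id"
students <- data.frame(id = c(1, 2, 3), name = c("Arpita", "Nishka", "Sumit"))
marks    <- data.frame(id = c(2, 3, 4), marks = c(91, 78, 85))

# Inner join (the default): only ids present in both tables
inner <- merge(students, marks, by = "id")

# Full outer join: all ids, with NA where one table has no match
outer <- merge(students, marks, by = "id", all = TRUE)

print(inner)
print(outer)
```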

The apply family of functions


The apply collection of functions acts as a substitute for loops in R.
1. The apply function
The apply( ) function applies a function over the margins of the array or a matrix and returns the
results in the form of a vector, list or an array.
apply(m1, 1, sum)
2. The lapply function
The lapply( ) function applies a given function over the elements of an input list or vector. The function
returns the results in the form of a list of the same length as the input.
list1 <- list(c(1:5),c(3,46,7,3,6,4,6),c(1:15))
lapply(list1, mean)
lapply(list1,sum)
lapply(list1, max)
lapply(list1, min)
3. The sapply function
The sapply( ) function does the same job as the lapply( ) function. The difference is that
sapply( ) returns the output in the most simplified data structure possible (here a numeric vector)
unless the simplify argument is set to FALSE, in which case it returns a list.
sapply(list1, mean)
sapply(list1, mean, simplify = FALSE)
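To see why the apply family is described as a substitute for loops, this sketch computes the mean of each element of list1 twice: first with a for loop, then with sapply( ). Both give the same numbers.

```r
list1 <- list(c(1:5), c(3, 46, 7, 3, 6, 4, 6), c(1:15))

# Loop version: preallocate a result vector and fill it element by element
loop_means <- numeric(length(list1))
for (i in seq_along(list1)) {
  loop_means[i] <- mean(list1[[i]])
}

# sapply() does the same in one call and simplifies to a numeric vector
sapply_means <- sapply(list1, mean)

print(loop_means)
print(sapply_means)
```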

Manipulations with Iris Data set


To load the datasets package:
library("datasets")
To load the iris dataset (it is already a data frame):
data(iris)
print(iris)
sample( )
It is used to draw a random sample of a specific size from a vector or a dataset.
To return 5 random rows:
index<-sample(1:nrow(iris), 5)
index
iris[index,]
table( )
It is used to create a frequency table that counts the occurrences of each unique value of a variable.
table(iris$Species)
Subsetting Rows and Columns by Index
To subset your data, square brackets are used after your dataset object. The rows of your
dataset are specified as the first element inside the square brackets, and the columns of your
dataset are specified as the second, separated by a comma:
data[rows, columns]
iris[1,2]     # row 1, column 2
iris[1, ]     # all columns of row 1
iris[ ,2]     # all rows of column 2
iris[1,3:5]   # row 1, columns 3 to 5
Subsetting Rows and Columns by Name
In R, the rows and columns of your dataset have name attributes. Row names are rarely used; by
default they are simply the integers from 1 to the number of rows of your dataset:
rownames(iris)
nrow(iris)
Column names on the other hand, are ubiquitous to almost any dataset. You can access these
with the colnames( ) function or the names( ) function:
colnames(iris)
names(iris)
Subsetting Rows and Columns by Value
Subsetting your rows and columns by value often allows the most flexibility.
To extract the rows for Iris setosa, use a conditional statement like this:

iris[iris$Species == "setosa", ]
WEEK-7

a) Reading Data from TXT/ CSV Files: R Base Functions


R base functions for importing data
The R base functions are:
1. read.table( ) is a general function that can be used to read a file in table format. The data
will be imported as a data frame.
Syntax: read.table(file, header = FALSE, sep = "", dec = ".")
file: the path to the file containing the data to be imported into R.
sep: the field separator character. The default "" means any white space; "\t" is used for tab-delimited files.
header: logical value. If TRUE, read.table( ) assumes that your file has a header
row, so row 1 is the name of each column. The default, FALSE, treats row 1 as data.
dec: the character used in the file for decimal points.
2. read.csv( ): for reading “comma separated value” files (“.csv”).
Syntax: read.csv(file, header = TRUE, sep = ",", dec = ".", ...)
3. read.csv2( ): variant used in countries that use a comma “,” as decimal point and a
semicolon “;” as field separators.
Syntax: read.csv2(file, header = TRUE, sep = ";", dec = ",", ...)
4. read.delim( ): for reading “tab-separated value” files (“.txt”). By default, point (“.”) is
used as decimal points.
Syntax: read.delim(file, header = TRUE, sep = "\t", dec = ".", ...)
Parameters:
file: the path to the file containing the data to be read into R.
header: a logical value. If TRUE, read.delim( ) assumes that your file has a
header row, so row 1 is the name of each column. If that’s not the case, you can
add the argument header = FALSE.
sep: the field separator character. “\t” is used for a tab-delimited file.
dec: the character used in the file for decimal points.
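To try these functions without an existing file, this sketch writes a small hypothetical data frame to a temporary CSV file with write.csv( ). It then reads the file back twice: once with read.csv( ) and once with read.table( ) given the equivalent settings.

```r
# Write a small data frame to a temporary CSV file, then read it back
cities_df <- data.frame(id = 1:3,
                        city = c("Mumbai", "Pune", "Chennai"),
                        temp = c(31.5, 28.0, 33.2))
path <- tempfile(fileext = ".csv")
write.csv(cities_df, path, row.names = FALSE)   # no extra index column

csv_data <- read.csv(path)   # header = TRUE and sep = "," are the defaults
str(csv_data)

# The general read.table() needs the same settings spelled out
tbl_data <- read.table(path, header = TRUE, sep = ",")
print(all.equal(csv_data, tbl_data))
```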

Reading a local file

To import a local .txt or a .csv file, the syntax would be:
mydata<-read.csv("mtcars.csv")
It’s also possible to choose a file interactively using the function file.choose( ):
mydata<-read.csv(file.choose())

Reading a file from internet

It’s possible to use the functions read.delim( ), read.csv( ) and read.table( ) to
import files from the web.
mydata <- read.delim("http://www.sthda.com/upload/boxplot_format.txt")
head(mydata)

b) Reading Data From Excel Files (xls|xlsx) into R

Importing Excel files into R using readxl package


The readxl package, developed by Hadley Wickham, can be used to easily import Excel
files (xls|xlsx) into R without any external dependencies.
Installing and loading readxl package
• Install
install.packages("readxl")
• Load
library("readxl")
Using readxl package
The readxl package comes with the function read_excel( ) to read xls and xlsx files.
1. Read both xls and xlsx files
my_data <- read_excel("C:/Users/CSE/downloads/CSE-A-PM.xlsx")
It’s also possible to choose a file interactively using the function file.choose( ), which I
recommend if you’re a beginner in R programming:
mydata <- read_excel(file.choose())
2. Specify the sheet with a number or name
my_data <- read_excel("C:/Users/CSE/downloads/CSE-A-PM.xlsx", sheet = 1)
The xlsx package offers a similar function, read.xlsx( ). It requires Java and must be installed and loaded separately:
install.packages("xlsx")
library("xlsx")
read.xlsx(file, sheetIndex, header=TRUE)
c) Another way to import an Excel file after loading the readxl library is …
d) In RStudio, go to File > Import Dataset > From Excel and browse…

Exporting Excel files from R


Writing Excel files using xlsx package
Installing and loading xlsx package
• Install
install.packages("xlsx")
• Load
library("xlsx")
write.xlsx(x, file, sheetName = "Sheet1", col.names = TRUE, row.names =
TRUE, append = FALSE)
• x: a data.frame to be written into the workbook
• file: the path to the output file
• sheetName: a character string to use for the sheet name.
• col.names, row.names: a logical value specifying whether the column names/row names of
x are to be written to the file
• append: a logical value indicating if x should be appended to an existing file.
The openxlsx package also provides a write.xlsx( ) function that does not need Java, but its arguments are named differently (for example colNames and rowNames instead of col.names and row.names).

c) Reading XML dataset in R


XML is a format for representing information.
Just like HTML, it also supports markup tags, but unlike HTML, where the markup tag defines
the structure of the page, the XML markup tags define the meaning of the data that is contained
in the files.
In R, an XML file can be read using the XML package.
The following command is used to install this package:
install.packages("XML")
To process XML data with R, we can include the XML library in our program by using:
library("XML")
Reading XML File
The XML file can be read after installing the package and then parsing it
with the xmlParse( ) function. xmlParse( ) takes the XML file name as input and returns the parsed
document; printing it shows the content of the file.
Loading the library and other important packages
library("XML")
library("methods")
data <- xmlParse(file = "sample.xml")
print(data)
Convert the input xml file to a data frame
dataframe <- xmlToDataFrame("sample.xml")
print(dataframe)
WEEK -8

a) Implement R Script to create a Pie chart, Bar Chart, scatter plot and Histogram

Pie chart

A pie chart is a circular statistical graphic that is divided into slices to illustrate numerical
proportions. Each slice (sector) shows the relative size of one part of the data. Because the circle
is cut along its radii into segments that represent relative frequencies or magnitudes, a pie chart
is also known as a circle graph.
R Programming Language uses the function pie( ) to create pie charts.
It takes positive numbers as a vector input.
Syntax: pie(x, labels, radius, main, col, clockwise)
• x: This parameter is a vector that contains the numeric values which are used in the pie chart.
• labels: This parameter gives the descriptions of the slices in the pie chart.
• radius: This parameter is used to indicate the radius of the circle of the pie chart (value
between -1 and +1).
• main: This parameter represents the title of the pie chart.
• clockwise: This parameter contains the logical value which indicates whether the slices are
drawn clockwise or in anti clockwise direction.
• col: This parameter gives colors to the slices in the graph.

Create data for the graph.


values<- c(23, 56, 20, 63)
cities <- c("Mumbai", "Pune", "Chennai", "Bangalore")
Plot the chart.
pie(values, cities)
Plot the chart with labels and a title
Use the labels parameter to add labels to the slices, and use the main parameter to add a title:
x <- c(10,20,30,40)
mylabel <- c("Apples", "Bananas", "Cherries", "Dates")
pie(x, labels = mylabel, main = "Fruits")

Plot the chart with title and rainbow color


pie(values, cities, main = "City pie chart",col =rainbow(length(values)))

Examples
slices <- c(10, 12,4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
pie(slices, labels = lbls, main="Pie Chart of Countries", col =rainbow(length(lbls)))

You can change the start angle of the pie chart with the init.angle parameter.

The value of init.angle is an angle in degrees; the default is 0 (or 90 when clockwise = TRUE).

x <- c(10,20,30,40)
pie(x, init.angle = 90)
To add a legend that explains each slice, use the legend( ) function:
mylabel <- c("Apples", "Bananas", "Cherries", "Dates")
colors <- c("blue", "yellow", "green", "black")
pie(x, labels = mylabel, main = "Pie Chart", col = colors)
legend("bottomright", mylabel, fill = colors)

Bar Plots

A bar chart represents data in rectangular bars with length of the bar proportional to the value of
the variable. R uses the function barplot( ) to create bar charts. R can draw both vertical and
horizontal bars in the bar chart. In bar chart each of the bars can be given different colors.

The basic syntax to create a bar-chart in R

barplot(H,xlab,ylab,main, names.arg,col)

• H is a vector or matrix containing numeric values used in bar chart.
• xlab is the label for x axis.
• ylab is the label for y axis.
• main is the title of the bar chart.
• names.arg is a vector of names appearing under each bar.
• col is used to give colors to the bars in the graph.
Example:

H <- c(7,12,28,3,41)
M <- c("Mar","Apr","May","Jun","Jul")
barplot(H,names.arg=M,xlab="Month",ylab="Revenue",col="blue",
main="Revenue chart")

barplot(H,names.arg=M,xlab="Month",ylab="Revenue",col="blue",
main="Revenue chart",border="red")
To create a horizontal bar chart, set horiz = TRUE:
barplot(H, names.arg = M, horiz = TRUE)

Histograms
A histogram displays the distribution of a numeric variable. The data are grouped into successive
numerical intervals (bins), and the area of each bar is proportional to the number of values in its
interval. It looks similar to a vertical bar graph, but there are no gaps between the bars.
In R, histograms are created with the hist( ) function.
Syntax: hist(v, main, xlab, ylab, xlim, ylim, breaks, col, border)

• v: This parameter contains the numerical values used in the histogram.
• main: This parameter is the title of the chart.
• col: This parameter is used to set the color of the bars.
• xlab: This parameter is the label for the horizontal axis.
• ylab: This parameter is the label for the vertical axis.
• border: This parameter is used to set the border color of each bar.
• xlim: This parameter sets the range of values shown on the x-axis.
• ylim: This parameter sets the range of values shown on the y-axis.
• breaks: This parameter gives the bin boundaries (or a suggested number of bins), which sets the width of each bar.

Example:
v <- c(19, 23, 11, 5, 16, 21, 32,14, 19, 27, 39)
hist(v,xlab ="No.of Articles ",col = "green", border = "black")
hist(v,xlab="No. of Articles",ylab="Frequency",col = "red",border="white")
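To see what breaks does, hist( ) can return the bin counts without drawing anything (plot = FALSE). This sketch assumes bins of width 10 from 0 to 40 and uses the v vector from the example above.

```r
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# breaks as explicit boundaries: bins [0,10], (10,20], (20,30], (30,40]
h <- hist(v, breaks = seq(0, 40, by = 10), plot = FALSE)
print(h$breaks)
print(h$counts)   # number of values falling in each bin

# Draw the histogram with the same breaks
hist(v, breaks = seq(0, 40, by = 10), xlab = "No. of Articles",
     col = "green", border = "black")
```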
Scatter Plot

A "scatter plot" is a type of plot used to display the relationship between two numerical variables,
and plots one dot for each observation.

It needs two vectors of the same length, one for the x-axis (horizontal) and one for the y-axis (vertical):

Syntax: plot(x, y, main, xlab, ylab, xlim, ylim, axes)

Parameters:
• x: This parameter sets the horizontal coordinates.
• y: This parameter sets the vertical coordinates.
• xlab: This parameter is the label for the horizontal axis.
• ylab: This parameter is the label for the vertical axis.
• main: This parameter is the title of the chart.
• xlim: This parameter sets the range of values shown on the x-axis.
• ylim: This parameter sets the range of values shown on the y-axis.
• axes: This parameter indicates whether both axes should be drawn on the plot.

Example1
x<-c(5,7,8,7,2,2,9,4,11,12,9,6)
y<- c(99,86,87,88,111,103,87,94,78,77,85,86)
plot(x, y)
plot(x,y,main="Observation of Cars", xlab="Car age", ylab="Car speed")

Example2
input <- mtcars[, c('wt', 'mpg')]
print(head(input))
plot(x = input$wt,y = input$mpg,xlab = "Weight",ylab = "Mileage",
xlim = c(1.5, 4), ylim = c(10, 25),main = "Weight vs Mileage")
b) Mean in R Programming Language
• It is the sum of observations divided by the total number of observations.
• It is also called the average: the sum divided by the count.

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

Where x_1, ..., x_n are the observations and n = number of terms

x <- read.csv("C:/Users/CSE/Downloads/CardioGoodFitness.csv")
View(x)   # opens the data in a spreadsheet-style viewer
head(x)
avg_age = mean(x$Age)
avg_age
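The formula can be checked directly against mean( ). This sketch uses a hypothetical vector named scores. It also uses mean( )'s trim argument, which drops a fraction of the smallest and largest values before averaging.

```r
scores <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

# The mean from its definition: sum of observations divided by n
manual_mean <- sum(scores) / length(scores)
print(manual_mean)
print(mean(scores))

# trim = 0.1 drops 10% of the observations from each end before averaging
print(mean(scores, trim = 0.1))
```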

Median in R Programming Language


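It is the middle value of the data set. It splits the data into two halves. If the number of
elements in the data set is odd, the center element is the median; if it is even, the
median is the average of the two central elements.

With the values sorted so that x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}:

\text{median} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \tfrac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{if } n \text{ is even} \end{cases}

Where n = number of terms

Syntax: median(x, na.rm = FALSE)
Where x is a vector and na.rm = TRUE removes missing values before the median is computed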

med = median(x$Age)
med
Example
x <- c(1, 2, NA, 4, 5, NA, 7, 8, NA, 9, 10)

mean(x, na.rm = TRUE)

median(x, na.rm = TRUE)

Variance

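Variance measures how far the values are spread out from their mean. R's var( ) function computes
the sample variance: the sum of the squared differences between each value and the mean, divided by n - 1:

s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2

One can calculate the variance by using the var( ) function in R.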
Syntax: var(x)
list = c(2, 4, 4, 4, 5, 5, 7, 9)
print(var(list))
Standard Deviation
Standard Deviation is the square root of variance. It is a measure of the extent to which data
varies from the mean.
Syntax: sd(x)
list = c(2, 4, 4, 4, 5, 5, 7, 9)
print(sd(list))
list1= c(290, 124, 127, 899)
print(sd(list1))
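Both definitions can be verified numerically. This sketch computes the sample variance (squared deviations divided by n - 1) for the list used above, renamed values here. It then compares the result with var( ) and sd( ).

```r
values <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(values)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((values - mean(values))^2) / (n - 1)
print(manual_var)
print(var(values))

# Standard deviation is the square root of the variance
print(sqrt(manual_var))
print(sd(values))
```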
WEEK -9

Normal Distribution

Normal Distribution is a probability function used in statistics that tells about how the data
values are distributed. It is the most important probability distribution function used in
statistics.
It is generally observed that data distribution is normal when there is a random collection of
data from independent sources.
The graph produced after plotting the value of the variable on x-axis and count of the value on
y-axis is bell-shaped curve graph. The graph signifies that the peak point is the mean of the
data set and half of the values of data set lie on the left side of the mean and other half lies
on the right part of the mean telling about the distribution of the values. The graph is
symmetric distribution.
In R, there are 4 built-in functions for the normal distribution:
dnorm(x, mean, sd)
dnorm( ) gives the value of the probability density function at x.
pnorm(x, mean, sd)
pnorm( ) is the cumulative distribution function: it gives the probability that a normally
distributed random variable X takes a value less than or equal to x.
qnorm(p, mean, sd)
qnorm( ) is the inverse of the pnorm( ) function.
It takes a probability p and returns the value x whose cumulative probability is p (the p-th quantile).
rnorm(n, mean, sd)
rnorm( ) is used to generate a vector of n random numbers that are
normally distributed.
x is a vector of values (quantiles).
mean is the mean of the distribution; its default value is 0.
sd is the standard deviation of the distribution; its default value is 1.
n is the number of observations.
p is a vector of probabilities.
Examples
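x <- seq(-15, 15, by = 0.1)
x
# Density: the bell-shaped curve
y <- dnorm(x, mean = mean(x), sd = sd(x))
plot(x, y)
# Cumulative distribution: an S-shaped curve rising from 0 to 1
y <- pnorm(x, mean = mean(x), sd = sd(x))
plot(x, y)
# qnorm() needs probabilities between 0 and 1, not the values in x
p <- seq(0.01, 0.99, by = 0.01)
y <- qnorm(p, mean = mean(x), sd = sd(x))
plot(p, y)
# rnorm() takes the number of values to generate
y <- rnorm(length(x), mean = mean(x), sd = sd(x))
plot(x, y)
y <- rnorm(50)
hist(y, main = "Normal Distribution")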
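The relationships between the four functions can be checked numerically for the standard normal distribution (mean 0, sd 1, the defaults). The value 60 and sd 5 in the last line are arbitrary numbers chosen only for illustration.

```r
# Density at the mean of a standard normal equals 1 / sqrt(2 * pi), about 0.3989
dnorm(0)
1 / sqrt(2 * pi)
# pnorm() at the mean is 0.5 because the curve is symmetric
pnorm(0)
# qnorm() undoes pnorm()
qnorm(pnorm(1.5))          # 1.5
# Share of values within one standard deviation of the mean, about 0.6827
pnorm(1) - pnorm(-1)
# With another mean and sd, pnorm() standardises: (70 - 60) / 5 = 2
pnorm(70, mean = 60, sd = 5)
pnorm(2)
```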

Binomial Distribution

The binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of trials, where each trial has only two outcomes: success or failure.
The trials are independent: the probability of success stays the same in every trial, and the previous outcome does not affect the next one.
It is used in many real-life scenarios, such as determining whether a particular lottery ticket has won or not, whether a drug cures a patient or not, or the number of heads or tails in a finite number of coin tosses.
R has four functions for handling the binomial distribution:
1. dbinom(k, n, p)
This function gives the probability of exactly k successes, P(X = k), for data that follows a binomial distribution with n trials and success probability p.

Example:
dbinom(3, size = 13, prob = 1/6)
probabilities <- dbinom(x = c(0:10), size = 10, prob = 1 / 6)
plot(0:10, probabilities, type = "l")
data.frame(probabilities)
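dbinom( ) evaluates the binomial probability formula C(n, k) p^k (1 - p)^(n - k), where C(n, k) is the number of ways to choose k successes from n trials (choose( ) in R). The sketch below checks this for the 13-trial example above and confirms that the probabilities over all possible k sum to 1.

```r
n <- 13; p <- 1 / 6; k <- 3
manual <- choose(n, k) * p^k * (1 - p)^(n - k)   # C(n, k) p^k (1 - p)^(n - k)
manual
dbinom(k, size = n, prob = p)
# The probabilities for k = 0, 1, ..., n add up to 1
sum(dbinom(0:n, size = n, prob = p))
```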
2. pbinom(k, n, p)
The function pbinom( ) gives the cumulative probability of data following a binomial distribution up to a given value, i.e. it computes
$P(X \le k) = \sum_{i=0}^{k} \binom{n}{i} p^{i} (1-p)^{n-i}$
where n is the total number of trials, p is the probability of success and k is the value at which the probability is to be found.

pbinom(3, size = 13, prob = 1 / 6)


plot(0:10, pbinom(0:10, size = 10, prob = 1 / 6), type = "l")
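Since pbinom( ) is a running total of dbinom( ), the two can be compared directly. The sketch also uses the lower.tail argument of pbinom( ) to get the upper-tail probability P(X > k).

```r
n <- 13; p <- 1 / 6; k <- 3
sum(dbinom(0:k, size = n, prob = p))   # P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
pbinom(k, size = n, prob = p)          # about 0.8419, the value used in the qbinom() example
# Upper tail P(X > k), two equivalent ways
pbinom(k, size = n, prob = p, lower.tail = FALSE)
1 - pbinom(k, size = n, prob = p)
```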
3. qbinom(P, n, p)
This function is the inverse of pbinom( ): it returns the quantile corresponding to probability P, i.e. the smallest k for which P(X <= k) >= P.
Here P is the probability, n is the total number of trials and p is the probability of success.
Example:
qbinom(0.8419226, size = 13, prob = 1 / 6)
x <- seq(0, 1, by = 0.1)
y <- qbinom(x, size = 13, prob = 1 / 6)
plot(x, y, type = 'l')
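The definition of qbinom( ) as "the smallest k whose cumulative probability reaches P" can be written out directly. my_qbinom is a hypothetical helper for illustration only; it searches all possible values 0 to n, which is fine for small n.

```r
n <- 13; p <- 1 / 6
# Hypothetical helper: smallest k with P(X <= k) >= P
my_qbinom <- function(P, size, prob) {
  k <- 0:size
  sapply(P, function(pr) min(k[pbinom(k, size, prob) >= pr]))
}
probs <- c(0.1, 0.5, 0.9)
my_qbinom(probs, n, p)            # 1 2 4
qbinom(probs, size = n, prob = p)
# Feeding a pbinom() value back into qbinom() recovers k
qbinom(pbinom(3, size = n, prob = p), size = n, prob = p)   # 3
```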
4. rbinom(n, N, p)
This function generates n random values from a binomial distribution with N trials and success probability p.
rbinom(8, size = 13, prob = 1 / 6)
hist(rbinom(8, size = 13, prob = 1 / 6))
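With only 8 draws the histogram is very rough. A larger simulation shows that the random values behave as expected: their mean approaches N * p and their variance approaches N * p * (1 - p). The seed 42 and the sample size 10000 are arbitrary choices for this sketch.

```r
set.seed(42)                                   # fixed seed so the run is reproducible
draws <- rbinom(10000, size = 13, prob = 1 / 6)
mean(draws)                                    # close to 13 * 1/6, about 2.17
var(draws)                                     # close to 13 * 1/6 * 5/6, about 1.81
# Observed relative frequencies next to the exact dbinom() probabilities
round(table(factor(draws, levels = 0:13)) / length(draws), 3)
round(dbinom(0:13, size = 13, prob = 1 / 6), 3)
```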

b) Correlation, Linear and Multiple regression


Correlation is a statistical measure that indicates how strongly two variables are related. It can also be computed pairwise across several variables. For instance, to find out whether there is a relationship between the heights of fathers and sons, a correlation coefficient can be calculated. The Pearson correlation coefficient always lies between -1 and +1.
R provides two functions to calculate the Pearson correlation coefficient: cor( ) and cor.test( ).
cor( ) computes the correlation coefficient, whereas cor.test( ) also performs a test for association (correlation) between paired samples.
Output
- t is the value of the test statistic (t = 1.4186).
- p-value is the significance level of the test statistic (p-value = 0.2152).
- alternative hypothesis is a character string describing the alternative hypothesis (true correlation is not equal to 0).
- sample estimates is the correlation coefficient. For the Pearson correlation coefficient it is named cor (cor = 0.5357).
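The sketch below shows where these numbers come from. It reuses the height and weight values from the linear regression example in this week's material (an assumption: any paired numeric vectors would do). The Pearson coefficient is the covariance divided by the product of the standard deviations, and the t statistic reported by cor.test( ) is r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom.

```r
# Height (cm) and weight (kg) values reused from the linear regression example
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Pearson r = covariance / (sd of x * sd of y)
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_manual
cor(x, y)
# Test statistic of cor.test(), computed by hand
n <- length(x)
t_manual <- r_manual * sqrt(n - 2) / sqrt(1 - r_manual^2)
ct <- cor.test(x, y)
t_manual
ct$statistic
ct$p.value
ct$estimate
```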

Linear Regression
Linear regression is a commonly used type of predictive analysis. Regression analysis is a widely used statistical tool for building a model of the relationship between two variables.
One of these variables is called the predictor variable, whose value is gathered through experiments.
The other variable is called the response variable, whose value is derived from the predictor variable.
There are two types of linear regression.
Simple Linear Regression
Multiple Linear Regression
A simple linear regression models the relationship between a single independent variable X and a dependent variable Y by estimating how much Y changes when X changes by a given amount.
The independent variable X, also called the predictor, is the variable used to make the
prediction.
The dependent variable Y, also known as the response, is the one we are trying to predict.
The general mathematical equation for a linear regression is
y = ax + b
Following is the description of the parameters used:
- y is the response variable.
- x is the predictor variable.
- a and b are constants called the coefficients: a is the slope and b is the intercept.
Steps to Establish a Regression
A simple example of regression is predicting the weight of a person when their height is known.
- Create a relationship model using the lm( ) function in R.
- Get a summary of the relationship model with summary( ) to see the errors in prediction, also called residuals.
- To predict the weight of new persons, use the predict( ) function in R.

Example
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y~x)
print(relation)
print(summary(relation))
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)
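For a single predictor, the least-squares coefficients that lm( ) finds have a closed form: the slope is a = cov(x, y) / var(x) and the intercept is b = mean(y) - a * mean(x). The sketch computes them by hand on the same data and repeats the prediction for a height of 170.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Least-squares slope a and intercept b for y = ax + b
a <- cov(x, y) / var(x)
b <- mean(y) - a * mean(x)
c(intercept = b, slope = a)
relation <- lm(y ~ x)
coef(relation)                       # same two numbers, intercept first
# Prediction by hand matches predict()
a * 170 + b
predict(relation, data.frame(x = 170))
```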

Multiple regression is an extension of linear regression to relationships between more than two variables. In simple linear regression we have one predictor and one response variable, but in multiple regression we have more than one predictor variable and one response variable.

The general mathematical equation for multiple regression is:

$y = a + b_1x_1 + b_2x_2 + \dots + b_nx_n$

Following is the description of the parameters used:

- y is the response variable.
- $a, b_1, b_2, \dots, b_n$ are the coefficients.
- $x_1, x_2, \dots, x_n$ are the predictor variables.
The lm( ) function creates the relationship model between the predictor variables and the response variable.
Syntax
The basic syntax for lm( ) in multiple regression is:
lm(y ~ x1 + x2 + x3 ..., data)
Example:
input <- mtcars[,c("mpg","disp","hp","wt")]
print(head(input))
model <- lm(mpg~disp+hp+wt, data = input)
print(model)
a <- coef(model)[1]
print(a)
Xdisp <- coef(model)[2]
Xhp <- coef(model)[3]
Xwt <- coef(model)[4]
print(Xdisp)
print(Xhp)
print(Xwt)
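Once the coefficients are known, a prediction is just the equation y = a + b1x1 + b2x2 + b3x3 filled in. The sketch compares this by-hand calculation with predict( ). The new car's values (disp = 221, hp = 102, wt = 2.91) are hypothetical, chosen only for illustration.

```r
input <- mtcars[, c("mpg", "disp", "hp", "wt")]
model <- lm(mpg ~ disp + hp + wt, data = input)
# Hypothetical new car, values chosen only for illustration
new_car <- data.frame(disp = 221, hp = 102, wt = 2.91)
b <- coef(model)
manual <- b[1] + b["disp"] * new_car$disp + b["hp"] * new_car$hp + b["wt"] * new_car$wt
unname(manual)
predict(model, newdata = new_car)
```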
WEEK-10

Time Series Analysis in R

Time series analysis in R is used to study how a quantity behaves over time. A time series object is created with the ts( ) function: it takes a data vector and associates each value with a point in time, as specified by the user. Time series are used to understand and forecast behaviour over a period of time, for example that of an asset in business.
Syntax: objectName <- ts(data, start, end, frequency)
where,
- data – represents the data vector
- start – represents the time of the first observation in the time series
- end – represents the time of the last observation in the time series
- frequency – represents the number of observations per unit of time. For example, frequency = 12 for monthly data and frequency = 1 for yearly data.
Example:
Weekly data of COVID-19 positive cases from 22 January, 2020 to 15 April, 2020

# One value per week
x <- c(580, 7813, 28266, 59287, 75700, 87820, 95314, 126214, 218843, 471497, 936851,
       1508725, 2072113)
install.packages("lubridate")   # needed only once
library(lubridate)
# decimal_date() turns the start date into a fractional year; there are 365.25 / 7 weeks per year
mts <- ts(x, start = decimal_date(ymd("2020-01-22")), frequency = 365.25 / 7)
plot(mts, xlab = "Weekly Data", ylab = "Total Positive Cases",
     main = "COVID-19 Pandemic", col.main = "darkgreen")
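The start, end and frequency arguments are easiest to see with monthly data. The series below is made-up data (the numbers 1 to 24) covering two years from January 2020; start is given as c(year, period) and frequency = 12 means twelve observations per year.

```r
# Hypothetical monthly data: 24 months starting January 2020
sales <- ts(1:24, start = c(2020, 1), frequency = 12)
sales
frequency(sales)   # 12
start(sales)       # 2020 1
end(sales)         # 2021 12
# window() extracts a sub-period, here the first quarter of 2021
window(sales, start = c(2021, 1), end = c(2021, 3))
```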
Data transformation is the process of converting data from one format or structure into another, often while cleaning and organizing it. It is one of the key tasks in data analysis, data science and artificial intelligence.
Factors are data structures used to represent categorical data, storing it as a set of levels. Internally, a factor is stored as an integer vector, with a label (level) corresponding to each unique integer. Although factors may look similar to character vectors, they are integers underneath.
To convert a data vector into a factor, use the factor( ) command, which is used to create and modify factors in R.

Example
v =c(1,2,3,3,4, NA,3,2,4,5, NA,5)
print("Original vector:")
print(v)
print(factor(v))
print("Levels of factor of the said vector:")
print(levels(factor(v)))
Example
V = c("North", "South", "East", "East", "West", "South", "North")
drn <- factor(V)
drn

Converting a factor into a numeric vector


a1<-as.numeric(drn)
is.numeric(a1)
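as.numeric( ) on a factor returns the internal integer codes, not the labels. The levels are sorted alphabetically, so East = 1, North = 2, South = 3 and West = 4. This matters when the labels themselves are numbers: the factor f below is a hypothetical example showing how to recover the actual numbers.

```r
V <- c("North", "South", "East", "East", "West", "South", "North")
drn <- factor(V)
levels(drn)                   # "East" "North" "South" "West"
as.numeric(drn)               # integer codes: 2 3 1 1 4 3 2
# Hypothetical factor whose labels are numbers
f <- factor(c("10", "20", "10", "30"))
as.numeric(f)                 # codes 1 2 1 3, not the numbers
as.numeric(as.character(f))   # 10 20 10 30
as.numeric(levels(f))[f]      # same result, converting each level only once
```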

Converting Numeric Variables into Factors


To convert a numeric variable into a factor, we use the cut( ) function.
cut( ) divides the range of the numeric vector x into intervals and codes each value of x according to the interval it falls into.
Example:
age <- c(40, 49, 48, 40, 67, 52, 53)
salary <- c(103200, 106200, 150200, 10606, 10390, 14070, 10220)
gender <- c("male", "male", "transgender", "female", "male", "female","transgender")
employee<- data.frame(age, salary, gender)
wfact = cut(employee$age, 3)
wfact
table(wfact)
is.factor(wfact)
Creating a factor corresponding to age with labels
wfact = cut(employee$age, 3, labels=c('Young', 'Medium', 'Aged'))
wfact
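cut(employee$age, 3) splits the age range into three intervals of equal width. When the group boundaries matter, the break points can be given explicitly instead. The breaks 0, 45 and 55 below are assumptions chosen only for illustration; with the default right = TRUE the intervals are (0, 45], (45, 55] and (55, Inf].

```r
age <- c(40, 49, 48, 40, 67, 52, 53)
# Explicit break points instead of a number of equal-width intervals
age_group <- cut(age, breaks = c(0, 45, 55, Inf),
                 labels = c("Young", "Medium", "Aged"))
age_group
table(age_group)   # Young 2, Medium 4, Aged 1
```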

Date Operations
Dates in R
1. Get the system date

Sys.Date( )

2. Get the system time

Sys.time( )

3. Format a date using a format specifier

format(Sys.Date( ), format = "%a")

Specifier   Description
%a          Abbreviated weekday
%A          Full weekday
%b          Abbreviated month
%B          Full month
%C          Century
%y          Year without century
%Y          Year with century
%d          Day of month (01-31)
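Several specifiers can be combined in one format string. The sketch uses a fixed date rather than Sys.Date( ) so the output is predictable; %m (month as a number, 01-12) is used in addition to the specifiers in the table. Weekday and month names (%a, %A, %b, %B) depend on the system's language settings, so their exact output may differ.

```r
d <- as.Date("2024-01-01")
format(d, format = "%Y")            # "2024"
format(d, format = "%y")            # "24"
format(d, format = "%d")            # "01"
format(d, format = "%C")            # century
format(d, format = "%d-%m-%Y")      # "01-01-2024"
format(d, format = "%A %d %B %Y")   # names depend on the locale
```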

The as.Date( ) function handles dates in R without times. It takes a date as a string in the format YYYY-MM-DD or YYYY/MM/DD and internally represents it as the number of days since 1970-01-01.
x <- as.Date("2024-01-01")
x
y <- as.Date("2024-01-10")
y
dates <- seq(x, y, by = "days")   # every day from x to y
dates
install.packages("lubridate")     # needed only once
library(lubridate)
x <- ymd("2024-01-01")            # year-month-day
y <- ymd("2024-01-10")
dates <- seq(x, y, by = "days")
dates
x <- dmy("01-04-2024")            # day-month-year: 1 April 2024
y <- dmy("10-04-2024")
dates <- seq(x, y, by = "days")
dates
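Because dates are stored as day counts, ordinary arithmetic works on them. The sketch shows the internal number, the difference between two dates, adding days, and the ISO weekday number (%u, Monday = 1), which unlike %a does not depend on the language settings.

```r
x <- as.Date("2024-01-01")
y <- as.Date("2024-01-10")
as.numeric(as.Date("1970-01-02"))            # 1: days since 1970-01-01
y - x                                        # Time difference of 9 days
as.numeric(difftime(y, x, units = "days"))   # 9
length(seq(x, y, by = "day"))                # 10 dates, both ends included
x + 30                                       # "2024-01-31"
format(x, "%u")                              # "1": 1 January 2024 was a Monday
```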
WEEK-11
Missing Data
In R, the NA symbol is used to mark missing values. The result of an undefined arithmetic operation (such as 0 / 0) is NaN, which stands for “not a number”; note that dividing a non-zero number by zero gives Inf rather than NaN. Both NA and NaN are treated as missing values in R.
Finding Missing Data in R
R provides built-in functions for finding missing values.
The is.na( ) function returns a logical vector, with TRUE where a value is missing and FALSE otherwise.
Example 1:
x <- c(NA, "TP", 4, 6.7, 'c', NA, 12)
x
is.na(x)
which(is.na(x))
sum(is.na(x))
Example 2:
y <- c(NA, 100, 241, NA, 0 / 0, 101, 0 / 0)
y
is.nan(y)
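is.na( ) and is.nan( ) differ on the same vector: is.na( ) is TRUE for both NA and NaN, while is.nan( ) is TRUE only for NaN. Dividing a non-zero number by zero is not missing at all; it gives Inf.

```r
y <- c(NA, 100, 241, NA, 0 / 0, 101, 0 / 0)
is.na(y)         # TRUE for both NA and NaN: 4 values
is.nan(y)        # TRUE only for the two 0 / 0 values
1 / 0            # Inf
is.na(1 / 0)     # FALSE
```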
Remove Values Using Filter Functions
na.omit( ): removes any rows (or elements) that contain a missing value.
na.exclude( ): also removes rows with at least one missing value; when used in model fitting, residuals and predictions are padded back with NA to the original length.
na.pass( ): takes no action and returns the object unchanged.
na.fail( ): returns the object if it contains no missing values, and stops with an error otherwise.
Example 1:
na.exclude(x)
na.exclude(y)
na.omit(x)
Example 2:
data <- data.frame(A = c(1, 2, NA, 4, 5), B = c(NA, 2, 3, NA, 5),
                   C = c(1, 2, 3, NA, NA))
data
is.na(data)
sum(is.na(data))
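For a data frame it is often more useful to count missing values per column and to see which rows are complete. The sketch recreates the same data frame; only row 2 has no missing value, so na.omit( ) keeps a single row. tryCatch( ) is used so that the error from na.fail( ) is shown as a message instead of stopping the script.

```r
data <- data.frame(A = c(1, 2, NA, 4, 5),
                   B = c(NA, 2, 3, NA, 5),
                   C = c(1, 2, 3, NA, NA))
colSums(is.na(data))          # A 1, B 2, C 2
complete.cases(data)          # TRUE only for row 2
na.omit(data)                 # keeps only the complete row
mean(data$A, na.rm = TRUE)    # 3, computed from 1, 2, 4, 5
# na.fail() signals an error because NAs are present
tryCatch(na.fail(data), error = function(e) conditionMessage(e))
```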
Identify and Remove Duplicate Data in R
The duplicated( ) function returns a logical vector marking each element that repeats an earlier one, so sum(duplicated( )) counts the duplicate values in a vector; unique( ) removes the duplicate values.
Example:
a <- c(1, 2, 3, 4, 4, 5)
duplicated(a)
sum(duplicated(a))
unique(a)
Example:
s=data.frame(name=c("Ram","Geeta","John","Paul", "Cassie","Geeta","Paul"),
maths=c(7,8,8,9,10,8,9),
science=c(5,7,6,8,9,7,8),
history=c(7,7,7,7,7,7,7))
s
duplicated(s)
sum(duplicated(s))
unique(s)
duplicated(s$maths)
unique(s$maths)
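For a data frame, unique( ) keeps exactly the rows for which duplicated( ) is FALSE, so both forms give the same result. duplicated( ) can also be applied to a single column, for example to keep only the first row for each name, and fromLast = TRUE flags the earlier copy instead of the later one.

```r
s <- data.frame(name = c("Ram", "Geeta", "John", "Paul", "Cassie", "Geeta", "Paul"),
                maths = c(7, 8, 8, 9, 10, 8, 9),
                science = c(5, 7, 6, 8, 9, 7, 8),
                history = c(7, 7, 7, 7, 7, 7, 7))
# Keep the rows that are not repeats of an earlier row
s_unique <- s[!duplicated(s), ]
identical(s_unique, unique(s))   # TRUE
# fromLast = TRUE marks rows 2 and 4 (the earlier copies) instead of 6 and 7
duplicated(s, fromLast = TRUE)
# Duplicates judged on one column only: first row per name
s[!duplicated(s$name), ]
```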
Spelling Check
install.packages("dplyr")
install.packages("stringr")
install.packages("quanteda")
install.packages("hunspell")
install.packages("flextable")
library(dplyr)
library(stringr)
library(quanteda)
library(hunspell)   # provides hunspell_check(), used in the example below
library(flextable)
Example:

words <- c("analize", "langauge", "data")


correct <- hunspell_check(words)
print(correct)
WEEK-12

SQLite is an extremely lightweight relational database management system (RDBMS). The RSQLite package lets R create, write to and query SQLite databases.

install.packages("RSQLite")
library(RSQLite)
con <- dbConnect(SQLite(), 'play-example.db')   # creates the file if it does not exist
con
dbWriteTable(con, 'cars', mtcars, overwrite = TRUE)   # overwrite lets the script be re-run
dbListTables(con)
dbGetQuery(con, 'SELECT * FROM cars ')
dbGetQuery(con, 'SELECT * FROM cars LIMIT 5')
dbGetQuery(con, 'SELECT mpg, cyl FROM cars WHERE mpg>30 ORDER BY mpg')
dbDisconnect(con)
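Two further patterns are useful in practice: a temporary in-memory database (":memory:"), which leaves no file behind, and parameterised queries, where a ? placeholder is filled from the params argument instead of pasting values into the SQL string. This sketch assumes the DBI and RSQLite packages are installed.

```r
library(DBI)
library(RSQLite)
# In-memory database: it disappears when the connection is closed
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "cars", mtcars)
# Aggregate query: number of cars and average mpg per cylinder count
groups <- dbGetQuery(con, "SELECT cyl, COUNT(*) AS n, AVG(mpg) AS avg_mpg
                           FROM cars GROUP BY cyl ORDER BY cyl")
groups
# Parameterised query: the ? placeholder is filled from params
six_cyl <- dbGetQuery(con, "SELECT mpg, hp FROM cars WHERE cyl = ?", params = list(6))
nrow(six_cyl)   # 7 cars with 6 cylinders
dbDisconnect(con)
```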
Loading SPSS (Statistical Package for the Social Sciences) and SAS (Statistical Analysis System) files
The easiest way to import SPSS files into R is to use the read_sav() function from the haven package; SAS data files (.sas7bdat) can be read with read_sas() from the same package.
install.packages('haven')
library(haven)
data <- read_sav('C:/Users/User_Name/file_name.sav')        # SPSS file
data <- read_sas('C:/Users/User_Name/file_name.sas7bdat')   # SAS file
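If no SPSS file is at hand, read_sav( ) can be tried on a file written by haven's write_sav( ) to a temporary location. This is a sketch for practice only; the three mtcars columns are an arbitrary choice, and row names are not kept in the .sav file.

```r
library(haven)
# Write a small data set to a temporary .sav file, then read it back
path <- tempfile(fileext = ".sav")
write_sav(mtcars[, c("mpg", "cyl", "wt")], path)
back <- read_sav(path)
head(back)
nrow(back)   # 32
```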

Reading Google Sheets In R


install.packages("googlesheets4")
library(googlesheets4)
The read_sheet( ) function reads the data from a Google Sheet into R as a data frame.
