
Data Analysis Using R (Student Copy)

Lab manual

Uploaded by

Jai Sudhan
Copyright
© © All Rights Reserved

INDEX

S.No Program Description


1. Installing R and packages in R.

2. Program to make a simple calculation using R


3. To print Lowercase and Uppercase strings using R

4. Programs on Operators in R.

5. Control Structures in R.

6. Creating matrix and manipulating matrix in R.

7. Decision Tree

8. K-Nearest Neighbour (KNN) Algorithm

9. Naïve Bayes Algorithm

10. Random Forest Algorithm

11. K-Means Clustering Algorithm

12. Implementation of Hierarchical Clustering with R

13. Visualizations
Plot The Histogram, Bar Chart And Pie Chart On Sample Data
14. Implementation of various charts

15. Implementation of predictive model in R

16. Operations on Lists in R.

17. Built-in Functions in R

18. To convert given pH levels of soil to an ordered factor

19. Creating and manipulating a vector in R.

20. CLASSIFICATION MODEL


(a) linear Discriminant algorithm(LDA)
(b) Non-linear algorithm (NLA)
(c) Support Vector Machine(SVM)

EX.NO: 1
DATE:
INSTALLING R AND PACKAGES IN R
AIM
To install R and packages in R.

PROCEDURE
To Install R and R Packages

1. Open an internet browser and go to www.r-project.org.
2. Click the "download R" link in the middle of the page under "Getting Started."
3. Select a CRAN location (a mirror site) and click the corresponding link.
4. Click on the download link for your operating system at the top of the page (for example, "Download R for Windows").
5. Click on the file containing the latest version of R under "Files."
6. Save the installer file (.exe on Windows, .pkg on macOS), double-click it to open, and follow the installation instructions.
7. Now that R is installed, you need to download and install RStudio.

To Install RStudio
1. Go to www.rstudio.com and click on the "Download RStudio" button.
2. Click on "Download RStudio Desktop."
3. Click on the version recommended for your system and save the installer on your computer. On Windows, run the downloaded .exe file and follow the instructions. On macOS, double-click the .dmg file to open it, and then drag and drop RStudio to your Applications folder.
To Install R Packages
The capabilities of R are extended through user-created packages, which allow specialized statistical techniques, graphical devices, import/export capabilities, reporting tools (knitr, Sweave), etc. These packages are developed primarily in R, and sometimes in Java, C, C++, and Fortran. The R packaging system is also used by researchers to create compendia to organize research data, code and report files in a systematic way for sharing and public archiving.
A core set of packages is included with the installation of R, with more than 12,500 additional packages (as of May 2018) available at the Comprehensive R Archive Network (CRAN).

Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. R comes with a standard set of packages. Others are available for download and installation. Once installed, they have to be loaded into the session to be used.
● .libPaths() # get library location
● library() # see all packages installed
● search() # see packages currently loaded

Adding R Packages
You can expand the types of analyses you do by adding other packages. A complete list of contributed packages is available from CRAN.
Follow these steps:
1. Download and install a package (you only need to do this once).
2. To use the package, invoke the library(package) command to load it into the current session. (You need to do this once in each session, unless you customize your environment to automatically load it each time.)

Installing and Loading Packages


The ability to estimate ordered logistic or probit regression is included in the MASS package.
To install this package, run the following command:

> install.packages("MASS")

You will be asked to pick a CRAN mirror from which to download (generally the closer the faster) and R will install the package to your library. Installing a package does not make its functions available in the session. To use the new package, you have to load it each time you start an R session:

> library("MASS")

R now knows all the functions that are provided by the MASS package. To see what functions are implemented in the MASS package, type:

> library(help = "MASS")
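The install-once, load-every-session pattern described above can be sketched as follows. This is a minimal illustration, not part of the original manual; it assumes an internet connection and a configured CRAN mirror in case MASS is not already installed.

```r
# Install MASS only if it is not already available, then load it.
pkg <- "MASS"
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)   # needed only once per R installation
}
library(MASS)             # needed once in every session

# Confirm that the package is now attached to the search path
loaded <- "package:MASS" %in% search()
print(loaded)
```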

Maintaining your Library


Packages are frequently updated. Depending on the developer this could happen very often. To keep your packages updated, enter this every once in a while:
> update.packages()
The Workspace
The workspace is your current R working environment and includes any user-defined objects (vectors, matrices, data frames, lists, functions). At the end of an R session, the user can save an image of the current workspace that is automatically reloaded the next time R is started. Commands are entered interactively at the R user prompt. Up and down arrow keys scroll through your command history. You will probably want to keep different projects in different physical directories. Here are some standard commands for managing your workspace.
getwd() # print the current working directory
ls() # list the objects in the current workspace
setwd(mydirectory) # change to mydirectory
setwd("c:/docs/mydir") # note / instead of \ in Windows
# view and set options for the session
help(options) # learn about available options
options() # view current option settings

OUTPUT

RESULT: This process has been executed successfully.


EX.NO: 2
DATE:
PROGRAM TO MAKE A SIMPLE CALCULATOR USING R

AIM
To make a simple calculator using R

PROCEDURE

1. Open a new file

2. Define the functions add, subtract, multiply and divide

3. Enter the choice (addition, subtraction, multiplication, division)

4. Take two numbers, num1 and num2

5. Use switch to jump to the operator selected by the user

6. Store the result in the result variable

7. Display the operation result

8. Exit from the program

PROGRAM
# Program to make a simple calculator that can add, subtract, multiply and divide
# using functions
add <- function(x, y) {
  return(x + y)
}
subtract <- function(x, y) {
  return(x - y)
}
multiply <- function(x, y) {
  return(x * y)
}
divide <- function(x, y) {
  return(x / y)
}
# take input from the user (readline() waits for input only in an interactive session)
print("Select operation.")
print("1.Add")
print("2.Subtract")
print("3.Multiply")
print("4.Divide")
choice = as.integer(readline(prompt = "Enter choice[1/2/3/4]: "))
num1 = as.integer(readline(prompt = "Enter first number: "))
num2 = as.integer(readline(prompt = "Enter second number: "))
# switch() with an integer picks the argument at that position
operator <- switch(choice, "+", "-", "*", "/")
result <- switch(choice, add(num1, num2), subtract(num1, num2),
                 multiply(num1, num2), divide(num1, num2))
print(paste(num1, operator, num2, "=", result))
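Because readline() only works interactively, the same switch-based dispatch can be wrapped in a function for non-interactive checking. The sketch below is an illustration only; the name calculate and its error message are assumptions, not part of the original program.

```r
# Hypothetical helper: dispatch on the menu choice with switch().
# A character EXPR lets the final unnamed argument act as a default.
calculate <- function(choice, x, y) {
  switch(as.character(choice),
    "1" = x + y,
    "2" = x - y,
    "3" = x * y,
    "4" = x / y,
    stop("choice must be 1, 2, 3 or 4")
  )
}

print(calculate(1, 6, 3))
print(calculate(4, 6, 3))
```

Using character labels rather than positions makes it possible to reject an invalid choice explicitly instead of silently returning NULL.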

OUTPUT

RESULT
Thus the process has been executed successfully

EX.NO: 3
DATE:

CREATE NUMERIC DATA VECTORS USING R

AIM

Write an R program to create three vectors: numeric data, character data and logical data. Display the content of the vectors and their type.

PROCEDURE

PROGRAM

print("First 10 letters in lower case:")
t = head(letters, 10)
print(t)
print("Last 10 letters in upper case:")
t = tail(LETTERS, 10)
print(t)
print("Letters between 22nd to 24th letters in upper case:")
e = LETTERS[22:24]
print(e)
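The aim of this exercise also asks for three vectors of numeric, character and logical data, which the listing above does not show. A minimal sketch of that part follows; the variable names and values are illustrative assumptions.

```r
# Three vectors of different basic types (values chosen for illustration)
num_vec <- c(1.5, 2, 3.25)
chr_vec <- c("a", "b", "c")
lgl_vec <- c(TRUE, FALSE, TRUE)

# Display the content of each vector and its type
print(num_vec)
print(typeof(num_vec))
print(chr_vec)
print(typeof(chr_vec))
print(lgl_vec)
print(typeof(lgl_vec))
```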

OUTPUT

RESULT
This process has been executed successfully
EX.NO:4
DATE:
PROGRAMS ON OPERATORS IN R.

AIM
To demonstrate operators in R

PROCEDURE
1. Open a new file
2. Demonstrate the operators in R
3. Arithmetic operators (+, -, *, /, ^, %%)
4. Logical operators (&, |, !)
5. Relational operators (<, >, <=, >=, ==, !=)
6. Assignment operators (<-, ->, =)
7. Exit from the program

1. Arithmetic Operators
These operators are used to carry out mathematical operations like addition and multiplication. Program 1 shows the arithmetic operators available in R.

Program 1
# R program to illustrate
# the use of Arithmetic operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)

# Performing operations on Operands


cat ("Addition of vectors :", vec1 + vec2, "\n")
cat ("Subtraction of vectors :", vec1 - vec2, "\n")
cat ("Multiplication of vectors :", vec1 * vec2, "\n")
cat ("Division of vectors :", vec1 / vec2, "\n")
cat ("Modulo of vectors :", vec1 %% vec2, "\n")
cat ("Power operator :", vec1 ^ vec2)

Output

Addition of vectors : 2 5
Subtraction of vectors : -2 -1
Multiplication of vectors : 0 6
Division of vectors : 0 0.6666667
Modulo of vectors : 0 2
Power operator : 0 8

R Relational Operators
Relational operators are used to compare values. Program 2 shows the relational operators available in R.
Program 2
# R program to illustrate
# the use of Relational operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)

# Performing operations on Operands


cat ("Vector1 less than Vector2 :", vec1 < vec2, "\n")
cat ("Vector1 less than equal to Vector2 :", vec1 <= vec2, "\n")
cat ("Vector1 greater than Vector2 :", vec1 > vec2, "\n")
cat ("Vector1 greater than equal to Vector2 :", vec1 >= vec2, "\n")
cat ("Vector1 not equal to Vector2 :", vec1 != vec2, "\n")

Output

Vector1 less than Vector2 : TRUE TRUE
Vector1 less than equal to Vector2 : TRUE TRUE
Vector1 greater than Vector2 : FALSE FALSE
Vector1 greater than equal to Vector2 : FALSE FALSE
Vector1 not equal to Vector2 : TRUE TRUE

Assignment operators
Assignment operators are used to assign values to various data objects in R. The objects may be integers, vectors, or functions. These values are then stored under the assigned variable names. There are two kinds of assignment operators: left and right.
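A short sketch of the three assignment operators listed in the procedure (<-, -> and =) follows. It is an illustration only; the variable names are arbitrary.

```r
# Leftward assignment: the value on the right is stored in the name on the left
x <- 5
# Rightward assignment: the value on the left is stored in the name on the right
10 -> y
# "=" also performs leftward assignment at the top level
z = x + y

cat("x :", x, "\n")
cat("y :", y, "\n")
cat("z :", z, "\n")
```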

Logical Operators

Logical operators perform element-wise decision operations between the operands, and each result evaluates to either TRUE or FALSE. Any nonzero numeric value, real or complex, is treated as TRUE, and zero is treated as FALSE.
Program 4
# R program to illustrate
# the use of Logical operators
vec1 <- c(0,2)
vec2 <- c(TRUE,FALSE)
# Performing operations on Operands
cat ("Element wise AND :", vec1 & vec2, "\n")
cat ("Element wise OR :", vec1 | vec2, "\n")
# && and || require length-one operands (longer operands are an error
# since R 4.3.0), so the first elements are compared explicitly
cat ("Logical AND :", vec1[1] && vec2[1], "\n")
cat ("Logical OR :", vec1[1] || vec2[1], "\n")
cat ("Negation :", !vec1)

Output
Element wise AND : FALSE FALSE
Element wise OR : TRUE TRUE
Logical AND : FALSE
Logical OR : TRUE
Negation : TRUE FALSE

RESULT: The process has been executed successfully

EX.NO: 5
DATE:
CONTROL STRUCTURES

AIM

To implement control structures in R

PROCEDURE
R if statement
The syntax of if statement is:
if (test_expression)
{
statement
}
If the test_expression is TRUE, the statement gets executed. But if it’s FALSE, nothing happens.
Here, test_expression must evaluate to a single logical or numeric value. Since R 4.2.0, a condition of length greater than one is an error; older versions used only the first element and gave a warning.
In the case of a numeric value, zero is taken as FALSE, rest as TRUE.
Example: if statement
x <- 5
if(x > 0)
{
print("Positive number")
}
[1] "Positive number"

Develop programs on if-else in R.


1. Program to check if the input year is a leap year or not

year = as.integer(readline(prompt = "Enter a year: "))
if ((year %% 4) == 0) {
  if ((year %% 100) == 0) {
    if ((year %% 400) == 0) {
      print(paste(year, "is a leap year"))
    } else {
      print(paste(year, "is not a leap year"))
    }
  } else {
    print(paste(year, "is a leap year"))
  }
} else {
  print(paste(year, "is not a leap year"))
}

OUTPUT

Enter a year: 1900


[1] "1900 is not a leap year"
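The nested leap-year test can also be written as a single logical expression, which makes it easy to check several years at once. This is a sketch for comparison; the function name is_leap_year is an assumption, not part of the original program.

```r
# A year is a leap year if it is divisible by 4 but not by 100,
# or if it is divisible by 400.
is_leap_year <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

years <- c(1900, 2000, 2023, 2024)
print(is_leap_year(years))
```

The element-wise operators & and | are used here so that the function works on a whole vector of years.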

2. Find the Factorial of a given Number.

# take input from the user
num = as.integer(readline(prompt = "Enter a number: "))
factorial = 1
# check if the number is negative, positive or zero
if (num < 0) {
  print("Sorry, factorial does not exist for negative numbers")
} else if (num == 0) {
  print("The factorial of 0 is 1")
} else {
  for (i in 1:num) {
    factorial = factorial * i
  }
  print(paste("The factorial of", num, "is", factorial))
}

OUTPUT
Enter a number: 8
[1] "The factorial of 8 is 40320"

3. Check whether the given number is Even or Odd.

# Program to check if the input number is odd or even.
# A number is even if division by 2 gives a remainder of 0.
# If the remainder is 1, it is odd.
num = as.integer(readline(prompt = "Enter a number: "))
if ((num %% 2) == 0) {
  print(paste(num, "is Even"))
} else {
  print(paste(num, "is Odd"))
}

OUTPUT

Enter a number: 89
[1] "89 is Odd"

ITERATIVE CONTROL STRUCTURES


FOR LOOP

A for loop is used to iterate over a vector in R programming.

Syntax of for loop

for (val in sequence)
{
statement
}
Here, sequence is a vector and val takes on each of its values during the loop. In each iteration, statement is evaluated.

1.Program to count the number of even numbers in a vector.

x <- c(2,5,3,9,8,11,6)

count<- 0

for (val in x) {

if(val %% 2 == 0) count= count+1

}
print(count)

Output
[1] 3
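Since arithmetic and comparison operators in R work element-wise, the same count can be obtained without an explicit loop. The sketch below is an alternative for comparison, not a replacement for the for loop exercise.

```r
x <- c(2, 5, 3, 9, 8, 11, 6)

# x %% 2 == 0 gives one TRUE/FALSE per element; sum() counts the TRUE values
count <- sum(x %% 2 == 0)
print(count)
```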

2. Program to check whether the given number is prime or not.

num = as.integer(readline(prompt = "Enter a number: "))
flag = 0
if (num > 1) {
  # check for factors
  flag = 1
  for (i in 2:(num - 1)) {
    if ((num %% i) == 0) {
      flag = 0
      break
    }
  }
}
# 2:(num - 1) counts down as 2, 1 when num is 2, so 2 is handled separately
if (num == 2) flag = 1
if (flag == 1) {
  print(paste(num, "is a prime number"))
} else {
  print(paste(num, "is not a prime number"))
}

OUTPUT
Enter a number: 25
[1] "25 is not a prime number"
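The prime check can be packaged as a function that avoids the special case for 2 and only tests odd divisors up to the square root of the number. This is a sketch under those assumptions; the name is_prime is hypothetical.

```r
# Trial division: a composite n always has a factor no larger than sqrt(n)
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)          # 2 and 3 are prime
  if (n %% 2 == 0) return(FALSE)   # other even numbers are not
  limit <- floor(sqrt(n))
  i <- 3
  while (i <= limit) {
    if (n %% i == 0) return(FALSE)
    i <- i + 2                     # test odd divisors only
  }
  TRUE
}

print(sapply(c(1, 2, 3, 4, 25, 29), is_prime))
```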

3. Program to display multiplication table.

num = as.integer(readline(prompt = "Enter a number: "))
for (i in 1:10) {
  print(paste(num, 'x', i, '=', num * i))
}

Output

Enter a number: 7
[1] "7 x 1 = 7"
[1] "7 x 2 = 14"
[1] "7 x 3 = 21"
[1] "7 x 4 = 28"
[1] "7 x 5 = 35"
[1] "7 x 6 = 42"
[1] "7 x 7 = 49"
[1] "7 x 8 = 56"
[1] "7 x 9 = 63"
[1] "7 x 10 = 70"

ITERATIVE CONTROL STRUCTURES

WHILE LOOP

In R programming, while loops are used to loop until a specific condition is met.

Syntax of while loop

while (test_expression)
statement

Here, test_expression is evaluated and the body of the loop is entered if the result is TRUE. The statements inside the loop are executed and the flow returns to evaluate the test_expression again. This is repeated each time until test_expression evaluates to FALSE, in which case, the loop exits.
Example of while Loop
i <- 1
while (i < 6) {
  print(i)
  i = i + 1
}
Output
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

1. Check whether the given number is Armstrong number or not.

num = as.integer(readline(prompt = "Enter a number: "))
sum = 0
temp = num
# add up the cubes of the digits (this form of the test applies to three-digit numbers)
while (temp > 0) {
  digit = temp %% 10
  sum = sum + (digit ^ 3)
  temp = floor(temp / 10)
}
if (num == sum) {
  print(paste(num, "is an Armstrong number"))
} else {
  print(paste(num, "is not an Armstrong number"))
}

OUTPUT
Enter a number: 23
[1] "23 is not an Armstrong number"

2. Find sum of natural numbers without formula.

num = as.integer(readline(prompt = "Enter a number: "))
if (num < 0) {
  print("Enter a positive number")
} else {
  sum = 0
  # use a while loop to iterate until num reaches zero
  while (num > 0) {
    sum = sum + num
    num = num - 1
  }
  print(paste("The sum is", sum))
}
OUTPUT

Enter a number: 10
[1] "The sum is 55"

3. Program to print Fibonacci Series

nterms = as.integer(readline(prompt = "How many terms? "))
n1 = 0
n2 = 1
count = 2
if (nterms <= 0) {
  print("Please enter a positive integer")
} else {
  if (nterms == 1) {
    print("Fibonacci sequence:")
    print(n1)
  } else {
    print("Fibonacci sequence:")
    print(n1)
    print(n2)
    while (count < nterms) {
      nth = n1 + n2
      print(nth)
      # update values
      n1 = n2
      n2 = nth
      count = count + 1
    }
  }
}

OUTPUT
How many terms? 7
[1] "Fibonacci sequence:"
[1] 0
[1] 1
[1] 1
[1] 2
[1] 3
[1] 5
[1] 8

RESULT
The process has been executed successfully.

EX.NO: 6
DATE:
CREATING MATRIX AND MANIPULATING MATRIX IN R.

AIM
To create and manipulate matrix in R

PROCEDURE

1. Open a new file
2. To create matrices, use the matrix() function
3. Specify the number of rows and the number of columns
4. nrow: the desired number of rows
5. ncol: the desired number of columns
6. Exit from the program

Creation of matrix
1. matrix1<- matrix ( data = 1, nrow = 3, ncol = 3)
>matrix1 <- matrix ( data = 1, nrow = 3, ncol = 3)
>matrix1
Sol
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 1
[3,] 1 1 1

2. vector8 <- 1:12
matrix3 <- matrix ( data = vector8 , nrow = 4)
>vector8 <- c(1:12)
>vector8
[1] 1 2 3 4 5 6 7 8 9 10 11 12
>matrix3 <- matrix ( data = vector8 , nrow = 4)
>matrix3
Sol
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12

3. v1<- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3)


>v1<- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3)
>v1

Sol
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9

4. v2<- matrix(1:8, ncol = 2)


Sol
>v2<- matrix(1:8, ncol = 2)
>v2
Sol
[,1] [,2]
[1,] 1 5
[2,] 2 6
[3,] 3 7
[4,] 4 8

1. matrix1 = matrix(1:9, nrow = 3)


matrix1 + 2
Sol:
>matrix1 = matrix(1:9, nrow = 3)
>matrix1

Sol
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
matrix1+2
[,1] [,2] [,3]
[1,] 3 6 9
[2,] 4 7 10
[3,] 5 8 11

Manipulation of Matrix
1. matrix1
>matrix1
Sol
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9

2. matrix1[1, 3]
>matrix1[1, 3]
Sol
[1] 7

3. matrix1[ 2, ]
Sol
> matrix1[ 2, ]
[1] 2 5 8

4. matrix1[,-2]
Sol
>matrix1[,-2]
[,1] [,2]
[1,] 1 7
[2,] 2 8
[3,] 3 9

5.matrix1[1, 1] = 15
Sol
>matrix1[1, 1] = 15
>matrix1
[,1] [,2] [,3]
[1,] 15 4 7
[2,] 2 5 8
[3,] 3 6 9

6.matrix1[ ,2 ] = 1
matrix1
Sol
[,1] [,2] [,3]
[1,] 15 1 7
[2,] 2 1 8
[3,] 3 1 9

7. matrix1[ ,2:3 ] = 2
Sol
>matrix1[ ,2:3 ] = 2
>matrix1
[,1] [,2] [,3]
[1,] 15 2 2
[2,] 2 2 2
[3,] 3 2 2

8. >m<-matrix(nrow=2,ncol=4,data=c(1,3,5,7,2,4,6,8) , byrow=TRUE)
>m
[,1] [,2] [,3] [,4]
[1,] 1 3 5 7
[2,] 2 4 6 8
9. Calculate Transpose.
>t(m)
Sol
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
[4,] 7 8

10. Calculate Inverse.


>solve(m)
Error in solve.default(m) : 'a' (2 x 4) must be square
>m<-matrix(nrow=3,ncol=3,data=c(1,3,5,7,2,4,6,8,9) , byrow=TRUE)
>m
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 7 2 4
[3,] 6 8 9
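The listing redefines m as a square 3 x 3 matrix but does not show the inverse itself. A minimal sketch of that step follows: solve() returns the inverse of a square, non-singular matrix, and multiplying the matrix by its inverse should give the identity matrix up to rounding error. The names m_inv and check are illustrative.

```r
m <- matrix(c(1, 3, 5, 7, 2, 4, 6, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)

# Inverse of the square matrix
m_inv <- solve(m)
print(m_inv)

# Verify: m %*% m_inv should be the 3 x 3 identity matrix (up to rounding)
check <- m %*% m_inv
print(round(check, 10))
```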

11. Calculate Determinant.


>det(m)
Sol
[1] 89

12. Calculate the Multiplication of the matrix.

>m1<-m%*%m
>m1
Sol
[,1] [,2] [,3]
[1,] 52 49 62
[2,] 45 57 79
[3,] 116 106 143

OUTPUT

RESULT
The process has been executed successfully

EX.NO: 7
DATE:

DECISION TREE
AIM
To build decision tree model and check its performance on titanic dataset
PROCEDURE
1. Start
2. Ensure data set is in current working directory
3. Read .CSV file and load into a data frame
4. Display the structure of a dataset
5. Pre-process the dataset and make the dataset ready for analysis
6. Fix randomization procedure with seed method
7. Sample 60% of data for training and 40% of the data for testing
8. Split data into training and testing
9. Install and import “rpart” package and library
10. Build decision tree model by defining “Survived” as target variable and remaining variables as independent variables
11. Evaluate the model with test data
12. Install and import “caret” package and library
13. Generate confusion matrix
14. Stop

PROGRAM

# load dataset
data = read.csv("titanic.csv")
# exploratory analysis on dataset
str(data)
head(data)
tail(data)
# count missing values (is.null() tests the whole object, not its cells)
sum(is.na(data))
summary(data)
# rename columns 6 and 7
colnames(data)[6:7] <- c("sib", "par")
str(data)
# drop column 3 from the dataset
data_new <- data[-c(3)]
str(data_new)
# split dataset into training (60%) and testing (remaining 40%)
set.seed(45)
train.index <- sample(row.names(data_new), floor(nrow(data_new) * 0.6))
test.index <- setdiff(row.names(data_new), train.index)
train <- data_new[train.index, ]
test <- data_new[test.index, ]
# Implement algorithm: full-grown classification tree
# install.packages(c("rpart", "rpart.plot", "caret")) # run once if not installed
library(rpart)
library(rpart.plot)
class.tree <- rpart(Survived ~ ., data = train, method = "class")
tree.pred.test <- predict(class.tree, test, type = "class")
library(caret)
# confusionMatrix() needs both arguments as factors with the same levels
confusionMatrix(tree.pred.test,
                factor(test$Survived, levels = levels(tree.pred.test)))
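The program above depends on a local titanic.csv file. To try the same steps without that file, the sketch below applies them to the built-in iris data set instead. This is a substitute illustration, not the exercise's data; the 60/40 split and seed mirror the program, and a plain table() stands in for caret's confusionMatrix().

```r
library(rpart)   # recommended package shipped with standard R installations

set.seed(45)
n <- nrow(iris)
# 60% of the rows for training, the remaining 40% for testing
train_idx <- sample(seq_len(n), size = round(0.6 * n))
train <- iris[train_idx, ]
test <- iris[-train_idx, ]

# Classification tree with Species as the target variable
fit <- rpart(Species ~ ., data = train, method = "class")
pred <- predict(fit, test, type = "class")

# Confusion matrix and overall accuracy from base R
cm <- table(Predicted = pred, Actual = test$Species)
print(cm)
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```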

OUTPUT

RESULT
Thus, the process has been executed successfully

EX.NO: 8

DATE:
K-NEAREST NEIGHBOUR (KNN) ALGORITHM
AIM
To build machine learning model using KNN and check its performance on iris dataset
PROCEDURE
1. Start
2. Load the pre-loaded dataset “iris” into workspace
3. Display the structure of “iris” dataset
4. Install and import “e1071” package and library
5. Install and import “caTools” package and library
6. Install and import “class” package and library
7. Sample and split 70% of data for training and 30% of data for testing
8. Scale features (columns) of training and testing except target column “Species”
9. Build the K-NN model and test it with test data by passing the training dataset, testing dataset, target column and number of nearest neighbours (5)
10. Build confusion matrix by passing actual and predicted values of test data
11. Stop

PROGRAM
# Loading data
data(iris)
# Structure
str(iris)
head(iris)
# Installing packages (run once)
install.packages("e1071")
install.packages("caTools")
install.packages("class")
# Importing libraries
library(e1071)
library(caTools)
library(class)
# Splitting data into train and test data
# sample.split() expects the label vector, not the whole data frame
set.seed(123)
split <- sample.split(iris$Species, SplitRatio = 0.7)
train_cl <- subset(iris, split == TRUE)
test_cl <- subset(iris, split == FALSE)
# Feature scaling: scale the test set with the training means and standard deviations
train_scale <- scale(train_cl[, 1:4])
test_scale <- scale(test_cl[, 1:4],
                    center = attr(train_scale, "scaled:center"),
                    scale = attr(train_scale, "scaled:scale"))
# Fitting KNN model to training dataset
classifier_knn <- knn(train = train_scale, test = test_scale,
                      cl = train_cl$Species, k = 5)
classifier_knn
# Confusion matrix
cm <- table(test_cl$Species, classifier_knn)
cm
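To summarise a confusion matrix in a single number, the overall accuracy can be computed as the share of test rows on the diagonal. The sketch below is self-contained and uses only the class package with a base R split in place of caTools; the seed and the variable names are assumptions made for illustration.

```r
library(class)   # provides knn(); shipped with standard R installations

set.seed(123)
# 70% of the rows for training, the rest for testing
idx <- sample(seq_len(nrow(iris)), size = round(0.7 * nrow(iris)))
train_x <- scale(iris[idx, 1:4])
# Reuse the training means and standard deviations for the test rows
test_x <- scale(iris[-idx, 1:4],
                center = attr(train_x, "scaled:center"),
                scale = attr(train_x, "scaled:scale"))

pred <- knn(train = train_x, test = test_x, cl = iris$Species[idx], k = 5)

# Confusion matrix: correct predictions lie on the diagonal
cm <- table(Actual = iris$Species[-idx], Predicted = pred)
print(cm)
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```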

OUTPUT

RESULT
Thus, the process has been executed successfully

EX.NO: 9
DATE:
NAÏVE BAYES ALGORITHM
AIM
To learn Naïve Bayes classifier and its implementation in R Programming

Naive Bayes is a supervised non-linear classification algorithm. Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features or variables. The Naive Bayes algorithm is called “Naive” because it assumes that the occurrence of a certain feature is independent of the occurrence of other features.

Theory

Naive Bayes algorithm is based on Bayes' theorem. Bayes' theorem gives the conditional probability of an event A given that another event B has occurred:

P(A|B) = [P(B|A) * P(A)] / P(B)

where,
P(A|B) = Conditional probability of A given B.
P(B|A) = Conditional probability of B given A.
P(A) = Probability of event A.
P(B) = Probability of event B.

For many predictors B1, B2, B3, B4, …, assumed to be independent of each other given A, we can formulate the posterior probability as follows:

P(A|B1, B2, B3, B4, …) = [P(B1|A) * P(B2|A) * P(B3|A) * P(B4|A) * … * P(A)] / P(B1, B2, B3, B4, …)

Example:
Consider a sample space:
{HH, HT, TH, TT}
where,
H: Head
T: Tail

P(Second coin being head given first coin is tail)
= P(A|B)
= [P(B|A) * P(A)] / P(B)
= [P(First coin is tail given second coin is head) * P(Second coin being head)] / P(First coin being tail)
= [(1/2) * (1/2)] / (1/2)
= (1/2)
= 0.5
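The coin example can be checked numerically by enumerating the four equally likely outcomes and computing each term of Bayes' theorem from counts. This is a minimal sketch; the variable names are assumptions chosen to match the symbols A (second coin is head) and B (first coin is tail).

```r
# Sample space {HH, HT, TH, TT} as a table of first and second tosses
space <- expand.grid(first = c("H", "T"), second = c("H", "T"),
                     stringsAsFactors = FALSE)

p_a <- mean(space$second == "H")                               # P(A)
p_b <- mean(space$first == "T")                                # P(B)
p_b_given_a <- mean(space$first[space$second == "H"] == "T")   # P(B|A)

# Bayes' theorem: P(A|B) = [P(B|A) * P(A)] / P(B)
p_a_given_b <- p_b_given_a * p_a / p_b
print(p_a_given_b)

# Direct count for comparison: share of heads on the second coin
# among outcomes whose first coin is a tail
direct <- mean(space$second[space$first == "T"] == "H")
print(direct)
```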

RESULT
Thus, the process has been executed successfully

EX.NO: 10
DATE:
RANDOM FOREST ALGORITHM
AIM
To build machine learning model using Random Forest and check its performance on
titanic dataset
PROCEDURE
1. Start
2. Ensure data set is in current working directory
3. Read .CSV file and load into a data frame
4. Display the structure of a dataset
5. Pre-process the dataset and make the dataset ready for analysis
6. Fix randomization procedure with seed method
7. Sample 60% of data for training and 40% of the data for testing
8. Split data into training and testing
9. Convert the variables “Survived” and “Pclass” of training and testing data into categorical using factor method
10. Install and import “randomForest” package and library
11. Build random forest model by defining “Survived” as target variable and remaining variables as independent variables
12. Evaluate the model with test data
13. Install and import “caret” package and library
14. Generate confusion matrix
15. Stop

PROGRAM
# load dataset
data = read.csv("titanic.csv")
# exploratory analysis on dataset
str(data)
head(data)
tail(data)
# count missing values (is.null() tests the whole object, not its cells)
sum(is.na(data))
summary(data)
# rename columns 6 and 7
colnames(data)[6:7] <- c("sib", "par")
str(data)
# drop column 3 from the dataset
data_new <- data[-c(3)]
str(data_new)
# split dataset into training (60%) and testing (remaining 40%)
set.seed(45)
train.index <- sample(row.names(data_new), floor(nrow(data_new) * 0.6))
test.index <- setdiff(row.names(data_new), train.index)
train <- data_new[train.index, ]
test <- data_new[test.index, ]
# Implement algorithm: convert target and class columns to factors
train$Survived <- as.factor(train$Survived)
train$Pclass <- as.factor(train$Pclass)
sapply(train, class)
test$Survived <- as.factor(test$Survived)
test$Pclass <- as.factor(test$Pclass)
sapply(test, class)
set.seed(1234)
# install.packages(c("randomForest", "caret")) # run once if not installed
library(randomForest)
RF_model1 <- randomForest(Survived ~ ., data = train, importance = TRUE)
RF_prediction <- predict(RF_model1, test)
library(caret)
conMat <- confusionMatrix(RF_prediction, test$Survived)
conMat

OUTPUT

RESULT
Thus, the process has been executed successfully

EX.NO: 11
DATE:

K-MEANS CLUSTERING ALGORITHM

AIM
To create 2-D plot and perform clustering based on the data age and spending

PROCEDURE
1. Start
2. Create a data frame with two columns using vector c()
3. Read .CSV file and load into a data frame
4. Define ggplot by passing dataframe, x-axis, and y-axis and plot the geometrical point in the graph
5. Install and import “cluster” package and library
6. Create clusters by passing dataframe, number of clusters and initial configuration value for the K means function
7. Display cluster results
8. Plot clusters
9. Stop

PROGRAM

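# install.packages(c("ggplot2", "cluster")) # run once if not installed
library(ggplot2)
library(cluster)
df <- data.frame(age = c(18, 21, 40, 24), spend = c(10, 11, 22, 15))
ggplot(df, aes(x = age, y = spend)) + geom_point()
# store the result under a new name so the kmeans() function is not masked
km <- kmeans(df, centers = 2, nstart = 20)
str(km)
clusplot(df, km$cluster, labels = 2, lines = 0)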
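The clustering step itself needs only base R, so the result can be inspected without ggplot2 or the cluster package. The sketch below reuses the four data points from the program; the seed and the name km are assumptions added for reproducibility and clarity.

```r
df <- data.frame(age = c(18, 21, 40, 24), spend = c(10, 11, 22, 15))

set.seed(1)
# Two clusters; nstart = 20 tries 20 random starts and keeps the best one
km <- kmeans(df, centers = 2, nstart = 20)

print(km$cluster)   # cluster label assigned to each row
print(km$centers)   # mean age and spend of each cluster
print(km$size)      # number of rows in each cluster
```

With these values the customer aged 40 is far from the other three, so it ends up in a cluster of its own.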

Output

RESULT
Thus, the process has been executed successfully

EX.NO: 12
DATE :

Hierarchical Cluster Analysis using R Programming

Aim:
To learn hierarchical cluster analysis using R programming.

Cluster analysis or clustering is a technique to find subgroups of data points within a data
set. The data points belonging to the same subgroup have similar features or properties.
Clustering is an unsupervised machine learning approach and has a wide variety of
applications such as market research, pattern recognition, recommendation systems, and so
on. The most common algorithms used for clustering are K-means clustering and
Hierarchical cluster analysis. In this exercise, we will learn about hierarchical cluster analysis
and its implementation in R programming.
Hierarchical cluster analysis (also known as hierarchical clustering) is a clustering
technique where clusters have a hierarchy or a predetermined order. Hierarchical clustering
can be represented by a tree-like structure called a Dendrogram. There are two types of
hierarchical clustering:
● Agglomerative hierarchical clustering: This is a bottom-up approach where each
data point starts in its own cluster and as one moves up the hierarchy, similar pairs of
clusters are merged.
● Divisive hierarchical clustering: This is a top-down approach where all data points
start in one cluster and as one moves down the hierarchy, clusters are split recursively.
To measure the similarity or dissimilarity between a pair of data points, we use distance
measures (Euclidean distance, Manhattan distance, etc.). However, to find the dissimilarity
between two clusters of observations, we use agglomeration methods. The most common
agglomeration methods are:
● Complete linkage clustering: It computes all pairwise dissimilarities between the
observations in two clusters, and considers the longest (maximum) distance between
two points as the distance between two clusters.
● Single linkage clustering: It computes all pairwise dissimilarities between the
observations in two clusters, and considers the shortest (minimum) distance as the
distance between two clusters.
● Average linkage clustering: It computes all pairwise dissimilarities between the
observations in two clusters, and considers the average distance as the distance
between two clusters.
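The three linkage definitions above can be illustrated on two tiny clusters by computing all pairwise distances between them and taking the maximum, minimum and mean. The points below are made-up values chosen so that the distances are easy to verify by hand.

```r
# Cluster a: (0,0) and (1,0); cluster b: (4,0) and (6,0)
a <- rbind(c(0, 0), c(1, 0))
b <- rbind(c(4, 0), c(6, 0))

# Full Euclidean distance matrix, then keep only the a-to-b block
d_all <- as.matrix(dist(rbind(a, b), method = "euclidean"))
d_ab <- d_all[1:2, 3:4]
print(d_ab)

complete_link <- max(d_ab)    # longest pairwise distance
single_link <- min(d_ab)      # shortest pairwise distance
average_link <- mean(d_ab)    # average pairwise distance
print(c(complete = complete_link, single = single_link, average = average_link))
```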
Performing Hierarchical Cluster Analysis using R
For computing hierarchical clustering in R, the commonly used functions are as follows:
● hclust in the stats package and agnes in the cluster package for agglomerative
hierarchical clustering.
● diana in the cluster package for divisive hierarchical clustering.
We will use the Iris flower data set from the datasets package in our implementation. We will use the sepal length, sepal width, petal length, and petal width columns as our data points. First, we load and normalize the data. Then the dissimilarity values are computed with the dist function and these values are fed to clustering functions for performing hierarchical clustering.

# Load required packages


library(datasets) # contains iris dataset
library(cluster) # clustering algorithms
library(factoextra) # visualization
library(purrr) # to use map_dbl() function

# Load and preprocess the dataset


df <- iris[, 1:4]
df <- na.omit(df)
df <- scale(df)

# Dissimilarity matrix
d <- dist(df, method = "euclidean")
Agglomerative hierarchical clustering implementation
The dissimilarity matrix obtained is fed to hclust. The method parameter of hclust specifies the agglomeration method to be used (i.e. complete, average, single). We can then plot the dendrogram.

# Hierarchical clustering using Complete Linkage

hc1 <- hclust(d, method = "complete" )

# Plot the obtained dendrogram


plot(hc1, cex = 0.6, hang = -1)
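A dendrogram becomes a concrete grouping once the tree is cut at a chosen number of clusters. The sketch below uses cutree() from the stats package for that step and cross-tabulates the groups against the known species; choosing k = 3 is an assumption based on iris having three species.

```r
# Same preprocessing as above, using base R only
df <- scale(na.omit(iris[, 1:4]))
d <- dist(df, method = "euclidean")
hc1 <- hclust(d, method = "complete")

# Cut the tree into 3 groups and compare them with the species labels
groups <- cutree(hc1, k = 3)
print(table(groups, iris$Species))
```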

Output:

Result:
Hierarchical clustering with R programming is learnt with an example.

EX.NO: 13
DATE:
HISTOGRAM REPRESENTATION USING IRIS DATASET

Aim: Descriptive statistics in R

a. Write an R script to find basic descriptive statistics using the summary(), str() and quantile() functions on any dataset.
b. Write an R script to find a subset of a dataset by using the subset() and aggregate() functions on the iris dataset.

Sol:

This chapter shows examples of data exploration with R. It starts with inspecting the dimensionality, structure and data of an R object, followed by basic statistics and various charts like pie charts and histograms. Exploration of multiple variables is then demonstrated, including grouped distribution, grouped boxplots, scatter plot and pairs plot. After that, examples are given on level plot, contour plot and 3D plot. It also shows how to save charts into files of various formats.

1.1 Have a Look at Data

The iris data is used in this chapter for demonstration of data exploration with R. See Section 1.3.1 for details of the iris data.

We first check the size and structure of data. The dimension and names of data can be
obtained respectively with dim() and names(). Functions str() and attributes() return the
structure and attributes of data.

> dim(iris)
[1] 150 5

> names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

> str(iris)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

> iris[1:5,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa

head(iris)

Sepal.LengthSepal.WidthPetal.LengthPetal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa

tail(iris)

Sepal.LengthSepal.WidthPetal.LengthPetal.Width Species
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica

summary(iris)

Sepal.LengthSepal.WidthPetal.LengthPetal.Width Species
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100setosa :50
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 versicolor:50
Median :5.800 Median :3.000 Median :4.350 Median :1.300 virginica :50
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500

var(iris$Sepal.Length)

[1] 0.6856935
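The aim of this exercise also asks for quartiles, which the listing does not show. A minimal sketch, assuming base R's quantile() and IQR() are the functions intended, could look like this:

```r
# Quartiles of sepal length with base R's quantile()
q <- quantile(iris$Sepal.Length)
print(q)

# Interquartile range: third quartile minus first quartile
iqr <- IQR(iris$Sepal.Length)
print(iqr)
```

The 25%, 50% and 75% entries correspond to the 1st Qu., Median and 3rd Qu. columns that summary() reports.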

hist(iris$Sepal.Length)

plot(density(iris$Sepal.Length))

pie(table(iris$Species))

barplot(table(iris$Species))

Explore Multiple Variables

After checking the distributions of individual variables, we then investigate the relationships
between two variables. Below we calculate covariance and correlation between variables with
cov() and cor().

cov(iris$Sepal.Length, iris$Petal.Length)
[1] 1.274315

cov(iris[,1:4])
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    0.6856935  -0.0424340    1.2743154   0.5162707
Sepal.Width    -0.0424340   0.1899794   -0.3296564  -0.1216394
Petal.Length    1.2743154  -0.3296564    3.1162779   1.2956094
Petal.Width     0.5162707  -0.1216394    1.2956094   0.5810063
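The text names cor() alongside cov(), but only covariance is shown. The following sketch, which uses nothing beyond base R and the iris data, illustrates how correlation relates to covariance:

```r
# Correlation between sepal length and petal length
r <- cor(iris$Sepal.Length, iris$Petal.Length)
print(r)

# Correlation is covariance rescaled by both standard deviations
r_manual <- cov(iris$Sepal.Length, iris$Petal.Length) /
  (sd(iris$Sepal.Length) * sd(iris$Petal.Length))
print(all.equal(r, r_manual))

# Full correlation matrix of the four numeric columns
cor_matrix <- cor(iris[, 1:4])
print(round(cor_matrix, 3))
```

Unlike covariance, correlation always lies between -1 and +1, so values for different pairs of variables can be compared directly.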

aggregate(Sepal.Length ~ Species, summary, data=iris)
     Species Sepal.Length.Min. Sepal.Length.1st Qu. Sepal.Length.Median
1     setosa             4.300                4.800               5.000
2 versicolor             4.900                5.600               5.900
3  virginica             4.900                6.225               6.500
  Sepal.Length.Mean Sepal.Length.3rd Qu. Sepal.Length.Max.
1             5.006                5.200             5.800
2             5.936                6.300             7.000
3             6.588                6.900             7.900
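Part (b) of the aim also mentions subset(), which is not demonstrated above. A small sketch follows; the 5.5 cut-off and the chosen columns are arbitrary values picked for illustration:

```r
# Rows of one species whose sepal length exceeds a threshold
long_setosa <- subset(iris, Species == "setosa" & Sepal.Length > 5.5,
                      select = c(Sepal.Length, Sepal.Width, Species))
print(long_setosa)

# Mean of every numeric column per species with aggregate()
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```

subset() filters rows with a logical condition and columns with select, while aggregate() collapses each group to one row.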

RESULT
Thus, the process has been executed successfully

EX.NO: 14
DATE :
Implementation of various charts

Aim:- To implement various charts in R programming

Procedure
Bar Plot or Bar Chart
A bar plot (bar chart) in R represents the values of a data vector as the heights of bars.
The data vector passed to the function is plotted along the y-axis of the graph. A bar chart can
also show frequencies, much like a histogram, when it is given the output of table() instead of a
raw data vector.
Syntax: barplot(data, xlab, ylab)
where:
● data is the data vector to be represented on y-axis
● xlab is the label given to x-axis
● ylab is the label given to y-axis
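The remark that a bar chart can show frequencies through table() can be sketched as follows. The grades vector is hypothetical data invented for this illustration:

```r
# Hypothetical categorical data: grades awarded to 12 students
grades <- c("A", "B", "A", "C", "B", "B", "A", "C", "B", "A", "B", "D")

# table() counts how often each category occurs
grade_counts <- table(grades)
print(grade_counts)

# Passing the counts to barplot() draws one bar per category
barplot(grade_counts, xlab = "Grade", ylab = "Number of students")
```

Here the bar heights are the counts computed by table(), not the raw values of the vector.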
Program:
# defining vector
x <- c(7, 15, 23, 12, 44, 56, 32)
# output to be present as PNG file
png(file = "barplot.png")
# plotting vector
barplot(x, xlab = "GeeksforGeeks Audience",
ylab = "Count", col = "white",
col.axis = "darkgreen",
col.lab = "darkgreen")
# saving the file
dev.off()
Output:

Pie Diagram or Pie Chart
A pie chart is a circular chart divided into segments in proportion to the data
provided. The whole pie represents 100%, and each segment shows its share of the whole.
It is another way to represent statistical data in graphical form; the pie() function is used to
draw it.
Syntax: pie(x, labels, col, main, radius)
where,
● x is the data vector
● labels gives the names of the slices
● col sets the fill color of the slices
● main gives the title of the pie chart
● radius sets the radius of the pie chart. It can be between -1 and +1
Program:
# defining vector x with number of articles
x <- c(210, 450, 250, 100, 50, 90)
# defining labels for each value in x
names(x) <- c("Algo", "DS", "Java", "C", "C++", "Python")
# output to be present as PNG file
png(file = "piechart.png")
# creating pie chart
pie(x, labels = names(x), col = "white",
    main = "Articles on GeeksforGeeks", radius = -1,
    col.main = "darkgreen")
# saving the file
dev.off()
Output:
Output:

Histogram
A histogram is a graph whose bars represent the
frequency of grouped data in a vector. It looks like a bar chart; the only difference
is that a histogram shows how many values fall into each group (bin) rather than the values themselves.
Syntax: hist(x, col, border, main, xlab, ylab)
where:
● x is data vector
● col specifies the color of the bars to be filled
● border specifies the color of border of bars
● main specifies the title name of histogram
● xlab specifies the x-axis label
● ylab specifies the y-axis label
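To make the idea of "frequency of grouped data" concrete, the sketch below counts how many values fall into each bin. The bin edges (0, 20, ..., 100) are an assumption chosen here so that the grouping is easy to follow:

```r
x <- c(21, 23, 56, 90, 20, 7, 94, 12,
       57, 76, 69, 45, 34, 32, 49, 55, 57)

# Explicit bin edges (chosen for illustration)
edges <- seq(0, 100, by = 20)

# cut() assigns each value to a bin; table() counts values per bin
bin_counts <- table(cut(x, breaks = edges))
print(bin_counts)

# hist() performs the same grouping; plot = FALSE returns the counts only
h <- hist(x, breaks = edges, plot = FALSE)
print(h$counts)
```

The counts returned by hist() are the bar heights that would be drawn if plot were left at its default of TRUE.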
Program:
# defining vector
x <- c(21, 23, 56, 90, 20, 7, 94, 12,
       57, 76, 69, 45, 34, 32, 49, 55, 57)
# output to be present as PNG file
png(file = "hist.png")
# plotting the histogram
hist(x, main = "Histogram of Vector x",
     xlab = "Values",
     col.lab = "darkgreen",
     col.main = "darkgreen")
# saving the file
dev.off()
Output:

Scatter Plot
A Scatter plot is another type of graphical representation used to plot the points to show
relationship between two data vectors. One of the data vectors is represented on x-axis and
another on y-axis.
Syntax: plot(x, y, type, xlab, ylab, main)
Where,
● x is the data vector represented on x-axis
● y is the data vector represented on y-axis
● type specifies the type of plot to be drawn. For example, “l” for lines, “p” for points,
“s” for stair steps, etc.
● xlab specifies the label for x-axis
● ylab specifies the label for y-axis
● main specifies the title name of the graph
Program:
# taking input from dataset Orange already
# present in R
orange <- Orange[, c('age', 'circumference')]
# output to be present as PNG file
png(file = "plot.png")
# plotting
plot(x = orange$age, y = orange$circumference, xlab = "Age",
ylab = "Circumference", main = "Age VS Circumference",

col.lab = "darkgreen", col.main = "darkgreen",
col.axis = "darkgreen")
# saving the file
dev.off()
Output:

Box Plot
A box plot shows how the data is distributed in the data vector. It represents five values in the
graph: the minimum, first quartile, second quartile (median), third quartile and maximum
value of the data vector.
Syntax: boxplot(x, xlab, ylab, notch)
where,
● x specifies the data vector
● xlab specifies the label for x-axis
● ylab specifies the label for y-axis
● notch, if TRUE then creates notch on both the sides of the box
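The five values that a box plot draws can be computed directly. The sketch below uses base R's fivenum() and boxplot.stats() on the employee ages used in this exercise; note that R's box plot uses Tukey's hinges, which are close to but not always identical to the quartiles:

```r
x <- c(42, 21, 22, 24, 25, 30, 29, 22,
       23, 23, 24, 28, 32, 45, 39, 40)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(x)
print(five)

# boxplot.stats() returns the values boxplot() actually draws
stats <- boxplot.stats(x)
print(stats$stats)  # whisker ends, hinges and median
print(stats$out)    # points beyond the whiskers (outliers), if any
```

For this vector there are no outliers, so the whisker ends coincide with the minimum and maximum.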

Program:
# defining vector with ages of employees
x <- c(42, 21, 22, 24, 25, 30, 29, 22,
       23, 23, 24, 28, 32, 45, 39, 40)
# output to be present as PNG file
png(file = "boxplot.png")
# plotting
boxplot(x, xlab = "Box Plot", ylab = "Age",
        col.axis = "darkgreen", col.lab = "darkgreen")
# saving the file
dev.off()

Output:

Result:
Implementation of various charts in R programming is learnt.

EX.NO: 15
DATE :
Predictive Analysis in R Programming
Aim:
To learn about predictive analysis and its applications in R programming.

Predictive analysis in R is a branch of analysis that applies statistical operations to
historical facts in order to predict future events. It is a common term in data
mining and machine learning. Methods such as time series analysis and non-linear least squares
are used in predictive analysis. Predictive analytics can help many businesses: it
finds the relationships within the collected data and, based on those relationships, predicts
patterns. This allows businesses to create predictive intelligence.

Process of Predictive Analysis


Predictive analysis consists of 7 processes as follows:
● Define project: Defining the project, scope, objectives and result.
● Data collection: Data is collected through data mining providing a complete view of
customer interactions.
● Data Analysis: It is the process of cleaning, inspecting, transforming and modelling
the data.
● Statistics: This process enables validating the assumptions and testing the statistical
models.
● Modelling: Predictive models are generated using statistics and the most optimized
model is used for the deployment.
● Deployment: The predictive model is deployed to automate the production of
everyday decision-making results.
● Model monitoring: Keep monitoring the model to review performance which ensures
expected results.
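The steps above can be sketched on a small scale. The example below is only an illustration under stated assumptions: it uses R's built-in cars data, a simple linear regression as the predictive model, and an arbitrary 80/20 split, none of which come from this manual:

```r
# Data collection: the built-in 'cars' data (speed vs stopping distance)
set.seed(42)                                  # reproducible split
n <- nrow(cars)
train_rows <- sample(n, size = round(0.8 * n))
train <- cars[train_rows, ]
test  <- cars[-train_rows, ]

# Modelling: fit a simple linear regression on the training rows
fit <- lm(dist ~ speed, data = train)

# Deployment: predict stopping distance for unseen rows
pred <- predict(fit, newdata = test)

# Model monitoring: root-mean-square error on the held-out rows
rmse <- sqrt(mean((test$dist - pred)^2))
print(rmse)
```

Keeping some rows out of the fit gives an honest estimate of how the model behaves on data it has not seen.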
Need for Predictive Analysis
● Understanding customer behavior: Predictive analysis uses data mining to
extract the attributes and behavior of customers. It also finds out the interests of the
customers so that a business can promote the products that are most
likely to be bought.
● Gain a competitive edge in the market: With predictive analysis, businesses or companies
can grow faster and stand out from competing businesses by
finding out their weaknesses and strengths.
● Learn new opportunities to increase revenue: Companies can create new offers or
discounts based on customers' buying patterns, which increases revenue.
● Find areas of weakening: Using these methods, companies can win back lost
customers by finding out which past actions taken by the company customers
didn’t like.
Applications of Predictive Analysis
● Health care: Predictive analysis can be used to examine the history of a patient and
thus determine the risks.
● Financial modelling: Financial modelling is another area where predictive analysis
plays a major role, finding trending stocks and helping the business in its
decision-making process.
● Customer Relationship Management: Predictive analysis helps firms in creating
marketing campaigns and customer services based on the analysis produced by the
predictive algorithms.
● Risk Analysis: While forecasting the campaigns, predictive analysis can show an
estimation of profit and helps in evaluating the risks too.
Example:
Let us take an example of time series analysis, which is a method of predictive analysis in R
programming:
x <- c(580, 7813, 28266, 59287, 75700,
       87820, 95314, 126214, 218843, 471497,
       936851, 1508725, 2072113)
# library required for decimal_date() function
library(lubridate)
# output to be created as png file
png(file = "predictiveAnalysis.png")
# creating time series object
# from date 22 January, 2020
mts <- ts(x, start = decimal_date(ymd("2020-01-22")),
          frequency = 365.25 / 7)
# plotting the graph
plot(mts, xlab = "Weekly Data of sales",
     ylab = "Total Revenue",
     main = "Sales vs Revenue",
     col.main = "darkgreen")
# saving the file
dev.off()
Output:
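To see what the start and frequency arguments of ts() encode, the sketch below rebuilds the same weekly series with base R only. The decimal start date is computed by hand as an approximate stand-in for decimal_date(ymd("2020-01-22")); that substitution is an assumption made so the example runs without lubridate:

```r
x <- c(580, 7813, 28266, 59287, 75700,
       87820, 95314, 126214, 218843, 471497,
       936851, 1508725, 2072113)

# 22 January is 21 days into the leap year 2020
start_day <- as.Date("2020-01-22")
days_elapsed <- as.numeric(start_day - as.Date("2020-01-01"))
start_decimal <- 2020 + days_elapsed / 366

# Weekly series: about 52.18 observations per year
mts <- ts(x, start = start_decimal, frequency = 365.25 / 7)

print(length(mts))       # number of weekly observations
print(deltat(mts))       # spacing between observations, in years
print(range(time(mts)))  # first and last time stamps as decimal years
```

The frequency of 365.25 / 7 tells ts() that consecutive values are one week apart on a yearly time axis.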

Forecasting Data:
Now we forecast sales and revenue based on the historical data.

x <- c(580, 7813, 28266, 59287, 75700,
       87820, 95314, 126214, 218843,
       471497, 936851, 1508725, 2072113)
# library required for decimal_date() function
library(lubridate)
# library required for forecasting
library(forecast)
# output to be created as png file
png(file = "forecastSalesRevenue.png")
# creating time series object
# from date 22 January, 2020
mts <- ts(x, start = decimal_date(ymd("2020-01-22")),
          frequency = 365.25 / 7)
# forecasting model using arima model
fit <- auto.arima(mts)
# next 5 forecasted values (computed once and reused for the plot)
fc <- forecast(fit, h = 5)
print(fc)
# plotting the graph with next
# 5 weekly forecasted values
plot(fc, xlab = "Weekly Data of Sales",
     ylab = "Total Revenue",
     main = "Sales vs Revenue", col.main = "darkgreen")
# saving the file
dev.off()

Output:

Result:
Predictive analysis and its applications in R programming are learnt with examples.

EX.NO: 16
DATE :
OPERATIONS ON LISTS IN R.
AIM
To perform operations on lists using R.

PROCEDURE
A list is a data structure having components of mixed data types.

Creating a list
A list can be created using the list() function.
> x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)
Here, we create a list x of three components with data types double, logical and integer vector
respectively.
Its structure can be examined with the str() function.
> str(x)
We can create the same list without the tags as follows. In such a scenario, numeric indices are
used by default.
> x <- list(2.5, TRUE, 1:3)
> x
Program 1
p <- c(2, 7, 8)
q <- c("A", "B", "C")
x <- list(p, q)
What is the value of x[2]?
Sol:
> p <- c(2, 7, 8)
> q <- c("A", "B", "C")
> x <- list(p, q)
> x[2]
[[1]]
[1] "A" "B" "C"
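The answer above is printed with a [[1]] header because single brackets return a list. A short sketch of the difference between [ ] and [[ ]] on the same data:

```r
p <- c(2, 7, 8)
q <- c("A", "B", "C")
x <- list(p, q)

# Single brackets keep the list wrapper: the result is a list of length 1
sub_list <- x[2]
print(class(sub_list))

# Double brackets extract the component itself: a character vector
component <- x[[2]]
print(class(component))

# Elements of the component can then be indexed directly
print(x[[2]][1])
```

This is why later programs in this exercise use x[[2]][1] to reach a single element inside a list component.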

II Given
w <- c(2, 7, 8)
v <- c("A", "B", "C")
x <- list(w, v)
which R statement will replace "A" in x with "K"?
Sol:
> w <- c(2, 7, 8)
> v <- c("A", "B", "C")
> x <- list(w, v)
> x[[2]][1] <- "K"
> x
[[1]]
[1] 2 7 8

[[2]]
[1] "K" "B" "C"

III If a <- list("x" = 5, "y" = 10, "z" = 15), which R statement will give the sum of all elements in a?
Sol:
> a <- list("x" = 5, "y" = 10, "z" = 15)
> sum(unlist(a))
[1] 30
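sum() cannot be applied to a list directly, which is why unlist() is needed here. As a hedged aside, two other base R functions that work on list components are sketched below:

```r
a <- list(x = 5, y = 10, z = 15)

# unlist() flattens the list into a named numeric vector before summing
total <- sum(unlist(a))

# Reduce() adds the components pairwise without flattening first
total_reduce <- Reduce(`+`, a)

# sapply() applies a function to every component and simplifies the result
doubled <- sapply(a, function(v) v * 2)

print(total)
print(total_reduce)
print(doubled)
```

sapply() keeps the component names, so the result is a named vector with entries x, y and z.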

IV If Newlist <- list(a = 1:10, b = "Good morning", c = "Hi"), write an R statement that will add 1 to
each element of the first vector in Newlist.
Sol:
> Newlist <- list(a = 1:10, b = "Good morning", c = "Hi")
> Newlist$a <- Newlist$a + 1
> Newlist
$a
[1] 2 3 4 5 6 7 8 9 10 11

$b
[1] "Good morning"

$c
[1] "Hi"

V If b <- list(a = 1:10, c = "Hello", d = "AA"), write an R expression that will give all
elements, except the second, of the first vector of b.
Sol:
> b <- list(a = 1:10, c = "Hello", d = "AA")
> b$a[-2]
[1] 1 3 4 5 6 7 8 9 10

VI Let x <- list(a = 5:10, c = "Hello", d = "AA"), write an R statement to add a new item
z = "NewItem" to the list x.
Sol:
> x <- list(a = 5:10, c = "Hello", d = "AA")
> x$z <- "NewItem"
> x
$a
[1] 5 6 7 8 9 10

$c
[1] "Hello"

$d
[1] "AA"

$z
[1] "NewItem"

VII Consider y <- list("a", "b", "c"), write an R statement that will assign new names
"one", "two" and "three" to the elements of y.
Sol:
> y <- list("a", "b", "c")
> names(y) <- c("one", "two", "three")
> y
$one
[1] "a"

$two
[1] "b"

$three
[1] "c"

VIII If x <- list(y = 1:10, t = "Hello", f = "TT", r = 5:20), write an R statement that will give the
length of vector r of x.
Sol:
> x <- list(y = 1:10, t = "Hello", f = "TT", r = 5:20)
> length(x$r)
[1] 16

IX Let string <- "Grand Opening". Write an R statement to split this string into two and return
the output shown below.
Sol:
> string <- "Grand Opening"
> a <- strsplit(string, " ")
> list(a[[1]][1], a[[1]][2])
[[1]]
[1] "Grand"

[[2]]
[1] "Opening"

OUTPUT:

RESULT: The process has been executed successfully

EX.NO: 17
DATE:
BUILT-IN FUNCTIONS IN R
AIM
To work with built-in functions in R.

PROCEDURE
Built-in Functions
Almost everything in R is done through functions. Here I'm only referring to numeric and
character functions that are commonly used in creating or recoding variables.
Numeric Functions

Function                  Description
abs(x)                    absolute value
sqrt(x)                   square root
ceiling(x)                ceiling(3.475) is 4
floor(x)                  floor(3.475) is 3
trunc(x)                  trunc(5.99) is 5
round(x, digits=n)        round(3.475, digits=2) is 3.48
signif(x, digits=n)       signif(3.475, digits=2) is 3.5
cos(x), sin(x), tan(x)    also acos(x), cosh(x), acosh(x), etc.
log(x)                    natural logarithm
log10(x)                  common logarithm
exp(x)                    e^x
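A few of the functions in the table are easy to confuse, in particular floor(), ceiling() and trunc() on negative numbers, and round() versus signif(). The input values in this sketch are arbitrary and chosen only to make the differences visible:

```r
x <- -5.99

print(ceiling(x))   # smallest integer not less than x
print(floor(x))     # largest integer not greater than x
print(trunc(x))     # drops the fractional part, moving toward zero

# signif() keeps significant digits; round() keeps decimal places
print(signif(123456, digits = 2))
print(round(123.456, digits = 1))

# log() and exp() are inverses; log10() is the base-10 logarithm
print(log(exp(2)))
print(log10(1000))
```

For positive numbers trunc() and floor() agree; they differ only when x is negative.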

1. Calculate the cumulative sum ('running total') of the numbers 2, 3, 4, 5, 6. Hint: use the
cumsum() function.

Sol:
> sum(2:6)
[1] 20
> cumsum(2:6)
[1] 2 5 9 14 20

2. Print the numbers 1 to 10 in reverse order. Hint: use the rev() function.
Sol:
> rev(1:10)
[1] 10 9 8 7 6 5 4 3 2 1

3. Calculate the cumulative sum of those numbers, but in reverse order.
Sol:
> rev(cumsum(1:10))
[1] 55 45 36 28 21 15 10 6 3 1

4. Find 10 random numbers between 0 and 100. (Hint: you can use the sample() function)
Sol:
> sample(0:100, 10) # size = 10 draws ten values; the result differs on every run

Note: without the size argument, sample(1:100) returns all 100 values in random order, for
example:
> sample(1:100)
[1] 92 86 59 88 19 2 37 23 89 29 18 87 15 30 32 63 14 75
[19] 12 49 72 66 24 20 54 68 48 69 5 99 22 61 83 90 7 94
[37] 81 3 84 43 26 82 80 53 41 27 71 9 38 1 47 10 51 40
[55] 46 44 13 45 100 34 42 79 6 96 4 97 57 28 73 95 91 65
[73] 93 58 39 8 16 17 78 60 36 35 74 85 55 31 76 25 98 70
[91] 33 77 21 56 52 67 50 62 11 64

5. Calculate and verify the value of x where x = 5, 5*x -> x, x
Sol:
> x <- 5
> 5*x -> x
> x
[1] 25

6. Compute log to the base 10 (log10) of the sqrt of 100. Do not use variables.
Sol:
> log10(sqrt(100))
[1] 1

OUTPUT

RESULT
This process has been executed successfully

EX.NO: 18
DATE:
PROGRAM TO CONVERT A GIVEN PH LEVELS OF SOIL TO AN ORDERED
FACTOR.

AIM

Write an R program to convert given pH levels of soil to an ordered factor.

PROCEDURE

1. Open a new file.

2. Declare a variable ph with column data.

3. Find the factors for the levels.

4. Store the result in the ph_f variable.

5. Display the operation result.

6. Exit from the program.

Note: Soil pH is a measure of the acidity or basicity of a soil. pH is defined as the negative
logarithm of the activity of hydronium ions in a solution. In soils, it is measured in a slurry of
soil mixed with water, and normally falls between 3 and 10, with 7 being neutral.

PROGRAM

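# pH readings of the soil samples
ph = c(1,3,10,7,5,4,3,7,8,7,5,3,10,10,7)
print("Original data:")
print(ph)
# convert to an ordered factor with levels 3 < 7 < 10;
# readings that are not one of these levels become NA
ph_f = factor(ph,levels=c(3,7,10),ordered=TRUE)
print("pH levels of soil to an ordered factor:")
print(ph_f)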
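Because only 3, 7 and 10 are listed as levels, every other reading in ph turns into NA. If every reading should receive a category, one possible alternative is to bin the values with cut(). The cut-offs 6.5 and 7.5 and the band names below are illustrative assumptions, not part of the original program:

```r
ph <- c(1, 3, 10, 7, 5, 4, 3, 7, 8, 7, 5, 3, 10, 10, 7)

# With levels = c(3, 7, 10), readings that are not 3, 7 or 10 become NA
ph_f <- factor(ph, levels = c(3, 7, 10), ordered = TRUE)
print(sum(is.na(ph_f)))

# Alternative: bin every reading into an ordered band
ph_band <- cut(ph, breaks = c(0, 6.5, 7.5, 14),
               labels = c("acidic", "neutral", "alkaline"),
               ordered_result = TRUE)
print(table(ph_band))

# Ordered factors support comparisons between levels
print(ph_band[1] < ph_band[3])
```

With an ordered factor, comparisons such as < and > follow the order of the levels rather than alphabetical order.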

OUTPUT:

RESULT
This process has been executed successfully

EX.NO: 19
DATE:
CREATING AND MANIPULATING A VECTOR IN R.

AIM
To create and manipulate a vector in R

PROCEDURE
Creating Vector
Vectors are generally created using the c() function. Since a vector must have elements of the
same type, this function will try to coerce elements to the same type if they are
different. Coercion goes from lower to higher types: from logical to integer to double to character.

> x <- c(1, 5, 4, 9, 0)
> typeof(x)
[1] "double"
> length(x)
[1] 5
> x <- c(1, 5.4, TRUE, "hello")
> x
[1] "1" "5.4" "TRUE" "hello"
> typeof(x)
[1] "character"
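The coercion order described above (logical, integer, double, character) can be observed step by step. The vectors in this sketch are made up for the illustration:

```r
# Each step adds an element of a "higher" type, which coerces the whole vector
v1 <- c(TRUE, FALSE)           # logical
v2 <- c(v1, 2L)                # logical + integer
v3 <- c(v2, 3.5)               # integer + double
v4 <- c(v3, "four")            # double + character

print(typeof(v1))
print(typeof(v2))
print(typeof(v3))
print(typeof(v4))

# TRUE is coerced to 1 and FALSE to 0 when mixed with numbers
print(v3)
```

Once a character element is present, every element of the vector is stored as text.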
If we want to create a vector of consecutive numbers, the : operator is very helpful.

1. Creating a vector using the : operator
> x <- 1:7; x
[1] 1 2 3 4 5 6 7
> y <- 2:-2; y
[1] 2 1 0 -1 -2
More complex sequences can be created using the seq() function, like defining the number of
points in an interval, or the step size.

2. Creating a vector using the seq() function
> seq(1, 3, by=0.2) # specify step size
[1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0
> seq(1, 5, length.out=4) # specify length of the vector
[1] 1.000000 2.333333 3.666667 5.000000

EXERCISE – I
1. Consider two vectors x and y:
x = c(4,6,5,7,10,9,4,15)
y = c(0,10,1,8,2,3,4,1)
What is the value of x*y and x+y?
Sol:
> x <- c(4,6,5,7,10,9,4,15)
> y <- c(0,10,1,8,2,3,4,1)
> x
[1] 4 6 5 7 10 9 4 15
> y
[1] 0 10 1 8 2 3 4 1
> x*y
[1] 0 60 5 56 20 27 16 15
> x+y
[1] 4 16 6 15 12 12 8 16

2. Consider two vectors a and b:
a = c(1,5,4,3,6)
b = c(3,5,2,1,9)
What is the value of a<=b?
Sol:
> a <- c(1,5,4,3,6)
> b <- c(3,5,2,1,9)
> a<=b
[1] TRUE TRUE FALSE FALSE TRUE

3. If x = c(1:12), what is the value of dim(x)? What is the value of length(x)?
Sol:
> x <- c(1:12)
> dim(x)
NULL
> length(x)
[1] 12

4. If a = c(12:5), what is the value of is.numeric(a)?
Sol:
> a <- c(12:5)
> typeof(a)
[1] "integer"
> is.numeric(a)
[1] TRUE

5. Consider two vectors x and y:
x = letters[1:10]
y = letters[15:24]
What is the value of x<y?
Sol:
> x <- letters[1:10]
> y <- letters[15:24]
> x
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
> y
[1] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
> x<y
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

6. If x = c('blue', 'red', 'green', 'yellow'), what is the value of is.character(x)?
Sol:
> x <- c('blue', 'red', 'green', 'yellow')
> typeof(x)
[1] "character"
> is.character(x)
[1] TRUE

7. If x = c('blue', 10, 'green', 20), what is the value of is.character(x)?
Sol:
> x <- c('blue', 10, 'green', 20)
> typeof(x)
[1] "character"
> is.character(x)
[1] TRUE

8. Consider two vectors a and b:
a = c(10,2,4,15)
b = c(3,12,4,11)
What is the value of rbind(a,b)?
Sol:
> a <- c(10,2,4,15)
> b <- c(3,12,4,11)
> a
[1] 10 2 4 15
> b
[1] 3 12 4 11
> rbind(a,b)
  [,1] [,2] [,3] [,4]
a   10    2    4   15
b    3   12    4   11

9. Consider two vectors a and b:
a = c(1,2,4,5,6)
b = c(3,2,4,1,9)
What is the value of cbind(a,b)?
Sol:
> a = c(1,2,4,5,6)
> b = c(3,2,4,1,9)
> cbind(a,b)
     a b
[1,] 1 3
[2,] 2 2
[3,] 4 4
[4,] 5 1
[5,] 6 9

EXERCISE - II

1. The numbers below are the first ten days of rainfall amounts in 1996. Read them
into a vector using the c() function: 0.1, 0.6, 33.8, 1.9, 9.6, 4.3, 33.7, 0.3, 0.0, 0.1
Sol:
> rainfall <- c(0.1,0.6,33.8,1.9,9.6,4.3,33.7,0.3,0.0,0.1)
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1

2. Inspect the table and answer the following questions:
What was the mean rainfall? How about the standard deviation?
Sol:
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
> mean(rainfall)
[1] 8.44
> sd(rainfall)
[1] 13.66473

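3. Calculate the cumulative rainfall ('running total') over these ten days. Confirm that
the last value of the vector that this produces is equal to the total sum of the rainfall.
Sol:
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
> cumsum(rainfall)
[1] 0.1 0.7 34.5 36.4 46.0 50.3 84.0 84.3 84.3 84.4
> all.equal(sum(rainfall), cumsum(rainfall)[10])
[1] TRUE
Note: the comparison must use the last element of cumsum(rainfall), not rainfall[10], which is
only the rainfall on the tenth day.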

4. Which day saw the highest rainfall? Hint: which.max()
Sol:
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
> max(rainfall)
[1] 33.8
> which.max(rainfall)
[1] 3

5. Compute the problem sum((x-mean(x))^2).
Sol:
> x <- c(1:10)
> sum((x-mean(x))^2)
[1] 82.5
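The expression sum((x-mean(x))^2) is the sum of squared deviations from the mean. As a short worked check, dividing it by n - 1 gives the sample variance, and the square root of that is the standard deviation:

```r
x <- 1:10

# Sum of squared deviations from the mean
ss <- sum((x - mean(x))^2)
print(ss)

# Dividing by n - 1 gives the sample variance reported by var()
n <- length(x)
print(ss / (n - 1))
print(var(x))

# The square root of the variance is the standard deviation
print(sqrt(ss / (n - 1)))
print(sd(x))
```

This is the same calculation that sd() performed on the rainfall vector earlier in this exercise.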

6. The weights of five people before and after a diet programme are given in the
table.

Read the 'before' and 'after' values into two different vectors called
before and after. Use R to evaluate the amount of weight lost for each
participant. What is the average amount of weight lost?

Sol:
> before <- c(78, 72, 78, 79, 105)
> after <- c(67, 65, 79, 70, 93)
> before
[1] 78 72 78 79 105
> after
[1] 67 65 79 70 93
> weightlost <- before - after
> weightlost
[1] 11 7 -1 9 12
> mean(weightlost)
[1] 7.6

RESULT
This process has been executed successfully

EX.NO: 20
DATE:

CLASSIFICATION MODEL
AIM:

a. Install relevant package for classification.
b. Choose classifier for classification problem.
c. Evaluate the performance of classifier.

To learn about R classification and various classification techniques and algorithms in
machine learning. A common job of machine learning algorithms is to recognize objects and
separate them into categories. This process is called classification, and it helps us
segregate vast quantities of data into discrete values, i.e. distinct, like 0/1, True/False, or a
pre-defined output label class.

PROCEDURE:

How Supervised Learning works

Before we dive into classification, let’s take a look at what Supervised Learning is.
Suppose you are trying to learn a new concept in maths. After solving a problem, you may
refer to the solutions to see if you were right or not. Once you are confident in your ability to
solve a particular type of problem, you will stop referring to the answers and solve the questions
put before you by yourself.

This is also how Supervised Learning works with machine learning models. In
Supervised Learning, the model learns by example. Along with our input variable, we also give
our model the corresponding correct labels. While training, the model gets to look at which
label corresponds to our data and hence can find patterns between our data and those labels.

Some examples of Supervised Learning include:

Spam detection, by teaching a model which mail is spam and which is not.
Speech recognition, where you teach a machine to recognize your voice.
Object recognition, by showing a machine what an object looks like and having it pick
that object from among other objects.

We can further divide Supervised Learning into the following:

Figure 1: Supervised Learning Subdivisions

Classification is defined as the process of recognition, understanding, and grouping of
objects and ideas into preset categories, a.k.a. “sub-populations.” With the help of these
pre-categorized training datasets, classification in machine learning programs leverages a wide
range of algorithms to classify future datasets into respective and relevant categories.

Classification algorithms used in machine learning utilize input training data for the
purpose of predicting the likelihood or probability that the data that follows will fall into one
of the predetermined categories. One of the most common applications of classification is for
filtering emails into “spam” or “non-spam”, as used by today’s top email service providers.

In short, classification is a form of “pattern recognition”. Here, classification algorithms
applied to the training data find the same pattern (similar number sequences, words or
sentiments, and the like) in future data sets. We will explore classification algorithms in detail,
and discover how text analysis software can perform actions like sentiment analysis, used
for categorizing unstructured text by opinion polarity (positive, negative, neutral, and the like).

Classification is the process of predicting a categorical label of a data object based on
its features and properties. In classification, we locate identifiers or boundary conditions that
correspond to a particular label or category. We then try to place various unknown objects into
those categories, by using the identifiers. An example of this would be to predict the type of
water (mineral, tap, smart, etc.), based on its purity and mineral content.
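The idea of boundary conditions can be sketched with a hand-written rule on the iris data. The thresholds 2.5 and 1.75 and the helper classify_iris() are illustrative assumptions written for this example; they are not learned by any algorithm in this manual:

```r
# Hand-written boundary conditions on petal length and petal width
classify_iris <- function(petal_length, petal_width) {
  ifelse(petal_length < 2.5, "setosa",
         ifelse(petal_width < 1.75, "versicolor", "virginica"))
}

predicted <- factor(classify_iris(iris$Petal.Length, iris$Petal.Width),
                    levels = levels(iris$Species))

# Confusion matrix: rows are predictions, columns are the true species
confusion <- table(Predicted = predicted, Actual = iris$Species)
print(confusion)

# Accuracy: share of flowers on the diagonal of the confusion matrix
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

A classification algorithm automates the search for such boundaries instead of relying on thresholds picked by hand.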

Basic Terminologies of R Classification


1. Classifier: A classifier is an algorithm that classifies the input data into output
categories.
2. Classification model: A classification model is a model that uses a classifier to
classify data objects into various categories.
3. Feature: A feature is a measurable property of a data object.
4. Binary classification: A binary classification is a classification with two possible
output categories.
5. Multi-class classification: A multi-class classification is a classification with more
than two possible output categories.
6. Multi-label classification: A multi-label classification is a classification where a
data object can be assigned multiple labels or output classes.

Popular Classification Algorithms in R are

1. R Logistic Regression
2. Decision Trees in R
3. Support Vector Machines in R
4. Naive Bayes Classifier
5. Artificial Neural Networks in R
6. K – Nearest Neighbor in R

Applications of R Classification Algorithms:

1. Logistic regression

Weather forecast
Word classification
Symptom classification

2. Decision trees

Pattern recognition
Pricing decisions
Data exploration

3. Support Vector Machines

Investment suggestions
Stock comparison

4. Naive Bayes Classifier

Spam filters
Disease prediction
Document classification

5. Artificial Neural Network

Handwriting analysis
Object recognition
Voice recognition

6. k-Nearest Neighbor

Industrial task classification


Video recognition
Image recognition

Unsupervised learning is a type of algorithm that learns patterns from untagged data.
The hope is that through mimicry, which is an important mode of learning in people, the
machine is forced to build a compact internal representation of its world and then generate
imaginative content from it.
In contrast to supervised learning where data is tagged by an expert, e.g. as a "ball" or
"fish", unsupervised methods exhibit self-organization that captures patterns as probability
densities or a combination of neural feature preferences.
The other levels in the supervision spectrum are reinforcement learning where the
machine is given only a numerical performance score as guidance, and semi-supervised
learning where a smaller portion of the data is tagged. Two broad methods in Unsupervised
Learning are Neural Networks and Probabilistic Methods.

4 Applications of Classification Algorithms:

● Sentiment Analysis
● Email Spam Classification
● Document Classification
● Image Classification

Types of Unsupervised Machine Learning Techniques

Unsupervised learning problems are further grouped into clustering and association problems.
Clustering

Clustering is an important concept when it comes to unsupervised learning. It mainly
deals with finding a structure or pattern in a collection of uncategorized data. Clustering
algorithms will process your data and find natural clusters (groups) if they exist in the data. You
can also modify how many clusters your algorithms should identify. It allows you to adjust the
granularity of these groups.
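Changing the number of clusters can be tried with base R's kmeans(). In this sketch the choice of the iris measurements, the seed and nstart = 20 are assumptions made for the illustration:

```r
set.seed(7)                       # k-means starts from random centres
features <- iris[, 1:4]           # numeric columns only; species labels unused

# Two runs that differ only in the requested number of clusters
km2 <- kmeans(features, centers = 2, nstart = 20)
km3 <- kmeans(features, centers = 3, nstart = 20)

# Cluster sizes for each choice of k
print(table(km2$cluster))
print(table(km3$cluster))

# Total within-cluster sum of squares shrinks as k grows
print(km2$tot.withinss)
print(km3$tot.withinss)
```

A larger number of clusters gives finer groups and a smaller within-cluster sum of squares, which is the granularity trade-off described above.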

Association

Association rules allow you to establish associations amongst data objects inside large
databases. This unsupervised technique is about discovering interesting relationships between
variables in large databases. For example, people who buy a new home are most likely to buy new
furniture.

Coding:

# caret provides createDataPartition(), trainControl(), train() and confusionMatrix();
# the methods used below also need the MASS, rpart, kernlab and randomForest packages
library(caret)
df <- iris
levels(df$Species)
summary(df)

# fix the random seed so that the train/validation split is reproducible
set.seed(7)
# select 80% of the rows in the original dataset for training
inTraining <- createDataPartition(df$Species, p = 0.8, list = FALSE)
# use 80% of the data for training the models
training <- df[inTraining, ]
# hold back the remaining 20% of the data for validation
validation <- df[-inTraining, ]

# run algorithms using 10-fold cross validation
control <- trainControl(method = "cv", number = 10)
metric <- "Accuracy"

# a) Linear Discriminant algorithm (LDA)
set.seed(7)
fit.lda <- train(Species ~ ., data = training, method = "lda", metric = metric, trControl = control)
# evaluate LDA on the held-out validation rows
predictions <- predict(fit.lda, validation)
predictions
confusionMatrix(predictions, validation$Species)

# b) Non-linear algorithms (NLA)
# CART
set.seed(7)
fit.cart <- train(Species ~ ., data = training, method = "rpart", metric = metric, trControl = control)

# KNN
set.seed(7)
fit.knn <- train(Species ~ ., data = training, method = "knn", metric = metric, trControl = control)

# c) Advanced algorithms
# SVM
set.seed(7)
fit.svm <- train(Species ~ ., data = training, method = "svmRadial", metric = metric,
                 trControl = control)

# Random Forest
set.seed(7)
fit.rf <- train(Species ~ ., data = training, method = "rf", metric = metric, trControl = control)

# summarize accuracy of models
results <- resamples(list(lda = fit.lda, cart = fit.cart, knn = fit.knn, svm = fit.svm, rf = fit.rf))
summary(results)

# compare the cross-validated accuracy of the models in a dot plot
dotplot(results)

# variable importance according to the random forest model
importance <- varImp(fit.rf)
plot(importance)

Cart result:

SVM Classification result:

KNN Classification result:

RESULT
Thus, the process has been executed successfully
