Datamining

The document contains multiple R programming scripts for data analysis, including association rule mining using the Apriori algorithm, K-means clustering, hierarchical clustering, Naive Bayes classification, decision trees, linear regression, and various plotting techniques. Each program is accompanied by its output, demonstrating the results of the analysis performed on different datasets. The scripts utilize libraries such as 'arules', 'cluster', 'e1071', and 'party' to implement statistical methods and visualize data.


Program1

library(arules)

dataset=list(

c("Bread","Butter","Milk"),

c("Bread","Butter"),

c("Beer","Cookies","Diapers"),

c("Milk","Diapers","Bread","Butter"),

c("Beer","Diapers")
)

items1=transactions(dataset)

print(items1)

rules=apriori(items1,parameter=list(supp=0.4,conf=0.7,minlen=2,maxlen=10))

inspect(rules)

output:

library(arules)
> dataset=list(
+ c("Bread","Butter","Milk"),
+ c("Bread","Butter"),
+ c("Beer","Cookies","Diapers"),
+ c("Milk",'Diapers',"Bread","Butter"),
+ c("Beer","Diapers")
+)
> items1=transactions(dataset)
> print(items1)
transactions in sparse format with
5 transactions (rows) and
6 items (columns)
> rules=apriori(items1,parameter=list(supp=0.4,conf=0.7,minlen=2,maxlen=10))
Apriori

Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen maxlen target ext
0.7 0.1 1 none FALSE TRUE 5 0.4 2 10 rules TRUE

Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 2

set item appearances ...[0 item(s)] done [0.00s].


set transactions ...[6 item(s), 5 transaction(s)] done [0.00s].
sorting and recoding items ... [5 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [7 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
> inspect(rules)
lhs rhs support confidence coverage lift count
[1] {Beer} => {Diapers} 0.4 1 0.4 1.666667 2
[2] {Milk} => {Butter} 0.4 1 0.4 1.666667 2
[3] {Milk} => {Bread} 0.4 1 0.4 1.666667 2
[4] {Butter} => {Bread} 0.6 1 0.6 1.666667 3
[5] {Bread} => {Butter} 0.6 1 0.6 1.666667 3
[6] {Butter, Milk} => {Bread} 0.4 1 0.4 1.666667 2
[7] {Bread, Milk} => {Butter} 0.4 1 0.4 1.666667 2
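A common next step after mining is ranking the rules. The sketch below (rebuilding the same transactions as in the program above) sorts the seven rules by lift and keeps the strongest three:

```r
library(arules)

# Same five transactions as in the program above
dataset <- list(
  c("Bread", "Butter", "Milk"),
  c("Bread", "Butter"),
  c("Beer", "Cookies", "Diapers"),
  c("Milk", "Diapers", "Bread", "Butter"),
  c("Beer", "Diapers")
)
items1 <- transactions(dataset)
rules <- apriori(items1,
                 parameter = list(supp = 0.4, conf = 0.7, minlen = 2, maxlen = 10))

# Rank by lift and inspect the top three rules
top <- head(sort(rules, by = "lift"), 3)
inspect(top)
```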

Program2

library(cluster)

items <- read.csv("E:/kmeans.csv")

summary(items)

plot(items)

data <- kmeans(items,2)

print(data)

clusplot(items,data$cluster,color=TRUE,lines=0,labels = 2)

output:
library(cluster)
> items <- read.csv("E:/kmeans.csv")
> summary(items)
Dataset
Min. : 3.00
1st Qu.: 4.25
Median :10.00
Mean :12.83
3rd Qu.:18.75
Max. :30.00
> plot(items)
> data <- kmeans(items,2)
> print(data)
K-means clustering with 2 clusters of sizes 3, 3

Cluster means:
Dataset
1 21.66667
2 4.00000

Clustering vector:
[1] 2 2 1 2 1 1

Within cluster sum of squares by cluster:


[1] 116.6667 2.0000
(between_SS / total_SS = 79.8 %)

Available components:

[1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss"


[7] "size" "iter" "ifault"
> clusplot(items,data$cluster,color=TRUE,lines=0,labels = 2)
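Choosing k = 2 above is an assumption; a quick sanity check is an elbow plot of the total within-cluster sum of squares. The CSV file is not reproduced in this document, so the sketch below uses a small vector consistent with the summary() output above as a stand-in:

```r
# Stand-in for kmeans.csv: six values matching the summary statistics above
x <- c(3, 4, 5, 15, 20, 30)

# Total within-cluster SS for k = 1..4 (nstart avoids bad random starts)
wss <- sapply(1:4, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

plot(1:4, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster SS", main = "Elbow plot")
```

The "elbow" is the k after which the curve flattens; a sharp drop from k = 1 to k = 2 supports the choice made in the program.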
Program3

library(cluster)

items <- c(5,10,12,3,8)

summary(items)

plot(items)

data <- dist(items,method="euclidean")

hier <- hclust(data,method="complete")


plot(hier)

grps<-cutree(hier,k=2)

print(grps)

rect.hclust(hier,k=2,border="green")

output:

library(cluster)
> items <- c(5,10,12,3,8)
> summary(items)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.0 5.0 8.0 7.6 10.0 12.0
> plot(items)
> data <- dist(items,method="euclidean")
> hier <- hclust(data,method="complete")
> plot(hier)
> grps<-cutree(hier,k=2)
> print(grps)
[1] 1 2 2 1 2
> rect.hclust(hier,k=2,border="green")
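Complete linkage is only one choice; with data this small the linkage methods are easy to compare. A sketch using average linkage on the same five points:

```r
items <- c(5, 10, 12, 3, 8)
d <- dist(items, method = "euclidean")

# Average linkage instead of complete linkage
hier_avg <- hclust(d, method = "average")
grps_avg <- cutree(hier_avg, k = 2)
print(grps_avg)
```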
Program4

library(e1071)

data<-read.csv("E:/mark.csv")

head(data)

str(data)

model<-naiveBayes(Dept~.,data=data)

predictions<-predict(model,data)
print(predictions)

model1<-naiveBayes(Dept~Maths+Science,data=data)

print(model1)

library(naivebayes)

model1<-naive_bayes(Dept~Maths+Science,usekernel=TRUE,data=data)  # parameter is 'usekernel'

print(model1)

output:

library(e1071)
> data<-read.csv("E:/mark.csv")
> head(data)
Dept Maths Science
1 BCA 90 93
2 CS 98 78
3 BCA 60 50
4 Physics 95 67
5 CS 50 90
6 CS 65 70
> str(data)
'data.frame': 7 obs. of 3 variables:
$ Dept : chr "BCA" "CS" "BCA" "Physics" ...
$ Maths : int 90 98 60 95 50 65 87
$ Science: int 93 78 50 67 90 70 78

> model<-naiveBayes(Dept~.,data=data)
> predictions<-predict(model,data)
> print(predictions)
[1] CS Physics BCA Physics CS CS Physics
Levels: BCA CS Physics
> model1<-naiveBayes(Dept~Maths+Science,data=data)
>
> print(model1)

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace)

A-priori probabilities:
Y
BCA CS Physics
0.2857143 0.4285714 0.2857143

Conditional probabilities:
Maths
Y [,1] [,2]
BCA 75 21.213203
CS 71 24.556058
Physics 91 5.656854

Science
Y [,1] [,2]
BCA 71.50000 30.405592
CS 79.33333 10.066446
Physics 72.50000 7.778175

> library(naivebayes)
> model1<-naive_bayes(Dept~Maths+Science,usekernal=T,data=data)
> print(model1)

================================================ Naive Bayes ======================================

Call:
naive_bayes.formula(formula = Dept ~ Maths + Science, data = data,
usekernal = T)

--------------------------------------------------------------------------------------------------------------

Laplace smoothing: 0

--------------------------------------------------------------------------------------------------------------

A priori probabilities:

BCA CS Physics
0.2857143 0.4285714 0.2857143
--------------------------------------------------------------------------------------------------------------

Tables:

--------------------------------------------------------------------------------------------------------------
::: Maths (Gaussian)
--------------------------------------------------------------------------------------------------------------

Maths BCA CS Physics


mean 75.000000 71.000000 91.000000
sd 21.213203 24.556058 5.656854

--------------------------------------------------------------------------------------------------------------
::: Science (Gaussian)
--------------------------------------------------------------------------------------------------------------

Science BCA CS Physics


mean 71.500000 79.333333 72.500000
sd 30.405592 10.066446 7.778175

--------------------------------------------------------------------------------------------------------------

>
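The model above is only printed, not applied to new data. The sketch below rebuilds it from the rows shown by head() (an approximation of mark.csv, which is not included here; the seventh row is not shown) and classifies a hypothetical new student:

```r
library(e1071)

# Approximation of mark.csv from the head() output above (six of seven rows)
data <- data.frame(
  Dept    = c("BCA", "CS", "BCA", "Physics", "CS", "CS"),
  Maths   = c(90, 98, 60, 95, 50, 65),
  Science = c(93, 78, 50, 67, 90, 70)
)

model <- naiveBayes(Dept ~ Maths + Science, data = data)

# Hypothetical new student
newdata <- data.frame(Maths = 92, Science = 70)
pred <- predict(model, newdata)
print(pred)
```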
Program5

library(party)

library(rpart)

data<-read.csv("E:/decision.csv")

head(data)

str(data)

plot(data)

data1<-as.data.frame(data)

tree<-rpart(Play~Whether+Temperature+Windy,data=data1,method="class",
control=rpart.control(minsplit=1,minbucket=1,cp=0))

plot(tree,main="Decision Tree")

text(tree)

output:
library(party)
> library(rpart)
> data<-read.csv("E:/decision.csv")
> head(data)
Day Whether Temperature Windy Play
1 1 Rain Mild Weak No
2 2 Normal Hot Weak Yes
3 3 Wind Mild Strong Yes
4 4 Normal Cool Weak No
5 5 Rain Hot Strong No
> str(data)
'data.frame': 5 obs. of 5 variables:
$ Day : int 1 2 3 4 5
$ Whether : chr "Rain" "Normal" "Wind" "Normal" ...
$ Temperature: chr "Mild" "Hot" "Mild" "Cool" ...
$ Windy : chr "Weak" "Weak" "Strong" "Weak" ...
$ Play : chr "No" "Yes" "Yes" "No" ...
> plot(data)
> data1<-as.data.frame(data)
> tree<-rpart(Play~Whether+Temperature+Windy,data=data1,method="class",
+ control=rpart.control(minsplit=1,minbucket=1,cp=0))
> plot(tree,main="Decision Tree")
> text(tree)
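plot() and text() on an rpart object give a fairly bare picture. Since the five rows are fully shown by head(), the same tree can be rebuilt inline, printed in text form, and checked against its own training data:

```r
library(rpart)

# The five rows from the head() output above, built inline
data1 <- data.frame(
  Whether     = c("Rain", "Normal", "Wind", "Normal", "Rain"),
  Temperature = c("Mild", "Hot", "Mild", "Cool", "Hot"),
  Windy       = c("Weak", "Weak", "Strong", "Weak", "Strong"),
  Play        = c("No", "Yes", "Yes", "No", "No")
)

tree <- rpart(Play ~ Whether + Temperature + Windy, data = data1,
              method = "class",
              control = rpart.control(minsplit = 1, minbucket = 1, cp = 0))

print(tree)  # text form of the fitted tree

# Resubstitution check: how the tree classifies its own training rows
pred <- predict(tree, data1, type = "class")
print(table(Actual = data1$Play, Predicted = pred))
```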
Program6

Age <- c(22,42,21,57)

Glosteral <- c(60,90,65,65)

relation <- lm(Glosteral~Age)

summary(relation)

a <- data.frame(Age = 54)

result <- predict(relation,a)


print(result)

output:

Age <- c(22,42,21,57)


> Glosteral <- c(60,90,65,65)
> relation <- lm(Glosteral~Age)
> summary(relation)

Call:
lm(formula = Glosteral ~ Age)

Residuals:
1 2 3 4
-6.538 18.333 -1.282 -10.513

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 60.8974 20.1572 3.021 0.0943 .
Age 0.2564 0.5232 0.490 0.6725
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 15.67 on 2 degrees of freedom


Multiple R-squared: 0.1072, Adjusted R-squared: -0.3392
F-statistic: 0.2402 on 1 and 2 DF, p-value: 0.6725

> a <- data.frame(Age = 54)


> result <- predict(relation,a)
> print(result)
1
74.74359

Program7

#BAR PLOT

Age<-c(55,57,43,35,25)

BP<-c(140,141,130,120,120)

colors=c("red","yellow","green","violet","cyan")

barplot(Age,names.arg=BP,col=colors,ylab='Age',xlab='BP',main='Bar Plot')

#BOX PLOT

x<-c(12,24,50,33,28,43)

boxplot(x,horizontal=TRUE,main='Box Plot',xlab="Items")
#HISTOGRAM

x<-c(12,24,50,33,28,43)

hist(x)

colors=c("red","yellow","violet","green","cyan")

hist(x,col=colors,main="Histogram",xlab="Items")

#SCATTERPLOT

Age<-c(12,8,15,7,9)

Weight<-c(23,12,35,21,15)

plot(Age,Weight,ylab='Weight',xlab='Age',main='Scatter Plot')

#PIECHART

x<-c(10,30,12,23,21)

lbl<-c("US","UK","INDIA","JAPAN","FRANCE")

colors=c("red","yellow","violet","green","cyan")

pie(x,lbl,main="Pie Chart",col=colors)

output:
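No textual output is expected here: each call draws to a graphics window. When running non-interactively, the plots can be captured to a file instead; a sketch (the file name plots.png is arbitrary):

```r
# Write the plots to a PNG file instead of the screen
png("plots.png", width = 800, height = 600)
par(mfrow = c(2, 2))  # 2 x 2 grid of panels

x <- c(12, 24, 50, 33, 28, 43)
boxplot(x, horizontal = TRUE, main = "Box Plot", xlab = "Items")
hist(x, col = "lightblue", main = "Histogram", xlab = "Items")
plot(c(12, 8, 15, 7, 9), c(23, 12, 35, 21, 15),
     xlab = "Age", ylab = "Weight", main = "Scatter Plot")
pie(c(10, 30, 12, 23, 21), c("US", "UK", "INDIA", "JAPAN", "FRANCE"),
    main = "Pie Chart")
dev.off()  # close the device so the file is written
```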
