0% found this document useful (0 votes)

10 views53 pages

R Record-1

The document is a practical record for the I M.Sc Computer Science course at L.R.G Government Arts College for Women, focusing on Data Mining using R. It includes various algorithms such as Apriori for association rules, K-Means and Hierarchical clustering, and classification methods, along with R code snippets and outputs for practical implementation. The record is intended for submission for the Bharathiar University Practical Examination for the academic year 2023-2024.

Uploaded by

Samy Samy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

10 views53 pages

R Record-1

Uploaded by

Samy Samy

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 53

L.R.

G GOVERNMENT ARTS COLLEGE FOR

WOMEN (AFFILIATED TO BHARATHIAR

UNIVERSITY)

TIRUPUR-641604.
DEPARTMENT OF COMPUTER SCIENCE

I MSC-COMPUTER SCIENCE

PRACTICAL-III :DATA MINING USING R

NAME :

REG.NO:
CERTIFICATE
L R.G GOVERNMENT ARTS COLLEGE FOR WOMEN

(AFFILIATED TO BHARATHIAR UNIVERSITY)

TIRUPUR-641604

NAME :

CLASS :

This is to certify that it is a bonafide record of practical work done by the above student of
the I-M.SC COMPUTER SCIENCE PRACTICAL-III: DATA MINING USING R during the
academic year 2023-2024.

Staff in Charge. Head of the Department

Submitted for the Bharathiar University Practical Examination held On. . . . . . . . . . . . . . . .

Internal Examiner. External Examiner

CONTENT
INDEX

S.N DATE CONTENTS PA SIGN

O G
E
N
O

1 APRIORI ALGORITHM TO
EXTRACT ASSOCIATION RULES
OF DATA MINING

2 K-MEANS CLUSTERING ALGORITHM

3 HIERARCHICAL CLUSTERING

4 CLASSIFICATION ALGORITHM

5 DECISION TREE

6 LINEAR REGRESSION

7 DATA VISUALZATION
1. APRIORI ALGORITHM TO EXTRACT ASSOCIATION RULES

OF DATA MINING

# Loading Libraries

library(arules)

library(arulesViz)

library(RColorBrewer)

# import dataset

data('Groceries')

# using apriori() function

rules<-apriori(Groceries,parameter=list(supp=0.01,conf=0.2))

# using inspect() function

inspect(rules[1:10])

# using itemFrequencyPlot() function

arules::itemFrequencyPlot(Groceries,topN=20,

col=brewer.pal(8,'Pastel2'),

main='Relative Item Frequency

Plot', type='relative',

ylab='Item Frequency(Relative)')
OUTPUT:
# Loading Libraries

>library(arules)

Loading required package: Matrix

Attaching package: ‘arules’

The following objects are masked from

‘package:base’: abbreviate, write

> library(arulesViz)

> library(RColorBrewer)

> # import dataset

> data('Groceries')

> # using apriori() function

> rules<-apriori(Groceries,parameter=list(supp=0.01,conf=0.2))

Apriori

Parameter specification:

confidence minval smax arem aval originalSupport maxtime support minlen

0.2 0.1 1 none FALSE TRUE 5 0.01 1

maxlen target ext

10 rules TRUE

Algorithmic

control:

filter tree heap memopt load sort verbose

0.1 TRUE TRUE FALSE TRUE 2 TRUE

Absolute minimum support count: 98

set item appearances ...[0 item(s)] done [0.00s].

set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].

sorting and recoding items ... [88 item(s)] done [0.00s].

creating transaction tree ... done [0.01s].

checking subsets of size 1 2 3 4 done [0.00s].

writing ... [232 rule(s)] done [0.00s].

creating S4 object ... done [0.00s].

> # using inspect() function

> inspect(rules[1:10])

lhs rhs support confidence coverage

[1] {} => {whole milk} 0.25551601 0.2555160 1.00000000

[2] {hard cheese} => {whole milk} 0.01006609 0.4107884 0.02450432

[3] {butter milk} => {other vegetables} 0.01037112 0.3709091 0.02796136

[4] {butter milk} => {whole milk} 0.01159126 0.4145455 0.02796136

[5] {ham} => {whole milk} 0.01148958 0.4414062 0.02602949

[6] {sliced cheese} => {whole milk} 0.01077783 0.4398340 0.02450432

[7] {oil} => {whole milk} 0.01128622 0.4021739 0.02806304

[8] {onions} => {other vegetables} 0.01423488 0.4590164 0.03101169

[9] {onions} => {whole milk} 0.01209964 0.3901639 0.03101169

[10] {berries} => {yogurt} 0.01057448 0.3180428 0.03324860

lift count
[ 1.0000 25
1 00 13
]
[ 1.6076 9
2 82 9
]
[ 1.9169 10
3 16 2
]
[ 1.6223 11
4 85 4
]
[ 1.7275 11
5 09 3
]
[ 1.7213 10
6 56 6
]
[ 1.5739 11
7 68 1
]
[8] 2.372268 140

[9] 1.526965 119

[10] 2.279848 104

> # using itemFrequencyPlot() function

> arules::itemFrequencyPlot(Groceries,topN=20,

+ col=brewer.pal(8,'Pastel2'),

+ main='Relative Item Frequency Plot',

+ type='relative',

+ ylab='Item Frequency(Relative)')
Relative Item Frequency Plot
2.K-Means Clustering
library(cluster)

df=USArrests

#Number of Rows and Columns of the Actul Data

set dim(df)

head(df)

#remove rows with missing

values df=na.omit(df)

dim(df)

#scale each variable to have mean 0 and sd 1

df=scale(df)

head(df)

set.seed(1)

#Cluster the dataset with 5 groups

km=kmeans(df,centers=5,nstart=25)

print(km)

plot(df)

points(km$centers,col=1:5,pch=8,cex=2)

cnt=table(km$cluster)

print(cnt)

final_data=cbind(df,cluster=km$cluster)

head(final_data)

plot(final_data,cex=0.6,main="Final Data")

ag=aggregate(final_data,by=list(cluster=km$cluster),mean)

head(ag)

plot(ag,cex=0.6,main="Aggregate")
OUTPUT:
#K-means Clustering

> #Number of Rows and Columns of the Actaul Data set

> dim(df)

[1] 50 4

> head(df)

Murder Assault UrbanPop Rape

Alabam 13 23 58
a .2 6 21.2
Alaska 10. 263 48
0 44.5
Arizona 8.1 294 80
31.0
Arkansa 8. 190 50
s 8 19.5
Californ 9. 276 91
ia 0 40.6
Colorad 7. 204 78
o 9 38.7

> #remove rows with missing

values

> dim(df)

[1] 50 4

> #scale each variable to have mean 0 and sd 1

> head(df)

Murder Assault UrbanPop Rape

Alabama 1.24256408 0.7828393 -0.5209066 -0.003416473

Alaska 0.50786248 1.1068225 -1.2117642 2.484202941

Arizona 0.07163341 1.4788032 0.9989801 1.042878388

Arkansas 0.23234938 0.2308680 -1.0735927 -0.184916602

California 0.27826823 1.2628144 1.7589234 2.067820292

Colorado 0.02571456 0.3988593 0.8608085 1.864967207

> #Cluster the dataset with 5 groups

> print(km)

K-means clustering with 5 clusters of sizes 7, 10, 10, 11,

12 Cluster means:

Murder Assault UrbanPop Rape

1 1.5803956 0.9662584 -0.7775109 0.04844071

2 -1.1727674 -1.2078573 -1.0045069 -1.10202608

3 -0.6286291 -0.4086988 0.9506200 -0.38883734

4 -0.1642225 -0.3658283 -0.2822467 -0.11697538

5 0.7298036 1.1188219 0.7571799 1.32135653

Clustering vector:

Alabama Alaska Arkan Californi

a
Arizona 15 5 4 sa

Colorado Connecticut s
Delaware Georg
5 ia
Flori
da

5 3 3 5 1

Haw Idah Illino India Iowa

aii o is na
3 2 5 4 2

Kansas Kentucky Louisiana Maine

Maryland 4 4 1 2

Massachusetts Michigan Minnesota Mississippi Missouri

3 5 2 1 4

Montana Nebraska Nevada New Hampshire New

Jersey 4 4 5 2 3

New Mexico New York North Carolina North Dakota

Ohio 5 5 1 2
3

Oklahoma Oregon Pennsylvania Rhode Island South Carolina

4 4 3 3 1

South Dakota Tennessee Texas Utah

Vermont 2 1 5 3

Virginia Washington West Virginia Wisconsin

Wyoming 4 3 22 4

Within cluster sum of squares by cluster:

[1] 6.128432 7.443899 9.326266 7.788275 18.257332

(between_SS / total_SS = 75.0 %)

Available components:

[1] "cluster" "centers" "totss" "withinss" "tot.withinss"

[6] "betweenss" "size" "iter" "ifault"

> plot(df)
> points(km$centers,col=1:5,pch=8,cex=2)
> print(cnt)

12345

7 10 10 11 12

> head(final_data)

Murder Assault UrbanPop Rape cluster

Alaba 1.24256408 0.7828393 -0.5209066 - 1

ma 0.003416473
Alaska 0.50786248 1.1068225 -1.2117642 2.484202941 5

Arizon 0.07163341 1.4788032 0.9989801 1.042878388 5

Arkansas 0.23234938 0.2308680 -1.0735927 -0.184916602 4

California 0.27826823 1.2628144 1.7589234 2.067820292 5

Colorado 0.02571456 0.3988593 0.8608085 1.864967207 5

> plot(final_data,cex=0.6,main="Final Data")

> head(ag)

cluster Murder Assault UrbanPop Rape cluster

1 1 1.5803956 0.9662584 -0.7775109 0.04844071 1

2 2 -1.1727674 -1.2078573 -1.0045069 -1.10202608 2

3 3 -0.6286291 -0.4086988 0.9506200 -0.38883734 3

4 4 -0.1642225 -0.3658283 -0.2822467 -0.11697538 4

5 5 0.7298036 1.1188219 0.7571799 1.32135653 5

> plot(ag,cex=0.6,main="Aggregate")
3.HIERARCHICAL CLUSTERING
#Hierarchical Clustering

library(cluster)

df=USArrests

#remove rows with missing

values df=na.omit(df)

#scale each variable to have a mean 0 and sd of

1 df=scale(df)

head(df)

d=dist(df,method="euclidean")

#complete dendogram

hc1=hclust(d,method="complete")

plot(hc1,cex=0.6,main="complete dendogram",hang=-1)

#average dendogram

hc2=hclust(d,method="average")

plot(hc2,cex=0.6,main="Average Dendogram",hang=-1)

abline(h=3.0,col="green")

groups=cutree(hc2,k=4)

print(groups)

table(groups)

rect.hclust(hc2,k=4,border="red")

final_data=cbind(df,cluster=groups)

head(final_data)

plot(final_data,cex=0.6,main="Final Data")
OUTPUT:
#Hierarchical Clustering

> #remove rows with missing values

> #scale each variable to have a mean 0 and sd of 1

> head(df)

Murder Assault UrbanPop Rape

Alabama 1.24256408 0.7828393 -0.5209066 -0.003416473

Alaska 0.50786248 1.1068225 -1.2117642 2.484202941

Arizona 0.07163341 1.4788032 0.9989801 1.042878388

Arkansas 0.23234938 0.2308680 -1.0735927 -0.184916602

California 0.27826823 1.2628144 1.7589234 2.067820292

Colorado 0.02571456 0.3988593 0.8608085 1.864967207

> #complete dendogram

> plot(hc1,cex=0.6,main="complete dendogram",hang=-1)

> #average dendogram

> print(groups)

Alabama Alaska Arkan Californi

a
Arizona 12 3 4 sa

Colorado Connecticut s
Delaware Geor
3 gia
Flori
da

3 4 4 3 1

Haw Idah Illino India Iowa

aii o is na
4 4 3 4 4

Kansas Kentucky Louisiana Maine

Maryland 4 4 1 4

Massachusetts Michigan Minnesota Mississippi Missouri

4 3 4 1 3

Montana Nebraska Nevada New Hampshire New Jersey

4 4 3 4 4

New Mexico New York North Carolina North Dakota

Ohio 3 3 1 4

Oklahoma Oregon Pennsylvania Rhode Island South Carolina 4

4 4 4 1

South Dakota Tennessee Texas Utah

Vermont 4 1 3 4

Virginia Washington West Virginia Wisconsin

Wyoming 4 4 44 4

groups

1234

7 1 12 30

> rect.hclust(hc2,k=4,border="red")
> final_data=cbind(df,cluster=groups)

> head(final_data)

Murder Assault UrbanPop Rape cluster

Alaba 1.24256408 0.7828393 -0.5209066 - 1

ma 0.003416473
Alaska 0.50786248 1.1068225 -1.2117642 2.484202941 2

Arizon 0.07163341 1.4788032 0.9989801 1.042878388 3

Arkansas 0.23234938 0.2308680 -1.0735927 -0.184916602 4

California 0.27826823 1.2628144 1.7589234 2.067820292 3

Colorado 0.02571456 0.3988593 0.8608085 1.864967207 3

> plot(final_data,cex=0.6,main="Final Data")
4.CLASSIFICATION ALGORITHM
#Classification

Algorithm library(class)

df=data(iris)

#Number of Rows and Columns

dim(iris)

head(iris)

rand=sample(1:nrow(iris),0.9*nrow(iris))

head(rand)

#Scale the values using Normalization

method nor<-function(x)

return((x-min(x))/(max(x)-min(x)))

iris_norm=as.data.frame(lapply(iris[,c(1,2,3,4)],nor))

head(iris_norm)

#Train dataset

iris_train=iris_norm[rand,]

iris_train_target=iris[rand,5

] #Test dataset

iris_test=iris_norm[-rand,]

iris_test_target=iris[-rand,5]

dim(iris_train)

dim(iris_test)

#K-nearesr neighbour Classification

model1=knn(train=iris_train,test=iris_test,cl=iris_train_target,k=7)
#Confusion Matric

tab=table(model1,iris_test_target)

print(tab)

accuracy=function(x)

sum(diag(x)/sum(rowSums(x)))*100

cat("Accuracy classifier=",accuracy(tab))
OUTPUT:
#Classification Algorithm

> #Number of Rows and Columns

> dim(iris)

[1] 150 5

> head(iris)

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

1 5.1 3 1.4 0.2 seto

. sa
5
2 4.9 3 1.4 0.2 seto
. sa
0
3 4.7 3 1.3 0.2 seto
. sa
2
4 4.6 3 1.5 0.2 seto
. sa
1
5 5.0 3 1.4 0.2 seto
. sa
6
6 5.4 3 1.7 0.4 seto
. sa
9

> head(rand)

[1] 114 107 25 128 14 24

> #Scale the values using Normalization method

> nor<-function(x)

+ return((x-min(x))/(max(x)-min(x)))

> head(iris_norm)

Sepal.Length Sepal.Width Petal.Length Petal.Width

1 0.62500 0.067796 0.041666
0.22222222 00 61 67
2 0.41666 0.067796 0.041666
0.16666667 67 61 67
3 0.50000 0.050847 0.041666
0.11111111 00 46 67
4 0.45833 0.084745 0.041666
0.08333333 33 76 67
5 0.19444444 0.6666667 0.06779661 0.04166667

6 0.30555556 0.7916667 0.11864407 0.12500000

> #Train dataset

> iris_train=iris_norm[rand,]

> iris_train_target=iris[rand,5]

> #Test dataset

> iris_test=iris_norm[-rand,]

> iris_test_target=iris[-rand,5]

> dim(iris_train)

[1] 135 4

> dim(iris_test)

[1] 15 4

> #K-nearesr neighbour Classification

> model1=knn(train=iris_train,test=iris_test,cl=iris_train_target,k=7)

> #Confusion Matric

> print(tab)

iris_test_target

model1 setosa versicolor

virginica setosa 6 0

versicolor 0 6 1

virginica 0 0 2

> accuracy=function(x)

+ sum(diag(x)/sum(rowSums(x)))*100

cat("Accuracy classifier=",accuracy(tab)) Accuracy classifier= 100>

5.DECISION TREE
#Decision Tree

library(rpart)

data=iris

str=data

head(data)

#creating the decision tree using regression

dtree=rpart(Sepal.Width~Sepal.Length+Petal.Width+Petal.Length+Species,data=iris,method
="anova")

plot(dtree,uniform=TRUE,main="Sepal Width Decision Tree Using Regression")

print(dtree)

text(dtree,use.n=TRUE,cex=.7)

#predicting the Sepal Width

adata<-data.frame(Species='versicolor',Sepal.Length=5.1,Petal.Length=4.5,Petal.Width=1.4)

cat("Predicted Value:\n")

pt=predict(dtree,adata,method="anova")

print(pt)

plot(pt)

#creating the decision tree using

classification df=as.data.frame(data)

dt=rpart(Sepal.Width~Sepal.Length+Petal.Width+Petal.Length+Species,data=df,method="cl
ass")

plot(dt,uniform=TRUE,main="Sepal Width Decision Tree using Classification")

print(dt)

text(dt,use.n=TRUE,cex=.7)
OUTPUT:
> #Decision Tree

> head(data)

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

1 5.1 3 1.4 0.2 seto

. sa
5
2 4.9 3 1.4 0.2 seto
. sa
0
3 4.7 3 1.3 0.2 seto
. sa
2
4 4.6 3 1.5 0.2 seto
. sa
1
5 5.0 3 1.4 0.2 seto
. sa
6
6 5.4 3 1.7 0.4 seto
. sa
9
> #creating the decision tree using regression
dtree=rpart(Sepal.Width~Sepal.Length+Petal.Width+Petal.Length+Species,data=iris,metho
d
="anova")

> plot(dtree,uniform=TRUE,main="Sepal Width Decision Tree Using Regression")

> print(dtree)

n= 150

node), split, n, deviance, yval

* denotes terminal node

1) root 150 28.3069300 3.057333

2) Species=versicolor,virginica 100 10.9616000 2.872000

4) Petal.Length< 4.05 16 0.7975000 2.487500 *

5) Petal.Length>=4.05 84 7.3480950 2.945238

10) Petal.Width< 1.95 55 3.4920000 2.860000

20) Sepal.Length< 6.35 36 2.5588890 2.805556 *

21) Sepal.Length>=6.35 19 0.6242105 2.963158 *

11) Petal.Width>=1.95 29 2.6986210 3.106897

22) Petal.Length< 5.25 7 0.3285714 2.914286 *

23) Petal.Length>=5.25 22 2.0277270 3.168182 *

3) Species=setosa 50 7.0408000 3.428000

6) Sepal.Length< 5.05 28 2.0496430 3.203571 *

7) Sepal.Length>=5.05 22 1.7859090 3.713636 *

> text(dtree,use.n=TRUE,cex=.7)

> #predicting the Sepal Width

Predicted Value:

2.805556

> plot(pt)
> #creating the decision tree using classification

> plot(dt,uniform=TRUE,main="Sepal Width Decision Tree using Classification")

> print(dt)

n= 150

node), split, n, loss, yval, (yprob)

* denotes terminal node

1) root 150 124 3 (0.0067 0.02 0.027 0.02 0.053 0.033 0.06 0.093 0.067 0.17 0.073 0.087
0.04 0.08 0.04 0.027 0.02 0.04 0.013 0.0067 0.0067 0.0067 0.0067)

2) Petal.Width>=0.8 100 80 3 (0.01 0.03 0.03 0.03 0.08 0.05 0.09 0.14 0.09 0.2 0.07 0.08
0.04 0.03 0 0.01 0 0.02 0 0 0 0 0)

4) Sepal.Length< 6.45 65 55 2.8 (0.015 0.046 0.046 0.046 0.11 0.062 0.14 0.15 0.11 0.14
0.015 0.046 0.031 0.046 0 0 0 0 0 0 0 0 0)

8) Petal.Width< 1.95 56 47 2.7 (0.018 0.054 0.054 0.054 0.11 0.071 0.16 0.11 0.12 0.16
0.018 0.036 0.018 0.018 0 0 0 0 0 0 0 0 0)

16) Sepal.Length< 5.55 12 9 2.4 (0.083 0 0.17 0.25 0.25 0.083 0.083 0 0 0.083 0 0 0 0
0 0 0 0 0 0 0 0 0) *

17) Sepal.Length>=5.55 44 36 2.7 (0 0.068 0.023 0 0.068 0.068 0.18 0.14 0.16 0.18
0.023 0.045 0.023 0.023 0 0 0 0 0 0 0 0 0)

34) Petal.Width< 1.55 29 23 2.9 (0 0.1 0.034 0 0.069 0.1 0.1 0.17 0.21 0.17 0 0.034 0
0 0 0 0 0 0 0 0 0 0)

68) Sepal.Length>=5.95 15 11 2.9 (0 0.2 0.067 0 0.067 0.067 0 0.2 0.27 0.067 0
0.067 0 0 0 0 0 0 0 0 0 0 0) *

69) Sepal.Length< 5.95 14 10 3 (0 0 0 0 0.071 0.14 0.21 0.14 0.14 0.29 0 0 0 0 0 0 0

0 0 0 0 0 0) *

35) Petal.Width>=1.55 15 10 2.7 (0 0 0 0 0.067 0 0.33 0.067 0.067 0.2 0.067 0.067
0.067 0.067 0 0 0 0 0 0 0 0 0) *

9) Petal.Width>=1.95 9 5 2.8 (0 0 0 0 0.11 0 0 0.44 0 0 0 0.11 0.11 0.22 0 0 0 0 0 0 0 0

0) *

5) Sepal.Length>=6.45 35 24 3 (0 0 0 0 0.029 0.029 0 0.11 0.057 0.31 0.17 0.14 0.057 0 0

0.029 0 0.057 0 0 0 0 0) *

3) Petal.Width< 0.8 50 41 3.4 (0 0 0.02 0 0 0 0 0 0.02 0.12 0.08 0.1 0.04 0.18 0.12 0.06
0.06 0.08 0.04 0.02 0.02 0.02 0.02)

6) Sepal.Length< 4.95 20 15 3 (0 0 0.05 0 0 0 0 0 0.05 0.25 0.2 0.2 0 0.15 0 0.1 0 0 0 0 0

0 0)

12) Petal.Length< 1.45 13 8 3 (0 0 0.077 0 0 0 0 0 0.077 0.38 0 0.23 0 0.077 0 0.15 0 0 0

0 0 0 0) *

13) Petal.Length>=1.45 7 3 3.1 (0 0 0 0 0 0 0 0 0 0 0.57 0.14 0 0.29 0 0 0 0 0 0 0 0 0) *

7) Sepal.Length>=4.95 30 24 3.4 (0 0 0 0 0 0 0 0 0 0.033 0 0.033 0.067 0.2 0.2 0.033 0.1
0.13 0.067 0.033 0.033 0.033 0.033)

14) Petal.Length< 1.45 11 7 3.5 (0 0 0 0 0 0 0 0 0 0 0 0.091 0.091 0.091 0.36 0.091 0 0

0.091 0.091 0 0.091 0) *

15) Petal.Length>=1.45 19 14 3.4 (0 0 0 0 0 0 0 0 0 0.053 0 0 0.053 0.26 0.11 0 0.16

0.21 0.053 0 0.053 0 0.053) *

> text(dt,use.n=TRUE,cex=.7)
6.LINEAR REGRESSION
#Linear Regression

setwd("D:/R")

df=read.csv("h2.csv",header=TRUE)

print(df)

lr=lm(height~weight,data=df)

print(lr)

#Linear Regression

plot(df$height,df$weight,col="blue",main="Height_Weight
Regression",cex=1.3,pch=15,xlab="height",ylab="weight")

print(summary(lr))

print(residuals(lr))

coeff=coefficients(lr)

eq=paste0("y",round(coeff[2],1),"*(",round(coeff[1],1),"*x)")

print(eq)

#Linear Equation

new.weights=data.frame(weight=c(60,50)

) print(new.weights)

df1=predict(lr,newdata=new.weights)

print(df1)

df2=data.frame(df1,new.weights)

names(df2)=c("height","weight")

print(df2)

df3=rbind(df,df2)

print(df3)

write.csv(df3,"h3.csv")

pie(table(df3$height))
OUTPUT:
> #Linear Regression

> setwd("D:/R")

> df=read.csv("h2.csv",header=TRUE)

> print(df)

height weight

1 8
0
174
2 7
0
150
3 7
5
160
4 8
5
180

> lr=lm(height~weight,data=df)

> print(lr)

Call:

lm(formula = height ~ weight, data =

df) Coefficients:

(Intercept) weight
4.80 2.08

> #Linear Regression

> print(summary(lr))

Call:

lm(formula = height ~ weight, data =

df) Residuals:

1 2 3 4

2.8 -0.4 -0.8 -1.6

Coefficients:
Estimate Std. Error t value Pr(>|t|)

(Intercept) 4.8000 16.4463 0.292 0.7979

weight 2.0800 0.2117 9.827 0.0102 *

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.366 on 2 degrees of freedom

Multiple R-squared: 0.9797, Adjusted R-squared: 0.9696

F-statistic: 96.57 on 1 and 2 DF, p-value: 0.0102

> print(residuals(lr))

1 2 3 4

2.8 -0.4 -0.8 -1.6

> print(eq)

[1] "y2.1*(4.8*x)"

> #Linear Equation

> print(new.weights)

weight

1 60

2 50

> print(df1)

1 2

129.6 108.8

> print(df2)

height weight

1 129.6 60

2 108.8 50

> df3=rbind(df,df2)
> print(df3)

height

weight

1 174.0 8
0
2 150.0 7
0
3 160.0 7
5
4 180.0 8
5
11 6
129.6 0
21 5
108.8 0

> write.csv(df3,"h3.csv")

> pie(table(df3$height))
7.DATA VISUALIZATION
#Data Visualization

X=iris

dim(X)

summary(X)

head(X)

hist(X$Sepal.Length,main='Histogram',col='green')

barplot(X$Sepal.Length[1:10],main='Barplot',col='red',xlab='Sepal.Length'

) pie(table(X$Sepal.Length),main='pie-chart')

pairs(X)

plot(X$Sepal.Length,main='plot-chart',col='blue')

boxplot(X,main='Boxplot',col='yellow')
OUTPUT:

> #Data Visualization

> dim(X)

[1] 150 5

> summary(X)

Sepal.Length Sepal.Width

Petal.Length Min. :4.300

Min. :2.000 Min. :1.000 1st Qu.:5.100

1st Qu.:2.800 1st Qu.:1.600

Median :5.800 Median :3.000 Median :4.350

Mean :5.843 Mean :3.057 Mean :3.758 3rd

Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 Max.

:7.900 Max. :4.400 Max. :6.900

Petal.Width Species

Min. :0.100 setosa :50

1st Qu.:0.300 versicolor:50

Median :1.300 virginica :50

Mean :1.199

3rd Qu.:1.800

Max. :2.500

> head(X)

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

1 5.1 3 1.4 0.2 seto
. sa
5
2 4.9 3 1.4 0.2 seto
. sa
0
3 4.7 3 1.3 0.2 seto
. sa
2
4 4.6 3 1.5 0.2 seto
. sa
1
5 5.0 3 1.4 0.2 seto
. sa
6
6 5.4 3 1.7 0.4 seto
. sa
9

> hist(X$Sepal.Length,main='Histogram',col='green')

> barplot(X$Sepal.Length[1:10],main='Barplot',col='red',xlab='Sepal.Length')
> pie(table(X$Sepal.Length),main='pie-chart')

> pairs(X)
> plot(X$Sepal.Length,main='plot-chart',col='blue')
> boxplot(X,main='Boxplot',col='yellow')

Stages of Development of HRIS
50% (2)
Stages of Development of HRIS
15 pages
R Practicals (2007 Version)
No ratings yet
R Practicals (2007 Version)
15 pages
Association Rules
No ratings yet
Association Rules
29 pages
Spatial Statistics in R
No ratings yet
Spatial Statistics in R
29 pages
Problem Solving
No ratings yet
Problem Solving
4 pages
Kombidämpfer Genius Joker: Service Manual: Software/Troubleshooting
No ratings yet
Kombidämpfer Genius Joker: Service Manual: Software/Troubleshooting
39 pages
KVA Anusha - PGP12021 - BA
100% (1)
KVA Anusha - PGP12021 - BA
8 pages
R - Practical
No ratings yet
R - Practical
50 pages
Final Practical
No ratings yet
Final Practical
53 pages
Datamining 2
No ratings yet
Datamining 2
54 pages
DM File Kashish
No ratings yet
DM File Kashish
40 pages
DSR LAB MANUAL - 10 Programs
No ratings yet
DSR LAB MANUAL - 10 Programs
34 pages
Datamining Lab Record
No ratings yet
Datamining Lab Record
36 pages
Lab Manual - DSR
No ratings yet
Lab Manual - DSR
32 pages
R - Language Lab Manual - PG 2024
No ratings yet
R - Language Lab Manual - PG 2024
29 pages
Datamininganddataware
No ratings yet
Datamininganddataware
25 pages
BDA MSC It
No ratings yet
BDA MSC It
35 pages
DATAMINING
No ratings yet
DATAMINING
24 pages
Datamining
No ratings yet
Datamining
20 pages
R Lab Manual (1) - Merged
No ratings yet
R Lab Manual (1) - Merged
25 pages
DM Lab
No ratings yet
DM Lab
18 pages
R File Code
No ratings yet
R File Code
16 pages
Saurabh
No ratings yet
Saurabh
22 pages
Unit 4 Notes
No ratings yet
Unit 4 Notes
21 pages
R Code For Discriminant and Cluster Analysis
No ratings yet
R Code For Discriminant and Cluster Analysis
23 pages
DWDM Lab All
No ratings yet
DWDM Lab All
20 pages
Rlab SS
No ratings yet
Rlab SS
25 pages
Unit 6 - Machine Learning in R
No ratings yet
Unit 6 - Machine Learning in R
45 pages
WEEK
No ratings yet
WEEK
17 pages
R Lab Program
No ratings yet
R Lab Program
20 pages
Prac7 8 9 10
No ratings yet
Prac7 8 9 10
12 pages
DMBI
No ratings yet
DMBI
16 pages
Da Lab File 2
No ratings yet
Da Lab File 2
13 pages
R Basics
No ratings yet
R Basics
18 pages
CSE 3121 Information Visualization R Studio All Codes
No ratings yet
CSE 3121 Information Visualization R Studio All Codes
9 pages
Mla - 2 (Cia - 3) - 20221013
No ratings yet
Mla - 2 (Cia - 3) - 20221013
21 pages
Python DM Lab Manual Part 2
No ratings yet
Python DM Lab Manual Part 2
8 pages
Da Exp9,10
No ratings yet
Da Exp9,10
9 pages
STA2050 Assignment 2
No ratings yet
STA2050 Assignment 2
10 pages
R Program Record Book Iba
No ratings yet
R Program Record Book Iba
24 pages
Ashwin Report
No ratings yet
Ashwin Report
18 pages
1
No ratings yet
1
19 pages
Clustering in R
No ratings yet
Clustering in R
12 pages
Soal Try Out UN Fis
No ratings yet
Soal Try Out UN Fis
6 pages
R Practical
No ratings yet
R Practical
9 pages
YanchangZhao Refcard Data Mining
No ratings yet
YanchangZhao Refcard Data Mining
3 pages
Analysis Using Statistical: Introduction & Data Exploration
No ratings yet
Analysis Using Statistical: Introduction & Data Exploration
23 pages
Assignment-1 80501
No ratings yet
Assignment-1 80501
6 pages
Bi 5to 8
No ratings yet
Bi 5to 8
6 pages
DM Guidelines 14jan2022
No ratings yet
DM Guidelines 14jan2022
5 pages
4063 Final复习资料
No ratings yet
4063 Final复习资料
6 pages
R Reference Card For Data Mining
No ratings yet
R Reference Card For Data Mining
4 pages
MS6711 Data Mining Homework 1: 1.1 Implement K-Means Manually (8 PTS)
No ratings yet
MS6711 Data Mining Homework 1: 1.1 Implement K-Means Manually (8 PTS)
6 pages
R Reference Card For Data Mining
No ratings yet
R Reference Card For Data Mining
3 pages
PG DM
No ratings yet
PG DM
4 pages
FullMarks - Clustering StudentSolution 2
No ratings yet
FullMarks - Clustering StudentSolution 2
13 pages
RDataMining Reference Card
No ratings yet
RDataMining Reference Card
5 pages
BAN5
No ratings yet
BAN5
2 pages
Installation Guide: DB2 Universal Database For OS/390
No ratings yet
Installation Guide: DB2 Universal Database For OS/390
576 pages
YanchangZhao Refcard Data Mining
No ratings yet
YanchangZhao Refcard Data Mining
4 pages
Canon IRC1020 Trouble Error Codes
No ratings yet
Canon IRC1020 Trouble Error Codes
9 pages
Ericsson Rbs 6601 Manual
No ratings yet
Ericsson Rbs 6601 Manual
1 page
Content Addressable Memory Using XNOR CAM Cell
No ratings yet
Content Addressable Memory Using XNOR CAM Cell
5 pages
Case Study - Facebook Business Model
No ratings yet
Case Study - Facebook Business Model
12 pages
Jack Lucas: Senior Bio-Pharmaceutical Project Manager Professional Summary
No ratings yet
Jack Lucas: Senior Bio-Pharmaceutical Project Manager Professional Summary
4 pages
Power World Simulator
No ratings yet
Power World Simulator
10 pages
Irctc Ui Redesign Report - Shreya
No ratings yet
Irctc Ui Redesign Report - Shreya
19 pages
HDL Based Synthesis
No ratings yet
HDL Based Synthesis
23 pages
Ce6306 Strength of Materials Ii/Iii Mechanical Engineering
No ratings yet
Ce6306 Strength of Materials Ii/Iii Mechanical Engineering
29 pages
Online Freelancing in The Philippines
No ratings yet
Online Freelancing in The Philippines
22 pages
Intel 8085 Architecture
No ratings yet
Intel 8085 Architecture
8 pages
Secureworks Hacker Annualreport
No ratings yet
Secureworks Hacker Annualreport
25 pages
TMS IntraWeb Component Pack Quick Start
No ratings yet
TMS IntraWeb Component Pack Quick Start
17 pages
Lab
No ratings yet
Lab
86 pages
Re Wiring Feeder
100% (1)
Re Wiring Feeder
31 pages
Practical 01
No ratings yet
Practical 01
5 pages
Https WWW - Solaredge.com Sites Default Files Se-three-phase-Inverter-setapp-ds
No ratings yet
Https WWW - Solaredge.com Sites Default Files Se-three-phase-Inverter-setapp-ds
2 pages
Challenging Revaluation REsult
No ratings yet
Challenging Revaluation REsult
4 pages
Course Outline
No ratings yet
Course Outline
14 pages
Paper 1 Answers MPPSC 2021 P
No ratings yet
Paper 1 Answers MPPSC 2021 P
12 pages
Untitled Presentation
No ratings yet
Untitled Presentation
11 pages
Cost Constraint/Isocost Line
No ratings yet
Cost Constraint/Isocost Line
38 pages
Precedence and Associativity of Operators in Python
No ratings yet
Precedence and Associativity of Operators in Python
2 pages
Limits of Sequences - Brilliant Math & Science Wiki
No ratings yet
Limits of Sequences - Brilliant Math & Science Wiki
9 pages
SNP Log
No ratings yet
SNP Log
3 pages
Connection Pooling
No ratings yet
Connection Pooling
5 pages
Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst
From Everand
Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst
Dean Abbott
No ratings yet
Census of Population and Housing, 1990 [2nd]
From Everand
Census of Population and Housing, 1990 [2nd]
U.S. Census of population and housing
No ratings yet
Apache Cassandra Administrator Associate - Exam Practice Tests
From Everand
Apache Cassandra Administrator Associate - Exam Practice Tests
Cristian Scutaru
No ratings yet
Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection
From Everand
Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection
Bart Baesens
No ratings yet