
L.R.G GOVERNMENT ARTS COLLEGE FOR WOMEN
(AFFILIATED TO BHARATHIAR UNIVERSITY)
TIRUPUR-641604

DEPARTMENT OF COMPUTER SCIENCE

I M.SC COMPUTER SCIENCE

PRACTICAL-III: DATA MINING USING R

NAME :

REG.NO:
CERTIFICATE
L.R.G GOVERNMENT ARTS COLLEGE FOR WOMEN

(AFFILIATED TO BHARATHIAR UNIVERSITY)

TIRUPUR-641604

NAME :

REGISTER NO:

CLASS :

This is to certify that this is a bonafide record of practical work done by the above student in
I-M.SC COMPUTER SCIENCE, PRACTICAL-III: DATA MINING USING R, during the
academic year 2023-2024.

Staff in Charge                                        Head of the Department

Submitted for the Bharathiar University Practical Examination held on ....................

Internal Examiner                                      External Examiner


CONTENT
INDEX

S.NO   DATE   CONTENTS                                        PAGE NO   SIGN

1             APRIORI ALGORITHM TO EXTRACT ASSOCIATION
              RULES OF DATA MINING

2             K-MEANS CLUSTERING ALGORITHM

3             HIERARCHICAL CLUSTERING

4             CLASSIFICATION ALGORITHM

5             DECISION TREE

6             LINEAR REGRESSION

7             DATA VISUALIZATION
1. APRIORI ALGORITHM TO EXTRACT ASSOCIATION RULES

OF DATA MINING

# Loading Libraries

library(arules)

library(arulesViz)

library(RColorBrewer)

# import dataset

data('Groceries')

# using apriori() function

rules<-apriori(Groceries,parameter=list(supp=0.01,conf=0.2))

# using inspect() function

inspect(rules[1:10])

# using itemFrequencyPlot() function

arules::itemFrequencyPlot(Groceries,topN=20,
                          col=brewer.pal(8,'Pastel2'),
                          main='Relative Item Frequency Plot',
                          type='relative',
                          ylab='Item Frequency(Relative)')
OUTPUT:
# Loading Libraries

>library(arules)

Loading required package: Matrix

Attaching package: ‘arules’

The following objects are masked from ‘package:base’:

    abbreviate, write

> library(arulesViz)

> library(RColorBrewer)

> # import dataset

> data('Groceries')

> # using apriori() function

> rules<-apriori(Groceries,parameter=list(supp=0.01,conf=0.2))

Apriori

Parameter specification:

confidence minval smax arem aval originalSupport maxtime support minlen

0.2 0.1 1 none FALSE TRUE 5 0.01 1

maxlen target ext

10 rules TRUE

Algorithmic control:

filter tree heap memopt load sort verbose

0.1 TRUE TRUE FALSE TRUE 2 TRUE

Absolute minimum support count: 98

set item appearances ...[0 item(s)] done [0.00s].

set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].


sorting and recoding items ... [88 item(s)] done [0.00s].

creating transaction tree ... done [0.01s].

checking subsets of size 1 2 3 4 done [0.00s].

writing ... [232 rule(s)] done [0.00s].

creating S4 object ... done [0.00s].

> # using inspect() function

> inspect(rules[1:10])

     lhs                rhs                support    confidence coverage   lift     count
[1]  {}              => {whole milk}       0.25551601 0.2555160  1.00000000 1.000000  2513
[2]  {hard cheese}   => {whole milk}       0.01006609 0.4107884  0.02450432 1.607682    99
[3]  {butter milk}   => {other vegetables} 0.01037112 0.3709091  0.02796136 1.916916   102
[4]  {butter milk}   => {whole milk}       0.01159126 0.4145455  0.02796136 1.622385   114
[5]  {ham}           => {whole milk}       0.01148958 0.4414062  0.02602949 1.727509   113
[6]  {sliced cheese} => {whole milk}       0.01077783 0.4398340  0.02450432 1.721356   106
[7]  {oil}           => {whole milk}       0.01128622 0.4021739  0.02806304 1.573968   111
[8]  {onions}        => {other vegetables} 0.01423488 0.4590164  0.03101169 2.372268   140
[9]  {onions}        => {whole milk}       0.01209964 0.3901639  0.03101169 1.526965   119
[10] {berries}       => {yogurt}           0.01057448 0.3180428  0.03324860 2.279848   104

> # using itemFrequencyPlot() function

> arules::itemFrequencyPlot(Groceries,topN=20,

+ col=brewer.pal(8,'Pastel2'),

+ main='Relative Item Frequency Plot',

+ type='relative',

+ ylab='Item Frequency(Relative)')
[Plot: Relative Item Frequency Plot]
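Note: as an optional extension (a sketch, assuming the 'rules' object mined above is still in the workspace), the rules can be ranked by lift and visualised with arulesViz:

# Optional extension: rank the mined rules by lift and visualise them
top_rules <- sort(rules, by = "lift", decreasing = TRUE)   # strongest associations first
inspect(head(top_rules, 5))                                # five highest-lift rules
plot(top_rules, measure = c("support", "confidence"), shading = "lift")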
2.K-Means Clustering
library(cluster)

df=USArrests

#Number of Rows and Columns of the Actual Data set

dim(df)

head(df)

#remove rows with missing values

df=na.omit(df)

dim(df)

#scale each variable to have mean 0 and sd 1

df=scale(df)

head(df)

set.seed(1)

#Cluster the dataset with 5 groups

km=kmeans(df,centers=5,nstart=25)

print(km)

plot(df)

points(km$centers,col=1:5,pch=8,cex=2)

cnt=table(km$cluster)

print(cnt)

final_data=cbind(df,cluster=km$cluster)

head(final_data)

plot(final_data,cex=0.6,main="Final Data")

ag=aggregate(final_data,by=list(cluster=km$cluster),mean)

head(ag)

plot(ag,cex=0.6,main="Aggregate")
OUTPUT:
#K-means Clustering

> #Number of Rows and Columns of the Actual Data set

> dim(df)

[1] 50 4

> head(df)

           Murder Assault UrbanPop Rape
Alabama      13.2     236       58 21.2
Alaska       10.0     263       48 44.5
Arizona       8.1     294       80 31.0
Arkansas      8.8     190       50 19.5
California    9.0     276       91 40.6
Colorado      7.9     204       78 38.7

> #remove rows with missing values

> dim(df)

[1] 50 4

> #scale each variable to have mean 0 and sd 1

> head(df)

Murder Assault UrbanPop Rape

Alabama 1.24256408 0.7828393 -0.5209066 -0.003416473

Alaska 0.50786248 1.1068225 -1.2117642 2.484202941

Arizona 0.07163341 1.4788032 0.9989801 1.042878388

Arkansas 0.23234938 0.2308680 -1.0735927 -0.184916602

California 0.27826823 1.2628144 1.7589234 2.067820292

Colorado 0.02571456 0.3988593 0.8608085 1.864967207


> #Cluster the dataset with 5 groups

> print(km)

K-means clustering with 5 clusters of sizes 7, 10, 10, 11, 12

Cluster means:

Murder Assault UrbanPop Rape

1 1.5803956 0.9662584 -0.7775109 0.04844071

2 -1.1727674 -1.2078573 -1.0045069 -1.10202608

3 -0.6286291 -0.4086988 0.9506200 -0.38883734

4 -0.1642225 -0.3658283 -0.2822467 -0.11697538

5 0.7298036 1.1188219 0.7571799 1.32135653

Clustering vector:

       Alabama         Alaska        Arizona       Arkansas     California 
             1              5              5              4              5 
      Colorado    Connecticut       Delaware        Florida        Georgia 
             5              3              3              5              1 
        Hawaii          Idaho       Illinois        Indiana           Iowa 
             3              2              5              4              2 
        Kansas       Kentucky      Louisiana          Maine       Maryland 
             4              4              1              2              5 
 Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
             3              5              2              1              4 
       Montana       Nebraska         Nevada  New Hampshire     New Jersey 
             4              4              5              2              3 
    New Mexico       New York North Carolina   North Dakota           Ohio 
             5              5              1              2              3 
      Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
             4              4              3              3              1 
  South Dakota      Tennessee          Texas           Utah        Vermont 
             2              1              5              3              2 
      Virginia     Washington  West Virginia      Wisconsin        Wyoming 
             4              3              2              2              4 

Within cluster sum of squares by cluster:

[1] 6.128432 7.443899 9.326266 7.788275 18.257332

(between_SS / total_SS = 75.0 %)

Available components:

[1] "cluster" "centers" "totss" "withinss" "tot.withinss"

[6] "betweenss" "size" "iter" "ifault"

> plot(df)
> points(km$centers,col=1:5,pch=8,cex=2)
> print(cnt)

   1    2    3    4    5 
   7   10   10   11   12 

> head(final_data)

               Murder   Assault   UrbanPop         Rape cluster
Alabama    1.24256408 0.7828393 -0.5209066 -0.003416473       1
Alaska     0.50786248 1.1068225 -1.2117642  2.484202941       5
Arizona    0.07163341 1.4788032  0.9989801  1.042878388       5
Arkansas   0.23234938 0.2308680 -1.0735927 -0.184916602       4
California 0.27826823 1.2628144  1.7589234  2.067820292       5
Colorado   0.02571456 0.3988593  0.8608085  1.864967207       5

> plot(final_data,cex=0.6,main="Final Data")

> head(ag)

cluster Murder Assault UrbanPop Rape cluster

1 1 1.5803956 0.9662584 -0.7775109 0.04844071 1

2 2 -1.1727674 -1.2078573 -1.0045069 -1.10202608 2

3 3 -0.6286291 -0.4086988 0.9506200 -0.38883734 3

4 4 -0.1642225 -0.3658283 -0.2822467 -0.11697538 4

5 5 0.7298036 1.1188219 0.7571799 1.32135653 5

> plot(ag,cex=0.6,main="Aggregate")
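Note: a common follow-up is deciding how many clusters to use. The sketch below (not part of the prescribed record) applies the elbow heuristic to the same scaled data, assuming 'df' still holds the scaled USArrests matrix:

# Elbow method sketch: total within-cluster sum of squares for k = 1..10
set.seed(1)
wss <- sapply(1:10, function(k) kmeans(df, centers = k, nstart = 25)$tot.withinss)
plot(1:10, wss, type = "b", pch = 19,
     xlab = "Number of clusters k", ylab = "Total within-cluster SS",
     main = "Elbow Method")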
3.HIERARCHICAL CLUSTERING
#Hierarchical Clustering

library(cluster)

df=USArrests

#remove rows with missing values

df=na.omit(df)

#scale each variable to have a mean 0 and sd of 1

df=scale(df)

head(df)

d=dist(df,method="euclidean")

#complete dendrogram

hc1=hclust(d,method="complete")

plot(hc1,cex=0.6,main="complete dendrogram",hang=-1)

#average dendrogram

hc2=hclust(d,method="average")

plot(hc2,cex=0.6,main="Average Dendrogram",hang=-1)

abline(h=3.0,col="green")

groups=cutree(hc2,k=4)

print(groups)

table(groups)

rect.hclust(hc2,k=4,border="red")

final_data=cbind(df,cluster=groups)

head(final_data)

plot(final_data,cex=0.6,main="Final Data")
OUTPUT:
#Hierarchical Clustering

> #remove rows with missing values

> #scale each variable to have a mean 0 and sd of 1

> head(df)

Murder Assault UrbanPop Rape

Alabama 1.24256408 0.7828393 -0.5209066 -0.003416473

Alaska 0.50786248 1.1068225 -1.2117642 2.484202941

Arizona 0.07163341 1.4788032 0.9989801 1.042878388

Arkansas 0.23234938 0.2308680 -1.0735927 -0.184916602

California 0.27826823 1.2628144 1.7589234 2.067820292

Colorado 0.02571456 0.3988593 0.8608085 1.864967207

> #complete dendrogram

> plot(hc1,cex=0.6,main="complete dendrogram",hang=-1)


> #average dendrogram

> print(groups)

       Alabama         Alaska        Arizona       Arkansas     California 
             1              2              3              4              3 
      Colorado    Connecticut       Delaware        Florida        Georgia 
             3              4              4              3              1 
        Hawaii          Idaho       Illinois        Indiana           Iowa 
             4              4              3              4              4 
        Kansas       Kentucky      Louisiana          Maine       Maryland 
             4              4              1              4              3 
 Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
             4              3              4              1              3 
       Montana       Nebraska         Nevada  New Hampshire     New Jersey 
             4              4              3              4              4 
    New Mexico       New York North Carolina   North Dakota           Ohio 
             3              3              1              4              4 
      Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
             4              4              4              4              1 
  South Dakota      Tennessee          Texas           Utah        Vermont 
             4              1              3              4              4 
      Virginia     Washington  West Virginia      Wisconsin        Wyoming 
             4              4              4              4              4 

groups
 1  2  3  4 
 7  1 12 30 

> rect.hclust(hc2,k=4,border="red")
> final_data=cbind(df,cluster=groups)

> head(final_data)

               Murder   Assault   UrbanPop         Rape cluster
Alabama    1.24256408 0.7828393 -0.5209066 -0.003416473       1
Alaska     0.50786248 1.1068225 -1.2117642  2.484202941       2
Arizona    0.07163341 1.4788032  0.9989801  1.042878388       3
Arkansas   0.23234938 0.2308680 -1.0735927 -0.184916602       4
California 0.27826823 1.2628144  1.7589234  2.067820292       3
Colorado   0.02571456 0.3988593  0.8608085  1.864967207       3

> plot(final_data,cex=0.6,main="Final Data")
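Note: the k=4 cut can also be checked with silhouette widths from the cluster package. A minimal sketch, assuming 'd' (the Euclidean distance matrix) and 'groups' (the cutree result) from above are still available:

# Silhouette check for the k = 4 cut
sil <- silhouette(groups, d)   # silhouette width of each state within its cluster
summary(sil)                   # average width per cluster and overall
plot(sil, main = "Silhouette Plot, k = 4")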
4.CLASSIFICATION ALGORITHM
#Classification Algorithm

library(class)

df=data(iris)

#Number of Rows and Columns

dim(iris)

head(iris)

rand=sample(1:nrow(iris),0.9*nrow(iris))

head(rand)

#Scale the values using Normalization method

nor<-function(x)
{
  return((x-min(x))/(max(x)-min(x)))
}

iris_norm=as.data.frame(lapply(iris[,c(1,2,3,4)],nor))

head(iris_norm)

#Train dataset

iris_train=iris_norm[rand,]

iris_train_target=iris[rand,5]

#Test dataset

iris_test=iris_norm[-rand,]

iris_test_target=iris[-rand,5]

dim(iris_train)

dim(iris_test)

#K-nearest neighbour Classification

model1=knn(train=iris_train,test=iris_test,cl=iris_train_target,k=7)
#Confusion Matrix

tab=table(model1,iris_test_target)

print(tab)

accuracy=function(x)
{
  sum(diag(x)/sum(rowSums(x)))*100
}

cat("Accuracy classifier=",accuracy(tab))
OUTPUT:
#Classification Algorithm

> #Number of Rows and Columns

> dim(iris)

[1] 150 5

> head(iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> head(rand)

[1] 114 107 25 128 14 24

> #Scale the values using Normalization method

> nor<-function(x)

+{

+ return((x-min(x))/(max(x)-min(x)))

+}

> head(iris_norm)

  Sepal.Length Sepal.Width Petal.Length Petal.Width
1   0.22222222   0.6250000   0.06779661  0.04166667
2   0.16666667   0.4166667   0.06779661  0.04166667
3   0.11111111   0.5000000   0.05084746  0.04166667
4   0.08333333   0.4583333   0.08474576  0.04166667
5   0.19444444   0.6666667   0.06779661  0.04166667
6   0.30555556   0.7916667   0.11864407  0.12500000

> #Train dataset

> iris_train=iris_norm[rand,]

> iris_train_target=iris[rand,5]

> #Test dataset

> iris_test=iris_norm[-rand,]

> iris_test_target=iris[-rand,5]

> dim(iris_train)

[1] 135 4

> dim(iris_test)

[1] 15 4

> #K-nearest neighbour Classification

> model1=knn(train=iris_train,test=iris_test,cl=iris_train_target,k=7)

> #Confusion Matrix

> print(tab)

            iris_test_target
model1       setosa versicolor virginica
  setosa          6          0         0
  versicolor      0          6         1
  virginica       0          0         2

> accuracy=function(x)

+ {

+   sum(diag(x)/sum(rowSums(x)))*100

+ }

> cat("Accuracy classifier=",accuracy(tab))

Accuracy classifier= 100


5.DECISION TREE
#Decision Tree

library(rpart)

data=iris

str=data

head(data)

#creating the decision tree using regression

dtree=rpart(Sepal.Width~Sepal.Length+Petal.Width+Petal.Length+Species,data=iris,method="anova")

plot(dtree,uniform=TRUE,main="Sepal Width Decision Tree Using Regression")

print(dtree)

text(dtree,use.n=TRUE,cex=.7)

#predicting the Sepal Width

adata<-data.frame(Species='versicolor',Sepal.Length=5.1,Petal.Length=4.5,Petal.Width=1.4)

cat("Predicted Value:\n")

pt=predict(dtree,adata,method="anova")

print(pt)

plot(pt)

#creating the decision tree using classification

df=as.data.frame(data)

dt=rpart(Sepal.Width~Sepal.Length+Petal.Width+Petal.Length+Species,data=df,method="class")

plot(dt,uniform=TRUE,main="Sepal Width Decision Tree using Classification")

print(dt)

text(dt,use.n=TRUE,cex=.7)
OUTPUT:
> #Decision Tree

> head(data)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> #creating the decision tree using regression

> dtree=rpart(Sepal.Width~Sepal.Length+Petal.Width+Petal.Length+Species,data=iris,method="anova")

> plot(dtree,uniform=TRUE,main="Sepal Width Decision Tree Using Regression")


> print(dtree)

n= 150

node), split, n, deviance, yval

* denotes terminal node

1) root 150 28.3069300 3.057333

2) Species=versicolor,virginica 100 10.9616000 2.872000

4) Petal.Length< 4.05 16 0.7975000 2.487500 *

5) Petal.Length>=4.05 84 7.3480950 2.945238

10) Petal.Width< 1.95 55 3.4920000 2.860000

20) Sepal.Length< 6.35 36 2.5588890 2.805556 *

21) Sepal.Length>=6.35 19 0.6242105 2.963158 *

11) Petal.Width>=1.95 29 2.6986210 3.106897

22) Petal.Length< 5.25 7 0.3285714 2.914286 *

23) Petal.Length>=5.25 22 2.0277270 3.168182 *


3) Species=setosa 50 7.0408000 3.428000

6) Sepal.Length< 5.05 28 2.0496430 3.203571 *

7) Sepal.Length>=5.05 22 1.7859090 3.713636 *

> text(dtree,use.n=TRUE,cex=.7)

> #predicting the Sepal Width

Predicted Value:

2.805556

> plot(pt)
> #creating the decision tree using classification

> plot(dt,uniform=TRUE,main="Sepal Width Decision Tree using Classification")

> print(dt)

n= 150

node), split, n, loss, yval, (yprob)

* denotes terminal node


 1) root 150 124 3 (0.0067 0.02 0.027 0.02 0.053 0.033 0.06 0.093 0.067 0.17 0.073 0.087 0.04 0.08 0.04 0.027 0.02 0.04 0.013 0.0067 0.0067 0.0067 0.0067)
   2) Petal.Width>=0.8 100 80 3 (0.01 0.03 0.03 0.03 0.08 0.05 0.09 0.14 0.09 0.2 0.07 0.08 0.04 0.03 0 0.01 0 0.02 0 0 0 0 0)
     4) Sepal.Length< 6.45 65 55 2.8 (0.015 0.046 0.046 0.046 0.11 0.062 0.14 0.15 0.11 0.14 0.015 0.046 0.031 0.046 0 0 0 0 0 0 0 0 0)
       8) Petal.Width< 1.95 56 47 2.7 (0.018 0.054 0.054 0.054 0.11 0.071 0.16 0.11 0.12 0.16 0.018 0.036 0.018 0.018 0 0 0 0 0 0 0 0 0)
        16) Sepal.Length< 5.55 12 9 2.4 (0.083 0 0.17 0.25 0.25 0.083 0.083 0 0 0.083 0 0 0 0 0 0 0 0 0 0 0 0 0) *
        17) Sepal.Length>=5.55 44 36 2.7 (0 0.068 0.023 0 0.068 0.068 0.18 0.14 0.16 0.18 0.023 0.045 0.023 0.023 0 0 0 0 0 0 0 0 0)
          34) Petal.Width< 1.55 29 23 2.9 (0 0.1 0.034 0 0.069 0.1 0.1 0.17 0.21 0.17 0 0.034 0 0 0 0 0 0 0 0 0 0 0)
            68) Sepal.Length>=5.95 15 11 2.9 (0 0.2 0.067 0 0.067 0.067 0 0.2 0.27 0.067 0 0.067 0 0 0 0 0 0 0 0 0 0 0) *
            69) Sepal.Length< 5.95 14 10 3 (0 0 0 0 0.071 0.14 0.21 0.14 0.14 0.29 0 0 0 0 0 0 0 0 0 0 0 0 0) *
          35) Petal.Width>=1.55 15 10 2.7 (0 0 0 0 0.067 0 0.33 0.067 0.067 0.2 0.067 0.067 0.067 0.067 0 0 0 0 0 0 0 0 0) *
       9) Petal.Width>=1.95 9 5 2.8 (0 0 0 0 0.11 0 0 0.44 0 0 0 0.11 0.11 0.22 0 0 0 0 0 0 0 0 0) *
     5) Sepal.Length>=6.45 35 24 3 (0 0 0 0 0.029 0.029 0 0.11 0.057 0.31 0.17 0.14 0.057 0 0 0.029 0 0.057 0 0 0 0 0) *
   3) Petal.Width< 0.8 50 41 3.4 (0 0 0.02 0 0 0 0 0 0.02 0.12 0.08 0.1 0.04 0.18 0.12 0.06 0.06 0.08 0.04 0.02 0.02 0.02 0.02)
     6) Sepal.Length< 4.95 20 15 3 (0 0 0.05 0 0 0 0 0 0.05 0.25 0.2 0.2 0 0.15 0 0.1 0 0 0 0 0 0 0)
      12) Petal.Length< 1.45 13 8 3 (0 0 0.077 0 0 0 0 0 0.077 0.38 0 0.23 0 0.077 0 0.15 0 0 0 0 0 0 0) *
      13) Petal.Length>=1.45 7 3 3.1 (0 0 0 0 0 0 0 0 0 0 0.57 0.14 0 0.29 0 0 0 0 0 0 0 0 0) *
     7) Sepal.Length>=4.95 30 24 3.4 (0 0 0 0 0 0 0 0 0 0.033 0 0.033 0.067 0.2 0.2 0.033 0.1 0.13 0.067 0.033 0.033 0.033 0.033)
      14) Petal.Length< 1.45 11 7 3.5 (0 0 0 0 0 0 0 0 0 0 0 0.091 0.091 0.091 0.36 0.091 0 0 0.091 0.091 0 0.091 0) *
      15) Petal.Length>=1.45 19 14 3.4 (0 0 0 0 0 0 0 0 0 0.053 0 0 0.053 0.26 0.11 0 0.16 0.21 0.053 0 0.053 0 0.053) *

> text(dt,use.n=TRUE,cex=.7)
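Note: the regression tree can be pruned with rpart's complexity table to avoid overfitting. A minimal sketch, assuming the 'dtree' object fitted above (the cp threshold picked here is only illustrative):

# Pruning sketch: choose the cp with the lowest cross-validated error and prune
printcp(dtree)                                   # cp table with cross-validated error (xerror)
best_cp <- dtree$cptable[which.min(dtree$cptable[, "xerror"]), "CP"]
pruned <- prune(dtree, cp = best_cp)
plot(pruned, uniform=TRUE, main="Pruned Sepal Width Tree")
text(pruned, use.n=TRUE, cex=.7)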
6.LINEAR REGRESSION
#Linear Regression

setwd("D:/R")

df=read.csv("h2.csv",header=TRUE)

print(df)

lr=lm(height~weight,data=df)

print(lr)

#Linear Regression

plot(df$height,df$weight,col="blue",main="Height_Weight
Regression",cex=1.3,pch=15,xlab="height",ylab="weight")

print(summary(lr))

print(residuals(lr))

coeff=coefficients(lr)

eq=paste0("y = ",round(coeff[1],1)," + ",round(coeff[2],1),"*x")

print(eq)

#Linear Equation

new.weights=data.frame(weight=c(60,50))

print(new.weights)

df1=predict(lr,newdata=new.weights)

print(df1)

df2=data.frame(df1,new.weights)

names(df2)=c("height","weight")

print(df2)

df3=rbind(df,df2)

print(df3)

write.csv(df3,"h3.csv")

pie(table(df3$height))
OUTPUT:
> #Linear Regression

> setwd("D:/R")

> df=read.csv("h2.csv",header=TRUE)

> print(df)

  height weight
1    174     80
2    150     70
3    160     75
4    180     85

> lr=lm(height~weight,data=df)

> print(lr)

Call:

lm(formula = height ~ weight, data = df)

Coefficients:
(Intercept)       weight  
       4.80         2.08  

> #Linear Regression

> print(summary(lr))

Call:

lm(formula = height ~ weight, data = df)

Residuals:

1 2 3 4

2.8 -0.4 -0.8 -1.6

Coefficients:
Estimate Std. Error t value Pr(>|t|)

(Intercept) 4.8000 16.4463 0.292 0.7979

weight 2.0800 0.2117 9.827 0.0102 *

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.366 on 2 degrees of freedom

Multiple R-squared: 0.9797, Adjusted R-squared: 0.9696

F-statistic: 96.57 on 1 and 2 DF, p-value: 0.0102

> print(residuals(lr))

1 2 3 4

2.8 -0.4 -0.8 -1.6

> print(eq)

[1] "y = 4.8 + 2.1*x"

> #Linear Equation

> print(new.weights)

weight

1 60

2 50

> print(df1)

1 2

129.6 108.8

> print(df2)

height weight

1 129.6 60

2 108.8 50

> df3=rbind(df,df2)
> print(df3)

   height weight
1   174.0     80
2   150.0     70
3   160.0     75
4   180.0     85
11  129.6     60
21  108.8     50

> write.csv(df3,"h3.csv")

> pie(table(df3$height))
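Note: the fitted line can be drawn over the scatter plot and predictions reported with intervals. A minimal sketch, assuming 'df' (read from h2.csv) and the fitted model 'lr' from above:

# Sketch: overlay the regression line and show prediction intervals
plot(df$weight, df$height, col="blue", pch=15,
     xlab="weight", ylab="height", main="Height vs Weight with Fitted Line")
abline(lr, col="red", lwd=2)          # fitted line: height = 4.8 + 2.08*weight
predict(lr, newdata=data.frame(weight=c(60,50)), interval="prediction")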
7.DATA VISUALIZATION
#Data Visualization

X=iris

dim(X)

summary(X)

head(X)

hist(X$Sepal.Length,main='Histogram',col='green')

barplot(X$Sepal.Length[1:10],main='Barplot',col='red',xlab='Sepal.Length')

pie(table(X$Sepal.Length),main='pie-chart')

pairs(X)

plot(X$Sepal.Length,main='plot-chart',col='blue')

boxplot(X,main='Boxplot',col='yellow')
OUTPUT:

> #Data Visualization

> dim(X)

[1] 150 5

> summary(X)

  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species  
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199                  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800                  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500                  

> head(X)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> hist(X$Sepal.Length,main='Histogram',col='green')

> barplot(X$Sepal.Length[1:10],main='Barplot',col='red',xlab='Sepal.Length')
> pie(table(X$Sepal.Length),main='pie-chart')

> pairs(X)
> plot(X$Sepal.Length,main='plot-chart',col='blue')
> boxplot(X,main='Boxplot',col='yellow')
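Note: two further base-R views (a sketch, using only the iris data already loaded as X) show how grouping by species sharpens the plots:

# Sketch: species-aware views of the same data
plot(X$Petal.Length, X$Petal.Width, col=as.integer(X$Species), pch=19,
     xlab='Petal.Length', ylab='Petal.Width', main='Petal Dimensions by Species')
legend('topleft', legend=levels(X$Species), col=1:3, pch=19)
boxplot(Sepal.Length~Species, data=X, col='yellow', main='Sepal Length by Species')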
