
Bi 5

Uploaded by

sifovec135
Copyright
© © All Rights Reserved

Business Intelligence and Data Analytics
Lab Manual

Practical No. 8

Aim : Perform the data clustering using a clustering algorithm.

> require(graphics)

> # a 2-dimensional example
> x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
+            matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
> colnames(x) <- c("x", "y")
> (cl <- kmeans(x, 2))
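Because the example draws its data with rnorm() and kmeans() picks random starting centers, the numbers printed in the manual change on every run. A minimal sketch of one way to make a run repeatable is shown here. The helper name make_data and the seed value 42 are assumptions made for illustration and are not part of the manual.

```r
# Hypothetical helper: builds the same kind of two-blob data set as the example.
make_data <- function() {
  x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
  colnames(x) <- c("x", "y")
  x
}

set.seed(42)            # arbitrary seed, chosen only for illustration
x  <- make_data()
cl <- kmeans(x, 2)

set.seed(42)            # same seed again: same data and same starting centers
x2  <- make_data()
cl2 <- kmeans(x2, 2)

print(identical(x, x2))                    # data sets match
print(identical(cl$cluster, cl2$cluster))  # cluster labels match
print(cl$size)                             # cluster sizes for this seed
```

With a different seed, or with no seed at all, the cluster sizes and centers will generally differ from the ones printed in the manual.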
K-means clustering with 2 clusters of sizes 51, 49

Cluster means:
           x           y
1 0.02623258 -0.05595237
2 0.96460484  1.00834326

Clustering vector:
[vector of 100 cluster labels, each 1 or 2; illegible in the scan]

Within cluster sum of squares by cluster:
[1] 10.683124  9.464926
 (between_SS / total_SS =  71.4 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"

K-means clustering with 2 clusters of sizes 50, 50

> plot(x, col = cl$cluster)

> points(cl$centers, col = 1:2, pch = 8, cex = 2)

> # sum of squares

> ss <- function(x) sum(scale(x, scale = FALSE)^2)

> ## cluster centers "fitted" to each obs.:

> fitted.x <- fitted(cl); head(fitted.x)

           x           y
1 0.02623258 -0.05595237
1 0.02623258 -0.05595237
1 0.02623258 -0.05595237
1 0.02623258 -0.05595237
1 0.02623258 -0.05595237
1 0.02623258 -0.05595237
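Every row of the fitted output is identical because fitted() on a kmeans object returns, for each observation, the center of the cluster it was assigned to. A small sketch of the same lookup done by hand follows. The seed and the names by.hand and resid.x are illustrative assumptions.

```r
set.seed(1)  # arbitrary seed, only to make this sketch repeatable
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
cl <- kmeans(x, 2)

# fitted() replaces each observation by the center of its own cluster ...
fitted.x <- fitted(cl)

# ... which is the same as indexing the centers matrix by the cluster labels.
by.hand <- cl$centers[cl$cluster, ]
print(all.equal(fitted.x, by.hand, check.attributes = FALSE))

# Residuals are the offsets of each point from its own cluster center.
resid.x <- x - fitted.x
print(head(resid.x))
```

The row labels in head(fitted.x) are cluster numbers rather than observation numbers, which is why the same label repeats down the first column.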
TechKnowledge Publications
> resid.x <- x - fitted(cl)

> ## Equalities:
> cbind(cl[c("betweenss", "tot.withinss", "totss")], # the same two columns
+       c(ss(fitted.x), ss(resid.x), ss(x)))

> stopifnot(all.equal(cl$totss, ss(x)),
+           all.equal(cl$tot.withinss, ss(resid.x)),
+           ## these three are the same:
+           all.equal(cl$betweenss, ss(fitted.x)),
+           all.equal(cl$betweenss, cl$totss - cl$tot.withinss),
+           ## and hence also
+           all.equal(ss(x), ss(fitted.x) + ss(resid.x))
+ )

> kmeans(x, 1)$withinss # trivial one-cluster, (its W.SS == ss(x))

> ## random starts do help here with too many clusters
> ## (and are often recommended anyway!):

> (cl <- kmeans(x, 6, nstart = 29))

K-means clustering with 6 clusters of sizes 15, 16, 13, 21, 14, 21

Cluster means:
           x          y
1  1.0040085  1.3382030
2 -0.1500777 -0.4181972
3  1.3430807  0.8501465
4  0.7021648  0.8706605
5 -0.2576955  0.1991452
6  0.3498496  0.0499787

Clustering vector:
[vector of 100 cluster labels, each 1 to 6; illegible in the scan]

Within cluster sum of squares by cluster:
[1] 0.9875594 0.8958093 0.9379164 1.8520779 1.0357106 1.6801028
 (between_SS / total_SS =  89.5 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"

> plot(x, col = cl$cluster)
> points(cl$centers, col = 1:6, pch = 8)
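Splitting the same data into more clusters always lowers the total within-cluster sum of squares, so a higher between_SS / total_SS share on its own does not show that six clusters describe the data better than two. The sketch below tabulates tot.withinss for k = 1 to 6. The seed, the choice of nstart = 25, and the names ks, wss and between.share are assumptions for illustration.

```r
set.seed(1)  # arbitrary seed, only to make this sketch repeatable
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")

ks <- 1:6
# Total within-cluster sum of squares for each number of clusters,
# using several random starts per k to avoid poor local optima.
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# With one cluster, tot.withinss equals the total sum of squares, so
# 1 - wss / wss[1] is the between_SS / total_SS share reported by print().
between.share <- 1 - wss / wss[1]

print(round(data.frame(k = ks, tot.withinss = wss,
                       between.share = between.share), 3))
```

For data built from two blobs, one would expect the largest drop between k = 1 and k = 2 and only small gains afterwards, which is the usual informal argument for stopping at two clusters.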
Practical No. 9

Aim : Perform the linear regression on the given data warehouse data.

In linear regression, two variables are related through an equation in which the exponent (power) of both variables is 1. Mathematically, a linear relationship represents a straight line when plotted as a graph. A non-linear relationship, where the exponent of any variable is not equal to 1, creates a curve.

y = ax + b is an equation for linear regression.

Where, y is the response variable, x is the predictor variable, and a and b are constants which are called the coefficients.

A simple example of regression is predicting the weight of a person when his height is known. To do this we need to have the relationship between height and weight of a person.

The steps to create the relationship are:
- Carry out the experiment of gathering a sample of observed values of height and corresponding weight.
- Create a relationship model using the lm() function in R.
- Find the coefficients from the model created and create the mathematical equation using these.
- Get a summary of the relationship model to know the average error in prediction, also called residuals.
- To predict the weight of new persons, use the predict() function in R.

Create Relationship Model & get the Coefficients

> x <- c(141, 175, 139, 186, 125, 146, 199, 183, 162, 121)
> y <- c(93, 84, 56, 81, 57, 47, 86, 71, 61, 49)
> relation <- lm(y~x)
> print(relation)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
     8.1473       0.3827

Get the Summary of the Relationship

> print(summary(relation))

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-17.022  -6.750  -2.164   1.688  30.891

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   8.1473    27.0454   0.301   0.7709
x             0.3827     0.1693   2.261   0.0536 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 13.8 on 8 degrees of freedom
Multiple R-squared:  0.3899,    Adjusted R-squared:  0.3136
F-statistic: 5.113 on 1 and 8 DF,  p-value: 0.05363

Predict the weight of new persons

> # The predictor vector.
> x <- c(141, 175, 139, 186, 125, 146, 199, 183, 162, 121)
> # The response vector.
> y <- c(93, 84, 56, 81, 57, 47, 86, 71, 61, 49)
> # Apply the lm() function.
> relation <- lm(y~x)
> # Find weight of a person with height 170.
> a <- data.frame(x = 170)
> result <- predict(relation, a)
> print(result)
       1
73.20728

Visualize the Regression Graphically

> # Create the predictor and response variable.
> x <- c(141, 175, 139, 186, 125, 146, 199, 183, 162, 121)
> y <- c(93, 84, 56, 81, 57, 47, 86, 71, 61, 49)
> relation <- lm(y~x)
> # Give the chart file a name.
> png(file = "linearregression.png")
> # Plot the chart.
> plot(y, x, col = "blue", main = "Height & Weight Regression",
+      cex = 1.3, pch = 16, xlab = "Weight in Kg", ylab = "Height in cm")
> # Add the regression line for the plotted orientation (height against weight).
> abline(lm(x~y))
> # Save the file.
> dev.off()
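The coefficients reported by lm() for y = ax + b can be cross-checked with the closed-form least-squares formulas, a = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2) and b = mean(y) - a * mean(x). The sketch below does this for the manual's height and weight data. The names slope, intercept and weight.170 are illustrative and are not used in the manual.

```r
# Heights (predictor) and weights (response) from the manual's example.
x <- c(141, 175, 139, 186, 125, 146, 199, 183, 162, 121)
y <- c(93, 84, 56, 81, 57, 47, 86, 71, 61, 49)

# Closed-form least-squares estimates for y = a*x + b.
slope     <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)

# The same model fitted with lm(), for comparison.
relation <- lm(y ~ x)
print(c(intercept = intercept, slope = slope))
print(coef(relation))

# Prediction for a height of 170 using the equation directly.
weight.170 <- slope * 170 + intercept
print(weight.170)
```

Both routes should agree with the printed coefficients (about 8.1473 and 0.3827) and with the predicted weight of about 73.2 for a height of 170.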
[Figure: scatter plot titled "Height & Weight Regression", with "Weight in Kg" on the x-axis, "Height in cm" on the y-axis, and the regression line drawn through the points]
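A single predicted weight hides how uncertain the model is. With only ten observations and a residual standard error of 13.8, the range of plausible weights for a new person is wide. As a sketch, predict() can also return a prediction interval through its interval argument. The 95% level and the names new.person and pred are assumptions for illustration.

```r
# Heights (predictor) and weights (response) from the manual's example.
x <- c(141, 175, 139, 186, 125, 146, 199, 183, 162, 121)
y <- c(93, 84, 56, 81, 57, 47, 86, 71, 61, 49)
relation <- lm(y ~ x)

# Point prediction plus a 95% prediction interval for a height of 170.
new.person <- data.frame(x = 170)
pred <- predict(relation, new.person, interval = "prediction", level = 0.95)
print(pred)   # columns: fit (point prediction), lwr and upr (interval bounds)

# Residual standard error, the "average error" reported in the summary.
print(summary(relation)$sigma)
```

The fit column repeats the point prediction from predict(relation, a), while lwr and upr bound the weight of one new person at that height under the usual linear-model assumptions.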
Practical No. 10

Aim : Perform the logistic regression on the given data warehouse data.

The in-built data set "mtcars" describes different models of a car with their various engine specifications. In the "mtcars" data set, the transmission mode (automatic or manual) is described by the column am, which is a binary value (0 or 1). We can create a logistic regression model between the column "am" and 3 other columns: hp, wt and cyl.

> # Select some columns from mtcars.
> input <- mtcars[, c("am", "cyl", "hp", "wt")]
> print(head(input))
                  am cyl  hp    wt
Mazda RX4          1   6 110 2.620
Mazda RX4 Wag      1   6 110 2.875
Datsun 710         1   4  93 2.320
Hornet 4 Drive     0   6 110 3.215
Hornet Sportabout  0   8 175 3.440
Valiant            0   6 105 3.460

Create Regression Model

We use the glm() function to create the regression model and get its summary for analysis.

> input <- mtcars[, c("am", "cyl", "hp", "wt")]
> am.data <- glm(formula = am ~ cyl + hp + wt, data = input, family = binomial)
> print(summary(am.data))

Call:
glm(formula = am ~ cyl + hp + wt, family = binomial, data = input)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.17272  -0.14907  -0.01464   0.14116   1.27641

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 19.70288    8.11637   2.428   0.0152 *
cyl          0.48760    1.07162   0.455   0.6491
hp           0.03259    0.01886   1.728   0.0840 .
wt          -9.14947    4.15332  -2.203   0.0276 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.2297  on 31  degrees of freedom
Residual deviance:  9.8415  on 28  degrees of freedom
AIC: 17.841

Number of Fisher Scoring iterations: 8

In the summary, as the p-value in the last column is more than 0.05 for the variables "cyl" and "hp", we consider them to be insignificant in contributing to the value of the variable "am". Only weight (wt) impacts the "am" value in this regression model.
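The summary only reports coefficients on the log-odds scale. As a rough sketch of how the fitted model could be used, the code below converts the fit into predicted probabilities of a manual transmission (am = 1) and compares the resulting classes with the observed values. The 0.5 cut-off and the names prob, pred.class, confusion, accuracy and odds.ratio are assumptions for illustration, and the comparison uses the same 32 cars the model was trained on, so it says nothing about performance on new data.

```r
# Same model as in the practical, fitted on the built-in mtcars data.
input   <- mtcars[, c("am", "cyl", "hp", "wt")]
am.data <- glm(am ~ cyl + hp + wt, data = input, family = binomial)

# Predicted probability that each car has a manual transmission (am = 1).
prob <- predict(am.data, type = "response")

# Turn probabilities into classes with an assumed 0.5 cut-off.
pred.class <- ifelse(prob > 0.5, 1, 0)

# Cross-tabulate observed against predicted transmission type.
confusion <- table(actual = input$am, predicted = pred.class)
print(confusion)

# Share of cars classified correctly (in-sample).
accuracy <- mean(pred.class == input$am)
print(accuracy)

# Coefficients as odds ratios: exp(coef) is the multiplicative change in the
# odds of am = 1 for a one-unit increase in each predictor.
odds.ratio <- exp(coef(am.data))
print(odds.ratio)
```

The negative wt coefficient gives an odds ratio far below 1, which matches the conclusion that heavier cars are much less likely to have a manual transmission in this data set.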
