Data Science Lab Manual


Uploaded by

mmrmathsiubd


MUFFAKHAM JAH COLLEGE OF ENGINEERING AND TECHNOLOGY

(Affiliated to Osmania University and Recognized by AICTE)

Mount Pleasant, 8-2-249, Road No. 3, Banjara Hills, Hyderabad, Telangana-500034.

DEPARTMENT OF COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CS&AI)

DATA SCIENCE LAB (PC453AD)

LAB MANUAL
B.E IV SEM (2021-2022)
INDEX
S.NO. | DATA SCIENCE LAB – LIST OF PROGRAMS | CO
1 | Write an R program for a calculator application | CO2
2 | Write an R program for performing descriptive statistics | CO1
  | a) Using summary() | CO1
  | b) Using subset() | CO1
3 | Write an R program for reading and writing different types of data sets | CO2
  | a) Reading different types of data sets (.txt, .csv) from the web and disk, and writing to a specific disk location |
  | b) Reading an Excel data set in R |
4 | Write an R program for visualizations | CO2
5 | Write an R program to find correlation and covariance | CO2
6 | Write an R program for regression modeling | CO2
7 | Write an R program to build a classification model using the KNN algorithm | CO3
8 | Write an R program to build a clustering model using the K-means algorithm | CO3

BEYOND SYLLABUS PROGRAMS

9 | Write an R program to read an XML file | CO2

Experiment – 1
Calculator Application
R operators – R has many operators for carrying out mathematical and logical operations. Operators in R can mainly be classified into the following categories:
1. Arithmetic operators – +, -, *, /, ^ (exponent), %% (modulus), %/% (integer division)
2. Assignment operators – <-, ->, =
3. Relational operators – <, >, ==, !=, <=, >=
4. Logical operators – !, &, &&, |, ||
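A short console sketch of the operators listed above (the values 7 and 3 are arbitrary). Multiplication is written `*` in R, and `&`/`|` work element by element on vectors, while `&&`/`||` expect single values and stop as soon as the result is known.

```r
x <- 7
y <- 3
x + y      # addition: 10
x - y      # subtraction: 4
x * y      # multiplication uses *, not x: 21
x / y      # division
x ^ 2      # exponentiation: 49
x %% y     # remainder (modulus): 1
x %/% y    # integer division: 2
# & and | compare vectors element by element
c(TRUE, FALSE) & c(TRUE, TRUE)   # TRUE FALSE
# && and || compare single values
(x > 5) && (y > 5)               # FALSE
```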

We use the four fundamental arithmetic operations of mathematics to build the calculator application. These operations are –
1. Addition
2. Subtraction
3. Multiplication
4. Division

User-defined Functions in R – In R programming, user-defined functions are functions created by the user for a specific purpose that the built-in functions of R do not provide.
Syntax –
functionName <- function(arguments) {
  commands to perform
}
Parameters –
functionName: the name given to the function
arguments: the input variables the function accepts
commands to perform: the block of code (the function body) is written here.
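As a sketch of the syntax above, the hypothetical function `power()` below (not part of the lab program) hands its result back with `return()` instead of printing it, and gives the second argument a default value. Returning a value lets the caller decide whether to print it or store it for later use.

```r
# Hypothetical example function, not part of the original program
power <- function(base, exponent = 2) {
  result <- base ^ exponent
  return(result)   # give the value back to the caller without printing it
}
power(3)                 # uses the default exponent: 9
power(2, 10)             # 1024
squares <- power(1:5)    # works element-wise on vectors: 1 4 9 16 25
```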

1. Aim: To implement Calculator Application in R

a. Using with and without R objects on console

b. Using mathematical functions on console
c. Write an R script to create an R object for the calculator application and save it in a specified location on disk.

Program:

# Arithmetic on the console without R objects
1+2
3-1
4*2
5*2
# Arithmetic using R objects (variables)
a<-1
b<-4
c<-2
a+b
a-b
a*b
b/c
# User-defined functions, one per operation
add<-function(x,y)
{
  print(x+y)
}
add(2,3)
subt<-function(x,y)
{
  print(x-y)
}
subt(7,2)
mul<-function(x,y)
{
  print(x*y)
}
mul(6,3)
div<-function(x,y)
{
  print(x/y)
}
div(10,2)
# Interactive calculator: readline() waits for input only in an interactive R session
choice=readline(prompt="Enter add for addition
subt for subtraction
mul for multiplication
div for division
Choice: ")
num1=readline(prompt = "Enter first number : ")
num2=readline(prompt = "Enter second number : ")
# as.numeric() also accepts decimal input such as 2.5
num1=as.numeric(num1)
num2=as.numeric(num2)
cal<-switch(choice,"add"=print(num1+num2),
            "subt"=print(num1-num2),
            "mul"=print(num1*num2),
            "div"=print(num1/num2),
            print("Invalid choice"))
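Aim (c) asks for the calculator to be kept as an R object saved on disk, which the program above does not show. One possible sketch, assuming the calculator is stored as a named list of functions (a hypothetical structure) and written with `saveRDS()`; a temporary path stands in for the required disk location (for example a path on drive D:).

```r
# Hypothetical calculator object: a named list of functions
calculator <- list(
  add  = function(x, y) x + y,
  subt = function(x, y) x - y,
  mul  = function(x, y) x * y,
  div  = function(x, y) x / y
)
# Placeholder location; replace with the required disk path
path <- file.path(tempdir(), "calculator.rds")
saveRDS(calculator, path)

# Later, or in another R session: read the object back and use it
calc <- readRDS(path)
calc$mul(6, 3)
calc[["div"]](10, 2)
```

`saveRDS()` stores a single object and lets the caller choose its name when reading it back; `save()`/`load()` would instead restore the object under its original name.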

Output –

Experiment – 2
Descriptive Analysis
Dataset – mtcars
Description – The mtcars dataset is a built-in dataset in R that contains measurements on 11
aspects of automobile design and performance for 32 cars. The data was extracted from the
1974 Motor Trend US magazine.
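Attributes –
1. mpg – miles per (US) gallon
2. cyl – number of cylinders
3. disp – displacement (cu. in.)
4. hp – gross horsepower
5. drat – rear axle ratio
6. wt – weight (1000 lbs)
7. qsec – 1/4 mile time
8. vs – engine shape (0 = V-shaped, 1 = straight)
9. am – transmission (0 = automatic, 1 = manual)
10. gear – number of forward gears
11. carb – number of carburetors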

Dataset – cars
Description – This dataset contains 50 observations of 2 variables: the speed of cars and the distances taken to stop (recorded in the 1920s).
Attributes –
1. speed – speed (mph)
2. dist – stopping distance (ft)
Dataset – iris
Description – The data set contains 3 classes of 50 instances each, where each class refers to
a type of iris plant. One class is linearly separable from the other 2; the latter are NOT
linearly separable from each other.

Attributes -

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. species:
-- Iris Setosa
-- Iris Versicolour
-- Iris Virginica

Subset function –

The subset() function in R is used to create a subset of vectors, matrices or data frames.

Syntax – subset(x, subset, select)

Parameters –

• x: the object to be subsetted
• subset: the logical expression that decides which rows (or elements) are kept
• select: the columns to keep
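A minimal sketch of subset() on the built-in mtcars data; the conditions and selected columns are illustrative choices.

```r
data(mtcars)
# Rows where mpg exceeds 25, keeping only three columns
fuel_efficient <- subset(mtcars, mpg > 25, select = c(mpg, cyl, wt))
fuel_efficient
# Conditions can be combined with & and |; select also accepts a column range
subset(mtcars, cyl == 4 & am == 1, select = mpg:hp)
```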
Aggregate function –
The aggregate() function splits the data into groups and computes a summary statistic for each group, so it is often used to derive descriptive statistics.

Syntax – aggregate(x, by, FUN, …, simplify = TRUE, drop = TRUE)
(formula interface: aggregate(formula, data, FUN), e.g. aggregate(. ~ Species, data = iris, mean))

Parameters –
• x: an R object (e.g. a data frame)
• by: a list of grouping variables
• FUN: the function applied to compute the summary statistic
• ...: additional arguments passed to FUN
• simplify: whether to simplify the results as much as possible
• drop: whether to drop unused combinations of grouping values
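A sketch of both aggregate() interfaces on mtcars (grouping by cyl and am is an illustrative choice): the data-frame form with x, by and FUN, and the formula form that the iris program in this experiment uses.

```r
data(mtcars)
# Data-frame interface: mean mpg for each number of cylinders
by_cyl <- aggregate(x = mtcars["mpg"], by = list(cyl = mtcars$cyl), FUN = mean)
by_cyl
# Formula interface: median mpg for each cylinder/transmission combination
aggregate(mpg ~ cyl + am, data = mtcars, FUN = median)
```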
mean() function – Calculates the arithmetic mean of all the observations of the specified attribute.
min() function – Returns the smallest observation in the data.
max() function – Returns the largest observation in the data.
summary() function – Summarizes each attribute separately. For numeric attributes it reports the minimum, 1st quartile, median, mean, 3rd quartile and maximum.
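These functions, together with quantile() for the quartiles, can be checked against each other on a single column. A short sketch on mtcars$mpg:

```r
data(mtcars)
mpg <- mtcars$mpg
mean(mpg)        # arithmetic mean
median(mpg)      # middle value
range(mpg)       # min() and max() together
quantile(mpg)    # 0%, 25% (1st quartile), 50% (median), 75% (3rd quartile), 100%
IQR(mpg)         # 3rd quartile minus 1st quartile
summary(mpg)     # the quartiles plus the mean
```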

2. Aim: To perform Descriptive Statistics in R

a. To write an R script to find basic descriptive statistics using the summary(), str() and quantile() functions on mtcars.
b. To apply the above functions on the cars dataset.
c. To apply the subset() and aggregate() functions on the iris dataset.

Datasets used:
1. mtcars
2. cars
3. iris

Program :

a. Descriptive Statistics Analysis on mtcars dataset

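data(mtcars)
head(mtcars)          # first 6 rows
tail(mtcars)          # last 6 rows
head(mtcars,10)       # first 10 rows
str(mtcars)           # structure: type of each column
mtcars[1]             # first column (mpg) as a data frame
mtcars[15,]           # 15th row (mtcars has only 11 columns, so mtcars[15] fails)
mtcars[1:4]           # columns 1 to 4
mtcars[c(1,4)]        # columns 1 and 4
mtcars[-2]            # all columns except the 2nd
max(mtcars$cyl)
min(mtcars$mpg)
mean(mtcars$mpg)
median(mtcars$mpg)
quantile(mtcars$mpg)  # quartiles
summary(mtcars)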

Output:

b. Descriptive Statistics Analysis on cars dataset

data(cars)
head(cars,10)
tail(cars,20)
str(cars)
head(cars)
max(cars)
max(cars$speed)
min(cars$speed)
mean(cars$speed)
median(cars$speed)
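mode(cars$speed)                      # storage mode ("numeric"), not the statistical mode
names(which.max(table(cars$speed)))   # statistical mode: the most frequent speed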
summary(cars$speed)
summary(cars)

Output:

c. Applying subset and aggregate functions on iris dataset

data(iris)
head(iris)
tail(iris)
subset(iris,Sepal.Length==6.1)
aggregate(.~Species,data=iris,mean)

Output –

Experiment – 3
Reading and writing different types of data
Package used – readxl

The readxl package makes it easy to get data out of Excel and into R. Compared to
many of the existing packages, readxl has no external dependencies, so it's easy to install and
use on all operating systems. It is designed to work with tabular data.
Functions for Reading Data into R –

There are a few very useful functions for reading data into R.

1. read.table(), read.csv() and read.delim() are popular functions for reading tabular data into R; read.csv() expects comma-separated values and read.delim() tab-separated values by default.
2. readLines() reads lines from a text file.
3. source() reads in and runs R code from a file, for example another R program.
4. dget() reads an R object that was written as text by dput().
5. load() reads in saved workspaces (.RData files written by save()).
6. unserialize() reads a single R object stored in binary format.

Functions for Writing Data to Files –

There are similar functions for writing data to files:

1. write.table() writes tabular data to text files (e.g. CSV).
2. write.csv() is a convenience wrapper around write.table() for comma-separated files.
3. writeLines() writes character data line-by-line to a file or connection.
4. dump() writes a textual representation of multiple R objects.
5. dput() writes a textual representation of a single R object.
6. serialize() converts an R object into a binary format for output to a connection.
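A sketch of the reading and writing pairs listed above, round-tripping a small data frame (made-up values) through text and binary formats. Files go to tempfile() paths so nothing is written to the working directory.

```r
# Illustrative data frame (made-up values)
st <- data.frame(id = 1:3, name = c("a", "b", "c"), marks = c(20, 30, 40))

# dput() / dget(): text representation of a single object
obj_file <- tempfile(fileext = ".txt")
dput(st, file = obj_file)
st_dget <- dget(obj_file)

# dump() / source(): text representation of named objects, re-created by source()
dump_file <- tempfile(fileext = ".R")
dump("st", file = dump_file)
e <- new.env()
source(dump_file, local = e)   # re-creates st inside e, leaving the original untouched

# writeLines() / readLines(): character data, one element per line
txt_file <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), txt_file)
lines_back <- readLines(txt_file)

# save() / load(): objects in R's binary .RData format
rdata_file <- tempfile(fileext = ".RData")
save(st, file = rdata_file)
e2 <- new.env()
load(rdata_file, envir = e2)

# serialize() / unserialize(): binary representation (kept here in a raw vector)
raw_bytes <- serialize(st, connection = NULL)
st_bin <- unserialize(raw_bytes)
```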

3. Aim: To read and write different types of datasets

a. To read different types of datasets from the web and disk, and write them to a file in a specific disk location.
b. To read Excel data sheet in R.

name=c("a","b","c","d","e")
marks=c(20,30,40,10,15)
id=c(1:5)
st=data.frame(id,name,marks)
View(st)

#1. writing data frame into CSV file

write.csv(st,"student.csv",row.names=FALSE)

#2. reading CSV file

st1=read.csv("student.csv")
View(st1)

#3. writing data frame to a tab-separated text file

write.table(st1,file="st1.txt",sep="\t",quote=F,row.names=F)

#4. reading from text (read.delim() expects tab-separated values)

st2=read.delim('st1.txt')
View(st2)

#5. reading a file from web

webfile = read.delim("http://www.sthda.com/upload/boxplot_format.txt")
print(webfile)
head(webfile)
write.table(webfile,file="webfile.txt",quote=F,row.names=FALSE)

# install package readxl first

install.packages("readxl")
library(readxl)

#6. reading excel datasheet

df=read_excel("d:/ex1.xlsx",sheet=2)
View(df)

Output:
> name=c("a","b","c","d","e")
> marks=c(20,30,40,10,15)
> id=c(1:5)
> st=data.frame(id,name,marks)
> View(st)
> #1. writing data frame into CSV file
> write.csv(student,"student.csv",row.names=FALSE)

> #2. reading CSV file

> st1=read.csv("student.csv")
> View(st1)

> #3.writing data frame to a text file

> write.table(st1,file="st1.txt",quote=F,row.names=F)
>
> #4. reading from text
> st2=read.delim('st1.txt')
> View(st2)

#5. reading a file from web

> webfile = read.delim("http://www.sthda.com/upload/boxplot_format.txt")
> print(webfile)

Nom variable Group
1 IND1 10 A
2 IND2 7 A
3 IND3 20 A
4 IND4 14 A
5 IND5 14 A
6 IND6 12 A
7 IND7 10 A
8 IND8 23 A
9 IND9 17 A
10 IND10 20 A
11 IND11 14 A
12 IND12 13 A
13 IND13 11 B
14 IND14 17 B
15 IND15 21 B
16 IND16 11 B
17 IND17 16 B
18 IND18 14 B
19 IND19 17 B
20 IND20 17 B
21 IND21 19 B
22 IND22 21 B
23 IND23 7 B
24 IND24 13 B
25 IND25 0 C
26 IND26 1 C
27 IND27 7 C
28 IND28 2 C
29 IND29 3 C
30 IND30 1 C
31 IND31 2 C
32 IND32 1 C
33 IND33 3 C
34 IND34 0 C
35 IND35 1 C
36 IND36 4 C
37 IND37 3 D
38 IND38 5 D
39 IND39 12 D
40 IND40 6 D
41 IND41 4 D
42 IND42 3 D
43 IND43 5 D
44 IND44 5 D
45 IND45 5 D
46 IND46 5 D
47 IND47 2 D
48 IND48 4 D
49 IND49 3 E
50 IND50 5 E
51 IND51 3 E
52 IND52 5 E
53 IND53 3 E
54 IND54 6 E
55 IND55 1 E
56 IND56 1 E
57 IND57 3 E
58 IND58 2 E
59 IND59 6 E
60 IND60 4 E
61 IND61 11 F
62 IND62 9 F
63 IND63 15 F
64 IND64 22 F
65 IND65 15 F
66 IND66 16 F
67 IND67 13 F
68 IND68 10 F
69 IND69 26 F
70 IND70 26 F
71 IND71 24 F
72 IND72 13 F
> head(webfile)
Nom variable Group
1 IND1 10 A
2 IND2 7 A
3 IND3 20 A
4 IND4 14 A
5 IND5 14 A
6 IND6 12 A
> write.table(webfile,file="webfile.txt",quote=F,row.names=FALSE)
>
> # install package readxl first
>
> install.packages("readxl")
Error in install.packages : Updating loaded packages
> library(readxl)
>
> #6. reading excel datasheet
> df=read_excel("d:/ex1.xlsx",sheet=2)
> View(df)

Experiment – 4
Visualization
Data visualization is an efficient technique for gaining insight about data through a
visual medium. With the help of visualization techniques, we can easily obtain information
about hidden patterns in data and also we can work with large datasets to efficiently obtain
key insights.
Dataset used – mtcars
Description – The mtcars dataset is a built-in dataset in R that contains measurements on 11
aspects of automobile design and performance for 32 cars. The data was extracted from the
1974 Motor Trend US magazine.
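Attributes –
1. mpg – miles per (US) gallon
2. cyl – number of cylinders
3. disp – displacement (cu. in.)
4. hp – gross horsepower
5. drat – rear axle ratio
6. wt – weight (1000 lbs)
7. qsec – 1/4 mile time
8. vs – engine shape (0 = V-shaped, 1 = straight)
9. am – transmission (0 = automatic, 1 = manual)
10. gear – number of forward gears
11. carb – number of carburetors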

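Package - ggplot2
ggplot2 lets us create graphics declaratively, based on the grammar of graphics. The package is known for its elegant, high-quality graphs, which set it apart from other visualization packages. (The program in this experiment installs ggplot2, but draws its plots with base R graphics functions.)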
Boxplot – The boxplot() function creates a box plot, which shows how the values in a data set are distributed. The graph represents the minimum, first quartile, median, third quartile and maximum of the data; by default the whiskers extend at most 1.5 times the box length (roughly the interquartile range) beyond the box, and points further out are drawn individually.
Syntax – boxplot(x, data, names, main)

parameters –
• x is a vector or a formula.
• data is the data frame.
• names are the group labels printed under each box.
• main gives the title of the graph.

Scatterplot –
Scatter plots are used to compare two variables, for example to see how much one variable is affected by another.
Syntax – plot(x, y, main, xlab, ylab)
Parameters –
• x is the data whose values are the horizontal coordinates.
• y is the data whose values are the vertical coordinates.
• main is the title of the graph.
• xlab is the label of the horizontal axis.
• ylab is the label of the vertical axis.

Outliers using plots –

An outlier is a point or set of points that is very different from the other points, either much higher or much lower. It is often a good idea to detect outliers and decide whether to remove them, because outliers are one of the primary reasons for a less accurate model. Outliers can often be seen in visualizations such as a box plot.
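Box plots flag outliers with a fixed rule: points more than 1.5 times the box length beyond the box. boxplot.stats() returns those points directly. A sketch with a made-up vector containing one extreme value; it also applies the same rule to the vector v from the outlier program of this experiment, where no value lies beyond the whiskers, so that plot may show no separate outlier points.

```r
x <- c(12, 14, 15, 15, 16, 18, 19, 60)   # made-up data with one extreme value
stats <- boxplot.stats(x)
stats$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out     # values beyond the whiskers, drawn as separate points
boxplot(x, main = "outliers")

# The same rule applied to the vector v from the outlier program
v <- c(50, 25, 30, 12, 78, 99)
boxplot.stats(v)$out   # empty: no value is far enough from the box
```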
R Histogram
A histogram is a type of bar chart that shows how many values fall into each of a set of value ranges (bins); it is used to show the distribution of a variable. R provides the hist() function for creating histograms.
Syntax - hist(v, main, xlab, col)
Parameters –
• v is a vector containing the numeric values used in the histogram.
• main indicates the title of the chart.
• xlab gives the description of the x-axis.
• col sets the colour of the bars.
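hist() also returns the bin boundaries and counts. A sketch using the H vector from the histogram program in this experiment, with explicit breaks of width 20 (an illustrative choice) and plot = FALSE so the counts can be inspected before drawing:

```r
H <- c(9, 13, 28, 36, 4, 54, 99, 98)
h <- hist(H, breaks = seq(0, 100, by = 20), plot = FALSE)
h$breaks   # bin boundaries: 0 20 40 60 80 100
h$counts   # values per bin (bins are right-closed; the lowest bin includes 0)
plot(h, main = "Histogram", col = "blue")
```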

R Bar Charts
A bar chart is a pictorial representation in which numerical values of variables are
represented by length or height of lines or rectangles of equal width. R provides the
barplot() function.
Syntax – barplot(H, xlab, ylab, main, names.arg, col)

Parameters –
• H is a vector or matrix containing the numeric values used in the bar chart.
• xlab is the label for the x-axis.
• ylab is the label for the y-axis.
• main is the title of the bar chart.
• names.arg is a vector of names appearing under each bar.
• col gives colours to the bars in the graph.
R Pie Charts
A pie chart represents values as slices of a circle, each drawn in a different colour. Pie charts are created with the pie() function, which takes a vector of positive numbers as input.
Syntax - pie(x, labels, main, col)
Parameters –
• x is a vector containing the numeric values used in the pie chart.
• labels gives a description to each slice.
• main indicates the title of the chart.
• col indicates the colour palette.
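A common refinement is to show each slice's percentage in its label. A sketch reusing the subject marks from the pie chart program in this experiment (the label format and the rainbow() palette are illustrative choices):

```r
h <- c(90, 78, 80, 25)
m <- c("OS", "DBMS", "Java", "OE")
pct <- round(100 * h / sum(h), 1)        # share of each slice, one decimal place
labels <- paste0(m, " (", pct, "%)")
pie(h, labels = labels, main = "piechart", col = rainbow(length(h)))
```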

4. Aim: To perform visualizations
a. To find the data distribution using box and scatter plot
b. To find the outliers using plot.
c. To plot the histogram, bar chart and pie chart on sample data.

Dataset used: mtcars

Program:
#Line plot
x=1:10
y=x^2
plot(x,y,type="l",main="Line Plot Example")
#installing package: run once on the console (the plots below use base R graphics)
#install.packages("ggplot2")
#scatter plot
data("mtcars")
plot(
mtcars$wt,mtcars$mpg,
main = "scatter plot example",
xlab = "car weight",
ylab="miles per gallon"
)
#box plot
data("mtcars")
boxplot(
mtcars$mpg,
main = "box plot example",
ylab="miles per gallon"
)
#outliers
v<-c(50,25,30,12,78,99)
boxplot(v,main="outliers")
#Histogram
H<-c(9,13,28,36,4,54,99,98)
hist(H,main="Histogram",col="blue")
#Barchart
h<-c(9,13,28,36,4,54)
m<-c("MAR","APR","MAY","JUN","JUL","AUG")
barplot(h,names.arg=m,xlab="Month",ylab="revenue",main="barchart",border ="blue")
#pie chart
h<-c(90,78,80,25)
m<-c("OS","DBMS","Java","OE")
pie(h,m,main = "piechart")

Output:

Experiment – 5
Correlation and Covariance
Correlation and covariance are statistical measures of the relationship between two random variables. Both measure the linear dependency between a pair of random variables (bivariate data).
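Correlation in R Programming Language –
The cor() function in R computes the correlation coefficient. Correlation is a statistical measure, based on covariance, of how strongly two vectors are related. Mathematically, the Pearson correlation coefficient is

$$ r_{xy} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}} $$

where,

x represents the x data vector, with mean \bar{x}
y represents the y data vector, with mean \bar{y}
N represents the total number of observations
Syntax: cor(x, y, method)
where,

• x and y represent the data vectors

• method defines the type of correlation to compute: "pearson" (the default), "kendall" or "spearman".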

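Covariance in R Programming Language –

In R, covariance is computed with the cov() function. Covariance is a statistical measure of the direction of the linear relationship between two data vectors: it is positive when they tend to increase together and negative when one tends to decrease as the other increases. Mathematically, the sample covariance computed by cov() is

$$ \operatorname{cov}(x, y) = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{N - 1} $$

Syntax: cov(x, y, method)

where,
• x and y represent the data vectors
• method defines the type of covariance to compute: "pearson" (the default), "kendall" or "spearman"
• N represents the total number of observations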
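Both quantities can be computed directly from their definitions and compared with the built-in functions. A sketch using the wt and mpg columns of mtcars (an illustrative choice): the covariance uses the N - 1 denominator, and the Pearson correlation is the covariance divided by the product of the two standard deviations.

```r
x <- mtcars$wt
y <- mtcars$mpg
n <- length(x)
# Sample covariance with the N - 1 denominator
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
# Pearson correlation: covariance scaled by both standard deviations
cor_manual <- cov_manual / (sd(x) * sd(y))
c(manual = cov_manual, builtin = cov(x, y))
c(manual = cor_manual, builtin = cor(x, y))
# Rank-based alternatives
cor(x, y, method = "spearman")
cor(x, y, method = "kendall")
```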

Package – corrplot

The R package corrplot provides a visual exploratory tool for correlation matrices that supports automatic variable reordering to help detect hidden patterns among variables.

corrplot is easy to use and provides a rich array of plotting options for visualization method, graphic layout, color, legend, text labels, etc. It can also display p-values and confidence intervals to help users judge the statistical significance of the correlations.

corrplot() – The most commonly used parameters are method, type, order and diag.
Correlation matrix –
A correlation matrix is a table of correlation coefficients for a set of variables, used to determine whether relationships exist between the variables. Each coefficient indicates both the strength and the direction of the relationship.
Syntax: cor(x, use = , method = )
Parameters:
• x: a numeric matrix or data frame.
• use: how missing values are handled, e.g. "everything" (the default), "complete.obs" or "pairwise.complete.obs".
• method: the type of correlation: "pearson", "kendall" or "spearman".
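A sketch of the use and method arguments on the numeric iris columns; two missing values are inserted artificially to show how use changes the result.

```r
m <- iris[, 1:4]
m$Sepal.Length[c(3, 10)] <- NA              # two artificial missing values
cor(m)                                      # use = "everything": NA wherever a column has NA
cm <- cor(m, use = "complete.obs")          # drop rows with any NA before computing
cor(m, use = "pairwise.complete.obs")       # drop NAs separately for each pair of columns
cor(iris[, 1:4], method = "spearman")       # rank-based correlation
```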

Dataset - iris
Description –
The data set contains 3 classes of 50 instances each, where each class refers to a type of iris
plant. One class is linearly separable from the other 2; the latter are NOT linearly separable
from each other.

Attributes -

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. species:
-- Iris Setosa
-- Iris Versicolour
-- Iris Virginica

Analysis of Variance (ANOVA) –
Analysis of variance (ANOVA) is an analysis tool used in statistics that splits an observed
aggregate variability found inside a data set into two parts: systematic factors and random
factors. The systematic factors have a statistical influence on the given data set, while the
random factors do not.
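Aim (c) of this experiment calls for ANOVA on iris, but the program does not include it. A minimal sketch with aov(), assuming Sepal.Length as the response and Species as the categorical factor (illustrative choices); the last lines add Petal.Length as a numeric covariate, which turns the model into an analysis of covariance.

```r
data(iris)
# One-way ANOVA: does mean sepal length differ between species?
fit <- aov(Sepal.Length ~ Species, data = iris)
summary(fit)
# Which pairs of species differ (Tukey's honest significant differences)
TukeyHSD(fit)
# Analysis of covariance: a numeric covariate alongside the categorical factor
fit_ancova <- aov(Sepal.Length ~ Petal.Length + Species, data = iris)
summary(fit_ancova)
```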

5. Aim: To Calculate Correlation and Covariance

a. To find the correlation matrix.

b. To plot the correlation plot on the iris data, giving an overview of the relationships among the variables.
c. To perform analysis of covariance/variance (ANOVA) on the iris data, which contains a categorical variable (Species).
Dataset used: iris
Program:
install.packages('corrplot')
x<-rnorm(10)   # 10 random values; with only 2 points the correlation is always +1 or -1
x
y<-rnorm(10)
y
mat<-cbind(x,y)
mat
cor(mat)
cov(mat)
data(iris)
iris
mydata<-iris[,c(1,2,3,4)]
mydata
str(mydata)
d1<-cor(mydata)
d1
library(corrplot)
corrplot(d1,method="circle")
color<-c('red','green','blue')
pairs(mydata,col=color[iris$Species],bg=color[iris$Species],pch=21)   # one colour per species
cov(iris$Petal.Length,iris$Petal.Width)

Output:

Experiment – 6
Regression Model
Dataset – crashdata.csv
Description – This dataset has 80 observations of 6 variables.
Attributes –
1. ManHI
2. ManBI
3. IntI
4. HVACi
5. Safety
6. CarType

Dataset – crashdataset.csv
Description – This dataset has 20 observations of 6 variables.
Attributes –
1. ManHI
2. ManBI
3. IntI
4. HVACi
5. Safety
6. CarType
GLM – glm() is used to fit generalised linear models, specified by giving a symbolic description of the linear predictor and a description of the error distribution.
Syntax - glm(formula, family, data)
Parameters –
• formula: a symbolic description of the model, e.g. y ~ x1 + x2 (y ~ . uses all other columns as predictors)
• family: the error distribution and link function, e.g. binomial (logistic regression), poisson, gaussian, Gamma, quasi
• data: the dataset being used
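A minimal logistic-regression sketch on the built-in mtcars data, since the crash data files are not included here: am (0 = automatic, 1 = manual) is modelled from wt. The 0.5 cut-off mirrors the one used in the program of this experiment.

```r
data(mtcars)
fit <- glm(am ~ wt, family = binomial, data = mtcars)
summary(fit)
prob <- predict(fit, type = "response")   # fitted probability of a manual gearbox
pred <- ifelse(prob > 0.5, 1, 0)
table(predicted = pred, actual = mtcars$am)
mean(pred == mtcars$am)                   # training accuracy
```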
Package used – caret
caret (short for Classification And REgression Training) is one of the most widely used packages in R. It follows a consistent syntax for data preparation, model building and model evaluation, which makes it convenient for data science practitioners. Here it provides confusionMatrix() for evaluating the model.

6. Aim: To evaluate the performance of a Regression Model
a. Import data from web storage and name the dataset.
b. Perform logistic regression to find out the relation between the variables in the model.
c. Check whether the model fits [require(foreign), require(MASS)].

Datasets used are

crashdata.csv
crashdataset.csv

Program:
#logistic regression
mydata <- read.csv('crashdata.csv')
mytestdata <- read.csv('crashtestdata.csv')
mydata
mytestdata
str(mydata)
summary(mydata)
mydata[6] <- as.factor(mydata$CarType)
mydata
fit <- glm(formula=CarType~., family='binomial', data=mydata)   # CarType predicted from all other columns
fit
summary(fit)
train <- predict(fit, type='response')
plot(train)
tapply(train, mydata$CarType, mean)
pred <- predict(fit,newdata = mytestdata, type='response')
plot(pred)
mytestdata[pred<=0.5,'Predict'] <- 'Hatchback'
mytestdata[pred>0.5,'Predict'] <- 'SUV'
mytestdata
#install.packages("caret") run on console
library(caret)
confusionMatrix(table(mytestdata[,7],mytestdata[,6]),positive='Hatchback')

Output:

Experiment – 7
Classification Model
Packages for classification:
1. caret package –
caret (short for Classification And REgression Training) is one of the most widely used packages in R. It follows a consistent syntax for data preparation, model building and model evaluation; here it provides confusionMatrix() for evaluating the classifier.
2. class package –
The class package provides functions for classification, including knn() for k-nearest-neighbour classification, which is used in this experiment.
Dataset – Servicetraindata.csv
Description – This data set contains 315 observations of 6 variables.
Attributes –
1. OilQual
2. EnginePerf
3. NormMileage
4. TypeWear
5. HVACwear
6. Service
Dataset – Servicetestdata.csv
Description – This dataset contains 135 observations of 6 variables.
Attributes –
1. OilQual
2. EnginePerf
3. NormMileage
4. TypeWear
5. HVACwear
6. Service
Predictknn (k-nearest neighbours) –
For each test case, knn() finds the k nearest training cases (by Euclidean distance) and predicts the class by majority vote among their labels, with ties broken at random. k may be any positive integer less than the number of training cases, but is generally between 1 and 10.
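To make the majority-vote idea concrete, a base-R sketch of k-nearest-neighbour classification on iris, using a hypothetical helper knn_predict() (not the class package's knn()). Unlike knn(), which breaks ties at random, this sketch resolves ties in favour of the first factor level, and the features are not scaled.

```r
# Hypothetical helper: k-nearest-neighbour prediction by majority vote
knn_predict <- function(train_x, test_x, train_y, k = 3) {
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  preds <- apply(test_x, 1, function(row) {
    # Euclidean distance from this test case to every training case
    d <- sqrt(colSums((t(train_x) - row)^2))
    nearest <- order(d)[seq_len(k)]
    # Majority vote among the k nearest labels (ties go to the first level)
    votes <- table(train_y[nearest])
    names(votes)[which.max(votes)]
  })
  factor(preds, levels = levels(factor(train_y)))
}

# Hold out every 5th iris row as test data (illustrative split)
idx <- seq(5, nrow(iris), by = 5)
train <- iris[-idx, ]
test  <- iris[idx, ]
pred <- knn_predict(train[, 1:4], test[, 1:4], train$Species, k = 3)
table(predicted = pred, actual = test$Species)
mean(pred == test$Species)   # accuracy
```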

7. Aim: To find the performance of Classification Model
a. To install relevant packages for classification.
b. To choose a classifier for classification problems.
c. To evaluate the performance of the classifier.

Datasets used are servicetraindata.csv and servicetestdata.csv

Program:

# install.packages("caret") run command on console

# install.packages("class") run command on console

mytraindata <- read.csv('servicetraindata.csv')

mytestdata <- read.csv('servicetestdata.csv')
mytraindata
mytestdata
str(mytraindata)
str(mytestdata)
summary(mytraindata)
summary(mytestdata)
mytraindata[6] <- as.factor(mytraindata$Service)
summary(mytraindata)
mytestdata[6] <- as.factor(mytestdata$Service)
summary(mytestdata)
library(class)
predictknn <- knn(train=mytraindata[,-6],
test=mytestdata[,-6],
cl=mytraindata$Service,
k = 3)
predictknn
library(caret)
confusionMatrix(data=predictknn,mytestdata$Service)

Output:

Experiment – 8
Clustering Model
8a -
K-Means Clustering in R – K-Means is an iterative, hard clustering technique based on unsupervised learning. The total number of clusters is pre-defined by the user, and the data points are grouped into clusters based on their similarity. The algorithm also finds the centroid of each cluster.
Algorithm -
• Specify the number of clusters (K).
• Randomly assign each data point to a cluster.
• Calculate the cluster centroids.
• Re-allocate each data point to its nearest cluster centroid.
• Re-compute the cluster centroids.
• Repeat the last two steps until the cluster assignments no longer change.
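The steps above can be sketched in base R as a simple Lloyd-style loop; simple_kmeans() is a hypothetical helper for illustration. R's own kmeans() uses the Hartigan-Wong algorithm by default and starts from randomly chosen centres rather than random assignments.

```r
# Hypothetical helper illustrating the K-means steps
simple_kmeans <- function(x, k, iter_max = 100) {
  x <- as.matrix(x)
  # Randomly assign each point to one of k clusters (every cluster gets points)
  cluster <- sample(rep_len(seq_len(k), nrow(x)))
  for (iter in seq_len(iter_max)) {
    # Centroid of each cluster = column means of its points (k x p matrix)
    centers <- do.call(rbind, lapply(seq_len(k), function(j)
      colMeans(x[cluster == j, , drop = FALSE])))
    # Squared Euclidean distance from every point to every centroid (n x k)
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")
    # Stop when no point changes cluster, or if a cluster would become empty
    if (all(new_cluster == cluster) || length(unique(new_cluster)) < k) break
    cluster <- new_cluster   # centroids are recomputed on the next pass
  }
  list(cluster = cluster, centers = centers, iterations = iter)
}

set.seed(42)
res <- simple_kmeans(iris[, 1:4], k = 3)
table(res$cluster, iris$Species)
```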
8b -
1. We will use the built-in read.csv(...) function, which reads the data in as a data frame, and assign the data frame to a variable (using <-) so that it is stored in R’s memory. Then we will explore some of the basic arguments that can be supplied to the function.
2. The default for read.csv(...) is to set the header argument to TRUE. This means that the
first row of values in the .csv is set as header information (column names). If your data set
does not have a header, set the header argument to FALSE
3. To see the internal structure, we can use another function, str(). In this case, the data
frame’s internal structure includes the format of each column.
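A sketch of the header argument, reading CSV text directly through read.csv()'s text argument (the trip data are made up):

```r
csv_text <- "trip,distance,duration
1,12.5,30
2,3.2,10"
with_header <- read.csv(text = csv_text)                   # header = TRUE by default
no_header   <- read.csv(text = csv_text, header = FALSE)   # first row treated as data
str(with_header)
str(no_header)
```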
Library – factoextra
"factoextra" is an R package that makes it easy to extract and visualize the output of exploratory multivariate data analyses.
• It produces a ggplot2-based elegant data visualization with less typing.
• It contains also many functions facilitating clustering analysis and visualization.

8 . Aim: To evaluate the performance of Clustering Model
a. Clustering algorithms for unsupervised classification.
b. Plot the cluster data using R visualizations.

Datasets used : tripdetails.csv

Program:

mydata<-read.csv('tripdetails.csv')
mydata
str(mydata)
summary(mydata)
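set.seed(123)   # k-means starts from random centres; fixing the seed makes the clusters reproducible
myclusters<-kmeans(mydata[-1],5)
myclusters
library(factoextra)
fviz_cluster(myclusters,data=mydata[-1],geom="point")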

Output:

Experiment-9

Reading Xml File

Aim: To read an XML file.

XML file:

<RECORDS>
<EMPLOYEE>
<ID>2</ID>
<NAME>Dan</NAME>
<SALARY>515.2</SALARY>
<STARTDATE>9/23/2013</STARTDATE>
<DEPT>Operations</DEPT>
</EMPLOYEE>

<EMPLOYEE>
<ID>3</ID>
<NAME>Michelle</NAME>
<SALARY>611</SALARY>
<STARTDATE>11/15/2014</STARTDATE>
<DEPT>IT</DEPT>
</EMPLOYEE>

<EMPLOYEE>
<ID>5</ID>
<NAME>Gary</NAME>
<SALARY>843.25</SALARY>
<STARTDATE>3/27/2015</STARTDATE>
<DEPT>Finance</DEPT>
</EMPLOYEE>

<EMPLOYEE>
<ID>7</ID>
<NAME>Simon</NAME>
<SALARY>632.8</SALARY>
<STARTDATE>7/30/2013</STARTDATE>
<DEPT>Operations</DEPT>
</EMPLOYEE>

<EMPLOYEE>
<ID>8</ID>
<NAME>Guru</NAME>
<SALARY>722.5</SALARY>
<STARTDATE>6/17/2014</STARTDATE>
<DEPT>Finance</DEPT>
</EMPLOYEE>

</RECORDS>
Program:

# Load the package required to read XML files.

install.packages("XML")
library("XML")

# Also load the other required package.

library("methods")

# Give the input file name to the function.

result <- xmlParse(file = "D:/emp.xml")

# Print the result.

print(result)

Output:

Isc Mathematics Project
No ratings yet
Isc Mathematics Project
14 pages
Abap Oops
No ratings yet
Abap Oops
38 pages
Literature Circles Role Sheets (My Own)
No ratings yet
Literature Circles Role Sheets (My Own)
7 pages
005 29 Hacking Wireless Networks Theory and Practice
No ratings yet
005 29 Hacking Wireless Networks Theory and Practice
2 pages
Aai Interviewing
No ratings yet
Aai Interviewing
10 pages
English Power Point Othello
No ratings yet
English Power Point Othello
8 pages
Herpderp1909 Dragons Reworked Part IV - Dragon Hall of Fame
No ratings yet
Herpderp1909 Dragons Reworked Part IV - Dragon Hall of Fame
55 pages
Bar Chart
No ratings yet
Bar Chart
4 pages
WG (Chapters 13-15) - Mr. Snyder GT
No ratings yet
WG (Chapters 13-15) - Mr. Snyder GT
2 pages
Rahmani 30 Entrance Exam Analysis (2020 & 2025)
No ratings yet
Rahmani 30 Entrance Exam Analysis (2020 & 2025)
6 pages
University of Aden Faculty of Languages
No ratings yet
University of Aden Faculty of Languages
83 pages
Evolution of Communication
No ratings yet
Evolution of Communication
2 pages
4-2018 Product Bulletin - Survey Pro Windows 10 POPN Licensing
No ratings yet
4-2018 Product Bulletin - Survey Pro Windows 10 POPN Licensing
4 pages
Morphi Aqib Buddhism
No ratings yet
Morphi Aqib Buddhism
8 pages
Vedic Maths Final PPT-1
No ratings yet
Vedic Maths Final PPT-1
21 pages
Palestine Test
No ratings yet
Palestine Test
2 pages
g6 Past Tenses
No ratings yet
g6 Past Tenses
32 pages
Research Report
No ratings yet
Research Report
84 pages
Name - Spotlight 7 Module 1. Test.: 1. Match The Words/phrases To Their Definition
No ratings yet
Name - Spotlight 7 Module 1. Test.: 1. Match The Words/phrases To Their Definition
3 pages
Maths Notes
No ratings yet
Maths Notes
3 pages
Telephone Information System
No ratings yet
Telephone Information System
68 pages
Dialnet MalaysianPerceptionsOfChina 1210502 PDF
No ratings yet
Dialnet MalaysianPerceptionsOfChina 1210502 PDF
13 pages
Door Lock System
No ratings yet
Door Lock System
18 pages
Natural Language Processing Using Artificial Intelligence
No ratings yet
Natural Language Processing Using Artificial Intelligence
3 pages