Dsda Manual

The document provides an introduction to R programming, detailing basic arithmetic and logical operations, as well as file import/export functionalities. It includes algorithms and sample programs for various operations such as addition, subtraction, and data frame manipulations, along with visualization techniques like bar plots and histograms. The document aims to demonstrate the capabilities of R for data exploration and analysis through practical examples.

Uploaded by

dharsanimv

Ex.No. 1 INTRODUCTION TO R STUDIO, BASIC OPERATIONS AND

IMPORT AND EXPORT OF DATA USING R TOOL

AIM
To demonstrate basic arithmetic and logical operations, as well as file
input/output operations, using the R programming language.

ALGORITHM
1. ADDITION OPERATOR (+)
- Define vectors a and b with values.
- Print the sum of vectors a and b.
2. SUBTRACTION OPERATOR (-)
- Define variables a and b with values.
- Print the result of subtracting b from a.
3. MULTIPLICATION OPERATOR (*)
- Define vectors b and c with values.
- Print the element-wise product of vectors b and c.
4. DIVISION OPERATOR (/)
- Define variables a and b with values.
- Print the result of dividing a by b.
5. ELEMENT-WISE LOGICAL AND OPERATOR (&)
- Define vectors list1 and list2 with values.
- Print the element-wise logical AND of list1 and list2.
6. ELEMENT-WISE LOGICAL OR OPERATOR (|)
- Define vectors list1 and list2 with values.
- Print the element-wise logical OR of list1 and list2.
7. NOT OPERATOR (!)
- Define vector list1 with values.
- Print the logical NOT of list1.
8. LOGICAL AND OPERATOR (&&)
- Define single-element vectors list1 and list2.
- Print the logical AND of list1 and list2.
9. LESS THAN (<)
- Define vectors list1 and list2 with values.
- Print the result of comparing the elements of list1 and list2 for less
than.
10. LESS THAN OR EQUAL TO (<=)
- Define vectors list1 and list2 with values.
- Convert both vectors to character vectors.
- Print the result of comparing the elements of list1 and list2 for less
than or equal to.
11. GREATER THAN (>)
- Define vectors list1 and list2 with values.
- Print the result of comparing the elements of list1 and list2 for greater
than.
12. GREATER THAN OR EQUAL TO (>=)
- Define vectors list1 and list2 with values.
- Print the result of comparing the elements of list1 and list2 for greater
than or equal to.
13. LEFT ASSIGNMENT (<-, <<- OR =)
- Define a vector vec1 using left assignment.
- Print vec1.
14. RIGHT ASSIGNMENT (-> OR ->>)
- Define a vector vec1 using right assignment.
- Print vec1.
15. %IN% OPERATOR
- Define a value and a vector.
- Print whether the value is in the vector using the %in% operator.
16. %*% OPERATOR
- Define a matrix.
- Print the matrix.
17. EXPORT FILE INTO R
- Create vectors for name and age.
- Create a data frame using these vectors.
- Print the data frame.
18. CSV FILE FORMAT SAVE
- Create vectors for name and age.
- Create a data frame using these vectors.
- View the data frame.
- Write the data frame to a CSV file named "newdata.csv".
19. SAVE FILE PATH VIEW
- Create vectors for name and age.
- Create a data frame using these vectors.
- View the data frame.
- Write the data frame to a CSV file named "newdata.csv".
- Print the current working directory.

PROGRAM
Addition operator (+):
Sample Input:
a <- c (1, 0.1)
b <- c (2.33, 4)
print (a+b)
Sample Output : 3.33 4.10

Subtraction Operator (-):

Sample Input:
a <- 6
b <- 8.4
print (a-b)
Sample Output : -2.4

Multiplication Operator (*) :

Sample Input:
B= c(4,4)
C= c(5,5)
print (B*C)
Sample Output : 20 20

Division Operator (/) :

Sample Input:
a <- 10
b <- 5
print (a/b)
Sample Output : 2

Element-wise Logical AND operator (&):

Sample Input:
list1 <- c(TRUE, 0.1)
list2 <- c(0,4+3i)
print(list1 & list2)
Sample Output : FALSE TRUE

Element-wise Logical OR operator (|):

Sample Input:
list1 <- c(TRUE, 0.1)
list2 <- c(0,4+3i)
print(list1|list2)
Sample Output : TRUE TRUE

NOT operator (!):

Sample Input:
list1 <- c(0,FALSE)
print(!list1)
Sample Output : TRUE TRUE

Logical AND operator (&&):

Sample Input:
list1 <- c(0.1)
list2 <- c(4+3i)
print(list1 && list2)
Sample Output : TRUE
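
The difference between & and && can be sketched as follows. This is only a minimal illustration and not part of the original exercise; the names flags_a and flags_b are made up for the example.

```r
# Hypothetical example vectors (not from the manual)
flags_a <- c(TRUE, FALSE, TRUE)
flags_b <- c(TRUE, TRUE, FALSE)

# & works element by element and returns a vector of the same length
elementwise <- flags_a & flags_b
print(elementwise)

# && expects a single value on each side and returns a single value,
# which is why it is the usual choice inside if() conditions
single <- flags_a[1] && flags_b[1]
print(single)
```

Recent versions of R raise an error when && is given vectors longer than one element, so & is the safer choice whenever whole vectors are compared.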

Less than (<):

Sample Input:
list1 <- c(TRUE, 0.1,"apple")
list2 <- c(0,0.1,"bat")
print(list1<list2)
Sample Output : FALSE FALSE TRUE

Less than equal to (<=):

Sample Input:
list1 <- c(TRUE, 0.1, "apple")
list2 <- c(TRUE, 0.1, "bat")
list1_char <- as.character(list1)
list2_char <- as.character(list2)
# Compare character strings
print(list1_char <= list2_char)
Sample Output : TRUE TRUE TRUE

Greater than (>):

Sample Input:
list1 <- c(TRUE, 0.1, "apple")
list2 <- c(TRUE, 0.1, "bat")
print(list1 > list2)
Sample Output : FALSE FALSE FALSE

Greater than equal to (>=) :

Sample Input:
list1 <- c(TRUE, 0.1, "apple")
list2 <- c(TRUE, 0.1, "bat")
print(list1 >= list2)
Sample Output : TRUE TRUE FALSE
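
The comparison results above follow from type coercion: c() converts mixed values to one common type, so a vector holding a logical, a number and a string becomes a character vector and is compared as text. A small sketch of this behaviour, using the made-up name mixed:

```r
# Mixing logical, numeric and character values yields a character vector
mixed <- c(TRUE, 0.1, "apple")
print(class(mixed))
print(mixed)

# The same digits compare differently as numbers and as text
numeric_cmp <- 2 < 10        # numeric comparison
character_cmp <- "2" < "10"  # text comparison, character by character
print(numeric_cmp)
print(character_cmp)
```

Because text comparison depends on the collation order, it is usually clearer to keep numeric and character data in separate vectors before comparing them.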
Left Assignment (<- or <<- or =):
Sample Input:
vec1 = c("ab", TRUE)
print (vec1)
Sample Output : "ab" "TRUE"

Right Assignment (-> or ->>):

Sample Input:
c("ab", TRUE) ->> vec1
print (vec1)
Sample Output : "ab" "TRUE"

%in% Operator:
Sample Input:
val <- 0.1
list1 <- c(TRUE, 0.1,"apple")
print (val %in% list1)
Sample Output : TRUE

%*% Operator:
Sample Input:
mat = matrix(c(1,2,3,4,5,6),nrow=2,ncol=3)
print (mat)
Sample Output : 1 3 5
2 4 6
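
The sample above only prints the matrix. A minimal sketch of the %*% operator itself, assuming the same matrix is multiplied by its own transpose (this choice is an illustration, not taken from the manual):

```r
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# mat is 2 x 3 and t(mat) is 3 x 2, so the matrix product is 2 x 2
product <- mat %*% t(mat)
print(product)

# For contrast, * multiplies element by element and keeps the 2 x 3 shape
elementwise <- mat * mat
print(elementwise)
```

%*% requires the number of columns of the left matrix to equal the number of rows of the right matrix; * only requires matching shapes.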

EXPORT FILE INTO R :
Name<- c("xxx","yyy","zzz","aaa","bbb")
Age<- c(20,30,25,21,23)
data<-data.frame(Name,Age)
data
Sample output:
Name Age
xxx 20
yyy 30
zzz 25
aaa 21
bbb 23

CSV FILE FORMAT SAVE :

Name<- c("xxx","yyy","zzz","aaa","bbb")
Age<- c(20,30,25,21,23)
data<-data.frame(Name,Age)
data
View(data)
write.csv(data,"newdata.csv")
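
To cover the import direction as well, the following sketch writes the data frame to a CSV file and reads it back. The temporary file path and the row.names = FALSE argument are assumptions made for the example, not part of the original program.

```r
Name <- c("xxx", "yyy", "zzz", "aaa", "bbb")
Age <- c(20, 30, 25, 21, 23)
data <- data.frame(Name, Age)

# tempfile() is used only so that the sketch does not touch the working directory
csv_path <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing the row numbers as an extra column
write.csv(data, csv_path, row.names = FALSE)

# Import the file back into R and inspect it
imported <- read.csv(csv_path)
print(imported)
```

Without row.names = FALSE, the re-imported data frame would carry an additional first column holding the row numbers.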

SAVE FILE PATH VIEW:

Name<- c("xxx","yyy","zzz","aaa","bbb")
Age<- c(20,30,25,21,23)
data<-data.frame(Name,Age)
data
View(data)
write.csv(data,"newdata.csv")
getwd()

Sample output:
"C:/Users/ELCOT/Documents/newdata"

RESULT
The program executes various arithmetic and logical operations, showcasing their
outcomes along with file input/output operations, ultimately generating files and
displaying results of operations on provided data.

Ex.No. 2 IMPLEMENT DATA EXPLORATION AND VISUALIZATION

ON DIFFERENT DATASETS TO EXPLORE MULTIPLE AND
INDIVIDUAL VARIABLES

AIM
To implement data exploration and visualization on different datasets to explore
multiple and individual variables.

ALGORITHM
1. TOTAL ROWS AND COLUMNS:
- Create vectors for name and age.
- Combine vectors into a data frame.
- Print the dimensions of the data frame.
2. PROJECT VIEW:
- Create vectors for name and age.
- Combine vectors into a data frame.
- Print the first few rows of the data frame.
3. TITLES VIEW:
- Create vectors for name and age.
- Combine vectors into a data frame.
- Print the names of columns in the data frame.
4. DATA FRAME:
- Create vectors for name and age.
- Combine vectors into a data frame.
- Print the structure of the data frame.
5. PROJECT VIEW:
- Create vectors for name and age.
- Combine vectors into a data frame.
- View the data frame in the RStudio data viewer.
6. NUMERIC OR CHARACTER:
- Create vectors for name and age.
- Combine vectors into a data frame.
- Print the class of the name and age columns.
7. TABLE VIEW:
- Create vectors for name and age.
- Combine vectors into a data frame.
- Print the frequency table of the name column.
8. MINIMUM, QUARTILES, MEDIAN, MEAN, MAXIMUM:
- Create vectors for name and age.
- Combine vectors into a data frame.
- Print summary statistics for the age column.
9. BARPLOT:
- Load the airquality dataset into a data frame.
- View the data frame.
- Create a barplot of the data.
10. HISTOGRAM:
- Load the airquality dataset into a data frame.
- View the data frame.
- Create a histogram of the data.
11. BOX PLOT:
- Load the airquality dataset into a data frame.
- View the data frame.
- Create a box plot of the data.
12. SCATTER PLOT:
- Load the airquality dataset into a data frame.
- View the data frame.
- Create a scatter plot of the data.
13. HEAT MAP:
- Create a random matrix.
- Assign row and column names to the matrix.
- Create a heatmap of the matrix.
14. 3D GRAPHS:
- Define a function for a cone.
- Prepare variables for x, y, and z.
- Plot a 3D surface of the cone.
15. CLASS() FUNCTION:
- Define variables with different data types.
- Print the class of each variable.
16. LS() FUNCTION:
- Define variables using different assignment operators.
- Print the list of variables in the current environment.
17. RM() FUNCTION:
- Define variables using different assignment operators.
- Remove a variable.
- Print an existing variable to show that it is unaffected.

PROGRAM
Total Rows and Columns:
Sample Input:
Name<- c("xxx","yyy","zzz","aaa","bbb")
Age<- c(20,30,25,21,23)
data<-data.frame(Name,Age)
data
dim(data)
Output : 5 2

Project view:
Sample Input:
Name<- c("xxx","yyy","zzz","aaa","bbb")
Age<- c(20,30,25,21,23)
data<-data.frame(Name,Age)
data
head(data)
Output : Name Age
xxx 20
yyy 30
zzz 25
aaa 21
bbb 23

Titles view:
Sample Input:
Name<- c("xxx","yyy","zzz","aaa","bbb")
Age<- c(20,30,25,21,23)
data<-data.frame(Name,Age)
data
names(data)
Output : "Name" "Age"

Data Frame:
Sample Input:
Name<- c("xxx","yyy","zzz","aaa","bbb")
Age<- c(20,30,25,21,23)
data<-data.frame(Name,Age)
data
str(data)
Output : 'data.frame': 5 obs. of 2 variables:
$ Name: chr "xxx" "yyy" "zzz" "aaa" ...
$ Age : num 20 30 25 21 23

Numeric or Character:
Input:
Name<- c("xxx","yyy","zzz","aaa","bbb")
Age<- c(20,30,25,21,23)
data<-data.frame(Name,Age)
data
class(data$Name)
Output : "character"

Input:
Name<- c("xxx","yyy","zzz","aaa","bbb")
Age<- c(20,30,25,21,23)
data<-data.frame(Name,Age)
data
class(data$Age)
Output : "numeric"

Table view:
Input:
Name<- c("xxx","yyy","zzz","aaa","bbb")
Age<- c(20,30,25,21,23)
data<-data.frame(Name,Age)
data
table(data$Name)
Output : aaa bbb xxx yyy zzz
1 1 1 1 1

Minimum, Quartiles, Median, Mean, Maximum:
Input:
Name<- c("xxx","yyy","zzz","aaa","bbb")
Age<- c(20,30,25,21,23)
data<-data.frame(Name,Age)
data
summary(data$Age)
Output :
Min. 1st Qu. Median Mean 3rd Qu. Max.
20.0 21.0 23.0 23.8 25.0 30.0
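
summary() reports the minimum, quartiles, median, mean and maximum, but not the mode, and base R's mode() function returns the storage type of an object rather than the most frequent value. One possible way to compute a statistical mode is sketched below; stat_mode is a hypothetical helper and the ages vector is made-up data with one repeated value.

```r
# stat_mode is a hypothetical helper, not a base R function
stat_mode <- function(values) {
  counts <- table(values)
  # Return every value that reaches the highest count (as character strings)
  names(counts)[counts == max(counts)]
}

# Made-up ages in which 21 occurs twice
ages <- c(20, 30, 25, 21, 23, 21)
most_frequent <- stat_mode(ages)
print(most_frequent)
```

When every value occurs exactly once, as in the Age column used above, this helper returns all values, which reflects that such data has no single mode.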

BARPLOT:
Input:
data<-data.frame(airquality)
View(airquality)
Output :

barplot(airquality$Ozone,
main = 'Ozone Concentration in air',
xlab = 'ozone levels', horiz = TRUE)
Output :
HISTOGRAM:
Input:
data<-data.frame(airquality)
View(airquality)
Output :

data(airquality)
hist(airquality$Temp,
main = "La Guardia Airport's\nMaximum Temperature (Daily)",
xlab = "Temperature (Fahrenheit)",
xlim = c(50, 125), col = "yellow",
freq = TRUE)

Output :

BOX PLOT:
Input:
data<-data.frame(airquality)
View(airquality)
Output :

# Box plot for average wind speed

data(airquality)
boxplot(airquality$Wind,
main = "Average wind speed\nat La Guardia Airport",
xlab = "Miles per hour", ylab = "Wind",
col = "orange", border = "brown",
horizontal = TRUE, notch = TRUE)
Output :

SCATTER PLOT:
Input:
data<-data.frame(airquality)
View(airquality)
Output :
# Scatter plot for Ozone Concentration per month
data(airquality)
plot(airquality$Ozone, airquality$Month,
main ="Scatterplot Example",
xlab ="Ozone Concentration in parts per billion",
ylab =" Month of observation ", pch = 19)

Output :
HEAT MAP:
Input:
data <- matrix(rnorm(25, 0, 5), nrow = 5, ncol = 5)
colnames(data) <- paste0("col", 1:5)
rownames(data) <- paste0("row", 1:5)
heatmap(data)
View(data)
Output :

# Set seed for reproducibility
set.seed(110)
# Create example data (25 values fill the 5 x 5 matrix exactly)
data <- matrix(rnorm(25, 0, 5), nrow = 5, ncol = 5)
# Column names
colnames(data) <- paste0("col", 1:5)
rownames(data) <- paste0("row", 1:5)
# Draw a heatmap
heatmap(data)
Output :

3D GRAPHS:
Input:
# Adding Titles and Labeling Axes to Plot
cone <- function(x, y){
sqrt(x ^ 2 + y ^ 2)
}

# prepare variables.
x <- y <- seq(-1, 1, length = 30)
z <- outer(x, y, cone)

# plot the 3D surface

# Adding Titles and Labeling Axes to Plot
persp(x, y, z,
main="Perspective Plot of a Cone",
zlab = "Height",
theta = 30, phi = 15,
col = "orange", shade = 0.4)

Output :

class() function:
Input:
#Character
var1 = "hello"
print(class(var1))
Output : "character"
Input:
#Numeric
var1 = 10
print(class(var1))
Output : "numeric"

ls() function:
Input:
var1 = "hello"
var2 <- "hello"
"hello" -> var3
print(ls())
Output : "var1" "var2" "var3"

rm() function:
Input:
# using equal to operator
var1 = "hello"
# using leftward operator
var2 <- "hello"
# using rightward operator
"hello" -> var3
# Removing variable
rm(var3)
print(var2)
Output : "hello"
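
Printing a removed variable would stop the script with an "object not found" error, so the removal is easier to confirm with exists(). A minimal sketch, reusing the variable names from the program above (inherits = FALSE is an added assumption so that only the current environment is checked):

```r
var2 <- "hello"
"hello" -> var3

# Remove var3 from the current environment
rm(var3)

# exists() returns TRUE or FALSE instead of raising an error
removed_exists <- exists("var3", inherits = FALSE)
kept_exists <- exists("var2", inherits = FALSE)
print(removed_exists)
print(kept_exists)
```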
RESULT
Thus the data exploration and visualization on different datasets to explore
multiple and individual variables has been implemented.

Ex.No. 3 BUILD A DECISION TREE USING PARTY AND RPART

PACKAGES

AIM
To build a decision tree using party and rpart packages.

PACKAGE - PARTY
ALGORITHM
1. Load necessary libraries required for the analysis.

2. Load the dataset named readingSkills (shipped with the party package) and
display the first few rows using the head() function.

3. Split the dataset into training and testing datasets using the sample.split()
function from caTools package.

4. Build a classification tree model (ctree()) using the training data

5. Plot the decision tree model using the plot() function.

PROGRAM
library(datasets)
library(caTools)
library(party)
library(dplyr)
library(magrittr)
data("readingSkills")
head(readingSkills)
# Split on the target column so that the mask has one entry per row
sample_data <- sample.split(readingSkills$nativeSpeaker, SplitRatio = 0.8)
train_data <- subset(readingSkills, sample_data == TRUE)
test_data <- subset(readingSkills, sample_data == FALSE)
model <- ctree(nativeSpeaker ~ ., train_data)
plot(model)

OUTPUT

PACKAGE – RPART
ALGORITHM
1. Load necessary libraries required for the analysis.

2. Load the iris dataset from the datasets package into a data frame and view
its structure.
3. Set a seed for reproducibility, then randomly sample 70% of the rows
from the iris dataset for training and the rest for testing.

4. Build a decision tree model (rpart()) using the training data. The target
variable is Species and all other variables are predictors.

5. Plot the decision tree model using the rpart.plot() function.

PROGRAM
library(rpart)
library(rpart.plot)
data <- data.frame(iris)
View(data)
set.seed(123)
train_index <- sample(1:nrow(iris), size = 0.7 * nrow(iris))
train <- iris[train_index, ]
test <- iris[-train_index, ]
tree <- rpart(Species ~ ., data = train, method = "class")
rpart.plot(tree, main = "Decision tree for the iris dataset")
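
The program above creates a test set but does not use it. One way the held-out rows could be used to check the tree is sketched below; the confusion matrix and accuracy calculation are additions for illustration, and only the rpart package is needed.

```r
library(rpart)

set.seed(123)
train_index <- sample(seq_len(nrow(iris)), size = round(0.7 * nrow(iris)))
train <- iris[train_index, ]
test <- iris[-train_index, ]

tree <- rpart(Species ~ ., data = train, method = "class")

# Predicted class labels for the held-out rows
predicted <- predict(tree, newdata = test, type = "class")

# Confusion matrix: rows are the actual species, columns are the predictions
confusion <- table(Actual = test$Species, Predicted = predicted)
print(confusion)

# Share of test rows that were classified correctly
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

The diagonal of the confusion matrix counts correct predictions, so off-diagonal entries show which species the tree confuses with each other.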

OUTPUT
RESULT
Thus a decision tree using party and rpart packages has been built.

EX.NO. 4 BUILD A PREDICTIVE MODEL USING RANDOMFOREST

PACKAGE

AIM
To build a predictive model using randomForest package.

ALGORITHM
1. Load the required libraries: 'lubridate', 'randomForest', and 'forecast'.

2. Define the sales data vector 'x' containing observed sales data.

3. Convert 'x' into a time series object 'mts' with start date and frequency.

4. Convert 'mts' into a data frame 'df' with 'Week' and 'Sales' columns.

5. Build a random forest model 'rf_model' using 'randomForest()' with

'Sales' as the dependent variable and 'Week' as the independent variable.

6. Generate forecasts for the next 5 periods using 'predict()' with new data
representing the next 5 weeks.

7. Plot the observed sales data and the forecasted sales values for the next 5
periods using 'plot()'.

PROGRAM
# Load Required Libraries
library(lubridate)
library(randomForest)
library(forecast)
# Define Sales Data
x <- c(580, 7813, 28266, 59287, 75700,
87820, 95314, 126214, 218843,
471497, 936851, 1508725, 2072113)

# Convert Data to Time Series

mts <- ts(x, start = decimal_date(ymd("2020-01-22")), frequency = 365.25 / 7)

# Convert to Data Frame

df <- data.frame(Week = seq_len(length(x)), Sales = x)

# Build Random Forest Model

rf_model <- randomForest(Sales ~ Week, data = df)

# Generate Forecast for Next 5 Periods

forecast_values <- predict(rf_model, newdata = data.frame(Week =
seq(length(x) + 1, length(x) + 5)))

# Plot Forecast
plot(1:(length(x) + 5), c(x, forecast_values), type = "l",
xlab = "Week", ylab = "Total Revenue",
main = "Sales vs Revenue", col.main = "darkgreen",
ylim = c(0, max(x, forecast_values)))
# Plot observed sales
lines(1:length(x), x, col = "blue", lwd = 2)
# Plot forecasted sales
lines((length(x) + 1):(length(x) + 5), forecast_values, col = "red", lwd = 2)
legend("topright", legend = c("Observed Sales", "Forecasted Sales"),
col = c("blue", "red"), lwd = 2)

OUTPUT

RESULT
Thus a predictive model using randomForest package has been built.

Ex.No.5 IMPLEMENT LINEAR AND LOGISTIC REGRESSION ON

DATASETS TO PREDICT THE PROBABILITY

AIM
To implement linear and logistic regression on datasets to predict the probability.

LINEAR REGRESSION
ALGORITHM
1. Define two vectors 'x' and 'y' containing height (cm) and weight (kg) data
points respectively.
2. Fit a linear regression model 'relation' using the 'lm()' function, predicting
'y' based on 'x'.

3. Create a scatter plot of height against weight using 'plot()', with a
regression line added using 'abline()'. Customize plot appearance including
title, axis labels, color, and point shape.

4. Open the image file "linearregression.png" with the 'png()' function before
plotting, and close the graphics device with 'dev.off()' to write the image.

PROGRAM
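# Heights in cm (x) and weights in kg (y)
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y~x)
png(file = "linearregression.png")
# Weight on the horizontal axis, height on the vertical axis
plot(y,x,col = "blue",main = "Height & Weight Regression",
cex = 1.3,pch = 16,xlab = "Weight in Kg",
ylab = "Height in cm")
# Add the regression line of height on weight, matching the axes above
abline(lm(x~y))
dev.off()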
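
Once the model is fitted it can also be used for prediction. The sketch below assumes, as the axis labels suggest, that x holds heights in cm and y weights in kg; the new height of 170 cm is a made-up input.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Fit weight as a linear function of height
relation <- lm(y ~ x)
print(coef(relation))

# new_height is a made-up observation; the column name must match the predictor
new_height <- data.frame(x = 170)
predicted_weight <- predict(relation, newdata = new_height)
print(predicted_weight)
```

The column in newdata has to be called x because that is the predictor name used in the model formula.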
OUTPUT
LOGISTIC REGRESSION
ALGORITHM
1. Install and load required packages: 'dplyr', 'caTools', and 'ROCR'.

2. Generate summary statistics for the 'mtcars' dataset using 'summary()'.

3. Split the 'mtcars' dataset into training and testing sets using
'sample.split()' function.

4. Subset the dataset into training and testing sets.

5. Build a logistic regression model using 'glm()' with 'vs' as dependent and
'wt' and 'disp' as independent variables.

6. Generate summary of the logistic regression model.

7. Make predictions on the testing set and calculate classification error.

8. Calculate the area under the ROC curve (AUC) using 'prediction()' and
'performance()' functions.

9. Plot the ROC curve with labeled cutoffs and a legend showing the
calculated AUC.

PROGRAM
install.packages("dplyr")
library(dplyr)
summary(mtcars)
install.packages("caTools")
install.packages("ROCR")
library(caTools)
library(ROCR)
# Split on the target column so that the mask has one entry per row
split <- sample.split(mtcars$vs, SplitRatio = 0.8)
split
train_reg <- subset(mtcars, split == TRUE)
test_reg <- subset(mtcars, split == FALSE)
logistic_model <- glm(vs ~ wt + disp,data = train_reg,family = "binomial")
logistic_model
summary(logistic_model)
predict_reg <- predict(logistic_model,test_reg, type = "response")
predict_reg
predict_reg <- ifelse(predict_reg >0.5, 1, 0)
table(test_reg$vs, predict_reg)
missing_classerr <- mean(predict_reg != test_reg$vs)
print(paste('Accuracy =', 1 - missing_classerr))
ROCPred <- prediction(predict_reg, test_reg$vs)
ROCPer <- performance(ROCPred, measure = "tpr",
x.measure = "fpr")
auc <- performance(ROCPred, measure = "auc")
auc <- auc@y.values[[1]]
auc
plot(ROCPer)
plot(ROCPer, colorize = TRUE, print.cutoffs.at = seq(0.1, by = 0.1), main
= "ROC CURVE")
abline(a = 0, b = 1)
auc <- round(auc, 4)
legend(.6, .4, auc, title = "AUC", cex = 1)
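
Since the aim is to predict a probability, the following sketch shows how a fitted logistic model returns one for a single new observation. It fits the same formula on the full mtcars data using base R only; new_car is a made-up car and not part of the dataset.

```r
# Fit the same formula as above on the full mtcars data
model <- glm(vs ~ wt + disp, data = mtcars, family = binomial)

# new_car is a hypothetical observation (weight in 1000 lbs, displacement in cu.in.)
new_car <- data.frame(wt = 2.5, disp = 120)

# type = "response" returns a probability instead of the log-odds
prob_vs <- predict(model, newdata = new_car, type = "response")
print(prob_vs)
```

Without type = "response", predict() returns values on the log-odds scale, which are not restricted to the range from 0 to 1.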

OUTPUT
RESULT
Thus the linear and logistic regression on datasets to predict the probability has
been implemented.

Ex.No.6 IMPLEMENT K-MEANS, K-MEDOIDS, HIERARCHICAL AND

DENSITY BASED CLUSTERING TECHNIQUES

AIM
To implement K-means, K-medoids, hierarchical and density based clustering
techniques.

K-MEANS
ALGORITHM
1. Load the Iris dataset using 'data(iris)'.

2. Examine the structure of the dataset using 'str()' to understand its variables
and types.
3. Install required packages using 'install.packages()': 'ClusterR' and 'cluster'.

4. Load required libraries using 'library()': 'ClusterR' and 'cluster'.

5. Preprocess the dataset:

- Remove the species column to perform clustering only on numerical
variables.

6. Perform K-Means Clustering:

- Set seed for reproducibility using 'set.seed()'.
- Apply K-means clustering to the preprocessed dataset with 3 centers
and 20 random starts.

7. Explore Clustering Results:

- Display clustering results obtained from K-means.
- Compute confusion matrix to evaluate clustering performance.

8. Visualize Clustering Results:

- Create scatter plots of the iris dataset using 'plot()'.
- Color points based on assigned clusters obtained from K-means.
- Add cluster centers to the plot using different symbols and colors.
- Visualize clustering results using 'clusplot()' to generate a cluster
plot.

PROGRAM
data(iris)
str(iris)
install.packages("ClusterR")
install.packages("cluster")
library(ClusterR)
library(cluster)
iris_1 <- iris[, -5]
set.seed(240)
kmeans.re <- kmeans(iris_1, centers = 3, nstart = 20)
kmeans.re

kmeans.re$cluster
cm <- table(iris$Species, kmeans.re$cluster)
cm

plot(iris_1[c("Sepal.Length", "Sepal.Width")])
plot(iris_1[c("Sepal.Length", "Sepal.Width")], col = kmeans.re$cluster)
plot(iris_1[c("Sepal.Length", "Sepal.Width")], col = kmeans.re$cluster,
main = "K-means with 3 clusters")

kmeans.re$centers
kmeans.re$centers[, c("Sepal.Length", "Sepal.Width")]
points(kmeans.re$centers[, c("Sepal.Length", "Sepal.Width")],col = 1:3,
pch = 8, cex = 3)
y_kmeans <- kmeans.re$cluster
clusplot(iris_1[, c("Sepal.Length", "Sepal.Width")],
y_kmeans,
lines = 0,
shade = TRUE,
color = TRUE,
labels = 2,
plotchar = FALSE,
span = TRUE,
main = paste("Cluster iris"),
xlab = 'Sepal.Length',
ylab = 'Sepal.Width')
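
The program fixes the number of clusters at 3. A common way to examine that choice is to compare the total within-cluster sum of squares for several values of k; the sketch below uses base R only, and the range 1 to 6 is an assumption made for the example.

```r
iris_1 <- iris[, -5]
set.seed(240)

# Candidate numbers of clusters (an assumed range)
k_values <- 1:6

# Total within-cluster sum of squares for each candidate k
wss <- sapply(k_values, function(k) {
  kmeans(iris_1, centers = k, nstart = 20)$tot.withinss
})

print(data.frame(k = k_values, wss = wss))
```

The value of k after which the decrease in wss flattens out is often taken as a reasonable number of clusters.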

OUTPUT
K-MEDOIDS
ALGORITHM
1. Load the 'factoextra' and 'cluster' packages using 'library()'.

2. Prepare the dataset:

- Load the 'USArrests' dataset.
- Remove missing values using 'na.omit()'.
- Standardize the variables using 'scale()'.

3. Explore the optimal number of clusters:

- Visualize the number of clusters using the Within Sum of Squares
(WSS) method.
- Use 'fviz_nbclust()' to calculate and visualize the number of
clusters based on Partitioning Around Medoids (PAM) algorithm.
- Select the number of clusters based on the "elbow" or a significant
decrease in WSS.

4. Compute Gap statistics:

- Calculate the Gap statistic to determine the optimal number of
clusters.
- Use 'clusGap()' to compute the Gap statistic.
- Specify the range of candidate cluster numbers and the number of
bootstrap replicates.

5. Visualize Gap statistics:

- Plot the Gap statistic results using 'fviz_gap_stat()'.
- Compare the observed Gap statistic to its expected value to identify
the optimal number of clusters.

PROGRAM
library(factoextra)
library(cluster)
df <- USArrests
df <- na.omit(df)
df <- scale(df)
head(df)

fviz_nbclust(df, pam, method = "wss")

gap_stat <- clusGap(df, FUN = pam, K.max = 10, B = 50)

fviz_gap_stat(gap_stat)
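
The program above only explores how many clusters to use. A minimal sketch of the K-medoids fit itself with pam() from the cluster package follows; k = 4 is an assumed value for illustration and should be replaced by the number suggested by the plots.

```r
library(cluster)

df <- scale(na.omit(USArrests))

# k = 4 is an assumption for this sketch, not a result from the manual
pam_fit <- pam(df, k = 4)

# Medoids are actual observations (states) that represent each cluster
print(pam_fit$medoids)

# Number of states assigned to each cluster
print(table(pam_fit$clustering))
```

Unlike K-means centers, each medoid is a real row of the dataset, which makes the clusters easier to interpret.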

OUTPUT
HIERARCHICAL CLUSTERING
ALGORITHM
1. Install and load the 'dplyr' package.

2. Load the dataset 'mtcars' and examine its first few rows using 'head()'
function.

3. Compute the Euclidean distance matrix from the dataset using 'dist()'
function.

4. Perform hierarchical clustering on the distance matrix using 'hclust()'
function with the "average" linkage method.

5. Plot the dendrogram obtained from hierarchical clustering using 'plot()'
function.

6. Optionally, add a horizontal line at a specified height to cut the
dendrogram into clusters using 'abline()'.

7. Cut the dendrogram into a specified number of clusters using 'cutree()'
function.

8. Assign each observation to a cluster based on the cutting height obtained
in the previous step.

9. Generate a table showing the number of observations assigned to each
cluster using 'table()' function.

10.Visualize the clusters in the dendrogram by drawing rectangles around
them using 'rect.hclust()' function.

PROGRAM
install.packages("dplyr")
library(dplyr)
head(mtcars)
distance_mat <- dist(mtcars, method = 'euclidean')
distance_mat

set.seed(240)
Hierar_cl <- hclust(distance_mat, method = "average")
Hierar_cl

plot(Hierar_cl)
abline(h = 110, col = "green")

fit <- cutree(Hierar_cl, k = 3 )
fit

table(fit)
rect.hclust(Hierar_cl, k = 3, border = "green")
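
cutree() can cut the dendrogram either by a number of clusters (k) or by a height (h). The sketch below shows both forms on the same tree using base R only; the variable names are made up for the example.

```r
distance_mat <- dist(mtcars, method = "euclidean")
hier_cl <- hclust(distance_mat, method = "average")

# Cut by the desired number of clusters
groups_by_k <- cutree(hier_cl, k = 3)

# Cut by height, matching the horizontal line drawn at h = 110
groups_by_h <- cutree(hier_cl, h = 110)

# Cluster sizes for both cuts
print(table(groups_by_k))
print(table(groups_by_h))
```

Cutting by k guarantees the number of clusters, while cutting by h lets the number of clusters follow from the chosen distance threshold.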

OUTPUT
DENSITY-BASED CLUSTERING
ALGORITHM
1. Load the Iris dataset using 'data()' function.

2. Examine the structure of the dataset using 'str()' function.

3. Install the 'dbscan' package using 'install.packages()' function.

4. Load the 'dbscan' package using 'library()' function.

5. Convert the iris dataset into a matrix format excluding the species column
using 'as.matrix()'.

6. Plot the k-nearest neighbor distance plot ('kNNdistplot') of the iris data
matrix with a specified value of k.

7. Optionally, add a horizontal line at a specific distance threshold using
'abline()'.

8. Set the random seed using 'set.seed()' for reproducibility.

9. Perform DBSCAN clustering ('dbscan()') on the iris data matrix with the
specified epsilon (eps) and minimum points (minPts) parameters.

10.Store the result in 'db'.

11.Plot the cluster hulls ('hullplot()') of the iris data matrix based on the
DBSCAN clustering result.

12.The hulls represent the convex hulls of each cluster.

PROGRAM
data(iris)
str(iris)
install.packages("dbscan")
library(dbscan)
iris_matrix <- as.matrix(iris[, -5])
kNNdistplot(iris_matrix, k=4)

abline(h=0.4, col="red")
set.seed(1234)
db = dbscan(iris_matrix, 0.4, 4)
db
hullplot(iris_matrix, db$cluster)
OUTPUT

RESULT
Thus the K-means, K-medoids, hierarchical and density based clustering
techniques have been implemented.

Ex.No. 7 IMPLEMENT TIME SERIES ANALYSIS USING

CLASSIFICATION AND CLUSTERING TECHNIQUES

AIM
To implement time series analysis using classification and clustering
techniques.

ALGORITHM
1. Install the "lubridate" package using `install.packages("lubridate")`.
2. Load the "lubridate" package using `library(lubridate)`.
3. Define the weekly data vector `x` representing total positive COVID-19
cases.

4. Convert the data into a time series object `mts` using the `ts()` function.

5. Specify the start date as January 22, 2020, and the frequency as weekly.

6. Open a PNG device for saving the plot using `png(file = "timeSeries.png")`.

7. Plot the time series data `mts` using the `plot()` function.

8. Customize the plot with appropriate axis labels, title, and color.

9. Save the plot as a PNG file named "timeSeries.png" using `dev.off()`.

PROGRAM
x <- c(580, 7813, 28266, 59287, 75700,
87820, 95314, 126214, 218843, 471497,
936851, 1508725, 2072113)
install.packages("lubridate")
library(lubridate)
png(file ="timeSeries.png")
mts <- ts(x, start = decimal_date(ymd("2020-01-22")),
frequency = 365.25 / 7)
plot(mts, xlab ="Weekly Data",
ylab ="Total Positive Cases",
main ="COVID-19 Pandemic",
col.main ="darkgreen")
dev.off()
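
The start value passed to ts() is the date expressed as a decimal year. The sketch below reproduces that value with base R only, as a stand-in for decimal_date(ymd("2020-01-22")), and then inspects the resulting time series; the helper variable names are assumptions for the example.

```r
x <- c(580, 7813, 28266, 59287, 75700,
       87820, 95314, 126214, 218843, 471497,
       936851, 1508725, 2072113)

# 22 January is 21 days after 1 January, and 2020 has 366 days
start_date <- as.Date("2020-01-22")
year_start <- as.Date("2020-01-01")
start_decimal <- 2020 + as.numeric(start_date - year_start) / 366

# Weekly observations: about 52.18 periods per year
mts <- ts(x, start = start_decimal, frequency = 365.25 / 7)

print(start_decimal)
print(frequency(mts))
# Time stamps of the first three observations, in decimal years
print(time(mts)[1:3])
```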
OUTPUT

CLUSTERING
ALGORITHM
1. Install the "factoextra" package if it's not already installed.
2. Load the "factoextra" library.
3. Load the dataset (in this case, mtcars).
4. Remove any rows with missing values from the dataset.
5. Scale the dataset to standardize the variables.
6. Open a PNG device for saving the plot.
7. Perform K-means clustering on the scaled dataset with specified
parameters (centers = 5, nstart = 25).
8. Visualize the clustering results using the fviz_cluster() function from the
factoextra package.
9. Save the plot as a PNG file.
10.Close the PNG device.
PROGRAM
# Install the "factoextra" package if not installed
# install.packages("factoextra")

# Load the "factoextra" library

library(factoextra)

# Load the dataset (in this case, mtcars)

df <- mtcars

# Remove any rows with missing values from the dataset

df <- na.omit(df)

# Scale the dataset to standardize the variables

df <- scale(df)

# Open a PNG device for saving the plot

png(file = "KMeansExample2.png")

# Perform K-means clustering on the scaled dataset with specified parameters

km <- kmeans(df, centers = 5, nstart = 25)

# Visualize the clustering results using the fviz_cluster() function

fviz_cluster(km, data = df)

# Save the plot as a PNG file

dev.off()
OUTPUT

RESULT
Thus the time series analysis using classification and clustering techniques has
been implemented.

Ex.No. 8 IMPLEMENT APRIORI ALGORITHM IN ASSOCIATION

RULE MINING

AIM
To implement apriori algorithm in association rule mining.

ALGORITHM
1. Install the "arules" package if it's not already installed.
2. Load the "arules" library.
3. Install the "arulesViz" package if it's not already installed.
4. Load the "arulesViz" library.
5. Install the "RColorBrewer" package if it's not already installed.
6. Load the "RColorBrewer" library.
7. Load the "Groceries" dataset.
8. Generate association rules using the apriori() function from the "arules"
package. Set parameters for minimum support and confidence levels.
9. Inspect the first 10 association rules using the inspect() function.
10.Plot the relative item frequency of the top 20 items using the
itemFrequencyPlot() function. Customize the plot with colors from the
RColorBrewer palette.

PROGRAM
install.packages("arules")
library(arules)
install.packages("arulesViz")
library(arulesViz)
install.packages("RColorBrewer")
library(RColorBrewer)
data("Groceries")
rules <- apriori(Groceries,
parameter = list(supp = 0.01, conf = 0.2))
inspect(rules[1:10])

arules::itemFrequencyPlot(Groceries, topN = 20,

col = brewer.pal(8, 'Pastel2'),
main = 'Relative Item Frequency Plot',
type = "relative",
ylab = "Item Frequency (Relative)")

OUTPUT
ASSOCIATION RULE MINING
ALGORITHM
1. Define a list representing market basket transactions.
2. Assign names to the transactions.
3. Load the "arules" library.
4. Convert the list to transactions using the as() function.
5. Display the dimensions of the transactions.
6. Display a summary of the transactions.
7. Display an image of the transactions.
8. Plot the item frequency of the transactions.
9. Generate association rules using the apriori() function with specified
parameters.
10.Display a summary of the generated rules.
11.Inspect the generated rules.
12.Extract association rules with "beer" on the right-hand side (rhs).
13.Inspect the rules with "beer" on the right-hand side (rhs).
14.Load the "arulesViz" library.
15.Plot the association rules using different visualization methods.
PROGRAM
# Define market basket transactions
market_basket <- list(
c("apple", "beer", "rice", "meat"),
c("apple", "beer", "rice"),
c("apple", "beer"),
c("apple", "pear"),
c("milk", "beer", "rice", "meat"),
c("milk", "beer", "rice"),
c("milk", "beer"),
c("milk", "pear")
)
names(market_basket) <- paste("T", c(1:8), sep = "")

# Load the "arules" library

library(arules)

# Convert list to transactions

trans <- as(market_basket, "transactions")

# Display dimensions of transactions

dim(trans)

# Display summary of transactions

summary(trans)
# Display an image of the transactions
image(trans)

# Plot item frequency of transactions

itemFrequencyPlot(trans, topN = 10, cex.names = 1)

# Generate association rules

rules <- apriori(trans, parameter = list(supp = 0.3, conf = 0.5, maxlen = 10,
target = "rules"))

# Display summary of generated rules

summary(rules)

# Inspect generated rules

inspect(rules)

# Extract association rules with "beer" on the right-hand side (rhs)

beer_rules_rhs <- apriori(trans, parameter = list(supp = 0.3, conf = 0.5, maxlen
= 10, minlen = 2),
appearance = list(default = "lhs", rhs = "beer"))
# Inspect rules with "beer" on the right-hand side (rhs)
inspect(beer_rules_rhs)

# Load the "arulesViz" library

library(arulesViz)

# Plot association rules using different visualization methods

plot(rules)
plot(rules, measure = "confidence")

plot(rules, method = "two-key plot")

plot(rules, engine = "plotly")

subrules <- head(rules, n = 10, by = "confidence")

plot(subrules, method = "graph", engine = "htmlwidget")
plot(subrules, method = "paracoord")

RESULT
Thus, the Apriori algorithm for association rule mining has been implemented.

Ex.No. 9 IMPLEMENT TEXT MINING ON TWITTER DATA USING twitteR PACKAGE

AIM
To implement text mining on Twitter data in R, using the rtweet package
(in place of twitteR) to collect the tweets.

ALGORITHM
1. Install required packages: "rtweet", "ggplot2", "dplyr", "tidytext",
"igraph", and "ggraph".
2. Load the required libraries: "rtweet", "ggplot2", "dplyr", "tidytext",
"igraph", and "ggraph".
3. Use search_tweets() function from "rtweet" package to search for tweets
containing the hashtag "#climatechange".
4. Clean the text of tweets by removing URLs using regular expressions.
5. Tokenize the cleaned text to extract individual words.
6. Count the frequency of unique words in the tweets.
7. Plot the count of the top 15 unique words found in the tweets using
ggplot2.
8. Load the built-in stop words dataset from "tidytext".
9. Remove stop words from the tokenized text.
10.Count the frequency of unique words after removing stop words.
11.Plot the count of the top 15 unique words after stop words removal using
ggplot2.
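
Steps 4 to 10 can be tried without Twitter API access. The following is a minimal base R sketch under stated assumptions: the three sample texts are invented stand-ins for tweets, and the four-word stop list is a tiny illustrative substitute for the much larger stop_words dataset in tidytext.

```r
# Hypothetical sample texts standing in for tweets (not real data)
sample_text <- c(
  "Climate change is real https://example.com/a and urgent",
  "The climate is changing http://example.com/b",
  "Act on climate now"
)

# Step 4: remove each URL up to the next whitespace, keeping the words after it
stripped <- gsub("https?://\\S+", "", sample_text)

# Step 5: lowercase, split on runs of non-letters, drop empty strings
words <- unlist(strsplit(tolower(stripped), "[^a-z]+"))
words <- words[words != ""]

# Step 6: word frequencies, most frequent first
word_freq <- sort(table(words), decreasing = TRUE)

# Steps 8 to 10: drop stop words (illustrative list only) and count again
stops <- c("is", "the", "and", "on")
word_freq_no_stops <- sort(table(words[!words %in% stops]), decreasing = TRUE)

print(head(word_freq, 5))
print(head(word_freq_no_stops, 5))
```

Here "climate" occurs three times and "is" twice before stop-word removal; afterwards "is" no longer appears in the counts, while "urgent" survives because only the URL itself was stripped.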

PROGRAM
# Load required libraries
library(rtweet)
library(ggplot2)
library(dplyr)
library(tidytext)
library(igraph)
library(ggraph)

# Search for tweets containing the hashtag "#climatechange"

climate_tweets <- search_tweets(q = "#climatechange", n = 10000, lang = "en",
                                include_rts = FALSE)

# Clean the text of tweets by removing URLs using regular expressions

# "https?://\\S+" matches one http or https URL up to the next whitespace,
# so any words that follow a link are kept
climate_tweets$stripped_text <- gsub("https?://\\S+", "", climate_tweets$text)

# Tokenize the cleaned text to extract individual words

climate_tweets_clean <- climate_tweets %>%
  dplyr::select(stripped_text) %>%
  unnest_tokens(word, stripped_text)

# Count the frequency of unique words in the tweets

word_freq <- climate_tweets_clean %>%
  count(word, sort = TRUE) %>%
  top_n(15) %>%
  mutate(word = reorder(word, n))

# Plot the count of the top 15 unique words found in tweets

ggplot(word_freq, aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Unique words", y = "Count",
       title = "Count of unique words found in tweets")

# Load the built-in stop words dataset from "tidytext"

data("stop_words")

# Remove stop words from the tokenized text

cleaned_tweet_words <- anti_join(climate_tweets_clean, stop_words, by = "word")

# Count the frequency of unique words after removing stop words

word_freq_no_stopwords <- cleaned_tweet_words %>%
  count(word, sort = TRUE) %>%
  top_n(15) %>%
  mutate(word = reorder(word, n))

# Plot the count of the top 15 unique words after stop words removal

ggplot(word_freq_no_stopwords, aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Unique words", y = "Count",
       title = "Count of unique words found in tweets (Stop words removed)")

RESULT
Thus, text mining on Twitter data using the rtweet package has been
implemented.
