Galgotias College of Engineering & Technology: Introduction to Data Analytics and Visualization Lab File (KDS-551)

This document contains a lab file for the Introduction to Data Analytics and Visualization course. It includes 8 programs to perform various data analysis tasks in R such as numerical operations on vectors, importing/exporting data, matrix operations, statistical analysis, data preprocessing, dimensionality reduction using PCA, linear regression, and market basket analysis using the Apriori algorithm. For each program, the objective, code, sample output, and programmer name are provided.

Uploaded by

hemuup08

Galgotias College of Engineering & Technology

Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Lucknow

Department of Computer Science & Engineering

INTRODUCTION TO DATA ANALYTICS AND VISUALIZATION LAB FILE
(KDS-551)

Name HIMANSHU UPADHYAY

Roll No. 2100971549003

Section CS-DS (V SEM)

Batch (D1/D2) D1

Submitted by
PRAMIT KUMAR SAMANT
INDEX
Experiment No. | Experiment Name | Date of Conduction | Date of Submission | Grade | Faculty Signature

PROGRAMMED BY- HIMANSHU UPADHYAY


PROGRAM-1
OBJECTIVE- To get input from the user and perform numerical operations
(MAX, MIN, AVG, SUM, SQRT, ROUND) in R.

Program-
#Create a Vector
> data=c(23,4,56,21,34,56,73)
> #Get the maximum value
> print(max(data))
[1] 73
> #Get the minimum value
> print(min(data))
[1] 4
> #Get the SUM-
> sum(data)
[1] 267
> #Get the AVG-
> print(mean(data))
[1] 38.14286
> #Get the SQRT-
> a=5
> print(sqrt(a))
[1] 2.236068
> a=5.2
> #Get the ROUND-
> print(round(a))
[1] 5
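
The program above works on a hard-coded vector, while the objective asks for user input. A minimal sketch of the same operations on user-supplied values follows. The helper name `summarise_numbers` and the comma-separated input format are assumptions made for illustration; they are not part of the original lab file.

```r
# Hypothetical helper: parse a comma-separated string and summarise it.
summarise_numbers <- function(input) {
  values <- as.numeric(strsplit(input, ",")[[1]])
  list(
    max   = max(values),
    min   = min(values),
    sum   = sum(values),
    avg   = mean(values),
    sqrt  = sqrt(values),        # element-wise square roots
    round = round(mean(values))  # mean rounded to the nearest integer
  )
}

# In an interactive session the string could come from readline():
#   input <- readline("Enter numbers separated by commas: ")
result <- summarise_numbers("23,4,56,21,34,56,73")
print(result$max)
print(result$min)
print(result$sum)
print(result$avg)
```

Returning a list keeps every result available by name, so each value can be printed or reused separately.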



Sample Output-

MAX & MIN-

SUM & AVG-

SQRT & ROUND-



PROGRAM-2
OBJECTIVE- To perform data import/export (.CSV, .XLS, .TXT) operations using
data frames in R.

Program-
#.CSV
> read.data <- read.csv("C:/Users/sidsh/OneDrive/Desktop/College Work/5th-Semester/D.A.V Lab/business-financial-data-june-2023-quarter-csv.csv")
> print(read.data)

#.XLS
> install.packages("readxl")   # needed only once per machine
> library(readxl)
> excel_data <- read_excel("C:/Users/sidsh/Downloads/file_example_XLS_10.xls")
> print(excel_data)

#.TXT
> txt_data <- read.table("C:/Users/sidsh/OneDrive/Desktop/R_Language.txt",
+                        header = TRUE, sep = "\t")
> print(txt_data)
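
The objective also mentions export, which the listing does not show. The following is a minimal sketch using base R's `write.csv()` and `write.table()`. The sample data frame `students` and the temporary file paths are assumptions for illustration only.

```r
# Illustrative data frame (not taken from the lab's data files).
students <- data.frame(name = c("A", "B", "C"), marks = c(81, 67, 92))

# Temporary files keep the example independent of any local folder layout.
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")

# Export: row.names = FALSE avoids writing an extra index column.
write.csv(students, csv_path, row.names = FALSE)
write.table(students, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Import the files again to confirm the round trip.
csv_back <- read.csv(csv_path)
txt_back <- read.table(txt_path, header = TRUE, sep = "\t")
print(csv_back)
print(txt_back)
```

Exporting to .XLS needs an additional package, so it is left out of this sketch.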



Sample Output-
CSV FILE-

XLS FILE-

TXT FILE-



PROGRAM-3

OBJECTIVE- To get the input matrix from the user and perform matrix addition,
subtraction, multiplication, inverse, transpose and division operations using
the vector concept in R.
Program-
#MATRIX CREATION
> nr = as.integer(readline("Enter the number of rows:"))
> nc = as.integer(readline("Enter the number of columns:"))
#MATRIX VALUES (scan() reads numbers until a blank line is entered):
> A = scan()
> B = scan()
> M1 = matrix(A,nrow = nr,ncol = nc,byrow = TRUE)
> M2 = matrix(B,nrow = nr,ncol = nc,byrow = TRUE)
#MATRIX M1:
> print(M1)
#MATRIX M2:
> print(M2)
#1.ADDITION-
> print(M1+M2)
#2.SUBTRACTION-
> print(M1-M2)
#3.MULTIPLICATION (element-wise; %*% gives the matrix product)-
> print(M1*M2)
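
The objective also lists transpose, inverse and division, which the listing above does not cover. A minimal sketch with fixed 2 x 2 matrices follows. The matrix values are assumptions chosen so that `M1` is invertible; they stand in for the values a user would enter.

```r
# Fixed 2 x 2 matrices stand in for the values read with scan().
M1 <- matrix(c(2, 1, 1, 3), nrow = 2, byrow = TRUE)
M2 <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)

print(t(M2))          # transpose
print(M1 %*% M2)      # matrix product (rows of M1 times columns of M2)
print(M1 / M2)        # element-wise division
M1_inv <- solve(M1)   # inverse; M1 must be square and non-singular
print(M1_inv)

# Multiplying a matrix by its inverse should give the identity matrix.
print(round(M1 %*% M1_inv, 10))
```

`solve()` stops with an error for a singular matrix, so an input-driven version would need to handle that case.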



Sample Output-



PROGRAM-4

OBJECTIVE- To perform statistical operations (Mean, Median, Mode and
Standard deviation) using R.

Program-
# DEFINING VECTOR
> a=c(23,84,16,95,23,6,41,29,6,4,6)
#1.MEAN-
> print(mean(a))
#2.MEDIAN-
> print(median(a))
#3.MODE-
> getmode <- function(a) {
uniqv <- unique(a)
uniqv[which.max(tabulate(match(a, uniqv)))]
}
> print(getmode(a))
#4.STANDARD DEVIATION-
> print(sd(a))
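
`sd()` returns the sample standard deviation, which divides by n - 1 rather than n. The following sketch, using the same vector, recomputes it by hand and shows the population version for comparison. The variable names `sample_sd` and `population_sd` are illustrative.

```r
a <- c(23, 84, 16, 95, 23, 6, 41, 29, 6, 4, 6)
n <- length(a)

# Sample standard deviation: squared deviations divided by (n - 1).
sample_sd <- sqrt(sum((a - mean(a))^2) / (n - 1))

# Population standard deviation: squared deviations divided by n.
population_sd <- sqrt(sum((a - mean(a))^2) / n)

print(sample_sd)
print(sd(a))          # matches sample_sd
print(population_sd)  # smaller than sd(a), since it divides by n
```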



Sample Output-

PROGRAM-5
OBJECTIVE- To perform data pre-processing operations: (1) Handle missing data
(2) Min-Max normalization

Program-
#Handle Missing Data-
x = c(NA,3,4,NA,NA,NA)
is.na(x)
x = c(NA,3,4,NA,NA,0/0,0/0)
is.nan(x)

#Min-Max Normalization-
install.packages("caret")   # optional: not used by the custom function below
library(caret)
data = data.frame(Var1 = c(120, 345, 145, 522, 596, 285, 21), Var2 = c(10,
15, 45, 22, 53, 28, 12), Var3 = c(-34, 0.05, 0.15, 0.12, -6, 0.85, 0.11))

#Creating Function To Implement Min-Max Scaling-
MinMax = function (x) {(x-min(x))/(max(x)-min(x))}

#Normalize Data Using Custom Function (column by column)-
Normalized_My_Data = as.data.frame(lapply(data, MinMax))
head(Normalized_My_Data)

#Checking Summary After Normalization-
summary(Normalized_My_Data)
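
The listing detects missing values but does not act on them. Two common ways of handling them are dropping the entries or imputing the mean of the observed values. The sketch below shows both; the vector is an assumption for illustration.

```r
# Illustrative vector with missing (NA) and not-a-number (NaN) entries.
x <- c(NA, 3, 4, NA, 8, 0/0)

# is.na() is TRUE for both NA and NaN, so one test covers both cases.
missing <- is.na(x)

# Option 1: drop the missing entries.
x_dropped <- x[!missing]

# Option 2: replace them with the mean of the observed values.
x_imputed <- x
x_imputed[missing] <- mean(x, na.rm = TRUE)

print(x_dropped)  # 3 4 8
print(x_imputed)  # 5 3 4 5 8 5
```

Either result can then be passed to a scaling function such as `MinMax`, which would otherwise return only `NA` because `min()` and `max()` propagate missing values.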

Sample Output-

PROGRAM-6
OBJECTIVE- To perform dimensionality reduction operation using PCA for
houses data set in R.

Program-
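> data("USArrests")
> rawdf <- na.omit(USArrests)
> # USArrests columns are Murder, Assault, UrbanPop, Rape; the fourth is relabelled here
> names(rawdf)=c("Murder","Assault","UrbanPop","Assassination")
> head(rawdf)
> arrests.pca <- prcomp(rawdf, center = TRUE, scale. = TRUE)

#Checking output of PCA. prcomp returns the standard deviations (sdev),
#the rotation (loadings), center, scale and the scores (x)-

> names(arrests.pca)
> print(arrests.pca)
> summary(arrests.pca)
> # pcaCharts() is not defined in this file; screeplot() from base R plots the component variances
> screeplot(arrests.pca, type = "lines")
> biplot(arrests.pca,scale=0, cex=.7)
> # Flip the signs of loadings and scores (the direction of a component is arbitrary)
> pca.out <- arrests.pca
> pca.out$rotation <- -pca.out$rotation
> pca.out$x <- -pca.out$x
> biplot(pca.out,scale=0, cex=.7)
> pca.out$rotation[,1:2]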
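
To decide how many components to keep, the proportion of variance explained can be computed from `sdev`. The sketch below is one way to do it. The 90% threshold and the names `pve`, `cum_pve` and `n_keep` are assumptions for illustration.

```r
# PCA on the built-in USArrests data, standardised before rotation.
arrests.pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Variance of each component is the square of its standard deviation.
pc_var <- arrests.pca$sdev^2

# Proportion and cumulative proportion of variance explained.
pve <- pc_var / sum(pc_var)
cum_pve <- cumsum(pve)

print(round(pve, 3))
print(round(cum_pve, 3))

# Smallest number of components explaining at least 90% (assumed threshold).
n_keep <- which(cum_pve >= 0.90)[1]
print(n_keep)
```

The same proportions appear in the "Proportion of Variance" row printed by `summary(arrests.pca)`.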

Sample Output-

PROGRAM-7
OBJECTIVE- To perform simple linear regression with R.

Program-
#Create the predictor and response variable:
> x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
> y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
> relation <- lm(y~x)

#Give the chart file a name:
> png(file = "linearregression.png")

#Plot the chart, then add the fitted line (height regressed on weight, matching the axes):
> plot(y,x,col = "blue",main = "Height & Weight Regression",
+ cex = 1.3,pch = 16,xlab = "Weight in Kg",ylab = "Height in cm")
> abline(lm(x~y))

#Save the file:
> dev.off()
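
The fitted model `relation` can also be used to read off the coefficients and predict a new value. A minimal sketch follows. The new height of 170 cm is an assumed input, and treating `x` as height in cm and `y` as weight in kg follows the axis labels used in the plot.

```r
# Heights in cm (predictor) and weights in kg (response), as in the program above.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# Intercept and slope of the fitted line y = b0 + b1 * x.
b <- coef(relation)
print(b)

# Predict the weight for an assumed new height of 170 cm.
new_height <- data.frame(x = 170)
predicted <- predict(relation, new_height)
print(predicted)

# R-squared: share of the variance in y explained by x.
print(summary(relation)$r.squared)
```

The column name in `new_height` must match the predictor name used in the formula, otherwise `predict()` cannot find the variable.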

Sample Output-

PROGRAM-8
OBJECTIVE- To perform market basket analysis / Apriori algorithm on given
data set.

Program-
library(arules)
library(arulesViz)
library(RColorBrewer)
data("Groceries")
rules <- apriori(Groceries, parameter = list(supp = 0.01, conf = 0.2))
inspect(rules[1:10])
arules::itemFrequencyPlot(Groceries, topN = 20,
                          col = brewer.pal(8, 'Pastel2'),
                          main = 'Relative Item Frequency Plot',
                          type = "relative",
                          ylab = "Item Frequency (Relative)")
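
The `supp` and `conf` parameters passed to `apriori()` are thresholds on support and confidence. The sketch below computes these measures, and lift, by hand in base R for a single rule. The toy transactions and the helper `support` are assumptions for illustration and do not come from the Groceries data.

```r
# Toy transactions (assumed for illustration; not the Groceries data).
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("milk", "butter"),
  c("bread", "milk", "butter"),
  c("milk")
)

# Support: share of transactions that contain every item in `items`.
support <- function(items) {
  mean(sapply(transactions, function(basket) all(items %in% basket)))
}

# Rule {bread} => {milk}
supp_rule  <- support(c("bread", "milk"))    # 2 of 5 transactions
confidence <- supp_rule / support("bread")   # support(rule) / support(lhs)
lift       <- confidence / support("milk")   # confidence / support(rhs)

print(supp_rule)
print(confidence)
print(lift)
```

A lift below 1, as here, means the two items occur together less often than they would if they were independent.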

Output –



Galgotias College of Engineering & Technology
Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Lucknow

Department of Computer Science & Engineering

INTRODUCTION TO DATA ANALYTICS AND VISUALIZATION LAB FILE
(KDS-551)

Name SIDDHARTH SHARMA

Roll No. 2100971540058

Section CS-DS (V SEM)

Batch (D1/D2) D2

PROGRAM-9
OBJECTIVE- To perform K-means clustering operation and visualize for iris data
set.

Program-
> install.packages("cluster")   # stats ships with R and needs no installation
> install.packages("ClusterR")
> library(stats)
> library(cluster)
> library(ClusterR)
> # Removing initial label of
> # Species from original dataset
> iris_1 <- iris[, -5]
> # Fitting K-Means clustering Model
> # to training dataset
> set.seed(240) # Setting seed
> kmeans.re <- kmeans(iris_1, centers = 3, nstart = 20)
> kmeans.re
> # Cluster identification for
> # each observation
> kmeans.re$cluster
> # Confusion Matrix
> cm <- table(iris$Species, kmeans.re$cluster)
> cm
> # Model Evaluation and visualization
> plot(iris_1[c("Sepal.Length", "Sepal.Width")])
> plot(iris_1[c("Sepal.Length", "Sepal.Width")],
+ col = kmeans.re$cluster)
> plot(iris_1[c("Sepal.Length", "Sepal.Width")],
+ col = kmeans.re$cluster,
+ main = "K-means with 3 clusters")
> ## Plotting cluster centers
> kmeans.re$centers
> kmeans.re$centers[, c("Sepal.Length", "Sepal.Width")]
> # cex is symbol size, pch is symbol
> points(kmeans.re$centers[, c("Sepal.Length", "Sepal.Width")],
+ col = 1:3, pch = 8, cex = 3)
> ## Visualizing clusters
> y_kmeans <- kmeans.re$cluster
> clusplot(iris_1[, c("Sepal.Length", "Sepal.Width")],
+ y_kmeans,
+ lines = 0,
+ shade = TRUE,
+ color = TRUE,
+ labels = 2,
+ plotchar = FALSE,
+ span = TRUE,
+ main = paste("Cluster iris"),
+ xlab = 'Sepal.Length',
+ ylab = 'Sepal.Width')
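
The program fixes the number of clusters at 3. One common way to check that choice is the elbow method, sketched below using only base R. The range of k values (1 to 6) is an assumption for illustration.

```r
# Numeric columns of the built-in iris data (Species label removed).
iris_1 <- iris[, -5]

set.seed(240)  # same seed as the program above, for reproducibility

# Total within-cluster sum of squares for k = 1..6.
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(iris_1, centers = k, nstart = 20)$tot.withinss
})
print(round(wss, 2))

# The "elbow" is where adding clusters stops reducing wss sharply.
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```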

Sample Output-

PROGRAM-10
OBJECTIVE- Write an R script to diagnose a disease using KNN classification and
plot the results.

Program-
> install.packages("class")
> install.packages("ggplot2")
> library(class)
> library(ggplot2)
> features <- iris[, c("Sepal.Length", "Sepal.Width")]
> diagnosis <- rep(c("Healthy", "Diseased"), each = 75)
> # Combine features and labels (the labels are synthetic, assigned by row position)
> diagnosis_data <- data.frame(features, Diagnosis = factor(diagnosis))
> set.seed(123)   # makes the random train/test split reproducible
> train_indices <- sample(1:nrow(diagnosis_data), 0.7 * nrow(diagnosis_data))
> train_data <- diagnosis_data[train_indices, ]
> test_data <- diagnosis_data[-train_indices, ]
> k <- 3
> predicted_labels <- knn(train = train_data[, 1:2], test = test_data[, 1:2],
+ cl = train_data$Diagnosis, k = k)
> results <- data.frame(Actual = test_data$Diagnosis, Predicted = predicted_labels)
> print("Actual vs. Predicted:")
> print(results)
> ggplot(results, aes(x = Actual, fill = Predicted)) +
+ geom_bar(position = "dodge", color = "black", stat = "count") +
+ labs(title = "Disease Diagnosis Results",
+ x = "Actual Diagnosis",
+ y = "Count",
+ fill = "Predicted Diagnosis") +
+ theme_minimal()
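
Alongside the bar chart, the predictions can be summarised with a confusion matrix and an accuracy figure. A minimal base R sketch follows. The toy label vectors are assumptions for illustration; in the program above they would be `results$Actual` and `results$Predicted`.

```r
# Toy labels (assumed for illustration).
actual    <- factor(c("Healthy", "Healthy", "Diseased",
                      "Diseased", "Healthy", "Diseased"))
predicted <- factor(c("Healthy", "Diseased", "Diseased",
                      "Diseased", "Healthy", "Healthy"))

# Confusion matrix: rows are actual classes, columns are predicted classes.
cm <- table(Actual = actual, Predicted = predicted)
print(cm)

# Accuracy: correctly classified cases (the diagonal) over all cases.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

`diag()` gives the correct counts only when both factors share the same levels in the same order, which holds here.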



Sample Output-



PROGRAM-11
OBJECTIVE- To convert the following dataset into a bar chart.

Program-
> library(ggplot2)
> feature1 <- c(1,2,2,5,8,8,9,11)
> feature2 <- c(1,3,4,3,6,7,2,12)
> class <- c(1,1,1,1,2,2,2,3)
> circle <- data.frame(feature1,feature2,class)
> barplot(as.matrix(circle), main="Bar Plots", ylab="Serial_no", beside=TRUE)

Sample Output-



PROGRAM-12
OBJECTIVE- Write a program to print a decision tree in R for the data set readingSkills.

Program-
> install.packages("party")
> library(party)
> print(head(readingSkills))
> input.dat <- readingSkills[c(1:105), ]
> png(file = "decision_tree.png")
> output.tree <- ctree(nativeSpeaker ~ age + shoeSize + score, data = input.dat)
> plot(output.tree)
> dev.off()

Sample Output-
