Lab Record

Uploaded by

vyastanay30

Lab Record 21BCG10126 - hgv 7huyh bihkbih

Computer Science (Anna University)

Downloaded by Tanay Vyas ([email protected])
VIT Bhopal University

NAS1001 – Associative Data Analytics (LTP-4)

Slot: B11+B12+B13+B14
Class ID: BL2023241000207
FALL SEMESTER 2023-2024

Course Instructor: Dr. D Lakshmi

Name of the Student: Aniket Shrivastava

List of Experiments

List of Challenging Experiments (Indicative)

SLO: 1, 2, 5, 9, 12

1. Understanding of R System and installation and configuration of R Environment and
R-Studio, Understanding R Packages, their installation and management

2. Understanding of nuts and bolts of R:

a. R program Structure
b. R Data Type, Command Syntax and Control Structures
c. File Operations in R

3. Excel and R integration with R connector.

4. Preparing Data in R

a. Data Cleaning
b. Data imputation
c. Data conversion

5. Outliers detection using R

6. Correlation and Regression Analysis in R

7. Clustering Algorithms implementation using R

8. Classification Algorithm implementation using R

Classification (Spam/Not spam)

9. Case study on Stock Market Analysis and applications. Stock data can be obtained from
Yahoo! Finance, Google Finance. A team of students can apply statistical modeling on the
stock data to uncover hidden patterns. R provides tools for moving averages, auto regression
and time-series analysis which form the crux of financial applications.

10. Detect credit card fraudulent transactions - The dataset can be obtained from Kaggle.
The team will use a variety of machine learning algorithms that will be able to distinguish
fraudulent transactions from non-fraudulent ones.


Experiment No: 1

Aim: Understanding of R System and installation and configuration of R Environment and
R-Studio, Understanding R Packages, their installation and management

Data Description: R is a programming language for statistical computing and graphics,
supported by the R Core Team and the R Foundation for Statistical Computing.
Designed by: Ross Ihaka, Robert Gentleman

Installing R:

Download R:

1. Go to the R Project's official website: https://www.r-project.org/
2. Click on the "CRAN" link, choose a mirror, and under the "Download and Install R" section
pick the download link for your operating system.

Install R:

1. For Windows: Double-click the downloaded executable file and follow the installation
instructions.
2. For macOS: Double-click the downloaded package file and follow the installation
instructions.
3. For Linux: Follow the installation instructions specific to your Linux distribution.

Installing RStudio:

Download RStudio:

1. Go to the RStudio download page: https://www.rstudio.com/products/rstudio/download/
2. Under "RStudio Desktop," click the appropriate download link for your operating system
(Windows, macOS, or Linux).

Install RStudio:

1. For Windows: Double-click the downloaded installer and follow the installation
instructions.
2. For macOS: Double-click the downloaded disk image (.dmg) file, drag the RStudio icon
to the Applications folder, and then open RStudio from the Applications folder.
3. For Linux: Follow the installation instructions specific to your Linux distribution.

Installing R packages
Installing packages is a fundamental part of working with R. R packages contain pre-built
functions, data sets, and documentation that extend the capabilities of the R programming
language. Here are the steps to install R packages using the R console within RStudio:

Open RStudio:
Launch RStudio on your computer.

Open R Console:
Once RStudio is open, you'll see several panes. In the default layout the R Console is on the
left (bottom left once a script is open). This is where you can directly interact with R by
typing commands.

Install a Package:
To install an R package, call the install.packages() function with the name of the package in
quotes. For example, to install the "ggplot2" package, type the following command in the
R Console and press Enter:
install.packages("ggplot2")

Load the Package:
After installing a package, you need to load it into your R session to use its functions. Use the
library() function for this purpose. For example, to load the "ggplot2" package, type:
library(ggplot2)
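
The install-then-load steps above can be sketched so that a package is installed only when it is missing. This is a minimal sketch, not part of the original record: the helper name is_installed and the package list are illustrative assumptions, and the two packages used here ship with R, so nothing is actually downloaded.

```r
# Hypothetical helper: TRUE if a package is already available on this machine
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Illustrative package list; swap in e.g. "ggplot2" for real use
needed <- c("stats", "utils")

# Keep only the packages that are not yet installed
to_install <- needed[!vapply(needed, is_installed, logical(1))]

# Install the missing ones (skipped when the list is empty)
if (length(to_install) > 0) {
  install.packages(to_install)
}

# Load every package into the current session
for (pkg in needed) {
  library(pkg, character.only = TRUE)
}
```

Checking first avoids reinstalling a package every time the script runs.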

Experiment No: 2

Aim: Understanding of nuts and bolts of R:

a. R program Structure
b. R Data Type, Command Syntax and Control Structures
c. File Operations in R

Data Description

a. R Program Structure: An R program consists of a series of commands that are executed
sequentially. These commands can be typed directly into the R console or saved in a
script file with a .R extension.

b. R Data Types, Command Syntax, and Control Structures: R supports various data types,
including numeric, character, logical, and factor. Here's a quick overview:
Numeric: Stores numbers (decimals by default; whole numbers can also be stored as integers).
Character: Stores text data.
Logical: Represents the binary values TRUE or FALSE.
Factor: Represents categorical data with a fixed set of levels.

c. File Operations in R: R provides functions to read data from and write data to text,
CSV, and Excel files.

R Code

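a. R Program Structure:

# 1. Load the packages the script needs (package_name is a placeholder)
# library(package_name)

# 2. Define functions (the body shown here is illustrative)
my_function <- function(arg1, arg2) {
  result <- arg1 + arg2
  return(result)
}

# 3. Call the function and print the result (illustrative input values)
value1 <- 3
value2 <- 7
result <- my_function(value1, value2)
print(result)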

b. R Data Types, Command Syntax, and Control Structures:

x <- 5
name <- "John"
is_valid <- TRUE
sum_result <- 3 + 7
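
The assignments above cover data types and command syntax; the control structures named in the aim can be sketched as follows. This is an illustrative sketch only: the variable names grade, size, total, and count are assumptions, not taken from the original record.

```r
# Data types: class() reports how R stores each value
x <- 5
name <- "John"
is_valid <- TRUE
grade <- factor(c("low", "high", "low"), levels = c("low", "high"))

class(x)         # "numeric"
class(name)      # "character"
class(is_valid)  # "logical"
class(grade)     # "factor"

# if / else: choose a branch based on a condition
if (x > 3) {
  size <- "big"
} else {
  size <- "small"
}

# for loop: add up the numbers 1 to 5
total <- 0
for (i in 1:5) {
  total <- total + i
}

# while loop: repeat until the condition becomes FALSE
count <- 0
while (count < 3) {
  count <- count + 1
}
```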

c. File Operations in R:
Reading files
# Reading text files
data <- read.table("data.txt", header = TRUE)

# Reading CSV files

data <- read.csv("data.csv")

# Reading Excel files (requires 'readxl' package)

library(readxl)
data <- read_excel("data.xlsx")

Writing files
# Writing data to text file
write.table(data, "output.txt", sep = "\t", row.names = FALSE)

# Writing data to CSV file

write.csv(data, "output.csv", row.names = FALSE)

# Writing data to Excel file (requires 'openxlsx' package)

library(openxlsx)
write.xlsx(data, "output.xlsx")
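
A quick way to check the read and write calls above is a round trip through a temporary file. This is a minimal base R sketch; the data frame values and the use of tempfile() are assumptions made for illustration.

```r
# Small made-up data frame (values are hypothetical)
salary <- data.frame(
  experience_years = c(1, 3, 5),
  salary = c(40000, 52000, 65000)
)

# Write it to a temporary CSV file, without the row-name column
csv_path <- tempfile(fileext = ".csv")
write.csv(salary, csv_path, row.names = FALSE)

# Read it back and compare with the original
restored <- read.csv(csv_path)
same_shape <- identical(dim(restored), dim(salary))
same_names <- identical(names(restored), names(salary))

# Clean up the temporary file
unlink(csv_path)
```

Passing row.names = FALSE matters here: without it, write.csv() adds an extra column of row numbers that would come back as data on the next read.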

Experiment No: 3

Aim: Excel and R integration with R connector.

Data Description:
In this example, the CSV file has two columns:
experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.
Sample rows and columns

R Code
> # read.csv() is part of base R, so no extra package is needed
> Salary_Dataset <- read.csv(file.choose(), header = TRUE)
> Salary_Dataset
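
After importing, it helps to confirm that the columns arrived with the expected names and types. The sketch below assumes a small made-up data frame with the two columns described above (the values are hypothetical, not the lab's data) and uses only base R inspection functions.

```r
# Stand-in for the imported file (hypothetical values)
Salary_Dataset <- data.frame(
  experience_years = c(1.1, 2.0, 3.2, 4.5, 5.9),
  salary = c(39343, 43525, 54445, 61111, 81363)
)

# Structure: column names, types, and the first few values
str(Salary_Dataset)

# First three rows
head(Salary_Dataset, 3)

# Five-number summary plus mean of the salary column
summary(Salary_Dataset$salary)

# Number of rows and columns
n_rows <- nrow(Salary_Dataset)
n_cols <- ncol(Salary_Dataset)
```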

Sample Input and Output

Experiment No: 4

Aim: Preparing Data in R

a. Data Cleaning
b. Data imputation
c. Data conversion

Data Description

In this example, the CSV file has two columns:

experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.

Sample rows and columns

R Code
# Load libraries
library(dplyr)       # data manipulation
library(missForest)  # random-forest-based imputation

# Read dataset
data <- read.csv("data.csv")

# Data Cleaning: drop duplicate rows and a column that is not needed
# (Irrelevant_Column is a placeholder name)
cleaned_data <- data %>%
  distinct() %>%
  select(-Irrelevant_Column)

# Check for missing values
missing_values <- sum(is.na(cleaned_data))

if (missing_values > 0) {
  # Data Imputation: missForest() returns a list; the imputed data frame is in $ximp
  imputed_data <- missForest(cleaned_data, verbose = TRUE)$ximp
} else {
  imputed_data <- cleaned_data
}

# Data Conversion (if needed); Categorical_Column is a placeholder name
imputed_data$Categorical_Column <- as.factor(imputed_data$Categorical_Column)

# Display prepared dataset
print(imputed_data)
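
The same three steps can be tried without extra packages. The sketch below is a simplified stand-in, not the method above: it swaps missForest for median imputation and uses a made-up data frame whose level column and values are assumptions.

```r
# Made-up data with one duplicate row and two missing values
raw <- data.frame(
  experience_years = c(1, 2, 2, 4, NA),
  salary = c(40000, 45000, 45000, NA, 70000),
  level = c("junior", "junior", "junior", "senior", "senior"),
  stringsAsFactors = FALSE
)

# Data cleaning: drop exact duplicate rows
cleaned <- unique(raw)

# Data imputation: replace each missing number with the column median
for (col in c("experience_years", "salary")) {
  missing_rows <- is.na(cleaned[[col]])
  cleaned[[col]][missing_rows] <- median(cleaned[[col]], na.rm = TRUE)
}

# Data conversion: store the text column as a factor
cleaned$level <- as.factor(cleaned$level)

print(cleaned)
```

Median imputation is much cruder than a model-based approach, but it makes the effect of each step easy to inspect on a small table.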

Sample Input and Output

Experiment No: 5

Aim: Outliers detection using R

Data Description
In this example, the CSV file has two columns:
experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.

Sample rows and columns

R Code
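
One common way to flag outliers is the 1.5 x IQR rule: any value more than 1.5 interquartile ranges below the first quartile or above the third quartile is treated as an outlier. The sketch below shows that rule on made-up salary values; it is one standard technique under stated assumptions, not necessarily the method used in this experiment.

```r
# Hypothetical salaries with one obviously extreme value
salary <- c(39000, 42000, 45000, 47000, 50000, 52000, 55000, 150000)

# First and third quartiles and the interquartile range
q <- unname(quantile(salary, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences: anything outside [lower, upper] is flagged
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
is_outlier <- salary < lower | salary > upper

# Values flagged by the rule
outliers <- salary[is_outlier]
print(outliers)

# boxplot.stats() applies a closely related rule based on hinges
boxplot_outliers <- boxplot.stats(salary)$out
```

Calling boxplot(salary) draws the same flagged points as individual dots beyond the whiskers.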

Sample Input and Output

Experiment No: 6

Aim: Correlation and Regression Analysis in R

Data Description

In this example, the CSV file has two columns:

experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.

Sample rows and columns

R Code
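
Correlation measures how strongly two variables move together, and simple linear regression fits a straight line that predicts one from the other. The sketch below uses base R's cor() and lm() on made-up values for the two columns described above; the numbers are assumptions, and this is not necessarily the code used in this experiment.

```r
# Hypothetical data for the two columns
salary_data <- data.frame(
  experience_years = c(1, 2, 3, 4, 5, 6),
  salary = c(40000, 45500, 49000, 56000, 59500, 66000)
)

# Pearson correlation between experience and salary
r <- cor(salary_data$experience_years, salary_data$salary)

# Simple linear regression: salary as a function of experience
fit <- lm(salary ~ experience_years, data = salary_data)

# Intercept and slope of the fitted line
print(coef(fit))

# Share of salary variance explained by the model
r_squared <- summary(fit)$r.squared

# Predicted salary for someone with 7 years of experience
predicted <- predict(fit, newdata = data.frame(experience_years = 7))
```

With a single predictor, the model's R-squared equals the square of the Pearson correlation, which gives a quick consistency check between the two analyses.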

Experiment No: 7

Aim: Clustering Algorithms implementation using R

Sample rows and columns

R Code
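
The record does not say which clustering algorithm was used; k-means is one widely used choice and is available in base R as kmeans(). The sketch below runs it on two synthetic, well-separated groups of points. The data, the seed, and the choice of two clusters are all assumptions made for illustration.

```r
# Reproducible synthetic data: two groups of 20 points each
set.seed(42)
group_a <- cbind(x = rnorm(20, mean = 0, sd = 0.3), y = rnorm(20, mean = 0, sd = 0.3))
group_b <- cbind(x = rnorm(20, mean = 5, sd = 0.3), y = rnorm(20, mean = 5, sd = 0.3))
pts <- rbind(group_a, group_b)

# Run k-means with 2 clusters; nstart tries several random starts
km <- kmeans(pts, centers = 2, nstart = 10)

# Cluster centres found by the algorithm
print(km$centers)

# Compare assigned clusters with the group each point came from
true_group <- rep(c("a", "b"), each = 20)
print(table(cluster = km$cluster, group = true_group))
```

The cluster numbers themselves are arbitrary labels, so the comparison table is read by checking that each group falls into a single cluster.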

Sample Input and Output

Experiment No: 8

Aim: Classification Algorithm implementation using R

Classification (Spam/Not spam)

R Code

# Load required libraries

library(tm) # Text mining
library(e1071) # For Naive Bayes classifier
library(caret) # For model evaluation

# Load the SpamAssassin dataset (replace with your actual file path)
spam_data <- read.csv("path/to/spamassassin_data.csv", stringsAsFactors = FALSE)

# Preprocess the text data

corpus <- Corpus(VectorSource(spam_data$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
corpus <- tm_map(corpus, stripWhitespace)

# Create a document-term matrix

dtm <- DocumentTermMatrix(corpus)

# Convert the document-term matrix to a data frame

spam_df <- as.data.frame(as.matrix(dtm))
colnames(spam_df) <- make.names(colnames(spam_df))

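# Combine with labels (stored as a factor so the classifier treats them as classes
# and confusionMatrix() can compare them with the predictions)

spam_df$label <- as.factor(spam_data$label)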

# Split data into training and testing sets

set.seed(123)
train_indices <- sample(1:nrow(spam_df), 0.7 * nrow(spam_df))
train_data <- spam_df[train_indices, ]
test_data <- spam_df[-train_indices, ]

# Train a Naive Bayes classifier

naive_bayes_model <- naiveBayes(label ~ ., data = train_data)

# Make predictions
predictions <- predict(naive_bayes_model, newdata = test_data, type = "class")

# Evaluate the model

conf_matrix <- confusionMatrix(predictions, test_data$label)
print(conf_matrix)
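
The figures that confusionMatrix() reports can be reproduced by hand from a plain cross-tabulation, which helps when reading its output. The sketch below uses base R only and made-up label vectors; the class names "ham" and "spam" and all values are assumptions.

```r
# Hypothetical true labels and model predictions
actual <- factor(c("spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"),
                 levels = c("ham", "spam"))
predicted <- factor(c("spam", "spam", "ham", "ham", "ham", "spam", "ham", "spam"),
                    levels = c("ham", "spam"))

# Confusion matrix: rows are predictions, columns are true labels
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Counts, treating "spam" as the positive class
tp <- cm["spam", "spam"]  # spam correctly flagged
fp <- cm["spam", "ham"]   # ham wrongly flagged as spam
fn <- cm["ham", "spam"]   # spam that was missed

# Standard metrics derived from the counts
accuracy <- sum(diag(cm)) / sum(cm)
precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
```

For a spam filter, precision shows how often a flagged message really is spam, while recall shows how much of the spam was caught.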

Sample Input and Output

Experiment No: 9

Aim: Case study on Stock Market Analysis and applications. Stock data can be obtained from
Yahoo! Finance, Google Finance. A team of students can apply statistical modeling on the stock
data to uncover hidden patterns. R provides tools for moving averages, auto regression and
time-series analysis which form the crux of financial applications.

Data Description

Stock data imported from Yahoo! Finance.
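
The moving averages mentioned in the aim are rolling means over the most recent n closing prices. A minimal base R sketch of that calculation, using stats::filter() on a short made-up price series (the prices and the window size of 3 are assumptions chosen to keep the arithmetic easy to follow):

```r
# Hypothetical closing prices
close_price <- c(10, 11, 12, 13, 14, 15, 16)

# Window size for the moving average
n <- 3

# sides = 1 averages the current value and the n - 1 values before it
ma <- stats::filter(close_price, rep(1 / n, n), sides = 1)

# Convert from a time-series object to a plain numeric vector
ma_values <- as.numeric(ma)
print(ma_values)  # NA NA 11 12 13 14 15
```

The first n - 1 entries are NA because a full window of past prices is not yet available; the same happens with 50-day and 200-day averages on real data.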

R Code
# Load required libraries
library(dplyr)
library(lubridate)
library(TTR)  # provides SMA()

# Read the stock data CSV file (or load data from API)
stock_data <- read.csv("stock_data.csv")

# Convert date column to Date format

stock_data$Date <- ymd(stock_data$Date)

# Calculate 50-day and 200-day moving averages

stock_data$MA_50 <- SMA(stock_data$Close, n = 50)
stock_data$MA_200 <- SMA(stock_data$Close, n = 200)

# Load required library

library(forecast)

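# Convert data to time series format
# (252 is the approximate number of trading days per year; assumes one row per trading day)

stock_ts <- ts(stock_data$Close, frequency = 252)

# Fit an ARIMA model (auto.arima() selects the order automatically)

ar_model <- auto.arima(stock_ts)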

# Load plotting library (forecast is already loaded above)

library(ggplot2)

# Decompose time series into trend, seasonal, and residual components

decomposed <- decompose(stock_ts)

# Plot decomposed components

plot(decomposed)

# Create a time series plot of stock prices and moving averages

ggplot(stock_data, aes(x = Date)) +
  geom_line(aes(y = Close, color = "Stock Price")) +
  geom_line(aes(y = MA_50, color = "50-day MA")) +
  geom_line(aes(y = MA_200, color = "200-day MA")) +
  labs(title = "Stock Price and Moving Averages", y = "Price") +
  scale_color_manual(values = c("Stock Price" = "blue", "50-day MA" = "red", "200-day MA" = "green"))

Sample Input and Output

Experiment No: 10

Aim: Detect credit card fraudulent transactions - The dataset can be obtained from Kaggle. The
team will use a variety of machine learning algorithms that will be able to distinguish
fraudulent transactions from non-fraudulent ones.

Data Description
The dataset was obtained from Kaggle.

R Code
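# Load required libraries
library(randomForest)

# Load the credit card fraud dataset downloaded from Kaggle
# (the path and file name are placeholders; replace with your actual file path)
CreditCardFraud <- read.csv("path/to/creditcard.csv")

# Store the target as a factor so randomForest() builds a classifier, not a regression
CreditCardFraud$Class <- as.factor(CreditCardFraud$Class)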

# Split data into training and testing sets (70% training, 30% testing)
set.seed(123)
train_indices <- sample(1:nrow(CreditCardFraud), 0.7 * nrow(CreditCardFraud))
train_data <- CreditCardFraud[train_indices, ]
test_data <- CreditCardFraud[-train_indices, ]

# Build Random Forest model

rf_model <- randomForest(Class ~ ., data = train_data, ntree = 100)

# Make predictions
predictions <- predict(rf_model, newdata = test_data)

# Calculate accuracy
accuracy <- sum(predictions == test_data$Class) / nrow(test_data)
print(paste("Accuracy score on Test Data:", accuracy))
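
Accuracy alone can be misleading for fraud data, because fraudulent transactions are usually a tiny share of all rows. The sketch below illustrates this with made-up labels (1% fraud, an assumed proportion) and a trivial "model" that never predicts fraud; it uses base R only.

```r
# Hypothetical labels: 990 legitimate (0) and 10 fraudulent (1) transactions
actual <- factor(c(rep(0, 990), rep(1, 10)), levels = c(0, 1))

# A trivial baseline that labels every transaction as legitimate
always_legit <- factor(rep(0, 1000), levels = c(0, 1))

# Accuracy looks excellent even though no fraud is detected
baseline_accuracy <- mean(always_legit == actual)

# Recall: share of actual fraud cases that were caught
baseline_recall <- sum(always_legit == 1 & actual == 1) / sum(actual == 1)

print(paste("Baseline accuracy:", baseline_accuracy))  # 0.99
print(paste("Baseline recall:", baseline_recall))      # 0
```

Comparing a trained model against this baseline, and reporting recall alongside accuracy, gives a fairer picture of how well fraud is being detected.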

Sample Input and Output
