
Lab Record

Uploaded by

vyastanay30

Lab Record 21BCG10126

Computer Science (Anna University)



Downloaded by Tanay Vyas ([email protected])

VIT Bhopal University

NAS1001 – Associative Data Analytics (LTP-4)


Slot: B11+B12+B13+B14
Class ID: BL2023241000207
FALL SEMESTER 2023-2024

Course Instructor: Dr. D Lakshmi

Name of the Student: Aniket Shrivastava


Register Number: 21BCG10126

List of Experiments


List of Challenging Experiments (Indicative)
SLO: 1, 2, 5, 9, 12

1. Understanding of R System and installation and configuration of R Environment and R-Studio, Understanding R Packages, their installation and management

2. Understanding of nuts and bolts of R:
a. R program Structure
b. R Data Type, Command Syntax and Control Structures
c. File Operations in R

3. Excel and R integration with R connector.

4. Preparing Data in R
a. Data Cleaning
b. Data imputation
c. Data conversion

5. Outlier detection using R

6. Correlation and Regression Analysis in R

7. Clustering Algorithms implementation using R

8. Classification Algorithm implementation using R: Classification (Spam/Not spam)

9. Case study on Stock Market Analysis and applications. Stock data can be obtained from Yahoo! Finance or Google Finance. A team of students can apply statistical modeling to the stock data to uncover hidden patterns. R provides tools for moving averages, autoregression and time-series analysis, which form the crux of financial applications.

10. Detect fraudulent credit card transactions. The dataset can be obtained from Kaggle. The team will use a variety of machine learning algorithms to distinguish fraudulent transactions from non-fraudulent ones.

Experiment No: 1


Aim: Understanding of R System and installation and configuration of R Environment and R-Studio, Understanding R Packages, their installation and management

Data Description: R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing.
Designed by: Ross Ihaka, Robert Gentleman

Installing R:

Download R:

1. Go to the R Project's official website: https://www.r-project.org/
2. Click the "CRAN" link under "Download", choose a mirror, and download the installer for your operating system.
3. For Windows: Double-click the downloaded executable file and follow the installation instructions.
4. For macOS: Double-click the downloaded package file and follow the installation instructions.
5. For Linux: Follow the installation instructions specific to your Linux distribution.

Installing RStudio:

Download RStudio:

1. Go to the RStudio download page: https://www.rstudio.com/products/rstudio/download/
2. Under "RStudio Desktop," click the appropriate download link for your operating system (Windows, macOS, or Linux).

Install RStudio:

1. For Windows: Double-click the downloaded installer and follow the installation instructions.
2. For macOS: Double-click the downloaded disk image (.dmg) file, drag the RStudio icon to the Applications folder, and then open RStudio from the Applications folder.
3. For Linux: Follow the installation instructions specific to your Linux distribution.

Installing R packages
Installing packages is a fundamental part of working with R. R packages contain pre-built functions, data sets, and documentation that extend the capabilities of the R programming language. Here are the steps to install R packages using the R console within RStudio:

Open RStudio:
Launch RStudio on your computer.

Open R Console:


Once RStudio is open, you'll see several panes. By default the R Console is on the left (in the lower-left pane once a script is open in the editor). This is where you can directly interact with R by typing commands.

Install a Package:
To install an R package, you'll use the install.packages() function followed by the name of the
package you want to install. For example, to install the "ggplot2" package, type the following
command in the R Console and press Enter: install.packages("ggplot2")

Load the Package:


After installing a package, you need to load it into your R session to use its functions. Use the
library() function for this purpose. For example, to load the "ggplot2" package, type:
library(ggplot2)
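
The install and load steps can be combined so that a package is downloaded only when it is missing. The following is a minimal sketch; ensure_package is a hypothetical helper name, not a built-in R function, and the base package "stats" is used so that nothing needs to be downloaded.

```r
# ensure_package() is a hypothetical helper, not part of R itself.
ensure_package <- function(pkg) {
  # Install only if the package is not already available
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Attach the package; character.only = TRUE lets library() accept a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

ensure_package("stats")   # ships with R, so no download happens

# Common package-management calls
packageVersion("stats")                  # version of an installed package
head(rownames(installed.packages()))     # names of some installed packages
# update.packages(ask = FALSE)           # update installed packages (needs network access)
# remove.packages("ggplot2")             # uninstall a package
```

Replacing "stats" with "ggplot2" would install ggplot2 on first use and simply load it afterwards.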

Experiment No: 2

Aim: Understanding of nuts and bolts of R:


a. R program Structure
b. R Data Type, Command Syntax and Control Structures
c. File Operations in R

Data Description

a. R Program Structure: An R program consists of a series of commands that are executed sequentially. These commands can be typed directly into the R console or saved in a script file with a .R extension.

b. R Data Types, Command Syntax, and Control Structures: R supports various data types, including numeric, character, logical, factor, and more. Here's a quick overview:
Numeric: Used for storing numeric values (integers or decimals).
Character: Used for storing text data.
Logical: Represents the binary values TRUE or FALSE.
Factor: Represents categorical data with levels or categories.

c. File Operations in R: R provides functions to read and write files in formats such as plain text, CSV, and Excel.

R Code

a. R Program Structure:

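# Load the packages the script needs (stats is only an example)
library(stats)

# Define a function before it is called
my_function <- function(arg1, arg2) {
  result <- arg1 + arg2   # example body: add the two arguments
  return(result)
}

# Example inputs
value1 <- 3
value2 <- 7

# Call the function and print its result
result <- my_function(value1, value2)
print(result)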

b. R Data Types, Command Syntax, and Control Structures:

x <- 5
name <- "John"
is_valid <- TRUE
sum_result <- 3 + 7
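
The assignments above show data types only. The sketch below, using illustrative variable names, adds a factor, checks each type with class(), and shows the three basic control structures (if/else, for, while).

```r
x <- 5                                     # numeric
name <- "John"                             # character
is_valid <- TRUE                           # logical
grade <- factor(c("low", "high", "low"))   # factor with two levels

# class() reports the data type of each value
types <- c(class(x), class(name), class(is_valid), class(grade))
print(types)

# if / else: choose a label based on a condition
if (x > 3) {
  size <- "big"
} else {
  size <- "small"
}

# for loop: sum the integers 1 to x
total <- 0
for (i in 1:x) {
  total <- total + i
}

# while loop: count how many halvings bring x below 1
halvings <- 0
value <- x
while (value >= 1) {
  value <- value / 2
  halvings <- halvings + 1
}

print(size)
print(total)
print(halvings)
```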

c. File Operations in R:
Reading files
# Reading text files
data <- read.table("data.txt", header = TRUE)

# Reading CSV files


data <- read.csv("data.csv")

# Reading Excel files (requires 'readxl' package)


library(readxl)
data <- read_excel("data.xlsx")

Writing files
# Writing data to text file
write.table(data, "output.txt", sep = "\t", row.names = FALSE)

# Writing data to CSV file


write.csv(data, "output.csv", row.names = FALSE)

# Writing data to Excel file (requires 'openxlsx' package)


library(openxlsx)
write.xlsx(data, "output.xlsx")
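
The read and write calls above assume that files such as data.csv already exist. As a self-contained sketch, the following writes a small hypothetical data frame to a temporary CSV file, reads it back, and checks that the values survive the round trip. Only base R functions are used.

```r
# Hypothetical example data
df <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

# tempfile() returns a path inside the session's temporary directory
path <- tempfile(fileext = ".csv")

# Write, then confirm the file exists
write.csv(df, path, row.names = FALSE)
file_was_written <- file.exists(path)

# Read the file back and compare with the original
df_back <- read.csv(path)
same_values <- isTRUE(all.equal(df, df_back))

# readLines() shows the raw text: one header line plus one line per row
raw_lines <- readLines(path)
print(raw_lines)

# Clean up the temporary file
unlink(path)
```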

Experiment No: 3

Aim: Excel and R integration with R connector.

Data Description:
In this example, the CSV file has two columns:
experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.
Sample rows and columns


R Code
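> # read.csv() is part of base R, so no extra package has to be installed
> # file.choose() opens a file picker; header = TRUE treats the first row as column names
> Salary_Dataset <- read.csv(file.choose(), header = TRUE)
> Salary_Dataset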
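
The listing above reads a CSV export. For a direct Excel round trip, one option is the openxlsx and readxl packages already used in Experiment 2. The sketch below assumes both are installed and skips the Excel step otherwise; the salary values are hypothetical.

```r
# Hypothetical rows in the two-column layout described above
salary_data <- data.frame(
  experience_years = c(1, 3, 5),
  salary = c(40000, 52000, 66000)
)

# Check for the two Excel packages without attaching them
has_excel_pkgs <- requireNamespace("openxlsx", quietly = TRUE) &&
  requireNamespace("readxl", quietly = TRUE)

if (has_excel_pkgs) {
  xlsx_path <- tempfile(fileext = ".xlsx")
  openxlsx::write.xlsx(salary_data, xlsx_path)                 # R -> Excel
  from_excel <- as.data.frame(readxl::read_excel(xlsx_path))   # Excel -> R
  print(from_excel)
} else {
  message("Install 'openxlsx' and 'readxl' to run the Excel round trip.")
}
```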

Sample Input and Output


Experiment No: 4

Aim: Preparing Data in R


a. Data Cleaning
b. Data imputation
c. Data conversion

Data Description

In this example, the CSV file has two columns:


experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.

Sample rows and columns


R Code
# Load libraries
library(dplyr)
library(missForest)

# Read dataset
data <- read.csv("data.csv")

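# Data Cleaning: drop duplicate rows and an unneeded column
# (Irrelevant_Column is a placeholder name)
cleaned_data <- data %>%
  distinct() %>%
  select(-Irrelevant_Column)

# Data Conversion: missForest expects numeric or factor columns,
# so convert text categories first (Categorical_Column is a placeholder name)
cleaned_data$Categorical_Column <- as.factor(cleaned_data$Categorical_Column)

# Check for missing values
missing_values <- sum(is.na(cleaned_data))

if (missing_values > 0) {
  # Data Imputation: missForest() returns a list; the imputed data is in $ximp
  imputed_data <- missForest(cleaned_data, verbose = TRUE)$ximp
} else {
  imputed_data <- cleaned_data
}

# Display prepared dataset
print(imputed_data)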
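
For a small two-column dataset, a simpler alternative to missForest is median imputation in base R. The sketch below is an illustration with hypothetical values, not the lab's dataset; the "level" column and its cut points are assumptions made only to demonstrate a conversion step.

```r
# Hypothetical data with one duplicated row and one missing salary
raw <- data.frame(
  experience_years = c(1, 2, 2, 4, 5),
  salary = c(40000, 45000, 45000, NA, 60000)
)

# Data cleaning: drop exact duplicate rows
cleaned <- raw[!duplicated(raw), ]

# Data imputation: replace missing salaries with the median of the observed ones
median_salary <- median(cleaned$salary, na.rm = TRUE)
cleaned$salary[is.na(cleaned$salary)] <- median_salary

# Data conversion: bin experience into a categorical (factor) column
cleaned$level <- cut(cleaned$experience_years,
                     breaks = c(0, 2, Inf),
                     labels = c("junior", "senior"))

print(cleaned)
```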

Sample Input and Output


Experiment No: 5

Aim: Outliers detection using R

Data Description
In this example, the CSV file has two columns:
experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.

Sample rows and columns


R Code
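
A minimal sketch of outlier detection with the interquartile-range (IQR) rule, assuming hypothetical salary values rather than the lab's dataset. Values more than 1.5 times the IQR below the first quartile or above the third quartile are flagged.

```r
# Hypothetical salaries; the last value is deliberately extreme
salary <- c(40000, 42000, 45000, 47000, 50000, 52000, 150000)

# First and third quartiles
q <- unname(quantile(salary, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences at 1.5 * IQR beyond the quartiles
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Flag and extract values outside the fences
is_outlier <- salary < lower | salary > upper
outliers <- salary[is_outlier]
print(outliers)

# boxplot.stats() applies a closely related rule and is what boxplot() draws
box_outliers <- boxplot.stats(salary)$out
print(box_outliers)
```

Calling boxplot(salary) would show the same flagged point as an isolated dot above the upper whisker.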

Sample Input and Output

Experiment No: 6

Aim: Correlation and Regression Analysis in R

Data Description

In this example, the CSV file has two columns:


experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.


Sample rows and columns

R Code
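
A minimal sketch of correlation and simple linear regression in base R, assuming hypothetical values for the two columns described above (experience_years and salary).

```r
# Hypothetical data in the described two-column layout
salary_data <- data.frame(
  experience_years = c(1, 2, 3, 4, 5),
  salary = c(39000, 46000, 50000, 56000, 59000)
)

# Pearson correlation between experience and salary
r <- cor(salary_data$experience_years, salary_data$salary)
print(r)

# Linear regression: salary as a function of experience
model <- lm(salary ~ experience_years, data = salary_data)
print(coef(model))     # intercept and slope
print(summary(model))  # standard errors, R-squared, p-values

# Predict the salary for a new experience value
predicted <- predict(model, newdata = data.frame(experience_years = 6))
print(predicted)
```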


Experiment No: 7

Aim: Clustering Algorithms implementation using R

Data Description
In this example, the CSV file has two columns:
experience_years: This column represents the number of years of experience each person
has.
salary: This column contains the corresponding salary for each person based on their
experience.

Sample rows and columns


R Code
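
A minimal sketch of k-means clustering with the base R kmeans() function, assuming hypothetical values and two clusters. The columns are standardised with scale() first; otherwise the much larger salary values would dominate the distance calculation.

```r
set.seed(123)  # k-means starts from random centres, so fix the seed

# Hypothetical data: three junior and three senior employees
salary_data <- data.frame(
  experience_years = c(1, 2, 3, 10, 11, 12),
  salary = c(40000, 42000, 45000, 90000, 95000, 98000)
)

# Standardise both columns to mean 0 and standard deviation 1
scaled <- scale(salary_data)

# Fit k-means with 2 clusters; nstart = 10 keeps the best of 10 random starts
km <- kmeans(scaled, centers = 2, nstart = 10)

# Attach the cluster label to each row
salary_data$cluster <- km$cluster
print(salary_data)
```

The cluster numbers themselves (1 or 2) are arbitrary; what matters is which rows share a label.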


Sample Input and Output

Experiment No: 8

Aim: Classification Algorithm implementation using R


Classification (Spam/Not spam)

R Code

# Load required libraries


library(tm) # Text mining
library(e1071) # For Naive Bayes classifier
library(caret) # For model evaluation


# Load the SpamAssassin dataset (replace with your actual file path)
spam_data <- read.csv("path/to/spamassassin_data.csv", stringsAsFactors = FALSE)

# Preprocess the text data


corpus <- Corpus(VectorSource(spam_data$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
corpus <- tm_map(corpus, stripWhitespace)

# Create a document-term matrix


dtm <- DocumentTermMatrix(corpus)

# Convert the document-term matrix to a data frame


spam_df <- as.data.frame(as.matrix(dtm))
colnames(spam_df) <- make.names(colnames(spam_df))

# Combine with labels (stored as a factor, which confusionMatrix() requires)
spam_df$label <- as.factor(spam_data$label)

# Split data into training and testing sets


set.seed(123)
train_indices <- sample(1:nrow(spam_df), 0.7 * nrow(spam_df))
train_data <- spam_df[train_indices, ]
test_data <- spam_df[-train_indices, ]

# Train a Naive Bayes classifier


naive_bayes_model <- naiveBayes(label ~ ., data = train_data)

# Make predictions
predictions <- predict(naive_bayes_model, newdata = test_data, type = "class")

# Evaluate the model


conf_matrix <- confusionMatrix(predictions, test_data$label)
print(conf_matrix)

Sample Input and Output


Experiment No: 9

Aim: Case study on Stock Market Analysis and applications. Stock data can be obtained from Yahoo! Finance or Google Finance. A team of students can apply statistical modeling to the stock data to uncover hidden patterns. R provides tools for moving averages, autoregression and time-series analysis, which form the crux of financial applications.

Data Description

Stock data imported from Yahoo! Finance.


R Code
# Load required libraries
library(dplyr)
library(lubridate)
library(TTR)   # provides SMA()

# Read the stock data CSV file (or load data from API)
stock_data <- read.csv("stock_data.csv")

# Convert date column to Date format


stock_data$Date <- ymd(stock_data$Date)

# Calculate 50-day and 200-day moving averages


stock_data$MA_50 <- SMA(stock_data$Close, n = 50)
stock_data$MA_200 <- SMA(stock_data$Close, n = 200)

# Load required library


library(forecast)

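# Convert data to time series format
# (252 approximates the number of trading days per year; decompose() below
# needs at least two full periods of data)
stock_ts <- ts(stock_data$Close, frequency = 252)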

# Fit auto-regression model (ARIMA)


ar_model <- auto.arima(stock_ts)

# Load plotting library (forecast is already loaded above)
library(ggplot2)

# Decompose time series into trend, seasonal, and residual components


decomposed <- decompose(stock_ts)

# Plot decomposed components


plot(decomposed)

# Create a time series plot of stock prices and moving averages


ggplot(stock_data, aes(x = Date)) +
  geom_line(aes(y = Close, color = "Stock Price")) +
  geom_line(aes(y = MA_50, color = "50-day MA")) +
  geom_line(aes(y = MA_200, color = "200-day MA")) +
  labs(title = "Stock Price and Moving Averages", y = "Price") +
  scale_color_manual(values = c("Stock Price" = "blue", "50-day MA" = "red", "200-day MA" = "green"))
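
To make the moving-average step concrete, the sketch below computes a trailing simple moving average with base R on a few hypothetical closing prices. It calls stats::filter() explicitly because loading dplyr masks filter() with a different function. The 3-day window is chosen only to keep the example short.

```r
# Hypothetical closing prices
close <- c(10, 11, 12, 13, 14, 15)

# Window length (the listing above uses 50 and 200)
n <- 3

# sides = 1 averages the current value and the n - 1 values before it,
# so the first n - 1 results are NA
ma <- as.numeric(stats::filter(close, rep(1 / n, n), sides = 1))
print(ma)

# Flag the days on which the price closes above its moving average
above_ma <- close > ma
print(above_ma)
```

The leading NA values explain why the 50-day and 200-day lines in the plot start later than the price line.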

Sample Input and Output


Experiment No: 10

Aim: Detect fraudulent credit card transactions. The dataset can be obtained from Kaggle. The team will use a variety of machine learning algorithms to distinguish fraudulent transactions from non-fraudulent ones.

Data Description
The dataset was obtained from Kaggle

R Code
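# Load required library
library(randomForest)

# Load the credit card fraud dataset downloaded from Kaggle
# (replace with your actual file path)
CreditCardFraud <- read.csv("path/to/creditcard.csv")

# Store the target as a factor so randomForest performs classification
CreditCardFraud$Class <- as.factor(CreditCardFraud$Class)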

# Split data into training and testing sets (70% training, 30% testing)
set.seed(123)
train_indices <- sample(1:nrow(CreditCardFraud), 0.7 * nrow(CreditCardFraud))
train_data <- CreditCardFraud[train_indices, ]
test_data <- CreditCardFraud[-train_indices, ]

# Build Random Forest model


rf_model <- randomForest(Class ~ ., data = train_data, ntree = 100)

# Make predictions
predictions <- predict(rf_model, newdata = test_data)


# Calculate accuracy
accuracy <- sum(predictions == test_data$Class) / nrow(test_data)
print(paste("Accuracy score on Test Data:", accuracy))
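
Fraud datasets are typically highly imbalanced, so accuracy alone can look high even when a model misses fraud cases. The sketch below computes precision and recall from a confusion table in base R, assuming hypothetical labels in which 1 marks fraud; with the model above, the same table would be built from predictions and test_data$Class.

```r
# Hypothetical labels: 8 legitimate (0) and 2 fraudulent (1) transactions
actual    <- factor(c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1), levels = c(0, 1))
predicted <- factor(c(0, 0, 0, 0, 0, 0, 0, 1, 1, 0), levels = c(0, 1))

# Confusion table: rows are predictions, columns are true classes
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

tp <- as.numeric(cm["1", "1"])   # fraud correctly flagged
fp <- as.numeric(cm["1", "0"])   # legitimate flagged as fraud
fn <- as.numeric(cm["0", "1"])   # fraud that was missed

accuracy  <- sum(diag(cm)) / sum(cm)
precision <- tp / (tp + fp)      # share of flagged transactions that are fraud
recall    <- tp / (tp + fn)      # share of fraud cases that were flagged

print(c(accuracy = accuracy, precision = precision, recall = recall))
```

Here accuracy is 0.8 even though only half of the fraud cases are caught, which is why recall is usually reported alongside it.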

Sample Input and Output
