
Data Analytics Lab Manual Using R Programming

The document is a lab manual for a Data Analytics course at Vidya Jyothi Institute of Technology, detailing various programming tasks and techniques in R for data preprocessing, regression analysis, and machine learning. It includes a list of programs to be implemented, such as handling missing values, linear and logistic regression, decision trees, and random forests, along with instructions for installing R and RStudio. The manual serves as a practical guide for students to apply data analytics concepts using R programming.

Uploaded by

mbsailajanawin



Data Analytics Lab Manual (R22) B.Tech. CSE(AI ML) II Sem.

Data analytics lab (Vidya Jyothi Institute of Technology)


Studocu is not sponsored or endorsed by any college or university

Downloaded by MB Sailaja ([email protected])

Department of Computer Science & Engineering

(Artificial Intelligence & Machine Learning)

Data Analytics Lab

Regulation: (R22)

III B.Tech. - Semester - II

LAB MANUAL


Data Analytics Lab

B.Tech. CSE(AI & ML) II Sem.

Course Code:

L T P C
0 0 2 1

List of Programs:

1. Data Preprocessing
   a. Handling missing values
   b. Noise detection and removal
   c. Identifying data redundancy and elimination
2. Implement any one imputation model
3. Implement Linear Regression
4. Implement Logistic Regression
5. Implement Decision Tree Induction for classification
6. Implement Random Forest Classifier
7. Implement ARIMA on Time Series data
8. Object segmentation using hierarchical based methods
9. Perform Visualization techniques (types of charts: Bar, Column, Line, Scatter, 3D Cubes, etc.)
10. Perform Descriptive analytics on healthcare data
11. Perform Predictive analytics on Product Sales data
12. Apply Predictive analytics for Weather forecasting.

S.No.  Name of the Program

1   Introduction to R Programming
2   Installation of RStudio on Windows
3   Data Preprocessing: a. Handling missing values, b. Noise detection and removal, c. Identifying data redundancy and elimination
4   Implement any one imputation model
5   Implement Linear Regression
6   Implement Logistic Regression
7   Implement Decision Tree Induction for classification
8   Implement Random Forest Classifier
9   Implement ARIMA on Time Series data
10  Object segmentation using hierarchical based methods
11  Perform Visualization techniques (types of charts: Bar, Column, Line, Scatter, 3D Cubes, etc.)
12  Perform Descriptive analytics on healthcare data
13  Perform Predictive analytics on Product Sales data
14  Apply Predictive analytics for Weather forecasting

Introduction to R programming:
R is a programming language and free software environment developed by Ross Ihaka and Robert Gentleman in 1993. R provides an extensive catalog of statistical and graphical methods, including machine learning algorithms, linear regression, time series analysis, and statistical inference, to name a few. Most R libraries are written in R, but computationally heavy tasks are usually implemented in C, C++, or Fortran. R is trusted not only by academics: many large companies, including Uber, Google, Airbnb, and Facebook, also use it.

Data analysis with R is done in a series of steps: programming, transforming, discovering, modeling, and communicating the results.

Program: R is a clear and accessible programming tool.

Transform: R is made up of a collection of libraries designed specifically for data science.

Discover: Investigate the data, refine your hypotheses, and analyze them.

Model: R provides a wide array of tools to capture the right model for your data.

Communicate: Integrate code, graphs, and outputs into a report with R Markdown, or build Shiny apps to share with the world.
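The five steps above can be sketched in a few lines of R. This is a minimal illustration on the built-in `mtcars` dataset, which is an assumption chosen only because it ships with R; the km-per-litre conversion and the variable names are illustrative, not part of the lab programs.

```r
# Minimal sketch of the program -> transform -> discover -> model -> communicate
# workflow on the built-in mtcars dataset (chosen only for illustration).
data(mtcars)

# Transform: derive fuel efficiency in km per litre (1 US mpg is about 0.4251 km/L)
cars <- transform(mtcars, kpl = mpg * 0.4251)

# Discover: summary statistics and the correlation between efficiency and weight
print(summary(cars[, c("kpl", "wt")]))
r <- cor(cars$kpl, cars$wt)

# Model: simple linear regression of efficiency on weight (wt is in 1000 lbs)
fit <- lm(kpl ~ wt, data = cars)

# Communicate: report the key numbers in plain language
cat(sprintf("Correlation between km/L and weight: %.2f\n", r))
cat(sprintf("Each extra 1000 lbs changes efficiency by %.2f km/L\n", coef(fit)[["wt"]]))
```

In a real report, the "communicate" step would typically move into an R Markdown document rather than `cat()` calls.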

What is R used for?

Statistical inference
Data analysis
Machine learning algorithms

Installation of RStudio on Windows:

Step 1: RStudio requires R, so install base R first. With R installed, go to the RStudio download page and click the download button for RStudio Desktop.

Step 2: Click the link for the Windows version of RStudio and save the .exe file.

Step 3: Run the .exe file and follow the installation instructions:

a. Click Next on the welcome window.

b. Enter or browse to the path of the installation folder and click Next to proceed.

c. Select the folder for the Start menu shortcut, or click "Do not create shortcuts", and then click Next.

d. Wait for the installation process to complete.

e. Click Finish to end the installation.

Install the R Packages:-

In RStudio, if you require a particular library, follow these steps:

- First, run RStudio.

- Click the Packages tab, then click Install. The Install Packages dialog box appears.
- In the Install Packages dialog, type the name of the package you want to install in the Packages field and click Install. This installs the package you entered, or shows a list of matching packages based on the text you typed.

Installing Packages:-
The most common place to get packages from is CRAN. To install a package from CRAN, use install.packages("package name"). For instance, to install the ggplot2 package, a very popular visualization package, type the following in the console:
Syntax:-
# install a package from CRAN
install.packages("ggplot2")

Loading Packages:-
Once the package is installed on your computer, you can access the functions and resources it provides in two different ways:
# 1. load the package for use in the current R session
library(packagename)
# 2. call a single function from the package without loading it
packagename::functionname()
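Installing and loading are often combined so that a script runs on a fresh machine. The sketch below is one common pattern, assuming a CRAN mirror is reachable when installation is needed; `ensure_package()` is a hypothetical helper name, not a base R or CRAN function.

```r
# Minimal sketch: install a package only when it is missing, then load it.
# ensure_package() is a hypothetical helper, not part of base R or any package.
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE (instead of failing) when pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept a package name stored in a variable
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so nothing is downloaded here;
# ensure_package("ggplot2") would install ggplot2 from CRAN on first use.
ensure_package("stats")
```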

Getting Help on Packages:-

For more direct help on packages that are installed on your computer, use the help and vignette functions. For example, to get help on the ggplot2 package:

help(package = "ggplot2")     # details about the contents of a package
vignette(package = "ggplot2") # list the vignettes available for a specific package
vignette("ggplot2-specs")     # view a specific vignette
vignette()                    # view all vignettes on your computer


Data Analytics Lab Manual

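1 Data Preprocessing
a. Handling missing values
b. Noise detection and removal
c. Identifying data redundancy and elimination

The snippets below assume a data frame named `data` is already loaded; `column_with_missing` and `numeric_column` are placeholders for your own column names.

a. Handling Missing Values:
# Option 1: remove rows that contain any missing value
data_complete <- na.omit(data)
# Option 2: impute missing values in one column with the column mean
data$column_with_missing <- ifelse(is.na(data$column_with_missing),
                                   mean(data$column_with_missing, na.rm = TRUE),
                                   data$column_with_missing)
b. Noise Detection and Removal:
# Flag values more than 3 standard deviations from the mean (z-score rule)
z_scores <- as.numeric(scale(data$numeric_column))
is_outlier <- !is.na(z_scores) & abs(z_scores) > 3 # Adjust the threshold as needed
# Logical indexing keeps every row when no outliers are found;
# data[-which(...), ] would return zero rows in that case
cleaned_data <- data[!is_outlier, , drop = FALSE]
c. Identifying Data Redundancy and Elimination:
library(caret) # provides findCorrelation()
# Remove duplicate rows
unique_data <- unique(data)
# Remove highly correlated variables (cor() needs numeric columns only)
numeric_data <- unique_data[, sapply(unique_data, is.numeric), drop = FALSE]
cor_matrix <- cor(numeric_data, use = "pairwise.complete.obs")
# names = TRUE returns column names to drop; adjust the cutoff as needed
high_correlation <- findCorrelation(cor_matrix, cutoff = 0.9, names = TRUE)
cleaned_data <- unique_data[, !(names(unique_data) %in% high_correlation), drop = FALSE]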
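The noise and redundancy steps can be checked on a tiny made-up data frame. This sketch uses base R only; `remove_outliers_z()` is a hypothetical helper name, and the toy values are chosen so that exactly one point exceeds the 3-standard-deviation rule. It also shows why negative indexing with an empty `which()` result is risky.

```r
# Sketch: z-score outlier removal on toy data. remove_outliers_z() is hypothetical.
remove_outliers_z <- function(df, col, threshold = 3) {
  z <- as.numeric(scale(df[[col]]))        # z-scores; missing values stay NA
  keep <- is.na(z) | abs(z) <= threshold   # keep non-outliers (and rows with NA)
  df[keep, , drop = FALSE]
}

toy <- data.frame(value = c(1:19, 100))    # 100 lies about 4.1 SDs above the mean
cleaned <- remove_outliers_z(toy, "value")
cat("Rows before:", nrow(toy), " after:", nrow(cleaned), "\n")

# Pitfall: when nothing is flagged, negative indexing with an empty vector drops every row
no_outliers <- data.frame(value = 1:10)
flagged <- which(abs(scale(no_outliers$value)) > 3)   # integer(0)
cat("Rows kept by data[-flagged, ]:", nrow(no_outliers[-flagged, , drop = FALSE]), "\n")
cat("Rows kept by the helper:", nrow(remove_outliers_z(no_outliers, "value")), "\n")

# Duplicate detection: duplicated() flags rows that repeat an earlier row
dup_demo <- data.frame(id = c(1, 1, 2), city = c("A", "A", "B"))
print(duplicated(dup_demo))   # FALSE TRUE FALSE
```

Note that with very small samples the z-score rule cannot fire at all: for 10 values, no |z| can exceed about 2.85, so a smaller threshold or an IQR-based rule may suit small datasets better.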


2 Write a program to Implement any one imputation model

# Load required libraries
library(dplyr)
# Generate sample data
set.seed(123)
data <- data.frame(
  id = 1:10,
  age = sample(20:60, 10, replace = TRUE),
  height = sample(150:200, 10, replace = TRUE),
  weight = sample(50:100, 10, replace = TRUE)
)
# Insert missing values at fixed positions. Sampling NA from a long range such
# as c(20:60, NA) often produces no missing values at all.
data$age[c(2, 7)] <- NA
data$height[4] <- NA
data$weight[c(5, 9)] <- NA
# Print original data
cat("Original data:\n")
print(data)
# Function to impute missing values using mean
mean_imputation <- function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
  }
  return(x)
}
# Apply mean imputation to each numeric column
data_imputed <- data %>%
  mutate(across(where(is.numeric), mean_imputation))
# Print imputed data
cat("\nImputed data using mean imputation:\n")
print(data_imputed)
In this program:


1. We generate some sample data with missing values.

2. We define a function `mean_imputation` that takes a numeric vector, replaces
missing values with the mean of non-missing values, and returns the imputed
vector.
3. We use the `mutate` function from the `dplyr` package to apply the
`mean_imputation` function to each numeric column of the dataset.
4. The resulting dataset `data_imputed` contains the original data with missing
values replaced by the mean of their respective columns.
5. Finally, we print both the original and imputed datasets for comparison.
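Mean imputation is sensitive to extreme values. A common variant of the same pattern fills gaps with the column median instead; the sketch below is an illustration in base R (no dplyr), and `median_imputation()` is a hypothetical helper name mirroring `mean_imputation` above.

```r
# Sketch: median imputation, a variant of the mean-imputation pattern above.
# median_imputation() is a hypothetical helper name; base R only.
median_imputation <- function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
  }
  x
}

weights <- c(52, NA, 58, 250)            # 250 is an extreme value
print(median_imputation(weights))        # NA -> 58 (median of 52, 58, 250)
print(mean(weights, na.rm = TRUE))       # the mean would have filled in 120

# Apply to every column of a data frame without dplyr
df <- data.frame(age = c(30, NA, 50), city = c("X", "Y", NA))
df[] <- lapply(df, median_imputation)    # non-numeric columns are returned unchanged
print(df)
```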

3 Write a program to Implement Linear Regression using R

# Load required libraries
library(ggplot2)
# Generate sample data
set.seed(123)
n <- 100
x <- seq(1, 10, length.out = n)
y <- 3 * x + rnorm(n, mean = 0, sd = 2) # Simulated linear relationship with noise
data <- data.frame(x = x, y = y)
# Visualize the data (print() renders the plot even when the script is source()d)
p_data <- ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  labs(title = "Sample Data for Linear Regression",
       x = "X",
       y = "Y")
print(p_data)
# Fit linear regression model
lm_model <- lm(y ~ x, data = data)
# Print summary of the model
print(summary(lm_model))
# Plot the regression line
p_fit <- ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  labs(title = "Linear Regression",
       x = "X",
       y = "Y")
print(p_fit)
# Predict using the model
new_x <- 11
predicted_y <- predict(lm_model, newdata = data.frame(x = new_x))
cat("Predicted value for x =", new_x, ":", predicted_y, "\n")
In this program:
1. We generate some sample data with a linear relationship between `x` and `y`,
adding some random noise.
2. We visualize the sample data using a scatter plot.
3. We fit a linear regression model using the `lm()` function, specifying the
formula `y ~ x`.
4. We print a summary of the fitted model using the `summary()` function.
5. We plot the original data points along with the fitted regression line using
`geom_smooth()` in ggplot2.
6. We demonstrate how to make predictions using the fitted model for a new
value of `x`.
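For a single predictor, the least-squares estimates that `lm()` reports have a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below regenerates the same simulated data and checks the hand computation against `lm()`.

```r
# Sketch: closed-form least-squares estimates for simple linear regression,
# checked against lm(). Same simulated data as the program above.
set.seed(123)
n <- 100
x <- seq(1, 10, length.out = n)
y <- 3 * x + rnorm(n, mean = 0, sd = 2)

slope <- cov(x, y) / var(x)                # beta1 = Sxy / Sxx
intercept <- mean(y) - slope * mean(x)     # beta0 = ybar - beta1 * xbar

fit <- lm(y ~ x)
print(c(intercept = intercept, slope = slope))
print(coef(fit))                           # should match the two values above
```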


4 Write a Program to Implement Logistic Regression using R

# Load required libraries
library(ggplot2)
# Generate sample data
set.seed(123)
n <- 100
x <- seq(-5, 5, length.out = n)
linear_combination <- -2 + 0.5 * x # Linear combination of features
probabilities <- 1 / (1 + exp(-linear_combination)) # Sigmoid function
y <- rbinom(n, 1, probabilities) # Simulated binary outcome (0 or 1)
data <- data.frame(x = x, y = y)
# Visualize the data
p <- ggplot(data, aes(x = x, y = factor(y))) +
  geom_point() +
  labs(title = "Sample Data for Logistic Regression",
       x = "X",
       y = "Y")
print(p)
# Fit logistic regression model
logit_model <- glm(y ~ x, data = data, family = binomial)
# Print summary of the model
print(summary(logit_model))
# Plot the logistic regression curve.
# curve(add = TRUE) draws on an existing base-R plot, so create one first
# (it cannot add to the ggplot above).
logistic_curve <- function(z) {
  return(1 / (1 + exp(-z)))
}
plot(data$x, data$y, pch = 19, xlab = "X", ylab = "P(Y = 1)",
     main = "Fitted Logistic Regression Curve")
curve(logistic_curve(coef(logit_model)[1] + coef(logit_model)[2] * x),
      from = min(data$x), to = max(data$x),
      col = "red", lwd = 2, add = TRUE)
# Predict using the model
new_x <- 1
predicted_probability <- predict(logit_model, newdata = data.frame(x = new_x),
                                 type = "response")
cat("Predicted probability for x =", new_x, ":", predicted_probability, "\n")
In this program:
1. We generate some sample data with a binary outcome variable `y` based on a
linear combination of the feature `x`.
2. We visualize the sample data using a scatter plot.
3. We fit a logistic regression model using the `glm()` function with `family =
binomial`.
4. We print a summary of the fitted model using the `summary()` function.
5. We plot the logistic regression curve using the coefficients obtained from the
fitted model.
6. We demonstrate how to make predictions using the fitted model for a new
value of `x`.
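What `predict(..., type = "response")` returns can be reproduced by hand: it is the sigmoid of the linear predictor, p = 1 / (1 + exp(-(b0 + b1 * x))). The sketch below refits the same kind of model and compares both; `plogis()` is base R's logistic (sigmoid) function, used here as a shorthand.

```r
# Sketch: a logistic regression prediction is the sigmoid of the linear predictor.
set.seed(123)
n <- 100
x <- seq(-5, 5, length.out = n)
y <- rbinom(n, 1, plogis(-2 + 0.5 * x))   # plogis() is base R's sigmoid
fit <- glm(y ~ x, family = binomial)

b <- coef(fit)
new_x <- 1
eta <- b[["(Intercept)"]] + b[["x"]] * new_x   # linear predictor (log-odds)
p_manual <- 1 / (1 + exp(-eta))               # sigmoid computed by hand
p_predict <- predict(fit, newdata = data.frame(x = new_x), type = "response")
cat("By hand:", p_manual, " predict():", unname(p_predict), "\n")
# exp(b1) is the odds ratio for a one-unit increase in x
cat("Odds ratio per unit of x:", exp(b[["x"]]), "\n")
```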


5 Write a program to Implement Decision Tree Induction for classification using R

# Load required libraries
library(rpart)
library(rpart.plot)
# Generate sample data
set.seed(123)
n <- 100
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
# Simulated classification outcome, stored as a factor for classification
y <- factor(ifelse((x1 + x2) > 10, "A", "B"))
data <- data.frame(x1 = x1, x2 = x2, y = y)
# Visualize the data
plot(data$x1, data$x2, col = ifelse(data$y == "A", "red", "blue"),
     pch = 19, xlab = "X1", ylab = "X2",
     main = "Sample Data for Decision Tree Classification")
# Fit decision tree model
tree_model <- rpart(y ~ x1 + x2, data = data, method = "class")
# Visualize the decision tree
rpart.plot(tree_model, main = "Decision Tree for Classification")
# Predict using the model
new_data <- data.frame(x1 = c(3, 7), x2 = c(8, 2))
predicted_classes <- predict(tree_model, newdata = new_data, type = "class")
# cat() prints a factor's integer codes, so convert to character first
cat("Predicted classes for new data:", as.character(predicted_classes), "\n")
In this program:
1. We generate some sample data with two input features `x1` and `x2`, and a binary
classification outcome `y`.
2. We visualize the sample data using a scatter plot.
3. We fit a decision tree model using the `rpart()` function from the `rpart` package,
specifying the formula `y ~ x1 + x2` for classification.
4. We visualize the resulting decision tree using the `rpart.plot()` function.
5. We demonstrate how to make predictions using the fitted model for new data.
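For classification, `rpart()` chooses splits by Gini impurity by default. The sketch below computes it by hand on four labelled points so the numbers can be checked; `gini()` and `split_gini()` are hypothetical helper names, and a real tree evaluates every candidate threshold for every feature rather than the two shown.

```r
# Sketch: Gini impurity, the default split criterion rpart uses for classification.
# gini() and split_gini() are hypothetical helpers written for illustration.
gini <- function(labels) {
  p <- table(labels) / length(labels)   # class proportions
  1 - sum(p^2)
}

# Weighted impurity of a candidate split "feature <= threshold"
split_gini <- function(feature, labels, threshold) {
  left <- labels[feature <= threshold]
  right <- labels[feature > threshold]
  (length(left) * gini(left) + length(right) * gini(right)) / length(labels)
}

labels <- c("A", "A", "B", "B")
feature <- c(1, 2, 8, 9)
print(gini(labels))                      # 0.5: a perfectly mixed node
print(split_gini(feature, labels, 5))    # 0: this split separates the classes
print(split_gini(feature, labels, 1.5))  # 1/3: a worse split, higher impurity
```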

6 Write a program to Implement Random Forest Classifier using R

# Load required library
library(randomForest)
# Generate sample data
set.seed(123)
n <- 100
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
# randomForest() performs classification only when the response is a factor
y <- factor(ifelse((x1 + x2) > 10, "A", "B"))
data <- data.frame(x1 = x1, x2 = x2, y = y)
# Fit Random Forest model
rf_model <- randomForest(y ~ x1 + x2, data = data, ntree = 100)
# Print model details (includes the out-of-bag error estimate and confusion matrix)
print(rf_model)
# Plot variable importance
varImpPlot(rf_model, main = "Variable Importance Plot")
# Predict using the model
new_data <- data.frame(x1 = c(3, 7), x2 = c(8, 2))
predicted_classes <- predict(rf_model, newdata = new_data)
# cat() prints a factor's integer codes, so convert to character first
cat("Predicted classes for new data:", as.character(predicted_classes), "\n")
In this program:
1. We generate some sample data with two input features `x1` and `x2`, and a
binary classification outcome `y`.
2. We fit a Random Forest model using the `randomForest()` function from the
`randomForest` package, specifying the formula `y ~ x1 + x2` for classification
and the number of trees (`ntree`) as 100.
3. We print the details of the fitted Random Forest model.
4. We plot the variable importance using the `varImpPlot()` function.
5. We demonstrate how to make predictions using the fitted model for new data.
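The out-of-bag (OOB) error that `print(rf_model)` reports comes from bootstrap sampling: each tree is grown on rows drawn with replacement, and the rows it never saw are "out-of-bag" for that tree. This base-R sketch only illustrates the sampling and the majority vote; it is not how the `randomForest` package is implemented internally.

```r
# Sketch: bootstrap samples leave about 36.8% of rows out-of-bag (OOB). Base R only.
set.seed(123)
n <- 1000
in_bag_fraction <- replicate(200, {
  idx <- sample(n, n, replace = TRUE)   # one bootstrap sample of row indices
  length(unique(idx)) / n               # share of distinct rows that were drawn
})
cat("Average share of rows in a bootstrap sample:", mean(in_bag_fraction), "\n")
cat("Theoretical limit 1 - 1/e:", 1 - exp(-1), "\n")   # about 0.632

# Majority vote: the forest predicts the class chosen by most trees
tree_votes <- c("A", "B", "A", "A", "B")
print(names(which.max(table(tree_votes))))   # "A"
```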


7 Write a program to Implement ARIMA on Time Series data using R

# Load required library
library(forecast)
# Generate sample time series data
set.seed(123)
n <- 100
ts_data <- ts(rnorm(n, mean = 0, sd = 1), start = 1, frequency = 1)
# Plot the sample time series data
plot(ts_data, main = "Sample Time Series Data", xlab = "Time", ylab = "Value")
# Fit ARIMA model
arima_model <- auto.arima(ts_data)
# Print model details
print(arima_model)
# Plot the forecast
plot(forecast(arima_model), main = "Forecast using ARIMA")
In this program:
1. We generate some sample time series data using the `ts()` function.
2. We plot the sample time series data.
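3. We fit an ARIMA model to the time series data using the `auto.arima()` function from the `forecast` package. This function searches over candidate orders (p, d, q) and returns the model with the lowest information criterion (AICc by default). Because the sample series is pure random noise, the selected model is likely to be ARIMA(0,0,0), that is, a constant mean.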
4. We print the details of the fitted ARIMA model.
5. We plot the forecasted values using the `forecast()` function.
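Since white noise has no structure for ARIMA to capture, it can help to fit a series with known dynamics. This sketch simulates an AR(1) process with coefficient 0.7 (a value chosen only for illustration) and fits it with base R's `arima()`; the estimate should land close to 0.7, and the forecast decays toward the estimated mean.

```r
# Sketch: fit an ARIMA(1,0,0) to a simulated AR(1) series with base R's arima().
set.seed(123)
ar1_series <- arima.sim(model = list(ar = 0.7), n = 500)
fit <- arima(ar1_series, order = c(1, 0, 0))   # ARIMA(p = 1, d = 0, q = 0)
print(fit)
phi_hat <- coef(fit)[["ar1"]]
cat("Estimated AR(1) coefficient:", round(phi_hat, 3), "\n")

# 10-step-ahead forecast; it decays toward the estimated mean
fc <- predict(fit, n.ahead = 10)
print(round(fc$pred, 3))
```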


8 Write a program to implement Object segmentation using hierarchical based methods using R

# Load required library (EBImage is distributed through Bioconductor:
# install it with BiocManager::install("EBImage"))
library(EBImage)
# Read the image
img <- readImage("image.jpg") # Replace "image.jpg" with your image file path
# Convert the image to grayscale
gray_img <- channel(img, "gray")
# Downscale the image: dist() stores n*(n-1)/2 pairwise distances, which is
# infeasible at full resolution (a 256 x 256 image gives about 2.1e9 pairs)
small_img <- resize(gray_img, w = 64)
# Normalize the grayscale image to the range [0, 1]
normalized_img <- normalize(small_img)
# Flatten the image to create a vector of pixel values (column-major order)
flattened_img <- as.vector(normalized_img)
# Perform hierarchical clustering on the pixel intensities
hc <- hclust(dist(flattened_img), method = "ward.D2")
# Cut the dendrogram to create segments
num_segments <- 4 # Number of segments to create
segments <- cutree(hc, k = num_segments)
# Create a segmented image: cutree() returns labels in the original pixel order,
# so reshape them directly (reordering by the dendrogram would scramble pixels)
segmented_img <- Image(matrix(segments, nrow = dim(normalized_img)[1],
                              ncol = dim(normalized_img)[2]) / num_segments)
# Plot the original (left) and segmented (right) images side by side;
# method = "raster" draws in the R graphics device, so par(mfrow) applies
par(mfrow = c(1, 2))
display(normalized_img, method = "raster")
display(segmented_img, method = "raster")
In this program:
1. We load the `EBImage` package, which reads, transforms, and displays images.
2. We read the image using the `readImage()` function from the `EBImage` package.
3. We convert the image to grayscale, downscale it with `resize()` so that the pairwise distance matrix fits in memory, and normalize the pixel values to range between 0 and 1.

4. We flatten the image to create a vector of pixel values.

5. We perform hierarchical clustering (Ward's method) on the flattened image using the `hclust()` function. Because only intensities are clustered, pixels are grouped by brightness rather than by spatial position.
6. We cut the dendrogram into the specified number of segments using the `cutree()` function.
7. We create a segmented image by reshaping the segment labels back into the image's dimensions.
8. Finally, we plot both the original and segmented images using the `display()` function.
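The clustering step can be checked on a handful of hand-picked "pixel" intensities, where the right answer is obvious. This base-R sketch uses the same `hclust()`/`cutree()` calls as the program; the six values are made up for illustration.

```r
# Sketch: the clustering step on six hand-picked pixel intensities. Base R only.
intensities <- c(0.10, 0.12, 0.11, 0.90, 0.92, 0.88)   # three dark, three bright
hc <- hclust(dist(intensities), method = "ward.D2")
labels <- cutree(hc, k = 2)
print(labels)   # 1 1 1 2 2 2: dark pixels in one segment, bright pixels in the other

# Labels follow the original pixel order, so they can be reshaped into an image grid
print(matrix(labels, nrow = 2))
```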

9 Write a program to Perform Visualization techniques (types of charts - Bar, Column, Line, Scatter, 3D Cubes etc.)

# Load required libraries
library(ggplot2)
library(plotly)
# Sample data
set.seed(123)
data <- data.frame(
  x = 1:10,
  y1 = rnorm(10),
  y2 = rnorm(10),
  y3 = rnorm(10),
  y4 = rnorm(10),
  y5 = rnorm(10)
)
# Bar plot (horizontal bars via coord_flip())
bar_plot <- ggplot(data, aes(x = x, y = y1)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  coord_flip() +
  labs(title = "Bar Plot", x = "X", y = "Y")
# Column plot (vertical bars)
column_plot <- ggplot(data, aes(x = x, y = y2)) +
  geom_col(fill = "lightgreen") +
  labs(title = "Column Plot", x = "X", y = "Y")
# Line plot
line_plot <- ggplot(data, aes(x = x)) +
  geom_line(aes(y = y3), color = "orange") +
  labs(title = "Line Plot", x = "X", y = "Y")
# Scatter plot
scatter_plot <- ggplot(data, aes(x = y4, y = y5)) +
  geom_point(color = "red") +
  labs(title = "Scatter Plot", x = "Y4", y = "Y5")
# 3D scatter plot (plotly re-exports the %>% pipe)
scatter3d <- plot_ly(data, x = ~y4, y = ~y5, z = ~x, type = "scatter3d",
                     mode = "markers",
                     marker = list(color = "blue", size = 5)) %>%
  layout(scene = list(xaxis = list(title = "Y4"),
                      yaxis = list(title = "Y5"),
                      zaxis = list(title = "X")))
# Display plots
print(bar_plot)
print(column_plot)
print(line_plot)
print(scatter_plot)
print(scatter3d)
In this program:
1. We generate some sample data with six variables (`x`, `y1`, `y2`, `y3`, `y4`, `y5`).
2. We create different types of plots using the ggplot2 and plotly libraries: a horizontal bar plot, a vertical column plot, a line plot, a scatter plot, and a 3D scatter plot.
3. We customize each plot with appropriate titles and axis labels.
4. Finally, we display all of the plots with the `print()` function.
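When ggplot2 and plotly are not installed, the same basic chart types can be drawn with base R graphics. This sketch arranges them in a 2 x 2 grid and writes them to a PDF; the temporary file path is only for illustration, and the data are random values as in the program above.

```r
# Sketch: bar, column, line, and scatter charts with base R graphics only,
# arranged in a 2 x 2 grid and written to a PDF file.
set.seed(123)
x <- 1:10
y <- rnorm(10)
out_file <- tempfile(fileext = ".pdf")   # illustrative path; use your own file name

pdf(out_file, width = 8, height = 6)     # open a PDF graphics device
par(mfrow = c(2, 2))                     # 2 rows x 2 columns of panels
barplot(y, names.arg = x, horiz = TRUE, col = "skyblue", main = "Bar (horizontal)")
barplot(y, names.arg = x, col = "lightgreen", main = "Column (vertical)")
plot(x, y, type = "l", col = "orange", main = "Line")
plot(y, rnorm(10), pch = 19, col = "red", main = "Scatter")
invisible(dev.off())                     # close the device to finish the file

cat("Charts written to", out_file, "\n")
```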


10 Write a program to Perform Descriptive analytics on healthcare data using R

# Load required libraries
library(dplyr)
# Load sample healthcare data (replace with your dataset). The code below assumes
# numeric columns age, weight, height and categorical columns gender, diagnosis.
data <- read.csv("healthcare_data.csv") # Replace "healthcare_data.csv" with your dataset file path
# Display the structure of the dataset
str(data)
# Display summary statistics
print(summary(data))
# Check for missing values
missing_values <- colSums(is.na(data))
print("Missing Values:")
print(missing_values)
# Visualize distributions of key variables
hist(data$age, main = "Age Distribution", xlab = "Age")
boxplot(data$weight, main = "Weight Distribution")
boxplot(data$height, main = "Height Distribution")
# Analyze categorical variables
print(table(data$gender))
print(table(data$diagnosis))
# Grouped summary with dplyr: number of patients and mean age per diagnosis
data %>%
  group_by(diagnosis) %>%
  summarise(n = n(), mean_age = mean(age, na.rm = TRUE)) %>%
  print()
# Calculate correlations between numeric variables (complete rows only)
correlation_matrix <- cor(data[, c("age", "weight", "height")], use = "complete.obs")
print("Correlation Matrix:")
print(correlation_matrix)
# Create a scatterplot matrix
pairs(~age + weight + height, data = data, main = "Scatterplot Matrix")
In this program:
1. We load required libraries, such as `dplyr`, for data manipulation and analysis.
2. We read the healthcare data from a CSV file using the `read.csv()` function. Replace `"healthcare_data.csv"` with the path to your dataset.
3. We display the structure of the dataset using `str()` and summary statistics using `summary()`.
4. We check for missing values in the dataset.
5. We visualize the distributions of key variables (age, weight, height) using histograms and boxplots.
6. We analyze categorical variables (gender, diagnosis) using the `table()` function.
7. We summarize the number of patients and their mean age per diagnosis with `group_by()` and `summarise()` from `dplyr`.
8. We calculate correlations between numeric variables using the `cor()` function, ignoring rows with missing values, and display the correlation matrix.
9. We create a scatterplot matrix to visualize relationships between variables.
10. Additional exploratory analysis can be performed based on the specific requirements of the analysis.
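Group-wise summaries are a core part of descriptive analytics. The sketch below shows them with base R's `aggregate()`, `table()`, and `prop.table()` on a tiny made-up patient table; the column names mirror the ones assumed above, and the values are illustrative, not real data.

```r
# Sketch: group-wise descriptive statistics on a tiny made-up patient table.
patients <- data.frame(
  gender    = c("F", "M", "F", "M"),
  diagnosis = c("Diabetes", "Diabetes", "Asthma", "Asthma"),
  age       = c(30, 40, 50, 60)
)
# Mean age per gender
age_by_gender <- aggregate(age ~ gender, data = patients, FUN = mean)
print(age_by_gender)
# Counts per gender and diagnosis (a two-way frequency table)
print(table(patients$gender, patients$diagnosis))
# Proportion of each diagnosis
print(prop.table(table(patients$diagnosis)))
```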


11 Write a program to Perform Predictive analytics on Product Sales data

# Load required libraries
library(ggplot2)
library(dplyr)
library(caret)
# Read the product sales data (replace "sales_data.csv" with your file path).
# The code assumes a numeric target column `sales` and a numeric predictor
# `product_price`; drop ID or date columns before fitting `sales ~ .`
sales_data <- read.csv("sales_data.csv", stringsAsFactors = FALSE)
# Explore the structure of the data
str(sales_data)
# Summary statistics
print(summary(sales_data))
# Data preprocessing (if needed)
# For example, converting factors to numeric, handling missing values, etc.
# Split the data into training and testing sets
set.seed(123) # For reproducibility
train_index <- createDataPartition(sales_data$sales, p = 0.8, list = FALSE)
train_data <- sales_data[train_index, ]
test_data <- sales_data[-train_index, ]
# Train a linear regression model on all other columns
# (character columns become factors; the test set must not contain unseen levels)
lm_model <- lm(sales ~ ., data = train_data)
# Summarize the model
print(summary(lm_model))
# Make predictions on the test set and store them next to the actual values
test_data$predicted_sales <- predict(lm_model, newdata = test_data)
# Evaluate model performance
RMSE <- sqrt(mean((test_data$sales - test_data$predicted_sales)^2))
cat("Root Mean Squared Error (RMSE):", RMSE, "\n")
# Visualize actual (blue) vs. predicted (red) sales
p <- ggplot(test_data, aes(x = product_price)) +
  geom_point(aes(y = sales), color = "blue", alpha = 0.5) +
  geom_point(aes(y = predicted_sales), color = "red", alpha = 0.5) +
  labs(title = "Actual vs. Predicted Sales", x = "Product Price", y = "Sales")
print(p)
In this program:
1. We load required libraries such as `ggplot2`, `dplyr`, and `caret`.
2. We read the product sales data from a CSV file using `read.csv()`.
3. We explore the structure of the data using `str()` and obtain summary
statistics using `summary()`.
4. We preprocess the data if needed, such as converting factors to numeric or
handling missing values.
5. We split the data into training and testing sets using `createDataPartition()`
from the `caret` package.
6. We train a linear regression model using `lm()` on the training data.
7. We make predictions on the test set using the trained model.
8. We evaluate the model's performance using root mean squared error (RMSE).
9. We visualize actual vs. predicted sales using `ggplot2`.
10. Additional analysis and visualization can be performed based on specific
requirements.
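RMSE is usually reported alongside other error metrics. The sketch below computes RMSE, mean absolute error (MAE), and R-squared (1 - SS_res / SS_tot) for a small made-up set of actual and predicted sales; `regression_metrics()` is a hypothetical helper name, and the figures are illustrative.

```r
# Sketch: common regression error metrics. regression_metrics() is hypothetical.
regression_metrics <- function(actual, predicted) {
  errors <- actual - predicted
  c(
    RMSE = sqrt(mean(errors^2)),                                # penalizes large errors
    MAE  = mean(abs(errors)),                                   # average absolute miss
    R2   = 1 - sum(errors^2) / sum((actual - mean(actual))^2)   # share of variance explained
  )
}

actual <- c(100, 150, 200, 250)
predicted <- c(110, 140, 210, 240)
m <- regression_metrics(actual, predicted)
print(m)   # RMSE = 10, MAE = 10, R2 = 0.968
```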


12 Write a program to Apply Predictive analytics for Weather forecasting.

# Load required libraries
library(forecast)
# Read the weather data (replace "weather_data.csv" with your file path).
# The code assumes one row per day, a `date` column in "YYYY-MM-DD" format,
# and a numeric `temp` column.
weather_data <- read.csv("weather_data.csv")
# Convert date column to Date format and sort the rows chronologically
weather_data$date <- as.Date(weather_data$date)
weather_data <- weather_data[order(weather_data$date), ]
# Create a daily time series object that starts at the first observed date
first_date <- min(weather_data$date)
ts_data <- ts(weather_data$temp, frequency = 365,
              start = c(as.numeric(format(first_date, "%Y")),
                        as.numeric(format(first_date, "%j"))))
# Visualize the time series data (the time axis is in years)
plot(ts_data, main = "Temperature Time Series Data", xlab = "Year",
     ylab = "Temperature")
# Train ARIMA model
arima_model <- auto.arima(ts_data)
# Summary of the ARIMA model
summary(arima_model)
# Forecast future temperatures
forecast_values <- forecast(arima_model, h = 30) # Forecasting next 30 days
# Plot forecasted values
plot(forecast_values, main = "Temperature Forecast")
# Print forecasted values
print(forecast_values)
In this program:
1. We load the required library `forecast` for time series forecasting.
2. We read the weather data from a CSV file using `read.csv()`.
3. We convert the date column to the Date format using `as.Date()` and sort the rows by date.
4. We create a time series object using the `ts()` function with frequency 365 (daily observations with a yearly cycle), starting at the year and day of year of the first date.
5. We visualize the time series data using the `plot()` function.
6. We train an ARIMA model using the `auto.arima()` function.

7. We summarize the ARIMA model using the `summary()` function.

8. We forecast future temperatures using the `forecast()` function, specifying the number of periods to forecast (`h = 30` days).
9. We plot the forecasted values using the `plot()` function.
10. We print the forecasted values using the `print()` function.
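Seasonal ARIMA is generally impractical for a period as long as 365 days, so a common alternative is to model the yearly cycle with sine/cosine (Fourier) regressors and let ARIMA handle the remaining short-term correlation. The sketch below shows that idea in base R on simulated daily temperatures (hypothetical data), with a 30-day hold-out to measure accuracy; `fourier_terms()` is a hypothetical helper, and the `forecast` package's `fourier()` function builds similar terms for `ts` objects.

```r
# Sketch: yearly cycle as Fourier regressors + AR(1) errors, base R only.
set.seed(42)
n_days <- 3 * 365
day <- seq_len(n_days)
# Hypothetical daily temperatures: annual cycle plus autocorrelated noise
temp <- 25 + 8 * sin(2 * pi * day / 365) +
  as.numeric(arima.sim(list(ar = 0.6), n = n_days))

# Hypothetical helper: first-order Fourier terms for a 365-day period
fourier_terms <- function(d) cbind(s1 = sin(2 * pi * d / 365), c1 = cos(2 * pi * d / 365))

# Hold out the last 30 days to check forecast accuracy
h <- 30
train_days <- day[1:(n_days - h)]
test_days <- day[(n_days - h + 1):n_days]

fit <- arima(temp[train_days], order = c(1, 0, 0), xreg = fourier_terms(train_days))
fc <- predict(fit, n.ahead = h, newxreg = fourier_terms(test_days))

rmse <- sqrt(mean((temp[test_days] - fc$pred)^2))
cat("30-day hold-out RMSE:", round(rmse, 2), "\n")
```

Adding more sine/cosine pairs (higher harmonics) lets the cycle take less symmetric shapes, at the cost of extra parameters.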

Srns Grant Application
4 pages
