
data analytics lab manual using R programming

The document is a lab manual for a Data Analytics course at Vidya Jyothi Institute of Technology, detailing various programming tasks and techniques in R for data preprocessing, regression analysis, and machine learning. It includes a list of programs to be implemented, such as handling missing values, linear and logistic regression, decision trees, and random forests, along with instructions for installing R and RStudio. The manual serves as a practical guide for students to apply data analytics concepts using R programming.

Uploaded by mbsailajanawin
Copyright © All Rights Reserved

Data Analytics Lab Manual (R22) B.Tech. CSE (AI & ML) II Sem.

Data analytics lab (Vidya Jyothi Institute of Technology)



Downloaded by MB Sailaja

Department of Computer Science & Engineering


(Artificial Intelligence & Machine Learning)

Data Analytics Lab


Regulation: (R22)

III B.Tech. - Semester - II

LAB MANUAL


Data Analytics Lab

B.Tech. CSE (AI & ML) II Sem.

Course Code:

L T P C: 0 0 2 1 (Lecture, Tutorial and Practical hours per week; Credits)

List of Programs:

1. Data Preprocessing: a. Handling missing values b. Noise detection and removal c. Identifying data
redundancy and elimination
2. Implement any one imputation model
3. Implement Linear Regression
4. Implement Logistic Regression
5. Implement Decision Tree Induction for classification
6. Implement Random Forest Classifier
7. Implement ARIMA on Time Series data
8. Object segmentation using hierarchical based methods
9. Perform Visualization techniques (types of charts: Bar, Column, Line, Scatter, 3D Cubes, etc.)
10. Perform Descriptive analytics on healthcare data
11. Perform Predictive analytics on Product Sales data
12. Apply Predictive analytics for Weather forecasting.


S.No. Name of the Program
1 Introduction to R Programming
2 Installation of RStudio on Windows
3 Data Preprocessing: a. Handling missing values b. Noise detection and removal c. Identifying data redundancy and elimination
4 Implement any one imputation model
5 Implement Linear Regression
6 Implement Logistic Regression
7 Implement Decision Tree Induction for classification
8 Implement Random Forest Classifier
9 Implement ARIMA on Time Series data
10 Object segmentation using hierarchical based methods
11 Perform Visualization techniques (types of charts: Bar, Column, Line, Scatter, 3D Cubes, etc.)
12 Perform Descriptive analytics on healthcare data
13 Perform Predictive analytics on Product Sales data
14 Apply Predictive analytics for Weather forecasting

Introduction to R programming:
R is a programming language and free software environment for statistical computing, developed by Ross Ihaka and Robert Gentleman in 1993. R offers an extensive catalog of statistical and graphical methods, including machine learning algorithms, linear regression, time series analysis and statistical inference, to name a few. Most R libraries are written in R itself, but computationally heavy tasks are usually implemented in C, C++ or Fortran. R is trusted not only by academics; many large companies also use it, including Uber, Google, Airbnb and Facebook.

Data analysis with R is done in a series of steps: programming, transforming, discovering, modeling and communicating the results.

Program: R is a clear and accessible programming tool.

Transform: R is made up of a collection of libraries designed specifically for data science.

Discover: Investigate the data, refine your hypotheses and analyze them.

Model: R provides a wide array of tools to find the right model for your data.

Communicate: Integrate code, graphs and output into a report with R Markdown, or build Shiny apps to share results with the world.
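The five steps above can be sketched on R's built-in `mtcars` dataset. This is a minimal illustration, not one of the lab programs. The derived `kpl` column and the choice of weight as the predictor are assumptions made only for this example.

```r
# Program / Transform: copy a built-in dataset and derive a new column
df_cars <- mtcars
df_cars$kpl <- df_cars$mpg * 0.425144   # miles per US gallon -> kilometres per litre

# Discover: numeric summaries and the correlation between efficiency and weight
print(summary(df_cars[, c("kpl", "wt", "hp")]))
r <- cor(df_cars$kpl, df_cars$wt)
cat("Correlation between kpl and weight:", round(r, 3), "\n")

# Model: fuel efficiency as a linear function of weight (wt is in 1000 lb)
fit <- lm(kpl ~ wt, data = df_cars)

# Communicate: a compact coefficient table that could be pasted into a report
print(round(coef(summary(fit)), 3))
```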

What is R used for?


Statistical inference
Data analysis
Machine learning algorithms

Installation of RStudio on Windows:

Step 1: With base R installed, move on to installing RStudio. To begin, go to the RStudio download page and click the download button for RStudio Desktop.

Step 2: Click the link for the Windows version of RStudio and save the .exe file.

Step 3: Run the .exe file and follow the installation instructions:

a. Click Next on the welcome window.

b. Enter or browse to the installation folder and click Next to proceed.

c. Select the folder for the Start menu shortcut, or click Do not create shortcuts, and then click Next.

d. Wait for the installation process to complete.

e. Click Finish to end the installation.

Install the R Packages:-

In RStudio, if you require a particular library, follow these instructions:

• First, run RStudio.
• Click the Packages tab, then click Install. The Install Packages dialog box will appear.
• In the Install Packages dialog, type the name of the package you want to install in the Packages field and click Install. This installs the package you entered; as you type, RStudio also suggests matching package names.

Installing Packages:-
The most common place to get packages from is CRAN. To install a package from CRAN, use install.packages("package name"). For instance, to install ggplot2, a very popular visualization package, type the following in the console:-
Syntax:-
# install package from CRAN
install.packages("ggplot2")
Loading Packages:-
Once the package is downloaded to your computer, you can access the functions and resources it provides in two different ways:
# 1. load the package to use in the current R session
library(packagename)
# 2. call a single function without attaching the package
packagename::function_name()

Getting Help on Packages:-

For more direct help on packages that are installed on your computer, you can use the help and vignette functions. For example, to get help on the ggplot2 package:
help(package = "ggplot2")      # provides details regarding contents of a package
vignette(package = "ggplot2")  # list vignettes available for a specific package
vignette("ggplot2-specs")      # view a specific vignette
vignette()                     # view all vignettes on your computer
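A script can combine installing and loading so that it installs only what is missing. The `ensure_packages()` helper below is a hypothetical name, not a function from any package. It relies only on `requireNamespace()`, `install.packages()` and `library()`, all of which ship with R.

```r
# Hypothetical helper: install any missing packages from CRAN, then attach them all
ensure_packages <- function(pkgs) {
  for (p in pkgs) {
    if (!requireNamespace(p, quietly = TRUE)) {
      install.packages(p)   # only runs when the package is not installed yet
    }
    library(p, character.only = TRUE)
  }
  invisible(pkgs)
}

# Example with packages that ship with R, so nothing needs downloading;
# in the lab you might call ensure_packages(c("ggplot2", "dplyr"))
ensure_packages(c("stats", "tools"))
print(search())
```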


Data Analytics Lab Manual

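1 Data Preprocessing
a. Handling missing values
b. Noise detection and removal
c. Identifying data redundancy and elimination
The snippets below assume a data frame named `data`. Replace column_with_missing and numeric_column with column names from your dataset.
a. Handling Missing Values:
# Option 1: remove rows that contain any missing value
data_complete <- na.omit(data)
# Option 2: impute missing values in one column with the column mean
data$column_with_missing <- ifelse(is.na(data$column_with_missing),
                                   mean(data$column_with_missing, na.rm = TRUE),
                                   data$column_with_missing)
b. Noise Detection and Removal:
# z-score method: flag values more than 3 standard deviations from the mean
z_scores <- as.vector(scale(data$numeric_column))
outliers <- which(abs(z_scores) > 3) # Adjust the threshold as needed
# Guard against an empty index: data[-integer(0), ] would return zero rows
cleaned_data <- if (length(outliers) > 0) data[-outliers, ] else data
c. Identifying Data Redundancy and Elimination:
# Remove duplicate rows
unique_data <- unique(data)
# Remove highly correlated variables; findCorrelation() comes from the caret package
library(caret)
numeric_data <- unique_data[, sapply(unique_data, is.numeric), drop = FALSE]
cor_matrix <- cor(numeric_data, use = "pairwise.complete.obs")
high_correlation <- findCorrelation(cor_matrix, cutoff = 0.9) # Adjust the threshold as needed
cleaned_data <- if (length(high_correlation) > 0) {
  numeric_data[, -high_correlation, drop = FALSE]
} else {
  numeric_data
}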
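The snippets above assume an existing `data` frame and the caret package. Below is a self-contained sketch of the same three steps on a small hypothetical data frame. It swaps in median imputation and the IQR rule (Tukey's fences) for outliers, because the mean and a 3-sigma z-score are easily distorted on very small samples.

```r
# Hypothetical data: one missing score, one extreme score, one duplicated row
df <- data.frame(
  id    = c(1, 2, 3, 4, 5, 5, 6, 7),
  score = c(10, 12, NA, 11, 13, 13, 95, 12)
)

# a. Missing values: replace NA with the median of the observed scores
df$score[is.na(df$score)] <- median(df$score, na.rm = TRUE)

# c. Redundancy: drop rows that exactly repeat an earlier row
df <- df[!duplicated(df), ]

# b. Noise: flag scores outside Tukey's fences, Q1 - 1.5*IQR and Q3 + 1.5*IQR
q <- quantile(df$score, c(0.25, 0.75))
fence <- 1.5 * (q[2] - q[1])
is_outlier <- df$score < q[1] - fence | df$score > q[2] + fence
clean <- df[!is_outlier, ]
print(clean)
```

The order (impute, deduplicate, then filter) is a design choice. Imputing first lets the duplicate check see complete rows.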


2 Write a program to Implement any one imputation model


# Load required libraries
library(dplyr)
# Generate sample data with missing values
set.seed(123)
data <- data.frame(
  id = 1:10,
  age = sample(c(20:60, NA), 10, replace = TRUE),
  height = sample(c(150:200, NA), 10, replace = TRUE),
  weight = sample(c(50:100, NA), 10, replace = TRUE)
)
# Drawing NA from 40+ candidate values rarely yields a missing entry,
# so set a few values to NA explicitly to guarantee something to impute
data$age[c(2, 7)] <- NA
data$height[4] <- NA
data$weight[c(1, 9)] <- NA
# Print original data
cat("Original data:\n")
print(data)
# Function to impute missing values using mean
mean_imputation <- function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
  }
  return(x)
}
# Apply mean imputation to each numeric column (across() requires dplyr >= 1.0.0)
data_imputed <- data %>%
  mutate(across(where(is.numeric), mean_imputation))
# Print imputed data
cat("\nImputed data using mean imputation:\n")
print(data_imputed)
In this program:


1. We generate some sample data with missing values.


2. We define a function `mean_imputation` that takes a numeric vector, replaces
missing values with the mean of non-missing values, and returns the imputed
vector.
3. We use the `mutate` function from the `dplyr` package to apply the
`mean_imputation` function to each numeric column of the dataset.
4. The resulting dataset `data_imputed` contains the original data with missing
values replaced by the mean of their respective columns.
5. Finally, we print both the original and imputed datasets for comparison.
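Mean imputation ignores relationships between columns. One alternative imputation model is regression imputation, which predicts a missing value from another variable. In the sketch below, the data and the linear height/weight relationship are hypothetical.

```r
# Hypothetical data: weight depends roughly linearly on height; two weights are missing
set.seed(42)
height <- c(150, 155, 160, 165, 170, 175, 180, 185, 190, 195)
weight <- round(0.9 * height - 90 + rnorm(10, sd = 2), 1)
weight[c(3, 8)] <- NA
df <- data.frame(height, weight)

# Fit weight ~ height; lm() drops the rows with missing weight by default
fit <- lm(weight ~ height, data = df)

# Replace each missing weight with the model's prediction for that row's height
missing <- is.na(df$weight)
df$weight[missing] <- predict(fit, newdata = df[missing, , drop = FALSE])
print(df)
```

Imputed values from this approach lie exactly on the fitted line. Like mean imputation, it therefore understates the natural spread of the data.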

3 Write a program to Implement Linear Regression using R


# Load required libraries
library(ggplot2)
# Generate sample data
set.seed(123)
n <- 100
x <- seq(1, 10, length.out = n)
y <- 3 * x + rnorm(n, mean = 0, sd = 2) # Simulated linear relationship with noise
data <- data.frame(x = x, y = y)
# Visualize the data
ggplot(data, aes(x = x, y = y)) +
geom_point() +
labs(title = "Sample Data for Linear Regression",
x = "X",
y = "Y")
# Fit linear regression model
lm_model <- lm(y ~ x, data = data)
# Print summary of the model
summary(lm_model)
# Plot the regression line
ggplot(data, aes(x = x, y = y)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "Linear Regression",
x = "X",
y = "Y")
# Predict using the model
new_x <- 11
predicted_y <- predict(lm_model, newdata = data.frame(x = new_x))
cat("Predicted value for x =", new_x, ":", predicted_y)
In this program:
1. We generate some sample data with a linear relationship between `x` and `y`,
adding some random noise.
2. We visualize the sample data using a scatter plot.
3. We fit a linear regression model using the `lm()` function, specifying the
formula `y ~ x`.
4. We print a summary of the fitted model using the `summary()` function.
5. We plot the original data points along with the fitted regression line using
`geom_smooth()` in ggplot2.
6. We demonstrate how to make predictions using the fitted model for a new
value of `x`.
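You can check what `lm()` computes by hand. The least-squares coefficients solve the normal equations (X'X) beta = X'y, where X holds a column of ones for the intercept next to x. Here is a minimal sketch on the same simulated data:

```r
set.seed(123)
n <- 100
x <- seq(1, 10, length.out = n)
y <- 3 * x + rnorm(n, mean = 0, sd = 2)

# Design matrix: a column of 1s for the intercept, then x
X <- cbind(1, x)
# Normal equations: solve (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))
print(beta_hat)

# lm() should give the same coefficients (it uses a QR decomposition internally)
fit <- lm(y ~ x)
print(coef(fit))
```

Solving the normal equations directly is fine for a teaching example. The QR approach used by `lm()` is numerically more stable when predictors are nearly collinear.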


4 Write a Program to Implement Logistic Regression using R


# Load required libraries
library(ggplot2)
# Generate sample data
set.seed(123)
n <- 100
x <- seq(-5, 5, length.out = n)
linear_combination <- -2 + 0.5 * x # Linear combination of features
probabilities <- 1 / (1 + exp(-linear_combination)) # Sigmoid function
y <- rbinom(n, 1, probabilities) # Simulated binary outcome (0 or 1)
data <- data.frame(x = x, y = y)
# Visualize the data
ggplot(data, aes(x = x, y = factor(y))) +
geom_point() +
labs(title = "Sample Data for Logistic Regression",
x = "X",
y = "Y")
# Fit logistic regression model
logit_model <- glm(y ~ x, data = data, family = binomial)
# Print summary of the model
summary(logit_model)
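# Plot the data with base graphics, then overlay the fitted logistic curve
# (curve(..., add = TRUE) draws on the current base-graphics plot, not on a ggplot)
plot(data$x, data$y, pch = 19, xlab = "X", ylab = "P(Y = 1)",
     main = "Logistic Regression Curve")
logistic_curve <- function(z) {
  return(1 / (1 + exp(-z)))
}
curve(logistic_curve(coef(logit_model)[1] + coef(logit_model)[2] * x),
      from = min(data$x), to = max(data$x),
      col = "red", lwd = 2, add = TRUE)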
# Predict using the model
new_x <- 1
predicted_probability <- predict(logit_model, newdata = data.frame(x =
new_x), type = "response")
cat("Predicted probability for x =", new_x, ":", predicted_probability)
In this program:
1. We generate some sample data with a binary outcome variable `y` based on a
linear combination of the feature `x`.
2. We visualize the sample data using a scatter plot.
3. We fit a logistic regression model using the `glm()` function with `family =
binomial`.
4. We print a summary of the fitted model using the `summary()` function.
5. We plot the logistic regression curve using the coefficients obtained from the
fitted model.
6. We demonstrate how to make predictions using the fitted model for a new
value of `x`.
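The probability returned by `predict(..., type = "response")` is the sigmoid of the linear predictor. The sketch below recomputes it from the fitted coefficients with base R's `plogis()`, the logistic CDF. The simulated data mirror the program above.

```r
set.seed(123)
n <- 100
x <- seq(-5, 5, length.out = n)
y <- rbinom(n, 1, plogis(-2 + 0.5 * x))   # plogis(z) = 1 / (1 + exp(-z))
fit <- glm(y ~ x, family = binomial)

b <- coef(fit)
new_x <- 1
log_odds <- b[1] + b[2] * new_x          # linear predictor on the log-odds scale
p_manual <- plogis(log_odds)             # convert log-odds to a probability
p_predict <- predict(fit, newdata = data.frame(x = new_x), type = "response")
cat("Manual:", p_manual, " predict():", p_predict, "\n")
cat("Odds ratio for a one-unit increase in x:", exp(b[2]), "\n")
```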


5 Write a program to Implement Decision Tree Induction for classification using R
# Load required libraries
library(rpart)
library(rpart.plot)
# Generate sample data
set.seed(123)
n <- 100
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y <- ifelse((x1 + x2) > 10, "A", "B") # Simulated classification outcome
data <- data.frame(x1 = x1, x2 = x2, y = y)
# Visualize the data
plot(data$x1, data$x2, col = ifelse(data$y == "A", "red", "blue"),
     pch = 19, xlab = "X1", ylab = "X2",
     main = "Sample Data for Decision Tree Classification")
# Fit decision tree model
tree_model <- rpart(y ~ x1 + x2, data = data, method = "class")
# Visualize the decision tree
rpart.plot(tree_model, main = "Decision Tree for Classification")
# Predict using the model
new_data <- data.frame(x1 = c(3, 7), x2 = c(8, 2))
predicted_classes <- predict(tree_model, newdata = new_data, type = "class")
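# as.character() prints the class labels; cat() on a factor would print integer codes
cat("Predicted classes for new data:", as.character(predicted_classes), "\n")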
In this program:
1. We generate some sample data with two input features `x1` and `x2`, and a binary
classification outcome `y`.
2. We visualize the sample data using a scatter plot.
3. We fit a decision tree model using the `rpart()` function from the `rpart` package,
specifying the formula `y ~ x1 + x2` for classification.
4. We visualize the resulting decision tree using the `rpart.plot()` function.
5. We demonstrate how to make predictions using the fitted model for new data.
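For classification, `rpart()` chooses splits by Gini impurity by default. The sketch below shows that calculation for one numeric feature on a hypothetical toy dataset. Every midpoint between sorted values is scored as a candidate threshold, and `gini()` and `split_impurity()` are illustrative helper names.

```r
# Gini impurity of a set of class labels: 1 - sum over classes of p_k^2
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Size-weighted impurity of the two groups created by `feature <= threshold`
split_impurity <- function(feature, labels, threshold) {
  left  <- labels[feature <= threshold]
  right <- labels[feature >  threshold]
  (length(left) * gini(left) + length(right) * gini(right)) / length(labels)
}

# Toy example: class "B" for small x, class "A" for large x
x <- c(1, 2, 3, 4, 6, 7, 8, 9)
y <- c("B", "B", "B", "B", "A", "A", "A", "A")

sx <- sort(x)
candidates <- (head(sx, -1) + tail(sx, -1)) / 2   # midpoints between neighbours
scores <- sapply(candidates, function(t) split_impurity(x, y, t))
best <- candidates[which.min(scores)]
cat("Gini before split:", gini(y), " best threshold:", best, "\n")
```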

6 Write a program to Implement Random Forest Classifier using R


# Load required library
library(randomForest)
# Generate sample data
set.seed(123)
n <- 100
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y <- factor(ifelse((x1 + x2) > 10, "A", "B")) # randomForest() needs a factor response for classification
data <- data.frame(x1 = x1, x2 = x2, y = y)
# Fit Random Forest model
rf_model <- randomForest(y ~ x1 + x2, data = data, ntree = 100)
# Print model details
print(rf_model)
# Plot variable importance
varImpPlot(rf_model, main = "Variable Importance Plot")
# Predict using the model
new_data <- data.frame(x1 = c(3, 7), x2 = c(8, 2))
predicted_classes <- predict(rf_model, newdata = new_data)
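# as.character() prints the class labels; cat() on a factor would print integer codes
cat("Predicted classes for new data:", as.character(predicted_classes), "\n")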
In this program:
1. We generate some sample data with two input features `x1` and `x2`, and a
binary classification outcome `y`.
2. We fit a Random Forest model using the `randomForest()` function from the
`randomForest` package, specifying the formula `y ~ x1 + x2` for classification
and the number of trees (`ntree`) as 100.
3. We print the details of the fitted Random Forest model.
4. We plot the variable importance using the `varImpPlot()` function.
5. We demonstrate how to make predictions using the fitted model for new data.
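A random forest combines many decision trees trained on bootstrap samples and lets them vote. The sketch below implements only the bootstrap-and-vote part (bagging) with `rpart`. Unlike `randomForest()`, it does not sample a random subset of `mtry` features at each split. The `bagged_predict()` helper is a hypothetical name.

```r
library(rpart)  # recommended package that ships with most R installations

set.seed(123)
n <- 200
train <- data.frame(x1 = runif(n, 0, 10), x2 = runif(n, 0, 10))
train$y <- factor(ifelse(train$x1 + train$x2 > 10, "A", "B"))

# Grow each tree on a bootstrap sample (n rows drawn with replacement)
n_trees <- 25
trees <- lapply(seq_len(n_trees), function(b) {
  idx <- sample(n, n, replace = TRUE)
  rpart(y ~ x1 + x2, data = train[idx, ], method = "class")
})

# Majority vote: each tree predicts a class and the most frequent class wins
bagged_predict <- function(trees, newdata) {
  votes <- sapply(trees, function(t) {
    as.character(predict(t, newdata = newdata, type = "class"))
  })
  votes <- matrix(votes, nrow = nrow(newdata))  # keep a matrix even for one row
  apply(votes, 1, function(v) names(which.max(table(v))))
}

new_data <- data.frame(x1 = c(1, 9), x2 = c(1, 9))
print(bagged_predict(trees, new_data))

train_acc <- mean(bagged_predict(trees, train) == train$y)
cat("Training accuracy of the bagged ensemble:", round(train_acc, 3), "\n")
```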


7 Write a program to Implement ARIMA on Time Series data using R


# Load required library
library(forecast)
# Generate sample time series data
set.seed(123)
n <- 100
ts_data <- ts(rnorm(n, mean = 0, sd = 1), start = 1, frequency = 1)
# Plot the sample time series data
plot(ts_data, main = "Sample Time Series Data", xlab = "Time", ylab = "Value")
# Fit ARIMA model
arima_model <- auto.arima(ts_data)
# Print model details
print(arima_model)
# Plot the forecast
plot(forecast(arima_model), main = "Forecast using ARIMA")
In this program:
1. We generate some sample time series data using the `ts()` function.
2. We plot the sample time series data.
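3. We fit an ARIMA model to the time series data using the `auto.arima()`
function from the `forecast` package. This function searches over candidate
model orders and selects the one with the lowest information criterion (AICc by
default). Because the sample data is white noise, it will typically choose a model
with no AR or MA terms.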
4. We print the details of the fitted ARIMA model.
5. We plot the forecasted values using the `forecast()` function.
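The sample series above is pure white noise, so ARIMA has little structure to find. The sketch below uses base R's `stats::arima()` on a simulated AR(1) series. The coefficient 0.7 is an assumption for illustration, and the model order is fixed explicitly rather than searched.

```r
set.seed(123)
# Simulate 200 points from an AR(1) process with coefficient 0.7
ar1 <- arima.sim(model = list(ar = 0.7), n = 200)

# Fit ARIMA(1, 0, 0): one AR term, no differencing, no MA terms
fit <- arima(ar1, order = c(1, 0, 0))
print(fit)

# Forecast 10 steps ahead, with approximate 95% intervals from the standard errors
fc <- predict(fit, n.ahead = 10)
upper <- fc$pred + 1.96 * fc$se
lower <- fc$pred - 1.96 * fc$se
print(cbind(forecast = fc$pred, lower, upper))
```

The intervals widen with the horizon, and the forecasts decay toward the estimated mean. Both are the expected behaviour of a stationary AR(1) model.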


8 Write a program to implement Object segmentation using hierarchical based methods in R
# Load required libraries
library(jpeg)
library(EBImage) # Bioconductor package: install with BiocManager::install("EBImage")
# Read the image
img <- readImage("image.jpg") # Replace "image.jpg" with your image file path
# Convert the image to grayscale
gray_img <- channel(img, "gray")
# Downscale to 64 pixels wide: dist() stores one distance per pair of pixels,
# so a full-resolution image would exhaust memory
gray_img <- resize(gray_img, w = 64)
# Normalize the grayscale image
normalized_img <- normalize(gray_img)
# Flatten the image to create a vector of pixel values
flattened_img <- as.vector(normalized_img)
# Perform hierarchical clustering
hc <- hclust(dist(flattened_img), method = "ward.D2")
# Cut the dendrogram to create segments
num_segments <- 4 # Number of segments to create
segments <- cutree(hc, k = num_segments)
# cutree() returns labels in the original pixel order, so reshape them directly
# (divided by num_segments so they display as gray levels in (0, 1])
segmented_img <- Image(matrix(segments / num_segments,
                              nrow = dim(normalized_img)[1], ncol = dim(normalized_img)[2]))
# Plot the original and segmented images side by side
# (par(mfrow) only applies to the "raster" display method)
par(mfrow = c(1, 2))
display(gray_img, method = "raster")      # left: original image
display(segmented_img, method = "raster") # right: segmented image
In this program:
1. We load the required libraries `jpeg` and `EBImage` for image processing.
2. We read the image using the `readImage()` function from the `EBImage`
package.
3. We convert the image to grayscale, shrink it so that clustering fits in memory,
and normalize the pixel values to the range 0 to 1.
4. We flatten the image to create a vector of pixel values.
5. We perform hierarchical clustering (Ward's method) on the flattened pixel
intensities using the `hclust()` function.
6. We cut the dendrogram into the specified number of segments using the
`cutree()` function.
7. We create a segmented image by reshaping the cluster labels back into the
image's rows and columns.
8. Finally, we plot both the original and segmented images using the `display()`
function.
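You can try the same idea without an image file or EBImage. The sketch below builds a hypothetical 20 x 20 grayscale matrix with three intensity regions and clusters its pixels with `hclust()`. It then reshapes the `cutree()` labels back into an image.

```r
set.seed(1)
# Hypothetical image: dark background, bright square, mid-gray horizontal stripe
img <- matrix(0.1, nrow = 20, ncol = 20)
img[5:12, 5:12] <- 0.9
img[15:18, ] <- 0.5
img <- img + matrix(rnorm(400, sd = 0.02), 20, 20)   # small sensor-like noise

# Cluster the 400 pixel intensities with Ward linkage and cut into 3 segments
hc <- hclust(dist(as.vector(img)), method = "ward.D2")
labels <- cutree(hc, k = 3)

# cutree() keeps the original (column-major) pixel order, so reshape directly
seg <- matrix(labels, nrow = nrow(img), ncol = ncol(img))
# Flip rows so image() draws row 1 at the top, as in the matrix printout
image(t(seg[nrow(seg):1, ]), main = "Segments", axes = FALSE)
```

This clusters on intensity only, so pixels of equal brightness join the same segment even when they are far apart. Adding pixel coordinates as extra features is a common extension.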

9 Write a program to Perform Visualization techniques (types of charts: Bar, Column, Line, Scatter, 3D Cubes, etc.)
# Load required libraries
library(ggplot2)
library(plotly)
# Sample data
set.seed(123)
data <- data.frame(
  x = 1:10,
  y1 = rnorm(10),
  y2 = rnorm(10),
  y3 = rnorm(10),
  y4 = rnorm(10),
  y5 = rnorm(10)
)
# Bar plot: coord_flip() draws horizontal bars, distinguishing it from the
# vertical column plot (geom_bar(stat = "identity") is equivalent to geom_col())
bar_plot <- ggplot(data, aes(x = x, y = y1)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  coord_flip() +
  labs(title = "Bar Plot", x = "X", y = "Y1")
# Column plot (vertical bars)
column_plot <- ggplot(data, aes(x = x, y = y2)) +
  geom_col(fill = "lightgreen") +
  labs(title = "Column Plot", x = "X", y = "Y2")
# Line plot
line_plot <- ggplot(data, aes(x = x)) +
  geom_line(aes(y = y3), color = "orange") +
  labs(title = "Line Plot", x = "X", y = "Y3")
# Scatter plot
scatter_plot <- ggplot(data, aes(x = y4, y = y5)) +
  geom_point(color = "red") +
  labs(title = "Scatter Plot", x = "Y4", y = "Y5")
# 3D Scatter plot (axis titles match the mapped columns y4, y5 and x)
scatter3d <- plot_ly(data, x = ~y4, y = ~y5, z = ~x, type = "scatter3d",
                     mode = "markers", marker = list(color = "blue", size = 5)) %>%
  layout(scene = list(xaxis = list(title = "Y4"), yaxis = list(title = "Y5"),
                      zaxis = list(title = "X")))
# Display plots
print(bar_plot)
print(column_plot)
print(line_plot)
print(scatter_plot)
scatter3d
In this program:
1. We generate some sample data with six variables (`x`, `y1`, `y2`, `y3`, `y4`,
`y5`).
2. We create different types of plots using ggplot2 and plotly libraries: bar plot,
column plot, line plot, scatter plot, and 3D scatter plot.
3. We customize each plot with appropriate titles and axis labels.
4. Finally, we display the plots using the `print()` function for ggplot2 plots and
directly for the plotly 3D scatter plot.
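If ggplot2 or plotly is not installed, base R graphics can draw comparable charts. The sketch below writes them to a PDF in the session's temporary directory. The data are random, and `persp()` draws a 3D surface rather than cubes.

```r
set.seed(123)
x <- 1:10
y <- rnorm(10)

# Write all charts to one PDF page laid out as a 2 x 3 grid
out_file <- file.path(tempdir(), "base_r_charts.pdf")
pdf(out_file, width = 10, height = 7)
par(mfrow = c(2, 3))

barplot(y, names.arg = x, horiz = TRUE, main = "Bar (horizontal)")
barplot(y, names.arg = x, main = "Column (vertical)")
plot(x, y, type = "l", main = "Line")
plot(rnorm(10), rnorm(10), pch = 19, main = "Scatter")

# 3D surface: z = sin(u) * cos(v) evaluated on a 30 x 30 grid
u <- seq(-pi, pi, length.out = 30)
v <- seq(-pi, pi, length.out = 30)
z <- outer(u, v, function(a, b) sin(a) * cos(b))
persp(u, v, z, theta = 30, phi = 25, main = "3D surface (persp)")

invisible(dev.off())
cat("Charts written to", out_file, "\n")
```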


10 Write a program to Perform Descriptive analytics on healthcare data using R
# Load required libraries
library(dplyr) # available for further data manipulation; the steps below use base R
# Load sample healthcare data (replace with your dataset)
# Replace "healthcare_data.csv" with your dataset file path; the code assumes
# columns named age, weight, height, gender and diagnosis
data <- read.csv("healthcare_data.csv")
# Display the structure of the dataset
str(data)
# Display summary statistics
summary(data)
# Check for missing values
missing_values <- colSums(is.na(data))
print("Missing Values:")
print(missing_values)
# Visualize distributions of key variables
hist(data$age, main = "Age Distribution", xlab = "Age")
boxplot(data$weight, main = "Weight Distribution")
boxplot(data$height, main = "Height Distribution")
# Analyze categorical variables
table(data$gender)
table(data$diagnosis)
# Calculate correlations between numeric variables, ignoring rows with missing values
correlation_matrix <- cor(data[, c("age", "weight", "height")], use = "complete.obs")
print("Correlation Matrix:")
print(correlation_matrix)
# Create a scatterplot matrix
pairs(~age + weight + height, data = data, main = "Scatterplot Matrix")
In this program:
1. We load required libraries, such as `dplyr`, for data manipulation and
analysis.


2. We read the healthcare data from a CSV file using the `read.csv()` function.
Replace `"healthcare_data.csv"` with the path to your dataset.
3. We display the structure of the dataset using `str()` and summary statistics
using `summary()`.
4. We check for missing values in the dataset.
5. We visualize the distributions of key variables (age, weight, height) using
histograms and boxplots.
6. We analyze categorical variables (gender, diagnosis) using the `table()`
function.
7. We calculate correlations between numeric variables using the `cor()`
function and display the correlation matrix.
8. We create a scatterplot matrix to visualize relationships between variables.
9. Additional exploratory analysis can be performed based on the specific
requirements of the analysis.
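`healthcare_data.csv` is not supplied, so the sketch below uses a synthetic patient table in which all columns and values are hypothetical. It shows grouped descriptive statistics with base R's `prop.table()` and `aggregate()`. The derived BMI column is an added example variable.

```r
set.seed(2024)
n <- 200
patients <- data.frame(
  age       = round(runif(n, 20, 80)),
  gender    = sample(c("Female", "Male"), n, replace = TRUE),
  diagnosis = sample(c("Diabetes", "Hypertension", "Healthy"), n, replace = TRUE),
  height    = round(rnorm(n, 168, 9)),   # cm
  weight    = round(rnorm(n, 72, 12))    # kg
)
# Derived variable: body-mass index = weight (kg) / height (m)^2
patients$bmi <- patients$weight / (patients$height / 100)^2

# Counts and percentages for a categorical variable
counts <- table(patients$diagnosis)
print(round(100 * prop.table(counts), 1))

# Mean age and BMI per diagnosis group
group_means <- aggregate(cbind(age, bmi) ~ diagnosis, data = patients, FUN = mean)
print(group_means)

# Cross-tabulation of gender by diagnosis
print(table(patients$gender, patients$diagnosis))
```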


11 Write a program to Perform Predictive analytics on Product Sales data
# Load required libraries
library(ggplot2)
library(dplyr)
library(caret)
# Read the product sales data (replace "sales_data.csv" with your file path);
# the code assumes a numeric `sales` column and a numeric `product_price` column
sales_data <- read.csv("sales_data.csv", stringsAsFactors = FALSE)
# Explore the structure of the data
str(sales_data)
# Summary statistics
summary(sales_data)
# Data preprocessing (if needed)
# For example, converting factors to numeric, handling missing values, etc.
# Note: lm() treats character columns as factors, and predict() fails if the test
# set contains a category that never appears in the training set
# Split the data into training and testing sets
set.seed(123) # For reproducibility
train_index <- createDataPartition(sales_data$sales, p = 0.8, list = FALSE)
train_data <- sales_data[train_index, ]
test_data <- sales_data[-train_index, ]
# Train a linear regression model using all other columns as predictors
lm_model <- lm(sales ~ ., data = train_data)
# Summarize the model
summary(lm_model)
# Make predictions on the test set
predictions <- predict(lm_model, newdata = test_data)
# Evaluate model performance
RMSE <- sqrt(mean((test_data$sales - predictions)^2))
cat("Root Mean Squared Error (RMSE):", RMSE, "\n")
# Visualize actual vs. predicted sales
test_data$predicted_sales <- predictions
ggplot(test_data, aes(x = product_price)) +
  geom_point(aes(y = sales), color = "blue", alpha = 0.5) +
  geom_point(aes(y = predicted_sales), color = "red", alpha = 0.5) +
  labs(title = "Actual (blue) vs. Predicted (red) Sales", x = "Product Price", y = "Sales")
In this program:
1. We load required libraries such as `ggplot2`, `dplyr`, and `caret`.
2. We read the product sales data from a CSV file using `read.csv()`.
3. We explore the structure of the data using `str()` and obtain summary
statistics using `summary()`.
4. We preprocess the data if needed, such as converting factors to numeric or
handling missing values.
5. We split the data into training and testing sets using `createDataPartition()`
from the `caret` package.
6. We train a linear regression model using `lm()` on the training data.
7. We make predictions on the test set using the trained model.
8. We evaluate the model's performance using root mean squared error (RMSE).
9. We visualize actual vs. predicted sales using `ggplot2`.
10. Additional analysis and visualization can be performed based on specific
requirements.
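The program above depends on an external `sales_data.csv`. The sketch below uses a hypothetical simulated sales table with a plain base R 80/20 split. It compares the model's RMSE against a naive baseline that always predicts the training mean, which helps judge whether an RMSE value is actually good.

```r
set.seed(123)
n <- 200
# Hypothetical sales data: sales fall with price and rise with advertising spend
sales_data <- data.frame(
  product_price = runif(n, 5, 50),
  advertising   = runif(n, 0, 100)
)
sales_data$sales <- 500 - 6 * sales_data$product_price +
  2 * sales_data$advertising + rnorm(n, sd = 20)

# 80/20 random split without caret
train_idx  <- sample(n, size = 0.8 * n)
train_data <- sales_data[train_idx, ]
test_data  <- sales_data[-train_idx, ]

lm_model    <- lm(sales ~ product_price + advertising, data = train_data)
predictions <- predict(lm_model, newdata = test_data)

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
model_rmse    <- rmse(test_data$sales, predictions)
baseline_rmse <- rmse(test_data$sales, mean(train_data$sales))  # always predict the mean
cat("Model RMSE:", round(model_rmse, 2), " Baseline RMSE:", round(baseline_rmse, 2), "\n")
```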


12 Write a program to Apply Predictive analytics for Weather forecasting.
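# Load required libraries
library(forecast)
# Read the weather data (replace "weather_data.csv" with your file path);
# the code assumes one row per day with columns named date and temp
weather_data <- read.csv("weather_data.csv")
# Convert date column to Date format and make sure rows are in date order
weather_data$date <- as.Date(weather_data$date)
weather_data <- weather_data[order(weather_data$date), ]
# Create a daily time series object with a yearly cycle
ts_data <- ts(weather_data$temp, frequency = 365)
# Visualize the time series data (the x-axis counts 365-day cycles, not dates)
plot(ts_data, main = "Temperature Time Series Data", xlab = "Time (years)",
     ylab = "Temperature")
# Train ARIMA model
# Note: stats::arima(), which auto.arima() builds on, allows seasonal periods only
# up to 350, so the 365-day cycle may not be modeled; stlf() or Fourier terms
# (fourier() in the forecast package) are common alternatives
arima_model <- auto.arima(ts_data)
# Summary of the ARIMA model
summary(arima_model)
# Forecast future temperatures
forecast_values <- forecast(arima_model, h = 30) # Forecasting next 30 days
# Plot forecasted values
plot(forecast_values, main = "Temperature Forecast")
# Print forecasted values
print(forecast_values)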
In this program:
1. We load the required library `forecast` for time series forecasting.
2. We read the weather data from a CSV file using `read.csv()`.
3. We convert the date column to the Date format using `as.Date()`.
4. We create a time series object using the `ts()` function with the appropriate
frequency.
5. We visualize the time series data using the `plot()` function.
6. We train an ARIMA model using the `auto.arima()` function.


7. We summarize the ARIMA model using the `summary()` function.


8. We forecast future temperatures using the `forecast()` function, specifying the
number of periods to forecast.
9. We plot the forecasted values using the `plot()` function.
10. We print the forecasted values using the `print()` function.
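With daily data and `frequency = 365`, seasonal ARIMA terms are often impractical. Harmonic regression is one alternative, and a different technique from ARIMA. It models the yearly cycle with sine and cosine terms in `lm()`. The sketch below runs on two years of simulated daily temperatures; the data and the 365.25-day period are assumptions.

```r
set.seed(42)
# Hypothetical daily temperatures: two years with an annual cycle plus noise
days <- 1:730
temp <- 25 + 8 * sin(2 * pi * days / 365.25) + rnorm(length(days), sd = 1.5)
train <- data.frame(day = days, temp = temp)

# Regress on one pair of Fourier terms with a 365.25-day period
fit <- lm(temp ~ sin(2 * pi * day / 365.25) + cos(2 * pi * day / 365.25), data = train)

# Forecast the next 30 days by evaluating the fitted curve at future day numbers
future <- data.frame(day = 731:760)
pred <- predict(fit, newdata = future, interval = "prediction", level = 0.95)
print(head(round(pred, 1)))
```

This captures only the smooth seasonal shape. Short-term persistence (a warm day tends to follow a warm day) would need extra terms, such as ARIMA errors on the residuals.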
