Data Analytics Using R Lab - Master Manual

The document is a lab manual for a Data Analytics course using R, aimed at Computer Science and Engineering students in their first semester. It outlines the vision, mission, program outcomes, and specific experiments related to data preprocessing, regression models, and classification techniques. The manual includes guidelines for lab conduct, a list of experiments, and sample code for various data analytics tasks.

Uploaded by

Vinay Kumar Goud
Copyright
© All Rights Reserved

DATA ANALYTICS USING R LAB MANUAL

DATA ANALYTICS USING R LAB


MASTER MANUAL
[AI507PC]

III B.TECH – I SEMESTER


ACADEMIC YEAR : 2024-2025
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING ( AI & ML)

CMR ENGINEERING COLLEGE


(Approved by AICTE- New Delhi, Affiliated to JNTUH)
Kandlakoya(V), Medchal Road, Hyderabad

CSE(AI & ML) Department Vision & Mission
Vision:
To produce admirable and competent graduates & experts in Artificial Intelligence &
Machine Learning by quality technical education, innovations and research to
improve the life style in the society.
Mission:
M1: To impart value based technical education in AI & ML through innovative
teaching and learning methods.
M2: To produce outstanding professionals by imparting quality training, hands-on
experience and value based education.
M3: To produce competent graduates suitable for industries and organizations at global
level including research and development with Social responsibility.

CSE(AI & ML) Program Outcomes [PO’s]:


Engineering Graduates will be able to satisfy these NBA graduate attributes:
1. Engineering knowledge: An ability to apply knowledge of computing,
mathematics, science and engineering fundamentals appropriate to the discipline.
2. Problem analysis: An ability to analyze a problem, and identify and formulate the
computing requirements appropriate to its solution.
3. Design/development of solutions: An ability to design, implement, and evaluate a
computer-based system, process, component, or program to meet desired needs
with appropriate consideration for public health and safety, cultural, societal and
environmental considerations.
4. Conduct investigations of complex problems: An ability to design and conduct
experiments, as well as to analyze and interpret data.
5. Modern tool usage: An ability to use current techniques, skills, and modern tools
necessary for computing practice.
6. The engineer and society: An ability to analyze the local and global impact of
computing on individuals, organizations, and society.
7. Environment and sustainability: Knowledge of contemporary issues.
8. Ethics: An understanding of professional, ethical, legal, security and social issues
and responsibilities.
9. Individual and team work: An ability to function effectively individually and on
teams, including diverse and multidisciplinary, to accomplish a common goal.
10. Communication: An ability to communicate effectively with a range of audiences.
11. Project management and finance: An understanding of engineering and
management principles and apply these to one’s own work, as a member and leader
in a team, to manage projects.
12. Life-long learning: Recognition of the need for and an ability to
engage in continuing professional development.


CSE(AI & ML) Program Educational Outcomes [PEO’s]


1. To provide intellectual environment to successfully pursue higher education in the
area of AI.
2. To impart knowledge in cutting edge Artificial Intelligence technologies on par with
industrial standards.
3. To create an atmosphere to explore research areas and produce outstanding
contributions in various areas of Artificial Intelligence and Machine Learning.

CSE(AI & ML) Program Specific Outcome [PSO’s]


1. Ability to use knowledge in emerging technologies in identifying research gaps and
provide solutions with innovative ideas.
2. Ability to analyze the problem to provide optimal solution by fundamental
knowledge and skills in Professional, Engineering Sciences.


LAB CODE

• Students should report to the concerned lab as per the timetable.
• Students who turn up late to the labs will in no case be permitted to do the
program scheduled for the day.
• After completion of the program, certification by the concerned staff in-charge
in the observation book is necessary.
• Students should bring a notebook of 100 pages and should enter the readings
/observations into the notebook while performing the experiment.
• The record of observations, along with the detailed experimental procedure of
the experiment performed in the immediately preceding session, should be submitted
to and certified by the staff member in-charge.
• The group-wise division made in the beginning should be adhered to and no
mix up of students among different groups will be permitted.
• When the experiment is completed, students should disconnect the setup made by
them and return all the components/instruments taken for the purpose.
• Any damage to the equipment or burnt-out components will be viewed
seriously, either by imposing a penalty or by dismissing the whole group of students
from the lab for the semester/year.
• Students should be present in the labs for the total scheduled duration.
• Students are required to prepare thoroughly to perform the experiment before
coming to the laboratory.


INDEX

S.No.  List Of Experiments

1      Data Preprocessing
       a. Handling missing values
       b. Noise detection removal
       c. Identifying data redundancy and elimination
2      Implement any one imputation model
3      Implement Linear Regression
4      Implement Logistic Regression
5      Implement Decision Tree Induction for classification
6      Implement Random Forest Classifier
7      Implement ARIMA on Time Series data
8      Object segmentation using hierarchical based methods
9      Perform Visualization techniques (types of maps - Bar, Column, Line, Scatter, 3D Cubes etc)
10     Perform Descriptive analytics on healthcare data
11     Perform Predictive analytics on Product Sales data
12     Apply Predictive analytics for Weather forecasting


Program No. : 1

Date:

Problem Statement:
Data Preprocessing
a. Handling missing values
b. Noise detection removal
c. Identifying data redundancy and elimination

Source Code:

A. Handling missing values


# Sample data with missing values
data <- data.frame(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, NA, 5),
  C = c(1, 2, 3, 4, NA)
)

# Display original data
cat("Original Data:\n")
print(data)

# Method 1: Remove rows with missing values
cleaned_data <- na.omit(data)
cat("\nData after removing rows with missing values:\n")
print(cleaned_data)

# Method 2: Imputation (Replace missing values with mean)
mean_imputation <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  return(x)
}
data_mean_imputed <- as.data.frame(lapply(data, mean_imputation))
cat("\nData after mean imputation:\n")
print(data_mean_imputed)

# Method 3: Imputation (Replace missing values with median)
median_imputation <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  return(x)
}
data_median_imputed <- as.data.frame(lapply(data, median_imputation))
cat("\nData after median imputation:\n")
print(data_median_imputed)

# Method 4: Imputation using mice package (Multiple Imputation by Chained Equations)
library(mice)
imputed_data <- mice(data)
imputed_data <- complete(imputed_data)
cat("\nData after imputation using mice package:\n")
print(imputed_data)

Output :
Original Data:
   A  B  C
1  1 NA  1
2  2  2  2
3 NA  3  3
4  4 NA  4
5  5  5 NA

Data after removing rows with missing values:
  A B C
2 2 2 2

Data after mean imputation:
  A        B   C
1 1 3.333333 1.0
2 2 2.000000 2.0
3 3 3.000000 3.0
4 4 3.333333 4.0
5 5 5.000000 2.5

Data after median imputation:
  A B   C
1 1 3 1.0
2 2 2 2.0
3 3 3 3.0
4 4 3 4.0
5 5 5 2.5

Data after imputation using mice package:
(mice draws the imputed values at random, so the filled-in cells can differ between runs unless a seed is set)
  A B C
1 1 3 1
2 2 2 2
3 3 3 3
4 4 3 4
5 5 5 2
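Before choosing between deletion and imputation it helps to know how many values are missing and where. The following is a minimal base-R sketch on the same sample data; the names na_per_column and complete_rows are illustrative and not part of the manual.

```r
# Same sample data as in the listing above
data <- data.frame(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, NA, 5),
  C = c(1, 2, 3, 4, NA)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(data))
print(na_per_column)

# Number of rows that contain no missing value at all
complete_rows <- sum(complete.cases(data))
cat("Complete rows:", complete_rows, "of", nrow(data), "\n")
```

Only one of the five rows is complete, so na.omit() keeps a single row here. On small samples like this one, imputation usually preserves more information than deletion.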


B. Noise detection removal


# Sample data with noise
data <- c(1, 2, 3, 100, 5, 6, 7, 200, 9, 10)

# Display original data
cat("Original Data:\n")
print(data)

# Method 1: Z-score method for outlier detection and removal
z_score_remove_outliers <- function(x, threshold = 3) {
  z <- abs((x - mean(x)) / sd(x))
  outliers <- which(z > threshold)
  x[outliers] <- NA
  return(x)
}

# Apply z-score method
data_without_outliers <- z_score_remove_outliers(data)
cat("\nData after removing outliers using z-score method:\n")
print(data_without_outliers)

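Output:
Original Data:
 [1]   1   2   3 100   5   6   7 200   9  10

Data after removing outliers using z-score method:
 [1]   1   2   3 100   5   6   7 200   9  10

Note: with threshold = 3 no value is flagged on this sample. The mean is 34.3 and the standard deviation is about 65.44, so even 200 has a z-score of only about 2.53. In a sample of n values no z-score can exceed (n - 1) / sqrt(n), which is about 2.85 for n = 10, so a threshold of 3 can never be reached with ten observations.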
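The plain z-score is sensitive to the very outliers it is looking for, because extreme values inflate both the mean and the standard deviation. A common robust alternative centres on the median and scales by the median absolute deviation (MAD). The sketch below is one possible variant, not the manual's method; the function name robust_z_remove_outliers and the threshold of 3 are assumptions made for illustration.

```r
# Same sample data as in the listing above
data <- c(1, 2, 3, 100, 5, 6, 7, 200, 9, 10)

# Robust z-score: centre on the median and scale by the MAD.
# mad() applies the 1.4826 consistency constant by default.
robust_z_remove_outliers <- function(x, threshold = 3) {
  z <- abs(x - median(x)) / mad(x)
  x[z > threshold] <- NA
  return(x)
}

data_robust <- robust_z_remove_outliers(data)
cat("Data after removing outliers using the robust z-score:\n")
print(data_robust)
```

On this sample the median is 6.5 and the MAD is about 5.19, so 100 and 200 score roughly 18 and 37 and are replaced by NA, while every other value stays close to 1 or below.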


C. Identifying data redundancy and elimination

# Sample data with redundancy
data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("John", "Alice", "Bob", "John", "Alice"),
  Age = c(25, 30, 35, 25, 30),
  Gender = c("Male", "Female", "Male", "Male", "Female")
)

# Display original data
cat("Original Data:\n")
print(data)

# ID is unique in every row, so redundancy is judged on the remaining columns
key_cols <- c("Name", "Age", "Gender")

# Method 1: Identifying redundant rows (repeats of an earlier row)
find_redundant_rows <- function(df, cols) {
  duplicated_rows <- duplicated(df[, cols])
  redundant_rows <- df[duplicated_rows, ]
  return(redundant_rows)
}
redundant_rows <- find_redundant_rows(data, key_cols)
cat("\nRedundant Rows:\n")
print(redundant_rows)

# Method 2: Eliminating redundant rows (the first occurrence is kept)
eliminate_redundancy <- function(df, cols) {
  unique_data <- df[!duplicated(df[, cols]), ]
  return(unique_data)
}

cleaned_data <- eliminate_redundancy(data, key_cols)
cat("\nData after eliminating redundancy:\n")
print(cleaned_data)

Output:
Original Data:
ID Name Age Gender
1 1 John 25 Male
2 2 Alice 30 Female
3 3 Bob 35 Male
4 4 John 25 Male
5 5 Alice 30 Female

Redundant Rows:
ID Name Age Gender
4 4 John 25 Male
5 5 Alice 30 Female

Data after eliminating redundancy:


ID Name Age Gender
1 1 John 25 Male
2 2 Alice 30 Female
3 3 Bob 35 Male
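It can also be useful to report how often each record is repeated before deleting anything. A minimal base-R sketch, assuming that two rows count as the same record when Name, Age and Gender agree (the column name Copies is introduced here for illustration):

```r
# Same sample data as in the listing above
data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("John", "Alice", "Bob", "John", "Alice"),
  Age = c(25, 30, 35, 25, 30),
  Gender = c("Male", "Female", "Male", "Male", "Female")
)

# Count how many rows share each Name/Age/Gender combination
copies <- aggregate(ID ~ Name + Age + Gender, data = data, FUN = length)
names(copies)[names(copies) == "ID"] <- "Copies"
print(copies)

# Combinations that occur more than once
repeated <- copies[copies$Copies > 1, ]
cat("\nRecords that appear more than once:\n")
print(repeated)
```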

Program No. : 2
Date:
Problem Statement: Implement any one imputation model

Source Code:
# Sample data with missing values
data <- data.frame(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, NA, 5),
  C = c(1, 2, 3, 4, NA)
)

# Display original data
cat("Original Data:\n")
print(data)

# Imputation model using linear regression.
# Only one row of the sample is complete, so each column is regressed on a
# single other column, using the rows where both were originally observed.
impute_with_regression <- function(data) {
  original <- data
  for (col in colnames(data)) {
    for (pred in setdiff(colnames(data), col)) {
      # Rows still missing in col whose predictor value was observed
      missing_indices <- which(is.na(data[[col]]) & !is.na(original[[pred]]))
      # Rows where both the target and the predictor were observed
      train_indices <- which(!is.na(original[[col]]) & !is.na(original[[pred]]))
      if (length(missing_indices) > 0 && length(train_indices) >= 2) {
        model <- lm(reformulate(pred, response = col), data = original[train_indices, ])
        predicted_values <- predict(model, newdata = original[missing_indices, ])
        data[missing_indices, col] <- predicted_values
      }
    }
  }
  return(data)
}

# Apply imputation model
data_imputed <- impute_with_regression(data)

# Display data after imputation
cat("\nData after imputation using linear regression:\n")
print(data_imputed)

Output:
Original Data:
   A  B  C
1  1 NA  1
2  2  2  2
3 NA  3  3
4  4 NA  4
5  5  5 NA

Data after imputation using linear regression:
  A B C
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5

Program No. : 3

Date:

Problem Statement: Implement Linear Regression

Source Code:
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)

# Perform linear regression


model <- lm(y ~ x)

# Display regression coefficients


cat("Regression Coefficients:\n")
print(coef(model))

# Plot the data points


plot(x, y, main = "Linear Regression", xlab = "X", ylab = "Y", pch = 19, col = "blue")

# Add regression line to the plot


abline(model, col = "red")

# Add legend
legend("topright", legend = "Regression Line", col = "red", lty = 1, cex = 0.8)

Output:

Regression Coefficients:
(Intercept) x
1 1
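Once the model is fitted, it can be used to predict y for x values that were not in the sample. The sketch below repeats the fit so that it runs on its own; the new x values 6 and 7.5 are arbitrary choices for illustration.

```r
# Same sample data and model as in the listing above
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)
model <- lm(y ~ x)

# Predict y for new x values; the column name must match the predictor name
new_points <- data.frame(x = c(6, 7.5))
predicted_y <- predict(model, newdata = new_points)
cat("Predicted values:\n")
print(predicted_y)

# Residuals show how far each observed y lies from the fitted line
cat("Residuals:\n")
print(round(residuals(model), 10))
```

Because the sample lies exactly on the line y = 1 + x, the predictions are 7 and 8.5 and all residuals are zero up to rounding.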


Program No. : 4

Date:

Problem Statement: Implement Logistic Regression

Source Code:

# Sample data
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)

# Perform logistic regression


model <- glm(y ~ x, family = binomial)

# Display regression coefficients


cat("Regression Coefficients:\n")
print(summary(model)$coefficients)

# Plot the data points


plot(x, y, main = "Logistic Regression", xlab = "X", ylab = "Probability", pch = 19, col = "blue")

# Add logistic regression curve to the plot


curve(predict(model, data.frame(x = x), type = "response"), add = TRUE, col = "red")

# Add legend
legend("topright", legend = "Logistic Regression Curve", col = "red", lty = 1, cex = 0.8)

Output:

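Regression Coefficients:

Estimate Std. Error z value Pr(>|z|)

The sample data are perfectly separated: y = 0 for every x <= 4 and y = 1 for every x >= 5. In this situation the maximum-likelihood estimates do not exist. glm() issues the warning "glm.fit: fitted probabilities numerically 0 or 1 occurred" and reports very large coefficient estimates with even larger standard errors. The exact figures depend on where the fitting iterations stop, so they are not reproduced here. Making the classes overlap, for example by setting y[4] to 1 and y[5] to 0, gives finite estimates.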
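Logistic regression needs some overlap between the two classes to produce finite coefficients. The following sketch uses a hypothetical variant of the sample in which the labels at x = 4 and x = 5 are swapped, then predicts probabilities and classes for new points. The data, the new x values and the 0.5 cut-off are all assumptions made for illustration.

```r
# Hypothetical sample in which the two classes overlap at x = 4 and x = 5
x <- 1:10
y <- c(0, 0, 0, 1, 0, 1, 1, 1, 1, 1)

# Fit the logistic regression model
model <- glm(y ~ x, family = binomial)
cat("Regression Coefficients:\n")
print(coef(model))

# Predicted probability of y = 1 for new x values
new_x <- data.frame(x = c(2, 5, 9))
prob <- predict(model, newdata = new_x, type = "response")

# Classify with a 0.5 cut-off
predicted_class <- ifelse(prob > 0.5, 1, 0)
print(data.frame(x = new_x$x, probability = round(prob, 3), class = predicted_class))
```

The slope is positive, so the predicted probability rises with x. The cut-off of 0.5 is a convention; a different threshold may suit applications where one type of error is costlier than the other.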

Program No. : 5

Date:

Problem Statement: Implement Decision Tree Induction for classification

Source Code:

# Install and load the rpart package if not already installed
if (!requireNamespace("rpart", quietly = TRUE)) {
  install.packages("rpart")
}
library(rpart)

# Sample data
data <- data.frame(
  Feature1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Feature2 = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  Class = factor(c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B"))
)

# Perform decision tree induction.
# rpart only splits nodes with at least minsplit = 20 rows by default, so a
# 10-row sample would stay a bare root; minsplit = 2 allows it to be split.
tree_model <- rpart(Class ~ ., data = data, method = "class",
                    control = rpart.control(minsplit = 2))

# Plot the decision tree
plot(tree_model, uniform = TRUE, main = "Decision Tree for Classification")
text(tree_model, use.n = TRUE, all = TRUE, cex = 0.8)

# Output the decision rules
cat("Decision Rules:\n")
print(tree_model)

Output:

Decision Rules:
n= 10

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 10 5 A (0.5000000 0.5000000)
  2) Feature2< 0.5 5 0 A (1.0000000 0.0000000) *
  3) Feature2>=0.5 5 0 B (0.0000000 1.0000000) *
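A fitted tree is normally used to classify observations it has not seen. The sketch below rebuilds the model on the sample data and predicts the class of two new rows. It assumes minsplit = 2 so that a 10-row sample can be split at all (the rpart default is 20), and the new observations are hypothetical.

```r
library(rpart)

# Same sample data as in the listing above
data <- data.frame(
  Feature1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  Feature2 = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
  Class = factor(c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B"))
)

# minsplit = 2 is an assumption made so that this small sample can be split
tree_model <- rpart(Class ~ ., data = data, method = "class",
                    control = rpart.control(minsplit = 2))

# Hypothetical new observations
new_obs <- data.frame(Feature1 = c(11, 12), Feature2 = c(0, 1))

# type = "class" returns the predicted label instead of class probabilities
predicted_class <- predict(tree_model, newdata = new_obs, type = "class")
cat("Predicted classes:\n")
print(predicted_class)
```

In this sample Feature2 separates the two classes exactly (0 for A, 1 for B), so the new rows are labelled A and B regardless of their Feature1 values.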


Program No. : 6

Date:

Problem Statement: Implement Random Forest Classifier

Source Code:

# Install and load the randomForest package if not already installed


if (!requireNamespace("randomForest", quietly = TRUE)) {
install.packages("randomForest")
}
library(randomForest)

# Sample data
data <- iris

# Split data into training and testing sets


set.seed(123) # For reproducibility
train_indices <- sample(1:nrow(data), 0.7 * nrow(data)) # 70% for training
train_data <- data[train_indices, ]
test_data <- data[-train_indices, ]

# Perform Random Forest classification


rf_model <- randomForest(Species ~ ., data = train_data)

# Make predictions on the test set


predictions <- predict(rf_model, newdata = test_data)

# Output predictions
cat("Predictions:\n")
print(predictions)

Output:

Predictions:
(a named factor with 45 entries, one predicted species for each row of test_data, which holds the 30% of the 150 iris rows not used for training; the individual labels depend on the random split and on the fitted forest)
Levels: setosa versicolor virginica
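A list of predicted labels is easier to judge when it is compared with the true labels. The sketch below builds a confusion matrix and an accuracy figure in base R. The two short label vectors are hypothetical stand-ins for test_data$Species and the output of predict(), so the block runs without the randomForest package.

```r
# Hypothetical actual and predicted labels, standing in for
# test_data$Species and the predictions from the listing above
species <- c("setosa", "versicolor", "virginica")
actual <- factor(c("setosa", "setosa", "versicolor", "versicolor",
                   "virginica", "virginica"), levels = species)
predicted <- factor(c("setosa", "setosa", "versicolor", "virginica",
                      "virginica", "virginica"), levels = species)

# Confusion matrix: rows are predictions, columns are true classes
confusion <- table(Predicted = predicted, Actual = actual)
print(confusion)

# Accuracy: share of predictions on the diagonal
accuracy <- sum(diag(confusion)) / sum(confusion)
cat("Accuracy:", round(accuracy, 3), "\n")
```

With the real model, the same two lines apply after replacing the stand-ins: table(Predicted = predictions, Actual = test_data$Species).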


Program No. : 7

Date:

Problem Statement: Implement ARIMA on Time Series data

Source code:
# Install and load the forecast package if not already installed
if (!requireNamespace("forecast", quietly = TRUE)) {
install.packages("forecast")
}
library(forecast)

# Sample time series data


ts_data <- c(20, 25, 30, 35, 40, 45, 50, 55, 60, 65)

# Convert the data to a time series object


ts_data <- ts(ts_data)

# Perform ARIMA modeling


arima_model <- auto.arima(ts_data)

# Generate forecast for the next 3 time points


forecast_data <- forecast(arima_model, h = 3)

# Output forecast data


cat("Forecasted values for the next 3 time points:\n")
print(forecast_data$mean)

Output:

Forecasted values for the next 3 time points:


Time Series:
Start = 11
End = 13
Frequency = 1
[1] 70 75 80
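The forecast of 70, 75 and 80 can be checked by hand: the series rises by exactly 5 at every step, so extending the last value by the average step gives the same numbers. The sketch below is a random-walk-with-drift calculation in base R, offered as a sanity check rather than as a description of the model that auto.arima() selects.

```r
# Same sample series as in the listing above
ts_data <- ts(c(20, 25, 30, 35, 40, 45, 50, 55, 60, 65))

# First differences: the change from one time point to the next
steps <- diff(ts_data)
cat("Step sizes:\n")
print(as.numeric(steps))

# Extend the last observation by the average step for 3 time points
drift <- mean(steps)
last_value <- ts_data[length(ts_data)]
manual_forecast <- last_value + drift * 1:3
cat("Hand-computed forecast:\n")
print(manual_forecast)
```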

Program No. : 8

Date:

Problem Statement: Object segmentation using hierarchical based methods

Source Code:
# Sample data
set.seed(123)
data <- matrix(rnorm(100), ncol = 2)

# Perform hierarchical clustering


hc <- hclust(dist(data))

# Determine clusters
k <- 3
clusters <- cutree(hc, k)

# Output cluster assignments


cat("Cluster Assignments:\n")
print(clusters)

# Plot dendrogram with clusters


plot(hc, main = "Dendrogram with Clusters")
rect.hclust(hc, k = k, border = 2:4)

Output:
Cluster Assignments:
(an integer vector of 50 cluster labels, each 1, 2 or 3, one for every row of data; matrix(rnorm(100), ncol = 2) has 50 rows)
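Random data makes it hard to tell whether the clustering is doing something sensible. The sketch below applies the same hclust() and cutree() calls to nine hand-placed points that form three obvious groups, so the result can be checked by eye. The coordinates are hypothetical.

```r
# Nine hypothetical points forming three well-separated groups
pts <- rbind(
  c(0, 0), c(0, 1), c(1, 0),        # group near the origin
  c(10, 10), c(10, 11), c(11, 10),  # group near (10, 10)
  c(20, 0), c(20, 1), c(21, 0)      # group near (20, 0)
)

# Hierarchical clustering; complete linkage is the hclust() default
hc <- hclust(dist(pts))

# Cut the dendrogram into 3 clusters
clusters <- cutree(hc, k = 3)
cat("Cluster Assignments:\n")
print(clusters)
```

cutree() numbers the clusters in order of first appearance, so the three groups receive the labels 1, 2 and 3 in the order in which they were entered.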


Program No. : 9

Date:

Problem Statement: Perform Visualization techniques (types of maps - Bar, Column, Line,
Scatter, 3D Cubes etc)

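Source Code:
# Note: the listings in this program are written in Python (pandas, matplotlib, seaborn)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Path of the file to read
flight_filepath = "../input/flight_delays.csv"

# Read the file into a variable flight_data
flight_data = pd.read_csv(flight_filepath, index_col="Month")
# Print the data
print(flight_data)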

# Set the width and height of the figure


plt.figure(figsize=(10,6))

# Add title
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")

# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_data.index, y=flight_data['NK'])

# Add label for vertical axis


plt.ylabel("Arrival delay (in minutes)")

Output :


Line Graph:

Source Code:

# Path of the file to read


spotify_filepath = "../input/spotify.csv"

# Read the file into a variable spotify_data


spotify_data = pd.read_csv(spotify_filepath, index_col="Date", parse_dates=True)
# Print the first 5 rows of the data
spotify_data.head()
# Print the last five rows of the data
spotify_data.tail()

# Line chart showing daily global streams of each song


sns.lineplot(data=spotify_data)

Output:


Scatter Graph:

Source Code:

# Path of the file to read


insurance_filepath = "../input/insurance.csv"

# Read the file into a variable insurance_data


insurance_data = pd.read_csv(insurance_filepath)
insurance_data.head()

sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'])

Output:

   age     sex     bmi  children  smoker     region      charges
0   19  female  27.900         0     yes  southwest  16884.92400
1   18    male  33.770         1      no  southeast   1725.55230
2   28    male  33.000         3      no  southeast   4449.46200
3   33    male  22.705         0      no  northwest  21984.47061
4   32    male  28.880         0      no  northwest   3866.85520
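The listings in this program are written in Python, while the rest of the manual uses R. As a rough R counterpart, the sketch below draws a bar chart, a line chart, a scatter plot and a 3D surface with base graphics. It uses the built-in mtcars and AirPassengers datasets instead of the CSV files above, which is an assumption made so that the block runs without external data.

```r
# Bar chart: mean miles per gallon for each cylinder count (built-in mtcars)
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
bar_mids <- barplot(mpg_by_cyl, main = "Mean MPG by Cylinder Count",
                    xlab = "Cylinders", ylab = "Miles per gallon", col = "skyblue")

# Line chart: monthly airline passengers (built-in AirPassengers series)
plot(AirPassengers, type = "l", main = "Monthly Airline Passengers",
     xlab = "Year", ylab = "Passengers (thousands)")

# Scatter plot: car weight against fuel economy
plot(mtcars$wt, mtcars$mpg, pch = 19, col = "blue",
     main = "Weight vs. MPG", xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# 3D surface: z = x^2 + y^2 drawn with persp()
grid <- seq(-2, 2, length.out = 25)
surface <- outer(grid, grid, function(x, y) x^2 + y^2)
persp(grid, grid, surface, theta = 30, phi = 25, col = "lightgreen",
      main = "3D Surface", xlab = "x", ylab = "y", zlab = "z")
```

To plot the CSV data instead, read each file with read.csv() and pass the relevant columns to the same functions.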


Program No. : 10

Date:

Problem Statement: Perform Descriptive analytics on healthcare data

Source Code:

# Load necessary libraries


library(dplyr) # for data manipulation
library(ggplot2) # for data visualization

# Load healthcare data (sample data)


healthcare_data <- read.csv("healthcare_data.csv")

# View the structure of the dataset


str(healthcare_data)

# Summary statistics
summary_stats <- summary(healthcare_data)
print(summary_stats)

# Descriptive statistics for blood pressure
blood_pressure_stats <- summarize(healthcare_data,
  avg_systolic_bp = mean(systolic_bp),
  avg_diastolic_bp = mean(diastolic_bp),
  max_systolic_bp = max(systolic_bp),
  max_diastolic_bp = max(diastolic_bp),
  min_systolic_bp = min(systolic_bp),
  min_diastolic_bp = min(diastolic_bp))
print(blood_pressure_stats)

# Descriptive statistics for cholesterol levels
cholesterol_stats <- summarize(healthcare_data,
  avg_total_cholesterol = mean(total_cholesterol),
  max_total_cholesterol = max(total_cholesterol),
  min_total_cholesterol = min(total_cholesterol))
print(cholesterol_stats)

# Data visualization - Histogram of blood pressure
blood_pressure_hist <- ggplot(healthcare_data, aes(x = systolic_bp)) +
  geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Systolic Blood Pressure", x = "Systolic Blood Pressure", y = "Frequency")
print(blood_pressure_hist)

# Data visualization - Boxplot of cholesterol levels
cholesterol_boxplot <- ggplot(healthcare_data, aes(x = "", y = total_cholesterol)) +
  geom_boxplot(fill = "lightgreen", color = "black") +
  labs(title = "Boxplot of Total Cholesterol Levels", x = "", y = "Total Cholesterol")
print(cholesterol_boxplot)

Output:

## Pregnancies Glucose BloodPressure SkinThickness


## Min. : 0.000 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.000 1st Qu.: 99.0 1st Qu.: 62.00 1st Qu.: 0.00
## Median : 3.000 Median :117.0 Median : 72.00 Median :23.00
## Mean : 3.845 Mean :120.9 Mean : 69.11 Mean :20.54
## 3rd Qu.: 6.000 3rd Qu.:140.2 3rd Qu.: 80.00 3rd Qu.:32.00
## Max. :17.000 Max. :199.0 Max. :122.00 Max. :99.00
## Insulin BMI DiabetesPedigreeFunction Age
## Min. : 0.0 Min. : 0.00 Min. :0.0780 Min. :21.00
## 1st Qu.: 0.0 1st Qu.:27.30 1st Qu.:0.2437 1st Qu.:24.00
## Median : 30.5 Median :32.00 Median :0.3725 Median :29.00
## Mean : 79.8 Mean :31.99 Mean :0.4719 Mean :33.24
## 3rd Qu.:127.2 3rd Qu.:36.60 3rd Qu.:0.6262 3rd Qu.:41.00
## Max. :846.0 Max. :67.10 Max. :2.4200 Max. :81.00
## Outcome
## Min. :0.000
## 1st Qu.:0.000
## Median :0.000
## Mean :0.349
## 3rd Qu.:1.000
## Max. :1.000
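The listing above depends on healthcare_data.csv, which is not included in the manual. The sketch below computes the same kind of descriptive statistics on a small in-memory data frame in base R. The readings are made up for illustration, and the helper name describe is an assumption; only the column names follow the listing.

```r
# Hypothetical readings; the column names follow the listing above
healthcare_data <- data.frame(
  systolic_bp = c(110, 120, 130, 140, 150),
  diastolic_bp = c(70, 75, 80, 85, 90),
  total_cholesterol = c(180, 190, 200, 210, 220)
)

# Mean, standard deviation, minimum and maximum of one numeric column;
# na.rm = TRUE skips missing readings instead of returning NA
describe <- function(x) {
  c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE), max = max(x, na.rm = TRUE))
}

# One row per column, one column per statistic
stats_table <- t(sapply(healthcare_data, describe))
print(round(stats_table, 2))
```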

Program No. : 11

Date:

Problem Statement: Perform Predictive analytics on Product Sales data

Source Code:

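# Load necessary libraries
library(ggplot2) # for data visualization
library(dplyr) # for data manipulation

# Load product sales data (sample data)
sales_data <- read.csv("product_sales_data.csv")

# Convert the date column from text to Date (assumes YYYY-MM-DD values);
# left as text, lm() would treat every date as a separate category
sales_data$date <- as.Date(sales_data$date)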

# View the structure of the dataset


str(sales_data)

# Summary statistics
summary_stats <- summary(sales_data)
print(summary_stats)

# Data visualization - Time series plot of sales


time_series_plot <- ggplot(sales_data, aes(x = date, y = sales)) +
geom_line() +
labs(title = "Time Series Plot of Sales", x = "Date", y = "Sales")
print(time_series_plot)

# Train-test split (80-20 split)


set.seed(123) # For reproducibility
train_indices <- sample(1:nrow(sales_data), 0.8 * nrow(sales_data))
train_data <- sales_data[train_indices, ]
test_data <- sales_data[-train_indices, ]

# Simple linear regression model


sales_lm <- lm(sales ~ date, data = train_data)

# Summary of the linear regression model


summary(sales_lm)

# Predictions on test data


predicted_sales <- predict(sales_lm, newdata = test_data)

# Evaluate model performance


rmse <- sqrt(mean((predicted_sales - test_data$sales)^2))
cat("Root Mean Squared Error (RMSE):", rmse, "\n")

# Plot actual vs. predicted sales


# Store the predictions next to the actual values so both lines share one data frame
test_data$predicted_sales <- predicted_sales
actual_vs_predicted_plot <- ggplot(test_data, aes(x = date)) +
  geom_line(aes(y = sales), color = "blue", linetype = "solid") +
  geom_line(aes(y = predicted_sales), color = "red", linetype = "dashed") +
  labs(title = "Actual vs. Predicted Sales", x = "Date", y = "Sales")
print(actual_vs_predicted_plot)
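Since product_sales_data.csv is not included in the manual, the following sketch runs the same fit, predict and RMSE steps on synthetic data. It also splits the data chronologically, which is a design choice that differs from the random split above: for time-ordered sales, holding out the most recent days gives a more realistic test of forecasting. All values here are made up for illustration.

```r
# Synthetic daily sales with an exact linear trend (illustration only)
sales_data <- data.frame(
  date = as.Date("2024-01-01") + 0:29,
  sales = 100 + 2 * (0:29)
)

# Chronological split: first 24 days for training, last 6 for testing
n_train <- 24
train_data <- sales_data[1:n_train, ]
test_data <- sales_data[(n_train + 1):nrow(sales_data), ]

# Simple linear regression of sales on date (a Date is handled as a number of days)
sales_lm <- lm(sales ~ date, data = train_data)

# Predict the held-out days and measure the error
predicted_sales <- predict(sales_lm, newdata = test_data)
rmse <- sqrt(mean((predicted_sales - test_data$sales)^2))
cat("RMSE on the held-out days:", round(rmse, 6), "\n")
```

The RMSE is essentially zero here only because the synthetic series is an exact straight line; real sales data will show a larger error.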

Output:

Program No. : 12
Problem Statement: Apply Predictive analytics for Weather forecasting

Source Code:

# Load necessary libraries


library(forecast) # for time series forecasting

# Load weather data (sample data)


weather_data <- read.csv("weather_data.csv")

# Convert date column to Date type


weather_data$date <- as.Date(weather_data$date)

# View the structure of the dataset


str(weather_data)

# Summary statistics
summary_stats <- summary(weather_data)
print(summary_stats)

# Data visualization - Time series plot of temperature
# (base plot() draws directly and returns NULL, so there is nothing to store or print)
plot(weather_data$date, weather_data$temperature,
     type = "l", xlab = "Date", ylab = "Temperature",
     main = "Time Series Plot of Temperature")

# Create time series object


weather_ts <- ts(weather_data$temperature, frequency = 365)

# Fit ARIMA model


arima_model <- auto.arima(weather_ts)

# Forecast for the next 7 days


forecast_result <- forecast(arima_model, h = 7)

# Plot the forecast
plot(forecast_result, main = "Forecast for Next 7 Days")

# Print forecasted values


print(forecast_result)
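A forecast is only useful if its error is known. One simple check is to hold back the last few observations, forecast them, and compare against a naive baseline that repeats the last training value. The sketch below shows that baseline in base R on a hypothetical temperature series; a fitted ARIMA model would be expected to beat this error to be worth using.

```r
# Hypothetical daily temperatures (degrees Celsius)
temperature <- c(20, 21, 19, 22, 23, 21, 20, 22, 24, 23)

# Hold back the last 3 days for evaluation
horizon <- 3
train <- head(temperature, -horizon)
actual <- tail(temperature, horizon)

# Naive baseline: repeat the last training value for every future day
naive_forecast <- rep(tail(train, 1), horizon)

# Mean absolute error of the baseline on the held-out days
mae <- mean(abs(actual - naive_forecast))
cat("Naive forecast:", naive_forecast, "\n")
cat("Actual values: ", actual, "\n")
cat("MAE:", mae, "\n")
```

Here the last training value is 20 and the held-out values are 22, 24 and 23, so the absolute errors are 2, 4 and 3 and the MAE is 3.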

Output:
