Data Analytics Using R Lab - Master Manual

The document is a lab manual for a Data Analytics course using R, aimed at third-year, first-semester B.Tech students in Computer Science and Engineering (AI & ML). It outlines the vision, mission, program outcomes, and specific experiments related to data preprocessing, regression models, and classification techniques. The manual includes guidelines for lab conduct, a list of experiments, and sample code for various data analytics tasks.

Uploaded by

Vinay Kumar Goud


DATA ANALYTICS USING R LAB MANUAL

DATA ANALYTICS USING R LAB

MASTER MANUAL
[AI507PC]

III B.TECH – I SEMESTER

ACADEMIC YEAR : 2024-2025
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING ( AI & ML)

CMR ENGINEERING COLLEGE

(Approved by AICTE- New Delhi, Affiliated to JNTUH)
Kandlakoya(V), Medchal Road, Hyderabad

CSE(AI & ML) Department Vision & Mission
Vision:
To produce admirable and competent graduates & experts in Artificial Intelligence &
Machine Learning by quality technical education, innovations and research to
improve the life style in the society.
Mission:
M1: To impart value based technical education in AI & ML through innovative
teaching and learning methods.
M2: To produce outstanding professionals by imparting quality training, hands-on
experience and value based education.
M3: To produce competent graduates suitable for industries and organizations at global
level including research and development with Social responsibility.

CSE(AI & ML) Program Outcomes [PO’s]:

Engineering Graduates will be able to satisfy these NBA graduate attributes:
1. Engineering knowledge: An ability to apply knowledge of computing,
mathematics, science and engineering fundamentals appropriate to the discipline.
2. Problem analysis: An ability to analyze a problem, and identify and formulate the
computing requirements appropriate to its solution.
3. Design/development of solutions: An ability to design, implement, and evaluate a
computer-based system, process, component, or program to meet desired needs
with appropriate consideration for public health and safety, cultural, societal and
environmental considerations.
4. Conduct investigations of complex problems: An ability to design and conduct
experiments, as well as to analyze and interpret data.
5. Modern tool usage: An ability to use current techniques, skills, and modern tools
necessary for computing practice.
6. The engineer and society: An ability to analyze the local and global impact of
computing on individuals, organizations, and society.
7. Environment and sustainability: Knowledge of contemporary issues.
8. Ethics: An understanding of professional, ethical, legal, security and social issues
and responsibilities.
9. Individual and team work: An ability to function effectively individually and on
teams, including diverse and multidisciplinary, to accomplish a common goal.
10. Communication: An ability to communicate effectively with a range of audiences.
11. Project management and finance: An understanding of engineering and
management principles and apply these to one’s own work, as a member and leader
in a team, to manage projects.
12. Life-long learning: Recognition of the need for and an ability to
engage in continuing professional development.


CSE(AI & ML) Program Educational Outcomes [PEO’s]

1. To provide an intellectual environment to successfully pursue higher education in the
area of AI.
2. To impart knowledge in cutting edge Artificial Intelligence technologies on par with
industrial standards.
3. To create an atmosphere to explore research areas and produce outstanding
contributions in various areas of Artificial Intelligence and Machine Learning.

CSE(AI & ML) Program Specific Outcome [PSO’s]

1. Ability to use knowledge in emerging technologies in identifying research gaps and
provide solutions with innovative ideas.
2. Ability to analyze the problem to provide optimal solution by fundamental
knowledge and skills in Professional, Engineering Sciences.


LAB CODE

• Students should report to the concerned lab as per the timetable.
• Students who turn up late to the labs will in no case be permitted to do the
program scheduled for the day.
• After completion of the program, certification by the concerned staff in-charge
in the observation book is necessary.
• Students should bring a notebook of 100 pages and should enter the
readings/observations into the notebook while performing the experiment.
• The record of observations, along with the detailed experimental procedure of
the experiment performed in the immediately preceding session, should be
submitted to and certified by the staff member in-charge.
• The group-wise division made in the beginning should be adhered to and no
mix up of students among different groups will be permitted.
• When the experiment is completed, students should disconnect the setup made by
them and return all the components/instruments taken for the purpose.
• Any damage to the equipment or burnt-out components will be viewed
seriously, either by imposing a penalty or by dismissing the whole group of students
from the lab for the semester/year.
• Students should be present in the labs for the total scheduled duration.
• Students are required to prepare thoroughly to perform the experiment before
coming to the laboratory.


INDEX

S.No.  List of Experiments

1      Data Preprocessing
       a. Handling missing values
       b. Noise detection and removal
       c. Identifying data redundancy and elimination
2      Implement any one imputation model
3      Implement Linear Regression
4      Implement Logistic Regression
5      Implement Decision Tree Induction for classification
6      Implement Random Forest Classifier
7      Implement ARIMA on Time Series data
8      Object segmentation using hierarchical based methods
9      Perform Visualization techniques (types of maps - Bar, Column, Line, Scatter, 3D Cubes etc)
10     Perform Descriptive analytics on healthcare data
11     Perform Predictive analytics on Product Sales data
12     Apply Predictive analytics for Weather forecasting


Program No. : 1

Date:

Problem Statement:
Data Preprocessing
a. Handling missing values
b. Noise detection and removal
c. Identifying data redundancy and elimination

Source Code:

A. Handling missing values

# Sample data with missing values
data <- data.frame(
A = c(1, 2, NA, 4, 5),
B = c(NA, 2, 3, NA, 5),
C = c(1, 2, 3, 4, NA)
)

# Display original data

cat("Original Data:\n")
print(data)

# Method 1: Remove rows with missing values

cleaned_data <- na.omit(data)
cat("\nData after removing rows with missing values:\n")
print(cleaned_data)

# Method 2: Imputation (Replace missing values with mean)

mean_imputation <- function(x) {
x[is.na(x)] <- mean(x, na.rm = TRUE)
return(x)
}
data_mean_imputed <- as.data.frame(lapply(data, mean_imputation))
cat("\nData after mean imputation:\n")
print(data_mean_imputed)

# Method 3: Imputation (Replace missing values with median)

median_imputation <- function(x) {
x[is.na(x)] <- median(x, na.rm = TRUE)
return(x)
}
data_median_imputed <- as.data.frame(lapply(data, median_imputation))
cat("\nData after median imputation:\n")
print(data_median_imputed)

# Method 4: Imputation using mice package (Multiple Imputation by Chained Equations)

library(mice)
imputed_data <- mice(data)
imputed_data <- complete(imputed_data)
cat("\nData after imputation using mice package:\n")
print(imputed_data)

Output:
Original Data:
   A  B  C
1  1 NA  1
2  2  2  2
3 NA  3  3
4  4 NA  4
5  5  5 NA

Data after removing rows with missing values:
  A B C
2 2 2 2

Data after mean imputation:
  A        B   C
1 1 3.333333 1.0
2 2 2.000000 2.0
3 3 3.000000 3.0
4 4 3.333333 4.0
5 5 5.000000 2.5

Data after median imputation:
  A B   C
1 1 3 1.0
2 2 2 2.0
3 3 3 3.0
4 4 3 4.0
5 5 5 2.5

Data after imputation using mice package:
(mice draws imputations at random, so these values differ between runs unless a seed is set)
  A B C
1 1 3 1
2 2 2 2
3 3 3 3
4 4 3 4
5 5 5 2
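
Before choosing between deletion and imputation, it usually helps to measure how much is actually missing. The following is a minimal sketch in base R on the same sample data frame; the names `missing_report` and `n_complete` are introduced here for illustration and are not part of the original program.

```r
# Same sample data as in part A
data <- data.frame(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, NA, 5),
  C = c(1, 2, 3, 4, NA)
)

# Count and percentage of missing values per column
missing_count <- colSums(is.na(data))
missing_report <- data.frame(
  column = names(missing_count),
  n_missing = as.integer(missing_count),
  pct_missing = 100 * as.numeric(missing_count) / nrow(data)
)
print(missing_report)

# Number of rows that contain no missing value at all
n_complete <- sum(complete.cases(data))
cat("Complete rows:", n_complete, "of", nrow(data), "\n")
```

With only one complete row out of five, removing incomplete rows discards most of this sample, which is one reason to prefer an imputation method here.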


B. Noise detection and removal

# Sample data with noise
data <- c(1, 2, 3, 100, 5, 6, 7, 200, 9, 10)

# Display original data

cat("Original Data:\n")
print(data)

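# Method 1: Z-score method for outlier detection and removal
# With only 10 values the classical (mean/sd) z-score cannot exceed
# (n - 1) / sqrt(n), about 2.85, so a threshold of 3 would flag nothing.
# The robust z-score below uses the median and MAD instead.

z_score_remove_outliers <- function(x, threshold = 3) {
  z <- abs((x - median(x)) / mad(x))
  outliers <- which(z > threshold)
  x[outliers] <- NA
  return(x)
}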

# Apply z-score method

data_without_outliers <- z_score_remove_outliers(data)
cat("\nData after removing outliers using z-score method:\n")
print(data_without_outliers)

Output:
Original Data:
[1] 1 2 3 100 5 6 7 200 9 10

Data after removing outliers using z-score method:

[1] 1 2 3 NA 5 6 7 NA 9 10
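
On a sample this small, extreme values inflate the mean and standard deviation so strongly that a classical z-score (value minus mean, divided by standard deviation) stays below 3 for every point. The interquartile-range (IQR) rule is a common alternative that relies on quartiles instead. The sketch below compares the two on the same vector; the variable names are illustrative assumptions, and the 1.5 multiplier is the conventional choice rather than anything prescribed by this manual.

```r
x <- c(1, 2, 3, 100, 5, 6, 7, 200, 9, 10)

# Classical z-score: (value - mean) / standard deviation
z_classic <- abs((x - mean(x)) / sd(x))
z_flags <- z_classic > 3

# IQR rule: flag values more than 1.5 * IQR outside the quartiles
q <- unname(quantile(x, probs = c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
iqr_flags <- x < (q[1] - fence) | x > (q[2] + fence)

cat("Largest classical z-score:", round(max(z_classic), 2), "\n")
cat("Flagged by classical z-score (> 3):", x[z_flags], "\n")
cat("Flagged by IQR rule:", x[iqr_flags], "\n")

# Replace the flagged values with NA, as in the program above
x_clean <- replace(x, iqr_flags, NA)
print(x_clean)
```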


C. Identifying data redundancy and elimination

# Sample data with redundancy
data <- data.frame(
ID = c(1, 2, 3, 4, 5),
Name = c("John", "Alice", "Bob", "John", "Alice"),
Age = c(25, 30, 35, 25, 30),
Gender = c("Male", "Female", "Male", "Male", "Female")
)

# Display original data

cat("Original Data:\n")
print(data)

# Method 1: Identifying redundant rows
# The ID column is unique for every row, so rows are compared on the remaining columns

find_redundant_rows <- function(df, key_cols = setdiff(names(df), "ID")) {
  # duplicated() marks the second and later occurrences of each record
  df[duplicated(df[, key_cols]), ]
}
redundant_rows <- find_redundant_rows(data)
cat("\nRedundant Rows:\n")
print(redundant_rows)

# Method 2: Eliminating redundant rows

eliminate_redundancy <- function(df, key_cols = setdiff(names(df), "ID")) {
  # Keep only the first occurrence of each record
  df[!duplicated(df[, key_cols]), ]
}

cleaned_data <- eliminate_redundancy(data)

cat("\nData after eliminating redundancy:\n")
print(cleaned_data)

Output:
Original Data:
ID Name Age Gender
1 1 John 25 Male
2 2 Alice 30 Female
3 3 Bob 35 Male
4 4 John 25 Male
5 5 Alice 30 Female

Redundant Rows:
ID Name Age Gender
4 4 John 25 Male
5 5 Alice 30 Female

Data after eliminating redundancy:

ID Name Age Gender
1 1 John 25 Male
2 2 Alice 30 Female
3 3 Bob 35 Male

Program. No. : 2
Date:
Problem Statement: Implement any one imputation model

Source Code:
# Sample data with missing values
data <- data.frame(
A = c(1, 2, NA, 4, 5),
B = c(NA, 2, 3, NA, 5),
C = c(1, 2, 3, 4, NA)
)

# Display original data

cat("Original Data:\n")
print(data)

# Imputation model using linear regression

impute_with_regression <- function(data) {
  # Start from mean-filled columns so that every row can serve as a predictor row
  filled <- as.data.frame(lapply(data, function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
  }))
  for (col in colnames(data)) {
    missing_indices <- which(is.na(data[[col]]))
    if (length(missing_indices) > 0) {
      # Regress the column on the OTHER columns only (a response must not predict itself)
      model <- lm(reformulate(setdiff(colnames(data), col), response = col),
                  data = filled[-missing_indices, ])
      filled[missing_indices, col] <- predict(model, newdata = filled[missing_indices, ])
    }
  }
  return(filled)
}

# Apply imputation model

data_imputed <- impute_with_regression(data)

# Display data after imputation

cat("\nData after imputation using linear regression:\n")
print(data_imputed)

Output:
Original Data:
A B C
1 1 NA 1
2 2 2 2
3 NA 3 3
4 4 NA 4
5 5 5 NA

Data after imputation using linear regression:

A B C
1 1.00000 2.999999 1.000000
2 2.00000 2.000000 2.000000
3 3.00000 3.000000 3.000000
4 4.00000 3.999999 4.000000
5 5.00000 5.000000 2.750001

Program. No. : 3

Date:

Problem Statement: Implement Linear Regression

Source Code:
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)

# Perform linear regression

model <- lm(y ~ x)

# Display regression coefficients

cat("Regression Coefficients:\n")
print(coef(model))

# Plot the data points

plot(x, y, main = "Linear Regression", xlab = "X", ylab = "Y", pch = 19, col = "blue")

# Add regression line to the plot

abline(model, col = "red")

# Add legend
legend("topright", legend = "Regression Line", col = "red", lty = 1, cex = 0.8)

Output:

Regression Coefficients:
(Intercept) x
1 1
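
The coefficients reported by lm() for a single predictor can be checked by hand, since the least-squares slope is cov(x, y) / var(x) and the intercept is mean(y) - slope * mean(x). A minimal sketch on the same sample data follows; the names `slope`, `intercept`, `r_squared` and `new_x` are introduced here for illustration.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)

# Closed-form least-squares estimates for simple linear regression
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# Coefficient of determination (R-squared)
fitted_y <- intercept + slope * x
ss_res <- sum((y - fitted_y)^2)
ss_tot <- sum((y - mean(y))^2)
r_squared <- 1 - ss_res / ss_tot

# Predict y for new x values with the fitted line
new_x <- c(6, 7)
predicted_y <- intercept + slope * new_x

cat("Slope:", slope, " Intercept:", intercept, " R-squared:", r_squared, "\n")
cat("Predictions for x = 6, 7:", predicted_y, "\n")
```

Because the sample points lie exactly on the line y = x + 1, the fit is perfect; real data would leave non-zero residuals and an R-squared below 1.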


Program No. : 4

Date:

Problem Statement: Implement Logistic Regression

Source Code:

# Sample data
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 1)

# Perform logistic regression

model <- glm(y ~ x, family = binomial)

# Display regression coefficients

cat("Regression Coefficients:\n")
print(summary(model)$coefficients)

# Plot the data points

plot(x, y, main = "Logistic Regression", xlab = "X", ylab = "Probability", pch = 19, col = "blue")

# Add logistic regression curve to the plot

curve(predict(model, data.frame(x = x), type = "response"), add = TRUE, col = "red")

# Add legend
legend("topright", legend = "Logistic Regression Curve", col = "red", lty = 1, cex = 0.8)

Output:

Regression Coefficients:
              Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -3.2280228  2.7501662 -1.173513 0.2403259764
x            0.5256342  0.4552689  1.154603 0.2484599465
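
The sample labels in the program above switch from 0 to 1 exactly once (at x = 5), so the two classes are perfectly separable; glm() typically warns in that situation and the coefficient estimates become unstable. The sketch below uses hypothetical labels with some overlap (an assumption made only for this illustration) and shows how fitted probabilities are turned into class predictions and an odds ratio.

```r
x <- 1:10
# Hypothetical labels with overlap between the classes (not the manual's data)
y <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)

model <- glm(y ~ x, family = binomial)

# Predicted probability of class 1 for each observation
p_hat <- predict(model, newdata = data.frame(x = x), type = "response")

# Classify with a 0.5 cut-off and compare against the actual labels
y_hat <- as.integer(p_hat > 0.5)
confusion <- table(Actual = y, Predicted = y_hat)
accuracy <- mean(y_hat == y)

# exp(coefficient) is the multiplicative change in the odds per unit increase in x
odds_ratio <- exp(coef(model)[["x"]])

print(confusion)
cat("Accuracy:", accuracy, "\n")
cat("Odds ratio for x:", round(odds_ratio, 3), "\n")
```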

Program No. : 5

Date:

Problem Statement: Implement Decision Tree Induction for classification

Source Code:

# Install and load the rpart package if not already installed

if (!requireNamespace("rpart", quietly = TRUE)) {
install.packages("rpart")
}
library(rpart)

# Sample data
data <- data.frame(
Feature1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
Feature2 = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1),
Class = c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B")
)

# Perform decision tree induction
# rpart's default minsplit = 20 would leave this 10-row sample as a root-only tree
# (and plot() would then fail), so splits on small nodes are allowed explicitly

tree_model <- rpart(Class ~ ., data = data, method = "class",
                    control = rpart.control(minsplit = 2))

# Plot the decision tree

plot(tree_model, uniform = TRUE, main = "Decision Tree for Classification")
text(tree_model, use.n = TRUE, all = TRUE, cex = 0.8)

# Output the decision rules

cat("Decision Rules:\n")
print(tree_model)

Output:

Decision Rules:
n= 10

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 10 5 A (0.5000000 0.5000000)
  2) Feature2< 0.5 5 0 A (1.0000000 0.0000000) *
  3) Feature2>=0.5 5 0 B (0.0000000 1.0000000) *


Program No. : 6

Date:

Problem Statement: Implement Random Forest Classifier

Source Code:

# Install and load the randomForest package if not already installed

if (!requireNamespace("randomForest", quietly = TRUE)) {
install.packages("randomForest")
}
library(randomForest)

# Sample data
data <- iris

# Split data into training and testing sets

set.seed(123) # For reproducibility
train_indices <- sample(1:nrow(data), 0.7 * nrow(data)) # 70% for training
train_data <- data[train_indices, ]
test_data <- data[-train_indices, ]

# Perform Random Forest classification

rf_model <- randomForest(Species ~ ., data = train_data)

# Make predictions on the test set

predictions <- predict(rf_model, newdata = test_data)

# Output predictions
cat("Predictions:\n")
print(predictions)

Output:

Predictions:
[1] setosa setosa setosa setosa setosa setosa setosa
[8] setosa setosa setosa setosa setosa setosa setosa
[15] setosa setosa setosa setosa setosa setosa setosa
[22] setosa setosa setosa setosa setosa setosa setosa
[29] setosa setosa setosa setosa setosa setosa setosa
[36] setosa setosa setosa setosa setosa setosa setosa
[43] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[50] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[57] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[64] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[71] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[78] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[85] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[92] virginica versicolor versicolor versicolor versicolor versicolor versicolor
[99] versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[106] virginica virginica virginica virginica virginica virginica virginica
[113] virginica virginica virginica virginica virginica virginica virginica
[120] virginica virginica virginica virginica virginica virginica virginica
[127] virginica virginica virginica virginica virginica virginica virginica
[134] virginica virginica virginica virginica virginica virginica virginica
[141] virginica virginica virginica virginica virginica virginica virginica
[148] virginica virginica virginica virginica
Levels: setosa versicolor virginica
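
Printing raw predictions says little about how good a classifier is; a confusion matrix and an accuracy figure on the held-out rows are more informative. The sketch below is base R only: it draws a stratified 70/30 split of iris and evaluates a simple threshold rule on Petal.Length that merely stands in for the random forest. The helper `evaluate_predictions` and the cut-points 2.5 and 4.8 are illustrative assumptions, not part of the original program.

```r
data <- iris
set.seed(123)  # for reproducibility

# Stratified 70/30 split: sample 70% of the row indices within each species
idx_by_class <- split(seq_len(nrow(data)), data$Species)
train_indices <- unlist(lapply(idx_by_class, function(idx) {
  sample(idx, size = round(0.7 * length(idx)))
}))
train_data <- data[train_indices, ]
test_data <- data[-train_indices, ]

# Hypothetical helper: confusion matrix and accuracy for two factors with the same levels
evaluate_predictions <- function(actual, predicted) {
  cm <- table(Actual = actual, Predicted = predicted)
  list(confusion = cm, accuracy = sum(diag(cm)) / sum(cm))
}

# Stand-in predictions from a threshold rule on Petal.Length
# (swap in predict(rf_model, newdata = test_data) when randomForest is available)
rule_pred <- cut(test_data$Petal.Length, breaks = c(-Inf, 2.5, 4.8, Inf),
                 labels = levels(data$Species))

result <- evaluate_predictions(test_data$Species, rule_pred)
print(result$confusion)
cat("Accuracy:", round(result$accuracy, 3), "\n")
```

A stratified split keeps the class proportions identical in the training and test sets, which a plain sample() over all rows does not guarantee.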


Program No. : 7

Date:

Problem Statement: Implement ARIMA on Time Series data

Source code:
# Install and load the forecast package if not already installed
if (!requireNamespace("forecast", quietly = TRUE)) {
install.packages("forecast")
}
library(forecast)

# Sample time series data

ts_data <- c(20, 25, 30, 35, 40, 45, 50, 55, 60, 65)

# Convert the data to a time series object

ts_data <- ts(ts_data)

# Perform ARIMA modeling

arima_model <- auto.arima(ts_data)

# Generate forecast for the next 3 time points

forecast_data <- forecast(arima_model, h = 3)

# Output forecast data

cat("Forecasted values for the next 3 time points:\n")
print(forecast_data$mean)

Output:

Forecasted values for the next 3 time points:

Time Series:
Start = 11
End = 13
Frequency = 1
[1] 70 75 80
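
The sample series rises by exactly 5 at every step, so forecasts of 70, 75 and 80 are what a random walk with drift gives: the last observation plus h times the average step. The sketch below reproduces that arithmetic in base R; it is meant as a sanity check on the numbers and does not claim to show what auto.arima() does internally.

```r
ts_data <- ts(c(20, 25, 30, 35, 40, 45, 50, 55, 60, 65))

# Drift: the average change between consecutive observations
drift <- mean(diff(ts_data))

# h-step-ahead forecast = last observed value + h * drift
h <- 1:3
last_value <- as.numeric(ts_data[length(ts_data)])
drift_forecast <- last_value + h * drift

cat("Estimated drift per step:", drift, "\n")
print(drift_forecast)
```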

Program No. : 8

Date:

Problem Statement: Object segmentation using hierarchical based methods

Source Code:
# Sample data
set.seed(123)
data <- matrix(rnorm(100), ncol = 2)

# Perform hierarchical clustering

hc <- hclust(dist(data))

# Determine clusters
k <- 3
clusters <- cutree(hc, k)

# Output cluster assignments

cat("Cluster Assignments:\n")
print(clusters)

# Plot dendrogram with clusters

plot(hc, main = "Dendrogram with Clusters")
rect.hclust(hc, k = k, border = 2:4)

Output:
Cluster Assignments:
[1] 2 2 1 1 1 1 1 1 3 3 2 3 1 1 3 1 3 3 1 1 1 1 3 1 3 2 2 3 3 1 2 2 2 3 2 2 2
[38] 1 3 2 1 3 2 2 1 3 1 3 2 2 2 2 2 1 3 3 2 1 3 1 1 2 2 2 2 2 1 1 1 2 3 1 1 1
[75] 1 1 1 1 2 3 3 3 2 1 1 3 2 2 3 1 1 2 2 3 1 1 2 2 2
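
Cluster labels alone are hard to interpret, so it helps to summarise them. The sketch below uses the same simulated data (50 points in two dimensions), reports the size and centre of each cluster, and cross-tabulates the result of two linkage methods. The names `sizes`, `centers` and `agreement`, and the choice of "average" linkage for comparison, are assumptions made for illustration.

```r
set.seed(123)
data <- matrix(rnorm(100), ncol = 2)  # 50 observations, 2 features

# hclust() uses complete linkage by default; average linkage is fitted for comparison
d <- dist(data)
hc_complete <- hclust(d, method = "complete")
hc_average <- hclust(d, method = "average")

# Cut both trees into k = 3 clusters
k <- 3
clusters <- cutree(hc_complete, k)

# Cluster sizes and per-cluster feature means
sizes <- table(clusters)
centers <- aggregate(as.data.frame(data), by = list(cluster = clusters), FUN = mean)

# Cross-tabulate the assignments of the two linkage methods
agreement <- table(complete = clusters, average = cutree(hc_average, k))

print(sizes)
print(centers)
print(agreement)
```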


Program No. : 9

Date:

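Problem Statement: Perform Visualization techniques (types of maps - Bar, Column, Line,
Scatter, 3D Cubes etc)

Source Code:
# The examples in this program are written in Python (pandas, matplotlib, seaborn)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Path of the file to read
flight_filepath = "../input/flight_delays.csv"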

# Read the file into a variable flight_data

flight_data = pd.read_csv(flight_filepath, index_col="Month")
# Print the data
flight_data

# Set the width and height of the figure

plt.figure(figsize=(10,6))

# Add title
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")

# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_data.index, y=flight_data['NK'])

# Add label for vertical axis

plt.ylabel("Arrival delay (in minutes)")

Output :


Line Graph:

Source Code:

# Path of the file to read

spotify_filepath = "../input/spotify.csv"

# Read the file into a variable spotify_data

spotify_data = pd.read_csv(spotify_filepath, index_col="Date", parse_dates=True)
# Print the first 5 rows of the data
spotify_data.head()
# Print the last five rows of the data
spotify_data.tail()

# Line chart showing daily global streams of each song

sns.lineplot(data=spotify_data)

Output:


Scatter Graph:

Source Code:

# Path of the file to read

insurance_filepath = "../input/insurance.csv"

# Read the file into a variable insurance_data

insurance_data = pd.read_csv(insurance_filepath)
insurance_data.head()

sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'])

Output:

   age     sex     bmi  children smoker     region      charges
0   19  female  27.900         0    yes  southwest  16884.92400
1   18    male  33.770         1     no  southeast   1725.55230
2   28    male  33.000         3     no  southeast   4449.46200
3   33    male  22.705         0     no  northwest  21984.47061
4   32    male  28.880         0     no  northwest   3866.85520
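
The same three chart types can also be drawn in R with base graphics. The sketch below is a rough equivalent that uses data sets shipped with R (airquality, AirPassengers and mtcars) instead of the CSV files above, so those data sets and all titles and labels are assumptions of this illustration.

```r
# Bar chart: mean temperature per month (built-in airquality data set)
avg_temp <- tapply(airquality$Temp, airquality$Month, mean)
barplot(avg_temp, main = "Average Temperature by Month",
        xlab = "Month", ylab = "Temperature (F)", col = "skyblue")

# Line chart: monthly airline passengers (built-in AirPassengers time series)
plot(AirPassengers, type = "l", main = "Monthly Airline Passengers",
     xlab = "Year", ylab = "Passengers (thousands)", col = "blue")

# Scatter plot: car weight against fuel efficiency (built-in mtcars data set)
plot(mtcars$wt, mtcars$mpg, main = "Weight vs. Fuel Efficiency",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     pch = 19, col = "darkgreen")
```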


Program No. : 10

Date:

Problem Statement: Perform Descriptive analytics on healthcare data

Source Code:

# Load necessary libraries

library(dplyr) # for data manipulation
library(ggplot2) # for data visualization

# Load healthcare data (sample data)

healthcare_data <- read.csv("healthcare_data.csv")

# View the structure of the dataset

str(healthcare_data)

# Summary statistics
summary_stats <- summary(healthcare_data)
print(summary_stats)

# Descriptive statistics for blood pressure

blood_pressure_stats <- summarize(healthcare_data,
avg_systolic_bp = mean(systolic_bp),
avg_diastolic_bp = mean(diastolic_bp),
max_systolic_bp = max(systolic_bp),
max_diastolic_bp = max(diastolic_bp),
min_systolic_bp = min(systolic_bp),
min_diastolic_bp = min(diastolic_bp))
print(blood_pressure_stats)

# Descriptive statistics for cholesterol levels

cholesterol_stats <- summarize(healthcare_data,
avg_total_cholesterol = mean(total_cholesterol),
max_total_cholesterol = max(total_cholesterol),
min_total_cholesterol = min(total_cholesterol))
print(cholesterol_stats)

# Data visualization - Histogram of blood pressure

blood_pressure_hist <- ggplot(healthcare_data, aes(x = systolic_bp)) +
geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
labs(title = "Histogram of Systolic Blood Pressure", x = "Systolic Blood Pressure", y = "Frequency")
print(blood_pressure_hist)

# Data visualization - Boxplot of cholesterol levels

cholesterol_boxplot <- ggplot(healthcare_data, aes(x = "", y = total_cholesterol)) +
geom_boxplot(fill = "lightgreen", color = "black") +
labs(title = "Boxplot of Total Cholesterol Levels", x = "", y = "Total Cholesterol")
print(cholesterol_boxplot)

Output:

## Pregnancies Glucose BloodPressure SkinThickness

## Min. : 0.000 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1.000 1st Qu.: 99.0 1st Qu.: 62.00 1st Qu.: 0.00
## Median : 3.000 Median :117.0 Median : 72.00 Median :23.00
## Mean : 3.845 Mean :120.9 Mean : 69.11 Mean :20.54
## 3rd Qu.: 6.000 3rd Qu.:140.2 3rd Qu.: 80.00 3rd Qu.:32.00
## Max. :17.000 Max. :199.0 Max. :122.00 Max. :99.00
## Insulin BMI DiabetesPedigreeFunction Age
## Min. : 0.0 Min. : 0.00 Min. :0.0780 Min. :21.00
## 1st Qu.: 0.0 1st Qu.:27.30 1st Qu.:0.2437 1st Qu.:24.00
## Median : 30.5 Median :32.00 Median :0.3725 Median :29.00
## Mean : 79.8 Mean :31.99 Mean :0.4719 Mean :33.24
## 3rd Qu.:127.2 3rd Qu.:36.60 3rd Qu.:0.6262 3rd Qu.:41.00
## Max. :846.0 Max. :67.10 Max. :2.4200 Max. :81.00
## Outcome
## Min. :0.000
## 1st Qu.:0.000
## Median :0.000
## Mean :0.349
## 3rd Qu.:1.000
## Max. :1.000
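
The per-variable statistics computed above with summarize() can also be collected into one table with base R. The sketch below uses a small hypothetical data frame with the column names from the source code (systolic_bp, diastolic_bp, total_cholesterol); the values are invented for illustration, and `describe` is a helper introduced here.

```r
# Hypothetical sample values; healthcare_data.csv itself is not reproduced here
healthcare_data <- data.frame(
  systolic_bp = c(118, 125, 140, 132, 110, 150),
  diastolic_bp = c(76, 82, 90, 85, 70, 95),
  total_cholesterol = c(180, 210, 240, 195, 170, 260)
)

# Descriptive statistics for one numeric vector, ignoring missing values
describe <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE))
}

# One row per variable, one column per statistic
stats_table <- t(sapply(healthcare_data, describe))
print(round(stats_table, 2))
```

Passing na.rm = TRUE matters on real clinical data: a single missing reading would otherwise turn the mean, minimum and maximum into NA.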

Program NO. : 11

Date:

Problem Statement: Perform Predictive analytics on Product Sales data

Source Code:

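# Load necessary libraries

library(ggplot2) # for data visualization
library(dplyr) # for data manipulation
library(lmtest) # for diagnostic tests on linear models (lm() itself is in base R)

# Load product sales data (sample data)

sales_data <- read.csv("product_sales_data.csv")

# Convert the date column to Date type; left as text, lm() would treat every date as a separate level
sales_data$date <- as.Date(sales_data$date)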

# View the structure of the dataset

str(sales_data)

# Summary statistics
summary_stats <- summary(sales_data)
print(summary_stats)

# Data visualization - Time series plot of sales

time_series_plot <- ggplot(sales_data, aes(x = date, y = sales)) +
geom_line() +
labs(title = "Time Series Plot of Sales", x = "Date", y = "Sales")
print(time_series_plot)

# Train-test split (80-20 split)

set.seed(123) # For reproducibility
train_indices <- sample(1:nrow(sales_data), 0.8 * nrow(sales_data))
train_data <- sales_data[train_indices, ]
test_data <- sales_data[-train_indices, ]

# Simple linear regression model

sales_lm <- lm(sales ~ date, data = train_data)

# Summary of the linear regression model

summary(sales_lm)

# Predictions on test data

predicted_sales <- predict(sales_lm, newdata = test_data)

# Evaluate model performance

rmse <- sqrt(mean((predicted_sales - test_data$sales)^2))
cat("Root Mean Squared Error (RMSE):", rmse, "\n")

# Plot actual vs. predicted sales

actual_vs_predicted_plot <- ggplot() +
geom_line(data = test_data, aes(x = date, y = sales), color = "blue", linetype = "solid") +

geom_line(data = test_data, aes(x = date, y = predicted_sales), color = "red", linetype = "dashed") +
labs(title = "Actual vs. Predicted Sales", x = "Date", y = "Sales")
print(actual_vs_predicted_plot)

Output:
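
Because product_sales_data.csv is not part of the manual, the workflow can be tried on synthetic data. The sketch below generates 100 days of hypothetical sales with a linear trend plus noise, splits them chronologically (an alternative to the random split used above, often preferred for time-ordered data), and reports RMSE and MAE. Every number in the data-generating step is an assumption.

```r
set.seed(123)

# Synthetic daily sales: linear trend plus random noise (hypothetical data)
sales_data <- data.frame(
  date = seq(as.Date("2024-01-01"), by = "day", length.out = 100)
)
sales_data$day <- as.numeric(sales_data$date - min(sales_data$date))
sales_data$sales <- 200 + 1.5 * sales_data$day + rnorm(100, sd = 10)

# Chronological 80/20 split: train on the first 80 days, test on the last 20
n_train <- floor(0.8 * nrow(sales_data))
train_data <- sales_data[seq_len(n_train), ]
test_data <- sales_data[-seq_len(n_train), ]

# Simple linear regression of sales on the day index
sales_lm <- lm(sales ~ day, data = train_data)
predicted_sales <- predict(sales_lm, newdata = test_data)

# Error metrics on the held-out period
rmse <- sqrt(mean((predicted_sales - test_data$sales)^2))
mae <- mean(abs(predicted_sales - test_data$sales))
cat("RMSE:", round(rmse, 2), " MAE:", round(mae, 2), "\n")
```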

Program NO. : 12
Problem Statement: Apply Predictive analytics for Weather forecasting

Source Code:

# Load necessary libraries

library(forecast) # for time series forecasting

# Load weather data (sample data)

weather_data <- read.csv("weather_data.csv")

# Convert date column to Date type

weather_data$date <- as.Date(weather_data$date)

# View the structure of the dataset

str(weather_data)

# Summary statistics
summary_stats <- summary(weather_data)
print(summary_stats)

# Data visualization - Time series plot of temperature
# plot() draws directly and returns NULL, so there is nothing to assign or print

plot(weather_data$date, weather_data$temperature,
     type = "l", xlab = "Date", ylab = "Temperature",
     main = "Time Series Plot of Temperature")

# Create time series object

weather_ts <- ts(weather_data$temperature, frequency = 365)

# Fit ARIMA model

arima_model <- auto.arima(weather_ts)

# Forecast for the next 7 days

forecast_result <- forecast(arima_model, h = 7)

# Plot the forecast

plot(forecast_result, main = "Forecast for Next 7 Days")

# Print forecasted values

print(forecast_result)

Output:
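
Since weather_data.csv is not included, a forecast can be sanity-checked on synthetic data by holding out the last 7 days and comparing the model with a naive baseline. The sketch below uses base R's arima() with a fixed AR(1) order, which is a simplification and not the model search performed by auto.arima(); the synthetic temperatures and all variable names are assumptions.

```r
set.seed(42)

# Synthetic daily temperatures (stand-in for weather_data$temperature)
days <- 1:120
temperature <- 25 + 5 * sin(2 * pi * days / 30) + rnorm(120, sd = 1)

# Hold out the last 7 days for evaluation
train <- temperature[1:113]
test <- temperature[114:120]

# AR(1) model with a mean term, fitted by maximum likelihood
fit <- arima(train, order = c(1, 0, 0), method = "ML")
arima_forecast <- as.numeric(predict(fit, n.ahead = 7)$pred)

# Naive baseline: repeat the last observed value
naive_forecast <- rep(train[length(train)], 7)

# Mean absolute error of each forecast on the held-out week
mae <- function(actual, predicted) mean(abs(actual - predicted))
cat("MAE, AR(1):", round(mae(test, arima_forecast), 2), "\n")
cat("MAE, naive:", round(mae(test, naive_forecast), 2), "\n")
```

A model that cannot beat the naive baseline on held-out days is usually a sign that the order, the seasonal period, or the amount of history needs revisiting.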
