MALLA REDDY COLLEGE OF ENGINEERING
(Approved by AICTE-New Delhi, Affiliated to JNTUH-Hyderabad) Recognised under
Section 2(f) & 12(B) of the UGC Act 1956,
An ISO 9001:2015 Certified Institution.
Maisammaguda, Dhulapally, post via Kompally, Secunderabad - 500100
Department Of Computer Science and
Engineering–Data Science
R PROGRAMMING LAB MANUAL
Subject Code : DS504PC
Class : III Year I Sem
Regulation : R22
Academic Year : 2024-2025
SYLLABUS
B.TECH III Year I Sem.  L T P C
                        0 0 2 1
DS505PC: R PROGRAMMING LAB
Pre-requisites: Any programming language.
Course Objectives:
Familiarize with R basic programming concepts, various data structures for
handling datasets, various graph representations and Exploratory Data
Analysis concepts
Course Outcomes:
Setup R programming environment.
Understand and use R – Data types and R – Data Structures.
Develop programming logic using R – Packages.
Analyze data sets using R – programming capabilities
LIST OF EXPERIMENTS:
1. Download and install R-Programming environment and install basic packages
using install.packages() command in R.
2. Learn all the basics of R-Programming (Data types, Variables, Operators etc.)
3. Write R command to
i) Illustrate summation, subtraction, multiplication, and division operations
on vectors.
ii) Enumerate multiplication and division operations between matrices and
vectors in R console
4. Write R command to
i) Illustrate the usage of Vector subsetting and Matrix subsetting
ii) Write a program to create an array of 3×3 matrices with 3 rows and 3 columns.
5. Write an R program to draw i) Pie chart ii) 3D Pie Chart, iii) Bar Chart along
with chart legend by considering suitable CSV file
6. Create a CSV file having Speed and Distance attributes with 1000 records. Write
R program to draw i) Box plots
ii) Histogram
iii) Line Graph
iv) Multiple line graphs
v) Scatter plot
to demonstrate the relation between the car's speed and the distance.
7. Implement different data structures in R (Vectors, Lists, Data Frames)
8. Write an R program to read a CSV file and analyze the data in the file using EDA
(Exploratory Data Analysis) techniques.
9. Write an R program to illustrate Linear Regression and Multiple Linear Regression
considering suitable CSV file
Experiment 1: Download and install R-Programming environment and install basic
packages using install.packages() command in R.
STEP BY STEP GUIDE TO INSTALL R :
1) Download the installable file from the following link:
https://cran.r-project.org/bin/windows/base/
2) Click on the R 3.2.2.exe file. Here 3.2.2 is the version number of the file; it changes with
each new release.
3) The setup will request permission to be installed on the system. Click Yes to proceed.
4) Select the preferred language from the drop down to begin an installation in that preferred
language.
5) Click next to proceed with the installation.
6) Choose the path where you wish to install R by clicking on Browse and changing the
location, or click Next to proceed with the default location. The minimum space
requirements are mentioned at the bottom of the dialog box. Please check that you have the
required amount of free space on your drive.
7) Choose the type of installation you require. By default R installs both the 32 and 64 bit
versions on your system.
8) To customize the startup options for R, choose the customize option. To proceed with a
vanilla installation, click Next.
9) Create program shortcuts and name them as per your requirement, specifying any
necessary customizations. To proceed with the default installation, click Next.
10) Click on the next button to begin installation.
11) After the installation has completed you will see the final screen. Click Finish to
complete the installation.
12) Open Start Menu and you will find R in the available set of Programs.
13) Click on the R icon in the menu settings to open R.
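The steps above install R itself. The second half of the experiment, installing basic packages with install.packages(), might look like the sketch below. The package list, the helper name missing_packages and the CRAN mirror URL are assumptions chosen for illustration, not part of the original manual.

```r
# Packages used later in this manual (an assumed starting set; adjust as needed).
required <- c("ggplot2", "dplyr", "readr", "plotrix")

# Helper (hypothetical name): return the packages that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required)
print(to_install)

# Install only what is missing. The call runs in an interactive session
# (the R console), needs an internet connection, and uses an assumed mirror.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# After installation a package is loaded with library(), for example:
# library(ggplot2)
```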
Experiment 2: Learn all the basics of R-Programming (Data types, Variables, Operators etc.)
The variables are assigned with R-Objects and the data type of the R-object becomes the data
type of the variable. There are many types of R-objects. The frequently used are:
Vectors
Lists
Matrices
Arrays
Factors
Data Frames
The simplest of these objects is the vector object and there are six data types of these atomic
vectors, also termed as six classes of vectors. The other R-Objects are built upon the atomic
vectors.
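As a small illustration of the six atomic vector types mentioned above, the following sketch creates one value of each type and prints its class. The variable name examples is chosen only for this illustration.

```r
# One example value for each of the six atomic vector types.
examples <- list(
  logical   = TRUE,
  integer   = 5L,
  numeric   = 3.14,
  complex   = 2 + 3i,
  character = "R",
  raw       = charToRaw("A")
)

# class() reports the type of each value.
for (name in names(examples)) {
  cat(name, ":", class(examples[[name]]), "\n")
}
```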
In R, the c() function is used to create a vector. It returns a vector, that is, a one-dimensional
sequence of values. The c() function is a generic function which combines its arguments. All
arguments are coerced to a common data type, which is the type of the returned value. There
are various other ways to create a vector in R, such as the following:
1) Using the colon(:) operator
We can create a vector with the help of the colon operator, using the following
syntax:
z<-x:y
This operator creates a vector with elements from x to y and assigns it to z.
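A short sketch of the colon operator, together with the type coercion that c() applies to mixed arguments. The values are arbitrary examples.

```r
# Colon operator: integers from 2 to 6.
z <- 2:6
print(z)             # 2 3 4 5 6

# The operator also counts downwards and accepts non-integer start values.
print(5:1)           # 5 4 3 2 1
print(1.5:4.5)       # 1.5 2.5 3.5 4.5

# c() coerces mixed arguments to one common type (here: character).
mixed <- c(1, TRUE, "a")
print(mixed)
print(class(mixed))  # "character"
```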
Vectors: To create a vector with more than one element, use the c() function, which combines
the elements into a vector.
# Create a vector.
apple <- c('red','green',"yellow")
print(apple)
# Get the class of the vector.
print(class(apple))
When we execute the above code, it produces the following result:
[1] "red" "green" "yellow"
[1] "character"
Lists
A list is an R-object which can contain many different types of elements inside it like vectors,
functions and even another list inside it.
# Create a list.
list1 <- list(c(2,5,3),21.3,sin)
# Print the list.
print(list1)
When we execute the above code, it produces the following result:
[[1]]
[1] 2 5 3
[[2]]
[1] 21.3
[[3]]
function (x) .Primitive("sin")
Matrices
A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the
matrix function.
# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow=2,ncol=3,byrow = TRUE)
print(M)
When we execute the above code, it produces the following result:
     [,1] [,2] [,3]
[1,] "a"  "a"  "b"
[2,] "c"  "b"  "a"
Arrays
While matrices are confined to two dimensions, arrays can have any number of dimensions. The
array function takes a dim argument which sets the size of each dimension. The example
below creates an array of two 3x3 matrices; the two input values are recycled to fill every cell.
# Create an array.
a <- array(c('green','yellow'),dim=c(3,3,2))
print(a)
When we execute the above code, it produces the following result:
, , 1

     [,1]     [,2]     [,3]
[1,] "green"  "yellow" "green"
[2,] "yellow" "green"  "yellow"
[3,] "green"  "yellow" "green"

, , 2

     [,1]     [,2]     [,3]
[1,] "yellow" "green"  "yellow"
[2,] "green"  "yellow" "green"
[3,] "yellow" "green"  "yellow"
Variable: A variable provides us with named storage that our programs can manipulate. A
variable in R can store an atomic vector, a group of atomic vectors or a combination of many
R-objects. A valid variable name consists of letters, numbers and the dot or underscore characters.
The variable name must start with a letter, or with a dot that is not followed by a number.
Variable Name          Validity   Reason
var_name2.             valid      Has letters, numbers, dot and underscore
var_name%              invalid    Has the character '%'. Only dot (.) and underscore are allowed
2var_name              invalid    Starts with a number
.var_name, var.name    valid      Can start with a dot (.), but the dot should not be followed by a number
.2var_name             invalid    The starting dot is followed by a number, making it invalid
_var_name              invalid    Starts with _, which is not valid
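The naming rules can also be checked in R itself. The sketch below relies on the base function make.names(), which rewrites a syntactically invalid name into a valid one; a name that comes back unchanged is therefore valid. The vector names candidates and is_valid are chosen only for this illustration.

```r
# Candidate names taken from the table of variable names.
candidates <- c("var_name2.", "var_name%", "2var_name",
                ".var_name", "var.name", ".2var_name", "_var_name")

# make.names() leaves valid names untouched and alters invalid ones.
is_valid <- make.names(candidates) == candidates

print(data.frame(name = candidates, valid = is_valid))
```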
Operators: An operator is a symbol that tells the interpreter to perform specific mathematical or
logical manipulations. The R language is rich in built-in operators and provides the following
types of operators.
Types of Operators :
Arithmetic Operators
Relational Operators
Logical Operators
Assignment Operators
Miscellaneous Operators
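Arithmetic Operators
The arithmetic operators +, -, *, /, %% (remainder), %/% (integer division) and ^ (exponent)
act on each element of a vector.
Relational Operators
Each element of the first vector is compared with the corresponding element of the second
vector. The result of the comparison is a logical value.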
Operator   Description
>          Checks if each element of the first vector is greater than the corresponding element of the second vector.
<          Checks if each element of the first vector is less than the corresponding element of the second vector.
==         Checks if each element of the first vector is equal to the corresponding element of the second vector.
<=         Checks if each element of the first vector is less than or equal to the corresponding element of the second vector.
>=         Checks if each element of the first vector is greater than or equal to the corresponding element of the second vector.
!=         Checks if each element of the first vector is unequal to the corresponding element of the second vector.
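The arithmetic and relational operators can be tried on two small vectors, as in the following sketch. The vectors v and w hold arbitrary example values.

```r
v <- c(2, 5.5, 6)
w <- c(8, 3, 4)

# Arithmetic operators work element by element.
print(v + w)    # 10.0  8.5 10.0
print(v - w)
print(v * w)
print(v / w)
print(v %% w)   # remainder of the division
print(v %/% w)  # integer quotient
print(v ^ 2)    # power

# Relational operators return a logical vector.
print(v > w)    # FALSE  TRUE  TRUE
print(v == w)   # FALSE FALSE FALSE
print(v != w)   # TRUE TRUE TRUE
```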
Logical Operators
The table below shows the logical operators supported by the R language. They apply only to
vectors of type logical, numeric or complex. All non-zero numbers are treated as the logical
value TRUE, and zero as FALSE.
Each element of the first vector is combined with the corresponding element of the second
vector. The result is a logical (Boolean) value.
Operator   Description
&          Element-wise logical AND. It combines each element of the first vector with the corresponding element of the second vector and gives TRUE if both elements are TRUE.
|          Element-wise logical OR. It combines each element of the first vector with the corresponding element of the second vector and gives TRUE if at least one of the elements is TRUE.
!          Logical NOT. It takes each element of the vector and gives the opposite logical value.
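A brief sketch of the three logical operators on example vectors. The last line shows how numeric values behave as logical values.

```r
a <- c(TRUE, FALSE, TRUE, FALSE)
b <- c(TRUE, TRUE, FALSE, FALSE)

print(a & b)   # TRUE FALSE FALSE FALSE
print(a | b)   # TRUE  TRUE  TRUE FALSE
print(!a)      # FALSE  TRUE FALSE  TRUE

# Numbers count as TRUE unless they are zero.
print(c(3, 0, -1) & c(1, 1, 1))   # TRUE FALSE  TRUE
```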
Assignment Operators:
These operators are used to assign values to vectors.
Operator              Description
<-  or  =  or  <<-    Called Left Assignment
->  or  ->>           Called Right Assignment
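The assignment operators can be compared side by side, as sketched below. The function increment and the variable counter are hypothetical names used to show that <<- assigns to a variable in an enclosing environment rather than creating a local one.

```r
x <- 10        # left assignment
y = 20         # left assignment with =
30 -> z        # right assignment

# <<- changes 'counter' outside the function instead of making a local copy.
counter <- 0
increment <- function() {
  counter <<- counter + 1
}
increment()
increment()

print(c(x, y, z))   # 10 20 30
print(counter)      # 2
```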
Experiment 3: Write R command to
a) Illustrate summation, subtraction, multiplication, and division operations on
vectors.
PROGRAM:
Summation of vectors:
# Define two vectors
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
# Perform summation
sum_result <- vec1 + vec2
print(sum_result)
Subtraction of vectors:
# Perform subtraction
sub_result <- vec1 - vec2
print(sub_result)
Multiplication of vectors:
# Perform multiplication
mul_result <- vec1 * vec2
print(mul_result)
Division of vectors:
# Perform division
div_result <- vec2 / vec1
print(div_result)
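The list of experiments also asks for multiplication and division between matrices and vectors. A minimal sketch follows, with an arbitrary example matrix m and vectors v and w. It contrasts element-wise operations, where the vector is recycled, with true matrix multiplication using %*%.

```r
# A 2 x 3 matrix (filled column by column) and a vector of length 2.
m <- matrix(1:6, nrow = 2)
v <- c(10, 20)

# Element-wise multiplication and division: the vector is recycled
# down the columns, so row 1 uses 10 and row 2 uses 20.
print(m * v)
print(m / v)

# Matrix multiplication uses %*%; a 2 x 3 matrix times a vector of
# length 3 gives a 2 x 1 result holding the row sums here.
w <- c(1, 1, 1)
print(m %*% w)
```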
Experiment 4: Write R command to
i) Illustrate the usage of Vector subsetting and Matrix subsetting
ii) Write a program to create an array of 3×3 matrices with 3 rows and 3
columns.
i) Illustrate the usage of Vector subsetting and Matrix subsetting:
PROGRAM:
# Vector subsetting
my_vector <- c(1, 2, 3, 4, 5)
subset_vector <- my_vector[c(2, 4)] # Selecting elements at index 2 and 4
print(subset_vector)
# Matrix subsetting
my_matrix <- matrix(1:9, nrow = 3)
subset_matrix <- my_matrix[2:3, 1:2] # Selecting rows 2 and 3, columns 1 and 2
print(subset_matrix)
ii) Write a program to create an array of 3×3 matrices with 3 rows and 3 columns:
PROGRAM:
# Create an array of 3x3 matrices
num_matrices <- 3
matrix_rows <- 3
matrix_cols <- 3
# Initialize an empty list to store matrices
matrix_list <- list()
# Populate the list with matrices
for (i in 1:num_matrices) {
# Create a 3x3 matrix
my_matrix <- matrix(1:(matrix_rows * matrix_cols), nrow = matrix_rows)
# Add the matrix to the list
matrix_list[[i]] <- my_matrix
}
# Convert the list of matrices to a numeric array
# (unlist() flattens the matrices so array() receives the 27 values)
my_array <- array(unlist(matrix_list), dim = c(matrix_rows, matrix_cols, num_matrices))
# Print the array
print(my_array)
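The same kind of array can also be built directly with a single array() call. The sketch below uses the values 1 to 27, an arbitrary choice, and shows how one matrix or one element is selected from the result.

```r
# Build three 3 x 3 matrices in one call: dim = rows, columns, matrices.
arr <- array(1:27, dim = c(3, 3, 3))

# The second matrix holds the values 10 to 18.
print(arr[, , 2])

# A single element: row 2, column 3 of the first matrix.
print(arr[2, 3, 1])   # 8

print(dim(arr))       # 3 3 3
```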
Experiment 5: Write an R program to draw i) Pie chart, ii) 3D Pie Chart, iii) Bar Chart along
with chart legend by considering suitable CSV file
PROGRAM:
i. Pie chart
# Create data for the graph.
geeks<- c(23, 56, 20, 63)
labels <- c("Mumbai", "Pune", "Chennai", "Bangalore")
# Plot the chart with title and rainbow
# color palette.
pie(geeks, labels, main = "City pie chart",
col = rainbow(length(geeks)))
3D pie chart: A 3D pie chart showing the same data as the regular pie chart
PROGRAM:
# Get the library.
library(plotrix)
# Create data for the graph.
geeks <- c(23, 56, 20, 63)
labels <- c("Mumbai", "Pune", "Chennai", "Bangalore")
piepercent<- round(100 * geeks / sum(geeks), 1)
# Plot the chart.
pie3D(geeks, labels = piepercent,
main = "City pie chart", col = rainbow(length(geeks)))
legend("topright", c("Mumbai", "Pune", "Chennai", "Bangalore"),
cex = 0.5, fill = rainbow(length(geeks)))
Bar chart: A stacked bar chart showing the revenue for each month, with one colored segment
per region.
PROGRAM:
colors = c("green", "orange", "brown")
months <- c("Mar", "Apr", "May", "Jun", "Jul")
regions <- c("East", "West", "North")
# Create the matrix of the values.
Values <- matrix(c(2, 9, 3, 11, 9, 4, 8, 7, 3, 12, 5, 2, 8, 10, 11),
nrow = 3, ncol = 5, byrow = TRUE)
# Create the bar chart
barplot(Values, main = "Total Revenue", names.arg = months,
xlab = "Month", ylab = "Revenue", col = colors)
# Add the legend to the chart
legend("topleft", regions, cex = 0.7, fill = colors)
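The programs above type the data directly into the script, while the experiment asks for a CSV file. One possible way to draw the charts from a CSV file is sketched below. The file name city_counts.csv and the column names city and count are made up for illustration; the script first writes the file so that it is self-contained.

```r
# Write a small CSV file so the example is self-contained
# (file name and column names are assumptions for illustration).
csv_path <- file.path(tempdir(), "city_counts.csv")
write.csv(data.frame(city  = c("Mumbai", "Pune", "Chennai", "Bangalore"),
                     count = c(23, 56, 20, 63)),
          csv_path, row.names = FALSE)

# Read the data back from the CSV file.
city_data <- read.csv(csv_path)
slice_colors <- rainbow(nrow(city_data))

# Pie chart with percentage labels and a legend.
percent <- round(100 * city_data$count / sum(city_data$count), 1)
pie(city_data$count, labels = paste0(percent, "%"),
    main = "City pie chart", col = slice_colors)
legend("topright", legend = city_data$city, fill = slice_colors, cex = 0.8)

# Bar chart of the same data with a legend.
barplot(city_data$count, names.arg = city_data$city,
        main = "City bar chart", xlab = "City", ylab = "Count",
        col = slice_colors)
legend("topleft", legend = city_data$city, fill = slice_colors, cex = 0.8)
```

To use your own data, replace csv_path with the path to your CSV file and adjust the column names.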
Experiment 6: Create a CSV file having Speed and Distance attributes with 1000 records. Write
an R program to draw
a) Box plots
b) Histogram
c) Line Graph
d) Multiple line graphs
e) Scatter plot
a) Box plots
# Load the dataset
data(mtcars)
# Set up plot colors (one per gear value: 3, 4 and 5)
my_colors <- c("#FFA500", "#008000", "#1E90FF")
# Create the box plot with customized aesthetics
boxplot(disp ~ gear, data = mtcars,
main = "Displacement by Gear", xlab = "Gear", ylab = "Displacement",
col = my_colors, border = "black", notch = TRUE, notch.frac = 0.5,
medcol = "white", whiskcol = "black", boxwex = 0.5, outpch = 19,
outcol = "black")
# Add a legend (sorted so that the labels match the order of the boxes)
legend("topright", legend = sort(unique(mtcars$gear)),
fill = my_colors, border = "black", title = "Gear")
b) Histogram
PROGRAM:
# Create data for the graph.
v <- c(19, 23, 11, 5, 16, 21, 32,
14, 19, 27, 39)
# Create the histogram.
hist(v, xlab = "No. of Articles",
col = "green", border = "black")
c) Line Graph
PROGRAM:
# Create the data for the chart.
v <- c(17, 25, 38, 13, 41)
# Plot the line chart (points joined by lines).
plot(v, type = "o")
d) Multiple line graphs
PROGRAM:
library("ggplot2")
gfg_data <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
y1 = c(1.1, 2.4, 3.5, 4.1, 5.9, 6.7,
7.1, 8.3, 9.4, 10.0),
y2 = c(7, 5, 1, 7, 4, 9, 2, 3, 1, 4),
y3 = c(5, 6, 4, 5, 1, 8, 7, 4, 5, 4),
y4 = c(1, 4, 8, 9, 6, 1, 1, 8, 9, 1),
y5 = c(1, 1, 1, 3, 3, 7, 7, 10, 10, 10))
gfg_plot <- ggplot(gfg_data, aes(x)) +
geom_line(aes(y = y1), color = "black") +
geom_line(aes(y = y2), color = "red") +
geom_line(aes(y = y3), color = "green") +
geom_line(aes(y = y4), color = "blue") +
geom_line(aes(y = y5), color = "purple")
gfg_plot
e) Scatter plot
PROGRAM:
# Get the input values.
input <- mtcars[, c('wt', 'mpg')]
# Plot the chart for cars with
# weight between 1.5 to 4 and
# mileage between 10 and 25.
plot(x = input$wt, y = input$mpg,
xlab = "Weight",
ylab = "Mileage",
xlim = c(1.5, 4),
ylim = c(10, 25),
main = "Weight vs Mileage"
)
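The programs above use built-in or hand-typed data. A sketch that follows the experiment statement more literally is given below: it generates a CSV file with 1000 Speed and Distance records and draws all five plots from it. The data are synthetic, and the assumed relation (distance growing roughly with the square of speed, plus random noise) as well as the file name cars_speed_distance.csv are illustrative choices only.

```r
set.seed(42)  # reproducible synthetic data

# Assumed model: distance grows roughly with the square of speed, plus noise.
n <- 1000
speed <- round(runif(n, min = 5, max = 30), 1)
distance <- round(pmax(0, 0.1 * speed^2 + 1.5 * speed + rnorm(n, sd = 8)), 1)

# Create the CSV file with 1000 records, then read it back.
csv_path <- file.path(tempdir(), "cars_speed_distance.csv")
write.csv(data.frame(Speed = speed, Distance = distance),
          csv_path, row.names = FALSE)
cars_data <- read.csv(csv_path)

# a) Box plots of both attributes
boxplot(cars_data, main = "Speed and Distance", col = c("orange", "skyblue"))

# b) Histogram of speed
hist(cars_data$Speed, main = "Histogram of Speed", xlab = "Speed", col = "green")

# c) Line graph: mean distance for each whole-number speed
by_speed <- aggregate(Distance ~ round(Speed), data = cars_data, FUN = mean)
names(by_speed) <- c("Speed", "MeanDistance")
plot(by_speed$Speed, by_speed$MeanDistance, type = "o",
     main = "Mean Distance by Speed", xlab = "Speed", ylab = "Mean distance")

# d) Multiple line graphs: first 50 records of each attribute
first50 <- as.matrix(head(cars_data, 50))
matplot(first50, type = "l", lty = 1, col = c("red", "blue"),
        main = "First 50 records", xlab = "Record", ylab = "Value")
legend("topright", legend = colnames(first50), col = c("red", "blue"), lty = 1)

# e) Scatter plot of distance against speed
plot(cars_data$Speed, cars_data$Distance, pch = 19, cex = 0.5,
     main = "Speed vs Distance", xlab = "Speed", ylab = "Distance")
```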
Experiment 7: Implement different data structures in R (Vectors, Lists, Data Frames)
Vectors
PROGRAM:
# R program to illustrate Vector
# Vectors(ordered collection of same data type)
X = c(1, 3, 5, 7, 8)
# Printing those elements in console
print(X)
Lists:
Lists can hold elements of different data types and can be nested.
PROGRAM:
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham",
"Shiba")
numberOfEmp = 4
empList = list(empId, empName, numberOfEmp)
print(empList)
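Data Frames:
Data frames store tabular data; each column is a vector and all columns have the same length.
PROGRAM: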
Name = c("Anand", "kamal", "Pandu")
Language = c("R", "Python", "Java")
Age = c(22, 25, 45)
df = data.frame(Name, Language, Age)
print(df)
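Once a list or data frame exists, its parts can be accessed by name or position. The sketch below uses its own small example objects; the component names id, name and count and the derived column Senior are chosen for illustration.

```r
# A named list mixing a numeric vector, a character vector and a number.
emp <- list(id = c(1, 2, 3), name = c("Debi", "Sandeep", "Subham"), count = 3)

print(emp$name)        # access a component by name
print(emp[[1]][2])     # second element of the first component: 2
str(emp)               # compact overview of the structure

# A data frame: each column is a vector of the same length.
df <- data.frame(Name = c("Anand", "Kamal", "Pandu"),
                 Age  = c(22, 25, 45))

print(df$Age)                 # one column as a vector
print(df[df$Age > 23, ])      # rows that satisfy a condition
df$Senior <- df$Age >= 40     # add a derived column
print(df)
```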
Experiment 8: Write an R program to read a CSV file and analyze the data in the file using EDA
(Exploratory Data Analysis) techniques.
# Install and load necessary packages
install.packages(c("readr", "dplyr", "ggplot2"))
library(readr)
library(dplyr)
library(ggplot2)
# Step 1: Read the CSV file
file_path <- "data.csv"
data <- read_csv(file_path)
# Step 2: Display the first few rows of the data
cat("First few rows of the data:\n")
print(head(data))
# Step 3: Summary statistics
cat("\nSummary statistics:\n")
print(summary(data))
# Step 4: Data structure
cat("\nData structure:\n")
str(data)
# Step 5: Exploratory Data Analysis (EDA) - Example: Histogram
# Replace Variable_of_Interest with the name of a numeric column in your CSV file
ggplot(data, aes(x = Variable_of_Interest)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black") +
  labs(title = "Histogram of Variable_of_Interest", x = "Variable_of_Interest", y = "Frequency")
# Add more EDA visualizations and analyses as needed based on your data
# Save plots (optional)
ggsave("histogram.png", plot = last_plot())
# Close the graphics device (optional)
dev.off()
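The program above needs a data.csv file and a column name filled in. A self-contained variant using only base R is sketched below. It writes the built-in airquality data set to a temporary CSV file (an assumption made so that the example runs anywhere), reads it back, and applies a few common EDA steps: structure, missing values, summary statistics, correlations and two plots.

```r
# Write the built-in airquality data to a CSV file so the example is
# self-contained, then read it back as if it were an external file.
csv_path <- file.path(tempdir(), "airquality.csv")
write.csv(airquality, csv_path, row.names = FALSE)
aq <- read.csv(csv_path)

# Size and structure of the data
print(dim(aq))
str(aq)

# Missing values per column
na_counts <- colSums(is.na(aq))
print(na_counts)

# Summary statistics for every column
print(summary(aq))

# Correlation between the measured variables, ignoring incomplete rows
print(round(cor(aq[, c("Ozone", "Solar.R", "Wind", "Temp")],
                use = "complete.obs"), 2))

# Distribution of one variable and its relation to another
hist(aq$Temp, main = "Temperature", xlab = "Temp (degrees F)", col = "lightblue")
boxplot(Ozone ~ Month, data = aq, main = "Ozone by Month")
```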
PROGRAM :DATA INSPECTION IN EDA
# Data Inspection in EDA
# loading the required packages
library(aqp)
library(soilDB)
# Load from the loafercreek dataset
data("loafercreek")
# Construct generalized horizon designations
n <- c("A", "BAt", "Bt1", "Bt2", "Cr", "R")
# REGEX rules
p <- c("A", "BA|AB", "Bt|Bw", "Bt3|Bt4|2B|C", "Cr", "R")
# Compute genhz labels and
# add to loafercreek dataset
loafercreek$genhz <- generalize.hz(loafercreek$hzname, n, p)
# Extract the horizon table
h <- horizons(loafercreek)
# Examine the matching of pairing of
# the genhz label to the hznames
table(h$genhz, h$hzname)
vars <- c("genhz", "clay", "total_frags_pct", "phfield", "effclass")
summary(h[, vars])
sort(unique(h$hzname))
h$hzname <- ifelse(h$hzname == "BT", "Bt", h$hzname)
PROGRAM(GRAPHICAL METHOD)
# EDA Graphical Method Distributions
# loading the required packages
library("ggplot2")
library(aqp)
library(soilDB)
# Load from the loafercreek dataset
data("loafercreek")
# Construct generalized horizon designations
n <- c("A", "BAt", "Bt1", "Bt2", "Cr", "R")
# REGEX rules
p <- c("A", "BA|AB", "Bt|Bw", "Bt3|Bt4|2B|C","Cr", "R")
# Compute genhz labels and add
# to loafercreek dataset
loafercreek$genhz <- generalize.hz(loafercreek$hzname, n, p)
# Extract the horizon table
h <- horizons(loafercreek)
# Examine the matching of pairing
# of the genhz label to the hznames
table(h$genhz, h$hzname)
vars <- c("genhz", "clay", "total_frags_pct", "phfield", "effclass")
summary(h[, vars])
sort(unique(h$hzname))
h$hzname <- ifelse(h$hzname == "BT", "Bt", h$hzname)
# graphs
# bar plot
ggplot(h, aes(x = texcl)) + geom_bar()
# histogram
ggplot(h, aes(x = clay)) +
geom_histogram(bins = nclass.Sturges(h$clay))
# density curve
ggplot(h, aes(x = clay)) + geom_density()
# box plot
ggplot(h, aes(x = genhz, y = clay)) +
geom_boxplot()
# QQ Plot for Clay
ggplot(h, aes(sample = clay)) +
geom_qq() +
geom_qq_line()
Experiment 9: Write an R program to illustrate Linear Regression and Multiple Linear Regression
considering suitable CSV file.
# Linear Regression and Multi linear Regression example using the mtcars dataset
# Step 1: Load necessary libraries
library(ggplot2)
library(tidyr)
library(dplyr)
# Step 2: Load the mtcars dataset (or use your own CSV file)
data(mtcars)
# In mtcars, horsepower is the column hp and weight (in 1000 lbs) is the column wt
# Step 3: Simple Linear Regression (Example: mpg vs. horsepower)
cat("Simple Linear Regression:\n")
lm_model <- lm(mpg ~ hp, data = mtcars)
summary(lm_model)
# Plotting the Simple Linear Regression line
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Simple Linear Regression", x = "Horsepower", y = "Miles per Gallon")
# Step 4: Multiple Linear Regression (Example: mpg vs. horsepower + weight)
cat("\nMultiple Linear Regression:\n")
mlm_model <- lm(mpg ~ hp + wt, data = mtcars)
summary(mlm_model)
# Step 5: Predicting using the Multiple Linear Regression model
new_data <- data.frame(hp = c(150, 200), wt = c(3.0, 3.5))
predictions <- predict(mlm_model, newdata = new_data)
cat("\nPredictions using Multiple Linear Regression:\n")
print(data.frame(new_data, predictions))
# Add more features to the Multi Linear Regression model as needed
# Save plots (optional)
ggsave("linear_regression_plot.png", plot = last_plot())
# Close the graphics device (optional)
dev.off()
PROGRAM(LINEAR REGRESSION)
# R program to illustrate
# Linear Regression
# Height vector
x <- c(153, 169, 140, 186, 128,136, 178, 163, 152, 133)
# Weight vector
y <- c(64, 81, 58, 91, 47, 57,75, 72, 62, 49)
# Create a linear regression model
model <- lm(y~x)
# Print regression model
print(model)
# Find the weight of a person
# With height 182
df <- data.frame(x = 182)
res <- predict(model, df)
cat("\nPredicted weight of a person with height = 182:\n")
print(res)
# Output to be present as PNG file
png(file = "linearRegGFG.png")
# Plot
plot(x, y, main = "Height vs Weight Regression model")
abline(lm(y~x))
# Save the file.
dev.off()
PROGRAM: MULTIPLE REGRESSION
# R program to illustrate
# Multiple Linear Regression
# Using airquality dataset
input <- airquality[1:50,
c("Ozone", "Wind", "Temp")]
# Create regression model
model <- lm(Ozone~Wind + Temp, data = input)
# Print the regression model
cat("Regression model:\n")
print(model)
# Output to be present as PNG file
png(file = "multipleRegGFG.png")
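# Plot the four diagnostic plots on one page,
# otherwise only the last one remains in the PNG file
par(mfrow = c(2, 2))
plot(model)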
# Save the file.
dev.off()
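To read results out of a fitted model, the coefficients and R-squared can be extracted and a prediction checked by hand. The sketch below fits the same kind of model on the full airquality data set; the new observation (Wind = 10, Temp = 80) is an arbitrary example value.

```r
# Fit a multiple linear regression on the full airquality data set.
fit <- lm(Ozone ~ Wind + Temp, data = airquality)

# Coefficients: intercept, then one slope per predictor.
b <- coef(fit)
print(b)

# R-squared: share of the variance in Ozone explained by the model.
print(summary(fit)$r.squared)

# Predict Ozone for an assumed day with Wind = 10 and Temp = 80,
# once with predict() and once by applying the coefficients by hand.
new_day <- data.frame(Wind = 10, Temp = 80)
by_predict <- predict(fit, newdata = new_day)
by_hand <- b[["(Intercept)"]] + b[["Wind"]] * 10 + b[["Temp"]] * 80
print(c(predict = unname(by_predict), by_hand = by_hand))
```

Both values agree, which shows that predict() simply evaluates the fitted linear equation.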
Lead Programs:
1. Linear Discriminant Analysis is computed using the lda() function from the MASS package.
Let’s use the iris data set that ships with R.
library(MASS)
library(tidyverse)
library(caret)
theme_set(theme_classic())
# Load the data
data("iris")
# Split the data into training (80%) and test set (20%)
set.seed(123)
training.individuals <- iris$Species %>%
createDataPartition(p = 0.8, list = FALSE)
train.data <- iris[training.individuals, ]
test.data <- iris[-training.individuals, ]
# Estimate preprocessing parameters
preproc.parameter <- train.data %>%
preProcess(method = c("center", "scale"))
# Transform the data using the estimated parameters
train.transform <- preproc.parameter %>% predict(train.data)
test.transform <- preproc.parameter %>% predict(test.data)
# Fit the model
model <- lda(Species~., data = train.transform)
# Make predictions
predictions <- model %>% predict(test.transform)
# Model accuracy
mean(predictions$class==test.transform$Species)
# Print the fitted model
model
2. Decision Tree for Regression in R Programming.
# Load the required library
library(rpart)
# Load a sample dataset
data(mtcars)
# Create a CART model for regression
cart_model <- rpart(mpg ~ ., data = mtcars)
# Print the model summary
print(cart_model)
# Make predictions using the model
predictions <- predict(cart_model, newdata = mtcars)
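The predictions can be compared with the actual values to judge the tree. A minimal sketch follows, using root mean squared error (RMSE) and a simple baseline that always predicts the mean; both the metric and the baseline are illustrative choices, and the error is measured on the training data, so it is optimistic.

```r
library(rpart)

# Fit the regression tree on mtcars and predict on the same rows.
tree <- rpart(mpg ~ ., data = mtcars)
pred <- predict(tree, newdata = mtcars)

# Root mean squared error on the training data (optimistic, because
# the model has already seen these rows).
rmse <- sqrt(mean((mtcars$mpg - pred)^2))
print(rmse)

# Baseline for comparison: always predict the mean mpg.
baseline_rmse <- sqrt(mean((mtcars$mpg - mean(mtcars$mpg))^2))
print(baseline_rmse)
```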