R Programming Lab Manual
FOR
Third Year Students (IT)
VISION OF THE INSTITUTE
PSO: Design, develop, and implement software systems that meet user requirements,
considering factors like usability, security, and scalability.
Program Outcomes (POs)
Engineering Graduates will be able to:
1. Engineering Knowledge: Apply the knowledge of mathematics, science, engineering fundamentals and
computing to solve Information Technology related problems.
2. Problem Analysis: Identify, formulate, review relevant research literature, and analyze complex
Information Technology problems, arriving at well-founded conclusions by leveraging foundational
principles of mathematics, natural sciences, and engineering sciences.
3. Design / Development of Solutions: Create solutions for intricate Information Technology challenges and
design system components or processes that fulfill specified requirements while giving due regard to public
health and safety, as well as cultural, societal, and environmental factors.
4. Conduct Investigations of Complex Problems: Investigate complex Information Technology problems
using research methods, data analysis, and data interpretation to derive valid conclusions.
5. Modern tool usage: Use modern engineering and IT tools, software, and equipment to develop complex
software projects efficiently.
6. The engineer and society: Apply engineering solutions in a societal context, considering ethical, legal,
cultural, economic, and environmental aspects.
7. Environment and sustainability: Understand the impact of Information Technology solutions in societal
and environmental contexts, and demonstrate the knowledge of, and need for, sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities within the field of
information technology.
9. Individual and Team Work: Function effectively as an individual and as a member or leader in diverse
teams, and multidisciplinary settings.
10. Communication: Effectively communicate complex information technology concepts to both IT
community and society at large, including the ability to write reports, design documentation, make
presentations, and give and receive clear instructions.
11. Project Management and Finance: Apply Information Technology and management principles to
proficiently manage projects as an individual and leader within software development environments.
12. Life-Long Learning: Recognize the need for lifelong learning to remain current in the dynamic IT
environment.
DOs and DON’Ts in Laboratory:
1. Make an entry in the Log Book as soon as you enter the Laboratory.
2. All the students should sit according to their roll numbers, starting from left to right.
3. All the students are supposed to enter the terminal number in the logbook.
4. All the students are expected to prepare at least the algorithm of the program/concept to be
implemented.
1. PRE-REQUISITES:
Implementation of factors.
12. Implementation of clustering.
R PROGRAMMING LAB
EXPERIMENT-1
Aim: Implementation of DataFrames and Lists.
Requirements:
● R-studio
● R-Language
Description:
DataFrames: A DataFrame displays data in a tabular format. A DataFrame can hold
different types of data: while the first column can be character, the second and third can
be numeric or logical. Use the data.frame() function to create a data frame.
Lists:
A list in R can contain many different data types inside it. A list is an ordered, changeable
collection of data. To create a list of DataFrames, use the list() function in R and
pass each of the data frames you have created as arguments to the function.
Source Code:
Implementation of dataframe:
# Create a data frame
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)
# Print the data frame
print(Data_Frame)
Implementation of list:
# List of strings
thislist <- list("apple", "banana", "cherry")
# Print the list
thislist
Output:
Implementation of dataframes:
  Training Pulse Duration
1 Strength   100       60
2  Stamina   150       30
3    Other   120       45
Implementation of list:
[[1]]
[1] "apple"

[[2]]
[1] "banana"

[[3]]
[1] "cherry"
Experiment-2
Aim: Implementation of Matrix Operations.
Requirements:
● R-studio
● R-Language
Description:
A matrix is a two-dimensional data set with columns and rows.
A column is a vertical representation of data, while a row is a horizontal representation of data.
A matrix in R is a 2-dimensional array that has m number of rows and n number of
columns.
A matrix can be created with the matrix() function. Specify the nrow and ncol parameters to set
the number of rows and columns.
Operations on Matrices
There are four basic operations i.e. DMAS (Division, Multiplication, Addition, Subtraction)
that can be done with matrices. Both the matrices involved in the operation should have the
same number of rows and columns.
Matrices Addition
The addition of two same-ordered matrices yields a matrix
where every element is the sum of the corresponding elements of the input matrices.
Source code:
# R program to add two matrices
# Creating 1st Matrix
B = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
# Creating 2nd Matrix
C = matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)
# Getting number of rows and columns
num_of_rows = nrow(B)
num_of_cols = ncol(B)
# Creating matrix to store results
sum = matrix(, nrow = num_of_rows, ncol = num_of_cols)
# Printing Original matrices
print(B)
print(C)
# Calculating the element-wise sum
for (row in 1:num_of_rows)
{
  for (col in 1:num_of_cols)
  {
    sum[row, col] <- B[row, col] + C[row, col]
  }
}
# Printing the resultant matrix
print(sum)
Using ‘+’ operator for matrix addition: Similarly, the following R script uses the in-built
operator +:
# R program for matrix addition
# using '+' operator
# Creating 1st Matrix
B = matrix(c(1, 2 + 3i, 5.4, 3, 4, 5), nrow = 2, ncol = 3)
# Creating 2nd Matrix
C = matrix(c(2, 0i, 0.1, 3, 4, 5), nrow = 2, ncol = 3)
# Printing the resultant matrix
print(B + C)
R provides the basic inbuilt operator to add the matrices. In the above code, all the elements in
the resultant matrix are returned as complex numbers, even if only a single element of a matrix is
a complex number.
Properties of Matrix Addition:
Commutative: B + C = C + B
Associative: for any matrices, A + (B + C) = (A + B) + C
The order of the matrices involved must be the same.
Matrices Subtraction:
The subtraction of two same-ordered matrices yields a matrix
where every element is the difference between the corresponding elements of the first and
second input matrices.
# Creating 1st Matrix
B = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
# Creating 2nd Matrix
C = matrix(c(7, 8, 9, 10, 11, 12), nrow = 2, ncol = 3)
# Getting number of rows and columns
num_of_rows = nrow(B)
num_of_cols = ncol(B)
# Creating matrix to store results
diff = matrix(, nrow = num_of_rows, ncol = num_of_cols)
# Printing Original matrices
print(B)
print(C)
# Calculating diff of matrices
for(row in 1:num_of_rows)
{
for(col in 1:num_of_cols)
{
diff[row, col] <- B[row, col] - C[row, col]
}
}
# Printing the resultant matrix
print(diff)
Using ‘/’ operator for matrix division: Similarly, the following R script uses the in-built
operator /:
# Creating 1st Matrix
B = matrix(c(4, 6i, -1), nrow = 1, ncol = 3)
# Creating 2nd Matrix (values assumed; the original listing is truncated here)
C = matrix(c(2, 2i, 1), nrow = 1, ncol = 3)
# Printing the resultant matrix
print(B / C)
output:
Experiment -3
Aim: Implementation of Factors.
Requirements:
● R-studio
● R-Language
Description: Factors in R Programming Language are data structures that are implemented
to categorize the data or represent categorical data and store it on multiple levels.
Factors are the data objects which are used to categorize the data and store it as levels. They can
store both strings and integers. They are useful in columns which have a limited number of
unique values, like "Male"/"Female" or TRUE/FALSE, and they are useful in data analysis for
statistical modeling.
Factors are created using the factor() function by taking a vector as input.
Source code:
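The source listing for this experiment is missing from the manual; the following is a minimal sketch consistent with the output shown below (the column values and variable names are assumptions):

```r
# Assumed reconstruction: a data frame whose gender column is a factor
height <- c(132, 151, 162, 139, 166, 147, 122)
weight <- c(48, 49, 66, 53, 67, 52, 40)
gender <- factor(c("male", "male", "female", "female", "male", "female", "male"))

# Combine into a data frame and inspect the factor
data <- data.frame(height, weight, gender)
print(data)
print(is.factor(data$gender))  # checks that gender is stored as a factor
print(data$gender)             # prints the values along with their levels
```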
output:
height weight gender
1 132 48 male
2 151 49 male
3 162 66 female
4 139 53 female
5 166 67 male
6 147 52 female
7 122 40 male
[1] TRUE
[1] male male female female male female male
Levels: female male
Experiment-4
Aim: Implementation of Quick Sort and Merge Sort.
Requirements:
● R-studio
● R-Language
Description:
QuickSort is a Divide and Conquer algorithm. It picks an element as a pivot and partitions the
given array around the picked pivot. There are many different versions of quickSort that pick
pivot in different ways.
Always pick the first element as a pivot.
Always pick the last element as a pivot (implemented below)
Pick a random element as a pivot.
Pick median as the pivot.
The key process in quickSort is partition(). The target of partition() is, given an array and an
element x of the array as the pivot, to put x at its correct position in the sorted array, put all
smaller elements (smaller than x) before x, and put all greater elements (greater than x) after x.
All this should be done in linear time.
Source Code:
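The source listing is missing from the manual; the following is a minimal sketch of quickSort with a last-element pivot, the input vector being assumed so as to match the output shown below:

```r
# quickSort: pick the last element as pivot, partition the rest around it,
# and recursively sort the two partitions
quickSort <- function(arr) {
  if (length(arr) <= 1) return(arr)
  pivot <- arr[length(arr)]       # last element as pivot
  rest <- arr[-length(arr)]
  left <- rest[rest <= pivot]     # elements not greater than the pivot
  right <- rest[rest > pivot]     # elements greater than the pivot
  c(quickSort(left), pivot, quickSort(right))
}

arr <- c(5, 88, 12, 13, 3, 4, 8)  # assumed input, matching the output below
print(quickSort(arr))             # sorted: 3 4 5 8 12 13 88
```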
Output:
## [1] 3 4 5 8 12 13 88
Merge sort:
Merge sort is a sorting algorithm that works by dividing an array into smaller subarrays,
sorting each subarray, and then merging the sorted subarrays back together to form the final
sorted array. The process of merge sort is to divide the array into two halves, sort each half, and
then merge the sorted halves back together. This process is repeated until the entire array is
sorted.
Source code:
# function to merge two sorted vectors
merge <- function(a, b) {
  temp <- numeric(0)
  while (length(a) > 0 && length(b) > 0) {
    if (a[1] <= b[1]) { temp <- c(temp, a[1]); a <- a[-1] }
    else { temp <- c(temp, b[1]); b <- b[-1] }
  }
  c(temp, a, b)
}
# recursively split the array, sort each half, and merge the sorted halves
mergeSort <- function(arr) {
  if (length(arr) <= 1) return(arr)
  mid <- length(arr) %/% 2
  merge(mergeSort(arr[1:mid]), mergeSort(arr[(mid + 1):length(arr)]))
}
# input vector assumed so as to match the output shown below
arr <- c(35, 90, 24, 6, 74, 38, 8, 19, 21, 16)
# call mergeSort and print the result
result <- mergeSort(arr)
print(result)
Output:
[1] 6 8 16 19 21 24 35 38 74 90
Experiment-5
Aim: Implementation of Binary Search Tree.
Requirements:
● R-studio
● R-Language
Description:
R doesn't have a built-in binary search function, but writing such a function isn't too difficult.
The first statement creates an integer vector with five values. The second statement sets up a
target value for which to search. The third statement uses the built-in %in% operator to check
whether the target is present.
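The three statements the description refers to are not reproduced in the manual; a minimal sketch (the vector and target values here are assumptions):

```r
vec <- c(3, 7, 12, 19, 25)  # an integer vector with five values (assumed)
target <- 12                # the target value to search for
target %in% vec             # TRUE if target occurs anywhere in vec
# match() additionally reports the position of the first occurrence
match(target, vec)          # 3
```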
Source Code:
binary_search <- function(arr, item)
{
  low <- 1; high <- length(arr)
  while (low <= high)
  {
    mid <- as.integer(round((low + high) / 2))
    if (arr[mid] == item)
    {
      return(mid)
    }
    else if (arr[mid] < item)
    {
      low <- mid + 1
    }
    else
    {
      high <- mid - 1
    }
  }
  return(0)
}
arr <- c(4, 0, 3, 1, 5, 6, 2)
sorted_arr <- sort(arr)
item <- 4
cat("Array:", arr, "\nSorted array:", sorted_arr, "\nItem =", item, "\n")
index <- binary_search(sorted_arr, item)
if (index != 0)
{
  cat("Element is present at index", index, "\n")
} else
{
  cat("Element not found\n")
}
Output:
Array: 4 0 3 1 5 6 2
Sorted array: 0 1 2 3 4 5 6
Item = 4
Element is present at index 5
Experiment-6
Aim: Implementation of Reading and Writing Files.
Requirements:
● R-studio
● R-Language
Description:
One of the most common formats in which to store data is a text file. R provides various
methods by which one can read data from a text file.
read.delim(): This method is used for reading “tab-separated value” files (“.txt”). By
default, the point (“.”) is used as the decimal separator.
Syntax: read.delim(file, header = TRUE, sep = “\t”, dec = “.”, …)
Parameters:
file: the path to the file containing the data to be read into R.
header: a logical value. If TRUE, read.delim() assumes that your file has a header row, so
row 1 is the name of each column. If that’s not the case, you can add the argument header =
FALSE.
sep: the field separator character. “\t” is used for a tab-delimited file.
dec: the character used in the file for decimal points.
R – Writing to Files:
Writing Data to CSV files in R Programming Language:
CSV stands for Comma Separated Values. These files are used to handle large amounts of
statistical data. Following is the syntax to write to a delimited text file:
Syntax:
write.table(my_data, file = "my_data.txt", sep = "")
Here,
write.csv() and write.csv2() are the CSV-writing functions in R programming.
write.csv() uses “.” for the decimal point and a comma (“,”) for the separator.
write.csv2() uses a comma (“,”) for the decimal point and a semicolon (“;”) for the
separator.
Source code:
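The source listing is missing from the manual; the following is a minimal sketch exercising the functions described above (the file names and example data are assumptions):

```r
# Assumed example data
my_data <- data.frame(x = c(1.5, 2.5, 3.5), y = c("a", "b", "c"))

# Write a tab-delimited text file, then read it back with read.delim()
write.table(my_data, file = "my_data.txt", sep = "\t", row.names = FALSE)
back <- read.delim("my_data.txt", header = TRUE, sep = "\t", dec = ".")
print(back)

# Write the same data as a CSV file with write.csv()
write.csv(my_data, file = "my_data.csv", row.names = FALSE)
```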
Output:
x1 <- c(31,13,25,31,16)
x2 <- c(12,23,43,12,22,45,32)
label1 <-
c('geek','geek-i-knack','technical-scripter',
'content-writer','problem-setter')
# label2 values are missing in the original listing; placeholder labels are used
label2 <- c('l1', 'l2', 'l3', 'l4', 'l5', 'l6', 'l7')
# Draw the two bar charts side by side
par(mfrow=c(1,2))
# colour of the first chart is an assumption; the original specifies green for the second
barplot(x1, names.arg = label1, col = "blue")
barplot(x2, names.arg = label2, col = "green")
Output:
Experiment-9
Aim: Implementation of Correlation, T-Test, ANOVA.
Requirements:
● R-studio
● R-Language
Description:
Correlation on a statistical basis is the method of finding the relationship between the
variables in terms of the movement of the data. That is, it helps us analyze the effect of
changes made in one variable over the other variable of the dataset.
There are mainly two types of correlation: positive correlation and negative correlation.
T-TEST:
Classification:
● One Sample T-test
● Two sample T-test
● Paired Sample T-test
One-sample T-Test:
The One-Sample T-Test is used to test the statistical difference between a sample mean and a
known or assumed/hypothesized value of the mean in the population.
Two sample T-test:
It is used to help us understand whether the difference between the two means is real or simply
due to chance.
The general form of the test is t.test(y1, y2, paired=FALSE). By default, R assumes that the
variances of y1 and y2 are unequal, thus defaulting to Welch’s test. To toggle this, we use the
flag var.equal=TRUE.
Paired Sample T-test:
This is a statistical procedure that is used to determine whether the mean difference between
two sets of observations is zero. In a paired sample t-test, each subject is measured two times,
resulting in pairs of observations.
The test is run using the syntax t.test(y1, y2, paired=TRUE)
ANOVA
ANOVA test involves setting up:
Null Hypothesis: All population means are equal.
Alternate Hypothesis: At least one population mean is different from the others.
ANOVA tests are of two types:
One-way ANOVA: It takes one categorical variable into consideration.
Two-way ANOVA: It takes two categorical variables into consideration.
Source code:
Correlation:
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)
result <- cor(x, y, method = "pearson")
cat("Pearson correlation coefficient is:", result, "\n")
One-sample T-test
set.seed(0)
sweetSold <- c(rnorm(50, mean = 140, sd = 5))
t.test(sweetSold, mu = 150) # Ho: mu = 150
Two-sample T-Test
set.seed(0)
shopOne <- rnorm(50, mean = 140, sd = 4.5)
shopTwo <- rnorm(50, mean = 150, sd = 4)
t.test(shopOne, shopTwo, var.equal = TRUE)
Paired Sample T-Test
set.seed(2820)
sweetOne <- c(rnorm(100, mean = 14, sd = 0.3))
sweetTwo <- c(rnorm(100, mean = 13, sd = 0.2))
t.test(sweetOne, sweetTwo, paired = TRUE)
ANOVA
install.packages("dplyr")
library(dplyr)
boxplot(mtcars$disp ~ factor(mtcars$gear), xlab = "gear", ylab = "disp")
mtcars_aov <- aov(mtcars$disp ~ factor(mtcars$gear))
summary(mtcars_aov)
OUTPUT:
Correlation Pearson correlation coefficient is : 0.5357143
One-sample T-test
Two-sample T-Test
Experiment-10
Aim: Implementation of Decision Tree and Support Vector Classification.
Requirements:
● R-studio
● R-Language
Description:
Decision Tree:
A decision tree is a type of supervised machine learning algorithm that is used for classification and
regression analysis. It is a tree-like model that represents decisions and their possible consequences. The
model starts with a single node, called the root, and branches out to multiple nodes, each of which
represents a decision or a test of a particular feature or attribute.
At each node of the tree, the algorithm makes a decision based on the values of the input features, and
then follows the appropriate branch of the tree to the next node. This process is repeated until a leaf node
is reached, which represents the final decision or output of the algorithm.
The decision tree algorithm is particularly useful when the data has a hierarchical structure, where the
features can be grouped into a hierarchy. The decision tree algorithm can be used to automatically learn
the hierarchy and the decision rules based on the training data. The resulting model can be used to predict
the outcome of new data with high accuracy, and is also easy to interpret and visualize.
Support Vector Classification (SVC) is a type of supervised machine learning algorithm that is used for
classification problems. It is a non-probabilistic binary linear classifier, which means that it assigns input
data points to one of two categories based on a linear boundary.
The SVC algorithm works by finding the hyperplane that best separates the input data into different
classes. The hyperplane is a decision boundary that maximizes the margin between the two classes. The
margin is the distance between the hyperplane and the closest data points from each class. The goal of the
algorithm is to find the hyperplane that has the largest margin, as this is expected to generalize well on
new, unseen data.
SVC can handle both linear and nonlinear classification problems through the use of kernel functions.
The kernel function maps the input data to a higher-dimensional feature space, where a linear boundary
can be found to separate the classes. The most commonly used kernel functions are linear, polynomial,
and radial basis function (RBF) kernels.
One of the key advantages of SVC is its ability to handle high-dimensional datasets with relatively few
training examples. It is also known for its robustness to outliers, and its ability to handle non-linearly
separable data by using kernel functions. However, SVC can be sensitive to the choice of kernel function
and its associated parameters, and can be computationally expensive for large datasets.
Source code:
library(datasets)
library(caTools)
library(party)
library(dplyr)
library(magrittr)
data("readingSkills")
head(readingSkills)
Output:
# Splitting into training and test sets (this step is reconstructed; the
# original listing is incomplete)
sample_data <- sample.split(readingSkills$nativeSpeaker, SplitRatio = 0.8)
train_data <- subset(readingSkills, sample_data == TRUE)
test_data <- subset(readingSkills, sample_data == FALSE)
# Fitting the decision tree with ctree(formula, data)
model <- ctree(nativeSpeaker ~ ., train_data)
plot(model)
Output:
# Predicting on the test set and building the confusion matrix
predict_model <- predict(model, test_data)
m_at <- table(test_data$nativeSpeaker, predict_model)
m_at
output:
dataset = read.csv('Social_Network_Ads.csv')
dataset = dataset[3:5]
# Splitting the dataset into the Training set and Test set
# (the split itself is reconstructed; the dependent column is assumed to be Purchased)
install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(dataset$Purchased, SplitRatio = 0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
# Feature Scaling
training_set[-3] = scale(training_set[-3])
test_set[-3] = scale(test_set[-3])
# Fitting SVM to the Training set
install.packages('e1071')
library(e1071)
classifier = svm(formula = Purchased ~ .,
                 data = training_set,
                 type = 'C-classification',
                 kernel = 'linear')
# installing library ElemStatLearn for visualising the results
library(ElemStatLearn)
set = test_set
# the full grid-plotting code is truncated in the original listing
plot(set[, -3])
output:
EXPERIMENT-11
Aim: Implementation of Linear and Random Forest Regressions.
Requirements:
● R-studio
● R-Language
Description:
Linear regression:
Linear regression is a type of supervised machine learning algorithm used for predicting a continuous
target variable. It is a statistical approach that models the relationship between a dependent variable and
one or more independent variables by fitting a linear equation to the observed data.
In simple linear regression, there is only one independent variable, and the linear equation takes the form:
y = mx + b
where y is the target variable, x is the independent variable, m is the slope of the line, and b is the y-
intercept. The goal of linear regression is to find the values of m and b that minimize the difference
between the predicted values and the actual values of the target variable.
In multiple linear regression, there are multiple independent variables, and the linear equation takes the
form:
y = b0 + b1*x1 + b2*x2 + ... + bn*xn
where y is the target variable, xi are the independent variables, and bi are the coefficients of the linear
equation. The goal of multiple linear regression is to find the values of bi that minimize the
difference between the predicted values and the actual values of the target variable.
Linear regression is a widely used algorithm in machine learning and statistical modeling due to its
simplicity, interpretability, and ability to capture linear relationships between variables. However, it is
important to note that linear regression assumes a linear relationship between the independent and
dependent variables, and may not be appropriate for non-linear relationships.
Random Forest:
Random forest is an ensemble learning method that builds many decision trees on bootstrapped samples
of the data, using a random subset of features at each split, and averages their predictions (for
regression) or takes a majority vote (for classification).
SOURCE CODE:
age <- c(18, 20, 22, 24, 26, 28, 30, 32, 34, 36)
height <- c(68, 69, 71, 72, 73, 74, 75, 76, 77, 78)
data <- data.frame(age, height)
# Fitting the simple linear regression model
model <- lm(height ~ age, data = data)
summary(model)
# Predicting for new ages (values assumed; the original listing is incomplete)
new_data <- data.frame(age = c(19, 23, 31))
predictions <- predict(model, newdata = new_data)
predictions
Output:
Call:
Residuals:
1 2 3 4 5 6 7 8 9 10
-0.83333 -0.33333 0.16667 0.66667 1.16667 1.66667 -0.83333 -0.33333 0.16667 0.66667
Coefficients:
---
signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
1 2 3
library(randomForest)
age <- c(18, 20, 22, 24, 26, 28, 30, 32, 34, 36)
gender <- factor(c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F"))
height <- c(68, 69, 71, 72, 73, 74, 75, 76, 77, 78)
data <- data.frame(age, gender, height)
# Splitting into training and test sets (split reconstructed; the original listing is incomplete)
set.seed(123)
train_index <- sample(1:nrow(data), 0.7 * nrow(data))
train_data <- data[train_index, ]
test_data <- data[-train_index, ]
model <- randomForest(height ~ age + gender, data = train_data, ntree = 500, mtry = 2)
print(model)
# Mean squared error on the test set
predictions <- predict(model, newdata = test_data)
mse <- mean((predictions - test_data$height)^2)
mse
output:
Call:
Experiment-12
Aim: Implementation of Clustering.
Requirements:
● R-studio
● R-Language
Description:
Clustering in R Programming Language is an unsupervised learning technique in which the
data set is partitioned into several groups, called clusters, based on their similarity. Several
clusters of data are produced after the segmentation of the data. All the objects in a cluster share
common characteristics. During data mining and analysis, clustering is used to find similar
datasets.
Methods of Clustering:
There are 2 types of clustering in R programming:
Hard clustering: In this type of clustering, the data point either belongs to the cluster
totally or not and the data point is assigned to one cluster only. The algorithm used for
hard clustering is k-means clustering.
Soft clustering: In soft clustering, each data point is assigned a probability or likelihood of
belonging to each cluster, rather than being put into a single cluster. Each data point exists in all
the clusters with some probability. The algorithm used for soft clustering is the fuzzy clustering
method, or soft k-means.
K-Means Clustering in R Programming language:
K-Means is an iterative hard clustering technique that uses an unsupervised learning
algorithm. Here, the total number of clusters is pre-defined by the user, and the data points
are clustered based on their similarity to each cluster. This algorithm also finds the
centroid of each cluster.
Syntax: kmeans(x, centers, nstart)
where,
x represents numeric matrix or data frame object
centers represents the K value or distinct cluster centers
nstart represents number of random sets to be chosen
# Scaling the dataset (df is assumed to have been loaded earlier as a numeric data frame)
df <- scale(df)
# Fitting k-means with 4 clusters and 25 random starts
km <- kmeans(df, centers = 4, nstart = 25)
km
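The df object is not defined in the listing above; as a self-contained illustration, R's built-in USArrests dataset can stand in for it (this choice is an assumption, not part of the original manual):

```r
# Built-in dataset used purely for illustration
df <- scale(USArrests)  # standardise each column to mean 0, sd 1
set.seed(123)           # k-means uses random starting centres
km <- kmeans(df, centers = 4, nstart = 25)
km$size                 # number of observations assigned to each cluster
km$centers              # cluster centres, in scaled units
```

Because nstart = 25 re-runs the algorithm from 25 random starts and keeps the best result, the clustering is fairly stable even without the seed.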