R Programming Manual 24-25
LABORATORY MANUAL
Objectives
To provide quality education and groom top-notch professionals, entrepreneurs and
leaders for different fields of engineering, technology and management.
To develop academic, professional and financial alliances with the industry as well as
the academia at national and transnational levels.
To cultivate strong community relationships and involve the students and the staff in
local community service.
To constantly enhance the value of the educational inputs with the participation of
students, faculty, parents and industry.
Vision
Mission
To keep pace with advancements in knowledge and make the students competitive and
capable at the global level.
To create an environment for the students to acquire the right physical, intellectual,
emotional and moral foundations and shine as torch bearers of tomorrow’s society.
To develop highly talented individuals in Computer Science and Engineering to deal with real
world challenges in industry, education, research and society.
To inculcate professional behavior, strong ethical values, innovative research capabilities and
leadership abilities in the young minds & to provide a teaching environment that emphasizes
depth, originality and critical thinking.
To motivate students to develop their thoughts and ideas into forms adoptable by industry, or to
pursue higher studies leading to research.
1. Empower students with a strong basis in the mathematical, scientific and engineering fundamentals
to solve computational problems and to prepare them for employment, higher learning and R&D.
2. Gain technical knowledge, skills and awareness of current technologies of computer science
engineering and to develop an ability to design and provide novel engineering solutions for
software/hardware problems through entrepreneurial skills.
3. Exposure to emerging technologies and work in teams on interdisciplinary projects with effective
communication skills and leadership qualities.
4. Ability to function ethically and responsibly in a rapidly changing environment by applying
innovative ideas in the latest technology, to become effective professionals in Computer
Science to bear a life-long career in related areas.
Program Specific Outcomes (PSOs)
1. PSO1: Ability to apply skills in the field of algorithms, database design, web design, cloud
computing and data analytics.
2. PSO2: Apply knowledge in the field of computer networks for building network and internet
based applications.
R Programming Semester 3
Course Code BCS358B CIE Marks 50
Teaching Hours/Week (L:T:P:S) 0:0:2:0 SEE Marks 50
Credits 01 Exam Hours 02
Examination type (SEE) Practical
Course objectives:
● To explore and understand the R and RStudio interactive environment.
● To understand the different data structures and data types in R.
● To learn and practice programming techniques using R programming.
● To import data into R from various data sources and generate visualizations.
● To draw insights from datasets using data analytics techniques.
Sl.NO Experiments
1 Demonstrate the steps for installation of R and R Studio. Perform the following:
a) Assign different type of values to variables and display the type of variable. Assign different
types such as Double, Integer, Logical, Complex and Character and understand the difference
between each data type.
b) Demonstrate Arithmetic and Logical Operations with simple examples.
c) Demonstrate generation of sequences and creation of vectors.
d) Demonstrate Creation of Matrices
e) Demonstrate the Creation of Matrices from Vectors using Binding Function.
f) Demonstrate element extraction from vectors, matrices and arrays
Suggested Reading – Text Book 1 – Chapter 1 (What is R, Installing R, Choosing an IDE – RStudio,
How to Get Help in R, Installing Extra Related Software), Chapter 2 (Mathematical Operations and
Vectors, Assigning Variables, Special Numbers, Logical Vectors), Chapter 3 (Classes, Different
Types of Numbers,
Other Common Classes, Checking and Changing Classes, Examining Variables )
2 Assess the Financial Statement of an Organization being supplied with 2 vectors of data: Monthly
Revenue and Monthly Expenses for the Financial Year. (You can create your own sample data vector
for this experiment.) Calculate the following financial metrics:
a. Profit for each month.
b. Profit after tax for each month (Tax Rate is 30%).
c. Profit margin for each month equals to profit after tax divided by revenue.
d. Good Months – where the profit after tax was greater than the mean for the year.
e. Bad Months – where the profit after tax was less than the mean for the year.
f. The best month – where the profit after tax was max for the year.
g. The worst month – where the profit after tax was min for the year.
Note:
a. All Results need to be presented as vectors
b. Results for Dollar values need to be calculated with $0.01 precision, but need to be
presented in units of $1000 (i.e. 1k) with no decimal points
c. Results for the profit margin ratio need to be presented in units of % with no decimal point.
d. It is okay for tax to be negative for any given month (deferred tax asset)
e. Generate CSV file for the data.
Suggested Reading – Text Book 1 – Chapter 4 (Vectors, Combining Matrices)
3 Develop a program to create two 3 X 3 matrices A and B and perform the following operations a)
Transpose of the matrix b) addition c) subtraction d) multiplication
Suggested Reading – Text Book 1 – Chapter 4 (Matrices and Arrays – Array Arithmetic)
4 Develop a program to find the factorial of given number using recursive function calls.
Suggested Reading – Reference Book 1 – Chapter 5 (5.5 – Recursive Programming)
Text Book 1 – Chapter 8 (Flow Control and Loops – If and Else, Vectorized If, while loops, for
loops),Chapter 6 (Creating and Calling Functions, Passing Functions to and from other functions)
5 Develop an R Program using functions to find all the prime numbers up to a specified number by
the method of Sieve of Eratosthenes.
Suggested Reading – Reference Book 1 – Chapter 5 (5.5 – Recursive Programming)
Text Book 1 – Chapter 8 (Flow Control and Loops – If and Else, Vectorized If, while loops, for
loops),Chapter 6 (Creating and Calling Functions, Passing Functions to and from other functions)
6 The built-in data set mammals contains data on body weight versus brain weight. Develop R
commands to:
a) Find the Pearson and Spearman correlation coefficients. Are they similar?
b) Plot the data using the plot command.
c) Plot the logarithm (log) of each variable and see if that makes a difference.
Suggested Reading – Text Book 1 –Chapter 12 – (Built-in Datasets) Chapter 14 – (Scatterplots)
Reference Book 2 – 13.2.5 (Covariance and Correlation)
7 Develop R program to create a Data Frame with following details and do the following operations.
itemCode itemCategory itemPrice
1001 Electronics 700
1002 Desktop Supplies 300
1003 Office Supplies 350
1004 USB 400
1005 CD Drive 800
a) Subset the Data frame and display the details of only those items whose price is greater than or
equal to 350.
b) Subset the Data frame and display only the items where the category is either “Office Supplies” or
“Desktop Supplies”
c) Create another Data Frame called “item-details” with three different fields itemCode,
ItemQtyonHand and ItemReorderLvl and merge the two frames
Suggested Reading –Textbook 1: Chapter 5 (Lists and Data Frames)
8 Let us use the built-in dataset airquality which has Daily air quality measurements in New York, May
to September 1973. Develop R program to generate histogram by using appropriate arguments for the
following statements.
a) Assigning names, using the airquality data set.
b) Change colors of the Histogram
c) Remove Axis and Add labels to Histogram
d) Change Axis limits of a Histogram
e) Add Density curve to the histogram
Suggested Reading – Reference Book 2 – Chapter 7 (7.4 – The ggplot2 Package), Chapter 24
(Smoothing and Shading)
9 Design a data frame in R for storing about 20 employee details. Create a CSV file named “input.csv”
that defines all the required information about the employee such as id, name, salary, start_date, dept.
Import into R and do the following analysis.
a) Find the total number of rows & columns
b) Find the maximum salary
c) Retrieve the details of the employee with maximum salary
d) Retrieve all the employees working in the IT Department.
e) Retrieve the employees in the IT Department whose salary is greater than 20000 and write these
details into another file “output.csv”
Suggested Reading – Text Book 1 – Chapter 12(CSV and Tab Delimited Files)
10 Using the built in dataset mtcars which is a popular dataset consisting of the design and fuel
consumption patterns of 32 different automobiles. The data was extracted from the 1974 Motor
Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and
performance for 32 automobiles (1973-74 models). Format A data frame with 32 observations on
11 variables : [1] mpg Miles/(US) gallon,
[2] cyl Number of cylinders [3] disp Displacement (cu.in.), [4] hp Gross horsepower [5] drat
Rear axle ratio,[6] wt Weight (lb/1000) [7] qsec 1/4 mile time, [8] vs V/S, [9] am Transmission
(0 = automatic, 1 = manual), [10] gear Number of forward gears, [11] carb Number of
carburetors
● Students can pick one question (experiment) from the questions lot prepared by the examiners jointly.
● Evaluation of test write-up/ conduction procedure and result/viva will be conducted jointly by examiners.
General rubrics suggested for SEE are: write-up 20%, conduction procedure and result 60%, viva-voce
20% of maximum marks. SEE for practical shall be evaluated for 100 marks and scored marks shall be scaled down
to 50 marks (however, based on course type, rubrics shall be decided by the examiners).
Change of experiment is allowed only once, and 15% of marks allotted to the procedure part are to be made zero.
● The minimum duration of SEE is 02 hours
Suggested Learning Resources:
Book:
1. Cotton, R. (2013). Learning R: A Step by Step Function Guide to Data Analysis. 1st ed. O’Reilly Media Inc.
References:
1. Jones, O., Maillardet, R. and Robinson, A. (2014). Introduction to Scientific Programming and Simulation Using
R. Chapman & Hall/CRC, The R Series.
2. Davies, T.M. (2016). The Book of R: A First Course in Programming and Statistics. No Starch Press.
R Programming Laboratory BCS358B
R Programming Language
Introduction
R is an open-source programming language that is widely used as statistical software and as a
data analysis tool. R generally comes with a command-line interface and is available
across widely used platforms such as Windows, Linux, and macOS.
It was designed by Ross Ihaka and Robert Gentleman at the University of Auckland,
New Zealand, and is currently developed by the R Core Team. The R
programming language is an implementation of the S programming language, combined
with lexical scoping semantics inspired by Scheme. The project was
conceived in 1992, with an initial version released in 1995 and the first stable release (version 1.0.0) in 2000.
R programming is used as a leading tool for machine learning, statistics, and data
analysis. Objects, functions, and packages can easily be created in R.
It’s a platform-independent language. This means it can run on all major operating
systems.
It’s an open-source free language. That means anyone can install it in any organization
without purchasing a license.
The R programming language is not only a statistics package; it can also be integrated
with other languages (C, C++). Thus, you can easily interact with many data sources
and statistical packages.
The R programming language has a vast community of users, and it is growing day by
day.
R is currently one of the most requested programming languages in the Data Science job
market.
Statistical Features of R:
Basic Statistics: The most common basic statistics terms are the mean, mode, and
median. These are all known as “Measures of Central Tendency.” So using the R
language we can measure central tendency very easily.
Static graphics: R is rich with facilities for creating and developing interesting static
graphics. R contains functionality for many plot types including graphic maps, mosaic
plots, biplots, and the list goes on.
Programming Features of R:
Programming in R:
Since R is syntactically similar to other widely used languages, it is easy to learn
and code in. Programs can be written in any of the widely used IDEs such as RStudio,
Rattle, Tinn-R, etc. After writing the program, save the file with the extension .r. To run
the program, use the following command on the command line:
Rscript file_name.r
Example:
Output:
Welcome to GFG!
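A minimal sketch of a script that prints a line like the one above, assuming it is saved as file_name.r and uses cat(); the variable name message_text is illustrative.

```r
# file_name.r (hypothetical file name)
# cat() writes text to the console without the [1] prefix that print() adds
message_text <- "Welcome to GFG!"
cat(message_text, "\n")
```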
Data types in R specify the kind of data that can be stored in a variable. Choosing the right
data type matters for efficient memory use and precise computation. Each variable in R has
an associated data type, and each data type requires a different amount of memory and
supports its own set of operations. R has the following basic data types: numeric (double),
integer, logical, complex, character and raw.
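The basic types can be inspected directly by asking R for the type of a sample value. The sketch below uses illustrative values and names (values, types) that are assumptions, not part of the manual.

```r
# One value of each basic type; the names are illustrative
values <- list(
  double    = 3.14,
  integer   = 42L,            # the L suffix requests an integer
  logical   = TRUE,
  complex   = 3 + 2i,
  character = "Hello",
  raw       = charToRaw("A")  # a single byte
)

# typeof() reports the internal storage type of each value
types <- sapply(values, typeof)
print(types)
```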
Variables in R:
R Variables Syntax
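A short sketch of the assignment forms R accepts; the variable names count, name and ratio are illustrative assumptions.

```r
# Three ways to assign a value to a variable
count <- 10      # leftward arrow, the conventional form
name = "R"       # equals sign, also valid at the top level
3.5 -> ratio     # rightward arrow

# class() shows the type R inferred for each variable
print(c(class(count), class(name), class(ratio)))
```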
Developers often need to interact with users, either to get data or to provide some sort
of result. Most programs today use a dialog box as a way of asking the user to provide some
type of input. Like other programming languages, R can take input from the
user. There are two methods for doing so: readline() and scan().
In R, the readline() function takes input in string format. If one enters an integer, say 255,
it is read as the string “255”. So the entered value must be converted to the required type.
In this case, the string “255” is converted to the integer 255. To convert the entered value to
the desired data type, R provides functions such as as.integer() and as.numeric().
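The conversion step can be sketched as follows. The helper to_integer and the stand-in value raw_input are assumptions for illustration; in an interactive session the string would come from readline().

```r
# Convert text input to an integer; readline() always returns a string
to_integer <- function(text) {
  as.integer(trimws(text))
}

# In an interactive session the value would come from the user:
# raw_input <- readline(prompt = "Enter a number: ")
raw_input <- "255"   # stand-in for what readline() would return

value <- to_integer(raw_input)
print(class(raw_input))   # "character"
print(class(value))       # "integer"
```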
A data structure is a particular way of organizing data in a computer so that it can be used
effectively. The idea is to reduce the space and time complexities of different tasks. Data
structures in R programming are tools for holding multiple values.
R’s base data structures are often organized by their dimensionality (1D, 2D, or nD) and
whether they’re homogeneous (all elements must be of the identical type) or heterogeneous
(the elements can be of various types). This gives rise to the six data structures which are most
frequently used in data analysis.
Vectors
Lists
Dataframes
Matrices
Arrays
Factors
Vectors
A vector is an ordered collection of basic data types of a given length. The key point
is that all the elements of a vector must be of the identical data type, i.e. vectors are
homogeneous data structures. Vectors are one-dimensional data structures.
Example:
Output:
[1] 1 3 5 7 8
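A vector that prints as shown above can be built with c(); the variable name X is an assumption.

```r
# Combine five numbers into a numeric vector
X <- c(1, 3, 5, 7, 8)
print(X)
```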
R Strings
Creation of String in R
R Strings can be created by assigning character values to a variable. These strings can be
further concatenated by using various functions and methods to form a big string.
Example
Output
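String creation and concatenation can be sketched with paste() and paste0(); the variable names and sample text here are illustrative assumptions.

```r
# Create two character strings and join them
first  <- "Hello"
second <- "World"

joined <- paste(first, second)    # paste() inserts a space by default
tight  <- paste0(first, second)   # paste0() joins with no separator

print(joined)
print(tight)
print(nchar(joined))              # number of characters in the string
```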
R Vectors
R vectors are similar to arrays in the C language: they hold multiple data
values of the same type. One key difference is that in R the indexing of a vector starts
from ‘1’ and not from ‘0’. We can create numeric vectors and character vectors as well.
Types of R vectors
Vectors are of different types which are used in R. Following are some of the types of
vectors:
Numeric vectors: Numeric vectors are those which contain numeric values such as integers
and doubles.
Output:
[1] "double"
[1] "integer"
Character vectors: Character vectors contain character (string) values.
Output:
[1] "character"
Logical vectors: Logical vectors in R contain Boolean values such as TRUE and FALSE, with
NA representing missing values.
Output:
[1] "logical"
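The outputs above can be reproduced by calling typeof() on one vector of each kind. The vector names and contents below are assumptions chosen for illustration.

```r
v_double    <- c(4, 5, 6, 7)         # numbers without a suffix are doubles
v_integer   <- c(4L, 5L, 6L, 7L)     # the L suffix gives integers
v_character <- c("red", "green", "blue")
v_logical   <- c(TRUE, FALSE, TRUE, NA)   # NA marks a missing value

print(typeof(v_double))
print(typeof(v_integer))
print(typeof(v_character))
print(typeof(v_logical))
```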
Creating a vector
There are different ways of creating R vectors. Generally, we use the c() function to combine
different elements together.
Output:
using c function 61 4 21 67 89 2
using seq() function 1 3.25 5.5 7.75 10
using colon 2 3 4 5 6 7
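A sketch that produces the three lines above, assuming the vectors are named X, Y and Z and that seq() is called with length.out = 5:

```r
# Using c() to combine individual values
X <- c(61, 4, 21, 67, 89, 2)
cat("using c function", X, "\n")

# Using seq() to get 5 evenly spaced values from 1 to 10
Y <- seq(1, 10, length.out = 5)
cat("using seq() function", Y, "\n")

# Using the colon operator for consecutive integers
Z <- 2:7
cat("using colon", Z, "\n")
```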
Length of R vector
Output:
> length(x)
[1] 5
> length(y)
[1] 3
> length(z)
[1] 4
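The lengths above correspond to vectors with 5, 3 and 4 elements. A sketch with assumed contents:

```r
# Three vectors of different types; the contents are illustrative
x <- c(1, 2, 3, 4, 5)
y <- c("aa", "bb", "cc")
z <- c(TRUE, FALSE, TRUE, TRUE)

# length() returns the number of elements in a vector
print(length(x))
print(length(y))
print(length(z))
```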
R – Lists
A list in R is a generic object consisting of an ordered collection of objects. Lists are one-
dimensional, heterogeneous data structures. The list can be a list of vectors, a list of matrices,
a list of characters and a list of functions, and so on.
A list is a vector but with heterogeneous data elements. A list in R is created with the
list() function. R allows accessing elements of a list by index value. In
R, list indexing starts at 1, unlike many other programming languages where it starts at 0.
Creating a List
To create a List in R you need to use the function called “list()”. In other words, a list is a
generic vector containing other objects. To illustrate how a list looks, we take an example
here. We want to build a list of employees with the details. So for this, we want attributes
such as ID, employee name, and the number of employees.
Example:
empId <- c(1, 2, 3, 4)
empName <- c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp <- 4
empList <- list(empId, empName, numberOfEmp)
print(empList)
Output:
[[1]]
[1] 1 2 3 4
[[2]]
[1] "Debi" "Sandeep" "Subham" "Shiba"
[[3]]
[1] 4
R – Array
Arrays are data storage structures defined by a fixed number of dimensions. Array
elements are stored at contiguous memory locations. One-dimensional arrays
are called vectors, with length being their only dimension. Two-dimensional arrays are
called matrices, consisting of fixed numbers of rows and columns. All elements of an
array have the same data type. The array() function takes a vector as input and arranges
its values according to the specified dimensions.
Creating an Array
An array in R can be created with the array() function. A vector of elements is passed to
array() along with the required dimensions.
Syntax:
array(data, dim = c(nrow, ncol, nmat), dimnames = names)
where,
nrow : Number of rows
ncol : Number of columns
nmat : Number of matrices of dimensions nrow * ncol
dimnames : Default value = NULL.
Otherwise, a list has to be specified which has a name for each component of the dimension.
Each component is either NULL or a vector of length equal to the dim value of that
corresponding dimension.
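The syntax above can be sketched with a small three-dimensional array. The sizes and dimension names below are assumptions chosen for illustration.

```r
# Build a 2 x 3 x 2 array from the values 1..12
arr <- array(
  1:12,
  dim = c(2, 3, 2),
  dimnames = list(c("r1", "r2"), c("c1", "c2", "c3"), c("m1", "m2"))
)

print(dim(arr))

# Elements can be extracted by name or by position
print(arr["r2", "c3", "m2"])   # row 2, column 3 of the second matrix
print(arr[1, 1, 2])            # row 1, column 1 of the second matrix
```

Values fill the array column by column within each matrix, then matrix by matrix.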
Uni-Dimensional Array
A vector is a uni-dimensional array, which is specified by a single dimension, length. A
Vector can be created using ‘c()‘ function. A list of values is passed to the c() function to
create a vector.
Example:
vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print(vec1)
cat("Length of vector :", length(vec1), "\n")
Output:
[1] 1 2 3 4 5 6 7 8 9
Length of vector : 9
R – Matrices
Creating a Matrix
To create a matrix in R, use the matrix() function. Its arguments are the vector of
elements together with the number of rows and the number of columns that the matrix
should have.
Note: By default, matrices are filled in column-wise order; pass byrow = TRUE to fill row by row.
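A <- matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  # No of rows
  nrow = 3,
  # No of columns
  ncol = 3,
  # Fill the matrix row by row
  byrow = TRUE
)
# Naming rows (illustrative names)
rownames(A) <- c("r1", "r2", "r3")
# Naming columns (illustrative names)
colnames(A) <- c("c1", "c2", "c3")
print(A)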
Output:
R Factors
Factors in R are data structures used to represent categorical data, storing it as a set
of levels.
They are stored as integers with a corresponding label for every unique integer. Although
factors may look similar to character vectors, they are integers, and care must be taken when
using them as strings. A factor accepts only a restricted number of distinct values. For
example, a data field such as gender may contain values only from female, male, or
transgender.
In the above example, all the possible cases are known beforehand and are predefined. These
distinct values are known as levels. By default, the levels of a factor are sorted
alphabetically.
Arguments of factor() in R Language
x: The vector that needs to be converted into a factor.
levels: The set of distinct values that the input vector x may take.
labels: A character vector of labels for the levels.
exclude: The values to be excluded when forming the levels.
ordered: A logical value that decides whether the levels are ordered.
nmax: An upper limit on the number of levels.
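# Creating a vector (sample values)
x <- c("female", "male", "male", "female")
print(x)
# Converting the vector x into a factor
# named gender
gender <- factor(x)
print(gender)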
Output
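Because factors are stored as integer codes, it helps to look at the levels and the codes side by side. The sketch below uses an assumed sizes vector and also shows an ordered factor with explicitly chosen levels.

```r
sizes <- c("small", "large", "medium", "small")

# Default factor: levels are sorted alphabetically
f <- factor(sizes)
print(levels(f))       # "large" "medium" "small"
print(as.integer(f))   # integer codes that index into the levels

# Ordered factor: levels given explicitly, so comparisons are meaningful
f_ord <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
print(f_ord[1] < f_ord[2])   # is "small" less than "large"?

# Convert back to text with as.character(), not as.integer()
print(as.character(f))
```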
R – Data Frames
Data frames in R are generic data objects used to store tabular data. A data frame can be
thought of as a matrix in which each column may have a different data type. A data frame
is made up of three principal components: the data, the rows, and the columns.
Create Dataframe in R Programming Language
To create an R data frame use data.frame() command and then pass each of the vectors you
have created as arguments to the function.
Example:
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
print(friend.data)
Output:
friend_id friend_name
1 1 Sachin
2 2 Sourav
3 3 Dravid
4 4 Sehwag
5 5 Dhoni
R dataset
PROGRAMS
Program 1:
Demonstrate the steps for installation of R and R Studio. Perform the following:
a. Assign different type of values to variables and display the type of variable. Assign
different types such as Double, Integer, Logical, Complex and Character and
understand the difference between each data type.
b. Demonstrate Arithmetic and Logical Operations with simple examples.
a. Assign different type of values to variables and display the type of variable. Assign
different types such as Double, Integer, Logical, Complex and Character and understand
the difference between each data type.
# Assigning to Variables
c <- "Hello, R!"   # Character
d <- 3.14159       # Double
i <- 42L           # Integer (the L suffix makes it an integer rather than a double)
l <- TRUE          # Logical
cmp <- 3 + 2i      # Complex
# Arithmetic Operations
cat("d + i:", d + i, "\n")
cat("d * i:", d * i, "\n")
cat("d / i:", d / i, "\n")
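Displaying the type of each value (part a) and the logical operations (part b) can be sketched as follows. The names samples, p and q are assumptions for illustration.

```r
# Part a: display the class of one value of each type
samples <- list(3.14159, 42L, TRUE, 3 + 2i, "Hello, R!")
for (s in samples) {
  cat(format(s), "->", class(s), "\n")
}

# Part b: logical operations on two logical values
p <- TRUE
q <- FALSE
cat("p & q:", p & q, "\n")          # logical AND
cat("p | q:", p | q, "\n")          # logical OR
cat("!p:", !p, "\n")                # logical NOT
cat("xor(p, q):", xor(p, q), "\n")  # exclusive OR
cat("5 > 3:", 5 > 3, "\n")          # comparisons return logical values
```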
OUTPUT:
Program 2:
Assess the Financial Statement of an Organization being supplied with 2 vectors of data:
Monthly Revenue and Monthly Expenses for the Financial Year. (You can create your
own sample data vector for this experiment.) Calculate the following financial metrics:
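# Sample data for monthly revenue and expenses (in $1000 units)
monthly_revenue <- c(50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 155, 165)
monthly_expenses <- c(30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85)
# Calculate profit and profit after tax for each month (Tax Rate is 30%)
profit <- monthly_revenue - monthly_expenses
tax_rate <- 0.30
profit_after_tax <- profit * (1 - tax_rate)
# Determine good months, bad months, best month, and worst month
# (column names other than Month and BadMonth are assumed)
results <- data.frame(Month = month.name, ProfitAfterTax = round(profit_after_tax),
  GoodMonth = profit_after_tax > mean(profit_after_tax),
  BadMonth = profit_after_tax < mean(profit_after_tax), stringsAsFactors = FALSE)
best_month <- which.max(profit_after_tax)
worst_month <- which.min(profit_after_tax)
cat("Good Months:\n")
cat(results$Month[results$GoodMonth], "\n\n")
cat("Bad Months:\n")
cat(results$Month[results$BadMonth], "\n\n")
cat("Best Month (Max Profit after tax):\n")
cat(results$Month[best_month], "\n\n")
cat("Worst Month (Min Profit after tax):\n")
cat(results$Month[worst_month], "\n\n")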
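The presentation rules in the notes (dollar values at $0.01 precision but shown in units of $1000, margin as a whole percentage, results written to CSV) can be sketched as below. The three-month sample figures, the variable names and the file name financial_report.csv are assumptions for illustration.

```r
# Sample figures in dollars for three months (assumed values)
revenue  <- c(50000, 60000, 70000)
expenses <- c(30000, 66000, 40000)

profit <- revenue - expenses
profit_after_tax <- round(profit * (1 - 0.30), 2)   # $0.01 precision

# Presentation: units of $1000 with no decimals, margin in whole percent
profit_after_tax_k <- round(profit_after_tax / 1000)
profit_margin_pct  <- round(profit_after_tax / revenue * 100)

report <- data.frame(
  Month = month.name[1:3],
  ProfitAfterTax_k = profit_after_tax_k,
  ProfitMargin_pct = profit_margin_pct
)

# Write the report to a CSV file in the session's temporary directory
csv_path <- file.path(tempdir(), "financial_report.csv")
write.csv(report, csv_path, row.names = FALSE)
print(report)
```

A negative profit after tax is kept as is, which matches the note that tax may be negative for a given month.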
OUTPUT:
Program 3:
Develop a program to create two 3 X 3 matrices A and B and perform the following
operations: a) Transpose of the matrix b) addition c) subtraction d) multiplication
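# Create two 3 x 3 matrices (sample values assumed for illustration)
A <- matrix(1:9, nrow = 3, ncol = 3)
B <- matrix(9:1, nrow = 3, ncol = 3)
# a) Transpose
A_t <- t(A)
B_t <- t(B)
# b) Addition
sum_AB <- A + B
# c) Subtraction
diff_AB <- A - B
# d) Multiplication (%*% is the matrix product; A * B would be element-wise)
prod_AB <- A %*% B
cat("Matrix A:\n")
print(A)
cat("Matrix B:\n")
print(B)
cat("Transpose of A:\n")
print(A_t)
cat("Transpose of B:\n")
print(B_t)
cat("Addition of A and B:\n")
print(sum_AB)
cat("Subtraction of A and B:\n")
print(diff_AB)
cat("Multiplication of A and B:\n")
print(prod_AB)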
OUTPUT:
Program 4:
Develop a program to find the factorial of given number using recursive function calls.
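A minimal sketch of a recursive factorial, assuming the function name recursive_factorial and a sample input of 5; negative inputs are rejected because the factorial is not defined for them.

```r
# Recursive factorial: n! = n * (n - 1)!, with 0! = 1! = 1
recursive_factorial <- function(n) {
  if (n < 0) {
    stop("n must be a non-negative integer")
  }
  if (n <= 1) {
    return(1)   # base case ends the recursion
  }
  n * recursive_factorial(n - 1)   # recursive call on a smaller problem
}

number <- 5
cat("Factorial of", number, "is", recursive_factorial(number), "\n")
```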
OUTPUT:
Program 5:
Develop an R Program using functions to find all the prime numbers up to a specified
number by the method of Sieve of Eratosthenes
# Function to find all prime numbers up to a specified number using the Sieve of Eratosthenes
sieve_of_eratosthenes <- function(n) {
  if (n < 2) {
    cat("No prime numbers in the specified range.\n")
    return(invisible(integer(0)))
  }
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE # 1 is not prime
  p <- 2
  while (p^2 <= n) {
    if (is_prime[p]) {
      # Mark the multiples of p, starting at p^2 and stopping at n
      for (i in seq(p^2, n, by = p)) {
        is_prime[i] <- FALSE
      }
    }
    p <- p + 1
  }
  primes <- which(is_prime)
  cat("Prime numbers up to", n, "are:\n", primes, "\n")
  invisible(primes)
}
# Example call
sieve_of_eratosthenes(30)
OUTPUT:
Program 6:
The built-in data set mammals contains data on body weight versus brain weight.
Develop R commands to:
a. Find the Pearson and Spearman correlation coefficients. Are they similar?
b. Plot the data using the plot command.
c. Plot the logarithm (log) of each variable and see if that makes a difference.
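Part (a) can be sketched with cor() and its method argument. This assumes the mammals data set is loaded from the MASS package, where it has the columns body and brain; the names pearson and spearman are illustrative.

```r
library(MASS)   # the mammals data set ships with the MASS package

# a) Pearson measures linear association; Spearman works on ranks
pearson  <- cor(mammals$body, mammals$brain, method = "pearson")
spearman <- cor(mammals$body, mammals$brain, method = "spearman")

cat("Pearson correlation: ", pearson, "\n")
cat("Spearman correlation:", spearman, "\n")
```

For part (b), a single call such as plot(mammals$body, mammals$brain) draws the raw scatterplot.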
# c) Plotting the logarithm (log) of each variable and checking the difference
log_body <- log(MASS::mammals$body)
log_brain <- log(MASS::mammals$brain)
plot(log_body, log_brain, xlab = "Log Body Weight", ylab = "Log Brain Weight")
OUTPUT:
Program 7:
Develop R program to create a Data Frame with following details and do the following
operations.
itemCode itemCategory itemPrice
1001 Electronics 700
1002 Desktop Supplies 300
1003 Office Supplies 350
1004 USB 400
1005 CD Drive 800
a) Subset the Data frame and display the details of only those items whose price is greater than or
equal to 350.
b) Subset the Data frame and display only the items where the category is either “Office Supplies”
or “Desktop Supplies”
c) Create another Data Frame called “item-details” with three different fields itemCode,
ItemQtyonHand and ItemReorderLvl and merge the two frames
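itemCategory <- c("Electronics", "Desktop Supplies", "Office Supplies", "USB", "CD Drive")
items_df <- data.frame(itemCode = 1001:1005, itemCategory, itemPrice = c(700, 300, 350, 400, 800))
print(items_df)
print(summary(items_df$itemPrice))
# Subset the data frame for items with price greater than or equal to 350
high_priced_items <- subset(items_df, itemPrice >= 350)
print(high_priced_items)
# Subset the data frame for items with category as "Office Supplies" or "Desktop Supplies"
print(subset(items_df, itemCategory %in% c("Office Supplies", "Desktop Supplies")))
# Second data frame (stock values are assumed sample data), merged on itemCode
item_details <- data.frame(itemCode = 1001:1005, ItemQtyonHand = c(10, 5, 8, 20, 3), ItemReorderLvl = c(4, 2, 3, 10, 1))
print(item_details)
merged_data <- merge(items_df, item_details, by = "itemCode")
print(merged_data)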
OUTPUT:
Program 8:
Let us use the built-in dataset airquality which has Daily air quality measurements in
New York, May to September 1973. Develop R program to generate histogram by using
appropriate arguments for the following statements.
hist(airquality$Ozone, col = "lightgreen", main = "", xlab = "", ylab = "", axes = FALSE)
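The remaining histogram variations can be sketched as below. The titles, colours, axis limits and the output file ozone_hist.pdf are assumptions; the plots are written to a PDF in the temporary directory so the script also runs non-interactively.

```r
# Drop missing readings so hist() and density() use the same data
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]

out_file <- file.path(tempdir(), "ozone_hist.pdf")   # hypothetical output file
pdf(out_file)

# a, b, d) Title and axis names, colours, and axis limits.
# freq = FALSE puts the bars on the density scale for the curve below.
hist(ozone,
     main = "Ozone levels in New York, May to September 1973",
     xlab = "Ozone (ppb)", col = "lightgreen", border = "darkgreen",
     xlim = c(0, 200), ylim = c(0, 0.03), freq = FALSE)

# e) Add a density curve on top of the histogram
lines(density(ozone), col = "red", lwd = 2)

# c) Remove the default axes, then add custom axes and labels
hist(ozone, main = "", xlab = "", ylab = "", axes = FALSE, col = "lightgreen")
axis(1, at = seq(0, 200, by = 50))
axis(2)
title(main = "Ozone histogram with custom axes", xlab = "Ozone (ppb)", ylab = "Frequency")

invisible(dev.off())
```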
OUTPUT:
Program 9:
Design a data frame in R for storing about 20 employee details. Create a CSV file
named “input.csv” that defines all the required information about the employee such as
id, name, salary, start_date, dept. Import into R and do the following analysis.
e) Retrieve the employees in the IT Department whose salary is greater than 20000 and
write these details into another file “output.csv”
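Creating the input file and importing it can be sketched as follows. The six sample employees are invented for illustration (the exercise asks for about 20), and the file is written to the temporary directory rather than the working directory.

```r
# Sample employee records (assumed data)
emp <- data.frame(
  id = 1:6,
  name = c("Asha", "Ravi", "Meena", "Kiran", "Divya", "Arjun"),
  salary = c(25000, 18000, 32000, 21000, 15000, 40000),
  start_date = c("2019-01-15", "2020-03-01", "2018-07-23",
                 "2021-11-05", "2022-02-14", "2017-09-30"),
  dept = c("IT", "HR", "IT", "Finance", "IT", "Operations"),
  stringsAsFactors = FALSE
)

# Write the records to input.csv, then read them back as emp_data
csv_in <- file.path(tempdir(), "input.csv")
write.csv(emp, csv_in, row.names = FALSE)
emp_data <- read.csv(csv_in, stringsAsFactors = FALSE)

# a) Total number of rows and columns
cat("Rows:", nrow(emp_data), "Columns:", ncol(emp_data), "\n")
```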
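emp_data <- read.csv("input.csv")  # assumes input.csv is in the working directory
max_salary <- max(emp_data$salary)  # b) maximum salary
# c) Retrieve the details of the employee with maximum salary
employee_max_salary <- emp_data[emp_data$salary == max_salary, ]
cat("Employee with maximum salary:\n")
print(employee_max_salary)
emp_IT <- emp_data[emp_data$dept == "IT", ]  # d) employees in the IT Department
print(emp_IT)
# e) Retrieve the employees in the IT Department whose salary is greater than 20000
write.csv(subset(emp_IT, salary > 20000), "output.csv", row.names = FALSE)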
OUTPUT:
Program 10 :
Using the built in dataset mtcars which is a popular dataset consisting of the design and
fuel consumption patterns of 32 different automobiles. The data was extracted from the
1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of
automobile design and performance for 32 automobiles (1973-74 models). Format A
data frame with 32 observations on 11 variables : [1] mpg Miles/(US) gallon, [2] cyl
Number of cylinders [3] disp Displacement (cu.in.), [4] hp Gross horsepower [5] drat
Rear axle ratio,[6] wt Weight (lb/1000) [7] qsec 1/4 mile time, [8] vs V/S, [9] am
Transmission (0 = automatic, 1 = manual), [10] gear Number of forward gears, [11] carb
Number of carburetors. Develop R program, to solve the following:
a) What is the total number of observations and variables in the dataset?
b) Find the car with the largest hp and the least hp using suitable functions
c) Plot histogram / density for each variable and determine whether continuous
variables are normally distributed or not. If not, what is their skewness?
d) What is the average difference of gross horse power(hp) between automobiles with 3
and 4 number of cylinders(cyl)? Also determine the difference in their standard
deviations.
e) Which pair of variables has the highest Pearson correlation?
# a) Total number of observations and variables
observations <- nrow(mtcars)
variables <- ncol(mtcars)
cat("Total number of observations:", observations, "\n")
cat("Total number of variables:", variables, "\n")
# c) Histogram / density plot and skewness
par(mfrow = c(4, 3), mar = c(3, 3, 1, 1))  # 4 x 3 grid with adjusted margin size
for (i in 1:ncol(mtcars)) {
  # freq = FALSE puts the bars on the density scale so the curve is comparable
  hist(mtcars[, i], main = names(mtcars)[i], xlab = "", col = "skyblue", freq = FALSE)
  lines(density(mtcars[, i]), col = "red")  # Adding density curve
}
# Calculate skewness
library(e1071)  # provides skewness(); the package must be installed first
skew <- sapply(mtcars, skewness)  # apply the skewness() function to each column
cat("Skewness of variables:\n")
print(skew)
# e) Pair of variables with the highest Pearson correlation
cor_matrix <- cor(mtcars)
diag(cor_matrix) <- 0  # Exclude diagonal values
max_corr <- which(cor_matrix == max(cor_matrix), arr.ind = TRUE)
cat("Pair of variables with the highest Pearson correlation:",
    rownames(cor_matrix)[max_corr[1, 1]], "and",
    colnames(cor_matrix)[max_corr[1, 2]], "\n")
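Parts (b) and (d) can be sketched as below. The helper hp_gap and its parameters cyl_a and cyl_b are hypothetical. Since mtcars contains only 4-, 6- and 8-cylinder cars, the call uses 4 and 6 as an assumed stand-in for the pair named in the question.

```r
# b) Cars with the largest and the least gross horsepower
car_max_hp <- rownames(mtcars)[which.max(mtcars$hp)]
car_min_hp <- rownames(mtcars)[which.min(mtcars$hp)]
cat("Largest hp:", car_max_hp, "\n")
cat("Least hp:", car_min_hp, "\n")

# d) Difference in mean and standard deviation of hp between two cylinder groups
hp_gap <- function(cyl_a, cyl_b) {
  hp_a <- mtcars$hp[mtcars$cyl == cyl_a]
  hp_b <- mtcars$hp[mtcars$cyl == cyl_b]
  c(mean_diff = mean(hp_a) - mean(hp_b),
    sd_diff   = sd(hp_a) - sd(hp_b))
}

print(table(mtcars$cyl))   # shows which cylinder counts are present
print(hp_gap(4, 6))        # assumed pair of cylinder counts
```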
OUTPUT:
Program 11:
Demonstrate the progression of salary with years of experience using a suitable data set
(You can create your own dataset). Plot the graph visualizing the best fit line on the plot
of the given data points. Plot a curve of Actual Values vs. Predicted values to show their
correlation and performance of the model.
Interpret the meaning of the slope and y-intercept of the line with respect to the given
data. Implement using lm function. Save the graphs and coefficients in files. Attach the
predicted values of salaries as a new column to the original data set and save the data as
a new CSV file.
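set.seed(123)
# Generating salaries (synthetic data; the coefficients below are assumed for illustration)
Experience <- 1:20
Salary <- 30000 + 2500 * Experience + rnorm(20, mean = 0, sd = 3000)
salary_data <- data.frame(Experience, Salary)
model <- lm(Salary ~ Experience, data = salary_data)
predicted_values <- predict(model)
png("Salary_Experience_Plot03.png")
plot(Experience, Salary, main = "Salary vs. Years of Experience", xlab = "Years of Experience", ylab = "Salary", col = "blue")
abline(model, col = "red")  # best-fit line
dev.off()
jpeg("Actual_vs_Predicted_Salary.jpg")
plot(Salary, predicted_values, main = "Actual vs. Predicted Salaries", xlab = "Actual Salary",
ylab = "Predicted Salary", col = "green")
dev.off()
# Save the coefficients and the data with predictions (output file names are assumed)
write.csv(coef(summary(model)), "Salary_Coefficients.csv")
salary_data$Predicted_Salary <- predicted_values
write.csv(salary_data, "Salary_With_Predictions.csv", row.names = FALSE)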
OUTPUT:
1. Explain what R is.
R is data analysis software which is used by analysts, quants, statisticians, data
scientists and others.
Mean
Median
Distribution
Covariance
Regression
Non-linear
Mixed Effects
GLM
GAM. etc.
You can enter data directly via Data New Data Set
Import data from a plain text (ASCII) or other files (SPSS, Minitab, etc.)
Read a data set either by typing the name of the data set or selecting the data set in the dialog
box
# subtraction
# division
# note order of operations exists
11. What are the data structures in R that are used to perform statistical analyses and create
graphs?
R has data structures like
Vectors
Matrices
Arrays
Data frames
A data frame is made up of rows and columns, where each row denotes an observation
or record and each column a variable or attribute. A data frame’s columns can hold a variety
of data types, including logical, character, factor, and numeric ones, which makes it suitable
for storing and managing mixed data.
15. Definitions
A vector is simply a list of items that are of the same type.
A list in R can contain many different data types inside it. A list is a collection of data
which is ordered and changeable.
To create a list, use the list() function
A matrix is a two dimensional data set with columns and rows.
A column is a vertical representation of data, while a row is a horizontal representation
of data.
A matrix can be created with the matrix() function. Specify the nrow and ncol
parameters to set the number of rows and columns
Compared to matrices, arrays can have more than two dimensions.
We can use the array() function to create an array, and the dim parameter to specify the
dimensions
Data Frames are data displayed in a tabular format.
Data Frames can hold different types of data. While the first column can be
character, the second and third can be numeric or logical. However, each column should
have the same type of data.
Use the data.frame() function to create a data frame:
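The definitions above can be sketched in one short script. All variable names and sample values are assumptions for illustration.

```r
# Vector: items of the same type
v <- c(1.5, 2.5, 3.5)

# List: items of different types
l <- list(name = "Asha", scores = c(90, 85), passed = TRUE)

# Matrix: two dimensions, set with nrow and ncol
m <- matrix(1:6, nrow = 2, ncol = 3)

# Array: more than two dimensions, set with dim
a <- array(1:24, dim = c(2, 3, 4))

# Data frame: a table whose columns may differ in type
df <- data.frame(
  name   = c("A", "B", "C"),
  age    = c(21, 25, 30),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

str(df)   # compact summary of the column types
```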