
R Programming Notes


PROGRAMMING

SCHOOL OF INFORMATION TECHNOLOGY


RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA
BHOPAL


Mr. Gajendra kumar Ahirwar, SOIT RGPV BHOPAL Page 1


R statistical programming language: Introduction to R, Functions, Control flow and
Loops, Working with Vectors and Matrices, Reading in Data, Writing Data, Working
with Data, Manipulating Data, Simulation, Linear model, Data Frame, Graphics in R.

What is R?

Introduction to R
R is a language and environment for statistical computing and graphics. It is a GNU
project which is similar to the S language and environment which was developed at Bell
Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues.
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical
tests, time-series analysis, classification, clustering) and graphical techniques, and is highly
extensible.
One of R’s strengths is the ease with which well-designed publication-quality plots can be
produced, including mathematical symbols and formulae where needed. R is available as Free
Software under the terms of the Free Software Foundation’s GNU General Public License in
source code form. It compiles and runs on a wide variety of UNIX platforms and similar
systems (including FreeBSD and Linux), Windows and MacOS.

The R environment
R is an integrated suite of software facilities for data manipulation, calculation and graphical
display. It includes
• an effective data handling and storage facility,
• a suite of operators for calculations on arrays, in particular matrices,
• a large, coherent, integrated collection of intermediate tools for data analysis,
• graphical facilities for data analysis and display either on-screen or on hardcopy, and
• a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions and input and output facilities.
The term “environment” is intended to characterize it as a fully planned and coherent system,
rather than an incremental accretion of very specific and inflexible tools, as is frequently the
case with other data analysis software.
R, like S, is designed around a true computer language, and it allows users to add additional
functionality by defining new functions. Much of the system is itself written in the R dialect
of S, which makes it easy for users to follow the algorithmic choices made. For
computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run
time. Advanced users can write C code to manipulate R objects directly.
Many users think of R as a statistics system. We prefer to think of it as an environment within
which statistical techniques are implemented. R can be extended (easily) via packages. There
are about eight packages supplied with the R distribution and many more are available
through the CRAN family of Internet sites covering a very wide range of modern statistics.
R has its own LaTeX-like documentation format, which is used to supply comprehensive
documentation, both on-line in a number of formats and in hardcopy.

Some commonly used online R compilers are:

• JDoodle online R Editor
• Paiza.io online R Compiler
• Ideone R Compiler

R Comments
Comments are portions of a computer program that are used to describe a piece of code. For
example,

# declare variable
age = 24

# print variable
print(age)

Here, # declare variable and # print variable are two comments used in the code.
Comments have nothing to do with code logic. They do not get interpreted or compiled and
are completely ignored during the execution of the program.

Types of Comments in R
Many programming languages support two types of comments:
• single-line comments
• multi-line comments
However, in R programming, there is no functionality for multi-line comments. Thus, you
can only write single-line comments in R.

1. R Single-Line Comments
You use the # symbol to create single-line comments in R. For example,

# this code prints Hello World
print("Hello World")


Output

[1] "Hello World"

In the above example, we have printed the text Hello World to the screen. Here, just before
the print statement, we have included a single-line comment using the # symbol.
Note: You can also include a single-line comment in the same line after the code. For
example,

print("Hello World") # this code prints Hello World

2. R Multi-Line Comments
As already mentioned, R does not have any syntax to create multi-line comments.
However, you can use consecutive single-line comments to create a multi-line comment in R.
For example,

# this is a print statement
# it prints Hello World
print("Hello World")

Output

[1] "Hello World"

In the above code, we have used multiple consecutive single-line comments to create a multi-
line comment just before the print statement.

Purpose of Comments
As discussed above, R comments are used only to document code. This helps others
understand how our code works.
Here are a few purposes of comments in R code:
• They increase the readability of the program for people other than its original developers.
• They can provide metadata about the code or the overall project.
• Programmers often use them to temporarily disable ("comment out") pieces of code during testing.
• They can be used to write a simple pseudo-code outline of the program.



R Variables and Constants
In computer programming, a variable is a named memory location where data is stored. For
example,

x = 13.8

Here, x is the variable where the data 13.8 is stored. Now, whenever we use x in our
program, we will get 13.8.

x = 13.8

# print variable
print(x)

Output

[1] 13.8

As you can see, when we print x we get 13.8 as output.

Rules to Declare R Variables


We can choose almost any name for our variables. However, there are certain
rules that need to be followed while creating a variable:
• A variable name in R can contain letters, digits, periods, and underscores.
• A variable name must start with a letter or a period; it cannot start with a digit or an underscore.
• If a variable name starts with a period, the next character cannot be a digit.
• R is case sensitive. This means that age and Age are treated as different variables.
• Reserved words such as if, else, for, function, TRUE, and FALSE cannot be used as variable names.

Note: In earlier versions of R, the period . was commonly used to join words in a multi-word
variable name, such as first.name or my.age. Nowadays, _ is more common in multi-word
names, for example first_name or my_age.

Types of R Variables



Depending on the type of data that you want to store, variables can be divided into the
following types.
1. Boolean Variables
It stores a logical value, which is either TRUE or FALSE. Here, TRUE means yes
and FALSE means no. For example,

a = TRUE

print(a)
print(class(a))

Output

[1] TRUE
[1] "logical"

Here, we have declared the boolean variable a with the value TRUE. Boolean variables
belong to the logical class so class(a) returns "logical".

2. Integer Variables
It stores numeric data without any decimal values. For example,

A = 14L

print(A)
print(class(A))

Output

[1] 14
[1] "integer"

Here, the suffix L marks 14 as an integer value. In R, integer variables belong to the integer class,
so class(A) returns "integer".



3. Floating Point Variables
It stores numeric data with decimal values. For example,

x = 13.4

print(x)
print(class(x))

Output

[1] 13.4
[1] "numeric"

Here, we have created a floating point variable named x. You can see that the floating point
variable belongs to the numeric class.
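class() reports the high-level class of a value, while typeof() reports how R stores it internally. A short sketch: a decimal number such as 13.4 is stored as a double-precision value, and is.numeric() is TRUE for both doubles and integers.

```r
x <- 13.4   # decimal value
A <- 14L    # integer value (L suffix)

print(class(x))       # "numeric"
print(typeof(x))      # "double": stored as a double-precision number
print(typeof(A))      # "integer"
print(is.numeric(A))  # TRUE: integers also count as numeric
```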

4. Character Variables
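It stores a single character. R has no separate type for single characters, so a single letter is
stored as an ordinary character string. For example,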

alphabet = "a"

print(alphabet)
print(class(alphabet))

Output

[1] "a"
[1] "character"

Here, we have created a character variable named alphabet. Since character variables belong
to the character class, class(alphabet) returns "character".

5. String Variables
It stores data that is composed of more than one character. We use double quotes to represent
string data. For example,



message = "Welcome to Programiz!"

print(message)
print(class(message))

Output

[1] "Welcome to Programiz!"
[1] "character"

Here, we have created a string variable named message. You can see that the string variable
also belongs to the character class.
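Since single characters and longer strings both belong to the character class, the difference lies only in how many characters the value holds. A minimal sketch using base R's nchar(), which counts characters, and length(), which counts elements:

```r
alphabet <- "a"
message <- "Welcome to Programiz!"

print(nchar(alphabet))   # 1 character
print(nchar(message))    # 21 characters, including spaces and "!"
print(length(message))   # 1: the whole string is a single element
```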

Changing Value of Variables


Depending on the conditions or information passed into the program, you can change the
value of a variable. For example,

message = "Hello World!"
print(message)

# changing value of a variable
message <- "Welcome to Programiz!"
print(message)

Output

[1] "Hello World!"
[1] "Welcome to Programiz!"

In this program,
• "Hello World!" - initial value of message
• "Welcome to Programiz!" - changed value of message
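The program above uses both = and <- for assignment. At the top level of a script they have the same effect, and <- is the style most R code uses. A small sketch (the variable names are only illustrative) that also shows the rightward form ->:

```r
a = 5        # assignment with =
b <- 5       # assignment with <-
5 -> c_val   # rightward assignment with ->

print(identical(a, b))      # TRUE
print(identical(b, c_val))  # TRUE
```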

Types of R Constants
In R, we have the following types of constants.
• There are five basic types of R constants: numeric, integer, complex, logical, and string (character).
• In addition, R has four special constants: NULL, NA, Inf, and NaN.
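The constants listed above can be inspected directly. This sketch prints the class of one constant of each basic type and then shows how the four special constants arise; the specific values are just examples.

```r
# basic constants
print(class(3.5))     # "numeric"
print(class(3L))      # "integer"
print(class(2 + 3i))  # "complex"
print(class(TRUE))    # "logical"
print(class("R"))     # "character"

# special constants
print(is.null(NULL))  # TRUE: NULL means "no value at all"
print(is.na(NA))      # TRUE: NA marks a missing value
print(1 / 0)          # Inf: positive infinity
print(0 / 0)          # NaN: "not a number"
```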

R Data Types
A variable can store different types of values such as numbers, characters etc. These different
types of data that we can use in our code are called data types. For example,

x <- 123L

Here, 123L is an integer value, so the data type of the variable x is integer.
We can verify this by printing the class of x.

x <- 123L

# print value of x
print(x)

# print type of x
print(class(x))

Output

[1] 123
[1] "integer"

Here, x is a variable of data type integer.

Different Types of Data Types


In R, there are 6 basic data types:
• logical
• numeric
• integer
• complex
• character
• raw
Let's discuss each of these R data types one by one.



1. Logical Data Type
The logical data type in R is also known as boolean data type. It can only have two
values: TRUE and FALSE. For example,

bool1 <- TRUE

print(bool1)
print(class(bool1))

bool2 <- FALSE

print(bool2)
print(class(bool2))

Output

[1] TRUE
[1] "logical"
[1] FALSE
[1] "logical"

In the above example,
• bool1 has the value TRUE,
• bool2 has the value FALSE.
Here, we get "logical" when we check the type of both variables.
Note: You can also define logical variables with a single letter -
T for TRUE or F for FALSE. For example,

is_weekend <- F
print(class(is_weekend)) # "logical"
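One caution about the note above: T and F are ordinary variables that are preset to TRUE and FALSE, so they can be overwritten, whereas TRUE and FALSE are reserved words. A small sketch of the difference, which is why spelling out TRUE and FALSE is usually safer:

```r
T <- 0     # allowed, but this hides the built-in T
print(T)   # 0
rm(T)      # remove our T so the built-in value is visible again
print(T)   # TRUE

# Assigning to the reserved word TRUE is an error; tryCatch() catches it.
result <- tryCatch(
  eval(parse(text = "TRUE <- 0")),
  error = function(e) "error: TRUE cannot be reassigned"
)
print(result)
```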

2. Numeric Data Type


In R, the numeric data type represents all real numbers with or without decimal values. For
example,

# floating point values
weight <- 63.5

print(weight)
print(class(weight))

# whole numbers without the L suffix are also numeric
height <- 182

print(height)
print(class(height))

Output

[1] 63.5
[1] "numeric"
[1] 182
[1] "numeric"

Here, both weight and height are variables of numeric type.
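Because a whole number such as 182 is stored as numeric unless it carries the L suffix, the two can be told apart with is.integer(). The sketch also shows that as.integer() converts a decimal value by dropping (truncating) its fractional part rather than rounding it.

```r
height <- 182       # numeric, even though there is no decimal part
height_int <- 182L  # integer, because of the L suffix

print(is.integer(height))      # FALSE
print(is.integer(height_int))  # TRUE
print(as.integer(63.5))        # 63: the decimal part is dropped
```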

3. Integer Data Type


The integer data type represents whole numbers, that is, numbers without a decimal part. We use
the suffix L to specify integer data. For example,

integer_variable <- 186L

print(class(integer_variable))

Output

[1] "integer"

Here, 186L is an integer value, so we get "integer" when we print the class
of integer_variable.
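Arithmetic on integers does not always stay integer. In this sketch, ordinary division with / returns a numeric result, while the integer-division operator %/% keeps the integer type.

```r
a <- 5L
b <- 2L

print(class(a + b))    # "integer"
print(a / b)           # 2.5
print(class(a / b))    # "numeric"
print(a %/% b)         # 2
print(class(a %/% b))  # "integer"
```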



4. Complex Data Type
The complex data type is used to store complex numbers, which have a real part and an
imaginary part. We use the suffix i to specify the imaginary part. For example,

# 2i represents the imaginary part
complex_value <- 3 + 2i

# print class of complex_value
print(class(complex_value))

Output

[1] "complex"

Here, 3 + 2i is of complex data type because it has an imaginary part 2i.
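The two parts of a complex value can be extracted with base R's Re() and Im(), and Mod() gives its modulus. The sketch uses 3 + 4i so that the modulus comes out as a whole number; it also shows that taking the square root of -1 needs a complex input.

```r
z <- 3 + 4i

print(Re(z))   # 3: real part
print(Im(z))   # 4: imaginary part
print(Mod(z))  # 5: sqrt(3^2 + 4^2)

# sqrt(-1) on a plain number gives NaN with a warning,
# but on a complex number it returns the imaginary unit.
print(sqrt(as.complex(-1)))  # 0+1i
```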

5. Character Data Type


The character data type is used to specify character or string values in a variable.
In programming, a string is a set of characters. For example, 'A' is a single character
and "Apple" is a string.
You can use single quotes '' or double quotes "" to represent strings; R treats both the same way.
As a convention, these notes use:
• '' for character variables
• "" for string variables
For example,

# create a string variable
fruit <- "Apple"

print(class(fruit))

# create a character variable
my_char <- 'A'

print(class(my_char))

Output

[1] "character"
[1] "character"

Here, both the variables - fruit and my_char - are of character data type.
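As the output shows, R makes no distinction between the two kinds of quotes. A short sketch: both quote styles produce identical values, and using one kind of quote lets you place the other inside a string without escaping it.

```r
print(identical('Apple', "Apple"))  # TRUE

s1 <- "It's sunny"       # single quote inside double quotes
s2 <- 'She said "hi"'    # double quotes inside single quotes

print(nchar(s1))         # 10
cat(s2, "\n")            # She said "hi"
```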

6. Raw Data Type


A raw data type stores values as raw bytes. You can use the following functions to convert
character data to raw data and vice versa:
• charToRaw() - converts character data to raw data
• rawToChar() - converts raw data to character data
For example,

# convert character to raw
raw_variable <- charToRaw("Welcome to Programiz")

print(raw_variable)
print(class(raw_variable))

# convert raw to character
char_variable <- rawToChar(raw_variable)

print(char_variable)
print(class(char_variable))

Output

[1] 57 65 6c 63 6f 6d 65 20 74 6f 20 50 72 6f 67 72 61 6d 69 7a
[1] "raw"
[1] "Welcome to Programiz"
[1] "character"

In this program,
• We first used the charToRaw() function to convert the string "Welcome to Programiz" to raw
bytes. Each byte is printed in hexadecimal; for example, 57 is the byte for the letter W.
This is why we get "raw" as output when we print the class of raw_variable.
• Then, we used the rawToChar() function to convert the data in raw_variable back to
character form.
This is why we get "character" as output when we print the class of char_variable.
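Because each raw byte is printed in hexadecimal, it can help to see the same byte in decimal. A small sketch that converts a byte with as.integer() and builds a string from byte values with as.raw(); the byte values 72 and 105 are the codes for "H" and "i".

```r
bytes <- charToRaw("A")
print(bytes)              # 41: hexadecimal byte for "A"
print(as.integer(bytes))  # 65: the same byte in decimal

# 72 is the byte for "H" and 105 is the byte for "i"
word <- rawToChar(as.raw(c(72, 105)))
print(word)               # "Hi"
```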

R CONTROL FLOW
R if...else
In computer programming, the if statement allows us to create a decision-making program.
A decision-making program runs one block of code under one condition and another block of
code under different conditions. For example,
• If age is greater than 18, allow the person to vote.
• If age is not greater than 18, don't allow the person to vote.

R if Statement
The syntax of an if statement is:

if (test_expression) {
  # body of if
}

Here, the test_expression is a logical (boolean) expression that evaluates to either TRUE or
FALSE. If the test_expression is
• TRUE - the body of the if statement is executed
• FALSE - the body of the if statement is skipped

Example: R if Statement

x <- 3

# check if x is greater than 0
if (x > 0) {
  print("The number is positive")
}

print("Outside if statement")

Output

[1] "The number is positive"
[1] "Outside if statement"

In the above program, the test condition x > 0 is TRUE. Hence, the code inside the curly braces
is executed. The final print() statement is outside the if block, so it runs in either case.


R if...else Statement
We can also use an optional else statement with an if statement. The syntax of an if...else
statement is:

if (test_expression) {
  # body of if statement
} else {
  # body of else statement
}

The if statement evaluates the test_expression inside the parentheses.
If the test_expression is TRUE,
• the body of if is executed
• the body of else is skipped
If the test_expression is FALSE,
• the body of else is executed
• the body of if is skipped

Example: R if...else Statement

age <- 15

# check if age is greater than 18
if (age > 18) {
  print("You are eligible to vote.")
} else {
  print("You cannot vote.")
}

Output

[1] "You cannot vote."

In the above program, we have created a variable named age. Here, the test expression is

age > 18

Since age is 15, the test expression is FALSE. Hence, the code inside the else block is
executed.
Now, let's change the value of age to 31.

age <- 31

Now, if we run the program, the output will be:

[1] "You are eligible to vote."
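To try several ages without editing the variable each time, the example can be wrapped in a function (functions are covered later in these notes). The name check_vote() is hypothetical. Note that the condition uses >, so an age of exactly 18 is not eligible; use >= instead if 18-year-olds should qualify.

```r
# Hypothetical helper wrapping the if...else example
check_vote <- function(age) {
  if (age > 18) {
    "You are eligible to vote."
  } else {
    "You cannot vote."
  }
}

print(check_vote(15))  # "You cannot vote."
print(check_vote(31))  # "You are eligible to vote."
print(check_vote(18))  # "You cannot vote." because 18 > 18 is FALSE
```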

Example: Check Negative and Positive Number

x <- 12

# check if x is positive or negative number
if (x > 0) {
  print("x is a positive number")
} else {
  print("x is a negative number")
}

Output

[1] "x is a positive number"


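Here, since x > 0 evaluates to TRUE, the code inside the if block gets executed, and the
code inside the else block is skipped. Note that this program would also print "x is a negative
number" when x is 0, because 0 > 0 is FALSE.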

R if...else if...else Statement


If you want to test more than one condition, you can use the optional else if statement along
with your if...else statements. The syntax is:

if (test_expression1) {
  # code block 1
} else if (test_expression2) {
  # code block 2
} else {
  # code block 3
}

Here,
• If test_expression1 evaluates to TRUE, code block 1 is executed.
• If test_expression1 evaluates to FALSE, then test_expression2 is evaluated.
  o If test_expression2 is TRUE, code block 2 is executed.
  o If test_expression2 is FALSE, code block 3 is executed.

Example: R if...else if...else Statement

x <- 0

# check if x is positive or negative or zero
if (x > 0) {
  print("x is a positive number")
} else if (x < 0) {
  print("x is a negative number")
} else {
  print("x is zero")
}

Output

[1] "x is zero"

In the above example, we have created a variable named x with the value 0. Here, we have
two test expressions:
• if (x > 0) - checks if x is greater than 0
• else if (x < 0) - checks if x is less than 0
Here, both test conditions are FALSE. Hence, the statement inside the body of else is
executed.
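To test the whole if...else if...else chain on several values at once, the sketch below wraps it in a hypothetical function, classify_sign(), and applies it with base R's vapply(), which calls the function on each element and checks that every result is a single character string.

```r
# Hypothetical helper built from the if...else if...else example
classify_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

values <- c(-3, 0, 5)
labels <- vapply(values, classify_sign, character(1))
print(labels)  # "negative" "zero" "positive"
```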

Nested if...else Statements in R


You can have nested if...else statements inside if...else blocks in R. This is called nested
if...else statement.
This allows you to specify conditions inside conditions. For example,

x <- 20

# check if x is positive
if (x > 0) {

  # check if x is even or odd
  if (x %% 2 == 0) {
    print("x is a positive even number")
  } else {
    print("x is a positive odd number")
  }

} else {  # execute if x is not positive

  # check if x is even or odd
  if (x %% 2 == 0) {
    print("x is a negative even number")
  } else {
    print("x is a negative odd number")
  }
}

Output

[1] "x is a positive even number"

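In this program, the outer if...else block checks whether x is positive. If x is
greater than 0, the code inside the outer if block is executed.
Otherwise (x is negative or zero), the code inside the outer else block is executed. Because of
this, the program would describe 0 as a "negative even number"; handling zero separately
would need an extra else if (x == 0) branch.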

if (x > 0) {
... .. ...
} else {
... .. ...
}

The inner if...else block checks whether x is even or odd. If x is exactly divisible by 2 (that is,
x %% 2 is 0), the code inside the inner if block is executed. Otherwise, the code inside the
inner else block is executed.

if (x %% 2 == 0) {
... .. ...
} else {
... .. ...
}
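The even/odd test relies on the modulo operator %%. In R, the result of x %% 2 takes the sign of the divisor, so a negative odd number gives 1 rather than -1 and the same test works in both branches above. A quick check:

```r
print(20 %% 2)   # 0: even
print(7 %% 2)    # 1: odd
print(-7 %% 2)   # 1: still 1 for a negative odd number
print(-4 %% 2)   # 0: negative even number
```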

R LOOP

In programming, loops are used to repeat a block of code as long as the specified condition is
satisfied. Loops help you to save time, avoid repeatable blocks of code, and write cleaner
code.
In R, there are three types of loops:
• while loops
• for loops
• repeat loops

R while Loop
while loops are used when you don't know the exact number of times a block of code is to be
repeated. The basic syntax of while loop in R is:



while (test_expression) {
  # block of code
}

• Here, the test_expression is first evaluated.
• If the result is TRUE, then the block of code inside the while loop gets executed.
• Once the execution is completed, the test_expression is evaluated again and the same process
is repeated until the test_expression evaluates to FALSE.
• The while loop will terminate when the boolean expression returns FALSE.

Example 1: R while Loop


Let's look at a program to calculate the sum of the first ten natural numbers.

# variable to store current number
number = 1

# variable to store current sum
sum = 0

# while loop to calculate sum
while (number <= 10) {

  # calculate sum
  sum = sum + number

  # increment number by 1
  number = number + 1
}

print(sum)

Output

[1] 55

Here, we have declared two variables: number and sum. The test_expression inside
the while statement is number <= 10.
This means that the while loop will continue to execute and calculate the sum as long as the
value of number is less than or equal to 10.
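The loop's result can be cross-checked against two other ways of computing the same total: the built-in sum() function applied to 1:10, and the formula n(n + 1)/2 for the first n natural numbers. This sketch uses the names total and n instead of sum, to avoid confusion with the built-in sum() function.

```r
total <- 0
n <- 1
while (n <= 10) {
  total <- total + n
  n <- n + 1
}

print(total)        # 55 from the while loop
print(sum(1:10))    # 55 from the built-in sum()
print(10 * 11 / 2)  # 55 from n * (n + 1) / 2 with n = 10
```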

Example 2: while Loop With break Statement


The break statement in R can be used to stop the execution of a while loop even when the
test expression is TRUE. For example,

number = 1

# while loop to print numbers from 1 to 5
while (number <= 10) {
  print(number)

  # increment number by 1
  number = number + 1

  # break if number is 6
  if (number == 6) {
    break
  }
}

Output

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

In this program, we have used a break statement inside the while loop, which breaks the loop
as soon as the condition inside the if statement is evaluated to TRUE.

if (number == 6) {
  break
}

Hence, the loop terminates when the number variable equals 6. Therefore, only the
numbers 1 to 5 are printed.

Example 3: while Loop With next Statement


You can use the next statement in a while loop to skip an iteration even if the test condition
is TRUE. For example,

number = 1

# while loop to print odd numbers from 1 to 10
while (number <= 10) {

  # skip iteration if number is even
  if (number %% 2 == 0) {
    number = number + 1
    next
  }

  # print number if odd
  print(number)

  # increment number by 1
  number = number + 1
}

Output

[1] 1
[1] 3
[1] 5
[1] 7
[1] 9

This program only prints the odd numbers in the range of 1 to 10. To do this, we have used
an if statement inside the while loop to check if number is divisible by 2.
Here,
• if number is divisible by 2, then its value is simply incremented by 1 and the iteration is
skipped using the next statement. The increment must come before next; otherwise number
would stay even and the loop would never end.
• if number is not divisible by 2, then the variable is printed and its value is incremented by 1.

R for Loop
A for loop is used to iterate over a list, vector or any other object of elements. The syntax
of for loop is:

for (value in sequence) {
  # block of code
}

Here, sequence is an object of elements and value takes in each of those elements. In each
iteration, the block of code is executed. For example,

numbers = c(1, 2, 3, 4, 5)

# for loop to print all elements in numbers
for (x in numbers) {
  print(x)
}

Output

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

In this program, we have used a for loop to iterate through a sequence of numbers
called numbers. In each iteration, the variable x stores the element from the sequence and the
block of code is executed.
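When the loop body needs the position of each element as well as its value, a common approach is to loop over seq_along(), which generates the indices 1, 2, ..., length(x). It is safer than 1:length(x), because for an empty vector 1:length(x) produces c(1, 0) instead of nothing. A short sketch:

```r
numbers <- c(10, 20, 30)

for (i in seq_along(numbers)) {
  cat("Element", i, "is", numbers[i], "\n")
}

empty <- c()
print(seq_along(empty))   # integer(0): a loop over this would never run
print(1:length(empty))    # 1 0: a loop over this would run twice
```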

Example 1: Count the Number of Even Numbers


Let's use a for loop to count the number of even numbers stored inside a vector of numbers.

# vector of numbers
num = c(2, 3, 12, 14, 5, 19, 23, 64)

# variable to store the count of even numbers
count = 0

# for loop to count even numbers
for (i in num) {

  # check if i is even
  if (i %% 2 == 0) {
    count = count + 1
  }
}

print(count)

Output

[1] 4

In this program, we have used a for loop to count the number of even numbers in
the num vector. Here is how this program works:
• We first initialized the count variable to 0. We use this variable to store the count of even
numbers in the num vector.
• We then use a for loop to iterate through the num vector using the variable i.

for (i in num) {
  # code block
}

• Inside the for loop, we check if each element is divisible by 2 or not. If yes, then we
increment count by 1.

if (i %% 2 == 0) {
  count = count + 1
}
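The same count can also be obtained without an explicit loop. In this sketch, num %% 2 == 0 produces a logical vector, and sum() treats TRUE as 1 and FALSE as 0, so it counts the even elements.

```r
num <- c(2, 3, 12, 14, 5, 19, 23, 64)

is_even <- num %% 2 == 0   # TRUE where the element is even
print(is_even)
print(sum(is_even))        # 4
```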

Example 2: for Loop With break Statement


You can use the break statement to exit from the for loop in any iteration. For example,

# vector of numbers
numbers = c(2, 3, 12, 14, 5, 19, 23, 64)

# for loop with break
for (i in numbers) {

  # break the loop if number is 5
  if (i == 5) {
    break
  }

  print(i)
}

Output

[1] 2
[1] 3
[1] 12
[1] 14

Here, we have used an if statement inside the for loop. If the current element is equal to 5, we
break the loop using the break statement. After that, no further iterations are executed.

R repeat Loop
You can use the repeat loop in R to execute a block of code multiple times. However, the repeat
loop does not have any condition to terminate the loop. You need to put an exit condition
explicitly with a break statement inside the loop.
The syntax of repeat loop is:

repeat {
  # statements

  if (stop_condition) {
    break
  }
}

Here, we have used the repeat keyword to create a repeat loop. It is different from
the for and while loop because it does not use a predefined condition to exit from the loop.

Example 1: R repeat Loop


Let's see an example that will print numbers using a repeat loop and will execute until
the break statement is executed.

x = 1

# Repeat loop
repeat {

  print(x)

  # Break statement to terminate if x > 4
  if (x > 4) {
    break
  }

  # Increment x by 1
  x = x + 1
}

Output

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

Here, we have used a repeat loop to print numbers from 1 to 5. We have used an if statement
to provide a breaking condition which breaks the loop if the value of x is greater than 4.
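A repeat loop is handy when the number of iterations is not known in advance. As an illustrative sketch (the task itself is just an example), the loop below keeps doubling a value and stops as soon as it exceeds 1000:

```r
value <- 1
steps <- 0

repeat {
  value <- value * 2
  steps <- steps + 1

  # exit condition checked inside the loop
  if (value > 1000) {
    break
  }
}

print(value)  # 1024
print(steps)  # 10
```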



Example 2: Infinite repeat Loop
If you fail to put a break statement inside a repeat loop, it will lead to an infinite loop. For
example,

x = 1
sum = 0

# Repeat loop with no break, so it never stops on its own
repeat {

  # Calculate sum
  sum = sum + x

  # Print sum
  print(sum)

  # Increment x by 1
  x = x + 1
}

Output

[1] 1
[1] 3
[1] 6
[1] 10
.
.
.

In the above program, since we have not included any break statement with an exit condition,
the program keeps printing the running sum until you interrupt it manually (for example, by
pressing Esc in RStudio or Ctrl+C in a terminal).

Example 3: repeat Loop with next Statement


You can also use a next statement inside a repeat loop to skip an iteration. For example,

x = 1

repeat {

  # Break if x = 4
  if (x == 4) {
    break
  }

  # Skip if x == 2
  if (x == 2) {
    # Increment x by 1 and skip
    x = x + 1
    next
  }

  # Print x and increment x by 1
  print(x)
  x = x + 1
}

Output

[1] 1
[1] 3

Here, we have a repeat loop where we break the loop if x is equal to 4. We skip the iteration
where x becomes equal to 2.

R Functions
Introduction to R Functions
A function is a block of code that you can call and run from any part of your program.
Functions are used to break code into simple parts and avoid repeating code.
You can pass data into functions with the help of parameters and return some other data as a
result. You can use the function keyword to create a function in R. The syntax is:

func_name <- function(parameters) {
  statement
}

Here, func_name is the name of the function. For example,

# define a function to compute power
power <- function(a, b) {
  print(paste("a raised to the power b is:", a^b))
}

Here, we have defined a function called power which takes two parameters - a and b. Inside
the function, we have included a code to print the value of a raised to the power b.

Call the Function


After you have defined the function, you can call the function using the function name and
arguments. For example,

# define a function to compute power
power <- function(a, b) {
  print(paste("a raised to the power b is:", a^b))
}

# call the power function with arguments
power(2, 3)

Output

[1] "a raised to the power b is: 8"

Here, we have called the function with two arguments - 2 and 3. This will print the value
of 2 raised to the power 3 which is 8.
The variables listed in the function definition (here, a and b) are called formal arguments. They
are also called parameters. The values passed to the function when calling it (here, 2 and 3) are
called actual arguments.
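The difference between formal and actual arguments can be seen with base R's formals(), which lists the parameters declared in a function definition. A sketch reusing the power() function from above:

```r
power <- function(a, b) {
  print(paste("a raised to the power b is:", a^b))
}

# formal arguments: the names declared in the definition
print(names(formals(power)))  # "a" "b"

# actual arguments: the values 2 and 3 supplied in the call
result <- power(2, 3)         # prints "a raised to the power b is: 8"
```

Because print() returns its argument invisibly, result also holds the printed string.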

Named Arguments



In the above call to the power() function, the arguments are matched by position, so they must
be passed in the same order as the parameters in the function definition.
This means that when we call power(2, 3), the value 2 is assigned to a and 3 is assigned to b.
If you want to pass the arguments in a different order, you can use named arguments.
For example,

# define a function to compute power
power <- function(a, b) {
  print(paste("a raised to the power b is:", a^b))
}

# call the power function with named arguments
power(b=3, a=2)

Output

[1] "a raised to the power b is: 8"

Here, the result is the same irrespective of the order of arguments that you pass during the
function call.
You can also use a mix of named and unnamed arguments. For example,

# define a function to compute power
power <- function(a, b) {
  print(paste("a raised to the power b is:", a^b))
}

# call the power function with a named and an unnamed argument
power(b=3, 2)

Output

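[1] "a raised to the power b is: 8"

Here, R first matches the named argument b = 3 and then assigns the unnamed value 2 to the
remaining parameter, a.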

Default Parameters Values


You can assign default parameter values to functions. To do so, you can specify an
appropriate value to the function parameters during function definition.
When you call the function without supplying that argument, the default value is used. For example,



# define a function to compute power
power <- function(a = 2, b) {
  print(paste("a raised to the power b is:", a^b))
}

# call the power function with arguments
power(2, 3)

# call function with default arguments
power(b=3)

Output

[1] "a raised to the power b is: 8"
[1] "a raised to the power b is: 8"

Here, in the second call to power() function, we have only specified the b argument as a
named argument. In such a case, it uses the default value for a provided in the function
definition.
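Because a is the first parameter of power(), a call such as power(3) would assign 3 to a and leave b missing, which causes an error. A common convention is to place parameters with defaults last. The variant below, with the hypothetical name power_of(), reorders the parameters so that the exponent can be passed by position:

```r
# Hypothetical variant: the parameter with a default comes last
power_of <- function(b, a = 2) {
  return(a^b)
}

print(power_of(3))             # 8: a uses its default value 2
print(power_of(3, 3))          # 27: both arguments passed by position
print(power_of(b = 2, a = 5))  # 25
```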

Return Values
You can use the return() function to return values from a function. For example,

# define a function to compute power
power <- function(a, b) {
  return(a^b)
}

# call the power function with arguments
print(paste("a raised to the power b is:", power(2, 3)))

Output

[1] "a raised to the power b is: 8"

Here, instead of printing the result inside the function, we have returned a^b. When we call
the power() function with arguments, the result is returned which can be printed during the
call.
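R functions also return the value of their last evaluated expression, so return() is optional at the end of a function body. A brief sketch of that equivalent form:

```r
power <- function(a, b) {
  a^b  # the value of the last expression is returned automatically
}

result <- power(2, 3)
print(result)           # 8
print(power(2, 3) + 1)  # 9: the returned value can be used in other expressions
```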



R VECTOR
R Vectors
A vector is the basic data structure in R that stores elements of the same type. For example,
suppose we need to record the ages of 5 employees. Instead of creating 5 separate variables,
we can simply create a vector.

[Figure: Elements of a Vector]

Create a Vector in R
In R, we use the c() function to create a vector. For example,

# create vector of string types
employees <- c("Sabby", "Cathy", "Lucy")

print(employees)

# Output: [1] "Sabby" "Cathy" "Lucy"

In the above example, we have created a vector named employees with
elements: Sabby, Cathy, and Lucy.
Here, the c() function creates a vector by combining the three elements
together.
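Since all elements of a vector must have the same type, c() converts (coerces) mixed values to a common type. The sketch below, with illustrative variable names, uses class() to show that combining a number with a string produces a character vector.

```r
ages <- c(25, 32, 41)         # only numbers: a numeric vector
mixed <- c(25, "thirty-two")  # a number and a string

print(class(ages))   # [1] "numeric"
print(class(mixed))  # [1] "character"
print(mixed)         # 25 has been converted to the string "25"
```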

Access Vector Elements in R


In R, each element in a vector is associated with a number. The number is known as a vector
index.
We can access elements of a vector using the index number (1, 2, 3 …). For example,

# a vector of string type
languages <- c("Swift", "Java", "R")

# access first element of languages
print(languages[1]) # "Swift"

# access third element of languages
print(languages[3]) # "R"

In the above example, we have created a vector named languages. Each element of the vector
is associated with an integer number.

[Figure: Vector Indexing in R]

Here, we have used the vector index to access the vector elements:
• languages[1] - accesses the first element "Swift"
• languages[3] - accesses the third element "R"

Note: In R, the vector index always starts with 1. Hence, the first element of a vector is
present at index 1, second element at index 2 and so on.
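A few related indexing rules, shown as a short sketch with the same languages vector: a vector of indices selects several elements, a negative index drops an element, and an index past the end returns NA.

```r
languages <- c("Swift", "Java", "R")

# several indices at once: 1st and 3rd elements
print(languages[c(1, 3)])

# a negative index drops that element: everything except the 1st
print(languages[-1])

# there is no 5th element, so the result is NA
print(languages[5])
```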

Modify Vector Element


To change a vector element, we can simply reassign a new value to the specific index. For
example,

dailyActivities <- c("Eat","Repeat")

cat("Initial Vector:", dailyActivities)

# change element at index 2
dailyActivities[2] <- "Sleep"

cat("\nUpdated Vector:", dailyActivities)

Output

Initial Vector: Eat Repeat
Updated Vector: Eat Sleep

Here, we have changed the vector element at index 2 from "Repeat" to "Sleep" by simply
assigning a new value.
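Assigning to an index beyond the current length also works: R extends the vector and fills any gap with NA. A minimal sketch reusing the dailyActivities vector:

```r
dailyActivities <- c("Eat", "Sleep")

# index 4 does not exist yet, so the vector grows to length 4
# and the gap at index 3 is filled with NA
dailyActivities[4] <- "Repeat"

print(dailyActivities)
print(length(dailyActivities))  # [1] 4
```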

Numeric Vector in R
Similar to strings, we use the c() function to create a numeric vector. For example,

# a vector with number sequence from 1 to 5
numbers <- c(1, 2, 3, 4, 5)

print(numbers)

# Output: [1] 1 2 3 4 5

Here, we have used the c() function to create a numeric vector called numbers.
However, there is a more concise way to create a numeric sequence: we can use the : operator
instead of c().
Create a Sequence of Numbers in R
In R, we use the : operator to create a vector with numerical values in sequence. For
example,

# a vector with number sequence from 1 to 5
numbers <- 1:5

print(numbers)

Output

[1] 1 2 3 4 5

Here, we have used the : operator to create the vector named numbers with numerical values
in sequence i.e. 1 to 5.
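For sequences whose step is not 1, the base R seq() function accepts a by argument, or length.out for a fixed number of values. Note also that 1:5 creates an integer vector, whereas c(1, 2, 3, 4, 5) creates a double (numeric) one. A short sketch with illustrative variable names:

```r
# from 1 to 9 in steps of 2
odd <- seq(1, 9, by = 2)
print(odd)  # 1 3 5 7 9

# 5 evenly spaced values from 0 to 1
steps <- seq(0, 1, length.out = 5)
print(steps)  # 0.00 0.25 0.50 0.75 1.00

# the : operator gives integers, c() with plain numbers gives doubles
print(class(1:5))               # [1] "integer"
print(class(c(1, 2, 3, 4, 5)))  # [1] "numeric"
```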
Repeat Vectors in R
In R, we use the rep() function to repeat elements of vectors. For example,

# repeat sequence of vector 2 times
numbers <- rep(c(2,4,6), times = 2)

cat("Using times argument:", numbers)

Output

Using times argument: 2 4 6 2 4 6

In the above example, we have created a numeric vector with elements 2, 4, 6 and repeated it. Notice the
code,

rep(c(2,4,6), times = 2)

Here,
• c(2,4,6) - the vector whose elements are to be repeated
• times = 2 - repeat the vector two times
We can see that we have repeated the whole vector two times. However, we can also repeat
each element of the vector. For this we use the each parameter.
Let's see an example.

# repeat each element of vector 2 times
numbers <- rep(c(2,4,6), each = 2)

cat("Using each argument:", numbers)

Output

Using each argument: 2 2 4 4 6 6

In the above example, we have created a numeric vector with elements 2, 4, 6. Notice the
code,

rep(c(2,4,6), each = 2)

Here, each = 2 - repeats each element of the vector two times
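The two arguments can also be combined: rep() repeats each element first and then repeats the result times times, while length.out cuts or extends the repetition to an exact length. A sketch:

```r
# each first (2 2 4 4 6 6), then the whole result twice
numbers <- rep(c(2, 4, 6), each = 2, times = 2)
print(numbers)  # 2 2 4 4 6 6 2 2 4 4 6 6

# stop after exactly 5 values
trimmed <- rep(c(2, 4, 6), length.out = 5)
print(trimmed)  # 2 4 6 2 4
```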



Loop Over an R Vector
In R, we can also loop through each element of the vector using the for loop. For example,

numbers <- c(1, 2, 3, 4, 5)

# iterate through each element of numbers
for (number in numbers) {
  print(number)
}

Output

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
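When the loop also needs the position of each element, a common idiom is to loop over seq_along(), which yields 1, 2, ..., length(x) and, unlike 1:length(x), an empty sequence for an empty vector. A minimal sketch:

```r
languages <- c("R", "Swift", "Python")

# seq_along() gives the positions 1, 2, 3
for (i in seq_along(languages)) {
  cat("Element", i, "is", languages[i], "\n")
}

# for an empty vector it gives no positions, so a loop body would never run
print(seq_along(character(0)))  # integer(0)
```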

Length of Vector in R
We can use the length() function to find the number of elements present inside the vector.
For example,

languages <- c("R", "Swift", "Python", "Java")

# find total elements in languages using length()
cat("Total Elements:", length(languages))

Output

Total Elements: 4

Here, we have used length() to find the length of the languages vector.

R Matrix
A matrix is a two-dimensional data structure where data are arranged into rows and columns.
For example,
[Figure: R 3 * 3 Matrix]
Here, the matrix in the figure is a 3 * 3 (pronounced "three by three") matrix because it has 3 rows
and 3 columns.

Create a Matrix in R
In R, we use the matrix() function to create a matrix.
The syntax of the matrix() function is

matrix(vector, nrow, ncol, byrow)

Here,
• vector - the data items, all of the same type
• nrow - number of rows
• ncol - number of columns
• byrow (optional) - if TRUE, the matrix is filled row-wise. By default, the matrix is filled
column-wise.
Let's see an example,

# create a 2 by 3 matrix
matrix1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

print(matrix1)

Output

[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6

In the above example, we have used the matrix() function to create a matrix named matrix1.

matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

Here, we have passed numeric data items and used c() to combine them together.
And nrow = 2 and ncol = 3 mean the matrix has 2 rows and 3 columns.
Since we have passed byrow = TRUE, the data items are filled into the matrix row-wise. If we
had not passed the byrow argument, as in

matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

the output would be

[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
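The dimensions of an existing matrix can be checked with dim(), nrow() and ncol(), and t() transposes a matrix. A short sketch using the same data:

```r
matrix1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

print(dim(matrix1))   # [1] 2 3  (rows, columns)
print(nrow(matrix1))  # [1] 2
print(ncol(matrix1))  # [1] 3

# t() swaps rows and columns, giving a 3 by 2 matrix
print(t(matrix1))
```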

Access Matrix Elements in R


We use the vector index operator [ ] to access specific elements of a matrix in R.
The syntax to access a matrix element is

matrix[n1, n2]

Here,
• n1 - specifies the row position
• n2 - specifies the column position
Let's see an example,

matrix1 <- matrix(c("Sabby", "Cathy", "Larry", "Harry"), nrow = 2, ncol = 2)

print(matrix1)



# access element at 1st row, 2nd column
cat("\nDesired Element:", matrix1[1, 2])

Output

[,1] [,2]
[1,] "Sabby" "Larry"
[2,] "Cathy" "Harry"

Desired Element: Larry

In the above example, we have created a 2 by 2 matrix named matrix1 with 4 string
elements. Notice the use of the index operator [],

matrix1[1, 2]

Here, [1, 2] specifies that we are accessing the element at the 1st row, 2nd column,
i.e. "Larry".
Access Entire Row or Column
In R, we can also access the entire row or column based on the value passed inside [].
• [n, ] - returns all elements of the nth row.
• [ ,n] - returns all elements of the nth column.
For example,

matrix1 <- matrix(c("Sabby", "Cathy", "Larry", "Harry"), nrow = 2, ncol = 2)

print(matrix1)

# access all elements of the 1st row
cat("\n1st Row:", matrix1[1, ])

# access all elements of the 2nd column
cat("\n2nd Column:", matrix1[, 2])

Output

[,1] [,2]
[1,] "Sabby" "Larry"
[2,] "Cathy" "Harry"



1st Row: Sabby Larry
2nd Column: Larry Harry

Here,
• matrix1[1, ] - accesses all elements of the 1st row, i.e. Sabby and Larry
• matrix1[ ,2] - accesses all elements of the 2nd column, i.e. Larry and Harry
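Selecting a single row or column this way returns a plain vector. Passing drop = FALSE keeps the result as a one-row (or one-column) matrix, which can matter when it is passed to code that expects a matrix. A sketch with the same matrix1:

```r
matrix1 <- matrix(c("Sabby", "Cathy", "Larry", "Harry"), nrow = 2, ncol = 2)

row_vector <- matrix1[1, ]                # a plain character vector
row_matrix <- matrix1[1, , drop = FALSE]  # still a 1 by 2 matrix

print(is.matrix(row_vector))  # [1] FALSE
print(dim(row_matrix))        # [1] 1 2
```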
Access More Than One Row or Column
We can access more than one row or column in R using the c() function.
• [c(n1,n2), ] - returns all elements of rows n1 and n2.
• [ ,c(n1,n2)] - returns all elements of columns n1 and n2.
For example,

# create 2 by 3 matrix
matrix1 <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2, ncol = 3)

print(matrix1)

# access all elements of the 1st and 2nd row
cat("\n1st and 2nd Row:", matrix1[c(1,2), ])

# access all elements of the 2nd and 3rd column
cat("\n2nd and 3rd Column:", matrix1[ ,c(2,3)])

Output

[,1] [,2] [,3]
[1,] 10 30 50
[2,] 20 40 60

1st and 2nd Row: 10 20 30 40 50 60
2nd and 3rd Column: 30 40 50 60

Here,
• [c(1,2), ] - returns all elements of the 1st and 2nd row. Since matrix1 has only 2 rows, an index such as c(1,3) would fail with a "subscript out of bounds" error.
• [ ,c(2,3)] - returns all elements of the 2nd and 3rd column.
Note that cat() prints the selected elements column by column.

Modify Matrix Element in R



We use the vector index operator [] to modify the specified element. For example,

matrix1[1,2] = 140

Here, the element present at 1st row, 2nd column is changed to 140.
Let's see an example,

# create 2 by 2 matrix
matrix1 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)

# print original matrix
print(matrix1)

# change value at 1st row, 2nd column to 5
matrix1[1,2] = 5

# print updated matrix
print(matrix1)

Output

[,1] [,2]
[1,] 1 3
[2,] 2 4

[,1] [,2]
[1,] 1 5
[2,] 2 4
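Whole rows or columns can be replaced the same way by leaving one index empty; a single value is recycled to fill the whole row or column. A minimal sketch:

```r
matrix1 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)

matrix1[2, ] <- c(7, 8)  # replace the whole 2nd row
matrix1[, 1] <- 0        # a single value is recycled down the 1st column

print(matrix1)
```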

Combine Two Matrices in R


In R, we use the cbind() and the rbind() functions to combine two matrices together.
• cbind() - combines two matrices by columns
• rbind() - combines two matrices by rows
To combine by columns, the two matrices must have the same number of rows; to combine by rows, they must have
the same number of columns. For example,

# create two 2 by 2 matrices
even_numbers <- matrix(c(2, 4, 6, 8), nrow = 2, ncol = 2)
odd_numbers <- matrix(c(1, 3, 5, 7), nrow = 2, ncol = 2)

# combine two matrices by column
total1 <- cbind(even_numbers, odd_numbers)
print(total1)

# combine two matrices by row
total2 <- rbind(even_numbers, odd_numbers)
print(total2)

Output

[,1] [,2] [,3] [,4]
[1,] 2 6 1 5
[2,] 4 8 3 7
[,1] [,2]
[1,] 2 6
[2,] 4 8
[3,] 1 5
[4,] 3 7

Here, we have first used the cbind() function to combine the two
matrices even_numbers and odd_numbers by column, and then rbind() to combine them
by row.
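If the dimensions do not match, the combination fails with an error. The sketch below (the matrix names a and b are illustrative) uses tryCatch() to capture that error message instead of stopping; the exact wording of the message may vary between R versions.

```r
a <- matrix(1:4, nrow = 2)  # 2 by 2
b <- matrix(1:6, nrow = 3)  # 3 by 2

# same number of columns, so stacking by rows works: a 5 by 2 matrix
stacked <- rbind(a, b)
print(dim(stacked))

# different numbers of rows, so combining by columns fails
result <- tryCatch(cbind(a, b), error = function(e) conditionMessage(e))
print(result)
```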

Check if Element Exists in R Matrix


In R, we use the %in% operator to check whether the specified element is present in the matrix. It
returns a logical value:
• TRUE - if the specified element is present in the matrix
• FALSE - if the specified element is not present in the matrix
For example,

matrix1 <- matrix(c("Sabby", "Cathy", "Larry", "Harry"), nrow = 2, ncol = 2)

"Larry" %in% matrix1 # TRUE

"Kinsley" %in% matrix1 # FALSE

Output

[1] TRUE
[1] FALSE

Here,
• "Larry" is present in matrix1, so the operator returns TRUE
• "Kinsley" is not present in matrix1, so the operator returns FALSE
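The %in% operator is vectorised: with several values on its left it returns one TRUE or FALSE per value, and any() or all() can summarise the result. A short sketch:

```r
matrix1 <- matrix(c("Sabby", "Cathy", "Larry", "Harry"), nrow = 2, ncol = 2)

# one logical value per name on the left
found <- c("Larry", "Kinsley") %in% matrix1
print(found)       # TRUE FALSE
print(any(found))  # [1] TRUE  (at least one is present)
print(all(found))  # [1] FALSE (not all are present)
```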

R Data Frame
A data frame is a two-dimensional data structure which can store data in tabular format.
Data frames have rows and columns, and each column is a vector. Different
columns can be of different data types.
Since each column is a vector, it helps to be familiar with R vectors before learning about data frames.

Create a Data Frame in R


In R, we use the data.frame() function to create a Data Frame.
The syntax of the data.frame() function is

dataframe1 <- data.frame(
  first_col = c(val1, val2, ...),
  second_col = c(val1, val2, ...),
  ...
)

Here,
• first_col - a vector with values val1, val2, ... of same data type
• second_col - another vector with values val1, val2, ... of same data type and so on
Let's see an example,

# Create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

print(dataframe1)

Output

Name Age Vote
1 Juan 22 TRUE
2 Alcaraz 15 FALSE
3 Simantha 19 TRUE

In the above example, we have used the data.frame() function to create a data frame
named dataframe1. Notice the arguments passed inside data.frame(),

data.frame (
Name = c("Juan", "Alcaraz", "Simantha"),
Age = c(22, 15, 19),
Vote = c(TRUE, FALSE, TRUE)
)

Here, Name, Age, and Vote are column names for vectors of character, numeric,
and logical type respectively.
Finally, print() displays the data in tabular format.
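To check the type R actually stored for each column, sapply() can apply class() to every column, and str() prints a compact overview. The sketch below assumes R 4.0.0 or later, where data.frame() keeps strings as character by default instead of converting them to factors.

```r
dataframe1 <- data.frame(
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

# class of every column
print(sapply(dataframe1, class))

# compact overview: dimensions, column types and first values
str(dataframe1)
```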

Access Data Frame Columns


There are different ways to extract columns from a data frame. We can use [ ], [[ ]], or $ to
access specific column of a data frame in R. For example,

# Create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

# pass index number inside [ ]
print(dataframe1[1])

# pass column name inside [[ ]]
print(dataframe1[["Name"]])

# use $ operator and column name
print(dataframe1$Name)

Output

Name
1 Juan
2 Alcaraz
3 Simantha
[1] "Juan" "Alcaraz" "Simantha"
[1] "Juan" "Alcaraz" "Simantha"

In the above example, we have created a data frame named dataframe1 with three
columns Name, Age, Vote.
Here, we have used different operators to access the Name column of dataframe1.
Accessing with [[ ]] or $ gives the same result: the column is returned as a vector. With [ ],
however, the result is still a data frame that contains only that column.
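The difference between the three forms can be confirmed with class(). The sketch below also shows [ ] with both a row and a column index, which selects a single value.

```r
dataframe1 <- data.frame(
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

print(class(dataframe1[1]))        # [1] "data.frame"
print(class(dataframe1[["Name"]])) # [1] "character"
print(class(dataframe1$Name))      # [1] "character"

# row 2, column "Age": a single value
print(dataframe1[2, "Age"])        # [1] 15
```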

Combine Data Frames


In R, we use the rbind() and the cbind() functions to combine two data frames together.
• rbind() - combines two data frames vertically
• cbind() - combines two data frames horizontally
Combine Vertically Using rbind()
If we want to combine two data frames vertically, the column names of the two data frames
must be the same. For example,

# create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz"),
  Age = c(22, 15)
)

# create another data frame
dataframe2 <- data.frame (
  Name = c("Yiruma", "Bach"),
  Age = c(46, 89)
)

# combine two data frames vertically
updated <- rbind(dataframe1, dataframe2)
print(updated)

Output

Name Age
1 Juan 22
2 Alcaraz 15
3 Yiruma 46
4 Bach 89

Here, we have used the rbind() function to combine the two data
frames: dataframe1 and dataframe2 vertically.
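rbind() on data frames matches columns by name rather than by position, so the columns may appear in a different order, but a name that does not match stops with an error. A sketch with illustrative data frames df_a, df_b and df_c, using tryCatch() to capture the error message:

```r
df_a <- data.frame(Name = c("Juan", "Alcaraz"), Age = c(22, 15))

# same column names, but in a different order
df_b <- data.frame(Age = c(46, 89), Name = c("Yiruma", "Bach"))

# columns are matched by name, so the Age values stay in the Age column
combined <- rbind(df_a, df_b)
print(combined)

# "Years" does not match "Age", so rbind() stops with an error
df_c <- data.frame(Name = "Bach", Years = 89)
msg <- tryCatch(rbind(df_a, df_c), error = function(e) conditionMessage(e))
print(msg)
```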
Combine Horizontally Using cbind()
The cbind() function combines two or more data frames horizontally. For example,

# create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz"),
  Age = c(22, 15)
)

# create another data frame
dataframe2 <- data.frame (
  Hobby = c("Tennis", "Piano")
)

# combine two data frames horizontally
updated <- cbind(dataframe1, dataframe2)
print(updated)

Output

Name Age Hobby
1 Juan 22 Tennis
2 Alcaraz 15 Piano

Here, we have used cbind() to combine two data frames horizontally.



Note: When combining data frames horizontally with cbind(), they must have the same number of rows;
otherwise we will get an error such as: arguments imply differing number of rows.

Length of a Data Frame in R


In R, we use the length() function to find the number of columns in a data frame. For
example,

# Create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

cat("Total Elements:", length(dataframe1))

Output

Total Elements: 3

Here, we have used length() to find the total number of columns in dataframe1. Since there
are 3 columns, the length() function returns 3.
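Because length() counts columns, the number of rows is usually found with nrow(); ncol() and dim() give the column count and both dimensions. A short sketch:

```r
dataframe1 <- data.frame(
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

print(ncol(dataframe1))  # [1] 3, the same as length(dataframe1)
print(nrow(dataframe1))  # [1] 3
print(dim(dataframe1))   # [1] 3 3  (rows, columns)
```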

R Read and Write CSV Files


The CSV (Comma Separated Value) file is a plain text file that uses a comma to separate
values.
R has a built-in functionality that makes it easy to read and write a CSV file.

Sample CSV File


To demonstrate how we read CSV files in R, let's suppose we have a CSV file
named airtravel.csv with following data:

Month, 1958, 1959, 1960
JAN, 340, 360, 417
FEB, 318, 342, 391
MAR, 362, 406, 419
APR, 348, 396, 461
MAY, 363, 420, 472
JUN, 435, 472, 535
JUL, 491, 548, 622
AUG, 505, 559, 606
SEP, 404, 463, 508
OCT, 359, 407, 461
NOV, 310, 362, 390
DEC, 337, 405, 432

The CSV file above is a sample data of monthly air travel, in thousands of passengers,
for 1958-1960.
Now, let's try to read data from this CSV File using R's built-in functions.

Read a CSV File in R


In R, we use the read.csv() function to read a CSV file available in our current directory. For
example,

# read airtravel.csv file from our current directory
read_data <- read.csv("airtravel.csv")

# display csv file
print(read_data)

Output

Month X1958 X1959 X1960
1 JAN 340 360 417
2 FEB 318 342 391
3 MAR 362 406 419
4 APR 348 396 461
5 MAY 363 420 472
6 JUN 435 472 535
7 JUL 491 548 622
8 AUG 505 559 606
9 SEP 404 463 508
10 OCT 359 407 461
11 NOV 310 362 390
12 DEC 337 405 432

In the above example, we have read the airtravel.csv file that is available in our current
directory. Notice the code,

read_data <- read.csv("airtravel.csv")

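Here, read.csv() reads the csv file airtravel.csv and creates a data frame, which is stored in
the read_data variable. By default, read.csv() converts the column headers into syntactically
valid R names, so the headers 1958, 1959 and 1960 become X1958, X1959 and X1960.
Finally, the data frame is displayed using print().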

Note: If the file is in some other location, we have to specify the path along with the file
name as: read.csv("D:/folder1/airtravel.csv") .
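To try read.csv() without an existing airtravel.csv, the sketch below first writes the first three rows of the sample data to a temporary file (csv_path is an illustrative name). It also shows that check.names = FALSE keeps headers such as 1958 unchanged, in which case the column has to be written with backticks; strip.white = TRUE removes the spaces after the commas.

```r
# write the first three rows of the sample data to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Month, 1958, 1959, 1960",
  "JAN, 340, 360, 417",
  "FEB, 318, 342, 391",
  "MAR, 362, 406, 419"
), csv_path)

# default: headers are made syntactically valid, so 1958 becomes X1958
read_data <- read.csv(csv_path, strip.white = TRUE)
print(names(read_data))  # "Month" "X1958" "X1959" "X1960"

# check.names = FALSE keeps the original headers; backticks are then required
raw_data <- read.csv(csv_path, strip.white = TRUE, check.names = FALSE)
print(raw_data$`1958`)   # 340 318 362
```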

Number of Rows and Columns of CSV File in R


We use the ncol() and nrow() functions to get the total number of columns and rows present
in the CSV file in R. For example,

# read airtravel.csv file from our directory
read_data <- read.csv("airtravel.csv")

# print total number of columns
cat("Total Columns:", ncol(read_data), "\n")

# print total number of rows
cat("Total Rows:", nrow(read_data))

Output

Total Columns: 4
Total Rows: 12

In the above example, we have used the ncol() and nrow() functions to find the total number of
columns and rows in the airtravel.csv file.
Here,
• ncol(read_data) - returns total number of columns i.e. 4
• nrow(read_data) - returns total number of rows i.e. 12

Using min() and max() With CSV Files


In R, we can also find the minimum and maximum values in a certain column of a CSV file using
the min() and max() functions. For example,

# read airtravel.csv file from our directory
read_data <- read.csv("airtravel.csv")

# minimum value of the 1960 column (named X1960 by read.csv())
min_data <- min(read_data$X1960)
print(min_data)

# maximum value of the 1958 column (named X1958 by read.csv())
max_data <- max(read_data$X1958)
print(max_data)

Output

[1] 390
[1] 505

Here, we have used the min() and max() functions to find the minimum and maximum value
of the 1960 and 1958 columns of the airtravel.csv file respectively.
• min(read_data$X1960) - returns the minimum value from the 1960 column i.e. 390
• max(read_data$X1958) - returns the maximum value from the 1958 column i.e. 505
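To find which month had the most passengers, which.max() returns the position of the (first) maximum, which can then index the Month column, and range() returns the minimum and maximum together. To keep the sketch self-contained, the 1958 column is typed in directly instead of being read from the file, using the X1958 name that read.csv() produces.

```r
read_data <- data.frame(
  Month = c("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
            "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"),
  X1958 = c(340, 318, 362, 348, 363, 435, 491, 505, 404, 359, 310, 337)
)

# position of the largest value, used to look up the month
busiest <- read_data$Month[which.max(read_data$X1958)]
print(busiest)  # [1] "AUG"

# minimum and maximum in one call
print(range(read_data$X1958))  # [1] 310 505
```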

Subset of a CSV File in R


In R, we use the subset() function to return all the rows of a CSV file's data that satisfy the
specified condition. For example,

# read airtravel.csv file from our directory
read_data <- read.csv("airtravel.csv")

# return subset of csv where number of air
# travelers in 1958 is greater than 400
sub_data <- subset(read_data, X1958 > 400)

print(sub_data)

Output

Month X1958 X1959 X1960
6 JUN 435 472 535
7 JUL 491 548 622
8 AUG 505 559 606
9 SEP 404 463 508

In the above example, we have specified a condition inside the subset() function to
extract data from a CSV file.

subset(read_data, X1958 > 400)

Here, subset() creates a subset of airtravel.csv containing the rows whose 1958 column (X1958) is
greater than 400 and stores it in the sub_data data frame. Writing the condition as 1958 > 400
would not work: 1958 would then be just a number, the condition would always be TRUE, and every
row would be returned.
Since column 1958 has values greater than 400 in the 6th, 7th, 8th, and 9th rows, only these rows
are displayed.
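The same rows can also be selected with ordinary [ ] indexing, and subset() accepts a select argument to keep only some columns. A self-contained sketch with the 1958 column typed in directly:

```r
read_data <- data.frame(
  Month = c("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
            "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"),
  X1958 = c(340, 318, 362, 348, 363, 435, 491, 505, 404, 359, 310, 337)
)

# logical indexing: keep rows where X1958 > 400, all columns
busy <- read_data[read_data$X1958 > 400, ]
print(busy)

# subset() with select keeps only the listed columns
months_only <- subset(read_data, X1958 > 400, select = Month)
print(months_only)
```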

Write Into CSV File in R


In R, we use the write.csv() function to write into a CSV file. We pass the data in the form
of a data frame. For example,

# Create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE))

# write dataframe1 into file1 csv file
write.csv(dataframe1, "file1.csv")

In the above example, we have used the write.csv() function to export a data frame
named dataframe1 to a CSV file. Notice the arguments passed inside write.csv(),

write.csv(dataframe1, "file1.csv")



Here,
• dataframe1 - name of the data frame we want to export
• file1.csv - name of the csv file
Finally, the file1.csv file would look like this in our directory:

[Figure: CSV File System Output]

If we pass "quote = FALSE" to write.csv() as:

write.csv(dataframe1, "file1.csv",
quote = FALSE
)

Our file1.csv would look like this:

[Figure: CSV File System Output]

The double quotes that wrapped the values have been removed.
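By default write.csv() also writes the row names as an extra first column. The sketch below passes row.names = FALSE and reads the file back with readLines() to show its contents; it writes to a temporary file (out_path is an illustrative name) rather than file1.csv so it does not overwrite anything.

```r
dataframe1 <- data.frame(
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

out_path <- tempfile(fileext = ".csv")

# row.names = FALSE leaves out the column of row numbers
write.csv(dataframe1, out_path, row.names = FALSE)

# each line of the file as a string; the header line comes first
print(readLines(out_path))
```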

LINEAR MODEL IN R

A linear model is used to predict the value of an unknown (dependent) variable based on independent
variables. It is mostly used for finding out the relationship between variables and for
forecasting. The lm() function is used to fit linear models to data frames in the R
language. It can be used to carry out regression, single stratum analysis of variance, and
analysis of covariance, and the fitted model can then be used to predict values for data that
are not in the data frame. Such models are helpful in tasks like predicting real estate prices
or forecasting the weather.
To fit a linear model in R using the lm() function, we first use the data.frame() function to
create a sample data frame containing the values to be fitted. Then we use the lm() function
to fit the model to that data frame.
Syntax:
lm( fitting_formula, dataframe )
Parameter:
• fitting_formula: determines the formula for the linear model.
• dataframe: determines the name of the data frame that contains the data.
Then, we can use the summary() function to view the summary of the linear model. The
summary() function interprets the most important statistical values for the analysis of the
linear model.
Syntax:
summary( linear_model )
The summary contains the following key information:
• Residual Standard Error: estimates the standard deviation of the error term. It is the square
root of the residual sum of squares divided by the residual degrees of freedom, n - (p + 1),
where n is the number of observations and p the number of predictor variables, instead of by n - 1.
• Multiple R-Squared: measures how well the model fits the data, as the proportion of the
variance in the response that the model explains.
• Adjusted R-Squared: adjusts Multiple R-Squared by taking into account how
many samples you have and how many variables you're using.
• F-Statistic: a "global" test that checks whether at least one of the coefficients (other than
the intercept) is non-zero.
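The residual standard error and Multiple R-squared can be reproduced from the residuals as a check. The sketch below fits y ~ x to the same df used in the example that follows and compares the manual values with those stored in the summary() result; n and p are illustrative variable names.

```r
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(1, 5, 8, 15, 26))
linear_model <- lm(y ~ x, data = df)

res <- residuals(linear_model)
n <- nrow(df)  # number of observations
p <- 1         # number of predictor variables (only x)

# residual standard error: sqrt(RSS / (n - p - 1))
rse <- sqrt(sum(res^2) / (n - p - 1))

# Multiple R-squared: 1 - RSS / total sum of squares
r2 <- 1 - sum(res^2) / sum((df$y - mean(df$y))^2)

model_summary <- summary(linear_model)
print(c(manual = rse, from_summary = model_summary$sigma))
print(c(manual = r2, from_summary = model_summary$r.squared))
```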
Example: Example to show usage of lm() function.

# sample data frame
df <- data.frame( x= c(1,2,3,4,5),
y= c(1,5,8,15,26))

# fit linear model
# note: in a formula, y ~ x^2 means the same as y ~ x; a squared term needs I(x^2)
linear_model <- lm(y ~ x, data=df)

# view summary of linear model
summary(linear_model)

Output:
Call:
lm(formula = y ~ x, data = df)

Residuals:
1 2 3 4 5
2.000e+00 5.329e-15 -3.000e+00 -2.000e+00 3.000e+00

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.0000 3.0876 -2.267 0.10821
x 6.0000 0.9309 6.445 0.00757 **

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.944 on 3 degrees of freedom
Multiple R-squared: 0.9326, Adjusted R-squared: 0.9102
F-statistic: 41.54 on 1 and 3 DF, p-value: 0.007575
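Once the model is fitted, coef() extracts the estimated coefficients, and predict() with the newdata argument estimates y for x values that are not in df. A minimal sketch with the same data (new_points is an illustrative name):

```r
df <- data.frame(x = c(1, 2, 3, 4, 5),
                 y = c(1, 5, 8, 15, 26))
linear_model <- lm(y ~ x, data = df)

# estimated intercept and slope
print(coef(linear_model))

# predict y for x values that are not in df, using -7 + 6 * x
new_points <- data.frame(x = c(6, 7))
print(predict(linear_model, newdata = new_points))
```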
Diagnostic Plots
The diagnostic plots help us to view the relationship between different statistical values of
the model. It helps us in analyzing the extent of outliers and the efficiency of the fitted
model. To view diagnostic plots of a linear model, we use the plot() function in the R
Language.



Syntax:
plot( linear_model )
Example: Diagnostic plots for the above fitted linear model.

# sample data frame
df <- data.frame( x= c(1,2,3,4,5),
y= c(1,5,8,15,26))

# fit linear model
linear_model <- lm(y ~ x, data=df)

# view diagnostic plot
plot(linear_model)

Output:



Plotting Linear model
We can plot the above fitted linear model to visualize it by using the abline() function.
We first draw a scatter plot of the data points and then overlay it with the regression line of
the linear model using abline().
Syntax:
plot( df$x, df$y)
abline( linear_model )
Example: Plotting linear model

# sample data frame
df <- data.frame( x= c(1,2,3,4,5),
y= c(1,5,8,15,26))

# fit linear model
linear_model <- lm(y ~ x, data=df)

# Plot abline plot
plot( df$x, df$y )

abline( linear_model)

Output:
