Unit I - R Programming

Uploaded by

Harshitha B

Statistical Analysis and R Programming 2024-25

UNIT I: INTRODUCTION

R is an interpreted programming language widely used for statistical analysis and the graphical representation of data. It is also a software environment used for statistical computing, graphics, reporting, and data modelling. The R language was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.
FEATURES OF R PROGRAMMING
1. R is an interpreted language.
2. It is a simple and effective programming language which has been well developed.
3. It is a well-designed language which supports user-defined functions, loops, conditionals, and various I/O facilities.
4. R contains a suite of operators for different types of calculation on arrays, lists and vectors.
5. It provides effective data handling and storage facility.
6. It is an open-source, powerful, and highly extensible software.
7. It provides highly extensible graphical techniques.

VARIABLES
Variables are used to store the information to be manipulated and referenced in the R program.
The R variable can store an atomic vector, a group of atomic vectors, or a combination of many R
objects.
R supports two common ways of variable assignment:
1. Using the equal operator (=): The equal sign assigns a value to a variable.
Syntax: variable_name = value
Ex: x = 10
2. Using the leftward operator (<-): The leftward arrow assigns a value to a variable; data is copied from right to left. The two characters must be written together, without a space between them.
Syntax: variable_name <- value
Ex: x <- 20
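As a minimal sketch of the two assignment forms, the snippet below also shows why spacing matters: when the arrow is split into `<` and `-`, R reads a comparison rather than an assignment. The variable names are illustrative only.

```r
# Both operators bind a value to a name.
x = 10
y <- 20

# With a space inside the arrow, R parses "x < -20" as "is x less than -20?".
is_less <- x < -20

print(x)        # 10
print(y)        # 20
print(is_less)  # FALSE
```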

Shruthi S, Asst. Professor, GSSS SSFGC, Mysuru Page 1


The following rules must be followed when naming a variable.

• A valid variable name consists of a combination of letters, digits, dot (.), and underscore (_) characters. Ex: var.1_ is valid.
• No other special character is allowed except the dot and underscore.
Ex: var$1 and var#1 are both invalid.
• A variable name can start with a letter or a dot. It cannot start with a digit or an underscore.
Ex: .var and var are valid; 1var and _var are invalid.
• If a variable name starts with a dot, the character after the dot cannot be a digit; it should be a letter.
Ex: .3var is invalid.
• The variable name should not be a reserved keyword in R.
Ex: TRUE, FALSE, etc. are not valid.
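The naming rules can be checked in code. A rough sketch, assuming the base R function `make.names()` (which rewrites an invalid name into a valid one) is an acceptable test: a name is valid when `make.names()` leaves it unchanged. The helper `is_valid_name` is hypothetical and not part of R.

```r
# Hypothetical helper: a name is syntactically valid when make.names()
# returns it unchanged.
is_valid_name <- function(x) make.names(x) == x

candidates <- c("var.1_", ".var", "var$1", ".3var", "_var", "TRUE")
print(data.frame(name = candidates, valid = is_valid_name(candidates)))
```

Only the first two candidates are reported as valid; the rest break one of the rules listed above.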

DATA TYPES
A data type specifies the kind of data that can be stored in a variable. Memory is allocated based on the data type of the variable, and the data type decides what can be stored in the reserved memory.
The following data types are used in R programming:
1. Integer: This data type is used to store whole-number values. An integer literal is written with the suffix L.
Ex: 3L, 66L, 2346L
2. Numeric: A decimal value is called numeric in R, and it is the default computational data type. A number written without the suffix L is numeric even when it has no fractional part.
Ex: 12, 32, 112, 54.32
3. Complex: A complex value in R consists of a real part and an imaginary part written with the suffix i.
Ex: Z = 1+2i, t = 7+3i
4. Logical: It is a special data type for data with only two possible values, which can be construed as true/false.
Ex: TRUE and FALSE
5. Character: In R programming, the character type is used to represent string values.
Ex: 'a', '"good"', "TRUE", '35.4'
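To see the five data types side by side, the sketch below stores one literal of each kind in a list and asks R for its class. The values are arbitrary examples.

```r
# One literal of each basic data type.
values <- list(3L, 54.32, 1 + 2i, TRUE, "good")

# Print each value next to the class R reports for it.
for (v in values) {
  cat(format(v), "->", class(v), "\n")
}
```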

OPERATORS
An operator is a symbol that tells the interpreter to perform a specific logical or mathematical manipulation. In R programming, there are different types of operators, and each operator performs a different task.
They are as follows:
1. Arithmetic Operators
2. Relational Operators
3. Logical Operators
4. Assignment Operators
5. Miscellaneous Operators

1. Arithmetic Operators
The arithmetic operators are used to perform arithmetic operations such as addition, subtraction, multiplication, division and modulo. An arithmetic expression is one which comprises arithmetic operators and variables or constants. The variables and constants are called operands. The arithmetic operators are as follows.
+ : addition
- : subtraction
* : multiplication
/ : division
%% : modulo
^ : power
Ex : a+b, a-b etc.
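A short sketch of each arithmetic operator applied to two sample operands (the values 17 and 5 are arbitrary):

```r
a <- 17
b <- 5

print(a + b)   # 22
print(a - b)   # 12
print(a * b)   # 85
print(a / b)   # 3.4
print(a %% b)  # 2 (remainder of 17 divided by 5)
print(a ^ 2)   # 289
```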

2. Relational Operators
Relational operators are used to construct relational expressions, which compare two quantities. A relational expression is of the form operand1 operator operand2. The relational operators are as follows.
< : is less than
> : is greater than
>= : is greater than or equal to
<= : is less than or equal to
== : is equal to
!= : is not equal to
Ex: a < b, a == 10

3. Logical Operators
These are used to construct compound conditional expressions. The logical operators && and || combine two single logical values to make a decision, & and | combine logical vectors element by element, and ! negates a conditional expression.
The logical operators are
&& : Logical AND (single values)
|| : Logical OR (single values)
! : Logical NOT
& : Element-wise logical AND
| : Element-wise logical OR

Ex: 1 & 0, TRUE || FALSE

4. Assignment Operators
Assignment operators are used to assign the result of an expression to a variable.
Syntax: variable <- expression
<- : Left assignment operator
= : Equal operator
Ex: a <- 20, C = 90

5. Miscellaneous Operators
Miscellaneous operators are used for a special and specific purpose. These operators are not
used for general mathematical or logical computation.

: The colon operator creates a series of consecutive numbers as a vector.
Ex: v <- 1:8
print(v)
Output:
1 2 3 4 5 6 7 8

class() FUNCTION
class() is a built-in function used to determine the data type of the variable provided to it. It returns the class of the variable as a character string.

Syntax: class(variable)
Ex: var1 = "hello"
print(class(var1))

Output: "character"

VECTORS
In R, a sequence of elements which share the same data type is known as a vector. Vectors are classified by the type of their elements: numeric, integer, character, and logical.

CREATION OF VECTOR
1) Using the c() function
A vector can be created by using the c() function. This function combines its arguments and returns a one-dimensional array, or simply a vector.
Syntax: Vector_Name <- c(List of Elements)
Ex: Myvec <- c(1,3,1,4,2)
print(Myvec)

Output: 1 3 1 4 2

2) Using the colon (:) operator
We can create a vector of consecutive numbers with the help of the colon operator.
Syntax: Vector_Name <- x:y
Ex: Z <- 1:10
print(Z)

Output: 1 2 3 4 5 6 7 8 9 10
3) Using the seq() function
The seq() function creates a sequence of elements as a vector. The step size is set with the 'by' parameter.

Syntax: Vector_Name <- seq(start, stop, by = value)
Ex: seq_vec <- seq(1, 4, by = 0.5)
print(seq_vec)

Output: 1.0 1.5 2.0 2.5 3.0 3.5 4.0
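seq() can also be told how many elements to produce instead of the step size. A minimal sketch comparing the `by` and `length.out` arguments over the same range:

```r
by_step   <- seq(1, 4, by = 0.5)        # fixed step size
by_length <- seq(1, 4, length.out = 6)  # fixed number of elements

print(by_step)    # 1.0 1.5 2.0 2.5 3.0 3.5 4.0
print(by_length)  # 1.0 1.6 2.2 2.8 3.4 4.0
```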

• Numeric vector: A vector which contains numeric elements is known as a numeric vector. If we assign a decimal value to any variable, then that variable becomes a numeric type.
Ex: num_vec <- c(10.1, 10.2, 33.2)
print(num_vec)
class(num_vec)

Output: 10.1 10.2 33.2
"numeric"

• Integer vector: A numeric value without a fractional part can be stored as integer data. An integer value is assigned to a variable by appending L to the value.
Ex: int_vec1 <- c(1L,2L,3L,4L,5L)
print(int_vec1)
class(int_vec1)

Output: 1 2 3 4 5
"integer"

• Character vector: A vector which contains character elements is known as a character vector. In R, a character value can be created using double quotes ("") or single quotes ('').
Ex: char_vec1 <- c("shubham","arpita","nishka","vaishali")
print(char_vec1)
class(char_vec1)

Output: "shubham" "arpita" "nishka" "vaishali"
"character"

• Logical vector: The logical data type has only two values, TRUE and FALSE. These values are based on which condition is satisfied. A vector which contains Boolean values is known as a logical vector.
Ex: d <- 5
e <- 6
f <- 7
log_vec <- c(d<e, d<f, e<d, e<f, f<d, f<e)
print(log_vec)
class(log_vec)

Output: TRUE TRUE FALSE TRUE FALSE FALSE
"logical"

ACCESSING ELEMENTS OF VECTORS


The elements of a vector are accessed with the help of vector indexing. Indexing denotes the position where a value is stored in the vector. In R, indexing starts from 1. We perform indexing by specifying an integer value in square brackets [] next to the vector name.
Ex: seq_vec <- seq(1,4,length.out=6)
print(seq_vec)
print(seq_vec[2])

Output: 1.0 1.6 2.2 2.8 3.4 4.0
1.6
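Indexing is not limited to a single position. The sketch below shows a few common index forms on a sample vector; negative indices, which drop positions, are a base R feature not covered in the text above.

```r
v <- c(10, 20, 30, 40, 50)

print(v[2])        # 20: a single position
print(v[c(1, 3)])  # 10 30: several positions at once
print(v[2:4])      # 20 30 40: a range of positions
print(v[-1])       # 20 30 40 50: everything except position 1
```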

VECTOR OPERATION
1) Combining vectors
Combining two or more vectors forms a new vector which contains all the elements of each vector. Because a vector holds a single data type, combining a numeric vector with a character vector converts the numbers to character strings.
Ex: p <- c(1,2,4,5,7,8)
q <- c("shubham","arpita","nishka","gunjan","vaishali","sumit")
r <- c(p, q)
print(r)

Output: "1" "2" "4" "5" "7" "8" "shubham" "arpita" "nishka" "gunjan" "vaishali" "sumit"

2) Arithmetic operations
We can perform all the arithmetic operations on vectors. The operations are performed element by element (member-by-member), so the two vectors should normally have the same length.
Ex: a<-c(1,3,5,7)
b<-c(2,4,6,8)
print (a+b)
print (a-b)
print (a*b)
print (a%%b)

Output: 3 7 11 15
-1 -1 -1 -1
2 12 30 56
1 3 5 7

MATRICES
In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created by passing an input vector to the matrix() function.

Syntax: matrix(data, nrow, ncol, byrow, dimnames)

Where data: the input vector whose elements become the elements of the matrix.
nrow: the number of rows to create in the matrix.
ncol: the number of columns to create in the matrix.
byrow: a logical flag. If its value is TRUE, the input vector elements are arranged by row; otherwise (the default) they are arranged by column.
dimnames: the names assigned to the rows and columns, given as a list.

Ex: p <- matrix(c(5:16), nrow=4, ncol=3, byrow=TRUE)
print(p)
Output: 5 6 7
8 9 10
11 12 13
14 15 16
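The effect of byrow is easiest to see by filling the same data both ways. A minimal sketch with a 2 x 3 matrix:

```r
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled row by row
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # default: column by column

print(by_row)       # first row is 1 2 3
print(by_col)       # first row is 1 3 5
print(dim(by_row))  # 2 3
```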

ASSIGNING NAMES TO THE MATRIX


The row and column names can be defined through vectors and passed to the dimnames parameter.
Ex: row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")
R <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)

Output:
col1 col2 col3
row1 3 4 5
row2 6 7 8
row3 9 10 11
row4 12 13 14

ACCESSING MATRIX ELEMENTS IN R


There are three ways to access the elements of a matrix.
• Access the element in the nth row and mth column.
• Access all the elements in the nth row.
• Access all the elements in the mth column.

Ex: For the matrix R created above, the elements are accessed as follows
#Accessing the element in the 3rd row and 2nd column
print(R[3,2])    # 10
#Accessing all elements in the 3rd row
print(R[3,])     # 9 10 11
#Accessing all elements in the 2nd column
print(R[, 2])    # 4 7 10 13

MODIFICATION OF THE MATRIX


In matrix modification, the first method is to assign a single element to the matrix at a particular position. By assigning a new value to that position, the old value is replaced with the new one.

Syntax: matrix[n, m] <- y

Here, n and m are the row and column of the element, respectively, and y is the value assigned to that position.
Ex:
R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
#Assigning value 20 to the element at 3rd row and 2nd column
R[3,2] <- 20
print(R)

Output:
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 20 13
row4 14 15 16

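Use of Relational Operator
A relational expression inside the square brackets selects every element that satisfies the condition, so all matching elements can be replaced at once. Starting again from the original matrix:
R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
R[R == 12] <- 0
print(R)
Output: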
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 0 13
row4 14 15 16

ADDITION OF ROWS AND COLUMNS


The cbind() and rbind() functions are used to add a column and a row respectively. Both return a new matrix, so the result must be assigned back to a variable to keep it.
• rbind(): it is used to add a row to the existing matrix.
Ex: R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
R <- rbind(R, c(17,18,19))
print(R)

Output:
col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
17 18 19

• cbind(): it is used to add a column to the existing matrix.
Ex: R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
R <- cbind(R, c(17,18,19,20))
print(R)

Output:
col1 col2 col3
row1 5 6 7 17
row2 8 9 10 18
row3 11 12 13 19
row4 14 15 16 20

MATRIX OPERATIONS
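In R, we can perform mathematical operations on matrices such as addition, subtraction, multiplication, and division. These operators work element by element on matrices of the same dimensions; in particular, * is not the matrix product, which is written %*%.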
Ex: R <- matrix(c(5:16), nrow = 4,ncol=3)
S <- matrix(c(1:12), nrow = 4,ncol=3)
sum<-R+S
print(sum)
sub<-R-S
print(sub)
mul<-R*S
print(mul)
div<-R/S
print(div)

Output:
[,1] [,2] [,3]
[1,] 6 14 22
[2,] 8 16 24
[3,] 10 18 26
[4,] 12 20 28

[,1] [,2] [,3]
[1,] 4 4 4
[2,] 4 4 4
[3,] 4 4 4
[4,] 4 4 4

[,1] [,2] [,3]
[1,] 5 45 117
[2,] 12 60 140
[3,] 21 77 165
[4,] 32 96 192

[,1] [,2] [,3]
[1,] 5.000000 1.800000 1.444444
[2,] 3.000000 1.666667 1.400000
[3,] 2.333333 1.571429 1.363636
[4,] 2.000000 1.500000 1.333333
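The multiplication shown in this example is element-wise. As a small sketch of how that differs from the algebraic matrix product, the snippet below applies both `*` and the base R operator `%*%` (not used elsewhere in these notes) to two 2 x 2 matrices.

```r
A <- matrix(1:4, nrow = 2)   # columns: (1, 2) and (3, 4)
B <- matrix(5:8, nrow = 2)   # columns: (5, 6) and (7, 8)

elementwise <- A * B     # multiplies values at matching positions
product     <- A %*% B   # row-by-column matrix product

print(elementwise)
print(product)
```

Here `elementwise[1, 2]` is 3 * 7 = 21, while `product[1, 2]` is 1 * 7 + 3 * 8 = 31.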

ARRAYS
In R, arrays are data objects which allow us to store data in more than two dimensions. An array is created using the array() function. This function takes a vector as input and uses the values in the dim parameter to create the array.
Ex: If we create an array of dimension (2, 3, 4), it will contain 4 rectangular matrices, each with 2 rows and 3 columns.

Syntax:
array_name <- array(data, dim = c(row_size, column_size, matrices), dimnames = dim_names)
Where data: the input vector given to the array.
row_size: the number of rows in each matrix of the array.
column_size: the number of columns in each matrix of the array.
matrices: the number of matrices the array consists of.
dim_names: a list used to change the default names of rows, columns and matrices.

Ex: vec1 <- c(1,3,5)
vec2 <- c(10,11,12,13,14,15)
res <- array(c(vec1,vec2), dim=c(3,3,2))   # the 9 input values are reused to fill both matrices
print(res)

Output:
, , 1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
, , 2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
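The (2, 3, 4) case mentioned at the start of this section can be sketched as follows. The data 1:24 is an arbitrary choice that fills the array exactly once.

```r
arr <- array(1:24, dim = c(2, 3, 4))

print(dim(arr))          # 2 3 4
print(dim(arr[, , 1]))   # 2 3: each slice is a 2 x 3 matrix
print(arr[, , 4])        # the fourth (last) matrix, holding 19 to 24
```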

NAMING ROWS AND COLUMNS


In R, names can be given to the rows, columns and matrices of an array. This is done with the help of the dimnames parameter of the array() function.
Ex: vec1 <- c(1,3,5) #Creating two vectors of different lengths
vec2 <- c(10,11,12,13,14,15)
col_names <- c("Col1","Col2","Col3") #Initializing names for rows, columns and matrices
row_names <- c("Row1","Row2","Row3")
matrix_names <- c("Matrix1","Matrix2")
res <- array(c(vec1,vec2), dim=c(3,3,2), dimnames=list(row_names, col_names, matrix_names))
print(res)

Output:
, , Matrix1
Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15

, , Matrix2
Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15

ACCESSING ARRAY ELEMENTS


The elements are accessed with the index in the form array[row, column, matrix]. Leaving a position empty selects everything along that dimension.
Ex: For the array created above
print(res[3, ,2]) #To print the third row of the second matrix
Output: 5 12 15
print(res[3,2,2]) #To print the third row, second column element of the 2nd matrix
Output: 12
print(res[ ,2,1]) #To print the second column of the 1st matrix
Output: 10 11 12

MANIPULATION OF ELEMENTS
An array is made up of matrices in multiple dimensions, so operations on the elements of an array are carried out by accessing elements of the matrices.
Ex: #Creating two vectors of different lengths
vec1 <- c(1,3,5)
vec2 <- c(10,11,12,13,14,15)
res1 <- array(c(vec1,vec2),dim=c(3,3,1))
print(res1)
vec1 <- c(8,4,7)
vec2 <- c(16,73,48,46,36,73)
res2 <- array(c(vec1,vec2),dim=c(3,3,1))
print(res2)
#Extracting the matrix from each array and adding them
mat1 <- res1[,,1]
mat2 <- res2[,,1]
res3 <- mat1+mat2
print(res3)

Output:
, , 1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
, , 1
[,1] [,2] [,3]
[1,] 8 16 46
[2,] 4 73 36
[3,] 7 48 73

[,1] [,2] [,3]
[1,] 9 26 59
[2,] 7 84 50
[3,] 12 60 88

LISTS
In R, a list is a data structure whose components can have mixed data types. A list can contain elements of different types such as numbers, vectors, strings, and even another list.
The function used to create a list in R is list( ).
Ex: list_1<-list(1,2,3)
list_2<-list("Shubham","Arpita","Vaishali")
print(list_1)
print(list_2)

Output:
1
2
3
"Shubham"
"Arpita"
"Vaishali"

Ex: list_data <- list("Shubham","Arpita",c(1,2,3,4,5), TRUE, FALSE, 22.5, 12L)
print(list_data)
Output:
"Shubham"
"Arpita"
1 2 3 4 5
TRUE
FALSE
22.5
12

GIVING A NAME TO LIST ELEMENTS


There are three steps to print the list data with names:
1. Create a list.
2. Assign names to the list elements with the names() function.
3. Print the list data.
Ex: list_data <- list(c("Shubham","Nishka","Gunjan"), matrix(c(40,80,60,70,90,80), nrow= 2),
list("BCA","MCA","B.tech"))
names (list_data) <- c("Students", "Marks", "Course")
print(list_data)

Output:
$Students
[1] "Shubham" "Nishka" "Gunjan"
$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B.tech"

ACCESSING LIST ELEMENTS


R provides two ways to access the elements of a list.
1) Using indexing method performed in the same way as a vector.
2) Access the elements of a list with the help of names.
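A brief sketch of both access styles on a small named list. It also uses double brackets `[[ ]]` and the `$` operator, which are base R features not described in these notes: single brackets return a sub-list, while `[[ ]]` and `$` return the element itself.

```r
list_data <- list(Student = c("Shubham", "Arpita", "Nishka"),
                  Marks   = c(40, 80, 60))

sub_list <- list_data[1]     # single brackets: the result is still a list
by_index <- list_data[[1]]   # double brackets: the element itself
by_name  <- list_data$Marks  # $: the element looked up by its name

print(class(sub_list))  # "list"
print(class(by_index))  # "character"
print(mean(by_name))    # 60
```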

Ex: Accessing elements using index


list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80), nrow = 2),
list("BCA","MCA","B.tech"))
print(list_data[1])

Output:
"Shubham" "Arpita" "Nishka"

Ex: Accessing elements using names


list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80),nrow = 2),
list("BCA","MCA","B.tech"))
names(list_data) <- c("Student", "Marks", "Course")
print(list_data["Student"])

Output:
$Student
"Shubham" "Arpita" "Nishka"

MANIPULATION OF LIST ELEMENTS


R allows us to add, delete, or update elements in the list.
Ex: list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80), nrow = 2),
list("BCA","MCA","B.tech"))
names(list_data) <- c("Student", "Marks", "Course")
list_data[4] <- "Moradabad" #Adding a fourth element
print(list_data[4])
list_data[4] <- NULL #Deleting the fourth element
print(list_data[4])
list_data[3] <- "Masters of computer applications" #Updating the third element
print(list_data[3])

Output:
"Moradabad"
$<NA>
NULL
$Course
"Masters of computer applications"

CONVERTING LIST TO VECTOR


Lists have a drawback: arithmetic operators cannot be applied directly to a list. This drawback can be overcome with the unlist( ) function, which converts a list into a vector.
Ex: list1 <- list(1:5)
print(list1)
list2 <-list(10:14)
print(list2)
v1 <- unlist(list1)
v2 <- unlist(list2)
result <- v1+v2
print(result)

Output: 1 2 3 4 5
10 11 12 13 14
11 13 15 17 19

MERGING LISTS
R allows us to merge two or more lists into one list. To merge lists, pass all of them to the c() function; it returns a single list which contains all the elements present in the given lists. (Passing them to list() instead would create a nested list whose elements are the original lists.)

Ex: Even_list <- list(2,4,6,8)
Odd_list <- list(3,5,7,9)
merged.list <- c(Even_list,Odd_list) # Merging the two lists.
print(merged.list)
Output:
2
4
6
8
3
5

7
9
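The choice of function matters when merging. A minimal sketch comparing c() with list() on two four-element lists (lower-case variable names are used here only to keep the sketch separate from the example above):

```r
even_list <- list(2, 4, 6, 8)
odd_list  <- list(3, 5, 7, 9)

merged <- c(even_list, odd_list)     # one flat list of 8 elements
nested <- list(even_list, odd_list)  # a list that holds the 2 original lists

print(length(merged))  # 8
print(length(nested))  # 2
```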

DATA FRAMES
A data frame is a two-dimensional array-like structure or table in which each column contains the values of one variable, and each row contains one set of values from each column.
A data frame is a special case of a list in which each component has equal length. A matrix can contain only one type of data, but a data frame can contain different data types such as numeric, character, factor, etc.
A data frame has the following characteristics.
• The column names should be non-empty.
• The row names should be unique.
• The data stored in a data frame can be of factor, numeric, or character type.
• Each column contains the same number of data items.

In R, data frames are created with the data.frame() function. This function takes vectors of any type such as numeric, character, or integer.
Ex: Create a data frame that contains employee id (integer vector), employee name (character vector), salary (numeric vector), and starting date (Date vector).
emp.data <- data.frame(employee_id = c(1:5),
employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
sal = c(623.3,915.2,611.0,729.0,843.25),
starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")))
print(emp.data)

Output:
employee_id employee_name sal starting_date
1 1 Shubham 623.30 2012-01-01
2 2 Arpita 915.20 2013-09-23
3 3 Nishka 611.00 2014-11-15
4 4 Gunjan 729.00 2014-05-11
5 5 Sumit 843.25 2015-03-27

EXTRACTING DATA FROM DATA FRAME


Data can be extracted in three ways
• Extract specific columns from a data frame using the column name.
• Extract specific rows from a data frame.
• Extract specific rows corresponding to specific columns.

Extracting the specific columns from a data frame


Ex: For the above created data frame
final <- data.frame(emp.data$employee_id, emp.data$sal)
print(final)

Output:
emp.data.employee_id emp.data.sal
1 623.30
2 915.20
3 611.00
4 729.00
5 843.25

Extracting the specific rows from a data frame


Ex: final <- emp.data[1,]
print(final)

Output:
employee_id employee_name sal starting_date
1 Shubham 623.3 2012-01-01

Extracting specific rows corresponding to specific columns


Ex: # Extracting 2nd and 3rd row corresponding to the 1st and 4th column
final <- emp.data[c(2,3),c(1,4)]
print(final)

Output:
employee_id starting_date
2 2013-09-23
3 2014-11-15
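Rows can also be picked by a condition instead of by position. The sketch below is an illustration on a small made-up data frame (the column names and values are placeholders, not the employee data above); it combines the row and column selection forms with a logical test.

```r
emp <- data.frame(id   = 1:4,
                  name = c("A", "B", "C", "D"),   # placeholder names
                  sal  = c(623.3, 915.2, 611.0, 729.0))

high_paid  <- emp[emp$sal > 700, ]          # rows that meet the condition
names_only <- emp[emp$sal > 700, "name"]    # same rows, a single column

print(high_paid)
print(names_only)
```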

MODIFICATION IN DATA FRAME


It is possible to add rows and columns to a data frame using the rbind() and cbind() functions.
• rbind(): used to add a row to the data frame
Syntax: rbind(dataframe name, new row variable)

Ex: x <- list(6,"Vaishali",547,"2015-09-01")
rbind(emp.data, x)

Output: employee_id employee_name sal starting_date
1 Shubham 623.30 2012-01-01
2 Arpita 915.20 2013-09-23
3 Nishka 611.00 2014-11-15
4 Gunjan 729.00 2014-05-11
5 Sumit 843.25 2015-03-27
6 Vaishali 547.00 2015-09-01

• cbind(): used to add a column to the data frame
Syntax: cbind(dataframe name, new column variable)

Ex: y <- c("Moradabad","Lucknow","Etah","Sambhal","Khurja")
cbind(emp.data, Address=y)

Output:
employee_id employee_name sal starting_date Address
1 Shubham 623.30 2012-01-01 Moradabad
2 Arpita 915.20 2013-09-23 Lucknow
3 Nishka 611.00 2014-11-15 Etah
4 Gunjan 729.00 2014-05-11 Sambhal
5 Sumit 843.25 2015-03-27 Khurja

NON-NUMERIC VALUES
LOGICAL VALUES
• Logical values can only take on two values: TRUE or FALSE.
• Logical values represent binary states such as yes/no and one/zero.
• Logical values are used to indicate whether a condition has been met or not.

TRUE and FALSE Notation
• Logical values are represented as TRUE and FALSE, which can be abbreviated as T and F.

Assigning Logical-values
Ex: b1 <- TRUE
b2 <- FALSE

Vectors can be filled with logical values using T or F.
Ex: myvec <- c(T,T,F,F,F)

The length of a vector can be determined using the `length` function.
Ex: length(myvec) # returns 5

A Logical Outcome: Relational Operators


• Relational operators are used to find the relationship between two operands.
• The output of a relational expression is either TRUE or FALSE.
• The two operands may be constants, variables or expressions.
Ex: 4 > 5 returns FALSE
4 < 5 returns TRUE
4 >= 5 returns FALSE
4 <= 5 returns TRUE
4 == 5 returns FALSE
4 != 5 returns TRUE

any & all Functions


any(): Checks whether at least one element in a vector meets a specific condition. It returns TRUE if any element satisfies the condition; otherwise, it returns FALSE.
Ex: vector1 <- c(1, 2, 3, 4, 5)
result <- any(vector1 > 3)
print(result)

Output:
TRUE

all(): Checks whether all elements in a vector meet a specific condition. It returns TRUE if all elements satisfy the condition; otherwise, it returns FALSE.
Ex: vector2 <- c(1, 2, 3, 4, 5)
result <- all(vector2 > 0)
print(result)

Output:
TRUE
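Both examples above return TRUE. A short sketch with sample scores (arbitrary values) shows the FALSE cases as well:

```r
scores <- c(35, 72, 88, 41)

print(any(scores < 40))   # TRUE: 35 is below 40
print(all(scores < 40))   # FALSE: not every score is below 40
print(any(scores > 100))  # FALSE: no score exceeds 100
```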

SHORT AND LONG VERSIONS


There are two versions of the logical operators: i) Short versions ii) Long versions
i) Short versions are for element-wise comparisons. Short versions return multiple logical values.
Ex: `&`, `|`
Element-wise comparisons are performed when comparing two vectors of equal length, and they return a vector of logical values.
Ex: b1 <- c(T, F, F)
b2 <- c(F, T, F)
`b1 & b2` returns FALSE FALSE FALSE.
`b1 | b2` returns TRUE TRUE FALSE.

ii) Long versions are for comparing individual values. Long versions return a single logical value.
Ex: `&&`, `||`
In versions of R before 4.3.0, applying the long versions to two longer vectors evaluated only the first pair of logicals; from R 4.3.0 onwards this raises an error, so the long versions should be given single values.
Ex: b1 <- c(T, F, F)
b2 <- c(F, T, F)
`b1[1] && b2[1]` returns FALSE.
`b1[1] || b2[1]` returns TRUE.
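A small sketch of where each version fits: the short form builds a logical vector from a numeric vector, and the long form combines two single conditions. The long form also stops evaluating as soon as the answer is known (short-circuiting), a standard R behaviour that is not described in the notes above.

```r
x <- c(4, 9, 15)

# Short version: one result per element of x.
in_range <- x > 5 & x < 12
print(in_range)   # FALSE TRUE FALSE

# Long version: single values only. The right-hand side is never run
# here because the left-hand side is already FALSE.
checked <- is.numeric("a") && stop("never evaluated")
print(checked)    # FALSE
```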

LOGICAL SUBSETTING AND EXTRACTION


Logical subsetting and extraction involve using logical conditions to select elements that satisfy a
particular criterion.
• First, create a logical vector of TRUE and FALSE values based on the condition.
• Then, use this vector to subset or extract elements.
Ex: numeric_vector <- c(1, 2, 3, 4, 5)
even_numbers <- numeric_vector[numeric_vector %% 2 == 0]
print(even_numbers)

Output:
2 4
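The two steps can be made visible by storing the logical vector before using it. In this sketch the data is arbitrary, and `which()` is a base R function (not mentioned above) that reports the positions of the TRUE values.

```r
readings <- c(11, 14, 17, 20)

mask <- readings %% 2 == 0   # step 1: the logical vector
print(mask)                  # FALSE TRUE FALSE TRUE

print(readings[mask])        # step 2: 14 20
print(which(mask))           # 2 4 (positions, not values)
```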

STRING
• A string is a data type.
• It is used to represent text or character data.
• Strings can consist of almost any combination of characters, including digits.
• Strings are commonly used for storing and manipulating textual information.
For ex: names, sentences, and text data extracted from files or databases.
Strings can be created by using single or double quotation marks.
Ex: single_quoted <- 'This is a single-quoted string.'
double_quoted <- "This is a double-quoted string."

nchar(): It is used to determine the number of characters in a given string. It calculates and returns
the length of a string in terms of the number of characters
Ex: my_string <- "Hello, World!"
string_length <- nchar(my_string)
cat("The length of the string is:", string_length)

Output:
The length of the string is: 13

CONCATENATION
Two main functions are used for concatenating strings: `cat` and `paste`.
1) Using the cat() Function
cat() concatenates its arguments and prints them, with an optional separator. It prints the text without surrounding quotes.
Ex: cat("Hello", "World")

Output:
Hello World

2) Using the paste() Function


paste() can be used to concatenate multiple strings into one, with optional separator and other
arguments.
Ex: concatenated <- paste("Hello", "World")

Output:
"Hello World"

An optional argument `sep` is used as a separator between concatenated strings.
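A few variations on the separator, as a rough sketch. `paste0()` and the `collapse` argument are base R features that these notes do not cover; they are included here only for comparison.

```r
first  <- "Hello"
second <- "World"

print(paste(first, second))            # "Hello World" (default sep is a space)
print(paste(first, second, sep = ""))  # "HelloWorld"
print(paste0(first, second))           # "HelloWorld" (shorthand for sep = "")
print(paste(c("a", "b", "c"), collapse = "-"))  # "a-b-c"
```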


Ex: concatenated <- paste("Hello", "World", sep = ", ")

Output:
"Hello, World"

ESCAPE SEQUENCES
• The backslash (\) is used to invoke an escape sequence.
• Escape sequences allow us to enter characters that control the format and spacing of the string.
Ex: `\n` starts a new line.
`\t` represents a horizontal tab.
`\b` invokes a backspace.
`\\` is used to include a single backslash.
`\"` includes a double quote.
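Escape sequences take effect when the string is printed with cat(). A minimal sketch with made-up strings:

```r
cat("Name:\tR\nType:\tlanguage\n")  # \t inserts a tab, \n ends the line
cat("She said \"hi\"\n")            # escaped double quotes
cat("C:\\data\\file.csv\n")         # each \\ prints one backslash

print(nchar("a\tb"))                # 3: the tab counts as one character
```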

SUBSTRINGS AND MATCHING


PATTERN MATCHING
Pattern matching allows us to inspect a given string and identify smaller strings within it.
1. substr(): It is used to extract a substring from a given string.
Ex: original_string <- "Hello, World!"
substring <- substr(original_string, start = 1, stop = 5)

Output:
"Hello"

2. sub(): It is used for replacing the first occurrence of a pattern within a string
Ex: text <- "I like apples, but apples are red."
new_text <- sub("apple", "banana", text)

Output:
I like bananas, but apples are red.

3. gsub(): It is used for replacing all occurrences of a pattern within a string.


Ex: text <- "I like apples, but apples are red."
new_text <- gsub("apple", "banana", text)

Output:
I like bananas, but bananas are red.
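One point worth knowing when using sub() and gsub(): the pattern is treated as a regular expression by default, so characters such as `.` have a special meaning. The sketch below uses the `fixed = TRUE` argument of base R to match the pattern literally; the sample string is arbitrary.

```r
version <- "3.14.15"

print(gsub(".", "-", version))                # "-------": "." matches any character
print(gsub(".", "-", version, fixed = TRUE))  # "3-14-15": "." is taken literally
print(sub("[0-9]+", "X", version))            # "X.14.15": only the first run of digits
```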

SPECIAL VALUES
When a data set has missing observations, or when a practically infinite number is calculated, R has some unique terms reserved for these situations.
They are
• Inf and -Inf: When a number is too large for R to represent, the value is given as infinite.
Ex: 1 / 0
Inf + 1

Output: Inf

• NaN (Not a Number): In some situations, it is impossible to express the result of a calculation as a number; in such cases NaN is given as the output.
Ex: -Inf + Inf
Inf / Inf
0 / 0
Output: NaN (for each expression)

• NULL: This value is used to explicitly define an empty entity.
Ex: f <- NULL
print(f)
Output: NULL

• NA (Not Available): If a value is missing or not defined, for example when an index is out of range, NA is printed as the output.
Ex: X <- c(1,2,3)
X[4]
Output: NA
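Special values are usually detected with the matching is-dot test rather than with ==. A short sketch using the base R functions `is.na()`, `is.nan()`, `is.infinite()`, `is.finite()` and `is.null()`, which are not listed in the notes above:

```r
values <- c(1, NA, 0 / 0, 1 / 0)   # 1, NA, NaN, Inf

print(is.na(values))        # FALSE TRUE TRUE FALSE (NaN also counts as NA)
print(is.nan(values))       # FALSE FALSE TRUE FALSE
print(is.infinite(values))  # FALSE FALSE FALSE TRUE
print(is.null(NULL))        # TRUE

# Keep only ordinary numbers before computing a summary.
print(mean(values[is.finite(values)]))  # 1
```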

COERCION
In R programming, converting from one object or data type to another is referred to as coercion.
There are two types of coercion
1. Implicit coercion: This type of coercion occurs automatically.
Ex: The logical value TRUE will be treated as 1 and FALSE will be treated as 0 in arithmetic.
2. Explicit coercion: This type of coercion is done by the programmer with the As-Dot Coercion Functions. The Is-Dot Object-Checking Functions are used alongside them to check the type of an object before or after conversion.

Is-Dot Object-Checking Functions


These functions check whether an object is of a specific class or data type and return a TRUE or FALSE logical value.
Ex: num.vec1 <- 1:4
print(num.vec1)
is.integer(num.vec1) # TRUE
is.numeric(num.vec1) # TRUE
is.matrix(num.vec1) # FALSE
is.data.frame(num.vec1) # FALSE
is.vector(num.vec1) # TRUE
is.logical(num.vec1) # FALSE

As-Dot Coercion Functions


Explicit coercion is achieved with the as-dot functions.
Ex: as.numeric(c(T,F,F,T)) # 1 0 0 1
foo <- 34
foo.ch <- as.character(foo) # "34"
as.logical(c("1","0","1","0","0")) # NA NA NA NA NA
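A compact sketch contrasting the two kinds of coercion. The values are arbitrary; the strings that as.logical() recognises ("TRUE", "F" and similar spellings) follow base R's documented behaviour.

```r
# Implicit: R converts automatically.
print(sum(c(TRUE, FALSE, TRUE)))   # 2: logicals are counted as 1 and 0
mixed <- c(1, "a", TRUE)
print(class(mixed))                # "character": everything becomes a string

# Explicit: the programmer asks for the conversion.
print(as.integer("42") + 1L)              # 43
print(as.logical(c("TRUE", "F", "yes")))  # TRUE FALSE NA
```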

BASIC PLOTTING


Coordinates are usually represented as points written as a pair: (x value, y value). The R function plot() is used to plot graphs. It takes two vectors: one vector of x locations and one vector of y locations.
plot(): this function is used to draw points (markers).
Ex: foo <- c(1.1, 2, 3.5, 3.9, 4.2)
bar <- c(2, 2.2, -1.3, 0, 0.2)
plot(foo,bar)

Output: [Figure: scatter plot of the five (foo, bar) points]

GRAPHICAL PARAMETERS
A wide range of graphical parameters can be supplied as arguments to the plot()
function.
 type: The type parameter tells R how to plot the supplied coordinates.
The default value for type is "p", which can be interpreted as "points only". type="l"
means "lines only", "b" plots both points and lines, and "o" overplots the points with
lines. The option type="n" results in no points or lines being plotted.
Ex: foo <- c(1.1, 2, 3.5, 3.9, 4.2)
bar <- c(2, 2.2, -1.3, 0, 0.2)
plot(foo,bar,type="b")
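
To compare the type options side by side, the following sketch draws one panel per option on a 2 x 3 grid. It uses par(mfrow = ...), which is not covered in these notes, and the names types, tp and old.par are illustrative.

```r
foo <- c(1.1, 2, 3.5, 3.9, 4.2)
bar <- c(2, 2.2, -1.3, 0, 0.2)

types <- c("p", "l", "b", "o", "n")

# Split the plotting window into a 2 x 3 grid and remember the old settings
old.par <- par(mfrow = c(2, 3))
for (tp in types) {
  plot(foo, bar, type = tp, main = paste0("type = ", tp))
}
# Restore the previous layout
par(old.par)
```

The panel for type="n" shows only the axes, which is useful when points or lines are to be added later.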

 main, xlab, ylab: Options to include the plot title, the horizontal axis label, and the vertical axis
label, respectively.
Ex: plot(foo,bar,type="b",main="My lovely plot",xlab="x axis label",ylab="location y")
plot(foo,bar,type="b",main="My lovely plot\ntitle on two lines",xlab="",ylab="")


 col: The color to use for plotting points and lines. The simplest options are to use an
integer selector or a character string. The default color is integer 1 or the character string
"black". There are eight possible integer values and around 650 character strings to specify
color. You can also specify colors using RGB (red, green, and blue) levels.
Ex: plot(foo,bar,type="b",main="My lovely plot",xlab="",ylab="",col=2)
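
The three ways of specifying a color can be sketched as follows. The built-in functions colors() and rgb() are part of base R; the particular color name "seagreen4" and the RGB levels are arbitrary choices for illustration.

```r
foo <- c(1.1, 2, 3.5, 3.9, 4.2)
bar <- c(2, 2.2, -1.3, 0, 0.2)

# Character strings: colors() lists every built-in color name
length(colors())                               # 657
plot(foo, bar, type = "b", col = "seagreen4")

# RGB levels: rgb() takes red, green and blue levels between 0 and 1
rgb(1, 0, 0)                                   # "#FF0000" (pure red)
plot(foo, bar, type = "b", col = rgb(0, 0.5, 1))
```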

 pch: pch stands for point character. This parameter controls the character used to plot
individual data points. You can supply a single character to use for each point, or specify an
integer between 1 and 25. Each integer corresponds to a different plotting symbol.

Ex: foo <- c(1.1, 2, 3.5, 3.9, 4.2)
bar <- c(2, 2.2, -1.3, 0, 0.2)
plot(foo,bar,type="b",main="My lovely plot",xlab="",ylab="",col=4,pch=8)
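
One way to see which symbol each integer selects is to draw all 25 of them in a row. The sketch below does this with plot() and labels each symbol using the base function text(), which is not introduced in these notes; the variable name codes is illustrative.

```r
codes <- 1:25

# Draw symbol k at x position k, with the y axis hidden
plot(codes, rep(1, 25), pch = codes, cex = 1.5,
     ylim = c(0.5, 1.5), xlab = "pch value", ylab = "", yaxt = "n")

# Print each integer underneath its symbol
text(codes, rep(0.8, 25), labels = codes, cex = 0.7)

# A single character can be used instead of an integer
foo <- c(1.1, 2, 3.5, 3.9, 4.2)
bar <- c(2, 2.2, -1.3, 0, 0.2)
plot(foo, bar, pch = "x")
```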


 cex: It stands for character expansion. This controls the size of plotted point characters.
Ex: plot(foo,bar,type="b",main="My lovely plot",xlab="",ylab="",col=4,pch=8,cex=2.3)

 lty: It stands for line type. This specifies the type of line to use to connect the points (for
example, solid, dotted, or dashed). It takes the values 1 through 6; for example, 1 is solid,
2 is dashed, and 3 is dotted.
Ex: plot(foo,bar,type="b",main="My lovely plot",xlab="",ylab="",col=4,pch=8,lty=2)
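
The six line types can be compared by drawing one horizontal line per value. This is a minimal sketch that uses type="n" to set up an empty plot and the base function abline(), which is not covered above; lty.names is an illustrative variable holding the names R accepts in place of the integers.

```r
# Empty plotting region with room for six horizontal lines
plot(c(0, 1), c(0.5, 6.5), type = "n",
     xlab = "", ylab = "lty value", main = "Line types 1 to 6")

# Draw a horizontal line at height k using line type k
abline(h = 1:6, lty = 1:6, lwd = 2)

# Each integer also has a name that can be used instead
lty.names <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")
foo <- c(1.1, 2, 3.5, 3.9, 4.2)
bar <- c(2, 2.2, -1.3, 0, 0.2)
plot(foo, bar, type = "l", lty = lty.names[2])   # same as lty = 2
```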

 lwd: It stands for line width. This controls the thickness of plotted lines.
Ex: plot(foo,bar,type="b",main="My lovely plot",xlab="",ylab="",col=4,pch=8,lty=2,
cex=2.3,lwd=3.3)

 xlim, ylim: These provide limits for the horizontal range and vertical range (respectively)
of the plotting region. Each is given as a vector of two values: c(lower, upper).
Ex: plot(foo,bar,type="b",main="My lovely plot",xlab="",ylab="",col=6,pch=15,lty=3,
cex=0.7,lwd=2,xlim=c(3,5),ylim=c(-0.5,0.2))
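
Points that fall outside xlim or ylim are not drawn. The sketch below checks which of the five example points lie inside the limits used above; the variable name inside is hypothetical.

```r
foo <- c(1.1, 2, 3.5, 3.9, 4.2)
bar <- c(2, 2.2, -1.3, 0, 0.2)

# TRUE for points inside both xlim = c(3, 5) and ylim = c(-0.5, 0.2)
inside <- foo >= 3 & foo <= 5 & bar >= -0.5 & bar <= 0.2
inside            # FALSE FALSE FALSE TRUE TRUE
foo[inside]       # 3.9 4.2
bar[inside]       # 0.0 0.2

# range() returns the smallest and largest value, which is handy
# when the limits should cover all of the data
range(foo)        # 1.1 4.2
plot(foo, bar, xlim = range(foo), ylim = range(bar))
```

Only the last two points are visible with the limits from the example, which is why that plot appears zoomed in.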

ADVANTAGES AND DISADVANTAGES


ADVANTAGES
 Open Source: R is an open-source language, which means we can use it without any need
for a license or a fee.


 Platform Independent: R is a platform-independent (cross-platform) programming
language, which means its code can run on all operating systems.
 Machine Learning Operations: R allows us to do various machine learning operations such
as classification and regression.
 Quality plotting and graphing: R makes it simple to produce quality plots and graphs using
the plot() function.
 Statistics: R is mainly known as the language of statistics.

DISADVANTAGES
 Basic Security: R lacks basic security features, which are an essential part of most
programming languages. Because of this, R cannot easily be embedded in a web application.
 Lesser Speed: R is often slower than other programming languages such as MATLAB and
Python, and code in R packages can also run more slowly than equivalent code in those
languages.
 Complicated Language: People who don't have prior knowledge or programming
experience may find it difficult to learn R.
