Data Analysis Using R - 2
2. R Programming Fundamentals
2.2 Features of R
R is a simple, effective, and well-developed programming language that also serves as data
analysis software.
R is a well-designed, easy-to-learn language that supports conditionals, loops,
user-defined recursive functions, and various I/O facilities.
R has a large, consistent, and integrated set of tools used for data analysis.
R contains a suite of operators for different types of calculations on arrays, lists, and vectors.
R provides highly extensible graphical techniques.
R has an effective data handling and storage facility.
R has a vibrant online community.
R is free, open-source, robust, and highly extensible.
1. Logical: TRUE/FALSE
A logical value is usually created when two values are compared. For example:
x = 4 # Sample values
y = 3
z = x > y # Comparing two values
print(z) # print the logical value
print(class(z)) # print the class name of z
print(typeof(z)) # print the type of z
Output
[1] TRUE
[1] "logical"
[1] "logical"
2. Character:
R supports the character data type, which stores character values or strings. Strings in R
can contain letters, digits, and symbols.
Output
[1] "character"
[1] "character"
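The code for the character example is not shown above. A minimal sketch of how such values might be created and inspected follows; the variable names and sample strings are assumptions chosen for illustration.

```r
# Hypothetical sample values; the original example code is not shown in the text.
char <- "Data Analysis"
mixed <- "R 4.x & symbols #!"

print(class(char))    # "character"
print(typeof(char))   # "character"
print(class(mixed))   # "character"
print(nchar(char))    # number of characters in the string: 13
```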
3. Numeric:
Decimal values are called numerics in R, and numeric is the default data type for numbers. If you
assign a decimal value to a variable x, x will be of numeric type. R represents these real
numbers as double-precision floating-point values.
# print the class name of variable x
print(class(x))
Output
[1] "numeric"
[1] "double"
# print the class name of variable y
print(class(y))
Output
[1] "numeric"
[1] "double"
4. Integer:
R supports the integer data type, which represents whole numbers. You can create an integer, or
convert a value into one, using the as.integer() function. You can also append the capital 'L'
suffix to a number to denote that it is of the integer data type.
x = as.integer(5) # Create an integer value
Output
[1] "integer"
[1] "integer"
[1] "integer"
[1] "integer"
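Only the as.integer() call is shown above. The sketch below illustrates both ways of creating an integer described in the text; the variables y and z are assumptions added for comparison.

```r
x <- as.integer(5)   # convert a numeric value to integer
y <- 5L              # the L suffix creates an integer directly

print(class(x))      # "integer"
print(typeof(x))     # "integer"
print(class(y))      # "integer"
print(typeof(y))     # "integer"

# Without the suffix, a whole number is still stored as a double.
z <- 5
print(typeof(z))     # "double"
```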
1. Arithmetic Operators
Arithmetic operators in R perform math operations such as addition, subtraction,
multiplication, division, and modulo on the operands on either side of the operator, which may be
scalar values, complex numbers, or vectors.
I. Addition operator (+): The values at the corresponding positions of both operands are added.
II. Subtraction Operator (-): The second operand values are subtracted from the first.
a <- 6
b <- 8.4
print (a-b)
Output : -2.4
III. Multiplication Operator (*): The corresponding elements of the vectors, or the scalar
values, are multiplied using the ‘*’ operator.
a <- c(4,4)
b <- c(5,5)
print (a*b)
Output : 20 20
IV. Division Operator (/) : The first operand is divided by the second operand with the use of the
‘/’ operator.
a <- 10
b <- 5
print (a/b)
Output : 2
V. Power Operator (^): The first operand is raised to the power of the second operand.
a <- 4
b <- 5
print(a^b)
Output : 1024
VI. Modulo Operator (%%): The remainder of the first operand divided by the second operand
is returned.
# R program to illustrate
# the use of Arithmetic operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)
Output :
Addition of vectors : 2 5
Subtraction of vectors : -2 -1
Multiplication of vectors : 0 6
Division of vectors : 0 0.6666667
Modulo of vectors : 0 2
Power operator : 0 8
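The program above lists only the two vectors and the resulting output. A sketch of statements that would reproduce that output, assuming cat() is used for printing, is:

```r
# R program to illustrate the use of arithmetic operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)

cat("Addition of vectors :", vec1 + vec2, "\n")
cat("Subtraction of vectors :", vec1 - vec2, "\n")
cat("Multiplication of vectors :", vec1 * vec2, "\n")
cat("Division of vectors :", vec1 / vec2, "\n")
cat("Modulo of vectors :", vec1 %% vec2, "\n")
cat("Power operator :", vec1 ^ vec2, "\n")
```

Each operator is applied element by element, so every result has the same length as the inputs.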
2. Logical Operators
Logical operators in R perform element-wise decision operations on the operands, and each result
evaluates to either TRUE or FALSE. Any nonzero number is treated as TRUE and zero as FALSE.
I. Element-wise Logical AND operator (&): It combines each element of the first vector with the
corresponding element of the second vector and gives TRUE if both elements are TRUE.
v <- c(3,1,TRUE,2+3i)
t <- c(4,1,FALSE,2+3i)
print(v & t)
Output :TRUE TRUE FALSE TRUE
II. Element-wise Logical OR operator (|): It combines each element of the first vector with the
corresponding element of the second vector and gives TRUE if at least one of the elements is TRUE.
III. Logical NOT operator (!): It takes each element of the vector and gives the opposite
logical value.
v <- c(3,0,TRUE,2+3i)
print(!v)
Output :FALSE TRUE FALSE FALSE
IV. Logical AND operator (&&): It compares two single (length-one) values and gives TRUE only if
both are TRUE. Older versions of R silently used the first element of longer vectors; since R 4.3.0,
passing a vector of length greater than one is an error, so the first elements are selected explicitly.
v <- c(3,0,TRUE,2+2i)
t <- c(1,3,TRUE,2+3i)
print(v[1] && t[1])
Output :TRUE
V. Logical OR operator (||): It compares two single (length-one) values and gives TRUE if at least
one of them is TRUE.
v <- c(0,0,TRUE,2+2i)
t <- c(0,3,TRUE,2+3i)
print(v[1] || t[1])
Output :FALSE
# R program to illustrate
# the use of Logical operators
vec1 <- c(0,2)
vec2 <- c(TRUE,FALSE)
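The fragment above defines two vectors but does not show how the operators are applied to them. A minimal sketch, assuming cat() for printing:

```r
# R program to illustrate the use of logical operators
vec1 <- c(0, 2)
vec2 <- c(TRUE, FALSE)

cat("Element-wise AND :", vec1 & vec2, "\n")     # FALSE FALSE
cat("Element-wise OR :", vec1 | vec2, "\n")      # TRUE TRUE
cat("Logical NOT of vec1 :", !vec1, "\n")        # TRUE FALSE

# && and || expect single values, so the first elements are selected explicitly.
cat("Logical AND :", vec1[1] && vec2[1], "\n")   # FALSE
cat("Logical OR :", vec1[1] || vec2[1], "\n")    # TRUE
```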
3. Relational Operators:
The relational operators in R compare the corresponding elements of the operands. Each comparison
returns TRUE if the first operand satisfies the relation with respect to the second, and FALSE
otherwise. When logical values are compared, TRUE is considered greater than FALSE.
I. Less than (<): Checks if each element of the first vector is less than the corresponding element of
the second vector.
II. Less than or equal to (<=): Checks if each element of the first vector is less than or equal to the
corresponding element of the second vector.
a <- c(2,5.5,6,9)
b<- c(8,2.5,14,9)
print(a <= b)
Output : TRUE FALSE TRUE TRUE
III. Greater than (>) :Checks if each element of the first vector is greater than the corresponding
element of the second vector.
a <- c(2,5.5,6,9)
b<- c(8,2.5,14,9)
print(a > b)
Output :FALSE TRUE FALSE FALSE
IV. Greater than or equal to (>=): Checks if each element of the first vector is greater than or equal to
the corresponding element of the second vector.
a <- c(2,5.5,6,9)
b<- c(8,2.5,14,9)
print(a >= b)
Output :FALSE TRUE FALSE TRUE
V. Not equal to (!=) :Checks if each element of the first vector is unequal to the corresponding
element of the second vector.
a <- c(2,5.5,6,9)
b<- c(8,2.5,14,9)
print(a != b)
Output :TRUE TRUE TRUE FALSE
VI. Equal to (==): Checks if each element of the first vector is equal to the corresponding element of the
second vector.
a <- c(2,5.5,6,9)
b <- c(8,2.5,14,9)
print(a == b)
Output :FALSE FALSE FALSE TRUE
# R program to illustrate
# the use of Relational operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)
Output
Vector1 less than Vector2 : TRUE TRUE
Vector1 less than equal to Vector2 : TRUE TRUE
Vector1 greater than Vector2 : FALSE FALSE
Vector1 greater than equal to Vector2 : FALSE FALSE
Vector1 not equal to Vector2 : TRUE TRUE
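The program above shows only the vectors and the output. A sketch of comparisons that would produce that output, assuming cat() for printing:

```r
# R program to illustrate the use of relational operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)

cat("Vector1 less than Vector2 :", vec1 < vec2, "\n")
cat("Vector1 less than equal to Vector2 :", vec1 <= vec2, "\n")
cat("Vector1 greater than Vector2 :", vec1 > vec2, "\n")
cat("Vector1 greater than equal to Vector2 :", vec1 >= vec2, "\n")
cat("Vector1 not equal to Vector2 :", vec1 != vec2, "\n")
```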
4. Assignment Operator:
Assignment operators in R are used to assign values to data objects. The objects may
be integers, vectors, or functions. The values are then stored under the assigned variable names.
I. Left Assignment (<-, <<- or =):
v1 <- c("A",TRUE)
v2 <<- c(10L,5.2)
v3 = c(3,1,TRUE,2+3i)
print (v1)
print (v2)
print (v3)
Output :
[1] "A"    "TRUE"
[1] 10.0  5.2
[1] 3+0i 1+0i 1+0i 2+3i
II. Right Assignment (-> or ->>):
c("A",TRUE) -> v1
c(10L,5.2) ->> v2
print (v1)
print (v2)
Output :
[1] "A"    "TRUE"
[1] 10.0  5.2
A data structure organizes data so that it can be used efficiently; the idea is to reduce the space
and time complexities of different tasks. Data structures in R programming are tools for holding
multiple values.
R’s base data structures are often organized by their dimensionality (1D, 2D, or nD) and whether
they’re homogeneous (all elements must be of the same type) or heterogeneous (the
elements can be of different types).
This gives rise to the six data structures which are most frequently used in data analysis.
The most essential data structures used in R include:
Vectors : A vector is an ordered collection of values of a basic data type, with a given length.
The key point is that all the elements of a vector must be of the same data type, i.e. vectors are
homogeneous data structures. Vectors are one-dimensional data structures.
Factors : Factors are data objects used to categorize data and store it as
levels. They can store both strings and integers. They are useful in columns that have a
limited number of unique values, such as "Male", "Female" or TRUE, FALSE. They are useful in
data analysis for statistical modeling. Factors are created using the factor() function, which
takes a vector as input.
Arrays : Arrays are essential data storage structures defined by a fixed number of
dimensions. Arrays are allocated at contiguous memory locations. All elements of an array
are of the same data type.
Matrices : A matrix is a rectangular arrangement of numbers in rows and columns. Rows run
horizontally and columns run vertically.
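As a quick illustration of the four structures described above, the following sketch creates one of each; all names and sample values are assumptions chosen for the example.

```r
v <- c(10, 20, 30)                          # vector: 1D, homogeneous
f <- factor(c("Male", "Female", "Male"))    # factor: categorical data stored as levels
a <- array(1:24, dim = c(2, 3, 4))          # array: here 2 rows, 3 columns, 4 matrices
m <- matrix(1:6, nrow = 2, ncol = 3)        # matrix: 2D, rows and columns

print(length(v))   # 3
print(levels(f))   # "Female" "Male"
print(dim(a))      # 2 3 4
print(dim(m))      # 2 3
```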
2.5 Vectors
There are 2 types of vectors:
- Atomic Vector: A sequence of elements that share the same data type.
- List: A vector that may contain elements of heterogeneous types.
The four most common types of atomic vectors are: Logical, Integer, Numeric/double, Character.
v1 <- c(TRUE,FALSE,TRUE) # logical vector
v2 <- c(1.2, 2.5, 3) # numeric/double vector
v3 <- 50L # integer vector
v4 <- c("Hello", "Computer") # character vector
a <- 4 # assumed starting value, consistent with the output shown
b <- sqrt(a)
c <- b * b
d <- c * b
print(c) #O/P:[1] 4
x <- c(-3,-2,-1,0,1,2)
Output:
[1] -3 0 -1 0 1 2
[1] 5 0 5 0 1 2
[1] 5 0 5 0
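The three output lines above appear to come from modifying x step by step, but the statements themselves are not shown. The following is a sketch of assignments that would produce them; it is an assumption about the missing code, not a recovery of it.

```r
x <- c(-3, -2, -1, 0, 1, 2)

x[2] <- 0         # modify the 2nd element
print(x)          # -3 0 -1 0 1 2

x[x < 0] <- 5     # replace every negative element with 5
print(x)          # 5 0 5 0 1 2

x <- x[1:4]       # keep only the first four elements
print(x)          # 5 0 5 0
```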
x <- c(-3,-2,-1,0,1,2)
print(x)
x <- x[-6] # drop the 6th element
print(x)
x <- NULL # delete the whole vector
x[4]
Output:
[1] -3 -2 -1 0 1 2
[1] -3 -2 -1 0 1
NULL
The is.na() function returns a logical vector that marks the NA values in its argument:
each position is TRUE if the element there is NA and FALSE otherwise.
x<- c(NA, 3, 4, NA, NA, NA)
is.na(x)
Output: [1] TRUE FALSE FALSE TRUE TRUE TRUE
Removing NA values:
There are two ways to remove missing values:
Example 1:
x <- c(1, 2, NA, 3, NA, 4)
d <- is.na(x)
x[! d]
Output: [1] 1 2 3 4
Example 2:
x <- c(1, 2, 0 / 0, 3, NA, 4, 0 / 0)
x
x[! is.na(x)]
Output:
[1] 1 2 NaN 3 NA 4 NaN
[1] 1 2 3 4
Elements of a vector can be accessed using vector indexing. The vector used for indexing can be
a logical, integer, or character vector.
We can also use negative integers to return all elements except those specified.
However, we cannot mix positive and negative integers in one index, and real numbers, if used,
are truncated to integers.
x[c(2, -4)]
Output:
[1] 3
[1] 2 4
[1] 2 3 4 5 6 7
ERROR!
Error in x[c(2, -4)] : only 0's may be mixed with negative subscripts
Execution halted
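Only the failing call is shown above. The earlier output lines are consistent with a vector x <- 1:7, which is assumed here; the sketch walks through positive, negative, real-valued, and mixed indices.

```r
x <- 1:7                  # assumed sample vector, consistent with the output shown

print(x[3])               # 3
print(x[c(2, 4)])         # 2 4
print(x[-1])              # everything except the 1st element: 2 3 4 5 6 7
print(x[c(2.4, 3.54)])    # real indices are truncated to 2 and 3: 2 3

# Mixing positive and negative indices is an error; tryCatch() keeps the script running.
result <- tryCatch(x[c(2, -4)], error = function(e) "error")
print(result)             # "error"
```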
x <- 1:5
x[x < 0]
x[x > 0]
Output:
[1] 1 4 5
integer(0)
[1] 1 2 3 4 5
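The first output line above has no matching statement in the code shown. It is consistent with a logical index that R recycles to the length of x, which is assumed in this sketch of logical indexing:

```r
x <- 1:5

# A logical index shorter than x is recycled: T F F T becomes T F F T T.
print(x[c(TRUE, FALSE, FALSE, TRUE)])   # 1 4 5

print(x[x < 0])          # no element is negative: integer(0)
print(x[x > 0])          # 1 2 3 4 5
print(x[x %% 2 == 0])    # even elements only: 2 4
```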
Indexing with a character vector is useful when dealing with named vectors. We can name each
element of a vector.
names(x)
x["second"]
Output:
second
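The definition of the named vector is not shown above. A minimal sketch with hypothetical names and values:

```r
# Hypothetical named vector; the values are assumptions for illustration.
x <- c("first" = 3, "second" = 0, "third" = 9)

print(names(x))                  # "first" "second" "third"
print(x["second"])               # the element named "second", printed with its name
print(x[c("first", "third")])    # several elements selected by name
```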
Extra points :
V <- c("Hello", "World")
V <- append(V, "Computer", after = 1)
The command above inserts the word "Computer" after the 1st index position.
V<- 1:5
Vector arithmetic:
X <- c(5, 2, 5, 1, 51, 2) # values implied by the output below
Y <- c(7, 9, 1, 5, 2, 1)
Z <- X + Y # Addition
print('Addition:')
print(Z)
S <- X - Y # Subtraction
print('Subtraction:')
print(S)
M <- X * Y # Multiplication
print('Multiplication:')
print(M)
D <- X / Y # Division
print('Division:')
print(D)
Output:
Addition: 12 11 6 6 53 3
Subtraction: -2 -7 4 -4 49 1
Multiplication: 35 18 5 5 102 2
Division: 0.7142857 0.2222222 5.0000000 0.2000000 25.5000000 2.0000000
R Environment
1. save() function: It creates an .RData file. In these files, we can store several variables.
2. save.image() function: It is an extended version of save() that collects all the data objects
declared in the workspace and saves them to a file.
save.image(file="filename.RData"): It takes one argument, file, the name of the file that the R
objects are saved to.
a <- 1:3; b <- 4:5
assign("colours", c("Blue","White","Orange","Green"))
save.image(file="demo.RData") #save all data objects in demo.RData file.
3. saveRDS() function: It can only be used to save one object in one file. A file saved using
saveRDS() can be read back into the R environment with the readRDS() function.
The load() command restores the saved objects into the global environment, for example after
they have been deleted.
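The round trip described above can be sketched as follows. The temporary file paths are assumptions made so that the example does not write into the working directory.

```r
a <- 1:3
b <- 4:5

rdata_file <- tempfile(fileext = ".RData")   # hypothetical file location
save(a, b, file = rdata_file)                # store several objects in one file

rm(a, b)                                     # delete the objects ...
load(rdata_file)                             # ... and restore them under their original names
print(a)
print(b)

rds_file <- tempfile(fileext = ".rds")       # hypothetical file location
saveRDS(a, file = rds_file)                  # store exactly one object
a_copy <- readRDS(rds_file)                  # the caller chooses the name when reading
print(identical(a, a_copy))                  # TRUE
```

A practical difference is that load() recreates objects under the names they were saved with, while readRDS() returns the object so it can be assigned to any name.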
Working directory in R: It is the default path on the computer system where data objects and
work are stored.
• When R imports a dataset, it assumes that the data is stored in the working directory.
• Only one working directory can be active at a time; it is called the current working directory.
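A minimal sketch of inspecting and changing the working directory with getwd() and setwd(); switching to tempdir() is an assumption made for illustration.

```r
old_dir <- getwd()        # remember the current working directory
print(old_dir)

setwd(tempdir())          # switch to a temporary directory (hypothetical choice)
print(getwd())
print(list.files())       # files that R would find by relative path from here

setwd(old_dir)            # restore the original working directory
```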
2.6 Factors
If you want to modify a factor and add a value that is not among its predefined levels, first add
the new level.
gender <- factor(c("female", "male", "male", "female"))
Output :
[1] female male other female
Levels: female male other
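Only the creation of the factor is shown above. A sketch of the two remaining steps that would lead to the output shown, extending the levels and then assigning the new value:

```r
gender <- factor(c("female", "male", "male", "female"))
print(levels(gender))                           # "female" "male"

levels(gender) <- c(levels(gender), "other")    # add the new level first
gender[3] <- "other"                            # now the assignment is valid
print(gender)
```

Assigning "other" before adding the level would instead produce a warning and store NA in that position.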
2.7 Arrays
2.7.1 Operations on Array and elements manipulation
Arrays are essential data storage structures defined by a fixed number of dimensions.
Arrays are the R data objects which can store data in more than two dimensions.
An array is a multidimensional data structure.
A 1D array behaves like a vector, and a 2D array is a matrix.
An array is created using the array() function. It takes vectors as input and uses the values in
the dim parameter to create an array.
The vectors from which an array is created can be of the same length or of different lengths.
An array can be created from a single vector, two vectors, or more than two vectors.
Syntax:
array_name <- array(data, dim = c(rowsize, colsize, matrices), dimnames = names)
“array” is the function used to create an array in R.
“data” are the input values from which the array is created. They can be either a vector or
a number sequence.
“dim” specifies the dimensions of the array, i.e. the number of rows, the number of columns, and
the number of matrices in the array.
“dimnames” is used to specify the names of the dimensions.
1. V1 <- c(1,2,3,4,5)
V2 <- c(6,7,8,9)
A <- array(c(V1,V2), dim=c(3,3)) # 3x3 array built from V1 and V2
Output:
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
2. B<-array(c(V1,V2), dim=c(3,3,2))
“B” array will be created with dimensions 3 rows, 3 columns and 2 matrices. It has two 3x3 matrices
created with vectors V1 and V2.
Output:
,,1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
,,2
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
3. C=array(1:4,dim=c(2,2))
print(C) #:2x2 array “C” is created from sequence(1:4)
Output:
[,1] [,2]
[1,] 1 3
[2,] 2 4
Naming of Arrays
The row names, column names, and matrix names are specified as vectors whose lengths equal the
number of rows, the number of columns, and the number of matrices respectively, and are passed
together as a list to dimnames. By default, the rows, columns, and matrices are referred to by
their index values.
Output:
,, Mat1
col1 col2 col3
row1 2 4 6
row2 3 5 7
,, Mat2
col1 col2 col3
row1 8 10 12
row2 9 11 13
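The call that produced the named array above is not shown. A sketch that would print the same layout, assuming the data is the sequence 2:13 and the array name named_arr:

```r
# named_arr is a hypothetical name; the values 2:13 are inferred from the output shown.
named_arr <- array(
  2:13,
  dim = c(2, 3, 2),
  dimnames = list(
    c("row1", "row2"),            # row names
    c("col1", "col2", "col3"),    # column names
    c("Mat1", "Mat2")             # matrix names
  )
)
print(named_arr)
```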
Accessing arrays:
Elements in an array can be accessed using the row index, column index, and matrix index/level.
Syntax: array_name[row_index, column_index, matrix_level]
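The syntax above can be illustrated with a small sketch; the array arr and its contents are assumptions chosen for the example. Leaving an index empty selects everything along that dimension.

```r
arr <- array(1:18, dim = c(3, 3, 2))   # hypothetical sample array

print(arr[2, 3, 1])   # row 2, column 3, matrix 1: 8
print(arr[1, , 2])    # first row of matrix 2: 10 13 16
print(arr[, 2, 1])    # second column of matrix 1: 4 5 6
print(arr[, , 2])     # the whole second matrix
```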
Array Arithmetic:
In R, the basic operations of addition, subtraction, multiplication, and division work element-wise,
so the arrays must have the same dimensions.
True matrix multiplication uses the %*% operator instead; for it, the number of columns of the
first matrix and the number of rows of the second matrix must be the same.
test_arr1 <- array(c(1:8),c(2,2,2))
test_arr2 <- array(c(9:16),c(2,2,2))
test_arr1
test_arr2
Output:
, , 1
[,1] [,2]
[1,] 1 3
[2,] 2 4
, , 2
[,1] [,2]
[1,] 5 7
[2,] 6 8
, , 1
[,1] [,2]
[1,] 9 11
[2,] 10 12
, , 2
[,1] [,2]
[1,] 13 15
[2,] 14 16
1. Addition:
arr_add <- test_arr1+test_arr2
arr_add
Output:
, , 1
[,1] [,2]
[1,] 10 14
[2,] 12 16
, , 2
[,1] [,2]
[1,] 18 22
[2,] 20 24
2. Subtraction:
arr_sub <- test_arr2-test_arr1
arr_sub
Output:
, , 1
[,1] [,2]
[1,] 8 8
[2,] 8 8
, , 2
[,1] [,2]
[1,] 8 8
[2,] 8 8
3. Multiplication:
arr_mul <- test_arr1*test_arr2
arr_mul
Output:
, , 1
[,1] [,2]
[1,] 9 33
[2,] 20 48
, , 2
[,1] [,2]
[1,] 65 105
[2,] 84 128
4. Division:
arr_div <- test_arr2/test_arr1
arr_div
Output:
, , 1
[,1] [,2]
[1,] 9 3.666667
[2,] 5 3.000000
, , 2
[,1] [,2]
[1,] 2.600000 2.142857
[2,] 2.333333 2.000000
Output:
[,1] [,2]
[1,] 6 10
[2,] 8 12
Output:
[1] 6 10
Output:
[1] 36
Output:
[1] 1
[1] 8
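The last four outputs above are shown without the statements that produced them. They are consistent with the following element manipulations on test_arr1; this is a sketch under that assumption, and the name mat_sum is hypothetical.

```r
test_arr1 <- array(1:8, c(2, 2, 2))

# Add the two 2x2 matrices that make up the array.
mat_sum <- test_arr1[, , 1] + test_arr1[, , 2]
print(mat_sum)           # rows: 6 10 and 8 12
print(mat_sum[1, ])      # first row of the result: 6 10

print(sum(test_arr1))    # sum of all elements: 36
print(min(test_arr1))    # smallest element: 1
print(max(test_arr1))    # largest element: 8
```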